
Browse by tag:
paper 69 llm 60 finetuning 15 prompt 14 autonomous-agent 12 reinforcement-learning 12 optimization 9 rlhf 8 mistral 7 multi-agent 7 openai 7 peft 7 quantization 7 rag 7 transformer 7 llama2 6 lora 6 multistep-reasoning 6 network-architecture 6 rlaif 6 chatgpt 5 google 5 huggingface 5 mixture-of-experts 5 zephyr 5 agent 4 deepmind 4 recommender 4 socratic 4 survey 4 system2 4 trl 4 advertising 3 anthropic 3 autogen 3 continual-learning 3 dpo 3 foundation-model 3 inference 3 interpretability 3 ppo 3 react 3 routing 3 safety 3 scaling 3 tenyx 3 time-series 3 transformers 3 vision 3 alignment 2 alphazero 2 cnn 2 cognitive-architecture 2 compiler 2 ddpo 2 diffusion 2 discretization 2 dspy 2 forecasting 2 gptq 2 langchain 2 mamba 2 meta 2 mllm 2 mlm 2 multi-modal 2 polydisciplinary 2 ranking 2 reflexion 2 reinforced-self-training 2 rnn 2 s4 2 search 2 sequence-modeling 2 state-space-model 2 structured-state-spaces 2 theory 2 adaptive-agent 1 agentic-ai 1 ai-agents 1 apple 1 autogpt 1 automation 1 autonomous-ai 1 autoregressive 1 autotrain 1 benchmark 1 bradley-terry 1 chain-of-thought 1 chainlit 1 cicero 1 clip 1 coala 1 code-generation 1 compositionality 1 compression 1 computational-efficiency 1 ctransformers 1 diffusers 1 diplomacy 1 domain-adaptation 1 efficiency 1 em-algorithm 1 evolutionary-optimization 1 fair 1 feature-interactions-learning 1 finance 1 fine-tuning 1 finllm 1 galore 1 gan 1 gating-network 1 gemma 1 geometry 1 hiformer 1 in-context-rl 1 jax 1 knowledge-distillation 1 lamarckian-mutation 1 langevin-dynamics 1 language-models 1 layer-integration 1 learning-rate-schedule 1 legendre-polynomials 1 llama 1 llava 1 low-rank 1 ludwig 1 machine-learning 1 mcts 1 memgpt 1 meta-learning 1 meta-rl 1 metric-learning 1 microsoft 1 mm1 1 model-compression 1 model-merging 1 moe 1 multi-agent-systems 1 multi-task 1 multihop-retrieval 1 multilingual-models 1 multimodal 1 nework-architecture-search 1 nvidia 1 orca 1 orchestration 1 os 1 paged-attention 1 patch 1 plm 1 preferece-learning 1 preference-learning 1 quantitative-finance 1 representation-engineering 1 rest 1 rest-em 1 sakana 1 sampling 1 signal-propogation-theory 1 slm 1 softmax-loss 1 spline-theory 1 stock-prediction 1 superposition 1 survey-paper 1 toxicity-detection 1 trading 1 transfer-learning 1 transparency 1 unlearning 1 vllm 1 wgan 1 withmartian 1 world-model 1

Recent Blogs

paper AI Agents vs. Agentic AI: Understanding the Evolution of Autonomous Intelligence 2025-06-11

The landscape of artificial intelligence has undergone a dramatic transformation since the release of ChatGPT in November 2022. What began as impressive generative AI capabilities has rapidly evolved into two distinct but related paradigms: AI Agents and Agentic AI. This comprehensive analysis explores these emerging technologies that are reshaping how we think about autonomous intelligence. …

paper StockTime: A Time Series Specialized Large Language Model Architecture for Stock Price Prediction 2025-06-11

StockTime represents a paradigm shift in applying Large Language Models (LLMs) to financial time series prediction. Unlike existing Financial LLMs (FinLLMs) that focus primarily on textual analysis and interpretation, StockTime is specifically designed for stock price time series data. The framework leverages the natural ability of LLMs to predict the next token by treating stock prices as consecutive tokens, while extracting textual information such as stock correlations, statistical trends, and timestamps directly from the stock price data itself. …
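
The sketch below is a hypothetical illustration of that setup, not the paper's code: it splits a price series into fixed-length patches that act as consecutive tokens and derives the kind of textual context (trend, correlation, timestamps) directly from the prices. All tickers, numbers, and function names are invented.

```python
import numpy as np

def make_price_patches(prices, patch_len=4):
    """Split a price series into fixed-length patches that act as consecutive tokens."""
    n = len(prices) // patch_len * patch_len
    return np.asarray(prices[:n]).reshape(-1, patch_len)

def make_text_context(prices, timestamps, peer_prices):
    """Derive textual side-information directly from the price data itself."""
    trend = "upward" if prices[-1] > prices[0] else "downward"
    corr = float(np.corrcoef(prices, peer_prices)[0, 1])
    return (f"Window {timestamps[0]} to {timestamps[-1]}: {trend} trend, "
            f"correlation with peer ticker {corr:.2f}.")

prices = [101.2, 101.5, 100.9, 102.3, 103.0, 102.7, 104.1, 104.8]
peers = [55.0, 55.2, 54.9, 55.8, 56.1, 56.0, 56.9, 57.2]
stamps = [f"2024-01-{d:02d}" for d in range(1, 9)]
print(make_price_patches(prices).shape)  # (2, 4): next-patch prediction targets
print(make_text_context(prices, stamps, peers))
```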

paper PiFi: Bridging the Gap Between Small and Large Language Models - A Comprehensive Review 2025-06-11

Paper: Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models Authors: Kyeonghyun Kim¹, Jinhee Jang¹, Juhwan Choi²†, Yoonji Lee¹, Kyohoon Jin³†, YoungBin Kim¹ Affiliations: ¹Chung-Ang University, ²AITRICS, ³DATUMO Published: June 9, 2025 …

paper Faith and Fate: Limits of Transformers on Compositionality 2024-04-16

Transformer language models like GPT-4 and ChatGPT have demonstrated remarkable capabilities across a wide range of tasks, sparking both admiration and concern about their potential impact. However, a recent paper titled "Faith and Fate: Limits of Transformers on Compositionality" by researchers from Allen Institute for AI, University of Washington, University of Southern California and University of Chicago takes a critical look at the limitations of these models in tasks requiring multi-step compositional reasoning. …

paper Reflexion: Language Agents with Verbal Reinforcement Learning 2024-04-13

Reflexion is a novel framework proposed by Shinn et al. for reinforcing language agents through linguistic feedback rather than traditional weight updates. The key idea is to have agents verbally reflect on feedback signals, maintain the reflective text in an episodic memory buffer, and use this to guide better decision making in subsequent trials. …
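
A minimal sketch of that loop is shown below; `call_llm`, `run_task`, and the prompt templates are hypothetical stand-ins for an LLM client and environment feedback, not the paper's code.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def run_task(attempt: str) -> tuple[bool, str]:
    raise NotImplementedError("environment / unit-test feedback goes here")

def reflexion_loop(task: str, max_trials: int = 3) -> str:
    memory: list[str] = []  # episodic buffer of verbal reflections
    attempt = ""
    for _ in range(max_trials):
        context = "\n".join(memory)
        attempt = call_llm(f"Task: {task}\nPast reflections:\n{context}\nAnswer:")
        success, feedback = run_task(attempt)
        if success:
            break
        # Verbal "reinforcement": convert the feedback signal into a reflection
        # and keep it in memory instead of updating any model weights.
        reflection = call_llm(
            f"Task: {task}\nAttempt: {attempt}\nFeedback: {feedback}\n"
            "Reflect on what went wrong and how to fix it next time:")
        memory.append(reflection)
    return attempt
```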

paper Voyager: An Open-Ended Embodied Agent with Large Language Models 2024-04-13

Voyager is the first Large Language Model (LLM) powered embodied lifelong learning agent that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. The agent is designed to operate in the Minecraft environment, a popular open-ended game that offers a rich set of tasks and interactions. …

paper Scaling Laws for Fine-Grained Mixture of Experts 2024-04-06

Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models (LLMs). In "Scaling Laws for Fine-Grained Mixture of Experts", Jakub Krajewski, Jan Ludziejewski, and their colleagues from the University of Warsaw and IDEAS NCBR analyze the scaling properties of MoE models, incorporating an expanded range of variables. …

paper FrugalGPT: Making Large Language Models Affordable and Efficient 2024-04-04

Large Language Models (LLMs) like GPT-4, ChatGPT, and J1-Jumbo have revolutionized natural language processing, enabling unprecedented performance on a wide range of tasks. However, the high cost of querying these LLM APIs is a major barrier to their widespread adoption, especially for high-throughput applications. …

paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System 2024-04-04

Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications. However, no single model can optimally address all tasks, especially when considering the trade-off between performance and cost. This has led to the development of LLM routing systems that leverage the strengths of various models. …
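
To make the performance/cost trade-off concrete, here is a hypothetical cost-aware router of the kind such systems implement; the model names, prices, quality threshold, and quality predictor are invented for illustration and are not part of the benchmark.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative numbers only

def route(query: str, candidates, predict_quality, min_quality=0.8):
    """Return the cheapest candidate expected to answer well enough."""
    ranked = sorted(candidates, key=lambda c: c.cost_per_1k_tokens)
    for c in ranked:
        if predict_quality(query, c.name) >= min_quality:
            return c
    return ranked[-1]  # fall back to the strongest (most expensive) model

models = [Candidate("small-llm", 0.0002), Candidate("mid-llm", 0.002), Candidate("large-llm", 0.03)]
picked = route("Summarize this paragraph.", models,
               lambda q, m: 0.9 if m != "small-llm" else 0.6)
print(picked.name)  # -> mid-llm
```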

paper Toy Models of Superposition 2024-04-03

Neural networks often exhibit a puzzling phenomenon called "polysemanticity" where many unrelated concepts are packed into a single neuron, making interpretability challenging. This paper provides toy models to understand polysemanticity as a result of models storing additional sparse features in "superposition". Key findings include: …
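
A compact sketch of that toy setting is below: n sparse features are compressed into an m-dimensional hidden space (m < n) and reconstructed with a ReLU readout. The hyperparameters and sparsity level are illustrative choices, not the paper's exact configuration.

```python
import torch

n_features, m_dims, sparsity = 20, 5, 0.95
W = torch.nn.Parameter(torch.randn(m_dims, n_features) * 0.1)
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(2000):
    # Each feature is active with probability (1 - sparsity), with a random magnitude.
    x = torch.rand(512, n_features) * (torch.rand(512, n_features) > sparsity)
    x_hat = torch.relu(x @ W.T @ W + b)   # compress to m_dims, then reconstruct
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Features whose columns of W overlap strongly are being stored "in superposition".
overlaps = (W.detach().T @ W.detach()).fill_diagonal_(0).abs()
print(f"max off-diagonal overlap: {overlaps.max():.2f}")
```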

paper Cognitive Architectures for Language Agents 2024-04-01

Large language models (LLMs) have achieved impressive results on many natural language tasks. However, to build truly intelligent agents, we need to equip LLMs with additional capabilities like memory, reasoning, learning, and interacting with the environment. A new paper titled "Cognitive Architectures for Language Agents" proposes a framework called CoALA to guide the development of such language agents. …

paper Retrieval-Augmented Generation for Large Language Models: A Survey 2024-03-31

Retrieval-Augmented Generation (RAG) has emerged as a promising solution to enhance Large Language Models (LLMs) by incorporating knowledge from external databases. This survey paper provides a comprehensive examination of the progression of RAG paradigms, including Naive RAG, Advanced RAG, and Modular RAG. …
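
For orientation, a bare-bones Naive RAG pipeline looks roughly like the sketch below; `embed` and `generate` are hypothetical placeholders for an embedding model and an LLM of your choice.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in an embedding model")

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM")

def naive_rag(query: str, chunks: list[str], k: int = 3) -> str:
    q = embed(query)
    chunk_embs = [embed(c) for c in chunks]
    # Cosine similarity between the query and every chunk, keep the top-k.
    scores = [float(q @ e) / (np.linalg.norm(q) * np.linalg.norm(e)) for e in chunk_embs]
    top = [chunks[i] for i in sorted(range(len(chunks)), key=lambda i: -scores[i])[:k]]
    context = "\n\n".join(top)
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```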

paper LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models 2024-03-26

Large Language Models (LLMs) like ChatGPT have transformed numerous fields by leveraging their extensive reasoning and generalization capabilities. However, as the complexity of prompts increases, with techniques like chain-of-thought (CoT) and in-context learning (ICL) becoming more prevalent, the computational demands skyrocket. This paper introduces LLMLingua, a sophisticated prompt compression method designed to mitigate these challenges. By compressing prompts into a more compact form without significant loss of semantic integrity, LLMLingua enables faster inference and reduced computational costs, promising up to 20x compression rates with minimal performance degradation. …
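
The core intuition can be sketched as perplexity-guided token dropping: a small language model scores every token, and the most predictable ones are removed until a budget is met. The helper below is a heavily simplified, hypothetical illustration, not the released LLMLingua implementation.

```python
def token_surprisals(tokens: list[str]) -> list[float]:
    raise NotImplementedError("score each token with a small causal LM")

def compress_prompt(tokens: list[str], keep_ratio: float = 0.5) -> str:
    surprisal = token_surprisals(tokens)
    k = max(1, int(len(tokens) * keep_ratio))
    # Keep the k most informative (highest-surprisal) tokens, preserving their order.
    keep = set(sorted(range(len(tokens)), key=lambda i: -surprisal[i])[:k])
    return " ".join(t for i, t in enumerate(tokens) if i in keep)
```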

paper Efficient Memory Management for Large Language Model Serving with PagedAttention 2024-03-25

The paper introduces a novel approach to optimize memory usage in serving Large Language Models (LLMs) through a method called PagedAttention, inspired by virtual memory and paging techniques in operating systems. This method addresses the significant memory waste in existing systems due to inefficient handling of key-value (KV) cache memory, which is crucial for the performance of LLMs. …
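
The sketch below is a toy block-table allocator in the same spirit (the class, block size, and sequence ids are invented for illustration; it is not vLLM's implementation): sequences reserve fixed-size KV blocks on demand rather than one large contiguous slab, so reserved-but-unused memory stays bounded.

```python
BLOCK_SIZE = 16  # tokens per KV block (illustrative)

class KVBlockAllocator:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[str, list[int]] = {}   # sequence id -> physical block ids
        self.lengths: dict[str, int] = {}              # tokens written per sequence

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Return (physical_block, offset) where the new token's KV entry goes."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:               # last block is full (or first token)
            table.append(self.free_blocks.pop())   # allocate on demand, not up front
        self.lengths[seq_id] = length + 1
        return table[-1], length % BLOCK_SIZE

    def free(self, seq_id: str) -> None:
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

alloc = KVBlockAllocator(num_blocks=64)
for _ in range(20):          # 20 tokens -> only 2 blocks reserved, no long contiguous slab
    block, offset = alloc.append_token("req-0")
print(len(alloc.block_tables["req-0"]))  # -> 2
```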

paper Evolutionary Optimization of Model Merging Recipes 2024-03-24

The field of large language models (LLMs) has witnessed a paradigm shift with the advent of model merging, a novel approach that combines multiple LLMs into a unified architecture without additional training, offering a cost-effective strategy for new model development. This technique has sparked a surge in experimentation due to its potential to democratize the development of foundational models. However, the reliance on human intuition and domain knowledge in model merging has been a limiting factor, calling for a more systematic method to explore new model combinations. …

paper GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection 2024-03-21

Training Large Language Models (LLMs) presents significant memory challenges predominantly due to the growing size of weights and optimizer states. While common memory-reduction approaches, such as Low-Rank Adaptation (LoRA), have been employed to mitigate these challenges, they typically underperform training with full-rank weights in both pre-training and fine-tuning stages. This limitation arises because these approaches restrict the parameter search to a low-rank subspace, altering training dynamics and potentially requiring a full-rank warm start. …
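
A stripped-down sketch of the gradient low-rank projection idea follows (momentum only, one weight matrix; this is not the authors' optimizer): the gradient is projected onto its top-r singular directions, optimizer state lives in that r-dimensional subspace, and the update is projected back to full rank before being applied.

```python
import torch

def galore_step(weight, grad, state, rank=4, beta=0.9, lr=1e-3, update_proj_every=200):
    # Periodically refresh the projector from the gradient's top-r left singular vectors.
    if state.get("step", 0) % update_proj_every == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                      # (m, r) projection matrix
        state.setdefault("m_lowrank", torch.zeros(rank, grad.shape[1]))
    P = state["P"]
    g_low = P.T @ grad                                # (r, n): low-rank gradient
    state["m_lowrank"] = beta * state["m_lowrank"] + (1 - beta) * g_low  # momentum in low rank
    weight -= lr * (P @ state["m_lowrank"])           # project the update back to full rank
    state["step"] = state.get("step", 0) + 1

W = torch.randn(64, 32)
state = {}
galore_step(W, torch.randn(64, 32), state)
```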

paper OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models 2024-03-20

A team of researchers has released OpenMoE, a series of open-source Mixture-of-Experts (MoE) based large language models ranging from 650M to 34B parameters. Their work provides valuable insights into training MoE models and analyzing their behavior. Here are some key takeaways: …

paper Training Language Model Agents without Modifying Language Models 2024-03-19

Reframing Large Language Models (LLMs) as agents has ushered in a new paradigm of automation. Researchers and practitioners have increasingly been using these models as agents to automate complex tasks using specialized functions. However, integrating useful functions into LLM agents often requires manual effort and extensive iterations, which is time-consuming and inefficient. Inspired by the analogy of humans continuously forging tools to adapt to tasks, this paper introduces a novel approach to train LLM agents by forging their functions, treating them as learnable 'agent parameters', without modifying the LLM weights. This paradigm, termed 'Agent Training', involves updating the agent's functions to maximize task-solving ability, offering a promising avenue for developing specialized LLM agents efficiently. …

paper Characterizing Large Language Models Geometry for Toxicity Detection and Generation 2024-03-18

Abstract: Large Language Models (LLMs) drive significant advancements in AI, yet understanding their internal workings remains a challenge. This paper introduces a novel geometric perspective to characterize LLMs, offering practical insights into their functionality. By analyzing the intrinsic dimension of Multi-Head Attention (MHA) embeddings and the affine mappings within layer feed-forward networks, we unlock new ways to manipulate and interpret LLMs. Our findings enable bypassing restrictions like RLHF in models such as Llama2, and we introduce seven interpretable spline features extracted from any LLM layer. These features, tested on models like Mistral-7B and Llama2, prove highly effective in toxicity detection, domain inference, and addressing the Jigsaw challenge, showcasing the practical utility of our geometric characterization. …

paper MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training 2024-03-17

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons: …

paper Scaling Laws for Forgetting When Fine-Tuning Large Language Models 2024-03-16

When fine-tuning Large Language Models (LLMs) like GPT-3 or BERT for specific tasks, a common challenge encountered is "forgetting" – where the model loses some of its pre-trained capabilities. This phenomenon is particularly noticeable in Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adapters (LoRA). …

paper Simple and Scalable Strategies to Continually Pre-train Large Language Models 2024-03-15

Large language models (LLMs) are cornerstone technologies in AI, driving advancements across various fields. However, the traditional approach of re-training LLMs with every new data set is both costly and computationally inefficient. This paper presents a novel approach, focusing on continual pre-training, which allows for the incremental updating of LLMs without the need for full re-training, significantly saving computational resources. …

paper A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA 2024-03-14

Low-Rank Adapters (LoRA) have emerged as a popular parameter-efficient fine-tuning method for large language models. By adding trainable low-rank "adapters" to selected layers, LoRA enables effective fine-tuning while dramatically reducing the number of parameters that need to be trained. However, the conventional LoRA method uses a scaling factor for the adapters that divides them by the rank. A new paper by researcher Damjan Kalajdzievski shows that this rank-dependent scaling actually slows down learning and limits performance improvements when using higher-rank adapters. …
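
The change itself is tiny. In a hypothetical LoRA layer it amounts to one line, as sketched below; the dimensions, initialization, and class are illustrative rather than taken from the paper.

```python
import math
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16, rank_stabilized=False):
        super().__init__()
        self.base = torch.nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.A = torch.nn.Parameter(torch.randn(rank, d_in) / math.sqrt(d_in))
        self.B = torch.nn.Parameter(torch.zeros(d_out, rank))
        # alpha / sqrt(r) keeps the adapter's contribution stable as the rank grows.
        self.scale = alpha / math.sqrt(rank) if rank_stabilized else alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(512, 512, rank=64, rank_stabilized=True)
print(layer.scale)  # 16 / sqrt(64) = 2.0, versus the conventional 16 / 64 = 0.25
```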

paper Decision Transformer: Reinforcement Learning via Sequence Modeling 2024-03-14

Decision Transformer reframes reinforcement learning (RL) as a sequence modeling problem, allowing the use of powerful transformer architectures and advances from language modeling. …
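
Concretely, each timestep becomes a (return-to-go, state, action) triple and the model is trained to predict actions autoregressively, conditioned on the return it is asked to achieve. The helper below is an illustrative sketch of how such sequences are built from a logged episode; the names and numbers are made up.

```python
import numpy as np

def to_decision_transformer_tokens(states, actions, rewards):
    """Build the (return-to-go, state, action) interleaved sequence from one episode."""
    returns_to_go = np.cumsum(rewards[::-1])[::-1]     # R_t = sum of rewards from t onward
    return [(float(rtg), s, a) for rtg, s, a in zip(returns_to_go, states, actions)]

episode = to_decision_transformer_tokens(
    states=[[0.0], [0.1], [0.2]], actions=[1, 0, 1], rewards=[0.0, 0.0, 1.0])
print(episode[0][0])  # 1.0 -> the return the model is conditioned to achieve from t=0
```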

paper In-Context Learning for Extreme Multi-Label Classification 2024-03-13

Multi-label classification problems with thousands of possible classes are extremely challenging, especially when using in-context learning with large language models (LLMs). Demonstrating every possible class in the prompt is infeasible, and LLMs may lack the knowledge to precisely assign the correct labels. …

paper PinnerFormer: Sequence Modeling for User Representation at Pinterest 2024-03-11

Pinterest has introduced PinnerFormer, a state-of-the-art sequence modeling approach for learning user representations that power personalized recommendations on their platform. PinnerFormer aims to predict users' long-term engagement with Pins based on their recent actions, enabling Pinterest to surface the most relevant and engaging content to over 400 million monthly users. …

paper BitNet: Scaling 1-bit Transformers for Large Language Models 2024-03-09

The exponential growth of large language models poses significant challenges in terms of deployment costs and environmental impact due to high energy consumption. In response to these challenges, this paper introduces BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. By introducing BitLinear as a replacement for the traditional nn.Linear layer, BitNet aims to train with 1-bit weights from scratch, significantly reducing the memory footprint and energy consumption while maintaining competitive performance. …
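
A simplified BitLinear-style layer can be sketched as follows (an illustrative approximation, not the authors' implementation): weights are binarized to ±1 around their mean with a per-tensor scale, and a straight-through estimator lets gradients flow to full-precision shadow weights during training.

```python
import torch

class BitLinearSketch(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02)  # fp shadow weights

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()                          # per-tensor scaling factor
        w_bin = torch.sign(w - w.mean()) * scale        # 1-bit weights (up to the scale)
        # Straight-through estimator: forward uses w_bin, backward treats it as identity on w.
        w_ste = w + (w_bin - w).detach()
        return torch.nn.functional.linear(x, w_ste)

layer = BitLinearSketch(64, 32)
out = layer(torch.randn(8, 64))
out.sum().backward()
print(layer.weight.grad.shape)  # gradients reach the full-precision weights
```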

paper Genie: Generative Interactive Environments 2024-02-28

In the realm of artificial intelligence and machine learning, the quest for creating more immersive and interactive experiences has led to significant advancements. The paper introduces "Genie," a groundbreaking generative model capable of creating interactive environments from unsupervised learning of internet videos. With its 11 billion parameters, Genie represents a new frontier in AI, blending the spatiotemporal dynamics of video with the interactivity of virtual worlds. …

paper AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents 2024-02-26

In the realm of Reinforcement Learning (RL), the paper introduces AMAGO, an innovative in-context RL agent designed to tackle the challenges of generalization, long-term memory, and meta-learning. AMAGO utilizes sequence models, specifically Transformers, to learn from entire rollouts in parallel, marking a significant departure from traditional approaches that often require extensive tuning and face scalability issues. …

paper Self-Discover: Large Language Models Self-Compose Reasoning Structures 2024-02-25

The realm of artificial intelligence has witnessed a significant breakthrough with the introduction of the SELF-DISCOVER framework, a novel approach that empowers Large Language Models (LLMs) to autonomously uncover and employ intrinsic reasoning structures. This advancement is poised to redefine how AI systems tackle complex reasoning challenges, offering a more efficient and interpretable method compared to traditional prompting techniques. …

Browse all 82 Blogs