Transformer language models like GPT-4 and ChatGPT have demonstrated remarkable capabilities across a wide range of tasks, sparking both admiration and concern about their potential impact. However, a recent paper titled "Faith and Fate: Limits of Transformers on Compositionality" by researchers from the Allen Institute for AI, the University of Washington, the University of Southern California, and the University of Chicago takes a critical look at the limitations of these models in tasks requiring multi-step compositional reasoning. …
Voyager is the first embodied lifelong learning agent powered by a Large Language Model (LLM). It continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. The agent is designed to operate in the Minecraft environment, a popular open-ended game that offers a rich set of tasks and interactions. …
Reflexion is a novel framework proposed by Shinn et al. for reinforcing language agents through linguistic feedback rather than traditional weight updates. The key idea is to have agents verbally reflect on feedback signals, maintain the reflective text in an episodic memory buffer, and use this to guide better decision making in subsequent trials. …
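The trial loop described in the Reflexion excerpt can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `reflexion_loop`, `toy_act`, `toy_evaluate`, and `toy_reflect` are hypothetical, and the toy callables stand in for the LLM calls a real agent would make.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    act: Callable[[List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str], str],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Run up to max_trials attempts, storing a verbal reflection after each failure."""
    memory: List[str] = []  # episodic memory buffer of reflective text
    attempt = ""
    for _ in range(max_trials):
        attempt = act(memory)  # the agent conditions on past reflections
        success, feedback = evaluate(attempt)
        if success:
            break
        # No weight update: the only thing that changes between trials is the memory.
        memory.append(reflect(attempt, feedback))
    return attempt, memory


# Hypothetical stand-ins for the actor, evaluator, and self-reflection model.
def toy_act(memory: List[str]) -> str:
    return f"guess-{len(memory)}"


def toy_evaluate(attempt: str) -> Tuple[bool, str]:
    return attempt == "guess-2", "the answer was wrong"


def toy_reflect(attempt: str, feedback: str) -> str:
    return f"I tried {attempt} and {feedback}; try something different."


if __name__ == "__main__":
    final_attempt, reflections = reflexion_loop(toy_act, toy_evaluate, toy_reflect)
    print(final_attempt)
    print(reflections)
```

In this sketch the agent improves only because each new attempt sees the accumulated reflections, which mirrors the "linguistic feedback rather than weight updates" idea in the excerpt.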
Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models (LLMs). In "Scaling Laws for Fine-Grained Mixture of Experts", Jakub Krajewski, Jan Ludziejewski, and their colleagues from the University of Warsaw and IDEAS NCBR analyze the scaling properties of MoE models, incorporating an expanded range of variables. …
Large Language Models (LLMs) like GPT-4, ChatGPT, and J1-Jumbo have revolutionized natural language processing, enabling unprecedented performance on a wide range of tasks. However, the high cost of querying these LLM APIs is a major barrier to their widespread adoption, especially for high-throughput applications. …
Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications. However, no single model can optimally address all tasks, especially when considering the trade-off between performance and cost. This has led to the development of LLM routing systems that leverage the strengths of various models. …
Neural networks often exhibit a puzzling phenomenon called "polysemanticity" where many unrelated concepts are packed into a single neuron, making interpretability challenging. This paper provides toy models to understand polysemanticity as a result of models storing additional sparse features in "superposition". Key findings include: …
Cognitive Architectures for Language Agents: A Framework for Building Intelligent Language Models. Large language models (LLMs) have achieved impressive results on many natural language tasks. However, to build truly intelligent agents, we need to equip LLMs with additional capabilities like memory, reasoning, learning, and interacting with the environment. A new paper titled "Cognitive Architectures for Language Agents" proposes a framework called CoALA to guide the development of such language agents. …
Retrieval-Augmented Generation (RAG) has emerged as a promising solution to enhance Large Language Models (LLMs) by incorporating knowledge from external databases. This survey paper provides a comprehensive examination of the progression of RAG paradigms, including Naive RAG, Advanced RAG, and Modular RAG. …
Large Language Models (LLMs) like ChatGPT have transformed numerous fields by leveraging their extensive reasoning and generalization capabilities. However, as the complexity of prompts increases, with techniques like chain-of-thought (CoT) and in-context learning (ICL) becoming more prevalent, the computational demands skyrocket. This paper introduces LLMLingua, a sophisticated prompt compression method designed to mitigate these challenges. By compressing prompts into a more compact form without significant loss of semantic integrity, LLMLingua enables faster inference and reduced computational costs, promising up to 20x compression rates with minimal performance degradation. …
The paper introduces a novel approach to optimize memory usage in serving Large Language Models (LLMs) through a method called PagedAttention, inspired by virtual memory and paging techniques in operating systems. This method addresses the significant memory waste in existing systems due to inefficient handling of key-value (KV) cache memory, which is crucial for the performance of LLMs. …
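The paging analogy in the PagedAttention excerpt can be illustrated with a toy block allocator. This is a hedged sketch of the general idea only (fixed-size blocks handed out on demand and tracked in a per-sequence block table); the class `PagedKVCache` and its methods are hypothetical and are not the API of any real serving system.

```python
from typing import Dict, Hashable, List, Tuple


class PagedKVCache:
    """Toy allocator: KV slots live in fixed-size blocks mapped via per-sequence block tables."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[Hashable, List[int]] = {}  # sequence id -> physical block ids
        self.lengths: Dict[Hashable, int] = {}  # sequence id -> tokens stored so far

    def append_token(self, seq_id: Hashable) -> Tuple[int, int]:
        """Reserve a slot for one more token; return (physical_block, offset_in_block)."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:
            # The last block is full (or none exists yet): allocate one new block on demand
            # instead of reserving memory for the maximum possible sequence length up front.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: Hashable) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    for _ in range(3):
        print(cache.append_token("request-a"))
    cache.free("request-a")
```

Because blocks are allocated only when a sequence actually grows into them, at most one partially filled block per sequence is wasted in this toy model.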
The field of large language models (LLMs) has witnessed a paradigm shift with the advent of model merging, a novel approach that combines multiple LLMs into a unified architecture without additional training, offering a cost-effective strategy for new model development. This technique has sparked a surge in experimentation due to its potential to democratize the development of foundational models. However, the reliance on human intuition and domain knowledge in model merging has been a limiting factor, calling for a more systematic method to explore new model combinations. …
Training Large Language Models (LLMs) presents significant memory challenges predominantly due to the growing size of weights and optimizer states. While common memory-reduction approaches, such as Low-Rank Adaptation (LoRA), have been employed to mitigate these challenges, they typically underperform training with full-rank weights in both pre-training and fine-tuning stages. This limitation arises because these approaches restrict the parameter search to a low-rank subspace, altering training dynamics and potentially requiring a full-rank warm start. …
A team of researchers has released OpenMoE, a series of open-source Mixture-of-Experts (MoE) based large language models ranging from 650M to 34B parameters. Their work provides valuable insights into training MoE models and analyzing their behavior. Here are some key takeaways: …
Reframing Large Language Models (LLMs) as agents has ushered in a new paradigm of automation. Researchers and practitioners have increasingly been using these models as agents to automate complex tasks using specialized functions. However, integrating useful functions into LLM agents often requires manual effort and extensive iterations, which is time-consuming and inefficient. Inspired by the analogy of humans continuously forging tools to adapt to tasks, this paper introduces a novel approach to train LLM agents by forging their functions, treating them as learnable 'agent parameters', without modifying the LLM weights. This paradigm, termed 'Agent Training', involves updating the agent's functions to maximize task-solving ability, offering a promising avenue for developing specialized LLM agents efficiently. …
Abstract: Large Language Models (LLMs) drive significant advancements in AI, yet understanding their internal workings remains a challenge. This paper introduces a novel geometric perspective to characterize LLMs, offering practical insights into their functionality. By analyzing the intrinsic dimension of Multi-Head Attention (MHA) embeddings and the affine mappings within layer feed-forward networks, we unlock new ways to manipulate and interpret LLMs. Our findings enable bypassing restrictions like RLHF in models such as Llama2, and we introduce seven interpretable spline features extracted from any LLM layer. These features, tested on models like Mistral-7B and Llama2, prove highly effective in toxicity detection, domain inference, and addressing the Jigsaw challenge, showcasing the practical utility of our geometric characterization. …
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons: …
When fine-tuning Large Language Models (LLMs) like GPT-3 or BERT for specific tasks, a common challenge encountered is "forgetting" – where the model loses some of its pre-trained capabilities. This phenomenon is particularly noticeable in Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adapters (LoRA). …
Large language models (LLMs) are cornerstone technologies in AI, driving advancements across various fields. However, the traditional approach of re-training LLMs with every new data set is both costly and computationally inefficient. This paper presents a novel approach, focusing on continual pre-training, which allows for the incremental updating of LLMs without the need for full re-training, significantly saving computational resources. …
Low-Rank Adapters (LoRA) have emerged as a popular parameter-efficient fine-tuning method for large language models. By adding trainable low-rank "adapters" to selected layers, LoRA enables effective fine-tuning while dramatically reducing the number of parameters that need to be trained. However, the conventional LoRA method uses a scaling factor for the adapters that divides them by the rank. A new paper by researcher Damjan Kalajdzievski shows that this rank-dependent scaling actually slows down learning and limits performance improvements when using higher-rank adapters. …
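To make the scaling issue concrete, here is a small sketch comparing the conventional rank-dependent factor with a rank-stabilized one. It assumes the alternative divides by the square root of the rank, which is our reading of the paper's proposal; the excerpt itself only states that the conventional factor divides by the rank. The function name `lora_scale` and the parameter `alpha` are illustrative.

```python
import math


def lora_scale(alpha: float, rank: int, rank_stabilized: bool = False) -> float:
    """Factor multiplied onto the low-rank adapter update before it is added to the layer output."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    # Conventional LoRA divides by the rank; the assumed rank-stabilized
    # variant divides by the square root of the rank instead.
    denominator = math.sqrt(rank) if rank_stabilized else rank
    return alpha / denominator


if __name__ == "__main__":
    alpha = 16.0
    for rank in (4, 16, 64, 256):
        conventional = lora_scale(alpha, rank)
        stabilized = lora_scale(alpha, rank, rank_stabilized=True)
        print(f"rank={rank:>3}  alpha/r={conventional:.4f}  alpha/sqrt(r)={stabilized:.4f}")
```

With a fixed `alpha`, the conventional factor shrinks linearly as the rank grows, while the square-root variant shrinks more slowly, so higher-rank adapters are damped less.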
The key idea is to reframe RL as a sequence modeling problem, allowing the use of powerful transformer architectures and language modeling advances. …
Multi-label classification problems with thousands of possible classes are extremely challenging, especially when using in-context learning with large language models (LLMs). Demonstrating every possible class in the prompt is infeasible, and LLMs may lack the knowledge to precisely assign the correct labels. …
Pinterest has introduced PinnerFormer, a state-of-the-art sequence modeling approach for learning user representations that power personalized recommendations on their platform. PinnerFormer aims to predict users' long-term engagement with Pins based on their recent actions, enabling Pinterest to surface the most relevant and engaging content to over 400 million monthly users. …
The exponential growth of large language models poses significant challenges in terms of deployment costs and environmental impact due to high energy consumption. In response to these challenges, this paper introduces BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. By introducing BitLinear as a replacement for the traditional nn.Linear layer, BitNet aims to train with 1-bit weights from scratch, significantly reducing the memory footprint and energy consumption while maintaining competitive performance. …
The paper introduces "Genie," a generative model that creates interactive environments, learned without supervision from internet videos. With its 11 billion parameters, Genie blends the spatiotemporal dynamics of video with the interactivity of virtual worlds. …
The paper introduces AMAGO, an in-context Reinforcement Learning (RL) agent designed to tackle the challenges of generalization, long-term memory, and meta-learning. AMAGO uses sequence models, specifically Transformers, to learn from entire rollouts in parallel, a significant departure from traditional approaches that often require extensive tuning and face scalability issues. …
The SELF-DISCOVER framework is a novel approach that enables Large Language Models (LLMs) to autonomously uncover and employ intrinsic reasoning structures. It offers a more efficient and interpretable way to tackle complex reasoning challenges than traditional prompting techniques. …
"Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution" introduces an approach that enhances Large Language Models (LLMs) not through manual tweaks but via an evolutionary mechanism that refines the prompts themselves. …
Google has just introduced Gemma, an innovative family of state-of-the-art open Large Language Models (LLMs), marking a significant stride in the open-source AI landscape. This release, featuring both 7B and 2B parameter models, underscores Google's ongoing commitment to open-source AI. The Hugging Face team is thrilled to support this launch, ensuring seamless integration within our ecosystem. …
The paper "A Decoder-Only Foundation Model for Time-Series Forecasting" introduces a groundbreaking approach in the field of time-series forecasting, leveraging the power of decoder-only models, commonly used in natural language processing, to achieve remarkable zero-shot forecasting capabilities across a variety of domains. …