BayJarvis: Blogs on galore

paper GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection - 2024-03-21

Training Large Language Models (LLMs) presents significant memory challenges predominantly due to the growing size of weights and optimizer states. While common memory-reduction approaches, such as Low-Rank Adaptation (LoRA), have been employed to mitigate these challenges, they typically underperform training with full-rank weights in both pre-training and fine-tuning stages. This limitation arises because these approaches restrict the parameter search to a low-rank subspace, altering training dynamics and potentially requiring a full-rank warm start. …