BayJarvis: Blogs on ddpo

paper Diffusion Models for Reinforcement Learning: A Survey - 2023-12-13

In the ever-evolving landscape of machine learning, diffusion models have marked their territory as a groundbreaking class of generative models. The paper "Diffusion Models for Reinforcement Learning: A Survey" delves into how these models are revolutionizing reinforcement learning (RL). This blog aims to unpack the crux of the paper, highlighting how diffusion models are addressing long-standing challenges in RL and paving the way for future innovations. …

llm Harnessing Zephyr's Breeze: DPO Training on Mistral-7B-GPTQ for Language Model Alignment - 2023-11-09

We've taken on the exciting challenge of implementing the cutting-edge strategies presented in "ZEPHYR: Direct Distillation of LM Alignment". This paper's approach is not just theoretical—it's a blueprint for a significant leap in language model training. By adopting ZEPHYR's distilled direct preference optimization (dDPO), we've embarked on a code journey that brings these innovations from concept to reality. …