BayJarvis: Blogs on reinforced-self-training

paper Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models - 2024-02-06

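A key challenge in developing large language models has been improving them beyond a certain point, especially without a continuous supply of new human-annotated data. A paper by Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, and Quanquan Gu proposes Self-Play Fine-Tuning (SPIN). In SPIN, the model improves by playing against earlier versions of itself: it learns to distinguish human-annotated responses from responses generated by its own previous iteration. …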

paper Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models - 2023-12-25

Language models (LMs) have made remarkable strides in understanding and generating human language. Yet their potential on problem-solving tasks has been limited by their reliance on human-generated data. The paper "Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models" studies ReST-EM, a simplified, expectation-maximization-based variant of Reinforced Self-Training (ReST). The method repeatedly generates solutions from the model, filters them with binary correctness feedback, and fine-tunes on the filtered samples, reducing the dependence on human-written data. …