Language models (LMs) have been making remarkable strides in understanding and generating human language. Yet, their true potential in problem-solving tasks has been somewhat limited by the reliance on human-generated data. The groundbreaking paper, "Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models", introduces a novel method named Reinforced Self-Training (ReST) that promises to change this landscape. …