BayJarvis: Blogs on mcts

[paper] Branching Beyond PPO: How MCTS Sprouts Superior Text Generation (2023-11-05)

We've all been there: diligently using Proximal Policy Optimization (PPO) for text generation, only to wonder whether our models have more to give. If you've been in this boat, you're in for a treat! A recent paper under review for ICLR 2024 offers some intriguing insights into how Monte Carlo Tree Search (MCTS) can push text generation beyond what PPO alone delivers. …