The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15, 2025 • 32
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting Paper • 2510.08696 • Published Oct 9, 2025 • 15
Rethinking Thinking Tokens: LLMs as Improvement Operators Paper • 2510.01123 • Published Oct 1, 2025 • 6
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published Sep 23, 2025 • 23
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Paper • 2502.05163 • Published Feb 7, 2025 • 22
PILAF: Optimal Human Preference Sampling for Reward Modeling Paper • 2502.04270 • Published Feb 6, 2025 • 12
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 49