Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26, 2025 • 29
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward Paper • 2509.07430 • Published Sep 9, 2025 • 3
AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification Paper • 2502.11520 • Published Feb 17, 2025
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs Paper • 2502.10454 • Published Feb 12, 2025 • 7
SCP-116K: A High-Quality Problem-Solution Dataset and a Generalized Pipeline for Automated Extraction in the Higher Education Science Domain Paper • 2501.15587 • Published Jan 26, 2025
Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought Paper • 2407.14562 • Published Jul 18, 2024 • 1
RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning Paper • 2509.25958 • Published Sep 30, 2025
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Paper • 2512.22322 • Published 12 days ago • 38
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published 8 days ago • 117
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 8 days ago • 100
INF-Retriever-v1 Collection LLM-based dense retrieval models for EN & ZH (also effective in other languages) • 3 items • Updated 21 days ago • 4