Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Paper • 2505.04842 • Published • 12
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper • 2505.04588 • Published • 65
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Paper • 2504.21776 • Published • 59
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Paper • 2505.01441 • Published • 39
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Paper • 2504.20708 • Published • 23
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 139
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper • 2504.09643 • Published • 34
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
Paper • 2504.20157 • Published • 37
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
Inference-Time Scaling for Generalist Reward Modeling
Paper • 2504.02495 • Published • 57
ToolRL: Reward is All Tool Learning Needs
Paper • 2504.13958 • Published • 48
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Paper • 2504.04718 • Published • 42
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
RM-R1: Reward Modeling as Reasoning
Paper • 2505.02387 • Published • 79
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Paper • 2505.02847 • Published • 28
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Paper • 2505.00234 • Published • 26
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88