RL - a AlwaysOU Collection

AlwaysOU 's Collections

RL

RL

updated May 13, 2025

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Paper • 2505.04842 • Published May 7, 2025 • 12
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper • 2505.04588 • Published May 7, 2025 • 65
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30, 2025 • 59
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28, 2025 • 39
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think

Paper • 2504.20708 • Published Apr 29, 2025 • 23
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 139
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85
Iterative Self-Training for Code Generation via Reinforced Re-Ranking

Paper • 2504.09643 • Published Apr 13, 2025 • 34
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published Apr 28, 2025 • 37
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15, 2025 • 63
Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3, 2025 • 57
ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16, 2025 • 48
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Paper • 2504.04718 • Published Apr 7, 2025 • 42
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6, 2025 • 113
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5, 2025 • 79
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

Paper • 2505.02847 • Published May 1, 2025 • 28
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks

Paper • 2505.00234 • Published May 1, 2025 • 26
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21, 2025 • 88