Reasoning Papers - a wangbing1416 Collection

wangbing1416 's Collections

Agent

RLHF

Reasoning Papers

Reasoning Papers

updated about 3 hours ago

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11, 2025 • 43
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published Aug 9, 2025 • 14
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published Aug 5, 2025 • 8
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published Aug 12, 2025 • 27
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Paper • 2508.09726 • Published Aug 13, 2025 • 15
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14, 2025 • 28
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

Paper • 2508.11252 • Published Aug 15, 2025 • 3
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 90
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published Aug 19, 2025 • 118
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Paper • 2508.15868 • Published Aug 21, 2025 • 3
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published Aug 23, 2025 • 23
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26, 2025 • 16
StepWiser: Stepwise Generative Judges for Wiser Reasoning

Paper • 2508.19229 • Published Aug 26, 2025 • 20
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1, 2025 • 58
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Paper • 2509.02522 • Published Sep 2, 2025 • 25
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3, 2025 • 24
Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7, 2025 • 150
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 101
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

Paper • 2509.06923 • Published Sep 8, 2025 • 22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning

Paper • 2509.03646 • Published Sep 3, 2025 • 32
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190
The Majority is not always right: RL training for solution aggregation

Paper • 2509.06870 • Published Sep 8, 2025 • 16
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

Paper • 2509.07430 • Published Sep 9, 2025 • 3
Reasoning-Aware GRPO using Process Mining

Paper • 2510.25065 • Published Oct 29, 2025 • 42
Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 221
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

Paper • 2510.22543 • Published Oct 26, 2025 • 11
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 45
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

Paper • 2510.24940 • Published Oct 28, 2025 • 17
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

Paper • 2510.24794 • Published Oct 27, 2025 • 31
Data-Efficient RLVR via Off-Policy Influence Guidance

Paper • 2510.26491 • Published Oct 30, 2025 • 10
Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published Nov 13, 2025 • 49
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Paper • 2511.08577 • Published Nov 11, 2025 • 105
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 85
REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

Paper • 2511.20233 • Published Nov 25, 2025 • 2
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Paper • 2512.05033 • Published about 1 month ago • 15
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning

Paper • 2512.05325 • Published about 1 month ago • 2
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision

Paper • 2512.15489 • Published 18 days ago • 6
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published 5 days ago • 12