Usefulness Judge Collection Finetuned judges to evaluate how useful a response is to a prompt • 3 items • Updated 6 days ago
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning Paper • 2506.01347 • Published Jun 2, 2025 • 3
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques Paper • 2401.02122 • Published Jan 4, 2024 • 2
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging Paper • 2407.01470 • Published Jul 1, 2024 • 7