-
OpenAI o1 System Card
Paper • 2412.16720 • Published • 36 -
LearnLM: Improving Gemini for Learning
Paper • 2412.16429 • Published • 22 -
NILE: Internal Consistency Alignment in Large Language Models
Paper • 2412.16686 • Published • 8 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38
Sheikh Jubair
sheikhjubair
AI & ML interests
None yet
Organizations
Data-Training and Eval
-
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Paper • 2408.07089 • Published • 14 -
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper • 2409.16191 • Published • 41 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Self-Boosting Large Language Models with Synthetic Preference Data
Paper • 2410.06961 • Published • 16
reasoning-agentic
-
OpenAI o1 System Card
Paper • 2412.16720 • Published • 36 -
LearnLM: Improving Gemini for Learning
Paper • 2412.16429 • Published • 22 -
NILE: Internal Consistency Alignment in Large Language Models
Paper • 2412.16686 • Published • 8 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38
Data-Training and Eval
-
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Paper • 2408.07089 • Published • 14 -
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper • 2409.16191 • Published • 41 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Self-Boosting Large Language Models with Synthetic Preference Data
Paper • 2410.06961 • Published • 16
models
0
None public yet
datasets
0
None public yet