Clément Castellon (Clemspace)

AI & ML interests: Reinforcement learning, Neural Architecture Search, Transformers

Organizations: none listed
bangers?
Remember...
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
  Paper • 2309.17207 • Published
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 28
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 166
Bangers 2025
- ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
  Paper • 2510.22037 • Published • 19
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 500
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 537
- Scaling Language-Centric Omnimodal Representation Learning
  Paper • 2510.11693 • Published • 100