SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 27
The Smol Training Playbook 📚 — The secrets to building world-class LLMs
Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time Paper • 2502.19230 • Published Feb 26, 2025 • 2
EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States Paper • 2503.03340 • Published Mar 5, 2025 • 1
DARS Collection Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time • 4 items • Updated Oct 22, 2025
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13, 2025 • 51
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance Paper • 2510.03528 • Published Oct 3, 2025 • 17
IntrEx: A Dataset for Modeling Engagement in Educational Conversations Paper • 2509.06652 • Published Sep 8, 2025 • 24
FineWeb: decanting the web for the finest text data at scale 🍷 — Generate high-quality text data for LLMs using FineWeb