Benchmark Datasets - a Lumia101 Collection

Updated Dec 9, 2025

Benchmarks for LLMs

openai/gsm8k
Benchmark • Updated Dec 20, 2025 • 17.6k • 489k • 1.14k
Note Lv 2.9

allenai/winogrande
Viewer • Updated Jul 11, 2025 • 81.4k • 185k • 74
Note Lv 3.1

madrylab/gsm8k-platinum
Viewer • Updated Mar 11, 2025 • 1.21k • 2.01k • 45
Note Lv 3.5

maveriq/bigbenchhard
Viewer • Updated Sep 29, 2023 • 6.51k • 1.24k • 38
Note Lv 4.3

openai/openai_humaneval
Viewer • Updated Jan 4, 2024 • 164 • 153k • 364
Note Lv 4.8

google/simpleqa-verified
Viewer • Updated Sep 22, 2025 • 1k • 2.92k • 32
Note Lv 4.9

google-research-datasets/mbpp
Viewer • Updated Jan 4, 2024 • 1.4k • 1.21M • 213
Note Lv 5.1

cais/mmlu
Viewer • Updated Mar 8, 2024 • 231k • 305k • 640
Note Lv 6.0

allenai/ai2_arc
Viewer • Updated Dec 21, 2023 • 7.79k • 274k • 313
Note Lv 6.2

edinburgh-dawg/mmlu-redux-2.0
Viewer • Updated Feb 25, 2025 • 5.7k • 11.3k • 35
Note Lv 6.3

evalplus/humanevalplus
Viewer • Updated May 1, 2024 • 164 • 16.1k • 18
Note Lv 6.3

HuggingFaceH4/MATH
Viewer • Updated Jan 28, 2025 • 13.8k • 413 • 8
Note Lv 6.5

evalplus/mbppplus
Viewer • Updated Apr 17, 2024 • 378 • 11.5k • 15
Note Lv 6.8

google/IFEval
Viewer • Updated Aug 14, 2024 • 541 • 54.8k • 126
Note Lv 7.1

KbsdJames/Omni-MATH
Viewer • Updated Oct 12, 2024 • 4.43k • 2.23k • 123
Note Lv 7.5

TIGER-Lab/MMLU-Pro
Benchmark • Updated 15 days ago • 12.1k • 85.1k • 412
Note Lv 7.9

livecodebench/code_generation
Viewer • Updated Jun 13, 2024 • 121 • 4.19k • 28
Note Lv 8.3

pxferna/ARC-AGI-v1
Viewer • Updated Apr 14, 2025 • 800 • 1 • 1
Note Lv 8.6

princeton-nlp/SWE-bench_Verified
Viewer • Updated Feb 18, 2025 • 500 • 626k • 255
Note Lv 9.0

math-ai/aime24
Viewer • Updated 15 days ago • 30 • 6.65k • 12
Note Lv 9.2

math-ai/aime25
Viewer • Updated 15 days ago • 30 • 31.2k • 24
Note Lv 9.3

MathArena/hmmt_feb_2025
Viewer • Updated May 14, 2025 • 30 • 4.09k • 7
Note Lv 9.5

Idavidrein/gpqa
Benchmark • Updated 12 days ago • 1.25k • 82.9k • 346
Note Lv 9.6

cais/hle
Benchmark • Updated 13 days ago • 2.5k • 21.4k • 676
Note Lv 10.0