---
tags:
- moe
- minimax
- bfloat16
- sglang
- gguf
license: mit
datasets:
- nick007x/github-code-2025
- tatsu-lab/alpaca
base_model:
- MiniMaxAI/MiniMax-M2
---

# THRIFT: Targeted Reduction for Inference and Fine-Tuning

A performance-optimized variant of MiniMax-M2, developed by VibeStud.io, that delivers faster responses and lower memory usage while preserving quality for everyday tasks.

## TLDR

We, the over-caffeinated researchers at VibeStud.io, set out to build a 50% pruned version of the SOTA MiniMax M2 best suited for local/air-gapped coding. This release lands at ~25% expert pruning (256 -> 192 experts, top_k = 8). A 50% pruned version is under development, while a not-so-sucky team of ours works on a 50% pruned version of Kimi K2 Thinking. Check back later, cheers!
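
For intuition only, here is a minimal sketch of one common expert-pruning recipe: rank experts by the routing mass they receive on calibration data and keep the top ones. It is **not** the THRIFT algorithm (the paper is still "coming soon"), and the function below is a hypothetical illustration, not code from this repo.

```python
import torch

def pick_experts_to_keep(router_logits: torch.Tensor, keep: int = 192) -> torch.Tensor:
    """Toy expert selection. router_logits is [num_tokens, num_experts],
    collected by running calibration data through one MoE layer's router."""
    # Average routing probability ("mass") each expert receives.
    mass = router_logits.softmax(dim=-1).mean(dim=0)   # [num_experts]
    # Keep the `keep` most-used experts; sort indices for stable weight slicing.
    return mass.topk(keep).indices.sort().values

# Fake calibration stats for a 256-expert layer, pruned to 192 as in this model.
logits = torch.randn(10_000, 256)
print(pick_experts_to_keep(logits, keep=192).shape)  # torch.Size([192])
```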

## Why it’s useful

* **Lower latency:** Snappier responses for interactive apps and chatbots.
* **Smaller memory footprint:** Runs on cheaper GPUs or with fewer resources per replica.
* **Higher throughput:** Serve more concurrent users at the same cost.
* **Deployment-friendly:** Drop-in replacement for the base model in most inference stacks (see the loading sketch after this list).
* **Adaptable:** Supports light fine-tuning to match your domain and style guidelines.
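
As a concrete example of the "drop-in replacement" claim, a minimal loading sketch with 🤗 Transformers follows. It assumes the checkpoint follows the standard Hugging Face layout under the `VibeStudio/MiniMax-M2-THRIFT` repo id used in the tables below; `trust_remote_code=True` is included in case the architecture ships custom modeling code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VibeStudio/MiniMax-M2-THRIFT"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Reverse a string in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```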

## Intended use

* General chat and coding assistance
* Enterprise assistants with strict latency/VRAM budgets
* Batch or realtime serving in cloud and on-prem environments (see the SGLang sketch after this list)
* Edge or cost-sensitive deployments where efficiency matters
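
For the serving scenarios above, one option suggested by the `sglang` tag is SGLang's offline engine. This is a minimal sketch assuming a recent SGLang release whose `Engine` API accepts this checkpoint; memory settings and extra flags will vary with your hardware.

```python
import sglang as sgl

# Load the pruned model into SGLang's offline inference engine.
llm = sgl.Engine(model_path="VibeStudio/MiniMax-M2-THRIFT")

prompts = ["Write a bash one-liner that counts unique IPs in an nginx log."]
outputs = llm.generate(prompts, {"temperature": 0.7, "max_new_tokens": 256})
for out in outputs:
    print(out["text"])
```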

## When to use it

* You’re constrained by GPU memory or need shorter response times
* You want to increase QPS without scaling infrastructure
* You need a model that is “good enough” for most tasks at a better cost profile

---

# Model Comparison Report

**Models Under Evaluation**

| Model | Type |
| :---- | :---- |
| ModelCloud/MiniMax-M2-BF16 | Base Model |
| VibeStudio/MiniMax-M2-THRIFT | Compressed/Optimized |

**Evaluation Date: November 7, 2025**

## 📊 Results Comparison

### 1) Multiple-Choice Q&A (lm-eval)

**Overall MMLU Performance**

| Model | MMLU Overall | Humanities | STEM | Social Sciences | Other |
| :---- | ----: | ----: | ----: | ----: | ----: |
| MiniMax-M2-BF16 | 83.16% | 77.45% | 80.91% | 90.02% | 87.29% |
| MiniMax-M2-THRIFT | 77.72% | 70.14% | 77.61% | 86.84% | 80.27% |
| **Δ (Difference)** | **-5.44%** | **-7.31%** | **-3.30%** | **-3.18%** | **-7.02%** |

**Individual Task Performance**

| Task | BF16 (Base) | THRIFT-BF16 | Difference |
| :---- | ----: | ----: | ----: |
| arc_challenge | 73.21% | 61.01% | -12.20% ⬇️ |
| arc_easy | 88.30% | 83.08% | -5.22% ⬇️ |
| boolq | 87.95% | 84.95% | -3.00% ⬇️ |
| hellaswag | 83.00% | 77.09% | -5.91% ⬇️ |
| mmlu | 83.16% | 77.72% | -5.44% ⬇️ |
| openbookqa | 48.60% | 43.00% | -5.60% ⬇️ |
| rte | 75.45% | 80.14% | **+4.69% ⬆️** |
| winogrande | 76.48% | 74.90% | -1.58% ⬇️ |

**Average Accuracy Drop: -4.28%**
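
To reproduce the table above, a sketch using the `lm-evaluation-harness` Python API follows. It assumes lm-eval >= 0.4 and enough GPU memory to host the model in BF16; the exact harness settings used for the original runs were not published.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=VibeStudio/MiniMax-M2-THRIFT,dtype=bfloat16,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "mmlu", "openbookqa", "rte", "winogrande"],
    batch_size="auto",
)
# Accuracy is reported under the "acc,none" key for these tasks in lm-eval 0.4.x.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```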

### 2) Code Generation (EvalPlus)

**MBPP Results**

| Model | MBPP (base) | MBPP+ (extended) |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | 73.8% | 64.0% |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 Coming Soon |

**HumanEval Results**

| Model | HumanEval (base) | HumanEval+ (extended) |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | ✅ Complete | ✅ Complete |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 Coming Soon |
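
The pending THRIFT numbers can be produced with the EvalPlus harness. The sample-generation step is sketched below, assuming a hypothetical `generate(prompt) -> str` helper that wraps the model loaded earlier; scoring then happens via EvalPlus's evaluation tooling on the saved JSONL file.

```python
from evalplus.data import get_mbpp_plus, write_jsonl

def generate(prompt: str) -> str:
    # Hypothetical stand-in: call the model loaded above and return its completion.
    raise NotImplementedError

samples = [
    {"task_id": task_id, "solution": generate(problem["prompt"])}
    for task_id, problem in get_mbpp_plus().items()
]
write_jsonl("mbpp_minimax_m2_thrift.jsonl", samples)
```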

### 3) Math Benchmarks

**GSM8K Results**

| Model | Accuracy | Problems |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | 92.72% | 1,319 |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 1,319 |

**MATH-500 Results**

| Model | Overall | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
| :---- | ----: | ----: | ----: | ----: | ----: | ----: |
| MiniMax-M2-BF16 | 87.2% | 90.7% | 95.56% | 82.86% | 85.16% | 85.82% |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 | 🔄 | 🔄 | 🔄 | 🔄 |

### 4) LiveCodeBench (Live Coding Problems)

| Model | pass@1 | Problems | Status |
| :---- | ----: | ----: | :---- |
| **MiniMax-M2-BF16** | **35.71%** | 182 | ✅ Complete |
| **MiniMax-M2-THRIFT** | 🔄 Coming Soon | 182 | ⏳ Not Started Yet |

---

## 📈 Analysis (Preliminary)

### Key Findings

**Accuracy Drops**

* THRIFT-BF16 shows a **-5.44%** overall MMLU drop
* Largest task-level drop: **arc_challenge (-12.20%)**
* Smallest task-level drop: **winogrande (-1.58%)**
* **RTE improved by +4.69%** 🎉

**Subject-Specific Performance**

* Best preservation: **Social Sciences (-3.18%)**
* Most degraded: **Humanities (-7.31%)**, followed by Other (-7.02%)
* STEM: **moderate drop (-3.30%)**

**Compression Trade-off**

* THRIFT-BF16 (compressed) vs BF16 (base)
* Average accuracy loss: **~4–5%**
* Expected for pruned/compressed models

**MMLU Category Breakdown**

| Category | BF16 (Base) | THRIFT-BF16 | Difference | Status |
| :---- | ----: | ----: | ----: | :---- |
| High School Government | 97.93% | 94.82% | -3.11% | ✅ Still Excellent |
| High School Psychology | 95.41% | 93.58% | -1.83% | ✅ Well Preserved |
| Marketing | 95.73% | 91.88% | -3.85% | ✅ Good |
| Professional Medicine | 92.28% | 79.78% | -12.50% | ⚠️ Notable Drop |
| Clinical Knowledge | 92.83% | 85.66% | -7.17% | ⚠️ Moderate Drop |

---

## Benchmarks

Coming soon.

## Research paper

Coming soon.

---

## License

This model is derived from MiniMax-M2 and is distributed under the MIT License: [https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

---

## Credits

Model conversion and HF Transformers code by @Qubitum at ModelCloud.

Related work we gratefully reference:

* Cerebras — [https://arxiv.org/abs/2510.13999](https://arxiv.org/abs/2510.13999)
* Alibaba Cloud Computing — [https://arxiv.org/html/2511.01354v1](https://arxiv.org/html/2511.01354v1)
* QLoRA — [https://arxiv.org/abs/2305.14314](https://arxiv.org/abs/2305.14314)