---
tags:
- moe
- minimax
- bfloat16
- sglang
- gguf
license: mit
datasets:
- nick007x/github-code-2025
- tatsu-lab/alpaca
base_model:
- MiniMaxAI/MiniMax-M2
---

# THRIFT: Targeted Reduction for Inference and Fine-Tuning

A performance-optimized variant of MiniMax-M2, developed by VibeStud.io, that delivers faster responses and lower memory usage while preserving quality for everyday tasks.

## TLDR

We, the over-caffeinated researchers at VibeStud.io, set out to build a 50% pruned version of the SOTA MiniMax M2 best suited for local/air-gapped coding. This release lands at ~25% expert pruning (256 -> 192 experts, top_k = 8). A 50% pruned version is under development, while a not-so-sucky team of ours works on a 50% pruned version of Kimi K2 Thinking. Check back later, cheers!
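
For intuition only, here is a minimal sketch of one common expert-pruning recipe: rank experts by the routing mass they receive on calibration data and keep the top ones. It is **not** the THRIFT algorithm (the paper is still "coming soon"), and the function below is a hypothetical illustration, not code from this repo.

```python
import torch

def pick_experts_to_keep(router_logits: torch.Tensor, keep: int = 192) -> torch.Tensor:
    """Toy expert selection. router_logits is [num_tokens, num_experts],
    collected by running calibration data through one MoE layer's router."""
    # Average routing probability ("mass") each expert receives.
    mass = router_logits.softmax(dim=-1).mean(dim=0)   # [num_experts]
    # Keep the `keep` most-used experts; sort indices for stable weight slicing.
    return mass.topk(keep).indices.sort().values

# Fake calibration stats for a 256-expert layer, pruned to 192 as in this model.
logits = torch.randn(10_000, 256)
print(pick_experts_to_keep(logits, keep=192).shape)  # torch.Size([192])
```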

## Why it’s useful

* **Lower latency:** Snappier responses for interactive apps and chatbots.
* **Smaller memory footprint:** Runs on cheaper GPUs or with fewer resources per replica.
* **Higher throughput:** Serve more concurrent users at the same cost.
* **Deployment-friendly:** Drop-in replacement for the base model in most inference stacks (see the loading sketch after this list).
* **Adaptable:** Supports light fine-tuning to match your domain and style guidelines.
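
As a concrete example of the "drop-in replacement" claim, a minimal loading sketch with 🤗 Transformers follows. It assumes the checkpoint follows the standard Hugging Face layout under the `VibeStudio/MiniMax-M2-THRIFT` repo id used in the tables below; `trust_remote_code=True` is included in case the architecture ships custom modeling code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VibeStudio/MiniMax-M2-THRIFT"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Reverse a string in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```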

## Intended use

* General chat and coding assistance
* Enterprise assistants with strict latency/VRAM budgets
* Batch or realtime serving in cloud and on-prem environments (see the SGLang sketch after this list)
* Edge or cost-sensitive deployments where efficiency matters
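
For the serving scenarios above, one option suggested by the `sglang` tag is SGLang's offline engine. This is a minimal sketch assuming a recent SGLang release whose `Engine` API accepts this checkpoint; memory settings and extra flags will vary with your hardware.

```python
import sglang as sgl

# Load the pruned model into SGLang's offline inference engine.
llm = sgl.Engine(model_path="VibeStudio/MiniMax-M2-THRIFT")

prompts = ["Write a bash one-liner that counts unique IPs in an nginx log."]
outputs = llm.generate(prompts, {"temperature": 0.7, "max_new_tokens": 256})
for out in outputs:
    print(out["text"])
```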

## When to use it

* You’re constrained by GPU memory or need shorter response times
* You want to increase QPS without scaling infrastructure
* You need a model that is “good enough” for most tasks at a better cost profile

---

# Model Comparison Report

**Models Under Evaluation**

| Model | Type |
| :---- | :---- |
| ModelCloud/MiniMax-M2-BF16 | Base Model |
| VibeStudio/MiniMax-M2-THRIFT | Compressed/Optimized |

**Evaluation Date: November 7, 2025**

## 📊 Results Comparison

### 1) Multiple-Choice Q&A (lm-eval)

**Overall MMLU Performance**

| Model | MMLU Overall | Humanities | STEM | Social Sciences | Other |
| :---- | ----: | ----: | ----: | ----: | ----: |
| MiniMax-M2-BF16 | 83.16% | 77.45% | 80.91% | 90.02% | 87.29% |
| MiniMax-M2-THRIFT | 77.72% | 70.14% | 77.61% | 86.84% | 80.27% |
| **Δ (Difference)** | **-5.44%** | **-7.31%** | **-3.30%** | **-3.18%** | **-7.02%** |

**Individual Task Performance**

| Task | BF16 (Base) | THRIFT-BF16 | Difference |
| :---- | ----: | ----: | ----: |
| arc_challenge | 73.21% | 61.01% | -12.20% ⬇️ |
| arc_easy | 88.30% | 83.08% | -5.22% ⬇️ |
| boolq | 87.95% | 84.95% | -3.00% ⬇️ |
| hellaswag | 83.00% | 77.09% | -5.91% ⬇️ |
| mmlu | 83.16% | 77.72% | -5.44% ⬇️ |
| openbookqa | 48.60% | 43.00% | -5.60% ⬇️ |
| rte | 75.45% | 80.14% | **+4.69% ⬆️** |
| winogrande | 76.48% | 74.90% | -1.58% ⬇️ |

**Average Accuracy Drop: -4.28%**
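
To reproduce the table above, a sketch using the `lm-evaluation-harness` Python API follows. It assumes lm-eval >= 0.4 and enough GPU memory to host the model in BF16; the exact harness settings used for the original runs were not published.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=VibeStudio/MiniMax-M2-THRIFT,dtype=bfloat16,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "mmlu", "openbookqa", "rte", "winogrande"],
    batch_size="auto",
)
# Accuracy is reported under the "acc,none" key for these tasks in lm-eval 0.4.x.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```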

### 2) Code Generation (EvalPlus)

**MBPP Results**

| Model | MBPP (base) | MBPP+ (extended) |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | 73.8% | 64.0% |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 Coming Soon |

**HumanEval Results**

| Model | HumanEval (base) | HumanEval+ (extended) |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | ✅ Complete | ✅ Complete |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 Coming Soon |
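
The pending THRIFT numbers can be produced with the EvalPlus harness. The sample-generation step is sketched below, assuming a hypothetical `generate(prompt) -> str` helper that wraps the model loaded earlier; scoring then happens via EvalPlus's evaluation tooling on the saved JSONL file.

```python
from evalplus.data import get_mbpp_plus, write_jsonl

def generate(prompt: str) -> str:
    # Hypothetical stand-in: call the model loaded above and return its completion.
    raise NotImplementedError

samples = [
    {"task_id": task_id, "solution": generate(problem["prompt"])}
    for task_id, problem in get_mbpp_plus().items()
]
write_jsonl("mbpp_minimax_m2_thrift.jsonl", samples)
```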

### 3) Math Benchmarks

**GSM8K Results**

| Model | Accuracy | Problems |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | 92.72% | 1,319 |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 1,319 |

**MATH-500 Results**

| Model | Overall | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
| :---- | ----: | ----: | ----: | ----: | ----: | ----: |
| MiniMax-M2-BF16 | 87.2% | 90.7% | 95.56% | 82.86% | 85.16% | 85.82% |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 | 🔄 | 🔄 | 🔄 | 🔄 |

### 4) LiveCodeBench (Live Coding Problems)

| Model | pass@1 | Problems | Status |
| :---- | ----: | ----: | :---- |
| **MiniMax-M2-BF16** | **35.71%** | 182 | ✅ Complete |
| **MiniMax-M2-THRIFT** | 🔄 Coming Soon | 182 | ⏳ Not Started Yet |

---

## 📈 Analysis (Preliminary)

### Key Findings

**Accuracy Drops**

* THRIFT-BF16 shows a **-5.44%** overall MMLU drop
* Largest task-level drop: **arc_challenge (-12.20%)**
* Smallest task-level drop: **winogrande (-1.58%)**
* **RTE improved by +4.69%** 🎉

**Subject-Specific Performance**

* Best preservation: **Social Sciences (-3.18%)**
* Most degraded: **Humanities (-7.31%)**, followed by Other (-7.02%)
* STEM: **moderate drop (-3.30%)**

**Compression Trade-off**

* THRIFT-BF16 (compressed) vs BF16 (base)
* Average accuracy loss: **~4–5%**
* Expected for pruned/compressed models

**MMLU Category Breakdown**

| Category | BF16 (Base) | THRIFT-BF16 | Difference | Status |
| :---- | ----: | ----: | ----: | :---- |
| High School Government | 97.93% | 94.82% | -3.11% | ✅ Still Excellent |
| High School Psychology | 95.41% | 93.58% | -1.83% | ✅ Well Preserved |
| Marketing | 95.73% | 91.88% | -3.85% | ✅ Good |
| Professional Medicine | 92.28% | 79.78% | -12.50% | ⚠️ Notable Drop |
| Clinical Knowledge | 92.83% | 85.66% | -7.17% | ⚠️ Moderate Drop |

---

## Benchmarks

Coming soon.

## Research paper

Coming soon.

---

## License

This model is derived from MiniMax-M2 and is distributed under the MIT License: [https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

---

## Credits

Model conversion and HF Transformers code by @Qubitum at ModelCloud.

Related work we gratefully reference:

* Cerebras — [https://arxiv.org/abs/2510.13999](https://arxiv.org/abs/2510.13999)
* Alibaba Cloud Computing — [https://arxiv.org/html/2511.01354v1](https://arxiv.org/html/2511.01354v1)
* QLoRA — [https://arxiv.org/abs/2305.14314](https://arxiv.org/abs/2305.14314)