Yysrc committed on
Commit bf4e17a · verified · 1 Parent(s): 0fa2c72

Create README.md

Files changed (1)
  1. README.md +32 -0
README.md ADDED
@@ -0,0 +1,32 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: robotics
+ library_name: transformers
+ ---
+
+ # Mantis
+
+ > This is the official checkpoint of **Mantis: A Versatile Vision-Language-Action Model
+ with Disentangled Visual Foresight**
+
+ - **Paper:** https://arxiv.org/pdf/2511.16175
+ - **Code:** https://github.com/zhijie-group/Mantis
+
+ ### 🔥 Highlights
+ - **Disentangled Visual Foresight** augments action learning without overburdening the backbone.
+ - **Progressive Training** preserves the understanding capabilities of the backbone.
+ - **Adaptive Temporal Ensemble** reduces inference cost while maintaining stable control.
+
+ ### How to use
+ This is the Mantis model pretrained on the [SSV2 dataset](https://www.qualcomm.com/developer/software/something-something-v-2-dataset). For detailed usage please refer to [our repository](https://github.com/zhijie-group/Mantis).
+
+ ### 📝 Citation
+ If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/pdf/2511.16175):
+ ```
+ @article{yang2025mantis,
+ title={Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight},
+ author={Yang, Yi and Li, Xueqi and Chen, Yiyang and Song, Jin and Wang, Yihan and Xiao, Zipeng and Su, Jiadi and Qiaoben, You and Liu, Pengfei and Deng, Zhijie},
+ journal={arXiv preprint arXiv:2511.16175},
+ year={2025}
+ }
+ ```
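
The added README opens with a front-matter block (`license`, `pipeline_tag`, `library_name`) that carries the model card metadata. As a rough illustration of how those three fields could be read back out of the file, here is a minimal sketch. It assumes flat `key: value` lines only and is not a general YAML parser; the helper name `parse_front_matter` and the `README_HEAD` constant are hypothetical and do not come from the Mantis repository.

```python
# Hypothetical helper: reads the flat "key: value" front matter shown in the
# diff above. Assumes no nesting, lists, or quoting (not a full YAML parser).

README_HEAD = """---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---

# Mantis
"""


def parse_front_matter(text):
    """Return the front-matter fields as a dict, or {} if the block is absent."""
    lines = text.splitlines()
    # The block must start on the very first line with a '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the block as malformed.
    return {}


metadata = parse_front_matter(README_HEAD)
print(metadata)
```

For anything beyond flat fields like these, a real YAML library would be the safer choice.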