---
title: Zen VL Training
emoji: 🧘
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
hardware: a10g-large
---

# 🧘 Zen VL Training Space

Train zen-vl vision-language models with combined ADP+xLAM datasets on HuggingFace Pro GPUs.

## Features

- **Multi-Size Support**: Train 4B, 8B, or 30B parameter models
- **GPU Options**: A10G (24GB), A100-Large (40GB), A100 (80GB)
- **Combined Datasets**: Agent Data Protocol (ADP) + xLAM Function Calling
- **Auto-Upload**: Trained models automatically uploaded to HuggingFace Hub
- **Real-time Monitoring**: Live training logs and progress tracking

## Datasets

### Agent Data Protocol (ADP)

- **Source**: [neulab/agent-data-collection](https://huggingface.co/datasets/neulab/agent-data-collection)
- **Size**: ~220k agent trajectories (8.4GB)
- **Citation**: arXiv:2510.24702

### xLAM Function Calling 60k

- **Source**: [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
- **Size**: 60k function calling examples (101MB)
- **Citation**: Salesforce Research

## Training Configuration

### 4B Model (A10G - 24GB)

- Batch size: 1
- Gradient accumulation: 8
- Max samples: 30,000
- Estimated time: 6-8 hours

### 8B Model (A100-Large - 40GB)

- Batch size: 2
- Gradient accumulation: 8
- Max samples: 50,000
- Estimated time: 10-12 hours

### 30B Model (A100 - 80GB)

- Batch size: 4
- Gradient accumulation: 8
- Max samples: 100,000
- Estimated time: 20-24 hours

## Usage

1. Select model size (4b, 8b, or 30b)
2. Choose GPU type (a10g, a100-large, or a100)
3. Click "Start Training"
4. Monitor progress in real-time
5. Trained model automatically uploads to `zenlm/zen-vl-{size}-agent`

## Requirements

- HuggingFace Pro account (for GPU access)
- HF_TOKEN environment variable set
- Write access to zenlm organization

## Output Models

Trained models will be uploaded to:

- `zenlm/zen-vl-4b-agent`
- `zenlm/zen-vl-8b-agent`
- `zenlm/zen-vl-30b-agent`

## Technical Details

- **Base Architecture**: Qwen3-VL
- **Training Method**: Supervised Fine-Tuning (SFT)
- **Data Mixture**: 80% ADP, 20% xLAM
- **Precision**: bfloat16
- **Framework**: Transformers + Accelerate

## License

Apache 2.0 - See [LICENSE](https://github.com/zenlm/zen-vl/blob/main/LICENSE)

## Citation

```bibtex
@software{zen-vl-2025,
  title={Zen VL: Vision-Language Models with Function Calling},
  author={Zen AI Team},
  year={2025},
  url={https://github.com/zenlm/zen-vl}
}
```

## Links

- **Website**: https://zenlm.org
- **GitHub**: https://github.com/zenlm/zen-vl
- **Models**: https://huggingface.co/zenlm
- **Paper**: Coming soon
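
To make the numbers under Training Configuration and Technical Details easier to reason about, here is a minimal Python sketch that derives the effective batch size, an approximate optimizer step count, and the per-dataset sample split for each model size. It is not taken from `app.py`; the names `TrainingConfig`, `CONFIGS`, `mixture_counts`, and `output_repo` are hypothetical, and the step count assumes a single GPU and one pass over the listed maximum sample count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Per-size settings copied from the Training Configuration section."""
    gpu: str
    batch_size: int
    grad_accum: int
    max_samples: int

    @property
    def effective_batch_size(self) -> int:
        # Samples consumed per optimizer step (single GPU assumed).
        return self.batch_size * self.grad_accum

    @property
    def optimizer_steps(self) -> int:
        # Ceiling division; assumes one pass over max_samples.
        return -(-self.max_samples // self.effective_batch_size)


# Hypothetical lookup table keyed by the size strings used in the Usage steps.
CONFIGS = {
    "4b": TrainingConfig(gpu="a10g", batch_size=1, grad_accum=8, max_samples=30_000),
    "8b": TrainingConfig(gpu="a100-large", batch_size=2, grad_accum=8, max_samples=50_000),
    "30b": TrainingConfig(gpu="a100", batch_size=4, grad_accum=8, max_samples=100_000),
}

ADP_FRACTION = 0.8  # Data Mixture: 80% ADP, 20% xLAM


def mixture_counts(max_samples: int, adp_fraction: float = ADP_FRACTION) -> tuple[int, int]:
    """Split a sample budget into (ADP, xLAM) counts."""
    adp = round(max_samples * adp_fraction)
    return adp, max_samples - adp


def output_repo(size: str) -> str:
    """Hub repository name following the zenlm/zen-vl-{size}-agent pattern."""
    return f"zenlm/zen-vl-{size}-agent"


if __name__ == "__main__":
    for size, cfg in CONFIGS.items():
        adp, xlam = mixture_counts(cfg.max_samples)
        print(
            f"{size}: gpu={cfg.gpu} effective_batch={cfg.effective_batch_size} "
            f"steps={cfg.optimizer_steps} adp={adp} xlam={xlam} -> {output_repo(size)}"
        )
```

Under these assumptions, all three sizes keep gradient accumulation at 8 and scale only the per-device batch size, so the effective batch size grows from 8 to 32 as the model and GPU get larger.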