---
title: Petite LLM 3
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: SmolLM3 for French Understanding
---
# Petite Elle L'Aime 3 - Chat Interface

A complete Gradio application for the [Petite Elle L'Aime 3](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) model, featuring the full fine-tuned version for maximum performance and quality.
## Features

- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads the full model at build time
## Model Information

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k
- **Precision**: Full fine-tuned model (float16 on GPU, float32 on CPU)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic
## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd Petite-LLM-3
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
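The exact pinned versions live in `requirements.txt`; a plausible, unpinned dependency set for an app like this (assumed here, not copied from the repo) would be:

```text
# Illustrative only - install from the repo's requirements.txt for exact pins
gradio
transformers
torch
accelerate
huggingface_hub
```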
## Usage

### Local Development

Run the application locally:

```bash
python app.py
```
The application will be available at `http://localhost:7860`.
### Hugging Face Spaces

This application is configured for deployment on Hugging Face Spaces with automatic model download:

1. **Build Process**: The `build.py` script automatically downloads the full model during the Space build
2. **Model Loading**: Uses local model files when available, falling back to a Hugging Face download
3. **Caching**: Model files are cached for faster subsequent runs
## Interface Features

### Layout Structure

The interface follows the reference layout with:

- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display
### System Prompt

- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant." ("You are TonicIA, a rigorous and benevolent French-speaking assistant.")
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations, as sketched below
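Conceptually, the system prompt is prepended as the first message of every generation request, which is why edits take effect on the next turn. A minimal sketch, with illustrative names rather than the app's actual code:

```python
# Illustrative sketch: the system prompt heads the message list on every turn,
# so editing it in the UI changes behavior from the next message onward.
DEFAULT_SYSTEM_PROMPT = "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."

def build_messages(system_prompt, history, user_input):
    """Assemble the chat-template message list for one generation call.

    `history` is a list of {"role": ..., "content": ...} dicts from prior turns.
    """
    messages = [{"role": "system", "content": system_prompt or DEFAULT_SYSTEM_PROMPT}]
    messages.extend(history)                                   # prior conversation turns
    messages.append({"role": "user", "content": user_input})   # the new user message
    return messages
```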
### Generation Parameters

- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Enable reasoning mode with thinking tags
- **Advanced Settings**: Collapsible panel for fine-tuning
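These controls map onto the standard `transformers` generation arguments; a hedged sketch of how the slider values might be wired into `model.generate` (the parameter names are the standard Hugging Face ones, the wiring is illustrative):

```python
import torch

def generate_reply(model, tokenizer, prompt_text, max_length=512, temperature=0.7, top_p=0.9):
    """Sample a completion using the three parameters exposed in the UI."""
    inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_length,  # "Max Length" slider (64-2048)
            temperature=temperature,    # "Temperature" slider (0.01-1.0)
            top_p=top_p,                # "Top-p" slider (0.1-1.0)
            do_sample=True,             # sampling must be on for temperature/top_p to apply
        )
    # Decode only the newly generated tokens, dropping the echoed prompt
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```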
## Technical Details

### Model Loading Strategy

The application uses a smart loading strategy:

1. **Local Check**: First checks whether the full model files exist locally
2. **Local Loading**: If available, loads from the `./model` folder
3. **Fallback Download**: If not available, downloads from Hugging Face
4. **Tokenizer**: Always uses the main repo for the chat template and configuration
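A minimal sketch of that local-first strategy, assuming the `./model` layout the build step produces (illustrative, not the app's exact code):

```python
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_REPO = "Tonic/petite-elle-L-aime-3-sft"
LOCAL_DIR = "./model"

def load_model_and_tokenizer():
    """Prefer the locally downloaded weights; fall back to the Hub."""
    # Steps 1-3: a config.json in ./model signals a completed build-time download
    has_local = os.path.exists(os.path.join(LOCAL_DIR, "config.json"))
    model = AutoModelForCausalLM.from_pretrained(LOCAL_DIR if has_local else MODEL_REPO)
    # Step 4: the tokenizer always comes from the main repo so the chat
    # template and configuration stay current
    tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO)
    return model, tokenizer
```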
### Build Process

For Hugging Face Spaces deployment:

1. **Build Script**: `build.py` runs during the Space build
2. **Model Download**: `download_model.py` downloads the full model files
3. **Local Storage**: Model files are stored in the `./model` directory
4. **Fast Loading**: Subsequent runs use the local files
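The download step amounts to a snapshot fetch into `./model`; a hedged sketch using `huggingface_hub.snapshot_download` (the actual `download_model.py` may differ):

```python
from huggingface_hub import snapshot_download

# Pull the full model snapshot into ./model during the Space build so that
# app startup loads weights from disk instead of re-downloading them.
snapshot_download(
    repo_id="Tonic/petite-elle-L-aime-3-sft",
    local_dir="./model",
)
```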
### Chat Template Integration

The application uses the model's custom chat template, which supports:

- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags
- Proper conversation flow management
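In `transformers`, this comes down to `tokenizer.apply_chat_template`. A sketch, assuming the template honors the `enable_thinking` flag (the SmolLM3 base template does; treat the kwarg as an assumption for this model):

```python
# Sketch: render the conversation through the model's bundled Jinja chat template.
# enable_thinking is assumed to be honored by this template; it toggles the
# <think> reasoning blocks on or off.
prompt_text = tokenizer.apply_chat_template(
    messages,                    # list of {"role": ..., "content": ...} dicts
    tokenize=False,              # return the formatted string rather than token ids
    add_generation_prompt=True,  # append the assistant header so the model answers next
    enable_thinking=True,
)
```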
### Memory Optimization

- Uses the full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance
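Device and precision selection might look like this sketch (float16 on CUDA, full float32 on CPU; illustrative, not the app's exact code):

```python
import torch
from transformers import AutoModelForCausalLM

# Detect the device and pick a matching precision: half precision on GPU for
# speed and memory savings, full float32 on CPU where float16 is slow.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "Tonic/petite-elle-L-aime-3-sft",
    torch_dtype=dtype,
).to(device)
```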
## Example Usage

1. **Basic Conversation**:
   - Add context in the system prompt area
   - Type your message in the user input box
   - Click the generate button to start chatting

2. **Customizing the System Prompt**:
   - Edit the context in the dedicated text area
   - Changes apply to new messages immediately
   - Example: "Tu es un expert en programmation Python." ("You are a Python programming expert.")

3. **Advanced Settings**:
   - Check the "Advanced Settings" checkbox
   - Adjust generation parameters as needed
   - Enable or disable thinking mode

4. **Real-time Chat**:
   - Messages appear in the chat interface
   - Conversation history is maintained
   - Responses are generated using the model's chat template, as sketched below
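Tied together in Gradio, the chat loop might resemble this sketch, reusing the `build_messages` and `generate_reply` helpers and the `model`/`tokenizer` globals sketched earlier (hypothetical wiring; the real `app.py` adds the panels and sliders described above):

```python
import gradio as gr

def respond(message, history, system_prompt):
    """One chat turn: rebuild the message list, format it, and generate a reply."""
    messages = build_messages(system_prompt, history, message)   # helper sketched earlier
    prompt_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate_reply(model, tokenizer, prompt_text)         # helper sketched earlier

demo = gr.ChatInterface(
    fn=respond,
    type="messages",  # history arrives as role/content dicts, matching build_messages
    additional_inputs=[gr.Textbox(label="System Prompt", value=DEFAULT_SYSTEM_PROMPT)],
    title="Petite Elle L'Aime 3",
)
demo.launch()  # serves on http://localhost:7860 by default
```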
## Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM (8GB+ recommended)
   - Check your internet connection for the model download
   - Verify that all dependencies are installed

2. **Generation Errors**:
   - Try reducing the "Max Length" parameter
   - Adjust the temperature and top-p values
   - Check the console for detailed error messages

3. **Performance Issues**:
   - The full model provides maximum quality but requires more memory
   - GPU acceleration is recommended for optimal performance
   - Consider reducing the generation parameters if memory is limited

4. **System Prompt Issues**:
   - Ensure the system prompt is not too long (max 1000 characters)
   - Check that the prompt follows the expected format
5. **Build Process Issues**:
   - Check that `download_model.py` runs successfully
   - Verify that the model files are downloaded to the `./model` directory
   - Ensure sufficient storage space for the model files
## License

This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.
## Acknowledgments

- **Model**: [Tonic/petite-elle-L-aime-3-sft](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- **Base Model**: SmolLM3-3B by HuggingFaceTB
- **Training Data**: legmlai/openhermes-fr
- **Frameworks**: Gradio, Transformers, PyTorch
- **Layout Reference**: [Tonic/Nvidia-OpenReasoning](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)
## Links

- [Model on Hugging Face](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- [Chat Template](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft/blob/main/chat_template.jinja)
- [Original App Reference](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)
---
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |