---
license: mit
datasets:
- allenai/tulu-3-sft-mixture
- Vikhrmodels/GrandMaster-PRO-MAX
language:
- en
- ru
base_model:
- jhu-clsp/mmBERT-base
pipeline_tag: fill-mask
---

This is a discrete text diffusion model, adapted from `jhu-clsp/mmBERT-base`. It was trained on a 760k-sample SFT dataset, 50% of which was drawn from `allenai/tulu-3-sft-mixture` and 50% from `Vikhrmodels/GrandMaster-PRO-MAX`. The resulting performance in English is okay; in Russian, the model tends to fall into repetition loops.

Training was carried out using [my own fork of the dllm repository](https://github.com/chameleon-lizard/dllm). Training command:

```
python -m accelerate.commands.launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
  examples/bert/sft.py \
  --model_name_or_path "jhu-clsp/mmBERT-base" \
  --dataset_args "allenai/tulu-3-sft-mixture|Vikhrmodels/GrandMaster-PRO-MAX" \
  --max_length 1024 \
  --num_train_epochs 10 \
  --per_device_train_batch_size 12 \
  --per_device_eval_batch_size 12 \
  --save_steps 0.1 \
  --output_dir "models/etmmBERT-base/tulu-3-grandmaster/epochs-10-bs-384-len-1024"
```

Setting a higher temperature is recommended to combat the repetition issues. To run the model, use:

```
python examples/bert/chat.py --model_name_or_path path/to/checkpoint --temperature 1.0
```

Generation example:

```
[Question] Who are you? [/Question]

[Answer] As an artificial intelligence, I am not a human. I am a program designed to process and analyze information, data, and algorithms to provide useful responses and support users with queries. My purpose is to assist with information, answer questions, and provide responses based on user input, data, and algorithms. I am designed to process and analyze information, data, and algorithms to provide useful responses and support users with queries. My purpose is to assist with information, answer questions, and provide responses based on user input, data, and algorithms. [/Answer]
```
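
For quick experimentation outside the fork, the checkpoint should also load through plain `transformers` as a masked LM. The sketch below is a minimal, generic confidence-based iterative-unmasking loop, not the fork's actual decoding procedure: the `[Question]`/`[Answer]` prompt format is inferred from the generation example above, and `path/to/checkpoint`, `gen_len`, and `steps` are illustrative placeholders.

```
# Minimal sketch: diffusion-style generation via iterative unmasking
# with a masked-LM head. Assumptions (not verified against the dllm
# fork): the prompt format, the checkpoint path, and the linear
# unmasking schedule.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

path = "path/to/checkpoint"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForMaskedLM.from_pretrained(path).eval()

prompt = "[Question] Who are you? [/Question] [Answer]"
gen_len = 64       # number of masked slots appended after the prompt
steps = 16         # denoising iterations
temperature = 1.0  # higher values help against repetition

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
mask_id = tokenizer.mask_token_id
ids = torch.cat([prompt_ids, torch.full((1, gen_len), mask_id)], dim=-1)

for step in range(steps):
    masked = ids[0] == mask_id
    if not masked.any():
        break
    with torch.no_grad():
        logits = model(input_ids=ids).logits[0]
    probs = torch.softmax(logits / temperature, dim=-1)
    # Sample a candidate token and its probability for every position.
    sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)
    conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    conf[~masked] = -1.0  # only still-masked positions compete
    # Reveal the most confident slots on a simple linear schedule.
    k = max(1, int(masked.sum()) // (steps - step))
    top = conf.topk(k).indices
    ids[0, top] = sampled[top]

print(tokenizer.decode(ids[0, prompt_ids.size(1):], skip_special_tokens=True))
```

For decoding that faithfully matches how the model was trained, prefer `examples/bert/chat.py` from the fork above.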