QLoRA adapter for Qwen3-4B, trained to playtest the first draft of an RLVR environment conceived by Mira.
The environment focuses on one-shot roleplaying scenarios, split evenly between silly and serious, and covers both narrative and problem-solving tasks.
Training: 100 steps, cosine learning-rate decay, batch size 4, learning rate 1e-5, LoRA rank 16, alpha 32.
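
To make these hyperparameters concrete, here is a minimal sketch of what they imply. It is not the training script used for this adapter. It assumes the cosine schedule has no warmup and decays to zero (the card specifies neither), and that the adapter uses the conventional LoRA scaling of alpha divided by rank. The function names `cosine_lr` and `lora_scale` are hypothetical.

```python
import math

# Values taken from the card.
TOTAL_STEPS = 100
PEAK_LR = 1e-5
LORA_RANK = 16
LORA_ALPHA = 32


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, min_lr=0.0):
    """Learning rate at `step` under plain cosine decay.

    Assumption: no warmup, and decay ends at `min_lr` (0.0 by default).
    """
    # Clamp so out-of-range steps do not leave the [0, 1] progress interval.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lora_scale(alpha=LORA_ALPHA, rank=LORA_RANK):
    """Multiplier applied to the low-rank update under standard LoRA scaling."""
    return alpha / rank


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(f"step {step:3d}: lr = {cosine_lr(step):.3e}")
    print(f"LoRA scale (alpha / rank) = {lora_scale()}")
```

Under these assumptions the learning rate starts at 1e-5, reaches half that value at step 50, and ends at zero, while the low-rank update is scaled by a factor of 2.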