QLoRA adapter for Qwen3-4B, trained to playtest the first draft of an RLVR environment conceived by Mira.

The environment focuses on one-shot roleplaying scenarios. They are split evenly between silly and serious, and they cover both narrative and problem-solving tasks.

Training: 100 steps, cosine learning-rate decay, batch size 4, learning rate 1e-5, LoRA rank 16, LoRA alpha 32.
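As a rough illustration of how these training hyperparameters fit together, the sketch below computes the per-step learning rate under cosine decay and the LoRA scaling factor. It uses only the Python standard library and does not model the 4-bit quantization side of QLoRA. The card does not say whether warmup or gradient accumulation was used, or what the final learning rate was, so the sketch assumes no warmup, decay to 0, and no accumulation.

```python
import math

# Hyperparameters stated in this model card.
TOTAL_STEPS = 100
BATCH_SIZE = 4
PEAK_LR = 1e-5
LORA_RANK = 16
LORA_ALPHA = 32

# Assumption (not stated in the card): the schedule decays all the way to 0.
MIN_LR = 0.0


def cosine_lr(step: int) -> float:
    """Learning rate at a given optimizer step, assuming cosine decay with no warmup."""
    progress = step / TOTAL_STEPS  # fraction of training completed, 0.0 to 1.0
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Standard LoRA scales the low-rank update (B @ A) by alpha / r.
LORA_SCALING = LORA_ALPHA / LORA_RANK

# Total training examples processed, assuming no gradient accumulation.
EXAMPLES_SEEN = TOTAL_STEPS * BATCH_SIZE

if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(f"step {step:3d}: lr = {cosine_lr(step):.3e}")
    print("LoRA scaling (alpha / r):", LORA_SCALING)
    print("examples seen:", EXAMPLES_SEEN)
```

Under these assumptions the learning rate falls to half its peak (5e-6) at step 50. Each adapter update is scaled by alpha / r = 2, and the run processes 400 examples in total.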
