Contains the smoltalk dataset in multiple minority languages. The dataset is useful for post-training a base model.