Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted to the main conference of EMNLP 2025.
AI & ML interests
At the University of Helsinki, we focus on:
- NLP for morphologically-rich languages
- Cross-lingual NLP
- NLP in the humanities
Organization Card
Helsinki-NLP is the language technology research group at the University of Helsinki. Here, we publish various resources related to multilingual NLP, machine translation, and text simplification, to name a few application areas. We focus on wide language coverage, open datasets, and public pre-trained models.
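As a rough illustration of how one of the public pre-trained models might be used, the sketch below loads a translation model through the Hugging Face `transformers` pipeline API. It assumes `transformers`, a PyTorch backend, and `sentencepiece` are installed, and that the model can be downloaded from the Hub. The helper name `translate` and the constant `DEFAULT_MODEL_ID` are hypothetical names chosen for this example; the model ID is one of those listed on this page.

```python
# Minimal sketch, not an official usage guide.
# Assumption: transformers, torch, and sentencepiece are installed.

DEFAULT_MODEL_ID = "Helsinki-NLP/opus-mt-fr-en"  # French-to-English model from the list on this page


def translate(texts, model_id=DEFAULT_MODEL_ID):
    """Translate a list of strings with the given translation model.

    The model is downloaded from the Hugging Face Hub on first use.
    """
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    translator = pipeline("translation", model=model_id)
    # For a list input, the pipeline returns one dict per input text,
    # each holding the result under the "translation_text" key.
    return [result["translation_text"] for result in translator(texts)]


# Example call (requires network access to fetch the model):
# translate(["Bonjour le monde."])
```

Building the pipeline inside the helper keeps the example short; for repeated calls it would likely be better to create the pipeline once and reuse it.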
Multilingual translation models trained on the Tatoeba Translation Challenge dataset (from OPUS) and a massively multilingual Bible corpus:
- Helsinki-NLP/opus-mt-tc-bible-big-aav-fra_ita_por_spa: Translation • 0.2B • 14 • 2
- Helsinki-NLP/opus-mt-tc-bible-big-afa-en: Translation • 0.2B • 34 • 1
- Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_nld: Translation • 0.2B • 14 • 2
- Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_fra_por_spa: Translation • 0.2B • 22
Models: 1,536
- Helsinki-NLP/opus-mt-eo-caenes: Translation • 76.9M
- Helsinki-NLP/opus-mt-caenes-eo: Translation • 76.9M
- Helsinki-NLP/opus-mt-fr-en: Translation • 75.2M • 1.17M • 49
- Helsinki-NLP/opus-mt-synthetic-en-eu: 43 • 1
- Helsinki-NLP/opus-mt-synthetic-en-mk: 31
- Helsinki-NLP/opus-mt-synthetic-en-ka: 41
- Helsinki-NLP/opus-mt-synthetic-en-so: 118 • 1
- Helsinki-NLP/opus-mt-synthetic-en-is: 24 • 1
- Helsinki-NLP/opus-mt-synthetic-en-uk: 29
- Helsinki-NLP/opus-mt-synthetic-en-gd: 26
Datasets: 51
- Helsinki-NLP/nemotron-cc-translated: 5.79B • 13.5k • 1
- Helsinki-NLP/fineweb-edu-translated: 95.4k • 4
- Helsinki-NLP/OpenSubtitles2024: 570M • 1.93k • 2
- Helsinki-NLP/shroom: 20
- Helsinki-NLP/mu-shroom: 11.5k • 199 • 4
- Helsinki-NLP/tatoeba_mt_train: 13.7B • 2.39k • 3
- Helsinki-NLP/tatoeba_mt: 27.7k • 61
- Helsinki-NLP/un_pc: 323M • 4.79k • 23
- Helsinki-NLP/un_ga: 1.11M • 3.11k • 3
- Helsinki-NLP/opus_books: 1.25M • 17k • 84