---
language:
- eng
- efi
tags:
- translation
- nllb
- nllb-200
- english-efik
license: apache-2.0
datasets:
- Davlan/ibom-mt-en-efi
base_model: facebook/nllb-200-distilled-600M
library_name: transformers
pipeline_tag: translation
model-index:
- name: nllb-200-distilled-600M-ft-efi-en
  results:
  - task:
      type: translation
      name: Machine Translation
    dataset:
      name: Ibom-MT (en-efi)
      type: Davlan/ibom-mt-en-efi
    metrics:
    - name: BLEU
      type: bleu
      value: 38.6
    - name: chrF
      type: chrf
      value: 54.5
---
# Efik -> English (NLLB-200 Distilled)

Fine-tuned NLLB-200 model for translating Efik -> English. Because Efik is not directly supported in NLLB-200, the Igbo language code `ibo_Latn` is used as a close proxy during both training and inference.
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-efi-en"

# Efik has no NLLB language code, so the Igbo code ibo_Latn is used as the source language.
# token=True is only needed if the checkpoint is private or gated.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="ibo_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "Ami nko nko."
inputs = tokenizer(input_example, return_tensors="pt")

# Force English as the target language for generation.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
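The same model can also be called through the `translation` pipeline. This is a minimal sketch, again assuming the `ibo_Latn` proxy code for Efik:

```python
from transformers import pipeline

# src_lang uses the Igbo code as the Efik proxy; tgt_lang is English.
translator = pipeline(
    "translation",
    model="luel/nllb-200-distilled-600M-ft-efi-en",
    src_lang="ibo_Latn",
    tgt_lang="eng_Latn",
    max_length=30,
)
print(translator("Ami nko nko.")[0]["translation_text"])
```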
## Training details (summary)
| Item | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Dataset | Davlan/ibom-mt-en-efi |
| Script | lafand-mt |
| Epochs | 8 |
| Effective batch size | 32 (16 × 2 grad-accum) |
| Learning rate | 3e-5 |
| Mixed precision | bf16 |
| Early stopping | Patience = 3, min_delta (BLEU) = 0.001 |
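Fine-tuning was run with the lafand-mt script; the snippet below is only a sketch of an equivalent `Seq2SeqTrainer` setup that mirrors the hyperparameters in the table. The dataset field and split names (`translation`, `efi`, `en`, `validation`) are assumptions, not taken from this repository; check the dataset card before running.

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base_id = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(base_id, src_lang="ibo_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(base_id)

# Assumed MAFAND-style layout: a "translation" dict with "efi" and "en" keys.
raw = load_dataset("Davlan/ibom-mt-en-efi")

def preprocess(batch):
    sources = [ex["efi"] for ex in batch["translation"]]
    targets = [ex["en"] for ex in batch["translation"]]
    return tokenizer(sources, text_target=targets, max_length=128, truncation=True)

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    score = bleu.compute(predictions=decoded_preds, references=[[l] for l in decoded_labels])
    return {"bleu": score["score"]}

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-efi-en",
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 32
    learning_rate=3e-5,
    bf16=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    predict_with_generate=True,
    load_best_model_at_end=True,
    metric_for_best_model="bleu",
    greater_is_better=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],  # split name assumed
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
    # Stop if BLEU does not improve by at least 0.001 for 3 consecutive evaluations.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.001)],
)
trainer.train()
```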
## Evaluation
| Metric | efi->en |
|---|---|
| BLEU | 38.6 |
| chrF | 54.5 |
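Both metrics can be recomputed from decoded model outputs with the `sacrebleu` package; a minimal sketch is shown below (the strings are placeholders, not actual test data from Ibom-MT):

```python
import sacrebleu

# Placeholder outputs and references; replace with the model's decoded
# translations and the English references from the Ibom-MT test split.
hypotheses = ["Predicted English translation."]
references = [["Reference English translation."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```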
## Limitations

- Using the Igbo language code (`ibo_Latn`) as a stand-in for Efik may introduce lexical differences and tokenization mismatches.
- The model has not been extensively evaluated for bias, toxicity, or gender neutrality.