Based on the VITS paper: "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech" (arXiv:2106.06103).
This repository provides all the necessary tools for Text-to-Speech (TTS) with Coqui TTS, using a VITS model fine-tuned on Kiswahili and Luganda Common Voice v13 data from six speakers with similar intonation.
The pre-trained model takes text as input and produces an audio waveform as output.
First, install the TTS package:

```bash
pip install TTS
```

Then load the fine-tuned model and synthesize speech:

```python
from TTS.utils.synthesizer import Synthesizer

# Point these at your downloaded checkpoint and its configuration file.
# The remaining Synthesizer arguments (speakers file, vocoder, encoder)
# are left at their defaults, since VITS is end-to-end.
synthesizer = Synthesizer(
    "<model checkpoint path>",     # tts_checkpoint
    "<model configuration file>",  # tts_config_path
)

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"
print(sentence_to_synthesize)

# Run inference; returns the waveform as a list of float samples.
wav = synthesizer.tts(sentence_to_synthesize)

location = "output.wav"  # choose a desired name for the output file
synthesizer.save_wav(wav, location)
```
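`save_wav` writes the float waveform returned by `tts()` to a 16-bit PCM WAV file. As an illustration of what that step involves (this is a standard-library sketch, not Coqui's actual implementation, which uses its own audio utilities; the 22050 Hz sample rate is an assumption here, so check your model's config for the real value):

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=22050):
    """Write a list of floats in [-1, 1] as a mono 16-bit PCM WAV file."""
    # Clamp each sample to [-1, 1] and scale to signed 16-bit integers.
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm)

# Example with a 0.1 s 440 Hz sine tone standing in for model output:
sr = 22050
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
write_wav("tone.wav", tone, sr)
```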
We provide no warranty on the performance of this model when it is used on other datasets.
Please cite our work if you use our models in your research or business.
```bibtex
@inproceedings{buildingTTS,
  title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
  author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
  booktitle={5th Workshop on African Natural Language Processing},
  year={2024}
}
```