Based on the VITS paper: "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech" (arXiv:2106.06103).
This repository provides all the necessary tools for Text-to-Speech (TTS) with Coqui TTS, using a VITS model fine-tuned on Kiswahili and Luganda Common Voice v13 data from six speakers with similar intonation.
The pre-trained model takes text as input and produces an audio waveform as output.
First, install the TTS package:

```bash
pip install TTS
```

Then load the fine-tuned model and synthesize speech:

```python
from TTS.utils.synthesizer import Synthesizer

# Point these at your downloaded checkpoint and its configuration file.
# The remaining Synthesizer arguments (speakers file, vocoder, encoder)
# are left at their defaults, since VITS is end-to-end.
synthesizer = Synthesizer(
    "<model checkpoint path>",     # tts_checkpoint
    "<model configuration file>",  # tts_config_path
)

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"
print(sentence_to_synthesize)

# Run inference; returns the waveform as a list of float samples.
wav = synthesizer.tts(sentence_to_synthesize)

location = "output.wav"  # choose a desired name for the output file
synthesizer.save_wav(wav, location)
```
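`save_wav` writes the float waveform returned by `tts()` to a 16-bit PCM WAV file. As an illustration of what that step involves (this is a standard-library sketch, not Coqui's actual implementation, which uses its own audio utilities; the 22050 Hz sample rate is an assumption here, so check your model's config for the real value):

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=22050):
    """Write a list of floats in [-1, 1] as a mono 16-bit PCM WAV file."""
    # Clamp each sample to [-1, 1] and scale to signed 16-bit integers.
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm)

# Example with a 0.1 s 440 Hz sine tone standing in for model output:
sr = 22050
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
write_wav("tone.wav", tone, sr)
```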
We provide no warranty on the performance of this model when it is used on other datasets.
Please cite our work if you use our models in your research or business.
```bibtex
@inproceedings{buildingTTS,
  title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
  author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
  booktitle={5th Workshop on African Natural Language Processing},
  year={2024}
}
```