A complete voice-to-voice translation pipeline that takes spoken input, translates it to the target language, and outputs natural-sounding translated speech.
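The sketch below shows one way such a pipeline might be wired together. It assumes a cascaded design: speech recognition, then text translation, then speech synthesis. The description above does not say whether the project is cascaded or end-to-end. The class and stage names (`VoiceTranslationPipeline`, `transcribe`, `translate`, `synthesize`) and their signatures are hypothetical. The toy stages stand in for real models so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumption: cascaded ASR -> MT -> TTS design).
Transcriber = Callable[[bytes, str], str]     # (audio, source_lang) -> text
Translator = Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text
Synthesizer = Callable[[str, str], bytes]     # (text, target_lang) -> audio


@dataclass
class VoiceTranslationPipeline:
    """Chains three independent stages; each can be swapped without touching the others."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.transcribe(audio, source_lang)                    # speech -> source text
        translated = self.translate(text, source_lang, target_lang)   # source -> target text
        return self.synthesize(translated, target_lang)               # target text -> speech


# --- Toy stand-in stages (hypothetical, for illustration only) ---
def fake_transcribe(audio: bytes, lang: str) -> str:
    # Pretend the "audio" bytes already contain the spoken words as UTF-8 text.
    return audio.decode("utf-8")


TOY_DICT = {("en", "es"): {"hello": "hola", "world": "mundo"}}


def fake_translate(text: str, src: str, tgt: str) -> str:
    # Word-by-word lookup; unknown words or language pairs pass through unchanged.
    table = TOY_DICT.get((src, tgt), {})
    return " ".join(table.get(word, word) for word in text.split())


def fake_synthesize(text: str, lang: str) -> bytes:
    # Pretend synthesis by encoding the text back into bytes.
    return text.encode("utf-8")


pipeline = VoiceTranslationPipeline(fake_transcribe, fake_translate, fake_synthesize)
out = pipeline.run(b"hello world", "en", "es")
print(out)  # b'hola mundo'
```

Keeping each stage behind a plain callable means a real model, such as a different text-to-speech voice, can replace one toy stage without changes to the other two.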