Terrible Quality!

#16
by qpqpqpqpqpqp - opened

I tried all the VibeVoice models, and they all generate speech with disgusting noise! It is not that hard to run a denoiser to produce clean audio files and then train on them to get a better, higher-quality model, as some others have done. Are you deaf, excuse me?

Could you share the generated speech? If you used the WebSocket realtime demo, one possible reason is that the device’s inference speed couldn’t keep up with speech playback, which resulted in noticeable noise.
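
One rough way to check whether slow inference is the cause is to measure the real-time factor: the time spent generating audio divided by the duration of the audio produced. A value above 1 would suggest that generation is slower than playback. The snippet below is only a sketch under assumptions: `synthesize` is a hypothetical callable that wraps whatever model call you use and returns the raw samples, and `sample_rate` must match the rate of that output. Neither name comes from the VibeVoice API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical wrapper around your model call
    text: str,
    sample_rate: int,  # samples per second of the returned audio (assumed known)
) -> float:
    """Return generation time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return elapsed / audio_seconds
```

If the value is consistently below 1 and the noise is still present, the cause is probably somewhere else, and sharing the generated file would help narrow it down.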
