Artifacts at boundaries during streaming

#18

by BlindTech - opened 10 days ago

Discussion

BlindTech

10 days ago


edited 10 days ago

Hi everyone 👋

I tried running the ONNX model in streaming mode by decoding every N output tokens, with N ranging from 10 to 100. The problem is that when I stitch the audio chunks together, I can hear clicks and pops at the chunk boundaries. This seems to happen because the decoder produces slightly different audio depending on its context: it may make the audio go quiet at the end of each decode run, or add silence at the start of each one. Simply removing the silence or using lookback/lookahead strategies does not seem to help.
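A common mitigation for clicks at chunk seams is to keep a short overlap between consecutive chunks and crossfade across it instead of butt-joining them. This is a different technique from plain lookback trimming. Below is a minimal sketch. It assumes each decode returns a 1-D float32 NumPy array, and that each chunk after the first begins with `overlap` samples covering the same audio as the end of the previous chunk (for example, by keeping that much of the lookback audio instead of trimming all of it). `overlap` and `crossfade_concat` are hypothetical names, not part of the model's API.

```python
from typing import Sequence

import numpy as np


def crossfade_concat(chunks: Sequence[np.ndarray], overlap: int) -> np.ndarray:
    """Concatenate audio chunks, linearly crossfading `overlap` samples at each seam.

    Assumption: each chunk after the first starts with `overlap` samples that
    cover the same stretch of audio as the last `overlap` samples so far.
    """
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    if overlap <= 0:
        # No overlap: plain concatenation (this is where clicks can appear).
        return np.concatenate([np.asarray(c, dtype=np.float32) for c in chunks])
    out = np.asarray(chunks[0], dtype=np.float32)
    # Gain ramps: the new chunk fades in while the previous tail fades out.
    fade_in = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
    fade_out = 1.0 - fade_in
    for chunk in chunks[1:]:
        chunk = np.asarray(chunk, dtype=np.float32)
        if len(chunk) < overlap or len(out) < overlap:
            raise ValueError("chunks must be at least `overlap` samples long")
        seam = out[-overlap:] * fade_out + chunk[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], seam, chunk[overlap:]])
    return out
```

A linear fade suits this case because the two sides of the overlap are meant to be near-identical renderings of the same audio, and the two gains sum to 1. An equal-power (sine/cosine) fade is usually preferred when the two sides are less correlated. In a live stream you would emit everything except the last `overlap` samples and hold those back until the next chunk arrives. A few milliseconds of overlap is a typical starting point (for example, 120 to 480 samples at an assumed 24 kHz output rate, which is 5 to 20 ms), but it is worth tuning by ear.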

Any advice on how this could be solved, or a direction to look in? I assume this decoder was not designed for true streaming?

Thank you in advance!

UPDATE:
It looks like the non-ONNX model allows making the decoder outputs more deterministic. I plan to investigate in this direction.

BlindTech changed discussion title from Artifacts and boundaries during streaming to Artifacts at boundaries during streaming 10 days ago

amalshajikm

7 days ago

How did you make this model stream its output? Do you have any code you can share so I can implement streaming for this model?
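Here is a minimal sketch of the chunked approach described in the first post. It decodes every N new tokens, re-decodes a few earlier tokens as lookback context, and keeps only the audio for the new tokens. It assumes a hypothetical `decode(tokens)` callable that wraps the model's decoder and returns a 1-D float32 NumPy array with a fixed `samples_per_token` samples per token. Neither name comes from the model's actual API. For simplicity the token list is passed in whole; in a real stream the tokens would arrive incrementally from the generator.

```python
from typing import Callable, Iterator, Sequence

import numpy as np


def stream_decode(
    tokens: Sequence[int],
    decode: Callable[[Sequence[int]], np.ndarray],  # hypothetical decoder wrapper
    chunk_size: int,
    lookback: int,
    samples_per_token: int,  # assumption: fixed number of samples per token
) -> Iterator[np.ndarray]:
    """Yield audio for `chunk_size` new tokens at a time.

    Each call re-decodes up to `lookback` previously emitted tokens as context,
    then drops the samples belonging to that context so only new audio is
    yielded.
    """
    for start in range(0, len(tokens), chunk_size):
        ctx_start = max(0, start - lookback)
        audio = decode(tokens[ctx_start:start + chunk_size])
        # Skip the samples produced for the lookback context tokens.
        yield audio[(start - ctx_start) * samples_per_token:]
```

With a context-free stand-in decoder, the concatenated chunks match a one-shot decode exactly. With a real neural decoder, each trimmed chunk differs slightly from the corresponding stretch of a one-shot decode, and that mismatch is what produces the boundary clicks described in the first post.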
