Chatterbox Turbo - WebGPU Compatible

This is a WebGPU-compatible version of ResembleAI/chatterbox-turbo-ONNX.

Changes from Original

The original model contains int64 Cast operations and int64-valued tensors that WebGPU cannot execute. This version converts all int64 operations to int32, enabling direct WebGPU inference.

Modifications Made:

  • conditional_decoder: 521 Cast nodes inserted (376 Shape/Range ops)
  • speech_encoder: 350 Cast nodes inserted (243 Shape/Range ops)
  • language_model: 3 Cast nodes inserted
  • embed_tokens: 1 Cast node inserted
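
To sanity-check the converted components in the browser, you can open one of them directly with onnxruntime-web (the runtime Transformers.js uses under the hood) and request the WebGPU execution provider. This is a minimal sketch, not part of the model card's tooling; the file path onnx/language_model_q4f16.onnx is an assumption about the repository layout, so point it at whichever component you want to inspect.

import * as ort from 'onnxruntime-web';

// Assumed file layout; adjust the path to the component you want to check.
const modelUrl =
  'https://huggingface.co/spacekaren/chatterbox-turbo-webgpu/resolve/main/onnx/language_model_q4f16.onnx';

// Create an inference session backed by the WebGPU execution provider,
// then list the model's input and output names.
const session = await ort.InferenceSession.create(modelUrl, {
  executionProviders: ['webgpu'],
});

console.log('Inputs:', session.inputNames);
console.log('Outputs:', session.outputNames);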

Usage with Transformers.js

import { AutoModel, AutoProcessor } from '@huggingface/transformers';

const model = await AutoModel.from_pretrained('spacekaren/chatterbox-turbo-webgpu', {
  device: 'webgpu',
  dtype: 'q4f16',
});

const processor = await AutoProcessor.from_pretrained('spacekaren/chatterbox-turbo-webgpu');
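
Not every browser exposes WebGPU, so it can help to guard the load with a feature check and fall back to the WASM backend. A minimal sketch, assuming Transformers.js accepts 'wasm' as the fallback device value (the processor load is the same as in the snippet above):

import { AutoModel } from '@huggingface/transformers';

// WebGPU is exposed via navigator.gpu; requestAdapter() resolves to null
// when no suitable adapter is available, so fall back to WASM in that case.
async function pickDevice() {
  if (typeof navigator !== 'undefined' && navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return 'webgpu';
  }
  return 'wasm';
}

const model = await AutoModel.from_pretrained('spacekaren/chatterbox-turbo-webgpu', {
  device: await pickDevice(),
  dtype: 'q4f16',
});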

Model Size

  • Total: ~539 MB (q4f16 quantization)
  • Same architecture as the original; only the int64→int32 conversion was applied
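
Because the checkpoint ships as several ONNX components, Transformers.js can also take a per-component dtype map instead of a single string, which lets you trade size against quality for individual parts. A hedged sketch: the keys below reuse the component names from the modification list above and are an assumption about the actual ONNX file names and available dtypes in this repository.

import { AutoModel } from '@huggingface/transformers';

// Per-component dtypes. The keys are assumed to match the ONNX file names
// (embed_tokens, language_model, speech_encoder, conditional_decoder);
// check the repository's onnx/ folder for the variants that actually exist.
const model = await AutoModel.from_pretrained('spacekaren/chatterbox-turbo-webgpu', {
  device: 'webgpu',
  dtype: {
    embed_tokens: 'fp16',
    language_model: 'q4f16',
    speech_encoder: 'q4f16',
    conditional_decoder: 'fp16',
  },
});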

License

MIT (same as original)

Credits

Original model: ResembleAI/chatterbox-turbo-ONNX by Resemble AI.