whisper-medium.en_timestamped
This is the ONNX version of openai/whisper-medium.en with word-level timestamp support for use with Transformers.js.
Features
- Word-level timestamps via cross-attention (alignment_heads configured)
- Multiple precision variants: fp32, plus int8 and uint8 quantized versions
- Compatible with Transformers.js for browser-based inference
- Merged decoder model for efficient inference
Usage with Transformers.js
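import { pipeline } from '@huggingface/transformers';

// Load the speech-recognition pipeline with this model's ONNX weights.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'neonwatty/whisper-medium.en_timestamped'
);

// audioUrl is a placeholder: supply the URL of the audio to transcribe.
const result = await transcriber(audioUrl, {
  return_timestamps: 'word', // emit one chunk per word
  chunk_length_s: 30,        // process long audio in 30-second windows
  stride_length_s: 5,        // 5 seconds of overlap at window edges
});

console.log(result);
// { text: "Hello world", chunks: [{ text: "Hello", timestamp: [0.0, 0.5] }, ...] }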
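Word-level chunks are often post-processed into subtitles. The following is a minimal sketch, assuming the result has the shape shown in the comment above (`chunks` with `text` and a `[start, end]` `timestamp` in seconds). The helper names `toSrtTime` and `chunksToSrt` and the `maxWords` parameter are hypothetical and are not part of Transformers.js.

```javascript
// Hypothetical helper: format seconds as an SRT timestamp (HH:MM:SS,mmm).
function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: group word-level chunks into SRT cues of at most
// `maxWords` words each. Assumes each chunk looks like
// { text: "Hello", timestamp: [0.0, 0.5] }.
function chunksToSrt(chunks, maxWords = 7) {
  const cues = [];
  for (let i = 0; i < chunks.length; i += maxWords) {
    const group = chunks.slice(i, i + maxWords);
    const start = group[0].timestamp[0];
    const last = group[group.length - 1].timestamp;
    // Defensive assumption: if the last end time is missing, reuse its start.
    const end = last[1] ?? last[0];
    const text = group.map((c) => c.text.trim()).join(' ');
    cues.push(`${cues.length + 1}\n${toSrtTime(start)} --> ${toSrtTime(end)}\n${text}`);
  }
  return cues.length ? cues.join('\n\n') + '\n' : '';
}
```

With the `result` from the snippet above, `chunksToSrt(result.chunks)` would return a string that can be saved as a `.srt` file. Grouping by a fixed word count is the simplest choice; splitting on pauses or punctuation may read better.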
Model Files
The model includes the following ONNX files in the onnx/ directory:
| File | Description |
|---|---|
| encoder_model.onnx | Audio encoder (fp32) |
| decoder_model.onnx | Text decoder (fp32) |
| decoder_with_past_model.onnx | Decoder with KV cache |
| decoder_model_merged.onnx | Merged decoder for efficient inference |
| *_int8.onnx | INT8 quantized versions |
| *_uint8.onnx | UINT8 quantized versions |
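The naming pattern in the table can be made explicit with a small lookup. This is a sketch only: `onnxPath` is a hypothetical helper, and it assumes every component listed above has an `_int8` and a `_uint8` counterpart, which the wildcard rows suggest but do not guarantee.

```javascript
// Components and precision suffixes taken from the table above.
const COMPONENTS = [
  'encoder_model',
  'decoder_model',
  'decoder_with_past_model',
  'decoder_model_merged',
];
const SUFFIXES = new Map([
  ['fp32', ''],        // unsuffixed files are full precision
  ['int8', '_int8'],
  ['uint8', '_uint8'],
]);

// Hypothetical helper: build the expected path of a model file in onnx/.
// Assumes each component exists in every precision (not verified here).
function onnxPath(component, precision = 'fp32') {
  if (!COMPONENTS.includes(component)) {
    throw new Error(`Unknown component: ${component}`);
  }
  if (!SUFFIXES.has(precision)) {
    throw new Error(`Unknown precision: ${precision}`);
  }
  return `onnx/${component}${SUFFIXES.get(precision)}.onnx`;
}
```

For example, `onnxPath('decoder_model_merged', 'int8')` yields `onnx/decoder_model_merged_int8.onnx`. Transformers.js v3 exposes a `dtype` pipeline option for choosing among such variants; whether `'int8'` or `'uint8'` resolves to these exact files is an assumption worth checking against the library documentation.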
Acknowledgments
- Original model by OpenAI
- ONNX conversion via Hugging Face Optimum
- Inspired by onnx-community