---
license: mit
---

## SparQLe – Speech Queries to Text via Instruction‑Tuned LLM ⚡

**What it does:**

SparQLe (Speech Routing to Query LLMs) enables direct speech-to-text understanding by aligning self‑supervised speech representations (e.g., HuBERT-like features) with instruction‑tuned Large Language Models (LLMs). A lightweight *modality adapter* bridges the two modalities without retraining the whole LLM.

**Key strengths:**

* **Preserves semantic content** of spoken input in the produced text
* **Efficiently leverages frozen SSL models**, avoiding heavy ASR backbones like Whisper
* **Modular design** with a query‑former (Q‑former) adapter and LLM backend

**Architecture:**

1. **Speech encoder** (SSL) transforms raw input into latent features.
2. **Modality adapter / Q‑former** aligns these with the LLM’s text embedding space.
3. **Instruction‑tuned LLM** processes the adapted input to generate semantic text.

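
The data flow through the three stages above can be sketched in a few lines of plain Python. This is an illustrative toy, not the SparQLe implementation: it assumes the adapter reduces to a single cross-attention step in which a fixed set of query vectors pools over the speech frames, followed by a linear projection into the LLM embedding space. A real Q‑former stacks several transformer layers with learned key and value projections. All names and sizes here (`adapt_speech`, `NUM_QUERIES`, `SPEECH_DIM`, `LLM_DIM`) are hypothetical, and random matrices stand in for the frozen encoder output and the trained weights.

```python
import math
import random


def linear(x, weight):
    """Multiply a row vector x (length d_in) by weight (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, weight))
            for j in range(len(weight[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, frames):
    """Each query returns a softmax-weighted average of the speech frames."""
    dim = len(frames[0])
    scale = math.sqrt(dim)
    pooled = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, f)) / scale for f in frames]
        weights = softmax(scores)
        pooled.append([sum(w * f[d] for w, f in zip(weights, frames))
                       for d in range(dim)])
    return pooled


def adapt_speech(frames, queries, projection):
    """Map T speech frames to len(queries) vectors in the LLM embedding space."""
    return [linear(vec, projection) for vec in cross_attend(queries, frames)]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes, chosen only for illustration.
SPEECH_DIM, LLM_DIM, NUM_QUERIES = 8, 16, 4
rng = random.Random(0)
queries = random_matrix(NUM_QUERIES, SPEECH_DIM, rng)   # would be trained in a real adapter
projection = random_matrix(SPEECH_DIM, LLM_DIM, rng)    # would be trained in a real adapter

# Random stand-ins for frozen SSL encoder outputs of two utterances of different length.
short_utterance = random_matrix(50, SPEECH_DIM, rng)
long_utterance = random_matrix(200, SPEECH_DIM, rng)

short_prefix = adapt_speech(short_utterance, queries, projection)
long_prefix = adapt_speech(long_utterance, queries, projection)
print(len(short_prefix), len(short_prefix[0]))
print(len(long_prefix), len(long_prefix[0]))
```

Both utterances come out as `NUM_QUERIES` vectors of size `LLM_DIM`, regardless of how many frames went in. In this sketch, that fixed-length output is what would be placed in front of the instruction's token embeddings before the LLM generates text.
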
## Citation

If you use SparQLe in your research, please cite:

```bibtex
@misc{djanibekov2025sparqlespeechqueriestext,
      title={SparQLe: Speech Queries to Text Translation Through LLMs},
      author={Amirbek Djanibekov and Hanan Aldarmaki},
      year={2025},
      eprint={2502.09284},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.09284},
}
```

📄 Read the full paper on arXiv: [https://arxiv.org/abs/2502.09284](https://arxiv.org/abs/2502.09284)

---

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

- This work builds upon [fairseq](https://github.com/facebookresearch/fairseq) 💙
- The Q‑former architecture is inspired by [BLIP-2](https://github.com/salesforce/BLIP-2) ✨