MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
Abstract
A novel automotive SLU dataset, MAC-SLU, benchmarks LLMs and LALMs across in-context learning and supervised fine-tuning, under both end-to-end and pipeline paradigms, showing that SFT outperforms in-context learning and that E2E LALMs match pipeline approaches.
Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally lack sufficient diversity and complexity, and there is no unified benchmark for the latest Large Language Models (LLMs) and Large Audio Language Models (LALMs). This work introduces MAC-SLU, a novel Multi-Intent Automotive Cabin Spoken Language Understanding Dataset, which increases the difficulty of the SLU task by incorporating authentic and complex multi-intent data. Based on MAC-SLU, we conduct a comprehensive benchmark of leading open-source LLMs and LALMs, covering in-context learning, supervised fine-tuning (SFT), and both end-to-end (E2E) and pipeline paradigms. Our experiments show that while LLMs and LALMs can complete SLU tasks through in-context learning, their performance still lags significantly behind SFT. Meanwhile, E2E LALMs achieve performance comparable to pipeline approaches while avoiding error propagation from speech recognition. Code (https://github.com/Gatsby-web/MAC_SLU) and dataset (https://huggingface.co/datasets/Gatsby1984/MAC_SLU) are released publicly.
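Since the dataset is hosted on the Hugging Face Hub, the sketch below shows one way to load and inspect it with the `datasets` library. This is a minimal illustration, not part of the released codebase; the available splits and column names are whatever the published schema defines, so they are read at runtime rather than assumed.

```python
# Minimal sketch: load MAC-SLU from the Hugging Face Hub and inspect it.
# Assumes the `datasets` library is installed (pip install datasets).
# Split and column names come from the released schema, so we discover
# them at runtime instead of hard-coding them.
from datasets import load_dataset

ds = load_dataset("Gatsby1984/MAC_SLU")
print(ds)  # prints the available splits and their columns

first_split = next(iter(ds))  # e.g., "train", if such a split exists
example = ds[first_split][0]  # one multi-intent utterance record
print(example)
```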