See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models Paper • 2512.02231 • Published about 1 month ago • 8
LASER: Lip Landmark Assisted Speaker Detection for Robustness Paper • 2501.11899 • Published Jan 21, 2025
UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios Paper • 2505.21954 • Published May 28, 2025 • 1