Attention-Guided Audio Compression for Multimodal LLMs
Audio compression is often proposed to improve the efficiency of multimodal large language models, but its impact on downstream task performance remains underexplored. This talk examines how semantic neural audio codecs behave under token reduction constraints, using cross-modal attention as a signal to discard frames with low semantic content. On audio question-answering benchmarks, attention-guided frame […]
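The idea of using cross-modal attention to decide which frames to drop can be sketched as follows. This is only a minimal illustration, not the method presented in the talk. It assumes the attention weights have already been averaged over heads and layers into a single matrix indexed by text token and audio frame, that a frame's importance is its mean received attention, and that a fixed fraction of frames is kept. The names `frame_scores`, `select_frames`, and `keep_ratio` are hypothetical.

```python
from typing import List, Sequence


def frame_scores(attn: Sequence[Sequence[float]]) -> List[float]:
    """Mean attention each audio frame receives across text query tokens.

    attn[q][f] is assumed to be the attention weight from text token q
    to audio frame f, already averaged over heads and layers.
    """
    n_queries = len(attn)
    n_frames = len(attn[0])
    return [sum(row[f] for row in attn) / n_queries for f in range(n_frames)]


def select_frames(attn: Sequence[Sequence[float]], keep_ratio: float) -> List[int]:
    """Return indices of the frames to keep, in original temporal order."""
    scores = frame_scores(attn)
    # Assumption: keep a fixed fraction of frames, and always at least one.
    n_keep = max(1, round(keep_ratio * len(scores)))
    # Rank frames from most to least attended, then take the top n_keep.
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    # Re-sort the survivors by index so the temporal order is preserved.
    return sorted(ranked[:n_keep])


# Toy example: 2 text tokens attending over 4 audio frames.
attn = [
    [0.10, 0.60, 0.10, 0.20],
    [0.05, 0.50, 0.05, 0.40],
]
kept = select_frames(attn, keep_ratio=0.5)
```

In this toy example, frames 1 and 3 receive the most attention, so they are the two that survive at a keep ratio of 0.5. Restoring temporal order after ranking matters because the downstream language model still expects the remaining audio tokens in sequence.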
