Contributions
(1) Semantic Redundancy Analysis. We conduct in-depth analyses of KV caches in LVLMs, revealing substantial inherent semantic redundancy. Moreover, we demonstrate that importance-based methods fail to preserve full coverage of the KV distribution, exposing a fundamental limitation.
(2) Mixing Importance with Diversity. Based on our analysis, we propose MixKV, a head-wise adaptive mechanism that quantifies semantic redundancy to weight importance against diversity scores in a principled way, enabling joint optimization of KV cache compression.
(3) Comprehensive Experimental Validation. Extensive experiments across diverse multi-modal and text benchmarks demonstrate that MixKV yields consistent performance improvements for existing importance-based compression methods while maintaining inference efficiency.
Core Findings
Heterogeneous head-wise redundancy in LLMs and LVLMs. For both pure-text and vision-language data, different heads exhibit markedly different redundancy levels, and their overall patterns are highly similar: a head that is relatively more redundant on text remains relatively more redundant on vision-language inputs. We hypothesize that this is because different heads focus on different types of information: some heads primarily attend to local patterns and therefore exhibit higher semantic redundancy, while others capture more global information and consequently show much lower redundancy.
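The head-wise redundancy described above can be quantified, for instance, as the mean pairwise cosine similarity among a head's key vectors. This is an illustrative sketch only (the paper's exact similarity measure may differ), and the function name `head_redundancy` is our own:

```python
import numpy as np

def head_redundancy(keys: np.ndarray) -> np.ndarray:
    """Mean pairwise cosine similarity of key vectors, computed per head.

    keys: (num_heads, seq_len, head_dim) key cache of one layer.
    Returns a (num_heads,) array; a higher value means the head's keys
    are more semantically redundant.
    """
    norms = np.linalg.norm(keys, axis=-1, keepdims=True)
    k = keys / np.maximum(norms, 1e-12)           # unit-normalize each key vector
    sim = k @ k.transpose(0, 2, 1)                # (H, L, L) cosine similarities
    L = sim.shape[-1]
    off_diag = sim.sum(axis=(-1, -2)) - L         # drop the diagonal of self-similarities
    return off_diag / (L * (L - 1))               # average over off-diagonal pairs
```

A head whose keys all point in nearly the same direction (e.g., one locked onto a repeated local pattern) scores near 1, while a head spreading attention over diverse global content scores much lower.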
Overview
We argue that beyond importance, preserving diverse KV pairs at per-head granularity is essential for minimizing semantic redundancy while maintaining comprehensive information coverage. To this end, we propose MixKV, which adopts a principled "mixing importance with diversity" approach. Specifically, MixKV extends existing importance-based KV compression methods by incorporating head-wise semantic diversity evaluation. By independently measuring semantic similarity within each attention head, MixKV adaptively balances importance and diversity per head to achieve fine-grained joint optimization of KV cache compression in LVLMs.
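To make the "mixing importance with diversity" idea concrete, here is a minimal single-head sketch that greedily selects a KV budget by a mixed score, in the spirit of maximal marginal relevance. The weighting scalar `alpha` stands in for the head-wise adaptive weight derived from measured redundancy; the paper's actual scoring and per-head balancing may differ, and `mixkv_select` is a name we introduce for illustration:

```python
import numpy as np

def mixkv_select(keys: np.ndarray, importance: np.ndarray,
                 budget: int, alpha: float) -> list:
    """Greedily pick `budget` KV pairs for one head by mixing
    importance with diversity: alpha * importance + (1 - alpha) * diversity,
    where diversity is 1 minus the max cosine similarity to the
    already-selected set. A more redundant head would use a smaller alpha,
    favoring diversity."""
    k = keys / np.maximum(np.linalg.norm(keys, axis=-1, keepdims=True), 1e-12)
    sim = k @ k.T                                  # (L, L) cosine similarities
    selected = [int(np.argmax(importance))]        # seed with the most important token
    while len(selected) < budget:
        max_sim = sim[:, selected].max(axis=1)     # closeness to the selected set
        score = alpha * importance + (1 - alpha) * (1 - max_sim)
        score[selected] = -np.inf                  # never re-pick a token
        selected.append(int(np.argmax(score)))
    return sorted(selected)
```

With `alpha = 1` this degenerates to pure importance-based selection; lowering `alpha` trades importance for coverage of semantically distinct regions of the cache.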
MixKV enables KV cache compression methods (e.g., SnapKV, AdaKV, and SparseMM) to approximate the full semantic distribution of the uncompressed KV cache more effectively.
PCA analysis of semantic coverage under KV compression.
Experimental Results
Since SparseMM does not provide head importance scores for InternVL3-8B, we cannot reproduce its results on this model. "Full KV" denotes caching all KV pairs (upper bound). Results are reported at a KV cache budget of 128.
| Methods | DocVQA (%) | OCRBench (%) | TextVQA (%) | ChartQA (%) | TextCaps |
|---|---|---|---|---|---|
| LLaVA-NeXT-Mistral-7B | |||||
| Full KV | 63.6 | 52.9 | 65.7 | 52.9 | 0.707 |
| SnapKV | 55.2 | 39.0 | 61.0 | 47.5 | 0.558 |
| + MixKV | 58.1 | 44.7 | 64.3 | 47.7 | 0.659 |
| △ | +2.9 | +5.7 | +3.3 | +0.2 | +0.101 |
| PyramidKV | 54.3 | 39.4 | 60.9 | 47.1 | 0.553 |
| + MixKV | 57.2 | 43.7 | 63.8 | 47.5 | 0.656 |
| △ | +2.9 | +4.3 | +2.9 | +0.4 | +0.103 |
| AdaKV | 55.9 | 40.4 | 60.5 | 47.8 | 0.566 |
| + MixKV | 58.3 | 44.9 | 63.7 | 48.5 | 0.660 |
| △ | +2.4 | +4.5 | +3.2 | +0.7 | +0.094 |
| SparseMM | 60.8 | 50.7 | 64.7 | 51.2 | 0.634 |
| + MixKV | 61.0 | 50.4 | 65.0 | 51.5 | 0.652 |
| △ | +0.2 | -0.3 | +0.3 | +0.3 | +0.018 |
| InternVL3-8B | |||||
| Full KV | 90.96 | 84.2 | 81.1 | 86.36 | 1.042 |
| SnapKV | 85.4 | 69.0 | 78.2 | 84.6 | 0.901 |
| + MixKV | 86.2 | 71.1 | 78.8 | 84.8 | 0.949 |
| △ | +0.8 | +2.1 | +0.6 | +0.2 | +0.048 |
| PyramidKV | 82.7 | 58.4 | 75.3 | 84.0 | 0.809 |
| + MixKV | 83.5 | 60.0 | 76.6 | 84.4 | 0.850 |
| △ | +0.8 | +1.6 | +1.3 | +0.4 | +0.041 |
| AdaKV | 86.0 | 70.2 | 78.0 | 84.4 | 0.921 |
| + MixKV | 86.7 | 71.6 | 78.7 | 85.2 | 0.955 |
| △ | +0.7 | +1.4 | +0.7 | +0.8 | +0.034 |
| Qwen2-VL-7B-Instruct | |||||
| Full KV | 93.9 | 82.1 | 82.1 | 81.5 | 1.469 |
| SnapKV | 80.1 | 71.9 | 77.5 | 79.6 | 1.142 |
| + MixKV | 82.6 | 75.4 | 80.6 | 81.2 | 1.342 |
| △ | +2.5 | +3.5 | +3.1 | +1.6 | +0.200 |
| PyramidKV | 74.0 | 67.9 | 74.6 | 79.2 | 0.951 |
| + MixKV | 76.3 | 72.6 | 77.1 | 80.7 | 1.119 |
| △ | +2.3 | +4.7 | +2.5 | +1.5 | +0.168 |
| AdaKV | 81.2 | 71.0 | 77.0 | 79.6 | 1.146 |
| + MixKV | 82.1 | 74.7 | 79.6 | 80.9 | 1.275 |
| △ | +0.9 | +3.7 | +2.6 | +1.3 | +0.129 |
| SparseMM | 91.5 | 79.0 | 81.6 | 81.5 | 1.430 |
| + MixKV | 92.7 | 81.0 | 82.0 | 81.8 | 1.459 |
| △ | +1.2 | +2.0 | +0.4 | +0.3 | +0.029 |
"Full KV" refers to caching all KV pairs of the LLM (upper bound). Results are reported at a KV cache budget of 128.
| Methods | Mobile Text | Mobile Icon/Widget | Desktop Text | Desktop Icon/Widget | Web Text | Web Icon/Widget | Average |
|---|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B-Instruct | |||||||
| Full KV | 97.2 | 87.7 | 91.2 | 77.1 | 88.5 | 82.3 | 88.5 |
| SnapKV | 65.5 | 78.7 | 86.1 | 74.3 | 76.9 | 74.4 | 75.3 |
| + MixKV | 86.6 | 85.3 | 87.1 | 75.0 | 85.0 | 76.4 | 83.3 |
| △ | +21.1 | +6.6 | +1.0 | +0.7 | +8.1 | +2.0 | +7.9 |
| PyramidKV | 45.5 | 62.1 | 82.0 | 75.0 | 69.2 | 71.4 | 65.6 |
| + MixKV | 64.1 | 74.4 | 87.1 | 74.3 | 76.9 | 71.9 | 74.1 |
| △ | +18.6 | +12.3 | +5.1 | -0.7 | +7.7 | +0.5 | +8.5 |
| AdaKV | 80.7 | 84.8 | 90.2 | 74.3 | 82.1 | 75.9 | 81.6 |
| + MixKV | 94.1 | 88.6 | 89.7 | 75.0 | 85.0 | 76.9 | 86.0 |
| △ | +13.4 | +3.8 | -0.5 | +0.7 | +2.9 | +1.0 | +4.4 |
"Full KV" refers to caching all KV pairs of the LLM (upper bound). Results are reported at a KV cache budget of 512.
| Methods | NrtvQA | Qasper | MF-en | HotpotQA | 2WikiMQA | Musique | GovReport | QMSum | MultiNews | TREC | TriviaQA | SAMSum | PCount | PRe | Lcc | RB-P | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.2 | |||||||||||||||||
| Full KV | 26.81 | 33.19 | 49.26 | 43.02 | 27.12 | 18.78 | 32.80 | 24.16 | 27.02 | 71.00 | 86.23 | 42.64 | 2.75 | 86.98 | 55.09 | 53.01 | 42.49 |
| KV Cache Budget = 512 | |||||||||||||||||
| SnapKV | 23.69 | 27.71 | 49.16 | 39.70 | 25.44 | 17.38 | 23.31 | 23.28 | 24.20 | 66.00 | 86.17 | 41.54 | 3.24 | 86.29 | 53.71 | 51.19 | 40.13 |
| + MixKV | 23.56 | 28.19 | 48.96 | 40.36 | 25.86 | 17.34 | 24.63 | 23.36 | 25.32 | 66.00 | 86.23 | 42.25 | 3.02 | 87.66 | 53.87 | 51.40 | 40.50 |
| △ | -0.13 | +0.48 | -0.20 | +0.66 | +0.42 | -0.04 | +1.32 | +0.08 | +1.12 | 0.00 | +0.06 | +0.71 | -0.22 | +1.37 | +0.16 | +0.21 | +0.37 |
| AdaKV | 24.35 | 27.33 | 48.76 | 40.07 | 26.38 | 17.97 | 23.73 | 23.51 | 24.31 | 67.50 | 86.38 | 42.53 | 3.06 | 86.65 | 53.90 | 51.57 | 40.50 |
| + MixKV | 24.26 | 28.39 | 48.90 | 40.86 | 26.33 | 17.07 | 24.63 | 23.32 | 25.41 | 69.00 | 86.51 | 42.67 | 3.07 | 86.44 | 54.46 | 51.69 | 40.81 |
| △ | -0.09 | +1.06 | +0.14 | +0.79 | -0.05 | -0.90 | +0.90 | -0.19 | +1.10 | +1.50 | +0.13 | +0.14 | +0.01 | -0.21 | +0.56 | +0.12 | +0.31 |
| Llama-3.1-8B-Instruct | |||||||||||||||||
| Full KV | 30.22 | 45.37 | 55.80 | 55.97 | 45.00 | 31.26 | 35.12 | 25.38 | 27.20 | 72.50 | 91.64 | 43.57 | 9.41 | 99.50 | 62.88 | 56.43 | 49.20 |
| KV Cache Budget = 512 | |||||||||||||||||
| SnapKV | 27.42 | 38.95 | 53.57 | 55.20 | 44.68 | 29.75 | 25.55 | 24.21 | 24.28 | 64.50 | 92.35 | 41.04 | 9.98 | 99.50 | 62.50 | 54.93 | 46.53 |
| + MixKV | 26.76 | 41.77 | 53.77 | 55.19 | 44.72 | 30.02 | 26.03 | 24.28 | 25.27 | 69.00 | 91.44 | 42.24 | 9.98 | 99.50 | 61.84 | 55.17 | 47.37 |
| △ | -0.66 | +2.82 | +0.20 | -0.01 | +0.04 | +0.27 | +0.48 | +0.07 | +0.99 | +4.50 | -0.91 | +1.20 | +0.00 | +0.00 | -0.66 | +0.24 | +0.84 |
| AdaKV | 25.96 | 40.26 | 52.82 | 54.55 | 43.83 | 30.43 | 25.76 | 24.06 | 24.69 | 69.00 | 92.05 | 42.10 | 9.45 | 99.50 | 62.58 | 55.59 | 46.42 |
| + MixKV | 26.13 | 42.08 | 53.18 | 55.47 | 43.88 | 28.80 | 26.68 | 24.03 | 25.35 | 70.00 | 91.01 | 42.79 | 9.41 | 99.50 | 62.92 | 55.82 | 46.75 |
| △ | +0.17 | +1.82 | +0.36 | +0.92 | +0.05 | -1.63 | +0.92 | -0.03 | +0.66 | +1.00 | -1.04 | +0.69 | -0.04 | +0.00 | +0.34 | +0.23 | +0.33 |
Results are reported at a KV cache budget of 128.
| Methods | DocVQA (%) | OCRBench (%) | TextVQA (%) | ChartQA (%) | TextCaps |
|---|---|---|---|---|---|
| InternVL3-38B | |||||
| Full KV | 93.5 | 85.9 | 83.8 | 88.6 | 0.953 |
| SnapKV | 87.5 | 77.8 | 82.0 | 87.5 | 0.932 |
| + MixKV | 92.1 | 79.3 | 82.8 | 88.2 | 0.959 |
| △ | +4.6 | +1.5 | +0.8 | +0.7 | +0.027 |
| AdaKV | 92.0 | 79.6 | 82.0 | 87.4 | 0.940 |
| + MixKV | 92.3 | 81.1 | 82.9 | 88.2 | 0.961 |
| △ | +0.3 | +1.5 | +0.9 | +0.8 | +0.021 |
Results are reported at a KV cache budget of 128.
| Methods | DocVQA (%) | OCRBench (%) | TextVQA (%) | ChartQA (%) | TextCaps |
|---|---|---|---|---|---|
| Qwen3-VL-30B-A3B-Instruct | |||||
| Full KV | 94.5 | 84.0 | 83.5 | 85.1 | 0.287 |
| SnapKV | 91.9 | 71.0 | 75.3 | 83.8 | 0.314 |
| + MixKV | 93.2 | 80.7 | 80.8 | 84.5 | 0.411 |
| △ | +1.3 | +9.7 | +5.5 | +0.7 | +0.097 |
For a context length of 32,000, "Full KV" caches the entire sequence, whereas the KV compression strategies use a budget of 64. The upper part reports total inference time; the lower part reports peak memory.
Citation
If you find this project helpful, please consider citing our paper with:
```bibtex
@article{liu2025mixkv,
  title={Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models},
  author={Liu, Xuyang and Gui, Xiyan and Zhang, Yuchao and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2510.20707},
  year={2025}
}
```