🌈 I am Xuyang Liu (εˆ˜ζ—­ζ΄‹), a third-year Master's student at Sichuan University. I am also a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang (PolyU, IEEE Fellow). Previously, I interned at Ant Group, focusing on GUI agents, and at Taobao & Tmall Group, working on efficient VLMs. I also spent half a year visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am fortunate to work closely with Dr. Siteng Huang from DAMO Academy and Prof. Linfeng Zhang from SJTU.

πŸ“Œ My research centers on efficient Large Vision-Language Models (LVLMs), including:

  • πŸ–ΌοΈ Image-Text LVLMs: high-resolution understanding via context compression and fast decoding.
  • 🎬 Video Understanding: long/audio-video, and streaming reasoning via efficient encoding and compression.
  • βš™οΈ Efficiency Toolbox: efficient transfer/fine-tuning and benchmarking for downstream task adaptation.

πŸ“’ If you find these directions interesting, feel free to reach out via email: liuxuyang@stu.scu.edu.cn.

πŸ”₯ News

  • 2026.01.26 🎊🎊 Two papers have been accepted by ICLR 2026: MixKV, on fast decoding for VLMs/LLMs, and DIJA, the first safety study of dLLMs! Congratulations to all collaborators!
  • 2025.12.02 πŸ€—πŸ€— We release STC, a plug-and-play inference acceleration framework for streaming video understanding! Code is available!
  • 2025.11.08 🎊🎊 Three papers have been accepted by AAAI 2026, including two LVLM acceleration methods, GlobalCom2 and FiCoCo, and an RL-based GUI grounding training framework, GUI-G2!
  • 2025.08.21 🎊🎊 One first-author paper, VidCom2, on plug-and-play inference acceleration for VideoLLMs, has been accepted to the EMNLP 2025 main conference! Code is available!
  • 2025.05.27 πŸ™ŒπŸ™Œ We release a new paper arguing for a shift of AI efficiency from model-centric to data-centric compression. The project is available! Our paper was honored as the #2 Paper of the Day!
  • 2025.03.11 🎊🎊 One first-author paper (M2IST) on parameter-, memory-, and time-efficient fine-tuning for referring expression comprehension has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)!
  • 2025.02.22 🎊🎊 Two papers (ToCa and AutoGnothi) have been accepted by ICLR 2025! Congratulations to all collaborators!
  • 2024.09.26 🎊🎊 One co-first-author paper (V-PETL), a unified visual parameter-efficient transfer learning benchmark, has been accepted by NeurIPS 2024!

πŸ“ Publications

Full publications are available on my Google Scholar profile. *: Equal contribution. †: Project leader.

Conference Papers

Xuyang Liu*, Xiyan Gui*, Yuchao Zhang, Linfeng Zhang, "Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models". In International Conference on Learning Representations (ICLR), 2026. [paper] [code]

Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang, "The Devil behind the Mask: An Emergent Safety Vulnerability of Diffusion LLMs". In International Conference on Learning Representations (ICLR), 2026. [paper] [code] [huggingface paper] [量子位]

Xuyang Liu, Ziming Wang, Junjie Chen, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Linfeng Zhang, Siteng Huang, Honggang Chen, "Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code] [poster]

Yuhang Han*, Xuyang Liu*, Zihan Zhang, Pengxiang Ding, Junjie Chen, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang, "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [page] [code] [poster]

Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang, "GUI-G2: Gaussian Reward Modeling for GUI Grounding". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code] [huggingface paper] [page] [ζœΊε™¨δΉ‹εΏƒ]

Xuyang Liu*, Yiyu Wang*, Junpeng Ma, Linfeng Zhang, "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models". In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. [paper] [code] [Xiaohongshu] [ζœΊε™¨δΉ‹εΏƒ] [PaperWeekly] [slides] [poster] [video]

Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". In International Conference on Learning Representations (ICLR), 2025. [paper] [page] [code] [量子位] [poster]

Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". In International Conference on Learning Representations (ICLR), 2025. [paper] [code]

Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]

Journal Papers

Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Yuanyuan Wu, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025. [paper] [code]

Preprints & Under Submission

Yiyu Wang*, Xuyang Liu*,†, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng Zhang, "Accelerating Streaming Video Large Language Models via Hierarchical Token Compression". arXiv preprint arXiv:2512.00891. [paper] [code] [PaperWeekly]

Junjie Chen*, Xuyang Liu*,†, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen, "Variation-aware Vision Token Dropping for Faster Large Vision-Language Models". arXiv preprint arXiv:2509.01552. [paper] [code]

Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, and 10 more authors, "Shifting AI Efficiency From Model-Centric to Data-Centric Compression". arXiv preprint arXiv:2505.19147. [paper] [project] [huggingface paper] [Twitter@Rohan Paul]

Ting Liu*, Xuyang Liu*, Liangtao Shi, Zunnan Xu, Yue Hu, Siteng Huang, Yi Xin, Bineng Zhong, Donglin Wang, "Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [paper] [github]

πŸ€— Resources

Please find my full repositories on my GitHub profile.

πŸ’» Experiences

Internships

  • Research Intern - OPPO Research Institute, OPPO, Shenzhen
    • Time: Jul 2025 - Present.
    • Topic: Video Understanding with Large Vision-Language Models.
    • Supervisor: Prof. Lei Zhang.
  • Research Intern - Ant Security Lab, Ant Group, Hangzhou
    • Time: Apr 2025 - Jul 2025.
    • Topic: Multi-modal Graphical User Interface (GUI) Agents.
  • Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
    • Time: Jul 2024 - Mar 2025.
    • Topic: Efficient Multi-modal Large Language Models.

Visiting

  • Research Assistant - EPIC Lab, Shanghai Jiao Tong University, Remote
    • Time: Jun 2024 - Present.
    • Topic: Efficient Multi-modal Large Language Models.
    • Supervisor: Prof. Linfeng Zhang.
  • Visiting Student - MiLAB, Westlake University, Hangzhou
    • Time: Mar 2023 - Sep 2023.
    • Topic: Efficient Transfer of Vision-Language Models.
    • Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.

🎀 Talks

πŸ“  Services

Conference Reviewer

  • International Conference on Learning Representations (ICLR)
  • International Conference on Machine Learning (ICML)
  • Conference on Neural Information Processing Systems (NeurIPS)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • ACM International Conference on Multimedia (MM)

Journal Reviewer

  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • Computer Vision and Image Understanding (CVIU)