Xuyang Liu (刘旭洋)
🌈 I am a third-year Master's student at Sichuan University. I am currently a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang (PolyU HK, IEEE Fellow). Previously, I interned at Ant Group, focusing on GUI agents, and at Taobao & Tmall Group, working on efficient VLMs. I also spent half a year visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am fortunate to work closely with Dr. Siteng Huang from DAMO Academy and Prof. Linfeng Zhang from SJTU.
📌 My research interests span Efficient Vision-Language Models, including:
- Efficient Inference: MixKV, V2Drop, VidCom2, GlobalCom2, FiCoCo, ToCa
- Efficient Training: M2IST, V-PETL Bench, DARA, Sparse-Tuning, AutoGnothi
📢 I am currently focusing on Data-centric Model Compression. Feel free to reach out via email (liuxuyang@stu.scu.edu.cn) if you are interested in collaborating with me.
🔥 News
- 2025.12.02 🤗🤗 We release STC, a plug-and-play inference acceleration framework for streaming video understanding! Code is available!
- 2025.11.08 🎊🎊 Three papers have been accepted by AAAI 2026, including two LVLM acceleration methods, GlobalCom2 and FiCoCo, and an RL-based GUI grounding training framework, GUI-G2!
- 2025.10.24 🤗🤗 We release MixKV, a plug-and-play framework that enhances existing KV compression methods with consistent performance gains across multiple LVLMs and tasks.
- 2025.08.21 🎊🎊 One first-author paper (VidCom2) on plug-and-play inference acceleration for VideoLLMs has been accepted to the EMNLP 2025 main conference! Code is available!
- 2025.05.27 🙌🙌 We release a new position paper advocating a shift in AI efficiency from model-centric to data-centric compression. The project page is available! Our paper was honored as the #2 Paper of the day!
- 2025.03.11 🎊🎊 One first-author paper (M2IST) on parameter-, memory-, and time-efficient fine-tuning for referring expression comprehension has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)!
- 2025.02.22 🎊🎊 Two papers (ToCa and AutoGnothi) have been accepted by ICLR 2025! Congratulations to all collaborators!
- 2024.09.26 🎊🎊 One co-first-author paper (V-PETL Bench) on a unified visual parameter-efficient transfer learning benchmark has been accepted by NeurIPS 2024!
📝 Publications
Full publications are on my Google Scholar profile. *: Equal contribution. †: Project leader.
Conference Papers
Xuyang Liu, Ziming Wang, Junjie Chen, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Linfeng Zhang, Siteng Huang, Honggang Chen, "Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code]
Yuhang Han*, Xuyang Liu*, Zihan Zhang, Pengxiang Ding, Junjie Chen, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang, "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [page] [code]
Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang, "GUI-G2: Gaussian Reward Modeling for GUI Grounding". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code] [huggingface paper] [page] [机器之心]
Xuyang Liu*, Yiyu Wang*, Junpeng Ma, Linfeng Zhang, "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models". In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. [paper] [code] [slides] [poster] [Chinese intro]
Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". In International Conference on Learning Representations (ICLR), 2025. [paper] [page] [code] [量子位] [poster]
Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". In International Conference on Learning Representations (ICLR), 2025. [paper] [code]
Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]
Journal Papers
Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Yuanyuan Wu, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025. [paper] [code]
Preprints & Under Submission
Yiyu Wang*, Xuyang Liu*,†, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng Zhang, "Accelerating Streaming Video Large Language Models via Hierarchical Token Compression". arXiv preprint arXiv:2512.00891. [paper] [code]
Xuyang Liu*, Xiyan Gui*, Yuchao Zhang, Linfeng Zhang, "Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models". arXiv preprint arXiv:2510.20707. [paper] [code]
Junjie Chen*, Xuyang Liu*,†, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen, "Variation-aware Vision Token Dropping for Faster Large Vision-Language Models". arXiv preprint arXiv:2509.01552. [paper] [code]
Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, and 10 more authors, "Shifting AI Efficiency From Model-Centric to Data-Centric Compression". arXiv preprint arXiv:2505.19147. [paper] [project] [huggingface paper] [video]
Ting Liu*, Xuyang Liu*, Siteng Huang, Liangtao Shi, Zunnan Xu, Yi Xin, Quanjun Yin, "Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [paper] [code]
🤗 Resources
Please find my full repositories on my GitHub profile.
- Awesome Generation Acceleration
- Duty: Owner.
- Description: An open-source repository curating recent awesome papers on AIGC acceleration.
- Awesome Token-level Model Compression
- Duty: Owner.
- Description: An open-source repository curating recent awesome papers on token-level model compression.
💻 Experiences
Internships
- Research Intern - OPPO Research Institute, OPPO, Shenzhen
- Time: Jul 2025 - Present.
- Topic: Video Understanding with Large Vision-Language Models.
- Supervisor: Prof. Lei Zhang.
- Research Intern - Ant Security Lab, Ant Group, Hangzhou
- Time: Apr 2025 - Jul 2025.
- Topic: Multi-modal Graphical User Interface (GUI) Agents.
- Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
- Time: Jul 2024 - Mar 2025.
- Topic: Efficient Multi-modal Large Language Models.
Visiting
- Research Assistant - EPIC Lab, Shanghai Jiao Tong University, Remote
- Time: Jun 2024 - Present.
- Topic: Efficient Multi-modal Large Language Models.
- Supervisor: Prof. Linfeng Zhang.
- Visiting Student - MiLAB, Westlake University, Hangzhou
- Time: Mar 2023 - Sep 2023.
- Topic: Efficient Transfer of Vision-Language Models.
- Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.
🎤 Talks
- 2025.06.10: Invited talk at the PolyU NLP Group (directed by Prof. Wenjie Li): Shifting AI Efficiency From Model-Centric to Data-Centric Compression. [slides]
📠 Services
Conference Reviewer
- International Conference on Learning Representations (ICLR)
- Advances in Neural Information Processing Systems (NeurIPS)
- AAAI Conference on Artificial Intelligence (AAAI)
- ACM International Conference on Multimedia (MM)
Journal Reviewer
- IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
