Xuyang Liu (刘旭洋)
🌈 I am a second-year Master’s student at Sichuan University, supervised by Prof. Honggang Chen. I am currently a research intern at Ant Security Lab, Ant Group, working on GUI agents. Previously, I completed an 8-month internship at Taobao & Tmall Group, Alibaba Group, working on efficient MLLMs. I also spent half a year visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am very glad to be supervised by and collaborate with Dr. Siteng Huang from DAMO Academy and Asst. Prof. Linfeng Zhang from SJTU.
📌 My research interests span Efficient Vision-Language Models, including:
- Efficient Inference: VidCom2, GlobalCom2, FiCoCo, ToCa, Dense-Tuning
- Efficient Training: M2IST, V-PETL Bench, DARA, Dense-Tuning, AutoGnothi
📢 Recently, I have been mainly focusing on Token-level Model Compression. Feel free to reach out via email (liuxuyang@stu.scu.edu.cn) if you are interested in collaborating with me. 🙋 I am actively seeking a Ph.D. position for Fall 2026!
🔥 News
- 2025.05.27: 🙌🙌 We release a new paper arguing for shifting AI efficiency from model-centric to data-centric compression. The project page is available! Our paper was honored as the #2 Paper of the day!
- 2025.05.21: 🤗🤗 We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!
- 2025.04.07: ⛵⛵ I began my research internship at Ant Security Lab, Ant Group, in Hangzhou, focusing on multi-modal graphical user interface (GUI) agents.
- 2025.03.11: 🎊🎊 One first-author paper (M2IST) on parameter-, memory-, and time-efficient fine-tuning for referring expression comprehension has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)!
- 2025.02.22: 🎊🎊 Two papers (ToCa and AutoGnothi) have been accepted by ICLR 2025! Congratulations to all collaborators!
- 2025.01.10: 🤗🤗 We release GlobalCom2, a “global-to-local” approach for training-free acceleration of high-resolution LVLMs with a dynamic tiling strategy. Code is available!
- 2024.11.17: We release FiCoCo, a “filter-correlate-compress” framework that decomposes token reduction into three stages for training-free acceleration of MLLMs.
- 2024.09.26: 🎊🎊 One co-first-author paper (V-PETL) on a unified visual parameter-efficient transfer learning benchmark has been accepted by NeurIPS 2024!
📝 Publications
Please find my full publications on my Google Scholar profile.
Conference Papers
Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". In International Conference on Learning Representations (ICLR), 2025. [paper] [page] [code] [量子位] [poster]
Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". In International Conference on Learning Representations (ICLR), 2025. [paper] [code]
Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]
Xuyang Liu*, Siteng Huang*, Yachen Kang, Honggang Chen, Donglin Wang, "VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders". In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. [paper] [code] [poster]
Ting Liu*, Xuyang Liu*, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu, "DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding". In IEEE International Conference on Multimedia & Expo (ICME), 2024. (Oral) [paper] [code] [poster]
Journal Papers
Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Yuanyuan Wu, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025. [paper] [code]
Xinying Lin, Xuyang Liu, Hong Yang, Xiaohai He, Honggang Chen, "Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment". IEEE Transactions on Broadcasting (TBC), 2024. [paper] [code]
Xuyang Liu, "GLMLP-TRANS: A transportation mode detection model using lightweight sensors integrated in smartphones". Computer Communications, 2022. [paper] [code]
Preprints & Under Submission
Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, Yubo Wang, Xiangqi Jin, Chang Zou, Yiyu Wang, Chenfei Liao, Xu Zheng, Honggang Chen, Weijia Li, Xuming Hu, Conghui He, Linfeng Zhang, "Shifting AI Efficiency From Model-Centric to Data-Centric Compression". arXiv preprint arXiv:2505.19147. [paper] [project] [huggingface paper] [video]
Xuyang Liu*, Yiyu Wang*, Junpeng Ma, Linfeng Zhang, "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models". arXiv preprint arXiv:2505.14454. [paper] [code]
Xuyang Liu, Ziming Wang, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Bo Zheng, Linfeng Zhang, Siteng Huang, Honggang Chen, "Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models". arXiv preprint arXiv:2501.05179. [paper] [code]
Yuhang Han*, Xuyang Liu*, Pengxiang Ding, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang, "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration". arXiv preprint arXiv:2411.17686. [paper] [page] [code]
Ting Liu*, Xuyang Liu*, Siteng Huang, Liangtao Shi, Zunnan Xu, Yi Xin, Quanjun Yin, Xiaohong Liu, "Dense-Tuning: Densely Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [paper] [github] [Chinese intro (Zhihu)]
Junjie Chen, Xuyang Liu, Subin Huang, Linfeng Zhang, Hang Yu, "Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models". arXiv preprint arXiv:2503.12149. [paper] [code]
🤗 Resources
Please find my full repositories on my GitHub profile.
- Awesome Generation Acceleration
- Duty: Owner.
- Description: An open-source repository that curates a collection of recent awesome papers on AIGC acceleration.
- Awesome Token-level Model Compression
- Duty: Owner.
- Description: An open-source repository that curates a collection of recent awesome papers on token-level model compression.
💻 Experiences
Internships
- Research Intern - Ant Security Lab, Ant Group, Hangzhou
- Time: Apr 2025 - Present.
- Topic: Multi-modal Graphical User Interface (GUI) Agents.
- Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
- Time: Jul 2024 - Mar 2025.
- Topic: Efficient Multi-modal Large Language Models.
Visiting
- Research Assistant - EPIC Lab, Shanghai Jiao Tong University, Remote
- Time: Jun 2024 - Present.
- Topic: Efficient Multi-modal Large Language Models.
- Supervisor: Prof. Linfeng Zhang.
- Visiting Student - MiLAB, Westlake University, Hangzhou
- Time: Mar 2023 - Sep 2023.
- Topic: Efficient Transfer of Vision-language Models.
- Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.