Xuyang Liu (刘旭洋)

🌈 I am a third-year Master’s student at Sichuan University and a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang (PolyU HK, IEEE Fellow). Previously, I interned at Ant Security Lab, focusing on GUI agents, and at Taobao & Tmall Group, working on efficient MLLMs. I also spent half a year visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am fortunate to work closely with Dr. Siteng Huang from DAMO Academy and Prof. Linfeng Zhang from SJTU.

📌 My research interests span Efficient Vision-Language Models, including:

📢 Recently, I have been focusing mainly on Data-centric Model Compression. Feel free to reach out to me by email at liuxuyang@stu.scu.edu.cn if you are interested in collaborating.

🔥 News

  • 2025.11.08 🎊🎊 Three papers have been accepted by AAAI 2026, including two LVLM acceleration methods, GlobalCom2 and FiCoCo, and an RL-based GUI grounding training framework, GUI-G2!
  • 2025.10.24 🤗🤗 We release MixKV, a plug-and-play framework that enhances existing KV compression methods with consistent performance gains across multiple LVLMs and tasks.
  • 2025.08.21 🎊🎊 One first-author paper, VidCom2, on plug-and-play inference acceleration for VideoLLMs has been accepted by the EMNLP 2025 main conference! Code is available!
  • 2025.05.27 🙌🙌 We release a new paper arguing for a shift in AI efficiency from model-centric to data-centric compression. The project is available! Our paper was honored as the #2 Paper of the Day!
  • 2025.03.11 🎊🎊 One first-author paper (M2IST) on parameter-, memory-, and time-efficient fine-tuning for referring expression comprehension has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)!
  • 2025.02.22 🎊🎊 Two papers (ToCa and AutoGnothi) have been accepted by ICLR 2025! Congratulations to all collaborators!
  • 2024.09.26 🎊🎊 One co-first-author paper (V-PETL) on a unified visual parameter-efficient transfer learning benchmark has been accepted by NeurIPS 2024!

📝 Publications

Please find my full publication list on my Google Scholar profile.

Conference Papers

Xuyang Liu, Ziming Wang, Junjie Chen, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Linfeng Zhang, Siteng Huang, Honggang Chen, "Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code]

Yuhang Han*, Xuyang Liu*, Zihan Zhang, Pengxiang Ding, Junjie Chen, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang, "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [page] [code]

Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang, "GUI-G2: Gaussian Reward Modeling for GUI Grounding". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code] [huggingface paper] [page] [机器之心]

Xuyang Liu*, Yiyu Wang*, Junpeng Ma, Linfeng Zhang, "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models". In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. [paper] [code]

Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". In International Conference on Learning Representations (ICLR), 2025. [paper] [page] [code] [量子位] [poster]

Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". In International Conference on Learning Representations (ICLR), 2025. [paper] [code]

Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]

Xuyang Liu*, Siteng Huang*, Yachen Kang, Honggang Chen, Donglin Wang, "VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders". In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. [paper] [code] [poster]

Ting Liu*, Xuyang Liu*, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu, "DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding". In IEEE International Conference on Multimedia & Expo (ICME), 2024. (Oral) [paper] [code] [poster]

Journal Papers

Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Yuanyuan Wu, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025. [paper] [code]

Junjie Chen, Xuyang Liu, Subin Huang, Linfeng Zhang, Hang Yu, "Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models". IEEE Transactions on Computational Social Systems (TCSS), 2025. [paper] [code]

Xinying Lin, Xuyang Liu, Hong Yang, Xiaohai He, Honggang Chen, "Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment". IEEE Transactions on Broadcasting (TBC), 2024. [paper] [code]

Xuyang Liu, "GLMLP-TRANS: A transportation mode detection model using lightweight sensors integrated in smartphones". Computer Communications, 2022. [paper] [code]

Preprints & Under Submission

Xuyang Liu*, Xiyan Gui*, Yuchao Zhang, Linfeng Zhang, "Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models". arXiv preprint arXiv:2510.20707. [paper] [code]

Zichen Wen, Yiyu Wang, Chenfei Liao, Boxue Yang, Junxian Li, Weifeng Liu, Haocong He, Bolong Feng, Xuyang Liu, Yuanhuiyi Lyu, Xu Zheng, Xuming Hu, Linfeng Zhang, "AI for Service: Proactive Assistance with AI Glasses". arXiv preprint arXiv:2510.14359. [paper] [huggingface paper]

Junjie Chen*, Xuyang Liu*,†, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen, "Variation-aware Vision Token Dropping for Faster Large Vision-Language Models". arXiv preprint arXiv:2509.01552. [paper] [code]

Minhao Xiong, Zichen Wen, Zhuangcheng Gu, Xuyang Liu, Rui Zhang, Hengrui Kang, Jiabing Yang, Junyuan Zhang, Weijia Li, Conghui He, Yafei Wang, Linfeng Zhang, "Prune2Drive: A Plug-and-Play Framework for Accelerating Vision-Language Models in Autonomous Driving". arXiv preprint arXiv:2508.13305. [paper]

Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang, "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs". arXiv preprint arXiv:2507.11097. [paper] [code] [huggingface paper] [量子位]

Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, and 10 more authors, "Shifting AI Efficiency From Model-Centric to Data-Centric Compression". arXiv preprint arXiv:2505.19147. [paper] [project] [huggingface paper] [video]

Ting Liu*, Xuyang Liu*, Siteng Huang, Liangtao Shi, Zunnan Xu, Yi Xin, Quanjun Yin, "Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [paper] [github] [Chinese intro (Zhihu)]

🤗 Resources

Please find all my repositories on my GitHub profile.

💻 Experiences

Internships

  • Research Intern - OPPO Research Institute, OPPO, Shenzhen
    • Time: Jul 2025 - Present.
    • Thesis: Efficient Long Video Understanding.
    • Supervisor: Prof. Lei Zhang.
  • Research Intern - Ant Security Lab, Ant Group, Hangzhou
    • Time: Apr 2025 - Jul 2025.
    • Thesis: Multi-modal Graphical User Interface (GUI) Agents.
  • Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
    • Time: Jul 2024 - Mar 2025.
    • Thesis: Efficient Multi-modal Large Language Models.

Visiting

  • Research Assistant - EPIC Lab, Shanghai Jiao Tong University, Remote
    • Time: Jun 2024 - Present.
    • Thesis: Efficient Multi-modal Large Language Models.
    • Supervisor: Prof. Linfeng Zhang.
  • Visiting Student - MiLAB, Westlake University, Hangzhou
    • Time: Mar 2023 - Sep 2023.
    • Thesis: Efficient Transfer of Vision-language Models.
    • Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.

🎤 Talks

📠 Services

Conference Reviewer

  • Advances in Neural Information Processing Systems (NeurIPS)
  • International Conference on Learning Representations (ICLR)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • ACM International Conference on Multimedia (MM)

Journal Reviewer

  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)