🌈 I am Xuyang Liu (εˆ˜ζ—­ζ΄‹), a third-year Master's student at Sichuan University. I am also a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang (PolyU, IEEE Fellow). Previously, I interned at Ant Group, focusing on GUI agents, and at Taobao & Tmall Group, working on efficient VLMs. I also spent half a year visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am fortunate to work closely with Dr. Siteng Huang from DAMO Academy and Prof. Linfeng Zhang from SJTU.

πŸ“Œ My research centers on efficient Large Vision-Language Models (LVLMs), including:

  • πŸ–ΌοΈ Image-Text LVLMs: high-resolution understanding via context compression and fast decoding.
  • 🎬 Video Understanding: long/audio-video, and streaming reasoning via efficient encoding and compression.
  • βš™οΈ Efficiency Toolbox: efficient transfer/fine-tuning and benchmarking for downstream task adaptation.

πŸ“’ If you find these directions interesting, feel free to reach out via email: liuxuyang@stu.scu.edu.cn.

πŸ”₯ News

  • 2026.01.26 🎊🎊 Two papers have been accepted by ICLR 2026: MixKV, on fast decoding for VLMs/LLMs, and DIJA, the first safety study of dLLMs! Congratulations to all collaborators!
  • 2025.12.02 πŸ€—πŸ€— We release STC, a plug-and-play inference acceleration framework for streaming video understanding! Code is available!
  • 2025.11.08 🎊🎊 Three papers have been accepted by AAAI 2026, including two LVLM acceleration methods, GlobalCom2 and FiCoCo, and an RL-based GUI grounding training framework, GUI-G2!
  • 2025.08.21 🎊🎊 One first-author paper, VidCom2, on plug-and-play inference acceleration for VideoLLMs, has been accepted to the EMNLP 2025 main conference! Code is available!
  • 2025.05.27 πŸ™ŒπŸ™Œ We release a new paper arguing for a shift of AI efficiency from model-centric to data-centric compression. The project is available! Our paper was honored as the #2 Paper of the Day!
  • 2025.03.11 🎊🎊 One first-author paper (M2IST) on parameter-, memory-, and time-efficient fine-tuning for referring expression comprehension has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)!
  • 2025.02.22 🎊🎊 Two papers (ToCa and AutoGnothi) have been accepted by ICLR 2025! Congratulations to all collaborators!
  • 2024.09.26 🎊🎊 One co-first-author paper (V-PETL), a unified visual parameter-efficient transfer learning benchmark, has been accepted by NeurIPS 2024!

πŸ“ Publications

Full publications are available on my Google Scholar profile. *: Equal contribution. †: Project leader.

Conference Papers

Xuyang Liu*, Xiyan Gui*, Yuchao Zhang, Linfeng Zhang, "Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models". In International Conference on Learning Representations (ICLR), 2026. [paper] [code]

Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang, "The Devil behind the Mask: An Emergent Safety Vulnerability of Diffusion LLMs". In International Conference on Learning Representations (ICLR), 2026. [paper] [code] [huggingface paper] [量子位]

Xuyang Liu, Ziming Wang, Junjie Chen, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Linfeng Zhang, Siteng Huang, Honggang Chen, "Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code] [poster]

Yuhang Han*, Xuyang Liu*, Zihan Zhang, Pengxiang Ding, Junjie Chen, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang, "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [page] [code] [poster]

Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang, "GUI-G2: Gaussian Reward Modeling for GUI Grounding". In Proceedings of the 40th AAAI Conference on Artificial Intelligence, 2026. [paper] [code] [huggingface paper] [page] [ζœΊε™¨δΉ‹εΏƒ]

Xuyang Liu*, Yiyu Wang*, Junpeng Ma, Linfeng Zhang, "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models". In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. [paper] [code] [Xiaohongshu] [ζœΊε™¨δΉ‹εΏƒ] [PaperWeekly] [slides] [poster] [video]

Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". In International Conference on Learning Representations (ICLR), 2025. [paper] [page] [code] [量子位] [poster]

Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". In International Conference on Learning Representations (ICLR), 2025. [paper] [code]

Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]

Journal Papers

Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Yuanyuan Wu, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025. [paper] [code]

Preprints & Under Submission

Yiyu Wang*, Xuyang Liu*,†, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng Zhang, "Accelerating Streaming Video Large Language Models via Hierarchical Token Compression". arXiv preprint arXiv:2512.00891. [paper] [code] [PaperWeekly]

Junjie Chen*, Xuyang Liu*,†, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen, "Variation-aware Vision Token Dropping for Faster Large Vision-Language Models". arXiv preprint arXiv:2509.01552. [paper] [code]

Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, and 10 more authors, "Shifting AI Efficiency From Model-Centric to Data-Centric Compression". arXiv preprint arXiv:2505.19147. [paper] [project] [huggingface paper] [Twitter@Rohan Paul]

Ting Liu*, Xuyang Liu*, Liangtao Shi, Zunnan Xu, Yue Hu, Siteng Huang, Yi Xin, Bineng Zhong, Donglin Wang, "Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [paper] [github]

πŸ€— Resources

Please find my full repositories on my GitHub profile.

πŸ’» Experiences

Internships

  • Research Intern - OPPO Research Institute, OPPO, Shenzhen
    • Time: Jul 2025 - Present.
    • Topic: Video Understanding with Large Vision-Language Models.
    • Supervisor: Prof. Lei Zhang.
  • Research Intern - Ant Security Lab, Ant Group, Hangzhou
    • Time: Apr 2025 - Jul 2025.
    • Topic: Multi-modal Graphical User Interface (GUI) Agents.
  • Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
    • Time: Jul 2024 - Mar 2025.
    • Topic: Efficient Multi-modal Large Language Models.

Visiting

  • Research Assistant - EPIC Lab, Shanghai Jiao Tong University, Remote
    • Time: Jun 2024 - Present.
    • Topic: Efficient Multi-modal Large Language Models.
    • Supervisor: Prof. Linfeng Zhang.
  • Visiting Student - MiLAB, Westlake University, Hangzhou
    • Time: Mar 2023 - Sep 2023.
    • Topic: Efficient Transfer of Vision-Language Models.
    • Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.

🎀 Talks

πŸ“  Services

Conference Reviewer

  • International Conference on Learning Representations (ICLR)
  • International Conference on Machine Learning (ICML)
  • Conference on Neural Information Processing Systems (NeurIPS)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • ACM International Conference on Multimedia (MM)

Journal Reviewer

  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • Computer Vision and Image Understanding (CVIU)