Xuyang Liu (刘旭洋)

🌈 I am a third-year Master’s student at Sichuan University, currently working as a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang. Previously, I interned at Ant Security Lab, focusing on GUI agents, and at Taobao & Tmall Group, working on efficient MLLMs. I also spent half a year visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am fortunate to be supervised by and to collaborate with Dr. Siteng Huang from DAMO Academy and Asst. Prof. Linfeng Zhang from SJTU.

📌 My research interests center on Efficient Vision-Language Models.

📢 Recently, I have been focusing mainly on Token-level Model Compression and applying it to efficient high-resolution image understanding and long video understanding. Feel free to reach out via email at liuxuyang@stu.scu.edu.cn if you are interested in collaborating with me.

🔥 News

  • 2025.05.27: 🙌🙌 We release a new paper arguing for shifting AI efficiency from model-centric to data-centric compression. The project page is available! Our paper was honored as the #2 Paper of the Day!
  • 2025.05.21: 🤗🤗 We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!
  • 2025.03.11: 🎊🎊 One first-author paper (M2IST) on parameter-, memory-, and time-efficient fine-tuning for referring expression comprehension has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)!
  • 2025.02.22: 🎊🎊 Two papers (ToCa and AutoGnothi) have been accepted by ICLR 2025! Congratulations to all collaborators!
  • 2025.01.10: 🤗🤗 We release GlobalCom2, a “global-to-local” approach for training-free acceleration of high-resolution LVLMs with a dynamic tiling strategy. Code is available!
  • 2024.11.17: We release FiCoCo, a “filter-correlate-compress” framework that decomposes token reduction into three stages for training-free acceleration of MLLMs (a toy sketch of this three-stage idea follows this list).
  • 2024.09.26: 🎊🎊 One co-first-author paper (V-PETL Bench) on a unified visual parameter-efficient transfer learning benchmark has been accepted by NeurIPS 2024!
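
For readers curious what “filter-correlate-compress” style token reduction looks like, here is a minimal toy sketch. It is my own illustration of the general idea, not FiCoCo’s actual implementation; the attention-based scoring, cosine-similarity matching, and mean merging are all simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def reduce_tokens(tokens: torch.Tensor, attn: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Toy three-stage token reduction. tokens: (N, D) features; attn: (N,) importance scores."""
    n_keep = max(1, int(tokens.size(0) * keep_ratio))
    # Filter: rank tokens by importance and mark the least important for removal.
    order = attn.argsort(descending=True)
    kept, dropped = order[:n_keep], order[n_keep:]
    # Correlate: match each discarded token to its most similar kept token (cosine similarity).
    sim = F.normalize(tokens[dropped], dim=-1) @ F.normalize(tokens[kept], dim=-1).T
    targets = sim.argmax(dim=-1)
    # Compress: fold each discarded token into its matched kept token instead of losing it.
    out = tokens[kept].clone()
    for i, t in enumerate(targets):
        out[t] = 0.5 * (out[t] + tokens[dropped[i]])
    return out

# Example: halve 16 visual tokens of dimension 64.
print(reduce_tokens(torch.randn(16, 64), torch.rand(16)).shape)  # torch.Size([8, 64])
```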

📝 Publications

Please find my full publication list on my Google Scholar profile.

Conference Papers

Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". In International Conference on Learning Representations (ICLR), 2025. [paper] [page] [code] [量子位] [poster]

Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". In International Conference on Learning Representations (ICLR), 2025. [paper] [code]

Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]

Xuyang Liu*, Siteng Huang*, Yachen Kang, Honggang Chen, Donglin Wang, "VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders". In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. [paper] [code] [poster]

Ting Liu*, Xuyang Liu*, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu, "DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding". In IEEE International Conference on Multimedia & Expo (ICME), 2024. (Oral) [paper] [code] [poster]

Journal Papers

Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Yuanyuan Wu, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025. [paper] [code]

Xinying Lin, Xuyang Liu, Hong Yang, Xiaohai He, Honggang Chen, "Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment". IEEE Transactions on Broadcasting (TBC), 2024. [paper] [code]

Xuyang Liu, "GLMLP-TRANS: A transportation mode detection model using lightweight sensors integrated in smartphones". Computer Communications, 2022. [paper] [code]

Preprints & Under Submission

Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, and 10 more authors, "Shifting AI Efficiency From Model-Centric to Data-Centric Compression". arXiv preprint arXiv:2505.19147. [paper] [project] [huggingface paper] [video]

Xuyang Liu*, Yiyu Wang*, Junpeng Ma, Linfeng Zhang, "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models". arXiv preprint arXiv:2505.14454. [paper] [code]

Xuyang Liu, Ziming Wang, Yuhang Han, Yingyao Wang, Jiale Yuan, Jun Song, Bo Zheng, Linfeng Zhang, Siteng Huang, Honggang Chen, "Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models". arXiv preprint arXiv:2501.05179. [paper] [code]

Yuhang Han*, Xuyang Liu*, Pengxiang Ding, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang, "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration". arXiv preprint arXiv:2411.17686. [paper] [page] [code]

Ting Liu*, Xuyang Liu*, Siteng Huang, Liangtao Shi, Zunnan Xu, Yi Xin, Quanjun Yin, Xiaohong Liu, "Dense-Tuning: Densely Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [paper] [code] [Chinese intro (Zhihu)]

Junjie Chen, Xuyang Liu, Subin Huang, Linfeng Zhang, Hang Yu, "Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models". arXiv preprint arXiv:2503.12149. [paper] [code]

🤗 Resources

Please find my full repositories on my GitHub profile.

💻 Experiences

Internships

  • Research Intern - OPPO Research Institute, OPPO, Shenzhen
    • Time: Jul 2025 - Present.
    • Topic: On-device Vision-language Models.
    • Supervisor: Prof. Lei Zhang.
  • Research Intern - Ant Security Lab, Ant Group, Hangzhou
    • Time: Apr 2025 - Jul 2025.
    • Topic: Multi-modal Graphical User Interface (GUI) Agents.
  • Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
    • Time: Jul 2024 - Mar 2025.
    • Topic: Efficient Multi-modal Large Language Models.

Visiting

  • Research Assistant - EPIC Lab, Shanghai Jiao Tong University, Remote
    • Time: Jun 2024 - Present.
    • Topic: Efficient Multi-modal Large Language Models.
    • Supervisor: Prof. Linfeng Zhang.
  • Visiting Student - MiLAB, Westlake University, Hangzhou
    • Time: Mar 2023 - Sep 2023.
    • Topic: Efficient Transfer of Vision-language Models.
    • Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.

🎤 Talks

📠 Services

Conference Reviewer

  • Advances in Neural Information Processing Systems (NeurIPS)
  • ACM International Conference on Multimedia (MM)