Xuyang Liu (刘旭洋)
🌈 I am a second-year Master’s student at Sichuan University, supervised by Prof. Honggang Chen. Currently, I am working as a research intern at Taobao & Tmall Group, focusing on Efficient MLLMs. Previously, I had the honor of visiting MiLAB at Westlake University, supervised by Prof. Donglin Wang. I am fortunate to be advised by, and to collaborate with, Dr. Siteng Huang from DAMO Academy and Asst. Prof. Linfeng Zhang from SJTU.
📌 My research interests center on Efficient Multi-modal Large Language Models, spanning:
- Discrimination: visual grounding and referring video object segmentation.
- Adaptation: parameter-efficient transfer learning and model acceleration.
- Reconstruction: super-resolution and image quality assessment.
- Generation: text-to-image generation and text-to-video generation.
📢 Recently, I have been focusing on the Acceleration of Generative Models. Feel free to reach out to me at liuxuyang@stu.scu.edu.cn if you are interested in collaborating with me.
🔥 News
- 2024.10.12: 🚀🚀 We release our work ToCa on accelerating diffusion transformers for FREE, achieving nearly lossless 2.36× acceleration on OpenSora!
- 2024.09.26: 🎊🎊 One co-first-author paper (V-PETL) about a unified visual parameter-efficient transfer learning benchmark has been accepted by NeurIPS 2024!
- 2024.09.18: One paper (PFIQA) about reduced-reference image quality assessment has been accepted by IEEE Transactions on Broadcasting!
- 2024.07.22: ⛵ I began my research internship at Taobao & Tmall Group, focusing on multi-modal large language models (MLLMs).
- 2024.03.13: One co-first-author paper (DARA) about parameter-efficient tuning for visual grounding has been accepted by ICME 2024, and selected as an Oral presentation!
- 2023.12.13: 🎊🎊 One first-author paper (VGDiffZero) about zero-shot visual grounding has been accepted by ICASSP 2024, which is my first work during my master’s studies!
📝 Publications
Please find my full publications on my Google Scholar profile.
Conference Papers
Yi Xin*, Siqi Luo*, Xuyang Liu*, Yuntao Du*, Haodi Zhou, Xinyu Cheng, Christina Lee, and 10 more authors, "V-PETL Bench: A Unified Visual Parameter-Efficient Transfer Learning Benchmark". In Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B Track), 2024. [paper] [page] [code] [poster]
Xuyang Liu*, Siteng Huang*, Yachen Kang, Honggang Chen, Donglin Wang, "VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders". In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. [paper] [code] [poster]
Ting Liu*, Xuyang Liu*, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu, "DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding". In IEEE International Conference on Multimedia & Expo (ICME), 2024. (Oral) [paper] [code] [poster]
Journal Papers
Xinying Lin, Xuyang Liu, Hong Yang, Xiaohai He, Honggang Chen, "Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment". IEEE Transactions on Broadcasting, 2024. (SCI Q1, IF: 3.2) [paper] [code]
Xuyang Liu, "GLMLP-TRANS: A transportation mode detection model using lightweight sensors integrated in smartphones". Computer Communications, 2022. (SCI Q1, IF: 6.0) [paper] [code]
Preprints & Under Submission
Xuyang Liu*, Ting Liu*, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Honggang Chen, "M2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension". arXiv preprint arXiv:2407.01131. [paper]
Chang Zou*, Xuyang Liu*, Ting Liu, Siteng Huang, Linfeng Zhang, "Accelerating Diffusion Transformers with Token-wise Feature Caching". arXiv preprint arXiv:2410.05317. [paper] [page] [code]
Ting Liu*, Xuyang Liu*, Siteng Huang, Liangtao Shi, Zunnan Xu, Yi Xin, Quanjun Yin, Xiaohong Liu, "Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2410.05317. [paper] [github] [Chinese intro] [Zhihu]
Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang, "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models". arXiv preprint arXiv:2410.21815. [paper]
💻 Experience
- Research Intern - Taobao & Tmall Group, Alibaba Group, Beijing
- Time: July 2024 - Present.
- Topic: Efficient Multi-modal Large Language Models.
- Research Intern - Machine Intelligence Laboratory, Westlake University, Hangzhou
- Time: Mar 2023 - Sep 2023.
- Topic: Zero-shot Transfer of Vision-language Models.
- Supervisors: Dr. Siteng Huang and Prof. Donglin Wang.