I am a Ph.D. student at HKU-MMLab, The University of Hong Kong, supervised by Prof. Xihui Liu. I received my B.Eng. degree from the Department of Automation, Tsinghua University.
My current research focuses on generative models and multimodal AI for computer vision. More specifically, I work on visual tokenizers that better model visual signals for generative models.
🔥 News
- 2025.04: 🎉🎉 Proud to release GigaTok, the first work that successfully scales visual tokenizers to 3B parameters!
- 2024.10: 🎉🎉 [LVD-2M: A Long-take Video Dataset with Temporally Dense Captions](https://silentview.github.io/LVD-2M/) (NeurIPS 2024, D&B track) is released!
📝 Publications

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang, Jiashi Feng, Xihui Liu
- We propose solutions to the reconstruction vs. generation dilemma in scaling visual tokenizers.
- GigaTok is the first work that successfully scales visual tokenizers to 3B parameters!

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
Tianwei Xiong*, Yuqing Wang*, Daquan Zhou, Zhijie Lin, Jiashi Feng, Xihui Liu
- We pay special attention to long-take videos without scene cuts.
- We propose a data pipeline for filtering high-quality long-take videos and generating temporally dense captions for them.

Editing Massive Concepts in Text-to-Image Diffusion Models
Tianwei Xiong*, Yue Wu*, Enze Xie, Yue Wu, Zhenguo Li, Xihui Liu
- EMCID can edit massive concepts in text-to-image diffusion models at limited cost and with minimal negative effects on model performance.