I am a Ph.D. student at HKU-MMLab, the University of Hong Kong, supervised by Prof. Xihui Liu. I received my B.Eng. degree from the Department of Automation, Tsinghua University.

My current research focuses on generative models and multimodal AI for computer vision. More specifically, I study visual tokenizers that model visual signals more effectively for generative AI models.

🔥 News

  • 2025.04:  🎉🎉 Proud to release GigaTok, the first work that successfully scales visual tokenizers to 3B parameters!
  • 2024.10:  🎉🎉 [LVD-2M: A Long-take Video Dataset with Temporally Dense Captions](https://silentview.github.io/LVD-2M/) (NeurIPS 2024, D&B track) is released!

📝 Publications

arXiv 2025
GigaTok

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang, Jiashi Feng, Xihui Liu

Project | Paper | Code

  • We propose solutions to the reconstruction vs. generation dilemma in scaling tokenizers.
  • GigaTok is the first work that successfully scales visual tokenizers to 3B parameters!
NeurIPS 2024
LVD-2M

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Tianwei Xiong*, Yuqing Wang*, Daquan Zhou, Zhijie Lin, Jiashi Feng, Xihui Liu

Project | Paper | Code

  • We pay special attention to long-take videos without cuts.
  • We propose a data pipeline for filtering high-quality long-take videos and for temporally dense captioning of those videos.
arXiv 2024
EMCID

Editing Massive Concepts in Text-to-Image Diffusion Models

Tianwei Xiong*, Yue Wu*, Enze Xie, Yue Wu, Zhenguo Li, Xihui Liu

Project | Paper | Code

  • EMCID can edit massive concepts in text-to-image diffusion models at limited cost and with minimal negative effect on model performance.