An-Chieh Cheng 鄭安傑
a8cheng at ucsd dot edu

I am a PhD student at the University of California, San Diego, advised by Prof. Xiaolong Wang. During my PhD studies, I interned at NVIDIA and Adobe, and my research has been supported by the Qualcomm Innovation Fellowship. Prior to my PhD, I earned my Master's and Bachelor's degrees in computer science from National Tsing Hua University.

I'm interested in building multimodal foundation models capable of general spatial understanding and actionable intelligence.

Google Scholar  /  Curriculum Vitæ  /  Github  /  LinkedIn  /  Twitter

Selected Publications
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye et al. ICLR, 2026
NVIDIA's state-of-the-art 9B Omni-Modal LLMs.

3D Aware Region Prompted Vision Language Model
An-Chieh Cheng, Yang Fu, Yukang Chen, Zhijian Liu, Xiaolong Li, Subhashree Radhakrishnan, Song Han, Yao Lu, Jan Kautz, Pavlo Molchanov, Hongxu Yin✝︎, Xiaolong Wang✝︎, Sifei Liu✝︎ ICLR, 2026
Region-level spatial reasoning for both single-view and multi-view inputs.

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Ruihan Yang*, Qinxi Yu*, Yecheng Wu, Rui Yan, Borui Li, An-Chieh Cheng, Xueyan Zou, Yunhao Fang, Xuxin Cheng, Ri-Zhao Qiu, Hongxu Yin, Sifei Liu, Song Han, Yao Lu, Xiaolong Wang Preprint, 2025
A robust generalist model for dexterous manipulation, trained on diverse egocentric human manipulation videos.

NaVILA: Legged Robot Vision-Language-Action Model for Navigation
An-Chieh Cheng*, Yandong Ji*, Zhaojing Yang*, Zaitian Gongye, Xueyan Zou, Jan Kautz, Erdem Bıyık, Hongxu Yin✝︎, Sifei Liu✝︎, Xiaolong Wang✝︎ RSS, 2025
A two-level framework that combines VLAs with locomotion skills for navigation. The VLA is adapted from a VLM and learns from human touring videos.

NVILA: Efficient Frontier Visual Language Models
Zhijian Liu et al.
CVPR, 2025
Frontier VLMs with efficient training and inference.

pdf | website | demo | code | abstract

SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, Sifei Liu NeurIPS, 2024
A powerful region-level VLM adept at 3D spatial reasoning.
✨ Demoed at GTC 2025 as part of Agentic AI for Physical Operations!

pdf | website | video | code | abstract

TUVF: Learning Generalizable Texture UV Radiance Fields
An-Chieh Cheng, Xueting Li, Sifei Liu✝︎, Xiaolong Wang✝︎ ICLR, 2024
Learning generalizable texture UV radiance fields for shapes.

pdf | website | video | code | abstract

Autoregressive 3D Shape Generation via Canonical Mapping
An-Chieh Cheng*, Xueting Li*, Sifei Liu, Min Sun, Ming-Hsuan Yang ECCV, 2022
We decompose the point cloud into meaningful shape sequences, then encode these sequences with a transformer for generation.

pdf | code | abstract

Learning 3D Dense Correspondence via Canonical Point Autoencoder
An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu NeurIPS, 2021
Unsupervised learning of dense 3D correspondence.

pdf | website | code | abstract


Template from this awesome website.