Chengyang Zhao

I am a second-year master's student (MS in Robotics) at the Robotics Institute, Carnegie Mellon University, advised by Prof. Jean Oh. Previously, I obtained my bachelor's degree from Yuanpei College 🦁, Peking University, majoring in Data Science (Computer Science + Statistics). During my undergraduate years, I was honored to be advised by Prof. He Wang. I was also privileged to work closely with Prof. Chuang Gan.

I am always happy to chat and explore opportunities for collaboration. Feel free to reach out to me!

I am seeking PhD positions starting in Fall 2026. If you have any suggestions or opportunities, please let me know!

Email  /  CV (Dec. 2025)  /  Google Scholar  /  LinkedIn  /  GitHub  /  X (Twitter)


Research

My research lies at the intersection of Robotics and Machine Learning, with a focus on Robot Manipulation.

My current research highlights the following perspectives:

  • Modeling consistent geometric and dynamic structures for generality and adaptability of interaction.
  • Developing robust real-world perception to capture key structures for reliability of interaction.
  • Building structured scene understanding from heterogeneous data to form a knowledge foundation for interaction.

Core research questions I am exploring:

  • How can we develop structured inductive bias from geometry, physics, causality, etc. for effective, efficient, scalable robot learning systems to enable general and adaptable robot interaction?
  • How can we combine general and robust learning-based design with solid engineering efforts to enable robots to operate as reliable, integrated systems in unstructured real-world environments?


Preprints & Publications

(* indicates equal contribution.)

DYMO-Hair: Generalizable Volumetric Dynamics Modeling for Robot Hair Manipulation
Chengyang Zhao, Uksang Yoo, Arkadeep Narayan Chaudhury, Giljoo Nam, Jonathan Francis, Jeffrey Ichnowski, Jean Oh
arXiv, 2025
[paper] [website]

TL;DR: Modeling dynamic structures of complex deformable objects for model-based generalizable and adaptable manipulation.

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou*, Tianyi Zhang*, Yuwen Xiong, Haonan Duan, Hengjun Pu, Ronglei Tong, Chengyang Zhao, Xizhou Zhu, Yu Qiao, Jifeng Dai, Yuntao Chen
International Conference on Computer Vision (ICCV), 2025
[paper] [website] [code]

TL;DR: A scalable DiT-based VLA policy with an in-context conditioning mechanism for inherent action denoising, enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.


RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng*, Feishi Wang*, Songlin Wei*, Yuyang Li*, Bangjun Wang*, Boshi An*, Charlie Tianyue Cheng*, Haozhe Lou, Peihao Li, Yen-Jen Wang, Yutong Liang, Dylan Goetting, Chaoyi Xu, Haozhe Chen, Yuxi Qian, Yiran Geng, Jiageng Mao, Weikang Wan, Mingtong Zhang, Jiangran Lyu, Siheng Zhao, Jiazhao Zhang, Jialiang Zhang, Chengyang Zhao, Haoran Lu, Yufei Ding, Ran Gong, Yuran Wang, Yuxuan Kuang, Ruihai Wu, Baoxiong Jia, Carlo Sferrazza, Hao Dong, Siyuan Huang, Yue Wang, Jitendra Malik, Pieter Abbeel
Robotics: Science and Systems (RSS), 2025
[paper] [website] [code]

TL;DR: A unified infrastructure with simulator-agnostic interfaces to a wide range of simulators, aiming to enable universal configuration and hybrid simulation for scalable and generalizable robot learning.

GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
Wenbo Cui*, Chengyang Zhao*, Songlin Wei*, Jiazhao Zhang, Haoran Geng, Yaran Chen, He Wang
International Conference on Robotics & Automation (ICRA), 2025
[paper] [website]

TL;DR: Robust depth perception for elementary functional structures and adaptable manipulation via online planning across potential interaction modes for articulated objects.

D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation
Songlin Wei, Haoran Geng, Jiayi Chen, Congyue Deng, Wenbo Cui, Chengyang Zhao, Xiaomeng Fang, Leonidas Guibas, He Wang
Conference on Robot Learning (CoRL), 2024
[paper] [website] [code]

TL;DR: Robust depth perception for daily rigid objects with challenging physical materials in tabletop manipulation scenarios.

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Chengyang Zhao, Yikang Shen, Zhenfang Chen, Mingyu Ding, Chuang Gan
International Conference on Computer Vision (ICCV), 2023
[paper] [website] [code]

TL;DR: Learning shared structured alignment across heterogeneous multi-modal data to build structured graph-based scene understanding capabilities.

GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
Haoran Geng*, Helin Xu*, Chengyang Zhao*, Chao Xu, Li Yi, Siyuan Huang, He Wang
(* The order is determined by rolling dice.)
Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (Highlight)
[paper] [website] [code] [dataset]

TL;DR: Modeling shared elementary functional structures that remain geometrically consistent across various articulated object categories for generalizable perception and manipulation.

Thanks to Jon Barron for this amazing template :D
Last Updated: Dec. 2025