I am an AI researcher. My goal is to build intelligent agents that can assist humans in the physical world. This requires solving two fundamental problems: how to make AI perceive the world as humans do, and how to make decisions based on that understanding. My research interests thus span computer vision, reinforcement learning, and multimodal intelligence.
I created and lead the Depth Anything series, which aims for human-level spatial perception. The series includes DA, DAv2, PromptDA, Video DA, and DA3; each has over 1k GitHub stars, and together they have more than 24k.
Previously, I led the Spatial Intelligence research team at ByteDance Seed. I also spent time at Sea AI Lab, Facebook AI Research (FAIR), UC Berkeley, and NUS.