Xiaofang Wang

I am a Staff Research Scientist at Meta Superintelligence Labs (MSL), working on multimodal post-training and reasoning. I received my Ph.D. from the Robotics Institute at Carnegie Mellon University and my B.S. in Computer Science from Peking University.

E-mail / Google Scholar / LinkedIn

Experience
  • 02/2023 - current: Research Scientist @ Meta Superintelligence Labs
  • 06/2022 - 02/2023: Research Scientist @ Mobile Vision, Meta Reality Labs
  • 05/2020 - 08/2020: Research Intern @ Google Perception
  • 05/2019 - 08/2019: Research Intern @ Google Cloud AI
Education
  • 08/2017 - 05/2022: Ph.D. in Robotics, Carnegie Mellon University (advisor: Kris Kitani)
  • 08/2015 - 05/2017: M.S. in Robotics, Carnegie Mellon University (advisors: Kris Kitani, Martial Hebert)
  • 08/2011 - 07/2015: B.S. in Computer Science, Peking University
Publications
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
Zeyi Huang, Yuyang Ji, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund Vanvalkenburgh, Shengxin Zha, Bolin Lai, Licheng Yu, Ning Zhang, Yong Jae Lee, Miao Liu
Computer Vision and Pattern Recognition Conference (CVPR), 2025
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia
Computer Vision and Pattern Recognition Conference (CVPR), 2025
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu, Xide Xia, Miao Liu, Xiaofang Wang, Mingfu Liang, Ning Zhang, Dimitris N Metaxas, Licheng Yu
Computer Vision and Pattern Recognition Conference (CVPR), 2025
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou
Computer Vision and Pattern Recognition Conference (CVPR), 2024
Cost-Aware Evaluation and Model Scaling for LiDAR-Based 3D Object Detection
Xiaofang Wang, Kris M. Kitani
International Conference on Robotics and Automation (ICRA), 2023
Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models
Xiaofang Wang, Dan Kondratyuk, Eric Christiansen, Kris M. Kitani, Yair Alon, Elad Eban
International Conference on Learning Representations (ICLR), 2022
[Poster] [Google AI Blog]
Neighborhood-Aware Neural Architecture Search
Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani
British Machine Vision Conference (BMVC), 2021

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua
European Conference on Computer Vision (ECCV), 2020
[Video-1 minute] [Video] [Slides]
Learnable Embedding Space for Efficient Neural Architecture Compression
Shengcao Cao*, Xiaofang Wang*, Kris M. Kitani
International Conference on Learning Representations (ICLR), 2019
* indicates equal contribution.
[Code] [Poster] [Architecture Visualization]
Error Correction Maximization for Deep Image Hashing
Xiang Xu, Xiaofang Wang, Kris M. Kitani
British Machine Vision Conference (BMVC), 2018
Deep Supervised Hashing with Triplet Labels
Xiaofang Wang, Yi Shi, Kris M. Kitani
Asian Conference on Computer Vision (ACCV), 2016
Oral Presentation (5.6% acceptance rate)
[Code]
Hamming Compatible Quantization for Hashing
Zhe Wang, Ling-Yu Duan, Jie Lin, Xiaofang Wang, Tiejun Huang, Wen Gao
International Joint Conference on Artificial Intelligence (IJCAI), 2015

Website design from Jon Barron