InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts

✶ Equal Contribution, † Corresponding Author
NeurIPS 2025

📋 Abstract

The advancement of Embodied AI relies heavily on large-scale, simulatable 3D scene datasets with diverse scenes and realistic layouts. However, existing datasets typically suffer from limited data scale or diversity, sanitized layouts that lack small items, and severe object collisions. To address these shortcomings, we introduce InternScenes, a novel large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes. The dataset integrates three disparate scene sources (real-world scans, procedurally generated scenes, and designer-created scenes), contains 1.96M objects, and covers 15 common scene types and 288 object classes. We deliberately preserve a large number of small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our comprehensive data processing pipeline ensures simulatability by creating real-to-sim replicas of real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions through physical simulation. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both expose the new challenges posed by complex and realistic layouts. More importantly, InternScenes paves the way for scaling up model training for both tasks, making generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.
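To make the notion of "object collisions" in a layout concrete, here is a minimal sketch that flags interpenetrating objects using axis-aligned bounding boxes. It is not the InternScenes pipeline, which resolves collisions by physical simulation; it only illustrates the kind of cheap pre-check one might run on a layout. The `Box` type, the `colliding_pairs` helper, and the example coordinates are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box for one object, in meters."""
    name: str
    min_corner: tuple[float, float, float]
    max_corner: tuple[float, float, float]


def overlaps(a: Box, b: Box, tol: float = 1e-6) -> bool:
    """True if the boxes interpenetrate on all three axes.

    Boxes that merely touch (e.g. a cup resting on a desk) are not
    counted as colliding, thanks to the small tolerance.
    """
    return all(
        a.min_corner[i] < b.max_corner[i] - tol
        and b.min_corner[i] < a.max_corner[i] - tol
        for i in range(3)
    )


def colliding_pairs(boxes: list[Box]) -> list[tuple[str, str]]:
    """Return the names of every pair of boxes that interpenetrate."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if overlaps(a, b)]


# Illustrative layout (made-up numbers): the cup sits exactly on the desk top,
# while the chair has been placed so that it cuts into the desk.
example_layout = [
    Box("desk", (0.0, 0.0, 0.0), (1.2, 0.6, 0.75)),
    Box("cup", (0.5, 0.2, 0.75), (0.58, 0.28, 0.85)),
    Box("chair", (0.9, 0.3, 0.0), (1.5, 0.9, 0.9)),
]

if __name__ == "__main__":
    print(colliding_pairs(example_layout))
```

A bounding-box test like this over-reports collisions for concave or rotated objects, which is one reason a physics-based resolution step, as described in the abstract, is the more faithful approach.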

🎬 Demo Video

🏘️ InternScenes-Real2Sim

Figure (Dataset Overview): Pipeline for retrieving synthetic scenes from real scan scenes.

🎮 InternScenes-Synthetic

Figure (Dataset Overview): Pipeline for annotating and processing raw scenes to extract precise layout information.

🌠 Samples

Comprehensive 3D Assets (*.usd and *.glb) with Canonical Poses and Semantic Labels

🔎 All 3D CAD models have been manually annotated with care.

Bulky Item: Bed, Bookshelf, Chair, Couch, Desk, Refrigerator

Medium Item: Electric Cooker, Microwave, Oven, Pan, Pot, Lamp

Small Item: Clock, Clothes, Cup, Fan, Pillow, Phone, Keyboard, Mouse, Laptop, Tray, Shoe, Toy

Retrieving from Real Scans to Synthetic Scenes

🌗 Real2Sim Comparison

BibTeX

@inproceedings{InternScenes,
  title={InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts},
  author={Zhong, Weipeng and Cao, Peizhou and Jin, Yichen and Li, Luo and Cai, Wenzhe and Lin, Jingli and Lyu, Zhaoyang and Wang, Tai and Dai, Bo and Xu, Xudong and Pang, Jiangmiao},
  year={2025},
  booktitle={arXiv},
}