RealityOS: View Source on Reality

Tagline

An AI-powered spatial operating system that lets you "view source" on the real world, decomposing images into interactive "Reality Objects" that you can inspect, modify, and rewire using natural language and a visual 3D logic graph.

Inspiration

We interact with digital content every day (websites, apps, games), and we (developers) have the superpower to "Inspect Element" or "View Source" to understand and change it. But the physical world, and the images that capture it, have always been read-only: flat, static pixels with no source to view.

We asked: What if we could "View Source" on reality?

What if you could look at a photo of a room, click on a chair, and see it as a "Register" (variable) in a program? What if you could chat with an AI OS to "remodel the room" or "change the lighting," and watch it construct a logic graph of operations to make that happen? This inspired RealityOS.

What it does

RealityOS is a web-based "Spatial IDE" for images.

  1. Deconstructs Reality: You upload an image, and RealityOS uses SAM-3 (the Segment Anything Model, served via FAL.ai) to decompose it into distinct "Reality Objects" (Registers).
  2. Visualizes Logic: These objects are visualized in a 3D space as layers floating above the original image. You can rotate the view to see the "depth" of the scene's composition.
  3. Natural Language Editing: You can chat with Gemini 3 to command the OS. For example, "Make the sofa pink" or "Remove the plant."
  4. Generative Workflows: Gemini translates your intent into a node-based program of "Ops" (operations) and "Wires". It visually connects inputs (Registers) to effects (Style Transfer, Inpainting, etc.) to generate the new result (see the data-model sketch after this list).
  5. Agent Insight: A "Thinking..." block exposes the AI's internal monologue as it plans the modifications, giving users transparency into the agentic workflow.
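
The "program" in step 4 is a small, graph-shaped data model. Here is a minimal TypeScript sketch of the Register/Op/Wire shapes; the field names are illustrative, not our exact schema:

```typescript
// Illustrative shapes for the "Reality Program" graph (names are hypothetical).
// A Register is a segmented object, an Op is an operation, a Wire connects them.

type RegisterId = string;
type OpId = string;

interface Register {
  id: RegisterId;
  label: string;   // e.g. "sofa", "plant"
  maskUrl: string; // segmentation mask / cut-out returned by SAM-3
  bounds: { x: number; y: number; width: number; height: number };
}

// "pending" marks a skeleton placeholder while the AI is still planning.
type OpKind = "style_transfer" | "inpaint_remove" | "recolor" | "pending";

interface Op {
  id: OpId;
  kind: OpKind;
  params: Record<string, string | number>; // e.g. { color: "pink" }
}

interface Wire {
  from: RegisterId | OpId; // output side
  to: OpId;                // input side
}

// The full program that Gemini returns as structured JSON.
interface RealityProgram {
  registers: Register[];
  ops: Op[];
  wires: Wire[];
}
```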

How we built it

  • Frontend: Built with React 19, TypeScript, and Vite. We used Three.js (via React Three Fiber) for the 3D "exploded view" visualization, allowing registers and ops to exist in a spatial canvas. Tailwind CSS provides the sleek, "Minority Report"-style glassmorphic UI.
  • AI Reasoning: We leveraged Google Gemini 3 (via gemini-3-pro-preview) as the core reasoning engine. It understands the user's natural language intent and outputs a structured JSON "program" (Registers, Ops, Wires) that the frontend renders.
  • Computer Vision: We integrated FAL.ai to run SAM-3 for high-precision semantic segmentation, turning raw pixels into masked "Registers".
  • Backend: A lightweight AWS Lambda proxy (deployed via CDK behind CloudFront) handles secure API communication with Gemini and FAL, ensuring our API keys remain protected.
  • State Management: Zustand manages the complex state of the "Reality Program" (the graph of nodes and connections).
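
A few simplified sketches of how these pieces fit together (all hedged: names, schemas, and module paths are illustrative unless noted). First, the 3D "exploded view": a stripped-down React Three Fiber component that fans the segmented layers out along the z-axis. The real component also handles selection, animation, and op nodes.

```tsx
import { Suspense } from "react";
import { Canvas, useLoader } from "@react-three/fiber";
import { TextureLoader } from "three";
import type { Register } from "./types"; // hypothetical module holding the shapes sketched earlier

// One floating plane per Register, offset along z to create the exploded view.
function RegisterLayer({ register, index }: { register: Register; index: number }) {
  const texture = useLoader(TextureLoader, register.maskUrl); // transparent cut-out PNG
  return (
    <mesh position={[0, 0, index * 0.4]}>
      <planeGeometry args={[4, 3]} />
      <meshBasicMaterial map={texture} transparent />
    </mesh>
  );
}

export function ExplodedView({ registers }: { registers: Register[] }) {
  return (
    <Canvas camera={{ position: [3, 2, 6] }}>
      <Suspense fallback={null}>
        {registers.map((r, i) => (
          <RegisterLayer key={r.id} register={r} index={i} />
        ))}
      </Suspense>
    </Canvas>
  );
}
```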
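
For the AI-reasoning piece, the key is asking Gemini for strict JSON rather than prose. A sketch of that call with the `@google/genai` SDK (the schema is abbreviated, and in production the request goes through our Lambda proxy rather than holding the key client-side):

```typescript
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Ask Gemini 3 to plan edits as a structured "Reality Program" instead of free text.
export async function planEdits(userIntent: string, registerLabels: string[]) {
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview",
    contents: `Scene objects: ${registerLabels.join(", ")}.\nUser request: ${userIntent}`,
    config: {
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          ops: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                kind: { type: Type.STRING },
                target: { type: Type.STRING },
              },
            },
          },
          wires: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                from: { type: Type.STRING },
                to: { type: Type.STRING },
              },
            },
          },
        },
      },
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```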
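
Segmentation itself is one call to FAL. A minimal sketch with the `@fal-ai/client` package; the SAM-3 endpoint ID and the input/output field names below are placeholders, not the published schema:

```typescript
import { fal } from "@fal-ai/client";

// Credentials are injected server-side; the browser talks to our Lambda proxy instead.
fal.config({ credentials: process.env.FAL_KEY });

// Segment an uploaded image into candidate "Registers".
// NOTE: "fal-ai/sam3" and the field names below are illustrative placeholders.
export async function segmentImage(imageUrl: string) {
  const result = await fal.subscribe("fal-ai/sam3", {
    input: { image_url: imageUrl },
    logs: false,
  });
  // Each returned mask becomes one Register in the Reality Program.
  return result.data;
}
```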
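
The backend is a thin pass-through. A simplified sketch of the proxy, assuming a Node.js Lambda handler (API Gateway / Function URL event shape) with the Gemini key in an environment variable; error handling and the FAL route are omitted:

```typescript
import type { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";

const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent";

// Forwards the client's request body to Gemini, attaching the secret key server-side
// so it never ships to the browser.
export const handler = async (
  event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  const upstream = await fetch(GEMINI_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": process.env.GEMINI_API_KEY ?? "",
    },
    body: event.body ?? "{}",
  });

  return {
    statusCode: upstream.status,
    headers: { "Content-Type": "application/json" },
    body: await upstream.text(),
  };
};
```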
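
Finally, the Zustand store is what both the chat panel and the 3D canvas read from, which is what keeps them in sync. A condensed version, reusing the hypothetical Register/Op/Wire shapes sketched earlier:

```typescript
import { create } from "zustand";
import type { Register, Op, Wire } from "./types"; // hypothetical module with the shapes above

// Minimal store for the Reality Program graph; the real store also tracks
// selection, chat history, and pending AI requests.
interface RealityState {
  registers: Register[];
  ops: Op[];
  wires: Wire[];
  setRegisters: (registers: Register[]) => void;
  addOp: (op: Op) => void;
  addWire: (wire: Wire) => void;
}

export const useRealityStore = create<RealityState>((set) => ({
  registers: [],
  ops: [],
  wires: [],
  setRegisters: (registers) => set({ registers }),
  addOp: (op) => set((state) => ({ ops: [...state.ops, op] })),
  addWire: (wire) => set((state) => ({ wires: [...state.wires, wire] })),
}));
```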

Challenges we ran into

  • Visualizing Abstract Concepts: Translating the abstract idea of "editing reality" into a concrete UI was difficult. We iterated on several designs before landing on the "3D Layer Exploded View," which intuitively conveys that the image is made of parts.
  • Synchronizing AI & UI: Keeping the chat interface, the 3D canvas, and the underlying logic graph in sync was tricky. We had to ensure that when Gemini "thought" of a change, the UI immediately reflected the new nodes and connections.
  • Masking Precision: Getting clean object cutouts is crucial. Early models struggled, but switching to SAM-3 on FAL gave us the pixel-perfect segmentation needed for the "View Source" illusion.
  • Extreme Circumstances (Vibe Coding): Most of this application was "vibe coded" by Google Gemini 3 and Antigravity while traveling for medical tourism (hacking vet prices for my Corgi and securing 97% savings compared to SF or NYC). We faced spotty train Wi-Fi and overcrowded cabins with CO2 levels high enough to literally put us to sleep, but we kept shipping by queuing up tasks in Agent Orchestration Mode so the AI could continue building and deploying while we were offline or napping.
  • Also, Gemini 3/Antigravity deployed on a stack I don't actually know!

Accomplishments that we're proud of

  • The "Wow" Factor: The moment you rotate the camera and see your image split into 3D layers is genuinely magical.
  • Seamless Agentic Flow: We built a system where the AI doesn't just "do it"—it shows you how it's doing it by building a visual graph. This makes the AI feel like a collaborator rather than a black box.
  • Performance: Despite the heavy 3D rendering and AI calls, the app feels snappy and responsive, thanks to optimistic UI updates and React 19's performance features.

What we learned

  • Spatial UI is the future: Interacting with "files" (images) in 3D space opens up new intuitive workflows that 2D interfaces can't match.
  • Structured Output is Key: Getting Gemini to output strictly formatted JSON programs (Registers/Ops) instead of just text was a game-changer for building a reliable functional app.
  • Latency Matters: Proxying large image data adds time. We learned to use optimistic updates and "skeleton" nodes to keep the user engaged while the heavy AI processing happens in the background.
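
A concrete version of that last point, continuing the hypothetical `useRealityStore` and `planEdits` sketches from above: insert a placeholder "skeleton" op the moment the user hits enter, then swap in the real plan when Gemini responds.

```typescript
import { useRealityStore } from "./store"; // hypothetical store from the earlier sketch
import { planEdits } from "./gemini";      // hypothetical Gemini call from the earlier sketch
import type { Op } from "./types";

// Optimistic update: render a shimmering skeleton node immediately,
// then replace it once the (slow) AI planning call returns.
export async function runIntent(userIntent: string, labels: string[]) {
  const skeleton: Op = { id: `pending-${Date.now()}`, kind: "pending", params: {} };
  useRealityStore.getState().addOp(skeleton);

  const program = await planEdits(userIntent, labels);

  useRealityStore.setState((state) => ({
    ops: [...state.ops.filter((op) => op.id !== skeleton.id), ...program.ops],
    wires: [...state.wires, ...program.wires],
  }));
}
```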

What's next for RealityOS

  • Real Image Generation: Implementing the actual image synthesis steps (currently visualized as logic nodes). We plan to use Flux or Stable Diffusion Inpainting to execute the "Style" and "Remove" ops.
  • Video Support: "View Source" on videos to track objects across frames.
  • Multi-Modal Inputs: Combining voice, text, and gesture inputs for a true "Tony Stark" experience.
  • IoT Integration: Connecting "Reality Objects" to real-world devices (e.g., clicking a lamp in the video feed to turn off the real Hue light).

Built With

react, typescript, vite, three.js, react-three-fiber, tailwindcss, zustand, google-gemini, fal.ai, aws-lambda, aws-cdk, cloudfront
