Inspiration

We’ve all stared at a problem—a tangled nest of server cables, a shattered circuit board, or a complex structural crack—and wished we had a master systems engineer standing right next to us. Traditional AI vision tools are great at identifying "what" an object is, but they often fail at explaining "why" it's broken or "how" to fix it.

With the release of Gemini 3 Pro, we saw an opportunity to move past simple object detection and into the realm of multimodal forensic reasoning. We were inspired to build a tool that doesn't just see the world, but debugs it, using a high-budget thinking process to simulate failure points and safety protocols.

What it does

Reality Debugger is an advanced diagnostic utility that ingests high-resolution imagery (via live camera or file upload) and subjects it to an intensive "Logic Engine" analysis.

  • Visual Diagnosis: Breaks down the scene into engineering components and anomalies.
  • Root Cause Hypothesis: Uses Gemini 3's high thinking budget to trace symptoms back to their source (e.g., identifying that a bridge collapse started with a specific stress fracture).
  • Execution Plan: Generates a step-by-step resolution path.
  • Transparent Reasoning: Crucially, it reveals the "Underlying Reasoning" for every single step, allowing users to understand the AI's internal logic and learn from it (see the illustrative report shape after this list).
  • Safety First: Automatically generates high-priority safety warnings based on the identified environment.
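
For a sense of what the app renders, the structured report roughly follows the shape sketched below; the field names are illustrative stand-ins, not the exact production schema.

```ts
// Illustrative shape of the structured diagnosis report the UI renders.
// Field names are hypothetical; the real schema may differ.
interface DiagnosisReport {
  visualDiagnosis: string;          // components and anomalies identified in the scene
  rootCauseHypothesis: string;      // the traced origin of the failure
  executionPlan: Array<{
    step: string;                   // the action to take
    underlyingReasoning: string;    // why this step, surfaced directly to the user
  }>;
  safetyWarnings: string[];         // e.g. "hazardous battery chemicals", "glass shards"
}
```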

How we built it

The application is built on a modern React 19 stack with a custom Cyber-Industrial UI designed for field visibility.

  • AI Core: Powered by the Gemini 3 Pro model. We used the new thinkingConfig with a 15,000-token thinking budget, ensuring the model "thinks" deeply before producing its structured JSON output (see the sketch after this list).
  • Multimodal Ingestion: We implemented a robust image-processing pipeline that handles both live camera captures and high-resolution forensic uploads.
  • UI/UX: We developed a "Scanner Overlay" system using Tailwind animations to give users visual feedback that their "reality" is being actively analyzed at a molecular level.
  • Grounding: The system uses specialized "Master Systems Engineer" instructions to ensure technical accuracy and professional terminology.
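
Below is a minimal sketch of the analysis call, assuming the @google/genai JavaScript SDK; the model id, API-key lookup, instruction text, and prompt wording are illustrative placeholders rather than our exact production values.

```ts
import { GoogleGenAI } from "@google/genai";

// Placeholder for the "Master Systems Engineer" instructions (the real text is longer).
const MASTER_ENGINEER_INSTRUCTIONS =
  "You are a master systems engineer. Identify anomalies, hypothesize root causes, " +
  "and return a structured JSON diagnosis with explicit safety warnings.";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// base64Image is raw base64 (no data: prefix) produced by the ingestion pipeline.
async function debugReality(base64Image: string) {
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview", // illustrative model id
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: base64Image } },
      { text: "Diagnose this scene and return the full report as JSON." },
    ],
    config: {
      systemInstruction: MASTER_ENGINEER_INSTRUCTIONS,
      responseMimeType: "application/json",      // request structured JSON output
      thinkingConfig: { thinkingBudget: 15000 }, // the 15,000-token thinking budget
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```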

Challenges we ran into

One of the primary challenges was ensuring the model didn't just provide generic advice. We had to iterate on the System Instructions to force the model to hunt for anomalies rather than simply describe the scene.
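
As a paraphrased illustration (not our exact wording), the directive we converged on reads more like this:

```ts
// Paraphrased excerpt of the anomaly-hunting directive added to the system instructions.
const ANOMALY_DIRECTIVE =
  "Do not merely describe the scene. Assume something in the image is broken, " +
  "stressed, or mis-assembled. Enumerate every anomaly you can detect, rank them " +
  "by severity, and trace each back to a plausible root cause before proposing a fix.";
```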

Another hurdle was CORS-friendly image processing. When loading high-resolution benchmark samples from cloud storage, we had to implement a cross-origin canvas bridge that converts those URLs into base64 data strings the Gemini API can ingest without tripping the browser's canvas-tainting restrictions.
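
A condensed sketch of that bridge, assuming the storage bucket serves permissive CORS headers (the helper name is ours, not a library API):

```ts
// Loads a remote image and returns raw base64 the Gemini API can ingest as inlineData.
// Assumes the host sends Access-Control-Allow-Origin, so the canvas is not tainted.
async function urlToBase64(url: string): Promise<string> {
  const img = new Image();
  img.crossOrigin = "anonymous"; // request a CORS-enabled fetch
  await new Promise<void>((resolve, reject) => {
    img.onload = () => resolve();
    img.onerror = () => reject(new Error(`Failed to load ${url}`));
    img.src = url;
  });

  const canvas = document.createElement("canvas");
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  canvas.getContext("2d")!.drawImage(img, 0, 0);

  // Strip the "data:image/jpeg;base64," prefix; the API expects bare base64.
  return canvas.toDataURL("image/jpeg").split(",")[1];
}
```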

Accomplishments that we're proud of

  • Reasoning Transparency: We successfully pulled back the curtain on AI "black box" logic. Seeing the "Underlying Reasoning" block for a structural repair step feels like having a real conversation with an expert.
  • The Scanner UX: The aesthetic transition from a raw camera feed to a "Cyber-Border" analyzed result with the laser scan effect creates a powerful sense of utility and precision.
  • Safety Integration: The model’s ability to recognize a "Shattered Hardware" scenario and immediately flag "hazardous battery chemicals" or "glass shards" demonstrates the safety potential of multimodal reasoning.

What we learned

We discovered that Thinking Budgets are a game-changer for technical vision tasks. In early tests with low reasoning budgets, the model would often suggest "buy a new one." With a 15,000-token budget, the model actually analyzes the screws, solder joints, and stress lines, suggesting specific re-seating or reinforcement techniques. High-budget reasoning turns AI from an assistant into a consultant.

What's next for REALITY DEBUGGER

  • Live API Integration: Moving from static snapshots to a continuous real-time video stream using the Gemini Live API for "Heads-Up" debugging.
  • AR Glasses Port: Imagine wearing AR glasses that highlight the "root cause" of a mechanical failure in your field of view as you look at a machine.
  • Collaborative War Rooms: A feature where multiple engineers can upload different angles of a disaster site to build a shared 3D "Logic Map" of the failure.
  • Schematic Grounding: Allowing users to upload a PDF blueprint alongside an image so Gemini can compare "as-built" vs "as-broken."

Built With

  • cyber-industrial-ui
  • gemini-3-pro
  • json
  • react
  • tailwind