Kitchen Studio

An AI-powered cooking co-pilot that turns videos into real cooking experiences


Inspiration

Kitchen Studio was born from frustration and love.

I love cooking. But in the era of 60-second TikToks and Instagram Reels, cooking became chaotic. Ingredients flash by. Measurements disappear. Steps blur together. Inspiration is everywhere but clarity is not.

The turning point came when my project partner and I discovered the Shipyard RevenueCat Hackathon.

We learned we would be building for influencers, and among them was a professional cook whose craft we genuinely admire. That’s when something clicked. We weren’t just building an app; we had an opportunity to solve a problem we had personally experienced for years.

Creators are incredible at inspiring us. But home cooks don’t need more inspiration; they need translation.

So we built Kitchen Studio, a Kitchen OS powered by AI that transforms fast, chaotic cooking content into structured recipes, organized ingredients, and real-time guidance.

It bridges the gap between “I saw this online” and “I made this perfectly.” Because cooking should feel joyful, not overwhelming.


Key Features

Real-Time AI Cooking Guidance

  • Camera detects food, pans, and cooking stages
  • AI provides timing, heat, and texture feedback
  • Hands-free voice guidance while cooking

Video → Interactive Cooking

  • Import videos from TikTok, Instagram, YouTube, or X
  • Automatically extract:
    • Ingredients
    • Steps
    • Cooking logic
  • Turn passive videos into guided cook-alongs

Cooking Intelligence

  • Learns from user behavior and inventory
  • Tracks detailed usage history (what you used vs. what you have)
  • Predicts when ingredients will run out based on consumption patterns

Smart Recipe Discovery

  • Hyper-Personalized Suggestions: Uses Gemini 2.0 Flash to generate recipes based on:
    • Dietary preferences (e.g., Keto, Vegan)
    • Cooking skill level
    • Health goals (e.g., High Protein)
  • "Cook From Pantry" Mode: Instantly finds recipes matching your exact available ingredients.
  • Natural Language Search: "I want something spicy with chicken" works instantly.

Kitchen Studio introduces Zero-UI cooking, where the phone watches, listens, and assists instead of demanding attention.


How We Built It: Kitchen Studio Technical Deep-Dive

The Kitchen OS Architecture

Kitchen Studio runs on a four-layer architecture designed for spatial awareness, real-time intelligence, and deterministic interaction.

Perception Layer (Spatial AI)

Using ViroReact and Expo AR, we anchor:

  • 3D interactive elements (timers, checkpoints)
  • Ingredient status cards
  • Interactive step nodes

directly onto real-world kitchen surfaces. The interface doesn’t sit on a screen; it lives on your countertop.
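
To make the anchoring concrete, here is a trimmed-down sketch of an AR scene that pins a step card to a detected countertop plane. It assumes the @reactvision/react-viro package; the StepCardScene component and its exact props are illustrative, not our production code.

```tsx
// Minimal sketch: anchor a step card to a detected horizontal surface.
// Assumes @reactvision/react-viro; StepCardScene and its props are illustrative.
import React from 'react';
import {
  ViroARScene,
  ViroARPlaneSelector,
  ViroNode,
  ViroText,
} from '@reactvision/react-viro';

export function StepCardScene({ stepText }: { stepText: string }) {
  return (
    <ViroARScene>
      {/* Let the user tap a detected horizontal surface (the countertop). */}
      <ViroARPlaneSelector minHeight={0.3} minWidth={0.3}>
        {/* Everything inside this node stays anchored to that surface. */}
        <ViroNode position={[0, 0.05, 0]}>
          <ViroText
            text={stepText}
            scale={[0.2, 0.2, 0.2]}
            style={{ fontSize: 24, color: '#ffffff' }}
          />
        </ViroNode>
      </ViroARPlaneSelector>
    </ViroARScene>
  );
}
```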


Intelligence Layer (Multimodal AI)

We integrated Google Gemini 2.0 Flash Experimental (Client-Side) and Gemini 3.0 Flash Preview (Edge Functions).

This powers:

  • Real-time voice guidance
  • Context-aware Q&A
  • Automated Video → Recipe extraction (Ingredients, Steps, Pro tips)
  • Ingredient density normalization

Gemini 2.0 Flash handles the ultra-low latency multimodal streaming for the live assistant. Gemini 3.0 Flash handles the complex reasoning required to parse unstructured video content into structured recipe JSON.

Together, they act as a live cooking co-pilot.


Logic Layer (State Management)

Cooking is not linear. It is reactive.

We used XState to power our ARStateMachine, managing transitions between:

  • Surface scanning
  • Step progression
  • Audio playback queues
  • Voice interaction states

Finite State Machines ensured predictable behavior in a highly interactive AR environment. This is what makes the experience feel stable and intentional, not experimental.
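
A stripped-down version of that machine, assuming XState v5, looks roughly like this; the state and event names are examples rather than our exact schema.

```typescript
// Illustrative sketch of the kind of finite state machine behind our ARStateMachine.
// Assumes XState v5; state and event names are examples, not our exact schema.
import { createMachine, createActor } from 'xstate';

const arCookingMachine = createMachine({
  id: 'arCookingSession',
  initial: 'scanningSurface',
  states: {
    scanningSurface: {
      on: { SURFACE_ANCHORED: 'stepActive' },
    },
    stepActive: {
      on: {
        STEP_COMPLETE: 'stepActive', // re-enter for the next step in the timeline
        ASK_QUESTION: 'listening',
        RECIPE_DONE: 'finished',
      },
    },
    listening: {
      on: { ANSWER_SPOKEN: 'stepActive' }, // return to the step after voice Q&A
    },
    finished: { type: 'final' },
  },
});

// Every transition is explicit, so AR, audio, and voice states never drift apart.
const actor = createActor(arCookingMachine).start();
actor.send({ type: 'SURFACE_ANCHORED' });
```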


Data Layer (Backend Infrastructure)

Built on Supabase (PostgreSQL), our backend manages:

  • User inventory with density-aware tracking
  • Cookbook and Recipe provenance
  • Parsed recipe structures
  • Gamification mechanics (XP, streaks, levels)

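As an illustration, a single density-aware deduction against that inventory data might be written like this with supabase-js; the table and column names (inventory, quantity_grams) are assumptions about our schema rather than the exact one.

```typescript
// Hedged sketch of a density-aware inventory deduction with supabase-js.
// Table and column names are assumptions; env variable names are illustrative.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.EXPO_PUBLIC_SUPABASE_URL!,
  process.env.EXPO_PUBLIC_SUPABASE_ANON_KEY!,
);

export async function deductIngredient(userId: string, ingredient: string, grams: number) {
  // Read the current stock, subtract the amount used, and write it back.
  const { data, error } = await supabase
    .from('inventory')
    .select('id, quantity_grams')
    .eq('user_id', userId)
    .eq('ingredient', ingredient)
    .single();
  if (error || !data) return;

  await supabase
    .from('inventory')
    .update({ quantity_grams: Math.max(0, data.quantity_grams - grams) })
    .eq('id', data.id);
}
```
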
Supabase Edge Functions power the heavy lifting:

  • extract-recipe-from-video: Orchestrates the Gemini 3.0 analysis of social media URLs (sketched below).
  • cooking-assistant: A WebSocket relay infrastructure for scalable AI interaction.
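
A rough sketch of the shape of that first function, assuming the Deno runtime that Supabase Edge Functions use; the request fields and the extractRecipe helper are hypothetical placeholders, not the real contract.

```typescript
// Rough shape of the extract-recipe-from-video Edge Function (Deno runtime).
// The request/response fields and the extractRecipe helper are hypothetical.
Deno.serve(async (req) => {
  const { videoUrl } = await req.json();

  // 1. Validate the URL and resolve the platform (TikTok, Instagram, YouTube, X).
  // 2. Ask Gemini to turn the video content into structured recipe JSON.
  // 3. Persist the parsed recipe and return it to the client.
  const recipe = await extractRecipe(videoUrl);

  return new Response(JSON.stringify({ recipe }), {
    headers: { 'Content-Type': 'application/json' },
  });
});

// Hypothetical placeholder: the real helper wraps the Gemini call shown under Step 2 below.
async function extractRecipe(videoUrl: string) {
  return { title: 'Parsed recipe', ingredients: [], steps: [], sourceUrl: videoUrl };
}
```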

The Video → Interactive Cooking Engine

Scrolling is passive. Cooking is active.

We built a pipeline that converts short-form cooking videos into structured, executable guides.

Step 1: Multimodal Parsing

Users paste URLs from TikTok, Instagram, YouTube, or X. The app sends this to our extract-recipe-from-video Edge Function.

Step 2: AI Extraction

Using Gemini 3.0 Flash, we extract:

  • Ingredients (normalized to standard units)
  • Measurements
  • Sequential steps
  • Timing cues
  • Implied techniques

The result is structured JSON ready for execution.
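
Here is a hedged sketch of what that extraction call can look like with the @google/genai SDK. The model ID, prompt, and schema fields are illustrative, and the production function also handles fetching the video content per platform.

```typescript
// Hedged sketch of the structured extraction call via @google/genai.
// Model ID and schema fields are illustrative, not the exact production values.
import { GoogleGenAI, Type } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: Deno.env.get('GEMINI_API_KEY') ?? '' });

export async function parseRecipe(videoContext: string) {
  const response = await ai.models.generateContent({
    model: 'gemini-3-flash-preview', // assumption: the exact preview model ID may differ
    contents: `Extract a structured recipe from this cooking video:\n${videoContext}`,
    config: {
      responseMimeType: 'application/json',
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          ingredients: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                name: { type: Type.STRING },
                quantity: { type: Type.NUMBER },
                unit: { type: Type.STRING },
              },
            },
          },
          steps: { type: Type.ARRAY, items: { type: Type.STRING } },
        },
      },
    },
  });

  // The model returns JSON conforming to the schema above.
  return JSON.parse(response.text ?? '{}');
}
```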

Step 3: TimelineEngine Execution

The parsed data feeds into our custom TimelineEngine (TypeScript), which orchestrates:

  • Step-based spatial cards
  • Voice-guided prompts via Expo Speech (TTS)
  • Automated state transitions
  • Auto-deduction of ingredients from inventory upon step start

This transforms content into action.
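
A simplified sketch of the engine’s step loop, using expo-speech for the voice prompts; the RecipeStep shape and the advance() method are illustrative, and the real engine also drives the AR cards and inventory deduction.

```typescript
// Simplified sketch of the TimelineEngine step loop.
// RecipeStep and advance() are illustrative names, not the full production engine.
import * as Speech from 'expo-speech';

interface RecipeStep {
  instruction: string;
  durationSeconds?: number; // optional timer cue extracted from the video
}

export class TimelineEngine {
  private index = 0;
  constructor(private steps: RecipeStep[]) {}

  current(): RecipeStep {
    return this.steps[this.index];
  }

  /** Speak the current step aloud, then move the cursor forward. */
  advance(): void {
    const step = this.current();
    Speech.speak(step.instruction, { rate: 0.95 });
    if (this.index < this.steps.length - 1) this.index += 1;
  }
}
```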


The Discovery Engine: Context-Aware Suggestions

Static recipe feeds are boring. We built a dynamic discovery engine powered by Gemini 2.0 Flash.

It considers:

  • User Profile: Dietary restrictions, skill level, and health goals.
  • Pantry Inventory: What ingredients are actually available.
  • Current Context: Time of day ("Breakfast ideas"), trending topics.

Instead of searching a database, we ask Gemini to generate structured recipe suggestions that match these constraints perfectly. The result is a hyper-personalized feed that adapts to what you have and who you are.
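
In practice, that means assembling a constrained prompt from the profile, pantry, and context before calling Gemini 2.0 Flash. A minimal sketch, with assumed field names:

```typescript
// Illustrative sketch of how a discovery prompt is assembled from profile,
// pantry, and context. Field names here are assumptions, not our exact types.
interface DiscoveryContext {
  diet: string[];        // e.g. ['Keto']
  skillLevel: string;    // e.g. 'Beginner'
  healthGoals: string[]; // e.g. ['High Protein']
  pantry: string[];      // ingredient names currently in inventory
  mealTime: string;      // e.g. 'Breakfast'
}

export function buildDiscoveryPrompt(ctx: DiscoveryContext): string {
  return [
    `Suggest 5 recipes as JSON for a ${ctx.skillLevel} cook.`,
    `Dietary restrictions: ${ctx.diet.join(', ') || 'none'}.`,
    `Health goals: ${ctx.healthGoals.join(', ') || 'none'}.`,
    `Only use ingredients from this pantry: ${ctx.pantry.join(', ')}.`,
    `It is currently ${ctx.mealTime}.`,
  ].join('\n');
}
```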


Real-Time Voice Cooking Assistant

Standard REST APIs were too slow for a "live assistant" experience.

We implemented a direct WebSocket connection to the Gemini Multimodal Live API. Using the google/genai SDK directly in React Native, we:

  • Stream audio to the model in real time
  • Receive instantaneous text and audio responses
  • Manage response queues for natural conversation flow

This created a low-latency conversational cooking assistant that feels immediate and natural.
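
A hedged sketch of how such a live session can be opened with the SDK; the exact Live API option names vary between SDK versions, and the env variable and callback handling here are illustrative.

```typescript
// Hedged sketch of opening a live session. Exact Live API option names vary
// between @google/genai versions; the env variable and handlers are illustrative.
import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({
  apiKey: process.env.EXPO_PUBLIC_GEMINI_API_KEY ?? '', // hypothetical env variable name
});

export async function startCookingSession(onReply: (text: string) => void) {
  const session = await ai.live.connect({
    model: 'gemini-2.0-flash-exp', // the experimental live multimodal model
    config: { responseModalities: [Modality.TEXT] },
    callbacks: {
      onmessage: (msg) => {
        // Stream partial responses into the voice/UI queue as they arrive.
        const part = msg.serverContent?.modelTurn?.parts?.[0];
        if (part?.text) onReply(part.text);
      },
      onerror: (e) => console.warn('live session error', e),
      onclose: () => console.log('live session closed'),
    },
  });

  // Push user turns (e.g. transcribed questions) over the open socket.
  session.sendClientContent({
    turns: [{ role: 'user', parts: [{ text: 'How do I know the onions are done?' }] }],
  });

  return session;
}
```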


Solving the "Salt Math" Problem

Cooking math is messy. How do you deduct "2 teaspoons" of salt from inventory stored in grams? Volume-to-weight conversion depends on ingredient density.

Our Solution: UnitConversionService

We built:

  • A local Density Mapping database with cloud sync (ingredient_profiles)
  • A normalization pipeline (normalizing "cups" -> "ml" -> "g" based on specific ingredient density)
  • Canonical unit conversion logic

For example, our system knows that:

  • 1 cup of Rice ≈ 190g
  • 1 cup of Flour ≈ 120g
  • 1 cup of Salt ≈ 280g

Every deduction is mathematically accurate. No guessing. No rounding chaos. No broken inventory logic.
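
A minimal sketch of that conversion logic, using the same example densities as above; the real service reads densities from ingredient_profiles rather than a hard-coded map.

```typescript
// Minimal sketch of the density-aware conversion behind UnitConversionService.
// Densities mirror the examples above; production values live in ingredient_profiles.
const ML_PER_UNIT: Record<string, number> = {
  teaspoon: 4.93,
  tablespoon: 14.79,
  cup: 236.59,
  ml: 1,
};

// grams per millilitre, derived from grams-per-cup profiles
const DENSITY_G_PER_ML: Record<string, number> = {
  rice: 190 / 236.59,
  flour: 120 / 236.59,
  salt: 280 / 236.59,
};

/** Convert a recipe quantity (e.g. "2 teaspoons of salt") into grams for deduction. */
export function toGrams(quantity: number, unit: string, ingredient: string): number {
  const ml = quantity * (ML_PER_UNIT[unit] ?? 1);
  const density = DENSITY_G_PER_ML[ingredient] ?? 1; // fall back to water-like density
  return ml * density;
}

// 2 teaspoons of salt ≈ 11.7 g deducted from a gram-based inventory.
console.log(toGrams(2, 'teaspoon', 'salt').toFixed(1));
```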


The Tech Stack

Frontend

  • React Native (Expo)
  • TypeScript
  • ViroReact (AR)

Backend

  • Supabase (PostgreSQL)
  • Supabase Edge Functions (Deno)

AI Models

  • Google Gemini 2.0 Flash Experimental (Live Multimodal Assistant)
  • Google Gemini 3.0 Flash Preview (Video Extraction & Parsing)

State Management

  • XState (Finite State Machines)
  • Zustand (Global Store)

Media

  • expo-av (Audio buffering & playback)
  • expo-camera (Scanning & perception input)
  • expo-speech (Text-to-Speech)

The Result

Kitchen Studio is not just an app. It is a spatial, AI-powered cooking operating system that:

  • Sees
  • Understands
  • Guides
  • Calculates
  • And cooks with the user

We built a bridge between digital inspiration and physical execution, powered by AR, multimodal AI, and real-time intelligence.

Accomplishments We’re Proud Of

Our First App Ever

We are incredibly proud to have built and completed our first-ever mobile application. From initial concept to a finished product, we successfully navigated the entire development lifecycle.

New Interaction Model

We created a hands-free interaction model for cooking that moves beyond static text to dynamic AR and voice guidance.

Scalable Infrastructure

We engineered a foundation using Supabase and Expo that is ready to grow far beyond this initial prototype.


What We Learned

Confidence Over Content

Users don't just want more recipes; they want the confidence to execute the ones they see.

Simplicity Unlocks Power

A clean, simple UX is the key to making complex AI logic (like Gemini and Spatial AR) accessible to everyday users.

Resilience in Learning

As first-time mobile app developers, we learned that technical roadblocks are just opportunities to understand the underlying native systems more deeply.


What’s Next for Kitchen Studio

Live Multi-Camera Sessions

Enabling broadcasters to show overhead and stove-side angles for better visual context.

Hardware Integration

Expanding the "Kitchen OS" to connect with smart stoves, thermometers, AR glasses, and scales for automated guidance.

Skill-Based Progression

Implementing a gamified "Culinary RPG" where users earn XP and badges based on the technical skills they master.

Offline Edge Inference

Moving vision and voice models to on-device processing to ensure a smooth experience even in kitchens with poor Wi-Fi.

Built With

  • deno
  • docker
  • expo-av
  • expo-camera
  • expo-haptics
  • expo-speech
  • expo.io
  • google-console
  • google-gemini-2.0-flash
  • google-gemini-3.0-flash
  • google-genai-sdk
  • google-play
  • google-sign-in
  • hono
  • lucide
  • native
  • openai-api
  • postgresql
  • react
  • react-native
  • revenuecat
  • supabase
  • supabase-edge-functions
  • tanstack-query
  • trpc
  • typescript
  • viroreact (AR)
  • xstate
  • zustand