Kitchen Studio
An AI-powered cooking co-pilot that turns videos into real cooking experiences
Inspiration
Kitchen Studio was born from frustration and love.
I love cooking. But in the era of 60-second TikToks and Instagram Reels, cooking became chaotic. Ingredients flash by. Measurements disappear. Steps blur together. Inspiration is everywhere but clarity is not.
The turning point came when my project partner and I discovered the Shipyard RevenueCat Hackathon.
We learned we would be building for influencers, and among them was a professional cook whose craft we genuinely admire. That’s when something clicked: we weren’t just building an app. We had the opportunity to solve a problem we had personally experienced for years.
Creators are incredible at inspiring us. But home cooks don’t need more inspiration, they need translation.
So we built Kitchen Studio, a Kitchen OS powered by AI that transforms fast, chaotic cooking content into structured recipes, organized ingredients, and real-time guidance.
It bridges the gap between “I saw this online” and “I made this perfectly.” Because cooking should feel joyful, not overwhelming.
Key Features
Real-Time AI Cooking Guidance
- Camera detects food, pans, and cooking stages
- AI provides timing, heat, and texture feedback
- Hands-free voice guidance while cooking
Video → Interactive Cooking
- Import videos from TikTok, Instagram, YouTube, or X
- Automatically extract:
- Ingredients
- Steps
- Cooking logic
- Turn passive videos into guided cook-alongs
Cooking Intelligence
- Learns from user behavior and inventory
- Tracks detailed usage history (what you used vs. what you have)
- Predicts when ingredients will run out based on consumption patterns
Smart Recipe Discovery
- Hyper-Personalized Suggestions: Uses Gemini 2.0 Flash to generate recipes based on:
- Dietary preferences (e.g., Keto, Vegan)
- Cooking skill level
- Health goals (e.g., High Protein)
- "Cook From Pantry" Mode: Instantly finds recipes matching your exact available ingredients.
- Natural Language Search: "I want something spicy with chicken" works instantly.
Kitchen Studio introduces Zero-UI cooking, where the phone watches, listens, and assists instead of demanding attention.
How We Built It: Kitchen Studio Technical Deep-Dive
The Kitchen OS Architecture
Kitchen Studio runs on a four-layer architecture designed for spatial awareness, real-time intelligence, and deterministic interaction.
Perception Layer (Spatial AI)
Using ViroReact and Expo AR, we anchor:
- 3D interactive elements (timers, checkpoints)
- Ingredient status cards
- Interactive step nodes
directly onto real-world kitchen surfaces. The interface doesn’t sit on a screen; it lives on your countertop.
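To make that concrete, here is a minimal sketch of how a step card and timer might be anchored to a countertop with ViroReact. The component names come from the ViroReact library (shown via the @reactvision/react-viro package); the card contents and layout are illustrative, not our production scene.

```tsx
// Minimal sketch: anchoring a step card and timer onto a detected surface.
// Component names come from ViroReact; the card contents are illustrative.
import React from "react";
import {
  ViroARScene,
  ViroARSceneNavigator,
  ViroARPlaneSelector,
  ViroNode,
  ViroText,
} from "@reactvision/react-viro";

const StepCardScene = () => (
  <ViroARScene>
    {/* Let the user tap a detected plane (the countertop) to place the card */}
    <ViroARPlaneSelector>
      <ViroNode position={[0, 0.05, 0]}>
        <ViroText
          text="Step 2: Sear the chicken, 3 min per side"
          scale={[0.2, 0.2, 0.2]}
          position={[0, 0.1, 0]}
        />
        <ViroText text="03:00" scale={[0.15, 0.15, 0.15]} position={[0, 0, 0]} />
      </ViroNode>
    </ViroARPlaneSelector>
  </ViroARScene>
);

export const KitchenARView = () => (
  <ViroARSceneNavigator initialScene={{ scene: StepCardScene }} />
);
```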
Intelligence Layer (Multimodal AI)
We integrated Google Gemini 2.0 Flash Experimental (Client-Side) and Gemini 3.0 Flash Preview (Edge Functions).
This powers:
- Real-time voice guidance
- Context-aware Q&A
- Automated Video → Recipe extraction (Ingredients, Steps, Pro tips)
- Ingredient density normalization
Gemini 2.0 Flash handles the ultra-low latency multimodal streaming for the live assistant. Gemini 3.0 Flash handles the complex reasoning required to parse unstructured video content into structured recipe JSON.
Together, they act as a live cooking co-pilot.
Logic Layer (State Management)
Cooking is not linear. It is reactive.
We used XState to power our ARStateMachine, managing transitions between:
- Surface scanning
- Step progression
- Audio playback queues
- Voice interaction states
Finite State Machines ensured predictable behavior in a highly interactive AR environment. This is what makes the experience feel stable and intentional, not experimental.
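A pared-down sketch of that idea: a machine whose states mirror the phases above. The state and event names here are illustrative stand-ins, not our actual ARStateMachine.

```typescript
// Simplified sketch of an AR cooking state machine using XState.
// States and events are illustrative placeholders.
import { createMachine } from "xstate";

export const arCookingMachine = createMachine({
  id: "arCooking",
  initial: "scanningSurface",
  states: {
    scanningSurface: {
      on: { SURFACE_FOUND: "stepActive" },
    },
    stepActive: {
      on: {
        STEP_COMPLETE: "stepActive", // advance to the next step card
        VOICE_QUERY: "listening",    // user asks the assistant a question
        AUDIO_QUEUED: "speaking",    // TTS or assistant audio starts playing
        RECIPE_DONE: "finished",
      },
    },
    listening: {
      on: { RESPONSE_READY: "speaking", CANCEL: "stepActive" },
    },
    speaking: {
      on: { AUDIO_DONE: "stepActive" },
    },
    finished: { type: "final" },
  },
});
```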
Data Layer (Backend Infrastructure)
Built on Supabase (PostgreSQL), our backend manages:
- User inventory with density-aware tracking
- Cookbook and Recipe provenance
- Parsed recipe structures
- Gamification mechanics (XP, streaks, levels)
Supabase Edge Functions handle the heavy lifting:
- extract-recipe-from-video: Orchestrates the Gemini 3.0 analysis of social media URLs.
- cooking-assistant: A WebSocket relay infrastructure for scalable AI interaction.
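As a rough sketch, extract-recipe-from-video looks something like the handler below. The Gemini call is simplified (the model ID string is a placeholder, and the real function does more work to fetch and supply the video content); the point is the orchestration shape, not production code.

```typescript
// Sketch of the extract-recipe-from-video Edge Function (Deno runtime).
// The model ID and prompt are illustrative; see the @google/genai docs for
// the exact request shape for video inputs.
import { GoogleGenAI } from "npm:@google/genai";

const ai = new GoogleGenAI({ apiKey: Deno.env.get("GEMINI_API_KEY")! });

Deno.serve(async (req) => {
  const { url } = await req.json(); // TikTok / Instagram / YouTube / X link

  // Ask the model for structured recipe JSON extracted from the video.
  const result = await ai.models.generateContent({
    model: "gemini-3-flash-preview", // placeholder ID, not verified
    contents:
      `Extract ingredients, measurements, ordered steps, timing cues and ` +
      `techniques from the cooking video at ${url}. Respond as JSON.`,
  });

  return new Response(result.text, {
    headers: { "Content-Type": "application/json" },
  });
});
```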
From Video → Interactive Cooking Engine
Scrolling is passive. Cooking is active.
We built a pipeline that converts short-form cooking videos into structured, executable guides.
Step 1: Multimodal Parsing
Users paste URLs from TikTok, Instagram, YouTube, or X.
The app sends this to our extract-recipe-from-video Edge Function.
Step 2: AI Extraction
Using Gemini 3.0 Flash, we extract:
- Ingredients (normalized to standard units)
- Measurements
- Sequential steps
- Timing cues
- Implied techniques
The result is structured JSON ready for execution.
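For illustration, the parsed output looks roughly like the shape below. Field names here are assumptions based on what the pipeline extracts, not our exact schema.

```typescript
// Illustrative shape of the structured JSON the extraction step produces.
interface ParsedIngredient {
  name: string;
  quantity: number;
  unit: string; // normalized canonical unit, e.g. "g" or "ml"
}

interface ParsedStep {
  index: number;
  instruction: string;
  durationSeconds?: number; // timing cue, if the video implies one
  technique?: string;       // e.g. "sear", "fold", "deglaze"
}

interface ParsedRecipe {
  title: string;
  ingredients: ParsedIngredient[];
  steps: ParsedStep[];
  proTips: string[];
}
```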
Step 3: TimelineEngine Execution
The parsed data feeds into our custom TimelineEngine (TypeScript), which orchestrates:
- Step-based spatial cards
- Voice-guided prompts via Expo Speech (TTS)
- Automated state transitions
- Auto-deduction of ingredients from inventory upon step start
This transforms content into action.
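A minimal sketch of the step loop, assuming the ParsedStep shape above: speak the instruction with expo-speech, wait out the step's timer, and deduct inventory when the step starts. The deductIngredients callback is a stand-in for our inventory logic.

```typescript
// Simplified TimelineEngine-style step runner using expo-speech.
import * as Speech from "expo-speech";

interface ParsedStep {
  index: number;
  instruction: string;
  durationSeconds?: number;
}

async function runStep(
  step: ParsedStep,
  deductIngredients: (stepIndex: number) => void
) {
  // Auto-deduct pantry items as soon as the step starts.
  deductIngredients(step.index);

  // Speak the instruction and resolve when TTS finishes.
  await new Promise<void>((resolve) =>
    Speech.speak(step.instruction, { onDone: resolve })
  );

  // Hold for the step's timer (e.g. "simmer for 10 minutes") before advancing.
  if (step.durationSeconds) {
    await new Promise((r) => setTimeout(r, step.durationSeconds * 1000));
  }
}
```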
The Discovery Engine: Context-Aware Suggestions
Static recipe feeds are boring. We built a dynamic discovery engine powered by Gemini 2.0 Flash.
It considers:
- User Profile: Dietary restrictions, skill level, and health goals.
- Pantry Inventory: What ingredients are actually available.
- Current Context: Time of day ("Breakfast ideas"), trending topics.
Instead of searching a database, we ask Gemini to generate structured recipe suggestions that match these constraints perfectly. The result is a hyper-personalized feed that adapts to what you have and who you are.
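A simplified sketch of how such a prompt can be assembled before it is sent to Gemini; the field names are illustrative.

```typescript
// Building a discovery prompt from profile, pantry, and context.
interface DiscoveryContext {
  dietaryPreferences: string[]; // e.g. ["keto"]
  skillLevel: "beginner" | "intermediate" | "advanced";
  healthGoals: string[];        // e.g. ["high protein"]
  pantry: string[];             // ingredient names currently in stock
  mealtime: "breakfast" | "lunch" | "dinner";
}

function buildDiscoveryPrompt(ctx: DiscoveryContext): string {
  return [
    `Suggest 5 ${ctx.mealtime} recipes as JSON.`,
    `Dietary preferences: ${ctx.dietaryPreferences.join(", ") || "none"}.`,
    `Skill level: ${ctx.skillLevel}. Health goals: ${ctx.healthGoals.join(", ") || "none"}.`,
    `Only use ingredients from this pantry: ${ctx.pantry.join(", ")}.`,
  ].join("\n");
}
```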
Real-Time Voice Cooking Assistant
Standard REST APIs were too slow for a "live assistant" experience.
We implemented a direct WebSocket connection to the Gemini Multimodal Live API.
Using the google/genai SDK directly in React Native, we:
- Stream microphone audio to the model in real time
- Receive instantaneous text and audio responses
- Manage response queues for natural conversation flow
This created a low-latency conversational cooking assistant that feels immediate and natural.
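One piece worth showing is the response queue. A simplified sketch, assuming assistant replies arrive as playable audio URIs: clips are queued and played back one at a time with expo-av so responses never talk over each other.

```typescript
// Simple playback queue for assistant audio replies (expo-av).
import { Audio } from "expo-av";

export class ResponseQueue {
  private queue: string[] = [];
  private playing = false;

  enqueue(uri: string) {
    this.queue.push(uri);
    void this.drain();
  }

  private async drain() {
    if (this.playing) return;
    this.playing = true;
    while (this.queue.length > 0) {
      const uri = this.queue.shift()!;
      const { sound } = await Audio.Sound.createAsync({ uri }, { shouldPlay: true });
      // Wait until this clip finishes before starting the next one.
      await new Promise<void>((resolve) => {
        sound.setOnPlaybackStatusUpdate((status) => {
          if (status.isLoaded && status.didJustFinish) resolve();
        });
      });
      await sound.unloadAsync();
    }
    this.playing = false;
  }
}
```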
Solving the "Salt Math" Problem
Cooking math is messy. How do you deduct "2 teaspoons" of salt from inventory stored in grams? Volume-to-weight conversion depends on ingredient density.
Our Solution: UnitConversionService
We built:
- A local Density Mapping database with cloud sync (ingredient_profiles)
- A normalization pipeline (normalizing "cups" -> "ml" -> "g" based on specific ingredient density)
- Canonical unit conversion logic
For example, our system knows that:
- 1 cup of Rice ≈ 190g
- 1 cup of Flour ≈ 120g
- 1 cup of Salt ≈ 280g
Every deduction is mathematically accurate. No guessing. No rounding chaos. No broken inventory logic.
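A stripped-down sketch of that conversion, using the densities above; the lookup tables stand in for the synced ingredient_profiles data.

```typescript
// Density-aware volume-to-weight conversion behind inventory deduction.
const ML_PER_UNIT: Record<string, number> = {
  tsp: 4.93,
  tbsp: 14.79,
  cup: 236.6,
  ml: 1,
};

// grams per millilitre, per ingredient
const DENSITY_G_PER_ML: Record<string, number> = {
  rice: 190 / 236.6,  // ≈ 0.80 g/ml  (1 cup ≈ 190 g)
  flour: 120 / 236.6, // ≈ 0.51 g/ml  (1 cup ≈ 120 g)
  salt: 280 / 236.6,  // ≈ 1.18 g/ml  (1 cup ≈ 280 g)
};

/** Convert a volume measurement of an ingredient into grams. */
export function toGrams(quantity: number, unit: string, ingredient: string): number {
  const ml = quantity * (ML_PER_UNIT[unit] ?? NaN);
  const density = DENSITY_G_PER_ML[ingredient.toLowerCase()] ?? NaN;
  return ml * density;
}

// Example: deducting "2 teaspoons of salt" from a gram-based inventory.
// toGrams(2, "tsp", "salt") ≈ 11.7 g
```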
The Tech Stack
Frontend
- React Native (Expo)
- TypeScript
- ViroReact (AR)
Backend
- Supabase (PostgreSQL)
- Supabase Edge Functions (Deno)
AI Models
- Google Gemini 2.0 Flash Experimental (Live Multimodal Assistant)
- Google Gemini 3.0 Flash Preview (Video Extraction & Parsing)
State Management
- XState (Finite State Machines)
- Zustand (Global Store)
Media
- expo-av (Audio buffering & playback)
- expo-camera (Scanning & perception input)
- expo-speech (Text-to-Speech)
The Result
Kitchen Studio is not just an app. It is a spatial, AI-powered cooking operating system that:
- Sees
- Understands
- Guides
- Calculates
- And cooks with the user
We built a bridge between digital inspiration and physical execution, powered by AR, multimodal AI, and real-time intelligence.
Accomplishments We’re Proud Of
Our First App Ever
We are incredibly proud to have built and completed our first-ever mobile application. From initial concept to a finished product, we successfully navigated the entire development lifecycle.
New Interaction Model
We created a hands-free interaction model for cooking that moves beyond static text to dynamic AR and voice guidance.
Scalable Infrastructure
We engineered a foundation using Supabase and Expo that is ready to grow far beyond this initial prototype.
What We Learned
Confidence Over Content
Users don't just want more recipes; they want the confidence to execute the ones they see.
Simplicity Unlocks Power
A clean, simple UX is the key to making complex AI logic (like Gemini and Spatial AR) accessible to everyday users.
Resilience in Learning
As first-time mobile app developers, we learned that technical roadblocks are just opportunities to understand the underlying native systems more deeply.
What’s Next for Kitchen Studio
Live Multi-Camera Sessions
Enabling broadcasters to show overhead and stove-side angles for better visual context.
Hardware Integration
Expanding the "Kitchen OS" to connect with smart stoves, thermometers, AR glasses, and scales for automated guidance.
Skill-Based Progression
Implementing a gamified "Culinary RPG" where users earn XP and badges based on the technical skills they master.
Offline Edge Inference
Moving vision and voice models to on-device processing to ensure a smooth experience even in kitchens with poor Wi-Fi.
Built With
- deno
- docker
- expo-av
- expo-camera
- expo-haptics
- expo-speech
- expo.io
- google-console
- google-gemini-2.0-flash
- google-gemini-3.0-flash
- google-genai-sdk
- google-play
- google-sign-in
- hono
- lucide
- native
- openai-api
- postgresql
- react
- react-native
- revenuecat
- supabase
- supabase-edge-functions
- tanstack-query
- trpc
- typescript
- viroreact-(ar)
- xstate
- zustand