ARGuide — Because Anyone Can Be a Hero

Inspiration
Picture this: you're at a family dinner and your grandfather collapses. Everyone freezes. Someone knows they should do CPR but can't remember the steps. Someone else is already Googling. Precious seconds pass.
Now picture the same scene — but this time someone puts on their Meta Ray-Ban glasses, says "help me do CPR", and a calm voice immediately begins guiding them while AR animations appear showing exactly where to press and how fast to pump. Completely hands-free. Eyes on the patient the entire time.
That's what we built. Not because it's a cool tech demo. Because that moment happens every single day to real families, and the difference between life and death is often just knowing what to do.
But we didn't stop at emergencies. We thought about all the quiet moments where people feel helpless without guidance:
- A grandmother alone trying to assemble furniture her grandkids sent her, instruction manual in hand but too confusing to follow
- A first-generation college student fixing their own car for the first time because they can't afford a mechanic
- A kid trying to cook a meal for their family with no one to show them how
These moments don't make the news. But they matter just as much. ARGuide is for all of them.
The Gap We're Bridging
Every AI assistant today gives you audio. Siri tells you. Alexa tells you. ChatGPT tells you. But for hands-on physical tasks, being told is not enough.
When you're performing CPR on someone, you don't need a voice telling you "press down 2 inches" — you need to see exactly where to place your hands and feel the rhythm. When you're assembling furniture, a diagram in a manual is confusing — but a 3D animation floating over the actual part in front of you is instantly clear.
ARGuide bridges the gap between audio-only AI guidance and the physical world through real-time augmented reality. We don't just tell you what to do — we show you, right on top of the thing you're looking at, in 3D, in real time.
What We Built
ARGuide is a real-time augmented reality assistant that sees what you see, understands what you need, speaks to you like a patient teacher, and overlays 3D animated guides directly onto the real world — completely hands-free.
It works on both iPhone and Meta Ray-Ban smart glasses, making it truly wearable. With Ray-Bans you never even need to hold your phone — the camera sees what you see, Gemini understands the situation, and guidance appears in your field of view while your hands stay completely free to act.
You don't need training. You don't need experience. You just need to look.
How We Built It
The AI Brain (Python + Gemini Live)
At the core is Google's Gemini Live API — a multimodal real-time model that simultaneously processes your voice and your camera feed. It doesn't just hear you, it sees your situation. It watches your hands, tracks your progress, and adapts its guidance in real time.
When Gemini determines you're ready for the next step, it calls a structured function tool that publishes AR instructions over LiveKit's data channel:
{
  "step": 1,
  "animation": "cpr_hands",
  "instruction": "Place both hands on center of chest, push down 2 inches"
}
It also calls draw_bounding_box to highlight specific objects or parts
the user should interact with next — drawing labeled boxes directly on
the camera feed.
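On the agent side, the payload above can be sketched as a small serialization helper that validates fields before publishing. This is a minimal illustration, not our actual code — the `ShowStep` dataclass name and the helper methods are assumptions; only the three JSON fields come from the payload shown above:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ShowStep:
    """Mirrors the step-instruction payload published over the data channel."""
    step: int
    animation: str
    instruction: str

    def to_payload(self) -> bytes:
        # UTF-8 JSON bytes, ready for a data-channel publish.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_payload(cls, payload: bytes) -> "ShowStep":
        # Re-validate the fields on the receiving side.
        obj = json.loads(payload.decode("utf-8"))
        return cls(step=int(obj["step"]),
                   animation=str(obj["animation"]),
                   instruction=str(obj["instruction"]))

msg = ShowStep(step=1, animation="cpr_hands",
               instruction="Place both hands on center of chest, push down 2 inches")
round_tripped = ShowStep.from_payload(msg.to_payload())
```

Typed round-tripping like this catches a malformed tool call on the agent before it ever reaches the headset.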
The Nervous System (LiveKit Cloud)
LiveKit handles all real-time communication — audio, video, and data — over WebRTC. It connects the Meta Ray-Ban camera feed and microphone to our Python agent, and sends AR instructions back to the iOS app in under a second.
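On the receiving end, one simple way to handle multiple instruction kinds (step overlays vs. bounding boxes) is to route each data-channel payload by a type tag. A stdlib-only sketch under assumptions — the `type` field, handler names, and message shapes are hypothetical; LiveKit itself just delivers the raw bytes:

```python
import json

def handle_show_step(msg: dict) -> str:
    # In the real app this would trigger the AR overlay for the step.
    return f"step {msg['step']}: {msg['instruction']}"

def handle_bounding_box(msg: dict) -> str:
    # In the real app this would draw a labeled box on the camera feed.
    return f"box '{msg['label']}' at {msg['rect']}"

HANDLERS = {
    "show_step": handle_show_step,
    "draw_bounding_box": handle_bounding_box,
}

def dispatch(payload: bytes) -> str:
    """Decode one data-channel payload and route it to its handler."""
    msg = json.loads(payload.decode("utf-8"))
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        raise ValueError(f"unknown message type: {msg.get('type')!r}")
    return handler(msg)

out = dispatch(json.dumps({
    "type": "show_step",
    "step": 1,
    "instruction": "Place both hands on center of chest",
}).encode("utf-8"))
```

Keeping the dispatch table flat makes adding a new instruction kind a one-line change on each side.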
The Eyes (iOS + ARKit + RealityKit)
The iPhone app uses ARKit's body tracking to anchor 3D animations to real people and surfaces in the camera view. When a step instruction arrives, a 3D model appears overlaid on the real world showing exactly what to do. No guessing. No interpreting diagrams. Just clear visual guidance right in front of you.
Challenges We Faced
We hit every wall you can hit at a hackathon.
The Gemini Live API only responded once and then went silent. We spent
hours debugging before discovering that session.send() with types.Content
was the wrong approach — the correct method was send_realtime_input() with
proper audio blobs and the model's built-in voice activity detection.
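The shape of the fix: stream raw PCM continuously in small chunks, hand each chunk to `send_realtime_input()` as an audio blob, and let the model's voice activity detection find the turn boundaries. A stdlib sketch of just the chunking side (the ~100 ms chunk size and 16 kHz 16-bit mono format are assumptions about a typical pipeline, not values from our code):

```python
def pcm_chunks(pcm: bytes, samples_per_chunk: int = 1600, bytes_per_sample: int = 2):
    """Split a 16-bit PCM buffer into fixed-size chunks (~100 ms at 16 kHz mono).

    Each chunk is what would be wrapped in an audio blob and passed to
    session.send_realtime_input(...) — rather than sending a single
    session.send() with types.Content, which only gets one response.
    """
    step = samples_per_chunk * bytes_per_sample
    for i in range(0, len(pcm), step):
        yield pcm[i:i + step]

one_second = bytes(16000 * 2)          # 1 s of silence: 16 kHz, 16-bit mono
chunks = list(pcm_chunks(one_second))  # 10 chunks of 3200 bytes each
```

The key behavioral difference is that the session stays open and keeps responding as chunks arrive, instead of treating the first message as a completed turn.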
Swift 6 strict concurrency was brutal. Every time we tried to pass data
from LiveKit's background delegate callbacks to our SwiftUI UI, the compiler
threw actor isolation errors. We rewrote the data pipeline three times before
landing on a clean ARStepManager singleton with proper Sendable conformances.
AR anchoring — getting a 3D model to stay locked to a real person's chest as they move required per-frame joint transform updates using ARKit's skeleton tracking. One wrong matrix multiplication and the hands float off into space.
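The per-frame math boils down to composing the body anchor's world transform with each joint's transform relative to that anchor. A pure-Python sketch of the composition (row-major 4x4 matrices for readability, translation-only, and the sample transforms are made up — this illustrates the operand order, not ARKit's API):

```python
def mat4_mul(a, b):
    """Multiply two 4x4 matrices (row-major nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """4x4 translation matrix."""
    return [[1, 0, 0, x],
            [0, 1, 0, y],
            [0, 0, 1, z],
            [0, 0, 0, 1]]

# Each frame: joint-in-world = anchor-to-world x joint-in-anchor.
# Swapping the operands is exactly the "one wrong matrix multiplication"
# that sends the overlay floating off into space.
anchor_to_world = translation(0.0, 1.0, -2.0)  # body anchor in world space (made up)
chest_in_anchor = translation(0.0, 0.4, 0.0)   # chest joint relative to anchor (made up)
chest_in_world = mat4_mul(anchor_to_world, chest_in_anchor)

# Extract the world-space position (last column of the top three rows).
chest_position = [row[3] for row in chest_in_world[:3]]
```

Re-running this composition every frame with the tracked skeleton's fresh joint transforms is what keeps the 3D hands pinned to a moving chest.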
Time — we built all of this in under 24 hours on no sleep. That was the hardest part.
What We Learned
We learned that the hardest problems in AR aren't technical — they're human. Getting the latency low enough that guidance feels natural. Making the overlays clear enough that a panicking person can follow them. Designing a system simple enough that your grandma can use it without instructions.
We also learned that Gemini's ability to simultaneously understand voice and video in real time is genuinely remarkable. Watching it track someone's hands and adapt its guidance accordingly — that's not a demo trick. That's a glimpse of something real.
What's Next
We believe ARGuide can become the layer of intelligence that sits between people and the physical world — always available, always patient, never judging.
The next steps are expanding the task library, deepening the Meta Ray-Ban integration for fully hands-free operation, and building an offline emergency mode so it works even when the internet doesn't.
Because the moment you need this most is exactly the moment you can least afford for it to fail.