Inspiration
You're scrolling TikTok. A movie clip plays. Three seconds of something incredible, but no title, no context, no way to find it.
You screenshot. Reverse image search. Nothing. You post on Reddit: "What movie is this?" and wait 6 hours for a stranger to maybe reply.
Shazam solved this for music in 2008. Why hasn't anyone solved it for movies?
I built Reckall because I was tired of losing hours trying to identify 10-second clips. If AI can recognize a song from ambient noise in a coffee shop, it can recognize a movie from a few frames and some dialogue.
What it does
Reckall is Shazam for movies. Record a clip. Get the answer.
Core Flow:
- Record or upload any video clip (5-30 seconds)
- Reckall extracts frames and transcribes audio
- Gemini 3's multimodal reasoning identifies the movie
- Get title, year, confidence score, and why it matched
- See where to stream it (real data, not hallucinated)
- Discover similar movies based on tone, director, and themes
Key Features:
- Multimodal Recognition: Analyzes video frames + audio transcript + actor detection simultaneously
- Confidence Scoring: Shows you how certain it is and what signals matched
- Transparent Reasoning: Explains why it thinks this is the movie, not a black box
- Real Streaming Data: TMDB's Watch Providers API supplies actual Netflix/Prime/etc. availability
- Smart Caching: Movies get faster to recognize over time as the database grows
- In-App Trimming: Clip too long? Trim it right in the app before upload
How I built it
The Pipeline:
Upload → Parse → Trim (optional) → FFmpeg Extract Frames + Audio (sequential)
→ Transcribe (Gemini, Whisper fallback) → One-Shot Multimodal Recognition
→ DB Lookup (skip TMDB if cached) → Actor Verification → TMDB Fetch → Cache
→ Streaming Providers → Response
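In code, the flow above is roughly the following. This is a condensed sketch; every helper name and signature here is illustrative, not the actual Reckall source:

```typescript
// Illustrative stage signatures -- assumptions, not Reckall's real API
declare function extractFrames(clip: Buffer, opts: { count: number }): Promise<Buffer[]>;
declare function extractAudio(clip: Buffer): Promise<Buffer>;
declare function transcribe(audio: Buffer): Promise<string>;
declare function identifyMovie(frames: Buffer[], transcript: string): Promise<Guess>;
declare function lookupCache(title: string, year: number): Promise<Movie | null>;
declare function fetchAndCacheFromTmdb(title: string, year: number): Promise<Movie>;
declare function verifyActors(actors: string[], tmdbId: number): Promise<boolean>;
declare function fetchWatchProviders(tmdbId: number): Promise<string[]>;

interface Guess { title: string; year: number; actors: string[]; confidence: number }
interface Movie { tmdbId: number; title: string; year: number }

async function recognizeClip(clip: Buffer) {
  // FFmpeg steps run sequentially, not in parallel, to fit in 512MB RAM
  const frames = await extractFrames(clip, { count: 2 });
  const audio = await extractAudio(clip);
  const transcript = await transcribe(audio); // Gemini, Whisper fallback

  // One-shot multimodal recognition: frames + transcript in a single request
  const guess = await identifyMovie(frames, transcript);

  // A cache hit skips the TMDB round trip entirely
  const movie =
    (await lookupCache(guess.title, guess.year)) ??
    (await fetchAndCacheFromTmdb(guess.title, guess.year));

  const verified = await verifyActors(guess.actors, movie.tmdbId);
  const providers = await fetchWatchProviders(movie.tmdbId); // real data, not hallucinated
  return { ...guess, movie, verified, providers };
}
```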
Technical Stack:
- Frontend: Next.js 14, React, TailwindCSS
- Backend: Next.js API Routes, Server Actions
- AI: Gemini 3 Flash (multimodal recognition + transcription), OpenAI Whisper (fallback)
- Video Processing: FFmpeg (frame extraction, audio extraction, trimming)
- Database: Supabase (PostgreSQL) for the movie cache, user uploads, and analytics
- External APIs: TMDB (movie data, posters, streaming providers)
- Infrastructure: Railway (512MB memory limit forced me to optimize)
The One-Shot Approach:
Instead of multiple API calls, I send frames + transcript to Gemini in a single multimodal request. The prompt asks for:
- Movie title and year
- Confidence score (0-1)
- Matched signals (dialogue, actors, visual style, setting)
- Actor names detected
- Reasoning explanation
- Alternative guesses
This is faster and more coherent than chaining separate vision + text calls.
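A minimal sketch of what that single call can look like, assuming the @google/genai SDK. The model id, prompt wording, and response shape below are placeholders, not the exact ones Reckall uses:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One request carrying both modalities: JPEG frames as inline data plus the
// transcript inside the text part. Model id and prompt are placeholders.
async function identifyMovie(frames: Buffer[], transcript: string) {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash", // placeholder model id
    contents: [
      ...frames.map((frame) => ({
        inlineData: { mimeType: "image/jpeg", data: frame.toString("base64") },
      })),
      {
        text:
          "Identify the movie these frames and this dialogue come from.\n" +
          `Transcript: "${transcript}"\n` +
          "Reply as JSON with: title, year, confidence (0-1), " +
          "matchedSignals, actors, reasoning, alternatives.",
      },
    ],
    config: { responseMimeType: "application/json" }, // force parseable output
  });
  return JSON.parse(response.text ?? "{}");
}
```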
Challenges I ran into
1. Memory Limits (512MB)
Railway's free tier has 512MB RAM. Video processing is memory-hungry, so I had to (see the sketch after this list):
- Process frames and audio sequentially instead of parallel
- Explicitly null buffer references to help garbage collection
- Reject videos over 50MB upfront
- Use only 2 frames instead of 5 (still accurate)
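A sketch of that memory discipline, with illustrative helper names:

```typescript
// Illustrative stage signatures -- assumptions, not the real API
declare function extractFrames(clip: Buffer, opts: { count: number }): Promise<Buffer[]>;
declare function extractAudio(clip: Buffer): Promise<Buffer>;
declare function transcribe(audio: Buffer): Promise<string>;

const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

async function processVideo(clip: Buffer) {
  // Reject oversized uploads before FFmpeg ever runs
  if (clip.byteLength > MAX_UPLOAD_BYTES) throw new Error("FILE_TOO_LARGE");

  // Sequential, not Promise.all: two FFmpeg jobs at once would double
  // peak memory on a 512MB instance
  let frames: Buffer[] | null = await extractFrames(clip, { count: 2 });
  const frameData = frames.map((f) => f.toString("base64"));
  frames = null; // drop raw buffer references so GC can reclaim them

  let audio: Buffer | null = await extractAudio(clip);
  const transcript = await transcribe(audio);
  audio = null; // same trick for the audio buffer

  return { frameData, transcript };
}
```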
2. Hallucinated Streaming Links
First version: asked Gemini where to watch the movie. It confidently gave fake Netflix URLs. Solution: never let the LLM generate links. Fetch real data from TMDB's Watch Providers API, then let the AI reason over real data.
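The Watch Providers endpoint is a real TMDB API; the helper wrapped around it here is a simplified sketch:

```typescript
// Fetch verified availability from TMDB. The LLM never generates URLs; it
// only reasons over this list. Helper shape is a sketch, endpoint is real.
async function fetchWatchProviders(tmdbId: number, region = "US"): Promise<string[]> {
  const res = await fetch(
    `https://api.themoviedb.org/3/movie/${tmdbId}/watch/providers` +
      `?api_key=${process.env.TMDB_API_KEY}`
  );
  const data = await res.json();
  // "flatrate" = subscription streaming (Netflix, Prime, etc.)
  const providers = data.results?.[region]?.flatrate ?? [];
  return providers.map((p: { provider_name: string }) => p.provider_name);
}
```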
3. Actor Verification
Gemini would recognize actors correctly but match them to the wrong movie: "Leonardo DiCaprio" → suggests The Revenant when it's actually Inception. Solution: added a verification step that checks whether the detected actors actually appear in the movie's TMDB cast before returning.
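A sketch of that check against TMDB's credits endpoint (the endpoint is real; the matching rule here is a simplified assumption):

```typescript
// Verify that at least one detected actor appears in the movie's actual cast
// before trusting the match; otherwise fall back to alternative guesses.
async function actorsMatchMovie(detected: string[], tmdbId: number): Promise<boolean> {
  const res = await fetch(
    `https://api.themoviedb.org/3/movie/${tmdbId}/credits` +
      `?api_key=${process.env.TMDB_API_KEY}`
  );
  const { cast } = (await res.json()) as { cast: { name: string }[] };
  const castNames = new Set(cast.map((c) => c.name.toLowerCase()));
  return detected.some((name) => castNames.has(name.toLowerCase()));
}
```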
4. Transcript-Based Overrides
Some content (like trailers for unreleased movies) has no TMDB entry yet. I added keyword detection in transcripts to catch these edge cases and route them to manual database entries.
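A sketch of how such an override table might look; the keywords and entry ids are invented for illustration:

```typescript
// Hypothetical keyword → manual-entry table for content with no TMDB record
const MANUAL_OVERRIDES = [
  { keywords: ["official trailer", "only in theaters"], entryId: "manual:upcoming-film" },
] as const;

// Return a manual entry id if the transcript hits any override's keywords
function findOverride(transcript: string): string | null {
  const text = transcript.toLowerCase();
  const hit = MANUAL_OVERRIDES.find((o) =>
    o.keywords.some((k) => text.includes(k))
  );
  return hit?.entryId ?? null;
}
```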
5. Upload Corruption
Mobile uploads on slow networks would fail silently. I added detailed error codes (UPLOAD_INTERRUPTED, FILE_TOO_LARGE, UPLOAD_TIMEOUT) with user-friendly messages.
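The error codes come from the writeup above; the user-facing copy in this sketch is illustrative:

```typescript
// Map internal upload error codes to friendly messages (copy is illustrative)
const UPLOAD_ERROR_MESSAGES: Record<string, string> = {
  UPLOAD_INTERRUPTED: "Your connection dropped mid-upload. Please try again.",
  FILE_TOO_LARGE: "Clips must be under 50MB. Try trimming in the app first.",
  UPLOAD_TIMEOUT: "The upload took too long. A shorter clip usually fixes this.",
};

function uploadErrorMessage(code: string): string {
  return UPLOAD_ERROR_MESSAGES[code] ?? "Something went wrong. Please retry.";
}
```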
Accomplishments that I'm proud of
- Sub-3-second recognition for cached movies
- 92%+ accuracy on mainstream films
- Production-grade error handling, not a hackathon demo that breaks
- Smart caching: the system gets faster as more people use it
- Built in 4 days while working overnight cleaning shifts (1am-6am)
What I learned
- Multimodal AI is ready for real applications: Gemini handles video frames + audio together remarkably well
- Memory management matters: I learned more about garbage collection in 2 days than in 2 years
- The "Action Era" is real: users don't want analysis, they want results + next steps
- Caching is underrated: skipping redundant API calls made this 10x faster
What's next for Reckall
- Scene Timestamps: "This is 47 minutes into Inception"
- Social Features: Share identified clips with friends
- Browser Extension: Right-click any video on the web → identify
- Watchlist Sync: Connect to Letterboxd, Trakt, etc.
- Offline Mode: On-device recognition for common movies
Built With
- ffmpeg
- gemini-3-flash
- gemini-api
- next.js
- openai-whisper
- postgresql
- railway
- react
- react-native
- supabase
- tailwindcss
- tmdb-api
- typescript
- vercel