Inspiration
You're scrolling TikTok. A movie clip plays. Three seconds of something incredible, but no title, no context, no way to find it.
You screenshot. Reverse image search. Nothing. You post on Reddit: "What movie is this?" and wait 6 hours for a stranger to maybe reply.
Shazam solved this for music in 2008. Why hasn't anyone solved it for movies?
I built Reckall because I was tired of losing hours trying to identify 10-second clips. If AI can recognize a song from ambient noise in a coffee shop, it can recognize a movie from a few frames and some dialogue.
What it does
Reckall is Shazam for movies. Record a clip. Get the answer.
Core Flow:
- Record or upload any video clip (5-30 seconds)
- Reckall extracts frames and transcribes audio
- Gemini 3's multimodal reasoning identifies the movie
- Get title, year, confidence score, and why it matched
- See where to stream it (real data, not hallucinated)
- Discover similar movies based on tone, director, and themes
Key Features:
- Multimodal Recognition: Analyzes video frames + audio transcript + actor detection simultaneously
- Confidence Scoring: Shows you how certain it is and what signals matched
- Transparent Reasoning: Explains why it thinks this is the movie, not a black box
- Real Streaming Data: TMDB's Watch Providers API supplies actual Netflix/Prime/etc. availability
- Smart Caching: Movies get faster to recognize over time as the database grows
- In-App Trimming: Clip too long? Trim it right in the app before upload
How I built it
The Pipeline:
Upload → Parse → Trim (optional) → FFmpeg Extract Frames + Audio (sequential)
→ Transcribe (Gemini, Whisper fallback) → One-Shot Multimodal Recognition
→ DB Lookup (skip TMDB if cached) → Actor Verification → TMDB Fetch → Cache
→ Streaming Providers → Response
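In code, the flow above is roughly the following. This is a condensed sketch; every helper name and signature here is illustrative, not the actual Reckall source:

```typescript
// Illustrative stage signatures -- assumptions, not Reckall's real API
declare function extractFrames(clip: Buffer, opts: { count: number }): Promise<Buffer[]>;
declare function extractAudio(clip: Buffer): Promise<Buffer>;
declare function transcribe(audio: Buffer): Promise<string>;
declare function identifyMovie(frames: Buffer[], transcript: string): Promise<Guess>;
declare function lookupCache(title: string, year: number): Promise<Movie | null>;
declare function fetchAndCacheFromTmdb(title: string, year: number): Promise<Movie>;
declare function verifyActors(actors: string[], tmdbId: number): Promise<boolean>;
declare function fetchWatchProviders(tmdbId: number): Promise<string[]>;

interface Guess { title: string; year: number; actors: string[]; confidence: number }
interface Movie { tmdbId: number; title: string; year: number }

async function recognizeClip(clip: Buffer) {
  // FFmpeg steps run sequentially, not in parallel, to fit in 512MB RAM
  const frames = await extractFrames(clip, { count: 2 });
  const audio = await extractAudio(clip);
  const transcript = await transcribe(audio); // Gemini, Whisper fallback

  // One-shot multimodal recognition: frames + transcript in a single request
  const guess = await identifyMovie(frames, transcript);

  // A cache hit skips the TMDB round trip entirely
  const movie =
    (await lookupCache(guess.title, guess.year)) ??
    (await fetchAndCacheFromTmdb(guess.title, guess.year));

  const verified = await verifyActors(guess.actors, movie.tmdbId);
  const providers = await fetchWatchProviders(movie.tmdbId); // real data, not hallucinated
  return { ...guess, movie, verified, providers };
}
```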
Technical Stack:
- Frontend: Next.js 14, React, TailwindCSS
- Backend: Next.js API Routes, Server Actions
- AI: Gemini 3 Flash (multimodal recognition + transcription), OpenAI Whisper (fallback)
- Video Processing: FFmpeg (frame extraction, audio extraction, trimming)
- Database: Supabase (PostgreSQL) for the movie cache, user uploads, and analytics
- External APIs: TMDB (movie data, posters, streaming providers)
- Infrastructure: Railway (512MB memory limit forced me to optimize)
The One-Shot Approach:
Instead of multiple API calls, I send frames + transcript to Gemini in a single multimodal request. The prompt asks for:
- Movie title and year
- Confidence score (0-1)
- Matched signals (dialogue, actors, visual style, setting)
- Actor names detected
- Reasoning explanation
- Alternative guesses
This is faster and more coherent than chaining separate vision + text calls.
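A minimal sketch of what that single call can look like, assuming the @google/genai SDK. The model id, prompt wording, and response shape below are placeholders, not the exact ones Reckall uses:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One request carrying both modalities: JPEG frames as inline data plus the
// transcript inside the text part. Model id and prompt are placeholders.
async function identifyMovie(frames: Buffer[], transcript: string) {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash", // placeholder model id
    contents: [
      ...frames.map((frame) => ({
        inlineData: { mimeType: "image/jpeg", data: frame.toString("base64") },
      })),
      {
        text:
          "Identify the movie these frames and this dialogue come from.\n" +
          `Transcript: "${transcript}"\n` +
          "Reply as JSON with: title, year, confidence (0-1), " +
          "matchedSignals, actors, reasoning, alternatives.",
      },
    ],
    config: { responseMimeType: "application/json" }, // force parseable output
  });
  return JSON.parse(response.text ?? "{}");
}
```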
Challenges I ran into
1. Memory Limits (512MB)
Railway's free tier has 512MB RAM. Video processing is memory-hungry, so I had to (see the sketch after this list):
- Process frames and audio sequentially instead of parallel
- Explicitly null buffer references to help garbage collection
- Reject videos over 50MB upfront
- Use only 2 frames instead of 5 (still accurate)
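A sketch of that memory discipline, with illustrative helper names:

```typescript
// Illustrative stage signatures -- assumptions, not the real API
declare function extractFrames(clip: Buffer, opts: { count: number }): Promise<Buffer[]>;
declare function extractAudio(clip: Buffer): Promise<Buffer>;
declare function transcribe(audio: Buffer): Promise<string>;

const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

async function processVideo(clip: Buffer) {
  // Reject oversized uploads before FFmpeg ever runs
  if (clip.byteLength > MAX_UPLOAD_BYTES) throw new Error("FILE_TOO_LARGE");

  // Sequential, not Promise.all: two FFmpeg jobs at once would double
  // peak memory on a 512MB instance
  let frames: Buffer[] | null = await extractFrames(clip, { count: 2 });
  const frameData = frames.map((f) => f.toString("base64"));
  frames = null; // drop raw buffer references so GC can reclaim them

  let audio: Buffer | null = await extractAudio(clip);
  const transcript = await transcribe(audio);
  audio = null; // same trick for the audio buffer

  return { frameData, transcript };
}
```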
2. Hallucinated Streaming Links
First version: asked Gemini where to watch the movie. It confidently gave fake Netflix URLs. Solution: never let the LLM generate links. Fetch real data from TMDB's Watch Providers API, then let the AI reason over real data.
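The Watch Providers endpoint is a real TMDB API; the helper wrapped around it here is a simplified sketch:

```typescript
// Fetch verified availability from TMDB. The LLM never generates URLs; it
// only reasons over this list. Helper shape is a sketch, endpoint is real.
async function fetchWatchProviders(tmdbId: number, region = "US"): Promise<string[]> {
  const res = await fetch(
    `https://api.themoviedb.org/3/movie/${tmdbId}/watch/providers` +
      `?api_key=${process.env.TMDB_API_KEY}`
  );
  const data = await res.json();
  // "flatrate" = subscription streaming (Netflix, Prime, etc.)
  const providers = data.results?.[region]?.flatrate ?? [];
  return providers.map((p: { provider_name: string }) => p.provider_name);
}
```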
3. Actor Verification
Gemini would recognize actors correctly but match them to the wrong movie: "Leonardo DiCaprio" → suggests The Revenant when it's actually Inception. Solution: added a verification step that checks whether the detected actors actually appear in the movie's TMDB cast before returning.
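A sketch of that check against TMDB's credits endpoint (the endpoint is real; the matching rule here is a simplified assumption):

```typescript
// Verify that at least one detected actor appears in the movie's actual cast
// before trusting the match; otherwise fall back to alternative guesses.
async function actorsMatchMovie(detected: string[], tmdbId: number): Promise<boolean> {
  const res = await fetch(
    `https://api.themoviedb.org/3/movie/${tmdbId}/credits` +
      `?api_key=${process.env.TMDB_API_KEY}`
  );
  const { cast } = (await res.json()) as { cast: { name: string }[] };
  const castNames = new Set(cast.map((c) => c.name.toLowerCase()));
  return detected.some((name) => castNames.has(name.toLowerCase()));
}
```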
4. Transcript-Based Overrides
Some content (like trailers for unreleased movies) has no TMDB entry yet. I added keyword detection in transcripts to catch these edge cases and route them to manual database entries.
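A sketch of how such an override table might look; the keywords and entry ids are invented for illustration:

```typescript
// Hypothetical keyword → manual-entry table for content with no TMDB record
const MANUAL_OVERRIDES = [
  { keywords: ["official trailer", "only in theaters"], entryId: "manual:upcoming-film" },
] as const;

// Return a manual entry id if the transcript hits any override's keywords
function findOverride(transcript: string): string | null {
  const text = transcript.toLowerCase();
  const hit = MANUAL_OVERRIDES.find((o) =>
    o.keywords.some((k) => text.includes(k))
  );
  return hit?.entryId ?? null;
}
```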
5. Upload Corruption
Mobile uploads on slow networks would fail silently. I added detailed error codes (UPLOAD_INTERRUPTED, FILE_TOO_LARGE, UPLOAD_TIMEOUT) with user-friendly messages.
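The error codes come from the writeup above; the user-facing copy in this sketch is illustrative:

```typescript
// Map internal upload error codes to friendly messages (copy is illustrative)
const UPLOAD_ERROR_MESSAGES: Record<string, string> = {
  UPLOAD_INTERRUPTED: "Your connection dropped mid-upload. Please try again.",
  FILE_TOO_LARGE: "Clips must be under 50MB. Try trimming in the app first.",
  UPLOAD_TIMEOUT: "The upload took too long. A shorter clip usually fixes this.",
};

function uploadErrorMessage(code: string): string {
  return UPLOAD_ERROR_MESSAGES[code] ?? "Something went wrong. Please retry.";
}
```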
Accomplishments that I'm proud of
- Sub-3-second recognition for cached movies
- 92%+ accuracy on mainstream films
- Production-grade error handling, not a hackathon demo that breaks
- Smart caching: the system gets faster as more people use it
- Built in 4 days while working overnight cleaning shifts (1am-6am)
What I learned
- Multimodal AI is ready for real applications: Gemini handles video frames + audio together remarkably well
- Memory management matters: I learned more about garbage collection in 2 days than in 2 years
- The "Action Era" is real: users don't want analysis, they want results + next steps
- Caching is underrated: skipping redundant API calls made this 10x faster
What's next for Reckall
- Scene Timestamps: "This is 47 minutes into Inception"
- Social Features: Share identified clips with friends
- Browser Extension: Right-click any video on the web → identify
- Watchlist Sync: Connect to Letterboxd, Trakt, etc.
- Offline Mode: On-device recognition for common movies
Built With
- ffmpeg
- gemini-3-flash
- gemini-api
- next.js
- openai-whisper
- postgresql
- railway
- react
- react-native
- supabase
- tailwindcss
- tmdb-api
- typescript
- vercel