ShadowTales
Interactive AI Storytelling Through Shadow Puppets
What ShadowTales Does
ShadowTales turns hand shadows into live narrated story moments.
- The user makes an animal shadow with their hands.
- A webcam detects the silhouette, classifies the animal, and reads motion.
- The system converts that into structured story context (animal + action).
- Gemini writes the next short narration beat.
- ElevenLabs speaks it immediately, with local SFX and ambient audio.
The experience is designed to feel responsive, playful, and physically interactive.
Core Features
Real-Time Shadow Detection + Motion
- Live camera processing with OpenCV.
- Silhouette extraction using grayscale thresholding + contour filtering.
- Animal classification via a ConvNeXt-based model.
- Motion labeling from tracked movement (for example: walking left slow, jumping, still).
- Detection smoothing so narration and SFX do not flip on one noisy frame.
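As an illustration of the motion-labeling step, here is a minimal sketch that turns a short track of silhouette centroids into one of the labels mentioned above. The function name `label_motion`, the pixel-per-second thresholds, and the frame rate are all assumptions made for this example and are not taken from the project's code.

```python
from typing import List, Tuple


def label_motion(
    centroids: List[Tuple[float, float]],
    fps: float = 30.0,
    still_px_per_s: float = 15.0,   # assumed threshold
    fast_px_per_s: float = 120.0,   # assumed threshold
    jump_px_per_s: float = 150.0,   # assumed threshold
) -> str:
    """Label motion from a short track of (x, y) centroids in pixels."""
    if len(centroids) < 2:
        return "still"

    (x0, y0), (x1, y1) = centroids[0], centroids[-1]
    seconds = (len(centroids) - 1) / fps
    vx = (x1 - x0) / seconds
    vy = (y1 - y0) / seconds  # image y grows downward, so negative means up

    # A fast, mostly vertical upward move is treated as a jump.
    if -vy > jump_px_per_s and abs(vy) > abs(vx):
        return "jumping"
    if abs(vx) < still_px_per_s:
        return "still"

    direction = "right" if vx > 0 else "left"
    pace = "fast" if abs(vx) >= fast_px_per_s else "slow"
    return f"walking {direction} {pace}"
```

Comparing only the first and last centroid of the window keeps the label insensitive to small frame-to-frame jitter, at the cost of missing back-and-forth motion inside the window.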
AI Narration (Gemini)
- Gemini generates short, child-friendly narration in real time.
- Prompting enforces continuity, present tense, and concise output.
- The shadow input is treated as the source of truth for the current on-stage character.
- Optional mic transcription can be included to blend the child’s spoken ideas into the next line.
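The prompting rules above could be assembled along these lines. This is a hypothetical sketch: `build_narration_prompt`, its parameters, the word limit, and the exact wording of each rule are assumptions for illustration, and the call to Gemini itself is deliberately left out.

```python
from typing import List, Optional


def build_narration_prompt(
    animal: str,
    action: str,
    previous_lines: List[str],
    child_speech: Optional[str] = None,
    max_words: int = 25,  # assumed limit for "concise output"
) -> str:
    """Assemble one narration request from the current story context."""
    parts = [
        "You are narrating a shadow-puppet story for young children.",
        f"Write one present-tense beat of at most {max_words} words.",
        "Continue the story so far without repeating it.",
        # The detected shadow decides who is on stage, not the earlier text.
        f"The character on stage right now is a {animal}, and it is {action}.",
        "If earlier lines feature a different animal, the on-stage character takes over.",
    ]
    if previous_lines:
        parts.append("Story so far:")
        # Only the most recent lines are kept to hold the prompt short.
        parts.extend(f"- {line}" for line in previous_lines[-3:])
    if child_speech:
        parts.append(
            f'The child just said: "{child_speech.strip()}". Weave that idea in.'
        )
    return "\n".join(parts)
```

For example, `build_narration_prompt("rabbit", "jumping", ["A fox naps."])` yields a prompt that names the rabbit as the on-stage character while still carrying the fox line as prior context.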
Voice + Sound (ElevenLabs + Local SFX)
- Narration is streamed with ElevenLabs over a persistent WebSocket connection.
- Audio is queued to keep flow smooth across turns.
- Animal one-shot SFX and looping ambient background run locally from audio/.
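One way to keep audio flowing in order across turns is a single playback worker fed by a FIFO queue. The sketch below shows only that queuing idea; the class name `NarrationQueue` and the `play` callback are assumptions, and the ElevenLabs WebSocket streaming and the real audio output are not shown.

```python
import queue
import threading
from typing import Callable, Optional


class NarrationQueue:
    """Plays audio chunks strictly in the order they were enqueued."""

    def __init__(self, play: Callable[[bytes], None]) -> None:
        self._play = play  # hypothetical callback that outputs one chunk
        self._chunks: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, chunk: bytes) -> None:
        """Add a chunk without blocking the caller."""
        self._chunks.put(chunk)

    def _run(self) -> None:
        while True:
            chunk = self._chunks.get()
            try:
                if chunk is None:  # sentinel: stop the worker
                    return
                self._play(chunk)
            finally:
                self._chunks.task_done()

    def close(self) -> None:
        """Finish everything already queued, then stop the worker."""
        self._chunks.put(None)
        self._chunks.join()
        self._worker.join()
```

Because only one thread ever calls `play`, a new turn's audio cannot overlap or overtake the previous turn's audio.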
Challenges We Ran Into
Latency
Real-time narration requires tight pacing across classification, generation, and playback. We kept outputs short and used streaming TTS to reduce perceived delay.
Noisy Visual Input
Hand shadows vary with light, background, and pose precision. We added confidence thresholds, batched observations, and hit-based switching to avoid unstable character changes.
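A minimal sketch of confidence-gated, hit-based switching follows. The class name `CharacterSwitcher`, the threshold of 0.7, and the three-hit rule are assumptions chosen for illustration, and the sketch processes one observation at a time rather than batches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharacterSwitcher:
    min_confidence: float = 0.7   # assumed threshold
    hits_to_switch: int = 3       # assumed number of consecutive hits
    current: Optional[str] = None
    _candidate: Optional[str] = None
    _hits: int = 0

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Feed one classifier result and return the confirmed character."""
        # Low-confidence observations are ignored entirely.
        if confidence < self.min_confidence:
            return self.current

        # Seeing the confirmed character again cancels any pending switch.
        if label == self.current:
            self._candidate, self._hits = None, 0
            return self.current

        # Count consecutive confident hits for the challenger.
        if label == self._candidate:
            self._hits += 1
        else:
            self._candidate, self._hits = label, 1

        if self._hits >= self.hits_to_switch:
            self.current = label
            self._candidate, self._hits = None, 0
        return self.current
```

With these settings, a single confident but wrong frame never changes the on-stage character; the challenger has to win three confident observations in a row, not interrupted by the current character.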
Story Drift
LLM narration can stick to old context if character switches are noisy. We added explicit prompt rules and switching logic so new confirmed animals take over narration quickly.
Accomplishments We’re Proud Of
- Built an end-to-end multimodal pipeline (vision + language + voice + SFX).
- Delivered live, low-latency narration from physical shadow play.
- Added practical guardrails for stable character transitions.
Impact & Usefulness
ShadowTales encourages:
- Imaginative play
- Verbal expression
- Cause-and-effect learning
- Collaborative storytelling
Kids move, speak, and hear instant narrative feedback, which keeps them engaged in both physical and creative interaction.
What’s Next
- Better multi-puppet support and interactions.
- Richer motion vocabulary and scene-state tracking.
- Faster turn timing with additional generation/streaming optimizations.
- Optional education modes (science/history/reading themes).
- Session replay and teacher/parent-friendly summaries.
Repo Snapshot
- Main live run: uv run python -m shadow_stories.live_stream_test
- Optional mic mode: uv run python -m shadow_stories.live_stream_test --interactive-voice
- Debug overlay: uv run python -m shadow_stories.live_stream_test --debug