Inspiration

Every child learns differently. We wondered: What if a child could turn their own science homework or favorite story into a custom music video they could dance to?

What it does

  • Lyric Generation: Takes a topic (e.g., "The Water Cycle") and generates age-appropriate, catchy lyrics.
  • Music Creation: Uses Suno to turn those lyrics into a high-quality song in any genre (Pop, Disco, Hip-Hop).
  • AI Choreography: Our custom pipeline analyzes the song’s rhythm and semantics to generate a 3D character that dances in sync with the words.
  • Interactive Play: Kids follow the character’s moves on a 3x3 grid, paired with hand motions, and are scored based on the accuracy of their movements (a rough scoring sketch follows below).
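
A minimal TypeScript sketch of the scoring idea; the `scoreStep` helper, timing window, and linear falloff are simplified stand-ins for the real game logic, not the actual code.

```ts
// Hypothetical scoring sketch: compare a detected step against the
// choreography's expected tile and cue time. Names and thresholds are
// illustrative assumptions.
type Step = { tile: number; timeMs: number }; // tile: 0..8 on the 3x3 grid

const TIMING_WINDOW_MS = 250; // window for any credit (assumed value)

function scoreStep(expected: Step, detected: Step): number {
  if (expected.tile !== detected.tile) return 0; // wrong tile, no points
  const offset = Math.abs(expected.timeMs - detected.timeMs);
  // Linear falloff: perfect timing scores 100, the edge of the window scores 0.
  return Math.max(0, Math.round(100 * (1 - offset / TIMING_WINDOW_MS)));
}
```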

How we built it

  • Backend stack: Express, MongoDB, Node.js, ClaudeSDK, Suno API, OpenAI Whisper
  • Frontend stack: React, Three.js, OpenCV, MediaPipe
  • ClaudeSDK to generate lyrics, break them into semantically meaningful fragments, and translate each fragment into 3D pose data (arm and elbow angles, foot positions on a grid) while respecting physical constraints (see the pose-generation sketch after this list).
  • Suno API to generate full tracks based on lyrics.
  • OpenAI Whisper to transcribe the Suno audio into words mapped to timestamps, so we could align the music with movements in our UI (see the transcription sketch after this list).
  • OpenCV to capture upper-body motion, and MediaPipe to measure how far the player’s joint angles deviate from the specified movements (see the pose-check sketch after this list).
  • Three.js to build the 3D model in our UI.
  • Floor grid: We used acrylic tiles and cushioning foam, so tiles would compress and rebound when stepped on. Additionally, we used Hall effect sensors paired with magnets inserted into the foam to detect compression of a tile. We wired the sensors to an Arduino Uno (see the tile-event reader sketch after this list).
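
A sketch of the lyric-to-pose step using the Anthropic TypeScript SDK; the prompt, model id, and `PoseKeyframe` shape below are illustrative assumptions, not our production prompt or schema.

```ts
import Anthropic from '@anthropic-ai/sdk';

// Illustrative pose schema: one keyframe per lyric fragment. Field names
// and ranges are assumptions for this sketch.
interface PoseKeyframe {
  fragment: string;       // the lyric fragment being danced
  leftElbowDeg: number;   // joint angles in degrees, 0..180
  rightElbowDeg: number;
  leftShoulderDeg: number;
  rightShoulderDeg: number;
  tile: number;           // 0..8 position on the 3x3 floor grid
}

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function fragmentsToPoses(fragments: string[]): Promise<PoseKeyframe[]> {
  const msg = await client.messages.create({
    model: 'claude-3-5-sonnet-latest', // assumed model id
    max_tokens: 2048,
    messages: [{
      role: 'user',
      content:
        'For each lyric fragment, return JSON only: an array of objects with ' +
        'fragment, leftElbowDeg, rightElbowDeg, leftShoulderDeg, rightShoulderDeg ' +
        '(0-180, physically reachable together), and tile (0-8). Fragments: ' +
        JSON.stringify(fragments),
    }],
  });
  const block = msg.content[0];
  if (block.type !== 'text') throw new Error('expected a text block');
  return JSON.parse(block.text) as PoseKeyframe[];
}
```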
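The transcription step, roughly, using Whisper’s word-level timestamps via the OpenAI Node SDK (the file path and the defensive cast are illustrative):

```ts
import fs from 'node:fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function wordTimestamps(audioPath: string) {
  // verbose_json plus word granularity yields per-word start/end times.
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: 'whisper-1',
    response_format: 'verbose_json',
    timestamp_granularities: ['word'],
  });
  // SDK typings for verbose_json vary by version, so read words defensively.
  const words = (transcription as {
    words?: { word: string; start: number; end: number }[];
  }).words ?? [];
  // We cue each dance keyframe on the start time of its fragment's first word.
  return words;
}
```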
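A minimal sketch of the browser-side pose check, assuming the @mediapipe/tasks-vision API; the joint-angle helper, tolerance, and model asset path are illustrative:

```ts
import { FilesetResolver, PoseLandmarker } from '@mediapipe/tasks-vision';

const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const landmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: { modelAssetPath: 'pose_landmarker_lite.task' }, // assumed path
  runningMode: 'VIDEO',
});

// Angle at joint b formed by the segments b->a and b->c, in degrees.
type Pt = { x: number; y: number };
function jointAngle(a: Pt, b: Pt, c: Pt): number {
  const ang =
    Math.abs(
      Math.atan2(c.y - b.y, c.x - b.x) - Math.atan2(a.y - b.y, a.x - b.x)
    ) * (180 / Math.PI);
  return ang > 180 ? 360 - ang : ang;
}

// True if the player's left elbow is within tolerance of the target angle.
function checkLeftElbow(video: HTMLVideoElement, targetDeg: number, tolDeg = 20) {
  const result = landmarker.detectForVideo(video, performance.now());
  const lm = result.landmarks[0];
  if (!lm) return false;
  // MediaPipe pose indices: 11 = left shoulder, 13 = left elbow, 15 = left wrist.
  const actual = jointAngle(lm[11], lm[13], lm[15]);
  return Math.abs(actual - targetDeg) <= tolDeg;
}
```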
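And a sketch of how tile presses could reach the game from the Uno, assuming it prints one line per sensor event over USB serial (the wire protocol, port path, and `recordStep` hook are assumptions for illustration):

```ts
import { SerialPort, ReadlineParser } from 'serialport';

// Assumed wire protocol: the Uno prints "TILE <n> DOWN" / "TILE <n> UP"
// when a Hall effect sensor sees its magnet move. Port path is illustrative.
const port = new SerialPort({ path: '/dev/ttyACM0', baudRate: 9600 });
const parser = port.pipe(new ReadlineParser({ delimiter: '\n' }));

// Hypothetical game-side hook that timestamps steps for scoring.
declare function recordStep(tile: number, timeMs: number): void;

parser.on('data', (line: string) => {
  const match = /^TILE (\d) (DOWN|UP)$/.exec(line.trim());
  if (!match) return;
  if (match[2] === 'DOWN') {
    recordStep(Number(match[1]), Date.now()); // tile index 0..8 on the 3x3 grid
  }
});
```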

Challenges we ran into

  • 3D Modeling: Mastering Three.js required a deep dive into skeletal animation. Managing dozens of interconnected joints and ensuring smooth interpolation between AI-generated keyframes was a significant hurdle (see the interpolation sketch after this list).
  • AI Context & Logic Failures: We faced "long context" struggles where Claude initially failed to maintain the mapping between lyric fragments and Whisper’s word-level timestamps. It took extensive prompt engineering and iterative experimentation to get the model to respect rhythmic timing.
  • Hardware Noise & Sensor Fusion: Our original vision included hand sensors for upper-body tracking, but we encountered significant accelerometer noise, making relative positioning too unstable for accurate gameplay.
  • Dance Tile Design: Getting the physical input right, so floor tiles compress and rebound reliably enough to detect "steps", presented a classic mechanical engineering challenge that required multiple iterations. We originally tried 3D-printing springs, but this was too costly and time-consuming. We also considered using copper strips to detect conductivity. Ultimately, we settled on Hall effect sensors paired with magnets embedded in the foam.
  • Pipeline Orchestration: Integrating a chain of multiple models (Suno, Whisper, Claude, Three.js) meant that a failure in any one link could break the entire experience, necessitating robust error-handling and iterative experimentation.
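
For the interpolation hurdle, the core fix was spherically interpolating each bone’s rotation between the surrounding keyframes every render frame; a simplified sketch (the keyframe shape and bone naming are illustrative):

```ts
import * as THREE from 'three';

// Illustrative keyframe shape: each AI-generated keyframe maps bone names
// to target rotations at a given time in the song.
interface BoneKeyframe {
  timeMs: number;
  rotations: Record<string, THREE.Quaternion>;
}

// Called every render frame: slerp each bone between the surrounding
// keyframes so joints move smoothly instead of snapping.
function applyPose(
  skeleton: THREE.Skeleton,
  prev: BoneKeyframe,
  next: BoneKeyframe,
  nowMs: number
): void {
  // Normalized progress between the two keyframes, clamped to [0, 1].
  const t = THREE.MathUtils.clamp(
    (nowMs - prev.timeMs) / (next.timeMs - prev.timeMs),
    0,
    1
  );
  for (const bone of skeleton.bones) {
    const a = prev.rotations[bone.name];
    const b = next.rotations[bone.name];
    if (a && b) bone.quaternion.slerpQuaternions(a, b, t); // avoids gimbal pops
  }
}
```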

Accomplishments that we're proud of

  • DIY tile boards: We had to iterate through many designs to make sure the tiles compressed and decompressed properly.
  • 3D model on the UI: We had to learn Three.js for skeletal 3D modeling while keeping generated poses within physical constraints.
  • Agent pipeline: We integrated many models and iteratively designed validation and schemas to keep the chain robust (a schema-guard sketch follows this list).
  • Pivot for hand movement detection: After various sensors didn't work, we pivoted to using a computer vision-based pipeline involving OpenCV and MediaPipe.
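
As one example of that validation layer, a schema guard between pipeline stages, sketched here with zod (the library choice and schema shape are illustrative, not necessarily what we shipped):

```ts
import { z } from 'zod';

// Illustrative schema guard between pipeline stages: if a model's JSON output
// doesn't validate, fail fast (and retry) instead of breaking playback later.
const PoseKeyframe = z.object({
  fragment: z.string(),
  leftElbowDeg: z.number().min(0).max(180),
  rightElbowDeg: z.number().min(0).max(180),
  tile: z.number().int().min(0).max(8),
});
const Choreography = z.array(PoseKeyframe).min(1);

function parseChoreography(raw: string) {
  const result = Choreography.safeParse(JSON.parse(raw));
  if (!result.success) {
    throw new Error(`invalid pose data: ${result.error.message}`);
  }
  return result.data; // typed and range-checked keyframes
}
```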