What's next for HapticAI

HapticAI transforms 2D video into 4DX-ready haptic experiences using a two-pass Gemini 3 Pro architecture.

Pass 1 — Narrative Context: The full video is uploaded via the Files API and analyzed at 1 FPS. Gemini 3 Pro with Structured JSON Output and Thinking (low) performs scene segmentation, returning timestamped scene boundaries with narrative summaries. This gives the model full story-arc awareness before frame-level analysis.
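A minimal sketch of what Pass 1's structured output might look like on the client side. The field names (`start_s`, `end_s`, `summary`) and the schema shape are illustrative assumptions, not the project's actual response schema; the point is that a JSON response schema lets the segmentation result be parsed deterministically.

```python
import json
from dataclasses import dataclass

# Hypothetical response schema passed to the model for structured JSON output.
# Field names are assumptions for illustration.
SCENE_SCHEMA = {
    "type": "object",
    "properties": {
        "scenes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "start_s": {"type": "number"},
                    "end_s": {"type": "number"},
                    "summary": {"type": "string"},
                },
                "required": ["start_s", "end_s", "summary"],
            },
        }
    },
}

@dataclass
class Scene:
    start_s: float
    end_s: float
    summary: str

def parse_scenes(response_text: str) -> list[Scene]:
    """Parse the model's JSON output into Scene records, sorted by start time."""
    data = json.loads(response_text)
    scenes = [Scene(s["start_s"], s["end_s"], s["summary"]) for s in data["scenes"]]
    return sorted(scenes, key=lambda s: s.start_s)

# Example with a mock model response:
mock = '{"scenes": [{"start_s": 0, "end_s": 42.5, "summary": "Opening chase"}]}'
print(parse_scenes(mock)[0].summary)  # Opening chase
```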

Pass 2 — Physics Reasoning: The video is split into 15-second overlapping segments uploaded as inline data at native 24 FPS — leveraging Gemini 3 Pro's High-FPS Video Understanding. With Media Resolution HIGH (280 tokens/frame) and Thinking (high), the model performs deep physics reasoning to calculate force vectors for each moment. It emits haptic events via Function Calling using two tools: add_motion_event() (pitch/yaw/heave seat motion) and add_environment_effect() (wind/water/fog/strobe/scent). Each event includes an explainable reasoning field.
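A sketch of how Pass 2's tool calls could be collected into a haptic track. The two tool names come from the description above; their parameter lists and the track record format here are illustrative assumptions, not the actual declarations sent to the model.

```python
# Illustrative tool metadata (the real declarations would be full JSON schemas).
MOTION_TOOL = {
    "name": "add_motion_event",
    "description": "Seat motion: pitch/yaw/heave force vector at a timestamp.",
    "parameters": ["timestamp_s", "pitch", "yaw", "heave", "reasoning"],
}
ENVIRONMENT_TOOL = {
    "name": "add_environment_effect",
    "description": "Ambient 4DX effect (wind/water/fog/strobe/scent).",
    "parameters": ["timestamp_s", "effect", "intensity", "reasoning"],
}

def handle_tool_call(name: str, args: dict, track: list) -> None:
    """Route one model-emitted function call into the shared haptic track."""
    if name == "add_motion_event":
        track.append({"kind": "motion", **args})
    elif name == "add_environment_effect":
        track.append({"kind": "environment", **args})
    else:
        raise ValueError(f"unknown tool: {name}")

# Example: two calls as the model might emit them for a car-chase segment.
track = []
handle_tool_call("add_motion_event",
                 {"timestamp_s": 12.4, "pitch": -0.3, "yaw": 0.1, "heave": 0.8,
                  "reasoning": "Car lands hard after the jump"}, track)
handle_tool_call("add_environment_effect",
                 {"timestamp_s": 12.5, "effect": "wind", "intensity": 0.6,
                  "reasoning": "High-speed pursuit"}, track)
```

Keeping the explainable `reasoning` field on every event record means the downstream editor and UI can surface why each jolt or gust was scheduled.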

AI Editor: A natural-language timeline editor powered by Gemini 3 Flash lets users refine the generated haptic track conversationally.
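One way the conversational editor could work is for Gemini 3 Flash to translate an instruction like "soften the motion between 10s and 20s" into a small structured edit op that is then applied to the track. The op format (`op`/`start_s`/`end_s`/`factor`) is a hypothetical intermediate representation, not the project's actual protocol.

```python
def apply_edit(events: list[dict], op: dict) -> list[dict]:
    """Apply one structured edit op (hypothetical format) to the haptic track.

    Example op, as the editor model might emit it:
        {"op": "scale_intensity", "start_s": 10, "end_s": 20, "factor": 0.5}
    """
    if op["op"] == "scale_intensity":
        return [
            {**e, "intensity": e["intensity"] * op["factor"]}
            if op["start_s"] <= e["timestamp_s"] <= op["end_s"] else e
            for e in events
        ]
    raise ValueError(f"unsupported op: {op['op']}")

events = [{"timestamp_s": 12.0, "intensity": 0.8},
          {"timestamp_s": 30.0, "intensity": 0.4}]
edited = apply_edit(events, {"op": "scale_intensity",
                             "start_s": 10, "end_s": 20, "factor": 0.5})
```

Returning a new list rather than mutating in place keeps edits easy to undo, which matters for an iterative conversational workflow.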

Post-processing: Librosa snaps event timing to audio transients at 44.1kHz and blends RMS energy with Gemini's semantic intensity scores.
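The snapping-and-blending step can be sketched with plain NumPy, assuming onset times (e.g. from `librosa.onset.onset_detect`) and per-onset RMS energy have already been computed; the blend weight `alpha` is an illustrative assumption.

```python
import numpy as np

def snap_and_blend(event_times, semantic_intensity, onset_times, rms_at_onset,
                   alpha=0.6):
    """Snap each haptic event to its nearest audio onset, then blend the
    model's semantic intensity with the local RMS energy (alpha-weighted).
    Returns (snapped_times, blended_intensities)."""
    onset_times = np.asarray(onset_times, dtype=float)
    snapped, blended = [], []
    for t, sem in zip(event_times, semantic_intensity):
        i = int(np.argmin(np.abs(onset_times - t)))  # nearest transient
        snapped.append(float(onset_times[i]))
        blended.append(alpha * sem + (1 - alpha) * rms_at_onset[i])
    return snapped, blended

# Example: an event at 1.02 s snaps to the onset at 1.0 s.
times, levels = snap_and_blend([1.02], [0.8], [0.5, 1.0, 1.5], [0.2, 0.5, 0.9])
```

This keeps perceived impact aligned with what the viewer hears while still respecting the model's semantic judgment of how strong the moment should feel.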

Built With

  • fastapi
  • flash
  • gemini-3-pro
  • librosa
  • react-three-fiber
