SignCast: Real-Time Voice-to-Sign AI Translator


Inspiration

Communication is a fundamental human right, yet over 70 million Deaf people worldwide face significant barriers to accessing spoken content in real time. Captions exist, but they often fail to capture the nuances of sign language, the primary language for many in the Deaf community.

We built SignCast to provide a seamless, private, and powerful bridge between the hearing and Deaf worlds. Our goal was to create a tool that isn't just a translator, but an empowerment platform that works anywhere—from a doctor's office to a live lecture—without compromising user privacy.

What it does

SignCast is an AI-powered application that transcribes spoken English and translates it in near real time into two visual formats:

  1. SignWriting Notation: A precise, visual script that captures the physical movements, handshapes, and facial expressions of sign language.
  2. Animated 2D Poses: Real-time skeletal animations that provide a lifelike representation of signs.
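
To make the notation concrete: SignWriting is commonly serialized as Formal SignWriting (FSW) strings, the format the @sutton-signwriting libraries work with. The sketch below assumes SignCast's translator emits FSW, which this write-up does not state, and parses a single sign into its positioned symbols. The names `parse_fsw_sign`, `Sign`, and `PlacedSymbol` are hypothetical and not taken from the project.

```python
import re
from dataclasses import dataclass
from typing import List

# FSW building blocks: a symbol key such as "S14c20" and a coordinate such as "481x471".
SYMBOL = r"S[123][0-9a-f]{2}[0-5][0-9a-f]"
COORD = r"[0-9]{3}x[0-9]{3}"

# Optional temporal prefix ("A" + symbol keys), then a signbox: a marker letter,
# the box's max coordinate, and zero or more positioned symbols.
SIGN_RE = re.compile(rf"(?:A(?:{SYMBOL})+)?([BLMR])({COORD})((?:{SYMBOL}{COORD})*)")
SPATIAL_RE = re.compile(rf"({SYMBOL})([0-9]{{3}})x([0-9]{{3}})")


@dataclass
class PlacedSymbol:
    key: str  # which glyph (handshape, movement, facial expression, ...)
    x: int    # position inside the signbox
    y: int


@dataclass
class Sign:
    marker: str  # B (box) or L / M / R (lane)
    max_x: int
    max_y: int
    symbols: List[PlacedSymbol]


def parse_fsw_sign(fsw: str) -> Sign:
    """Parse one FSW sign string; raise ValueError if it is malformed."""
    match = SIGN_RE.fullmatch(fsw)
    if match is None:
        raise ValueError(f"not a valid FSW sign: {fsw!r}")
    marker, max_coord, spatials = match.groups()
    max_x, max_y = (int(n) for n in max_coord.split("x"))
    symbols = [
        PlacedSymbol(key, int(x), int(y))
        for key, x, y in SPATIAL_RE.findall(spatials)
    ]
    return Sign(marker, max_x, max_y, symbols)


sign = parse_fsw_sign("M518x529S14c20481x471S27106503x489")
print(sign.marker, [s.key for s in sign.symbols])
```

Each parsed symbol carries a glyph key and a 2D position, which is why the notation can record handshapes, movements, and facial expressions in one compact string.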

Key Features:

  • Real-Time Speech Capture: High-fidelity transcription using local AI models.
  • Intelligent Text Simplification: Optionally uses an LLM to simplify complex spoken English into sign-friendly structures.
  • Privacy-First (Local AI): Core translation and transcription models run entirely on the user's device.
  • Professional Export: Download translations as images for educational or documentation purposes.
  • Premium UI: A sleek, modern dashboard with dark mode and responsive design for all devices.

How we built it

We engineered a high-performance stack to handle the intensive requirements of real-time AI translation:

  • Frontend: Built with React + TypeScript and Vite for blistering speed. Styled with Tailwind CSS for a premium, accessible look.
  • Backend: A high-performance FastAPI server that orchestrates the local AI pipeline.
  • AI Models:
    • OpenAI Whisper: Used for robust, local speech-to-text.
    • Sockeye (NMT): A Neural Machine Translation model specifically fine-tuned for Text-to-SignWriting conversion.
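    • Groq API: Integrated for ultra-fast, optional LLM-based text simplification (using Llama 3.3 / Gemini models). Because this step calls a cloud service, it stays optional; transcription and translation remain on the device.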
  • Visualization: @sutton-signwriting for notation rendering and Pose Viewer for skeletal animations.
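
The way these pieces fit together can be sketched as a three-stage pipeline: transcribe, optionally simplify, then translate to SignWriting. This is a minimal illustration under stated assumptions, not the project's actual code: the stage functions are hypothetical stand-ins for Whisper, the Groq-hosted LLM, and Sockeye, and `run_pipeline` and `TranslationResult` are invented names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures. Real stages would wrap Whisper (speech-to-text),
# an LLM call (simplification), and Sockeye (text-to-SignWriting).
Transcriber = Callable[[bytes], str]
Simplifier = Callable[[str], str]
Translator = Callable[[str], str]


@dataclass
class TranslationResult:
    transcript: str
    simplified: str
    signwriting: str


def run_pipeline(
    audio: bytes,
    transcribe: Transcriber,
    translate: Translator,
    simplify: Optional[Simplifier] = None,
) -> TranslationResult:
    """Run the stages in order; the simplification stage may be omitted."""
    transcript = transcribe(audio).strip()
    simplified = transcript
    if simplify is not None:
        try:
            simplified = simplify(transcript)
        except Exception:
            # Assumption: if the optional cloud step fails (for example when
            # offline), fall back to the raw transcript instead of failing.
            simplified = transcript
    return TranslationResult(transcript, simplified, translate(simplified))


# Stub stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return " the doctor will see you now "


def fake_simplify(text: str) -> str:
    return "doctor see you now"


def fake_translate(text: str) -> str:
    return f"<signwriting for: {text}>"


local_only = run_pipeline(b"", fake_transcribe, fake_translate)
with_llm = run_pipeline(b"", fake_transcribe, fake_translate, fake_simplify)
print(local_only.signwriting)
print(with_llm.signwriting)
```

Passing the simplifier as an optional argument keeps the local-only path complete on its own, which matches the privacy-first goal described above.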

Challenges we ran into

  • Local Inference Latency: Running heavy ML models like Whisper and Sockeye locally while maintaining a snappy UI was a major hurdle. We reduced latency with PyTorch inference optimizations and kept the server responsive with FastAPI's asynchronous request handling.
  • SignWriting Rendering: Rendering complex SignWriting symbols as high-quality SVGs in real-time required deep integration with the Sutton SignWriting engine.
  • Audio Complexity: Handling different audio inputs (system audio vs. microphone) consistently across browsers and operating systems proved difficult.
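
One common way to keep an asynchronous server responsive during heavy inference is to push the blocking model call onto a worker thread. The sketch below shows that pattern with the standard library only; it is an assumption about how the latency problem could be handled, not a description of SignCast's code. `slow_inference` is a hypothetical stand-in for a Whisper or Sockeye call, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time
from typing import List, Tuple


def slow_inference(text: str) -> str:
    """Hypothetical stand-in for a blocking model call."""
    time.sleep(0.2)  # simulate compute that would otherwise stall the event loop
    return text.upper()


async def translate_async(text: str) -> str:
    # Run the blocking call in a worker thread so other coroutines keep running.
    return await asyncio.to_thread(slow_inference, text)


async def heartbeat(ticks: List[float]) -> None:
    # Represents other work (UI updates, new requests) sharing the event loop.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> Tuple[str, int]:
    ticks: List[float] = []
    result, _ = await asyncio.gather(translate_async("hello"), heartbeat(ticks))
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If `slow_inference` were called directly inside a coroutine, the heartbeat would pause until inference finished; with the worker thread, both make progress concurrently.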

Accomplishments that we're proud of

  • Low-Latency Sync: The "magic" moment when you speak and see the SignWriting symbols appear almost instantly is something we're incredibly proud of.
  • Aesthetic Integration: We didn't want this to look like a "research tool." We're proud of creating a product that feels like a premium, consumer-ready app.
  • Offline Capability: Demonstrating that high-quality AI doesn't always require a cloud connection, which means better privacy for sensitive conversations.

What we learned

  • The Beauty of SignWriting: We dove deep into the world of sign language linguistics and realized how much richer it is than just "hand signals." Learning to represent physical motion in a written script was a fascinating challenge.
  • User-Centric Design: We learned that for an accessibility tool, accessibility in the interface is just as important as the core technology.

What's next for SignCast

  • Multilingual Support: Expanding beyond English input to spoken languages such as Spanish, and supporting sign languages such as ASL, BSL, and more as output.
  • Holographic Display: Visualizing signs in 3D or AR for a more immersive experience.
  • Mobile App: Dedicated iOS and Android apps for on-the-go translation.
  • Collaborative Sessions: Allowing multiple users to join a translation stream in real-time.

Created with care for global accessibility.
