About the project

Inspiration

Aphasia affects millions of stroke survivors and people with neurodegenerative conditions. In many cases, the person’s intelligence and intent are fully there, but the words come out as fragments like “Water… want” or “Drive… store.” That gap between what someone means and what they can produce creates constant friction. Caregivers end up guessing, conversations slow down, and people can feel isolated.

We wanted to build something that makes communication feel fast and natural, and that looks and feels like modern tech rather than a stigmatizing medical device.

What Somatica does

Somatica is an AI-powered communication assistant for people with aphasia.

  • Listen: The user presses a large yellow button and speaks a few keywords
  • Transcribe: Speech is converted to text via Whisper
  • Interpret: A language model expands the fragment into intended meaning using context
  • Suggest: Somatica shows at most three clear sentence options, optimized for accessibility
  • Speak: The user presses a corresponding arcade button to speak the chosen sentence aloud using neural TTS
  • Support caregivers: A companion dashboard manages profiles and settings and exports research logs
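
To make the flow concrete, here is a minimal sketch of that loop in Python. It leans on the open-source Whisper package, the Anthropic SDK, and OpenAI text-to-speech; the helper names and model IDs are placeholders rather than Somatica's actual code.

```python
# Minimal sketch of the Listen -> Transcribe -> Interpret -> Suggest -> Speak loop.
# Assumes the open-source `openai-whisper` package, the `anthropic` SDK, and the
# `openai` SDK (for TTS) are installed and API keys are configured. Helper names
# and the model ID are placeholders, not Somatica's actual code.
import whisper
import anthropic
from openai import OpenAI

stt_model = whisper.load_model("base")   # Transcribe: speech -> text
claude = anthropic.Anthropic()           # Interpret: fragment -> intended meaning
tts = OpenAI()                           # Speak: chosen sentence -> audio


def suggest_sentences(fragment: str, max_options: int = 3) -> list[str]:
    """Expand a fragmented utterance into at most three natural sentences."""
    reply = claude.messages.create(
        model="claude-3-5-sonnet-latest",   # placeholder model name
        max_tokens=300,
        system=(
            "You expand fragmented speech from a person with aphasia into short, "
            f"natural first-person sentences. Return at most {max_options} options, "
            "one per line, grounded only in what the fragment says."
        ),
        messages=[{"role": "user", "content": fragment}],
    )
    lines = [line.strip() for line in reply.content[0].text.splitlines() if line.strip()]
    return lines[:max_options]


def speak(sentence: str, out_path: str = "reply.mp3") -> None:
    """Synthesize the chosen sentence with neural TTS and save it for playback."""
    audio = tts.audio.speech.create(model="tts-1", voice="alloy", input=sentence)
    with open(out_path, "wb") as f:
        f.write(audio.read())


# Listen (yellow button) -> Transcribe -> Interpret -> Suggest -> Speak
fragment = stt_model.transcribe("recording.wav")["text"]   # e.g. "Water... want"
options = suggest_sentences(fragment)
for number, option in enumerate(options, start=1):
    print(number, option)
speak(options[0])   # in Somatica, the red/blue/gray buttons make this choice
```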

Example

  • Input: “Water… want”
  • Suggestions:
    1. “I want some water.”
    2. “Can you bring me a glass of water, please?”
    3. “I am thirsty, can I get water?”

How we built it

We built Somatica as a combined hardware and software system.

Frontend and UI

  • Next.js 16 for the main app and caregiver companion dashboard
  • Tailwind CSS for an aphasia-optimized UI: high contrast, large tap targets, visual cues, and a strict limit of three options

Backend and AI pipeline

  • OpenAI Whisper for low-friction speech-to-text
  • Anthropic Claude as the intent engine that expands fragmented speech into natural sentences
  • ChromaDB as a vector store for retrieval-augmented generation (RAG), so Somatica can pull in relevant profile details and past conversation context
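
As an illustration of the memory layer, the sketch below shows how the caregiver dashboard could write facts into ChromaDB and how a query pulls the most relevant ones before the fragment ever reaches Claude. Collection names and the example facts are illustrative.

```python
# Sketch of the RAG memory layer: ChromaDB holds profile facts and conversation
# snippets, and a query pulls the few most relevant ones to prepend to the
# intent-expansion prompt. Collection names and example facts are illustrative.
import chromadb

chroma = chromadb.PersistentClient(path="somatica_memory")
memory = chroma.get_or_create_collection(name="profile_memory")

# The caregiver dashboard would write entries like these:
memory.upsert(
    ids=["fact-1", "fact-2", "log-1"],
    documents=[
        "The user's wife is named Maria.",
        "The user prefers water with ice.",
        "Yesterday the user asked to call Maria twice.",
    ],
    metadatas=[{"kind": "profile"}, {"kind": "profile"}, {"kind": "log"}],
)


def context_for(fragment: str, n_results: int = 3) -> str:
    """Return the most relevant memories as a text block for the LLM prompt."""
    hits = memory.query(query_texts=[fragment], n_results=n_results)
    return "\n".join(hits["documents"][0])


# "wife... call" surfaces the spouse's name before Claude ever sees the fragment.
print(context_for("wife... call"))
```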

Hardware

  • Raspberry Pi running a lightweight Python controller
  • Arcade buttons for tactile, high-contrast access
    • The yellow button starts recording
    • The red, blue, and gray buttons select options
  • Microphone and speaker integration for a dedicated communication box experience
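
The Pi-side controller boils down to mapping button presses to actions. A bare-bones sketch with gpiozero is below; the pin numbers are arbitrary, and the callbacks just print where the real controller records audio and calls the web backend.

```python
# Bare-bones sketch of the Pi-side button controller using gpiozero.
# Pin numbers are arbitrary examples, and the callbacks just print; the real
# controller records from the microphone and talks to the web backend.
from functools import partial
from signal import pause

from gpiozero import Button

YELLOW, RED, BLUE, GRAY = 17, 27, 22, 23   # BCM pin numbers (example wiring)


def start_recording():
    # Real controller: capture microphone audio and hand the WAV to the backend.
    print("yellow pressed: recording...")


def choose(option_index: int):
    # Real controller: ask the backend to speak suggestion #option_index via TTS.
    print(f"speak suggestion #{option_index}")


record_button = Button(YELLOW)
record_button.when_pressed = start_recording

option_buttons = [Button(pin) for pin in (RED, BLUE, GRAY)]
for index, button in enumerate(option_buttons, start=1):
    button.when_pressed = partial(choose, index)

pause()   # keep the process alive and wait for button presses
```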

Challenges we ran into

  • Audio latency: Time to speak is everything. If output takes too long, it stops feeling like conversation, so we streamlined the pipeline from transcription to generation to TTS.
  • Raspberry Pi audio: Getting reliable microphone and speaker routing on embedded Linux while also running a web server took real systems debugging.
  • Hallucinations: Early versions generated sentences that were fluent but not grounded in what the user said. We reduced this by tightening prompts, retrieving only relevant profile memory, and enforcing strict output constraints (sketched after this list).
  • UX constraints: Aphasia-friendly design is not just big buttons. It is about minimizing cognitive load, avoiding clutter, and keeping the set of choices small while still giving the user control.
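
For the hallucination point, the output constraint amounts to asking for a machine-checkable format and refusing anything that breaks it. A sketch of the idea, with prompt wording and fallback behavior that are illustrative rather than our exact code:

```python
# Sketch of the strict-output-constraint idea: ask for a JSON array of at most
# three sentences, then validate before showing anything to the user. Prompt
# wording and the fallback behavior are illustrative, not Somatica's exact code.
import json

import anthropic

claude = anthropic.Anthropic()


def constrained_suggestions(fragment: str) -> list[str]:
    reply = claude.messages.create(
        model="claude-3-5-sonnet-latest",   # placeholder model name
        max_tokens=200,
        system=(
            "Return ONLY a JSON array of 1 to 3 short sentences expressing what "
            "the user likely means. No commentary, no extra keys."
        ),
        messages=[{"role": "user", "content": fragment}],
    )
    raw = reply.content[0].text.strip()
    try:
        options = json.loads(raw)
    except json.JSONDecodeError:
        return [fragment]                   # fall back to echoing the fragment
    options = [o.strip() for o in options if isinstance(o, str) and o.strip()]
    return options[:3] or [fragment]
```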

Accomplishments that we're proud of

  • A tactile interface that feels empowering and fun, not clinical
  • A complete end-to-end loop: press a button, speak keywords, pick an option, hear it spoken aloud
  • Context awareness that improves disambiguation, for example mapping “wife” to the spouse’s name stored in the profile
  • A caregiver companion dashboard that turns configuration and research export into a simple workflow
  • An analyst-style summary of conversation logs that can surface patterns like recurring pain mentions or higher frustration at certain times

What we learned

  • Context is king: Generic language models struggle with AAC (augmentative and alternative communication). Personalization plus memory makes the same model dramatically more useful.
  • Constraints beat cleverness: Limiting to three grounded options reduces confusion and lowers hallucination risk.
  • Accessibility is product engineering: Latency, contrast, layout, and tactile input matter as much as model quality.
  • Integration is the hard part: Real world assistive tech lives or dies on reliability, not demos.

What's next for Somatica

  • Visual context: Add a camera so the system can incorporate what the user is pointing at, like a cup or TV remote
  • Voice banking: Let users speak with their own pre-injury voice through personalized TTS
  • Clinical pilot: Work with speech-language pathologists to validate usability, safety, and caregiver dashboard usefulness
  • Resilience and privacy: Better offline tolerance, clearer data controls, and configurable retention for logs

Built With

  • ai
  • anthropic-claude
  • chromadb
  • chromadb-for-rag-memory
  • claude
  • css
  • hardware
  • llm
  • next.js
  • openai-text-to-speech
  • openai-whisper
  • pwa
  • python-for-the-pi-button-controller
  • rag
  • raspberry-pi
  • raspberry-pi-hardware
  • tailwind
  • tailwind-css
  • tts
  • whisper