Project Story: Smart Audio Navigation System

Inspiration:

The inspiration for this project came from observing the challenges faced by visually impaired individuals navigating unfamiliar spaces. While developing assistive technologies, we noticed that many existing solutions rely heavily on expensive hardware or complex setups that aren't easily accessible or customizable.

We envisioned a system that could:

Provide real-time auditory feedback based on proximity to objects
Be highly customizable with different audio cues for different users
Leverage modern web technologies for an intuitive configuration interface
Use affordable, off-the-shelf components that anyone could assemble

Our goal was to create a proof-of-concept that demonstrates how embedded systems, web technologies, and AI can converge to create accessible, user-friendly assistive devices.

What We Learned:

Hardware

Integrating ultrasonic distance sensors, RFID readers, and servo motors in real time on the ESP32
Optimizing audio playback on constrained hardware using the ESP32's DAC (Digital-to-Analog Converter)
Using the Web Serial API for direct browser-to-hardware communication

Audio Processing

Converting various audio formats (MP3, WAV) to raw PCM data suitable for ESP32 DAC playback
Generating .h header files with audio sample arrays for embedded C/C++ compilation
Balancing audio quality with memory constraints (downsampling, bit depth reduction)

Web Development

Building a responsive dashboard using Next.js and TypeScript
Integrating the ElevenLabs API for AI-generated voice and sound effects
Creating an intuitive UI with distance-based audio configuration

Soft Skills

We learned to break down a complex multi-disciplinary project into manageable components: hardware assembly, firmware development, audio processing pipeline, web interface, and cloud storage.
Troubleshooting issues that spanned hardware wiring, Arduino C++ code, TypeScript web applications, and real-time serial communication taught us systematic debugging approaches.

Architecture Overview

Our system consists of four main components:

ESP32 Firmware (Arduino)
Web Dashboard (Next.js)
Audio Processing Pipeline
Cloud Storage (MongoDB)

Hardware Setup

ESP32 Development Board (brain of the system)
HC-SR04 Ultrasonic Sensor (distance measurement)
MFRC522 RFID Reader (user identification)
SG90 Servo Motor (sweeping sensor motion)
PAM8403 Audio Amplifier + Speaker (audio output)
SSD1306 OLED Display (status feedback)

Wiring & Integration: We carefully connected all components to the ESP32, mapping GPIO pins for the ultrasonic sensor (Trig: 33, Echo: 32), RFID (SPI interface), servo (PWM on pin 25), and DAC audio output (GPIO 26).

Firmware Development

The ESP32 firmware handles:

RFID card scanning for user identification
Ultrasonic distance measurement (0-250cm range)
Servo-controlled sensor sweeping
Distance-based audio trigger selection
DAC audio playback from SPIFFS
Serial communication protocol with web dashboard
Persistent settings storage using Preferences API

Key Challenges Solved:

Non-blocking sensor reads to prevent audio stuttering
Efficient memory management for audio buffers
Robust serial protocol for settings transfer
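
To illustrate what a robust serial protocol for settings transfer can look like on the dashboard side, here is a minimal sketch of a length-prefixed frame with a checksum. The frame layout, the start byte 0x7E, and the names encodeFrame and decodeFrame are hypothetical and chosen for this example; they are not the project's actual wire format.

```typescript
// Hypothetical frame layout (an assumption for illustration):
// [start 0x7E][type][length low byte][length high byte][payload...][XOR checksum]
const FRAME_START = 0x7e;

// XOR of all bytes; cheap to compute on both the browser and a microcontroller.
function xorChecksum(bytes: Uint8Array): number {
  let checksum = 0;
  for (let i = 0; i < bytes.length; i++) checksum ^= bytes[i];
  return checksum;
}

function encodeFrame(type: number, payload: Uint8Array): Uint8Array {
  if (payload.length > 0xffff) throw new Error("payload too large for a 16-bit length");
  const frame = new Uint8Array(payload.length + 5);
  frame[0] = FRAME_START;
  frame[1] = type & 0xff;
  frame[2] = payload.length & 0xff;
  frame[3] = (payload.length >> 8) & 0xff;
  frame.set(payload, 4);
  // The checksum covers everything between the start byte and the checksum itself.
  frame[frame.length - 1] = xorChecksum(frame.subarray(1, frame.length - 1));
  return frame;
}

// Returns null when the frame is truncated, mis-framed, or corrupted.
function decodeFrame(frame: Uint8Array): { type: number; payload: Uint8Array } | null {
  if (frame.length < 5 || frame[0] !== FRAME_START) return null;
  const length = frame[2] | (frame[3] << 8);
  if (frame.length !== length + 5) return null;
  const expected = xorChecksum(frame.subarray(1, frame.length - 1));
  if (expected !== frame[frame.length - 1]) return null;
  return { type: frame[1], payload: frame.slice(4, 4 + length) };
}
```

With a scheme like this, the receiver can reject a damaged frame and ask for a resend instead of storing corrupted settings. A single XOR byte will not catch every error, so a CRC would be a stronger choice if transfers prove unreliable.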

Web Dashboard

Built with modern web technologies:

Tech Stack:

Next.js 14 (App Router)
TypeScript
Tailwind CSS + shadcn/ui components
Web Serial API for ESP32 communication
ElevenLabs API integration

Features:

Custom Audio Upload (MP3/WAV files)
Tone Generator (custom frequency buzzer sounds)
AI Audio Generation (TTS and sound effects via ElevenLabs)
Visual Distance Range Configuration (0-250cm)
Direct ESP32 Connection via USB
Cloud Settings Storage per User
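
As a rough idea of how the tone generator feature can work, the sketch below synthesizes a sine wave directly as 8-bit unsigned samples. The function name generateTone and the default 16kHz sample rate are assumptions for this example, not the dashboard's actual code.

```typescript
// Generates a sine tone as 8-bit unsigned PCM (0..255, silence near 128).
// generateTone is a hypothetical helper written for this illustration.
function generateTone(frequencyHz: number, durationMs: number, sampleRate: number = 16000): Uint8Array {
  const sampleCount = Math.round((durationMs / 1000) * sampleRate);
  const samples = new Uint8Array(sampleCount);
  for (let n = 0; n < sampleCount; n++) {
    const value = Math.sin((2 * Math.PI * frequencyHz * n) / sampleRate); // -1..1
    samples[n] = Math.round((value + 1) * 127.5); // map -1..1 onto 0..255
  }
  return samples;
}

// Example: a 200 ms, 1 kHz beep is 3200 samples at 16kHz.
const exampleBeep = generateTone(1000, 200);
```

Because the output is already in the 8-bit unsigned format, a generated tone can skip the decode and downsample stages that uploaded files need.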

Audio Processing Pipeline

User uploads MP3/WAV or generates custom tone
Decode audio to PCM samples
Downsample to 16kHz (the rate we chose to balance quality against memory on the ESP32)
Convert to 8-bit unsigned format
Generate C header files with sample arrays
Transfer to ESP32 via Web Serial
Store in SPIFFS filesystem
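
The middle stages of this pipeline can be sketched as three small functions. This is a simplified illustration under stated assumptions: the input is already decoded to mono floating-point samples in the range -1 to 1, the resampler uses plain linear interpolation (a production version would low-pass filter first to limit aliasing), and the names resampleLinear, toUnsigned8, and toCHeader are hypothetical.

```typescript
// Resamples by linear interpolation. No anti-aliasing filter is applied here.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (input.length === 0) return new Float32Array(0);
  const outLength = Math.max(1, Math.floor((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const position = i * step;
    const left = Math.min(Math.floor(position), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const fraction = position - left;
    output[i] = input[left] * (1 - fraction) + input[right] * fraction;
  }
  return output;
}

// Maps float samples in -1..1 to unsigned 8-bit values in 0..255.
function toUnsigned8(samples: Float32Array): Uint8Array {
  const output = new Uint8Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    output[i] = Math.round((clamped + 1) * 127.5);
  }
  return output;
}

// Emits a C header containing the samples as a byte array, 12 values per row.
function toCHeader(name: string, sampleRate: number, data: Uint8Array): string {
  const lines: string[] = [
    "#pragma once",
    "#include <stdint.h>",
    "const uint32_t " + name + "_sample_rate = " + sampleRate + ";",
    "const uint32_t " + name + "_length = " + data.length + ";",
    "const uint8_t " + name + "_data[] = {",
  ];
  for (let i = 0; i < data.length; i += 12) {
    const row: string[] = [];
    for (let j = i; j < Math.min(i + 12, data.length); j++) row.push(String(data[j]));
    lines.push("  " + row.join(", ") + ",");
  }
  lines.push("};");
  return lines.join("\n");
}
```

Chaining them as toCHeader("beep", 16000, toUnsigned8(resampleLinear(decoded, 44100, 16000))) turns a decoded 44.1kHz clip into header text, where decoded stands for whatever the decode step produced.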

Distance-Based Triggering

Users configure zones (e.g., 0-50cm, 50-100cm, 100-200cm) and assign different audio cues to each. The ESP32:

Continuously measures distance
Checks which zone the measurement falls into
Plays the corresponding audio file
Switches seamlessly when crossing zone boundaries
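
The zone lookup described above can be sketched as follows. The firmware itself runs on the ESP32, so this TypeScript version only illustrates the logic (the same lookup is useful for the dashboard preview). The names AudioZone, findZone, and createZoneTracker and the example file names are hypothetical, and treating each zone as a half-open range is an assumption made so that a boundary value such as 50cm belongs to exactly one zone.

```typescript
interface AudioZone {
  minCm: number; // inclusive lower bound
  maxCm: number; // exclusive upper bound
  audioFile: string;
}

interface ZoneUpdate {
  audioFile: string | null; // null when the distance is outside every zone
  changed: boolean; // true only when the active cue differs from the previous reading
}

// Returns the first zone containing the distance, or null if none matches.
function findZone(zones: AudioZone[], distanceCm: number): AudioZone | null {
  for (let i = 0; i < zones.length; i++) {
    const zone = zones[i];
    if (distanceCm >= zone.minCm && distanceCm < zone.maxCm) return zone;
  }
  return null;
}

// Remembers the last active cue so playback is restarted only on a zone change.
function createZoneTracker(zones: AudioZone[]): (distanceCm: number) => ZoneUpdate {
  let current: string | null = null;
  return (distanceCm: number): ZoneUpdate => {
    const zone = findZone(zones, distanceCm);
    const next = zone ? zone.audioFile : null;
    const changed = next !== current;
    current = next;
    return { audioFile: next, changed: changed };
  };
}

// Example configuration matching the zones mentioned above.
const exampleZones: AudioZone[] = [
  { minCm: 0, maxCm: 50, audioFile: "near.raw" },
  { minCm: 50, maxCm: 100, audioFile: "mid.raw" },
  { minCm: 100, maxCm: 200, audioFile: "far.raw" },
];
```

Reporting changed separately from the current cue lets the caller keep a clip playing while readings stay inside one zone. Real sensor readings jitter, so a small hysteresis margin at the boundaries may be worth adding on top of this.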

Real-Time Visualization

The web dashboard includes a live preview mode that:

Animates a slider moving from 0 to 250cm
Plays the configured audio for each zone as the slider passes through
Provides immediate feedback on audio assignments
Keeps audio playback in sync with the slider's visual position

Challenges We Faced

1. Audio Playback Quality

Initial audio playback was choppy and distorted because blocking sensor reads stalled the playback loop. To fix this, we implemented non-blocking distance measurement with timeouts and optimized DAC write timing for smooth playback.
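
To see why a timeout matters, it helps to work out how long an ultrasonic echo can take. The sketch below assumes sound travels at roughly 343 m/s (air at about 20 °C) and uses hypothetical helper names; it shows the arithmetic only and is not the firmware code.

```typescript
// Approximate speed of sound in air at about 20 °C, in centimetres per microsecond.
const SPEED_OF_SOUND_CM_PER_US = 0.0343;

// The echo travels to the object and back, so the one-way distance is half the path.
function echoToDistanceCm(echoDurationUs: number): number {
  return (echoDurationUs * SPEED_OF_SOUND_CM_PER_US) / 2;
}

// Longest echo worth waiting for, given the maximum range of interest.
function echoTimeoutUs(maxRangeCm: number): number {
  return Math.ceil((2 * maxRangeCm) / SPEED_OF_SOUND_CM_PER_US);
}

// For the 250cm range used here, the timeout comes out a little under 15 ms.
const timeoutFor250cm = echoTimeoutUs(250);
```

Capping each read at that duration bounds how long a missing echo can hold up the audio loop. If the firmware reads the sensor with Arduino's pulseIn, which is an assumption, this value would be passed as its timeout argument in microseconds.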

2. Memory Constraints

The ESP32 has limited RAM (~520KB) and limited SPIFFS storage, but audio files can be large. To reduce their size, we downsampled all audio to 16kHz mono and reduced the bit depth to 8 bits. We also stream audio from SPIFFS instead of loading it into RAM.
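
The savings are easy to quantify with a little arithmetic. The helper below is a hypothetical illustration, and the comparison baseline of 44.1kHz 16-bit stereo is an assumption about what a typical uploaded file decodes to.

```typescript
// Size in bytes of uncompressed PCM audio.
function pcmSizeBytes(seconds: number, sampleRate: number, bitsPerSample: number, channels: number): number {
  return Math.ceil(seconds * sampleRate * channels * (bitsPerSample / 8));
}

// A 3-second clip at the assumed source quality vs. the reduced format.
const sourceBytes = pcmSizeBytes(3, 44100, 16, 2); // 529200 bytes
const reducedBytes = pcmSizeBytes(3, 16000, 8, 1); // 48000 bytes
const reductionFactor = sourceBytes / reducedBytes; // about 11x smaller
```

At the assumed source quality, three seconds of audio is already about as large as the ESP32's entire RAM, while the reduced format needs 16000 bytes per second.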

3. Cross-Domain Debugging

Bugs could originate from the hardware, the firmware, the web app, or the interfaces between them. We added extensive serial debug logging and adopted a systematic approach: test each component in isolation first, then test the integrated system.

4. Saving User Settings

Each user needs their own custom audio and range settings available on the ESP32. We implemented RFID-based user identification, with per-user settings stored both in the cloud (MongoDB) and locally on the device.