Inspiration
Public speaking is often cited as one of the most common fears, yet it's a critical skill for success in almost every field. While there are many tools for "writing" better presentations, there are very few for "delivering" them. We wanted to build an AI-powered coach that provides the kind of objective, granular feedback that usually requires a human coach.
What it does
Peak Performance (Presentation Grader) is an iPad-optimized application that serves as a personal communication lab. It records your presentation and uses a sophisticated multi-modal AI pipeline to grade your performance.
- Voice Analysis: Measures clarity, pace (WPM), volume consistency, and detects filler words ("um," "uh," "like").
- Video Analysis: Uses computer vision to track eye contact, posture, and hand gestures.
- Hybrid Scoring: Combines these metrics into an overall "A-F" grade with actionable, personalized feedback for improvement.
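To make the hybrid scoring concrete, here is a minimal sketch of how voice and video metrics might be blended into a letter grade. The metric names, weights, WPM band, and grade boundaries are illustrative assumptions, not the app's actual values.

```swift
import Foundation

// Illustrative inputs from the two pipelines (assumed shape, not the app's real model).
struct DeliveryMetrics {
    let wordsPerMinute: Double    // voice: pace
    let fillerWordCount: Int      // voice: "um", "uh", "like"
    let eyeContactRatio: Double   // video: 0...1
    let postureStability: Double  // video: 0...1
}

// Blend voice and video signals into a 0...1 score, then bucket into A-F.
func letterGrade(for m: DeliveryMetrics, durationMinutes: Double) -> String {
    // Full pace credit inside an assumed conversational 110-160 WPM band.
    let paceScore = (110.0...160.0).contains(m.wordsPerMinute) ? 1.0 : 0.6
    // Penalize filler words per minute of speech (assumed decay rate).
    let fillersPerMinute = Double(m.fillerWordCount) / max(durationMinutes, 0.1)
    let fillerScore = max(0.0, 1.0 - fillersPerMinute / 6.0)
    // Illustrative weights; the app's real blend differs.
    let total = 0.3 * paceScore + 0.2 * fillerScore +
                0.3 * m.eyeContactRatio + 0.2 * m.postureStability
    switch total {
    case 0.9...:    return "A"
    case 0.8..<0.9: return "B"
    case 0.7..<0.8: return "C"
    case 0.6..<0.7: return "D"
    default:        return "F"
    }
}
```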
How we built it
We built the app using Cursor, Swift, and Xcode, leveraging a "Hybrid AI" architecture to balance speed and accuracy:
- The Voice Pipeline: We combined Apple's SFSpeechRecognizer for real-time feedback, WhisperKit for high-accuracy on-device grading, and the AssemblyAI API for deep cloud-based transcription (the real-time leg is sketched after this list).
- The Video Pipeline: We utilized Apple's Vision framework and AVFoundation to perform real-time face detection, pose estimation, and gesture recognition (see the pose sketch below).
- The Orchestrator: A robust PresentationViewModel managed the complex asynchronous flow between local hardware and cloud APIs using Swift's async/await and actors (a minimal sketch follows the pipeline examples below).
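For the real-time leg of the voice pipeline, a minimal SFSpeechRecognizer setup looks roughly like this. Error handling, the WhisperKit pass, and the AssemblyAI upload are omitted, and `handlePartial` is a hypothetical callback:

```swift
import Speech
import AVFoundation

// Streams microphone audio into a live recognition request and surfaces
// partial transcripts. Assumes speech/mic permission was already granted.
final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(handlePartial: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true   // live feedback while speaking
        self.request = request

        // Tap the mic and feed buffers into the recognition request.
        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let text = result?.bestTranscription.formattedString {
                handlePartial(text)   // e.g. update a live WPM / filler counter
            }
        }
    }

    func stop() {
        audioEngine.inputNode.removeTap(onBus: 0)
        audioEngine.stop()
        request?.endAudio()
        task?.cancel()
    }
}
```

On the video side, the Vision framework's body-pose request is the core building block. This sketch runs `VNDetectHumanBodyPoseRequest` on a single camera frame and reads the shoulder joints; how those feed the app's posture and gesture metrics is simplified here:

```swift
import Vision

// Run body-pose detection on one camera frame.
func detectPose(in pixelBuffer: CVPixelBuffer) throws -> VNHumanBodyPoseObservation? {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try handler.perform([request])
    return request.results?.first
}

// Read the shoulder joints that a posture metric could consume (assumed usage).
func shoulderPoints(from observation: VNHumanBodyPoseObservation) throws -> (CGPoint, CGPoint)? {
    let left = try observation.recognizedPoint(.leftShoulder)
    let right = try observation.recognizedPoint(.rightShoulder)
    // Ignore low-confidence detections; 0.3 is an illustrative cutoff.
    guard left.confidence > 0.3, right.confidence > 0.3 else { return nil }
    return (left.location, right.location)   // normalized image coordinates
}
```

And a minimal sketch of the orchestration pattern: a main-actor view model fanning out to both pipelines concurrently with `async let`. The analyzer types here are hypothetical stubs, not the app's real pipeline code:

```swift
import Foundation
import Combine

// Hypothetical stand-ins for the voice and video pipelines.
struct VoiceAnalyzer { func analyze(_ url: URL) async -> Double { 0.8 } }
struct VideoAnalyzer { func analyze(_ url: URL) async -> Double { 0.7 } }

@MainActor
final class PresentationViewModel: ObservableObject {
    @Published var overallScore: Double?

    func analyze(recording: URL) async {
        // `async let` runs both pipelines in parallel; @MainActor isolation
        // guarantees the published property is only mutated on the main thread.
        async let voice = VoiceAnalyzer().analyze(recording)
        async let video = VideoAnalyzer().analyze(recording)
        let (v, b) = await (voice, video)
        overallScore = (v + b) / 2
    }
}
```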
Challenges we ran into
- Concurrency & Performance: Running heavy ML models (WhisperKit) alongside real-time video processing (Vision) on an iPad was resource-intensive. We had to implement strict memory management and actor isolation to prevent UI lag.
- Model Cold Starts: WhisperKit models are large. Designing an intuitive "AnalysisView" that keeps the user engaged while models download and process was a UX challenge.
- Permission Spiral: Handling simultaneous access to the camera, microphone, and speech recognition APIs required a robust PermissionsManager to ensure a smooth onboarding flow (sketched after this list).
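A minimal sketch of the sequential permission flow, assuming the iOS 15+ async APIs; the real PermissionsManager also has to handle denial states and deep links to Settings, which are omitted here:

```swift
import AVFoundation
import Speech

// Requests camera, microphone, and speech-recognition access in sequence.
// Assumes the matching usage-description keys are present in Info.plist.
enum PermissionsManager {
    static func requestAll() async -> Bool {
        let camera = await AVCaptureDevice.requestAccess(for: .video)
        let mic = await AVCaptureDevice.requestAccess(for: .audio)
        // SFSpeechRecognizer only offers a callback API, so bridge it.
        let speech = await withCheckedContinuation { continuation in
            SFSpeechRecognizer.requestAuthorization { status in
                continuation.resume(returning: status == .authorized)
            }
        }
        return camera && mic && speech
    }
}
```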
Accomplishments that we're proud of
- Hybrid Transcription: Successfully syncing three different speech technologies to work as a unified grading engine.
- On-Device Privacy: Processing significant portions of the body language and voice data directly on the iPad, keeping user data secure.
- Intuitive UI: Creating a "ResultsView" that turns complex data (like pose estimation coordinates) into easy-to-understand metrics for the user.
What we learned
- Computer Vision Nuance: We learned that "confidence" in a presentation can actually be quantified through metrics like shoulder stability and hand-gesture frequency (a toy stability metric is sketched after this list).
- Swift Package Management: Integrating complex external dependencies like WhisperKit taught us a lot about modern iOS build configurations.
- ML Accuracy vs. Latency: We learned when to use on-device models for speed and when to offload to the cloud (AssemblyAI) for maximum linguistic accuracy.
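As an illustration of the shoulder-stability idea, this toy metric treats lower frame-to-frame drift of the shoulder midpoint (in Vision's normalized coordinates) as steadier posture; the scaling constant is an assumption:

```swift
import CoreGraphics

// Map a series of per-frame shoulder midpoints to a 0...1 stability score:
// the less the midpoint drifts between frames, the higher the score.
func shoulderStability(midpoints: [CGPoint]) -> Double {
    guard midpoints.count > 1 else { return 1.0 }
    // Mean frame-to-frame displacement of the shoulder midpoint.
    var total = 0.0
    for (a, b) in zip(midpoints, midpoints.dropFirst()) {
        total += Double(hypot(b.x - a.x, b.y - a.y))
    }
    let meanDrift = total / Double(midpoints.count - 1)
    // Scale drift into a 0...1 score; the factor of 20 is illustrative.
    return max(0, 1.0 - meanDrift * 20)
}
```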
What's next for Peak Performance
- Historical Tracking: A dashboard to show how a user’s "Filler Word" count decreases over months of practice.
- Live Nudges: Implementing haptic feedback or visual cues that subtly alert the presenter in real time if they are speaking too fast or looking away from the camera (a minimal sketch below).
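A minimal sketch of what a pace nudge could look like, assuming a rolling WPM estimate is already available; the 170 WPM threshold and rate limit are illustrative, and haptic hardware support varies across iPad models:

```swift
import UIKit

// Fires a gentle haptic tap when the speaker's rolling pace is too fast,
// rate-limited so the nudge stays subtle rather than nagging.
final class PaceNudger {
    private let generator = UIImpactFeedbackGenerator(style: .light)
    private var lastNudge = Date.distantPast

    func check(currentWPM: Double) {
        guard currentWPM > 170,
              Date().timeIntervalSince(lastNudge) > 10 else { return }
        generator.impactOccurred()
        lastNudge = Date()
    }
}
```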