Inspiration

Public speaking is often cited as one of the most common fears, yet it's a critical skill for success in almost every field. Plenty of tools help you write a better presentation; very few help you deliver one. We wanted to build an AI-powered coach that provides the kind of objective, granular feedback that usually requires a human coach.

What it does

Peak Performance (Presentation Grader) is an iPad-optimized application that serves as a personal communication lab. It records your presentation and uses a sophisticated multi-modal AI pipeline to grade your performance.

  • Voice Analysis: Measures clarity, pace (WPM), and volume consistency, and detects filler words ("um," "uh," "like"); see the sketch after this list.
  • Video Analysis: Uses computer vision to track eye contact, posture, and hand gestures.
  • Hybrid Scoring: Combines these metrics into an overall letter grade (A–F) with actionable, personalized feedback for improvement.
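
For illustration, here is a minimal sketch of how the voice metrics could be computed from a finished transcript. The type and function names are hypothetical, not the app's actual API; in the real pipeline the transcript comes from the speech stack described below.

```swift
import Foundation

/// Illustrative container for two of the voice metrics.
struct VoiceMetrics {
    let wordsPerMinute: Double
    let fillerCount: Int
}

/// Counts filler words and computes pace from a plain transcript.
func analyzeTranscript(_ transcript: String, duration: TimeInterval) -> VoiceMetrics {
    let fillers: Set<String> = ["um", "uh", "like"]
    let words = transcript.lowercased()
        .components(separatedBy: .whitespacesAndNewlines)
        .filter { !$0.isEmpty }
    // Note: naive matching will also flag legitimate uses of "like".
    let fillerCount = words
        .map { $0.trimmingCharacters(in: .punctuationCharacters) }
        .filter { fillers.contains($0) }
        .count
    guard duration > 0 else {
        return VoiceMetrics(wordsPerMinute: 0, fillerCount: fillerCount)
    }
    let wordsPerMinute = Double(words.count) / (duration / 60.0)
    return VoiceMetrics(wordsPerMinute: wordsPerMinute, fillerCount: fillerCount)
}
```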

How we built it

We built the app using Cursor, Swift, and Xcode, leveraging a "Hybrid AI" architecture to balance speed and accuracy:

  • The Voice Pipeline: We combined Apple's SFSpeechRecognizer for real-time feedback, WhisperKit for high-accuracy on-device ML grading, and the AssemblyAI API for deep cloud-based transcription.
  • The Video Pipeline: We utilized Apple’s Vision Framework and AVFoundation to perform real-time face detection, pose estimation, and gesture recognition.
  • The Orchestrator: A robust PresentationViewModel manages the asynchronous flow between local hardware and cloud APIs using Swift's async/await and actors, as sketched below.
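
A rough sketch of that orchestration pattern (the report types and pipeline stubs are illustrative, not the app's actual implementation): an actor isolates shared analysis state while the view model runs both pipelines concurrently.

```swift
import Foundation
import Combine

struct VoiceReport { let wordsPerMinute: Double }
struct VideoReport { let eyeContactRatio: Double }

/// Actor isolation keeps concurrent writes from the two pipelines safe.
actor AnalysisStore {
    private(set) var voice: VoiceReport?
    private(set) var video: VideoReport?
    func store(voice report: VoiceReport) { voice = report }
    func store(video report: VideoReport) { video = report }
}

@MainActor
final class PresentationViewModel: ObservableObject {
    @Published var isAnalyzing = false
    private let store = AnalysisStore()

    func analyze(recordingURL: URL) async throws {
        isAnalyzing = true
        defer { isAnalyzing = false }
        // Run the voice and video pipelines concurrently, then gather results.
        async let voice = runVoicePipeline(recordingURL)
        async let video = runVideoPipeline(recordingURL)
        let voiceReport = try await voice
        let videoReport = try await video
        await store.store(voice: voiceReport)
        await store.store(video: videoReport)
    }

    // Stubs standing in for the WhisperKit/AssemblyAI and Vision work.
    private func runVoicePipeline(_ url: URL) async throws -> VoiceReport {
        VoiceReport(wordsPerMinute: 140)
    }
    private func runVideoPipeline(_ url: URL) async throws -> VideoReport {
        VideoReport(eyeContactRatio: 0.8)
    }
}
```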

Challenges we ran into

  • Concurrency & Performance: Running heavy ML models (WhisperKit) alongside real-time video processing (Vision) on an iPad was resource-intensive. We had to implement strict memory management and actor isolation to prevent UI lag.
  • Model Cold Starts: WhisperKit models are large. Designing an intuitive "AnalysisView" that keeps the user engaged while models download and process was a UX challenge.
  • Permission Spiral: Handling simultaneous access to the camera, microphone, and speech-recognition APIs required a robust PermissionsManager to ensure a smooth onboarding flow; see the sketch after this list.
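
A condensed take on that idea, using the standard AVFoundation and Speech authorization calls (the class shape is illustrative): request all three permissions up front so a session never stalls on a mid-recording prompt.

```swift
import AVFoundation
import Speech

final class PermissionsManager {
    /// Requests camera, microphone, and speech-recognition access;
    /// returns true only if all three were granted.
    func requestAll() async -> Bool {
        let camera = await AVCaptureDevice.requestAccess(for: .video)
        let microphone = await AVCaptureDevice.requestAccess(for: .audio)
        let speech = await withCheckedContinuation { (continuation: CheckedContinuation<Bool, Never>) in
            SFSpeechRecognizer.requestAuthorization { status in
                continuation.resume(returning: status == .authorized)
            }
        }
        return camera && microphone && speech
    }
}
```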

Accomplishments that we're proud of

  • Hybrid Transcription: Successfully syncing three different speech technologies to work as a unified grading engine.
  • On-Device Privacy: Processing significant portions of the body language and voice data directly on the iPad, keeping user data secure.
  • Intuitive UI: Creating a "ResultsView" that turns complex data (like pose-estimation coordinates) into easy-to-understand metrics for the user; one example of that mapping follows this list.
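
As one small example of that translation, a combined 0–100 score might map to the displayed letter grade like this. The band boundaries below are purely illustrative, not the app's actual thresholds:

```swift
/// Hypothetical mapping from a combined 0–100 score to the A–F grade
/// surfaced in the results screen; the bands are illustrative.
func letterGrade(for score: Double) -> String {
    switch score {
    case 90...: return "A"
    case 80..<90: return "B"
    case 70..<80: return "C"
    case 60..<70: return "D"
    default: return "F"
    }
}
```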

What we learned

  • Computer Vision Nuance: We learned that "confidence" in a presentation can actually be quantified through metrics like shoulder stability and hand-gesture frequency (see the sketch after this list).
  • Swift Package Management: Integrating complex external dependencies like WhisperKit taught us a lot about modern iOS build configurations.
  • ML Accuracy vs. Latency: We learned when to use on-device models for speed and when to offload to the cloud (AssemblyAI) for maximum linguistic accuracy.
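
For instance, shoulder stability can be approximated as the inverse of frame-to-frame variance in shoulder height. The sketch below uses Vision's body-pose observations; the confidence cutoff and scaling constant are assumptions, not tuned values.

```swift
import Foundation
import Vision

/// Hypothetical stability score in [0, 1]: lower variance in shoulder
/// height across frames reads as a steadier, more confident stance.
func shoulderStability(from observations: [VNHumanBodyPoseObservation]) -> Double {
    let heights: [Double] = observations.compactMap { observation in
        guard
            let left = try? observation.recognizedPoint(.leftShoulder),
            let right = try? observation.recognizedPoint(.rightShoulder),
            left.confidence > 0.3, right.confidence > 0.3
        else { return nil }
        // Average normalized y-position of both shoulders in this frame.
        return Double(left.location.y + right.location.y) / 2.0
    }
    guard heights.count > 1 else { return 0 }
    let mean = heights.reduce(0, +) / Double(heights.count)
    let variance = heights.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(heights.count)
    // The factor of 10 is an illustrative scaling assumption.
    return max(0, 1 - sqrt(variance) * 10)
}
```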

What's next for Peak Performance

  • Historical Tracking: A dashboard to show how a user’s "Filler Word" count decreases over months of practice.
  • Live Nudges: Implementing haptic feedback or visual cues that subtly alert the presenter in real time if they are speaking too fast or looking away from the camera (sketched below).
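
One possible shape for the pace nudge. The WPM threshold is an illustrative assumption, and on iPads without a Taptic Engine the same check could drive a visual cue instead:

```swift
import UIKit

struct LiveNudger {
    /// Illustrative pace ceiling; a shipped version would tune this.
    let maxWordsPerMinute: Double = 170
    private let haptics = UINotificationFeedbackGenerator()

    /// Fires a gentle "slow down" tap when the live pace exceeds the cap.
    func check(currentWPM: Double) {
        if currentWPM > maxWordsPerMinute {
            haptics.notificationOccurred(.warning)
        }
    }
}
```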

Built With

  • Swift, Xcode, and Cursor
  • Speech (SFSpeechRecognizer) and WhisperKit
  • AssemblyAI API
  • Vision Framework and AVFoundation