Inspiration

Technical hiring is broken. During a conversation with Narayana Aaditya from SkillSync (YC W26), a realization hit me: as AI agents grow more capable, LeetCode-style interviews will be obsolete within a few years. Startups don't need engineers who can invert binary trees on a whiteboard; they need builders who ship PRs fast, prompt effectively, and operate with high agency.

Gitty.ai was born from this insight: what if we replaced algorithmic puzzles with real GitHub work? Instead of artificial coding challenges, we let engineers prove their skills by solving actual issues from real repositories, using whatever tools they want, including AI agents. This is how modern development actually works.

What it does

Gitty.ai is a two-sided hiring platform that turns GitHub contributions into your technical interview.

For Engineers:

  • Complete timed interview challenges (3 issues, 30 minutes) that mirror real work you'd do on the job
  • Solve practice issues to build your developer profile
  • Compete for paid bounties by submitting PR solutions to real repository issues
  • Use AI agents, any IDE, or whatever tools you normally code with. We evaluate the output, not how you got there
  • Build a profile showcasing actual merged PRs and problem-solving ability

For Companies:

  • Post bounties for real issues in your repository that need fixing
  • Create custom role applications with timed issue challenges
  • Review candidates through their actual code contributions, not whiteboard performance
  • Filter by skills, PR history, and real GitHub activity
  • Select bounty winners and hire engineers who've already proven they can contribute to your codebase

Our AI-powered evaluation system analyzes PRs holistically: code quality, approach, completeness, and alignment with best practices.
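Concretely, the evaluation comes back as a structured breakdown rather than one opaque number. A minimal sketch of the output shape, with illustrative field names rather than our exact schema (Pydantic pairs naturally with a FastAPI backend):

```python
# Hypothetical shape of the evaluator's structured output (field names illustrative).
from pydantic import BaseModel, Field

class PREvaluation(BaseModel):
    code_quality: int = Field(ge=0, le=100)    # readability, style, structure
    approach: int = Field(ge=0, le=100)        # soundness of the chosen solution
    completeness: int = Field(ge=0, le=100)    # does the PR fully resolve the issue?
    best_practices: int = Field(ge=0, le=100)  # tests, error handling, conventions
    overall: int = Field(ge=0, le=100)         # normalized headline score
    feedback: list[str]                        # actionable review comments
```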

How we built it

Backend (FastAPI + Python):

  • Single-service architecture with main.py orchestrating the evaluation workflow
  • GitHub REST API integration for issue/PR metadata, comments, file diffs, and README context
  • Browserbase + Playwright CDP for crawling rendered GitHub pages
  • Claude (Anthropic) API for intelligent PR evaluation with structured JSON output
  • Model fallback system across Claude Opus/Sonnet variants (sketched after this list)
  • Environment-driven config via ANTHROPIC_API_KEY, GITHUB_TOKEN, BROWSERBASE_WS_ENDPOINT
  • Render for hosting
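
The fallback logic is simple: try the strongest model first and degrade gracefully. A hedged sketch of that loop (the model IDs, order, and error handling here are illustrative assumptions, not our exact code):

```python
# Sketch of the Claude model-fallback loop (model IDs are assumptions).
import os
import anthropic

FALLBACK_MODELS = [
    "claude-3-opus-20240229",      # best quality, slowest and most expensive
    "claude-3-7-sonnet-20250219",  # mid-tier fallback
    "claude-3-5-sonnet-20241022",  # cheapest fallback
]

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def evaluate_with_fallback(prompt: str) -> str:
    last_error = None
    for model in FALLBACK_MODELS:
        try:
            response = client.messages.create(
                model=model,
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        except anthropic.APIError as err:  # overloaded, rate-limited, etc.
            last_error = err
    raise RuntimeError("All Claude models failed") from last_error
```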

Frontend (React + TypeScript + Vite):

  • React Router for multi-page navigation
  • Firebase Auth + Firestore for user management and data persistence
  • GitHub OAuth for developer authentication
  • Dual onboarding flows: engineers (GitHub) and companies (Google)
  • Real-time interview timer with localStorage state management
  • Custom CSS + Tailwind for styling
  • Vercel deployment pipeline

Evaluation Pipeline:

  1. Validate issue + PR URLs from same repo
  2. Fetch all GitHub context (metadata, comments, diffs, base files)
  3. Crawl rendered pages for additional context
  4. Send structured prompt to Claude with repo README, issue description, PR changes
  5. Parse AI response into normalized score (0-100) with actionable feedback
  6. Return to frontend for display (condensed sketch below)
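
Tying the steps together, a condensed sketch of the flow (helper names like fetch_github_context and build_prompt are illustrative; some are sketched elsewhere in this writeup):

```python
# Condensed, illustrative sketch of the evaluation flow.
import re

# Matches https://github.com/{owner}/{repo}/issues/{n} and .../pull/{n}
GITHUB_URL = re.compile(r"https://github\.com/([^/]+)/([^/]+)/(issues|pull)/(\d+)")

def evaluate(issue_url: str, pr_url: str) -> dict:
    # 1. Validate that both URLs parse and point at the same owner/repo.
    issue, pr = GITHUB_URL.match(issue_url), GITHUB_URL.match(pr_url)
    if not (issue and pr and issue.group(1, 2) == pr.group(1, 2)):
        raise ValueError("Issue and PR must belong to the same repository")
    owner, repo = issue.group(1), issue.group(2)

    # 2-3. Gather context: REST metadata plus rendered-page crawl.
    context = fetch_github_context(owner, repo, int(issue.group(4)), int(pr.group(4)))
    context["rendered"] = crawl_rendered_page(pr_url)

    # 4. Build the structured prompt and ask Claude (with model fallback).
    raw = evaluate_with_fallback(build_prompt(context))

    # 5. Parse and clamp to a normalized 0-100 score.
    result = parse_evaluation(raw)
    result["overall"] = max(0, min(100, int(result["overall"])))

    # 6. The FastAPI route returns this dict for the frontend to display.
    return result
```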

Challenges we ran into

Context assembly. GitHub's API spreads data across multiple endpoints. We had to stitch together issue comments, PR review comments, file patches, and base file content into coherent context without hitting rate limits.
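
For illustration, here is roughly how that stitching can look against GitHub's REST v3 endpoints (the routes are real; the helper itself and its naive rate-limit back-off are a sketch, not our production code):

```python
# Hedged sketch of assembling GitHub context while respecting rate limits.
import os
import time
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def gh_get(path: str) -> requests.Response:
    resp = requests.get(f"{API}{path}", headers=HEADERS, timeout=30)
    if resp.status_code == 403 and resp.headers.get("X-RateLimit-Remaining") == "0":
        reset = int(resp.headers.get("X-RateLimit-Reset", time.time() + 60))
        time.sleep(max(0, reset - time.time()))  # back off until the window resets
        resp = requests.get(f"{API}{path}", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp

def fetch_github_context(owner: str, repo: str, issue_no: int, pr_no: int) -> dict:
    base = f"/repos/{owner}/{repo}"
    return {
        "issue":           gh_get(f"{base}/issues/{issue_no}").json(),
        "issue_comments":  gh_get(f"{base}/issues/{issue_no}/comments").json(),
        "pr_files":        gh_get(f"{base}/pulls/{pr_no}/files").json(),  # per-file patches
        "review_comments": gh_get(f"{base}/pulls/{pr_no}/comments").json(),
        "readme":          gh_get(f"{base}/readme").json(),  # base64-encoded content
    }
```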

Prompt engineering for consistent scoring. Getting Claude to return reliable, normalized scores with structured feedback took multiple iterations. We needed strict JSON schemas and explicit scoring criteria to make output machine-readable.
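
A sketch of the pattern that eventually worked: spell out the schema inside the instructions, then parse defensively (the prompt wording and key names here are illustrative):

```python
# Illustrative scoring-prompt scaffold plus defensive JSON extraction.
import json
import re

SCORING_INSTRUCTIONS = """
Score the pull request against the issue on a 0-100 scale.
Respond with ONLY a JSON object, no prose, matching exactly:
{"code_quality": int, "approach": int, "completeness": int,
 "best_practices": int, "overall": int, "feedback": [str, ...]}
"""

def parse_evaluation(raw: str) -> dict:
    # Models occasionally wrap JSON in markdown fences; extract the object itself.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    data = json.loads(match.group(0))
    missing = {"overall", "feedback"} - data.keys()
    if missing:
        raise ValueError(f"Schema violation, missing keys: {missing}")
    return data
```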

Browserbase integration. Connecting via CDP websockets and reliably extracting visible text from GitHub's dynamic UI required careful Playwright scripting and error handling for connection timeouts.
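
A minimal sketch of that connection, assuming the BROWSERBASE_WS_ENDPOINT from our env config (timeout values and fallback behavior are illustrative):

```python
# Sketch: attach to a remote Browserbase browser over CDP and grab visible text.
import os
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeout

def crawl_rendered_page(url: str) -> str:
    with sync_playwright() as p:
        # Connect to the remote browser over its CDP websocket endpoint.
        browser = p.chromium.connect_over_cdp(os.environ["BROWSERBASE_WS_ENDPOINT"])
        try:
            ctx = browser.contexts[0] if browser.contexts else browser.new_context()
            page = ctx.new_page()
            page.goto(url, wait_until="networkidle", timeout=30_000)
            return page.inner_text("body")  # visible text only, no markup
        except PlaywrightTimeout:
            return ""  # degrade gracefully; REST context still covers most signal
        finally:
            browser.close()
```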

Balancing real-time with cost. Running Claude Opus on every PR evaluation is expensive and slow. We implemented model fallbacks (Opus to Sonnet 3.7 to Sonnet 3.5) and optimized prompt size by intelligently truncating diffs and README excerpts.
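
The truncation heuristic matters more than it sounds: naive head-truncation can drop the most relevant files. A sketch of one budget-based approach (the budget value and ordering are illustrative assumptions):

```python
# Illustrative truncation: keep whole patches until a character budget runs out.
def truncate_diffs(pr_files: list[dict], budget: int = 40_000) -> str:
    chunks, used = [], 0
    # Smallest patches first: many small edits carry more signal per token.
    for f in sorted(pr_files, key=lambda f: len(f.get("patch") or "")):
        patch = f.get("patch") or ""  # binary files have no patch field
        if used + len(patch) > budget:
            chunks.append(f"[{f['filename']}: patch omitted for length]")
            continue
        chunks.append(f"--- {f['filename']} ---\n{patch}")
        used += len(patch)
    return "\n".join(chunks)
```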

Dual user flows. Building separate onboarding, dashboards, and workflows for engineers vs. companies meant managing two parallel state machines while keeping the codebase maintainable.

Accomplishments that we're proud of

  • End-to-end AI evaluation pipeline that works: from URL input to scored PR feedback in under 30 seconds
  • Real timed interview system with 3-issue challenges that mirror actual on-the-job work
  • Bounty marketplace where companies post real issues and engineers compete with real solutions
  • Dual-sided platform serving both engineers and companies with Firebase + GitHub OAuth integration
  • Shipped a production-ready MVP in 36 hours with authentication, routing, API integration, and deployment
  • Agent-friendly evaluation: we don't care if you used Cursor, Copilot, or Claude Code. We grade the PR, not the process

What we learned

GitHub is your resume. The most signal-rich data about an engineer's ability isn't on their LinkedIn; it's in their commit history, PR descriptions, and code review discussions. We learned how to extract and structure this data programmatically.

LLMs can replace human code reviewers (mostly). Claude's ability to evaluate code quality, architectural decisions, and edge case handling is remarkably effective when given proper context. The gap between AI and senior engineer PR review is narrowing fast.

The interview process is ripe for disruption. Engineers universally hate LeetCode. Companies waste time on algorithmic hazing that has zero correlation with job performance. There's massive demand for a system that tests real skills.

Tooling doesn't matter, output does. By embracing AI agents in our interview flow, we're ahead of the curve. The engineers who thrive at startups in 2025+ will be the ones who can ship fast using any tool available.

What's next for Gitty.ai

  • Caching layer to reduce API costs and speed up repeat evaluations of similar repos
  • Company analytics dashboard showing candidate pipeline metrics, average scores, and time-to-hire
  • Leaderboard + reputation system where top bounty solvers get featured placement and direct recruiter outreach
  • Async evaluation queue (Redis/Celery) to handle concurrent PR reviews at scale
  • Custom issue templates so companies can scaffold challenges that match their stack (e.g., "debug this React component" vs. "optimize this SQL query")
  • Referral hiring where engineers can vouch for teammates they've collaborated with on bounties
  • Integration with job boards (Wellfound, YC Work at a Startup) to auto-populate role listings
  • Subscription tiers for companies: free for bounties, paid for unlimited interview challenges and advanced candidate filtering

The long-term vision: Gitty becomes the default technical interview for every startup. When YC companies need to hire fast, they post issues on Gitty. When engineers want to break into startups, they build their profile by shipping real PRs. LeetCode fades into irrelevance, replaced by a system that actually measures what matters: can you ship?

Built With

FastAPI · Python · Claude (Anthropic) · Browserbase · Playwright · React · TypeScript · Vite · Firebase · Tailwind · Render · Vercel
