Q Labs

Inspiration

QA is one of the most critical yet tedious parts of shipping software. Setting up test environments, writing smoke tests, reproducing bugs, and verifying fixes all demand significant time and infrastructure expertise. We asked ourselves: what if you could upload any Docker image and have AI agents automatically explore your app, find bugs, and prove they're fixed — all without writing a single test script?

What it does

Q Labs transforms any Docker image into an on-demand QA sandbox. Users upload their app, define test scenarios, and let AI agents do the rest:

Sandbox provisioning — Spin up isolated Docker containers from any app version in seconds
Auto QA — Automatic smoke testing across all app routes with pass/fail reporting and severity classification
AI agent navigation — AI agents that intelligently browse your app, fill forms, click buttons, and find broken flows
Multi-agent orchestration — Generate and run multiple test tasks in parallel across focus areas like checkout, search, and input validation
State capture — Snapshot a broken sandbox state, fix the bug, re-run QA, and show the before/after
Versioned test history — Track regressions over time with detailed per-endpoint results
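
The Auto QA step listed above can be pictured as a small classifier over per-route responses. This is a minimal sketch rather than Q Labs' actual code: the `RouteResult` type, the `classify` and `summarize` functions, and the severity rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteResult:
    route: str
    status: Optional[int]  # HTTP status code, or None if the request itself failed
    passed: bool
    severity: str          # "none", "minor", "major", or "critical"


def classify(route: str, status: Optional[int]) -> RouteResult:
    """Map one smoke-test response to pass/fail plus a severity (hypothetical rules)."""
    if status is None or status >= 500:
        # No response at all, or a server error: treat as the worst case.
        return RouteResult(route, status, False, "critical")
    if status in (401, 403, 404):
        return RouteResult(route, status, False, "major")
    if status >= 400:
        return RouteResult(route, status, False, "minor")
    return RouteResult(route, status, True, "none")


def summarize(results):
    """Count passes, failures, and failures per severity for a report."""
    summary = {"passed": 0, "failed": 0}
    for r in results:
        if r.passed:
            summary["passed"] += 1
        else:
            summary["failed"] += 1
            summary[r.severity] = summary.get(r.severity, 0) + 1
    return summary
```

A real runner would feed `classify` the status code returned by each route of the sandboxed app and store the summary alongside the app version, which is what makes per-endpoint regression tracking possible.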

How we built it

Frontend: Next.js 16 with React 19, Tailwind CSS, and shadcn/ui for the control panel dashboard
Backend: FastAPI (Python) handling sandbox orchestration, QA execution, and AI agent coordination
AI: Anthropic Claude API — Haiku for real-time agent navigation decisions, Opus for intelligent test task generation
Infrastructure: Docker Python SDK for container lifecycle management, port mapping, and volume mounting
Database: IBM Db2 for metadata and scenario state
Architecture: The agent system analyzes the DOM and targets elements through CSS attribute selectors on data-testid and aria-label, so it can navigate apps without any hardcoded test scripts
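
To illustrate the selector-based navigation described above, here is a minimal sketch of turning raw HTML into candidate selectors an agent could choose from. It uses only Python's standard `html.parser`. The `SelectorCollector` class, the `candidate_selectors` function, and the set of "interactive" tags are assumptions for illustration, not the project's implementation.

```python
from html.parser import HTMLParser

# Hypothetical set of elements an agent is allowed to act on.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea", "form"}


class SelectorCollector(HTMLParser):
    """Collect CSS attribute selectors for interactive elements."""

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        # Prefer data-testid, then fall back to aria-label.
        for name in ("data-testid", "aria-label"):
            value = attr_map.get(name)
            if value:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                self.selectors.append(f'{tag}[{name}="{escaped}"]')
                break


def candidate_selectors(html: str):
    """Return selectors in document order for elements with stable attributes."""
    collector = SelectorCollector()
    collector.feed(html)
    return collector.selectors
```

Passing the model a short list like this, instead of the full page source, keeps the prompt small and restricts its choices to selectors that exist on the page.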

Challenges we ran into

Container orchestration at speed — Managing port allocation, health polling, and parallel sandbox launches required careful concurrency handling with threading and async HTTP
AI agent reliability — Getting Claude to consistently produce valid CSS selectors and avoid infinite action loops took significant prompt engineering and retry logic
State preservation — Capturing and restoring full app state (including SQLite WAL files) across container restarts was tricky to get right
Generalization — Making the platform work with any Docker image rather than a single hardcoded app required flexible config injection and environment variable passthrough
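
On the state preservation point: when a SQLite database runs in WAL mode, recently committed writes can live in the `-wal` file, so copying only the main database file can lose them. One way to get a consistent copy is SQLite's online backup API, exposed in Python as `sqlite3.Connection.backup`. The sketch below shows that technique. It is not necessarily how Q Labs captures state, and the `snapshot_sqlite` name and paths are hypothetical.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src_path: str, dest_path: str) -> None:
    """Write a consistent copy of a SQLite database to dest_path."""
    Path(dest_path).parent.mkdir(parents=True, exist_ok=True)
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # The backup API reads through the WAL, so the copy includes committed
        # transactions that have not been checkpointed into the main file yet.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

The resulting file can be mounted into a fresh container to restore the captured state, assuming the app stores its data in a single SQLite file.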

Accomplishments that we're proud of

A fully working end-to-end flow: upload an app, introduce a bug, watch AI find it, fix it, and verify the fix — all in minutes
Multi-agent orchestration that can probe different areas of an app in parallel without any pre-written test scripts
Clean sandbox state capture that lets you freeze a broken state and hand it to a developer for debugging
A polished dashboard that makes complex container orchestration feel simple

What we learned

AI agents are remarkably effective at exploratory QA when given the right DOM context and action primitives
Container orchestration is the unsexy backbone that makes or breaks a platform like this — reliability matters more than features
Prompt engineering for structured JSON output (action types, selectors, reasoning) is an art that directly impacts agent success rates
Building for generalization from day one (any Docker image, any app) forces better architecture decisions than building for a single use case
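
The structured-output lesson above pairs naturally with strict validation on the receiving side. Below is a minimal sketch of checking one model reply and spotting repeated actions, so a caller can retry or stop. The action names, the `action`/`selector`/`reasoning` field layout, and the `max_repeats` threshold are assumptions modeled on the description, not the project's real schema.

```python
import json
from collections import Counter

ALLOWED_ACTIONS = {"click", "fill", "navigate", "done"}  # hypothetical action set


def parse_action(raw: str) -> dict:
    """Parse and validate one model reply; raise ValueError so the caller can retry."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    kind = action.get("action")
    if not isinstance(kind, str) or kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action type: {kind!r}")
    if kind in {"click", "fill"} and not isinstance(action.get("selector"), str):
        raise ValueError("click/fill actions need a string selector")
    if not isinstance(action.get("reasoning"), str):
        raise ValueError("missing reasoning")
    return action


def is_looping(history: list, max_repeats: int = 3) -> bool:
    """Return True if any (action, selector) pair occurs max_repeats times or more."""
    counts = Counter((a["action"], a.get("selector")) for a in history)
    return any(n >= max_repeats for n in counts.values())
```

An agent loop could call `parse_action` on each reply, re-prompt on `ValueError`, and end the run once `is_looping` returns true for the accumulated history.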

What's next for Q Labs

Visual regression testing — Screenshot comparison between QA runs to catch UI regressions
CI/CD integration — GitHub Actions / GitLab CI hooks to run Q Labs on every PR
Collaborative debugging — Shareable sandbox URLs with real-time co-browsing for distributed teams
Custom agent workflows — Record and replay agent flows as reusable test suites
Expanded AI capabilities — Multi-modal agents that can analyze screenshots and validate visual correctness alongside functional testing