Inspiration

Software teams spend up to 40% of development time writing and maintaining test scripts. The UI changes, selectors break, tests go red — and bugs still reach production. We've lived this pain across 20+ years of enterprise software development, watching teams burn weeks fixing flaky Selenium tests that add zero business value.

When Gemini 3 launched with its massive context window and advanced reasoning capabilities, we saw an opportunity: what if AI could test your web app autonomously — understanding intent, adapting to UI changes, and healing itself when selectors break? Not another record-and-replay tool, but a genuine AI agent that reasons about what to test and how.

What it does

Ai2QA is an autonomous QA testing platform powered entirely by Gemini 3. Paste any URL, pick one of four AI personas, and watch it test — zero scripts, zero setup.

Four personas, four strategies:

  • The Performance Hawk — captures Core Web Vitals (CLS, TTFB, FCP, LCP), flags performance bottlenecks with severity ratings
  • The Gremlin (CHAOS) — chaos engineering agent that rage-clicks, tests edge cases, and exposes fragile code
  • The White Hat (HACKER) — live penetration testing for OWASP Top 10 vulnerabilities (XSS, SQL injection)
  • The Auditor (STANDARD) — methodical regression testing, validates business logic with surgical precision

Each persona has a tuned temperature (0.2–0.6) and a specialized system prompt that shapes Gemini 3's reasoning and testing behavior.
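
Concretely, a persona boils down to a temperature plus a system prompt. The sketch below is illustrative rather than our exact class: the 0.2 (Auditor) and 0.6 (Gremlin) values are the real endpoints, while the intermediate temperatures and the prompt wording are placeholders.

```java
// Illustrative persona model: each value pairs a Gemini 3 temperature with a system prompt.
public enum TestPersona {
    AUDITOR(0.2, "You are a methodical regression tester. Validate business logic step by step."),
    PERFORMANCE_HAWK(0.3, "You are a performance analyst. Capture Core Web Vitals and flag bottlenecks."),
    WHITE_HAT(0.4, "You are a penetration tester. Probe for OWASP Top 10 issues such as XSS and SQL injection."),
    GREMLIN(0.6, "You are a chaos agent. Rage-click, feed edge-case input, and try to break the page.");

    private final double temperature;
    private final String systemPrompt;

    TestPersona(double temperature, String systemPrompt) {
        this.temperature = temperature;
        this.systemPrompt = systemPrompt;
    }

    public double temperature() { return temperature; }
    public String systemPrompt() { return systemPrompt; }
}
```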

Key features:

  • Self-healing tests — when a selector breaks, the agent takes a DOM snapshot, asks Gemini 3 to locate the new element, and continues autonomously (see the sketch after this list)
  • ARIA accessibility snapshots — instead of sending full HTML to Gemini, we use compact ARIA tree representations via MCP, dramatically reducing token usage and improving response speed
  • Two-stage security — a PlanSanitizer and PromptInjectionDetector screen every AI-generated action before execution, preventing off-target navigation or malicious steps
  • Full reports — health checks, console exceptions, accessibility grades, performance metrics, severity-tagged issues, and step-by-step execution timelines
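
The self-healing loop mentioned above is conceptually simple. A rough sketch follows; names like BrowserAction, ariaSnapshot, and withSelector are simplified for readability and are not the exact production API.

```java
// Rough sketch of self-healing: when a selector stops matching, take a fresh accessibility
// snapshot, ask Gemini 3 for the element that matches the original intent, and retry the step.
BrowserAction heal(BrowserAction failedAction) {
    String snapshot = browserDriver.ariaSnapshot();   // compact ARIA tree fetched via the MCP bridge
    String prompt = """
            The selector '%s' no longer matches any element.
            Using the accessibility snapshot below, return the reference of the element
            that best matches the intent "%s".

            %s""".formatted(failedAction.selector(), failedAction.intent(), snapshot);
    String healedRef = gemini.generate(persona.systemPrompt(), prompt, persona.temperature());
    return failedAction.withSelector(healedRef);      // the orchestrator re-executes the step
}
```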

How we built it

Architecture: Hexagonal (ports & adapters) with clean separation:

| Module | Role |
| --- | --- |
| ai2qa-domain-core | Pure Java domain models — no framework dependencies. Records, value objects, port interfaces (ActionQueuePort, DoneQueuePort, BrowserDriverPort), and a functional Result<T> type |
| ai2qa-application | Business logic: AgentOrchestrator coordinates execution, StepPlanner uses Gemini 3 to decide the next action, Reflector analyzes results after each step |
| ai2qa-mcp-bridge | MCP protocol integration — McpClient communicates via stdin/stdout with a Node.js Playwright server exposing ClickTool, TypeTool, NavigateTool, ScreenshotTool, and SnapshotTool (ARIA) |
| ai2qa-infra-jpa | H2 in-memory database with Flyway migrations locally; GCP Cloud SQL (PostgreSQL) in production |
| ai2qa-web-api | REST controllers |
| frontend | Next.js 16 + React 19 dashboard |
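
To give a feel for the domain core, here is roughly what the Result type and a port look like. This is a simplified sketch (the two types live in separate files, and the real interfaces carry more methods and richer error types).

```java
// Framework-free domain core: a functional Result plus a port the application layer depends on.
// The MCP bridge supplies the BrowserDriverPort adapter at runtime.
public sealed interface Result<T> {
    record Ok<T>(T value) implements Result<T> {}
    record Err<T>(String reason) implements Result<T> {}

    default <R> Result<R> map(java.util.function.Function<T, R> fn) {
        return switch (this) {
            case Ok<T> ok -> new Ok<>(fn.apply(ok.value()));
            case Err<T> err -> new Err<>(err.reason());
        };
    }
}

// Separate file in the real module.
interface BrowserDriverPort {
    Result<String> navigate(String url);
    Result<String> click(String elementRef);
    Result<String> ariaSnapshot();
}
```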

Tech stack: Java 21, Spring Boot 3.4, Gemini 3 via Vertex AI, MCP protocol for Chrome DevTools, Playwright for browser automation, Next.js 16 frontend.

Gemini 3 integration is pervasive — it powers the StepPlanner (deciding what action to take next), the Reflector (analyzing outcomes), the PersonaPromptComposer (shaping behavior per persona), and the self-healing loop (finding replacement selectors from DOM snapshots). Every test step involves at least one Gemini 3 API call.
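
The whole agent reduces to a plan → act → reflect loop, roughly like this. It is a simplified sketch with illustrative types (TestSession, PlannedStep, Reflection); the real AgentOrchestrator also drives the queues, self-healing, and report building.

```java
// Simplified agent loop: every iteration involves at least one Gemini 3 call.
void runSession(TestSession session) {
    while (!session.isComplete()) {
        PlannedStep step = stepPlanner.planNext(session);            // Gemini 3 picks the next action
        Result<String> outcome = mcpBridge.execute(step);            // Playwright performs it via MCP
        Reflection reflection = reflector.analyze(step, outcome);    // Gemini 3 interprets the outcome
        session.record(step, outcome, reflection);
        if (reflection.requestsStop()) {
            break;                                                   // goal reached or unrecoverable failure
        }
    }
}
```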

Challenges we ran into

Gemini 3 rate limits in production. Under heavy testing, we hit 429 rate limits frequently. We implemented exponential backoff retry, and seeing the GCP logs show RATE_LIMITED → retry → success → test continues was both stressful and satisfying. Production resilience matters.
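
The retry itself is nothing exotic; here is a trimmed-down version (jitter and configuration omitted, and RateLimitedException is an illustrative stand-in for an HTTP 429 from the API).

```java
import java.util.function.Supplier;

// Trimmed-down retry helper: exponential backoff on rate-limited Gemini 3 calls.
final class GeminiRetry {
    static final class RateLimitedException extends RuntimeException {}   // stand-in for an HTTP 429

    static <T> T withBackoff(Supplier<T> call) throws InterruptedException {
        long delayMs = 1_000;
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                return call.get();
            } catch (RateLimitedException e) {
                if (attempt == 5) throw e;        // give up after the last attempt
                Thread.sleep(delayMs);
                delayMs *= 2;                     // 1s, 2s, 4s, 8s between attempts
            }
        }
        throw new IllegalStateException("unreachable");
    }
}
```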

ARIA snapshots vs. full DOM tradeoff. Full HTML pages can be 500KB+, which blows through tokens and slows reasoning. We switched to ARIA accessibility tree snapshots via MCP's SnapshotTool, which gives Gemini compact, semantically meaningful element references. The tradeoff: some visual-only elements aren't in the ARIA tree, so we fall back to screenshots for visual verification.
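
In code the tradeoff shows up as a single decision per step. A sketch, assuming an illustrative needsVisualVerification flag on the planned step:

```java
// Sketch: feed Gemini 3 the compact ARIA tree by default, and only fall back to pixels
// when the step needs visual verification the accessibility tree can't express.
String buildModelContext(PlannedStep step) {
    if (step.needsVisualVerification()) {
        return browserDriver.screenshotBase64();   // visual-only elements need a screenshot
    }
    return browserDriver.ariaSnapshot();           // a few KB of semantic refs instead of 500KB+ of HTML
}
```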

Prompt injection defense for an autonomous agent. An AI agent that navigates the open web is inherently risky — malicious pages could inject instructions. We built a two-stage security pipeline: PromptInjectionDetector scans for injection patterns, PlanSanitizer validates every planned action against allowed targets, and TargetGuardService enforces URL boundaries. This was the hardest engineering challenge — balancing agent autonomy with safety.
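
The screening boils down to a short chain of checks before any planned step is handed to the browser. A sketch, with method names like looksInjected, withinBoundary, and sanitize simplified for the write-up:

```java
// Sketch of pre-execution screening: reject injected instructions, keep navigation
// inside the target under test, and normalize the step against an action allow-list.
Result<PlannedStep> screen(PlannedStep step, java.net.URI allowedOrigin) {
    if (promptInjectionDetector.looksInjected(step.rawModelOutput())) {
        return new Result.Err<>("possible prompt injection in model output");
    }
    if (step.isNavigation() && !targetGuardService.withinBoundary(step.targetUrl(), allowedOrigin)) {
        return new Result.Err<>("navigation outside the target under test: " + step.targetUrl());
    }
    return new Result.Ok<>(planSanitizer.sanitize(step));
}
```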

Obstacle detection. Real websites throw cookie banners, GDPR popups, newsletter modals, and age verification gates at you. Our ObstacleDetector maintains pattern lists for common consent buttons and dismisses them autonomously so the actual testing can proceed.
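
A flavor of that pattern matching, with illustrative labels and helper types (the real lists are longer and cover multiple languages):

```java
import java.util.List;

// Illustrative obstacle dismissal: if a visible button matches a known consent/modal label,
// click it so the actual test plan can continue.
final class ObstacleDetector {
    record UiButton(String ref, String label) {}   // simplified element reference from the ARIA snapshot

    private static final List<String> DISMISS_LABELS = List.of(
            "accept all", "agree", "got it", "no thanks", "continue without accepting", "i am over 18");

    boolean tryDismiss(List<UiButton> visibleButtons, BrowserDriverPort browser) {
        for (UiButton button : visibleButtons) {
            String label = button.label().toLowerCase();
            if (DISMISS_LABELS.stream().anyMatch(label::contains)) {
                browser.click(button.ref());
                return true;   // banner dismissed; the planner re-snapshots and carries on
            }
        }
        return false;
    }
}
```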

Accomplishments that we're proud of

  • Four distinct AI personas that produce genuinely different test reports on the same URL
  • Self-healing tests that survive UI changes without human intervention
  • The Performance Hawk capturing real Core Web Vitals and producing severity-tagged, actionable performance reports
  • A hexagonal architecture where the domain core has zero framework dependencies
  • Security-first design with PlanSanitizer + PromptInjectionDetector screening every AI action

What we learned

  • Gemini 3's reasoning capabilities are strong enough to drive a complex, multi-step autonomous agent — but prompt engineering for consistency across hundreds of test steps is an art
  • MCP protocol is a game-changer for browser automation — it provides a clean abstraction layer between AI reasoning and browser actions
  • Temperature tuning per persona (from 0.2 for the methodical Auditor, to 0.6 for the unpredictable Gremlin) has a dramatic effect on testing behavior and coverage
  • Building AI safety for autonomous web agents is a fundamentally different challenge than chatbot safety — the agent can act, not just speak

What's next for Ai2QA

  • Skill absorption — learning reusable testing patterns from GitHub repositories (the SkillAbsorptionService is already scaffolded)
  • Multi-page test flows — chaining autonomous tests across login → dashboard → checkout sequences
  • CI/CD integration — triggering persona-based test suites from GitHub Actions
  • Custom personas — users define their own testing personalities with custom system prompts
  • Playwright export refinement — generating production-ready Page Object Model patterns from autonomous test runs