AgentGuard - Hackathon Submission

Inspiration

I noticed a critical gap in AI deployments: there's no simple way to ensure AI agents are both secure and cost-effective.

Recent headlines about AI chatbots being manipulated through prompt injection attacks concerned me. At the same time, I saw companies receiving shocking bills from AI API usage without any visibility into what was happening.

Combining Datadog's observability platform with Google Gemini 2.0 gave me a way to address both problems: AgentGuard adds security and full visibility to AI agents.

What it does

AgentGuard sits between users and AI agents, providing three critical features:

1. Real-Time Security

Detects and blocks prompt injection attacks in real time
Every threat is logged in Datadog for review
Configurable guardrails per agent
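
To make the detection step concrete, here is a minimal sketch of a pattern-based scanner. It is an illustration only: the patterns, the `ScanResult` type, and the `scan_query` function are hypothetical names chosen for this example and do not describe AgentGuard's actual rule set, which may use a different technique entirely.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration; not AgentGuard's actual rules.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?(system\s+)?prompt", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.IGNORECASE),
]


@dataclass
class ScanResult:
    blocked: bool
    matched_pattern: Optional[str] = None  # regex source that triggered the block


def scan_query(query: str) -> ScanResult:
    """Return a blocking result if any known injection pattern appears in the query."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(query):
            return ScanResult(blocked=True, matched_pattern=pattern.pattern)
    return ScanResult(blocked=False)
```

A caller would reject the request when `scan_query(text).blocked` is true and log `matched_pattern` for later review. Regex matching alone is easy to evade, so a scanner like this is best treated as one layer among several.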

2. Complete Observability

Track every token consumed and dollar spent
Full distributed tracing with Datadog APM
Real-time performance metrics and alerts
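
Per-request cost follows directly from token counts. The sketch below shows the arithmetic under stated assumptions: the two prices are placeholders, not real Gemini pricing, and `Usage` and `request_cost_usd` are hypothetical names used only for this example.

```python
from dataclasses import dataclass

# Placeholder prices in USD per one million tokens; NOT real Gemini pricing.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def request_cost_usd(usage: Usage) -> float:
    """Convert token counts into a dollar cost using the per-million prices above."""
    total = (
        usage.input_tokens * INPUT_PRICE_PER_M
        + usage.output_tokens * OUTPUT_PRICE_PER_M
    )
    return total / 1_000_000
```

In a setup like the one described, this value would be attached to the request's trace or emitted as a custom metric so it can be graphed and alerted on.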

3. Multi-Agent Orchestration

Routes queries to specialized agents (shopping, IT support, etc.)
Conversation memory across sessions
Powered by Google Gemini 2.0
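
As an illustration of routing, here is a simple keyword-scoring router. This is a stand-in technique chosen for brevity; the write-up does not specify how AgentGuard picks an agent, and the agent names, keyword lists, and `route_query` function are all assumptions.

```python
import re

# Hypothetical agents and keywords; a real router might use a classifier or the LLM itself.
AGENT_KEYWORDS = {
    "shopping": ("order", "refund", "cart", "shipping"),
    "it_support": ("password", "vpn", "laptop", "login"),
}
DEFAULT_AGENT = "shopping"


def route_query(query: str) -> str:
    """Pick the agent whose keywords appear most often in the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best_agent, best_score = DEFAULT_AGENT, 0
    for agent, keywords in AGENT_KEYWORDS.items():
        score = sum(1 for keyword in keywords if keyword in words)
        if score > best_score:
            best_agent, best_score = agent, score
    return best_agent
```

Queries that match no keyword fall back to `DEFAULT_AGENT`, which keeps the router total: every query gets some agent.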

Live Demo: https://agentguard.mouad.ma

How I built it

I built AgentGuard as a production system on Google Kubernetes Engine:

Tech Stack:

FastAPI for the backend
Google Gemini 2.0 for AI responses
Datadog APM for observability
Kubernetes for deployment

What I implemented:

Security middleware that scans every query
Custom Datadog integration tracking tokens, costs, and latency
Two live demos (e-commerce and IT helpdesk)
Per-request agent configuration for stateless operation
Optimized GKE deployment (~$50/month)

Challenges I ran into

1. Gemini API Key Bug

The hardest bug to track down: my GeminiClient was initializing before the environment variables loaded because of Python's @lru_cache. I spent hours figuring out why queries returned fallback messages despite correct API keys.
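
One way this kind of bug can arise is sketched below. The `GeminiClient` class here is a stub, and `get_client` and the `GEMINI_API_KEY` variable name are assumptions for the example; the point is only that an `@lru_cache`-wrapped factory reads the environment once and then keeps returning the same object.

```python
import os
from functools import lru_cache


class GeminiClient:
    """Stub standing in for the real client; it only records the key it was built with."""

    def __init__(self, api_key):
        self.api_key = api_key


@lru_cache(maxsize=1)
def get_client() -> GeminiClient:
    # The environment is read on the first call only; later calls reuse the cached client.
    return GeminiClient(os.environ.get("GEMINI_API_KEY"))


os.environ.pop("GEMINI_API_KEY", None)
early = get_client()  # called before the key is available, so api_key is None

os.environ["GEMINI_API_KEY"] = "example-key"
stale = get_client()  # same cached object; the new key is never seen

get_client.cache_clear()  # dropping the cache forces a fresh read
fresh = get_client()
```

Typical remedies are to load configuration before anything calls the cached factory, or to call `cache_clear()` once the environment is ready.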

2. SSL Configuration

I struggled with SSL handshake failures when using Cloudflare with GKE's managed certificates. I had to remove the certificate annotations and let Cloudflare handle HTTPS.

3. Making it Work Without a Database

Making everything work with in-memory storage for the demos, while remaining PostgreSQL-ready for production, required extensive refactoring.
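
A common way to keep in-memory storage swappable for a database is to code against a small interface. The sketch below assumes a hypothetical `ConversationStore` protocol and `InMemoryStore` class; it is not AgentGuard's actual storage layer.

```python
from typing import Dict, List, Protocol


class ConversationStore(Protocol):
    """Interface the rest of the app depends on; a PostgreSQL class could implement it later."""

    def append(self, session_id: str, message: str) -> None: ...

    def history(self, session_id: str) -> List[str]: ...


class InMemoryStore:
    """Dictionary-backed store for demos; data is lost when the process exits."""

    def __init__(self) -> None:
        self._messages: Dict[str, List[str]] = {}

    def append(self, session_id: str, message: str) -> None:
        self._messages.setdefault(session_id, []).append(message)

    def history(self, session_id: str) -> List[str]:
        # Return a copy so callers cannot mutate internal state.
        return list(self._messages.get(session_id, []))
```

Because callers only see the two methods, replacing `InMemoryStore` with a database-backed implementation would not require changes elsewhere.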

4. Cost Optimization

The initial setup cost $150+/month. I reduced it to ~$50/month by optimizing node count and resource limits.

Accomplishments that I'm proud of

Production-grade security - Actually blocks real prompt injection attacks
Full observability - Every AI interaction is traced with cost and performance data
Live demos - Two working demos showing real-world use cases
Complete documentation - Architecture diagrams, deployment guides, API docs
Cost transparency - Users see exactly how much each interaction costs

What I learned

Technical:

Deep Datadog APM integration (custom metrics, distributed tracing)
Google Gemini API prompt engineering and context management
Kubernetes production patterns (health probes, secrets, zero-downtime deploys)
Real-time security threat detection

Insights:

Observability must be built in from day one
Real-time cost tracking changes how you optimize AI apps
Multiple security layers are essential (API keys + rate limiting + injection detection)
Cloud costs can spiral quickly without monitoring
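
As an example of the rate-limiting layer mentioned above, here is a minimal sliding-window limiter keyed by API key. The class name, parameters, and the choice of a sliding window are assumptions for illustration; the write-up does not say which algorithm AgentGuard uses.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per API key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # api_key -> timestamps of accepted requests

    def allow(self, api_key: str, now: float) -> bool:
        hits = self._hits[api_key]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Passing `now` explicitly (for instance from `time.monotonic()`) keeps the limiter easy to test. This version holds state in one process, so a multi-replica deployment would need a shared store instead.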