AgentGuard - Hackathon Submission

Inspiration

I noticed a critical gap in AI deployments: there's no simple way to ensure AI agents are both secure and cost-effective.

Recent headlines about AI chatbots being manipulated through prompt injection attacks concerned me. At the same time, I saw companies receiving shocking bills from AI API usage without any visibility into what was happening.

Combining Datadog's observability platform with Google Gemini 2.0 presented the perfect opportunity to solve this problem - AgentGuard provides security and full visibility for AI agents.


What it does

AgentGuard sits between users and AI agents, providing three critical features:

1. Real-Time Security

  • Detects and blocks prompt injection attacks in real-time
  • Every threat is logged in Datadog for review
  • Configurable guardrails per agent

2. Complete Observability

  • Track every token consumed and dollar spent
  • Full distributed tracing with Datadog APM
  • Real-time performance metrics and alerts

3. Multi-Agent Orchestration

  • Routes queries to specialized agents (shopping, IT support, etc.)
  • Conversation memory across sessions
  • Powered by Google Gemini 2.0

Live Demo: https://agentguard.mouad.ma


How I built it

I built AgentGuard as a production system on Google Kubernetes Engine:

Tech Stack:

  • FastAPI for the backend
  • Google Gemini 2.0 for AI responses
  • Datadog APM for observability
  • Kubernetes for deployment

What I implemented:

  • Security middleware that scans every query
  • Custom Datadog integration tracking tokens, costs, and latency
  • Two live demos (e-commerce and IT helpdesk)
  • Per-request agent configuration for stateless operation
  • Optimized GKE deployment (~$50/month)

Challenges I ran into

1. Gemini API Key Bug The hardest bug to debug - my GeminiClient was initializing before environment variables loaded due to Python's @lru_cache. Spent hours figuring out why queries returned fallback messages despite correct API keys.

2. SSL Configuration Struggled with SSL handshake failures when using Cloudflare with GKE's managed certificates. Had to remove the certificate annotations and let Cloudflare handle HTTPS.

3. Making it Work Without a Database Required extensive refactoring to make everything work with in-memory storage for demos while remaining PostgreSQL-ready for production.

4. Cost Optimization Initial setup cost $150+/month. Reduced to ~$50 by optimizing node count and resource limits.


Accomplishments that I'm proud of

Production-grade security - Actually blocks real prompt injection attacks Full observability - Every AI interaction is traced with cost and performance data Live demos - Two working demos showing real-world use cases Complete documentation - Architecture diagrams, deployment guides, API docs Cost transparency - Users see exactly how much each interaction costs


What I learned

Technical:

  • Deep Datadog APM integration (custom metrics, distributed tracing)
  • Google Gemini API prompt engineering and context management
  • Kubernetes production patterns (health probes, secrets, zero-downtime deploys)
  • Real-time security threat detection

Insights:

  • Observability must be built in from day one
  • Real-time cost tracking changes how you optimize AI apps
  • Multiple security layers are essential (API keys + rate limiting + injection detection)
  • Cloud costs can spiral quickly without monitoring

What's next for AgentGuard

Short-term:

  • PostgreSQL integration for persistent storage
  • Machine learning-based anomaly detection
  • More agent types (healthcare, finance, education)
  • Predictive cost alerts

Long-term:

  • Support for multiple LLMs (Claude, GPT-4)
  • Agent marketplace with pre-built configurations
  • Enterprise SaaS offering
  • Multi-cloud support (AWS, Azure)

Links:

Share this project:

Updates