Inspiration
In theory, it has never been easier for the average business owner to build a website. In practice, they face countless challenges. A website is more than just a collection of pages; it represents a brand's online identity. Yet most business owners don't know how to design a good user experience, so even with a strong product they fail to communicate that identity to potential customers.
That’s where Darwin comes in. It brings AI agents that simulate human behaviour on the website and surface issues that affect the user experience, such as navigation and awareness of on-page elements. In other words, it lets businesses test their website with simulated users without needing to deploy anything or have actual users. And, inspired by reinforcement learning, this cycle repeats until the “optimal” website is reached.
What it does
Darwin is an SDK that provides pre-production businesses with a website that improves itself. It is a closed-loop system that starts with a barebones site and autonomously iterates on it in an RL-inspired environment.
In each iteration, AI agents simulate users performing various tasks around the site. These tasks, along with the agents' thought processes, are then collected and analyzed by AI. Closing the loop, the analyses are used to build the next version of the website, which is then iterated on again.
To put it simply:
- Browser-based autonomous agents move around the website, and we collect 10+ unique analytics from them (e.g., user reasoning, scrolling behaviour, cursor positioning).
- Agents feed the data into a central brain which identifies potential issues in the website’s layout, user experience, etc.
- Gemini CLI takes these issues and fixes them accordingly.
- REPEAT
The result… a website that "survives" only if it provides the best possible user experience.
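The cycle above can be sketched as a small driver function. This is a minimal illustration, not the Darwin SDK's actual API: the `AgentSession`, `Issue`, and `LoopSteps` types and the `evolve` function are hypothetical names, and the stopping rule (an iteration budget plus a severity threshold) is an assumption standing in for "until the optimal website is reached". The steps are kept synchronous for brevity; real agent runs would be asynchronous.

```typescript
// Hypothetical shapes for illustration; none of these names come from Darwin itself.
interface AgentSession {
  persona: string;
  reasoning: string[]; // the agent's recorded thought process
  tasksAttempted: number;
  tasksCompleted: number;
}

interface Issue {
  description: string;
  severity: number; // assumed scale: 0 (cosmetic) to 1 (blocking)
}

// The three stages of one iteration, injected so each can be swapped out.
interface LoopSteps<Site> {
  simulate: (site: Site) => AgentSession[]; // agents browse the site
  analyze: (sessions: AgentSession[]) => Issue[]; // central brain finds issues
  applyFixes: (site: Site, issues: Issue[]) => Site; // code-editing step
}

// Repeats simulate -> analyze -> fix until no issue at or above the
// threshold remains, or until the iteration budget is used up.
function evolve<Site>(
  initial: Site,
  steps: LoopSteps<Site>,
  maxIterations: number,
  severityThreshold = 0.2
): { site: Site; iterations: number } {
  let site = initial;
  for (let i = 0; i < maxIterations; i++) {
    const sessions = steps.simulate(site);
    const issues = steps
      .analyze(sessions)
      .filter((issue) => issue.severity >= severityThreshold);
    if (issues.length === 0) {
      return { site, iterations: i }; // nothing left worth fixing
    }
    site = steps.applyFixes(site, issues);
  }
  return { site, iterations: maxIterations };
}
```

Passing the three stages in as functions keeps the loop independent of any particular agent framework or code-editing tool, which also makes it easy to exercise with stubs.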
How we built it
The main SDK was composed of 3 parts: the agents simulating users, Gemini CLI updating the website in real time, and the orchestration framework behind it all. To be more specific:
- Agents interacted with the web using Stagehand and were powered by LLMs, which let them mimic different personas and collect various data points
- Gemini CLI was automated by spawning processes to call it with the desired fixes
- The orchestration framework took in LLM analytics and thought processes and consequently determined the main pain points for the simulated users
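As a rough illustration of the last two bullets, the sketch below ranks pain points by how many agents reported them and turns the top ones into a command line for the code-editing step. Everything here is an assumption for illustration: `AgentReport`, `rankPainPoints`, and `buildFixInvocation` are hypothetical names, the pain-point labels are presumed to have already been extracted from the agents' reasoning, and the `-p` prompt flag for non-interactive use should be checked against `gemini --help` for the installed version.

```typescript
// Hypothetical report shape; assumes labels were already extracted by an LLM.
interface AgentReport {
  persona: string;
  painPoints: string[];
}

interface RankedPainPoint {
  label: string;
  agentsAffected: number;
}

// Ranks pain points by the number of distinct agents that reported them.
function rankPainPoints(reports: AgentReport[], topN = 3): RankedPainPoint[] {
  const counts = new Map<string, number>();
  reports.forEach((report) => {
    // Count each label once per agent so one verbose agent cannot dominate.
    new Set(report.painPoints).forEach((label) => {
      counts.set(label, (counts.get(label) ?? 0) + 1);
    });
  });
  return Array.from(counts, ([label, agentsAffected]) => ({ label, agentsAffected }))
    .sort(
      (a, b) =>
        b.agentsAffected - a.agentsAffected || (a.label < b.label ? -1 : 1)
    )
    .slice(0, topN);
}

// Builds the command and arguments for a spawned CLI process.
// Assumption: the CLI accepts a one-shot prompt via "-p".
function buildFixInvocation(points: RankedPainPoint[]): {
  command: string;
  args: string[];
} {
  const prompt =
    "Fix these UX issues in the site source:\n" +
    points
      .map((p, i) => `${i + 1}. ${p.label} (seen by ${p.agentsAffected} agents)`)
      .join("\n");
  return { command: "gemini", args: ["-p", prompt] };
}
```

The returned `command` and `args` could then be handed to Node's `child_process.spawn(command, args)`, which passes the arguments as an array and so avoids shell quoting problems with a multi-line prompt.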
The dashboard was integrated with the rest of the SDK through Express.js API routes, which called each of these actions in sequential order.
The demo website was built with Shopify Polaris components, Next.js, and TailwindCSS to simulate how a pre-production business would iterate on their website without users.
Challenges we ran into
- Simulating real people was hard because we didn’t have billions of data points on which to model behaviour
- Integrating another LLM into Gemini CLI directly was challenging because normally it’s the other way around (i.e. Gemini CLI normally uses MCPs to integrate with other tools)
Accomplishments that we're proud of
- We were able to create a full closed-loop pipeline integrated with Claude, so that JSON analytics directly result in a code change
- We simulated user analytics by combining web agent automation with the Amplitude SDK
What we learned
- It is difficult to fully encapsulate the exact thought processes and actions of a human user through an AI agent without huge datasets
- Sometimes the AI tries things a human never would, and sometimes, those "weird" designs actually perform better
What's next for Darwin
- Running multiple agents in parallel with various personas to simulate different user demographics
- Simulating other human decisions, such as those based on money rather than user experience
- Visualizing multiple agents navigating the same site, or one agent navigating 2 different sites (automated A/B testing)
Built With
- express.js
- gemini
- next.js
- polaris
- stagehand
- tailwindcss
- typescript






