Darwin


Inspiration

In theory, it has never been easier for the average business owner to build a website. In practice, they face countless challenges. A website is more than a collection of pages; it represents a brand's online identity. Yet most business owners don't know how to design a good user experience, so even with a strong product they fail to communicate that identity to potential customers.

That’s where Darwin comes in. It uses AI agents that simulate human behaviour on the website and surface issues that affect the user experience, such as navigation and awareness of elements. In other words, it lets businesses simulate users on their website without deploying anything or having actual users. Inspired by reinforcement learning, this cycle runs for an indefinite amount of time until the “optimal” website is reached.

What it does

Darwin is an SDK that provides pre-production businesses with a website that improves itself. It is a closed-loop system that starts with a barebones site and autonomously iterates on it in an RL-inspired environment.

In each iteration, AI agents simulate users performing various tasks around the site. These tasks, along with the thought processes of the agents, are then collected and analyzed by AI. Closing the loop, these analyses are used to build the next version of the website, which is then iterated on again.

To put it simply:

Browser-based autonomous agents move around the website, and we collect 10+ unique analytics from them (e.g., user reasoning, scrolling behaviour, cursor positioning).
Agents feed the data into a central brain which identifies potential issues in the website’s layout, user experience, etc.
Gemini CLI takes these issues and fixes them accordingly.
REPEAT

The result… a website that "survives" only if it provides the best possible user experience.
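
The loop described above can be sketched roughly as follows. This is a minimal illustration and not the Darwin SDK itself: the `AgentRun`, `Issue`, and `LoopSteps` types, the `evolveSite` function, and the stopping rule (stop when no issues are reported or an iteration cap is reached) are all assumptions made for the example. The three steps are passed in as functions so that the agent simulation, the analysis, and the code-fixing tool can each be swapped out.

```typescript
// Hypothetical types: none of these names come from the Darwin SDK.
interface AgentRun {
  persona: string;      // e.g. "first-time shopper"
  task: string;         // what the agent was asked to do on the site
  completed: boolean;   // whether the agent finished the task
  thoughts: string[];   // reasoning the agent reported along the way
}

interface Issue {
  description: string;
  severity: number;     // assumed scale: 0 (cosmetic) to 1 (blocking)
}

interface LoopSteps {
  simulate: (siteVersion: number) => Promise<AgentRun[]>; // agents browse the site
  analyze: (runs: AgentRun[]) => Promise<Issue[]>;        // central "brain" finds issues
  applyFixes: (issues: Issue[]) => Promise<void>;         // code-fixing step
}

// Runs simulate -> analyze -> fix until no issues remain or the cap is hit.
// Returns the number of site versions produced after the starting one.
async function evolveSite(steps: LoopSteps, maxIterations: number): Promise<number> {
  let version = 0;
  for (let i = 0; i < maxIterations; i++) {
    const runs = await steps.simulate(version);
    const issues = await steps.analyze(runs);
    if (issues.length === 0) {
      break; // nothing left to fix: treat this version as "optimal"
    }
    await steps.applyFixes(issues);
    version += 1;
  }
  return version;
}
```

The iteration cap is there because an "optimal" site may never be reached in practice; without it, the loop would run for as long as the analysis step keeps reporting issues.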

How we built it

The main SDK was composed of 3 parts: the agents simulating users, Claude updating the website in real time, and the orchestration framework behind it all. To be more specific:

Agents interacted with the web using Stagehand and were powered by LLMs, which also let them mimic different personas and collect various data points
Gemini CLI was automated by spawning processes to call it with the desired fixes
The orchestration framework took in LLM analytics and thought processes and consequently determined the main pain points for the simulated users
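
As a rough illustration of how the orchestration step might turn agent output into pain points, the sketch below groups issue labels by page and ranks them by how many distinct agents ran into each one. The `Observation` and `PainPoint` shapes, the `rankPainPoints` function, and the ranking rule are assumptions for this example; the write-up does not specify how Darwin actually weighs the analytics.

```typescript
// Hypothetical shapes: the real analytics format is not described in this write-up.
interface Observation {
  agentId: string;
  page: string;
  issue: string; // short label from the analysing LLM, e.g. "checkout button hard to find"
}

interface PainPoint {
  page: string;
  issue: string;
  agentsAffected: number;
}

// Groups observations by (page, issue) and ranks them by the number of
// distinct agents affected, so one agent repeating itself counts only once.
function rankPainPoints(observations: Observation[]): PainPoint[] {
  const agentsByKey = new Map<string, Set<string>>();
  for (const o of observations) {
    const key = JSON.stringify([o.page, o.issue]);
    const agents = agentsByKey.get(key) ?? new Set<string>();
    agents.add(o.agentId);
    agentsByKey.set(key, agents);
  }

  const points: PainPoint[] = [];
  agentsByKey.forEach((agents, key) => {
    const [page, issue] = JSON.parse(key) as [string, string];
    points.push({ page, issue, agentsAffected: agents.size });
  });

  // Most widely shared problems first.
  return points.sort((a, b) => b.agentsAffected - a.agentsAffected);
}
```

Counting distinct agents rather than raw mentions is one possible design choice: it favours problems that several simulated personas share over a single agent that gets stuck and reports the same issue repeatedly.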

The dashboard was integrated with the rest of the SDK through Express.js API routes, which called each of these actions in sequential order

The demo website was built with Shopify Polaris components, Next.js, and TailwindCSS to simulate how a pre-production business would iterate on their website without users

Challenges we ran into

Simulating real people was hard because we didn’t have billions of data points to model behaviour on
Integrating another LLM into Gemini CLI directly was challenging because normally it’s the other way around (i.e. Gemini CLI normally uses MCPs to integrate with other tools)

Accomplishments that we're proud of

We created a full-circle pipeline integrating with Claude, allowing JSON analytics to result directly in a code change
We simulated user analytics by combining web agent automation with the Amplitude SDK

What we learned

It is difficult to fully encapsulate the exact thought processes and actions of a human user through an AI agent without huge datasets
Sometimes the AI tries things a human never would, and sometimes, those "weird" designs actually perform better

What's next for Darwin

Running multiple agents in parallel with various personas to simulate different user demographics
Simulating other human decisions, such as those driven by money rather than user experience
Visualization of the various agents navigating the same site / one agent navigating 2 different sites (A/B automated testing)