Inspiration
Two weeks ago, someone’s OpenClaw incurred a $3k charge on his credit card after he gave it full access to his computer. Six months ago, a Replit agent deleted an entire company’s database and then proceeded to lie about it. Stories like these (prompt injection attacks, data leaks, misplaced authority) made us realize that handing your own machine to an AI is probably a bad idea.
Naturally, we wondered: what if each agent didn’t have to run locally? We decided to give each agent its own secure, sandboxed computing environment via cloud VMs. This adds a hard security boundary and a way to operate with sudo access in an isolated environment.
The next step was to make them run in parallel via an orchestrator agent. We could then break down complex tasks into independent actions that each agent could run concurrently.
The result was advanced task management with full visibility into every click, keystroke, and window. Spawn an entire task force of agents equipped with their own computers at will.
What it does
Opticon lets users submit a prompt that orchestrates multiple agents, each with access to its own VM. The platform’s orchestrator automatically breaks the prompt into independent subtasks and assigns each one to an AI agent. Every agent gets its own full virtual machine with a browser, file system, and internet access, and can click, type, scroll, and navigate the computer the way a human would.
All of these tasks happen simultaneously, so while one agent researches in Chrome, another writes a document and a third pulls API data. Users watch everything from a single dashboard: every agent’s screen in real time, a sidebar of all agent activity, and a shared whiteboard space where agents post findings and coordinate actions with each other.
From end to end:
- User submits a prompt
- K2 Think reasons over the prompt, decomposes it into several independent subtasks, and distributes them across worker subagents (see the sketch after this list)
- Each subagent spins up its own VM using E2B, up to four Linux cloud desktops
- We stream each machine's live feed to our web app using E2B, while also enabling real-time user interaction
- Agents coordinate through a shared whiteboard that doubles as a simple interprocess communication channel
- Sessions are fully recorded for transparency, visibility, and reliability, and saved to a Neon PostgreSQL database
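The decomposition step is easiest to picture as a structured plan. Here is a minimal sketch, assuming a JSON shape like the one below; the field names are illustrative, not our exact schema:

```python
# Hypothetical plan format the orchestrator asks K2 Think to return; field names
# are illustrative, not the exact schema we use.
import json

plan = json.loads("""
{
  "subtasks": [
    {"id": 1, "goal": "Research competitor pricing in Chrome", "depends_on": []},
    {"id": 2, "goal": "Draft a summary document", "depends_on": []},
    {"id": 3, "goal": "Pull usage metrics from the public API", "depends_on": []}
  ]
}
""")

# Subtasks with no dependencies can be dispatched to workers immediately,
# which is what lets agents run in parallel instead of waiting on each other.
ready = [t for t in plan["subtasks"] if not t["depends_on"]]
for task in ready:
    print(f"assign subtask {task['id']} to a worker: {task['goal']}")
```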
How we built it
Dedalus Custom Computer-Use Module: We rebuilt the computer-use framework to work with Dedalus via smart tool calling. Each agent runs an observation-based loop: it captures the VM's screen, reasons about its next step relative to its task, performs a tool call (click(x, y), scroll(y), etc.), and repeats.
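A minimal sketch of that loop, assuming an OpenAI-compatible chat client and a thin desktop wrapper around the E2B sandbox; the method names (screenshot, left_click, write) and the one-JSON-action-per-turn protocol are illustrative, not the exact Dedalus or E2B calls:

```python
# Sketch of the observe-think-act loop; `client` is an OpenAI-compatible chat
# client, `desktop` wraps the E2B sandbox. Names are illustrative, not exact SDK calls.
import base64
import json

def run_agent(client, desktop, task: str, model: str, max_steps: int = 40):
    messages = [{"role": "system",
                 "content": f"You control a Linux desktop. Task: {task}. "
                            "Reply with exactly one JSON action per turn."}]
    for _ in range(max_steps):
        # Observe: capture the VM screen and attach it as an image content block,
        # so the model actually sees it (see the fix under Challenges).
        b64 = base64.b64encode(desktop.screenshot()).decode()
        messages.append({"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}]})
        # Think: ask the model for the next action.
        reply = client.chat.completions.create(model=model, messages=messages)
        text = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": text})
        # Act: translate the chosen action into a desktop tool call, then loop.
        action = json.loads(text)
        if action["name"] == "click":
            desktop.left_click(action["x"], action["y"])
        elif action["name"] == "type":
            desktop.write(action["text"])
        elif action["name"] == "done":
            break
```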
Our architecture consists of two parts:
- Next.js handles the frontend, API routes, Socket.io server, and orchestration. It uses a custom server.ts with Socket.io wired into the HTTP server.
- Python agent workers are spawned as separate processes by the backend. Each worker boots an E2B cloud desktop sandbox and runs the vision loop.
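As a rough sketch of the worker boot step, assuming the e2b_desktop Python SDK's Sandbox and stream helpers (exact call names may differ between SDK versions):

```python
# Rough shape of a worker boot; Sandbox/stream names follow the e2b_desktop
# Python SDK but may differ by version, so treat this as a sketch.
from e2b_desktop import Sandbox

def boot_worker():
    desktop = Sandbox()                # isolated Linux cloud desktop for this agent
    desktop.stream.start()             # built-in live stream of the desktop
    stream_url = desktop.stream.get_url()
    # The stream URL is reported back to the Next.js backend over Socket.io so the
    # dashboard can embed the live view; the vision loop then takes over the sandbox.
    return desktop, stream_url
```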
Some of the tech we used:
- K2 Think for core LLM orchestration, using advanced reasoning to break down tasks
- E2B Desktop SDK for cloud Linux sandboxes with built-in streaming
- Dedalus Labs API for spawning subagents
- Socket.io for all real-time communication (frontend <-> backend <-> Python workers)
- Next.js 16 App Router with Tailwind CSS and shadcn/ui for the frontend
- Neon PostgreSQL for session persistence
- Flowglad for billing and subscription management
- NextAuth with Google OAuth for authentication
Challenges we ran into
Our first challenge was getting the agent to actually see what is going on in the VM. We first used the DedalusRunner high-level agent loop, which stringifies all tool results, including screenshots. This meant screenshots reached the model as raw base64 text instead of actual images, which it could not interpret visually. We had to drop down to the Dedalus API directly and build a custom agentic loop where screenshots are injected as “image_url” content blocks in user messages.
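Roughly, the fix looks like this; the helper name and prompt text are ours, not part of any SDK:

```python
# Wrap the raw screenshot as an image_url content block in a user message so the
# model receives an image, not stringified base64 text from the tool-result path.
import base64

def screenshot_as_user_message(png_bytes: bytes) -> dict:
    b64 = base64.b64encode(png_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "Current screen after your last action:"},
            # An image block a vision model can interpret; the same bytes returned
            # as a stringified tool result would just be unreadable text.
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```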
We initially built our orchestration on GPT-4.1, but found it lackluster at breaking complex tasks into independent ones that multiple agents could run in parallel, which meant agents had to wait serially for others to finish before starting their own task. We pivoted to K2 Think for its advanced reasoning, which cleanly decomposes tasks into structured, independent actions for every agent.
Accomplishments that we're proud of
The vision loop architecture: Building a custom observe-think-act loop that sends screenshots as actual images through the Dedalus API was non-obvious and required deep research into how the SDK handles tool results vs. user messages.
Real-time interaction: Streaming via E2B lets the user interact with the VM while the agent is working, which is perfect for quick redirects or manual speed-ups.
Replay sessions: We capture a screenshot at every tool call and save it to a buffer that is converted into a timelapse at the end of the session.
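A minimal sketch of that buffer-to-timelapse step, rendered here as a GIF with Pillow for simplicity (the output format and paths are illustrative):

```python
# Sketch of the replay capture: buffer one frame per tool call, then write a
# timelapse at session end. Pillow GIF output is used here for simplicity.
import io
from PIL import Image

frames: list[Image.Image] = []

def record_frame(png_bytes: bytes) -> None:
    frames.append(Image.open(io.BytesIO(png_bytes)).convert("RGB"))

def save_timelapse(path: str = "session_replay.gif", ms_per_frame: int = 400) -> None:
    if frames:
        frames[0].save(path, save_all=True, append_images=frames[1:],
                       duration=ms_per_frame, loop=0)
```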
Whiteboard coordination: Socket.io captures every agent state change, and agents collaborate through a shared whiteboard space, posting findings, intermediate results, and conclusions so they can coordinate with each other and execute tasks efficiently.
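From a worker's side, posting to the whiteboard is just an event emit; a sketch assuming a python-socketio client and an event name we define ourselves:

```python
# Illustrative worker-side whiteboard post; the "whiteboard:post" event name is
# ours, handled by the Socket.io server inside the Next.js custom server.
import socketio

sio = socketio.Client()
sio.connect("http://localhost:3000")   # the custom server.ts Socket.io endpoint

def post_finding(agent_id: str, text: str) -> None:
    # Findings land on the shared whiteboard where other agents and the user read them.
    sio.emit("whiteboard:post", {"agent": agent_id, "text": text})
```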
Push-based task assignment: To avoid race conditions, the backend pushes tasks to agents rather than having each agent poll tasks from a single source.
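Seen from the worker, push-based assignment is just an event handler instead of a polling loop; a sketch with a hypothetical task:assign event:

```python
# Push-based assignment from the worker's point of view; "task:assign" is a
# hypothetical event name. The backend emits each task to exactly one worker,
# so no two workers can race on the same queue entry.
import socketio

sio = socketio.Client()

def run_task(task_id: str, goal: str) -> None:
    print(f"starting {task_id}: {goal}")   # placeholder for the vision loop

@sio.on("task:assign")
def on_task(data):
    # Only this worker receives the event, so it starts immediately,
    # with no polling and no shared task list to lock.
    run_task(data["task_id"], data["goal"])

sio.connect("http://localhost:3000")
sio.wait()
```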
What we learned
Multi-agent coordination is hard. Even with a shared whiteboard file, getting agents to meaningfully build on each other's work requires well-structured architecture and careful prompting.
Agent SDKs have tradeoffs. Dedalus gives us flexibility to swap between Claude, OpenAI, and other models, but it also means we can't use provider-specific features like Anthropic's native computer_use tool type, a capability that we had to build ourselves.
What's next for Opticon
- Agent-to-agent communication beyond the whiteboard: agents should be able to hand off URLs, file paths, credentials, etc. directly to each other
- Smarter orchestration logic: dynamic replanning throughout the session, so what agents discover mid-task lets the orchestrator adapt the task breakdown
- Improve our reconnect logic: right now, when an agent goes down it is difficult to recover
- Currently local-only; the next step is a hosted deployment
Built With
- dedalus
- e2b
- flowglad
- k2
- neon
- next.js
- nextauth
- postgresql
- socket.io
