Application structure
To deploy an Agent Server application, you need to specify the graph(s) you want to deploy, as well as any relevant configuration settings, such as dependencies and environment variables. Read the application structure guide to learn how to structure your LangGraph application for deployment.
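For example, a deployment configuration (such as a langgraph.json listing dependencies, graphs, and env) typically points at a module that exports your graph. The sketch below shows a minimal such module; the module path, node logic, and reference string in the comments are illustrative assumptions, not required names.

```python
# my_app/agent.py -- a minimal graph module a deployment config could point to.
# The module path and node logic are illustrative; a real application will differ.
from langgraph.graph import StateGraph, START, END, MessagesState


def respond(state: MessagesState) -> dict:
    # Placeholder node: a real application would call a model or tools here.
    return {"messages": [{"role": "assistant", "content": "Hello from the graph!"}]}


builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
builder.add_edge("respond", END)

# Exported compiled graph; a deployment config would reference it with
# something like "./my_app/agent.py:graph".
graph = builder.compile()
```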
Parts of a deployment

When you deploy Agent Server, you are deploying one or more graphs, a database for persistence, and a task queue.

Graphs
When you deploy a graph with Agent Server, you are deploying a “blueprint” for an Assistant. An Assistant is a graph paired with specific configuration settings. You can create multiple assistants per graph, each with unique settings to accommodate different use cases that can be served by the same graph. Upon deployment, Agent Server automatically creates a default assistant for each graph using the graph’s default configuration settings.

We often think of a graph as implementing an agent, but a graph does not necessarily need to implement one. For example, a graph could implement a simple chatbot that only supports back-and-forth conversation, without the ability to influence any application control flow. In practice, as applications grow more complex, a graph often implements a larger flow that may use multiple agents working in tandem.
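As a sketch of how assistants build on a deployed graph, the snippet below uses the langgraph_sdk Python client to create an assistant with custom configuration. The deployment URL, the graph name "agent", and the "model" configurable field are assumptions for illustration.

```python
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    # Assumed URL of a running Agent Server deployment.
    client = get_client(url="http://localhost:2024")

    # Create an assistant from the deployed graph, overriding its default
    # configuration. The "model" key is a hypothetical configurable field
    # that your graph would need to define.
    assistant = await client.assistants.create(
        graph_id="agent",
        config={"configurable": {"model": "openai:gpt-4o-mini"}},
        name="support-assistant",
    )
    print(assistant["assistant_id"])


asyncio.run(main())
```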
Persistence and task queue
Agent Server leverages a database for persistence and a task queue. PostgreSQL is supported as a database for Agent Server and Redis as the task queue. If you’re deploying using LangSmith cloud, these components are managed for you. If you’re deploying Agent Server on your own infrastructure, you’ll need to set up and manage these components yourself. For more information on how these components are set up and managed, review the hosting options guide.

How to deploy
Agent Server can be deployed using different methods depending on your infrastructure:

- Cloud: Deploy from GitHub repositories with fully managed infrastructure.
- Hybrid or self-hosted with control plane: Build Docker images and deploy via the UI.
- Standalone servers: Deploy Agent Servers directly without the control plane.
Cloud deployments are available on all LangSmith plans. Hybrid and self-hosted options require an Enterprise plan and license key. To acquire a license key, contact our sales team.
Runtime architecture
The following description applies to the non-distributed runtime variant of LangSmith Deployment.

Container architecture
A typical deployment consists of two kinds of long-running containers, both built from the same Docker image (a base image with your project code installed on top):

- API servers handle client requests (creating runs, reading thread state, streaming results) but do not execute agent code themselves.
- Queue workers are the execution engine. They listen to the durable task queue, execute your graph code, and write checkpoints.
Run execution lifecycle
When you invoke a run, the request flows through several components:

- A client sends a request to an API server, which creates a pending run in the durable task queue.
- A queue worker picks up the run, acquires a lease on it, loads the appropriate graph, and begins execution. The queue enforces that at most one run executes for a given thread at a time.
- As the graph executes, the worker writes checkpoints to the persistence layer (the frequency depends on the durability mode) and broadcasts streaming events over the configured pubsub provider.
- If the client opened a /stream connection, the API server subscribes to the pubsub channel and forwards events to the client via server-sent events in real time (a client-side sketch follows this list).
- When execution completes, the worker updates the run status and releases its slot for the next run.
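The sketch below walks through this lifecycle from the client side using the langgraph_sdk Python client. The deployment URL, the default "agent" assistant, and a messages-based graph state are assumptions for illustration.

```python
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    # Assumed URL of a running Agent Server deployment.
    client = get_client(url="http://localhost:2024")

    # Create a thread to hold conversation state, then start a run on it.
    thread = await client.threads.create()

    # Streaming the run opens an SSE connection: the API server subscribes to
    # the pubsub channel and relays events emitted by the queue worker.
    async for chunk in client.runs.stream(
        thread["thread_id"],
        "agent",  # assumed assistant/graph name
        input={"messages": [{"role": "user", "content": "Hello!"}]},
        stream_mode="values",
    ):
        print(chunk.event, chunk.data)


asyncio.run(main())
```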
Each queue worker executes up to N_JOBS_PER_WORKER runs concurrently (default: 10), so a single worker container serves many runs in parallel. See Configure Agent Server for scale for tuning guidance.
Graph loading and compilation
How and when your graph is compiled depends on how you register it in your application structure:

- Compiled graph: If you export an already-compiled graph (a CompiledGraph instance), it is loaded once at container startup and reused for every run. This is the most efficient path.
- Factory function: If you export an agent factory function, the server invokes it each time it needs the graph. Factories receive the run’s configuration, enabling per-run graph customization (for example, choosing different models or tools based on the assistant config). Keep factory functions lightweight for best performance (a sketch follows this list).
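As a sketch of the factory style (the compiled-graph style resembles the module example earlier), assuming a hypothetical "model" field in the assistant's configurable settings:

```python
# Illustrative factory-style export; the "model" field is hypothetical.
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, START, END, MessagesState


def make_graph(config: RunnableConfig):
    # Per-run settings arrive via the run's configuration.
    model = config.get("configurable", {}).get("model", "default-model")

    def respond(state: MessagesState) -> dict:
        # Placeholder node; a real factory would wire in the chosen model or tools.
        return {"messages": [{"role": "assistant", "content": f"(using {model})"}]}

    builder = StateGraph(MessagesState)
    builder.add_node("respond", respond)
    builder.add_edge(START, "respond")
    builder.add_edge("respond", END)
    return builder.compile()
```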
Learn more
- The Application Structure guide explains how to structure your application for deployment.
- The API Reference provides detailed information on the API endpoints and data models.