Application structure
To deploy an Agent Server application, you need to specify the graph(s) you want to deploy, as well as any relevant configuration settings, such as dependencies and environment variables. Read the application structure guide to learn how to structure your LangGraph application for deployment.
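For example, a deployment configuration (such as a langgraph.json listing dependencies, graphs, and env) typically points at a module that exports your graph. The sketch below shows a minimal such module; the module path, node logic, and reference string in the comments are illustrative assumptions, not required names.

```python
# my_app/agent.py -- a minimal graph module a deployment config could point to.
# The module path and node logic are illustrative; a real application will differ.
from langgraph.graph import StateGraph, START, END, MessagesState


def respond(state: MessagesState) -> dict:
    # Placeholder node: a real application would call a model or tools here.
    return {"messages": [{"role": "assistant", "content": "Hello from the graph!"}]}


builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
builder.add_edge("respond", END)

# Exported compiled graph; a deployment config would reference it with
# something like "./my_app/agent.py:graph".
graph = builder.compile()
```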
Parts of a deployment

When you deploy Agent Server, you are deploying one or more graphs, a database for persistence, and a task queue.

Graphs
When you deploy a graph with Agent Server, you are deploying a “blueprint” for an Assistant. An Assistant is a graph paired with specific configuration settings. You can create multiple assistants per graph, each with unique settings to accommodate different use cases that can be served by the same graph. Upon deployment, Agent Server automatically creates a default assistant for each graph using the graph’s default configuration settings.

We often think of a graph as implementing an agent, but a graph does not necessarily need to implement one. For example, a graph could implement a simple chatbot that only supports back-and-forth conversation, without the ability to influence any application control flow. In practice, as applications grow more complex, a graph often implements a larger flow that may use multiple agents working in tandem.
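As a sketch of how assistants build on a deployed graph, the snippet below uses the langgraph_sdk Python client to create an assistant with custom configuration. The deployment URL, the graph name "agent", and the "model" configurable field are assumptions for illustration.

```python
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    # Assumed URL of a running Agent Server deployment.
    client = get_client(url="http://localhost:2024")

    # Create an assistant from the deployed graph, overriding its default
    # configuration. The "model" key is a hypothetical configurable field
    # that your graph would need to define.
    assistant = await client.assistants.create(
        graph_id="agent",
        config={"configurable": {"model": "openai:gpt-4o-mini"}},
        name="support-assistant",
    )
    print(assistant["assistant_id"])


asyncio.run(main())
```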
Persistence and task queue
Agent Server leverages a database for persistence and a task queue. PostgreSQL is supported as a database for Agent Server and Redis as the task queue. If you’re deploying using LangSmith cloud, these components are managed for you. If you’re deploying Agent Server on your own infrastructure, you’ll need to set up and manage these components yourself. For more information on how these components are set up and managed, review the hosting options guide.

How to deploy
Agent Server can be deployed using different methods depending on your infrastructure:

- Cloud: Deploy from GitHub repositories with fully managed infrastructure.
- Hybrid or self-hosted with control plane: Build Docker images and deploy via the UI.
- Standalone servers: Deploy Agent Servers directly without the control plane.
Cloud deployments are available on all LangSmith plans. Hybrid and self-hosted options require an Enterprise plan and license key. To acquire a license key, contact our sales team.
Runtime architecture
The following description applies to the non-distributed runtime variant of LangSmith Deployment.

Container architecture
A typical deployment consists of two kinds of long-running containers, both built from the same Docker image (a base image with your project code installed on top):

- API servers handle client requests (creating runs, reading thread state, streaming results) but do not execute agent code themselves.
- Queue workers are the execution engine. They listen to the durable task queue, execute your graph code, and write checkpoints.
Run execution lifecycle
When you invoke a run, the request flows through several components:

- A client sends a request to an API server, which creates a pending run in the durable task queue.
- A queue worker picks up the run, acquires a lease on it, loads the appropriate graph, and begins execution. The queue enforces that at most one run executes for a given thread at a time.
- As the graph executes, the worker writes checkpoints to the persistence layer (the frequency depends on the durability mode) and broadcasts streaming events over the configured pubsub provider.
- If the client opened a /stream connection, the API server subscribes to the pubsub channel and forwards events to the client via server-sent events in real time (a client-side sketch follows this list).
- When execution completes, the worker updates the run status and releases its slot for the next run.
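The sketch below walks through this lifecycle from the client side using the langgraph_sdk Python client. The deployment URL, the default "agent" assistant, and a messages-based graph state are assumptions for illustration.

```python
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    # Assumed URL of a running Agent Server deployment.
    client = get_client(url="http://localhost:2024")

    # Create a thread to hold conversation state, then start a run on it.
    thread = await client.threads.create()

    # Streaming the run opens an SSE connection: the API server subscribes to
    # the pubsub channel and relays events emitted by the queue worker.
    async for chunk in client.runs.stream(
        thread["thread_id"],
        "agent",  # assumed assistant/graph name
        input={"messages": [{"role": "user", "content": "Hello!"}]},
        stream_mode="values",
    ):
        print(chunk.event, chunk.data)


asyncio.run(main())
```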
Each queue worker executes up to N_JOBS_PER_WORKER runs concurrently (default: 10), so a single worker container serves many runs in parallel. See Configure Agent Server for scale for tuning guidance.
Graph loading and compilation
How and when your graph is compiled depends on how you register it in your application structure:

- Compiled graph: If you export an already-compiled graph (a CompiledGraph instance), it is loaded once at container startup and reused for every run. This is the most efficient path.
- Factory function: If you export an agent factory function, the server invokes it each time it needs the graph. Factories receive the run’s configuration, enabling per-run graph customization (for example, choosing different models or tools based on the assistant config). Keep factory functions lightweight for best performance (a sketch follows this list).
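As a sketch of the factory style (the compiled-graph style resembles the module example earlier), assuming a hypothetical "model" field in the assistant's configurable settings:

```python
# Illustrative factory-style export; the "model" field is hypothetical.
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, START, END, MessagesState


def make_graph(config: RunnableConfig):
    # Per-run settings arrive via the run's configuration.
    model = config.get("configurable", {}).get("model", "default-model")

    def respond(state: MessagesState) -> dict:
        # Placeholder node; a real factory would wire in the chosen model or tools.
        return {"messages": [{"role": "assistant", "content": f"(using {model})"}]}

    builder = StateGraph(MessagesState)
    builder.add_node("respond", respond)
    builder.add_edge(START, "respond")
    builder.add_edge("respond", END)
    return builder.compile()
```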
Learn more
- The Application Structure guide explains how to structure your application for deployment.
- The API Reference provides detailed information on the API endpoints and data models.