Building AI Agents? LangChain, LangGraph, and LangSmith Explained.

April Wong

June 11, 2026 · 5 min read

Recently we published the LangChain Agent Infrastructure Benchmark, comparing our block storage ABS against AWS io2 and gp3 across 100K agent tasks. We got some questions about what LangChain actually is, how the stack works, and why we chose it as the workload. So here's a quick breakdown.

LangChain: The building blocks

Chain LLMs to your data, tools, and APIs.

LangChain is an open-source framework for building applications powered by LLMs (the name is short for "language model chains"). It has 138K GitHub stars and one of the largest ecosystems in the space. The core idea: instead of just sending a prompt to an LLM and getting a response, LangChain lets you chain together steps. Connect LLMs to your data, your tools, your APIs. The LLM can search your documents, query a database, call an API, and reason about the results before responding. That's what turns a chatbot into an agent.

LangChain is both provider and model agnostic. You can swap between OpenAI, Anthropic, Google, Mistral, Llama, or any other supported provider, often with a one-line change. Same with embedding models. It doesn't care what's generating the responses; it just orchestrates the workflow around it.

The engine underneath is the ReAct loop (Reasoning + Acting, from a 2022 Princeton/Google paper). Instead of just thinking or just doing, the LLM alternates between both: Reason (decide what to do), Act (call a tool), Observe (read the result), repeat until the task is done. Every "Act" step hits your infrastructure. Vector search for context, cache lookup, checkpoint write. That's disk I/O on your VM, every single time.
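A minimal sketch of that Reason, Act, Observe cycle in plain Python. This is not LangChain's API: fake_llm, the two tools, and the order:123 key are hypothetical stand-ins, and the "model" is a scripted function so the example runs without any provider. In a real agent each tool call would reach a vector store, a cache, or a database.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools. Real ones would hit infrastructure; these return canned text.
def vector_search(query: str) -> str:
    return f"policy docs matching '{query}'"

def cache_lookup(key: str) -> str:
    return f"cached value for '{key}'"

TOOLS: Dict[str, Callable[[str], str]] = {
    "vector_search": vector_search,
    "cache_lookup": cache_lookup,
}

def fake_llm(observations: List[str]) -> Tuple[str, Optional[str], str]:
    """Scripted stand-in for the Reason step.

    Returns (thought, tool_name, tool_input). tool_name is None once the
    "model" decides it has enough information to answer.
    """
    if len(observations) == 0:
        return ("I need the policy text.", "vector_search", "refund policy")
    if len(observations) == 1:
        return ("I need the order record.", "cache_lookup", "order:123")
    return ("I have enough to answer.", None, "")

def react_loop(max_steps: int = 5) -> Tuple[str, int]:
    observations: List[str] = []
    tool_calls = 0
    for _ in range(max_steps):
        thought, tool_name, tool_input = fake_llm(observations)  # Reason
        if tool_name is None:
            break
        result = TOOLS[tool_name](tool_input)                    # Act
        tool_calls += 1
        observations.append(result)                              # Observe
    answer = "Answer based on: " + "; ".join(observations)
    return answer, tool_calls

answer, tool_calls = react_loop()
print(tool_calls)  # 2
```

The max_steps cap is a design choice for the sketch: it keeps a confused model from looping forever, at the cost of sometimes answering with incomplete observations.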

Example: a customer support agent. User asks "what's the refund policy for my order?" The agent queries Qdrant to find the relevant policy docs, checks Redis for the customer's order history, generates a response with the LLM, and logs the interaction to Postgres. That's one user question, four infrastructure hits.

LangGraph: The execution engine for complex agents

Graphs (multi-step workflows), branches (if this then that), loops (retry until it works). State checkpointed at every node, which means every decision point writes to Postgres.

LangGraph is a separate library within the LangChain ecosystem, built for production agent work. LangChain still exists, but if you're building serious agents in 2025, you're using LangGraph.

The difference: a LangChain chain is a DAG (directed acyclic graph). It always moves forward: step 1 then step 2 then step 3. LangGraph allows cycles. Your agent can loop back, retry, revisit previous nodes based on what it finds, take different paths, pause for human approval, and checkpoint its state at every node. With a Postgres checkpointer configured, every time the graph moves, it writes state to Postgres.

That checkpointing is the part most people don't think about. Every node in the graph writes state to Postgres. A 10-step agent workflow is 10 Postgres writes minimum. At 1,000 concurrent agents, that's 10,000 checkpoint writes flowing through your storage. LangGraph is the reason Postgres shows up so heavily in our benchmark.
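The write pattern can be sketched with the standard library alone. This is an illustration, not LangGraph's checkpointer: an in-memory SQLite database stands in for Postgres, and the table layout, run_workflow, and the small agent count are assumptions chosen so the example runs anywhere.

```python
import json
import sqlite3

# In-memory SQLite stands in for Postgres so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkpoints (agent_id INTEGER, step INTEGER, state TEXT)"
)

def run_workflow(agent_id: int, steps: int) -> None:
    state = {"agent_id": agent_id, "completed": []}
    for step in range(1, steps + 1):
        state["completed"].append(step)  # pretend the node did some work
        # One checkpoint write per node, matching the pattern described above.
        conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (agent_id, step, json.dumps(state)),
        )
    conn.commit()

AGENTS, STEPS = 5, 10  # small numbers; the ratio is the same at 1,000 agents
for agent_id in range(AGENTS):
    run_workflow(agent_id, STEPS)

total_writes = conn.execute("SELECT COUNT(*) FROM checkpoints").fetchone()[0]
print(total_writes)  # 50
print(1_000 * 10)    # 10000, the figure quoted for 1,000 agents
```

Note that each row carries the full serialized state, so write volume grows with state size as well as with node count.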

Example: a lead qualification agent. A new lead comes in: jane@acme.com. Here's every step the graph takes, and what hits your infrastructure at each one.

Node 01: CRM lookup. The agent queries Postgres to check if jane@acme.com is already a customer. Postgres checkpoint written. That's 1 write.

Node 02: Enrich. The agent runs a vector search over company data in Qdrant to pull enrichment info on Acme. Postgres checkpoint written. That's 2 writes.

Node 03: LLM scores the lead. The model API takes the CRM result and enrichment data, scores the lead. This is a conditional edge, meaning the graph branches based on the score. Postgres checkpoint written. That's 3 writes.

Now it branches:

Score high: Route to sales. Sends a Slack notification. Postgres checkpoint. Done. 4-5 writes total.
Score low: Nurture sequence. Adds jane to a drip campaign. Postgres checkpoint. Done. 4-5 writes total.
Data missing: Request more info. The graph loops back to Node 02 (Enrich) to retry with different search parameters. Another Qdrant query, another Postgres checkpoint, then back through Node 03 for rescoring. That's 7-8 writes before it resolves.

One lead, 4-8 Postgres writes depending on the path. Now multiply that by 1,000 concurrent agents running 10-step workflows: that's at least 10,000 checkpoint writes flowing through your storage. Every node writes state. That's why Postgres dominates the benchmark.
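The three paths above can be traced with a small state machine. This is a plain-Python sketch, not LangGraph code: the node names, the run_lead function, and the scripted scores list (one outcome per visit to the scoring node) are hypothetical, and a list of node names stands in for Postgres checkpoint rows. It counts one checkpoint per node visit, which gives the lower bound of each range quoted above.

```python
from typing import Dict, List

def run_lead(email: str, scores: List[str]) -> List[str]:
    """Walk the graph once and return the nodes that wrote a checkpoint.

    `scores` scripts the scoring outcomes ("high", "low", "missing"), one
    per visit to the score node. A real graph would get these from the model.
    The list must end with "high" or "low", otherwise the iterator runs out.
    """
    score_iter = iter(scores)
    checkpoints: List[str] = []
    state: Dict[str, str] = {"email": email}
    branch = {
        "high": "route_to_sales",
        "low": "nurture",
        "missing": "request_more_info",
    }
    node = "crm_lookup"
    while node != "END":
        if node == "crm_lookup":
            state["crm"] = "not a customer"
            next_node = "enrich"
        elif node == "enrich":
            state["enrichment"] = "company data"
            next_node = "score"
        elif node == "score":
            state["score"] = next(score_iter)
            next_node = branch[state["score"]]  # conditional edge
        elif node == "request_more_info":
            next_node = "enrich"                # cycle back and retry
        else:                                   # route_to_sales or nurture
            next_node = "END"
        checkpoints.append(node)                # one write per node visited
        node = next_node
    return checkpoints

high = run_lead("jane@acme.com", ["high"])
low = run_lead("jane@acme.com", ["low"])
retry = run_lead("jane@acme.com", ["missing", "high"])
print(len(high), len(low), len(retry))  # 4 4 7
```

Each extra "missing" outcome adds three more checkpoints (request_more_info, enrich, score), which is how a single cycle pushes write counts up quickly.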

LangSmith: The observability layer

LangSmith is the observability and evaluation platform. It's the commercial product from the LangChain team (not open source). Think of it as the monitoring layer: trace every LLM call, every tool invocation, every step in the graph. See where your agent spent time, where it failed, what it cost. You can also run evals, comparing agent outputs against test cases to catch regressions before they hit production.

LangSmith doesn't hit your storage directly. It's a SaaS platform that collects telemetry from your agents. But it's how you'd actually see the performance differences we measured in the benchmark. If you're running LangSmith on your agent workloads, you'd see the per-step latencies we're reporting.

Example: you deploy the lead qualification agent above. LangSmith shows you that 30% of runs are taking 4x longer than average. You drill into the traces and see the Qdrant vector search step is spiking on cold reads. That's exactly the kind of bottleneck our benchmark measures. LangSmith tells you where the problem is. The storage underneath determines how bad the problem gets.
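The drill-down in that example amounts to grouping step latencies and looking for the step whose worst case is far from its typical case. A rough sketch follows; the trace records are made-up numbers for illustration, the field names are assumptions rather than LangSmith's export schema, and spikiest_step is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List

# Made-up per-step latencies in milliseconds. Illustration only.
traces = [
    {"run": 1, "step": "crm_lookup", "ms": 12},
    {"run": 1, "step": "enrich", "ms": 40},
    {"run": 1, "step": "score", "ms": 300},
    {"run": 2, "step": "crm_lookup", "ms": 11},
    {"run": 2, "step": "enrich", "ms": 1500},  # the simulated cold read
    {"run": 2, "step": "score", "ms": 310},
    {"run": 3, "step": "crm_lookup", "ms": 13},
    {"run": 3, "step": "enrich", "ms": 45},
    {"run": 3, "step": "score", "ms": 290},
]

def spikiest_step(records: List[dict]) -> str:
    """Return the step with the largest max-to-median latency ratio."""
    by_step: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        by_step[rec["step"]].append(rec["ms"])
    ratios = {step: max(ms) / median(ms) for step, ms in by_step.items()}
    return max(ratios, key=ratios.get)

print(spikiest_step(traces))  # enrich
```

Ranking by max-to-median ratio rather than by mean latency is deliberate: the scoring step is the slowest on average here, but it is consistently slow, while the enrichment step is the one that spikes.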

How they fit together

LangChain is the building blocks. LangGraph is the execution engine for complex agents. LangSmith is the observability layer.

That's why we chose LangChain as the workload. It's what most teams are running in production, and every layer of the stack generates real disk I/O. The framework gets all the attention. The infrastructure underneath doesn't. We think it should.

Full benchmark results and open source repo:
github.com/nirvana-labs-examples/langchain-benchmarks

Nirvana: High Performance Block Storage Cloud

High Performance Block Storage Cloud with High IOPS, powering blockchain, AI and real-time systems.

Learn more at Nirvana Labs
