LangChain agent benchmark · 5 platforms · 3 independent runs · 5M vectors

Your agents. Faster.

Nirvana ABS finishes the same LangChain agent workload 14–17% faster than AWS io2 at every scale we tested (1k, 10k, and 100k tasks), reproducibly across three independent runs. Same code, same config; only the storage architecture underneath changes. The how, the why, and the one caveat are below.

ABS lead: 14–17%

faster task completion time vs AWS io2

The workload
5M-vector database
Qdrant · Redis · Postgres

Vector search · cache · checkpoints · cold reads · 6 storage ops per task

Reproducibility
3× cold runs
Concurrency
1,000 agents × 100 tasks

100,000 tasks total

Key findings

Summary · Faster task completion end-to-end. High throughput. High IOPS.

Three independent runs of the LangChain multi-tenant Qdrant workload across five storage platforms, scaled from 100 to 1,000 concurrent agents (1k to 100k tasks). ABS finishes first on every run. The four numbers that matter:

Task completion
14–17% faster task completion

Finishes 14–17% faster than io2 at every scale (1K, 10K, 100K tasks). Reproduces across all 3 runs.

Throughput
+19% more throughput

172 app IOPS vs io2’s 145 at 100K tasks. More work through the pipeline, faster finish.

Raw disk IOPS
7.8× faster raw disk

313K IOPS vs io2-64k’s 40K (instance-capped). The hardware ceiling underneath.

Cost
31× cheaper than io2

$118/mo vs $3,710/mo (io2-64k). Flat per-GB pricing, no IOPS provisioning tax.

Background · why LangChain, why storage

Background · The agent layer where the disk actually matters.

LangChain is the most popular open-source framework for building LLM-powered agents (138K GitHub stars). It runs the agent as a loop: the LLM reasons about what to do, acts by calling a tool, observes the result, and repeats until the task is done. A single query can run that cycle 3 to 10 times.

Every Act calls a tool. Some are external (a web search, a Slack message) and resolve over the network. But the tools that retrieve context, cache results, and checkpoint state (Qdrant, Redis, and Postgres) run on your VM, so every vector search, cache lookup, and checkpoint write lands on the disk attached to your machine.

The LLM is one API call. The external tools are network requests. The disk is the only layer where cloud storage performance directly changes how fast the loop finishes. That’s why we scoped this benchmark to the storage layer, and chose LangChain as the workload: it’s the framework most teams run in production, generating real disk I/O across real services at real concurrency.

The ReAct loop · repeats until done
Bundled first: LLM + Tools + Prompt
Reason
LLM reads the query + history, decides the next action. One API call.
Act
Calls a tool: Qdrant search, Redis cache, Postgres checkpoint. This is the step that hits your disk.
Observe
Agent reads the tool’s result, then loops back to Reason.
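
The loop above can be sketched in a few lines. This is a minimal illustration with stubbed components, not LangChain's actual API: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the tools return canned strings where a real agent would query Qdrant, Redis, or Postgres.

```python
from typing import Callable

# Hypothetical stand-ins for disk-backed tools; a real agent would call
# Qdrant, Redis, or Postgres here.
TOOLS: dict[str, Callable[[str], str]] = {
    "vector_search": lambda q: f"3 documents matching '{q}'",
    "cache_lookup": lambda q: "cache miss",
}


def fake_llm(query: str, history: list[tuple[str, str]]) -> tuple[str, str]:
    """Stub for the Reason step: return (action, argument).

    Checks the cache first, then searches, then finishes.
    """
    if not history:
        return "cache_lookup", query
    if len(history) == 1:
        return "vector_search", query
    return "finish", f"answer based on {history[-1][1]}"


def run_agent(query: str, max_steps: int = 10) -> tuple[str, int]:
    """Run Reason -> Act -> Observe until the stub LLM says 'finish'."""
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        action, arg = fake_llm(query, history)  # Reason
        if action == "finish":
            return arg, len(history)
        observation = TOOLS[action](arg)        # Act: the step that hits storage
        history.append((action, observation))   # Observe
    raise RuntimeError("agent did not finish within max_steps")


answer, tool_calls = run_agent("pricing policy")
print(answer, tool_calls)
```

Each pass through the loop makes one tool call, so a query that cycles 3 to 10 times issues that many storage-backed operations before it returns.
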
Objectives · what we set out to prove

The questions · Three things we set out to answer.

These are not the questions a vendor benchmark answers. They are the ones a team actually asks before moving agents off AWS, including the one most benchmarks avoid.

Objective 1

How much faster is Nirvana ABS vs AWS EBS (io2 & gp3)?

On real agent tasks (RAG, vector search, caching, checkpointing), not synthetic reads.

Objective 2

Do the FIO numbers survive a production workload?

ABS owns the 4K-random-read benchmark every vendor quotes. Does that win finish tasks first?

Objective 3

Where does the lead hold under load?

Stress-tested across three scales, 1k → 100k tasks.

What we're testing · same agents, same code, five platforms

The setup · Five platforms. Identical workload. Different block storage.

We deployed identical LangChain agent workloads across five infrastructure configurations, four on AWS and one on Nirvana. Each VM runs Qdrant (vector DB), Redis (cache), and Postgres (checkpoints) locally via Docker. Same code, same data, same task sequence. The only variable is the infrastructure underneath.

| Platform | Instance | vCPU / RAM | Storage | Provisioned IOPS | Cost / mo · 256 GB |
| --- | --- | --- | --- | --- | --- |
| Nirvana ABS | n1-standard-4 | 4 / 16 GB · DDR5 | ABS 256 GB | 20,000 (600k burst · included) | $23.94 |
| gp3-3k | m6i.xlarge | 4 / 16 GB · DDR4 | gp3 256 GB | 3,000 | $20.48 |
| gp3-16k | m6i.xlarge | 4 / 16 GB · DDR4 | gp3 256 GB | 16,000 | $85.48 |
| io2-32k | m6i.xlarge | 4 / 16 GB · DDR4 | io2 256 GB | 32,000 | $2,112 |
| io2-64k | m6i.xlarge | 4 / 16 GB · DDR4 | io2 256 GB | 64,000 | $3,584 |

Test 1 · Raw disk (fio)

1  FIO · ABS is 7.8× faster than AWS io2 on raw disk.

ABS pushes 313,074 IOPS, 7.8× the best io2 can deliver on this instance class. Latency at the same load: 817 μs vs 6,345 μs.

This is the ceiling. The question is whether the application actually pulls against it, and as the next layer shows, HNSW does not.

Raw fio · 4K random read · QD=256

| Platform | Provisioned IOPS | Measured IOPS | Relative to ABS | Latency | ABS IOPS lead |
| --- | --- | --- | --- | --- | --- |
| Nirvana ABS | 20,000 (600k burst · included) | 313,074 (261K–313K across runs) | 1.00× | 817 μs | baseline |
| io2-64k | 64,000 | 40,339 (capped at instance limit) | 0.13× | 6,345 μs | 7.8× |
| io2-32k | 32,000 | 33,070 | 0.11× | 7,740 μs | 9.5× |
| gp3-16k | 16,000 | 16,530 | 0.05× | 15,485 μs | 19× |
| gp3-3k | 3,000 | 3,097 | 0.01× | 82,632 μs | 101× |

m6i.xlarge instance limit caps io2-64k at ~40K IOPS regardless of provisioned ceiling. ABS has no fixed provisioned cap, so its fio result varies run-to-run (261K–313K IOPS); 313,074 is the peak measured, and even the low end is 6.5× io2-64k.
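
The IOPS and latency figures are two views of the same measurement. With a fixed queue depth, Little's law gives IOPS ≈ queue depth ÷ latency. The sketch below checks that relation against the reported numbers, assuming the latency column is the mean completion latency at QD=256 (the text does not say which statistic it is).

```python
QUEUE_DEPTH = 256  # fio queue depth used for the raw-disk test

# (latency in microseconds, measured IOPS) as reported above
RESULTS = {
    "Nirvana ABS": (817, 313_074),
    "io2-64k": (6_345, 40_339),
    "io2-32k": (7_740, 33_070),
    "gp3-16k": (15_485, 16_530),
    "gp3-3k": (82_632, 3_097),
}


def predicted_iops(queue_depth: int, latency_us: float) -> float:
    """Little's law: throughput = requests in flight / time per request."""
    return queue_depth / (latency_us * 1e-6)


for platform, (latency_us, measured) in RESULTS.items():
    predicted = predicted_iops(QUEUE_DEPTH, latency_us)
    error_pct = abs(predicted - measured) / measured * 100
    print(f"{platform:12s} predicted {predicted:>10,.0f}  "
          f"measured {measured:>8,}  off by {error_pct:.2f}%")
```

Under that assumption the predicted and measured IOPS agree to within a fraction of a percent on all five platforms, which is why the latency gap tracks the IOPS gap so closely.
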

Methodology · test setup

2  3× Cold Reads · Methodology

Two pre-conditions have to be true before any result means something in production. Both are workload setup applied identically to all five platforms, not platform tuning.

Pre-condition 1

From 6K vectors to 5M. From RAM to disk.

A 6K-vector check ran fast everywhere because the index fit in RAM. So we pushed it to 5M vectors (768-dim, ~15 GB) across 50 collections of 100K, the production multi-tenant pattern. At that size the working set spills past the page cache and every query hits disk.

Cold reads only: Qdrant restarted and OS page cache dropped before each run.
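
The ~15 GB figure follows from simple arithmetic. A quick sketch, assuming the vectors are stored as 4-byte float32 values (the storage type is not stated above) and ignoring HNSW graph and payload overhead:

```python
NUM_COLLECTIONS = 50
VECTORS_PER_COLLECTION = 100_000
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4  # assumption: original vectors are float32
BYTES_PER_INT8 = 1     # INT8 scalar quantization uses one byte per dimension

total_vectors = NUM_COLLECTIONS * VECTORS_PER_COLLECTION
raw_gb = total_vectors * DIMENSIONS * BYTES_PER_FLOAT32 / 1e9
int8_gb = total_vectors * DIMENSIONS * BYTES_PER_INT8 / 1e9

print(f"{total_vectors:,} vectors")
print(f"float32 vectors: {raw_gb:.2f} GB")
print(f"INT8-quantized copies: {int8_gb:.2f} GB")
```

The raw vectors alone come to about 15.4 GB, which cannot stay resident in the page cache of a 16 GB VM that also runs Redis and Postgres.
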

Pre-condition 2

inline_storage + INT8 scalar quantization.

Every collection runs hnsw_config.inline_storage=true + INT8 scalar quantization (a Qdrant 1.16 feature). Default HNSW lands a Qdrant p99 of 24–43 seconds on every tier, which is unshippable.

This tuning compresses the p99 tail 7.6–12.8× on every platform, a bigger effect than any inter-platform gap.
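
As a rough sketch, the collection settings named above might look like the following request body for Qdrant's create-collection endpoint. The field names follow Qdrant's REST schema as we understand it for 1.16 and should be checked against the docs for your version; the collection name `tenant_00` and the cosine distance metric are assumptions that are not stated in this benchmark.

```python
import json

COLLECTION = "tenant_00"  # hypothetical per-tenant collection name

create_body = {
    "vectors": {
        "size": 768,           # vector dimensions used in the benchmark
        "distance": "Cosine",  # assumption: the metric is not stated in the text
        "on_disk": True,       # keep original vectors on disk
    },
    "hnsw_config": {"inline_storage": True},
    "quantization_config": {"scalar": {"type": "int8"}},
}

# Would be sent as: PUT /collections/{COLLECTION}
print(f"PUT /collections/{COLLECTION}")
print(json.dumps(create_body, indent=2))
```
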

Tiered multi-tenancy at scale (Qdrant 1.16) · the production workload

2.1  ABS finishes first on end-to-end task completion.

We split the 5M vectors into 50 per-tenant collections and ran the cold-read workload at three scales, three times each (R1/R2/R3); R3 cranks the 1,000-agent run to 100k tasks. ABS wins task completion and task p99 at every scale, and the lead widens with sustained load, reproducibly across all three runs.

100 agents × 10 tasks · run R3 (canonical) · 1,000 tasks total

| Platform | Task completion | App IOPS | Task p50 (ms) | Task p95 (ms) | Task p99 (ms) |
| --- | --- | --- | --- | --- | --- |
| Nirvana ABS | 36 s | 169.0 | 343 | 446 | 564 |
| io2-32k | 43 s | 141.0 | 419 | 542 | 600 |
| io2-64k | 42 s | 143.0 | 413 | 532 | 595 |
| gp3-16k | 43 s | 140.0 | 420 | 537 | 696 |
| gp3-3k | 62 s | 97.0 | 606 | 1,013 | 2,360 |

Same cold-read test, run three times. ABS leads task completion, throughput, and task p99 in every run. Task latencies in ms. p50/p95 were captured at the 100×10 and 100k scales; 500×20 and 1000×10 report task p99 only. R3 is the canonical result; per-service Qdrant / Redis / Postgres latencies are below.

2.2  Holds across three scenarios, 1,000 to 100,000 tasks.

Same five VMs, three workload sizes: 1k, 10k, and 100k total tasks. ABS finishes first on every one. The lead widens with sustained load.

| Scale | Metric | Nirvana ABS (winner) | io2-32k | io2-64k | gp3-16k | gp3-3k | ABS lead |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 100×10 (1k tasks) | Task completion | 36 s | 43 s | 42 s | 43 s | 62 s | 14% faster |
| 100×10 (1k tasks) | Task p99 | 564 ms | 600 ms | 595 ms | 696 ms | 2,360 ms | 5% lower latency |
| 500×20 (10k tasks) | Task completion | 351 s | 420 s | 424 s | 430 s | 433 s | 16% faster |
| 500×20 (10k tasks) | Task p99 | 501 ms | 591 ms | 600 ms | 609 ms | 828 ms | 15% lower latency |
| 1000×100 (100k tasks) | Task completion | 58 min | 69 min | 69 min | 68 min | 71 min | 15% faster |
| 1000×100 (100k tasks) | Task p99 | 725 ms | 779 ms | 771 ms | 760 ms | 809 ms | 5% lower latency |

All R3 runs on fresh terraform deploys. ABS wins both task completion and task p99 at every scale.
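
The "ABS lead" figures above appear to be measured against the best AWS result in each row rather than against one fixed tier. The sketch below recomputes them under that assumption, using the rounded values from the table.

```python
# R3 results from the table above: ABS value, then the four AWS tiers
# (io2-32k, io2-64k, gp3-16k, gp3-3k). Lower is better for every metric.
ROWS = {
    "100x10 completion (s)": (36, [43, 42, 43, 62]),
    "100x10 task p99 (ms)": (564, [600, 595, 696, 2360]),
    "500x20 completion (s)": (351, [420, 424, 430, 433]),
    "500x20 task p99 (ms)": (501, [591, 600, 609, 828]),
    "1000x100 completion (min)": (58, [69, 69, 68, 71]),
    "1000x100 task p99 (ms)": (725, [779, 771, 760, 809]),
}


def lead_pct(abs_value: float, aws_values: list[float]) -> float:
    """Percent by which ABS beats the best (lowest) AWS result."""
    best_aws = min(aws_values)
    return (best_aws - abs_value) / best_aws * 100


leads = {name: round(lead_pct(a, aws)) for name, (a, aws) in ROWS.items()}
for name, pct in leads.items():
    print(f"{name:28s} {pct}%")
```

Measured this way, the rounded leads match the table. Comparing against a specific io2 tier instead gives slightly different percentages, which is why the 100k-task completion lead reads 15% here and 16% where io2-64k is the baseline.
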

Per-service wins at 1000×100

Once the workload runs long enough to flush short-term caching effects, ABS also takes Redis p99 (68.0 ms vs 70–73 ms on AWS) and Postgres p99 (31.9 ms vs 37–38 ms on AWS). This is the first scenario where Nirvana sweeps the cache + checkpoint paths. AWS io2 keeps the per-query Qdrant p99 (140 ms vs ABS 301 ms at this scale), but at 100k tasks × 6 ops, the win flips wherever a single op isn’t on the user-facing critical path.

Per-service compound · 1000 × 100 production

2.3  Latency compounds across services. ABS finishes 16% faster.

Each agent task chains 6 ops across 3 services (Qdrant ×2 · Redis ×2 · Postgres ×2). Per-op latency compounds into task time. At the heaviest scale we tested, ABS leads on Redis, Postgres, and end-to-end task p99, even where io2 holds the per-query Qdrant tail.

| Metric | Per task | Nirvana ABS | io2-32k | io2-64k | gp3-16k | gp3-3k | ABS vs io2-64k |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qdrant p99 (vector search) | 2 ops | 301 ms | 140 ms | 140 ms | 147 ms | 307 ms | +161 ms (115% slower) |
| Redis p99 (cache reads) | 2 ops | 68.0 ms (winner) | 71.9 ms | 70.4 ms | 71.8 ms | 72.6 ms | −2.4 ms (3% faster) |
| Postgres p99 (checkpoint writes) | 2 ops | 31.9 ms (winner) | 38.1 ms | 37.7 ms | 37.0 ms | 38.2 ms | −5.8 ms (15% faster) |
| Task p99 (compound across services) | 6 ops | 725 ms (winner) | 779 ms | 771 ms | 760 ms | 809 ms | −46 ms (6% faster) |
| Task completion (total wall-clock) | 100k tasks | 58 min (winner) | 69 min | 69 min | 68 min | 71 min | −11 min (16% faster) |

Per-service p99 at the production scale (1000 × 100, 100k tasks, 600k ops · run R3). Latency compounds across services, and ABS wins the compound even when io2 holds the per-query Qdrant tail.
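
The throughput and completion figures are linked by simple arithmetic. A minimal sketch, assuming "app IOPS" means total storage operations divided by wall-clock time (a definition that reproduces the reported 172 vs 145 at 100k tasks):

```python
OPS_PER_TASK = 6  # 2 Qdrant + 2 Redis + 2 Postgres


def app_iops(tasks: int, wall_clock_s: float, ops_per_task: int = OPS_PER_TASK) -> float:
    """Application-level ops per second over the whole run (assumed definition)."""
    return tasks * ops_per_task / wall_clock_s


def pct_faster(baseline_s: float, candidate_s: float) -> float:
    """Percent less wall-clock time than the baseline."""
    return (baseline_s - candidate_s) / baseline_s * 100


abs_s, io2_s = 58 * 60, 69 * 60  # 100k-task completion times, in seconds
abs_iops = app_iops(100_000, abs_s)
io2_iops = app_iops(100_000, io2_s)

print(f"ABS {abs_iops:.0f} app IOPS, io2 {io2_iops:.0f} app IOPS")
print(f"throughput gain: {(abs_iops / io2_iops - 1) * 100:.0f}%")
print(f"completion: {pct_faster(io2_s, abs_s):.0f}% faster")
```

The same 11-minute gap reads as 19% more throughput or 16% less wall-clock time, depending on which run is taken as the denominator.
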

From FIO to production · what tuning revealed

3  Insights: from FIO to production.

FIO showed ABS at 313K IOPS, 7.8× faster than io2, but a raw-disk number is a ceiling, not a guarantee. Reaching it on a real agent workload, and holding the lead under load, came down to three things.

Insight 1 · Tuning

Tune the set-up to reach the ceiling.

On defaults, ABS spiked to 642 ms Qdrant p99. inline_storage + INT8 quantization on multi-tenant collections settled it at 169–182 ms across fresh deploys. The 642 ms never reproduced.

Insight 2 · Workload

The harder you push, the bigger the lead.

The advantage widens with sustained load: +14% at 1k tasks, +16% at 10k, +16% at 100k, end-to-end vs io2.

Insight 3 · Services

ABS wins despite the Qdrant tail.

ABS loses the per-query Qdrant op but wins the workload: Qdrant is just 2 of 6 ops, its p99 tail is rare (p50 66 vs 51 ms), and ~19% more throughput compounds across 100k tasks, turning a 7% task-p99 lead into a 16% completion lead.

Cost · performance per dollar

4  Cost comparison.

Identical compute (4 vCPU · 16 GB · 256 GB storage) across all five platforms. The only things changing are the storage tier and how AWS charges for provisioned IOPS. ABS bundles 20,000 sustained IOPS into a flat per-GB rate; AWS bills io2 per provisioned IOPS per month, on top of the per-GB rate.

| Line item | ABS | gp3-3k | gp3-16k | io2-32k | io2-64k |
| --- | --- | --- | --- | --- | --- |
| Instance | $93 | $126 | $126 | $126 | $126 |
| Storage volume (256 GB) | $24 | $20 | $20 | $32 | $32 |
| Storage IOPS | included | free | $65 | $2,080 | $3,552 |
| Total / month | $118 (cheapest) | $147 | $212 | $2,238 | $3,710 |

ABS · $0.00013/GB/hr, 20,000 sustained IOPS baseline included (600,000 burst; measured 261K–313K in fio, no per-IOPS billing on this tier). AWS gp3 · $0.08/GB/mo, first 3K IOPS free, $0.005/IOPS above. AWS io2 · $0.125/GB/mo, tiered IOPS ($0.065 up to 32K, $0.046 for 32K–64K).
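
The AWS storage figures can be reproduced from the rates quoted above. This sketch models only the gp3 and io2 prices and tiers listed here, so treat it as an illustration of the arithmetic rather than a pricing calculator. The $126 instance and $118 ABS totals are the rounded values from the table.

```python
def gp3_monthly(size_gb: float, iops: int) -> float:
    """gp3: $0.08/GB-month, first 3,000 IOPS free, $0.005 per IOPS above."""
    return size_gb * 0.08 + max(0, iops - 3_000) * 0.005


def io2_monthly(size_gb: float, iops: int) -> float:
    """io2: $0.125/GB-month plus tiered IOPS pricing.

    Only the two tiers quoted above are modeled: $0.065 per IOPS up to
    32,000 and $0.046 per IOPS from 32,000 to 64,000.
    """
    tier1 = min(iops, 32_000) * 0.065
    tier2 = max(0, min(iops, 64_000) - 32_000) * 0.046
    return size_gb * 0.125 + tier1 + tier2


AWS_INSTANCE = 126  # rounded monthly instance cost from the table
ABS_TOTAL = 118     # rounded ABS monthly total from the table

for name, cost in [
    ("gp3-3k", gp3_monthly(256, 3_000)),
    ("gp3-16k", gp3_monthly(256, 16_000)),
    ("io2-32k", io2_monthly(256, 32_000)),
    ("io2-64k", io2_monthly(256, 64_000)),
]:
    total = AWS_INSTANCE + cost
    print(f"{name:8s} storage ${cost:,.2f}  total ${total:,.2f}  "
          f"{total / ABS_TOTAL:.1f}x ABS")
```

On io2 the provisioned IOPS charge dominates the bill: at 64,000 IOPS it is $3,552 of the $3,584 monthly storage cost.
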

ABS vs io2

Faster and 19–31× cheaper.

$118 vs $2,238 (io2-32k) or $3,710 (io2-64k). ABS finishes the workload 14–17% faster at every scale while costing a fraction of the io2 bill. io2 still wins per-query Qdrant p99 under QD=1, so real-time user-facing search may want that path.

The verdict
Faster: 14–17%

faster than AWS io2 end-to-end on real agent tasks, not synthetic reads.

Under load

Holds from 1k to 100k tasks, widening as you scale. ABS loves heavy.

Price
$118/mo

31× cheaper than $3,710 (io2-64k).

Methodology & reproducibility

Open source. Run it yourself.

All five VMs run identical instance size and workload code. Storage is the only variable. Every step from terraform apply to results JSON is in the repo.

Compute (identical across platforms)

AWS instance
m6i.xlarge
Nirvana instance
n1-standard-4
vCPU · RAM
4 · 16 GB
Disk size
256 GB (all)

Benchmark parameters

Pre-loaded vectors
5,000,000
Vector dimensions
768
Agents × tasks
100 × 10 · 500 × 20 · 1000 × 100
Ops per task
6 (2 Qdrant · 2 Redis · 2 PG)
Cold-read protocol

Every run starts on a fresh terraform-deployed VM. Pre-load 5M vectors into Qdrant (on_disk=True, inline_storage + INT8 quantization, the tuning that makes it shippable), then restart the Qdrant container and run sync && echo 3 > /proc/sys/vm/drop_caches to drop the OS page cache before the benchmark. Cold reads only, no warm-cache artifacts.
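
The reset step of the protocol can be sketched as a small helper. This is an illustration rather than the script from the repo: the container name `qdrant` is hypothetical, and the helper defaults to a dry run because dropping the page cache requires root.

```python
import shlex
import subprocess

QDRANT_CONTAINER = "qdrant"  # hypothetical container name


def cold_reset_commands(container: str = QDRANT_CONTAINER) -> list[list[str]]:
    """Commands that restart Qdrant and drop the Linux page cache."""
    return [
        ["docker", "restart", container],
        ["sh", "-c", "sync && echo 3 > /proc/sys/vm/drop_caches"],
    ]


def cold_reset(dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in cold_reset_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


cold_reset(dry_run=True)
```

Running sync first flushes dirty pages to disk so that writing 3 to drop_caches can release the page cache along with dentries and inodes.
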

In closing

Task completion matters. Your faster LangChain agents start here.

End-to-end task completion that leads across p50, p95, and p99, with higher throughput and lower latency where it compounds. Across every scale we tested (1k → 10k → 100k tasks), ABS finishes the workload 14–17% faster than io2, and the lead widens the harder you push it.

Nirvana Labs · LangChain benchmark · 5M vectors · final results · May 2026
