LangChain agent benchmark · 5 platforms · 3 independent runs · 5M vectors

Your agents. Faster.

Nirvana ABS finishes the same LangChain agent workload 14–17% faster than AWS io2 at every scale we tested (1k, 10k, and 100k tasks), reproducibly across three independent runs. Same code, same config; only the storage architecture underneath varies. The how, the why, and the one caveat are below.

Download the PDF report View the repo

ABS lead

14–17%

faster task completion time vs AWS io2

The workload

5M-vector database

Qdrant · Redis · Postgres

Vector search · cache · checkpoints · cold reads · 6 storage ops per task

Reproducibility

3× cold runs

Concurrency

1,000 agents × 100 tasks

100,000 tasks total

Key findings

Summary · Faster task completion end-to-end. High throughput. High IOPS.

Three independent runs of the LangChain multi-tenant Qdrant workload across five storage platforms, scaled from 100 to 1,000 concurrent agents (1k to 100k tasks). ABS finishes first on every run. The four numbers that matter:

Task completion

14–17% faster task completion

Finishes 14–17% faster than io2 at every scale (1K, 10K, 100K tasks). Reproduces across all 3 runs.

Throughput

+19% more throughput

172 app IOPS vs io2’s 145 at 100K tasks. More work through the pipeline, faster finish.

Raw disk IOPS

7.8× faster raw disk

313K IOPS vs io2-64k’s 40K (instance-capped). The hardware ceiling underneath.

Cost

31× cheaper than io2

$118/mo vs $3,710/mo (io2-64k). Flat per-GB pricing, no IOPS provisioning tax.

Background · why LangChain, why storage

Background · The agent layer where the disk actually matters.

LangChain is the most popular open-source framework for building LLM-powered agents (138K GitHub stars). It runs the agent as a loop: the LLM reasons about what to do, acts by calling a tool, observes the result, and repeats until the task is done. A single query can run that cycle 3 to 10 times.

Every Act calls a tool. Some are external (a web search, a Slack message) and resolve over the network. But the tools that retrieve context, cache results, and checkpoint state (Qdrant, Redis, and Postgres) run on your VM, and every vector search, cache lookup, and checkpoint write lands on the disk attached to your machine.

The LLM is one API call. The external tools are network requests. The disk is the only layer where cloud storage performance directly changes how fast the loop finishes. That’s why we scoped this benchmark to the storage layer, and chose LangChain as the workload: it’s the framework most teams run in production, generating real disk I/O across real services at real concurrency.

The ReAct loop · repeats until done

Bundled first: LLM + Tools + Prompt

Reason

LLM reads the query + history, decides the next action. One API call.

Act

Calls a tool: Qdrant search, Redis cache, Postgres checkpoint. This is the step that hits your disk.

Observe

Agent reads the tool’s result, then loops back to Reason.
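As a rough illustration of the Reason, Act, Observe cycle described above, here is a minimal sketch in plain Python. It does not use the LangChain API: `run_react_loop`, `scripted_reasoner`, and the stub tools are hypothetical stand-ins, with the reasoner replacing the LLM call and the lambdas replacing the disk-backed Qdrant, Redis, and Postgres tools.

```python
from typing import Callable


def run_react_loop(
    reason: Callable[[str, list[str]], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    query: str,
    max_cycles: int = 10,
) -> list[str]:
    """Run Reason -> Act -> Observe until the reasoner returns 'finish'."""
    history: list[str] = []
    for _ in range(max_cycles):
        # Reason: decide the next action from the query and the history so far.
        action, argument = reason(query, history)
        if action == "finish":
            break
        # Act: call the chosen tool (in the benchmark, the step that hits disk).
        result = tools[action](argument)
        # Observe: record the result so the next Reason step can see it.
        history.append(f"{action}({argument}) -> {result}")
    return history


# Hypothetical stand-in for the LLM: walks a fixed three-step plan, then stops.
def scripted_reasoner(query: str, history: list[str]) -> tuple[str, str]:
    plan = ["qdrant_search", "redis_cache", "postgres_checkpoint"]
    if len(history) < len(plan):
        return plan[len(history)], query
    return "finish", ""


# Hypothetical stand-ins for the three local services.
stub_tools = {
    "qdrant_search": lambda q: "3 hits",
    "redis_cache": lambda q: "miss",
    "postgres_checkpoint": lambda q: "saved",
}

trace = run_react_loop(scripted_reasoner, stub_tools, "refund policy")
for step in trace:
    print(step)
```

In a real agent the Reason step is an LLM API call and only the Act step touches local storage, which is why the benchmark isolates that step.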

Objectives · what we set out to prove

The questions · Three things we set out to answer.

Not the questions a vendor benchmark answers, but the ones a team actually asks before moving agents off AWS, including the one most benchmarks avoid.

Objective 1

How much faster is Nirvana ABS vs AWS EBS (io2 & gp3)?

On real agent tasks (RAG, vector search, caching, checkpointing), not synthetic reads.

Objective 2

Do the FIO numbers survive a production workload?

ABS owns the 4K-random-read benchmark every vendor quotes. Does that win finish tasks first?

Objective 3

Where does the lead hold under load?

Stress-tested across three scales, 1k → 100k tasks.

What we're testing · same agents, same code, five platforms

The setup · Five platforms. Identical workload. Different block storage.

We deployed identical LangChain agent workloads across five infrastructure configurations, four on AWS and one on Nirvana. Each VM runs Qdrant (vector DB), Redis (cache), and Postgres (checkpoints) locally via Docker. Same code, same data, same task sequence; the only variable is the infrastructure underneath.

Config	Instance	vCPU / RAM	Storage	Provisioned IOPS	Cost / mo · 256 GB
Nirvana ABS	n1-standard-4	4 / 16 GB · DDR5	ABS 256 GB	20,000 (600k burst · included)	$23.94
gp3-3k	m6i.xlarge	4 / 16 GB · DDR4	gp3 256 GB	3,000	$20.48
gp3-16k	m6i.xlarge	4 / 16 GB · DDR4	gp3 256 GB	16,000	$85.48
io2-32k	m6i.xlarge	4 / 16 GB · DDR4	io2 256 GB	32,000	$2,112
io2-64k	m6i.xlarge	4 / 16 GB · DDR4	io2 256 GB	64,000	$3,584

Test 1 · Raw disk (fio)

1 FIO · ABS is 7.8× faster than AWS io2 on raw disk.

ABS pushes 313,074 IOPS, 7.8× the best io2 can deliver on this instance class. Latency at the same load: 817 μs vs 6,345 μs.

This is the ceiling. The question is whether the application actually pulls against it, and as the next layer shows, HNSW does not.

Raw fio · 4K random read · QD=256 · IOPS (log-ish scale, ABS = 100%)

Nirvana ABS: 313,074 (1.00×)
io2-64k (instance-capped): 40,339 (0.13×)
io2-32k: 33,070 (0.11×)
gp3-16k: 16,530 (0.05×)
gp3-3k: 3,097 (0.01×)

Platform	Provisioned IOPS	Measured IOPS	Latency	ABS IOPS lead
Nirvana ABS	20,000 (600k burst · included)	313,074 (261K–313K across runs)	817 μs	baseline
io2-64k	64,000	40,339 (capped at instance limit)	6,345 μs	7.8×
io2-32k	32,000	33,070	7,740 μs	9.5×
gp3-16k	16,000	16,530	15,485 μs	19×
gp3-3k	3,000	3,097	82,632 μs	101×

m6i.xlarge instance limit caps io2-64k at ~40K IOPS regardless of provisioned ceiling. ABS has no fixed provisioned cap, so its fio result varies run-to-run (261K–313K IOPS); 313,074 is the peak measured, and even the low end is 6.5× io2-64k.
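The latency column can be sanity-checked against the IOPS column with Little's law: at a fixed queue depth, mean latency is roughly queue depth divided by throughput. The sketch below assumes the reported latencies are mean completion latencies at QD=256; `mean_latency_us` is an illustrative helper, not part of the benchmark repo.

```python
QUEUE_DEPTH = 256  # fio queue depth from the chart title (QD=256)

# Measured IOPS and reported latency (microseconds) from the fio table.
fio_results = {
    "Nirvana ABS": (313_074, 817),
    "io2-64k": (40_339, 6_345),
    "io2-32k": (33_070, 7_740),
    "gp3-16k": (16_530, 15_485),
    "gp3-3k": (3_097, 82_632),
}


def mean_latency_us(iops: float, queue_depth: int = QUEUE_DEPTH) -> float:
    """Little's law: in-flight requests = throughput x latency."""
    return queue_depth / iops * 1_000_000


for platform, (iops, reported_us) in fio_results.items():
    predicted = mean_latency_us(iops)
    print(f"{platform}: predicted {predicted:,.0f} us, reported {reported_us:,} us")
```

Under that assumption, each predicted value lands within a fraction of a percent of the reported one, so the latency gap in this test follows directly from the throughput gap at a fixed queue depth.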

Methodology · test setup

2 3× Cold Reads · Methodology

Two pre-conditions have to be true before any result means something in production. Both are workload setup applied identically to all five platforms, not platform tuning.

Pre-condition 1

From 6K vectors to 5M. From RAM to disk.

A 6K-vector check ran fast everywhere because the index fit in RAM. So we pushed it to 5M vectors (768-dim, ~15 GB) across 50 collections of 100K, the production multi-tenant pattern. At that size the working set spills past the page cache, so every query hits disk.

Cold reads only: Qdrant restarted and OS page cache dropped before each run.
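The ~15 GB figure follows from simple arithmetic if the vectors are stored as 32-bit floats, which is an assumption here (the source gives only the dimension and count). The sketch also shows the size of the same payload at one byte per dimension, as INT8 quantization would store it, ignoring index and metadata overhead.

```python
NUM_COLLECTIONS = 50
VECTORS_PER_COLLECTION = 100_000
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32
BYTES_PER_INT8 = 1

total_vectors = NUM_COLLECTIONS * VECTORS_PER_COLLECTION
raw_bytes = total_vectors * DIMENSIONS * BYTES_PER_FLOAT32
int8_bytes = total_vectors * DIMENSIONS * BYTES_PER_INT8

print(f"vectors: {total_vectors:,}")
print(f"float32 payload: {raw_bytes / 1e9:.2f} GB")
print(f"int8 payload:    {int8_bytes / 1e9:.2f} GB")
```

On a 16 GB VM that also runs Redis, Postgres, and the agents, a float32 payload of this size cannot stay resident in the page cache, which is the point of the pre-condition.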

Pre-condition 2

`inline_storage` + INT8 scalar quantization.

Every collection runs hnsw_config.inline_storage=true + INT8 scalar quantization (a Qdrant 1.16 feature). Default HNSW lands a Qdrant p99 of 24–43 seconds on every tier, which is unshippable.

The tuned configuration compresses the p99 tail 7.6–12.8× on every platform, a bigger effect than any inter-platform gap.
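For orientation, this is roughly what a per-tenant collection definition with those settings could look like as a Qdrant collection-creation request body. Treat it as a sketch under assumptions: only `hnsw_config.inline_storage`, `on_disk`, INT8 scalar quantization, and the 768 dimensions come from this report; the `Cosine` distance, the `tenant_NN` naming, and the exact body layout are assumptions to verify against the Qdrant documentation for your version and against the repo.

```python
import json


def collection_payload(dim: int = 768) -> dict:
    """Build a collection-creation body with the tuning described above."""
    return {
        # Distance metric is an assumption; the report does not state it.
        "vectors": {"size": dim, "distance": "Cosine", "on_disk": True},
        # Inline storage in the HNSW graph (named in the report).
        "hnsw_config": {"inline_storage": True},
        # INT8 scalar quantization (named in the report).
        "quantization_config": {"scalar": {"type": "int8"}},
    }


# Hypothetical naming scheme for the 50 per-tenant collections.
tenant_names = [f"tenant_{i:02d}" for i in range(50)]
payloads = {name: collection_payload() for name in tenant_names}

print(json.dumps(payloads["tenant_00"], indent=2))
```

Each body would be sent once per tenant when the collections are created, before the 5M vectors are loaded.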

Tiered multi-tenancy at scale (Qdrant 1.16) · the production workload

2.1 ABS finishes first on end-to-end task completion.

We split the 5M vectors into 50 per-tenant collections and ran the cold-read workload at three scales, three times each (R1/R2/R3); R3 cranks the 1,000-agent run to 100k tasks. ABS wins task completion and task p99 at every scale, and the lead widens with sustained load, reproducibly across all three runs.

100 agents × 10 tasks · run R3 · 1,000 tasks total

Platform	Task completion	App IOPS	Task p50 (ms)	Task p95 (ms)	Task p99 (ms)
Nirvana ABS	36 s	169.0	343	446	564
io2-32k	43 s	141.0	419	542	600
io2-64k	42 s	143.0	413	532	595
gp3-16k	43 s	140.0	420	537	696
gp3-3k	62 s	97.0	606	1,013	2,360

Same cold-read test, run three times. ABS leads task completion, throughput, and task p99 in every run. Task latencies in ms. p50/p95 were captured at the 100×10 and 100k scales; 500×20 and 1000×10 report task p99 only. R3 is the canonical result; per-service Qdrant / Redis / Postgres latencies are below.

2.2 Holds across three scenarios, 1,000 to 100,000 tasks.

Same five VMs, three workload sizes: 1k, 10k, and 100k total tasks. ABS finishes first on every one. The lead widens with sustained load.

Scale	Metric	Nirvana ABS (winner)	io2-32k	io2-64k	gp3-16k	gp3-3k	ABS lead
100×10 (1k tasks)	Task completion	36 s	43 s	42 s	43 s	62 s	14% faster
100×10 (1k tasks)	Task p99	564 ms	600 ms	595 ms	696 ms	2,360 ms	5% lower latency
500×20 (10k tasks)	Task completion	351 s	420 s	424 s	430 s	433 s	16% faster
500×20 (10k tasks)	Task p99	501 ms	591 ms	600 ms	609 ms	828 ms	15% lower latency
1000×100 (100k tasks)	Task completion	58 min	69 min	69 min	68 min	71 min	16% faster
1000×100 (100k tasks)	Task p99	725 ms	779 ms	771 ms	760 ms	809 ms	6% lower latency

All R3 runs on fresh terraform deploys. ABS wins both task completion and task p99 at every scale.
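The "faster" percentages can be recomputed from the rounded completion times in the table. This is a sketch of that arithmetic, assuming "X% faster" means the reduction in wall-clock time relative to the io2 baseline; because the table values are rounded, the results can differ by a point from figures computed on raw timings.

```python
# Completion times from the R3 table, in seconds: (ABS, io2-32k, io2-64k).
completion_seconds = {
    "100x10 (1k tasks)": (36, 43, 42),
    "500x20 (10k tasks)": (351, 420, 424),
    "1000x100 (100k tasks)": (58 * 60, 69 * 60, 69 * 60),
}


def percent_faster(abs_time: float, baseline_time: float) -> float:
    """Reduction in wall-clock time relative to the baseline, in percent."""
    return (baseline_time - abs_time) / baseline_time * 100


for scale, (abs_s, io2_32k_s, io2_64k_s) in completion_seconds.items():
    vs_32k = percent_faster(abs_s, io2_32k_s)
    vs_64k = percent_faster(abs_s, io2_64k_s)
    print(f"{scale}: {vs_32k:.1f}% vs io2-32k, {vs_64k:.1f}% vs io2-64k")
```

Across both io2 tiers and all three scales, the computed values span roughly 14% to 17%, which matches the headline range.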

Per-service wins at 1000×100

Once the workload runs long enough to flush short-term caching effects, ABS also takes Redis p99 (68.0 ms vs 70–73 ms on AWS) and Postgres p99 (31.9 ms vs 37–38 ms on AWS). This is the first scenario where Nirvana sweeps the cache and checkpoint paths. AWS io2 keeps the per-query Qdrant p99 (140 ms vs 301 ms for ABS at this scale), but at 100k tasks × 6 ops, the win flips wherever a single op isn’t on the user-facing critical path.

Per-service compound · 1000 × 100 production

2.3 Latency compounds across services. ABS finishes 16% faster.

Each agent task chains 6 ops across 3 services (Qdrant ×2 · Redis ×2 · Postgres ×2). Per-op latency compounds into task time. At the heaviest scale we tested, ABS leads on Redis, Postgres, and end-to-end task p99, even where io2 holds the per-query Qdrant tail.

Service	Per task	ABS p99	io2-32k	io2-64k	gp3-16k	gp3-3k	ABS vs io2-64k
Qdrant (vector search)	2 ops	301 ms	140 ms	140 ms	147 ms	307 ms	+161 ms (115% slower)
Redis (cache reads)	2 ops	68.0 ms (winner)	71.9 ms	70.4 ms	71.8 ms	72.6 ms	−2.4 ms (3% faster)
Postgres (checkpoint writes)	2 ops	31.9 ms (winner)	38.1 ms	37.7 ms	37.0 ms	38.2 ms	−5.8 ms (15% faster)
Task p99 (compound across services)	6 ops	725 ms (winner)	779 ms	771 ms	760 ms	809 ms	−46 ms (6% faster)
Task completion (total wall-clock)	100k tasks	58 min (winner)	69 min	69 min	68 min	71 min	−11 min (16% faster)

Per-service p99 at the production scale (1000 × 100, 100k tasks, 600k ops · run R3). Latency compounds across services, and ABS wins the compound even when io2 holds the per-query Qdrant tail.
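The app-level throughput figures quoted in the summary (172 vs 145 app IOPS at 100k tasks) are consistent with total storage ops divided by wall-clock time. The sketch below assumes that definition of "app IOPS" and uses the rounded minute values from the table.

```python
TASKS = 100_000
OPS_PER_TASK = 6  # 2 Qdrant + 2 Redis + 2 Postgres


def app_iops(tasks: int, wall_clock_seconds: float) -> float:
    """Application-level storage ops completed per second of wall-clock time."""
    return tasks * OPS_PER_TASK / wall_clock_seconds


abs_iops = app_iops(TASKS, 58 * 60)  # ABS: 58 min
io2_iops = app_iops(TASKS, 69 * 60)  # io2: 69 min
throughput_gain_pct = (abs_iops / io2_iops - 1) * 100

print(f"ABS: {abs_iops:.0f} app IOPS")
print(f"io2: {io2_iops:.0f} app IOPS")
print(f"gain: +{throughput_gain_pct:.0f}%")
```

The same 11-minute difference therefore reads as about 16% less wall-clock time or about 19% more throughput, depending on which side is used as the base.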

From FIO to production · what tuning revealed

3 Insights: from FIO to production.

FIO showed ABS at 313K IOPS, 7.8× faster than io2, but a raw-disk number is a ceiling, not a guarantee. Reaching it on a real agent workload, and holding the lead under load, came down to three things.

Insight 1 · Tuning

Tune the set-up to reach the ceiling.

On defaults, ABS spiked to 642 ms Qdrant p99. inline_storage + INT8 quantization on multi-tenant collections settled it at 169–182 ms across fresh deploys. The 642 ms never reproduced.

Insight 2 · Workload

The harder you push, the bigger the lead.

The advantage widens with sustained load: +14% at 1k tasks, +16% at 10k, +16% at 100k, end-to-end vs io2.

Insight 3 · Services

ABS wins despite the Qdrant tail.

ABS loses the per-query Qdrant op but wins the workload: Qdrant is just 2 of 6 ops, its p99 tail is rare (Qdrant p50 is 66 ms vs 51 ms), and ~19% more throughput compounds across 100k tasks, turning a 7% task-p99 lead into a 16% completion lead.

Cost · performance per dollar

4 Cost comparison.

Identical compute (4 vCPU · 16 GB · 256 GB storage) across all five platforms. The only thing changing is the storage tier, and how AWS charges for provisioned IOPS. ABS bundles 20,000 sustained IOPS into a flat per-GB rate; AWS bills io2 per provisioned IOPS per month, on top of the per-GB charge.

Component	ABS (cheapest)	gp3-3k	gp3-16k	io2-32k	io2-64k
Instance	$93	$126	$126	$126	$126
Storage volume (256 GB)	$24	$20	$20	$32	$32
Storage IOPS	included	free	$65	$2,080	$3,552
Total / month	$118	$147	$212	$2,238	$3,710

ABS · $0.00013/GB/hr, 20,000 sustained IOPS baseline included (600,000 burst; measured 261K–313K in fio, no per-IOPS billing on this tier). AWS gp3 · $0.08/GB/mo, first 3K IOPS free, $0.005/IOPS above. AWS io2 · $0.125/GB/mo, tiered IOPS ($0.065 up to 32K, $0.046 for 32K–64K).
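The AWS storage figures can be reproduced from the rates in the note above. This sketch uses only those quoted rates; the function names are illustrative, and actual AWS pricing varies by region and over time.

```python
VOLUME_GB = 256


def gp3_monthly(provisioned_iops: int, gb: int = VOLUME_GB) -> float:
    """gp3: $0.08/GB/mo, first 3,000 IOPS free, $0.005 per IOPS above that."""
    billable_iops = max(0, provisioned_iops - 3_000)
    return round(gb * 0.08 + billable_iops * 0.005, 2)


def io2_monthly(provisioned_iops: int, gb: int = VOLUME_GB) -> float:
    """io2: $0.125/GB/mo, $0.065 per IOPS up to 32K, $0.046 for 32K-64K."""
    first_tier = min(provisioned_iops, 32_000)
    second_tier = max(0, min(provisioned_iops, 64_000) - 32_000)
    return round(gb * 0.125 + first_tier * 0.065 + second_tier * 0.046, 2)


storage_costs = {
    "gp3-3k": gp3_monthly(3_000),
    "gp3-16k": gp3_monthly(16_000),
    "io2-32k": io2_monthly(32_000),
    "io2-64k": io2_monthly(64_000),
}

for config, cost in storage_costs.items():
    print(f"{config}: ${cost:,.2f} / month (storage + IOPS)")
```

Adding the $126 instance charge to the io2 figures gives the $2,238 and $3,710 monthly totals in the table; almost all of the io2 bill is the provisioned-IOPS line.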

ABS vs io2

Faster and 19–31× cheaper.

$118 vs $2,238 (io2-32k) or $3,710 (io2-64k). ABS finishes the workload 14–17% faster at every scale while costing a fraction of the io2 bill. io2 still wins per-query Qdrant p99 under QD=1, so real-time user-facing search may want that path.

The verdict

Faster

14–17%

faster than AWS io2 end-to-end on real agent tasks, not synthetic reads.

Under load

Holds from 1k to 100k tasks, widening as you scale. ABS loves heavy.

Price

$118/mo

31× cheaper than $3,710 (io2-64k).

Methodology & reproducibility

Open source. Run it yourself.

All five VMs run identical instance size and workload code. Storage is the only variable. Every step from terraform apply to results JSON is in the repo.

Compute (identical across platforms)

AWS instance: m6i.xlarge
Nirvana instance: n1-standard-4
vCPU · RAM: 4 · 16 GB
Disk size: 256 GB (all)

Benchmark parameters

Pre-loaded vectors: 5,000,000
Vector dimensions: 768
Agents × tasks: 100 × 10 · 500 × 20 · 1000 × 100
Ops per task: 6 (2 Qdrant · 2 Redis · 2 PG)

Cold-read protocol

Every run starts on a fresh terraform-deployed VM. Pre-load 5M vectors into Qdrant (on_disk=True, inline_storage + INT8 quantization, the tuning that makes it shippable), then restart the Qdrant container and run `sync && echo 3 > /proc/sys/vm/drop_caches` to drop the OS page cache before the benchmark. Cold reads only, no warm-cache artifacts.
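The cold-read reset described above could be scripted along these lines. This is a sketch, not the repo's script: the container name `qdrant` and the helper names are hypothetical, the real run requires root on the benchmark VM, and the default is a dry run that only prints what would be executed.

```python
import subprocess


def cold_read_commands(container: str) -> list[list[str]]:
    """Commands that restart Qdrant and drop the OS page cache, in order."""
    return [
        ["docker", "restart", container],
        # Flush dirty pages, then drop page cache, dentries, and inodes.
        ["sh", "-c", "sync && echo 3 > /proc/sys/vm/drop_caches"],
    ]


def prepare_cold_run(container: str, dry_run: bool = True) -> list[list[str]]:
    """Print the reset commands, or execute them when dry_run is False."""
    commands = cold_read_commands(container)
    for argv in commands:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands


# Dry run only; "qdrant" is a hypothetical container name.
planned = prepare_cold_run("qdrant")
```

Running the reset before every measurement is what keeps a warm page cache from flattering any one platform.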

In closing

Task completion matters. Your faster LangChain agents start here.

End-to-end task completion that leads across p50, p95, and p99, with higher throughput and lower latency where it compounds. Across every scale we tested (1k → 10k → 100k tasks), ABS finishes the workload 14–17% faster than io2, and the lead widens the harder you push it.


Nirvana Labs · LangChain benchmark · 5M vectors · final results · May 2026
