Your agents. Faster.
Nirvana ABS finishes the same LangChain agent workload 14–17% faster than AWS io2 at every scale we tested (1k, 10k, and 100k tasks), reproducibly across three independent runs. Same code, same config; the only thing that varies is the storage architecture underneath. The how, the why, and the one caveat are below.
Faster task completion time vs AWS io2
Vector search · cache · checkpoints · cold reads · 6 storage ops per task
100,000 tasks total
Summary · Faster task completion end-to-end. High throughput. High IOPS.
Three independent runs of the LangChain multi-tenant Qdrant workload across five storage platforms, scaled from 100 to 1,000 concurrent agents (1k to 100k tasks). ABS finishes first on every run. The four numbers that matter:
Finishes 14–17% faster than io2 at every scale (1K, 10K, 100K tasks). Reproduces across all 3 runs.
172 app IOPS vs io2’s 145 at 100K tasks. More work through the pipeline, faster finish.
313K IOPS vs io2-64k’s 40K (instance-capped). The hardware ceiling underneath.
$118/mo vs $3,710/mo (io2-64k). Flat per-GB pricing, no IOPS provisioning tax.
Background · The agent layer where the disk actually matters.
LangChain is the most popular open-source framework for building LLM-powered agents (138K GitHub stars). It runs the agent as a loop: the LLM reasons about what to do, acts by calling a tool, observes the result, and repeats until the task is done. A single query can run that cycle 3 to 10 times.
Every Act calls a tool. Some are external (a web search, a Slack message) and resolve over the network. But the tools that retrieve context, cache results, and checkpoint state (Qdrant, Redis, and Postgres) run on your VM, so every vector search, every cache lookup, and every checkpoint write lands on the disk attached to your machine.
The LLM is one API call. The external tools are network requests. The disk is the only layer where cloud storage performance directly changes how fast the loop finishes. That’s why we scoped this benchmark to the storage layer, and chose LangChain as the workload: it’s the framework most teams run in production, generating real disk I/O across real services at real concurrency.
The questions · Three things we set out to answer.
These are not the questions a vendor benchmark usually answers. They are the ones a team actually asks before moving agents off AWS, including the one most benchmarks avoid.
How much faster is Nirvana ABS vs AWS EBS (io2 & gp3)?
On real agent tasks (RAG, vector search, caching, checkpointing), not synthetic reads.
Do the FIO numbers survive a production workload?
ABS owns the 4K-random-read benchmark every vendor quotes. Does that win finish tasks first?
Where does the lead hold under load?
Stress-tested across three scales, 1k → 100k tasks.
The setup · Five platforms. Identical workload. Different block storage.
We deployed identical LangChain agent workloads across five infrastructure configurations, four on AWS and one on Nirvana. Each VM runs Qdrant (vector DB), Redis (cache), and Postgres (checkpoints) locally via Docker. Same code, same data, same task sequence: the only variable is the infrastructure underneath.
| Config | Instance | vCPU / RAM | Storage | Provisioned IOPS | Cost / mo · 256 GB |
|---|---|---|---|---|---|
| Nirvana ABS | n1-standard-4 | 4 / 16 GB · DDR5 | ABS 256 GB | 20,000 (600k burst, included) | $23.94 |
| gp3-3k | m6i.xlarge | 4 / 16 GB · DDR4 | gp3 256 GB | 3,000 | $20.48 |
| gp3-16k | m6i.xlarge | 4 / 16 GB · DDR4 | gp3 256 GB | 16,000 | $85.48 |
| io2-32k | m6i.xlarge | 4 / 16 GB · DDR4 | io2 256 GB | 32,000 | $2,112 |
| io2-64k | m6i.xlarge | 4 / 16 GB · DDR4 | io2 256 GB | 64,000 | $3,584 |
1 FIO · ABS is 7.8× faster than AWS io2 on raw disk.
ABS pushes 313,074 IOPS, 7.8× the best io2 can deliver on this instance class. Latency at the same load: 817 μs vs 6,345 μs.
This is the ceiling. The question is whether the application actually pulls against it, and as the next layer shows, HNSW does not.
| Platform | Provisioned IOPS | Measured IOPS | Latency | ABS IOPS lead |
|---|---|---|---|---|
| Nirvana ABS | 20,000 (600k burst, included) | 313,074 (261K–313K across runs) | 817 μs | baseline |
| io2-64k | 64,000 | 40,339 (capped at instance limit) | 6,345 μs | 7.8× |
| io2-32k | 32,000 | 33,070 | 7,740 μs | 9.5× |
| gp3-16k | 16,000 | 16,530 | 15,485 μs | 19× |
| gp3-3k | 3,000 | 3,097 | 82,632 μs | 101× |
m6i.xlarge instance limit caps io2-64k at ~40K IOPS regardless of provisioned ceiling. ABS has no fixed provisioned cap, so its fio result varies run-to-run (261K–313K IOPS); 313,074 is the peak measured, and even the low end is 6.5× io2-64k.
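The lead column follows directly from the measured IOPS. A short Python sketch that reproduces it, using only the figures in the table and note above (the dictionary layout and function name are illustrative):

```python
# Measured fio IOPS, copied from the table above.
measured_iops = {
    "Nirvana ABS": 313_074,
    "io2-64k": 40_339,
    "io2-32k": 33_070,
    "gp3-16k": 16_530,
    "gp3-3k": 3_097,
}
ABS_LOW_END = 261_000  # lowest ABS fio result across runs, per the note above


def iops_lead(abs_iops: float, other_iops: float) -> float:
    """Return how many times more IOPS ABS delivered than another platform."""
    return abs_iops / other_iops


# Lead of the peak ABS result over each AWS configuration.
leads = {
    name: iops_lead(measured_iops["Nirvana ABS"], iops)
    for name, iops in measured_iops.items()
    if name != "Nirvana ABS"
}
# Lead of the weakest ABS run over the best AWS configuration.
low_end_lead = iops_lead(ABS_LOW_END, measured_iops["io2-64k"])

for name, lead in leads.items():
    print(f"{name}: {lead:.1f}x")
print(f"ABS low end vs io2-64k: {low_end_lead:.1f}x")
```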
2 3× Cold Reads · Methodology
Two pre-conditions have to be true before any result means something in production. Both are workload setup applied identically to all five platforms, not platform tuning.
From 6K vectors to 5M. From RAM to disk.
A 6K-vector check ran fast everywhere because the index fit in RAM. So we pushed it to 5M vectors (768-dim, ~15 GB) across 50 collections of 100K each, the production multi-tenant pattern. At that size the working set spills past the page cache and every query hits disk.
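The ~15 GB figure can be sanity-checked with simple arithmetic. This sketch assumes each vector component is stored as a 4-byte float32 before quantization, which the text does not state, and it ignores HNSW graph and payload overhead:

```python
NUM_COLLECTIONS = 50
VECTORS_PER_COLLECTION = 100_000
DIMENSIONS = 768
BYTES_PER_COMPONENT = 4  # assumption: float32 components
VM_RAM_GB = 16           # from the setup table

total_vectors = NUM_COLLECTIONS * VECTORS_PER_COLLECTION
raw_bytes = total_vectors * DIMENSIONS * BYTES_PER_COMPONENT
raw_gb = raw_bytes / 1e9  # decimal gigabytes

print(f"{total_vectors:,} vectors, about {raw_gb:.2f} GB of raw vector data")
print(f"Share of the VM's {VM_RAM_GB} GB RAM: {raw_gb / VM_RAM_GB:.0%}")
```

The raw vectors alone come close to the VM's total RAM, and that RAM is shared with Redis, Postgres, and the OS, so the page cache cannot hold the whole working set.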
Cold reads only: Qdrant restarted and OS page cache dropped before each run.
inline_storage + INT8 scalar quantization.
Every collection runs hnsw_config.inline_storage=true + INT8 scalar quantization (a Qdrant 1.16 feature). With default HNSW settings, Qdrant p99 lands at 24–43 seconds on every tier, which is unshippable.
The tuning compresses the p99 tail 7.6–12.8× on every platform, a larger effect than any inter-platform gap.
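As a rough illustration, the tuning could be written as a collection-creation request body. The sketch below only builds the JSON and sends nothing. The collection name, the Cosine distance, and the helper name are assumptions; the field names follow our reading of Qdrant's REST API, so check them against the documentation for your Qdrant version.

```python
import json


def collection_config(dimensions: int = 768) -> dict:
    """Build a request body for one tenant collection (illustrative helper)."""
    return {
        "vectors": {
            "size": dimensions,
            "distance": "Cosine",  # assumption: the metric is not stated in the text
            "on_disk": True,       # keep the original vectors on disk
        },
        # Store vector data alongside the HNSW graph (Qdrant 1.16).
        "hnsw_config": {"inline_storage": True},
        # Compress vector components to 8-bit integers.
        "quantization_config": {"scalar": {"type": "int8"}},
    }


body = collection_config()
# Hypothetical usage: send this as the body of PUT /collections/tenant_000.
print(json.dumps(body, indent=2))
```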
2.1 ABS finishes first on end-to-end task completion.
We split the 5M vectors into 50 per-tenant collections and ran the cold-read workload at three scales, three times each (R1/R2/R3); R3 raises the 1,000-agent run to 100k tasks. ABS wins task completion and task p99 at every scale, and the lead widens with sustained load, reproducibly across all three runs.
100 agents × 10 tasks · run R3 · 1,000 tasks total
| Platform | Task completion | App IOPS | Task p50 (ms) | Task p95 (ms) | Task p99 (ms) |
|---|---|---|---|---|---|
| Nirvana ABS | 36 s | 169.0 | 343 | 446 | 564 |
| io2-32k | 43 s | 141.0 | 419 | 542 | 600 |
| io2-64k | 42 s | 143.0 | 413 | 532 | 595 |
| gp3-16k | 43 s | 140.0 | 420 | 537 | 696 |
| gp3-3k | 62 s | 97.0 | 606 | 1,013 | 2,360 |
Same cold-read test, run three times. ABS leads task completion, throughput, and task p99 in every run. Task latencies in ms. p50/p95 were captured at the 100×10 and 100k scales; 500×20 and 1000×10 report task p99 only. R3 is the canonical result; per-service Qdrant / Redis / Postgres latencies are below.
2.2 Holds across three scenarios, 1,000 to 100,000 tasks.
Same five VMs, three workload sizes: 1k, 10k, and 100k total tasks. ABS finishes first on every one. The lead widens with sustained load.
| Scale | Metric | Nirvana ABS (winner) | io2-32k | io2-64k | gp3-16k | gp3-3k | ABS lead |
|---|---|---|---|---|---|---|---|
| 100×10 (1k tasks) | Task completion | 36 s | 43 s | 42 s | 43 s | 62 s | 14% faster |
| 100×10 (1k tasks) | Task p99 | 564 ms | 600 ms | 595 ms | 696 ms | 2,360 ms | 5% lower latency |
| 500×20 (10k tasks) | Task completion | 351 s | 420 s | 424 s | 430 s | 433 s | 16% faster |
| 500×20 (10k tasks) | Task p99 | 501 ms | 591 ms | 600 ms | 609 ms | 828 ms | 15% lower latency |
| 1000×100 (100k tasks) | Task completion | 58 min | 69 min | 69 min | 68 min | 71 min | 15% faster |
| 1000×100 (100k tasks) | Task p99 | 725 ms | 779 ms | 771 ms | 760 ms | 809 ms | 5% lower latency |
All R3 runs on fresh terraform deploys. ABS wins both task completion and task p99 at every scale.
Once the workload runs long enough to flush short-term caching effects, ABS also takes Redis p99 (68.0 ms vs 70–73 ms on AWS) and Postgres p99 (31.9 ms vs 37–38 ms on AWS). This is the first scenario where Nirvana sweeps the cache and checkpoint paths. AWS io2 keeps the per-query Qdrant p99 (140 ms vs 301 ms for ABS at this scale), but at 100k tasks × 6 ops, the win flips wherever a single op isn’t on the user-facing critical path.
2.3 Latency compounds across services. ABS finishes 16% faster.
Each agent task chains 6 ops across 3 services (Qdrant ×2 · Redis ×2 · Postgres ×2). Per-op latency compounds into task time. At the heaviest scale we tested, ABS leads on Redis, Postgres, and end-to-end task p99, even where io2 holds the per-query Qdrant tail.
| Service | Per task | ABS p99 | io2-32k | io2-64k | gp3-16k | gp3-3k | ABS vs io2-64k |
|---|---|---|---|---|---|---|---|
| Qdrant (vector search) | 2 ops | 301 ms | 140 ms | 140 ms | 147 ms | 307 ms | +161 ms (115% slower) |
| Redis (cache reads) | 2 ops | 68.0 ms | 71.9 ms | 70.4 ms | 71.8 ms | 72.6 ms | −2.4 ms (3% faster) |
| Postgres (checkpoint writes) | 2 ops | 31.9 ms | 38.1 ms | 37.7 ms | 37.0 ms | 38.2 ms | −5.8 ms (15% faster) |
| Task p99 (compound across services) | 6 ops | 725 ms | 779 ms | 771 ms | 760 ms | 809 ms | −46 ms (6% faster) |
| Task completion (total wall-clock) | 100k tasks | 58 min | 69 min | 69 min | 68 min | 71 min | −11 min (16% faster) |
Per-service p99 at the production scale (1000 × 100, 100k tasks, 600k ops · run R3). Latency compounds across services, and ABS wins the compound even when io2 holds the per-query Qdrant tail.
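To make the compounding concrete, here is a minimal sketch of how per-op latencies add up to a task latency and how a percentile is taken over many tasks. It assumes the six ops run sequentially, which the text does not state. The latencies are synthetic values chosen for illustration, not benchmark results, and the helper names are hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def task_latency_ms(qdrant_ms: float, redis_ms: float, postgres_ms: float) -> float:
    """One task = 2 Qdrant + 2 Redis + 2 Postgres ops, assumed to run back to back."""
    return 2 * qdrant_ms + 2 * redis_ms + 2 * postgres_ms


# Synthetic workload: 98 typical tasks and 2 tasks that hit a slow Qdrant read.
typical = task_latency_ms(qdrant_ms=50, redis_ms=10, postgres_ms=5)
slow = task_latency_ms(qdrant_ms=300, redis_ms=10, postgres_ms=5)
tasks = [typical] * 98 + [slow] * 2

print("p50:", percentile(tasks, 50), "ms")
print("p95:", percentile(tasks, 95), "ms")
print("p99:", percentile(tasks, 99), "ms")
```

In this toy data the slow Qdrant reads only show up at p99, while p50 and p95 stay at the typical task latency. A rare per-op tail moves the task p99 without moving the bulk of the tasks.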
3 Insights: from FIO to production.
FIO showed ABS at 313K IOPS, 7.8× faster than io2, but a raw-disk number is a ceiling, not a guarantee. Reaching it on a real agent workload, and holding the lead under load, came down to three things.
Tune the set-up to reach the ceiling.
On defaults, ABS spiked to 642 ms Qdrant p99. inline_storage + INT8 quantization on multi-tenant collections settled it at 169–182 ms across fresh deploys. The 642 ms never reproduced.
The harder you push, the bigger the lead.
The advantage widens with sustained load: +14% at 1k tasks, +16% at 10k, +16% at 100k, end-to-end vs io2.
ABS wins despite the Qdrant tail.
ABS loses the per-query Qdrant op but wins the workload: Qdrant is just 2 of 6 ops, its p99 tail is rare (p50 66 vs 51 ms), and ~19% more throughput compounds across 100k tasks, turning a 7% task-p99 lead into a 16% completion lead.
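The link between throughput and wall-clock time is plain arithmetic. The sketch below assumes App IOPS counts completed application operations per second. The text does not define it that way, but the published figures are consistent with it:

```python
OPS_PER_TASK = 6  # 2 Qdrant + 2 Redis + 2 Postgres


def completion_seconds(tasks: int, app_iops: float) -> float:
    """Wall-clock time to finish all tasks at a steady application op rate."""
    return tasks * OPS_PER_TASK / app_iops


# App IOPS at 100k tasks, from the summary: ABS 172, io2 145.
abs_minutes = completion_seconds(100_000, 172) / 60
io2_minutes = completion_seconds(100_000, 145) / 60

throughput_gain = 172 / 145 - 1           # extra ops per second on ABS
time_saved = 1 - abs_minutes / io2_minutes  # shorter wall-clock on ABS

print(f"ABS: {abs_minutes:.0f} min, io2: {io2_minutes:.0f} min")
print(f"Throughput gain: {throughput_gain:.0%}, completion lead: {time_saved:.0%}")
```

Under that assumption, the 58 min and 69 min completion times and the 16% lead fall out of the two App IOPS figures alone.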
4 Cost comparison.
Identical compute (4 vCPU · 16 GB · 256 GB storage) across all five platforms. The only thing changing is the storage tier, and how AWS charges for provisioned IOPS. ABS bundles 20,000 sustained IOPS into a flat per-GB rate; AWS bills io2 per provisioned IOPS per month.
| Component | ABS (cheapest) | gp3-3k | gp3-16k | io2-32k | io2-64k |
|---|---|---|---|---|---|
| Instance | $93 | $126 | $126 | $126 | $126 |
| Storage volume (256 GB) | $24 | $20 | $20 | $32 | $32 |
| Storage IOPS | included | free | $65 | $2,080 | $3,552 |
| Total / month | $118 | $147 | $212 | $2,238 | $3,710 |
ABS · $0.00013/GB/hr, 20,000 sustained IOPS baseline included (600,000 burst; measured 261K–313K in fio, no per-IOPS billing on this tier). AWS gp3 · $0.08/GB/mo, first 3K IOPS free, $0.005/IOPS above. AWS io2 · $0.125/GB/mo, tiered IOPS ($0.065 up to 32K, $0.046 for 32K–64K).
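The storage-only figures can be recomputed from these rates. A minimal sketch: the function names are illustrative, and the 720-hour month used for ABS is an assumption, since the hour count is not stated.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month


def abs_storage_monthly(size_gb: float) -> float:
    """ABS: flat $0.00013 per GB-hour, IOPS included."""
    return size_gb * 0.00013 * HOURS_PER_MONTH


def gp3_storage_monthly(size_gb: float, iops: int) -> float:
    """gp3: $0.08 per GB-month, first 3,000 IOPS free, $0.005 per IOPS above."""
    return size_gb * 0.08 + max(0, iops - 3_000) * 0.005


def io2_storage_monthly(size_gb: float, iops: int) -> float:
    """io2: $0.125 per GB-month plus tiered IOPS (rates quoted here up to 64K only)."""
    if iops > 64_000:
        raise ValueError("rates above 64K IOPS are not given in the text")
    tier1 = min(iops, 32_000) * 0.065     # first 32K IOPS
    tier2 = max(0, iops - 32_000) * 0.046  # 32K to 64K IOPS
    return size_gb * 0.125 + tier1 + tier2


costs = {
    "ABS": abs_storage_monthly(256),
    "gp3-3k": gp3_storage_monthly(256, 3_000),
    "gp3-16k": gp3_storage_monthly(256, 16_000),
    "io2-32k": io2_storage_monthly(256, 32_000),
    "io2-64k": io2_storage_monthly(256, 64_000),
}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}/mo")
```

The AWS results match the Cost / mo column in the setup table, and the ABS result lands within a few cents of the listed $23.94. Add the instance price to get the monthly totals.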
Faster and 19–31× cheaper.
$118 vs $2,238 (io2-32k) or $3,710 (io2-64k). ABS finishes the workload 14–17% faster at every scale while costing a fraction of the io2 bill. io2 still wins per-query Qdrant p99 under QD=1, so real-time user-facing search may want that path.
Faster than AWS io2 end-to-end on real agent tasks, not synthetic reads.
Holds from 1k to 100k tasks, widening as you scale. ABS loves heavy.
31× cheaper than $3,710 (io2-64k).
Open source. Run it yourself.
All five VMs run identical instance size and workload code. Storage is the only variable. Every step from terraform apply to results JSON is in the repo.
Compute (identical across platforms)
- AWS instance: m6i.xlarge
- Nirvana instance: n1-standard-4
- vCPU · RAM: 4 · 16 GB
- Disk size: 256 GB (all)
Benchmark parameters
- Pre-loaded vectors: 5,000,000
- Vector dimensions: 768
- Agents × tasks: 100 × 10 · 500 × 20 · 1000 × 100
- Ops per task: 6 (2 Qdrant · 2 Redis · 2 PG)
Every run starts on a fresh terraform-deployed VM. We pre-load 5M vectors into Qdrant (on_disk=True, inline_storage + INT8 quantization, the tuning that makes it shippable), then restart the Qdrant container and run sync && echo 3 > /proc/sys/vm/drop_caches to drop the OS page cache before the benchmark. Cold reads only, no warm-cache artifacts.
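The cold-read reset can be scripted so it runs identically before every benchmark. The sketch below is a hypothetical helper, not the repo's script: the container name "qdrant" is an assumption, and it defaults to a dry run that only prints the commands. A real run needs root, because writing to /proc/sys/vm/drop_caches is privileged.

```python
import shlex
import subprocess


def cold_reset_commands(container: str = "qdrant") -> list[list[str]]:
    """Commands that restart Qdrant and drop the OS page cache (container name assumed)."""
    return [
        ["docker", "restart", container],
        # Flush dirty pages, then drop page cache, dentries, and inodes.
        ["sh", "-c", "sync && echo 3 > /proc/sys/vm/drop_caches"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it and stop on the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


commands = cold_reset_commands()
run(commands)  # dry run: prints the two commands without executing them
```

Passing dry_run=False executes the same two steps in order and raises if either fails, so a benchmark never starts against a warm cache by accident.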
Task completion matters. Your faster LangChain agents start here.
End-to-end task completion that leads across p50, p95, and p99, with higher throughput and lower latency where it compounds. Across every scale we tested (1k → 10k → 100k tasks), ABS finishes the workload 14–17% faster than io2, and the lead widens the harder you push it.