How ScyllaDB performs on the new I8g and I8ge instances, across different workload types
Let’s start with the bottom line. For ScyllaDB, the new Graviton4-based i8g instances deliver up to 2x the throughput of i4i with better latency – and i8ge delivers up to 2x the throughput of i3en, also with better latency. Benchmarks also show single-digit millisecond P99 latency during maintenance operations like scaling. Fast and smooth scaling is an important part of the new ScyllaDB X Cloud offering.
The chart below shows ScyllaDB’s maximum throughput under a 10 ms P99 latency SLA for different workloads, comparing the previous-generation i4i and i3en with the new i8g and i8ge.
AWS recently launched the I8g and I8ge storage-optimized EC2 instances, powered by AWS Graviton4 CPUs and 3rd-generation AWS Nitro SSDs. They’re designed for I/O-intensive workloads like real-time databases, search, and analytics (so a nice fit for ScyllaDB).
| Instance family | Use case | vCPUs per instance | Storage |
|---|---|---|---|
| i8g | Compute bound | 2 to 96 | 0.5 to 22.5 TB |
| i8ge | Storage bound | 2 to 192 | 1.25 to 120 TB |
Reduced TCO in ScyllaDB Cloud
Based on our performance results, ScyllaDB users migrating to Graviton4 can reduce infrastructure requirements by up to 50% compared to the previous-generation i4i and i3en instances. This translates into a significantly lower total cost of ownership (TCO), since fewer nodes are needed to sustain the same workload.
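To make the sizing math concrete, here is a minimal sketch. All throughput figures in it are hypothetical round numbers for illustration, not the benchmark results below:

```python
import math

def nodes_needed(workload_ops: float, per_node_ops: float, rf: int = 3) -> int:
    """Nodes required to sustain a workload; never fewer than RF nodes,
    since each replica set needs RF distinct nodes."""
    return max(rf, math.ceil(workload_ops / per_node_ops))

# Hypothetical numbers: if one i8g node sustains roughly 2x the
# throughput of an i4i node, the same workload needs about half
# the nodes (bounded below by the replication factor).
print(nodes_needed(3_000_000, 250_000))  # i4i-class: 12 nodes
print(nodes_needed(3_000_000, 500_000))  # i8g-class: 6 nodes
```

Halving the node count is where the TCO reduction comes from, since ScyllaDB Cloud pricing scales with the instances provisioned.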
These improvements stem from a few factors – both in the new instances themselves, and in the match between ScyllaDB and these instances.
The new I8g architecture features:
vCPU-to-core mapping: on x86, each vCPU is half a physical core (a hyperthread); on i8g (ARM), each vCPU is a full physical core
Larger caches: 64 KB instruction cache and 64 KB data cache per core, compared to 32/48 KB on Intel (shared between the two hyperthreads)
Faster storage and networking (see spec above)
In addition, ScyllaDB’s design allows it to take full advantage of the new server types:
The shard-per-core architecture scales linearly to any number of cores
The I/O scheduler takes full advantage of the 3rd-generation AWS Nitro SSDs, exploiting their higher I/O rates and lower latency without overloading the drives and driving latency back up
ARM’s relaxed memory model suits Seastar applications: since locks and fences are rare, the memory subsystem has more opportunities to reorder memory accesses to optimize performance
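To illustrate the shard-per-core idea, here is a deliberately simplified sketch. It is not ScyllaDB’s actual token-to-shard algorithm (which is token-ring based and shard-aware); it only shows the principle that each partition key deterministically belongs to exactly one shard, so a shard’s data is touched only by its own core and needs no locks:

```python
import hashlib

def shard_for_key(partition_key: bytes, shard_count: int) -> int:
    """Illustrative only: derive a stable 64-bit token from the
    partition key, then map it onto one of shard_count shards.
    Same key in, same shard out, every time."""
    token = int.from_bytes(hashlib.md5(partition_key).digest()[:8], "big")
    return token % shard_count

# Every request for the same key routes to the same shard (core):
print(shard_for_key(b"user:42", 16) == shard_for_key(b"user:42", 16))  # True
```

Shard-aware drivers exploit the same determinism on the client side, sending each request directly to the core that owns the key.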
What this means for you
i8g and i8ge are now available on ScyllaDB Cloud.
If you’re running ScyllaDB Cloud, the net impact is:
Compute-bound workloads: move from i4i to i8g. This should provide up to 2x the throughput at the same ScyllaDB Cloud price.
Storage-bound workloads: move from i3en to i8ge. Here too, you should expect up to 2x higher throughput at the same ScyllaDB Cloud price. Note that the new ScyllaDB dictionary-based compression can lower storage costs further.
For both use cases, ScyllaDB can keep the 10ms P99 latency SLA during maintenance operations, including scaling out and scaling down.
What we measured
Max Throughput: The maximum requests per second the database can handle
Max Throughput under SLA: The maximum requests per second while staying under a P99 latency of 10 ms. Only throughput with latency below this SLA counts, and this throughput can be sustained during any operation, such as scaling and repair. This is the number you should use when sizing your ScyllaDB database on i8g instances.
P99 Latency: The P99 latency measured at the Max Throughput under SLA
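These definitions can be expressed directly in code. A minimal sketch, using nearest-rank P99; the load-step numbers are made up for illustration, not the measurements below:

```python
import math

def p99(latencies_ms):
    """Nearest-rank P99: the smallest sample >= 99% of all samples."""
    s = sorted(latencies_ms)
    return s[math.ceil(0.99 * len(s)) - 1]

def max_throughput_under_sla(load_steps, sla_ms=10.0):
    """load_steps: (throughput_ops, p99_ms) pairs measured at
    increasing load. Returns the highest throughput whose P99
    stays under the SLA, or 0 if none qualifies."""
    ok = [t for t, p in load_steps if p < sla_ms]
    return max(ok, default=0)

# Hypothetical load steps: latency degrades as load increases,
# so only the first three steps satisfy the 10 ms SLA.
steps = [(100_000, 2.1), (200_000, 4.5), (300_000, 9.2), (400_000, 23.0)]
print(max_throughput_under_sla(steps))  # 300000
```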
Results
Read Workload – cached data
Cached data: working set size < available RAM, resulting in close to 100% cache hit rate.
| Instance type | Max throughput (ops/s) | Max throughput under 10 ms SLA | Improvement | P99 (ms) |
|---|---|---|---|---|
| i4i.4xlarge | 1,062,578 | 750,000 | 100% | 7.84 |
| i8g.4xlarge | 1,434,215 | 1,300,000 | 135% | 6.29 |
| i3en.3xlarge | 585,975 | 550,000 | 100% | 4.37 |
| i8ge.3xlarge | 962,504 | 800,000 | 164% | 6.38 |
Read Workload – non-cached data, storage only
Non-cached data: working set size >> available RAM, resulting in 0% cache hit rate. When most of the data is not cached, storage becomes a significant factor for performance.
| Instance type | Max throughput (ops/s) | Max throughput under 10 ms SLA | Improvement | P99 (ms) |
|---|---|---|---|---|
| i4i.4xlarge | 218,674 | 210,000 | 100% | 4.56 |
| i8g.4xlarge | 444,548 | 300,000 | 203% | 4.24 |
| i3en.3xlarge | 145,702 | 140,000 | 100% | 6.83 |
| i8ge.3xlarge | 259,693 | 255,000 | 178% | 7.95 |
Write Workload
| Instance type | Max throughput (ops/s) | Max throughput under 10 ms SLA | Improvement | P99 (ms) |
|---|---|---|---|---|
| i4i.4xlarge | 289,154 | 150,000 | 100% | 2.4 |
| i8g.4xlarge | 689,474 | 600,000 | 238% | 4.02 |
| i3en.3xlarge | 217,072 | 200,000 | 100% | 5.42 |
| i8ge.3xlarge | 452,968 | 400,000 | 209% | 3.41 |
Tests under maintenance operations
ScyllaDB takes pride in testing under realistic use cases, including scaling out and in, repair, backups, and various failure tests.
The following results show the average P99 latency (across all nodes) during different maintenance operations on a 3-node cluster of i8ge.3xlarge, using the same setup as above.
Setup
ScyllaDB version: 2025.3.1-20250907.2bbf3cf669bb
DB node amount: 3
DB instance types: i8ge.3xlarge
Loader node amount: 4
Loader instance type: c5.2xlarge
Throughput: Read 41K, write 81K, Mixed 35K
Results
Read Test: Read Latency
| Operation | Read P99 latency (ms) |
|---|---|
| Base: Steady State | 0.95 |
| During Repair | 4.92 |
| During Add Node (scale out) | 2.68 |
| During Replace Node | 3.10 |
| During Decommission Node (scale down) | 2.44 |
Write Test: Write Latency
| Operation | Write P99 latency (ms) |
|---|---|
| Steady State | 2.22 |
| During Repair | 3.24 |
| Add Node (scale out) | 2.49 |
| Replace Node | 3.07 |
| Decommission Node (scale down) | 2.37 |
Mixed Test: Write and Read Latency
| Operation | Write P99 latency (ms) | Read P99 latency (ms) |
|---|---|---|
| Steady State | 2.03 | 2.11 |
| During Repair | 3.21 | 4.70 |
| Add Node (scale out) | 2.19 | 2.71 |
| Replace Node | 3.00 | 3.37 |
| Decommission Node (scale down) | 2.20 | 3.05 |
The results indicate that ScyllaDB can meet the latency SLA under maintenance operations. This is critical for ScyllaDB Cloud, and in particular ScyllaDB X Cloud, where scaling out and in is automatic and can happen multiple times per day. It’s also critical in unexpected failure cases, when a node must be replaced rapidly without hurting availability or the latency SLA.
Test Setup
ScyllaDB cluster
3-node cluster
i4i.4xlarge vs. i8g.4xlarge
i3en.3xlarge vs. i8ge.3xlarge
Loaders
Loader node amount: 4
Loader instance type: c7i.8xlarge
Workload
Replication Factor (RF): 3
Consistency Level (CL): Quorum
Data size: 650 GB for read/mixed, 1.5 TB for write
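For reference, CL = Quorum with RF = 3 means each read and write must be acknowledged by a majority of the replicas. A quick sketch of the arithmetic:

```python
def quorum(rf: int) -> int:
    # QUORUM = a majority of replicas: floor(RF/2) + 1
    return rf // 2 + 1

# With RF=3, each operation needs 2 acknowledgments, so the cluster
# keeps serving at QUORUM even while one replica is down or being
# replaced -- which is exactly the maintenance scenario tested above.
print(quorum(3))  # 2
```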