ScyllaDB R&D Year in Review: Elasticity, Efficiency, and Real-Time Vector Search
Author: ScyllaDB
Originally Sourced from: https://www.scylladb.com/2026/03/31/scylladb-rd-year-in-review-elasticity-efficiency-and-real-time-vector-search/
2025 was a busy year for ScyllaDB R&D. We shipped one major release, three minor releases, and a continuous stream of updates across both ScyllaDB and ScyllaDB Cloud. Our users are most excited about elasticity, vector search, and reduced total cost of ownership. We also made good progress on years-long projects like strong consistency and object storage.

I raced through a year-in-review recap in my recent Monster Scale Summit talk (time limits, sorry), which you can watch on-demand. But I also want to share a written version that links to related talks and blog posts with additional detail. Feel free to choose your own adventure.

Tablets: Fast Elasticity

ScyllaDB’s “tablets” data distribution approach is now fully supported across all ScyllaDB capabilities, for both CQL and Alternator (our DynamoDB-compatible API). Tablets are the key technology that dynamically scales clusters in and out. They have been in production for over a year now; some customers use them to scale with the usual daily fluctuations, while others rely on their fast response to workload volatility such as expected events and unpredictable spikes. Right now, autoscaling lets users maximize their disk utilization, which reduces costs. Soon, we’ll automatically scale based on workload characteristics as well as storage.

A few things make this genuinely different from the vNodes design that we originally inherited from Cassandra and have since replaced with tablets:

- When you add new nodes, load balancing starts immediately and in parallel.
- No cleanup operation is needed.
- There’s no resharding; rebalancing is automated.

Moreover, we shifted from mutation-based streaming to file-based streaming: entire SSTable files are streamed without being deserialized into mutation fragments and then reserialized into SSTables on the receiving nodes. As a result, 3X less data is streamed over the network and less CPU is consumed, especially for data models that contain small cells.
This change provides up to 25X faster streaming. As Avi’s talk explains, we can scale out in minutes, and the tablets load balancing algorithm balances nodes based on their storage consumption. That means more usable space on an ongoing basis – up to 90% disk utilization.

Also noteworthy: you can now scale with different node sizes, so you can increase capacity in much smaller increments. You can add tiny instances first, then replace them with larger ones if needed. That means you rarely pay for unused capacity. For example, before, if you started with an i4i.16xlarge node that had 15 TB of storage and you hit 70% utilization, you had to launch another i4i.16xlarge, adding 15 TB at once. Now, you might add two xlarge nodes (1.8 TB each) first. Then, if you need more storage, you add more small nodes, and eventually replace them with larger nodes.

Vector Search: Real-Time AI

Customers like Tripadvisor, ShareChat, Medium, and Agoda have been using ScyllaDB as a fast feature store in their AI pipelines for several years now. ScyllaDB now also has real-time vector search, which is embedded in ScyllaDB Cloud. Our vector search takes advantage of ScyllaDB’s unique architecture. Technologies like our super-fast Rust driver, CDC, and tablets help us deliver real-time AI queries. We can handle datasets of 1 billion vectors with P99 latency as low as 1.7 ms and throughput up to 252,000 QPS.

Dictionary-Based Compression

I’ve always been interested in compression, even as a student. It’s particularly interesting for databases because it allows you to deliver more with less, so it’s yet another way to reduce costs. ScyllaDB Cloud has always had compression enabled by default, both for data at rest and for data in transit. We compress our SSTables on disk, we compress traffic between nodes, and we optionally compress traffic between clients and nodes. In 2025, we improved its efficiency by up to 80% by enabling dictionary-based compression.
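To give a feel for why a dictionary helps with small, similarly shaped payloads, here is a minimal sketch in Python. It is not ScyllaDB's implementation: it uses the preset-dictionary support in the standard library's `zlib` as a stand-in for LZ4 and ZSTD dictionaries, the row format is made up, and the helper names (`build_dictionary`, `compress`, `decompress`) are hypothetical. The "training" step is deliberately naive.

```python
import zlib


def build_dictionary(samples, max_size=4096):
    """Naive stand-in for dictionary training: concatenate sampled rows.

    Real dictionary trainers pick frequently repeated substrings; this
    sketch simply reuses the raw samples, truncated to max_size bytes.
    """
    return b"".join(samples)[-max_size:]


def compress(payload, zdict=None):
    # zlib accepts a preset dictionary through the zdict keyword.
    if zdict:
        c = zlib.compressobj(level=6, zdict=zdict)
    else:
        c = zlib.compressobj(level=6)
    return c.compress(payload) + c.flush()


def decompress(blob, zdict=None):
    # The same dictionary must be supplied to decompress the data.
    if zdict:
        d = zlib.decompressobj(zdict=zdict)
    else:
        d = zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# Hypothetical small rows that share structure but differ in values.
rows = [
    f'{{"user_id": {i}, "country": "US", "plan": "premium", "active": true}}'.encode()
    for i in range(200)
]

dictionary = build_dictionary(rows[:50])  # sample some of the data
target = rows[150]                        # a row that was not sampled

plain = compress(target)
with_dict = compress(target, dictionary)

# Round trips must be lossless with and without the dictionary.
assert decompress(plain) == target
assert decompress(with_dict, dictionary) == target

print("raw:", len(target), "no dictionary:", len(plain), "with dictionary:", len(with_dict))
```

A single short row has little internal repetition, so a compressor working without context gains almost nothing. With a dictionary built from similar rows, most of the payload can be encoded as references into the dictionary, which is why the benefit is largest for small messages and small cells.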
The compressor gains better context of the data being compressed. That gives it higher compression ratios and, in some cases, even better performance. Our two most popular compression algorithms, LZ4 and ZSTD, both benefit from dictionaries now. Dictionary-based compression works by sampling some data, building a dictionary from that sample, and then using the dictionary to further compress the data.

The graph on the lower left shows the impact of enabling dictionary compression for network traffic. Both compression algorithms benefit, and both drop nicely: from 70% to less than 50% for LZ4 and from around 50% to 33% for ZSTD. The table on the lower right shows a similar change. It shows the benefit for disk utilization on a customer’s production cluster, essentially cutting down storage consumption from 50% to less than 30%. Note that the 50% was an already compressed dataset. With this new compression, we further compressed it to less than 30%, a significant saving.

Raft-Based Topology

We’ve been working on Raft and features built on top of it for several years now. Currently, we use Raft for multiple purposes. Schema consistency was the first step, but topology is the more interesting improvement. With Raft-based fast parallel scaling and safe schema updates, we’re ready to finally retire Gossip-based topology. Other features that use Raft-based topology are authentication, service levels (also known as workload prioritization), and tablets. We are actively working on making strong consistency available for data as well.

ScyllaDB X Cloud

ScyllaDB X Cloud is the next generation of ScyllaDB Cloud, a truly elastic database-as-a-service. It builds upon innovation in ScyllaDB core (such as tablets and Raft-based topology) as well as innovation and improvements in ScyllaDB Cloud itself (such as parallel setup of cloud resources, reduced boot time, and the new resource resizing algorithm) to provide immediate elasticity to clusters. Curious how it works? It’s quite simple, really.
You just select two important parameters for your cluster, and those define the minimum values for resources:

- The minimum vCPU count, which is a way to measure the ‘horsepower’ that you initially want to reserve for your clusters.
- The minimum storage size.

And that’s it. You can do this via the API or UI. In the UI, it looks like this:

If you wish, you can also limit it to a specific instance family.

Now, let’s see how this scaling looks in action. A few things to note here:

- There are three somewhat large nodes and three additional nodes that are smaller.
- Some of the tablets are not equal, and that’s perfectly fine.
- The load was very high initially, then additional load moved gradually to the new nodes.
- The workload itself, in terms of requests, didn’t change. What changed is which nodes the requests go to; the overall value remained the same.
- The average latency, the P95 latency, and the P99 latency are all great: even the P99s are in the single-digit milliseconds.

And here’s a look at additional ScyllaDB Cloud updates before we move on:

A Few More Things We Shipped Last Year

Backup is a critical feature of any database, and we run it regularly in ScyllaDB Cloud for our customers. In the past, we used an external utility that backed up snapshots of the data. That was somewhat inefficient. It also competed with ScyllaDB for memory, CPU, disk, and network resources, potentially affecting the throughput and latency of user workloads. We reimplemented the backup client so it now runs inside ScyllaDB’s scheduler and cooperates with the rest of the system. The result is minimal impact on user workloads and 11X faster execution that scales linearly with the number of shards per node.

On the infrastructure side, the new AWS Graviton i8g instances proved themselves this year. We measured up to 2X throughput and lower latency at the same price, along with higher usable storage from improved compression and utilization.
We don’t embrace every new instance type since we have very specific and demanding requirements. However, when we see clear value like this, we encourage customers to move to newer generations.

On the security side, all new clusters now have their data encrypted at rest by default. When creating a cluster, you can either use your own key (known as ‘BYOK’) or use the ScyllaDB Cloud key.

We also reached general availability of our Rust driver. This is interesting because it’s our fastest driver. Its bindings are also the foundation for our grand plan: unifying our drivers under the Rust infrastructure. We started with a new C++ driver. Next up (almost ready) is our NodeJS driver, and we’ll continue with others as well. We also released our C# driver (another popular request) and improved our drivers’ reliability, capabilities, compatibility, and performance across the board. Finally, our Alternator clients in various languages received some important updates as well, such as network compression for both requests and responses.

What’s Next for ScyllaDB

Finally, let’s close with a glimpse at some of the major things we are expecting to deliver in 2026:

- An efficient, easy-to-use, and online process for migrating from vNodes to tablets.
- Strong consistency for data (beyond metadata and internal features).
- A fast dedicated data path to key-value data.
- Additional capabilities focused on 1) improved total cost of ownership and 2) more real-time AI features and integrations.

There’s a lot to look forward to! We’d love to answer questions or hear your feedback.