You can enable row caching. It can avoid significant READ load on your disks. First, in cassandra.yaml, define the amount of heap space you want to dedicate to row caching. Then you can activate it per table to set how much data to fit per partition key in the row cache.

Of course, row cache is better suited with read intensive, size limited tables. Eg: , this perfectly fits our denormalized metadata tables.

War story: You need to reboot the node when enabling row-cache though row_cache_size_in_mb cassandra.yaml configuration file. nodetool will not be enough.

Once this (annoying) configuration parameter is enabled, you can use CQL per table to enable row cache:

Avoid reads on SSTables, and thus on disks can not be always avoided. In this case, we want to limit the amount of I/Os to the maximum.

Our problem is that writes to given key might be spread across several SSTables. For instance, this is the case for tables with many updates, deletes or with Clustering keys.

We can hopefully trade time and I/Os against read efficiency. The idea is to switch compaction algorithm from Size Tiered Compaction Strategy to Levelled Compaction Strategy. Levelled compaction strategy will by its structure limit the number of SSTables a given key can belong to. Of course, read intensive tables, with updates, deletes or partition keys will benefit a lot from this change. Avoid it on immutable, not clustered tables, as you will not get collocation benefits, but will pay the costs of more expensive compactions.

Date Tiered Compaction Strategy stores data written within a certain period of time in the same SSTable. It’s very useful for time series of tables implying the use of TTL, where entries expire.

Updating the compaction strategy of your tables can be done without downtime, at the cost of running compactions. Warning: this might consume IO and memories, and thus decrease performances when the compaction is running.

You need to modify the CQL table declaration to change the compaction strategy:

use apache_james ;
ALTER TABLE modseq
WITH compaction = { ‘class’ : ‘LeveledCompactionStrategy’ };

For the changes to take effect, you need to compact the SSTables. To force this, you need to use nodetool:

nodetool compact keyspace table

Be careful not to ommit keyspace or table, if you do not want to trigger a global compaction…

For the following compaction on large tables, you can use:

nodetool compactionstats

The rule of thumb for compaction time estimate is, with our hardware (16GB, HDD), approximatively one hour per GB stored on the table.

Bloom filters

If you have a high false positive rate, then you might consider increasing the memory dedicated to bloom filters. Again, this parameter . I will not detail it here as it was not a problem for us.

Compression

leads to less data being read and less data being written. If I/O bound, this is a nice trade-off. It turns out default behaviour is to compress SSTables by chunks using LZ4. Of course, this can be tuned on the table level.

Commitlog

An obvious tip is to store your commitlog on a different disk than the SSTables. Thus I/Os are shared across disk.

As a conclusion, Cassandra offers a data model that can be optimised in details. It requires knowing well the way Cassandra works. It also demands knowing well your data model. But significant improvments can be gained.

By applying these configuration changes on the table level, we achieved a x3 reduction in read latencies on some of our read intensive tables. We have a 62% row cache hit rate. And our Cassandra seems now to handle read load better. This tuning session has been both instructive, and promising. We even now have identified new room for improvements, for example on blob storage. Note that with a bad schema, good performances can not be achieved.

You don’t want your RAM to end up like this!

Also please note, that as Cassandra is a JVM application, single node performance is also impacted by your Garbage Collection settings. We decided not to cover this aspect of Cassandra configuration, as we did not have enough free memory to switch to the G1 garbage collector, and we would have ended describing minor settings. You can read this , wich covers the topic pretty well.