You can enable row caching to avoid significant read load on your disks. First, in cassandra.yaml, define the amount of memory you want to dedicate to the row cache. Then you can activate it per table, setting how much data to keep per partition key in the row cache.
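As a minimal sketch, the cassandra.yaml part looks like this (the 200 MB value is illustrative; size it to your read working set):

```yaml
# 0 (the default) disables the row cache entirely.
row_cache_size_in_mb: 200
```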
Of course, the row cache is best suited to read-intensive, size-limited tables. For example, in James this perfectly fits our denormalized metadata tables.
War story: you need to reboot the node when enabling the row cache through the row_cache_size_in_mb setting in the cassandra.yaml configuration file. nodetool will not be enough.
Once this (annoying) configuration parameter is enabled, you can enable the row cache per table with CQL:
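For instance (the table name and rows_per_partition value below are illustrative; the syntax is the Cassandra 2.1+ caching map):

```sql
ALTER TABLE apache_james.message_metadata
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};
```

rows_per_partition bounds how many rows of each partition are cached, which is what keeps the cache usable on wide partitions.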
Reads on SSTables, and thus on disks, cannot always be avoided. In that case, we want to limit the number of I/Os as much as possible.
Our problem is that writes to a given key might be spread across several SSTables. This is the case, for instance, for tables with many updates, deletes, or clustering keys.
Fortunately, we can trade compaction time and I/Os for read efficiency. The idea is to switch the compaction algorithm from Size Tiered Compaction Strategy to Leveled Compaction Strategy, which by its structure limits the number of SSTables a given key can belong to. Of course, read-intensive tables with updates, deletes, or clustering keys will benefit a lot from this change. Avoid it on immutable, non-clustered tables: you will not get the collocation benefits, but you will pay the cost of more expensive compactions.
Date Tiered Compaction Strategy stores data written within a certain period of time in the same SSTable. It is very useful for time-series tables relying on TTLs, where entries expire.
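As a sketch (the table name and option values are illustrative, not a recommendation), switching a TTL-based time-series table to this strategy could look like:

```sql
ALTER TABLE apache_james.time_series_events
    WITH compaction = {
        'class' : 'DateTieredCompactionStrategy',
        -- group writes from the same hour into the same SSTable
        'base_time_seconds' : '3600',
        -- stop compacting SSTables older than this
        'max_sstable_age_days' : '10'
    };
```

Because expired data tends to share SSTables, whole SSTables can then be dropped cheaply once all their entries are past their TTL.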
Updating the compaction strategy of your tables can be done without downtime, at the cost of running compactions. Warning: this might consume I/O and memory, and thus degrade performance while the compaction is running.
You need to modify the CQL table declaration to change the compaction strategy:
USE apache_james;

ALTER TABLE modseq
WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
For the changes to take effect, you need to compact the SSTables. To force this, you need to use nodetool:
nodetool compact keyspace table
Be careful not to omit the keyspace or table, unless you want to trigger a global compaction…
To follow the progress of the compaction on large tables, you can use:
nodetool compactionstats
As a rule of thumb, on our hardware (16 GB of RAM, HDDs), compaction takes approximately one hour per GB stored in the table.