Cassandra.Link

12/2/2019

Tuning Cassandra Performance

by John Doe

Cassandra performance can be tuned along several dimensions. Some of them are described below:
Write Operations:
The commit log and the data directories (SSTables) should be on different disks. The commit log is written sequentially; however, if the SSTables share a drive with the commit log, I/O contention between the two can degrade both commit log writes and SSTable reads.
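One way to sanity-check this layout is to compare the device IDs of the commit log directory and the data directories. The Python sketch below is illustrative only: the function name share_device is hypothetical, and it relies on os.stat().st_dev, which identifies the filesystem a path lives on. Matching IDs mean the directories definitely share a filesystem; different IDs do not by themselves prove separate physical disks, since two partitions of one drive also report different IDs.

```python
import os
from typing import List


def share_device(commitlog_dir: str, data_dirs: List[str]) -> List[str]:
    """Return the data directories that sit on the same filesystem as the commit log.

    Compares os.stat().st_dev values. An empty result means no data directory
    shares the commit log's filesystem; it does not prove the filesystems are
    on separate physical disks.
    """
    commitlog_dev = os.stat(commitlog_dir).st_dev
    return [d for d in data_dirs if os.stat(d).st_dev == commitlog_dev]
```

For example, share_device("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"]) returns an empty list when no data directory shares the commit log's filesystem; those paths are placeholders for the commitlog_directory and data_file_directories values in cassandra.yaml.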
 
Read Operations:
A good rule of thumb is to set concurrent_reads to 4 per processor core. The value may be increased on systems with fast I/O storage.
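The rule of thumb translates directly into arithmetic. This is a minimal Python sketch; suggested_concurrent_reads and its fast_io_factor parameter are hypothetical names, and the factor stands in for the unspecified amount by which the value may be raised on fast storage.

```python
import os
from typing import Optional


def suggested_concurrent_reads(cores: Optional[int] = None, fast_io_factor: float = 1.0) -> int:
    """Suggest a concurrent_reads value using 4 reads per processor core.

    fast_io_factor (hypothetical, >= 1.0) scales the result up for systems
    with fast I/O storage; it is not a Cassandra setting.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if fast_io_factor < 1.0:
        raise ValueError("fast_io_factor should be at least 1.0")
    return int(4 * cores * fast_io_factor)


# 8 cores -> 32; the same node on fast storage with a 2x factor -> 64.
print(suggested_concurrent_reads(8), suggested_concurrent_reads(8, 2.0))
```

The resulting number is what would be set as concurrent_reads in cassandra.yaml.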
 
Cassandra Compaction Contention:
Reduce the frequency of memtable flushes by increasing the memtable size or by preventing premature flushing. Less frequent memtable flushes produce fewer SSTable files and therefore less compaction. Less compaction reduces SSTable I/O contention, which improves read performance. Bigger memtables also absorb more overwrites for updates to the same keys, so they accommodate more read/write operations between flushes.
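To make the flush and overwrite argument concrete, here is a toy Python model, not Cassandra's actual flush logic: the memtable is a dict keyed by partition key, and it is flushed to a new SSTable once it holds a fixed number of distinct keys (real memtables flush based on memory thresholds). The function name simulate_flushes and the key-count limit are assumptions for illustration.

```python
from typing import Hashable, Iterable, Tuple


def simulate_flushes(writes: Iterable[Tuple[Hashable, object]], memtable_limit: int) -> Tuple[int, int]:
    """Return (sstables_flushed, rows_written_to_disk) for a toy memtable."""
    memtable = {}
    sstables = 0
    rows_on_disk = 0
    for key, value in writes:
        memtable[key] = value  # an update to a key already in memory overwrites it
        if len(memtable) >= memtable_limit:
            sstables += 1  # flush: one new SSTable
            rows_on_disk += len(memtable)
            memtable.clear()
    if memtable:  # flush whatever is left at the end
        sstables += 1
        rows_on_disk += len(memtable)
    return sstables, rows_on_disk


# 1,000 updates spread over 10 hot keys.
writes = [(i % 10, i) for i in range(1000)]
print(simulate_flushes(writes, memtable_limit=5))   # -> (200, 1000)
print(simulate_flushes(writes, memtable_limit=20))  # -> (1, 10)
```

With the small memtable every update reaches disk and 200 SSTables pile up for compaction; the larger memtable absorbs the overwrites and flushes a single SSTable holding only the latest value of each key.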
 
Memory Cache:
Do not increase the Cassandra cache size unless there is enough physical memory (RAM). Avoid memory swapping at all costs.
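On Linux, one quick way to confirm that a node is not set up to swap is to look for active swap areas in /proc/swaps. The Python sketch below assumes the standard layout of that file (a header row, then one line per active swap area beginning with its filename); the function names and the sample text are hypothetical. If it reports anything, swap can be turned off, for example with swapoff -a.

```python
from typing import List


def active_swap_areas(proc_swaps_text: str) -> List[str]:
    """Parse the text of /proc/swaps and return the active swap areas."""
    rows = proc_swaps_text.splitlines()[1:]  # first line is the column header
    return [row.split()[0] for row in rows if row.strip()]


def read_active_swap_areas(path: str = "/proc/swaps") -> List[str]:
    """Read /proc/swaps (Linux only) and return the active swap areas."""
    with open(path) as f:
        return active_swap_areas(f.read())


# Sample input in the /proc/swaps format (illustrative values).
sample = (
    "Filename\tType\tSize\tUsed\tPriority\n"
    "/dev/sda2\tpartition\t2097148\t0\t-2\n"
)
print(active_swap_areas(sample))  # -> ['/dev/sda2']
```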
 
Row Cache:
The row cache holds the entire content of a row in memory, so reads of cached rows are served from memory instead of from disk. It works well when column data is small enough that the cache can hold most of the hot-spot data, and poorly when column data is so large that it cannot. It is also a poor fit for workloads with a high write/read ratio. The row cache is off by default. If its hit ratio is below 30%, the row cache should be disabled.
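The 30% rule can be expressed as a small check. In this Python sketch the hit and request counts are inputs collected from the node's cache statistics (for example, nodetool info reports hits and requests for the row cache); the function name and the choice to return False when there is no traffic yet are assumptions.

```python
def should_disable_row_cache(hits: int, requests: int, threshold: float = 0.30) -> bool:
    """Return True when the row cache hit ratio is below the threshold (30% by default)."""
    if requests == 0:
        return False  # no reads yet, so there is no evidence either way
    return hits / requests < threshold


print(should_disable_row_cache(250, 1000))  # 25% hit ratio -> True
print(should_disable_row_cache(700, 1000))  # 70% hit ratio -> False
```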
 
Key Cache Tuning:
The key cache holds, in memory, the location of data for each column family. It is effective when there is a data hot spot and the row cache cannot be used effectively because of large column sizes. By default, Cassandra caches 200000 keys per column family. Use an absolute number for keys_cached instead of a percentage.
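The difference between the two forms of keys_cached shows up as the data set grows. In the Python sketch below, effective_keys_cached is a hypothetical helper, and its interpretation (a value below 1 is a fraction of the estimated key count; anything else is an absolute number of keys) is an assumption made for illustration rather than a description of a specific Cassandra release.

```python
def effective_keys_cached(setting: float, estimated_keys: int) -> int:
    """Resolve a keys_cached-style setting to a number of cached keys.

    Assumption for illustration: setting < 1 is a fraction of estimated_keys;
    otherwise it is an absolute key count.
    """
    if setting < 1:
        return int(setting * estimated_keys)  # grows with the data set
    return int(setting)  # fixed, regardless of data set size


for keys in (1_000_000, 50_000_000):
    print(keys, effective_keys_cached(200_000, keys), effective_keys_cached(0.25, keys))
```

Under these assumptions the absolute setting stays at 200,000 keys, while a fraction of 0.25 grows from 250,000 to 12,500,000 keys, and the memory the cache needs grows with it.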
 
JVM:
The minimum and maximum Java heap size should both be set to half of the available physical memory. The young generation should be 1/4 of the Java heap. Do NOT increase the heap size without confirming that enough physical memory is available; always reserve memory for the OS file cache.
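The sizing rule above is simple arithmetic. This Python sketch (the function name is hypothetical) turns a physical RAM figure into standard HotSpot flags: -Xms and -Xmx for the minimum and maximum heap, and -Xmn for the young generation. Where these values go depends on the Cassandra release: newer versions read a JVM options file (its exact name varies by release), while older ones use MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh.

```python
from typing import List


def jvm_heap_flags(physical_ram_mb: int) -> List[str]:
    """Size the heap per the rule of thumb: heap = 1/2 RAM, young gen = 1/4 heap.

    Min and max heap are set to the same value; the other half of RAM is left
    to the OS, including its file cache.
    """
    heap_mb = physical_ram_mb // 2
    young_mb = heap_mb // 4
    return [f"-Xms{heap_mb}M", f"-Xmx{heap_mb}M", f"-Xmn{young_mb}M"]


print(jvm_heap_flags(16384))  # 16 GB of RAM -> ['-Xms8192M', '-Xmx8192M', '-Xmn2048M']
```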
A detailed understanding of Apache Cassandra is available in this blog post for your perusal!
