
My Cassandra 2.0 Diagnostics Checklist (Brain Dump) · Los Techies

by John Doe

25 November, 2014. It was a Tuesday.

UPDATE:

This list needs to be updated; as of today it has only been verified with Cassandra 2.0.

Original Blog Post:

This isn’t remotely complete, but I had a colleague ask me to do a brain dump of my process, and this is by and large it. I’m sure this will leave more questions than answers for many of you, and I’d like to follow up this post at some point with detail on the why and the how of a lot of it so it can be more useful to beginners. Today this is in a very raw form.

  • logs
  • cassandra.yaml
  • cassandra-env.sh
  • histograms
  • tpstats
  • schema of all tables
  • nodetool status
  • ulimit -a as the user Cassandra runs as (make sure it matches the recommended settings)
  • heap usage under load: is it hitting 3/4 of MAX_HEAP_SIZE?
  • Pending compactions (OpsCenter; will have to look up the JMX metric later)
  • size of writes/reads
  • max partition size (histograms will say)
  • tps per node
  • list of queries run against the cluster (a collection and log-grep sketch for these artifacts follows the list)
  • ERROR
  • WARN
  • dropped
  • GCInspector (look for ParNew pauses over 200ms and for CMS)
  • Emergency
  • Out Of Memory
  • heap not set to 8 GB
  • ParNew (new gen) no more than 800 MB (unless using the tunings from https://issues.apache.org/jira/browse/CASSANDRA-8150)
  • row cache enabled (unless the workload is ~95% reads with evenly sized rows)
  • vnodes enabled on Solr nodes
  • system_auth keyspace still with RF 1 and SimpleStrategy
  • memtable_flush_writers set to a crazy high level (the right value varies by disk configuration; follow the documentation’s advice, and double digits is suspect)
  • rpc_address: 0.0.0.0 (slows down certain versions of the driver; see the config check sketch after this list)
  • multithreaded_compaction: true (almost always wrong)
  • double hump in the latency histograms (see the histogram sketch after this list)
  • long, long tail
  • partitions with cell count over 100k
  • partitions with size over in_memory_compaction_limit_in_mb (default 64 MB)
  • dropped messages (anything), especially mutations.
  • blocked flush writers (if the all-time count is in the 100s it’s usually a problem)
  • STCS (size-tiered compaction) in use on SSDs when the customer has a low read SLA (or no defined one).
  • Using RF less than 3 per DC
  • Using RF more than 3 per DC
  • Using SimpleStrategy with multiDC
  • read_repair_chance and dclocal_read_repair_chance adding up to more than 0.1 is usually a bad tradeoff.
  • Secondary indexes in use (on writes think write amplification, and on reads think synchronous full cluster scan).
  • Is system_auth replicated correctly? And has repair been run after that was changed? If you see auth errors in the log, the answer is probably no (a fix-up sketch follows this list).
  • use of racks is not even (4 nodes in one rack and 2 in another; that’s a no-no)
  • use of racks doesn’t fulfil a multiple of RF (if you have 2 racks of 2 nodes and RF 3, how will that get laid out evenly?).
  • if load is wildly off: it may not mean anything, but go look on disk, and if the Cassandra data files are badly imbalanced, figure out why.
  • Run cassandra-stress as a baseline
  • Run cassandra-stress with the application’s write size; this will often identify bottlenecks
  • Do math on writes and desired TPS. Are the writes saturating the network? Don’t forget bits and bytes are different 🙂
  • Lower ParNew for a lower peak 99th percentile. This flies in the face of what is happening in this Jira (https://issues.apache.org/jira/browse/CASSANDRA-8150), but until I’ve worked through all of that, ParNew lower than 800 MB is generally a good way to trade throughput for smaller ParNew GC pauses.
  • Run fio with the following profile, adjusted to match the user’s system: https://gist.github.com/tobert/10685735, using these as a baseline: http://tobert.org/disk-latency-graphs/ (baseline command sketches follow this list).
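
Below is a minimal collection sketch for the artifacts and log greps above. It assumes a package-style install (config in /etc/cassandra, logs in /var/log/cassandra) and Cassandra 2.0-era nodetool commands; the KS/TABLE variables, host, and output paths are placeholders I’ve made up, not part of the original checklist.

```bash
#!/usr/bin/env bash
# Hedged sketch: gather the diagnostics listed above from one node.
# Paths assume a package install; KS/TABLE are placeholder names.
KS=my_keyspace; TABLE=my_table
OUT="diag-$(hostname)-$(date +%Y%m%d)"; mkdir -p "$OUT"

cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra-env.sh "$OUT"/
cp /var/log/cassandra/system.log "$OUT"/

nodetool status  > "$OUT/status.txt"
nodetool tpstats > "$OUT/tpstats.txt"      # dropped messages, blocked FlushWriter
nodetool cfstats > "$OUT/cfstats.txt"      # per-table stats, including max partition size
nodetool cfhistograms "$KS" "$TABLE" > "$OUT/histograms.txt"
sudo -u cassandra bash -c 'ulimit -a' > "$OUT/ulimit.txt"   # limits for the Cassandra user
echo 'DESCRIBE SCHEMA;' | cqlsh > "$OUT/schema.cql"         # cqlsh reads statements from stdin

# Quick log greps for the red flags above.
egrep -c 'ERROR|WARN'                     "$OUT/system.log"
egrep -i 'dropped|emergency|outofmemory'  "$OUT/system.log"
egrep 'GCInspector' "$OUT/system.log" | egrep 'ParNew|ConcurrentMarkSweep'
```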
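
And a sketch of quick checks for the cassandra.yaml / cassandra-env.sh red flags above. The option names are the Cassandra 2.0-era ones, and the config path is again an assumption.

```bash
CONF=/etc/cassandra   # adjust to your install layout

grep -E '^rpc_address: *0\.0\.0\.0'        "$CONF/cassandra.yaml"   # slows certain driver versions
grep -E '^multithreaded_compaction: *true' "$CONF/cassandra.yaml"   # almost always wrong
grep -E '^memtable_flush_writers'          "$CONF/cassandra.yaml"   # double digits is suspect
grep -E '^row_cache_size_in_mb'            "$CONF/cassandra.yaml"   # non-zero only for ~95% reads, even-width rows
grep -E 'MAX_HEAP_SIZE|HEAP_NEWSIZE'       "$CONF/cassandra-env.sh" # want ~8 GB heap, new gen <= 800 MB
```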
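
For the histogram red flags (double hump, long tail, oversized partitions), a sketch of where to look. KS/TABLE are placeholders, and the exact output wording varies a bit by minor version.

```bash
KS=my_keyspace; TABLE=my_table   # placeholders

nodetool proxyhistograms                    # coordinator-level latency shape: look for a double hump or long tail
nodetool cfhistograms "$KS" "$TABLE"        # per-table partition size and cell count distributions
nodetool cfstats "$KS.$TABLE" | egrep -i 'compacted.*(maximum|mean)'   # biggest/average partition on this node
```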
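
If system_auth is still on SimpleStrategy with RF 1, a fix-up sketch looks like the following. The DC names and RF values are examples; match them to your topology, and run the repair on every node.

```bash
cqlsh <<'EOF'
ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
EOF

# Repair so existing auth data actually lands on the new replicas (run per node).
nodetool repair system_auth
```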
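
Finally, baseline command sketches for cassandra-stress and fio. The flags use the Cassandra 2.0-era stress tool syntax, and the node names, sizes, and rates are made-up examples; adjust them to the application’s real write shape.

```bash
# Plain write baseline against three nodes (example host names).
cassandra-stress -d node1,node2,node3 -n 1000000 -o insert

# Approximate the application's write size: e.g. 20 columns of 512 bytes per key.
cassandra-stress -d node1,node2,node3 -n 1000000 -o insert -c 20 -S 512

# Disk baseline, using the fio profile from the gist above saved locally.
fio cassandra-disk.fio

# Back-of-envelope network math: 50,000 writes/s * 2 KB = 100 MB/s = 800 Mbit/s,
# which already saturates a 1 GbE NIC. Bits and bytes are different.
```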

Once you’ve established the system is awesome, review queries and code.

  • Things to look for: the BATCH keyword used for bulk loading (fine for consistency, but you have to take into account the SLA hit if the writes are larger than BATCH can handle).
  • If using batches, what is the write size?
  • Are you using a Thrift driver and destroying one or two nodes because of bad load balancing?
  • If you’re using the DataStax driver, is it using the token-aware policy (ideally with shuffle on, as of 2.0.8)?
  • If using the DataStax driver, is it the latest 2.0.x? There are lots of useful fixes in each release; it really matters.
  • Using LWT? They involve 4 round trips, so while they’re awesome, they’re slower (see the tracing sketch below).
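
As a sketch of how to see those extra round trips, you can trace a statement from cqlsh. The demo keyspace, table, and values below are made up; TRACING is a cqlsh command, so run the statements through cqlsh (interactively works just as well as the stdin form shown here).

```bash
cqlsh <<'EOF'
TRACING ON;
-- LWT write: expect the extra Paxos phases to show up in the trace.
INSERT INTO demo.users (id, name) VALUES (1, 'alice') IF NOT EXISTS;
TRACING OFF;
EOF
```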
