UPDATE:
This list needs to be updated; as of today it has only been verified with Cassandra 2.0.
Original Blog Post:
This isn’t remotely complete, but a colleague asked me to do a brain dump of my process, and this is by and large it. I’m sure it will leave more questions than answers for many of you, and I’d like to follow up this post at some point with the why and the how of a lot of it so it can be more useful to beginners. Today it is in a very raw form.
What to gather (a quick collection script follows this list):
- logs
- cassandra.yaml
- cassandra-env.sh
- histograms
- tpstats
- schema of all tables
- nodetool status
- ulimit -a as the user that Cassandra is running as (make sure it matches the recommended settings)
- heap usage under load: is it hitting 3/4 of MAX_HEAP?
- Pending compactions (OpsCenter; will have to look up the JMX metric later)
- size of writes/reads
- max partition size (the histograms will show this)
- TPS per node
- list of queries run against the cluster
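To make the gathering less tedious, here’s a minimal sketch (Python 3) that pulls most of the nodetool output above into files on one node. It assumes nodetool is on the PATH; the keyspace/table names for the histograms and the output filenames are placeholders, and you still have to grab the logs, cassandra.yaml, cassandra-env.sh, and schema by hand.

```python
#!/usr/bin/env python3
"""Rough diagnostics grab for a single node. Assumes nodetool is on the
PATH; keyspace/table names and output filenames are placeholders. Run as
(or su to) the user Cassandra runs as so ulimit reflects the right limits."""
import subprocess

COMMANDS = {
    "status.txt": ["nodetool", "status"],
    "tpstats.txt": ["nodetool", "tpstats"],
    "compactionstats.txt": ["nodetool", "compactionstats"],
    "cfstats.txt": ["nodetool", "cfstats"],
    # histograms are per table; repeat for each table you care about
    "histograms.txt": ["nodetool", "cfhistograms", "my_ks", "my_table"],
    "ulimit.txt": ["sh", "-c", "ulimit -a"],
    # schema: run DESCRIBE SCHEMA from cqlsh by hand and save the output
}

for outfile, cmd in COMMANDS.items():
    with open(outfile, "w") as f:
        try:
            f.write(subprocess.check_output(cmd).decode("utf-8", "replace"))
        except Exception as exc:  # keep going; partial data is still useful
            f.write("command failed: %s\n" % exc)
```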
Red flags in the logs (a quick scan script follows this list):
- ERROR
- WARN
- dropped
- GCInspector (watch for ParNew collections over 200ms, and any CMS)
- Emergency
- Out Of Memory
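Not a substitute for reading the logs, but here’s a minimal sketch (Python 3) of that scan. The log path and the GCInspector line format are assumptions based on a stock 2.0 install; adjust the regex if your version logs GC pauses differently.

```python
#!/usr/bin/env python3
"""Quick scan of system.log for the red flags listed above. The GCInspector
regex matches the 2.0-era 'GC for ParNew: 248 ms ...' style; the log path
is an assumption."""
import re

LOG = "/var/log/cassandra/system.log"  # adjust to your install
KEYWORDS = ("ERROR", "WARN", "dropped", "Emergency", "OutOfMemory")
GC_RE = re.compile(r"GC for (ParNew|ConcurrentMarkSweep): (\d+) ms")

with open(LOG, errors="replace") as f:
    for line in f:
        if any(k in line for k in KEYWORDS):
            print(line.rstrip())
            continue
        m = GC_RE.search(line)
        # flag any CMS collection, and ParNew pauses over 200ms
        if m and (m.group(1) != "ParNew" or int(m.group(2)) > 200):
            print(line.rstrip())
```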
Red flags in cassandra.yaml and cassandra-env.sh (a config-check sketch follows this list):
- heap not set to 8GB
- ParNew (heap new size) larger than 800MB (unless using tunings from https://issues.apache.org/jira/browse/CASSANDRA-8150)
- row cache being enabled (unless the workload is ~95% reads with evenly sized rows)
- vnodes enabled together with Solr
- system_auth keyspace still with RF 1 and SimpleStrategy
- memtable_flush_writers set to a crazy high level (varies by disk configuration; follow the documentation’s advice; double digits is suspect)
- rpc_address: 0.0.0.0 (slows down certain versions of the driver)
- multithreaded_compaction: true (almost always wrong)
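Here’s a hedged sketch (Python 3 with PyYAML) of those cassandra.yaml checks, using the 2.0-era option names; the file path is an assumption, and the heap/ParNew settings live in cassandra-env.sh so they still need eyeballing.

```python
#!/usr/bin/env python3
"""Sketch of the cassandra.yaml red-flag checks above (2.0-era option
names). Requires PyYAML; the config path is an assumption."""
import yaml

with open("/etc/cassandra/cassandra.yaml") as f:
    conf = yaml.safe_load(f)

def flag(condition, message):
    if condition:
        print("RED FLAG:", message)

flag(str(conf.get("rpc_address")) == "0.0.0.0",
     "rpc_address 0.0.0.0 (slows down certain driver versions)")
flag(conf.get("multithreaded_compaction") is True,
     "multithreaded_compaction: true (almost always wrong)")
flag((conf.get("memtable_flush_writers") or 0) >= 10,
     "memtable_flush_writers in double digits (suspect)")
flag((conf.get("row_cache_size_in_mb") or 0) > 0,
     "row cache enabled (only OK for ~95% reads with evenly sized rows)")
# Heap size and ParNew (-Xmn) live in cassandra-env.sh, not the yaml,
# so check MAX_HEAP_SIZE / HEAP_NEWSIZE there by hand.
```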
Red flags in the histograms (a partition-size check follows this list):
- double hump (a bimodal latency distribution)
- a long, long tail of latencies
- partitions with cell count over 100k
- partitions with size over in_memory_compaction_limit (default 64MB)
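The double hump and the long tail you have to eyeball, but the partition-size check can be scripted. A rough sketch (Python 3) against `nodetool cfstats` output; the exact label for the maximum compacted partition size varies by version, so both spellings are matched.

```python
#!/usr/bin/env python3
"""Flag tables whose largest compacted partition exceeds
in_memory_compaction_limit (64MB by default). Parses `nodetool cfstats`;
label names vary by version, so this is a sketch, not a supported tool."""
import re
import subprocess

LIMIT = 64 * 1024 * 1024  # in_memory_compaction_limit_in_mb default
MAX_RE = re.compile(r"Compacted (?:row|partition) maximum (?:size|bytes):\s*(\d+)")

table = None
for line in subprocess.check_output(["nodetool", "cfstats"]).decode().splitlines():
    line = line.strip()
    if line.startswith(("Table:", "Column Family:")):
        table = line.split(":", 1)[1].strip()
    m = MAX_RE.match(line)
    if m and int(m.group(1)) > LIMIT:
        print("RED FLAG: %s max partition is %s bytes" % (table, m.group(1)))
```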
Red flags in tpstats (a quick check follows this list):
- dropped anything, especially mutations.
- blocked flush writers (if the all-time blocked count is in the hundreds it’s usually a problem)
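A small sketch (Python 3) of those two tpstats checks, written against the 2.0-era `nodetool tpstats` layout (pool rows ending in the all-time-blocked column, plus the dropped-message section at the bottom); adjust if your version formats it differently.

```python
#!/usr/bin/env python3
"""Pull the two tpstats red flags above out of `nodetool tpstats`:
non-zero dropped message counts, and a FlushWriter all-time-blocked
count in the hundreds. Assumes the 2.0-era output layout."""
import subprocess

for line in subprocess.check_output(["nodetool", "tpstats"]).decode().splitlines():
    parts = line.split()
    # Dropped-message section rows look like "<MESSAGE_TYPE> <count>"
    if len(parts) == 2 and parts[0].isupper() and parts[1].isdigit() and int(parts[1]) > 0:
        print("RED FLAG: dropped %s: %s" % (parts[0], parts[1]))
    # Thread-pool rows: the last column is "All time blocked"
    if parts and parts[0] == "FlushWriter" and parts[-1].isdigit() and int(parts[-1]) >= 100:
        print("RED FLAG: FlushWriter all-time blocked: %s" % parts[-1])
```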
General red flags:
- STCS (size-tiered compaction) in use on SSDs when the customer has a low read-latency SLA (or no defined one).
- Using RF less than 3 per DC
- Using RF more than 3 per DC
- Using SimpleStrategy with multiDC
- read_repair_chance and dclocal_read_repair_chance adding up to more than 0.1 is usually a bad tradeoff.
- Secondary indexes in use (on writes think write amplification, and on reads think synchronous full cluster scan).
- Is system_auth replicated correctly? And has repair been run after this was changed? If you see auth errors in the log, the answer is probably no (see the sketch after this list).
- use of racks is not even (4 nodes in one rack and 2 in another… that’s a no-no)
- number of racks is not enough to fulfil a multiple of RF (if you have 2 racks of 2 nodes and RF 3, how will that get laid out evenly?).
- load (in nodetool status) is wildly uneven. It may not mean anything, but go look on disk; if the Cassandra data files are really badly imbalanced, figure out why.
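For the system_auth item above, this is roughly the fix, sketched with the DataStax Python driver; the contact point and DC names are placeholders, and RF 3 per DC follows the guidance above.

```python
#!/usr/bin/env python3
"""Switch system_auth to NetworkTopologyStrategy with RF 3 per DC, then
repair. Contact point and DC names are placeholders; assumes the DataStax
Python driver (pip install cassandra-driver)."""
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])  # any reachable node
session = cluster.connect()
session.execute("""
    ALTER KEYSPACE system_auth
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'DC1': 3, 'DC2': 3}
""")
cluster.shutdown()
# Then run `nodetool repair system_auth` on every node; the change only
# takes real effect once the auth data has actually been replicated.
```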
Benchmarking:
- Run cassandra-stress as a baseline.
- Run cassandra-stress with the application’s write size; this will often identify bottlenecks.
- Do the math on write size and desired TPS. Are the writes saturating the network? Don’t forget bits and bytes are different 🙂 (worked example after this list).
- Lower ParNew for a lower peak 99th percentile. This flies in the face of what is happening in this Jira (https://issues.apache.org/jira/browse/CASSANDRA-8150), but until I’ve worked through all of that, a ParNew lower than 800MB is generally a good way to trade off throughput for smaller ParNew GCs.
- Run fio with the following profile https://gist.github.com/tobert/10685735 (adjusted to match the user’s system), using these as a baseline (http://tobert.org/disk-latency-graphs/).
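The bits-versus-bytes math, worked through with made-up numbers; the TPS, write size, node count, and RF below are all assumptions you swap for the application’s real figures.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope network math for writes. All inputs are
placeholder assumptions."""
writes_per_sec = 50_000      # desired TPS (assumption)
write_size_bytes = 2_000     # average mutation size (assumption)
nodes = 6
rf = 3

# Each write lands on RF replicas, so cluster-wide bytes per second:
bytes_per_sec = writes_per_sec * write_size_bytes * rf
per_node_bits_per_sec = bytes_per_sec / nodes * 8  # bytes -> bits!

print("per-node network load: %.2f Gbit/s" % (per_node_bits_per_sec / 1e9))
# 50k TPS * 2KB * RF 3 over 6 nodes = 50MB/s per node = 0.4 Gbit/s, which
# is fine on GigE; double the TPS or the write size and a 1Gb NIC starts
# to look tight before Cassandra is even the bottleneck.
```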
Once you’ve established the system is awesome, review queries and code.
- Things to look for: the BATCH keyword used for bulk loading (fine for consistency, but you have to take the SLA hit into account if the writes are larger than BATCH can handle).
- If using batches, what is the write size?
- Are you using a thrift driver and destroying one or two nodes because of bad load balancing?
- Are you using the DataStax driver? Is it using the token-aware policy (shuffle on with 2.0.8, ideally)? See the sketch after this list.
- If using the DataStax driver, is it the latest 2.0.x? There are lots of useful fixes in each release; it really matters.
- Using LWT? They involve 4 round trips, so… while they’re awesome, they’re slower.
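The list above is talking about the Java driver, but the same idea sketched with the DataStax Python driver looks like this; the contact point, DC, keyspace, and query are placeholders. Note that token-aware routing only helps when the driver knows the partition key, which is one more reason to prepare your statements.

```python
#!/usr/bin/env python3
"""Token-aware load balancing with the DataStax Python driver, as an
equivalent sketch of the Java driver policy discussed above. Contact
point, DC, keyspace, and query are placeholders."""
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(
    ["10.0.0.1"],
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="DC1")
        # newer Python driver versions also accept shuffle_replicas=True,
        # the analogue of the Java driver's "shuffle" option
    ),
)
session = cluster.connect("my_ks")

# Token-aware routing only kicks in when the driver knows the partition
# key, so prepare the statement instead of sending raw strings.
ps = session.prepare("SELECT * FROM my_table WHERE id = ?")
row = session.execute(ps, ["some-key"]).one()
cluster.shutdown()
```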