
Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

3/22/2019

Reading time: 2 min

instaclustr/instarepair

by John Doe



Repairs a Cassandra cluster using read repairs. Supports the same options as nodetool repair where possible.

ic-repair [ ...]

Option Description
-u,--username Cassandra username to connect with
-pw,--password Cassandra password to connect with
-h,--host Host to connect to. Defaults to localhost
-p,--port Port to connect to. Defaults to 9042
-ssl Enable connecting with SSL. Uses JSSE.
-f,--file File to load/save repair state.
-fresh Forces a fresh repair (i.e. no resuming of a previous repair).
-report Print a report of the repair state and exit.
-nosys,--exclude-system Exclude repair of system keyspaces
-s,--steps Steps per token range. Defaults to 1.
-pr,--partitioner-range Perform partitioner range repair.
-t,--threads <threads> Maximum number of parallel repairs. Defaults to the number of available processors.
-r,--retry <max_retry> Maximum number of retries when there are unavailable nodes.
-d,--retry-delay <delay_ms> Base delay in milliseconds between retries when nodes are unavailable.
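The table does not say how -s divides a token range. The following is a minimal sketch that assumes each range is cut into `steps` roughly equal, contiguous sub-ranges. The helper `split_range` and the equal-split rule are illustrative assumptions, not ic-repair's code.

```python
from __future__ import annotations


def split_range(start: int, end: int, steps: int) -> list[tuple[int, int]]:
    """Split the token range (start, end] into `steps` contiguous sub-ranges.

    Hypothetical helper for illustration only: assumes an equal split,
    start < end (no wrap-around range) and steps >= 1.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    if start >= end:
        raise ValueError("wrap-around ranges are not handled in this sketch")
    width = end - start
    # Integer boundaries; the last one is always exactly `end`.
    bounds = [start + (width * i) // steps for i in range(steps + 1)]
    # Consecutive boundaries form (lo, hi] sub-ranges; drop empty ones,
    # which can appear when steps > width.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


print(split_range(0, 100, 4))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```

Under this assumption, raising -s makes each read cover a smaller slice of the token range, at the cost of issuing more requests.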

The -ssl flag enables connecting with SSL using JSSE. You can configure SSL settings via the JSSE system properties, such as javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword.

Example of connecting with SSL:

ic-repair -Djavax.net.ssl.trustStore=/path/to/client.truststore -Djavax.net.ssl.trustStorePassword=password123 -ssl

Read repairs can be quite intensive on the cluster, so you will likely want to adjust the maximum number of parallel read repairs with the -t flag. The optimal setting may vary between tables, since the data model and compaction strategy have a large impact on the performance of read repairs. You may therefore want to run repair for each table separately, with a different maximum number of parallel requests for each.
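The -t limit amounts to capping how many repair requests are in flight at once. The following is a minimal Python sketch of that idea using a bounded thread pool. It assumes a hypothetical `repair_range` callable per token range, which is not ic-repair's API. As with -t, the pool size defaults to the processor count (here via `os.cpu_count()`).

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(ranges, repair_range, threads=None):
    """Apply `repair_range` to every range with at most `threads` running at once.

    `threads` plays the role of -t and defaults to the processor count.
    `repair_range` is a hypothetical stand-in for one read repair request.
    """
    max_workers = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever the completion order.
        return list(pool.map(repair_range, ranges))


class ConcurrencyTracker:
    """Fake repair task that records the peak number of concurrent calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def repair(self, token_range):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate the read round trip
        with self._lock:
            self._active -= 1
        return token_range


tracker = ConcurrencyTracker()
run_parallel(range(20), tracker.repair, threads=3)
print(tracker.peak)  # never more than 3
```

A smaller pool trades total repair time for a lighter load on the replicas. This is the trade-off the paragraph suggests tuning per table.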

If a node becomes unavailable, the repair application will retry up to max_retry times, waiting for the node to become available. It waits delay_ms milliseconds before the first retry and increases this delay exponentially on each subsequent retry. Once the maximum number of retry attempts is reached, the repair is suspended.
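The following is a minimal sketch of the retry behaviour described above. It assumes the delay doubles on each retry, since the paragraph only says it grows exponentially. It also uses hypothetical `NodeUnavailable` and `RepairSuspended` exceptions. None of these names come from ic-repair.

```python
import time


class NodeUnavailable(Exception):
    """Hypothetical error raised when a replica needed for the read is down."""


class RepairSuspended(Exception):
    """Hypothetical error: retries are exhausted, so the repair is suspended."""


def retry_delays(delay_ms, max_retry):
    """Delay in ms before each retry, doubling each time (factor 2 is assumed)."""
    return [delay_ms * 2 ** attempt for attempt in range(max_retry)]


def call_with_retry(op, delay_ms, max_retry, sleep=time.sleep):
    """Run op(), retrying up to max_retry times while nodes are unavailable."""
    delays = retry_delays(delay_ms, max_retry)
    for attempt in range(max_retry + 1):
        try:
            return op()
        except NodeUnavailable:
            if attempt == max_retry:
                # Out of retries: stop and leave the repair resumable.
                raise RepairSuspended(
                    f"node still unavailable after {max_retry} retries"
                ) from None
            sleep(delays[attempt] / 1000)  # time.sleep takes seconds


print(retry_delays(500, 4))  # [500, 1000, 2000, 4000]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.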

Standard repairs run into problems when there is a large amount of inconsistency: any difference in the Merkle tree requires streaming replicas from all nodes involved, which can lead to:

  • Running out of disk space due to sending multiple replicas
  • Large numbers of sstables from streaming sstable sections for the inconsistent token range
  • Compactions falling behind from all the sstables being streamed
  • High read latency from an increase in sstables per read
  • High CPU usage from sorting the large number of sstables into buckets
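To see why a small inconsistency can trigger a lot of streaming, consider a simplified, single-level Merkle tree. Each leaf hashes every row in a fixed slice of the token space, so one differing row marks the whole leaf as mismatched. This is an illustrative model, not Cassandra's actual Merkle tree implementation. The token space, leaf count, and row values are made up.

```python
import hashlib


def leaf_hashes(rows, leaves, token_space):
    """Hash rows into `leaves` equal token buckets (the leaf level of a Merkle tree).

    rows maps a non-negative token (< token_space) to a row value.
    """
    digests = [hashlib.sha256() for _ in range(leaves)]
    for token, value in sorted(rows.items()):
        digests[token * leaves // token_space].update(f"{token}={value};".encode())
    return [d.hexdigest() for d in digests]


def rows_in_mismatched_leaves(a, b, leaves, token_space):
    """Tokens of replica `a` that fall in a leaf whose hash differs from `b`."""
    ha = leaf_hashes(a, leaves, token_space)
    hb = leaf_hashes(b, leaves, token_space)
    bad = {i for i in range(leaves) if ha[i] != hb[i]}
    return {t for t in a if t * leaves // token_space in bad}


replica_a = {token: "v1" for token in range(1000)}
replica_b = dict(replica_a)
replica_b[42] = "stale"  # a single inconsistent row

actually_different = {t for t in replica_a if replica_a[t] != replica_b[t]}
streamed = rows_in_mismatched_leaves(replica_a, replica_b, leaves=10, token_space=1000)
print(len(actually_different), len(streamed))  # 1 100
```

Only one row differs, but its entire 100-row leaf range is treated as inconsistent and would be streamed. This coarse granularity is what the list above traces back to.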

In our experience, we have seen repairs lead to cluster outages. The aim of this application is to avoid these issues by relying on read repairs, which, by comparison, just send a mutation with the correct version of the row to the nodes that lack it. Additionally, this application supports suspending and resuming the repair, and it can handle nodes going down. This makes it more robust than even tools such as Cassandra Reaper.
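The read repair approach can be sketched as last-write-wins reconciliation. Collect each replica's version of a row, pick the one with the newest write timestamp, and send it only to the replicas that lack it. This is a simplified per-row model (Cassandra reconciles individual cells). `Version`, `reconcile`, and the node names are hypothetical. The tie-break on value exists only to make the sketch deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: int  # write timestamp of the row


def reconcile(responses: dict[str, Optional[Version]]):
    """Pick the newest version and the mutations needed to fix stale replicas.

    responses maps a (hypothetical) node name to the version it returned,
    or None if it has no copy of the row.
    """
    present = [v for v in responses.values() if v is not None]
    if not present:
        return None, {}
    # Last write wins; ties broken by value so the result is deterministic.
    winner = max(present, key=lambda v: (v.timestamp, v.value))
    # Only replicas that do not already hold the winner get a mutation.
    mutations = {node: winner for node, v in responses.items() if v != winner}
    return winner, mutations


winner, mutations = reconcile(
    {"node1": Version("new", 20), "node2": Version("old", 10), "node3": None}
)
print(sorted(mutations))  # ['node2', 'node3']
```

The output is a handful of single-row writes rather than streamed sstable sections. That difference is what the paragraph relies on.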

Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for Instaclustr support status of this project.
