Repairs a Cassandra cluster using read repairs. Supports the same options as nodetool repair where possible.
ic-repair [ ...]
Option | Description |
---|---|
-u,--username | Cassandra username to connect with |
-pw,--password | Cassandra password to connect with |
-h,--host | Host to connect to. Defaults to localhost |
-p,--port | Port to connect to. Defaults to 9042 |
-ssl | Enable connecting with SSL. Uses JSSE. |
-f,--file | File to load/save repair state. |
-fresh | Forces a fresh repair (i.e. no resuming of a previous repair) |
-report | Print report of repair state and exit. |
-nosys,--exclude-system | Exclude repair of system keyspaces |
-s,--steps | Steps per token range. Defaults to 1. |
-pr,--partitioner-range | Perform partitioner range repair. |
-t,--threads <threads> | Maximum number of parallel repairs. Defaults to number of available processors. |
-r,--retry <max_retry> | Maximum number of retries when there are unavailable nodes. |
-d,--retry-delay <delay_ms> | Base delay between retries when nodes are unavailable. |
The -ssl flag enables connecting with SSL using JSSE. You can configure SSL settings via the JSSE system properties.
Example of connecting with SSL:
ic-repair -Djavax.net.ssl.trustStore=/path/to/client.truststore -Djavax.net.ssl.trustStorePassword=password123 -ssl
Read repairs can be quite intensive on the cluster, so you will want to adjust the maximum number of parallel read repairs with the -t flag. The optimal setting may vary between tables, since the data model and compaction strategy have a large impact on the performance of read repairs. You may therefore want to run the repair for each table separately, with a different -t value for each.
If a node becomes unavailable, the repair application will retry up to max_retry times while waiting for the node to become available. It waits delay_ms milliseconds before the first retry and increases this delay exponentially on each subsequent retry. Once the maximum number of retry attempts is reached, the repair is suspended.
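To get a feel for how long the repair may wait before suspending, the retry schedule described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `retry_delays` is hypothetical, and the growth factor of 2 (doubling on each retry) is an assumption, since the text only says the delay increases exponentially.

```python
from __future__ import annotations


def retry_delays(delay_ms: int, max_retry: int, factor: int = 2) -> list[int]:
    """Return the wait (in ms) before each retry attempt.

    delay_ms  -- base delay, as passed to -d/--retry-delay
    max_retry -- maximum number of retries, as passed to -r/--retry
    factor    -- ASSUMED growth factor; the real tool may use another value
    """
    if delay_ms < 0 or max_retry < 0:
        raise ValueError("delay_ms and max_retry must be non-negative")
    # Attempt 0 waits the base delay; each later attempt multiplies it by factor.
    return [delay_ms * factor ** attempt for attempt in range(max_retry)]


if __name__ == "__main__":
    delays = retry_delays(delay_ms=1000, max_retry=4)
    print(delays)        # per-retry waits in milliseconds
    print(sum(delays))   # total wait before the repair would be suspended
```

Under the doubling assumption, the total wait grows quickly with max_retry, so a modest base delay combined with a handful of retries already covers outages of several seconds to minutes.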
The problem with standard repairs occurs when there are large amounts of inconsistency, as any difference in the Merkle tree requires streaming replicas from all nodes involved, which can lead to:
- Running out of disk space due to sending multiple replicas
- Lots of sstables from streaming sstable sections for the inconsistent token range
- Compactions falling behind from all the sstables being streamed
- High read latency from an increase in sstables per read
- High CPU usage from sorting a large number of sstables into buckets
In our experience, we have seen repairs lead to cluster outages. The aim of this application is to avoid these issues by relying on read repairs, which in comparison just send a mutation with the correct version of the row to the nodes that do not have it. Additionally, this application supports suspending and resuming the repair. It can also handle nodes going down. This makes it more robust than even tools such as Cassandra Reaper.
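The idea behind a read repair can be illustrated with a small conceptual sketch. It assumes the usual Cassandra reconciliation rule, where the version with the newest write timestamp wins, and it models only the decision of which replicas need a mutation. The names `Version` and `plan_read_repair` are hypothetical and are not part of this tool or of Cassandra.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: int  # write timestamp; the newest one wins (assumed rule)


def plan_read_repair(
    replies: dict[str, Optional[Version]],
) -> tuple[Optional[Version], list[str]]:
    """Pick the winning version and list the replicas that need it.

    replies maps a replica name to the version it returned,
    or None if that replica has no data for the row.
    """
    present = [v for v in replies.values() if v is not None]
    if not present:
        return None, []  # no replica has the row, nothing to repair
    newest = max(present, key=lambda v: v.timestamp)
    # Replicas that are missing the row or hold an older version get a mutation.
    stale = sorted(
        node
        for node, v in replies.items()
        if v is None or v.timestamp < newest.timestamp
    )
    return newest, stale


if __name__ == "__main__":
    replies = {
        "node1": Version("old", 100),
        "node2": Version("new", 200),
        "node3": None,
    }
    newest, stale = plan_read_repair(replies)
    print(newest, stale)
```

Only the out-of-date replicas receive a single mutation, which is why this approach avoids the bulk sstable streaming listed above.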
Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for Instaclustr support status of this project.