Cassandra.Link

1/25/2019

How to performance tune data streaming activities like repair and bootstrap

by John Doe

When running streaming processes such as repair and bootstrap, you can tune their performance: throttle them if your nodes are becoming overloaded, or unthrottle them so that repair or bootstrap completes more quickly.

Symptoms

While repair or bootstrap is running, you may want to throttle the process if the output of top shows your nodes are under load, iostat shows heavy I/O on your disks, and network utilization is high.

Conversely, you may find that while repair or bootstrap is running, top shows very little load on the nodes, iostat shows little disk I/O, and the network has plenty of available bandwidth. Under these circumstances, you may want to unthrottle the process so that repair or bootstrap completes more quickly.
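The checks described above can be gathered in one pass. The following is a minimal sketch, assuming a Linux host where top comes from procps and iostat from sysstat; the function name load_snapshot is hypothetical, and network utilization is left out because the article does not name a tool for it.

```shell
#!/bin/sh
# Print a one-off snapshot from each tool that is installed; skip the rest.
# load_snapshot is a hypothetical helper name, not part of Cassandra or DSE.
load_snapshot() {
    echo "== top =="
    if command -v top >/dev/null 2>&1; then
        # -b: batch mode, -n 1: a single iteration (procps top on Linux)
        top -b -n 1 2>/dev/null | head -n 5
    else
        echo "top not found, skipping"
    fi

    echo "== iostat =="
    if command -v iostat >/dev/null 2>&1; then
        # -x: extended per-device statistics; "1 2": two reports, one second apart
        iostat -x 1 2 2>/dev/null
    else
        echo "iostat not found, skipping"
    fi
    return 0
}

load_snapshot
```

Run it on a node while repair or bootstrap is in progress, and compare the figures with a quiet period before deciding whether to throttle or unthrottle.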

Cause

The repair process performs validation compaction and streams data from other nodes in the cluster. The bootstrap process only streams data from other nodes in the cluster. The cassandra.yaml file contains two parameters that affect the rate of compaction and the stream throughput on the network:

compaction_throughput_mb_per_sec (Default 16)

stream_throughput_outbound_megabits_per_sec (Default 200)

Depending on your environment, the default values may be too high or too low.
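Note that the two parameters use different units: megabytes per second for compaction and megabits per second for streaming. The sketch below shows one way to read the values from a configuration file and put them on the same scale. The function names are hypothetical, and the file path must be supplied by the caller because its location varies between installations.

```shell
#!/bin/sh
# Print the two throttle settings found in a cassandra.yaml file.
# Only uncommented lines are matched; a commented-out setting is not shown,
# and in that case the built-in default presumably applies.
show_throttle_settings() {
    yaml_file="$1"
    grep -E '^(compaction_throughput_mb_per_sec|stream_throughput_outbound_megabits_per_sec):' "$yaml_file"
}

# Convert megabits per second to megabytes per second (8 bits per byte),
# so the streaming limit can be compared with the compaction limit.
# Integer arithmetic: any fractional part is truncated.
megabits_to_megabytes() {
    echo $(( $1 / 8 ))
}

megabits_to_megabytes 200   # prints 25
```

With the defaults listed above, streaming is therefore capped at 25 MB/s outbound per node, while compaction is capped at 16 MB/s.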

Solution

Changing the compaction_throughput_mb_per_sec and stream_throughput_outbound_megabits_per_sec parameters in cassandra.yaml requires a restart of DSE for the change to be picked up. However, you can adjust both settings on the fly, without a restart, using these nodetool commands:

nodetool setcompactionthroughput

nodetool setstreamthroughput

Setting both of these parameters to 0 unthrottles compaction and streaming of data on the network, but be careful not to overload your nodes. Set the stream_throughput_outbound_megabits_per_sec parameter to the same value on all your nodes because, as the name states, it limits outbound traffic from the node. The rate at which a node receives data is therefore governed by the settings on the nodes sending to it.
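As an illustration, the sketch below applies the same pair of values to several nodes using nodetool's -h option. The host names and the values 32 and 400 are hypothetical placeholders, set_throttles is a made-up helper name, and the DRY_RUN switch is an assumption of this sketch that prints the commands rather than running them.

```shell
#!/bin/sh
# Apply new throttle values to one node without a restart.
# Usage: set_throttles <host> <compaction_mb_per_sec> <stream_megabits_per_sec>
# With DRY_RUN=1 the commands are printed instead of executed.
set_throttles() {
    host="$1"; compaction="$2"; stream="$3"
    for cmd in "setcompactionthroughput $compaction" "setstreamthroughput $stream"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "nodetool -h $host $cmd"
        else
            # $cmd is left unquoted on purpose so it splits into
            # the subcommand and its value.
            nodetool -h "$host" $cmd
        fi
    done
}

# Hypothetical host names; replace with your own nodes.
NODES="node1 node2 node3"

DRY_RUN=1   # remove this line to run the commands for real

# Apply the same values to every node so outbound streaming is limited uniformly.
for node in $NODES; do
    set_throttles "$node" 32 400
done
```

Passing 0 for both values would unthrottle the node, as described above.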

You can check the current values of these settings using these nodetool commands:

nodetool getcompactionthroughput

nodetool getstreamthroughput

To determine the right values for these parameters, you may want to make small adjustments and monitor the effect on your nodes before making further changes. Once the repair or bootstrap process is complete, you may want to revert these parameters to their default values.
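One possible shape for that adjust-and-revert cycle is sketched below. It assumes the defaults listed in this article (16 and 200); the step size of 8 is an arbitrary example, and run_nodetool, show_current and revert_to_defaults are hypothetical helper names. The DRY_RUN switch only prints the commands.

```shell
#!/bin/sh
# Defaults as listed in this article; check them against your own cassandra.yaml.
DEFAULT_COMPACTION_MB=16
DEFAULT_STREAM_MBIT=200

# Run nodetool, or just print the command when DRY_RUN=1.
run_nodetool() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "nodetool $*"
    else
        nodetool "$@"
    fi
}

# Show the throttles currently in effect on the local node.
show_current() {
    run_nodetool getcompactionthroughput
    run_nodetool getstreamthroughput
}

# Put both throttles back to the default values.
revert_to_defaults() {
    run_nodetool setcompactionthroughput "$DEFAULT_COMPACTION_MB"
    run_nodetool setstreamthroughput "$DEFAULT_STREAM_MBIT"
}

DRY_RUN=1   # remove this line to run the commands for real

show_current
# One small adjustment: raise compaction throughput by an example step of 8 MB/s.
run_nodetool setcompactionthroughput $(( DEFAULT_COMPACTION_MB + 8 ))
# ...monitor the node before making further changes, then, once the
# repair or bootstrap has finished:
revert_to_defaults
```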
