10/6/2021

Reading time: 3 min

Why You Shouldn't Run Nodetool Removenode

by John Doe

There comes a time when your Apache Cassandra cluster needs to be scaled down or maintained, for example after a significant drop in disk usage or in the number of reads and writes per second. Removing a node, whether it is working or not, falls under this scaling and maintenance, so it is necessary to have the tools to remove both dead and live nodes. For this, the nodetool utility provides three options: nodetool decommission, nodetool removenode, and nodetool assassinate.

Decommission streams data from the node being decommissioned. This guarantees consistency before, during, and after the operation.

nodetool decommission

Data owned by this node will be streamed to the other nodes
Cassandra process will still be running on this node but outside the cluster
This process can be killed or the instance can be shut down safely at this point
Don’t decommission a node without good reason
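
A minimal sketch of the decommission flow described above, assuming it is run on the node that is leaving (nodetool decommission acts on the node it connects to). The function name decommission_local_node is made up for illustration; nodetool decommission and nodetool netstats are the real commands.

```shell
# Hypothetical helper: decommission the node this shell is running on.
decommission_local_node() {
    # Blocks while this node streams the data it owns to the other nodes.
    nodetool decommission || return 1
    # netstats starts with a "Mode:" line; after a successful
    # decommission it should read DECOMMISSIONED.
    nodetool netstats | grep -i '^Mode:'
}
```

While the command is in progress, nodetool status run from another node should list the leaving node as UL (Up/Leaving).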

Removenode alters the token allocation and streams data from the remaining replicas, which opens the possibility of violating consistency and losing data. Removenode should only be used as a last resort where decommission cannot be used. Note that removenode triggers streaming between the other nodes in the cluster. On an already overloaded cluster, this places additional strain and can lead to further instability.

nodetool removenode <host_id>

Data owned by this node will be streamed to the new token ranges from live replicas within the cluster. Upon completion, the node will be removed.
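
Removenode identifies the node by its Host ID, which appears in the nodetool status output. The sketch below pulls the address and Host ID of every node reported as DN. The function name down_node_host_ids is hypothetical, and the parsing assumes the usual column layout of nodetool status, so try it on saved output first.

```shell
# Hypothetical helper: print "address host-id" for each node that
# `nodetool status` reports as DN. Reads the status output on stdin.
down_node_host_ids() {
    awk '$1 == "DN" {
        # Find the Host ID by its UUID shape, because the Load column
        # may be one field ("?") or two ("1.2 GiB").
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/)
                print $2, $i
    }'
}

# Usage, from a live node (last resort only):
#   nodetool status | down_node_host_ids
#   nodetool removenode <host_id>
#   nodetool removenode status    # progress of a running removal
```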

Assassinate should be viewed as an absolute last resort for forcing a node out of a cluster, and you should fully understand the implications before using it. There is a significant chance of data loss, especially when replication factors and consistency levels are not in line with best practices. This is because the assassinate command simply drops the node without streaming any data from other replicas or from the leaving node itself.

We have seen numerous cases of removenode and/or assassinate being used during an outage, with disastrous results. Removenode is a particularly bad idea in this situation because it places additional streaming load on the remaining UN (Up/Normal) nodes, compounding the capacity already lost to the down node. Also, when a cluster is unstable, as it is during an outage, you shouldn’t be making token changes, because changing token ownership can result in data loss or data unavailability on all nodes. We have been involved in too many incidents where a customer found themselves in this situation. Had they persisted with recovering the down node, they would have been in a much better position for recovery, and the overall outage time would have been greatly reduced.

What you should do during a node outage is fix the node or follow the replace procedure if you can’t fix the down node. Never use removenode to recover a node that has failed.
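
The guidance so far can be condensed into a small lookup. This is only a sketch of the article's recommendations, not a tool that inspects a cluster. The function name recommend_action and its two arguments (node state and intent) are invented for illustration.

```shell
# Hypothetical helper that encodes this article's guidance.
# Arguments: node state (up|down) and intent (recover|remove).
# assassinate is deliberately never suggested.
recommend_action() {
    case "$1:$2" in
        up:remove)    echo "nodetool decommission" ;;
        up:recover)   echo "no action needed" ;;
        down:recover) echo "fix the node, or follow the replace procedure" ;;
        down:remove)  echo "bring the node back and decommission it; removenode only as a last resort" ;;
        *)            echo "usage: recommend_action up|down recover|remove" >&2
                      return 2 ;;
    esac
}
```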

Downsizing

A valid use for removenode is for permanent removal of a node, i.e. downsizing.

However, this is still not recommended, and you should use nodetool decommission instead, as decommission is consistent and removenode isn’t. Decommission streams data from the leaving node, so any data held only by that node is passed on to other nodes, reducing the chance of data loss. Removenode streams from the other replicas but not from the leaving node, so it can miss data that resides only on the leaving node or, in the case of quorum writes, break quorum guarantees.
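
To make the quorum point concrete, here is a worked example under assumed numbers: replication factor 3, a QUORUM write acknowledged by the leaving node and one other replica, and the third replica having missed it. With decommission, the leaving node hands its copy to the new owner, so 2 of 3 replicas still hold the write. With removenode, in the worst case the new owner streams from the replica that missed the write, leaving 1 of 3. The function names below are illustrative only.

```shell
# Quorum size for a replication factor: floor(RF / 2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# A quorum read is guaranteed to see a write only when the replicas
# holding it and the read set must overlap: copies + quorum > RF.
# Arguments: replication factor, number of replicas holding the write.
read_sees_write() {
    [ $(( $2 + $(quorum "$1") )) -gt "$1" ]
}

# After decommission:             read_sees_write 3 2   (succeeds)
# After removenode, worst case:   read_sees_write 3 1   (fails)
```

In the failing case the data is not necessarily lost, but a QUORUM read is no longer guaranteed to return it until a repair runs.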

Replacing a Node

Start the new node in a way that tells Cassandra it is a replacement for the old one.

Note: Make sure that the original node is stopped (DN) before beginning the replace operation. If the replacement reuses an existing node, everything in the Cassandra data directory needs to be deleted, and auto_bootstrap needs to be set to true so the node will stream its data from the other nodes in the cluster.

Similar procedure to adding a node
Configure the new node similarly to the old node (cassandra.yaml, cassandra-rackdc.properties, etc.)
Use the same Cassandra version
Edit cassandra-env.sh or jvm.options as appropriate with:
- JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<old_ip_address>"
Once the node is UN, remove the JVM_OPTS line
Adjust the seeds as needed
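
Before starting the replacement, it helps to confirm that the old node really is DN. The sketch below checks saved or piped nodetool status output for that and prints the line to add to cassandra-env.sh. Both function names are hypothetical, and the check assumes the address is the second column of the status output.

```shell
# Hypothetical pre-flight check: succeeds only if the node at $1 is
# listed as DN. Reads `nodetool status` output on stdin.
old_node_is_down() {
    awk -v ip="$1" '$2 == ip { state = $1 }
        END { exit (state == "DN") ? 0 : 1 }'
}

# Print the line to add to cassandra-env.sh on the replacement node.
replace_opt_line() {
    printf 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=%s"\n' "$1"
}

# Usage, from a live node:
#   nodetool status | old_node_is_down <old_ip_address> && replace_opt_line <old_ip_address>
```

If the settings go in jvm.options instead, that file takes one bare JVM flag per line, so only the -Dcassandra.replace_address=<old_ip_address> part is added there.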

If you need to decommission:

nodetool decommission

Check the joining status from another node in the cluster: nodetool status

Check the streaming status from another node in the cluster: nodetool netstats

You should see some streaming at this point.
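
Rather than rerunning nodetool status by hand, the check can be put in a loop. This is a rough sketch meant to run on another node in the cluster. The function name wait_until_un and its default retry values are assumptions, and depending on the Cassandra version the replacement may not appear in the status output until it has finished joining.

```shell
# Hypothetical helper: poll `nodetool status` until the node at $1 is UN.
# Arguments: ip [max_attempts] [seconds_between_attempts]
wait_until_un() {
    ip=$1
    attempts=${2:-60}
    pause=${3:-30}
    while [ "$attempts" -gt 0 ]; do
        state=$(nodetool status | awk -v ip="$ip" '$2 == ip { print $1 }')
        if [ "$state" = "UN" ]; then
            return 0
        fi
        echo "node $ip is ${state:-not listed yet}; waiting" >&2
        attempts=$((attempts - 1))
        sleep "$pause"
    done
    return 1
}
```

Once the function returns successfully, the JVM_OPTS line can be removed from the replacement node.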

Have more questions on configuring node types or other best practices for Apache Cassandra? Reach out to schedule a consultation with one of our experts.
