There comes a time when your Apache Cassandra cluster will require scaling down or general maintenance (for example, after a significant drop in disk usage or in the number of reads and writes per second). Removing a node, whether it is working or not, falls under this scaling and maintenance, so you need the tools to remove either a dead or a live node. For these cases, the nodetool utility provides the following options: nodetool decommission, nodetool removenode, and nodetool assassinate.
Decommission streams data from the node being decommissioned to the remaining nodes. This guarantees consistency before, during, and after the operation.
nodetool decommission
- Data owned by this node will be streamed to the other nodes
- Cassandra process will still be running on this node but outside the cluster
- This process can be killed or the instance can be shut down safely at this point
- Don’t decommission a node without good reason
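The sequence above can be sketched as follows. This is a minimal sketch, assuming nodetool is on the PATH of the node being removed; the helper name decommission_finished is hypothetical and only checks the Mode line that nodetool netstats prints.

```shell
# Hypothetical helper: reads `nodetool netstats` output on stdin and succeeds
# only when the node reports that decommissioning has finished.
decommission_finished() {
    grep -q '^Mode: DECOMMISSIONED'
}

# Typical use, run on the node that is leaving the cluster:
#   nodetool decommission      # returns once data has streamed to other nodes
#   nodetool netstats | decommission_finished && echo "safe to stop Cassandra"
```

Checking the mode before killing the process is a cheap guard against stopping a node that is still streaming.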
Removenode alters the token allocation and streams data from the remaining replicas, which opens up the possibility of violating consistency and losing data. Removenode should only be used as a last resort where decommission cannot be used. Note that removenode will trigger streaming to other nodes in the cluster. On an already overloaded cluster, this places additional strain on the nodes and can lead to further instability.
nodetool removenode <host_id>
- Data owned by this node will be streamed to the new token ranges from live replicas within the cluster. Upon completion, the node will be removed.
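Removenode is run from a live node and identifies the dead node by its Host ID, which appears in the nodetool status output. The sketch below shows one way to pull that ID out. The helper name down_host_ids is hypothetical, and it assumes the usual nodetool status layout in which each node line starts with a two-letter status/state code and contains the Host ID as a UUID.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints the
# Host ID (a UUID) of every node whose status/state code is DN (Down/Normal).
down_host_ids() {
    grep -E '^DN[[:space:]]' |
        grep -Eo '[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}'
}

# Typical use, run from a live node:
#   nodetool status | down_host_ids   # find the Host ID of the dead node
#   nodetool removenode <host_id>     # start the removal
#   nodetool removenode status        # check progress
```

Matching the UUID directly avoids counting columns, since the Load column can be one or two words wide depending on the node's state.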
Assassinate should be viewed as an absolute last resort when forcing a node out of a cluster, and you should fully understand the implications before using it. There is a significant chance of data loss in this case, especially when replication factors and consistency levels are not in line with best practices. This is because the assassinate command simply drops the node without streaming any data from other replicas or from the leaving node itself.
We have seen numerous cases of removenode and/or assassinate being used during an outage with disastrous results. Removenode is a particularly bad idea in this situation because it places additional streaming load on the nodes that are still Up/Normal (UN), compounding the loss of capacity caused by the down node. Also, when a cluster is unstable, as it is during an outage, you shouldn’t be making token changes, because changing token ownership can result in data loss or data unavailability on all nodes. We have been involved in too many incidents where a customer has found themselves in this situation. Had they persisted with recovering the down node, they would have been in a much better position for recovery and the overall outage time would have been greatly reduced.
What you should do during a node outage is fix the node or follow the replace procedure if you can’t fix the down node. Never use removenode to recover a node that has failed.
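One way to apply this advice is to check that every node is Up/Normal before starting any planned operation that changes token ownership. The sketch below is only an illustration. The helper name all_nodes_un is hypothetical, and it assumes node lines in nodetool status start with a two-letter code such as UN, UJ, or DN.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin, prints every
# node line that is not UN (Up/Normal), and succeeds only if at least one node
# line was seen and all of them were UN.
all_nodes_un() {
    awk '
        $1 ~ /^[UD][NLJM]$/ {
            seen++
            if ($1 != "UN") { print; bad++ }
        }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Typical use before a planned decommission:
#   nodetool status | all_nodes_un || echo "cluster not stable, fix nodes first"
```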
Downsizing
A valid use for removenode is for permanent removal of a node, i.e. downsizing.
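However, this is still not recommended, and you should use nodetool decommission instead, as decommission is consistent and removenode isn’t. Decommission streams data from the leaving node, so any data that exists only on that node is handed to other nodes, reducing the chance of data loss. Removenode streams from any remaining replica but not from the leaving node, so it can miss data that resides only on the leaving node or, in the case of quorum writes, break quorum guarantees. For example, with a replication factor of 3, a QUORUM write may have reached only the leaving node and one other replica; if the new owner of that range streams from the third replica, only one copy of the write remains and a later QUORUM read can miss it.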
Replacing a Node
- Start a new node, indicating to Cassandra that it is a replacement
Note: Make sure that the original node is stopped and shows as down (DN) before beginning the replace operation. If you are reusing an existing node as the replacement, delete everything in its Cassandra data directories first, and set auto_bootstrap to true so the node streams its data from the other nodes in the cluster.
- Similar procedure to adding a node
- Configure the new node similarly to the old node (cassandra.yaml, rackdc etc.)
- Use the same Cassandra version
- Edit cassandra-env.sh or jvm.options as appropriate with:
- JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<old_ip_address>" (in jvm.options, add only the -Dcassandra.replace_address=<old_ip_address> flag on its own line)
- Once the node is UN, remove the JVM_OPTS line
- Adjust the seeds as needed
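The replace flag from the steps above can be generated rather than typed by hand, which helps avoid quoting mistakes. This is a minimal sketch: the helper name replace_opts_line is hypothetical, and the configuration path in the usage comment is an assumption that varies by install.

```shell
# Hypothetical helper: prints the line to append to cassandra-env.sh on a
# replacement node. The argument is the IP address of the dead node.
replace_opts_line() {
    printf 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=%s"\n' "$1"
}

# Typical use on the new node, before its first start:
#   replace_opts_line 10.0.0.2 >> /etc/cassandra/cassandra-env.sh
#   ...start Cassandra, wait for the node to show as UN, then delete the line.
```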
If you need to decommission:
nodetool decommission
Check the joining status from another node in the cluster: nodetool status
Check the streaming status from another node in the cluster: nodetool netstats
You should see some streaming at this point.
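The status check can also be scripted so that you are not re-running it by hand. The sketch below assumes the usual nodetool status layout, with the status/state code in the first column and the address in the second. The helper name node_state and the example address are hypothetical.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints the
# two-letter status/state code (UN, UJ, DN, ...) for the given address.
node_state() {
    awk -v ip="$1" '$2 == ip && $1 ~ /^[UD][NLJM]$/ { print $1 }'
}

# Typical use from another node, polling until the new node has joined:
#   until [ "$(nodetool status | node_state 10.0.0.9)" = "UN" ]; do
#       sleep 30
#   done
```

A joining node normally shows as UJ while it streams and switches to UN once it has finished.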