In Cassandra Lunch #17, we discuss tombstones in Cassandra. Tombstones are a special kind of write that signifies deleted values, stops them from being returned on reads, and eventually allows them to be deleted during compaction. We discuss what tombstones are and why they are used, as well as how they work in practice.
What and Why
A tombstone is a marker written to a Cassandra cluster in place of the data being deleted. Tombstones carry deletion_info, which records when the data was deleted. Cassandra's SSTables are immutable, so a delete cannot remove data in place. Instead, Cassandra writes a tombstone through the same high-speed path it uses for every other write. This keeps deletes as cheap as ordinary writes, and it lets the design scale to a large number of machines: each replica records the delete as a write rather than coordinating an immediate removal. Tombstones do have costs. They take up disk space until compaction purges them, and they slow down reads. A query must read every tombstone that falls within the requested range, from the memtable and from each relevant SSTable, and merge those tombstones with the live data before it can return results.
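The read-side cost described above can be illustrated with a toy model. This sketch is not Cassandra's implementation. The Cell class and read_cell function are hypothetical names, and the model assumes a single cell whose versions are spread across a memtable and several SSTables. It applies last-write-wins by timestamp, with a tombstone winning a timestamp tie. A delete therefore hides older data, but every version, tombstones included, still has to be read before the answer is known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One version of a single cell (hypothetical toy type, not Cassandra's)."""
    value: Optional[str]   # None marks a tombstone
    timestamp: int         # write timestamp; Cassandra uses microseconds

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def read_cell(sources: List[List[Cell]]) -> Optional[str]:
    """Merge every version from a memtable and SSTables, then resolve it.

    All versions, tombstones included, are collected before the winner is
    known. This is the extra read work that piles of tombstones create.
    """
    versions = [cell for source in sources for cell in source]
    if not versions:
        return None
    # Last write wins; on a timestamp tie the tombstone wins (True > False).
    winner = max(versions, key=lambda c: (c.timestamp, c.is_tombstone))
    return None if winner.is_tombstone else winner.value


memtable: List[Cell] = []
old_sstable = [Cell("alice@example.com", timestamp=100)]
new_sstable = [Cell(None, timestamp=200)]  # the DELETE, stored as a tombstone
print(read_cell([memtable, old_sstable, new_sstable]))  # None
```

The older value still exists on disk in old_sstable. The tombstone only shadows it until a compaction can purge both.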
Tombstones are created when data in a Cassandra cluster is deleted. Depending on what the delete targets, it produces a partition, row, range, or cell tombstone. Tombstones are also created when a null value is inserted or updated into a column, and when data written with a TTL (Time to Live) expires. A TTL can be set on an individual INSERT or UPDATE, or as a table's default_time_to_live.
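The two less obvious sources of tombstones, null writes and TTL expiry, can be modeled in a few lines. This is a simplified sketch with assumed names. StoredCell, write_row, and count_tombstones are hypothetical and do not mirror Cassandra's storage format. The sketch treats a cell written as null as an immediate tombstone, and a cell written with a TTL as a tombstone once write time plus TTL has passed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StoredCell:
    """Hypothetical toy cell: a value, when it was written, and an optional TTL."""
    value: Optional[str]       # None: written as a tombstone (DELETE or null)
    write_time: int            # seconds since epoch, simplified
    ttl: Optional[int] = None  # seconds; None means the cell never expires

    def is_live(self, now: int) -> bool:
        if self.value is None:
            return False  # explicit tombstone
        if self.ttl is not None and now >= self.write_time + self.ttl:
            return False  # TTL expired: the cell now acts as a tombstone
        return True


def write_row(row: Dict[str, Optional[str]], now: int,
              ttl: Optional[int] = None) -> Dict[str, StoredCell]:
    """Model an INSERT/UPDATE: every bound column becomes a cell, and a
    column bound to None becomes a cell tombstone."""
    return {column: StoredCell(value, now, ttl) for column, value in row.items()}


def count_tombstones(cells: Dict[str, StoredCell], now: int) -> int:
    return sum(1 for cell in cells.values() if not cell.is_live(now))


cells = write_row({"email": "a@example.com", "phone": None}, now=1_000, ttl=60)
print(count_tombstones(cells, now=1_000))  # 1: the null "phone" column
print(count_tombstones(cells, now=1_060))  # 2: "email" has expired as well
```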
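Tombstones are cleaned up automatically during compaction. At the cluster level, the settings that control how much compaction work can run are the number of compaction executors (concurrent_compactors) and the compaction throughput limit (compaction_throughput_mb_per_sec in cassandra.yaml). At the table level, the key setting is gc_grace_seconds, which defaults to 864000 seconds (10 days). It does not trigger compaction. Instead, it is the minimum time a tombstone must be kept before a compaction is allowed to purge it, which gives repair a chance to propagate the delete to every replica.

Several compaction subproperties then control single-SSTable tombstone compactions:

- tombstone_threshold is the ratio of droppable tombstones (those older than gc_grace_seconds) to all data in an SSTable. Above this ratio, the SSTable becomes a candidate for a tombstone compaction. The default is 0.2, or 20%.
- tombstone_compaction_interval is the minimum age of an SSTable, in seconds (default 86400, one day), before it is considered for a tombstone compaction.
- unchecked_tombstone_compaction (default false) makes these compactions more aggressive. It skips Cassandra's check of whether the compaction is actually likely to drop the tombstones, which may not be the case if the data they shadow lives in other, overlapping SSTables.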
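To make the interaction between these settings concrete, here is a simplified sketch of how a single SSTable might be judged as a tombstone-compaction candidate. It is an assumption-laden model, not Cassandra's code:

- SSTableInfo, droppable_ratio, and is_tombstone_compaction_candidate are hypothetical names.
- The ratio is computed exactly, whereas Cassandra estimates it from SSTable metadata.
- The overlap check that unchecked_tombstone_compaction skips is reduced to a single boolean.

```python
from dataclasses import dataclass
from typing import List

# Defaults of the corresponding Cassandra table and compaction options.
DEFAULT_GC_GRACE_SECONDS = 864_000              # 10 days
DEFAULT_TOMBSTONE_THRESHOLD = 0.2               # 20% droppable tombstones
DEFAULT_TOMBSTONE_COMPACTION_INTERVAL = 86_400  # 1 day


@dataclass
class SSTableInfo:
    """Hypothetical summary of one SSTable (not a real Cassandra type)."""
    created_at: int                       # seconds since epoch
    cell_count: int                       # all cells, live data plus tombstones
    tombstone_deletion_times: List[int]   # local deletion time of each tombstone


def droppable_ratio(sst: SSTableInfo, now: int,
                    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> float:
    """Fraction of cells that are tombstones deleted more than gc_grace_seconds ago."""
    gc_before = now - gc_grace_seconds
    droppable = sum(1 for t in sst.tombstone_deletion_times if t < gc_before)
    return droppable / sst.cell_count if sst.cell_count else 0.0


def is_tombstone_compaction_candidate(
        sst: SSTableInfo,
        now: int,
        overlaps_other_sstables: bool,
        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
        tombstone_threshold: float = DEFAULT_TOMBSTONE_THRESHOLD,
        tombstone_compaction_interval: int = DEFAULT_TOMBSTONE_COMPACTION_INTERVAL,
        unchecked_tombstone_compaction: bool = False) -> bool:
    # Too young: tombstone_compaction_interval has not elapsed yet.
    if now - sst.created_at < tombstone_compaction_interval:
        return False
    # Not enough purgeable tombstones to be worth rewriting the SSTable.
    if droppable_ratio(sst, now, gc_grace_seconds) <= tombstone_threshold:
        return False
    # unchecked_tombstone_compaction skips the "will this actually help?" check.
    if unchecked_tombstone_compaction:
        return True
    # Simplified stand-in for that check: if other SSTables overlap, the
    # shadowed data may live elsewhere, so assume the tombstones cannot be dropped.
    return not overlaps_other_sstables


now = 2_000_000
sst = SSTableInfo(created_at=0, cell_count=10,
                  tombstone_deletion_times=[100, 200, 300, 1_990_000])
print(droppable_ratio(sst, now))  # 0.3
print(is_tombstone_compaction_candidate(sst, now, overlaps_other_sstables=True))  # False
print(is_tombstone_compaction_candidate(sst, now, overlaps_other_sstables=True,
                                        unchecked_tombstone_compaction=True))  # True
```

In this hypothetical SSTable, the newest tombstone is still inside gc_grace_seconds. It therefore does not count toward the ratio and could not be purged yet, even if the SSTable were compacted.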
We can trigger compaction manually with nodetool compact, nodetool garbagecollect, or nodetool scrub. In general, it is better to configure a cluster, and the applications that write to it, so that tombstones do not accumulate in the first place. A common cause of accumulation is an automated process, such as an ingestion job, that writes null values for empty columns and so creates a tombstone for each one. When a backlog of tombstones has already built up, manual compaction is sometimes the best way to clear it.
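One way to keep an automated writer from generating null tombstones is to leave empty columns out of the statement entirely. The sketch below assumes a client that binds parameters to a prepared CQL statement using ? markers. build_insert and the users table are hypothetical names, and the function only builds the query string and parameter list rather than talking to a cluster.

```python
from typing import Any, Dict, List, Tuple


def build_insert(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a prepared-statement INSERT that leaves out None-valued columns.

    Binding None would write a tombstone for that column; omitting the
    column writes nothing for it.
    """
    present = {column: value for column, value in row.items() if value is not None}
    if not present:
        raise ValueError("row has no non-null columns to write")
    columns = ", ".join(present)
    markers = ", ".join("?" for _ in present)
    query = f"INSERT INTO {table} ({columns}) VALUES ({markers})"
    return query, list(present.values())


query, params = build_insert("users", {"id": 42, "email": "a@example.com", "phone": None})
print(query)   # INSERT INTO users (id, email) VALUES (?, ?)
print(params)  # [42, 'a@example.com']
```

Omitting a column is not the same as clearing it: an omitted column keeps any value it already had. This pattern therefore suits writing fresh rows. When a value really must be removed, a tombstone is the correct outcome.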
To monitor tombstones and compaction, we can use nodetool tablestats, which reports per-table statistics for a single node, including tombstones per slice. Anant's Cassandra Toolkit's TableAnalyzer can aggregate these values across all the nodes in a cluster. We can also look in the logs for tombstone warnings. Cassandra emits these when a single read scans more tombstones than tombstone_warn_threshold (1000 by default), and it aborts a read that exceeds tombstone_failure_threshold (100000 by default). The Cassandra Toolkit also has tools for this, under NodeAnalyzer and log-analysis.
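A small helper can pull the tombstones-per-slice figures out of nodetool tablestats output from one node and flag tables whose maximum exceeds a cutoff. This is a sketch under assumptions:

- tombstone_stats and tables_over_threshold are hypothetical helpers, not part of nodetool or Anant's Cassandra Toolkit.
- The labels it matches ("Keyspace", "Table", and the "Average/Maximum tombstones per slice" lines) follow the usual tablestats layout but may differ between Cassandra versions.
- The default cutoff of 1000 simply mirrors the default tombstone_warn_threshold.

```python
from typing import Dict, List


def tombstone_stats(tablestats_output: str) -> Dict[str, Dict[str, float]]:
    """Collect per-table tombstones-per-slice figures from tablestats text."""
    stats: Dict[str, Dict[str, float]] = {}
    keyspace = None
    table = None
    for raw_line in tablestats_output.splitlines():
        label, sep, value = raw_line.strip().partition(":")
        if not sep:
            continue  # not a "label: value" line
        label, value = label.strip(), value.strip()
        if label == "Keyspace":
            keyspace, table = value, None
        elif label == "Table":
            table = f"{keyspace}.{value}"
            stats[table] = {}
        elif table is not None and label.startswith("Average tombstones per slice"):
            stats[table]["avg"] = float(value)  # "NaN" parses to float('nan')
        elif table is not None and label.startswith("Maximum tombstones per slice"):
            stats[table]["max"] = float(value)
    return stats


def tables_over_threshold(stats: Dict[str, Dict[str, float]],
                          threshold: float = 1000) -> List[str]:
    """Names of tables whose maximum tombstones per slice exceed threshold."""
    return sorted(name for name, s in stats.items() if s.get("max", 0.0) > threshold)


# Made-up excerpt for illustration only, not real cluster output.
sample = """Keyspace : shop
\t\tTable: orders
\t\tAverage tombstones per slice (last five minutes): 120.5
\t\tMaximum tombstones per slice (last five minutes): 2500
\t\tTable: users
\t\tAverage tombstones per slice (last five minutes): NaN
\t\tMaximum tombstones per slice (last five minutes): 0
"""
print(tables_over_threshold(tombstone_stats(sample)))  # ['shop.orders']
```

On a real node, the input would be the text printed by nodetool tablestats for a keyspace. Running that on every node and combining the results is the kind of cluster-wide aggregation that TableAnalyzer provides.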
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!