My co-worker and I have been thrown into a project that uses Cassandra with no introductions. Alright, let's do this!
SELECT * FROM reports WHERE timestamp < '2019-01-01 00:00:00' ALLOW FILTERING;
Apparently, we have too many tombstones. What's that?
A tombstone is deleted data that hasn't been removed yet for performance reasons.
Tombstones should be deleted with
nodetool repair before the
gc_grace_period has expired, default is 10 days.
Now, this project is around 7 years old and it doesn't seem like there's a job that runs
According to default warning and error values, 1K tombstones are a lot. We find about 1.4M.
We measured the number of tombstones with
Tracing on, running a SELECT query, and accumulating the tombstones reported.
We tried to run
nodetool repair --full -pr -j 4 but we get
Validation failed in /10.0.3.1.
DataStax's guide to repairing repairs wants us to fix the validation error with
But we still get the same error afterwards.
The guide then wants us to run
sstablescrub, which failed with an out-of-memory exception.
Going back to our original problem of deleting data before 2019, we tried to run
DELETE FROM reports WHERE timestamp < '2019-01-01 00:00:00'.
timestamp is not our partition key so we are not allowed to delete data like this, which has also been confirmed by many other StackOverflow posts and an DataStax issue on Jira.
Every post mentions that we should "just" change the schema of our Cassandra database to fit our queries.
First, we only need to do this once; second, our client wants to have this data deleted as soon as possible.
- Is there a way of easily changing the schema of a Cassandra database?
- Is there a way that we can make a slow solution that at least works?
All in all, we are new to Cassandra and we are unsure on how to proceed.
What we want is
- delete all data from before 2019 and confirm that it is deleted
- have stable selects, avoiding error 1300
Can you help?
We have 4 nodes running in Docker on Azure if that is necessary to know. The version of Cassandra is 3.11.6.