Fixing Cassandra Database


Author: Little Helper

Originally Sourced from: https://stackoverflow.com/questions/64136796/fixing-cassandra-database

My co-worker and I have been thrown into a project that uses Cassandra with no introductions. Alright, let's do this!

SELECT * FROM reports WHERE timestamp < '2019-01-01 00:00:00' ALLOW FILTERING;

Error: 1300

Apparently, we have too many tombstones. What's that? A tombstone is deleted data that hasn't been removed yet for performance reasons. Tombstones should be deleted with nodetool repair before the gc_grace_period has expired, default is 10 days. Now, this project is around 7 years old and it doesn't seem like there's a job that runs repair. According to default warning and error values, 1K tombstones are a lot. We find about 1.4M. We measured the number of tombstones with Tracing on, running a SELECT query, and accumulating the tombstones reported.

We tried to run nodetool repair --full -pr -j 4 but we get Validation failed in /10.0.3.1. DataStax's guide to repairing repairs wants us to fix the validation error with nodetool scrub. But we still get the same error afterwards. The guide then wants us to run sstablescrub, which failed with an out-of-memory exception.

Going back to our original problem of deleting data before 2019, we tried to run DELETE FROM reports WHERE timestamp < '2019-01-01 00:00:00'. However, timestamp is not our partition key so we are not allowed to delete data like this, which has also been confirmed by many other StackOverflow posts and an DataStax issue on Jira. Every post mentions that we should "just" change the schema of our Cassandra database to fit our queries. First, we only need to do this once; second, our client wants to have this data deleted as soon as possible.

  1. Is there a way of easily changing the schema of a Cassandra database?
  2. Is there a way that we can make a slow solution that at least works?

All in all, we are new to Cassandra and we are unsure on how to proceed.

What we want is

  1. delete all data from before 2019 and confirm that it is deleted
  2. have stable selects, avoiding error 1300

Can you help?

We have 4 nodes running in Docker on Azure if that is necessary to know. The version of Cassandra is 3.11.6.