How to handle Cassandra tombstones under a continuous, high-pressure update workload?

Author: Adam Z

Originally Sourced from:

The product is required to handle 500k TPS of updates and deletions, which generate a lot of tombstones. In our performance test, we found that Cassandra performance dropped dramatically after 15 hours, and read/write latencies were rather high. We then used a script that runs nodetool compact continuously to clean up the tombstones, and with it the system could get through the long-running test.

But according to the documentation, nodetool repair must be run within gc_grace_seconds, which is 10 days by default. Even if we shorten it to 2 days, a huge number of tombstones will accumulate within those 2 days and degrade performance. If we keep running nodetool compact, it may cause data integrity problems. How can I handle this situation? Thanks in advance.
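
To put a rough number on the concern above, here is a back-of-envelope sketch of how many tombstones could pile up inside one gc_grace_seconds window. It assumes the worst case where every one of the 500k operations per second leaves a tombstone, and it ignores replication factor and any compaction that happens in between. The function name and the tombstone_fraction parameter are hypothetical, introduced only for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_tombstones(ops_per_second, gc_grace_seconds, tombstone_fraction=1.0):
    """Upper bound on tombstones written during one gc_grace_seconds window.

    tombstone_fraction is an assumed share of operations that produce a
    tombstone (1.0 = every operation, the worst case).
    """
    return int(ops_per_second * gc_grace_seconds * tombstone_fraction)


default_window = 10 * SECONDS_PER_DAY    # 864000 s, the default gc_grace_seconds
shortened_window = 2 * SECONDS_PER_DAY   # 172800 s, the shortened setting

print(max_tombstones(500_000, default_window))    # 432000000000
print(max_tombstones(500_000, shortened_window))  # 86400000000
```

Even with the 2-day setting, the worst-case estimate is still on the order of tens of billions of tombstones per window, which is why shortening gc_grace_seconds alone does not look sufficient.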