WitFoo Precinct persists and replicates data on Apache Cassandra, a big-data NoSQL platform. Precinct 6.1.3 is built on Cassandra 3.11. In preparation for the upgrade to Cassandra 4.0, we conducted the following lab and production testing.

Lab Appliances

WitFoo Precinct clusters consisting of 1 Management, 1 Streamer and 3 Data nodes were deployed in AWS using the official Marketplace images. The instances used AWS GP2 SSD drives (the recommended default) and ran on c5d.2xlarge hardware (16 GB RAM, 8 CPU cores).

The code running on each deployment was identical except for the Cassandra version. The Cassandra 3.11 (C3) cluster ran on the same AWS instance types as the Cassandra 4.0 (C4) cluster. Schema, replication strategies and other key settings were also identical. The replication factor was set to 3 in both clusters. The Cassandra heap was set to the following on all nodes: -Xms3866M -Xmx3866M.

Test Data

Both clusters were configured to process, store, and replicate the same data. TTL on inserts was set to 8640 seconds. Data was inserted at a rate of 3 million rows per hour. 1,000 rows were inserted per partition and average partition size was 16MB.
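The figures above imply a steady-state data volume that can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes decimal megabytes, ignores compression, and ignores expired data that has not yet been purged, so real on-disk numbers will differ. The function and key names are illustrative.

```python
# Rough workload sizing derived from the test parameters quoted above.
ROWS_PER_HOUR = 3_000_000
ROWS_PER_PARTITION = 1_000
AVG_PARTITION_MB = 16        # assumption: decimal megabytes, uncompressed
TTL_SECONDS = 8_640
REPLICATION_FACTOR = 3


def workload_summary():
    """Return approximate ingest and steady-state figures as a dict."""
    partitions_per_hour = ROWS_PER_HOUR // ROWS_PER_PARTITION
    avg_row_kb = AVG_PARTITION_MB * 1_000 / ROWS_PER_PARTITION
    ingest_mb_per_hour = partitions_per_hour * AVG_PARTITION_MB
    # Rows still alive once the TTL window is full (integer arithmetic).
    live_rows = ROWS_PER_HOUR * TTL_SECONDS // 3_600
    live_partitions = live_rows // ROWS_PER_PARTITION
    live_mb_one_copy = live_partitions * AVG_PARTITION_MB
    return {
        "partitions_per_hour": partitions_per_hour,
        "avg_row_kb": avg_row_kb,
        "ingest_mb_per_hour": ingest_mb_per_hour,
        "live_rows": live_rows,
        "live_partitions": live_partitions,
        "live_mb_one_copy": live_mb_one_copy,
        "live_mb_all_replicas": live_mb_one_copy * REPLICATION_FACTOR,
    }


for name, value in workload_summary().items():
    print(f"{name}: {value}")
```

Under these assumptions, each row averages about 16 KB and roughly 7.2 million rows are live at any time once the TTL window has filled.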

Each record is inserted as JSON. For more details on how we store and process data, see: Our Move from Elastic to Cassandra. In this test, each cluster had a separate Streamer node reading from AWS and individually fingerprinting, parsing, and normalizing the data through NLP semantic framing.
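As an illustration of a JSON insert with a TTL, the sketch below builds a CQL INSERT ... JSON statement and its parameters. It is not Precinct's actual code: the record fields are hypothetical, the helper name is made up, and the %s placeholder style is an assumption about how the statement would be handed to a client driver.

```python
import json


def build_json_insert(keyspace, table, record, ttl_seconds):
    """Return (cql, params) for a CQL INSERT ... JSON statement with a TTL.

    The %s placeholders are assumed to be filled in by the client driver.
    """
    cql = f"INSERT INTO {keyspace}.{table} JSON %s USING TTL %s"
    payload = json.dumps(record, sort_keys=True)
    return cql, (payload, ttl_seconds)


# Hypothetical record: the real artifacts schema is not described here.
example_record = {
    "id": "00000000-0000-0000-0000-000000000001",
    "message": "example log line",
}

cql, params = build_json_insert("artifacts", "artifacts", example_record, 8640)
print(cql)
print(params)
```

Passing the TTL as a bound parameter rather than formatting it into the string keeps the statement text constant across inserts.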

Performance Results

The following results were measured in the lab test environment.

Table histograms

Results from tablehistograms artifacts.artifacts are as follows:

| tablehistograms artifacts.artifacts | Cassandra 3.11 (microseconds) | Cassandra 4.0beta (microseconds) |
| --- | --- | --- |
| Read Latency 50P | 2,816.16 | 2,816.16 |
| Read Latency 75P | 4,055.27 | 4,055.27 |
| Read Latency 95P | 8,409.01 | 5,839.59 |
| Read Latency 98P | 17,436.92 | 5,839.59 |
| Read Latency 99P | 17,436.92 | 5,839.59 |
| Write Latency 50P | 9.89 | 9.89 |
| Write Latency 75P | 11.86 | 11.86 |
| Write Latency 95P | 17.08 | 17.09 |
| Write Latency 98P | 24.60 | 24.60 |
| Write Latency 99P | 29.52 | 29.52 |
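To make the percentile comparison easier to read, the relative change between the two versions can be computed directly from the table above. This is a small sketch using the published values only; the dictionary layout and function name are illustrative.

```python
# Latencies in microseconds, copied from the table: (Cassandra 3.11, Cassandra 4.0beta).
READ_LATENCY = {
    "50P": (2816.16, 2816.16),
    "75P": (4055.27, 4055.27),
    "95P": (8409.01, 5839.59),
    "98P": (17436.92, 5839.59),
    "99P": (17436.92, 5839.59),
}
WRITE_LATENCY = {
    "50P": (9.89, 9.89),
    "75P": (11.86, 11.86),
    "95P": (17.08, 17.09),
    "98P": (24.60, 24.60),
    "99P": (29.52, 29.52),
}


def percent_change(before, after):
    """Signed percentage change from before to after (negative means faster)."""
    return (after - before) / before * 100.0


for label, table in (("read", READ_LATENCY), ("write", WRITE_LATENCY)):
    for percentile, (c3, c4) in table.items():
        print(f"{label} {percentile}: {percent_change(c3, c4):+.2f}%")
```

On these numbers, read latency is unchanged through the 75th percentile, about 31% lower at the 95th, and about 67% lower at the 98th and 99th, while write latency is effectively identical.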

Repairs

Times for running nodetool repair are as follows:

| Time after inserts started | Cassandra 3.11 | Cassandra 4.0beta |
| --- | --- | --- |
| 30 minutes | 32 seconds | 8 seconds |
| 90 minutes | 71 seconds | 15 seconds |
| 6 hours | 95 seconds | 34 seconds |
| 11 hours | 77 seconds | 34 seconds |
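The post does not say how these durations were captured. One simple way to take wall-clock timings of a maintenance command is sketched below; the helper name is hypothetical, and it assumes the command (for example nodetool) is on the PATH of the node where it runs.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # report the exit code instead of raising
    )
    return time.perf_counter() - start, completed.returncode
```

On a data node, time_command(["nodetool", "repair"]) would return the elapsed seconds for a repair, and the same helper applies to nodetool compact and nodetool garbagecollect.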

Compaction

Times for running nodetool compact are as follows:

| Time after inserts started | Cassandra 3.11 | Cassandra 4.0beta |
| --- | --- | --- |
| 30 minutes | 28 seconds | 15 seconds |
| 90 minutes | 61 seconds | 25 seconds |
| 6 hours | 80 seconds | 41 seconds |
| 11 hours | 79 seconds | 38 seconds |

Garbage Collection

Times for running nodetool garbagecollect are as follows:

| Time after inserts started | Cassandra 3.11 | Cassandra 4.0beta |
| --- | --- | --- |
| 30 minutes | 29 seconds | 19 seconds |
| 90 minutes | 62 seconds | 32 seconds |
| 6 hours | 85 seconds | 45 seconds |
| 11 hours | 86 seconds | 57 seconds |

Performance Observations

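In the lab, Cassandra 4.0 matched Cassandra 3.11 on median read and write latency and was much more stable at the higher read percentiles (5,839.59 versus 17,436.92 microseconds at the 99th percentile). The largest gains were in maintenance operations: across the measured runs, repair, compaction and garbage collection completed about 1.5 to 4.7 times faster on Cassandra 4.0.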
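The size of the maintenance improvement can be quantified from the repair, compaction and garbage collection timings reported earlier. The sketch below computes the speedup (Cassandra 3.11 time divided by Cassandra 4.0 time) for each run; the data structure and function names are illustrative, and the values are the published seconds.

```python
# Seconds per run: (time after inserts started, Cassandra 3.11, Cassandra 4.0beta).
TIMINGS = {
    "repair": [("30 minutes", 32, 8), ("90 minutes", 71, 15),
               ("6 hours", 95, 34), ("11 hours", 77, 34)],
    "compact": [("30 minutes", 28, 15), ("90 minutes", 61, 25),
                ("6 hours", 80, 41), ("11 hours", 79, 38)],
    "garbagecollect": [("30 minutes", 29, 19), ("90 minutes", 62, 32),
                       ("6 hours", 85, 45), ("11 hours", 86, 57)],
}


def speedups(rows):
    """Return the C3/C4 duration ratio for each run (above 1 means C4 is faster)."""
    return [c3 / c4 for _, c3, c4 in rows]


def speedup_range(rows):
    """Return (min, max) speedup across the runs of one operation."""
    ratios = speedups(rows)
    return min(ratios), max(ratios)


for operation, rows in TIMINGS.items():
    low, high = speedup_range(rows)
    print(f"{operation}: {low:.2f}x to {high:.2f}x faster on Cassandra 4.0beta")
```

Repair shows the widest spread, with the largest ratios in the early runs when the clusters held the least data.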

Production Testing

In addition to lab testing, we have deployed Cassandra 4.0 to 48 data nodes across 15 individual clusters. Using the approaches outlined in Metric Driven Development, we observed similar success across all clusters. Tested clusters ran on a wide array of disk configurations, from slow magnetic spindle disks to extremely fast SSD arrays. Cluster sizes ranged from 1 to 7 nodes, with data retention of up to 12TB (compressed). Replication across geographies also improved in production. Memory, IOPS and CPU utilization saw mild improvements over Cassandra 3.11 values.

Summary

The performance and stability improvements in Cassandra 4.0 are a stride forward in big-data efforts. We intend to include Cassandra 4.0 in the upcoming 6.1.4 release to deliver reliable function at a lower resource cost to our customers. Great work by the entire Cassandra community in taking big data to the next level.