By Deepak Vohra - March 16, 2020

Apache Cassandra is an open-source, distributed NoSQL database based on the wide-column model. Highly scalable and highly available, it is well suited to handling large amounts of data across many commodity servers.

There is no set release date yet for the next version, Cassandra 4.0, but we do already know about several new features.

Support for Java 11

Java has a new release cycle in which a new version is made available every six months, but not all of these are long-term support (LTS) versions. Java 11 is the latest LTS version, and Cassandra 4.0 is adding experimental support for it. Experimental means that running Cassandra on Java 11 is not yet recommended for production use.

Virtual Tables

Virtual tables differ from regular tables in that they are backed by an API instead of by SSTables. The data returned by a query on a virtual table is therefore fetched from the current runtime state of the database rather than from stored files.

Virtual tables are not meant to be created by a user. Instead, a fixed set of read-only virtual tables is provided. They expose the current settings from the cassandra.yaml configuration file, the currently running SSTable tasks, the system caches, and the currently connected clients. Three virtual tables can be used to monitor database performance; they report read, write, and scan latency. Other virtual tables present data about disk usage and internode inbound/outbound messaging.
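The idea of a table backed by an API rather than by stored files can be illustrated with a small Python sketch. This is a toy model only, not Cassandra's implementation: the `VirtualTable` class, the `live_settings` dictionary, and the choice of `concurrent_reads` as a sample setting are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class VirtualTable:
    """Toy read-only table whose rows are computed on every query."""

    def __init__(self, name: str, producer: Callable[[], List[Row]]) -> None:
        self.name = name
        self._producer = producer  # hypothetical hook into live state

    def select(self) -> List[Row]:
        # Nothing is stored: the producer is called on each query,
        # so the result always reflects the current state.
        return self._producer()

    def insert(self, row: Row) -> None:
        # Virtual tables are read-only in this model.
        raise PermissionError(f"{self.name} is read-only")


# Hypothetical live configuration held by the running process.
live_settings = {"concurrent_reads": 32}

settings_table = VirtualTable(
    "settings",
    lambda: [{"name": k, "value": v} for k, v in sorted(live_settings.items())],
)

first = settings_table.select()
live_settings["concurrent_reads"] = 64  # state changes between queries
second = settings_table.select()
```

Here `first` and `second` differ even though nothing was written to the table, because each query reads the live state at the moment it runs.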

Audit Logging

When audit logging is enabled, database activity is recorded to audit logs that are stored in the local filesystem. Audit logging records every authentication attempt made against the database, whether the login succeeded or failed. All CQL commands (DDL and DML), whether failed or successful, are also logged. Audit logging is configured in the cassandra.yaml configuration file.
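The behavior described above, where both successful and failed events are appended to a local log, can be sketched in a few lines of Python. This is an illustrative model under stated assumptions: the `AuditLog` class, the JSON-lines format, and the category labels are hypothetical and do not reflect the on-disk format Cassandra actually uses.

```python
import json
import os
import tempfile
from typing import Dict, List


class AuditLog:
    """Toy audit log: one JSON line per event in a local file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, category: str, user: str, operation: str, success: bool) -> None:
        # Failed events are logged exactly like successful ones.
        entry = {
            "category": category,
            "user": user,
            "operation": operation,
            "success": success,
        }
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def entries(self) -> List[Dict[str, object]]:
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh]


# Hypothetical log location and events, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "audit.log")
audit = AuditLog(log_path)
audit.record("AUTH", "alice", "LOGIN", success=False)
audit.record("AUTH", "alice", "LOGIN", success=True)
audit.record("DDL", "alice", "CREATE TABLE ks.t (id int PRIMARY KEY)", success=True)
audit.record("DML", "alice", "INSERT INTO ks.missing (id) VALUES (1)", success=False)
```

The point of the sketch is that a failed login and a failed INSERT leave the same kind of trace as their successful counterparts.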

Full Query Logging

Full-query logging (FQL) is similar to audit logging, except that it logs only queries. FQL is dedicated to requests made to the CQL interface. Audit logging also logs CQL requests but lacks features such as FQL Replay and FQL Compare.

FQL Replay can be used to replay the full query log for testing, debugging, and performance benchmarking. The replay can run on a different machine or cluster, so the same recorded production traffic can be run against different versions and configurations. The FQL Compare tool is then used to compare the results output by those FQL Replay runs.
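The replay-then-compare workflow can be sketched in Python as follows. This is a conceptual model, not the actual FQL tooling: the `replay` and `compare` functions, the two stand-in "clusters", and the recorded queries are all hypothetical.

```python
from typing import Callable, List, Tuple

Query = str
Executor = Callable[[Query], object]


def replay(log: List[Query], execute: Executor) -> List[object]:
    """Run every recorded query, in order, against one target."""
    return [execute(q) for q in log]


def compare(
    log: List[Query], results_a: List[object], results_b: List[object]
) -> List[Tuple[Query, object, object]]:
    """Return the queries whose results differ between two replays."""
    return [
        (q, a, b) for q, a, b in zip(log, results_a, results_b) if a != b
    ]


# Hypothetical recorded traffic and two stand-in clusters.
recorded = ["SELECT a", "SELECT b", "SELECT c"]


def cluster_v1(query: Query) -> object:
    return {"SELECT a": 1, "SELECT b": 2, "SELECT c": 3}[query]


def cluster_v2(query: Query) -> object:
    # Behaves differently for one query, to give compare() something to find.
    return {"SELECT a": 1, "SELECT b": 20, "SELECT c": 3}[query]


mismatches = compare(
    recorded, replay(recorded, cluster_v1), replay(recorded, cluster_v2)
)
```

In this sketch `mismatches` contains only the query whose result differs between the two targets, which is the kind of signal a comparison of two replay runs is meant to surface.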

Transient Replication

To understand transient replication, we first need to understand how Cassandra performs repair. Cassandra stores multiple replicas of the same data for durability and high availability. When the replicas are not all consistent with one another, a repair needs to be performed. Full repair is performed on all of the data across the whole cluster, including newly added nodes. Incremental repair is performed only on data that has not been repaired previously.

Transient replication creates transient replicas, which store only data that has not yet been incrementally repaired. When a sufficient number of full replicas becomes available, the transient replicas stream the data they were holding to the full replicas. Like Java 11 support, this is an experimental feature.
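The hand-off from a transient replica to full replicas can be pictured with a small Python simulation. This is a deliberately simplified toy under loud assumptions: the `Replica` class, the `write` and `hand_off` functions, and the `required` threshold are hypothetical, and the sketch ignores timestamps, consistency levels, and the actual repair process.

```python
from typing import Dict, List


class Replica:
    """Toy replica: a key/value store plus an up/down flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}


def write(key: str, value: str, full: List[Replica], transient: Replica,
          required: int = 2) -> None:
    """Write to every live full replica; use the transient replica only
    when fewer than `required` full replicas are available."""
    live = [r for r in full if r.up]
    for replica in live:
        replica.data[key] = value
    if len(live) < required:
        transient.data[key] = value


def hand_off(full: List[Replica], transient: Replica, required: int = 2) -> None:
    """Once enough full replicas are up, stream the transient data to
    them and drop it from the transient replica."""
    live = [r for r in full if r.up]
    if len(live) < required:
        return
    for replica in live:
        for key, value in transient.data.items():
            # Only fill in keys the full replica is missing.
            replica.data.setdefault(key, value)
    transient.data.clear()


a, b, t = Replica("full-1"), Replica("full-2"), Replica("transient")
write("k0", "v0", [a, b], t)      # both full replicas up: transient unused
b.up = False
write("k1", "v1", [a, b], t)      # one full replica down: transient holds k1
held_while_down = dict(t.data)
b.up = True
hand_off([a, b], t)               # full replicas back: transient empties out
```

After the hand-off, both full replicas hold the same data and the transient replica holds nothing, which is why transient replicas can reduce storage compared with keeping another full copy.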