Cassandra.Link

7/20/2020

Audit Logging in Apache Cassandra 4.0

by John Doe

Posted on October 29, 2018 by the Apache Cassandra Community

Database audit logging is an industry standard tool for enterprises to capture critical data change events including what data changed and who triggered the event. These captured records can then be reviewed later to ensure compliance with regulatory, security and operational policies.

Prior to Apache Cassandra 4.0, the open source community did not have a good way of tracking such critical database activity. With this goal in mind, Netflix implemented CASSANDRA-12151 so that users of Cassandra would have a simple yet powerful audit logging tool built into their database out of the box.

Why are Audit Logs Important?

Audit logging database activity is one of the key components for making a database truly ready for the enterprise. Audit logging is generally useful but enterprises frequently use it for:

  1. Regulatory compliance with laws such as SOX, PCI and GDPR. These types of compliance are crucial for companies that are traded on public stock exchanges, hold payment information such as credit cards, or retain private user information.
  2. Security compliance. Companies often have strict rules for what data can be accessed by which employees, both to protect the privacy of users and to limit the probability of a data breach.
  3. Debugging complex data corruption bugs such as those found in massively distributed microservice architectures like Netflix’s.

Implementing a simple logger in the request (inbound/outbound) path sounds easy, but the devil is in the details. In particular, the “fast path” of a database, where audit logging must operate, strives to do as little as humanly possible so that users get the fastest and most scalable database system possible. While implementing Cassandra audit logging, we had to ensure that the audit log infrastructure does not take up excessive CPU or IO resources from the actual database execution itself. However, one cannot simply optimize only for performance because that may compromise the guarantees of the audit logging.

For example, from a pure performance standpoint, an audit record whose production would block a thread should simply be dropped. However, most compliance requirements prohibit dropping records. Therefore, the key to implementing audit logging correctly lies in allowing users to achieve both performance and reliability or, where both cannot be achieved, in letting users make an explicit trade-off through configuration.
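The trade-off above can be pictured as a bounded buffer between the request path and a background writer, with the overflow behavior chosen by configuration. The following is a minimal Python sketch under that assumption, not Cassandra's implementation; `AuditBuffer`, `OverflowPolicy`, and its `BLOCK`/`DROP` values are hypothetical names used only for illustration.

```python
import queue
from enum import Enum
from typing import List, Optional


class OverflowPolicy(Enum):
    """Hypothetical setting: what to do when the audit buffer is full."""
    BLOCK = "block"  # never lose a record; the caller waits (reliability first)
    DROP = "drop"    # never stall the caller; the record is discarded (performance first)


class AuditBuffer:
    """Bounded hand-off between the request path and a background writer (illustrative only)."""

    def __init__(self, capacity: int, policy: OverflowPolicy) -> None:
        self._queue = queue.Queue(maxsize=capacity)
        self._policy = policy
        self.dropped = 0  # records lost under the DROP policy

    def offer(self, record: str, timeout: Optional[float] = None) -> bool:
        """Enqueue a record. Returns False only if it was dropped.

        Under BLOCK, waits for space; if a timeout is given and expires, queue.Full is raised.
        """
        if self._policy is OverflowPolicy.BLOCK:
            self._queue.put(record, timeout=timeout)
            return True
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self) -> List[str]:
        """Remove and return all buffered records (stand-in for the writer thread)."""
        records = []
        while True:
            try:
                records.append(self._queue.get_nowait())
            except queue.Empty:
                return records
```

With `DROP`, a full buffer never stalls the caller, but the record is lost and only counted. With `BLOCK`, no record is lost, but the caller waits (or gets `queue.Full` once a timeout expires), which is the behavior compliance-driven audit logging needs.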

The design goals of the audit log fall into three broad areas:

Performance: Because the audit log injection points sit in the live request path, performance is an important goal in every design decision.

Accuracy: Accuracy is required by compliance and is thus a critical goal. Audit logging must be able to answer crucial auditor questions like “Is every write request to the database being audited?” As such, accuracy cannot be compromised.

Usability & Extensibility: The diverse Cassandra ecosystem demands that any frequently used feature be easily usable and pluggable (e.g., Compaction, Compression, SeedProvider), so the audit log interface was designed with this in mind from the start.

Implementation

With these three design goals in mind, the OpenHFT libraries were an obvious choice due to their reliability and high performance. Earlier, in CASSANDRA-13983, OpenHFT's Chronicle Queue library was introduced to the Apache Cassandra codebase as a BinLog utility. The performance of Full Query Logging (FQL) was excellent, but it only instrumented the mutation and read query paths. It was missing a lot of critical data, such as when queries failed, where they came from, and which user issued them. FQL was also single-purpose: it prefers to drop messages rather than delay the process (which makes sense for FQL but not for audit logging). Lastly, FQL did not allow for pluggability, which made it harder to reuse as-is for this feature.

As shown in the architecture figure below, we were able to unify the FQL feature with the AuditLog functionality through the AuditLogManager and IAuditLogger abstractions. This architecture can support any output format: logs, files, databases, etc. Out of the box, the default BinAuditLogger implementation is used to maintain performance. Users can choose a custom audit logger implementation by dropping its jar file onto the Cassandra classpath and configuring it through options in the cassandra.yaml file.
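The pluggability described above boils down to a small interface plus a manager that resolves the configured implementation by name. The Python sketch below mirrors that shape only loosely: `AuditLoggerBase`, `InMemoryAuditLogger`, `NoOpAuditLogger`, `AuditLogManagerSketch`, and the `LOGGERS` registry are hypothetical stand-ins for the Java IAuditLogger/AuditLogManager abstractions and for classpath lookup, not their real API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class AuditLoggerBase(ABC):
    """Hypothetical Python stand-in for a pluggable audit logger interface."""

    @abstractmethod
    def log(self, entry: Dict[str, str]) -> None:
        """Persist or forward one audit entry."""

    def stop(self) -> None:
        """Release resources; a no-op unless a subclass needs it."""


class InMemoryAuditLogger(AuditLoggerBase):
    """Keeps entries in a list; stands in for a file, binary-log, or database sink."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def log(self, entry: Dict[str, str]) -> None:
        self.entries.append(entry)


class NoOpAuditLogger(AuditLoggerBase):
    """Discards everything."""

    def log(self, entry: Dict[str, str]) -> None:
        pass


# Stand-in for looking up a logger class by name on the classpath.
LOGGERS = {cls.__name__: cls for cls in (InMemoryAuditLogger, NoOpAuditLogger)}


class AuditLogManagerSketch:
    """Instantiates the logger named in the configuration and forwards every entry to it."""

    def __init__(self, config: Dict[str, str]) -> None:
        name = config.get("logger", "InMemoryAuditLogger")  # hypothetical default for this sketch
        if name not in LOGGERS:
            raise ValueError(f"unknown audit logger: {name}")
        self.logger = LOGGERS[name]()

    def log(self, entry: Dict[str, str]) -> None:
        self.logger.log(entry)

    def stop(self) -> None:
        self.logger.stop()
```

In this shape, adding an output format means adding one subclass and naming it in the configuration; the request path only ever calls `log`.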


Architecture

Fig 1. AuditLog Architecture Figure.


What does it log?

Each audit log implementation has access to the following attributes. For the default text-based logger, these fields are concatenated with | to yield the final message.

  • user: User name (if available)
  • host: IP address of the host where the command is executed
  • source ip address: Source IP address from which the request originated
  • source port: Source port number from which the request originated
  • timestamp: Unix timestamp (in milliseconds, as in the examples below)
  • type: Type of the request (SELECT, INSERT, etc.)
  • category: Category of the request (DDL, DML, etc.)
  • keyspace: Keyspace (if applicable) on which the request is executed
  • scope: Table, aggregate, function, trigger, etc., as applicable
  • operation: CQL command being executed
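For the default text-based logger, the fields above are joined with `|`. The sketch below assumes the key labels and field order seen in the example messages in the next section (`ks` for keyspace, `source` for the client address); the `AuditEntry` class is a hypothetical illustration, not Cassandra's own entry class.

```python
from dataclasses import dataclass


@dataclass
class AuditEntry:
    user: str
    host: str       # host:port of the node executing the command, e.g. 127.0.0.1:7000
    source: str     # client address as printed in the examples (leading "/")
    port: int       # client source port
    timestamp: int  # Unix epoch milliseconds
    type: str       # e.g. SELECT, INSERT
    category: str   # e.g. QUERY, DDL
    keyspace: str
    scope: str
    operation: str  # the CQL text

    def to_text(self) -> str:
        """Join key:value pairs with '|' in the field order used by the example messages."""
        fields = [
            ("user", self.user), ("host", self.host), ("source", self.source),
            ("port", self.port), ("timestamp", self.timestamp), ("type", self.type),
            ("category", self.category), ("ks", self.keyspace), ("scope", self.scope),
            ("operation", self.operation),
        ]
        return "|".join(f"{key}:{value}" for key, value in fields)
```

Building an entry from the values of the first example message reproduces that message's text exactly.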

Example audit log messages

Type: AuditLog
LogMessage: user:anonymous|host:127.0.0.1:7000|source:/127.0.0.1|port:53418|timestamp:1539978679457|type:SELECT|category:QUERY|ks:k1|scope:t1|operation:SELECT * from k1.t1 ;
Type: AuditLog
LogMessage: user:anonymous|host:127.0.0.1:7000|source:/127.0.0.1|port:53418|timestamp:1539978692456|type:SELECT|category:QUERY|ks:system|scope:peers|operation:SELECT * from system.peers limit 1;
Type: AuditLog
LogMessage: user:anonymous|host:127.0.0.1:7000|source:/127.0.0.1|port:53418|timestamp:1539980764310|type:SELECT|category:QUERY|ks:system_virtual_schema|scope:columns|operation:SELECT * from system_virtual_schema.columns ;
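Because each field is a `key:value` pair separated by `|`, messages like these can be parsed back for offline review. A minimal Python sketch, assuming the exact layout of the three examples above; `parse_audit_message` and `event_time` are hypothetical helpers, and treating the timestamp as epoch milliseconds is an inference from the example values.

```python
from datetime import datetime, timezone
from typing import Dict

PREFIX = "LogMessage: "


def parse_audit_message(message: str) -> Dict[str, str]:
    """Split a text audit message into a field dict.

    The CQL operation is cut off first because it is free text and may itself contain '|'.
    Other fields are split on their first ':' only, since values such as 127.0.0.1:7000 contain ':'.
    """
    if message.startswith(PREFIX):
        message = message[len(PREFIX):]
    head, found, operation = message.partition("|operation:")
    fields: Dict[str, str] = {}
    for part in head.split("|"):
        key, _, value = part.partition(":")
        fields[key] = value
    if found:
        fields["operation"] = operation
    return fields


def event_time(fields: Dict[str, str]) -> datetime:
    """Interpret the timestamp field as Unix epoch milliseconds, in UTC."""
    return datetime.fromtimestamp(int(fields["timestamp"]) / 1000, tz=timezone.utc)
```

Splitting off the operation before anything else keeps a CQL string that happens to contain `|` intact.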

How to configure

Audit logging can be configured in cassandra.yaml. To try it on a single node, you can also enable and configure it with nodetool.

cassandra.yaml options for AuditLog

  • enabled: Enables or disables the audit log
  • logger: Class name of the logger (built-in or custom)
  • audit_logs_dir: Audit log directory location; if not set, defaults to cassandra.logdir.audit or cassandra.logdir + /audit/
  • included_keyspaces: Comma-separated list of keyspaces to include in the audit log; by default, all keyspaces are included
  • excluded_keyspaces: Comma-separated list of keyspaces to exclude from the audit log; by default, no keyspace is excluded
  • included_categories: Comma-separated list of Audit Log Categories to include in the audit log; by default, all categories are included
  • excluded_categories: Comma-separated list of Audit Log Categories to exclude from the audit log; by default, no category is excluded
  • included_users: Comma-separated list of users to include in the audit log; by default, all users are included
  • excluded_users: Comma-separated list of users to exclude from the audit log; by default, no user is excluded

Note: BinAuditLogger configurations can be tuned using cassandra.yaml properties as well.

The available categories are: QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE

enableauditlog: Enables audit logging with the cassandra.yaml defaults. Those settings can be overridden with options to the nodetool command.
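The include/exclude options can be read as a per-event filter over keyspace, category, and user. The text gives only the defaults (include everything, exclude nothing), so the Python sketch below assumes two further rules: an event must pass all three dimensions, and an exclusion wins when a value is both included and excluded. `should_audit` and the other helpers are hypothetical; the option names and category list are the ones given above.

```python
from typing import Dict, Optional, Set

CATEGORIES = {"QUERY", "DML", "DDL", "DCL", "OTHER", "AUTH", "ERROR", "PREPARE"}


def parse_list(value: Optional[str]) -> Set[str]:
    """Turn a comma-separated option value into a set, ignoring blanks and whitespace."""
    if not value:
        return set()
    return {item.strip() for item in value.split(",") if item.strip()}


def passes(value: str, included: Set[str], excluded: Set[str]) -> bool:
    """Empty include list means include everything; exclusion wins over inclusion (assumed)."""
    if included and value not in included:
        return False
    return value not in excluded


def should_audit(options: Dict[str, str], keyspace: str, category: str, user: str) -> bool:
    """Apply the keyspace, category and user filters; an event must pass all three."""
    for dimension, value in (("keyspaces", keyspace), ("categories", category), ("users", user)):
        included = parse_list(options.get(f"included_{dimension}"))
        excluded = parse_list(options.get(f"excluded_{dimension}"))
        if dimension == "categories":
            unknown = (included | excluded) - CATEGORIES
            if unknown:
                raise ValueError(f"unknown audit log categories: {sorted(unknown)}")
        if not passes(value, included, excluded):
            return False
    return True
```

For instance, with `excluded_keyspaces` set to `system, system_schema` and `included_categories` set to `DDL,DCL`, a DDL statement on keyspace `k1` is audited, while a QUERY on `k1` or a DDL statement on `system` is not.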

nodetool enableauditlog

Options:

--excluded-categories Comma-separated list of Audit Log Categories to exclude from the audit log. If not set, the value from cassandra.yaml is used

--excluded-keyspaces Comma-separated list of keyspaces to exclude from the audit log. If not set, the value from cassandra.yaml is used

--excluded-users Comma-separated list of users to exclude from the audit log. If not set, the value from cassandra.yaml is used

--included-categories Comma-separated list of Audit Log Categories to include in the audit log. If not set, the value from cassandra.yaml is used

--included-keyspaces Comma-separated list of keyspaces to include in the audit log. If not set, the value from cassandra.yaml is used

--included-users Comma-separated list of users to include in the audit log. If not set, the value from cassandra.yaml is used

--logger Logger name to be used for audit logging. Default: BinAuditLogger. If not set, the value from cassandra.yaml is used
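The precedence rule stated for each flag (a nodetool option overrides cassandra.yaml, and an unset flag falls back to it) can be expressed as a simple merge. In this Python sketch, `effective_options` is a hypothetical helper; it relies only on the observation that each flag listed above is the corresponding cassandra.yaml option name with `_` replaced by `-`, and it falls back to BinAuditLogger when neither source names a logger.

```python
from typing import Dict, Optional

# cassandra.yaml option names from the list above; each nodetool flag is the same name
# with '_' replaced by '-' (e.g. included_keyspaces -> --included-keyspaces).
OVERRIDABLE = (
    "logger",
    "included_keyspaces", "excluded_keyspaces",
    "included_categories", "excluded_categories",
    "included_users", "excluded_users",
)


def effective_options(yaml_opts: Dict[str, str], cli_args: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Merge settings: a flag that was passed wins, otherwise the cassandra.yaml value is kept."""
    merged: Dict[str, str] = {}
    for name in OVERRIDABLE:
        flag = "--" + name.replace("_", "-")
        value = cli_args.get(flag)
        if value is not None:
            merged[name] = value
        elif name in yaml_opts:
            merged[name] = yaml_opts[name]
    # The flag list gives BinAuditLogger as the default logger.
    merged.setdefault("logger", "BinAuditLogger")
    return merged
```

A flag passed as `None` is treated as "not set", so the cassandra.yaml value survives for that option.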

disableauditlog: Disables audit logging.

nodetool disableauditlog

enableauditlog: The nodetool enableauditlog command can also be used to reload audit log filters when called with the default or previously used logger name and the updated filters.

nodetool enableauditlog --logger <Default/ existing loggerName> --included-keyspaces <New Filter values>

Conclusion

Now that Apache Cassandra ships with audit logging out of the box, users can easily capture data change events to a persistent record indicating what happened, when it happened, and where the event originated. This type of information remains critical to modern enterprises operating in a diverse regulatory environment. While audit logging represents one of many steps forward in the 4.0 release, we believe that it will uniquely enable enterprises to use the database in ways they could not previously.
