Illustration Image

Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

7/10/2018

Reading time:5 min

zegelin/cassandra-exporter

by John Doe

Project Status: alphaIntroductioncassandra-exporter is a Java agent that exports Cassandra metrics to Prometheus.It enables high performance collection of Cassandra metrics and follows the Prometheus best practices for metrics naming and labeling.For example, the following PromQL query will return an estimate of the number of pending compactions per keyspace, per node.sum(cassandra_table_estimated_pending_compactions) by (cassandra_node, keyspace)Compatibilitycassandra-exporter is has been tested with:ComponentVersionApache Cassandra3.11.2Prometheus2.0 and laterOther Cassandra and Prometheus versions will be tested for compatibility in the future.UsageDownload the latest release and copy cassandra-exporter-agent-<version>.jar to $CASSANDRA_HOME/lib (typically /usr/share/cassandra/lib in most package installs).Then edit $CASSANDRA_CONF/cassandra-env.sh (typically /etc/cassandra/cassandra-env.sh) and append the following:JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/cassandra-exporter-agent-<version>.jar=http://localhost:9998/"Then (re-)start Cassandra.Prometheus metrics will be available at http://localhost:9998/metrics.Configure Prometheus to scrape the endpoint by adding the following to prometheus.yml:scrape_configs: ... - job_name: 'cassandra' static_configs: - targets: ['<cassandra node IP>:9998']See the Prometheus documentation for more details on configuring scrape targets.Viewing the exposed endpoint in a web browser will display a HTML version of the exported metrics.To view the raw, plain text metrics (in the Prometheus text exposition format), either request the endpoint with a HTTP client that prefers plain text(or one that can specify the Accept: text/plain header) or add the following query parameter to the URL: ?x-content-type=text/plain.An experimental JSON output is also provided, via Accept: application/json or ?x-content-type=application/json.The format/structure of this output is subject to change.OptionsCurrently only the HTTP endpoint (address & port) can be configured.FeaturesPerformanceJMX is slow, really slow. JMX adds significant overhead to every method invocation on exported MBean methods, even when those methods are called from within the same JVM.On a 300-ish table Cassandra node, trying to collect all exposed metrics via JVM resulted in a collection time that was upwards of 2-3 seconds.For exporters that run as a separate process there is additional overhead of inter-process communications and that time can reach the 10's of seconds.cassandra-exporter on the same node collects all metrics in 10-20 milliseconds.Best practicesThe exporter follows Prometheus best practices for metric names, labels and data types.Aggregate metrics, such as the aggregated table metrics at the keyspace and node level, are skipped. Instead these should be aggregated using PromQL queries or Prometheus recording rules.Metrics are coalesced when appropriate so they share the same name, opting for labels to differentiate indiviual time series. For example, each table level metric has a constant name and at minimum a table & keyspace label, which allows for complex PromQL queries.For example the cassandra_table_operation_latency_seconds[_count|_sum] summary metric combines read, write, range read, CAS prepare, CAS propose and CAS commit latency metrics together into a single metric family.A summary exposes percentiles (via the quantile label), a total count of recorded samples (via the _count metric),and (if available, NaN otherwise) an accumulated sum of all samples (via the _sum metric).Individual time-series are separated by different labels. In this example, the operation type is exported as the operation label.The source keyspace, table, table_type (table, view or index), table_id (CF UUID), and numerous other metadata labels are available.cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="tables",table_type="table",operation="read",...}cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="tables",table_type="table",operation="write",...}cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="keyspaces",table_type="table",operation="read",...}cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="keyspaces",table_type="table",operation="write",...}These metrics can then be queried:sum(cassandra_table_operation_latency_seconds_count) by (keyspace, operation) # total operations by keyspace & typeElementValue{keyspace="system",operation="write"}13989{keyspace="system",operation="cas_commit"}0{keyspace="system",operation="cas_prepare"}0{keyspace="system",operation="cas_propose"}0{keyspace="system",operation="range_read"}10894{keyspace="system",operation="read"}74{keyspace="system_schema",operation="write"}78{keyspace="system_schema",operation="cas_commit"}0{keyspace="system_schema",operation="cas_prepare"}0{keyspace="system_schema",operation="cas_propose"}0{keyspace="system_schema",operation="range_read"}75{keyspace="system_schema",operation="read"}618Global LabelsThe exporter does attach global labels to the exported metrics. At this time these cannot be disabled without recompiling the agent.These labels are:cassandra_cluster_nameThe name of the cluster, as specified in cassandra.yamlcassandra_host_idThe unique UUID of the nodecassandra_nodeThe IP address of the nodecassandra_datacenterThe configured data center name of the nodecassandra_rackThe configured rack name of the nodeThese labels allow aggregation of metrics at the cluster, data center and rack levels.While these labels could be defined in the prometheus scrape config, the authors feel that having these labels be automaticallyapplied simplifies things, especially when Prometheus is monitoring multiple clusters across numerous DCs and racks.JMX Standalone (Experimental)While it is preferable to run cassandra-exporter as a Java agent for performance, it can instead be run as an external application if required.Metrics will be queried via JMX.The set of metrics should be identical, but currently some additional metadata labels attached to the cassandra_table_* metrics willnot be available.This was originally designed to assist with benchmarking and development of the exporter. Currently the JMX RMI service URL and HTTP endpointvalues are hard-coded. The application will need to be recompiled if these parameters need to be changed.Exported MetricsSee the Exported Metrics wiki page for a list.We suggest viewing the metrics endpoint (e.g., http://localhost:9998/metrics) in a browser to get an understanding of what metricsare exported by your Cassandra node.Unstable, Missing & Future FeaturesSee the project issue tracker for a complete list.Configuration parametersCurrently only the listen address & port can be configured.Allow configuration of:listen address and portexported metrics (aka, blacklist certain metrics)enable/disable global labelsexclude help from JSONJVM metricsFuture versions should add support for collecting and exporting JVM metrics (memory, GC pause times, etc).Add some example queriesAdd Grafana dashboard templatesDocumentation improvementsImprove standalone JMX exporterConfiguration parameters

Illustration Image

Project Status: alpha

Introduction

cassandra-exporter is a Java agent that exports Cassandra metrics to Prometheus.

It enables high performance collection of Cassandra metrics and follows the Prometheus best practices for metrics naming and labeling.

For example, the following PromQL query will return an estimate of the number of pending compactions per keyspace, per node.

sum(cassandra_table_estimated_pending_compactions) by (cassandra_node, keyspace)

Compatibility

cassandra-exporter is has been tested with:

Component Version
Apache Cassandra 3.11.2
Prometheus 2.0 and later

Other Cassandra and Prometheus versions will be tested for compatibility in the future.

Usage

Download the latest release and copy cassandra-exporter-agent-<version>.jar to $CASSANDRA_HOME/lib (typically /usr/share/cassandra/lib in most package installs).

Then edit $CASSANDRA_CONF/cassandra-env.sh (typically /etc/cassandra/cassandra-env.sh) and append the following:

JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/cassandra-exporter-agent-<version>.jar=http://localhost:9998/"

Then (re-)start Cassandra.

Prometheus metrics will be available at http://localhost:9998/metrics.

Configure Prometheus to scrape the endpoint by adding the following to prometheus.yml:

scrape_configs:
  ...
  
  - job_name: 'cassandra'
    static_configs:
      - targets: ['<cassandra node IP>:9998']

See the Prometheus documentation for more details on configuring scrape targets.

Viewing the exposed endpoint in a web browser will display a HTML version of the exported metrics.

To view the raw, plain text metrics (in the Prometheus text exposition format), either request the endpoint with a HTTP client that prefers plain text (or one that can specify the Accept: text/plain header) or add the following query parameter to the URL: ?x-content-type=text/plain.

An experimental JSON output is also provided, via Accept: application/json or ?x-content-type=application/json. The format/structure of this output is subject to change.

Options

Currently only the HTTP endpoint (address & port) can be configured.

Features

Performance

JMX is slow, really slow. JMX adds significant overhead to every method invocation on exported MBean methods, even when those methods are called from within the same JVM. On a 300-ish table Cassandra node, trying to collect all exposed metrics via JVM resulted in a collection time that was upwards of 2-3 seconds. For exporters that run as a separate process there is additional overhead of inter-process communications and that time can reach the 10's of seconds.

cassandra-exporter on the same node collects all metrics in 10-20 milliseconds.

Best practices

The exporter follows Prometheus best practices for metric names, labels and data types.

Aggregate metrics, such as the aggregated table metrics at the keyspace and node level, are skipped. Instead these should be aggregated using PromQL queries or Prometheus recording rules.

Metrics are coalesced when appropriate so they share the same name, opting for labels to differentiate indiviual time series. For example, each table level metric has a constant name and at minimum a table & keyspace label, which allows for complex PromQL queries.

For example the cassandra_table_operation_latency_seconds[_count|_sum] summary metric combines read, write, range read, CAS prepare, CAS propose and CAS commit latency metrics together into a single metric family. A summary exposes percentiles (via the quantile label), a total count of recorded samples (via the _count metric), and (if available, NaN otherwise) an accumulated sum of all samples (via the _sum metric).

Individual time-series are separated by different labels. In this example, the operation type is exported as the operation label. The source keyspace, table, table_type (table, view or index), table_id (CF UUID), and numerous other metadata labels are available.

cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="tables",table_type="table",operation="read",...}
cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="tables",table_type="table",operation="write",...}
cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="keyspaces",table_type="table",operation="read",...}
cassandra_table_operation_latency_seconds_count{keyspace="system_schema",table="keyspaces",table_type="table",operation="write",...}

These metrics can then be queried:

sum(cassandra_table_operation_latency_seconds_count) by (keyspace, operation) # total operations by keyspace & type
Element Value
{keyspace="system",operation="write"} 13989
{keyspace="system",operation="cas_commit"} 0
{keyspace="system",operation="cas_prepare"} 0
{keyspace="system",operation="cas_propose"} 0
{keyspace="system",operation="range_read"} 10894
{keyspace="system",operation="read"} 74
{keyspace="system_schema",operation="write"} 78
{keyspace="system_schema",operation="cas_commit"} 0
{keyspace="system_schema",operation="cas_prepare"} 0
{keyspace="system_schema",operation="cas_propose"} 0
{keyspace="system_schema",operation="range_read"} 75
{keyspace="system_schema",operation="read"} 618

Global Labels

The exporter does attach global labels to the exported metrics. At this time these cannot be disabled without recompiling the agent.

These labels are:

  • cassandra_cluster_name

    The name of the cluster, as specified in cassandra.yaml

  • cassandra_host_id

    The unique UUID of the node

  • cassandra_node

    The IP address of the node

  • cassandra_datacenter

    The configured data center name of the node

  • cassandra_rack

    The configured rack name of the node

These labels allow aggregation of metrics at the cluster, data center and rack levels.

While these labels could be defined in the prometheus scrape config, the authors feel that having these labels be automatically applied simplifies things, especially when Prometheus is monitoring multiple clusters across numerous DCs and racks.

JMX Standalone (Experimental)

While it is preferable to run cassandra-exporter as a Java agent for performance, it can instead be run as an external application if required. Metrics will be queried via JMX.

The set of metrics should be identical, but currently some additional metadata labels attached to the cassandra_table_* metrics will not be available.

This was originally designed to assist with benchmarking and development of the exporter. Currently the JMX RMI service URL and HTTP endpoint values are hard-coded. The application will need to be recompiled if these parameters need to be changed.

Exported Metrics

See the Exported Metrics wiki page for a list.

We suggest viewing the metrics endpoint (e.g., http://localhost:9998/metrics) in a browser to get an understanding of what metrics are exported by your Cassandra node.

Unstable, Missing & Future Features

See the project issue tracker for a complete list.

  • Configuration parameters

    Currently only the listen address & port can be configured.

    Allow configuration of:

    • listen address and port
    • exported metrics (aka, blacklist certain metrics)
    • enable/disable global labels
    • exclude help from JSON
  • JVM metrics

    Future versions should add support for collecting and exporting JVM metrics (memory, GC pause times, etc).

  • Add some example queries

  • Add Grafana dashboard templates

  • Documentation improvements

  • Improve standalone JMX exporter

    • Configuration parameters

Related Articles

monitoring
cassandra

Cassandra Summit Recap: Diagnosing Problems in Production - RustyRazorblade.com

John Doe

3/1/2023

monitoring
cassandra
performance

Checkout Planet Cassandra

Claim Your Free Planet Cassandra Contributor T-shirt!

Make your contribution and score a FREE Planet Cassandra Contributor T-Shirt! 
We value our incredible Cassandra community, and we want to express our gratitude by sending an exclusive Planet Cassandra Contributor T-Shirt you can wear with pride.

Join Our Newsletter!

Sign up below to receive email updates and see what's going on with our company

Explore Related Topics

AllKafkaSparkScyllaSStableKubernetesApiGithubGraphQl

Explore Further

monitoring