Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real-time data platforms.

9/15/2020

Reading time: 7 min

DataStax-Toolkit/diagnostic-collection

by DataStax-Toolkit


This directory contains a set of scripts that generate a diagnostic tarball for DSE, DDAC, and open source Cassandra installations, similar (and partially compatible) to the diagnostic tarball generated by OpsCenter.

Generation of the diagnostic tarball for DSE/Cassandra consists of two steps:

  1. Collection of data on individual nodes
  2. Merging all data into a single file

These two steps are kept separate because administrators may have different ways of accessing nodes in the cluster and transferring the data. The collect_diag.sh script performs data collection from the cluster as a single command, executed either on one of the nodes of the cluster or from a Mac or Linux machine outside the cluster.

collect_diag.sh

This script copies collect_node_diag.sh to all nodes, executes it in parallel, copies back the collected data, and generates the resulting tarball with DSE Insights or diagnostic data using generate_diag.sh.

Usage:

./collect_diag.sh -t <type> [options] [install_root_path]

This script accepts the same arguments as collect_node_diag.sh, with only -t as a required parameter. All others are optional but can be used to pass SSH, nodetool, cqlsh, and dsetool options (for example, you need to pass these options if you have authentication enabled for cqlsh and/or JMX). For tarball installations, you must also pass the path to the top-level directory of the DSE/DDAC/Cassandra installation (pass -h to get a list of all options):

  • -c - specifies options to pass to cqlsh (user name, password, etc.)
  • -d - specifies options for the dsetool command (JMX user, password, etc.)
  • -f - specifies a file with the list of hosts on which to execute collect_node_diag.sh (by default, the list is obtained from nodetool status)
  • -i - collect Insights data (DSE Metrics Collector and Metric Collector for Apache Cassandra)
  • -I - specifies the directory that contains the Insights .gz files (default: /var/lib/cassandra/insights_data/insights)
  • -m - specifies the collection mode: light, normal, extended (default: normal). See the "What is collected" section below for details:
    • light - collects only the essential information: logs, schema, nodetool output, and basic system information
    • normal - everything from light, plus extended system information (iostat, vmstat, ...)
    • extended - all Cassandra logs, longer vmstat/iostat sampling, etc. This mode can be significantly slower and generates a bigger archive
  • -n - specifies options to pass to nodetool (JMX user, password, etc.)
  • -o - specifies the directory where the resulting file is put (default: automatically created)
  • -p - specifies the PID of the DSE/DDAC/Cassandra process. The script tries to detect it from the output of ps, but this may not work reliably if you have several Cassandra processes on the same machine. The PID is used to get information such as the limits set for the process
  • -r - removes the collected files after the resulting tarball is generated
  • -s - specifies options to pass to SSH/SCP
  • -t (required) - specifies the type of installation: dse, ddac, or coss
  • -u - specifies the SSH timeout in seconds (default: 600). You may need to increase it if you're using the extended collection mode
  • -v - enables more verbose output from all scripts
  • -z - don't execute commands that require sudo
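As a sketch of how these options combine, JMX and cqlsh authentication can be passed together via -n and -c. The credentials below are placeholders, not values from the scripts, and the command is echoed here rather than executed:

```shell
# Placeholder credentials for illustration only -- substitute your own.
# -n forwards JMX credentials to nodetool; -c forwards credentials to cqlsh.
NODETOOL_OPTS="-u jmx_user -pw jmx_password"
CQLSH_OPTS="-u cassandra -p cassandra"
echo ./collect_diag.sh -t dse -n "$NODETOOL_OPTS" -c "$CQLSH_OPTS"
```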

Please note that the user name should be passed as -o User=... in the -s option, because scp and ssh pass the user name in different ways.

Example of usage with the list of hosts passed explicitly (file mhosts):

./collect_diag.sh -t dse -f mhosts -r -s \
  "-i ~/.ssh/private_key -o StrictHostKeyChecking=no -o User=automaton"

or for DDAC:

./collect_diag.sh -t ddac -f mhosts -r -s \
  "-i ~/.ssh/private_key -o StrictHostKeyChecking=no -o User=automaton" \
  /usr/local/lib/cassandra

If the script is run on a machine with DSE/DDAC/Cassandra installed, it can be as simple as the following, since nodetool status is used to obtain the list of nodes:

./collect_diag.sh -t ddac -r /usr/local/lib/cassandra

What is collected

The collect_node_diag.sh script collects the following information:

  • all DSE/DDAC/Cassandra configuration files: cassandra.yaml, dse.yaml, /etc/defaults/dse
  • all current log files in the light and normal collection modes; all log files, including rotated ones, in the extended mode
  • data from nodetool sub-commands, such as tablestats, tpstats, etc.
  • data from dsetool commands (at least status and ring)
  • (in extended mode, DSE only) output of nodetool sjk mxdump, to get all current values in JMX
  • database schema
  • schema and configuration for DSE Search cores
  • system information to help identify problems caused by incorrect system settings (you may need to install some tools, such as iostat and vmstat):
    • information about CPUs, block devices, disks, memory, etc. (primarily from the /proc filesystem)
    • information about the operating system (name, version, etc.)
    • limits for the user that runs DSE/DDAC/Cassandra
    • output of sysctl -a, dmesg, etc.
    • IO and VM statistics via iostat and vmstat (only in the normal and extended modes)

Important: The generate_diag.sh script removes all sensitive information, such as passwords, from configuration files.

Collecting diagnostics on individual nodes

Collection of the data on individual nodes is performed by the collect_node_diag.sh script, which executes the commands that gather the data described in the previous section. The script should be executed on every node of the cluster. For a tarball installation, you need to pass one required parameter: the full path to the root directory of the DSE/DDAC/Cassandra installation. For a package installation, the location of the files is detected automatically, without specifying the root directory. There are also optional parameters that can be provided if, for example, you have authentication enabled for Cassandra or JMX, have changed the JMX port, etc. (pass -h to get the list of options):

  • -c - specifies options to pass to cqlsh (user name, password, etc.)
  • -d - specifies options for the dsetool command (JMX user, password, etc.)
  • -f - specifies the file name for the collected results (can be useful for automation)
  • -i - collect Insights data (DSE Metrics Collector and Metric Collector for Apache Cassandra)
  • -I - specifies the directory that contains the Insights .gz files (default: /var/lib/cassandra/insights_data/insights)
  • -m - specifies the collection mode: light, normal, extended (default: normal). See the "What is collected" section above for details:
    • light - collects only the essential information: logs, schema, nodetool output, and basic system information
    • normal - everything from light, plus extended system information (iostat, vmstat, ...)
    • extended - all Cassandra logs, longer vmstat/iostat sampling, etc. This mode can be significantly slower and generates a bigger archive
  • -n - specifies options to pass to nodetool (JMX user, password, etc.)
  • -o - specifies the directory where the resulting file is put (default: /var/tmp/)
  • -p - specifies the PID of the DSE/DDAC/Cassandra process. The script tries to detect it from the output of ps, but this may not work reliably if you have several Cassandra processes on the same machine. The PID is used to get information such as the limits set for the process
  • -t - specifies the type of installation: dse, ddac, coss (default: dse)
  • -v - enables more verbose output from all scripts
  • -z - don't execute commands that require sudo
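For example, on a node with a tarball installation of open source Cassandra, the per-node invocation might look like this. The install path is illustrative, and the command is echoed here rather than executed:

```shell
# Illustrative tarball install path -- adjust to your environment.
INSTALL_ROOT=/usr/local/lib/cassandra
# -t coss selects open source Cassandra; -m light keeps the archive small.
echo ./collect_node_diag.sh -t coss -m light "$INSTALL_ROOT"
```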

After successful execution, the script generates a file named /var/tmp/dse-diag-<IP_Address>.tar.gz, for example /var/tmp/dse-diag-10.200.179.237.tar.gz, or the file specified by the -f option (if the -w flag was used, the file name prefix will be ddac-diag-...).

Merging all diagnostics into a single tarball

After the collect_node_diag.sh script has been executed on every machine of the cluster, the generated files should be collected into a single directory on one machine and merged using the generate_diag.sh script. This script accepts a single parameter: the path to the directory with the collected files (either relative or absolute). There are also optional parameters:

  • -f - specifies the path to the file where the data should be put
  • -i - merge Insights data
  • -o - specifies the directory where the resulting file is put (default: /var/tmp/)
  • -p - specifies a custom pattern for file names if collect_node_diag.sh was called with the -f parameter; otherwise this script may not find the data
  • -r - removes the individual diagnostic files after processing
  • -t - specifies the type of installation: dse, ddac, coss (default: dse)

This script performs the following steps:

  • Creates a temporary directory in /var/tmp, or in the directory specified by the -o flag
  • Unpacks each of the collected files
  • Removes sensitive information, such as passwords, from configuration files
  • Packs everything together into a single file named <cluster_name>-diagnostics.tar.gz, for example dsetest-diagnostics.tar.gz, or the file specified by the -f option
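The merge flow above can be sketched in plain shell. This is a simplified illustration, not the real generate_diag.sh: the sensitive-information scrubbing step is omitted, and the two per-node tarballs are fabricated for the demo:

```shell
# Simplified sketch of the merge flow; NOT the real generate_diag.sh.
set -e
WORK=$(mktemp -d)          # temporary working directory (-o in the real script)
mkdir -p "$WORK/in" "$WORK/unpacked"

# Fabricate two per-node tarballs, as collect_node_diag.sh would produce.
for ip in 10.0.0.1 10.0.0.2; do
  mkdir -p "$WORK/node-$ip"
  echo "diag for $ip" > "$WORK/node-$ip/info.txt"
  tar -czf "$WORK/in/dse-diag-$ip.tar.gz" -C "$WORK/node-$ip" .
done

# Unpack each collected file into its own subdirectory...
for f in "$WORK"/in/*.tar.gz; do
  d="$WORK/unpacked/$(basename "$f" .tar.gz)"
  mkdir -p "$d"
  tar -xzf "$f" -C "$d"
done
# (the real script scrubs passwords from configuration files here)

# ...and pack everything together into <cluster_name>-diagnostics.tar.gz.
tar -czf "$WORK/dsetest-diagnostics.tar.gz" -C "$WORK/unpacked" .
tar -tzf "$WORK/dsetest-diagnostics.tar.gz"
```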

This file can then be sent to DataStax support for analysis, or analyzed with tools such as sperf.
