9/29/2018

spotify/cstar

by John Doe

cstar is an Apache Cassandra cluster orchestration tool for the command line.

Why not simply use Ansible or Fabric?

Ansible does not have the primitives required to run things in a topology-aware fashion. One could split the C* cluster into groups of nodes that can safely be operated on in parallel and run one group at a time. But unless the job takes almost exactly the same amount of time on every host, such a solution would run with a significantly lower rate of parallelism, because each group has to wait for its slowest host before the next group can start. It would also be kludgy enough to be unpleasant to work with.

Unfortunately, Fabric is not thread-safe, so the same type of limitation applies. Fabric allows one to run a job in parallel on many machines, but with restrictions similar to those of Ansible groups. It is possible to use Fabric and Celery together to do what is needed, but that is a very complicated solution.

Requirements

All involved machines are assumed to be UNIX-like systems such as OS X or Linux. The machine running cstar must have Python 3, and the Cassandra hosts must have a Bourne-style shell.

Installing

You need Python 3 and a recent version of pip (9.0.1 or later):

# pip3 install cstar

It is also possible to install straight from the repository. This installs the latest version, which may not have been pushed to PyPI yet:

# pip3 install git+https://github.com/spotify/cstar.git

On some systems (such as Ubuntu 14.04), installation may fail with ssh2-python-related errors because the locally available libssh2 is too old (< 1.6.0). In that case, use the following procedure:

sudo apt-get install cmake libssl-dev libffi-dev python3-pip -y
git clone --recurse-submodules https://github.com/ParallelSSH/ssh2-python.git
cd ssh2-python; sudo ./ci/install-ssh2.sh
sudo pip3 install setuptools bcrypt --upgrade
sudo pip3 install cstar --upgrade
# or: 
# sudo pip3 install git+https://github.com/spotify/cstar.git --upgrade

This builds libssh2 from the source that ships with ssh2-python and installs the required dependencies.

Code of conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

CLI

cstar is run through the cstar command, like so:

# cstar COMMAND [HOST-SPEC] [PARAMETERS]

The HOST-SPEC specifies which nodes to run the script on. There are three ways to specify it:

The --seed-host switch tells cstar to connect to a specific host and fetch the full ring topology from there, and then run the script on all nodes in the cluster. --seed-host can be specified multiple times, and multiple hosts can be specified as a comma-separated list in order to run a script across multiple clusters.
The --host switch specifies an exact list of hosts to use. --host can be specified multiple times, and multiple hosts can be specified as a comma-separated list.
The --host-file switch points to a file containing a newline-separated list of hosts. This can be used together with process substitution, e.g. --host-file <(dig -t srv ...)
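To make the --host and --host-file forms concrete, here is a minimal Python sketch. It merges repeated, comma-separated --host values and a newline-separated host file into one host list. The function name `collect_hosts`, the de-duplication and the skipping of blank lines are assumptions for illustration, not cstar's actual implementation. The --seed-host form is not covered, because it discovers hosts from the ring.

```python
from typing import Iterable, List, Optional


def collect_hosts(host_args: Iterable[str], host_file: Optional[str] = None) -> List[str]:
    """Hypothetical helper: merge --host values and a --host-file into one list.

    Each --host value may itself be a comma-separated list, and --host may be
    given several times. The host file holds one host per line.
    """
    hosts: List[str] = []
    for arg in host_args:
        hosts.extend(h.strip() for h in arg.split(",") if h.strip())
    if host_file is not None:
        # A process substitution such as <(dig ...) arrives as a /dev/fd/N path,
        # which open() reads like any other file.
        with open(host_file) as f:
            hosts.extend(line.strip() for line in f if line.strip())
    # Assumption: keep only the first occurrence of each host, preserving order.
    return list(dict.fromkeys(hosts))


if __name__ == "__main__":
    # Equivalent to: --host a,b --host c --host b
    print(collect_hosts(["a,b", "c", "b"]))  # ['a', 'b', 'c']
```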

The command is the name of a script located in either /usr/lib/cstar/commands or ~/.cstar/commands. This script will be uploaded to all nodes in the cluster and executed. File suffixes are stripped from the script name, so a script saved as puppet-upgrade-cassandra.sh is invoked as puppet-upgrade-cassandra. The requirements of the script are described below. Cstar comes pre-packaged with one script file called run, which takes a single parameter, --command (see the examples below).
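The lookup can be pictured with a minimal sketch. It searches the two command directories for a file whose name, with its suffix stripped, matches the requested command. Several details are assumptions: the search order (user directory before the system one), the function name `find_command` and returning None when nothing matches. The text above does not say which directory wins if both contain a script of the same name.

```python
from pathlib import Path
from typing import Optional, Sequence

# The two directories named in the text; their order here is an assumption.
DEFAULT_DIRS = [Path.home() / ".cstar" / "commands", Path("/usr/lib/cstar/commands")]


def find_command(name: str, dirs: Sequence[Path] = DEFAULT_DIRS) -> Optional[Path]:
    """Hypothetical lookup: return the script whose suffix-stripped name is `name`."""
    for directory in dirs:
        if not directory.is_dir():
            continue  # a missing directory is simply skipped
        for entry in sorted(directory.iterdir()):
            # Path.stem drops the last suffix: "run.sh" and "run" both match "run".
            if entry.is_file() and entry.stem == name:
                return entry
    return None
```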

Some additional switches to control cstar:

One can override the parallelism specified in a script by setting the switches --cluster-parallelism, --dc-parallelism and --strategy.
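The precedence rule can be sketched as a simple merge. Values from the script header act as defaults, and any of the three switches given on the command line replaces the corresponding value. The dictionary keys (spelled as in the example script file below), the None-means-not-given convention and the function name are assumptions made for this example.

```python
from typing import Dict, Optional, Union

Setting = Union[bool, str]
STRATEGIES = ("one", "topology", "all")


def effective_settings(
    script_settings: Dict[str, Setting],
    cluster_parallelism: Optional[bool] = None,
    dc_parallelism: Optional[bool] = None,
    strategy: Optional[str] = None,
) -> Dict[str, Setting]:
    """Hypothetical merge: command-line switches win over the script header."""
    if strategy is not None and strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    merged = dict(script_settings)  # leave the caller's dictionary untouched
    overrides = {
        "cluster-parallel": cluster_parallelism,
        "dc-parallel": dc_parallelism,
        "strategy": strategy,
    }
    for key, value in overrides.items():
        if value is not None:  # None means the switch was not given
            merged[key] = value
    return merged
```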

There are two special case invocations:

One can skip the script name and instead use the continue command to specify a previously halted job to resume.
One can skip the script name and instead use the cleanup-jobs command. See Cleaning up old jobs.

Further switches control how cstar connects to the hosts and how far a job runs:

Two Python SSH modules can be used: paramiko (default) and ssh2-python. To use the faster (but experimental) ssh2-python module, add the flag --ssh-lib=ssh2.
If you need to access the remote cluster with a specific username, add --ssh-username=remote_username to your cstar command line. A private key file can also be specified using --ssh-identity-file=my_key_file.pem.
To use plain-text password authentication, add --ssh-password=my_password to the command line.
To run the command on a single node first and then stop, so you can verify that everything worked as expected, add --stop-after=1 to your command line. cstar stops after the first node has executed the command and prints the command to resume execution when you are ready: cstar continue <JOB_ID>
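The --stop-after behaviour can be pictured as a loop that counts finished hosts. Once the limit is reached, the loop stops and reports the resume command. This is a simplified sequential sketch built on assumptions: `run_on_host` is a hypothetical callback, hosts run one at a time rather than in topology-aware batches, and the function returns the pending hosts instead of persisting job state the way cstar does.

```python
from typing import Callable, List, Optional


def run_with_stop_after(
    job_id: str,
    hosts: List[str],
    run_on_host: Callable[[str], None],
    stop_after: Optional[int] = None,
) -> List[str]:
    """Hypothetical driver: run hosts in order, halting after `stop_after` of them.

    Returns the hosts that have not run yet (empty when the job completed).
    """
    for done, host in enumerate(hosts, start=1):
        run_on_host(host)
        # Only halt if there is actually something left to resume.
        if stop_after is not None and done >= stop_after and done < len(hosts):
            print(f"Stopping after {done} host(s). To resume: cstar continue {job_id}")
            return hosts[done:]
    return []
```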

A script file can specify additional parameters.

Command syntax

To run a command, cstar first uploads it to the relevant host and then executes it there.

Commands can be written in any scripting language in which the hash symbol starts a line comment, e.g. shell script, Python, Perl or Ruby.

The first line must be a valid shebang. After that, commented lines containing key-value pairs may be used to override how the script is parallelised and to provide additional parameters for the script, e.g. # C* dc-parallel: true

The possible keys are:

cluster-parallel, whether the script can be run on multiple clusters in parallel. Default value is true.

dc-parallel, whether the script can be run on multiple data centers in the same cluster in parallel. Default value is false.

strategy, how many nodes within one data center the script can run on at the same time. Default is topology. Can be one of:

one, only one node per data center
topology, inspect topology and run on as many nodes as the topology allows
all, can be run on all nodes at once

description, specifies a description for the script used in the help message.

argument, specifies an additional input parameter for the script as a JSON object, with a help text, an optional default value and, optionally, "required":true (see the example script file below).
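As an illustration of the header format, the following Python sketch reads the # C* comment lines of a command script into a settings dictionary. It applies the defaults listed above and decodes each argument line as JSON. The key spellings follow the example script file below (cluster-parallel, dc-parallel). The function name and the rule that parsing stops at the first non-comment line are assumptions, not cstar's own parser.

```python
import json
from typing import Any, Dict, List

PREFIX = "# C* "


def parse_header(script_text: str) -> Dict[str, Any]:
    """Hypothetical parser for the '# C* key: value' lines after the shebang."""
    settings: Dict[str, Any] = {
        "cluster-parallel": True,  # default stated in the text
        "dc-parallel": False,      # default stated in the text
        "strategy": "topology",    # default stated in the text
        "description": None,
    }
    arguments: List[Dict[str, Any]] = []
    lines = script_text.splitlines()
    if not lines or not lines[0].startswith("#!"):
        raise ValueError("the first line must be a shebang")
    for line in lines[1:]:
        if not line.startswith("#"):
            break  # assumption: the header ends at the first non-comment line
        if not line.startswith(PREFIX):
            continue  # an ordinary comment, not a cstar setting
        # Split at the first colon only, so JSON values keep their own colons.
        key, _, value = line[len(PREFIX):].partition(":")
        key, value = key.strip(), value.strip()
        if key == "argument":
            arguments.append(json.loads(value))
        elif key in ("cluster-parallel", "dc-parallel"):
            settings[key] = value.lower() == "true"
        else:
            settings[key] = value
    settings["arguments"] = arguments
    return settings
```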

Job output

Cstar automatically saves the job status to a file during operation.

The standard output, standard error and exit status of each command run against a Cassandra host are saved locally on the machine where cstar is running. They are available under .cstar/jobs/<JOB_ID>/<HOSTNAME> in the user's home directory.

How jobs are run

When a new cstar job is created, it is assigned an ID, which is a UUID.

On each Cassandra host, cstar stores intermediate job output in the directory ~/.cstar/remote_jobs/<JOB_ID>. This directory contains files with the stdout, stderr and PID of the script, and once the script finishes, it also contains a file with its exit status.

Once the job finishes, these files are moved over to the machine running cstar and put in the directory ~/.cstar/jobs/<JOB_ID>/<REMOTE_HOST_NAME>.
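The naming scheme can be summarised in a few lines of Python: a fresh UUID per job, a remote working directory per job, and a local directory per job and host. The helper name `job_paths` and the host name "cass-node-1" are hypothetical. The sketch only builds paths and does not create directories or copy files.

```python
import uuid
from pathlib import PurePosixPath
from typing import Dict


def job_paths(job_id: str, remote_host: str, home: str = "~") -> Dict[str, PurePosixPath]:
    """Hypothetical helper returning the two directories described above.

    '~' is left unexpanded, as it would appear on a shell command line.
    """
    base = PurePosixPath(home) / ".cstar"
    return {
        # On each Cassandra host, while the script runs.
        "remote": base / "remote_jobs" / job_id,
        # On the machine running cstar, once that host has finished.
        "local": base / "jobs" / job_id / remote_host,
    }


if __name__ == "__main__":
    new_id = str(uuid.uuid4())  # every new job gets a random UUID
    for kind, path in job_paths(new_id, "cass-node-1").items():
        print(kind, path)
```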

Cstar jobs are run with nohup, which means that the job keeps running even if the SSH connection is severed. To kill a cstar script invocation on a specific host, you need to ssh to that host and kill the process; its PID is recorded in the remote job directory described above.
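A detached job that records its own stdout, stderr, PID and exit status can be sketched in Python. This illustration rests on assumptions and is not cstar's remote wrapper. The file names (out, err, pid, status) and the function name are hypothetical. `start_new_session=True` (which runs setsid) is used in place of nohup for a comparable effect: a hang-up on the SSH session does not reach the job.

```python
import shlex
import subprocess
from pathlib import Path


def launch_detached(command: str, job_dir: Path) -> subprocess.Popen:
    """Hypothetical launcher: run `command` detached, recording out/err/pid/status.

    A small sh wrapper writes the exit status itself, so the status file
    appears even if the launching process or the SSH session goes away.
    """
    job_dir.mkdir(parents=True, exist_ok=True)
    status = shlex.quote(str(job_dir / "status"))
    # The subshell keeps an `exit` inside `command` from skipping the status write.
    wrapper = f"( {command} ); echo $? > {status}"
    with open(job_dir / "out", "wb") as out, open(job_dir / "err", "wb") as err:
        proc = subprocess.Popen(
            ["sh", "-c", wrapper],
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=err,
            start_new_session=True,  # own session and process group, no controlling tty
        )
    # The new process group has the wrapper's PID as its ID, so
    # `kill -- -<pid>` stops the wrapper together with the command.
    (job_dir / "pid").write_text(f"{proc.pid}\n")
    return proc
```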

If a job is halted halfway, either by pressing ^C or by using the --stop-after parameter, it can be restarted using cstar continue <JOB_ID>. If the script had already finished, or was still running, on a host when cstar shut down, it is not rerun on that host.
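The resume rule can be expressed as a small filter over per-host states. The state names, the file names used to infer a state (pid, status) and both functions are hypothetical. The sketch only encodes the rule stated above: hosts whose script had finished, or was already running, when cstar stopped are not started again.

```python
from enum import Enum
from typing import Dict, Iterable, List


class HostState(Enum):
    """Hypothetical per-host states for a halted job."""
    PENDING = "pending"    # never started
    RUNNING = "running"    # started, no exit status yet
    FINISHED = "finished"  # exit status recorded


def state_from_files(files: Iterable[str]) -> HostState:
    """Infer a host's state from the names in its remote job directory (assumed names)."""
    names = set(files)
    if "status" in names:
        return HostState.FINISHED
    if "pid" in names:
        return HostState.RUNNING
    return HostState.PENDING


def hosts_to_start_on_continue(states: Dict[str, HostState]) -> List[str]:
    """Hosts that still need to be started; finished and running ones are skipped."""
    return [host for host, state in states.items() if state is HostState.PENDING]
```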

Cleaning up old jobs

Even on successful completion, the output of a cstar job is not deleted. This makes it easy to check the output of a script after it has completed. The downside is that a lot of data can accumulate in ~/.cstar/jobs. To clean things up, use cstar cleanup-jobs. By default it removes all jobs older than one week. You can override the maximum age of a job before it is deleted with the --max-job-age parameter.
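Age-based cleanup can be sketched as follows. The one-week default comes from the text above. Two details are assumptions for this illustration: judging a job's age by its directory's modification time, and the function name `old_job_dirs`. Ages are in seconds here; this document does not state the unit that --max-job-age expects. The sketch only lists candidates. A real cleanup would then delete them, for example with shutil.rmtree.

```python
import time
from pathlib import Path
from typing import List, Optional

ONE_WEEK = 7 * 24 * 60 * 60  # seconds; the default maximum job age


def old_job_dirs(
    jobs_root: Path, max_age_seconds: float = ONE_WEEK, now: Optional[float] = None
) -> List[Path]:
    """Hypothetical helper: job directories last modified more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    if not jobs_root.is_dir():
        return []
    return sorted(
        entry
        for entry in jobs_root.iterdir()
        if entry.is_dir() and now - entry.stat().st_mtime > max_age_seconds
    )
```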

Examples

# cstar run --command='service cassandra restart' --seed-host some-host

Explanation: Run the shell command service cassandra restart on every node of a cluster, using the pre-packaged run script. If necessary, add sudo to the command.

# cstar puppet-upgrade-cassandra --seed-host some-host --puppet-branch=cass-2.2-upgrade

Explanation: Run the command puppet-upgrade-cassandra on a cluster. The puppet-upgrade-cassandra command expects a parameter: the puppet branch to switch to in order to perform the Cassandra upgrade. See the puppet-upgrade-cassandra example script file below.

# cstar puppet-upgrade-cassandra --help

Explanation: Show help for the puppet-upgrade-cassandra command. This includes documentation for any additional command-specific switches for the puppet-upgrade-cassandra command.

# cstar continue 90642c11-4714-44c4-a13a-94b86f09e3bb

Explanation: Resume the previously created job with job ID 90642c11-4714-44c4-a13a-94b86f09e3bb. The job ID is the first line of output cstar prints when it executes a job.

Example script file

This is an example script file that would be saved to ~/.cstar/commands/puppet-upgrade-cassandra.sh. It upgrades a Cassandra cluster by taking a snapshot, running puppet on a different branch, restarting the node and then upgrading the SSTables.

#!/usr/bin/env bash
# C* cluster-parallel: true
# C* dc-parallel: true
# C* strategy: topology
# C* description: Upgrade one or more clusters by switching to a different puppet branch
# C* argument: {"option":"--snapshot-name", "name":"SNAPSHOT_NAME", "description":"Name of pre-upgrade snapshot", "default":"preupgrade"}
# C* argument: {"option":"--puppet-branch", "name":"PUPPET_BRANCH", "description":"Name of puppet branch to switch to", "required":true}
# Stop at the first failing step, so the exit status cstar records reflects the failure.
set -e
nodetool snapshot -t "$SNAPSHOT_NAME"
sudo puppet --branch "$PUPPET_BRANCH"
sudo service cassandra restart
# Give the restarted node up to about five minutes to accept nodetool connections.
for _ in $(seq 60); do
  nodetool version >/dev/null 2>&1 && break
  sleep 5
done
nodetool upgradesstables
