8/2/2018

riptano/ccm

by John Doe

A script/library to create, launch and remove an Apache Cassandra cluster on localhost.

The goal of ccm and ccmlib is to make it easy to create, manage and destroy a small Cassandra cluster on a local box. It is meant for testing a Cassandra cluster.

Requirements

A working Python installation (tested with Python 2.7).
pyYAML (http://pyyaml.org/ -- sudo easy_install pyYaml)
six (https://pypi.org/project/six/ -- sudo easy_install six)
ant (http://ant.apache.org/, on Mac OS X, brew install ant)
psutil (https://pypi.org/project/psutil/)
Java (which version depends on the version of Cassandra you plan to use. If unsure, use Java 7 as it is known to work with current versions of Cassandra).
If you want to create multi-node clusters, the simplest way is to use multiple loopback aliases. On modern Linux distributions you probably don't need to do anything, but on Mac OS X you will need to create the aliases with
```
sudo ifconfig lo0 alias 127.0.0.2 up
sudo ifconfig lo0 alias 127.0.0.3 up
...
```
Note that the usage section assumes that at least 127.0.0.1, 127.0.0.2 and 127.0.0.3 are available.
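
As a minimal sketch, the alias commands for a cluster of any size can be generated rather than typed by hand. The helper name `loopback_alias_commands` is hypothetical and not part of ccm; it only prints the same `ifconfig` lines shown above and does not run them.

```python
def loopback_alias_commands(node_count, interface="lo0"):
    """Return one ifconfig command per extra node.

    127.0.0.1 already exists, so aliases are only needed for nodes 2..node_count.
    """
    return [
        "sudo ifconfig {} alias 127.0.0.{} up".format(interface, n)
        for n in range(2, node_count + 1)
    ]


if __name__ == "__main__":
    # Print the commands for a 3-node cluster; review them before running.
    for command in loopback_alias_commands(3):
        print(command)
```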

Optional Requirements

Paramiko (http://www.paramiko.org/): Paramiko adds the ability to execute CCM remotely; pip install paramiko

Note: The remote machine must be configured with an SSH server and a working CCM installation. When working with multiple nodes, the exposed IP addresses must be in sequential order. For example, the last digit of the 4th octet of an IPv4 address must start at 1 (e.g. 192.168.33.11). See the Vagrantfile for help configuring a remote CCM machine.

Known issues

Windows only:

node start pops up a window, stealing focus.
cqlsh started from ccm shows incorrect prompts on the command prompt.
Command-line options that are not nodetool-based fail (sstablesplit, scrub, etc.).
To install psutil, you must use the .msi from PyPI; pip install psutil will not work.
You will need ant.bat in your PATH in order to build C* from source
You must run with an Unrestricted Powershell Execution-Policy if using Cassandra 2.1.0+
Ant installed via chocolatey will not be found by ccm, so you must create a symbolic link in order to fix the issue (as administrator):
cmd /c mklink C:\ProgramData\chocolatey\bin\ant.bat C:\ProgramData\chocolatey\bin\ant.exe

Remote Execution only:

Using --config-dir and --install-dir with create may not work as expected. Because the configuration directory and the installation directory contain lots of files, they will not be copied over to the remote machine like most other options for cluster and node operations.
cqlsh started from ccm using remote execution will not start properly (e.g. ccm --ssh-host 192.168.33.11 node1 cqlsh); however, -x <CMDS> or --exec=CMDS can still be used to execute a CQLSH command on a remote node.

Installation

ccm uses python distutils so from the source directory run:

sudo ./setup.py install

ccm is available on the Python Package Index:

pip install ccm

There is also a Homebrew package available:

brew install ccm

Usage

Let's say you wanted to fire up a 3 node Cassandra cluster.

Short version

ccm create test -v 2.0.5 -n 3 -s

You will of course want to replace 2.0.5 with whichever version of Cassandra you want to test.

Longer version

ccm works from a Cassandra source tree (not the jars). There are two ways to tell ccm how to find the sources:

If you have downloaded and compiled Cassandra sources, you can ask ccm to use those by initiating a new cluster with:

ccm create test --install-dir=<path/to/cassandra-sources>

or, from that source tree directory, simply
```
ccm create test
```
You can ask ccm to use a released version of Cassandra. For instance to use Cassandra 2.0.5, run
```
ccm create test -v 2.0.5
```
ccm will download the binary (from http://archive.apache.org/dist/cassandra), and set the new cluster to use it. This means that this command can take a few minutes the first time you create a cluster for a given version. ccm saves the compiled source in ~/.ccm/repository/, so creating a cluster for that version will be much faster the second time you run it (note however that if you create a lot of clusters with different versions, this will take up disk space).
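
To keep an eye on the disk space mentioned above, a short sketch like the following can total the size of the repository directory. The function name `directory_size_bytes` is hypothetical; only the `~/.ccm/repository` path comes from this README.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (0 if root does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    repository = os.path.expanduser("~/.ccm/repository")
    size_mib = directory_size_bytes(repository) / (1024.0 * 1024.0)
    print("{}: {:.1f} MiB".format(repository, size_mib))
```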

Once the cluster is created, you can populate it with 3 nodes with:

ccm populate -n 3

On Mac OS X, create a new loopback alias for every node besides the first. For example, if you populated your cluster with 3 nodes, create aliases for 127.0.0.2 and 127.0.0.3 like so:

sudo ifconfig lo0 alias 127.0.0.2
sudo ifconfig lo0 alias 127.0.0.3

Note these aliases will disappear on reboot. For permanent network aliases on Mac OSX see .

After that execute:

ccm start

That will start 3 nodes on IP 127.0.0.[1, 2, 3] on port 9160 for thrift, port 7000 for the internal cluster communication and ports 7100, 7200 and 7300 for JMX. You can check that the cluster is correctly set up with

ccm node1 ring
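
The address and port layout described above can be written down as a small sketch. The names `NodeEndpoints` and `local_node_endpoints` are hypothetical, and the JMX formula is only an extrapolation of the 7100, 7200, 7300 pattern in this README, not something read from ccm itself.

```python
from collections import namedtuple

NodeEndpoints = namedtuple(
    "NodeEndpoints", "name address thrift_port storage_port jmx_port"
)


def local_node_endpoints(node_number):
    """Return the endpoints this README describes for node N of a local cluster."""
    return NodeEndpoints(
        name="node{}".format(node_number),
        address="127.0.0.{}".format(node_number),
        thrift_port=9160,                   # same thrift port on every node
        storage_port=7000,                  # internal cluster communication
        jmx_port=7000 + 100 * node_number,  # 7100, 7200, 7300, ...
    )


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(local_node_endpoints(n))
```

Each node can reuse ports 9160 and 7000 because it binds them on its own loopback address; only the JMX port differs per node.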

You can then bootstrap a 4th node with

ccm add node4 -i 127.0.0.4 -j 7400 -b

(populate is just a shortcut for adding multiple nodes initially)

ccm provides a number of conveniences, like flushing all of the nodes of the cluster:

ccm flush

or only one node:

ccm node2 flush

You can also easily look at the log file of a given node with:

ccm node1 showlog

Finally, you can get rid of the whole cluster (which will stop the nodes and remove all the data) with

ccm remove

The list of other provided commands is available through

ccm

Each command is then documented through the -h (or --help) flag. For instance, ccm add -h describes the options for ccm add.

Remote Usage (SSH/Paramiko)

All the usage examples above will work exactly the same for a remotely configured machine; however remote options are required in order to establish a connection to the remote machine before executing the CCM commands:

Argument	Value	Description
--ssh-host	string	Hostname or IP address to use for SSH connection
--ssh-port	int	Port to use for SSH connection (default is 22)
--ssh-username	string	Username to use for username/password or public key authentication
--ssh-password	string	Password to use for username/password or private key passphrase using public key authentication
--ssh-private-key	filename	Private key to use for SSH connection
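
When scripting remote runs, the options in the table above can be assembled into an argument list. This is a sketch only: `build_remote_ccm_args` is a hypothetical helper, and it assumes every option accepts the `--flag=value` form that this README shows for `--ssh-host`, `--ssh-username` and `--ssh-password`.

```python
def build_remote_ccm_args(host, command, port=None, username=None,
                          password=None, private_key=None):
    """Build an argv list: ccm, the SSH options that were given, then the command."""
    args = ["ccm", "--ssh-host={}".format(host)]
    optional = [
        ("--ssh-port", port),
        ("--ssh-username", username),
        ("--ssh-password", password),
        ("--ssh-private-key", private_key),
    ]
    for flag, value in optional:
        # Leave out options that were not supplied so ccm applies its defaults.
        if value is not None:
            args.append("{}={}".format(flag, value))
    return args + list(command)


if __name__ == "__main__":
    argv = build_remote_ccm_args(
        "192.168.33.11", ["node1", "ring"], username="vagrant", password="vagrant"
    )
    print(" ".join(argv))
```

The resulting list can be passed to `subprocess.call` on a machine where ccm and Paramiko are installed.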

Special Handling

Some commands require files to be located on the remote server. Those commands are pre-processed, file transfers are initiated, and updates are made to the argument value for the remote execution of the CCM command:

Parameter	Description
`--dse-credentials`	Copy local DSE credentials file to remote server
`--node-ssl`	Recursively copy node SSL directory to remote server
`--ssl`	Recursively copy SSL directory to remote server

Short Version

ccm --ssh-host=192.168.33.11 --ssh-username=vagrant --ssh-password=vagrant create test -v 2.0.5 -n 3 -i 192.168.33.1 -s

Note: -i is used to add an IP prefix during the create process to ensure that the nodes communicate using the proper IP address for their node.
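
To illustrate the prefix, the sketch below assumes that the node number is appended to the `-i` value, which is what the 192.168.33.11 example in the Optional Requirements note suggests. The helper `remote_node_addresses` is hypothetical and is not how ccm itself computes addresses.

```python
def remote_node_addresses(ip_prefix, node_count):
    """Append node numbers 1..node_count to ip_prefix and validate the result."""
    addresses = []
    for node_number in range(1, node_count + 1):
        address = "{}{}".format(ip_prefix, node_number)
        octets = address.split(".")
        # A usable result has four numeric octets, each no larger than 255.
        valid = len(octets) == 4 and all(
            o.isdigit() and int(o) <= 255 for o in octets
        )
        if not valid:
            raise ValueError("not a valid IPv4 address: {}".format(address))
        addresses.append(address)
    return addresses


if __name__ == "__main__":
    print(remote_node_addresses("192.168.33.1", 3))
```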

Source Distribution

If you'd like to use a source distribution instead of the default binary each time (for example, for Continuous Integration), you can prefix the Cassandra version with source:, for example:

ccm create test -v source:2.0.5 -n 3 -s

Automatic Version Fallback

If 'binary:' or 'source:' is not explicitly specified in your version string, then ccm will fall back to building the requested version from git if it cannot access the Apache mirrors.

Git and GitHub

To use the latest version from the canonical Apache Git repository, use the version name git:branch-name, e.g.:

ccm create trunk -v git:trunk -n 5

and to download a branch from a GitHub fork of Cassandra, you can prefix the repository and branch with github:, e.g.:

ccm create patched -v github:jbellis/trunk -n 1
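
The version strings accepted by -v follow a simple prefix convention, which the sketch below makes explicit. `classify_version_spec` and the "default" label are hypothetical names for illustration; this is not ccm's own parsing code.

```python
def classify_version_spec(spec):
    """Split a -v argument into (kind, value) based on its prefix."""
    for prefix in ("binary", "source", "git", "github"):
        marker = prefix + ":"
        if spec.startswith(marker):
            return prefix, spec[len(marker):]
    # No prefix: a plain release number, subject to the automatic fallback.
    return "default", spec


if __name__ == "__main__":
    for spec in ("2.0.5", "source:2.0.5", "git:trunk", "github:jbellis/trunk"):
        print(spec, "->", classify_version_spec(spec))
```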

Bash command-line completion

ccm has many sub-commands for both cluster commands as well as node commands, and sometimes you don't quite remember the name of the sub-command you want to invoke. Also, command lines may be long due to long cluster or node names.

Leverage bash's programmable completion feature to make ccm use more pleasant. Copy misc/ccm-completion.bash to somewhere in your home directory (or /etc if you want to make it accessible to all users of your system) and source it in your .bash_profile:

. ~/scripts/ccm-completion.bash

Once set up, ccm sw<tab> expands to ccm switch, for example. The switch sub-command has extra completion logic to help complete the cluster name. So ccm switch cl<tab> would expand to ccm switch cluster-58 if cluster-58 is the only cluster whose name starts with "cl". If there is ambiguity, hitting <tab> a second time shows the choices that match:

$ ccm switch cl<tab>
    ... becomes ...
$ ccm switch cluster-
    ... then hit tab twice ...
cluster-56  cluster-85  cluster-96
$ ccm switch cluster-8<tab>
    ... becomes ...
$ ccm switch cluster-85

It dynamically determines the available sub-commands based on the ccm being invoked. Thus, completion keeps working for users who run multiple ccm installations (or a ccm that they continuously update with new commands).

The completion script relies on ccm having two hidden subcommands:

show-cluster-cmds - emits the names of cluster sub-commands.
show-node-cmds - emits the names of node sub-commands.

Thus, it will not work with sufficiently old versions of ccm.
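
The completion behaviour shown in the transcript above can be modelled in a few lines. This is a Python illustration of the general prefix-completion idea, not the bash script shipped in misc/ccm-completion.bash, and the function name `complete` is hypothetical.

```python
import os


def complete(prefix, candidates):
    """Return (expanded_text, matches) for a partially typed word."""
    matches = sorted(c for c in candidates if c.startswith(prefix))
    if not matches:
        return prefix, []
    # Expand to the longest prefix shared by every match; with a single
    # match this is the whole word.
    return os.path.commonprefix(matches), matches


if __name__ == "__main__":
    clusters = ["cluster-56", "cluster-85", "cluster-96"]
    print(complete("cl", clusters))
    print(complete("cluster-8", clusters))
```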

Remote debugging

If you would like to connect to your Cassandra nodes with a remote debugger you have to pass the -d (or --debug) flag to the populate command:

ccm populate -d -n 3

That will populate 3 nodes on IP 127.0.0.[1, 2, 3] setting up the remote debugging on ports 2100, 2200 and 2300. The main thread will not be suspended so you don't have to connect with a remote debugger to start a node.
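
Before attaching a debugger, it can help to confirm that a node's debug port is actually listening. The sketch below uses only the standard `socket` module; `debug_port` and `port_is_open` are hypothetical helpers, and the port formula is an extrapolation of the 2100, 2200, 2300 pattern stated above.

```python
import socket


def debug_port(node_number):
    """Remote debugging port for node N when populated with -d (2100, 2200, ...)."""
    return 2000 + 100 * node_number


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    connection.close()
    return True


if __name__ == "__main__":
    for n in (1, 2, 3):
        host = "127.0.0.{}".format(n)
        print(host, debug_port(n), port_is_open(host, debug_port(n)))
```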

Alternatively, you can specify a remote debug port with the -r (or --remote-debug-port) flag while adding a node:

ccm add node4 -r 5005 -i 127.0.0.4 -j 7400 -b

Where things are stored

By default, ccm stores all the node data and configuration files under ~/.ccm/cluster_name/. This can be overridden using the --config-dir option with each command.

DataStax Enterprise

CCM 2.0 supports creating and interacting with DSE clusters. The --dse option must be used with the ccm create command. See the ccm create -h help for assistance.

CCM Lib

The ccm facilities are available programmatically through ccmlib. This could be used to implement automated tests against Cassandra. A simple example of how to use ccmlib follows:

import ccmlib.cluster
CLUSTER_PATH="."
cluster = ccmlib.cluster.Cluster(CLUSTER_PATH, 'test', cassandra_version='2.1.14')
cluster.populate(3).start()
[node1, node2, node3] = cluster.nodelist()
# do some tests on the cluster/nodes. To connect to a node through thrift,
# the host and port to a node is available through
#   node.network_interfaces['thrift']
cluster.flush()
node2.compact()
# do some other tests
# after the test, you can leave the cluster running, you can stop all nodes
# using cluster.stop() but keep the data around (in CLUSTER_PATH/test), or
# you can remove everything with cluster.remove()
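
For automated tests, one possible pattern is to wrap the calls above in a context manager so the cluster is removed even when a test fails. This is a sketch: `running_cluster` is a hypothetical helper, and it relies only on the `populate`, `start` and `remove` methods used in the example above.

```python
from contextlib import contextmanager


@contextmanager
def running_cluster(cluster, node_count=3):
    """Populate and start cluster, then remove it when the block exits."""
    try:
        cluster.populate(node_count).start()
        yield cluster
    finally:
        # Runs on success and on failure, so no data is left behind.
        cluster.remove()


# Usage with ccmlib (needs ccm and a Cassandra download, so it is left commented out):
#   import ccmlib.cluster
#   cluster = ccmlib.cluster.Cluster(".", "test", cassandra_version="2.1.14")
#   with running_cluster(cluster) as c:
#       node1, node2, node3 = c.nodelist()
#       c.flush()
```

Use cluster.stop() in place of cluster.remove() if the data should be kept for inspection after a failure.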

-- Sylvain Lebresne sylvain@datastax.com
