This is part 1 of a Cassandra cluster tutorial series. Part 1 uses Vagrant to set up a local Cassandra cluster and installs Cassandra on the boxes. Later parts of this series will set up Ansible/ssh for DevOps/DBA tasks, use Packer to create EC2 AMIs and instances, and set up a Cassandra cluster in EC2.
The cassandra-image project (on GitHub) creates CentOS 7/Cassandra images for Docker, VirtualBox/Vagrant, and AWS/EC2 using best practices for Cassandra Linux/OS setup, plus utilities to auto-configure Cassandra based on the ergonomics of the environment.
It is nice to use Vagrant and/or Docker for local development, so we support both. Because it is currently hard to develop systemd services with Docker, and we do a lot of systemd development, we use Vagrant here. Our real target, for the most part, is EC2, AWS, VPCs, and so on.
The cassandra-image project packages utilities that run as systemd services to monitor:
the OS, sending metrics to AWS CloudWatch Metrics;
Cassandra logs, sending them to AWS CloudWatch Logs;
Cassandra stats, sending them to AWS CloudWatch Metrics.
The cassandra-image project uses the cassandra-cloud project to configure Cassandra running on instances, which aids in setting up the cluster.
With this in mind, let’s set up Vagrant to launch a Cassandra cluster locally.
Using Vagrant, we are going to set up three nodes that use our provision scripts to install Cassandra and its utilities:
192.168.50.4 cassandra node0
192.168.50.5 cassandra node1
192.168.50.6 cassandra node2
Cassandra Cluster: Set Up Network of Boxes Using Vagrant
Vagrant.configure("2") do |config|
# Use CentOS 7
config.vm.box = "centos/7"
# Set up 4 CPUs and 3096 MB of memory for each instance
config.vm.provider "virtualbox" do |vb|
vb.memory = "3096"
vb.cpus = 4
end
# Run the provision install scripts
config.vm.provision "shell", inline: <<-SHELL
sudo /vagrant/scripts/000-vagrant-provision.sh
SHELL
config.vm.define "node0" do |node0|
...
# Node 0 is 192.168.50.4
node0.vm.network "private_network", ip: "192.168.50.4"
...
end
config.vm.define "node1" do |node1|
...
# Node 1 is 192.168.50.5
node1.vm.network "private_network", ip: "192.168.50.5"
...
end
config.vm.define "node2" do |node2|
...
# Node 2 is 192.168.50.6
node2.vm.network "private_network", ip: "192.168.50.6"
end
end
Notice that we set up three boxes on a private network from the same Vagrantfile.
In this example, we will use these three servers as seed nodes. Seed nodes are the Cassandra nodes that other Cassandra nodes contact first when joining the cluster. It is a good idea to have two or three seed nodes, as having only one would be a single point of failure (SPOF).
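Cassandra reads its seed list from the seed_provider section of cassandra.yaml. Assuming the install location used in this tutorial (/opt/cassandra) and the stock SimpleSeedProvider, you could verify the rendered seed list with something like this (the exact conf path may vary by image):

```shell
$ grep -A 4 "seed_provider:" /opt/cassandra/conf/cassandra.yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.50.4,192.168.50.5,192.168.50.6"
```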
In this example, we will use the cassandra-cloud utility to configure the seed nodes, and to tell Cassandra which address to listen on for clustering (the storage network) and which address to listen on for client connections.
cassandra-cloud (an open-source utility written in Go) helps you install and configure Cassandra for cloud environments based on server ergonomics (number of data centers, number of disks, number of cores, type of disk). It works well in Docker, Heroku, Mesos/Marathon, Kubernetes, EC2, VirtualBox, and similar environments. For example, it can be kicked off as a USER_DATA script in Amazon EC2 (AWS EC2), and if you change the size of the EC2 instance, it can adjust the Cassandra settings accordingly. cassandra-cloud usually runs once, when an instance is first launched, and then never again (unless you redeploy on a larger EC2 instance).
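As a sketch of the EC2 USER_DATA idea: the flags mirror the ones used later in this tutorial, while the metadata lookup and the 10.0.0.x seed addresses are purely illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical USER_DATA sketch for an EC2 instance built from cassandra-image.
# Look up this instance's private IP from the EC2 instance metadata service.
PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Configure Cassandra for this instance, then start it.
/opt/cassandra/bin/cassandra-cloud -cluster-name test \
    -client-address "${PRIVATE_IP}" \
    -cluster-address "${PRIVATE_IP}" \
    -cluster-seeds 10.0.0.4,10.0.0.5,10.0.0.6
/opt/cassandra/bin/cassandra -R
```

Because the instance's own address is looked up at boot time, the same script works unchanged across instances; only the seed list is fixed.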
Using cassandra-cloud from Vagrant for Cassandra Cluster Nodes
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
...
config.vm.define "node0" do |node0|
...
node0.vm.network "private_network", ip: "192.168.50.4"
### Use Cassandra cloud to configure Cassandra before launching it.
### Set the cluster name to test, set the client-address and the cluster-address.
### Also setup the Cassandra seed nodes.
node0.vm.provision "shell", inline: <<-SHELL
sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
-client-address 192.168.50.4 \
-cluster-address 192.168.50.4 \
-cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
/opt/cassandra/bin/cassandra -R
SHELL
end
config.vm.define "node1" do |node1|
...
node1.vm.provision "shell", inline: <<-SHELL
sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
-client-address 192.168.50.5 \
-cluster-address 192.168.50.5 \
-cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
/opt/cassandra/bin/cassandra -R
SHELL
end
config.vm.define "node2" do |node2|
...
node2.vm.provision "shell", inline: <<-SHELL
sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
-client-address 192.168.50.6 \
-cluster-address 192.168.50.6 \
-cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
/opt/cassandra/bin/cassandra -R
SHELL
end
...
end
Above, you can see that cassandra-cloud is invoked as the provision shell for each individual box; it installs and configures Cassandra. It sets the name of the Cassandra cluster (test, via the -cluster-name command-line argument), the address to bind the Cassandra client transport to (-client-address), the address to bind the cluster transport (storage transport) to (-cluster-address), and the list of seed nodes (-cluster-seeds).
We could start ten more servers and would not have to change the seed nodes. New servers learn the topology of the cluster from one or more of the Cassandra seed nodes specified with -cluster-seeds.
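For example, a fourth box could join the cluster using the same seed list. A hypothetical node3 definition (the 192.168.50.7 address is an assumption, not part of the original three boxes) would look like:

```ruby
config.vm.define "node3" do |node3|
  # Node 3 is 192.168.50.7 (hypothetical additional node)
  node3.vm.network "private_network", ip: "192.168.50.7"
  # Same seed list as the other nodes; only the addresses change.
  node3.vm.provision "shell", inline: <<-SHELL
    sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
        -client-address 192.168.50.7 \
        -cluster-address 192.168.50.7 \
        -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
    /opt/cassandra/bin/cassandra -R
  SHELL
end
```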
The cassandra-cloud utility can read settings from environment variables, so it works well in Mesos, Docker, Heroku, Kubernetes, or any 12-factor DevOps environment. In later tutorials in this Cassandra tutorial series, we will use cassandra-cloud with AWS/EC2 when we cover AWS Cassandra. cassandra-cloud can also read properties from a config file and from the command line. Environment variables override config-file settings, and command-line arguments override environment variables. The cassandra-image project creates a cassandra-cloud config file and config templates that can be modified. The cassandra-cloud utility can set up memory, threads, number of workers, and so on for Cassandra. You can set values explicitly, or they can be derived from the ergonomics of the server.
OK. Let’s test our Cassandra cluster. Here we will use Vagrant to start up the cluster. Then we will log into one of the nodes (node0) and run the Cassandra nodetool command to see which servers are connected to the cluster.
Testing Our Cassandra Cluster Setup with nodetool
$ vagrant up
$ vagrant ssh node0
[vagrant@localhost ~]$ ps -ef | grep cassandra
root 12414 1 2 19:16 ? 00:00:26 java -Xloggc:/opt/cassandra/bin/../logs/gc.log
...
$ /opt/cassandra/bin/nodetool describecluster
Cluster Information:
Name: test
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]
We can see that three servers make up the Cassandra cluster, namely 192.168.50.4, 192.168.50.5, and 192.168.50.6. You can see the full Vagrantfile on GitHub.
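Beyond describecluster, nodetool status shows each node's up/down state, token count, and data ownership. The output below is trimmed, with load, ownership, and host IDs elided:

```shell
$ /opt/cassandra/bin/nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load    Tokens  Owns  Host ID  Rack
UN  192.168.50.4  ...     256     ...   ...      rack1
UN  192.168.50.5  ...     256     ...   ...      rack1
UN  192.168.50.6  ...     256     ...   ...      rack1
```

UN means Up/Normal; a joining node would show UJ, and a downed node DN.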
More to Come from this Cassandra Cluster tutorial
Check back with us at the Cloudurable blog to find out more about cassandra-image and cassandra-cloud. We have a follow-up article where we set up SSL encryption for Cassandra: SSL for the client transport and the cluster transport, and then SSL for cqlsh so you can connect to your remote instance securely.
Cloudurable provides Cassandra support, consulting, and training, as well as Cassandra examples like AWS CloudFormation templates, Packer, and Ansible for common Cassandra DBA and DevOps tasks. We also provide monitoring tools and images (AMI/Docker) to support Cassandra running in production in EC2. Our advanced Cassandra courses teach how to develop, support, and deploy Cassandra to production in AWS EC2, and are geared toward DevOps engineers, architects, and DBAs.