This is part 1 of a Cassandra cluster tutorial series. Part 1 uses Vagrant to set up a local Cassandra cluster and installs Cassandra on the boxes. Later parts of this series will set up Ansible/ssh for DevOps/DBA tasks, use Packer to create EC2 AMIs and instances, and set up a Cassandra cluster in EC2.
The cassandra-image project (on GitHub) creates CentOS 7/Cassandra images for Docker, VirtualBox/Vagrant and AWS/EC2 using best practices for Cassandra Linux/OS setup, and includes utilities to auto-configure Cassandra based on the ergonomics of the environment.
It is nice to use Vagrant and/or Docker for local development, so we support both. At this time, it is hard to develop systemd services using Docker, and since we do a lot of systemd development, we use Vagrant here. Our real target, for the most part, is AWS: EC2, VPCs, etc.
The cassandra-image project packages utilities that run as systemd services to monitor:
logs from Cassandra and send them to AWS CloudWatch Logs.
Cassandra stats and send them to AWS CloudWatch Metrics.
With this in mind, let’s set up Vagrant to launch a Cassandra cluster locally.
We are going to set up three nodes using Vagrant, as follows, which use our provision scripts to install Cassandra and the utilities:
192.168.50.4 cassandra node0
192.168.50.5 cassandra node1
192.168.50.6 cassandra node2
Cassandra Cluster: Set Up Network of Boxes Using Vagrant
Vagrant.configure("2") do |config|

  # Use CentOS 7
  config.vm.box = "centos/7"

  # Set up 4 CPUs and 3096 MB of memory for each instance
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "3096"
    vb.cpus = 4
  end

  # Run the provision install scripts
  config.vm.provision "shell", inline: <<-SHELL
    sudo /vagrant/scripts/000-vagrant-provision.sh
  SHELL

  config.vm.define "node0" do |node0|
    ...
    # Node 0 is 192.168.50.4
    node0.vm.network "private_network", ip: "192.168.50.4"
    ...
  end

  config.vm.define "node1" do |node1|
    ...
    # Node 1 is 192.168.50.5
    node1.vm.network "private_network", ip: "192.168.50.5"
    ...
  end

  config.vm.define "node2" do |node2|
    ...
    # Node 2 is 192.168.50.6
    node2.vm.network "private_network", ip: "192.168.50.6"
  end
end
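The three node definitions differ only in their name and address, so the name-to-IP mapping could be generated instead of typed out. The following is a minimal plain-Ruby sketch, not taken from the cassandra-image Vagrantfile; the helper name cluster_nodes and the two constants are hypothetical and exist only for illustration.

```ruby
# Hypothetical helper (not part of cassandra-image): derive node names and
# private-network IPs for the cluster from a base prefix and a starting host.
BASE_IP_PREFIX = "192.168.50."
FIRST_HOST = 4

def cluster_nodes(count)
  (0...count).map do |i|
    { name: "node#{i}", ip: "#{BASE_IP_PREFIX}#{FIRST_HOST + i}" }
  end
end

# Print the same host table used in this tutorial.
cluster_nodes(3).each do |node|
  puts "#{node[:ip]} cassandra #{node[:name]}"
end
```

Inside a Vagrantfile, each entry could be passed to config.vm.define and node.vm.network "private_network" in a loop, which keeps the addresses in one place if the cluster grows.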
In this example, we will use these three servers as seed nodes. Seed nodes are the Cassandra nodes that other nodes contact first when they join the Cassandra cluster. It is a good idea to have two or three seed nodes, as having only one would be a SPOF (single point of failure).
In this example, we will use the utility cassandra-cloud to configure the seed nodes. We will also use cassandra-cloud to tell Cassandra which address to listen on for clustering (storage network), and which address to listen on for client connections.
The cassandra-cloud utility (open source, written in Go) helps you install and configure Cassandra for cloud environments based on server ergonomics (number of data centers, number of disks, number of cores, type of disk). It works well in Docker, Heroku, Mesos/Marathon, Kubernetes, EC2, and VirtualBox environments (and similar environments). For example, it could be kicked off as a USER_DATA script in Amazon EC2 (AWS EC2), and if you change the size of the EC2 instance, it can adjust the Cassandra settings accordingly. cassandra-cloud usually runs once when an instance is first launched and then never again (unless you redeploy on a larger EC2 instance).
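To make "configuring from server ergonomics" concrete, here is a minimal Ruby sketch modeled on the default heap-sizing rule in Cassandra's stock cassandra-env.sh script. It is an assumption that cassandra-cloud applies anything similar; the function names max_heap_mb and new_gen_mb are hypothetical and the rule is shown only to illustrate deriving settings from memory and core count.

```ruby
# Sketch of the heap heuristic from Cassandra's stock cassandra-env.sh.
# Assumption: cassandra-cloud may use different rules; this only illustrates
# the idea of deriving settings from the size of the server.

# Heap = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
def max_heap_mb(system_memory_mb)
  half = [system_memory_mb / 2, 1024].min
  quarter = [system_memory_mb / 4, 8192].min
  [half, quarter].max
end

# Young generation = min(100 MB per core, 1/4 of the heap)
def new_gen_mb(heap_mb, cores)
  [100 * cores, heap_mb / 4].min
end

# The Vagrant boxes in this tutorial have 3096 MB of memory and 4 CPUs.
heap = max_heap_mb(3096)
puts "MAX_HEAP_SIZE=#{heap}M HEAP_NEWSIZE=#{new_gen_mb(heap, 4)}M"
```

With the box size used here, the sketch yields a 1024 MB heap and a 256 MB young generation; a larger instance would produce larger values without any manual change.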
Using Cassandra-Cloud From Vagrant for Cassandra Cluster nodes
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  ...
  config.vm.define "node0" do |node0|
    ...
    node0.vm.network "private_network", ip: "192.168.50.4"

    ### Use Cassandra cloud to configure Cassandra before launching it.
    ### Set the cluster name to test, set the client-address and the cluster-address.
    ### Also setup the Cassandra seed nodes.
    node0.vm.provision "shell", inline: <<-SHELL
      sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
        -client-address 192.168.50.4 \
        -cluster-address 192.168.50.4 \
        -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
      /opt/cassandra/bin/cassandra -R
    SHELL
  end

  config.vm.define "node1" do |node1|
    ...
    node1.vm.provision "shell", inline: <<-SHELL
      sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
        -client-address 192.168.50.5 \
        -cluster-address 192.168.50.5 \
        -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
      /opt/cassandra/bin/cassandra -R
    SHELL
  end

  config.vm.define "node2" do |node2|
    ...
    node2.vm.provision "shell", inline: <<-SHELL
      sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
        -client-address 192.168.50.6 \
        -cluster-address 192.168.50.6 \
        -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
      /opt/cassandra/bin/cassandra -R
    SHELL
  end
  ...
end
Above you can see that cassandra-cloud is invoked in the provision shell for each individual box; it installs and configures Cassandra. It sets the name of the Cassandra cluster (test, via the -cluster-name command line argument), which address to bind the Cassandra client transport to (-client-address), which address to bind the cluster transport (storage transport) to (-cluster-address), and a list of seed nodes (-cluster-seeds).
We could start ten more servers and we would not have to change the seed nodes. New servers would learn the topology of the cluster from one or more of the Cassandra seed nodes specified with -cluster-seeds.
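As a rough illustration of that point, the Ruby sketch below builds the cassandra-cloud command line for any node address while keeping the seed list fixed. The flags and paths are the ones used in the Vagrantfile above; the helper name cassandra_cloud_command and the extra address 192.168.50.7 are hypothetical.

```ruby
# The seed list stays the same no matter how many nodes join the cluster.
SEEDS = %w[192.168.50.4 192.168.50.5 192.168.50.6].freeze

# Hypothetical helper: assemble the provisioning command for one node.
# Only the node's own address changes between boxes.
def cassandra_cloud_command(ip, cluster_name: "test", seeds: SEEDS)
  [
    "sudo /opt/cassandra/bin/cassandra-cloud",
    "-cluster-name #{cluster_name}",
    "-client-address #{ip}",
    "-cluster-address #{ip}",
    "-cluster-seeds #{seeds.join(',')}"
  ].join(" ")
end

# A hypothetical fourth node reuses the original three seeds.
puts cassandra_cloud_command("192.168.50.7")
```

The new node binds to its own address but still points at the original three seeds, so none of the existing boxes need to be reconfigured.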
cassandra-cloud can read settings from environment variables so that it works well in Mesos, Docker, Heroku, Kubernetes, or any other 12-factor DevOps environment. In later tutorials in this Cassandra tutorial series, we will use cassandra-cloud with AWS/EC2 when we cover AWS Cassandra. cassandra-cloud can also read properties from a config file and from the command line. Environment variables override config file settings, and command line args override environment variables. The cassandra-image project creates a cassandra-cloud config file and config templates that can be modified. The cassandra-cloud utility can set up memory, threads, number of workers, etc. for Cassandra. You can set values explicitly, or they can be derived by looking at the ergonomics of the server.
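The override order described above (config file, then environment variables, then command line arguments) can be sketched as a layered merge. This is a minimal Ruby illustration of the precedence rule only, not cassandra-cloud's implementation; the function name resolve_settings and the setting keys and values are hypothetical.

```ruby
# Hypothetical sketch of layered settings: later sources win.
# config file < environment variables < command line arguments
def resolve_settings(config_file: {}, env: {}, cli: {})
  config_file.merge(env).merge(cli)
end

settings = resolve_settings(
  config_file: { "cluster-name" => "dev", "client-address" => "127.0.0.1" },
  env:         { "cluster-name" => "staging" },
  cli:         { "cluster-name" => "test" }
)

# "cluster-name" comes from the command line; "client-address" falls
# through from the config file because nothing overrides it.
puts settings
```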
OK. Let’s test our Cassandra cluster. Here we will use Vagrant to start up the cluster. Then we will log into one of the nodes (node0) and run the Cassandra nodetool command to see which servers are connected to the cluster.
Testing Our Cassandra Cluster Setup with nodetool
$ vagrant up
$ vagrant ssh node0
[vagrant@localhost ~]$ ps -ef | grep cassandra
root 12414 1 2 19:16 ? 00:00:26 java -Xloggc:/opt/cassandra/bin/../logs/gc.log ...
$ /opt/cassandra/bin/nodetool describecluster
Cluster Information:
    Name: test
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]
We can see that we have a cluster of three servers that make up the Cassandra cluster, namely 192.168.50.4, 192.168.50.5, and 192.168.50.6, all reporting the same schema version. You can see the full Vagrantfile on GitHub.
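If you want to automate this check, the Ruby sketch below parses the "Schema versions" lines of the describecluster output shown above and confirms that a single schema version lists all three nodes. The helper names schema_versions and cluster_in_agreement? are hypothetical, and the sample text is copied from the session above rather than captured from a live cluster.

```ruby
# Sample text copied from the nodetool describecluster session above.
OUTPUT = <<~TEXT
  Cluster Information:
      Name: test
      Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
      Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
      Schema versions:
          86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]
TEXT

# Hypothetical helper: map each schema version UUID to the hosts reporting it.
def schema_versions(text)
  text.scan(/^\s*([0-9a-f-]{36}): \[([^\]]*)\]/).map do |version, hosts|
    [version, hosts.split(",").map(&:strip)]
  end.to_h
end

# Hypothetical helper: true when exactly one schema version exists and it
# is reported by every expected host.
def cluster_in_agreement?(text, expected_hosts)
  versions = schema_versions(text)
  versions.size == 1 && versions.values.first.sort == expected_hosts.sort
end

expected = %w[192.168.50.4 192.168.50.5 192.168.50.6]
puts cluster_in_agreement?(OUTPUT, expected)
```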
More to Come from this Cassandra Cluster tutorial
Check back with us at the Cloudurable blog to find out more about cassandra-image and cassandra-cloud. We have a follow-up article where we set up SSL encryption for Cassandra. We set up SSL for the client transport and the cluster transport. Then we set up SSL for cqlsh so you can connect to your remote instance securely.
Cloudurable provides Cassandra support, Cassandra consulting, and Cassandra training, as well as Cassandra examples such as AWS CloudFormation templates, Packer, and Ansible for common Cassandra DBA and DevOps tasks. We also provide monitoring tools and images (AMI/Docker) to support Cassandra in production running in EC2. Our advanced Cassandra course teaches how to develop, support and deploy Cassandra to production in AWS EC2 and is geared towards DevOps, architects and DBAs.