Illustration Image

Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

6/26/2019

Reading time:7 min

Setting Up a Cassandra Cluster With Vagrant - DZone Cloud

by John Doe



This is part 1 of a Cassandra cluster tutorial series. Part 1 uses Vagrant to set up a local Cassandra cluster and install Cassandra on the boxes. Later parts of this series will set up Ansible/ssh for DevOps/DBA tasks, use Packer to create EC2 AMIs and instances, and set up a Cassandra cluster in EC2.

The cassandra-image project (on GitHub) creates CentOS 7/Cassandra images for Docker, VirtualBox/Vagrant, and AWS/EC2, using best practices for Cassandra Linux/OS setup, plus utilities that auto-configure Cassandra based on the ergonomics of the environment.

Vagrant and Docker are both handy for local development, so we support both. At this time, it is hard to develop systemd services using Docker, and since we do a lot of systemd development, we use Vagrant here. Our real target, for the most part, is EC2, AWS, VPCs, etc.

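The cassandra-image project packages systemd utilities, which run as systemd services to monitor:

  • the OS, sending metrics to AWS CloudWatch Metrics

  • OS logs, sending them to AWS CloudWatch Logs

  • Cassandra logs, sending them to AWS CloudWatch Logs

  • Cassandra stats, sending them to AWS CloudWatch Metrics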

The cassandra-image project uses the cassandra-cloud project to configure Cassandra on each instance, which helps with setting up the cluster.

With this in mind, let’s set up Vagrant to launch a Cassandra cluster locally.

We are going to set up three nodes with Vagrant, each using our provision scripts to install Cassandra and utilities:

  • 192.168.50.4 cassandra node0

  • 192.168.50.5 cassandra node1

  • 192.168.50.6 cassandra node2

Cassandra Cluster: Set Up Network of Boxes Using Vagrant

Vagrant.configure("2") do |config|
  # Use CentOS 7
  config.vm.box = "centos/7"
  # Set up 4 CPUs and 3096 MB of memory for each instance
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "3096"
    vb.cpus = 4
  end
  # Run the provision install scripts
  config.vm.provision "shell", inline: <<-SHELL
        sudo /vagrant/scripts/000-vagrant-provision.sh
  SHELL
  config.vm.define "node0" do |node0|
    # ...
    # Node 0 is 192.168.50.4
    node0.vm.network "private_network", ip: "192.168.50.4"
    # ...
  end
  config.vm.define "node1" do |node1|
    # ...
    # Node 1 is 192.168.50.5
    node1.vm.network "private_network", ip: "192.168.50.5"
    # ...
  end
  config.vm.define "node2" do |node2|
    # ...
    # Node 2 is 192.168.50.6
    node2.vm.network "private_network", ip: "192.168.50.6"
  end
end

Notice that we set up three boxes on a private network from the same Vagrant file.
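The three near-identical config.vm.define blocks can also be generated from one table, which keeps node names and IP addresses in a single place. This is a minimal sketch, assuming the same box, VM size, provision script, and 192.168.50.4 to 192.168.50.6 addressing used above; BASE_OCTET, NODE_COUNT, and NODES are hypothetical names, not part of the cassandra-image project. The `if defined?(Vagrant)` guard lets the same file run under plain `ruby` for a quick sanity check.

```ruby
# -*- mode: ruby -*-
# Hypothetical Vagrantfile sketch: generate node0..node2 from one table
# instead of writing each config.vm.define block by hand.
BASE_OCTET = 4   # node0 gets 192.168.50.4 (assumed, matches the tutorial)
NODE_COUNT = 3

# { "node0" => "192.168.50.4", "node1" => "192.168.50.5", "node2" => "192.168.50.6" }
NODES = (0...NODE_COUNT).map { |i| ["node#{i}", "192.168.50.#{BASE_OCTET + i}"] }.to_h

# Vagrant is only defined when this file is loaded by `vagrant`,
# so running it with plain `ruby` just builds NODES.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.box = "centos/7"
    config.vm.provider "virtualbox" do |vb|
      vb.memory = "3096"
      vb.cpus = 4
    end
    config.vm.provision "shell", inline: "sudo /vagrant/scripts/000-vagrant-provision.sh"

    # One machine definition per table entry.
    NODES.each do |name, ip|
      config.vm.define name do |node|
        node.vm.network "private_network", ip: ip
      end
    end
  end
end
```

Adding a fourth node then means changing NODE_COUNT rather than copying another block.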

In this example, we will use all three servers as seed nodes. Seed nodes are the Cassandra nodes that other nodes contact first when they join the cluster. It is a good idea to have two or three seed nodes, since having only one would be a SPOF (single point of failure).

In this example, we will use the utility cassandra-cloud to configure the seed nodes. We will also use cassandra-cloud to tell Cassandra which address to listen on for clustering (storage network), and which address to listen on for client connections.
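In stock Cassandra, these settings live in cassandra.yaml: cluster_name, listen_address (the address used for node-to-node storage traffic), rpc_address (the address clients connect to), and the seeds parameter of seed_provider. The sketch below renders that fragment for node0 with Ruby's standard yaml library. It is an assumption that cassandra-cloud maps its flags onto exactly these keys; check the generated cassandra.yaml on a node to confirm. cassandra_yaml_fragment is a hypothetical helper.

```ruby
require "yaml"

# Hypothetical helper: the cassandra.yaml fragment corresponding to one
# node's cluster name, client/cluster addresses, and seed list.
def cassandra_yaml_fragment(cluster_name:, client_address:, cluster_address:, seeds:)
  {
    "cluster_name"   => cluster_name,
    "listen_address" => cluster_address,  # storage / gossip traffic between nodes
    "rpc_address"    => client_address,   # client (CQL) connections
    "seed_provider"  => [
      {
        "class_name" => "org.apache.cassandra.locator.SimpleSeedProvider",
        "parameters" => [{ "seeds" => seeds.join(",") }]
      }
    ]
  }.to_yaml
end

puts cassandra_yaml_fragment(
  cluster_name: "test",
  client_address: "192.168.50.4",
  cluster_address: "192.168.50.4",
  seeds: %w[192.168.50.4 192.168.50.5 192.168.50.6]
)
```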

cassandra-cloud (an open source utility written in Go) helps you configure and install Cassandra for cloud environments based on server ergonomics (number of data centers, number of disks, number of cores, type of disk). It works well in Docker, Heroku, Mesos/Marathon, Kubernetes, EC2, and VirtualBox environments (and similar environments). For example, it can be kicked off as a USER_DATA script in Amazon EC2, and if you change the size of the EC2 instance, it can adjust the Cassandra settings accordingly. cassandra-cloud usually runs once, when an instance is first launched, and then never again (unless you redeploy on a larger EC2 instance).
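To make "ergonomics" concrete: when MAX_HEAP_SIZE is not set, the conf/cassandra-env.sh script shipped with Cassandra 3.x picks a default heap of max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)) and a young generation of min(100 MB per core, 1/4 heap). The Ruby sketch below reproduces that stock heuristic. Whether cassandra-cloud uses the same formula is an assumption we have not verified, so treat this as an illustration of ergonomic sizing rather than a description of the tool.

```ruby
# Sketch of Cassandra 3.x's default heap sizing from conf/cassandra-env.sh
# (used when MAX_HEAP_SIZE / HEAP_NEWSIZE are unset). Integer MB throughout,
# matching the shell script's `expr` arithmetic.
def default_heap_mb(system_memory_mb)
  half    = system_memory_mb / 2
  quarter = half / 2
  # max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
  [[half, 1024].min, [quarter, 8192].min].max
end

def default_young_gen_mb(system_memory_mb, cpu_cores)
  max_sensible_per_core_mb = 100
  # min(100 MB * cores, 1/4 of the heap)
  [max_sensible_per_core_mb * cpu_cores, default_heap_mb(system_memory_mb) / 4].min
end

# The VirtualBox boxes in this tutorial: 3096 MB of RAM and 4 CPUs.
puts "heap=#{default_heap_mb(3096)}M young=#{default_young_gen_mb(3096, 4)}M"
```

For the 3096 MB, 4-CPU boxes this prints heap=1024M young=256M, while a 32768 MB, 8-core instance gets an 8192 MB heap and an 800 MB young generation. That is the kind of adjustment an ergonomics-driven tool makes when an instance is resized.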

Using Cassandra-Cloud From Vagrant for Cassandra Cluster nodes

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  # ...
  config.vm.define "node0" do |node0|
    # ...
    node0.vm.network "private_network", ip: "192.168.50.4"
    ### Use cassandra-cloud to configure Cassandra before launching it.
    ### Set the cluster name to test, set the client-address and the cluster-address.
    ### Also set up the Cassandra seed nodes.
    node0.vm.provision "shell", inline: <<-SHELL
                sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
                -client-address 192.168.50.4 \
                -cluster-address 192.168.50.4 \
                -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
                /opt/cassandra/bin/cassandra -R
    SHELL
  end
  config.vm.define "node1" do |node1|
    # ...
    node1.vm.provision "shell", inline: <<-SHELL
                sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
                -client-address 192.168.50.5 \
                -cluster-address 192.168.50.5 \
                -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
                /opt/cassandra/bin/cassandra -R
    SHELL
  end
  config.vm.define "node2" do |node2|
    # ...
    node2.vm.provision "shell", inline: <<-SHELL
                sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
                -client-address 192.168.50.6 \
                -cluster-address 192.168.50.6 \
                -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6
                /opt/cassandra/bin/cassandra -R
    SHELL
  end
  # ...
end

Above you can see that cassandra-cloud is invoked as the provision shell script for each individual box; it installs and configures Cassandra, and the script then starts Cassandra with /opt/cassandra/bin/cassandra -R. cassandra-cloud sets the name of the Cassandra cluster (test, via the -cluster-name command-line argument), the address to bind the Cassandra client transport to (-client-address), the address to bind the cluster (storage) transport to (-cluster-address), and the list of seed nodes (-cluster-seeds).
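Because the three provision blocks differ only in the node address, the inline script can be generated instead of copied. This minimal sketch uses only the cassandra-cloud flags shown above; provision_script is a hypothetical helper, and its return value is meant to be passed as `inline:` to `node.vm.provision "shell"` in the Vagrantfile.

```ruby
# Seed list shared by every node (the three nodes from the Vagrantfile).
SEEDS = %w[192.168.50.4 192.168.50.5 192.168.50.6].freeze

# Hypothetical helper: build the inline provision script for one node,
# using the same cassandra-cloud flags as the Vagrantfile above.
# `\\` emits a literal backslash so the shell sees line continuations.
def provision_script(ip, cluster_name: "test", seeds: SEEDS)
  <<~SHELL
    sudo /opt/cassandra/bin/cassandra-cloud -cluster-name #{cluster_name} \\
      -client-address #{ip} \\
      -cluster-address #{ip} \\
      -cluster-seeds #{seeds.join(",")}
    /opt/cassandra/bin/cassandra -R
  SHELL
end

puts provision_script("192.168.50.5")
```

The client and cluster addresses are the same here because each box has a single private-network IP; a host with separate client and storage networks would need two parameters.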

We could start ten more servers without changing the seed nodes. New servers learn the topology of the cluster from one or more of the seed nodes specified with -cluster-seeds.
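The point about seeds can be illustrated in a few lines: the seed list stays fixed at the three original nodes, and every additional node reuses it unchanged. This sketch assumes extra nodes continue the 192.168.50.x numbering; node_plan is a hypothetical helper.

```ruby
# Fixed seed list: the three nodes defined in the Vagrantfile.
SEEDS = %w[192.168.50.4 192.168.50.5 192.168.50.6].freeze

# Hypothetical plan for a larger cluster: node i gets 192.168.50.(4 + i),
# and every node, seed or not, is pointed at the same SEEDS.
def node_plan(node_count)
  (0...node_count).map do |i|
    ip = "192.168.50.#{4 + i}"
    { name: "node#{i}", ip: ip, seed: SEEDS.include?(ip), seeds: SEEDS.join(",") }
  end
end

plan = node_plan(13) # the original 3 nodes plus ten more
plan.each { |n| puts "#{n[:name]} #{n[:ip]} seed=#{n[:seed]} -cluster-seeds #{n[:seeds]}" }
```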

cassandra-cloud can read settings from environment variables, so it works well in Mesos, Docker, Heroku, Kubernetes, or any other twelve-factor DevOps environment. In later tutorials in this Cassandra tutorial series, we will use cassandra-cloud with AWS/EC2 when we cover AWS Cassandra. cassandra-cloud can also read properties from a config file and from the command line: environment variables override config file settings, and command-line arguments override environment variables. The cassandra-image project creates a cassandra-cloud config file and config templates that can be modified. cassandra-cloud can set up memory, threads, number of workers, etc. for Cassandra; you can set these values explicitly or let cassandra-cloud derive them from the ergonomics of the server.
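The precedence rule (config file, then environment variables, then command-line arguments, each overriding the one before) amounts to a layered hash merge. This is a minimal sketch; the environment variable names CLUSTER_NAME and CLIENT_ADDRESS and the resolve_settings helper are hypothetical and are not cassandra-cloud's actual names.

```ruby
# Hypothetical mapping from environment variables to setting keys.
ENV_KEYS = { "CLUSTER_NAME" => :cluster_name, "CLIENT_ADDRESS" => :client_address }.freeze

# Lowest to highest priority: config file < environment < command line.
# Later merges win, so command-line values override everything else.
def resolve_settings(file_settings, env, cli_settings)
  env_settings = ENV_KEYS.each_with_object({}) do |(var, key), acc|
    acc[key] = env[var] if env.key?(var)
  end
  file_settings.merge(env_settings).merge(cli_settings)
end

settings = resolve_settings(
  { cluster_name: "from-file", client_address: "127.0.0.1" },
  { "CLUSTER_NAME" => "from-env" },   # e.g. set by Docker or Kubernetes
  { client_address: "192.168.50.4" }  # e.g. -client-address on the command line
)
p settings
```

In real use, Ruby's ENV object can be passed as the second argument, since it supports both `key?` and `[]`.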

OK, let’s test our Cassandra cluster. We will use Vagrant to start the cluster, log into one of the nodes (node0), and run the Cassandra nodetool command to see which servers are connected to the cluster.

Testing Our Cassandra Cluster Setup with nodetool

$ vagrant up
$ vagrant ssh node0
[vagrant@localhost ~]$ ps -ef | grep cassandra
root     12414     1  2 19:16 ?        00:00:26 java -Xloggc:/opt/cassandra/bin/../logs/gc.log
...
$ /opt/cassandra/bin/nodetool describecluster
Cluster Information:
        Name: test
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]

We can see that three servers make up the Cassandra cluster, namely 192.168.50.4, 192.168.50.5, and 192.168.50.6, and that all three report the same schema version. You can see the full Vagrant file on GitHub.
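This check can be scripted by parsing the "Schema versions:" section of the nodetool describecluster output: here, a single schema version listing every expected node means all three nodes have joined and agree on the schema. The sketch parses the output format shown above; other Cassandra versions may print additional sections, so the parsing is an assumption. schema_versions is a hypothetical helper.

```ruby
# Output of `nodetool describecluster` as shown above.
SAMPLE = <<~OUT
  Cluster Information:
          Name: test
          Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
          Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
          Schema versions:
                  86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]
OUT

# Parse the "Schema versions:" block into { version_uuid => [endpoint, ...] }.
def schema_versions(output)
  versions = {}
  in_schema = false
  output.each_line do |line|
    if line.strip == "Schema versions:"
      in_schema = true
    elsif in_schema && (m = line.match(/^\s*([0-9a-f-]{36}):\s*\[(.*)\]\s*$/))
      versions[m[1]] = m[2].split(",").map(&:strip)
    end
  end
  versions
end

expected = %w[192.168.50.4 192.168.50.5 192.168.50.6]
versions = schema_versions(SAMPLE)
agreed = versions.size == 1 && versions.values.first.sort == expected.sort
puts agreed ? "schema agreement across #{expected.size} nodes" : "check cluster: #{versions.inspect}"
```

On a node, the captured output of /opt/cassandra/bin/nodetool describecluster (for example via Ruby's backtick operator) would be passed instead of SAMPLE.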

More to Come from this Cassandra Cluster tutorial

Check back with us at the Cloudurable blog to find out more about cassandra-image and cassandra-cloud. We have a follow-up article where we set up SSL encryption for Cassandra, for both the client transport and the cluster transport, and then set up SSL for cqlsh so you can connect to your remote instance securely.

Cloudurable provides Cassandra support, Cassandra consulting, and Cassandra training, as well as Cassandra examples like AWS CloudFormation templates, Packer, and Ansible for common Cassandra DBA and DevOps tasks. We also provide monitoring tools and images (AMI/Docker) to support Cassandra in production running in EC2. Our advanced Cassandra courses teach how to develop, support, and deploy Cassandra to production in AWS EC2 and are geared towards DevOps engineers, architects, and DBAs.
