This post explains how to deploy a 3-node Cassandra cluster on your laptop with a single ‘vagrant up’ command.
Developers often struggle to set up tools and project environments on their laptops, especially on Windows. If the goal is just to check connectivity between their program and a third-party tool, it is time consuming to understand the tool’s configuration and do the setup manually. It becomes even harder when that tool is a database cluster. The following automation can be a good friend in such cases.
I prepared an Ansible role to automate the Cassandra cluster deployment. Details of the Vagrant setup and the Ansible role are explained below.
Required tools:
- VirtualBox
- Vagrant
Execution steps:
- Install VirtualBox and Vagrant
- git clone https://github.com/apkan/vagrant-cassandra-ansible.git
- cd vagrant-cassandra-ansible
- vagrant up
Output:
The Ansible playbook will execute a set of tasks and finally display the Cassandra cluster status.
Vagrant
Vagrant’s Ansible local provisioner is used for this automated deployment. This avoids the initial difficulties of installing and setting up Ansible on the host machine; in particular, you do not need to worry about SSH key exchanges.
This Vagrantfile deploys 3 cluster nodes plus 1 controller, which is used as the Ansible controller node. When you issue the ‘vagrant up’ command, Ansible is installed automatically on the controller node and the ‘cassandraCluster’ playbook is executed from there.
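For reference, the controller’s provisioner section typically looks something like the following. This is only a minimal sketch: the playbook and inventory file names are assumptions, so check the Vagrantfile in the repository for the exact values.
config.vm.define "controller" do |machine|
  # Sketch only: playbook and inventory file names are assumptions.
  machine.vm.provision :ansible_local do |ansible|
    ansible.playbook       = "cassandraCluster.yml"
    ansible.inventory_path = "inventory.yml"
    ansible.limit          = "all"
    ansible.install        = true
  end
end
The VirtualBox provider settings applied to each node are shown below: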
machine.vm.provider "virtualbox" do |vbox|
  vbox.memory = "1024"
  vbox.cpus = "1"
  vbox.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
  vbox.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
end
Hardware resources for each Vagrant node can be customized according to your hardware availability. ‘natdnshostresolver1’ and ‘natdnsproxy1’ are enabled on all nodes to improve networking performance.
---
all:
  hosts:
    controller:
      ansible_connection: local
      ansible_ssh_host: x.x.x.x
    node1:
      ansible_ssh_host: x.x.x.x
      ansible_ssh_private_key_file: /vagrant/.vagrant/machines/node1/virtualbox/private_key
    node2:
      ansible_ssh_host: x.x.x.x
      ansible_ssh_private_key_file: /vagrant/.vagrant/machines/node2/virtualbox/private_key
    node3:
      ansible_ssh_host: x.x.x.x
      ansible_ssh_private_key_file: /vagrant/.vagrant/machines/node3/virtualbox/private_key
  children:
    cluster-nodes:
      hosts:
        node1:
        node2:
        node3:
    seed-nodes:
      hosts:
        node1:
        node2:
This inventory file plays a key role in the setup. It acts as the global configuration for host IPs, private keys and Cassandra seed nodes; both Vagrant and the Ansible role fetch the IP addresses from this inventory YAML file.
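For that to work, the Vagrantfile has to load the inventory into a variable first. A minimal sketch, assuming the inventory file sits next to the Vagrantfile (the exact path in the repository may differ):
require "yaml"

# Load the shared inventory so Vagrant and Ansible use the same host IPs.
inventory = YAML.load_file(File.join(File.dirname(__FILE__), "inventory.yml"))
The node definitions then read their private-network IPs from that structure: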
config.vm.define "node1" do |machine|
  machine.vm.network "private_network",
    ip: inventory["all"]["hosts"]["node1"]["ansible_ssh_host"]
  ...
end
Ansible role
└── cassandra-cluster
├── handlers
│ └── main.yml
├── tasks
│ └── main.yml
├── templates
│ ├── cassandra.sh
│ └── cassandra.yaml.j2
└── vars
└── main.yml
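The ‘cassandraCluster’ playbook executed by the provisioner presumably does little more than apply this role to the cluster nodes. A rough sketch (the actual playbook in the repository may differ):
# Sketch of the cassandraCluster playbook (assumed layout)
- hosts: cluster-nodes
  become: yes
  roles:
    - cassandra-cluster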
This is a simple Ansible role to deploy a 3-node Cassandra cluster. The following shows the tasks used in the role:
- name: Check if java 8 already installed
  shell: java -version 2>&1 | grep version | awk '{print $3}' | sed 's/"//g'
  register: java_version
  ignore_errors: True

- debug: msg="Java not installed"
  when: java_version is failed

- name: installJavaRepo
  apt_repository: repo='ppa:openjdk-r/ppa'
  when: java_version is failed

- name: updateCache
  apt: update_cache=yes
  when: java_version is failed

- name: installJava
  apt: name=openjdk-8-jdk state=present
  when: java_version is failed

- stat:
    path: /opt/apache-cassandra-3.7-bin.tar.gz
  register: cassandra_source_file

- name: download the source
  get_url:
    url: https://archive.apache.org/dist/cassandra/3.7/apache-cassandra-3.7-bin.tar.gz
    dest: /opt/apache-cassandra-3.7-bin.tar.gz
    mode: 0440
  when: cassandra_source_file.stat.exists == False

- stat:
    path: /opt/apache-cassandra-3.7-bin.tar.gz
    checksum_algorithm: sha1
  register: downloaded_file_checksum

- name: download checksum from apache page
  uri: url=https://archive.apache.org/dist/cassandra/3.7/apache-cassandra-3.7-bin.tar.gz.sha1 return_content=yes
  register: apache_page_checksum
  failed_when: downloaded_file_checksum.stat.checksum != apache_page_checksum.content

- name: Create home directory
  file:
    path: "{{ cassandra_home }}"
    state: directory
    mode: 0755

- name: Extract cassandra source
  unarchive:
    src: /opt/apache-cassandra-3.7-bin.tar.gz
    dest: "{{ cassandra_home }}"
    extra_opts: [--strip-components=1]
    remote_src: yes

- name: Export path
  shell: "echo 'PATH=$PATH:{{ cassandra_path }}' > /etc/profile.d/custom-path.sh && . /etc/profile.d/custom-path.sh"

- name: install the configuration file
  template: src=cassandra.yaml.j2 dest='{{ cassandra_home }}/conf/cassandra.yaml' mode=0750

- name: copy the startup script
  template: src=cassandra.sh dest=/etc/init.d/cassandra owner=root group=root mode=0755

- name: Enable the daemon
  shell: update-rc.d cassandra "{{ item }}"
  with_items:
    - "defaults"
    - "enable"

- name: start cassandra service
  service: name=cassandra state=started
  notify:
    - CheckClusterStatus
    - GetClusterStatusOutput
    - DisplayClusterStatusOutput
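The last task notifies three handlers that wait for the service, collect the cluster status and print it. The actual implementation lives in handlers/main.yml in the repository; a minimal sketch of how they could look, using nodetool and the CQL port 9042 from the variables below:
# Sketch of handlers/main.yml (assumed implementation)
- name: CheckClusterStatus
  wait_for:
    host: "{{ hostvars[inventory_hostname]['ansible_ssh_host'] }}"
    port: 9042
    delay: 30
    timeout: 300

- name: GetClusterStatusOutput
  shell: "{{ cassandra_path }}/nodetool status"
  register: cluster_status

- name: DisplayClusterStatusOutput
  debug:
    msg: "{{ cluster_status.stdout_lines }}"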
Most of the important parameters in cassandra.yaml are configured using an Ansible template. The following shows the variables used in this role:
cluster_name: MyCluster

# Directory locations
cassandra_home: /opt/cassandra
cassandra_path: /opt/cassandra/bin
cassandra_data_directory: /opt/cassandra/data
cassandra_hints_directory: /opt/cassandra/data/hints
cassandra_commitlog_directory: /opt/cassandra/data/commitlogs
cassandra_saved_caches_directory: /opt/cassandra/data/saved_caches

cassandra_seeds_resolved: "{{ groups['seed-nodes'] | map('extract', hostvars, ['ansible_ssh_host']) | join(',') }}"

# For IP address configuration in YML
broadcast_address: "{{ hostvars[inventory_hostname]['ansible_ssh_host'] }}"
listen_address: "{{ hostvars[inventory_hostname]['ansible_ssh_host'] }}"
broadcast_rpc_address: "{{ hostvars[inventory_hostname]['ansible_ssh_host'] }}"

# Port configurations
cassandra_port: 9042
rpc_port: 9160
storage_port: 7000
ssl_storage_port: 7001
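These variables are rendered into cassandra.yaml by the cassandra.yaml.j2 template. A rough sketch of the relevant part of that template is shown below; it is assumed here (including mapping ‘cassandra_port’ to native_transport_port), and the actual template in the repository may differ slightly.
# Excerpt of cassandra.yaml.j2 (sketch, not the full template)
cluster_name: '{{ cluster_name }}'
data_file_directories:
    - {{ cassandra_data_directory }}
commitlog_directory: {{ cassandra_commitlog_directory }}
hints_directory: {{ cassandra_hints_directory }}
saved_caches_directory: {{ cassandra_saved_caches_directory }}
listen_address: {{ listen_address }}
broadcast_address: {{ broadcast_address }}
broadcast_rpc_address: {{ broadcast_rpc_address }}
native_transport_port: {{ cassandra_port }}
rpc_port: {{ rpc_port }}
storage_port: {{ storage_port }}
ssl_storage_port: {{ ssl_storage_port }}
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{{ cassandra_seeds_resolved }}"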
It was an interesting challenge to configure the list of Cassandra seed IPs. Node1 and node2 are defined as seed nodes in the inventory file, and their IPs are passed into the cassandra.yaml file through the ‘cassandra_seeds_resolved’ variable.
Best practice tips:
- Make use of variables as much as possible and avoid hard-coding values in Ansible tasks.
- Prepare Ansible roles with idempotency in mind. This ensures that your environment stays consistent and lets you reuse the same environment to test changes in your Ansible roles.
- I used an Ansible task to check whether Java is already installed. This sort of verification helps if you want to run the playbook in other environments.
- Display the cluster status at the end of all tasks, so you can easily identify whether the playbook execution successfully created the cluster.
Connectivity Testing
In addition, I prepared a simple Java program to check connectivity. You just need to pass the IP of one of the nodes and the port (as listed in the inventory file); the driver can then discover the other nodes’ IPs and display the cluster details.
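The program in the repository is the reference; the sketch below only shows the general idea with the DataStax Java driver 3.x, taking the contact-point IP as a command-line argument (the class name and argument handling are my own illustration).
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;

// Sketch of a connectivity check with the DataStax Java driver 3.x.
// Pass the IP of any one node (from the inventory file) as the first argument.
public class ConnectivityTest {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint(args[0])
                .withPort(9042)
                .build()) {
            Metadata metadata = cluster.getMetadata();
            System.out.println("Connected to cluster: " + metadata.getClusterName());
            // The driver discovers the remaining nodes automatically.
            for (Host host : metadata.getAllHosts()) {
                System.out.printf("Datacenter: %s, Host: %s, Rack: %s%n",
                        host.getDatacenter(), host.getAddress(), host.getRack());
            }
        }
    }
}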
Destroying the setup
Just like deploying it, destroying this setup is easy. At any time, you can tear down the whole setup with the ‘vagrant destroy’ command.
Conclusion
I hope this post gives an idea of how deployment in a local environment can be automated using tools like Vagrant and Ansible.
Please note that this can be faster and more efficient if you use a cloud environment instead of Vagrant. (I will cover that in another post, “Deployment using Terraform and Ansible”.)
References
- Full source code is available at https://github.com/apkan/vagrant-cassandra-ansible.git
- https://www.vagrantup.com/docs/provisioning/ansible_local.html
- https://galaxy.ansible.com/andrewrothstein/cassandra-cluster
- https://docs.datastax.com/en/developer/java-driver/3.6/manual/