In this post, we will use Ansible to set up an Apache Cassandra database cluster, with AWS EC2 instances as the nodes. Creating a cluster manually is tedious: each node has to be configured by hand, and every node must be configured correctly before the cluster is started. With Ansible, we can automate the task and let it handle the configuration management for us.
First of all, create a directory for the files and folders related to the playbook. This keeps the work organized and avoids confusion between relative and absolute paths when the playbook refers to files and variables. Following is the structure of my directory that contains the playbook and the roles:
Steps To Follow While Using AWS:
- Create two or three AWS EC2 instances, which will serve as the nodes of the cluster.
- Create a security group to allow all connections and add the nodes to that security group.
- Create an inventory that has the IP addresses of the nodes.
- Add the inventory file to the Ansible configuration file, i.e. ansible.cfg.
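The inventory from the steps above might look like the following sketch, written in Ansible's YAML inventory format. The file name hosts.yml and the 203.0.113.x addresses are placeholders, not values from this setup; replace them with the public IPs of your own instances. The group name aws-webservers is the one the playbook in this post targets.

```yaml
# hosts.yml (hypothetical file name)
all:
  children:
    aws-webservers:
      hosts:
        # Placeholder addresses; use the public IPs of your EC2 nodes.
        203.0.113.10:
        203.0.113.11:
        203.0.113.12:
```

To make Ansible pick it up, point the inventory setting in the [defaults] section of ansible.cfg at this file. Newer Ansible releases may print a warning about the hyphen in the group name.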
Now, we create a playbook to set up the nodes for us. The playbook is as follows:
    ---
    - hosts: aws-webservers
      gather_facts: yes
      remote_user: ec2-user
      become: yes
      vars:
        cluster_name: Test_Cluster
        seeds: 13.xxx.xxx.xxx
      roles:
        - installation
Then, we define the role that the playbook uses. The installation role performs the following tasks:
- Installing a JRE.
- Copying and unpacking the Apache Cassandra tar.
- Replacing the default cassandra.yaml with a cassandra.yaml that contains our own configuration, whose details are given below.
- Ensuring Cassandra is started.
The following is the main.yml file from the role:
    ---
    - name: Copy Java RPM file
      copy:
        src: jdk-8_linux-x64_bin.rpm
        dest: /tmp/jdk-8_linux-x64_bin.rpm

    - name: Install JDK via RPM file with yum
      yum:
        name: /tmp/jdk-8_linux-x64_bin.rpm
        state: present

    - name: Copy Cassandra tar
      copy:
        src: apache-cassandra-3.11.2-bin.tar.gz
        dest: /tmp/apache-cassandra-3.11.2-bin.tar.gz

    # Extract into the directory that the later tasks refer to.
    - name: Extract Cassandra
      command: tar -xvf /tmp/apache-cassandra-3.11.2-bin.tar.gz
      args:
        chdir: /home/ec2-user/

    - name: Override cassandra.yaml file
      template:
        src: cassandra.yaml
        dest: /home/ec2-user/apache-cassandra-3.11.2/conf/cassandra.yaml

    # -R allows running as root (the play uses become); -f keeps Cassandra
    # in the foreground, so this task does not return while it is running.
    - name: Run Cassandra from bin folder
      command: ./cassandra -fR
      args:
        chdir: /home/ec2-user/apache-cassandra-3.11.2/bin/
The cassandra.yaml contains most of the Cassandra configuration, such as ports used, file locations and seed node IP addresses. We need to edit this file on each node, so I have created a template for the file. The template cassandra.yaml uses the following variables:
- cluster_name: '{{ cluster_name }}' – any name you choose to identify the cluster.
- seeds: "{{ seeds }}" – the IP addresses of the cluster's seed servers. Seed nodes are known places where cluster information (such as the list of nodes in the cluster) can be obtained.
- listen_address: {{ aws-webservers }} – the IP address that Cassandra listens on for internal (Cassandra to Cassandra) communication.
- rpc_address: {{ aws-webservers }} – the IP address that Cassandra listens on for client-based communication.
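As an illustration of how these variables can appear in the template, here is a minimal sketch of the relevant part of cassandra.yaml. One caveat: Jinja2 reads a hyphen inside an expression as subtraction, so a name such as aws-webservers cannot be looked up directly as a variable. The sketch therefore substitutes the Ansible fact ansible_default_ipv4.address for the two address settings. That substitution is an assumption of this example, not necessarily what the original template does, and it relies on gather_facts being enabled.

```yaml
# Fragment of the role's cassandra.yaml template (sketch; only the templated keys are shown)
cluster_name: '{{ cluster_name }}'

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Comma-separated list of seed node addresses.
      - seeds: "{{ seeds }}"

# Assumption: use the default IPv4 address that Ansible gathered for each host.
listen_address: "{{ ansible_default_ipv4.address }}"
rpc_address: "{{ ansible_default_ipv4.address }}"
```

All other settings stay as they are in the cassandra.yaml shipped with Cassandra, so every node gets the same file apart from its own addresses.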
Now, we can run the playbook and our cluster will be up and running. To add more nodes, simply add them to the hosts list in the inventory and run the playbook again; Ansible will ensure that Cassandra is installed and started on them and that they join the cluster.
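One way to confirm that the nodes have actually joined is to ask Cassandra itself. The tasks below are a sketch of such a check and are not part of the original role. They assume the installation path used in this post and that Cassandra's JMX port is left at its default of 7199 on localhost, which is what nodetool connects to. They would also need to run from a separate play or playbook while Cassandra is running.

```yaml
# Hypothetical verification tasks, not part of the installation role.
- name: Wait for Cassandra's JMX port (assumed default 7199)
  wait_for:
    port: 7199
    timeout: 120

- name: Ask the node for its view of the cluster
  command: ./nodetool status
  args:
    chdir: /home/ec2-user/apache-cassandra-3.11.2/bin/
  register: nodetool_status
  changed_when: false

- name: Show the nodetool output
  debug:
    var: nodetool_status.stdout_lines
```

In the output of nodetool status, each node that is up and in a normal state is listed with the status UN.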
Points To Remember
- The host IP should be the public IP of a node.
- Put the Java RPM package and the Cassandra tar file in the files directory of the role.
- Use Java 8, as Cassandra 3.11 does not run on later versions of Java. On a newer JVM it fails at startup with the following error.
    [0.000s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/home/mmatak/monero/apache-cassandra-3.11.1/logs/gc.log instead.
    intx ThreadPriorityPolicy=42 is outside the allowed range [ 0 ... 1 ]
    Improperly specified VM option 'ThreadPriorityPolicy=42'
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
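To catch a wrong Java version before Cassandra is started, the role could check it explicitly. The following tasks are a sketch and not part of the original role. They rely on the fact that java -version prints its report to stderr and that Java 8 identifies itself with a version string beginning with 1.8.

```yaml
# Hypothetical guard tasks; place them before the task that starts Cassandra.
- name: Read the installed Java version
  command: java -version
  register: java_version
  changed_when: false

- name: Stop early unless Java 8 is installed
  assert:
    that:
      - "'1.8' in java_version.stderr"
    fail_msg: "Cassandra 3.11 needs Java 8, but a different Java version was found."
```

With this guard, a node with the wrong JVM fails with a clear message in the Ansible output rather than with the JVM error shown above.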
Thus, Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually is very disheartening. The full source code including templates and directory structure is here.