Cassandra.Link

6/19/2019

Setting Up Cassandra Cluster Through Ansible

by John Doe

In this post, we will use Ansible to set up an Apache Cassandra database cluster, with AWS EC2 instances as the nodes. Creating a cluster manually is a tedious task: each node has to be configured by hand, and every node must be configured correctly before the cluster is started. With Ansible, we can automate the task and let Ansible handle the configuration management for us.

First of all, create a directory for the files and folders related to the playbook. This keeps our work organized and avoids the confusion that can arise from mixing relative and absolute paths when passing variables to the playbook. The following is the structure of my directory, which contains the playbook and the roles:

[Image: directory structure of the playbook and its roles]

Steps To Follow While Using AWS:

  • Create 2-3 AWS EC2 instances, which will serve as the nodes of the cluster.

  • Create a security group that allows all connections and add the nodes to that security group.

  • Create an inventory file that lists the IP addresses of the nodes.

  • Reference the inventory file in Ansible's configuration file, i.e. ansible.cfg.

image
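The inventory from the steps above could look roughly like the following. This is only a sketch in Ansible's YAML inventory format: the file name hosts.yml, the IP addresses, and the key path are placeholders, not values from the original setup. The group name aws-webservers and the user ec2-user are the ones the playbook uses.

```yaml
# hosts.yml (hypothetical file name): one group containing the Cassandra nodes
all:
  children:
    aws-webservers:            # group name targeted by the playbook
      hosts:
        203.0.113.11:          # placeholder IPs; replace with your node IPs
        203.0.113.12:
        203.0.113.13:
      vars:
        ansible_user: ec2-user
        # placeholder path to the EC2 key pair's private key
        ansible_ssh_private_key_file: ~/.ssh/my-ec2-key.pem
```

Ansible picks the file up when the `inventory` setting in the `[defaults]` section of ansible.cfg points at it. Recent Ansible versions may warn about the hyphen in the group name, although the group still works.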

Now, we create a playbook to set up the nodes for us. The playbook is as follows:

---
- hosts: aws-webservers
  gather_facts: yes
  remote_user: ec2-user
  become: yes
  vars:
    cluster_name: Test_Cluster
    seeds: 13.xxx.xxx.xxx
  roles:
    - installation

Then, we define the role that the playbook refers to. The installation role performs the following tasks:

  • Installing Java (the JDK 8 RPM).

  • Copying and unpacking the Apache Cassandra tar file.

  • Replacing the default cassandra.yaml with a cassandra.yaml that holds our own configuration, the details of which are given below.

  • Starting Cassandra.

The following is the main.yml file of the role:

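---
# Copy the JDK 8 RPM from the role's files directory to the node
- name: Copy Java RPM file
  copy:
    src: jdk-8_linux-x64_bin.rpm
    dest: /tmp/jdk-8_linux-x64_bin.rpm

- name: Install JDK via RPM file with yum
  yum:
    name: /tmp/jdk-8_linux-x64_bin.rpm
    state: present

- name: Copy Cassandra tar
  copy:
    src: apache-cassandra-3.11.2-bin.tar.gz
    dest: /tmp/apache-cassandra-3.11.2-bin.tar.gz

# Extract into the home directory that the later tasks refer to
- name: Extract Cassandra
  command: tar -xvf /tmp/apache-cassandra-3.11.2-bin.tar.gz
  args:
    chdir: /home/ec2-user

# Render the templated cassandra.yaml over the default configuration
- name: Override cassandra.yaml file
  template:
    src: cassandra.yaml
    dest: /home/ec2-user/apache-cassandra-3.11.2/conf/cassandra.yaml

# -f keeps Cassandra in the foreground, so this task keeps running for as
# long as Cassandra does; -R allows Cassandra to run as root
- name: Run Cassandra from bin folder
  command: ./cassandra -fR
  args:
    chdir: /home/ec2-user/apache-cassandra-3.11.2/bin/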
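As a possible variant of the copy and extract steps, Ansible's unarchive module can do both in one task. This is a sketch rather than part of the original role; it assumes the same tar file sits in the role's files directory and that /home/ec2-user is the intended install location, as the chdir of the last task suggests.

```yaml
# Sketch: copy the archive from the role's files directory and unpack it on
# the node in a single task (unarchive copies from the controller by default)
- name: Copy and extract Cassandra
  ansible.builtin.unarchive:
    src: apache-cassandra-3.11.2-bin.tar.gz
    dest: /home/ec2-user
    # Skip the task on later runs once the extracted directory exists
    creates: /home/ec2-user/apache-cassandra-3.11.2
```

The `creates` argument makes the step safe to repeat, so re-running the playbook does not unpack the archive over an existing installation.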

The cassandra.yaml file contains most of the Cassandra configuration, such as the ports used, file locations, and seed node IP addresses. This file has to be edited on each node, so I have created a template for it. The template cassandra.yaml uses the following variables:

  • cluster_name: '{{ cluster_name }}' – the name of the cluster; it can be anything you choose.
  • seeds: "{{ seeds }}" – the IP addresses of the cluster's seed servers. Seed nodes are well-known places where cluster information (such as the list of nodes in the cluster) can be obtained.
  • listen_address: {{ aws-webservers }} – the IP address that Cassandra listens on for internal (Cassandra to Cassandra) communication.
  • rpc_address: {{ aws-webservers }} – the IP address that Cassandra listens on for client communication.
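To make the placement of these variables concrete, here is a minimal sketch of the relevant part of a templated cassandra.yaml. It is not the author's actual template, which is not shown here. One deliberate difference: Jinja2 reads a hyphen inside `{{ }}` as subtraction, so a hyphenated name cannot be referenced directly as a variable; the sketch instead assumes the inventory lists each node by IP address and uses Ansible's built-in `inventory_hostname`.

```yaml
# Fragment of a templated cassandra.yaml (sketch; all other settings omitted)
cluster_name: '{{ cluster_name }}'

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Comma-separated string of seed IPs, e.g. "<ip1>,<ip2>"
      - seeds: "{{ seeds }}"

# Assumption: each host is listed in the inventory by its IP address,
# so inventory_hostname renders to that node's own IP
listen_address: "{{ inventory_hostname }}"
rpc_address: "{{ inventory_hostname }}"
```

Because `cluster_name` and `seeds` are set once in the playbook's vars, every node renders the same cluster name and seed list, while the two address settings differ per node.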

Now, we can run the playbook and our cluster will be up and running. To add more nodes, we simply add them to the hosts list; Ansible then ensures that Cassandra is installed on them and that they are started and connected to the cluster.

Points To Remember

  • The host IP should be the public IP of a node.

  • Put the Java RPM package and the Cassandra tar file in the files directory of the role.

  • Use Java 8, as Cassandra 3.11 is not supported on higher versions of Java. Starting it on a newer version fails with the following error:

    [0.000s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/home/mmatak/monero/apache-cassandra-3.11.1/logs/gc.log instead.
    intx ThreadPriorityPolicy=42 is outside the allowed range [ 0 ... 1 ]
    Improperly specified VM option 'ThreadPriorityPolicy=42'
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
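To catch a wrong Java version before Cassandra is started, a pair of tasks like the following could be added to the role. This is a sketch and not part of the original role; the task names and the failure message are illustrative, and it relies on `java -version` printing a version string beginning with 1.8 for Java 8.

```yaml
# Sketch: fail early if the node is not running Java 8
- name: Read the Java version on the node
  ansible.builtin.command: java -version
  register: java_version
  changed_when: false        # a read-only check never changes the node

- name: Stop unless Java 8 is installed
  ansible.builtin.assert:
    that:
      # java -version writes its output to stderr, not stdout
      - "'1.8.' in java_version.stderr"
    fail_msg: "Cassandra 3.11 needs Java 8; found: {{ java_version.stderr }}"
```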
     

Thus, Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually is very disheartening. The full source code including templates and directory structure is here.
