Cassandra is an open-source, distributed, decentralized, horizontally scalable, and highly available NoSQL database. Its distribution model is based on Amazon Dynamo and its data model is based on Google BigTable. Cassandra has no notion of master/slave because all of its nodes are the same. This helps Cassandra stay fault-tolerant and avoid a single point of failure. The purpose of this blog is to show users how they can install Cassandra on the OpenShift platform as a service. In case you want to learn more about Cassandra, please read the documentation.

In this blog, we will do a single-node Cassandra installation. Please note that the whole point of using Cassandra is fault tolerance and high availability, so a single-node installation is only good for a POC and for getting your hands dirty with Cassandra. This blog is divided into two parts:

  1. How to install Cassandra running on a DIY application
  2. How you can use Cassandra as an embedded cartridge in a simple Java application.

Prerequisites

Before we can start deploying Cassandra on OpenShift, you need to do the following:

  1. Sign up for an OpenShift account: If you don’t already have an OpenShift account, head on over to the website and sign up. It is completely free and Red Hat gives every user three free Gears on which to run applications. At the time of this writing, the combined resources allocated for each user are 1.5 GB of memory and 3 GB of disk space.
  2. Install the client tools on your machine: The OpenShift client tools are written in a very popular programming language called Ruby. With OSX 10.6 or later and most Linux distributions, Ruby is installed by default, so installing the client tools is a snap. Simply issue the following command in your terminal application:
    sudo gem install rhc
  3. Set up OpenShift: The rhc client tool makes it very easy to set up your OpenShift instance with ssh keys, git, and your application’s namespace. The namespace is a unique name per user which becomes part of your application URL. For example, if your namespace is cix and the application name is cassandra, then the URL of the application will be http://cassandra-cix.rhcloud.com/. The command is shown below:
    rhc setup -l openshift_login
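
The namespace rule from step 3 can be written down as a tiny helper. This is only a sketch of the URL pattern given in the example above; the class and method names (AppUrl, forApp) are made up for illustration and are not part of the rhc tooling.

```java
import java.net.URI;

/** Illustrative helper (hypothetical, not part of rhc): builds an application URL. */
final class AppUrl {

    // Pattern taken from the example in the text: http://<app>-<namespace>.rhcloud.com/
    static URI forApp(String appName, String namespace) {
        return URI.create("http://" + appName + "-" + namespace + ".rhcloud.com/");
    }

    public static void main(String[] args) {
        // Application "cassandra" in namespace "cix", as in the example above.
        System.out.println(forApp("cassandra", "cix"));
    }
}
```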

Part 1 : Installing Cassandra on a DIY application

After you have signed up for an OpenShift account and run the rhc setup command, the next step is to create a diy application using the rhc command line tool. OpenShift’s powerful Do-It-Yourself (DIY) feature allows you to use your own languages and data stores if the built-in Perl, Ruby, PHP, Python, and Java support doesn’t suit you. People have used it to run Clojure, JRuby, Go, CouchDB, Redis, and many other programming languages and data stores. OpenShift can run any binary that will run on RHEL 6.2 x64 because the OpenShift execution environment is a carefully secured Red Hat Enterprise Linux 6.2 running on x64 systems.

Creating Cassandra DIY Application

To create a diy application execute the command shown below.

rhc app create cassandra diy

This will create an application container for us, called a gear, and set up all of the required SELinux policies and cgroup configuration. OpenShift will also set up a private git repository for you and clone the repository to your local system. Finally, OpenShift will propagate the DNS to the outside world.

The template code generated by OpenShift has nothing interesting: it contains only a very simple Ruby-based HTTP server that listens on port 8080 and serves the index.html file. The testrubyserver.rb and index.html files exist in the diy folder.

Pulling the code from GitHub

To get started with Cassandra quickly, I have created a quickstart application which we can use to install Cassandra on OpenShift. The code is on GitHub at https://github.com/shekhargulati/cassandra-openshift-quickstart. The quickstart downloads the latest Cassandra tar, untars it, makes configuration changes, and finally starts the Cassandra database. I will talk about it in detail later in the post. Execute the git commands shown below to pull the quickstart code.

git remote add upstream https://github.com/shekhargulati/cassandra-openshift-quickstart.git
 
git pull -s recursive -X theirs upstream master

Pushing the code to OpenShift

Now that you have the Cassandra quickstart on your machine, let’s push the code to OpenShift, which will perform all the steps required to install Cassandra.

git push

After you execute git push, please wait for a minute while the push performs all the steps required to install Cassandra on your diy application. After git push succeeds, ssh into the application gear as shown below.

ssh f677086ae4b84936XXXXefrfrfr3f8e53f43eb56@cassandra-demo.rhcloud.com

Now if you run ps -ef|grep cassandra you will find that the Cassandra Java process is running, as shown below.

[Image: View Cassandra Process]
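
Besides looking at the process list, another way to check that Cassandra is up is to open a TCP connection to its Thrift port. The sketch below is not part of the quickstart; it assumes the port 19160 that this post uses for Cassandra and the $OPENSHIFT_DIY_IP environment variable, and the PortProbe class is a made-up name. Note that a successful connection only shows that something is listening on the port, not that Cassandra is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether something accepts TCP connections on host:port. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, host unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // OPENSHIFT_DIY_IP is set on the gear; falling back to loopback is an assumption for local runs.
        String host = System.getenv().getOrDefault("OPENSHIFT_DIY_IP", "127.0.0.1");
        System.out.println("Cassandra reachable: " + isListening(host, 19160, 2000));
    }
}
```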

Taking Cassandra for a Test Drive

Now that we are sure Cassandra is running on the OpenShift gear, let’s test it by creating a sample keyspace and column family. Then we will insert some data into the column family. Cassandra provides a command line utility called cqlsh for running CQL statements, which we can use for testing. To run it, go to the cassandra/bin folder in $OPENSHIFT_DATA_DIR and run cqlsh as shown below.

cd app-root/data/cassandra/bin/
./cqlsh $OPENSHIFT_DIY_IP 19160 -2

You can also run the DESCRIBE SCHEMA command, which will output the Cassandra system keyspace schema.

Let’s now create our keyspace and column family. Execute the commands shown below.

CREATE KEYSPACE MyKeyspace with strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' AND strategy_options:replication_factor = 1;
 
use MyKeyspace;
CREATE TABLE users (
  user_name varchar PRIMARY KEY,
  password varchar,
  gender varchar,
  session_token varchar,
  state varchar,
  birth_year bigint
);
 
INSERT INTO MyKeyspace.users(user_name,password,gender,session_token,state,birth_year) VALUES ( 'shekhar','password','M','session','Haryana',1984);

You can also view the data using a SELECT query such as SELECT * FROM users; as shown below.
[Image: Cassandra Select Statement]

Under the Hood

Now that we have Cassandra running on OpenShift, let’s take a look at how we achieved that. The changes that we made in the code are in three files which exist in the .openshift/action_hooks folder inside your application directory. Let’s take a look at these files one by one.

  1. deploy : The deploy hook gets invoked after the dependencies of an application are resolved but before the application is started again. In this script we create a new cassandra directory under $OPENSHIFT_DATA_DIR, download the tar file, and create the directories required by Cassandra to keep application data, logs, etc. Finally, we update some configuration files to make Cassandra work. The configuration files we change are cassandra.yaml, log4j-server.properties, and cassandra-env.sh. The changes that we make in these files are related to port changes, using $OPENSHIFT_DIY_IP instead of localhost, setting memory, etc. You can view the file on GitHub.
  2. start : The start script encapsulates the logic required to start the application. So, this script is where we start the Cassandra database. You can view the start script on GitHub.
  3. stop : The stop script encapsulates the logic to stop the application. Here we find the Cassandra process id and kill the process. You can view the stop script on GitHub.
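
To make the kind of configuration change done by the deploy hook more concrete, here is a rough Java sketch of the idea. It is not the actual hook, which is a shell script in the quickstart repository. The YamlOverrides class is hypothetical, and which keys the quickstart really touches is an assumption on my part: listen_address, rpc_address, and rpc_port are standard cassandra.yaml settings, and 19160 is the port used elsewhere in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of overriding top-level "key: value" settings in cassandra.yaml. */
final class YamlOverrides {

    /** Replaces the value of every unindented "key: value" line whose key appears in overrides. */
    static String apply(String yaml, Map<String, String> overrides) {
        StringBuilder out = new StringBuilder();
        for (String line : yaml.split("\n", -1)) {
            int colon = line.indexOf(':');
            String key = colon > 0 ? line.substring(0, colon) : null;
            // Commented-out or indented lines never match, because their "key" keeps the prefix.
            if (key != null && overrides.containsKey(key)) {
                line = key + ": " + overrides.get(key);
            }
            out.append(line).append('\n');
        }
        out.setLength(out.length() - 1); // drop the newline added after the last line
        return out.toString();
    }

    public static void main(String[] args) {
        // OPENSHIFT_DIY_IP is set on the gear; the loopback fallback is an assumption for local runs.
        String ip = System.getenv().getOrDefault("OPENSHIFT_DIY_IP", "127.0.0.1");
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("listen_address", ip); // bind to the gear IP instead of localhost
        overrides.put("rpc_address", ip);
        overrides.put("rpc_port", "19160");  // port used in this post

        String yaml = "cluster_name: 'Test Cluster'\n"
                + "listen_address: localhost\n"
                + "rpc_address: localhost\n"
                + "rpc_port: 9160\n";
        System.out.print(apply(yaml, overrides));
    }
}
```

Only unindented keys are handled here, so nested settings would need a real YAML parser or, as in the quickstart, a few targeted text substitutions.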

Those are the only changes that we had to make in order to run Cassandra on OpenShift.

Part 2 : Using Cassandra as an Embedded Cartridge from within a Java application

So far, we have seen how you can install Cassandra on a diy application. But it would make more sense to use Cassandra as an embedded cartridge from another application type supported by OpenShift, like Java, PHP, Python, or Ruby. This way you don’t have to install any other server or runtime for your application. In this part, we will create a very simple Java application which will be deployed on Tomcat via the JBoss EWS cartridge.

Creating Tomcat Application

The first step is to create a Tomcat application called cassandrajavademo. Execute the command shown below to create the application.

rhc app create cassandrajavademo tomcat-6

Updating OpenShift Action Hook Scripts to install Cassandra

To install Cassandra we have to update three scripts: deploy, pre_start_jbossews, and pre_stop_jbossews. These files will contain the same content as the deploy, start, and stop scripts we created in Part 1, so please update them accordingly. The only change that we have to make is that we will be creating the keyspace and column family in pre_start_jbossews. So, please add the following line at the end.

bin/cassandra-cli -h $OPENSHIFT_DIY_IP -p 19160 -f $OPENSHIFT_REPO_DIR/cassandra-tutorial.txt

Java code to interact with Cassandra

Finally, I have written a very simple Spring MVC application with a single controller that writes data into Cassandra. You can view the data by ssh’ing to the instance and running the cqlsh command line utility. The controller is shown below.

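import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller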
public class CassandraController {
 
    @RequestMapping(value = "/cassandra", method = RequestMethod.GET)
    public String process() throws TException, InvalidRequestException,
            UnavailableException, UnsupportedEncodingException,
            NotFoundException, TimedOutException {
 
        String host = System.getenv("OPENSHIFT_INTERNAL_IP");
        int port = 19160;
        TTransport transport = new TFramedTransport(new TSocket(host,port));
        TProtocol protocol = new TBinaryProtocol(transport);
        Cassandra.Client client = new Cassandra.Client(protocol);
        transport.open();
 
        client.set_keyspace("tutorials");
 
        // define column parent
        ColumnParent parent = new ColumnParent("User");
 
        // define row id
        ByteBuffer rowid = ByteBuffer.wrap("100".getBytes());
 
        // define column to add
        Column username = new Column();
        username.setName("username".getBytes());
        username.setValue("shekhargulati".getBytes());
        username.setTimestamp(System.currentTimeMillis());
 
        // define consistency level
        ConsistencyLevel consistencyLevel = ConsistencyLevel.ONE;
 
        // execute insert
        client.insert(rowid, parent, username, consistencyLevel);
 
        Column password = new Column();
        password.setName("password".getBytes());
        password.setValue("password".getBytes());
        password.setTimestamp(System.currentTimeMillis());
        client.insert(rowid, parent, password, consistencyLevel);
 
        // release resources
        transport.flush();
        transport.close();
        return "hello";
    }
 
}
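
Two details in the controller are worth a small, optional refinement. First, getBytes() without an argument uses the platform default charset, so naming UTF-8 explicitly keeps row keys and column names stable across machines. Second, as far as I know Cassandra’s own tools conventionally write column timestamps in microseconds, so mixing them with millisecond timestamps can make a write look older than it is. The helper below is only a sketch under those assumptions; ThriftValues and its methods are hypothetical names and use nothing beyond the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for preparing keys, column names, and timestamps. */
final class ThriftValues {

    /** Encodes text as UTF-8 bytes, e.g. for a column name or value. */
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps UTF-8 encoded text in a ByteBuffer, e.g. for a row key. */
    static ByteBuffer key(String text) {
        return ByteBuffer.wrap(utf8Bytes(text));
    }

    /** Decodes a UTF-8 buffer without moving the position of the original buffer. */
    static String decode(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    /** Current wall-clock time in microseconds (millisecond precision, scaled up). */
    static long nowMicros() {
        return TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        ByteBuffer rowid = key("100");
        System.out.println(decode(rowid) + " written at " + nowMicros());
    }
}
```

In the controller above, key("100") would stand in for the ByteBuffer.wrap call, utf8Bytes(...) for the getBytes() calls, and nowMicros() for System.currentTimeMillis().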

Push code to OpenShift

Finally, push the code to OpenShift, which will install Cassandra, build a new war, and deploy it on Tomcat. Now, if you hit http://cassandrajavademo-cix.rhcloud.com/cassandra a new row will be created in Cassandra and you will see “Hello from Cassandra”.

The source code of the application is available on my GitHub repository.

Conclusion

In this blog, I showed you how easy it is to extend OpenShift by installing Cassandra on top of it. What are you waiting for? Try it now!

What’s Next?