Here is a simple, straightforward guide to getting Cassandra running on your local machine. We will also create a sample keyspace and add data to our database.
1. Download DataStax Community Server:
curl -OL http://downloads.datastax.com/community/dsc.tar.gz
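2. Install Cassandra: move the archive to where you want Cassandra to live, then extract it.
tar -xzf dsc.tar.gz
3. Change into the bin directory of the extracted folder (the version number in the directory name depends on the release you downloaded):
cd dsc-cassandra-3.0.9/bin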
4. Start up Cassandra!
sudo ./cassandra
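Cassandra can take several seconds to finish starting, so it helps to confirm it is ready before connecting. One way is to check whether the CQL port accepts TCP connections. This is a minimal sketch that assumes the default native-transport port 9042 on localhost; `is_port_open` and `wait_for_cassandra` are hypothetical helpers, not part of Cassandra.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def wait_for_cassandra(host: str = "127.0.0.1", port: int = 9042,
                       attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the (assumed default) CQL port until it opens or attempts run out."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_cassandra()` right after starting the server returns True once the node is listening, or False after roughly a minute of retries.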
5. With Cassandra running, the next step is to connect to the server and start creating database objects with CQL. The CQL shell (cqlsh) lives in the same bin directory as the cassandra executable:
./cqlsh
6. Now let’s create a keyspace. A keyspace holds data objects and is the level at which you specify the replication strategy for your data. It is similar to a database in an RDBMS. Let’s create a keyspace named dev:
cqlsh> create keyspace dev
... with replication = {'class':'SimpleStrategy','replication_factor':1};
The replication option is a map of properties and values. Its class property selects one of two replica placement strategies, SimpleStrategy or NetworkTopologyStrategy, and which of the remaining properties you supply depends on that choice. The four properties are described below.
Keyspace Property 1: class
The value is SimpleStrategy or NetworkTopologyStrategy.
SimpleStrategy: use only for a single datacenter and one rack. If you ever intend to run more than one datacenter, use NetworkTopologyStrategy.
NetworkTopologyStrategy: highly recommended for most deployments, because it makes it much easier to expand to multiple datacenters later.
Keyspace Property 2: replication_factor
The number of replicas of each row across the cluster. This is required when the class is SimpleStrategy; it is not used with NetworkTopologyStrategy, which takes per-datacenter replica counts instead.
Keyspace Property 3: <first data center>
Used when the class is NetworkTopologyStrategy: the property name is the name of the first datacenter, and its value is the number of replicas of each row to keep in that datacenter.
Keyspace Property 4: <next data center>
Optional, and only used with NetworkTopologyStrategy: add one property per additional datacenter, named after that datacenter, whose value is the number of replicas to keep there.
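To make replication_factor concrete, here is a deliberately simplified model of how SimpleStrategy places replicas: each node owns a token on a ring, a row's first replica goes to the first node whose token is at or after the row's token, and the remaining replicas go to the next nodes clockwise, ignoring racks and datacenters. The node names, the tokens, and the MD5-based `token_for` function are illustrative assumptions; a real cluster uses its configured partitioner (Murmur3Partitioner by default) and usually many virtual-node tokens per node.

```python
import bisect
import hashlib
from typing import Dict, List


def token_for(key: str) -> int:
    """Hypothetical token function: MD5 of the key folded onto a 0-999 ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 1000


def simple_strategy_replicas(ring: Dict[int, str], token: int,
                             replication_factor: int) -> List[str]:
    """Return the nodes that hold a row with the given token.

    `ring` maps each node's token to its name (one token per node here).
    The first replica is the first node whose token is >= the row's token;
    the rest are the following nodes clockwise, ignoring racks and datacenters.
    """
    if replication_factor > len(ring):
        # This toy model simply rejects it rather than modelling Cassandra's behaviour.
        raise ValueError("replication_factor exceeds the number of nodes")
    tokens = sorted(ring)
    # bisect_left finds the first token >= the row's token; % wraps past the end.
    start = bisect.bisect_left(tokens, token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(replication_factor)]


ring = {0: "node-a", 333: "node-b", 666: "node-c"}
print(simple_strategy_replicas(ring, 500, 2))  # ['node-c', 'node-a']
```

With a replication_factor of 1, as in the dev keyspace, each row lives on exactly one node; on a single local node every row simply lands on that node.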
For this example, we will use SimpleStrategy.
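Because the required properties differ by strategy, it can help to see both forms of the statement side by side. The helper below is hypothetical (not part of cqlsh or any driver); it simply builds the CREATE KEYSPACE text from the properties described above. The datacenter names dc1 and dc2 are placeholders that would have to match the names your cluster actually reports, for example in `nodetool status`.

```python
from typing import Dict, Optional


def create_keyspace_cql(name: str, replication_factor: Optional[int] = None,
                        datacenters: Optional[Dict[str, int]] = None) -> str:
    """Build a CREATE KEYSPACE statement (hypothetical helper).

    Give replication_factor for SimpleStrategy, or datacenters
    (datacenter name -> replica count) for NetworkTopologyStrategy.
    """
    if (replication_factor is None) == (datacenters is None):
        raise ValueError("pass exactly one of replication_factor or datacenters")
    if datacenters is None:
        # SimpleStrategy: one replica count for the whole cluster.
        props = {"class": "'SimpleStrategy'", "replication_factor": str(replication_factor)}
    else:
        # NetworkTopologyStrategy: one replica count per named datacenter.
        props = {"class": "'NetworkTopologyStrategy'"}
        props.update({dc: str(count) for dc, count in datacenters.items()})
    body = ", ".join(f"'{key}': {value}" for key, value in props.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"


print(create_keyspace_cql("dev", replication_factor=1))
# CREATE KEYSPACE dev WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
print(create_keyspace_cql("dev", datacenters={"dc1": 3, "dc2": 2}))
# CREATE KEYSPACE dev WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};
```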
7. Create: before we can add data, we need a table. Here we switch to the dev keyspace and create a base table to hold employee data with the CQL CREATE TABLE command. The table (called a column family in older Cassandra terminology) is named emp and has four columns, including the employee ID, which acts as the table's primary key.
cqlsh> use dev;
cqlsh:dev> create table emp (empid int primary key, emp_first varchar, emp_last varchar, emp_dept varchar);
8. Insert: next we insert a row into our new table with the CQL INSERT command:
cqlsh:dev> insert into emp (empid, emp_first, emp_last, emp_dept) values (1, 'marika', 'lam', 'eng');
9. Update: here we move our employee from engineering ('eng') to finance ('fin') with the CQL UPDATE command:
cqlsh:dev> update emp set emp_dept = 'fin' where empid = 1;
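Unlike in most relational databases, INSERT and UPDATE in Cassandra are both upserts: writing to a primary key that already exists overwrites the columns you supply, and updating a key that does not exist yet creates it. The toy in-memory model below (a plain Python dict, not how Cassandra stores data) mimics that behaviour for the emp table; the row for empid 2 is a hypothetical addition to show the "update creates" case.

```python
from typing import Dict

# Toy model of the emp table: primary key (empid) -> column values.
Table = Dict[int, Dict[str, str]]


def upsert(table: Table, empid: int, **columns: str) -> None:
    """Write the given columns for empid, creating the row if it is missing.

    In this model, INSERT and UPDATE are the same operation.
    """
    table.setdefault(empid, {}).update(columns)


emp: Table = {}
# insert into emp (empid, emp_first, emp_last, emp_dept) values (1, 'marika', 'lam', 'eng');
upsert(emp, 1, emp_first="marika", emp_last="lam", emp_dept="eng")
# update emp set emp_dept = 'fin' where empid = 1;  -> overwrites only emp_dept
upsert(emp, 1, emp_dept="fin")
# update emp set emp_dept = 'eng' where empid = 2;  -> creates a new row (hypothetical)
upsert(emp, 2, emp_dept="eng")
print(emp)
```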
10. Select: finally, we query the data with the CQL SELECT command:
cqlsh:dev> select * from emp;

 empid | emp_dept | emp_first | emp_last
-------+----------+-----------+----------
     1 |      fin |    marika |      lam
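The same steps can also be scripted. This sketch assumes the DataStax Python driver (`pip install cassandra-driver`) and a node on localhost. `run_walkthrough` and `main` are hypothetical helpers; `run_walkthrough` accepts any object with an `execute` method, so it can be exercised without a live cluster. Values are passed as `%s` parameters rather than pasted into the query text.

```python
from typing import Any, List

# The CQL from steps 6-10, with %s placeholders for bound values
# (the placeholder style the Python driver uses for non-prepared statements).
STATEMENTS = [
    ("CREATE KEYSPACE IF NOT EXISTS dev WITH replication = "
     "{'class': 'SimpleStrategy', 'replication_factor': 1}", None),
    ("CREATE TABLE IF NOT EXISTS dev.emp (empid int PRIMARY KEY, emp_first varchar, "
     "emp_last varchar, emp_dept varchar)", None),
    ("INSERT INTO dev.emp (empid, emp_first, emp_last, emp_dept) VALUES (%s, %s, %s, %s)",
     (1, "marika", "lam", "eng")),
    ("UPDATE dev.emp SET emp_dept = %s WHERE empid = %s", ("fin", 1)),
]


def run_walkthrough(session: Any) -> List[Any]:
    """Run the walkthrough statements, then return the rows of dev.emp."""
    for query, params in STATEMENTS:
        session.execute(query, params)
    return list(session.execute("SELECT * FROM dev.emp"))


def main() -> None:
    """Connect to a local node and print each employee row."""
    # Imported here so the helpers above work without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    for row in run_walkthrough(session):
        print(row.empid, row.emp_dept, row.emp_first, row.emp_last)
    cluster.shutdown()
```

With Cassandra running and the driver installed, calling `main()` should print the same single row the cqlsh query returned.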
11. Index: in Cassandra, a query that filters on a column other than the primary key generally needs a secondary index on that column. Here we index emp_dept and then query employees by department:
cqlsh:dev> create index idx_dept on emp(emp_dept);
cqlsh:dev> select * from emp where emp_dept = 'fin';
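Why is the index needed? Rows are located by their primary key, so a lookup by empid goes straight to the right row, while a lookup by emp_dept would otherwise mean examining every row. Conceptually, a secondary index is a second mapping from each indexed value back to the primary keys that hold it. The toy model below illustrates only that idea; the second employee is a hypothetical row, and Cassandra's built-in secondary indexes are stored locally on each node, so an indexed query may still have to contact many nodes.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Base table: primary key (empid) -> row.
emp: Dict[int, Dict[str, str]] = {
    1: {"emp_first": "marika", "emp_last": "lam", "emp_dept": "fin"},
    2: {"emp_first": "alex", "emp_last": "kim", "emp_dept": "eng"},  # hypothetical row
}


def build_index(table: Dict[int, Dict[str, str]], column: str) -> Dict[str, Set[int]]:
    """Map each value of `column` to the set of primary keys whose row has it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for key, row in table.items():
        if column in row:
            index[row[column]].add(key)
    return index


def select_where(table: Dict[int, Dict[str, str]], index: Dict[str, Set[int]],
                 value: str) -> List[Dict[str, str]]:
    """Answer `WHERE column = value` through the index instead of scanning every row."""
    return [table[key] for key in sorted(index.get(value, set()))]


idx_dept = build_index(emp, "emp_dept")
print(select_where(emp, idx_dept, "fin"))
# [{'emp_first': 'marika', 'emp_last': 'lam', 'emp_dept': 'fin'}]
```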
And that’s pretty much it! Hopefully this guide has helped you get Cassandra up and running on your local machine.