Want to up your database game? If so, you probably need a NoSQL database. Jack Wallen shows you how to install Apache Cassandra to scratch that itch.
Big business requires big data. In order for your company to handle such massive amounts of data, you'll need a NoSQL database. Of the available databases, which is best suited for your task? Given you have quite a few to consider, the choice might be a bit daunting. Fortunately, a number of those options are open source and pretty easy to deploy, so you can quickly kick the tires and see if one fits your needs.
One such NoSQL database is Apache Cassandra, which offers linear scalability, high availability, and no single point of failure, and can run on commodity hardware. Cassandra is used by a number of very large companies (including Reddit, Netflix, and GitHub).
I want to walk you through the process of installing Apache Cassandra on Ubuntu Server 20.04.
What you'll need
A running instance of Ubuntu Server 20.04
A user with sudo privileges
How to install Java
Apache Cassandra requires Java. We need to make sure to install the correct version of Java, which is OpenJDK 8. For that, we turn to the standard repository.
Log in to your Ubuntu Server and install Java with the command:
sudo apt install openjdk-8-jdk -y
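Before moving on, it can be worth confirming which Java the shell actually picks up. The following is a minimal sketch, assuming that java -version prints its banner in the usual "1.8.0_xxx" form for Java 8; the is_java8 helper name is hypothetical and not part of Java or Cassandra.

```shell
# is_java8 is a hypothetical helper, not part of Java or Cassandra.
# It succeeds when the given version banner looks like Java 8 ("1.8.x").
is_java8() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# java -version writes its banner to stderr, so redirect it to stdout.
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | head -n 1)
    if is_java8 "$banner"; then
        echo "Java 8 detected: $banner"
    else
        echo "Unexpected Java version: $banner"
    fi
else
    echo "java not found on PATH"
fi
```

If a different version shows up, another JDK is probably installed alongside OpenJDK 8 and is taking priority.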
When that installation completes, you're ready to install Apache Cassandra.
How to install Apache Cassandra
Apache Cassandra can't be installed from the standard repositories. Because of that, we need to add the official repository. In order to do this over HTTPS, we must first install apt-transport-https with the command:
sudo apt install apt-transport-https -y
Once that dependency is taken care of, download and install the necessary GPG key with the command:
wget -q -O - https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
Create an apt list file for Apache Cassandra with the command:
sudo nano /etc/apt/sources.list.d/cassandra.list
In that file, add the following:
deb http://www.apache.org/dist/cassandra/debian 311x main
Save and close the file.
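If you'd rather not open an editor, the same line can be written from the shell. This is only a sketch: the LIST_FILE variable is an assumption introduced here, and it defaults to a scratch file so you can try the snippet safely. On the server you would set it to /etc/apt/sources.list.d/cassandra.list and run the snippet as root.

```shell
# LIST_FILE is a hypothetical variable; it defaults to a scratch file.
# On the server, set LIST_FILE=/etc/apt/sources.list.d/cassandra.list and run as root.
LIST_FILE="${LIST_FILE:-$(mktemp)}"
REPO_LINE='deb http://www.apache.org/dist/cassandra/debian 311x main'

# Append the line only if it is not already there, so reruns do not duplicate it.
grep -qxF "$REPO_LINE" "$LIST_FILE" 2>/dev/null || printf '%s\n' "$REPO_LINE" >> "$LIST_FILE"

cat "$LIST_FILE"
```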
Update apt with the command:
sudo apt-get update
Finally, install Apache Cassandra with the command:
sudo apt-get install cassandra -y
When the installation completes, verify it with the command:
nodetool status
You should see the database server is running on datacenter1 (Figure A).
Figure A
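The nodetool status output marks each node with a two-letter code, where UN means Up and Normal. If you want to check that from a script rather than by eye, a rough sketch follows; count_up_normal is a hypothetical helper name, and the snippet simply skips the check when nodetool isn't on the PATH.

```shell
# count_up_normal is a hypothetical helper: it reads `nodetool status` output
# on stdin and prints how many node lines start with "UN" (Up/Normal).
count_up_normal() {
    # grep -c exits non-zero when the count is 0, so mask that with || true.
    grep -c '^UN ' || true
}

if command -v nodetool >/dev/null 2>&1; then
    up=$(nodetool status | count_up_normal)
    echo "Nodes up and normal: $up"
else
    echo "nodetool not found on PATH"
fi
```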
How to change the name of your database cluster
By default, your Apache Cassandra cluster will be named "Test Cluster." Let's rename that. To do so, you must first access the Apache Cassandra console with the command:
cqlsh
At the console, rename the cluster with the command:
UPDATE system.local SET cluster_name = 'NAME' WHERE KEY = 'local';
Where NAME is the new name of your cluster.
Exit the console with the command:
exit
Now we need to change the name in the Apache Cassandra configuration file. Open the file for editing with the command:
sudo nano /etc/cassandra/cassandra.yaml
Look for the line starting with cluster_name and change it to the same name you used in the UPDATE command (Figure B).
Figure B
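The same change to the cluster_name line can be made without an editor. Here is a minimal sketch, assuming GNU sed (the default on Ubuntu) and a new name that contains no slashes, ampersands, or single quotes; set_cluster_name is a hypothetical helper, and the demo below runs against a scratch file rather than the real configuration.

```shell
# set_cluster_name is a hypothetical helper. Arguments: path to the YAML file
# and the new cluster name. Assumes GNU sed and a name without / & or ' in it.
set_cluster_name() {
    yaml_file=$1
    new_name=$2
    sed -i "s/^cluster_name:.*/cluster_name: '$new_name'/" "$yaml_file"
}

# Demo on a scratch file. On the server the file is
# /etc/cassandra/cassandra.yaml and the script must run as root.
scratch=$(mktemp)
printf '%s\n' "cluster_name: 'Test Cluster'" > "$scratch"
set_cluster_name "$scratch" "NAME"
cat "$scratch"   # prints: cluster_name: 'NAME'
```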
Flush the system keyspace's in-memory data to disk with the command:
nodetool flush system
Restart Apache Cassandra with the command:
sudo systemctl restart cassandra
Now, when you log in to the Apache Cassandra console, you should see the new name of your cluster is in effect (Figure C).
Figure C
How to open Apache Cassandra for remote connection
By default, Apache Cassandra listens only for local connections. That's great if the applications that will use the database are hosted on the same machine. If you need to use Apache Cassandra as a remote database, you'll need to enable it for remote connections. To do this, open the configuration file again with the command:
sudo nano /etc/cassandra/cassandra.yaml
In that file, you must change three lines. The first is:
start_rpc: false
Change the above to:
start_rpc: true
The next line is:
rpc_address: localhost
Change the above line to:
rpc_address: 0.0.0.0
The next line needs to be uncommented (by removing the # character) and changed to:
broadcast_rpc_address: SERVER_IP
Where SERVER_IP is the IP address of your hosting server.
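As an alternative to making those three changes by hand, they can be scripted. What follows is a sketch only, assuming GNU sed and that the three lines appear in the file as shown above; enable_remote_access is a hypothetical helper, and 192.0.2.10 is a placeholder address standing in for SERVER_IP. The demo runs against a scratch file.

```shell
# enable_remote_access is a hypothetical helper. Arguments: path to the YAML
# file and the server's IP address. Assumes GNU sed.
enable_remote_access() {
    yaml_file=$1
    server_ip=$2
    sed -i \
        -e 's/^start_rpc:.*/start_rpc: true/' \
        -e 's/^rpc_address:.*/rpc_address: 0.0.0.0/' \
        -e "s/^#\{0,1\} *broadcast_rpc_address:.*/broadcast_rpc_address: $server_ip/" \
        "$yaml_file"
}

# Demo on a scratch file. On the server the file is
# /etc/cassandra/cassandra.yaml and the script must run as root.
scratch=$(mktemp)
printf '%s\n' 'start_rpc: false' 'rpc_address: localhost' \
    '# broadcast_rpc_address: 1.2.3.4' > "$scratch"
enable_remote_access "$scratch" 192.0.2.10   # placeholder IP
cat "$scratch"
```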
Save and close the file. Flush and restart Apache Cassandra with these commands:
nodetool flush system
sudo systemctl restart cassandra
You should now be able to connect to Apache Cassandra from remote servers.
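To check reachability from another machine before pointing an application at the database, something like the following can help. It is a sketch under a few assumptions: bash and the coreutils timeout command are available on the client, Cassandra is using its default CQL port of 9042, and SERVER_IP is set to your server's address (it falls back to 127.0.0.1 here). The port_open helper is hypothetical.

```shell
# port_open is a hypothetical helper. It assumes bash and the coreutils
# `timeout` command are available, and succeeds if a TCP connection opens.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# SERVER_IP is a placeholder for your server's address.
# 9042 is Cassandra's default CQL native transport port.
SERVER_IP="${SERVER_IP:-127.0.0.1}"
if port_open "$SERVER_IP" 9042; then
    echo "Port 9042 on $SERVER_IP is reachable; try: cqlsh $SERVER_IP 9042"
else
    echo "Port 9042 on $SERVER_IP is not reachable; check rpc_address and any firewall"
fi
```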
And that's how you can easily get this powerful, open source NoSQL database server installed. Enjoy your newfound ability to scale your data to meet enterprise-level needs.