
Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

9/23/2020


Introduction to Cassandra

by Gokhan Atil

INTRODUCTION TO APACHE CASSANDRA
Gökhan Atıl

GÖKHAN ATIL
➤ Database Administrator
➤ Oracle ACE Director (2016), ACE (2011)
➤ 10g/11g and R12 Oracle Certified Professional (OCP)
➤ Co-author of Expert Oracle Enterprise Manager 12c
➤ Founding Member and Vice President of TROUG
➤ Blogger (since 2008): gokhanatil.com
➤ Twitter: @gokhanatil

INTRODUCTION TO APACHE CASSANDRA
➤ What is Apache Cassandra? Why use it?
➤ Cassandra Architecture
➤ Cassandra Query Language (CQL)
➤ Cassandra Data Modeling
➤ How to install and run Cassandra
➤ Cassandra nodetool
➤ Backup and Recovery

WHAT IS APACHE CASSANDRA? WHY USE IT?
➤ Fast, distributed (column family NoSQL) database with high availability, linear scalability and high performance
➤ Fault tolerant on commodity hardware
➤ Multi-datacenter support
➤ Easy to operate
➤ Proven: used by CERN, Netflix, eBay, GitHub, Instagram and Reddit

HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA
(Diagram: the CAP triangle of Consistency, Availability and Partition Tolerance, contrasted with the ACID guarantees of an RDBMS: Atomicity, Consistency, Isolation, Durability.)

HIGH AVAILABILITY: THE RING
(Diagram: a peer-to-peer ring with no master and no slave; nodes gossip with each other, announcing status such as "I'm online!".)

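To make the "no master, peer to peer" idea concrete, here is a minimal sketch of gossip-style state spreading. It assumes a simplified model in which each node keeps a heartbeat version per peer and two gossiping nodes keep the newer version of each entry. The names `Node`, `beat`, `gossip_with` and `run_rounds` are hypothetical, and this is not Cassandra's actual gossip implementation.

```python
import random


class Node:
    """A toy cluster member that tracks the newest heartbeat it has seen per peer."""

    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}  # peer name -> newest heartbeat version known

    def beat(self):
        # "I'm online!": bump this node's own heartbeat version.
        self.heartbeats[self.name] += 1

    def gossip_with(self, other):
        # Both sides end up with the newest version of every node either one knows.
        for peer in set(self.heartbeats) | set(other.heartbeats):
            newest = max(self.heartbeats.get(peer, -1), other.heartbeats.get(peer, -1))
            self.heartbeats[peer] = newest
            other.heartbeats[peer] = newest


def run_rounds(nodes, rounds, seed=0):
    # Each round, every node beats and gossips with one randomly chosen peer.
    rng = random.Random(seed)
    for _ in range(rounds):
        for node in nodes:
            node.beat()
            peer = rng.choice([n for n in nodes if n is not node])
            node.gossip_with(peer)


nodes = [Node(f"node{i}") for i in range(5)]
run_rounds(nodes, rounds=20)
print(sorted(nodes[0].heartbeats))  # names of every node that node0 has learned about
```

No node coordinates the others. Membership knowledge spreads only through pairwise exchanges, which is why any node can leave or join without a single point of failure.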
LINEAR SCALABILITY

CASSANDRA ARCHITECTURE

CASSANDRA PARTITIONS
Example rows:
EMAIL   | NAME   | PHONE
gokhan@ | Gokhan | 542xxxxxxx
aylin@  | Aylin  | 532xxxxxxx
ilayda@ | Ilayda | 532xxxxxxx
PRIMARY KEY = PARTITION KEY + CLUSTERING KEY. The partitioner uses the partition key to decide where a row is stored.

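The split between partition key and clustering key can be sketched as a small in-memory model. This is an assumption-laden toy: rows are grouped by the partition key (email) and kept sorted by the clustering key (name) inside each partition. It also shows why a query on `name` alone has to scan every partition, which is the situation that calls for ALLOW FILTERING. `ToyTable` and its methods are hypothetical names.

```python
from bisect import insort


class ToyTable:
    """Rows grouped by partition key (email), sorted by clustering key (name)."""

    def __init__(self):
        self.partitions = {}  # email -> sorted list of (name, phone)

    def insert(self, email, name, phone):
        rows = self.partitions.setdefault(email, [])
        # Same primary key (email, name) means overwrite, like a CQL upsert.
        rows[:] = [row for row in rows if row[0] != name]
        insort(rows, (name, phone))

    def select_partition(self, email, descending=False):
        # WHERE email = ?  -> one partition, rows already in clustering order.
        rows = self.partitions.get(email, [])
        return rows[::-1] if descending else list(rows)

    def select_by_name_allow_filtering(self, name):
        # WHERE name = ? ALLOW FILTERING -> no partition key given, so scan everything.
        return [(email, n, phone)
                for email, rows in self.partitions.items()
                for n, phone in rows if n == name]


table = ToyTable()
table.insert("gokhan@", "Gokhan", "542xxxxxxx")
table.insert("aylin@", "Aylin", "532xxxxxxx")
table.insert("ilayda@", "Ilayda", "532xxxxxxx")
print(table.select_by_name_allow_filtering("Aylin"))
```

Because rows inside a partition are already in clustering order, `ORDER BY name DESC` within one partition is just a reversed read, not a sort.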
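REPLICATION FACTOR
(Diagram: the Murmur3Partitioner hashes the partition key value, EMAIL = gokhan@, to a token (60 in the example). The token determines which nodes on the ring store the replicas, and the replication factor sets how many replicas there are.)
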
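Replica placement on the ring can be sketched as follows. The key is hashed to a token, the first node whose token is greater than or equal to it owns the key, and the next distinct nodes clockwise hold the remaining replicas, in the style of SimpleStrategy. This sketch rests on several assumptions:

- MD5 replaces Murmur3, because Python's standard library has no Murmur3 implementation.
- The token space is shrunk to 0..99 for readability.
- The node tokens are invented for the example.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 100  # tiny token space for readability; Murmur3 tokens span the signed 64-bit range


def token_for(partition_key):
    # Assumption: MD5 stands in for Murmur3, which Python's stdlib lacks.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for_token(token, ring, rf):
    """ring: (token, node) pairs sorted by token. The first node whose token is
    >= the key's token owns it; the next distinct nodes clockwise hold the other
    replicas (SimpleStrategy-style placement)."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token)  # may equal len(ring), which wraps to the start
    distinct = len({node for _, node in ring})
    chosen = []
    while len(chosen) < min(rf, distinct):
        node = ring[i % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        i += 1
    return chosen


def replicas_for(partition_key, ring, rf):
    return replicas_for_token(token_for(partition_key), ring, rf)


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(replicas_for_token(60, ring, rf=3))  # ['D', 'A', 'B']
print(replicas_for("gokhan@", ring, rf=3))
```

With a replication factor of 3, a key whose token is 60 lands on node D, and the two following nodes clockwise (A and B) hold the other copies.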
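WRITE PATH (CLUSTER)
(Diagram: the client sends a write to a coordinator node, which forwards it to the replica nodes. For a replica that is down, the coordinator stores a hint and replays it later; this is called hinted handoff.)
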
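Hinted handoff can be illustrated with a toy coordinator. It rejects a write up front when too few replicas are alive, keeps a hint for each replica that is down, and replays the hints once that replica returns. `Replica`, `Coordinator`, `required_acks` and `replay_hints` are hypothetical names. Consistent with Cassandra, stored hints do not count as acknowledgements here; in the real system only the ANY level accepts a hint as a successful write.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Toy coordinator: forwards writes to replicas and keeps hints for down ones."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica name, key, value) waiting to be replayed

    def write(self, key, value, required_acks):
        live = [r for r in self.replicas if r.up]
        if len(live) < required_acks:
            return False  # not enough live replicas: reject up front, write nothing
        for r in self.replicas:
            if r.up:
                r.data[key] = value
            else:
                self.hints.append((r.name, key, value))  # hinted handoff
        return True

    def replay_hints(self):
        # Deliver stored hints to replicas that have come back online.
        by_name = {r.name: r for r in self.replicas}
        pending = []
        for name, key, value in self.hints:
            if by_name[name].up:
                by_name[name].data[key] = value
            else:
                pending.append((name, key, value))
        self.hints = pending


replicas = [Replica("A"), Replica("B"), Replica("C")]
coordinator = Coordinator(replicas)
replicas[2].up = False
coordinator.write("gokhan@", "542", required_acks=2)  # succeeds; C gets a hint
replicas[2].up = True
coordinator.replay_hints()
print(replicas[2].data)  # {'gokhan@': '542'}
```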
WRITE PATH (NODE)
➤ Logging data in the commit log
➤ Writing data to the memtable
➤ Flushing to immutable SSTables (Sorted Strings Tables)
(Diagram: the commit log and SSTables live on disk and the memtable in memory; the memtable is flushed to SSTables, and compaction merges SSTables.)

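The three node-level write steps, plus compaction, can be sketched in memory. A write is appended to a commit log, then applied to a memtable. A full memtable becomes an immutable, key-sorted SSTable, and compaction merges SSTables so that newer values win. This is a simplification under stated assumptions: `ToyNode` and `flush_threshold` are hypothetical. Real SSTables are on-disk files with indexes and per-cell timestamps, and real commit log segments are recycled rather than cleared.

```python
class ToyNode:
    """Toy single-node write path: commit log -> memtable -> SSTables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []  # durable, append-only record of recent writes
        self.memtable = {}    # in-memory, mutable
        self.sstables = []    # immutable key-sorted tuples, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for durability
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Write the memtable out as an immutable, key-sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replaying from the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all SSTables into one; for duplicate keys the newer value wins.
        merged = {}
        for table in self.sstables:  # oldest first, so later writes overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]
```

Because SSTables are never modified in place, an update simply lands in a newer SSTable. Compaction is what eventually discards the superseded versions.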
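READ PATH (CLUSTER)
(Diagram: the client sends a read to a coordinator node, which requests the full data from one replica and digests from the others.)
➤ Read Repair: replicas are repaired during the read path by comparing digests and timestamps
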
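Digest-based read repair can be sketched as follows. The first replica returns the full value and the others return only hashes. If any hash differs, the coordinator takes the version with the newest write timestamp and writes it back. The replica layout (a dict of `key -> (value, timestamp)`), the MD5 digest and the function names are assumptions made for this toy version.

```python
import hashlib


def digest(cell):
    # cell is (value, write_timestamp); replicas compare hashes, not full data.
    value, timestamp = cell
    return hashlib.md5(f"{value}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (value, timestamp).
    The first replica returns data; the others return only digests."""
    data = replicas[0].get(key)
    expected = digest(data) if data is not None else None
    digests = [digest(r[key]) if key in r else None for r in replicas[1:]]
    if all(d == expected for d in digests):
        return None if data is None else data[0]
    # Digest mismatch: fetch full data from all replicas; newest timestamp wins.
    newest = max((r[key] for r in replicas if key in r), key=lambda cell: cell[1])
    for r in replicas:
        r[key] = newest  # write the winning version back to every replica
    return newest[0]


r1 = {"gokhan@": ("542", 1)}
r2 = {"gokhan@": ("535", 2)}
r3 = {}
print(read_with_repair([r1, r2, r3], "gokhan@"))  # 535
```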
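READ PATH (NODE)
(Diagram: a read first checks the memtable and the row (read) cache in memory. For each SSTable, the bloom filter answers "no" or "maybe". On "maybe", the partition key cache is checked; if the key is not found there, the partition summary and partition index locate the partition in the SSTable on disk.)
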
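The bloom filter's "no or maybe" answer is what lets a read skip SSTables cheaply. Here is a minimal Bloom filter sketch. The bit-array size, the number of hash functions and the SHA-256 double-hashing scheme are illustrative assumptions, not Cassandra's settings.

```python
import hashlib


class BloomFilter:
    """Answers 'definitely not present' (False) or 'maybe present' (True)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False means "no": skip this SSTable. True means "maybe": go and look.
        return all(self.bits[p] for p in self._positions(key))


sstable_filter = BloomFilter()
for partition_key in ["gokhan@", "aylin@", "ilayda@"]:
    sstable_filter.add(partition_key)
print(sstable_filter.might_contain("gokhan@"))  # True
```

False negatives are impossible: a key that was added always reports "maybe". False positives are possible, and they only cost an unnecessary index lookup.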
CONSISTENCY LEVELS
➤ Formula for strong consistency: R + W > N (R = replicas read, W = replicas written, N = replication factor)
➤ ANY (write only): at least one node
➤ ONE, TWO, THREE: at least one/two/three replica nodes
➤ QUORUM: a quorum (floor(N/2) + 1) of replica nodes across all datacenters
➤ LOCAL_QUORUM: a quorum (floor(N/2) + 1) of replica nodes in the same datacenter
➤ ALL: all replica nodes

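The quorum arithmetic and the R + W > N rule can be checked with a few lines of code. For QUORUM across datacenters, N is taken as the sum of the per-datacenter replication factors. The example factors (datacenter1: 3, datacenter2: 1) reuse the multi-DC keyspace shown in the backup slide. ANY is omitted because a stored hint can satisfy it, so it does not map to a replica count. The function names are hypothetical.

```python
def quorum(n):
    # A majority of n replicas: floor(n/2) + 1.
    return n // 2 + 1


def is_strongly_consistent(r, w, n):
    # Read and write replica sets must overlap in at least one node.
    return r + w > n


def required_replicas(level, rf_by_dc, local_dc):
    total = sum(rf_by_dc.values())
    levels = {
        "ONE": 1, "TWO": 2, "THREE": 3,
        "QUORUM": quorum(total),                       # across all datacenters
        "LOCAL_QUORUM": quorum(rf_by_dc[local_dc]),    # coordinator's datacenter only
        "ALL": total,
    }
    return levels[level]


rf = {"datacenter1": 3, "datacenter2": 1}
print(required_replicas("QUORUM", rf, "datacenter1"))        # 3
print(required_replicas("LOCAL_QUORUM", rf, "datacenter1"))  # 2
```

With RF = 3 in a single datacenter, QUORUM reads plus QUORUM writes give 2 + 2 > 3, so every read overlaps the latest write. ONE plus ONE (1 + 1 ≤ 3) gives no such guarantee.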
CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a keyspace (database):
create keyspace demo with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
➤ Remove a keyspace:
drop keyspace demo;
➤ Select a keyspace to operate on:
use demo;

CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a table (EMAIL is the partition key, NAME is the clustering key):
create table demo.democlients ( email text, name text, phone text, primary key (email, name));
➤ Alter a table:
alter table democlients add money int;
➤ Remove a table:
drop table democlients;
➤ Remove all rows in a table:
truncate table democlients;

CASSANDRA QUERY LANGUAGE (CQL)
➤ Retrieve rows:
select * from democlients where name='Gokhan Atil' ALLOW FILTERING; -- or create a secondary index
➤ Retrieve distinct values:
select DISTINCT email from democlients;
➤ Limit the number of rows returned:
select * from democlients LIMIT 1;
➤ Sort the result (ORDER BY works on the clustering key, NAME, within a partition selected by the partition key, EMAIL):
select * from democlients where email='gokhan@gokhanatil.com' ORDER by name DESC;

CASSANDRA QUERY LANGUAGE (CQL)
➤ Retrieve the results in JSON format:
select JSON * from democlients;
➤ Insert a row:
insert into democlients (email, name, phone) values ('gokhan@gokhanatil.com','Gokhan Atil','542') IF NOT EXISTS;
➤ Insert a row with a TTL (time to live, in seconds):
insert into democlients (email, name, phone) values ('info@gokhanatil.com','Information','542') USING TTL 10;

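The effect of `USING TTL 10` can be modelled as a cell that records its expiry time: it stays readable while the current time is before `write_time + ttl` and reads as absent afterwards. `TtlStore` is a hypothetical helper, and time is passed in explicitly so the behaviour is deterministic. Real Cassandra works in seconds on the server clock and later purges expired data during compaction.

```python
class TtlStore:
    """Toy store where each cell may carry an expiry time."""

    def __init__(self):
        self.cells = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, now, ttl=None):
        expires_at = now + ttl if ttl is not None else None
        self.cells[key] = (value, expires_at)

    def get(self, key, now):
        if key not in self.cells:
            return None
        value, expires_at = self.cells[key]
        if expires_at is not None and now >= expires_at:
            return None  # expired: reads no longer see the value
        return value


store = TtlStore()
store.insert("info@gokhanatil.com", "Information", now=0, ttl=10)
print(store.get("info@gokhanatil.com", now=9))   # Information
print(store.get("info@gokhanatil.com", now=10))  # None
```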
CASSANDRA QUERY LANGUAGE (CQL)
➤ Update records:
update democlients set phone='535' where email='gokhan@gokhanatil.com' and name='Gokhan' IF EXISTS;
➤ Update records with a condition:
update democlients set money=20 where email='gokhan@gokhanatil.com' and name='Gokhan Atil' IF phone='542';
➤ Delete rows:
delete from democlients where email='gokhan@gokhanatil.com' IF EXISTS;

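Statements with an IF clause (IF NOT EXISTS, IF EXISTS, IF column = value) behave like compare-and-set. The change is applied only when the condition holds, and the result carries an `[applied]` flag. When the change is not applied, the result also includes the current values. The sketch below models only that single-node outcome. `LwtTable` and its methods are hypothetical, and real lightweight transactions reach agreement across replicas with Paxos, which is not shown.

```python
class LwtTable:
    """Toy compare-and-set semantics for conditional CQL statements."""

    def __init__(self):
        self.rows = {}  # (email, name) -> dict of columns

    def insert_if_not_exists(self, key, columns):
        if key in self.rows:
            return {"[applied]": False, **self.rows[key]}  # show the existing row
        self.rows[key] = dict(columns)
        return {"[applied]": True}

    def update_if_exists(self, key, changes):
        if key not in self.rows:
            return {"[applied]": False}
        self.rows[key].update(changes)
        return {"[applied]": True}

    def update_if(self, key, changes, expected):
        # Apply `changes` only if every column in `expected` has the given value.
        row = self.rows.get(key)
        if row is None or any(row.get(c) != v for c, v in expected.items()):
            current = {c: row.get(c) for c in expected} if row else {}
            return {"[applied]": False, **current}
        row.update(changes)
        return {"[applied]": True}


t = LwtTable()
key = ("gokhan@gokhanatil.com", "Gokhan Atil")
t.insert_if_not_exists(key, {"phone": "542"})
print(t.update_if(key, {"money": 20}, {"phone": "542"}))  # {'[applied]': True}
```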
CASSANDRA QUERY LANGUAGE (CQL)
➤ Delete a row with a condition:
delete from democlients where email='gokhan@gokhanatil.com' and name='Gokhan Atil' IF money > 10;
➤ Delete columns in a row:
delete money from democlients where email='gokhan@gokhanatil.com' and name='Gokhan Atil';

CASSANDRA DATA MODELING
➤ Query-driven data modeling
➤ Spread data evenly across the cluster
➤ Use denormalization
➤ Be careful about using secondary indexes

HOW TO INSTALL AND RUN CASSANDRA?

HOW TO INSTALL AND RUN A CASSANDRA CLUSTER?
➤ Make sure you have a JDK (8u40 or newer) installed
➤ Download apache-cassandra-VERSION-bin.tar.gz
➤ Extract the file to a folder
➤ Create data and logs directories in the Cassandra folder
➤ Run bin/cassandra
➤ Edit the configuration file (conf/cassandra.yaml): set the cluster name, change the listen address and the data and logs directory locations, and enable authentication and authorization.

HOW TO INSTALL AND RUN A CASSANDRA CLUSTER?
➤ Use Docker to pull the latest image:
docker pull cassandra
➤ Run it as a standalone node:
docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra
➤ Connect using cqlsh:
docker exec -it cas1 cqlsh
➤ Run nodetool (e.g. to check status):
docker exec -it cas1 nodetool status

CASSANDRA NODETOOL
➤ Get a quick summary of the node:
nodetool info
➤ Get the Cassandra version:
nodetool version

CASSANDRA NODETOOL
➤ Get the status of the cluster/keyspace:
nodetool status <keyspace_name>
➤ View the network statistics of the node:
nodetool netstats
➤ Get information about a table:
nodetool cfstats <keyspace_name.table_name>

CASSANDRA NODETOOL
➤ Repair a node (you can run it weekly during off-peak hours):
nodetool repair
➤ Clean up keys that no longer belong to a node:
nodetool cleanup
➤ Start a major compaction:
nodetool compact
➤ Check compaction progress:
nodetool compactionstats

CASSANDRA NODETOOL
➤ Decommission a node (to prepare to remove it). Run it on the node that is leaving; the command takes no host ID:
nodetool decommission
➤ Remove a dead node from the cluster by its host ID:
nodetool removenode <node_UUID>
➤ Take a snapshot (for backup):
nodetool snapshot
➤ Remove previous snapshots:
nodetool clearsnapshot

BACKUP AND RECOVERY
➤ Back up a cluster:
1. Take a snapshot on each node.
2. Move the snapshots to other storage (an S3 bucket, for example).
3. Clear all the snapshots.
➤ Restore node(s):
➤ Make sure the schema exists
➤ Truncate the table
➤ Copy the most recent snapshots to a directory whose path is formatted as "keyspace/tablename", then run:
sstableloader -d <nodeip> keyspace/tablename

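The "copy the snapshots into keyspace/tablename" step can be scripted. The sketch below assumes the usual data directory layout `<data_dir>/<keyspace>/<table>-<table_id>/snapshots/<tag>/`. It copies one snapshot tag into `<dest>/<keyspace>/<table>/`, ready to pass to sstableloader. The function name `stage_snapshot`, the tag `bk1` and the paths in the demo are hypothetical, so check the layout on your own installation before relying on it.

```python
import shutil
import tempfile
from pathlib import Path


def stage_snapshot(data_dir, keyspace, tag, dest):
    """Copy snapshot `tag` of every table in `keyspace` into dest/keyspace/table."""
    staged = []
    for table_dir in sorted((Path(data_dir) / keyspace).iterdir()):
        snap = table_dir / "snapshots" / tag
        if not snap.is_dir():
            continue  # this table has no snapshot with that tag
        # Table directories are named "<table>-<table_id>"; table names may only
        # contain letters, digits and underscores, so splitting on the last "-" is safe.
        table_name = table_dir.name.rsplit("-", 1)[0]
        out = Path(dest) / keyspace / table_name
        out.mkdir(parents=True, exist_ok=True)
        for f in snap.iterdir():
            if f.is_file():
                shutil.copy2(f, out / f.name)
        staged.append(out)
    return staged


if __name__ == "__main__":
    # Build a fake data directory so the sketch runs without Cassandra.
    root = Path(tempfile.mkdtemp())
    snap = root / "data" / "demo" / "democlients-0123abcd" / "snapshots" / "bk1"
    snap.mkdir(parents=True)
    (snap / "nb-1-big-Data.db").write_text("sstable bytes")
    for path in stage_snapshot(root / "data", "demo", "bk1", root / "restore"):
        print("sstableloader -d <nodeip>", path)
```

Each printed directory ends in `keyspace/tablename`, which is the form the restore step above expects.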
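BUILD A BACKUP NODE
➤ Use multi-DC replication:
CREATE KEYSPACE "MyKeyspace"
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'datacenter1' : 3, 'datacenter2' : 1 };
(Diagram: clients use the datacenter with RF=3, while the single replica in the second datacenter serves as a backup node from which snapshots are taken.)
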
QUESTIONS?

Blog: www.gokhanatil.com Twitter: @gokhanatil
