9/23/2020


Introduction to Cassandra

by Gokhan Atil



1. INTRODUCTION TO APACHE CASSANDRA Gökhan Atıl
2. GÖKHAN ATIL
➤ Database Administrator
➤ Oracle ACE Director (2016), ACE (2011)
➤ 10g/11g and R12 Oracle Certified Professional (OCP)
➤ Co-author of Expert Oracle Enterprise Manager 12c
➤ Founding Member and Vice President of TROUG
➤ Blogger (since 2008): gokhanatil.com
➤ Twitter: @gokhanatil
3. INTRODUCTION TO APACHE CASSANDRA
➤ What is Apache Cassandra? Why use it?
➤ Cassandra Architecture
➤ Cassandra Query Language (CQL)
➤ Cassandra Data Modeling
➤ How to install and run Cassandra?
➤ Cassandra nodetool
➤ Backup and Recovery
4. WHAT IS APACHE CASSANDRA? WHY USE IT?
5. WHAT IS APACHE CASSANDRA? WHY USE IT?
➤ Fast, distributed (column-family NoSQL) database: high availability, linear scalability, high performance
➤ Fault tolerant on commodity hardware
➤ Multi-data-center support
➤ Easy to operate
➤ Proven: CERN, Netflix, eBay, GitHub, Instagram, Reddit
6. HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA
➤ CAP theorem: Consistency, Availability, Partition Tolerance
➤ RDBMS: ACID (Atomicity, Consistency, Isolation, Durability)
7. HIGH AVAILABILITY: THE RING
➤ No master, no slave: peer to peer
➤ Nodes announce their state to each other via gossip ("I'm online!")
8. LINEAR SCALABILITY
9. CASSANDRA ARCHITECTURE
10. CASSANDRA PARTITIONS
EMAIL      NAME     PHONE
gokhan@    Gokhan   542xxxxxxx
aylin@     Aylin    532xxxxxxx
ilayda@    Ilayda   532xxxxxxx
➤ The partitioner distributes rows across nodes by their partition key
➤ PRIMARY KEY = PARTITION KEY, CLUSTERING KEY
11. REPLICATION FACTOR
➤ Example: the partitioner (Murmur3Partitioner) hashes the partition key (EMAIL gokhan@) to # 60
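The partitioner and replication factor from slides 10 and 11 can be illustrated with a small sketch. It assumes a four-node ring with evenly spaced tokens and uses MD5 as a stand-in hash; Cassandra's default partitioner is Murmur3Partitioner, and the names `token`, `Ring`, and `node0`..`node3` are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Map a partition key to a 64-bit token (MD5 stand-in, not Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # Sort the nodes by token so the ring can be searched with bisect.
        self._entries = sorted((t, n) for n, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key, replication_factor):
        """Return the owning node plus the next nodes clockwise on the ring."""
        size = len(self._entries)
        start = bisect.bisect_left(self._tokens, token(key)) % size
        count = min(replication_factor, size)
        return [self._entries[(start + i) % size][1] for i in range(count)]


# Four hypothetical nodes with evenly spaced tokens.
nodes = {f"node{i}": i * (2**64 // 4) for i in range(4)}
ring = Ring(nodes)
owners = ring.replicas("gokhan@", 3)
print(owners)  # three distinct, consecutive nodes on the ring
```

This mirrors the simple ring-walk placement only; rack- and data-center-aware placement is not modeled.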
12. WRITE PATH (CLUSTER)
[Diagram: client, coordinator node, hinted handoff]
13. WRITE PATH (NODE)
➤ Logging data in the commit log
➤ Writing data to the memtable
➤ Flushing to (immutable) SSTables (Sorted Strings Tables)
[Diagram: in memory: memtable; on disk: commit log and SSTables; the memtable is flushed to an SSTable, and SSTables are merged by compaction]
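The three write steps on slide 13 can be sketched as a toy single-node model. This is a simplified illustration, not Cassandra's implementation: the class `Node`, the `memtable_limit` threshold, and the in-memory lists standing in for on-disk files are all assumptions.

```python
class Node:
    """Toy write path: commit log, memtable, flush to SSTables, compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log, kept for crash recovery
        self.memtable = {}     # in-memory writes, keyed by partition key
        self.sstables = []     # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write
        self.memtable[key] = value            # 2. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the memtable is full

    def flush(self):
        if not self.memtable:
            return
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, so newer values win
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


node = Node(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
node.compact()
print(node.sstables)
```

Real compaction also resolves conflicts by write timestamp and drops expired tombstones; here "newer SSTable wins" stands in for that.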
14. READ PATH (CLUSTER)
➤ Read Repair: repair during the read path using digests and timestamps
[Diagram: client, coordinator node; one replica returns the data, the other replicas return digests]
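A minimal sketch of the read-repair idea on slide 14, assuming each replica is a plain dictionary mapping a key to a `(value, timestamp)` pair and that the key exists on every replica. The function names `digest` and `read_with_repair` are hypothetical.

```python
import hashlib


def digest(value):
    """Short fingerprint of a value, cheaper to ship than the value itself."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Read full data from one replica and digests from the others.

    On a digest mismatch, the value with the newest timestamp wins and is
    written back to every stale replica.
    """
    data_value, _ = replicas[0][key]
    other_digests = [digest(r[key][0]) for r in replicas[1:]]
    if all(d == digest(data_value) for d in other_digests):
        return data_value  # all replicas agree, nothing to repair

    newest = max((r[key] for r in replicas), key=lambda cell: cell[1])
    for replica in replicas:
        if replica[key] != newest:
            replica[key] = newest  # repair the stale copy
    return newest[0]


replicas = [
    {"k": ("old", 1)},
    {"k": ("new", 2)},
    {"k": ("old", 1)},
]
print(read_with_repair(replicas, "k"))  # prints "new"
```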
15. READ PATH (NODE)
[Diagram: in-memory and on-disk structures consulted during a read: memtable, row (read) cache, bloom filter (answers "maybe" or "no"), partition key cache, partition summary, partition index, SSTable]
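The "maybe or no" behavior of the bloom filter on slide 15 can be sketched as follows. This is an illustrative model only: the class `BloomFilter`, its `size` and `hashes` parameters, and the `read` helper are assumptions, and the caches, partition summary, and partition index are left out.

```python
import hashlib


class BloomFilter:
    """Answers "definitely not present" or "maybe present" for a key."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 hashes of the key.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def read(key, memtable, sstables):
    """Check the memtable first, then each SSTable whose filter says "maybe"."""
    if key in memtable:
        return memtable[key]
    for bloom, data in sstables:       # newest SSTable first
        if not bloom.might_contain(key):
            continue                   # "no": skip the disk read entirely
        if key in data:                # "maybe": the lookup can still miss
            return data[key]
    return None


bloom = BloomFilter()
bloom.add("gokhan@")
sstables = [(bloom, {"gokhan@": "542"})]
print(read("gokhan@", {}, sstables))
print(read("aylin@", {"aylin@": "532"}, sstables))
```

A bloom filter never returns a false "no", which is why skipping the SSTable on a negative answer is safe.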
16. CONSISTENCY LEVELS
➤ Formula for strong consistency: R + W > N
➤ ANY (write only): at least one node
➤ ONE, TWO, THREE: at least one/two/three replica nodes
➤ QUORUM: a quorum (N/2+1) of replica nodes across all datacenters
➤ LOCAL_QUORUM: a quorum (N/2+1) of replica nodes in the same datacenter
➤ ALL: all replica nodes
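The arithmetic behind slide 16 is easy to check in a few lines. In this sketch R and W are the numbers of replicas that must acknowledge a read and a write, N is the replication factor, and N/2 is taken as integer division; the function names are hypothetical.

```python
def quorum(n):
    """Quorum size for n replicas: N/2 + 1, using integer division."""
    return n // 2 + 1


def is_strongly_consistent(r, w, n):
    """True when every read set must overlap every write set: R + W > N."""
    return r + w > n


n = 3
print(quorum(n))                                          # 2
print(is_strongly_consistent(quorum(n), quorum(n), n))    # True: QUORUM reads and writes
print(is_strongly_consistent(1, 1, n))                    # False: ONE reads and writes
print(is_strongly_consistent(1, n, n))                    # True: ONE reads, ALL writes
```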
17. CASSANDRA QUERY LANGUAGE (CQL)
18. CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a keyspace (database): create keyspace demo with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
➤ Remove a keyspace: drop keyspace demo;
➤ Select a keyspace to operate on: use demo;
19. CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a table: create table demo.democlients ( email text, name text, phone text, primary key (email, name));
➤ Alter a table: alter table democlients add money int;
➤ Remove a table: drop table democlients;
➤ Remove all rows in a table: truncate table democlients;
➤ EMAIL: PARTITION KEY, NAME: CLUSTERING KEY
20. CASSANDRA QUERY LANGUAGE (CQL)
➤ Retrieve rows: select * from democlients where name='Gokhan Atil' ALLOW FILTERING; -- or create a secondary index
➤ Retrieve distinct values: select DISTINCT email from democlients;
➤ Limit the number of rows returned: select * from democlients LIMIT 1;
➤ Sort the result: select * from democlients where email='gokhan at gokhanatil.com' ORDER by name DESC;
➤ EMAIL: PARTITION KEY, NAME: CLUSTERING KEY
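Why a query on the partition key is cheap while a query on name alone needs ALLOW FILTERING can be pictured with a toy in-memory model of democlients. This is a conceptual sketch, not how Cassandra stores data: the nested dictionary, the function names, and the sample rows (including the second "Atil" row) are hypothetical.

```python
from collections import defaultdict

# partition key (email) -> {clustering key (name) -> remaining columns}
table = defaultdict(dict)


def insert(email, name, phone):
    table[email][name] = {"phone": phone}


def select_by_email(email, descending=False):
    """One partition lookup; rows come back ordered by the clustering key."""
    rows = table.get(email, {})
    return [(email, name, rows[name]["phone"])
            for name in sorted(rows, reverse=descending)]


def select_by_name_allow_filtering(name):
    """No partition key given, so every partition has to be scanned."""
    return [(email, name, rows[name]["phone"])
            for email, rows in table.items() if name in rows]


insert("gokhan@", "Gokhan", "542")
insert("gokhan@", "Atil", "535")
insert("aylin@", "Aylin", "532")

print(select_by_email("gokhan@", descending=True))
print(select_by_name_allow_filtering("Aylin"))
```

The full scan in `select_by_name_allow_filtering` is the cost that ALLOW FILTERING makes explicit.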
21. CASSANDRA QUERY LANGUAGE (CQL)
➤ Retrieve the results in JSON format: select JSON * from democlients;
➤ Insert a row: insert into democlients (email, name, phone) values ('gokhan at gokhanatil.com','Gokhan Atil','542') IF NOT EXISTS;
➤ Insert a row with TTL (time to live, in seconds): insert into democlients (email, name, phone) values ('info at gokhanatil.com','Information','542') USING TTL 10;
22. CASSANDRA QUERY LANGUAGE (CQL)
➤ Update records: update democlients set phone='535' where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF EXISTS;
➤ Update records with a condition: update democlients set money=20 where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF phone='542';
➤ Delete rows (a conditional delete must specify the full primary key): delete from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF EXISTS;
23. CASSANDRA QUERY LANGUAGE (CQL)
➤ Delete a row with a condition: delete from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF money > 10;
➤ Delete columns in a row: delete money from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil';
24. CASSANDRA DATA MODELING
➤ Query-driven data modeling
➤ Spread data evenly across the cluster
➤ Use denormalization
➤ Be careful about using secondary indexes
25. HOW TO INSTALL AND RUN CASSANDRA?
26. HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
➤ Make sure you have JDK (8u40 or newer) installed
➤ Download apache-cassandra-VERSION-bin.tar.gz
➤ Extract the file to a folder
➤ Make data and logs directories in the cassandra folder
➤ Run bin/cassandra
➤ Edit the configuration file (conf/cassandra.yaml)
➤ Give the cluster a name, change the listening address and the data and logs directory locations, and enable authentication and authorization.
27. HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
➤ Use Docker to pull the latest image: docker pull cassandra
➤ Run it as standalone: docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra
➤ Connect using cqlsh: docker exec -it cas1 cqlsh
➤ Run nodetool (e.g., to check status): docker exec -it cas1 nodetool status
28. CASSANDRA NODETOOL
29. CASSANDRA NODETOOL
➤ Get a quick summary of the node: nodetool info
➤ Get the version of Cassandra: nodetool version
30. CASSANDRA NODETOOL
➤ Get the status of the cluster/keyspace: nodetool status <keyspace_name>
➤ View the network statistics of the node: nodetool netstats
➤ Get information about a table: nodetool cfstats <keyspace_name.table_name>
31. CASSANDRA NODETOOL
➤ Repair a node (you can run it weekly during off-peak hours): nodetool repair
➤ Clean up keys that no longer belong to a node: nodetool cleanup
➤ Start a major compaction process: nodetool compact
➤ Check the compaction process: nodetool compactionstats
32. CASSANDRA NODETOOL
➤ Decommission a node (to prepare to remove it; run on the node itself, the command takes no node argument): nodetool decommission
➤ Remove a dead or decommissioned node from the cluster: nodetool removenode <node_UUID>
➤ Take a snapshot (for backup): nodetool snapshot
➤ Remove previous snapshots: nodetool clearsnapshot
33. BACKUP AND RECOVERY
34. BACKUP AND RECOVERY
➤ Back up a cluster:
1. Take a snapshot of each node.
2. Move the snapshots to another storage location (an S3 bucket?).
3. Clean all the snapshots.
➤ Restore node(s):
1. Make sure the schema exists.
2. Truncate the table.
3. Copy the most recent snapshots to a directory whose name is formatted as "keyspace/tablename".
4. Run: sstableloader -d <nodeip> keyspace/tablename
35. BUILD A BACKUP NODE
➤ Use multi-DC replication: CREATE KEYSPACE "MyKeyspace" WITH replication = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2' : 1 };
[Diagram: client, RF=3, snapshots]
36. QUESTIONS?
37. Blog: www.gokhanatil.com Twitter: @gokhanatil
