Introduction to Cassandra
- 1. INTRODUCTION TO APACHE CASSANDRA Gökhan Atıl
- 2. GÖKHAN ATIL
  ➤ Database Administrator
  ➤ Oracle ACE Director (2016), ACE (2011)
  ➤ 10g/11g and R12 Oracle Certified Professional (OCP)
  ➤ Co-author of Expert Oracle Enterprise Manager 12c
  ➤ Founding Member and Vice President of TROUG
  ➤ Blogger (since 2008): gokhanatil.com
  ➤ Twitter: @gokhanatil
- 3. INTRODUCTION TO APACHE CASSANDRA
  ➤ What is Apache Cassandra? Why use it?
  ➤ Cassandra Architecture
  ➤ Cassandra Query Language (CQL)
  ➤ Cassandra Data Modeling
  ➤ How to install and run Cassandra?
  ➤ Cassandra nodetool
  ➤ Backup and Recovery
- 4. WHAT IS APACHE CASSANDRA? WHY USE IT?
- 5. WHAT IS APACHE CASSANDRA? WHY USE IT?
  ➤ Fast distributed (column-family NoSQL) database: high availability, linear scalability, high performance
  ➤ Fault tolerant on commodity hardware
  ➤ Multi-data center support
  ➤ Easy to operate
  ➤ Proven: CERN, Netflix, eBay, GitHub, Instagram, Reddit
- 6. HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA
  ➤ CAP properties: Consistency, Availability, Partition Tolerance
  ➤ RDBMS (ACID): Atomicity, Consistency, Isolation, Durability
- 7. HIGH AVAILABILITY: THE RING
  ➤ No master, no slave: peer to peer
  ➤ Nodes announce their state to each other via gossip ("I'm online!")
- 8. LINEAR SCALABILITY
- 9. CASSANDRA ARCHITECTURE
- 10. CASSANDRA PARTITIONS
  EMAIL   | NAME   | PHONE
  gokhan@ | Gokhan | 542xxxxxxx
  aylin@  | Aylin  | 532xxxxxxx
  ilayda@ | Ilayda | 532xxxxxxx
  ➤ PRIMARY KEY = PARTITION KEY, CLUSTERING KEY
  ➤ The partitioner hashes the partition key to decide where each row is stored
- 11. REPLICATION FACTOR
  (diagram: EMAIL gokhan@ → Murmur3Partitioner → # 60)
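The placement idea behind this slide can be sketched in Python. This is a simplified illustration under stated assumptions, not Cassandra's implementation: it assumes one token per node (no virtual nodes) and SimpleStrategy-style placement, and it uses MD5 from the standard library as a stand-in for Murmur3. The `Ring` class, the `token` helper, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a 64-bit token.

    Assumption: MD5 stands in for Murmur3, which is not in the standard library.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas chosen clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Return the nodes that would store the given partition key."""
        # The first node whose token is >= the key's token owns the key;
        # wrap around to the start of the ring if there is none.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        # The remaining replicas are the next nodes clockwise.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
print(ring.replicas("gokhan@"))
```

With a replication factor of 3 on four nodes, every partition key maps to three distinct nodes, and the same key always maps to the same three.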
- 12. WRITE PATH (CLUSTER)
  (diagram labels: client, coordinator node, hinted handoff)
- 13. WRITE PATH (NODE)
  ➤ Logging data in the commit log
  ➤ Writing data to the memtable
  ➤ Flushing to (immutable) SSTables (Sorted Strings Tables)
  (diagram labels: memtable in memory; commit log and SSTables on disk; flush; compaction)
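The three steps above can be sketched as a toy Python model. It is a minimal illustration, assuming a flush is triggered by row count and that compaction simply lets newer SSTables win; a real node flushes by memory thresholds and merges by write timestamp. The `Node` class and its `memtable_limit` parameter are hypothetical.

```python
class Node:
    """Toy single-node write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log (on disk in a real node)
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # immutable, sorted tables (on disk in a real node)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write for durability
        self.memtable[key] = value             # 2. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush when the memtable is full

    def flush(self):
        """Write the memtable out as a new immutable, sorted SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        """Merge all SSTables into one; later tables overwrite earlier ones."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []
```

Because SSTables are never modified in place, an update to an existing key lands in a newer SSTable, and compaction is what eventually removes the stale copy.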
- 14. READ PATH (CLUSTER)
  ➤ Read repair: replicas are repaired during the read path, using digests and timestamps
  (diagram labels: client, coordinator node, data, digest, digest)
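A rough Python sketch of the read repair idea follows. It is a simplification under stated assumptions: each replica is modeled as a dictionary mapping a key to a `(value, timestamp)` pair, every replica is consulted, and MD5 is an arbitrary choice of digest. The `digest` and `read_with_repair` functions are hypothetical names.

```python
import hashlib


def digest(cell):
    """Hash a (value, timestamp) pair so replicas can be compared cheaply."""
    value, timestamp = cell
    return hashlib.md5(f"{value}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Return the newest value for key and repair any stale replicas.

    replicas: list of dicts, each mapping key -> (value, timestamp).
    """
    cells = [replica.get(key) for replica in replicas]
    present = [cell for cell in cells if cell is not None]
    if not present:
        return None
    # The cell with the highest write timestamp wins.
    newest = max(present, key=lambda cell: cell[1])
    newest_digest = digest(newest)
    for replica, cell in zip(replicas, cells):
        # A missing cell or a digest mismatch marks the replica as stale.
        if cell is None or digest(cell) != newest_digest:
            replica[key] = newest
    return newest[0]
```

After a call, every replica in the list holds the newest cell, which is the effect the slide describes.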
- 15. READ PATH (NODE)
  (diagram, in memory: memtable; row (read) cache; bloom filter, which answers "maybe" or "no"; partition key cache; partition summary)
  (diagram, on disk: partition index; SSTable)
  (diagram arrows: found; maybe found; no)
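The "maybe or no" behavior of the bloom filter in this diagram can be illustrated with a small Python sketch. It is a generic bloom filter, assuming SHA-256 based hashing and arbitrary default sizes, not the structure Cassandra stores alongside each SSTable. The `BloomFilter` class is a hypothetical name.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: answers "maybe present" or "definitely absent"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        """False means the key was never added; True only means "maybe"."""
        return all((self.bits >> position) & 1 for position in self._positions(key))
```

A "no" lets the node skip an SSTable without touching disk, while a "maybe" still requires the lookup to continue because false positives are possible.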
- 16. CONSISTENCY LEVELS
  ➤ Formula for strong consistency: R + W > N (R = replicas read, W = replicas written, N = replication factor)
  ANY (write only): at least one node
  ONE, TWO, THREE: at least one/two/three replica nodes
  QUORUM: a quorum (floor(N/2) + 1) of replica nodes across all datacenters
  LOCAL_QUORUM: a quorum (floor(N/2) + 1) of replica nodes in the same datacenter
  ALL: all replica nodes
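The arithmetic on this slide can be checked with a few lines of Python. The function names are hypothetical; the formulas are the ones given above.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a quorum: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when R + W > N, so every read overlaps the latest write."""
    return read_replicas + write_replicas > replication_factor


n = 3
print(quorum(n))                                        # 2
print(is_strongly_consistent(quorum(n), quorum(n), n))  # True: QUORUM reads and writes
print(is_strongly_consistent(1, 1, n))                  # False: ONE reads and ONE writes
```

With N = 3, QUORUM reads plus QUORUM writes give 2 + 2 > 3, which is why that pairing is a common choice for strong consistency.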
- 17. CASSANDRA QUERY LANGUAGE (CQL)
- 18. CASSANDRA QUERY LANGUAGE (CQL)
  ➤ Create a keyspace (database):
    create keyspace demo with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
  ➤ Remove a keyspace:
    drop keyspace demo;
  ➤ Select a keyspace to operate on:
    use demo;
- 19. CASSANDRA QUERY LANGUAGE (CQL)
  ➤ Create a table (EMAIL is the partition key, NAME is the clustering key):
    create table demo.democlients ( email text, name text, phone text, primary key (email, name));
  ➤ Alter a table:
    alter table democlients add money int;
  ➤ Remove a table:
    drop table democlients;
  ➤ Remove all rows in a table:
    truncate table democlients;
- 20. CASSANDRA QUERY LANGUAGE (CQL)
  ➤ Retrieve rows (NAME is the clustering key, so filtering on it alone needs ALLOW FILTERING or a secondary index):
    select * from democlients where name='Gokhan Atil' ALLOW FILTERING;
  ➤ Retrieve distinct values:
    select DISTINCT email from democlients;
  ➤ Limit the number of rows returned:
    select * from democlients LIMIT 1;
  ➤ Sort the result (EMAIL is the partition key, so rows can be ordered by the clustering key within it):
    select * from democlients where email='gokhan at gokhanatil.com' ORDER by name DESC;
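Why the first query needs ALLOW FILTERING while the last one can sort cheaply becomes clearer with a toy Python model of the table. This is only a mental model, assuming a table is a map from partition key to rows kept in clustering-key order; it is not how Cassandra stores data on disk. All function names and the sample rows are hypothetical.

```python
from collections import defaultdict

# partition key (email) -> {clustering key (name) -> remaining columns}
democlients = defaultdict(dict)


def insert(email, name, phone):
    democlients[email][name] = {"phone": phone}


def select_by_email(email, descending=False):
    """Lookup by partition key: one partition, rows ordered by clustering key."""
    partition = democlients.get(email, {})
    return [(email, name, partition[name]["phone"])
            for name in sorted(partition, reverse=descending)]


def select_by_name(name):
    """Filter on the clustering key alone: every partition has to be scanned."""
    return [(email, name, rows[name]["phone"])
            for email, rows in democlients.items() if name in rows]


insert("gokhan@", "Gokhan", "542")
insert("gokhan@", "Gokhan Atil", "535")
insert("aylin@", "Aylin", "532")
print(select_by_email("gokhan@", descending=True))
print(select_by_name("Aylin"))
```

The partition-key lookup touches a single partition, while the name filter walks all of them, which is the cost ALLOW FILTERING makes explicit.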
- 21. CASSANDRA QUERY LANGUAGE (CQL)
  ➤ Retrieve the results in JSON format:
    select JSON * from democlients;
  ➤ Insert a row:
    insert into democlients (email, name, phone) values ('gokhan at gokhanatil.com','Gokhan Atil','542' ) IF NOT EXISTS;
  ➤ Insert a row with a TTL (time to live, in seconds):
    insert into democlients (email, name, phone) values ('info at gokhanatil.com','Information','542' ) USING TTL 10;
- 22. CASSANDRA QUERY LANGUAGE (CQL)
  ➤ Update records:
    update democlients set phone='535' where email='gokhan at gokhanatil.com' and name='Gokhan' IF EXISTS;
  ➤ Update records with a condition:
    update democlients set money=20 where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF phone='542';
  ➤ Delete a row (a conditional delete must specify the full primary key):
    delete from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF EXISTS;
- 23. CASSANDRA QUERY LANGUAGE (CQL)
  ➤ Delete a row with a condition:
    delete from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF money > 10;
  ➤ Delete columns in a row:
    delete money from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil';
- 24. CASSANDRA DATA MODELING
  ➤ Query-driven data modeling
  ➤ Spread data evenly across the cluster
  ➤ Use denormalization
  ➤ Be careful about using secondary indexes
- 25. HOW TO INSTALL AND RUN CASSANDRA?
- 26. HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
  ➤ Make sure you have a JDK (8u40 or newer) installed
  ➤ Download apache-cassandra-VERSION-bin.tar.gz
  ➤ Extract the file to a folder
  ➤ Create data and logs directories in the Cassandra folder
  ➤ Run bin/cassandra
  ➤ Edit the configuration file (conf/cassandra.yaml)
  ➤ Give the cluster a name, change the listen address and the data and logs directory locations, and enable authentication and authorization
- 27. HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
  ➤ Use Docker to pull the latest image:
    docker pull cassandra
  ➤ Run it as a standalone node:
    docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra
  ➤ Connect using cqlsh:
    docker exec -it cas1 cqlsh
  ➤ Run nodetool (e.g., to check the status):
    docker exec -it cas1 nodetool status
- 28. CASSANDRA NODETOOL
- 29. CASSANDRA NODETOOL
  ➤ Get a quick summary of the node:
    nodetool info
  ➤ Get the version of Cassandra:
    nodetool version
- 30. CASSANDRA NODETOOL
  ➤ Get the status of the cluster/keyspace:
    nodetool status <keyspace_name>
  ➤ View the network statistics of the node:
    nodetool netstats
  ➤ Get information about a table (the command is named tablestats in newer releases):
    nodetool cfstats <keyspace_name.table_name>
- 31. CASSANDRA NODETOOL
  ➤ Repair a node (you can run it weekly during off-peak hours):
    nodetool repair
  ➤ Clean up keys that no longer belong to a node:
    nodetool cleanup
  ➤ Start a major compaction:
    nodetool compact
  ➤ Check the compaction progress:
    nodetool compactionstats
- 32. CASSANDRA NODETOOL
  ➤ Decommission a live node to prepare it for removal (run on that node; it takes no node ID):
    nodetool decommission
  ➤ Remove a dead node from the cluster (run from another node):
    nodetool removenode <node_UUID>
  ➤ Take a snapshot (for backup):
    nodetool snapshot
  ➤ Remove previous snapshots:
    nodetool clearsnapshot
- 33. BACKUP AND RECOVERY
- 34. BACKUP AND RECOVERY
  ➤ Back up a cluster:
    1. Take a snapshot of each node.
    2. Move the snapshots to other storage (an S3 bucket?).
    3. Clear all the snapshots.
  ➤ Restore node(s):
    ➤ Make sure the schema exists
    ➤ Truncate the table
    ➤ Copy the most recent snapshots to a directory whose path ends in "keyspace/tablename", then run:
      sstableloader -d <nodeip> keyspace/tablename
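The backup and restore steps above could be scripted along these lines. This Python sketch only builds the command lines and does not run them. It assumes snapshots are tagged with `-t` so that the same tag can be cleared later, and it leaves the copy to other storage out because that step depends on your tooling. The function names, the tag, and the staging directory are hypothetical.

```python
def backup_commands(keyspace, tag):
    """Commands to run on each node; the copy step (2) goes between them."""
    return [
        # Step 1: take a tagged snapshot of the keyspace.
        ["nodetool", "snapshot", "-t", tag, keyspace],
        # Step 3: clear that snapshot once it has been copied elsewhere.
        ["nodetool", "clearsnapshot", "-t", tag, keyspace],
    ]


def restore_command(node_ip, keyspace, table, staging_root):
    """Build the sstableloader call for snapshot files that were copied into
    staging_root/keyspace/table (the path must end in keyspace/tablename)."""
    return ["sstableloader", "-d", node_ip, f"{staging_root}/{keyspace}/{table}"]


for command in backup_commands("demo", "nightly"):
    print(" ".join(command))
print(" ".join(restore_command("10.0.0.1", "demo", "democlients", "/tmp/restore")))
```

Returning argument lists rather than shell strings keeps the commands easy to pass to `subprocess.run` without quoting problems, should you decide to execute them.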
- 35. BUILD A BACKUP NODE
  ➤ Use multi-DC replication:
    CREATE KEYSPACE "MyKeyspace" WITH replication = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2' : 1 };
  (diagram labels: RF=3, client, snapshots)
- 36. QUESTIONS?
- 37. Blog: www.gokhanatil.com Twitter: @gokhanatil