9/23/2020

Reading time: 7 min

Introduction to Apache Cassandra

by Robert Stupp

1. Introduction to Apache Cassandra
2. Me: Robert Stupp. Freelancer, coder, architect. @snazy, snazy@snazy.de. Contributor to Apache Cassandra, 3.0 UDFs (CASSANDRA-7395 + related). Databases, network, backend.
3. Agenda: Apache Cassandra history; design principles; outstanding differences; CQL intro; access C*; clusters; Cassandra future.
4. Apache Cassandra History
5. Apache Cassandra started at Facebook, inspired by (images not preserved). Note: Facebook initially had two data centers.
6. 2.1 released in Sep 2014
7. Apache Cassandra Design Principles
8. Hardware failures can and will occur! Cassandra handles failures: from a single node to a whole data center, from client to server.
9. The complicated part of learning Cassandra is understanding Cassandra’s simplicity.
10. Keep it simple: all nodes are equal; master-less architecture; no name nodes; no SPOF (single point of failure); no read before modify (prevents race conditions).
11. Keep it running: no need to take the cluster down, e.g. during maintenance or during a software update. Rolling restart is your friend.
12. Outstanding Differences
13. Cassandra: highly scalable, runs with a few nodes up to clusters of 1000+ nodes! Linear scalability (proven!). Multi-datacenter aware (world-wide!). No SPOF.
14. Cassandra @ Apple
15. Linear Scalability
16. Scaling Cassandra: More data? -> add more nodes. Faster access? -> add more nodes.
17. Read / write performance: reads are fast; writes are even faster.
18. Durability: writes are durable - period.
19. Availability @ Netflix: Chaos Monkey kills nodes randomly.
20. Availability @ Netflix: Chaos Gorilla kills regions randomly.
21. Availability @ Netflix: Chaos Kong kills whole data centers.
22. Availability @ Netflix: http://de.slideshare.net/planetcassandra/active-active-c-behind-the-scenes-at-netflix
23. 32-node cluster (Raspberry Pis) @ DataStax
24. Most outstanding: great documentation; many blog posts; many presentations; many videos; regular webinars; huge, active and healthy community.
25. Data Distribution
26. DHT: data is organized in a “Distributed Hash Table” (hash over row key).
27. DHT (diagram: nodes numbered 0 to 7)
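
The "hash over row key" idea can be sketched in a few lines. This is a minimal illustration, not Cassandra's actual partitioner: the eight nodes numbered 0 to 7 come from the diagram, while the MD5 hash, the modulo mapping, and the name `node_for_key` are assumptions made for the example.

```python
import hashlib

NUM_NODES = 8  # nodes 0..7, as in the slide's diagram


def node_for_key(row_key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a row key to a node by hashing it.

    Illustrative only: a real cluster assigns token ranges to nodes
    instead of taking the hash modulo the node count.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")  # the hash interpreted as an integer
    return token % num_nodes


if __name__ == "__main__":
    for key in ("Row A", "Row B", "Row C"):
        print(key, "-> node", node_for_key(key))
```

The point of the sketch is that the same key always lands on the same node, and any client or node can compute that placement without asking a central directory.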
28. Replication
29. Replication factor 2 (diagram: nodes 0 to 7 holding Row A and Row B)
30. Replication factor 3 (diagram: nodes 0 to 7 holding Row A and Row B)
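
A rough sketch of what the replication factor means for placement, assuming the simplest strategy: the node that owns the key holds the first copy and the following nodes on the ring hold the rest. The function name `replica_nodes` and the eight-node ring are assumptions for illustration; data-center-aware placement is more involved.

```python
from __future__ import annotations


def replica_nodes(primary: int, replication_factor: int, num_nodes: int = 8) -> list[int]:
    """Return the nodes holding copies of a row.

    Assumes ring-order placement: the primary node plus the next
    (replication_factor - 1) nodes, wrapping around at the end of the ring.
    """
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    return [(primary + i) % num_nodes for i in range(replication_factor)]


if __name__ == "__main__":
    print(replica_nodes(2, 2))  # [2, 3]
    print(replica_nodes(6, 3))  # [6, 7, 0], wrapping around the ring
```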
31. Consistency: consistency is defined per request; several consistency levels (CLs) for different needs.
32. Eventual consistency is not “hopefully consistent”: EC means there’s a time gap until updates are consistently readable.
33. Consistency levels: ANY (only for writes); ONE, LOCAL_ONE; TWO, THREE (not recommended); ALL (not recommended); QUORUM, LOCAL_QUORUM, EACH_QUORUM; SERIAL, LOCAL_SERIAL.
34. Consistency: data is always replicated; the CL defines how many replicas must fulfill the request.
35. Write (diagram: a write sent to nodes 0 to 7)
36. Write (diagram: a write sent to nodes 0 to 7)
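
To make "the CL defines how many replicas must fulfill the request" concrete, here is a small sketch covering a subset of the levels listed above. The helper names are hypothetical; the quorum formula (more than half of the replicas) and the rule that a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed the replication factor are standard, but this is a calculator, not driver code.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a single-DC request."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        needed = fixed[consistency_level]
    elif consistency_level == "QUORUM":
        needed = replication_factor // 2 + 1  # strict majority
    elif consistency_level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {consistency_level}")
    if needed > replication_factor:
        raise ValueError("consistency level needs more replicas than exist")
    return needed


def read_overlaps_write(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """True if every read is guaranteed to touch a replica that saw the write."""
    written = required_replicas(write_cl, replication_factor)
    read = required_replicas(read_cl, replication_factor)
    return written + read > replication_factor


if __name__ == "__main__":
    print(required_replicas("QUORUM", 3))             # 2
    print(read_overlaps_write("QUORUM", "QUORUM", 3))  # True
    print(read_overlaps_write("ONE", "ONE", 3))        # False
```

With replication factor 3, writing and reading at QUORUM closes the time gap that eventual consistency otherwise allows, while ONE/ONE trades that guarantee for lower latency.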
37. Multi DC setup (diagram: DC 1 and DC 2)
38. Multi DC replication (diagram: a write replicated across DC 1 and DC 2)
39. Multi DC replication (diagram: a write replicated across DC 1 and DC 2)
40. Multi DC replication (diagram: a write replicated across DC 1 and DC 2)
41. Replication & consistency: define the number of replicas using the replication factor; define the required consistency per request.
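
With several data centers, each DC can have its own replication factor, and the quorum-style consistency levels differ in where the acknowledgements must come from. The sketch below is a hedged illustration: the function `acks_needed`, the DC names, and the `"*"` key (meaning "from any data center") are assumptions made for the example.

```python
from __future__ import annotations


def quorum(replication_factor: int) -> int:
    """Strict majority of the replicas."""
    return replication_factor // 2 + 1


def acks_needed(consistency_level: str, rf_per_dc: dict[str, int], local_dc: str) -> dict[str, int]:
    """Acknowledgements required per data center for a quorum-style level."""
    if consistency_level == "LOCAL_QUORUM":
        # Only the coordinator's own data center has to reach a majority.
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if consistency_level == "EACH_QUORUM":
        # Every data center has to reach its own majority.
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if consistency_level == "QUORUM":
        # A majority of all replicas, regardless of where they live.
        return {"*": quorum(sum(rf_per_dc.values()))}
    raise ValueError(f"level not covered by this sketch: {consistency_level}")


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}
    print(acks_needed("LOCAL_QUORUM", rf, "DC1"))  # {'DC1': 2}
    print(acks_needed("EACH_QUORUM", rf, "DC1"))   # {'DC1': 2, 'DC2': 2}
    print(acks_needed("QUORUM", rf, "DC1"))        # {'*': 4}
```

LOCAL_QUORUM keeps request latency inside one data center, which is why it is a common choice when replicas are spread across regions.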
42. CQL Introduction: CQL = Cassandra Query Language
43. “CQL is SQL minus joins, minus subqueries, plus collections” (plus user types, plus tuple types)
44. Why CQL? Introduces a schema to Cassandra; familiar syntax; easy to understand; DML operations are atomic.
45. Data model (hierarchical view): keyspace (schema); table (column family); row; partition key (part of primary key); static columns; clustering key (part of primary key); columns.
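
One way to picture the hierarchy is as nested maps: a table maps each partition key to a partition, and a partition holds its static columns once plus its rows, ordered by clustering key. The classes below are a teaching sketch only; `Partition`, `Table`, and `upsert` are hypothetical names, and nothing here reflects Cassandra's storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    static_columns: dict = field(default_factory=dict)  # stored once per partition
    rows: dict = field(default_factory=dict)            # clustering key -> columns

    def rows_in_clustering_order(self) -> list:
        """Rows sorted by clustering key, as they would be read back."""
        return [(ck, self.rows[ck]) for ck in sorted(self.rows)]


@dataclass
class Table:
    partitions: dict = field(default_factory=dict)  # partition key -> Partition

    def upsert(self, partition_key, clustering_key, columns: dict) -> Partition:
        """Write columns without reading first; an existing row is merged into."""
        partition = self.partitions.setdefault(partition_key, Partition())
        partition.rows.setdefault(clustering_key, {}).update(columns)
        return partition


if __name__ == "__main__":
    table = Table()
    table.upsert("alice", 2, {"event": "logout"})
    table.upsert("alice", 1, {"event": "login"})
    table.partitions["alice"].static_columns["country"] = "DE"
    print(table.partitions["alice"].rows_in_clustering_order())
```

A keyspace would simply be one more level on top, mapping table names to `Table` objects.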
46. CQL / DDL: similar to SQL. CREATE TABLE …, ALTER TABLE …, DROP TABLE …
47. CQL / DML: similar to SQL. INSERT …, UPDATE …, DELETE …, SELECT …
48. CQL / BATCH: group related modifications (INSERT, UPDATE, DELETE); atomic operation.
49. CQL types: boolean, int (32bit), bigint (64bit), float, double, decimal ("BigDecimal"), varint ("BigInteger"), ascii, text (= varchar), blob, inet, timestamp, uuid, timeuuid
50. CQL collection types: list<foo>, set<foo>, map<foo, bar>. Since C* 2.1, collections can contain any type, even other collections.
51. CQL composite types: user types (C* 2.1) are composite types with named fields; tuple types (C* 2.1) are unstructured lists of values.
52. CQL / user types: CREATE TYPE address ( street text, zip int, city text); CREATE TABLE users ( username text, addresses map<text, address>, ...
53. Cassandra data modeling: access by key; no access by arbitrary WHERE clause; duplicate data (it’s ok!); aggregate data; build application-maintained indexes.
54. RDBMS modeling
55. C* modeling
56. Data modeling with RDBMS: driven by "How can I store something right?" and "What answers do I have?"
57. Data modeling with NoSQL: driven by "How can I access something right?" and "What questions do I have?"
58. Data modeling basics: work top-down. Think about: What does the application do? What are the access patterns? Now design the data model.
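
The "start from the questions, duplicate data" approach can be sketched with plain dictionaries standing in for tables. The access patterns ("find a user by name", "list users in a city"), the class `UserStore`, and all field names are invented for illustration; the point is only that each query gets its own structure keyed by what the query knows, and that the application writes to all of them.

```python
from collections import defaultdict


class UserStore:
    """One 'table' per access pattern, kept in sync by the application."""

    def __init__(self) -> None:
        self.users_by_username = {}              # question: who is <username>?
        self.users_by_city = defaultdict(dict)   # question: who lives in <city>?

    def add_user(self, username: str, city: str, email: str) -> None:
        record = {"username": username, "city": city, "email": email}
        # The same data is written twice, once per query it has to answer.
        self.users_by_username[username] = record
        self.users_by_city[city][username] = dict(record)

    def find_user(self, username: str) -> dict:
        return self.users_by_username[username]

    def users_in_city(self, city: str) -> list:
        return sorted(self.users_by_city[city])


if __name__ == "__main__":
    store = UserStore()
    store.add_user("alice", "Cologne", "alice@example.org")
    store.add_user("bob", "Cologne", "bob@example.org")
    print(store.find_user("alice")["email"])
    print(store.users_in_city("Cologne"))
```

Both lookups are key accesses; neither needs a scan or an arbitrary WHERE clause, at the cost of storing each record twice.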
59. Data modeling: http://de.slideshare.net/planetcassandra/cassandra-day-sv-2014-fundamentals-of-apache-cassandra-data-modeling and http://de.slideshare.net/planetcassandra/data-modeling-with-travis-price
60. Accessing Cassandra
61. Command line: cqlsh (CQL shell); nodetool (node/cluster administration).
62. GUI: DevCenter (visual query tool)
63. Stress test? Cassandra 2.1 comes with an improved stress tool: simulates read+write workloads; uses configurable data; works against older C* versions, too.
64. DataStax APLv2 open source drivers: for Java, for Python, for C#, for Scala / Spark. https://github.com/datastax/ or http://www.datastax.com/download
65. Native protocol: C*’s own network protocol for clients; request multiplexing; schema change notifications; cluster change notifications.
66. Third-party drivers: for a huge number of languages.
67. Mappers: high-level mappers exist at least for Java. Special case: Scala, due to its strong and complex type model (DataStax OSS Spark driver).
68. Spark + Hadoop: yes, works really well. Note: Spark is about 100x faster.
69. Clusters
70. Cluster sizes: C* works with a few nodes; C* works with several hundred / thousand nodes.
71. Cluster setup: configure for multiple data centers; plan for a multi-DC setup :)
72. Cluster experience. Remember: a single Cassandra cluster works over multiple data centers all over the world. “Disaster proven”: hurricanes, Amazon DC outages.
73. Apache Cassandra Future
74. Cassandra 3.0 (in development): user-defined functions; aggregate functions; functional indexes; workload recording + playback; better SSTables; fully off-heap row cache; better serial consistency; indexes with high cardinality. Subject to change!
75. Get active!
76. Cassandra community: http://cassandra.apache.org/ ; http://planetcassandra.org/ (blog); http://www.slideshare.net/planetcassandra/presentations ; http://de.slideshare.net/DataStax/presentations
77. Cassandra community: https://www.youtube.com/user/PlanetCassandra ; https://www.youtube.com/user/DataStax ; http://www.datastax.com/dev/blog/ ; http://www.datastax.com/docs/ ; users mailing list: user@cassandra.apache.org
78. Free C* training! http://planetcassandra.org/cassandra-training/
79. Get involved! Ask questions, submit RFEs or experiences to the user mailing list, user@cassandra.apache.org. Answers arrive quickly!
80. Live demo: user-defined functions
81. C* 3.0 UDFs: users create functions using CREATE FUNCTION … LANGUAGE … AS …; languages: Java, JavaScript, Scala, Groovy, JRuby, Jython; functions work on all nodes.
82. C* 3.0 UDFs example: CREATE FUNCTION sin(input double) RETURNS double LANGUAGE javascript AS 'Math.sin(input)'; This is JavaScript!
83. UDFs for what? Own aggregation code, e.g. SELECT sum(value) FROM table WHERE …; functional indexes, e.g. CREATE INDEX idx ON table ( myFunction(colname) ); Targeted for C* 3.0.
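
"Own aggregation code" usually boils down to a fold: a state function is applied to each value in turn, starting from an initial state, and an optional final function turns the last state into the result. The sketch below shows that general pattern in Python; it is not the C* 3.0 syntax (which was still subject to change), and `aggregate`, `state_func`, and `final_func` are names chosen for the example.

```python
from functools import reduce


def aggregate(values, state_func, initial_state, final_func=None):
    """Fold values into a single result using a user-supplied state function."""
    state = reduce(state_func, values, initial_state)
    return final_func(state) if final_func is not None else state


def add(state, value):
    """State function for a sum: the state is the running total."""
    return state + value


def count_and_total(state, value):
    """State function for an average: the state is (count, total)."""
    count, total = state
    return (count + 1, total + value)


def mean(state):
    """Final function for an average; returns None for an empty input."""
    count, total = state
    return total / count if count else None


if __name__ == "__main__":
    values = [2.0, 4.0, 6.0]
    print(aggregate(values, add, 0.0))                          # 12.0
    print(aggregate(values, count_and_total, (0, 0.0), mean))   # 4.0
```

Keeping the per-value step separate from the final step is what lets each node process its own rows and hand over only a small state.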
84. Thanks for your attention. Download Apache Cassandra at http://cassandra.apache.org/ Robert Stupp, @snazy, snazy@snazy.de, de.slideshare.net/RobertStupp
85. Q & A
87. BACKUP SLIDES: User-Defined-Functions Demo