Introduction to Apache Cassandra
- 1. Introduction to Apache Cassandra
- 2. Me: Robert Stupp. Freelancer, coder, architect. @snazy, snazy@snazy.de. Contributor to Apache Cassandra: 3.0 UDFs (CASSANDRA-7395 + related). Databases, network, backend.
- 3. Agenda: Apache Cassandra history; design principles; outstanding differences; CQL intro; access C*; clusters; Cassandra future.
- 4. Apache Cassandra History
- 5. Apache Cassandra started at Facebook, inspired by (slide images not captured in this transcript). Note: Facebook initially had two data centers.
- 6. Cassandra 2.1 was released in September 2014.
- 7. Apache Cassandra Design Principles
- 8. Hardware failures can and will occur! Cassandra handles failures: from a single node to a whole data center, from client to server.
- 9. The complicated part of learning Cassandra is understanding Cassandra’s simplicity.
- 10. Keep it simple: all nodes are equal; master-less architecture; no name nodes; no SPOF (single point of failure); no read before modify (prevents race conditions).
- 11. Keep it running: no need to take the cluster down, e.g. during maintenance or during a software update. Rolling restart is your friend.
- 12. Outstanding Differences
- 13. Cassandra: highly scalable, runs on clusters from a few nodes up to 1000+ nodes! Linear scalability (proven!). Multi-datacenter aware (world-wide!). No SPOF.
- 14. Cassandra @ Apple
- 15. Linear Scalability
- 16. Scaling Cassandra: More data? -> add more nodes. Faster access? -> add more nodes.
- 17. Read / write performance: reads are fast; writes are even faster.
- 18. Durability: writes are durable - period.
- 19. Availability @ Netflix: Chaos Monkey kills nodes randomly.
- 20. Availability @ Netflix: Chaos Gorilla kills regions randomly.
- 21. Availability @ Netflix: Chaos Kong kills whole data centers.
- 22. Availability @ Netflix: http://de.slideshare.net/planetcassandra/active-active-c-behind-the-scenes-at-netflix
- 23. 32-node cluster (Raspberry Pis) @ DataStax
- 24. Most outstanding: great documentation; many blog posts; many presentations; many videos; regular webinars; a huge, active and healthy community.
- 25. Data Distribution
- 26. DHT: data is organized in a “Distributed Hash Table” (hash over row key).
- 27. DHT (ring diagram: nodes 0 to 7)
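
The "hash over row key" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra's actual partitioner: the 32-bit token space, the single evenly spaced token per node, and the use of MD5 are all hypothetical choices made to match the eight-node ring shown on the slide.

```python
import hashlib
from bisect import bisect_left

NODES = 8            # the ring on the slide shows nodes 0..7
RING_SIZE = 2 ** 32  # hypothetical token space, picked for readability

# Assumption: one evenly spaced token per node (no virtual nodes).
TOKENS = [i * RING_SIZE // NODES for i in range(NODES)]


def token_for(row_key: str) -> int:
    """Hash a row key onto the ring; MD5 is only a stand-in partitioner."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for_token(token: int) -> int:
    """First node whose token is >= the given token, wrapping around to node 0."""
    return bisect_left(TOKENS, token) % NODES


def owner_of(row_key: str) -> int:
    """Node responsible for the row key."""
    return node_for_token(token_for(row_key))
```

Calling `owner_of("Row A")` always returns the same node number for the same key, which is what lets any node route a request without a central directory.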
- 28. Replication
- 29. Replication factor 2 (ring diagram: nodes 0 to 7; rows: Row A, Row B)
- 30. Replication factor 3 (ring diagram: nodes 0 to 7; rows: Row A, Row B)
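
A minimal sketch of how a replication factor turns one owning node into a set of replicas. It assumes the simplest possible placement, where the extra copies go to the next nodes clockwise on the ring; real placement depends on the replication strategy configured for the keyspace. The function name `replicas_for` is hypothetical.

```python
NODES = 8  # the ring on the slides shows nodes 0..7


def replicas_for(primary: int, replication_factor: int, nodes: int = NODES) -> list:
    """Return the primary node plus the next RF-1 nodes clockwise on the ring.

    Assumption: simple clockwise placement within a single data center.
    """
    if not 1 <= replication_factor <= nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    return [(primary + i) % nodes for i in range(replication_factor)]
```

With this sketch, a row owned by node 7 and a replication factor of 3 lands on nodes 7, 0 and 1, because the ring wraps around.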
- 31. Consistency: consistency is defined per request. Several consistency levels (CLs) exist for different needs.
- 32. Eventual consistency is not “hopefully consistent”. EC means there’s a time gap until updates are consistently readable.
- 33. Consistency levels: ANY (only for writes); ONE, LOCAL_ONE, TWO, THREE (not recommended); ALL (not recommended); QUORUM, LOCAL_QUORUM, EACH_QUORUM; SERIAL, LOCAL_SERIAL.
- 34. Consistency: data is always replicated. The CL defines how many replicas must fulfill the request.
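
The relationship between a consistency level and the number of replicas that must respond can be written down as simple arithmetic. The sketch below covers only the single-DC levels ONE, TWO, THREE, QUORUM and ALL; the function names are hypothetical. The last function encodes the usual rule of thumb that a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def required_replicas(level: str, replication_factor: int) -> int:
    """How many replicas must fulfill a request at the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = quorum(replication_factor)
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > replication_factor:
        raise ValueError("level needs more replicas than the replication factor")
    return needed


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return w + r > replication_factor
```

For a replication factor of 3, writing and reading at QUORUM needs 2 + 2 replicas, which exceeds 3, so a read always touches at least one replica that saw the write. ONE plus ONE does not give that guarantee.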
- 35. Write (ring diagram: nodes 0 to 7)
- 36. Write (ring diagram: nodes 0 to 7)
- 37. Multi DC setup (diagram: DC 1, DC 2)
- 38. Multi DC replication (diagram: Write; DC 1; DC 2)
- 39. Multi DC replication (diagram: Write; DC 1; DC 2)
- 40. Multi DC replication (diagram: Write; DC 1; DC 2)
- 41. Replication & Consistency: define the number of replicas using the replication factor; define the required consistency per request.
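
For a multi-DC cluster, the same arithmetic applies per data center. The sketch below assumes a hypothetical keyspace with a replication factor configured per DC (the names "DC 1" and "DC 2" come from the slides; the factors of 3 in the usage note are made up) and shows how many replica acknowledgements LOCAL_QUORUM, EACH_QUORUM and QUORUM would need.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def local_quorum(rf_per_dc: dict, local_dc: str) -> int:
    """LOCAL_QUORUM: a majority of the replicas in the coordinator's own DC."""
    return quorum(rf_per_dc[local_dc])


def each_quorum(rf_per_dc: dict) -> dict:
    """EACH_QUORUM: a majority of the replicas in every DC."""
    return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}


def global_quorum(rf_per_dc: dict) -> int:
    """QUORUM: a majority of all replicas, counted across all DCs."""
    return quorum(sum(rf_per_dc.values()))
```

With `{"DC 1": 3, "DC 2": 3}`, LOCAL_QUORUM needs 2 replicas in the local DC, EACH_QUORUM needs 2 in each DC, and QUORUM needs 4 of the 6 replicas overall, which is why LOCAL_QUORUM avoids waiting on the remote data center.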
- 42. CQL Introduction: CQL = Cassandra Query Language
- 43. “CQL is SQL minus joins, minus subqueries, plus collections” (plus user types, plus tuple types)
- 44. Why CQL? Introduces a schema to Cassandra. Familiar syntax. Easy to understand. DML operations are atomic.
- 45. Data model (hierarchical view): Keyspace (schema); Table (column family); Row; partition key (part of primary key); static columns; clustering key (part of primary key); columns.
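
One way to picture the hierarchy above is as nested maps: a table maps partition keys to partitions, and a partition holds its static columns plus rows ordered by clustering key. The classes below are a hypothetical in-memory sketch for illustration only; they say nothing about how Cassandra stores data on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """All rows that share one partition key."""
    static_columns: dict = field(default_factory=dict)  # shared by every row
    rows: dict = field(default_factory=dict)            # clustering key -> columns

    def ordered_rows(self) -> list:
        """Rows sorted by clustering key, as (clustering_key, columns) pairs."""
        return [(ck, self.rows[ck]) for ck in sorted(self.rows)]


@dataclass
class Table:
    """A table (column family): partition key -> Partition."""
    partitions: dict = field(default_factory=dict)

    def upsert(self, partition_key, clustering_key, **columns) -> Partition:
        """Insert or update a row; existing columns are overwritten, others kept."""
        partition = self.partitions.setdefault(partition_key, Partition())
        partition.rows.setdefault(clustering_key, {}).update(columns)
        return partition
```

A keyspace would then simply be a dictionary of table names to `Table` objects. The primary key of a row is the pair (partition key, clustering key), which is why both are marked "part of primary key" on the slide.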
- 46. CQL / DDL: similar to SQL. CREATE TABLE …; ALTER TABLE …; DROP TABLE …
- 47. CQL / DML: similar to SQL. INSERT …; UPDATE …; DELETE …; SELECT …
- 48. CQL / BATCH: group related modifications (INSERT, UPDATE, DELETE). Atomic operation.
- 49. CQL types: boolean, int (32-bit), bigint (64-bit), float, double, decimal ("BigDecimal"), varint ("BigInteger"), ascii, text (= varchar), blob, inet, timestamp, uuid, timeuuid.
- 50. CQL collection types: list<foo>, set<foo>, map<foo, bar>. Since C* 2.1, collections can contain any type - even other collections.
- 51. CQL composite types: user types (C* 2.1) are composite types with named fields; tuple types (C* 2.1) are unstructured lists of values.
- 52. CQL / user types: CREATE TYPE address (street text, zip int, city text); CREATE TABLE users (username text, addresses map<text, address>, ...
- 53. Cassandra data modeling: access by key, no access by arbitrary WHERE clause. Duplicate data (it’s ok!). Aggregate data. Build application-maintained indexes.
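
"Duplicate data" and "application-maintained indexes" can be sketched with two plain dictionaries standing in for two tables, each keyed by the question it answers. Everything here is hypothetical: the `UserStore` class, the `email` column, and the choice of city as the second access path (username and city do appear in the user-type example of this deck).

```python
class UserStore:
    """Two 'tables', one per access pattern, both written on every insert."""

    def __init__(self):
        self.users_by_username = {}  # answers: "give me user X"
        self.usernames_by_city = {}  # answers: "who lives in city Y?"

    def add_user(self, username: str, city: str, email: str) -> None:
        """Write the user row and the index entry together."""
        self.users_by_username[username] = {"city": city, "email": email}
        self.usernames_by_city.setdefault(city, set()).add(username)

    def user(self, username: str):
        """Lookup by key; returns None if the user is unknown."""
        return self.users_by_username.get(username)

    def users_in_city(self, city: str) -> list:
        """Lookup through the application-maintained index."""
        return sorted(self.usernames_by_city.get(city, ()))
```

Both reads are key lookups; there is no scan with an arbitrary filter. The price is that the application has to keep the second structure in step on every change, including updates and deletes, which this sketch leaves out.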
- 54. RDBMS modeling
- 55. C* modeling
- 56. Data modeling with RDBMS is driven by "How can I store something right?" and "What answers do I have?"
- 57. Data modeling with NoSQL is driven by "How can I access something right?" and "What questions do I have?"
- 58. Data modeling basics: work top-down. Think about: What does the application do? What are the access patterns? Now design the data model.
- 59. Data Modeling: http://de.slideshare.net/planetcassandra/cassandra-day-sv-2014-fundamentals-of-apache-cassandra-data-modeling http://de.slideshare.net/planetcassandra/data-modeling-with-travis-price
- 60. Accessing Cassandra
- 61. Command line: cqlsh (CQL shell); nodetool (node/cluster administration).
- 62. GUI: DevCenter (visual query tool)
- 63. Stress test? Cassandra 2.1 comes with an improved stress tool: simulates read+write workloads; uses configurable data; works against older C* versions, too.
- 64. DataStax APLv2 open source drivers: for Java, for Python, for C#, for Scala / Spark. https://github.com/datastax/ or http://www.datastax.com/download
- 65. Native protocol: C*’s own network protocol for clients. Request multiplexing. Schema change notifications. Cluster change notifications.
- 66. Third-party drivers for a huge number of languages
- 67. Mappers: high-level mappers exist at least for Java. Special case: Scala, due to its strong+complex type model (DataStax OSS Spark driver).
- 68. Spark + Hadoop: yes - works really well. Note: Spark is about 100x faster.
- 69. Clusters
- 70. Cluster sizes: C* works with a few nodes; C* works with several hundred / thousand nodes.
- 71. Cluster setup: configure for multiple data centers. Plan for a multi-DC setup :)
- 72. Cluster experience. Remember: a single Cassandra cluster works across multiple data centers all over the world. “Disaster-proven”: hurricanes, Amazon DC outages.
- 73. Apache Cassandra Future
- 74. Cassandra 3.0 (in development): user-defined functions; aggregate functions; functional indexes; workload recording + playback; better SSTables; fully off-heap row cache; better serial consistency; indexes w/ high cardinality. Subject to change!!!
- 75. Get active!
- 76. Cassandra Community: http://cassandra.apache.org/ http://planetcassandra.org/ - Blog http://www.slideshare.net/planetcassandra/presentations http://de.slideshare.net/DataStax/presentations
- 77. Cassandra Community: https://www.youtube.com/user/PlanetCassandra https://www.youtube.com/user/DataStax http://www.datastax.com/dev/blog/ http://www.datastax.com/docs/ User mailing list: user@cassandra.apache.org
- 78. Free C* Training! http://planetcassandra.org/cassandra-training/
- 79. Get involved! Ask questions, and submit RFEs or experiences to the user mailing list user@cassandra.apache.org. Answers arrive quickly!
- 80. Live Demo: User Defined Functions
- 81. C* 3.0 UDFs: users create functions using CREATE FUNCTION … LANGUAGE … AS … Languages: Java, JavaScript, Scala, Groovy, JRuby, Jython. Functions work on all nodes.
- 82. C* 3.0 UDFs example: CREATE FUNCTION sin(input double) RETURNS double LANGUAGE javascript AS 'Math.sin(input)'; This is JavaScript!
- 83. UDFs for what? Own aggregation code - e.g. SELECT sum(value) FROM table WHERE …; Functional indexes - e.g. CREATE INDEX idx ON table ( myFunction(colname) ); Targeted for C* 3.0.
- 84. Thanks for your attention. Download Apache Cassandra at http://cassandra.apache.org/ Robert Stupp, @snazy, snazy@snazy.de, de.slideshare.net/RobertStupp
- 85. Q & A
- 87. Backup slides: User-Defined-Functions demo