10/15/2020

Reading time: 3 min

Cassandra advanced data modeling

by Romain Hardouin


1. Cassandra Advanced data modeling Lyon Cassandra Users Romain Hardouin 2016-05-31
2. $ who Romain $ pgrep -fl work Cassandra architect $ whatis teads No.1 Video Advertising Marketplace
3. Data modeling: I. Introduction, II. Key principles, III. Chebotko methodology, IV. Time handling
4. I. Introduction
5. Theory
6. Theory Chebotko diagrams E&R
7. II. Key principles
8. Key principles: Know your data, Know your queries, Denormalize (Nest data, Duplicate data)
9. Know your domain Conceptual Data Model, E&R ● Entities ● Relationships ● Attributes / Keys ● Cardinalities ● Constraints Know your data
10. Entities & relationships Know your data
11. Query-driven model Application Workflow New needs? ● New queries => new tables ● Alter table possible? Know your data Know your queries
12. Goal: one partition per query. Anti-patterns: ● Table scan ● Client joins (a.k.a. multi-table) ● Secondary index ● ALLOW FILTERING Know your data Know your queries
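To make the goal on this slide concrete, here is a minimal CQL sketch that uses the actors_by_video table from slide 14. The UUID literal and the actor name are made-up values for illustration only.

```cql
-- Good: the partition key (video_id) is fully specified,
-- so the read is served from a single partition.
SELECT actor_name, character_name
FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;

-- Anti-pattern: no restriction on the partition key, so every
-- partition may have to be scanned. Cassandra rejects the query
-- unless ALLOW FILTERING is added.
SELECT video_id, character_name
FROM actors_by_video
WHERE actor_name = 'Jane Doe'
ALLOW FILTERING;
```

If looking up videos by actor is a real need, the query-driven answer is usually a second table keyed by actor rather than filtering on this one.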
13. Nest Data Clustering columns Collection columns UDT columns Know your data Denormalize
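As a rough illustration of the three nesting tools named on this slide, the sketch below puts a clustering column, a collection column and a UDT column in one table. The type encoding_info, the table videos_by_user and all of their columns are hypothetical names, not taken from the talk.

```cql
-- Hypothetical UDT: groups related fields into a single value.
CREATE TYPE IF NOT EXISTS encoding_info (
  codec text,
  bitrate_kbps int
);

CREATE TABLE IF NOT EXISTS videos_by_user (
  user_id uuid,
  added_at timestamp,
  video_id uuid,
  title text,
  tags set<text>,                  -- collection column
  encoding frozen<encoding_info>,  -- UDT column
  -- Clustering columns (added_at, video_id) nest many rows
  -- inside one user_id partition, sorted on disk.
  PRIMARY KEY ((user_id), added_at, video_id)
) WITH CLUSTERING ORDER BY (added_at DESC, video_id ASC);
```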
14. Nest Data Know your data Denormalize CREATE TABLE actors_by_video ( video_id uuid, actor_name text, character_name text, PRIMARY KEY ((video_id), actor_name, character_name) );
15. Duplicate data Writes are cheap: « Joins on write » Duplication occurs at different levels: ● Table: Materialized views ● Partition ● Rows Know your data Denormalize
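A minimal sketch of table-level duplication with a materialized view, assuming Cassandra 3.0 or later and the actors_by_video table from slide 14. The view name videos_by_actor is a hypothetical choice.

```cql
-- The view duplicates every row of actors_by_video under a
-- different partition key, so "videos for an actor" becomes a
-- single-partition read. Cassandra maintains the copy on write.
CREATE MATERIALIZED VIEW IF NOT EXISTS videos_by_actor AS
  SELECT video_id, actor_name, character_name
  FROM actors_by_video
  WHERE actor_name IS NOT NULL
    AND video_id IS NOT NULL
    AND character_name IS NOT NULL
  PRIMARY KEY ((actor_name), video_id, character_name);

-- Read served from one partition of the view.
SELECT video_id, character_name
FROM videos_by_actor
WHERE actor_name = 'Jane Doe';
```

The same duplication can be done by hand by writing to two tables from the application, which is the "joins on write" idea in its plainest form.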
16. III. Chebotko Methodology
17. From « A Big Data Modeling Methodology for Apache Cassandra »: Application workflow, Query workflow, Query list
18. From « A Big Data Modeling Methodology for Apache Cassandra »: Chebotko Diagram
19. Chebotko Diagram: actors_by_video video_id uuid K actor_name text C↑ character_name text C↑ CREATE TABLE actors_by_video ( video_id uuid, actor_name text, character_name text, PRIMARY KEY ((video_id), actor_name, character_name) );
20. Chebotko mapping rules: MR 1 Entities & relationships; MR 2 Equality search attributes; MR 3 Inequality search attributes; MR 4 Ordering attributes; MR 5 Key attributes, uniqueness. Symbols shown: <>= ↑↓
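One possible reading of how the mapping rules shape a table, sketched for a hypothetical query: "find the readings of one sensor after a given time, newest first". The table readings_by_sensor and its columns are assumptions made for this example, and the rule annotations are a loose paraphrase of the paper, not a quotation.

```cql
CREATE TABLE IF NOT EXISTS readings_by_sensor (
  sensor_id uuid,          -- MR 2: equality search attribute -> partition key
  reading_time timestamp,  -- MR 3 and MR 4: inequality search and ordering attribute -> clustering column
  reading_id timeuuid,     -- MR 5: key attribute, keeps each row unique
  temperature double,      -- MR 1: remaining attribute of the entity
  PRIMARY KEY ((sensor_id), reading_time, reading_id)
) WITH CLUSTERING ORDER BY (reading_time DESC, reading_id ASC);

-- The query the table was designed for: equality on the partition
-- key, a range on the first clustering column, newest rows first.
SELECT reading_time, temperature
FROM readings_by_sensor
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND reading_time > '2016-05-01 00:00:00+0000';
```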
21. From « A Big Data Modeling Methodology for Apache Cassandra »: Chebotko mapping rules
22. Internet of Things Demo Kashlev Data Modeler
23. IV. Time handling - Tombstones - TTL - UPSERTs
24. IV. Time handling - Tombstones - TTL - UPSERTs
25. Eventual consistency: no instant deletes. Deletes are writes. SSTables are immutable files. Writes are spread across many files.
26. Goal: avoid reading too many* tombstones. * See tombstone_warn_threshold & tombstone_failure_threshold
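A small sketch of where tombstones come from, again using the actors_by_video table from slide 14 with made-up values. In the Cassandra 2.x and 3.x cassandra.yaml the two thresholds named on the slide default to 1000 (warn) and 100000 (fail) tombstones scanned per read, as far as I know; check the file shipped with your version.

```cql
-- Row tombstone: this DELETE is a write that marks one
-- actor/character row as deleted; nothing is removed in place.
DELETE FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000
  AND actor_name = 'Jane Doe'
  AND character_name = 'Alice';

-- Partition tombstone: one marker covers the whole partition.
DELETE FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;

-- A later read of the partition still has to scan those markers
-- until compaction purges them after gc_grace_seconds.
SELECT actor_name, character_name
FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;
```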
27. IV. Time handling - Tombstones - TTL - UPSERTs
28. TTLs: data must be designed to be TTL'ed (expired data becomes tombstones)
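For reference, a minimal sketch of the two usual ways to set a TTL in CQL. The table sessions_by_user, its columns and the durations are hypothetical and only serve the example.

```cql
-- Table-wide default: every write expires after 1 day (in seconds)
-- unless the statement sets its own TTL.
CREATE TABLE IF NOT EXISTS sessions_by_user (
  user_id uuid,
  session_id timeuuid,
  device text,
  PRIMARY KEY ((user_id), session_id)
) WITH default_time_to_live = 86400;

-- Per-write TTL: this row expires after 1 hour instead.
INSERT INTO sessions_by_user (user_id, session_id, device)
VALUES (123e4567-e89b-12d3-a456-426614174000, now(), 'mobile')
USING TTL 3600;

-- TTL() returns the remaining lifetime of a regular column, in seconds.
SELECT device, TTL(device)
FROM sessions_by_user
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```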
29. Why? What do we add?
30. TIME dimension
31. IV. Time handling - Tombstones - TTL - UPSERTs
32. UPSERTs: Same INSERT over and over again? UPSERTs hide this behavior. What if… one day you want to add time?
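The upsert behavior and the "add time" question can be sketched as follows. Both tables, status_by_user and status_history_by_user, are hypothetical names invented for this illustration.

```cql
-- One row per user, no time component in the key.
CREATE TABLE IF NOT EXISTS status_by_user (
  user_id uuid,
  status text,
  PRIMARY KEY ((user_id))
);

-- Both statements target the same primary key, so the second
-- silently overwrites the first: an INSERT is an upsert.
INSERT INTO status_by_user (user_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'online');
INSERT INTO status_by_user (user_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'away');

-- Adding time to the key turns each write into a new row,
-- which keeps the history instead of hiding it.
CREATE TABLE IF NOT EXISTS status_history_by_user (
  user_id uuid,
  changed_at timestamp,
  status text,
  PRIMARY KEY ((user_id), changed_at)
) WITH CLUSTERING ORDER BY (changed_at DESC);
```

Changing a primary key means creating a new table and migrating data, which is why it can pay to decide early whether a time column belongs in the key.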
33. Questions?
34. Resources: « A Big Data Modeling Methodology for Apache Cassandra », Artem Chebotko, Andrey Kashlev & Shiyong Lu, www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf; KDM (Kashlev Data Modeler), Andrey Kashlev, kdm.dataview.org
