Cassandra.Link

10/15/2020

Cassandra advanced data modeling

by Romain Hardouin

1. Cassandra Advanced data modeling. Lyon Cassandra Users, Romain Hardouin, 2016-05-31
2. $ who: Romain; $ pgrep -fl work: Cassandra architect; $ whatis teads: No.1 Video Advertising Marketplace
3. Data modeling: I. Introduction; II. Key principles; III. Chebotko methodology; IV. Time handling
4. I. Introduction
5. Theory
6. Theory: E&R, Chebotko diagrams
7. II. Key principles
8. Key principles: Know your data, Know your queries, Denormalize (nest data, duplicate data)
9. Know your data. Know your domain with a conceptual data model (E&R): entities, relationships, attributes / keys, cardinalities, constraints
10. Know your data: entities & relationships
11. Know your queries. Query-driven model, derived from the application workflow. New needs? New queries => new tables. Is ALTER TABLE possible?
12. Know your queries. Goal: one partition per query. Anti-patterns: table scans, client-side joins (a.k.a. multi-table queries), secondary indexes, ALLOW FILTERING
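To make slides 11 and 12 concrete, here is a minimal CQL sketch contrasting the ALLOW FILTERING anti-pattern with a query-driven table. The tables videos and videos_by_tag and the tag value are hypothetical, and the statements assume a keyspace has already been selected with USE.

```cql
-- Hypothetical tables; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS videos (
    video_id uuid PRIMARY KEY,
    title text,
    tag text
);

-- Anti-pattern: tag is not part of the primary key, so Cassandra has to scan
-- every partition; it refuses to run this query without ALLOW FILTERING.
SELECT video_id, title FROM videos WHERE tag = 'cassandra' ALLOW FILTERING;

-- Query-driven alternative: one table dedicated to this query,
-- so each read is served by a single partition.
CREATE TABLE IF NOT EXISTS videos_by_tag (
    tag text,
    video_id uuid,
    title text,
    PRIMARY KEY ((tag), video_id)
);

SELECT video_id, title FROM videos_by_tag WHERE tag = 'cassandra';
```

The price is that the application now writes each video to both tables, which is the duplication discussed under « Denormalize ».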
13. Denormalize: nest data, using clustering columns, collection columns, UDT columns
14. Denormalize: nest data, for example with clustering columns:

        CREATE TABLE actors_by_video (
            video_id uuid,
            actor_name text,
            character_name text,
            PRIMARY KEY ((video_id), actor_name, character_name)
        );
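Slide 13 lists three ways to nest data and slide 14 shows clustering columns. The following sketch, with a hypothetical UDT movie_character and a hypothetical table videos_with_cast, shows the other two: a set collection and a list of frozen UDT values. It assumes a keyspace selected with USE and uses placeholder values.

```cql
-- Hypothetical UDT and table; assumes a keyspace is already selected with USE.
CREATE TYPE IF NOT EXISTS movie_character (
    actor_name text,
    character_name text
);

CREATE TABLE IF NOT EXISTS videos_with_cast (
    video_id uuid PRIMARY KEY,
    title text,
    tags set<text>,                              -- collection column
    cast_members list<frozen<movie_character>>   -- collection of UDT values
);

INSERT INTO videos_with_cast (video_id, title, tags, cast_members)
VALUES (
    123e4567-e89b-12d3-a456-426614174000,
    'Some video',
    {'drama', 'thriller'},
    [{actor_name: 'Actor A', character_name: 'Character X'},
     {actor_name: 'Actor B', character_name: 'Character Y'}]
);

-- The whole nested structure comes back from a single partition.
SELECT title, tags, cast_members FROM videos_with_cast
 WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;
```

Collections are always read in full, so they suit small, bounded amounts of nested data; an unbounded set of children is usually better modeled with clustering columns, as in actors_by_video.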
15. Denormalize: duplicate data. Writes are cheap: « joins on write ». Duplication occurs at different levels: table (materialized views), partition, rows
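A sketch of duplication at the table level, assuming hypothetical users tables and a keyspace selected with USE: a materialized view maintained by Cassandra, next to an application-maintained « join on write » done with a logged batch. Recent Cassandra releases flag materialized views as experimental and may require enabling them in cassandra.yaml, which is one reason the manual variant is still common.

```cql
-- Hypothetical tables; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS users (
    user_id uuid PRIMARY KEY,
    email text,
    name text
);

-- Table-level duplication maintained by Cassandra: a materialized view.
CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email AS
    SELECT user_id, email, name FROM users
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY ((email), user_id);

-- Application-maintained alternative: write both tables in one logged batch.
CREATE TABLE IF NOT EXISTS users_by_email_manual (
    email text PRIMARY KEY,
    user_id uuid,
    name text
);

BEGIN BATCH
    INSERT INTO users (user_id, email, name)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'jane@example.com', 'Jane');
    INSERT INTO users_by_email_manual (email, user_id, name)
    VALUES ('jane@example.com', 123e4567-e89b-12d3-a456-426614174000, 'Jane');
APPLY BATCH;

-- Either copy answers "find a user by email" from a single partition.
SELECT user_id, name FROM users_by_email WHERE email = 'jane@example.com';
```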
16. III. Chebotko Methodology
17. Application workflow, query workflow and query list (figures from « A Big Data Modeling Methodology for Apache Cassandra »)
18. Chebotko diagram (figure from « A Big Data Modeling Methodology for Apache Cassandra »)
19. Chebotko diagram of actors_by_video: video_id uuid K, actor_name text C↑, character_name text C↑ (K marks a partition key column, C↑ a clustering column in ascending order). The corresponding table:

        CREATE TABLE actors_by_video (
            video_id uuid,
            actor_name text,
            character_name text,
            PRIMARY KEY ((video_id), actor_name, character_name)
        );
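The K and C↑ markers determine which queries actors_by_video can serve. A sketch, assuming a keyspace selected with USE and a hypothetical video UUID and actor names:

```cql
-- Assumes a keyspace is already selected with USE; values are placeholders.
CREATE TABLE IF NOT EXISTS actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

-- K: equality on the partition key, a single-partition read.
SELECT actor_name, character_name FROM actors_by_video
 WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;

-- C↑: equality on the first clustering column narrows the slice.
SELECT character_name FROM actors_by_video
 WHERE video_id = 123e4567-e89b-12d3-a456-426614174000
   AND actor_name = 'Actor A';

-- Range on the first clustering column; rows come back in ascending order.
SELECT actor_name, character_name FROM actors_by_video
 WHERE video_id = 123e4567-e89b-12d3-a456-426614174000
   AND actor_name >= 'A' AND actor_name < 'M';

-- Rejected unless ALLOW FILTERING is added: no partition key restriction.
-- SELECT * FROM actors_by_video WHERE actor_name = 'Actor A';
```

The last statement is left commented out because it would need ALLOW FILTERING, one of the anti-patterns from slide 12.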
20. Chebotko mapping rules: MR1 entities & relationships; MR2 equality search attributes (=); MR3 inequality search attributes (<, >); MR4 ordering attributes (↑, ↓); MR5 key attributes, uniqueness
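Roughly, the rules from the paper map query attributes onto the primary key: MR1 maps entities and relationships to tables, equality search attributes lead the primary key (MR2), an inequality search attribute becomes a clustering column (MR3), ordering attributes become clustering columns with ASC or DESC order (MR4), and key attributes are added so rows stay unique (MR5). The sketch applies them to a hypothetical query, "videos uploaded by a user within a date range, newest first"; the table, columns and keyspace selection with USE are assumptions.

```cql
-- Hypothetical query Q: videos uploaded by a user in a date range, newest first.
-- MR1: the table stores the "user uploads video" relationship.
CREATE TABLE IF NOT EXISTS videos_by_user (
    user_id uuid,            -- MR2: equality search attribute -> partition key
    uploaded_at timestamp,   -- MR3 + MR4: inequality and ordering attribute -> clustering column, DESC
    video_id uuid,           -- MR5: key attribute appended so rows stay unique
    title text,
    PRIMARY KEY ((user_id), uploaded_at, video_id)
) WITH CLUSTERING ORDER BY (uploaded_at DESC, video_id ASC);

SELECT video_id, title, uploaded_at FROM videos_by_user
 WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
   AND uploaded_at >= '2016-01-01' AND uploaded_at < '2016-06-01';
```

Without video_id in the key, two videos uploaded by the same user at the same timestamp would overwrite each other, which is what MR5 guards against.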
21. Chebotko mapping rules (figure from « A Big Data Modeling Methodology for Apache Cassandra »)
22. Demo: Internet of Things, with the Kashlev Data Modeler (KDM)
23. IV. Time handling: Tombstones, TTL, UPSERTs
24. IV. Time handling: Tombstones, TTL, UPSERTs
25. Eventual consistency: no instant deletes. Deletes are writes. SSTables are immutable files. Writes are spread across many files
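Slide 25's « deletes are writes » in CQL terms: each statement below writes a tombstone instead of removing data in place, and the shadowed data is only purged by compaction once gc_grace_seconds has elapsed. The table messages_by_inbox and the UUID values are hypothetical; a keyspace selected with USE is assumed, and the range delete requires Cassandra 3.0 or later.

```cql
-- Hypothetical table; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS messages_by_inbox (
    inbox_id uuid,
    sent_at timeuuid,
    body text,
    PRIMARY KEY ((inbox_id), sent_at)
);

-- Cell tombstone: one column value.
DELETE body FROM messages_by_inbox
 WHERE inbox_id = 123e4567-e89b-12d3-a456-426614174000
   AND sent_at = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- Writing an explicit null also produces a cell tombstone.
INSERT INTO messages_by_inbox (inbox_id, sent_at, body)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        50554d6e-29bb-11e5-b345-feff819cdc9f, null);

-- Row tombstone.
DELETE FROM messages_by_inbox
 WHERE inbox_id = 123e4567-e89b-12d3-a456-426614174000
   AND sent_at = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- Range tombstone over a clustering column (Cassandra 3.0+).
DELETE FROM messages_by_inbox
 WHERE inbox_id = 123e4567-e89b-12d3-a456-426614174000
   AND sent_at < maxTimeuuid('2016-05-01 00:00+0000');

-- Partition tombstone.
DELETE FROM messages_by_inbox
 WHERE inbox_id = 123e4567-e89b-12d3-a456-426614174000;
```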
26. Goal: avoid reading too many* tombstones ... ... (* see tombstone_warn_threshold and tombstone_failure_threshold in cassandra.yaml)
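A classic way to pile up tombstones on the read path is a queue-like table: past tombstone_warn_threshold a read logs a warning, past tombstone_failure_threshold it is aborted. The sketch below, with a hypothetical jobs_by_queue table and a placeholder UUID, shows the pattern and one common mitigation: remembering the last processed position and starting each read strictly after it, so the slice can skip the deleted head of the partition. It assumes a keyspace selected with USE; it is a mitigation rather than a guarantee, since tombstones inside the requested slice are still read.

```cql
-- Hypothetical table; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS jobs_by_queue (
    queue_id text,
    enqueued_at timeuuid,
    payload text,
    PRIMARY KEY ((queue_id), enqueued_at)
) WITH CLUSTERING ORDER BY (enqueued_at ASC);

-- Anti-pattern: consumers delete processed jobs one row at a time...
DELETE FROM jobs_by_queue
 WHERE queue_id = 'emails' AND enqueued_at = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- ...then poll from the head of the partition, so every poll reads past
-- the row tombstones accumulated at the start of the partition.
SELECT enqueued_at, payload FROM jobs_by_queue
 WHERE queue_id = 'emails' LIMIT 10;

-- Mitigation: start the slice strictly after the last processed position
-- (a placeholder value here, tracked by the consumer).
SELECT enqueued_at, payload FROM jobs_by_queue
 WHERE queue_id = 'emails' AND enqueued_at > 50554d6e-29bb-11e5-b345-feff819cdc9f
 LIMIT 10;
```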
27. IV. Time handling: Tombstones, TTL, UPSERTs
28. TTLs: data must be designed to be TTL'ed. Expired TTLs turn into tombstones
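A sketch of the two ways to set a TTL, assuming a hypothetical sessions table and a keyspace selected with USE: a table-level default_time_to_live and a per-write USING TTL. When a TTL expires, the cell becomes a tombstone, which is why slide 28 ties TTLs to tombstones.

```cql
-- Hypothetical table; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS sessions (
    session_id uuid PRIMARY KEY,
    user_id uuid,
    data text
) WITH default_time_to_live = 86400;   -- table default: rows expire after one day

-- A per-write TTL overrides the table default: this row expires after one hour.
INSERT INTO sessions (session_id, user_id, data)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        9b2f1d4c-6a7e-4f3b-8c1d-2e5a7b9c0d11, 'payload')
USING TTL 3600;

-- Remaining lifetime, in seconds, of a non-key column.
SELECT TTL(data) FROM sessions
 WHERE session_id = 123e4567-e89b-12d3-a456-426614174000;
```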
29. Why? What do we add?
30. The TIME dimension
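One common way to add the TIME dimension is a time bucket in the partition key, so partitions stay bounded and old data expires partition by partition. The sketch assumes a hypothetical IoT readings table (in the spirit of the demo on slide 22), one bucket per sensor per day, a 30-day TTL, and a keyspace selected with USE; the bucket size is an assumption to tune against the write rate.

```cql
-- Hypothetical table; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS readings_by_sensor_day (
    sensor_id uuid,
    day_bucket date,              -- time bucket: one partition per sensor per day
    reading_time timestamp,
    reading_value double,
    PRIMARY KEY ((sensor_id, day_bucket), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND default_time_to_live = 2592000;   -- 30 days

INSERT INTO readings_by_sensor_day (sensor_id, day_bucket, reading_time, reading_value)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2016-05-31',
        '2016-05-31 18:30:00+0000', 21.5);

-- Latest readings for one sensor on one day: a single-partition read.
SELECT reading_time, reading_value FROM readings_by_sensor_day
 WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
   AND day_bucket = '2016-05-31'
 LIMIT 100;
```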
31. IV. Time handling: Tombstones, TTL, UPSERTs
32. UPSERTs: the same INSERT over and over again? UPSERTs hide this behavior. What if… one day you want to add time?
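A sketch of what slide 32 warns about, assuming hypothetical user_status tables and a keyspace selected with USE: a second INSERT with the same primary key silently overwrites the first. A lightweight transaction (IF NOT EXISTS) exposes the conflict at the cost of extra Paxos round trips, while adding time to the primary key keeps each change as its own row.

```cql
-- Hypothetical tables; assumes a keyspace is already selected with USE.
CREATE TABLE IF NOT EXISTS user_status (
    user_id uuid PRIMARY KEY,
    status text
);

-- Both INSERTs succeed: the second silently overwrites the first (upsert).
INSERT INTO user_status (user_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'online');
INSERT INTO user_status (user_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'offline');

-- A lightweight transaction surfaces the conflict: [applied] is false
-- and the existing row is returned.
INSERT INTO user_status (user_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'away')
IF NOT EXISTS;

-- Adding time to the primary key keeps every change instead of overwriting it.
CREATE TABLE IF NOT EXISTS user_status_history (
    user_id uuid,
    changed_at timestamp,
    status text,
    PRIMARY KEY ((user_id), changed_at)
) WITH CLUSTERING ORDER BY (changed_at DESC);

INSERT INTO user_status_history (user_id, changed_at, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, toTimestamp(now()), 'online');
```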
33. Questions?
34. Resources: « A Big Data Modeling Methodology for Apache Cassandra », Artem Chebotko, Andrey Kashlev & Shiyong Lu, www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf; KDM (Kashlev Data Modeler), Andrey Kashlev, kdm.dataview.org
