A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The Last Pickle) | C* Summit 2016

  1. JON HADDAD THE LAST PICKLE LEARN DATA MODELING BY EXAMPLE THIS IS AWESOME!!!
  2. WHAT’S THE LAST PICKLE DO?
  3. WE HELP MAKE YOU A TEAM OF EXPERTS
  4. > 50 YEARS COMBINED EXPERIENCE
  5. WHO IS THIS GUY?
  6. 15 YEARS EXPERIENCE
  7. 4 YEARS WITH CASSANDRA
  8. LEARNING HOW TO CASSANDRA
  9. WHAT’S YOUR BACKGROUND?
  10. ORACLE! MYSQL! POSTGRES!
  11. CQL LOOKS LIKE SQL
  12. BAD ASSUMPTIONS
  13. 3RD NORMAL FORM?
  14. WHERE’S MY JOINS?
  15. SECONDARY INDEX?
  16. DO IT WRONG TRY TO DATA MODEL GET ANGRY WATCH VIDEOS & READ
  17. EVERYTHING I KNOW IS WRONG
  18. LEARN BY EXAMPLE
  19. CASSANDRA DATASET MANAGER
  20. CDM
  21. APT FOR CASSANDRA DATA
  22. INSTALL DATA TO YOUR CASSANDRA CLUSTER
  23. cdm install <dataset>
  24. jhaddad@rustyrazorblade ~$ cdm list
      Starting CDM
      Datasets:
      movielens
      killrvideo
      killrweather
      Finished.
  25. jhaddad@rustyrazorblade ~$ cdm install movielens
      Starting CDM
      Installing movielens
      Checking for repo at /Users/jhaddad/.cdm/movielens
      Pulling latest
      CDM is using dataset path: /Users/jhaddad/.cdm/movielens
      cqlsh -e "DROP KEYSPACE IF EXISTS movielens; CREATE KEYSPACE movielens WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
      Schema: /Users/jhaddad/.cdm/movielens/schema.cql
      Loading data
      cqlsh -k movielens -e "COPY movies FROM '/Users/jhaddad/.cdm/movielens/data/movies.csv'"
      cqlsh -k movielens -e "COPY users FROM '/Users/jhaddad/.cdm/movielens/data/users.csv'"
      cqlsh -k movielens -e "COPY ratings_by_user FROM '/Users/jhaddad/.cdm/movielens/data/ratings_by_user.csv'"
      cqlsh -k movielens -e "COPY original_movie_map FROM '/Users/jhaddad/.cdm/movielens/data/original_movie_map.csv'"
  26. jhaddad@rustyrazorblade ~/dev/cassandra$ cqlsh
      Connected to Test Cluster at 127.0.0.1:9042.
      [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4]
      Use HELP for help.
      cqlsh> use movielens ;
      cqlsh:movielens> desc tables;
      movies  users  ratings_by_user  original_movie_map  ratings_by_movie
  27. WHAT CAN WE DO WITH IT? ▸ Learn by example ▸ Blog posts / Tutorials ▸ Jupyter notebooks ▸ Reference applications ▸ Data Models for presentations
  28. MANAGING REFERENCE / TEST DATA
  29. DATASETS
  30. MOVIELENS
  31. DETAILS ▸ GroupLens Research Project ▸ University of Minnesota ▸ 100K ratings ▸ 1K users ▸ 1700 movies
  32. cqlsh:movielens> select id, avg_rating, genres, name
      ... from movies limit 1;
      @ Row 1
      ------------+--------------------------------------
       id         | 76a38f64-94d8-4b8f-b830-a40af96f8d20
       avg_rating | 3.16667
       genres     | {'Drama'}
       name       | Little Lord Fauntleroy (1936)
      (1 rows)
  33. cqlsh:movielens> select * from users limit 1;
      @ Row 1
      ------------+--------------------------------------
       id         | b52fcdfc-0eaf-4432-9896-aa22db56edb2
       address    | 0322 Mattie Ramp Apt. 177
       age        | 37
       city       | South Fremont
       gender     | M
       name       | Harrold Hills
       occupation | administrator
       zip        | 06513
      (1 rows)
  34. BLOG: WORKING RELATIONALLY WITH CASSANDRA
  35. CONNECTING CASSANDRA DATA WITH GRAPHFRAMES
  36. cdm install killrweather
  37. Helena Edelson Patrick McFadin
  38. cdm install killrvideo
  39. Luke Tillman Patrick McFadin
  40. UPCOMING DATA SETS
  41. openflights.org ‣ airports ‣ flight data
  42. HEALTH CARE ▸ Cancer Genome Atlas Project ▸ Ebola cases ▸ Healthcare financial data ▸ Dani Traphagen
  43. NYC TAXI DATA ▸ pick up / drop off times & locations ▸ trip distances ▸ itemized fares ▸ rate types ▸ payment types
  44. SOCIAL DATA ▸ Higgs Twitter Data ▸ Foursquare ▸ Enron executive emails
  45. HOW TO CONTRIBUTE
  46. https://github.com/riptano/cdm-java
  47. ADD FEATURES
  48. SUGGEST DATASETS
  49. CREATE A DATASET ▸ create a git repo ▸ datasets.yaml ▸ schema.cql ▸ insert data ▸ “cdm dump” ▸ cdm install . ▸ create a PR on cdm-java OMG BEST DATASET EVER
  50. @RUSTYRAZORBLADE THANK YOU, KIND HUMANS
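
The install log on slide 25 echoes a fixed sequence of cqlsh invocations: recreate the keyspace, then one COPY per CSV file. As a rough sketch of that sequence only, the Python below rebuilds those command strings. It is not CDM's actual implementation, which the slides do not show; the function name `install_commands` and its `root` and `tables` parameters are hypothetical, and the schema-loading step is left out because the log prints only the schema path, not the command used to apply it.

```python
from pathlib import PurePosixPath
from typing import List


def install_commands(dataset: str, tables: List[str], root: PurePosixPath) -> List[str]:
    """Rebuild the cqlsh command strings echoed in the slide 25 install log.

    Hypothetical helper for illustration; it only formats strings and
    never contacts a cluster.
    """
    data_dir = root / dataset / "data"

    # Step 1: drop and recreate the keyspace with the replication
    # settings shown in the log (SimpleStrategy, replication_factor 1).
    create = (
        f"DROP KEYSPACE IF EXISTS {dataset}; "
        f"CREATE KEYSPACE {dataset} WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 1}"
    )
    commands = [f'cqlsh -e "{create}"']

    # Step 2 (schema.cql) is omitted: the log shows its path but not
    # the command that loads it.

    # Step 3: one COPY ... FROM per table, reading <table>.csv.
    for table in tables:
        csv_path = data_dir / f"{table}.csv"
        commands.append(
            f"cqlsh -k {dataset} -e \"COPY {table} FROM '{csv_path}'\""
        )
    return commands


if __name__ == "__main__":
    for line in install_commands(
        "movielens",
        ["movies", "users", "ratings_by_user", "original_movie_map"],
        PurePosixPath("/Users/jhaddad/.cdm"),
    ):
        print(line)
```

Run as a script, this prints the keyspace command followed by the four COPY commands from the log. Note that `desc tables` on slide 26 also lists `ratings_by_movie`, for which the log shows no COPY command, so it is not included in the table list here.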