A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The Last Pickle) | C* Summit 2016
- 1. JON HADDAD THE LAST PICKLE LEARN DATA MODELING BY EXAMPLE THIS IS AWESOME!!!
- 2. WHAT’S THE LAST PICKLE DO?
- 3. WE HELP MAKE YOU A TEAM OF EXPERTS
- 4. > 50 YEARS COMBINED EXPERIENCE
- 5. WHO IS THIS GUY?
- 6. 15 YEARS EXPERIENCE
- 7. 4 YEARS WITH CASSANDRA
- 8. LEARNING HOW TO CASSANDRA
- 9. WHAT’S YOUR BACKGROUND?
- 10. ORACLE! MYSQL! POSTGRES!
- 11. CQL LOOKS LIKE SQL
- 12. BAD ASSUMPTIONS
- 13. 3RD NORMAL FORM?
- 14. WHERE ARE MY JOINS?
- 15. SECONDARY INDEX?
- 16. TRY TO DATA MODEL ▸ DO IT WRONG ▸ GET ANGRY ▸ WATCH VIDEOS & READ
- 17. EVERYTHING I KNOW IS WRONG
- 18. LEARN BY EXAMPLE
- 19. CASSANDRA DATASET MANAGER
- 20. CDM
- 21. APT FOR CASSANDRA DATA
- 22. INSTALL DATA TO YOUR CASSANDRA CLUSTER
- 23. cdm install <dataset>
- 24. jhaddad@rustyrazorblade ~$ cdm list
  Starting CDM
  Datasets:
  movielens
  killrvideo
  killrweather
  Finished.
- 25. jhaddad@rustyrazorblade ~$ cdm install movielens
  Starting CDM
  Installing movielens
  Checking for repo at /Users/jhaddad/.cdm/movielens
  Pulling latest
  CDM is using dataset path: /Users/jhaddad/.cdm/movielens
  cqlsh -e "DROP KEYSPACE IF EXISTS movielens; CREATE KEYSPACE movielens WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
  Schema: /Users/jhaddad/.cdm/movielens/schema.cql
  Loading data
  cqlsh -k movielens -e "COPY movies FROM '/Users/jhaddad/.cdm/movielens/data/movies.csv'"
  cqlsh -k movielens -e "COPY users FROM '/Users/jhaddad/.cdm/movielens/data/users.csv'"
  cqlsh -k movielens -e "COPY ratings_by_user FROM '/Users/jhaddad/.cdm/movielens/data/ratings_by_user.csv'"
  cqlsh -k movielens -e "COPY original_movie_map FROM '/Users/jhaddad/.cdm/movielens/data/original_movie_map.csv'"
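The install log on slide 25 follows a simple pattern: recreate the keyspace, apply `schema.cql`, then run one `COPY ... FROM` per CSV file under `data/`. As a rough sketch of that command sequence, the hypothetical helper below rebuilds the same `cqlsh` strings from a dataset name, its local path and a list of table names. It is not CDM's code: `install_commands` and its parameters are made up for illustration, it only builds strings rather than running `cqlsh`, and it leaves out the schema step because the log names `schema.cql` without showing the command used to apply it.

```python
from pathlib import PurePosixPath


def install_commands(dataset: str, dataset_path: str, tables: list[str],
                     replication_factor: int = 1) -> list[str]:
    """Hypothetical helper: rebuild the cqlsh lines printed by `cdm install`.

    Builds strings only; nothing is executed. The schema.cql step is skipped
    because the log does not show the command used to apply it.
    """
    keyspace_cql = (
        f"DROP KEYSPACE IF EXISTS {dataset}; "
        f"CREATE KEYSPACE {dataset} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
    )
    commands = [f'cqlsh -e "{keyspace_cql}"']

    # One COPY per table, reading data/<table>.csv from the dataset checkout.
    data_dir = PurePosixPath(dataset_path) / "data"
    for table in tables:
        csv_path = data_dir / f"{table}.csv"
        commands.append(f"cqlsh -k {dataset} -e \"COPY {table} FROM '{csv_path}'\"")
    return commands


if __name__ == "__main__":
    for cmd in install_commands(
        "movielens",
        "/Users/jhaddad/.cdm/movielens",
        ["movies", "users", "ratings_by_user", "original_movie_map"],
    ):
        print(cmd)
```

Called with the movielens path and the four tables from the log, it returns the keyspace command followed by the four `COPY` lines. Naming each CSV after its table is what lets this sketch derive every `COPY` statement from the file layout alone.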
- 26. jhaddad@rustyrazorblade ~/dev/cassandra$ cqlsh
  Connected to Test Cluster at 127.0.0.1:9042.
  [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4]
  Use HELP for help.
  cqlsh> use movielens ;
  cqlsh:movielens> desc tables;
  movies users ratings_by_user original_movie_map ratings_by_movie
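`desc tables` lists both `ratings_by_user` and `ratings_by_movie`. The names suggest the query-first habit that the 3rd normal form, joins and secondary index slides push against: keep one table per query and write each rating to every table that has to answer a query about it. The sketch below models that idea in memory with plain Python dicts. It is not a Cassandra client, and the `user_id`, `movie_id` and `rating` fields are assumptions, since `schema.cql` is not shown on the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    # Field names are assumptions; the real movielens schema.cql is not on the slides.
    user_id: str
    movie_id: str
    rating: int


class RatingsTables:
    """In-memory stand-in for two query-specific tables; not a Cassandra client."""

    def __init__(self) -> None:
        # partition key -> {clustering key -> row}
        self.ratings_by_user: dict[str, dict[str, Rating]] = defaultdict(dict)
        self.ratings_by_movie: dict[str, dict[str, Rating]] = defaultdict(dict)

    def add_rating(self, r: Rating) -> None:
        # Denormalize: write the same rating once per query it has to answer.
        self.ratings_by_user[r.user_id][r.movie_id] = r
        self.ratings_by_movie[r.movie_id][r.user_id] = r

    def ratings_for_user(self, user_id: str) -> list[Rating]:
        # A single partition lookup: no join, no secondary index.
        # Sorting by the clustering key stands in for clustering order.
        rows = self.ratings_by_user.get(user_id, {})
        return [rows[k] for k in sorted(rows)]

    def ratings_for_movie(self, movie_id: str) -> list[Rating]:
        rows = self.ratings_by_movie.get(movie_id, {})
        return [rows[k] for k in sorted(rows)]
```

Re-rating a movie overwrites the earlier row in both maps, mirroring how Cassandra treats a write to an existing primary key as an upsert. The price of this layout is that every rating is written twice, in exchange for reads that each touch a single partition.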
- 27. WHAT CAN WE DO WITH IT? ▸ Learn by example ▸ Blog posts / Tutorials ▸ Jupyter notebooks ▸ Reference applications ▸ Data Models for presentations
- 28. MANAGING REFERENCE / TEST DATA
- 29. DATASETS
- 30. MOVIELENS
- 31. DETAILS ▸ GroupLens Research Project ▸ University of Minnesota ▸ 100K ratings ▸ 1K users ▸ 1700 movies
- 32. cqlsh:movielens> select id, avg_rating, genres, name
  ... from movies limit 1;
  @ Row 1
  ------------+--------------------------------------
   id         | 76a38f64-94d8-4b8f-b830-a40af96f8d20
   avg_rating | 3.16667
   genres     | {'Drama'}
   name       | Little Lord Fauntleroy (1936)
  (1 rows)
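The `movies` row carries `avg_rating` directly, so reading it needs no aggregation over ratings at query time. The slides don't say how that value was produced. One way to maintain such a column, sketched here under that assumption, is to keep a running total and count per movie and rewrite `avg_rating` whenever a rating arrives. `MovieRatingStats` is a hypothetical name, and the ratings used in the test are invented, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MovieRatingStats:
    """Hypothetical running aggregate for refreshing a denormalized avg_rating column."""

    total: int = 0
    count: int = 0

    def add(self, rating: int) -> float:
        # Update the aggregate and return the value to write back as avg_rating.
        self.total += rating
        self.count += 1
        return self.total / self.count

    @property
    def avg_rating(self) -> Optional[float]:
        # None until the movie has at least one rating.
        return self.total / self.count if self.count else None
```

In a real cluster this read-modify-write needs care when several clients rate the same movie at once; the sketch ignores concurrency entirely.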
- 33. cqlsh:movielens> select * from users limit 1;
  @ Row 1
  ------------+--------------------------------------
   id         | b52fcdfc-0eaf-4432-9896-aa22db56edb2
   address    | 0322 Mattie Ramp Apt. 177
   age        | 37
   city       | South Fremont
   gender     | M
   name       | Harrold Hills
   occupation | administrator
   zip        | 06513
  (1 rows)
- 34. BLOG: WORKING RELATIONALLY WITH CASSANDRA
- 35. CONNECTING CASSANDRA DATA WITH GRAPHFRAMES
- 36. cdm install killrweather
- 37. Helena Edelson Patrick McFadin
- 38. cdm install killrvideo
- 39. Luke Tillman Patrick McFadin
- 40. UPCOMING DATA SETS
- 41. openflights.org ‣ airports ‣ flight data
- 42. HEALTH CARE ▸ Cancer Genome Atlas Project ▸ Ebola cases ▸ Healthcare financial data ▸ Dani Traphagen
- 43. NYC TAXI DATA ▸ pick up / drop off times & locations ▸ trip distances ▸ itemized fares ▸ rate types ▸ payment types
- 44. SOCIAL DATA ▸ Higgs Twitter Data ▸ Foursquare ▸ Enron executive emails
- 45. HOW TO CONTRIBUTE
- 46. https://github.com/riptano/cdm-java
- 47. ADD FEATURES
- 48. SUGGEST DATASETS
- 49. CREATE A DATASET ▸ create a git repo ▸ datasets.yaml ▸ schema.cql ▸ insert data ▸ “cdm dump” ▸ cdm install . ▸ create a PR on cdm-java OMG BEST DATASET EVER
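The install log on slide 25 reads `schema.cql` from the dataset root and one `data/<table>.csv` per table. Before running `cdm install .` on a new dataset, a quick local check can list tables that don't have a CSV yet. The sketch below is hypothetical and not part of cdm-java: the function names are made up, and the regex only handles unquoted `CREATE TABLE` names, optionally keyspace-qualified.

```python
import re
from pathlib import Path

# Matches CREATE TABLE [IF NOT EXISTS] [keyspace.]name; quoted identifiers are not handled.
CREATE_TABLE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:\w+\.)?(\w+)", re.IGNORECASE
)


def tables_in_schema(schema_cql: str) -> list[str]:
    """Return table names declared in a schema.cql string, in file order."""
    return CREATE_TABLE.findall(schema_cql)


def missing_csvs(repo: Path) -> list[str]:
    """Hypothetical pre-install check: tables in schema.cql lacking data/<table>.csv."""
    schema = (repo / "schema.cql").read_text()
    return [
        table
        for table in tables_in_schema(schema)
        if not (repo / "data" / f"{table}.csv").is_file()
    ]
```

Run from the dataset repo, `missing_csvs(Path("."))` returns an empty list once every table declared in the schema has a matching CSV.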
- 50. @RUSTYRAZORBLADE THANK YOU, KIND HUMANS