Updates from Cassandra Summit 2016 & SASI Indexes
Slide notes: Introduced in C* 0.7 (~2010); Introduced in C* 3.0; Introduced in C* 3.4 (2016); Bug fixes in C* 3.5
- 1. Updates from Cassandra Summit 2016. Jim Hatcher, DFW Cassandra Users Meetup, 10/5/2016
- 2. Agenda • Highlights of Cassandra Summit • Options for Querying Data • Denormalization • Inverted Index Tables • Materialized Views • Indexes • SASI Indexes • Deeper Dive on SASI Indexes • Resources
- 3. Introduction: Jim Hatcher (james_hatcher@hotmail.com). At IHS Markit, we take raw data and turn it into information and insights for our customers. • Automotive Systems (CarFax) • Defense Systems (Jane's) • Oil & Gas Systems (Petra) • Maritime Systems • Technology Systems (Electronic Parts Database, Root Metrics). [Diagram: Sources of Raw Data → Structure Data → Add Value → Customer-facing Systems]
- 4. Highlights from C* Summit 2016: Videos • Keynote: https://www.youtube.com/watch?v=2bG7pU9ZyJM&index=86&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk • All sessions: https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk • My session (shameless plug): https://www.youtube.com/watch?v=WsXBFDPGDLo&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk&index=5
- 5. Highlights from C* Summit 2016: Popular Topics from Sessions. Core Cassandra: • Compactions / Tombstones • Data Modeling • New Features: SASI Indexes • Operations / Monitoring. DataStax: • DSE Graph • Security. Cassandra Ecosystem: • Spark / Spark Streaming • Kafka • Data Science • Mesos
- 6. Highlights from C* Summit 2016: Summary of the Technical Part of the Keynote • New Storage Engine (3.0) • Materialized Views (3.0) • SASI Indexes (3.4) • COPY FROM Improvements (3.5) • Read Path Improvements (3.6) • Time Window Compaction Strategy (3.8) • GROUP BY (3.10) • sstabledump (3.4)
- 7. Cassandra Write: CREATE KEYSPACE acct_mgmt WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3 }; USE acct_mgmt; CREATE TABLE user ( username text, last_name text, country_code text, age int, PRIMARY KEY ( username ) ); INSERT INTO user ( username, country_code, last_name, age ) VALUES ( 'jhatcher', 'US', 'Hatcher', 29 ); [Diagram: a client writes to a six-node Cassandra cluster (nodes A through F), each node owning one token range: A-D, E-I, J-M, N-Q, R-U, V-Z.]
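To make the diagram concrete, here is a toy Python model of where the INSERT lands. It is a sketch under stated assumptions. The letter-range ring and the `primary_node`/`replicas` helpers are hypothetical. Node A is assumed to own A-D, node B to own E-I, and so on. Real Cassandra hashes the partition key into a numeric token (Murmur3Partitioner by default) rather than using its first letter. Replica placement follows SimpleStrategy: the owning node plus the next nodes clockwise around the ring, which is how `replication_factor: 3` yields three copies.

```python
from bisect import bisect_left

# Hypothetical ring mirroring the slide: (upper bound of letter range, node), in ring order.
# Assumption: node A owns A-D, B owns E-I, C owns J-M, D owns N-Q, E owns R-U, F owns V-Z.
RING = [("D", "A"), ("I", "B"), ("M", "C"), ("Q", "D"), ("U", "E"), ("Z", "F")]
UPPER_BOUNDS = [upper for upper, _ in RING]


def primary_node(partition_key):
    """Ring position of the node whose letter range covers the key's first letter.

    Simplification: real Cassandra hashes the whole partition key into a token.
    """
    return bisect_left(UPPER_BOUNDS, partition_key[0].upper())


def replicas(partition_key, replication_factor=3):
    """SimpleStrategy-style placement: the owning node, then the next nodes clockwise."""
    start = primary_node(partition_key)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]
```

With this model, `replicas("jhatcher")` gives `["C", "D", "E"]`. Node C owns J-M, and the next two nodes around the ring hold the other two copies. A key near the end of the ring, such as `"zed"`, wraps around to `["F", "A", "B"]`.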
- 8. Read by Key: SELECT username, country_code, last_name, age FROM user WHERE username = 'jhatcher'; SELECT username, country_code, last_name, age FROM user WHERE country_code = 'US'; [Diagram: client and six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z). The first query filters on the partition key, so the client can work out which node owns 'jhatcher'. The second query filters on a non-key column, so the client cannot tell which nodes hold matching rows ("??").]
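The asymmetry between the two queries can be shown with a small Python model of the cluster (hypothetical helpers; replication is ignored, and ownership by first letter of the username mirrors the slide's diagram rather than Cassandra's real token hashing). A read by partition key touches exactly one node. A read by `country_code` has no routing information, so the model has to scan every node.

```python
# Hypothetical toy cluster mirroring the slide: node -> letter range of usernames it owns.
# Replication is ignored to keep the sketch short.
RANGES = {"A": ("A", "D"), "B": ("E", "I"), "C": ("J", "M"),
          "D": ("N", "Q"), "E": ("R", "U"), "F": ("V", "Z")}


def owner(username):
    """Node whose letter range covers the username (the partition key)."""
    first = username[0].upper()
    for node, (low, high) in RANGES.items():
        if low <= first <= high:
            return node
    raise ValueError(f"no node owns {username!r}")


def build_cluster(rows):
    """Place each row on the node that owns its partition key."""
    cluster = {node: {} for node in RANGES}
    for row in rows:
        cluster[owner(row["username"])][row["username"]] = row
    return cluster


def read_by_key(cluster, username):
    """WHERE username = ?: the owning node can be computed, so only it is contacted."""
    node = owner(username)
    return cluster[node].get(username), [node]


def read_by_country(cluster, country_code):
    """WHERE country_code = ?: nothing says where matches live, so every node is scanned."""
    matches, contacted = [], []
    for node, rows in cluster.items():
        contacted.append(node)
        matches.extend(r for r in rows.values() if r["country_code"] == country_code)
    return matches, contacted
```

On a large cluster, the "contacted" list for the second query grows with the node count. That growth is the cost the following slides try to avoid.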
- 9. Denormalization: CREATE TABLE user_by_country_code ( username text, country_code text, last_name text, age int, PRIMARY KEY ( country_code, username ) ); (username is a clustering column, so one country_code partition can hold many users.) SELECT username, country_code, last_name, age FROM user_by_country_code WHERE country_code = 'US'; [Diagram: client and six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z).] You have to keep this table in sync with user manually.
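Here is a minimal sketch of what "keep this in sync manually" means for application code. It assumes the denormalized table is keyed by (country_code, username) so that one country can hold many users. The dicts and the `save_user`/`users_in` functions are hypothetical stand-ins, not a driver API. The subtle case is an update that changes `country_code`: the copy under the old country must be deleted, so the application has to know the previous value. In a real application you might also group the writes in a logged BATCH so that both tables eventually receive them.

```python
# Hypothetical in-memory stand-ins for the two tables, keyed by their primary keys.
user = {}                  # username -> row
user_by_country_code = {}  # (country_code, username) -> row


def save_user(row):
    """Write the row to both tables, as the application must when denormalizing."""
    old = user.get(row["username"])
    if old is not None and old["country_code"] != row["country_code"]:
        # Without this delete, the copy under the old country would linger as stale data.
        del user_by_country_code[(old["country_code"], old["username"])]
    user[row["username"]] = row
    user_by_country_code[(row["country_code"], row["username"])] = row


def users_in(country_code):
    """The denormalized read: usernames in one country_code partition (modelled as a filter)."""
    return sorted(u for (cc, u) in user_by_country_code if cc == country_code)
```

Without the delete branch, moving a user from US to UK would leave the user listed in both countries.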
- 10. Inverted Index Table: CREATE TABLE acct_mgmt.invert_user_by_country_code ( country_code text, username text, PRIMARY KEY ( country_code, username ) ); (Notice that age and last_name are not in this table.) Goal: SELECT username, country_code, last_name, age FROM user WHERE country_code = 'US'; Step 1: SELECT username FROM invert_user_by_country_code WHERE country_code = 'US'; Step 2 (one of these for each username returned by step 1): SELECT username, country_code, last_name, age FROM user WHERE username = ?; [Diagram: client and six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z).] You have to keep the inverted table in sync manually.
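The two-step read can be sketched in Python, with dicts standing in for the two tables. The function name and sample data are hypothetical. The sketch makes the cost visible: one query to the inverted table, plus one per username it returns, so N matches cost 1 + N queries. The step-2 lookups could be issued concurrently, but that does not reduce how many there are.

```python
def users_by_country(country_code, inverted, base):
    """Two-step read through an inverted index table (hypothetical helper).

    Step 1: one query against the inverted table returns only usernames.
    Step 2: one query per username fetches the full row from the base table.
    Returns the rows and the number of queries issued (1 + N).
    """
    usernames = inverted.get(country_code, [])  # step 1
    queries = 1
    rows = []
    for username in usernames:                  # step 2, once per step-1 result
        queries += 1
        row = base.get(username)
        if row is not None:  # the inverted table may be stale if the two drifted apart
            rows.append(row)
    return rows, queries
```

Usage: with `inverted = {"US": ["jhatcher", "joe"]}`, a lookup for `"US"` issues three queries in total: one to the inverted table and one for each of the two users.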
- 11. Secondary Indexes: CREATE INDEX country_code_idx ON user ( country_code ); SELECT username, country_code, last_name, age FROM user WHERE country_code = 'US'; [Diagram: client and six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z). Each node holds a local index of only its own rows, e.g. (US, alice), (CN, bob), (CA, van), (UK, zed), (RU, ed), (UK, frank), (US, joe), (UK, kim), (US, pat), (UK, otto), (US, ron), (UK, sam), so the query may have to visit every node.] C* keeps the index in sync. Good use cases: • Analytics queries where you're going to scan the whole table anyway • Queries that return a limited number of records (use LIMIT) • Queries where you can specify the partition key in addition to the indexed value. Restrictions: • No nested predicates, range queries, or LIKE searches
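This sketch shows why LIMIT helps with secondary indexes: each node indexes only its own rows, so a query without a partition key may have to visit every node, but it can stop once it has enough results. It is a simplification. The helpers are hypothetical, ownership by first letter mirrors the slide's diagram, and Cassandra's coordinator actually queries token ranges in rounds, possibly several at once. The sample (country_code, username) pairs are the ones shown on the slide.

```python
# Hypothetical ownership mirroring the slide: node -> letter range of usernames it owns.
RANGES = {"A": ("A", "D"), "B": ("E", "I"), "C": ("J", "M"),
          "D": ("N", "Q"), "E": ("R", "U"), "F": ("V", "Z")}


def owner(username):
    first = username[0].upper()
    for node, (low, high) in RANGES.items():
        if low <= first <= high:
            return node
    raise ValueError(f"no node owns {username!r}")


def build_local_indexes(pairs):
    """Each node indexes only the rows it owns: {node: {country_code: [username, ...]}}."""
    indexes = {node: {} for node in RANGES}
    for country_code, username in pairs:
        indexes[owner(username)].setdefault(country_code, []).append(username)
    return indexes


def query_index(indexes, country_code, limit=None):
    """Visit nodes in ring order; with a LIMIT, stop as soon as enough rows are found."""
    results, contacted = [], []
    for node in indexes:
        contacted.append(node)
        results.extend(indexes[node].get(country_code, []))
        if limit is not None and len(results) >= limit:
            return results[:limit], contacted
    return results, contacted


SLIDE_PAIRS = [("US", "alice"), ("CN", "bob"), ("CA", "van"), ("UK", "zed"),
               ("RU", "ed"), ("UK", "frank"), ("US", "joe"), ("UK", "kim"),
               ("US", "pat"), ("UK", "otto"), ("US", "ron"), ("UK", "sam")]
```

In this model, `query_index(build_local_indexes(SLIDE_PAIRS), "US", limit=2)` stops after three of the six nodes. Without the limit, it visits all six to find the four US users.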
- 12. Materialized View: CREATE MATERIALIZED VIEW mv_user_by_cntry_code AS SELECT * FROM user WHERE country_code IS NOT NULL AND username IS NOT NULL PRIMARY KEY ( country_code, username ); SELECT username, country_code, last_name, age FROM mv_user_by_cntry_code WHERE country_code = 'US'; [Diagram: client and six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z).] C* keeps the view in sync. Restrictions: • The view's primary key must include the base table's entire primary key plus at most one additional (non-primary-key) column • Equalities only (no ranges)
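The primary-key restriction can be expressed as a small check. This is a hypothetical helper, not part of Cassandra. It encodes the rule as stated on the slide and as Cassandra 3.x enforces it: the view's key must contain every base-table key column and may add at most one other column. The slide's view passes, because (country_code, username) contains username and adds only country_code.

```python
def check_view_primary_key(base_pk, view_pk):
    """Return the problems with a proposed view primary key (an empty list means OK).

    Hypothetical helper encoding the materialized-view key rule; not a Cassandra API.
    """
    problems = []
    missing = [c for c in base_pk if c not in view_pk]
    if missing:
        problems.append(f"missing base primary key columns: {missing}")
    extra = [c for c in view_pk if c not in base_pk]
    if len(extra) > 1:
        problems.append(f"more than one non-primary-key column: {extra}")
    return problems
```

For example, keying a view on `["country_code"]` alone is rejected because it drops `username`. Keying it on `["country_code", "age", "username"]` is rejected because it adds two non-key columns.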
- 13. SASI (SSTable Attached Secondary Index): CREATE CUSTOM INDEX ON user ( country_code ) USING 'org.apache.cassandra.index.sasi.SASIIndex'; SELECT username, country_code, last_name, age FROM user WHERE country_code = 'US'; [Diagram: client and six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z). Each node indexes its own rows, e.g. (US, alice), (CN, bob), (CA, van), (UK, zed), (RU, ed), (UK, frank), (US, joe), (UK, kim), (US, pat), (UK, otto), (US, ron), (UK, sam).]
- 14. What's the difference between SASI and traditional C* indexes? Improvements with SASI indexes: • Better integration with the Cassandra storage engine • Less memory, disk, and CPU usage • Different structure (B-tree) • Range queries (age > 10) and intersection scans (age = 10 AND name = 'bob') • Full-text search capabilities (LIKE support)
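Here is a conceptual sketch of why an ordered index enables range and intersection queries. It is not SASI's implementation. A sorted Python list of (value, primary key) pairs stands in for SASI's on-disk structure, and `bisect` finds the range boundaries. An AND of two predicates is answered by intersecting the key sets that each index returns. The helper names and sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(rows, column):
    """Sorted (value, primary key) pairs: a stand-in for an ordered on-disk index."""
    return sorted((row[column], row["username"]) for row in rows)


def equals(index, value):
    """Keys whose indexed value == value (one contiguous slice of the sorted index)."""
    values = [v for v, _ in index]
    return [pk for _, pk in index[bisect_left(values, value):bisect_right(values, value)]]


def greater_than(index, value):
    """Keys whose indexed value > value: everything after the last entry equal to value."""
    values = [v for v, _ in index]
    return [pk for _, pk in index[bisect_right(values, value):]]


def intersect(*key_lists):
    """AND of several predicates: keep only the keys that every index returned."""
    result = set(key_lists[0])
    for keys in key_lists[1:]:
        result &= set(keys)
    return sorted(result)


ROWS = [
    {"username": "alice", "age": 33, "country_code": "US"},
    {"username": "bob", "age": 19, "country_code": "CN"},
    {"username": "joe", "age": 21, "country_code": "US"},
    {"username": "pat", "age": 40, "country_code": "US"},
    {"username": "kim", "age": 33, "country_code": "UK"},
]
```

Because the entries are sorted, `age > 21` is one binary search followed by a contiguous scan. A hash-style index would have to examine every distinct value. Combining `age > 21` with `country_code = 'US'` then returns `["alice", "pat"]`.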
- 15. Cassandra Write Path - Refresher. When data is written to a Cassandra node: • Data is written to the memtable (sorted in memory) and appended to the commit log (in arrival order). • When the memtable reaches a certain size, it is flushed to disk as an SSTable, and the commit log segments covering that data can be purged. SSTables are immutable (cannot be edited). • On a separate schedule, compaction takes two or more SSTables, merges them into a single new SSTable, and deletes the source SSTables. This is when data is merged and cleaned up. Source: https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html
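The three bullets can be modelled in a few lines of Python. This is a toy with heavily simplified assumptions:

- `ToyNode` is hypothetical.
- The flush trigger is a row count rather than a size in bytes.
- The whole commit log is cleared on flush.
- "Newer" is decided by SSTable order rather than by the per-cell write timestamps Cassandra actually compares.

Tombstones and compaction strategies are left out.

```python
class ToyNode:
    """Hypothetical single-node model of the memtable / commit log / SSTable write path."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # simplification: row count, not bytes
        self.commit_log = []  # append-only, in arrival order
        self.memtable = {}    # key -> value, sorted when flushed
        self.sstables = []    # immutable tuples of sorted (key, value) pairs, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memtable becomes an immutable, sorted SSTable; the log is no longer needed.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []

    def compact(self):
        # Merge oldest to newest so newer values overwrite older ones, then
        # replace all source SSTables with the single merged SSTable.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]

    def read(self, key):
        """Newest data wins: check the memtable, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None
```

Before compaction, a read may have to look through several SSTables to find the newest value. Compaction reduces this to a single SSTable.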
- 16. Cassandra Write Path – SASI Indexes Source: http://www.doanduyhai.com/blog/?p=2058
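The idea behind "SSTable Attached" can be sketched as follows. Each SSTable's index is built when that SSTable is written, and it is discarded along with the SSTable when compaction replaces it, so the index needs no separate maintenance. The classes and functions are hypothetical and leave out the memtable-side index. Older SSTables can hold outdated entries, so the sketch re-checks each candidate against the newest version of its row before returning it.

```python
class SSTable:
    """Immutable rows plus an index over one column, built once when the SSTable is written."""

    def __init__(self, rows, column):
        self.rows = dict(rows)  # primary key -> row
        self.column = column
        self.index = {}         # column value -> [primary keys]
        for pk in sorted(self.rows):
            self.index.setdefault(self.rows[pk][column], []).append(pk)


def latest_rows(sstables):
    """Merge SSTables oldest to newest so newer versions of a row win."""
    merged = {}
    for sstable in sstables:
        merged.update(sstable.rows)
    return merged


def query(sstables, value):
    """Consult every SSTable's attached index, then drop candidates that are now stale."""
    if not sstables:
        return []
    candidates = set()
    for sstable in sstables:
        candidates.update(sstable.index.get(value, []))
    latest = latest_rows(sstables)
    column = sstables[0].column
    return sorted(pk for pk in candidates if latest[pk][column] == value)


def compact(sstables):
    """Replace the inputs with one SSTable; its index is rebuilt as part of writing it."""
    return [SSTable(latest_rows(sstables), sstables[0].column)]
```

In this model, if bob moves from CN to US in a newer SSTable, a query for CN still finds bob in the older SSTable's index but filters him out. After compaction, the stale index entry is gone.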
- 17. Syntax. PREFIX mode (the default, so the OPTIONS clause can be omitted): CREATE CUSTOM INDEX ON user ( country_code ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'PREFIX' }; CREATE CUSTOM INDEX ON user ( age ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'PREFIX' }; SELECT * FROM user WHERE country_code = 'US'; SELECT * FROM user WHERE country_code LIKE 'U%'; SELECT * FROM user WHERE age = 33; SELECT * FROM user WHERE age > 21; CONTAINS mode (all the stuff you can do with PREFIX, plus…): CREATE CUSTOM INDEX ON user ( last_name ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' }; SELECT * FROM user WHERE last_name LIKE '%atch%'; CONTAINS mode with an analyzer (an alternative index on last_name): CREATE CUSTOM INDEX ON user ( last_name ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'analyzed': 'true', 'tokenization_skip_stop_words': 'true', 'tokenization_enable_stemming': 'true', 'tokenization_normalize_lowercase': 'true', 'tokenization_locale': 'en' }; SELECT * FROM user WHERE last_name LIKE 'hatch'; (with stemming, this will return hatch, hatched, hatching, etc.)
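Here is a rough model of which LIKE patterns each mode accepts, based on the slide. PREFIX mode answers `=` and `'U%'`-style patterns; CONTAINS mode also answers `'%atch%'`. The function is hypothetical and only mimics the matching rules. It does not model analyzers, so case folding and stemming (the `'hatch'` example) are out of scope. Treating a suffix pattern such as `'%cher'` as CONTAINS-only is an assumption.

```python
def like_matches(value, pattern, mode="PREFIX"):
    """Evaluate a SASI-style LIKE pattern with '%' only at the start and/or end.

    Hypothetical model of the matching rules; no analyzer (case folding,
    stemming, stop words) is applied.
    """
    leading, trailing = pattern.startswith("%"), pattern.endswith("%")
    term = pattern.strip("%")
    if leading and mode != "CONTAINS":
        # Assumption: a leading wildcard needs an index built in CONTAINS mode.
        raise ValueError(f"{pattern!r} needs a CONTAINS-mode index")
    if leading and trailing:
        return term in value          # '%atch%'
    if leading:
        return value.endswith(term)   # '%cher'
    if trailing:
        return value.startswith(term) # 'U%'
    return value == term              # no wildcard: exact match
```

Note that `like_matches("Hatcher", "hatch", mode="CONTAINS")` is `False` here. The slide's stemming example depends on the StandardAnalyzer, which this sketch deliberately leaves out.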
- 18. Resources for SASI Indexes • C* Summit keynote (skip to 39:05 for Materialized Views or 44:09 for SASI): https://www.youtube.com/watch?v=2bG7pU9ZyJM&index=86&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk • Doan DuyHai, talk at C* Summit 2016: https://www.youtube.com/watch?v=dxiuQ2CkXfM&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk&index=119 • CQL docs: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html • Jon Haddad, blog post: http://rustyrazorblade.com/2016/02/cassandra-secondary-index-preview-1/ • Doan DuyHai, blog post (lots of technical internals): http://www.doanduyhai.com/blog/?p=2058