
Cassandra.Link


6/10/2020


Updates from Cassandra Summit 2016 & SASI Indexes

by Jim Hatcher

Slide notes: secondary indexes were introduced in C* 0.7 (~2010), materialized views in C* 3.0, and SASI indexes in C* 3.4 (2016), with bug fixes in C* 3.5.


  1. Jim Hatcher
     DFW Cassandra Users - Meetup
     10/5/2016
     Updates from Cassandra Summit 2016

  2. Agenda
     • Highlights of Cassandra Summit
     • Options for Querying Data
     • Denormalization
     • Inverted Index Tables
     • Materialized Views
     • Indexes
     • SASI Indexes
     • Deeper Dive on SASI Indexes
     • Resources

  3. Introduction
     Jim Hatcher
     james_hatcher@hotmail.com
     At IHS Markit, we take raw data and turn it into information and insights for our customers.
     • Automotive Systems (CarFax)
     • Defense Systems (Jane’s)
     • Oil & Gas Systems (Petra)
     • Maritime Systems
     • Technology Systems (Electronic Parts Database, Root Metrics)
     [Diagram labels: Sources of Raw Data, Structure Data, Add Value, Customer-facing Systems]

  4. Highlights from C* Summit 2016: Videos
     Keynote:
     https://www.youtube.com/watch?v=2bG7pU9ZyJM&index=86&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk
     All sessions:
     https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk
     My session (shameless plug):
     https://www.youtube.com/watch?v=WsXBFDPGDLo&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk&index=5

  5. Highlights from C* Summit 2016: Popular Topics from Sessions
     Core Cassandra
     • Compactions / Tombstones
     • Data Modeling
     • New Features: SASI Indexes
     • Operations / Monitoring
     DataStax
     • DSE Graph
     • Security
     Cassandra Ecosystem
     • Spark / Spark Streaming
     • Kafka
     • Data Science
     • Mesos

  6. Highlights from C* Summit 2016: Summary of Technical Part of the Keynote
     • New Storage Engine (3.0)
     • Materialized Views (3.0)
     • SASI Indexes (3.4)
     • sstabledump (3.4)
     • COPY FROM Improvements (3.5)
     • Read Path Improvements (3.6)
     • Time Window Compaction Strategy (3.8)
     • GROUP BY (3.10)

  7. Cassandra Write

     CREATE KEYSPACE acct_mgmt
     WITH replication = {
       'class': 'SimpleStrategy',
       'replication_factor': 3
     };

     CREATE TABLE user (
       username text,
       last_name text,
       country_code text,
       age int,
       PRIMARY KEY ( username )
     );

     INSERT INTO user ( username, country_code, last_name, age )
     VALUES ( 'jhatcher', 'US', 'Hatcher', 29 );

     [Diagram: a client writing to a six-node Cassandra cluster (nodes A-F) with token ranges A-D, E-I, J-M, N-Q, R-U, V-Z]

  8. Read by Key

     SELECT username, country_code, last_name, age
     FROM user
     WHERE username = 'jhatcher';

     SELECT username, country_code, last_name, age
     FROM user
     WHERE country_code = 'US';

     [Diagram: two clients and the same six-node cluster (token ranges A-D, E-I, J-M, N-Q, R-U, V-Z); the client issuing the second query is marked "??"]
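
The difference between the two queries on this slide can be sketched with a toy model in Python. It mirrors the simplified diagram, where each node owns a range of first letters; real Cassandra hashes the partition key (Murmur3 by default) and also replicates data, neither of which is modeled here. The names `RANGES`, `owner_range`, and `ranges_to_ask` are hypothetical.

```python
# Token ranges from the slide's diagram. Real Cassandra assigns ranges of
# hashed tokens, not first letters; this only mirrors the simplified picture.
RANGES = [("A", "D"), ("E", "I"), ("J", "M"), ("N", "Q"), ("R", "U"), ("V", "Z")]


def owner_range(partition_key):
    """Return the token range whose node stores this partition key."""
    first = partition_key[0].upper()
    for low, high in RANGES:
        if low <= first <= high:
            return (low, high)
    raise ValueError(f"no range owns key {partition_key!r}")


def ranges_to_ask(where_column, value, partition_key_column="username"):
    """Which ranges must be contacted for an equality predicate?"""
    if where_column == partition_key_column:
        # The key is known, so exactly one range can hold the row.
        return [owner_range(value)]
    # The key is unknown, so every range may hold matching rows.
    return list(RANGES)


print(ranges_to_ask("username", "jhatcher"))      # [('J', 'M')]
print(len(ranges_to_ask("country_code", "US")))   # 6
```

The second result is the "??" in the diagram: without the partition key there is no single node to route the query to.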

  9. Denormalization

     CREATE TABLE user_by_country_code (
       username text,
       country_code text,
       last_name text,
       age int,
       -- username is a clustering column so one country can hold many users
       PRIMARY KEY ( country_code, username )
     );

     SELECT username, country_code, last_name, age
     FROM user_by_country_code
     WHERE country_code = 'US';

     You have to keep this in sync manually.

     [Diagram: client and six-node cluster with token ranges A-D, E-I, J-M, N-Q, R-U, V-Z]

  10. Inverted Index Table

     CREATE TABLE acct_mgmt.invert_user_by_country_code (
       country_code text,
       username text,
       -- username is a clustering column so one country can hold many users
       PRIMARY KEY ( country_code, username )
     ); -- notice that age and last_name are not in this table

     Query we want to answer:

     SELECT username, country_code, last_name, age
     FROM user
     WHERE country_code = 'US';

     1. Look up the usernames in the inverted index table:

     SELECT username
     FROM invert_user_by_country_code
     WHERE country_code = 'US';

     2. Run one of these for each of the results returned in the first query:

     SELECT username, country_code, age
     FROM user
     WHERE username = ?;

     You have to keep this in sync manually.

     [Diagram: client and six-node cluster with token ranges A-D, E-I, J-M, N-Q, R-U, V-Z]
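
The two-step lookup on this slide can be sketched in Python with dictionaries standing in for the two tables. This is a minimal illustration, not driver code: the sample rows other than `jhatcher` and the helper names `index_user` and `users_in_country` are made up for the example.

```python
# In-memory stand-ins for the base table and the inverted index table.
# Rows other than 'jhatcher' are hypothetical sample data.
user = {
    "jhatcher": {"country_code": "US", "last_name": "Hatcher", "age": 29},
    "alice": {"country_code": "US", "last_name": "Smith", "age": 41},
    "bob": {"country_code": "CN", "last_name": "Li", "age": 35},
}
invert_user_by_country_code = {}  # country_code -> set of usernames


def index_user(username, row):
    """Application-side write: the inverted table is kept in sync manually."""
    invert_user_by_country_code.setdefault(row["country_code"], set()).add(username)


for name, row in user.items():
    index_user(name, row)


def users_in_country(country_code):
    # Step 1: one read against the inverted index table.
    usernames = sorted(invert_user_by_country_code.get(country_code, ()))
    # Step 2: one read by key against the base table per username found.
    return [dict(username=u, **user[u]) for u in usernames]


print([r["username"] for r in users_in_country("US")])  # ['alice', 'jhatcher']
```

The cost is visible in step 2: a country with N users needs N reads by key, which is the trade-off against the fully denormalized table on the previous slide.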

  11. Secondary Indexes

     CREATE INDEX country_code_idx ON user ( country_code );

     SELECT username, country_code, last_name, age
     FROM user
     WHERE country_code = 'US';

     C* will keep this in sync.

     Good Use Cases:
     • Analytics queries where you’re going to scan the whole table anyway
     • Limited number of records returned (use LIMIT)
     • You can specify the Partitioning Key in addition to the indexed field value

     Restrictions:
     • No nested predicates, range queries, LIKE searches

     [Diagram: client and six-node cluster; each node holds index entries for its own rows only]
     Token Range A-D: US, alice; CN, bob
     Token Range E-I: RU, ed; UK, frank
     Token Range J-M: US, joe; UK, kim
     Token Range N-Q: UK, otto; US, pat
     Token Range R-U: US, ron; UK, sam
     Token Range V-Z: CA, van; UK, zed
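
Because each node indexes only its own rows, a query on the indexed column fans out across nodes. The Python sketch below models that with the sample rows from the diagram. It is a deliberate simplification: the sequential node-by-node loop and the early stop on `limit` are assumptions made for illustration, not a description of how the Cassandra coordinator actually schedules range requests.

```python
# Rows from the slide, grouped by the token range that owns each username.
NODES = {
    "A-D": {"alice": "US", "bob": "CN"},
    "E-I": {"ed": "RU", "frank": "UK"},
    "J-M": {"joe": "US", "kim": "UK"},
    "N-Q": {"otto": "UK", "pat": "US"},
    "R-U": {"ron": "US", "sam": "UK"},
    "V-Z": {"van": "CA", "zed": "UK"},
}


def build_local_index(rows):
    """Each node indexes only its own rows: country_code -> usernames."""
    index = {}
    for username, country in rows.items():
        index.setdefault(country, []).append(username)
    return index


LOCAL_INDEXES = {node: build_local_index(rows) for node, rows in NODES.items()}


def query_by_country(country, limit=None):
    """Ask nodes in turn, stopping once `limit` rows have been found."""
    results, nodes_contacted = [], 0
    for node, index in LOCAL_INDEXES.items():
        nodes_contacted += 1
        results.extend(index.get(country, []))
        if limit is not None and len(results) >= limit:
            return results[:limit], nodes_contacted
    return results, nodes_contacted


print(query_by_country("US"))           # (['alice', 'joe', 'pat', 'ron'], 6)
print(query_by_country("US", limit=2))  # (['alice', 'joe'], 3)
```

This is why the slide lists LIMIT and "specify the Partitioning Key" as good use cases: both reduce how many nodes have to take part in the read.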

  12. Materialized View

     CREATE MATERIALIZED VIEW mv_user_by_cntry_code AS
       SELECT *
       FROM user
       WHERE country_code IS NOT NULL AND username IS NOT NULL
       PRIMARY KEY ( country_code, username );

     SELECT username, country_code, last_name, age
     FROM mv_user_by_cntry_code
     WHERE country_code = 'US';

     C* will keep this in sync.

     Restrictions:
     • PK of MV must include the PK of the base table, plus at most one additional column
     • Equalities Only (no ranges)

     [Diagram: client and six-node cluster with token ranges A-D, E-I, J-M, N-Q, R-U, V-Z]
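
What "C* will keep this in sync" involves can be sketched as a toy Python class: every write to the base table also updates a view keyed by (country_code, username), including removing the stale view row when the country changes. The class `UserTable` and its methods are hypothetical and ignore everything distributed about the real feature.

```python
class UserTable:
    """Toy base table that maintains a view keyed by (country_code, username)."""

    def __init__(self):
        self.rows = {}   # username -> row dict
        self.view = {}   # (country_code, username) -> row dict

    def upsert(self, username, **columns):
        old = self.rows.get(username)
        new = {**(old or {}), **columns}
        self.rows[username] = new
        # Remove the stale view row in case the view's partition key changed.
        if old and old.get("country_code") is not None:
            self.view.pop((old["country_code"], username), None)
        # The view only holds rows whose country_code IS NOT NULL.
        if new.get("country_code") is not None:
            self.view[(new["country_code"], username)] = new

    def view_by_country(self, country_code):
        return sorted(u for (c, u) in self.view if c == country_code)


table = UserTable()
table.upsert("jhatcher", country_code="US", last_name="Hatcher", age=29)
table.upsert("alice", country_code="US")
table.upsert("alice", country_code="CA")   # moves alice to another view partition
print(table.view_by_country("US"))  # ['jhatcher']
print(table.view_by_country("CA"))  # ['alice']
```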

  13. SASI (SSTable Attached Secondary Index)

     CREATE CUSTOM INDEX ON user ( country_code )
     USING 'org.apache.cassandra.index.sasi.SASIIndex';

     SELECT username, country_code, last_name, age
     FROM user
     WHERE country_code = 'US';

     [Diagram: client and six-node cluster; each node holds index entries for its own rows only]
     Token Range A-D: US, alice; CN, bob
     Token Range E-I: RU, ed; UK, frank
     Token Range J-M: US, joe; UK, kim
     Token Range N-Q: UK, otto; US, pat
     Token Range R-U: US, ron; UK, sam
     Token Range V-Z: CA, van; UK, zed

  14. What’s the difference between SASI and traditional C* indexes?

     Improvements with SASI Indexes:
     • Better integration with Cassandra engine
     • SASI indexes use less memory, disk, CPU resources
     • Different structure (b-tree)
     • Allows for range queries (age > 10), intersection scans (age = 10 && name = bob)
     • Allows for full-text search capabilities (LIKE support)

  15. Cassandra Write Path - Refresher

     When data is written to a Cassandra node:
     • Data is written to the memtable (and sorted in memory) and to the commit log (in the order it arrives).
     • When the data reaches a certain size, the memtable is flushed to disk and written as an SSTable. At this time, the commit log is purged. SSTables are immutable (cannot be edited).
     • On a different schedule, compaction takes two SSTables, combines them into a single SSTable, and deletes the source SSTables. At this time, data is merged and cleaned up.

     Source: https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html
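
The three bullets above can be sketched as a small Python model. It is illustrative only: the flush threshold counts entries rather than bytes, timestamps come from a simple counter, compaction merges every SSTable at once, and the `Node` class is hypothetical rather than anything from Cassandra's code base.

```python
class Node:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only, in arrival order
        self.memtable = {}     # key -> (timestamp, value)
        self.sstables = []     # each SSTable: tuple of sorted (key, (ts, value))
        self.flush_threshold = flush_threshold
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.commit_log.append((key, value))
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A sorted, immutable snapshot of the memtable becomes a new SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # flushed data no longer needs the log

    def compact(self):
        # Merge all SSTables, keeping the newest version of each key.
        merged = {}
        for sstable in self.sstables:
            for key, (ts, value) in sstable:
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node()
for key, value in [("joe", "US"), ("alice", "US"), ("joe", "CA"), ("bob", "CN")]:
    node.write(key, value)
print(len(node.sstables))  # 2
node.compact()
print(node.sstables[0])
# (('alice', (2, 'US')), ('bob', (4, 'CN')), ('joe', (3, 'CA')))
```

After compaction only the newer value for `joe` survives, which is the "merged and cleaned up" step from the last bullet.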

  16. Cassandra Write Path – SASI Indexes

     Source: http://www.doanduyhai.com/blog/?p=2058

  17. Syntax

     PREFIX mode on country_code:

     CREATE CUSTOM INDEX ON user ( country_code )
     USING 'org.apache.cassandra.index.sasi.SASIIndex'
     WITH OPTIONS = { 'mode': 'PREFIX' }; // mode=PREFIX is default; can be omitted

     SELECT * FROM user WHERE country_code = 'US';
     SELECT * FROM user WHERE country_code LIKE 'U%';

     PREFIX mode on age:

     CREATE CUSTOM INDEX ON user ( age )
     USING 'org.apache.cassandra.index.sasi.SASIIndex'
     WITH OPTIONS = { 'mode': 'PREFIX' }; // mode=PREFIX is default; can be omitted

     SELECT * FROM user WHERE age = 33;
     SELECT * FROM user WHERE age > 21;

     CONTAINS mode on last_name (all the stuff you can do with PREFIX, plus…):

     CREATE CUSTOM INDEX ON user ( last_name )
     USING 'org.apache.cassandra.index.sasi.SASIIndex'
     WITH OPTIONS = { 'mode': 'CONTAINS' };

     SELECT * FROM user WHERE last_name LIKE '%atch%';

     CONTAINS mode on last_name with an analyzer (an alternative to the index above):

     CREATE CUSTOM INDEX ON user ( last_name )
     USING 'org.apache.cassandra.index.sasi.SASIIndex'
     WITH OPTIONS = {
       'mode': 'CONTAINS',
       'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
       'analyzed': 'true',
       'tokenization_skip_stop_words': 'and, the, or',
       'tokenization_enable_stemming': 'true',
       'tokenization_normalize_lowercase': 'true',
       'tokenization_locale': 'en'
     };

     SELECT * FROM user WHERE last_name LIKE 'hatch';
     // Will return hatch, hatched, hatching, etc.
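
As a rough guide to which LIKE patterns each mode accepts, here is a simplified Python model of matching against a non-analyzed SASI index. It is an approximation written for this summary, not the SASI implementation: the function `sasi_like` is hypothetical, matching is case-sensitive, and stemming and the other analyzer options are not modeled.

```python
def sasi_like(value, pattern, mode="PREFIX"):
    """Simplified model of LIKE against a non-analyzed SASI index."""
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%") and len(pattern) > 1
    term = pattern.strip("%")
    if not term:
        raise ValueError("LIKE pattern needs at least one non-wildcard character")
    if leading and mode != "CONTAINS":
        # '%term' and '%term%' are only served by CONTAINS-mode indexes.
        raise ValueError("suffix/substring search needs mode CONTAINS")
    if leading and trailing:
        return term in value             # LIKE '%term%'
    if leading:
        return value.endswith(term)      # LIKE '%term'
    if trailing:
        return value.startswith(term)    # LIKE 'term%'
    return value == term                 # LIKE 'term'


print(sasi_like("US", "U%"))                               # True
print(sasi_like("Hatcher", "%atch%", mode="CONTAINS"))     # True
try:
    sasi_like("Hatcher", "%atch%")  # PREFIX mode cannot serve this pattern
except ValueError as err:
    print("rejected:", err)
```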

  18. Resources for SASI Indexes

     C* Summit keynote (skip to 39:05 for Materialized Views or 44:09 for SASI)
     https://www.youtube.com/watch?v=2bG7pU9ZyJM&index=86&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk

     Doan DuyHai – Talk at C* Summit 2016
     https://www.youtube.com/watch?v=dxiuQ2CkXfM&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk&index=119

     CQL Docs
     https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html

     Jon Haddad – blog post
     http://rustyrazorblade.com/2016/02/cassandra-secondary-index-preview-1/

     Doan DuyHai – blog post (lots of technical internals)
     http://www.doanduyhai.com/blog/?p=2058
