Illustration Image

Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

2/15/2017

Reading time:18 mins

Storing time series data with Apache Cassandra

by John Doe

Storing time series data with Apache Cassandra Upcoming SlideShare Loading in …5× Show More No DownloadsNo notes for slide 1. @PatrickMcFadinPatrick McFadin
Chief Evangelist for Apache Cassandra, DataStaxStoring Time Series Data with1 2. My Background…ran into this problem 3. Gave it my best shotshard 1 shard 2 shard 3 shard 4routerclientPatrick,All your wildestdreams will cometrue. 4. Just add complexity! 5. A new plan 6. Dynamo Paper(2007)• How do we build a data store that is:• Reliable• Performant• “Always On”• Nothing new and shinyEvolutionary. Real. Computer ScienceAlso the basis for Riak and Voldemort 7. BigTable(2006)• Richer data model• 1 key. Lots of values• Fast sequential access• 38 Papers cited 8. Cassandra(2008)• Distributed features of Dynamo• Data Model and storage fromBigTable• February 17, 2010 it graduated toa top-level Apache project 9. A Data Ocean or Pond., LakeAn In-Memory DatabaseA Key-Value StoreA magical database unicorn that farts rainbows 10. Cassandra for ApplicationsAPACHECASSANDRA 11. Basic Architecture 12. RowColumn1PartitionKey 1Column2Column3Column4 13. PartitionColumn1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4 14. Table Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1Column2Column3Column4Column1Column2Column3Column4Column1Column2Column3Column4PartitionKey 2PartitionKey 2PartitionKey 2 15. KeyspaceColumn1PartitionKey 1Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Column1PartitionKey 2Column2Column3Column4Table 1 Table 2Keyspace 1 16. NodeServer 17. TokenServer•Each partition is a 128 bit value•Consistent hash between 2-63and 264•Each node owns a range of thosevalues•The token is the beginning of thatrange to the next node’s token value•Virtual Nodes break these downfurtherDataToken Range0 … 18. Cluster ServerToken Range0 0-1000-100 19. Cluster ServerToken Range0 0-5051 51-100Server0-5051-100 20. Cluster ServerToken Range0 0-2526 26-5051 51-7576 76-100ServerServerServer0-2576-10026-5051-75 21. Replication10.0.0.100-25DC1DC1: RF=1Node Primary10.0.0.1 00-2510.0.0.2 26-5010.0.0.3 51-7510.0.0.4 76-10010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-75 22. Replication10.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-75DC1DC1: RF=2Node Primary Replica10.0.0.1 00-25 76-10010.0.0.2 26-50 00-2510.0.0.3 51-75 26-5010.0.0.4 76-100 51-7576-10000-2526-5051-75 23. ReplicationDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50 24. ConsistencyDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15 25. Consistency levelConsistency Level Number of Nodes AcknowledgedOne One - Read repair triggeredLocal One One - Read repair in local DCQuorum 51%Local Quorum 51% in local DC 26. ConsistencyDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15CL= One 27. ConsistencyDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15CL= One 28. ConsistencyDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15CL= Quorum 29. Multi-datacenterDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15DC210.1.0.100-2510.1.0.476-10010.1.0.226-5010.1.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-50DC2: RF=3 30. Multi-datacenterDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15DC210.1.0.100-2510.1.0.476-10010.1.0.226-5010.1.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-50DC2: RF=3 31. Multi-datacenterDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientWrite topartition 15DC210.1.0.100-2510.1.0.476-10010.1.0.226-5010.1.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-50DC2: RF=3 32. Cassandra Query Language - CQL 33. TableCREATE TABLE weather_station (
id text,
name text,
country_code text,
state_code text,
call_sign text,
lat double,
long double,
elevation double,
PRIMARY KEY(id)
);Table NameColumn NameColumn CQL TypePrimary Key Designation Partition Key 34. TableCREATE TABLE daily_aggregate_precip (
wsid text,
year int,
month int,
day int,
precipitation counter,
PRIMARY KEY ((wsid), year, month, day)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC);Partition KeyClustering ColumnsOrder Override 35. InsertINSERT INTO weather_station (id, call_sign, country_code, elevation, lat, long, name, state_code)
VALUES ('727930:24233', 'KSEA', 'US', 121.9, 47.467, -122.32, 'SEATTLE SEATTLE-TACOMA INTL A', ‘WA');Table Name FieldsValuesPartition Key: Required 36. Selectid | call_sign | country_code | elevation | lat | long | name | state_code
--------------+-----------+--------------+-----------+--------+---------+-------------------------------+------------
727930:24233 | KSEA | US | 121.9 | 47.467 | -122.32 | SEATTLE SEATTLE-TACOMA INTL A | WASELECT id, call_sign, country_code, elevation, lat, long, name, state_code
FROM weather_station
WHERE id = '727930:24233';FieldsTable NamePrimary Key: Partition Key Required 37. UpdateUPDATE weather_station
SET name = 'SeaTac International Airport'
WHERE id = '727930:24233';id | call_sign | country_code | elevation | lat | long | name | state_code
--------------+-----------+--------------+-----------+--------+---------+------------------------------+------------
727930:24233 | KSEA | US | 121.9 | 47.467 | -122.32 | SeaTac International Airport | WATable NameFields to Update: Not in Primary KeyPrimary Key 38. DeleteDELETE FROM weather_station
WHERE id = '727930:24233';Table NamePrimary Key: Required 39. CollectionsSetCREATE TABLE weather_station (
id text,
name text,
country_code text,
state_code text,
call_sign text,
lat double,
long double,
elevation double,
equipment set<text>
PRIMARY KEY(id)
);equipment set<text>CQL Type: For OrderingColumn Name 40. CollectionsSetListCREATE TABLE weather_station (
id text,
name text,
country_code text,
state_code text,
call_sign text,
lat double,
long double,
elevation double,
equipment set<text>,
service_dates list<timestamp>,
PRIMARY KEY(id)
);equipment set<text>service_dates list<timestamp>CQL TypeColumn NameCQL Type: For OrderingColumn Name 41. CollectionsSetListMapCREATE TABLE weather_station (
id text,
name text,
country_code text,
state_code text,
call_sign text,
lat double,
long double,
elevation double,
equipment set<text>,
service_dates list<timestamp>,
service_notes map<timestamp,text>,
PRIMARY KEY(id)
);equipment set<text>service_dates list<timestamp>service_notes map<timestamp,text>CQL TypeColumn NameColumn NameCQL Key Type CQL Value TypeCQL Type: For OrderingColumn Name 42. UDF and UDAUser Defined FunctionCREATE OR REPLACE AGGREGATE group_and_count(text)
SFUNC state_group_and_count
STYPE map<text, int>
INITCOND {};CREATE FUNCTION state_group_and_count( state map<text, int>, type text )
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
Integer count = (Integer) state.get(type);if (count == null)count = 1;else count++;state.put(type, count);return state; ' ;User Defined AggregateAs of Cassandra 2.2 43. Example: Weather Station• Weather station collects data• Cassandra stores in sequence• Application reads in sequence 44. Queries supportedCREATE TABLE raw_weather_data (
wsid text,
year int,
month int,
day int,
hour int,
temperature double,
dewpoint double,
pressure double,
wind_direction int,
wind_speed double,
sky_condition int,
sky_condition_text text,
one_hour_precip double,
six_hour_precip double,
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);Get weather data given•Weather Station ID•Weather Station ID and Time•Weather Station ID and Range of Time 45. Primary KeyCREATE TABLE raw_weather_data (
wsid text,
year int,
month int,
day int,
hour int,
temperature double,
dewpoint double,
pressure double,
wind_direction int,
wind_speed double,
sky_condition int,
sky_condition_text text,
one_hour_precip double,
six_hour_precip double,
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); 46. Primary key relationshipPRIMARY KEY ((wsid),year,month,day,hour) 47. Primary key relationshipPartition KeyPRIMARY KEY ((wsid),year,month,day,hour) 48. Primary key relationshipPRIMARY KEY ((wsid),year,month,day,hour)Partition Key Clustering Columns 49. Primary key relationshipPartition Key Clustering Columns10010:99999PRIMARY KEY ((wsid),year,month,day,hour) 50. 2005:12:1:10-5.6Primary key relationshipPartition Key Clustering Columns10010:99999-5.3-4.9-5.12005:12:1:9 2005:12:1:8 2005:12:1:7PRIMARY KEY ((wsid),year,month,day,hour) 51. Partition keys10010:99999 Murmur3 Hash Token = 7224631062609997448722266:13850 Murmur3 Hash Token = -6804302034103043898INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)VALUES (‘10010:99999’,2005,12,1,7,-5.6);INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)VALUES (‘722266:13850’,2005,12,1,7,-5.6);Consistent hash. 128 bit numberbetween 2-63and 264 52. Partition keys10010:99999 Murmur3 Hash Token = 15722266:13850 Murmur3 Hash Token = 77For this example, let’s make it areasonable numberINSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)VALUES (‘10010:99999’,2005,12,1,7,-5.6);INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)VALUES (‘722266:13850’,2005,12,1,7,-5.6); 53. Data LocalityDC1DC1: RF=3Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-5010.0.0.100-2510.0.0.476-10010.0.0.226-5010.0.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50ClientRead partition15DC210.1.0.100-2510.1.0.476-10010.1.0.226-5010.1.0.351-7576-10051-7500-2576-10026-5000-2551-7526-50Node Primary Replica Replica10.0.0.1 00-25 76-100 51-7510.0.0.2 26-50 00-25 76-10010.0.0.3 51-75 26-50 00-2510.0.0.4 76-100 51-75 26-50DC2: RF=3ClientRead partition15 54. Data Localitywsid=‘10010:99999’ ?1000 Node ClusterYou are here! 55. WritesCREATE TABLE raw_weather_data (
wsid text,
year int,
month int,
day int,
hour int,
temperature double,
dewpoint double,
pressure double,
wind_direction int,
wind_speed double,
sky_condition int,
sky_condition_text text,
one_hour_precip double,
six_hour_precip double,
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); 56. WritesCREATE TABLE raw_weather_data (
wsid text,
year int,
month int,
day int,
hour int,
temperature double,
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES (‘10010:99999’,2005,12,1,10,-5.6);INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES (‘10010:99999’,2005,12,1,9,-5.1);INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES (‘10010:99999’,2005,12,1,8,-4.9);INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES (‘10010:99999’,2005,12,1,7,-5.3); 57. Write PathClientINSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES (‘10010:99999’,2005,12,1,7,-5.3);Column1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4MemtableSSTableSSTableSSTableSSTableNodeCommit Log Data * Compaction * 58. Date Tiered Compaction Strategy•Group similar time blocks•Never compact again•Used for high densitySSTableSSTableSSTableT=2015-01-01 -> 2015-01-5T=2015-01-06 -> 2015-01-10T=2015-01-11 -> 2015-01-15 59. Storage Model - Logical View2005:12:1:10-5.62005:12:1:9-5.12005:12:1:8-4.910010:9999910010:9999910010:99999wsid hour temperature2005:12:1:7-5.310010:99999SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid=‘10010:99999’
AND year = 2005 AND month = 12 AND day = 1; 60. 2005:12:1:10-5.6 -5.3-4.9-5.1Storage Model - Disk Layout2005:12:1:9 2005:12:1:810010:999992005:12:1:7Merged, Sorted and Stored SequentiallySELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid=‘10010:99999’
AND year = 2005 AND month = 12 AND day = 1; 61. 2005:12:1:10-5.62005:12:1:11-4.9 -5.3-4.9-5.1Storage Model - Disk Layout2005:12:1:9 2005:12:1:810010:999992005:12:1:7Merged, Sorted and Stored SequentiallySELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid=‘10010:99999’
AND year = 2005 AND month = 12 AND day = 1; 62. 2005:12:1:10-5.62005:12:1:11-4.9 -5.3-4.9-5.1Storage Model - Disk Layout2005:12:1:9 2005:12:1:810010:999992005:12:1:7Merged, Sorted and Stored SequentiallySELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid=‘10010:99999’
AND year = 2005 AND month = 12 AND day = 1;2005:12:1:12-5.4 63. Read PathClientColumn1PartitionKey 1Column2Column3Column4Column1PartitionKey 1Column2Column3Column4MemtableSSTableSSTableSSTableNodeDataSELECT wsid,hour,temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1 
AND hour >= 7 AND hour <= 10; 64. Query patterns• Range queries• “Slice” operation on diskSingle seek on disk10010:99999Partition key for localitySELECT wsid,hour,temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1 
AND hour >= 7 AND hour <= 10;2005:12:1:10-5.6 -5.3-4.9-5.12005:12:1:9 2005:12:1:8 2005:12:1:7 65. Query patterns• Range queries• “Slice” operation on diskProgrammers like thisSorted by event_time2005:12:1:10-5.62005:12:1:9-5.12005:12:1:8-4.910010:9999910010:9999910010:99999weather_station hour temperature2005:12:1:7-5.310010:99999SELECT weatherstation,hour,temperatureFROM temperatureWHERE weatherstation_id=‘10010:99999'AND year = 2005 AND month = 12 AND day = 1AND hour >= 7 AND hour <= 10; 66. Thank you!Bring the questionsFollow me on twitter@PatrickMcFadin Recommended LinkedIn Corporation © 2017 Public clipboards featuring this slide No public clipboards found for this slide Select another clipboard×Looks like you’ve clipped this slide to already.Create a clipboardYou just clipped your first slide!Clipping is a handy way to collect important slides you want to go back to later. Now customize the name of a clipboard to store your clips. Description Visibility Others can see my Clipboard

Illustration Image
Storing time series data with Apache Cassandra
@PatrickMcFadin
Patrick McFadin

Chief Evangelist for Apache Cassandra, DataStax
Storing Time Series Data with
1
My Background
…ran into this problem
Gave it my best shot
shard 1 shard 2 shard 3 shard 4
router
client
Patrick,
All your wildest
dreams will come
true.
Just add complexity!
A new plan
Dynamo Paper(2007)
• How do we build a data store that is:
• Reliable
• Performant
• “Always On”
• Nothing new and shiny
E...
BigTable(2006)
• Richer data model
• 1 key. Lots of values
• Fast sequential access
• 38 Papers cited
Cassandra(2008)
• Distributed features of Dynamo
• Data Model and storage from
BigTable
• February 17, 2010 it graduated t...
A Data Ocean or Pond., Lake
An In-Memory Database
A Key-Value Store
A magical database unicorn that farts rainbows
Cassandra for Applications
APACHE
CASSANDRA
Basic Architecture
Row
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Partition
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Column
1...
Table Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Column
1
Par...
Keyspace
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Column
1
Partition
Key 2
Column
2
Column
3
Column
4
Column
1
...
Node
Server
Token
Server
•Each partition is a 128 bit value
•Consistent hash between 2-63
and 264
•Each node owns a range of those
val...
Cluster Server
Token Range
0 0-100
0-100
Cluster Server
Token Range
0 0-50
51 51-100
Server
0-50
51-100
Cluster Server
Token Range
0 0-25
26 26-50
51 51-75
76 76-100
Server
ServerServer
0-25
76-100
26-5051-75
Replication
10.0.0.1
00-25
DC1
DC1: RF=1
Node Primary
10.0.0.1 00-25
10.0.0.2 26-50
10.0.0.3 51-75
10.0.0.4 76-100
10.0.0....
Replication
10.0.0.1
00-25
10.0.0.4
76-100
10.0.0.2
26-50
10.0.0.3
51-75
DC1
DC1: RF=2
Node Primary Replica
10.0.0.1 00-25...
Replication
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51...
Consistency
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51...
Consistency level
Consistency Level Number of Nodes Acknowledged
One One - Read repair triggered
Local One One - Read repa...
Consistency
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51...
Consistency
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51...
Consistency
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51...
Multi-datacenter
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0...
Multi-datacenter
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0...
Multi-datacenter
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0...
Cassandra Query Language - CQL
Table
CREATE TABLE weather_station (

id text,

name text,

country_code text,

state_code text,

call_sign text,

lat dou...
Table
CREATE TABLE daily_aggregate_precip (

wsid text,

year int,

month int,

day int,

precipitation counter,

PRIMARY ...
Insert
INSERT INTO weather_station (id, call_sign, country_code, elevation, lat, long, name, state_code)

VALUES ('727930:...
Select
id | call_sign | country_code | elevation | lat | long | name | state_code

--------------+-----------+------------...
Update
UPDATE weather_station

SET name = 'SeaTac International Airport'

WHERE id = '727930:24233';
id | call_sign | coun...
Delete
DELETE FROM weather_station

WHERE id = '727930:24233';
Table Name
Primary Key: Required
Collections
Set
CREATE TABLE weather_station (

id text,

name text,

country_code text,

state_code text,

call_sign text...
Collections
Set
List
CREATE TABLE weather_station (

id text,

name text,

country_code text,

state_code text,

call_sign...
Collections
Set
List
Map
CREATE TABLE weather_station (

id text,

name text,

country_code text,

state_code text,

call_...
UDF and UDA
User Defined Function
CREATE OR REPLACE AGGREGATE group_and_count(text)

SFUNC state_group_and_count

STYPE ma...
Example: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Queries supported
CREATE TABLE raw_weather_data (

wsid text,

year int,

month int,

day int,

hour int,

temperature dou...
Primary Key
CREATE TABLE raw_weather_data (

wsid text,

year int,

month int,

day int,

hour int,

temperature double,

...
Primary key relationship
PRIMARY KEY ((wsid),year,month,day,hour)
Primary key relationship
Partition Key
PRIMARY KEY ((wsid),year,month,day,hour)
Primary key relationship
PRIMARY KEY ((wsid),year,month,day,hour)
Partition Key Clustering Columns
Primary key relationship
Partition Key Clustering Columns
10010:99999
PRIMARY KEY ((wsid),year,month,day,hour)
2005:12:1:10
-5.6
Primary key relationship
Partition Key Clustering Columns
10010:99999
-5.3-4.9-5.1
2005:12:1:9 2005:12:1...
Partition keys
10010:99999 Murmur3 Hash Token = 7224631062609997448
722266:13850 Murmur3 Hash Token = -6804302034103043898...
Partition keys
10010:99999 Murmur3 Hash Token = 15
722266:13850 Murmur3 Hash Token = 77
For this example, let’s make it a
...
Data Locality
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 ...
Data Locality
wsid=‘10010:99999’ ?
1000 Node Cluster
You are here!
Writes
CREATE TABLE raw_weather_data (

wsid text,

year int,

month int,

day int,

hour int,

temperature double,

dewpo...
Writes
CREATE TABLE raw_weather_data (

wsid text,

year int,

month int,

day int,

hour int,

temperature double,

PRIMA...
Write Path
Client
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)

VALUES (‘10010:99999’,2005,12,1,7,-5...
Date Tiered Compaction Strategy
•Group similar time blocks
•Never compact again
•Used for high density
SSTable
SSTable
SST...
Storage Model - Logical View
2005:12:1:10
-5.6
2005:12:1:9
-5.1
2005:12:1:8
-4.9
10010:99999
10010:99999
10010:99999
wsid ...
2005:12:1:10
-5.6 -5.3-4.9-5.1
Storage Model - Disk Layout
2005:12:1:9 2005:12:1:8
10010:99999
2005:12:1:7
Merged, Sorted ...
2005:12:1:10
-5.6
2005:12:1:11
-4.9 -5.3-4.9-5.1
Storage Model - Disk Layout
2005:12:1:9 2005:12:1:8
10010:99999
2005:12:1...
2005:12:1:10
-5.6
2005:12:1:11
-4.9 -5.3-4.9-5.1
Storage Model - Disk Layout
2005:12:1:9 2005:12:1:8
10010:99999
2005:12:1...
Read Path
Client
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
Column
1
Partition
Key 1
Column
2
Column
3
Column
4
M...
Query patterns
• Range queries
• “Slice” operation on disk
Single seek on disk
10010:99999
Partition key for locality
SELE...
Query patterns
• Range queries
• “Slice” operation on disk
Programmers like this
Sorted by event_time
2005:12:1:10
-5.6
20...
Thank you!
Bring the questions
Follow me on twitter
@PatrickMcFadin
Storing time series data with Apache Cassandra

Upcoming SlideShare

Loading in …5

×

  1. 1. @PatrickMcFadin Patrick McFadin
 Chief Evangelist for Apache Cassandra, DataStax Storing Time Series Data with 1
  2. 2. My Background …ran into this problem
  3. 3. Gave it my best shot shard 1 shard 2 shard 3 shard 4 router client Patrick, All your wildest dreams will come true.
  4. 4. Just add complexity!
  5. 5. A new plan
  6. 6. Dynamo Paper(2007) • How do we build a data store that is: • Reliable • Performant • “Always On” • Nothing new and shiny Evolutionary. Real. Computer Science Also the basis for Riak and Voldemort
  7. 7. BigTable(2006) • Richer data model • 1 key. Lots of values • Fast sequential access • 38 Papers cited
  8. 8. Cassandra(2008) • Distributed features of Dynamo • Data Model and storage from BigTable • February 17, 2010 it graduated to a top-level Apache project
  9. 9. A Data Ocean or Pond., Lake An In-Memory Database A Key-Value Store A magical database unicorn that farts rainbows
  10. 10. Cassandra for Applications APACHE CASSANDRA
  11. 11. Basic Architecture
  12. 12. Row Column 1 Partition Key 1 Column 2 Column 3 Column 4
  13. 13. Partition Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4
  14. 14. Table Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Column 2 Column 3 Column 4 Column 1 Column 2 Column 3 Column 4 Column 1 Column 2 Column 3 Column 4 Partition Key 2 Partition Key 2 Partition Key 2
  15. 15. Keyspace Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Column 1 Partition Key 2 Column 2 Column 3 Column 4 Table 1 Table 2 Keyspace 1
  16. 16. Node Server
  17. 17. Token Server •Each partition is a 128 bit value •Consistent hash between 2-63 and 264 •Each node owns a range of those values •The token is the beginning of that range to the next node’s token value •Virtual Nodes break these down further Data Token Range 0 …
  18. 18. Cluster Server Token Range 0 0-100 0-100
  19. 19. Cluster Server Token Range 0 0-50 51 51-100 Server 0-50 51-100
  20. 20. Cluster Server Token Range 0 0-25 26 26-50 51 51-75 76 76-100 Server ServerServer 0-25 76-100 26-5051-75
  21. 21. Replication 10.0.0.1 00-25 DC1 DC1: RF=1 Node Primary 10.0.0.1 00-25 10.0.0.2 26-50 10.0.0.3 51-75 10.0.0.4 76-100 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75
  22. 22. Replication 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 DC1 DC1: RF=2 Node Primary Replica 10.0.0.1 00-25 76-100 10.0.0.2 26-50 00-25 10.0.0.3 51-75 26-50 10.0.0.4 76-100 51-75 76-100 00-25 26-50 51-75
  23. 23. Replication DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50
  24. 24. Consistency DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15
  25. 25. Consistency level Consistency Level Number of Nodes Acknowledged One One - Read repair triggered Local One One - Read repair in local DC Quorum 51% Local Quorum 51% in local DC
  26. 26. Consistency DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15 CL= One
  27. 27. Consistency DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15 CL= One
  28. 28. Consistency DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15 CL= Quorum
  29. 29. Multi-datacenter DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15 DC2 10.1.0.1 00-25 10.1.0.4 76-100 10.1.0.2 26-50 10.1.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 DC2: RF=3
  30. 30. Multi-datacenter DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15 DC2 10.1.0.1 00-25 10.1.0.4 76-100 10.1.0.2 26-50 10.1.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 DC2: RF=3
  31. 31. Multi-datacenter DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Write to partition 15 DC2 10.1.0.1 00-25 10.1.0.4 76-100 10.1.0.2 26-50 10.1.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 DC2: RF=3
  32. 32. Cassandra Query Language - CQL
  33. 33. Table CREATE TABLE weather_station (
 id text,
 name text,
 country_code text,
 state_code text,
 call_sign text,
 lat double,
 long double,
 elevation double,
 PRIMARY KEY(id)
 ); Table Name Column Name Column CQL Type Primary Key Designation Partition Key
  34. 34. Table CREATE TABLE daily_aggregate_precip (
 wsid text,
 year int,
 month int,
 day int,
 precipitation counter,
 PRIMARY KEY ((wsid), year, month, day)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC); Partition Key Clustering Columns Order Override
  35. 35. Insert INSERT INTO weather_station (id, call_sign, country_code, elevation, lat, long, name, state_code)
 VALUES ('727930:24233', 'KSEA', 'US', 121.9, 47.467, -122.32, 'SEATTLE SEATTLE-TACOMA INTL A', ‘WA'); Table Name Fields Values Partition Key: Required
  36. 36. Select id | call_sign | country_code | elevation | lat | long | name | state_code
 --------------+-----------+--------------+-----------+--------+---------+-------------------------------+------------
 727930:24233 | KSEA | US | 121.9 | 47.467 | -122.32 | SEATTLE SEATTLE-TACOMA INTL A | WA SELECT id, call_sign, country_code, elevation, lat, long, name, state_code
 FROM weather_station
 WHERE id = '727930:24233'; Fields Table Name Primary Key: Partition Key Required
  37. 37. Update UPDATE weather_station
 SET name = 'SeaTac International Airport'
 WHERE id = '727930:24233'; id | call_sign | country_code | elevation | lat | long | name | state_code
 --------------+-----------+--------------+-----------+--------+---------+------------------------------+------------
 727930:24233 | KSEA | US | 121.9 | 47.467 | -122.32 | SeaTac International Airport | WA Table Name Fields to Update: Not in Primary Key Primary Key
  38. 38. Delete DELETE FROM weather_station
 WHERE id = '727930:24233'; Table Name Primary Key: Required
  39. 39. Collections Set CREATE TABLE weather_station (
 id text,
 name text,
 country_code text,
 state_code text,
 call_sign text,
 lat double,
 long double,
 elevation double,
 equipment set<text>
 PRIMARY KEY(id)
 ); equipment set<text> CQL Type: For Ordering Column Name
  40. 40. Collections Set List CREATE TABLE weather_station (
 id text,
 name text,
 country_code text,
 state_code text,
 call_sign text,
 lat double,
 long double,
 elevation double,
 equipment set<text>,
 service_dates list<timestamp>,
 PRIMARY KEY(id)
 ); equipment set<text> service_dates list<timestamp> CQL Type Column Name CQL Type: For Ordering Column Name
  41. 41. Collections Set List Map CREATE TABLE weather_station (
 id text,
 name text,
 country_code text,
 state_code text,
 call_sign text,
 lat double,
 long double,
 elevation double,
 equipment set<text>,
 service_dates list<timestamp>,
 service_notes map<timestamp,text>,
 PRIMARY KEY(id)
 ); equipment set<text> service_dates list<timestamp> service_notes map<timestamp,text> CQL Type Column Name Column Name CQL Key Type CQL Value Type CQL Type: For Ordering Column Name
  42. 42. UDF and UDA User Defined Function CREATE OR REPLACE AGGREGATE group_and_count(text)
 SFUNC state_group_and_count
 STYPE map<text, int>
 INITCOND {}; CREATE FUNCTION state_group_and_count( state map<text, int>, type text )
 CALLED ON NULL INPUT
 RETURNS map<text, int>
 LANGUAGE java AS '
 Integer count = (Integer) state.get(type); if (count == null) count = 1; else count++; state.put(type, count); return state; ' ; User Defined Aggregate As of Cassandra 2.2
  43. 43. Example: Weather Station • Weather station collects data • Cassandra stores in sequence • Application reads in sequence
  44. 44. Queries supported CREATE TABLE raw_weather_data (
 wsid text,
 year int,
 month int,
 day int,
 hour int,
 temperature double,
 dewpoint double,
 pressure double,
 wind_direction int,
 wind_speed double,
 sky_condition int,
 sky_condition_text text,
 one_hour_precip double,
 six_hour_precip double,
 PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); Get weather data given •Weather Station ID •Weather Station ID and Time •Weather Station ID and Range of Time
  45. 45. Primary Key CREATE TABLE raw_weather_data (
 wsid text,
 year int,
 month int,
 day int,
 hour int,
 temperature double,
 dewpoint double,
 pressure double,
 wind_direction int,
 wind_speed double,
 sky_condition int,
 sky_condition_text text,
 one_hour_precip double,
 six_hour_precip double,
 PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
  46. 46. Primary key relationship PRIMARY KEY ((wsid),year,month,day,hour)
  47. 47. Primary key relationship Partition Key PRIMARY KEY ((wsid),year,month,day,hour)
  48. 48. Primary key relationship PRIMARY KEY ((wsid),year,month,day,hour) Partition Key Clustering Columns
  49. 49. Primary key relationship Partition Key Clustering Columns 10010:99999 PRIMARY KEY ((wsid),year,month,day,hour)
  50. 50. 2005:12:1:10 -5.6 Primary key relationship Partition Key Clustering Columns 10010:99999 -5.3-4.9-5.1 2005:12:1:9 2005:12:1:8 2005:12:1:7 PRIMARY KEY ((wsid),year,month,day,hour)
  51. 51. Partition keys 10010:99999 Murmur3 Hash Token = 7224631062609997448 722266:13850 Murmur3 Hash Token = -6804302034103043898 INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,7,-5.6); INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘722266:13850’,2005,12,1,7,-5.6); Consistent hash. 128 bit number between 2-63 and 264
  52. 52. Partition keys 10010:99999 Murmur3 Hash Token = 15 722266:13850 Murmur3 Hash Token = 77 For this example, let’s make it a reasonable number INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,7,-5.6); INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘722266:13850’,2005,12,1,7,-5.6);
  53. 53. Data Locality DC1 DC1: RF=3 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 10.0.0.1 00-25 10.0.0.4 76-100 10.0.0.2 26-50 10.0.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Client Read partition 15 DC2 10.1.0.1 00-25 10.1.0.4 76-100 10.1.0.2 26-50 10.1.0.3 51-75 76-100 51-75 00-25 76-100 26-50 00-25 51-75 26-50 Node Primary Replica Replica 10.0.0.1 00-25 76-100 51-75 10.0.0.2 26-50 00-25 76-100 10.0.0.3 51-75 26-50 00-25 10.0.0.4 76-100 51-75 26-50 DC2: RF=3 Client Read partition 15
  54. 54. Data Locality wsid=‘10010:99999’ ? 1000 Node Cluster You are here!
  55. 55. Writes CREATE TABLE raw_weather_data (
 wsid text,
 year int,
 month int,
 day int,
 hour int,
 temperature double,
 dewpoint double,
 pressure double,
 wind_direction int,
 wind_speed double,
 sky_condition int,
 sky_condition_text text,
 one_hour_precip double,
 six_hour_precip double,
 PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
  56. 56. Writes CREATE TABLE raw_weather_data (
 wsid text,
 year int,
 month int,
 day int,
 hour int,
 temperature double,
 PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
 VALUES (‘10010:99999’,2005,12,1,10,-5.6); INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
 VALUES (‘10010:99999’,2005,12,1,9,-5.1); INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
 VALUES (‘10010:99999’,2005,12,1,8,-4.9); INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
 VALUES (‘10010:99999’,2005,12,1,7,-5.3);
  57. 57. Write Path Client INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
 VALUES (‘10010:99999’,2005,12,1,7,-5.3); Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Memtable SSTable SSTable SSTable SSTable Node Commit Log Data * Compaction *
  58. 58. Date Tiered Compaction Strategy •Group similar time blocks •Never compact again •Used for high density SSTable SSTable SSTable T=2015-01-01 -> 2015-01-5 T=2015-01-06 -> 2015-01-10 T=2015-01-11 -> 2015-01-15
  59. 59. Storage Model - Logical View 2005:12:1:10 -5.6 2005:12:1:9 -5.1 2005:12:1:8 -4.9 10010:99999 10010:99999 10010:99999 wsid hour temperature 2005:12:1:7 -5.3 10010:99999 SELECT wsid, hour, temperature
 FROM raw_weather_data
 WHERE wsid=‘10010:99999’
 AND year = 2005 AND month = 12 AND day = 1;
  60. 60. 2005:12:1:10 -5.6 -5.3-4.9-5.1 Storage Model - Disk Layout 2005:12:1:9 2005:12:1:8 10010:99999 2005:12:1:7 Merged, Sorted and Stored Sequentially SELECT wsid, hour, temperature
 FROM raw_weather_data
 WHERE wsid=‘10010:99999’
 AND year = 2005 AND month = 12 AND day = 1;
  61. 61. 2005:12:1:10 -5.6 2005:12:1:11 -4.9 -5.3-4.9-5.1 Storage Model - Disk Layout 2005:12:1:9 2005:12:1:8 10010:99999 2005:12:1:7 Merged, Sorted and Stored Sequentially SELECT wsid, hour, temperature
 FROM raw_weather_data
 WHERE wsid=‘10010:99999’
 AND year = 2005 AND month = 12 AND day = 1;
  62. 62. 2005:12:1:10 -5.6 2005:12:1:11 -4.9 -5.3-4.9-5.1 Storage Model - Disk Layout 2005:12:1:9 2005:12:1:8 10010:99999 2005:12:1:7 Merged, Sorted and Stored Sequentially SELECT wsid, hour, temperature
 FROM raw_weather_data
 WHERE wsid=‘10010:99999’
 AND year = 2005 AND month = 12 AND day = 1; 2005:12:1:12 -5.4
  63. 63. Read Path Client Column 1 Partition Key 1 Column 2 Column 3 Column 4 Column 1 Partition Key 1 Column 2 Column 3 Column 4 Memtable SSTable SSTable SSTable Node Data SELECT wsid,hour,temperature
 FROM raw_weather_data
 WHERE wsid='10010:99999'
 AND year = 2005 AND month = 12 AND day = 1 
 AND hour >= 7 AND hour <= 10;
  64. 64. Query patterns • Range queries • “Slice” operation on disk Single seek on disk 10010:99999 Partition key for locality SELECT wsid,hour,temperature
 FROM raw_weather_data
 WHERE wsid='10010:99999'
 AND year = 2005 AND month = 12 AND day = 1 
 AND hour >= 7 AND hour <= 10; 2005:12:1:10 -5.6 -5.3-4.9-5.1 2005:12:1:9 2005:12:1:8 2005:12:1:7
  65. 65. Query patterns • Range queries • “Slice” operation on disk Programmers like this Sorted by event_time 2005:12:1:10 -5.6 2005:12:1:9 -5.1 2005:12:1:8 -4.9 10010:99999 10010:99999 10010:99999 weather_station hour temperature 2005:12:1:7 -5.3 10010:99999 SELECT weatherstation,hour,temperature FROM temperature WHERE weatherstation_id=‘10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
  66. 66. Thank you! Bring the questions Follow me on twitter @PatrickMcFadin

Related Articles

data.modeling
cassandra

Search key of big partition in cassandra

John Doe

2/17/2023

data.modeling
cassandra
spark

Checkout Planet Cassandra

Claim Your Free Planet Cassandra Contributor T-shirt!

Make your contribution and score a FREE Planet Cassandra Contributor T-Shirt! 
We value our incredible Cassandra community, and we want to express our gratitude by sending an exclusive Planet Cassandra Contributor T-Shirt you can wear with pride.

Join Our Newsletter!

Sign up below to receive email updates and see what's going on with our company

Explore Related Topics

AllKafkaSparkScyllaSStableKubernetesApiGithubGraphQl

Explore Further

data.modeling