
Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

1/6/2020

Reading time: 9 min

Bulk Loading Data into Cassandra

by DataStax



  1. Planet Cassandra 2014: Bulk-Loading Data into Cassandra. Patricia Gorla (@patriciagorla), Cassandra Consultant, www.thelastpickle.com
  2. About Us • Work with clients to deliver and improve Apache Cassandra services • Apache Cassandra committer, DataStax MVP, Hector maintainer, Apache Usergrid committer • Based in New Zealand & USA
  3. Why is bulk loading useful? • Performance tests
  4. Why is bulk loading useful? • Performance tests • Migrating historical data
  5. Why is bulk loading useful? • Performance tests • Migrating historical data • Changing topologies
  6. • How Data is Stored • Case Studies: Generating Dummy Data, Backfilling Historical Data, Changing Topologies • Conclusion
  7. Cassandra Write Path (diagram: write[0])
  8. Cassandra Write Path • Writes written to both the commit log and memtable. (diagram: write[0], commitlog, memtable)
  9. Cassandra Write Path • Writes written to both the commit log and memtable. • Memtable is sorted. (diagram: write[0], commitlog, memtable)
  10. Cassandra Write Path • Memtable flushed out to sstables. (diagram: write[0], commitlog, memtable, sstable[0], sstable[2], sstable[1])
  11. Cassandra Write Path • Compaction helps keep the read latency low. (diagram: write[0], commitlog, memtable, sstable[0], sstable[2], sstable[1], sstable[n])
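The write path on slides 7 to 11 can be sketched as a toy model. This is a minimal illustration, not Cassandra's implementation: the class `WritePathSketch`, its fields, and the row-count flush threshold are all hypothetical, and real Cassandra resolves conflicting values by write timestamp rather than by sstable order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the write path on the slides; none of these names are Cassandra classes. */
final class WritePathSketch {
    final List<String> commitLog = new ArrayList<>();                  // append-only log for durability
    TreeMap<String, String> memtable = new TreeMap<>();                // in-memory, kept sorted by key
    final List<TreeMap<String, String>> sstables = new ArrayList<>();  // sorted, never modified after flush
    private final int flushThreshold;                                  // hypothetical: flush after N rows

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Every write goes to both the commit log and the memtable. */
    void write(String key, String value) {
        commitLog.add(key + "=" + value);
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    /** The sorted memtable is written out as a new sstable. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(memtable);
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed data no longer needs the log
    }

    /** Compaction merges sstables so a read has fewer files to consult. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> table : sstables) {
            merged.putAll(table); // oldest first, so newer values overwrite older ones
        }
        sstables.clear();
        if (!merged.isEmpty()) {
            sstables.add(merged);
        }
    }

    /** A read checks the memtable, then sstables from newest to oldest. */
    String read(String key) {
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            String value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

The more sstables accumulate, the more of them `read` may have to visit, which is why the slide credits compaction with keeping read latency low. Bulk loading skips the commit log and memtable entirely by producing sstables directly.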
  12. Sorted String Tables: mykeyspace-mycf-jb-1-CompressionInfo.db, mykeyspace-mycf-jb-1-Data.db, mykeyspace-mycf-jb-1-Filter.db, mykeyspace-mycf-jb-1-Index.db, mykeyspace-mycf-jb-1-Statistics.db, mykeyspace-mycf-jb-1-Summary.db, mykeyspace-mycf-jb-1-TOC.txt
  13. Sorted String Tables, Data.db: contains all data needed to regenerate the other components
  14. Sorted String Tables, Index.db: index of row keys
  15. Sorted String Tables, Summary.db: index summary from the Index.db file
  16. Sorted String Tables, Filter.db: Bloom filter over the sstable
  17. Sorted String Tables, TOC.txt: table of contents of all components
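The file names on these slides follow the pattern keyspace, column family, format version, generation, component, separated by hyphens. The sketch below splits a name into those parts. It assumes that 2.0-era naming (later Cassandra releases use a different layout), and the class `SSTableName` is hypothetical. The descriptions for CompressionInfo.db and Statistics.db are not on the slides.

```java
/** Splits a 2.0-era sstable file name; a hypothetical helper, not a Cassandra class. */
final class SSTableName {
    final String keyspace;
    final String columnFamily;
    final String version;     // on-disk format version, "jb" on the slides
    final int generation;     // increases with each new sstable of the column family
    final String component;   // e.g. "Data.db"

    private SSTableName(String keyspace, String columnFamily, String version,
                        int generation, String component) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
        this.version = version;
        this.generation = generation;
        this.component = component;
    }

    /** Assumes exactly the form keyspace-cf-version-generation-Component. */
    static SSTableName parse(String fileName) {
        String[] parts = fileName.split("-", 5);
        if (parts.length != 5) {
            throw new IllegalArgumentException("unexpected sstable file name: " + fileName);
        }
        return new SSTableName(parts[0], parts[1], parts[2], Integer.parseInt(parts[3]), parts[4]);
    }

    /** Component descriptions; all but two are taken from the slides. */
    static String describe(String component) {
        switch (component) {
            case "Data.db":            return "Contains all data needed to regenerate components";
            case "Index.db":           return "Index of row keys";
            case "Summary.db":         return "Index summary from Index.db file";
            case "Filter.db":          return "Bloom filter over sstable";
            case "TOC.txt":            return "Table of contents of all components";
            case "CompressionInfo.db": return "Compression metadata (not described on the slides)";
            case "Statistics.db":      return "Sstable statistics and metadata (not described on the slides)";
            default:                   return "Unknown component";
        }
    }
}
```

For example, `SSTableName.parse("mykeyspace-mycf-jb-1-Data.db")` yields keyspace `mykeyspace`, column family `mycf`, version `jb`, generation `1`, and component `Data.db`.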
  18. • How Data is Stored • Case Studies: Generating Dummy Data, Backfilling Historical Data, Changing Topologies • Conclusion
  19. Set up keyspace and column family
      create keyspace test
        with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
        and strategy_options = {replication_factor:1};
      create column family test
        with comparator = 'AsciiType'
        and default_validation_class = 'AsciiType'
        and key_validation_class = 'AsciiType';
  20. SStableGen.java
      AbstractSSTableSimpleWriter writer = new SSTableSimpleUnsortedWriter(
          directory,
          partitioner,
          keyspace,
          columnFamily,
          AsciiType.instance,
          null, // subcomparator for super columns
          size_per_sstable_mb
      );
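  23. ByteBuffer randomBytes = ByteBufferUtil.bytes(randomAscii(1024));
      KeyGenerator keyGen = new KeyGenerator();
      long dataSize = 0;
      writer = new SSTableSimpleUnsortedWriter(…);
      while (dataSize < max_data_bytes) {
          // key: the next row key from keyGen (the call is not shown on the slide)
          writer.newRow(key);
          for (int j = 0; j < num_cols; j++) {
              ByteBuffer colName = ByteBufferUtil.bytes("col_" + j);
              // fill a 20-byte value from the pool of random ASCII bytes
              ByteBuffer colValue = ByteBuffer.wrap(new byte[20]);
              randomBytes.get(colValue.array());
              colValue.position(0);
              writer.addColumn(colName, colValue, timestamp);
              // rewind the pool when fewer than 20 bytes remain, otherwise skip ahead
              if (randomBytes.remaining() < colValue.limit()) {
                  randomBytes.position(0);
              } else {
                  randomBytes.position(randomBytes.position() + colValue.limit());
              }
              // assumption: the slide never updates dataSize; counting value bytes lets the loop end
              dataSize += colValue.limit();
          }
      }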
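The writer used on slides 20 to 23 accepts rows in any order and takes a size limit (`size_per_sstable_mb`). The idea can be sketched as a buffer that keeps rows sorted in memory and emits one sorted batch each time the limit is reached. This is a stand-alone illustration under that assumption: the class `UnsortedWriterSketch` and its byte accounting are hypothetical and do not reproduce the real writer's internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Buffers rows in sorted order and flushes a batch when a byte limit is reached. */
final class UnsortedWriterSketch {
    private final long bufferLimitBytes;
    private long bufferedBytes = 0;
    private TreeMap<String, TreeMap<String, String>> buffer = new TreeMap<>();
    private TreeMap<String, String> currentRow;

    /** Each element stands in for one sstable written to disk. */
    final List<TreeMap<String, TreeMap<String, String>>> flushed = new ArrayList<>();

    UnsortedWriterSketch(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    /** Starts (or reopens) a row; rows may arrive in any key order. */
    void newRow(String key) {
        if (bufferedBytes >= bufferLimitBytes) {
            flush(); // flush between rows so a row is never split across batches
        }
        currentRow = buffer.computeIfAbsent(key, k -> new TreeMap<>());
        bufferedBytes += size(key);
    }

    /** Adds a column to the current row; columns are kept sorted by name. */
    void addColumn(String name, String value) {
        if (currentRow == null) {
            throw new IllegalStateException("call newRow before addColumn");
        }
        currentRow.put(name, value);
        bufferedBytes += size(name) + size(value);
    }

    /** Flushes whatever is still buffered. */
    void close() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushed.add(buffer); // the TreeMap is already sorted by row key
        buffer = new TreeMap<>();
        bufferedBytes = 0;
        currentRow = null;
    }

    private static int size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

A smaller limit produces more, smaller sstables and uses less heap while generating; a larger limit does the opposite.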
  24. Examining sstable output
      patricia@dev:~/../data$ ls -lh mykeyspace/mycf
      total 64
      -rw-r--r-- 1 patricia staff  43B Feb 2 15:31 mykeyspace-mycf-jb-1-CompressionInfo.db
      -rw-r--r-- 1 patricia staff  79K Feb 2 15:31 mykeyspace-mycf-jb-1-Data.db
      -rw-r--r-- 1 patricia staff  16B Feb 2 15:31 mykeyspace-mycf-jb-1-Filter.db
      -rw-r--r-- 1 patricia staff  36B Feb 2 15:31 mykeyspace-mycf-jb-1-Index.db
      -rw-r--r-- 1 patricia staff 4.3K Feb 2 15:31 mykeyspace-mycf-jb-1-Statistics.db
      -rw-r--r-- 1 patricia staff  80B Feb 2 15:31 mykeyspace-mycf-jb-1-Summary.db
      -rw-r--r-- 1 patricia staff  79B Feb 2 15:31 mykeyspace-mycf-jb-1-TOC.txt
  25. $ bin/sstableloader Keyspace1/ColFam1
      patricia@dev:~/…/cassandra-2.0.4$ bin/sstableloader mykeyspace/mycf -d localhost
      Streaming relevant part of mykeyspace/mycf/mykeyspace-mycf-ic-1-Data.db to [/127.0.0.1]
      progress: [/127.0.0.1 1/1 (100)] [total: 100 - 0MB/s (avg: 0MB/s)]
  29. $ bin/sstableloader Keyspace1/ColFam1 • Run command on separate server
  30. $ bin/sstableloader Keyspace1/ColFam1 • Run command on separate server • Throttle command
  31. $ bin/sstableloader Keyspace1/ColFam1 • Run command on separate server • Throttle command • Parallelise processes
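The three tips on slide 31 can be combined in a small driver that runs from a machine outside the cluster, passes a throttle, and loads several sstable directories in parallel. This is a sketch under assumptions: the class `LoaderCommands` and its parameters are hypothetical, `-d` is the flag shown on the slides, and `-t` is assumed to be the throttle option in Mbit/s, so check `bin/sstableloader --help` for the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Builds and runs sstableloader invocations; a hypothetical helper. */
final class LoaderCommands {

    /** Returns the argument list for one sstableloader run. */
    static List<String> buildCommand(String sstableDir, List<String> hosts, int throttleMbits) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bin/sstableloader");
        cmd.add("-d");
        cmd.add(String.join(",", hosts));
        if (throttleMbits > 0) {
            cmd.add("-t"); // assumed throttle flag, in Mbit/s
            cmd.add(Integer.toString(throttleMbits));
        }
        cmd.add(sstableDir); // e.g. mykeyspace/mycf
        return cmd;
    }

    /** Runs one loader process per directory, at most `parallelism` at a time. */
    static List<Integer> runAll(List<String> sstableDirs, List<String> hosts,
                                int throttleMbits, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String dir : sstableDirs) {
                pending.add(pool.submit(() ->
                        new ProcessBuilder(buildCommand(dir, hosts, throttleMbits))
                                .inheritIO()
                                .start()
                                .waitFor()));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> result : pending) {
                exitCodes.add(result.get()); // blocks until that loader exits
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running the loaders on a separate server keeps the streaming client's CPU and disk load off the Cassandra nodes, and the throttle limits how much network bandwidth the load takes from live traffic.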
  32. • How Data is Stored • Case Studies: Generating Dummy Data, Backfilling Historical Data, Changing Topologies • Conclusion
  33. // list of orders by user
      customerOrders = new SSTableSimpleUnsortedWriter(…);
      // orders by order id
      orders = new SSTableSimpleUnsortedWriter(…);
      // assume orders are in date order
      for (Order order : oldOrders) {
          // one row per customer, one empty-valued column per order id
          customerOrders.newRow(ByteBufferUtil.bytes(order.customerId));
          customerOrders.addColumn(ByteBufferUtil.bytes(order.orderId), ByteBufferUtil.EMPTY_BYTE_BUFFER, timestamp);
          // one row per order, keyed by order id per the comment above (the slide has order.userId)
          orders.newRow(ByteBufferUtil.bytes(order.orderId));
          orders.addColumn(ByteBufferUtil.bytes("customer_id"), ByteBufferUtil.bytes(order.customerId), timestamp);
          orders.addColumn(ByteBufferUtil.bytes("date"), ByteBufferUtil.bytes(order.date), timestamp);
          orders.addColumn(ByteBufferUtil.bytes("total"), ByteBufferUtil.bytes(order.total), timestamp);
      }
      customerOrders.close();
      orders.close();
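The backfill on slide 33 writes every historical order twice, once into a per-customer index and once into a per-order row. The resulting layout can be sketched with plain in-memory maps. This only illustrates the shape of the two column families: the class `BackfillSketch`, the `Order` record, and the use of strings for every field are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Models the two denormalised column families from the backfill slide. */
final class BackfillSketch {

    /** Hypothetical order record; field names follow the slide where it shows them. */
    static final class Order {
        final String orderId;
        final String customerId;
        final String date;
        final String total;

        Order(String orderId, String customerId, String date, String total) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.date = date;
            this.total = total;
        }
    }

    /** customerOrders: row key = customer id, column names = order ids (values empty). */
    static Map<String, TreeSet<String>> ordersByCustomer(List<Order> oldOrders) {
        Map<String, TreeSet<String>> rows = new TreeMap<>();
        for (Order order : oldOrders) {
            rows.computeIfAbsent(order.customerId, k -> new TreeSet<>()).add(order.orderId);
        }
        return rows;
    }

    /** orders: row key = order id, columns = customer_id, date, total. */
    static Map<String, Map<String, String>> ordersById(List<Order> oldOrders) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        for (Order order : oldOrders) {
            Map<String, String> columns = new TreeMap<>();
            columns.put("customer_id", order.customerId);
            columns.put("date", order.date);
            columns.put("total", order.total);
            rows.put(order.orderId, columns);
        }
        return rows;
    }
}
```

Listing a customer's orders then reads one row of the first map, and fetching a single order reads one row of the second, which is the access pattern the two writers on the slide prepare for.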
  36. • How Data is Stored • Case Studies: Generating Dummy Data, Backfilling Historical Data, Changing Topologies • Conclusion
  37. $ bin/sstableloader Keyspace1/ColFam1
      patricia@dev:~/…/cassandra-2.0.4$ bin/sstableloader mykeyspace/mycf -d cass1,cass2,cass3
      Streaming relevant part of mykeyspace/mycf/mykeyspace-mycf-ic-1-Data.db to [/cass1,cass2,cass3,cass4,cass5,cass6]
      progress: [/cas1 3/3 (100)] [/cas2 0/4 (0)] [/cas3 0/0 (0)] [/cas4 0/0 (0)] [/cas5 0/0 (0)] [/cas6 1/2 (50)] [total: 50 - 0MB/s (avg: 5MB/s)]
  40. $ bin/sstableloader Keyspace1/ColFam1
      patricia@dev:~/.../cassandra-2.0.4$ bin/nodetool compactionstats
      pending tasks: 30
      Active compaction remaining time : n/a
  41. • How Data is Stored • Case Studies: Generating Dummy Data, Backfilling Historical Data, Changing Topologies • Conclusion
  42. CQL: Keep schema consistent
      cqlsh> CREATE KEYSPACE "test" WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
      cqlsh> CREATE COLUMNFAMILY "test" (id text PRIMARY KEY ) ;
  43. CQL3 Considerations • Uses CompositeType comparator
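Slide 43 notes that CQL3 tables use a CompositeType comparator, so a column name written by hand must be a composite rather than a plain ASCII string. The sketch below builds such a name. It assumes the usual CompositeType layout (per component: a 2-byte length, the bytes, and a 1-byte end-of-component marker of 0) and that, for a table without clustering columns and without COMPACT STORAGE, the only component is the CQL column name. The class `CompositeNameSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Encodes a composite column name; a hypothetical helper, not a Cassandra class. */
final class CompositeNameSketch {

    /** Encodes each component as: 2-byte length, value bytes, 1-byte end-of-component (0). */
    static ByteBuffer compositeName(String... components) {
        byte[][] encoded = new byte[components.length][];
        int size = 0;
        for (int i = 0; i < components.length; i++) {
            encoded[i] = components[i].getBytes(StandardCharsets.UTF_8);
            if (encoded[i].length > 0xFFFF) {
                throw new IllegalArgumentException("component longer than 65535 bytes");
            }
            size += 2 + encoded[i].length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] part : encoded) {
            out.putShort((short) part.length); // length, read back as an unsigned 16-bit value
            out.put(part);
            out.put((byte) 0);                 // end-of-component marker
        }
        out.flip(); // make the buffer readable from the start
        return out;
    }
}
```

With clustering columns, their values would come first and the CQL column name last, for example `compositeName("2014-02-02", "total")`. Creating the schema through CQL first, as slide 42 suggests, helps keep the writer's comparator and the table definition in agreement.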
  44. Planet Cassandra 2014: Q&A. Patricia Gorla (@patriciagorla), Cassandra Consultant, www.thelastpickle.com
