Cassandra.Link

11/22/2019

yukim/cassandra-bulkload-example

by John Doe

Sample SSTable generation and bulk loading code for the DataStax blog post "Using Cassandra Bulk Loader, Updated". It fetches historical prices from Yahoo! Finance in CSV format and turns them into SSTables.

Generating SSTables

Run:

$ ./gradlew run

This will generate SSTable(s) under the data directory.
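
Before moving on to loading, it can help to confirm that the generator actually produced something. Every SSTable includes a data component whose file name ends in -Data.db, so counting those files gives a quick sanity check. The following is a minimal sketch; the function name count_sstables is hypothetical and not part of the example repository.

```shell
# Hypothetical helper: count SSTable data components (*-Data.db) in a directory.
# Each SSTable contributes exactly one such file, so the count equals the
# number of SSTables found under the given directory.
count_sstables() {
    find "$1" -type f -name '*-Data.db' | wc -l | tr -d ' '
}

# Usage (after running ./gradlew run):
#   count_sstables data/quote/historical_prices
```

If generation succeeded, the count for data/quote/historical_prices should be greater than zero.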

Bulk loading

First, create the schema using the schema.cql file:

$ cqlsh -f schema.cql

Then, load the SSTables into Cassandra using sstableloader:

$ sstableloader -d <ip address of the node> data/quote/historical_prices

(assuming you have cqlsh and sstableloader in your $PATH)
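
sstableloader takes the target keyspace and table from the last two components of the directory path, which is why the path above ends in quote/historical_prices. The sketch below shows that mapping as a pre-flight check; the function name sstable_target is hypothetical and is not something sstableloader provides.

```shell
# Hypothetical helper: print "<keyspace> <table>" as derived from the last two
# components of an SSTable directory path, mirroring how sstableloader
# interprets the path it is given.
# The body runs in a subshell so its variables do not leak into the caller.
sstable_target() (
    dir=${1%/}                                  # drop one trailing slash, if any
    table=$(basename "$dir")                    # last component: table
    keyspace=$(basename "$(dirname "$dir")")    # second to last: keyspace
    printf '%s %s\n' "$keyspace" "$table"
)

# Usage:
#   sstable_target data/quote/historical_prices
#   prints: quote historical_prices
```

Checking that the printed keyspace and table match the ones created by schema.cql before invoking sstableloader is one way to catch a wrong path early.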

Check loaded data

$ bin/cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> USE quote ;
cqlsh:quote> SELECT * FROM historical_prices WHERE ticker = 'ORCL' LIMIT 3;
 ticker | date                     | adj_close | close | high  | low   | open  | volume
--------+--------------------------+-----------+-------+-------+-------+-------+----------
   ORCL | 2014-09-25 00:00:00-0500 |     38.76 | 38.76 | 39.35 | 38.65 | 39.35 | 13287800
   ORCL | 2014-09-24 00:00:00-0500 |     39.42 | 39.42 | 39.56 | 38.57 | 38.77 | 18906200
   ORCL | 2014-09-23 00:00:00-0500 |     38.83 | 38.83 | 39.59 | 38.80 | 39.50 | 34353300
(3 rows)

Voilà!
