6/2/2021

Apache Cassandra Lunch #42: SSTable Files with SSTableloader - Business Platform Team

by Obioma Anomnachi

This post is a recap of Cassandra Lunch #42, which covered SSTable files and how they relate to SSTableloader. We also walk through an example that uses SSTableloader to load data taken from one cluster into a new, empty cluster. The live recording of Cassandra Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

SSTable Files

An individual SSTable is a unit of on-disk storage used in Cassandra and in a number of other NoSQL databases. SSTables take the form of directories and files that contain the data, along with other information that facilitates reading that data later on. SSTables are immutable once written, so new ones are added over time instead of existing ones being modified. More details on SSTables can be found in our previous posts here and here.
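To make the "directories and files" layout concrete, here is a minimal Python sketch that groups the files in a table's data directory by the SSTable they belong to. It assumes file names of the form `<prefix>-<Component>`, such as `nb-1-big-Data.db`; the exact naming scheme varies between Cassandra versions, and the function name `group_sstable_components` is a hypothetical helper, not part of any Cassandra tooling.

```python
from collections import defaultdict
from pathlib import Path


def group_sstable_components(table_dir):
    """Map each SSTable prefix (e.g. 'nb-1-big') to the component names found for it.

    Assumption: every component file is named '<prefix>-<Component>',
    for example 'nb-1-big-Data.db' or 'nb-1-big-Index.db'.
    """
    groups = defaultdict(set)
    for path in Path(table_dir).iterdir():
        # Skip subdirectories (such as snapshots) and files that do not
        # follow the assumed naming pattern.
        if not path.is_file() or "-" not in path.name:
            continue
        prefix, component = path.name.rsplit("-", 1)
        groups[prefix].add(component)
    return dict(groups)
```

Each key in the result corresponds to one immutable SSTable, and each value lists the files that make it up, so a directory that has received several flushes would show several prefixes.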

SSTableloader

SSTableloader, also known as the Cassandra Bulk Loader, is a tool for loading data from SSTables into a Cassandra cluster. Note that this is different from copying SSTable files onto a Cassandra cluster: SSTableloader streams the data contained in those files to the cluster. This process respects the replication strategy and replication factor of the cluster and keyspaces being loaded.

To work properly, SSTableloader must be given a directory containing at least the Index.db and Data.db components of the full SSTable directory. It also works on snapshots. The keyspace and table that the data is streamed into must already exist, but the table may already contain other data.
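The directory requirement above can be checked before invoking the tool. The following Python sketch verifies that a directory contains at least one Data.db and one Index.db file, then builds (but does not run) an `sstableloader` command line using the tool's `-d` option for the initial contact hosts. The helper names, the host addresses, and the example path are hypothetical. It is also worth knowing that sstableloader takes the target keyspace and table names from the last two components of the directory path, so the directory should end in `<keyspace>/<table>`.

```python
from pathlib import Path

# Components the post says must be present for sstableloader to work.
REQUIRED_SUFFIXES = ("-Data.db", "-Index.db")


def missing_components(sstable_dir):
    """Return the required suffixes for which no file exists in the directory."""
    names = [p.name for p in Path(sstable_dir).iterdir() if p.is_file()]
    return [s for s in REQUIRED_SUFFIXES if not any(n.endswith(s) for n in names)]


def build_sstableloader_command(sstable_dir, hosts):
    """Build an sstableloader argument list; raise if the directory is incomplete.

    Assumption: sstable_dir ends in <keyspace>/<table>, and that keyspace
    and table already exist on the target cluster.
    """
    directory = Path(sstable_dir)
    missing = missing_components(directory)
    if missing:
        raise ValueError(f"{directory} has no files ending in: {', '.join(missing)}")
    # -d takes a comma-separated list of initial hosts to connect to.
    return ["sstableloader", "-d", ",".join(hosts), str(directory)]
```

A call such as `build_sstableloader_command("/path/to/my_keyspace/my_table", ["10.0.0.1"])` returns a list that could be passed to `subprocess.run` on a machine where `sstableloader` is on the `PATH`.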

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!
