9/7/2018

Reading time: 2 min

Apache Cassandra, Part 7: Secondary Index, Replication, and Tips

by Haris Hasan


This series of posts presents an introduction to Apache Cassandra. It discusses key Cassandra features, its core concepts, how it works under the hood, how it differs from other data stores, data modelling best practices with examples, and some tips and tricks.

Secondary Index in Cassandra

Secondary indexes in Cassandra are not meant to provide fast access to data by attributes other than the partition key; they are a convenience for writing queries and fetching data. A lookup through a secondary index takes two disk passes, one to read the secondary index file and a second to access the actual data, which makes it inefficient in terms of performance. You can read more on this here.
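The two-pass cost can be illustrated with a small in-memory model. This is only a sketch, not how Cassandra stores data: `ToyTable`, the `city` column, and the `reads` counter are hypothetical names invented for this illustration, and each dictionary lookup stands in for one disk pass.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with one secondary index (illustrative only)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # partition key -> row
        self.index = defaultdict(set)   # indexed value -> partition keys
        self.reads = 0                  # number of simulated lookups

    def insert(self, key, row):
        # Keep the index consistent when a row is overwritten.
        old = self.rows.get(key)
        if old is not None:
            self.index[old[self.indexed_column]].discard(key)
        self.rows[key] = row
        self.index[row[self.indexed_column]].add(key)

    def get_by_key(self, key):
        # Direct access by partition key: a single lookup.
        self.reads += 1
        return self.rows[key]

    def get_by_indexed_value(self, value):
        # Pass 1: read the index to find the matching partition keys.
        self.reads += 1
        keys = sorted(self.index.get(value, ()))
        # Pass 2: fetch the actual rows, one lookup per matching key.
        return [self.get_by_key(k) for k in keys]


users = ToyTable(indexed_column="city")
users.insert("u1", {"name": "Ada", "city": "Lahore"})
users.insert("u2", {"name": "Bob", "city": "Karachi"})
users.insert("u3", {"name": "Cy", "city": "Lahore"})

users.get_by_key("u1")
reads_by_key = users.reads            # one lookup

users.reads = 0
matches = users.get_by_indexed_value("Lahore")
reads_by_index = users.reads          # one index lookup plus one per match
print(reads_by_key, reads_by_index)
```

In this toy model, the query by partition key costs one lookup, while the query through the index costs one index lookup plus one data lookup per matching row.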

Replication in Cassandra

Cassandra supports async replication based on a specified replication factor. Consider a cluster of 99 nodes with a replication factor of 3. Cassandra stores each partition on three nodes: the node that owns it and two others. As a result, if a query needs to read all the data, Cassandra can in the best case find it by reading only 33 nodes instead of 99. Replication not only plays an important role in read optimization but, more importantly, enables fault tolerance: the data remains accessible when a node in the Cassandra cluster goes down.
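The arithmetic behind the 99 and 33 figures can be checked with a short simulation, reading the scenario as a cluster of 99 nodes. This is a sketch under simplifying assumptions: one token range per node, replicas placed on the next nodes around the ring, and no virtual nodes or consistency levels. The function names are hypothetical.

```python
def replicas_for(range_id, node_count, replication_factor):
    """Nodes holding a token range: its owner plus the next nodes on the ring."""
    return [(range_id + i) % node_count for i in range(replication_factor)]


def build_placement(node_count, replication_factor):
    """Map every token range (one per node in this toy model) to its replicas."""
    return {
        r: replicas_for(r, node_count, replication_factor)
        for r in range(node_count)
    }


NODES = 99
RF = 3
placement = build_placement(NODES, RF)

# Best case for a full read: contact every RF-th node on the ring.
contacted = set(range(0, NODES, RF))
fully_covered = all(
    any(node in contacted for node in replicas)
    for replicas in placement.values()
)

# Fault tolerance: take one node down and count the surviving replicas.
down = {5}
min_live_replicas = min(
    sum(1 for node in replicas if node not in down)
    for replicas in placement.values()
)

print(len(contacted), fully_covered, min_live_replicas)
```

Under these assumptions, 33 well-chosen nodes cover every token range, and losing any single node still leaves at least two live replicas of each range.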

Cassandra Tips

Here are some tips that may come in handy during your journey through Cassandra.

Test your data model as early as possible. Build a prototype, insert data, write queries, and make sure your workflow works end to end.
Use cassandra-stress to generate reads and writes and to measure performance.
Do not try to minimize writes; extra writes that improve reads are worth it.
Data duplication is a fact of life in Cassandra; don’t be afraid of it. Disk space is the cheapest available resource.
Always use async writes to keep your code non-blocking.
Batch inserts are an anti-pattern unless the batched data belongs to the same partition.
Regularly review the Cassandra logs for warnings and suggestions.
Benchmark to measure performance against your needs, e.g. data ingestion rate (events/sec) and query execution time.
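The batching tip can be applied on the client side by grouping rows by partition key before any batch is built. The sketch below shows only that grouping step. The `sensor_id` partition key and the sample events are hypothetical, and sending each group through a driver is left out.

```python
from collections import defaultdict


def group_by_partition(rows, partition_key):
    """Group rows so that each group contains a single partition's data."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return dict(groups)


# Hypothetical events, partitioned by sensor_id.
events = [
    {"sensor_id": "s1", "ts": 1, "value": 20.5},
    {"sensor_id": "s2", "ts": 1, "value": 18.0},
    {"sensor_id": "s1", "ts": 2, "value": 20.7},
]

batches = group_by_partition(events, "sensor_id")
for key, rows in batches.items():
    # Each group could be written as one single-partition batch.
    print(key, len(rows))
```

Rows that end up alone in their group gain nothing from batching and can be written as ordinary async inserts.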

References and Further Reading

Here are some resources that helped me in writing this series.

Developer Blog | DataStax Academy: Free Cassandra Tutorials and Training
academy.datastax.com

Cassandra Data Modeling Best Practices, Part 1
This is the first in a series of posts on Cassandra data modeling, implementation, operations, and related practices… (www.ebayinc.com)

Cassandra Data Modelling — Primary Keys
In the previous blog we discussed how data is stored in Cassandra by creating a keyspace and a table. We also inserted… (intellidzine.blogspot.com)
