7/15/2020

Reading time:5 min

When to use Cassandra and when to steer clear

by John Doe

Alex Bekker, Aug 14, 2018 · 5 min read

“But Cassandra doesn’t do it well!” is definitely not something you want to hear after deploying a Cassandra cluster and getting down to work with it. Before making any investments, let’s check whether Apache Cassandra is compatible with your tasks or not. So, should you use Cassandra?

Cassandra is by nature good for heavy write workloads. Inter-node data distribution is quick and writes are cheap, which makes handling hundreds of thousands of write operations per second just a regular Tuesday for a Cassandra cluster. Cassandra also handles heavy read workloads well, although there are some limitations described further on.

If you’re planning data distribution across multiple data centers and cloud availability zones, Cassandra is a good fit too. Your users in Boston and in Honolulu will access their local data centers (which is faster) but will work with the same pools of data.

Thanks to data replication, Cassandra fits ‘always-on’ apps because its clusters have no single point of failure. Data is stored on multiple nodes and, when configured so, in multiple data centers, so a cluster can keep serving requests when some nodes (or even an entire data center) go down. How many failures it tolerates depends on the replication factor and on the consistency level that requests ask for.
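To make the availability claim concrete, here is a minimal sketch of the arithmetic behind it. It assumes the usual Cassandra notions of a replication factor (RF, the number of copies of each row) and a QUORUM consistency level that needs a majority of those copies to respond. The function names are hypothetical and used only for illustration; this is not driver code.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_replicas: int) -> int:
    """How many replicas of a row can be down while requests still succeed."""
    if not 1 <= required_replicas <= replication_factor:
        raise ValueError("required_replicas must be between 1 and replication_factor")
    return replication_factor - required_replicas


if __name__ == "__main__":
    for rf in (3, 5):
        at_quorum = tolerated_failures(rf, quorum(rf))
        at_one = tolerated_failures(rf, 1)
        print(f"RF={rf}: QUORUM needs {quorum(rf)} replicas, "
              f"survives {at_quorum} down; ONE survives {at_one} down")
```

With three copies of each row, a request that waits for a majority survives one replica being down, while a request that needs only one replica survives two. The trade-off is that fewer required replicas also means weaker guarantees about reading the latest write.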

In combination with Apache Spark and the like, Cassandra can be a strong ‘backbone’ for real-time analytics. And it scales close to linearly: adding nodes adds roughly proportional capacity. So, if you anticipate growth of your real-time data, Cassandra has a clear advantage here.

Cassandra has limitations when it comes to:

· ACID transactions. If you expect to build a system on Cassandra that relies on ACID properties (Atomicity, Consistency, Isolation and Durability), unfortunately, it won’t work. Cassandra’s way of dealing with data just isn’t ‘rigid’ enough: a multi-step operation may succeed only partially, and the data may end up with duplications, contradictions and so on. This is why ACID-dependent systems (for instance, core banking systems handling bank transfers, among other things) shouldn’t go with Cassandra. Such systems are usually better served by relational databases. And although Cassandra has a lightweight-transactions feature that offers compare-and-set operations within a single partition, it is slower than regular writes and doesn’t amount to general-purpose ACID transactions.

· Strong consistency. To achieve high availability, Cassandra trades away strong consistency by default and offers eventual consistency instead. Cassandra has measures to remedy it, but they may not be enough for precision-demanding solutions. The main ‘remedy’ is to tune the consistency level so that each read and write waits for more replicas (for example, a quorum), but that raises latency and cuts into Cassandra’s trump card: availability.

· Lots of updates and deletes. Cassandra is incredible at writes, but its storage is append-oriented. If you need to update a lot, Cassandra is a poor fit: for each update, it just adds a ‘younger’ data version with the same primary key. Imagine how agonizing it can be for reads to find the needed data version in the pool of its ‘lookalikes.’ What’s more, Cassandra handles deletes similarly: it adds a tombstone to data without actually deleting it. Thus, reads targeted to the same primary key uncover lots of ‘undead’ data instead of mere up-to-date values. From time to time, compaction takes place and merges the versions, dropping outdated data and tombstones that are old enough, but in between compactions, reads take longer.

· Lots of scans. Cassandra reads data pretty well, but only as long as you know the primary key (more precisely, the partition key) of the data you want. If you don’t, Cassandra has to scan many or all nodes to find what you need, which will take a while. And if the query exceeds the read timeout, the scan will not be completed at all.
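The update and delete behavior described in the list above can be sketched with a toy append-only store. This is a simplified model for illustration only, not Cassandra’s storage engine: the class `AppendOnlyTable` and its methods are hypothetical, the newest write timestamp wins, and compaction here drops tombstones immediately, whereas a real cluster keeps them for a grace period first.

```python
TOMBSTONE = object()  # marker that stands in for a delete


class AppendOnlyTable:
    """Toy model: every write, update, or delete appends a new version."""

    def __init__(self) -> None:
        self.log = []  # list of (key, write_timestamp, value)

    def write(self, key, value, timestamp: int) -> None:
        # An update is just another append with the same key.
        self.log.append((key, timestamp, value))

    def delete(self, key, timestamp: int) -> None:
        # A delete appends a tombstone instead of removing anything.
        self.log.append((key, timestamp, TOMBSTONE))

    def read(self, key):
        """Check every stored version of the key; the newest timestamp wins."""
        newest = None
        for k, ts, value in self.log:
            if k == key and (newest is None or ts > newest[0]):
                newest = (ts, value)
        if newest is None or newest[1] is TOMBSTONE:
            return None
        return newest[1]

    def versions(self, key) -> int:
        """Number of stored versions a read of this key has to look at."""
        return sum(1 for k, _, _ in self.log if k == key)

    def compact(self) -> None:
        """Keep only the newest version per key and drop deleted keys."""
        latest = {}
        for k, ts, value in self.log:
            if k not in latest or ts > latest[k][0]:
                latest[k] = (ts, value)
        self.log = [(k, ts, v) for k, (ts, v) in latest.items()
                    if v is not TOMBSTONE]
```

In this model, a key that was updated a thousand times costs a read a thousand version checks until `compact()` runs, which is the reason update-heavy and delete-heavy workloads hurt read latency.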

To paint a clearer picture of when to use Cassandra, here are some of its most popular use cases.

The way Cassandra’s data model is organized and the fact that Cassandra is designed for intensive write workloads make it exceptionally good for sensor data. It suits very different industries, be it manufacturing, logistics, healthcare, real estate, energy production or agriculture. Regardless of sensor types, Cassandra handles the flow of incoming data nicely and provides possibilities for further data analysis.
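One common way to lay out sensor data, sketched here as an assumption rather than a prescription from this article, is to group readings into partitions by sensor and day and keep them ordered by time inside each partition. The toy class below mimics that layout in memory; `SensorStore` and its methods are hypothetical names.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


class SensorStore:
    """Toy model: partition key = (sensor_id, day), rows ordered by time."""

    def __init__(self) -> None:
        # (sensor_id, "YYYY-MM-DD") -> list of (timestamp, value), kept sorted
        self.partitions = defaultdict(list)

    def append(self, sensor_id: str, ts: datetime, value: float) -> None:
        # New readings only ever add rows; nothing is updated in place.
        key = (sensor_id, ts.date().isoformat())
        insort(self.partitions[key], (ts, value))

    def read_day(self, sensor_id: str, day: str) -> list:
        # Knowing the full partition key means touching one partition only.
        return list(self.partitions.get((sensor_id, day), []))


if __name__ == "__main__":
    store = SensorStore()
    store.append("s1", datetime(2020, 7, 15, 12, 0, tzinfo=timezone.utc), 21.5)
    store.append("s1", datetime(2020, 7, 15, 11, 0, tzinfo=timezone.utc), 20.0)
    for ts, value in store.read_day("s1", "2020-07-15"):
        print(ts.isoformat(), value)
```

Bucketing by day keeps any single partition from growing without bound, and a query for one sensor and one day never has to scan other partitions.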

Messaging systems (chats, collaboration and instant messaging apps, etc.) are just as good a fit for Cassandra as sensor data, since they rarely require data updates. Cassandra quickly writes new incoming messages, allows quick reads and offers other useful features. For instance, you may give a message a ‘time to live’ and Cassandra will stop returning it when this time runs out, with no explicit delete needed. Expired data is still cleaned up by compaction, so time-to-live values are best chosen with the compaction strategy in mind.
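The ‘time to live’ idea can be illustrated with a small in-memory model. It is only a sketch of the observable behavior (a message stops being readable once its lifetime has passed), with hypothetical names, and it takes the current time as an explicit argument so the example stays deterministic.

```python
class TtlStore:
    """Toy model of rows that expire a fixed number of seconds after a write."""

    def __init__(self) -> None:
        self._rows = {}  # message_id -> (text, expires_at)

    def put(self, message_id: str, text: str, now: float, ttl_seconds: float) -> None:
        # Store the absolute expiry time next to the value.
        self._rows[message_id] = (text, now + ttl_seconds)

    def get(self, message_id: str, now: float):
        # Expired rows are treated as absent even if still physically stored.
        row = self._rows.get(message_id)
        if row is None or now >= row[1]:
            return None
        return row[0]


if __name__ == "__main__":
    store = TtlStore()
    store.put("m1", "hello", now=0.0, ttl_seconds=60.0)
    print(store.get("m1", now=30.0))
    print(store.get("m1", now=61.0))
```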

Data model design, write orientation, fairly fast reads and linear scalability make Cassandra suitable for ecommerce websites with features like product catalogs and recommendation or personalization engines. For the latter, Cassandra can store the activities of visitors who fall into the same segment close to each other, which gives analytical tools quick access to the data to, say, generate tempting recommendations for users who are about to leave the website.

Cassandra also helps various entertainment websites track and monitor their users’ activities. It stores data on what movies, games, articles or songs a user has watched, played, read or listened to, how much time they spent on each activity, etc. Then, Cassandra can feed this data to an analytical tool to recommend other movies, games, articles or songs users may like.

Although Cassandra is a poor fit for transfers between bank accounts and other ACID transactions, banks can still benefit from it. Their big data solutions built to analyze customer data can provide an extra level of security for their clients by enabling fraud detection. Cassandra suits this well, given its speed and its support for real-time analytics through integration with Apache Spark.

Cassandra is not a silver bullet, and neither is any other NoSQL database. It has its own advantages and disadvantages to consider. If you want to check how well you remember them, here’s a mini quiz to take:

1. Can Cassandra run on multiple synchronized data centers?

2. What technologies can Cassandra be used with to do real-time analytics?

3. How does Cassandra do with upscaling and intensive writes?

4. Does Cassandra suit a project that needs ACID transactions, scans, deletes and updates?

5. Which is typical of Cassandra: strong consistency or constant availability?

Yet, the biggest question is this: should you use Cassandra or steer clear of it? Now, you have all the background information to make this vital decision.
