This series of posts presents an introduction to Apache Cassandra. It discusses key Cassandra features, its core concepts, how it works under the hood, how it differs from other data stores, data modelling best practices with examples, and some tips and tricks.
This series is not about MongoDB or even MongoDB vs Cassandra, but “How is it different from MongoDB?” is a commonly asked question when talking about Cassandra. So before going deep into Cassandra, I would like to describe some commonalities as well as key differences between the two.
Here are some properties which both Cassandra and MongoDB share:
- Neither data store is a replacement for an RDBMS.
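- Neither was designed around full ACID compliance the way an RDBMS is. (MongoDB added multi-document transactions in version 4.0; Cassandra offers only lightweight transactions.)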
- Both keep recent data in memory to improve performance.
- Both data stores discourage joins and prefer denormalization.
- Both are open source, have been in the industry for quite some time, and have comprehensive support.
At a high level, some major differences between these two stores are:
- MongoDB uses B-Trees under the hood for storage, while Cassandra is based on LSM trees, which makes Cassandra more scalable for writes.
- MongoDB is closer to an RDBMS than Cassandra is, as you can implement concepts like relationships and joins on top of MongoDB. The same is not true for Cassandra.
- MongoDB supports arbitrarily nested objects; Cassandra does not, beyond collections and user-defined types.
- MongoDB offers both primary and secondary indexes and also allows indexing of nested properties, whereas Cassandra queries are built around the primary key (it has secondary indexes, but with significant limitations). [More on this later]
- MongoDB lets you write queries in JSON format and allows all kinds of operators. Cassandra offers CQL, which supports only a limited set of operators, and the use of these operators also depends on how you have defined the schema.
- MongoDB provides built-in aggregation, which works well for small to medium sized databases; Cassandra has no comparable aggregation framework.
- MongoDB does not enforce a schema on write; Cassandra prefers a design-time schema.
- MongoDB is a document-based store, where a document is somewhat similar to a row in a table. Cassandra, on the other hand, is a column family store (and not a column-oriented store, more on this later).
- Cassandra provides high write availability through its masterless (or multi-master) architecture, whereas MongoDB works on a master-slave architecture. (More on this later)
- Cassandra provides linear write scalability, which is not the case with MongoDB.
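
To make the LSM tree point above a little more concrete, here is a minimal sketch in Python of the general idea: writes land in an in-memory table, which is flushed to immutable sorted runs once it fills up. This is a toy illustration and not how Cassandra is actually implemented. The class name `TinyLSM` and the `memtable_limit` parameter are made up for this example, the "runs" are kept in memory rather than on disk, and it leaves out the commit log, compaction, and deletes entirely.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # flushed runs, oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # A write only touches the in-memory table; nothing is updated in place elsewhere.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable run and start a fresh memtable.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then the runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it is flushed to a sorted run
db.put("a", 3)      # the newer value for "a" lives in the fresh memtable
print(db.get("a"))  # 3
print(db.get("b"))  # 2
```

Notice that `put` never rewrites an existing run; it only appends new ones. That append-only behaviour is the rough intuition behind why LSM-based stores tend to handle heavy write loads well, at the cost of reads that may have to look in several places.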