Cassandra is a distributed NoSQL database, which means all data is distributed across the cluster. In Cassandra, data distribution and replication go together.
Distribution and replication depend on the partition key, the key value, and the token range.
Cassandra Table:
A collection of ordered columns fetched by table row. A table consists of columns and has a primary key.
In the diagram above, column1 is the primary key.
With a single-column primary key, the primary key value acts as the partition key. A partitioner applies a hash function to the partition key to produce a token, and the token determines which token range the row belongs to.
A partitioner determines which node receives the first replica of a piece of data and how the other replicas are distributed across the other nodes in the cluster.
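As a rough illustration of how a partition key becomes a token, here is a minimal Python sketch. It is not Cassandra's implementation: Murmur3 is not in the Python standard library, so SHA-256 truncated to 8 bytes is used as a stand-in hash, and the name `token_for` is hypothetical. The only property carried over is that the token is a signed 64-bit integer, which matches the token range of the Murmur3 partitioner.

```python
import hashlib

MIN_TOKEN = -2**63      # smallest token in the signed 64-bit range
MAX_TOKEN = 2**63 - 1   # largest token in the signed 64-bit range


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Stand-in hash (assumption): the first 8 bytes of SHA-256.
    Cassandra itself uses Murmur3, so real token values differ.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# The same key always yields the same token, so every node can
# compute independently where a row belongs.
for user_id in ("28", "15", "54", "9"):
    print(user_id, token_for(user_id))
```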
Refer to the link below for more information.
https://www.ktexperts.com/key-components-in-cassandra/
Consistent hashing:
A partitioner uses consistent hashing, which allows data to be distributed across the cluster.
Consistent hashing minimizes the reorganization of data in the cluster when nodes are added or removed.
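The effect of adding a node can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the node names, the evenly spaced node tokens, one token per node, and random tokens standing in for hashed partition keys. The point is only that when a fifth node joins, the keys that change owner all come from one existing node's range, while the rest stay where they were.

```python
import bisect
import random


def owner(ring, token):
    """Return the node that owns `token`.

    `ring` is a sorted list of (node_token, node_name). Each node owns the
    range (previous node's token, its own token], wrapping around the end.
    """
    node_tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(node_tokens, token)
    return ring[idx % len(ring)][1]


# Hypothetical four-node ring with evenly spaced tokens.
ring4 = [(-2**63, "node1"), (-2**62, "node2"), (0, "node3"), (2**62, "node4")]
# A fifth node joins halfway through node3's range.
ring5 = sorted(ring4 + [(-2**61, "node5")])

# Random tokens stand in for the hashes of 1000 partition keys.
rng = random.Random(42)
keys = [rng.randrange(-2**63, 2**63) for _ in range(1000)]

moved = [k for k in keys if owner(ring4, k) != owner(ring5, k)]
print(f"{len(moved)} of {len(keys)} keys changed owner")
print("new owners:", {owner(ring5, k) for k in moved})
print("old owners:", {owner(ring4, k) for k in moved})
```

With a naive scheme such as `hash(key) % node_count`, changing the node count would reassign most keys. Here only the slice taken over by the new node moves.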
Cassandra uses the Murmur3 hash function (the default partitioner, Murmur3Partitioner) for consistent hashing. Each node in the cluster is responsible for a range of data based on the hash value.
The diagram below shows the token ranges distributed across a 4-node cluster.
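For a cluster where each node holds a single token, evenly spaced tokens over the signed 64-bit range can be computed as below. This is a sketch under assumptions: the helper name `evenly_spaced_tokens` and the node names are hypothetical, and clusters that use virtual nodes assign many smaller ranges to each node instead.

```python
def evenly_spaced_tokens(node_count: int) -> list[int]:
    """Split the signed 64-bit token range into equal parts, one token per node."""
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]


tokens = evenly_spaced_tokens(4)
for i, token in enumerate(tokens):
    previous = tokens[i - 1]  # for i == 0 this wraps around to the last token
    # Each node owns the range (previous token, its own token].
    print(f"node{i + 1}: owns ({previous}, {token}]")
```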
Example: A table with user details.
USERID   NAME    SAL
28       Raj     15000
15       Krish   80000
54       Marry   28000
9        Lucky   90000
In the table above, USERID is the primary key. The Murmur3 partitioner generates a hash value for each partition key.
Hash values in a four-node cluster (rows are labeled by NAME):
Partition key   Murmur3 hash value
Raj             744546265787223821
Krish           -853335892720368062
Marry           124471525403678548
Lucky           -420462738791245812
Cassandra places each row on a node according to the value of its partition key and the token range that the node is responsible for.
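To make the placement rule concrete, the sketch below looks up the owning node for each of the four hash values listed above. The ring layout is an assumption: four hypothetical nodes with one evenly spaced token each, which may not match the ranges in the original diagram. Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive).

```python
import bisect

# Hypothetical ring: (node token, node name), sorted by token.
ring = [(-2**63, "node1"), (-2**62, "node2"), (0, "node3"), (2**62, "node4")]
node_tokens = [t for t, _ in ring]

# Hash values taken from the table above.
hash_values = {
    "Raj": 744546265787223821,
    "Krish": -853335892720368062,
    "Marry": 124471525403678548,
    "Lucky": -420462738791245812,
}


def owner(token: int) -> str:
    """Return the first node whose token is >= the row's token (wrapping around)."""
    idx = bisect.bisect_left(node_tokens, token)
    return ring[idx % len(ring)][1]


placement = {name: owner(token) for name, token in hash_values.items()}
print(placement)
```

Under this assumed layout the positive hash values fall in node4's range and the negative ones in node3's range. A different token assignment would spread the same rows differently.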
Replication factor:
The replication factor is the total number of replicas of each row across the cluster.
Let's take a replication factor of 2 for this 4-node cluster.
A replication factor of 2 means there are two copies of each row in the cluster, each on a different node. In Cassandra, all replicas are equally important; there is no primary or master replica.
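One way to picture a replication factor of 2 is the sketch below. It assumes the behavior of SimpleStrategy, where the first replica goes to the node that owns the token and each further replica goes to the next node clockwise on the ring. The ring layout, node names, and the function `replica_nodes` are hypothetical.

```python
import bisect

# Hypothetical four-node ring with one evenly spaced token per node.
RING = [(-2**63, "node1"), (-2**62, "node2"), (0, "node3"), (2**62, "node4")]


def replica_nodes(token: int, replication_factor: int = 2) -> list[str]:
    """Return the nodes holding a copy of the row with this token."""
    node_tokens = [t for t, _ in RING]
    # The first replica lives on the node that owns the token's range.
    first = bisect.bisect_left(node_tokens, token) % len(RING)
    # The remaining replicas go to the next nodes clockwise on the ring.
    return [RING[(first + i) % len(RING)][1] for i in range(replication_factor)]


print("Raj  :", replica_nodes(744546265787223821))
print("Krish:", replica_nodes(-853335892720368062))
```

Both nodes returned for a row hold a full copy of it, and neither is treated as the master.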
Continued in part 2.
Note: Please test scripts in Non Prod before trying in Production.