Cassandra.Link

2/14/2019

Reading time:1 min

Cassandra : Cassandra Data distribution and replication

by John Doe

Cassandra is a distributed NoSQL database, which means the data is spread across the nodes of a cluster. In Cassandra, data distribution and replication go together.

Both distribution and replication depend on the partition key, its value, and the token range.

Cassandra Table:

A collection of ordered columns fetched by table row. A table consists of columns and has a primary key.

In the diagram above, column1 is the primary key.

Because the primary key here is a single column, its value acts as the partition key. A partitioner applies a hash function to the partition key to produce a token, and that token falls within the token range owned by one of the nodes.

A partitioner determines which node will receive the first replica of a piece of data, and how to distribute other replicas across other nodes in the cluster.

Refer to the link below for more information.

https://www.ktexperts.com/key-components-in-cassandra/
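
To make the partitioner's job more concrete, here is a minimal Python sketch of mapping a partition key to a token. It is not Cassandra's implementation: the Python standard library has no Murmur3 function, so the sketch assumes the first 8 bytes of an MD5 digest as a stand-in hash, and the function name `token_for` is hypothetical. The tokens it prints will therefore differ from the ones a real cluster computes.

```python
import hashlib
import struct

# Murmur3Partitioner tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Assumption: MD5 stands in for Murmur3 here, so these tokens do not
    match the ones Cassandra would compute for the same key.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a big-endian signed 64-bit integer.
    (token,) = struct.unpack(">q", digest[:8])
    return token


if __name__ == "__main__":
    # Hypothetical partition key values, passed as strings.
    for key in ("28", "15", "54", "9"):
        print(key, token_for(key))
```

The property that matters is that the same key always produces the same token, so every node can work out independently where a row belongs.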

Consistent hashing:

A partitioner uses consistent hashing, which allows data to be distributed across a cluster.
Consistent hashing minimizes the amount of data that has to be reorganized when nodes are added to or removed from the cluster.
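
The effect of consistent hashing on data movement can be sketched with a toy ring. This is an illustration under assumptions, not Cassandra's code: it gives each node a single token, uses MD5 as a stand-in for Murmur3, and the names `HashRing`, `stable_hash`, and `moved_keys` are hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Stand-in hash (MD5, not Murmur3) returning an unsigned 64-bit int."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with one token per node."""

    def __init__(self, nodes):
        # Sorted (token, node) pairs form the ring.
        self._ring = sorted((stable_hash(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key: str) -> str:
        # The first node token at or after the key's hash owns the key;
        # past the last token, wrap around to the first node.
        idx = bisect.bisect_left(self._tokens, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_keys(keys, before, after):
    """Return the keys whose owner differs between two rings."""
    return [key for key in keys if before.owner(key) != after.owner(key)]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    before = HashRing(["node1", "node2", "node3", "node4"])
    after = HashRing(["node1", "node2", "node3", "node4", "node5"])
    moved = moved_keys(keys, before, after)
    print(f"{len(moved)} of {len(keys)} keys changed owner")
    # Every key that moved is now owned by the newly added node.
    assert all(after.owner(key) == "node5" for key in moved)
```

In this sketch only the keys in the slice of the ring taken over by `node5` change owner; the other nodes keep everything else they already held. A naive `hash(key) % node_count` scheme would typically remap most keys when the node count changes.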
Cassandra uses Murmur3Partitioner (the default), which is based on the MurmurHash3 function, for consistent hashing. Each node in the cluster is responsible for a range of data based on the hash value.

The diagram below shows the distributed token ranges for a 4-node cluster.

Example: a table with user details.

USERID  NAME   SAL
28      Raj    15000
15      Krish  80000
54      Marry  28000
9       Lucky  90000

In the table above, USERID is the primary key.

Hash values in a four-node cluster:

The Murmur3 partitioner generates a hash value for each row's partition key.

Row (NAME)  Murmur3 hash value
Raj         744546265787223821
Krish       -853335892720368062
Marry       124471525403678548
Lucky       -420462738791245812

Cassandra places the data on each node according to the value of the partition key and the token range that the node is responsible for.

Replication factor:

The replication factor indicates the total number of replicas of each row across the cluster. Let's take a replication factor of 2 for this 4-node cluster. A replication factor of 2 means there are two copies of each row in the cluster, each on a different node. In Cassandra, all replicas are equally important; there is no primary or master replica.

Continued in part 2.

Note: Please test scripts in non-production before trying them in production.
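
The placement and replication rules above can be tried out on the example hash values with a short Python sketch. Several things here are assumptions rather than details from the article: the four node names are hypothetical, the node tokens are spaced evenly over the signed 64-bit token range, each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), and the second replica goes to the next node clockwise on the ring, as a simple non-topology-aware strategy would place it.

```python
import bisect

# Assumption: four hypothetical nodes with evenly spaced tokens over the
# signed 64-bit token range. Each node owns (previous token, own token].
NODES = ["node1", "node2", "node3", "node4"]
NODE_TOKENS = [-2**63 + i * 2**62 for i in range(len(NODES))]

# Example hash values from the table above, keyed by row name.
ROW_TOKENS = {
    "Raj": 744546265787223821,
    "Krish": -853335892720368062,
    "Marry": 124471525403678548,
    "Lucky": -420462738791245812,
}


def replicas_for(token: int, replication_factor: int = 2) -> list:
    """Return the nodes holding a row, walking the ring clockwise."""
    # First node whose token is >= the row token; wrap around past the end.
    first = bisect.bisect_left(NODE_TOKENS, token) % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(replication_factor)]


if __name__ == "__main__":
    for name, token in ROW_TOKENS.items():
        print(name, token, replicas_for(token))
```

Under these assumptions Raj and Marry are stored on node4 and node1, while Krish and Lucky are stored on node3 and node4. The four sample values all sit close to zero relative to the full token range, so they fall into only two of the four ranges; with many partition keys the tokens spread much more evenly.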
