Cassandra.Link

1/28/2022

Understanding Data Partitioning and Replication in Apache Cassandra

by DataStax

Cassandra Essentials Tutorial Series

Agenda
› Overview of partitioning
› Setting up data partitioning
› Overview of replication
› Replication strategies (e.g. single, multi-data center)
› Replication mechanics
› Where to get Cassandra

www.datastax.com

Overview of Data Partitioning in Cassandra

Cassandra is a distributed database management system that easily and transparently partitions your data across all participating nodes in a database cluster. Each node is responsible for part of the overall database.

[Figure: data is inserted and assigned a row key in a column family; the inserted row is placed on a node based on its column family row key.]

There are two basic data partitioning strategies:
1. Random partitioning – this is the default and recommended strategy. Partitions data as evenly as possible across all nodes using an MD5 hash of every column family row key.
2. Ordered partitioning – stores column family row keys in sorted order across the nodes in a database cluster.

Setting up Data Partitioning in Cassandra

The data partitioning strategy is controlled via the partitioner option in the Cassandra configuration file (cassandra.yaml). There are no other mechanics, work, sharding, etc., to partition data in Cassandra.

Note that once a cluster is initialized with a partitioner option, it cannot be changed without reloading all of the data in the cluster.

Overview of Replication in Cassandra

To ensure fault tolerance and no single point of failure, you can replicate one or more copies of every row in a column family across participating nodes in a database cluster.
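Returning to the random partitioning strategy described earlier, the following Python sketch shows roughly how an MD5 hash of a row key can decide which node owns a row. It is an illustration only, not Cassandra's implementation: the 2**127 token space, the evenly spaced node tokens, and the helper names `token_for`, `build_ring`, and `node_for` are all assumptions made for this example. Cassandra 1.2 and later default to Murmur3Partitioner instead; the sketch follows the MD5 approach the slides describe.

```python
import hashlib
from bisect import bisect_left

# Assumption: a 2**127 token space, chosen for this sketch.
RING_SIZE = 2 ** 127


def token_for(row_key: str) -> int:
    """Map a row key to a position on the ring using an MD5 hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(node_names):
    """Assign evenly spaced tokens; returns (token, node) pairs sorted by token."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def node_for(row_key, ring):
    """Return the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, token_for(row_key)) % len(ring)
    return ring[index][1]


if __name__ == "__main__":
    ring = build_ring(["node1", "node2", "node3", "node4"])
    for key in ("alice", "bob", "carol"):
        print(key, "->", node_for(key, ring))
```

Because the hash spreads keys roughly uniformly over the token space, each of the four hypothetical nodes ends up owning about a quarter of the rows, which is the even distribution the random strategy aims for.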
[Figure: data is inserted and assigned a row key in a column family; a copy of the original row is replicated across various nodes in the cluster based on the assigned replication factor.]

Replication is controlled by what is called the replication factor. A replication factor of 1 means there is only one copy of a row in a cluster. A replication factor of 2 means there are two copies of a row stored in a cluster.

Replication is controlled at the keyspace level in Cassandra.

Replication Strategies

There are different replication strategies:

Simple Strategy: places the original row on a node determined by the partitioner. Additional replica rows are placed on the next nodes clockwise in the ring without considering rack or data center location.

Network Topology Strategy: allows for replication between different racks in a data center and/or between multiple data centers. This strategy provides more control over where replica rows are placed.

With Network Topology Strategy, the original row is placed according to the partitioner. Additional replica rows in the same data center are then placed by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas will be placed in the same rack.

To replicate data between 1-n data centers, a replica group is defined and mapped to each logical or physical data center. This definition is specified when a keyspace is created in Cassandra.
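The two placement rules above can be sketched in Python as follows. This is a simplified illustration for a single data center, not Cassandra's code: the ring is modeled as a list of node names in clockwise order, `start` stands for the index of the node chosen by the partitioner, and the function names `simple_strategy` and `rack_aware_strategy` are hypothetical. The sketch also interprets "a different rack" as "a rack that does not yet hold a replica", which is an assumption.

```python
def simple_strategy(ring, start, replication_factor):
    """Place replicas on consecutive nodes clockwise from `start`, ignoring racks."""
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


def rack_aware_strategy(ring, racks, start, replication_factor):
    """Walk the ring clockwise, preferring nodes in racks without a replica yet.

    `racks` maps node name -> rack name (hypothetical input for this sketch).
    """
    size = len(ring)
    clockwise = [ring[(start + i) % size] for i in range(size)]
    replicas, used_racks, skipped = [], set(), []
    for node in clockwise:
        if len(replicas) == replication_factor:
            break
        if racks[node] not in used_racks:
            replicas.append(node)
            used_racks.add(racks[node])
        else:
            skipped.append(node)
    # Not enough distinct racks: fall back to skipped nodes in ring order.
    for node in skipped:
        if len(replicas) == replication_factor:
            break
        replicas.append(node)
    return replicas


if __name__ == "__main__":
    ring = ["A", "B", "C", "D"]
    racks = {"A": "rack1", "B": "rack1", "C": "rack2", "D": "rack2"}
    print(simple_strategy(ring, 0, 3))
    print(rack_aware_strategy(ring, racks, 0, 2))
```

With the hypothetical four-node ring in the usage lines, the simple rule takes the next nodes in ring order, while the rack-aware rule skips a node in an already used rack so that the two replicas land in different racks.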
Below is a CQL example of creating a keyspace that uses the Network Topology replication strategy and has three data replicas:

    CREATE KEYSPACE mykeyspace WITH
      strategy_class = 'NetworkTopologyStrategy' AND
      strategy_options:DC1 = 3;

In this statement, DC1 is the replica group and 3 is the number of replicas. The syntax above is the older CQL form used in the slides; in CQL 3, used by Cassandra 1.2 and later, the same keyspace is created with CREATE KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

[Figure: the original row and its 1st and 2nd copies.]

Replication Mechanics

Cassandra uses a snitch to define how nodes are grouped together within the overall network topology (such as rack and data center groupings). The snitch is defined in the cassandra.yaml file.

The basic snitches include:
1. Simple Snitch – the default and used for the simple replication strategy.
2. Rack Inferring Snitch – infers the topology of the network by analyzing the node IP addresses. This snitch assumes that the second octet identifies the data center where a node is located, and the third octet identifies the rack.
3. Property File Snitch – determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
4. EC2 Snitch – is for deployments on Amazon EC2 only. Instead of using the IP to infer node location, this snitch uses the AWS API to request region and availability zone.

Reading and Writing to Cassandra Nodes

Cassandra is a read/write anywhere architecture, so any user can connect to any node in any data center and read/write the data they need, with all writes being partitioned and replicated for them automatically throughout the cluster.
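As an illustration of the Rack Inferring Snitch rule listed under Replication Mechanics (second octet identifies the data center, third octet identifies the rack), here is a minimal Python sketch. The function name `infer_location` and the returned dictionary shape are assumptions for this example; it only mirrors the octet rule stated in the slides and is not the snitch's actual code.

```python
import ipaddress


def infer_location(node_ip: str) -> dict:
    """Derive data center and rack labels from an IPv4 address.

    Follows the rule described above: second octet -> data center,
    third octet -> rack. Raises ValueError for an invalid IPv4 address.
    """
    octets = ipaddress.IPv4Address(node_ip).packed  # four bytes, one per octet
    return {"data_center": str(octets[1]), "rack": str(octets[2])}


if __name__ == "__main__":
    for ip in ("10.20.30.40", "10.20.31.41", "10.21.30.42"):
        print(ip, infer_location(ip))
```

Under this rule, two nodes whose addresses share the second octet are treated as being in the same data center, and they are treated as being in the same rack only if the third octet matches as well.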
Where to get Cassandra?
› Go to www.datastax.com
› DataStax makes free smart start installers available for Cassandra that include:
  › The most up-to-date Cassandra version that is production quality
  › A version of DataStax OpsCenter, which is a visual, browser-based management tool for managing and monitoring Cassandra
  › Drivers and connectors for popular development languages
  › Same database and application
  › Automatic configuration assistance for ensuring optimal performance and setup for either stand-alone or cluster implementations
  › Getting Started Guide

Where Can I Learn More?
www.datastax.com
› Free Online Documentation
› Technical White Papers
› Technical Articles
› Tutorials
› User Forums
› User/Customer Case Studies
› FAQs
› Videos
› Blogs
› Software downloads
