A partitioner determines how data is distributed across the nodes in the cluster (including replicas). A partitioner is a hash function that computes the token (the hash) of a row key. Each row of data is uniquely identified by a row key and is distributed across the cluster according to the value of its token.
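As a rough illustration of the hashing step, the sketch below maps a row key to a token. It assumes an MD5-based scheme in the style of RandomPartitioner; the function name md5_token is hypothetical, and the code is not Cassandra's implementation.

```python
import hashlib


def md5_token(row_key: bytes) -> int:
    """Map a row key to a non-negative integer token (illustrative only)."""
    digest = hashlib.md5(row_key).digest()
    # Assumption: read the 16-byte digest as a signed big-endian integer and
    # take its absolute value, so the token falls between 0 and 2**127.
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


if __name__ == "__main__":
    for key in (b"alice", b"bob", b"carol"):
        print(key.decode(), "->", md5_token(key))
```

The same row key always yields the same token, which is what lets any node work out where a row belongs without consulting the others.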

Data Distribution in the Ring

In Cassandra, the total amount of data managed by the cluster is represented as a ring. The ring is divided into as many ranges as there are nodes, with each node responsible for one or more ranges of the data. Before a node can join the ring, it must be assigned a token. The token value determines the node's position in the ring and its range of data. Column family data is partitioned across the nodes based on the row key. To determine the node where the first replica of a row will live, Cassandra walks the ring clockwise until it finds the first node whose token value is greater than or equal to the token of the row key. Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive). With the nodes sorted in token order, the last node is considered the predecessor of the first node; hence the ring representation.

For example, consider a simple four-node cluster in which all of the row keys managed by the cluster are numbers in the range of 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the token values are 0, 25, 50, and 75. The first node, the one with token 0, is responsible for the wrapping range (75-0). As the node with the lowest token, it also accepts row keys less than the lowest token and greater than the highest token.
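The lookup in this four-node example can be sketched in a few lines. This is a simplified illustration, not Cassandra's code: the function name find_primary_node is hypothetical, and row keys are treated as if they were their own tokens, as in the example above.

```python
from bisect import bisect_left


def find_primary_node(tokens, key_token):
    """Return the token of the node that holds the first replica."""
    ring = sorted(tokens)
    # Find the first node whose token is >= key_token. A node's own token
    # is inclusive, its predecessor's token is exclusive.
    index = bisect_left(ring, key_token)
    # Past the highest token, wrap around to the node with the lowest token.
    return ring[index % len(ring)]


if __name__ == "__main__":
    tokens = [0, 25, 50, 75]
    for key in (0, 10, 25, 26, 75, 90):
        print(key, "->", find_primary_node(tokens, key))
```

Keys 10 and 25 both land on the node with token 25, key 26 lands on the node with token 50, and key 90 wraps around to the node with token 0.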


Understanding the Partitioner Types

When you deploy a Cassandra cluster, you must choose a partitioner and assign each node an initial_token value so that each node is responsible for a roughly equal amount of data (load balancing). DataStax strongly recommends using the RandomPartitioner (default) for all cluster deployments.

To calculate the tokens for nodes in a single data center cluster, you divide the range by the total number of nodes in the cluster. In multiple data center deployments, you calculate the tokens such that each data center is individually load balanced. See Generating Tokens for the different approaches to generating tokens for nodes in single and multiple data center clusters.
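For the single data center case, the division can be sketched as follows. The function name generate_tokens and the default ring size of 2**127 (the token space assumed here for RandomPartitioner) are assumptions made for illustration. Multiple data center layouts need the additional steps described in Generating Tokens.

```python
def generate_tokens(node_count, ring_size=2**127):
    """Return evenly spaced tokens for node_count nodes (illustrative only)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Node i is placed i/node_count of the way around the ring.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # The simplified 0-100 ring from the earlier example.
    print(generate_tokens(4, ring_size=100))
    # A four-node cluster over the assumed RandomPartitioner token space.
    for token in generate_tokens(4):
        print(token)
```

With a ring size of 100 and four nodes, this reproduces the tokens 0, 25, 50, and 75 used in the earlier example.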

Unlike almost every other configuration choice in Cassandra, the partitioner cannot be changed without reloading all of your data. Therefore, it is important to choose and configure the correct partitioner before initializing your cluster. You set the partitioner in the cassandra.yaml file.

Cassandra offers the following partitioners: