6/17/2020


johnnywidth/cql-calculator

by John Doe


About the project

Cassandra splits data into partitions, and oversized partitions hurt performance, so it's critical to keep partitions small.

This tool addresses part of that problem: it estimates the size of a partition so that you can anticipate the disk space it will need.

To run the program, you supply the CREATE TABLE query and the number of rows you expect in one partition. For text columns, the tool also asks for an average value size. At the end, you get the estimated partition size and the number of cells.

The benefits

The tool makes it quick to try out schema changes and different size assumptions for variable-length data types. Overall, it helps keep Cassandra performing well and catches oversized partitions before they cause problems in production.

Calculating Partition Size

To calculate the size of a partition, we first count the values (cells) it stores, using the following formula:

$$N_v = N_r (N_c - N_{pk} - N_s) + N_s$$

The number of values (or cells) in the partition ($N_v$) is equal to the number of static columns ($N_s$) plus the product of the number of rows ($N_r$) and the number of values per row. The number of values per row is the number of columns ($N_c$) minus the number of primary key columns ($N_{pk}$) and static columns ($N_s$).
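
As a quick check of the value-count formula, here is a minimal Go sketch. The function name `numValues` and its parameter names are illustrative only and are not taken from the tool's source; the inputs mirror the `video` table used in the examples (5 columns, 2 primary key columns, 1 static column, 10000 rows per partition).

```go
package main

import "fmt"

// numValues returns Nv = Nr*(Nc - Npk - Ns) + Ns, the number of cells
// stored in one partition. The name is hypothetical, not the tool's API.
func numValues(rows, columns, primaryKeyColumns, staticColumns int) int {
	// Values per row: all columns except primary key and static columns.
	valuesPerRow := columns - primaryKeyColumns - staticColumns
	// Static values are stored once per partition, not once per row.
	return rows*valuesPerRow + staticColumns
}

func main() {
	// video table: 5 columns, 2 primary key columns, 1 static column,
	// with an assumed 10000 rows per partition.
	fmt.Println(numValues(10000, 5, 2, 1)) // prints 20001
}
```

The result matches the "Number of Values" line in the tool's sample output.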

With the number of values known, we use the following formula to determine the size $S_t$ of a partition:

$$S_t = \sum_i \mathrm{sizeOf}(c_{k_i}) + \sum_j \mathrm{sizeOf}(c_{s_j}) + N_r \times \left( \sum_k \mathrm{sizeOf}(c_{r_k}) + \sum_l \mathrm{sizeOf}(c_{c_l}) \right) + N_v \times \mathrm{sizeOf}(t_{avg})$$

In this formula, $c_k$ refers to partition key columns, $c_s$ to static columns, $c_r$ to regular columns, and $c_c$ to clustering columns.
The term $t_{avg}$ refers to the average number of bytes of metadata stored per cell, such as timestamps. It is typical to use an estimate of 8 bytes for this value.
The number of rows $N_r$ and the number of values $N_v$ come from the previous calculation.
The sizeOf() function refers to the size in bytes of the CQL data type of each referenced column.
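
The size estimate can be sketched in Go as follows, reading the definitions above literally: the partition key and static column sizes are counted once, the regular and clustering column sizes are counted once per row, and the per-cell metadata is counted once per value. The names `sum` and `partitionSize` are hypothetical and the column sizes are assumptions for illustration; this is not the tool's actual implementation.

```go
package main

import "fmt"

// sum adds up a list of column sizes in bytes.
func sum(sizes []int) int {
	total := 0
	for _, s := range sizes {
		total += s
	}
	return total
}

// partitionSize estimates the partition size in bytes. Each slice holds
// the per-column sizes for one kind of column; tavg is the assumed
// per-cell metadata overhead. Hypothetical helper, not the tool's API.
func partitionSize(partitionKey, static, regular, clustering []int, rows, values, tavg int) int {
	perRow := sum(regular) + sum(clustering)
	return sum(partitionKey) + sum(static) + rows*perRow + tavg*values
}

func main() {
	// Assumed sizes: int partition key (4 bytes), static text (250 bytes),
	// tinyint and timestamp regular columns (1 and 8 bytes), and a text
	// clustering column (150 bytes).
	rows := 10000
	values := rows*2 + 1 // two regular values per row plus one static value
	size := partitionSize([]int{4}, []int{250}, []int{1, 8}, []int{150}, rows, values, 8)
	fmt.Println(size) // prints 1750262
}
```

With these inputs the per-row term is 159 bytes. The tool's sample output for the same table shows a per-row term of 309, so its internal accounting evidently differs from this literal reading, and the sketch makes no attempt to reproduce it.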

Install

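$ go get -u github.com/johnnywidth/cql-calculator/cmd/cql-calculator

On Go 1.18 and later, go get no longer builds or installs binaries. The go install form below should be the equivalent, although it has not been verified against this repository:

$ go install github.com/johnnywidth/cql-calculator/cmd/cql-calculator@latest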

Examples

Run with a CREATE TABLE query

$ cql-calculator -query "CREATE TABLE video ( \
    video_id int, email text, name text STATIC, \
    status tinyint, uploaded_at timestamp, \
    PRIMARY KEY (video_id, email))"
# Output
Enter rows count per one partition: 10000
Enter (avarage) size for `email (text)` column: 150
Enter (avarage) size for `name (text)` column: 250
Number of Values:
(10000 * (5 - 2 - 1) + 1) = 20001
Partition Size on Disk:
(4 + 250 + (10000 * 309) + (8 * 20001)) = 3250262 bytes (3.10 Mb)
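
The arithmetic in the sample output above can be verified with a short Go sketch. The helper names `sampleTotal` and `formatMiB` are made up for this illustration. Dividing by 1024 * 1024 reproduces the reported 3.10, which suggests that the "Mb" in the output means mebibytes rather than megabits or decimal megabytes; that reading is an inference from the numbers, not something the tool documents.

```go
package main

import "fmt"

// sampleTotal re-evaluates the expression printed in the sample output.
func sampleTotal() int {
	return 4 + 250 + (10000 * 309) + (8 * 20001)
}

// formatMiB renders a byte count in units of 1024*1024 bytes with two
// decimal places. Hypothetical helper, not part of the tool.
func formatMiB(bytes int) string {
	return fmt.Sprintf("%.2f", float64(bytes)/(1024*1024))
}

func main() {
	total := sampleTotal()
	fmt.Println(total)            // prints 3250262
	fmt.Println(formatMiB(total)) // prints 3.10
}
```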

Run with a CREATE TABLE query and save to a file, then rerun from that file

$ cql-calculator -file generated.yaml -query "CREATE TABLE video ( \
    video_id int, email text, name text STATIC, \
    status tinyint, uploaded_at timestamp, \
    PRIMARY KEY (video_id, email))"
# Output
Enter rows count per one partition: 10000
Enter (avarage) size for `email (text)` column: 150
Enter (avarage) size for `name (text)` column: 250
Number of Values:
(10000 * (5 - 2 - 1) + 1) = 20001
Partition Size on Disk:
(4 + 250 + (10000 * 309) + (8 * 20001)) = 3250262 bytes (3.10 Mb)

$ cql-calculator -file generated.yaml
# Output
Number of Values:
(10000 * (5 - 2 - 1) + 1) = 20001
Partition Size on Disk:
(4 + 250 + (10000 * 309) + (8 * 20001)) = 3250262 bytes (3.10 Mb)

TODO

Parse a simple (inline) PRIMARY KEY declaration, for example: CREATE TABLE video (video_id int PRIMARY KEY, email text)
