Cassandra.Link

2/17/2023

ic-tools for Apache Cassandra SSTables

by John Doe

Overview

Instaclustr has developed a number of useful tools to assist with diagnosing issues in a cluster. For users of Instaclustr’s Managed Service, our Technical Operations team will run these as needed when working with you to help diagnose issues. The tools are available on a supported basis for our enterprise support customers and on an unsupported basis for the general community (although we’ll probably answer questions on the C* user email list).

These tools supplement the information available from nodetool, the utility that ships with Apache Cassandra. Whereas nodetool mostly reports summary statistics that Cassandra maintains while it runs, ic-tools read Cassandra's data files directly when executed, so they can report more detailed and accurate statistics.

Because of this, running the tools can read a large amount of data and may affect the performance of the node on which they run. The two most data-heavy tools (ic-cfstats and ic-purge) provide rate limiting to reduce the impact. Even so, exercise care when using these tools in a live cluster.

These tools are version-specific: you must use the ic-tools version that corresponds to your Cassandra version. Pre-built jars for Cassandra 2.0 through 3.11 are provided at the bottom of this page.

The source code is published on GitHub.

Command Description
ic-summary Summary information about all column families including how much of the data is repaired
ic-sstables Print out metadata for SSTables that belong to a column family
ic-pstats Partition size statistics for a column family
ic-cfstats Detailed statistics about cells in a column family
ic-purge Statistics about reclaimable data for a column family

(We’ve generally used the old-school C* term ‘column family’. It is synonymous with ‘table’ in modern C* versions.)

ic-summary

Provides summary information about all column families. Useful for finding the largest column families and how much data has been repaired by incremental repairs.

Usage

ic-summary

Output

Column Description
Keyspace Keyspace the column family belongs to
Column Family Name of column family
SSTables Number of SSTables on this node for the column family
Disk Size Compressed size on disk for this node
Data Size Uncompressed size of the data for this node
Last Repaired Time of the last incremental repair
Repair % Percentage of data marked as repaired by incremental repair
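As a rough illustration of how the Disk Size, Data Size and Repair % columns relate, the Python sketch below aggregates hypothetical per-SSTable records for one column family. It assumes that Repair % is weighted by uncompressed data size and that an SSTable whose repairedAt value is 0 is unrepaired (Cassandra's convention). ic-summary's exact calculation may differ, and the `SSTableInfo` type and `summarise` function are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSTableInfo:
    """Hypothetical per-SSTable record; field names are illustrative only."""
    disk_bytes: int   # compressed size on disk
    data_bytes: int   # uncompressed size of the data
    repaired_at: int  # 0 = unrepaired, otherwise repair time (ms since epoch)


def summarise(sstables: List[SSTableInfo]) -> dict:
    """Aggregate one column family's SSTables into ic-summary-style figures."""
    disk = sum(s.disk_bytes for s in sstables)
    data = sum(s.data_bytes for s in sstables)
    repaired = [s for s in sstables if s.repaired_at != 0]
    repaired_bytes = sum(s.data_bytes for s in repaired)
    return {
        "sstables": len(sstables),
        "disk_size": disk,
        "data_size": data,
        # Compressed / uncompressed; None when there is no data.
        "compression_ratio": disk / data if data else None,
        # Assumption: Repair % is weighted by uncompressed data size.
        "repair_pct": 100.0 * repaired_bytes / data if data else 0.0,
        "last_repaired": max((s.repaired_at for s in repaired), default=None),
    }
```

For example, two SSTables holding 100 and 300 bytes of uncompressed data, of which only the second is marked repaired, give a Repair % of 75 under this model.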

ic-sstables

Print out SSTable metadata for a column family. Useful when tuning compaction settings.

Usage

ic-sstables <keyspace> <column-family>

Output

Column Description
SSTable Data.db filename of SSTable
Disk Size Size of SSTable on disk
Total Size Uncompressed size of data contained in the SSTable
Min Timestamp Minimum cell timestamp contained in the SSTable
Max Timestamp Maximum cell timestamp contained in the SSTable
Duration The time span between minimum and maximum cell timestamps
Level SSTable level under the Leveled Compaction Strategy (LCS)
Keys Number of partition keys
Avg Partition Size Average partition size
Max Partition Size Maximum partition size
Avg Column Count Average number of columns in a partition
Max Column Count Maximum number of columns in a partition
Droppable Estimated droppable tombstones
Repaired At Time when marked as repaired by incremental repair
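The Duration column is the difference between the Max and Min Timestamp columns. A minimal sketch, assuming cell timestamps are microseconds since the Unix epoch (Cassandra's default for write timestamps; clients that supply their own timestamps may use other units). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(ts_us: int) -> datetime:
    """Convert a cell timestamp (assumed microseconds since epoch) to UTC."""
    return EPOCH + timedelta(microseconds=ts_us)


def sstable_duration(min_ts_us: int, max_ts_us: int) -> timedelta:
    """Time span covered by an SSTable's cell timestamps."""
    if max_ts_us < min_ts_us:
        raise ValueError("max timestamp is older than min timestamp")
    return timedelta(microseconds=max_ts_us - min_ts_us)
```

A long Duration means an SSTable mixes old and new writes, which can be worth checking when tuning a time-based compaction strategy.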

ic-pstats

Tool for finding the largest partitions. It reads the Index.db files, so it is relatively quick.

Usage

ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

-h Display help
-b Batch mode. Uses a progress indicator suited to running in batch jobs.
-n <num> Number of partitions to display in leader lists
-t <name> Snapshot to analyse (snapshot name from nodetool listsnapshots). If none is specified, a snapshot is created.
-f <files> Comma-separated list of Data.db SSTables to filter on

Output

Summary: Summary statistics about partitions

Column Description
Count (Size) Number of partition keys on this node
Total (Size) Total uncompressed size of all partitions on this node
Total (SSTable) Number of SSTables on this node
Minimum (Size) Minimum uncompressed partition size
Minimum (SSTable) Minimum number of SSTables a partition belongs to
Average (Size) Average (mean) uncompressed partition size
Average (SSTable) Average (mean) number of SSTables a partition belongs to
Std dev. (Size) Standard deviation of partition sizes
Std dev. (SSTable) Standard deviation of number of SSTables for a partition
50% (Size) Estimated 50th percentile of partition sizes
50% (SSTable) Estimated 50th percentile of SSTables for a partition
75% (Size) Estimated 75th percentile of partition sizes
75% (SSTable) Estimated 75th percentile of SSTables for a partition
90% (Size) Estimated 90th percentile of partition sizes
90% (SSTable) Estimated 90th percentile of SSTables for a partition
95% (Size) Estimated 95th percentile of partition sizes
95% (SSTable) Estimated 95th percentile of SSTables for a partition
99% (Size) Estimated 99th percentile of partition sizes
99% (SSTable) Estimated 99th percentile of SSTables for a partition
99.9% (Size) Estimated 99.9th percentile of partition sizes
99.9% (SSTable) Estimated 99.9th percentile of SSTables for a partition
Maximum (Size) Maximum uncompressed partition size
Maximum (SSTable) Maximum number of SSTables a partition belongs to
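The tool labels its percentiles as estimates. For comparison, the sketch below computes exact summary statistics over a list of per-partition values using the nearest-rank percentile method. The same function applies to the (Size) values (bytes per partition) and the (SSTable) values (SSTables per partition), although Total (SSTable) counts distinct SSTables rather than summing per-partition counts. Whether ic-pstats uses population or sample standard deviation is not stated, so `statistics.pstdev` here is an assumption.

```python
import math
import statistics
from typing import Dict, List, Sequence

PERCENTILES = (50, 75, 90, 95, 99, 99.9)


def nearest_rank(sorted_vals: Sequence[int], pct: float) -> int:
    """Smallest value such that at least pct% of the values are <= it."""
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def summary(values: List[int]) -> Dict[str, float]:
    """Exact counterpart of the Summary rows for one column (Size or SSTable)."""
    if not values:
        raise ValueError("no partitions")
    vals = sorted(values)
    out = {
        "count": len(vals),
        # Matches Total (Size); Total (SSTable) counts distinct SSTables instead.
        "total": sum(vals),
        "minimum": vals[0],
        "average": statistics.fmean(vals),
        "std_dev": statistics.pstdev(vals),  # population std dev (assumed)
        "maximum": vals[-1],
    }
    for p in PERCENTILES:
        out[f"{p}%"] = nearest_rank(vals, p)
    return out
```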

Largest partitions: The top N largest partitions

Column Description
Key The partition key
Size Total uncompressed size of the partition
SSTable Count Number of SSTables that contain the partition

SSTable Leaders: The top N partitions that belong to the most SSTables

Column Description
Key The partition key
SSTable Count Number of SSTables that contain the partition
Size Total uncompressed size of the partition
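A partition can have fragments in several SSTables, and both leader lists above rank partitions after combining those fragments. The sketch below assumes a partition's size is the sum of its per-SSTable fragment sizes and its SSTable count is the number of SSTables it appears in. The input mapping, the example file names and the `partition_leaders` function are hypothetical, not ic-pstats internals.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_leaders(
    sstables: Dict[str, Dict[str, int]], n: int
) -> Tuple[List[Tuple[str, int, int]], List[Tuple[str, int, int]]]:
    """sstables maps an SSTable name to {partition_key: uncompressed bytes}.

    Returns (largest, sstable_leaders):
      largest         -> (key, total size, SSTable count), biggest first
      sstable_leaders -> (key, SSTable count, total size), most SSTables first
    """
    size: Dict[str, int] = defaultdict(int)
    count: Dict[str, int] = defaultdict(int)
    for fragments in sstables.values():
        for key, nbytes in fragments.items():
            size[key] += nbytes  # assumption: sizes add up across SSTables
            count[key] += 1
    largest = heapq.nlargest(n, size, key=size.__getitem__)
    leaders = heapq.nlargest(n, count, key=count.__getitem__)
    return (
        [(k, size[k], count[k]) for k in largest],
        [(k, count[k], size[k]) for k in leaders],
    )
```

`heapq.nlargest` keeps only the top N while scanning, which matches the "top N" shape of these lists without sorting every partition.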

SSTables: Metadata about SSTables as they relate to partitions.

Column Description
SSTable Data.db filename of SSTable
Size Uncompressed size
Min Timestamp Minimum cell timestamp in the SSTable
Max Timestamp Maximum cell timestamp in the SSTable
Level SSTable level under the Leveled Compaction Strategy (LCS)
Partitions Number of partition keys in the SSTable
Avg Partition Size Average uncompressed partition size in SSTable
Max Partition Size Maximum uncompressed partition size in SSTable

ic-cfstats

Tool for getting detailed cell statistics that can help identify issues with the data model.

Usage

ic-cfstats [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

-h Display help
-b Batch mode. Uses a progress indicator suited to running in batch jobs.
-r <limit> Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit)
-n <num> Number of partitions to display in leader lists
-t <name> Snapshot to analyse (snapshot name from nodetool listsnapshots). If none is specified, a snapshot is created.
-f <files> Comma-separated list of Data.db SSTables to filter on
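The -r option caps read throughput. The tools' own throttling mechanism is not described here; the sketch below shows one simple way to enforce an average rate: track the bytes read since start-up and sleep whenever the reader gets ahead of the allowed schedule. It assumes 1 MB = 1024 * 1024 bytes, and the `ReadThrottle` class is hypothetical. The clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable


class ReadThrottle:
    """Keep average read throughput at or below limit_mb_per_s.

    Assumes 1 MB = 1024 * 1024 bytes. clock/sleep are injectable for testing.
    """

    def __init__(self, limit_mb_per_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if limit_mb_per_s <= 0:
            raise ValueError("limit must be positive")
        self.bytes_per_s = limit_mb_per_s * 1024 * 1024
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.total = 0

    def acquire(self, nbytes: int) -> float:
        """Account for nbytes just read; sleep if ahead of schedule.

        Returns the number of seconds slept (0.0 if none).
        """
        self.total += nbytes
        # Earliest moment at which self.total bytes may have been read.
        due = self.start + self.total / self.bytes_per_s
        delay = due - self.clock()
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0
```

Calling `acquire` after each chunk read from a Data.db file keeps the long-run average near the limit while allowing short bursts.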

Output

Summary: Summary statistics about partitions

Column Description
Count (Size) Number of partition keys on this node
Rows (Size) (3.x only) Number of clustering rows
(deleted) (3.x only) Number of clustering row deletions
Total (Size) Total uncompressed size of all partitions on this node
Total (SSTable) Number of SSTables on this node
Minimum (Size) Minimum uncompressed partition size
Minimum (SSTable) Minimum number of SSTables a partition belongs to
Average (Size) Average (mean) uncompressed partition size
Average (SSTable) Average (mean) number of SSTables a partition belongs to
Std dev. (Size) Standard deviation of partition sizes
Std dev. (SSTable) Standard deviation of number of SSTables for a partition
50% (Size) Estimated 50th percentile of partition sizes
50% (SSTable) Estimated 50th percentile of SSTables for a partition
75% (Size) Estimated 75th percentile of partition sizes
75% (SSTable) Estimated 75th percentile of SSTables for a partition
90% (Size) Estimated 90th percentile of partition sizes
90% (SSTable) Estimated 90th percentile of SSTables for a partition
95% (Size) Estimated 95th percentile of partition sizes
95% (SSTable) Estimated 95th percentile of SSTables for a partition
99% (Size) Estimated 99th percentile of partition sizes
99% (SSTable) Estimated 99th percentile of SSTables for a partition
99.9% (Size) Estimated 99.9th percentile of partition sizes
99.9% (SSTable) Estimated 99.9th percentile of SSTables for a partition
Maximum (Size) Maximum uncompressed partition size
Maximum (SSTable) Maximum number of SSTables a partition belongs to

(3.x only) Row Histogram: Histogram of number of rows per partition

Column Description
Percentile Minimum, average, standard deviation (std dev.), percentile, maximum
Count Estimated number of rows per partition for the given percentile
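Histogram-based percentiles explain why these figures are estimates: each value is counted in a bucket, and a percentile is reported as the upper bound of the bucket it falls in. The sketch below uses geometrically growing buckets (roughly 20% apart) modelled on the bucket layout of Cassandra's EstimatedHistogram class. Whether ic-cfstats builds its row histogram the same way is an assumption, and the default of 90 buckets is arbitrary.

```python
import bisect
import math
from typing import List


def bucket_offsets(n: int) -> List[int]:
    """Bucket upper bounds growing by ~20% each step, starting at 1."""
    offsets = [1]
    while len(offsets) < n:
        last = offsets[-1]
        nxt = math.floor(last * 1.2 + 0.5)  # round half up, like Java Math.round
        if nxt == last:
            nxt += 1
        offsets.append(nxt)
    return offsets


class EstimatedHistogram:
    def __init__(self, n_buckets: int = 90) -> None:
        self.offsets = bucket_offsets(n_buckets)
        # One count per bucket plus a final overflow bucket.
        self.counts = [0] * (n_buckets + 1)

    def add(self, value: int) -> None:
        # Bucket i holds values in (offsets[i-1], offsets[i]]; values <= 1 go in bucket 0.
        self.counts[bisect.bisect_left(self.offsets, value)] += 1

    def percentile(self, pct: float) -> int:
        """Upper bound of the bucket containing the pct-th percentile value."""
        if not 0 <= pct <= 100:
            raise ValueError("pct must be between 0 and 100")
        total = sum(self.counts)
        if total == 0:
            return 0
        target = max(1, math.ceil(total * pct / 100))
        seen = 0
        for i, c in enumerate(self.counts):
            seen += c
            if seen >= target:
                if i == len(self.offsets):
                    raise OverflowError("percentile falls in the overflow bucket")
                return self.offsets[i]
        raise AssertionError("unreachable")
```

With rows-per-partition values 1 through 10, the estimated 90th percentile is 10 rather than the exact 9, because 9 and 10 share the bucket whose upper bound is 10.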

Largest partitions: Partitions with the largest uncompressed size

Column Description
Key The partition key
Size Total uncompressed size of the partition
Rows (3.x only) Total number of clustering rows in the partition
(deleted) (3.x only) Number of row deletions in the partition
Tombstones Number of cell or range tombstones
(droppable) Number of tombstones that can be dropped as per gc_grace_seconds
Cells Number of cells in the partition
SSTable Count Number of SSTables that contain the partition

Widest partitions: Partitions with the most cells

Column Description
Key The partition key
Rows (3.x only) Total number of clustering rows in the partition
(deleted) (3.x only) Number of row deletions in the partition
Cells Number of cells in the partition
Tombstones Number of cell or range tombstones
(droppable) Number of tombstones that can be dropped as per gc_grace_seconds
Size Total uncompressed size of the partition
SSTable Count Number of SSTables that contain the partition

(3.x only) Most Deleted Rows: Partitions with the most row deletions

Column Description
Key The partition key
Rows Total number of clustering rows in the partition
(deleted) Number of row deletions in the partition
Size Total uncompressed size of the partition
SSTable Count Number of SSTables that contain the partition

Tombstone Leaders: Partitions with the most tombstones

Column Description
Key The partition key
Tombstones Number of cell or range tombstones
(droppable) Number of tombstones that can be dropped as per gc_grace_seconds
Rows (3.x only) Total number of clustering rows in the partition
Cells Number of cells in the partition
Size Total uncompressed size of the partition
SSTable Count Number of SSTables that contain the partition
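The (droppable) columns count tombstones whose gc_grace_seconds period has passed. Cassandra records a local deletion time (seconds since the epoch) for each tombstone and treats it as eligible for purging once that time is older than now minus gc_grace_seconds. A minimal sketch of that test, with hypothetical function names:

```python
import time
from typing import Iterable, Optional, Tuple

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default: 10 days


def is_droppable(local_deletion_time: int, gc_grace_seconds: int,
                 now: Optional[int] = None) -> bool:
    """True once the tombstone is older than gc_grace_seconds.

    local_deletion_time and now are seconds since the Unix epoch.
    """
    if now is None:
        now = int(time.time())
    gc_before = now - gc_grace_seconds
    return local_deletion_time < gc_before


def tombstone_counts(deletion_times: Iterable[int], gc_grace_seconds: int,
                     now: int) -> Tuple[int, int]:
    """Return (tombstones, droppable) in the style of the columns above."""
    times = list(deletion_times)
    droppable = sum(1 for t in times if is_droppable(t, gc_grace_seconds, now))
    return len(times), droppable
```

Eligible does not mean removed: compaction only drops a tombstone when it can be sure the tombstone no longer shadows data in other SSTables, so droppable tombstones can linger.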

SSTable Leaders: Partitions that are in the most SSTables

Column Description
Key The partition key
SSTable Count Number of SSTables that contain the partition
Size Total uncompressed size of the partition
Rows (3.x only) Total number of clustering rows in the partition
Cells Number of cells in the partition
Tombstones Number of cell or range tombstones
(droppable) Number of tombstones that can be dropped as per gc_grace_seconds

SSTables: Metadata about SSTables as they relate to partitions.

Column Description
SSTable Data.db filename of SSTable
Size Uncompressed size
Min Timestamp Minimum cell timestamp in the SSTable
Max Timestamp Maximum cell timestamp in the SSTable
Partitions Number of partitions
(deleted) Number of row level partition deletions
(avg size) Average uncompressed partition size in SSTable
(max size) Maximum uncompressed partition size in SSTable
Rows (3.x only) Total number of clustering rows in SSTable
(deleted) (3.x only) Number of row deletions in SSTable
Cells Number of cells in the SSTable
Tombstones Number of cell or range tombstones in the SSTable
(droppable) Number of tombstones that are droppable according to gc_grace_seconds
(range) Number of range tombstones
Cell Liveness Percentage of live cells, i.e. the number of non-tombstone cells as a percentage of the total number of cells. Shadowing is not considered: cells overwritten by newer updates or covered by tombstones still count as live.
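Cell Liveness as described above reduces to a single ratio. The sketch assumes the tombstone count here refers to cell tombstones that are included in the Cells total (range tombstones are not cells); the function name is hypothetical.

```python
from typing import Optional


def cell_liveness(cells: int, tombstones: int) -> Optional[float]:
    """Percentage of cells that are not tombstones.

    Shadowed cells (overwritten by a newer write or covered by a tombstone)
    still count as live. Returns None for an SSTable with no cells.
    """
    if cells == 0:
        return None
    if not 0 <= tombstones <= cells:
        raise ValueError("tombstones must be between 0 and the cell count")
    return 100.0 * (cells - tombstones) / cells
```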

ic-purge

Reports statistics about reclaimable data for a column family.

Usage

ic-purge [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

-h Display help
-b Batch mode. Uses a progress indicator suited to running in batch jobs.
-r <limit> Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit)
-n <num> Number of partitions to display in leader lists
-t <name> Snapshot to analyse. If none is specified, a snapshot is created.

Output

Largest reclaimable partitions: Partitions with the largest amount of reclaimable data

Column Description
Key The partition key
Size Total uncompressed size of the partition
Reclaim Reclaimable uncompressed size
Generations SSTable generations the partition belongs to
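ic-purge's method for estimating reclaimable data is not described on this page. One plausible model, shown purely as an illustration, uses Cassandra's last-write-wins reconciliation: when several SSTable generations hold versions of the same cell, only the version with the highest timestamp survives a full compaction, so the older versions count as reclaimable. This sketch ignores tombstones, TTLs and timestamp ties, and its input tuple layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (partition_key, cell_name, timestamp_us, size_bytes, generation)
Cell = Tuple[str, str, int, int, int]


def reclaimable_by_partition(
    cells: Iterable[Cell],
) -> Dict[str, Tuple[int, int, List[int]]]:
    """Map partition key -> (size, reclaimable size, SSTable generations).

    Last-write-wins model: for each (partition, cell) only the version with
    the highest timestamp is kept; every older version is reclaimable.
    Tombstones, TTLs and timestamp ties are ignored for simplicity.
    """
    total: Dict[str, int] = defaultdict(int)
    generations: Dict[str, set] = defaultdict(set)
    newest: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for pk, name, ts, size, gen in cells:
        total[pk] += size
        generations[pk].add(gen)
        current = newest.get((pk, name))
        if current is None or ts > current[0]:
            newest[(pk, name)] = (ts, size)
    kept: Dict[str, int] = defaultdict(int)
    for (pk, _name), (_ts, size) in newest.items():
        kept[pk] += size
    return {
        pk: (total[pk], total[pk] - kept[pk], sorted(generations[pk]))
        for pk in total
    }
```

Sorting the result by its second field, for example `sorted(r.items(), key=lambda kv: kv[1][1], reverse=True)`, yields a "largest reclaimable partitions" list under this model.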

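Downloads

ic-sstable-tools-3_11_3.jar (67 KB)
ic-sstable-tools-3_0_17.jar (67 KB)
ic-sstable-tools-2_2_13.jar (61 KB)
ic-sstable-tools-2_1_20.jar (59 KB)
ic-sstable-tools-2_0_17.jar (59 KB)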

By Instaclustr Support
