Cassandra.Link

1/23/2018

wildengineer/cassandra-data-copy-tool

by John Doe

This simple Java-based tool copies data from one live Cassandra table to another. The source and destination tables do not need to be on the same cluster or in the same keyspace. The only requirement is that the destination table is compatible with the source table.

Build

The project requires:

  • Java 8
  • Maven 2 or later

Once your environment is set up, run the following from the project directory:

mvn package

This builds a jar at ./target/cassandra-data-copy-tool--SNAPSHOT.jar

Configuration

To configure the tool you'll need to create a properties file. Here's an example:

copy.tables=table1,table2,table3=>other_table4,...,tableN
copy.ignoreColumns=table1.columnX,table2.columnY
copy.batchSize=20000
copy.queryPageSize=1000
copy.batchesPerSecond=1
source.cassandra.contactPoints=127.0.0.1
source.cassandra.port=9142
source.cassandra.keyspace=test_keyspace
source.cassandra.username=cassandra
source.cassandra.password=cassandra
destination.cassandra.contactPoints=127.0.0.1
destination.cassandra.port=9142
destination.cassandra.keyspace=test_keyspace
destination.cassandra.username=cassandra
destination.cassandra.password=cassandra
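
The copy.tables value in the example above mixes plain table names with source=>destination mappings. As a minimal sketch of how such a value could be interpreted (this is not the tool's own parser; the class CopyTablesParser and its method are hypothetical names introduced for illustration), a plain name maps to itself and an entry containing => maps the left side to the right side:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the tool: interprets a copy.tables value.
final class CopyTablesParser {

    // Returns source table -> destination table, in the order listed.
    static Map<String, String> parseTables(String value) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int arrow = trimmed.indexOf("=>");
            if (arrow < 0) {
                // No mapping given: same table name on both sides.
                mapping.put(trimmed, trimmed);
            } else {
                String source = trimmed.substring(0, arrow).trim();
                String dest = trimmed.substring(arrow + 2).trim();
                mapping.put(source, dest);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // prints {table1=table1, table2=table2, table3=other_table4}
        System.out.println(parseTables("table1,table2,table3=>other_table4"));
    }
}
```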

Properties

There are three configuration groups:

  • Copy
  • Source
  • Destination

Copy

Property Name | Description | Default Value
copy.tables | A comma delimited list of table names to copy. If the table names differ between source and destination, use source_table=>dest_table | ""
copy.ignoreColumns | A comma delimited list of columns from the source to ignore. Format is TABLE_NAME.COLUMN_NAME | ""
copy.batchSize | Size of batches to insert into the destination database | 20000
copy.queryPageSize | Size of pages read from the source | 1000
copy.batchesPerSecond | Maximum rate of batches copied per second | 1
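
Taken together, copy.batchSize and copy.batchesPerSecond put a ceiling on copy throughput. The sketch below works through that arithmetic under two assumptions that the source does not state: that the batch size is counted in rows, and that the rate limit is the only bottleneck. The class CopyRateEstimate is a hypothetical name and is not part of the tool.

```java
// Hypothetical estimator, not part of the tool.
final class CopyRateEstimate {

    // Upper bound on rows written per second under the rate limit
    // (assumes batchSize is a row count).
    static long maxRowsPerSecond(long batchSize, long batchesPerSecond) {
        return batchSize * batchesPerSecond;
    }

    // Lower bound on the seconds needed to copy totalRows, rounded up.
    static long minSecondsToCopy(long totalRows, long batchSize, long batchesPerSecond) {
        long rate = maxRowsPerSecond(batchSize, batchesPerSecond);
        return (totalRows + rate - 1) / rate;
    }

    public static void main(String[] args) {
        // Documented defaults: copy.batchSize=20000, copy.batchesPerSecond=1.
        System.out.println(maxRowsPerSecond(20000, 1));               // prints 20000
        System.out.println(minSecondsToCopy(10_000_000L, 20000, 1));  // prints 500
    }
}
```

With the defaults, a table of ten million rows would therefore take at least 500 seconds; real runs are likely slower, since reads and writes add their own latency.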

Source

Property Name | Description | Default Value
source.cassandra.contactPoints | Comma delimited list of source cluster's contact points | 127.0.0.1
source.cassandra.port | Source cluster's port | 9142
source.cassandra.keyspace | Source keyspace name | ""
source.cassandra.username | Source username | cassandra
source.cassandra.password | Source plaintext password | cassandra

Destination

Property Name | Description | Default Value
destination.cassandra.contactPoints | Comma delimited list of destination cluster's contact points | 127.0.0.1
destination.cassandra.port | Destination cluster's port | 9142
destination.cassandra.keyspace | Destination keyspace name | ""
destination.cassandra.username | Destination username | cassandra
destination.cassandra.password | Destination plaintext password | cassandra
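
Because both keyspace properties default to an empty string, a properties file that omits them is probably incomplete. The following is a rough pre-flight check that uses only java.util.Properties; it does not reflect how the tool itself loads its configuration, and the class ConfigPreflight and its methods are hypothetical. The defaults it falls back to are the ones listed in the Source and Destination tables.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-flight check, not part of the tool.
final class ConfigPreflight {

    // Defaults as listed in the Source and Destination property tables.
    static Properties documentedDefaults() {
        Properties d = new Properties();
        for (String side : new String[] {"source", "destination"}) {
            d.setProperty(side + ".cassandra.contactPoints", "127.0.0.1");
            d.setProperty(side + ".cassandra.port", "9142");
            d.setProperty(side + ".cassandra.keyspace", "");
            d.setProperty(side + ".cassandra.username", "cassandra");
            d.setProperty(side + ".cassandra.password", "cassandra");
        }
        return d;
    }

    // Parses properties text; unset keys fall back to the documented defaults.
    static Properties load(String text) {
        Properties p = new Properties(documentedDefaults());
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // True only when both keyspaces are set to a non-empty value.
    static boolean hasKeyspaces(Properties p) {
        return !p.getProperty("source.cassandra.keyspace").trim().isEmpty()
            && !p.getProperty("destination.cassandra.keyspace").trim().isEmpty();
    }

    public static void main(String[] args) {
        Properties p = load("source.cassandra.keyspace=test_keyspace\n");
        System.out.println(p.getProperty("source.cassandra.port")); // prints 9142
        System.out.println(hasKeyspaces(p));                        // prints false
    }
}
```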

Run

Once your properties file is ready, run:

java -jar cassandra-data-copy-tool--SNAPSHOT.jar --spring.config.location=/path/to/config.properties
