11/13/2018

phact/dse-cluster-migration

by John Doe

NOTE: If you are looking to perform a migration across two clusters, make sure you deploy two clusters!

This is a demo for migrating DSE and Cassandra clusters (or even tables within the same cluster) using DSE Analytics / Spark.

Motivation

Moving data across clusters, whether for one-time or bulk migrations, is relatively common. DSE Analytics makes this process almost trivial for users who are well versed in Spark. In a previous blog post I attempted to make this process simpler for users with minimal Spark experience. This asset aims to make it easy even for users with no Spark experience at all.

What is included?

This field asset (demo) includes the following:

  • dse-cluster-migration Spark job
      • Performs the migration using massively parallel Spark compute
      • Code is visible in the Che web IDE, which ships with this asset
  • MigUI (Migration UI) web application
      • React Redux frontend
      • Dropwizard backend
      • Uses WebHDFS to programmatically upload the Spark job jar to DSEFS
      • Uses the DSE-only CQL Spark interface (currently an internal, undocumented API) to submit the Spark job to DSE
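
The WebHDFS upload step can be sketched roughly as follows. This is not the MigUI backend code, only a minimal illustration of pushing a jar to DSEFS over its WebHDFS-compatible REST interface with the standard Java 11+ HTTP client. The class name `DsefsUpload`, the host `localhost`, the port `5598` (the usual DSEFS public port) and the target path `/jars/dse-cluster-migration.jar` are all assumptions made for the example; adjust them for your cluster.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper (not part of the asset): uploads a local file to DSEFS via WebHDFS.
class DsefsUpload {

    // Builds a WebHDFS CREATE URI. dsefsPath must be absolute, e.g. "/jars/job.jar".
    static URI createUri(String host, int port, String dsefsPath) {
        if (!dsefsPath.startsWith("/")) {
            throw new IllegalArgumentException("DSEFS path must be absolute: " + dsefsPath);
        }
        return URI.create("http://" + host + ":" + port
                + "/webhdfs/v1" + dsefsPath + "?op=CREATE&overwrite=true");
    }

    // Sends the file as the body of an HTTP PUT and returns the final status code.
    // WebHDFS CREATE usually answers with a redirect first, so redirects are followed;
    // whether your DSEFS version behaves the same way is an assumption to verify.
    static int upload(Path localJar, URI target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(target)
                .PUT(HttpRequest.BodyPublishers.ofFile(localJar))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Assumed host, port and target path; replace with your own values.
        URI target = createUri("localhost", 5598, "/jars/dse-cluster-migration.jar");
        System.out.println(target);
        // Only touch the network when a local jar path is passed as the first argument.
        if (args.length == 1) {
            System.out.println("HTTP status: " + upload(Path.of(args[0]), target));
        }
    }
}
```

Run without arguments, the program only prints the URI it would use; pass the path of a local jar to attempt the upload. Keeping the URI construction separate from the HTTP call makes the path handling easy to check without a running cluster.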

Business Takeaways

DSE is the best distribution of Apache Cassandra and the easiest to use. By taking advantage of the migration capabilities in DSE Analytics, projects can get off the ground faster and complex business requirements have a shorter time to value.

Technical Takeaways

In some DSE-to-DSE or Cassandra-to-DSE scenarios, a cluster migration is easier to perform than an upgrade. When the source cluster needs to remain in place (e.g. data migrations across environments, DEV <-> SIT <-> UAT <-> PROD), DSE Analytics can be the right solution.

Look at this asset if you are interested in:

  • Using Spark to migrate data from one cluster to another
  • Using Spark to move a table to another table in the same cluster
  • Programmatically writing to DSEFS from Java
  • Programmatically using CQL to kick off DSE Analytics jobs (NOTE: this is unsupported and undocumented at this time)
