Cassandra.Link

2/7/2020

Reading time: 8 min

Upgrading Apache Cassandra to DataStax Enterprise

by John Doe

Upgrading Apache Cassandra™ to DataStax Enterprise.

Tip: DataStax is offering a complimentary half-day Upgrade Assessment. This assessment is a DataStax Services engagement designed to assess the upgrade compatibility of your existing Apache Cassandra™ deployment to DSE versions 5.1, 6.0, and 6.7. Contact the DataStax Services team to schedule your assessment.

Attention: Read and understand these instructions before upgrading. Carefully reviewing the planning and upgrade instructions can prevent errors and data loss. In addition, review the DataStax Enterprise release notes for the target upgrade version: 4.8, 5.0, 5.1, 6.0, 6.7.

Back up your existing installation

Warning: DataStax recommends backing up your data prior to any version upgrade.

A backup provides the ability to revert and restore all the data used in the previous version if necessary. For manual backup instructions, see Backing up and restoring DSE.

Tip: The general backup and restore operations are identical between Cassandra and DSE. Change the directory names and any DSE-specific commands as required.

Upgrade SSTables

Warning: Be certain to upgrade SSTables on your nodes both before and after upgrading. Failure to upgrade SSTables will result in severe performance penalties and possible data loss.

Upgrade restrictions and limitations

Upgrade paths

Upgrades are impacted by the version you are upgrading from and the version you are upgrading to. The greater the gap between the current version and the target version, the more complex the upgrade. Upgrades from earlier versions may require an interim upgrade to a required version:

Upgrading from              Upgrading to                          Required interim version
Cassandra 3.0 and 3.11      DSE 6.7                               Not required
Cassandra 3.0 and 3.11      Latest version of DSE 6.0 (6.0.11)    Not required
Cassandra 3.0 and 3.11      DSE 5.1                               Not required
Cassandra 3.0               DSE 5.0                               Not required
Cassandra 2.1               DSE 5.0                               DSE 4.8
Cassandra 2.0 and earlier   Cassandra 2.1

Questions? Contact DataStax Support.

General restrictions

  • Do not enable new features.
  • Do not run nodetool repair.
  • During the upgrade, do not bootstrap new nodes or decommission existing nodes.
  • Do not enable Change Data Capture (CDC) on a mixed-version cluster. Upgrade all nodes to DSE 5.1 or later before enabling CDC.
  • Do not issue TRUNCATE or DDL-related queries during the upgrade process.

Restrictions for nodes using security

  • Do not change security credentials or permissions until the upgrade is complete on all nodes.
  • If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.

Driver version impacts

Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See .

During upgrades, you might experience driver-specific impact when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host to which the driver connects, although certain drivers, such as Java 4.x/2.x, automatically select a protocol version that works across nodes. To avoid driver version incompatibility during upgrades, use one of these workarounds:
  • Protocol version: Set the protocol version explicitly in your application at startup. Switch the Java driver to the new protocol version only after the upgrade is complete on all nodes in the cluster.
  • Initial contact points: Ensure that the list of initial contact points contains only hosts with the oldest DSE version or protocol version. For example, the initial contact points contain only hosts that use protocol version 2.
For details on protocol version negotiation, see protocol versions with mixed clusters in the documentation for the driver version you're using, for example, the Java driver.

Preparing to upgrade

Follow these steps to prepare each node for the upgrade:

  1. Familiarize yourself with the changes and features in the new release:
    • DataStax Enterprise release notes for the target upgrade version: 4.8, 5.0, 5.1, 6.0, 6.7.
    • General upgrade advice and Cassandra features in NEWS.txt/DSE CHANGES.txt. If you are upgrading from an earlier release, read NEWS.txt all the way back to your current version.
    • Ensure that your version of Cassandra can be upgraded directly to the version of Cassandra that is used by DataStax Enterprise. See the Cassandra changes in CHANGES.txt/DSE CHANGES.txt.
  2. Before upgrading, be sure that each node has adequate free disk space.
    Determine current DSE data disk space usage:
    sudo du -sh /var/lib/cassandra/data/
    3.9G    /var/lib/cassandra/data/
    Determine available disk space:
    sudo df -hT /
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sda1      ext4   59G   16G   41G  28% /

    Important: The required space depends on the compaction strategy. See Disk space.

  3. Upgrade the SSTables on each node to ensure that all SSTables are on the current version:
    nodetool upgradesstables

    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.

    Tip: Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads. DataStax recommends running the upgradesstables command on one node at a time or, when using racks, one rack at a time.

    If the SSTables are already on the current version, the command returns immediately and no action is taken.

  4. Verify the Java runtime version and upgrade to the recommended version.
    java -version
    openjdk version "1.8.0_222"
    OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
    OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

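    Recommended: OpenJDK 8 (1.8.0_151 minimum)

    Note: Recommendation changed due to the end of public updates for Oracle JRE/JDK 8. See Oracle Java SE Support Roadmap.

    Supported: Oracle Java SE 8 (JRE or JDK) (1.8.0_151 minimum)

    Important: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8.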

  5. Run nodetool repair to ensure that data on each replica is consistent with data on other nodes:
    nodetool repair -pr
  6. Install the libaio package for optimal performance.
    RHEL platforms:
    sudo yum install libaio
    Debian:
    sudo apt-get install libaio1
  7. Back up any customized files since they may be overwritten with default values during installation of the new version.

    Tip: If you backed up your installation using instructions in Backing up and restoring DSE, your original configuration files are included in the archive.
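
The disk-space check in step 2 and the Java check in step 4 can be scripted so that they run the same way on every node. The following is a minimal sketch, not part of the DataStax procedure: the function names java8_update_at_least and has_headroom are hypothetical, 151 is the minimum Java 8 update that the DataStax documentation lists, and the headroom factor of 2 is purely an assumption that you should replace with a value suited to your compaction strategy.

```shell
#!/bin/sh
# Pre-upgrade sanity checks for one node (illustrative sketch; both
# function names are hypothetical helpers, not DSE or Cassandra tools).

# Succeeds if the `java -version` banner in $1 reports Java 8 with an
# update number of at least $2.
java8_update_at_least() {
  banner=$1
  min_update=$2
  # Extract the first quoted version string, for example 1.8.0_222.
  version=$(printf '%s\n' "$banner" |
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case $version in
    1.8.0_*) ;;          # Java 8 with an update suffix
    *) return 1 ;;       # anything else fails the check
  esac
  update=${version#1.8.0_}
  update=${update%%[!0-9]*}   # drop trailing text such as "-b10"
  [ -n "$update" ] && [ "$update" -ge "$min_update" ]
}

# Succeeds if available space ($2, in KB) is at least $3 times the space
# currently used by the data directory ($1, in KB). The factor is an
# assumption; the space actually required depends on the compaction strategy.
has_headroom() {
  [ "$2" -ge $(( $1 * $3 )) ]
}

# Example wiring (commented out so the sketch has no side effects):
#   used_kb=$(sudo du -sk /var/lib/cassandra/data/ | awk '{print $1}')
#   avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
#   has_headroom "$used_kb" "$avail_kb" 2 || echo "low disk headroom" >&2
#   java8_update_at_least "$(java -version 2>&1)" 151 || echo "check Java version" >&2
```

Keeping the checks as functions that take plain values makes them easy to try against sample output before pointing them at a live node.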

Upgrade steps

Follow these steps on each node in the recommended order. The upgrade process requires upgrading and restarting one node at a time.

Note: These steps are performed in your upgraded version and use DSE 5.1 documentation.

  1. Flush the commit log of the current installation:
    nodetool drain
  2. Stop the node. (2.1, 2.2, 3.0)
  3. Uninstall Cassandra.

    Note: If you installed Cassandra from packages in APT or RPM repositories, you must remove the packages before setting up and installing DSE.

    • For packages installed from APT repositories:
      sudo apt-get autoremove "dsc*" "cassandra*" "apache-cassandra*"

      This action shuts down Cassandra if it is still running.

    • For packages installed from Yum repositories:
      sudo yum remove "dsc*" "cassandra*" "apache-cassandra*"

      The old Cassandra configuration file might be renamed to cassandra.yaml.rpmsave, for example:

      warning: /etc/cassandra/default.conf/cassandra.yaml
      saved as /etc/cassandra/default.conf/cassandra.yaml.rpmsave
    • If Cassandra was installed with a binary tarball:
      ps auwx | grep cassandra
      sudo  kill cassandra_pid

      And then remove the Cassandra installation directory.

  4. Install DSE using the appropriate instructions: 4.8 5.0 5.1 6.0 6.7.
  5. To configure the new product version:

    1. After installing the new version but before restarting the node, compare the new configuration files with your backup copies, remove deprecated settings, and update any new settings if required.

      Warning: Do not simply overwrite the new configuration files with the old ones. Instead, compare your old files to the new files and make any required changes.

      Tip: Use the DSE yaml_diff tool to compare backup YAML files with the upgraded YAML files:
      cd /usr/share/dse/tools/yamls
      ./yaml_diff path/to/yaml-file-old path/to/yaml-file-new
      ...
       CHANGES 
      =========
      authenticator:
      - AllowAllAuthenticator
      + com.datastax.bdp.cassandra.auth.DseAuthenticator
      authorizer:
      - AllowAllAuthorizer
      + com.datastax.bdp.cassandra.auth.DseAuthorizer
      roles_validity_in_ms:
      - 2000
      + 120000
      ...
  6. If upgrading from Cassandra 3.11.2 or later, comment out the following parameters in cassandra.yaml if they exist:
    • enable_materialized_views
    • enable_sasi_indexes

    Tip: See for the location of Cassandra configuration files.

  7. Start the node using the appropriate method: 4.8 5.0 5.1 6.0 6.7.
  8. Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
    • Get the node's datacenter name:
      nodetool status | grep "Datacenter"
      Datacenter: datacenter-name
    • Verify that the node's datacenter name matches the datacenter name for a keyspace:
      cqlsh --execute "DESCRIBE KEYSPACE keyspace-name;" | grep "replication"
      CREATE KEYSPACE keyspace-name WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter-name': '3'};
  9. Review the logs for warnings, errors, and exceptions:
    grep -w 'WARN\|WARNING\|ERROR\|exception' /var/log/cassandra/*.log
    Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

    Tip: Non standard log locations are configured in .

  10. Run nodetool repair:
    bin/nodetool repair -pr

    Important: Be sure to run nodetool repair on each node in the datacenter.

  11. Repeat the upgrade process on each node in the cluster following the recommended order.
  12. After the entire cluster upgrade is complete: upgrade the SSTables on one node at a time or, when using racks, one rack at a time.

    Warning: Failure to upgrade SSTables when required results in a significant performance impact, increased disk usage, and possible data loss. Upgrading is not complete until the SSTables are upgraded.

    nodetool upgradesstables

    Tip: Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads. DataStax recommends running the upgradesstables command on one node at a time or, when using racks, one rack at a time.

    Important: You can run the upgradesstables command before all the nodes are upgraded as long as you run the command on only one node at a time or, when using racks, one rack at a time. Running upgradesstables on too many nodes at once will degrade performance.
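
Two of the per-node steps above lend themselves to small helpers: commenting out enable_materialized_views and enable_sasi_indexes (step 6), and confirming that a datacenter name appears in a keyspace's replication settings (step 8). The sketch below is one possible approach, not DSE tooling. It assumes the parameters are top-level, unindented keys in cassandra.yaml, and the function names and the .orig backup suffix are hypothetical choices.

```shell
#!/bin/sh
# Post-install helpers for one node (illustrative sketch; the function
# names are hypothetical and are not part of DSE or Cassandra).

# Comment out top-level keys in a YAML file, keeping a one-time backup.
# Usage: comment_out_yaml_keys FILE KEY...
comment_out_yaml_keys() {
  file=$1
  shift
  # Preserve the untouched file once; do not overwrite an existing backup.
  [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
  for key in "$@"; do
    # Prefix only uncommented, unindented "key:" lines with "# ".
    sed "s/^\(${key}:.*\)\$/# \1/" "$file" > "$file.tmp" &&
      cat "$file.tmp" > "$file" &&
      rm -f "$file.tmp"
  done
}

# Succeeds if datacenter name $1 appears as a key in the replication
# map text $2 (the "replication = {...}" line from DESCRIBE KEYSPACE).
dc_in_replication() {
  printf '%s\n' "$2" | grep -qF "'$1':"
}

# Example wiring (commented out; paths and names are placeholders):
#   comment_out_yaml_keys path/to/cassandra.yaml \
#     enable_materialized_views enable_sasi_indexes
#   dc=$(nodetool status | awk '/^Datacenter:/ {print $2}')
#   repl=$(cqlsh --execute "DESCRIBE KEYSPACE keyspace-name;" | grep "replication")
#   dc_in_replication "$dc" "$repl" || echo "datacenter name mismatch" >&2
```

Writing through a temporary file and copying the contents back keeps the ownership and permissions of the original configuration file, and the .orig copy gives you something to diff against afterward.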
