Instructions to upgrade to DataStax Enterprise 5.0 from 4.7 or 4.8.
Follow these instructions to upgrade from DataStax Enterprise 4.7 or 4.8 to 5.0. If you have an earlier version, upgrade to 4.8 before continuing.
Read and understand these instructions before upgrading.
Apache Cassandra™ version change
- DataStax Enterprise 5.1 uses Cassandra 3.11.
- DataStax Enterprise 5.0 uses Cassandra 3.0.
- DataStax Enterprise 4.7 to 4.8 uses Cassandra 2.1.
- DataStax Enterprise 4.0 to 4.6 uses Cassandra 2.0.
General recommendations
DataStax recommends backing up your data prior to any version upgrade. A backup provides the ability to revert and restore all the data used in the previous version if necessary.
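One common way to take such a backup is a tagged snapshot on each node. The following is a minimal sketch, assuming `nodetool` is on the PATH. The function names and the tag format are illustrative choices, not part of the documented procedure, and a snapshot stays on the node's own disks unless you copy it elsewhere.

```shell
# Sketch: take a tagged snapshot of all keyspaces on the local node before upgrading.
# snapshot_tag and snapshot_before_upgrade are hypothetical helper names.

snapshot_tag() {
    # Build a tag such as pre-upgrade-5.0-20240101; $1 is the target version.
    printf 'pre-upgrade-%s-%s\n' "$1" "$(date +%Y%m%d)"
}

snapshot_before_upgrade() {
    tag=$(snapshot_tag "$1")
    # "nodetool snapshot -t <tag>" snapshots every keyspace when none is named.
    nodetool snapshot -t "$tag" && echo "snapshot created with tag $tag"
}

# Usage (run on each node):
#   snapshot_before_upgrade 5.0
```

Using a tag that names the target version makes the snapshot easy to find later if you need to restore it.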
General restrictions and limitations during the upgrade process
Restrictions and limitations apply while a cluster is in a partially upgraded state.
With these exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.
- General upgrade restrictions
- Do not enable new features.
- Do not run nodetool repair. If you have the OpsCenter Repair Service configured, turn off the Repair Service.
- During the upgrade, do not bootstrap or decommission nodes.
- Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
- During the upgrade, nodes on different versions might show a schema disagreement.
- Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
- Restrictions for DSE Analytic (Hadoop and Spark) nodes
- Do not run analytics jobs until all nodes are upgraded.
- Kill all Spark worker processes before you stop the node and install the new version.
- DSE Search (Solr) upgrade restrictions and limitations
- Do not update schemas.
- Do not reindex DSE Search nodes during upgrade.
- Do not issue these types of queries during a rolling restart: DDL or TRUNCATE.
- While mixed versions of nodes exist during an upgrade, DataStax Enterprise runs two different servers for backward compatibility: one based on shard_transport_options and the other based on internode_messaging_options. (These options are located in dse.yaml.) After all nodes are upgraded to 5.0, internode_messaging_options is used. The internode messaging service is used by several components of DataStax Enterprise. For 5.0 and later, all internode messaging requests use this service.
- Restrictions for nodes using any kind of security
- Do not change security credentials or permissions until after the upgrade is complete.
- If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
- Upgrading drivers and possible impact when driver versions are incompatible
- Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See Upgrading DataStax drivers.
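The restrictions above note that nodes on different versions might show a schema disagreement. The sketch below is one way to observe this from a node. It assumes that `nodetool describecluster` lists each schema version on its own line in the form `<uuid>: [<addresses>]`; the function names are hypothetical.

```shell
# Sketch: count distinct schema versions reported by "nodetool describecluster".
# Assumes one "<uuid>: [<addresses>]" line per schema version in the output.

count_schema_versions() {
    nodetool describecluster |
        grep -Ec '^[[:space:]]*[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}: \['
}

report_schema_agreement() {
    n=$(count_schema_versions)
    if [ "$n" -le 1 ]; then
        echo "schema versions agree"
    else
        echo "schema disagreement: $n versions (possible while nodes run mixed versions)"
    fi
}

# Usage:
#   report_schema_agreement
```

More than one version reported while the cluster is partially upgraded is consistent with the restriction described above; it becomes a concern only if it persists after all nodes are upgraded.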
Preparing to upgrade
- Before upgrading, be sure that each node has ample free disk space.
The required space depends on the compaction strategy. See Disk space in Planning and testing DataStax Enterprise deployments.
- Familiarize yourself with the changes and features in this release:
- Be sure your platform is supported.
- Oracle Java SE Runtime Environment 8 (JDK) (1.8.0_40 minimum) or OpenJDK 8. Earlier or later versions are not supported.
- DataStax Enterprise 5.0 release notes.
- General upgrading advice for any version and New features for Apache Cassandra™ 3.0 in NEWS.txt. Be sure to read the NEWS.txt all the way back to your current version.
- Apache Cassandra™ changes in CHANGES.txt.
- DataStax Enterprise 5.0 production-certified changes to Apache Cassandra.
- DataStax driver changes.
- Verify your current product version. If necessary, upgrade to an interim version:
- Current version DataStax Enterprise 4.7 or 4.8: upgrade to DataStax Enterprise 5.0.
- Current version DataStax Enterprise 4.0, 4.5, or 4.6: upgrade to DataStax Enterprise 4.8.
- Current version DataStax Community or open source Apache Cassandra™ 2.0.x: upgrade to DataStax Enterprise 4.8.
- Current version DataStax Community 3.0.x: no interim version required.
- Current version DataStax Distribution of Apache Cassandra™ 3.x: upgrade not available.
- Upgrade the SSTables on each node to ensure that all SSTables are on the current version. This is required for DataStax Enterprise upgrades that include major Cassandra version changes.
Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.
$ nodetool upgradesstables
If the SSTables are already on the current version, the command returns immediately and no action is taken. See SSTable compatibility and upgrade version.
Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads.
- Verify the Java runtime version and upgrade to the recommended version.
$ java -version
The latest version of Oracle Java SE Runtime Environment 8 (JDK) (1.8.0_40 minimum) or OpenJDK 8 is recommended. The JDK is recommended for development and production systems. The JDK provides useful troubleshooting tools that are not in the JRE, such as jstack, jmap, jps, and jstat.
- Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.
- DSE Search partition key names: The partition key names of COMPACT STORAGE tables backed by DSE Search indexes must match the uniqueKey in schema.xml. For example, if the following table is created with compact storage:
CREATE TABLE keyspace_name.table_name (key text PRIMARY KEY, foo text, solr_query text) WITH COMPACT STORAGE;
and the Solr schema.xml is:
... <uniqueKey>id</uniqueKey> ...
then rename the key in the table to match the schema:
ALTER TABLE keyspace_name.table_name RENAME key TO id;
- Back up the configuration files you use to a folder that is not in the directory where you normally run commands.
The configuration files are overwritten with default values during installation of the new version.
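Several of the preparation steps above can be scripted per node. The following is a minimal sketch, assuming `java` and `nodetool` are on the PATH. The function names are hypothetical, and applying the update 40 minimum to OpenJDK as well as Oracle Java is an assumption of this sketch.

```shell
# Sketch: pre-upgrade checks for one node. Function names are illustrative.

# Extract the quoted version string from "java -version" output (printed on stderr).
java_version_string() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only for Java 8 at update 40 or later, for example 1.8.0_40 or 1.8.0_292.
java8_update_ok() {
    case "$1" in
        1.8.0_*) update=${1#1.8.0_} ;;
        *) return 1 ;;
    esac
    update=${update%%[!0-9]*}   # drop suffixes such as "-b26"
    [ -n "$update" ] && [ "$update" -ge 40 ]
}

preupgrade_checks() {
    v=$(java_version_string)
    java8_update_ok "$v" || { echo "unsupported Java version: $v" >&2; return 1; }
    nodetool upgradesstables || return 1   # returns immediately if SSTables are current
    nodetool repair                        # make replicas consistent before upgrading
}

# Usage (run on each node before installing the new version):
#   preupgrade_checks
```

The version check runs first so that the longer-running SSTable upgrade and repair start only on a node that already meets the Java requirement.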
Upgrade steps
Tip: The DataStax installer upgrades DataStax Enterprise and automatically performs many upgrade tasks.
- Upgrade order matters. Upgrade nodes in this order:
- In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
- Upgrade the seed nodes within a datacenter first.
For DSE Analytics nodes using DSE Hadoop, upgrade the Job Tracker node first. Then upgrade Hadoop nodes, followed by Spark nodes.
- Upgrade types in this order:
- DSE Analytics nodes or datacenters
- Transactional/DSE Graph nodes or datacenters
- DSE Search nodes or datacenters
- DSE Analytics nodes: Kill all Spark worker processes.
- DSE Search nodes: Review these considerations and take appropriate actions:
- Run nodetool drain to flush the commit log of the old installation:
$ nodetool drain -h hostname
This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data.
Important: This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.
- Stop the node:
- Use the appropriate method to install the new product version on a supported platform:
Note: Install the new product version using the same installation type that is on the system. The upgrade proceeds with installation regardless of the installation type and might result in issues.
- If the cluster will run Hadoop in a Kerberos secure environment, change the task-controller file ownership to root and access permissions to 4750. For example:
$ sudo chown root /usr/share/dse/resources/hadoop/native/Linux-amd64-64/bin/task-controller
$ sudo chmod 4750 /usr/share/dse/resources/hadoop/native/Linux-amd64-64/bin/task-controller
Package installations only: The default location of the task-controller file should be /usr/share/dse/resources/hadoop/native/Linux-amd64-64/bin/task-controller.
- To configure the new product version:
- Compare your backup files to the new configuration files:
- Look for any deprecated, removed, or changed settings.
- Be sure you are familiar with the Apache Cassandra and DataStax Enterprise changes and features in the new release.
- Merge the applicable modifications into the new version.
- Start the node.
- Installer-Services and Package installations: See Starting DataStax Enterprise as a service.
- Installer-No Services and Tarball installations: See Starting DataStax Enterprise as a stand-alone process.
- Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
$ nodetool status
- Review the logs for warnings, errors, and exceptions. Because DataStax Enterprise 5.0 uses Cassandra 3.0, the output.log might include warnings about the following:
- sstable_compression
- chunk_length_kb
- memory_allocator
- memtable_allocation_type
- offheap_objects
- netty_server_port - used only during the upgrade to 5.0. After all nodes are running 5.0, requests that are coordinated by this node no longer contact other nodes on this port. Instead requests use inter-node messaging options. The internode_messaging_options are used by several components of DataStax Enterprise. For 5.0 and later, all internode messaging requests use this service.
Warnings, errors, and exceptions are frequently found in the logs when starting up an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.
During upgrade of DSE Analytics nodes, exceptions about the Task Tracker are logged in the nodes that are not yet upgraded to 5.0. The jobs succeed after the entire cluster is upgraded.
- Repeat the upgrade on each node in the cluster following the recommended upgrade order.
- After all nodes are upgraded, drop the following legacy tables: system_auth.users, system_auth.credentials, and system_auth.permissions.
As described in Cassandra NEWS.txt, the authentication and authorization subsystems have been redesigned to support role-based access control (RBAC), which results in a change to the schema of the system_auth keyspace.
- After the new version is installed on all nodes, upgrade the SSTables:
$ nodetool upgradesstables
Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads.
- For multiple datacenter deployments, change the replication strategy of the system_distributed keyspace to NetworkTopologyStrategy.
- If you use the OpsCenter Repair Service, turn on the Repair Service.
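The post-upgrade CQL changes in the steps above can be issued through `cqlsh -e`. The sketch below builds the statements as text so they can be reviewed before they are run. The helper names are hypothetical, and the datacenter names and replication factors in the usage lines are placeholders; use the datacenter names that `nodetool status` reports.

```shell
# Sketch: build the post-upgrade CQL statements. Helper names are illustrative.

# Print an ALTER KEYSPACE statement from name:rf pairs, for example DC1:3 DC2:3.
system_distributed_cql() {
    opts="'class': 'NetworkTopologyStrategy'"
    for pair in "$@"; do
        dc=${pair%%:*}   # text before the first colon
        rf=${pair##*:}   # text after the last colon
        opts="$opts, '$dc': $rf"
    done
    printf 'ALTER KEYSPACE system_distributed WITH replication = {%s};\n' "$opts"
}

# Print DROP TABLE statements for the legacy system_auth tables.
drop_legacy_auth_cql() {
    for t in users credentials permissions; do
        printf 'DROP TABLE system_auth.%s;\n' "$t"
    done
}

# Usage, only after every node runs the new version:
#   cqlsh -e "$(drop_legacy_auth_cql)"
#   cqlsh -e "$(system_distributed_cql DC1:3 DC2:3)"
```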