

12/14/2021

Reading time: 2 min

Step 4: Configure DSBulk settings

by John Doe



This section outlines the steps required to configure DSBulk for data upload to Amazon Keyspaces. You configure DSBulk by using a configuration file that you specify directly on the command line.
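
For example, you can point dsbulk at the configuration file with the -f option. In this one-line sketch, my_keyspace, my_table, and my_data.csv are placeholder names; the complete load command is covered in Step 5.

    dsbulk load -f ./dsbulk_keyspaces.conf -k my_keyspace -t my_table -url ./my_data.csv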

  1. Create a DSBulk configuration file for the migration to Amazon Keyspaces; in this example, the file is named dsbulk_keyspaces.conf. Specify the following settings in the DSBulk configuration file.

    1. PlainTextAuthProvider – Create the authentication provider with the PlainTextAuthProvider class. ServiceUserName and ServicePassword should match the user name and password you obtained when you generated the service-specific credentials by following the steps at Creating credentials to access Amazon Keyspaces programmatically.

    2. local-datacenter – Set the value for local-datacenter to the AWS Region that you're connecting to. For example, if the application is connecting to cassandra.us-east-2.amazonaws.com, then set the local data center to us-east-2. For all available AWS Regions, see Service endpoints for Amazon Keyspaces.

    3. SSLEngineFactory – To configure SSL/TLS, initialize the SSLEngineFactory by adding a section in the configuration file with a single line that specifies the class with class = DefaultSslEngineFactory. Provide the path to cassandra_truststore.jks and the password that you created previously.

    4. consistency – Set the consistency level to LOCAL_QUORUM and turn off token map metadata (the token-map.enabled setting shown in the sample file below). Other write consistency levels are not supported; for more information, see Supported Apache Cassandra consistency levels in Amazon Keyspaces.

    The following is the complete sample configuration file.

    datastax-java-driver {
    basic.contact-points = [ "cassandra.us-east-2.amazonaws.com:9142"]
    advanced.auth-provider {
        class = PlainTextAuthProvider
        username = "ServiceUserName"
        password = "ServicePassword"
    }
    basic.load-balancing-policy {
        local-datacenter = "us-east-2"
    }
    basic.request {
        consistency = LOCAL_QUORUM
        default-idempotence = true
    }
    advanced.ssl-engine-factory {
        class = DefaultSslEngineFactory
        truststore-path = "./cassandra_truststore.jks"
        truststore-password = "my_password"
        hostname-validation = false
    }
    advanced.metadata {
        schema {
          token-map.enabled = false
        }
    }
    }
  2. Review the parameters for the DSBulk load command. A combined example command follows this list.

    1. executor.maxPerSecond – The maximum number of rows that the load command attempts to process concurrently per second. If unset, it defaults to -1, which disables rate limiting.

      Set executor.maxPerSecond based on the number of WCUs that you provisioned for the destination table. The executor.maxPerSecond of the load command isn't a hard limit; it's a target average, which means the load can (and often does) burst above the number you set. To allow for bursts and make sure that enough capacity is in place to handle the data load requests, set executor.maxPerSecond to 90% of the table's write capacity.

      executor.maxPerSecond = WCUs * .90

      In this tutorial, we set executor.maxPerSecond to 5.

      Note

      If you are using DSBulk 1.6.0 or higher, you can use dsbulk.engine.maxConcurrentQueries instead.

    2. Configure these additional parameters for the DSBulk load command.

      • batch-mode – This parameter tells the system to group operations by partition key. Because this could interfere with other settings, we recommend disabling batch mode.

      • driver.advanced.retry-policy-max-retries – This determines how many times to retry a failed query. If unset, the default is 10. You can adjust this value as needed.

      • driver.basic.request.timeout – The time the system waits for a query to return. If unset, the default is 5 minutes. You can adjust this value as needed.
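
Putting these settings together, a load command for this tutorial could look roughly like the following sketch. The keyspace, table, and CSV file names are placeholders, the --executor.maxPerSecond and --batch.mode flags correspond to the parameters described above, and the retry and timeout parameters can either be overridden the same way or left at the values in dsbulk_keyspaces.conf. The full command is the subject of Step 5.

    # Sketch only: my_keyspace, my_table, and my_data.csv are placeholders.
    # executor.maxPerSecond is roughly 90% of the table's provisioned WCUs
    # (for example, 1,000 WCUs -> 900); this tutorial uses 5.
    dsbulk load -f ./dsbulk_keyspaces.conf \
      -url ./my_data.csv -header true \
      -k my_keyspace -t my_table \
      --executor.maxPerSecond 5 \
      --batch.mode DISABLED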
