Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

9/24/2021

Apache Cassandra Lunch #68: DataStax Apache Kafka Connector - Business Platform Team

by Arpan Patel


DataStax Apache Kafka Connector

In Apache Cassandra Lunch #68: DataStax Apache Kafka Connector, we introduce the DataStax Apache Kafka Connector and discuss how we can use it to connect Apache Kafka and Cassandra. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

In the recording, we cover some basic information about the connector and the basic architecture of how it works, then walk through a simple Katacoda example from DataStax that shows how to use it. We also discuss how we use the DataStax Apache Kafka Connector in our Cassandra.Realtime repo, so be sure to check out the embedded video below!

The DataStax Apache Kafka Connector is open source software that works with the Kafka Connect framework. It synchronizes records from a Kafka topic with table rows in the following supported databases: DataStax Astra cloud databases, DataStax Enterprise (DSE) 4.7 and later, and open source Apache Cassandra® 2.1 and later. The connector is deployed on the Kafka Connect worker nodes and runs within the worker JVM. Workers running one or more instances of the DataStax Kafka Connector pull messages from Kafka topics and write them to a database table on the DataStax platform using the DataStax Enterprise Java driver.

Basic Architecture
  • Each instance of the DataStax Apache Kafka Connector creates a single session with the cluster.
    • A single connector instance can process records from multiple Kafka topics and write to several database tables.
  • Data is pulled from the Kafka topic and written to the mapped table using a CQL batch that contains multiple write statements.
  • A map specification binds a Kafka topic field to a table column. 
    • Fields that are omitted from the specification are not included in the write request. 
    • Fields with null values are written to the database as UNSET (see nullToUnset). 
    • To ensure proper ordering, all records are written using the Kafka record timestamp.
  • Use multiple connectors when different global connect settings are required for different scenarios, such as writing to different clusters or datacenters.
  • The DataStax Connector tasks store the offsets in config.offset.topic.
    • In the event of a failure, the DataStax Connector task resumes reading from the last recorded location.
  • Ingest data from Kafka topics with records in the following data structures:
    • Primitive type values, such as integer or string
    • Complex field values in record types:
      • JSON formatted string
      • Kafka Struct
      • Avro
  • Built-in SSL, LDAP/Active Directory, and Kerberos integration
  • More Features: https://docs.datastax.com/en/kafka/doc/kafka/kafkaFeatures.html
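
The mapping rules in the list above can be modeled in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the connector's code: `UNSET`, `parse_mapping`, and `build_write` are hypothetical names, the record is represented as a plain dictionary, and the mapping string format (`column=key` or `column=value.field`) is assumed to follow the style of the connector's topic-to-table mapping setting.

```python
# Illustrative model of the mapping rules described above.
# Not the connector's implementation; every name here is hypothetical.

UNSET = object()  # sentinel standing in for an "unset" bound value


def parse_mapping(spec):
    """Parse 'col=value.field, col2=key' into {column: path tuple}."""
    mapping = {}
    for pair in spec.split(","):
        column, source = pair.strip().split("=")
        mapping[column.strip()] = tuple(source.strip().split("."))
    return mapping


def build_write(record, mapping, null_to_unset=True):
    """Return (column values, write timestamp) for one Kafka record."""
    values = {}
    # Only mapped fields are visited, so unmapped fields never reach the write.
    for column, path in mapping.items():
        current = record[path[0]]  # 'key' or 'value'
        for part in path[1:]:
            current = current.get(part) if isinstance(current, dict) else None
        if current is None and null_to_unset:
            current = UNSET  # the column is left untouched by the write
        values[column] = current
    # The Kafka record timestamp is carried along as the write timestamp.
    return values, record["timestamp"]


record = {
    "key": "user-1",
    "value": {"name": "Ada", "city": None, "ignored": "x"},
    "timestamp": 1632441600000,
}
mapping = parse_mapping("id=key, name=value.name, city=value.city")
values, write_timestamp = build_write(record, mapping)
```

Here `values` holds `id` and `name` as read from the record, `city` is `UNSET` because its value was null, and the `ignored` field is absent because it does not appear in the mapping.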

The demo portion of Apache Cassandra Lunch #68: DataStax Apache Kafka Connector is split into two parts. In the first part, we cover a DataStax Katacoda scenario in which we create a Kafka topic, configure and start a Kafka Connect worker, download and configure the DataStax Kafka Connector, and push data from the Kafka topic to a Cassandra instance. In the second part, we take a look at Cassandra.Realtime and discuss how that walkthrough uses the same basics we covered in the Katacoda scenario. If you want a more in-depth discussion and video demo, be sure to watch the embedded YouTube video below!
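
As a rough sketch of the "configure the DataStax Kafka Connector" step, the snippet below builds a sink connector configuration and prepares the HTTP request that would register it through the Kafka Connect REST API (`POST /connectors`), which applies when the worker runs in distributed mode. The connector class and the property names (`contactPoints`, `loadBalancing.localDc`, `topic.<topic>.<keyspace>.<table>.mapping`) are given as we understand them from the DataStax documentation and should be verified there. The topic, keyspace, table, datacenter, and host names, as well as the `connector_config` helper, are placeholders and not values taken from the demo.

```python
import json
import urllib.request


def connector_config(topic, keyspace, table, mapping):
    """Build a Kafka Connect payload for a DataStax sink connector (sketch)."""
    return {
        "name": f"{keyspace}-{table}-sink",
        "config": {
            # Assumed class name for the open source connector; check the docs.
            "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "contactPoints": "cassandra-host",       # placeholder host
            "loadBalancing.localDc": "datacenter1",  # placeholder datacenter
            f"topic.{topic}.{keyspace}.{table}.mapping": mapping,
        },
    }


payload = connector_config(
    "orders", "demo_ks", "orders_by_id", "id=key, total=value.total"
)

# Prepare (but do not send) the registration request to a Connect worker.
request = urllib.request.Request(
    "http://localhost:8083/connectors",  # placeholder worker address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would submit it to a running worker.
```

The request is only constructed here, so the snippet runs without a Kafka Connect worker. A worker started in standalone mode would instead read equivalent settings from a properties file.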

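Resources
  • https://www.datastax.com/dev/kafka#introduction
  • https://docs.datastax.com/en/kafka/doc/kafka/kafkaIntro.html
  • https://github.com/Anant/cassandra.realtime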

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!
