I have been building a research product that stores server logs in secure storage (obviously in a blockchain :)). I named this project Octopus. As a part of this project we had to take logs from different services and store them in Cassandra. First, I shipped the logs of the different services from Filebeat to Kafka (I have written about that story here). Then I streamed the data from Kafka to Cassandra using Akka Streams.
In this post I'm going to show how I did the streaming from Kafka to Cassandra. All the source code related to this post is available in this GitLab repo. Please clone it and follow the steps below.
First I need to run Cassandra on my local machine. Following is the docker-compose entry for Cassandra.
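A minimal sketch of that entry, assuming the official cassandra image (the version tag here is my assumption; the exact file is in the repo):

```yaml
# docker-compose.yml - cassandra service entry (sketch, image tag is an assumption)
cassandra:
  image: cassandra:3.11
  ports:
    - "9042:9042"   # expose the CQL native transport port
```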
I can simply run it with the below command. It will start a single-node Cassandra cluster on port 9042 on my local machine.
docker-compose up -d cassandra
Then I need to create the Cassandra keyspace and table. The data coming from the Kafka source will be sunk into this table.
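The keyspace is mystiko and the table is log. A minimal CQL sketch (the column set here is an assumption based on the JSON log messages; the exact schema is in the repo):

```sql
-- create keyspace and table (a sketch; the exact schema is in the repo)
CREATE KEYSPACE IF NOT EXISTS mystiko
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- the columns here are an assumption based on the JSON log messages
CREATE TABLE IF NOT EXISTS mystiko.log (
  service   text,
  timestamp timestamp,
  msg       text,
  PRIMARY KEY (service, timestamp)
);
```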
The next thing is to set up Kafka. Following is the content of docker-compose.yml which corresponds to running Zookeeper and Kafka.
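A sketch of those entries, assuming the wurstmeister Zookeeper/Kafka images (the images and environment variables are assumptions; the exact file is in the repo):

```yaml
# zookeeper and kafka service entries (sketch; images and env vars are assumptions)
zookeeper:
  image: wurstmeister/zookeeper
  ports:
    - "2181:2181"

kafka:
  image: wurstmeister/kafka
  ports:
    - "9092:9092"
  environment:
    KAFKA_ADVERTISED_HOST_NAME: localhost      # advertise the broker on localhost
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181    # zookeeper connection string
    KAFKA_CREATE_TOPICS: "log:1:1"             # auto-create the log topic (1 partition, 1 replica)
```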
I can start Zookeeper and Kafka with the below commands. Here I'm running a single instance of Zookeeper and Kafka. Kafka will be available on port 9092 on my local machine.
docker-compose up -d zookeeper
docker-compose up -d kafka
In my sbt application I need to define the akka-streams dependency and the Alpakka connector dependencies (Cassandra and Kafka). Following is the build.sbt file.
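A sketch of the dependency section (the version numbers and the choice of spray-json for JSON parsing are assumptions; the exact build.sbt is in the repo):

```scala
// build.sbt - dependency sketch (versions are assumptions, see the repo for the exact ones)
libraryDependencies ++= Seq(
  "com.typesafe.akka"  %% "akka-stream"                   % "2.5.19",
  "com.typesafe.akka"  %% "akka-stream-kafka"             % "0.22",   // alpakka kafka connector
  "com.lightbend.akka" %% "akka-stream-alpakka-cassandra" % "0.20",   // alpakka cassandra connector
  "io.spray"           %% "spray-json"                    % "1.3.5"   // JSON parsing for log messages
)
```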
Then we need to define the Kafka consumer configuration in application.conf. Following are the configurations.
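A sketch of the akka.kafka.consumer section (the group id and offset settings are assumptions; the exact application.conf is in the repo):

```hocon
# application.conf - kafka consumer configuration (sketch; group.id and offsets are assumptions)
akka.kafka.consumer {
  # settings passed straight to the underlying kafka client
  kafka-clients {
    bootstrap.servers = "localhost:9092"
    group.id = "octopus"
    enable.auto.commit = true
    auto.offset.reset = "earliest"
  }
}
```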
Now all the service configurations are done. The next thing is to implement the Akka Streams source, flows and sink, and build the graph.
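The full implementation is in the repo; below is a condensed sketch of the stream, assuming spray-json for parsing and a simple Log(timestamp, service, msg) schema (the case class fields, column names and versions are assumptions):

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Flow
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.alpakka.cassandra.scaladsl.CassandraSink
import com.datastax.driver.core.{BoundStatement, Cluster, PreparedStatement, Session}
import org.apache.kafka.common.serialization.StringDeserializer
import spray.json._
import DefaultJsonProtocol._

// message coming from kafka and row going to cassandra (field names are assumptions)
case class Message(msg: String, service: String)
case class Log(timestamp: java.util.Date, service: String, msg: String)

object LogStream extends App {
  implicit val system: ActorSystem = ActorSystem("octopus")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  implicit val messageFormat: RootJsonFormat[Message] = jsonFormat2(Message)

  // cassandra session used by the alpakka sink
  implicit val session: Session =
    Cluster.builder.addContactPoint("127.0.0.1").withPort(9042).build.connect()

  // kafka source - consumes JSON strings from the `log` topic
  // (broker and group id come from the akka.kafka.consumer section in application.conf)
  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
  val kafkaSource = Consumer.plainSource(consumerSettings, Subscriptions.topics("log"))

  // cassandra sink - cql insert plus a statementBinder that binds case class fields
  val preparedStatement: PreparedStatement =
    session.prepare("INSERT INTO mystiko.log(timestamp, service, msg) VALUES (?, ?, ?)")
  val statementBinder: (Log, PreparedStatement) => BoundStatement =
    (log, stmt) => stmt.bind(log.timestamp, log.service, log.msg)
  val cassandraSink = CassandraSink[Log](parallelism = 2, preparedStatement, statementBinder)

  // flow 1: kafka JSON string -> Message
  val messageFlow = Flow[String].map(json => json.parseJson.convertTo[Message])
  // flow 2: Message -> Log
  val logFlow = Flow[Message].map(m => Log(new java.util.Date(), m.service, m.msg))

  // graph: source -> flows -> sink
  kafkaSource
    .map(record => record.value)
    .via(messageFlow)
    .via(logFlow)
    .runWith(cassandraSink)
}
```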
In here, first I have defined the Kafka source and a Cassandra sink. The Kafka source is connected to the log topic (the Filebeat service publishes log messages as JSON strings to the log topic). The Cassandra sink is connected to the log table in the mystiko keyspace (Cassandra saves the logs in the log table of the mystiko keyspace). Inside the sink I have added the CQL query and a statementBinder which binds Scala case class fields to Java field types. Next I have defined two flows. One flow converts the Kafka JSON messages to Message objects. The other flow converts Message objects to Log objects. These Log objects will be saved in Cassandra. Finally I have defined the graph that connects the source with the flows and the sink.
In order to test this application I have used kafkacat. I'm publishing some sample log messages (JSON format strings) to Kafka with kafkacat. You can get the kafkacat command line tool from here. These messages will go through the stream pipeline and be saved in the Cassandra log table.
# connecting to kafka with kafkacat
# kafka broker - localhost:9092
# kafka topic - log
kafkacat -P -b localhost:9092 -t log

# publish messages
{"msg": "udpz.go:25 server started 7070", "service":"storage"}
- https://medium.com/@itseranga/publish-logs-to-kafka-with-filebeat-74497ef7dafe
- https://medium.com/@itseranga/kafka-consumer-with-scala-akka-streams-7e3237d6acc
- http://safdarhusain.com/DataIntegration2.html
- https://github.com/calvinlfer/alpakka-cassandra-sink-usage/blob/master/src/main/scala/com/experiments/calvin/ExampleApp.scala