11/3/2017


waldmark/spark-cassandra-batch-s3-examples

by John Doe


README.md

Spark Cassandra Example

Java example of Apache Spark consuming and processing 911 calls stored in Cassandra.

Requirements:

Java 8 installed
Cassandra
Scality S3 server

This demo was developed using Docker images running locally for Cassandra and Scality S3. Other Cassandra and S3 instances should work as well.

The example can be run from an IDE (like IntelliJ), or from a runnable jar. See instructions below on building the runnable uber-jar.

Standalone processing from a file

The class com.objectpartners.spark.rt911.standalone.MainApplication has a runnable main method. It first loads the call data into Cassandra, then uses the Spark Cassandra Connector to read and analyze that data, and finally stores the results in S3.

Building a runnable jar

A standalone jar can be created using Gradle. In a terminal, run the following from the project root directory:

gradle clean build
gradle shadowjar

The uber-jar will be built and placed in the {$project.dir}/build/libs directory.

Resources

src/main/resources contains two gzip files with 911 call data in CSV format:

Seattle_Real_Time_Fire_911_Calls_10_Test.csv.gz contains 10 lines of 911 call data and can be used for simple testing. Note that the application treats the first line as a header, so only 9 calls are actually processed.
Seattle_Real_Time_Fire_911_Calls_Chrono.csv.gz contains a much larger, chronologically ordered set of calls.
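
To make the header-line behavior concrete, here is a minimal sketch in plain Java of reading one of these gzip CSV resources and dropping the first line. It is not code from the repository: the class name CallCsvReader, the method readCalls, and the default file path are assumptions made for illustration, and the project's own loader may work differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not a class from the repository.
final class CallCsvReader {

    /** Returns every non-blank line after the first line of a gzip-compressed CSV stream. */
    static List<String> readCalls(InputStream gzipped) throws IOException {
        List<String> calls = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            // The first line is treated as the header and discarded.
            String line = reader.readLine();
            if (line == null) {
                return calls; // nothing to read
            }
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    calls.add(line); // raw CSV record, left unparsed in this sketch
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass a different file as the first argument if needed.
        String path = args.length > 0 ? args[0]
                : "src/main/resources/Seattle_Real_Time_Fire_911_Calls_10_Test.csv.gz";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println(readCalls(in).size() + " calls read");
        }
    }
}
```

Because the first line is always discarded, a 10-line input yields 9 records in this sketch, which matches the behavior described for the test file.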
