12/2/2020

Cassandra and Spark: Optimizing for Data Locality - Databricks

by John Doe

There are only three things that are important in doing analytics on a distributed database: locality, locality, and locality. Learn how the Cassandra-Spark connector builds RDDs and optimizes for interacting with local Cassandra machines. We'll go in depth into how Cassandra stores data in a cluster and the steps the open source connector uses for both reading and writing data to Cassandra. Discover the Cassandra-specific RDD functions that allow you to take advantage of underlying Cassandra mechanisms and perform lightning-fast analytics on the world's most scalable OLTP database. You will learn to take advantage of these strategies in your applications and make sure that you are making the most of your cluster resources.
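As a rough illustration of the locality idea described above, the sketch below maps partition keys to tokens on a toy ring and groups the keys by the node that owns them, so that each group could be handed to a worker running next to that node. Everything in it is an assumption made for illustration: the three-node `RING`, the node names, and the helper functions are hypothetical, and MD5 stands in for Cassandra's default Murmur3 partitioner. It is not the connector's implementation, only a picture of the kind of grouping that a replica-aware repartitioning step aims for.

```python
import bisect
import hashlib
from collections import defaultdict

# Hypothetical three-node ring, sorted by token. As in Cassandra, a node
# owns the range that ends at its token (previous token, own token].
RING = [
    (-6_000_000_000_000_000_000, "node-a"),
    (0, "node-b"),
    (6_000_000_000_000_000_000, "node-c"),
]


def token_for(key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    MD5 is only a stand-in here; Cassandra's default partitioner uses Murmur3.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner_of(token: int, ring) -> str:
    """Return the node with the first ring token >= token, wrapping around."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]


def group_by_owner(keys, ring):
    """Group partition keys by the node that owns their token."""
    groups = defaultdict(list)
    for key in keys:
        groups[owner_of(token_for(key), ring)].append(key)
    return dict(groups)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10)]
    for node, owned in sorted(group_by_owner(keys, RING).items()):
        print(node, owned)
```

A real cluster uses many virtual nodes per host and several replicas per token range, so the grouping is more involved than this single-owner lookup, but the principle is the same: work is scheduled where the data already lives.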


Learn more:

Spark and Cassandra: An Amazing Apache Love Story

Spark And Cassandra: 2 Fast, 2 Furious

Cassandra and SparkSQL: You Don't Need Functional Programming for Fun

Zen and the Art of Apache Spark Maintenance with Cassandra

