1/15/2018


markthebault/importCSVSparkCassandra

by John Doe


README.md

This short example shows how easy it is to import CSV files from your AWS S3 buckets into Cassandra using Spark.

Setting up your application

Clone this repository:

git clone http://gitlab.ippon.fr/mthebault/simplecsvexportspark.git

Then open the file 'src/main/ressources/project.conf' and change your settings.

You need to change the following values:

Cassandra
- host
- port
- keyspace
- table
AWS
- accessKey
- secretKey
- bucket
- fileName
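Before building, it can help to confirm that every setting listed above is present in the configuration file. The following is a minimal sketch and not part of the repository: the function name `missing_conf_keys` is hypothetical, and it assumes only that each key appears by name somewhere in the file, without assuming anything else about the file's syntax.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print every expected
# setting name that does not occur anywhere in the given config file.
# It only searches for the key names as whole words; it does not parse the file.
missing_conf_keys() {
    _conf=$1
    for _key in host port keyspace table accessKey secretKey bucket fileName; do
        # -q: no output, -w: match whole words only
        grep -qw -- "$_key" "$_conf" || printf '%s\n' "$_key"
    done
}

# Usage (prints nothing when all eight keys are present):
# missing_conf_keys src/main/ressources/project.conf
```

A check like this only catches forgotten keys; it cannot tell whether the values themselves are correct.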

Build a jar

To build the jar of your application, run:

sbt clean assembly

Deploy the jar on a Spark cluster

To deploy a jar on a Spark cluster, make sure port 7077 (the default port of a standalone Spark master) is reachable from outside. Then push the jar to a public S3 bucket:

aws s3 cp ./target/scala-2.10/ImportCSV.jar s3://YOUR_BUCKET/ImportCSV.jar
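The build and upload steps can be chained in a small script. This is only a sketch: the `DRY_RUN` switch and the function names `run` and `build_and_upload` are assumptions made for this example, while the two commands and the jar path are the ones given in this README. The uploaded object is named ImportCSV.jar so that it matches the URL passed to spark-submit.

```shell
#!/bin/sh
# Sketch: build the assembly jar and copy it to S3.
# DRY_RUN and the function names are assumptions made for this example.
JAR_PATH=./target/scala-2.10/ImportCSV.jar

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Argument: name of the target S3 bucket.
build_and_upload() {
    _bucket=$1
    run sbt clean assembly || return 1
    # Skip the existence check in dry-run mode, since nothing was built.
    if [ "${DRY_RUN:-0}" != 1 ] && [ ! -f "$JAR_PATH" ]; then
        echo "jar not found at $JAR_PATH" >&2
        return 1
    fi
    run aws s3 cp "$JAR_PATH" "s3://$_bucket/ImportCSV.jar"
}

# Usage: DRY_RUN=1 build_and_upload YOUR_BUCKET
```

Running it with `DRY_RUN=1` first shows the exact commands without building or uploading anything.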

Once that is done, run the spark-submit command as follows:

$SPARK_HOME/bin/spark-submit \
	--verbose \
	--master spark://IP_SPARK_MASTER:PORT \
	--deploy-mode cluster \
	--driver-class-path /spark/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar \
	--class Application \
	https://s3-eu-west-1.amazonaws.com/YOUR_BUCKET/ImportCSV.jar

Note: This example uses a public S3 bucket for the jar. To use a private bucket instead, you can use a link of the following form:

http://AWS_S3_ACCESS_KEY:AWS_S3_SECRET_KEY@YOUR_BUCKET/ImportCSV.jar

Keep in mind that this link has to be URL-encoded; an online URL encoder can do this for you.
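The private-bucket link has to be encoded because AWS secret keys can contain characters such as `/` and `+`, which cannot appear unescaped in the credentials part of a URL. As a rough sketch, the encoding can also be done in plain shell instead of on a website. The function names `urlencode` and `private_jar_url` and the sample keys are made up for this example, and the code handles ASCII input only.

```shell
#!/bin/sh
# Percent-encode a string: RFC 3986 unreserved characters are kept,
# every other character becomes %XX. ASCII input only.
urlencode() {
    _rest=$1
    _out=
    while [ -n "$_rest" ]; do
        _tail=${_rest#?}           # input without its first character
        _ch=${_rest%"$_tail"}      # the first character itself
        _rest=$_tail
        case $_ch in
            [a-zA-Z0-9.~_-]) _out=$_out$_ch ;;
            *) _out=$_out$(printf '%%%02X' "'$_ch") ;;
        esac
    done
    printf '%s\n' "$_out"
}

# Build the private-bucket link in the form shown in the note.
# Arguments: access key, secret key, bucket (hypothetical values below).
private_jar_url() {
    printf 'http://%s:%s@%s/ImportCSV.jar\n' \
        "$(urlencode "$1")" "$(urlencode "$2")" "$3"
}

private_jar_url 'AKIAEXAMPLE' 'abc/def+ghi' 'YOUR_BUCKET'
# prints: http://AKIAEXAMPLE:abc%2Fdef%2Bghi@YOUR_BUCKET/ImportCSV.jar
```

The resulting link replaces the public https URL at the end of the spark-submit command.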

Next

If you want to contribute to this project, feel free to do so. If you spot a mistake, please open an issue.
