4/20/2018

jamesward/koober

by John Doe

README.md

A sample Uber-style data pipeline app built with Play Framework, Akka Streams, Kafka, Flink, Spark Streaming, and Cassandra.

Start Kafka:

./sbt kafkaServer/run

Web App:

1. Obtain an API key from mapbox.com
2. Start the Play web app: `MAPBOX_ACCESS_TOKEN=YOUR-MAPBOX-API-KEY ./sbt webapp/run`
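
Because the token is passed to the web app through the environment, it can help to fail fast when it is missing. The following is a minimal sketch, not part of the koober repository: `require_env` is a hypothetical helper name, and it only checks that the variable is exported and non-empty, not that the token is valid.

```shell
# Hypothetical helper (not part of koober): succeed only if the named
# variable is exported with a non-empty value, so that child processes
# such as ./sbt will see it.
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "missing required environment variable: $1" >&2
    return 1
  fi
}
```

For example, `require_env MAPBOX_ACCESS_TOKEN && ./sbt webapp/run` would start the app only when the token is set.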

Try it out:

1. Open the driver UI: http://localhost:9000/driver
2. Open the rider UI: http://localhost:9000/rider
3. In the rider UI, click on the map to position the rider.
4. In the driver UI, click on the rider to initiate a pickup.

Start Flink:

./sbt flinkClient/run

Initiate a few pickups and watch the average pickup wait time change in the stdout console of the Flink process.
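
The README does not say how the Flink job aggregates wait times (for example, whether it uses a window). Purely to illustrate why the printed average shifts as pickups arrive, here is a sketch of a cumulative running mean. The `running_average` function and the sample wait times are hypothetical, and this is not the Flink job's logic.

```shell
# Hypothetical illustration: read one wait time per line from stdin and
# print the mean of all values seen so far after each one.
running_average() {
  awk 'NF { n += 1; sum += $1; printf "%.1f\n", sum / n }'
}

# Three made-up wait times in seconds; prints 30.0, 40.0, 40.0 on separate lines.
printf '%s\n' 30 50 40 | running_average
```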

Start Cassandra:

./sbt cassandraServer/run

Start the Spark Streaming process:

./sbt kafkaToCassandra/run

Watch the ride data get micro-batched from Kafka into Cassandra.

Set up the PredictionIO pipeline:

Set up PIO

Set the PIO Access Key:

 export PIO_ACCESS_KEY=<YOUR PIO ACCESS KEY>

Start the PIO pipeline:

```
./sbt pioClient/run
```

Copy demo data into Kafka or PIO:

For fake data, run:

./sbt "demoData/run <kafka|pio> fake <number of records> <number of months> <number of clusters>"

For New York data, run:

./sbt "demoData/run <kafka|pio> ny <number of months> <sample rate>"
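
The two invocations differ only in the arguments that follow the data source. As a sketch of that argument layout, the hypothetical `demo_data_cmd` function below (not part of the repository) validates the target and the argument count, then prints the sbt command rather than running it. The numeric values in the usage lines are placeholders chosen for illustration only.

```shell
# Hypothetical helper: print (do not run) the demoData command for the
# given target and data source, following the argument order in the README.
demo_data_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: demo_data_cmd <kafka|pio> <fake|ny> <args...>" >&2
    return 1
  fi
  target="$1"
  kind="$2"
  shift 2

  case "$target" in
    kafka|pio) ;;
    *) echo "target must be kafka or pio" >&2; return 1 ;;
  esac

  case "$kind" in
    fake) expected=3 ;;  # number of records, months, clusters
    ny)   expected=2 ;;  # number of months, sample rate
    *) echo "data source must be fake or ny" >&2; return 1 ;;
  esac

  if [ "$#" -ne "$expected" ]; then
    echo "$kind needs $expected arguments, got $#" >&2
    return 1
  fi

  printf './sbt "demoData/run %s %s %s"\n' "$target" "$kind" "$*"
}

demo_data_cmd kafka fake 1000 6 3   # prints: ./sbt "demoData/run kafka fake 1000 6 3"
demo_data_cmd pio ny 6 0.01         # prints: ./sbt "demoData/run pio ny 6 0.01"
```

Printing the command instead of executing it makes it easy to check the arguments before loading data into Kafka or PIO.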

Start the Demand Dashboard:

PREDICTIONIO_URL=http://asdf.com MAPBOX_ACCESS_TOKEN=YOUR_MAPBOX_TOKEN ./sbt demandDashboard/run
