Cassandra.Link

4/9/2021

Anant/example-cassandra-alpakka-twitter

by Anant

This project is a Scala application which uses Alpakka Cassandra 2.0, Akka Streams and Twitter4S (Scala Twitter Client) to pull new Tweets from Twitter for a given hashtag (or set of hashtags) using Twitter API v1.1 and write them into a local Cassandra database.

NOTE: The project only saves tweets that are not retweets of another tweet, and it currently saves only the truncated version of each tweet (<=140 chars).
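
The behaviour described in the note can be illustrated with a small, dependency-free sketch. `SimpleTweet` and `TweetFilter` are hypothetical names used only here; they are not the Twitter4S `Tweet` type or the project's own code. The sketch assumes a retweet is marked by a boolean flag and approximates the truncated text with `take(140)`.

```scala
// Hypothetical stand-in for a tweet; the real project works with the Twitter4S Tweet type.
final case class SimpleTweet(id: Long, text: String, isRetweet: Boolean)

object TweetFilter {
  // Assumed cap, taken from the "<=140 chars" remark in the note.
  val MaxExcerptLength = 140

  // Drop retweets, then map each remaining tweet to the (id, excerpt) pair that would be stored.
  def toRows(tweets: Seq[SimpleTweet]): Seq[(Long, String)] =
    tweets
      .filterNot(_.isRetweet)
      .map(t => (t.id, t.text.take(MaxExcerptLength)))
}
```

Calling `TweetFilter.toRows` on a mixed batch returns only the original tweets, each with an excerpt of at most 140 characters.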


Requirements

  • Scala 2.12+
  • JDK 8
  • sbt (this project uses 1.4.9)
  • Docker (and required RAM for running a Cassandra container)

Table of Contents

  1. Set up and run local Cassandra using Docker
  2. Configure Twitter API keys
  3. Set up hashtags and run the project using sbt
  4. Observe results in Cassandra using cqlsh

1. Cassandra Setup

1.1 - Make sure you have Docker installed on your machine. Run the following command to start a local Cassandra container with port 9042 exposed:

docker run -p 9042:9042 --rm --name my-cassandra -d cassandra

1.2 - Make sure your container is running (you may need to give it a few minutes to boot up):

docker ps -a

In the original run, the docker ps output showed the container up for 3 minutes, with local port 9042 bound to port 9042 in the container (the default port for Cassandra).

1.3 - Next, run cqlsh on the container in interactive terminal mode to set up the keyspace and table:

docker exec -it my-cassandra cqlsh

1.4 - Once cqlsh starts, create the keyspace and table for this demo. The INSERT adds one test row, and exit leaves cqlsh:

CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE testkeyspace.testtable(id bigint PRIMARY KEY, excerpt text);
INSERT INTO testkeyspace.testtable(id, excerpt) VALUES (37, 'appletest');
exit

2. Twitter Setup

2.1 - From the root folder of this repository, locate the example configuration file at src/main/resources/application.conf.example. Copy it within the same directory and name the copy application.conf:

cp src/main/resources/application.conf.example src/main/resources/application.conf

2.2 - Go to the Twitter developer dashboard, register an application, and insert its four Twitter API keys into this portion of application.conf:

twitter {
  consumer {
    key = "consumer-key-here"
    secret = "consumer-secret-here"
  }
  access {
    key = "access-key-here"
    secret = "access-token-here"
  }
}
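
Before starting the application, it can help to confirm that none of the four values still holds its placeholder. The following is a minimal sketch of such a check, assuming the settings have already been loaded into a plain `Map` keyed by dotted path; `TwitterKeyCheck` and `unconfigured` are hypothetical names, and the sketch does not read application.conf itself.

```scala
object TwitterKeyCheck {
  // Setting paths and placeholder values copied from the application.conf fragment above.
  val placeholders: Seq[(String, String)] = Seq(
    "twitter.consumer.key"    -> "consumer-key-here",
    "twitter.consumer.secret" -> "consumer-secret-here",
    "twitter.access.key"      -> "access-key-here",
    "twitter.access.secret"   -> "access-token-here"
  )

  // Returns the paths that are absent, blank, or still set to their placeholder.
  def unconfigured(settings: Map[String, String]): Seq[String] =
    placeholders.collect {
      case (path, placeholder)
          if settings.get(path).forall(v => v.trim.isEmpty || v == placeholder) =>
        path
    }
}
```

An empty result means every key has been filled in; otherwise the returned paths point to the entries that still need a real value.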

3. Running The Project

3.1 - Open src/main/scala/com/alptwitter/AlpakkaTwitter.scala and edit the line val trackedWords = Seq("#myHashtag") so that it lists the hashtags you want to track new tweets for (adjust the path below to wherever you cloned the repository):

vim /workspace/example-cassandra-alpakka-twitter/src/main/scala/com/alptwitter/AlpakkaTwitter.scala

To track more than one hashtag, add more strings to the Seq, separated by commas.
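
As an illustration, a multi-hashtag value for `trackedWords` might look like the sketch below. The hashtags are example values, and `TrackedWords.mentionsAny` is a hypothetical local helper added only to show what "contains any of the tracked words" means; in the project, matching is handled by the Twitter API rather than by code like this.

```scala
import java.util.Locale

object TrackedWords {
  // Example values only; replace them with the hashtags you want to follow.
  val trackedWords: Seq[String] = Seq("#cassandra", "#akka", "#scala")

  // Case-insensitive check that a tweet's text mentions at least one tracked word.
  def mentionsAny(text: String, words: Seq[String] = trackedWords): Boolean = {
    val lowered = text.toLowerCase(Locale.ROOT)
    words.exists(w => lowered.contains(w.toLowerCase(Locale.ROOT)))
  }
}
```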

3.2 - The project can then be run by navigating to the root folder of the project and running:

sbt run

As new tweets containing any of the hashtags in the trackedWords variable are posted, the console prints a message for each one saying whether it was a retweet or a unique tweet.


4. Observe Tables

4.1 - As new tweets (not retweets) with your chosen hashtags are posted and found, they are saved to Cassandra as (tweet id, tweet text) entries in testkeyspace.testtable. To check that the tweets are being saved, run cqlsh on the Cassandra container and query the table:

docker exec -it my-cassandra cqlsh
SELECT * FROM testkeyspace.testtable; 
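
To make the stored shape concrete, the sketch below renders an (id, excerpt) pair as the equivalent CQL INSERT, which can be pasted into cqlsh to add a row by hand. `CqlPreview` is a hypothetical helper for illustration only; the application itself presumably writes through the Alpakka Cassandra connector rather than through string-built CQL.

```scala
object CqlPreview {
  // CQL string literals escape a single quote by doubling it.
  def quote(text: String): String =
    "'" + text.replace("'", "''") + "'"

  // Renders an INSERT for the demo table created in step 1.4.
  def insertStatement(id: Long, excerpt: String): String =
    s"INSERT INTO testkeyspace.testtable(id, excerpt) VALUES ($id, ${quote(excerpt)});"
}
```

For example, `CqlPreview.insertStatement(37L, "appletest")` yields the same statement used for the test row in step 1.4.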

References / Useful Links:

Twitter4S (Twitter for Scala) Github Repository

Twitter4S definition of Tweet object

Alpakka Cassandra Documentation
