This project is a Scala application which uses Alpakka Cassandra 2.0, Akka Streams and Twitter4S (Scala Twitter Client) to pull new Tweets from Twitter for a given hashtag (or set of hashtags) using Twitter API v1.1 and write them into a local Cassandra database.
NOTE: The project only saves tweets that are not retweets of another tweet, and it currently saves only the truncated version of each tweet (<=140 chars).
Requirements
- Scala 2.12+
- JDK 8
- sbt (this project uses 1.4.9)
- Docker (and required RAM for running a Cassandra container)
Table of Contents
- Set up and run local Cassandra using Docker
- Configure Twitter API keys
- Set up hashtags and run the project using sbt
- Observe results in Cassandra using cqlsh
1. Cassandra Setup
1.1 - Make sure you have Docker installed on your machine. Run the following docker command to start a local Cassandra container with port 9042 exposed:
docker run -p 9042:9042 --rm --name my-cassandra -d cassandra
1.2 - Make sure your container is running (you may need to give the container a few minutes to boot up):
docker ps -a
In the output of this command, the STATUS column shows how long the container has been running (for example, "Up 3 minutes"), and the PORTS column shows that local port 9042 is bound to port 9042 in the container (the default port for Cassandra).
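As a complement to docker ps, the port binding can also be probed from the JVM. The following is a minimal sketch, not part of the project: the object name CassandraPortCheck and the helper isPortOpen are hypothetical, and it assumes the container was started with the port mapping from step 1.1. An open port only shows that something is listening; it does not guarantee that Cassandra has finished starting.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper, not part of the project.
object CassandraPortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
  def isPortOpen(host: String, port: Int, timeoutMillis: Int): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMillis)).isSuccess
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // 9042 is the local port published by the docker run command in step 1.1.
    val open = isPortOpen("127.0.0.1", 9042, 2000)
    println(if (open) "Port 9042 is accepting connections" else "Port 9042 is not reachable yet")
  }
}
```

If the port is not reachable yet, waiting a little longer and re-running docker ps -a is usually enough.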
1.3 - Afterwards, run cqlsh in the container in interactive terminal mode to set up the keyspace and table:
docker exec -it my-cassandra cqlsh
1.4 - Once CQLSH comes up, create the necessary keyspace and table for this demo.
CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE testkeyspace.testtable(id bigint PRIMARY KEY, excerpt text);
INSERT INTO testkeyspace.testtable(id, excerpt)
VALUES (37, 'appletest');
exit
2. Twitter Setup
2.1 - From the root folder of this repository, browse to the application.conf.example file found in /src/main/resources/application.conf.example. Copy this file into the same directory and rename it application.conf:
mv src/main/resources/application.conf.example src/main/resources/application.conf
2.2 - Go to the Twitter developer dashboard website, register an application, and insert the four Twitter API keys into this portion of application.conf:
twitter {
  consumer {
    key = "consumer-key-here"
    secret = "consumer-secret-here"
  }
  access {
    key = "access-key-here"
    secret = "access-token-here"
  }
}
3. Running The Project
3.1 - Navigate to /src/main/scala/com/alptwitter/AlpakkaTwitter.scala, for example:
vim /workspace/example-cassandra-alpakka-twitter/src/main/scala/com/alptwitter/AlpakkaTwitter.scala
and change the following line to indicate which hashtags you wish to look at new tweets for:
val trackedWords = Seq("#myHashtag")
If you want to track more than one hashtag, add more strings to the Seq, separated by commas.
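A sketch of what a multi-hashtag list might look like is shown here. The second hashtag is a placeholder, and the mentionsAny helper is hypothetical and not part of the project; it only illustrates the "any of the listed hashtags" behaviour in plain Scala.

```scala
// Illustrative only; the project itself just needs the trackedWords line.
object TrackedWordsExample {
  // Several hashtags, separated by commas (the values are placeholders).
  val trackedWords: Seq[String] = Seq("#myHashtag", "#myOtherHashtag")

  // Hypothetical helper: true if the text contains any tracked word, ignoring case.
  def mentionsAny(text: String, words: Seq[String]): Boolean = {
    val lower = text.toLowerCase
    words.exists(word => lower.contains(word.toLowerCase))
  }
}
```

For example, TrackedWordsExample.mentionsAny("Trying out #MyOtherHashtag", TrackedWordsExample.trackedWords) evaluates to true because the comparison ignores case.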
3.2 - The project can then be run by navigating to the root folder of the project and running:
sbt run
As new tweets are posted that contain any of the hashtags listed in the trackedWords variable, a message is printed to the console stating whether the tweet was a retweet or a unique tweet.
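The retweet-versus-unique distinction can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: SimpleTweet is a hypothetical, simplified stand-in for a tweet (it is not the Twitter4S class), the retweetOf field is an assumed way of marking a retweet, and the message wording is invented for the example.

```scala
object RetweetCheck {
  // Hypothetical, simplified tweet model; retweetOf is set only for retweets.
  final case class SimpleTweet(id: Long, text: String, retweetOf: Option[SimpleTweet] = None)

  // A tweet counts as a retweet when it refers to an original tweet.
  def isRetweet(tweet: SimpleTweet): Boolean = tweet.retweetOf.isDefined

  // Builds an illustrative console message (the wording is an assumption).
  def describe(tweet: SimpleTweet): String =
    if (isRetweet(tweet)) s"Tweet ${tweet.id} is a retweet, skipping"
    else s"Tweet ${tweet.id} is a unique tweet, saving"
}
```

Keeping the check in a small pure function makes it easy to apply as a filter step before anything is written to the database.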
4. Observe Tables
4.1 - As new tweets (not retweets) containing your hashtags are posted and found, they are saved to Cassandra as a (tweet id, text of tweet) entry in testkeyspace.testtable. To check that the tweets are being saved, run cqlsh in the Cassandra container and query the table:
docker exec -it my-cassandra cqlsh
SELECT * FROM testkeyspace.testtable;
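The (tweet id, text of tweet) entries map onto the id and excerpt columns created in step 1.4. The sketch below shows one way that mapping could look in plain Scala; the TweetRow object, the SimpleTweet class, and the toBindValues helper are hypothetical names, and the parameterized statement is an assumption based on the table definition rather than the project's actual write code.

```scala
object TweetRow {
  // Hypothetical, simplified tweet model holding only the two saved fields.
  final case class SimpleTweet(id: Long, text: String)

  // Parameterized CQL matching the table created in step 1.4 (assumed statement).
  val insertCql: String =
    "INSERT INTO testkeyspace.testtable(id, excerpt) VALUES (?, ?)"

  // Values in placeholder order; the id is boxed because bigint binds as java.lang.Long.
  def toBindValues(tweet: SimpleTweet): (java.lang.Long, String) =
    (Long.box(tweet.id), tweet.text)
}
```

Using the tweet id as the primary key means that writing the same tweet twice overwrites the existing row instead of creating a duplicate.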