Kick-Start with SMACK Stack

  1. Kick-Start with SMACK Stack
     Sandeep Purohit, Software Consultant, Knoldus Software LLP
  2. Agenda:
     ● What is SMACK?
     ● Why SMACK?
     ● Brief introduction to the technologies
     ● How to integrate the technologies to create the data pipeline
     ● Demo
  3. What is SMACK?
     ● Spark: Apache Spark is a fast, general-purpose cluster computing system.
     ● Mesos: A cluster resource management system that provides efficient resource allocation.
     ● Akka: A toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
     ● Cassandra: The Apache Cassandra database is the right choice when you need scalability and high availability.
     ● Kafka: A distributed messaging system for handling real-time data.
  4. Why SMACK?
     ● SMACK provides the pipelined data architecture that real-time data analysis requires.
     ● SMACK puts each technology at the right place to form an efficient data pipeline.
     ● SMACK lets you scale the whole cluster linearly without hassle.
  5. SMACK Pipeline Architecture
  6. Why Spark?
     ● It is a general-purpose big data processing engine with four main components: Spark Core, Spark Streaming, Spark ML, and Spark GraphX.
     ● We can process our data with any of these components in real time.
     ● It provides fault tolerance for real-time applications.
  7. Why Cassandra?
     ● Cassandra has no single point of failure.
     ● Cassandra's write path is fast enough to handle real-time data easily.
     ● It supports a multi-datacenter architecture, so we can use different DCs for different workloads.
     Diagram labels: Cassandra Cluster; Ingestion DC; Analysis DC
  8. Why Mesos?
     Diagram labels: Mesos Master; Mesos Master (standby), twice; ZooKeeper; Mesos Slave, three times
  9. Models in SMACK
     ● In SMACK, the models are Scala and Akka.
     ● We can use the models to write highly concurrent and parallel applications.
     ● Example: we can pick Akka modules to suit our use case, such as akka-http, akka-scheduler, and Akka priority mailboxes.
  10. Models used in SMACK
      Diagram labels: Akka-Http; Akka-Scheduler
  11. Why Kafka?
      ● It handles streams of data efficiently and in real time.
      ● Use Kafka for fault tolerance.
      ● It creates a bridge between two applications.
      Diagram labels: Streaming Source; Kafka Broker; Spark Receiver
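The bridging and fault-tolerance points on the Kafka slide can be illustrated with a toy model. This is a minimal sketch in plain Python, not the Kafka client API: `ToyTopic`, `append`, `poll`, and `commit` are hypothetical names introduced here. It assumes the general log-plus-offset idea, where a producer appends to a log and a consumer that fails before committing resumes from its last committed offset.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for one topic partition (not Kafka's API)."""

    def __init__(self):
        self._log = []                      # append-only list of messages
        self._committed = defaultdict(int)  # consumer group -> next offset to read

    def append(self, message):
        """Producer side: add a message and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        """Consumer side: read from the last committed offset without advancing it."""
        start = self._committed[group]
        return self._log[start:start + max_messages]

    def commit(self, group, count):
        """Mark `count` more messages as processed for this group."""
        self._committed[group] = min(self._committed[group] + count, len(self._log))


topic = ToyTopic()
for tweet in ["#spark rocks", "#kafka too", "#spark again"]:
    topic.append(tweet)

first_batch = topic.poll("hashtag-job", max_messages=2)
topic.commit("hashtag-job", len(first_batch))

# Simulate a crash: the next batch is read but never committed...
lost_batch = topic.poll("hashtag-job")
# ...so after a restart the same messages are delivered again.
replayed_batch = topic.poll("hashtag-job")
```

Because the producer only appends and each consumer group tracks its own offset, the two applications never talk to each other directly, which is the "bridge" role the slide describes.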
  12. Architecture of Spark and Cassandra
      Diagram labels: Cassandra Cluster; Spark Worker, four times
      Spark worker nodes read data from the local node, which avoids network latency.
  13. Spark, Mesos, Cassandra
      Mesos slaves and Cassandra nodes are collocated to give Spark better data locality.
      Diagram labels: Driver Program; Mesos Master; Mesos slave with Cassandra node, three times
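The data-locality idea behind collocating workers with storage nodes can be sketched as a small placement function. This is an illustrative toy in plain Python, not how Spark or Mesos schedule tasks internally: `place_tasks`, the host names, and the partition ids are all hypothetical. It prefers a worker on a host that already stores the partition and falls back to a remote read otherwise.

```python
def place_tasks(partition_replicas, worker_hosts):
    """Assign each partition to a worker, preferring a host that holds a replica.

    partition_replicas: dict of partition id -> list of hosts storing that data
    worker_hosts: list of hosts that run a worker
    Returns dict of partition id -> (host, is_local).
    """
    load = {host: 0 for host in worker_hosts}
    placement = {}
    for partition, replicas in sorted(partition_replicas.items()):
        local = [h for h in replicas if h in load]
        candidates = local or worker_hosts
        # Pick the least-loaded candidate; ties resolve by host name for determinism.
        host = min(candidates, key=lambda h: (load[h], h))
        load[host] += 1
        placement[partition] = (host, bool(local))
    return placement


# Hypothetical hosts: node-1..node-3 each run a worker and store data (collocated).
replicas = {
    "p0": ["node-1", "node-2"],
    "p1": ["node-2", "node-3"],
    "p2": ["node-3", "node-1"],
    "p3": ["node-9"],  # data on a host with no worker: must be read remotely
}
placement = place_tasks(replicas, ["node-1", "node-2", "node-3"])
```

In this toy setup only `p3` ends up with `is_local` set to `False`, since its data lives on a host without a worker. Collocating every storage node with a worker removes that case.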
  14. Demo Application Architecture
      Diagram labels: Tweets; Store tweets in Kafka topic; Retrieve hashtags; Evaluate top hashtag every 10 seconds; Store tweets in Cassandra table
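The "retrieve hashtags" and "top hashtag every 10 seconds" steps can be sketched without any cluster. The slides do not show the demo's code, so the following is only a plain-Python stand-in for the streaming job, assuming fixed, non-overlapping 10-second windows. The function names and sample tweets are made up for illustration.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
WINDOW_SECONDS = 10


def extract_hashtags(text):
    """Return the lower-cased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def top_hashtag_per_window(tweets, window=WINDOW_SECONDS):
    """tweets: iterable of (timestamp_seconds, text).

    Groups tweets into fixed, non-overlapping windows and returns
    {window_start: (hashtag, count)} for windows that contain a hashtag.
    """
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        window_start = int(timestamp // window) * window
        counts[window_start].update(extract_hashtags(text))
    result = {}
    for start, counter in sorted(counts.items()):
        if counter:
            # Break count ties alphabetically so the result is deterministic.
            result[start] = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return result


# Made-up sample data, not taken from the demo.
sample = [
    (1, "Loving #Spark and #Kafka"),
    (4, "#spark streaming is neat"),
    (9, "no tags here"),
    (12, "#Cassandra writes are fast"),
    (15, "#kafka #cassandra"),
]
top = top_hashtag_per_window(sample)
```

In the architecture on the slide, the tweets would arrive through the Kafka topic rather than from an in-memory list.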
  15. Demo: SMACK_Tweets
  16. Thank You!!