1/15/2018

Reading time: 7 min

SMACK Stack 1.1

by Joe Stein




1. SMACK Stack 1.1
2. Elodina is a big data as a service platform built on top of open source software. The Elodina platform solves today’s data analytics needs by providing the tools and support necessary to utilize open source technologies. http://www.elodina.net/
3. What’s SMACK Stack? SMACK Stack 1.0 has traditionally been Spark, Mesos, Akka, Cassandra and Kafka. Lots has been written about it (https://dzone.com/articles/smack-stack-guide) and lots, lots more (https://www.google.com/webhp?q=smack%20stack). Now we are going to introduce SMACK Stack 1.1 and talk more about dynamic compute, microservices, orchestration and micro-segmentation, all part of what you can do now with Streaming, Mesos, Analytics, Cassandra and Kafka.
4. The free lunch is over! http://www.gotw.ca/publications/concurrency-ddj.htm
5. Many industries still don’t get it. XML is everywhere, but we have alternatives! We can support an XML interface without taking on the burden of the extra data. You can save A LOT of overhead just by adding a pre-processing step that takes the XML, turns it into Avro, and processes and stores that. It works: https://github.com/elodina/xml-avro You can even process the response in Avro but return the result in XML; more on that later, though!
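The pre-processing step on slide 5 can be sketched roughly as follows. This is only an illustration, not the xml-avro project's code: the `<trade>` document, its field names, and the helper functions are hypothetical. The encoder hand-rolls just the part of the Avro binary encoding needed here (zigzag varint longs and length-prefixed UTF-8 strings) with the Python standard library; a real pipeline would use an Avro library and a proper schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; the element names are made up for illustration.
XML_DOC = (
    "<trade><symbol>ACME</symbol><quantity>250</quantity>"
    "<account>A-1001</account></trade>"
)


def encode_long(n: int) -> bytes:
    """Encode a 64-bit integer as Avro does: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag keeps small negative numbers small
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro strings are a long byte count followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def xml_to_record(xml_text: str) -> dict:
    """Pull the typed fields out of the (hypothetical) XML document."""
    root = ET.fromstring(xml_text)
    return {
        "symbol": root.findtext("symbol"),
        "quantity": int(root.findtext("quantity")),
        "account": root.findtext("account"),
    }


def encode_record(rec: dict) -> bytes:
    """A record is its fields concatenated in schema order, with no tag names."""
    return (
        encode_string(rec["symbol"])
        + encode_long(rec["quantity"])
        + encode_string(rec["account"])
    )


record = xml_to_record(XML_DOC)
payload = encode_record(record)
print(len(XML_DOC.encode("utf-8")), "bytes of XML ->", len(payload), "bytes encoded")
```

The saving comes from dropping the repeated tag names: the schema carries the field names once, so the payload only carries values.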
6. You need to be running Mesos. Lots of options here! What is most important is that you abstract your “Provider” from your “Grid”. What is “The Grid”? It is the PaaS layer you deploy to that runs your software (aka your new awesome supercomputer). The grid is your Mesos cluster. You are likely going to have more than one, so plan accordingly. Think of it as immutable infrastructure, the computer does. Step 1
7. “Provider” of compute resources
8. The Grid … 2.0 ... https://github.com/elodina/sawfly/blob/master/cloud-deploy-grid.md Program against your datacenter like it’s a single pool of resources. Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. Mesosphere’s Data Center Operating System (DCOS) is an operating system that spans all of the machines in a datacenter or cloud and treats them as a single computer, providing a highly elastic and highly scalable way of deploying applications, services, and big data infrastructure on shared resources. DCOS is based on Apache Mesos and includes a distributed systems kernel with enterprise-grade security.
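The "single pool of resources" idea on slide 8 can be pictured with a toy placement loop. This is a minimal sketch and not the Mesos API: the `Offer` and `Task` classes, the agent names, and the first-fit rule are all assumptions made for illustration. The task sizes are borrowed from the Storm definitions on slides 35 and 36.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources one machine makes available (hypothetical shape)."""
    agent: str
    cpus: float
    mem: int  # MB


@dataclass
class Task:
    """Resources one task asks for (hypothetical shape)."""
    name: str
    cpus: float
    mem: int  # MB


def place(tasks, offers):
    """First-fit placement: each task goes to the first agent with room left."""
    remaining = {o.agent: [o.cpus, o.mem] for o in offers}
    placement = {}
    for task in tasks:
        placement[task.name] = None  # stays None if nothing fits
        for agent, free in remaining.items():
            if free[0] >= task.cpus and free[1] >= task.mem:
                free[0] -= task.cpus
                free[1] -= task.mem
                placement[task.name] = agent
                break
    return placement


offers = [Offer("agent-1", 1.0, 2048), Offer("agent-2", 2.0, 4096)]
tasks = [Task("storm-nimbus", 1.0, 1024), Task("storm-ui", 0.2, 512)]
print(place(tasks, offers))
```

The caller never names a machine; it states what each task needs and the pool decides where it runs, which is the separation between "Provider" and "Grid" that slide 6 asks for.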
9. Data Center Optimization!
10. But there is more!
    ● Provisioning
    ● Micro Segmentation
    ● Orchestration
    ● Configuration Management
    ● Service Discovery
    ● Deployment Isolation and Identification
    ● Telemetry, Tracing, Ops Stuff, Etc
    ● Oh My!
    It boils back down into stacks (https://github.com/elodina/stack-deploy) and, ultimately, how you are working with your schedulers in your cluster.
11. Stack Deploy to the rescue!
12. In the Grid you need Schedulers!
    ● Kafka – Producer/Consumer-based message queue management
    ● Exhibitor – Supervisor for distributed persistence (like ZooKeeper)
    ● Cassandra/DSE – HA, scalable, distributed NoSQL data storage
    ● Storm – Topology-based real-time distributed data streaming
    ● Monarch – Distributed Remote Procedure Calls, Kafka REST interface and schema repository
    ● Zipkin – Configure, launch and manage Zipkin distributed trace on Mesos
    ● HDFS – Configure, launch and manage HDFS on Mesos (coming soon)
    ● Stockpile – Consumer to “stock pile” data into persistent storage (Mesos scheduler only for C* now)
    ● MirrorMaker – Consumer to make a mirror copy of data to destination
    ● StatsD – Producer to pump StatsD on Mesos into Kafka for consumption, preserves layers
    ● SysLog – Producer to pump Syslog on Mesos into Kafka for consumption, preserves layers
    https://github.com/elodina/
13. Virtual Telemetry “Data Center” In the Grid
    ZipkinQATeamBuild92
    ● 1x Exhibitor-Mesos
    ● 1x Exhibitor
    ● 1x DSE-Mesos
    ● 1x Cassandra node
    ● 1x Kafka-Mesos
    ● 1x Kafka 0.8 broker
    ● 1x Zipkin-Mesos
    ● 1x Zipkin Collector
    ● 1x Zipkin Query
    ● 1x Zipkin Web
    Diagram labels: “cluster”, “zone”, “Stack” - defaultSimpleZipkinFull, “data center”
14. Stack Deploy In Action
    ./stack-deploy addlayer --file stacks/cassandra_dc.stack --level datacenter
    ./stack-deploy addlayer --file stacks/cassandra_cluster.stack --level cluster --parent cassandra_dc
    ./stack-deploy addlayer --file stacks/cassandra_zone1.stack --level zone --parent cassandra_cluster
    ./stack-deploy addlayer --file stacks/cassandra_zone2.stack --level zone --parent cassandra_cluster
    ./stack-deploy add --file stacks/cassandra.stack
    ./stack-deploy run cassandra --zone cassandra_zone1
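The commands on slide 14 build a datacenter, cluster and zone hierarchy through `--level` and `--parent`. One way to picture such a hierarchy is sketched below. This is not stack-deploy's implementation: the `Layer` class, the settings keys, and the merge rule (a child layer overrides its parent) are assumptions chosen purely to illustrate layered configuration. Only the layer names come from the slide.

```python
class Layer:
    """One level of a hypothetical datacenter -> cluster -> zone hierarchy."""

    def __init__(self, name, level, settings, parent=None):
        self.name = name
        self.level = level
        self.settings = settings
        self.parent = parent

    def resolve(self):
        """Merge settings from the root layer down; the nearest layer wins."""
        merged = self.parent.resolve() if self.parent else {}
        merged.update(self.settings)
        return merged


# Layer names follow slide 14; every setting below is invented for the example.
dc = Layer("cassandra_dc", "datacenter", {"dc_name": "dc1", "replication": 3})
cluster = Layer("cassandra_cluster", "cluster", {"cluster_name": "telemetry"}, parent=dc)
zone1 = Layer("cassandra_zone1", "zone", {"rack": "r1"}, parent=cluster)
zone2 = Layer("cassandra_zone2", "zone", {"rack": "r2", "replication": 2}, parent=cluster)

print(zone1.resolve())
print(zone2.resolve())
```

Under this assumed rule, running a stack in `cassandra_zone1` would see the datacenter and cluster settings plus its own, while `cassandra_zone2` overrides one inherited value.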
15. Full Stack Deployments
16. Cassandra
17. Cassandra Multi DC
18. Cassandra https://github.com/elodina/datastax-enterprise-mesos
19. Start your nodes!
20. Apache Kafka
    • Apache Kafka: http://kafka.apache.org
    • Apache Kafka Source Code: https://github.com/apache/kafka
    • Documentation: http://kafka.apache.org/documentation.html
    • Wiki: https://cwiki.apache.org/confluence/display/KAFKA/Index
21. It often starts with just one data pipeline
22. Reuse of data pipelines for new producers
23. Reuse of existing providers for new consumers
24. Eventually the solution becomes the problem
25. Kafka decouples data-pipelines
26. Topics & Partitions
27. A high-throughput distributed messaging system rethought as a distributed commit log.
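Slides 26 and 27 compress two ideas into a few words: a topic is split into partitions, and each partition is an append-only log addressed by offset. A minimal in-memory sketch of both follows. It is a toy model, not Kafka's client API: the `Topic` class is hypothetical, and CRC32 is only a stand-in hash, not the one Kafka's clients use.

```python
import zlib


class Topic:
    """Toy model of a topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Stand-in hash: the same key always maps to the same partition.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value):
        """Append to the key's partition and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int):
        """Return every record at or after the offset; nothing is removed."""
        return self.partitions[partition][offset:]


topic = Topic("events", num_partitions=3)
partition, first = topic.append(b"user-1", "login")
_, second = topic.append(b"user-1", "click")
print(partition, first, second)
print(topic.read(partition, first))
```

Because reads do not consume records, several consumers can read the same partition at their own offsets, which is what lets one pipeline serve many consumers.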
28. Intra Cluster Replication
29. Mesos Kafka http://github.com/mesos/kafka
30. Streaming & Analytics
    ● The landscape of streaming is about to get more fragmented and harder to navigate. This is not all bad news, and it is not much different from where we were with NoSQL 6 years ago or so.
    ● Different systems are getting really (really (really)) good at different things.
        ○ DAG-based systems
        ○ Event-based systems
        ○ Query & Execution Engines
        ○ Streaming Engines
        ○ Etc!
31. GearPump
32. Airflow
33. Spring Cloud Data Flow
34. Storm (and Storm Topology based systems)
35. Storm Nimbus
    {
      "id": "storm-nimbus",
      "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm-mesos nimbus -c mesos.master.url=zk://zookeeper.service:2181/mesos -c storm.zookeeper.servers=\"[\"zookeeper.service\"]\" -c nimbus.thrift.port=$PORT0 -c topology.mesos.worker.cpu=0.5 -c topology.mesos.worker.mem.mb=615 -c worker.childopts=-Xmx512m -c topology.mesos.executor.cpu=0.1 -c topology.mesos.executor.mem.mb=160 -c supervisor.childopts=-Xmx128m -c mesos.executor.uri=http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz -c storm.log.dir=$(pwd)/logs",
      "cpus": 1.0,
      "mem": 1024,
      "ports": [31056],
      "requirePorts": true,
      "instances": 1,
      "uris": [
        "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
        "http://repo.elodina.s3.amazonaws.com/storm.yaml"
      ]
    }
36. Storm UI
    {
      "id": "storm-ui",
      "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm ui -c ui.port=$PORT0 -c nimbus.thrift.port=31056 -c nimbus.host=storm-nimbus.service -c storm.log.dir=$(pwd)/logs",
      "cpus": 0.2,
      "mem": 512,
      "ports": [31067],
      "requirePorts": true,
      "instances": 1,
      "uris": [
        "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
        "http://repo.elodina.s3.amazonaws.com/storm.yaml"
      ],
      "healthChecks": [
        {
          "protocol": "HTTP",
          "portIndex": 0,
          "path": "/",
          "gracePeriodSeconds": 120,
          "intervalSeconds": 20,
          "maxConsecutiveFailures": 3
        }
      ]
    }
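Long `cmd` strings like the ones on slides 35 and 36 are easy to break when written by hand, especially once an option value contains double quotes. One option, sketched here, is to build the definition as a data structure and let a JSON serializer handle the escaping. The helper names `storm_cmd` and `marathon_app` are hypothetical. The values are copied from the Storm UI slide, with its `healthChecks` block left out for brevity.

```python
import json

STORM_DIR = "storm-mesos-0.9.6"
REPO = "http://repo.elodina.s3.amazonaws.com"


def storm_cmd(binary, subcommand, options):
    """Assemble the shell command from a mapping of -c options."""
    flags = " ".join(f"-c {key}={value}" for key, value in options.items())
    return (
        f"cp storm.yaml {STORM_DIR}/conf && cd {STORM_DIR} "
        f"&& ./bin/{binary} {subcommand} {flags}"
    )


def marathon_app(app_id, cmd, cpus, mem, port):
    """Return the app definition as a plain dict, ready for json.dumps."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "ports": [port],
        "requirePorts": True,
        "instances": 1,
        "uris": [f"{REPO}/{STORM_DIR}.tgz", f"{REPO}/storm.yaml"],
    }


ui_cmd = storm_cmd("storm", "ui", {
    "ui.port": "$PORT0",
    "nimbus.thrift.port": "31056",
    "nimbus.host": "storm-nimbus.service",
    "storm.log.dir": "$(pwd)/logs",
})
storm_ui = marathon_app("storm-ui", ui_cmd, 0.2, 512, 31067)
print(json.dumps(storm_ui, indent=2))

# An option value containing double quotes, as in the Nimbus definition.
quoted_cmd = storm_cmd("storm-mesos", "nimbus", {
    "storm.zookeeper.servers": '"["zookeeper.service"]"',
})
print(json.dumps({"cmd": quoted_cmd}))
```

The second `print` shows the serializer inserting the backslashes that a hand-written definition would need around the ZooKeeper server list.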
37. Storm Kafka - new spouts & bolts for Kafka 8, 9, ...
38. Apache Kafka Streams
39. Go Kafka Client - Fan Out Processing
    https://github.com/elodina/go-kafka-client-mesos
    ● Dynamic Kafka Log workers
    ● Blue/Green Deploy Support
    ● Fan Out Processing
    ● Auditable
    ● Batches
    ● Scalable/Auto-Scalable
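The "fan out" and "batches" bullets on slide 39 describe a general pattern: one consumer hands batches of messages to a pool of workers. A minimal sketch of that pattern follows. It does not use or mirror the go-kafka-client API: the function names are hypothetical, the upper-casing step is a placeholder for real work, and the worker count and batch size are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(messages, size):
    """Split a list of messages into consecutive batches of at most `size`."""
    for start in range(0, len(messages), size):
        yield messages[start:start + size]


def process(batch):
    """Placeholder for real work: upper-case each message in the batch."""
    return [message.upper() for message in batch]


def fan_out(messages, workers=4, batch_size=3):
    """Process batches on a worker pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process, batches(messages, batch_size))
        return [item for batch in results for item in batch]


print(fan_out(["a", "b", "c", "d", "e", "f", "g"]))
```

`pool.map` returns results in submission order, so the output order matches the input even though batches may finish out of order.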
40. Questions? http://www.elodina.net
