Setting up Analytics Stack for Streaming Applications
- The SKACK Stack is an open-source, full-stack platform for real-time analysis of big data. It consists of Apache Spark, Kubernetes, Akka, Apache Cassandra, and Apache Kafka.
- GCP and GlusterFS act as the storage solution because they support multi-mount and the data remains on all GlusterFS and GCP nodes.
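As a rough illustration of the multi-mount point, the sketch below builds a Kubernetes PersistentVolume manifest backed by a GlusterFS volume with the `ReadWriteMany` access mode, which lets pods on several nodes mount the same volume at once. It assumes an older Kubernetes release that still ships the in-tree `glusterfs` volume type. The names `skack-gluster-pv`, `glusterfs-cluster`, and `skack-volume` are hypothetical placeholders, not values from this setup.

```python
import json


def glusterfs_persistent_volume(name, endpoints, volume_path, size="10Gi"):
    """Build a PersistentVolume manifest backed by a GlusterFS volume.

    `endpoints` names a Kubernetes Endpoints object listing the GlusterFS
    servers; `volume_path` is the GlusterFS volume name. Both are
    placeholders to be replaced with real values.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # ReadWriteMany is what allows multi-mount across nodes.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "glusterfs": {
                "endpoints": endpoints,
                "path": volume_path,
                "readOnly": False,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    pv = glusterfs_persistent_volume(
        "skack-gluster-pv", "glusterfs-cluster", "skack-volume"
    )
    print(json.dumps(pv, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.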
Challenges of Setting Up a Multi-Node Cluster on SKACK
- Set up and document a multi-node cluster for the SKACK Stack on Kubernetes.
- Containers are not persistent by default, so applications running in Kubernetes need persistent storage for their data.
- Using Kubernetes to scale up Spark.
- Using Kubernetes to scale up Cassandra.
- Using Kubernetes to scale up Kafka.
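The persistence and scaling challenges above can be sketched together with a StatefulSet: its `volumeClaimTemplates` give each replica its own persistent volume claim, and its `replicas` field controls the scale. This is a minimal sketch, not the configuration used here. The image tag `cassandra:4.1`, the CQL port 9042, the data path `/var/lib/cassandra`, and the storage size are assumptions chosen for illustration, and the helper names `stateful_set` and `scaled` are hypothetical.

```python
import copy
import json


def stateful_set(name, image, replicas, port, mount_path, storage="10Gi"):
    """Build a StatefulSet manifest with one persistent volume per replica."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service expected under this name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        "volumeMounts": [
                            {"name": "data", "mountPath": mount_path}
                        ],
                    }],
                },
            },
            # Each replica gets its own claim, so data survives restarts.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def scaled(manifest, replicas):
    """Return a copy of the manifest with a new replica count."""
    out = copy.deepcopy(manifest)
    out["spec"]["replicas"] = replicas
    return out


if __name__ == "__main__":
    # Assumed values, for illustration only.
    cassandra = stateful_set(
        "cassandra", "cassandra:4.1", 3, 9042, "/var/lib/cassandra"
    )
    print(json.dumps(scaled(cassandra, 5), indent=2))
```

The same pattern would apply to Kafka with a different image, port, and data path. On a running cluster the replica count can also be changed directly with `kubectl scale statefulset cassandra --replicas=5`.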
Solution Offerings for Setting Up an On-Premises Kubernetes Cluster
To overcome the challenges mentioned above, set up a three-node on-premises Kubernetes cluster in which one node acts as the master and the other two act as workers.
The cluster includes:
- Kubernetes Master
- Kubernetes Scheduler
- Kubernetes Controller Manager
The setup also covers analyzing the cluster and reporting to the API server, which stores metrics on resource utilization, availability, and performance.
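To make the resource utilization metric concrete, here is a small sketch that turns Kubernetes-style quantity strings (for example `250m` CPU or `2Gi` memory, as reported by the metrics API and in a node's capacity) into utilization percentages. It assumes the usage and capacity values have already been fetched. The function names are hypothetical, and the parser handles only a subset of the quantity suffixes Kubernetes allows.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; not exhaustive.
_CPU_SUFFIXES = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3")}
_MEM_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '2' to a number of cores."""
    suffix = quantity[-1]
    if suffix in _CPU_SUFFIXES:
        return float(Decimal(quantity[:-1]) * _CPU_SUFFIXES[suffix])
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '512Mi' to bytes."""
    for suffix, factor in _MEM_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain byte count


def utilization(usage, capacity):
    """Return CPU and memory utilization as percentages of capacity."""
    cpu = 100 * parse_cpu(usage["cpu"]) / parse_cpu(capacity["cpu"])
    mem = 100 * parse_memory(usage["memory"]) / parse_memory(capacity["memory"])
    return {"cpu_percent": round(cpu, 1), "memory_percent": round(mem, 1)}


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(utilization({"cpu": "500m", "memory": "1Gi"},
                      {"cpu": "2", "memory": "4Gi"}))
```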