Cassandra.Link

11/13/2018

phact/SimpleSparkStreaming

by John Doe

This guide explains how to use the Streaming Analytics Proofshop asset, brought to you by the Vanguard team. The Proofshop demonstrates real-time data processing and analytics with the DSE platform at high throughput, low latency, and scale.

Motivation

DSE is often seen as a serving layer for the output of analytical applications on large datasets. The analytics themselves are often performed in slow batch data layers and persisted to DSE only when they are ready to be exposed to a front end or downstream system. In many cases, as we look to satisfy end-user requirements, it makes more sense to perform some of these operational analytics in real time using DSE's streaming analytics capabilities.

What is included?

This field asset includes sample usage of DSE Streaming Analytics in the following contexts:

  • Data generation using EBDSE, served via TCP
  • A Spark Streaming application that persists raw events into DSE and rollups into summary tables
  • Tuning capabilities for throughput and latency
  • Checkpointing into DSEFS
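The "raw events plus rollups" pattern in the list above can be illustrated without Spark. The following is a minimal plain-Python sketch, not the asset's actual code: the event shape `(sensor_id, epoch_seconds, value)`, the function name `rollup_batch`, and the 60-second bucket are all hypothetical choices made for illustration. It shows one micro-batch being kept as raw rows while also being aggregated into per-key, per-time-bucket summary rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape (an assumption, not the asset's schema):
# (sensor_id, epoch_seconds, value)
Event = Tuple[str, int, float]
SummaryKey = Tuple[str, int]  # (sensor_id, bucket_start_seconds)


def rollup_batch(
    events: Iterable[Event], bucket_seconds: int = 60
) -> Dict[SummaryKey, Dict[str, float]]:
    """Aggregate one micro-batch into per-sensor, per-time-bucket summaries."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for sensor_id, ts, value in events:
        # Truncate the timestamp down to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        row = summary[(sensor_id, bucket_start)]
        row["count"] += 1
        row["total"] += value
    return dict(summary)


batch = [
    ("s1", 120, 2.0),
    ("s1", 150, 4.0),
    ("s1", 185, 1.0),
    ("s2", 130, 10.0),
]

raw_rows = list(batch)              # rows destined for a raw-events table
summary_rows = rollup_batch(batch)  # rows destined for a summary table
```

In a real streaming job the same aggregation would run once per micro-batch, and both result sets would be written to DSE tables rather than held in memory.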

Business Takeaways

In the right-now economy, businesses need analytics that are up to date. Monthly, weekly, and even hourly reports are often too old to be actionable. DSE Streaming Analytics allows businesses to operationalize structured analytics and obtain real-time insights to power their decision making.

Out of the Five Dimensions, this asset focuses on Relevancy and Responsiveness without ignoring the remaining dimensions (Availability, Accessibility, Engagement).

If discussing this asset with a business stakeholder, it may be relevant to walk them through The DataStax Story.

Technical Takeaways

DSE Streaming Analytics is DataStax's version of Apache Spark (TM), which ships with DSE and is modified to match the design principles that our engineering team has always focused on when building C* and DSE (CARDS). DSE Spark is optimized for high availability and operational simplicity: it removes the dependency on ZooKeeper and uses lightweight transactions (LWTs) for leader election. Furthermore, DSE Spark is optimized for performance against the DSE backend with features that include Continuous Paging and Direct Joins.

Building Streaming Analytics applications on DSE requires:

  • A streaming source - in this case we use EBDSE's tcpserver functionality and data generation capabilities
  • A Spark Streaming application - the sources for the app are included as part of the asset
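To make the "streaming source" requirement concrete, here is a minimal stand-in written with Python's standard library. It is not EBDSE and does not reproduce its tcpserver behavior; it only sketches the general idea of a TCP endpoint that emits newline-delimited event records, which is the kind of input a socket-based receiver such as Spark Streaming's `socketTextStream` consumes. The record layout, the handler name, and the helper functions are hypothetical.

```python
import socket
import socketserver
import threading
from typing import List


class EventLineHandler(socketserver.StreamRequestHandler):
    """Writes a fixed number of CSV event lines to each client, then closes."""

    def handle(self) -> None:
        for i in range(self.server.events_per_client):
            # Hypothetical record layout: sensor id, sequence number, value.
            line = f"sensor-{i % 2},{i},{i * 1.5}\n"
            self.wfile.write(line.encode("utf-8"))


def start_source(events_per_client: int = 5) -> socketserver.ThreadingTCPServer:
    """Start the stand-in source on an ephemeral localhost port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EventLineHandler)
    server.events_per_client = events_per_client
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def read_lines(host: str, port: int) -> List[str]:
    """Connect as a consumer and read lines until the source closes."""
    with socket.create_connection((host, port), timeout=5) as conn:
        with conn.makefile("r", encoding="utf-8") as reader:
            return [line.rstrip("\n") for line in reader]


server = start_source(events_per_client=3)
host, port = server.server_address
lines = read_lines(host, port)
server.shutdown()
server.server_close()
```

A streaming application would take the place of `read_lines`, connecting to the same host and port and parsing each line into an event before persisting it.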

For more general information on how to use Spark Streaming, check out the Programming Guide in the Spark Docs.

These docs dive deeper into the following functionality:

  • Cumulative Streaming Calculations
  • Monitoring and Tuning Streaming Jobs
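As a rough illustration of what a cumulative streaming calculation means, the sketch below carries per-key running totals across successive micro-batches in plain Python. It mirrors the idea behind Spark Streaming's stateful operators such as `updateStateByKey`, but it is only an assumption about the kind of calculation involved, not the asset's implementation; the function name `update_state` and the sample keys are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def update_state(
    state: Dict[str, int], batch: Iterable[Tuple[str, int]]
) -> Dict[str, int]:
    """Return a new state with this batch's amounts added to the running totals."""
    new_state = dict(state)  # copy so earlier snapshots are not mutated
    for key, amount in batch:
        new_state[key] = new_state.get(key, 0) + amount
    return new_state


# Three hypothetical micro-batches of (key, amount) pairs.
batches = [
    [("clicks", 3), ("views", 10)],
    [("clicks", 2)],
    [("views", 5), ("purchases", 1)],
]

state: Dict[str, int] = {}
history: List[Dict[str, int]] = []
for batch in batches:
    state = update_state(state, batch)
    history.append(dict(state))  # snapshot of the totals after each batch
```

Because the totals depend on every batch seen so far, the state has to survive restarts. In Spark Streaming, stateful operations of this kind require checkpointing to be enabled, which is where checkpointing into DSEFS comes in.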
