11/25/2019

Reading time: 2 min

Cassandra + Spark + ELK

by Vasil Remeniuk


1. Cassandra + Spark + ELK
Dmitriy Kalyada @ 2015
2. What is Spark?
• Master: Driver program
• Workers: Executors
• High Availability
  • Standby Masters with ZooKeeper
  • Single-Node Recovery with Local File System
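The two high-availability options on this slide correspond to recovery properties of the Spark standalone master. The following is a minimal sketch, assuming a standalone cluster: `recovery_opts` is a hypothetical helper that only assembles the string usually exported as `SPARK_DAEMON_JAVA_OPTS` in `spark-env.sh`, and the ZooKeeper hosts and directory paths are placeholders.

```python
def recovery_opts(mode, settings):
    """Build a SPARK_DAEMON_JAVA_OPTS string for standalone-master recovery."""
    props = {"spark.deploy.recoveryMode": mode}
    props.update(settings)
    return " ".join(f"-D{key}={value}" for key, value in props.items())


# Standby masters coordinated through ZooKeeper (hosts are placeholders).
zookeeper_opts = recovery_opts(
    "ZOOKEEPER",
    {
        "spark.deploy.zookeeper.url": "zk1:2181,zk2:2181",
        "spark.deploy.zookeeper.dir": "/spark",
    },
)

# Single-node recovery backed by the local file system (path is a placeholder).
filesystem_opts = recovery_opts(
    "FILESYSTEM",
    {"spark.deploy.recoveryDirectory": "/var/spark/recovery"},
)
```

With ZooKeeper, every master in the group is started with the same options, so a standby can take over when the leader fails. The file-system mode only lets a restarted master recover its previous state.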
3. Under the hood
• Resilient Distributed Dataset (RDD)
• Scala + Akka Framework
• Java, Scala, Python API
• Spark SQL, MLlib, Spark Streaming, GraphX
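To make the RDD idea concrete, here is a toy model in plain Python. It is not Spark code: `ToyRDD` is an invented class that only illustrates two properties of RDDs, namely that data is held in partitions and that transformations are recorded lazily and replayed when an action runs.

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitioned data plus a lazy chain of transformations."""

    def __init__(self, partitions, transforms=()):
        self.partitions = partitions          # list of lists, one per partition
        self.transforms = tuple(transforms)   # recorded lineage, not yet applied

    def map(self, fn):
        # A transformation only extends the lineage; nothing is computed here.
        return ToyRDD(self.partitions, self.transforms + (fn,))

    def collect(self):
        # The action replays the recorded lineage over every partition.
        result = []
        for part in self.partitions:
            for item in part:
                for fn in self.transforms:
                    item = fn(item)
                result.append(item)
        return result


rdd = ToyRDD([[1, 2], [3, 4]])
doubled_plus_one = rdd.map(lambda x: x * 2).map(lambda x: x + 1)
```

Because the lineage is kept alongside the source partitions, a lost partition can be recomputed from its inputs, which is where the "resilient" in the name comes from.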
4. Our particular case
Devices → Cassandra → Spark → ELK
5. Data flow
Input Source(s) → Fetcher → x-RDD → Transformer → x-RDD → Saver → Output Source
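The Fetcher, Transformer, and Saver stages can be sketched as three composable steps. This is an illustration in plain Python, assuming iterables in place of RDDs and a list in place of the output source; the record layout and field names are invented.

```python
def fetcher(source):
    """Read raw records from the input source (here: any iterable)."""
    yield from source


def transformer(records):
    """Reshape each record; the field names are invented for illustration."""
    for device_id, reading in records:
        yield {"device": device_id, "value": reading}


def saver(documents, sink):
    """Write transformed documents to the output source (here: a list)."""
    for doc in documents:
        sink.append(doc)
    return len(sink)


raw = [("dev-1", 20.5), ("dev-2", 19.0)]
output = []
saved = saver(transformer(fetcher(raw)), output)
```

Keeping each stage a function of its input makes it possible to swap the input or output source without touching the transformation.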
6. Spark Cassandra Connector
• Represent Cassandra tables as Spark RDDs
• Write Spark RDDs to Cassandra tables
• Execute CQL queries in Spark applications
https://github.com/datastax/spark-cassandra-connector
7. CassandraRDD settings
• Connection params
• Fetching params
  1. input.split.size: number of C* partitions in a Spark partition
  2. input.page.row.size: number of CQL rows fetched per roundtrip
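The effect of the two fetching parameters can be estimated with simple arithmetic. The sketch below is not connector code; the helper names and the numbers are illustrative. In connector releases contemporary with these slides the settings were passed as Spark properties with a `spark.cassandra.` prefix, but the exact names should be checked against the version in use.

```python
import math


def spark_partitions(total_cassandra_partitions, split_size):
    """Spark partitions needed when split_size C* partitions go into each one."""
    return math.ceil(total_cassandra_partitions / split_size)


def roundtrips(rows_to_fetch, page_row_size):
    """Fetches needed to page through rows_to_fetch CQL rows."""
    return math.ceil(rows_to_fetch / page_row_size)


# Illustrative numbers, not measured values.
partitions_needed = spark_partitions(1_000_000, split_size=100_000)
fetches_needed = roundtrips(250_500, page_row_size=1_000)
```

A smaller split size gives more, lighter Spark partitions; a larger page size gives fewer roundtrips at the cost of more memory per fetch.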
8. Fetching essentials
Token ranges shown on the slide (start, end):
-968391295277638458, -893783532241185833
-7378580094811526501, -7340240117176401239
6426215139012569257, 6428979455828914106
-6094480671546553265, -6016282219056649738
-7259249675596554667, -7237838231745167324
-6734336817058726139, -6684208157211348972
-3891103372671105499, -3822513456325086923
4453206019575747361, 4462441725813855391
7855385326468991461, 7906589648045207141
-129433796439502583, -101280166181350027
-2233788032218452383, -2066644620711092198
3248662132571799756, 3396129453515776704
7744134136205124749, 7812918342246679728
-1408208314239486033, -1403736406052004344
• Supports Murmur3Partitioner and RandomPartitioner
• Retrieves token ranges from Cassandra
• Prediction based on 16 random token ranges
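One plausible reading of "prediction based on 16 random token ranges" is a sampling estimate: count rows in a few ranges and extrapolate to the whole ring. The sketch below shows that idea only and is not taken from the connector. It assumes Murmur3Partitioner token bounds; `estimate_total_rows` and the `count_rows` callback are hypothetical names.

```python
import random

MIN_TOKEN = -2**63        # Murmur3Partitioner token ring bounds
MAX_TOKEN = 2**63 - 1


def estimate_total_rows(token_ranges, count_rows, sample_size=16, seed=0):
    """Extrapolate a table's row count from a random sample of token ranges.

    token_ranges: list of (start, end) pairs with start < end.
    count_rows:   hypothetical callback returning the row count of one range.
    """
    rng = random.Random(seed)
    sample = rng.sample(token_ranges, min(sample_size, len(token_ranges)))
    sampled_rows = sum(count_rows(r) for r in sample)
    sampled_tokens = sum(end - start for start, end in sample)
    ring_tokens = MAX_TOKEN - MIN_TOKEN + 1
    # Integer arithmetic avoids float rounding on 64-bit token widths.
    return sampled_rows * ring_tokens // sampled_tokens


# 32 equal-width ranges, each pretending to hold 1000 rows.
width = 2**58
ranges = [(MIN_TOKEN + i * width, MIN_TOKEN + (i + 1) * width) for i in range(32)]
estimate = estimate_total_rows(ranges, count_rows=lambda r: 1000)
```

The estimate is only as good as the assumption that rows are spread evenly over tokens; a skewed table makes the sampled density unrepresentative.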
9. Data to RDD
• Tokens per RDD: [input.split.size]
• Token Range #N
• Slurp amount: [input.page.row.size]
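The diagram on this slide can be read as two nested steps: token ranges are packed into Spark partitions up to the split size, and each range is then read in pages. Here is a minimal sketch of both steps in plain Python, assuming per-range row estimates are already available; the function names are invented and this is not the connector's actual algorithm.

```python
def group_token_ranges(ranges_with_rows, split_size):
    """Pack consecutive token ranges into groups of roughly split_size rows."""
    groups, current, current_rows = [], [], 0
    for token_range, estimated_rows in ranges_with_rows:
        current.append(token_range)
        current_rows += estimated_rows
        if current_rows >= split_size:
            groups.append(current)
            current, current_rows = [], 0
    if current:
        groups.append(current)   # leftover ranges form the last group
    return groups


def pages(rows, page_row_size):
    """Yield rows in chunks of page_row_size, one chunk per roundtrip."""
    for i in range(0, len(rows), page_row_size):
        yield rows[i:i + page_row_size]


estimates = [((0, 10), 40), ((10, 20), 70), ((20, 30), 50), ((30, 40), 20)]
groups = group_token_ranges(estimates, split_size=100)
chunks = list(pages(list(range(5)), page_row_size=2))
```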
10. Token range vs. number of rows
11. What to do?
• Change read strategy
• Split data into smaller pieces
• Increase cluster strength
• Reorganize Cassandra schema
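"Split data into smaller pieces" can mean, among other things, cutting a heavy token range into narrower sub-ranges so that each read touches fewer rows. A minimal sketch of that one option follows; `split_token_range` is a hypothetical helper, and the sample range is the first one listed on slide 8.

```python
def split_token_range(start, end, pieces):
    """Split the token range (start, end] into `pieces` contiguous sub-ranges."""
    if pieces < 1 or end <= start:
        raise ValueError("need pieces >= 1 and start < end")
    width = end - start
    # Integer arithmetic keeps the bounds exact for 64-bit tokens.
    bounds = [start + width * i // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


sub_ranges = split_token_range(-968391295277638458, -893783532241185833, 4)
```

Each sub-range can then be queried on its own. Equal token widths do not guarantee equal row counts, so a skewed range may need more pieces than an evenly filled one.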
12. Elasticsearch
13. Elasticsearch & Kibana
• Index initialization: TransportClient
• Create/Delete Index
• Setup Mappings
• Indexing: ScalaEsRDD
• Data presentation: Kibana
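The slides create indices and mappings through the Java TransportClient. As a language-neutral illustration of what "Setup Mappings" amounts to, the sketch below only assembles the JSON body of a create-index request. It assumes the type-less mapping layout of recent Elasticsearch releases (2015-era versions nested `properties` under a type name); the field names and the `create_index_body` helper are invented.

```python
import json


def create_index_body(properties, shards=1, replicas=0):
    """Assemble the JSON body of a create-index request."""
    return json.dumps({
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {"properties": properties},
    })


# Field names are invented for illustration.
body = create_index_body({
    "device": {"type": "keyword"},
    "timestamp": {"type": "date"},
    "value": {"type": "double"},
})
```

Declaring the timestamp as a date up front matters for Kibana, which relies on a date field for time-based views.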
14. Kibana
15. Deployment
• Build package: Spark Job + Dependent Jars + Configs
• Upload to the Spark Master Node
• Start job submit script
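The "job submit script" step typically wraps a `spark-submit` call. The sketch below builds such a command line as an argument list. The flags `--master`, `--class`, `--jars`, `--files`, and `--conf` are standard `spark-submit` options, while the class name, jar names, host names, and the helper itself are placeholders rather than the authors' actual script.

```python
def spark_submit_command(master, main_class, app_jar, jars=(), files=(), conf=None):
    """Build a spark-submit argument list for a packaged Spark job."""
    cmd = ["spark-submit", "--master", master, "--class", main_class]
    if jars:
        cmd += ["--jars", ",".join(jars)]      # dependent jars
    if files:
        cmd += ["--files", ",".join(files)]    # config files shipped with the job
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)                        # the application jar comes last
    return cmd


# All names below are placeholders.
command = spark_submit_command(
    master="spark://master-host:7077",
    main_class="com.example.DeviceJob",
    app_jar="device-job.jar",
    jars=["spark-cassandra-connector.jar", "elasticsearch-spark.jar"],
    files=["job.conf"],
    conf={"spark.cassandra.connection.host": "cassandra-host"},
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on the master node once the package has been uploaded.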
16. Thank you
dkaliada@exadel.com
Dmitriy Kalyada @ 2015
