Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Knewton) | C* Summit 2016

  1. Java tuning for Knewton’s C* clusters: Lessons learned. Carlos Monroy, Knewton
  2. Knewton: leader in adaptive learning
     - Partners with publishers and institutions in Europe, the US, and Asia
     - Provides unique recommendations to students based on previous behavior
     - Advanced content ingestion, curation, and calibration
     - Runs in AWS with many different storage backends
     - Check us out: www.knewton.com/about/careers/
     © DataStax, All Rights Reserved.
  3. Outline: (1) JVM tuning at Knewton; (2) Updating memtable_allocation_type; (3) Changing garbage collection strategy
  4. Context: Like many startups, our company needed to make tradeoffs in order to deliver the product rapidly:
     - Technical debt
     - Silos and isolated efforts
     - Decisions based on gut and intuition
     One year ago:
     - Different versions of Cassandra
     - Multiple clients (e.g., Pycassa, Hector, Astyanax, DataStax)
     - Huge challenge with backups and restores
     Now:
     - 99.98% database uptime
     - The database is not a black box anymore
  5. Successful initiatives
     - In-house command line tools: https://github.com/Knewton/cassandra-toolbox/
     - cassandra-toolbox Python package
     - Distributed nodetool
     - Separation of objects from heap memory (memtable_allocation_type)
     - Customization of heap size allocation: https://tech.knewton.com/
     - Update to Garbage First Garbage Collection (G1GC)
     - Monitoring/alerting based on JMX metrics
  9. Some less successful initiatives
     - Monitoring and alerts based on Graphite graphs
     - Too many resources to get an aggregate
     - High incidence of false positives and false negatives
     - GoCD
     - CloudWatch
  10. Outline: (1) JVM tuning at Knewton; (2) Updating memtable_allocation_type; (3) Changing garbage collection strategy
  11. memtable_allocation_type: Cassandra can keep memtables and key cache objects in native memory instead of the JVM heap.
     - Used for data structures that keep growing over time
     - Options:
       - heap_buffers: default value before Cassandra 3.0; all the objects are kept in the JVM heap memory
       - offheap_buffers: cell names and values are moved to DirectBuffer objects
       - offheap_objects: moves the entire cell off heap, leaving only a pointer
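To check which of the three options a node is actually configured with, something like the following could be used. This is a minimal sketch, not part of the talk: the function name `read_memtable_allocation_type` and the sample text are hypothetical, and it assumes the setting appears as an uncommented top-level key in cassandra.yaml.

```python
import re

# The three values named on the slide.
VALID_TYPES = ("heap_buffers", "offheap_buffers", "offheap_objects")

def read_memtable_allocation_type(yaml_text):
    """Return the memtable_allocation_type set in cassandra.yaml text, or None.

    Hypothetical helper: scans for an uncommented top-level key only.
    """
    for line in yaml_text.splitlines():
        # Commented-out lines start with '#', so they never match here.
        match = re.match(r"^memtable_allocation_type:\s*(\S+)", line)
        if match:
            value = match.group(1)
            if value not in VALID_TYPES:
                raise ValueError("unknown memtable_allocation_type: " + value)
            return value
    return None

# Illustrative input, not a real configuration file.
sample = """\
# memtable_allocation_type: heap_buffers
memtable_allocation_type: offheap_buffers
"""
print(read_memtable_allocation_type(sample))
```

Running a check like this across all nodes before and after a rollout helps confirm that a test run really compared the intended settings.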
  12. Update memtable_allocation_type: The cassandra-stress tool is a great starting point for validating changes to the database configuration. But we needed to go the extra mile with an end-to-end test that:
     - involved the rest of the dev team
     - demonstrated the positive impact of the change on the rest of the system
  14. Test memtable_allocation_type update: update setting, run load test (Locust), compile logs from C* and the application, analyze with R. Test types shown: response times, functional load tests.
  15. Update memtable - Criteria
     End-to-end:
     - Response time (timeouts)
     - Errors
     - Throughput
     - CPU consumption
     - Memory used
     Cassandra specific:
     - Cassandra: time spent on garbage collection
     - Collection: read and write latencies, errors/exceptions
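The end-to-end criteria above can be reduced to a few numbers per test run. The sketch below is an assumption about how that might look, not the analysis used in the talk (which was done in R): the `summarize` function, the `(response_time_ms, ok)` record shape, and the sample values are all hypothetical.

```python
from statistics import mean

def summarize(samples, timeout_ms, duration_s):
    """Summarize one load-test run.

    samples: list of (response_time_ms, ok) tuples (hypothetical format).
    timeout_ms: response times at or above this count as timeouts.
    duration_s: wall-clock length of the run, used for throughput.
    """
    times = sorted(t for t, _ in samples)
    # Nearest-rank 95th percentile: ceil(0.95 * n), as a 1-based rank.
    rank = max(1, -(-95 * len(times) // 100))
    return {
        "mean_ms": mean(times),
        "p95_ms": times[rank - 1],
        "timeouts": sum(1 for t in times if t >= timeout_ms),
        "errors": sum(1 for _, ok in samples if not ok),
        "throughput_rps": len(samples) / duration_s,
    }

# Made-up measurements for illustration only.
run = [(10, True), (20, True), (30, False), (200, True)]
print(summarize(run, timeout_ms=100, duration_s=2))
```

Producing the same summary for each memtable_allocation_type value gives a like-for-like comparison across runs.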
  16. Update memtable_allocation_type - Time used for garbage collection. [Figure: garbage collection times compared across offheap_buffers, offheap_objects, and heap_buffers]
  18. Update memtable_allocation_type - Memory sizes. [Figure: sizes per generation space, before and after garbage collection, for offheap_buffers, offheap_objects, and heap_buffers]
  19. Update memtable_allocation_type - GC phases. [Figure: behaviour of the garbage collection phases for offheap_buffers, offheap_objects, and heap_buffers]
  20. memtable_allocation_type results: We are using offheap_buffers, as it showed:
     - the lowest average response time for requests
     - the lowest CPU usage
     - the lowest number of threads created
     - the lowest write latency
     *Results may vary
  21. Outline: (1) JVM tuning at Knewton; (2) Updating memtable_allocation_type; (3) Changing garbage collection strategy
  22. Garbage First Garbage Collection (G1GC): The G1 collector uses multiple background threads to scan through the heap, which it divides into regions. It is named "Garbage First" (G1) because it gives preference to scanning the regions that contain the most garbage objects first. This collector is turned on using the -XX:+UseG1GC flag.
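The "most garbage first" idea can be illustrated with a toy model. This is only a sketch of the ordering principle described above, not how HotSpot implements G1: the `pick_regions` function, the region names, and the garbage fractions are all made up for illustration.

```python
def pick_regions(regions, max_regions):
    """Choose which heap regions to collect first in a toy model.

    regions: dict mapping a region id to the fraction of it that is garbage.
    max_regions: how many regions fit in one collection (a stand-in for
    the pause-time budget a real collector works within).
    """
    # Rank regions by garbage fraction, highest first.
    ranked = sorted(regions, key=lambda r: regions[r], reverse=True)
    return ranked[:max_regions]

# Hypothetical heap with four regions.
heap = {"r0": 0.10, "r1": 0.90, "r2": 0.50, "r3": 0.05}
print(pick_regions(heap, max_regions=2))
```

Collecting the emptiest-of-live-data regions first reclaims the most space for the least copying work, which is the intuition behind the collector's name.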
  23. G1GC analysis: G1 has been available since April 2012 (JDK 7 update 4 and later). The tools available for analyzing garbage collection logs either lacked G1 support or could not interpret all the information from our servers.
     - Netflix gcviz does not support the Garbage First (G1) strategy
     - A post by Jeff Taylor on Oracle's developer blog proposes an initial approach for JDK 7
  24. Test garbage collection: enable GC data collection, get a baseline, compile GC logs, analyze with R.
  25. G1GC Java arguments, as defined in cassandra-env.sh:
     -XX:+UseG1GC
     -XX:G1RSetUpdatingPauseTimePercent=5
     -XX:ParallelGCThreads=2
     -XX:ConcGCThreads=2
     -XX:+PrintGCDetails
     -XX:+PrintGCDateStamps
     -XX:+PrintHeapAtGC
     -XX:+PrintTenuringDistribution
     -XX:+PrintGCApplicationStoppedTime
     -XX:+PrintPromotionFailure
     -XX:PrintFLSStatistics=1
     -Xloggc:/<valid path>/gc.log
     -XX:+UseGCLogFileRotation
     -XX:NumberOfGCLogFiles=10
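With -XX:+PrintGCApplicationStoppedTime enabled, the GC log records how long application threads were paused. The sketch below shows one way such lines could be extracted. It is not the author's analysis (that was done in R, see the gist linked in the talk): the `stopped_times` function is hypothetical, and the sample lines are an assumed illustration of the JDK 8 log format, not output from Knewton's servers.

```python
import re

# Assumed line format written when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9.]+) seconds"
)

def stopped_times(log_lines):
    """Return the pause durations (in seconds) found in GC log lines."""
    pauses = []
    for line in log_lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses

# Illustrative log lines, written by hand for this example.
sample_log = [
    "2016-09-07T10:12:33.123+0000: 12.345: Total time for which "
    "application threads were stopped: 0.0012345 seconds, "
    "Stopping threads took: 0.0000456 seconds",
    "2016-09-07T10:12:34.000+0000: 13.222: [GC pause (G1 Evacuation Pause) (young)",
    "2016-09-07T10:12:34.250+0000: 13.472: Total time for which "
    "application threads were stopped: 0.25 seconds, "
    "Stopping threads took: 0.0001 seconds",
]
pauses = stopped_times(sample_log)
print(len(pauses), max(pauses))
```

Collecting these values for a baseline run and for a run with the new settings gives a simple before-and-after view of pause times.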
  26. G1GC analysis - Heap size [Figures, slides 26 to 30]
  31. G1GC analysis - Phases [Figure]
  32. G1GC analysis demo
  33. Code
     - Garbage collection analysis: https://gist.github.com/roymontecutli/4cf5c97f03720e60825f414667c141da
     - Cassandra toolbox: https://github.com/Knewton/cassandra-toolbox
  34. Conclusions
     - Moving objects out of the JVM heap can improve application performance when dealing with large data sets. You still need to find out which strategy (moving buffers or objects off heap) best suits your use case.
     - Garbage collection can adversely affect the performance of a Cassandra cluster. Having tools to analyze its behaviour helps to identify areas of impact and measure improvements.
     - Configuration changes should always consider the system as a whole and involve all the teams.
  35. Thanks. carlos@knewton.com
