Cassandra.Link

10/24/2018

HubSpot/gc_log_visualizer

by John Doe

The Python script gc_log_visualizer.py uses gnuplot to graph interesting characteristics of the given gc log:

  • Pre/post gc amounts for total heap. Bar for InitiatingHeapOccupancyPercent if found.
  • Mixed gc duration, from the start of the first event until it is no longer continued in a new minor event (g1gc)
  • Count of sequential runs of mixed gc (g1gc)
  • Stop-the-world pause times from GC events; other stw events are ignored
  • Percentage of total time spent in GC stop-the-world
  • Count of GC stop-the-world pause times grouped by time taken
  • Multi-phase concurrent mark cycle duration (g1gc)
  • Line graph of pre-gc sizes: young, old, and total. To-space exhaustion events added for g1gc. Bar for InitiatingHeapOccupancyPercent if found. Reclaimable (MB) amount per mixed gc event.
  • Eden size pre/post. For g1gc, shows how the algorithm floats the target Eden size around.
  • Delta of Tenured data for each GC event, for g1gc only. The idea of this graph is to get a rough idea of the Tenured fill rate. Not entirely sure what's going on here: after a young gc event, Tenured can drop significantly.
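As a rough illustration of the two pause summaries in the list above (percentage of time spent in stop-the-world, and pause counts grouped by time taken), here is a minimal sketch. It is not the script's implementation: the bucket boundaries, function names, and sample pause values are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical bucket upper bounds in seconds; the real script's grouping may differ.
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0]


def bucket_label(pause_s):
    """Return the label of the first bucket whose upper bound covers pause_s."""
    for upper in BUCKETS:
        if pause_s <= upper:
            return "<=%gs" % upper
    return ">%gs" % BUCKETS[-1]


def summarize_pauses(pauses_s, wall_clock_s):
    """Count pauses per bucket and compute the percent of wall-clock time paused."""
    counts = Counter(bucket_label(p) for p in pauses_s)
    pct_in_gc = 100.0 * sum(pauses_s) / wall_clock_s
    return counts, pct_in_gc


# Made-up pause durations (seconds) observed over a 60-second window.
counts, pct = summarize_pauses([0.02, 0.04, 0.12, 0.3, 1.4], 60.0)
print(dict(counts), round(pct, 2))
```

Fixed buckets keep the grouped counts comparable between runs; finer or logarithmic buckets would work just as well.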

The shell script regionsize_vs_objectsize.sh takes a gc.log as input and returns the percentage of humongous objects that would no longer be humongous at various G1 region sizes (2mb-32mb, by powers of 2).

  ./regionsize_vs_objectsize.sh <gc.log>
  1986 humongous objects referenced in <gc.log>
  32% would not be humongous with a 2mb g1 region size
  77% would not be humongous with a 4mb g1 region size
  100% would not be humongous with a 8mb g1 region size
  100% would not be humongous with a 16mb g1 region size
  100% would not be humongous with a 32mb g1 region size
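The arithmetic behind a report like this can be sketched in a few lines of Python. This is not the shell script's implementation: it assumes an object counts as humongous when it is larger than half a region, and the function name and object sizes below are made up for illustration.

```python
MB = 1024 * 1024
REGION_SIZES_MB = [2, 4, 8, 16, 32]


def percent_not_humongous(object_sizes_bytes, region_size_mb):
    """Percent of objects no larger than half a region of the given size."""
    if not object_sizes_bytes:
        return 0
    # Assumption: objects larger than half a region are treated as humongous.
    threshold = region_size_mb * MB // 2
    fitting = sum(1 for size in object_sizes_bytes if size <= threshold)
    return 100 * fitting // len(object_sizes_bytes)


# Made-up allocation sizes in bytes, not taken from a real gc.log.
sizes = [600 * 1024, 900 * 1024, 1500 * 1024, 3 * MB, 3 * MB + 512 * 1024]
for region_mb in REGION_SIZES_MB:
    pct = percent_not_humongous(sizes, region_mb)
    print("%d%% would not be humongous with a %dmb g1 region size" % (pct, region_mb))
```

With these sample sizes the percentage reaches 100 at an 8mb region, mirroring the shape of the output above.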

How to run

The start and end dates are optional and can be in any format gnuplot understands. The second argument, also optional, is used as the base name for the generated png files.

  python gc_log_visualizer.py <gc log> <optional output file base name> <optional start date/time, fmt: 2015-08-12:19:36:00> <optional end date/time, fmt: 2015-08-12:19:39:00>
  python gc_log_visualizer.py gc.log
  python gc_log_visualizer.py gc.log.0.current user-app
  python gc_log_visualizer.py gc.log 3minwindow 2015-08-12:19:36:00 2015-08-12:19:39:00
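To make the start/end format concrete, here is a small sketch that parses the `2015-08-12:19:36:00` style shown in the usage line and checks whether a timestamp falls inside the window. How the script applies the window internally is not described here, so the helper names below are hypothetical.

```python
from datetime import datetime

# Format taken from the usage line above, e.g. 2015-08-12:19:36:00.
WINDOW_FMT = "%Y-%m-%d:%H:%M:%S"


def parse_bound(value):
    """Parse one optional bound; a missing bound stays None."""
    return datetime.strptime(value, WINDOW_FMT) if value else None


def parse_window(start=None, end=None):
    """Parse the optional start and end arguments."""
    return parse_bound(start), parse_bound(end)


def in_window(timestamp, start, end):
    """True when timestamp falls inside the (possibly open-ended) window."""
    return (start is None or timestamp >= start) and (end is None or timestamp <= end)


start, end = parse_window("2015-08-12:19:36:00", "2015-08-12:19:39:00")
print(in_window(datetime(2015, 8, 12, 19, 37, 30), start, end))  # inside the window
print(in_window(datetime(2015, 8, 12, 19, 40, 0), start, end))   # after the end
```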

gc log preparation

The script has been run on ParallelGC and G1GC logs. There may be some oddities/issues with ParallelGC as profiling it hasn't proven overly useful.

The following gc params are required for full functionality.

  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy
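Before collecting logs, it can help to confirm that a service's JVM options include all four flags. The check below is a sketch only: the function name and the sample options string are assumptions, and it does a plain token match rather than inspecting a running JVM.

```python
REQUIRED_FLAGS = [
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCDateStamps",
    "-XX:+PrintGCApplicationStoppedTime",
    "-XX:+PrintAdaptiveSizePolicy",
]


def missing_gc_flags(jvm_opts):
    """Return the required flags that do not appear in a JVM options string."""
    present = set(jvm_opts.split())
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


# Hypothetical options string for illustration.
opts = "-XX:+UseG1GC -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
print(missing_gc_flags(opts))
```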

required python libs

The required Python libs are listed in setup.py and can be installed in the usual manner.

  # enter a virtualenv or not
  pip install -r requirements.txt

gnuplot

The gc.log is parsed into flat files which are then run through gnuplot.

  # osx
  brew install gnuplot
  brew unlink libjpeg
  brew install libjpeg
  brew link libjpeg
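The "flat files, then gnuplot" flow described above can be pictured with the following sketch. It is not the script's code: the file suffixes, function name, and sample rows are hypothetical. It only writes a data file and a matching gnuplot script, and does not invoke gnuplot.

```python
import os
import tempfile


def write_plot_inputs(rows, base_name, out_dir):
    """Write a whitespace-delimited data file and a gnuplot script that plots it."""
    data_path = os.path.join(out_dir, base_name + ".dat")
    script_path = os.path.join(out_dir, base_name + ".gnuplot")
    png_path = os.path.join(out_dir, base_name + ".png")

    # One "timestamp value" pair per line; the timestamp contains no spaces.
    with open(data_path, "w") as data_file:
        for timestamp, heap_mb in rows:
            data_file.write("%s %d\n" % (timestamp, heap_mb))

    script_lines = [
        "set terminal png size 1200,600",
        'set output "%s"' % png_path,
        "set xdata time",
        'set timefmt "%Y-%m-%d:%H:%M:%S"',
        'set format x "%H:%M"',
        'plot "%s" using 1:2 with lines title "pre-gc heap (MB)"' % data_path,
    ]
    with open(script_path, "w") as script_file:
        script_file.write("\n".join(script_lines) + "\n")
    return script_path


# Made-up sample rows: (timestamp, heap size in MB).
out_dir = tempfile.mkdtemp()
rows = [("2015-08-12:19:36:00", 5120), ("2015-08-12:19:36:05", 2048)]
script_path = write_plot_inputs(rows, "heap", out_dir)
with open(script_path) as script_file:
    print(script_file.read())
```

Running `gnuplot` on the generated script file would then render the png, assuming gnuplot is installed as shown above.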

Examples

Line charts of generation sizes and total, a bar for InitiatingHeapOccupancyPercent (in MB), and the reclaimable amount per mixed gc event.

[Figure: example of main chart with InitiatingHeapOccupancyPercent and reclaimable]

Another example of the same chart, but with InitiatingHeapOccupancyPercent set below the working set, which results in lots of mixed gc events, shown as the reclaimable squares.

[Figure: example of unhealthy main chart with InitiatingHeapOccupancyPercent and reclaimable]

To-space exhaustion from traffic bursts on cache expiration events. Solution: use stampeding herd protection.

[Figure: example of to-space exhaustion]

This visualization of humongous objects shows the sizes in KB, as well as the vertical groupings that have the potential to cause to-space exhaustion.

[Figure: example of humongous objects]
