DataStax: Extreme Cassandra Optimization: The Sequel
- 1. Extreme Cassandra Optimization: The Sequel. @AlTobey, https://goo.gl/JtC9YR. ©2015 DataStax Confidential. Do not distribute without consent.
- 2. init()
    • This is all specific to Cassandra 2.1
    • I will try to call out dangerous and apocryphal settings
    • Focus is on the low-hanging fruit
- 3. OODA
- 4. (loop diagram: benchmark, configure, observe, think) START HERE (unless you’re already in prod, in which case, START HERE)
- 5. Questions to ask:
    • Look at the available hardware and make an educated guess
    • How many sockets/cores? Hyperthreading? NUMA?
    • How much RAM?
    • Memory bandwidth matters
    • What kind of storage?
    • How much per node?
    • What kind of network interface is it?
    • Some clouds have a PPS (packets per second) limit
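Several of the questions on slide 5 can be answered from `/proc` and `/sys` on a Linux host. The following is a minimal sketch and is not from the talk: the function names are made up for illustration, and each function takes the file or directory to inspect as an optional argument so it can be pointed at a saved copy. It assumes the usual Linux layout of `/proc/cpuinfo` (one `processor` line per logical CPU), `/proc/meminfo` and `/sys/devices/system/node`.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from the slides.

# Count logical CPUs (hyperthreads included) in a cpuinfo-style file.
count_logical_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Report MemTotal from a meminfo-style file in whole GiB (rounded down).
mem_total_gib() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

# Count NUMA nodes exposed under a sysfs-style node directory.
count_numa_nodes() {
    ls -d "${1:-/sys/devices/system/node}"/node[0-9]* 2>/dev/null | wc -l | tr -d ' '
}

# Example use on a live host:
#   echo "cpus=$(count_logical_cpus) ram_gib=$(mem_total_gib) numa=$(count_numa_nodes)"
```

This only covers CPU count, RAM and NUMA layout. Storage and network questions still need a look at the actual devices.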
- 6. hypervisors (diagram labels: application, kernel, vCPU 0-3, Hypervisor, IOMMU, address 0x00b0)
- 7. containers (Docker) (diagram labels: application, kernel, bridge, veth, iptables, address 0x00b0; host networking vs. Docker networking)
- 8. (loop diagram: benchmark, configure, observe, think) YOU ARE HERE
- 9. JVM
    • Use Hotspot Java 8 >= u45
    • Java 7 is EOL and slower
    • OpenJDK is fine
    • Zulu is a handy way to get the latest
    • http://www.azulsystems.com/products/zulu
    • Speaking of Azul …
    • Some DataStax customers are having success with C4
    • But I can’t talk about any of them
- 10. cassandra-env.sh: G1GC
    #JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}" # REJOICE!
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=20"
    JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
    #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=24"
    #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=24"
    #JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
    JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
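Slide 10 comments out the `-Xmn` line when it switches to G1. As a rough sketch of how that could be guarded against regressions, the check below flags an options string that enables G1 and still pins the young generation size. The function name is hypothetical and not part of cassandra-env.sh. It only does string matching on the options and does not inspect a running JVM.

```shell
#!/bin/sh
# Hypothetical check; the function name is illustrative, not from the slides.
# Succeeds (exit 0) when an options string enables G1 and also pins the
# young generation with -Xmn, the combination slide 10 avoids.
g1_with_fixed_newsize() {
    case " $1 " in
        *" -XX:+UseG1GC "*) ;;      # G1 enabled, keep checking
        *) return 1 ;;              # G1 not enabled, nothing to flag
    esac
    case " $1 " in
        *" -Xmn"*) return 0 ;;      # -Xmn<size> present alongside G1
        *) return 1 ;;
    esac
}

# Example use near the end of cassandra-env.sh:
#   g1_with_fixed_newsize "$JVM_OPTS" && echo "warning: -Xmn set with G1" >&2
```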
- 11. cassandra-env.sh: CMS
    MAX_HEAP_SIZE=8G
    HEAP_NEWSIZE=2G # start here, adjust to workload
    # http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html
    JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
    JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096"
    # these will need to be adjusted to the workload; start here
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"
- 12. cassandra-env.sh: More JVM flags
    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
    JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch" # esp. Docker!
    JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"
    JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
- 13. cassandra.yaml: IO threads
    concurrent_reads: 128
    concurrent_writes: 128
- 14. cassandra.yaml: memtables
    memtable_heap_space_in_mb: 2048
    memtable_cleanup_threshold: 0.10
    memtable_flush_writers: 4
    #memtable_allocation_type: offheap_objects # MAYBE
    # Set these together!
- 15. cassandra.yaml: commitlog
    # Cassandra >= 2.1.9
    commitlog_segment_recycling: false
    # on SSDs and some HDD RAID
    trickle_fsync: true
    trickle_fsync_interval_in_kb: 1024
    # and/or set vm.dirty_background_bytes low
    echo 8388608 > /proc/sys/vm/dirty_background_bytes
- 16. cassandra.yaml: miscellaneous
    num_tokens: 32 # or 1, if you prefer
    # default in OSS is “all”
    internode_compression: dc
    # Cassandra >= 2.1.5
    otc_coalescing_strategy: TIMEHORIZON
    # https://issues.apache.org/jira/browse/CASSANDRA-8611
    streaming_socket_timeout_in_ms: 600000
- 17. cassandra: schema
    • The data model is the single most important factor for performance!
    • Check your compression block size (per table)
    • Use size-tiered compaction (STCS)
    • leveled compaction (LCS) for read-heavy workloads on fast storage
    • the current default of 160MB sstable_size_in_mb is fine
    • DTCS for time series (http://www.datastax.com/dev/blog/dtcs-notes-from-the-field)
- 18. Linux: sysctl.d
    vm.dirty_background_bytes = 16777216
    vm.dirty_bytes = 4294967296
    fs.file-max = 1000000
    vm.max_map_count = 1048576
    vm.swappiness = 1
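For reference, 16777216 bytes is 16 MiB and 4294967296 bytes is 4 GiB. To see whether a host already matches slide 18, a read-only comparison along the following lines may help. This is a sketch rather than anything from the talk: the function names and the `SYSCTL_ROOT` variable are hypothetical, and it assumes simple `key = value` lines whose keys map to files under `/proc/sys` by replacing dots with slashes, which holds for the five keys above.

```shell
#!/bin/sh
# Hypothetical helpers; names and SYSCTL_ROOT are illustrative only.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}

# Map a sysctl key such as vm.dirty_bytes to its file under SYSCTL_ROOT.
sysctl_path() {
    printf '%s/%s\n' "$SYSCTL_ROOT" "$(printf '%s' "$1" | tr '.' '/')"
}

# Read "key = value" lines on stdin and print every key whose live value
# differs from the wanted one. Nothing is written to the system.
report_sysctl_drift() {
    while IFS=' =' read -r key value; do
        [ -n "$key" ] || continue
        path=$(sysctl_path "$key")
        [ -r "$path" ] || { echo "$key: not readable"; continue; }
        current=$(cat "$path")
        [ "$current" = "$value" ] || echo "$key: current=$current wanted=$value"
    done
}

# Example use (the file name is an assumption, pick your own):
#   report_sysctl_drift < /etc/sysctl.d/99-cassandra.conf
```

No output means every listed key already has the wanted value.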
- 19. Linux: storage
    cd /sys/block
    for drive in sd* xvd* vd* nvme*
    do
      # skip glob patterns that matched no device
      [ -d "$drive" ] || continue
      echo deadline > "$drive/queue/scheduler"
      echo 8 > "$drive/queue/read_ahead_kb" # only on fast SSDs
      echo 0 > "$drive/queue/nomerges"
    done
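Before and after changing the queue settings on slide 19, it can be useful to print what each device is currently using. The sketch below is read-only and not from the talk. The function names are hypothetical, and it assumes the sysfs scheduler file marks the active scheduler with square brackets, as in `noop [deadline] cfq`. On kernels where the file holds only `none`, the scheduler field comes out empty.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the slides.

# Extract the active scheduler from a line like "noop [deadline] cfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print scheduler, read-ahead and nomerges for each block device found
# under a sysfs-style directory (default /sys/block). Read-only.
report_block_queues() {
    for dev in "${1:-/sys/block}"/*; do
        [ -r "$dev/queue/scheduler" ] || continue
        printf '%s scheduler=%s read_ahead_kb=%s nomerges=%s\n' \
            "${dev##*/}" \
            "$(active_scheduler "$(cat "$dev/queue/scheduler")")" \
            "$(cat "$dev/queue/read_ahead_kb")" \
            "$(cat "$dev/queue/nomerges")"
    done
}

# Example use on a live host:
#   report_block_queues
```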
- 20. Linux: RAID & filesystems
    • use xfs
    • ext4 if you must
    • ZFS if you love yourself and want to be happy
    • btrfs if you like to live dangerously
    • RAID*: Pass stripe size & width to mkfs whenever possible
    • RAID0 is by far the most common choice
    • RAID10 is fine if you can afford the disks
    • RAID5/6 in some circumstances, but there’s a tradeoff
    • JBOD is great but has tradeoffs
- 21. Linux kernel boot parameters
    isolcpus=0 idle=mwait intel_idle.max_cstate=0 processor.max_cstate=0
    idle=halt (C1 only)
    idle=poll (for extreme cases, wastes power)
    Disable in BIOS
- 22. Disable Frequency Scaling
    # make sure the CPUs run at max frequency
    for sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*
    do
      echo performance > $sysfs_cpu/cpufreq/scaling_governor
    done
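To confirm that the governor change on slide 22 took effect, or survived a reboot, a read-only check such as the following could be run. It is a sketch with a hypothetical function name. It assumes the cpufreq sysfs files exist, which is often not the case inside virtual machines.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative, not from the slides.
# Lists every CPU whose scaling governor is something other than
# "performance". Prints nothing when all CPUs are set as intended.
cpus_not_at_performance() {
    for gov in "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$gov" ] || continue                    # no cpufreq support here
        [ "$(cat "$gov")" = "performance" ] && continue
        cpu=${gov%/cpufreq/scaling_governor}
        printf '%s %s\n' "${cpu##*/}" "$(cat "$gov")"
    done
}

# Example use on a live host:
#   cpus_not_at_performance
```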
- 23. Docker
- 24. (loop diagram: benchmark, configure, observe, think) YOU ARE HERE
- 25. BENCHMARKETING
- 26. cassandra-stress
    cassandra-stress write n=100M cl=LOCAL_QUORUM \
      -col "size=fixed(128)" "n=fixed(10)" \
      -schema "replication(factor=3)" \
      -rate threads=512 limit=35000/s \
      -errors ignore \
      -mode native cql3 \
      -node 127.0.0.1
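Some back-of-the-envelope arithmetic helps with planning a run like the one on slide 26. The sketch below is not from the talk and its function names are hypothetical. It assumes `limit=35000/s` is a hard cap on throughput, so the result is a lower bound on run time, and that `size=fixed(128)` with `n=fixed(10)` means ten 128-byte column values per row, so the byte count covers column values only, before replication, compression and storage overhead.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the slides.

# Lower bound on run time in whole seconds: operations / rate cap.
min_runtime_seconds() {
    echo $(( $1 / $2 ))
}

# Raw column-value bytes written: rows * columns per row * bytes per column.
raw_value_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# For slide 26: n=100M, limit=35000/s, 10 columns of 128 bytes.
#   min_runtime_seconds 100000000 35000    (a bit under 48 minutes)
#   raw_value_bytes 100000000 10 128       (128 GB per replica)
```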
- 27. ops/s mean median p95 p99 p99.9 max
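If raw latency samples are available, the percentile columns on slide 27 can be approximated with a few lines of shell. This is a minimal sketch using the nearest-rank method. It is not how cassandra-stress computes its own figures, the function name is hypothetical, and fractional percentiles such as 99.9 are subject to floating-point rounding in awk.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative, not from the slides.
# Usage: percentile P < samples.txt
# Reads one numeric latency sample per line on stdin and prints the
# nearest-rank P-th percentile (P=50 gives the median, P=100 the max).
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1          # no samples, nothing to report
            x = p * NR / 100
            rank = int(x)
            if (rank < x) rank++         # round up: nearest-rank uses ceil
            if (rank < 1) rank = 1
            print v[rank]
        }'
}
```

Sorting first keeps the awk part simple at the cost of holding every sample in memory, which is fine for a quick look but not for very large sample files.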
- 28. cassandra-stress: user schema
    cassandra-stress user n=100M cl=LOCAL_QUORUM \
      profile=bank_stress.yaml 'ops(simple=1)' no-warmup \
      -rate threads=512 limit=35000/s \
      -errors ignore \
      -node 127.0.0.1
- 29. (loop diagram: benchmark, configure, observe, think) YOU ARE HERE
- 30. (graph annotations: drop cache, increase RA, job done)
- 31. (graph annotations: drop cache, 332MiB free, 91.6GiB free)
- 32. lspci -vv
- 33. https://goo.gl/JtC9YR