1/29/2019

A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) …

by DataStax

A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassandra Summit 2016

1. Introduction to Cassandra.yaml & friends
2. Hi, I’m Edward Capriolo.
    @edwardcapriolo
    https://www.linkedin.com/in/edwardcapriolo
    http://www.slideshare.net/edwardcapriolo
    Consultant, The Last Pickle
    Cassandra user since v0.6.5
    White Plains, NY, USA
3. We help people deliver and improve Apache Cassandra-based solutions, with staff in 5 countries and over 50 years of combined experience.
4. This talk is the 'gateway' talk… Many 'picklers' (TLP staff) cover in depth, in other talks, the points I only touch on quickly here.
5. Section Overview
    1. Key configuration settings
    2. Configuration outside of the yaml
    3. Multi-system configuration settings
    4. Advanced settings
    5. Exotic settings
6. Basic setup
    1. $ wget <apache-cassandra*.tar.gz>
    2. $ tar -xf <apache-cassandra*.tar.gz>
    3. $ apache-cassandra*/bin/cassandra
    Result: Web scale distributed storage. Drop Mic.
7. Well, almost… We have to do a bit of configuration.
8. Before we dive into config
    cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
    cqlsh> USE test;
    cqlsh:test> CREATE COLUMNFAMILY trip (src varchar,
            ... dest varchar, PRIMARY KEY (src,dest));
    cqlsh:test> INSERT INTO trip (src, dest) VALUES ('ny', 'ca');
    cqlsh:test> SELECT * FROM trip;
     src | dest
    -----+------
      ny |   ca
    cqlsh:test> INSERT INTO trip (src, dest) VALUES ('fl', 'ca');
    cqlsh:test> SELECT * FROM trip;
     src | dest
    -----+------
      fl |   ca
      ny |   ca
9. Single Data Center
10. Multiple Data Center
11. Where does the data go?
    data_file_directories:
        - /var/lib/cassandra/data
    1. User data is stored in all listed directories
    2. Do: use fast-seeking storage (SSD)
    3. Do: keep ample free space (30% overhead)
    4. Don't: store on a SAN
12. Commit log storage
    commitlog_directory: /var/lib/cassandra/commitlog
    1. Stores unflushed mutations (writes/deletes)
    2. Don't: assume these are log4j-type logs
    3. Do: use a dedicated disk if possible
    4. Do: provide at least 10GB (depends on write velocity)
13. OK, we now know where (most of) the data goes… How do clients connect?
14. Default port binding
    1. Cassandra does not bind to 0.0.0.0
    2. 127.0.0.1 is not web scale
    3. 7000 is the “Storage Port”: inter-node traffic
    4. 9042 is the “Native Port”: client traffic
15. Native transport
    start_native_transport: true (default)
    native_transport_port: 9042 (default)
    listen_address: localhost
    1. Change listen_address to a reachable address (clients connect on rpc_address, which also defaults to localhost)
    2. Do: consider transport security
    3. Do: consider network routing performance
    4. Don't: put nodes on a public network. EVAR
16. Outside the yaml file…
17. cassandra-env.sh (& friends)
    1. JVM and startup params are defined outside the YAML
    2. Newer versions of C* use jvm.options
18. Memory usage
    #MAX_HEAP_SIZE="1G"
    #HEAP_NEWSIZE="100M"
    1. Default: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
    2. Do: set lower when experimenting on a workstation
    3. Do: leave ample free memory for disk cache
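
The heap formula on slide 18 is easier to read with numbers plugged in. The following is a minimal sketch of that arithmetic only; the function name `default_max_heap_mb` is made up for illustration, and integer division is an assumption about how the startup script rounds.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Heap size from the slide's formula:
    max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)).

    Illustrative helper, not part of Cassandra.
    """
    half_capped = min(system_memory_mb // 2, 1024)      # 1/2 of RAM, capped at 1 GB
    quarter_capped = min(system_memory_mb // 4, 8192)   # 1/4 of RAM, capped at 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_mb in (2048, 16384, 65536):
        print(f"{ram_mb} MB RAM -> {default_max_heap_mb(ram_mb)} MB heap")
```

Under this formula a 2 GB workstation gets a 1 GB heap, while anything with 32 GB of RAM or more tops out at 8 GB, which leaves the rest for the disk cache.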
19. JMX
    1. bin/nodetool uses JMX to administer Cassandra
    2. All management tools require a password if one is set
    Check out Nate’s talk on Securing Cassandra to learn more
20. Multi-node configurations
21. Phi convict threshold
    # phi_convict_threshold: 8
    1. Threshold for failure detector
    2. False positives make nodes appear down to peers
    3. Do: raise to 10 - 12 for flaky WAN networks
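
To get a feel for what raising the threshold from 8 to 10 - 12 buys, here is a rough sketch of a phi-accrual calculation. It assumes heartbeat intervals are exponentially distributed, which is a simplification chosen for this example rather than a statement of Cassandra's exact implementation; both function names are hypothetical.

```python
import math


def phi(elapsed_s: float, mean_interval_s: float) -> float:
    """Suspicion level after `elapsed_s` seconds without a heartbeat.

    Assumes exponentially distributed heartbeat intervals (a simplification):
    P(next heartbeat is later than elapsed) = exp(-elapsed / mean)
    phi = -log10(P) = elapsed / (mean * ln 10)
    """
    return elapsed_s / (mean_interval_s * math.log(10))


def silence_until_convicted_s(threshold: float, mean_interval_s: float) -> float:
    """Seconds of silence before phi reaches the convict threshold."""
    return threshold * mean_interval_s * math.log(10)


if __name__ == "__main__":
    for threshold in (8, 10, 12):
        seconds = silence_until_convicted_s(threshold, mean_interval_s=1.0)
        print(f"threshold {threshold}: ~{seconds:.1f}s of silence")
```

In this model, with heartbeats arriving about once a second, a threshold of 8 tolerates roughly 18 seconds of silence and a threshold of 12 roughly 28 seconds, which is the extra slack a flaky WAN link needs.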
22. Defining network topology
    # endpoint_snitch: SimpleSnitch
    1. Snitch with config data determines topology
    2. Do: use SimpleSnitch for single switch/LAN
    3. Consider: Multi DC to start
23. Gossiping Property File Snitch
    conf/cassandra-rackdc.properties
        dc=dc1
        rack=rack1
    1. Information is propagated around the cluster
    2. DC may not be physical but is a replication unit
    3. Rack has impact on replication copies
    4. Don't: change rack unless you understand the impact
24. Internode communications
    internode_compression: all | dc | none
    inter_dc_tcp_nodelay: false
    1. WAN can benefit from reduced size
    server_encryption_options:
        internode_encryption: none
    internode_authenticator: o.a.c.auth.AllowAllInternodeAuthenticator
    2. Settings which server nodes use to communicate
25. Broadcast address
    broadcast_address: 1.2.3.4
    listen_on_broadcast_address: false
    broadcast_rpc_address: 1.2.3.4
    1. Gossip a specific address (not bind address)
    2. Useful in NAT and cloud environments
26. Broadcast address
27. Advanced settings
28. Write path http://www.toadworld.com/platforms/nosql/w/wiki/11621.an-introduction-to-apache-cassandra
29. Memtables
    #memtable_flush_writers: 1
    1. Default: one per data directory
    # memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
    #memtable_cleanup_threshold: 0.11
    2. 1 / (1 + 1) = .5
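
The default on slide 29 is a one-line calculation. A small sketch, with a made-up function name, shows how it moves as flush writers are added:

```python
def memtable_cleanup_threshold(flush_writers: int) -> float:
    """Default threshold from the slide: 1 / (memtable_flush_writers + 1).

    Illustrative helper, not part of Cassandra.
    """
    if flush_writers < 1:
        raise ValueError("expected at least one flush writer")
    return 1.0 / (flush_writers + 1)


if __name__ == "__main__":
    for writers in (1, 2, 4, 8):
        print(f"{writers} flush writer(s) -> {memtable_cleanup_threshold(writers):.2f}")
```

One flush writer gives the .5 worked out on the slide. The commented-out 0.11 in the sample config is what the same formula yields for eight flush writers, which is presumably where that value comes from.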
30. .5 of what you ask?
    #If omitted, both set to 1/4 the heap
    #memtable_heap_space_in_mb: 2048
    #memtable_offheap_space_in_mb: 2048
    1. The next setting dictates how much of each memory type is used
    #heap_buffers: on heap nio buffers
    #offheap_buffers: off heap nio buffers
    #offheap_objects: off heap objects
    #memtable_allocation_type: heap_buffers
    2. Depending on the column values, buffers or objects may be better
31. Trickle fsync
    trickle_fsync: false
    trickle_fsync_interval_in_kb: 10240
    1. Optimization to periodically fsync large files
    2. Designed to prevent latency spikes in read path
32. Compaction (include image of compaction here) https://www.instaclustr.com/blog/2016/01/27/apache-cassandra-compaction/
33. Compaction
    concurrent_compactors: 1
    compaction_throughput_mb_per_sec: 16
    1. Control resources used by compaction
    2. Compaction throughput can be changed at runtime
    3. Generally concurrent_compactors < 8 and > 1
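
A back-of-the-envelope estimate helps when choosing compaction_throughput_mb_per_sec. The sketch below only divides data volume by the throttle; it ignores everything else that affects real compaction time, and the function name is hypothetical.

```python
def compaction_time_hours(data_gb: float, throughput_mb_per_sec: float) -> float:
    """Rough lower bound on the time to compact `data_gb` at the given throttle.

    Illustrative arithmetic only; real compactions are also limited by
    disk and CPU, so treat the result as a floor.
    """
    if throughput_mb_per_sec <= 0:
        # In cassandra.yaml a value of 0 disables throttling, so there is
        # no throttle-based estimate to give.
        raise ValueError("throughput must be positive to estimate a duration")
    return data_gb * 1024 / throughput_mb_per_sec / 3600


if __name__ == "__main__":
    for throttle in (16, 32, 64):
        hours = compaction_time_hours(100, throttle)
        print(f"100 GB at {throttle} MB/s -> about {hours:.1f} h")
```

At the default of 16 MB/s, rewriting 100 GB takes at least about 1.8 hours, which is why the ability to change the throttle at runtime matters on nodes that fall behind.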
34. Disk Failure settings
    disk_failure_policy: stop
    commit_failure_policy: stop
    1. stop_paranoid: shut down gossip and client transports even for single-sstable errors, kill the JVM for errors during startup
    2. die: shut down gossip and Thrift and kill the JVM, so the node can be replaced
35. Hints
    hinted_handoff_enabled: true
    max_hint_window_in_ms: 10800000
    hinted_handoff_throttle_in_kb: 1024
    max_hints_delivery_threads: 2
    hints_directory: /var/lib/cassandra/hints
    hints_flush_period_in_ms: 10000
    max_hints_file_size_in_mb: 128
    hints_compression: LZ4Compressor
    1. Hints recently redesigned, again again
    2. Don't: tune high and overwhelm a recovering node
    3. Don't: tune low and end up with out-of-sync data
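
The millisecond values on slide 35 are hard to read at a glance. This small sketch converts max_hint_window_in_ms to a duration and checks whether an outage fits inside it; the helper names are made up, and the idea that an outage longer than the window leaves the node needing repair is stated here as the working assumption.

```python
from datetime import timedelta

MAX_HINT_WINDOW_IN_MS = 10_800_000  # value shown on the slide


def hint_window(window_ms: int = MAX_HINT_WINDOW_IN_MS) -> timedelta:
    """Convert max_hint_window_in_ms into a readable duration."""
    return timedelta(milliseconds=window_ms)


def outage_covered_by_hints(outage: timedelta,
                            window_ms: int = MAX_HINT_WINDOW_IN_MS) -> bool:
    """True if the whole outage falls inside the hint window.

    Assumption for this sketch: once a node has been down longer than
    the window, writes it missed are no longer hinted and it needs repair.
    """
    return outage <= hint_window(window_ms)


if __name__ == "__main__":
    print("hint window:", hint_window())
    for hours in (2, 4):
        covered = outage_covered_by_hints(timedelta(hours=hours))
        print(f"{hours}h outage covered by hints: {covered}")
```

The slide's 10800000 ms works out to three hours.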
36. Disk optimization strategy
    #disk_optimization_strategy: ssd
    1. Tip for those with rotational disks: set this to spinning
37. Exotic settings
38. Auto bootstrap
    auto_bootstrap: true (hidden variable)
    1. “Bootstrapping” here means: should the joining node attempt to acquire data from other nodes, or start up empty?
    2. Can be used when bringing on a new datacenter
    3. Can be used when there are streaming/join issues
39. Backup*Ish options
    incremental_backups: false
    snapshot_before_compaction: false
    auto_snapshot: true
    1. Enable with external backup-like tools
    2. Creates hard-link files the operator must clean up
    3. Enabling and not cleaning up will fill the disk
    4. Truncate/drop makes a snapshot
40. Per operation default timeouts
    read_request_timeout_in_ms: 5000
    write_request_timeout_in_ms: 2000
    request_timeout_in_ms: 10000
    1. Each operation type has a different timeout
    2. Applied on the coordinator, not the client
    3. Previously there was only a global rpc_timeout
41. Commit Log sync
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000
    commitlog_segment_size_in_mb: 32
    1. Alternative batch mode blocks ack to clients
    2. Commit logs persist until the memtables flush
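
With periodic sync, writes are acknowledged before the commit log is fsynced, so the sync period bounds how much acknowledged data a single node could lose on a sudden power failure. The sketch below is rough arithmetic under that reading; the function name and the 5 MB/s write rate are invented for illustration.

```python
def max_unsynced_mb(write_rate_mb_per_sec: float, sync_period_ms: int) -> float:
    """Upper bound on commit log data not yet fsynced on one node.

    Illustrative arithmetic only: assumes a steady write rate and that
    a full sync period has elapsed since the last fsync.
    """
    return write_rate_mb_per_sec * sync_period_ms / 1000


if __name__ == "__main__":
    # 5 MB/s is an assumed write rate; 10000 ms is the slide's sync period.
    print(max_unsynced_mb(5, 10_000), "MB at risk on this node")
```

Other replicas normally still hold those writes, which is why periodic mode is usually an acceptable trade against the latency cost of batch mode.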
42. Thanks! @edwardcapriolo
