Multi-Region Cassandra Clusters
- 1. Multi-region Cassandra Clusters: Novel Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD #CassandraSummit
- 2. Adam Zegelin Co-founder & VP of Engineering www.instaclustr.com adam@instaclustr.com @adamzegelin
- 3. Instaclustr • Instaclustr provides Cassandra-as-a-service in the cloud (currently only on AWS — Google Cloud in private beta) • We currently manage 50+ Cassandra nodes for various customers • We often get requests to do cool things — and try to make them happen!
- 4. Multi-DC @ Instaclustr • Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud • Works out-of-the-box today. • Requires per-node public IP • Private network clusters ⇄ Cloud clusters • Easy if your private network allocates per-node public IP addresses • VPNs • Something else?
- 5. • Overview of multi-region/data centre clusters • What is supported out-of-the-box • Alternative solutions • Supporting technology overview (NAT/PAT and DNS-SD) • Implementation
- 6. Single Node • What you get from running apt-get install cassandra and /usr/bin/cassandra • Fragile (no redundancy) • Dev/test/sandbox only C*
- 7. Multi-node, Single Data Centre • Two or more servers running Cassandra within one DC • Replication of data (redundancy) • Increased capacity (storage + throughput) • Baseline for production clusters C* C* C*
- 8. Multi-node, Multi-DC • Cassandra running in two or more data centres • Global deployments • Data near your customers (reduced latency) • Supported out-of-the-box C* C* C* C* C* C* C* C* C*
- 9. Snitches • Understand data centres and racks • Implementations may automatically determine a node's DC and rack (EC2MultiRegionSnitch uses the AWS internal metadata service, GossipingPropertyFileSnitch loads a .properties file) • Node DC and rack are advertised via gossip • Determine node proximity (estimated link latency) • A cluster may use a combination of snitch implementations
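For reference, a minimal sketch (not from the slides) of the cassandra-rackdc.properties file that GossipingPropertyFileSnitch reads on each node; the DC and rack names below are placeholders:

# cassandra-rackdc.properties, read by GossipingPropertyFileSnitch (one copy per node)
# dc and rack values are placeholder names
dc=US_EAST_1
rack=rack1
# optional: prefer the node's private address for intra-DC traffic
# prefer_local=true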
- 10. Data Centres • Collection of racks • Each holds a complete replica of the data • Geographically separate • Possibly high-latency interconnects (e.g. East Coast US → Sydney, ~300ms round-trip)
- 11. Racks • Collection of nodes • May fail as a single unit • Modelled on the traditional DC rack/cage (n servers running off a single UPS)
- 12. ☁ • Amazon Web Services (use EC2MultiRegionSnitch) • Data Centre ≡ AWS Region (e.g. US_EAST_1, AP_SOUTHEAST_2) • Rack ≡ Availability Zone (e.g. us-east-1a, ap-southeast-2b) • Google Cloud Platform (no out-of-the-box auto-configuring snitch — use GossipingPropertyFileSnitch, or roll your own!) • Data Centre ≡ GCP Region (e.g. US, Europe) • Rack ≡ Zone (e.g. us-central1-a, europe-west1-a)
- 13. Data Centre Aware • Cassandra is data centre aware • Only fetch data from a remote DC if absolutely required (remote data is more “expensive”) • Clients can be made data centre aware • If your app knows its DC, client will talk to the closest DC
- 14. Cluster cluster = Cluster.builder()
    .addContactPoint(…)
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
    .build();
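A slightly fuller sketch of the same driver setup, assuming the DataStax Java driver 2.x; the contact point is a placeholder, and LOCAL_QUORUM is shown because it keeps reads and writes within the local data centre:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class DcAwareClient {
    public static void main(String[] args) {
        // Route requests to the local DC and use a DC-local consistency level,
        // so normal operations never wait on a remote data centre.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1") // placeholder contact point
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
        Session session = cluster.connect();
        System.out.println(session.execute("SELECT release_version FROM system.local")
                .one().getString("release_version"));
        cluster.close();
    }
}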
- 15. Multi DC Support • Per-node public (internet-facing) IP address • Optionally, per-node private IP address • The per-node public address is used for inter-data centre connectivity • The per-node private address is used for intra-data centre connectivity
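As a rough illustration (not from the slides) of how this dual addressing maps onto Cassandra settings: listen_address carries the private intra-DC address and broadcast_address the public inter-DC one. The addresses below are placeholders:

# cassandra.yaml (per node)
listen_address: 192.168.1.2      # private address, used within the local data centre
broadcast_address: 54.209.0.10   # public address, advertised to remote data centres
# EC2MultiRegionSnitch derives broadcast_address from the node's public IP automatically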
- 16. Multi DC Support • Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional • Easy to set up per-node public and private addresses • Private network clusters ⇄ Cloud clusters • Private networks: n public addresses, shared by m private addresses. Not 1 ↔ 1 (where often m > n) • Done via Network Address Translation
- 17. IPv4 Address Space Exhaustion Source: http://www.potaroo.net/tools/ipv4/
- 18. Multi-DC Support • IPv4 • Address exhaustion • Over time, it will become more expensive to purchase addresses • A public address per node is wasteful (be a good internet citizen)
- 19. Alternatives • IPv6 • Java supports it ∴ Cassandra probably supports it (untested by us) • Global IPv6 adoption is ~4% (according to Google — google.com/intl/en/ipv6/statistics.html) • IPv6/IPv4 hybrids (Teredo, 6over4, et al.) • AWS EC2 does not support IPv6. End of story. (Elastic Load Balancer does support IPv6)
- 20. Alternatives • VPNs • tinc, OpenVPN, etc. • All private address space — no dual addressing • Requires multiple links — between every DC and per client • Address space overlaps between multiple VPNs • Connectivity to multiple clusters is an issue (for multi-cluster apps, centralised monitoring, etc.)
- 21. Full-mesh VPN links required: 3 data centres → 3 links, 5 → 10, 7 → 21 (n data centres need n(n−1)/2 links)
- 22. Alternatives • Network Address Translation (NAT) (aka IP Masquerading or Port Address Translation (PAT)) • Deployed on most private networks • Connectivity between private network clusters ⇄ Cloud clusters • Supports client connectivity to multiple clusters
- 23. NAT Basics • Re-maps IP address spaces (e.g. Public 96.31.81.80 ↔ Private 192.168.*.*) • n public addresses, shared by m private addresses. Not 1 ↔ 1 (where often n = 1 and m > n) • Port Address Translation • Private port ↔ Public port • Outbound connections only, without port forwarding or NAT traversal • Per-DC gateway device — performs NAT and port forwarding
- 24. NAT with Inbound Connections • Static port forwarding (configured on the gateway) • Automatic port forwarding — UPnP, NAT-PMP/PCP (configured by the application, e.g. Cassandra) • NAT Traversal — STUN, ICE, etc.
- 25. NAT + C∗ Situation: n Cassandra nodes, 1 public address per data centre • Port forward different public ports for each node • Advertise the assigned ports • Modify Cassandra and client applications to connect to the advertised ports
- 26. Advertising Port Mappings • Extend Cassandra Gossip • Include port numbers in node address announcements • Allow seed node addresses to include port numbers • Allow multiple nodes to have identical public & private addresses (only port numbers differ per DC) • How to bootstrap? SIP? • Cassandra must be aware of the allocated ports in order to advertise • Hard if C* is not directly responsible for the port mapping (e.g. static port forwarding) • Too many modifications to internals
- 27. Advertising Port Mappings • DNS-SD — dns-sd.org (aka Bonjour/Zeroconf) • Reads — works with existing DNS implementations (it’s just a DNS query) • Even inside restrictive networks, DNS usually works • Combination of DNS TXT, SRV and PTR records • Updates • via DNS Update & TSIG — supported by BIND • via API — e.g. AWS Route 53
- 28. Advertising Port Mappings • DNS-SD cont’d. • SRV records contain hostname and port (i.e., hostname of the NAT gateway and public C* port) • TXT records contain key=value pairs (useful for additional connection & config details) • Modify C* connection code to lookup foreign node port from DNS • Modify client driver connection code to lookup ports from DNS • Can be queried & updated out-of-band (updated by the NAT device or central management server which knows which ports were mapped)
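A minimal sketch of the kind of SRV/TXT lookup described above, using only the JDK's JNDI DNS provider; the browse domain below is a placeholder in the style of the dns-sd output shown later:

import java.util.Hashtable;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class DnsSdLookup {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);

        // Placeholder service instance name: <private-ip>._cassandra._tcp.<browse domain>
        String name = "192-168-1-4._cassandra._tcp.example.com";
        Attributes attrs = ctx.getAttributes(name, new String[] {"SRV", "TXT"});

        // SRV record data is returned as "<priority> <weight> <port> <target>"
        Attribute srv = attrs.get("SRV");
        if (srv != null) {
            String[] fields = srv.get().toString().split(" ");
            String gatewayHost = fields[3];               // NAT gateway hostname
            int publicPort = Integer.parseInt(fields[2]); // forwarded public port
            System.out.println(gatewayHost + ":" + publicPort);
        }

        // TXT records carry key=value pairs (e.g. a cqlport entry)
        Attribute txt = attrs.get("TXT");
        if (txt != null) {
            for (int i = 0; i < txt.size(); i++) {
                System.out.println("TXT: " + txt.get(i));
            }
        }
    }
}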
- 29. Advertised Details • Each cluster is its own browse domain • Each NAT gateway device has an A record in the browse domain • Each DNS-SD service is named after the node's private IP address • Requires unique private IP addresses across data centres • The SRV port is the C* thrift port • Additional ports are advertised via TXT records
- 30. Configuration • Cassandra is configured to only use private addresses • On cluster creation • Establish a new DNS-SD browse domain • Create A records for each gateway device • The NAT gateway device is notified when a new C* node is started • Allocates random public ports for C* and configures port forwarding • Updates DNS-SD with new SRV and TXT records
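One way a gateway or management server might publish those SRV and TXT records when Route 53 hosts the browse domain, sketched with the AWS SDK for Java v1; the hosted zone ID, names and ports are placeholders, and this is only an illustration of the UPSERT call, not the actual Instaclustr tooling:

import com.amazonaws.services.route53.AmazonRoute53;
import com.amazonaws.services.route53.AmazonRoute53ClientBuilder;
import com.amazonaws.services.route53.model.Change;
import com.amazonaws.services.route53.model.ChangeAction;
import com.amazonaws.services.route53.model.ChangeBatch;
import com.amazonaws.services.route53.model.ChangeResourceRecordSetsRequest;
import com.amazonaws.services.route53.model.RRType;
import com.amazonaws.services.route53.model.ResourceRecord;
import com.amazonaws.services.route53.model.ResourceRecordSet;

public class PublishServiceRecord {
    public static void main(String[] args) {
        AmazonRoute53 route53 = AmazonRoute53ClientBuilder.defaultClient();

        String service = "192-168-1-4._cassandra._tcp.example.com."; // placeholder instance name

        // SRV data: "<priority> <weight> <port> <target>", pointing at the NAT gateway
        // and the public port forwarded to the node's storage port
        ResourceRecordSet srv = new ResourceRecordSet(service, RRType.SRV)
                .withTTL(60L)
                .withResourceRecords(new ResourceRecord("0 0 1236 gateway.example.com."));

        // TXT data: additional key=value details such as the forwarded CQL port
        ResourceRecordSet txt = new ResourceRecordSet(service, RRType.TXT)
                .withTTL(60L)
                .withResourceRecords(new ResourceRecord("\"cqlport=1237\""));

        route53.changeResourceRecordSets(new ChangeResourceRecordSetsRequest()
                .withHostedZoneId("Z0000000000000") // placeholder hosted zone ID
                .withChangeBatch(new ChangeBatch()
                        .withChanges(new Change(ChangeAction.UPSERT, srv),
                                     new Change(ChangeAction.UPSERT, txt))));
    }
}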
- 31. Output of dns-sd (can also use avahi-browse, dig, or any other DNS query tool):

$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Browsing for _cassandra._tcp
A/R Flags if Domain                                                            Service Type     Instance Name
Add   3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-4
Add   3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-2
Add   3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-3
Add   3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-2
Add   3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-4
Add   2   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-3

$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
version=2.0.7 cqlport=1237

$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Non-authoritative answer:
Name:    aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
Address: 54.209.123.195
- 32. Java Driver Modifications

public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}

• Address translation is usually a no-op (the default is IdentityTranslater) • Modify translate() to perform a DNS-SD lookup • The address parameter is a node's private IP address • Locate the service whose name matches the private IP address to determine the public IP/port
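A rough sketch of such a translater, assuming the driver 2.x AddressTranslater interface and the same JNDI-based SRV lookup as above; the browse domain is a placeholder, and error handling simply falls back to the untranslated address:

import java.net.InetSocketAddress;
import java.util.Hashtable;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import com.datastax.driver.core.policies.AddressTranslater;

public class DnsSdAddressTranslater implements AddressTranslater {
    // Placeholder browse domain; a real deployment would use the cluster's own domain
    private static final String BROWSE_DOMAIN = "_cassandra._tcp.example.com";

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        try {
            Hashtable<String, String> env = new Hashtable<>();
            env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
            DirContext ctx = new InitialDirContext(env);

            // The service instance is named after the private IP, e.g. 192-168-1-4
            String instance = address.getAddress().getHostAddress().replace('.', '-');
            Attributes attrs = ctx.getAttributes(instance + "." + BROWSE_DOMAIN, new String[] {"SRV"});

            // SRV data: "<priority> <weight> <port> <target>"
            String[] srv = attrs.get("SRV").get().toString().split(" ");
            return new InetSocketAddress(srv[3], Integer.parseInt(srv[2]));
        } catch (Exception e) {
            return address; // no DNS-SD entry or lookup failed: keep the private address
        }
    }
}

It would presumably be registered with Cluster.builder().withAddressTranslater(new DnsSdAddressTranslater()).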
- 33. Modifying Cassandra

public class OutboundTcpConnectionPool {
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}

• OutboundTcpConnectionPool is responsible for managing inter-node socket connections • Modify newSocket() to perform a DNS-SD lookup • The endpoint parameter is a node's private IP address • Locate the service whose name matches the private IP address to determine the public IP/port
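In the same spirit, a standalone sketch (not the actual patch) of what a DNS-SD-aware newSocket() could do: look the endpoint up and, if a service is advertised, connect to the NAT gateway's public host and port instead of the unroutable private address. The browse domain is a placeholder and the fallback mirrors stock behaviour of dialling the endpoint directly on the storage port:

import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Hashtable;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class DnsSdSocketFactory {
    private static final String BROWSE_DOMAIN = "_cassandra._tcp.example.com"; // placeholder
    private static final int STORAGE_PORT = 7000;                              // default storage_port

    public static Socket newSocket(InetAddress endpoint) throws IOException {
        try {
            Hashtable<String, String> env = new Hashtable<>();
            env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
            DirContext ctx = new InitialDirContext(env);

            String instance = endpoint.getHostAddress().replace('.', '-');
            Attributes attrs = ctx.getAttributes(instance + "." + BROWSE_DOMAIN, new String[] {"SRV"});
            String[] srv = attrs.get("SRV").get().toString().split(" "); // "prio weight port target"

            Socket socket = new Socket();
            socket.connect(new InetSocketAddress(srv[3], Integer.parseInt(srv[2])));
            return socket;
        } catch (Exception e) {
            // No advertised mapping (or lookup failed): connect directly, as stock Cassandra would
            return new Socket(endpoint, STORAGE_PORT);
        }
    }
}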
- 34. [Architecture diagram: two data centres of C* nodes, each behind a NAT Gateway, with a DNS (+ DNS-SD) server (Route 53, self-hosted, etc.) and a client application]
- 35. Thanks! Questions? adam@instaclustr.com