Multi-Region Cassandra Clusters

  1. Novel Multi-region Clusters
     Cassandra Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD
     #CassandraSummit
  2. Adam Zegelin
     Co-founder & VP of Engineering
     www.instaclustr.com
     adam@instaclustr.com
     @adamzegelin
  3. Instaclustr
     • Instaclustr provides Cassandra-as-a-service in the cloud (currently only on AWS; Google Cloud in private beta)
     • We currently manage 50+ Cassandra nodes for various customers
     • We often get requests to do cool things, and we try to make them happen!
  4. Multi-DC @ Instaclustr
     • Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud
     • Works out-of-the-box today
     • Requires per-node public IP
     • Private network clusters ⇄ cloud clusters
     • Easy if your private network allocates per-node public IP addresses
     • VPNs
     • Something else?
  5. • Overview of multi-region/data centre clusters
     • What is supported out-of-the-box
     • Alternative solutions
     • Supporting technology overview (NAT/PAT and DNS-SD)
     • Implementation
  6. Single Node
     • What you get from running apt-get install cassandra and /usr/bin/cassandra
     • Fragile (no redundancy)
     • Dev/test/sandbox only
     [Diagram: one C* node]
  7. Multi-node, Single Data Centre
     • Two or more servers running Cassandra within one DC
     • Replication of data (redundancy)
     • Increased capacity (storage + throughput)
     • Baseline for production clusters
     [Diagram: three C* nodes]
  8. Multi-node, Multi-DC
     • Cassandra running in two or more data centres
     • Global deployments
     • Data near your customers (reduced latency)
     • Supported out-of-the-box
     [Diagram: nine C* nodes]
  9. Snitches
     • A snitch understands data centres and racks
     • An implementation may automatically determine a node's DC and rack (Ec2MultiRegionSnitch uses the AWS internal metadata service; GossipingPropertyFileSnitch loads a .properties file)
     • Node DC and rack are advertised via Gossip
     • Determines node proximity (estimated link latency)
     • A cluster may use a combination of snitch implementations
  10. Data Centres
      • Collection of racks
      • Complete replications
      • Geographically separate
      • Possibly high-latency interconnects (e.g. East Coast US → Sydney, ~300ms round-trip)
  11. Racks
      • Collection of nodes
      • May fail as a single unit
      • Modelled on the traditional DC rack/cage (n servers running off a UPS)
  12. ☁
      • Amazon Web Services (use Ec2MultiRegionSnitch)
        • Data Centre ≡ AWS Region (e.g. US_East_1, AP_SOUTHEAST_2)
        • Rack ≡ Availability Zone (e.g. us-east-1a, ap-southeast-2b)
      • Google Cloud Platform (no out-of-the-box auto-configuring snitch; use GossipingPropertyFileSnitch, or roll your own!)
        • Data Centre ≡ GCP Region (e.g. US, Europe)
        • Rack ≡ Zone (e.g. us-central1-a, europe-west1-a)
  13. Data Centre Aware
      • Cassandra is data centre aware
      • Only fetches data from a remote DC if absolutely required (remote data is more “expensive”)
      • Clients can be made data centre aware
      • If your app knows its DC, the client will talk to the closest DC
  14. Cluster cluster = Cluster.builder()
          .addContactPoint(…)
          .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
          .build();
  15. Multi DC Support
      • Per-node public (internet-facing) IP address
      • Optionally, per-node private IP address
      • Per-node public address is used for inter-data centre connectivity
      • Per-node private address is used for intra-data centre connectivity
  16. Multi DC Support
      • Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional
      • Easy to set up per-node public and private addresses
      • Private network clusters ⇄ cloud clusters
      • Private networks: n public addresses, shared by m private addresses. Not 1 ↔ 1 (where often m > n)
      • Done via Network Address Translation
  17. IPv4 Address Space Exhaustion
      Source: http://www.potaroo.net/tools/ipv4/
  18. Multi-DC Support
      • IPv4
      • Address exhaustion
      • Over time, will become more expensive to purchase addresses
      • Wasteful (being a good internet citizen)
  19. Alternatives
      • IPv6
      • Java supports it ∴ Cassandra probably supports it (untested by us)
      • Global IPv6 adoption is ~4% (according to Google: google.com/intl/en/ipv6/statistics.html)
      • IPv6/IPv4 hybrid (Teredo, 6over4, et al.)
      • AWS EC2 does not support IPv6. End of story. (Elastic Load Balancer does support IPv6)
  20. Alternatives
      • VPNs
      • tinc, OpenVPN, etc.
      • All private address space, so no dual addressing
      • Requires multiple links: between every DC and per client
      • Address space overlaps between multiple VPNs
      • Connectivity to multiple clusters is an issue (for multi-cluster apps, centralised monitoring, etc.)
  21. Data Centres   Links
      3              3
      5              10
      7              21
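
The link counts in this table appear to follow a full mesh, where every pair of data centres needs its own VPN link, giving n(n-1)/2 links for n data centres. A minimal Java sketch of that arithmetic follows; the class and method names are illustrative and do not come from the talk, and per-client links are not counted.

```java
// Hypothetical helper: counts the point-to-point links needed to fully mesh n data centres.
final class MeshLinks {
    // Every unordered pair of data centres needs one link: n * (n - 1) / 2.
    static int fullMeshLinks(int dataCentres) {
        if (dataCentres < 0) {
            throw new IllegalArgumentException("dataCentres must be non-negative");
        }
        return dataCentres * (dataCentres - 1) / 2;
    }

    public static void main(String[] args) {
        // Reproduces the rows of the table: 3 -> 3, 5 -> 10, 7 -> 21.
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " data centres -> " + fullMeshLinks(n) + " links");
        }
    }
}
```

The quadratic growth is the point of the slide: each extra data centre adds a link to every existing one.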
  22. Alternatives
      • Network Address Translation (NAT) (aka IP Masquerading or Port Address Translation (PAT))
      • Deployed on most private networks
      • Connectivity between private network clusters ⇄ cloud clusters
      • Supports client connectivity to multiple clusters
  23. NAT Basics
      • Re-maps IP address spaces (e.g. Public 96.31.81.80 ↔ Private 192.168.*.*)
      • n public addresses, shared by m private addresses. Not 1 ↔ 1 (where often n = 1, m > n)
      • Port Address Translation
      • Private port ↔ Public port
      • Outbound connections only without port forwarding or NAT traversal
      • Per-DC gateway device that performs NAT and port forwarding
  24. NAT with Inbound Connections
      • Static port forwarding (configured on the gateway)
      • Automatic port forwarding: UPnP, NAT-PMP/PCP (configured by the application, e.g. Cassandra)
      • NAT Traversal: STUN, ICE, etc.
  25. NAT + C*
      Situation: multiple Cassandra nodes, 1 public address per data centre
      • Port forward different public ports for each node
      • Advertise assigned ports
      • Modify Cassandra and client applications to connect to advertised ports
  26. Advertising Port Mappings
      • Extend Cassandra Gossip
      • Include port numbers in node address announcements
      • Allow seed node addresses to include port numbers
      • Allow multiple nodes to have identical public & private addresses (only port numbers differ per DC)
      • How to bootstrap? SIP?
      • Cassandra must be aware of the allocated ports in order to advertise
      • Hard if C* is not directly responsible for the port mapping (e.g. static port forwarding)
      • Too many modifications to internals
  27. Advertising Port Mappings
      • DNS-SD: dns-sd.org (aka Bonjour/Zeroconf)
      • Reads: work with existing DNS implementations (it’s just a DNS query)
      • Even inside restrictive networks, DNS usually works
      • Combination of DNS TXT, SRV and PTR records
      • Updates
        • via DNS Update & TSIG (supported by bind)
        • via API (e.g. for AWS Route 53)
  28. Advertising Port Mappings
      • DNS-SD cont’d.
      • SRV records contain hostname and port (i.e., hostname of the NAT gateway and public C* port)
      • TXT records contain key=value pairs (useful for additional connection & config details)
      • Modify C* connection code to look up the foreign node port from DNS
      • Modify client driver connection code to look up ports from DNS
      • Can be queried & updated out-of-band (updated by the NAT device or a central management server that knows which ports were mapped)
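
To make the SRV and TXT roles concrete, here is a minimal Java sketch that extracts the gateway host and port from SRV record data and the extra details from TXT key=value pairs. It assumes the SRV data is in the usual "priority weight port target" presentation form and that TXT pairs are whitespace-separated; the class name, method names, and example hostname in the usage note are hypothetical, and no DNS query is performed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parsing helpers; fetching the records from DNS is out of scope here.
final class DnsSdRecords {
    /** Target host and port taken from an SRV record. */
    static final class Srv {
        final String host;
        final int port;

        Srv(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Parses SRV record data in presentation form: "priority weight port target".
    static Srv parseSrv(String rdata) {
        String[] parts = rdata.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 'priority weight port target': " + rdata);
        }
        int port = Integer.parseInt(parts[2]);
        String target = parts[3];
        // Drop the trailing root dot so the host can be used directly for a connection.
        String host = target.endsWith(".") ? target.substring(0, target.length() - 1) : target;
        return new Srv(host, port);
    }

    // Parses whitespace-separated key=value pairs; entries without '=' are ignored.
    static Map<String, String> parseTxt(String txt) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : txt.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return out;
    }
}
```

For example, `DnsSdRecords.parseSrv("0 0 1236 gateway.example.com.")` yields host `gateway.example.com` and port 1236, and `DnsSdRecords.parseTxt("version=2.0.7 cqlport=1237")` maps `cqlport` to `1237`.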
  29. Advertised Details
      • Each cluster is its own browse domain
      • Each NAT gateway device has an A record in the browse domain
      • Each DNS-SD service is named based on the private IP address
      • Requires unique private IP addresses across data centres
      • SRV port is the C* thrift port
      • Additional ports are advertised via TXT
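
The naming rule can be sketched in Java as follows. This assumes IPv4 private addresses written with dashes (192.168.1.4 becomes 192-168-1-4) and the _cassandra._tcp service type that appears in the dns-sd output in this deck; the class and method names are illustrative only.

```java
// Hypothetical helper for building DNS-SD names from a node's private IPv4 address.
final class ServiceNames {
    static final String SERVICE_TYPE = "_cassandra._tcp";

    // "192.168.1.4" -> "192-168-1-4"
    static String instanceName(String privateIp) {
        return privateIp.replace('.', '-');
    }

    // Full name to query: "<instance>._cassandra._tcp.<browse domain>."
    static String serviceFqdn(String privateIp, String browseDomain) {
        // Ensure the browse domain is fully qualified (ends with the root dot).
        String domain = browseDomain.endsWith(".") ? browseDomain : browseDomain + ".";
        return instanceName(privateIp) + "." + SERVICE_TYPE + "." + domain;
    }
}
```

Because the instance name is derived only from the private address, two nodes with the same private IP in different data centres would collide, which is why the slide requires unique private addresses across data centres.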
  30. Configuration
      • Cassandra is configured to only use private addresses
      • On cluster creation
        • Establish a new DNS-SD browse domain
        • Create A records for each gateway device
      • NAT gateway device is notified when a new C* node is started
        • Allocates random public ports for C* and configures port forwarding
        • Updates DNS-SD
        • New SRV and TXT record
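
The gateway's port allocation step could look roughly like the Java sketch below. It is an assumption-laden illustration, not the gateway software described in the talk: the class name, the port range, and the "ip:port" key format are all hypothetical, and configuring the actual port forward and updating DNS-SD are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical allocator: hands out unique random public ports for private endpoints.
final class GatewayPortAllocator {
    private final int minPort;
    private final int maxPort;
    private final Random random;
    private final Set<Integer> used = new HashSet<>();
    private final Map<String, Integer> byEndpoint = new HashMap<>();

    GatewayPortAllocator(int minPort, int maxPort, Random random) {
        if (minPort < 1 || maxPort > 65535 || minPort > maxPort) {
            throw new IllegalArgumentException("Invalid port range");
        }
        this.minPort = minPort;
        this.maxPort = maxPort;
        this.random = random;
    }

    // Returns the public port mapped to a private "ip:port", allocating one on first use.
    synchronized int allocate(String privateEndpoint) {
        Integer existing = byEndpoint.get(privateEndpoint);
        if (existing != null) {
            return existing;
        }
        int range = maxPort - minPort + 1;
        if (used.size() >= range) {
            throw new IllegalStateException("No free public ports");
        }
        int port;
        do {
            // Retry until an unused port is drawn; at least one is free at this point.
            port = minPort + random.nextInt(range);
        } while (!used.add(port));
        byEndpoint.put(privateEndpoint, port);
        return port;
    }
}
```

After each allocation, the gateway (or a management server) would publish the chosen port in the node's SRV and TXT records so that other nodes and clients can find it.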
  31. $ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
      Browsing for _cassandra._tcp
      A/R  Flags  if  Domain  Service Type  Instance Name
      Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-4
      Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-2
      Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-3
      Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-2
      Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-4
      Add  2  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-3

      $ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
      Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
      192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
       version=2.0.7 cqlport=1237

      $ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
      Non-authoritative answer:
      Name: aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
      Address: 54.209.123.195

      Output of dns-sd (can also use avahi-browse, dig, or any other DNS query tool)
  32. Java Driver Modifications
      • The driver's AddressTranslater is usually a no-op (the default is IdentityTranslater)
      • Modify translate() to perform a DNS-SD lookup
      • The address parameter is a node's private IP address
      • Locate the service whose name matches the private IP address to determine the public IP/port

      public interface AddressTranslater {
          public InetSocketAddress translate(InetSocketAddress address);
      }
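
A translate() implementation along these lines might look like the Java sketch below. To keep it compilable without the driver on the classpath, it declares a local stand-in for the AddressTranslater interface shown on the slide; the DnsSdAddressTranslater class and the injected lookup function are hypothetical, and the real DNS-SD query (and the choice between the SRV port and the cqlport TXT value) is left to whoever supplies that function.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.function.Function;

// Local stand-in mirroring the interface on the slide, so this sketch compiles on its own.
interface AddressTranslater {
    InetSocketAddress translate(InetSocketAddress address);
}

// Hypothetical translater: maps a node's private address to its public gateway host and port.
final class DnsSdAddressTranslater implements AddressTranslater {
    // Assumed lookup: DNS-SD instance name (e.g. "192-168-1-4") -> public address, or null if unknown.
    private final Function<String, InetSocketAddress> lookup;

    DnsSdAddressTranslater(Function<String, InetSocketAddress> lookup) {
        this.lookup = lookup;
    }

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        InetAddress ip = address.getAddress();
        if (ip == null) {
            // Unresolved address: nothing to look up, so behave like the identity translater.
            return address;
        }
        String instance = ip.getHostAddress().replace('.', '-');
        InetSocketAddress mapped = lookup.apply(instance);
        // Fall back to the original address when no service is advertised for this node.
        return mapped != null ? mapped : address;
    }
}
```

Injecting the lookup keeps the DNS access separate from the translation rule, so the mapping logic can be exercised with an in-memory map before it is wired to a real resolver.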
  33. Modifying Cassandra
      • OutboundTcpConnectionPool is responsible for managing Socket connections
      • Modify newSocket() to perform a DNS-SD lookup
      • The endpoint parameter is a node's private IP address
      • Locate the service whose name matches the private IP address to determine the public IP/port

      public class OutboundTcpConnectionPool {
          ⋮
          public static Socket newSocket(InetAddress endpoint) throws IOException {…}
          ⋮
      }
  34. [Diagram: six C* nodes, two NAT Gateways, a DNS (+ DNS-SD) Server (Route 53, self-hosted, etc.) and a Client Application]
  35. Thanks! Questions?
      adam@instaclustr.com