
Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real-time data platforms.

6/26/2019

Reading time: 6 min

Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | C* Summit 2016

by DataStax

Speaker notes (from the original slide deck):

● Not called out directly in the playbook sample: run_once and ignore_errors.
● The AZ-loss problem is removed if each AZ has a complete copy of the data.
● Backup playbook, step 4: nice is used for low impact. Step 6: bucket lifecycle policies are also used; the separate process provides higher granularity.
● Hostnames: used in S3 paths as a unique source identifier; may not be needed depending on implementation.
● Impact: nice. Automation: cron, Ansible Tower, etc.
● Restore requirements: consistency means the data agrees to within Cassandra internals; integrity means no corruption is induced by the restore; time to recovery is roughly a few hours.
● Filtered keyspaces: the system peers and local tables, but NOT the schema. Only a minimum of metadata is collected.
● Configuration is a combination of old and new settings: critical settings are stored in S3, non-critical settings use what is in the repo. The approach assumes the snapshot being restored is recent and that config changes are rare; more config detail could be stored, up to the entire cassandra.yaml.
● The AZ analysis assumes all AZs have the same number of nodes; it is much worse if they do not. Quorum-loss threshold: for RF=3 and the same number of nodes in each AZ, 9 total nodes.
● The node mapping requires metadata stored at backup time, hence restore-focused backups.
● Since completion, restores have been in demand for investigations, and dev velocity has increased as a result.

Slide transcript

1. Cassandra backups and restorations using Ansible
Dr. Joshua Wickman, Database Engineer, Knewton

2. Relevant technologies
● AWS infrastructure
● Deployment and configuration management with Ansible
○ Ansible is built on: Python, YAML, SSH, and Jinja2 templating
○ Agentless: less complexity

3. Ansible playbook sample

---
- hosts: < host group specification >
  serial: 1                                        # one host at a time (default: all in parallel)
  pre_tasks:
    - name: ask for human confirmation
      local_action:                                # can execute on local or remote host
        module: pause
        prompt: Confirm action on {{ play_hosts | length }} hosts?    # built-in variables
      run_once: yes
      tags:                                        # tags allow task filtering
        - always
        - hostcount
    # < more setup tasks >
  roles:                                           # roles define complex, repeatable rule sets
    - role: base
    - role: cassandra-install
    - role: cassandra-configure
  post_tasks:
    - name: wait to make sure cassandra is up
      wait_for:
        host: '{{ inventory_hostname }}'
        port: 9160
        delay: "{{ pause_time | default(15) }}"            # template with default
        timeout: "{{ listen_timeout | default(120) }}"
      ignore_errors: yes
    # < more post-startup tasks >

# Everything above is a single "play"; other playbooks can be imported:
- name: install and configure alerts
  include: monitoring.yml
# < more plays >

Sample command:
ansible-playbook path/to/sample_playbook.yml -i host_file -e "listen_timeout=30"

4. Knewton's Cassandra deployment
● Running on AWS instances in a VPC
● Ansible repo contains:
○ Dynamic host inventory
○ Configuration details for Cassandra nodes: config file templates (cassandra.yaml, etc.) and variable defaults
○ Roles and playbooks for Cassandra node operations: create/provision new nodes, rolling restart a cluster, upgrade a cluster, backups and restores

5. Backups for disaster recovery
● Data loss
● Data corruption
● AZ/rack loss
● Data center loss

6. But that's not all...
Restored backups are also useful for:
● Benchmarking
● Data warehousing
● Batch jobs
● Load testing
● Corruption testing
● Tracking down incident causes

7. Backups
Those sound like a good idea. I can get those for you, no sweat!

8. Backups — requirements
● Simple to use (easy with Ansible)
● Centralized, yet distributed
● Low impact
● Built with restores in mind (obvious, but super important to get right!)

9. Backup playbook
1. Ansible run initiated
2. Commands sent to each Cassandra node over SSH
3. nodetool snapshot on each node
4. Snapshot uploaded to S3 via the AWS CLI
5. Metadata gathered centrally by Ansible and uploaded to S3
6. Backup retention policies enforced by a separate process

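Steps 3 and 4 map onto a couple of Ansible tasks per node. A minimal sketch is below, assuming hypothetical variable names (snapshot_id, backup_bucket) and the default Cassandra data directory; this is an illustration of the approach, not Knewton's actual playbook. Per the speaker notes, nice keeps the upload low impact.

- name: take a snapshot on every Cassandra node, tagged with this run's snapshot ID
  command: nodetool snapshot -t {{ snapshot_id }}

- name: upload only this snapshot's files to S3 via the AWS CLI, niced for low impact
  shell: >
    nice -n 19 aws s3 sync /var/lib/cassandra/data/
    s3://{{ backup_bucket }}/{{ inventory_hostname }}/{{ snapshot_id }}/
    --exclude "*" --include "*/snapshots/{{ snapshot_id }}/*"
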
10. Backup metadata

{
  "ips": [
    "123.45.67.0",
    "123.45.67.1",
    "123.45.67.2"
  ],
  "ts": "2016-09-01T01:23:45.987654",
  "version": "2.1",
  "tokens": {
    "1a": [
      { "tokens": [...], "hostname": "sample-0" }
    ],
    "1c": [
      { "tokens": [...], "hostname": "sample-2" }
    ],
    ...
  }
}

● IP list for cluster history / backup source tracking
● Needed for restores:
○ Cassandra version (SSTable compatibility)
○ Token ranges (for the partitioner)
○ AZ mapping (more on this later)

11. Backups — results
● Simple and predictable
● Clusterwide snapshots
● Low impact
● Automation-ready
Everything's good! ...right?

12. Restores
Oh, you actually wanted to use that data again? That's… harder.

13. Restores — requirements
● Primary
○ Data consistency across nodes
○ Data integrity maintained
○ Time to recovery
● Secondary
○ Multiple snapshots at a time
○ Can be automated or run on-demand
○ Versatile end state (spin up a new cluster using restored data)

14. Restored cluster — requirements
Same configuration as at the snapshot, contained in the backup metadata (restore-focused backups):
● Cassandra version
● Number of nodes
● Token ranges
● Rack distribution (on AWS: availability zones, or AZs)
Entirely separate from the live cluster:
● No common members
● No common seeds
● Distinct provisioning identifiers (for us: AWS tags)

15. Ansible in the cloud — a caveat
Programmatic launch of servers + Ansible host discovery happening once per playbook = launching a cluster requires 2 steps:
1. Create instances
2. Provision instances as Cassandra nodes

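In practice that means two playbook invocations back to back, in the style of the sample command on slide 3 (the playbook and inventory names here are hypothetical):

ansible-playbook restore_create_nodes.yml -i ec2_inventory.py -e "snapshot_id=2016-09-01T01:23:45"
ansible-playbook restore_provision_nodes.yml -i ec2_inventory.py -e "snapshot_id=2016-09-01T01:23:45"
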
16. Restore playbook 1: create nodes
1. Get metadata from S3
2. Find the number of nodes in the original cluster
3. Create new nodes
The new cluster name is stamped with the snapshot ID, allowing:
● Easy distinction from the live cluster
● Multiple concurrent restores per cluster

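A condensed sketch of what playbook 1 could look like; the s3 and ec2 modules were the stock AWS modules in Ansible of that era, and the bucket layout, AMI, and tag names are illustrative assumptions rather than Knewton's actual setup.

- hosts: localhost
  connection: local
  tasks:
    - name: fetch the backup manifest for the requested snapshot from S3
      s3:
        bucket: "{{ backup_bucket }}"
        object: "{{ source_cluster }}/{{ snapshot_id }}/manifest.json"
        dest: /tmp/manifest.json
        mode: get

    - name: load the manifest to learn the original cluster's size and layout
      set_fact:
        backup_meta: "{{ lookup('file', '/tmp/manifest.json') | from_json }}"

    - name: launch one instance per source node, with the cluster name stamped with the snapshot ID
      ec2:
        image: "{{ cassandra_ami }}"
        instance_type: "{{ restore_instance_type }}"
        count: "{{ backup_meta.ips | length }}"
        instance_tags:
          cluster: "{{ source_cluster }}-restore-{{ snapshot_id }}"
        wait: yes
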
17. Restore playbook 2: provision nodes
1. Get metadata from S3 (again)
2. Parse metadata: map source to target
3. Find matching files in S3, filtering out some Cassandra system tables
4. Partially provision nodes: install Cassandra (using the original C* version), mount the data partition
5. Download snapshot data to the nodes
6. Configure Cassandra and finish provisioning the nodes

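A per-node sketch of playbook 2's core steps, again with hypothetical group, package, and variable names; source_hostname is the source node this target was mapped to (next slide), which doubles as the S3 path prefix to pull from.

- hosts: restored_cluster        # hypothetical group produced by dynamic inventory from the new tags
  tasks:
    - name: install the same Cassandra version the snapshot was taken with
      yum:
        name: "cassandra-{{ backup_meta.version }}"
        state: present

    - name: download this node's snapshot data from its mapped source node's S3 prefix
      shell: >
        aws s3 sync
        s3://{{ backup_bucket }}/{{ source_hostname }}/{{ snapshot_id }}/
        /var/lib/cassandra/data/

    - name: render the restore-time configuration
      template:
        src: cassandra.yaml.j2
        dest: /etc/cassandra/conf/cassandra.yaml

    - name: start Cassandra with the restored data already in place
      service:
        name: cassandra
        state: started
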
18. Restores: node mapping
Source ⇒ Target, including token ranges
Source AZs ⇒ Target AZs

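Conceptually, the mapping derived from the backup metadata looks something like the variables below; all values are illustrative, and in a real playbook they would be computed from the manifest rather than written by hand.

az_map:                        # source AZ label from the manifest -> target AZ
  "1a": us-east-1a
  "1c": us-east-1c
  "1d": us-east-1d
node_map:                      # source hostname (also the S3 path prefix) -> target node
  sample-0: { target: 10.0.1.11, az: us-east-1a }    # inherits sample-0's token ranges
  sample-2: { target: 10.0.3.12, az: us-east-1c }
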
19. Restores: random AZ assignment
[Diagram: source cluster nodes spread across AZs 1a, 1c, 1d; restored cluster nodes assigned to AZs at random]

20. Why is this a problem?
With NetworkTopologyStrategy and RF ≤ the number of AZs, Cassandra would have distributed replicas across different AZs... so data appearing in the same AZ will be skipped on read.
● Effectively fewer replicas
● Potential quorum loss
● Inconsistent access of the most recent data

21. Restores: AZ aware
[Diagram: source cluster nodes in AZs 1a, 1c, 1d; restored cluster nodes placed so each source AZ maps to a single target AZ]

22. Implementation details
● Snapshot ID
○ Datetime stamp (start of backup)
○ Restore defaults to the latest
● Restores use auto_bootstrap: false
○ The nodes already have their data!
● Anti-corruption measures
○ Metadata manifest created only after the backup has succeeded
○ If any node fails, the entire restore fails

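Pulling those details together, a fragment of the cassandra.yaml template a restore might render is sketched below (variable names hypothetical): initial_token carries the mapped source node's token ranges from the manifest, and auto_bootstrap stays false because the data is already on disk.

# cassandra.yaml.j2 (fragment)
cluster_name: '{{ source_cluster }}-restore-{{ snapshot_id }}'
num_tokens: {{ node_tokens | length }}
initial_token: {{ node_tokens | join(',') }}
auto_bootstrap: false
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{{ restore_seeds | join(',') }}"
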
23. Extras
● Automated runs using a cron job, Ansible Tower, or CD frameworks
● Restricted-access backups for dev teams using an internal service

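For the cron option, a single entry on the control host is enough; a minimal sketch using Ansible's cron module (schedule and paths illustrative):

- name: run the clusterwide backup playbook nightly
  cron:
    name: "cassandra nightly backup"
    minute: "0"
    hour: "3"
    job: "ansible-playbook /opt/ansible/cassandra_backup.yml -i ec2_inventory.py"
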
24. Conclusions
● Restore-focused backups are imperative for consistent restores
● Ansible is easy to work with and provides centralized control with a distributed workload
● Reliable backup restores are powerful and versatile

25. Thank you! Questions?


Related Articles

cassandra
ansible

GitHub - locp/ansible-role-cassandra: Ansible role to install and configure Apache Cassandra

locp

8/25/2022

