# 🚨 AWS Disaster Recovery: Whitepaper Summary

Source: Disaster Recovery of Workloads on AWS: Recovery in the Cloud

Publication: February 12, 2021 | AWS Well-Architected Framework


## Table of Contents

  1. Core Concept Definitions
  2. Business Continuity Plan (BCP)
  3. Data Plane vs. Control Plane
  4. Four DR Strategies
  5. Active/Passive vs. Active/Active
  6. Failover Traffic Routing Services
  7. Key AWS DR Services
  8. DR Testing (Testing Disaster Recovery)
  9. Strategy Selection Guide
  10. Cost Optimization Perspective
  11. 📌 Frequently Tested Exam Points: Summary

## 1. Core Concept Definitions

### What Is a Disaster?

Any event that prevents a workload or system from fulfilling its business objectives in its primary deployment location.

This includes natural disasters, technical failures, and human error.


### RTO & RPO

| Metric | Definition |
|---|---|
| RTO (Recovery Time Objective) | Maximum acceptable delay between the interruption of service and its restoration ("How quickly must we be back?") |
| RPO (Recovery Point Objective) | Maximum acceptable amount of time since the last data recovery point ("How much data loss can we tolerate?") |

```
last recovery point           disaster              service restored
         │                        │                         │
─────────┼────────────────────────┼─────────────────────────┼─────► time
         │◄─ data loss (≤ RPO) ──►│◄─── downtime (≤ RTO) ──►│
```

📌 The lower the RTO and RPO, the higher the cost and complexity, so choose the level that matches business requirements.
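Both windows can be measured after an incident. The sketch below uses invented timestamps for a hypothetical incident. It computes data loss (from the last recovery point to the disaster, compared with the RPO) and downtime (from the disaster to restoration, compared with the RTO). The example assumes hourly backups and a disaster just before the next backup would have run, which is the worst case for that schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_recovery_point: datetime  # newest data that survives (last backup or replicated write)
    disaster_at: datetime          # moment the service was interrupted
    restored_at: datetime          # moment the service was available again


def data_loss(incident: Incident) -> timedelta:
    """Window of lost data; compare against the RPO."""
    return incident.disaster_at - incident.last_recovery_point


def downtime(incident: Incident) -> timedelta:
    """Window of unavailability; compare against the RTO."""
    return incident.restored_at - incident.disaster_at


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    return data_loss(incident) <= rpo and downtime(incident) <= rto


# Hypothetical incident: hourly backups, disaster 5 minutes before the next backup.
incident = Incident(
    last_recovery_point=datetime(2021, 2, 12, 9, 0),
    disaster_at=datetime(2021, 2, 12, 9, 55),
    restored_at=datetime(2021, 2, 12, 13, 55),
)
print(data_loss(incident))  # 0:55:00
print(downtime(incident))   # 4:00:00
print(meets_objectives(incident, rpo=timedelta(hours=1), rto=timedelta(hours=2)))  # False
```

With hourly backups, the data-loss window can approach a full hour, so a 1-hour RPO is only just met. The 4-hour restore misses a 2-hour RTO.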


### Resiliency vs. DR vs. High Availability

| Concept | Definition | Scope |
|---|---|---|
| Resiliency | Ability to recover from infrastructure or service disruptions | Broad |
| High Availability | Service continues despite the failure of a single component | Single failure, e.g., at the AZ level |
| Disaster Recovery | Recovery from one-time disaster events | Data center / Region level |

DR is distinct from HA. Even a flawless HA design still needs a separate DR strategy.


## 2. Business Continuity Plan (BCP)

- A DR strategy should be a subset of the Business Continuity Plan (BCP), not a standalone document
- Use a Business Impact Analysis (BIA) to quantify the business impact of a workload disruption
- DR objectives must be grounded in business requirements, priorities, and context

Example: suppose an earthquake cuts off logistics. Even a flawless DR setup for an eCommerce app cannot meet business objectives unless the BCP also addresses logistics.


## 3. Data Plane vs. Control Plane

This is a key distinction when designing a DR strategy:

| Plane | Role | Availability target | Use during failover |
|---|---|---|---|
| Data Plane | Delivers the service in real time | Higher | ✅ Recommended |
| Control Plane | Configures and manages the environment | Lower | ❌ Avoid |

> **IMPORTANT:** Rely only on data plane operations for failover. Control plane operations (e.g., restoring from AWS Backup) may be unavailable during a disaster.

> **TIP:** Restoring data from a backup is a control plane operation. Set up scheduled periodic restores in advance so that a restored datastore is always on hand.


## 4. Four DR Strategies

### Strategy Comparison at a Glance

| Strategy | RPO | RTO | Cost | Complexity | Traffic model |
|---|---|---|---|---|---|
| Backup & Restore | Hours | Hours | Lowest 💲 | Lowest | Active/Passive |
| Pilot Light | Minutes | Tens of minutes | Medium 💲💲 | Medium | Active/Passive |
| Warm Standby | Seconds | Minutes | High 💲💲💲 | High | Active/Passive |
| Multi-Site Active/Active | Near-zero | Near-zero | Highest 💲💲💲💲 | Highest | Active/Active |

```
Cost/complexity (low → high):
Backup & Restore → Pilot Light → Warm Standby → Multi-Site Active/Active

RTO/RPO (low → high, i.e., recovery takes longer):
Multi-Site Active/Active → Warm Standby → Pilot Light → Backup & Restore
```

### Strategy 1: Backup & Restore

Concept: back up data regularly → restore it when a disaster occurs.

```
[Primary Region]                         [Recovery Region]
EC2 / RDS / EBS, etc.
  │
  ├── Create snapshot (same Region)
  └── Copy snapshot ───────────────────→ Store in S3 / Glacier
                                         (restore on disaster)
                                         Redeploy infrastructure with IaC
```

Characteristics:

- The simplest and cheapest strategy
- Recovery requires deploying infrastructure, deploying code, and restoring data → high RTO
- With point-in-time recovery (PITR), RPO can be brought down to about 5 minutes
- Applied to every workload as a baseline (and used alongside the other strategies)

AWS services:

- AWS Backup: centralized backup management for EC2, EBS, RDS, DynamoDB, EFS, FSx, and more
- S3 + S3 Glacier: backup storage (restores from Glacier can take hours)
- CloudFormation / CDK: infrastructure as code (IaC) shortens recovery time
- Amazon EventBridge + Lambda: automated detection → shorter RTO

Good fit when:

- The workload is not business-critical
- Only disasters on the scale of a single AZ or data center need to be considered
- Cost reduction is the top priority

### Strategy 2: Pilot Light

Concept: data is continuously replicated, and only core infrastructure is deployed in the Recovery Region. Compute (EC2, etc.) stays switched off.

```
[Primary Region]                      [Recovery Region]
App servers (active)                  App servers = 0 (not deployed)
RDS primary ──── async replication ─→ RDS replica (always running)
ELB / ASG                             ELB / ASG (deployed, no traffic)
      │                                     │
      │            disaster occurs          │
      └────────────────────────────────────→
                                      Launch EC2 instances from AMIs + scale out
                                      Shift traffic with DNS / Global Accelerator
```

Characteristics:

- Data is always replicated live (the database is always on)
- Compute is deployed only when a disaster strikes, so the RTO (including deployment time) is tens of minutes
- Cheaper than Warm Standby (no running compute cost)
- Key difference from Warm Standby: Pilot Light has no running compute, so it must be "turned on" (deployed) during a disaster

AWS Elastic Disaster Recovery (DRS):

- Built around the Pilot Light strategy
- Agent-based, continuous block-level replication
- Replicates from on-premises or EC2 into AWS (RDS is not supported; only databases hosted on servers such as EC2)
- RPO in seconds / RTO in minutes
- Keeps replicas in a staging area (low-cost storage + minimal compute)
- Supports non-disruptive test drills
- Supports both failover and failback

AWS services:

- Aurora Global Database (replication)
- Amazon S3 + CloudFormation (backups + IaC)
- Route 53 / Global Accelerator (traffic shifting)
- AWS Elastic Disaster Recovery

### Strategy 3: Warm Standby

Concept: a scaled-down but fully functional copy of the environment always runs in the Recovery Region. During a disaster, you only need to scale it up.

```
[Primary Region]                      [Recovery Region]
App servers × N (full)                App servers × 1 (minimum) ← always running
RDS primary ──── replication ───────→ RDS replica (always running)
ELB / ASG (full)                      ELB / ASG (minimum)
      │                                     │
      │            disaster occurs          │
      └────────────────────────────────────→
                                      Only scale out (no deployment needed)
                                      Shift traffic with DNS / Global Accelerator
```

Characteristics:

- A minimal production environment is always running, so it can serve some traffic immediately
- Only scaling up is needed during a disaster → lower RTO than Pilot Light
- Difference from Pilot Light: the code and infrastructure are already running
- A warm standby that runs at full scale is also called a "hot standby"

Pilot Light vs. Warm Standby at a glance:

| Item | Pilot Light | Warm Standby |
|---|---|---|
| Compute state | Not deployed (off) | Running at minimum scale |
| Action during a disaster | Deploy + scale out | Scale out only |
| RTO | Higher | Lower |
| Cost | Lower | Higher |
| Serves traffic immediately | ❌ | ✅ (at reduced capacity) |
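The difference in the table comes down to one extra step in the failover runbook. The sketch below encodes an illustrative, simplified runbook for both strategies. The step wording and the `Strategy`/`recovery_runbook` names are hypothetical, and a real runbook would also cover checks, approvals, and failback.

```python
from enum import Enum
from typing import List


class Strategy(Enum):
    PILOT_LIGHT = "pilot_light"
    WARM_STANDBY = "warm_standby"


def recovery_runbook(strategy: Strategy) -> List[str]:
    """Ordered, illustrative failover steps for the recovery Region."""
    steps = ["Promote the cross-Region database replica to a writable primary"]
    if strategy is Strategy.PILOT_LIGHT:
        # Pilot Light: no application servers exist yet, so they must be "turned on" first.
        steps.append("Launch application servers from AMIs (e.g. raise ASG capacity above 0)")
    # Both strategies then grow compute to full production capacity.
    steps.append("Scale out the Auto Scaling group to production capacity")
    steps.append("Shift traffic with Route 53 or Global Accelerator")
    return steps


for strategy in Strategy:
    print(strategy.value)
    for number, step in enumerate(recovery_runbook(strategy), start=1):
        print(f"  {number}. {step}")
```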

AWS services:

- Aurora Global Database / RDS cross-Region read replicas
- EC2 Auto Scaling (scale up)
- Route 53 / Global Accelerator (failover routing)

### Strategy 4: Multi-Site Active/Active

Concept: two or more AWS Regions serve traffic at the same time. If one Region fails, traffic is simply rerouted.

```
[Region A: Active]                    [Region B: Active]
App servers (full)                    App servers (full)
Database ◄──── cross-Region replication ────► Database
    ▲                                     ▲
    └───── Route 53 / Global Accelerator ─┘
            (both Regions receive traffic)
```

When a disaster occurs, recovery consists only of removing traffic from the failed Region.

Database options differ: DynamoDB global tables accept writes in every Region. Aurora Global Database has a single writer Region, and secondary Regions can forward writes to it.

Characteristics:

- RPO: near-zero / RTO: near-zero or zero
- Asynchronous data replication achieves near-zero RPO
- Highest cost and complexity
- Write conflicts must be managed (both Regions can accept writes)
- Replication protects against only some disasters. Data corruption or accidental deletion is replicated too, so point-in-time recovery (PITR) backups are still needed

Traffic routing:

- Route 53: location-based routing policies (geoproximity, latency, etc.) and weighted (percentage-based) routing
- Global Accelerator: uses the AWS edge network, giving low latency and no DNS caching issues; traffic dials set the share of traffic per Region

## 5. Active/Passive vs. Active/Active

| Item | Active/Passive | Active/Active |
|---|---|---|
| Strategies | Backup & Restore, Pilot Light, Warm Standby | Multi-Site Active/Active |
| Behavior | The primary serves traffic; the DR site is on standby | All Regions serve traffic |
| Failover method | Shift traffic with DNS / Global Accelerator | Just drain traffic from the failed Region |
| Data conflicts | None | Write conflicts must be managed |

## 6. Failover Traffic Routing Services

| Service | Characteristics | Active/Active support | DNS caching issue |
|---|---|---|---|
| Amazon Route 53 | DNS-based; supports many routing policies | ✅ | Yes |
| AWS Global Accelerator | Static anycast IPs; traffic enters the AWS edge network directly | ✅ | ❌ None |
| Amazon CloudFront | Origin failover per request (each new request still tries the primary origin first) | Limited | - |

> **TIP:** Global Accelerator has no DNS caching issues and offers low latency, so it suits Active/Active, Pilot Light, and Warm Standby alike.
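The DNS caching issue can be expressed in rough numbers. With DNS failover, clients reach the standby only after two things happen. First, Route 53 must mark the primary unhealthy; this takes about `request_interval × failure_threshold` seconds, with a 10 s or 30 s interval and a threshold of 1 to 10. Second, cached answers must expire (the record TTL). The sketch below adds the two delays. It is an estimate, not a guarantee, because some resolvers and clients hold answers longer than the TTL. Global Accelerator avoids the TTL delay because clients keep using the same static anycast IP addresses.

```python
def worst_case_dns_failover_s(request_interval_s: int, failure_threshold: int, ttl_s: int) -> int:
    """Rough time until most clients resolve to the standby after the primary fails.

    detection:   consecutive failed health checks before Route 53 marks the endpoint unhealthy
    propagation: resolvers may keep serving the old answer until the record TTL expires
    """
    if request_interval_s not in (10, 30):
        raise ValueError("Route 53 health checks use a 10 s or 30 s request interval")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure threshold must be between 1 and 10")
    detection = request_interval_s * failure_threshold
    return detection + ttl_s


print(worst_case_dns_failover_s(30, 3, 300))  # 390
print(worst_case_dns_failover_s(10, 3, 60))   # 90
```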


## 7. Key AWS DR Services

| Service | Role | Main strategies |
|---|---|---|
| AWS Backup | Centralized backup for EC2, EBS, RDS, DynamoDB, and more | Backup & Restore |
| AWS Elastic Disaster Recovery (DRS) | Continuous block-level replication; failover and failback | Pilot Light |
| Aurora Global Database | Cross-Region replication typically under 1 second; up to 5 secondary Regions | Pilot Light, Warm Standby |
| RDS Read Replica (cross-Region) | Asynchronous replication | Pilot Light, Warm Standby |
| S3 Cross-Region Replication | Object replication | All strategies |
| CloudFormation / CDK | Rapid deployment of DR Region infrastructure as code | Backup & Restore |
| AWS Resilience Hub | Continuously validates and tracks whether RTO/RPO targets are met | All strategies |
| Route 53 / Global Accelerator | Failover traffic routing | All strategies |
| Amazon EventBridge + Lambda | Automated disaster detection; shorter RTO | All strategies |
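As a concrete example of S3 Cross-Region Replication, the dictionary below mirrors the `ReplicationConfiguration` input of the S3 `PutBucketReplication` API. It contains a single rule that replicates every object to a bucket in the recovery Region. The role ARN, bucket names, and storage class are placeholders. Versioning must be enabled on both the source and destination buckets.

```python
import json

# Hypothetical IAM role and destination bucket; replace with your own.
replication_configuration = {
    "Role": "arn:aws:iam::111122223333:role/s3-crr-role",
    "Rules": [
        {
            "ID": "replicate-all-to-dr",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix: the rule applies to every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::my-app-data-dr",  # bucket in the recovery Region
                "StorageClass": "STANDARD_IA",
            },
        }
    ],
}

print(json.dumps(replication_configuration, indent=2))
```

You would apply it with `boto3.client("s3").put_bucket_replication(Bucket="my-app-data", ReplicationConfiguration=replication_configuration)`. Replication covers objects written after the rule is in place; existing objects need S3 Batch Replication.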

## 8. DR Testing (Testing Disaster Recovery)

> **WARNING:** ⚠️ A DR strategy that is not tested regularly cannot be relied on to work in a real disaster.

### Core Principles

- Validate the DR implementation and regularly test failover to the DR Region to confirm that RTO and RPO targets are met
- Avoid the "rarely exercised recovery path" anti-pattern: a path that is seldom tested is likely to fail during a real outage

Why testing matters (example scenario):

```
Assumption: the secondary DB serves read-only queries, and we expect
            to send writes to it if the primary fails.
Reality:    if failover has not been tested for a long time,
            → the secondary may not have enough capacity, or
            → service quotas may not be sufficient.
```

### DR Testing Methods

| Method | Description |
|---|---|
| Backup test | Perform restores regularly (simply creating backups is not enough) |
| Failover drill | Actually fail over to the DR Region |
| Chaos engineering | Inject failures deliberately to verify recovery capability |
| DR drill | Run in an isolated subnet so production is not affected |
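A drill is only useful if its outcome is measured against the target. The sketch below times a failover drill. It calls a failover trigger, polls a health check until the check passes or the RTO budget runs out, and reports whether the measured recovery time met the target. `fail_over` and `health_check` are hypothetical callables you would supply (for example, wrappers around your own failover automation and an endpoint probe). The example uses stubs so it runs offline.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    measured_rto_s: float
    target_rto_s: float
    healthy: bool

    @property
    def passed(self) -> bool:
        # Pass only if the service came back AND did so within the RTO target.
        return self.healthy and self.measured_rto_s <= self.target_rto_s


def run_failover_drill(
    fail_over: Callable[[], None],
    health_check: Callable[[], bool],
    target_rto_s: float,
    poll_interval_s: float = 1.0,
) -> DrillResult:
    """Trigger a failover, then poll until healthy or the RTO budget is spent."""
    start = time.monotonic()
    fail_over()
    healthy = health_check()
    while not healthy and time.monotonic() - start < target_rto_s:
        time.sleep(poll_interval_s)
        healthy = health_check()
    return DrillResult(time.monotonic() - start, target_rto_s, healthy)


# Offline stubs: the "service" reports healthy from the third probe onwards.
probes = {"count": 0}


def stub_fail_over() -> None:
    pass  # replace with your real failover trigger


def stub_health_check() -> bool:
    probes["count"] += 1
    return probes["count"] >= 3


result = run_failover_drill(stub_fail_over, stub_health_check, target_rto_s=5, poll_interval_s=0.01)
print(result.passed)  # True
```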
> **NOTE:** 💡 AWS Resilience Hub is a service that continuously validates whether RTO/RPO targets are being met.


## 9. Strategy Selection Guide

```
Data-center-level failure + minimal cost
  ↓
Backup & Restore

Data-center-level failure + RPO/RTO of tens of minutes + lower cost
  ↓
Pilot Light (or AWS Elastic Disaster Recovery)

Region-level failure + RPO/RTO in minutes + business-critical
  ↓
Warm Standby

Region-level failure + near-zero RPO/RTO + mission-critical
  ↓
Multi-Site Active/Active
```

Regulatory requirements: if data residency rules restrict you to a single Region, you can use a different Availability Zone, rather than a different Region, as the DR site.
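The selection guide in this section can be read as a simple rule: pick the cheapest strategy that still meets both targets. The sketch below encodes that rule. The RPO/RTO figures in `PROFILES` are illustrative assumptions that only echo the orders of magnitude in the section 4 comparison table ("hours", "tens of minutes", "minutes", "near-zero"). Replace them with values measured in your own DR drills.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class StrategyProfile:
    name: str
    rpo: timedelta       # illustrative achievable RPO
    rto: timedelta       # illustrative achievable RTO
    relative_cost: int   # 1 = cheapest


# ILLUSTRATIVE figures only; not AWS-published numbers.
PROFILES: List[StrategyProfile] = [
    StrategyProfile("Backup & Restore", timedelta(hours=24), timedelta(hours=24), 1),
    StrategyProfile("Pilot Light", timedelta(minutes=10), timedelta(minutes=45), 2),
    StrategyProfile("Warm Standby", timedelta(seconds=30), timedelta(minutes=10), 3),
    StrategyProfile("Multi-Site Active/Active", timedelta(seconds=1), timedelta(seconds=30), 4),
]


def cheapest_strategy(target_rpo: timedelta, target_rto: timedelta) -> Optional[StrategyProfile]:
    """Cheapest strategy whose RPO and RTO both fit the targets (None if none do)."""
    fitting = [p for p in PROFILES if p.rpo <= target_rpo and p.rto <= target_rto]
    return min(fitting, key=lambda p: p.relative_cost, default=None)


print(cheapest_strategy(timedelta(hours=24), timedelta(hours=24)).name)     # Backup & Restore
print(cheapest_strategy(timedelta(minutes=15), timedelta(hours=1)).name)    # Pilot Light
print(cheapest_strategy(timedelta(minutes=5), timedelta(minutes=15)).name)  # Warm Standby
print(cheapest_strategy(timedelta(seconds=5), timedelta(minutes=1)).name)   # Multi-Site Active/Active
```

Choosing the cheapest strategy that fits, rather than the strictest one available, follows the cost guidance in the next section.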


## 10. Cost Optimization Perspective

- AWS turns the fixed capital expense (CapEx) of a physical backup data center into **operating expense (OpEx) based on actual usage**
- On-premises DR means always paying to run a full second data center; on AWS, Pilot Light or Warm Standby keeps only minimal resources running
- S3 Glacier / Glacier Deep Archive: sharply lower backup storage costs (for archival data)
- Do not choose a stricter strategy than the business needs; it only adds unnecessary cost

## 11. 📌 Frequently Tested Exam Points: Summary

| Point | Key takeaway |
|---|---|
| RTO definition | Maximum acceptable time from service interruption to restoration |
| RPO definition | Maximum acceptable time since the last data recovery point (how much data loss is acceptable) |
| Lower RTO/RPO | Higher cost and complexity |
| Data plane vs. control plane | Use only data plane operations for failover |
| Backup & Restore RTO/RPO | Highest (hours) / lowest cost |
| Pilot Light traits | Data replicated live; no compute deployed |
| Pilot Light vs. Warm Standby | Pilot Light: "turn on" (deploy) during a disaster / Warm Standby: only "scale up" |
| Warm Standby traits | Minimal environment always running; can serve some traffic immediately |
| Multi-Site Active/Active | Near-zero RPO/RTO, all Regions active, write conflicts must be managed |
| Active/Passive strategies | Backup & Restore, Pilot Light, Warm Standby |
| Active/Active strategy | Multi-Site Active/Active |
| AWS Elastic DRS strategy | Based on Pilot Light; continuous block-level replication |
| AWS Elastic DRS sources | Apps/DBs running on-premises or on EC2 (not RDS) |
| Global Accelerator advantages | No DNS caching issues; uses the AWS edge network |
| Why DR testing is essential | Rarely exercised recovery paths can fail during a real outage |
| Automated restore from backup | Call the AWS Backup API with the AWS SDK (AWS Backup does not restore automatically by default) |
| Continuous tracking of DR targets | AWS Resilience Hub |
| DR in a single-Region regulated environment | Use another AZ as the recovery site |