📊 AWS Database & Data Analytics#

RDS · Aurora · ElastiCache · DynamoDB · DocumentDB · Neptune · Keyspaces · Timestream

Athena · Redshift · OpenSearch · EMR · QuickSight · Glue · Lake Formation · Flink · MSK


Table of Contents#

  1. DB Types: Overall Comparison
  2. RDS (Summary)
  3. Aurora (Summary)
  4. ElastiCache (Summary)
  5. DocumentDB
  6. Amazon Neptune
  7. Amazon Keyspaces (for Apache Cassandra)
  8. Amazon Timestream
  9. Amazon Athena
  10. Amazon Redshift
  11. Amazon OpenSearch Service
  12. Amazon EMR (Elastic MapReduce)
  13. Amazon QuickSight
  14. AWS Glue
  15. AWS Lake Formation
  16. Amazon Managed Service for Apache Flink
  17. Amazon MSK (Managed Streaming for Apache Kafka)
  18. Big Data Ingestion Pipeline Architecture
  19. 📌 Frequently Tested Exam Points

DB Types: Overall Comparison#

Type | Services | Characteristics
RDBMS (OLTP) | RDS, Aurora | SQL, JOINs supported
NoSQL | DynamoDB, ElastiCache, Neptune, DocumentDB, Keyspaces | No JOINs, no SQL
Object Store | S3, S3 Glacier | Large objects, archival
Data Warehouse (OLAP) | Redshift, Athena, EMR | SQL analytics, BI
Search | OpenSearch | Full-text search, unstructured data
Graph | Neptune | Visualizing relationship data
Ledger | QLDB (Quantum Ledger DB) | Immutable transaction history
Time Series | Timestream | Time series data

RDS (Summary)#

  • Managed: PostgreSQL / MySQL / Oracle / SQL Server / DB2 / MariaDB / Custom
  • Provisioned Instance Size + EBS Volume
  • Read Replicas, Multi-AZ, Storage Auto Scaling
  • IAM, Security Groups, KMS (at rest), SSL (in transit)
  • PITR up to 35 days; manual snapshots are kept indefinitely
  • RDS Custom: Oracle and SQL Server only; gives direct access to the OS and the DB
  • Use Case: RDBMS/OLTP, SQL queries, transactions

Aurora (Summary)#

  • Compatible with the PostgreSQL/MySQL APIs; storage and compute are separated
  • Storage: 3 AZs × 2 = 6 copies, self-healing, auto scaling
  • Compute: Multi-AZ DB Cluster, Read Replica Auto Scaling
  • Aurora Serverless: unpredictable or intermittent workloads
  • Aurora Global: up to 16 read instances per region, cross-region replication in < 1 second
  • Aurora ML: SageMaker/Comprehend integration
  • Aurora Cloning: quickly create a new cluster from an existing one (e.g. a staging DB)
  • Use Case: same as RDS, plus higher performance, availability, and flexibility

ElastiCache (Summary)#

  • Managed Redis / Memcached, sub-millisecond latency
  • Redis: Multi-AZ, Read Replicas, backups, AOF persistence
  • Memcached: sharding, non-persistent, multithreaded
  • Security: IAM, Security Groups, KMS, Redis AUTH
  • Requires changes to application code
  • Use Case: key/value cache, session store, caching DB query results
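The "requires changes to application code" point is easiest to see with the cache-aside read path. Below is a minimal sketch, assuming a tiny in-process `TTLCache` class as a stand-in for a real Redis/Memcached client; `TTLCache`, `get_user`, and `query_db` are hypothetical names for illustration, not part of any AWS SDK.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Redis/Memcached client (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # An expired entry behaves like a cache miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)


def get_user(
    user_id: str,
    cache: TTLCache,
    query_db: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Cache-aside read: check the cache first, fall back to the DB on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                    # cache hit: no database round trip
    value = query_db(user_id)            # cache miss: query the database
    cache.set(key, value, ttl_seconds)   # populate the cache for later reads
    return value
```

The application, not the database, decides when to read from and write to the cache, which is why adopting ElastiCache is not transparent to existing code.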

DocumentDB#

  • Just as Aurora is AWS's implementation of PostgreSQL/MySQL, DocumentDB is AWS's MongoDB-compatible implementation
  • Store, query, and index JSON data
  • Deployment concepts similar to Aurora: fully managed, highly available across 3 AZs
  • Storage grows automatically (in 10 GB increments)
  • Scales automatically to millions of requests per second
  • Use Case: migrating MongoDB workloads to AWS

Amazon Neptune#

  • Fully managed graph database
  • Stores billions of relationships and queries them with millisecond latency
  • Highly available across 3 AZs, up to 15 Read Replicas
  • Use Cases: social networks, knowledge graphs (Wikipedia), fraud detection, recommendation engines

Neptune Streams#

  • Real-time, ordered stream of every change to the graph data
  • Accessed through an HTTP REST API
  • Use Cases: change notifications, keeping OpenSearch/ElastiCache in sync, cross-region replication

Amazon Keyspaces (for Apache Cassandra)#

  • Fully managed, Apache Cassandra-compatible database
  • Serverless, scalable, Multi-AZ (replicated across 3 AZs)
  • Uses the Cassandra Query Language (CQL)
  • Single-digit millisecond latency, thousands of requests per second
  • On-Demand or Provisioned with Auto-scaling
  • PITR up to 35 days
  • Use Cases: IoT device data, time series data

Amazon Timestream#

  • Fully managed time series database, serverless
  • Handles trillions of events per day; AWS cites up to 1,000× faster queries at as little as 1/10 the cost of relational databases
  • Storage tiering: recent data → memory / historical data → low-cost storage
  • Built-in time series analytics functions (identify patterns in near real time)
  • SQL compatible; encryption in transit and at rest
  • Use Cases: IoT, operational monitoring, real-time analytics

Amazon Athena#

  • Analyze data stored in S3 with serverless SQL
  • Built on Presto, uses standard SQL
  • Supported formats: CSV, JSON, ORC, Avro, Parquet
  • Pricing: $5.00 per TB of data scanned
  • Integrates with Amazon QuickSight to build BI dashboards

Performance optimization (cost savings):

Method | Effect
Columnar formats (Parquet, ORC) | Less data scanned → large cost savings
Convert with Glue | Automated CSV → Parquet/ORC conversion
Data compression | bzip2, gzip, snappy, etc.
S3 partitioning | Partition on virtual columns to narrow the scan range
Use large files | > 128 MB recommended (avoids small-file overhead)
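Because Athena bills by data scanned, the optimizations in the table above translate directly into cost. The sketch below uses the $5.00/TB price from these notes; the reduction factors (fraction of columns read, fraction of partitions touched, compression ratio) are made-up illustrative inputs, and the calculation ignores any per-query minimums or rounding that actual billing may apply.

```python
PRICE_PER_TB_USD = 5.00     # price quoted in these notes
BYTES_PER_TB = 1024 ** 4    # assumption: 1 TB treated as 2**40 bytes


def athena_query_cost(scanned_bytes: int) -> float:
    """Estimated cost in USD of a query that scans `scanned_bytes`."""
    return scanned_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


def scanned_after_optimizations(
    raw_bytes: int,
    column_fraction: float,     # share of columns the query reads (columnar format)
    partition_fraction: float,  # share of partitions the query touches
    compression_ratio: float,   # raw size divided by compressed size
) -> int:
    """Rough estimate of bytes scanned once the data is columnar, partitioned, compressed."""
    return int(raw_bytes * column_fraction * partition_fraction / compression_ratio)


if __name__ == "__main__":
    raw = 2 * BYTES_PER_TB  # hypothetical 2 TB of raw CSV
    print(f"full scan:  ${athena_query_cost(raw):.2f}")
    reduced = scanned_after_optimizations(raw, 0.2, 0.1, 4.0)
    print(f"optimized:  ${athena_query_cost(reduced):.2f}")
```

With these hypothetical factors, a full scan of 2 TB costs $10.00, while the optimized layout scans roughly 0.01 TB, or about $0.05.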

Federated Query:

  • Run SQL queries across a variety of sources: relational, non-relational, S3, and more
  • Uses Lambda-based Data Source Connectors (CloudWatch Logs, DynamoDB, RDS, etc.)

📌 Exam Tip: “Analyze S3 data with serverless SQL” → Athena


Amazon Redshift#

  • Based on PostgreSQL, but built for OLAP (data warehousing), not OLTP
  • Columnar storage + parallel query engine → petabyte-scale analytics
  • Faster joins and aggregations than Athena, thanks to indexes
  • Integrates with BI tools (QuickSight, Tableau)

Cluster Architecture#

[Leader Node] : query planning + result aggregation
[Compute Nodes] : execute the queries (N nodes)

Two modes: Provisioned Cluster / Serverless Cluster
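The leader/compute split above is a scatter-gather pattern: each compute node aggregates its own slice of the data and the leader merges the partial results. The following is a conceptual sketch only, not how Redshift is implemented; the `(region, amount)` row shape and the round-robin distribution are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (region, sale_amount): hypothetical schema


def distribute(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    """Spread rows round-robin across compute nodes."""
    slices: List[List[Row]] = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        slices[i % num_nodes].append(row)
    return slices


def compute_node_partial(rows: List[Row]) -> Dict[str, Tuple[float, int]]:
    """Each compute node returns (sum, count) per region for its own slice."""
    partial: Dict[str, Tuple[float, int]] = {}
    for region, amount in rows:
        total, count = partial.get(region, (0.0, 0))
        partial[region] = (total + amount, count + 1)
    return partial


def leader_merge(partials: List[Dict[str, Tuple[float, int]]]) -> Dict[str, float]:
    """The leader combines partial sums and counts into the final AVG per region."""
    merged: Dict[str, Tuple[float, int]] = {}
    for partial in partials:
        for region, (total, count) in partial.items():
            t, c = merged.get(region, (0.0, 0))
            merged[region] = (t + total, c + count)
    return {region: t / c for region, (t, c) in merged.items()}


def average_by_region(rows: List[Row], num_nodes: int) -> Dict[str, float]:
    partials = [compute_node_partial(s) for s in distribute(rows, num_nodes)]
    return leader_merge(partials)
```

Shipping `(sum, count)` pairs instead of raw rows is what lets the leader compute a correct average without seeing the underlying data.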

Snapshots & DR#

  • Multi-AZ mode is supported for some clusters
  • Snapshot = point-in-time backup of the cluster (stored internally in S3, incremental)
  • Automated backups: every 8 hours, every 5 GB, or on a schedule
  • Manual backups: retained until explicitly deleted
  • Snapshots can be configured to copy automatically to another region

Data Loading Methods#

Method | Description
Kinesis Data Firehose | Firehose → S3 → loaded into Redshift with COPY
S3 COPY command | COPY directly from S3 using an IAM Role (traffic can be routed through the VPC)
EC2 + JDBC | Send data in batches (large batched inserts are preferable)

Redshift Spectrum#

  • Query data in S3 directly, without loading it into Redshift
  • Requires an existing Redshift cluster (the query is processed by thousands of Spectrum nodes)

Amazon OpenSearch Service#

  • Successor to Amazon Elasticsearch Service
  • Full-text search on any field (DynamoDB can query only by primary key or index)
  • Commonly used as a complement to another database
  • Managed Cluster or Serverless mode
  • No native SQL support (can be enabled with a plugin)
  • Ingestion: Kinesis Data Firehose, IoT, CloudWatch Logs
  • Security: Cognito, IAM, KMS, TLS
  • Includes OpenSearch Dashboards (visualization)

OpenSearch Integration Patterns#

DynamoDB + OpenSearch:

[App CRUD] → [DynamoDB] → [DynamoDB Stream] → [Lambda] → [OpenSearch]
[App search] → OpenSearch API → search results → fetch item details from DynamoDB

CloudWatch Logs:

CloudWatch Logs → Subscription Filter → Lambda → OpenSearch (Real-time)
CloudWatch Logs → Subscription Filter → Firehose → OpenSearch (Near real-time)
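For the Lambda path, a subscription filter delivers log events as a base64-encoded, gzip-compressed JSON payload under `event["awslogs"]["data"]`. A minimal sketch of the decoding step follows; the function name and the output document shape are illustrative choices, and the actual write to OpenSearch is left out.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_subscription_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Unpack a CloudWatch Logs subscription payload into one document per log event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp": log_event["timestamp"],  # epoch milliseconds
            "message": log_event["message"],
        }
        for log_event in payload.get("logEvents", [])
    ]
```

Each returned dict could then be indexed as a separate search document, which is what makes individual log lines searchable.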

Kinesis:

KDS → Firehose → OpenSearch (Near real-time)
KDS → Lambda → OpenSearch (Real-time)

Amazon EMR (Elastic MapReduce)#

  • Big data analytics platform built on Hadoop clusters
  • A cluster can consist of hundreds of EC2 instances
  • Comes bundled with Apache Spark, HBase, Presto, and Flink
  • Automated provisioning and configuration, Auto Scaling, Spot Instance integration

Node Types and Purchasing Options#

Node type | Role | Characteristics
Master Node | Manages the cluster, coordinates state | Long-running
Core Node | Runs tasks + stores data | Long-running
Task Node | Runs tasks only | Usually Spot Instances

Purchasing option | Characteristics
On-Demand | Reliable, never terminated
Reserved | Cost savings (EMR uses them automatically when available)
Spot | Cheaper, but can be terminated; less reliable

Amazon QuickSight#

  • Serverless, ML-powered BI (Business Intelligence) service
  • Interactive dashboards, automatic scaling, per-session pricing
  • SPICE engine: in-memory computation when data is imported into QuickSight
  • Enterprise edition: Column-Level Security (CLS)

Integrated data sources:

  • RDS, Aurora, Redshift, Athena, S3, OpenSearch, Timestream
  • Salesforce, Jira, Teradata, on-premises databases (JDBC)
  • Files: xlsx, csv, json, tsv, elf/clf

Dashboard sharing:

  • Users (Standard) / Groups (Enterprise): these exist only inside QuickSight and are separate from IAM
  • A dashboard must be published before it can be shared
  • Users who can see the dashboard can also see the underlying data

AWS Glue#

  • Fully managed ETL (Extract, Transform, Load) service, serverless
  • S3 or RDS data → Glue ETL (transform) → load into Redshift

Glue Data Catalog#

  • S3, RDS, DynamoDB, JDBC → Glue Data Crawler → Glue Data Catalog (metadata)
  • Athena, Redshift Spectrum, and EMR discover data through the Catalog

Key Glue Features#

Feature | Description
Job Bookmarks | Prevent re-processing of data that has already been processed
DataBrew | Clean data with pre-built, no-code transformations
Glue Studio | GUI to create, run, and monitor ETL jobs
Streaming ETL | Streaming ETL from Kinesis Data Streams, Kafka, and MSK (Spark Structured Streaming)
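The idea behind Job Bookmarks can be sketched as "remember what was processed, then skip it on the next run". This is a conceptual illustration only and not Glue's actual bookmark mechanism; `run_incremental_job` and the set-based bookmark are hypothetical.

```python
from typing import Iterable, List, Set


def run_incremental_job(available_keys: Iterable[str], bookmark: Set[str]) -> List[str]:
    """Process only the objects not yet recorded in the bookmark, then record them."""
    new_keys = sorted(key for key in available_keys if key not in bookmark)
    for key in new_keys:
        pass  # the actual transform of each object would happen here
    bookmark.update(new_keys)  # persist progress so the next run skips these objects
    return new_keys
```

Running the job twice over the same input processes everything the first time and nothing the second, which is the behaviour the feature row describes.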

CSV → Parquet Conversion Pattern#

[S3 Input] ─(Put)─→ Glue ETL (CSV → Parquet conversion) → [S3 Output]
↑ can be triggered by Lambda/EventBridge
→ Athena can then query the Parquet files at a much lower cost
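The Lambda trigger in this pattern typically reads the S3 Put notification and derives the input and output paths for the ETL job. The sketch below builds only the job arguments; the argument names `--input_path` and `--output_path`, the `parquet/` prefix, and the function name are hypothetical, and starting the job (for example with the boto3 Glue client's `start_job_run(JobName=..., Arguments=...)`) is deliberately left out.

```python
import posixpath
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def glue_job_arguments(
    s3_event: Dict[str, Any],
    output_bucket: str,
    output_prefix: str = "parquet/",   # hypothetical output layout
) -> List[Dict[str, str]]:
    """Build one set of Glue job arguments per uploaded CSV object."""
    runs: List[Dict[str, str]] = []
    for record in s3_event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications ("+" means space).
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".csv"):
            continue  # ignore anything that is not a CSV upload
        stem = posixpath.splitext(posixpath.basename(key))[0]
        runs.append({
            "--input_path": f"s3://{bucket}/{key}",
            "--output_path": f"s3://{output_bucket}/{output_prefix}{stem}.parquet",
        })
    return runs
```

Keeping the output in a separate bucket or prefix matters: writing Parquet back into the watched input location could re-trigger the same pipeline.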

AWS Lake Formation#

  • Data Lake = central data repository for analytics
  • Automates the complex manual steps (collecting, cleansing, moving, cataloging)
  • Combines structured and unstructured data
  • Source Blueprints: S3, RDS, relational/NoSQL databases
  • Row-level / Column-level Fine-grained Access Control
  • Built on top of AWS Glue

Centralized Permissions#

[Athena] ─→ [QuickSight]
[Lake Formation] ← column-level access control is configured here
→ Athena and QuickSight follow the permissions defined in Lake Formation
→ No separate access control per service → centralized security
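Column-level access control amounts to filtering every result through one central grant table, whichever service issues the query. The following is a conceptual sketch of that idea, not the Lake Formation API; the principals, column names, and the `authorized_view` function are made up for illustration.

```python
from typing import Any, Dict, List, Set


def authorized_view(
    rows: List[Dict[str, Any]],
    principal: str,
    column_grants: Dict[str, Set[str]],
) -> List[Dict[str, Any]]:
    """Return only the columns the principal has been granted; unknown principals see none."""
    allowed = column_grants.get(principal, set())
    return [{col: val for col, val in row.items() if col in allowed} for row in rows]


if __name__ == "__main__":
    grants = {"analyst": {"id", "country"}, "admin": {"id", "email", "country"}}
    data = [{"id": 1, "email": "a@example.com", "country": "KR"}]
    print(authorized_view(data, "analyst", grants))  # the email column is filtered out
```

Because the grants live in one place, changing them once changes what every consuming service can see.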

Amazon Managed Service for Apache Flink#

  • Formerly: Kinesis Data Analytics for Apache Flink
  • Process data streams with Java, Scala, or SQL
  • Sources: Kinesis Data Streams or Amazon MSK (Kafka)
  • Fully managed: provisioning, parallel processing, automatic scaling, backups (checkpoints/snapshots)
  • ⚠️ Cannot read directly from Amazon Data Firehose (read from Kinesis Data Streams or MSK instead)
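A typical stream-processing job groups events into fixed time windows and aggregates per key. The sketch below shows a tumbling-window count in plain Python to illustrate the kind of computation involved; it does not use the Flink API, and the `(timestamp_seconds, key)` event shape is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (event time in seconds, key): hypothetical shape


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping windows of `window_seconds`."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

A managed Flink application adds what this sketch lacks: unbounded input, parallel execution, and checkpointed state so windows survive failures.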

Amazon MSK (Managed Streaming for Apache Kafka)#

  • Alternative to Kinesis: fully managed Apache Kafka on AWS
  • Kafka broker nodes and Zookeeper nodes are managed automatically
  • Deployed inside a VPC, Multi-AZ (up to 3 AZs)
  • Data is stored on EBS for as long as you want
  • MSK Serverless: run Kafka without managing capacity

Kinesis Data Streams vs. Amazon MSK#

Item | Kinesis Data Streams | Amazon MSK
Message size | 1 MB limit | 1 MB by default, configurable up to 10 MB
Structure | Shards | Kafka Topics with Partitions
Scaling | Shards can be split and merged | Partitions can only be added
Encryption (in-flight) | TLS | PLAINTEXT or TLS
Encryption (at-rest) | KMS | KMS

MSK Consumers: Apache Flink, Glue (Streaming ETL), Lambda, apps on EC2/ECS/EKS
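The "Structure" and "Message size" rows of the comparison can be made concrete with a small sketch. Kinesis Data Streams hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it; the version below assumes the hash space is divided evenly between shards, which holds only until shards are split or merged unevenly. Function names are hypothetical, and the size check uses the 1 MB figure from the table.

```python
import hashlib

HASH_SPACE = 2 ** 128       # MD5 produces a 128-bit value
ONE_MB = 1024 * 1024        # limit taken from the comparison table


def kinesis_shard_for(partition_key: str, num_shards: int) -> int:
    """Pick the shard index for a key, assuming evenly sized hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * num_shards // HASH_SPACE


def fits_limit(payload: bytes, limit_bytes: int = ONE_MB) -> bool:
    """Check a record payload against a per-message size limit."""
    return len(payload) <= limit_bytes
```

Records that share a partition key always land on the same shard, which is what preserves per-key ordering; Kafka partitions give the same guarantee per partition.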


Big Data Ingestion Pipeline Architecture#

[IoT Devices]
│
▼
[Kinesis Data Streams] ← real-time ingestion
│
▼
[Kinesis Data Firehose] ← near-real-time delivery
│
▼
[S3 (Ingestion Bucket)] ← raw data storage
│     │
│     ▼ (optional)
│     [SQS → Lambda] ← event-driven processing
│
▼
[Amazon Athena] ← serverless SQL analytics
│
▼
[S3 (Reporting Bucket)] ← analysis results
│
├──→ [QuickSight] ← BI dashboards
└──→ [Redshift] ← data warehouse

📌 Frequently Tested Exam Points#

Topic | Key point
DocumentDB | MongoDB-compatible AWS implementation
Neptune Use Case | Social graphs, fraud detection, recommendation engines
Keyspaces | Apache Cassandra compatible
Timestream | Time series DB, IoT/operational monitoring
Athena pricing | $5 per TB scanned
Athena cost savings | Parquet/ORC format + partitioning
Athena Federated Query | Uses Lambda Data Source Connectors
Athena vs Redshift | Simple S3 queries → Athena / complex joins and aggregations → Redshift
Redshift Spectrum | Query S3 data without loading it into Redshift
Redshift cross-region snapshots | Automatic copy can be configured
OpenSearch | Search on any field (unlike DynamoDB)
Glue Catalog | Metadata store used by Athena, Redshift Spectrum, and EMR
Lake Formation | Fine-grained access control at row and column level
Flink sources | Kinesis Data Streams or MSK (not Firehose)
MSK vs Kinesis | MSK: message size up to 10 MB; partitions can only be added
QuickSight SPICE | In-memory computation engine
QuickSight Users/Groups | Exist only inside QuickSight (separate from IAM)
EMR Task Node | Mostly run on Spot Instances