Aurora and RDS (Relational Database Service)

1. Why This Service Exists (The Real Problem)

The Problem: Running a database on a raw server (EC2 or On-Prem) is painful.

  • Patching: OS updates, DB engine updates.
  • Backups: Writing cron jobs to pg_dump, managing disk space, testing restores.
  • High Availability: Setting up replication (Master-Slave), handling failover IPs, syncing data.
  • Scaling: Vertical scaling means downtime. Storage scaling means downtime.

The Solution: AWS manages the Undifferentiated Heavy Lifting of database administration. You get an endpoint, you bring the schema.

2. Mental Model (Antigravity View)

The Analogy: A Car Rental with a Chauffeur.

  • EC2: Buying a car (You drive, you fix flat tires, you fuel up).
  • RDS: Renting a car (They fix tires, they fuel up, you just drive).
  • Aurora: A self-driving, self-repairing car that grows bigger when you have more passengers.

One-Sentence Definition:

  • RDS: Managed standard database engines (Postgres, MySQL, MariaDB, Oracle, SQL Server).
  • Aurora: Cloud-native re-architecture of Postgres/MySQL for high performance and auto-scaling storage.

3. Core Components (No Marketing)

  1. DB Instance: The compute node (CPU/RAM).
  2. DB Cluster Volume (Aurora only): A virtualized storage layer that spans 3 AZs.
  3. Subnet Group: Defines which subnets the DB can live in (must span at least 2 AZs).
  4. Parameter Group: The managed equivalent of my.cnf or postgresql.conf (engine config settings).
  5. Option Group: Extra features (e.g., specific plugins or extensions).
  6. Read Replicas: Read-only copies of the DB to offload read traffic.

4. How It Works Internally (Simplified)

Standard RDS

  • Storage: Basic EBS Volumes attached to an EC2 instance.
  • Replication: Uses standard engine replication (binlog/WAL).
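  • Failover: DNS flip. The endpoint name (e.g., db.example.com) stays the same; its DNS record is repointed from the failed primary (IP A) to the standby (IP B). Typically takes 60-120s, and clients must reconnect.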
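
Because a failover of this kind can take a minute or two, clients generally need to retry connections instead of failing on the first error. The following is a minimal sketch of that idea, not a definitive implementation. The `connect` callable is a hypothetical stand-in for your driver's connect function, the use of `ConnectionError` is an assumption (real drivers raise their own exception types), and the delays are illustrative, not AWS recommendations.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_wait_s: float = 120.0,   # assumption: cover the 60-120s failover window
    first_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `max_wait_s` of backoff has been spent."""
    waited = 0.0
    delay = first_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up once the total time spent sleeping reaches the budget.
            if waited >= max_wait_s:
                raise
            sleep(delay)
            waited += delay
            # Exponential backoff, capped so we keep probing during the failover.
            delay = min(delay * 2, max_delay_s)
```

Each attempt should open a fresh connection by hostname so that the updated DNS record is picked up. Caching the resolved IP in the application would defeat the failover.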

Aurora (The Special Sauce)

  • Separation of Compute and Storage: The "Database" engine doesn't write to a local disk. It writes to a shared, distributed, log-structured storage volume.
  • 6-Way Replication: Every write is copied 6 times across 3 AZs (2 copies per AZ). You can lose an entire AZ and 1 extra node without data loss.
  • Failover: Fast (typically < 30s) because a Reader already attached to the same storage volume is simply promoted to Writer.
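
The fault-tolerance claim above can be checked with simple quorum arithmetic. This sketch assumes the quorum sizes AWS has published for Aurora storage (4 of 6 copies to acknowledge a write, 3 of 6 to read); those numbers are not stated in the text above, and the function names are illustrative.

```python
COPIES_PER_AZ = 2
AZS = 3
TOTAL_COPIES = COPIES_PER_AZ * AZS  # 6 copies in total

# Assumption: quorum sizes as described in AWS's public Aurora material.
WRITE_QUORUM = 4  # copies that must acknowledge a write
READ_QUORUM = 3   # copies needed to read (and repair) data


def surviving_copies(lost_azs: int = 0, lost_extra_copies: int = 0) -> int:
    """Copies still reachable after losing whole AZs plus individual copies."""
    lost = lost_azs * COPIES_PER_AZ + lost_extra_copies
    return max(TOTAL_COPIES - lost, 0)


def can_write(lost_azs: int = 0, lost_extra_copies: int = 0) -> bool:
    return surviving_copies(lost_azs, lost_extra_copies) >= WRITE_QUORUM


def can_read(lost_azs: int = 0, lost_extra_copies: int = 0) -> bool:
    return surviving_copies(lost_azs, lost_extra_copies) >= READ_QUORUM


if __name__ == "__main__":
    print("lose 1 AZ     -> write:", can_write(1), "read:", can_read(1))
    print("lose 1 AZ + 1 -> write:", can_write(1, 1), "read:", can_read(1, 1))
```

Under these assumptions, losing one AZ leaves 4 copies, so writes continue. Losing one AZ plus one more copy leaves 3, which is enough to read and rebuild the data but not to accept new writes until a copy is repaired.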

5. Common Production Use Cases

  • Transactional Apps (OLTP): User profiles, orders, inventory.
  • Web CMS: WordPress (MySQL/Aurora MySQL).
  • Enterprise Apps: CRM/ERP systems.

6. Architecture Patterns

The "Aurora Serverless" Pattern

Don't guess capacity. Do use Serverless v2.

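Architecture:

  1. Application: Connects to Cluster Endpoint (Writer).
  2. Aurora Serverless: Scales ACUs (Aurora Capacity Units) between a configured minimum and maximum, from 0.5 to 128 (newer engine versions widen this range), in fine-grained steps and typically within seconds, based on CPU/Memory load.
  3. Use Case: Test environments, spiky workloads, infrequent cron jobs.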
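
To get a feel for what a spiky workload costs under this pattern, the sketch below clamps an hourly demand profile to the ACU range quoted above and multiplies by a price. The price per ACU-hour is a placeholder assumption, not a real AWS rate, and the helper names are illustrative.

```python
from typing import Iterable

MIN_ACU = 0.5    # lower bound of the range quoted above
MAX_ACU = 128.0  # upper bound of the range quoted above
PRICE_PER_ACU_HOUR = 0.12  # ASSUMPTION: placeholder, check current regional pricing


def clamp_acus(demand: float, min_acu: float = MIN_ACU, max_acu: float = MAX_ACU) -> float:
    """Capacity the cluster would run at for a given demand, within its limits."""
    return min(max(demand, min_acu), max_acu)


def estimate_cost(hourly_demand: Iterable[float], price: float = PRICE_PER_ACU_HOUR) -> float:
    """Sum ACU-hours over an hourly demand profile and multiply by the price."""
    acu_hours = sum(clamp_acus(d) for d in hourly_demand)
    return acu_hours * price


if __name__ == "__main__":
    # 22 idle hours plus a 2-hour nightly batch job that needs about 16 ACUs.
    profile = [0.0] * 22 + [16.0] * 2
    print(f"estimated daily cost: ${estimate_cost(profile):.2f}")
```

Note that the idle hours still bill at the minimum capacity, which is why the minimum ACU setting matters for rarely used environments.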

The "Read Heavy" Pattern

  1. Writer: One instance handles all INSERT/UPDATE/DELETE.
  2. Readers: 1 to 15 Read Replicas.
  3. Load Balancer: The "Reader Endpoint" automatically distributes new connections (not individual queries) across all readers.
  4. App Logic: Code must split queries. Writes -> Writer Endpoint. Reads -> Reader Endpoint.
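
The query-splitting step can be sketched as a small routing function. This is a deliberately naive illustration: the endpoint hostnames are hypothetical examples, and classifying statements by their first keyword is an assumption that a real application would usually replace with explicit read/write code paths.

```python
# Hypothetical endpoints, shown only to illustrate the two endpoint types.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

# Statements treated as read-only. Anything else goes to the writer.
READ_VERBS = {"SELECT", "SHOW", "EXPLAIN"}


def endpoint_for(sql: str, in_transaction: bool = False) -> str:
    """Pick an endpoint for a statement, defaulting to the writer when unsure."""
    stripped = sql.lstrip()
    verb = stripped.split(None, 1)[0].upper() if stripped else ""
    # Keep everything inside a transaction on one connection to the writer.
    if in_transaction or verb not in READ_VERBS:
        return WRITER_ENDPOINT
    # SELECT ... FOR UPDATE takes row locks, so it cannot run on a replica.
    if "FOR UPDATE" in stripped.upper():
        return WRITER_ENDPOINT
    return READER_ENDPOINT


if __name__ == "__main__":
    for statement in ("SELECT * FROM orders", "UPDATE orders SET status = 'paid'"):
        print(statement, "->", endpoint_for(statement))
```

Because replicas apply changes asynchronously, a read issued right after a write may not see it yet. Reads that must observe the latest write are safer on the Writer Endpoint.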

7. IAM & Security Model

  • Security Groups: Allow access on Port 5432 (Postgres) / 3306 (MySQL) ONLY from the App Security Group.
  • IAM Database Authentication: Instead of hardcoding passwords, use IAM Roles to generate a temporary auth token (expires in 15 mins).
    • Pros: No credential rotation needed.
    • Cons: Slight latency overhead on connection.
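
Since each token is only valid for 15 minutes, an application that opens connections over time needs to refresh it. The sketch below shows one way to cache a token and regenerate it shortly before expiry. The `generate` callable, the class name, and the 60-second safety margin are assumptions for illustration.

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_S = 15 * 60  # tokens expire after 15 minutes (see above)


class AuthTokenCache:
    """Reuse an auth token until it is close to expiry, then regenerate it."""

    def __init__(
        self,
        generate: Callable[[], str],       # hypothetical: returns a fresh token
        refresh_margin_s: float = 60.0,    # assumption: refresh 1 minute early
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = generate
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._issued_at = 0.0

    def get(self) -> str:
        now = self._clock()
        age = now - self._issued_at
        if self._token is None or age >= TOKEN_LIFETIME_S - self._margin:
            self._token = self._generate()
            self._issued_at = now
        return self._token
```

With boto3, one possible `generate` callable would wrap the RDS client's `generate_db_auth_token` method, passing your own hostname, port, and database user. The returned token is then supplied as the password when opening an SSL connection.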

8. Cost Model (Very Important)

  • Instance Hours: Paying for the CPU/RAM.
  • Storage:
    • RDS: Provisioned GBs (GP3).
    • Aurora: Pay per GB-stored and per Million I/O requests. (Cost Trap: High I/O apps on Aurora can cause billing shocks).
  • Data Transfer: Replicating data across AZs is free within the cluster (usually), but OUT to internet is expensive.
  • Backup Storage: Free up to the size of your DB. Backups beyond that cost money.

Optimization:

  • Stop Idle DBs: RDS instances can be stopped for up to 7 days at a time, after which AWS restarts them automatically (mostly useful for dev environments).
  • Reserved Instances: Crucial for production. A 1-year commitment saves roughly 40%.
  • Aurora I/O-Optimized: A newer pricing option for I/O-heavy apps. Higher instance and storage prices, zero per-request I/O cost.
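
Whether the I/O-Optimized option pays off depends on how much of the bill is I/O. The sketch below compares the two pricing shapes for a given month. All prices here are illustrative placeholders, not AWS rates, and the names are hypothetical. Substitute current pricing for your region and instance class before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    instance_hour: float     # price per instance-hour
    storage_gb_month: float  # price per GB-month stored
    per_million_io: float    # price per million I/O requests


# ASSUMPTION: placeholder numbers chosen only to show the shape of the trade-off.
STANDARD = Pricing(instance_hour=0.26, storage_gb_month=0.10, per_million_io=0.20)
IO_OPTIMIZED = Pricing(instance_hour=0.338, storage_gb_month=0.225, per_million_io=0.0)


def monthly_cost(p: Pricing, hours: float, storage_gb: float, million_ios: float) -> float:
    return (
        p.instance_hour * hours
        + p.storage_gb_month * storage_gb
        + p.per_million_io * million_ios
    )


def cheaper_option(
    hours: float,
    storage_gb: float,
    million_ios: float,
    standard: Pricing = STANDARD,
    io_optimized: Pricing = IO_OPTIMIZED,
) -> str:
    """Return which pricing shape is cheaper for the given monthly usage."""
    std = monthly_cost(standard, hours, storage_gb, million_ios)
    opt = monthly_cost(io_optimized, hours, storage_gb, million_ios)
    return "standard" if std <= opt else "io-optimized"


if __name__ == "__main__":
    for ios in (100, 2000):
        print(f"{ios}M I/Os per month ->", cheaper_option(730, 500, ios))
```

With these placeholder prices, a light I/O month favors standard pricing and a heavy one favors I/O-Optimized, which matches the cost trap described above.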

9. Common Mistakes & Anti-Patterns

  • Publicly Accessible: Putting your DB in a public subnet with Public IP. Never do this. Use a VPN or Bastion Host/SSM to connect.
  • Ignored Maintenance Windows: AWS will forcibly patch your DB during this window. If it's set to "Mon 9am during peak traffic", you will have an outage. Set to Sunday 3am.
  • Using Default Parameter Group: Not tuning max_connections or work_mem for your workload.
  • Assuming Restores are Instant: Point-In-Time Recovery (PITR) restores a snapshot into a new instance and then replays logs. Restoring a 1TB DB can take hours.
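
The maintenance-window mistake is easy to catch with a small check. RDS expresses the window as `ddd:hh24:mi-ddd:hh24:mi` in UTC (for example `sun:03:00-sun:04:00`). The sketch below tests whether such a window overlaps a peak-traffic period. Describing the peak period in the same format is an assumption made for convenience, and the function names are illustrative.

```python
from typing import Tuple

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
WEEK_MIN = 7 * 24 * 60  # minutes in a week


def _minute_of_week(stamp: str) -> int:
    """Convert 'ddd:hh:mm' into minutes since Monday 00:00."""
    day, hh, mm = stamp.strip().lower().split(":")
    return DAYS.index(day) * 24 * 60 + int(hh) * 60 + int(mm)


def parse_window(window: str) -> Tuple[int, int]:
    """Parse 'ddd:hh:mm-ddd:hh:mm' into (start, end) minutes of the week."""
    start_s, end_s = window.split("-")
    start, end = _minute_of_week(start_s), _minute_of_week(end_s)
    if end <= start:  # window wraps past the end of the week
        end += WEEK_MIN
    return start, end


def overlaps_peak(window: str, peak: str) -> bool:
    """True if the maintenance window intersects the weekly peak period."""
    w_start, w_end = parse_window(window)
    p_start, p_end = parse_window(peak)
    # Also compare against the peak shifted by a week to handle wraparound.
    for shift in (-WEEK_MIN, 0, WEEK_MIN):
        if w_start < p_end + shift and p_start + shift < w_end:
            return True
    return False


if __name__ == "__main__":
    peak = "mon:08:00-mon:18:00"  # hypothetical peak period, in UTC
    print(overlaps_peak("mon:09:00-mon:10:00", peak))
    print(overlaps_peak("sun:03:00-sun:04:00", peak))
```

Remember to convert local peak hours to UTC before comparing, since a window that looks quiet in UTC may land in the middle of the business day elsewhere.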

10. When NOT to Use This Service

  • Massive Analytics (OLAP): Don't run SELECT SUM(amount) ... GROUP BY over billions of rows. Use Redshift or Athena.
  • Key-Value / High Scale: If you need single-digit millisecond latency at millions of concurrent requests, use DynamoDB.
  • Graph Data: Use Neptune.
  • Time Series: Use Timestream (or just stick to Postgres/TimescaleDB on RDS).

11. Interview-Level Summary

  • Multi-AZ: Synchronous replication to a Standby for High Availability (automatic failover).
  • Read Replica: Asynchronous replication for Scaling Reads.
  • Aurora Storage: Grows automatically in 10GB chunks. Data is stored in 6 copies across 3 AZs.
  • Endpoint types: Cluster (Writer), Reader (Load Balanced), Instance (Direct).
  • IAM Auth: Passwordless connection using AWS Signature V4.