
Elastic Kubernetes Service (EKS)

1. Why This Service Exists (The Real Problem)

The Problem: Running Kubernetes yourself (kOps/kubeadm) is one of the hardest jobs in DevOps.

  • Etcd Management: If the etcd database corrupts, your entire cluster dies.
  • Control Plane Scaling: As you grow toward 1,000 nodes, the API server buckles under the load.
  • Upgrades: Manually upgrading from K8s 1.28 to 1.29 is a week-long nightmare of rotating certificates and dread.

The Solution: AWS manages the Control Plane (API Server + Etcd). You only manage the Worker Nodes.

2. Mental Model (Antigravity View)

The Analogy: The Manager and the Workers.

  • EKS Control Plane: The HQ (The Bosses). AWS hides this from you; it's a black-box API endpoint. You pay $0.10/hour for them to exist.
  • Worker Nodes: The Employees (EC2 instances). You see these, pay for these, and can SSH into them.
  • Pods: The actual tasks (Containers) running on the Employees' desks.

One-Sentence Definition: A managed service where AWS runs the Kubernetes control plane for you, ensuring high availability and seamless upgrades.

3. Core Components (No Marketing)

  1. Cluster Endpoint: The URL (https://...sk1.us-east-1.eks.amazonaws.com) where kubectl sends commands.
  2. Managed Node Groups: EC2 instances that AWS automatically patches and joins to the cluster.
  3. Fargate Profile: Option to run Pods without any EC2 nodes (Serverless K8s).
  4. VPC CNI Plugin: The networking magic that gives every Pod a real VPC IP address.
  5. OIDC Provider: The identity bridge allowing K8s Service Accounts to assume AWS IAM Roles.
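To see how these pieces fit together, here is a minimal sketch of an eksctl cluster config (eksctl is one common way to create EKS clusters; demo-cluster and the node group values are hypothetical, so verify field names against your eksctl version):

```yaml
# Minimal eksctl ClusterConfig sketch -- illustrative values only
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster          # hypothetical cluster name
  region: us-east-1
iam:
  withOIDC: true              # creates the OIDC provider that IRSA depends on
managedNodeGroups:
  - name: general             # AWS patches these nodes and joins them to the cluster
    instanceType: m5.large
    desiredCapacity: 3
fargateProfiles:
  - name: serverless
    selectors:
      - namespace: batch      # Pods in this namespace run on Fargate, no EC2 at all
```

One eksctl create cluster -f run against this file produces the endpoint, node group, Fargate profile, and OIDC provider listed above.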

4. How It Works Internally (Simplified)

  1. Creation: You call CreateCluster. AWS spins up at least two API server instances and a three-node, highly available etcd cluster across 3 AZs in its own hidden account.
  2. Networking: AWS creates an Elastic Network Interface (ENI) in your VPC to let the Control Plane talk to your Worker Nodes.
  3. Auth:
    • User runs kubectl get pods.
    • Request goes to EKS Control Plane.
    • EKS authenticates user via AWS IAM.
    • EKS authorizes user via K8s RBAC.
    • EKS fetches data from Etcd and returns it.
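Under the hood, kubectl never stores AWS credentials; it shells out for a short-lived token on every request. This is roughly the user entry that aws eks update-kubeconfig writes into your kubeconfig (the cluster name is a placeholder):

```yaml
# Abridged kubeconfig user entry -- kubectl execs the AWS CLI to mint
# a token that the EKS control plane validates against IAM (the AuthN step above)
users:
  - name: demo-cluster            # placeholder
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: aws
        args: ["eks", "get-token", "--cluster-name", "demo-cluster"]
```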

5. Common Production Use Cases

  • Microservices Orchestration: Running 50+ services that need to talk to each other (Service Mesh).
  • ML Training: Running Kubeflow or JupyterHub on GPU nodes.
  • Hybrid Cloud: Using EKS Anywhere to run the same K8s version on-premises and in the cloud.

6. Architecture Patterns

The "Karpenter" Scaling Pattern

Don't use the slow, standard Cluster Autoscaler. Do use Karpenter.

Architecture:

  1. Pending Pod: A new heavy pod (Request: 64GB RAM) appears.
  2. Karpenter: Detects the pending pod immediately (within milliseconds).
  3. Provision: Calls the EC2 Fleet API to launch exactly the right instance type (e.g., r6i.2xlarge) in the right AZ.
  4. Join: The node joins the cluster in < 60 seconds. The pod runs.
  5. Kill: When the pod finishes, Karpenter terminates the node instantly to save money.
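A minimal sketch of what that looks like in config (Karpenter's v1beta1 NodePool API; field names have shifted between Karpenter releases, so treat this as illustrative, and assume a matching EC2NodeClass named default exists):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        # Let Karpenter pick whatever instance type fits the pending pod
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default                  # assumed EC2NodeClass
  disruption:
    consolidationPolicy: WhenEmpty     # step 5: kill empty nodes fast
    consolidateAfter: 30s
  limits:
    cpu: "1000"                        # safety cap on total provisioned vCPU
```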

The "IRSA" (IAM Roles for Service Accounts) Pattern

Don't give the EC2 Node Role permission to S3. (If one pod is compromised, the attacker has the Node's full access.) Do use IRSA:

  • Create an IAM Role S3Reader.
  • Map it to a K8s ServiceAccount my-app-sa.
  • The Pod assumes S3Reader directly. Minimal privilege.
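The whole pattern is one annotation plus one field (the account ID below is a placeholder):

```yaml
# ServiceAccount bound to the S3Reader role via the cluster's OIDC provider
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/S3Reader
---
# Any pod using this ServiceAccount gets S3Reader credentials injected,
# while the node's own IAM role stays locked down
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: my-app-sa
  containers:
    - name: app
      image: amazon/aws-cli        # any image with an AWS SDK or the CLI
      command: ["aws", "s3", "ls"]
```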

7. IAM & Security Model

The Double Auth Problem:

  • AuthN (Who are you?): Handled by AWS IAM (via aws-iam-authenticator).
  • AuthZ (Can you list pods?): Handled by Kubernetes RBAC (RoleBinding).
  • The bridge: The aws-auth ConfigMap (legacy) or EKS Access Entries (modern) maps IAM Users/Roles to K8s Groups (e.g., system:masters).
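The legacy bridge looks like this (the role ARN is a placeholder; prefer Access Entries on new clusters):

```yaml
# aws-auth ConfigMap in kube-system -- maps IAM identities to K8s groups
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/DevOpsAdmin   # placeholder
      username: devops-admin
      groups:
        - system:masters       # full cluster-admin; grant sparingly
```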

8. Cost Model (Very Important)

  • Cluster Fee: $0.10 per hour (~$73/month) per cluster.
  • Worker Nodes: Standard EC2 pricing.
  • Fargate: Pay per vCPU/RAM per second (More expensive than EC2, but zero management).
  • Load Balancers: Every K8s Service of type LoadBalancer spins up its own Classic/Network LB (~$15+/month each). Use an Ingress Controller (ALB) to share one LB across many services, as sketched below.
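Sharing one ALB across services looks roughly like this with the AWS Load Balancer Controller installed (hosts and Service names are placeholders):

```yaml
# One ALB fronting two Services -- one load balancer bill instead of two
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # route straight to pod IPs (VPC CNI)
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com            # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: api-svc, port: { number: 80 } }
    - host: web.example.com            # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: web-svc, port: { number: 80 } }
```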

9. Common Mistakes & Anti-Patterns

  • "One Cluster to Rule Them All": Putting Dev, Staging, and Prod in one huge cluster. One bad config upgrade kills everything. Fix: Separate Clusters for Prod and Non-Prod.
  • IP Exhaustion: Using a /24 VPC subnet. EKS assigns an IP to every single pod, so you will run out of IPs in minutes. Fix: Use a /16 or secondary CIDR blocks (or prefix delegation; see the sketch after this list).
  • Ignoring Upgrades: K8s ships a minor release roughly every four months, and each one can deprecate APIs. If you ignore upgrades for a year, you are in "Upgrade Hell". EKS forces upgrades eventually (and bills extra for extended support in the meantime).
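For the IP exhaustion point: on supported VPC CNI versions and Nitro instance types, prefix delegation is another escape hatch, handing each node /28 prefixes instead of individual IPs. A sketch of the toggle, as a strategic-merge patch for the aws-node DaemonSet (apply with kubectl patch; the file name is hypothetical, and you should confirm the flag against your CNI version's docs):

```yaml
# patch-aws-node.yaml (hypothetical file name) -- enables VPC CNI prefix delegation
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ENABLE_PREFIX_DELEGATION
              value: "true"
```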

10. When NOT to Use This Service

  • Simple Apps: If you just have a frontend and backend, EKS is massive overkill. Use App Runner or Elastic Beanstalk.
  • Small Team: K8s requires dedicated maintenance (Helm charts, ingress, debugging). If you have < 3 DevOps engineers, stay away.
  • Stateful Monoliths: Moving a legacy app that writes to local disk into K8s is painful.

11. Interview-Level Summary

  • Control Plane: Managed by AWS. High Availability (HA) by default.
  • Data Plane: Managed by You (EC2) or Serverless (Fargate).
  • CNI: Uses AWS VPC CNI. Pods get real VPC IPs.
  • Ingress: Use AWS Load Balancer Controller to provision ALBs automatically.
  • Storage: Use the EBS CSI Driver for ReadWriteOnce volumes.
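A minimal StorageClass sketch for that last point (the name is arbitrary; gp3 parameters vary by driver version):

```yaml
# Dynamic EBS provisioning via the EBS CSI driver -- ReadWriteOnce only
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # bind late so the volume lands in the pod's AZ
```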