Kubernetes on AWS (EKS)
1. Why This Exists (The Real Infrastructure Problem)
The Problem: Vanilla Kubernetes (kops/kubeadm) assumes you have full control over the network and hardware. On AWS, you don't.
- Load Balancing: K8s doesn't know how to spawn an AWS ALB.
- Storage: K8s doesn't know how to create an EBS volume.
- Auth: K8s has its own user database; AWS has IAM. You don't want to manage two sets of users.
The Solution: The AWS Cloud Controller Manager (CCM) and CSI Drivers. These are the plugins that translate "K8s Speak" into "AWS API Calls".
2. Mental Model (Antigravity View)
The Analogy: The Universal Translator.
- You (K8s): Speak Esperanto ("Create Ingress").
- AWS: Speaks English ("Create Application Load Balancer").
- EKS Add-ons: The interpreters that sit in the middle and make the magic happen.
One-Sentence Definition: EKS is Standard Kubernetes optimized to delegate "Infrastructure Heavy Lifting" (Networking, Storage, Load Balancing) to AWS Native Services.
3. Architecture Diagram (EKS Production)
[ Internet ]
|
[ AWS ALB (Application Load Balancer) ] <--- Managed by ALB Controller
| (Routes to Target Group: Node Ports)
v
+-- VPC (10.0.0.0/16) --------------------------------------+
| +-- Public Subnets (NAT Gateways) --------------------+ |
| | | |
| +-----------------------------------------------------+ |
| |
| +-- Private Subnets (Worker Nodes) -------------------+ |
| | | |
| | [ Node 1 (EC2) ] [ Node 2 (EC2) ] | |
| | IP: 10.0.1.50 IP: 10.0.2.60 | |
| | | | | |
| | +-> [Pod A] (IP: 10.0.1.51) <-- VPC CNI Plugin | |
| | | |
| +-----------------------------------------------------+ |
+-----------------------------------------------------------+
|
[ EKS Control Plane ] (Managed by AWS, hidden in their account)
4. Core Concepts (AWS Specific)
- VPC CNI Plugin:
  - Standard K8s: Pods live on a virtual overlay network (e.g. 100.x.x.x).
  - EKS: Pods get real VPC IPs. A Pod can talk directly to an RDS instance because they are on the same network CIDR. Fast, but eats IPs.
- AWS Load Balancer Controller:
  - Takes a K8s Ingress object -> auto-provisions an AWS ALB.
  - Takes a K8s Service object (type LoadBalancer) -> auto-provisions an AWS NLB.
- EBS CSI Driver:
  - Takes a K8s PersistentVolumeClaim -> calls the AWS API to create an EBS volume and attach it to the EC2 node.
- IRSA (IAM Roles for Service Accounts):
  - The security bridge. Maps a K8s ServiceAccount (backend-sa) to an AWS IAM Role (BackendS3Access). Uses OIDC.
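To make the Service path concrete, here is a minimal sketch of a Service that asks the AWS Load Balancer Controller for an NLB (the name backend-svc and the app label are assumptions; the annotations are the controller's documented ones):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-svc   # hypothetical name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip  # route straight to Pod IPs
spec:
  type: LoadBalancer
  selector:
    app: backend      # assumed Pod label
  ports:
    - port: 80
      targetPort: 8080
```

Without these annotations, the legacy in-tree controller would provision a Classic Load Balancer instead.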
5. How It Works Internally (The Request Flow)
- User: Requests api.myapp.com/users, which matches an Ingress rule.
- AWS ALB: Receives the request. Looks at the Target Group.
- IP Mode (Performance): ALB sends packet directly to the Pod IP (bypassing Kube-Proxy/NodePort). This is possible because ALB and Pod are in same VPC network (Thank you, CNI Plugin).
- Pod: Processes request.
- Pod needs S3:
- SDK looks for AWS Credentials.
- EKS "Identity Webhook" injects a temporary AWS Token into the Pod (via volume mount).
- SDK authenticates with S3 transparently.
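Concretely, the identity webhook's mutation looks roughly like this on the Pod spec (a sketch; the env var names and token path follow the EKS Pod Identity Webhook convention, the role ARN is a placeholder):

```yaml
# Effectively injected into the container by the EKS identity webhook (sketch)
env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::123456789012:role/S3-Read-Role
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
  - name: aws-iam-token
    mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
    readOnly: true
```

The AWS SDK's default credential chain checks these env vars, so application code needs no changes.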
6. Command Reference (EKS Specific)
Cluster Management
- aws eks update-kubeconfig --name my-cluster: Generates the ~/.kube/config entry so you can use kubectl.
- eksctl create cluster -f cluster.yaml: The "easy mode" standard tool for creating clusters.
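A minimal sketch of the cluster.yaml that eksctl consumes (cluster name, region, and node group sizing are assumptions):

```yaml
# cluster.yaml -- minimal eksctl sketch
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 2
    privateNetworking: true   # place nodes in private subnets
```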
Debugging EKS
- kubectl get pods -n kube-system: Check health of CoreDNS, VPC-CNI, and Kube-Proxy.
- kubectl logs -n kube-system deployment/aws-load-balancer-controller: CRITICAL. Check why your ALB isn't being created.
- kubectl describe sa my-service-account: Verify the eks.amazonaws.com/role-arn annotation exists (for IRSA).
7. Production Deployment Example (AWS Integration)
1. IAM Role (Terraform/CloudFormation)
Create Role S3-Read-Role with Trust Policy trusting the EKS OIDC Provider.
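A sketch of that trust policy (account ID, region, and OIDC provider ID are placeholders; the sub condition pins the role to one ServiceAccount):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890:sub": "system:serviceaccount:default:backend-sa"
        }
      }
    }
  ]
}
```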
2. Kubernetes Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
name: backend-sa
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/S3-Read-Role
3. Ingress (ALB)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: main-ingress
annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip # Route directly to Pod IP
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
spec:
ingressClassName: alb
rules:
- host: api.myapp.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: backend-svc
port:
number: 80
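The Ingress above assumes a backend-svc Service exists. With target-type: ip the ALB routes directly to Pod IPs, so a plain ClusterIP Service is enough for endpoint discovery (a sketch; the selector label is an assumption):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
spec:
  selector:
    app: backend    # assumed Pod label
  ports:
    - port: 80
      targetPort: 8080
```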
8. Scaling Model (EKS Style)
- Karpenter: The modern standard.
- Instead of simple Auto Scaling Groups, Karpenter watches the K8s Scheduler.
- "Hey, a Pod needs a GPU?" -> Karpenter calls the EC2 Fleet API -> launches a g4dn.xlarge -> node joins the cluster in ~45s.
- Drastically cheaper and faster than the Cluster Autoscaler.
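A minimal sketch of a Karpenter NodePool that allows such GPU launches (Karpenter v1 API; the pool name is hypothetical and a default EC2NodeClass is assumed to exist):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-pool
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g4dn.xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default       # assumes a default EC2NodeClass exists
  limits:
    cpu: "64"               # cap total CPU Karpenter may provision
```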
9. Failure Modes (AWS Context)
- IP Exhaustion: You created your EKS cluster in a small subnet (/24). VPC CNI assigns an IP to every Pod. You hit the ~251 usable IPs (AWS reserves 5 per subnet) and can't schedule more pods. Fix: use secondary CIDR blocks or Prefix Delegation.
- EBS Stuck Attaching: An EBS volume can only attach to one node, in one AZ. If a Pod moves to a node in AZ-B but its volume is in AZ-A, attachment fails. Fix: use EBS Multi-Attach (rare) or EFS, or rely on topology-aware scheduling.
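The standard mitigation for the AZ problem is to delay volume creation until the Pod is scheduled, so the EBS volume is born in the Pod's AZ. A sketch using the EBS CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-topology-aware
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # create the volume in the Pod's AZ
parameters:
  type: gp3
```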
10. Security Model
- Private Cluster: Make the API Server Endpoint "Private Only". Only accessible from VPN/Bastion.
- Security Groups: EKS manages SGs. "Cluster SG" allows control plane <-> Node communication. "Node SG" allows Node <-> Node.
- Limits: Use ResourceQuotas to prevent one namespace from eating all the AWS Load Balancers ($$$).
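A sketch of such a ResourceQuota (the namespace is hypothetical; services.loadbalancers is a standard Kubernetes quota key):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: lb-quota
  namespace: team-a               # hypothetical namespace
spec:
  hard:
    services.loadbalancers: "2"   # at most 2 Services of type LoadBalancer
```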
11. Cost Model
- Hidden Costs:
- NAT Gateway: Every time a Pod pulls a Docker Image from DockerHub, it goes through NAT ($0.045/GB). Fix: Use ECR (internal/free bandwidth) and VPC Endpoints for S3/DynamoDB.
- Cross-AZ Traffic: Pod A (AZ1) talking to Pod B (AZ2) costs money. Fix: Topology Spread Constraints.
- Control Plane: $73/month flat.
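A sketch of the topology spread constraint mentioned above, as a Pod-template fragment (the app label is an assumption):

```yaml
# Pod template fragment (sketch)
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: backend   # assumed Pod label
```

Note that spreading balances Pods across zones; to actively keep Service traffic in-zone, Kubernetes' Topology Aware Routing (the service.kubernetes.io/topology-mode annotation) is the complementary knob.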
12. When NOT To Use It
- Cost Sensitive Small Apps: The $73/month + Load Balancer ($15) + NAT Gateway ($30) base fee is high for a hobby project.
- Team size < 3: EKS upgrades (every 3 months) are non-trivial work involving checking deprecated APIs and updating Add-ons.
13. Interview & System Design Summary
- CNI: Uses native VPC networking. Fast but strictly tied to VPC limits.
- ALB Integration: Ingress -> ALB.
- Auth: IAM -> RBAC mapping via the aws-auth ConfigMap or Access Entries.
- IRSA: Pod Identity (fine-grained permissions).
- Fargate: Serverless Data Plane (No EC2 management, but strict limitations/higher cost).