
Kubernetes on AWS (EKS)

1. Why This Exists (The Real Infrastructure Problem)

The Problem: Vanilla Kubernetes (kops/kubeadm) assumes you have full control over the network and hardware. On AWS, you don't.

  • Load Balancing: K8s doesn't know how to spawn an AWS ALB.
  • Storage: K8s doesn't know how to create an EBS volume.
  • Auth: K8s has its own user database; AWS has IAM. You don't want to manage two sets of users.

The Solution: The AWS Cloud Controller Manager (CCM) and CSI Drivers. These are the plugins that translate "K8s Speak" into "AWS API Calls".

2. Mental Model (Antigravity View)

The Analogy: The Universal Translator.

  • You (K8s): Speak Esperanto ("Create Ingress").
  • AWS: Speaks English ("Create Application Load Balancer").
  • EKS Add-ons: The interpreters that sit in the middle and make the magic happen.

One-Sentence Definition: EKS is Standard Kubernetes optimized to delegate "Infrastructure Heavy Lifting" (Networking, Storage, Load Balancing) to AWS Native Services.

3. Architecture Diagram (EKS Production)

[ Internet ]
      |
[ AWS ALB (Application Load Balancer) ] <--- Managed by ALB Controller
      | (Routes to Target Group: Node Ports)
      v
+-- VPC (10.0.0.0/16) --------------------------------------+
|  +-- Public Subnets (NAT Gateways) --------------------+  |
|  |                                                     |  |
|  +-----------------------------------------------------+  |
|                                                           |
|  +-- Private Subnets (Worker Nodes) -------------------+  |
|  |                                                     |  |
|  |  [ Node 1 (EC2) ]       [ Node 2 (EC2) ]            |  |
|  |   IP: 10.0.1.50          IP: 10.0.2.60              |  |
|  |   |                      |                          |  |
|  |   +-> [Pod A] (IP: 10.0.1.51) <-- VPC CNI Plugin    |  |
|  |                                                     |  |
|  +-----------------------------------------------------+  |
+-----------------------------------------------------------+
      |
[ EKS Control Plane ] (Managed by AWS, hidden in their account)

4. Core Concepts (AWS Specific)

  1. VPC CNI Plugin:
    • Standard K8s: Pods live on a virtual overlay network, invisible to the rest of the VPC.
    • EKS: Pods get real VPC IPs. A Pod can talk directly to an RDS instance because they share the same VPC CIDR. Fast, but eats IPs.
  2. AWS Load Balancer Controller:
    • Takes a K8s Ingress object -> Auto-provisions an AWS ALB.
    • Takes a K8s Service of type: LoadBalancer -> Auto-provisions an AWS NLB.
  3. EBS CSI Driver:
    • Takes a K8s PersistentVolumeClaim -> Calls AWS API to create an EBS Volume and attach it to the EC2 node.
  4. IRSA (IAM Roles for Service Accounts):
    • The security bridge. Maps a K8s ServiceAccount (backend-sa) to an AWS IAM Role (BackendS3Access). Uses OIDC.
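To see the CNI wiring described above on a live cluster, a quick inspection sketch (assumes kubectl is already pointed at an EKS cluster):

```shell
# Pod IPs come straight from the VPC CIDR (e.g. 10.0.x.x), not an overlay range
kubectl get pods -o wide

# The aws-node DaemonSet is the VPC CNI plugin itself
kubectl get daemonset aws-node -n kube-system
```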

5. How It Works Internally (The Request Flow)

  1. User: Sends a request to api.myapp.com/users, which matches an Ingress rule.
  2. AWS ALB: Receives the request and looks up its Target Group.
  3. IP Mode (Performance): The ALB sends the packet directly to the Pod IP (bypassing kube-proxy/NodePort). This works because the ALB and the Pod are on the same VPC network (thank you, CNI Plugin).
  4. Pod: Processes request.
  5. Pod needs S3:
    • SDK looks for AWS Credentials.
    • EKS "Identity Webhook" injects a temporary AWS Token into the Pod (via volume mount).
    • SDK authenticates with S3 transparently.
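You can observe the webhook's injection directly. Assuming a Deployment named backend whose Pods use an IRSA-annotated ServiceAccount (the role ARN below is a placeholder), the SDK finds these injected environment variables:

```shell
kubectl exec deploy/backend -- env | grep AWS_
# AWS_ROLE_ARN=arn:aws:iam::123456789012:role/S3-Read-Role
# AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
```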

6. Command Reference (EKS Specific)

Cluster Management calls

  • aws eks update-kubeconfig --name my-cluster: Generates the ~/.kube/config file to let you use kubectl.
  • eksctl create cluster -f cluster.yaml: The "Easy Mode" standard tool for creating clusters.
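A minimal cluster.yaml sketch for the eksctl command above (name, region, and sizes are placeholders):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 2
    privateNetworking: true   # nodes land in private subnets
```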

Debugging EKS

  • kubectl get pods -n kube-system: Check health of CoreDNS, VPC-CNI, and Kube-Proxy.
  • kubectl logs -n kube-system deployment/aws-load-balancer-controller: CRITICAL. Check why your ALB isn't being created.
  • kubectl describe sa my-service-account: Verify the eks.amazonaws.com/role-arn annotation exists (for IRSA).

7. Production Deployment Example (AWS Integration)

1. IAM Role (Terraform/CloudFormation): Create a role S3-Read-Role whose trust policy trusts the EKS OIDC provider.
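A sketch of what that trust policy looks like (account ID and OIDC provider ID are placeholders); the sub condition pins the role to one specific ServiceAccount:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890:sub": "system:serviceaccount:default:backend-sa"
      }
    }
  }]
}
```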

2. Kubernetes Service Account

apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/S3-Read-Role

3. Ingress (ALB)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip # Route directly to Pod IP
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
spec:
  ingressClassName: alb
  rules:
    - host: api.myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-svc
                port:
                  number: 80
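The Ingress routes to backend-svc, which isn't defined above. A minimal sketch of the backing Deployment and Service (image, port, and replica count are placeholders), tying in the IRSA ServiceAccount from step 2:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      serviceAccountName: backend-sa   # Pods assume S3-Read-Role via IRSA
      containers:
        - name: app
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/backend:latest
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
spec:
  selector:
    app: backend
  ports:
    - port: 80
      targetPort: 8080
```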

8. Scaling Model (EKS Style)

  • Karpenter: The modern standard.
    • Instead of managing static Auto Scaling Groups, Karpenter watches for Pods the K8s Scheduler can't place.
    • "Hey, a Pod needs a GPU?" -> Karpenter calls the EC2 Fleet API -> Launches a g4dn.xlarge -> Node joins the cluster in ~45s.
    • Typically cheaper and faster than the Cluster Autoscaler, because it bin-packs onto right-sized instances instead of scaling fixed node groups.
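A sketch of a Karpenter NodePool expressing the GPU example above (Karpenter v1 schema; the default EC2NodeClass and the CPU limit are assumptions):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g4dn"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "64"   # cap total vCPUs this pool may provision
```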

9. Failure Modes (AWS Context)

  • IP Exhaustion: You created your EKS cluster in a small subnet (/24). The VPC CNI assigns a VPC IP to every Pod, so you exhaust the ~251 usable addresses (AWS reserves 5 IPs per subnet) and can't schedule more Pods. Fix: Use secondary CIDR blocks or Prefix Delegation.
  • EBS Stuck Attaching: An EBS volume lives in a single AZ and can normally attach to only one node. If a Pod is rescheduled to a node in AZ-B while its volume sits in AZ-A, the attach fails. Fix: Use EFS (multi-AZ), EBS Multi-Attach (rare), or rely on Topology Aware Scheduling.
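For the EBS/AZ failure mode, the standard mitigation is a StorageClass with WaitForFirstConsumer, which delays volume creation until the Pod is scheduled so the volume is created in the Pod's AZ (a minimal sketch; the class name is a placeholder):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-topology-aware
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # create the volume in the AZ where the Pod lands
parameters:
  type: gp3
```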

10. Security Model

  • Private Cluster: Make the API Server Endpoint "Private Only". Only accessible from VPN/Bastion.
  • Security Groups: EKS manages SGs. "Cluster SG" allows control plane <-> Node communication. "Node SG" allows Node <-> Node.
  • Limits: Use ResourceQuotas to prevent one namespace from eating all the AWS Load Balancers ($$$).
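A minimal ResourceQuota sketch for the last point (namespace name and limit are placeholders); services.loadbalancers is the standard quota key for LoadBalancer-type Services:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: lb-quota
  namespace: team-a
spec:
  hard:
    services.loadbalancers: "2"   # at most two LB-backed Services in this namespace
```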

11. Cost Model

  • Hidden Costs:
    • NAT Gateway: Every time a Pod pulls a Docker Image from DockerHub, it goes through NAT ($0.045/GB). Fix: Use ECR (internal/free bandwidth) and VPC Endpoints for S3/DynamoDB.
    • Cross-AZ Traffic: Pod A (AZ1) talking to Pod B (AZ2) costs money. Fix: Topology Spread Constraints.
    • Control Plane: $73/month flat.
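The S3 fix above can be sketched with one CLI call (IDs and region are placeholders); gateway endpoints for S3/DynamoDB are free and keep that traffic off the NAT Gateway:

```shell
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234567890def \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc1234567890def
```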

12. When NOT To Use It

  • Cost Sensitive Small Apps: The $73/month + Load Balancer ($15) + NAT Gateway ($30) base fee is high for a hobby project.
  • Team size < 3: EKS upgrades (Kubernetes minor versions ship roughly three times a year, and EKS supports each for a limited window) are non-trivial work involving checking deprecated APIs and updating add-ons.

13. Interview & System Design Summary

  • CNI: Uses native VPC networking. Fast but strictly tied to VPC limits.
  • ALB Integration: Ingress -> ALB.
  • Auth: IAM -> RBAC Mapping via aws-auth/Access Entries.
  • IRSA: Pod Identity (Fine-grained permissions).
  • Fargate: Serverless Data Plane (No EC2 management, but strict limitations/higher cost).