Study guide series — Vol. 3

Kubernetes
Study Guide

40 Questions
8 Domains
Engineer level
🏗️
Core Architecture 5 questions
01
Explain the Kubernetes control plane components and what each does.
Easy
The control plane is the brain of the cluster — it makes global decisions about scheduling, state, and responding to events.
  • kube-apiserver — the front door. All kubectl commands, controller watches, and kubelet registrations hit this REST API. Only component that talks to etcd directly.
  • etcd — the cluster's distributed key-value store. Source of truth for all cluster state — Pod specs, Secrets, ConfigMaps, node registrations. Quorum-based (needs majority of nodes to be healthy).
  • kube-scheduler — watches for unscheduled Pods and assigns them to a node based on resource requests, taints/tolerations, affinity rules, and node conditions.
  • kube-controller-manager — runs all the built-in controllers in one process: Deployment controller, ReplicaSet controller, Node controller, Job controller, etc. Each controller reconciles desired vs actual state.
  • cloud-controller-manager — integrates with cloud provider APIs (AWS, GCP, Azure) to provision load balancers, persistent volumes, and node lifecycle.
When asked "what happens when I run kubectl apply?" — trace the path: kubectl → apiserver → stored in etcd → relevant controller detects change → scheduler assigns node → kubelet on that node creates the container.
02
What runs on a worker node? Explain the kubelet, kube-proxy, and container runtime.
Easy
kubelet — the node agent. Watches the API server for Pods assigned to its node, instructs the container runtime to start/stop containers, reports node and Pod status back, runs liveness/readiness probes.

kube-proxy — maintains iptables (or IPVS) rules on the node that implement Service routing. When you hit a ClusterIP, kube-proxy's rules load-balance traffic to healthy Pod endpoints. Note: modern CNI plugins (Cilium) can replace kube-proxy entirely with eBPF.

Container Runtime — the software that actually runs containers. Kubernetes talks to it via the Container Runtime Interface (CRI). Common runtimes: containerd (default in most distributions), CRI-O. Docker Engine is no longer supported directly — the dockershim compatibility layer was removed in K8s 1.24 (Docker Engine can still be used via the external cri-dockerd adapter).
03
What is etcd and what happens if it becomes unavailable?
Medium
etcd is a distributed, consistent key-value store using the Raft consensus algorithm. Every piece of cluster state lives there — Pod specs, Service definitions, Secrets, RBAC rules, node registrations.

If etcd becomes unavailable:
  • The API server can no longer read or write state — kubectl commands fail
  • Existing Pods continue running on nodes (kubelet keeps them alive from local state)
  • No new Pods can be scheduled, no scaling, no deployments, no config changes
  • Controllers stop reconciling — if a Pod crashes, it won't be replaced
etcd space exhaustion is a real production failure mode — etcd keeps every revision until compaction, and once its backend database hits the quota (default 2 GiB) it raises a NOSPACE alarm and stops accepting writes. Remediation: etcdctl compact + etcdctl defrag, then etcdctl alarm disarm.
This is one of the most common K8s incident scenarios in senior interviews. Know the compaction/defrag sequence cold: etcdctl compact $(etcdctl endpoint status --write-out="json" | jq '.[0].Status.header.revision') followed by etcdctl defrag.
04
What is the difference between a Pod, a ReplicaSet, and a Deployment?
Medium
Pod — the smallest deployable unit. One or more tightly-coupled containers sharing a network namespace and storage. Ephemeral — if it dies, it's not replaced unless managed by a controller.

ReplicaSet — ensures a specified number of Pod replicas are running at all times. Uses a label selector to identify its Pods. If a Pod dies, it creates a replacement. Rarely created directly.

Deployment — manages ReplicaSets. Adds rollout/rollback capability: when you update a Deployment spec, it creates a new ReplicaSet and gradually scales it up while scaling down the old one (rolling update). Provides the kubectl rollout undo mechanism.

Hierarchy: Deployment → manages → ReplicaSet → manages → Pods
Always deploy stateless apps via Deployments, never bare Pods. Bare Pods have no self-healing. If asked "what's your rollback strategy?" — Deployments keep the old ReplicaSet around so rollback is instant.
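To make the hierarchy concrete, here is a minimal Deployment sketch (the name web and image nginx:1.27 are illustrative) — creating it spawns a ReplicaSet, which in turn spawns three Pods:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web          # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
```

After applying this, kubectl get rs shows a ReplicaSet named web-&lt;hash&gt;, and kubectl rollout undo deployment/web flips back to the previous ReplicaSet.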
05
What is a StatefulSet and when do you use it over a Deployment?
Medium
A StatefulSet manages Pods that need stable, persistent identity across rescheduling:
  • Stable network identity — Pods get predictable DNS names: web-0.web.default.svc.cluster.local
  • Stable storage — each Pod gets its own PVC via volumeClaimTemplates. When a Pod is rescheduled, it reattaches to the same PVC
  • Ordered deployment/scaling — Pods start and stop in order (0, 1, 2...), not simultaneously
Use StatefulSets for: databases (PostgreSQL, MySQL), distributed stores (Cassandra, ZooKeeper, Kafka), anything where Pod identity matters.

Use Deployments for: stateless web servers, APIs, workers — anything where Pods are interchangeable.
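A minimal StatefulSet sketch tying the pieces above together (the name web and sizes are illustrative; serviceName must point at a headless Service that provides the stable DNS names):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web            # headless Service backing web-0.web..., web-1.web...
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:       # each replica gets its own PVC: data-web-0, data-web-1, ...
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
```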
🌐
Networking & Services 6 questions
06
What are the four Service types in Kubernetes and when do you use each?
Easy
ClusterIP — default. Stable virtual IP reachable only within the cluster. Use for internal service-to-service communication.

NodePort — exposes the Service on a static port (30000–32767) on every node's IP. External traffic can reach it via NodeIP:NodePort. Mostly for dev/testing or when you control external routing.

LoadBalancer — provisions a cloud load balancer (AWS ELB, GCP LB) via the cloud-controller-manager. The standard way to expose a Service externally in managed K8s (EKS, GKE, AKS). Each Service gets its own LB — can be expensive at scale.

ExternalName — maps a Service to a DNS name outside the cluster. No proxying — returns a CNAME. Use to alias external services (e.g., RDS endpoint) within cluster DNS.
At scale, you don't use a LoadBalancer Service per app — you use one Ingress controller with a single LB, and route all traffic via Ingress rules. Know this pattern cold.
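For reference, a minimal ClusterIP Service sketch (names are illustrative) — the selector picks the backend Pods, port is what clients call, targetPort is what the container listens on:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-svc
spec:
  type: ClusterIP        # omitting type gives the same default
  selector:
    app: api             # routes to Pods carrying this label
  ports:
  - port: 80             # Service port clients connect to
    targetPort: 8080     # container port the traffic is forwarded to
```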
07
What is an Ingress and how does it differ from a Service?
Medium
An Ingress is a layer 7 HTTP/HTTPS router that routes external traffic to internal Services based on hostname and path rules — using a single external load balancer entry point.

# Example Ingress
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1-svc
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2-svc
            port:
              number: 80
An Ingress Controller (ingress-nginx, AWS Load Balancer Controller, Traefik, Istio) must be deployed to implement the Ingress resource — the resource is just configuration, the controller is the actual proxy.

Key features: TLS termination, path-based routing, host-based routing, rewrites, rate limiting (controller-dependent).
08
A Service is getting 502 errors. Walk through how you'd debug it.
Hard
502 means the load balancer/proxy received an invalid or no response from the backend. Systematic diagnosis:

1. Check Endpoints:
kubectl get endpoints my-service
If ENDPOINTS is <none> — the Service selector doesn't match any Pod labels. Compare the selector in kubectl get svc my-service -o yaml against the labels shown by kubectl get pods --show-labels.

2. Check Pod health:
kubectl get pods -l app=my-app
kubectl describe pod <pod-name>
Are Pods in Running state? Are readiness probes passing? A Pod not passing readiness is removed from Endpoints automatically.

3. Test directly:
kubectl exec -it debug-pod -- curl http://<pod-ip>:<port>
Bypasses Service/kube-proxy — isolates whether the app itself is responding.

4. Check Service port mapping: Confirm targetPort matches the port the container is actually listening on.

5. Check Network Policy: If NetworkPolicies exist, they may be blocking traffic to the Pod.
The most common 502 cause: label selector mismatch between Service and Pods. Always check kubectl get endpoints first — if it's empty, that's your answer.
09
What is a CNI plugin and what does it do?
Medium
CNI (Container Network Interface) is the plugin standard for Pod networking. When a Pod is created, the kubelet calls the CNI plugin to: assign an IP address, set up network interfaces inside the Pod, and configure routing so the Pod can communicate with others.

Kubernetes requires that: every Pod gets a unique IP, Pods can reach other Pods directly (no NAT), nodes can reach all Pods.

Popular CNI plugins:
  • Flannel — simple, uses VXLAN overlay. Good for learning. Limited NetworkPolicy support.
  • Calico — popular for production. Supports NetworkPolicy, can use BGP (no overlay). Great for on-prem.
  • Cilium — eBPF-based. Replaces kube-proxy, advanced NetworkPolicy, observability via Hubble, very high performance.
  • AWS VPC CNI — on EKS, assigns actual VPC IP addresses to Pods. No overlay, native VPC routing.
10
What are NetworkPolicies and how do you use them to restrict Pod traffic?
Medium
NetworkPolicies are firewall rules for Pod-to-Pod traffic. By default, all Pods can reach all other Pods. Once you apply any NetworkPolicy selecting a Pod, that Pod becomes deny-all for the policy type (ingress/egress), and only explicitly allowed traffic passes.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: api          # applies to api pods
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend  # only allow from frontend pods
    ports:
    - port: 8080
Important: NetworkPolicies are enforced by the CNI plugin — if your CNI doesn't support them (e.g., Flannel), the resources exist but have no effect.
Default-deny pattern: apply an empty NetworkPolicy (no ingress/egress rules) selecting all Pods to create a deny-all baseline, then add explicit allow policies per service. This is the production security baseline.
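The default-deny baseline described above can be sketched as (namespace-scoped; an empty podSelector matches every Pod in the namespace):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}          # selects all Pods in the namespace
  policyTypes:
  - Ingress
  - Egress                 # no ingress/egress rules listed, so nothing is allowed
```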
11
How does DNS resolution work inside a Kubernetes cluster?
Hard
Kubernetes runs CoreDNS as a cluster DNS server (deployed as a Deployment in the kube-system namespace). Each Pod's /etc/resolv.conf points to the CoreDNS ClusterIP.

DNS record format for a Service: <service-name>.<namespace>.svc.cluster.local

Within the same namespace: you can just use my-service — the search domains in resolv.conf expand it automatically.

For Pods: <pod-ip-dashes>.<namespace>.pod.cluster.local (e.g., 10-244-1-5.default.pod.cluster.local)

Headless Services (clusterIP: None): CoreDNS returns the individual Pod IPs instead of a single ClusterIP — used by StatefulSets so clients can connect directly to specific Pods.

Debugging DNS:
kubectl exec -it debug-pod -- nslookup my-service
kubectl exec -it debug-pod -- cat /etc/resolv.conf
⚖️
Scheduling & Resource Management 5 questions
12
What are resource requests and limits? What happens when a Pod exceeds them?
Medium
Requests — the amount of CPU/memory the scheduler guarantees the container will have. Used for scheduling decisions — a node must have at least this much available.

Limits — the maximum the container can use. Enforced at runtime by the kernel.

When exceeded:
  • CPU limit exceeded — the container is throttled (cgroup CPU throttling). It slows down but keeps running. CPU throttle is silent and commonly causes latency spikes.
  • Memory limit exceeded — the container is OOM-killed (SIGKILL, exit code 137). Kubernetes restarts it per the restart policy.
QoS classes (the kubelet uses these to prioritize evictions under node pressure):
  • Guaranteed — requests == limits for all containers. Last to be evicted.
  • Burstable — at least one container sets requests, but the Pod doesn't meet the Guaranteed criteria. Middle priority.
  • BestEffort — no requests/limits set. First evicted under pressure.
Always set requests. Always set memory limits. Consider NOT setting CPU limits in production — CPU throttling can silently kill P99 latency. Instead, right-size requests and use VPA.
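A container spec following that advice might look like this (values are illustrative — requests plus a memory limit, deliberately no CPU limit):

```yaml
containers:
- name: app
  image: my-app:1.0        # illustrative image
  resources:
    requests:
      cpu: 250m            # used by the scheduler to place the Pod
      memory: 256Mi
    limits:
      memory: 512Mi        # OOM-kill threshold; no CPU limit, to avoid throttling
```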
13
A Pod is stuck in Pending state. Walk through your diagnosis.
Hard
Pending means the scheduler cannot place the Pod on any node. Always start here:
kubectl describe pod <pod-name>
Check the Events section at the bottom — the scheduler emits a specific message.

Common causes:
  • Insufficient resources — 0/3 nodes are available: 3 Insufficient memory. All nodes lack the requested CPU/memory. Fix: scale the cluster, reduce requests, or check if requests are set unreasonably high.
  • No nodes match taints/tolerations — 3 node(s) had taint that the pod didn't tolerate. The Pod needs a toleration or the taint must be removed from a node.
  • Node affinity/selector mismatch — nodeSelector or nodeAffinity rules don't match any node. Check node labels: kubectl get nodes --show-labels.
  • PVC not bound — Pod is waiting for a PersistentVolumeClaim to become Bound. Check kubectl get pvc.
  • Cluster autoscaler pending — CA is provisioning a new node. Pod will schedule once it's ready.
In interviews, narrate the thought process: "First I'd check the scheduler events in kubectl describe pod — that message tells me exactly what constraint is failing, so I don't need to guess."
14
What are taints and tolerations? How do you use them?
Medium
Taints are applied to nodes to repel Pods that don't explicitly tolerate them.
Tolerations are applied to Pods to allow them to schedule on tainted nodes.

# Taint a node (mark it as GPU-only)
kubectl taint nodes gpu-node-1 hardware=gpu:NoSchedule

# Pod toleration
tolerations:
- key: "hardware"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
Taint effects:
  • NoSchedule — new Pods without toleration won't be scheduled here. Existing Pods stay.
  • PreferNoSchedule — scheduler tries to avoid, but will use if no other option.
  • NoExecute — evicts existing Pods that don't have the toleration, in addition to blocking new ones.
Use cases: dedicated GPU nodes, spot-instance-only nodes, control plane isolation (control plane nodes carry node-role.kubernetes.io/control-plane:NoSchedule; older clusters used node-role.kubernetes.io/master:NoSchedule).
15
What is a LimitRange and a ResourceQuota? When do you use each?
Medium
LimitRange — sets default and maximum resource requests/limits for containers in a namespace. Ensures every Pod has sensible defaults even if the developer doesn't specify them.
apiVersion: v1
kind: LimitRange
spec:
  limits:
  - type: Container
    default:          # applied if container sets no limits
      cpu: 500m
      memory: 256Mi
    defaultRequest:   # applied if container sets no requests
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 1Gi
ResourceQuota — caps the total resources consumed by all objects in a namespace. Prevents a single team from starving the cluster.
apiVersion: v1
kind: ResourceQuota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    pods: "50"
    services.loadbalancers: "2"
Use LimitRange to set per-Pod guardrails. Use ResourceQuota to enforce per-namespace budgets in multi-tenant clusters.
16
How does the Cluster Autoscaler work and what are its limitations?
Hard
The Cluster Autoscaler (CA) watches for Pods stuck in Pending due to insufficient resources and adds nodes, and watches for underutilized nodes and removes them.

Scale-up trigger: Pod is Pending because no node has enough capacity → CA calculates which node group to expand → provisions new node (via cloud provider ASG/MIG) → node joins cluster → Pod schedules.

Scale-down trigger: A node has been underutilized (all Pods using <50% of requests) for 10+ minutes → CA checks if Pods can be rescheduled elsewhere → drains and terminates the node.

Limitations:
  • Slow — new node provisioning takes 2–5 minutes. Sudden traffic spikes hit before capacity is ready.
  • Won't scale down nodes with Pods that have local storage, certain annotations, or PodDisruptionBudgets blocking eviction
  • Relies on Pod requests being accurate — if requests are too low, CA thinks nodes have capacity and won't scale
Karpenter (AWS-native): faster alternative that provisions nodes directly in seconds, not minutes, and is more cost-aware (right-sizes instance types to exact Pod needs).
💾
Storage & Persistence 4 questions
17
Explain the relationship between PersistentVolume, PersistentVolumeClaim, and StorageClass.
Easy
PersistentVolume (PV) — a cluster-level storage resource, provisioned by an admin or dynamically. Represents actual storage (EBS volume, NFS share, Azure Disk).

PersistentVolumeClaim (PVC) — a user's request for storage. Specifies size, access mode, and StorageClass. Kubernetes binds it to a matching PV.

StorageClass — defines the "type" of storage and how it's dynamically provisioned. When a PVC references a StorageClass, the provisioner automatically creates a PV.
apiVersion: v1
kind: PersistentVolumeClaim
spec:
  storageClassName: gp3    # references a StorageClass
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
Access modes: ReadWriteOnce (one node, read-write), ReadOnlyMany (many nodes, read-only), ReadWriteMany (many nodes, read-write — requires NFS/EFS).
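The gp3 StorageClass referenced above could be defined roughly like this (ebs.csi.aws.com is the AWS EBS CSI provisioner; the other parameters are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com              # AWS EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until a Pod is scheduled
reclaimPolicy: Delete
```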
18
What is a CSI driver and why was it introduced?
Medium
The Container Storage Interface (CSI) is a standard API for exposing storage systems to containerized workloads. Before CSI, storage plugins were compiled directly into the Kubernetes codebase — storage vendors had to contribute to the K8s repo, and bugs required a K8s release to fix.

CSI decouples storage from K8s core: vendors ship their own CSI driver as a separate container that runs in the cluster. K8s communicates with it via a standard gRPC interface.

Examples: AWS EBS CSI Driver, AWS EFS CSI Driver, Azure Disk CSI, GCP PD CSI, Portworx, NetApp Trident.

On EKS: the EBS CSI driver must be explicitly installed (it's an EKS Add-On) to provision gp3 volumes. Without it, dynamic provisioning fails.
19
What is the difference between a ConfigMap and a Secret?
Easy
ConfigMap — stores non-sensitive configuration data as key-value pairs. Mounted as files or injected as environment variables. Stored in plaintext in etcd.

Secret — stores sensitive data. Values are base64-encoded (not encrypted by default). Stored in etcd — requires encryption at rest (EncryptionConfiguration) enabled in the API server to actually be encrypted.

Important caveats:
  • Base64 is encoding, not encryption — don't confuse them
  • Secrets should also be managed via external secret stores: AWS Secrets Manager + External Secrets Operator, HashiCorp Vault + Vault Agent, Sealed Secrets
  • Avoid injecting Secrets as env vars — they can leak via crash dumps, /proc/<pid>/environ, and inherited child-process environments. Prefer mounting as files in a tmpfs volume.
If asked "how do you manage secrets in K8s?" — the answer should be External Secrets Operator pulling from AWS Secrets Manager, not just "I use K8s Secrets".
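Mounting a Secret as files instead of env vars looks roughly like this (names are illustrative) — Secret volumes are backed by tmpfs on the node, so values never touch disk:

```yaml
spec:
  containers:
  - name: app
    image: my-app:1.0
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: db-credentials   # each key becomes a file under /etc/creds
```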
20
What volume types would you use for sharing data between containers in the same Pod?
Medium
Containers in a Pod can share volumes mounted at the same path. Common volume types for this:
  • emptyDir — ephemeral directory created when Pod starts, deleted when it ends. Lives on the node's disk (or memory if medium: Memory — becomes a tmpfs). Perfect for sidecar patterns: a log collector sidecar reads from the same emptyDir that the main app writes logs to.
  • configMap / secret — mount cluster resources as files.
  • projected — combine multiple sources (configMap, secret, serviceAccountToken, downwardAPI) into one mount point.
Common sidecar patterns using emptyDir: log shipping (Fluentd sidecar), proxy injection (Envoy/Istio), init containers that download/unpack assets before the main container starts.
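The log-shipping sidecar pattern can be sketched as follows — both containers mount the same emptyDir (image names are illustrative):

```yaml
spec:
  containers:
  - name: app
    image: my-app:1.0
    volumeMounts:
    - name: logs
      mountPath: /var/log/app      # app writes logs here
  - name: log-shipper
    image: fluent/fluent-bit:3.0   # illustrative collector image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app      # sidecar reads the same directory
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
```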
🔐
Security & RBAC 5 questions
21
Explain RBAC in Kubernetes: Roles, ClusterRoles, RoleBindings, ClusterRoleBindings.
Medium
RBAC controls who can do what to which resources in the API server.

Role — grants permissions within a specific namespace.
ClusterRole — grants permissions cluster-wide (or to non-namespaced resources like Nodes, PVs).
RoleBinding — binds a Role or ClusterRole to a subject (User, Group, ServiceAccount) within a namespace.
ClusterRoleBinding — binds a ClusterRole to a subject across all namespaces.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-manager
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-runner-deployments
subjects:
- kind: ServiceAccount
  name: ci-runner
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-manager
Always use ServiceAccounts for Pod identity, never User credentials. For CI/CD pipelines touching the cluster — dedicated ServiceAccount with a scoped Role, not cluster-admin.
22
What is a ServiceAccount and how does a Pod use it to authenticate with the API server?
Medium
A ServiceAccount is an identity for processes running in a Pod. Every namespace has a default ServiceAccount, but you should create dedicated ones per workload.

When a Pod is created with a ServiceAccount, Kubernetes automatically mounts a projected volume at /var/run/secrets/kubernetes.io/serviceaccount/token containing a short-lived JWT (a bound service account token — the default since K8s 1.22, replacing long-lived Secret-based tokens).

The Pod uses this token to authenticate API requests. The API server validates the token, looks up the ServiceAccount, and evaluates RBAC to authorize the action.

On EKS: IRSA (IAM Roles for Service Accounts) — annotate a ServiceAccount with an IAM Role ARN. EKS's webhook injects AWS credentials as a projected token, enabling Pods to call AWS APIs without instance profile credentials.
# In the Pod spec:
serviceAccountName: my-app-sa
# Annotation on the ServiceAccount:
eks.amazonaws.com/role-arn: arn:aws:iam::123:role/MyRole
23
What is a Pod Security Standard / Admission Controller and how do you enforce security policies?
Hard
Pod Security Standards (PSS) (replaced PodSecurityPolicies in K8s 1.25) define three security profiles:
  • Privileged — no restrictions. For trusted system workloads.
  • Baseline — prevents known privilege escalation. Minimum restrictions for most workloads.
  • Restricted — heavily restricted, follows security best practices (no root, read-only root FS, dropped capabilities).
Enforced via the built-in Pod Security Admission controller — label your namespace:
labels:
  pod-security.kubernetes.io/enforce: restricted
  pod-security.kubernetes.io/warn: restricted
Advanced policy enforcement — use OPA/Gatekeeper or Kyverno: admission webhooks that intercept API requests and enforce custom policies (e.g., require all images come from a specific registry, require resource limits set, require specific labels).
Mention Kyverno for policy-as-code — it uses K8s-native YAML syntax (no Rego), making it more accessible for teams already comfortable with K8s manifests.
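As a sketch of that K8s-native syntax, a Kyverno policy requiring runAsNonRoot might look roughly like this (a hedged example — policy name and exact field paths are assumptions to verify against the Kyverno docs):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-non-root
spec:
  validationFailureAction: Enforce    # reject non-compliant Pods (Audit only warns)
  rules:
  - name: check-run-as-non-root
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Pods must set runAsNonRoot: true"
      pattern:
        spec:
          securityContext:
            runAsNonRoot: true        # pattern the Pod spec must match
```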
24
How do you prevent a Pod from running as root?
Medium
Use a securityContext in the Pod or container spec:
securityContext:
  runAsNonRoot: true       # K8s rejects Pod if image defaults to root
  runAsUser: 1000          # run as specific UID
  runAsGroup: 3000
  fsGroup: 2000            # GID for volume ownership
  readOnlyRootFilesystem: true  # prevent writes to container FS
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]          # drop all Linux capabilities
    add: ["NET_BIND_SERVICE"]  # add back only what's needed
These can be set at the Pod level (applies to all containers) and overridden at the container level.

At the cluster level, enforce via Pod Security Standards (restricted profile) or a Kyverno/Gatekeeper policy that rejects Pods without runAsNonRoot: true.
25
What is a mutating vs validating admission webhook?
Hard
Admission webhooks intercept API requests after authentication/authorization but before persistence in etcd.

Mutating Admission Webhook — can modify the incoming object. Runs first. Examples: Istio injects an Envoy sidecar container into every Pod, cert-manager injects CA bundles, resource default injectors add resource limits if not set.

Validating Admission Webhook — can only approve or reject. Runs after mutation. Examples: OPA Gatekeeper enforcing policy rules, requiring specific labels or annotations, blocking images from untrusted registries.

Both are registered via MutatingWebhookConfiguration / ValidatingWebhookConfiguration resources, pointing to an HTTPS endpoint (your webhook server or a tool like Kyverno/Gatekeeper).

Risk: If a webhook is unavailable and failurePolicy: Fail, all matching API requests fail — can break cluster operations. Set failurePolicy: Ignore for non-critical webhooks or ensure high availability.
📊
Observability & Pod Health 5 questions
26
What are liveness, readiness, and startup probes? How do they differ?
Easy
Readiness Probe — "is this Pod ready to receive traffic?" If it fails, the Pod is removed from Service Endpoints (no traffic). The Pod keeps running. Use for: slow startup, dependency checks, circuit-breaker patterns.

Liveness Probe — "is this Pod still alive?" If it fails repeatedly (based on failureThreshold), kubelet restarts the container. Use for: detecting deadlocks, stuck processes that appear running but aren't making progress.

Startup Probe — "has the app finished starting?" Disables the liveness probe until it passes. Prevents liveness from killing a slow-starting app. Once startup probe passes, it stops running.

Probe types: httpGet, tcpSocket, exec (run a command inside the container).
Common mistake: setting liveness probe too aggressively (low thresholds) causing restart loops under load. Liveness should detect "dead, will never recover" — not "slow right now." Readiness handles the slow-under-load case.
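The three probes together on one container, sketched with illustrative endpoints and timings:

```yaml
containers:
- name: app
  image: my-app:1.0
  startupProbe:
    httpGet: {path: /healthz, port: 8080}
    failureThreshold: 30      # allows up to 30 x 5s = 150s to start
    periodSeconds: 5
  readinessProbe:
    httpGet: {path: /ready, port: 8080}
    periodSeconds: 5          # failing only removes the Pod from Endpoints
  livenessProbe:
    httpGet: {path: /healthz, port: 8080}
    periodSeconds: 10
    failureThreshold: 6       # generous: restart only when truly stuck
```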
27
A Pod is in CrashLoopBackOff. What's your diagnostic process?
Hard
CrashLoopBackOff means the container is starting, crashing, and Kubernetes is exponentially backing off restart attempts (10s, 20s, 40s... up to 5 minutes).

Step 1 — Get logs from the crashed instance:
kubectl logs <pod> --previous   # logs from the last crash
Step 2 — Check events:
kubectl describe pod <pod>
Look at exit code in the container status section.

Exit code guide:
  • 1 — application error (check app logs)
  • 127 — command not found (check image/entrypoint)
  • 137 — OOM killed (SIGKILL). Increase memory limit.
  • 139 — segmentation fault
  • 143 — terminated by SIGTERM (128 + 15): the container received a graceful shutdown request and exited
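The "128 + signal number" convention behind codes 137 and 143 is easy to verify locally with any POSIX shell (no cluster needed):

```shell
# A process killed by signal N exits with code 128 + N.
sh -c 'kill -KILL $$'   # SIGKILL is signal 9 — what the OOM killer sends
echo "exit code: $?"    # prints: exit code: 137

sh -c 'kill -TERM $$'   # SIGTERM is signal 15 — graceful shutdown request
echo "exit code: $?"    # prints: exit code: 143
```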
Step 3 — Debug the image directly:
kubectl run debug --image=my-app:tag --command -- sleep 3600
kubectl exec -it debug -- /bin/sh
If logs are empty (the container crashes before writing anything), check the image entrypoint and whether required env vars or mounted secrets are present: kubectl exec debug -- env | grep MY_VAR.
28
How do you set up monitoring and alerting for a Kubernetes cluster?
Medium
Standard stack: Prometheus + Grafana

Deploy via the kube-prometheus-stack Helm chart — installs Prometheus, Grafana, Alertmanager, and pre-built dashboards for cluster metrics.

What Prometheus scrapes:
  • kube-state-metrics — Deployment/Pod/Node status metrics
  • node-exporter — per-node CPU, memory, disk, network
  • kubelet /metrics and /metrics/cadvisor — container resource usage
  • Your application's /metrics endpoint (instrument with Prometheus client library)
Alerting: Define PrometheusRules (e.g., PodCrashLooping, NodeMemoryPressure, DeploymentReplicasMismatch) → Alertmanager routes to PagerDuty/Slack.

On EKS: use Amazon Managed Service for Prometheus (AMP) + Amazon Managed Grafana to skip managing the Prometheus stack yourself.
29
What is a PodDisruptionBudget and when do you need one?
Medium
A PodDisruptionBudget (PDB) limits how many Pods of a Deployment/StatefulSet can be simultaneously unavailable during voluntary disruptions — node drains, upgrades, Cluster Autoscaler scale-downs.

apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 2     # at least 2 pods must always be available
  # OR
  maxUnavailable: 1   # at most 1 pod can be down at once
  selector:
    matchLabels:
      app: my-api
When you need one: Any stateless service with multiple replicas that you want to remain available during node maintenance, K8s upgrades, or Cluster Autoscaler scale-downs.

Without a PDB, kubectl drain can evict all Pods of a deployment at once — causing an outage even if you have multiple replicas.
PDBs only protect against voluntary disruptions. If a node fails unexpectedly, PDBs don't apply. Combine with pod anti-affinity rules to spread Pods across nodes/AZs for hardware failure tolerance.
30
How does Horizontal Pod Autoscaling (HPA) work?
Medium
The HPA controller watches a target metric (CPU, memory, or custom) and adjusts the replica count of a Deployment/StatefulSet to keep the metric near a target value.

How it works:
  1. Metrics server (or Prometheus Adapter for custom metrics) collects current resource usage
  2. HPA computes: desiredReplicas = ceil(currentReplicas × currentValue / targetValue)
  3. Updates the Deployment's replica count
kubectl autoscale deployment my-app \
  --min=2 --max=20 --cpu-percent=60
Requirements: Pods must have CPU requests set — HPA uses requests as the denominator for CPU utilization %.

KEDA (Kubernetes Event-Driven Autoscaling): extends HPA to scale on external metrics — SQS queue depth, Kafka lag, Datadog metrics, cron schedules. Scales to zero replicas when idle (HPA cannot scale below 1).
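The kubectl autoscale command shown in this answer is shorthand for an autoscaling/v2 manifest roughly like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # percent of the Pods' CPU requests
```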
🚀
Deployments, Helm & GitOps 5 questions
31
What are Kubernetes deployment strategies? Compare RollingUpdate vs Recreate.
Easy
RollingUpdate (default) — gradually replaces old Pods with new ones. Controlled by:
  • maxUnavailable — max Pods that can be down at once (default 25%)
  • maxSurge — max extra Pods above desired count during rollout (default 25%)
Zero-downtime if readiness probes are configured correctly — new Pods must pass readiness before old ones are terminated.

Recreate — terminates all old Pods, then creates all new ones. Causes downtime. Use only when old and new versions absolutely cannot run simultaneously (schema changes, single-process architecture).

Advanced strategies require external tools: blue/green (swap the Service selector), canary (use two Deployments with different weights, or Argo Rollouts), A/B testing (Istio traffic weights).
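The RollingUpdate knobs live under the Deployment's strategy field — a sketch:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never drop below the desired replica count
      maxSurge: 1          # create at most one extra Pod during the rollout
```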
32
What is Helm and what problem does it solve?
Medium
Helm is the Kubernetes package manager. It solves the problem of managing collections of related K8s manifests as a single deployable unit.

Core concepts:
  • Chart — a package of templated K8s manifests + default values
  • Release — an installed instance of a chart in a cluster. Multiple releases of the same chart can coexist (e.g., postgres-dev and postgres-prod)
  • Values — parameters that customize the chart. Override with -f values.yaml or --set key=value
  • Repository — a collection of charts (Artifact Hub, Bitnami, your private OCI registry)
helm install my-app ./charts/my-app \
  -f values-prod.yaml \
  --set image.tag=v1.2.3 \
  --namespace production
Helm also tracks release history — helm rollback my-app 1 reverts to a previous revision.
For production: always use versioned chart releases and pinned image tags. Never helm upgrade --install with image.tag=latest in a production pipeline.
33
What is GitOps? Compare Argo CD and Flux.
Hard
GitOps is an operational model where Git is the single source of truth for cluster state. A GitOps agent continuously reconciles the cluster to match what's declared in Git. Benefits: full audit trail, pull-request-based change management, automatic drift detection and correction.

Argo CD: declarative GitOps CD tool. Syncs Git repos to cluster. Strong UI, multi-cluster support, ApplicationSets for templating across many clusters/environments, integrates with Argo Rollouts for canary/blue-green.

Flux: GitOps toolkit — more modular/composable. Flux controllers (source-controller, kustomize-controller, helm-controller) can be composed. CLI-first, tighter GitHub/GitLab integration.

Choose Argo CD if: you want a strong UI, multi-cluster management from one pane, and an established community.

Choose Flux if: you prefer a composable CLI-driven approach and deeper native integration with OCI artifacts for Helm charts.
34
What are init containers and how do you use them?
Medium
Init containers run to completion, one at a time in order, before any app containers start. If an init container fails, the kubelet re-runs it until it succeeds (unless the Pod's restartPolicy is Never, in which case the whole Pod is marked failed).

Use cases:
  • Wait for a dependency to be ready: until nc -z db-service 5432; do sleep 2; done
  • Run database migrations before the app starts
  • Clone a git repo or download config files into a shared emptyDir volume
  • Register the Pod with an external service before it becomes ready
  • Fetch secrets from Vault before the main container has access to them
initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c',
    'until nc -z db-svc 5432; do echo waiting; sleep 2; done']
Init containers can have different images and security contexts from the main container — useful for privileged setup steps that the main app doesn't need.
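Tying the shared-volume use case together, here is a hedged full-Pod sketch — the images and repo URL are illustrative placeholders:

```yaml
# Sketch: an init container clones config into an emptyDir volume,
# which the app container then mounts read-only.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: fetch-config
    image: alpine/git
    command: ['git', 'clone', '--depth=1',
              'https://github.com/example/app-config', '/config']
    volumeMounts:
    - name: config
      mountPath: /config
  containers:
  - name: app
    image: registry.example.com/app:v1
    volumeMounts:
    - name: config
      mountPath: /etc/app
      readOnly: true
  volumes:
  - name: config
    emptyDir: {}   # shared scratch space, lives as long as the Pod
```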
35
How do you implement zero-downtime deployments in Kubernetes?
Hard
Zero-downtime requires several pieces working together:
  • Readiness probes — new Pods must pass readiness before they receive traffic. Old Pods stay in rotation until new ones are ready.
  • RollingUpdate strategy — set maxUnavailable: 0 (never take a Pod down before a new one is ready) and maxSurge: 1 (one extra Pod at a time).
  • Graceful shutdown — app must handle SIGTERM and finish in-flight requests. Set terminationGracePeriodSeconds to allow time. Add a preStop lifecycle hook with a short sleep to let kube-proxy drain the endpoint before SIGTERM.
  • PodDisruptionBudget — prevents node drains from taking down all replicas simultaneously.
  • Pod anti-affinity — spread replicas across nodes/AZs so a node failure doesn't take down multiple Pods.
The preStop sleep hook (sleep 5) is critical — endpoint removal and SIGTERM delivery happen in parallel, so kube-proxy (and any load balancer) can still route traffic to the Pod for a few seconds after shutdown begins. The sleep delays SIGTERM to bridge that gap and prevents 502s during rollout.
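The pieces above can be sketched in a single Deployment — names, image, and probe path here are illustrative:

```yaml
# Zero-downtime rollout sketch: rolling update + readiness gate +
# graceful shutdown with a preStop drain window.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # bring up one extra Pod at a time
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      terminationGracePeriodSeconds: 30   # time to finish in-flight work
      containers:
      - name: web
        image: registry.example.com/web:v1
        readinessProbe:
          httpGet: { path: /healthz, port: 8080 }
          periodSeconds: 5
        lifecycle:
          preStop:
            exec:
              command: ['sh', '-c', 'sleep 5']  # let endpoints drain first
```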
🔧
Troubleshooting & War Stories 5 questions
36
A node is NotReady. What do you do?
Hard
Step 1 — Assess impact:
kubectl get nodes
kubectl describe node <node-name>
Check Conditions — look for MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable.

Step 2 — Check the kubelet: SSH to the node.
systemctl status kubelet
journalctl -u kubelet -n 100 --no-pager
Common causes: kubelet crashed, certificate expired, containerd unhealthy, disk full, out of memory (OOM on the node itself).

Step 3 — If unrecoverable, cordon and drain:
kubectl cordon <node>    # stop new pods scheduling here
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
Step 4 — Replace: In cloud environments, terminate the instance — the ASG or Karpenter will provision a fresh replacement node.

Node controller behavior: after ~40s without kubelet heartbeats (node-monitor-grace-period), the node is marked NotReady and its Pods show status Unknown. After a further 5 minutes (the default not-ready toleration), the Pods are evicted and rescheduled onto healthy nodes.
37
Pods are being evicted. What are the possible causes and how do you fix them?
Hard
This question is usually about node-pressure eviction: when a node runs low on memory, disk, or PIDs, the kubelet evicts Pods to reclaim resources. (Drains and scheduler preemption also evict Pods, but through different mechanisms.)

Causes (in order of eviction priority — BestEffort evicted first):
  • Node memory pressure — node's actual memory usage is high. Evicts BestEffort Pods first, then Burstable, then Guaranteed.
  • Node disk pressure — node's disk is full. Often caused by unrotated container logs or container image layer accumulation. Fix: increase disk, configure log rotation, enable image garbage collection.
  • PID pressure — too many processes on the node.
Diagnosis:
kubectl describe node <node> | grep -A5 Conditions
kubectl get events --field-selector reason=Evicted
Fixes:
  • Set resource requests on all Pods (prevents BestEffort classification)
  • Increase node size or add nodes
  • Enable log rotation (--container-log-max-size on kubelet)
  • Configure image GC thresholds on kubelet
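The first fix above — setting requests — determines the QoS class that drives eviction order. A container spec fragment, with illustrative values: when requests equal limits the Pod is classed Guaranteed and is evicted last.

```yaml
# QoS sketch: requests == limits for every container => Guaranteed.
# Requests set but lower than limits => Burstable.
# No requests/limits at all => BestEffort (first to be evicted).
containers:
- name: app
  image: registry.example.com/app:v1
  resources:
    requests:
      cpu: 500m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 256Mi
```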
38
Your kubectl commands are timing out. The API server appears unresponsive. What do you check?
Hard
This points to a control plane issue — work through the layers:

1. Check etcd health (most common cause):
etcdctl endpoint health
etcdctl endpoint status
etcd disk exhaustion or quorum loss causes the API server to stop accepting writes.

2. Check API server process: On control plane nodes:
systemctl status kube-apiserver
# or for kubeadm clusters:
crictl ps | grep apiserver
3. Check API server resource usage: Is the API server OOM-killed? Check node memory on control plane nodes.

4. Check for API server overload: Too many requests (LIST calls, runaway controllers, CI tools hammering the API). Check apiserver_request_total metrics for rate spikes.

5. Certificate expiry: Cluster certificates expire — all API communication fails. Check with kubeadm certs check-expiration.

Mitigating impact: Existing running Pods continue to run — kubelet caches pod specs locally. New scheduling and config changes are frozen until API server recovers.
39
Design a multi-tenant Kubernetes cluster for multiple teams with isolation and cost attribution.
Hard
Namespace-per-team as the isolation boundary:
  • ResourceQuotas — cap each team's CPU/memory/pod count. Prevents one team from starving others.
  • LimitRanges — enforce per-pod defaults and maximums so teams can't request unlimited resources.
  • NetworkPolicies — default-deny between namespaces. Explicit allow for cross-team dependencies.
  • RBAC — each team has a Role in their namespace only. No cluster-admin. Platform team manages ClusterRoles.
  • Dedicated node pools — high-security or high-performance teams get their own node pool via taints/tolerations. Prevents noisy-neighbor CPU/memory issues.
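Two of those guardrails as a hedged sketch — the namespace name and quota numbers are illustrative:

```yaml
# Cap a team's aggregate resource requests and Pod count.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "100"
---
# Default-deny ingress: Pods in team-a accept no traffic until an
# explicit NetworkPolicy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}        # selects every Pod in the namespace
  policyTypes: ["Ingress"]
```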
Cost attribution: Label all resources with team: and env: labels. Use Kubecost or OpenCost to attribute cluster cost to namespaces/teams. Feed into internal chargeback or showback dashboards.
For strict multi-tenant isolation, mention vCluster (virtual clusters) — each team gets their own virtual K8s API server and control plane backed by the physical cluster. Stronger isolation than namespace-level.
40
Tell me about a time you had to troubleshoot a Kubernetes production incident.
Hard
This is your Kubernetes war story. Use the 5-part framework:

1. Setup (30s) — business impact: "Our checkout service on EKS was returning 503s to 20% of requests during a peak traffic period."

2. Discovery (45s) — first signal: "CloudWatch ALB 5xx alarm fired. kubectl get pods showed 3 of 6 pods in CrashLoopBackOff. kubectl describe pod showed exit code 137 — OOM killed."

3. Diagnosis (60s) — root cause: "A code change had introduced a memory leak in a new endpoint. Traffic to that endpoint was causing pods to hit their 512Mi memory limit. With 3 pods continuously crashing, the remaining 3 were handling 2x traffic — also hitting OOM."

4. Remediation (30s) — what you did: "Rolled back the Deployment with kubectl rollout undo. Pods recovered immediately. Then temporarily increased memory limits while the dev team patched the leak."

5. Prevention (30s) — what changed: "Added memory usage to our canary metrics with an auto-rollback threshold. New container memory limit alerts at 80% before they hit 100% and OOM-kill."
Adapt to real incidents from your experience. The OOM/CrashLoopBackOff scenario maps well to common real-world K8s issues and demonstrates the full diagnostic chain interviewers want to see.