
Python
Study Guide

35 Questions · 7 Domains · Engineer level
🐍
Core & Runtime 5 questions
01 What is CPython and why is it the default on most Linux servers?
CPython is the reference implementation of Python, written in C. It is what you get when you run python3 on most distributions — the interpreter compiles your source to bytecode and executes it on a virtual machine.

Ops teams care because packaging, extension modules (many AWS SDK paths), and system packages (apt install python3) target CPython. Alternative implementations (PyPy, MicroPython) exist for speed or embedded use but are not interchangeable without testing extensions and wheels.
If asked about performance at scale, mention PyPy or rewriting hot paths — but acknowledge most infra glue is I/O bound, not CPU bound.
02 What is a virtual environment (venv) and why use it in production deploys?
A virtual environment is an isolated directory with its own site-packages and usually a copy/symlink of the interpreter (python -m venv .venv). Imports resolve against that env first, not the system Python.

In production it prevents dependency clashes with OS-managed packages, makes upgrades rollback-friendly (replace the venv or image layer), and matches the “one app per env” model in containers and VMs.
Contrast with pip install --user and global installs — venvs are the standard for reproducible deploys.
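A quick stdlib check shows the mechanics: inside a venv, sys.prefix points at the env directory while sys.base_prefix still points at the base installation. A minimal sketch:

```python
import sys

def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv.

    python -m venv sets sys.prefix to the env directory;
    sys.base_prefix keeps pointing at the base install.
    """
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"virtualenv:  {in_virtualenv()}")
```

Handy in deploy scripts as a guard that refuses to pip install into the system Python by accident.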
03 Why use #!/usr/bin/env python3 instead of a hard-coded interpreter path?
env looks up python3 on PATH, so the script works across distros and local venv activation. A hard path like #!/usr/local/bin/python3 breaks on machines where Python lives elsewhere.

In systemd units or cron, you still often set PATH explicitly or call the venv’s interpreter directly (/opt/app/.venv/bin/python) for determinism.
Interviewers like hearing “env for portability, absolute venv path when the service must be bulletproof.”
04 What is PYTHONPATH and when can it cause problems in production?
PYTHONPATH prepends directories to sys.path so Python can import modules outside the default locations. Useful for monorepos or ad-hoc tooling.

Problems: shadowing installed packages, non-reproducible behavior between hosts, and security surprises if writable dirs appear earlier on the path. Prefer proper packages (pip install -e .); use PYTHONPATH only when controlled and documented.
Mention “same image / same venv everywhere” beats per-host PYTHONPATH hacks.
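The shadowing risk is easy to demonstrate in pure Python: PYTHONPATH entries simply land at the front of sys.path, so any module file in those directories wins the import. A sketch (the mytool module name is invented for the demo):

```python
import sys
import tempfile
from pathlib import Path

def import_from_dir(directory: str, module_name: str):
    """Import module_name with `directory` prepended to sys.path,
    mimicking what a PYTHONPATH entry does at interpreter startup."""
    sys.path.insert(0, directory)
    try:
        return __import__(module_name)
    finally:
        sys.path.remove(directory)

# A module dropped into an arbitrary directory becomes importable --
# exactly how a stray PYTHONPATH dir can shadow an installed package.
tmp = tempfile.mkdtemp()
Path(tmp, "mytool.py").write_text("VERSION = '1.0'\n")
mod = import_from_dir(tmp, "mytool")
print(mod.VERSION)
```

If mytool.py had been named like a real dependency (say requests.py), every import on that host would silently get the local file instead.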
05 What is the GIL and when does it matter for DevOps-style workloads?
The Global Interpreter Lock (GIL) in CPython allows only one thread to execute Python bytecode at a time per process. CPU-bound parallel work in threads does not scale; I/O-bound work often does because threads release the GIL around blocking calls.

For DevOps scripts (boto3 calls, SSH, subprocess, HTTP), workloads are usually I/O bound — the GIL rarely matters. For heavy parallel CPU (parsing huge logs in threads), use multiprocessing, offload to native code, or pick another runtime.
Python 3.13+ offers experimental free-threaded (no-GIL) builds under PEP 703 — know the name, don’t overclaim production readiness.
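The I/O-bound case is worth showing concretely: threads overlap blocking waits because the GIL is released around them. A minimal sketch, with time.sleep standing in for a real HTTP/SSH/boto3 call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(host: str) -> str:
    """Stand-in for an I/O-bound call. time.sleep releases the GIL,
    just like a blocking socket read would."""
    time.sleep(0.1)
    return f"{host}: ok"

def check_all(hosts: list[str]) -> list[str]:
    # Ten 0.1s waits overlap, so the whole batch takes ~0.1s, not ~1s.
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(fetch, hosts))

start = time.perf_counter()
results = check_all([f"host{i}" for i in range(10)])
elapsed = time.perf_counter() - start
print(f"{len(results)} checks in {elapsed:.2f}s")
```

Swap fetch for a CPU-bound loop and the speedup disappears — that is the moment to reach for multiprocessing.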
📦
Packaging & Dependencies 5 questions
06 When do you use pip vs pipx?
pip installs into the active environment (venv or system) — libraries and apps tied to that project.

pipx installs CLI applications each into an isolated venv and links a single binary on PATH (pipx install black). Ideal for operator laptops, not for app dependency management inside a service repo.
In CI/CD, pin tools in requirements or use pre-commit; pipx is more “workstation ergonomics.”
07 What is pyproject.toml and how does PEP 621 fit in?
pyproject.toml is the standard place to declare build-system and project metadata (PEP 518 onward). PEP 621 defines the [project] table: name, version, dependencies, optional dependencies, and entry-point scripts.

Tools (Poetry, hatch, setuptools, uv) read or generate this file. It replaces scattered setup.py-only workflows for many teams.
Mention [build-system] with requires + build-backend so pip knows how to build the wheel.
08 Compare requirements.txt, requirements.in + pip-compile, and Poetry lockfiles.
requirements.txt — flat list (often pinned) of what pip installs. Simple but manual transitive updates.

requirements.in + pip-compile (pip-tools) — you edit top-level deps; compiler resolves and pins transitive versions into requirements.txt for reproducibility.

Poetry / PDM — poetry.lock or pdm.lock locks the full graph with metadata; installs are deterministic from the lockfile.
CI should install from lock or compiled requirements, not open-ended ranges.
09 What does pip install -e . (editable install) mean for internal tools?
Editable installs put the project source on the import path via a .pth file or import hook, so code changes are immediately importable without reinstalling. Used when developing shared libraries or CLI tools in a monorepo.

Production images usually install a wheel or copy source intentionally — editable mounts in containers can hide “works on my machine” drift.
Good follow-up: “we use -e in dev, wheel + hash pinning in prod.”
10 How do uv, pip, and Poetry differ for CI speed and reproducibility?
pip — universal, mature; slower cold installs unless cached.

Poetry — resolves deps, lockfile, good DX; resolver can be slower; still common in pipelines.

uv (Astral) — very fast Rust-based installer and resolver; a drop-in replacement for many pip workflows; growing adoption for CI cache efficiency.

Reproducibility comes from locks or compiled requirements, not the installer brand alone.
Mention caching ~/.cache/pip or uv cache in GHA for build time wins.
⚙️
Scripting & Stdlib 5 questions
11 When is argparse enough vs Click or Typer for ops CLIs?
argparse is in the stdlib — no extra deps, fine for small internal scripts and CI entrypoints.

Click / Typer add composable commands, better help text, and type-hint driven CLIs — worth it when the tool grows subcommands and plugins (kubectl-style UX).
For “one script in a Terraform repo,” argparse; for “platform CLI used by many teams,” Typer/Click.
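For the "argparse is enough" case, a minimal sketch of a single-script ops CLI (the service/env arguments are illustrative):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Stdlib only: no extra deps to vendor into a CI image.
    parser = argparse.ArgumentParser(description="Restart a service")
    parser.add_argument("service", help="systemd unit name")
    parser.add_argument("--env", choices=["dev", "staging", "prod"],
                        default="dev", help="target environment")
    parser.add_argument("--dry-run", action="store_true",
                        help="print actions without executing")
    return parser

args = build_parser().parse_args(["api", "--env", "prod", "--dry-run"])
print(args.service, args.env, args.dry_run)
```

Once this grows subcommands (deploy, rollback, status) and shared options, Click/Typer's command groups pay for the dependency.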
12 Why is subprocess.run(..., shell=True) dangerous?
With shell=True, the string passes through a shell, so metacharacters (; | $()) are interpreted. Untrusted input can become command injection.

Prefer shell=False with a list argv ["ansible-playbook", "site.yml"] and no user-controlled shell grammar.
If you must use a shell, sanitize strictly or quote values with shlex.quote — interviewers want to hear “avoid shell=True.”
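The safe pattern in one sketch: with a list argv and the default shell=False, attacker-controlled text is delivered to the program as a single argument, never parsed by a shell.

```python
import subprocess

unsafe_name = "site.yml; rm -rf /"  # pretend attacker-controlled input

# Safe: list argv, no shell. The whole string arrives as one argument;
# the `;` is data, not shell grammar, so nothing after it executes.
result = subprocess.run(
    ["echo", unsafe_name],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```

The same string inside subprocess.run(f"echo {unsafe_name}", shell=True) would run the rm.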
13 How do you safely incorporate user or config data into subprocess calls?
Pass arguments as a list of strings to subprocess.run without shell=True so the OS passes argv directly to the program. Validate allowlists for hostnames, paths, and job names.

For environment variables, use env={**os.environ, "FOO": value} with controlled values. Never interpolate untrusted strings into shell one-liners.
Pair with “defense in depth”: least-privilege IAM + safe subprocess patterns.
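Both ideas together — allowlist validation plus a controlled environment dict — look roughly like this sketch (the hostname pattern and TARGET_HOST variable are illustrative):

```python
import os
import re
import subprocess
import sys

# Illustrative allowlist: lowercase hostnames only, no shell metacharacters.
ALLOWED_HOST = re.compile(r"^[a-z0-9.-]{1,253}$")

def run_healthcheck(host: str) -> str:
    """Probe `host`, validating it first and passing config through
    an explicitly constructed environment."""
    if not ALLOWED_HOST.match(host):
        raise ValueError(f"rejected host: {host!r}")
    env = {**os.environ, "TARGET_HOST": host}  # controlled value only
    # Child process stands in for a real probe binary.
    proc = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['TARGET_HOST'])"],
        env=env, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()

print(run_healthcheck("db1.internal"))
```

Anything that fails the allowlist — spaces, semicolons, $( — never reaches the subprocess at all.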
14 Why prefer pathlib over os.path for automation?
pathlib.Path gives object-oriented paths, overloads / for joining, and clear methods (read_text, glob, mkdir(parents=True)). Fewer string bugs across Windows vs Linux when you must support both.

os.path still fine in legacy code; new internal tools should standardize on pathlib.
Mention Path.resolve() for symlink clarity in deploy scripts.
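The main pathlib idioms in one sketch (the releases/VERSION layout is an invented example):

```python
import tempfile
from pathlib import Path

deploy_root = Path(tempfile.mkdtemp())

# `/` joins components; mkdir(parents=True) replaces os.makedirs.
release = deploy_root / "releases" / "2024-01-01"
release.mkdir(parents=True, exist_ok=True)

# read_text/write_text replace open()/read()/close() boilerplate.
(release / "VERSION").write_text("1.4.2\n")
version = (release / "VERSION").read_text().strip()

# glob walks the tree without os.walk string juggling.
found = sorted(p.name for p in deploy_root.glob("releases/*/VERSION"))
print(version, found)
```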
15 Why do ops teams prefer structured (JSON) logs from Python services?
Plain text logs need fragile regex in Loki, CloudWatch, or ELK. JSON lines map cleanly to fields (level, trace_id, user) for filtering and alerting.

Libraries like structlog or logging formatters emit one JSON object per line; ship to the collector via stdout (12-factor).
Correlate with OpenTelemetry trace IDs when discussing modern observability.
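Without reaching for structlog, the stdlib alone can emit one JSON object per line — a minimal formatter sketch (field names like trace_id are illustrative):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line -- a tiny stand-in for structlog."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` survive as record attributes.
        for key in ("trace_id", "user"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)  # 12-factor: log to stdout
handler.setFormatter(JsonFormatter())
log = logging.getLogger("deploy")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("rollout complete", extra={"trace_id": "abc123", "user": "ci"})
```

Each line is then directly queryable by field in Loki/CloudWatch instead of via regex.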
🧪
Testing & Quality 5 questions
16 Why choose pytest over unittest for infrastructure-related tests?
pytest has concise asserts, powerful fixtures, parametrization, and a huge plugin ecosystem (pytest-cov, pytest-xdist). Less boilerplate than unittest classes for glue code and policy tests.

unittest remains valid (stdlib, some enterprises); pytest is the de facto modern default.
Mention testing Terraform wrappers or boto3 helpers with fixtures + moto.
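The "less boilerplate" point is concrete: pytest collects plain test_* functions and bare assert works, no TestCase subclass needed. A sketch with an invented helper under test:

```python
def parse_image(ref: str) -> tuple[str, str]:
    """Split 'repo:tag' into (repo, tag), defaulting the tag to 'latest'.
    (Illustrative helper, not a real library function.)"""
    repo, _, tag = ref.partition(":")
    return repo, tag or "latest"

# pytest discovers these by name; failures show the compared values
# automatically thanks to assertion rewriting.
def test_parse_image_with_tag():
    assert parse_image("nginx:1.25") == ("nginx", "1.25")

def test_parse_image_default_tag():
    assert parse_image("nginx") == ("nginx", "latest")
```

With pytest installed, the two cases could collapse further into one @pytest.mark.parametrize test.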
17 How would you mock AWS API calls in Python tests?
Common options: moto (decorators or server mode emulating many services), botocore.stub.Stubber for precise request/response fixtures, or dependency-inject a client interface and fake it in unit tests.

Integration tests may hit a real sandbox account with scoped credentials and cleanup fixtures.
Say “unit tests fake AWS; one smoke test in staging proves IAM and networking.”
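The dependency-injection option needs no AWS libraries at all: production code accepts any object with the client's method shape, and the test substitutes a fake. A sketch using the real list_buckets response shape:

```python
def bucket_names(s3_client) -> list[str]:
    """List bucket names via an injected S3 client.

    In production this receives boto3.client("s3"); in unit tests,
    any duck-typed fake with the same method works.
    """
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]

class FakeS3Client:
    """Stand-in mirroring boto3's list_buckets response structure."""
    def list_buckets(self):
        return {"Buckets": [{"Name": "logs"}, {"Name": "artifacts"}]}

print(bucket_names(FakeS3Client()))
```

The test never touches the network, credentials, or moto; those stay for the (fewer) integration tests.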
18 What is coverage.py and how do you use it in CI?
coverage.py measures which lines ran during tests. Run pytest --cov=src --cov-report=xml and upload XML to Sonar, Codecov, or fail the job if coverage drops below threshold.

Use .coveragerc to omit generated or migration paths. Coverage is a guardrail, not proof of correctness.
Acknowledge 100% coverage can be gamed — pair with meaningful assertions and review.
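An illustrative .coveragerc fragment (section and option names are real coverage.py settings; the paths and threshold are examples to adjust per repo):

```ini
# .coveragerc
[run]
source = src
omit =
    */migrations/*
    */generated/*

[report]
fail_under = 80
show_missing = true
```

fail_under makes `coverage report` exit non-zero below the threshold, which is what gates the CI job.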
19 What is pre-commit and why use it in a Python DevOps repo?
pre-commit runs configured hooks (Ruff, Black, mypy, secret scanners) before a commit, or in CI via pre-commit run --all-files. Running the same checks locally and in the pipeline reduces “lint failed on main” churn.

Share .pre-commit-config.yaml in the repo; pin hook revisions for reproducibility.
Mention terraform fmt or yaml lint hooks alongside Python — shows platform thinking.
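An illustrative .pre-commit-config.yaml (the repos and hook ids are real; the rev tags shown are examples — pin whatever versions the team has vetted):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4        # illustrative pin
    hooks:
      - id: ruff
      - id: ruff-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0        # illustrative pin
    hooks:
      - id: check-yaml
      - id: detect-private-key
```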
20 How do unit tests differ from integration tests for deployment automation?
Unit tests isolate functions (parsing, templating, IAM policy generation) with mocks — fast, deterministic.

Integration tests run against real-ish dependencies: Docker compose stack, k3d cluster, or account sandbox — slower, catch wiring and permission bugs.

Pyramid: many unit, fewer integration, rare full e2e pipeline runs.
CI cost vs confidence — parallelize integration suites and gate merges on unit + smoke.
☁️
Cloud & AWS (boto3) 5 questions
21 What is boto3 and what is the difference between client and resource?
boto3 is the AWS SDK for Python. A client is low-level: maps 1:1 to API operations (list_objects_v2), returns dicts.

A resource is a higher-level object-oriented facade (e.g. s3.Bucket) built on clients — ergonomic for simple CRUD, less control for every API edge case.

New code often uses clients + type hints; resources still appear in older scripts.
For interviews, “client for full API surface; resource for quick S3/Dynamo ergonomics.”
22 How should boto3 obtain credentials on a laptop, in CI, and on Lambda?
boto3 uses the default credential chain: env vars (AWS_ACCESS_KEY_ID), shared credentials file, SSO / IAM Identity Center, instance/container role, etc.

Lambda — execution role credentials injected automatically; never bake keys into the image.

CI — prefer OIDC federation to assume role (GitHub Actions → AWS) over long-lived access keys.
Explicit is better: Session(profile_name=...) in dev scripts for clarity.
23 How do you handle paginated AWS list APIs in boto3?
Use paginators: client.get_paginator('list_objects_v2') then paginate(Bucket=...) — handles NextToken / ContinuationToken correctly.

Manual loops with NextMarker are error-prone. Paginators are the standard pattern in automation and data pipelines.
Mention rate limiting / backoff if listing huge accounts — pair with CloudTrail awareness.
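To see exactly what a paginator saves you, here is the manual token loop it replaces, exercised against a fake client that mimics boto3's ContinuationToken semantics (the fake and its token encoding are invented for the demo):

```python
class FakeS3Client:
    """Fake client with list_objects_v2-style continuation tokens."""
    def __init__(self, keys, page_size=2):
        self._pages = [keys[i:i + page_size]
                       for i in range(0, len(keys), page_size)]

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        page = int(ContinuationToken or 0)
        response = {"Contents": [{"Key": k} for k in self._pages[page]]}
        if page + 1 < len(self._pages):
            response["IsTruncated"] = True
            response["NextContinuationToken"] = str(page + 1)
        return response

def all_keys(client, bucket):
    """Manual pagination loop. With real boto3, prefer
    client.get_paginator('list_objects_v2').paginate(Bucket=bucket)."""
    keys, token = [], None
    while True:
        kwargs = {"Bucket": bucket}
        if token:
            kwargs["ContinuationToken"] = token
        resp = client.list_objects_v2(**kwargs)
        keys += [obj["Key"] for obj in resp.get("Contents", [])]
        token = resp.get("NextContinuationToken")
        if not token:
            return keys

print(all_keys(FakeS3Client(["a", "b", "c", "d", "e"]), "my-bucket"))
```

Forgetting the token check, or mixing up NextContinuationToken with NextMarker, silently truncates results — which is why the paginator is the standard answer.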
24 How do retries, throttling, and idempotency interact in AWS automation scripts?
boto3 (via botocore) can use configurable retries for transient errors and throttling. Not all APIs are safe to blindly retry — create operations may double-create without idempotency tokens.

Use client tokens where supported, check-before-create patterns, or Step Functions for orchestrated retries. Combine with exponential backoff and jitter at the application layer for custom loops.
Link to SQS/Lambda partial batch failure semantics if the role touches event-driven systems.
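For custom loops, exponential backoff with full jitter is a few lines — a sketch (catching bare Exception here is for brevity; real code should retry only throttling/transient error codes):

```python
import random
import time

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Full-jitter backoff: each delay is uniform in [0, min(cap, base*2^n))."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

def call_with_retries(operation, attempts: int = 5, sleep=time.sleep):
    """Retry `operation` with jittered backoff. Only safe when the
    operation is idempotent or carries a client token (see above)."""
    last_error = None
    for delay in backoff_delays(attempts):
        try:
            return operation()
        except Exception as exc:  # narrow this to throttling errors in real code
            last_error = exc
            sleep(delay)
    raise last_error
```

The injectable sleep parameter also makes the retry logic unit-testable without real waiting.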
25 When would you use Python (Pulumi, CDK) vs pure Terraform/HCL for IaC?
Python IaC shines when you need loops, shared libraries, unit tests of infra logic, or tight coupling to application code in one language.

Terraform excels at broad provider ecosystem, declarative state, and team familiarity. Many orgs standardize on HCL and call Python only for glue (pre/post hooks, custom providers via CDKTF).

Tradeoff: abstraction power vs complexity and reviewability.
Show you can defend either side with team skill and blast radius in mind.
🐳
Containers & Orchestration 5 questions
26 What is the point of multi-stage Docker builds for Python apps?
A builder stage installs compilers and build deps; a final runtime stage copies only wheels/venv and runs on a slim base (python:3.12-slim). Smaller images, fewer CVEs, faster pulls.

Avoid leaving gcc, git, or .git history in production layers.
Mention distroless or chainguard images for hardened production if the interviewer goes deep.
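An illustrative two-stage Dockerfile (paths, module name, and tags are examples); note it also demonstrates the lockfile-first layer ordering from the next question:

```dockerfile
# Builder stage: has compilers; produces wheels only.
FROM python:3.12 AS builder
WORKDIR /app
# Requirements first: this layer is cached until requirements.txt changes.
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage: slim base, no build toolchain, no .git, no gcc.
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
# Application code last, so code edits never bust the dependency layers.
COPY src/ ./src/
USER nobody
CMD ["python", "-m", "src.app"]
```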
27 Why order Dockerfile instructions to maximize layer caching for pip install?
Docker reuses a cached layer when its inputs are unchanged. Copy the requirements/lockfile first, run pip install, then COPY the application source — code changes won’t bust the dependency layer.

Anti-pattern: COPY . . before pip — any file edit reinstalls everything, slowing CI.
Combine with pip install --no-cache-dir in images to control size vs host cache tradeoffs.
28 What are typical uses of the official Kubernetes Python client?
Automating CRUD on resources (Jobs, CronJobs, ConfigMaps), building internal portals, writing controllers/operators (often with a framework like kopf), or glue between CI and the API server.

Uses in-cluster service account tokens or kubeconfig from CI secrets. Watch for RBAC least privilege on the ServiceAccount.
Compare with kubectl from subprocess — client library gives structured errors and retries.
29 What do you automate with the Docker SDK for Python?
Build/push images in pipelines, prune images on hosts, introspect containers for diagnostics, integration-test compose stacks programmatically.

Prefer the CLI for human ops; SDK when tests or services must drive Docker without shelling out.
Note Docker socket mount = root-equivalent — never expose lightly.
30 Cron vs Celery vs Step Functions for periodic Python work — when to use which?
Cron / k8s CronJob — simple schedules on a box or cluster; good for batch scripts with clear idempotency.

Celery — distributed task queue with workers; when you need retries, routing, and backpressure inside an app ecosystem.

Step Functions (or similar) — visual/state-machine orchestration across Lambdas and services with built-in retries and observability.

Pick by complexity, team familiarity, and cloud vs self-hosted constraints.
For “just run this script nightly,” don’t over-engineer — cron + logs + alerts may be enough.
🔒
Security, CI/CD & Behavioral 5 questions
31 How do you pin dependencies and scan Python projects for CVEs?
Pin with lockfiles or compiled requirements; use pip-audit, Safety, GitHub Dependabot, or Snyk in CI. Fail builds on critical vulns or open tickets with SLA.

Combine with minimal base images and regular rebuilds — supply chain is layers + deps + base OS.
Mention PEP 740 attestations or Sigstore for advanced signing if the conversation goes there.
32 How do you load secrets in Python: env vars vs AWS Secrets Manager vs Vault?
Env vars — 12-factor friendly, simple for non-rotating config; risk if leaked via logs or ps.

Secrets Manager / SSM Parameter Store — fetch at startup with IAM role, support rotation and auditing.

Vault — dynamic secrets, policy-heavy enterprises. Cache carefully and handle auth (K8s SA, AppRole).

Never commit secrets; inject via platform in production.
Lambda extension or init container pattern to hydrate env before app start is a strong senior answer.
33 How do you roll out a Python minor version upgrade across many hosts or images?
Run tests (unit + smoke) on the new interpreter in CI; build new base images; canary a subset of workloads; watch for deprecated stdlib removals and C-extension wheels (manylinux).

Document breaking changes (e.g. asyncio, encoding defaults). Roll forward with automated AMIs or GitOps image bumps; keep rollback tag.

Use pyupgrade or Ruff rules to modernize syntax after upgrade.
Mention reading porting notes and testing greenlet/gevent or scientific stacks extra hard.
34 Common pitfalls running Python under systemd?
Wrong WorkingDirectory breaking relative paths; missing venv in ExecStart; buffered stdout hiding logs (use PYTHONUNBUFFERED=1 or -u); user permissions; Restart= policies masking crash loops.

Set EnvironmentFile or drop-ins for secrets; use Type=notify only if the app supports sd_notify.
journalctl + structured logs — show you debug production services methodically.
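An illustrative unit file pulling those points together (paths, user, and service names are examples; the directives are real systemd options):

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=My Python service
After=network-online.target

[Service]
User=appuser
WorkingDirectory=/opt/app
# Call the venv interpreter directly; no activation step needed.
ExecStart=/opt/app/.venv/bin/python -m app
# Unbuffered stdout so journalctl sees logs immediately.
Environment=PYTHONUNBUFFERED=1
EnvironmentFile=/etc/myapp/env
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```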
35 Tell me about a time Python automation improved deployment reliability or velocity.
Use a concise STAR outline:

Situation: Manual deploys, flaky config drift, or slow feedback.

Task: Automate validation, packaging, or cloud updates.

Action: Name concrete tools — pytest + moto, boto3 paginators, Docker multi-stage, GitHub Actions OIDC, pre-commit, structured logging.

Result: Quantify — time saved, fewer incidents, faster rollbacks.

Reflection: What you’d add next (integration tests, feature flags, better observability).
Specific beats generic — one real metric or outage prevented is worth ten buzzwords.