Databases & Storage Study Guide

🗄

RELATIONAL DATABASES 5 questions

01 What is ACID and why does it matter for OLTP?

›

Atomicity (all-or-nothing), Consistency (valid state), Isolation (concurrent transactions), Durability (committed survives crash). OLTP (orders, payments) needs strong guarantees; analytics may relax.

02 What are primary keys, foreign keys, and indexes?

›

PK uniquely identifies rows. FK enforces referential integrity between tables. Indexes speed lookups but slow writes and use storage - design for query patterns, avoid over-indexing.

03 What is connection pooling and why exhaust it?

›

Pools reuse DB connections to avoid TCP+auth overhead per request. Exhaustion causes timeouts under load - fix with pool sizing, shorter queries, read replicas, or app-level queuing. Watch max_connections on DB.

04 Explain read phenomena: dirty read, non-repeatable read, phantom read.

›

Dirty: read uncommitted data from another txn. Non-repeatable: same read twice returns different committed values. Phantom: new rows appear in range query. Isolation levels (Read Committed, Repeatable Read, Serializable) trade consistency vs performance.

05 What is MVCC at a high level?

›

Multi-Version Concurrency Control keeps row versions so readers do not block writers - snapshots for transactions. Used in Postgres, InnoDB. Trade-offs: vacuum/undo cleanup, long transactions holding old versions causing bloat.

⇄

REPLICATION & HIGH AVAILABILITY 5 questions

06 What is the difference between synchronous and asynchronous replication?

›

Synchronous: primary waits for replica ack - stronger durability, higher latency, risk if replica slow. Asynchronous: faster commits, possible data loss if primary dies before log ships. Choose RPO requirements.

07 What are RPO and RTO?

›

RPO: max acceptable data loss window. RTO: max acceptable downtime. Drive backup frequency, multi-AZ vs multi-region, and runbook automation.

08 What is failover in RDS vs manual promotion of read replica?

›

Multi-AZ: synchronous standby, automatic failover for AZ outage. Cross-region replica promotion: manual or scripted - async lag means nonzero RPO. Test failover quarterly.

09 What is split-brain in clustered databases?

›

Two nodes both believe they are primary - divergent writes, data corruption. Quorum systems (majority votes), fencing (STONITH), and orchestrators (Patroni) mitigate. Kubernetes operators must handle carefully.

10 How do you capacity-plan a database for 10x growth?

›

Measure p95 query latency, CPU, IOPS, storage growth, connection counts. Plan vertical limits, read replicas, sharding/partitioning strategy, caching (Redis), archive cold data, load test before cutover.

⬡

NOSQL & CACHING 5 questions

11 When would you pick a document store vs relational DB?

›

Document (MongoDB, DynamoDB document): flexible schema, nested objects, horizontal scale patterns. Relational: complex joins, strict integrity, mature tooling. Wrong choice causes join emulation in app or schema rigidity pain.

12 What is DynamoDB partition key vs sort key?

›

Partition key hashes to shard - hot keys throttle. Sort key orders items in partition for range queries. Design for access patterns; GSIs duplicate data for alternate queries - understand write amplification.

13 What is Redis used for in production systems?

›

Cache, session store, rate limiting, pub/sub, leaderboards - in-memory speed. Plan persistence (AOF/RDB), eviction policy, HA (Redis Sentinel/Cluster), and never as sole source of truth for critical durable data unless architected for it.

14 What is eventual consistency in NoSQL?

›

Reads may not reflect latest write immediately across replicas - common in globally distributed systems. Application must tolerate stale reads or use strongly consistent APIs when available (higher latency/cost).

15 What is the cache stampede / thundering herd problem?

›

Many requests miss cache simultaneously and hit DB. Mitigate: probabilistic early expiration, request coalescing (single-flight), circuit breakers, warming, per-key locks. Critical at scale during cold start.

▢

OBJECT STORAGE & FILES 5 questions

16 What is S3-style object storage vs block storage vs NFS?

›

Object: flat namespace, HTTP API, great for blobs, static assets, data lake - not POSIX. Block: EBS volumes for databases/filesystems. NFS/EFS: shared POSIX files - different latency/consistency model.

17 What are S3 storage classes and lifecycle policies?

›

Standard, IA, Glacier tiers - trade cost vs retrieval latency. Lifecycle transitions old objects, expire deletes, Intelligent-Tiering for unknown patterns. Reduces bill for archives and logs.

18 What is strong read-after-write consistency in S3?

›

New objects readable immediately; overwrites/deletes have eventual consistency historically - know current AWS guarantees per operation. Matters for pipelines writing then reading same key.

19 How do you secure a bucket from public exposure?

›

Block public access settings, bucket policies denying s3:GetObject for Principal *, encryption at rest, logging to centralized account, IAM least privilege, pre-signed URLs for time-limited access, regular audits (S3 Block Public Access org-wide).

20 What is multipart upload and when required?

›

Split large objects into parts uploaded in parallel - required above 5 GB, recommended for large files for reliability/resume. Failed uploads leave orphan parts - lifecycle rule to abort incomplete multipart saves cost.

⧉

BACKUPS & DISASTER RECOVERY 5 questions

21 What is the difference between backup and replication?

›

Replication mirrors live state - logical/physical corruption or deletes can replicate. Backups are point-in-time copies (snapshots, dumps) for restore to different timeline. Need both for resilience.

22 What is a backup retention and legal hold consideration?

›

Retention balances cost vs compliance (7 years tax logs). Legal hold pauses deletion. Automate lifecycle; test restores, not just backup success jobs.

23 How do you test database restores?

›

Periodic automated restore to isolated env, run integrity checks, measure RTO, verify app connectivity. Tabletop exercises for region loss. Document who approves production restore.

24 What is point-in-time recovery (PITR)?

›

Restore to any second within retention window using transaction logs + periodic snapshots - minimizes data loss vs snapshot-only. Enabled on RDS/Aurora; understand storage cost of logs.

25 What is the difference between logical and physical backups?

›

Logical (pg_dump): portable, slower restore, good cross-version. Physical (snapshot, xtrabackup): fast, closer to byte-copy, may be version/region specific. Choose per RTO/RPO and migration needs.

☁

MANAGED DB OPS (CLOUD) 5 questions

26 What operational tasks does RDS still leave to you?

›

Schema design, query tuning, connection management, parameter group tuning, major version upgrades planning, security groups, monitoring slow queries - AWS patches OS, backups, failover mechanics.

27 What is Aurora vs vanilla RDS MySQL/Postgres?

›

Aurora: AWS storage engine, fast failover, read replicas share storage, auto-growing storage. Higher performance at cost; some MySQL/Postgres features differ - verify compatibility.

28 What is blue/green deployment for database major version upgrades?

›

Replica on new version, replicate, switch traffic with controlled cutover - reduces downtime vs in-place upgrade. AWS offers managed blue/green for some engines; test app compatibility.

29 How do you migrate a large DB to cloud with minimal downtime?

›

Initial dump/snapshot, ongoing replication (DMS, native logical replication), read-only cutover window, DNS/connection string swap, validate sequences and collations, rollback plan. Practice dry runs.

30 What metrics do you alert on for a production database?

›

CPU, free storage, replication lag, connection count, deadlock rate, slow query log thresholds, failed backup jobs, disk queue depth, memory pressure. Tie alerts to runbooks.