// RDBMS, NoSQL, object storage

Databases & Storage
Study Guide

30 QUESTIONS 6 DOMAINS DATA ON CALL
MASTERED
0 / 30
FILTER:
EASY
MEDIUM
HARD
🗄
RELATIONAL DATABASES 5 questions
01 What is ACID and why does it matter for OLTP?
Atomicity (all-or-nothing), Consistency (valid state), Isolation (concurrent transactions), Durability (committed survives crash). OLTP (orders, payments) needs strong guarantees; analytics may relax.
02 What are primary keys, foreign keys, and indexes?
PK uniquely identifies rows. FK enforces referential integrity between tables. Indexes speed lookups but slow writes and use storage - design for query patterns, avoid over-indexing.
03 What is connection pooling and why exhaust it?
Pools reuse DB connections to avoid TCP+auth overhead per request. Exhaustion causes timeouts under load - fix with pool sizing, shorter queries, read replicas, or app-level queuing. Watch max_connections on DB.
04 Explain read phenomena: dirty read, non-repeatable read, phantom read.
Dirty: read uncommitted data from another txn. Non-repeatable: same read twice returns different committed values. Phantom: new rows appear in range query. Isolation levels (Read Committed, Repeatable Read, Serializable) trade consistency vs performance.
05 What is MVCC at a high level?
Multi-Version Concurrency Control keeps row versions so readers do not block writers - snapshots for transactions. Used in Postgres, InnoDB. Trade-offs: vacuum/undo cleanup, long transactions holding old versions causing bloat.
REPLICATION & HIGH AVAILABILITY 5 questions
06 What is the difference between synchronous and asynchronous replication?
Synchronous: primary waits for replica ack - stronger durability, higher latency, risk if replica slow. Asynchronous: faster commits, possible data loss if primary dies before log ships. Choose RPO requirements.
07 What are RPO and RTO?
RPO: max acceptable data loss window. RTO: max acceptable downtime. Drive backup frequency, multi-AZ vs multi-region, and runbook automation.
08 What is failover in RDS vs manual promotion of read replica?
Multi-AZ: synchronous standby, automatic failover for AZ outage. Cross-region replica promotion: manual or scripted - async lag means nonzero RPO. Test failover quarterly.
09 What is split-brain in clustered databases?
Two nodes both believe they are primary - divergent writes, data corruption. Quorum systems (majority votes), fencing (STONITH), and orchestrators (Patroni) mitigate. Kubernetes operators must handle carefully.
10 How do you capacity-plan a database for 10x growth?
Measure p95 query latency, CPU, IOPS, storage growth, connection counts. Plan vertical limits, read replicas, sharding/partitioning strategy, caching (Redis), archive cold data, load test before cutover.
NOSQL & CACHING 5 questions
11 When would you pick a document store vs relational DB?
Document (MongoDB, DynamoDB document): flexible schema, nested objects, horizontal scale patterns. Relational: complex joins, strict integrity, mature tooling. Wrong choice causes join emulation in app or schema rigidity pain.
12 What is DynamoDB partition key vs sort key?
Partition key hashes to shard - hot keys throttle. Sort key orders items in partition for range queries. Design for access patterns; GSIs duplicate data for alternate queries - understand write amplification.
13 What is Redis used for in production systems?
Cache, session store, rate limiting, pub/sub, leaderboards - in-memory speed. Plan persistence (AOF/RDB), eviction policy, HA (Redis Sentinel/Cluster), and never as sole source of truth for critical durable data unless architected for it.
14 What is eventual consistency in NoSQL?
Reads may not reflect latest write immediately across replicas - common in globally distributed systems. Application must tolerate stale reads or use strongly consistent APIs when available (higher latency/cost).
15 What is the cache stampede / thundering herd problem?
Many requests miss cache simultaneously and hit DB. Mitigate: probabilistic early expiration, request coalescing (single-flight), circuit breakers, warming, per-key locks. Critical at scale during cold start.
OBJECT STORAGE & FILES 5 questions
16 What is S3-style object storage vs block storage vs NFS?
Object: flat namespace, HTTP API, great for blobs, static assets, data lake - not POSIX. Block: EBS volumes for databases/filesystems. NFS/EFS: shared POSIX files - different latency/consistency model.
17 What are S3 storage classes and lifecycle policies?
Standard, IA, Glacier tiers - trade cost vs retrieval latency. Lifecycle transitions old objects, expire deletes, Intelligent-Tiering for unknown patterns. Reduces bill for archives and logs.
18 What is strong read-after-write consistency in S3?
New objects readable immediately; overwrites/deletes have eventual consistency historically - know current AWS guarantees per operation. Matters for pipelines writing then reading same key.
19 How do you secure a bucket from public exposure?
Block public access settings, bucket policies denying s3:GetObject for Principal *, encryption at rest, logging to centralized account, IAM least privilege, pre-signed URLs for time-limited access, regular audits (S3 Block Public Access org-wide).
20 What is multipart upload and when required?
Split large objects into parts uploaded in parallel - required above 5 GB, recommended for large files for reliability/resume. Failed uploads leave orphan parts - lifecycle rule to abort incomplete multipart saves cost.
BACKUPS & DISASTER RECOVERY 5 questions
21 What is the difference between backup and replication?
Replication mirrors live state - logical/physical corruption or deletes can replicate. Backups are point-in-time copies (snapshots, dumps) for restore to different timeline. Need both for resilience.
22 What is a backup retention and legal hold consideration?
Retention balances cost vs compliance (7 years tax logs). Legal hold pauses deletion. Automate lifecycle; test restores, not just backup success jobs.
23 How do you test database restores?
Periodic automated restore to isolated env, run integrity checks, measure RTO, verify app connectivity. Tabletop exercises for region loss. Document who approves production restore.
24 What is point-in-time recovery (PITR)?
Restore to any second within retention window using transaction logs + periodic snapshots - minimizes data loss vs snapshot-only. Enabled on RDS/Aurora; understand storage cost of logs.
25 What is the difference between logical and physical backups?
Logical (pg_dump): portable, slower restore, good cross-version. Physical (snapshot, xtrabackup): fast, closer to byte-copy, may be version/region specific. Choose per RTO/RPO and migration needs.
MANAGED DB OPS (CLOUD) 5 questions
26 What operational tasks does RDS still leave to you?
Schema design, query tuning, connection management, parameter group tuning, major version upgrades planning, security groups, monitoring slow queries - AWS patches OS, backups, failover mechanics.
27 What is Aurora vs vanilla RDS MySQL/Postgres?
Aurora: AWS storage engine, fast failover, read replicas share storage, auto-growing storage. Higher performance at cost; some MySQL/Postgres features differ - verify compatibility.
28 What is blue/green deployment for database major version upgrades?
Replica on new version, replicate, switch traffic with controlled cutover - reduces downtime vs in-place upgrade. AWS offers managed blue/green for some engines; test app compatibility.
29 How do you migrate a large DB to cloud with minimal downtime?
Initial dump/snapshot, ongoing replication (DMS, native logical replication), read-only cutover window, DNS/connection string swap, validate sequences and collations, rollback plan. Practice dry runs.
30 What metrics do you alert on for a production database?
CPU, free storage, replication lag, connection count, deadlock rate, slow query log thresholds, failed backup jobs, disk queue depth, memory pressure. Tie alerts to runbooks.