02What are primary keys, foreign keys, and indexes?
›
PK uniquely identifies rows. FK enforces referential integrity between tables. Indexes speed lookups but slow writes and use storage - design for query patterns, avoid over-indexing.
03What is connection pooling and why exhaust it?
›
Pools reuse DB connections to avoid TCP+auth overhead per request. Exhaustion causes timeouts under load - fix with pool sizing, shorter queries, read replicas, or app-level queuing. Watch max_connections on DB.
Dirty: read uncommitted data from another txn. Non-repeatable: same read twice returns different committed values. Phantom: new rows appear in range query. Isolation levels (Read Committed, Repeatable Read, Serializable) trade consistency vs performance.
05What is MVCC at a high level?
›
Multi-Version Concurrency Control keeps row versions so readers do not block writers - snapshots for transactions. Used in Postgres, InnoDB. Trade-offs: vacuum/undo cleanup, long transactions holding old versions causing bloat.
⇄
REPLICATION & HIGH AVAILABILITY5 questions
06What is the difference between synchronous and asynchronous replication?
›
Synchronous: primary waits for replica ack - stronger durability, higher latency, risk if replica slow. Asynchronous: faster commits, possible data loss if primary dies before log ships. Choose RPO requirements.
07What are RPO and RTO?
›
RPO: max acceptable data loss window. RTO: max acceptable downtime. Drive backup frequency, multi-AZ vs multi-region, and runbook automation.
08What is failover in RDS vs manual promotion of read replica?
›
Multi-AZ: synchronous standby, automatic failover for AZ outage. Cross-region replica promotion: manual or scripted - async lag means nonzero RPO. Test failover quarterly.
09What is split-brain in clustered databases?
›
Two nodes both believe they are primary - divergent writes, data corruption. Quorum systems (majority votes), fencing (STONITH), and orchestrators (Patroni) mitigate. Kubernetes operators must handle carefully.
10How do you capacity-plan a database for 10x growth?
›
Measure p95 query latency, CPU, IOPS, storage growth, connection counts. Plan vertical limits, read replicas, sharding/partitioning strategy, caching (Redis), archive cold data, load test before cutover.
⬡
NOSQL & CACHING5 questions
11When would you pick a document store vs relational DB?
Partition key hashes to shard - hot keys throttle. Sort key orders items in partition for range queries. Design for access patterns; GSIs duplicate data for alternate queries - understand write amplification.
13What is Redis used for in production systems?
›
Cache, session store, rate limiting, pub/sub, leaderboards - in-memory speed. Plan persistence (AOF/RDB), eviction policy, HA (Redis Sentinel/Cluster), and never as sole source of truth for critical durable data unless architected for it.
14What is eventual consistency in NoSQL?
›
Reads may not reflect latest write immediately across replicas - common in globally distributed systems. Application must tolerate stale reads or use strongly consistent APIs when available (higher latency/cost).
15What is the cache stampede / thundering herd problem?
›
Many requests miss cache simultaneously and hit DB. Mitigate: probabilistic early expiration, request coalescing (single-flight), circuit breakers, warming, per-key locks. Critical at scale during cold start.
▢
OBJECT STORAGE & FILES5 questions
16What is S3-style object storage vs block storage vs NFS?
›
Object: flat namespace, HTTP API, great for blobs, static assets, data lake - not POSIX. Block: EBS volumes for databases/filesystems. NFS/EFS: shared POSIX files - different latency/consistency model.
17What are S3 storage classes and lifecycle policies?
›
Standard, IA, Glacier tiers - trade cost vs retrieval latency. Lifecycle transitions old objects, expire deletes, Intelligent-Tiering for unknown patterns. Reduces bill for archives and logs.
18What is strong read-after-write consistency in S3?
›
New objects readable immediately; overwrites/deletes have eventual consistency historically - know current AWS guarantees per operation. Matters for pipelines writing then reading same key.
19How do you secure a bucket from public exposure?
›
Block public access settings, bucket policies denying s3:GetObject for Principal *, encryption at rest, logging to centralized account, IAM least privilege, pre-signed URLs for time-limited access, regular audits (S3 Block Public Access org-wide).
20What is multipart upload and when required?
›
Split large objects into parts uploaded in parallel - required above 5 GB, recommended for large files for reliability/resume. Failed uploads leave orphan parts - lifecycle rule to abort incomplete multipart saves cost.
⧉
BACKUPS & DISASTER RECOVERY5 questions
21What is the difference between backup and replication?
›
Replication mirrors live state - logical/physical corruption or deletes can replicate. Backups are point-in-time copies (snapshots, dumps) for restore to different timeline. Need both for resilience.
22What is a backup retention and legal hold consideration?
›
Retention balances cost vs compliance (7 years tax logs). Legal hold pauses deletion. Automate lifecycle; test restores, not just backup success jobs.
23How do you test database restores?
›
Periodic automated restore to isolated env, run integrity checks, measure RTO, verify app connectivity. Tabletop exercises for region loss. Document who approves production restore.
24What is point-in-time recovery (PITR)?
›
Restore to any second within retention window using transaction logs + periodic snapshots - minimizes data loss vs snapshot-only. Enabled on RDS/Aurora; understand storage cost of logs.
25What is the difference between logical and physical backups?
›
Logical (pg_dump): portable, slower restore, good cross-version. Physical (snapshot, xtrabackup): fast, closer to byte-copy, may be version/region specific. Choose per RTO/RPO and migration needs.
☁
MANAGED DB OPS (CLOUD)5 questions
26What operational tasks does RDS still leave to you?
›
Schema design, query tuning, connection management, parameter group tuning, major version upgrades planning, security groups, monitoring slow queries - AWS patches OS, backups, failover mechanics.
27What is Aurora vs vanilla RDS MySQL/Postgres?
›
Aurora: AWS storage engine, fast failover, read replicas share storage, auto-growing storage. Higher performance at cost; some MySQL/Postgres features differ - verify compatibility.
28What is blue/green deployment for database major version upgrades?
›
Replica on new version, replicate, switch traffic with controlled cutover - reduces downtime vs in-place upgrade. AWS offers managed blue/green for some engines; test app compatibility.
29How do you migrate a large DB to cloud with minimal downtime?