Databases: Design, Access, and Scale

Databases are the backbone of most software. A clean design saves time later, while poor choices cost performance and money. The core idea is to match how data is stored with how it is used.

Design choices

  • Start from the questions your app answers: who owns a post, which comments belong to it, when a user last logged in.
  • Choose a model: relational (tables and keys), document (collections of nested data), or wide-column (more flexible rows).
  • Normalize to avoid duplication, but denormalize when reads are heavy. Use indexes and materialized views to speed common queries.

Access patterns

  • Define the typical queries and transactions first.
  • Index the fields used in filters and joins: e.g., users.id, posts.author_id, comments.post_id.
  • Use connection pools and limit long-running reads; add a caching layer for hot data.

Scale

  • Vertical scaling adds power to one machine, but has limits.
  • Horizontal scaling adds more machines: replicas for reads, shards for writes.
  • For critical work, consider distributed transactions carefully; many apps use eventual consistency with clear boundaries.
  • Plan backups, monitoring, and disaster recovery from the start.

Example

  • A small blog: users, posts, comments. A relational design with foreign keys makes it easy to enforce consistency; indexes on posts.author_id and comments.post_id speed lookups.

Getting started

  • Sketch a simple data model on paper.
  • List the 5 most common queries; design indexes for them.
  • Pick a database family that fits: relational for strong consistency, document for flexible schemas, or a column store for analytics.

Trade-offs

  • Normalization reduces updates but can slow reads with joins; denormalization speeds reads but needs more updates.
  • Caching helps, yet requires good invalidation rules.
  • Strong consistency is clear and safe, but may limit throughput under heavy load.

Real-world tips

  • Measure early: use simple benchmarks for your key queries.
  • Use explain plans or profiling to find slow parts.
  • Prepare for growth with read replicas and regular backups.

Key Takeaways

  • Start with data needs, then pick a data model that fits read and write patterns.
  • Design for the main queries and keep an eye on indexes and caching.
  • Plan for scale early with a clear strategy for replication, sharding, and backups.