Software Engineer Handbook
  • 👋Welcome to this handbook
  • Level 0 : Non-technical considerations
    • How to manage your time effectively
    • How to prepare for job interviews #1 : Digital Presence
    • How to prepare for job interviews #2 : At the interview
  • Level 1 : Computer Science
    • Intro: Why computer science is critical
    • Algorithms & Data structures Handbook
      • Algorithms & Data structures Handbook
      • Sorting Algorithms
      • Data Structures ADTs Implementation (In Go)
      • Techniques for Solving Data Structures Problems
    • Computer Organization and Architecture
    • Operating Systems
    • Database systems
  • Level 2 : System Design
    • Step 1 : Analysis
    • Step 2 : Digging deeper into data modeling
    • Step 3 : High-Level component design
    • Step 4: Low-Level / Detailed Design system
      • Clients
      • Application Servers
      • Databases
      • Security
      • Distributed Systems with Microservices
  • Level 4: Productivity Handbook for Software Engineers
    • How I use Fig
    • Command line tools
    • Develop on the cloud
  • Extra
  • Leadership
    • Processes
    • Poeple
    • Delivery
    • Transitioning to a new role
    • Resources I like
Powered by GitBook
On this page
  • Relational
  • Sharding
  • Replication
  • Optimization
  • Data modeling
  • Migrations
  • SQL Vs NoSQL
  • NoSQL
  • SQL
  • Use cases
  • NoSQL
  • SQL
  • App-level
  1. Level 2 : System Design
  2. Step 4: Low-Level / Detailed Design system

Databases

PreviousApplication ServersNextSecurity

Last updated 2 years ago

Relational

Sharding

  • App-level joins

  • Distributed transactions

  • App-level partitioning of data

Example :

Challenges of app-level partitioning

  • Partitioning function among DB instances

  • How to rebalance if instance become hot/big (a lot of writes and reads)

  • Since you split data across instances, you need to sub-divide your queries to sub queries

  • Also if you want to do ACID transactions across partitions there is a need for distributed protocols such as 2PC

Replication

  • Chain-based replication

  • The Raft Consensus Algorithm

Optimization

  • Caching slow to compute data or frequently accessed

  • Queries

    • n+1

    • Archive old records

    • Return only needed columns → because I/O , CPU, and memory load

    • Use enums for categorical data

    • Add indexes (usually it’s b-tree data structure)

    • Use explain to inspect

    • Use slow query log for debugging

  • DB optimization

    • Horizontal scaling

      • Replication

      • App-level partitioning/sharding

    • Vertical Scaling: Increasing hardware resources (CPU cores and core speed, RAM, disk space)

Data modeling

  • The Entity-Relationship Model

  • Identifying Data Objects

  • Developing basic schema (attributes, primary key, relationships ,foreign keys, integrity rules)

Migrations

  • App-level migrations (migrations scripts or tools such as knexjs)

    • Altering tables and columns and rolling back in migration history

  • From DBMS to DBMS

  • Transfer all data form source to target and shutdown source

  • Transfer all data from source to target and then leave source running → DB replication

  • Transfer data → complete or partial

  • Humongous/Heterogenous → source and target DBMS are the same or not

  • There is migrations systems which moves data from source to target

  • From machine to another machine

SQL Vs NoSQL

NoSQL

  • Better support for locality (embedding) which results in less network requests

  • Closer to application data structures usage: For example, an account page contains account info and profile info embedded inside the same account document

  • If the data in your application has a document-like structure (i.e., a tree of one-to-many relationships, where typically the entire tree is loaded at once)

  • Flexible schema

    • Easy to change

    • No pre design overhead, you design on the go

  • You model based on your application queries not based on how your entities is designed

SQL

  • The relational model counters by providing better support for joins, and better many-to-one and many-to-many relationships.

    • Joins can be emulated in application code by making multiple requests to the database, but that also moves complexity into the application and is usually slower than a join performed by specialized code inside the database.

Use cases

NoSQL

  • Big data

  • When you need horizontal scaling for high throughput in writes and reads

  • Flexible schema

SQL

  • Need ACID

  • JOINs for many-to-many or one-to-many

  • Analytics using aggregations

  • You need to know your queries in advance

App-level

  • Migrations scripts

  • ORMs

  • Query builders

Sharding Pinterest: How we scaled our MySQL fleetMedium
Raw SQL vs Query Builder vs ORMMedium
Lecture 8: Transactions, ACID, 2PC, 2PL, Serializability
Logo
Logo
Chain-based replication from Understanding distributed systems book
Raft Consensus Algorithm
Logo