Distributed Transactions: An Intuitive Deep Dive

June 8, 2025

“Ensuring all-or-nothing across machines without melting clocks or tearing your hair out.”


1. The Problem

You have two services, OrderSvc and InventorySvc, each with its own database. A single “buy 100 shirts” operation spans:

  1. OrderSvc: insert orders(order_id, user_id, qty)
  2. InventorySvc: update stock(item_id) -= qty

We require atomicity across services. Formally, let each service’s local transaction be T₁ and T₂. We need a global transaction G = {T₁, T₂} satisfying:

  • Atomicity: [ commit(G) ⇔ commit(T₁) ∧ commit(T₂) ]
  • Consistency: database invariants preserved
  • Isolation: no interleaving from other global transactions
  • Durability: once acked, changes survive crashes

2. Two-Phase Commit (2PC) in Pseudocode

Coordinator pseudocode:

  // Phase 1: Prepare
  for each participant P:
      send P.prepare(GID)
      await P.vote ∈ {YES, NO}    // timeout ⇒ NO

  // Phase 2: Commit/Abort
  if all votes == YES: decision = COMMIT
  else:                decision = ABORT
  for each participant P:
      send P.decision(decision)

Participant pseudocode:

  on prepare(GID):
      write UNDO/REDO to local log
      if local checks pass → reply YES
      else → reply NO

  on decision(COMMIT):
      apply REDO, release locks

  on decision(ABORT):
      apply UNDO, release locks

Latency ≈ 2 × RTT plus the disk fsyncs for the log writes. Note that 2PC demands unanimity, not a quorum: the coordinator must hear all N votes, and a participant that has voted YES is blocked until it learns the decision, so it stalls indefinitely if the coordinator dies.
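
To make the message flow concrete, here is a minimal in-process sketch in Python. It is my own illustration, not a wire protocol: a real coordinator needs durable logs, timeouts, and RPC.

  # Minimal in-process 2PC sketch. Participant/Coordinator are illustrative.
  from enum import Enum

  class Vote(Enum):
      YES = "YES"
      NO = "NO"

  class Participant:
      def __init__(self, name, will_commit=True):
          self.name = name
          self.will_commit = will_commit

      def prepare(self, gid):
          # A real participant writes UNDO/REDO to its log before voting.
          return Vote.YES if self.will_commit else Vote.NO

      def decide(self, gid, commit):
          # Apply REDO on commit, UNDO on abort; then release local locks.
          print(f"{self.name}: {'commit' if commit else 'abort'} {gid}")

  class Coordinator:
      def run(self, gid, participants):
          # Phase 1: collect votes; a timeout would count as a NO vote.
          votes = [p.prepare(gid) for p in participants]
          decision = all(v is Vote.YES for v in votes)
          # Phase 2: broadcast the decision to everyone, including NO voters.
          for p in participants:
              p.decide(gid, decision)
          return decision

  ok = Coordinator().run("G42", [Participant("OrderSvc"),
                                 Participant("InventorySvc", will_commit=False)])
  print("committed" if ok else "aborted")   # -> aborted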


3. Math of Quorum & Fault Tolerance

For N replicas coordinating via Paxos/Raft:

  • Write quorum W ≥ ⌊N/2⌋ + 1
  • Read quorum R ≥ ⌊N/2⌋ + 1
  • Guarantees W + R > N ⇒ any read sees latest write
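
Worked example: with N = 5 and W = R = 3, any write quorum and any read quorum share at least one replica, since 3 + 3 > 5. A throwaway Python check of that pigeonhole argument:

  # Toy check: every W-quorum and R-quorum over N replicas must intersect
  # whenever W + R > N.
  from itertools import combinations

  N, W, R = 5, 3, 3
  replicas = range(N)
  assert all(set(w) & set(r)
             for w in combinations(replicas, W)
             for r in combinations(replicas, R))
  print("every read quorum intersects every write quorum")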

In 2PC, by contrast, you implicitly require all participants (a strict "quorum" of N). If even one is down, the entire global transaction G stalls.


4. Three-Phase Commit (3PC) & Timeouts

3PC splits prepare into:

  1. CanCommit
  2. PreCommit (write REDO/UNDO and fence)
  3. DoCommit

By introducing per-phase timeouts, participants avoid indefinite blocking: a participant stuck in PreCommit may time out and commit, because reaching PreCommit means every node is known to have voted YES. Yet under a network partition combined with a coordinator crash, the two sides of the partition can still reach different decisions. No free lunch: stronger liveness comes at the cost of an extra round trip and protocol complexity.
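
The participant-side timeout rules fit in a few lines; the state names below are illustrative, not from any specific implementation:

  # Participant-side 3PC timeout rules (simplified). The key property:
  # being in PRECOMMITTED implies the coordinator saw all N YES votes.
  def decide_on_timeout(state):
      if state == "WAIT_VOTE":      # has not voted yet: abort unilaterally
          return "ABORT"
      if state == "VOTED_YES":      # voted YES but no PreCommit seen: abort
          return "ABORT"
      if state == "PRECOMMITTED":   # everyone voted YES, so committing is safe
          return "COMMIT"           # ...absent partitions, which can diverge
      raise ValueError(f"unknown state: {state}")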


5. The Saga Pattern

Instead of global locks, define a state machine:

  State: [START] → [ORDER_CREATED] → [PAID] → [SHIPPED] → [COMPLETED]

  Failure at step k ⇒ trigger Compensation[k…1]

Compensation: an idempotent inverse of each step:

  • If shipping fails, run refundPayment(order_id).
  • The compensation graph may contain cycles, and retries can replay a step, so every compensation must be idempotent.

This is eventual consistency:

  lim_{t→∞} state_all ≡ desired end-state
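
As a concrete sketch, here is a tiny saga executor in Python. The step and compensation handlers (create_order, refund_payment, and so on) are hypothetical stubs; a production saga would persist its progress and drive steps through a message broker:

  # Run steps forward; on failure, compensate completed steps in reverse.
  def run_saga(steps):
      done = []
      try:
          for action, compensate in steps:
              action()
              done.append(compensate)
      except Exception:
          for compensate in reversed(done):
              compensate()   # must be idempotent: retries may replay it
          return "ABORTED (compensated)"
      return "COMPLETED"

  def create_order():   print("order created")
  def cancel_order():   print("order cancelled")
  def charge_card():    print("payment charged")
  def refund_payment(): print("payment refunded")
  def ship():           raise RuntimeError("carrier unavailable")
  def unship():         print("shipment voided")

  print(run_saga([(create_order, cancel_order),
                  (charge_card, refund_payment),
                  (ship, unship)]))
  # -> payment refunded, order cancelled, "ABORTED (compensated)"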

6. Advanced: Deterministic Ordering (Calvin)

Calvin’s workflow:

  1. Sequencer assigns a strict, global sequence number to every transaction Tᵢ → σᵢ
  2. Each node executes transactions in ascending σ order without further coordination
  3. Conflict detection happens offline, at the sequencing layer

Benefit: per-transaction execution needs no cross-node coordination and stays local; latency ≈ local compute + log shipping.
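
A toy version of that workflow, assuming a single in-process sequencer and transactions expressed as plain functions (both heavy simplifications of the real system):

  # Deterministic ordering, Calvin-style, in miniature: a sequencer stamps
  # every transaction; each node then replays the same log in stamp order.
  import itertools

  class Sequencer:
      def __init__(self):
          self._next = itertools.count(1)
          self.log = []   # in practice, shipped to every node

      def submit(self, txn):
          self.log.append((next(self._next), txn))

  def execute_node(log, state):
      # Every node replays the identical log in σ order: no further
      # coordination, and every replica reaches the same final state.
      for _seq, txn in sorted(log, key=lambda entry: entry[0]):
          txn(state)
      return state

  seq = Sequencer()
  seq.submit(lambda s: s.update(stock=s["stock"] - 100))
  seq.submit(lambda s: s.update(orders=s["orders"] + 1))
  print(execute_node(seq.log, {"stock": 500, "orders": 0}))
  # -> {'stock': 400, 'orders': 1}, deterministically, on every replica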


7. Hybrid Logical Clocks (HLC) & Bounded Time

Real-time clocks drift; enter the hybrid logical clock (HLC):

  HLC.timestamp = max{ local.hlc, local.physical.now(), recv.hlc } + 1

Use HLC to assign commit timestamps, enabling:

  • Monotonic ordering
  • Bounded skew guarantees w/o specialized hardware

Spanner’s TrueTime attacks the same problem in hardware: GPS and atomic-clock references let the API expose every timestamp as an interval with an explicit uncertainty bound ε, and commits wait out that bound.
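
A compact Python sketch of the standard HLC update rules (the class and method names are my own):

  # Hybrid logical clock: tracks physical time when it advances, and a
  # logical counter to break ties, so timestamps are monotonic overall.
  import time

  class HLC:
      def __init__(self):
          self.l = 0   # physical component: last wall-clock seen, in ms
          self.c = 0   # logical component: tie-breaking counter

      def now(self):                      # local event or message send
          pt = int(time.time() * 1000)
          if pt > self.l:
              self.l, self.c = pt, 0
          else:
              self.c += 1
          return (self.l, self.c)

      def recv(self, m_l, m_c):           # on receiving timestamp (m_l, m_c)
          pt = int(time.time() * 1000)
          new_l = max(self.l, m_l, pt)
          if new_l == self.l == m_l:
              self.c = max(self.c, m_c) + 1
          elif new_l == self.l:
              self.c += 1
          elif new_l == m_l:
              self.c = m_c + 1
          else:
              self.c = 0
          self.l = new_l
          return (self.l, self.c)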


8. Latency vs. Consistency Trade-off

Let L = one-way network latency, σ = clock uncertainty, f = failure probability.

  Protocol      | Commit latency                      | Availability under partition
  --------------|-------------------------------------|------------------------------
  2PC           | ≈ 2L + O(fsyncs, log writes)        | Blocks on coordinator
  3PC           | ≈ 3L + O(fsyncs, log writes)        | Reduced blocking
  Spanner       | ≈ 2L + σ + O(log N)                 | CP choice
  Saga (async)  | ≈ L (best-effort local) + retries   | AP choice
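
Plugging in numbers: with a cross-region one-way latency of L = 50 ms, 2PC spends ≈ 100 ms on the network before counting fsyncs, 3PC ≈ 150 ms, while an asynchronous saga acknowledges after one local write and pushes the rest into background retries and compensations.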

9. Recommendations for Senior Engineers

  1. Monolith → Regional: start with single-region ACID.
  2. Global Strong Consistency: use Raft/Paxos + 2PC + HLC/TrueTime.
  3. High-Throughput Eventual Consistency: employ Saga + message broker (Kafka/RocketMQ) + idempotent compensations.
  4. Deterministic Execution: consider Calvin if you can centralize sequencing.
  5. Failure Testing: inject network partitions, crash coordinators, and verify recovery paths.
