Distributed Transactions: An Intuitive Deep Dive

June 7, 2025

Distributed transactions are the backbone of modern system design, where operations span multiple services, databases, and regions. In this post, I’ll break down the real engineering behind atomicity, consistency, isolation, and durability when a single database isn’t enough—and show how to maintain data consistency even when failures occur.


1. The Problem

You have two services, OrderSvc and InventorySvc, each with its own database. A single “buy 100 shirts” operation spans:

  1. OrderSvc: insert orders(order_id, user_id, qty)
  2. InventorySvc: update stock(item_id) -= qty

In a single-database system, the transaction manager ensures that the entire transaction either commits or rolls back on failure, preserving atomicity and leaving the data in a consistent state. But as soon as your business logic spans multiple services, each with its own independent data store, providing those same guarantees becomes a major engineering challenge.

We require atomicity across services. Formally, let the two services' local transactions be T₁ and T₂. We need a global transaction G = {T₁, T₂} satisfying:

  • Atomicity: commit(G) ⇔ commit(T₁) ∧ commit(T₂)
  • Consistency: database invariants preserved
  • Isolation: no interleaving from other global transactions
  • Durability: once acked, changes survive crashes

If a failure occurs mid-operation, the system must not be left in an inconsistent state. This is where transaction managers and commit protocols are essential.
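
To make the failure mode concrete, here is a minimal sketch of the naive approach: two independent local commits with nothing coordinating them. The client objects and SQL are illustrative, not the actual services:

# Naive "distributed transaction": two independent local commits.
# Hypothetical DB-API-style clients, not the post's real services.
def buy_shirts(order_db, inventory_db, order_id, user_id, item_id, qty):
    order_db.execute(
        "INSERT INTO orders (order_id, user_id, qty) VALUES (?, ?, ?)",
        (order_id, user_id, qty),
    )
    order_db.commit()                      # T1 is now durable

    # If the process crashes or the network drops here, T1 is committed
    # but T2 never runs: an order exists with no stock reservation.
    inventory_db.execute(
        "UPDATE stock SET qty = qty - ? WHERE item_id = ?",
        (qty, item_id),
    )
    inventory_db.commit()                  # T2 commits independently of T1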


2. Two-Phase Commit (2PC) in Pseudocode

Distributed transactions often rely on the two-phase commit protocol (2PC) to coordinate commit or rollback across participants.

Coordinator pseudocode:

// Phase 1: Prepare
for each participant P:
    send P.prepare(GID)
    await P.vote ∈ {YES, NO}    // timeout ⇒ NO

// Phase 2: Commit/Abort
if all votes == YES:
    decision = COMMIT
else:
    decision = ABORT
for each participant P:
    send P.decision(decision)

Participant pseudocode:

on prepare(GID):
    write UNDO/REDO to local log
    if local checks pass → reply YES
    else → reply NO

on decision(COMMIT):
    apply REDO, release locks

on decision(ABORT):
    apply UNDO, release locks

Latency ≈ 2 × RTT plus disk fsyncs. The coordinator must collect all N votes, so a single slow or failed participant stalls the protocol. The bigger downside: if the coordinator dies after participants vote YES, they block holding locks until it recovers, and recovery is not always automatic.
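
The same protocol as a runnable Python sketch, with participants modeled as in-process objects. The Participant interface and method names are my assumptions for illustration; a real implementation adds durable logs, per-message timeouts, and coordinator recovery:

# Minimal in-process 2PC coordinator; purely illustrative.
class Participant:
    def __init__(self, name):
        self.name = name

    def prepare(self, gid):
        # Real participant: write UNDO/REDO to a local log, run checks, hold locks.
        return "YES"            # return "NO" if local checks fail

    def commit(self, gid):
        print(f"{self.name}: applying REDO for {gid}, releasing locks")

    def abort(self, gid):
        print(f"{self.name}: applying UNDO for {gid}, releasing locks")


def two_phase_commit(gid, participants):
    # Phase 1: prepare (a lost reply would count as a NO vote).
    votes = [p.prepare(gid) for p in participants]

    # Phase 2: commit only if every participant voted YES.
    if all(v == "YES" for v in votes):
        for p in participants:
            p.commit(gid)
        return "COMMIT"
    for p in participants:
        p.abort(gid)
    return "ABORT"


print(two_phase_commit("G-42", [Participant("OrderSvc"), Participant("InventorySvc")]))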


3. Math of Quorum & Fault Tolerance

For N replicas coordinating via Paxos/Raft (distributed transaction managers):

  • Write quorum W ≥ ⌊N/2⌋ + 1
  • Read quorum R ≥ ⌊N/2⌋ + 1
  • Guarantees W + R > N ⇒ any read sees latest write

In 2PC, you implicitly require all participants (strict quorum = N). If one is down, the entire global transaction stalls, and the system may require manual intervention to ensure data consistency.
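
A quick way to sanity-check the quorum arithmetic (plain Python, nothing assumed beyond the formulas above):

# Majority quorums: any write quorum intersects any read quorum.
def majority(n):
    return n // 2 + 1          # ⌊N/2⌋ + 1

for n in (3, 5, 7):
    w = r = majority(n)
    assert w + r > n           # overlap guarantees a read sees the latest write
    print(f"N={n}: W={w}, R={r}, tolerates {n - w} replica failures")

# 2PC, by contrast, needs all N participants: one failure stalls the transaction.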


4. Three-Phase Commit (3PC) & Timeouts

3PC splits prepare into:

  1. CanCommit
  2. PreCommit (write REDO/UNDO and fence)
  3. DoCommit

By introducing per-phase timeouts, participants avoid indefinite blocking—yet under network partitions or app crashes, consistency guarantees weaken, and the system can still be left in an inconsistent state. No free lunch: stronger liveness at cost of protocol complexity.
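
The per-phase timeouts are what buy that extra liveness. A simplified sketch of the participant's timeout rules (phase names follow the list above; exact recovery rules vary by formulation, and the sketch deliberately ignores partitions, which is precisely where 3PC weakens):

# 3PC participant reacting to timeouts; simplified and partition-free.
from enum import Enum

class Phase(Enum):
    WAIT_CAN_COMMIT = 1     # coordinator has not asked CanCommit yet
    WAIT_PRE_COMMIT = 2     # voted YES, waiting for PreCommit
    WAIT_DO_COMMIT = 3      # PreCommit seen, waiting for DoCommit

def on_timeout(phase):
    if phase is Phase.WAIT_CAN_COMMIT:
        return "ABORT"      # nothing promised yet, safe to give up
    if phase is Phase.WAIT_PRE_COMMIT:
        return "ABORT"      # coordinator never pre-committed, so nobody committed
    if phase is Phase.WAIT_DO_COMMIT:
        return "COMMIT"     # PreCommit means everyone is known to be prepared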


5. The Saga Pattern

Instead of global locks, define a state machine for long-lived distributed transactions:

State: [START] → [ORDER_CREATED] → [PAID] → [SHIPPED] → [COMPLETED]
Failure at step k ⇒ trigger Compensation[k…1]

Compensation: an idempotent inverse of each step—also called compensating transactions. If shipping fails, run refundPayment(order_id). Always ensure compensating transactions themselves are idempotent to avoid further inconsistency.

This is eventual consistency:

lim_{t→∞} state_all ≡ desired end state

Sagas are often driven by an orchestrator together with a message broker such as Kafka or RocketMQ, which carries the events and retries across the participating services.
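
Here is a compact orchestrated saga in Python: each Step pairs an action with its compensation, and on failure the completed steps are compensated in reverse order. All names are illustrative rather than any specific framework's API:

# Orchestrated saga: run steps in order; on failure, compensate in reverse.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]   # must be idempotent

def run_saga(steps):
    done = []
    for step in steps:
        try:
            step.action()
            done.append(step)
        except Exception as exc:
            print(f"{step.name} failed ({exc}); compensating")
            for prev in reversed(done):
                prev.compensate()    # safe to retry because compensations are idempotent
            return "ROLLED_BACK"
    return "COMPLETED"

def fail_shipping():
    raise RuntimeError("carrier down")

saga = [
    Step("create_order", lambda: None, lambda: print("cancel order")),
    Step("charge_payment", lambda: None, lambda: print("refund payment")),
    Step("ship", fail_shipping, lambda: print("cancel shipment")),
]
print(run_saga(saga))   # refunds the payment, cancels the order, returns ROLLED_BACK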


6. Advanced: Deterministic Ordering (Calvin)

Calvin’s workflow:

  1. Sequencer assigns a strict, global sequence number to every transaction Tᵢ → σᵢ
  2. Each node executes transactions in ascending σ order without further coordination
  3. Conflict detection happens offline, at the sequencing layer

Benefit: per-transaction execution is lock-free and local, so latency reduces to local compute plus log shipping. Deterministic replay also removes the possibility of replicas committing transactions out of order.
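
In miniature, the point is that once the sequencer fixes σ, every replica can replay the same log independently and reach the same state. A toy sketch, not Calvin's actual implementation:

# Toy Calvin-style determinism: every replica applies the same sequenced log
# in σ order, so replicas agree without any runtime coordination.
def sequence(transactions):
    # The sequencer's only job: fix a global order and hand out σ.
    return list(enumerate(transactions))         # [(σ, txn), ...]

def execute(log, state):
    for sigma, txn in sorted(log, key=lambda e: e[0]):
        txn(state)                               # deterministic, local, lock-free
    return state

log = sequence([
    lambda s: s.update(stock=s["stock"] - 100),        # σ=0: reserve 100
    lambda s: s.update(stock=max(s["stock"], 450)),     # σ=1: restock to at least 450
])
replica_a = execute(log, {"stock": 500})
replica_b = execute(list(reversed(log)), {"stock": 500})   # delivery order differs
assert replica_a == replica_b == {"stock": 450}             # same σ order, same state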


7. Hybrid Logical Clocks (HLC) & Bounded Time

Real-time (physical) clocks drift; enter HLC:

HLC.timestamp = max{ local.physical.now(), recv.logical } + 1

HLC enables monotonic ordering and bounded skew for commit timestamps, critical for distributed transactions where physical time alone can’t ensure consistency. Spanner’s TrueTime takes a related approach, using GPS and atomic clock references to expose an uncertainty interval around the current time with a tight ε bound.
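
A minimal hybrid logical clock in Python, following the max-and-increment rule above plus a logical counter to break ties within one physical tick (a sketch, not any production implementation):

import time

class HLC:
    """Hybrid logical clock: (physical, logical) pairs that stay monotonic
    and never drift far from wall-clock time."""
    def __init__(self):
        self.l = 0   # highest physical time observed so far
        self.c = 0   # logical counter ordering events within the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = time.time_ns()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1          # physical clock stalled or stepped back
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        rl, rc = remote
        pt = time.time_ns()
        if pt > self.l and pt > rl:
            self.l, self.c = pt, 0
        elif rl > self.l:
            self.l, self.c = rl, rc + 1
        elif rl == self.l:
            self.c = max(self.c, rc) + 1
        else:
            self.c += 1
        return (self.l, self.c)

Timestamps (l, c) compare lexicographically, so commit ordering stays monotonic even if a node's physical clock jumps backwards.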


8. Latency vs. Consistency Trade-off

Let L = one-way latency, σ = clock-uncertainty, f = failure probability.

Protocol      | Commit Latency                     | Availability under P
--------------|------------------------------------|----------------------
2PC           | ≈ 2L + O(fsyncs, logs)             | Blocks on coordinator
3PC           | ≈ 3L + O(fsyncs, logs)             | Reduced blocking
Spanner       | ≈ 2L + σ + O(log N)                | CP choice
Saga (async)  | ≈ L (best-effort local) + retries  | AP choice

Engineers must weigh consistency against latency: strongly consistent distributed transactions pay for it in commit latency, while Sagas favor availability and local speed and settle for eventual consistency.


9. Recommendations for Senior Engineers

  1. Monolith → Regional: start with single-region ACID (single database transactions).
  2. Global Strong Consistency: use Raft/Paxos + 2PC + HLC/TrueTime.
  3. High-Throughput Eventual Consistency: employ Saga + message broker + idempotent compensating transactions.
  4. Deterministic Execution: consider Calvin if you can centralize sequencing.
  5. Failure Testing: inject network partitions, crash coordinators, and verify recovery paths. Always ensure rollback or compensation leaves the system in a consistent state.

10. Real-World Example: Compensating for Failure in Distributed Transactions

Let’s say your e-commerce platform executes a transaction that must:

  • Reserve inventory in InventorySvc
  • Charge payment in PaymentSvc
  • Create an order record in OrderSvc

Scenario:

  1. The transaction manager begins the distributed transaction.
  2. InventorySvc successfully reserves 100 shirts.
  3. PaymentSvc successfully charges the customer.
  4. OrderSvc fails due to a network timeout or database error.

Now you have a partial transaction: Inventory is reduced, payment is charged, but the order does not exist in the system. If you simply ignore the failure, you leave the system in an inconsistent state, frustrating both customers and business operations.

Solution:

To ensure data consistency, your distributed transaction manager must trigger compensating transactions:

  • PaymentSvc issues a refund (compensating transaction for charge).
  • InventorySvc releases the reserved stock.

If your transaction system supports automatic compensation and idempotency, retries are safe: re-running a step or compensation that has already taken effect changes nothing. This is the core of the Saga Pattern—each step is reversible, and the process as a whole either completes successfully or rolls back safely even when failures occur.
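
Idempotency is what makes those retries safe. Here is a sketch of the refund compensation guarded by a deduplication record; the table and column names are hypothetical:

# Idempotent compensation: refunding the same order twice has no extra effect.
def refund_payment(db, order_id):
    # A refunds table with a UNIQUE constraint on order_id lets a retried
    # compensation detect that the refund already landed instead of paying twice.
    already = db.execute(
        "SELECT 1 FROM refunds WHERE order_id = ?", (order_id,)
    ).fetchone()
    if already:
        return "ALREADY_REFUNDED"
    db.execute("INSERT INTO refunds (order_id) VALUES (?)", (order_id,))
    db.execute(
        "UPDATE payments SET status = 'refunded' WHERE order_id = ?", (order_id,)
    )
    db.commit()
    return "REFUNDED"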


11. When to Use Each Distributed Transaction Protocol

Use Case                                           | Recommended Protocol
---------------------------------------------------|----------------------
All services in a single database                  | Local ACID
Multi-database, strong consistency needed          | 2PC or 3PC
Large-scale, high-throughput, partial failures OK  | Saga Pattern
Highly parallel workloads, strict ordering         | Calvin
Global deployment with clock uncertainty           | Spanner/TrueTime

  • 2PC/3PC: Good for small N, high consistency needs, but risk blocking.
  • Saga: Good for microservices, eventual consistency, complex workflows.
  • Calvin/Spanner: For global, lock-free, or clock-synchronized use cases.


FAQ: Distributed Transactions

Q1: What is a distributed transaction? A: It’s a transaction that spans multiple databases or services, requiring coordination to maintain atomicity, consistency, and data integrity.

Q2: How do transaction managers maintain data consistency across multiple services? A: By using commit protocols (2PC, 3PC) or orchestrating compensating transactions (Sagas) to ensure either all steps complete successfully or all changes are rolled back.

Q3: What happens if a failure occurs in the middle of a distributed transaction? A: The system must coordinate a rollback or compensation to avoid leaving data in an inconsistent state, relying on transaction logs or message brokers to recover.


Want more deep dives into distributed systems, database transactions, and real engineering patterns? Contact me or subscribe for updates.
