Distributed Transactions: An Intuitive Deep Dive
Distributed transactions are the backbone of modern system design, where operations span multiple services, databases, and regions. In this post, I’ll break down the real engineering behind atomicity, consistency, isolation, and durability when a single database isn’t enough—and show how to maintain data consistency even when failures occur.
1. The Problem
You have two services, `OrderSvc` and `InventorySvc`, each with its own database. A single “buy 100 shirts” operation spans:
- OrderSvc: `insert orders(order_id, user_id, qty)`
- InventorySvc: `update stock(item_id) -= qty`
In a single-database system, the transaction manager ensures that either the entire transaction commits or rolls back on failure, maintaining atomicity and leaving the system in a consistent state. But as soon as your business logic spans multiple services, each with its own independent data store, providing the same guarantee becomes a major engineering challenge.
We require atomicity across services. Formally, let the services' local transactions be T₁ and T₂. We need a global transaction G = {T₁, T₂} satisfying:
- Atomicity: either both T₁ and T₂ commit, or neither does
- Consistency: database invariants preserved
- Isolation: no interleaving from other global transactions
- Durability: once acked, changes survive crashes
If a failure occurs mid-operation, the system must not be left in an inconsistent state. This is where transaction managers and commit protocols are essential.
2. Two-Phase Commit (2PC) in Pseudocode
Distributed transactions often rely on the two-phase commit protocol (2PC) to coordinate commit or rollback across participants.
Coordinator pseudocode:
```
// Phase 1: Prepare
for each participant P:
    send P.prepare(GID)
    await P.vote ∈ {YES, NO}   // timeout ⇒ NO

// Phase 2: Commit/Abort
if all votes == YES:
    decision = COMMIT
else:
    decision = ABORT
for each participant P:
    send P.decision(decision)
```
Participant pseudocode:
```
on prepare(GID):
    write UNDO/REDO to local log
    if local checks pass → reply YES
    else → reply NO

on decision(COMMIT):
    apply REDO, release locks

on decision(ABORT):
    apply UNDO, release locks
```
Latency ≈ 2 × RTT + disk fsyncs. 2PC implicitly requires a strict quorum: the coordinator must hear all N votes. The downside: if the coordinator dies after participants have voted YES, they block with locks held until it recovers, and recovery is not always automatic.
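To make the control flow concrete, here is a minimal single-process sketch in Python. The `Participant` interface (`prepare`/`decide`) is hypothetical and everything is in-memory; a real coordinator would fsync its decision to a log before notifying anyone.

```python
from enum import Enum

class Vote(Enum):
    YES = "YES"
    NO = "NO"

class Decision(Enum):
    COMMIT = "COMMIT"
    ABORT = "ABORT"

def two_phase_commit(gid, participants, timeout_s=2.0):
    """Drive one global transaction through both phases."""
    votes = []
    for p in participants:
        try:
            # Phase 1: participant force-writes UNDO/REDO, then votes.
            votes.append(p.prepare(gid, timeout=timeout_s))
        except TimeoutError:
            votes.append(Vote.NO)  # no vote within the timeout counts as NO
    decision = Decision.COMMIT if all(v == Vote.YES for v in votes) else Decision.ABORT
    # A real coordinator fsyncs `decision` here; crashing between the two
    # phases is exactly the blocking scenario described above.
    for p in participants:
        p.decide(gid, decision)  # retried until acknowledged in practice
    return decision
```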
3. Math of Quorum & Fault Tolerance
For N replicas coordinating via Paxos/Raft (the consensus layer beneath many distributed transaction managers):
- Write quorum W ≥ ⌊N/2⌋ + 1
- Read quorum R ≥ ⌊N/2⌋ + 1
- Guarantees W + R > N ⇒ any read sees latest write
In 2PC, you implicitly require all participants (strict quorum = N). If one is down, the entire global transaction stalls, and the system may require manual intervention to ensure data consistency.
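The overlap argument is small enough to check mechanically; a quick sketch using only the formulas above:

```python
def majority(n: int) -> int:
    """Smallest majority quorum for n replicas: ⌊n/2⌋ + 1."""
    return n // 2 + 1

# If W + R > N, any write quorum and any read quorum must share at least
# one replica, so every read contacts a replica holding the latest write.
for n in (3, 5, 7):
    w = r = majority(n)
    assert w + r > n
    print(f"N={n}: W={w}, R={r}, guaranteed overlap={w + r - n} replica(s)")
```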
4. Three-Phase Commit (3PC) & Timeouts
3PC splits prepare into:
- CanCommit
- PreCommit (write REDO/UNDO and fence)
- DoCommit
By introducing per-phase timeouts, participants avoid indefinite blocking. Yet under network partitions or application crashes the consistency guarantees weaken, and the system can still end up inconsistent. No free lunch: stronger liveness at the cost of protocol complexity.
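A sketch of the participant-side timeout rule, assuming the three phases above map to participant states; this rule is what buys 3PC its liveness, and also where a partition can split the decision:

```python
from enum import Enum

class State(Enum):
    READY = 1         # voted YES to CanCommit, awaiting PreCommit
    PRECOMMITTED = 2  # received PreCommit, awaiting DoCommit
    COMMITTED = 3
    ABORTED = 4

def on_timeout(state: State) -> State:
    """What a 3PC participant does when the coordinator goes silent."""
    if state == State.READY:
        # PreCommit never arrived, so no one can have committed yet:
        # aborting unilaterally is safe.
        return State.ABORTED
    if state == State.PRECOMMITTED:
        # Everyone acknowledged PreCommit, so the group agreed to commit;
        # proceed without the coordinator. Under a partition, the two sides
        # can reach different conclusions -- hence the weaker guarantees.
        return State.COMMITTED
    return state
```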
5. The Saga Pattern
Instead of global locks, define a state machine for long-lived distributed transactions:
```
[START] → [ORDER_CREATED] → [PAID] → [SHIPPED] → [COMPLETED]

Failure at step k ⇒ trigger Compensation[k…1]
```
Compensation: an idempotent inverse of each step—also called compensating transactions.
If shipping fails, run `refundPayment(order_id)`.
Always ensure compensating transactions themselves are idempotent to avoid further inconsistency.
This is eventual consistency: intermediate states are visible to other transactions, but the system converges to a consistent state once every step has either completed or been compensated. Sagas are often driven by an orchestrator together with a message broker such as Kafka or RocketMQ, which coordinates events and retries across services.
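A minimal orchestrator sketch in Python, assuming each step is an (action, compensation) pair; the step names are illustrative, not any specific framework's API:

```python
def run_saga(steps):
    """Run actions in order; on failure, run the compensations of every
    completed step in reverse (the Compensation[k…1] rule above)."""
    ctx, done = {}, []
    try:
        for action, compensate in steps:
            action(ctx)
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate(ctx)  # must be idempotent and must not fail
        raise
    return ctx

steps = [
    (lambda ctx: ctx.update(order_id="o-42"),            # createOrder
     lambda ctx: print("cancelOrder", ctx["order_id"])),
    (lambda ctx: ctx.update(charged=True),               # chargePayment
     lambda ctx: print("refundPayment", ctx["order_id"])),
]
run_saga(steps)
```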
6. Advanced: Deterministic Ordering (Calvin)
Calvin’s workflow:
- Sequencer assigns a strict, global sequence number to every transaction Tᵢ → σᵢ
- Each node executes transactions in ascending σ order without further coordination
- Conflict detection happens offline, at the sequencing layer
Benefit: per-transaction execution is lock-free and local; latency = local compute + log shipping. Deterministic execution rules out out-of-order completion by construction, since every node applies the same transactions in the same σ order.
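A toy sketch of the idea, assuming a single sequencer process (real Calvin replicates the sequencing layer itself):

```python
class Sequencer:
    """Assigns a strict, global sequence number σ to every transaction."""
    def __init__(self):
        self._next_sigma = 0

    def sequence(self, txn):
        entry = (self._next_sigma, txn)
        self._next_sigma += 1
        return entry

def replay(log):
    # Every node applies the identical log in ascending σ: no locks and
    # no cross-node coordination during execution.
    for sigma, txn in sorted(log):
        txn()

seq = Sequencer()
log = [seq.sequence(lambda: print("debit")),
       seq.sequence(lambda: print("credit"))]
replay(log)
```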
7. Hybrid Logical Clocks (HLC) & Bounded Time
Real-time clocks drift, so physical time alone can't order commits; enter HLC. An HLC timestamp pairs a physical clock reading with a logical counter, giving monotonic ordering and bounded skew for commit timestamps, which is critical for distributed transactions. Spanner's TrueTime takes a different route: GPS and atomic-clock references keep the clock-uncertainty interval ε tight, and commits wait out ε to preserve ordering.
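The HLC update rules are short enough to show in full. A sketch of the standard algorithm (Kulkarni et al.): timestamps are (l, c) pairs, where l tracks the largest physical time seen and c breaks ties logically; milliseconds are an arbitrary choice here.

```python
import time

class HLC:
    """Hybrid Logical Clock: never runs behind any timestamp it has seen."""
    def __init__(self):
        self.l, self.c = 0, 0

    def _pt(self):
        return int(time.time() * 1000)  # physical time in ms

    def now(self):
        """Timestamp for a local or send event."""
        pt = self._pt()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1  # physical clock stalled or stepped backwards
        return (self.l, self.c)

    def recv(self, ml, mc):
        """Merge a remote timestamp (ml, mc); the result dominates both."""
        pt = self._pt()
        l_old = self.l
        self.l = max(l_old, ml, pt)
        if self.l == l_old == ml:
            self.c = max(self.c, mc) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == ml:
            self.c = mc + 1
        else:
            self.c = 0
        return (self.l, self.c)
```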
8. Latency vs. Consistency Trade-off
Let L = one-way latency, σ = clock-uncertainty, f = failure probability.
| Protocol | Commit Latency | Availability under Partition |
|---|---|---|
| 2PC | ≈ 2L + O(fsyncs, logs) | Blocks on coordinator failure |
| 3PC | ≈ 3L + O(fsyncs, logs) | Reduced blocking |
| Spanner | ≈ 2L + σ + O(log N) | CP choice |
| Saga (async) | ≈ L (best-effort local) + retries | AP choice |
Engineers must weigh the trade-off between consistency and latency: distributed transactions with strong consistency often increase commit latency, while Sagas favor availability and local speed at the risk of eventual consistency.
9. Recommendations for Senior Engineers
- Monolith → Regional: start with single-region ACID (single database transactions).
- Global Strong Consistency: use Raft/Paxos + 2PC + HLC/TrueTime.
- High-Throughput Eventual Consistency: employ Saga + message broker + idempotent compensating transactions.
- Deterministic Execution: consider Calvin if you can centralize sequencing.
- Failure Testing: inject network partitions, crash coordinators, and verify recovery paths. Always ensure rollback or compensation leaves the system in a consistent state.
10. Real-World Example: Compensating for Failure in Distributed Transactions
Let’s say your e-commerce platform executes a transaction that must:
- Reserve inventory in InventorySvc
- Charge payment in PaymentSvc
- Create an order record in OrderSvc
Scenario:
- The transaction manager begins the distributed transaction.
- InventorySvc successfully reserves 100 shirts.
- PaymentSvc successfully charges the customer.
- OrderSvc fails due to a network timeout or database error.
Now you have a partial transaction: Inventory is reduced, payment is charged, but the order does not exist in the system. If you simply ignore the failure, you leave the system in an inconsistent state, frustrating both customers and business operations.
Solution:
To ensure data consistency, your distributed transaction manager must trigger compensating transactions:
- PaymentSvc issues a refund (compensating transaction for charge).
- InventorySvc releases the reserved stock.
If your transaction system supports automatic compensation and idempotency, retries are safe: re-running a completed step has no additional effect. This is the core of the Saga Pattern: each step is reversible, and the process as a whole either completes successfully or rolls back safely even when failures occur.
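A common way to get that idempotency is a per-request key. A sketch with an in-memory dict standing in for the durable idempotency table a real payment service would use; all names are illustrative:

```python
processed = {}  # idempotency key -> recorded result (durable in practice)

def refund_payment(idempotency_key: str, order_id: str, amount: int):
    """Safe to retry: a repeated call with the same key returns the
    recorded result instead of issuing a second refund."""
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = {"order_id": order_id, "refunded": amount}  # the real side effect
    processed[idempotency_key] = result
    return result

first = refund_payment("refund:o-42", "o-42", 5000)
retry = refund_payment("refund:o-42", "o-42", 5000)
assert first is retry  # the retry did not refund twice
```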
11. When to Use Each Distributed Transaction Protocol
| Use Case | Recommended Protocol |
|---|---|
| All services in a single database | Local ACID |
| Multi-database, strong consistency needed | 2PC or 3PC |
| Large-scale, high-throughput, partial failures OK | Saga Pattern |
| Highly parallel workloads, strict ordering | Calvin |
| Global deployment with clock uncertainty | Spanner/TrueTime |
- 2PC/3PC: Good for small N, high consistency needs, but risk blocking.
- Saga: Good for microservices, eventual consistency, complex workflows.
- Calvin/Spanner: For global, lock-free, or clock-synchronized use cases.
Additional Resources
- Designing Data-Intensive Applications by Martin Kleppmann
- Google Spanner: TrueTime and Global Transactions
- Distributed Systems for Fun and Profit
FAQ: Distributed Transactions
Q1: What is a distributed transaction?
A: It's a transaction that spans multiple databases or services, requiring coordination to maintain atomicity, consistency, and data integrity.

Q2: How do transaction managers maintain data consistency across multiple services?
A: By using commit protocols (2PC, 3PC) or orchestrating compensating transactions (Sagas) to ensure either all steps complete successfully or all changes are rolled back.

Q3: What happens if a failure occurs in the middle of a distributed transaction?
A: The system must coordinate a rollback or compensation to avoid leaving data in an inconsistent state, relying on transaction logs or message brokers to recover.
Want more deep dives into distributed systems, database transactions, and real engineering patterns? Contact me or subscribe for updates.