From ZooKeeper to Redis: Rethinking Agent Orchestration for AI-Native Systems

August 3, 2025

Intro: Real-time Agent Execution in an AI-Native World

We’re entering a new era of AI-native systems—multi-agent workflows, LLM feedback loops, and autonomous decision routing. If you're building tools where agents reason over graphs of tasks and evolve workflows over time (like OhWise does), you’ll inevitably face this question:

Where and how should you store, track, and coordinate the execution state of your DAG?

This post traces the evolution from ZooKeeper to Redis, contrasts their coordination models, and explains why Redis ultimately won for OhWise. We’ll also compare this model to emerging MCP-based agentic systems and show why Redis-backed DAG orchestration gives us more flexibility, speed, and agency.


The Problem: Multi-Agent Execution State Store

Let’s say your orchestrator parses a user request into this DAG:

      a
     / \
    b   c
     \ /
      d

Each node is an agent task—API call, model execution, I/O task, etc. Your backend must:

  • Track real-time status for each node (pending, running, success)
  • Enqueue ready nodes only
  • Notify frontend as each node finishes
  • Handle partial failures and restarts

This leads us to the State Store + Coordination Layer problem.
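To make that concrete, here is a minimal sketch of the per-node record such a state store has to track. The field names (and the extra failed state) are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class TaskNode:
    """One node in the execution DAG, keyed by task id."""
    task_id: str
    status: TaskStatus = TaskStatus.PENDING
    next: list[str] = field(default_factory=list)        # downstream task ids
    depends_on: list[str] = field(default_factory=list)  # upstream task ids
    updated_at: float = 0.0                              # unix time of last transition
```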


ZooKeeper: Battle-Tested for Distributed Coordination

DAG Encoding in ZK

/dag/123/a/status = success
/dag/123/b/status = running
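For a rough feel of what this looks like through kazoo, a common Python ZooKeeper client, here is a sketch; the paths mirror the example above, and the watch callback is hypothetical:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# One znode per task; the status lives in the znode's payload.
zk.create("/dag/123/a/status", b"success", makepath=True)
zk.create("/dag/123/b/status", b"running", makepath=True)

# A data watch fires whenever the payload changes, so the
# orchestrator can react without polling.
@zk.DataWatch("/dag/123/b/status")
def on_status_change(data, stat):
    if data is not None:
        print("task b is now:", data.decode())
```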

ZooKeeper has clear strengths for coordinating workers in a distributed system:

  • Watches: orchestrator reacts in real time without polling
  • Ephemeral nodes: detects dead workers via sessions
  • Consistency: strong, linearizable guarantees (via Zab protocol)

However, it also comes with real operational and performance overhead:

  • Complex to operate (quorum, leader election)
  • Throughput bottlenecks under high churn
  • Overkill unless you truly need distributed locks or coordination

ZooKeeper is recommended when your system needs leader election, fencing tokens, or shared locks. But it quickly becomes cumbersome for task graphs.


Redis as a Real-Time DAG Store

Pattern: Adjacency Map in Redis Hash

HSET dag:123 task:a '{"status":"success","next":["b","c"]}'
HSET dag:123 task:b '{"status":"pending","next":["d"]}'
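In Python with redis-py, the same pattern might look like the sketch below; the JSON payload shape follows the commands above but is otherwise an assumption:

```python
import json

import redis

r = redis.Redis(decode_responses=True)

# A worker records its result with a single atomic HSET.
r.hset("dag:123", "task:a", json.dumps({"status": "success", "next": ["b", "c"]}))
r.hset("dag:123", "task:b", json.dumps({"status": "pending", "next": ["d"]}))

# The orchestrator pulls the whole graph state in one round trip.
state = {task: json.loads(raw) for task, raw in r.hgetall("dag:123").items()}
print(state["task:a"]["status"])  # -> "success"
```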

Why Redis is Right at This Stage

  • Sub-ms atomic updates per task
  • Easy for workers to HSET results
  • Orchestrator can HGETALL in one shot to check progress
  • Low memory and ops overhead—ideal for MVPs and sub-10K user scale
  • Compatible with Redis Streams for event tracking

What You Still Have to Handle Manually

  • No built-in dependency resolution—you compute this externally (see the sketch after this list)
  • Must implement your own retry/reconcile logic

Despite this, Redis’s combination of simplicity, speed, and developer control makes it ideal for intelligent workflows.


Final Decision: Why Redis Wins Over ZooKeeper

After evaluating both systems under real-world DAG orchestration load, Redis won for OhWise because:

  • Roughly 10x lower read/write latency than ZooKeeper
  • Simpler setup and ops—no quorum, no zkCli debugging
  • Flexible schema for embedding custom node info
  • Streams + Hashes + Pub/Sub cover 90% of coordination needs
  • Easier to integrate with LLM agents, self-healing logic, and WebSocket-based UIs

ZooKeeper is designed for distributed consensus. Redis is designed for fast, reactive, real-time systems.

For AI-native DAG workflows where agents make decisions, mutate graphs, and operate asynchronously, Redis offers a cleaner and more maintainable mental model.


Comparison with MCP-Based Agent Systems

MCP (Model Context Protocol), introduced by Anthropic in 2024, allows LLMs to call tools via standard interfaces (JSON-RPC). It’s a step toward standardizing how agents talk to tools. But it’s not an orchestrator.

MCP Limitations for DAG Execution

  • No native task dependency handling or dynamic scheduling
  • No orchestration memory or state tracking
  • Doesn’t support backpressure, retries, or mutation-based DAG evolution

OhWise Advantage

  • Orchestrator has full control of DAG graph, status, and self-evolution
  • Redis for fast state transitions; MariaDB for versioned storage
  • Multiple agents can contribute to one evolving DAG over time
  • Designed for long-lived agent sessions—not stateless invocation

MCP is like an RPC bus. Redis + OhWise is like a DAG brain.


Race Conditions, Failures & Recovery

Let's consider two failure scenarios:

  • A worker dies after updating Redis but before notifying the orchestrator.
  • The orchestrator crashes before reacting to a completion event.

Solution: Use Redis Streams + Periodic Reconcile

  • Workers emit XADD task_complete events

  • Orchestrator blocks on XREADGROUP to react in near real time

  • A separate reconcile job runs every 30–60s:

    • Looks for "running" tasks with old timestamps
    • Re-enqueues or retries them

This pattern ensures fault tolerance without tight coupling.
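A rough redis-py sketch of this pattern, assuming a task_complete stream and an updated_at timestamp on each task record (both names are illustrative):

```python
import json
import time

import redis

r = redis.Redis(decode_responses=True)
STREAM, GROUP, CONSUMER = "task_complete", "orchestrator", "orch-1"


def announce_done(dag_key: str, task: str) -> None:
    """Worker side: emit a completion event right after writing its HSET result."""
    r.xadd(STREAM, {"dag": dag_key, "task": task, "status": "success"})


def consume_events() -> None:
    """Orchestrator side: block on the consumer group and react as events arrive."""
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.exceptions.ResponseError:
        pass  # group already exists
    while True:
        for _stream, entries in r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=10, block=5000) or []:
            for entry_id, fields in entries:
                # ...mark the task done and enqueue newly-ready downstream tasks...
                r.xack(STREAM, GROUP, entry_id)


def reconcile(dag_key: str, stale_after: float = 120.0) -> None:
    """Periodic job: reset tasks stuck in 'running' so they get re-enqueued."""
    now = time.time()
    for task, raw in r.hgetall(dag_key).items():
        node = json.loads(raw)
        if node.get("status") == "running" and now - node.get("updated_at", 0) > stale_after:
            node["status"] = "pending"
            r.hset(dag_key, task, json.dumps(node))
```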


Redis vs ZooKeeper vs SQL: The Matrix

Feature              Redis            ZooKeeper    MariaDB
Read latency         Sub-ms           1–5 ms       100–300 ms
Write throughput     High             Limited      Decent
Built-in watchers    Manual polling   Yes          No
Coordination logic   DIY              Built-in     Externalized
Durability           Configurable     Strong       ACID
Operational cost     Lightweight      Complex      Commodity

Conclusion

Use Redis for real-time task graph state: fast, flexible, observable. Use MariaDB for DAG versioning and historical insight. ZooKeeper is still useful for consensus and leader election—but it’s not worth the operational cost unless you truly need it.

Want low-latency fan-out? Use Redis Pub/Sub + Socket.IO to stream back to your frontend. Want self-healing? Use a cron-based reconcile job to replay or resume stuck tasks.
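For the fan-out piece, here is a minimal redis-py Pub/Sub sketch; the dag_updates channel name is arbitrary, and the forwarding step is where your Socket.IO (or any WebSocket) server would push the update to browsers:

```python
import json

import redis

r = redis.Redis(decode_responses=True)


def publish_update(dag_id: str, task: str, status: str) -> None:
    """Orchestrator side: publish every status transition."""
    r.publish("dag_updates", json.dumps({"dag": dag_id, "task": task, "status": status}))


def relay_updates() -> None:
    """Gateway side: subscribe and forward to connected frontend clients."""
    pubsub = r.pubsub()
    pubsub.subscribe("dag_updates")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        update = json.loads(message["data"])
        # e.g. sio.emit("dag_update", update)  # hand off to your Socket.IO server here
        print("forwarding", update)
```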


The Future: Self-Evolving DAGs and Agent-Driven Orchestration

Your orchestrator shouldn't just dispatch—it should learn.

Next steps:

  • Add an LLM-powered audit agent that reads past DAGs and recommends structure improvements
  • Use embedding-based matching to route user prompts to similar past DAGs
  • Let agents propose mutations to the DAG and evaluate them over time

Soon, orchestration won’t just be a pipeline engine—it’ll be an evolving memory structure.


This is not a dev tool. It’s infrastructure for intelligent systems.

If you're building systems with autonomous agents, graph reasoning, or real-time pipelines, connect with me at heunify.com/contact.
