Hard Lessons from Operating Stateful Systems on Kubernetes: Why We Moved Away from EFS

March 12, 2025

To support distributed systems and data-heavy services in a containerized Kubernetes environment, Amazon EFS was initially selected as a shared storage layer. The promise of simple, elastic NFS across nodes seemed to offer the flexibility required for services needing shared access to models, configuration, and runtime code. This approach also allowed for in-place Git operations during container startup, aiming to reduce image rebuild time and simplify deployment logic.

However, as the system requirements matured—demanding higher throughput, faster boot times, and predictable availability—the limitations of EFS became increasingly apparent. This post outlines the architectural adjustments made in response to those limitations, and the resulting transition toward an image-based deployment strategy backed by persistent node storage.


Architecture Overview

  • Self-managed Kubernetes cluster on EC2 (Ubuntu 22.04)

  • MariaDB cluster deployed using StatefulSet

  • Supporting services: Redis, Weaviate, LLM inference backends

  • Persistent storage:

    • EFS mounted at /mnt/efs on all nodes
    • EBS volumes manually mounted at /mnt/data
  • Deployment goal: reduce full-stack code deployment time to under 2 minutes without compromising service continuity or state


Why EFS Didn’t Scale with Operational Demands

Despite the ease of setup, EFS introduced variability and overhead that were difficult to control in production:

Symptom                                   | Root Cause
------------------------------------------|------------------------------------------------
High startup latency for services         | Cold-read and metadata fetch overhead in EFS
Readiness and liveness probes failing     | Slow socket and I/O access over NFS
df -h showing 0 or 8.0E                   | EFS's virtualized capacity model
Unexpected mount behavior (127.0.0.1:/)   | Fallback behavior when EFS DNS resolution fails

Even with security group tuning and correct mount targets, EFS’s performance was inconsistent for stateful workloads and latency-sensitive services.
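For readers who have not hit this before, the capacity anomaly from the table above looks something like the following on an affected node (the filesystem ID and region are placeholders; the 8.0E "size" is EFS's virtualized capacity, not a real quota):

```
$ df -h /mnt/efs
Filesystem                                  Size  Used Avail Use% Mounted on
fs-XXXXXXXX.efs.us-east-1.amazonaws.com:/   8.0E     0  8.0E   0% /mnt/efs
```

Monitoring and alerting built on filesystem usage metrics cannot interpret these values meaningfully, which compounds the operational opacity.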


Image-Based Deployment Model

To address startup speed, repeatability, and runtime stability, application code was migrated to immutable Docker images built via CI/CD pipelines. This change eliminated runtime Git operations and removed the need for shared storage for code artifacts.

Key practices adopted:

  • Minimal base images to reduce cold start time
  • Layered Docker builds to separate code from system dependencies
  • Deployment via container rolling restarts, enabling consistent and fast rollout
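The layering practice above can be sketched as a minimal Dockerfile. This is an illustrative example, not the actual pipeline: the base image, paths, and a Python runtime are assumptions, and the key point is the ordering of layers, which applies to any language:

```dockerfile
# System dependencies change rarely; install them in early layers so
# Docker's build cache reuses them across code-only rebuilds.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes on every deploy; copy it last so only this
# final layer is rebuilt and pushed.
COPY . .
CMD ["python", "-m", "app"]
```

With this ordering, a code-only change rebuilds and ships one thin layer, which is what makes sub-2-minute full-stack rollouts realistic.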

This shift resulted in reliable startup performance and improved alignment with Kubernetes-native deployment models.


Storage Strategy for Stateful Services

Persistent data for stateful services such as MariaDB was backed by manually attached EBS disks. Each node received a dedicated partition mounted at a standardized path. Kubernetes StatefulSets were used to manage replica identity and ensure predictable storage mapping. Node-level isolation was enforced with podAntiAffinity to prevent multiple replicas from sharing a node, and scheduling tolerations allowed use of control-plane nodes in small clusters.
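A condensed sketch of those scheduling constraints follows. Names, labels, and the image tag are illustrative, and the hostPath volume stands in for whatever node-local provisioning is in use:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mariadb
spec:
  serviceName: mariadb
  replicas: 3
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two MariaDB replicas on one node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: mariadb
              topologyKey: kubernetes.io/hostname
      tolerations:
        # Allow scheduling onto control-plane nodes in small clusters.
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: mariadb
          image: mariadb:11
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumes:
        - name: data
          hostPath:
            path: /mnt/data/mariadb
```

The required (rather than preferred) anti-affinity term makes co-location a scheduling failure instead of a soft preference, which is the right trade-off when replicas must not share a disk.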


Disk Partitioning

To separate application-level storage from system processes (e.g., container runtime, OS logs), each EBS volume was manually partitioned:

  sudo parted /dev/nvme1n1 mklabel gpt
  sudo parted -a optimal /dev/nvme1n1 mkpart primary ext4 0% 100%
  sudo mkfs.ext4 /dev/nvme1n1p1
  sudo mkdir -p /mnt/data
  sudo mount /dev/nvme1n1p1 /mnt/data

This ensured clean volume management, enabled consistent mount behavior, and isolated application data for better performance and fault recovery.
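One caveat: a manual mount like the one above does not survive a reboot on its own. A common way to make it persistent is an /etc/fstab entry keyed by UUID (the UUID placeholder below must be filled in from blkid; device and mount point match the commands above):

```
# Look up the partition's UUID (stable across reboots, unlike /dev/nvme* names)
sudo blkid -s UUID -o value /dev/nvme1n1p1

# fstab entry: 'nofail' keeps boot from hanging if the EBS volume is detached
UUID=<uuid-from-blkid>  /mnt/data  ext4  defaults,nofail  0  2

# Verify the entry mounts cleanly before trusting a reboot to it
sudo mount -a
```

The nofail option matters on EC2, where a detached or delayed EBS attachment would otherwise block the instance from booting at all.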


Performance Comparison

Operation                    | EFS (NFS)           | Local File (EBS) | Image-Based (CI-built)
-----------------------------|---------------------|------------------|-------------------------
Git clone (clean repo)       | ~4.2s               | ~1.1s            | N/A (no runtime cloning)
MariaDB cold start           | ~18–22s             | ~4–6s            | ~4–6s
Probe failure rate           | ~35% (intermittent) | 0%               | 0%
File I/O latency (mixed R/W) | High                | Low              | N/A
Resilience to node restart   | Unpredictable       | Reliable         | Reliable

Key Observations

  • StatefulSet + podAntiAffinity is appropriate for databases and persistent services; it adds unnecessary scheduling constraints for stateless or ephemeral workloads.
  • EFS is more appropriate for shared logs, model archives, or backup storage, but unsuitable for high-throughput, application-layer storage.
  • Kubernetes does not guarantee pod-to-node affinity unless explicitly managed. Persistent volume locality must be enforced intentionally.
  • Image-based deployments introduce discipline and control, reducing runtime risk and improving rollout consistency.

Conclusion

Amazon EFS is convenient but introduces latency, fragility, and scaling challenges that make it unsuitable for stateful, production-grade Kubernetes services. For systems where performance and correctness are critical, image-based deployments and partitioned node-level persistent volumes provide greater reliability and control. Code should be baked into container images, and state should be isolated through Kubernetes-native volume management and node-aware scheduling.
