Hard Lessons from Operating Stateful Systems on Kubernetes: Why We Moved Away from EFS

March 12, 2025

To support distributed systems and data-heavy services in a containerized Kubernetes environment, Amazon EFS was initially selected as a shared storage layer. The promise of simple, elastic NFS across nodes seemed to offer the flexibility required for services needing shared access to models, configuration, and runtime code. This approach also allowed for in-place Git operations during container startup, aiming to reduce image rebuild time and simplify deployment logic.

However, as the system requirements matured—demanding higher throughput, faster boot times, and predictable availability—the limitations of EFS became increasingly apparent. This post outlines the architectural adjustments made in response to those limitations, and the resulting transition toward an image-based deployment strategy backed by persistent node storage.


Architecture Overview

  • Self-managed Kubernetes cluster on EC2 (Ubuntu 22.04)

  • MariaDB cluster deployed using StatefulSet

  • Supporting services: Redis, Weaviate, LLM inference backends

  • Persistent storage:

    • EFS mounted at /mnt/efs on all nodes
    • EBS volumes manually mounted at /mnt/data
  • Deployment goal: reduce full-stack code deployment time to under 2 minutes without compromising service continuity or state


Why EFS Didn’t Scale with Operational Demands

Despite the ease of setup, EFS introduced variability and overhead that were difficult to control in production:

Symptom                                   | Root Cause
------------------------------------------|------------------------------------------------
High startup latency for services         | Cold-read and metadata fetch overhead in EFS
Readiness and liveness probes failing     | Slow socket and I/O access over NFS
df -h showing 0 or 8.0E                   | EFS's virtualized capacity model
Unexpected mount behavior (127.0.0.1:/)   | Fallback behavior when EFS DNS resolution fails

Even with security group tuning and correct mount targets, EFS’s performance was inconsistent for stateful workloads and latency-sensitive services.
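For readers who have not hit this before, the capacity anomaly from the table above looks something like the following on an affected node (the filesystem ID and region are placeholders; the 8.0E "size" is EFS's virtualized capacity, not a real quota):

```
$ df -h /mnt/efs
Filesystem                                  Size  Used Avail Use% Mounted on
fs-XXXXXXXX.efs.us-east-1.amazonaws.com:/   8.0E     0  8.0E   0% /mnt/efs
```

Monitoring and alerting built on filesystem usage metrics cannot interpret these values meaningfully, which compounds the operational opacity.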


Image-Based Deployment Model

To address startup speed, repeatability, and runtime stability, application code was migrated to immutable Docker images built via CI/CD pipelines. This change eliminated runtime Git operations and removed the need for shared storage for code artifacts.

Key practices adopted:

  • Minimal base images to reduce cold start time
  • Layered Docker builds to separate code from system dependencies
  • Deployment via container rolling restarts, enabling consistent and fast rollout
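The layering practice above can be sketched as a minimal Dockerfile. This is an illustrative example, not the actual pipeline: the base image, paths, and a Python runtime are assumptions, and the key point is the ordering of layers, which applies to any language:

```dockerfile
# System dependencies change rarely; install them in early layers so
# Docker's build cache reuses them across code-only rebuilds.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes on every deploy; copy it last so only this
# final layer is rebuilt and pushed.
COPY . .
CMD ["python", "-m", "app"]
```

With this ordering, a code-only change rebuilds and ships one thin layer, which is what makes sub-2-minute full-stack rollouts realistic.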

This shift resulted in reliable startup performance and improved alignment with Kubernetes-native deployment models.


Storage Strategy for Stateful Services

Persistent data for stateful services such as MariaDB was backed by manually attached EBS disks. Each node received a dedicated partition mounted at a standardized path. Kubernetes StatefulSets were used to manage replica identity and ensure predictable storage mapping. Node-level isolation was enforced with podAntiAffinity to prevent multiple replicas from sharing a node, and scheduling tolerations allowed use of control-plane nodes in small clusters.
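A condensed sketch of those scheduling constraints follows. Names, labels, and the image tag are illustrative, and the hostPath volume stands in for whatever node-local provisioning is in use:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mariadb
spec:
  serviceName: mariadb
  replicas: 3
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two MariaDB replicas on one node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: mariadb
              topologyKey: kubernetes.io/hostname
      tolerations:
        # Allow scheduling onto control-plane nodes in small clusters.
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: mariadb
          image: mariadb:11
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumes:
        - name: data
          hostPath:
            path: /mnt/data/mariadb
```

The required (rather than preferred) anti-affinity term makes co-location a scheduling failure instead of a soft preference, which is the right trade-off when replicas must not share a disk.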


Disk Partitioning

To separate application-level storage from system processes (e.g., container runtime, OS logs), each EBS volume was manually partitioned:

  sudo parted /dev/nvme1n1 mklabel gpt
  sudo parted -a optimal /dev/nvme1n1 mkpart primary ext4 0% 100%
  sudo mkfs.ext4 /dev/nvme1n1p1
  sudo mkdir -p /mnt/data
  sudo mount /dev/nvme1n1p1 /mnt/data

This ensured clean volume management, enabled consistent mount behavior, and isolated application data for better performance and fault recovery.
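One caveat: a manual mount like the one above does not survive a reboot on its own. A common way to make it persistent is an /etc/fstab entry keyed by UUID (the UUID placeholder below must be filled in from blkid; device and mount point match the commands above):

```
# Look up the partition's UUID (stable across reboots, unlike /dev/nvme* names)
sudo blkid -s UUID -o value /dev/nvme1n1p1

# fstab entry: 'nofail' keeps boot from hanging if the EBS volume is detached
UUID=<uuid-from-blkid>  /mnt/data  ext4  defaults,nofail  0  2

# Verify the entry mounts cleanly before trusting a reboot to it
sudo mount -a
```

The nofail option matters on EC2, where a detached or delayed EBS attachment would otherwise block the instance from booting at all.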


Performance Comparison

Operation                    | EFS (NFS)           | Local File (EBS) | Image-Based (CI-built)
-----------------------------|---------------------|------------------|-------------------------
Git clone (clean repo)       | ~4.2s               | ~1.1s            | N/A (no runtime cloning)
MariaDB cold start           | ~18–22s             | ~4–6s            | ~4–6s
Probe failure rate           | ~35% (intermittent) | 0%               | 0%
File I/O latency (mixed R/W) | High                | Low              | N/A
Resilience to node restart   | Unpredictable       | Reliable         | Reliable

Key Observations

  • StatefulSet + podAntiAffinity is appropriate for databases and persistent services; it adds unnecessary scheduling constraints for stateless or ephemeral workloads.
  • EFS is more appropriate for shared logs, model archives, or backup storage, but unsuitable for high-throughput, application-layer storage.
  • Kubernetes does not guarantee pod-to-node affinity unless explicitly managed. Persistent volume locality must be enforced intentionally.
  • Image-based deployments introduce discipline and control, reducing runtime risk and improving rollout consistency.

Conclusion

Amazon EFS is convenient but introduces latency, fragility, and scaling challenges that make it unsuitable for stateful, production-grade Kubernetes services. For systems where performance and correctness are critical, image-based deployments and partitioned node-level persistent volumes provide greater reliability and control. Code should be baked into container images, and state should be isolated through Kubernetes-native volume management and node-aware scheduling.
