k8s Multi-AZ Storage Problem
- As of 2023-09 there seems to be a fundamental problem with k8s and stateful workloads.
Problem 1 - Pod and PV (storage) not in the same AZ
For new deployments this can be avoided by only creating the PV from the PVC once the pod is scheduled, using the StorageClass setting
volumeBindingMode: WaitForFirstConsumer
- The storage waits for the pod to be scheduled and is then created in the same AZ (see the sketch below)
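A minimal sketch of such a StorageClass, assuming the AWS EBS CSI driver (the class name, provisioner, and parameters are examples; adjust for your cloud):

    # StorageClass with delayed binding: the volume is provisioned only
    # after the pod is scheduled, in that pod's AZ (example/assumed values)
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp3-wait                 # hypothetical name
    provisioner: ebs.csi.aws.com     # assumes the AWS EBS CSI driver
    parameters:
      type: gp3
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer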
- The real issue arises when there is a node failure, or another re-schedule of pods, and they land in a different AZ than their PV (PersistentVolume/disk)
Example error in the k8s event log:
Warning FailedScheduling pod/infra-elasticsearch-master-0 0/4 nodes are available: 1 Insufficient cpu, 3 node(s) had volume node affinity conflict. preemption: 0/4 nodes are available: 1 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling..
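To confirm the conflict, compare the AZ recorded in the PV's node affinity with the zones of the schedulable nodes; a sketch, assuming the PVC follows the usual <claimTemplate>-<pod> naming (names are hypothetical):

    # Find the PV behind the pod's PVC, then look at the zone it is pinned to
    kubectl get pvc data-infra-elasticsearch-master-0 -o jsonpath='{.spec.volumeName}'
    kubectl describe pv <pv-name-from-above> | grep -A3 'Node Affinity'
    # Compare with the zone label on each node
    kubectl get nodes -L topology.kubernetes.io/zone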
- If you can lose the PVC data, just scale the workload to 0, delete the PVC, scale back up, and get a new volume in the correct AZ (first sketch below)
- If you can't lose the data you have to either move the pod to the correct AZ (impossible for a StatefulSet) or move the data: back it up, create a new PV, and restore (second sketch below)
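For the first option (data is disposable), a sketch of the steps, assuming a StatefulSet named infra-elasticsearch-master with a volumeClaimTemplate named data (both names hypothetical):

    # Scale down so the pod releases its volume
    kubectl scale statefulset infra-elasticsearch-master --replicas=0
    # Delete the PVC; with reclaimPolicy Delete the backing disk is removed too
    kubectl delete pvc data-infra-elasticsearch-master-0
    # Scale back up: with WaitForFirstConsumer a fresh volume is
    # provisioned in whichever AZ the new pod lands in
    kubectl scale statefulset infra-elasticsearch-master --replicas=1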
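For the second option (data must survive), one possible route is a volume snapshot and restore, assuming the CSI snapshot CRDs and a VolumeSnapshotClass are installed and the cloud's snapshots are regional (all names and sizes below are hypothetical):

    # 1. With the StatefulSet scaled to 0, snapshot the existing volume
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: es-master-0-snap                        # hypothetical name
    spec:
      volumeSnapshotClassName: csi-snapclass        # hypothetical class
      source:
        persistentVolumeClaimName: data-infra-elasticsearch-master-0
    ---
    # 2. Delete the old PVC, then recreate it from the snapshot; with
    #    WaitForFirstConsumer the restored volume is created in the pod's AZ
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-infra-elasticsearch-master-0
    spec:
      storageClassName: gp3-wait                    # the WaitForFirstConsumer class above
      dataSource:
        apiGroup: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: es-master-0-snap
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 30Gi                             # must match or exceed the original size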