k8s MariaDB Galera Cluster
- Links
- Safe to bootstrap
- In case of a sudden crash of the entire cluster, all nodes will be considered unsafe to bootstrap from, so operator action will always be required to force the use of a particular node as a bootstrap node.
Restore huge db to Galera/MariaDB - using single node
https://severalnines.com/blog/guide-mysql-galera-cluster-restoration-using-mysqldump/
https://galeracluster.com/library/training/tutorials/galera-backup.html
https://github.com/mydumper/mydumper - multi threaded db dump
Restart - after orderly shutdown
- Check for "safe_to_bootstrap: 1" in grastate.dat
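The check above can be wrapped in a small helper and run against each node's data directory. A minimal sketch; the path `/bitnami/mariadb/data/grastate.dat` is the Bitnami image's layout (an assumption, adjust for your deployment):

```shell
# Print the bootstrap flag from a grastate.dat file.
# After an orderly shutdown, expect "safe_to_bootstrap: 1" on the
# node that was shut down last, and 0 on the others.
check_bootstrap_flag() {
  grep '^safe_to_bootstrap:' "$1"
}

# e.g. check_bootstrap_flag /bitnami/mariadb/data/grastate.dat
```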
Restart - after hard crash of all nodes
All grastate.dat files should now have "safe_to_bootstrap: 0"
Find node with last transaction committed
mysqld --wsrep-recover
# Look in logs for the highest "WSREP: Recovered position: 37bb-addd-xxx"
# Pick the node with the highest seqno and change its grastate.dat "safe_to_bootstrap: 0 -> 1"
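The comparison step can be sketched as a small log parser: pull the seqno out of each node's "Recovered position" log line, then bootstrap the node with the highest value. A sketch, assuming the standard Galera log format `WSREP: Recovered position: <uuid>:<seqno>`:

```shell
# Run "mysqld --wsrep-recover" on each stopped node; it writes a line like
#   WSREP: Recovered position: 2a651c5d-...:1234
# to the error log. This helper extracts the highest seqno found in a log
# file so the nodes can be compared.
extract_seqno() {
  grep -o 'WSREP: Recovered position: [^ ]*' "$1" \
    | awk -F: '{print $NF}' \
    | sort -n | tail -1
}

# e.g. extract_seqno /var/log/mysql/error.log   # run per node, pick the max
```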
k8s: recover from a hard restart by mounting the PVC volume into a temp container, then manually editing /mnt/data/grastate.dat
export k8s_claimName=mariadb-galera-0
kubectl get pvc ${k8s_claimName} | grep "${k8s_claimName}\s\+Bound\s" \
  || echo "# Didn't find Bound pvc ${k8s_claimName} in namespace"
kubectl run -i --tty --rm volpodcontainer --overrides='
{ "apiVersion": "v1", "kind": "Pod",
  "metadata": { "name": "volpod" },
  "spec": {
    "containers": [ {
      "command": [ "bash" ],
      "image": "docker.io/diepes/debug:latest",
      "name": "volpod",
      "stdin": true, "tty": true,
      "volumeMounts": [ { "mountPath": "/mnt", "name": "galeradata" } ]
    } ],
    "restartPolicy": "Never",
    "volumes": [ { "name": "galeradata",
      "persistentVolumeClaim": { "claimName": "'${k8s_claimName}'" } } ],
    "tolerations": [ { "effect": "NoSchedule",
      "key": "kubernetes.azure.com/scalesetpriority",
      "operator": "Equal", "value": "spot" } ]
  }
}' --image="docker.io/diepes/debug:latest"
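Once inside the temp pod, flipping the flag is a one-line sed. A sketch as a helper function (the path `/mnt/data/grastate.dat` matches the mount above; adjust if your PVC lays out data differently):

```shell
# Flip safe_to_bootstrap from 0 to 1 in a grastate.dat file and show the result.
# Run inside the temp pod, e.g.: enable_bootstrap /mnt/data/grastate.dat
enable_bootstrap() {
  sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' "$1"
  grep '^safe_to_bootstrap:' "$1"
}
```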
HAPROXY liveness script for MariaDB Galera
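HAProxy commonly health-checks Galera through a clustercheck-style script that succeeds only when `wsrep_local_state` is 4 (Synced). A minimal sketch of the decision logic; the function name and the surrounding wiring (credentials, how HAProxy invokes it) are assumptions, not this deployment's actual script:

```shell
# Return success (0) only when the node's wsrep_local_state is 4 (Synced).
# Any other state (1=Joining, 2=Donor/Desynced, 3=Joined) should be
# treated as not ready to receive traffic.
galera_state_ok() {
  [ "$1" = "4" ]
}

# In a real probe the state would come from the server, e.g.:
#   state=$(mysql -N -s -e "SHOW STATUS LIKE 'wsrep_local_state'" | awk '{print $2}')
#   galera_state_ok "$state" || exit 1
```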
MySQL (MariaDB) ram tuning
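The usual starting point is a handful of my.cnf memory knobs. A sketch with illustrative values only, not tuned for any particular workload; size against the container's memory limit, not the node's:

```ini
# Illustrative MariaDB memory settings (adjust to the pod's memory limit)
[mysqld]
innodb_buffer_pool_size = 2G      # typically the largest consumer; ~50-70% of available RAM
innodb_log_file_size    = 256M
max_connections         = 200     # each connection adds per-thread buffers on top
tmp_table_size          = 64M
max_heap_table_size     = 64M
```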
Error messages MariaDB/Galera
- "[Warning] WSREP: no nodes coming from prim view, prim not possible"
- or "[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster ..."
- This means no primary component exists in the cluster and the node can't determine whether it should become primary.
- recovery:
We could try starting the DBs in parallel, or put each pod to sleep and make the health check pass while we manually follow the recovery steps
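The "put each pod to sleep" idea can be done by overriding the container command so the pod starts but MariaDB does not, leaving a shell-friendly pod to exec into. A sketch against a Bitnami-style StatefulSet; the container name is an assumption, check your chart:

```yaml
# StatefulSet patch sketch: replace the entrypoint with a no-op so the
# pod stays up while you perform manual Galera recovery inside it.
spec:
  template:
    spec:
      containers:
        - name: mariadb-galera   # assumed container name; verify with kubectl get sts -o yaml
          command: ["sleep", "infinity"]
```

Revert the patch (or re-sync from git) once the bootstrap node is chosen, so the pods start MariaDB normally again.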
Bootstrapping / recovery
- Delay restarts
Update the StatefulSet's readinessProbe parameter initialDelaySeconds from the default 30 to 300 (5 minutes) to allow sufficient time to edit the impacted file
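In the StatefulSet spec that change is a single field. A sketch of the relevant fragment; the probe's other settings are placeholders for whatever your chart already defines:

```yaml
# Fragment of the StatefulSet container spec: give yourself a 5-minute
# window before the first readiness check so grastate.dat can be edited.
readinessProbe:
  initialDelaySeconds: 300   # default 30
  # ...existing exec/periodSeconds settings unchanged...
```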
Find latest db
mysqld --wsrep-recover
- select the pod to boot first
Update grastate.dat
cat /bitnami/mariadb/data/grastate.dat
# uuid: 2a651c5d-139e-11ee-8733-0eab9be77c14
# seqno: -1
# safe_to_bootstrap: 0
cd /bitnami/mariadb/data
sed -i "s/safe_to_bootstrap: 0/safe_to_bootstrap: 1/" grastate.dat
# Now delete / recreate pod to bootstrap