k8s MariaDB Galera Cluster
- Links
- Safe to bootstrap
- In case of a sudden crash of the entire cluster, all nodes will be considered unsafe to bootstrap from, so operator action will always be required to force the use of a particular node as a bootstrap node.
Restore huge db to Galera/MariaDB - using single node
https://severalnines.com/blog/guide-mysql-galera-cluster-restoration-using-mysqldump/
https://galeracluster.com/library/training/tutorials/galera-backup.html
https://github.com/mydumper/mydumper - multi threaded db dump
Restart - after orderly shutdown
- Check for "safe_to_bootstrap: 1" in grastate.dat
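The check above can be wrapped in a small helper and run against each node's data directory. A minimal sketch; the path `/bitnami/mariadb/data/grastate.dat` is the Bitnami image's layout (an assumption, adjust for your deployment):

```shell
# Print the bootstrap flag from a grastate.dat file.
# After an orderly shutdown, expect "safe_to_bootstrap: 1" on the
# node that was shut down last, and 0 on the others.
check_bootstrap_flag() {
  grep '^safe_to_bootstrap:' "$1"
}

# e.g. check_bootstrap_flag /bitnami/mariadb/data/grastate.dat
```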
Restart - after hard crash of all nodes
All grastate.dat files should now have "safe_to_bootstrap: 0"
Find node with last transaction committed
mysqld --wsrep-recover
# Look in logs for the highest "WSREP: Recovered position: 37bb-addd-xxx"
# Pick the node with the highest seqno and change its grastate.dat "safe_to_bootstrap: 0 -> 1"
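The comparison step can be sketched as a small log parser: pull the seqno out of each node's "Recovered position" log line, then bootstrap the node with the highest value. A sketch, assuming the standard Galera log format `WSREP: Recovered position: <uuid>:<seqno>`:

```shell
# Run "mysqld --wsrep-recover" on each stopped node; it writes a line like
#   WSREP: Recovered position: 2a651c5d-...:1234
# to the error log. This helper extracts the highest seqno found in a log
# file so the nodes can be compared.
extract_seqno() {
  grep -o 'WSREP: Recovered position: [^ ]*' "$1" \
    | awk -F: '{print $NF}' \
    | sort -n | tail -1
}

# e.g. extract_seqno /var/log/mysql/error.log   # run per node, pick the max
```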
k8s: recover from a hard restart by mounting the PVC volume into a temp container, then manually editing /mnt/data/grastate.dat
export k8s_claimName=mariadb-galera-0
kubectl get pvc ${k8s_claimName} | grep "${k8s_claimName}\s\+Bound\s" \
  || echo "# Didn't find Bound pvc ${k8s_claimName} in namespace"
kubectl run -i --tty --rm volpodcontainer --overrides='
{ "apiVersion": "v1", "kind": "Pod",
  "metadata": { "name": "volpod" },
  "spec": {
    "containers": [ {
      "command": [ "bash" ],
      "image": "docker.io/diepes/debug:latest",
      "name": "volpod",
      "stdin": true, "tty": true,
      "volumeMounts": [ { "mountPath": "/mnt", "name": "galeradata" } ]
    } ],
    "restartPolicy": "Never",
    "volumes": [ { "name": "galeradata",
      "persistentVolumeClaim": { "claimName": "'${k8s_claimName}'" } } ],
    "tolerations": [ { "effect": "NoSchedule",
      "key": "kubernetes.azure.com/scalesetpriority",
      "operator": "Equal", "value": "spot" } ]
  }
}' --image="docker.io/diepes/debug:latest"
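Once inside the temp pod, flipping the flag is a one-line sed. A sketch as a helper function (the path `/mnt/data/grastate.dat` matches the mount above; adjust if your PVC lays out data differently):

```shell
# Flip safe_to_bootstrap from 0 to 1 in a grastate.dat file and show the result.
# Run inside the temp pod, e.g.: enable_bootstrap /mnt/data/grastate.dat
enable_bootstrap() {
  sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' "$1"
  grep '^safe_to_bootstrap:' "$1"
}
```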
HAPROXY liveness script for MariaDB Galera
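HAProxy commonly health-checks Galera through a clustercheck-style script that succeeds only when `wsrep_local_state` is 4 (Synced). A minimal sketch of the decision logic; the function name and the surrounding wiring (credentials, how HAProxy invokes it) are assumptions, not this deployment's actual script:

```shell
# Return success (0) only when the node's wsrep_local_state is 4 (Synced).
# Any other state (1=Joining, 2=Donor/Desynced, 3=Joined) should be
# treated as not ready to receive traffic.
galera_state_ok() {
  [ "$1" = "4" ]
}

# In a real probe the state would come from the server, e.g.:
#   state=$(mysql -N -s -e "SHOW STATUS LIKE 'wsrep_local_state'" | awk '{print $2}')
#   galera_state_ok "$state" || exit 1
```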
MySQL (MariaDB) ram tuning
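The usual starting point is a handful of my.cnf memory knobs. A sketch with illustrative values only, not tuned for any particular workload; size against the container's memory limit, not the node's:

```ini
# Illustrative MariaDB memory settings (adjust to the pod's memory limit)
[mysqld]
innodb_buffer_pool_size = 2G      # typically the largest consumer; ~50-70% of available RAM
innodb_log_file_size    = 256M
max_connections         = 200     # each connection adds per-thread buffers on top
tmp_table_size          = 64M
max_heap_table_size     = 64M
```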
Error messages MariaDB/Galera
- "[Warning] WSREP: no nodes coming from prim view, prim not possible"
- or "[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster ..."
- This means no primary component exists in the cluster and the node can't determine whether it should become primary.
- recovery:
We could try starting the DBs in parallel, or put each pod to sleep and make the health check pass while we manually follow the recovery steps
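The "put each pod to sleep" idea can be done by overriding the container command so the pod starts but MariaDB does not, leaving a shell-friendly pod to exec into. A sketch against a Bitnami-style StatefulSet; the container name is an assumption, check your chart:

```yaml
# StatefulSet patch sketch: replace the entrypoint with a no-op so the
# pod stays up while you perform manual Galera recovery inside it.
spec:
  template:
    spec:
      containers:
        - name: mariadb-galera   # assumed container name; verify with kubectl get sts -o yaml
          command: ["sleep", "infinity"]
```

Revert the patch (or re-sync from git) once the bootstrap node is chosen, so the pods start MariaDB normally again.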
Bootstrapping / recovery
- Delay restarts
Update the StatefulSet's readinessProbe parameter initialDelaySeconds from the default 30 to 300 (5 minutes) to allow sufficient time to edit the impacted file
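In the StatefulSet spec that change is a single field. A sketch of the relevant fragment; the probe's other settings are placeholders for whatever your chart already defines:

```yaml
# Fragment of the StatefulSet container spec: give yourself a 5-minute
# window before the first readiness check so grastate.dat can be edited.
readinessProbe:
  initialDelaySeconds: 300   # default 30
  # ...existing exec/periodSeconds settings unchanged...
```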
Find latest db
mysqld --wsrep-recover
- select the pod to boot first
Update grastate.dat
cat /bitnami/mariadb/data/grastate.dat
# uuid: 2a651c5d-139e-11ee-8733-0eab9be77c14
# seqno: -1
# safe_to_bootstrap: 0
cd /bitnami/mariadb/data
sed -i "s/safe_to_bootstrap: 0/safe_to_bootstrap: 1/" grastate.dat
# Now delete / recreate pod to bootstrap