Node maintenance checklist

Drain nodes safely and ensure Klustre CSI pods return to service.

1. Cordon and drain

kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

The Klustre CSI node plugin runs as a DaemonSet, so kubectl drain skips it under --ignore-daemonsets; the drain moves your regular workloads off the node before the reboot.
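
Once the drain completes, only DaemonSet-managed pods (including the Klustre CSI node pod) should remain on the node. One way to confirm:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node>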

2. Verify daemonset status

kubectl get pods -n klustre-system -o wide | grep <node>

Expect the daemonset pod to stay on the node through the drain (--ignore-daemonsets skips it), go down during the reboot, and be recreated once the node returns.
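
To list only that node's pods without grep, a field selector works too:

kubectl get pods -n klustre-system -o wide --field-selector spec.nodeName=<node>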

3. Patch or reboot the node

  • Apply OS updates, reboot, or swap hardware as needed.
  • Ensure the Lustre client packages remain installed (validate with mount.lustre --version; see the spot check below).
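
A quick post-reboot spot check (a sketch; the lsmod line assumes the client ships as the lustre kernel module, the common case):

mount.lustre --version
lsmod | grep lustre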

4. Uncordon and relabel if necessary

kubectl uncordon <node>

If the node lost the lustre.csi.klustrefs.io/lustre-client=true label, reapply it after verifying Lustre connectivity.
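
Reapplying the label looks like this (--overwrite is only needed if a stale value is already set):

kubectl label node <node> lustre.csi.klustrefs.io/lustre-client=true --overwrite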

5. Watch for daemonset rollout

kubectl rollout status daemonset/klustre-csi-node -n klustre-system
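
rollout status blocks until the DaemonSet reports ready; for a point-in-time view of the desired and ready counts:

kubectl get daemonset/klustre-csi-node -n klustre-system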

6. Confirm workloads recover

For each namespace that relies on Lustre PVCs, run kubectl get pods to confirm the pods are Running and their Lustre mounts succeeded, as in the example below.
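
For example, with <namespace> and <pod> as placeholders for a workload that mounts a Lustre PVC (the exec check assumes the container image includes the mount utility):

kubectl get pods -n <namespace> -o wide
kubectl exec -n <namespace> <pod> -- mount -t lustre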

Tips

  • For large clusters, drain one Lustre node at a time to keep mounts available.
  • If kubectl drain hangs because pods are still using Lustre PVCs, identify them with kubectl get pods --all-namespaces -o wide | grep <node> and evict them manually, as sketched below.
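
A sketch of the manual eviction, assuming <pod> is managed by a controller (Deployment, StatefulSet, etc.) that will reschedule it on another node:

kubectl delete pod <pod> -n <namespace>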
