Node maintenance checklist
Drain nodes safely and ensure Klustre CSI pods return to service.
1. Cordon and drain
kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
The Klustre CSI node pod belongs to a DaemonSet, so --ignore-daemonsets lets the drain proceed without evicting it; draining still moves your workloads off the node before the reboot.
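Before draining, you can confirm the cordon took effect; a cordoned node reports SchedulingDisabled in its status:
kubectl get node <node>
# STATUS should show Ready,SchedulingDisabled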
2. Verify daemonset status
kubectl get pods -n klustre-system -o wide | grep <node>
Expect the DaemonSet pod to keep running through the drain, go down with the node, and be recreated once the node returns.
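One way to follow the pod through the drain and reboot is to watch it; --line-buffered keeps grep from delaying output when piped:
kubectl get pods -n klustre-system -o wide --watch | grep --line-buffered <node>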
3. Patch or reboot the node
- Apply OS updates, reboot, or swap hardware as needed.
- Ensure the Lustre client packages remain installed (validate with mount.lustre --version; see the sketch after this list).
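As a quick post-reboot sanity check, this sketch verifies both the mount helper and the Lustre kernel module; lustre is the stock client module name, but it can vary with your packaging:
mount.lustre --version
lsmod | grep lustre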
4. Uncordon and relabel if necessary
kubectl uncordon <node>
If the node lost the lustre.csi.klustrefs.io/lustre-client=true label, reapply it after verifying Lustre connectivity.
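To check for the label and reapply it if missing (the label key is the one above; --overwrite makes the command safe to repeat):
kubectl get node <node> --show-labels | grep lustre-client
kubectl label node <node> lustre.csi.klustrefs.io/lustre-client=true --overwrite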
5. Watch for daemonset rollout
kubectl rollout status daemonset/klustre-csi-node -n klustre-system
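Once the rollout completes, the DaemonSet's counts should agree:
kubectl get daemonset klustre-csi-node -n klustre-system
# DESIRED, CURRENT, and READY should all match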
6. Confirm workloads recover
Use kubectl get pods in each namespace that relies on Lustre PVCs to confirm pods are Running and their mounts succeeded.
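For a deeper spot check, assuming the pod image ships standard mount tooling, you can confirm the Lustre filesystem is mounted inside a pod (the namespace and pod names are placeholders):
kubectl get pods -n <namespace> -o wide
kubectl exec -n <namespace> <pod> -- mount -t lustre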
Tips
- For large clusters, drain one Lustre node at a time to keep mounts available.
- If kubectl drain hangs due to pods using Lustre PVCs, identify them with kubectl get pods --all-namespaces -o wide | grep <node> and evict them manually (see the example below).
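A minimal manual eviction, assuming the pod is managed by a controller that will reschedule it (the grace period shown is illustrative):
kubectl delete pod <pod> -n <namespace> --grace-period=60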