Maintenance and Upgrade

Keep Klustre CSI healthy during node drains and Kubernetes upgrades.

1: Node maintenance checklist
2: Upgrade guide

Patch nodes, rotate images, or upgrade the CSI plugin without interrupting workloads.

Select the maintenance guide you need from the navigation: the node checklist, the upgrade plan, and future topics all live underneath this page.

1 - Node maintenance checklist

Drain nodes safely and ensure Klustre CSI pods return to service.

1. Cordon and drain

kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

The Klustre CSI node plugin runs as a DaemonSet, so kubectl drain does not evict its pod; --ignore-daemonsets lets the drain proceed despite that pod. Draining still ensures your workloads move off the node before reboot.

2. Verify daemonset status

kubectl get pods -n klustre-system -o wide | grep <node>

Because the drain skips DaemonSet-managed pods, expect the daemonset pod to keep running until the node reboots or shuts down, and to be recreated once the node returns.

3. Patch or reboot the node

Apply OS updates, reboot, or swap hardware as needed.
Ensure the Lustre client packages remain installed (validate with mount.lustre --version).

4. Uncordon and relabel if necessary

kubectl uncordon <node>

If the node lost the lustre.csi.klustrefs.io/lustre-client=true label, reapply it after verifying Lustre connectivity.
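A minimal sketch of the relabel step, assuming you want to reapply the label only when it is missing. The label key and value come from this guide; the function name ensure_lustre_label is hypothetical, and the check for Lustre connectivity is still up to you.

```shell
# Sketch: reapply the Lustre client label only when the node no longer has it.
# ensure_lustre_label is a hypothetical helper name, not part of Klustre CSI.
ensure_lustre_label() {
  node="$1"
  # "-o name" prints one "node/<name>" line per node matching the label selector.
  labeled="$(kubectl get nodes -l lustre.csi.klustrefs.io/lustre-client=true -o name)"
  if printf '%s\n' "$labeled" | grep -qxF "node/$node"; then
    echo "label already present on $node"
  else
    kubectl label node "$node" lustre.csi.klustrefs.io/lustre-client=true
  fi
}
```

Call it as ensure_lustre_label <node> after you have verified Lustre connectivity on that node.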

5. Watch for daemonset rollout

kubectl rollout status daemonset/klustre-csi-node -n klustre-system

6. Confirm workloads recover

Use kubectl get pods for namespaces that rely on Lustre PVCs to ensure pods are running and mounts succeeded.
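One way to narrow that check is to list only the pods that are not in the Running phase. This is a sketch; the function name pods_not_running is hypothetical, and the namespace is whatever you pass in.

```shell
# Sketch: show pods in a namespace whose phase is not Running.
# Note: completed Job pods (phase Succeeded) also match this selector,
# and a Running phase does not by itself prove the Lustre mount is healthy.
pods_not_running() {
  namespace="$1"
  kubectl get pods -n "$namespace" --field-selector='status.phase!=Running' -o wide
}
```

For example, pods_not_running <namespace> should print nothing of concern once workloads have recovered. Pending pods are worth a kubectl describe to look for mount errors.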

Tips

For large clusters, drain one Lustre node at a time to keep mounts available.
If kubectl drain hangs due to pods using Lustre PVCs, identify them with kubectl get pods --all-namespaces -o wide | grep <node> and evict manually.
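As an alternative to grep, a field selector matches the node name exactly, so a node called node-1 does not also match node-10. A small sketch, with pods_on_node as a hypothetical helper name:

```shell
# Sketch: list every pod scheduled on one node, across all namespaces.
# spec.nodeName is matched exactly, unlike a substring match with grep.
pods_on_node() {
  node="$1"
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$node"
}
```

Run pods_on_node <node> while the drain is stuck to see which pods are still on the node.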

2 - Upgrade guide

Plan Klustre CSI version upgrades alongside Kubernetes changes.

1. Review release notes

Check the klustre-csi-plugin GitHub releases for breaking changes, minimum Kubernetes versions, and image tags.

2. Update the image reference

Helm users: bump image.tag and nodePlugin.registrar.image.tag in your values file, then run helm upgrade.
Manifest users: edit manifests/configmap-klustre-csi-settings.yaml (nodeImage, registrarImage) and reapply the manifests.

See Update the node daemonset image for detailed steps.

3. Roll out sequentially

kubectl rollout restart daemonset/klustre-csi-node -n klustre-system
kubectl rollout status daemonset/klustre-csi-node -n klustre-system

With the default RollingUpdate strategy (maxUnavailable: 1), the daemonset restarts one node at a time, keeping existing mounts available.
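Whether the restart really proceeds one node at a time depends on the DaemonSet's update strategy, so it can be worth checking before you roll out. The sketch below reads the strategy from the live object; daemonset_update_strategy is a hypothetical helper name.

```shell
# Sketch: print the update strategy type and maxUnavailable, tab-separated.
# For an OnDelete strategy the second field is empty, because there is no
# rollingUpdate block to read.
daemonset_update_strategy() {
  kubectl get daemonset klustre-csi-node -n klustre-system \
    -o jsonpath='{.spec.updateStrategy.type}{"\t"}{.spec.updateStrategy.rollingUpdate.maxUnavailable}{"\n"}'
}
```

A result of RollingUpdate with maxUnavailable 1 corresponds to the one-node-at-a-time behavior described above.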

4. Coordinate with Kubernetes upgrades

When upgrading kubelet:

Follow the node maintenance checklist for each node.
Upgrade the node OS/kubelet.
Verify the daemonset pod recreates successfully before moving to the next node.
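The per-node sequence above can be sketched as a loop that stops at the first failure. This is an illustration under assumptions: upgrade_node is a hypothetical hook you must define yourself (it stands in for your OS/kubelet upgrade procedure), and the 15-minute Ready timeout is an arbitrary choice.

```shell
# Sketch: run the maintenance checklist for one node.
# upgrade_node is a hypothetical hook; define it to perform your own upgrade.
maintain_node() {
  target="$1"
  kubectl cordon "$target" &&
  kubectl drain "$target" --ignore-daemonsets --delete-emptydir-data &&
  upgrade_node "$target" &&
  kubectl wait --for=condition=Ready "node/$target" --timeout=15m &&
  kubectl uncordon "$target" &&
  kubectl rollout status daemonset/klustre-csi-node -n klustre-system
}

# Process nodes one at a time and stop as soon as one of them fails.
maintain_nodes() {
  for n in "$@"; do
    maintain_node "$n" || { echo "stopping: $n failed" >&2; return 1; }
  done
}
```

After defining upgrade_node, call maintain_nodes <node-a> <node-b> to work through the nodes in order. A failed node is left cordoned so you can inspect it before continuing.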

5. Validate workloads

Spot-check pods that rely on Lustre PVCs: kubectl exec into them and run df -h against the mount path (for example, df -h /mnt/lustre).
Ensure no stale FailedMount events exist.
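To look for FailedMount events, you can filter events by reason instead of scanning kubectl describe output. A minimal sketch, with failed_mount_events as a hypothetical helper name:

```shell
# Sketch: list FailedMount events in every namespace, oldest first.
# Events expire after a retention period, so this only shows recent failures.
failed_mount_events() {
  kubectl get events --all-namespaces --field-selector reason=FailedMount --sort-by=.lastTimestamp
}
```

Compare the timestamps with the time of your rollout: events from before the upgrade finished may simply be leftovers, while newer ones point at a pod that still cannot mount.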

Rollback

If the new version misbehaves:

Revert nodeImage and related settings (or the Helm image.tag values) to the previous tag, then reapply the manifests or rerun helm upgrade.
Run kubectl rollout restart daemonset/klustre-csi-node -n klustre-system.
Inspect logs to confirm the old version is running.
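Besides the logs, the image reference on each pod shows which version is running. The sketch below prints every pod in the klustre-system namespace with its container images; running_images is a hypothetical helper name.

```shell
# Sketch: print "<pod name><TAB><container images>" for each pod in klustre-system.
# Multiple containers in one pod appear space-separated on the same line.
running_images() {
  kubectl get pods -n klustre-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}
```

After a rollback, every klustre-csi-node pod should list the previous tag.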