Use these task guides when you need to change cluster settings, roll out new plugin versions, or troubleshoot node issues. Each page focuses on one repeatable operation so you can jump straight to the steps you need.
Operations
- 1: Nodes and Volumes
- 1.1: Nodes
- 1.1.1: Node preparation
- 1.1.2: Node integration flow
- 1.2: Volumes
- 1.2.1: Static PV workflow
- 1.2.2: Volume attributes and mount options
- 2: Maintenance and Upgrade
- 2.1: Node maintenance checklist
- 2.2: Upgrade guide
- 3: Label Lustre-capable nodes
- 4: Update the Klustre CSI image
- 5: Collect diagnostics
1 - Nodes and Volumes
Learn about node prerequisites, kubelet integration, and how static Lustre volumes are represented in Kubernetes.
1.1 - Nodes
Klustre CSI only schedules on nodes that can mount Lustre exports. Use the topics below to prepare those nodes, understand what the daemonset mounts from the host, and keep kubelet integration healthy.
1.1.1 - Node preparation
Install the Lustre client stack
Every node that runs Lustre-backed pods must have:
- mount.lustre and umount.lustre binaries (via the lustre-client RPM/DEB).
- Kernel modules compatible with your Lustre servers.
- Network reachability to the Lustre MGS/MDS/OSS endpoints.
Verify installation:
mount.lustre --version
lsmod | grep lustre
Label nodes
The default storage class and daemonset use the label lustre.csi.klustrefs.io/lustre-client=true.
kubectl label nodes <node-name> lustre.csi.klustrefs.io/lustre-client=true
Remove the label when a node no longer has Lustre access:
kubectl label nodes <node-name> lustre.csi.klustrefs.io/lustre-client-
Allow privileged workloads
Klustre CSI pods require:
- privileged: true, allowPrivilegeEscalation: true
- hostPID: true, hostNetwork: true
- HostPath mounts for /var/lib/kubelet, /dev, /sbin, /usr/sbin, /lib, and /lib64
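As a rough orientation, these requirements typically show up in the node daemonset's pod spec like the sketch below. This is not the shipped manifest: the volume names (kubelet-dir, host-sbin) are placeholders, and only two of the host mounts are shown.
# Sketch only: security settings and host mounts from the list above.
spec:
  hostPID: true
  hostNetwork: true
  containers:
    - name: klustre-csi
      securityContext:
        privileged: true
        allowPrivilegeEscalation: true
      volumeMounts:
        - name: kubelet-dir
          mountPath: /var/lib/kubelet
        - name: host-sbin
          mountPath: /sbin
  volumes:
    - name: kubelet-dir
      hostPath:
        path: /var/lib/kubelet
    - name: host-sbin
      hostPath:
        path: /sbin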
Label the namespace with Pod Security Admission overrides:
kubectl create namespace klustre-system
kubectl label namespace klustre-system \
pod-security.kubernetes.io/enforce=privileged \
pod-security.kubernetes.io/audit=privileged \
pod-security.kubernetes.io/warn=privileged
Maintain consistency
- Keep AMIs or OS images in sync so every node has the same Lustre client version.
- If you use autoscaling groups, bake the client packages into your node image or run a bootstrap script before kubelet starts.
- Automate label management with infrastructure-as-code (e.g., Cluster API, Ansible) so the right nodes receive the lustre-client=true label on join/leave events.
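If you join nodes with kubeadm-based tooling such as Cluster API, one option is to have kubelet apply the label at join time. The snippet below is only a sketch: the template name is illustrative and the API version may differ between Cluster API releases.
# Sketch: KubeadmConfigTemplate excerpt that labels Lustre-capable workers at join.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: lustre-workers
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: lustre.csi.klustrefs.io/lustre-client=true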
1.1.2 - Node integration flow
Daemonset host mounts
DaemonSet/klustre-csi-node mounts the following host paths:
- /var/lib/kubelet/plugins and /var/lib/kubelet/pods – required for CSI socket registration and mount propagation.
- /dev – ensures device files (if any) are accessible when mounting Lustre.
- /sbin, /usr/sbin, /lib, /lib64 – expose the host's Lustre client binaries and libraries to the container.
If your kubelet uses custom directories, update pluginDir and registrationDir in the settings ConfigMap.
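For example, a kubelet running with a non-default root directory might be reflected in the settings ConfigMap like this. The key names come from this guide; the paths are illustrative and should match your kubelet's actual directories.
# Sketch: settings overrides for a kubelet rooted at /data/kubelet (example path).
apiVersion: v1
kind: ConfigMap
metadata:
  name: klustre-csi-settings
  namespace: klustre-system
data:
  pluginDir: /data/kubelet/plugins/lustre.csi.klustrefs.io
  registrationDir: /data/kubelet/plugins_registry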
CSI socket lifecycle
- The node plugin listens on csiEndpoint (defaults to /var/lib/kubelet/plugins/lustre.csi.klustrefs.io/csi.sock).
- The node-driver-registrar sidecar registers that socket with kubelet via registrationDir.
- Kubelet uses the UNIX socket to call NodePublishVolume and NodeUnpublishVolume when pods mount or unmount PVCs.
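To confirm that registration succeeded on a particular node, you can inspect its CSINode object, which lists the drivers kubelet has registered (node name is a placeholder):
kubectl get csinode <node-name> -o jsonpath='{.spec.drivers[*].name}'
# Expect lustre.csi.klustrefs.io to appear in the output.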
If the daemonset does not come up or kubelet cannot reach the socket, run:
kubectl describe daemonset klustre-csi-node -n klustre-system
kubectl logs -n klustre-system daemonset/klustre-csi-node -c klustre-csi
PATH and library overrides
The containers inherit PATH and LD_LIBRARY_PATH values that point at the host bind mounts. If your Lustre client lives elsewhere, override:
- nodePlugin.pathEnv
- nodePlugin.ldLibraryPath
via Helm values or by editing the daemonset manifest.
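For example, a Helm values override for a client installed under /opt/lustre might look like the sketch below; the paths are illustrative and should point at wherever your Lustre binaries and libraries actually live.
# Sketch: values.yaml override for a non-standard Lustre client location.
nodePlugin:
  pathEnv: /opt/lustre/sbin:/usr/sbin:/sbin:/usr/bin:/bin
  ldLibraryPath: /opt/lustre/lib64:/lib64:/usr/lib64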
Health signals
- Kubernetes events referencing lustre.csi.klustrefs.io indicate mount/unmount activity.
- kubectl get pods -n klustre-system -o wide should show one pod per labeled node.
- A missing pod usually means the node label is absent or taints/tolerations are mismatched.
1.2 - Volumes
Klustre CSI focuses on static provisioning: you point a PV at an existing Lustre export, bind it to a PVC, and mount it into pods. Explore the topics below for the manifest workflow and mount attribute details.
1.2.1 - Static PV workflow
1. Create the PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-static-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: lustre.csi.klustrefs.io
    volumeHandle: lustre-static-pv
    volumeAttributes:
      source: 10.0.0.1@tcp0:/lustre-fs
      mountOptions: flock,user_xattr
- volumeHandle just needs to be unique within the cluster; it is not used by the Lustre backend.
- volumeAttributes.source carries the Lustre management target and filesystem path.
2. Bind with a PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-static-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  volumeName: lustre-static-pv
  resources:
    requests:
      storage: 10Gi
Even though Lustre capacity is managed outside Kubernetes, the storage field should match the PV so the binder succeeds.
3. Mount from workloads
volumes:
  - name: lustre
    persistentVolumeClaim:
      claimName: lustre-static-pvc
containers:
  - name: app
    image: busybox
    volumeMounts:
      - name: lustre
        mountPath: /mnt/lustre
Multiple pods can reference the same PVC because Lustre supports ReadWriteMany. Pods must schedule on labeled nodes (lustre.csi.klustrefs.io/lustre-client=true).
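As an illustration of that sharing, a multi-replica Deployment can mount the same claim from every replica. The Deployment name, labels, and image below are placeholders; only the claimName ties it to the PVC above.
# Sketch: two replicas mounting the same RWX claim; pods must still land on labeled nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lustre-readers
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lustre-readers
  template:
    metadata:
      labels:
        app: lustre-readers
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-static-pvc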
4. Cleanup
Deleting the PVC releases the volume, but the PV remains because the reclaim policy is Retain. Manually delete the PV when you no longer need it.
1.2.2 - Volume attributes and mount options
volumeAttributes
| Key | Example | Purpose |
|---|---|---|
| source | 10.0.0.1@tcp0:/lustre-fs | Host(s) and filesystem path given to mount.lustre. |
| mountOptions | flock,user_xattr | Comma-separated Lustre mount flags. |
Additional keys (e.g., subdir) can be added in the future; the driver simply passes the map to the Lustre helper script.
Storage class tuning
See the storage class reference for details on:
- allowedTopologies – keep workloads on nodes with the Lustre label.
- reclaimPolicy – typically Retain for static PVs.
- mountOptions – defaults to flock and user_xattr, but you can add noatime and other Lustre mount flags.
Override mount options per volume by setting volumeAttributes.mountOptions. This is useful when a subset of workloads needs different locking semantics.
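For instance, a read-mostly PV could override the defaults like this; the volumeHandle and option set are illustrative.
# Sketch: csi block of a PV that overrides the storage class mount options.
csi:
  driver: lustre.csi.klustrefs.io
  volumeHandle: lustre-scratch-ro
  volumeAttributes:
    source: 10.0.0.1@tcp0:/lustre-fs
    mountOptions: ro,noatime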
Access modes
- Use ReadWriteMany for shared Lustre volumes.
- ReadOnlyMany is supported when you only need read access.
- ReadWriteOnce offers no benefit with Lustre; prefer RWX.
Lifecycle reminders
- Klustre CSI does not provision or delete Lustre exports. Ensure the server-side directory exists and has the correct permissions.
- Kubernetes capacity values are advisory. Quotas should be enforced on the Lustre server.
- PersistentVolumeReclaimPolicy=Retain keeps PVs around after PVC deletion; clean them up manually to avoid dangling objects.
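A quick way to spot and remove those leftovers (the PV name here is the one from the static example above):
# List PVs and look for ones stuck in Released after their PVC was removed.
kubectl get pv
# Delete retained PVs you no longer need.
kubectl delete pv lustre-static-pv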
2 - Maintenance and Upgrade
Patch nodes, rotate images, or upgrade the CSI plugin without interrupting workloads.
Select the maintenance guide you need from the navigation—node checklist, upgrade plan, and future topics all live underneath this page.
2.1 - Node maintenance checklist
1. Cordon and drain
kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
Because klustre-csi-node is a DaemonSet, kubectl drain skips it when you pass --ignore-daemonsets; draining still moves your other workloads off the node before the reboot.
2. Verify daemonset status
kubectl get pods -n klustre-system -o wide | grep <node>
Expect the daemonset pod to keep running through the drain, terminate during the reboot, and be recreated once the node returns.
3. Patch or reboot the node
- Apply OS updates, reboot, or swap hardware as needed.
- Ensure the Lustre client packages remain installed (validate with mount.lustre --version).
4. Uncordon and relabel if necessary
kubectl uncordon <node>
If the node lost the lustre.csi.klustrefs.io/lustre-client=true label, reapply it after verifying Lustre connectivity.
5. Watch for daemonset rollout
kubectl rollout status daemonset/klustre-csi-node -n klustre-system
6. Confirm workloads recover
Use kubectl get pods for namespaces that rely on Lustre PVCs to ensure pods are running and mounts succeeded.
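For example (namespace, pod name, and mount path are placeholders based on the earlier examples):
kubectl get pods -n <workload-namespace> -o wide
kubectl exec -n <workload-namespace> <pod-name> -- df -h /mnt/lustre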
Tips
- For large clusters, drain one Lustre node at a time to keep mounts available.
- If kubectl drain hangs due to pods using Lustre PVCs, identify them with kubectl get pods --all-namespaces -o wide | grep <node> and evict them manually.
2.2 - Upgrade guide
1. Review release notes
Check the klustre-csi-plugin GitHub releases for breaking changes, minimum Kubernetes versions, and image tags.
2. Update the image reference
- Helm users: bump image.tag and nodePlugin.registrar.image.tag in your values file, then run helm upgrade (see the example after this list).
- Manifest users: edit manifests/configmap-klustre-csi-settings.yaml (nodeImage, registrarImage) and reapply the manifests.
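A hedged sketch of the Helm path follows; the release name, chart reference, and tags are placeholders, while the value keys come from the list above.
helm upgrade <release-name> <chart-ref> -n klustre-system \
  --reuse-values \
  --set image.tag=<new-tag> \
  --set nodePlugin.registrar.image.tag=<new-registrar-tag>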
See Update the node daemonset image for detailed steps.
3. Roll out sequentially
kubectl rollout restart daemonset/klustre-csi-node -n klustre-system
kubectl rollout status daemonset/klustre-csi-node -n klustre-system
The daemonset restarts one node at a time, keeping existing mounts available.
4. Coordinate with Kubernetes upgrades
When upgrading kubelet:
- Follow the node maintenance checklist for each node.
- Upgrade the node OS/kubelet.
- Verify the daemonset pod recreates successfully before moving to the next node.
5. Validate workloads
- Spot-check pods that rely on Lustre PVCs (kubectl exec into them and run df -h /mnt/lustre).
- Ensure no stale FailedMount events exist.
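One way to check for those events across all namespaces, newest last:
kubectl get events -A --field-selector reason=FailedMount --sort-by=.lastTimestamp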
Rollback
If the new version misbehaves:
- Revert nodeImage and related settings to the previous tag.
- Run kubectl rollout restart daemonset/klustre-csi-node -n klustre-system.
- Inspect logs to confirm the old version is running.
3 - Label Lustre-capable nodes
The default klustre-csi-static storage class restricts scheduling to nodes labeled lustre.csi.klustrefs.io/lustre-client=true. Use this runbook whenever you add or remove nodes from the Lustre client pool.
Requirements
- Cluster-admin access with kubectl.
- Nodes already have the Lustre client packages installed and can reach your Lustre servers.
Steps
Identify nodes that can mount Lustre
kubectl get nodes -o wide
Cross-reference with your infrastructure inventory or automation outputs to find the node names that have Lustre connectivity.
Apply the label
kubectl label nodes <node-name> lustre.csi.klustrefs.io/lustre-client=true
Repeat for each eligible node. Use --overwrite if the label already exists but the value should change.
Verify
kubectl get nodes -L lustre.csi.klustrefs.io/lustre-client
Ensure only the nodes with Lustre access show true. Remove the label from nodes that lose access:
kubectl label nodes <node-name> lustre.csi.klustrefs.io/lustre-client-
Confirm DaemonSet placement
kubectl get pods -n klustre-system -o wide \
  -l app.kubernetes.io/name=klustre-csi
Pods from the klustre-csi-node daemonset should exist only on labeled nodes. If you see pods on unlabeled nodes, check the nodeSelector and tolerations in the daemonset spec.
4 - Update the Klustre CSI image
Use this guide to bump the Klustre CSI image version (for example, when adopting a new release).
Requirements
- Cluster-admin access.
- The new image is pushed to a registry reachable by your cluster (GHCR or a mirror).
- The ghcr-secret or equivalent image pull secret already contains credentials for the registry.
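If the pull secret does not exist yet, it can be created like this; the username and token are placeholders, and ghcr.io matches the registry mentioned above.
kubectl create secret docker-registry ghcr-secret \
  -n klustre-system \
  --docker-server=ghcr.io \
  --docker-username=<github-username> \
  --docker-password=<personal-access-token>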
Steps
Edit the settings ConfigMap
The manifests and Helm chart both reference ConfigMap/klustre-csi-settings. Update the nodeImage key with the new tag:
kubectl -n klustre-system edit configmap klustre-csi-settings
Example snippet:
data:
  nodeImage: ghcr.io/klustrefs/klustre-csi-plugin:0.1.2
Save and exit.
Restart the daemonset pods
kubectl rollout restart daemonset/klustre-csi-node -n klustre-system
Watch the rollout
kubectl rollout status daemonset/klustre-csi-node -n klustre-system
kubectl get pods -n klustre-system -o wide
Verify the running image
kubectl get pods -n klustre-system -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].image}{"\n"}{end}'
Confirm all pods now report the new tag.
Optional: clean up old images
If you mirror images, remove unused tags from your registry or automation as needed.
5 - Collect diagnostics
When reporting an issue, provide the following artifacts so maintainers can reproduce the problem.
1. Capture pod logs
kubectl logs -n klustre-system daemonset/klustre-csi-node -c klustre-csi --tail=200 > klustre-csi.log
kubectl logs -n klustre-system daemonset/klustre-csi-node -c node-driver-registrar --tail=200 > node-driver-registrar.log
If a specific pod is failing, target it directly:
kubectl logs -n klustre-system <pod-name> -c klustre-csi --previous
2. Describe pods and daemonset
kubectl describe daemonset klustre-csi-node -n klustre-system > klustre-csi-daemonset.txt
kubectl describe pods -n klustre-system -l app.kubernetes.io/name=klustre-csi > klustre-csi-pods.txt
3. Export relevant resources
kubectl get csidriver lustre.csi.klustrefs.io -o yaml > csidriver.yaml
kubectl get storageclass klustre-csi-static -o yaml > storageclass.yaml
kubectl get configmap klustre-csi-settings -n klustre-system -o yaml > configmap.yaml
Remove sensitive data (e.g., registry credentials) before sharing.
4. Include node information
- Output of uname -a, lsmod | grep lustre, and the Lustre client version on affected nodes.
- Whether the node can reach your Lustre servers (share ping or mount.lustre command output if available).
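One way to gather that on an affected node (the output file name is arbitrary):
uname -a > node-info.txt
lsmod | grep lustre >> node-info.txt
mount.lustre --version >> node-info.txt 2>&1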
5. Bundle and share
Package the files into an archive and attach it to your GitHub issue or support request:
tar czf klustre-diagnostics.tgz klustre-csi.log node-driver-registrar.log \
klustre-csi-daemonset.txt klustre-csi-pods.txt csidriver.yaml storageclass.yaml configmap.yaml