Advanced Installation

Deep dive into Klustre CSI prerequisites and install methods.

Ready to customize your deployment? Use the pages in this section for the full checklist, installation methods, and platform-specific notes. Requirements always come first; after that, pick either manifests or Helm, then dive into the environment-specific guidelines.

1 - Kind Quickstart

Stand up a local Kind cluster, simulate a Lustre client, and exercise Klustre CSI Plugin without touching production clusters.

This walkthrough targets Linux hosts running Docker or Podman, since Kind nodes are containers that share the host kernel. macOS and Windows hosts cannot load the kernel modules Lustre requires, but you can still observe the driver boot sequence there. The shim below fakes mount.lustre with tmpfs so you can run the end-to-end demo locally.

Requirements

  • Docker 20.10+ (or a compatible container runtime supported by Kind).
  • Kind v0.20+.
  • kubectl v1.27+ pointed at your Kind context.
  • A GitHub personal access token with read:packages if you plan to pull images from GitHub Container Registry via an image pull secret (optional but recommended).

1. Create a Kind cluster

Save the following Kind configuration and create the cluster:

cat <<'EOF' > kind-klustre.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.2
- role: worker
  image: kindest/node:v1.29.2
EOF

kind create cluster --name klustre-kind --config kind-klustre.yaml
kubectl cluster-info --context kind-klustre-kind

2. Install a Lustre shim inside the nodes

The CSI plugin shells out to mount.lustre and umount.lustre. Kind nodes do not ship with the Lustre client, so we create lightweight shims that mount a tmpfs and behave like a Lustre mount. This allows the volume lifecycle to complete even though no real Lustre server exists.

cat <<'EOF' > lustre-shim.sh
#!/bin/bash
set -euo pipefail
# Accept the same positional arguments as mount.lustre: <source> <target> [options...].
SOURCE="${1:-tmpfs}"
TARGET="${2:-/mnt/lustre}"
shift 2 || true
mkdir -p "$TARGET"
# Nothing to do if the target is already mounted.
if mountpoint -q "$TARGET"; then
  exit 0
fi
# The Lustre source is ignored; the shim simply mounts a tmpfs at the target.
mount -t tmpfs -o size=512m tmpfs "$TARGET"
EOF

cat <<'EOF' > lustre-unmount.sh
#!/bin/bash
set -euo pipefail
TARGET="${1:?target path required}"
umount "$TARGET"
EOF
chmod +x lustre-shim.sh lustre-unmount.sh

for node in $(kind get nodes --name klustre-kind); do
  docker cp lustre-shim.sh "$node":/usr/sbin/mount.lustre
  docker cp lustre-unmount.sh "$node":/usr/sbin/umount.lustre
  docker exec "$node" chmod +x /usr/sbin/mount.lustre /usr/sbin/umount.lustre
done

3. Prepare node labels

Label the Kind worker node so it is eligible to run Lustre workloads:

kubectl label node klustre-kind-worker lustre.csi.klustrefs.io/lustre-client=true

The default klustre-csi-static storage class uses the label above inside allowedTopologies. Label any node that will run workloads needing Lustre.
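
For reference, the stanza in the installed StorageClass has roughly the shape below; this is a sketch based on the label above, so confirm the exact contents with kubectl get storageclass klustre-csi-static -o yaml:

allowedTopologies:
  - matchLabelExpressions:
      - key: lustre.csi.klustrefs.io/lustre-client
        values:
          - "true"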

4. Deploy Klustre CSI Plugin

Install the driver into the Kind cluster using the published Kustomize manifests:

export KLUSTREFS_VERSION=main
kubectl apply -k "github.com/klustrefs/klustre-csi-plugin//manifests?ref=$KLUSTREFS_VERSION"

Then watch the pods come up:

kubectl get pods -n klustre-system -o wide

Wait for the daemonset rollout to complete:

kubectl rollout status daemonset/klustre-csi-node -n klustre-system --timeout=120s

Wait until the klustre-csi-node daemonset shows READY pods on the control-plane and worker nodes.
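
You can also confirm that the CSIDriver object installed by the manifests is registered; its name matches the driver referenced by the demo PersistentVolume in the next step:

kubectl get csidriver lustre.csi.klustrefs.io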

5. Mount the simulated Lustre share

Create a demo manifest that provisions a static PersistentVolume and a BusyBox deployment. Because the mount.lustre shim mounts tmpfs, data lives in the worker node's memory and disappears when the pod restarts. The source string is only metadata in this lab; replace it with the Lustre target you plan to use later.

Create the demo manifest with a heredoc:

cat <<'EOF' > lustre-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-demo-pv
spec:
  storageClassName: klustre-csi-static
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  csi:
    driver: lustre.csi.klustrefs.io
    volumeHandle: lustre-demo
    volumeAttributes:
      # This is only metadata in the Kind lab; replace with a real target for production clusters.
      source: 10.0.0.1@tcp0:/lustre-fs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-demo-pvc
spec:
  storageClassName: klustre-csi-static
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
  volumeName: lustre-demo-pv
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lustre-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lustre-demo
  template:
    metadata:
      labels:
        app: lustre-demo
    spec:
      containers:
        - name: demo
          image: busybox:1.36
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-demo-pvc
EOF

Apply the demo manifest:

kubectl apply -f lustre-demo.yaml

Wait for the demo deployment to become available:

kubectl wait --for=condition=available deployment/lustre-demo --timeout=120s

Confirm the Lustre (tmpfs) mount is visible in the pod:

kubectl exec deploy/lustre-demo -- df -h /mnt/lustre

Write and read back a test file:

kubectl exec deploy/lustre-demo -- sh -c 'echo "hello from $(hostname)" > /mnt/lustre/hello.txt'
kubectl exec deploy/lustre-demo -- cat /mnt/lustre/hello.txt

You should see the tmpfs mount reported by df and be able to write temporary files.

6. Clean up (optional)

Remove the demo PV, PVC, and Deployment:

kubectl delete -f lustre-demo.yaml

If you want to tear down the Kind environment as well:

kubectl delete namespace klustre-system
kind delete cluster --name klustre-kind
rm kind-klustre.yaml lustre-shim.sh lustre-unmount.sh lustre-demo.yaml

Troubleshooting

  • If the daemonset pods crash with ImagePullBackOff, use kubectl describe daemonset/klustre-csi-node -n klustre-system and kubectl logs daemonset/klustre-csi-node -n klustre-system -c klustre-csi to inspect the error. The image is public on ghcr.io, so no image pull secret is required; ensure your nodes can reach ghcr.io (or your proxy) from inside the cluster.
  • If the demo pod fails to mount /mnt/lustre, make sure the shim scripts were copied to every Kind node and are executable; a quick spot check is shown after this list. You can rerun the docker cp ... mount.lustre / umount.lustre loop from step 2 after adding or recreating nodes.
  • Remember that tmpfs lives in RAM. Large writes in the demo workload consume memory inside the Kind worker container and disappear after pod restarts. Move to a real Lustre environment for persistent data testing.
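
To confirm the shims are in place, you can spot-check every node with the same tooling used in step 2:

for node in $(kind get nodes --name klustre-kind); do
  docker exec "$node" ls -l /usr/sbin/mount.lustre /usr/sbin/umount.lustre
done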

Use this local experience to get familiar with the manifests and volume lifecycle, then follow the main Introduction guide when you are ready to operate against real Lustre backends.

2 - Amazon EKS Notes

Outline for deploying Klustre CSI Plugin on managed Amazon EKS clusters backed by Lustre (FSx or self-managed).

The AWS-oriented quickstart is under construction. It will cover:

  • Preparing EKS worker nodes with the Lustre client (either Amazon Linux extras or the FSx-provided packages).
  • Handling IAM roles for service accounts (IRSA) and pulling container images from GitHub Container Registry.
  • Connecting to FSx for Lustre file systems (imported or linked to S3 buckets) and exposing them via static PersistentVolumes.

Until the full write-up lands, adapt the Introduction flow by:

  1. Installing the Lustre client on your managed node groups (e.g., with yum install lustre-client in your AMI or through user data).
  2. Labeling the nodes that have Lustre access with lustre.csi.klustrefs.io/lustre-client=true.
  3. Applying the Klustre CSI manifests or Helm chart in the klustre-system namespace.
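
Putting those steps together, a rough sketch looks like this (assuming Amazon Linux node groups where the lustre-client package is available, and using <node-name> as a placeholder for each Lustre-capable node):

# On each Lustre-capable node (baked into the AMI or run via user data):
sudo yum install -y lustre-client

# From a machine with cluster-admin access:
kubectl label node <node-name> lustre.csi.klustrefs.io/lustre-client=true
export KLUSTREFS_VERSION=main
kubectl apply -k "github.com/klustrefs/klustre-csi-plugin//manifests?ref=$KLUSTREFS_VERSION"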

Feedback on which AWS-specific topics matter most (FSx throughput tiers, PrivateLink, IAM policies, etc.) is welcome in the community discussions.

3 - Bare Metal Notes

Notes for operators preparing self-managed clusters before following the main introduction flow.

This guide will describe how to prepare on-prem or colocation clusters where you manage the operating systems directly (kernel modules, Lustre packages, kubelet paths, etc.). While the detailed walkthrough is in progress, you can already follow the general Introduction page and keep the following considerations in mind:

  • Ensure every node that should host Lustre-backed pods has the Lustre client packages installed via your distribution’s package manager (for example, lustre-client RPM/DEB).
  • Label those nodes with lustre.csi.klustrefs.io/lustre-client=true.
  • Set the Pod Security admission level of the klustre-system namespace to privileged (e.g., with the label pod-security.kubernetes.io/enforce=privileged), because the daemonset requires hostPID, hostNetwork, and SYS_ADMIN.
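
A minimal way to do that, assuming the namespace already exists (for example after applying the manifests or the Helm chart), is to label it directly:

kubectl label namespace klustre-system \
  pod-security.kubernetes.io/enforce=privileged --overwrite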

If you are interested in helping us document more advanced configurations (multiple interfaces, bonded networks, RDMA, etc.), please open an issue or discussion in the GitHub repository.

4 - Install with Helm

Deploy the Klustre CSI plugin using the OCI-distributed Helm chart.

The Helm chart is published under oci://ghcr.io/klustrefs/charts/klustre-csi-plugin.

1. Authenticate (optional)

If you use a GitHub personal access token for GHCR, run the login command and enter the token when prompted for a password:

helm registry login ghcr.io -u <github-user>

Skip this step if anonymous pulls are permitted in your environment.
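
If you prefer a non-interactive login (for example in CI), you can pipe the token in instead; this sketch assumes it is exported as GHCR_TOKEN:

echo "$GHCR_TOKEN" | helm registry login ghcr.io -u <github-user> --password-stdin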

2. Install or upgrade

helm upgrade --install klustre-csi \
  oci://ghcr.io/klustrefs/charts/klustre-csi-plugin \
  --version 0.1.1 \
  --namespace klustre-system \
  --create-namespace \
  --set 'imagePullSecrets[0].name=ghcr-secret'

Adjust the release name, namespace, and imagePullSecrets as needed. You can omit the secret if GHCR is reachable without credentials.
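
If you do need credentials, create the referenced secret in the release namespace before the first rollout; the name ghcr-secret simply mirrors the --set flag above:

kubectl create namespace klustre-system   # skip if the namespace already exists
kubectl create secret docker-registry ghcr-secret \
  --namespace klustre-system \
  --docker-server=ghcr.io \
  --docker-username=<github-user> \
  --docker-password=<github-token>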

3. Override values

Common overrides:

  • nodePlugin.logLevel – adjust verbosity (debug, info, etc.).
  • nodePlugin.pluginDir, nodePlugin.kubeletRegistrationPath – change if /var/lib/kubelet differs on your hosts.
  • storageClass.mountOptions – add Lustre mount flags such as flock or user_xattr.
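
For example, a values file combining these overrides could look like the sketch below; the key names follow the list above, so double-check them against helm show values before relying on them:

# my-values.yaml
nodePlugin:
  logLevel: debug
storageClass:
  mountOptions:
    - flock
    - user_xattr

Pass the file to the install or upgrade command:

helm upgrade --install klustre-csi \
  oci://ghcr.io/klustrefs/charts/klustre-csi-plugin \
  --version 0.1.1 \
  --namespace klustre-system \
  -f my-values.yaml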

View the full schema:

helm show values oci://ghcr.io/klustrefs/charts/klustre-csi-plugin --version 0.1.1

4. Check status

kubectl get pods -n klustre-system
helm status klustre-csi -n klustre-system

When pods are ready, continue with the validation instructions or deploy a workload that uses the Lustre-backed storage class.

5 - Install with kubectl/manifests

Apply the published Klustre CSI manifests with kubectl.

1. Install directly with Kustomize (no clone)

If you just want a default install, you don’t need to clone the repository. You can apply the published manifests directly from GitHub:

export KLUSTREFS_VERSION=main
kubectl apply -k "github.com/klustrefs/klustre-csi-plugin//manifests?ref=$KLUSTREFS_VERSION"

The manifests/ directory includes the namespace, RBAC, CSIDriver, daemonset, node service account, default StorageClass (klustre-csi-static), and settings config map.

2. Install from a local checkout (optional)

If you plan to inspect or customize the manifests, clone the repository and work from a local checkout:

git clone https://github.com/klustrefs/klustre-csi-plugin.git
cd klustre-csi-plugin

You can perform the same default install from the local checkout:

kubectl apply -k manifests

3. Customize with a Kustomize overlay (optional)

To change defaults such as logLevel, nodeImage, or the CSI endpoint path without editing the base files, create a small overlay that patches the settings config map.

Create overlays/my-cluster/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../manifests

patchesStrategicMerge:
  - configmap-klustre-csi-settings-patch.yaml

Create overlays/my-cluster/configmap-klustre-csi-settings-patch.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: klustre-csi-settings
  namespace: klustre-system
data:
  logLevel: debug
  nodeImage: ghcr.io/klustrefs/klustre-csi-plugin:0.1.1

Then apply your overlay instead of the base:

kubectl apply -k overlays/my-cluster

You can add additional patches in the overlay (for example, to tweak the daemonset or StorageClass) as your cluster needs grow.
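
For example, if you want the node plugin daemonset to run only on the nodes you labeled as Lustre-capable, you could add a patch like the sketch below and list it under patchesStrategicMerge next to the existing entry (an illustration only; check the base daemonset first, since it may already carry a similar selector):

# overlays/my-cluster/daemonset-klustre-csi-node-patch.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: klustre-csi-node
  namespace: klustre-system
spec:
  template:
    spec:
      nodeSelector:
        lustre.csi.klustrefs.io/lustre-client: "true"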

4. Verify rollout

kubectl get pods -n klustre-system -o wide
kubectl describe daemonset klustre-csi-node -n klustre-system
kubectl logs daemonset/klustre-csi-node -n klustre-system -c klustre-csi

After the daemonset is healthy on all Lustre-capable nodes, continue with the validation steps or jump to the sample workload.