Tutorials
End-to-end scenarios that combine Klustre CSI with real workloads.
Follow these guides when you want more than a single command—each tutorial walks through a complete workflow that exercises Klustre CSI Plugin alongside common Kubernetes patterns.
Open an issue in github.com/klustrefs/klustre-csi-plugin if you’d like to see another workflow documented.
1 - Static Lustre volume demo
Create a static Lustre-backed PersistentVolume, bind it to a claim, and mount it from a BusyBox deployment.
Use these snippets as starting points for demos, CI smoke tests, or reproduction cases when you report issues.
Static Lustre volume demo
Save the following manifest as lustre-demo.yaml. It creates a static PV/PVC pointing at 10.0.0.1@tcp0:/lustre-fs and mounts it in a BusyBox deployment; update volumeAttributes.source to match your Lustre target.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-static-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: lustre.csi.klustrefs.io
    volumeHandle: lustre-static-pv
    volumeAttributes:
      source: 10.0.0.1@tcp0:/lustre-fs
      mountOptions: flock,user_xattr
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-static-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  resources:
    requests:
      storage: 10Gi
  volumeName: lustre-static-pv
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lustre-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lustre-demo
  template:
    metadata:
      labels:
        app: lustre-demo
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: lustre-share
              mountPath: /mnt/lustre
      volumes:
        - name: lustre-share
          persistentVolumeClaim:
            claimName: lustre-static-pvc
Validate
kubectl apply -f lustre-demo.yaml
kubectl exec deploy/lustre-demo -- df -h /mnt/lustre
kubectl exec deploy/lustre-demo -- sh -c 'echo "hello $(date)" > /mnt/lustre/hello.txt'
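To confirm the write landed on the share, read the file back and list the mount from inside the pod (both commands assume the lustre-demo deployment above is running):
kubectl exec deploy/lustre-demo -- cat /mnt/lustre/hello.txt
kubectl exec deploy/lustre-demo -- sh -c 'mount | grep /mnt/lustre'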
Delete when finished:
kubectl delete -f lustre-demo.yaml
Pod-level health probe
Use a simple read/write loop to verify Lustre connectivity inside a pod:
apiVersion: v1
kind: Pod
metadata:
  name: lustre-probe
spec:
  containers:
    - name: probe
      image: busybox
      command: ["sh", "-c", "while true; do date >> /mnt/lustre/probe.log && tail -n1 /mnt/lustre/probe.log; sleep 30; done"]
      volumeMounts:
        - name: lustre-share
          mountPath: /mnt/lustre
  volumes:
    - name: lustre-share
      persistentVolumeClaim:
        claimName: lustre-static-pvc
Run kubectl logs pod/lustre-probe -f to inspect the periodic writes.
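To confirm the entries are persisted on the share and not just echoed to stdout, tail the log file itself; the timestamps should advance roughly every 30 seconds:
kubectl exec lustre-probe -- tail -n 3 /mnt/lustre/probe.log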
Where to find more
- manifests/ directory in the GitHub repo for installation YAML.
- Kind quickstart for a self-contained lab, including the shim scripts used to emulate Lustre mounts.
2 - Share a dataset between prep and training jobs
Provision a Lustre-backed scratch space, populate it, and consume it from a training deployment.
This tutorial wires a simple data pipeline together:
- Create a static Lustre PersistentVolume and bind it to a PersistentVolumeClaim.
- Run a data-prep job that writes artifacts into the Lustre mount.
- Start a training deployment that reads the prepared data.
- Validate shared access and clean up.
Requirements
- Klustre CSI Plugin installed and verified (see the Introduction).
- An existing Lustre export, e.g., 10.0.0.1@tcp0:/lustre-fs.
- kubectl access with cluster-admin privileges.
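As a quick pre-flight check, confirm the driver is registered before creating any objects. This assumes the plugin registers a CSIDriver object under the driver name used in the manifests below (lustre.csi.klustrefs.io); if yours does not, verify the plugin pods instead.
kubectl get csidriver lustre.csi.klustrefs.io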
1. Define the storage objects
Save the following manifest as lustre-pipeline.yaml. Update volumeAttributes.source to match your Lustre target and tweak mountOptions if required.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-scratch-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: lustre.csi.klustrefs.io
    volumeHandle: lustre-scratch
    volumeAttributes:
      source: 10.0.0.1@tcp0:/lustre-fs
      mountOptions: flock,user_xattr
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-scratch-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  resources:
    requests:
      storage: 100Gi
  volumeName: lustre-scratch-pv
Apply it:
kubectl apply -f lustre-pipeline.yaml
Confirm the PVC is bound:
kubectl get pvc lustre-scratch-pvc
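If you are scripting this flow, kubectl wait can block until the claim reports Bound; the jsonpath form shown here needs a reasonably recent kubectl (v1.23 or later):
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/lustre-scratch-pvc --timeout=60s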
2. Run the data-prep job
Append the job definition to lustre-pipeline.yaml or save it separately as dataset-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: dataset-prep
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: writer
          image: busybox
          command:
            - sh
            - -c
            - |
              echo "Generating synthetic dataset..."
              RUNDIR=/mnt/lustre/datasets/run-$(date +%s)
              mkdir -p "$RUNDIR"
              dd if=/dev/urandom of=$RUNDIR/dataset.bin bs=1M count=5
              echo "ready" > $RUNDIR/status.txt
              ln -sfn "$RUNDIR" /mnt/lustre/datasets/current
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-scratch-pvc
Apply and monitor (substitute the file name you used above):
kubectl apply -f dataset-job.yaml # or lustre-pipeline.yaml
kubectl logs job/dataset-prep
Ensure the job completes successfully before moving on.
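kubectl wait gives you a scriptable way to do that check:
kubectl wait --for=condition=complete job/dataset-prep --timeout=300s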
3. Launch the training deployment
This deployment tails the generated status file and lists artifacts to demonstrate read access.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      containers:
        - name: trainer
          image: busybox
          command:
            - sh
            - -c
            - |
              ls -lh /mnt/lustre/datasets/current/
              tail -f /mnt/lustre/datasets/current/status.txt
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-scratch-pvc
Save the manifest as trainer-deployment.yaml (or append it to lustre-pipeline.yaml), then apply it and inspect the logs:
kubectl apply -f trainer-deployment.yaml
kubectl logs deploy/trainer
You should see the dataset files created by the job alongside the status text.
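To demonstrate that the volume really is shared ReadWriteMany, read the same artifacts directly from the trainer pod:
kubectl exec deploy/trainer -- ls -lh /mnt/lustre/datasets/current/
kubectl exec deploy/trainer -- cat /mnt/lustre/datasets/current/status.txt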
4. Cleanup
When finished, remove all resources. Because the PV uses Retain, data remains on the Lustre share; delete or archive it manually if desired.
kubectl delete deployment trainer
kubectl delete job dataset-prep
kubectl delete pvc lustre-scratch-pvc
kubectl delete pv lustre-scratch-pv
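Once the deletes have gone through, both lookups should return a NotFound error:
kubectl get pvc lustre-scratch-pvc
kubectl get pv lustre-scratch-pv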
Next steps
- Adapt the job and deployment containers to your actual preprocessing/training images.
- Add a CronJob to refresh datasets on a schedule (a sketch follows this list).
- Use the Kind quickstart if you need a disposable lab cluster to iterate on this flow.
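For the CronJob idea above, here is a minimal sketch that reuses the writer container from the dataset-prep job; the name dataset-refresh and the nightly schedule are placeholders, so adjust both to your environment:
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dataset-refresh
spec:
  schedule: "0 2 * * *"   # placeholder: refresh nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: writer
              image: busybox
              command:
                - sh
                - -c
                - |
                  RUNDIR=/mnt/lustre/datasets/run-$(date +%s)
                  mkdir -p "$RUNDIR"
                  dd if=/dev/urandom of="$RUNDIR/dataset.bin" bs=1M count=5
                  echo "ready" > "$RUNDIR/status.txt"
                  ln -sfn "$RUNDIR" /mnt/lustre/datasets/current
              volumeMounts:
                - name: lustre
                  mountPath: /mnt/lustre
          volumes:
            - name: lustre
              persistentVolumeClaim:
                claimName: lustre-scratch-pvc
EOF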