Share a dataset between prep and training jobs
This tutorial wires a simple data pipeline together:
- Create a static Lustre PersistentVolume and bind it to a PersistentVolumeClaim.
- Run a data-prep job that writes artifacts into the Lustre mount.
- Start a training deployment that reads the prepared data.
- Validate shared access and clean up.
Requirements
- Klustre CSI Plugin installed and verified (see the Introduction, or the quick check below).
- An existing Lustre export, e.g., 10.0.0.1@tcp0:/lustre-fs.
- kubectl access with cluster-admin privileges.
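One way to sanity-check these prerequisites before continuing is to confirm the driver registration and your permissions. This assumes the plugin registers a CSIDriver object under the driver name used in the PV manifest below:

kubectl get csidriver lustre.csi.klustrefs.io
kubectl auth can-i create persistentvolumes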
1. Define the storage objects
Save the following manifest as lustre-pipeline.yaml. Update volumeAttributes.source to match your Lustre target and tweak mountOptions if required.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-scratch-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: lustre.csi.klustrefs.io
    volumeHandle: lustre-scratch
    volumeAttributes:
      source: 10.0.0.1@tcp0:/lustre-fs
      mountOptions: flock,user_xattr
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-scratch-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: klustre-csi-static
  resources:
    requests:
      storage: 100Gi
  volumeName: lustre-scratch-pv
Apply it:
kubectl apply -f lustre-pipeline.yaml
Confirm the PVC is bound:
kubectl get pvc lustre-scratch-pvc
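The claim should report a Bound status; the output will look roughly like this (capacity and age will vary with your manifest and timing):

NAME                 STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS         AGE
lustre-scratch-pvc   Bound    lustre-scratch-pv   100Gi      RWX            klustre-csi-static   10s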
2. Run the data-prep job
Append the job definition to lustre-pipeline.yaml or save it separately as dataset-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: dataset-prep
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: writer
          image: busybox
          command:
            - sh
            - -c
            - |
              echo "Generating synthetic dataset..."
              RUNDIR=/mnt/lustre/datasets/run-$(date +%s)
              mkdir -p "$RUNDIR"
              dd if=/dev/urandom of="$RUNDIR/dataset.bin" bs=1M count=5
              echo "ready" > "$RUNDIR/status.txt"
              ln -sfn "$RUNDIR" /mnt/lustre/datasets/current
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-scratch-pvc
Apply and monitor (substitute the file name you used above):
kubectl apply -f dataset-job.yaml # or lustre-pipeline.yaml
kubectl logs job/dataset-prep
Ensure the job completes successfully before moving on.
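If you prefer to block until the job finishes instead of polling the logs, kubectl wait can do that; the five-minute timeout is just an example:

kubectl wait --for=condition=complete job/dataset-prep --timeout=5m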
3. Launch the training deployment
This deployment tails the generated status file and lists artifacts to demonstrate read access. Save it as trainer-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      containers:
        - name: trainer
          image: busybox
          command:
            - sh
            - -c
            - |
              ls -lh /mnt/lustre/datasets/current
              tail -f /mnt/lustre/datasets/current/status.txt
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-scratch-pvc
Apply and inspect logs:
kubectl apply -f trainer-deployment.yaml
kubectl logs deploy/trainer
You should see the dataset files created by the job alongside the status text.
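To double-check shared access from the training side, you can also list the dataset directory inside the running pod:

kubectl exec deploy/trainer -- ls -lh /mnt/lustre/datasets/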
4. Cleanup
When finished, remove all resources. Because the PV uses Retain, data remains on the Lustre share; delete or archive it manually if desired.
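If you would rather wipe the generated data as part of cleanup, one option is a one-off pod that mounts the same claim and removes the dataset directory. Run it before deleting the PVC; the pod name here is only an example:

apiVersion: v1
kind: Pod
metadata:
  name: lustre-cleanup
spec:
  restartPolicy: Never
  containers:
    - name: cleaner
      image: busybox
      # removes everything the prep job wrote under /mnt/lustre/datasets
      command: ["sh", "-c", "rm -rf /mnt/lustre/datasets"]
      volumeMounts:
        - name: lustre
          mountPath: /mnt/lustre
  volumes:
    - name: lustre
      persistentVolumeClaim:
        claimName: lustre-scratch-pvc

Once it completes, remove it with kubectl delete pod lustre-cleanup and continue with the commands below.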
kubectl delete deployment trainer
kubectl delete job dataset-prep
kubectl delete pvc lustre-scratch-pvc
kubectl delete pv lustre-scratch-pv
Next steps
- Adapt the job and deployment containers to your actual preprocessing/training images.
- Add a CronJob to refresh datasets on a schedule (a sketch follows this list).
- Use the Kind quickstart if you need a disposable lab cluster to iterate on this flow.
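As a starting point for the CronJob idea above, here is a minimal sketch that reuses the prep job's pod spec. The name dataset-refresh and the nightly schedule are placeholders to adjust:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: dataset-refresh
spec:
  schedule: "0 2 * * *"        # nightly at 02:00; adjust as needed
  concurrencyPolicy: Forbid    # skip a run if the previous refresh is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: writer
              image: busybox
              command:
                - sh
                - -c
                - |
                  RUNDIR=/mnt/lustre/datasets/run-$(date +%s)
                  mkdir -p "$RUNDIR"
                  dd if=/dev/urandom of="$RUNDIR/dataset.bin" bs=1M count=5
                  echo "ready" > "$RUNDIR/status.txt"
                  ln -sfn "$RUNDIR" /mnt/lustre/datasets/current
              volumeMounts:
                - name: lustre
                  mountPath: /mnt/lustre
          volumes:
            - name: lustre
              persistentVolumeClaim:
                claimName: lustre-scratch-pvc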