Kubernetes init container
A Kubernetes init container that hydrates a PersistentVolumeClaim from
a remote archive is a primary peel use case. The PVC is sized for
the extracted contents plus a small download window, not for
compressed + extracted. Resume across pod restarts is automatic.
The minimal Pod spec
apiVersion: v1
kind: Pod
metadata:
name: model-server
spec:
volumes:
- name: model
persistentVolumeClaim:
claimName: model-pvc
initContainers:
- name: hydrate
image: ghcr.io/agouin/peel:latest
args:
- https://models.example.com/llama-3.tar.zst
- --sha256
- ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
- --max-bandwidth
- 200MB/s
- -o
- /model/
volumeMounts:
- name: model
mountPath: /model
containers:
- name: app
image: ghcr.io/example/model-server
volumeMounts:
- name: model
mountPath: /model
readOnly: true
Properties of this configuration:
- PVC sizing:
extracted_size + ~300 MB, notarchive_size + extracted_size. A 40 GiB extracted model fits on a 41 GiB PVC. - Resume across pod restarts: if the init container OOM-kills or
the node reboots mid-extraction, the next pod restart picks up at
the last checkpoint. The sidecars (
.peel.part,.peel.ckpt) live on the PVC, so they survive the restart. - Integrity:
--sha256verifies the source end-to-end. A corrupted upstream produces a clear failure rather than a silently-bad model. - Bandwidth limiting:
--max-bandwidth 200MB/sprevents the hydration from saturating shared cluster network.
Sizing the PVC
Roughly:
PVC size = extracted_size # the model / dataset
+ --max-disk-buffer (default 1G) # in-flight window
+ ~100 MiB # checkpoint + filesystem overhead
+ some slack # for the workload to grow
If disk is tight, lower --max-disk-buffer:
args:
- https://models.example.com/llama-3.tar.zst
- --max-disk-buffer
- 256MiB
- -o
- /model/
This tightens the lookahead floor to 256 MiB. The decoder blocks briefly when the network outruns it. There is no correctness penalty.
Sidecars on ephemeral scratch
To keep per-pod sidecars off a shared PVC, place them on the container's writable scratch layer:
args:
- https://models.example.com/llama-3.tar.zst
- --workdir
- /tmp/peel-state
- -o
- /model/
volumeMounts:
- name: model
mountPath: /model
Tradeoff: the sidecars do not survive a pod restart, so resume is lost. The next pod restart re-fetches from scratch. This is acceptable for short-running hydration and unsuitable for large archives over flaky networks.
RBAC / network policy
peel makes outbound HTTP/HTTPS to the supplied URLs. It does
not talk to the Kubernetes API. The cluster's egress policy must
allow the origin host(s). The peel container itself requires no
elevated permissions: it runs as a normal user and requires no
CAP_* capabilities.
Multi-mirror in-cluster
For intra-cluster mirrors of the same archive (e.g. a MinIO bucket
inside the cluster plus a public origin outside), use
--mirror:
args:
- https://internal-cache.svc.cluster.local/llama-3.tar.zst
- --mirror
- https://models.example.com/llama-3.tar.zst
- --sha256
- ba7816bf...
- -o
- /model/
peel prefers the internal mirror (faster, no egress cost) and
falls back to the public origin only if the internal one fails.
io_uring inside the pod
By default on Linux 5.6+, peel uses io_uring for sockets.
cri-o's default seccomp profile blocks io_uring_* syscalls, so
in practice peel logs one fallback warning at startup and continues
with the blocking backend. Two options exist for enabling io_uring:
- Loosen the seccomp profile (requires
securityContext.seccompProfile.type: Unconfinedor a custom profile that allowsio_uring_*). - Accept the fallback.
peeloperates correctly withoutio_uring, with reduced throughput on high-bandwidth links.
Most clusters take option 2. Revisit if the workload is bandwidth-bound.
A complete example with secrets
apiVersion: v1
kind: Secret
metadata:
name: archive-password
type: Opaque
stringData:
password: my-archive-password
---
apiVersion: v1
kind: Pod
metadata:
name: hydrated-app
spec:
volumes:
- name: data
persistentVolumeClaim:
claimName: data-pvc
initContainers:
- name: hydrate
image: ghcr.io/agouin/peel:latest
env:
- name: PEEL_PW
valueFrom:
secretKeyRef:
name: archive-password
key: password
args:
- https://example.com/secret.tar.zst
- --password-from
- env:PEEL_PW
- --sha256
- ba7816bf...
- -o
- /data/
volumeMounts:
- name: data
mountPath: /data
containers:
- name: app
image: ghcr.io/example/app
volumeMounts:
- name: data
mountPath: /data
readOnly: true
The password is mounted as an env var via the standard Secret
mechanism. peel reads it via --password-from env:PEEL_PW. The
secret never appears on the command line.
Comparison with curl + tar -x
The naive shape, which has the problems described below:
# NOT recommended
- name: hydrate
image: alpine
command: [sh, -c]
args:
- |
apk add curl tar zstd
curl -fL "$URL" -o /tmp/data.tar.zst
tar -I zstd -xf /tmp/data.tar.zst -C /data/
rm /tmp/data.tar.zst
Problems:
- PVC size: peak disk =
archive_size + extracted_size. A 40 GiB extracted model needs an 80+ GiB PVC. - No resume: OOM-kill mid-download restarts from byte 0.
- No integrity: curl does not verify a hash. Layering
sha256sumon afterwards is a separate step. - Single TCP stream: parallel ranged GETs are faster on high-RTT origins.
- Image bulk:
apk addpulls packages every pod restart.
A single peel invocation addresses all of these.