Checkpoint and resume
peel survives any failure short of disk corruption (dropped TCP
connection, kill -9, OOM kill, pod restart, power loss) and, on
resume, produces output byte-identical to a clean run. This page
describes the on-disk layout, the write cadence, and how to
interpret the sidecar files.
The two sidecar files
When peel extracts <output> from an HTTP source, two sidecar
files appear next to the output during the run:
| File | What it holds |
|---|---|
| <output>.peel.part | The sparse compressed bytes (the part-file). Hole-punched as the decoder advances; physical size is the lookahead window. |
| <output>.peel.ckpt | Frame-aligned decoder state, chunk bitmap, and optional SHA-256 state, written atomically. |
On clean completion, both files are unlinked.
On failure or interruption, both files are left on disk. Re-running the same command picks them up and resumes from the checkpoint.
The --workdir <DIR> flag relocates both files. Their basenames stay
the same (<output_name>.peel.part / <output_name>.peel.ckpt);
only the parent directory changes.
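The naming rule can be sketched in a few lines of Python. This is an illustration only, not peel's code: the function name sidecar_paths is hypothetical, and it assumes that without --workdir the sidecars sit in the output's parent directory, as described above.

```python
from pathlib import Path
from typing import Optional, Tuple


def sidecar_paths(output: str, workdir: Optional[str] = None) -> Tuple[Path, Path]:
    """Return (part_file, checkpoint_file) for an output path.

    Hypothetical helper: mirrors the documented rule that --workdir
    changes only the parent directory, never the basenames.
    """
    out = Path(output)
    # Assumption: the default location is "next to the output".
    parent = Path(workdir) if workdir is not None else out.parent
    return parent / f"{out.name}.peel.part", parent / f"{out.name}.peel.ckpt"
```

For example, sidecar_paths("./out/") yields out.peel.part and out.peel.ckpt in the current directory, while passing workdir="/tmp/w" moves both under /tmp/w with the same basenames.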
When checkpoints are written
A checkpoint write is triggered when all of these are true:
- --checkpoint-min-bytes bytes of source progress have accumulated since the last checkpoint (default 8 MiB).
- --checkpoint-min-secs seconds have elapsed since the last checkpoint (default 2 s).
- The decoder is at a frame-aligned boundary (per zstd block, per LZMA2 chunk, per deflate block, per tar member, per 7z folder, per RAR entry, per ZIP entry / intra-entry boundary).
The byte floor is scaled up at high download rates so that checkpoints
are written at most once per --checkpoint-target-secs (default 0.2 s)
of wall-clock time.
Pass --checkpoint-target-secs 0 to disable rate-aware scaling.
The combination keeps checkpoint cadence steady (~5 / sec) on a fast
network without burning CPU on filesystems where fsync is slow,
and falls back to the byte floor on a slow network.
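The trigger conditions can be read as a single predicate. The Python sketch below is illustrative only. The names Cadence, effective_byte_floor and should_checkpoint are hypothetical, and the scaling formula (download rate multiplied by the target interval) is an assumption; this page does not specify how peel computes the scaled floor.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Cadence:
    min_bytes: int = 8 * MIB    # --checkpoint-min-bytes default
    min_secs: float = 2.0       # --checkpoint-min-secs default
    target_secs: float = 0.2    # --checkpoint-target-secs default; 0 disables scaling


def effective_byte_floor(c: Cadence, bytes_per_sec: float) -> int:
    """Byte floor after rate-aware scaling.

    ASSUMPTION: the floor grows to the number of bytes expected within
    one target interval. The real formula may differ.
    """
    if c.target_secs <= 0:
        return c.min_bytes  # scaling disabled
    return max(c.min_bytes, int(bytes_per_sec * c.target_secs))


def should_checkpoint(c: Cadence, bytes_since_last: int, secs_since_last: float,
                      at_frame_boundary: bool, bytes_per_sec: float) -> bool:
    """True only when all three documented conditions hold."""
    return (bytes_since_last >= effective_byte_floor(c, bytes_per_sec)
            and secs_since_last >= c.min_secs
            and at_frame_boundary)
```

Under this sketch, a 1 MiB/s download keeps the 8 MiB floor, while a 100 MiB/s download raises it, which spaces checkpoints further apart in bytes.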
How a write is atomic
A checkpoint write is never a partial overwrite:
1. Serialise the checkpoint blob to <output>.peel.ckpt.tmp.
2. fsync it.
3. rename it over <output>.peel.ckpt.
4. fsync the parent directory.
A crash during the write loses at most the in-flight checkpoint, not the previous one. The next run reads the previous checkpoint and resumes there.
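The write-then-rename sequence above is a standard POSIX pattern. A minimal Python sketch of it follows. This is not peel's implementation; the function name is hypothetical, and the directory fsync assumes a POSIX system (it does not work on Windows).

```python
import os


def write_checkpoint_atomically(ckpt_path: str, blob: bytes) -> None:
    """Replace ckpt_path with blob so that readers see old or new, never a mix."""
    tmp_path = ckpt_path + ".tmp"

    # Write the new blob to a temporary sibling and force it to disk.
    with open(tmp_path, "wb") as f:
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())

    # Atomically swap it over the previous checkpoint (rename(2) semantics).
    os.replace(tmp_path, ckpt_path)

    # Persist the directory entry so the rename itself survives power loss.
    dir_fd = os.open(os.path.dirname(os.path.abspath(ckpt_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the process dies before os.replace, only the .tmp file is affected and the previous checkpoint is untouched, which matches the failure behaviour described above.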
What's in the checkpoint
The on-disk format is versioned (current version 7). The blob holds:
- Source identity: Content-Length, ETag, Last-Modified, and the per-mirror metadata. Detects upstream drift (the source changing during a run).
- Chunk bitmap and CRC32C fingerprints: which chunks are complete. The per-chunk fingerprint catches partial writes that were not yet marked.
- Decoder state: per-format frame-aligned snapshot. For zstd, the inter-block state. For xz, the LZMA2 inter-chunk state. For gzip, a 32 KiB sliding-window snapshot plus the running CRC32 / ISIZE. For RAR, the §F1 blob capturing the LZ dictionary state and filter program cache.
- Sink state: per-entry write progress for tar / zip / 7z / rar per-entry sinks.
- Streaming SHA-256 state: if --sha256 is set, the SHA-256 intermediate state words are checkpointed so the resumed digest is byte-identical to sha256sum over the original file.
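To see why saving running integrity state is enough to resume, consider the gzip case: the trailer holds a CRC32 and ISIZE (the uncompressed length modulo 2^32), and both can be carried forward from a saved value. The Python sketch below illustrates the idea with the standard zlib module; the function names are hypothetical and the (crc, isize) tuple is not peel's checkpoint layout.

```python
import zlib
from typing import Iterable, Tuple

State = Tuple[int, int]  # (running CRC32, running ISIZE)


def advance(state: State, chunk: bytes) -> State:
    """Fold one chunk of decompressed output into the running state."""
    crc, isize = state
    # zlib.crc32 accepts the previous CRC as its starting value.
    return zlib.crc32(chunk, crc), (isize + len(chunk)) & 0xFFFFFFFF


def run(chunks: Iterable[bytes], state: State = (0, 0)) -> State:
    """Process chunks starting from a fresh or a checkpointed state."""
    for chunk in chunks:
        state = advance(state, chunk)
    return state
```

A run interrupted after the first chunk and resumed from the saved tuple ends in the same state as an uninterrupted run: run([b"world"], run([b"hello "])) equals run([b"hello ", b"world"]).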
Resume guarantees
The output is byte-identical to a clean run if and only if:
- The source bytes at the same URL have not changed (ETag / Last-Modified verification catches this).
- The same peel version (or a forward-compatible one) is used to resume.
- The output directory has not been tampered with between runs. peel does not re-verify extracted files on resume; it trusts the checkpoint's record of what was written.
If the source has changed mid-run, peel's per-chunk CRC32C
fingerprints catch the drift: a chunk's fingerprint at re-fetch
time disagrees with what was checkpointed. peel aborts the resume
with a specific "source changed during run" error rather than
silently writing wrong bytes.
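The fingerprint comparison can be sketched as follows. This is a hedged illustration, not peel's code: the names are hypothetical, and zlib.crc32 (CRC-32, IEEE polynomial) stands in for CRC32C because the Python standard library has no CRC32C function.

```python
import zlib
from typing import Mapping


class SourceChangedError(Exception):
    """Raised when a re-fetched chunk no longer matches its checkpointed fingerprint."""


def fingerprint(chunk: bytes) -> int:
    # Stand-in only: this is CRC-32, NOT the CRC32C that peel records.
    return zlib.crc32(chunk)


def verify_refetched_chunk(index: int, refetched: bytes,
                           checkpointed: Mapping[int, int]) -> None:
    """Abort instead of writing bytes that disagree with the checkpoint."""
    if fingerprint(refetched) != checkpointed[index]:
        raise SourceChangedError(f"source changed during run (chunk {index})")
```

The check returns quietly when the re-fetched bytes match and raises otherwise, so a drifted source stops the resume before any output is written from the mismatched chunk.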
If the peel version changed and the checkpoint format is
incompatible, the resume aborts at parse time. Re-run with the same
version, or delete the sidecars (rm <output>.peel.part <output>.peel.ckpt) to start from scratch.
Resuming a run
There is no separate "resume" flag. Re-invoke the same command:
peel https://example.com/dataset.tar.zst -o ./out/
# Ctrl-C / kill -9 / network drop happens at 50% through.
# Sidecars remain on disk.
peel https://example.com/dataset.tar.zst -o ./out/
# Picks up at the last checkpoint, finishes the rest.
For multi-volume or multi-part runs, pass the same URL list / @file
and the same -o. The checkpoint records the assembled source's
identity, so partial progress across multiple URLs is preserved.
Crash-test coverage
The crash-test harness in tests/test_crash_resume.rs runs 100
random kill points per format and asserts that the post-resume
output is byte-identical to a clean run, every time. This is the
empirical basis for the byte-identical guarantee.
Inspecting a checkpoint
The checkpoint blob is not human-readable, but the sidecars' sizes on disk can be inspected:
$ ls -la ./out.peel.part ./out.peel.ckpt
-rw-r--r-- 1 ag staff 10737418240 May 13 14:22 ./out.peel.part # logical size
-rw-r--r-- 1 ag staff 274432 May 13 14:22 ./out.peel.ckpt
# Physical size: what is actually on disk, after hole-punching
$ du -h ./out.peel.part
123M ./out.peel.part
du -h reports the physical size, which is the in-flight window.
The logical size (ls -la) is the full archive length.
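The same two numbers can be read programmatically. The Python sketch below is an illustration for POSIX systems, where st_blocks counts 512-byte blocks; the function name is hypothetical and st_blocks is not available on Windows.

```python
import os
from typing import Tuple


def logical_and_physical_size(path: str) -> Tuple[int, int]:
    """Return (logical_bytes, physical_bytes) for a possibly sparse file."""
    st = os.stat(path)
    logical = st.st_size            # what ls -la reports
    physical = st.st_blocks * 512   # allocated blocks, roughly what du reports
    return logical, physical
```

For a hole-punched part-file the physical figure should be much smaller than the logical one, although the exact value depends on the filesystem's block size and sparse-file support.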
RUST_LOG=debug peel … logs checkpoint writes as they happen:
DEBUG checkpoint write: bytes_since_last=8.0MiB seconds_since_last=2.1
DEBUG checkpoint write: bytes_since_last=8.0MiB seconds_since_last=2.0
Tuning checkpoint cadence
The defaults work well across the bench grid. Reasons to tune:
| Goal | Flag | Direction |
|---|---|---|
| Fewer fsyncs on slow disks | --checkpoint-min-bytes | Larger (e.g. 64 MiB) |
| Tighter resume granularity for very long runs | --checkpoint-min-secs | Smaller (e.g. 1 s) |
| Steady cadence under highly variable network | --checkpoint-target-secs | Smaller (e.g. 0.1 s) |
| Disable rate-aware scaling for reproducibility | --checkpoint-target-secs | 0 |
A more aggressive cadence trades extra fsync syscalls for
finer-grained resume: less work lost on a kill -9, more CPU and
IO during normal operation.
When resume can't help
A few scenarios fall outside the byte-identical-resume guarantee:
- The source disappeared between runs. Sidecars stay on disk until removed; the next run fails at the HEAD probe with a clear error.
- The output directory was partially modified by hand. peel does not re-verify already-extracted files. If this is suspected, delete the output and the sidecars and start over.
- The checkpoint format is from an incompatible peel version. Delete the .peel.ckpt to start fresh from the part-file (the part-file's chunks are still individually verifiable via the inline fingerprints), or delete both sidecars to start completely from scratch.
- Non-destructive local extraction. peel ./file.tar.zst (no -d) is a one-pass run with no checkpoint. The source remains intact on kill, so re-run.