peel
Streaming, resumable, space-efficient extractor for compressed archives over HTTP, and for local archive files on disk.
peel https://example.com/dataset.tar.zst
peel is a Rust CLI that downloads, decompresses, and extracts an archive in
a single pass. It resumes exactly where it left off after a dropped
connection, kill -9, OOM kill, or power loss. The compressed bytes never
fully land on disk: as the decoder consumes the prefix, the download buffer
underneath is hole-punched out. The archive and the extracted tree never
coexist at full size.
What it solves
- Disk pressure. Pulling a 40 GB .tar.zst should not require 80 GB free. Peak disk usage is roughly extracted_size + a few hundred MB, not compressed_size + extracted_size.
- Flaky networks. A dropped connection mid-download is the default case, not the edge case. peel resumes at the byte that was in flight.
- kill -9 and pod restarts. Frame-aligned checkpoints (atomic write + fsync + rename) plus per-chunk fingerprints ensure a hard kill mid-extraction resumes exactly where it left off, byte-identical to a clean run.
- Streaming .zip, .7z, .rar over HTTP. curl | unzip does not work: the ZIP central directory lives at the end of the file, the 7z trailer sits at the end of the file, and unrar requires lseek on its input. peel issues a ranged GET for the central directory or trailer first (zip, 7z), or walks the RAR header chain in stream order (rar), then streams entries to disk as soon as their bytes arrive.
Format coverage at a glance
| Family | Formats |
|---|---|
| Plain | .tar |
| Streaming codecs | .zst / .tar.zst · .xz / .tar.xz · .lz4 / .tar.lz4 · .gz / .tar.gz |
| Random-access archives | .zip · .7z · .rar (RAR5 + legacy RAR3/RAR4) |
Encrypted archives are supported for zip (WinZip-AES, ZipCrypto), 7z (AES-256-CBC), and rar5 (AES-256-CBC, both archive-header and per-file). See Encrypted archives.
The full per-format matrix (magic-byte detection, resume granularity, encryption) is on the Supported formats page.
Distinguishing features
- Hole-punched compressed buffer. Parallel ranged HTTP downloads feed a sparse part-file. The decoder consumes the prefix while workers continue to fetch the suffix, and finished bytes are released back to the filesystem as the decoder advances. Peak compressed-side disk usage is the download window (approximately --max-disk-buffer), not the archive size.
- Frame-aligned, byte-identical resume. A kill -9 anywhere in the pipeline leaves a .peel.ckpt next to the part file. Re-running the same command picks up exactly at the checkpointed frame. The final output is byte-identical to a clean run. The crash-test harness runs 100 random kill points per format and asserts that property every time.
- One command for HTTP and local. A URL argument triggers parallel ranged GETs and streaming extract. A local file argument runs the same hand-rolled decoders against the file on disk: non-destructive by default, with hole-punching enabled via -d.
Where to next
- Getting started: Installation and Quick start.
- Full flag listing: CLI reference.
- Specific features: Multi-volume archives, Encrypted archives, Performance and tuning, Checkpoint and resume.
- Pipeline integration: the worked examples cover Kubernetes init containers, CI runners, and an Arbitrum snapshot bundle.
Installation
peel is a single statically-linked binary. It has no runtime dependencies
beyond a working libc, and on Linux a 5.6+ kernel for the io_uring
fast paths. Older kernels fall back automatically.
From source (Cargo)
The currently-supported route. The crate name on crates.io is peel-rs.
The installed binary is peel.
cargo install peel-rs --locked
The MSRV is pinned in rust-toolchain.toml;
a recent stable Rust (1.93+) is sufficient.
Building from a checkout
git clone https://github.com/agouin/peel
cd peel
cargo build --release
./target/release/peel --help
Cargo features
| Feature | Default | What it enables |
|---|---|---|
| rar | on | RAR5 and legacy RAR3/RAR4 decoders. When disabled, the binary still registers .rar against a diagnostic-only factory so the user sees "compiled without the 'rar' feature" instead of "unknown format". |
To drop the RAR module entirely (shrinks the binary; useful when .rar
inputs are not expected):
cargo install peel-rs --locked --no-default-features
From a release binary
Pre-built binaries for Linux (x86_64, aarch64) and macOS (x86_64, aarch64) are attached to every GitHub release:
https://github.com/agouin/peel/releases
# Linux x86_64 example. Substitute your platform's triple.
curl -L https://github.com/agouin/peel/releases/latest/download/peel-x86_64-unknown-linux-gnu.tar.gz \
| tar -xz -C /usr/local/bin peel
peel --version
Docker
docker run --rm -v "$PWD/out:/out" ghcr.io/agouin/peel \
https://example.com/dataset.tar.zst -o /out/
The image is a FROM scratch build with the static peel binary plus a
recent CA bundle (no shell, no package manager). See the
Kubernetes init container example for usage
inside a Pod.
Verifying the install
peel --version
peel --help | head
To confirm which file-IO backend peel selects at runtime on Linux, run
any command with RUST_LOG=info and look for the
selected file IO backend = … line:
RUST_LOG=info peel https://example.com/x.tar.zst -o ./out/ 2>&1 | head -5
See Performance and tuning for what each backend means.
Quick start
Five things peel does, in five copy-pasteable commands.
1. Extract a tarball over HTTP
peel https://example.com/linux-6.x.tar.xz
Without -o, the default extract directory is the URL basename with
archive and compression suffixes stripped, in the current working
directory. The example above lands the kernel sources in ./linux-6.x/.
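The default naming rule can be sketched as below. This is an illustration only: the helper name `default_output_name` and the suffix table are assumptions based on the suffixes this documentation lists, not peel's actual resolver.

```rust
/// Archive and compression suffixes, longest first so ".tar.zst" wins over ".zst".
/// Assumption: this mirrors the suffixes named in the docs, not peel's real table.
const SUFFIXES: &[&str] = &[
    ".tar.zst", ".tar.xz", ".tar.lz4", ".tar.gz",
    ".tar", ".zst", ".xz", ".lz4", ".gz", ".zip", ".7z", ".rar",
];

/// Hypothetical helper: derive the default output name from a URL.
fn default_output_name(url: &str) -> String {
    // Drop any query string or fragment, then take the last path component.
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    let base = path.trim_end_matches('/').rsplit('/').next().unwrap_or(path);
    for suffix in SUFFIXES {
        if let Some(stem) = base.strip_suffix(*suffix) {
            if !stem.is_empty() {
                return stem.to_string();
            }
        }
    }
    // No recognised suffix: keep the basename unchanged.
    base.to_string()
}
```

Under these assumptions, `linux-6.x.tar.xz` maps to `linux-6.x` and `model.bin.zst` maps to `model.bin`.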
To set an explicit path, a trailing slash forces directory semantics (useful when the URL has no recognisable suffix):
peel https://example.com/linux-6.x.tar.xz -o ./linux/
2. Extract a bare compressed file
For stream-shaped formats (raw .zst / .xz / .lz4 / .gz) the output
is a single file, not a directory:
peel https://example.com/model.bin.zst -o ./model.bin
3. Download without extracting
Skip the decoder and write the bytes verbatim into a single file using parallel ranged GETs. The same scheduler, mirror, and resume machinery used by extract mode applies.
peel https://example.com/big.deb --no-extract
--download-only is an alias provided for compatibility with aria2c.
4. Extract a .zip, .7z, or .rar over HTTP
ZIP and 7z place their index at the end of the file, and unrar needs a
seekable input, so curl | unzip and its equivalents do not work (see How it works).
peel fetches the ZIP central directory or 7z trailer first via a ranged GET
(RAR headers are walked in stream order), then streams entries to disk as they arrive:
peel https://example.com/dataset.zip -o ./out/
peel https://example.com/snapshot.7z -o ./out/
peel https://example.com/backup.rar -o ./out/
For a password-protected archive, see Encrypted archives:
peel https://example.com/secret.zip -o ./out/ --password-from prompt
5. Extract a local file
For an archive already on disk, skip the HTTP machinery and run the same decoders against the local file. Non-destructive by default:
peel /tmp/dataset.tar.zst # extracts to ./dataset/, archive untouched
peel /tmp/dataset.tar.zst -o ./out/ # explicit output dir
peel -d /tmp/dataset.tar.zst -o ./out/ # destructive: hole-punch and delete on success
See Local-file extraction for the full mode table.
Default behaviour
Every command above runs with these guarantees, without any extra flag:
- Parallel ranged GETs. Default 4 workers, tunable with --workers.
- Streaming decompression that overlaps with the download. Peak disk for the compressed side is the lookahead window, not the archive size.
- Hole-punched compressed buffer. fallocate(PUNCH_HOLE) and madvise(MADV_REMOVE) release blocks of the part-file as the decoder advances past them.
- Frame-aligned resume. A kill -9 mid-run leaves a .peel.ckpt next to the part-file. Re-running the same command resumes.
- Live progress UI. Three-line block on TTY (download, extract, ETA, active workers, on-disk source footprint). Falls back to tracing::info! lines on a non-TTY without any extra flag.
Where to go next
- Hash verification, bandwidth caps, or mirrors: Integrity verification and Multi-mirror downloads.
- Multi-volume archives (name.part0001.rar, name.7z.001, spanned ZIP): Multi-volume archives.
- Archives split across several URLs as .part0000 / .part0001: Multi-part URLs.
- High-bandwidth or memory-constrained runs: Performance and tuning covers --io-backend, --http-version, --max-bandwidth, --max-disk-buffer, --workdir.
- Full flag listing: CLI reference.
How it works
This page describes the internal architecture. The Quick start is sufficient for basic use. This material covers why disk usage stays bounded, how resume converges to a byte-identical output, and what happens on the wire.
The three components
A peel run for an HTTP source has three loosely-coupled stages, each
running concurrently:
+-------------+      +----------+      +----------+
|  download   | ---> | part-file| ---> | decoder  | ---> output tree
|  workers    |      | (sparse) |      |  + sink  |
+-------------+      +----------+      +----------+
       |                   ^                |
       |                   |                v
       |             +-----------+    +-----------+
       +-----------> | scheduler |    |  puncher  |
                     | + bitmap  |    | (releases |
                     +-----------+    |  blocks)  |
                                      +-----------+
- Download workers fetch ranges of the source object in parallel via ranged GETs. Each worker writes its bytes into the sparse .peel.part file at the byte offset the scheduler assigned it.
- The decoder walks the part-file from offset 0, consuming whatever the workers have already written, blocking briefly when it gets ahead of them.
- The puncher trails the decoder. As the decoder advances past a chunk boundary, fallocate(PUNCH_HOLE) (Linux) or madvise(MADV_REMOVE) (Linux, mmap backend) or F_PUNCHHOLE (macOS) releases the blocks underneath that range back to the filesystem.
At any instant the part-file's logical size is the full archive,
but its physical size on disk is roughly the gap between the slowest
worker and the decoder. This is the lookahead window, capped by
--max-disk-buffer (default 1 GiB).
The bitmap and the checkpoint
Two pieces of state make resume work:
A chunk bitmap. The source is divided into fixed-size chunks
(--chunk-size, default 4 MiB). A bit per chunk records "this chunk's
bytes have been fetched and written." The scheduler hands out the next
unset bit to whichever worker is free.
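A minimal sketch of such a bitmap follows. The type and method names are illustrative assumptions; a real scheduler would also track in-flight chunks so that two workers are never handed the same bit.

```rust
/// Sketch of a per-chunk completion bitmap (hypothetical names).
struct ChunkBitmap {
    source_len: u64,
    chunk_size: u64,
    chunks: u64,
    words: Vec<u64>,
}

impl ChunkBitmap {
    fn new(source_len: u64, chunk_size: u64) -> Self {
        // Round up so a short final chunk still gets a bit.
        let chunks = (source_len + chunk_size - 1) / chunk_size;
        let words = vec![0u64; ((chunks + 63) / 64) as usize];
        ChunkBitmap { source_len, chunk_size, chunks, words }
    }

    /// Record "this chunk's bytes have been fetched and written".
    fn mark_done(&mut self, chunk: u64) {
        assert!(chunk < self.chunks);
        self.words[(chunk / 64) as usize] |= 1u64 << (chunk % 64);
    }

    fn is_done(&self, chunk: u64) -> bool {
        (self.words[(chunk / 64) as usize] & (1u64 << (chunk % 64))) != 0
    }

    /// Lowest chunk index whose bit is still clear, if any.
    fn next_unset(&self) -> Option<u64> {
        (0..self.chunks).find(|&c| !self.is_done(c))
    }

    /// Byte range [start, end) covered by a chunk, e.g. for a ranged GET.
    fn byte_range(&self, chunk: u64) -> (u64, u64) {
        let start = chunk * self.chunk_size;
        (start, (start + self.chunk_size).min(self.source_len))
    }
}
```

With a 10 MiB source and the default 4 MiB chunk size this yields three chunks, the last one covering only 2 MiB.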
A checkpoint sidecar. peel writes <output>.peel.ckpt next to
the part-file at quiescent points: boundaries the decoder can resume
from byte-identically. These are frame-aligned (per zstd block, per
LZMA2 chunk, per deflate block, per 7z folder, per RAR entry, etc.),
so the next run reads the checkpoint, knows the decoder state, and
picks up at exactly the byte that was in flight.
A checkpoint write is atomic: peel writes to a .tmp file,
fsyncs it, and renames it over the previous checkpoint. A crash
during the write loses at most the in-flight checkpoint, not the
previous one.
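The write-fsync-rename sequence can be sketched with the standard library alone. The function name is hypothetical, and the final directory fsync is a common hardening step that the text above does not claim peel performs.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Sketch: atomically replace `path` with `bytes` via a sibling .tmp file.
fn write_checkpoint_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file lives next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush data and metadata before the file becomes visible
    drop(f);

    // On POSIX filesystems rename replaces the destination atomically, so a
    // reader sees either the old checkpoint or the new one, never a mix.
    fs::rename(&tmp, path)?;

    // Assumption: also fsync the parent directory so the rename survives power loss.
    #[cfg(unix)]
    {
        if let Some(dir) = path.parent() {
            let dir = if dir.as_os_str().is_empty() { Path::new(".") } else { dir };
            File::open(dir)?.sync_all()?;
        }
    }
    Ok(())
}
```

A crash before the rename leaves only a stale .tmp file behind; the previous checkpoint is untouched.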
Streaming .zip, .7z, and .rar over HTTP
ZIP and 7z put their index at the end of the archive: the ZIP
central directory after every entry, the 7z trailer at the bottom of
the SignatureHeader's pointer chain. unrar does not depend on a
tail-anchored index, but the unrar binary requires lseek on its
input regardless. None of curl | unzip, curl | 7z x, or
curl | unrar x will start producing output until the entire archive
has been buffered somewhere.
peel does not buffer the whole archive. It issues a small ranged
GET to fetch the tail (zip central directory or 7z trailer) up front,
parses it, then dispatches entry-sized GETs in parallel. Entries are
written to the sink as soon as their bytes arrive while the rest of
the archive is still in flight. The same hole-punching and resume
guarantees as the streaming .tar.* family apply.
For RAR, the format's per-file headers are already laid out at the
start of each file's data area, so peel walks them in stream order.
No tail probe is needed.
Resume after kill -9
The output is byte-identical to a clean run if peel is re-invoked
with the same arguments after any failure. The mechanism:
- Workers write to the part-file with pwrite (or mmap memcpy under the §9 backend). The kernel page-caches the write.
- The bitmap is updated only after a chunk has been written and fsync'd back into the part-file (configurable via --checkpoint-min-bytes / --checkpoint-min-secs).
- The checkpoint sidecar captures the decoder's frame-aligned state plus the bitmap plus the streaming SHA-256 state (if --sha256 is set).
- A kill -9 between bitmap updates leaves the part-file with bytes that haven't been marked yet. Per-chunk CRC32C fingerprints in the bitmap detect those bytes on resume; they are re-fetched.
- The decoder resumes from the checkpoint's frame boundary, not from the start. Per-format details: zstd resumes per block, xz per LZMA2 chunk, gzip per deflate block (with a 32 KiB sliding-window snapshot), tar per member, zip per entry plus intra-entry (per deflate block / per zstd block), 7z per folder, rar per entry plus intra-entry (via the §F1 checkpoint blob that snapshots the LZ dictionary and filter cache).
The crash-test harness runs 100 random kill points per format and asserts the post-resume output bytes match a clean run, every time.
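The fingerprint check on resume might look like the following sketch. The CRC32C routine is the standard bitwise Castagnoli CRC; the function `chunks_to_refetch`, its in-memory view of the part-file, and the `Option<u32>` encoding of "bit never set" are assumptions made for illustration, not peel's on-disk layout.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Given the resident part-file bytes and the fingerprint recorded for each
/// chunk (None = never marked), list the chunks that must be fetched again.
fn chunks_to_refetch(part: &[u8], chunk_size: usize, recorded: &[Option<u32>]) -> Vec<usize> {
    part.chunks(chunk_size)
        .enumerate()
        .filter(|(i, bytes)| recorded.get(*i).copied().flatten() != Some(crc32c(bytes)))
        .map(|(i, _)| i)
        .collect()
}
```

A chunk whose bytes landed after the last bitmap update has no recorded fingerprint, so it is treated the same way as a corrupted one and fetched again.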
Bounded disk usage
The compressed side of the pipeline runs as a sliding window:
              decoder pointer
              v
[hole-punched][......in-flight......][unfetched]
               ^                   ^
               worker N            worker N+M
The window's width is the gap between the slowest active worker and the decoder. Two knobs bound it:
- --max-disk-buffer (default 1 GiB): when the gap reaches this many bytes, the scheduler stops dispatching new chunks until the decoder catches up. The default rarely engages on a healthy disk and caps part-file growth on a slow one.
- --punch-threshold (default 4 MiB): minimum gap between in-loop hole-punch syscalls. Smaller values yield a tighter physical-disk footprint; larger values yield fewer syscalls per second. Tune downward to enforce a hard ceiling on physical disk; upward if the filesystem's punch-hole implementation is slow.
For --no-extract runs the puncher is bypassed and the part-file
grows to the full archive size. Otherwise the part-file's physical
size tracks the in-flight window, typically a few hundred MiB on a
healthy network.
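The interaction of the two knobs can be modelled in a few lines. This is a simplified sketch with hypothetical names: it ignores filesystem block alignment and keeps all state in one struct rather than splitting it between the scheduler and the puncher.

```rust
/// Illustrative model of the compressed-side window (hypothetical names).
struct Window {
    decoder_pos: u64,      // bytes the decoder has fully consumed
    fetched_frontier: u64, // highest byte offset handed out to a worker
    punched_upto: u64,     // bytes already released back to the filesystem
    max_disk_buffer: u64,  // --max-disk-buffer
    punch_threshold: u64,  // --punch-threshold
}

impl Window {
    /// The scheduler stops dispatching once the lookahead reaches the cap.
    fn may_dispatch(&self) -> bool {
        self.fetched_frontier.saturating_sub(self.decoder_pos) < self.max_disk_buffer
    }

    /// Record decoder progress. Returns the byte range [start, end) to punch
    /// once enough consumed bytes have accumulated since the last punch.
    fn advance_decoder(&mut self, new_pos: u64) -> Option<(u64, u64)> {
        self.decoder_pos = new_pos;
        if self.decoder_pos - self.punched_upto >= self.punch_threshold {
            let range = (self.punched_upto, self.decoder_pos);
            self.punched_upto = self.decoder_pos;
            Some(range)
        } else {
            None
        }
    }
}
```

In this model a smaller punch threshold returns ranges more often (tighter footprint, more syscalls), which matches the tradeoff described above.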
What runs where on Linux
--io-backend auto (default) runs probes at startup and picks the
fastest path the kernel allows:
- mmap sparse-file for the part-file: workers memcpy into a MAP_SHARED region; the puncher uses madvise(MADV_REMOVE). This removes a syscall per chunk write at high parallelism.
- io_uring for the HTTP client's sockets: TCP connect, send, and recv are submitted to a single ring on a dedicated IO thread, with linked LinkTimeout SQEs for prompt cancellation. rustls rides on top unchanged.
If a probe fails (kernel < 5.6, RLIMIT_MEMLOCK too low, seccomp
blocking, filesystem rejecting MADV_REMOVE), peel logs one
warn! and falls back to the blocking pwrite / pread backend.
Force a specific path with --io-backend [auto|blocking|uring|mmap].
See Performance and tuning.
Further reading
- Checkpoint and resume: contents of the .peel.ckpt sidecar, write cadence, and inspection.
- Performance and tuning: every knob with a measured tradeoff.
- Supported formats: the per-format detection, resume, and encryption matrix.
CLI reference
peel [OPTIONS] [URLS]...
peel --help prints the same content with full details and the exact
default values for the current build. This page covers every flag,
grouped by function, with design notes and constraints.
The full alphabetical list, with one-liners, appears at the bottom under Flag summary.
Positional arguments
[URLS]...
One or more source URLs or local file paths.
- One URL: the single-source case. Example: peel https://host/x.tar.zst -o ./out/.
- Two or more URLs: activates the multi-part split-archive path. The byte-concatenation of every URL's body is treated as one logical archive stream. Workers fetch all parts in parallel via ranged GETs.
- A single local path (no http:// or https:// scheme): activates local-file extraction. The same decoders run without HTTP machinery.
- @file.txt (single arg): read URLs and paths from file.txt, one per line. Blank lines and # comments are ignored. Suitable for multi-volume manifests stored next to the archive.
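A sketch of the @file rule and the scheme check, under stated assumptions: both function names are hypothetical, and a # is treated as a comment only when it is the first non-blank character on a line, since URLs can legitimately contain #.

```rust
/// Sketch of @file parsing: one entry per line; blank lines and lines whose
/// first non-space character is '#' are skipped.
fn parse_manifest(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(str::to_owned)
        .collect()
}

/// Hypothetical classification: entries with an http:// or https:// scheme
/// are remote, everything else is treated as a local path.
fn is_remote(entry: &str) -> bool {
    entry.starts_with("http://") || entry.starts_with("https://")
}
```

Entries keep their file order, which matters because multi-part sources are concatenated in the order given.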
Output and destination
-o, --output <PATH>
Destination for the extracted contents. Accepts a directory for
archive formats that produce a tree (tar, zip, 7z, rar, and any
compressed wrapper around tar), or a file for stream-shaped formats
(raw .zst, .xz, .lz4, .gz).
- A trailing slash forces directory semantics. peel x.zst -o ./out/ errors at parse time because .zst is a single-file output shape.
- No -o: defaults to the URL basename with archive and compression suffixes stripped, in the current working directory. peel https://host/linux-6.x.tar.xz extracts into ./linux-6.x/.
- The resolver errors at coordinator entry if the explicit shape (trailing slash, file path) disagrees with the detected format.
See Output path resolution for the full table of URL → output mappings.
--workdir <DIR>
Directory for the .peel.part and .peel.ckpt sidecar files.
By default these are placed as siblings of the output
(<output>.peel.part and <output>.peel.ckpt). Override when the
extracted output and the in-flight state should live on different
disks. Examples: extracting onto slow HDD-backed storage while
keeping the part-file on a fast NVMe, or pinning the sidecars
inside a Kubernetes PVC mount when the output's parent is on
ephemeral container storage.
The directory is created if missing. The basenames stay the same; only their parent directory changes.
Download mode
peel runs in one of three modes (default, -k, --no-extract),
plus a destructive opt-in for local-file runs. See
Download modes for the full mode table.
-k, --keep-archive[=<PATH>]
Extract and keep the source archive on disk. The puncher is
forced to no-op so the archive's bytes are preserved at their full
Content-Length.
- -k or --keep-archive (bare): preserve the archive as a sibling of -o, named after the URL basename.
- -k=<PATH> or --keep-archive=<PATH>: explicit path. The = is required because bare -k followed by a positional URL is otherwise ambiguous.
- Flag absent: default behaviour. The source bytes are dropped. Hole-punching trims them and the part-file is removed on success.
-k is a no-op in local mode (preservation is the default there)
and incompatible with -d/--destructive for HTTP sources.
--no-extract (alias: --download-only)
Skip extraction. Download the source bytes verbatim to a single file. The remote object is fetched in parallel via ranged GETs, using the same scheduler, mirror, resume, and SHA-256 machinery as extract mode, and is renamed into place on success. No decoder runs and no holes are punched.
Suitable for arbitrary non-archive downloads, for keeping an archive
to extract later with a different tool, or as a parallel-ranged-GET
replacement for aria2c.
Mutually exclusive with --format, --force-format-from-magic, and
--punch-threshold. These are extractor knobs and nothing extracts in
this mode.
-d, --destructive
Opt in to destructive extraction in local-file mode: hole-punch the source as the decoder advances, then delete on clean completion. Required because local mode is non-destructive by default.
For HTTP sources -d is a no-op. The HTTP path is destructive by
default. Combining -d with -k for an HTTP source is an error.
--strict-format
Make format-detection failure a hard error instead of falling
through to --no-extract.
Default behaviour: if neither the URL suffix nor the magic bytes
identify a registered decoder, peel warns and saves the remote
object under its URL basename. --strict-format flips that to a
fatal error. Useful in CI when an upstream object changing shape
unexpectedly should fail the build instead of producing a different
artifact.
Incompatible with --no-extract. No detection runs when nothing is
being extracted.
Format selection
peel detects the archive shape from the URL suffix first, then
falls back to a magic-byte read of the first ~8 bytes of the source.
A mismatch between the suffix and the magic fails closed unless
overridden.
--format <NAME>
Force a specific decoder, bypassing both URL-suffix and magic-byte
detection. Required when the URL has no usable suffix (for example,
an opaque query-string download). Valid names: tar, zstd, xz,
lz4, gzip, zip, 7z, rar.
Mutually exclusive with --force-format-from-magic.
--force-format-from-magic
When the URL suffix and the source's magic bytes disagree, trust the
magic instead of returning FormatMismatch.
Mutually exclusive with --format.
Network
--workers <N>
Number of parallel ranged-GET workers. Default 4. The scheduler
will not dispatch more concurrent requests than this against the
primary or any mirror.
Raise on a high-latency, high-bandwidth link (origin in another region) where individual GETs leave the pipe under-utilised. Lower on a single-machine, single-NIC link if the workers saturate the kernel's network stack and per-worker throughput collapses.
--mirror <URL> (repeatable)
Additional source URL serving the same file. The positional URL is
the primary; every --mirror is an alternate.
At startup, peel runs a parallel HEAD against every URL and drops
any mirror whose Content-Length (or ETag and Last-Modified, when
--sha256 is unset) disagrees with the primary. Each ranged GET then
picks one of the surviving mirrors, biased toward the fastest live one.
A failed mirror is excluded for 30 s before it is retried.
--max-bandwidth <RATE>
Aggregate bandwidth cap across all workers and mirrors via a shared token bucket. Accepts:
- Decimal suffixes (1000-based, network convention): K, M, G, T.
- Binary suffixes (1024-based): Ki, Mi, Gi, Ti.
- A trailing B and /s are accepted and ignored.
Examples: 10MB/s, 1.5GB/s, 512KiB/s, 1000000.
The cap is aggregate, not per-mirror.
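The suffix rules above can be sketched as a small parser. The function name `parse_rate` and its error strings are assumptions, and case-insensitive matching is not attempted because the text does not say whether peel accepts it.

```rust
/// Sketch of a parser for the rate syntax; returns bytes per second.
fn parse_rate(input: &str) -> Result<u64, String> {
    let s = input.trim();
    // A trailing "/s" and a trailing "B" are accepted and ignored.
    let s = s.strip_suffix("/s").unwrap_or(s);
    let s = s.strip_suffix('B').unwrap_or(s);

    // Binary suffixes come first so "Ki" is never read as a stray "i" after "K".
    const UNITS: &[(&str, u64)] = &[
        ("Ki", 1 << 10), ("Mi", 1 << 20), ("Gi", 1 << 30), ("Ti", 1 << 40),
        ("K", 1_000), ("M", 1_000_000), ("G", 1_000_000_000), ("T", 1_000_000_000_000),
    ];
    let (number, multiplier) = UNITS
        .iter()
        .find_map(|(suffix, mult)| s.strip_suffix(*suffix).map(|n| (n, *mult)))
        .unwrap_or((s, 1));

    let value: f64 = number
        .trim()
        .parse()
        .map_err(|_| format!("invalid rate: {input:?}"))?;
    if !value.is_finite() || value < 0.0 {
        return Err(format!("invalid rate: {input:?}"));
    }
    Ok((value * multiplier as f64).round() as u64)
}
```

Under this sketch the four documented examples parse to 10 000 000, 1 500 000 000, 524 288, and 1 000 000 bytes per second respectively.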
--max-disk-buffer <SIZE>
Cap on the on-disk lookahead: bytes downloaded but not yet consumed
by the decoder. When the gap reaches this value, the scheduler stops
dispatching new chunks until the decoder catches up, bounding the
size of the .peel.part file when the network is faster than the
disk.
Accepts the same size syntax as --max-bandwidth. Pass none,
off, or disabled to remove the cap. Default 1GiB.
--http-version <auto|h1|h2>
HTTP version to use for downloads.
- auto (default): ALPN-negotiate between H1 and H2 over TLS, H1 over plaintext.
- h1: force HTTP/1.1.
- h2: force HTTP/2. Over TLS, the origin must negotiate h2 or the handshake fails. Over plaintext this forces HTTP/2 prior-knowledge ("h2c"), which only works against servers that explicitly speak it.
--no-auto-discover
Skip multi-volume auto-discovery.
When the positional URL matches a multi-volume pattern
(<base>.part<N>.rar, <base>.7z.<NNN>, <base>.z<NN> and
<base>.zip), peel HEAD-probes the origin to discover the full
ordered volume set before any download starts. This flag forces the
seed to be treated as a single-source URL even when its basename
matches a multi-volume pattern.
Applicable when:
- The seed's filename matches one of the conventions but is not actually a multi-volume archive.
- Discovery would fan out to many failed HEAD probes against a high-latency origin and the seed is known to be a single source.
No effect when multiple positional URLs are supplied. That path already opts out of auto-discovery.
Integrity
--sha256 <HEX> (repeatable)
SHA-256 digest the assembled compressed source must match. Repeatable.
- Single-URL runs: pass once. peel streams a hand-rolled, resumable SHA-256 over the source bytes as they arrive and aborts at clean completion if the digest disagrees. The hash state is checkpointed across resumes, so a resumed run produces a digest byte-identical to sha256sum on the original file.
- Multi-URL runs: pass zero times (no verification) or exactly once per URL, paired by order. Hashes are per-part digests of each part's bytes; verified at part-boundaries as the decoder advances.
See Integrity verification. Hashing happens on the
streaming pipeline. .zip archives extract per-entry and integrity
checking does not extend to that path in the current release.
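The pairing rule for repeated --sha256 values can be sketched as a validation step. The function name, return shape, and error messages are hypothetical; only the "zero or exactly one per URL, paired by order" rule comes from the text above.

```rust
/// Sketch of the pairing rule: no digests at all, or exactly one per URL, in order.
fn pair_digests(
    urls: &[&str],
    digests: &[&str],
) -> Result<Vec<(String, Option<String>)>, String> {
    if !digests.is_empty() && digests.len() != urls.len() {
        return Err(format!(
            "expected 0 or {} --sha256 values, got {}",
            urls.len(),
            digests.len()
        ));
    }
    for d in digests {
        // A SHA-256 digest is 32 bytes, i.e. 64 hexadecimal characters.
        if d.len() != 64 || !d.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("not a SHA-256 hex digest: {d}"));
        }
    }
    Ok(urls
        .iter()
        .enumerate()
        .map(|(i, u)| (u.to_string(), digests.get(i).map(|d| d.to_ascii_lowercase())))
        .collect())
}
```

Rejecting a partial list up front avoids silently verifying some parts of a split archive and not others.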
Encryption
--password-from <SOURCE>
Password source for encrypted archives. Accepts:
- prompt: read from /dev/tty with echo disabled. Up to 3 attempts on a wrong password before exit code 4.
- env:NAME: read from the named environment variable.
- file:PATH: read the first line of the file. Modes other than 0600 emit a one-shot warning.
- fd:N: read from file descriptor N (one-shot, until EOF or newline). Compatible with peel … --password-from fd:3 3< <(pass …).
peel does not accept a --password=<value> flag. argv is
visible to every process on the host.
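Parsing the SOURCE argument into a typed value might look like this. The enum and function names are illustrative assumptions; only the four spellings (prompt, env:NAME, file:PATH, fd:N) come from the list above.

```rust
use std::path::PathBuf;

/// Hypothetical typed form of the --password-from argument.
#[derive(Debug, PartialEq)]
enum PasswordSource {
    Prompt,
    Env(String),
    File(PathBuf),
    Fd(i32),
}

fn parse_password_source(spec: &str) -> Result<PasswordSource, String> {
    if spec == "prompt" {
        return Ok(PasswordSource::Prompt);
    }
    // Split on the first ':' only, so paths containing ':' stay intact.
    match spec.split_once(':') {
        Some(("env", name)) if !name.is_empty() => Ok(PasswordSource::Env(name.to_string())),
        Some(("file", path)) if !path.is_empty() => Ok(PasswordSource::File(PathBuf::from(path))),
        Some(("fd", n)) => n
            .parse::<i32>()
            .ok()
            .filter(|fd| *fd >= 0)
            .map(PasswordSource::Fd)
            .ok_or_else(|| format!("invalid file descriptor: {n:?}")),
        _ => Err(format!("unknown password source: {spec:?}")),
    }
}
```

None of the variants carries the password itself, which keeps the secret out of argv in line with the rationale above.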
Tuning knobs
These have measured defaults that work well across the bench grid. See Performance and tuning before changing them in production.
--chunk-size <BYTES>
Bitmap chunk size: the unit of completion tracked in checkpoints. Default 4 MiB.
With adaptive chunk-sizing enabled (the default), the scheduler may
coalesce several consecutive bitmap chunks into a single ranged GET;
this flag continues to set the bitmap unit. Pair with
--no-adaptive-chunk-size to force a fixed dispatch size.
--no-adaptive-chunk-size
Disable the adaptive chunk-size policy. The scheduler dispatches exactly one bitmap chunk per worker, with no growth/shrink decisions over the lifetime of the run. Useful for benchmarking and reproducible test runs.
--punch-threshold <BYTES>
Minimum gap between in-loop hole-punch syscalls. Default 4 MiB.
Smaller values yield a tighter physical-disk footprint; larger values yield fewer syscalls per second. Tune downward to enforce a hard ceiling on physical disk; upward if the filesystem's punch-hole implementation is slow.
--checkpoint-min-bytes <BYTES>
Minimum source-byte progress between checkpoint writes. Default 8 MiB.
--checkpoint-min-secs <SECS>
Minimum wall-clock interval between checkpoint writes (fractional). Default 2 s.
--checkpoint-target-secs <SECS>
Target wall-clock interval between checkpoints. Used to scale the
byte floor up at high download rates so the cadence stays below this
target. 0 disables rate-aware scaling. Default 0.2 s.
--io-backend <auto|blocking|uring|mmap>
File-IO backend selection.
- auto (default): on Linux, mmap for the sparse part file plus io_uring for sockets, with graceful fallback. On non-Linux, the blocking backend for both.
- blocking: force the pre-io_uring pwrite / pread path everywhere. Used for A/B comparison.
- uring: require io_uring for sockets; error out if unavailable.
- mmap: force the memory-mapped sparse-file path explicitly, with the blocking socket backend.
See Performance and tuning for what each path does and when to pick it.
Help and version
-h, --help
Print help. -h prints a one-line summary per flag; --help
prints the full description.
-V, --version
Print the version.
Flag summary
| Flag | Purpose | Default |
|---|---|---|
| -o, --output <PATH> | Output path | URL basename, suffixes stripped |
| --workdir <DIR> | Sidecar (.peel.part / .peel.ckpt) location | Sibling of output |
| -k, --keep-archive[=<PATH>] | Extract AND keep the source | off |
| --no-extract | Download without extracting | off |
| -d, --destructive | Hole-punch + delete source (local mode) | off |
| --strict-format | Unrecognised format → error | off |
| --format <NAME> | Force a decoder | none |
| --force-format-from-magic | Trust magic over URL suffix | off |
| --workers <N> | Parallel GETs | 4 |
| --mirror <URL> (repeat) | Additional source URLs | none |
| --max-bandwidth <RATE> | Aggregate token-bucket cap | none |
| --max-disk-buffer <SIZE> | Lookahead window cap | 1 GiB |
| --http-version <auto\|h1\|h2> | HTTP version | auto |
| --no-auto-discover | Skip multi-volume HEAD probes | off |
| --sha256 <HEX> (repeat) | Verify hash | none |
| --password-from <SOURCE> | Password source | none |
| --chunk-size <BYTES> | Bitmap unit | 4 MiB |
| --no-adaptive-chunk-size | Fixed dispatch size | off |
| --punch-threshold <BYTES> | Min gap between punches | 4 MiB |
| --checkpoint-min-bytes <BYTES> | Min progress between checkpoints | 8 MiB |
| --checkpoint-min-secs <SECS> | Min interval between checkpoints | 2 s |
| --checkpoint-target-secs <SECS> | Target interval (rate-aware) | 0.2 s |
| --io-backend <NAME> | File-IO backend | auto |
| -h, --help | Print help | none |
| -V, --version | Print version | none |
Supported formats
Every format peel decodes is hand-rolled or wraps a vetted upstream
crate. The binary does not shell out to tar, unzip, 7z, or
unrar. See How it works for the architecture.
Detection
peel resolves the archive shape with a two-step fallback:
- URL-suffix. The last component of the URL is matched against a list of known suffixes (.tar, .tar.zst, .zst, .tar.xz, .xz, .tar.lz4, .lz4, .tar.gz, .gz, .zip, .7z, .rar).
- Magic-byte fallback. If the suffix doesn't match, peel issues a tiny initial GET for the first ~16 bytes of the source and matches the magic.
A mismatch between suffix and magic (for example, a URL ending in
.tar.zst but bytes starting with the gzip magic 0x1f8b) fails
closed. Override with one of:
- --force-format-from-magic: trust the magic, ignore the suffix.
- --format <NAME>: bypass detection entirely.
If neither suffix nor magic matches a registered decoder, the default
behaviour is to warn once and fall through to --no-extract. The
remote object is saved under its URL basename. --strict-format
converts that warning to a fatal error.
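The detection rules can be sketched as two small functions. The magic numbers are the published signatures for each format; the function names, the string format labels, and the simplification of comparing a single codec name per source are assumptions made for this illustration.

```rust
/// Match well-known magic numbers at the start of the source.
/// Plain tar is omitted: its "ustar" marker sits at offset 257, past a short probe.
fn detect_magic(head: &[u8]) -> Option<&'static str> {
    if head.starts_with(&[0x28, 0xB5, 0x2F, 0xFD]) {
        Some("zstd")
    } else if head.starts_with(&[0xFD, b'7', b'z', b'X', b'Z', 0x00]) {
        Some("xz")
    } else if head.starts_with(&[0x04, 0x22, 0x4D, 0x18]) {
        Some("lz4")
    } else if head.starts_with(&[0x1F, 0x8B]) {
        Some("gzip")
    } else if head.starts_with(b"PK\x03\x04") {
        Some("zip")
    } else if head.starts_with(&[b'7', b'z', 0xBC, 0xAF, 0x27, 0x1C]) {
        Some("7z")
    } else if head.starts_with(b"Rar!\x1A\x07") {
        Some("rar") // prefix shared by the RAR4 and RAR5 signatures
    } else {
        None
    }
}

/// Combine the suffix guess with the magic guess, failing closed on disagreement.
/// Ok(None) stands for "no decoder matched" (the fall-through to --no-extract).
fn resolve_format(
    suffix: Option<&'static str>,
    magic: Option<&'static str>,
    trust_magic: bool, // models --force-format-from-magic
) -> Result<Option<&'static str>, String> {
    match (suffix, magic) {
        (Some(s), Some(m)) if s != m => {
            if trust_magic {
                Ok(Some(m))
            } else {
                Err(format!("FormatMismatch: suffix says {s}, magic says {m}"))
            }
        }
        (Some(s), _) => Ok(Some(s)),
        (None, m) => Ok(m),
    }
}
```

A caller implementing --strict-format would turn the `Ok(None)` case into an error instead of saving the object verbatim.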
Format matrix
| Format | Streaming | Resume granularity | Encryption | Multi-volume |
|---|---|---|---|---|
| .tar (uncompressed) | ✓ | per tar member | n/a | n/a |
| .zst / .tar.zst | ✓ | per zstd block | n/a | n/a |
| .xz / .tar.xz | ✓ | per LZMA2 chunk | n/a | n/a |
| .lz4 / .tar.lz4 | ✓ | per lz4 block | n/a | n/a |
| .gz / .tar.gz | ✓ | per deflate block¹ | n/a | n/a |
| .bz2 / .tar.bz2 / .tbz2 / .tbz | ✓ | per bzip2 block | n/a | n/a |
| .zip | per-entry² | per entry + intra-entry³ | WinZip-AES, ZipCrypto | spanned ZIP (.zNN + .zip) |
| .7z | per-folder⁴ | per folder | AES-256-CBC (SHA-256 KDF) | .7z.001/.002/… |
| .rar (RAR5) | per-entry⁵ | per entry + intra-entry⁶ | AES-256-CBC (header + per-file) | .part0001.rar/… |
| .rar (RAR3/RAR4 legacy) | per-entry⁷ | per entry + intra-entry⁷ | queued | RAR3 multi-volume queued |
Footnotes below.
Streaming codecs (.tar.*, raw codecs)
.tar (uncompressed)
Plain POSIX tar. peel recognises ustar (0x75 0x73 0x74 0x61 0x72
at offset 257) and emits each entry to its final path as the member
header arrives. Hard links, symlinks, and long-name extensions are
all supported.
.zst / .tar.zst
Streaming Zstandard. The decoder is hand-rolled in
src/decode/zstd/. Resume is per-block: the checkpoint snapshots
the decoder state at zstd block boundaries, so a run killed with
kill -9 mid-archive resumes from the last checkpointed block boundary.
The zstd crate in [dependencies] exists for decoding zstd-coded
ZIP entries only. The streaming .tar.zst / .zst path is
hand-rolled.
.xz / .tar.xz
Streaming XZ (LZMA2). The hand-rolled decoder in
src/decode/xz_liblzma/ is per-cycle-equivalent to liblzma (see
the bench grid in the project README). Resume is per LZMA2 chunk.
.lz4 / .tar.lz4
Streaming LZ4 Frame Format. Frame parsing is hand-rolled; the inner
block-layer decompression uses the lz4_flex crate's
block::decompress_into API. Resume is per lz4 block.
.gz / .tar.gz
Streaming gzip with hand-rolled RFC 1951 DEFLATE. The 32 KiB sliding
window and the running CRC32 / ISIZE are persisted in the checkpoint,
so a kill -9 mid-member resumes byte-identically without
re-decoding the member from its start.
Multi-member gzip (the pigz / gzip -c a b > c.gz shape) is
handled per RFC 1952 §2.2: concatenated members are decoded in
sequence and emitted as one logical stream.
¹ flate2 appears only under [dev-dependencies] (it is used in the
differential test harness to cross-check the hand-rolled decoder); the
runtime binary does not link flate2.
.bz2 / .tar.bz2 / .tbz2 / .tbz
Streaming bzip2 with hand-rolled MSB-first Huffman / MTF / RLE2 /
BWT / RLE1 layers. Each block (≤ 900 KB uncompressed at -9, with
a 48-bit pi BCD sync header 0x314159265359 per the bzip2 wire
format) is an independent restart point; the per-block resume blob
is ~25 bytes (bit cursor + running stream CRC + cross-block RLE1
state + stream level). The decoder rejects the legacy bzip2 0.9.0
"randomised block" flag with a specific diagnostic; modern encoders
have not emitted that flag since 1.0.0 in 1999.
Multi-stream .bz2 files (the cat a.bz2 b.bz2 > c.bz2 shape) are
handled by aligning to the next byte boundary after each stream's
combined CRC and re-entering the per-block loop with a fresh RLE1
state.
peel does not link libbz2; the decoder is pure Rust.
Random-access archives
.zip
ZIP uses a separate per-entry pipeline because of its
central-directory-at-the-end layout. On startup, peel issues a
small ranged GET for the End-of-Central-Directory record, walks the
central directory, then dispatches per-entry GETs in parallel.
Entries are written to their final paths as their bytes arrive.
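The End-of-Central-Directory record is what makes the tail read necessary. In the ZIP format it begins with the signature bytes 50 4b 05 06 and is 22 bytes long when the archive comment is empty. The sketch below checks for that signature in a local file; `has_eocd_at_tail` is a hypothetical helper, and how peel sizes its own ranged GET is not specified here.

```shell
# Hypothetical helper: succeed if the last 22 bytes of the file begin with
# the EOCD signature "PK\005\006". Covers only archives whose trailing
# comment is empty; a real parser must scan back past a comment of up to
# 65535 bytes.
has_eocd_at_tail() {
  local sig
  sig=$(tail -c 22 "$1" | head -c 4 | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "504b0506" ]
}

# The smallest valid ZIP is a bare EOCD record: the signature plus 18 zero bytes.
tmpzip=$(mktemp)
{ printf 'PK\005\006'; head -c 18 /dev/zero; } > "$tmpzip"
if has_eocd_at_tail "$tmpzip"; then echo "EOCD found"; fi
rm -f "$tmpzip"
```

Over HTTP the same tail read corresponds to a suffix range request (`Range: bytes=-N`).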
Supported coders in entries:
- STORED (uncompressed)
- DEFLATE (RFC 1951; same hand-rolled decoder as .gz)
- zstd entries (via the zstd crate's streaming reader API)
Encryption: WinZip-AES (AE-1 and AE-2 forms, AES-128/192/256-CTR with PBKDF2-HMAC-SHA1 key derivation and an HMAC-SHA1-80 trailer); PKWARE traditional "ZipCrypto" (CRC32-keyed PRGA, insecure but supported for compatibility). PKWARE strong-encryption (central-directory encryption, general-purpose flag bit 6) is not supported and surfaces as a clear error.
Multi-disk / spanned archives (other than the simple
.zNN + .zip form), including Zip64 multi-disk, and AES with
non-standard parameters are not yet supported. Regular Zip64 is
supported. Unsupported archives fail with a specific
"unsupported feature" error rather than producing wrong output.
² Per-entry streaming: each entry's bytes are written to its final path as soon as they arrive, while the rest of the archive is still in flight.
³ STORED entries resume byte-granular. DEFLATE entries resume per deflate block via the 32 KiB-window snapshot. zstd entries resume per zstd block. All of this state is encoded in the checkpoint format (version 7) under each in-progress entry.
.7z
7z uses a separate per-folder pipeline because of its
SignatureHeader → trailer-pointer layout. peel reads the
SignatureHeader at offset 0, follows the pointer to fetch the
trailer, parses the streams metadata, and dispatches per-folder
GETs.
Supported coders:
- COPY (no compression)
- DEFLATE
- LZMA
- LZMA2
Header forms: plain Header and unencrypted EncodedHeader (the
trailer compresses metadata with an unencrypted coder chain).
Encryption ships for AES-256-CBC under the
7z KDF (crate::crypto::sevenz_kdf).
The current release is single-volume only; multi-volume .7z.001
support is planned. BCJ filters (x86, ARM, and other preprocessor
filters) and per-coder intra-folder resume are queued.
⁴ Resume granularity is one folder at a time. A kill -9 mid-folder
restarts that folder from the start of its packed range. Per-coder
intra-folder resume, BCJ filters, AES with non-default parameters,
and multi-volume archives are queued.
.rar (RAR5)
RAR5 walks file headers in stream order with no tail-anchored index
like zip or 7z, so peel streams entries to their final paths as
each entry's data area arrives.
Supported coders (compression methods):
- STORED (method 0)
- Standard RAR5 algorithm (methods 1–5) via the hand-rolled decode::rar_native LZSS pipeline plus the RAR-VM standard filters (E8, E8E9, Delta, RGB, Audio).
Encryption: AES-256-CBC for both archive-header encryption
(HEAD_CRYPT, header type 4) and per-file data encryption (extra
record type 1), with PBKDF2-HMAC-SHA256 key derivation. Optional
pswcheck verifier supported. See Encrypted archives.
Multi-volume archives in the <base>.part<N>.rar form are
supported via the multi-volume path:
auto-discovery, explicit positional list, or manifest file.
The previous "non-encrypted, single-volume only" restriction no
longer applies: both encryption and multi-volume support ship. SFX
archives and the rarely-used RAR-VM custom-filter slot
(O.RAR.CUSTOMFILTER) remain queued.
⁵ Per-entry streaming with the §F1 checkpoint blob capturing the LZ dictionary state and filter program cache so resume is byte-identical.
⁶ Mid-entry resume: a kill -9 mid-RAR5 file restarts the in-flight
entry from the snapshot, not from its start. Multi-block lookahead
state is captured in the blob.
.rar (RAR3 / RAR4 legacy)
Legacy RAR3 / RAR4 archives use the hand-rolled decode::rar_legacy
LZ pipeline plus the RarVM standard-filter dispatcher (E8, E8E9,
Delta, RGB, Audio).
Supported coders:
- STORED
- LZ Normal (-m3 from the rar encoder)
The mid-entry checkpoint blob (PLAN_rar3.md §F1) captures the LZ
dictionary state and filter program cache.
PPMd-II and other less-common filters and coders are queued. Encryption for legacy RAR3 archives is queued.
⁷ Same per-entry-plus-intra-entry resume model as RAR5; the LZ
pipeline is different (hand-rolled decode::rar_legacy) but the
checkpoint semantics are identical.
RAR provenance
peel's RAR3 and RAR5 decoders are clean-room implementations.
RARLAB's unrar source has not been consulted at any point.
libarchive's RAR readers (LGPL-2.1, OSI-licensed) are referenced
as an external spec where the RAR wire format requires one. They
are read, not vendored or linked.
Test fixtures are produced with a license-purchased copy of
RARLAB's rar encoder. The unrar binary is not linked, vendored,
or used as an implementation reference; it appears in the RAR
benchmark grid as a third-party point of comparison only.
peel is licensed MIT OR Apache-2.0. The unRAR license is non-OSI
and GPL-incompatible, so a clean-room derivation is the only way to
ship a RAR decoder without inheriting that constraint.
Disabling the RAR module
To produce a smaller binary without .rar support, build without
the rar feature:
cargo install peel-rs --locked --no-default-features
The crate still registers .rar and the RAR5 magic against a
diagnostic-only factory, so the user sees a precise "compiled without
the 'rar' feature" error rather than "unknown format".
What's not (yet) supported
The following are not in the current release:
- .lzma (raw LZMA1, no XZ container): not registered.
- PKWARE strong encryption: clear error.
- ZIP64 multi-disk: clear error (regular Zip64 is supported).
- GPG-encrypted tarballs: out of scope. This is a separate pipeline that peel does not wrap.
- 7z BCJ filters, AES with non-default coder placement, multi-volume .7z.001: clear error.
- RAR self-extracting (SFX) archives: clear error.
Exit codes
peel uses a small, stable set of exit codes so wrapper scripts can
distinguish failure modes without parsing stderr.
| Code | Meaning |
|---|---|
| 0 | Extraction completed successfully |
| 1 | Generic extraction or I/O failure (everything not covered below) |
| 2 | CLI argument parse error (clap-handled; not user-distinct) |
| 4 | PasswordIncorrect or PasswordMissing anywhere in the error chain |
| 128 + signum | Graceful shutdown after a signal (130 = SIGINT, 143 = SIGTERM); sidecars left on disk for resume |
Code 0
The extracted output is complete and, if --sha256 was set, matches
the expected hash. The .peel.part and .peel.ckpt sidecars have
been unlinked.
In -k / --keep-archive mode, the source archive is at its final
location.
In --no-extract mode, the downloaded source bytes are at -o, or at
the URL basename if -o was omitted.
Code 1
Something else went wrong: disk full, network exhausted retries, the
source disappeared mid-run, the checkpoint format is incompatible,
format detection failed under --strict-format, or the SHA-256 digest
did not match.
The error message on stderr identifies the cause. Examples:
| Error message | Cause |
|---|---|
| No space left on device | Output filesystem full |
| digest mismatch: expected …, got … | --sha256 value disagrees with the streamed bytes |
| source changed during run | Per-chunk CRC32C fingerprint disagrees on resume |
| format detection failed, --strict-format set | --strict-format is on and neither URL suffix nor magic identifies a registered decoder |
| mirror https://… : 502 Bad Gateway (after retries) | All mirrors exhausted |
| checkpoint format version 6 not compatible with this peel build (current: 7) | Older sidecar; delete it or use a compatible peel version |
The sidecars (.peel.part, .peel.ckpt) are left in place on a
code-1 exit so that a follow-up run can either resume (if the cause
was transient) or be cleaned up explicitly.
Code 4
A password issue: the wrong password was supplied, no password was
supplied for an encrypted archive, or --password-from prompt
exhausted its 3 retries.
This is a separate code so that scripts can re-prompt without conflating it with a genuine extraction failure. A retry loop:
while true; do
  peel "$URL" --password-from prompt -o ./out/ && break
  rc=$?
  if [ "$rc" != "4" ]; then
    echo "peel failed with code $rc (not a password issue)" >&2
    exit "$rc"
  fi
  echo "wrong password, retry" >&2
done
See Encrypted archives for the full encryption discussion.
Codes 130 and 143 (signal exits)
peel traps SIGINT (Ctrl-C) and SIGTERM and exits with
128 + signum (130 for SIGINT, 143 for SIGTERM). On graceful
shutdown:
- The current checkpoint is flushed and fsync'd.
- The .peel.part and .peel.ckpt sidecars are left on disk.
- Re-running the same command resumes from the last checkpoint.
SIGKILL (kill -9) does not get a graceful shutdown. The
process dies immediately. An ungraceful kill is still safe to resume
from: the last completed checkpoint is on disk, the per-chunk
fingerprints catch the in-flight chunk's partial bytes, and the next
run reconciles.
Code 2 (clap parse error)
CLI argument parsing errors (unrecognised flag, conflicting flags,
wrong value type) come from clap and exit code 2. The error
message names the offending argument:
error: the argument '--no-extract' cannot be used with '--format <NAME>'
Usage: peel --no-extract [URLS]...
For more information, try '--help'.
This code is not specific to peel. It follows the standard clap
convention and matches cargo, rustup, and most modern Rust CLIs.
Scripting against the codes
A common pattern distinguishes "user error" (retry with different inputs), "transient error" (retry with the same inputs), and "give up":
#!/usr/bin/env bash
set -u
URL=$1
OUT=$2
peel "$URL" -o "$OUT" --password-from env:PEEL_PW
rc=$?
case "$rc" in
  0)
    echo "ok"; exit 0 ;;
  4)
    echo "wrong password: set PEEL_PW correctly and retry" >&2; exit 4 ;;
  130|143)
    echo "interrupted; re-run to resume" >&2; exit "$rc" ;;
  *)
    # Sidecars sit next to the output, so drop a trailing slash from OUT first.
    echo "peel failed; sidecars at ${OUT%/}.peel.part / ${OUT%/}.peel.ckpt" >&2; exit "$rc" ;;
esac
On kill -9, peel does not get to set an exit code. The parent
sees 137 (128 + 9), which peel itself never produces. That
state is still resumable: the next run picks up the sidecars.
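The distinction between resumable and non-resumable exits can be folded into a bounded retry wrapper. The sketch below is one possible policy, not part of peel: `run_with_resume` is a hypothetical helper that re-runs a command after code 1 or 137, and stops immediately on codes 2, 4, 130, and 143. Code 1 also covers non-transient causes such as a full disk or a digest mismatch, so the attempt limit matters.

```shell
# Hypothetical wrapper: run_with_resume MAX_ATTEMPTS COMMAND [ARGS...]
run_with_resume() {
  local max=$1 attempt=1 rc
  shift
  while :; do
    "$@"
    rc=$?
    case "$rc" in
      0)       return 0 ;;
      2|4)     return "$rc" ;;   # bad arguments or password: the same inputs fail again
      130|143) return "$rc" ;;   # someone asked for a stop; leave resuming to them
    esac
    if [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    echo "attempt $attempt exited with $rc; re-running to resume" >&2
    attempt=$((attempt + 1))
  done
}

# Usage (not executed here):
#   run_with_resume 5 peel "$URL" -o "$OUT"
```

Because the command is passed as arguments, the wrapper never needs to know which peel flags are in use.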
Download modes
peel runs in one of three modes for HTTP sources, plus a destructive
opt-in for local-file sources. The mode is selected by flag at the CLI;
format detection (URL suffix → magic bytes) decides the output shape
for the default mode.
Mode summary (HTTP source)
| Flag | Download | Extract | Hole-punch source | Source on disk at exit |
|---|---|---|---|---|
| (default) | yes | yes | yes | deleted |
| -k (bare) | yes | yes | no | preserved as sibling of -o |
| -k=<PATH> | yes | yes | no | preserved at <PATH> |
| --no-extract | yes | no | n/a | preserved at -o |
If format detection fails, peel warns and runs as --no-extract
by default: the remote object is saved to disk under its URL
basename. Pass --strict-format to make that case a hard error
instead. This is useful in CI when an upstream object changing shape
should fail the build rather than produce a different artifact.
Default mode: extract and destroy
peel https://example.com/dataset.tar.zst -o ./out/
Behaviour:
- Parallel ranged GETs feed <output>.peel.part (sparse).
- The decoder consumes the prefix while workers fetch the suffix.
- fallocate(PUNCH_HOLE) / madvise(MADV_REMOVE) releases blocks of the part-file as the decoder advances past them.
- On clean completion, the part-file (now mostly holes) is unlinked and the checkpoint sidecar (<output>.peel.ckpt) is removed.
- On kill -9 or crash, the part-file and the checkpoint sidecar are left on disk. Re-running the same command resumes byte-identically.
Peak compressed-side disk: roughly --max-disk-buffer (default 1 GiB).
Peak total disk: extracted_size + lookahead_window.
-k / --keep-archive: extract and keep the archive
peel https://example.com/dataset.tar.zst -o ./out/ -k
peel https://example.com/dataset.tar.zst -o ./out/ -k=./preserved/dataset.tar.zst
Behaviour:
- Same parallel download and streaming extract as default mode.
- The puncher is forced to no-op. The part-file grows to the full Content-Length of the source.
- On clean completion, the part-file is renamed to its final archive path:
  - Bare -k → sibling of -o, named after the URL basename.
  - -k=<PATH> → explicit path. The = is required, since bare -k followed by a positional URL is otherwise ambiguous.
Peak disk: extracted_size + compressed_size. Use this mode when the
archive must remain on disk afterward (for example, to upload it
elsewhere, to extract it again with a different tool, or to keep as
a backup).
-k is redundant with --no-extract (which already preserves the
source). The CLI logs an info-level note rather than erroring.
--no-extract: download only, parallel-GET aria2c-style
peel https://example.com/big.deb --no-extract
peel https://example.com/big.deb --download-only # alias
Behaviour:
- Parallel ranged GETs feed <output>.peel.part.
- No decoder runs. No holes are punched.
- On clean completion, <output>.peel.part is renamed to its final path (the URL basename when -o is unset).
- Resume on kill -9 or network drop works the same way as extract mode: the chunk bitmap, ETag handling, and SHA-256 hashing all apply.
Suitable for:
- Arbitrary remote downloads that are not archives: .deb packages, raw binaries, checksum files, ML weight files.
- Keeping the archive on disk to extract later with a different tool.
- Using peel as a parallel ranged-GET replacement for aria2c, axel, or wget -c, with the same scheduler, mirror fan-out, SHA-256 verification, and checkpointed resume.
Mutually exclusive with --format, --force-format-from-magic, and
--punch-threshold. Those are extractor knobs and nothing extracts
in this mode.
-d / --destructive: opt in to destructive local-file extraction
peel /tmp/dataset.tar.zst # non-destructive (default for local)
peel /tmp/dataset.tar.zst -d -o ./out/ # destructive: hole-punch + delete on success
Local-file extraction is non-destructive
by default. peel abc.tar.xz extracts into ./abc/ and leaves
abc.tar.xz untouched. -d opts in to the disk-pressure contract
of the HTTP path: the source is progressively hole-punched as the
decoder advances and deleted on clean completion, freeing the
archive's blocks before the extracted tree is fully written.
For an HTTP source, -d is a harmless no-op (HTTP runs are
destructive by default), and peel logs an info-level note.
Combining -d with -k/--keep-archive for an HTTP source is an
error: the two intents contradict.
-d does not apply to the random-access formats (.zip, .7z,
.rar) in local mode. Their pipelines seek backwards into the
archive (zip central directory at the tail, 7z trailer pointer,
rar per-entry headers), so a monotonically-advancing punch cursor
cannot be maintained. peel warns and proceeds non-destructively
when -d is passed against one of those sources.
Strict mode
peel --strict-format <URL> -o <PATH>
When the URL suffix and the magic-byte read both fail to identify a
registered decoder, the default behaviour is a warning and a
fall-through to --no-extract. --strict-format turns that case
into a hard error.
Use this in CI when an upstream object changing shape unexpectedly
(.tar.zst → .tar.gz, or a maintainer's CDN serving a different
file under the same URL) should fail the build rather than produce
a different artifact. Incompatible with --no-extract (no detection
runs when nothing is being extracted). Compatible with -k.
Putting it together
| Goal | Command |
|---|---|
| Extract and discard the archive | peel <URL> -o ./out/ |
| Extract and keep the archive | peel <URL> -o ./out/ -k |
| Just download (no extract) | peel <URL> --no-extract |
| Extract a local file, preserve source | peel ./archive.tar.zst -o ./out/ |
| Extract a local file, free disk as you go | peel ./archive.tar.zst -o ./out/ -d |
| Verify hash, cap bandwidth, fan out across mirrors | see Multi-mirror downloads |
| Fail CI on format drift | peel <URL> -o ./out/ --strict-format |
Output path resolution
-o <PATH> accepts either a directory (for archive formats that
produce a tree) or a file (for stream-shaped formats). When -o
is omitted, peel derives a default from the URL basename.
The two output shapes
| Shape | When | Default -o |
|---|---|---|
| Directory (tree-shaped) | tar, zip, 7z, rar, and any .tar.<x> wrapper | URL basename with archive / compression suffixes stripped |
| File (stream-shaped) | Raw .zst, .xz, .lz4, .gz (no inner tar) | URL basename with the compression suffix stripped |
If the explicit -o does not match the detected format's shape,
peel errors at coordinator entry. There is no silent fixup.
Examples
# Tar wrapper → directory. Trailing slash is optional but explicit.
peel https://example.com/linux-6.x.tar.xz # → ./linux-6.x/
peel https://example.com/linux-6.x.tar.xz -o ./linux/ # → ./linux/
peel https://example.com/linux-6.x.tar.xz -o ./linux # → ./linux/ (no trailing slash, still a dir)
# Raw compressed → single file.
peel https://example.com/model.bin.zst # → ./model.bin
peel https://example.com/model.bin.zst -o ./weights.bin # → ./weights.bin
# ZIP / 7z / RAR → directory.
peel https://example.com/data.zip # → ./data/
peel https://example.com/snapshot.7z -o ./snap/ # → ./snap/
peel https://example.com/backup.part0001.rar -o ./out/ # → ./out/ (multi-volume auto-discovered)
# Trailing slash forces directory semantics. Useful when the URL has
# no suffix and a tree output is required for `--format zip`.
peel "https://host/dl?id=42" --format zip -o ./out/
How the basename is computed
- Strip the URL's query string and fragment.
- Take the last path component.
- Strip suffixes in order until a non-archive / non-compression suffix remains:
  - .tar strips .tar.
  - .zst / .xz / .lz4 / .gz strip the compression suffix.
  - .tar.zst etc. strip both.
  - .zip / .7z / .rar strip the archive suffix.
Examples:
| URL basename | Default output |
|---|---|
| linux-6.x.tar.xz | linux-6.x/ |
| model.bin.zst | model.bin |
| data.tar | data/ |
| dataset.zip | dataset/ |
| snapshot.7z | snapshot/ |
| backup.part0001.rar | backup/ |
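The stripping rules and the table above can be approximated with shell parameter expansion. This is a sketch of the documented behaviour, not peel's resolver: `default_output` is a hypothetical function, the removal of the .partNNNN volume index for RAR is inferred from the last table row, and unknown suffixes fall through to the bare basename.

```shell
# Hypothetical helper: default_output URL → default -o value
default_output() {
  local name="${1%%#*}"        # drop the fragment
  name="${name%%\?*}"          # drop the query string
  name="${name##*/}"           # keep the last path component
  case "$name" in
    *.tar.zst|*.tar.xz|*.tar.lz4|*.tar.gz)
      printf '%s/\n' "${name%.tar.*}" ;;      # strip both suffixes, tree output
    *.tar|*.zip|*.7z)
      printf '%s/\n' "${name%.*}" ;;          # strip the archive suffix
    *.rar)
      name="${name%.rar}"
      printf '%s/\n' "${name%.part[0-9]*}" ;; # also drop a volume index (inferred)
    *.zst|*.xz|*.lz4|*.gz)
      printf '%s\n' "${name%.*}" ;;           # raw stream: single-file output
    *)
      printf '%s\n' "$name" ;;                # no recognised suffix
  esac
}

default_output "https://example.com/linux-6.x.tar.xz?mirror=eu#top"
```

The trailing slash in the output marks a directory-shaped result.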
When the URL has no useful suffix
Opaque query-string downloads (?id=42, ?download_token=…) defeat
the suffix-based default. Two options:
# 1. Pin the decoder explicitly.
peel "https://host/dl?id=42" --format zstd -o ./out.bin
# 2. Force trust in magic-byte detection. A small initial GET reads
# the magic, and the resolver picks the decoder from there.
peel "https://host/dl?id=42" --force-format-from-magic -o ./out.bin
--format is more deterministic than relying on magic. Prefer it
whenever the format is known ahead of time.
Conflict resolution
| Situation | Behaviour |
|---|---|
| -o is a file path, format is tree-shaped (e.g. tar.zst) | Error at coordinator entry: shape mismatch |
| -o ends in /, format is stream-shaped (e.g. raw .zst) | Error at coordinator entry: shape mismatch |
| -o is a directory that exists and is non-empty (tar / zip / 7z / rar) | peel writes into it; pre-existing files are overwritten when an archive entry has the same path |
| -o is a file path that exists (stream-shaped output) | Overwritten |
| -o's parent directory does not exist | Created if a single parent component is missing; otherwise error |
Where the sidecars live
The .peel.part and .peel.ckpt sidecar files live next to the
output by default:
- -o ./out/ → ./out.peel.part, ./out.peel.ckpt
- -o ./out.bin → ./out.bin.peel.part, ./out.bin.peel.ckpt
Override with --workdir <DIR> when the output and the in-flight
state should live on different disks:
# Extract onto slow HDD-backed /data, keep the in-flight state on
# fast NVMe at /var/cache/peel.
peel https://host/dataset.tar.zst -o /data/out/ --workdir /var/cache/peel/
The directory is created if missing. The basenames stay the same
(<output_name>.peel.part / <output_name>.peel.ckpt); only the
parent directory changes.
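The naming rule can be written down as a small path computation. The sketch below derives the sidecar paths from the examples above; `sidecar_paths` is a hypothetical helper, not a peel subcommand.

```shell
# Hypothetical helper: sidecar_paths OUTPUT [WORKDIR]
# Prints the .peel.part path, then the .peel.ckpt path.
sidecar_paths() {
  local out="${1%/}"                 # "./out/" and "./out" share the same sidecars
  local workdir="${2:-}"
  local stem="$out"
  if [ -n "$workdir" ]; then
    stem="${workdir%/}/${out##*/}"   # same basename, different parent directory
  fi
  printf '%s.peel.part\n%s.peel.ckpt\n' "$stem" "$stem"
}

sidecar_paths /data/out/ /var/cache/peel/
```

For the --workdir example above, this prints /var/cache/peel/out.peel.part followed by /var/cache/peel/out.peel.ckpt.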
Multi-part URLs
Some publishers split a large archive into multiple files at the HTTP
layer, serving separate URLs for name.tar.part0000,
name.tar.part0001, and so on. The byte-concatenation of every part's
body forms the logical archive. peel handles this case by accepting
two or more positional URLs.
This case differs from multi-volume archives. The
format's own splitting (RAR .partNNN.rar, 7z .7z.NNN, spanned ZIP
.zNN) stores format-aware metadata in each volume. Multi-part URLs
carry no such metadata: they are a byte-stream served across multiple
URLs.
Usage
peel \
https://snapshot.example.com/dataset.tar.part0000 \
https://snapshot.example.com/dataset.tar.part0001 \
https://snapshot.example.com/dataset.tar.part0002 \
-o ./out/
Behaviour:
- At startup, peel issues a parallel HEAD against every URL and reads its Content-Length.
- The full assembled length is sum(Content-Length) of the parts.
- Workers fetch every part in parallel via ranged GETs (the same approach as aria2c -Z), but the bytes stream into a single logical part file and a single decoder.
- The decoder sees the byte-concatenation of every part's body, in order.
The compressed bytes never fully land on disk. The hole-punching and resume guarantees of a single-URL run apply.
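The mapping between the logical archive and the individual parts is plain offset arithmetic. The sketch below illustrates it with two hypothetical helpers (`total_length`, `locate_offset`); the part lengths stand in for the Content-Length values read at startup, and nothing here reflects peel's internal data structures.

```shell
# total_length LEN0 LEN1 ... → assembled length of the logical archive
total_length() {
  local sum=0 len
  for len in "$@"; do
    sum=$((sum + len))
  done
  echo "$sum"
}

# locate_offset OFFSET LEN0 LEN1 ... → "PART_INDEX LOCAL_OFFSET"
locate_offset() {
  local offset=$1 index=0 len
  shift
  for len in "$@"; do
    if [ "$offset" -lt "$len" ]; then
      echo "$index $offset"
      return 0
    fi
    offset=$((offset - len))
    index=$((index + 1))
  done
  return 1   # offset is at or past the end of the assembled stream
}

total_length 100 250 50
locate_offset 349 100 250 50
```

With parts of 100, 250, and 50 bytes, logical offset 349 is the last byte of the second part, which has index 1.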
Verifying integrity per part
--sha256 is repeatable. With two or more URLs, either form is valid:
- Zero --sha256 flags: no verification.
- Exactly one --sha256 per URL, paired by order. Each part's hash is verified at its part-boundary as the decoder advances.
peel \
https://snapshot.example.com/dataset.tar.part0000 \
https://snapshot.example.com/dataset.tar.part0001 \
--sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
--sha256 1bcf4d2e9aa01ff5... \
-o ./out/
A wrong number of --sha256 flags (1 for 2 URLs, 3 for 2 URLs) is
rejected at parse time.
A real example: Arbitrum snapshot bundles
This mode is used in production against
Arbitrum snapshot bundles. The
nova snapshot, for example, is published as pruned.tar.part0000
through pruned.tar.partNNNN with a per-part SHA-256 list:
peel \
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0000 \
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0001 \
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0002 \
--sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
--sha256 1bcf4d2e9aa01ff5e8aa72a2ab39310af020bdb6f76d6f7c75c7c14ade38c6ce \
--sha256 c40bf8a2cb9d9a90e4c80a5b7c6e9c5d3b8a2e1f9d4a6c1b7e2f8d3a5c0b9e1f0 \
-o ./nova-out/
The convenience script
scripts/arb-snapshot.sh
wraps the URL list / hash list discovery against the Arbitrum
manifest.
Reading URLs from a file
When the part list is large (tens or hundreds of URLs), pass it as
@file.txt instead of inlining it:
# urls.txt: blank lines and "#" comments are skipped
https://snapshot.example.com/dataset.tar.part0000
https://snapshot.example.com/dataset.tar.part0001
https://snapshot.example.com/dataset.tar.part0002
peel @urls.txt -o ./out/
@file.txt is also used for multi-volume manifests.
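When the part names follow a fixed numbering scheme, the list can be generated rather than typed. The sketch below assumes four-digit, zero-based part numbers as in the examples above; `gen_part_urls` is a hypothetical helper.

```shell
# Hypothetical helper: gen_part_urls BASE COUNT
# Prints BASE.part0000 through BASE.part<COUNT-1>, one URL per line.
gen_part_urls() {
  local i=0
  while [ "$i" -lt "$2" ]; do
    printf '%s.part%04d\n' "$1" "$i"
    i=$((i + 1))
  done
}

gen_part_urls https://snapshot.example.com/dataset.tar 3
# Redirect the output to urls.txt, then run: peel @urls.txt -o ./out/
```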
Differences from multi-volume archives
| | Multi-part URLs | Multi-volume archives |
|---|---|---|
| Detection | Caller passes ≥ 2 URLs | One URL whose basename matches a known volume pattern, auto-discovered |
| Format metadata | None; bytes concatenate raw | Each volume carries format-aware headers |
| Order matters | Yes; caller specifies | Yes; discovered from volume numbering |
| Use cases | Large .tar.* published in chunks (Arbitrum snapshots) | RAR .partNNN.rar, 7z .7z.NNN, spanned ZIP |
| Override | Pass @file.txt for many parts | --no-auto-discover forces single-source |
To distinguish the two cases, inspect the URL suffixes. URLs ending
in .partNNNN (numbered, no archive extension) or .tar.partNNN
(numbered after a tar extension) are multi-part URLs. URLs ending in
.part0001.rar, .7z.001, or .z01 are multi-volume archives.
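The suffix rule above translates directly into a pattern match. The sketch below is a rough classifier for scripting purposes, not peel's detector. `classify_source` is a hypothetical helper, and its glob patterns are looser than a strict digits-only check.

```shell
# Hypothetical helper: classify_source URL_OR_PATH
classify_source() {
  local name="${1##*/}"
  case "$name" in
    *.part[0-9]*.rar|*.7z.[0-9][0-9][0-9]|*.z[0-9][0-9])
      echo "multi-volume" ;;   # format-aware volume naming
    *.part[0-9]*)
      echo "multi-part" ;;     # raw byte-stream chunk, no archive extension after the index
    *)
      echo "single" ;;         # includes the .zip final volume of a spanned ZIP
  esac
}

classify_source https://snapshot.example.com/dataset.tar.part0000
classify_source https://host/backup.part0001.rar
```

The order of the patterns matters: the .rar form must be tested before the bare .partNNNN form, since both contain ".part" followed by digits.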
Multi-volume archives
Some archive formats support splitting one logical archive across
multiple physical files with format-aware metadata in each volume.
peel recognises three multi-volume naming conventions and resolves
every sibling volume up front (one parallel HEAD per volume for
HTTP seeds).
Archives split at the HTTP layer (raw .partNNNN files that
concatenate into a logical archive) are handled by
Multi-part URLs instead.
Supported conventions
| Format | Pattern | Example |
|---|---|---|
| RAR5 | <base>.part<N>.rar | backup.part0001.rar, backup.part0002.rar, … |
| 7z | <base>.7z.<NNN> | snapshot.7z.001, snapshot.7z.002, … |
| ZIP (spanned) | <base>.z<NN> + <base>.zip | data.z01, data.z02, …, data.zip |
For spanned ZIP, the <base>.zip final volume is mandatory: it
contains the End-of-Central-Directory record. The .zNN files hold
the entry data.
Three ways to invoke
1. Single seed with auto-discovery (default)
Pass any volume whose basename matches a recognised pattern and
peel discovers the full ordered set:
peel https://host/backup.part0001.rar -o ./out/
peel https://host/snapshot.7z.001 -o ./out/
peel ./data.z01 -o ./out/ # local works too
At startup, peel:
- Recognises the pattern from the basename.
- Probes the origin for siblings via HEAD against backup.part0002.rar, backup.part0003.rar, and so on, until two consecutive HEADs return 404.
- Reports the resolved volume count in the progress UI.
- Routes downloads through the multi-volume storage path.
Discovery is parallel: every probe runs concurrently against the origin, so resolution costs one round-trip of wall-clock time regardless of the volume count.
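The stopping rule (two consecutive misses) can be sketched as a loop. The version below is sequential for clarity, whereas peel's discovery is described as parallel. `discover_volumes` and both probe functions are hypothetical, the four-digit numbering is taken from the examples, and the handling of a single gap followed by a hit is an assumption.

```shell
# Hypothetical helper: discover_volumes PROBE_FN BASE
# Calls PROBE_FN with each candidate name and prints the ones that exist,
# stopping after two consecutive misses.
discover_volumes() {
  local probe=$1 base=$2 n=1 misses=0 name
  while [ "$misses" -lt 2 ]; do
    name=$(printf '%s.part%04d.rar' "$base" "$n")
    if "$probe" "$name"; then
      misses=0
      printf '%s\n' "$name"
    else
      misses=$((misses + 1))
    fi
    n=$((n + 1))
  done
}

# A real probe could wrap a HEAD request (defined here, not executed):
http_probe() { curl -sfI "$1" >/dev/null; }

# Offline stand-in that pretends three volumes exist.
fake_probe() {
  case "$1" in
    *.part0001.rar|*.part0002.rar|*.part0003.rar) return 0 ;;
    *) return 1 ;;
  esac
}

discover_volumes fake_probe https://host/backup
```

Passing the probe as a function name keeps the loop testable without network access.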
2. Explicit positional list
Pass every volume URL as a positional argument. Useful when
auto-discovery does not fit (volumes hosted on different origins, or
numbering that is not contiguous from 0001):
peel \
https://host/backup.part0001.rar \
https://host/backup.part0002.rar \
https://host/backup.part0003.rar \
-o ./out/
The volume basenames must form a contiguous numeric sequence;
out-of-order or non-contiguous entries (part0001, part0003) are
rejected at parse time with a specific error.
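A wrapper script can run the same contiguity check before invoking peel. The sketch below assumes the `<base>.part<N>.rar` convention; `volume_index` and `check_contiguous` are hypothetical helpers, and the error text copies the diagnostic listed later in this section.

```shell
# volume_index NAME → numeric N from "<base>.part<N>.rar"
volume_index() {
  local n="${1%.rar}"
  n="${n##*.part}"
  n="${n#"${n%%[!0]*}"}"   # strip leading zeros so 0008 is not parsed as octal
  echo "${n:-0}"
}

# check_contiguous NAME... → succeeds if each index is the previous one plus 1
check_contiguous() {
  local prev="" cur name
  for name in "$@"; do
    cur=$(volume_index "$name")
    if [ -n "$prev" ] && [ "$cur" -ne $((prev + 1)) ]; then
      echo "multi-volume volumes not contiguous" >&2
      return 1
    fi
    prev=$cur
  done
}

check_contiguous backup.part0001.rar backup.part0002.rar backup.part0003.rar && echo "contiguous"
```

Stripping the zero padding before the arithmetic avoids shells treating a value such as 0008 as an invalid octal literal.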
3. Manifest file
Pass @file.txt (one URL or path per line; blank lines and #
comments ignored):
# volumes.txt
https://host/backup.part0001.rar
https://host/backup.part0002.rar
https://host/backup.part0003.rar
peel @volumes.txt -o ./out/
Useful when the volume list is long or generated programmatically.
Disabling auto-discovery
--no-auto-discover forces single-source semantics on a seed whose
basename happens to match a multi-volume pattern:
# Just download the one .zip file, don't probe for .z01 siblings.
peel https://host/data.zip --no-extract --no-auto-discover
When to use it:
- The seed's filename matches one of the conventions but is not actually a multi-volume archive (for example, an unrelated .zip file that should not be HEAD-probed for .z01 siblings).
- Discovery would fan out to many failed HEAD probes against a high-latency origin and the seed is known to be a single source.
The flag has no effect when multiple positional URLs are supplied: that path already opts out of auto-discovery.
How it interacts with the streaming pipeline
A multi-volume archive is internally a single logical archive: the scheduler, the bitmap, the checkpoint, and the decoder all see one contiguous source. Each volume contributes its bytes to the byte-concatenated logical stream.
That means:
- Resume works across volumes. A kill -9 while volume 7 of 12 is in flight is safe: the next run picks up exactly where the decoder was.
- Hole-punching applies to each volume's .peel.part shard as the decoder advances past it. The compressed-side disk footprint stays bounded the same way as a single-URL run.
- Mirror fan-out (--mirror) is currently single-URL only. Multi-volume archives are fetched from their primary URLs. Mirror support across the volume set is a planned addition.
- --sha256 is single-hash-per-URL on multi-URL runs, so a multi-volume archive expects a hash per volume.
Listing the resolved volumes
Run with RUST_LOG=info to see the discovered set before any
downloads start:
RUST_LOG=info peel https://host/backup.part0001.rar -o ./out/ 2>&1 | head -20
Look for the discovered N volumes line and the per-volume sizes.
Diagnostics
| Error | Cause | Fix |
|---|---|---|
| multi-volume volumes not contiguous | Explicit list skips a number | Add the missing volume or renumber |
| multi-volume probe returned mixed Content-Length | Origin serving inconsistent volumes | Investigate origin; check for partial uploads |
| spanned zip requires .zip final volume | Passed .z01..z09 without .zip | Add the .zip final volume to the list |
| cannot mix multi-volume conventions | .part0001.rar + .7z.001 in one list | One archive per invocation |
Multi-mirror downloads
peel can fetch a single file from several origins in parallel,
biasing the work toward whichever mirror is fastest and excluding
mirrors that fail. The positional URL is the primary. Every
--mirror <URL> is an alternate.
Usage
peel https://primary.example.com/dataset.tar.zst \
--mirror https://eu.mirror.example.com/dataset.tar.zst \
--mirror https://us.mirror.example.com/dataset.tar.zst \
-o ./out/
--mirror is repeatable with no fixed upper bound. Returns diminish
once the mirror count exceeds the network's ability to keep more
than a few worker connections busy.
Startup validation
Before any data download, peel runs a parallel HEAD against
every URL (primary and each mirror) and compares:
- Content-Length: the byte size of the source must agree across all mirrors.
- ETag / Last-Modified: if --sha256 is unset, these serve as a secondary identity signal. Mismatched ETags indicate the mirrors are serving different files.
- Accept-Ranges: bytes: required for the mirror to be useful. Mirrors without ranged-GET support are dropped with a warning.
Any mirror that fails these checks is excluded for the run. Surviving mirrors are selected per ranged GET, biased toward the fastest live mirror.
If the primary fails validation, the run aborts unless
--sha256 is set. With the hash as the source of truth, peel
proceeds against the agreeing mirrors.
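The size and range checks can be tried by hand against captured HEAD responses. The sketch below covers only Content-Length and Accept-Ranges and leaves out the ETag / Last-Modified comparison. `header_value` and `mirror_usable` are hypothetical helpers that read raw response headers on stdin (for example, the output of `curl -sI <URL>`).

```shell
# header_value NAME: print the value of header NAME (case-insensitive).
header_value() {
  tr -d '\r' | awk -v want="$1" '
    BEGIN { want = tolower(want) ":" }
    tolower(substr($0, 1, length(want))) == want {
      value = substr($0, length(want) + 1)
      sub(/^[ \t]+/, "", value)
      print value
      exit
    }'
}

# mirror_usable PRIMARY_LENGTH: succeed if the headers on stdin report the
# same Content-Length as the primary and advertise byte ranges.
mirror_usable() {
  local headers len ranges
  headers=$(cat)
  len=$(printf '%s\n' "$headers" | header_value Content-Length)
  ranges=$(printf '%s\n' "$headers" | header_value Accept-Ranges)
  [ "$len" = "$1" ] && [ "$ranges" = "bytes" ]
}

sample=$(printf 'HTTP/1.1 200 OK\r\nContent-Length: 1024\r\nAccept-Ranges: bytes\r\n')
if printf '%s\n' "$sample" | mirror_usable 1024; then echo "mirror accepted"; fi
```

Lower-casing the header name before comparison lets the same helper handle HTTP/2 responses, which use lowercase header names.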
Scheduler behaviour
For each pending ranged GET:
- The scheduler picks among healthy mirrors using a smoothed per-mirror throughput estimate.
- A mirror that fails a request (5xx, connection reset, timeout) is excluded for 30 seconds before being retried.
- The exclusion is logged at warn!. The retry is logged at info!.
- If all mirrors are excluded simultaneously, the scheduler back-pressures the workers until one returns.
A flapping mirror takes itself out of rotation rather than failing the whole run, providing graceful degradation.
Combining with other features
With --sha256
When --sha256 is set, the hash is the source of truth. peel
trusts agreeing mirrors even when their Last-Modified headers
disagree (CDN edge timing, mirror re-uploads). A wrong-hash mirror
fails validation later. A right-hash mirror is accepted even if its
metadata is slightly different.
peel https://primary.example.com/dataset.tar.zst \
--mirror https://eu.mirror.example.com/dataset.tar.zst \
--mirror https://us.mirror.example.com/dataset.tar.zst \
--sha256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad \
-o ./out/
With --max-bandwidth
The cap is aggregate across all mirrors via a single token
bucket. --max-bandwidth 50MB/s against 3 mirrors caps the total at
50 MB/s, not 150 MB/s. This matches the intent when the cap exists
to be polite to the caller's network or to the mirrors collectively.
With --workers
--workers <N> is the total in-flight request count across all
mirrors, not per-mirror. With 4 workers and 3 mirrors, ~4 concurrent
requests are in flight at any time, drawn from whichever mirrors are
fastest at the moment.
What --mirror is not
- Failover only. peel does not sequentially try mirror 1, then mirror 2 on failure. It uses all of them in parallel by default.
- A way to download from sharded URLs. When the URLs serve different bytes (different parts of one logical file), use Multi-part URLs.
- A way to download a multi-volume archive. For name.part0001.rar + name.part0002.rar (each volume is its own file), use Multi-volume archives. --mirror applies only when the same file is reachable at multiple URLs.
Diagnostics
| Log line | Meaning |
|---|---|
| mirror https://… dropped at startup: Content-Length mismatch | Mirror's reported size disagrees with the primary |
| mirror https://… dropped: no Accept-Ranges: bytes | Mirror does not support ranged GETs and cannot be used for parallel download |
| mirror https://… excluded for 30s after status=502 | Transient failure; mirror will be retried |
| all mirrors excluded; back-pressuring | All sources are down simultaneously; the scheduler waits |
| primary failed validation; using N agreeing mirror(s) | The primary's size/etag didn't match; mirrors did. Requires --sha256 |
Local-file extraction
When passed a path on disk, peel skips the HTTP machinery entirely
(no scheduler, no mirrors, no chunk bitmap) and runs the same
decoder / sink / extractor stack against the local file.
Use this mode when the archive is already on disk and peel's
decoders are preferred over tar -I zstd -xf, unzip, 7z x, or
unrar x.
Usage
# Non-destructive (default): extracts to ./dataset/ and leaves
# /tmp/dataset.tar.zst untouched.
peel /tmp/dataset.tar.zst
# Explicit output directory.
peel /tmp/dataset.tar.zst -o ./out/
# Destructive opt-in: hole-punch the source as the decoder advances,
# delete it on clean completion.
peel -d /tmp/dataset.tar.zst -o ./out/
peel recognises a local path by the absence of an http:// or
https:// scheme. Relative paths are resolved against the current
working directory.
Modes
| Flag | Behaviour |
|---|---|
| (default) | Non-destructive: extract and leave the source untouched, no .peel.ckpt written |
| -d / --destructive | Hole-punch the source as the decoder advances and delete it on clean completion |
| -k / --keep-archive | No-op in local mode (preservation is already the default); kept for cross-source script compatibility |
| --format <NAME> | Force a decoder (same semantics as HTTP mode) |
| --workdir <DIR> | Place the .peel.ckpt sidecar here instead of next to the source (destructive mode only) |
| --io-backend … | Selects the puncher implementation (auto / blocking / mmap) |
| --punch-threshold | Minimum gap between in-loop punch syscalls in destructive mode |
Resume
Destructive mode writes a .peel.ckpt next to the source after
each quiescent decoder boundary. A kill -9 mid-run followed by a
re-invocation (with the same -d) converges to the same final
output tree as a clean single run.
Non-destructive mode is one-pass: no .peel.ckpt is
written. A kill mid-run requires re-running from scratch against
the still-intact source.
Format coverage
Every format peel supports works through the local path:
- Streaming shapes (.tar.zst, .tar.xz, .tar.lz4, .tar.gz, raw .zst / .xz / .lz4 / .gz, plain uncompressed .tar) flow through the same single-pass decoder the HTTP path uses.
- Random-access shapes (.zip, .7z, .rar: RAR5 plus legacy RAR3/RAR4) drive their per-format pipelines against the source archive opened read-only and wrapped in a fully-marked chunk bitmap, so the existing orchestrators run unchanged.
Destructive mode (-d) does not apply to the random-access
formats. Their pipelines seek backwards into the archive (zip's
central directory at the tail, 7z's trailer pointer, rar's per-entry
headers), so a monotonically-advancing punch cursor cannot be
maintained. peel warns and proceeds non-destructively when -d is
passed against one of those sources.
Flags rejected in local mode
A few HTTP-only flags are rejected at parse time when peel detects
a local-path positional argument:
- --mirror
- --sha256
- --workers
- --chunk-size
- --no-adaptive-chunk-size
- --max-bandwidth
- --max-disk-buffer
- --http-version
- --no-extract
- --strict-format
If any of those flags are required, the run belongs on the HTTP
path: pass a file:///… URL or upload to localhost.
When to use local mode
The HTTP path uses the same decoders. The choice depends on whether the bytes are already on disk.
- Already on disk: local mode is faster (no syscall overhead from the HTTP client), simpler, and supports destructive hole-punching via -d when disk pressure is the goal.
- Must be downloaded: HTTP mode does it in one pass. A separate download then local-extract pipeline adds a full disk round-trip.
The bench grid in the project README
compares peel against the system tools (tar -I zstd -xf …,
unzip, 7z x, unrar x) for local-file decode and covers the
per-format performance characteristics.
Examples
# Extract a .tar.zst from disk, default output dir is ./dataset/
peel /tmp/dataset.tar.zst
# Extract a .zip with one specific decoder, output to ./out/
peel ./archive.zip --format zip -o ./out/
# Free disk as the extraction proceeds (destructive); fail if the
# decoder gets stuck.
peel -d /var/snapshots/big.tar.xz -o /data/snapshot/
# Keep checkpoint state on fast NVMe, write output to slow HDD.
peel -d /data/big.tar.zst -o /mnt/slow/out/ --workdir /var/cache/peel/
Checkpoint and resume
peel survives any failure short of disk corruption (dropped TCP,
kill -9, OOM kill, pod restart, power loss) and resumes
byte-identical to a clean run. This page describes the on-disk
layout, the write cadence, and how to interpret the sidecar files.
The two sidecar files
When peel extracts <output> from an HTTP source, two sidecar
files appear next to the output during the run:
| File | What it holds |
|---|---|
| <output>.peel.part | The sparse compressed bytes (the part-file). Hole-punched as the decoder advances; physical size is the lookahead window. |
| <output>.peel.ckpt | Frame-aligned decoder state, chunk bitmap, and optional SHA-256 state, written atomically. |
On clean completion, both files are unlinked.
On failure or interruption, both files are left on disk. Re-running the same command picks them up and resumes from the checkpoint.
The --workdir <DIR> flag relocates both files. Their basenames stay
the same (<output_name>.peel.part / <output_name>.peel.ckpt);
only the parent directory changes.
When checkpoints are written
A checkpoint write is triggered when all of these are true:
- --checkpoint-min-bytes bytes of source progress have accumulated since the last checkpoint (default 8 MiB).
- --checkpoint-min-secs seconds have elapsed since the last checkpoint (default 2 s).
- The decoder is at a frame-aligned boundary (per zstd block, per LZMA2 chunk, per deflate block, per tar member, per 7z folder, per RAR entry, per ZIP entry / intra-entry boundary).
The byte floor is scaled up at high download rates so the cadence
stays below --checkpoint-target-secs (default 0.2 s) wall-clock.
Pass --checkpoint-target-secs 0 to disable rate-aware scaling.
The combination keeps checkpoint cadence steady (~5 / sec) on a fast
network without burning CPU on filesystems where fsync is slow,
and falls back to the byte floor on a slow network.
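The three-way trigger condition can be written as a single predicate. This is a simplified sketch under stated assumptions: `CheckpointPolicy` and `should_write` are hypothetical names, and the rate-aware scaling driven by --checkpoint-target-secs is deliberately left out because its exact formula is not described here.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class CheckpointPolicy:
    min_bytes: int = 8 * MIB  # --checkpoint-min-bytes default
    min_secs: float = 2.0     # --checkpoint-min-secs default

    def should_write(self, bytes_since_last: int, secs_since_last: float,
                     at_frame_boundary: bool) -> bool:
        """True only when all three conditions from the list above hold."""
        return (
            bytes_since_last >= self.min_bytes
            and secs_since_last >= self.min_secs
            and at_frame_boundary
        )
```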
How a write is atomic
A checkpoint write is never a partial overwrite:
- Serialise the checkpoint blob to <output>.peel.ckpt.tmp.
- fsync it.
- rename it over <output>.peel.ckpt.
- fsync the parent directory.
A crash during the write loses at most the in-flight checkpoint, not the previous one. The next run reads the previous checkpoint and resumes there.
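The write-fsync-rename-fsync sequence is a general POSIX pattern and can be reproduced outside peel. The sketch below assumes a POSIX system (opening a directory for fsync does not work on Windows); the `atomic_write` helper name is hypothetical.

```python
import os


def atomic_write(path: str, blob: bytes) -> None:
    """Replace `path` with `blob` so readers see the old or the new file, never a mix."""
    tmp = path + ".tmp"
    # Steps 1 and 2: write the new blob to a temporary file and flush it to disk.
    with open(tmp, "wb") as f:
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())
    # Step 3: atomically swap the temporary file into place.
    os.replace(tmp, path)
    # Step 4: fsync the parent directory so the rename itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the process dies before `os.replace`, only the `.tmp` file is affected and the previous checkpoint is still intact.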
What's in the checkpoint
The on-disk format is versioned (current version 7). The blob holds:
- Source identity: Content-Length, ETag, Last-Modified, and the per-mirror metadata. Detects upstream drift (the source changing during a run).
- Chunk bitmap and CRC32C fingerprints: which chunks are complete. The per-chunk fingerprint catches partial writes that were not yet marked.
- Decoder state: per-format frame-aligned snapshot. For zstd, the inter-block state. For xz, the LZMA2 inter-chunk state. For gzip, a 32 KiB sliding-window snapshot plus the running CRC32 / ISIZE. For RAR, the §F1 blob capturing the LZ dictionary state and filter program cache.
- Sink state: per-entry write progress for tar / zip / 7z / rar per-entry sinks.
- Streaming SHA-256 state: if --sha256 is set, the SHA-256 intermediate state words are checkpointed so the resumed digest is byte-identical to sha256sum over the original file.
Resume guarantees
The output is byte-identical to a clean run if and only if:
- The source bytes at the same URL have not changed (ETag / Last-Modified verification catches this).
- The same peel version (or a forward-compatible one) is used to resume.
- The output directory has not been tampered with between runs. peel does not re-verify extracted files on resume; it trusts the checkpoint's record of what was written.
If the source has changed mid-run, peel's per-chunk CRC32C
fingerprints catch the drift: a chunk's fingerprint at re-fetch
time disagrees with what was checkpointed. peel aborts the resume
with a specific "source changed during run" error rather than
silently writing wrong bytes.
If the peel version changed and the checkpoint format is
incompatible, the resume aborts at parse time. Re-run with the same
version, or delete the sidecars (rm <output>.peel.part <output>.peel.ckpt) to start from scratch.
Resuming a run
There is no separate "resume" flag. Re-invoke the same command:
peel https://example.com/dataset.tar.zst -o ./out/
# Ctrl-C / kill -9 / network drop happens at 50% through.
# Sidecars remain on disk.
peel https://example.com/dataset.tar.zst -o ./out/
# Picks up at the last checkpoint, finishes the rest.
For multi-volume or multi-part runs, pass the same URL list / @file
and the same -o. The checkpoint records the assembled source's
identity, so partial progress across multiple URLs is preserved.
Crash-test coverage
The crash-test harness in tests/test_crash_resume.rs runs 100
random kill points per format and asserts that the post-resume
output bytes are byte-identical to a clean run, every time. This
verifies the byte-identical guarantee.
Inspecting a checkpoint
The checkpoint blob is not human-readable. Its presence on disk is inspectable:
$ ls -la ./out.peel.part ./out.peel.ckpt
-rw-r--r-- 1 ag staff 10737418240 May 13 14:22 ./out.peel.part # logical size
-rw-r--r-- 1 ag staff 274432 May 13 14:22 ./out.peel.ckpt
# Physical size: what is actually on disk, after hole-punching
$ du -h ./out.peel.part
123M ./out.peel.part
du -h reports the physical size, which is the in-flight window.
The logical size (ls -la) is the full archive length.
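The same logical-versus-physical comparison can be made programmatically. This sketch assumes a POSIX platform where `st_blocks` counts 512-byte units (true on Linux and macOS); the `sizes` helper is a hypothetical name.

```python
import os


def sizes(path: str):
    """Return (logical_bytes, physical_bytes) for a possibly sparse file."""
    st = os.stat(path)
    logical = st.st_size            # what `ls -la` reports
    physical = st.st_blocks * 512   # what `du` reports, assuming 512-byte blocks
    return logical, physical
```

For a hole-punched part-file the second number is expected to be far smaller than the first.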
RUST_LOG=debug peel … logs checkpoint writes as they happen:
DEBUG checkpoint write: bytes_since_last=8.0MiB seconds_since_last=2.1
DEBUG checkpoint write: bytes_since_last=8.0MiB seconds_since_last=2.0
Tuning checkpoint cadence
The defaults work well across the bench grid. Reasons to tune:
| Goal | Flag | Direction |
|---|---|---|
| Fewer fsyncs on slow disks | --checkpoint-min-bytes | Larger (e.g. 64 MiB) |
| Tighter resume granularity for very long runs | --checkpoint-min-secs | Smaller (e.g. 1 s) |
| Steady cadence under highly variable network | --checkpoint-target-secs | Smaller (e.g. 0.1 s) |
| Disable rate-aware scaling for reproducibility | --checkpoint-target-secs | 0 |
A more aggressive cadence trades extra fsync syscalls for
finer-grained resume: less work lost on a kill -9, more CPU and
IO during normal operation.
When resume can't help
A few scenarios fall outside the byte-identical-resume guarantee:
- The source disappeared between runs. Sidecars stay on disk until removed; the next run fails at the HEAD probe with a clear error.
- The output directory was partially modified by hand. peel does not re-verify already-extracted files. If this is suspected, delete the output and the sidecars and start over.
- The checkpoint format is from an incompatible peel version. Delete the .peel.ckpt to start fresh from the part-file (the part-file's chunks are still individually verifiable via the inline fingerprints), or delete both sidecars to start completely from scratch.
- Non-destructive local extraction. peel ./file.tar.zst (no -d) is a one-pass run with no checkpoint. The source remains intact on kill, so re-run.
Integrity verification
peel provides two integrity mechanisms layered on top of the
streaming pipeline:
- --sha256 <HEX>: end-to-end source verification against the digest that sha256sum produces over the original archive.
- Per-chunk CRC32C fingerprints: automatic drift detection inside the chunk bitmap, catching a source that changes mid-run or on resume.
The second mechanism is enabled by default. The first is opt-in via
the --sha256 flag.
--sha256: end-to-end source verification
Pass the expected digest of the compressed source bytes, the value
sha256sum dataset.tar.zst would print over the original archive
(not over the extracted contents):
peel https://example.com/dataset.tar.zst \
--sha256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad \
-o ./out/
Behaviour:
- peel streams a resumable SHA-256 over the source bytes as they arrive.
- On clean completion, the digest is compared. A mismatch aborts the run with a specific error and exit code 1.
- The hash state is checkpointed alongside everything else, so a resumed run produces a digest byte-identical to sha256sum on the original file.
The digest is 64 hex characters. Mixed case is accepted; whitespace is not.
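The digest rules and the streaming comparison can be illustrated with the standard library. This is a sketch, not peel's implementation: `parse_digest` and `stream_sha256` are hypothetical names, and Python's `hashlib` cannot serialise its intermediate state, so the checkpointing aspect is not shown.

```python
import hashlib
import re

_DIGEST_RE = re.compile(r"[0-9a-fA-F]{64}")


def parse_digest(text: str) -> str:
    """Accept exactly 64 hex characters in any case; reject whitespace and anything else."""
    if not _DIGEST_RE.fullmatch(text):
        raise ValueError("expected 64 hex characters with no whitespace")
    return text.lower()


def stream_sha256(source, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a file-like object incrementally, as bytes arrive."""
    h = hashlib.sha256()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()
```

The incremental digest is the same value sha256sum prints for the whole file, regardless of how the input is split into chunks.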
Relationship to TLS
TLS protects against in-flight tampering. It does not protect against:
- The origin serving a corrupted file.
- A CDN mirror serving a stale or wrong file.
- A --mirror URL pointing at a subtly different object.
- A mid-flight transmission glitch that survives TLS framing (rare but observed).
--sha256 enforces a published-by-the-source contract (the project
declares "this archive's hash is X") end-to-end.
Multi-URL runs
For multi-part URLs and
multi-volume archives, --sha256 is repeatable.
Pass zero flags (no verification) or exactly one --sha256
per URL, paired by order:
peel \
https://host/dataset.tar.part0000 \
https://host/dataset.tar.part0001 \
--sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
--sha256 1bcf4d2e9aa01ff5... \
-o ./out/
Each part's hash is verified at its part-boundary as the decoder
advances. A wrong number of --sha256 flags (1 hash for 2 URLs, 3
hashes for 2 URLs) is rejected at parse time.
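The pairing rule (zero hashes, or exactly one per URL in order) reduces to a short check. The `pair_hashes` helper below is a hypothetical illustration of that rule, not peel's argument parser.

```python
from typing import List, Optional, Tuple


def pair_hashes(urls: List[str], hashes: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each URL with its digest by position, or with None when no hashes are given."""
    if not hashes:
        return [(url, None) for url in urls]  # no verification requested
    if len(hashes) != len(urls):
        raise ValueError(
            f"got {len(hashes)} --sha256 value(s) for {len(urls)} URL(s); "
            "pass exactly one per URL, or none"
        )
    return list(zip(urls, hashes))
```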
Scope and limits
- --sha256 covers the streaming pipeline: anything that goes through the .tar.* / raw codec / .7z path.
- .zip archives extract per-entry, and integrity checking does not extend to the streaming-source path in the current release. Each ZIP entry's own CRC32 (in the central directory) is still verified per-entry.
- For .rar and .7z, the format's per-entry integrity check (RAR's BLAKE2sp / CRC32, 7z's per-substream CRC32) is verified independently and on top of --sha256.
CRC32C fingerprints: automatic drift detection
Every bitmap chunk (default 4 MiB) has a CRC32C fingerprint stored in the checkpoint. Two scenarios where this matters:
Mid-run source drift
If the source changes during a long run (someone re-uploaded the
file, or a CDN edge invalidated and re-pulled a different version),
a worker fetching a later chunk receives bytes that disagree with
those of an earlier worker. The fingerprint comparison catches this
case: peel aborts with a "source changed during run" error rather
than producing wrong output.
Resume after a kill
On kill -9, the part-file may contain bytes for chunks that were
not yet marked complete in the bitmap. On resume, peel:
- Reads the bitmap to find which chunks are complete.
- Re-verifies the fingerprint against the bytes on disk for any chunk near a recent bitmap update.
- Marks the chunk complete if the fingerprint matches, or re-fetches it otherwise.
This procedure makes a kill -9 mid-write safe. Bytes that landed on
disk are reused when correct and refetched when not.
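The resume steps above can be sketched with a pure-Python CRC32C (Castagnoli polynomial, reflected form 0x82F63B78). Everything beyond the checksum itself is an assumption for illustration: the `recheck_chunks` helper, the in-memory `part` buffer, and the premise that a fingerprint is already recorded for each candidate chunk.

```python
def _make_table():
    """Build the byte-wise lookup table for reflected CRC32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def recheck_chunks(part: bytes, chunk_size: int, fingerprints: dict,
                   complete: set, candidates: list) -> list:
    """Re-verify candidate chunks on resume; return the indices that need a re-fetch."""
    refetch = []
    for idx in candidates:
        data = part[idx * chunk_size:(idx + 1) * chunk_size]
        if fingerprints.get(idx) == crc32c(data):
            complete.add(idx)      # bytes on disk are good: reuse them
        else:
            complete.discard(idx)  # partial or wrong bytes: fetch again
            refetch.append(idx)
    return refetch
```

A production implementation would use a hardware-accelerated CRC32C; the table-driven loop here only demonstrates the comparison logic.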
ETag / Last-Modified handling
When --sha256 is not set, peel uses ETag and Last-Modified
as secondary identity signals:
- At startup, the HEAD probe records the ETag and Last-Modified.
- On resume, the ETag and Last-Modified are re-checked. A change indicates the source changed and the resume aborts.
- For multi-mirror runs, mirrors with disagreeing ETags are dropped at startup (unless --sha256 is set, in which case the hash is the source of truth).
Strong ETags are honoured strictly. Weak ETags (W/"…") are treated
as best-effort, since they may legitimately differ across CDN edges
for the same file. A weak-ETag mismatch is logged but does not fail
the run.
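The strong-versus-weak policy can be expressed as a small decision function. The `check_etag` helper and its three return values are hypothetical; only the policy (strict on strong ETags, log-only on weak ones) comes from the text.

```python
def is_weak(etag: str) -> bool:
    """Weak validators carry a W/ prefix, e.g. W/"abc"."""
    return etag.startswith("W/")


def check_etag(recorded: str, current: str) -> str:
    """Return 'match', 'warn' (weak mismatch, logged only), or 'abort' (strong mismatch)."""
    if recorded == current:
        return "match"
    if is_weak(recorded) or is_weak(current):
        return "warn"
    return "abort"
```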
Reading a hash from a file
peel does not provide a --sha256-file flag. Use shell substitution:
# Bash, zsh:
peel "$URL" --sha256 "$(awk '{print $1}' dataset.tar.zst.sha256)" -o ./out/
# With $(< file) command substitution (the file must hold only the digest):
peel "$URL" --sha256 "$(< checksum.txt)" -o ./out/
A future --sha256-file <PATH> flag is under consideration. Shell
substitution is the recommended path in the interim.
Failure modes
| Error message | Cause | Action |
|---|---|---|
| digest mismatch | --sha256 value disagrees with what was streamed | Check the source against the published hash; the file may have been re-uploaded |
| source changed during run | CRC32C fingerprint disagrees between chunks | Re-run; the source is unstable |
| ETag mismatch on resume | The source's ETag changed since the run started | Delete the sidecars and start fresh, or pass --sha256 to trust the hash instead |
| multi-URL sha256 count mismatch | Wrong number of --sha256 for the URL count | Pass exactly one per URL, or none |
Encrypted archives
peel decrypts encrypted ZIP, 7z, and RAR5 archives. It never
encrypts. Re-encrypting an extracted stream to a different password
is out of scope; pipe to 7z or zip for that.
The password is supplied via --password-from <SOURCE> and never
appears on the command line. argv is visible to every process on
the host and is the wrong default for a passphrase.
Supported schemes at a glance
| Format | Scheme | KDF | Authenticated |
|---|---|---|---|
| zip | WinZip-AES (AE-1 / AE-2; AES-128/192/256-CTR) | PBKDF2-HMAC-SHA1, 1000 iterations | HMAC-SHA1-80 trailer |
| zip | PKWARE traditional "ZipCrypto" (CRC32-keyed PRGA) | password-derived 12-byte header | none (CRC32 of plaintext)¹ |
| rar5 | AES-256-CBC, archive-header encryption (type 4) | PBKDF2-HMAC-SHA256, 2^(kdf_count+15) iterations | optional pswcheck |
| rar5 | AES-256-CBC, per-file encryption (extra record 1) | same as above (per-record salt / IV / kdf_count) | optional pswcheck |
| 7z | AES-256-CBC (coder id 06:F1:07:01) | bespoke SHA-256 "round-tower" KDF | none (CRC32 of plaintext) |
¹ ZipCrypto is insecure: published 1994, broken under known-plaintext attack. Supported only for compatibility with archives that already use it.
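For the WinZip-AES row, the key derivation can be reproduced with the standard library. This sketch assumes the key-material layout of the published WinZip AE specification (encryption key, then an authentication key of the same length, then a 2-byte password verifier) and its salt sizes of 8 / 12 / 16 bytes. The `winzip_aes_keys` helper is a hypothetical name and is not peel's code.

```python
import hashlib

# Salt length in bytes per AES strength (assumed from the WinZip AE spec).
SALT_LEN = {128: 8, 192: 12, 256: 16}


def winzip_aes_keys(password: bytes, salt: bytes, key_bits: int):
    """Derive (encryption_key, hmac_key, verifier) with PBKDF2-HMAC-SHA1, 1000 iterations."""
    if key_bits not in SALT_LEN:
        raise ValueError("key_bits must be 128, 192, or 256")
    if len(salt) != SALT_LEN[key_bits]:
        raise ValueError("salt length does not match the AES strength")
    key_len = key_bits // 8
    material = hashlib.pbkdf2_hmac("sha1", password, salt, 1000, dklen=2 * key_len + 2)
    enc_key = material[:key_len]
    mac_key = material[key_len:2 * key_len]
    verifier = material[2 * key_len:]
    return enc_key, mac_key, verifier
```

The 2-byte verifier gives a cheap early wrong-password signal; the HMAC-SHA1-80 trailer remains the actual authentication check.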
Supplying a password
--password-from <SOURCE>
| Source | Use it when | Notes |
|---|---|---|
| prompt | Interactive terminal | Reads /dev/tty directly (so a piped stdin carrying archive data can't accidentally answer). Echo disabled. Up to 3 retries on wrong password. |
| env:NAME | CI / scripted runs | Reads the named environment variable. Strips a trailing newline; empty values are refused. |
| file:PATH | Long-lived credential files | Reads the first line of PATH. Modes other than 0600 emit a one-shot warning. |
| fd:N | Process substitution / pass integration | Reads from file descriptor N (until EOF or newline). peel … --password-from fd:3 3< <(pass …). |
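The env: and file: rows can be sketched as a small resolver. This is an illustration of the rules in the table, not peel's code: `resolve_password` is a hypothetical name, the prompt and fd:N sources are omitted, and the warning text is invented for the example.

```python
import os
import stat
import sys


def resolve_password(source: str, environ=os.environ) -> str:
    """Resolve an env:NAME or file:PATH source following the rules in the table."""
    kind, _, arg = source.partition(":")
    if kind == "env":
        value = environ.get(arg)
        if value is None:
            raise ValueError(f"environment variable {arg} is not set")
        if value.endswith("\n"):
            value = value[:-1]  # strip one trailing newline
        if not value:
            raise ValueError("empty password refused")
        return value
    if kind == "file":
        mode = stat.S_IMODE(os.stat(arg).st_mode)
        if mode != 0o600:
            # Hypothetical warning text; the table only says a warning is emitted.
            print(f"warning: {arg} has mode {mode:04o}, expected 0600", file=sys.stderr)
        with open(arg, encoding="utf-8") as f:
            return f.readline().rstrip("\n")  # first line only
    raise ValueError(f"source kind {kind!r} is not covered by this sketch")
```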
Absence of --password=<value>
Process-list visibility (ps aux, /proc/<pid>/cmdline,
Get-Process -IncludeUserName) is the wrong default for a
passphrase. Every other source above keeps the password out of
argv. For a one-step non-interactive invocation, wrap with
env:NAME:
PEEL_PW="$(cat ~/.peel-passwords/dataset)" \
peel "$URL" --password-from env:PEEL_PW -o ./out/
unset PEEL_PW
Examples
Interactive prompt
peel https://example.com/secret.zip -o ./out/ --password-from prompt
The prompt reads /dev/tty. Three failed attempts trigger exit
code 4.
From an environment variable
PEEL_PW='hunter2' peel "$URL" --password-from env:PEEL_PW -o ./out/
From a file
echo 'hunter2' > /root/.peel-pw
chmod 0600 /root/.peel-pw
peel "$URL" --password-from file:/root/.peel-pw -o ./out/
The 0600 chmod silences the mode warning.
From an fd via process substitution
peel "$URL" --password-from fd:3 3< <(pass show archives/dataset) -o ./out/
Integrate with pass, gopass, 1password-cli, or any other
passphrase manager that writes to stdout by piping its output into
an fd peel reads.
RAR5 specifics
RAR5 has two independent encryption layers. An archive may use either, both, or neither.
Archive-header encryption (HEAD_CRYPT)
When present, every header after HEAD_CRYPT is AES-256-CBC
encrypted under a per-archive key. Each encrypted header is prefixed
by its own 16-byte IV and padded to a 16-byte boundary.
Data areas are not encrypted by this layer. They pass through
cleartext (or under per-file encryption, below). peel's walker
switches into encrypted-header mode after parsing HEAD_CRYPT.
Per-file data encryption (extra record type 1)
Each file header may carry an encryption record with its own salt,
IV, kdf_count, and optional pswcheck. When present, the file's
data area is AES-256-CBC encrypted under a per-file key.
Both layers share a single password (resolved once per archive). The
kdf_count byte is capped at the spec maximum of 24
(= 2^39 iterations) before key derivation runs.
When a checkpoint resumes a partially-extracted run, encrypted entries restart from byte 0 on the in-flight entry. The CBC chain state cannot yet be migrated across a checkpoint snapshot. The sink replays the on-disk prefix to seed its hashes, so the user-visible bytes remain byte-identical to a clean run.
7z specifics
7z has a single encryption shape: an AES-256-CBC coder (id
06:F1:07:01) at the front of a folder's coder chain.
The coder props blob encodes:
- numCyclesPower (low 6 bits of byte 0): the SHA-256 round-tower KDF runs 2^power rounds.
- Optional salt (up to 16 bytes) and IV (up to 16 bytes), present when the high bits of byte 0 are set.
The KDF derives a 32-byte AES-256 key by hashing
salt || password_utf16le || round_counter_le for each of the
2^power rounds. The on-disk IV is zero-padded to 16 bytes if shorter.
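The round-tower construction can be sketched as follows. Two details are assumptions taken from 7-Zip's reference behaviour rather than from the text above: the round counter is an 8-byte little-endian integer, and all rounds feed one running SHA-256 context whose final digest is the key. The helper names are hypothetical.

```python
import hashlib


def sevenzip_kdf(password: str, salt: bytes, power: int) -> bytes:
    """Derive a 32-byte AES-256 key by hashing salt || password || counter for 2^power rounds."""
    pw = password.encode("utf-16-le")
    h = hashlib.sha256()
    for counter in range(1 << power):
        h.update(salt)
        h.update(pw)
        h.update(counter.to_bytes(8, "little"))  # assumed 8-byte LE round counter
    return h.digest()


def pad_iv(iv: bytes) -> bytes:
    """Zero-pad a short on-disk IV to the 16-byte AES block size."""
    if len(iv) > 16:
        raise ValueError("IV longer than 16 bytes")
    return iv.ljust(16, b"\x00")
```

Real archives commonly use a large power, so a pure-Python loop like this is only practical for small test values.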
7z has no in-archive password verifier (unlike RAR5's optional
pswcheck or ZIP-AES's 2-byte PBKDF2 verifier). The first
correctness signal is the per-substream CRC32 inside the decoded
plaintext. Under a wrong password the plaintext is random and the
CRC32 mismatches with overwhelming probability. peel translates
that into EncryptionError::PasswordIncorrect when it knows the
folder is encrypted.
All folders in an archive share one password (loaded lazily on the first encrypted folder), matching 7-Zip's own behaviour. Resume restarts the in-flight folder from byte 0, the same constraint as RAR5's per-file encryption, for the same reason (CBC chain state).
Exit code 4
Password-related failures use a dedicated exit code so scripts can distinguish them from generic extraction failures:
- 0: extraction completed.
- 1: generic extraction or I/O failure.
- 4: PasswordIncorrect or PasswordMissing anywhere in the error chain.
- 128 + signum: graceful shutdown after SIGINT (130) or SIGTERM (143). The .peel.part / .peel.ckpt sidecars are left on disk; re-running resumes.
A retry loop on wrong password looks like:
while true; do
peel "$URL" --password-from prompt -o ./out/ && break
rc=$?
if [ "$rc" != "4" ]; then
echo "peel failed with code $rc (not a password issue)"; exit "$rc"
fi
echo "wrong password, retry"
done
Threat model
peel decrypts. It does not authenticate the user. It has no
support for hardware tokens, smart cards, GPG-encrypted passphrases,
or biometric unlock. The user supplies a passphrase via one of the
--password-from sources; everything beyond that is the operating
system's responsibility.
peel does not protect against an attacker with:
- Read access to the process's address space (/proc/<pid>/mem on Linux, vmmap on macOS). The internal Password wrapper zeroises its backing storage on drop, but a snapshot taken during key derivation will see the cleartext.
- Read access to the swap device. If the machine swaps mid-extraction the passphrase may be written to disk. Disable swap for the workload if this matters.
- Read access to argv, which is precisely why --password=<value> does not exist.
- Precise micro-architectural timing side-channels (Spectre-class, cache-timing on a co-located VM). Tag comparisons go through a length-stable ct_eq function, but the underlying AES / HMAC primitives are not cycle-constant.
peel does apply the following discipline:
- Every cryptographic primitive ships with a differential test suite cross-checking against a reference upstream crate (sha1, hmac, pbkdf2, sha2, aes, ctr, cbc). The runtime binary links none of these.
- Tag and verifier comparisons route through crypto::ct_eq.
- Password bytes never travel through any code path that prints Debug. The Password type explicitly redacts.
- All KDF iteration counts come from the archive's header. peel does not guess a sensible default.
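The idea behind a comparison like ct_eq is to examine every byte and accumulate the differences, rather than returning at the first mismatch. The sketch below shows only that shape; it is not peel's crypto::ct_eq, and Python itself gives no constant-time guarantee, so `hmac.compare_digest` is the right tool in real Python code.

```python
def ct_eq(a: bytes, b: bytes) -> bool:
    """Compare two byte strings without short-circuiting on the first differing byte."""
    if len(a) != len(b):
        return False  # length is treated as public information
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # stays 0 only if every byte pair is equal
    return diff == 0
```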
Out of scope, permanently
- Re-encrypting on the fly. peel never encrypts. "Decrypt this remote archive and re-encrypt to a different password" is not a peel job.
- Password-protected gzip / xz / lz4 / zstd. None of these formats has a native encryption layer; the convention "GPG-encrypted tarball" is a separate pipeline.
- ZIP central-directory encryption (PKWARE strong-encryption spec, general-purpose flag bit 6). Used by approximately one product outside enterprise contexts. Surfaces as unsupported feature: PKWARE strong encryption.
- Hardware-accelerated AES (AES-NI). Software AES first; a runtime-probed AES-NI path may land later behind a feature flag.
Verifying the primitives
Every primitive ships with a differential test suite that runs
≥ 1000 random inputs through both peel's implementation and the
upstream reference crate, asserting byte-identical output. The corpus
also includes known-answer vectors from the format specs themselves
(FIPS 197 for AES, RFC 3174 for SHA-1, NIST SP 800-132 for PBKDF2).
cargo test --test test_crypto_diff
The reference crates pinned in [dev-dependencies] are sha1,
hmac, pbkdf2, aes, ctr, cbc, sha2, blake2. The runtime
binary links none of these.
Performance and tuning
peel's defaults target a laptop-class machine on a healthy network and
land within ~6% of the system tools across the
bench grid in the project README.
Outside that envelope (extremely high bandwidth, severe memory pressure,
picky filesystems, locked-down kernels), the following knobs matter.
File-IO backend
--io-backend <auto|blocking|uring|mmap>
The part-file is written from many workers concurrently, then read linearly by the decoder, then hole-punched. Three backends implement this differently:
| Backend | Workers write via | Puncher | Sockets |
|---|---|---|---|
| blocking | pwrite(2) | fallocate(PUNCH_HOLE) / F_PUNCHHOLE | Blocking BSD sockets |
| mmap (Linux only) | memcpy into MAP_SHARED | madvise(MADV_REMOVE) | Blocking BSD sockets |
| uring (Linux only) | pwrite SQE on the ring | fallocate(PUNCH_HOLE) SQE | TCP connect / send / recv on the ring |
| auto (default) | Probes each at startup | | |
What auto picks
On Linux, auto selects:
- mmap for the part-file if the filesystem supports MADV_REMOVE (probed at startup with a small test mapping). All major Linux filesystems do. The probe fails on some unusual mounts, in which case peel falls back cleanly.
- io_uring for the HTTP client's sockets if io_uring_setup succeeds. Falls back to blocking sockets with one info! log if the kernel rejects ring construction (kernel < 5.6, seccomp blocking such as cri-o's default profile under Kubernetes, or RLIMIT_MEMLOCK too low).
On non-Linux platforms (macOS, BSD), auto selects the blocking
backend for both sockets and file IO. No io_uring equivalent exists.
Mmap with hole-punching works but does not beat the blocking path by
enough to default to it.
When to override
- A/B benchmarking. --io-backend blocking forces the pre-io_uring path everywhere. Useful for measuring the speedup the fast paths contribute on a given network.
- Hard requirement on io_uring. --io-backend uring errors out if the kernel cannot construct a ring. Suitable for CI verification that the fast path is actually in use.
- Hard requirement on mmap. --io-backend mmap selects the mmap part-file path explicitly with the blocking socket backend. Same memcpy-into-the-mapping shape, but no io_uring for sockets.
Confirming what got selected
RUST_LOG=info peel <URL> -o ./out/ 2>&1 | grep 'selected'
# selected file IO backend = mmap
# selected socket backend = io_uring
HTTP version
--http-version <auto|h1|h2>
| Value | Behaviour |
|---|---|
| auto (default) | ALPN-negotiate H1 / H2 over TLS; H1 over plaintext |
| h1 | Force HTTP/1.1 |
| h2 | Force HTTP/2 (prior-knowledge h2c over plaintext) |
When to override
- Suspected H2 misbehaviour. Some origins (and some middleboxes, typically corporate proxies) handle ranged GETs better on H1 than on H2. Force --http-version h1 to test.
- Origin only speaks h2c. Plaintext H2 does not ALPN-negotiate and must be selected explicitly.
- Origin negotiation fails. TLS handshakes that succeed but negotiate something peel cannot use surface as a clear error with the negotiated protocol name.
Bandwidth and disk caps
--max-bandwidth <RATE>
Aggregate token-bucket cap across all workers and mirrors.
Accepts decimal (K, M, G, T, 1000-based) or binary (Ki, Mi,
Gi, Ti) suffixes. Trailing B and /s are accepted and ignored.
peel <URL> --max-bandwidth 50MB/s -o ./out/ # 50 megabytes/s, decimal
peel <URL> --max-bandwidth 512MiB/s -o ./out/ # 512 mebibytes/s, binary
peel <URL> --max-bandwidth 1000000 -o ./out/ # 1 million bytes/s
The cap is aggregate, not per-mirror: --max-bandwidth 50MB/s
with three mirrors caps the total at 50 MB/s, not 150 MB/s.
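The suffix rules can be illustrated with a small parser. This is a sketch of the grammar as described, not peel's parser: `parse_rate` is a hypothetical name, and restricting the number to a whole integer and the suffixes to exact case are assumptions.

```python
import re

_UNITS = {
    "": 1,
    "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,      # decimal
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,  # binary
}
# Binary suffixes are listed first so "Mi" is not read as "M" followed by junk.
_RATE_RE = re.compile(r"(\d+)(Ki|Mi|Gi|Ti|K|M|G|T)?B?(/s)?")


def parse_rate(text: str) -> int:
    """Convert a rate such as '50MB/s' or '512MiB/s' to bytes per second."""
    m = _RATE_RE.fullmatch(text)
    if not m:
        raise ValueError(f"unrecognised rate: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2) or ""]
```

The trailing B and /s are matched and then ignored, which is why `50M`, `50MB`, and `50MB/s` all yield the same value.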
When to use it:
- Polite scraping of a public mirror.
- Co-tenant workloads where peel must not saturate the pipe.
- Reproducible benchmarks needing a deterministic wire-time floor.
--max-disk-buffer <SIZE>
Cap on the on-disk lookahead: bytes downloaded but not yet consumed by the decoder. When the gap reaches this value, the scheduler stops dispatching new chunks until the decoder catches up.
peel <URL> --max-disk-buffer 256MiB -o ./out/ # tighter cap for disk-constrained env
peel <URL> --max-disk-buffer none -o ./out/ # disable
Default 1GiB. The default rarely engages on a healthy disk and
bounds the part-file's physical size on a slow one.
When to lower it:
- Containers with a hard ephemeral-disk quota (Kubernetes pods with small emptyDir, CI runners with capped tmpfs).
- A network much faster than the disk (10 Gbps NIC and a spinning disk output target) where the part-file must not balloon before the decoder catches up.
When to raise it (or disable):
- Decoder is the bottleneck and network bursts should be absorbed fully into the buffer.
- Very fast disk with a slow or bursty network where pre-buffering wins.
Worker count
--workers <N>
Default 4. The scheduler will not dispatch more than this many
concurrent ranged GETs in total, across the primary and all mirrors.
Tuning matrix:
| Symptom | Direction |
|---|---|
| Wire under-utilised, origin far away (high RTT) | Raise: more workers in flight overlap the RTT |
| Origin returns 429 / 503 under load | Lower: back off the per-origin parallelism |
| Per-worker throughput collapses with more workers | Lower: local CPU or NIC is the bottleneck |
| Memory pressure from many in-flight buffers | Lower: each worker holds its in-flight chunk |
The default 4 suits a laptop-class machine on a healthy network. On
a high-spec server pulling from a far CDN, 8–16 is often faster. On a
constrained client, 2 can win.
Chunk-size tuning
--chunk-size <BYTES> + --no-adaptive-chunk-size
The bitmap chunk size is the unit of completion tracked in checkpoints (default 4 MiB). It is also the smallest possible ranged GET.
With adaptive sizing (the default), the scheduler watches per-GET latency and retry rate and may coalesce several consecutive bitmap chunks into a single ranged GET:
- 1 MiB floor, 64 MiB cap.
- 30 s hysteresis: the scheduler waits before reacting to transient changes.
- Bitmap unit and dispatch unit are decoupled. Checkpoints stay fine-grained while the wire-level request size scales with the network.
Pass --no-adaptive-chunk-size to lock dispatch to the bitmap unit.
The scheduler then dispatches exactly one bitmap chunk per worker
task, with no growth or shrink decisions over the lifetime of the
run. Useful for benchmarking and reproducible test runs.
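The relationship between the bitmap unit and the dispatch unit can be checked with simple arithmetic. The sketch below is a back-of-envelope helper, not part of peel; the function names `bitmap_chunks` and `chunks_per_get` are hypothetical, and the sizes are the defaults quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of peel): back-of-envelope chunk arithmetic.
MIB=$((1024 * 1024))

# Number of bitmap chunks needed to cover a file (ceiling division).
bitmap_chunks() {
  local size_bytes=$1 chunk_bytes=$2
  echo $(( (size_bytes + chunk_bytes - 1) / chunk_bytes ))
}

# How many consecutive bitmap chunks fit into one dispatch-sized ranged GET.
chunks_per_get() {
  local dispatch_bytes=$1 chunk_bytes=$2
  echo $(( dispatch_bytes / chunk_bytes ))
}

bitmap_chunks $((40 * 1024 * MIB)) $((4 * MIB))   # 40 GiB archive, 4 MiB chunks: prints 10240
chunks_per_get $((64 * MIB)) $((4 * MIB))         # 64 MiB cap over 4 MiB chunks: prints 16
```

With the default 4 MiB bitmap unit, a 40 GiB archive is tracked as 10240 chunks, and a request at the 64 MiB cap covers 16 of them.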
Puncher and checkpoint cadence
--punch-threshold <BYTES>
Minimum gap between in-loop hole-punch syscalls (default 4 MiB).
- Smaller: tighter physical-disk footprint, more syscalls.
- Larger: fewer syscalls, larger transient physical footprint.
Tune downward for a hard ceiling on physical disk usage. Tune upward if the filesystem's punch-hole implementation is slow (some network-attached storage backends have noticeable per-punch cost).
--checkpoint-min-bytes / --checkpoint-min-secs / --checkpoint-target-secs
See Checkpoint and resume for the full discussion of these.
Defaults: 8 MiB, 2 s, 0.2 s target. Raise the min-bytes floor when
the filesystem has slow fsync and it dominates wall-clock. Lower it
for tighter resume granularity on very long runs.
Workdir placement
--workdir <DIR>
Place the .peel.part and .peel.ckpt sidecars in a separate
directory from the output.
Use cases:
- Slow output disk, fast scratch disk. Extract onto slow HDD-backed /data, keep the in-flight part-file on fast NVMe at /var/cache/peel:

  peel <URL> -o /data/out/ --workdir /var/cache/peel/

- Persistent Kubernetes PVC, ephemeral container scratch. Output goes onto the PVC mount, sidecars onto the container's ephemeral scratch. Resume across pod restarts uses the PVC's data and the ephemeral checkpoint as a fast-path optimisation. After a pod restart, delete the ephemeral checkpoint and peel re-derives state from the part-file's bytes.

- Read-only output filesystem. Output is a read-only mount where only the extracted contents are needed. Sidecars go to a writable scratch dir.

The directory is created if missing. Basenames stay the same
(<output_name>.peel.part, <output_name>.peel.ckpt). Only the
parent directory changes.
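To predict where the sidecars will land before a run, the naming rule can be sketched in shell. This helper is hypothetical and not part of peel. It assumes <output_name> is the last path component of -o with any trailing slash removed, which matches the examples in this guide but is an inference, not a documented guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical helper: predict sidecar paths for a given -o and --workdir.
# Assumption: <output_name> is the final component of the -o path.
sidecar_paths() {
  local out=$1 workdir=${2:-}
  local name parent
  name=$(basename "$out")          # basename ignores a trailing slash
  if [ -n "$workdir" ]; then
    parent=${workdir%/}            # --workdir replaces the parent directory
  else
    parent=$(dirname "$out")       # default: next to the output
  fi
  printf '%s/%s.peel.part\n%s/%s.peel.ckpt\n' "$parent" "$name" "$parent" "$name"
}

sidecar_paths /data/out/ /var/cache/peel/
```

For the slow-disk example above, this prints /var/cache/peel/out.peel.part and /var/cache/peel/out.peel.ckpt.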
Progress and logging
peel emits a live three-line block on a TTY:
download: 412.3 MiB / 1.2 GiB (33.7%) @ 187 MB/s (4 workers, 312 MiB on disk)
extract : 387.1 MiB / 1.2 GiB (31.8%) @ 178 MB/s
eta : 4.6s
On a non-TTY (CI logs, redirected output), the progress UI falls
back to periodic tracing::info! lines. No extra flag is needed.
RUST_LOG=<level> controls verbosity:
- RUST_LOG=warn: only warnings and errors. The default when RUST_LOG is unset is info.
- RUST_LOG=info: startup banners (selected backend, discovered volumes, mirror probes, checkpoint cadence summaries).
- RUST_LOG=debug: per-chunk dispatch, per-checkpoint writes, per-mirror selection decisions.
RUST_LOG=debug peel <URL> -o ./out/ 2>peel-debug.log
A typical tuning workflow
- Run with defaults. Inspect the progress UI's download rate and "on disk" footprint.
- If the download rate is far below what the network should support, raise --workers (try 8, then 16).
- If the rate is fine but the physical disk footprint is uncomfortable, lower --max-disk-buffer and --punch-threshold.
- If fsync dominates wall-clock on a slow disk, raise --checkpoint-min-bytes (try 64 MiB).
- Fallback warnings from --io-backend auto are expected on most non-Linux or restricted-kernel hosts. Verify with RUST_LOG=info that the blocking backend is selected, then check throughput before concluding the fallback matters.
Kubernetes init container
A Kubernetes init container that hydrates a PersistentVolumeClaim from
a remote archive is a primary peel use case. The PVC is sized for
the extracted contents plus a small download window, not for
compressed + extracted. Resume across pod restarts is automatic.
The minimal Pod spec
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  volumes:
    - name: model
      persistentVolumeClaim:
        claimName: model-pvc
  initContainers:
    - name: hydrate
      image: ghcr.io/agouin/peel:latest
      args:
        - https://models.example.com/llama-3.tar.zst
        - --sha256
        - ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        - --max-bandwidth
        - 200MB/s
        - -o
        - /model/
      volumeMounts:
        - name: model
          mountPath: /model
  containers:
    - name: app
      image: ghcr.io/example/model-server
      volumeMounts:
        - name: model
          mountPath: /model
          readOnly: true
Properties of this configuration:
- PVC sizing: extracted_size + ~300 MB, not archive_size + extracted_size. A 40 GiB extracted model fits on a 41 GiB PVC.
- Resume across pod restarts: if the init container is OOM-killed or the node reboots mid-extraction, the next pod restart picks up at the last checkpoint. The sidecars (.peel.part, .peel.ckpt) live on the PVC, so they survive the restart.
- Integrity: --sha256 verifies the source end-to-end. A corrupted upstream produces a clear failure rather than a silently-bad model.
- Bandwidth limiting: --max-bandwidth 200MB/s prevents the hydration from saturating the shared cluster network.
Sizing the PVC
Roughly:
PVC size = extracted_size # the model / dataset
+ --max-disk-buffer (default 1G) # in-flight window
+ ~100 MiB # checkpoint + filesystem overhead
+ some slack # for the workload to grow
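The formula above can be turned into a small calculator. This is a minimal sketch with a hypothetical function name (`pvc_size_mib`); all values are in MiB, and the slack term is left to the caller because it depends on the workload.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the PVC sizing formula. All values are in MiB.
pvc_size_mib() {
  local extracted=$1
  local buffer=${2:-1024}    # --max-disk-buffer (default 1 GiB)
  local overhead=${3:-100}   # checkpoint + filesystem overhead
  local slack=${4:-0}        # headroom for the workload to grow
  echo $(( extracted + buffer + overhead + slack ))
}

pvc_size_mib $((40 * 1024))             # 40 GiB extracted, defaults: prints 42084
pvc_size_mib $((40 * 1024)) 256 100 0   # with --max-disk-buffer 256MiB: prints 41316
```

Round the result up to the next size the storage class can provision.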
If disk is tight, lower --max-disk-buffer:
args:
  - https://models.example.com/llama-3.tar.zst
  - --max-disk-buffer
  - 256MiB
  - -o
  - /model/
This tightens the lookahead cap to 256 MiB. The scheduler back-pressures the download briefly when the network outruns the decoder. There is no correctness penalty.
Sidecars on ephemeral scratch
To keep per-pod sidecars off a shared PVC, place them on the container's writable scratch layer:
args:
  - https://models.example.com/llama-3.tar.zst
  - --workdir
  - /tmp/peel-state
  - -o
  - /model/
volumeMounts:
  - name: model
    mountPath: /model
Tradeoff: the sidecars do not survive a pod restart, so resume is lost. The next pod restart re-fetches from scratch. This is acceptable for short-running hydration and unsuitable for large archives over flaky networks.
RBAC / network policy
peel makes outbound HTTP/HTTPS to the supplied URLs. It does
not talk to the Kubernetes API. The cluster's egress policy must
allow the origin host(s). The peel container itself requires no
elevated permissions: it runs as a normal user and requires no
CAP_* capabilities.
Multi-mirror in-cluster
For intra-cluster mirrors of the same archive (e.g. a MinIO bucket
inside the cluster plus a public origin outside), use
--mirror:
args:
  - https://internal-cache.svc.cluster.local/llama-3.tar.zst
  - --mirror
  - https://models.example.com/llama-3.tar.zst
  - --sha256
  - ba7816bf...
  - -o
  - /model/
peel prefers the internal mirror (faster, no egress cost) and
falls back to the public origin only if the internal one fails.
io_uring inside the pod
By default on Linux 5.6+, peel uses io_uring for sockets.
cri-o's default seccomp profile blocks io_uring_* syscalls, so
in practice peel logs one fallback warning at startup and continues
with the blocking backend. Two options exist for enabling io_uring:
1. Loosen the seccomp profile (requires securityContext.seccompProfile.type: Unconfined or a custom profile that allows io_uring_*).
2. Accept the fallback. peel operates correctly without io_uring, with reduced throughput on high-bandwidth links.
Most clusters take option 2. Revisit if the workload is bandwidth-bound.
A complete example with secrets
apiVersion: v1
kind: Secret
metadata:
  name: archive-password
type: Opaque
stringData:
  password: my-archive-password
---
apiVersion: v1
kind: Pod
metadata:
  name: hydrated-app
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc
  initContainers:
    - name: hydrate
      image: ghcr.io/agouin/peel:latest
      env:
        - name: PEEL_PW
          valueFrom:
            secretKeyRef:
              name: archive-password
              key: password
      args:
        - https://example.com/secret.tar.zst
        - --password-from
        - env:PEEL_PW
        - --sha256
        - ba7816bf...
        - -o
        - /data/
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: app
      image: ghcr.io/example/app
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: true
The password is mounted as an env var via the standard Secret
mechanism. peel reads it via --password-from env:PEEL_PW. The
secret never appears on the command line.
Comparison with curl + tar -x
The naive shape, which has the problems described below:
# NOT recommended
- name: hydrate
  image: alpine
  command: [sh, -c]
  args:
    - |
      apk add curl tar zstd
      curl -fL "$URL" -o /tmp/data.tar.zst
      tar -I zstd -xf /tmp/data.tar.zst -C /data/
      rm /tmp/data.tar.zst
Problems:
- PVC size: peak disk = archive_size + extracted_size. A 40 GiB extracted model needs an 80+ GiB PVC.
- No resume: OOM-kill mid-download restarts from byte 0.
- No integrity: curl does not verify a hash. Layering sha256sum on afterwards is a separate step.
- Single TCP stream: parallel ranged GETs are faster on high-RTT origins.
- Image bulk: apk add pulls packages every pod restart.
A single peel invocation addresses all of these.
CI runner with flaky network
CI runners typically have small ephemeral disks, flaky outbound network (especially in self-hosted runners behind corporate proxies), and a strong preference for fail-fast. A job that silently produces a different artifact is worse than a job that errors clearly.
peel addresses all three:
- Bounded compressed-side disk usage via the sliding lookahead window.
- Resume on transient network failure without losing the partial download.
- --strict-format and --sha256 turn upstream drift into a clear exit code 1 rather than a degraded build.
GitHub Actions example
name: ml-test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install peel
        run: cargo install peel-rs --locked
      - name: Hydrate model fixtures
        run: |
          peel \
            https://fixtures.example.com/models-v3.tar.zst \
            --sha256 ${{ vars.MODELS_SHA256 }} \
            --strict-format \
            --max-disk-buffer 512MiB \
            -o ./fixtures/
      - name: Run tests
        run: cargo test --release
Flag behavior:
- --sha256 ${{ vars.MODELS_SHA256 }}: the expected hash is in the repo's Actions variables, so a wrong-fixture upload fails CI immediately. The hash is stored with the repository's settings rather than in the workflow file. It ratchets forward as the team uploads new fixtures and updates the variable.
- --strict-format: if the upstream URL ever serves a different shape (e.g. a 404 HTML page with a 200 status code from a misbehaving proxy), the run fails clearly instead of producing a corrupt fixtures directory.
- --max-disk-buffer 512MiB: GitHub-hosted runners have ~14 GB free. Capping the lookahead avoids transient disk pressure during hydration.
GitLab CI example
test:
  stage: test
  image: rust:1.93
  before_script:
    - cargo install peel-rs --locked
  script:
    - >
      peel
      "$FIXTURE_URL"
      --sha256 "$FIXTURE_SHA256"
      --strict-format
      --max-disk-buffer 256MiB
      -o ./fixtures/
    - cargo test --release
  variables:
    FIXTURE_URL: https://fixtures.example.com/models-v3.tar.zst
    FIXTURE_SHA256: ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  cache:
    paths:
      - fixtures.peel.part
      - fixtures.peel.ckpt
The cache directive retains fixtures.peel.part and
fixtures.peel.ckpt between runs. If a previous run was interrupted
partway through (timeout, runner restart), the next run resumes from
the checkpoint, saving network bandwidth and wall-clock on every retry.
Self-hosted runner behind a corporate proxy
A frequent CI failure mode is a self-hosted runner behind an HTTPS
proxy that does TLS termination with its own CA. peel honours
SSL_CERT_FILE:
- name: Hydrate fixtures
  env:
    HTTPS_PROXY: https://proxy.corp.example.com:8443
    SSL_CERT_FILE: /etc/ssl/certs/corp-bundle.pem
  run: peel "$FIXTURE_URL" --sha256 "$FIXTURE_SHA256" -o ./fixtures/
If the proxy mangles HTTP/2 (the most common cause of intermittent hydration failures on locked-down corporate networks), force HTTP/1.1:
run: peel "$FIXTURE_URL" --http-version h1 -o ./fixtures/
Caching the extracted output
If the CI cache supports it, cache the extracted output directly rather than only the sidecars:
- uses: actions/cache@v4
  id: cache   # required so steps.cache.outputs.cache-hit resolves below
  with:
    path: ./fixtures/
    key: fixtures-${{ vars.MODELS_SHA256 }}
- if: steps.cache.outputs.cache-hit != 'true'
  run: peel "$URL" --sha256 "$SHA256" -o ./fixtures/
Hydration runs only when the cache misses. The cache key includes the SHA-256, so an updated fixture set automatically invalidates the cache.
Failing the build on upstream drift
The combination of --sha256 + --strict-format is the strongest
guarantee:
| Failure | --sha256 catches? | --strict-format catches? |
|---|---|---|
| Upstream re-uploaded a corrupted file | ✓ | |
| Upstream serves a 200 status on a 404 HTML body | ✓ | |
| Upstream changed the format (.tar.zst → .tar.gz) | ✓ | |
| Upstream re-uploaded a legitimately-different file | ✓ | |
| Mirror is serving stale content | ✓ | |
Use both in CI. Omit them only when downloading a non-deterministic resource by intent.
Comparison with actions/cache
If the CI has a well-managed artifact cache (sized, verified,
mirrored), and the archive is small enough that download time is not
a concern, actions/cache (or actions/cache/restore, or the CI's
equivalent) is simpler. peel is preferable when:
- The archive is large enough that hydration time matters.
- End-to-end verification of the source is required, not just the cache.
- The CI's cache TTL is shorter than the fixture's lifetime, so cache misses force a re-hydration where bounded disk and resume matter.
- Integration is from outside the CI (e.g. a pre-job step in a test orchestrator that lacks CI-native caching).
Exit code handling
CI scripts want to distinguish "fixture hydration failed transiently" from "fixture is wrong":
#!/usr/bin/env bash
set -u
peel "$URL" --sha256 "$SHA256" --strict-format -o ./fixtures/
rc=$?
case "$rc" in
  0) echo "fixtures ready"; exit 0 ;;
  1)
    # Generic failure: could be transient network, disk full, hash mismatch.
    # Check stderr to distinguish. For CI, retry once.
    echo "first attempt failed; sleeping 10s then retry"
    sleep 10
    peel "$URL" --sha256 "$SHA256" --strict-format -o ./fixtures/
    ;;
  *)
    echo "peel failed with $rc; not retrying"
    exit "$rc"
    ;;
esac
See Exit codes for the full list.
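For more than one retry, the same pattern generalises into a small wrapper. The sketch below is an assumption-laden example, not a peel feature: the function name `retry_on_generic_failure`, the attempt count, and the doubling delay are all choices made for illustration. It retries only while the command exits with code 1 and passes any other exit code through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: retry a command while it exits with code 1, up to a
# fixed number of attempts, doubling the delay between attempts.
retry_on_generic_failure() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1 rc=0
  while :; do
    rc=0
    "$@" || rc=$?
    # Stop on success, on a non-generic exit code, or when attempts run out.
    if [ "$rc" -ne 1 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    echo "attempt $attempt failed with $rc; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage (commented out so the sketch has no side effects):
# retry_on_generic_failure 3 10 peel "$URL" --sha256 "$SHA256" --strict-format -o ./fixtures/
```

Because a hash mismatch also surfaces as a generic failure, keep the attempt count small so a genuinely wrong fixture still fails the job quickly.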
Arbitrum snapshot bundle
Arbitrum publishes Nitro chain snapshots as multi-part archives:
pruned.tar.part0000 through pruned.tar.partNNNN, each ~5–15 GiB,
totalling 200–500 GiB depending on chain and pruning mode. Per-part
SHA-256s are published in a manifest.
This workload uses peel's multi-part URL
path. The bundled scripts/arb-snapshot.sh in the repo wraps it.
The manual version
peel \
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0000 \
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0001 \
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0002 \
... \
--sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
--sha256 1bcf4d2e9aa01ff5e8aa72a2ab39310af020bdb6f76d6f7c75c7c14ade38c6ce \
--sha256 c40bf8a2cb9d9a90e4c80a5b7c6e9c5d3b8a2e1f9d4a6c1b7e2f8d3a5c0b9e1f \
... \
-o ./nova-out/
The byte-concatenation of every URL's body is decoded as one
logical pruned.tar, written into ./nova-out/. Per-part hashes
are verified at each part boundary as the decoder advances.
Via a manifest file
For chains with dozens of parts, a per-line manifest is cleaner:
# nova-volumes.txt
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0000
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0001
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0002
# ... etc
peel @nova-volumes.txt -o ./nova-out/
For hashes, generate a --sha256 arg list from the published
manifest:
peel @nova-volumes.txt \
$(jq -r '.parts[] | "--sha256 \(.sha256)"' nova-manifest.json) \
-o ./nova-out/
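When jq is unavailable, or the hashes arrive as a plain text file, a bash array keeps each digest as its own argument. This is a minimal sketch: the function name `build_sha256_args`, the array name `SHA256_ARGS`, and the one-digest-per-line file layout are assumptions for illustration, not a format peel defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a file with one hex digest per line into a bash
# array of repeated --sha256 arguments.
build_sha256_args() {
  local hashes_file=$1 line
  SHA256_ARGS=()
  # The "|| [ -n ... ]" clause keeps a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    SHA256_ARGS+=(--sha256 "$line")
  done < "$hashes_file"
}

# Usage (commented out; nova-hashes.txt is a placeholder file name):
# build_sha256_args nova-hashes.txt
# peel @nova-volumes.txt "${SHA256_ARGS[@]}" -o ./nova-out/
```

The digests must appear in the same order as the parts in the URL list, since each --sha256 pairs with one part.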
Disk math
A typical Nitro Nova snapshot:
- Total compressed (sum of all parts): ~120 GiB
- Extracted: ~340 GiB
With peel, peak disk = extracted_size + lookahead_window ≈
~341 GiB. The --max-disk-buffer default (1 GiB) bounds the
compressed-side window.
Without peel (download-all-then-extract):
peak disk = compressed_size + extracted_size ≈ ~460 GiB.
This is a 120 GiB savings on a single node. For a fleet, the multiplier matters. For a one-node bootstrapping flow on tight disk, it is the difference between "works" and "does not fit."
On Kubernetes
Snapshot hydration matches the
Kubernetes init container workflow. The PVC sizes
to ~extracted_size + 1 GiB instead of ~compressed + extracted,
and a pod restart mid-hydration resumes at the last checkpoint:
initContainers:
  - name: hydrate-nova
    image: ghcr.io/agouin/peel:latest
    args:
      # Quoted because a YAML plain scalar cannot start with "@".
      - "@/manifest/nova-volumes.txt"
      - --sha256-from-file=/manifest/nova-hashes.txt # (planned)
      - --max-bandwidth
      - 500MB/s
      - -o
      - /chain/
    volumeMounts:
      - name: chain
        mountPath: /chain
      - name: manifest
        mountPath: /manifest
        readOnly: true
(--sha256-from-file is on the roadmap; for now, expand inline via
shell substitution.)
Bandwidth limiting
Arbitrum snapshot mirrors are CloudFront-fronted with generous burst
allowances, but a fleet of nodes hydrating simultaneously will hit
rate-limits. The default --workers 4 is conservative. Raise to
--workers 8 on a fat pipe when needed. Add
--max-bandwidth 500MB/s to bound aggregate throughput.
Recovery from kill -9
Snapshot hydration is a long-running, interruptible workload. Power loss, OOM, scheduler eviction, and upstream rate-limit-induced retries are all normal.
In every case, re-run the same command:
peel @nova-volumes.txt --sha256 ... -o ./nova-out/
peel reads the .peel.ckpt next to ./nova-out, picks up at the
checkpointed part and byte, and continues. Bytes already extracted to
./nova-out/ are kept, not re-written. Final output is byte-identical
to a clean single run.
See also
- The shipped wrapper script: scripts/arb-snapshot.sh.
- Multi-part URLs: the full feature reference.
- Integrity verification: how per-part --sha256 is verified.
- Checkpoint and resume: what the .peel.ckpt sidecar captures across the part boundary.
Bare downloader (aria2c replacement)
peel --no-extract is a parallel-ranged-GET downloader with mirror
fan-out, SHA-256 verification, and resume. It covers the same surface
as aria2c, minus the extract step.
The basic case
peel https://example.com/big-file.iso --no-extract
Behavior:
- Issues 4 parallel ranged GETs against the URL.
- Writes the bytes to <basename>.peel.part (sparse).
- On clean completion, renames to the final filename (big-file.iso).
The bytes never pass through a decoder. No hole-punching occurs,
since no decoder advances the puncher. The part-file grows to the
full Content-Length.
--download-only is an alias for callers who prefer aria2c-style
naming.
With explicit output path
peel https://example.com/big-file.iso --no-extract -o /downloads/
peel https://example.com/big-file.iso --no-extract -o /downloads/renamed.iso
Same semantics as extract mode: a trailing slash makes -o a
directory, otherwise it is the final file path.
With hash verification
peel https://example.com/big-file.iso \
--no-extract \
--sha256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
The SHA-256 is checked against the downloaded bytes. This is the
same hash that sha256sum big-file.iso would produce after the
download finishes, without the separate hash step.
With mirror fan-out
peel https://primary.example.com/big-file.iso \
--mirror https://eu.mirror.example.com/big-file.iso \
--mirror https://us.mirror.example.com/big-file.iso \
--sha256 ba7816bf... \
--no-extract
All Multi-mirror downloads machinery applies: parallel HEAD validation at startup, per-mirror health tracking, 30s exclusion on failure, aggregate bandwidth cap.
With bandwidth cap
peel https://example.com/big-file.iso \
--no-extract \
--max-bandwidth 10MB/s
Useful when:
- Downloading on a shared link where saturating the pipe is disruptive.
- Cron-scheduled downloads that should run at steady-state rather than burst-and-idle.
With resume across kills
--no-extract has the same resume guarantee as extract mode.
Ctrl-C / kill -9 / network drop / OOM:
peel https://example.com/big-file.iso --no-extract
# ... interrupted at 40% ...
peel https://example.com/big-file.iso --no-extract
# Picks up where it left off, completes the rest.
The sidecars (big-file.iso.peel.part and big-file.iso.peel.ckpt)
stay on disk between runs.
Choosing peel --no-extract over aria2c
| Need | Tool |
|---|---|
| Parallel ranged GETs | both |
| Resume on kill -9 | both |
| Multiple URLs treated as one logical file | both (aria2c -Z, peel's multi-part URL path) |
| Multiple URLs serving the same file (mirror fan-out) | both |
| SHA-256 verification | both |
| Hand-rolled, vetted single-binary install | peel |
| Out-of-band integration with a streaming extract step | only peel (toggling --no-extract switches to default extract mode) |
| Bittorrent / Metalink / multi-protocol | only aria2c |
| Browser-style cookie handling, OAuth, etc. | only aria2c |
peel --no-extract applies when callers want the
streaming/resumable/parallel guarantee without a separate
extract step. For a plain download (no archive) where the eventual
extract is not a concern, it offers aria2c-level UX in one binary.
A typical script
#!/usr/bin/env bash
set -euo pipefail
URL=https://example.com/big-file.iso
SHA256=ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
OUT=/downloads/big-file.iso
peel "$URL" \
--no-extract \
--sha256 "$SHA256" \
--max-bandwidth 50MB/s \
-o "$OUT"
echo "Downloaded and verified: $OUT"
Re-running this script after any failure resumes from the last
checkpoint. Re-running it after a clean completion is not a no-op:
the file is already at $OUT, but the sidecars are gone, so the next
invocation downloads from scratch because there is no checkpoint to
resume from.
For "only download if not already present" semantics, wrap with a check:
if [ ! -s "$OUT" ]; then
peel "$URL" --no-extract --sha256 "$SHA256" -o "$OUT"
fi
(The -s test checks for non-empty, which catches both "missing"
and "empty partial".)
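A stricter variant of the presence check also confirms that the existing file hashes to the expected value. The sketch below is hypothetical (the function name `needs_download` is not part of peel) and assumes the coreutils `sha256sum` binary is on PATH and that the expected digest is lowercase hex.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when a download is needed, i.e. the file is
# missing, empty, or does not hash to the expected SHA-256.
# Assumption: coreutils sha256sum is installed.
needs_download() {
  local path=$1 expected=$2 actual
  [ -s "$path" ] || return 0
  actual=$(sha256sum "$path" | cut -d ' ' -f 1)
  [ "$actual" != "$expected" ]
}

# Usage (commented out so the sketch has no side effects):
# if needs_download "$OUT" "$SHA256"; then
#   peel "$URL" --no-extract --sha256 "$SHA256" -o "$OUT"
# fi
```

This trades one full read of the existing file for protection against a stale or truncated copy at $OUT.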
Non-goals
- Torrent client. No DHT, no peers.
- Protocol-coercing tool. HTTP / HTTPS only.
- Auth-aware downloader. No OAuth flow, no browser-cookie import. For URLs that require auth, pre-sign or pass a custom Authorization header via a reverse proxy. peel honours HTTP_PROXY / HTTPS_PROXY / NO_PROXY env vars.
Troubleshooting
Symptoms, likely causes, and verification steps. For problems not listed here, see Exit codes for the error code key and FAQ for design-rationale questions.
"No space left on device"
The extraction filled the output filesystem. Two causes:
- The extracted tree is genuinely bigger than the free space. Check the archive's expected uncompressed size (most formats report it in their metadata, and peel logs it on the progress UI's first line). If that exceeds free space, more disk is required: the sliding window only bounds the compressed side.

- The part-file's lookahead window grew faster than the decoder consumed it. Lower --max-disk-buffer (default 1 GiB) so the scheduler back-pressures sooner:

  peel <URL> --max-disk-buffer 128MiB -o ./out/

  Then confirm hole-punching is working (see the next section).

"Hole-punching seems disabled / part-file is huge"
Check the part-file's physical size (du -h) versus its
logical size (ls -la):
$ ls -la out.peel.part out.peel.ckpt
-rw-r--r-- ... 10737418240 ... out.peel.part # 10 GiB logical
$ du -h out.peel.part
182M out.peel.part # 182 MiB physical (healthy)
If du is close to ls, hole-punching is not trimming. Possible
causes:
- -k / --keep-archive is set. The puncher is intentionally disabled. Remove -k if archive preservation is not required.

- --no-extract is set. Nothing decodes, so nothing punches. Expected for --no-extract: the bytes are kept verbatim.

- The filesystem does not support punch-hole. Some unusual mounts and old kernels reject it. peel logs a warn! at startup when the probe fails:

  WARN filesystem rejected MADV_REMOVE probe, falling back to fallocate(PUNCH_HOLE)
  WARN filesystem rejected fallocate(PUNCH_HOLE) probe, source bytes will not be released

  Move the workdir to a filesystem that supports it (--workdir /var/tmp/peel), or accept the larger transient footprint.

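The du-versus-ls comparison can be scripted for monitoring. This is a minimal sketch with a hypothetical function name (`part_file_footprint`); it relies only on `wc -c` for the logical size and `du -k` for the physical size, both of which are standard.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the logical and physical size of a file in KiB.
part_file_footprint() {
  local path=$1 logical_bytes physical_kib
  logical_bytes=$(wc -c < "$path")            # logical (apparent) size
  physical_kib=$(du -k "$path" | cut -f 1)    # blocks actually allocated
  echo "logical_kib=$(( logical_bytes / 1024 )) physical_kib=$physical_kib"
}

# Usage (commented out; out.peel.part only exists during a run):
# part_file_footprint out.peel.part
```

A physical value far below the logical value indicates the puncher is releasing consumed bytes.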
"io_uring fallback warning"
On Linux, one of these messages may appear:
WARN io_uring_setup failed (errno=1 EPERM), falling back to blocking sockets
WARN io_uring not available (kernel < 5.6), falling back to blocking sockets
WARN RLIMIT_MEMLOCK too low for io_uring (need at least N KiB), falling back to blocking sockets
These are informational, not errors. peel falls back to the
blocking backend and continues. The fallback path is the same code
every non-Linux build uses; results are correct either way.
To force io_uring and fail-fast when it is not available:
peel <URL> --io-backend uring -o ./out/
Common causes:
- Seccomp profile blocks the syscalls. cri-o's default profile is the most common case under Kubernetes. Add io_uring_* to the allowed syscalls or accept the fallback.
- Kernel too old. Minimum 5.6 for the SQEs peel uses. In practice 5.10+ is more reliable.
- RLIMIT_MEMLOCK too low. Container default may be 16 KiB, while io_uring rings need a few MiB. Raise the limit (ulimit -l unlimited in the container spec) or accept the fallback.
"Wrong digest at completion"
error: digest mismatch
expected: ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
got: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
The streamed bytes did not hash to the value asserted by --sha256.
Possibilities:
- The source moved or was re-uploaded. Re-download a small sample (curl --range 0-1023 $URL | sha256sum) and compare against what the publisher currently advertises.
- Wrong hash supplied. Double-check the hash source.
- A --mirror is serving subtly different bytes. Remove mirrors one at a time to identify the culprit. peel drops misbehaving mirrors at the HEAD validation step, but some forms of subtle corruption only show up post-decoder.
"Source changed during run"
error: source changed during run
chunk 1247 fingerprint mismatch: stored=…, refetched=…
The CRC32C fingerprint of a chunk does not agree across fetches. The source bytes changed between when one worker fetched the chunk and when another did. Causes:
- CDN-edge cache drift. Common with mirror infrastructure that is mid-rollout. Wait for propagation, then re-run.
- Origin re-uploaded the file. Check the upstream publishing timeline.
- Network corruption. Rare with TLS, but observed on some middlebox-heavy paths. Repeat the run; the second attempt usually succeeds.
Delete the .peel.ckpt to start fresh from the part-file's bytes
(which peel re-verifies chunk-by-chunk), or delete both sidecars
to start completely from scratch.
"ETag mismatch on resume"
error: source identity changed since last run
ETag at startup: "8b1a9953c4611296a827abf8c47804d7"
ETag now: "65a8e27d8879283831b664bd8b7f0ad4"
The source's ETag (or Last-Modified) changed between the start of
the run and the resume. peel aborts rather than silently mixing
bytes from two different versions of the file.
Two fixes:
- Delete the sidecars and start from scratch. The bytes already on disk belong to the previous version and are not useful.
- Pass --sha256 of the new version. With --sha256 set, the hash is the source of truth, and agreeing mirrors are trusted regardless of ETag drift.
"Multi-volume probe returned 404s"
warn: multi-volume probe: backup.part0005.rar returned 404, stopping at 4 volumes
This is normal: auto-discovery stops at the first missing
volume. The reported count is what peel will fetch.
If more volumes are expected than the probe found:
- The volumes may be on a different host or path.
- The numbering may have a gap.
- The volumes may use a different convention than the seed implies.
Use the explicit positional list or an @manifest.txt file instead.
"Wrong format detected"
error: format mismatch: URL suffix says .tar.zst, magic bytes are 0x1f 0x8b (gzip)
pass --force-format-from-magic to trust magic, or --format <NAME> to pin
The URL suffix and the magic bytes disagree. Three options:
- Trust the magic when the file is known to match its bytes: --force-format-from-magic.
- Pin the decoder when the expected format is known: --format zstd (or whichever).
- Investigate the source. The file may have been re-encoded without renaming, or the URL is genuinely serving the wrong file.
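When investigating the source by hand, the leading bytes of a local copy (or of a small ranged download) can be compared against well-known magic numbers. The sketch below is a manual aid, not peel's detection logic; the function name `sniff_format` is hypothetical, and only the signatures listed are recognised.

```bash
#!/usr/bin/env bash
# Hypothetical helper for manual investigation; this is not peel's detector.
# Maps the first bytes of a local file to a well-known format name.
sniff_format() {
  local hex
  hex=$(od -An -tx1 -N6 "$1" | tr -d ' \n')   # first 6 bytes as lowercase hex
  case "$hex" in
    1f8b*)         echo gzip ;;
    28b52ffd*)     echo zstd ;;
    fd377a585a00)  echo xz ;;
    04224d18*)     echo lz4 ;;
    425a68*)       echo bzip2 ;;
    504b0304*)     echo zip ;;
    377abcaf271c)  echo 7z ;;
    52617221*)     echo rar ;;
    *)             echo unknown ;;
  esac
}

# Usage (commented out; head.bin is a placeholder file name):
# curl -s --range 0-5 "$URL" -o head.bin && sniff_format head.bin
```

If the result disagrees with the URL suffix, the mismatch is in the published file, not in the client.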
"Permission denied" writing the output
The output's parent directory is not writable for the user running
peel. The error message names the path:
error: cannot create output directory ./out/: Permission denied (os error 13)
peel does not elevate privileges. Run as a user with write
permission on the output path, or use --workdir to relocate only
the sidecars to a writable location while writing the final
extracted tree to a location the user owns.
"Output file already exists" (or seems to)
peel overwrites the extracted output:
- For tree-shaped outputs (tar, zip, 7z, rar), existing files at paths matching an archive entry are overwritten. Existing files at paths not in the archive are left alone.
- For stream-shaped outputs (raw .zst, .xz, .lz4, .gz), the output file is overwritten unconditionally.
For a non-destructive run, point -o at a fresh directory.
"I want to interrupt and resume later"
Press Ctrl-C (or send SIGTERM). peel traps the signal, flushes
the in-flight checkpoint, and exits with code 130 (SIGINT) or 143
(SIGTERM). The sidecars stay on disk. Re-run the exact same command
to resume.
kill -9 (SIGKILL) is also safe. peel is designed so that even
an ungraceful kill leaves the part-file's bytes and the last
checkpoint in a consistent state, and the next run reconciles.
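A supervising script can label these exit statuses before deciding whether to re-run. This is a minimal sketch with a hypothetical function name (`describe_exit`). The 130 and 143 values are the ones quoted above; 137 is an assumption based on the common shell convention of reporting 128 plus the signal number (SIGKILL is 9) for a child killed by a signal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: label an exit status as seen by the calling shell.
# 130 = 128 + SIGINT (2), 143 = 128 + SIGTERM (15), 137 = 128 + SIGKILL (9).
describe_exit() {
  case "$1" in
    0)   echo "completed" ;;
    130) echo "interrupted by SIGINT; re-run the same command to resume" ;;
    143) echo "terminated by SIGTERM; re-run the same command to resume" ;;
    137) echo "killed by SIGKILL; re-run the same command to resume" ;;
    *)   echo "failed with exit code $1" ;;
  esac
}

# Usage (commented out so the sketch has no side effects):
# peel "$URL" -o ./out/; describe_exit "$?"
```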
"Where do the logs go?"
stderr. The progress UI block goes to stderr as well, redrawn in
place on a TTY. To capture:
peel <URL> -o ./out/ 2>peel.log # only log
peel <URL> -o ./out/ 2> >(tee peel.log >&2) # log and show
RUST_LOG=debug peel <URL> -o ./out/ 2>peel.log # verbose
"Live progress UI shows wrong percentage"
The percentage is streamed_bytes / Content-Length. Two pitfalls:
- For multi-part URLs, the denominator is the sum of all parts' Content-Length values (accurate).
- For a server that does not return Content-Length (rare, mostly badly-configured proxies), peel falls back to a chunk-count progress estimate. The percentage will be approximate.
Getting better diagnostics
Always run with RUST_LOG=info (or RUST_LOG=debug) when filing a
bug report:
RUST_LOG=debug peel <URL> -o ./out/ 2>peel-debug.log
The first few lines list the selected backends, the discovered volumes and mirrors, and the format detection result. Misbehaviour typically shows up there.
FAQ
Design-rationale notes and answers to common "why does peel do X"
questions.
No --password=<value> flag
argv is visible to every process on the host via ps,
/proc/<pid>/cmdline, and Get-Process -IncludeUserName. A
passphrase on the command line is read by:
- Any unprivileged process on the host (until the process exits).
- Anything that scrapes process listings: monitoring agents, shell history collectors, exit-code-replaying scripts.
- Container observability tools that log process state on crash.
--password-from keeps the passphrase out of argv. env:NAME,
file:PATH, and fd:N integrate with non-interactive workflows.
For a single-line non-interactive invocation,
PEEL_PW=… peel … --password-from env:PEEL_PW is only slightly
longer than --password=… and avoids the visibility problem.
Why .bz2 support was added in round two
Round one shipped without .bz2 on the basis that bzip2 is a slow,
single-threaded codec superseded by xz (better ratio) and zstd
(faster). That argument, based on priors alone, held until a real corpus
arrived as .tar.bz2 and bunzip2 | peel /dev/stdin discarded the
streaming + resume properties peel exists to deliver: the source
side's on-disk footprint went unbounded across the workaround pipe,
and a mid-extraction kill -9 restarted the whole thing.
.bz2, .tar.bz2, .tbz2, and .tbz are now first-class formats.
The pipeline matches the other compressed .tar.* codecs: parallel
ranged HTTP downloads, in-flight streaming decompression,
fallocate(PUNCH_HOLE) reclaim of the compressed source as the
decoder advances, and per-block frame-aligned checkpointing so a
crash mid-extraction resumes exactly where it left off. See
internal/PLAN_bz2_support.md for the engineering plan and
trade-offs (randomised blocks, mid-block resume, parallel block
decoding) deferred from this round.
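As an illustration of what "in-flight streaming decompression" means for bzip2, the sketch below feeds a decoder small chunks as they would arrive from the network and emits output as soon as it is available. It uses Python's standard `bz2` module, not peel's decoder, and it does not model checkpointing or hole-punching. The 64-byte chunk size and the helper name `stream_decompress` are arbitrary choices for the demo.

```python
import bz2
from typing import Iterable, Iterator


def stream_decompress(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes incrementally as compressed chunks arrive."""
    decoder = bz2.BZ2Decompressor()
    for chunk in chunks:
        out = decoder.decompress(chunk)
        if out:  # the decoder may buffer until a full block is available
            yield out


payload = b"peel " * 10_000
compressed = bz2.compress(payload)

# Simulate network delivery in 64-byte pieces.
pieces = (compressed[i:i + 64] for i in range(0, len(compressed), 64))
restored = b"".join(stream_decompress(pieces))
```

The caller never holds the whole compressed stream at once, which is the property the `bunzip2 |` workaround lost on the source side.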
No raw lzma support (only xz)
XZ is the modern container format wrapping LZMA2. Raw .lzma
(without the XZ headers) was the LZMA1 era's format and is rare in
modern publishing. peel's decoder is per-cycle equivalent to
liblzma on the XZ path. Adding the raw LZMA1 framing is in the
backlog but does not fit the streaming-from-HTTP workflow peel
targets.
No nested archive handling
Each invocation of peel extracts one archive. Chain invocations
for a nested archive:
peel https://host/outer.tar.zst -o ./outer/
peel ./outer/inner.zip -o ./final/ # local mode
Nested-archive auto-detection adds an order of magnitude of complexity (filesystem walking, recursion limits, archive bombs) for no compelling user-facing win.
Reason for --no-extract
Three things peel --no-extract provides that plain curl does
not:
- Parallel ranged GETs, like aria2c. curl has --parallel, but it parallelises over many URLs, not over ranges of one URL.
- Resume after kill -9, with checkpointed state. curl -C - resumes a single in-flight transfer and does not survive a kill that lost the file descriptor.
- Mirror fan-out and SHA-256 verification. --mirror's per-mirror health tracking and aggregate token-bucket bandwidth cap are built in.
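To make "parallelising over ranges of one URL" concrete, the sketch below splits a file of known length into contiguous HTTP `Range` header values, one per worker. `split_ranges` is a hypothetical helper written for this page. peel's actual chunk sizes and scheduling are not described here, so this is only an outline of the idea.

```python
from typing import List


def split_ranges(total_size: int, parts: int) -> List[str]:
    """Split [0, total_size) into contiguous HTTP Range header values."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never emit an empty range
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder
        end = start + length - 1  # Range end offsets are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


print(split_ranges(10, 3))
```

Each value would be sent as the `Range` header of a separate GET against the same URL, and the responses written at their own offsets in the part-file.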
Use curl for one-off "download this one file fast". Use
peel --no-extract for parallel-GET, resume, mirror failover, or
hash verification (the full aria2c use case).
--mirror failover is parallel, not sequential
Modern CDN topologies are mostly symmetric: any mirror should serve any byte. Parallel scheduling across mirrors gives the aggregate bandwidth of all of them. Sequential failover wastes that.
When a mirror starts failing, the scheduler excludes it for 30 s and rebalances across the remaining mirrors. The exclusion is logged, which makes it possible to see which mirror went out of rotation.
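The exclusion behaviour can be sketched as a small pool with a cool-down timer. Only the 30-second figure comes from the text above. The `MirrorPool` class, its method names, and the round-robin choice are assumptions for illustration and do not describe peel's real scheduler. The clock is injected so the behaviour can be exercised without waiting.

```python
from typing import Callable, Dict, List

COOLDOWN_SECONDS = 30.0  # exclusion window mentioned in the text


class MirrorPool:
    """Round-robin over mirrors, skipping any that failed recently."""

    def __init__(self, mirrors: List[str], clock: Callable[[], float]) -> None:
        self._mirrors = list(mirrors)
        self._clock = clock
        self._excluded_until: Dict[str, float] = {}
        self._next = 0

    def report_failure(self, mirror: str) -> None:
        # Take the mirror out of rotation for the cool-down window.
        self._excluded_until[mirror] = self._clock() + COOLDOWN_SECONDS

    def healthy(self) -> List[str]:
        now = self._clock()
        return [m for m in self._mirrors
                if self._excluded_until.get(m, 0.0) <= now]

    def pick(self) -> str:
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("all mirrors are currently excluded")
        mirror = candidates[self._next % len(candidates)]
        self._next += 1
        return mirror


now = [0.0]  # fake clock, advanced by hand
pool = MirrorPool(["a", "b"], clock=lambda: now[0])
pool.report_failure("a")
during = [pool.pick() for _ in range(3)]  # "a" is out of rotation
now[0] = 31.0
after = pool.healthy()  # "a" is eligible again
```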
For true sequential failover (mirror 2 used only if mirror 1 is totally unreachable), wrap with shell logic:
peel "$PRIMARY" -o ./out/ ||
peel "$BACKUP_1" -o ./out/ ||
peel "$BACKUP_2" -o ./out/
Corporate proxy support
peel honours the standard HTTP_PROXY, HTTPS_PROXY, and
NO_PROXY environment variables for outbound requests, plus
SSL_CERT_FILE for trust-store overrides.
For TLS errors with a corporate CA, point SSL_CERT_FILE at the
bundle that includes it:
SSL_CERT_FILE=/etc/ssl/certs/corp-bundle.pem peel <URL> -o ./out/
H2 through a corporate proxy is the most fragile combination.
--http-version h1 is the usual workaround when an H2-aware proxy
is doing something subtly wrong.
No Homebrew formula
The crates.io publish is the primary distribution path. The GitHub
release attachments cover platforms where cargo install is not
convenient. A Homebrew formula is on the wish list but not yet in
place. PRs welcome at https://github.com/agouin/peel.
Windows support
Not officially supported. The blocking backend and the codec machinery are platform-neutral and should work, but:
- io_uring is Linux-only.
- mmap with MADV_REMOVE is Linux-only.
- The progress UI's terminal handling is tested on TTYs that behave like xterm. cmd.exe is not in the test grid.
- --password-from prompt reads /dev/tty, which does not exist on Windows.
WSL2 is a reasonable workaround and provides the full Linux path. Native Windows support is open for contribution.
Large on-disk part-file logical size
The part-file is sparse. The logical size is the full archive
length (ls -la shows the full size), while the physical size
on disk is the in-flight window (du -h shows actual usage):
$ ls -la out.peel.part
-rw-r--r-- ... 10737418240 ... out.peel.part # 10 GiB logical
$ du -h out.peel.part
182M out.peel.part # 182 MiB physical
Tools that ignore sparse files (some backup tools, some tar
implementations) see the logical size. The actual disk usage is the
physical size.
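The same logical-versus-physical gap can be reproduced with any sparse file. The sketch below creates a 64 MiB file that holds only 4 KiB of real data and compares the two sizes. It is a generic demonstration, not peel code; the file name is made up, and it assumes a filesystem with sparse-file support and the usual 512-byte `st_blocks` unit.

```python
import os
import tempfile
from typing import Tuple


def logical_and_physical(path: str) -> Tuple[int, int]:
    """Return (logical size, allocated bytes) for `path`."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux and most Unix systems.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.part")  # hypothetical file name
    with open(path, "wb") as f:
        f.truncate(64 * 1024 * 1024)  # set the logical length, allocate nothing
        f.write(b"x" * 4096)          # only this region occupies disk blocks
    logical, physical = logical_and_physical(path)

print(logical, physical)
```

`logical` corresponds to what `ls -la` reports and `physical` to what `du` reports.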
Choosing --max-disk-buffer
The default of 1 GiB rarely engages on a healthy disk. Tune it when:
- A hard ceiling on transient disk usage is required (CI runner with small /tmp).
- The network is much faster than the disk and the lookahead grows unboundedly before the decoder catches up.
Common values: 256 MiB on memory-constrained or disk-constrained
containers, default 1 GiB on a laptop or server, disable
(--max-disk-buffer none) on a high-bandwidth host where the
network burst should be absorbed fully into the buffer.
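One way to picture the cap is as a backpressure check on the lookahead, meaning the bytes downloaded but not yet consumed by the decoder. The sketch below is a guess at the shape of such a check, not peel's actual policy. `may_fetch` and its parameters are hypothetical, and `None` stands in for `--max-disk-buffer none`.

```python
from typing import Optional

MIB = 1 << 20
GIB = 1 << 30


def may_fetch(downloaded: int, consumed: int, chunk: int,
              max_buffer: Optional[int]) -> bool:
    """True if fetching `chunk` more bytes keeps the lookahead within the cap."""
    if max_buffer is None:  # no cap: absorb the whole network burst
        return True
    lookahead = downloaded - consumed  # bytes on disk the decoder has not read
    return lookahead + chunk <= max_buffer


# Decoder stalled, 1000 MiB buffered: a further 64 MiB would exceed 1 GiB.
print(may_fetch(1000 * MIB, 0, 64 * MIB, GIB))
# Decoder has consumed 500 MiB: there is room again.
print(may_fetch(1000 * MIB, 500 * MIB, 64 * MIB, GIB))
```

With a small cap the downloader pauses sooner when the decoder falls behind, which is the trade-off the values above are balancing.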
Bench grid platform
The README's bench grid is single-machine, single-run on an Apple M4 Max. macOS was chosen because:
- The reference CLIs (zstd, xz, lz4, gzip, 7z, unzip, unrar) are all available as Homebrew packages with stable versions.
- peel's blocking backend is in use (no io_uring on macOS), so the grid measures the codec story alone. Linux-specific fast paths (mmap, io_uring) provide additional gains on top.
A Linux grid with the io_uring backend is in
internal/bench-results/.
Licensing
MIT OR Apache-2.0, at the user's option. The full text is at LICENSE-MIT and LICENSE-APACHE.
The RAR3 and RAR5 decoders are clean-room implementations.
RARLAB's unrar source has not been consulted at any point. See
Supported formats §RAR provenance.
Filing bugs or feature requests
GitHub Issues: https://github.com/agouin/peel/issues. Include the
output of peel --version, the command that was run, and (if
applicable) a RUST_LOG=debug log.