peel

Streaming, resumable, space-efficient extractor for compressed archives over HTTP, and for local archive files on disk.

peel https://example.com/dataset.tar.zst

peel is a Rust CLI that downloads, decompresses, and extracts an archive in a single pass. It resumes exactly where it left off after a dropped connection, kill -9, OOM kill, or power loss. The compressed bytes never fully land on disk: as the decoder consumes the prefix, the download buffer underneath is hole-punched out. The archive and the extracted tree never coexist at full size.

What it solves

  • Disk pressure. Pulling a 40 GB .tar.zst should not require 80 GB free. Peak disk usage is roughly extracted_size + a few hundred MB, not compressed_size + extracted_size.
  • Flaky networks. A dropped connection mid-download is the default case, not the edge case. peel resumes at the byte that was in flight.
  • kill -9 and pod restarts. Frame-aligned checkpoints (atomic write+fsync+rename) plus per-chunk fingerprints ensure a hard kill mid-extraction resumes exactly where it left off, byte-identical to a clean run.
  • Streaming .zip, .7z, .rar over HTTP. curl | unzip does not work: the ZIP central directory lives at the end of the file, the 7z trailer pointer sits at the end of the file, and unrar requires lseek on its input. peel issues a ranged GET for the central directory or trailer first (zip, 7z), or walks the RAR header chain in stream order (rar), then streams entries to disk as soon as their bytes arrive.

Format coverage at a glance

Family                  Formats
Plain                   .tar
Streaming codecs        .zst / .tar.zst · .xz / .tar.xz · .lz4 / .tar.lz4 · .gz / .tar.gz
Random-access archives  .zip · .7z · .rar (RAR5 + legacy RAR3/RAR4)

Encrypted archives are supported for zip (WinZip-AES, ZipCrypto), 7z (AES-256-CBC), and rar5 (AES-256-CBC, both archive-header and per-file). See Encrypted archives.

The full per-format matrix (magic-byte detection, resume granularity, encryption) is on the Supported formats page.

Distinguishing features

  1. Hole-punched compressed buffer. Parallel ranged HTTP downloads feed a sparse part-file. The decoder consumes the prefix while workers continue to fetch the suffix, and finished bytes are released back to the filesystem as the decoder advances. Peak compressed-side disk usage is the download window (approximately --max-disk-buffer), not the archive size.

  2. Frame-aligned, byte-identical resume. A kill -9 anywhere in the pipeline leaves a .peel.ckpt next to the part file. Re-running the same command picks up exactly at the checkpointed frame. The final output is byte-identical to a clean run. The crash-test harness runs 100 random kill points per format and asserts that property every time.

  3. One command for HTTP and local. A URL argument triggers parallel ranged GETs and streaming extract. A local file argument runs the same hand-rolled decoders against the file on disk: non-destructive by default, with hole-punching enabled via -d.

Installation

peel is a single statically-linked binary. It has no runtime dependencies beyond a working libc. On Linux, the io_uring fast paths need a 5.6+ kernel; older kernels fall back automatically.

From source (Cargo)

This is the currently supported route. The crate name on crates.io is peel-rs; the installed binary is peel.

cargo install peel-rs --locked

The MSRV is pinned in rust-toolchain.toml; a recent stable Rust (1.93+) is sufficient.

Building from a checkout

git clone https://github.com/agouin/peel
cd peel
cargo build --release
./target/release/peel --help

Cargo features

Feature  Default  What it enables
rar      on       RAR5 and legacy RAR3/RAR4 decoders. When disabled, the binary still registers .rar against a diagnostic-only factory, so the user sees "compiled without the 'rar' feature" instead of an unknown-format error.

To drop the RAR module entirely (shrinks the binary; useful when .rar inputs are not expected):

cargo install peel-rs --locked --no-default-features

From a release binary

Pre-built binaries for Linux (x86_64, aarch64) and macOS (x86_64, aarch64) are attached to every GitHub release:

https://github.com/agouin/peel/releases

# Linux x86_64 example. Substitute your platform's triple.
curl -fL https://github.com/agouin/peel/releases/latest/download/peel-x86_64-unknown-linux-gnu.tar.gz \
  | tar -xzf - -C /usr/local/bin peel
peel --version

Docker

docker run --rm -v "$PWD/out:/out" ghcr.io/agouin/peel \
  https://example.com/dataset.tar.zst -o /out/

The image is a FROM scratch build with the static peel binary plus a recent CA bundle (no shell, no package manager). See the Kubernetes init container example for usage inside a Pod.

Verifying the install

peel --version
peel --help | head

To confirm which file-IO backend peel selects at runtime on Linux, run any command with RUST_LOG=info and look for the selected file IO backend = … line:

RUST_LOG=info peel https://example.com/x.tar.zst -o ./out/ 2>&1 | head -5

See Performance and tuning for what each backend means.

Quick start

Five things peel does, in five copy-pasteable commands.

1. Extract a tarball over HTTP

peel https://example.com/linux-6.x.tar.xz

Without -o, the default extract directory is the URL basename with archive and compression suffixes stripped, in the current working directory. The example above lands the kernel sources in ./linux-6.x/.
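
The default rule can be illustrated with a minimal Rust sketch. The function name default_output_name and the suffix table are assumptions made for this example (the table is copied from the suffixes listed in this documentation); peel's actual resolver may handle more cases.

```rust
/// Hypothetical helper: derive the default output name from a URL.
/// Illustrative only; not peel's actual resolver.
fn default_output_name(url: &str) -> String {
    // Drop any query string or fragment, then take the last path component.
    let path = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or(url);
    let base = path
        .trim_end_matches('/')
        .rsplit('/')
        .next()
        .unwrap_or(path);

    // Compound suffixes come first so ".tar.xz" wins over ".xz".
    const SUFFIXES: [&str; 12] = [
        ".tar.zst", ".tar.xz", ".tar.lz4", ".tar.gz",
        ".tar", ".zst", ".xz", ".lz4", ".gz", ".zip", ".7z", ".rar",
    ];
    for suffix in SUFFIXES {
        if let Some(stem) = base.strip_suffix(suffix) {
            if !stem.is_empty() {
                return stem.to_string();
            }
        }
    }
    // No known suffix: keep the basename unchanged.
    base.to_string()
}
```

Under these assumptions, https://example.com/linux-6.x.tar.xz maps to linux-6.x, matching the example above.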

To set an explicit path, pass -o. A trailing slash forces directory semantics (useful when the URL has no recognisable suffix):

peel https://example.com/linux-6.x.tar.xz -o ./linux/

2. Extract a bare compressed file

For stream-shaped formats (raw .zst / .xz / .lz4 / .gz) the output is a single file, not a directory:

peel https://example.com/model.bin.zst -o ./model.bin

3. Download without extracting

Skip the decoder and write the bytes verbatim into a single file using parallel ranged GETs. The same scheduler, mirror, and resume machinery used by extract mode applies.

peel https://example.com/big.deb --no-extract

--download-only is an alias provided for compatibility with aria2c.

4. Extract a .zip, .7z, or .rar over HTTP

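ZIP and 7z place their index at the end of the file, and unrar needs a seekable input, so curl | unzip and its equivalents do not work (see How it works). peel fetches the central directory or trailer first via a ranged GET (zip, 7z), or walks the RAR header chain in stream order (rar), then streams entries to disk as they arrive: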

peel https://example.com/dataset.zip   -o ./out/
peel https://example.com/snapshot.7z   -o ./out/
peel https://example.com/backup.rar    -o ./out/

For a password-protected archive, see Encrypted archives:

peel https://example.com/secret.zip -o ./out/ --password-from prompt

5. Extract a local file

For an archive already on disk, skip the HTTP machinery and run the same decoders against the local file. Non-destructive by default:

peel /tmp/dataset.tar.zst                # extracts to ./dataset/, archive untouched
peel /tmp/dataset.tar.zst -o ./out/      # explicit output dir
peel -d /tmp/dataset.tar.zst -o ./out/   # destructive: hole-punch and delete on success

See Local-file extraction for the full mode table.

Default behaviour

Every command above runs with these guarantees, without any extra flag:

  • Parallel ranged GETs. Default 4 workers, tunable with --workers.
  • Streaming decompression that overlaps with the download. Peak disk for the compressed side is the lookahead window, not the archive size.
  • Hole-punched compressed buffer. fallocate(FALLOC_FL_PUNCH_HOLE) or, under the mmap backend, madvise(MADV_REMOVE) releases blocks of the part-file as the decoder advances past them (F_PUNCHHOLE on macOS).
  • Frame-aligned resume. A kill -9 mid-run leaves a .peel.ckpt next to the part-file. Re-running the same command resumes.
  • Live progress UI. Three-line block on TTY (download, extract, ETA, active workers, on-disk source footprint). Falls back to tracing::info! lines on a non-TTY without any extra flag.

How it works

This page describes the internal architecture. The Quick start is sufficient for basic use. This material covers why disk usage stays bounded, how resume converges to a byte-identical output, and what happens on the wire.

The three components

A peel run for an HTTP source has three loosely-coupled stages, each running concurrently:

   +-------------+      +----------+      +----------+
   |  download   | ---> | part-file| ---> | decoder  | ---> output tree
   |  workers    |      | (sparse) |      | + sink   |
   +-------------+      +----------+      +----------+
        |                    ^                  |
        |                    |                  v
        |              +-----------+      +-----------+
        +----------->  | scheduler |      |  puncher  |
                       | + bitmap  |      | (releases |
                       +-----------+      |  blocks)  |
                                          +-----------+

  1. Download workers fetch ranges of the source object in parallel via ranged GETs. Each worker writes its bytes into the sparse .peel.part file at the byte offset the scheduler assigned it.
  2. The decoder walks the part-file from offset 0, consuming whatever the workers have already written, blocking briefly when it gets ahead of them.
  3. The puncher trails the decoder. As the decoder advances past a chunk boundary, the puncher releases the blocks underneath that range back to the filesystem: fallocate(FALLOC_FL_PUNCH_HOLE) on Linux, madvise(MADV_REMOVE) on Linux under the mmap backend, or F_PUNCHHOLE on macOS.

At any instant the part-file's logical size is the full archive, but its physical size on disk is roughly the gap between the slowest worker and the decoder. This is the lookahead window, capped by --max-disk-buffer (default 1 GiB).

The bitmap and the checkpoint

Two pieces of state make resume work:

A chunk bitmap. The source is divided into fixed-size chunks (--chunk-size, default 4 MiB). A bit per chunk records "this chunk's bytes have been fetched and written." The scheduler hands out the next unset bit to whichever worker is free.
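
A minimal Rust sketch of such a bitmap follows. The type and method names (ChunkBitmap, mark_done, next_unset) are hypothetical, and the sketch leaves out what a real scheduler also needs, such as tracking chunks that are dispatched but not yet written.

```rust
/// One bit per fixed-size chunk; a set bit means "fetched and written".
/// Hypothetical type for illustration, not peel's internal structure.
struct ChunkBitmap {
    bits: Vec<u64>,
    chunks: usize,
}

impl ChunkBitmap {
    fn new(source_len: u64, chunk_size: u64) -> Self {
        assert!(chunk_size > 0);
        // Round up so a short final chunk still gets its own bit.
        let chunks = ((source_len + chunk_size - 1) / chunk_size) as usize;
        ChunkBitmap { bits: vec![0; (chunks + 63) / 64], chunks }
    }

    /// Record that a chunk's bytes are on disk.
    fn mark_done(&mut self, chunk: usize) {
        assert!(chunk < self.chunks);
        self.bits[chunk / 64] |= 1u64 << (chunk % 64);
    }

    fn is_done(&self, chunk: usize) -> bool {
        chunk < self.chunks && (self.bits[chunk / 64] & (1u64 << (chunk % 64))) != 0
    }

    /// Lowest-numbered chunk that has not been marked done, if any.
    fn next_unset(&self) -> Option<usize> {
        (0..self.chunks).find(|&c| !self.is_done(c))
    }
}
```

With the default 4 MiB chunk size, a 10 MiB source needs three bits: two full chunks and one short tail chunk.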

A checkpoint sidecar. peel writes <output>.peel.ckpt next to the part-file at quiescent points: boundaries the decoder can resume from byte-identically. These are frame-aligned (per zstd block, per LZMA2 chunk, per deflate block, per 7z folder, per RAR entry, etc.), so the next run reads the checkpoint, knows the decoder state, and picks up at exactly the byte that was in flight.

A checkpoint write is atomic: peel writes to a .tmp file, fsyncs it, and renames it over the previous checkpoint. A crash during the write loses at most the in-flight checkpoint, not the previous one.
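
The write, fsync, rename sequence can be sketched in Rust with the standard library alone. The function name and the ".tmp" naming scheme are assumptions for this example, and the sketch does not fsync the parent directory, which a fully durable implementation on POSIX systems usually also does.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `final_path` via a temporary sibling, fsync, then rename.
/// Hypothetical helper; peel's real checkpoint writer may differ.
fn write_checkpoint_atomically(final_path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Assumed naming: the temporary file is the final name plus ".tmp".
    let mut tmp_name = final_path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = Path::new(&tmp_name);

    let mut tmp = File::create(tmp_path)?;
    tmp.write_all(bytes)?;
    // Flush file contents and metadata to stable storage before the rename.
    tmp.sync_all()?;
    drop(tmp);

    // rename() replaces the destination in one step, so a reader sees
    // either the previous checkpoint or the new one, never a torn file.
    fs::rename(tmp_path, final_path)
}
```

If the process dies before the rename, the previous checkpoint is still intact and only the temporary file is left behind.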

Streaming .zip, .7z, and .rar over HTTP

ZIP and 7z put their index at the end of the archive: the ZIP central directory comes after every entry, and the 7z trailer sits at the end of the SignatureHeader's pointer chain. RAR does not depend on a tail-anchored index, but the unrar binary requires lseek on its input regardless. None of curl | unzip, curl | 7z x, or curl | unrar x will start producing output until the entire archive has been buffered somewhere.

peel does not buffer the whole archive. It issues a small ranged GET to fetch the tail (zip central directory or 7z trailer) up front, parses it, then dispatches entry-sized GETs in parallel. Entries are written to the sink as soon as their bytes arrive while the rest of the archive is still in flight. The same hole-punching and resume guarantees as the streaming .tar.* family apply.

For RAR, the format's per-file headers are already laid out at the start of each file's data area, so peel walks them in stream order. No tail probe is needed.

Resume after kill -9

The output is byte-identical to a clean run if peel is re-invoked with the same arguments after any failure. The mechanism:

  1. Workers write to the part-file with pwrite (or with a memcpy into the mapped region under the mmap backend). The kernel page-caches the write.
  2. The bitmap is updated only after a chunk has been written and fsync'd into the part-file (the cadence is configurable via --checkpoint-min-bytes / --checkpoint-min-secs).
  3. The checkpoint sidecar captures the decoder's frame-aligned state plus the bitmap plus the streaming SHA-256 state (if --sha256 is set).
  4. A kill -9 between bitmap updates leaves the part-file with bytes that haven't been marked yet. Per-chunk CRC32C fingerprints in the bitmap detect those bytes on resume; they are re-fetched.
  5. The decoder resumes from the checkpoint's frame boundary, not from the start. Per-format details: zstd resumes per block, xz per LZMA2 chunk, gzip per deflate block (with a 32 KiB sliding-window snapshot), tar per member, zip per entry plus intra-entry (per deflate block / per zstd block), 7z per folder, rar per entry plus intra-entry (via a checkpoint blob that snapshots the LZ dictionary and filter cache).

The crash-test harness runs 100 random kill points per format and asserts the post-resume output bytes match a clean run, every time.
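
The per-chunk fingerprint check from step 4 can be sketched as follows. This is an illustration under stated assumptions: crc32c is a plain bitwise implementation of the Castagnoli CRC (reflected polynomial 0x82F63B78), chunks_to_refetch is a hypothetical helper, and the in-memory slice stands in for the part-file. How peel actually stores and compares fingerprints is not shown in this documentation.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
/// Slow but dependency-free; real code would use a table or hardware CRC.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical resume check: return the indices of chunks whose bytes on
/// disk do not match a recorded fingerprint (or that have none recorded).
fn chunks_to_refetch(part: &[u8], chunk_size: usize, recorded: &[Option<u32>]) -> Vec<usize> {
    part.chunks(chunk_size)
        .enumerate()
        .filter(|(i, bytes)| match recorded.get(*i).copied().flatten() {
            Some(expected) => crc32c(bytes) != expected,
            None => true, // never marked complete
        })
        .map(|(i, _)| i)
        .collect()
}
```

Any chunk returned by the check is treated as missing and fetched again, so bytes written after the last bitmap update never reach the decoder unverified.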

Bounded disk usage

The compressed side of the pipeline runs as a sliding window:

                    decoder pointer
                          v
   [hole-punched][......in-flight......][unfetched]
                          ^                  ^
                       worker N         worker N+M

The window's width is the gap between the slowest active worker and the decoder. Two knobs bound it:

  • --max-disk-buffer (default 1 GiB): when the gap reaches this many bytes, the scheduler stops dispatching new chunks until the decoder catches up. The default rarely engages on a healthy disk and bounds disaster on a slow one.
  • --punch-threshold (default 4 MiB): minimum gap between in-loop hole-punch syscalls. Smaller values yield a tighter physical-disk footprint; larger values yield fewer syscalls per second. Tune downward to enforce a hard ceiling on physical disk; upward if the filesystem's punch-hole implementation is slow.
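
The --max-disk-buffer rule amounts to a single comparison. The sketch below is a simplified model: the function name and parameters are hypothetical, and the real scheduler works with chunk indices and multiple workers.

```rust
/// Decide whether the scheduler may hand out another chunk.
/// `decoder_pos` is the byte offset the decoder has consumed up to;
/// `dispatched_end` is the end offset of the furthest chunk handed out.
/// `max_disk_buffer` is None when the cap is disabled.
/// Hypothetical model of the rule described above.
fn may_dispatch(decoder_pos: u64, dispatched_end: u64, max_disk_buffer: Option<u64>) -> bool {
    match max_disk_buffer {
        None => true,
        Some(cap) => {
            // Bytes downloaded (or in flight) but not yet consumed.
            let gap = dispatched_end.saturating_sub(decoder_pos);
            gap < cap
        }
    }
}
```

Once the gap reaches the cap, dispatch pauses; it resumes as soon as the decoder advances and the gap drops below the cap again.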

For --no-extract runs the puncher is bypassed and the part-file grows to the full archive size. Otherwise the part-file's physical size tracks the in-flight window, typically a few hundred MiB on a healthy network.

What runs where on Linux

--io-backend auto (default) runs probes at startup and picks the fastest path the kernel allows:

  • mmap sparse-file for the part-file: workers memcpy into a MAP_SHARED region; the puncher uses madvise(MADV_REMOVE). This removes a syscall per chunk write at high parallelism.
  • io_uring for the HTTP client's sockets: TCP connect, send, and recv are submitted to a single ring on a dedicated IO thread, with linked LinkTimeout SQEs for prompt cancellation. rustls rides on top unchanged.

If a probe fails (kernel < 5.6, RLIMIT_MEMLOCK too low, seccomp blocking, filesystem rejecting MADV_REMOVE), peel logs one warn! and falls back to the blocking pwrite / pread backend. Force a specific path with --io-backend [auto|blocking|uring|mmap]. See Performance and tuning.

CLI reference

peel [OPTIONS] [URLS]...

peel --help prints the same content with full details and the exact default values for the current build. This page covers every flag, grouped by function, with design notes and constraints.

The full alphabetical list, with one-liners, appears at the bottom under Flag summary.

Positional arguments

[URLS]...

One or more source URLs or local file paths.

  • One URL: the single-source case. Example: peel https://host/x.tar.zst -o ./out/.
  • Two or more URLs: activates the multi-part split-archive path. The byte-concatenation of every URL's body is treated as one logical archive stream. Workers fetch all parts in parallel via ranged GETs.
  • A single local path (no http:// or https:// scheme): activates local-file extraction. The same decoders run without HTTP machinery.
  • @file.txt (single arg): read URLs and paths from file.txt, one per line. Blank lines and # comments are ignored. Suitable for multi-volume manifests stored next to the archive.

Output and destination

-o, --output <PATH>

Destination for the extracted contents. Accepts a directory for archive formats that produce a tree (tar, zip, 7z, rar, and any compressed wrapper around tar), or a file for stream-shaped formats (raw .zst, .xz, .lz4, .gz).

  • A trailing slash forces directory semantics. peel x.zst -o ./out/ errors at parse time because .zst is a single-file output shape.
  • No -o: defaults to the URL basename with archive and compression suffixes stripped, in the current working directory. peel https://host/linux-6.x.tar.xz extracts into ./linux-6.x/.
  • The resolver errors at coordinator entry if the explicit shape (trailing slash, file path) disagrees with the detected format.

See Output path resolution for the full table of URL → output mappings.

--workdir <DIR>

Directory for the .peel.part and .peel.ckpt sidecar files.

By default these are placed as siblings of the output (<output>.peel.part and <output>.peel.ckpt). Override when the extracted output and the in-flight state should live on different disks. Examples: extracting onto slow HDD-backed storage while keeping the part-file on a fast NVMe, or pinning the sidecars inside a Kubernetes PVC mount when the output's parent is on ephemeral container storage.

The directory is created if missing. The basenames stay the same; only their parent directory changes.

Download mode

peel runs in one of three modes (default, -k, --no-extract), plus a destructive opt-in for local-file runs. See Download modes for the full mode table.

-k, --keep-archive[=<PATH>]

Extract and keep the source archive on disk. The puncher is forced to no-op so the archive's bytes are preserved at their full Content-Length.

  • -k or --keep-archive (bare): preserve the archive as a sibling of -o, named after the URL basename.
  • -k=<PATH> or --keep-archive=<PATH>: explicit path. The = is required because bare -k followed by a positional URL is otherwise ambiguous.
  • Flag absent: default behaviour. The source bytes are dropped. Hole-punching trims them and the part-file is removed on success.

-k is a no-op in local mode (preservation is the default there) and incompatible with -d/--destructive for HTTP sources.

--no-extract (alias: --download-only)

Skip extraction. Download the source bytes verbatim to a single file. The remote object is fetched in parallel via ranged GETs, using the same scheduler, mirror, resume, and SHA-256 machinery as extract mode, and is renamed into place on success. No decoder runs and no holes are punched.

Suitable for arbitrary non-archive downloads, for keeping an archive to extract later with a different tool, or as a parallel-ranged-GET replacement for aria2c.

Mutually exclusive with --format, --force-format-from-magic, and --punch-threshold. These are extractor knobs and nothing extracts in this mode.

-d, --destructive

Opt in to destructive extraction in local-file mode: hole-punch the source as the decoder advances, then delete on clean completion. Required because local mode is non-destructive by default.

For HTTP sources -d is a no-op. The HTTP path is destructive by default. Combining -d with -k for an HTTP source is an error.

--strict-format

Make format-detection failure a hard error instead of falling through to --no-extract.

Default behaviour: if neither the URL suffix nor the magic bytes identify a registered decoder, peel warns and saves the remote object under its URL basename. --strict-format flips that to a fatal error. Useful in CI when an upstream object changing shape unexpectedly should fail the build instead of producing a different artifact.

Incompatible with --no-extract. No detection runs when nothing is being extracted.

Format selection

peel detects the archive shape from the URL suffix first, then falls back to a magic-byte read of the first ~8 bytes of the source. A mismatch between the suffix and the magic fails closed unless overridden.

--format <NAME>

Force a specific decoder, bypassing both URL-suffix and magic-byte detection. Required when the URL has no usable suffix (for example, an opaque query-string download). Valid names: tar, zstd, xz, lz4, gzip, zip, 7z, rar.

Mutually exclusive with --force-format-from-magic.

--force-format-from-magic

When the URL suffix and the source's magic bytes disagree, trust the magic instead of returning FormatMismatch.

Mutually exclusive with --format.

Network

--workers <N>

Number of parallel ranged-GET workers. Default 4. The scheduler will not dispatch more concurrent requests than this against the primary or any mirror.

Raise on a high-latency, high-bandwidth link (origin in another region) where individual GETs leave the pipe under-utilised. Lower on a single-machine, single-NIC link if the workers saturate the kernel's network stack and per-worker throughput collapses.

--mirror <URL> (repeatable)

Additional source URL serving the same file. The positional URL is the primary; every --mirror is an alternate.

At startup, peel runs a parallel HEAD against every URL and drops any mirror whose Content-Length (or ETag and Last-Modified, when --sha256 is unset) disagrees with the primary. For each ranged GET, peel picks among the surviving mirrors, biased toward the fastest live one. A failure excludes a mirror for 30 s before it is retried.

See Multi-mirror downloads.

--max-bandwidth <RATE>

Aggregate bandwidth cap across all workers and mirrors via a shared token bucket. Accepts:

  • Decimal suffixes (1000-based, network convention): K, M, G, T.
  • Binary suffixes (1024-based): Ki, Mi, Gi, Ti.
  • A trailing B and /s are accepted and ignored.

Examples: 10MB/s, 1.5GB/s, 512KiB/s, 1000000.

The cap is aggregate, not per-mirror.
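
The size syntax above can be sketched as a small parser. This is an illustrative reading of the rules, not peel's parser: the function name parse_rate is hypothetical, the sketch is case-sensitive, and it returns bytes per second.

```rust
/// Parse a rate such as "10MB/s", "1.5GB/s", "512KiB/s" or "1000000"
/// into bytes per second. Hypothetical sketch of the documented syntax.
fn parse_rate(input: &str) -> Option<u64> {
    let s = input.trim();
    // A trailing "/s" and a trailing "B" are accepted and ignored.
    let s = s.strip_suffix("/s").unwrap_or(s);
    let s = s.strip_suffix('B').unwrap_or(s);

    // Split the numeric part from the unit suffix.
    let split = s
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(s.len());
    let (number, unit) = s.split_at(split);
    let value: f64 = number.parse().ok()?;

    let multiplier: f64 = match unit {
        "" => 1.0,
        // Decimal suffixes, 1000-based.
        "K" => 1e3,
        "M" => 1e6,
        "G" => 1e9,
        "T" => 1e12,
        // Binary suffixes, 1024-based.
        "Ki" => 1024.0,
        "Mi" => 1024.0 * 1024.0,
        "Gi" => 1024.0 * 1024.0 * 1024.0,
        "Ti" => 1024.0 * 1024.0 * 1024.0 * 1024.0,
        _ => return None,
    };
    Some((value * multiplier).round() as u64)
}
```

Under this reading, 10MB/s is 10,000,000 bytes per second and 512KiB/s is 524,288 bytes per second.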

--max-disk-buffer <SIZE>

Cap on the on-disk lookahead: bytes downloaded but not yet consumed by the decoder. When the gap reaches this value, the scheduler stops dispatching new chunks until the decoder catches up, bounding the size of the .peel.part file when the network is faster than the disk.

Accepts the same size syntax as --max-bandwidth. Pass none, off, or disabled to remove the cap. Default 1GiB.

--http-version <auto|h1|h2>

HTTP version to use for downloads.

  • auto (default): ALPN-negotiate between H1 and H2 over TLS, H1 over plaintext.
  • h1: force HTTP/1.1.
  • h2: force HTTP/2. Over TLS, the origin must negotiate h2 or the handshake fails. Over plaintext this forces HTTP/2 prior-knowledge ("h2c"), which only works against servers that explicitly speak it.

--no-auto-discover

Skip multi-volume auto-discovery.

When the positional URL matches a multi-volume pattern (<base>.part<N>.rar, <base>.7z.<NNN>, <base>.z<NN> and <base>.zip), peel HEAD-probes the origin to discover the full ordered volume set before any download starts. This flag forces the seed to be treated as a single-source URL even when its basename matches a multi-volume pattern.

Applicable when:

  • The seed's filename matches one of the conventions but is not actually a multi-volume archive.
  • Discovery would fan out to many failed HEAD probes against a high-latency origin and the seed is known to be a single source.

No effect when multiple positional URLs are supplied. That path already opts out of auto-discovery.
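
The naming conventions that trigger discovery can be sketched as a basename classifier. Everything here is an assumption for illustration: the enum and function names are hypothetical, matching is case-sensitive, and a bare .zip name is left unclassified because the name alone cannot tell whether sibling .zNN volumes exist.

```rust
/// Multi-volume naming families described above (hypothetical enum).
#[derive(Debug, PartialEq)]
enum VolumeFamily {
    RarParts,      // <base>.part<N>.rar
    SevenZipSplit, // <base>.7z.<NNN>
    SpannedZip,    // <base>.z<NN>
}

/// Classify a URL basename by its multi-volume naming convention.
/// Illustrative only; peel's own matcher may differ.
fn multi_volume_family(basename: &str) -> Option<VolumeFamily> {
    let all_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());

    if let Some(stem) = basename.strip_suffix(".rar") {
        if let Some((base, n)) = stem.rsplit_once(".part") {
            if !base.is_empty() && all_digits(n) {
                return Some(VolumeFamily::RarParts);
            }
        }
    }
    if let Some((base, n)) = basename.rsplit_once(".7z.") {
        if !base.is_empty() && all_digits(n) {
            return Some(VolumeFamily::SevenZipSplit);
        }
    }
    if let Some((base, n)) = basename.rsplit_once(".z") {
        if !base.is_empty() && all_digits(n) {
            return Some(VolumeFamily::SpannedZip);
        }
    }
    None
}
```

A seed that matches one of these families by name but is really a single file is the case --no-auto-discover exists for.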

Integrity

--sha256 <HEX> (repeatable)

SHA-256 digest the assembled compressed source must match. Repeatable.

  • Single-URL runs: pass once. peel streams a hand-rolled, resumable SHA-256 over the source bytes as they arrive and aborts at clean completion if the digest disagrees. The hash state is checkpointed across resumes, so a resumed run produces a digest byte-identical to sha256sum on the original file.
  • Multi-URL runs: pass zero times (no verification) or exactly once per URL, paired by order. Hashes are per-part digests of each part's bytes; verified at part-boundaries as the decoder advances.
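
The pairing rule (zero digests, or exactly one per URL in order) can be sketched as a validation step. The function name pair_digests and the error strings are hypothetical; only the counting rule comes from the text above.

```rust
/// Pair --sha256 values with URLs by position.
/// Returns one entry per URL: None means "no verification for this URL".
/// Hypothetical sketch of the documented rule.
fn pair_digests(url_count: usize, digests: &[&str]) -> Result<Vec<Option<String>>, String> {
    if digests.is_empty() {
        return Ok(vec![None; url_count]);
    }
    if digests.len() != url_count {
        return Err(format!(
            "expected 0 or {} --sha256 values, got {}",
            url_count,
            digests.len()
        ));
    }
    digests
        .iter()
        .map(|d| {
            // A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
            let ok = d.len() == 64 && d.bytes().all(|b| b.is_ascii_hexdigit());
            if ok {
                Ok(Some(d.to_ascii_lowercase()))
            } else {
                Err(format!("not a SHA-256 hex digest: {}", d))
            }
        })
        .collect()
}
```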

See Integrity verification. Hashing runs on the streaming pipeline. ZIP archives extract per entry, and integrity checking does not extend to that path in the current release.

Encryption

--password-from <SOURCE>

Password source for encrypted archives. Accepts:

  • prompt: read from /dev/tty with echo disabled. Up to 3 attempts on a wrong password before exit code 4.
  • env:NAME: read from the named environment variable.
  • file:PATH: read the first line of the file. Modes other than 0600 emit a one-shot warning.
  • fd:N: read from file descriptor N (one-shot, until EOF or newline). Compatible with peel … --password-from fd:3 3< <(pass …).
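
The four source forms map naturally onto an enum. The sketch below only parses the argument; the type and function names are hypothetical, and reading the password itself (terminal handling, file mode checks, retries) is out of scope.

```rust
/// Where the password comes from (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum PasswordSource {
    Prompt,
    Env(String),
    File(String),
    Fd(u32),
}

/// Parse the value given to --password-from.
fn parse_password_source(spec: &str) -> Result<PasswordSource, String> {
    if spec == "prompt" {
        return Ok(PasswordSource::Prompt);
    }
    match spec.split_once(':') {
        Some(("env", name)) if !name.is_empty() => Ok(PasswordSource::Env(name.to_string())),
        Some(("file", path)) if !path.is_empty() => Ok(PasswordSource::File(path.to_string())),
        Some(("fd", n)) => n
            .parse::<u32>()
            .map(PasswordSource::Fd)
            .map_err(|e| format!("bad file descriptor {:?}: {}", n, e)),
        // Anything else is rejected, so a literal password on the command
        // line is never mistaken for a valid source.
        _ => Err(format!("unknown password source: {:?}", spec)),
    }
}
```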

peel does not accept a --password=<value> flag. argv is visible to every process on the host.

Tuning knobs

These have measured defaults that work well across the bench grid. See Performance and tuning before changing them in production.

--chunk-size <BYTES>

Bitmap chunk size: the unit of completion tracked in checkpoints. Default 4 MiB.

With adaptive chunk-sizing enabled (the default), the scheduler may coalesce several consecutive bitmap chunks into a single ranged GET; this flag continues to set the bitmap unit. Pair with --no-adaptive-chunk-size to force a fixed dispatch size.

--no-adaptive-chunk-size

Disable the adaptive chunk-size policy. The scheduler dispatches exactly one bitmap chunk per worker, with no growth/shrink decisions over the lifetime of the run. Useful for benchmarking and reproducible test runs.

--punch-threshold <BYTES>

Minimum gap between in-loop hole-punch syscalls. Default 4 MiB.

Smaller values yield a tighter physical-disk footprint; larger values yield fewer syscalls per second. Tune downward to enforce a hard ceiling on physical disk; upward if the filesystem's punch-hole implementation is slow.

--checkpoint-min-bytes <BYTES>

Minimum source-byte progress between checkpoint writes. Default 8 MiB.

--checkpoint-min-secs <SECS>

Minimum wall-clock interval between checkpoint writes (fractional). Default 2 s.

--checkpoint-target-secs <SECS>

Target wall-clock interval between checkpoints. Used to scale the byte floor up at high download rates so the cadence stays below this target. 0 disables rate-aware scaling. Default 0.2 s.

--io-backend <auto|blocking|uring|mmap>

File-IO backend selection.

  • auto (default): on Linux, mmap for the sparse part file plus io_uring for sockets, with graceful fallback. On non-Linux, the blocking backend for both.
  • blocking: force the pre-io_uring pwrite / pread path everywhere. Used for A/B comparison.
  • uring: require io_uring for sockets; error out if unavailable.
  • mmap: force the memory-mapped sparse-file path explicitly, with the blocking socket backend.

See Performance and tuning for what each path does and when to pick it.

Help and version

-h, --help

Print help. -h prints a one-line summary per flag; --help prints the full description.

-V, --version

Print the version.

Flag summary

Flag                             Purpose                                     Default
-o, --output <PATH>              Output path                                 URL basename, suffixes stripped
--workdir <DIR>                  Sidecar (.peel.part / .peel.ckpt) location  Sibling of output
-k, --keep-archive[=<PATH>]      Extract AND keep the source                 off
--no-extract                     Download without extracting                 off
-d, --destructive                Hole-punch + delete source (local mode)     off
--strict-format                  Unrecognised format → error                 off
--format <NAME>                  Force a decoder                             none
--force-format-from-magic        Trust magic over URL suffix                 off
--workers <N>                    Parallel GETs                               4
--mirror <URL> (repeat)          Additional source URLs                      none
--max-bandwidth <RATE>           Aggregate token-bucket cap                  none
--max-disk-buffer <SIZE>         Lookahead window cap                        1 GiB
--http-version <auto|h1|h2>      HTTP version                                auto
--no-auto-discover               Skip multi-volume HEAD probes               off
--sha256 <HEX> (repeat)          Verify hash                                 none
--password-from <SOURCE>         Password source                             none
--chunk-size <BYTES>             Bitmap unit                                 4 MiB
--no-adaptive-chunk-size         Fixed dispatch size                         off
--punch-threshold <BYTES>        Min gap between punches                     4 MiB
--checkpoint-min-bytes <BYTES>   Min progress between checkpoints            8 MiB
--checkpoint-min-secs <SECS>     Min interval between checkpoints            2 s
--checkpoint-target-secs <SECS>  Target interval (rate-aware)                0.2 s
--io-backend <NAME>              File-IO backend                             auto
-h, --help                       Print help                                  none
-V, --version                    Print version                               none

Supported formats

Every format peel decodes is hand-rolled or wraps a vetted upstream crate. The binary does not shell out to tar, unzip, 7z, or unrar. See How it works for the architecture.

Detection

peel resolves the archive shape with a two-step fallback:

  1. URL-suffix. The last component of the URL is matched against a list of known suffixes (.tar, .tar.zst, .zst, .tar.xz, .xz, .tar.lz4, .lz4, .tar.gz, .gz, .zip, .7z, .rar).
  2. Magic-byte fallback. If the suffix doesn't match, peel issues a tiny initial GET for the first ~16 bytes of the source and matches the magic.

A mismatch between suffix and magic (for example, a URL ending in .tar.zst but bytes starting with the gzip magic 0x1f8b) fails closed. Override with one of:

  • --force-format-from-magic: trust the magic, ignore the suffix.
  • --format <NAME>: bypass detection entirely.

If neither suffix nor magic matches a registered decoder, the default behaviour is to warn once and fall through to --no-extract. The remote object is saved under its URL basename. --strict-format converts that warning to a fatal error.
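
The magic-byte step can be sketched as a prefix match over the well-known signatures of each container. The function name detect_magic is hypothetical and the returned names follow the --format values; plain .tar is absent because its ustar marker sits at offset 257, beyond a short initial read.

```rust
/// Match the first bytes of a source against well-known format signatures.
/// Hypothetical helper; returns a --format style name on a match.
fn detect_magic(head: &[u8]) -> Option<&'static str> {
    const MAGICS: [(&[u8], &str); 8] = [
        (b"Rar!\x1a\x07\x01\x00", "rar"), // RAR5
        (b"Rar!\x1a\x07\x00", "rar"),     // RAR 1.5 to 4.x
        (b"7z\xbc\xaf\x27\x1c", "7z"),
        (b"\xfd7zXZ\x00", "xz"),
        (b"\x28\xb5\x2f\xfd", "zstd"),    // 0xFD2FB528, little-endian
        (b"\x04\x22\x4d\x18", "lz4"),     // LZ4 frame, 0x184D2204 little-endian
        (b"PK\x03\x04", "zip"),           // local file header
        (b"\x1f\x8b", "gzip"),
    ];
    MAGICS
        .iter()
        .find(|(magic, _)| head.starts_with(magic))
        .map(|(_, name)| *name)
}
```

A resolver built on this would compare the result with the suffix-derived format and fail closed when both are present and disagree, unless one of the overrides above is set.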

Format matrix

Format                          Streaming    Resume granularity        Encryption                       Multi-volume
.tar (uncompressed)             yes          per tar member            n/a                              n/a
.zst / .tar.zst                 yes          per zstd block            n/a                              n/a
.xz / .tar.xz                   yes          per LZMA2 chunk           n/a                              n/a
.lz4 / .tar.lz4                 yes          per lz4 block             n/a                              n/a
.gz / .tar.gz                   yes          per deflate block¹        n/a                              n/a
.bz2 / .tar.bz2 / .tbz2 / .tbz  yes          per bzip2 block           n/a                              n/a
.zip                            per-entry²   per entry + intra-entry³  WinZip-AES, ZipCrypto            spanned ZIP (.zNN + .zip)
.7z                             per-folder⁴  per folder                AES-256-CBC (SHA-256 KDF)        .7z.001/.002/…
.rar (RAR5)                     per-entry⁵   per entry + intra-entry⁶  AES-256-CBC (header + per-file)  .part0001.rar/…
.rar (RAR3/RAR4 legacy)         per-entry⁷   per entry + intra-entry⁷  queued                           RAR3 multi-volume queued

Footnotes below.

Streaming codecs (.tar.*, raw codecs)

.tar (uncompressed)

Plain POSIX tar. peel recognises ustar (0x75 0x73 0x74 0x61 0x72 at offset 257) and emits each entry to its final path as the member header arrives. Hard links, symlinks, and long-name extensions are all supported.

.zst / .tar.zst

Streaming Zstandard. The decoder is hand-rolled in src/decode/zstd/. Resume is per-block: the checkpoint snapshots the decoder state at every zstd block boundary, so a kill -9 mid-archive picks up at the next block.

The zstd crate in [dependencies] exists for decoding zstd-coded ZIP entries only. The streaming .tar.zst / .zst path is hand-rolled.

.xz / .tar.xz

Streaming XZ (LZMA2). The hand-rolled decoder in src/decode/xz_liblzma/ is per-cycle-equivalent to liblzma (see the bench grid in the project README). Resume is per LZMA2 chunk.

.lz4 / .tar.lz4

Streaming LZ4 Frame Format. Frame parsing is hand-rolled; the inner block-layer decompression uses the lz4_flex crate's block::decompress_into API. Resume is per lz4 block.

.gz / .tar.gz

Streaming gzip with hand-rolled RFC 1951 DEFLATE. The 32 KiB sliding window and the running CRC32 / ISIZE are persisted in the checkpoint, so a kill -9 mid-member resumes byte-identically without re-decoding the member from its start.

Multi-member gzip (the pigz / gzip a b > c.gz shape) is handled per RFC 1952 §2.2: concatenated members are decoded in sequence and emitted as one logical stream.

¹ flate2 appears under [dev-dependencies] only (it is used in the differential test harness to cross-check the hand-rolled decoder); the runtime binary does not link flate2.

.bz2 / .tar.bz2 / .tbz2 / .tbz

Streaming bzip2 with hand-rolled MSB-first Huffman / MTF / RLE2 / BWT / RLE1 layers. Each block (≤ 900 KB uncompressed at -9, with a 48-bit pi BCD sync header 0x314159265359 per the bzip2 wire format) is an independent restart point; the per-block resume blob is ~25 bytes (bit cursor + running stream CRC + cross-block RLE1 state + stream level). The decoder rejects the legacy bzip2 0.9.0 "randomised block" flag with a specific diagnostic; modern encoders have not emitted that flag since 1.0.0 in 1999.

Multi-stream .bz2 files (the cat a.bz2 b.bz2 > c.bz2 shape) are handled by aligning to the next byte boundary after each stream's combined CRC and re-entering the per-block loop with a fresh RLE1 state.

peel does not link libbz2; the decoder is pure Rust.

Random-access archives

.zip

ZIP uses a separate per-entry pipeline because of its central-directory-at-the-end layout. On startup, peel issues a small ranged GET for the End-of-Central-Directory record, walks the central directory, then dispatches per-entry GETs in parallel. Entries are written to their final paths as their bytes arrive.
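As an illustration of the first step, the sketch below locates the End-of-Central-Directory record in a buffer holding the tail of the file, following the record layout in the ZIP specification. The Eocd struct and find_eocd function are hypothetical names, Zip64 records are ignored, and nothing here is taken from peel's source.

```rust
#[derive(Debug, PartialEq)]
pub struct Eocd {
    pub total_entries: u16,
    pub cd_size: u32,
    pub cd_offset: u32,
}

const EOCD_SIG: [u8; 4] = [0x50, 0x4B, 0x05, 0x06]; // "PK\x05\x06"
const EOCD_MIN: usize = 22; // fixed part of the record, without the comment

/// Scan `tail` (the last bytes of the archive) backwards for the EOCD record.
pub fn find_eocd(tail: &[u8]) -> Option<Eocd> {
    if tail.len() < EOCD_MIN {
        return None;
    }
    for start in (0..=tail.len() - EOCD_MIN).rev() {
        let rec = &tail[start..];
        if !rec.starts_with(&EOCD_SIG) {
            continue;
        }
        // The trailing comment must run exactly to the end of the file;
        // this rejects signature bytes that merely occur inside a comment.
        let comment_len = u16::from_le_bytes([rec[20], rec[21]]) as usize;
        if start + EOCD_MIN + comment_len != tail.len() {
            continue;
        }
        return Some(Eocd {
            total_entries: u16::from_le_bytes([rec[10], rec[11]]),
            cd_size: u32::from_le_bytes([rec[12], rec[13], rec[14], rec[15]]),
            cd_offset: u32::from_le_bytes([rec[16], rec[17], rec[18], rec[19]]),
        });
    }
    None
}
```

The returned cd_offset and cd_size give the byte range a follow-up ranged GET would need in order to read the central directory.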

Supported coders in entries:

  • STORED (uncompressed)
  • DEFLATE (RFC 1951; same hand-rolled decoder as .gz)
  • zstd entries (via the zstd crate's streaming reader API)

Encryption: WinZip-AES (AE-1 and AE-2 forms, AES-128/192/256-CTR with PBKDF2-HMAC-SHA1 key derivation and an HMAC-SHA1-80 trailer); PKWARE traditional "ZipCrypto" (CRC32-keyed PRGA, insecure but supported for compatibility). PKWARE strong-encryption (central-directory encryption, general-purpose flag bit 6) is not supported and surfaces as a clear error.

Zip64 multi-disk / spanned archives (other than the simple .zNN + .zip form) and AES with non-standard parameters are not yet supported. Such archives fail with a specific "unsupported feature" error rather than producing wrong output.

² Per-entry streaming: each entry's bytes are written to its final path as soon as they arrive, while the rest of the archive is still in flight.

³ STORED entries resume byte-granular. DEFLATE entries resume per deflate block via the 32 KiB-window snapshot. zstd entries resume per zstd block. All three are encoded in the checkpoint format (version 7) under each in-progress entry.

.7z

7z uses a separate per-folder pipeline because of its SignatureHeader → trailer-pointer layout. peel reads the SignatureHeader at offset 0, follows the pointer to fetch the trailer, parses the streams metadata, and dispatches per-folder GETs.
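The pointer-following step can be sketched from the published 7z signature-header layout: a 6-byte magic, two version bytes, a CRC, then a 64-bit offset and size for the trailing header. trailer_range is a hypothetical name, CRC verification is omitted, and this is an illustration rather than peel's parser.

```rust
const SEVENZ_MAGIC: [u8; 6] = [b'7', b'z', 0xBC, 0xAF, 0x27, 0x1C];
const SIGNATURE_HEADER_LEN: u64 = 32;

/// Given the first 32 bytes of a .7z file, return the absolute
/// (start, length) of the trailing header, or None if the magic is wrong.
pub fn trailer_range(sig: &[u8; 32]) -> Option<(u64, u64)> {
    if !sig.starts_with(&SEVENZ_MAGIC) {
        return None;
    }
    // Bytes 12..20: NextHeaderOffset, relative to the end of this header.
    let mut offset = [0u8; 8];
    offset.copy_from_slice(&sig[12..20]);
    // Bytes 20..28: NextHeaderSize.
    let mut size = [0u8; 8];
    size.copy_from_slice(&sig[20..28]);
    let start = SIGNATURE_HEADER_LEN.checked_add(u64::from_le_bytes(offset))?;
    Some((start, u64::from_le_bytes(size)))
}
```

The resulting pair maps directly onto an HTTP Range request for bytes start through start + length - 1.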

Supported coders:

  • COPY (no compression)
  • DEFLATE
  • LZMA
  • LZMA2

Header forms: plain Header and unencrypted EncodedHeader (the trailer compresses metadata with an unencrypted coder chain). Encryption ships for AES-256-CBC under the 7z KDF (crate::crypto::sevenz_kdf).

The current release is single-volume only; multi-volume .7z.001 support is planned. BCJ filters (x86, ARM, and other preprocessor filters) and per-coder intra-folder resume are queued.

⁴ Resume granularity is one folder at a time. A kill -9 mid-folder restarts that folder from the start of its packed range. Per-coder intra-folder resume, BCJ filters, AES with non-default parameters, and multi-volume archives are queued.

.rar (RAR5)

RAR5 has no tail-anchored index (unlike zip and 7z): file headers appear in stream order, so peel streams entries to their final paths as each entry's data area arrives.

Supported coders (compression methods):

  • STORED (method 0)
  • Standard RAR5 algorithm (methods 1–5) via the hand-rolled decode::rar_native LZSS pipeline plus the RAR-VM standard filters (E8, E8E9, Delta, RGB, Audio).

Encryption: AES-256-CBC for both archive-header encryption (HEAD_CRYPT, header type 4) and per-file data encryption (extra record type 1), with PBKDF2-HMAC-SHA256 key derivation. The optional pswcheck verifier is supported. See Encrypted archives.

Multi-volume archives in the <base>.part<N>.rar form are supported via the multi-volume path: auto-discovery, explicit positional list, or manifest file.

The previous "non-encrypted, single-volume only" restriction no longer applies: both encryption and multi-volume support ship. SFX archives and the rarely-used RAR-VM custom-filter slot (O.RAR.CUSTOMFILTER) remain queued.

⁵ Per-entry streaming with the §F1 checkpoint blob capturing the LZ dictionary state and filter program cache so resume is byte-identical.

⁶ Mid-entry resume: a kill -9 mid-RAR5 file restarts the in-flight entry from the snapshot, not from its start. Multi-block lookahead state is captured in the blob.

.rar (RAR3 / RAR4 legacy)

Legacy RAR3 / RAR4 archives use the hand-rolled decode::rar_legacy LZ pipeline plus the RarVM standard-filter dispatcher (E8, E8E9, Delta, RGB, Audio).

Supported coders:

  • STORED
  • LZ Normal (-m3 from the rar encoder)

The mid-entry checkpoint blob (PLAN_rar3.md §F1) captures the LZ dictionary state and filter program cache.

PPMd-II and other less-common filters and coders are queued. Encryption for legacy RAR3 archives is queued.

⁷ Same per-entry-plus-intra-entry resume model as RAR5; the LZ pipeline is different (hand-rolled decode::rar_legacy) but the checkpoint semantics are identical.

RAR provenance

peel's RAR3 and RAR5 decoders are clean-room implementations. RARLAB's unrar source has not been consulted at any point. libarchive's RAR readers (BSD-licensed, OSI-approved) are referenced as an external spec where the RAR wire format requires one. They are read, not vendored or linked.

Test fixtures are produced with a license-purchased copy of RARLAB's rar encoder. The unrar binary is not linked, vendored, or used as an implementation reference; it appears in the RAR benchmark grid as a third-party point of comparison only.

peel is licensed MIT OR Apache-2.0. The unRAR license is non-OSI and GPL-incompatible, so a clean-room derivation is the only way to ship a RAR decoder without inheriting that constraint.

Disabling the RAR module

To produce a smaller binary without .rar support, build without the rar feature:

cargo install peel-rs --locked --no-default-features

The crate still registers .rar and the RAR5 magic against a diagnostic-only factory, so the user sees a precise "compiled without the 'rar' feature" error rather than "unknown format".

What's not (yet) supported

The following are not in the current release:

  • .lzma (raw LZMA1, no XZ container): not registered.
  • PKWARE strong encryption: clear error.
  • ZIP64 multi-disk: clear error (regular Zip64 is supported).
  • GPG-encrypted tarballs: out of scope. This is a separate pipeline that peel does not wrap.
  • 7z BCJ filters, AES with non-default coder placement, multi-volume .7z.001: clear error.
  • RAR self-extracting (SFX) archives: clear error.

Exit codes

peel uses a small, stable set of exit codes so wrapper scripts can distinguish failure modes without parsing stderr.

| Code | Meaning |
| --- | --- |
| 0 | Extraction completed successfully |
| 1 | Generic extraction or I/O failure (everything not covered below) |
| 2 | CLI argument parse error (handled by clap; not a peel-specific code) |
| 4 | PasswordIncorrect or PasswordMissing anywhere in the error chain |
| 128 + signum | Graceful shutdown after a signal (130 = SIGINT, 143 = SIGTERM); sidecars left on disk for resume |

Code 0

The extracted output is complete and, if --sha256 was set, matches the expected hash. The .peel.part and .peel.ckpt sidecars have been unlinked.

In -k / --keep-archive mode, the source archive is at its final location.

In --no-extract mode, the downloaded source bytes are at -o, or at the URL basename if -o was omitted.

Code 1

Something else went wrong: disk full, network exhausted retries, the source disappeared mid-run, the checkpoint format is incompatible, format detection failed under --strict-format, or the SHA-256 digest did not match.

The error message on stderr identifies the cause. Examples:

| Error message | Cause |
| --- | --- |
| No space left on device | Output filesystem full |
| digest mismatch: expected …, got … | --sha256 value disagrees with the streamed bytes |
| source changed during run | Per-chunk CRC32C fingerprint disagrees on resume |
| format detection failed, --strict-format set | --strict-format is on and neither URL suffix nor magic identifies a registered decoder |
| mirror https://… : 502 Bad Gateway (after retries) | All mirrors exhausted |
| checkpoint format version 6 not compatible with this peel build (current: 7) | Older sidecar; delete it or use a compatible peel version |

The sidecars (.peel.part, .peel.ckpt) are left in place on a code-1 exit so that a follow-up run can either resume (if the cause was transient) or be cleaned up explicitly.

Code 4

A password issue: the wrong password was supplied, no password was supplied for an encrypted archive, or --password-from prompt exhausted its 3 retries.

This is a separate code so that scripts can re-prompt without conflating it with a genuine extraction failure. A retry loop:

while true; do
  peel "$URL" --password-from prompt -o ./out/ && break
  rc=$?
  if [ "$rc" != "4" ]; then
    echo "peel failed with code $rc (not a password issue)" >&2
    exit "$rc"
  fi
  echo "wrong password, retry" >&2
done

See Encrypted archives for the full encryption discussion.

Codes 130 and 143 (signal exits)

peel traps SIGINT (Ctrl-C) and SIGTERM and exits with 128 + signum (130 for SIGINT, 143 for SIGTERM). On graceful shutdown:

  • The current checkpoint is flushed and fsync'd.
  • The .peel.part and .peel.ckpt sidecars are left on disk.
  • Re-running the same command resumes from the last checkpoint.

SIGKILL (kill -9) does not get a graceful shutdown. The process dies immediately. An ungraceful kill is still safe to resume from: the last completed checkpoint is on disk, the per-chunk fingerprints catch the in-flight chunk's partial bytes, and the next run reconciles.

Code 2 (clap parse error)

CLI argument parsing errors (unrecognised flag, conflicting flags, wrong value type) come from clap and exit with code 2. The error message names the offending argument:

error: the argument '--no-extract' cannot be used with '--format <NAME>'

Usage: peel --no-extract [URLS]...

For more information, try '--help'.

peel does not define this code itself. It follows the standard clap convention and matches cargo, rustup, and most modern Rust CLIs.

Scripting against the codes

A common pattern distinguishes "user error" (retry with different inputs), "transient error" (retry with the same inputs), and "give up":

#!/usr/bin/env bash
set -u

URL=$1
OUT=$2

peel "$URL" -o "$OUT" --password-from env:PEEL_PW
rc=$?

# Sidecars are named after the output path without its trailing slash
# (-o ./out/ gives ./out.peel.part), so strip the slash before printing.
SIDE=${OUT%/}

case "$rc" in
  0)
    echo "ok"; exit 0 ;;
  4)
    echo "wrong password: set PEEL_PW correctly and retry" >&2; exit 4 ;;
  130|143)
    echo "interrupted; re-run to resume" >&2; exit "$rc" ;;
  *)
    echo "peel failed; sidecars at ${SIDE}.peel.part / ${SIDE}.peel.ckpt" >&2; exit "$rc" ;;
esac

On kill -9, peel does not get to set an exit code. A parent shell reports 137 (128 + 9), which peel itself never produces. That state is still resumable: the next run picks up the sidecars.
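For wrappers written in Rust rather than shell, the same triage can be expressed as a small pure function over the exit code. This is a sketch: the Outcome enum and classify function are hypothetical names, and only the code values come from the table above.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success,
    Failure,          // code 1: inspect stderr; sidecars remain for resume
    Usage,            // code 2: argument parse error from clap
    Password,         // code 4: wrong or missing password
    Interrupted(i32), // 130 / 143: graceful shutdown, carries the signal number
    Other(i32),       // anything peel itself does not emit, e.g. 137
}

/// Map a peel exit code onto the categories a wrapper usually cares about.
pub fn classify(code: i32) -> Outcome {
    match code {
        0 => Outcome::Success,
        1 => Outcome::Failure,
        2 => Outcome::Usage,
        4 => Outcome::Password,
        130 | 143 => Outcome::Interrupted(code - 128),
        other => Outcome::Other(other),
    }
}
```

When the child is spawned through std::process::Command on Unix, ExitStatus::code() returns None for a process that died from a signal such as SIGKILL, so the 137 value is something a shell reports rather than something a Rust parent reads from code().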

Download modes

peel runs in one of three modes for HTTP sources, plus a destructive opt-in for local-file sources. The mode is selected by flag at the CLI; format detection (URL suffix → magic bytes) decides the output shape for the default mode.

Mode summary (HTTP source)

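| Flag | Download | Extract | Hole-punch source | Source on disk at exit |
| --- | --- | --- | --- | --- |
| (default) | yes | yes | yes | deleted |
| -k (bare) | yes | yes | no | preserved as sibling of -o |
| -k=<PATH> | yes | yes | no | preserved at <PATH> |
| --no-extract | yes | no | n/a | preserved at -o |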

If format detection fails, peel warns and runs as --no-extract by default: the remote object is saved to disk under its URL basename. Pass --strict-format to make that case a hard error instead. This is useful in CI, where an upstream object changing shape should fail the build rather than produce a different artifact.

Default mode: extract and destroy

peel https://example.com/dataset.tar.zst -o ./out/

Behaviour:

  • Parallel ranged GETs feed <output>.peel.part (sparse).
  • The decoder consumes the prefix while workers fetch the suffix.
  • fallocate(PUNCH_HOLE) / madvise(MADV_REMOVE) releases blocks of the part-file as the decoder advances past them.
  • On clean completion, the part-file (now mostly holes) is unlinked and the checkpoint sidecar (<output>.peel.ckpt) is removed.
  • On kill -9 or crash, the part-file and the checkpoint sidecar are left on disk. Re-running the same command resumes byte-identically.

Peak compressed-side disk: roughly --max-disk-buffer (default 1 GiB). Peak total disk: extracted_size + lookahead_window.

-k / --keep-archive: extract and keep the archive

peel https://example.com/dataset.tar.zst -o ./out/ -k
peel https://example.com/dataset.tar.zst -o ./out/ -k=./preserved/dataset.tar.zst

Behaviour:

  • Same parallel download and streaming extract as default mode.
  • The puncher is forced to no-op. The part-file grows to the full Content-Length of the source.
  • On clean completion, the part-file is renamed to its final archive path:
    • Bare -k → sibling of -o, named after the URL basename.
    • -k=<PATH> → explicit path. The = is required, since bare -k followed by a positional URL is otherwise ambiguous.

Peak disk: extracted_size + compressed_size. Use this mode when the archive must remain on disk afterward (for example, to upload it elsewhere, to extract it again with a different tool, or to keep as a backup).

-k is redundant with --no-extract (which already preserves the source). The CLI logs an info-level note rather than erroring.

--no-extract: download only, parallel-GET aria2c-style

peel https://example.com/big.deb --no-extract
peel https://example.com/big.deb --download-only        # alias

Behaviour:

  • Parallel ranged GETs feed <output>.peel.part.
  • No decoder runs. No holes are punched.
  • On clean completion, <output>.peel.part is renamed to its final path (the URL basename when -o is unset).
  • Resume on kill -9 or network drop works the same way as extract mode: the chunk bitmap, ETag handling, and SHA-256 hashing all apply.

Suitable for:

  • Arbitrary remote downloads that are not archives: .deb packages, raw binaries, checksum files, ML weight files.
  • Keeping the archive on disk to extract later with a different tool.
  • Using peel as a parallel ranged-GET replacement for aria2c, axel, or wget -c, with the same scheduler, mirror fan-out, SHA-256 verification, and checkpointed resume.

Mutually exclusive with --format, --force-format-from-magic, and --punch-threshold. Those are extractor knobs and nothing extracts in this mode.

-d / --destructive: opt in to destructive local-file extraction

peel /tmp/dataset.tar.zst                # non-destructive (default for local)
peel /tmp/dataset.tar.zst -d -o ./out/   # destructive: hole-punch + delete on success

Local-file extraction is non-destructive by default. peel abc.tar.xz extracts into ./abc/ and leaves abc.tar.xz untouched. -d opts in to the disk-pressure contract of the HTTP path: the source is progressively hole-punched as the decoder advances and deleted on clean completion, freeing the archive's blocks before the extracted tree is fully written.

For an HTTP source, -d is a harmless no-op (HTTP runs are destructive by default), and peel logs an info-level note. Combining -d with -k/--keep-archive for an HTTP source is an error: the two intents contradict.

-d does not apply to the random-access formats (.zip, .7z, .rar) in local mode. Their pipelines seek backwards into the archive (zip central directory at the tail, 7z trailer pointer, rar per-entry headers), so a monotonically-advancing punch cursor cannot be maintained. peel warns and proceeds non-destructively when -d is passed against one of those sources.

Strict mode

peel --strict-format <URL> -o <PATH>

When the URL suffix and the magic-byte read both fail to identify a registered decoder, the default behaviour is a warning and a fall-through to --no-extract. --strict-format turns that case into a hard error.

Use this in CI when an upstream object changing shape unexpectedly (.tar.zst → .tar.gz, or a maintainer's CDN serving a different file under the same URL) should fail the build rather than produce a different artifact. --strict-format is incompatible with --no-extract (no detection runs when nothing is being extracted) and compatible with -k.

Putting it together

| Goal | Command |
| --- | --- |
| Extract and discard the archive | peel <URL> -o ./out/ |
| Extract and keep the archive | peel <URL> -o ./out/ -k |
| Just download (no extract) | peel <URL> --no-extract |
| Extract a local file, preserve source | peel ./archive.tar.zst -o ./out/ |
| Extract a local file, free disk as you go | peel ./archive.tar.zst -o ./out/ -d |
| Verify hash, cap bandwidth, fan out across mirrors | see Multi-mirror downloads |
| Fail CI on format drift | peel <URL> -o ./out/ --strict-format |

Output path resolution

-o <PATH> accepts either a directory (for archive formats that produce a tree) or a file (for stream-shaped formats). When -o is omitted, peel derives a default from the URL basename.

The two output shapes

| Shape | When | Default -o |
| --- | --- | --- |
| Directory (tree-shaped) | tar, zip, 7z, rar, and any .tar.<x> wrapper | URL basename with archive / compression suffixes stripped |
| File (stream-shaped) | Raw .zst, .xz, .lz4, .gz (no inner tar) | URL basename with the compression suffix stripped |

If the explicit -o does not match the detected format's shape, peel errors at coordinator entry. There is no silent fixup.

Examples

# Tar wrapper → directory. Trailing slash is optional but explicit.
peel https://example.com/linux-6.x.tar.xz                  # → ./linux-6.x/
peel https://example.com/linux-6.x.tar.xz -o ./linux/      # → ./linux/
peel https://example.com/linux-6.x.tar.xz -o ./linux       # → ./linux/  (no trailing slash, still a dir)

# Raw compressed → single file.
peel https://example.com/model.bin.zst                     # → ./model.bin
peel https://example.com/model.bin.zst -o ./weights.bin    # → ./weights.bin

# ZIP / 7z / RAR → directory.
peel https://example.com/data.zip                          # → ./data/
peel https://example.com/snapshot.7z -o ./snap/            # → ./snap/
peel https://example.com/backup.part0001.rar -o ./out/     # → ./out/  (multi-volume auto-discovered)

# Trailing slash forces directory semantics. Useful when the URL has
# no suffix and a tree output is required for `--format zip`.
peel "https://host/dl?id=42" --format zip -o ./out/

How the basename is computed

  1. Strip the URL's query string and fragment.
  2. Take the last path component.
  3. Strip suffixes in order until a non-archive / non-compression suffix remains:
    • .tar strips .tar.
    • .zst / .xz / .lz4 / .gz strip the compression suffix.
    • .tar.zst etc. strip both.
    • .zip / .7z / .rar strip the archive suffix.

Examples:

| URL basename | Default output |
| --- | --- |
| linux-6.x.tar.xz | linux-6.x/ |
| model.bin.zst | model.bin |
| data.tar | data/ |
| dataset.zip | dataset/ |
| snapshot.7z | snapshot/ |
| backup.part0001.rar | backup/ |
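The three steps can be sketched as a pure function over the URL string. This is an illustration under stated assumptions, not peel's resolver: default_output and strip_part_number are hypothetical names, and dropping the .partNNNN component for RAR volumes is inferred from the backup.part0001.rar example rather than from the listed rules.

```rust
/// Derive the default output name from a URL, following the steps above.
/// Tree-shaped results end in '/', stream-shaped results do not.
pub fn default_output(url: &str) -> String {
    // 1. Strip the fragment, then the query string.
    let without_fragment = url.split('#').next().unwrap_or(url);
    let without_query = without_fragment.split('?').next().unwrap_or(without_fragment);
    // 2. Take the last path component.
    let mut stem = without_query.rsplit('/').next().unwrap_or(without_query);
    // 3a. Strip one compression suffix, if present.
    for ext in [".zst", ".xz", ".lz4", ".gz"] {
        if let Some(rest) = stem.strip_suffix(ext) {
            stem = rest;
            break;
        }
    }
    // 3b. Strip one archive suffix; its presence makes the output a directory.
    let mut tree = false;
    for ext in [".tar", ".zip", ".7z", ".rar"] {
        if let Some(rest) = stem.strip_suffix(ext) {
            stem = rest;
            tree = true;
            if ext == ".rar" {
                stem = strip_part_number(stem); // assumption, see lead-in
            }
            break;
        }
    }
    if tree { format!("{}/", stem) } else { stem.to_string() }
}

/// "backup.part0001" -> "backup"; anything else is returned unchanged.
fn strip_part_number(stem: &str) -> &str {
    match stem.rsplit_once(".part") {
        Some((base, digits))
            if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) => base,
        _ => stem,
    }
}
```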

When the URL has no useful suffix

Opaque query-string downloads (?id=42, ?download_token=…) defeat the suffix-based default. Two options:

# 1. Pin the decoder explicitly.
peel "https://host/dl?id=42" --format zstd -o ./out.bin

# 2. Force trust in magic-byte detection. A small initial GET reads
#    the magic, and the resolver picks the decoder from there.
peel "https://host/dl?id=42" --force-format-from-magic -o ./out.bin

--format is more deterministic than relying on magic. Prefer it whenever the format is known ahead of time.

Conflict resolution

| Situation | Behaviour |
| --- | --- |
| -o is a file path, format is tree-shaped (e.g. tar.zst) | Error at coordinator entry: shape mismatch |
| -o ends in /, format is stream-shaped (e.g. raw .zst) | Error at coordinator entry: shape mismatch |
| -o is a directory that exists and is non-empty (tar / zip / 7z / rar) | peel writes into it; pre-existing files are overwritten when an archive entry has the same path |
| -o is a file path that exists (stream-shaped output) | Overwritten |
| -o's parent directory does not exist | Created if a single parent component is missing; otherwise error |

Where the sidecars live

The .peel.part and .peel.ckpt sidecar files live next to the output by default:

  • -o ./out/ → ./out.peel.part, ./out.peel.ckpt
  • -o ./out.bin → ./out.bin.peel.part, ./out.bin.peel.ckpt

Override with --workdir <DIR> when the output and the in-flight state should live on different disks:

# Extract onto slow HDD-backed /data, keep the in-flight state on
# fast NVMe at /var/cache/peel.
peel https://host/dataset.tar.zst -o /data/out/ --workdir /var/cache/peel/

The directory is created if missing. The basenames stay the same (<output_name>.peel.part / <output_name>.peel.ckpt); only the parent directory changes.
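The naming rule is simple enough to express as a path computation. The sketch below assumes Unix-style paths and uses a hypothetical function name, sidecar_paths; it mirrors the two examples above and the --workdir behaviour, and is not taken from peel's source.

```rust
use std::path::{Path, PathBuf};

/// Return (part_file, checkpoint_file) for a given -o value and an
/// optional --workdir override.
pub fn sidecar_paths(output: &str, workdir: Option<&str>) -> (PathBuf, PathBuf) {
    // A trailing slash marks a directory output but is not part of the name.
    let out = Path::new(output.trim_end_matches('/'));
    let name = out
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    // Sidecars sit next to the output unless a workdir is given.
    let parent = match workdir {
        Some(dir) => PathBuf::from(dir),
        None => out.parent().map(Path::to_path_buf).unwrap_or_default(),
    };
    (
        parent.join(format!("{}.peel.part", name)),
        parent.join(format!("{}.peel.ckpt", name)),
    )
}
```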

Multi-part URLs

Some publishers split a large archive into multiple files at the HTTP layer, serving separate URLs for name.tar.part0000, name.tar.part0001, and so on. The byte-concatenation of every part's body forms the logical archive. peel handles this case by accepting two or more positional URLs.

This case differs from multi-volume archives. The format's own splitting (RAR .partNNN.rar, 7z .7z.NNN, spanned ZIP .zNN) stores format-aware metadata in each volume. Multi-part URLs carry no such metadata: they are a byte-stream served across multiple URLs.

Usage

peel \
  https://snapshot.example.com/dataset.tar.part0000 \
  https://snapshot.example.com/dataset.tar.part0001 \
  https://snapshot.example.com/dataset.tar.part0002 \
  -o ./out/

Behaviour:

  • At startup, peel issues a parallel HEAD against every URL and reads its Content-Length.
  • The full assembled length is sum(Content-Length) of the parts.
  • Workers fetch every part in parallel via ranged GETs (the same approach as aria2c -Z), but the bytes stream into a single logical part file and a single decoder.
  • The decoder sees the byte-concatenation of every part's body, in order.

The compressed bytes never fully land on disk. The hole-punching and resume guarantees of a single-URL run apply.
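Treating the parts as one logical stream comes down to prefix sums over the Content-Length values. The sketch below maps a logical offset to the part that holds it; locate is a hypothetical name and the function illustrates the arithmetic only.

```rust
/// Map an offset in the assembled stream to (part index, offset within part).
/// `part_lens` holds each part's Content-Length in URL order.
pub fn locate(part_lens: &[u64], offset: u64) -> Option<(usize, u64)> {
    let mut start = 0u64;
    for (index, &len) in part_lens.iter().enumerate() {
        let end = start.checked_add(len)?;
        if offset < end {
            return Some((index, offset - start));
        }
        start = end;
    }
    None // offset is at or past the end of the assembled stream
}
```

A ranged GET for a logical range that crosses a part boundary would be split at that boundary into one request per part.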

Verifying integrity per part

--sha256 is repeatable. With two or more URLs, either form is valid:

  • Zero --sha256 flags: no verification.
  • Exactly one --sha256 per URL, paired by order. Each part's hash is verified at its part-boundary as the decoder advances.

peel \
  https://snapshot.example.com/dataset.tar.part0000 \
  https://snapshot.example.com/dataset.tar.part0001 \
  --sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
  --sha256 1bcf4d2e9aa01ff5...                                              \
  -o ./out/

A wrong number of --sha256 flags (1 for 2 URLs, 3 for 2 URLs) is rejected at parse time.
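The pairing rule (zero hashes, or exactly one per URL in order) can be sketched as a small validation function. pair_hashes and the error text are hypothetical and do not reproduce peel's parser or its messages.

```rust
/// Pair each URL with its expected hash, enforcing the "zero or one per URL" rule.
pub fn pair_hashes<'a>(
    urls: &[&'a str],
    hashes: &[&'a str],
) -> Result<Vec<(&'a str, Option<&'a str>)>, String> {
    if hashes.is_empty() {
        // No verification requested.
        return Ok(urls.iter().map(|&url| (url, None)).collect());
    }
    if hashes.len() != urls.len() {
        return Err(format!(
            "expected 0 or {} --sha256 values, got {}",
            urls.len(),
            hashes.len()
        ));
    }
    // Pair by position: first hash with first URL, and so on.
    Ok(urls
        .iter()
        .zip(hashes)
        .map(|(&url, &hash)| (url, Some(hash)))
        .collect())
}
```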

A real example: Arbitrum snapshot bundles

This mode is used in production against Arbitrum snapshot bundles. The nova snapshot, for example, is published as pruned.tar.part0000 through pruned.tar.partNNNN with a per-part SHA-256 list:

peel \
  https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0000 \
  https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0001 \
  https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0002 \
  --sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
  --sha256 1bcf4d2e9aa01ff5e8aa72a2ab39310af020bdb6f76d6f7c75c7c14ade38c6ce \
  --sha256 c40bf8a2cb9d9a90e4c80a5b7c6e9c5d3b8a2e1f9d4a6c1b7e2f8d3a5c0b9e1f0 \
  -o ./nova-out/

The convenience script scripts/arb-snapshot.sh wraps the URL list / hash list discovery against the Arbitrum manifest.

Reading URLs from a file

When the part list is large (tens or hundreds of URLs), pass it as @file.txt instead of inlining it:

# urls.txt: blank lines and "#" comments are skipped
https://snapshot.example.com/dataset.tar.part0000
https://snapshot.example.com/dataset.tar.part0001
https://snapshot.example.com/dataset.tar.part0002

peel @urls.txt -o ./out/

@file.txt is also used for multi-volume manifests.

Differences from multi-volume archives

| | Multi-part URLs | Multi-volume archives |
| --- | --- | --- |
| Detection | Caller passes ≥ 2 URLs | One URL whose basename matches a known volume pattern, auto-discovered |
| Format metadata | None; bytes concatenate raw | Each volume carries format-aware headers |
| Order matters | Yes; caller specifies | Yes; discovered from volume numbering |
| Use cases | Large .tar.* published in chunks (Arbitrum snapshots) | RAR .partNNN.rar, 7z .7z.NNN, spanned ZIP |
| Override | Pass @file.txt for many parts | --no-auto-discover forces single-source |

To distinguish the two cases, inspect the URL suffixes. URLs ending in .partNNNN (numbered, no archive extension) or .tar.partNNN (numbered after a tar extension) are multi-part URLs. URLs ending in .part0001.rar, .7z.001, or .z01 are multi-volume archives.

Multi-volume archives

Some archive formats support splitting one logical archive across multiple physical files with format-aware metadata in each volume. peel recognises three multi-volume naming conventions and resolves every sibling volume up front (one parallel HEAD per volume for HTTP seeds).

Archives split at the HTTP layer (raw .partNNNN files that concatenate into a logical archive) are handled by Multi-part URLs instead.

Supported conventions

| Format | Pattern | Example |
| --- | --- | --- |
| RAR5 | <base>.part<N>.rar | backup.part0001.rar, backup.part0002.rar, … |
| 7z | <base>.7z.<NNN> | snapshot.7z.001, snapshot.7z.002, … |
| ZIP (spanned) | <base>.z<NN> + <base>.zip | data.z01, data.z02, …, data.zip |

For spanned ZIP, the <base>.zip final volume is mandatory: it contains the End-of-Central-Directory record. The .zNN files hold the entry data.

Three ways to invoke

1. Single seed with auto-discovery (default)

Pass any volume whose basename matches a recognised pattern and peel discovers the full ordered set:

peel https://host/backup.part0001.rar -o ./out/
peel https://host/snapshot.7z.001 -o ./out/
peel ./data.z01 -o ./out/                          # local works too

At startup, peel:

  1. Recognises the pattern from the basename.
  2. Probes the origin for siblings via HEAD against backup.part0002.rar, backup.part0003.rar, and so on, until two consecutive HEADs return 404.
  3. Reports the resolved volume count in the progress UI.
  4. Routes downloads through the multi-volume storage path.

Discovery is parallel: every probe runs concurrently against the origin, so resolution costs one round-trip of wall-clock time regardless of the volume count.
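The probing rule for the RAR convention can be sketched as follows. The names parse_rar_volume and discover are hypothetical, the exists callback stands in for a HEAD request, and probes run sequentially here for clarity. How peel treats a single missing number followed by a hit is not stated above; this sketch simply keeps every hit seen before two consecutive misses.

```rust
/// Split "backup.part0001.rar" into ("backup", 1, 4): base, number, digit width.
pub fn parse_rar_volume(name: &str) -> Option<(&str, u32, usize)> {
    let stem = name.strip_suffix(".rar")?;
    let (base, digits) = stem.rsplit_once(".part")?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let number: u32 = digits.parse().ok()?;
    Some((base, number, digits.len()))
}

/// Starting from a seed volume, probe successive names until two
/// consecutive probes miss. `exists` stands in for an HTTP HEAD.
pub fn discover(seed: &str, exists: impl Fn(&str) -> bool) -> Vec<String> {
    let (base, first, width) = match parse_rar_volume(seed) {
        Some(parts) => parts,
        None => return vec![seed.to_string()], // not a recognised pattern
    };
    let mut found = vec![seed.to_string()];
    let mut misses = 0;
    let mut n = first + 1;
    while misses < 2 {
        // Preserve the seed's zero-padding width (part0002, not part2).
        let candidate = format!("{}.part{:0width$}.rar", base, n, width = width);
        if exists(&candidate) {
            found.push(candidate);
            misses = 0;
        } else {
            misses += 1;
        }
        n += 1;
    }
    found
}
```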

2. Explicit positional list

Pass every volume URL as a positional argument. Useful when auto-discovery does not fit (volumes hosted on different origins, or numbering that does not start at 0001):

peel \
  https://host/backup.part0001.rar \
  https://host/backup.part0002.rar \
  https://host/backup.part0003.rar \
  -o ./out/

The volume basenames must form a contiguous numeric sequence; out-of-order or non-contiguous entries (part0001, part0003) are rejected at parse time with a specific error.

3. Manifest file

Pass @file.txt (one URL or path per line; blank lines and # comments ignored):

# volumes.txt
https://host/backup.part0001.rar
https://host/backup.part0002.rar
https://host/backup.part0003.rar

peel @volumes.txt -o ./out/

Useful when the volume list is long or generated programmatically.

Disabling auto-discovery

--no-auto-discover forces single-source semantics on a seed whose basename happens to match a multi-volume pattern:

# Just download the one .zip file, don't probe for .z01 siblings.
peel https://host/data.zip --no-extract --no-auto-discover

When to use it:

  • The seed's filename matches one of the conventions but is not actually a multi-volume archive (for example, an unrelated .zip file that should not be HEAD-probed for .z01 siblings).
  • Discovery would fan out to many failed HEAD probes against a high-latency origin and the seed is known to be a single source.

The flag has no effect when multiple positional URLs are supplied: that path already opts out of auto-discovery.

How it interacts with the streaming pipeline

A multi-volume archive is internally a single logical archive: the scheduler, the bitmap, the checkpoint, and the decoder all see one contiguous source. Each volume contributes its bytes to the byte-concatenated logical stream.

That means:

  • Resume works across volumes. A kill -9 while volume 7 of 12 is in flight is safe: the next run picks up exactly where the decoder was.
  • Hole-punching applies to each volume's .peel.part shard as the decoder advances past it. The compressed-side disk footprint stays bounded the same way as a single-URL run.
  • Mirror fan-out (--mirror) is currently single-URL only. Multi-volume archives are fetched from their primary URLs. Mirror support across the volume set is a planned addition.
  • --sha256 is single-hash-per-URL on multi-URL runs, so a multi-volume archive expects a hash per volume.

Listing the resolved volumes

Run with RUST_LOG=info to see the discovered set before any downloads start:

RUST_LOG=info peel https://host/backup.part0001.rar -o ./out/ 2>&1 | head -20

Look for the discovered N volumes line and the per-volume sizes.

Diagnostics

| Error | Cause | Fix |
| --- | --- | --- |
| multi-volume volumes not contiguous | Explicit list skips a number | Add the missing volume or renumber |
| multi-volume probe returned mixed Content-Length | Origin serving inconsistent volumes | Investigate origin; check for partial uploads |
| spanned zip requires .zip final volume | Passed .z01..z09 without .zip | Add the .zip final volume to the list |
| cannot mix multi-volume conventions | .part0001.rar + .7z.001 in one list | One archive per invocation |

Multi-mirror downloads

peel can fetch a single file from several origins in parallel, biasing the work toward whichever mirror is fastest and excluding mirrors that fail. The positional URL is the primary. Every --mirror <URL> is an alternate.

Usage

peel https://primary.example.com/dataset.tar.zst \
  --mirror https://eu.mirror.example.com/dataset.tar.zst \
  --mirror https://us.mirror.example.com/dataset.tar.zst \
  -o ./out/

--mirror is repeatable with no fixed upper bound. Returns diminish once there are more mirrors than the worker connections can keep busy.

Startup validation

Before any data download, peel runs a parallel HEAD against every URL (primary and each mirror) and compares:

  1. Content-Length: the byte size of the source must agree across all mirrors.
  2. ETag / Last-Modified: if --sha256 is unset, these serve as a secondary identity signal. Mismatched ETags indicate the mirrors are serving different files.
  3. Accept-Ranges: bytes: required for the mirror to be useful. Mirrors without ranged-GET support are dropped with a warning.

Any mirror that fails these checks is excluded for the run. Surviving mirrors are selected per ranged GET, biased toward the fastest live mirror.

If the primary fails validation, the run aborts unless --sha256 is set. With the hash as the source of truth, peel proceeds against the agreeing mirrors.
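The three startup checks amount to a filter over the HEAD results. The sketch below shows one plausible shape for it. HeadInfo and surviving_mirrors are hypothetical names, Last-Modified is left out for brevity, and comparing ETags only when both sides report one is an assumption of this sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct HeadInfo {
    pub url: String,
    pub content_length: u64,
    pub etag: Option<String>,
    pub accept_ranges_bytes: bool,
}

/// Keep the mirrors that agree with the primary.
/// With a --sha256 value the hash is the source of truth, so ETags are not compared.
pub fn surviving_mirrors<'a>(
    primary: &HeadInfo,
    mirrors: &'a [HeadInfo],
    have_sha256: bool,
) -> Vec<&'a HeadInfo> {
    mirrors
        .iter()
        .filter(|m| {
            // Ranged GETs are required for a mirror to be useful.
            if !m.accept_ranges_bytes {
                return false;
            }
            // The byte size must agree across all mirrors.
            if m.content_length != primary.content_length {
                return false;
            }
            // Secondary identity signal, only when no hash was supplied.
            if !have_sha256 {
                if let (Some(a), Some(b)) = (&primary.etag, &m.etag) {
                    if a != b {
                        return false;
                    }
                }
            }
            true
        })
        .collect()
}
```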

Scheduler behaviour

For each pending ranged GET:

  • The scheduler picks among healthy mirrors using a smoothed per-mirror throughput estimate.
  • A mirror that fails a request (5xx, connection reset, timeout) is excluded for 30 seconds before being retried.
  • The exclusion is logged at warn!. The retry is logged at info!.
  • If all mirrors are excluded simultaneously, the scheduler back-pressures the workers until one returns.

A flapping mirror takes itself out of rotation rather than failing the whole run, providing graceful degradation.
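A minimal model of that behaviour is sketched below, using an exponentially weighted moving average as the smoothed estimate and integer seconds as the clock. The Mirror struct, the smoothing factor of 0.25, and the greedy "always pick the fastest" rule are assumptions of this sketch; the source only states that selection is biased toward the fastest mirror and that exclusion lasts 30 seconds.

```rust
pub struct Mirror {
    pub url: String,
    pub throughput: f64,     // smoothed bytes per second, 0.0 until first sample
    pub excluded_until: u64, // seconds on the caller's clock
}

const EXCLUSION_SECS: u64 = 30;
const ALPHA: f64 = 0.25; // weight of the newest sample (assumed value)

impl Mirror {
    pub fn new(url: &str) -> Self {
        Mirror { url: url.to_string(), throughput: 0.0, excluded_until: 0 }
    }

    /// Fold a completed request into the smoothed throughput estimate.
    pub fn record_success(&mut self, bytes: u64, secs: f64) {
        if secs <= 0.0 {
            return;
        }
        let sample = bytes as f64 / secs;
        self.throughput = if self.throughput == 0.0 {
            sample
        } else {
            ALPHA * sample + (1.0 - ALPHA) * self.throughput
        };
    }

    /// Take the mirror out of rotation after a 5xx, reset, or timeout.
    pub fn record_failure(&mut self, now: u64) {
        self.excluded_until = now + EXCLUSION_SECS;
    }
}

/// Index of the fastest mirror that is not currently excluded.
/// None means every mirror is excluded and the caller should wait.
pub fn pick(mirrors: &[Mirror], now: u64) -> Option<usize> {
    mirrors
        .iter()
        .enumerate()
        .filter(|(_, m)| m.excluded_until <= now)
        .max_by(|a, b| {
            a.1.throughput
                .partial_cmp(&b.1.throughput)
                .unwrap_or(std::cmp::Ordering::Equal)
        })
        .map(|(index, _)| index)
}
```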

Combining with other features

With --sha256

When --sha256 is set, the hash is the source of truth. peel trusts agreeing mirrors even when their Last-Modified headers disagree (CDN edge timing, mirror re-uploads). A wrong-hash mirror fails validation later. A right-hash mirror is accepted even if its metadata is slightly different.

peel https://primary.example.com/dataset.tar.zst \
  --mirror https://eu.mirror.example.com/dataset.tar.zst \
  --mirror https://us.mirror.example.com/dataset.tar.zst \
  --sha256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad \
  -o ./out/

With --max-bandwidth

The cap is aggregate across all mirrors via a single token bucket. --max-bandwidth 50MB/s against 3 mirrors caps the total at 50 MB/s, not 150 MB/s. This matches the intent when the cap exists to be polite to the caller's network or to the mirrors collectively.
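A single shared token bucket is what makes the cap aggregate: every worker draws from the same budget regardless of which mirror it talks to. The sketch below is a generic token bucket with hypothetical names and an explicit clock argument; it is not peel's limiter.

```rust
pub struct TokenBucket {
    rate: f64,     // tokens (bytes) added per second
    capacity: f64, // maximum burst size
    tokens: f64,   // currently available tokens
    last: f64,     // time of the last refill, in seconds
}

impl TokenBucket {
    pub fn new(rate: f64, capacity: f64) -> Self {
        TokenBucket { rate, capacity, tokens: capacity, last: 0.0 }
    }

    /// Try to spend `bytes` tokens at time `now`; returns false if the
    /// caller must wait for the bucket to refill.
    pub fn try_take(&mut self, bytes: f64, now: f64) -> bool {
        // Refill for the elapsed time, never beyond the burst capacity.
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last = now;
        if bytes <= self.tokens {
            self.tokens -= bytes;
            true
        } else {
            false
        }
    }
}
```

With a rate and capacity of 50 units, three workers each asking for 20 units at the same instant get two grants and one refusal, whichever mirrors they are attached to.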

With --workers

--workers <N> is the total in-flight request count across all mirrors, not per-mirror. With 4 workers and 3 mirrors, up to 4 concurrent requests are in flight at any time, drawn from whichever mirrors are fastest at the moment.

What --mirror is not

  • Failover only. peel does not sequentially try mirror 1, then mirror 2 on failure. It uses all of them in parallel by default.
  • A way to download from sharded URLs. When the URLs serve different bytes (different parts of one logical file), use Multi-part URLs.
  • A way to download a multi-volume archive. For name.part0001.rar + name.part0002.rar (each volume is its own file), use Multi-volume archives. --mirror applies only when the same file is reachable at multiple URLs.

Diagnostics

Log line | Meaning
mirror https://… dropped at startup: Content-Length mismatch | Mirror's reported size disagrees with the primary
mirror https://… dropped: no Accept-Ranges: bytes | Mirror does not support ranged GETs and cannot be used for parallel download
mirror https://… excluded for 30s after status=502 | Transient failure; mirror will be retried
all mirrors excluded; back-pressuring | All sources are down simultaneously; the scheduler waits
primary failed validation; using N agreeing mirror(s) | The primary's size/etag didn't match; mirrors did. Requires --sha256

Local-file extraction

When passed a path on disk, peel skips the HTTP machinery entirely (no scheduler, no mirrors, no chunk bitmap) and runs the same decoder / sink / extractor stack against the local file.

Use this mode when the archive is already on disk and peel's decoders are preferred over tar -I zstd -xf, unzip, 7z x, or unrar x.

Usage

# Non-destructive (default): extracts to ./dataset/ and leaves
# /tmp/dataset.tar.zst untouched.
peel /tmp/dataset.tar.zst

# Explicit output directory.
peel /tmp/dataset.tar.zst -o ./out/

# Destructive opt-in: hole-punch the source as the decoder advances,
# delete it on clean completion.
peel -d /tmp/dataset.tar.zst -o ./out/

peel recognises a local path by the absence of an http:// or https:// scheme. Relative paths are resolved against the current working directory.
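The detection rule can be sketched in a few lines. The function names are hypothetical, and treating the scheme match as case-insensitive is an assumption, not documented behaviour.

```rust
use std::path::{Path, PathBuf};

/// Anything without an http:// or https:// scheme is treated as a local path.
pub fn is_local_path(arg: &str) -> bool {
    // Assumption: the scheme comparison ignores ASCII case.
    let lower = arg.to_ascii_lowercase();
    !(lower.starts_with("http://") || lower.starts_with("https://"))
}

/// Resolve a local argument against the working directory.
/// `Path::join` returns an absolute argument unchanged.
pub fn resolve_local(arg: &str, cwd: &Path) -> PathBuf {
    cwd.join(arg)
}
```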

Modes

Flag | Behaviour
(default) | Non-destructive: extract and leave the source untouched, no .peel.ckpt written
-d / --destructive | Hole-punch the source as the decoder advances and delete it on clean completion
-k / --keep-archive | No-op in local mode (preservation is already the default); kept for cross-source script compatibility
--format <NAME> | Force a decoder (same semantics as HTTP mode)
--workdir <DIR> | Place the .peel.ckpt sidecar here instead of next to the source (destructive mode only)
--io-backend … | Selects the puncher implementation (auto / blocking / mmap)
--punch-threshold | Minimum gap between in-loop punch syscalls in destructive mode

Resume

Destructive mode writes a .peel.ckpt next to the source after each quiescent decoder boundary. A kill -9 mid-run followed by a re-invocation (with the same -d) converges to the same final output tree as a clean single run.

Non-destructive mode is one-pass: no .peel.ckpt is written. A kill mid-run requires re-running from scratch against the still-intact source.

Format coverage

Every format peel supports works through the local path:

  • Streaming shapes (.tar.zst, .tar.xz, .tar.lz4, .tar.gz, raw .zst / .xz / .lz4 / .gz, plain uncompressed .tar) flow through the same single-pass decoder the HTTP path uses.
  • Random-access shapes (.zip, .7z, .rar: RAR5 plus legacy RAR3/RAR4) drive their per-format pipelines against the source archive opened read-only and wrapped in a fully-marked chunk bitmap, so the existing orchestrators run unchanged.

Destructive mode (-d) does not apply to the random-access formats. Their pipelines seek backwards into the archive (zip's central directory at the tail, 7z's trailer pointer, rar's per-entry headers), so a monotonically-advancing punch cursor cannot be maintained. peel warns and proceeds non-destructively when -d is passed against one of those sources.

Flags rejected in local mode

A few HTTP-only flags are rejected at parse time when peel detects a local-path positional argument:

  • --mirror
  • --sha256
  • --workers
  • --chunk-size
  • --no-adaptive-chunk-size
  • --max-bandwidth
  • --max-disk-buffer
  • --http-version
  • --no-extract
  • --strict-format

If any of those flags are required, the run belongs on the HTTP path: serve the file over HTTP (for example, from a web server on localhost) and pass that URL.

When to use local mode

The HTTP path uses the same decoders. The choice depends on whether the bytes are already on disk.

  • Already on disk: local mode is faster (no syscall overhead from the HTTP client), simpler, and supports destructive hole-punching via -d when disk pressure is the goal.
  • Must be downloaded: HTTP mode does it in one pass. A separate download then local-extract pipeline adds a full disk round-trip.

The bench grid in the project README compares peel against the system tools (tar -I zstd -xf …, unzip, 7z x, unrar x) for local-file decode and covers the per-format performance characteristics.

Examples

# Extract a .tar.zst from disk, default output dir is ./dataset/
peel /tmp/dataset.tar.zst

# Extract a .zip with one specific decoder, output to ./out/
peel ./archive.zip --format zip -o ./out/

# Free disk as the extraction proceeds (destructive): hole-punch the
# source as it is consumed and delete it on clean completion.
peel -d /var/snapshots/big.tar.xz -o /data/snapshot/

# Keep checkpoint state on fast NVMe, write output to slow HDD.
peel -d /data/big.tar.zst -o /mnt/slow/out/ --workdir /var/cache/peel/

Checkpoint and resume

peel survives any failure short of disk corruption (dropped TCP, kill -9, OOM kill, pod restart, power loss) and resumes byte-identical to a clean run. This page describes the on-disk layout, the write cadence, and how to interpret the sidecar files.

The two sidecar files

When peel extracts <output> from an HTTP source, two sidecar files appear next to the output during the run:

File | What it holds
<output>.peel.part | The sparse compressed bytes (the part-file). Hole-punched as the decoder advances; physical size is the lookahead window.
<output>.peel.ckpt | Frame-aligned decoder state, chunk bitmap, and optional SHA-256 state, written atomically.

On clean completion, both files are unlinked.

On failure or interruption, both files are left on disk. Re-running the same command picks them up and resumes from the checkpoint.

The --workdir <DIR> flag relocates both files. Their basenames stay the same (<output_name>.peel.part / <output_name>.peel.ckpt); only the parent directory changes.

When checkpoints are written

A checkpoint write is triggered when all of these are true:

  1. --checkpoint-min-bytes bytes of source progress have accumulated since the last checkpoint (default 8 MiB).
  2. --checkpoint-min-secs seconds have elapsed since the last checkpoint (default 2 s).
  3. The decoder is at a frame-aligned boundary (per zstd block, per LZMA2 chunk, per deflate block, per tar member, per 7z folder, per RAR entry, per ZIP entry / intra-entry boundary).
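The three conditions combine with a logical AND. A minimal sketch, using the documented defaults and hypothetical names (`CheckpointPolicy`, `should_checkpoint`); it leaves out any rate-aware adjustment of the byte floor.

```rust
use std::time::Duration;

/// Hypothetical thresholds mirroring the documented defaults.
pub struct CheckpointPolicy {
    pub min_bytes: u64,        // --checkpoint-min-bytes
    pub min_elapsed: Duration, // --checkpoint-min-secs
}

impl Default for CheckpointPolicy {
    fn default() -> Self {
        Self {
            min_bytes: 8 * 1024 * 1024,
            min_elapsed: Duration::from_secs(2),
        }
    }
}

impl CheckpointPolicy {
    /// A write is due only when all three conditions hold at once.
    pub fn should_checkpoint(
        &self,
        bytes_since_last: u64,
        since_last: Duration,
        at_frame_boundary: bool,
    ) -> bool {
        bytes_since_last >= self.min_bytes
            && since_last >= self.min_elapsed
            && at_frame_boundary
    }
}
```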

At high download rates the byte floor is scaled up so that checkpoints are written no more often than once per --checkpoint-target-secs (default 0.2 s) of wall-clock time. Pass --checkpoint-target-secs 0 to disable rate-aware scaling.

The combination keeps checkpoint cadence steady (~5 / sec) on a fast network without burning CPU on filesystems where fsync is slow, and falls back to the byte floor on a slow network.

How a write is atomic

A checkpoint write is never a partial overwrite:

  1. Serialise the checkpoint blob to <output>.peel.ckpt.tmp.
  2. fsync it.
  3. rename it over <output>.peel.ckpt.
  4. fsync the parent directory.

A crash during the write loses at most the in-flight checkpoint, not the previous one. The next run reads the previous checkpoint and resumes there.
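The four steps map directly onto standard-library calls. The sketch below is not peel's code: `write_atomically` is a hypothetical helper, and syncing the parent by opening the directory as a file is a Unix-specific assumption.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `blob` to `path` so that readers see either the old or the new
/// contents, never a partial file.
pub fn write_atomically(path: &Path, blob: &[u8]) -> io::Result<()> {
    // Step 1: serialise to "<path>.tmp" in the same directory.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = Path::new(&tmp_name);
    let mut file = File::create(tmp)?;
    file.write_all(blob)?;

    // Step 2: flush file data and metadata to stable storage.
    file.sync_all()?;
    drop(file);

    // Step 3: rename over the previous checkpoint.
    fs::rename(tmp, path)?;

    // Step 4: fsync the parent directory so the rename itself is durable.
    let parent = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()?;
    Ok(())
}
```

The temporary file lives in the same directory as the target because a rename is only atomic within one filesystem.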

What's in the checkpoint

The on-disk format is versioned (current version 7). The blob holds:

  • Source identity: Content-Length, ETag, Last-Modified, and the per-mirror metadata. Detects upstream drift (the source changing during a run).
  • Chunk bitmap and CRC32C fingerprints: which chunks are complete. The per-chunk fingerprint catches partial writes that were not yet marked.
  • Decoder state: per-format frame-aligned snapshot. For zstd, the inter-block state. For xz, the LZMA2 inter-chunk state. For gzip, a 32 KiB sliding-window snapshot plus the running CRC32 / ISIZE. For RAR, the §F1 blob capturing the LZ dictionary state and filter program cache.
  • Sink state: per-entry write progress for tar / zip / 7z / rar per-entry sinks.
  • Streaming SHA-256 state: if --sha256 is set, the SHA-256 intermediate state words are checkpointed so the resumed digest is byte-identical to sha256sum over the original file.

Resume guarantees

The output is byte-identical to a clean run if and only if:

  1. The source bytes at the same URL have not changed (ETag / Last-Modified verification catches this).
  2. The same peel version (or a forward-compatible one) is used to resume.
  3. The output directory has not been tampered with between runs. peel does not re-verify extracted files on resume; it trusts the checkpoint's record of what was written.

If the source has changed mid-run, peel's per-chunk CRC32C fingerprints catch the drift: a chunk's fingerprint at re-fetch time disagrees with what was checkpointed. peel aborts the resume with a specific "source changed during run" error rather than silently writing wrong bytes.

If the peel version changed and the checkpoint format is incompatible, the resume aborts at parse time. Re-run with the same version, or delete the sidecars (rm <output>.peel.part <output>.peel.ckpt) to start from scratch.

Resuming a run

There is no separate "resume" flag. Re-invoke the same command:

peel https://example.com/dataset.tar.zst -o ./out/
# Ctrl-C / kill -9 / network drop happens at 50% through.
# Sidecars remain on disk.

peel https://example.com/dataset.tar.zst -o ./out/
# Picks up at the last checkpoint, finishes the rest.

For multi-volume or multi-part runs, pass the same URL list / @file and the same -o. The checkpoint records the assembled source's identity, so partial progress across multiple URLs is preserved.

Crash-test coverage

The crash-test harness in tests/test_crash_resume.rs kills the process at 100 random points per format and asserts that the post-resume output is byte-identical to a clean run, every time. This is the evidence behind the byte-identical guarantee.

Inspecting a checkpoint

The checkpoint blob is not human-readable, but the sidecars themselves can be inspected from the shell:

$ ls -la ./out.peel.part ./out.peel.ckpt
-rw-r--r--  1 ag  staff  10737418240 May 13 14:22 ./out.peel.part   # logical size
-rw-r--r--  1 ag  staff       274432 May 13 14:22 ./out.peel.ckpt

# Physical size: what is actually on disk, after hole-punching
$ du -h ./out.peel.part
123M    ./out.peel.part

du -h reports the physical size, which is the in-flight window. The logical size (ls -la) is the full archive length.
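The same two numbers can be read programmatically. A minimal Unix-only sketch with a hypothetical helper name; it relies on `st_blocks` being counted in 512-byte units.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns (logical_bytes, physical_bytes) for a file on Unix.
pub fn sizes(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    // len() is what `ls -l` shows; blocks() * 512 is what `du` sums up.
    Ok((meta.len(), meta.blocks() * 512))
}
```

For a hole-punched part-file the second value tracks the in-flight window while the first stays at the full archive length.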

RUST_LOG=debug peel … logs checkpoint writes as they happen:

DEBUG checkpoint write: bytes_since_last=8.0MiB seconds_since_last=2.1
DEBUG checkpoint write: bytes_since_last=8.0MiB seconds_since_last=2.0

Tuning checkpoint cadence

The defaults work well across the bench grid. Reasons to tune:

Goal | Flag | Direction
Fewer fsyncs on slow disks | --checkpoint-min-bytes | Larger (e.g. 64 MiB)
Tighter resume granularity for very long runs | --checkpoint-min-secs | Smaller (e.g. 1 s)
Steady cadence under highly variable network | --checkpoint-target-secs | Smaller (e.g. 0.1 s)
Disable rate-aware scaling for reproducibility | --checkpoint-target-secs | 0

A more aggressive cadence trades extra fsync syscalls for finer-grained resume: less work lost on a kill -9, more CPU and IO during normal operation.

When resume can't help

A few scenarios fall outside the byte-identical-resume guarantee:

  • The source disappeared between runs. Sidecars stay on disk until removed; the next run fails at the HEAD probe with a clear error.
  • The output directory was partially modified by hand. peel does not re-verify already-extracted files. If this is suspected, delete the output and the sidecars and start over.
  • The checkpoint format is from an incompatible peel version. Delete the .peel.ckpt to start fresh from the part-file (the part-file's chunks are still individually verifiable via the inline fingerprints), or delete both sidecars to start completely from scratch.
  • Non-destructive local extraction. peel ./file.tar.zst (no -d) is a one-pass run with no checkpoint. The source remains intact on kill, so re-run.

Integrity verification

peel provides two integrity mechanisms layered on top of the streaming pipeline:

  1. --sha256 <HEX>: end-to-end source verification against the digest that sha256sum prints for the original archive.
  2. Per-chunk CRC32C fingerprints: automatic drift detection inside the chunk bitmap, catching a source that changes mid-run or on resume.

The second mechanism is enabled by default. The first is opt-in via the --sha256 flag.

--sha256: end-to-end source verification

Pass the expected digest of the compressed source bytes, the value sha256sum dataset.tar.zst would print over the original archive (not over the extracted contents):

peel https://example.com/dataset.tar.zst \
  --sha256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad \
  -o ./out/

Behaviour:

  • peel streams a resumable SHA-256 over the source bytes as they arrive.
  • On clean completion, the digest is compared. A mismatch aborts the run with a specific error and exit code 1.
  • The hash state is checkpointed alongside everything else, so a resumed run produces a digest byte-identical to sha256sum on the original file.

The digest is 64 hex characters. Mixed case is accepted; whitespace is not.
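The acceptance rule is easy to state in code. The parser below is a sketch with a hypothetical name, not peel's argument parser: exactly 64 hex digits, either case, nothing else.

```rust
/// Parse a SHA-256 digest written as 64 hex characters.
/// Mixed case is accepted; whitespace or any other character is rejected.
pub fn parse_sha256_hex(s: &str) -> Option<[u8; 32]> {
    let bytes = s.as_bytes();
    if bytes.len() != 64 {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, pair) in bytes.chunks(2).enumerate() {
        // to_digit(16) accepts 0-9, a-f and A-F only.
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        out[i] = (hi * 16 + lo) as u8;
    }
    Some(out)
}
```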

Relationship to TLS

TLS protects against in-flight tampering. It does not protect against:

  • The origin serving a corrupted file.
  • A CDN mirror serving a stale or wrong file.
  • A --mirror URL pointing at a subtly different object.
  • Corruption introduced outside the TLS session, for example at a TLS-terminating proxy or cache (rare but observed).

--sha256 enforces a published-by-the-source contract (the project declares "this archive's hash is X") end-to-end.

Multi-URL runs

For multi-part URLs and multi-volume archives, --sha256 is repeatable. Pass zero flags (no verification) or exactly one --sha256 per URL, paired by order:

peel \
  https://host/dataset.tar.part0000 \
  https://host/dataset.tar.part0001 \
  --sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
  --sha256 1bcf4d2e9aa01ff5...                                              \
  -o ./out/

Each part's hash is verified at its part-boundary as the decoder advances. A wrong number of --sha256 flags (1 hash for 2 URLs, 3 hashes for 2 URLs) is rejected at parse time.
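The pairing rule (zero digests, or exactly one per URL in order) can be sketched as follows. `pair_digests` is a hypothetical helper, and the error text simply reuses the message name from the failure-mode table.

```rust
/// Pair each URL with its digest by position.
/// Zero digests means "no verification"; any other count must match.
pub fn pair_digests<'a>(
    urls: &'a [&'a str],
    digests: &'a [&'a str],
) -> Result<Vec<(&'a str, Option<&'a str>)>, String> {
    if digests.is_empty() {
        return Ok(urls.iter().map(|u| (*u, None)).collect());
    }
    if digests.len() != urls.len() {
        return Err(format!(
            "multi-URL sha256 count mismatch: {} URL(s), {} digest(s)",
            urls.len(),
            digests.len()
        ));
    }
    Ok(urls
        .iter()
        .zip(digests)
        .map(|(u, d)| (*u, Some(*d)))
        .collect())
}
```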

Scope and limits

  • --sha256 covers the streaming pipeline: anything that goes through the .tar.* / raw codec / .7z path.
  • .zip archives extract per-entry, and --sha256 does not cover that path in the current release. Each ZIP entry's own CRC32 (from the central directory) is still verified per-entry.
  • For .rar and .7z, the format's per-entry integrity check (RAR's BLAKE2sp / CRC32, 7z's per-substream CRC32) is verified independently and on top of --sha256.

CRC32C fingerprints: automatic drift detection

Every bitmap chunk (default 4 MiB) has a CRC32C fingerprint stored in the checkpoint. Two scenarios where this matters:

Mid-run source drift

If the source changes during a long run (someone re-uploaded the file, or a CDN edge invalidated and re-pulled a different version), a worker fetching a later chunk receives bytes that disagree with those of an earlier worker. The fingerprint comparison catches this case: peel aborts with a "source changed during run" error rather than producing wrong output.

Resume after a kill

On kill -9, the part-file may contain bytes for chunks that were not yet marked complete in the bitmap. On resume, peel:

  1. Reads the bitmap to find which chunks are complete.
  2. Re-verifies the fingerprint against the bytes on disk for any chunk near a recent bitmap update.
  3. Marks the chunk complete if the fingerprint matches, or re-fetches it otherwise.

This procedure makes a kill -9 mid-write safe. Bytes that landed on disk are reused when correct and refetched when not.
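The reuse-or-refetch decision can be illustrated with a bitwise CRC32C. This is a sketch under stated assumptions: `ChunkAction` and `recheck_chunk` are hypothetical names, and a production implementation would normally use a table-driven or hardware-accelerated CRC instead of this loop.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

#[derive(Debug, PartialEq)]
pub enum ChunkAction {
    Reuse,   // bytes on disk match the recorded fingerprint
    Refetch, // mismatch: download the chunk again
}

/// Decide what to do with a chunk found in the part-file on resume.
pub fn recheck_chunk(on_disk: &[u8], recorded: u32) -> ChunkAction {
    if crc32c(on_disk) == recorded {
        ChunkAction::Reuse
    } else {
        ChunkAction::Refetch
    }
}
```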

ETag / Last-Modified handling

When --sha256 is not set, peel uses ETag and Last-Modified as secondary identity signals:

  • At startup, the HEAD probe records the ETag and Last-Modified.
  • On resume, the ETag and Last-Modified are re-checked. A change indicates the source changed and the resume aborts.
  • For multi-mirror runs, mirrors with disagreeing ETags are dropped at startup (unless --sha256 is set, in which case the hash is the source of truth).

Strong ETags are honoured strictly. Weak ETags (W/"…") are treated as best-effort, since they may legitimately differ across CDN edges for the same file. A weak-ETag mismatch is logged but does not fail the run.
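A sketch of the strong-versus-weak policy, with hypothetical names. It assumes that a mismatch counts as weak when either side carries the W/ prefix; the text above does not spell out the mixed case.

```rust
#[derive(Debug, PartialEq)]
pub enum EtagCheck {
    Match,
    WeakMismatch,   // logged, run continues
    StrongMismatch, // source changed, resume aborts
}

/// Compare the ETag recorded at startup with the one seen on resume.
pub fn compare_etags(recorded: &str, current: &str) -> EtagCheck {
    if recorded == current {
        return EtagCheck::Match;
    }
    // Assumption: a W/ prefix on either side makes the comparison best-effort.
    if recorded.starts_with("W/") || current.starts_with("W/") {
        EtagCheck::WeakMismatch
    } else {
        EtagCheck::StrongMismatch
    }
}
```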

Reading a hash from a file

peel does not provide a --sha256-file flag. Use shell substitution:

# Checksum file in sha256sum format ("<digest>  <filename>"):
peel "$URL" --sha256 "$(awk '{print $1}' dataset.tar.zst.sha256)" -o ./out/

# File containing only the digest (Bash / zsh $(< file) form):
peel "$URL" --sha256 "$(< checksum.txt)" -o ./out/

A future --sha256-file <PATH> flag is under consideration. Shell substitution is the recommended path in the interim.

Failure modes

Error message | Cause | Action
digest mismatch | --sha256 value disagrees with what was streamed | Check the source against the published hash; the file may have been re-uploaded
source changed during run | CRC32C fingerprint disagrees between chunks | Re-run; the source is unstable
ETag mismatch on resume | The source's ETag changed since the run started | Delete the sidecars and start fresh, or pass --sha256 to trust the hash instead
multi-URL sha256 count mismatch | Wrong number of --sha256 for the URL count | Pass exactly one per URL, or none

Encrypted archives

peel decrypts encrypted ZIP, 7z, and RAR5 archives. It never encrypts. Re-encrypting an extracted stream to a different password is out of scope; pipe to 7z or zip for that.

The password is supplied via --password-from <SOURCE> and never appears on the command line. argv is visible to every process on the host and is the wrong default for a passphrase.

Supported schemes at a glance

Format | Scheme | KDF | Authenticated
zip | WinZip-AES (AE-1 / AE-2; AES-128/192/256-CTR) | PBKDF2-HMAC-SHA1, 1000 iterations | HMAC-SHA1-80 trailer
zip | PKWARE traditional "ZipCrypto" (CRC32-keyed PRGA) | password-derived 12-byte header | none (CRC32 of plaintext)¹
rar5 | AES-256-CBC, archive-header encryption (type 4) | PBKDF2-HMAC-SHA256, 2^(kdf_count+15) iterations | optional pswcheck
rar5 | AES-256-CBC, per-file encryption (extra record 1) | same as above (per-record salt / IV / kdf_count) | optional pswcheck
7z | AES-256-CBC (coder id 06:F1:07:01) | bespoke SHA-256 "round-tower" KDF | none (CRC32 of plaintext)

¹ ZipCrypto is insecure: a practical known-plaintext attack against it was published in 1994. Supported only for compatibility with archives that already use it.

Supplying a password

--password-from <SOURCE>

Source | Use it when | Notes
prompt | Interactive terminal | Reads /dev/tty directly (so a piped stdin carrying archive data can't accidentally answer). Echo disabled. Up to 3 attempts before giving up.
env:NAME | CI / scripted runs | Reads the named environment variable. Strips a trailing newline; empty values are refused.
file:PATH | Long-lived credential files | Reads the first line of PATH. Modes other than 0600 emit a one-shot warning.
fd:N | Process substitution / pass integration | Reads from file descriptor N (until EOF or newline). peel … --password-from fd:3 3< <(pass …).
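The four source spellings form a small grammar. The parser below is a hedged sketch: `PasswordSource` and `parse_password_source` are hypothetical names, and the error strings are invented for the example.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum PasswordSource {
    Prompt,
    Env(String),
    File(PathBuf),
    Fd(u32),
}

/// Parse the argument of --password-from.
pub fn parse_password_source(s: &str) -> Result<PasswordSource, String> {
    if s == "prompt" {
        return Ok(PasswordSource::Prompt);
    }
    if let Some(name) = s.strip_prefix("env:") {
        if name.is_empty() {
            return Err("env: needs a variable name".to_string());
        }
        return Ok(PasswordSource::Env(name.to_string()));
    }
    if let Some(path) = s.strip_prefix("file:") {
        if path.is_empty() {
            return Err("file: needs a path".to_string());
        }
        return Ok(PasswordSource::File(PathBuf::from(path)));
    }
    if let Some(n) = s.strip_prefix("fd:") {
        return n
            .parse::<u32>()
            .map(PasswordSource::Fd)
            .map_err(|_| format!("fd: needs a descriptor number, got {:?}", n));
    }
    Err(format!("unknown password source: {:?}", s))
}
```

There is deliberately no variant that carries the password itself, which mirrors the absence of a --password=<value> flag.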

Absence of --password=<value>

Process-list visibility (ps aux, /proc/<pid>/cmdline, Get-Process -IncludeUserName) makes argv the wrong place for a passphrase, so peel has no --password=<value> flag. Every source above keeps the password out of argv. For a one-step non-interactive invocation, wrap with env:NAME:

# The VAR=value prefix scopes PEEL_PW to this single command,
# so the variable never lingers in the calling shell.
PEEL_PW="$(cat ~/.peel-passwords/dataset)" \
  peel "$URL" --password-from env:PEEL_PW -o ./out/

Examples

Interactive prompt

peel https://example.com/secret.zip -o ./out/ --password-from prompt

The prompt reads /dev/tty. Three failed attempts trigger exit code 4.

From an environment variable

PEEL_PW='hunter2' peel "$URL" --password-from env:PEEL_PW -o ./out/

From a file

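# Create the file with restrictive permissions from the start,
# then make the mode explicit.
( umask 077 && printf '%s\n' 'hunter2' > /root/.peel-pw )
chmod 0600 /root/.peel-pw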

peel "$URL" --password-from file:/root/.peel-pw -o ./out/

The 0600 chmod silences the mode warning.

From an fd via process substitution

peel "$URL" --password-from fd:3 3< <(pass show archives/dataset) -o ./out/

Integrate with pass, gopass, 1password-cli, or any other passphrase manager that writes to stdout by piping its output into an fd peel reads.

RAR5 specifics

RAR5 has two independent encryption layers. An archive may use either, both, or neither.

Archive-header encryption (HEAD_CRYPT)

When present, every header after HEAD_CRYPT is AES-256-CBC encrypted under a per-archive key. Each encrypted header is prefixed by its own 16-byte IV and padded to a 16-byte boundary.

Data areas are not encrypted by this layer. They pass through cleartext (or under per-file encryption, below). peel's walker switches into encrypted-header mode after parsing HEAD_CRYPT.

Per-file data encryption (extra record type 1)

Each file header may carry an encryption record with its own salt, IV, kdf_count, and optional pswcheck. When present, the file's data area is AES-256-CBC encrypted under a per-file key.

Both layers share a single password (resolved once per archive). The kdf_count byte is capped at the spec maximum of 24 (= 2^39 iterations) before key derivation runs.

When a run resumes from a checkpoint, the in-flight encrypted entry restarts from byte 0, because the CBC chain state cannot yet be migrated across a checkpoint snapshot. The sink replays the on-disk prefix to seed its hashes, so the user-visible bytes remain byte-identical to a clean run.

7z specifics

7z has a single encryption shape: an AES-256-CBC coder (id 06:F1:07:01) at the front of a folder's coder chain.

The coder props blob encodes:

  • numCyclesPower (low 6 bits of byte 0): the SHA-256 round-tower KDF runs 2^power rounds.
  • Optional salt (up to 16 bytes) and IV (up to 16 bytes), present when the high bits of byte 0 are set.

The KDF derives a 32-byte AES-256 key by hashing salt || password_utf16le || round_counter_le for each of the 2^power rounds. The on-disk IV is zero-padded to 16 bytes if shorter.
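Decoding byte 0 of the props blob and padding the IV can be sketched as below. The names are hypothetical, and the exact flag positions (bit 7 for the salt, bit 6 for the IV) are an assumption based on the commonly documented 7z layout; the text above only says "the high bits".

```rust
/// Fields packed into byte 0 of the 7z AES coder properties.
#[derive(Debug, PartialEq)]
pub struct AesPropsHead {
    pub num_cycles_power: u8,
    pub has_salt: bool,
    pub has_iv: bool,
}

pub fn parse_props_byte0(b0: u8) -> AesPropsHead {
    AesPropsHead {
        // Low 6 bits: the KDF runs 2^power rounds.
        num_cycles_power: b0 & 0x3F,
        // Assumption: bit 7 flags a salt, bit 6 flags an IV.
        has_salt: (b0 & 0x80) != 0,
        has_iv: (b0 & 0x40) != 0,
    }
}

/// Zero-pad a short on-disk IV to the 16-byte AES block size.
/// Returns None for an IV longer than 16 bytes.
pub fn pad_iv(iv: &[u8]) -> Option<[u8; 16]> {
    if iv.len() > 16 {
        return None;
    }
    let mut out = [0u8; 16];
    out[..iv.len()].copy_from_slice(iv);
    Some(out)
}
```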

7z has no in-archive password verifier (unlike RAR5's optional pswcheck or ZIP-AES's 2-byte PBKDF2 verifier). The first correctness signal is the per-substream CRC32 inside the decoded plaintext. Under a wrong password the plaintext is random and the CRC32 mismatches with overwhelming probability. peel translates that into EncryptionError::PasswordIncorrect when it knows the folder is encrypted.

All folders in an archive share one password (loaded lazily on the first encrypted folder), matching 7-Zip's own behaviour. Resume restarts the in-flight folder from byte 0, the same constraint as RAR5's per-file encryption, for the same reason (CBC chain state).

Exit code 4

Password-related failures use a dedicated exit code so scripts can distinguish them from generic extraction failures:

  • 0: extraction completed.
  • 1: generic extraction or I/O failure.
  • 4: PasswordIncorrect or PasswordMissing anywhere in the error chain.
  • 128 + signum: graceful shutdown after SIGINT (130) or SIGTERM (143). The .peel.part / .peel.ckpt sidecars are left on disk; re-running resumes.
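The exit-code contract can be written as one mapping. `Outcome` and `exit_code` are hypothetical names for illustration; only the numeric values come from the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Completed,
    GenericFailure,
    PasswordFailure, // PasswordIncorrect or PasswordMissing in the chain
    Signalled(i32),  // graceful shutdown after this signal number
}

/// Map a run outcome to the process exit code.
pub fn exit_code(outcome: Outcome) -> i32 {
    match outcome {
        Outcome::Completed => 0,
        Outcome::GenericFailure => 1,
        Outcome::PasswordFailure => 4,
        Outcome::Signalled(signum) => 128 + signum,
    }
}
```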

A retry loop on wrong password looks like:

while true; do
  peel "$URL" --password-from prompt -o ./out/ && break
  rc=$?
  if [ "$rc" != "4" ]; then
    echo "peel failed with code $rc (not a password issue)"; exit "$rc"
  fi
  echo "wrong password, retry"
done

Threat model

peel decrypts. It does not authenticate the user. It has no support for hardware tokens, smart cards, GPG-encrypted passphrases, or biometric unlock. The user supplies a passphrase via one of the --password-from sources; everything beyond that is the operating system's responsibility.

peel does not protect against an attacker with:

  • Read access to the process's address space (/proc/<pid>/mem on Linux, vmmap on macOS). The internal Password wrapper zeroises its backing storage on drop, but a snapshot taken during key derivation will see the cleartext.
  • Read access to the swap device. If the machine swaps mid-extraction the passphrase may be written to disk. Disable swap for the workload if this matters.
  • Read access to argv, which is precisely why --password=<value> does not exist.
  • Precise micro-architectural timing side-channels (Spectre-class, cache-timing on a co-located VM). Tag comparisons go through a length-stable ct_eq function, but the underlying AES / HMAC primitives are not cycle-constant.

peel does apply the following discipline:

  • Every cryptographic primitive ships with a differential test suite cross-checking against a reference upstream crate (sha1, hmac, pbkdf2, sha2, aes, ctr, cbc). The runtime binary links none of these.
  • Tag and verifier comparisons route through crypto::ct_eq.
  • Password bytes never travel through any code path that prints Debug. The Password type explicitly redacts.
  • All KDF iteration counts come from the archive's header. peel does not guess a sensible default.
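For readers unfamiliar with the idea behind a ct_eq-style comparison, a minimal sketch follows. It is not peel's crypto::ct_eq: the function here is illustrative, it returns early on a length mismatch (lengths are assumed to be public), and a hardened implementation would also guard against compiler optimisations.

```rust
/// Compare two byte strings without stopping at the first difference.
pub fn ct_eq_sketch(a: &[u8], b: &[u8]) -> bool {
    // Assumption: the lengths themselves are not secret.
    if a.len() != b.len() {
        return false;
    }
    // Accumulate all differences so the loop always runs to the end.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}
```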

Out of scope, permanently

  • Re-encrypting on the fly. peel never encrypts. "Decrypt this remote archive and re-encrypt to a different password" is not a peel job.
  • Password-protected gzip / xz / lz4 / zstd. None of these formats has a native encryption layer; the convention "GPG-encrypted tarball" is a separate pipeline.
  • ZIP central-directory encryption (PKWARE strong-encryption spec, general-purpose flag bit 6). Used by approximately one product outside enterprise contexts. Surfaces as unsupported feature: PKWARE strong encryption.
  • Hardware-accelerated AES (AES-NI). Software AES first; a runtime-probed AES-NI path may land later behind a feature flag.

Verifying the primitives

Every primitive ships with a differential test suite that runs ≥ 1000 random inputs through both peel's implementation and the upstream reference crate, asserting byte-identical output. The corpus also includes known-answer vectors from the format specs themselves (FIPS 197 for AES, RFC 3174 for SHA-1, NIST SP 800-132 for PBKDF2).

cargo test --test test_crypto_diff

The reference crates pinned in [dev-dependencies] are sha1, hmac, pbkdf2, aes, ctr, cbc, sha2, blake2. The runtime binary links none of these.

Performance and tuning

peel's defaults target a laptop-class machine on a healthy network and land within ~6% of the system tools across the bench grid in the project README. Outside that envelope (extremely high bandwidth, severe memory pressure, picky filesystems, locked-down kernels), the following knobs matter.

File-IO backend

--io-backend <auto|blocking|uring|mmap>

The part-file is written from many workers concurrently, then read linearly by the decoder, then hole-punched. Three backends implement this differently:

Backend | Workers write via | Puncher | Sockets
blocking | pwrite(2) | fallocate(PUNCH_HOLE) / F_PUNCHHOLE | Blocking BSD sockets
mmap (Linux only) | memcpy into MAP_SHARED | madvise(MADV_REMOVE) | Blocking BSD sockets
uring (Linux only) | pwrite SQE on the ring | fallocate(PUNCH_HOLE) SQE | TCP connect / send / recv on the ring
auto (default) | Probes each at startup

What auto picks

On Linux, auto selects:

  • mmap for the part-file if the filesystem supports MADV_REMOVE (probed at startup with a small test mapping). On Linux, MADV_REMOVE works on tmpfs and on any filesystem that supports fallocate(2) with FALLOC_FL_PUNCH_HOLE, which covers all major Linux filesystems. When the probe fails on a mount without hole-punch support, auto falls back cleanly.
  • io_uring for the HTTP client's sockets if io_uring_setup succeeds. Falls back to blocking sockets with one info! log if the kernel rejects ring construction (kernel < 5.6, seccomp blocking such as cri-o's default profile under Kubernetes, or RLIMIT_MEMLOCK too low).

On non-Linux platforms (macOS, BSD), auto selects the blocking backend for both sockets and file IO. No io_uring equivalent exists. Mmap with hole-punching works but does not beat the blocking path by enough to default to it.

When to override

  • A/B benchmarking. --io-backend blocking forces the pre-io_uring path everywhere. Useful for measuring the speedup the fast paths contribute on a given network.
  • Hard requirement on io_uring. --io-backend uring errors out if the kernel cannot construct a ring. Suitable for CI verification that the fast path is actually in use.
  • Hard requirement on mmap. --io-backend mmap selects the mmap part-file path explicitly with the blocking socket backend. Same memcpy-into-the-mapping shape, but no io_uring for sockets.

Confirming what got selected

RUST_LOG=info peel <URL> -o ./out/ 2>&1 | grep 'selected'
# selected file IO backend = mmap
# selected socket backend = io_uring

HTTP version

--http-version <auto|h1|h2>

Value | Behaviour
auto (default) | ALPN-negotiate H1 / H2 over TLS; H1 over plaintext
h1 | Force HTTP/1.1
h2 | Force HTTP/2 (prior-knowledge h2c over plaintext)

When to override

  • Suspected H2 misbehaviour. Some origins (and some middleboxes, typically corporate proxies) handle ranged GETs better on H1 than on H2. Force --http-version h1 to test.
  • Origin only speaks h2c. Plaintext H2 does not ALPN-negotiate and must be selected explicitly.
  • Origin negotiation fails. TLS handshakes that succeed but negotiate something peel cannot use surface as a clear error with the negotiated protocol name.

Bandwidth and disk caps

--max-bandwidth <RATE>

Aggregate token-bucket cap across all workers and mirrors. Accepts decimal (K, M, G, T, 1000-based) or binary (Ki, Mi, Gi, Ti) suffixes. Trailing B and /s are accepted and ignored.

peel <URL> --max-bandwidth 50MB/s   -o ./out/    # 50 megabytes/s, decimal
peel <URL> --max-bandwidth 512MiB/s -o ./out/    # 512 mebibytes/s, binary
peel <URL> --max-bandwidth 1000000  -o ./out/    # 1 million bytes/s

The cap is aggregate, not per-mirror: --max-bandwidth 50MB/s with three mirrors caps the total at 50 MB/s, not 150 MB/s.
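The suffix rules can be sketched as a small parser. `parse_rate` is a hypothetical helper that follows the description above; it handles whole numbers only, which is a simplification for the example.

```rust
/// Parse a rate such as "50MB/s", "512MiB/s" or "1000000" into bytes/s.
pub fn parse_rate(input: &str) -> Option<u64> {
    let mut s = input.trim();
    // Trailing "/s" and "B" are accepted and ignored.
    s = s.strip_suffix("/s").unwrap_or(s);
    s = s.strip_suffix('B').unwrap_or(s);

    // Split the leading digits from the unit suffix.
    let digits_end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (number, suffix) = s.split_at(digits_end);
    let value: u64 = number.parse().ok()?;

    let multiplier: u64 = match suffix {
        "" => 1,
        // Decimal, 1000-based.
        "K" => 1_000,
        "M" => 1_000_000,
        "G" => 1_000_000_000,
        "T" => 1_000_000_000_000,
        // Binary, 1024-based.
        "Ki" => 1u64 << 10,
        "Mi" => 1u64 << 20,
        "Gi" => 1u64 << 30,
        "Ti" => 1u64 << 40,
        _ => return None,
    };
    value.checked_mul(multiplier)
}
```

Under these rules the three command-line examples above resolve to 50,000,000, 536,870,912 and 1,000,000 bytes per second.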

When to use it:

  • Polite scraping of a public mirror.
  • Co-tenant workloads where peel must not saturate the pipe.
  • Reproducible benchmarks needing a deterministic wire-time floor.

--max-disk-buffer <SIZE>

Cap on the on-disk lookahead: bytes downloaded but not yet consumed by the decoder. When the gap reaches this value, the scheduler stops dispatching new chunks until the decoder catches up.

peel <URL> --max-disk-buffer 256MiB -o ./out/    # tighter cap for disk-constrained env
peel <URL> --max-disk-buffer none   -o ./out/    # disable

Default 1GiB. The default rarely engages on a healthy disk and bounds the part-file's physical size on a slow one.

When to lower it:

  • Containers with a hard ephemeral-disk quota (Kubernetes pods with small emptyDir, CI runners with capped tmpfs).
  • A network much faster than the disk (10 Gbps NIC and a spinning disk output target) where the part-file must not balloon before the decoder catches up.

When to raise it (or disable):

  • Decoder is the bottleneck and network bursts should be absorbed fully into the buffer.
  • Very fast disk with a slow or bursty network where pre-buffering wins.

Worker count

--workers <N>

Default 4. The scheduler keeps at most this many ranged GETs in flight in total, across the primary and all mirrors.

Tuning matrix:

Symptom | Direction
Wire under-utilised, origin far away (high RTT) | Raise: more workers in flight overlap the RTT
Origin returns 429 / 503 under load | Lower: back off the per-origin parallelism
Per-worker throughput collapses with more workers | Lower: local CPU or NIC is the bottleneck
Memory pressure from many in-flight buffers | Lower: each worker holds its in-flight chunk

The default 4 suits a laptop-class machine on a healthy network. On a high-spec server pulling from a far CDN, 8–16 is often faster. On a constrained client, 2 can win.

Chunk-size tuning

--chunk-size <BYTES> + --no-adaptive-chunk-size

The bitmap chunk size is the unit of completion tracked in checkpoints (default 4 MiB). It is also the smallest possible ranged GET.

With adaptive sizing (the default), the scheduler watches per-GET latency and retry rate and may coalesce several consecutive bitmap chunks into a single ranged GET:

  • 1 MiB floor, 64 MiB cap.
  • 30 s hysteresis: the scheduler waits before reacting to transient changes.
  • Bitmap unit and dispatch unit are decoupled. Checkpoints stay fine-grained while the wire-level request size scales with the network.

Pass --no-adaptive-chunk-size to lock dispatch to the bitmap unit. The scheduler then dispatches exactly one bitmap chunk per worker task, with no growth or shrink decisions over the lifetime of the run. Useful for benchmarking and reproducible test runs.
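
The relationship between the bitmap unit and the dispatch unit can be sketched as plain arithmetic. This is an illustration only, not peel's scheduler: the helper name dispatch_size_mib, the clamping order, and the round-down-to-whole-chunks rule are assumptions made for the example. Only the 1 MiB floor, the 64 MiB cap, and the 4 MiB default come from the text above.

```shell
# Illustrative arithmetic only; peel's real scheduler logic is not shown here.
# dispatch_size_mib BITMAP_MIB WANTED_MIB
# Prints a ranged-GET size in MiB: WANTED clamped to [1, 64] MiB,
# never below one bitmap chunk, rounded down to a whole number of chunks.
dispatch_size_mib() (
  bitmap=$1
  wanted=$2
  floor=1   # MiB, from the text above
  cap=64    # MiB, from the text above
  [ "$wanted" -lt "$floor" ] && wanted=$floor
  [ "$wanted" -gt "$cap" ] && wanted=$cap
  [ "$wanted" -lt "$bitmap" ] && wanted=$bitmap
  echo $(( wanted / bitmap * bitmap ))
)
```

Under these assumptions, with the 4 MiB default a scheduler that wants roughly 30 MiB per request would issue 28 MiB GETs (seven bitmap chunks), while checkpoints still track each 4 MiB chunk individually.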

Puncher and checkpoint cadence

--punch-threshold <BYTES>

Minimum gap between in-loop hole-punch syscalls (default 4 MiB).

  • Smaller: tighter physical-disk footprint, more syscalls.
  • Larger: fewer syscalls, larger transient physical footprint.

Tune downward for a hard ceiling on physical disk usage. Tune upward if the filesystem's punch-hole implementation is slow (some network-attached storage backends have noticeable per-punch cost).
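
To get a feel for the trade-off, the number of punch syscalls can be bounded with a one-line estimate. The helper below is a hypothetical sketch, not part of peel. It assumes at most one punch per --punch-threshold of consumed input, which follows from the flag being a minimum gap.

```shell
# punch_calls ARCHIVE_MIB THRESHOLD_MIB
# Upper bound on hole-punch syscalls for one run, assuming at most one
# punch per THRESHOLD_MIB of consumed input (rounded up).
punch_calls() (
  archive=$1
  threshold=$2
  echo $(( (archive + threshold - 1) / threshold ))
)
```

For a 40 GiB (40960 MiB) archive, `punch_calls 40960 4` gives the bound at the default threshold and `punch_calls 40960 64` shows how far a larger threshold cuts it, at the cost of a larger transient footprint.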

--checkpoint-min-bytes / --checkpoint-min-secs / --checkpoint-target-secs

See Checkpoint and resume for the full discussion of these.

Defaults: --checkpoint-min-bytes 8 MiB, --checkpoint-min-secs 2 s, --checkpoint-target-secs 0.2 s. Raise the min-bytes floor when the filesystem's fsync is slow enough to dominate wall-clock time. Lower it for tighter resume granularity on very long runs.

Workdir placement

--workdir <DIR>

Place the .peel.part and .peel.ckpt sidecars in a separate directory from the output.

Use cases:

  • Slow output disk, fast scratch disk. Extract onto slow HDD-backed /data, keep the in-flight part-file on fast NVMe at /var/cache/peel:

    peel <URL> -o /data/out/ --workdir /var/cache/peel/
    
  • Persistent Kubernetes PVC, ephemeral container scratch. Output goes onto the PVC mount, sidecars onto the container's ephemeral scratch. Resume across pod restarts uses the PVC's data and the ephemeral checkpoint as a fast-path optimisation. After a pod restart, delete the ephemeral checkpoint and peel re-derives state from the part-file's bytes.

  • Read-only output filesystem. Output is a read-only mount where only the extracted contents are needed. Sidecars go to a writable scratch dir.

The directory is created if missing. Basenames stay the same (<output_name>.peel.part, <output_name>.peel.ckpt). Only the parent directory changes.

Progress and logging

peel emits a live three-line block on a TTY:

download: 412.3 MiB / 1.2 GiB (33.7%) @ 187 MB/s  (4 workers, 312 MiB on disk)
extract : 387.1 MiB / 1.2 GiB (31.8%) @ 178 MB/s
eta     : 4.6s

On a non-TTY (CI logs, redirected output), the progress UI falls back to periodic tracing::info! lines. No extra flag is needed.

RUST_LOG=<level> controls verbosity:

  • RUST_LOG=warn: only warnings and errors. The default when RUST_LOG is unset is info.
  • RUST_LOG=info: startup banners (selected backend, discovered volumes, mirror probes, checkpoint cadence summaries).
  • RUST_LOG=debug: per-chunk dispatch, per-checkpoint writes, per-mirror selection decisions.

RUST_LOG=debug peel <URL> -o ./out/ 2>peel-debug.log

A typical tuning workflow

  1. Run with defaults. Inspect the progress UI's download rate and "on disk" footprint.
  2. If the download rate is far below what the network should support, raise --workers (try 8, then 16).
  3. If the rate is fine but the physical disk footprint is uncomfortable, lower --max-disk-buffer and --punch-threshold.
  4. If fsync dominates wall-clock time on a slow disk, raise --checkpoint-min-bytes (try 64 MiB).
  5. Fallback warnings from --io-backend auto are expected on most non-Linux or restricted-kernel hosts. Verify with RUST_LOG=info that the blocking backend is selected, then check throughput before concluding the fallback matters.

Kubernetes init container

A Kubernetes init container that hydrates a PersistentVolumeClaim from a remote archive is a primary peel use case. The PVC is sized for the extracted contents plus a small download window, not for compressed + extracted. Resume across pod restarts is automatic.

The minimal Pod spec

apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  volumes:
    - name: model
      persistentVolumeClaim:
        claimName: model-pvc

  initContainers:
    - name: hydrate
      image: ghcr.io/agouin/peel:latest
      args:
        - https://models.example.com/llama-3.tar.zst
        - --sha256
        - ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        - --max-bandwidth
        - 200MB/s
        - -o
        - /model/
      volumeMounts:
        - name: model
          mountPath: /model

  containers:
    - name: app
      image: ghcr.io/example/model-server
      volumeMounts:
        - name: model
          mountPath: /model
          readOnly: true

Properties of this configuration:

  • PVC sizing: extracted_size + ~300 MB, not archive_size + extracted_size. A 40 GiB extracted model fits on a 41 GiB PVC.
  • Resume across pod restarts: if the init container is OOM-killed or the node reboots mid-extraction, the next pod restart picks up at the last checkpoint. The sidecars (.peel.part, .peel.ckpt) live on the PVC, so they survive the restart.
  • Integrity: --sha256 verifies the source end-to-end. A corrupted upstream produces a clear failure rather than a silently-bad model.
  • Bandwidth limiting: --max-bandwidth 200MB/s prevents the hydration from saturating shared cluster network.

Sizing the PVC

Roughly:

PVC size = extracted_size                 # the model / dataset
         + --max-disk-buffer (default 1G) # in-flight window
         + ~100 MiB                       # checkpoint + filesystem overhead
         + some slack                     # for the workload to grow
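
The same sum as a small shell helper, for use when templating a PVC request. This is a sketch: the function names are made up for the example, all sizes are in MiB, and the 100 MiB overhead and 1 GiB default buffer are the figures quoted above.

```shell
# pvc_size_mib EXTRACTED_MIB [BUFFER_MIB] [SLACK_MIB]
# Sums the terms of the sizing rule above. BUFFER_MIB defaults to the
# 1 GiB --max-disk-buffer default; SLACK_MIB defaults to 0.
pvc_size_mib() (
  extracted=$1
  buffer=${2:-1024}
  slack=${3:-0}
  overhead=100   # checkpoint + filesystem overhead, per the rule above
  echo $(( extracted + buffer + overhead + slack ))
)

# mib_to_gi_ceil MIB
# Rounds up to whole GiB and prints a Kubernetes-style quantity.
mib_to_gi_ceil() (
  echo "$(( ($1 + 1023) / 1024 ))Gi"
)
```

For a 40 GiB extracted tree with the buffer lowered to 256 MiB and no slack, `mib_to_gi_ceil "$(pvc_size_mib 40960 256)"` prints 41Gi. With the default buffer it prints 42Gi.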

If disk is tight, lower --max-disk-buffer:

args:
  - https://models.example.com/llama-3.tar.zst
  - --max-disk-buffer
  - 256MiB
  - -o
  - /model/

This tightens the lookahead cap to 256 MiB. The downloader pauses briefly whenever the network outruns the decoder. There is no correctness penalty.

Sidecars on ephemeral scratch

To keep per-pod sidecars off a shared PVC, place them on the container's writable scratch layer:

args:
  - https://models.example.com/llama-3.tar.zst
  - --workdir
  - /tmp/peel-state
  - -o
  - /model/
volumeMounts:
  - name: model
    mountPath: /model

Tradeoff: the sidecars do not survive a pod restart, so resume is lost. The next pod restart re-fetches from scratch. This is acceptable for short-running hydration and unsuitable for large archives over flaky networks.

RBAC / network policy

peel makes outbound HTTP/HTTPS to the supplied URLs. It does not talk to the Kubernetes API. The cluster's egress policy must allow the origin host(s). The peel container itself requires no elevated permissions: it runs as a normal user and requires no CAP_* capabilities.

Multi-mirror in-cluster

For intra-cluster mirrors of the same archive (e.g. a MinIO bucket inside the cluster plus a public origin outside), use --mirror:

args:
  - https://internal-cache.svc.cluster.local/llama-3.tar.zst
  - --mirror
  - https://models.example.com/llama-3.tar.zst
  - --sha256
  - ba7816bf...
  - -o
  - /model/

peel prefers the internal mirror (faster, no egress cost) and falls back to the public origin only if the internal one fails.

io_uring inside the pod

By default on Linux 5.6+, peel uses io_uring for sockets. cri-o's default seccomp profile blocks io_uring_* syscalls, so in practice peel logs one fallback warning at startup and continues with the blocking backend. Two options exist for enabling io_uring:

  1. Loosen the seccomp profile (requires securityContext.seccompProfile.type: Unconfined or a custom profile that allows io_uring_*).
  2. Accept the fallback. peel operates correctly without io_uring, with reduced throughput on high-bandwidth links.

Most clusters take option 2. Revisit if the workload is bandwidth-bound.

A complete example with secrets

apiVersion: v1
kind: Secret
metadata:
  name: archive-password
type: Opaque
stringData:
  password: my-archive-password

---
apiVersion: v1
kind: Pod
metadata:
  name: hydrated-app
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc

  initContainers:
    - name: hydrate
      image: ghcr.io/agouin/peel:latest
      env:
        - name: PEEL_PW
          valueFrom:
            secretKeyRef:
              name: archive-password
              key: password
      args:
        - https://example.com/secret.7z
        - --password-from
        - env:PEEL_PW
        - --sha256
        - ba7816bf...
        - -o
        - /data/
      volumeMounts:
        - name: data
          mountPath: /data

  containers:
    - name: app
      image: ghcr.io/example/app
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: true

The password is injected as an environment variable via the standard Secret mechanism. peel reads it via --password-from env:PEEL_PW, so the secret never appears on the command line.

Comparison with curl + tar -x

The naive shape, which has the problems described below:

# NOT recommended
- name: hydrate
  image: alpine
  command: [sh, -c]
  args:
    - |
      apk add curl tar zstd
      curl -fL "$URL" -o /tmp/data.tar.zst
      tar -I zstd -xf /tmp/data.tar.zst -C /data/
      rm /tmp/data.tar.zst

Problems:

  • PVC size: peak disk = archive_size + extracted_size. A 40 GiB extracted model needs an 80+ GiB PVC.
  • No resume: OOM-kill mid-download restarts from byte 0.
  • No integrity: curl does not verify a hash. Layering sha256sum on afterwards is a separate step.
  • Single TCP stream: parallel ranged GETs are faster on high-RTT origins.
  • Image bulk: apk add pulls packages every pod restart.

A single peel invocation addresses all of these.

CI runner with flaky network

CI runners typically have small ephemeral disks, flaky outbound network (especially in self-hosted runners behind corporate proxies), and a strong preference for fail-fast. A job that silently produces a different artifact is worse than a job that errors clearly.

peel addresses all three:

  • Bounded compressed-side disk usage via the sliding lookahead window.
  • Resume on transient network failure without losing the partial download.
  • --strict-format and --sha256 turn upstream drift into a clear exit code 1 rather than a degraded build.

GitHub Actions example

name: ml-test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install peel
        run: cargo install peel-rs --locked

      - name: Hydrate model fixtures
        run: |
          peel \
            https://fixtures.example.com/models-v3.tar.zst \
            --sha256 ${{ vars.MODELS_SHA256 }} \
            --strict-format \
            --max-disk-buffer 512MiB \
            -o ./fixtures/

      - name: Run tests
        run: cargo test --release

Flag behavior:

  • --sha256 ${{ vars.MODELS_SHA256 }}: the expected hash lives in the repo's Actions variables, so a wrong-fixture upload fails CI immediately. The team ratchets the hash forward by uploading new fixtures and then updating the variable.
  • --strict-format: if the upstream URL ever serves a different shape (e.g. a 404 HTML page with a 200 status code from a misbehaving proxy), the run fails clearly instead of producing a corrupt fixtures directory.
  • --max-disk-buffer 512MiB: GitHub-hosted runners have ~14 GB free. Capping the lookahead avoids transient disk pressure during hydration.

GitLab CI example

test:
  stage: test
  image: rust:1.93
  before_script:
    - cargo install peel-rs --locked
  script:
    - >
      peel
      "$FIXTURE_URL"
      --sha256 "$FIXTURE_SHA256"
      --strict-format
      --max-disk-buffer 256MiB
      -o ./fixtures/
    - cargo test --release
  variables:
    FIXTURE_URL: https://fixtures.example.com/models-v3.tar.zst
    FIXTURE_SHA256: ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  cache:
    paths:
      - fixtures.peel.part
      - fixtures.peel.ckpt

The cache directive retains fixtures.peel.part and fixtures.peel.ckpt between runs. If a previous run was interrupted partway through (timeout, runner restart), the next run resumes from the checkpoint, saving network bandwidth and wall-clock on every retry.

Self-hosted runner behind a corporate proxy

A frequent CI failure mode is a self-hosted runner behind an HTTPS proxy that does TLS termination with its own CA. peel honours SSL_CERT_FILE:

- name: Hydrate fixtures
  env:
    HTTPS_PROXY: https://proxy.corp.example.com:8443
    SSL_CERT_FILE: /etc/ssl/certs/corp-bundle.pem
  run: peel "$FIXTURE_URL" --sha256 "$FIXTURE_SHA256" -o ./fixtures/

If the proxy mangles HTTP/2 (the most common cause of intermittent hydration failures on locked-down corporate networks), force HTTP/1.1:

run: peel "$FIXTURE_URL" --http-version h1 -o ./fixtures/

Caching the extracted output

If the CI cache supports it, cache the extracted output directly rather than only the sidecars:

- uses: actions/cache@v4
  id: cache    # required so the step below can read steps.cache.outputs
  with:
    path: ./fixtures/
    key: fixtures-${{ vars.MODELS_SHA256 }}

- if: steps.cache.outputs.cache-hit != 'true'
  run: peel "$URL" --sha256 "$SHA256" -o ./fixtures/

Hydration runs only when the cache misses. The cache key includes the SHA-256, so an updated fixture set automatically invalidates the cache.

Failing the build on upstream drift

The combination of --sha256 + --strict-format is the strongest guarantee:

Failure | --sha256 catches? | --strict-format catches?
Upstream re-uploaded a corrupted file | yes | no
Upstream serves a 200 status on a 404 HTML body | yes | yes
Upstream changed the format (.tar.zst → .tar.gz) | yes | yes
Upstream re-uploaded a legitimately-different file | yes | no
Mirror is serving stale content | yes | no

Use both in CI. Omit them only when downloading a non-deterministic resource by intent.

Comparison with actions/cache

If the CI has a well-managed artifact cache (sized, verified, mirrored), and the archive is small enough that download time is not a concern, actions/cache (or actions/cache/restore, or the CI's equivalent) is simpler. peel is preferable when:

  • The archive is large enough that hydration time matters.
  • End-to-end verification of the source is required, not just the cache.
  • The CI's cache TTL is shorter than the fixture's lifetime, so cache misses force a re-hydration where bounded disk and resume matter.
  • Integration is from outside the CI (e.g. a pre-job step in a test orchestrator that lacks CI-native caching).

Exit code handling

CI scripts want to distinguish "fixture hydration failed transiently" from "fixture is wrong":

#!/usr/bin/env bash
set -u

peel "$URL" --sha256 "$SHA256" --strict-format -o ./fixtures/
rc=$?

case "$rc" in
  0)   echo "fixtures ready"; exit 0 ;;
  1)
    # Generic failure: could be transient network, disk full, hash mismatch.
    # Check stderr to distinguish. For CI, retry once.
    echo "first attempt failed; sleeping 10s then retry"
    sleep 10
    peel "$URL" --sha256 "$SHA256" --strict-format -o ./fixtures/
    ;;
  *)
    echo "peel failed with $rc; not retrying"
    exit "$rc"
    ;;
esac

See Exit codes for the full list.
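
For more than one retry, the same policy can be factored into a wrapper. A minimal sketch, assuming that exit code 1 is the only code worth retrying (as in the script above). The function name retry_on_generic_failure and its arguments are hypothetical, not part of peel.

```shell
# retry_on_generic_failure MAX_ATTEMPTS DELAY_SECS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, retrying only when it exits with code 1.
# Any other non-zero code (for example 130 after Ctrl-C) is returned at once.
retry_on_generic_failure() {
  _max=$1
  _delay=$2
  shift 2
  _attempt=1
  while :; do
    _rc=0
    "$@" || _rc=$?
    [ "$_rc" -eq 0 ] && return 0
    if [ "$_rc" -ne 1 ] || [ "$_attempt" -ge "$_max" ]; then
      return "$_rc"
    fi
    echo "attempt $_attempt failed with exit code 1; retrying in ${_delay}s" >&2
    _attempt=$(( _attempt + 1 ))
    sleep "$_delay"
  done
}
```

Usage would look like `retry_on_generic_failure 3 10 peel "$URL" --sha256 "$SHA256" --strict-format -o ./fixtures/`. Each retry resumes from the sidecars rather than starting over. Because a hash mismatch also surfaces as exit code 1, the attempt bound is what keeps a genuinely wrong fixture from looping.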

Arbitrum snapshot bundle

Arbitrum publishes Nitro chain snapshots as multi-part archives: pruned.tar.part0000 through pruned.tar.partNNNN, each ~5–15 GiB, totalling 200–500 GiB depending on chain and pruning mode. Per-part SHA-256s are published in a manifest.

This workload uses peel's multi-part URL path. The bundled scripts/arb-snapshot.sh in the repo wraps it.

The manual version

peel \
  https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0000 \
  https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0001 \
  https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0002 \
  ... \
  --sha256 0a8de6e83fd8ba040fd052fd8d4fd0e009a9736ace5cb32bb2abd4ac6a61725d \
  --sha256 1bcf4d2e9aa01ff5e8aa72a2ab39310af020bdb6f76d6f7c75c7c14ade38c6ce \
  --sha256 c40bf8a2cb9d9a90e4c80a5b7c6e9c5d3b8a2e1f9d4a6c1b7e2f8d3a5c0b9e1f0 \
  ... \
  -o ./nova-out/

The byte-concatenation of every URL's body is decoded as one logical pruned.tar, written into ./nova-out/. Per-part hashes are verified at each part boundary as the decoder advances.

Via a manifest file

For chains with dozens of parts, a per-line manifest is cleaner:

# nova-volumes.txt
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0000
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0001
https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar.part0002
# ... etc

peel @nova-volumes.txt -o ./nova-out/
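
When the part count is known, the manifest can be generated rather than typed. A small sketch, assuming the four-digit, zero-based .partNNNN suffix shown above. make_manifest is a hypothetical helper, not something shipped with peel.

```shell
# make_manifest BASE_URL COUNT
# Prints COUNT lines: BASE_URL.part0000, BASE_URL.part0001, ...
make_manifest() (
  base=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s.part%04d\n' "$base" "$i"
    i=$(( i + 1 ))
  done
)
```

For example, `make_manifest https://snapshot.arbitrum.io/nova/2026-04-26-7efe0f23/pruned.tar 3 > nova-volumes.txt` writes the three URLs listed above. The real part count has to come from the publisher's manifest.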

For hashes, generate a --sha256 arg list from the published manifest:

peel @nova-volumes.txt \
  $(jq -r '.parts[] | "--sha256 \(.sha256)"' nova-manifest.json) \
  -o ./nova-out/
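
Because per-part hashes are matched to parts, a count mismatch between the URL list and the hash list is worth catching before a multi-hour run starts. The check below is a sketch: count_entries and manifest_matches_hashes are hypothetical helpers, the one-hash-per-line file is an assumed format, and lines starting with # are treated as comments on the assumption that the manifest example above uses them that way.

```shell
# count_entries FILE
# Prints the number of lines that are neither blank nor '#' comments.
count_entries() (
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d '[:space:]'
)

# manifest_matches_hashes MANIFEST HASHES
# Succeeds when both files hold the same number of entries.
manifest_matches_hashes() (
  [ "$(count_entries "$1")" -eq "$(count_entries "$2")" ]
)
```

A guard such as `manifest_matches_hashes nova-volumes.txt nova-hashes.txt || exit 1` ahead of the peel invocation turns an off-by-one in the lists into an immediate failure.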

Disk math

A typical Nitro Nova snapshot:

  • Total compressed (sum of all parts): ~120 GiB
  • Extracted: ~340 GiB

With peel, peak disk = extracted_size + lookahead_window ≈ 341 GiB. The --max-disk-buffer default (1 GiB) bounds the compressed-side window.

Without peel (download-all-then-extract): peak disk = compressed_size + extracted_size ≈ 460 GiB.

This saves roughly 120 GiB on a single node. For a fleet, the multiplier matters. For a one-node bootstrapping flow on tight disk, it is the difference between "works" and "does not fit."
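
The same arithmetic as a reusable sketch, with all sizes in GiB. The function names are illustrative, and the 1 GiB window is the --max-disk-buffer default mentioned above.

```shell
# peak_with_peel EXTRACTED_GIB [WINDOW_GIB]
# Peak disk when streaming: extracted tree plus the lookahead window.
peak_with_peel() (
  echo $(( $1 + ${2:-1} ))
)

# peak_download_then_extract COMPRESSED_GIB EXTRACTED_GIB
# Peak disk when the whole archive lands on disk before extraction.
peak_download_then_extract() (
  echo $(( $1 + $2 ))
)
```

With the Nova figures, `peak_with_peel 340` prints 341 and `peak_download_then_extract 120 340` prints 460.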

On Kubernetes

Snapshot hydration matches the Kubernetes init container workflow. The PVC sizes to ~extracted_size + 1 GiB instead of ~compressed + extracted, and a pod restart mid-hydration resumes at the last checkpoint:

initContainers:
  - name: hydrate-nova
    image: ghcr.io/agouin/peel:latest
    args:
      - "@/manifest/nova-volumes.txt"   # quoted: a plain YAML scalar cannot start with @
      - --sha256-from-file=/manifest/nova-hashes.txt   # (planned)
      - --max-bandwidth
      - 500MB/s
      - -o
      - /chain/
    volumeMounts:
      - name: chain
        mountPath: /chain
      - name: manifest
        mountPath: /manifest
        readOnly: true

(--sha256-from-file is on the roadmap; for now, expand inline via shell substitution.)

Bandwidth limiting

Arbitrum snapshot mirrors are CloudFront-fronted with generous burst allowances, but a fleet of nodes hydrating simultaneously will hit rate-limits. The default --workers 4 is conservative. Raise to --workers 8 on a fat pipe when needed. Add --max-bandwidth 500MB/s to bound aggregate throughput.

Recovery from kill -9

Snapshot hydration is a long-running, interruptible workload. Power loss, OOM, scheduler eviction, and upstream rate-limit-induced retries are all normal.

In every case, re-run the same command:

peel @nova-volumes.txt --sha256 ... -o ./nova-out/

peel reads the .peel.ckpt next to ./nova-out, picks up at the checkpointed part and byte, and continues. Bytes already extracted to ./nova-out/ are kept, not re-written. Final output is byte-identical to a clean single run.

Bare downloader (aria2c replacement)

peel --no-extract is a parallel-ranged-GET downloader with mirror fan-out, SHA-256 verification, and resume. It covers the HTTP(S) download surface of aria2c and simply skips peel's extract step.

The basic case

peel https://example.com/big-file.iso --no-extract

Behavior:

  • Issues 4 parallel ranged GETs against the URL.
  • Writes the bytes to <basename>.peel.part (sparse).
  • On clean completion, renames to the final filename (big-file.iso).

The bytes never pass through a decoder. No hole-punching occurs, since no decoder advances the puncher. The part-file grows to the full Content-Length.

--download-only is an alias for callers who prefer aria2c-style naming.

With explicit output path

peel https://example.com/big-file.iso --no-extract -o /downloads/
peel https://example.com/big-file.iso --no-extract -o /downloads/renamed.iso

Same semantics as extract mode: a trailing slash makes -o a directory, otherwise it is the final file path.

With hash verification

peel https://example.com/big-file.iso \
  --no-extract \
  --sha256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

The SHA-256 is checked against the downloaded bytes. This is the same hash that sha256sum big-file.iso would produce after the download finishes, without the separate hash step.

With mirror fan-out

peel https://primary.example.com/big-file.iso \
  --mirror https://eu.mirror.example.com/big-file.iso \
  --mirror https://us.mirror.example.com/big-file.iso \
  --sha256 ba7816bf... \
  --no-extract

Everything described in Multi-mirror downloads applies: parallel HEAD validation at startup, per-mirror health tracking, 30s exclusion on failure, aggregate bandwidth cap.

With bandwidth cap

peel https://example.com/big-file.iso \
  --no-extract \
  --max-bandwidth 10MB/s

Useful when:

  • Downloading on a shared link where saturating the pipe is disruptive.
  • Cron-scheduled downloads that should run at steady-state rather than burst-and-idle.

With resume across kills

--no-extract has the same resume guarantee as extract mode. Ctrl-C / kill -9 / network drop / OOM:

peel https://example.com/big-file.iso --no-extract
# ... interrupted at 40% ...

peel https://example.com/big-file.iso --no-extract
# Picks up where it left off, completes the rest.

The sidecars (big-file.iso.peel.part and big-file.iso.peel.ckpt) stay on disk between runs.

Choosing peel --no-extract over aria2c

Need | Tool
Parallel ranged GETs | both
Resume on kill -9 | both
Multiple URLs treated as one logical file | both (aria2c -Z, peel's multi-part URL path)
Multiple URLs serving the same file (mirror fan-out) | both
SHA-256 verification | both
Hand-rolled, vetted single-binary install | peel
Out-of-band integration with a streaming extract step | only peel (toggling --no-extract switches to default extract mode)
Bittorrent / Metalink / multi-protocol | only aria2c
Browser-style cookie handling, OAuth, etc. | only aria2c

peel --no-extract applies when callers want the streaming/resumable/parallel guarantee without a separate extract step. For a plain download (no archive) where the eventual extract is not a concern, it offers aria2c-level UX in one binary.

A typical script

#!/usr/bin/env bash
set -euo pipefail

URL=https://example.com/big-file.iso
SHA256=ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
OUT=/downloads/big-file.iso

peel "$URL" \
  --no-extract \
  --sha256 "$SHA256" \
  --max-bandwidth 50MB/s \
  -o "$OUT"

echo "Downloaded and verified: $OUT"

Re-running this script after any failure resumes from the last checkpoint. Re-running it after a clean completion is not a no-op: the file is already at $OUT, but the sidecars are gone, so the next invocation downloads from scratch because there is no checkpoint to resume from.

For "only download if not already present" semantics, wrap with a check:

if [ ! -s "$OUT" ]; then
  peel "$URL" --no-extract --sha256 "$SHA256" -o "$OUT"
fi

(The -s test checks for non-empty, which catches both "missing" and "empty partial".)
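
A stricter variant re-hashes an existing file before deciding to skip, which also catches a truncated or stale copy. This is a sketch under assumptions: sha256_of and needs_download are hypothetical helper names, and it relies on sha256sum (GNU coreutils) or shasum (macOS) being installed.

```shell
# sha256_of FILE
# Prints the lowercase hex SHA-256 digest of FILE.
sha256_of() (
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
)

# needs_download FILE EXPECTED_SHA256
# Succeeds (exit 0) when FILE is missing, empty, or hashes to something else.
needs_download() (
  [ -s "$1" ] || exit 0
  [ "$(sha256_of "$1")" != "$2" ]
)
```

The guard then becomes `if needs_download "$OUT" "$SHA256"; then peel "$URL" --no-extract --sha256 "$SHA256" -o "$OUT"; fi`. Re-hashing a large file costs a full read, so the plain -s test remains the cheaper choice when staleness is not a concern.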

Non-goals

  • Torrent client. No DHT, no peers.
  • Protocol-coercing tool. HTTP / HTTPS only.
  • Auth-aware downloader. No OAuth flow, no browser-cookie import. For URLs that require auth, pre-sign or pass a custom Authorization header via a reverse proxy. peel honours HTTP_PROXY / HTTPS_PROXY / NO_PROXY env vars.

Troubleshooting

Symptoms, likely causes, and verification steps. For problems not listed here, see Exit codes for the error code key and FAQ for design-rationale questions.

"No space left on device"

The extraction filled the output filesystem. Two causes:

  1. The extracted tree is genuinely bigger than the free space. Check the archive's expected uncompressed size (most formats report it in their metadata, and peel logs it on the progress UI's first line). If that exceeds free space, more disk is required: the sliding window only bounds the compressed side.

  2. The part-file's lookahead window grew faster than the decoder consumed it. Lower --max-disk-buffer (default 1 GiB) so the scheduler back-pressures sooner:

    peel <URL> --max-disk-buffer 128MiB -o ./out/
    

    Then confirm hole-punching is working (see the next section).

"Hole-punching seems disabled / part-file is huge"

Check the part-file's physical size (du -h) versus its logical size (ls -la):

$ ls -la out.peel.part out.peel.ckpt
-rw-r--r--  ...  10737418240 ... out.peel.part   # 10 GiB logical
$ du -h out.peel.part
182M    out.peel.part                           # 182 MiB physical (healthy)

If du is close to ls, hole-punching is not trimming. Possible causes:

  • -k/--keep-archive is set. The puncher is intentionally disabled. Remove -k if archive preservation is not required.

  • --no-extract is set. Nothing decodes, so nothing punches. Expected for --no-extract: the bytes are kept verbatim.

  • The filesystem does not support punch-hole. Some unusual mounts and old kernels reject it. peel logs a warn! at startup when the probe fails:

    WARN  filesystem rejected MADV_REMOVE probe, falling back to fallocate(PUNCH_HOLE)
    WARN  filesystem rejected fallocate(PUNCH_HOLE) probe, source bytes will not be released
    

    Move the workdir to a filesystem that supports it (--workdir /var/tmp/peel), or accept the larger transient footprint.
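
The du-versus-ls comparison can be scripted for monitoring. A rough sketch: physical_percent and part_file_percent are hypothetical helpers, and what counts as a healthy percentage is a judgment call rather than a peel-defined threshold.

```shell
# physical_percent LOGICAL_BYTES PHYSICAL_KIB
# Prints physical size as a whole-number percentage of logical size.
physical_percent() (
  logical=$1
  physical_kib=$2
  [ "$logical" -gt 0 ] || { echo 0; exit 0; }
  echo $(( physical_kib * 1024 * 100 / logical ))
)

# part_file_percent FILE
# Same figure measured from a file: wc -c for logical bytes,
# du -k for allocated KiB.
part_file_percent() (
  logical=$(wc -c < "$1" | tr -d '[:space:]')
  physical_kib=$(du -k "$1" | cut -f1)
  physical_percent "$logical" "$physical_kib"
)
```

With the figures from the listing above (10737418240 logical bytes, 182 MiB = 186368 KiB physical), `physical_percent 10737418240 186368` prints 1. A value near 100 means the puncher is not trimming.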

"io_uring fallback warning"

On Linux, one of these messages may appear:

WARN  io_uring_setup failed (errno=1 EPERM), falling back to blocking sockets
WARN  io_uring not available (kernel < 5.6), falling back to blocking sockets
WARN  RLIMIT_MEMLOCK too low for io_uring (need at least N KiB), falling back to blocking sockets

These are informational, not errors. peel falls back to the blocking backend and continues. The fallback path is the same code every non-Linux build uses; results are correct either way.

To force io_uring and fail-fast when it is not available:

peel <URL> --io-backend uring -o ./out/

Common causes:

  • Seccomp profile blocks the syscalls. cri-o's default profile is the most common case under Kubernetes. Add io_uring_* to the allowed syscalls or accept the fallback.
  • Kernel too old. Minimum 5.6 for the SQEs peel uses. In practice 5.10+ is more reliable.
  • RLIMIT_MEMLOCK too low. Container default may be 64 KiB, while io_uring rings need a few MiB. Raise the limit (ulimit -l unlimited in the container spec) or accept the fallback.
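
A pre-flight check can predict the kernel-version warning before peel starts. The sketch below only parses a uname -r style string and compares it with the 5.6 minimum quoted above. kernel_at_least is a hypothetical helper, and passing the check does not guarantee io_uring is usable, since seccomp or RLIMIT_MEMLOCK can still block it.

```shell
# kernel_at_least RELEASE MAJOR MINOR
# RELEASE is a `uname -r` style string such as 5.15.0-91-generic.
# Succeeds when RELEASE is at least MAJOR.MINOR.
kernel_at_least() (
  release=$1
  want_major=$2
  want_minor=$3
  major=${release%%.*}          # text before the first dot
  rest=${release#*.}            # text after the first dot
  minor=${rest%%[!0-9]*}        # leading digits of the remainder
  [ "$major" -gt "$want_major" ] && exit 0
  [ "$major" -eq "$want_major" ] && [ "${minor:-0}" -ge "$want_minor" ]
)
```

A typical use is `kernel_at_least "$(uname -r)" 5 6 || echo "kernel older than 5.6: expect the blocking backend" >&2`. Running `ulimit -l` alongside it shows the locked-memory limit the process will inherit.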

"Wrong digest at completion"

error: digest mismatch
  expected: ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  got:      e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

The streamed bytes did not hash to the value asserted by --sha256. Possibilities:

  1. The source moved or was re-uploaded. Check the response headers for a changed ETag or Last-Modified (curl -sI "$URL"), then compare the hash the publisher currently advertises with the one passed to --sha256.
  2. Wrong hash supplied. Double-check the hash source.
  3. A --mirror is serving subtly different bytes. Remove mirrors one at a time to identify the culprit. peel drops misbehaving mirrors at the HEAD validation step, but some forms of subtle corruption only show up post-decoder.

"Source changed during run"

error: source changed during run
  chunk 1247 fingerprint mismatch: stored=…, refetched=…

The CRC32C fingerprint of a chunk does not agree across fetches. The source bytes changed between when one worker fetched the chunk and when another did. Causes:

  • CDN-edge cache drift. Common with mirror infrastructure that is mid-rollout. Wait for propagation, then re-run.
  • Origin re-uploaded the file. Check the upstream publishing timeline.
  • Network corruption. Rare with TLS, but observed on some middlebox-heavy paths. Repeat the run; the second attempt usually succeeds.

Delete the .peel.ckpt to start fresh from the part-file's bytes (which peel re-verifies chunk-by-chunk), or delete both sidecars to start completely from scratch.

"ETag mismatch on resume"

error: source identity changed since last run
  ETag at startup: "8b1a9953c4611296a827abf8c47804d7"
  ETag now:        "65a8e27d8879283831b664bd8b7f0ad4"

The source's ETag (or Last-Modified) changed between the start of the run and the resume. peel aborts rather than silently mixing bytes from two different versions of the file.

Two fixes:

  1. Delete the sidecars and start from scratch. The bytes already on disk belong to the previous version and are not useful.
  2. Pass --sha256 of the new version. With --sha256 set, the hash is the source of truth, and agreeing mirrors are trusted regardless of ETag drift.
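
To see whether the source is still changing before choosing a fix, the current ETag can be pulled out of a HEAD response. A minimal sketch: etag_from_headers is a hypothetical helper that reads raw response headers on stdin, so it works with any client that can print them.

```shell
# etag_from_headers
# Reads HTTP response headers on stdin and prints the first ETag value,
# quotes included. The header name is matched case-insensitively.
etag_from_headers() (
  tr -d '\r' | awk -F: 'tolower($1) == "etag" {
    sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading blanks
    print
    exit
  }'
)
```

Running `curl -sI "$URL" | etag_from_headers` twice, a few minutes apart, shows whether the value has settled. If it keeps changing, deleting the sidecars will not help until the upstream rollout finishes.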

"Multi-volume probe returned 404s"

warn: multi-volume probe: backup.part0005.rar returned 404, stopping at 4 volumes

This is normal: auto-discovery stops at the first missing volume. The reported count is what peel will fetch.

If more volumes are expected than the probe found:

  • The volumes may be on a different host or path.
  • The numbering may have a gap.
  • The volumes may use a different convention than the seed implies.

Use the explicit positional list or an @manifest.txt file instead.

"Wrong format detected"

error: format mismatch: URL suffix says .tar.zst, magic bytes are 0x1f 0x8b (gzip)
       pass --force-format-from-magic to trust magic, or --format <NAME> to pin

The URL suffix and the magic bytes disagree. Three options:

  1. Trust the magic when the file is known to match its bytes: --force-format-from-magic.
  2. Pin the decoder when the expected format is known: --format zstd (or whichever).
  3. Investigate the source. The file may have been re-encoded without renaming, or the URL is genuinely serving the wrong file.

"Permission denied" writing the output

The output's parent directory is not writable for the user running peel. The error message names the path:

error: cannot create output directory ./out/: Permission denied (os error 13)

peel does not elevate privileges. Run as a user with write permission on the output path, or use --workdir to relocate only the sidecars to a writable location while writing the final extracted tree to a location the user owns.

"Output file already exists" (or seems to)

peel overwrites the extracted output:

  • For tree-shaped outputs (tar, zip, 7z, rar), existing files at paths matching an archive entry are overwritten. Existing files at paths not in the archive are left alone.
  • For stream-shaped outputs (raw .zst, .xz, .lz4, .gz), the output file is overwritten unconditionally.

For a non-destructive run, point -o at a fresh directory.

"I want to interrupt and resume later"

Press Ctrl-C (or send SIGTERM). peel traps the signal, flushes the in-flight checkpoint, and exits with code 130 (SIGINT) or 143 (SIGTERM). The sidecars stay on disk. Re-run the exact same command to resume.

kill -9 (SIGKILL) is also safe. peel is designed so that even an ungraceful kill leaves the part-file's bytes and the last checkpoint in a consistent state, and the next run reconciles.
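
A supervising script can use these codes to decide whether a plain re-run is the right response. The sketch below relies on the usual shell convention that a status above 128 encodes 128 + signal number. signal_of_exit and was_interrupted are hypothetical helper names, and treating 137 (SIGKILL as reported by the parent shell) as resumable follows from the paragraph above.

```shell
# signal_of_exit RC
# Prints the signal number encoded in a shell exit status, or 0 if none.
signal_of_exit() (
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
)

# was_interrupted RC
# Succeeds for SIGINT (130), SIGKILL (137), and SIGTERM (143).
was_interrupted() (
  case "$(signal_of_exit "$1")" in
    2|9|15) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

After `peel ...; rc=$?`, a check such as `was_interrupted "$rc" && echo "interrupted; re-run the same command to resume"` separates interruptions from genuine failures.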

"Where do the logs go?"

stderr. The progress UI block goes to stderr as well, redrawn in place on a TTY. To capture:

peel <URL> -o ./out/ 2>peel.log                    # only log
peel <URL> -o ./out/ 2> >(tee peel.log >&2)        # log and show
RUST_LOG=debug peel <URL> -o ./out/ 2>peel.log     # verbose

"Live progress UI shows wrong percentage"

The percentage is streamed_bytes / Content-Length. Two cases to be aware of:

  • For multi-part URLs, the denominator is the sum of all parts' Content-Length values (accurate).
  • For a server that does not return Content-Length (rare, mostly badly-configured proxies), peel falls back to a chunk-count progress estimate. The percentage will be approximate.

Getting better diagnostics

Always run with RUST_LOG=info (or RUST_LOG=debug) when filing a bug report:

RUST_LOG=debug peel <URL> -o ./out/ 2>peel-debug.log

The first few lines list the selected backends, the discovered volumes and mirrors, and the format detection result. Misbehaviour typically shows up there.

FAQ

Design-rationale notes and answers to common "why does peel do X" questions.

No --password=<value> flag

argv is visible to every process on the host via ps, /proc/<pid>/cmdline, and Get-Process -IncludeUserName. A passphrase on the command line can be read by:

  • Any unprivileged process on the host (until the process exits).
  • Anything that scrapes process listings: monitoring agents, shell history collectors, exit-code-replaying scripts.
  • Container observability tools that log process state on crash.

--password-from keeps the passphrase out of argv. env:NAME, file:PATH, and fd:N integrate with non-interactive workflows. For a single-line non-interactive invocation, PEEL_PW=… peel … --password-from env:PEEL_PW is roughly two dozen characters longer than --password=… and avoids the visibility problem.
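
The difference between the two forms can be seen without running peel at all. In the sketch below, show_argv is a hypothetical stand-in for any program: it prints the argument list it received, which is the part that process listings expose. The passphrase hunter2 is a placeholder.

```shell
# show_argv is a hypothetical stand-in for any program: it prints exactly
# the arguments it received, which is what ps or /proc/<pid>/cmdline exposes.
show_argv() { printf '%s\n' "$*"; }

# Passphrase in argv: the secret is part of the argument list.
# (--password is shown only for contrast; peel has no such flag.)
show_argv --password=hunter2

# Passphrase in the environment: argv carries only the variable name.
PEEL_PW=hunter2 show_argv --password-from env:PEEL_PW
```

The first call prints the secret itself. The second prints only --password-from env:PEEL_PW.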

Why .bz2 support was added in round two

Round one shipped without .bz2 on the basis that bzip2 is a slow, single-threaded codec superseded by xz (better ratio) and zstd (faster). That priors-only argument held until a real corpus arrived as .tar.bz2 and bunzip2 | peel /dev/stdin discarded the streaming and resume properties peel exists to deliver: the source side's on-disk footprint became unbounded across the workaround pipe, and a mid-extraction kill -9 restarted the whole extraction.

.bz2, .tar.bz2, .tbz2, and .tbz are now first-class formats. The pipeline matches the other compressed .tar.* codecs: parallel ranged HTTP downloads, in-flight streaming decompression, fallocate(PUNCH_HOLE) reclaim of the compressed source as the decoder advances, and per-block frame-aligned checkpointing so a crash mid-extraction resumes exactly where it left off. See internal/PLAN_bz2_support.md for the engineering plan and for the items deferred from this round (randomised blocks, mid-block resume, parallel block decoding).

No raw lzma support (only xz)

XZ is the modern container format wrapping LZMA2. Raw .lzma (without the XZ headers) was the LZMA1 era's format and is rare in modern publishing. peel's decoder is per-cycle equivalent to liblzma on the XZ path. Adding the raw LZMA1 framing is in the backlog but does not fit the streaming-from-HTTP workflow peel targets.

No nested archive handling

Each invocation of peel extracts one archive. Chain invocations for a nested archive:

peel https://host/outer.tar.zst -o ./outer/
peel ./outer/inner.zip -o ./final/                   # local mode

Nested-archive auto-detection adds an order of magnitude more complexity (filesystem walking, recursion limits, archive bombs) for no compelling user-facing win.

Reason for --no-extract

Three things peel --no-extract provides that plain curl does not:

  1. Parallel ranged GETs, like aria2c. curl has --parallel but it parallelises over many URLs, not over ranges of one URL.
  2. Resume after kill -9, with checkpointed state. curl -C - resumes a single in-flight transfer and does not survive a kill that lost the file descriptor.
  3. Mirror fan-out and SHA-256 verification. --mirror's per-mirror health tracking and aggregate token-bucket bandwidth cap are built in.

Use curl for one-off "download this one file fast". Use peel --no-extract for parallel-GET, resume, mirror failover, or hash verification (the full aria2c use case).

--mirror failover is parallel, not sequential

Modern CDN topologies are mostly symmetric: any mirror should serve any byte. Parallel scheduling across mirrors gives the aggregate bandwidth of all of them. Sequential failover wastes that.

When a mirror starts failing, the scheduler excludes it for 30 s and rebalances. The exclusion is logged, which helps when debugging which mirror went out of rotation.

For true sequential failover (mirror 2 used only if mirror 1 is totally unreachable), wrap with shell logic:

peel "$PRIMARY"  -o ./out/ ||
peel "$BACKUP_1" -o ./out/ ||
peel "$BACKUP_2" -o ./out/

Corporate proxy support

peel honours the standard HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables for outbound requests, plus SSL_CERT_FILE for trust-store overrides.

For TLS errors with a corporate CA, point SSL_CERT_FILE at the bundle that includes it:

SSL_CERT_FILE=/etc/ssl/certs/corp-bundle.pem peel <URL> -o ./out/

H2 through a corporate proxy is the most fragile combination. --http-version h1 is the usual workaround when an H2-aware proxy is doing something subtly wrong.
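
Put together, a proxy setup might look like the sketch below. The proxy host, port, and NO_PROXY entries are placeholders and must be replaced with the values for the network in question. Only the variable names and the bundle path come from the text above.

```shell
# Placeholder values: substitute the real proxy address and exclusions.
export HTTPS_PROXY="http://proxy.corp.example:3128"
export HTTP_PROXY="$HTTPS_PROXY"
export NO_PROXY="localhost,127.0.0.1,.corp.example"

# Trust store that includes the corporate CA (path from the example above).
export SSL_CERT_FILE="/etc/ssl/certs/corp-bundle.pem"
```

With these exported, peel <URL> -o ./out/ --http-version h1 combines the proxy settings with the H1 workaround.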

No Homebrew formula

The crates.io publish is the primary distribution path. The GitHub release attachments cover platforms where cargo install is not convenient. A Homebrew formula is on the wish list but not yet in place. PRs welcome at https://github.com/agouin/peel.

Windows support

Not officially supported. The blocking backend and the codec machinery are platform-neutral and should work, but:

  • io_uring is Linux-only.
  • mmap with MADV_REMOVE is Linux-only.
  • The progress UI's terminal handling is tested on TTYs that behave like xterm. cmd.exe is not in the test grid.
  • --password-from prompt reads /dev/tty, which does not exist on Windows.

WSL2 is a reasonable workaround and provides the full Linux path. Native Windows support is open for contribution.

Large on-disk part-file logical size

The part-file is sparse. The logical size is the full archive length (ls -la shows the full size), while the physical size on disk is the in-flight window (du -h shows actual usage):

$ ls -la out.peel.part
-rw-r--r--  ...  10737418240 ... out.peel.part   # 10 GiB logical
$ du -h out.peel.part
182M    out.peel.part                           # 182 MiB physical

Tools that ignore sparse files (some backup tools, some tar implementations) see the logical size. The actual disk usage is the physical size.
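
The logical/physical split can be reproduced with any sparse file, independent of peel. The sketch below assumes a filesystem that supports sparse files and a truncate utility (GNU coreutils or similar). The 10 MiB size is arbitrary.

```shell
# Create a 10 MiB file that consists entirely of a hole.
f=$(mktemp)
truncate -s 10485760 "$f"

# Logical size: what ls -la reports (bytes addressable in the file).
logical_bytes=$(wc -c < "$f" | tr -d ' ')

# Physical size: what du reports (blocks actually allocated), in KiB.
physical_kb=$(du -k "$f" | cut -f1)

echo "logical: ${logical_bytes} bytes, physical: ${physical_kb} KiB"
rm -f "$f"
```

On a sparse-capable filesystem the physical figure is far below the logical one. The same relationship holds for out.peel.part.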

Choosing --max-disk-buffer

The default of 1 GiB rarely engages on a healthy disk. Tune it when:

  • A hard ceiling on transient disk usage is required (CI runner with small /tmp).
  • The network is much faster than the disk and the lookahead grows unboundedly before the decoder catches up.

Common values: 256 MiB on memory-constrained or disk-constrained containers; the default 1 GiB on a laptop or server; no cap (--max-disk-buffer none) on a high-bandwidth host where the network burst should be absorbed fully into the buffer.

Bench grid platform

The README's bench grid is single-machine, single-run on an Apple M4 Max. macOS was chosen because:

  1. The reference CLIs (zstd, xz, lz4, gzip, 7z, unzip, unrar) are all available as Homebrew packages with stable versions.
  2. peel's blocking backend is in use (no io_uring on macOS), so the grid measures the codec story alone. Linux-specific fast paths (mmap, io_uring) provide additional gains on top.

A Linux grid with the io_uring backend is in internal/bench-results/.

Licensing

MIT OR Apache-2.0, at the user's option. The full text is at LICENSE-MIT and LICENSE-APACHE.

The RAR3 and RAR5 decoders are clean-room implementations. RARLAB's unrar source has not been consulted at any point. See Supported formats §RAR provenance.

Filing bugs or feature requests

GitHub Issues: https://github.com/agouin/peel/issues. Include the output of peel --version, the command that was run, and (if applicable) a RUST_LOG=debug log.