Ingest from a URL

at1 fetch pulls a file from a URL and writes a verified .at1 in one step. For line-oriented data the HTTP body is fed through the encoder in flight, so the raw plaintext is never written to disk: peak disk usage is the compressed output, not the original. For example, you can move a 50 GB log out of object storage and only ever land the compressed copy.

at1 fetch https://logs.example.com/app.log  app.at1
at1 fetch https://data.example.com/events.csv  events.at1   columnar
at1 fetch https://1000genomes.example/chr22.vcf chr22.at1   vcf  --backend zstd
  • Codec: pass one explicitly, or omit it and AT-1 infers it from the URL extension or the Content-Type header.
  • Plaintext never stored: for the line-streamable codecs (log, columnar, vcf, ssh, json), the download is compressed chunk by chunk and only the .at1 is written.
  • Verified as it lands: every chunk is decoded back and checked against the source bytes during compression, and a SHA-256 integrity trailer computed over the original is written; on any mismatch, AT-1 refuses to trust the output.
  • Backends: --backend xz (maximum ratio, default) or zstd (fast); set the chunk size with --chunk-lines.
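
The streaming behaviour described above can be illustrated with a minimal Python sketch. It is not AT-1's implementation or file format: the function names, the length-prefixed chunk framing, and the trailer layout are all assumptions made for illustration, and the standard-library lzma module stands in for the real encoder. It shows only the general idea: read a fixed number of lines, compress them, decode them back to check against the source bytes, and finish with a SHA-256 digest of the original.

```python
import hashlib
import io
import lzma
import struct
from itertools import islice


def compress_stream(src, dst, chunk_lines=1000):
    """Compress a line-oriented byte stream chunk by chunk (illustrative only).

    src: binary file-like object that yields lines when iterated.
    dst: binary file-like object that receives the compressed chunks.
    Returns the SHA-256 hex digest of the original bytes.
    """
    digest = hashlib.sha256()
    while True:
        lines = list(islice(src, chunk_lines))
        if not lines:
            break
        raw = b"".join(lines)
        packed = lzma.compress(raw)
        # Round-trip check: decode the chunk and compare with the source bytes.
        if lzma.decompress(packed) != raw:
            raise ValueError("chunk failed round-trip verification")
        digest.update(raw)
        # Hypothetical framing: 4-byte big-endian length, then the chunk.
        dst.write(struct.pack(">I", len(packed)))
        dst.write(packed)
    # Hypothetical trailer: a zero length marker, then the 32-byte digest.
    dst.write(struct.pack(">I", 0))
    dst.write(digest.digest())
    return digest.hexdigest()


def decompress_stream(src, dst):
    """Reverse compress_stream and verify the SHA-256 trailer."""
    digest = hashlib.sha256()
    while True:
        (size,) = struct.unpack(">I", src.read(4))
        if size == 0:
            break
        raw = lzma.decompress(src.read(size))
        digest.update(raw)
        dst.write(raw)
    if src.read(32) != digest.digest():
        raise ValueError("SHA-256 trailer mismatch")
```

In this sketch only one chunk of plaintext is held in memory at a time and nothing uncompressed is written to dst. A response object from urllib.request.urlopen could serve as src, since it is a binary file-like object that can be iterated line by line.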

Confirm the result the same way you would any .at1:

at1 integrity app.at1        # SHA-256 trailer: decode == original
at1 decompress app.at1 app.log   # byte-for-byte identical to the source

Honest scope: codecs that must see the whole file to choose a layout (e.g. whole-file auto selection, or non-line formats) fall back to a buffered mode. The download is written to a temporary file, compressed, then deleted, so the plaintext is on disk only transiently. AT-1 tells you which mode it used. Resume and range-GET are not supported in this version. See also streaming & live ingest.
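
For contrast, the buffered fallback can be sketched the same way. Again this is an assumption-laden illustration rather than AT-1's code: compress_buffered and its tmp_dir parameter are hypothetical names, and lzma stands in for whichever codec needs to see the whole file. The point is the lifecycle of the temporary file: it exists only between download and compression, and is removed even if compression fails.

```python
import lzma
import os
import shutil
import tempfile


def compress_buffered(src, out_path, tmp_dir=None):
    """Spool src to a temporary file, compress it whole, then delete the spool."""
    fd, tmp_path = tempfile.mkstemp(suffix=".download", dir=tmp_dir)
    try:
        # Step 1: land the full plaintext on disk (the transient copy).
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(src, tmp)
        # Step 2: the whole file is available, so an encoder could inspect
        # it here to choose a layout before compressing.
        with open(tmp_path, "rb") as plain, lzma.open(out_path, "wb") as packed:
            shutil.copyfileobj(plain, packed)
    finally:
        # Step 3: remove the plaintext whether or not compression succeeded.
        os.remove(tmp_path)
```

Peak disk usage in this mode is roughly the original plus the compressed output, which is the trade-off the streaming path avoids.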