Operational tools

Three command-line tools wrap the AT-1 pipeline for the jobs that come before and around production use: sizing a migration, auto-tiering cold data, and attesting archives for compliance. Each runs the same verified, byte-exact pipeline as at1 compress. None of them estimates from a lookup table, and each reports the files where AT-1 only ties xz alongside the wins.


at1-doctor — measure what AT-1 would save

at1-doctor scan /data [--sample-mb 8] [--max-files 200]
                      [--rate-storage 0.023] [--report savings.html]

Scans a directory and sample-compresses each eligible file through the real verified pipeline (auto codec selection + the byte-exact gate), then emits a self-contained HTML savings report. Already-compressed formats (gz, zst, xz, png, mp4, pdf, …) are skipped. Ratios are measured on the first --sample-mb megabytes of each file, and file-level totals are extrapolated from that measured ratio. The $/GB-month storage rate you pass with --rate-storage is the one stated assumption.

file                          size        codec   sample ratio  note
events/2026-05.ndjson    1,204,338,112    qjson         11.34x  measured
trades/ticks.csv           880,201,003  qcolumnar        6.10x  measured
assets/logo.bin              4,096,000        -             -   ties xz -- no structural win

scanned 3 file(s), 2.089 GB in 7s (sample=8 MB/file -- ratios measured, totals extrapolated)
projected after AT-1: 0.241 GB  -> ~$0.51/yr storage saved at $0.023/GB-mo (assumption)

When to use: point it at a prospect's or a production data directory to size a migration and produce a defensible savings number, with the provenance of every figure labelled and the no-win files shown rather than hidden. Add --report savings.html for a shareable artifact.
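
The extrapolation and cost arithmetic described above can be sketched in a few lines. This is only an illustration of the stated method (size divided by the measured sample ratio, then a flat $/GB-month rate), not at1-doctor's actual code; the function names are hypothetical.

```python
def projected_bytes(file_size: int, sample_ratio: float) -> float:
    """Extrapolate a file's compressed size from the ratio measured on its sample."""
    if sample_ratio <= 0:
        raise ValueError("sample_ratio must be positive")
    return file_size / sample_ratio


def annual_savings_usd(before_gb: float, after_gb: float, rate_per_gb_month: float) -> float:
    """Yearly storage cost avoided at a flat $/GB-month rate (the one assumption)."""
    return (before_gb - after_gb) * rate_per_gb_month * 12


# A hypothetical 1 GB file whose sample compressed 10x projects to 100 MB.
print(projected_bytes(1_000_000_000, 10.0))                  # 100000000.0
# Totals from the sample output: 2.089 GB before, 0.241 GB after, $0.023/GB-month.
print(round(annual_savings_usd(2.089, 0.241, 0.023), 2))     # 0.51
```

Because only the ratio is measured, the projected size is exact only if the rest of the file compresses like its first --sample-mb megabytes, which is why the report labels totals as extrapolated.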


at1-watch — set-and-forget auto-tiering daemon

at1-watch DIR --older-than 7d [--interval 60] [--delete-original]
                  [--include "*.csv,*.log,*.ndjson"] [--once] [--dry-run]
                  [--verify-ledger]

Watches a directory and tiers files that have gone cold. The policy is conservative by default because it touches customer data:

  • A file is tiered only when it is older than --older-than (mtime) and its size has been stable across two consecutive scans — nothing mid-write gets tiered.
  • Tiering runs the full gated pipeline (auto codec, query-optimized, verification gate, SHA-256 trailer) to <name>.at1 next to the file.
  • The original is kept unless --delete-original is set — and even then it is deleted only after an independent decompress-and-compare against the recorded SHA-256.
  • Every action lands in a hash-chained ledger (DIR/.at1_watch_ledger.jsonl); the timestamp is hashed into each link, so tampering is detectable with --verify-ledger.
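
To make the ledger bullet concrete, here is a minimal sketch of a hash chain in which each link covers the previous link's hash and the entry's timestamp. The field names (prev, ts, action, path), the all-zero genesis value, and the JSON encoding are assumptions for illustration; the real .at1_watch_ledger.jsonl layout is not documented here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first entry


def link_hash(prev_hash: str, timestamp: str, action: str, path: str) -> str:
    """SHA-256 over the previous link plus this entry's fields, timestamp included."""
    payload = json.dumps(
        {"prev": prev_hash, "ts": timestamp, "action": action, "path": path},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger: list, timestamp: str, action: str, path: str) -> None:
    """Append an entry chained to the current head of the ledger."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev, "ts": timestamp, "action": action, "path": path,
        "hash": link_hash(prev, timestamp, action, path),
    })


def verify_ledger(ledger: list) -> bool:
    """Recompute every link from the start; any edited field breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        expected = link_hash(prev, entry["ts"], entry["action"], entry["path"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

Because the timestamp is part of the hashed payload, backdating an entry changes its hash and invalidates every later link, which is the property a check like --verify-ledger relies on.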

When to use: run it as a background daemon for hands-off lifecycle management of logs, exports, and telemetry. Start with --dry-run to preview, then --once for a single pass (it scans twice so the stability check has both samples), or leave it looping on --interval.
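
The reason a single pass needs two scans is easier to see in code. The sketch below is a hypothetical reading of the cold-and-stable rule, not at1-watch's implementation: a file qualifies only if its mtime is older than the threshold and its size matches the size recorded by the previous scan.

```python
import os
import time


def cold_candidates(directory, older_than_s, previous_sizes, now=None):
    """Return (ready_paths, current_sizes) for one scan of `directory`.

    A file is ready only when it is older than `older_than_s` seconds (by mtime)
    AND its size equals the size seen in the previous scan.
    """
    now = time.time() if now is None else now
    ready, current = [], {}
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        st = entry.stat()
        current[entry.path] = st.st_size
        old_enough = (now - st.st_mtime) > older_than_s
        # A file never seen before has no previous size, so it cannot be stable yet.
        stable = previous_sizes.get(entry.path) == st.st_size
        if old_enough and stable:
            ready.append(entry.path)
    return sorted(ready), current
```

The first call, made with an empty previous_sizes, can never return a ready file; feeding its current_sizes into a second call supplies the second sample the stability check needs.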


at1-attest — cryptographic attestation for compliance & custody

at1-attest TABLE_DIR [--report attestation.html] [--deep] [--timestamp]

Produces a one-command attestation report for an AT-1 table or watch ledger — “this archive's full contents and history, cryptographically verified, as of now.” Three independently checkable layers:

  • Contents — every live segment re-hashed and compared to its manifest SHA-256.
  • History — the hash-chained event log recomputed end-to-end, so any edit to any past append/compact event breaks every later link.
  • Bytes — with --deep, every segment is decompressed and its embedded integrity trailer re-verified (decode == original, byte-for-byte).

VERIFIED    contents: 14 segment(s) re-hashed against the manifest
VERIFIED    history:  hash chain recomputed across 9 event(s) (appends/compactions)
VERIFIED    bytes:    14 segment(s) decompressed; embedded SHA-256 trailer verified decode == original

root hash: 7f3a…c91d
verdict: ALL CHECKS VERIFIED

The report states exactly what was and was not checked, and emits a root hash over the segment hashes plus the chain head. This is evidence generation, not a signature scheme: pair the root hash with your own timestamping/signing — RFC 3161 (--timestamp fetches a token from a public TSA), sigstore, or a notarized email — for third-party non-repudiation.
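
As an illustration of the contents check and of a root hash "over the segment hashes plus the chain head", the sketch below re-hashes segments against a manifest and folds the manifest hashes and chain head into one digest. The exact construction at1-attest uses is not specified in this document; the sorted-by-name ordering, the raw-byte concatenation, and all function names here are assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_contents(segments: dict, manifest: dict) -> list:
    """Return names of manifest segments that are missing or whose SHA-256 differs."""
    bad = []
    for name, expected in manifest.items():
        data = segments.get(name)
        if data is None or sha256_hex(data) != expected:
            bad.append(name)
    return sorted(bad)


def root_hash(manifest: dict, chain_head: str) -> str:
    """One digest over every segment hash (in name order) plus the chain head."""
    h = hashlib.sha256()
    for name in sorted(manifest):
        h.update(bytes.fromhex(manifest[name]))
    h.update(bytes.fromhex(chain_head))
    return h.hexdigest()
```

A single root value like this is what you would hand to an external timestamping or signing service: any change to a segment hash or to the event history changes the root.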

When to use: for compliance and chain-of-custody — proving to an auditor or counterparty that an archived dataset and its full edit history are intact and untampered as of a given moment.