Recall cost

The compression world measures one number: ratio. But a smaller file you have to unpack in full to read isn't memory — it's a smaller photograph. The number that matters for cold data is recall cost: the bytes you must materialize (decompress or read) to answer a recall, plus the idle resident size — the bytes occupied when nothing is being recalled.

A whole-file compressor has a brutal recall cost: any access decompresses the whole thing. AT-1 is addressable memory — it recalls a record byte-exact while materializing a tiny fraction, and occupies near-nothing when idle. Think of it as unified memory for cold data: the archive and the index are the same bytes.

A one-cell lookup, measured

On a 50,000-row, 1,083,004-byte CSV, measured with the real AT-1 reader, a single-row point lookup costs:

  • xz whole-file: materializes 117,328 B — the entire archive (100%). A point, range, aggregate, or full read all cost the same: everything.
  • AT-1 queryable: materializes 3,910 B (0.36% of the CSV). The reader scans 1 of 26 row-groups.

Put plainly: a one-cell lookup costs xz the entire archive (117 KB) and costs AT-1 3.9 KB.
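
The percentages above can be re-derived from the quoted byte counts alone. The following is a minimal arithmetic sketch, not part of AT-1; the constant and function names are made up for illustration, and the only inputs are the three figures stated in this section.

```python
# Byte counts quoted in this section.
CSV_BYTES = 1_083_004        # the 50,000-row source CSV
XZ_ARCHIVE_BYTES = 117_328   # xz whole-file: every access materializes all of it
AT1_POINT_BYTES = 3_910      # AT-1 queryable: one of 26 row-groups


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


point_pct_of_csv = pct(AT1_POINT_BYTES, CSV_BYTES)
xz_pct_of_csv = pct(XZ_ARCHIVE_BYTES, CSV_BYTES)
xz_over_at1 = XZ_ARCHIVE_BYTES / AT1_POINT_BYTES

print(f"AT-1 point lookup: {point_pct_of_csv:.2f}% of the CSV")
print(f"xz, any access:    {xz_pct_of_csv:.1f}% of the CSV")
print(f"xz materializes about {xz_over_at1:.0f}x the bytes for this lookup")
```

Note the denominators: 0.36% is relative to the raw CSV, while xz's "100%" is relative to its own archive, which is itself 10.8% of the CSV.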

Idle resident size

When nothing is being recalled, the bytes still sit somewhere. On the same file, xz holds 117,328 B resident (10.8% of the CSV); AT-1's queryable container holds 99,441 B (9.2%) and unlocks per-record recall on top.

Recall cost by access shape

  • Point: scans 1 of 26 row-groups, materializing 3,910 B.
  • Range: zone maps skip 25 of 26 row-groups, so only the surviving row-group is read.
  • Aggregate (SUM): answered from the footer roll-up without decoding any data block. That is footer-only, not free: it reads 51,378 B of footer, less than half of xz's 117,328 B.
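
To make the range and aggregate shapes concrete, here is a generic sketch of zone-map pruning and a footer roll-up. It illustrates the general technique only and does not reproduce AT-1's on-disk format: `RowGroupStats`, `surviving_groups`, and `footer_sum` are hypothetical names, and the toy footer assumes a sorted integer column split into 26 row-groups.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical footer entry for one row-group of a single column."""
    lo: int     # smallest value in the row-group (zone map lower bound)
    hi: int     # largest value in the row-group (zone map upper bound)
    total: int  # pre-computed SUM of the row-group (the roll-up)


def surviving_groups(footer, lo, hi):
    """Indices of row-groups whose [lo, hi] zone overlaps the query range."""
    return [i for i, g in enumerate(footer) if g.hi >= lo and g.lo <= hi]


def footer_sum(footer):
    """SUM over the whole column using only footer roll-ups, no data blocks."""
    return sum(g.total for g in footer)


# Toy footer: 26 row-groups of 100 consecutive integers (0..99, 100..199, ...).
footer = [
    RowGroupStats(lo=i * 100, hi=i * 100 + 99,
                  total=sum(range(i * 100, i * 100 + 100)))
    for i in range(26)
]

hits = surviving_groups(footer, 1210, 1250)
print(hits)  # indices of the row-groups that still need decoding
print(len(footer) - len(hits), "of", len(footer), "row-groups skipped")
print(footer_sum(footer))
```

Pruning is this effective in the toy case because the column is sorted, so each row-group covers a narrow, non-overlapping range. On unsorted data, more zones overlap the query and more row-groups survive.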

The generative store — near-zero idle

For a column AT-1 can model, recall cost collapses further. The modeled column's idle resident size is 48 B — just the generator params — versus 1 MB raw, and a point recall materializes 50 B. Be honest about scope: this applies only to the modeled column, not the whole archive. Where a generator fits, idle is near-zero; where it doesn't, the queryable container still gives you the 0.36% point-lookup above.
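
As an illustration of why a modeled column can idle at a few dozen bytes, consider the simplest possible generator: an arithmetic progression. This is an assumed example, not AT-1's actual model or parameter layout. `fit_arithmetic` and `recall` are hypothetical helpers, and the three packed integers here do not correspond to the 48 B figure above.

```python
import struct


def fit_arithmetic(values):
    """Return packed (start, step, count) if values form an arithmetic
    progression, else None (the column cannot be modeled this way)."""
    if len(values) < 2:
        return None
    start, step = values[0], values[1] - values[0]
    if any(v != start + i * step for i, v in enumerate(values)):
        return None
    return struct.pack("<qqq", start, step, len(values))


def recall(params, index):
    """Regenerate one value from the parameters alone, byte-exact."""
    start, step, count = struct.unpack("<qqq", params)
    if not 0 <= index < count:
        raise IndexError(index)
    return start + index * step


column = list(range(1000, 1000 + 50_000 * 5, 5))  # 50,000 values: 1000, 1005, ...
params = fit_arithmetic(column)

print(len(params))                                # bytes held while idle
print(recall(params, 41027) == column[41027])     # point recall matches the original
print(fit_arithmetic([3, 1, 4, 1, 5]))            # no fit: falls back to stored data
```

The last call shows the scope limit in miniature: when no generator fits, there are no parameters to store, and the column has to live in the regular container.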

Measure it yourself — at1 recall-cost

at1 recall-cost data.at1
# -> reports two numbers a ratio can't:
#    IDLE RESIDENT  — bytes occupied while nothing is being recalled
#    RECALL COST    — bytes materialized (decompressed/read) to answer a recall
#
# measured on a 50,000-row, 1,083,004-byte CSV with the real AT-1 reader:
#    point lookup (one row):  3,910 B materialized   (0.36% of the CSV)
#    same lookup under xz:    117,328 B               (the ENTIRE archive)

The same primitives are in the reader — point, range, and the footer aggregate roll-up — so you can profile recall cost against your own access pattern:

from at1_reader import AT1Reader

r = AT1Reader("data.at1")
row   = r.point("row_id", 41027)     # scans 1 of 26 row-groups -> ~3,910 B materialized
# lo and hi are the range bounds on "ts"; define them before this call
rows  = r.range("ts", lo, hi)        # zone maps skip 25 of 26 row-groups
total = r.aggregate("price", "sum")  # answered from the footer roll-up: no data-block decode

Decoding, verifying, and recalling are always free and never need an account; the encode and query paths are metered against the account whose API key the host process supplies — same as the rest of AT-1. Reading a record is part of decode: free.