Recall cost
The compression world measures one number: ratio. But a smaller file that you must unpack in full before you can read it isn't memory; it's a smaller photograph. For cold data, two numbers matter instead: recall cost, the bytes you must materialize (decompress or read) to answer a recall, and idle resident size, the bytes occupied while nothing is being recalled.
A whole-file compressor has a brutal recall cost: any access decompresses the whole thing. AT-1 is addressable memory: it recalls a record byte-exact while materializing a tiny fraction of the data, and its idle footprint is smaller than xz's (near-nothing for columns it can model). Think of it as unified memory for cold data: the archive and the index are the same bytes.
A one-cell lookup, measured
On a 50,000-row, 1,083,004-byte CSV, measured with the real AT-1 reader, a single-row point lookup costs:
- xz whole-file: materializes 117,328 B — the entire archive (100%). A point, range, aggregate, or full read all cost the same: everything.
- AT-1 queryable: materializes 3,910 B — 0.36% of the CSV. The reader scans 1 of 26 row-groups.
Put plainly: a one-cell lookup costs xz the entire archive (117 KB) and costs AT-1 3.9 KB, about 30× less.
Idle resident size
When nothing is being recalled, the bytes still sit somewhere. On the same file, xz holds 117,328 B resident (10.8% of the CSV); AT-1's queryable container holds 99,441 B (9.2%) and adds per-record recall on top.
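The percentages and ratios quoted in the last two sections follow directly from the raw byte counts. A quick check in Python, using only the measured values stated above (the helper name `pct` is ours, for illustration):

```python
CSV_BYTES = 1_083_004    # the 50,000-row source CSV

# Recall cost of one single-row point lookup (bytes materialized)
XZ_POINT = 117_328       # xz: the whole archive
AT1_POINT = 3_910        # AT-1: one of 26 row-groups

# Idle resident size (bytes occupied while nothing is recalled)
XZ_IDLE = 117_328
AT1_IDLE = 99_441


def pct(part: int, whole: int, digits: int = 1) -> float:
    """Return part as a percentage of whole, rounded for display."""
    return round(100 * part / whole, digits)


print(pct(AT1_POINT, CSV_BYTES, 2))    # 0.36
print(round(XZ_POINT / AT1_POINT, 1))  # 30.0
print(pct(XZ_IDLE, CSV_BYTES))         # 10.8
print(pct(AT1_IDLE, CSV_BYTES))        # 9.2
```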
Recall cost by access shape
- Point: scans 1 of 26 row-groups, 3,910 B materialized.
- Range: for the measured range, zone maps skip 25 of 26 row-groups, and only the surviving group is read.
- Aggregate (SUM): answered from the footer roll-up with no data-block decode. It is footer-only, not free: it reads 51,378 B of footer, about 2.3× less than xz's 117,328 B.
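The pruning and roll-up behavior in the list above can be illustrated with a toy container. This is a minimal sketch, not AT-1's on-disk format: `RowGroup`, `ToyContainer`, and `rollup_bytes` are hypothetical names, each group's byte size is supplied by the caller, and the (small) cost of reading the zone maps themselves is ignored.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    values: list[int]   # one column's values for this group (hypothetical layout)
    stored_bytes: int   # bytes that must be read/decoded to touch this group


class ToyContainer:
    """Row-groups plus a footer holding zone maps and per-group SUM roll-ups."""

    def __init__(self, groups: list[RowGroup], rollup_bytes: int) -> None:
        self.groups = groups
        self.rollup_bytes = rollup_bytes  # footer bytes read to answer SUM
        # Built once at encode time and kept in the footer.
        self.zone_maps = [(min(g.values), max(g.values)) for g in groups]
        self.rollups = [sum(g.values) for g in groups]

    def range(self, lo: int, hi: int) -> tuple[list[int], int]:
        """Return matching values and bytes materialized from surviving groups."""
        out: list[int] = []
        cost = 0
        for g, (gmin, gmax) in zip(self.groups, self.zone_maps):
            if gmax < lo or gmin > hi:
                continue  # zone map proves no match: this group is never read
            cost += g.stored_bytes
            out.extend(v for v in g.values if lo <= v <= hi)
        return out, cost

    def point(self, key: int) -> tuple[list[int], int]:
        """A point lookup is a range whose bounds coincide."""
        return self.range(key, key)

    def sum_all(self) -> tuple[int, int]:
        """SUM answered from the footer roll-up alone: no group is decoded."""
        return sum(self.rollups), self.rollup_bytes


# Usage: 4 groups of 100 sorted keys, 1,000 B each, 200 B of roll-up footer.
groups = [RowGroup(list(range(i * 100, (i + 1) * 100)), 1_000) for i in range(4)]
store = ToyContainer(groups, rollup_bytes=200)
print(store.point(250))   # ([250], 1000): one group read
print(store.sum_all())    # (79800, 200): footer only
```

Zone maps prune well only when values are clustered by the filtered column (for example, a sorted key); on shuffled data most groups' min/max ranges overlap the query and little is skipped.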
The generative store — near-zero idle
For a column AT-1 can model, recall cost collapses further. The modeled column's idle resident size is 48 B (just the generator params) versus 1 MB raw, and a point recall materializes 50 B. The scope is narrow: this applies only to the modeled column, not the whole archive. Where a generator fits, idle is near-zero; where it doesn't, the queryable container still gives you the 0.36% point lookup above.
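AT-1's generator families are not described here, so the sketch below assumes the simplest possible one: an arithmetic sequence (value = start + step × row). `fit_arithmetic` and `recall` are hypothetical names, and the 24-byte parameter size comes from this toy packing, not from AT-1's 48 B. The property that matters is the byte-exact check: the generator is accepted only if it reproduces every value, otherwise the column stays in the queryable container.

```python
import struct
from typing import Optional

PARAMS = struct.Struct("<qqq")  # start, step, row count as signed 64-bit ints


def fit_arithmetic(column: list[int]) -> Optional[bytes]:
    """Return packed params if column == start + step*i exactly, else None."""
    if len(column) < 2:
        return None
    start, step = column[0], column[1] - column[0]
    if any(v != start + step * i for i, v in enumerate(column)):
        return None  # model can't reproduce the column byte-exact: don't use it
    return PARAMS.pack(start, step, len(column))  # the entire idle footprint


def recall(params: bytes, i: int) -> int:
    """Point recall: regenerate row i from the params; no stored data is read."""
    start, step, n = PARAMS.unpack(params)
    if not 0 <= i < n:
        raise IndexError(i)
    return start + step * i


col = list(range(1000, 1000 + 5 * 50_000, 5))  # 50,000 evenly spaced values
params = fit_arithmetic(col)
print(len(params))             # 24
print(recall(params, 41027))   # 206135
```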
Measure it yourself — at1 recall-cost
at1 recall-cost data.at1
# -> reports two numbers a ratio can't:
#   IDLE RESIDENT: bytes occupied while nothing is being recalled
#   RECALL COST:   bytes materialized (decompressed/read) to answer a recall
#
# measured on a 50,000-row, 1,083,004-byte CSV with the real AT-1 reader:
#   point lookup (one row): 3,910 B materialized (0.36% of the CSV)
#   same lookup under xz:   117,328 B (the ENTIRE archive)
The same primitives are in the reader — point, range, and the footer aggregate roll-up — so you can profile recall cost against your own access pattern:
from at1_reader import AT1Reader
r = AT1Reader("data.at1")
row = r.point("row_id", 41027) # scans 1 of 26 row-groups -> ~3,910 B materialized
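lo, hi = 1_700_000_000, 1_700_086_400  # hypothetical "ts" bounds (epoch seconds); use your own
rows = r.range("ts", lo, hi)  # for the measured range, zone maps skip 25 of 26 row-groups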
total = r.aggregate("price", "sum")  # answered from the footer roll-up: no data-block decode

Decoding, verifying, and recalling are always free and never need an account; the encode and query paths are metered against the account whose API key the host process supplies, same as the rest of AT-1. Reading a record is part of decode, so it is free.
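To turn the per-shape numbers into a figure for your own workload, weight each access shape by how often you issue it. This is a minimal sketch, assuming the single-query costs measured above carry over to your data; the workload mix and the `expected_cost` helper are hypothetical, and range is left out because its cost depends on how many row-groups survive the zone maps.

```python
# Per-shape recall cost (bytes materialized), from the measurements above.
# Range is omitted: its byte cost depends on how many row-groups survive.
AT1_COST = {"point": 3_910, "aggregate": 51_378}
XZ_COST = {"point": 117_328, "aggregate": 117_328}  # xz: every shape reads it all


def expected_cost(mix: dict[str, int], cost: dict[str, int]) -> float:
    """Average bytes materialized per recall, weighting each shape by its count."""
    total = sum(mix.values())
    return sum(n * cost[shape] for shape, n in mix.items()) / total


# Hypothetical workload: 9 point lookups for every SUM.
mix = {"point": 9, "aggregate": 1}
print(expected_cost(mix, AT1_COST))  # 8656.8
print(expected_cost(mix, XZ_COST))   # 117328.0
```

Swap in your own counts, and measure range costs with the reader on your data, to see where the queryable container pays off.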