One artifact: archive, table, and audit-grade original
Compress once, verified byte-exact. Query the same file in place. Restore the exact original whenever you need it.
Compress, verified lossless
Every encode is decoded and byte-compared to the original before it ships. If a structural codec can't beat plain xz/zstd, the fallback stores the plain xz/zstd output instead, so an AT-1 file is never larger than xz/zstd alone.
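As a rough illustration of the verify-then-fallback idea, here is a minimal Python sketch. It is not AT-1's encoder: `structural_encode` is a hypothetical stand-in that simply calls zlib, the returned dict is an invented container, and only xz (via the standard `lzma` module) plays the role of the plain baseline.

```python
import hashlib
import lzma
import zlib


def structural_encode(data: bytes) -> bytes:
    """Hypothetical stand-in for a structure-aware codec (here: plain zlib)."""
    return zlib.compress(data, 9)


def structural_decode(blob: bytes) -> bytes:
    return zlib.decompress(blob)


def encode_verified(data: bytes) -> dict:
    """Pick the smaller of two encodings, then prove the round trip."""
    candidates = {
        "structural": (structural_encode(data), structural_decode),
        "xz": (lzma.compress(data), lzma.decompress),
    }
    # Non-inferiority: keep whichever payload is smallest, so the result
    # is never larger than the plain xz candidate.
    method, (payload, decode) = min(
        candidates.items(), key=lambda kv: len(kv[1][0])
    )
    # Decode what is about to ship and byte-compare against the original.
    if decode(payload) != data:
        raise ValueError("round trip is not byte-exact; refusing to emit")
    return {
        "method": method,
        "payload": payload,
        "sha256": hashlib.sha256(data).hexdigest(),  # digest of the original
    }
```

The design point is that the comparison happens on the exact payload being emitted, after codec selection, so a bug in either codec surfaces as a refused encode and not as a corrupt archive.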
$ at1 compress qcolumnar trades.csv trades.at1
# structure-aware encode: columns split, deltas,
# per-block min/max zone maps, then xz/zstd entropy
# stage. Before the file ships, the encoder decodes
# it back and byte-compares against the original:
#
# LOSSLESS (byte-for-byte): True
#
# A SHA-256 of the original is embedded in the file.
# Non-inferiority fallback: never larger than xz/zstd.
Query it in place
Per-block min/max zone maps let a predicate skip whole row-groups and decode only the columns it touches. Over S3, that means range-GETs for just those byte ranges. The same path works from 11 engines (DuckDB, Spark, Postgres, …).
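To make the skipping concrete, the following Python sketch shows zone-map pruning on a single in-memory column. The `Block` type, the block size, and the data are all assumptions for illustration; the real file layout and the range-GET path over object storage are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One row-group of a single column, with its min/max zone map."""
    lo: int
    hi: int
    values: list


def build_blocks(values, block_size):
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(min(chunk), max(chunk), chunk))
    return blocks


def scan_between(blocks, lo, hi):
    """Return matching values and the number of blocks actually decoded."""
    matched, decoded = [], 0
    for b in blocks:
        # The zone map proves no row in this block can match: skip it unread.
        if b.hi < lo or b.lo > hi:
            continue
        decoded += 1
        matched.extend(v for v in b.values if lo <= v <= hi)
    return matched, decoded


ts = list(range(1000))              # sorted timestamps, hypothetical data
blocks = build_blocks(ts, 100)      # 10 blocks of 100 rows each
rows, decoded = scan_between(blocks, 250, 260)
print(len(rows), decoded, len(blocks))  # 11 1 10
```

Pruning like this pays off most when the filtered column is sorted or clustered, as timestamps usually are; on randomly ordered data most blocks overlap the predicate range and few can be skipped.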
$ at1 sql trades.at1 "SELECT aggId, price
WHERE ts BETWEEN 1704067200000 AND 1704067210000"
# zone maps skip row-groups the predicate can't match;
# only the touched column blocks are read (range-GET
# over object storage):
#
# matched 1,691 rows
# read 1/54 of the file (54x less I/O)
# exact original rows
Restore the exact bytes
Unlike Parquet/ORC, the same file that answers queries still reconstructs the exact original — audit-grade. The open decoder means zero lock-in.
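The restore step in this section re-checks an embedded SHA-256 and refuses on mismatch. A minimal Python sketch of that behavior follows, assuming a made-up container (a 32-byte digest followed by an xz payload) that is not the AT-1 format; `pack` and `restore` are hypothetical names.

```python
import hashlib
import lzma


def pack(data: bytes) -> bytes:
    """Hypothetical container: SHA-256 of the original, then an xz payload."""
    return hashlib.sha256(data).digest() + lzma.compress(data)


def restore(blob: bytes) -> bytes:
    """Decode, then refuse to return anything that fails the digest check."""
    expected, payload = blob[:32], blob[32:]
    data = lzma.decompress(payload)
    if hashlib.sha256(data).digest() != expected:
        raise ValueError("SHA-256 mismatch: refusing to return data")
    return data


original = b"aggId,price,ts\n1,42000.5,1704067200000\n"
assert restore(pack(original)) == original  # byte-exact round trip
```

Because the digest covers the original bytes and not the compressed payload, the check holds regardless of which codec produced the payload.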
$ at1 decompress trades.at1 trades.out
$ cmp trades.out trades.csv && echo identical
identical
# the decoder re-checks the embedded SHA-256 and
# REFUSES on mismatch. The decoder is ~130-200 KB of
# portable C (also Go, Rust, Node, Python, WASM),
# Apache-2.0 -- anyone can open an AT-1 file forever.
One decode core. Eleven engines, verified live.
There is exactly one piece of hard technology — a ~260-line, fuzz-hardened, zone-mapped block decoder. Every engine adapter is a thin layer over it, through native C, Apache Arrow, or Postgres/JDBC federation. We've run real SQL over AT-1 data on all eleven.
Adding an engine is ~300 lines of glue, or zero for anything Arrow-native. The same .at1 still reconstructs the original byte-for-byte through the non-querying decoder: querying is additive, never a re-encode. For the full matrix, connection strings, and reproduce commands, see the engines guide.