How it works

One artifact: archive, table, and audit-grade original

Compress once, verified byte-exact. Query the same file in place. Restore the exact original whenever you need it.

Compress, verified lossless

Every encode is decoded and byte-compared to the original before it ships. If a structural codec can't beat plain xz/zstd, AT-1 falls back to storing the plain xz/zstd output instead, so the result is never larger than xz/zstd alone.

Verified at encode time, refused on mismatch

$ at1 compress qcolumnar trades.csv trades.at1

# structure-aware encode: split into columns, delta-encode,
# build per-block min/max zone maps, then an xz/zstd
# entropy stage. Before the file ships, the encoder decodes
# it back and byte-compares against the original:
#
#   LOSSLESS (byte-for-byte): True
#
# A SHA-256 of the original is embedded in the file.
# Non-inferiority fallback: never larger than xz/zstd.
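The verify-then-fallback flow above can be sketched in Python. This is a minimal sketch, not AT-1's encoder. Python's `lzma` (xz) stands in for the entropy stage and a byte-wise delta transform stands in for the structure-aware codec. The container layout (a 1-byte tag, the 32-byte SHA-256, then the payload) and the names `encode`, `delta_encode` and `delta_decode` are all hypothetical.

```python
import hashlib
import lzma

# Tag byte recording which path produced the payload (hypothetical layout).
STRUCTURAL, FALLBACK = b"S", b"F"


def delta_encode(data: bytes) -> bytes:
    """Toy stand-in for a structural transform: byte-wise deltas mod 256."""
    out = bytearray()
    prev = 0
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode with a running sum mod 256."""
    out = bytearray()
    prev = 0
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)


def encode(original: bytes) -> bytes:
    digest = hashlib.sha256(original).digest()  # embedded for decode-time checks

    # Candidate 1: structural transform, then an xz entropy stage.
    structural = lzma.compress(delta_encode(original))
    # Verify before shipping: decode the candidate and byte-compare.
    if delta_decode(lzma.decompress(structural)) != original:
        raise ValueError("round-trip mismatch: refusing to write output")

    # Candidate 2: plain xz over the raw bytes (the non-inferiority baseline).
    plain = lzma.compress(original)

    # Keep the smaller payload, so the file never exceeds plain xz
    # plus this sketch's fixed 33-byte header.
    if len(structural) < len(plain):
        return STRUCTURAL + digest + structural
    return FALLBACK + digest + plain
```

The size guarantee comes from keeping the smaller of the two candidates. In the worst case the structural path loses, and the file is just the plain xz payload plus a fixed header.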

Query it in place

Per-block min/max zone maps let a predicate skip every row-group it can't match, and decode only the columns it touches. Over S3, only those byte ranges are fetched, using HTTP range GETs. The same path serves 11 engines (DuckDB, Spark, Postgres, …).
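Zone-map pruning comes down to an interval-overlap test. The sketch below assumes one integer column (such as `ts`) and fixed-size blocks. It builds per-block min/max maps and returns the blocks a `BETWEEN lo AND hi` predicate could match. `ZoneMap`, `build_zone_maps` and `blocks_for_between` are hypothetical names, not AT-1's API.

```python
from dataclasses import dataclass


@dataclass
class ZoneMap:
    block: int  # block (row-group) index
    lo: int     # smallest value of the column in this block
    hi: int     # largest value of the column in this block


def build_zone_maps(values: list[int], block_rows: int) -> list[ZoneMap]:
    """Record min/max per fixed-size block, as an encoder would at write time."""
    maps = []
    for start in range(0, len(values), block_rows):
        chunk = values[start:start + block_rows]
        maps.append(ZoneMap(start // block_rows, min(chunk), max(chunk)))
    return maps


def blocks_for_between(maps: list[ZoneMap], lo: int, hi: int) -> list[int]:
    """Blocks that may hold rows with lo <= value <= hi.

    A block is kept only if its [min, max] interval overlaps [lo, hi];
    all other blocks are skipped without being read or decoded.
    """
    return [m.block for m in maps if m.hi >= lo and m.lo <= hi]
```

For 1,000 sorted values in blocks of 100 rows, `blocks_for_between(maps, 250, 420)` keeps blocks 2, 3 and 4 and skips the other seven. Pruning works best when the column is clustered, for example when rows are sorted by timestamp. On shuffled data, most blocks' [min, max] span the predicate and little gets skipped.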

Under 2% of the file read on a selective query

$ at1 sql trades.at1 "SELECT aggId, price
    WHERE ts BETWEEN 1704067200000 AND 1704067210000"

# zone maps skip row-groups the predicate can't match;
# only the touched column blocks are read (range-GET
# over object storage):
#
#   matched 1,691 rows
#   read 1/54 of the file (54x less I/O)
#   exact original rows
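Once zone maps have picked the surviving blocks, the reader only needs the byte ranges of the column chunks the query touches. The sketch below assumes a hypothetical per-chunk index (`ColumnChunk`, with a file offset and length) and turns the selection into HTTP `Range` header values. It merges chunks that sit next to each other in the file, so fewer requests are issued. With boto3, each value could be passed as the `Range` argument of `get_object`. The index layout and the name `plan_range_gets` are assumptions, not AT-1's on-disk format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnChunk:
    block: int    # row-group index
    column: str   # column name
    offset: int   # byte offset of the compressed chunk in the file
    length: int   # compressed size in bytes


def plan_range_gets(index: list[ColumnChunk], blocks: set[int], columns: set[str]) -> list[str]:
    """Return HTTP Range header values covering only the needed chunks."""
    wanted = sorted(
        (c for c in index if c.block in blocks and c.column in columns),
        key=lambda c: c.offset,
    )
    ranges: list[tuple[int, int]] = []
    for c in wanted:
        start, end = c.offset, c.offset + c.length - 1  # HTTP ranges are inclusive
        if ranges and start <= ranges[-1][1] + 1:
            # Adjacent to (or overlapping) the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return [f"bytes={s}-{e}" for s, e in ranges]
```

Merging only exactly adjacent chunks is the simplest policy. A reader might also merge across small gaps, trading a few wasted bytes for fewer round trips.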

Restore the exact bytes

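Unlike Parquet or ORC, which keep the parsed values but not the original source bytes, the same file that answers queries still reconstructs the exact original, byte for byte, which is what an audit needs. The decoder is open, so there is zero lock-in.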

Byte-for-byte, SHA-256 re-checked on decode

$ at1 decompress trades.at1 trades.out
$ cmp trades.out trades.csv && echo identical
identical

# the decoder re-checks the embedded SHA-256 and
# REFUSES on mismatch. The decoder is ~130-200 KB of
# portable C (also Go, Rust, Node, Python, WASM),
# Apache-2.0 -- anyone can open an AT-1 file forever.
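The decode-side check can be sketched the same way. The sketch assumes a hypothetical container of a 32-byte SHA-256 followed by an xz payload, which is not AT-1's real layout. The decoder recomputes the digest over the restored bytes and raises instead of returning data when it differs. `decode` and `IntegrityError` are illustrative names.

```python
import hashlib
import lzma


class IntegrityError(Exception):
    """Raised when restored bytes do not match the embedded digest."""


def decode(blob: bytes) -> bytes:
    # Hypothetical layout: 32-byte SHA-256 of the original, then an xz payload.
    if len(blob) < 32:
        raise IntegrityError("truncated header")
    expected, payload = blob[:32], blob[32:]
    restored = lzma.decompress(payload)
    # Re-check the digest; on mismatch, raise rather than return any bytes.
    if hashlib.sha256(restored).digest() != expected:
        raise IntegrityError("SHA-256 mismatch: refusing to emit output")
    return restored
```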

Query it from the engine you already run

One decode core. Eleven engines, verified live.

There is exactly one piece of hard technology: a ~260-line, fuzz-hardened, zone-mapped block decoder. Every engine adapter is a thin layer over it, using native C, Apache Arrow, or Postgres/JDBC federation. We have run real SQL over AT-1 data on all eleven.

Engine       Integration
DuckDB       native C extension
SQLite       C virtual table
PostgreSQL   C FDW (foreign data wrapper)
ClickHouse   Arrow
Spark        Arrow
Trino        FDW federation
Presto       FDW federation
Flink        JDBC federation
Polars       Arrow
pandas       Arrow
Dask         Arrow

Adding an engine takes ~300 lines of glue code, or none for anything Arrow-native. Querying is additive, never a re-encode: the same .at1 file still reconstructs the original byte-for-byte through the non-querying decoder. For the full matrix, connection strings and reproduce commands, see the engines guide.
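The "one core, thin adapters" design can be illustrated with two toy adapters in Python over a single hypothetical `decode_core`. One builds column lists, the shape a dataframe library consumes. The other loads rows into SQLite. This is a sketch only. The real SQLite integration is a C virtual table, and Python's standard `sqlite3` module cannot define virtual tables, so this version copies rows into an in-memory table instead. All names here are illustrative.

```python
import sqlite3
from typing import Iterable, Iterator

Batch = tuple[list[str], list[tuple]]  # (column names, rows) for one row-group


def decode_core(blocks: Iterable[Batch]) -> Iterator[Batch]:
    """Stand-in for the shared decoder.

    A real core would prune blocks with zone maps and decompress column
    chunks; this one just replays in-memory row-groups.
    """
    for names, rows in blocks:
        yield list(names), list(rows)


def to_columns(batches: Iterable[Batch]) -> dict[str, list]:
    """Adapter 1: column lists, the shape a dataframe library consumes."""
    cols: dict[str, list] = {}
    for names, rows in batches:
        for i, name in enumerate(names):
            cols.setdefault(name, []).extend(row[i] for row in rows)
    return cols


def load_sqlite(conn: sqlite3.Connection, table: str, batches: Iterable[Batch]) -> None:
    """Adapter 2: copy rows into a SQLite table (table/column names are trusted)."""
    created = False
    for names, rows in batches:
        if not created:
            conn.execute(f"CREATE TABLE {table} ({', '.join(names)})")
            created = True
        placeholders = ", ".join("?" * len(names))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
```

Each adapter only reshapes what the core yields. Decoding, pruning and integrity checks stay in one place, which is why a new engine is mostly glue.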