Not just smaller files — a verified-lossless data engine.
The same byte-exact container that shrinks your cold data also answers queries in place, ingests live streams, remembers its own tamper-evident history, and meters exactly what a read costs. Every capability below ships today and is backed by a measured result — wins, boundaries, and “not run here” all labelled honestly.
Query & access
Read the archive without rehydrating it
Per-block min/max zone maps skip whole row-groups, and only the columns a query touches are decoded. Over object storage, the reader range-GETs just those byte ranges, while the same file still reconstructs the exact original.
Selective query p50 ~16 ms / p99 ~41 ms, ~47× faster than a full scan
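To make the zone-map idea concrete, here is a minimal Python sketch of min/max block pruning. It is an illustration only: the `Block` type, `prune_blocks` and `scan` are hypothetical names, not AT-1's API, and the blocks hold plain integers rather than encoded columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical row-group: stored min/max plus its (undecoded) values."""
    min_val: int
    max_val: int
    rows: List[int]


def prune_blocks(blocks: List[Block], lo: int, hi: int) -> List[Block]:
    """Keep only blocks whose [min, max] range overlaps the predicate [lo, hi]."""
    return [b for b in blocks if b.max_val >= lo and b.min_val <= hi]


def scan(blocks: List[Block], lo: int, hi: int) -> List[int]:
    """Decode only surviving blocks, then apply the predicate row by row."""
    hits: List[int] = []
    for b in prune_blocks(blocks, lo, hi):
        hits.extend(v for v in b.rows if lo <= v <= hi)
    return hits


blocks = [
    Block(0, 9, list(range(0, 10))),
    Block(10, 19, list(range(10, 20))),
    Block(20, 29, list(range(20, 30))),
]
print(scan(blocks, 12, 14))
```

Only the middle block is touched for the predicate `12 <= v <= 14`; the other two are excluded from their min/max alone.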
Header-first layout: a reader holding only a URL fetches a few KB of footer, then range-GETs only the blocks a predicate cannot exclude. No server, no index service.
Wire bytes == the logical minimum; ~49× cheaper than pulling the raw object
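A rough sketch of how a reader could turn footer metadata into HTTP `Range` header values. The footer entry shape (`offset`, `length`, `min`, `max`) and the function name `plan_ranges` are assumptions made for illustration; the real AT-1 footer format is not described here. Entries are assumed to be sorted by offset.

```python
from typing import Dict, List, Tuple


def plan_ranges(footer: List[Dict[str, int]], lo: int, hi: int) -> List[str]:
    """Return Range header values for blocks the predicate [lo, hi] cannot exclude.

    Adjacent surviving blocks are coalesced into a single range so the
    reader issues fewer requests. HTTP byte ranges are inclusive.
    """
    ranges: List[Tuple[int, int]] = []
    for entry in footer:
        if entry["max"] < lo or entry["min"] > hi:
            continue  # zone map proves no row in this block can match
        start = entry["offset"]
        end = entry["offset"] + entry["length"] - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return [f"bytes={s}-{e}" for s, e in ranges]


footer = [
    {"offset": 0, "length": 100, "min": 0, "max": 9},
    {"offset": 100, "length": 100, "min": 10, "max": 19},
    {"offset": 200, "length": 100, "min": 20, "max": 29},
]
print(plan_ranges(footer, 25, 40))
```

Each returned string can be sent as the `Range` header of a GET against the object URL.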
Emit a single self-contained .html that IS the database: the compressed archive + an 84 KB WASM query engine, fused. A branded, friendly UI (full-text search, named-column filters, a sortable/paginated table, CSV export) runs entirely on the client. Nothing uploads; the data is byte-for-byte recoverable.
Email it; it opens as a searchable database on any phone, offline — try one over real Common Crawl data
Live data & history
Stream it in, and remember exactly what happened
Batch a stream, seal each batch as an immutable segment through the verification gate, commit. Readers see only whole, byte-exact-verified batches — never a torn write — while ingestion continues.
tail -F app.log | at1-live · mid-stream query saw exactly the sealed rows
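The batch, seal, verify, commit flow can be sketched in a few lines of Python. This is a toy model under stated assumptions: `LiveTable` is a hypothetical name, `zlib` stands in for the AT-1 codec, and each appended line is assumed to contain no newline. The point is that a batch becomes visible to readers only after its round trip has been checked.

```python
import hashlib
import zlib
from typing import List, Tuple


class LiveTable:
    """Toy ingest buffer: rows become visible only in sealed, verified batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._pending: List[str] = []
        self._segments: List[Tuple[bytes, str]] = []  # (compressed, sha256)

    def append(self, line: str) -> None:
        self._pending.append(line)
        if len(self._pending) >= self.batch_size:
            self._seal()

    def _seal(self) -> None:
        raw = "\n".join(self._pending).encode()
        digest = hashlib.sha256(raw).hexdigest()
        packed = zlib.compress(raw)  # stand-in codec, not AT-1's
        # Verification gate: decompress and compare hashes before committing.
        if hashlib.sha256(zlib.decompress(packed)).hexdigest() != digest:
            raise RuntimeError("round-trip verification failed")
        self._segments.append((packed, digest))  # commit
        self._pending = []

    def scan(self) -> List[str]:
        """Readers see committed segments only, never the pending batch."""
        rows: List[str] = []
        for packed, _ in self._segments:
            rows.extend(zlib.decompress(packed).decode().split("\n"))
        return rows


table = LiveTable(batch_size=2)
for line in ["a", "b", "c"]:
    table.append(line)
print(table.scan())
```

After three appends with a batch size of two, a scan returns the first sealed batch; the third row stays invisible until its batch is sealed.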
A hash-chained audit log lets scan_as_of(event) reconstruct and query the table at any past moment. Exact while segments exist (they're immutable); states reclaimed by compaction raise rather than approximate.
“Here's your table last Tuesday — and cryptographic proof nobody edited the history.” Glacier can't; Delta/Iceberg lack the byte-exact + tamper-evident layers
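As an illustration of the mechanism, the sketch below builds a small hash-chained event log, verifies it, and replays it up to a chosen event. Everything here is an assumption for the example: the event shape (`op`, `segment`), the helper names, and the choice of SHA-256 over a JSON encoding. It is not the implementation behind `scan_as_of`.

```python
import hashlib
import json
from typing import Dict, List, Set

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, str]) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append_event(log: List[dict], payload: Dict[str, str]) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev,
                "hash": entry_hash(prev, payload)})


def verify_chain(log: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for e in log:
        if e["prev"] != prev or e["hash"] != entry_hash(prev, e["payload"]):
            return False
        prev = e["hash"]
    return True


def segments_as_of(log: List[dict], event_index: int,
                   available: Set[str]) -> Set[str]:
    """Replay events 0..event_index; raise if a needed segment was reclaimed."""
    live: Set[str] = set()
    for e in log[: event_index + 1]:
        p = e["payload"]
        if p["op"] == "add":
            live.add(p["segment"])
        elif p["op"] == "remove":
            live.discard(p["segment"])
    missing = live - available
    if missing:
        raise LookupError(f"segments reclaimed: {sorted(missing)}")
    return live


log: List[dict] = []
append_event(log, {"op": "add", "segment": "s1"})
append_event(log, {"op": "add", "segment": "s2"})
append_event(log, {"op": "remove", "segment": "s1"})
print(verify_chain(log), segments_as_of(log, 1, {"s1", "s2"}))
```

Raising on a missing segment mirrors the "raise rather than approximate" behaviour described above: a past state is either reconstructed exactly or refused.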
at1_watch tiers files once they go cold through the full gated pipeline. Paranoid by design: age + size-stability across two scans (nothing mid-write is touched); originals kept unless you opt in, and even then deleted only after an independent decompress + SHA-256 re-verify. Every action lands in a hash-chained ledger.
9/9 tests — incl. a growing file never tiered and a forged ledger entry caught
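The "age + size-stability across two scans" rule can be sketched as a pure function over two directory snapshots. The snapshot shape (path mapped to `(size, mtime)`), the `min_age_s` threshold and the name `cold_candidates` are assumptions for this example, not at1_watch's actual interface.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, float]]  # path -> (size in bytes, mtime)


def cold_candidates(prev_scan: Snapshot, curr_scan: Snapshot,
                    now: float, min_age_s: float) -> List[str]:
    """Return paths that are old enough and unchanged between two scans."""
    out: List[str] = []
    for path, (size, mtime) in curr_scan.items():
        if path not in prev_scan:
            continue  # seen only once: not yet proven stable
        prev_size, prev_mtime = prev_scan[path]
        stable = size == prev_size and mtime == prev_mtime
        old_enough = now - mtime >= min_age_s
        if stable and old_enough:
            out.append(path)
    return sorted(out)


prev = {"a.log": (100, 1000.0), "b.log": (50, 1000.0)}
curr = {"a.log": (100, 1000.0), "b.log": (75, 5000.0), "c.log": (10, 1000.0)}
print(cold_candidates(prev, curr, now=10000.0, min_age_s=3600.0))
```

Here `b.log` grew between scans and `c.log` has been seen only once, so neither is eligible; only `a.log` qualifies.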
Packaging, billing & the lakehouse
One container, many jobs
Pack a whole directory into one .at1, each entry independently verified and extractable. When you pack PDFs, AT-1 co-stores a text index, so you can full-text-search inside the PDFs without unpacking them.
Search “credential” → the exact PDF + page, no document materialized
The S3 gateway meters ?select queries at exactly their bytes_read and object GETs at bytes served, through the same TLS-guarded control-plane hook the encoder uses. Fire-and-forget by default (reads never block on billing); AT1_ENFORCE makes it synchronous.
A select billed 2,813 B precisely; an over-quota GET denied, usage unchanged
docker compose up → Trino + the gateway, auto-tiering ./data and serving it back over s3://. DuckDB, Spark and Trino each query .at1 in place doing their own SigV4 signing.
Composed from individually-verified pieces; the compose itself wasn't run in CI (Docker Hub pull limit) — first docker compose up on your machine confirms it
New verticals
Where it wins next
Blue Gene/L RAS logs: AT-1's log codec hits 25.1×, beating xz-9 — a new vertical for national labs and HPC operators sitting on enormous console-log archives.
Writing AT-1-compressed data to LTO-9 fits 240–452 TB logical per cartridge: ~5–8× fewer cartridges and ~$2.7–3k media saved per PB vs the drive's built-in compression. A capacity model; ratios depend on your data mix.
Every number here was measured on real data or is labelled as a model. The boundaries are published too — see the benchmarks and the project's honesty notes.