Benchmarks — AT-1

Proof on real data

Measured, not promised

Every row is a real public dataset, accepted only after the encoder's gate confirmed a byte-exact reconstruction. “AT-1 ratio” is how many times smaller the output is than the raw input; the last column is the margin over the format the industry actually uses today.

Data type	Dataset	AT-1 ratio	vs the format you use today
Server logs	Apache access · SSH auth · HDFS	19–37×	1.1–1.4× smaller than xz-9
Genomics	1000-Genomes VCF, chr22	209×	2.48× smaller than native BCF (2.95× vs .vcf.gz)
Telemetry / IoT	UCI smart-meter power	22×	1.9× smaller than xz-9 · ~3× vs Parquet-zstd
Neurophysiology / EEG	PhysioNet CHB-MIT scalp EEG	5.1×	1.6× smaller than xz-9, lossless
Financial ticks (queryable)	Binance BTCUSDT aggTrades	18.7×	~3× smaller than Parquet-zstd · 54× less query I/O
Event JSON / NDJSON	GitHub Archive events	21×	1.15× smaller than xz-9
Database exports (queryable)	Mongo / Elasticsearch NDJSON	~28×	~2× smaller than xz-9, and queryable
Lakehouse tabular (queryable)	NYC-taxi, Parquet round-trip	−27%	27% smaller than Parquet-zstd, still block-addressable
Map / geo	OpenStreetMap, Luxembourg	14×	1.32× smaller than PBF, 1.47× vs xz

Every row is reproduced from a real public dataset, and every codec reports a byte-for-byte lossless check — we cite no result that isn't verified lossless.
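The two numeric columns are related by simple arithmetic, which the sketch below works through for the Genomics and Lakehouse rows. It assumes both ratios in a row are measured against the same raw input; the function names are illustrative and not part of any AT-1 tooling, and the baseline ratios it prints are derived from the table, not separately measured.

```python
def times_smaller(raw_bytes: int, compressed_bytes: int) -> float:
    """Compression ratio expressed as 'times smaller than raw'."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return raw_bytes / compressed_bytes


def implied_baseline_ratio(at1_ratio: float, margin: float) -> float:
    """Ratio the baseline format must have if AT-1 is `margin`x smaller.

    Assumption: both ratios refer to the same raw input.
    """
    return at1_ratio / margin


def percent_smaller_to_margin(percent_smaller: float) -> float:
    """Convert 'N% smaller' into an equivalent 'times smaller' margin."""
    return 1.0 / (1.0 - percent_smaller / 100.0)


# Genomics row: 209x vs raw, 2.48x smaller than BCF, 2.95x smaller than .vcf.gz
bcf = implied_baseline_ratio(209, 2.48)
vcf_gz = implied_baseline_ratio(209, 2.95)
print(f"implied BCF ratio:     {bcf:.1f}x")     # 84.3x
print(f"implied .vcf.gz ratio: {vcf_gz:.1f}x")  # 70.8x

# Lakehouse row: 27% smaller than Parquet-zstd
lakehouse_margin = percent_smaller_to_margin(27)
print(f"27% smaller = {lakehouse_margin:.2f}x smaller")  # 1.37x
```

Read this way, the 27% figure in the Lakehouse row corresponds to a margin of roughly 1.37×, which puts it on the same scale as the other rows.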

Radical honesty

Where we lose — and what we don't claim

Every compression vendor publishes the wins. We publish the losses too, because the advantage is the capability (query + byte-exact) and the economics, not a ratio number that changes quarter to quarter. Here is exactly where AT-1 is the wrong tool.

Decode speed is xz-class

The decoder is fast enough for archival reads, not for hot paths: it costs roughly 2.2× the CPU per byte of xz and roughly 12× that of zstd. AT-1 is a cold / archival tier, not your hot storage.
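A CPU-per-byte comparison of this kind can be sketched as follows. This is a minimal illustration, not the harness behind the numbers above: it uses Python's standard-library `lzma` (xz) and `zlib` as stand-in codecs, because neither AT-1 nor zstd is assumed to be installed, and the synthetic payload is hypothetical.

```python
import lzma
import time
import zlib


def cpu_ns_per_byte(decompress, packed: bytes, repeats: int = 5) -> float:
    """Best-of-N CPU time (not wall time) per decompressed byte, in ns."""
    best = float("inf")
    out_len = 1
    for _ in range(repeats):
        start = time.process_time_ns()
        out = decompress(packed)
        elapsed = time.process_time_ns() - start
        out_len = max(len(out), 1)
        best = min(best, elapsed)
    return best / out_len


# Hypothetical sensor-style payload; replace with a real dataset.
payload = b"".join(
    f"{i},sensor-{i % 17},{i * 31 % 1000}\n".encode() for i in range(100_000)
)
xz_blob = lzma.compress(payload)
zlib_blob = zlib.compress(payload)

xz_cost = cpu_ns_per_byte(lzma.decompress, xz_blob)
zlib_cost = cpu_ns_per_byte(zlib.decompress, zlib_blob)
print(f"xz:   {xz_cost:.2f} ns/byte")
print(f"zlib: {zlib_cost:.2f} ns/byte")
if zlib_cost > 0:  # coarse clocks can report zero elapsed time
    print(f"xz costs {xz_cost / zlib_cost:.1f}x zlib per byte")
```

Measuring process CPU time rather than wall time keeps disk and scheduler noise out of the comparison. Absolute figures will vary by machine, so only the ratios between codecs are meaningful.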

Already-compressed media

JPEG, H.264, and other entropy-saturated media gain almost nothing from any compressor, AT-1 included. We don't pretend otherwise.
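The effect is easy to reproduce with any general-purpose compressor. The sketch below uses random bytes as a stand-in for entropy-saturated media and the standard-library `lzma` module as the compressor; both are assumptions made for illustration, and AT-1 itself is not involved.

```python
import lzma
import os


def xz_times_smaller(data: bytes) -> float:
    """Times-smaller ratio of `data` under xz at the default preset."""
    return len(data) / len(lzma.compress(data))


# Random bytes approximate data that is already at its entropy floor.
noise = os.urandom(1 << 20)
# Repetitive log text is the opposite extreme.
log_text = b"2024-01-01 12:00:00 INFO request served in 12 ms\n" * 20_000

noise_ratio = xz_times_smaller(noise)
text_ratio = xz_times_smaller(log_text)
print(f"high-entropy input: {noise_ratio:.3f}x")
print(f"repetitive text:    {text_ratio:.0f}x")
```

The high-entropy ratio lands at or just below 1×, since container overhead slightly outweighs any savings.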

Monochrome DICOM under JPEG-LS

On monochrome pixel data already compressed with JPEG 2000 or JPEG-LS, the image-domain codec wins. AT-1's imaging win is limited to uncompressed, RLE, and color DICOM.

Numeric-heavy tabular

On dense numeric SMART/sensor columns, a trained OpenZL graph edges us out by ~1.06×, at the cost of minutes of per-format training. We're zero-config.

High-entropy network data

On NetFlow and Zeek conn logs, the data is near its entropy floor; AT-1 ties xz and the non-inferiority fallback correctly kicks in. No structural win to claim.
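One way a non-inferiority fallback can work is sketched below. This is an assumed design, not AT-1's actual container format: the one-byte tags, the function names, and the use of `zlib` as the candidate codec are all hypothetical. The encoder keeps the candidate's output only when it is strictly smaller than the xz baseline.

```python
import lzma
import zlib

TAG_BASELINE = b"\x00"   # payload is plain xz
TAG_CANDIDATE = b"\x01"  # payload is the candidate codec's output


def encode_with_fallback(data: bytes, candidate_compress) -> bytes:
    """Keep the candidate only if it beats the xz baseline, else fall back."""
    baseline = lzma.compress(data)  # default preset keeps memory modest
    candidate = candidate_compress(data)
    if len(candidate) < len(baseline):
        return TAG_CANDIDATE + candidate
    return TAG_BASELINE + baseline


def decode_with_fallback(blob: bytes, candidate_decompress) -> bytes:
    """Dispatch on the one-byte tag written by encode_with_fallback."""
    tag, payload = blob[:1], blob[1:]
    if tag == TAG_CANDIDATE:
        return candidate_decompress(payload)
    if tag == TAG_BASELINE:
        return lzma.decompress(payload)
    raise ValueError(f"unknown tag {tag!r}")


# Hypothetical conn-log-like input; zlib level 1 plays the candidate codec.
data = b"".join(
    f"10.0.{i % 256}.{i * 7 % 256} 443 tcp {i * 2654435761 % 65536}\n".encode()
    for i in range(50_000)
)
blob = encode_with_fallback(data, lambda d: zlib.compress(d, 1))
restored = decode_with_fallback(blob, zlib.decompress)
print("fallback used:", blob[:1] == TAG_BASELINE)
print("round trip ok:", restored == data)
```

With this shape, the output is never more than one tag byte larger than the baseline, which is what makes a tie with xz the worst case.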

“Best ratio everywhere”

Ratio leadership is contested and unprovable, so we never claim it. We claim only what the verification gate measured on real data, per domain.

Reproduce any number on this page: every codec prints “LOSSLESS (byte-for-byte): True” or “False”, and we cite no result that prints False. Sources: comparison.html, VALIDATION_RESULTS.md, BENCHMARKS_OPENZL_AND_SPEED.md.
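A byte-exact check of this kind can be sketched in a few lines. The version below is illustrative only: `lossless_gate` is a hypothetical name, the sample input is synthetic, and the standard-library `lzma` module stands in for the codec under test, since the actual AT-1 interface is not shown here.

```python
import lzma


def lossless_gate(original: bytes, compress, decompress) -> dict:
    """Round-trip `original` and accept the result only if it is byte-exact."""
    packed = compress(original)
    restored = decompress(packed)
    lossless = restored == original  # byte-for-byte, no tolerance
    return {
        "lossless": lossless,
        "raw_bytes": len(original),
        "packed_bytes": len(packed),
        # A ratio is reported only for results that passed the gate.
        "ratio": len(original) / len(packed) if lossless else None,
    }


# Stand-in input and codec; swap in the real dataset and encoder.
sample = b'127.0.0.1 - - "GET /index.html HTTP/1.1" 200 512\n' * 5_000
report = lossless_gate(sample, lzma.compress, lzma.decompress)
print(f"LOSSLESS (byte-for-byte): {report['lossless']}")
if report["lossless"]:
    print(f"ratio: {report['ratio']:.1f}x")
```

Returning no ratio at all on failure mirrors the rule stated above: a result that is not byte-exact has no number worth citing.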