Measured, not promised
Every row is a real public dataset, accepted only after the encoder's gate confirmed a byte-exact reconstruction. “AT-1 ratio” is how many times smaller the output is than the raw input; the last column is the margin over the format the industry actually uses today.
| Data type | Dataset | AT-1 ratio | vs the format you use today |
|---|---|---|---|
| Server logs | Apache access · SSH auth · HDFS | 19–37× | 1.1–1.4× smaller than xz-9 |
| Genomics | 1000-Genomes VCF, chr22 | 209× | 2.48× smaller than native BCF (2.95× vs .vcf.gz) |
| Telemetry / IoT | UCI smart-meter power | 22× | 1.9× smaller than xz-9 · ~3× vs Parquet-zstd |
| Neurophysiology / EEG | PhysioNet CHB-MIT scalp EEG | 5.1× | 1.6× smaller than xz-9, lossless |
| Financial ticks (queryable) | Binance BTCUSDT aggTrades | 18.7× | ~3× smaller than Parquet-zstd · 54× less query I/O |
| Event JSON / NDJSON | GitHub Archive events | 21× | 1.15× smaller than xz-9 |
| Database exports (queryable) | Mongo / Elasticsearch NDJSON | ~28× | ~2× smaller than xz-9, and queryable |
| Lakehouse tabular (queryable) | NYC-taxi, Parquet round-trip | −27% | smaller than Parquet-zstd, still block-addressable |
| Map / geo | OpenStreetMap, Luxembourg | 14× | 1.32× smaller than PBF, 1.47× vs xz |
Every row is reproduced from a real public dataset, and every codec reports a byte-for-byte lossless check — we cite no result that isn't verified lossless.
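To make the two numeric columns concrete, here is a minimal Python sketch of how a "times smaller vs raw" ratio, a byte-exact check, and the margin arithmetic relate. It is not AT-1's code: it uses Python's standard `lzma` module at preset 9 as a stand-in for the xz-9 baseline, the sample log line is toy data, and every function name is hypothetical.

```python
import lzma


def ratio(raw: bytes, packed: bytes) -> float:
    """Times-smaller vs raw: raw size divided by compressed size."""
    return len(raw) / len(packed)


def xz9_baseline(raw: bytes) -> bytes:
    """Baseline stand-in (assumption): Python's lzma at preset 9."""
    return lzma.compress(raw, preset=9)


def is_byte_exact(raw: bytes, packed: bytes) -> bool:
    """Gate: accept a result only if decoding reproduces the input exactly."""
    return lzma.decompress(packed) == raw


def implied_baseline_ratio(at1_ratio: float, margin: float) -> float:
    """If a codec is `at1_ratio`x vs raw and `margin`x smaller than a
    baseline, the baseline's own ratio vs raw is at1_ratio / margin."""
    return at1_ratio / margin


def percent_smaller_to_factor(percent: float) -> float:
    """Convert 'N% smaller' into a times-smaller factor: 1 / (1 - N/100)."""
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    # Toy, highly repetitive input; real datasets will compress far less.
    sample = b'127.0.0.1 - - "GET /index.html HTTP/1.1" 200 512\n' * 2000
    packed = xz9_baseline(sample)
    print("LOSSLESS (byte-for-byte):", is_byte_exact(sample, packed))
    print(f"xz-9 ratio on toy sample: {ratio(sample, packed):.1f}x")
    # Genomics row: 209x vs raw, 2.48x smaller than BCF.
    print(f"implied BCF ratio: {implied_baseline_ratio(209, 2.48):.1f}x")
    # Lakehouse row: 27% smaller expressed as a factor.
    print(f"27% smaller as a factor: {percent_smaller_to_factor(27):.2f}x")
```

The last two helpers only rearrange numbers already in the table; they do not add any new measurement.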
Where we lose — and what we don't claim
Every compression vendor publishes the wins. We publish the losses too, because the advantage is the capability (query + byte-exact) and the economics, not a ratio number that changes quarter to quarter. Here is exactly where AT-1 is the wrong tool.
Decode speed is xz-class
The decoder is fast enough for archival reads, not for hot paths: it costs ~2.2× the CPU per byte of xz and ~12× that of zstd. AT-1 is a cold / archival tier, not your hot storage.
Already-compressed media
JPEG, H.264, and other entropy-saturated media gain ~nothing from any compressor, AT-1 included. We don't pretend otherwise.
Monochrome DICOM under JPEG-LS
On monochrome pixel data already under JPEG 2000 / JPEG-LS, the image-domain codec wins. AT-1's imaging win is limited to uncompressed, RLE, and color DICOM.
Numeric-heavy tabular
On dense numeric SMART/sensor columns, a trained OpenZL graph edges us out by ~1.06×, at the cost of minutes of per-format training. We're zero-config.
High-entropy network data
On NetFlow and Zeek conn logs, the data is near its entropy floor; AT-1 ties xz and the non-inferiority fallback correctly kicks in. No structural win to claim.
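One way to picture a non-inferiority fallback is the sketch below. It assumes the fallback simply keeps the general-purpose output whenever the structural encoding is not strictly smaller; AT-1's actual rule is not described here. `structural_encode`, `structural_decode`, and the tags are hypothetical, and Python's `lzma` at preset 9 stands in for xz.

```python
import lzma
from typing import Callable, Tuple


def encode_with_fallback(
    raw: bytes, structural_encode: Callable[[bytes], bytes]
) -> Tuple[str, bytes]:
    """Return (tag, payload), never larger than the xz-style baseline."""
    baseline = lzma.compress(raw, preset=9)
    candidate = structural_encode(raw)
    # Keep the structural output only when it is strictly smaller.
    if len(candidate) < len(baseline):
        return "structural", candidate
    return "xz", baseline


def decode(
    tag: str, payload: bytes, structural_decode: Callable[[bytes], bytes]
) -> bytes:
    """Dispatch on the tag so either path round-trips losslessly."""
    if tag == "xz":
        return lzma.decompress(payload)
    return structural_decode(payload)


if __name__ == "__main__":
    import os

    noisy = os.urandom(4096)  # stand-in for near-entropy-floor data

    # Hypothetical structural codec that only adds framing on top of xz,
    # so it can never beat the baseline and the fallback is taken.
    def overhead_codec(b: bytes) -> bytes:
        return b"HDR!" + lzma.compress(b, preset=9)

    tag, payload = encode_with_fallback(noisy, overhead_codec)
    print(tag, len(payload))
```

Storing the tag alongside the payload is what lets the decoder stay byte-exact whichever path was chosen.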
“Best ratio everywhere”
Ratio leadership is contested and unprovable, so we never claim it. We claim only what the verification gate measured on real data, per domain.
Reproduce any number on this page: every codec prints “LOSSLESS (byte-for-byte): True” or “LOSSLESS (byte-for-byte): False”, and we cite no result that prints False. Sources: comparison.html, VALIDATION_RESULTS.md, BENCHMARKS_OPENZL_AND_SPEED.md.
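When re-running the benchmarks, a small check like the following could confirm that no captured run reported a failed round trip. It assumes only the `LOSSLESS (byte-for-byte): True/False` line quoted above; how the output is captured, and the helper name `all_lossless`, are hypothetical.

```python
import re

# Matches the status line quoted on this page; surrounding text is ignored.
LOSSLESS_LINE = re.compile(r"LOSSLESS \(byte-for-byte\):\s*(True|False)")


def all_lossless(output: str) -> bool:
    """True only if at least one status line exists and none says False."""
    results = LOSSLESS_LINE.findall(output)
    return bool(results) and all(r == "True" for r in results)


if __name__ == "__main__":
    # Illustrative captured text, not real benchmark output.
    captured = "LOSSLESS (byte-for-byte): True\nLOSSLESS (byte-for-byte): True\n"
    print(all_lossless(captured))
```

Treating output with no status line as a failure avoids silently passing a run that never reached the verification step.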