Compress structured data by its shape, not as a byte soup.
Generic compressors treat a CSV or log line as one undifferentiated byte stream. AT-1's format-aware stream-split transposes the data into columns and gives each typed field its own stream, so numbers compress against numbers and timestamps against timestamps. It's byte-exact lossless, with a never-worse guard, and on the structured datasets measured below it beats xz -9e by 1.03× to 2.04×.
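To make the transpose step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not AT-1's implementation: the helper names `split_columns` and `join_columns` are hypothetical, the input is plain comma-separated text with no quoting and no trailing newline, the sample rows are synthetic, and Python's `lzma` module at preset 9 with the extreme flag stands in for xz -9e.

```python
import lzma

XZ_9E = 9 | lzma.PRESET_EXTREME  # same preset level as `xz -9e`

def split_columns(text, sep=","):
    """Transpose rows into one newline-joined byte stream per column."""
    rows = [line.split(sep) for line in text.split("\n")]
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("sketch assumes every row has the same number of fields")
    return ["\n".join(col).encode("utf-8") for col in zip(*rows)]

def join_columns(streams, sep=","):
    """Inverse of split_columns: rebuild the original row-oriented text."""
    cols = [s.decode("utf-8").split("\n") for s in streams]
    return "\n".join(sep.join(row) for row in zip(*cols))

# Synthetic rows (timestamp, symbol, price, size), for illustration only.
sample = "\n".join(
    f"{1700000000 + i},BTCUSDT,{42000 + i % 7}.50,{(i * 37) % 100}"
    for i in range(1000)
)

streams = split_columns(sample)
packed = [lzma.compress(s, preset=XZ_9E) for s in streams]
whole = lzma.compress(sample.encode("utf-8"), preset=XZ_9E)

# Lossless check: decompressing and re-joining must give back the input.
restored = join_columns([lzma.decompress(p) for p in packed])
assert restored == sample

print("single stream:", len(whole), "bytes")
print("per-column streams:", sum(map(len, packed)), "bytes")
```

On an input this small, the fixed container overhead of each compressed stream can outweigh the benefit of splitting, so the printed sizes say nothing about the measured results on this page.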
vs xz -9e: Binance trade data. Columnar transpose + typed delta over the price/size/time streams.
vs xz -9e: Per-channel readings split into their own typed streams before compression.
vs xz -9e: Structured access logs. Fields separated so each compresses against its own kind.
Same kind packed with its own kind
Stream-split is a columnar transpose with typed delta — the established BtrBlocks-class technique, applied automatically and verified byte-exact. There's no novel-IP claim here: the value is that it's built in, measured on real data, and guarded so it never loses to the plain codec.
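The "typed delta" half of the technique can be sketched in a few lines of Python. This is a generic delta coder over an integer column, shown under assumptions rather than as AT-1's actual encoding: the function names are hypothetical, and the canonical-text check is one possible way to keep the typed path byte-exact (a field such as `007` cannot be rebuilt from the integer 7, so the column would stay untyped).

```python
from itertools import accumulate
from typing import List, Optional

def parse_int_column(fields: List[str]) -> Optional[List[int]]:
    """Return ints only if printing them back reproduces the text exactly."""
    try:
        values = [int(f) for f in fields]
    except ValueError:
        return None  # not an integer column
    if any(str(v) != f for v, f in zip(values, fields)):
        return None  # e.g. "007" or "+5": typed path would not be byte-exact
    return values

def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    """A running sum restores the original values."""
    return list(accumulate(deltas))

timestamps = ["1700000000", "1700000001", "1700000003"]
values = parse_int_column(timestamps)
deltas = delta_encode(values)
print(deltas)  # [1700000000, 1, 2]
assert [str(v) for v in delta_decode(deltas)] == timestamps
```

Slowly changing columns such as timestamps turn into runs of small, repetitive deltas, which is the kind of input a general-purpose back-end codec handles well.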

    # format-aware compression with a never-worse guard
    at1 columnar compress ticks.csv --out ticks.at1      # 2.04× vs xz-9e, byte-exact
    at1 columnar compress meters.csv --out meters.at1    # 1.60× vs xz-9e
    at1 columnar compress access.log --out access.at1    # 1.62× vs xz-9e
    at1 columnar decompress ticks.at1 --out ticks.csv    # reconstructs every original byte

Logs, by structure
The gain tracks how structured the log is: heavily fielded access logs gain the most, while freeform syslog lands close to parity. All results are byte-exact and measured against xz -9e.
| Dataset | vs xz -9e |
|---|---|
| Apache access logs | 1.62× |
| HDFS logs | 1.23× |
| Linux syslog | 1.03× |
Every result above reconstructs the original file byte-for-byte. The never-worse guard means that if the stream-split wouldn't help a given file, it falls back so you never end up larger than the standard codec. The market here is structured telemetry, CSV, JSON and logs — the data that has columns to split — not natural media, where we delegate to the codecs that already model it.
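One way such a guard could work is sketched below in Python. Everything here is an assumption made for illustration: the one-byte mode tags, the `toy_split_*` helpers (a simplified comma-separated transpose), and the use of Python's `lzma` at preset 9 extreme as the "standard codec". The page does not describe AT-1's container format or fallback mechanism.

```python
import lzma

XZ_9E = 9 | lzma.PRESET_EXTREME
PLAIN, SPLIT = b"\x00", b"\x01"  # hypothetical one-byte mode tags

def toy_split_encode(data):
    """Toy columnar path: returns compressed bytes, or None if not applicable."""
    if b"\x00" in data:
        return None  # NUL is used below as the column separator
    rows = [line.split(b",") for line in data.split(b"\n")]
    if any(len(r) != len(rows[0]) for r in rows):
        return None  # not rectangular, so no clean transpose
    columns = b"\x00".join(b"\n".join(col) for col in zip(*rows))
    return lzma.compress(columns, preset=XZ_9E)

def toy_split_decode(body):
    cols = [c.split(b"\n") for c in lzma.decompress(body).split(b"\x00")]
    return b"\n".join(b",".join(row) for row in zip(*cols))

def compress_guarded(data):
    """Use the columnar path only if it is smaller AND verified byte-exact."""
    plain = lzma.compress(data, preset=XZ_9E)
    candidate = toy_split_encode(data)
    if (candidate is not None
            and len(candidate) < len(plain)
            and toy_split_decode(candidate) == data):
        return SPLIT + candidate
    return PLAIN + plain  # fallback: the plain codec's output plus the tag

def decompress_guarded(blob):
    tag, body = blob[:1], blob[1:]
    if tag == SPLIT:
        return toy_split_decode(body)
    if tag == PLAIN:
        return lzma.decompress(body)
    raise ValueError("unknown mode tag")

table = b"\n".join(b"%d,GET,/item/%d,200" % (1700000000 + i, i % 50) for i in range(500))
binary = bytes(range(256)) * 4  # contains NUL, so the toy columnar path declines

for payload in (table, binary):
    blob = compress_guarded(payload)
    assert decompress_guarded(blob) == payload
    assert len(blob) <= len(lzma.compress(payload, preset=XZ_9E)) + 1
```

In this sketch the fallback costs one tag byte over the plain codec's output; how AT-1 signals the fallback is not stated in the source.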
Built for
Financial tick archives · IoT & smart-meter telemetry · application & access logs · CSV/JSON exports — anywhere you store a lot of structured rows and want them smaller without giving up lossless reconstruction.