Format-aware compression

Compress structured data by its shape, not as a byte soup.

Generic compressors treat a CSV or log line as one undifferentiated byte stream. AT-1's format-aware stream-split transposes the data into columns and gives each typed field its own stream, so numbers compress against numbers and timestamps against timestamps. It's byte-exact lossless, with a never-worse guard, and it beats xz -9e on the structured datasets measured below, from 1.03× on freeform syslog to 2.04× on finance ticks.

2.04×
Finance ticks
vs xz -9e

Binance trade data — columnar transpose + typed delta over the price/size/time streams.

1.60×
Smart-meter telemetry
vs xz -9e

Per-channel readings split into their own typed streams before compression.

1.62×
Apache logs
vs xz -9e

Structured access logs — fields separated so each compresses against its own kind.

Same kind packed with its own kind

Stream-split is a columnar transpose with typed delta — the established BtrBlocks-class technique, applied automatically and verified byte-exact. There's no novel-IP claim here: the value is that it's built in, measured on real data, and guarded so it never loses to the plain codec.

# format-aware compression with a never-worse guard
at1 columnar compress ticks.csv  --out ticks.at1     # 2.04× vs xz-9e, byte-exact
at1 columnar compress meters.csv --out meters.at1    # 1.60× vs xz-9e
at1 columnar compress access.log --out access.at1    # 1.62× vs xz-9e

at1 columnar decompress ticks.at1 --out ticks.csv    # reconstructs every original byte
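
To make "columnar transpose with typed delta" concrete, here is a minimal Python sketch of the general technique. It is not AT-1's implementation: the function names are hypothetical, Python's standard `lzma` module stands in for the backend codec, and the sketch assumes newline-terminated rows with the same number of comma-separated fields and no quoted separators.

```python
import lzma

def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    return [v - p for v, p in zip(values, [0] + values[:-1])]

def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def is_canonical_int(field):
    """True only if str(int(field)) reproduces the field exactly (no sign, no leading zeros)."""
    return field.isascii() and field.isdigit() and str(int(field)) == field

def encode_column(fields):
    """Typed path: integer columns are delta-encoded ("d"); anything else stays raw ("r")."""
    if all(is_canonical_int(f) for f in fields):
        deltas = delta_encode([int(f) for f in fields])
        return "d", "\n".join(map(str, deltas)).encode()
    return "r", "\n".join(fields).encode()

def decode_column(kind, payload):
    fields = payload.decode().split("\n")
    if kind == "d":
        return [str(v) for v in delta_decode([int(f) for f in fields])]
    return fields

def compress_columns(text, sep=","):
    """Transpose rows into columns and compress each column as its own stream."""
    if not text.endswith("\n"):
        raise ValueError("sketch assumes newline-terminated rows")
    rows = [line.split(sep) for line in text[:-1].split("\n")]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("sketch assumes every row has the same number of fields")
    columns = [list(col) for col in zip(*rows)]
    return [(kind, lzma.compress(payload))
            for kind, payload in map(encode_column, columns)]

def decompress_columns(streams, sep=","):
    """Invert compress_columns: decode each stream, then transpose back into rows."""
    columns = [decode_column(kind, lzma.decompress(blob)) for kind, blob in streams]
    return "".join(sep.join(row) + "\n" for row in zip(*columns))

# Synthetic tick-like rows (timestamp, price, size); made-up data, not a benchmark.
sample = "".join(f"{1700000000000 + 7 * i},{42000 + i % 5},{i % 3 + 1}\n"
                 for i in range(2000))
assert decompress_columns(compress_columns(sample)) == sample
```

The canonical-integer check is what keeps the sketch byte-exact: a field such as `007` would not survive a round trip through `int`, so a column containing it stays on the raw path.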

Logs, by structure

The win tracks how structured the log is: heavily-fielded access logs gain the most; freeform syslog is closer to parity. All byte-exact, all vs xz -9e.

Dataset               vs xz -9e
Apache access logs    1.62×
HDFS logs             1.23×
Linux syslog          1.03×
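
One way to picture why the gain tracks structure is the Python sketch below. It splits lines that match the Apache Common Log Format into per-field streams and stores every other line verbatim. This is an illustration under assumptions, not AT-1's parser: the regular expression, the stream names, and the `split_log` / `join_log` helpers are all hypothetical.

```python
import re

# Common Log Format: host ident user [time] "request" status size
CLF = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)')
FIELDS = ("host", "ident", "user", "time", "request", "status", "size")

def split_log(lines):
    """Route each matching line's fields to per-field streams; keep other lines raw."""
    streams = {name: [] for name in FIELDS}
    streams["raw"] = []   # freeform lines, stored verbatim
    layout = []           # one flag per line: True if the line was split into fields
    for line in lines:
        m = CLF.fullmatch(line)
        layout.append(m is not None)
        if m:
            for name, value in zip(FIELDS, m.groups()):
                streams[name].append(value)
        else:
            streams["raw"].append(line)
    return layout, streams

def join_log(layout, streams):
    """Rebuild the original lines exactly from the layout flags and the streams."""
    cursors = {name: iter(values) for name, values in streams.items()}
    out = []
    for structured in layout:
        if structured:
            host, ident, user, time, request, status, size = (
                next(cursors[name]) for name in FIELDS)
            out.append(f'{host} {ident} {user} [{time}] "{request}" {status} {size}')
        else:
            out.append(next(cursors["raw"]))
    return out

lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    'kernel: freeform message with no fixed fields',
]
layout, streams = split_log(lines)
assert join_log(layout, streams) == lines
```

In this sketch a freeform line contributes nothing to the typed streams, so a log made mostly of such lines leaves the splitter little to work with, which is consistent with syslog sitting near parity in the table.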

Lossless, byte-exact, never-worse

Every result above reconstructs the original file byte-for-byte. The never-worse guard means that if the stream-split wouldn't help a given file, it falls back so you never end up larger than the standard codec. The market here is structured telemetry, CSV, JSON and logs — the data that has columns to split — not natural media, where we delegate to the codecs that already model it.
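
The guard logic described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not AT-1's container format: `SPLIT_MAGIC`, `guarded_compress`, and `guarded_decompress` are hypothetical names, Python's `lzma` module (which writes `.xz` streams) stands in for the standard codec, and a simple byte-wise delta stands in for the real stream-split transform.

```python
import lzma
import os

XZ_MAGIC = b"\xfd7zXZ\x00"  # the six-byte header that starts every .xz stream
SPLIT_MAGIC = b"SPLT"       # hypothetical marker for the transformed path

def guarded_compress(data, transform, inverse, preset=6):
    """Return the transformed stream only if it round-trips and is strictly smaller.

    preset=6 is the lzma default; 9 | lzma.PRESET_EXTREME corresponds to xz -9e.
    """
    plain = lzma.compress(data, preset=preset)
    transformed = transform(data)
    if inverse(transformed) != data:   # verify byte-exact before trusting the transform
        return plain
    candidate = SPLIT_MAGIC + lzma.compress(transformed, preset=preset)
    return candidate if len(candidate) < len(plain) else plain

def guarded_decompress(blob, inverse):
    if blob.startswith(XZ_MAGIC):      # fallback path: an ordinary xz stream
        return lzma.decompress(blob)
    if blob.startswith(SPLIT_MAGIC):
        return inverse(lzma.decompress(blob[len(SPLIT_MAGIC):]))
    raise ValueError("unknown container")

def byte_delta(data):
    """Stand-in transform: each byte minus its predecessor, modulo 256."""
    return bytes((b - p) & 0xFF for b, p in zip(data, b"\x00" + data[:-1]))

def byte_undelta(data):
    out, prev = bytearray(), 0
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)

for data in (b"", os.urandom(2048), bytes(range(256)) * 32):
    blob = guarded_compress(data, byte_delta, byte_undelta)
    assert guarded_decompress(blob, byte_undelta) == data
    assert len(blob) <= len(lzma.compress(data, preset=6))
```

Because the fallback emits the plain xz stream unchanged and the decoder tells the two paths apart by their leading bytes, the output in this sketch is never larger than what the standard codec alone would produce.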

Built for

Financial tick archives · IoT & smart-meter telemetry · application & access logs · CSV/JSON exports — anywhere you store a lot of structured rows and want them smaller without giving up lossless reconstruction.