When order is noise, stop paying to store it.
A graph edge list, an adjacency table, a posting list, a port allowlist — these are sets. The order the rows happen to arrive in carries no information, yet a general compressor pays to preserve it. AT-1 stores the canonical sorted adjacency and reconstructs the same set — exact at the set level, order-invariant, and with no dense relabeling (the real (a, b) values are kept).
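To make "canonical sorted adjacency" concrete, here is a minimal Python sketch of the idea. It is an illustration only, not AT-1's actual on-disk format: the function names `to_canonical_adjacency` and `from_canonical_adjacency` and the sample addresses are hypothetical. Two different arrival orders collapse to one layout, and the original (a, b) values come back untouched.

```python
from collections import defaultdict

def to_canonical_adjacency(pairs):
    """Collapse (a, b) pairs, in any arrival order, into sorted adjacency."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)  # real values are kept; nothing is relabeled
    # Sort sources, then each source's destinations: one layout per set.
    return [(a, sorted(neighbours[a])) for a in sorted(neighbours)]

def from_canonical_adjacency(adjacency):
    """Rebuild the set of (a, b) pairs from the canonical layout."""
    return {(a, b) for a, bs in adjacency for b in bs}

# Hypothetical sample data: the same three edges in two arrival orders.
edges = [("10.0.0.2", "10.0.0.9"), ("10.0.0.1", "10.0.0.9"), ("10.0.0.2", "10.0.0.3")]
shuffled = list(reversed(edges))

canonical = to_canonical_adjacency(edges)
assert canonical == to_canonical_adjacency(shuffled)      # order-invariant
assert from_canonical_adjacency(canonical) == set(edges)  # exact at the set level
```

Because every ordering of the same set maps to the same canonical structure, there is nothing order-dependent left for a downstream compressor to encode.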
342,077 src→dst flow edges stored as a set: 1,073,552 B as a 2-column table → 502,531 B as canonical adjacency (53.2% smaller). Set round-trip exact.
143,143 unique edges: 789,264 B → 473,154 B (40.1% smaller). The win is exactly the bits the arbitrary edge order was costing.
The order was the cost
A set of N distinct pairs can be written in N! orders that all mean the same thing, and that choice of order is pure entropy (up to log2(N!) bits) that a byte-stream compressor faithfully preserves for nothing. AT-1 fixes a single canonical order (sorted adjacency / CSR), so the order bits simply disappear; decode returns the identical set. It's lossless at the level that matters, the set, and it never relabels your identifiers, so the values you get back are the values you put in.
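As a rough illustration of how much information an arbitrary order can carry, the sketch below computes log2(N!) with the standard library. The helper name `order_bits` and the value of N are hypothetical, and the result is the information-theoretic size of the ordering choice, not a measurement of AT-1 or of any particular compressor.

```python
import math

def order_bits(n):
    """log2(n!): bits needed to name one of the n! orderings of n distinct items."""
    # lgamma(n + 1) == ln(n!), so dividing by ln(2) converts to bits.
    return math.lgamma(n + 1) / math.log(2)

n = 1_000_000  # hypothetical set size, chosen only for illustration
bits = order_bits(n)
print(f"{n:,} distinct pairs: about {bits / 8:,.0f} bytes of pure ordering information")
```

The quantity grows roughly as N·log2(N), so the larger the set, the more there is to gain from not storing the order.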
# a set of edges, stored as a set: order omitted, same set returned
at1 set compress edges.csv edges.at1set --a SrcAddr --b DstAddr   # +53.2% vs the 2-column table
at1 set decompress edges.at1set edges.csv                         # the SAME set (no relabel cheat)
Decode reconstructs the same set of pairs, verified. The honest scope: this wins on artifacts that are genuinely delivered as a set, such as graph/topology edge lists, adjacency tables, posting lists, and allowlists. A row-bound table, where the row order pairs values the consumer needs, stays row-bound; there the order isn't noise, so we don't drop it.
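One way to check a set-level round trip yourself is to compare the two files as sets of pairs rather than byte for byte, since the decoded rows may come back in a different order. The sketch below is an assumption about how such a check could look, not AT-1's own verifier; `edge_set` is a hypothetical helper and the in-memory CSV text stands in for the original and decoded files.

```python
import csv
import io

def edge_set(csv_file, a_col, b_col):
    """Read a CSV with a header row and return its set of (a, b) pairs."""
    reader = csv.DictReader(csv_file)
    return {(row[a_col], row[b_col]) for row in reader}

# Hypothetical stand-ins for the original and decoded files: same pairs, different row order.
original = io.StringIO("SrcAddr,DstAddr\n10.0.0.2,10.0.0.9\n10.0.0.1,10.0.0.9\n")
decoded = io.StringIO("SrcAddr,DstAddr\n10.0.0.1,10.0.0.9\n10.0.0.2,10.0.0.9\n")

same_set = edge_set(original, "SrcAddr", "DstAddr") == edge_set(decoded, "SrcAddr", "DstAddr")
print(same_set)  # True: the rows differ in order, the set does not
```

A byte-wise diff of the two files would report a difference here, which is exactly the distinction between a row-bound table and a set.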
Built for
Network & topology graphs · SIEM/SOC connection & edge lists · recommendation & posting lists · social / citation graphs · port and identifier allowlists — anywhere you archive a large set and the storage order is incidental.