Research

Compression as a structure detector.

Our research thesis, the FLOOR map, is that a single compression methodology, run across many unrelated real domains, reveals which generating primitive each one reduces to and where each hits its entropy floor: the boundary between determinism and chance. The method is validated, lossless, and honest about where it finds nothing.

Language family tree from raw bytes

Run the compressor pairwise over raw text in many languages and the language family tree falls out — relatedness rebuilt with no linguistics, just bytes. (It sees surface form: the same language in two alphabets looks far apart until you romanize it.)
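One common way to make the pairwise step concrete is the normalized compression distance (NCD). The sketch below is an illustration, not the FLOOR-map pipeline: it assumes zlib as a stand-in compressor, and the names `csize`, `ncd`, and `distance_matrix` are hypothetical. Two texts that share structure compress better together than apart, so their distance is small.

```python
import itertools
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, using zlib as a stand-in compressor (assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = csize(x), csize(y)
    cxy = csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(samples: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """NCD for every unordered pair of named samples (hypothetical helper)."""
    # DEFLATE only looks back 32 KiB, so keep each sample well under that
    # size, or the concatenation cannot reuse material from the first text.
    return {
        (a, b): ncd(samples[a], samples[b])
        for a, b in itertools.combinations(sorted(samples), 2)
    }
```

Feeding the resulting distances to any standard hierarchical clustering routine would produce a tree. Because the inputs are raw bytes, the same text in two scripts shares almost no byte sequences, which is consistent with the romanization caveat above.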

The edge of chaos, located

Sweep a family of systems from orderly to random and compression pinpoints the thin band where complexity lives — universal computation (Rule 110) sits at a measured ~0.47 bits per cell, between the trivial and the noise.
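A minimal sketch of such a sweep, assuming elementary cellular automata as the family and zlib as a stand-in compressor: the function names, grid size, and seed are hypothetical choices, and a general-purpose codec like this should not be expected to reproduce the ~0.47 figure, which comes from the paper's own instrument.

```python
import random
import zlib


def step(cells: list[int], rule: int) -> list[int]:
    """One update of an elementary cellular automaton with periodic boundaries."""
    n = len(cells)
    return [
        (rule >> (cells[i - 1] << 2 | cells[i] << 1 | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def pack(bits: list[int]) -> bytes:
    """Pack 0/1 cells eight to a byte so the compressor sees one bit per cell."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def bits_per_cell(rule: int, width: int = 256, steps: int = 256, seed: int = 0) -> float:
    """Compressed bits per cell of a space-time diagram grown from a random row."""
    rng = random.Random(seed)
    cells = [rng.randint(0, 1) for _ in range(width)]
    history: list[int] = []
    for _ in range(steps):
        history.extend(cells)
        cells = step(cells, rule)
    return 8 * len(zlib.compress(pack(history), 9)) / len(history)


if __name__ == "__main__":
    # Rule 4 freezes almost immediately, Rule 30 looks random, Rule 110 is the
    # universal-computation case named in the text.
    for rule in (4, 110, 30):
        print(rule, round(bits_per_cell(rule), 3))
```

Sweeping all 256 rule numbers and sorting by the returned value gives a rough orderly-to-random axis along which to look for the intermediate band.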

Rule-governed games win, statistical style floors

Chess, which is governed by rules, compresses far beyond what a general-purpose compressor achieves, because the rule is the short description. Bach, governed by statistical style rather than a closed rule, floors near the general-codec limit. The method tells the two apart.

High-entropy data floors honestly

Protein, DNA, and earthquake records floor near 0% gain: they are close to incompressible, and the detector is not fooled into reporting a false signal. Sunspots, by contrast, reveal a linear recurrence (1.13×). The honesty is the result.
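To illustrate what detecting a linear recurrence can look like, here is a small sketch that fits a two-term recurrence x[t] ≈ a·x[t-1] + b·x[t-2] by least squares and reports how much of the signal energy is left unexplained. It is an assumption-laden toy, not the paper's method: the order-2 model, the function names, and the synthetic 11-step cycle are all hypothetical, and the input is assumed to be mean-centered.

```python
import math
import random


def fit_ar2(x: list[float]) -> tuple[float, float]:
    """Least-squares coefficients (a, b) for x[t] ~ a*x[t-1] + b*x[t-2]."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(x)):
        p, q, y = x[t - 1], x[t - 2], x[t]
        s11 += p * p
        s12 += p * q
        s22 += q * q
        r1 += p * y
        r2 += q * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


def residual_ratio(x: list[float]) -> float:
    """Residual energy divided by signal energy: near 0 means the recurrence
    explains the series, near 1 means it explains nothing."""
    a, b = fit_ar2(x)
    num = den = 0.0
    for t in range(2, len(x)):
        e = x[t] - (a * x[t - 1] + b * x[t - 2])
        num += e * e
        den += x[t] * x[t]
    return num / den


def noisy_cycle(n: int, period: float = 11.0, noise: float = 0.1, seed: int = 0) -> list[float]:
    """Synthetic stand-in for a cyclic record (hypothetical data, not sunspots)."""
    rng = random.Random(seed)
    w = 2 * math.pi / period
    return [math.sin(w * t) + rng.gauss(0, noise) for t in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    print("cycle:", residual_ratio(noisy_cycle(500)))
    print("noise:", residual_ratio([rng.gauss(0, 1) for _ in range(500)]))
```

Under a Gaussian residual model, a ratio r corresponds to roughly 0.5·log2(1/r) bits saved per sample, so a ratio near 1 is the honest "nothing found" outcome described above.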

Why this matters for a compression company

The same discipline that finds the shortest description of your data is, in effect, a detector for hidden structure. It's why AT-1 wins where data is rule-governed (logs, ticks, genomics, games) and honestly delegates where it isn't (natural media at its entropy floor). The research isn't a side project — it's the same lens, pointed at the world instead of a single file.

Read the paper

A minimum-description-length instrument for measuring the determinate-to-stochastic ratio of structure across scientific domains. Dylan Wolpe, 2026. The full FLOOR-map methodology and its results across 11+ unrelated real domains — including the honest negatives, because the failures are what make the boundary trustworthy.

Open-access preprint · DOI: 10.5281/zenodo.20811282