AT-1 Replay

Store the recipe, not the trace.

A huge amount of data isn't captured — it's generated: seeded simulations, synthetic-ML datasets, Monte-Carlo risk runs, fuzzing corpora. For that data you don't need to keep the output, only the recipe that reproduces it byte-for-byte. AT-1 Replay stores the recipe and re-executes it at decode — SHA-256 verified, with a never-worse fallback for everything that isn't generated.


Store the rule, not the output

Seeded simulations and synthetic datasets are fully described by their recipe: {generator + seed + params}. AT-1 Replay stores that recipe and re-executes it at decode, so a 15 MB seeded dataset becomes a few hundred bytes.
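A minimal sketch of the idea, not AT-1 Replay's actual format: the generator name `toy_gaussian`, the JSON layout, and the `generate` helper are all assumptions made for illustration, and Python's stdlib `random.Random` stands in for a real simulation.

```python
import json
import random
import struct


def generate(recipe: dict) -> bytes:
    """Re-run the (hypothetical) generator named in the recipe.

    Returns the output as little-endian float64 bytes.
    """
    if recipe["generator"] != "toy_gaussian":
        raise ValueError(f"unknown generator: {recipe['generator']}")
    params = recipe["params"]
    rng = random.Random(params["seed"])  # seeded, so the stream is repeatable
    values = [rng.gauss(0.0, 1.0) for _ in range(params["n"])]
    return struct.pack(f"<{len(values)}d", *values)


# The recipe is the only thing that needs to be stored.
recipe = {"generator": "toy_gaussian", "params": {"seed": 2025, "n": 100_000}}
recipe_bytes = json.dumps(recipe, sort_keys=True).encode("utf-8")

data = generate(recipe)
print(len(recipe_bytes), "bytes of recipe ->", len(data), "bytes of output")
```

Running `generate(recipe)` again on the same interpreter build yields the same bytes, which is what makes the recipe a sufficient description of the output.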

Byte-exact, fail-closed

Reconstruction is SHA-256-verified, and the recipe is bound to a code-hash over the generator source and library versions. A decode on a mismatched build refuses rather than silently emitting different bytes.
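The two checks can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the artifact layout, the `code_hash`, `pack`, and `unpack` helpers, the `toylib` version entry, and the toy generator are all hypothetical.

```python
import hashlib
import json
import random
import struct

# Hypothetical: the generator's source text, hashed here as a plain string.
GENERATOR_SOURCE = "rng = random.Random(seed); [rng.random() for _ in range(n)]"


def toy_generator(seed: int, n: int) -> bytes:
    """Stand-in generator: n uniform float64 values from a seeded stream."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    return struct.pack(f"<{n}d", *values)


def code_hash(source: str, versions: dict) -> str:
    """Digest over the generator source plus the library versions in use."""
    payload = json.dumps({"source": source, "versions": versions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def pack(data: bytes, params: dict, versions: dict) -> dict:
    """Store the recipe together with the two digests used at decode."""
    return {
        "params": params,
        "code_hash": code_hash(GENERATOR_SOURCE, versions),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def unpack(artifact: dict, versions: dict) -> bytes:
    """Fail closed: raise instead of returning bytes that cannot be verified."""
    if code_hash(GENERATOR_SOURCE, versions) != artifact["code_hash"]:
        raise RuntimeError("build mismatch: refusing to decode")
    data = toy_generator(**artifact["params"])
    if hashlib.sha256(data).hexdigest() != artifact["sha256"]:
        raise RuntimeError("digest mismatch: refusing to emit bytes")
    return data


build = {"python": "3.12.1", "toylib": "1.4.0"}  # hypothetical version pins
original = toy_generator(seed=7, n=1000)
artifact = pack(original, {"seed": 7, "n": 1000}, build)
restored = unpack(artifact, build)
```

Checking the code-hash before running the generator means a mismatched build is rejected early, and the SHA-256 check afterwards catches any remaining divergence in the regenerated bytes.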

Lost the seed? Recover it

Blind seed recovery searches a candidate pool: it regenerates the data for each candidate seed, compares the result with the original bytes, and stores the recipe for the seed that matches. A 2^16 pool resolves in seconds for typical CI/run-id seeds.
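A brute-force version of that search might look like the sketch below. The `recover_seed` helper and the toy generator are assumptions for illustration; the real search strategy and candidate ordering are not described in the source.

```python
import random
import struct
from typing import Iterable, Optional


def toy_generator(seed: int, n: int) -> bytes:
    """Stand-in generator: n uniform float64 values from a seeded stream."""
    rng = random.Random(seed)
    return struct.pack(f"<{n}d", *(rng.random() for _ in range(n)))


def recover_seed(target: bytes, n: int, candidates: Iterable[int]) -> Optional[int]:
    """Regenerate with each candidate seed and return the first exact byte match."""
    for seed in candidates:
        if toy_generator(seed, n) == target:
            return seed
    return None  # no candidate reproduces the bytes; nothing is fabricated


# Pretend the seed was lost: only the output bytes and the length are known.
lost_output = toy_generator(seed=40321, n=32)
found = recover_seed(lost_output, n=32, candidates=range(2**16))
print("recovered seed:", found)
```

For large outputs, comparing a short prefix first and regenerating in full only on a prefix match would keep the per-candidate cost low; that optimization is omitted here for clarity.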

Honest, never-worse fallback

If the data was genuinely captured (not generated), or the declared generator doesn't reproduce it, AT-1 Replay falls back to a never-worse xz container. The artifact is never meaningfully larger than xz, and a regeneration is never fabricated.
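The fallback decision can be sketched with Python's stdlib `lzma` module, which writes the xz format by default. The one-byte tag, the JSON recipe, and the `pack`/`unpack`/`regenerate` helpers are hypothetical and are not the actual container layout.

```python
import json
import lzma
import random
from typing import Optional


def regenerate(recipe_bytes: bytes) -> bytes:
    """Stand-in generator: n pseudo-random bytes from a seeded stream."""
    recipe = json.loads(recipe_bytes)
    rng = random.Random(recipe["seed"])
    return bytes(rng.getrandbits(8) for _ in range(recipe["n"]))


def pack(data: bytes, recipe_bytes: Optional[bytes] = None) -> bytes:
    """Keep the recipe only if it reproduces the data exactly; else use xz."""
    if recipe_bytes is not None and regenerate(recipe_bytes) == data:
        return b"R" + recipe_bytes       # tag R: recipe, re-executed at decode
    return b"X" + lzma.compress(data)    # tag X: plain xz, one byte of overhead


def unpack(blob: bytes) -> bytes:
    tag, body = blob[:1], blob[1:]
    if tag == b"R":
        return regenerate(body)
    if tag == b"X":
        return lzma.decompress(body)
    raise ValueError("unknown container tag")


recipe = b'{"seed": 1, "n": 4096}'
generated = regenerate(recipe)
captured = b"2025-01-01T00:00:00Z sensor=7 temp=21.4\n" * 200  # not generated

packed_generated = pack(generated, recipe)  # stored as a recipe
packed_captured = pack(captured, recipe)    # recipe does not match, so xz
```

Because the recipe is accepted only after an exact byte comparison, a wrong or missing generator declaration costs one tag byte over plain xz in this sketch and never produces a made-up regeneration.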

Three commands

at1 replay pack run.f64 --out run.at1r --generator np_montecarlo --params '{"seed":2025,"n_paths":500,"n_steps":2000}'

Store the recipe; verified byte-exact against the file.

at1 replay verify run.at1r --against run.f64

Reconstruct, check the SHA-256, and compare against the original file.

at1 replay unpack run.at1r --out run.f64

Re-execute the generator, fail-closed, byte-exact.

Where it wins — and where it doesn't

On real, widely used generators (scikit-learn's make_classification/make_regression/make_blobs with a fixed random_state, and numpy PCG64 Monte-Carlo) the recipe is a few hundred bytes: roughly 10⁵× smaller than xz, and byte-exact. On genuinely captured data (sensor logs, market ticks, network flows, genomics) AT-1 Replay finds no spurious generator and falls back to a container that's never meaningfully larger than xz. The win is real where data is generated, and honest everywhere else.
