Store the recipe, not the trace.
A huge amount of data isn't captured — it's generated: seeded simulations, synthetic-ML datasets, Monte-Carlo risk runs, fuzzing corpora. For that data you don't need to keep the output, only the recipe that reproduces it byte-for-byte. AT-1 Replay stores the recipe and re-executes it at decode — SHA-256 verified, with a never-worse fallback for everything that isn't generated.
Store the rule, not the output
Seeded simulations and synthetic datasets are fully described by their recipe: {generator + seed + params}. AT-1 Replay stores that recipe and re-executes it at decode, so a 15 MB seeded dataset shrinks to a few hundred bytes.
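The idea can be sketched in a few lines of Python. This is a minimal illustration, not AT-1 Replay's actual format: the generator `toy_walk`, the `GENERATORS` registry, and the JSON recipe layout are all hypothetical, and the standard-library `random` module stands in for a real generator such as a numpy Monte-Carlo run.

```python
import json
import random
import struct

def toy_walk(seed: int, n_paths: int, n_steps: int) -> bytes:
    """Hypothetical generator: seeded Gaussian random walks as little-endian float64."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n_paths):
        level = 0.0
        for _ in range(n_steps):
            level += rng.gauss(0.0, 1.0)
            out += struct.pack("<d", level)
    return bytes(out)

# Hypothetical registry mapping generator names to callables.
GENERATORS = {"toy_walk": toy_walk}

def regenerate(recipe_json: str) -> bytes:
    """Re-execute the recipe to rebuild the data instead of reading it from storage."""
    recipe = json.loads(recipe_json)
    return GENERATORS[recipe["generator"]](**recipe["params"])

# The recipe is the only thing that needs to be stored.
recipe_json = json.dumps(
    {"generator": "toy_walk",
     "params": {"seed": 2025, "n_paths": 50, "n_steps": 200}},
    sort_keys=True,
)
data = regenerate(recipe_json)
print(len(recipe_json), "bytes of recipe ->", len(data), "bytes of data")
```

Running the recipe twice yields identical bytes, which is the property the whole approach depends on.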
Byte-exact, fail-closed
Reconstruction is SHA-256-verified, and the recipe is bound to a code-hash over the generator source + library versions. A decode on a mismatched build refuses rather than silently emitting different bytes.
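A fail-closed decode of this kind might look like the sketch below. It is an assumption-laden illustration: `code_hash`, `pack`, `unpack`, `ReplayMismatch`, and the recipe fields are hypothetical names, and the generator source and library versions are passed in as plain strings rather than collected from a real build.

```python
import hashlib
import json
import random
import struct

class ReplayMismatch(Exception):
    """Raised instead of returning bytes that cannot be proven correct."""

def code_hash(source: str, versions: dict) -> str:
    # Hypothetical binding: hash the generator source text plus library versions.
    payload = json.dumps({"source": source, "versions": versions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def toy_uniform(seed: int, n: int) -> bytes:
    """Hypothetical generator: n seeded uniform floats as little-endian float64."""
    rng = random.Random(seed)
    return b"".join(struct.pack("<d", rng.random()) for _ in range(n))

def pack(seed: int, n: int, build_hash: str) -> dict:
    data = toy_uniform(seed, n)
    return {"seed": seed, "n": n, "code_hash": build_hash,
            "sha256": hashlib.sha256(data).hexdigest()}

def unpack(recipe: dict, build_hash: str) -> bytes:
    # Refuse up front if this build is not the one that packed the recipe.
    if recipe["code_hash"] != build_hash:
        raise ReplayMismatch("generator build differs from the packing build")
    data = toy_uniform(recipe["seed"], recipe["n"])
    # Refuse if the regenerated bytes do not hash to the stored digest.
    if hashlib.sha256(data).hexdigest() != recipe["sha256"]:
        raise ReplayMismatch("regenerated bytes do not match the stored SHA-256")
    return data

build_a = code_hash("toy_uniform source v1", {"python": "3.12"})
build_b = code_hash("toy_uniform source v1", {"python": "3.13"})
recipe = pack(seed=7, n=100, build_hash=build_a)
restored = unpack(recipe, build_a)  # succeeds; unpack(recipe, build_b) raises
```

Both checks raise rather than return, so a caller never receives bytes that were not verified.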
Lost the seed? Recover it
Blind seed-recovery searches a candidate pool, regenerates, and matches the bytes — then stores the recovered recipe. A 2^16 pool resolves in seconds for typical CI/run-id seeds.
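As a rough sketch of the search, assuming the generator is known and only the seed is missing: the code below tries each candidate seed, regenerates, and compares bytes. `toy_uniform` and `recover_seed` are hypothetical names, and the prefix check is an optimization added for this illustration, not a documented detail of AT-1 Replay.

```python
import random
import struct
from typing import Iterable, Optional

def toy_uniform(seed: int, n: int) -> bytes:
    """Hypothetical generator: n seeded uniform floats as little-endian float64."""
    rng = random.Random(seed)
    return b"".join(struct.pack("<d", rng.random()) for _ in range(n))

def recover_seed(target: bytes, pool: Iterable[int]) -> Optional[int]:
    """Return the first seed in the pool whose output equals target, else None."""
    n = len(target) // 8
    for seed in pool:
        # Cheap filter: compare only the first value before regenerating everything.
        if toy_uniform(seed, 1) != target[:8]:
            continue
        # Full byte-for-byte match is required before a seed is accepted.
        if toy_uniform(seed, n) == target:
            return seed
    return None

# A file produced earlier with a seed that was not recorded.
target = toy_uniform(41234, 256)
recovered = recover_seed(target, range(2 ** 16))
print("recovered seed:", recovered)
```

A seed is accepted only on an exact match of the full output, so a failed search returns `None` instead of a guess.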
Honest, never-worse fallback
If the data was genuinely captured (not generated), or the declared generator doesn't reproduce it, AT-1 Replay falls back to a never-worse xz container. The artifact is never meaningfully larger than xz, and a regeneration is never fabricated.
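One way to picture the fallback, as a sketch only: keep the recipe when it verifiably reproduces the data and is no larger than the xz output, otherwise store the xz stream behind a one-byte tag. The tag bytes, the container layout, and the function names are hypothetical. Python's standard `lzma` module provides the xz compression.

```python
import json
import lzma
import random
import struct
from typing import Callable, Optional

TAG_RECIPE = b"R"  # hypothetical one-byte container tags
TAG_XZ = b"X"

def regenerate(recipe: bytes) -> bytes:
    """Hypothetical generator: seeded uniform float64 values from a JSON recipe."""
    params = json.loads(recipe)
    rng = random.Random(params["seed"])
    return b"".join(struct.pack("<d", rng.random()) for _ in range(params["n"]))

def pack_with_fallback(data: bytes, recipe: Optional[bytes],
                       regen: Callable[[bytes], bytes]) -> bytes:
    xz = lzma.compress(data, format=lzma.FORMAT_XZ)
    if recipe is not None and len(recipe) <= len(xz):
        try:
            # Keep the recipe only if it reproduces the data exactly.
            if regen(recipe) == data:
                return TAG_RECIPE + recipe
        except Exception:
            pass  # a broken recipe is treated the same as a mismatch
    return TAG_XZ + xz  # one tag byte larger than plain xz

def unpack_with_fallback(blob: bytes, regen: Callable[[bytes], bytes]) -> bytes:
    tag, body = blob[:1], blob[1:]
    if tag == TAG_RECIPE:
        return regen(body)
    if tag == TAG_XZ:
        return lzma.decompress(body, format=lzma.FORMAT_XZ)
    raise ValueError("unknown container tag")

recipe = json.dumps({"seed": 2025, "n": 1000}).encode("utf-8")
generated = regenerate(recipe)
captured = bytes(range(256)) * 4  # stands in for data no generator reproduces

blob_generated = pack_with_fallback(generated, recipe, regenerate)
blob_captured = pack_with_fallback(captured, recipe, regenerate)
```

In this layout the worst case costs one byte over plain xz, and the recipe path is taken only after a byte-exact check.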
Three commands
at1 replay pack run.f64 --out run.at1r --generator np_montecarlo --params '{"seed":2025,"n_paths":500,"n_steps":2000}'
Store the recipe; verified byte-exact against the file.
at1 replay verify run.at1r --against run.f64
Reconstruct + SHA-256 check (and match the original file).
at1 replay unpack run.at1r --out run.f64
Re-execute the generator, fail-closed, byte-exact.
Where it wins — and where it doesn't
On real, widely-used generators (scikit-learn's make_classification/make_regression/make_blobs with a fixed random_state, and numpy PCG64 Monte-Carlo) the recipe is a few hundred bytes — ~105× vs xz, byte-exact. On genuinely captured data (sensor logs, market ticks, network flows, genomics) it finds no spurious generator and falls back to a container that's never meaningfully larger than xz. The win is real where data is generated, and honest everywhere else.