Prove what your AI actually decided.
When a regulator or a court asks you to justify an automated decision two years later, a log that says “the model said no” isn’t proof. The AI Evidence Capsule is a notarized, reproducible receipt: re-run the exact inference offline and get byte-identical output, with the model’s integrity cryptographically sealed. Think git + a notary public for AI decisions.
- 10/10: byte-identical cold-process replay (real Pythia-70m)
- ~0.5 KB: evidence per inference (weights shared once)
- caught: 1-byte weight tamper detected on verify
- offline: self-contained, no live re-execution or model call
Each inference keeps a ~0.5 KB capsule: the exact input, the seed/decode policy, the pinned execution manifest, and SHA-256 hashes of the model store and of the output. The weights live once in a shared, verified store, so the per-decision evidence stays at about half a kilobyte.
An auditor rebuilds the model from the verified weight store + the pinned manifest, replays the capsule's exact input from a cold process, and gets byte-identical output. “We logged it” becomes “re-run it and check the bytes.”
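The record-then-replay idea can be sketched in a few lines of Python. This is an illustration only, not the product's capsule format: the `Capsule` field names, the JSON encoding, and `toy_model` (a deterministic stand-in for a real inference) are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Capsule:
    # Hypothetical field names; the real capsule layout may differ.
    input_text: str
    seed: int
    manifest: dict
    store_sha256: str
    output_sha256: str

    def to_bytes(self) -> bytes:
        # Canonical JSON (sorted keys, no extra whitespace) so the
        # serialized capsule is itself stable byte for byte.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return payload.encode("utf-8")


def toy_model(input_text: str, seed: int) -> bytes:
    # Stand-in for a deterministic inference: same input + seed -> same bytes.
    return sha256_hex(f"{seed}:{input_text}".encode("utf-8")).encode("ascii")


def record(input_text: str, seed: int, manifest: dict, store_sha256: str) -> Capsule:
    # Run the inference once and keep only the hash of its output.
    output = toy_model(input_text, seed)
    return Capsule(input_text, seed, manifest, store_sha256, sha256_hex(output))


def replay_matches(capsule: Capsule, store_sha256: str) -> bool:
    # Refuse to replay against a weight store other than the recorded one.
    if store_sha256 != capsule.store_sha256:
        return False
    # Re-run from the capsule's own input and seed, then compare hashes.
    output = toy_model(capsule.input_text, capsule.seed)
    return sha256_hex(output) == capsule.output_sha256
```

An auditor in this sketch would call `replay_matches(capsule, hash_of_the_store_they_hold)`; a `False` result means either the store differs from the recorded one or the replayed output does not match.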
The weight store carries per-tensor SHA-256. Flip a single byte and the integrity check fails — so you can prove the model that made the decision was the untouched original, not something altered after the fact.
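Per-tensor hashing is what lets a check both detect a change and say where it happened. A minimal sketch, assuming tensors are available as raw bytes keyed by name; the function names and the tensor names are hypothetical and the actual store format is not shown here.

```python
import hashlib


def seal(tensors: dict[str, bytes]) -> dict[str, str]:
    # One SHA-256 digest per tensor, keyed by tensor name.
    return {name: hashlib.sha256(raw).hexdigest() for name, raw in tensors.items()}


def find_tampered(tensors: dict[str, bytes], sealed: dict[str, str]) -> list[str]:
    # Names of sealed tensors that are missing or whose bytes no longer match.
    return sorted(
        name
        for name, digest in sealed.items()
        if name not in tensors
        or hashlib.sha256(tensors[name]).hexdigest() != digest
    )


# Illustrative data: two tiny "tensors" as raw bytes.
weights = {"layer0.weight": bytes(range(16)), "layer0.bias": bytes(4)}
digests = seal(weights)

# Flip a single bit in one byte of one tensor.
tampered = dict(weights)
raw = bytearray(tampered["layer0.weight"])
raw[3] ^= 0x01
tampered["layer0.weight"] = bytes(raw)
```

Here `find_tampered(weights, digests)` returns an empty list, while `find_tampered(tampered, digests)` returns `["layer0.weight"]`, which is the "detected and located" behavior in miniature.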
How it works
seal: weights -> one verified AT-1 store (lossless, per-tensor SHA-256)
record: each inference -> ~0.5 KB capsule {input, seed, manifest, output-hash}
replay: auditor rebuilds the model from the verified store + pinned manifest,
re-runs the capsule input -> asserts BYTE-IDENTICAL output
tamper: flip one weight byte -> integrity check fails (detected + located)
The one requirement is discipline, not magic: the manifest pins the execution recipe (framework + version + dtype + device + kernel). With it pinned, replay is exact; that’s the same “pin the kernel/SKU” rule any reproducible system follows.
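Pinning only helps if the replay environment is checked against the manifest before anything is re-run. A rough sketch of that gate follows, using only the Python standard library; the manifest keys here are assumptions for illustration, and a real manifest would also carry the framework version, dtype, device, and kernel named above.

```python
import platform


def current_environment() -> dict:
    # Hypothetical subset of a manifest, limited to what the stdlib can report.
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
    }


def manifest_mismatches(pinned: dict, actual: dict) -> dict:
    # Map each drifted key to a (pinned, actual) pair; empty dict means exact match.
    return {
        key: (value, actual.get(key))
        for key, value in pinned.items()
        if actual.get(key) != value
    }
```

A replay tool built this way would refuse to run when `manifest_mismatches(...)` is non-empty, so a byte-level difference in output is never confused with a difference in environment.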
Who needs it
- EU AI Act high-risk systems — record-keeping that lets a decision be reconstructed (Art. 12).
- Credit, hiring, insurance, healthcare AI — defensible, reproducible decision records.
- Model providers — prove which exact weights served a request, untampered.
- AI assurance / Big-4 — an evidence layer beneath governance platforms.