Training data that proves its own composition.
Regulators and customers increasingly ask the same question of an AI dataset: what is it actually made of, and can you prove it? AT-1 Corpus Governance seals a dataset's real-vs-synthetic composition and per-record lineage into one verifiable artifact, so a tampered value or a misreported composition is detected, not taken on trust.
Per-record lineage
Every record carries its origin, whether a real source or a synthetic generator, so the corpus can answer 'what is this made of?' down to the row.
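To make the idea concrete, here is a minimal Python sketch of what per-record lineage could look like. It is an illustration only, not AT-1's actual schema or API: the `Record` class, its field names, the `composition` helper, and the sample dataset and generator names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One corpus row plus its lineage (hypothetical field names)."""
    record_id: str
    value: str
    origin: str   # "real" or "synthetic"
    source: str   # name of the real dataset or the synthetic generator


def composition(records):
    """Return the fraction of real and synthetic records in the corpus."""
    if not records:
        raise ValueError("cannot compute composition of an empty corpus")
    counts = Counter(r.origin for r in records)
    total = len(records)
    return {origin: counts[origin] / total for origin in ("real", "synthetic")}


# Illustrative rows; the dataset and generator names are made up.
corpus = [
    Record("r1", "42.0", "real", "public-dataset-a"),
    Record("r2", "41.7", "real", "public-dataset-a"),
    Record("r3", "40.9", "synthetic", "generator-v1"),
    Record("r4", "43.2", "real", "public-dataset-b"),
]

mix = composition(corpus)
```

Because the origin travels with each row, the corpus-level mix is derived from the records themselves rather than stated separately.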
Composition you can prove
The exact percentage of real versus synthetic records is sealed into the artifact. A reviewer verifies the mix without trusting your word for it.
Tamper- AND lie-detected
Alter a value, or misreport the real/synthetic split, and verification fails. It catches both data tampering and composition misrepresentation.
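The two failure modes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the AT-1 engine: the `seal` and `verify` functions, the artifact layout, and the use of a plain SHA-256 digest are all hypothetical. A production seal would more plausibly use a digital signature or keyed MAC, since anyone can recompute a bare hash.

```python
import hashlib
import json


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the artifact body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal(records, claimed_real_fraction):
    """Bundle records and the claimed mix with a digest (hypothetical format)."""
    body = {"records": records, "claimed_real_fraction": claimed_real_fraction}
    return {**body, "seal": _digest(body)}


def verify(artifact):
    """Return 'ok', 'tampered', or 'composition mismatch'."""
    body = {
        "records": artifact["records"],
        "claimed_real_fraction": artifact["claimed_real_fraction"],
    }
    # Check 1: any change to a record or to the claim alters the digest.
    if _digest(body) != artifact["seal"]:
        return "tampered"
    # Check 2: recompute the mix from per-record lineage and compare.
    records = artifact["records"]
    real = sum(1 for r in records if r["origin"] == "real")
    actual = real / len(records) if records else 0.0
    if abs(actual - artifact["claimed_real_fraction"]) > 1e-9:
        return "composition mismatch"
    return "ok"


records = [
    {"id": "r1", "value": "42.0", "origin": "real"},
    {"id": "r2", "value": "41.7", "origin": "real"},
    {"id": "r3", "value": "40.9", "origin": "synthetic"},
    {"id": "r4", "value": "43.2", "origin": "real"},
]

honest = seal(records, 0.75)

# Value tamper: edit a row after sealing (on a copy of the artifact).
tampered = json.loads(json.dumps(honest))
tampered["records"][0]["value"] = "99.9"

# Composition lie: seal the same rows with an inflated "real" share.
lie = seal(records, 0.90)

results = (verify(honest), verify(tampered), verify(lie))
```

The design point is that the two checks are independent: the digest catches edits made after sealing, while recomputing the mix from lineage catches a false claim that was sealed in from the start.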
EU AI Act ready
The provenance record that high-risk AI obligations ask for, kept as one verifiable, exportable artifact rather than a spreadsheet of promises.
Where we're honest
This isn't a claim that synthetic data is as good as real — it's a claim that you can prove what your corpus contains. Validated on real public datasets; it catches both tampering and a misreported real/synthetic split. The engine ships compiled and license-gated and runs on your own machine — your data never leaves it.