A VCF that is compressed, region-queryable, tamper-evident, and per-sample erasable.
This is the genomics vertical of the Regulated Archive: the same product, applied to a VCF. One sealed archive holds the variant sites and every sample's genotypes, so you can query a locus without rehydrating the cohort, prove the archive was never altered, and honour a per-participant erasure, all while the variant sites stay byte-stable and queryable.
- 1 sealed VCF archive: compressed + queryable + provable + erasable
- CHROM:POS: region query reads only the blocks it touches
- per-sample: GDPR Art.17 erasure, one individual's genotypes, gone
- byte-stable: variant sites unchanged after an erasure; audit chain intact
Four properties, one archive
Compressed
A population VCF becomes one sealed archive, with variant sites and per-sample genotype matrices packed together, far smaller than the raw VCF. It also carries the three properties below, which a plain .vcf.gz or BCF does not provide together.
Region-queryable
Ask for variants in a CHROM:POS range and the archive returns them by reading only the blocks that range touches — no full decompress, no rehydrating the whole cohort to look at one locus.
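A minimal sketch of how a block-level region lookup can work, assuming the archive keeps a per-chromosome index of non-overlapping, position-sorted blocks. The `Block` fields, `build_index`, and `blocks_for_region` are hypothetical names for illustration, not the archive's actual format or API.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    chrom: str
    start: int   # first variant position covered by the block (inclusive)
    end: int     # last variant position covered by the block (inclusive)
    offset: int  # hypothetical byte offset of the compressed block


def build_index(blocks):
    """Group blocks by chromosome, each group sorted by start position."""
    index = {}
    for block in sorted(blocks, key=lambda b: (b.chrom, b.start)):
        index.setdefault(block.chrom, []).append(block)
    return index


def blocks_for_region(index, chrom, lo, hi):
    """Return only the blocks whose span overlaps the range [lo, hi]."""
    blocks = index.get(chrom, [])
    # Blocks are sorted and non-overlapping, so starts and ends are both sorted.
    starts = [b.start for b in blocks]
    ends = [b.end for b in blocks]
    first = bisect_left(ends, lo)     # first block that ends at or after lo
    last = bisect_right(starts, hi)   # one past the last block starting at or before hi
    return blocks[first:last]
```

Only the blocks returned by `blocks_for_region` would need to be read and decompressed; every other block stays untouched on disk.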
Tamper-evident
A SHA-256 manifest binds the variant sites and per-sample genotype keys. Flip a single byte anywhere and verification fails — the archive is provably the original cohort, or provably not.
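The idea can be sketched with Python's standard `hashlib`: hash every stored block, then seal the sorted list of digests with one root digest. The block names and the manifest layout here are assumptions made for illustration; the archive's real manifest format is not described in this page.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(blocks: dict) -> dict:
    """Hash each named block, then bind all entries under one root digest."""
    entries = {name: sha256_hex(data) for name, data in blocks.items()}
    canonical = json.dumps(entries, sort_keys=True).encode("utf-8")
    return {"entries": entries, "root": sha256_hex(canonical)}


def verify(blocks: dict, manifest: dict) -> bool:
    """Recompute the manifest from the stored bytes and compare."""
    return build_manifest(blocks) == manifest
```

Changing one byte in any block changes that block's digest, which changes the root, so `verify` returns `False`.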
Per-sample erasable
Each sample's genotypes are encrypted under that individual's own key. A right-to-erasure request destroys that one key; the variant sites stay byte-stable and queryable, and every other sample is untouched.
The part that makes this legally and scientifically real: erasing a sample destroys that individual's genotype contribution, but the variant sites stay byte-stable and queryable and every other sample is intact. In our validation on a real 1000G-style archive, a region query returned the variants in a CHROM:POS range while reading only the blocks that range touched; erasing a sample destroyed that individual's genotype key while the variant sites stayed byte-for-byte identical and the other samples queried unchanged.
GDPR Art.17 cryptographic erasure (key destruction): rendering a subject's data permanently unrecoverable. Erasure is performed by destroying the subject's unique encryption key; the ciphertext remains byte-stable so tamper-evidence/audit chains stay intact, and the data is cryptographically irrecoverable. This is the recognised NIST SP 800-88r1 “Cryptographic Erase” method, an established, regulator-recognised approach, not a new cryptographic claim.
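A toy sketch of key-destruction erasure, under loud assumptions: the `CohortArchive` class is hypothetical, and the SHA-256 counter-mode keystream stands in for a real authenticated cipher such as AES-GCM only so that the example runs on the standard library. It is not the archive's implementation and is not suitable for production; a real deployment would also hold the keys in a dedicated key store rather than an in-memory dict.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter-mode keystream (illustration only)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[i:i + 32], pad))
    return bytes(out)


class CohortArchive:
    def __init__(self, sites: bytes, genotypes: dict):
        self.sites = sites  # shared variant sites, not tied to any one sample
        # One independent random key per sample.
        self.keys = {s: secrets.token_bytes(32) for s in genotypes}
        self.ciphertext = {
            s: keystream_xor(self.keys[s], g) for s, g in genotypes.items()
        }

    def read_genotypes(self, sample: str) -> bytes:
        key = self.keys.get(sample)
        if key is None:
            raise KeyError(f"{sample}: key destroyed, genotypes unrecoverable")
        return keystream_xor(key, self.ciphertext[sample])

    def erase(self, sample: str) -> None:
        """Destroy only the sample's key; sites and ciphertext are left as they are."""
        del self.keys[sample]
```

After `erase`, the stored bytes are identical to what they were before, which is why hashes over the ciphertext and variant sites still match; only the ability to decrypt that one sample is gone.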
One command surface
at1 regulated build cohort.vcf --subject-field sample_id --out arc/
# variant sites + per-sample genotypes, one sealed archive
at1 regulated query arc/ --region chr7:117480000:117670000
# variants in the range, reading only touched blocks
at1 regulated verify arc/ # -> integrity: PASS
at1 regulated erase arc/ NA12878 --signing-key issuer.key --out-cert cert.json
# sample's genotype key destroyed; variant sites unchanged
at1 regulated verify arc/ # -> still PASS (manifest re-sealed)
This is the Regulated Archive bundle applied to VCF. The region-query advantage applies to selective queries over position-clustered blocks (it reads only what a range touches); a whole-genome scan reads everything, same as anyone. Per-sample encryption adds a fixed per-participant overhead, so the archive is at its best when the variant-site payload outweighs per-sample keys: cohorts with many variants across many samples. For the raw VCF compression numbers see Genomics; for how key-destruction erasure works see Right to erasure.