The Regulated Archive — quickstart
One sealed bundle that is compressed, queryable in place, tamper-evident, and per-subject erasable, all at the same time. It composes AT-1's queryable columnar codec for the analytic columns with per-subject cryptographic erasure for the PII columns, bound by a SHA-256 integrity manifest. See the product overview for the why and the benchmarks.
1. Build the archive
Split your columns into analytic (queryable) and PII (erasable). The subject field stays in both — a pseudonymous key in the analytic table and the erasure key for the PII.
# txns.json: a JSON array of row objects. Pick the data-subject field and the PII columns;
# everything else becomes the queryable analytic table.
at1 regulated build txns.json --subject-field user_id --pii email,card_last4 --out arc/
# -> arc/analytic.at1    queryable qcolumnar (predicate/projection pushdown)
#    arc/archive.json    per-subject AES-256-GCM PII blocks (compressed before encryption)
#    arc/vault.json      the key store (back with a KMS/HSM in production)
#    arc/manifest.json   schema + SHA-256 integrity binding both parts
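To try the build command without real data, a small input file can be generated first. The Python sketch below is an illustration only and is not part of at1: it uses the field names that appear in this quickstart with made-up values, and it shows how the columns fall into the two groups that --subject-field and --pii describe. The helper name split_columns is hypothetical.

```python
import json

# Made-up sample rows; the field names follow the commands in this quickstart.
rows = [
    {"txn_id": 1, "user_id": 1337, "ts": 1700000000000,
     "amount_cents": 245000, "email": "a@example.com", "card_last4": "4242"},
    {"txn_id": 2, "user_id": 1337, "ts": 1700000300000,
     "amount_cents": 1999, "email": "a@example.com", "card_last4": "4242"},
    {"txn_id": 3, "user_id": 2001, "ts": 1700000900000,
     "amount_cents": 248500, "email": "b@example.com", "card_last4": "1881"},
]

def split_columns(rows, subject_field, pii):
    """Hypothetical helper: separate analytic rows from per-subject PII."""
    analytic, pii_by_subject = [], {}
    for row in rows:
        # The analytic side keeps everything except the PII columns,
        # including the subject field as a pseudonymous key.
        analytic.append({k: v for k, v in row.items() if k not in pii})
        # The PII side is grouped under the subject field (the erasure key).
        pii_by_subject.setdefault(row[subject_field], []).append(
            {k: row[k] for k in pii})
    return analytic, pii_by_subject

analytic, pii_by_subject = split_columns(rows, "user_id", ["email", "card_last4"])
print(sorted(analytic[0]))     # ['amount_cents', 'ts', 'txn_id', 'user_id']
print(sorted(pii_by_subject))  # [1337, 2001]

if __name__ == "__main__":
    # Write the input file in the shape the build step expects:
    # a JSON array of row objects.
    with open("txns.json", "w") as f:
        json.dump(rows, f)
```

Note that user_id appears on both sides: as a column of the analytic rows and as the grouping key for the PII.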
2. Query in place
The analytic columns answer predicate/projection queries without a full decompress, returning the exact original rows. See more on the query model.
# query the analytic columns IN PLACE: no decompress, reads only the blocks a predicate touches
at1 regulated query arc/ --where amount_cents:240000:250000 --select amount_cents
at1 regulated query arc/ --where ts:1700000000000:1700000600000 --select txn_id,ts
# selective windows over a clustered column skip whole row-groups via zone-maps
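The zone-map skipping mentioned above can be illustrated with a small Python sketch. It is a generic model of the technique, not at1's on-disk format: the RowGroup layout, the group size, and the inclusive treatment of the lo:hi range are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    values: list  # one column's values for this group
    lo: int       # zone-map minimum
    hi: int       # zone-map maximum

def make_groups(column, group_size):
    """Cut a column into fixed-size row groups and record min/max per group."""
    groups = []
    for i in range(0, len(column), group_size):
        chunk = column[i:i + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups

def range_query(groups, lo, hi):
    """Return matching values and the number of groups actually read.

    Assumes an inclusive [lo, hi] range; the CLI's bounds may differ.
    """
    hits, groups_read = [], 0
    for g in groups:
        if g.hi < lo or g.lo > hi:
            continue  # the zone map proves no row can match: skip the group
        groups_read += 1
        hits.extend(v for v in g.values if lo <= v <= hi)
    return hits, groups_read

# A clustered (sorted) timestamp column: 1000 rows, 100 rows per group.
ts = [1700000000000 + i * 1000 for i in range(1000)]
groups = make_groups(ts, 100)
hits, groups_read = range_query(groups, 1700000250000, 1700000260000)
print(len(hits), groups_read, len(groups))  # 11 1 10
```

Because the column is sorted, the 10-second window overlaps a single group and the other nine are never touched. On an unclustered column the per-group min/max ranges would overlap most windows, so most groups would be read.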
3. Prove it was never altered
at1 regulated verify arc/
# -> integrity: PASS (recomputes the manifest's SHA-256 over analytic.at1 + archive.json)
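The idea behind the check can be sketched in a few lines of Python: record a SHA-256 digest per file at seal time, then recompute and compare. The manifest layout used here (a "sha256" object mapping file name to hex digest) is hypothetical; the real manifest.json also carries the schema and may bind the parts differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_file(path):
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def seal(directory, names, manifest_name="manifest.json"):
    """Record each file's digest (hypothetical manifest layout)."""
    d = Path(directory)
    digests = {n: sha256_file(d / n) for n in names}
    (d / manifest_name).write_text(json.dumps({"sha256": digests}, indent=2))
    return digests

def verify(directory, manifest_name="manifest.json"):
    """Recompute every digest and compare it with the recorded value."""
    d = Path(directory)
    recorded = json.loads((d / manifest_name).read_text())["sha256"]
    return all(sha256_file(d / n) == digest for n, digest in recorded.items())

# Demo on throwaway files that stand in for the two sealed parts.
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp)
    (p / "analytic.at1").write_bytes(b"columnar bytes")
    (p / "archive.json").write_bytes(b"{}")
    seal(tmp, ["analytic.at1", "archive.json"])
    print(verify(tmp))                          # True
    (p / "archive.json").write_bytes(b"{ }")    # alter one byte range
    print(verify(tmp))                          # False
```

Any change to either file changes its digest, so the comparison fails until the manifest is deliberately re-sealed.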
4. Erase a subject — and keep the data useful
Erasing a subject destroys their PII key. Their identifying data is gone, but their pseudonymous analytic rows stay queryable and the analytic archive's bytes never move, so aggregates, totals and models stay correct while the person is genuinely forgotten.
at1 regulated keygen --out issuer.key      # Ed25519 signing identity (+ .pub)
at1 regulated read arc/ 1337               # the subject's PII (email, card_last4)
at1 regulated erase arc/ 1337 --signing-key issuer.key --out-cert cert.json
at1 regulated read arc/ 1337               # -> exits non-zero: ERASED
at1 regulated verify arc/                  # -> still PASS (manifest re-sealed)
at1 regulated query arc/ --where amount_cents:0:1000000000 --select txn_id
# -> ALL rows still return: the forgotten subject's pseudonymous analytic rows are intact
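The mechanism behind erase, crypto-shredding, can be modelled in a short Python sketch: one key per subject, PII stored only as ciphertext, and erasure implemented as deleting the key. To stay within the standard library, the cipher below is a stand-in (a SHA-256 counter-mode keystream with an HMAC tag) and is NOT AES-256-GCM; it should not be used to protect real data. The Archive class and its method names are hypothetical and do not reflect at1's file formats.

```python
import hashlib
import hmac
import secrets

def _keystream(key, nonce, n):
    """SHA-256 in counter mode. Stdlib stand-in, NOT AES-256-GCM."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def seal_block(key, plaintext):
    """Encrypt and authenticate one subject's PII block (toy construction)."""
    nonce = secrets.token_bytes(12)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_block(key, block):
    nonce, ct, tag = block[:12], block[12:-32], block[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

class Erased(Exception):
    pass

class Archive:
    def __init__(self):
        self.vault = {}   # subject -> key (plays the role of vault.json)
        self.blocks = {}  # subject -> encrypted PII (plays the role of archive.json)

    def put(self, subject, pii):
        key = secrets.token_bytes(32)
        self.vault[subject] = key
        self.blocks[subject] = seal_block(key, pii)

    def read(self, subject):
        if subject not in self.vault:
            raise Erased(subject)
        return open_block(self.vault[subject], self.blocks[subject])

    def erase(self, subject):
        # One dictionary delete, independent of archive size.
        # The ciphertext stays in place but can no longer be opened.
        del self.vault[subject]

arc = Archive()
arc.put(1337, b'{"email": "a@example.com", "card_last4": "4242"}')
arc.put(2001, b'{"email": "b@example.com", "card_last4": "1881"}')
before = arc.blocks[1337]
arc.erase(1337)
print(arc.blocks[1337] == before)  # True: the encrypted block did not move
print(arc.read(2001) == b'{"email": "b@example.com", "card_last4": "1881"}')  # True
```

After erase, arc.read(1337) raises Erased while every other subject still decrypts. In a real deployment the key would be destroyed inside the KMS/HSM rather than dropped from an in-memory dictionary.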
What makes this different
- Four properties, one artifact. gzip is opaque (restore for any query or erase); Parquet is queryable but can't erase a subject or prove integrity; a live DB isn't a compressed retained archive. Only this bundle does all four.
- Erasure is O(1) in archive size. Destroy one key; the encrypted PII block stays physically present but unrecoverable. Milliseconds whether the archive is a megabyte or a petabyte.
- The analytic archive is immutable under erasure. Its SHA-256 is identical before and after — the manifest proves no other data moved.
Honest scope
The query advantage applies to selective queries over clustered columns (it reads only the blocks a predicate touches); a random-column full scan reads everything, same as any format. Per-subject encryption adds storage overhead, so the bundle trails Parquet on raw compression ratio, but it is the only one that is also erasable and tamper-evident. Erasure is cryptographic erasure (crypto-shredding), a regulator-recognised method (EDPB 5/2019, ICO, CNIL); the contribution is the unified, queryable, productised archive, not a new cryptographic claim. Encryption is AES-256-GCM; certificates are Ed25519-signed. Back the vault.json key store with a KMS/HSM in production. See also the erasure internals.