One archive that is compressed, queryable, tamper-evident, and erasable — all at once.
Regulated data forces a trade-off: compress it and it goes opaque; keep it queryable in a warehouse and you can't erase a subject or prove it was never altered. AT-1's Regulated Archive is a single sealed bundle that does all four — so you query without rehydrating, prove integrity on demand, and honour a right-to-erasure request in milliseconds, without rewriting a byte.
Compressed
A single bundle, 3–5× smaller than raw. That is competitive with Parquet on footprint, and the bundle adds the three properties below, which no columnar format offers.
Queryable in place
Predicate/projection pushdown returns exact original rows without a full decompress. A selective window over a clustered column touches under 1% of the file — ~8.5× faster than restore-then-scan at a million rows.
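To make the "touches under 1% of the file" idea concrete, here is a minimal sketch of block skipping with per-block min/max statistics. It is an illustration of the general technique, not AT-1's actual on-disk format: the `Block` class, `build_blocks`, `query`, the block size, and the synthetic data are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Block:
    lo: int            # minimum value of the clustered column in this block
    hi: int            # maximum value of the clustered column in this block
    rows: list         # stands in for the block's compressed payload

def build_blocks(rows, column, block_size):
    """Sort by the clustered column, then cut into blocks that carry min/max stats."""
    ordered = sorted(rows, key=lambda r: r[column])
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(Block(chunk[0][column], chunk[-1][column], chunk))
    return blocks

def query(blocks, column, lo, hi, select):
    """Return (projected rows, blocks read); blocks outside [lo, hi] are never opened."""
    out, touched = [], 0
    for b in blocks:
        if b.hi < lo or b.lo > hi:
            continue  # pruned from the stats alone, payload is not decoded
        touched += 1
        out.extend({k: r[k] for k in select}
                   for r in b.rows if lo <= r[column] <= hi)
    return out, touched

# Synthetic data, hypothetical field names.
rows = [{"user_id": i % 1000, "amount_cents": i * 7} for i in range(100_000)]
blocks = build_blocks(rows, "amount_cents", block_size=1_000)
hits, touched = query(blocks, "amount_cents", 240_000, 250_000, ["amount_cents"])
print(f"{len(hits)} rows returned, {touched} of {len(blocks)} blocks read")
```

The fraction of blocks read depends entirely on how well the data is clustered on the filtered column; on an unclustered column every block's range overlaps the predicate and nothing is skipped.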
Tamper-evident
A SHA-256 manifest binds the analytic and PII parts. Flip a single byte anywhere and verification fails — the archive is provably the original, or provably not.
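The binding described above can be sketched with nothing but the standard library. This is a simplified model under stated assumptions: the part names (`analytic.bin`, `pii.bin`), the manifest layout, and the `seal`/`verify` helpers are hypothetical, and the real manifest format is not shown in this document.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def seal(parts: dict) -> dict:
    """One digest per part, plus a root digest over the sorted per-part digests."""
    digests = {name: sha256_hex(blob) for name, blob in parts.items()}
    root = sha256_hex(json.dumps(digests, sort_keys=True).encode())
    return {"parts": digests, "root": root}

def verify(parts: dict, manifest: dict) -> bool:
    """Recompute every digest and compare against the stored manifest."""
    return seal(parts) == manifest

# Hypothetical part names and contents.
parts = {"analytic.bin": b"pseudonymous rows", "pii.bin": b"encrypted pii"}
manifest = seal(parts)

# Flip a single bit in one part.
flipped = bytearray(parts["analytic.bin"])
flipped[0] ^= 0x01
tampered = {**parts, "analytic.bin": bytes(flipped)}

print("original:", "PASS" if verify(parts, manifest) else "FAIL")
print("tampered:", "PASS" if verify(tampered, manifest) else "FAIL")
```

A hash manifest only proves integrity if the manifest itself is trustworthy, so in practice it would be signed or anchored somewhere the archive holder cannot quietly rewrite.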
Per-subject erasable
Each data subject's PII is encrypted under their own key. A GDPR Art.17 erasure destroys that one key in milliseconds; their pseudonymous analytic rows stay queryable and the archive bytes never move.
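Per-subject erasure by key destruction can be sketched as follows. The `SubjectVault` class and its methods are hypothetical names for this example, and the cipher is a toy keystream built from SHAKE-256 purely so the sketch runs on the standard library; a real system would use an authenticated cipher such as AES-GCM and keep keys in a KMS or HSM.

```python
import hashlib
import secrets
from typing import Optional

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT a substitute for a real AEAD."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

class SubjectVault:
    def __init__(self):
        self.keys = {}    # subject_id -> key, held outside the archive
        self.sealed = {}  # subject_id -> (nonce, ciphertext), never rewritten

    def put(self, subject_id, pii: bytes) -> None:
        key = self.keys.setdefault(subject_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        self.sealed[subject_id] = (nonce, keystream_xor(key, nonce, pii))

    def read(self, subject_id) -> Optional[bytes]:
        key = self.keys.get(subject_id)
        if key is None:
            return None  # key destroyed: the ciphertext can no longer be decrypted
        nonce, ciphertext = self.sealed[subject_id]
        return keystream_xor(key, nonce, ciphertext)

    def erase(self, subject_id) -> None:
        """Erasure touches only the key store, not the sealed bytes."""
        self.keys.pop(subject_id, None)

# Pseudonymous analytic rows carry only a subject id, no PII.
analytic = [{"subject": 1337, "amount_cents": 245_000},
            {"subject": 42, "amount_cents": 1_200}]

vault = SubjectVault()
vault.put(1337, b"ada@example.com")
vault.put(42, b"bob@example.com")

sealed_before = dict(vault.sealed)
total_before = sum(r["amount_cents"] for r in analytic)

vault.erase(1337)

total_after = sum(r["amount_cents"] for r in analytic)
print("subject 1337 PII:", vault.read(1337))
print("sealed bytes unchanged:", vault.sealed == sealed_before)
print("aggregate unchanged:", total_before == total_after)
```

Dropping a dictionary entry in Python does not securely wipe memory; the sketch only shows why erasure cost is independent of archive size.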
Nobody else does all four
We're not the smallest file in this table: Parquet edges us on raw ratio. We're the only one that is also tamper-evident and per-subject erasable, in the same artifact you query.
| Format | Compressed | Queryable in place | Tamper-evident | Per-subject erasable |
|---|---|---|---|---|
| Live database: erasable and queryable, but not a compressed retained archive | no | yes | partial | yes |
| gzip / zstd / xz: smallest-ish, but opaque; any query or erase means a full restore | yes | no | | no |
| Parquet + zstd: queryable and compact, but cannot erase a subject or prove integrity | yes | yes | no | no |
| AT-1 Regulated: all four, in one sealed artifact | yes | yes | yes | yes |
The part that makes this legally and commercially real: erasing a subject removes their identifying PII (name, email, card), but their pseudonymous rows stay queryable. Your aggregates, fraud models and financial totals stay correct while the individual is genuinely forgotten. In our benchmark, all 1,000,000 analytic rows remained queryable after the erasure, and the analytic archive was byte-for-byte identical before and after.
Benchmarked, honestly, on 1,000,000 transactions
Honest scope: the query advantage applies to selective queries over clustered columns, where only the blocks a predicate touches are read; a full scan over an arbitrary, unclustered column reads everything, the same as any other format. Per-subject encryption adds storage overhead, which is why we trail Parquet on raw ratio. That overhead is fixed per data subject, so the storage win shows up when analytic columns outweigh PII: transaction, event and telemetry data with many rows per person. We validated this on real public payroll data, where the analytic part compressed 13.7×; a thin, mostly-names table is the wrong shape and barely beats raw. Build throughput is the area we are currently working to improve. Cryptographic erasure (“crypto-shredding”) is an established, regulator-recognised method; our contribution is the unified, queryable, productised archive, not a new cryptographic claim.
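The shape argument above can be put into a back-of-envelope model. Everything here is an assumption for illustration: the function name, the byte sizes, and the idea that encrypted PII is stored uncompressed with a fixed per-subject overhead. Only the 13.7× analytic ratio comes from the payroll figure quoted above, and the outputs are not benchmark results.

```python
def effective_ratio(rows_per_subject: int,
                    analytic_bytes_per_row: float,
                    pii_bytes_per_subject: float,
                    analytic_ratio: float,
                    overhead_per_subject: float) -> float:
    """Raw size divided by stored size for one data subject.

    Assumes analytic columns compress by `analytic_ratio`, while encrypted PII
    does not compress and carries a fixed per-subject overhead.
    """
    raw = rows_per_subject * analytic_bytes_per_row + pii_bytes_per_subject
    stored = (rows_per_subject * analytic_bytes_per_row / analytic_ratio
              + pii_bytes_per_subject + overhead_per_subject)
    return raw / stored

# Hypothetical sizes: 64 B of analytic data per row, 48 B of PII and
# 48 B of encryption overhead per subject.
for rows in (1, 10, 100, 1_000):
    ratio = effective_ratio(rows, 64, 48, 13.7, 48)
    print(f"{rows:>5} rows per subject -> about {ratio:.1f}x smaller than raw")
```

With one row per person the fixed per-subject cost dominates and the ratio stays close to 1; as rows per person grow, the ratio approaches the analytic compression ratio.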
One command surface
at1 regulated build txns.json --subject-field user_id --pii email,card_last4 --out arc/
at1 regulated query arc/ --where amount_cents:240000:250000 --select amount_cents
# queries the compressed bundle in place
at1 regulated verify arc/ # -> integrity: PASS
at1 regulated read arc/ 1337 # -> subject 1337's PII
at1 regulated erase arc/ 1337 --signing-key issuer.key --out-cert cert.json
# PII destroyed; analytic rows still query; bytes unchanged
at1 regulated verify arc/ # -> still PASS (manifest re-sealed)
Who this is for
- DPOs & Legal — close the backup-erasure gap while keeping analytics on retained data.
- Fintech & payments — query transaction history in place, erase a customer, prove WORM integrity for audit.
- Healthcare & adtech — keep pseudonymous analytics correct after a subject is forgotten.
- Long-retention archives — years of snapshots that must stay queryable, provable, and erasable.
Send a representative sample — 100k to 1M rows, with identifiers masked on your side if you prefer. We run build → query → erase → verify and return a one-page report with your numbers: storage vs gzip/Parquet, query-in-place latency and % of the file read, per-subject erasure time with a signed certificate, and an integrity check. No data leaves your control beyond the sample you choose to send, and there's no commitment.
Bring a sample of your regulated data — we'll prove all four on it in a pilot.