The Regulated Archive

One archive that is compressed, queryable, tamper-evident, and erasable — all at once.

Regulated data forces a trade-off: compress it and it goes opaque; keep it queryable in a warehouse and you can't erase a subject or prove it was never altered. AT-1's Regulated Archive is a single sealed bundle that does all four — so you query without rehydrating, prove integrity on demand, and honour a right-to-erasure request in milliseconds, without rewriting a byte.

Compressed

A single bundle, 3–5× smaller than raw. That is competitive with Parquet on footprint, and unlike any columnar format it also carries all three of the properties below.

Queryable in place

Predicate/projection pushdown returns exact original rows without a full decompress. A selective window over a clustered column touches under 1% of the file — ~8.5× faster than restore-then-scan at a million rows.
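One common way to get this behaviour is a per-block min/max index (a zone map): the reader compares each block's value range with the predicate and decompresses only the blocks that can match. The Python sketch below illustrates that general technique on a sorted (clustered) column. It is not AT-1's actual on-disk format, and `Block`, `build_blocks` and `query_range` are hypothetical names.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    lo: int          # smallest value of the clustered column in this block
    hi: int          # largest value of the clustered column in this block
    payload: bytes   # independently compressed rows


def build_blocks(rows, key, block_size):
    """Sort rows by `key`, then compress them in fixed-size blocks."""
    ordered = sorted(rows, key=lambda r: r[key])
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        payload = zlib.compress(json.dumps(chunk).encode())
        blocks.append(Block(chunk[0][key], chunk[-1][key], payload))
    return blocks


def query_range(blocks, key, lo, hi):
    """Return rows with lo <= row[key] <= hi and the number of blocks decoded."""
    hits, decoded = [], 0
    for b in blocks:
        if b.hi < lo or b.lo > hi:
            continue  # block range cannot match: skip without decompressing
        decoded += 1
        for row in json.loads(zlib.decompress(b.payload)):
            if lo <= row[key] <= hi:
                hits.append(row)
    return hits, decoded


# Synthetic rows for illustration only.
rows = [{"ts": t, "amount_cents": (t * 37) % 1000} for t in range(10_000)]
blocks = build_blocks(rows, "ts", block_size=100)
hits, decoded = query_range(blocks, "ts", 4_200, 4_250)
print(len(hits), "rows from", decoded, "of", len(blocks), "blocks")
# -> 51 rows from 1 of 100 blocks
```

The same predicate on an unsorted column would overlap most block ranges, which is why the benefit depends on clustering.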

Tamper-evident

A SHA-256 manifest binds the analytic and PII parts. Flip a single byte anywhere and verification fails — the archive is provably the original, or provably not.
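A minimal Python sketch of the idea, assuming the manifest is simply one SHA-256 digest per part sealed by a root digest. AT-1's real manifest layout is not described here, and `seal`, `verify` and the part names are hypothetical.

```python
import hashlib
import json


def seal(parts):
    """Build a manifest: one SHA-256 digest per named part, plus a root digest."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in parts.items()}
    root = hashlib.sha256(json.dumps(digests, sort_keys=True).encode()).hexdigest()
    return {"parts": digests, "root": root}


def verify(parts, manifest):
    """True only if recomputing the manifest reproduces the sealed one."""
    return seal(parts) == manifest


parts = {"analytic.bin": b"pseudonymous rows ...", "pii.bin": b"encrypted PII ..."}
manifest = seal(parts)
print(verify(parts, manifest))     # True

tampered = dict(parts)
flipped = bytearray(tampered["pii.bin"])
flipped[0] ^= 0x01                 # flip one bit in one byte
tampered["pii.bin"] = bytes(flipped)
print(verify(tampered, manifest))  # False
```

A plain hash manifest proves integrity only if the manifest itself is trusted, for example because it is signed or stored out of band; that step is omitted from the sketch.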

Per-subject erasable

Each data subject's PII is encrypted under their own key. A GDPR Art. 17 erasure destroys that one key in milliseconds; their pseudonymous analytic rows stay queryable and the archive bytes never move.
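The mechanism can be sketched in a few lines of Python. This is an illustration of crypto-shredding in general, not AT-1's implementation: `SubjectVault` is a hypothetical class, and the cipher is a deliberately simple standard-library stand-in (XOR with a SHAKE-256 keystream) where a real system would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets


def _keystream_xor(key, nonce, data):
    # TOY cipher for illustration only: XOR with a SHAKE-256 keystream.
    # Applying it twice with the same key and nonce restores the input.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class SubjectVault:
    """Hypothetical store: one key per data subject, ciphertext kept separately."""

    def __init__(self):
        self.keys = {}         # subject_id -> key (the only thing erasure touches)
        self.ciphertexts = {}  # subject_id -> (nonce, ciphertext), never rewritten

    def put(self, subject_id, pii):
        key = self.keys.setdefault(subject_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        self.ciphertexts[subject_id] = (nonce, _keystream_xor(key, nonce, pii))

    def read(self, subject_id):
        key = self.keys.get(subject_id)
        if key is None:
            return None  # key destroyed: the ciphertext can no longer be decrypted
        nonce, ct = self.ciphertexts[subject_id]
        return _keystream_xor(key, nonce, ct)

    def erase(self, subject_id):
        self.keys.pop(subject_id, None)  # destroy the key; ciphertext bytes stay put


vault = SubjectVault()
vault.put(1337, b"alice@example.com")
vault.put(2001, b"bob@example.com")
before = dict(vault.ciphertexts)

vault.erase(1337)
print(vault.read(1337))             # None
print(vault.read(2001))             # b'bob@example.com'
print(vault.ciphertexts == before)  # True: stored bytes did not change
```

Erasure cost here is one dictionary deletion regardless of how much ciphertext exists, which is the property that makes the operation independent of archive size.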

Nobody else does all four

We're not the smallest file on this table — Parquet edges us on raw ratio. We're the only one that is also tamper-evident and per-subject erasable, in the same artifact you query.

Format	Compressed	Queryable in place	Tamper-evident	Per-subject erasable	Notes
Live database	no	yes		partial	Erasable and queryable, but not a compressed retained archive
gzip / zstd / xz	yes	no	no	no	Compact but opaque: any query or erase means a full restore
Parquet + zstd	yes	yes	no	no	Queryable and compact, but cannot erase a subject or prove integrity
AT-1 Regulated	yes	yes	yes	yes	All four, in one sealed artifact

Forget the person, keep the data useful

The part that makes this legally and commercially real: erasing a subject removes their identifying PII (name, email, card), but their pseudonymous rows stay queryable. Your aggregates, fraud models and financial totals stay correct while the individual is genuinely forgotten. In our benchmark, all 1,000,000 analytic rows were still queryable after the erasure, and the analytic archive was byte-for-byte identical before and after.

Benchmarked, honestly, on 1,000,000 transactions

4.45×

smaller than raw CSV (one bundle). gzip 3.28×, Parquet 5.10× — and neither can erase or prove integrity.

~8.5×

faster on a selective time-window query (89 ms vs 761 ms restore-then-scan), reading 0.4% of the file.

228 ms

to erase a subject and emit a signed certificate — independent of archive size; the bytes never move.

1 byte

is all it takes to fail verification — tamper anywhere and the manifest catches it.

Honest scope: the query advantage applies to selective queries over clustered columns, because the reader touches only the blocks a predicate matches; a full scan over an unclustered column reads everything, same as anyone. Per-subject encryption adds storage overhead, which is why we trail Parquet on raw ratio. That overhead is fixed per data subject, so the storage win shows up when analytic columns outweigh PII: transaction, event and telemetry data with many rows per person. We validated this on real public payroll data, where the analytic part compressed 13.7×; a thin, mostly-names table is the wrong shape and barely beats raw. Build throughput is the area we are still working to improve. Cryptographic erasure (“crypto-shredding”) is an established, regulator-recognised method; our contribution is the unified, queryable, productised archive, not a new cryptographic claim.
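The storage point can be made concrete with a simple additive model: compressed analytic bytes plus, per subject, the PII and a fixed encryption overhead. The Python sketch below is a back-of-envelope illustration under that assumption. Every input except the 13.7× analytic ratio quoted above is a placeholder rather than a measurement, and `overall_ratio` is a hypothetical helper.

```python
def overall_ratio(rows_per_subject, raw_row_bytes, raw_pii_bytes,
                  analytic_ratio, per_subject_overhead):
    """Raw bytes divided by archive bytes for one subject (additive model)."""
    raw = rows_per_subject * raw_row_bytes + raw_pii_bytes
    analytic = rows_per_subject * raw_row_bytes / analytic_ratio
    # Assumption: encrypted PII is stored at full size plus a fixed overhead.
    pii = raw_pii_bytes + per_subject_overhead
    return raw / (analytic + pii)


# Placeholder inputs, not measurements (only 13.7 comes from the text above).
params = dict(raw_row_bytes=80, raw_pii_bytes=60,
              analytic_ratio=13.7, per_subject_overhead=64)

ratios = {n: overall_ratio(n, **params) for n in (1, 10, 100, 1000)}
for n, r in ratios.items():
    print(f"{n:>5} rows per subject -> {r:.2f}x smaller than raw")
```

Under this model the overall ratio rises with rows per subject and approaches the analytic ratio from below, while a table with one row per person stays close to raw size.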

One command surface

at1 regulated build txns.json --subject-field user_id --pii email,card_last4 --out arc/
at1 regulated query  arc/ --where amount_cents:240000:250000 --select amount_cents
                                        # queries the compressed bundle in place
at1 regulated verify arc/               # -> integrity: PASS
at1 regulated read   arc/ 1337          # -> subject 1337's PII
at1 regulated erase  arc/ 1337 --signing-key issuer.key --out-cert cert.json
                                        # PII destroyed; analytic rows still query; bytes unchanged
at1 regulated verify arc/               # -> still PASS (manifest re-sealed)

Who this is for

DPOs & Legal — close the backup-erasure gap while keeping analytics on retained data.
Fintech & payments — query transaction history in place, erase a customer, prove WORM integrity for audit.
Healthcare & adtech — keep pseudonymous analytics correct after a subject is forgotten.
Long-retention archives — years of snapshots that must stay queryable, provable, and erasable.

A free proof on your own data, in 24–48 hours

Send a representative sample — 100k to 1M rows, with identifiers masked on your side if you prefer. We run build → query → erase → verify and return a one-page report with your numbers: storage vs gzip/Parquet, query-in-place latency and % of the file read, per-subject erasure time with a signed certificate, and an integrity check. No data leaves your control beyond the sample you choose to send, and there's no commitment.

Bring a sample of your regulated data — we'll prove all four on it in a pilot.
