addressable weights · patent pending

Compressed weights that are still randomly addressable.

Models ship essentially uncompressed because gzip/xz forfeit the random access that loading a model needs — one stream, all-or-nothing. AT-1 compresses each tensor independently behind an index, so the file is ~32–42% smaller than raw bf16 and yet any single tensor is fetched, decompressed, and SHA-256-verified on its own. A drop-in for safetensors.

32–42%
smaller than raw bf16 (the format models ship in)
per-tensor
random access — fetch one tensor, not the file
SHA-256
integrity verified on every tensor read
byte-exact
94/94 Pythia tensors reconstruct identically

What a whole-file compressor can’t do

Fetch one tensor, not the whole file

gzip/xz compress a model as one stream — to read any tensor you must decompress all of it. AT-1 compresses tensor-by-tensor behind an index, so a single layer, one mixture-of-experts expert, or a LoRA adapter is fetched and decompressed without touching the rest. That's the capability a whole-file compressor structurally cannot offer.
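To make the "index plus independently compressed tensors" idea concrete, here is a minimal sketch. It is not the AT-1 format: the layout, the JSON index, the `pack` and `fetch_one` names, and the use of zlib as the compressor are all assumptions chosen for illustration.

```python
import json
import struct
import zlib


def pack(tensors: dict[str, bytes]) -> bytes:
    """Compress each tensor on its own and prepend a JSON index (hypothetical layout)."""
    index, blobs, offset = {}, [], 0
    for name, raw in tensors.items():
        blob = zlib.compress(raw, 9)  # zlib is a stand-in for the real codec
        index[name] = {"offset": offset, "length": len(blob)}
        blobs.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode("utf-8")
    # Layout: 8-byte little-endian header length, the header, then the blobs.
    return struct.pack("<Q", len(header)) + header + b"".join(blobs)


def fetch_one(container: bytes, name: str) -> bytes:
    """Decompress a single tensor; the other blobs are never decompressed."""
    (header_len,) = struct.unpack_from("<Q", container, 0)
    index = json.loads(container[8 : 8 + header_len])
    entry = index[name]
    start = 8 + header_len + entry["offset"]
    return zlib.decompress(container[start : start + entry["length"]])


tensors = {"layer0.weight": b"\x00" * 1000, "layer1.weight": bytes(range(256)) * 4}
container = pack(tensors)
one = fetch_one(container, "layer1.weight")
```

Because the index records each blob's offset and length, a reader over a file or an HTTP range request would only need the header plus one byte span, which is what a single-stream gzip/xz archive cannot provide.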

Every weight is integrity-checked

Each tensor carries its own SHA-256, verified the moment it's read. After the model-poisoning and tampered-checkpoint scares, a registry that proves each downloaded weight is exactly the published bytes — per tensor, not just per file — is its own selling point.
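A per-tensor check of this kind can be sketched in a few lines. The `published` mapping, the `verify` function, and the `IntegrityError` type are hypothetical names for illustration; how AT-1 actually stores and compares its hashes is not described here.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when a tensor's bytes do not match the published hash."""


def digest(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


def verify(name: str, raw: bytes, published: dict[str, str]) -> bytes:
    """Return the tensor bytes only if their SHA-256 matches the published value."""
    actual = digest(raw)
    if actual != published[name]:
        raise IntegrityError(f"{name}: expected {published[name]}, got {actual}")
    return raw


good = b"\x01\x02\x03\x04"
published = {"layer0.weight": digest(good)}  # assumed to come from the registry

verified = verify("layer0.weight", good, published)

try:
    verify("layer0.weight", b"\x01\x02\x03\xff", published)  # one byte tampered
    tamper_detected = False
except IntegrityError:
    tamper_detected = True
```

Checking at this granularity means a single corrupted or swapped tensor is rejected by name, rather than invalidating (or silently passing inside) a whole-file checksum.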

A drop-in for safetensors

save_file / load_file mirror the safetensors API, plus load_tensor for selective fetch. Point a hub's download path at .at1w and callers don't change; they just get 32–42% smaller storage on bf16 weights, smaller downloads, and selective loading.

Validated on real transformer weights

Real trained models were packed and read back byte-for-byte. bf16, the format modern large models ship in, compresses best because it carries the fewest near-random mantissa bits: 7, versus 10 in fp16 and 23 in fp32.

Pythia-70m (fp16)
166 MB → 114 MB

31% smaller, byte-exact on all 94 tensors, a single tensor fetched without decompressing the rest, integrity verified per tensor.
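The percentage follows directly from the two sizes above. As a quick check (the helper name `percent_smaller` is ours, not part of any AT-1 tooling):

```python
def percent_smaller(raw_size: float, packed_size: float) -> float:
    """Size reduction relative to the raw stored format, in percent."""
    return 100.0 * (1.0 - packed_size / raw_size)


pythia = percent_smaller(166, 114)  # sizes in MB, taken from the figures above
print(round(pythia, 1))  # 31.3
```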

Lossless size vs the raw stored format
fp32 ~16% · fp16 ~25% · bf16 32–42%

The byte-plane (exponent/mantissa) split adds ~5–10% on ratio over a general compressor — but the decisive, unique property is staying randomly addressable and integrity-verified.
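A byte-plane split can be sketched as follows. This is an illustration under assumptions, not the AT-1 codec: it uses zlib as the general compressor, synthetic fp16 values rather than real weights, and a plain high-byte/low-byte split. For little-endian 16-bit floats the high byte holds the sign and exponent (plus a couple of mantissa bits in fp16), so it is only an approximation of an exact exponent/mantissa separation.

```python
import random
import struct
import zlib


def split_planes(raw: bytes) -> tuple[bytes, bytes]:
    """Split little-endian 16-bit values into a low-byte and a high-byte plane."""
    return raw[0::2], raw[1::2]


def merge_planes(low: bytes, high: bytes) -> bytes:
    """Inverse of split_planes: interleave the two planes again."""
    out = bytearray(len(low) + len(high))
    out[0::2] = low
    out[1::2] = high
    return bytes(out)


def compress_planes(raw: bytes) -> tuple[bytes, bytes]:
    """Compress each plane separately so the regular high bytes sit together."""
    low, high = split_planes(raw)
    return zlib.compress(low, 9), zlib.compress(high, 9)


def decompress_planes(low_z: bytes, high_z: bytes) -> bytes:
    return merge_planes(zlib.decompress(low_z), zlib.decompress(high_z))


# Synthetic stand-in for a weight tensor: small Gaussian values stored as fp16.
rng = random.Random(0)
values = [rng.gauss(0.0, 0.02) for _ in range(4096)]
raw = struct.pack(f"<{len(values)}e", *values)

low_z, high_z = compress_planes(raw)
restored = decompress_planes(low_z, high_z)

print("plain zlib :", len(zlib.compress(raw, 9)))
print("byte planes:", len(low_z) + len(high_z))
```

The intuition is that trained weights cluster in a narrow exponent range, so the high-byte plane is highly repetitive while the low-byte plane is close to random. Grouping the planes lets the compressor exploit the first without being disrupted by the second. The actual gain depends on the data, so the sizes printed here say nothing about real models.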

Where we’re honest about the boundary

What we sell

~32–42% smaller than raw bf16 while staying randomly addressable and integrity-verified per tensor.

What we don't claim

We don't claim a meaningfully better ratio than xz: the gain is only ~5–10%. If you just want max ratio on a cold model, use xz; the moat here is addressability and integrity, not the byte count.

Honest limit

You cannot run inference on compressed bytes — a tensor is decompressed on load. The win is storage + download bandwidth + selective load + integrity, not compute.

How a registry integrates it

01

Pack

Convert a .safetensors repo to .at1w: lossless, tensor-by-tensor, with a per-tensor index and hash. 32–42% smaller at rest for bf16 weights.

02

Serve

Put .at1w behind the registry's download path. A drop-in loader decompresses per tensor on load; selective fetch pulls only the layers/experts/adapters requested.

03

Verify

Every served tensor is SHA-256-checked against the published value, giving supply-chain integrity on every download, per tensor rather than just per file.

04

Bill

Registered as TB under management on the same meter as AT-1 storage. Decompress/read isn't billed.

Measure it on your own corpus

We’ll pack a sample of your model registry and report the size reduction, the byte-exact check result, and selective-fetch latency, all measured on your own weights.

Patent pending (US provisional filed 2026). Billed as TB under management, like AT-1 storage.