Docs · patent pending
Addressable weight compression: format & results
The .at1w container stores a model’s tensors compressed losslessly, behind an index that keeps every tensor randomly addressable and integrity-verified. That’s the property a general compressor (gzip/xz on a whole file) structurally cannot offer: to read one tensor from a single compressed stream you must first decompress everything ahead of it. This page documents the format, the drop-in API, the verified results, billing, and the honest boundary.
What the format does
For each tensor, the bytes of every element are split into byte planes (the k-th byte of every element grouped together — a structure-of-arrays transform), and each plane is entropy-coded independently. For floating-point weights the exponent plane is highly redundant and compresses strongly, while mantissa planes are near-random; coding the planes separately beats coding the interleaved bytes. Each compressed tensor is concatenated into a body, and a footer index records, per tensor, its name, dtype, shape, offset, compressed length, element count and a per-tensor SHA-256, with a whole-body hash.
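The byte-plane split described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the .at1w encoder: `zlib` stands in for whatever entropy coder the format actually uses, the helper names `split_byte_planes` and `merge_byte_planes` are hypothetical, and the random Gaussian array is only a stand-in for real weights.

```python
import zlib
import numpy as np


def split_byte_planes(arr: np.ndarray) -> list[bytes]:
    """Group the k-th byte of every element together (structure of arrays)."""
    itemsize = arr.dtype.itemsize
    # View the tensor as an (n_elements, itemsize) grid of raw bytes.
    raw = np.ascontiguousarray(arr).view(np.uint8).reshape(-1, itemsize)
    # Column k holds byte k of every element; each column is one plane.
    return [raw[:, k].tobytes() for k in range(itemsize)]


def merge_byte_planes(planes: list[bytes], dtype, shape) -> np.ndarray:
    """Inverse transform: interleave the planes back into whole elements."""
    cols = [np.frombuffer(p, dtype=np.uint8) for p in planes]
    raw = np.stack(cols, axis=1)  # (n_elements, itemsize), C-contiguous
    return raw.reshape(-1).view(dtype).reshape(shape)


# Stand-in data (assumption): small Gaussian values, roughly weight-like.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)

planes = split_byte_planes(weights)
# Code each plane independently (zlib is an assumed stand-in coder).
compressed = [zlib.compress(p, 9) for p in planes]
planar_size = sum(len(c) for c in compressed)
interleaved_size = len(zlib.compress(weights.tobytes(), 9))

# Decode and reverse the transform; the result must be byte-exact.
restored = merge_byte_planes(
    [zlib.decompress(c) for c in compressed], weights.dtype, weights.shape
)
assert restored.tobytes() == weights.tobytes()
print(f"interleaved: {interleaved_size} B, per-plane: {planar_size} B")
```

On a little-endian machine the last fp32 plane carries the sign and most of the exponent bits, which is where the redundancy sits; the printed sizes let you compare the two layouts on your own data rather than take a ratio on trust.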
A reader seeks to a named tensor’s offset, reads only its compressed bytes, reverses the byte-plane transform, and verifies its hash: random access without touching any other tensor. That’s how one layer, one mixture-of-experts expert, or a single LoRA adapter is loaded selectively.
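To make the seek-read-verify path concrete, here is a toy container under stated assumptions. The on-disk layout (body, then a JSON footer, then an 8-byte footer length) is invented for illustration and is not the .at1w layout; `toy_pack` and `toy_load_tensor` are hypothetical names; `zlib` again stands in for the real coder, and for brevity the planes of a tensor share one zlib stream instead of being coded independently. The whole-body hash and element count from the real index are omitted.

```python
import hashlib
import json
import os
import struct
import tempfile
import zlib
import numpy as np


def toy_pack(tensors: dict[str, np.ndarray], path: str) -> None:
    """Write body + JSON footer index + 8-byte footer length (toy layout)."""
    index, body = {}, bytearray()
    for name, arr in tensors.items():
        raw = np.ascontiguousarray(arr).tobytes()
        # Byte-plane split: transpose (n, itemsize) so plane k is contiguous.
        grid = np.frombuffer(raw, np.uint8).reshape(-1, arr.dtype.itemsize)
        blob = zlib.compress(grid.T.tobytes(), 9)
        index[name] = {
            "dtype": arr.dtype.str, "shape": list(arr.shape),
            "offset": len(body), "length": len(blob),
            "sha256": hashlib.sha256(raw).hexdigest(),
        }
        body += blob
    footer = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(body)
        f.write(footer)
        f.write(struct.pack("<Q", len(footer)))


def toy_load_tensor(path: str, name: str) -> np.ndarray:
    """Read one tensor: seek, decompress only its bytes, verify its hash."""
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # last 8 bytes hold the footer length
        (footer_len,) = struct.unpack("<Q", f.read(8))
        f.seek(-8 - footer_len, os.SEEK_END)
        entry = json.loads(f.read(footer_len))[name]
        f.seek(entry["offset"])  # the body starts at byte 0 of the file
        blob = f.read(entry["length"])
    dtype = np.dtype(entry["dtype"])
    planes = np.frombuffer(zlib.decompress(blob), np.uint8)
    # Undo the plane split: (itemsize, n) -> (n, itemsize) -> element bytes.
    raw = planes.reshape(dtype.itemsize, -1).T.tobytes()
    if hashlib.sha256(raw).hexdigest() != entry["sha256"]:
        raise ValueError(f"integrity check failed for {name!r}")
    return np.frombuffer(raw, dtype).reshape(entry["shape"])


tensors = {
    "layers.0.wq": np.arange(12, dtype=np.float32).reshape(3, 4),
    "layers.1.wq": np.ones((2, 2), dtype=np.float16),
}
path = os.path.join(tempfile.mkdtemp(), "toy.bin")
toy_pack(tensors, path)

# Only the footer and this tensor's compressed bytes are read from disk.
one = toy_load_tensor(path, "layers.1.wq")
assert one.tobytes() == tensors["layers.1.wq"].tobytes()
```

Putting the index in a footer lets a writer stream tensors into the body without knowing their compressed sizes in advance, which is one plausible reason for that placement.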
A drop-in for safetensors
The API mirrors safetensors.numpy, plus a selective fetch:
import at1_weights as w
# Convert an existing model repo (lossless, addressable):
raw, packed = w.convert_safetensors("model.safetensors", "model.at1w")
# e.g. 166.0 MB -> 114.4 MB
# safetensors-compatible save / load:
w.save_file(tensors, "model.at1w") # {name: np.ndarray} -> .at1w
all_tensors = w.load_file("model.at1w") # -> {name: np.ndarray}, each verified
# The capability xz/gzip cannot offer — fetch ONE tensor, integrity-checked,
# without decompressing the rest:
q = w.load_tensor("model.at1w", "layers.0.attention.wq.weight")
bf16 has no NumPy dtype; pass it as uint16 (its exact bit pattern), and it is stored losslessly.
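For readers unsure what "its exact bit pattern" means, the sketch below shows how a bf16 value maps onto a uint16 using plain NumPy. It assumes you start from float32 and truncate, which is one simple conversion and not necessarily how your framework produces bf16; the helper names are hypothetical. The truncation step can lose precision, but once the uint16 bits exist, storing them as uint16 preserves them exactly.

```python
import numpy as np


def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Keep the top 16 bits of each float32 (truncation, not rounding)."""
    bits32 = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits32 >> 16).astype(np.uint16)


def bf16_bits_to_fp32(bits: np.ndarray) -> np.ndarray:
    """Widen bf16 bit patterns to float32 by zero-filling the low 16 bits."""
    return (bits.astype(np.uint32) << 16).view(np.float32)


# These three values are exactly representable in bf16.
x = np.array([1.0, -2.5, 0.15625], dtype=np.float32)
bits = fp32_to_bf16_bits(x)      # dtype uint16: the array you would store
back = bf16_bits_to_fp32(bits)   # widen again after loading

assert bits.dtype == np.uint16
assert np.array_equal(back, x)
print([hex(b) for b in bits.tolist()])
```

An array like `bits` is what you would place in the `{name: np.ndarray}` dict given to `save_file`; the container only ever sees 16-bit integers and returns the same integers on load.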
Verified results (real transformer weights)
| Format | Smaller than raw | Lossless |
|---|---|---|
| fp32 (BERT-tiny) | ~16% | byte-exact ✓ |
| fp16 (Pythia-70m) | ~25–31% | byte-exact ✓ (94/94 tensors) |
| bf16 (the format large models ship in) | 32–42% | byte-exact ✓ |
Pythia-70m: 166 MB → 114 MB (31% smaller), byte-exact on all 94 tensors; a single tensor was fetched without decompressing the rest, with integrity verified per tensor. bf16 compresses best because it has the fewest mantissa bits, and those are the near-random ones.
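The headline figure and the mantissa argument can both be checked with simple arithmetic. The snippet below uses the 166.0 MB and 114.4 MB sizes quoted in the API example and the standard IEEE-style bit layouts of the three formats; the mantissa share is offered as intuition for the ordering in the table, not as a model that predicts the measured ratios.

```python
def percent_smaller(raw_mb: float, packed_mb: float) -> float:
    """Size reduction relative to the raw file, in percent."""
    return 100.0 * (1.0 - packed_mb / raw_mb)


# Sizes quoted for Pythia-70m in the conversion example.
pythia = round(percent_smaller(166.0, 114.4), 1)
print(f"Pythia-70m: {pythia}% smaller")

# (total bits, mantissa bits) for each format's standard layout.
layouts = {"fp32": (32, 23), "fp16": (16, 10), "bf16": (16, 7)}
mantissa_share = {
    name: mantissa / total for name, (total, mantissa) in layouts.items()
}
for name, share in mantissa_share.items():
    # A smaller near-random share leaves more redundant bits to remove.
    print(f"{name}: {share:.1%} of each element is mantissa")
```

The ordering of mantissa shares (fp32 highest, bf16 lowest) mirrors the ordering of savings in the table, which is consistent with the explanation given for bf16.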
Billing
Packing a model registers the container as TB under management (archive_id="weights:<file>") on the same 3-axis meter the AT-1 encoder uses — storage + ratio axis on pack, no charge for decompress/read. It’s the same usage rail and quota as AT-1 storage; offline / unlicensed use is a no-op.
The honest boundary
Two things we state up front. First, this is not a dramatically better ratio than xz: it is only ~5–10% better on the byte count. The moat is that the result stays randomly addressable and per-tensor integrity-verified, which xz/gzip on a whole file cannot be. For a purely cold model where you only want max ratio, plain xz is fine and we won’t pitch there. Second, you cannot run inference on the compressed bytes; a tensor is decompressed on load. The benefit is storage, download bandwidth, selective partial loading, and integrity, not compute.
See the addressable-weights overview and the core AT-1 format spec. To measure it on your own registry, request a measurement.