Docs · patent pending

Addressable weight compression: format & results

The .at1w container stores a model’s tensors compressed losslessly, behind an index that keeps every tensor randomly addressable and integrity-verified. That’s the property a general compressor (gzip/xz on a whole file) structurally cannot offer: to read any tensor from a single compressed stream you must first decompress everything that precedes it. This page documents the format, the drop-in API, the verified results, billing, and the honest boundary.

What the format does

For each tensor, the bytes of every element are split into byte planes (the k-th byte of every element grouped together — a structure-of-arrays transform), and each plane is entropy-coded independently. For floating-point weights the exponent plane is highly redundant and compresses strongly, while mantissa planes are near-random; coding the planes separately beats coding the interleaved bytes. Each compressed tensor is concatenated into a body, and a footer index records, per tensor, its name, dtype, shape, offset, compressed length, element count and a per-tensor SHA-256, with a whole-body hash.
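The byte-plane transform described above can be sketched in a few lines of NumPy. This is an illustration only, not the .at1w codec: zlib stands in for the entropy coder, the helper names (split_byte_planes, merge_byte_planes, compress_planes, decompress_planes) are made up for this example, and the weights are synthetic Gaussian values rather than a real checkpoint.

```python
import zlib
import numpy as np

def split_byte_planes(arr):
    """Group the k-th byte of every element together (structure-of-arrays)."""
    itemsize = arr.dtype.itemsize
    raw = np.frombuffer(np.ascontiguousarray(arr).tobytes(), dtype=np.uint8)
    raw = raw.reshape(-1, itemsize)          # one row per element
    return [raw[:, k].tobytes() for k in range(itemsize)]

def merge_byte_planes(planes, dtype, shape):
    """Inverse transform: re-interleave the planes into whole elements."""
    cols = [np.frombuffer(p, dtype=np.uint8) for p in planes]
    interleaved = np.stack(cols, axis=1).tobytes()
    return np.frombuffer(interleaved, dtype=dtype).reshape(shape)

def compress_planes(arr):
    # zlib is a stand-in here; the real format's entropy coder is not shown.
    return [zlib.compress(p, 9) for p in split_byte_planes(arr)]

def decompress_planes(blobs, dtype, shape):
    planes = [zlib.decompress(b) for b in blobs]
    return merge_byte_planes(planes, dtype, shape)

# Synthetic fp32 "weights" (assumption: small Gaussian values).
rng = np.random.default_rng(0)
weights = (rng.standard_normal((256, 64)) * 0.02).astype(np.float32)

blobs = compress_planes(weights)
restored = decompress_planes(blobs, weights.dtype, weights.shape)
assert restored.tobytes() == weights.tobytes()   # lossless round trip

planar_size = sum(len(b) for b in blobs)
interleaved_size = len(zlib.compress(weights.tobytes(), 9))
print("raw:", weights.nbytes, "planar:", planar_size, "interleaved:", interleaved_size)
```

On a little-endian machine the last plane of an fp32 tensor holds the sign and most of the exponent bits, which is where most of the saving should come from; the printed sizes let you compare per-plane coding against coding the interleaved bytes on your own data.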

A reader looks up a named tensor in the footer index, seeks to its offset, reads only its compressed bytes, decodes them, reverses the byte-plane transform, and verifies the hash. That is random access without touching any other tensor, and it’s how one layer, one mixture-of-experts expert, or a single LoRA adapter is loaded selectively.
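The read path can be illustrated with a toy container. Everything about the layout below is an assumption made for the example and not the .at1w on-disk format: the footer is JSON followed by an 8-byte little-endian length, zlib replaces the byte-plane codec, the SHA-256 is taken over the uncompressed bytes, and write_toy_container and read_one_tensor are hypothetical names.

```python
import hashlib
import io
import json
import struct
import zlib
import numpy as np

def write_toy_container(tensors):
    """Body of per-tensor blobs, then a JSON footer index, then its length."""
    body = io.BytesIO()
    index = {}
    for name, arr in tensors.items():
        raw = np.ascontiguousarray(arr).tobytes()
        blob = zlib.compress(raw)
        index[name] = {
            "dtype": arr.dtype.str,
            "shape": list(arr.shape),
            "offset": body.tell(),
            "length": len(blob),
            "sha256": hashlib.sha256(raw).hexdigest(),
        }
        body.write(blob)
    footer = json.dumps(index).encode("utf-8")
    return body.getvalue() + footer + struct.pack("<Q", len(footer))

def read_one_tensor(f, name):
    """Read a single tensor from a seekable binary file object."""
    f.seek(-8, io.SEEK_END)
    (footer_len,) = struct.unpack("<Q", f.read(8))
    f.seek(-(8 + footer_len), io.SEEK_END)
    entry = json.loads(f.read(footer_len))[name]

    f.seek(entry["offset"])                  # jump straight to this tensor
    raw = zlib.decompress(f.read(entry["length"]))
    if hashlib.sha256(raw).hexdigest() != entry["sha256"]:
        raise ValueError(f"integrity check failed for {name!r}")
    return np.frombuffer(raw, dtype=entry["dtype"]).reshape(entry["shape"])

demo = {
    "layers.0.wq": np.arange(12, dtype=np.float32).reshape(3, 4),
    "layers.1.wq": np.ones((2, 2), dtype=np.uint16),
}
container = write_toy_container(demo)
one = read_one_tensor(io.BytesIO(container), "layers.1.wq")
```

Only the footer and the requested blob are read, so the cost of fetching one tensor does not grow with the number of other tensors in the body.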

A drop-in for safetensors

The API mirrors safetensors.numpy, plus a selective fetch:

import at1_weights as w

# Convert an existing model repo (lossless, addressable):
raw, packed = w.convert_safetensors("model.safetensors", "model.at1w")
# e.g. 166.0 MB -> 114.4 MB

# safetensors-compatible save / load:
w.save_file(tensors, "model.at1w")     # {name: np.ndarray} -> .at1w
all_tensors = w.load_file("model.at1w")  # -> {name: np.ndarray}, each verified

# The capability xz/gzip cannot offer — fetch ONE tensor, integrity-checked,
# without decompressing the rest:
q = w.load_tensor("model.at1w", "layers.0.attention.wq.weight")

bf16 has no numpy dtype; pass it as uint16 (its exact bit pattern) — stored losslessly.
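To make the uint16 convention concrete, here is a small NumPy sketch of how bf16 bit patterns relate to float32. The helper names are hypothetical and not part of at1_weights. The float32-to-bf16 direction below uses plain truncation just to produce sample bit patterns; frameworks typically round instead, so treat it as an assumption of the example.

```python
import numpy as np

def float32_to_bf16_bits(x):
    """Keep the top 16 bits of each float32 (truncation, for illustration)."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return np.right_shift(bits, 16).astype(np.uint16)

def bf16_bits_to_float32(b):
    """Widen bf16 bit patterns to float32; this direction is exact."""
    wide = np.left_shift(b.astype(np.uint32), 16)
    return wide.view(np.float32)

values = np.array([1.0, -2.0, 0.5], dtype=np.float32)
bf16_bits = float32_to_bf16_bits(values)     # dtype uint16: what you would store
widened = bf16_bits_to_float32(bf16_bits)    # what you would compute with after load
```

Because the container only ever sees the uint16 array, the stored bytes are the bf16 bit pattern itself, and widening to float32 after load adds sixteen zero bits without changing any value.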

Verified results (real transformer weights)

Format	Smaller than raw	Lossless
fp32 (BERT-tiny)	~16%	byte-exact ✓
fp16 (Pythia-70m)	~25–31%	byte-exact ✓ (94/94 tensors)
bf16 (the format large models ship in)	32–42%	byte-exact ✓

Pythia-70m: 166 MB → 114 MB (31% smaller), byte-exact on all 94 tensors. A single tensor can be fetched without decompressing the rest, with integrity verified per tensor. bf16 compresses best because the smallest share of its bits is near-random mantissa: 7 of 16 bits, versus 10 of 16 for fp16 and 23 of 32 for fp32.
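For reference, the "smaller than raw" figure is one minus the packed size over the raw size. The short check below applies that to the 166.0 MB and 114.4 MB sizes quoted in the API example; percent_smaller is a helper name invented for this sketch.

```python
def percent_smaller(raw_mb, packed_mb):
    """Size reduction relative to the raw file, in percent."""
    return (1.0 - packed_mb / raw_mb) * 100.0

pythia = percent_smaller(166.0, 114.4)
print(f"{pythia:.1f}% smaller")  # prints "31.1% smaller"
```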

Billing

Packing a model registers the container as TB under management (archive_id="weights:<file>") on the same 3-axis meter the AT-1 encoder uses — storage + ratio axis on pack, no charge for decompress/read. It’s the same usage rail and quota as AT-1 storage; offline / unlicensed use is a no-op.

The honest boundary

Two things we state up front. First, the ratio is not the selling point: it is only ~5–10% better than xz on the byte count. The moat is that the result stays randomly addressable and per-tensor integrity-verified, which xz/gzip on a whole file cannot be. For a purely cold model where you only want max ratio, plain xz is fine and we won’t pitch there. Second, you cannot run inference on the compressed bytes; a tensor is decompressed on load. The benefit is storage, download bandwidth, selective partial loading, and integrity, not compute.

See the addressable-weights overview and the core AT-1 format spec. To measure it on your own registry, request a measurement.