AT-1 Vectors

A compressed vector store that proves the recall it keeps.

Vector search is RAM-bound — fp32 embeddings are expensive to keep hot. Quantizing to int8 makes the index 4x smaller, but it quietly degrades search quality and nobody tells you by how much. AT-1 Vectors compresses the index and measures the resulting recall@k against the exact neighbours, then stamps that number into the file. You pick the footprint; we tell you, provably, the recall you keep.

4x smaller hot index

The int8 scan tier is 4x smaller than fp32, so 4x more vectors stay hot in the same RAM. The expensive part of a vector DB is the memory; this is where the bill lives.
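The 4x figure is plain arithmetic on component width: fp32 stores 4 bytes per component and int8 stores 1. A minimal sketch of that calculation, using the 40000 x 384 shape from the pack example on this page; the helper name `index_bytes` is made up for illustration, and headers and per-vector metadata are ignored.

```python
def index_bytes(n_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Raw size of a dense vector matrix, ignoring headers and metadata."""
    return n_vectors * dim * bytes_per_component

n, dim = 40_000, 384            # shape from the pack example on this page
fp32 = index_bytes(n, dim, 4)   # float32: 4 bytes per component
int8 = index_bytes(n, dim, 1)   # int8: 1 byte per component

print(fp32, int8, fp32 / int8)  # 61440000 15360000 4.0
```

The int8 result matches the ~15,360,000B hot index that `at1 vector info` reports for a store of that shape.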

A measured recall certificate

At pack time we measure recall@k on held-out queries versus the EXACT fp32 neighbours and stamp the number into the container. You never have to guess what quantization cost you.
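For readers who want the metric spelled out: recall@k is commonly computed as the fraction of each query's exact top-k that also appears in the approximate top-k, averaged over queries. The sketch below is a generic illustration, not AT-1's internal code; the function name `recall_at_k` and the toy ID lists are assumptions made for the example.

```python
def recall_at_k(exact_ids: list[list[int]], approx_ids: list[list[int]], k: int) -> float:
    """Mean fraction of each query's exact top-k found in the approximate top-k."""
    hits = 0
    for exact, approx in zip(exact_ids, approx_ids):
        hits += len(set(exact[:k]) & set(approx[:k]))
    return hits / (k * len(exact_ids))

# Two toy queries: neighbour IDs from an exact fp32 search and from a quantized search.
exact = [[3, 7, 1], [4, 0, 9]]
approx = [[3, 1, 8], [4, 0, 9]]

# 5 of the 6 exact neighbours are recovered across both queries.
recall = recall_at_k(exact, approx, k=3)
print(recall)
```

Order within the top-k does not matter for this metric; only membership does.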

Certified-100% option

The `exact` level keeps a full-precision re-rank tier, so a fast int8 scan returns candidates and an fp32 re-rank restores the exact top-k — recall@10 = 1.00, certified.

Re-checkable, not a claim

`verify` re-decodes the container and re-measures recall against the stored high-fidelity tier, then checks it against the stamped certificate. The promise is testable any time.

Measured, not promised

On real MiniLM embeddings (384-dim, a 20-newsgroups corpus), int8 quantization gives a 4.0x smaller index at 99.4% recall@10 — and our certificate says so. A more aggressive int4 hit only 91.9% recall, so the certificate rejects it rather than ship a silent quality loss. For workloads that need every neighbour, the exact level adds an fp32 re-rank tier and certifies recall@10 = 1.00, while the hot scan tier stays 4x smaller.

| Level | Hot RAM index | Certified recall@10 | What it's for |
|---|---|---|---|
| exact | 4.0x smaller | 1.00 | Every-neighbour workloads; the recall is certified-100% |
| balanced | 4.0x smaller | ~1.00 | fp16 re-rank: near-exact recall at a smaller stored size |
| aggressive | 4.0x smaller | ~0.97 | int8-only, smallest on disk; for when losing a few percent of recall is acceptable |
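Why narrower integers cost more recall can be seen from the quantization step size. This page does not say which quantization scheme AT-1 uses, so the sketch below assumes plain symmetric scalar quantization with one scale per vector; `quantize`, `dequantize`, and the toy vector are illustrative names and data, not part of the product.

```python
def quantize(vec: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric scalar quantization to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8, 7 for int4
    scale = max(abs(x) for x in vec) / qmax or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in vec], scale

def dequantize(codes: list[int], scale: float) -> list[float]:
    return [c * scale for c in codes]

vec = [0.12, -0.80, 0.33, 0.05]
for bits in (8, 4):
    codes, scale = quantize(vec, bits)
    # Worst-case reconstruction error is bounded by half a quantization step.
    err = max(abs(a - b) for a, b in zip(vec, dequantize(codes, scale)))
    print(bits, codes, err)
```

With 4 bits the step is roughly 18 times coarser than with 8 bits (127 levels versus 7 on each side of zero), so small components collapse to zero and nearby vectors become harder to tell apart.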

Pack once — the certificate is in the file

Packing measures recall@k on a held-out query sample versus the exact fp32 neighbours and writes it into the container header. The number travels with the data.

# pack an embedding matrix; we MEASURE recall@10 and stamp a certificate
at1 vector pack embeddings.npy --out index.at1x --level exact
#  packed (40000, 384) -> index.at1x

at1 vector info index.at1x
#  AT-1 vector store: 40000 x 384  level=exact  rerank=float32
#    recall@10 certificate: 1.0   (measured on 200 held-out queries)
#    hot int8 index ~15,360,000B (4.0x smaller RAM)
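One way to picture a certificate that travels with the data is a length-prefixed header written ahead of the payload. The sketch below is purely hypothetical: it is not the `.at1x` layout, and `write_store`, `read_certificate`, and the JSON field names are assumptions chosen for the example.

```python
import io
import json
import struct

def write_store(buf, certificate: dict, payload: bytes) -> None:
    """Hypothetical layout: 4-byte header length, JSON header, then raw payload."""
    header = json.dumps(certificate).encode("utf-8")
    buf.write(struct.pack("<I", len(header)))  # little-endian uint32 length prefix
    buf.write(header)
    buf.write(payload)

def read_certificate(buf) -> dict:
    """Read only the header, without touching the vector payload."""
    (length,) = struct.unpack("<I", buf.read(4))
    return json.loads(buf.read(length))

buf = io.BytesIO()
write_store(buf, {"k": 10, "recall_at_k": 1.0, "n_queries": 200}, b"\x00" * 16)
buf.seek(0)
print(read_certificate(buf))
```

Because the certificate sits in the header, a reader can inspect it without decoding any vectors, which is the property the `info` command above relies on.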

Re-check the promise any time

`verify` re-decodes the store and re-measures recall against the highest-fidelity tier it carries, then compares it to the stamped certificate. The guarantee is testable, not a marketing number.

at1 vector verify index.at1x
#  { "level": "exact", "k": 10,
#    "stamped_recall_at_k": 1.0,
#    "remeasured_recall_at_k": 1.0,
#    "VERDICT": "PASS" }
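The comparison step can be pictured as a simple check of the re-measured number against the stamped one. This is a sketch under assumptions, not the tool's actual logic: the function name `check_certificate`, the `tolerance` parameter, and the `"FAIL"` verdict string are invented here (only `"PASS"` appears in the output above).

```python
def check_certificate(stamped: float, remeasured: float, tolerance: float = 0.0) -> dict:
    """Pass when the re-measured recall is at least the stamped value, within a tolerance."""
    ok = remeasured + tolerance >= stamped
    return {
        "stamped_recall_at_k": stamped,
        "remeasured_recall_at_k": remeasured,
        "VERDICT": "PASS" if ok else "FAIL",  # "FAIL" is an assumed label
    }

print(check_certificate(1.0, 1.0))
print(check_certificate(1.0, 0.97))
```

A non-zero tolerance would only make sense if the re-measurement used a different query sample from the one used at pack time; whether AT-1 does that is not stated here.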

Search is the standard two-stage recovery

A fast int8 scan returns a wide candidate set; a re-rank against the kept high-precision tier restores the true ordering. It's the well-known trick — the new part is that we measured and certified what it recovers, per dataset.

# two-stage search: fast int8 scan -> fp32 re-rank of the candidates
at1 vector search index.at1x queries.npy -k 10
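For readers new to the pattern, here is a minimal sketch of a two-stage search in plain Python. It assumes inner-product similarity and a single shared quantization scale for the whole matrix, neither of which this page specifies; `quantize_rows`, `two_stage_search`, and `n_candidates` are illustrative names, not the AT-1 API.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize_rows(rows, qmax=127):
    # Assumption: one shared scale, so int8 scores track the float scores
    # up to rounding error and can be compared across vectors.
    scale = max(abs(x) for row in rows for x in row) / qmax
    return [[round(x / scale) for x in row] for row in rows]

def two_stage_search(query, rows_fp32, rows_int8, k, n_candidates):
    ids = range(len(rows_int8))
    # Stage 1: cheap scan over the int8 codes, keeping a wide candidate set.
    coarse = heapq.nlargest(n_candidates, ids, key=lambda i: dot(query, rows_int8[i]))
    # Stage 2: re-score only those candidates against the full-precision vectors.
    return heapq.nlargest(k, coarse, key=lambda i: dot(query, rows_fp32[i]))

rows = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7], [-0.5, 0.2], [0.6, 0.65]]
codes = quantize_rows(rows)
print(two_stage_search([1.0, 1.0], rows, codes, k=2, n_candidates=3))  # [2, 4]
```

The re-rank can only restore neighbours that survive stage 1, so the width of the candidate set is what trades scan cost against recall; that is the quantity a measured certificate pins down.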

How it's different

Compared on three criteria: 4x smaller index, certified recall, re-checkable.

| Approach | How it compares |
|---|---|
| fp32 vector store | Full quality, but the whole index is heavy in RAM; no compression |
| int8 quantized (typical) | 4x smaller, but the recall hit is silent; nobody tells you what it cost |
| PQ / OPQ codebooks | Small and fast, but recall is approximate and tuned, not measured-and-stamped per dataset (partial) |
| AT-1 Vectors | 4x smaller hot index, a stamped recall certificate, and a certified-100% re-rank tier |

Quantization and two-stage re-rank are standard techniques; AT-1 Vectors does not claim to have invented them. What's new is shipping a measured recall certificate inside the compressed store, plus a certified-100% level, so you can size the index against a number you can re-check, not a vendor's promise.

Built for

RAG retrieval indexes · semantic search · recommendation candidate stores · dedup / near-duplicate search — anywhere you keep millions of embeddings hot and want to cut the RAM bill without silently losing recall.

Searching and verifying a store needs no account; packing an index is metered against a connected account.