A compressed vector store that proves the recall it keeps.
Vector search is RAM-bound — fp32 embeddings are expensive to keep hot. Quantizing to int8 makes the index 4x smaller, but it quietly degrades search quality and nobody tells you by how much. AT-1 Vectors compresses the index and measures the resulting recall@k against the exact neighbours, then stamps that number into the file. You pick the footprint; we tell you, provably, the recall you keep.
4x smaller hot index
The int8 scan tier is 4x smaller than fp32, so 4x more vectors stay hot in the same RAM. The expensive part of a vector DB is the memory; this is where the bill lives.
A measured recall certificate
At pack time we measure recall@k on held-out queries versus the EXACT fp32 neighbours and stamp the number into the container. You never have to guess what quantization cost you.
Certified-100% option
The `exact` level keeps a full-precision re-rank tier, so a fast int8 scan returns candidates and an fp32 re-rank restores the exact top-k — recall@10 = 1.00, certified.
Re-checkable, not a claim
`verify` re-decodes the container and re-measures recall against the stored high-fidelity tier, then checks it against the stamped certificate. The promise is testable any time.
Measured, not promised
On real MiniLM embeddings (384-dim, a 20-newsgroups corpus), int8 quantization gives a 4.0x smaller index at 99.4% recall@10 — and our certificate says so. A more aggressive int4 hit only 91.9% recall, so the certificate rejects it rather than ship a silent quality loss. For workloads that need every neighbour, the exact level adds an fp32 re-rank tier and certifies recall@10 = 1.00, while the hot scan tier stays 4x smaller.
| Level | Hot RAM index | Certified recall@10 | What it's for |
|---|---|---|---|
| exact | 4.0x smaller | 1.00 | every-neighbour workloads — the recall is certified-100% |
| balanced | 4.0x smaller | ~1.00 | fp16 re-rank — near-exact recall at a smaller stored size |
| aggressive | 4.0x smaller | ~0.97 | int8-only, smallest on disk — when a few percent recall is acceptable |
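The 4.0x figure in the Hot RAM index column follows directly from component width: fp32 stores 4 bytes per dimension and int8 stores 1. A minimal sketch of that arithmetic, assuming a dense matrix and ignoring container headers and any per-vector scale factors (the `index_bytes` helper and the one-million-vector corpus are hypothetical, for illustration only):

```python
def index_bytes(num_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Raw size of a dense vector matrix, ignoring headers and quantization metadata."""
    return num_vectors * dim * bytes_per_component

num_vectors, dim = 1_000_000, 384  # hypothetical corpus size; 384-dim as in the MiniLM test

fp32_bytes = index_bytes(num_vectors, dim, 4)  # float32: 4 bytes per component
int8_bytes = index_bytes(num_vectors, dim, 1)  # int8: 1 byte per component

print(fp32_bytes, int8_bytes, fp32_bytes / int8_bytes)
# prints: 1536000000 384000000 4.0
```

The same function gives 15,360,000 bytes for a 40000 x 384 int8 tier.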
Pack once — the certificate is in the file
Packing measures recall@k on a held-out query sample versus the exact fp32 neighbours and writes it into the container header. The number travels with the data.
# pack an embedding matrix; we MEASURE recall@10 and stamp a certificate
at1 vector pack embeddings.npy --out index.at1x --level exact
# packed (40000, 384) -> index.at1x
at1 vector info index.at1x
# AT-1 vector store: 40000 x 384 level=exact rerank=float32
# recall@10 certificate: 1.0 (measured on 200 held-out queries)
# hot int8 index ~15,360,000B (4.0x smaller RAM)
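The kind of measurement described here can be sketched in a few lines of plain Python. This is an illustration of recall@k against exact neighbours, not the AT-1 implementation: the symmetric per-vector int8 scheme, the dot-product similarity, the random data, and every function name below are assumptions.

```python
import random

def quantize_int8(vec):
    """Symmetric per-vector int8 quantization (an assumed scheme): returns (codes, scale)."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in vec], scale

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, vectors, k):
    """Indices of the k vectors with the highest dot product to the query (brute force)."""
    scores = [dot(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: -scores[i])[:k]

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

rng = random.Random(0)
dim, k = 32, 10
corpus = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]  # stand-in held-out queries

# Dequantized view of the corpus: what an int8-only scan effectively searches.
dequantized = []
for vec in corpus:
    codes, scale = quantize_int8(vec)
    dequantized.append([c * scale for c in codes])

# Compare the int8 neighbours with the exact fp32 neighbours, query by query.
recalls = [recall_at_k(top_k(q, corpus, k), top_k(q, dequantized, k)) for q in queries]
measured = sum(recalls) / len(recalls)
print(f"measured recall@{k}: {measured:.3f}")
```

The printed value depends on the data, which is the point of measuring it per dataset rather than quoting a generic figure.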
Re-check the promise any time
verify re-decodes the store and re-measures recall against the highest-fidelity tier it carries, then compares it to the stamped certificate. The guarantee is testable, not a marketing number.
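The comparison step can be sketched as below. The report fields mirror the `verify` output shown on this page; the pass rule (re-measured recall must not fall below the stamped value, within a small tolerance), the `FAIL` verdict string, and the function name are assumptions, since the source only shows a passing result.

```python
import json

def verify_certificate(stamped: dict, remeasured_recall: float, tolerance: float = 1e-6) -> dict:
    """Compare a re-measured recall@k with the stamped value and report a verdict."""
    # Assumption: PASS means the re-measured recall is not below the stamped value.
    passed = remeasured_recall >= stamped["stamped_recall_at_k"] - tolerance
    return {
        "level": stamped["level"],
        "k": stamped["k"],
        "stamped_recall_at_k": stamped["stamped_recall_at_k"],
        "remeasured_recall_at_k": remeasured_recall,
        "VERDICT": "PASS" if passed else "FAIL",  # "FAIL" is a hypothetical label
    }

certificate = {"level": "exact", "k": 10, "stamped_recall_at_k": 1.0}
report = verify_certificate(certificate, remeasured_recall=1.0)
print(json.dumps(report, indent=2))
```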
at1 vector verify index.at1x
# { "level": "exact", "k": 10,
# "stamped_recall_at_k": 1.0,
# "remeasured_recall_at_k": 1.0,
# "VERDICT": "PASS" }
Search is the standard two-stage recovery
A fast int8 scan returns a wide candidate set; a re-rank against the kept high-precision tier restores the true ordering. It's the well-known trick — the new part is that we measured and certified what it recovers, per dataset.
# two-stage search: fast int8 scan -> fp32 re-rank of the candidates
at1 vector search index.at1x queries.npy -k 10
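A minimal sketch of the two-stage pattern, assuming dot-product similarity, symmetric per-vector int8 codes, and a hypothetical `oversample` factor that sets how wide the candidate set is. None of these names or defaults come from AT-1 itself.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize_int8(vec):
    """Symmetric per-vector int8 quantization (an assumed scheme): returns (codes, scale)."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0
    return [round(x / scale) for x in vec], scale

def two_stage_search(query, int8_tier, fp32_tier, k, oversample=4):
    # Stage 1: approximate scores from the int8 codes; keep a wide candidate set.
    approx = [scale * dot(query, codes) for codes, scale in int8_tier]
    n_candidates = min(len(int8_tier), k * oversample)
    candidates = sorted(range(len(approx)), key=lambda i: -approx[i])[:n_candidates]
    # Stage 2: re-score only the candidates against the full-precision tier.
    candidates.sort(key=lambda i: -dot(query, fp32_tier[i]))
    return candidates[:k]

rng = random.Random(1)
dim, k = 32, 10
fp32_tier = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
int8_tier = [quantize_int8(v) for v in fp32_tier]
query = [rng.gauss(0, 1) for _ in range(dim)]

# Exact top-k by brute force over fp32, for comparison with the two-stage result.
exact = sorted(range(len(fp32_tier)), key=lambda i: -dot(query, fp32_tier[i]))[:k]
result = two_stage_search(query, int8_tier, fp32_tier, k)
print("overlap with exact top-k:", len(set(exact) & set(result)) / k)
```

The re-rank can only restore neighbours that survive stage 1, so the width of the candidate set is what decides whether the exact top-k is recovered. That is why the recovered recall is worth measuring rather than assuming.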
How it's different
| Approach | 4x smaller index | Certified recall | Re-checkable |
|---|---|---|---|
| fp32 vector store: full quality, but the whole index is heavy in RAM (no compression) | | | |
| int8 quantized (typical): 4x smaller, but the recall hit is silent; nobody tells you what it cost | | | |
| PQ / OPQ codebooks: small and fast, but recall is approximate and tuned, not measured-and-stamped per dataset | partial | | |
| AT-1 Vectors: 4x smaller hot index, a stamped recall certificate, and a certified-100% re-rank tier | | | |
Quantization and two-stage re-rank are standard techniques, and AT-1 Vectors does not claim to have invented them. What's new is shipping a measured recall certificate inside the compressed store and a certified-100% level, so you can size the index against a number you can re-check, not a vendor's promise.
Built for
RAG retrieval indexes · semantic search · recommendation candidate stores · dedup / near-duplicate search — anywhere you keep millions of embeddings hot and want to cut the RAM bill without silently losing recall.
Searching and verifying a store needs no account; packing an index is metered against a connected account.