AT-1 Vectors
A vector search index is RAM-bound: fp32 embeddings are expensive to keep hot. Quantizing to int8 makes the index 4x smaller, but naive quantization quietly degrades search quality, and nothing in the index tells you by how much. AT-1 Vectors compresses the index and, at pack time, measures the resulting recall@k on a held-out query sample against the exact fp32 neighbours, then stamps that number into the container as a certificate. You choose the footprint; the certificate tells you, as a measured and re-checkable number, what it costs in search quality.
Pack — measure and stamp the certificate
Packing quantizes the index to an int8 scan tier and (for balanced / exact) keeps a higher-precision re-rank tier. It then measures recall@k against the exact neighbours on a held-out sample and writes the number into the header, so the certificate travels with the data.
at1 vector pack embeddings.npy --out index.at1x --level exact
# pack measures recall@10 on held-out queries vs the EXACT fp32 neighbours
# and stamps that number into the container header. levels:
#   exact       int8 hot index + fp32 re-rank -> recall@10 = 1.00 (certified-100%)
#   balanced    int8 hot index + fp16 re-rank -> recall@10 ~ 1.00
#   aggressive  int8 only                     -> recall@10 ~ 0.97, smallest on disk
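The measurement behind the certificate can be sketched in a few lines. What follows is a minimal pure-Python illustration, not AT-1's implementation: it assumes symmetric scalar quantization with a single global scale and dot-product similarity, and the names `quantize_int8`, `top_k`, `recall_at_k`, and `measure_recall` are hypothetical. Each held-out query is ranked twice, once against the original float vectors and once against the int8 codes, and the overlap of the two top-k sets is averaged.

```python
import random

def quantize_int8(vectors):
    """Symmetric scalar quantization with one global scale (an assumption,
    not necessarily the scheme AT-1 uses)."""
    max_abs = max(abs(x) for v in vectors for x in v) or 1.0
    scale = max_abs / 127.0
    codes = [[max(-127, min(127, round(x / scale))) for x in v] for v in vectors]
    return codes, scale

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, vectors, k):
    """Indices of the k vectors with the largest dot product with query."""
    scores = [dot(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: -scores[i])[:k]

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def measure_recall(vectors, queries, k=10):
    """Average recall@k of an int8-only scan against the exact neighbours."""
    codes, _scale = quantize_int8(vectors)
    # The scale is a positive constant, so ranking by the raw int8 codes
    # gives the same order as ranking by the dequantized values.
    total = 0.0
    for q in queries:
        exact = top_k(q, vectors, k)   # ground truth from the float vectors
        approx = top_k(q, codes, k)    # what the quantized scan returns
        total += recall_at_k(exact, approx)
    return total / len(queries)

# Synthetic stand-in data; a real run would use held-out queries from the corpus.
random.seed(0)
dim = 16
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
certificate = measure_recall(vectors, queries, k=10)
print(f"recall@10 = {certificate:.3f}")
```

The number depends on the data and the quantization scheme, which is why it has to be measured per dataset rather than quoted once.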
The certificate, measured on real embeddings
On real MiniLM embeddings (384-dim, a 20-newsgroups corpus), int8 quantization gave a 4.0x smaller index at 99.4% recall@10. A more aggressive int4 hit only 91.9% — so the certificate rejects it rather than ship a silent quality loss. For workloads that need every neighbour, the exact level adds an fp32 re-rank tier and certifies recall@10 = 1.00 while the hot scan tier stays 4x smaller. The mechanism — scalar quantization plus a two-stage re-rank — is standard; the new part is the measured, re-checkable certificate per dataset.
at1 vector info index.at1x
# AT-1 vector store: 40000 x 384 level=exact rerank=float32
# recall@10 certificate: 1.0 (measured on 200 held-out queries)
# stored ... vs fp32 ... (disk ratio)
# hot int8 index ~15,360,000B (4.0x smaller RAM)
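The 4.0x figure for the hot tier follows directly from the bytes stored per value. Here is a quick check of the arithmetic for the 40000 x 384 store shown above, counting raw vector payload only. Ignoring headers and any quantization parameters is a simplifying assumption, and `index_bytes` is an illustrative helper, not part of the tool.

```python
def index_bytes(num_vectors, dim, bytes_per_value):
    """Raw payload size of a dense vector table, ignoring any metadata."""
    return num_vectors * dim * bytes_per_value

n, dim = 40000, 384
fp32_bytes = index_bytes(n, dim, 4)   # 4 bytes per fp32 value
int8_bytes = index_bytes(n, dim, 1)   # 1 byte per int8 value

print(f"fp32: {fp32_bytes:,} B")                   # fp32: 61,440,000 B
print(f"int8: {int8_bytes:,} B")                   # int8: 15,360,000 B
print(f"ratio: {fp32_bytes / int8_bytes:.1f}x")    # ratio: 4.0x
```

The ratio applies to the scan tier held in RAM. A level that keeps an fp32 or fp16 re-rank tier will store more in total than the int8 tier alone.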
Verify — re-check the promise
verify re-decodes the store and re-measures recall@k against the highest-fidelity tier in the container, then compares it to the stamped certificate. The guarantee is testable any time — a stored number you can reproduce, not a marketing claim.
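The comparison step can be pictured as follows. This is a hedged sketch only: the report keys mirror the fields the command prints, but the pass rule (re-measured recall must not fall below the stamped value, within a small tolerance), the `FAIL` label, and the names `verify_certificate` and `remeasure` are assumptions made for illustration.

```python
import json

def verify_certificate(header, remeasure, tolerance=1e-9):
    """Re-measure recall@k and compare it with the stamped value.

    `remeasure` is any callable taking k and returning recall@k; here it
    stands in for re-decoding the store and re-running the queries.
    """
    remeasured = remeasure(header["k"])
    ok = remeasured + tolerance >= header["stamped_recall_at_k"]
    return {
        "level": header["level"],
        "k": header["k"],
        "stamped_recall_at_k": header["stamped_recall_at_k"],
        "remeasured_recall_at_k": remeasured,
        "VERDICT": "PASS" if ok else "FAIL",  # "FAIL" is an assumed label
    }

header = {"level": "exact", "k": 10, "stamped_recall_at_k": 1.0}
report = verify_certificate(header, remeasure=lambda k: 1.0)
print(json.dumps(report, indent=2))
```

The point of the design is that the stamped number and the re-measured number come from the same procedure, so anyone holding the container can repeat the check.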
at1 vector verify index.at1x
# re-decodes the store and RE-MEASURES recall@k against the highest-fidelity
# tier it carries, then compares to the stamped certificate:
# { "level": "exact", "k": 10,
# "stamped_recall_at_k": 1.0,
# "remeasured_recall_at_k": 1.0,
# "VERDICT": "PASS" }
Search — the standard two-stage recovery
A fast int8 scan returns a wide candidate set; a re-rank against the kept high-precision tier restores the true ordering. This is the well-known recovery trick — what AT-1 adds is measuring and certifying exactly what it recovers, so you can size the index against a number rather than a guess.
at1 vector search index.at1x queries.npy -k 10
# two-stage: a fast int8 scan returns a wide candidate set,
# then an fp32 (exact) / fp16 (balanced) re-rank restores the true top-k order.
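The two-stage pattern itself is easy to illustrate. Below is a minimal pure-Python sketch under stated assumptions: a brute-force scan over int8 codes, dot-product similarity, a single global quantization scale, and a candidate set of `k * oversample` entries. The `oversample` parameter and the function names are hypothetical and say nothing about how AT-1 sizes its candidate set.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize_int8(vectors):
    """Symmetric scalar quantization with one global scale (an assumption)."""
    max_abs = max(abs(x) for v in vectors for x in v) or 1.0
    scale = max_abs / 127.0
    return [[max(-127, min(127, round(x / scale))) for x in v] for v in vectors]

def two_stage_search(query, codes, full_vectors, k=10, oversample=4):
    """Stage 1: coarse scan over int8 codes keeps k * oversample candidates.
    Stage 2: re-score only those candidates with the high-precision vectors."""
    n_candidates = min(len(codes), k * oversample)
    by_coarse = sorted(range(len(codes)), key=lambda i: -dot(query, codes[i]))
    candidates = by_coarse[:n_candidates]
    reranked = sorted(candidates, key=lambda i: -dot(query, full_vectors[i]))
    return reranked[:k]

# Synthetic stand-in data for illustration only.
random.seed(1)
dim = 16
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
codes = quantize_int8(vectors)
query = [random.gauss(0, 1) for _ in range(dim)]

exact = sorted(range(len(vectors)), key=lambda i: -dot(query, vectors[i]))[:10]
found = two_stage_search(query, codes, vectors, k=10)
print("overlap with exact top-10:", len(set(exact) & set(found)), "of 10")
```

A wider candidate set costs more re-rank work but makes it less likely that a true neighbour is dropped in the coarse stage. The re-rank can only restore the ordering of candidates that survived the scan, which is why the recovered recall has to be measured.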
What it's for
- RAG retrieval indexes — keep millions of chunks hot in less RAM with a known recall floor.
- Semantic search — cut the memory bill without silently losing the right results.
- Recommendation candidate stores — a smaller hot index for the first-stage retrieval.
- Dedup / near-duplicate search — certified-recall nearest-neighbour at a fraction of the footprint.
Searching and verifying a store are free and need no account; packing an index is metered against the connected account — same as the rest of AT-1.