AT-1 Vectors

A vector search index is RAM-bound: fp32 embeddings are expensive to keep hot. Quantizing to int8 makes the index 4x smaller, but naive quantization quietly degrades search quality, and nothing in the index tells you by how much. AT-1 Vectors compresses the index and, at pack time, measures the resulting recall@k on a held-out query sample against the exact fp32 neighbours, then stamps that number into the container as a certificate. You choose the footprint; the certificate tells you, as a measured and re-checkable number, what it costs in search quality.

Pack — measure and stamp the certificate

Packing quantizes the index to an int8 scan tier and, for the balanced and exact levels, keeps a higher-precision re-rank tier. It then measures recall@k against the exact neighbours on a held-out sample and writes the number into the header, so the certificate travels with the data.

at1 vector pack embeddings.npy --out index.at1x --level exact
# pack measures recall@10 on held-out queries vs the EXACT fp32 neighbours
# and stamps that number into the container header. levels:
#   exact       int8 hot index + fp32 re-rank  -> recall@10 = 1.00 (certified-100%)
#   balanced    int8 hot index + fp16 re-rank  -> recall@10 ~ 1.00
#   aggressive  int8 only                       -> recall@10 ~ 0.97, smallest on disk
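
As a rough illustration of what the pack step measures, the sketch below quantizes a small set of random vectors to int8 and computes recall@k against the exact neighbours. It is a minimal pure-Python sketch under stated assumptions, not AT-1's implementation: the single symmetric scale per index, the dot-product metric, the random data and the function names (quantize_int8, top_k, recall_at_k) are all chosen for the example.

```python
import random

def quantize_int8(vectors):
    """Symmetric scalar quantization: map floats to integers in [-127, 127].
    One scale for the whole index is an assumption made for this sketch."""
    max_abs = max(abs(x) for v in vectors for x in v) or 1.0
    scale = max_abs / 127.0
    codes = [[round(x / scale) for x in v] for v in vectors]
    return codes, scale

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, vectors, k):
    """Indices of the k vectors with the largest dot product to the query."""
    order = sorted(range(len(vectors)),
                   key=lambda i: dot(query, vectors[i]), reverse=True)
    return order[:k]

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

rng = random.Random(0)
dim, n, k = 16, 200, 10
base = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]  # held-out sample

codes, scale = quantize_int8(base)

recalls = []
for q in queries:
    exact = top_k(q, base, k)     # neighbours from the original floats
    approx = top_k(q, codes, k)   # the scale is a positive constant, so ranking on codes is enough
    recalls.append(recall_at_k(exact, approx))

measured = sum(recalls) / len(recalls)
print(f"recall@{k} = {measured:.3f}")
```

The averaged value is the kind of number the certificate carries: it depends on the dataset and the query sample, which is why it has to be measured rather than assumed.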

The certificate, measured on real embeddings

On real MiniLM embeddings (384-dim, a 20-newsgroups corpus), int8 quantization gave a 4.0x smaller index at 99.4% recall@10. A more aggressive int4 variant reached only 91.9% recall@10, so the certificate rejects it rather than ship a silent quality loss. For workloads that need every neighbour, the exact level adds an fp32 re-rank tier and certifies recall@10 = 1.00 while the hot scan tier stays 4x smaller. The mechanism (scalar quantization plus a two-stage re-rank) is standard; the new part is the measured, re-checkable certificate per dataset.

at1 vector info index.at1x
# AT-1 vector store: 40000 x 384  level=exact  rerank=float32
#   recall@10 certificate: 1.0   (measured on 200 held-out queries)
#   stored ... vs fp32 ...        (disk ratio)
#   hot int8 index ~15,360,000B  (4.0x smaller RAM)
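
The hot-index figure in that output follows from the shape alone: one byte per dimension for int8 against four for fp32. The short calculation below reproduces it for the 40000 x 384 store shown. It ignores any header or per-vector metadata, which is an assumption, since the container layout is not described here.

```python
rows, dim = 40_000, 384

int8_bytes = rows * dim * 1   # one byte per dimension
fp32_bytes = rows * dim * 4   # four bytes per dimension

print(int8_bytes)               # 15360000
print(fp32_bytes)               # 61440000
print(fp32_bytes / int8_bytes)  # 4.0
```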

Verify — re-check the promise

The verify command re-decodes the store, re-measures recall@k against the highest-fidelity tier in the container, and compares the result to the stamped certificate. The guarantee is testable at any time: it is a stored number you can reproduce, not a marketing claim.

at1 vector verify index.at1x
# re-decodes the store and RE-MEASURES recall@k against the highest-fidelity
# tier it carries, then compares to the stamped certificate:
# { "level": "exact", "k": 10,
#   "stamped_recall_at_k": 1.0,
#   "remeasured_recall_at_k": 1.0,
#   "VERDICT": "PASS" }
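
The comparison at the end of verify can be pictured as a check of one number against another. The sketch below builds a report with the same keys as the output above. The pass rule (the re-measured recall must not fall below the stamped value by more than a small tolerance), the "FAIL" label and the function name verify_report are assumptions for illustration; the actual rule AT-1 applies is not stated here.

```python
import json

def verify_report(level, k, stamped, remeasured, tolerance=1e-9):
    """Compare a re-measured recall with the stamped one.

    Assumed rule for this sketch: pass when the re-measured recall is not
    lower than the stamped value by more than `tolerance`.
    """
    passed = remeasured >= stamped - tolerance
    return {
        "level": level,
        "k": k,
        "stamped_recall_at_k": stamped,
        "remeasured_recall_at_k": remeasured,
        "VERDICT": "PASS" if passed else "FAIL",  # "FAIL" is a hypothetical label
    }

report = verify_report("exact", 10, stamped=1.0, remeasured=1.0)
print(json.dumps(report, indent=2))
```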

Search — the standard two-stage recovery

A fast int8 scan returns a wide candidate set; a re-rank against the kept high-precision tier restores the true ordering. This is the well-known recovery trick; what AT-1 adds is measuring and certifying exactly what it recovers, so you can size the index against a number rather than a guess.

at1 vector search index.at1x queries.npy -k 10
# two-stage: a fast int8 scan returns a wide candidate set,
# then an fp32 (exact) / fp16 (balanced) re-rank restores the true top-k order.
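
The two stages can be sketched in a few lines. This is a generic pure-Python illustration of scan-then-re-rank, not the AT-1 search code: the dot-product metric, the candidate widening factor of 4, the random data and the function names are assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize_int8(vectors):
    """Symmetric scalar quantization to integers in [-127, 127]."""
    max_abs = max(abs(x) for v in vectors for x in v) or 1.0
    scale = max_abs / 127.0
    return [[round(x / scale) for x in v] for v in vectors]

def two_stage_search(query, codes, full_precision, k, widen=4):
    """Stage 1: scan the int8 codes and keep a wide candidate set.
    Stage 2: re-score only those candidates at full precision."""
    n_candidates = min(len(codes), k * widen)
    coarse = sorted(range(len(codes)),
                    key=lambda i: dot(query, codes[i]), reverse=True)
    candidates = coarse[:n_candidates]
    reranked = sorted(candidates,
                      key=lambda i: dot(query, full_precision[i]), reverse=True)
    return reranked[:k]

rng = random.Random(1)
dim, n, k = 16, 300, 10
base = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
codes = quantize_int8(base)
query = [rng.gauss(0, 1) for _ in range(dim)]

exact = sorted(range(n), key=lambda i: dot(query, base[i]), reverse=True)[:k]
result = two_stage_search(query, codes, base, k)
print("overlap with exact top-k:", len(set(exact) & set(result)), "of", k)
```

A wider candidate set makes it more likely that every true neighbour survives the first stage, at the cost of more full-precision scoring in the second. That trade-off is what a measured recall number lets you tune.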

What it's for

RAG retrieval indexes — keep millions of chunks hot in less RAM with a known recall floor.
Semantic search — cut the memory bill without silently losing the right results.
Recommendation candidate stores — a smaller hot index for the first-stage retrieval.
Dedup / near-duplicate search — certified-recall nearest-neighbour at a fraction of the footprint.

Searching and verifying a store are free and need no account; packing an index is metered against the connected account — same as the rest of AT-1.