Verified Model Runtime — at1 model
at1 model turns any local model into one self-describing container (weights, config.json, and the tokenizer sealed together in a single .at1m file) and runs inference straight out of it. Every tensor carries a SHA-256 digest; loading fails closed if a single byte was altered, and the server reports a successful check to the client as X-AT1-Integrity: verified.
The engine ships compiled and license-gated, and it runs on your own machine; your model and prompts never leave it. Honest framing: the OpenAI API shape and on-disk weight packing aren't novel on their own; the wedge is the verified + compressed + addressable + self-describing container you run inference from.
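A client can apply the same fail-closed idea to the X-AT1-Integrity header described above. The following is a minimal sketch, not part of at1: the helper name require_verified is hypothetical, and it assumes the response headers are available as a plain mapping, as most HTTP libraries provide.

```python
def require_verified(headers):
    """Fail closed on the client: raise unless X-AT1-Integrity is 'verified'."""
    # HTTP header names are case-insensitive, so normalise keys before lookup.
    normalised = {key.lower(): value.strip() for key, value in headers.items()}
    status = normalised.get("x-at1-integrity")
    if status != "verified":
        raise RuntimeError(f"integrity not confirmed (got {status!r})")
    return status


# Example with a hand-built header mapping standing in for a real response.
ok = require_verified({"Content-Type": "application/json",
                       "X-AT1-Integrity": "verified"})
```

Treating a missing header the same as a failed one keeps the client from silently accepting output from an endpoint that never performed the check.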
Pack & verify
# seal weights + config + tokenizer into ONE verified container
at1 model pack ./my-model --out model.at1m --name my-model
# ./my-model is a HF model dir (or a single .safetensors). Sharded models are supported.
at1 model info model.at1m     # manifest: arch, tensor/aux counts, codec
at1 model verify model.at1m   # re-check every tensor SHA-256 + manifest (fail-closed)
pack streams weights one tensor at a time (peak RAM ~ the largest single tensor), so a model larger than RAM still packs. verify re-reads every tensor, recomputes its SHA-256, and checks the embedded manifest — a tamper anywhere is named and the command exits non-zero.
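The pack and verify behaviour described above can be sketched in a few lines of Python. This is an illustration of the technique only: the toy layout (raw tensor bytes, then a JSON manifest, then an 8-byte length footer) is an assumption and is not the .at1m format, the function names are hypothetical, and the check on the manifest itself is omitted.

```python
import hashlib
import io
import json
import struct


def pack(tensor_iter, out):
    """Write tensors one at a time; only the current tensor is held in memory."""
    manifest = []
    for name, data in tensor_iter:  # generator yields one tensor at a time
        manifest.append({
            "name": name,
            "offset": out.tell(),
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
        out.write(data)
    manifest_bytes = json.dumps(manifest).encode()
    out.write(manifest_bytes)
    out.write(struct.pack("<Q", len(manifest_bytes)))  # footer: manifest length


def verify(f):
    """Re-read every tensor, recompute SHA-256, return the names that mismatch."""
    f.seek(-8, io.SEEK_END)
    (manifest_len,) = struct.unpack("<Q", f.read(8))
    f.seek(-8 - manifest_len, io.SEEK_END)
    manifest = json.loads(f.read(manifest_len))
    bad = []
    for entry in manifest:
        f.seek(entry["offset"])
        actual = hashlib.sha256(f.read(entry["size"])).hexdigest()
        if actual != entry["sha256"]:
            bad.append(entry["name"])
    return bad


def toy_tensors():
    # Stand-ins for real weights, produced lazily.
    yield "embed.weight", bytes(range(256)) * 4
    yield "lm_head.weight", b"\x7f" * 512


container = io.BytesIO()
pack(toy_tensors(), container)
clean = verify(container)  # list of mismatching tensors on the untouched file

raw = bytearray(container.getvalue())
raw[10] ^= 0xFF  # flip one byte inside the first tensor
tampered = verify(io.BytesIO(bytes(raw)))
exit_code = 1 if tampered else 0  # mirrors "exits non-zero" on tamper
```

Because the manifest is built while streaming, peak memory in this sketch is bounded by the largest tensor the generator yields, which is the property the paragraph above attributes to pack.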
Run (one-shot)
# one-shot generation, straight from the verified container
at1 model run model.at1m --prompt "Summarise this release in one line." --max-tokens 128
# -> [at1-model my-model N tensors SHA-256-verified on load] + the completion
Tokenizer and config are read from inside the container (verified), so there's no model-dir to keep alongside the file. Works for instruct and base checkpoints.
Serve (OpenAI-compatible)
# OpenAI-compatible local endpoint
at1 model serve model.at1m --port 11434 --cache-gb 8
# point any OpenAI SDK / IDE at it — no code change:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hello"}]}'
# response carries X-AT1-Integrity: verified and "at1_integrity":"verified"
--cache-gb keeps hot tensors resident (LRU), so generation runs at resident speed while cold tensors still stream; a model bigger than the cache budget still runs. Run at1 login so that managed inference is licensed and metered.
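To make the --cache-gb behaviour concrete, here is a minimal sketch of a byte-budgeted LRU cache in Python. It is an assumption about how such a cache could work, not the engine's implementation: the class name TensorLRU and the load callback (standing in for streaming a tensor out of the container) are hypothetical.

```python
from collections import OrderedDict


class TensorLRU:
    """Byte-budgeted LRU cache; `load` stands in for streaming from disk."""

    def __init__(self, budget_bytes, load):
        self.budget = budget_bytes
        self.load = load
        self.used = 0
        self.items = OrderedDict()  # name -> bytes, least recently used first

    def get(self, name):
        if name in self.items:
            self.items.move_to_end(name)  # hot path: mark as most recent
            return self.items[name]
        data = self.load(name)  # cold path: stream the tensor in
        if len(data) > self.budget:
            return data  # larger than the whole budget: serve it uncached
        while self.used + len(data) > self.budget:
            _, evicted = self.items.popitem(last=False)  # drop least recent
            self.used -= len(evicted)
        self.items[name] = data
        self.used += len(data)
        return data


loads = []


def load(name):
    loads.append(name)  # record every cold read
    return bytes(100)   # each toy tensor is 100 bytes


cache = TensorLRU(budget_bytes=250, load=load)
for name in ["a", "b", "a", "c", "b"]:
    cache.get(name)
```

With a 250-byte budget and 100-byte tensors, only two tensors fit at once, so the third distinct tensor evicts the least recently used one and a later request for the evicted tensor streams it again. The total working set exceeds the budget, yet every request is still served.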
Pricing
pack / info / verify are always free. Managed verified-inference is metered at $0.30 / 1M tokens (prompt + completion) with the first 1M tokens/month free; an air-gapped enterprise self-host deploy with attestation + support is $2,000 / deploy / month. See the product page and pricing.
Enable it from your dashboard → Licensed engines (a card on file is required; usage rolls into your bill).
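For budgeting, the metered rate above can be turned into a quick estimate. This sketch assumes linear billing on prompt plus completion tokens, with the free 1M tokens subtracted once per month and the result rounded to cents; the actual invoice rounding is not stated here, and the function name metered_cost is hypothetical.

```python
from decimal import Decimal

RATE_PER_MILLION = Decimal("0.30")  # USD per 1M tokens, from the pricing above
FREE_TOKENS_PER_MONTH = 1_000_000   # first 1M tokens each month are free


def metered_cost(prompt_tokens, completion_tokens):
    """Estimated monthly charge in USD for managed verified inference."""
    total = prompt_tokens + completion_tokens
    billable = max(0, total - FREE_TOKENS_PER_MONTH)
    cost = RATE_PER_MILLION * billable / 1_000_000
    return cost.quantize(Decimal("0.01"))  # assumed rounding to cents


# 51M tokens in a month -> 50M billable tokens
estimate = metered_cost(prompt_tokens=40_000_000, completion_tokens=11_000_000)
```

Under these assumptions, 51M tokens in a month comes to $15.00, and any month at or below 1M tokens costs nothing.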