Verified Model Runtime — `at1 model`

at1 model turns any local model into one self-describing container (weights, config.json, and the tokenizer sealed together in a single .at1m file) and runs inference straight out of it. Every tensor carries a SHA-256 digest; loading fails closed if a single byte has been altered, and the server reports a successful check to the client as the X-AT1-Integrity: verified response header.

The engine ships compiled and license-gated, and it runs on your own machine, so your model and prompts never leave it. Honest framing: the OpenAI API shape and on-disk weight packing aren't novel on their own; the wedge is the verified + compressed + addressable + self-describing container you run inference from.

Pack & verify

# seal weights + config + tokenizer into ONE verified container
at1 model pack ./my-model --out model.at1m --name my-model
#   ./my-model is a HF model dir (or a single .safetensors). Sharded models are supported.

at1 model info   model.at1m     # manifest: arch, tensor/aux counts, codec
at1 model verify model.at1m     # re-check every tensor SHA-256 + manifest (fail-closed)

pack streams weights one tensor at a time (peak RAM ~ the largest single tensor), so a model larger than RAM still packs. verify re-reads every tensor, recomputes its SHA-256, and checks the embedded manifest — a tamper anywhere is named and the command exits non-zero.
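
To make the fail-closed idea concrete, here is a minimal Python sketch of per-tensor hashing and verification. It is not the at1 implementation: the manifest shape (a plain name-to-digest mapping), the IntegrityError class, and the function names are all assumptions made for illustration. Tensors are consumed one at a time from an iterator, which mirrors the "peak RAM ~ the largest single tensor" behaviour described above.

```python
import hashlib
from typing import Dict, Iterable, Tuple

CHUNK = 1 << 20  # hash in 1 MiB slices (an arbitrary choice for this sketch)


class IntegrityError(Exception):
    """Raised when any tensor fails its check (hypothetical error type)."""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of one tensor's raw bytes."""
    digest = hashlib.sha256()
    view = memoryview(data)
    for start in range(0, len(view), CHUNK):
        digest.update(view[start:start + CHUNK])
    return digest.hexdigest()


def build_manifest(tensors: Iterable[Tuple[str, bytes]]) -> Dict[str, str]:
    """Consume (name, raw_bytes) pairs one at a time and record each digest."""
    return {name: sha256_hex(raw) for name, raw in tensors}


def verify(tensors: Iterable[Tuple[str, bytes]], manifest: Dict[str, str]) -> int:
    """Recompute every digest; raise on the first mismatch, naming the tensor."""
    seen = set()
    for name, raw in tensors:
        expected = manifest.get(name)
        if expected is None:
            raise IntegrityError(f"tensor {name!r} is not listed in the manifest")
        if sha256_hex(raw) != expected:
            raise IntegrityError(f"tensor {name!r} failed its SHA-256 check")
        seen.add(name)
    missing = set(manifest) - seen
    if missing:
        raise IntegrityError(f"tensors missing from container: {sorted(missing)}")
    return len(seen)  # number of tensors verified
```

A caller that wants the CLI-style behaviour would catch IntegrityError, print the message, and exit non-zero.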

Run (one-shot)

# one-shot generation, straight from the verified container
at1 model run model.at1m --prompt "Summarise this release in one line." --max-tokens 128
#   -> [at1-model my-model  N tensors SHA-256-verified on load]  + the completion

Tokenizer and config are read from inside the container (verified), so there's no model-dir to keep alongside the file. Works for instruct and base checkpoints.

Serve (OpenAI-compatible)

# OpenAI-compatible local endpoint
at1 model serve model.at1m --port 11434 --cache-gb 8

# point any OpenAI SDK / IDE at it — no code change:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
#   response carries  X-AT1-Integrity: verified  and  "at1_integrity":"verified"
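
The same request can be sent from Python with only the standard library. The sketch below assumes the server from the serve command above is listening on localhost:11434 and that "at1_integrity" is a top-level field of the JSON body; the helper names are hypothetical. A client that cares about integrity can refuse any response that lacks either marker.

```python
import json
import urllib.request
from typing import Mapping


def build_chat_request(base_url: str, content: str) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    payload = {"messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def integrity_verified(headers: Mapping[str, str], body: dict) -> bool:
    """True only if both the header and the body field say 'verified'.

    Assumes "at1_integrity" sits at the top level of the response JSON.
    """
    header_ok = headers.get("X-AT1-Integrity") == "verified"
    body_ok = body.get("at1_integrity") == "verified"
    return header_ok and body_ok


def chat(base_url: str, content: str) -> dict:
    """Send one message and fail closed if the integrity markers are absent."""
    with urllib.request.urlopen(build_chat_request(base_url, content)) as resp:
        body = json.load(resp)
        if not integrity_verified(resp.headers, body):
            raise RuntimeError("response is missing the verified integrity markers")
        return body
```

With the server running, chat("http://localhost:11434", "hello") returns the parsed response body.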

--cache-gb keeps hot tensors resident (LRU) so generation runs at resident speed while cold tensors still stream, which means a model bigger than the cache budget still runs. Run at1 login first so that managed inference is licensed and metered.
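
As an illustration of a byte-budgeted LRU cache of this kind, here is a small Python sketch. It is not the engine's code: the TensorCache class, its load callback, and the choice to stream oversized tensors through without caching them are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable


class TensorCache:
    """Keep recently used tensors resident, up to a fixed byte budget."""

    def __init__(self, budget_bytes: int, load: Callable[[str], bytes]):
        self.budget = budget_bytes
        self.load = load  # reads one tensor from the container (hypothetical)
        self.resident: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self.resident[name]
        self.misses += 1
        data = self.load(name)  # cold tensor: stream it from disk
        if len(data) > self.budget:
            return data  # larger than the whole budget, so never cached
        while self.used + len(data) > self.budget:
            _, evicted = self.resident.popitem(last=False)  # drop the LRU entry
            self.used -= len(evicted)
        self.resident[name] = data
        self.used += len(data)
        return data
```

Because a miss falls back to load, a model whose tensors exceed the budget still runs; it just pays the streaming cost more often.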

Pricing

pack / info / verify are always free. Managed verified-inference is metered at $0.30 / 1M tokens (prompt + completion) with the first 1M tokens/month free; an air-gapped enterprise self-host deploy with attestation + support is $2,000 / deploy / month. See the product page and pricing.

Enable it from your dashboard → Licensed engines (a card on file is required; usage rolls into your bill).
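
For a rough monthly estimate of the metered tier, the arithmetic can be sketched as follows. This assumes the free 1M tokens are simply subtracted from the month's combined prompt and completion total and that the result is rounded to cents; actual billing rules may differ, and the function name is hypothetical.

```python
from decimal import Decimal

RATE_PER_MILLION = Decimal("0.30")  # USD per 1M tokens, from the pricing above
FREE_TOKENS_PER_MONTH = 1_000_000   # free monthly allowance, from the pricing above


def managed_inference_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate one month's managed-inference charge in USD."""
    total = prompt_tokens + completion_tokens
    billable = max(0, total - FREE_TOKENS_PER_MONTH)
    cost = RATE_PER_MILLION * billable / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))  # rounding to cents is an assumption
```

For example, 3M prompt tokens plus 2M completion tokens leaves 4M billable tokens, so the estimate is 4 × $0.30 = $1.20.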
