Run any model from one verified container.
Pack a local model into a single self-describing .at1mfile — weights, config, and tokenizer sealed together — then run or serve it with every weight SHA-256-verified on load. Smaller, offline, byte-exact, and OpenAI-compatible so your tools don't change.
One self-describing file
Weights + config + tokenizer are sealed into a single .at1m container. Move that one file and the model runs anywhere — no model-dir to ship alongside it, smaller and offline.
Verified on every load
Each tensor carries a SHA-256; loading fails closed if a single byte was altered. The endpoint surfaces it to the client as X-AT1-Integrity: verified, and `verify` re-checks the whole container.
Addressable & streaming
Tensors stream individually — a forward pass reads only what it touches — so a model larger than your cache budget still runs. Byte-exact vs the original weights.
OpenAI-compatible, no IDE change
Point Cursor / Continue / any OpenAI SDK at the local endpoint. You get a sovereign, verified model behind the API shape your tools already speak.
Three commands
at1 model pack ./my-model --out model.at1m
Seal weights + config + tokenizer into one verified container.
at1 model verify model.at1m
Re-check every tensor SHA-256 + the manifest — fail-closed.
at1 model serve model.at1m --port 11434
OpenAI-compatible endpoint at http://localhost:11434/v1 (X-AT1-Integrity: verified).
Pricing
Free grant
1M tokens/mo
First 1M inference tokens/month free of charge. Pack & verify are never metered.
Managed
$0.30/1M tokens
Prompt + completion, after the first 1M/month. Metered through your usage bill.
Enterprise self-host
$2,000/deploy/mo
Air-gapped deploy with attestation + support. Your weights never leave your machine.
The engine ships compiled and license-gatedand runs on your own hardware — your model and prompts never leave it. Enabling the runtime requires a card on file (no separate engine fee — usage rolls into your bill), then your first 1M inference tokens each month are free of charge. Honest framing: the OpenAI API shape and on-disk weight packing aren't novel on their own; the wedge is the verified + compressed + addressable + self-describing container you run inference straight out of.