AT-1 Verified Model Runtime

Run any model from one verified container.

Pack a local model into a single self-describing .at1mfile — weights, config, and tokenizer sealed together — then run or serve it with every weight SHA-256-verified on load. Smaller, offline, byte-exact, and OpenAI-compatible so your tools don't change.

See pricing Read the docs

One self-describing file

Weights + config + tokenizer are sealed into a single .at1m container. Move that one file and the model runs anywhere — no model-dir to ship alongside it, smaller and offline.

Verified on every load

Each tensor carries a SHA-256; loading fails closed if a single byte was altered. The endpoint surfaces it to the client as X-AT1-Integrity: verified, and `verify` re-checks the whole container.

Addressable & streaming

Tensors stream individually — a forward pass reads only what it touches — so a model larger than your cache budget still runs. Byte-exact vs the original weights.

OpenAI-compatible, no IDE change

Point Cursor / Continue / any OpenAI SDK at the local endpoint. You get a sovereign, verified model behind the API shape your tools already speak.

Three commands

at1 model pack ./my-model --out model.at1m

Seal weights + config + tokenizer into one verified container.

at1 model verify model.at1m

Re-check every tensor SHA-256 + the manifest — fail-closed.

at1 model serve model.at1m --port 11434

OpenAI-compatible endpoint at http://localhost:11434/v1 (X-AT1-Integrity: verified).

Pricing

Free grant

1M tokens/mo

First 1M inference tokens/month free of charge. Pack & verify are never metered.

Managed

$0.30/1M tokens

Prompt + completion, after the first 1M/month. Metered through your usage bill.

Enterprise self-host

$2,000/deploy/mo

Air-gapped deploy with attestation + support. Your weights never leave your machine.

The engine ships compiled and license-gatedand runs on your own hardware — your model and prompts never leave it. Enabling the runtime requires a card on file (no separate engine fee — usage rolls into your bill), then your first 1M inference tokens each month are free of charge. Honest framing: the OpenAI API shape and on-disk weight packing aren't novel on their own; the wedge is the verified + compressed + addressable + self-describing container you run inference straight out of.

Enable the runtime Model integrity Addressable weights