Managed cloud
The writable, productized sibling of the read-only cold-tier gateway. It owns the object lifecycle: your application writes and reads ordinary CSV/JSON, while on disk every object is stored as a verified .at1 archive at roughly 25% of its raw size, and the same objects are queryable in place. It is a drop-in for an S3 prefix: point your SDK at the endpoint and change nothing else.
Run it
AT1_CLOUD_TOKEN=secret python at1_cloud.py serve ./store \
--bucket at1 --access-key AK --secret-key SK --port 9100
# S3 verbs are SigV4-signed; /sql takes a bearer token. Without keys, the server is loopback-only.
Write & read: transparent, verified
PUT runs the full gated pipeline (automatic codec selection, a query-optimized layout, an independent byte-exact round-trip verify, and a SHA-256 trailer) before the object is kept. The original bytes are never written to disk: if the compressed copy doesn't decompress back bit-for-bit, the write is rejected with HTTP 422. GET decompresses transparently, so every read, including an HTTP Range read, returns the exact original bytes.
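A minimal sketch of the verify-before-keep ordering in the PUT path, assuming stdlib zlib, bz2, and lzma as stand-ins for AT1's real codecs. The names gated_put, gated_get, RoundTripError and the blob layout (1-byte codec id + payload + 32-byte SHA-256 trailer) are hypothetical, and the query-optimized layout step is omitted. Decoding with the same codec is weaker than the independent verifier the text describes; the sketch only illustrates the order: compress, prove the round trip, then keep.

```python
import bz2
import hashlib
import lzma
import zlib

# Hypothetical codec table standing in for AT1's own codecs (assumption).
CODECS = {
    b"Z": (zlib.compress, zlib.decompress),
    b"B": (bz2.compress, bz2.decompress),
    b"X": (lzma.compress, lzma.decompress),
}


class RoundTripError(Exception):
    """Compressed copy does not decode back bit-for-bit (the server answers HTTP 422)."""


def gated_put(original: bytes) -> bytes:
    """Return codec id + payload + SHA-256(original); the raw bytes are never kept."""
    # Automatic codec selection: try every codec and keep the smallest output.
    codec_id, payload = min(
        ((cid, enc(original)) for cid, (enc, _dec) in CODECS.items()),
        key=lambda item: len(item[1]),
    )
    # Round-trip verify before anything is stored.
    _enc, dec = CODECS[codec_id]
    if dec(payload) != original:
        raise RoundTripError("round-trip mismatch: reject the write")
    return codec_id + payload + hashlib.sha256(original).digest()


def gated_get(blob: bytes) -> bytes:
    """Transparently decompress and check the SHA-256 trailer."""
    codec_id, payload, trailer = blob[:1], blob[1:-32], blob[-32:]
    _enc, dec = CODECS[codec_id]
    data = dec(payload)
    if hashlib.sha256(data).digest() != trailer:
        raise RoundTripError("trailer mismatch")
    return data


csv = b"id,user,score\n" + b"".join(b"%d,u%d,%d\n" % (i, i % 7, i % 20) for i in range(1000))
blob = gated_put(csv)
assert gated_get(blob) == csv  # byte-exact round trip
```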
# Any S3 client (boto3, aws s3 cp, DuckDB, Spark) works once pointed at the endpoint.
# The AWS CLI signs requests with the keys passed to serve:
export AWS_ACCESS_KEY_ID=AK AWS_SECRET_ACCESS_KEY=SK
aws --endpoint-url http://localhost:9100 s3 cp events.csv s3://at1/data/events.csv
# -> the original is NEVER stored, only a verified .at1 (a failed round-trip = HTTP 422)
# PUT response headers: x-at1-original-bytes, x-at1-compressed-bytes, x-at1-ratio
aws --endpoint-url http://localhost:9100 s3 cp s3://at1/data/events.csv ./back.csv
# -> transparent decompress: back.csv is byte-identical to events.csv (HTTP Range works too)
cmp events.csv back.csv   # prints nothing when the files are identical
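Because GET decompresses before serving, a Range request addresses offsets in the original bytes, not in the stored .at1. The hypothetical apply_range below decodes first and then slices a single range, following RFC 9110 byte-range rules (closed, open-ended, and suffix ranges; 416 when unsatisfiable; an invalid header is ignored). A real server might decode only the blocks a range overlaps; that optimization is not shown.

```python
import re

_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def apply_range(original: bytes, header: str) -> tuple[int, bytes]:
    """Return (status, body) for one byte range over the decompressed original."""
    size = len(original)
    m = _RANGE.match(header.strip())
    if not m or m.group(1) == m.group(2) == "":
        return 200, original          # malformed or unsupported: ignore the header
    first, last = m.group(1), m.group(2)
    if first == "":                   # suffix range: the last N bytes
        n = int(last)
        if n == 0 or size == 0:
            return 416, b""
        return 206, original[max(size - n, 0):]
    start = int(first)
    if last != "" and int(last) < start:
        return 200, original          # invalid spec (last < first): ignore it
    if start >= size:
        return 416, b""               # starts past the end: unsatisfiable
    end = size - 1 if last == "" else min(int(last), size - 1)
    return 206, original[start:end + 1]


data = b"0123456789"
print(apply_range(data, "bytes=2-4"))   # (206, b'234')
print(apply_range(data, "bytes=-3"))    # (206, b'789')
print(apply_range(data, "bytes=20-"))   # (416, b'')
```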
- S3-compatible: PUT/GET/HEAD/DELETE/ListObjectsV2, SigV4-signed (the scheme every real S3 client uses). Nested keys (a/b/c.csv) are real, traversal-guarded directories.
- Metered & gated: writes report the storage/ratio axis and register the archive as TB-under-management; reads report the I/O axis; the stop-usage gate refuses both for an unpaid or over-quota account.
- Observable: PUT responses carry x-at1-original-bytes, x-at1-compressed-bytes, and x-at1-ratio; a live ticker is at /_at1/stats.
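One way to get traversal-guarded nested keys, assuming each bucket is a directory under the store root (for example ./store/at1) and a key maps directly to a relative path inside it. key_to_path is a hypothetical name, and the real server may also append the .at1 suffix. Path.resolve() collapses .. segments and symlinks, so a containment check afterwards rejects any key that escapes the bucket. Path.is_relative_to needs Python 3.9+.

```python
from pathlib import Path


def key_to_path(store_root: str, bucket: str, key: str) -> Path:
    """Map an S3 key like 'a/b/c.csv' to a path under store_root/bucket.

    Raises ValueError if the resolved path is the bucket itself or lies outside it.
    """
    base = (Path(store_root) / bucket).resolve()
    target = (base / key).resolve()   # '..' and symlinks are resolved here
    if target == base or not target.is_relative_to(base):
        raise ValueError(f"key escapes bucket: {key!r}")
    return target


print(key_to_path("./store", "at1", "a/b/c.csv").parts[-3:])  # ('a', 'b', 'c.csv')
# key_to_path("./store", "at1", "../../etc/passwd") raises ValueError
```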
SQL REST
POST /sql takes a SQL string and compiles it to predicate/projection pushdown: zone maps (per-row-group min/max values) let whole row groups be skipped, so only the blocks a query touches are read. Comparing stats.bytes_read with stats.total_block_bytes in the response makes the saving observable.
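A small model of the zone-map skip, assuming each row group carries a (min, max) pair per column and a known size in bytes. Block, make_block, and scan_between are hypothetical names, and the byte sizes in the usage lines are made up for illustration. Only blocks whose range overlaps the predicate count toward bytes_read; an exact per-row check then drops rows the zone map could not rule out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One row group: its rows, per-column (min, max) zone map, and stored size."""
    rows: list      # list of row dicts, e.g. {"id": 1, "score": 11}
    zone: dict      # column -> (min, max)
    nbytes: int     # hypothetical compressed size of this block


def make_block(rows, nbytes):
    cols = rows[0].keys()
    zone = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in cols}
    return Block(rows, zone, nbytes)


def scan_between(blocks, col, lo, hi, project):
    """Pushdown for `col BETWEEN lo AND hi`: skip by zone map, then filter rows exactly."""
    out, bytes_read = [], 0
    for b in blocks:
        bmin, bmax = b.zone[col]
        if bmax < lo or bmin > hi:
            continue                    # no row in this block can match: not read
        bytes_read += b.nbytes          # block overlaps the predicate: read it
        for r in b.rows:
            if lo <= r[col] <= hi:      # exact row filter
                out.append({c: r[c] for c in project})
    total = sum(b.nbytes for b in blocks)
    return out, {"bytes_read": bytes_read, "total_block_bytes": total}


# Three blocks holding scores 0-9, 10-19, 20-29, each 1000 bytes (illustrative).
blocks = [make_block([{"id": i, "score": i} for i in range(s, s + 10)], 1000)
          for s in (0, 10, 20)]
rows, stats = scan_between(blocks, "score", 10, 12, ["id"])
print(rows)   # [{'id': 10}, {'id': 11}, {'id': 12}]
print(stats)  # {'bytes_read': 1000, 'total_block_bytes': 3000}
```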
curl -s http://localhost:9100/sql -H "Authorization: Bearer secret" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT id, user FROM data/events.csv WHERE score BETWEEN 10 AND 12 LIMIT 100"}'
# -> {"columns": ["id","user"], "rows": [...], "row_count": ...,
#     "stats": {"bytes_read": 2167, "total_block_bytes": ...}}   # only touched blocks read
Honest scope. The SQL is a deliberately small, safe subset: a single object in FROM; the operators =, >=, <=, >, <, and BETWEEN; AND; and LIMIT. It compiles to a pushdown prefilter plus an exact row filter: the prefilter only skips blocks that cannot contain a match, and the row filter checks every surviving row, so results are exact, not approximate. Joins, aggregates, and ORDER BY are out of scope here; run those in DuckDB or Trino over the same objects via the S3 surface.
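To make the subset concrete, here is a hypothetical compile_query that accepts only the grammar listed above and returns the two pieces the text names: a conservative block prefilter over (min, max) zone maps and an exact row filter. It assumes literals are numbers or quoted strings without spaces and rejects anything else (OR, ORDER BY, aggregates) with ValueError. It is a sketch of the idea, not AT1's parser.

```python
import operator
import re

# Comparison operators in the subset; BETWEEN is handled separately.
OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}

QUERY = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<obj>\S+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+LIMIT\s+(?P<limit>\d+))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
COND = re.compile(
    r"(\w+)\s+BETWEEN\s+(\S+)\s+AND\s+(\S+)"   # col BETWEEN a AND b
    r"|(\w+)\s*(>=|<=|=|>|<)\s*(\S+)",         # col op value
    re.IGNORECASE,
)
AND_SEP = re.compile(r"\s+AND\s+", re.IGNORECASE)


def _lit(tok):
    """Numeric literal, or a quoted string without spaces (sketch assumption)."""
    if tok[0] in "'\"":
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        return float(tok)  # raises ValueError for anything else


def parse_where(text):
    """Split `cond AND cond ...` into (column, op, value) triples."""
    preds, pos = [], 0
    while True:
        m = COND.match(text, pos)
        if not m:
            raise ValueError(f"unsupported condition: {text[pos:]!r}")
        if m.group(1):
            preds.append((m.group(1), "BETWEEN", (_lit(m.group(2)), _lit(m.group(3)))))
        else:
            preds.append((m.group(4), m.group(5), _lit(m.group(6))))
        pos = m.end()
        sep = AND_SEP.match(text, pos)
        if not sep:
            break
        pos = sep.end()
    if text[pos:].strip():
        raise ValueError(f"unsupported clause: {text[pos:]!r}")
    return preds


def compile_query(sql):
    m = QUERY.match(sql)
    if not m:
        raise ValueError("query is outside the supported subset")
    cols = [c.strip() for c in m.group("cols").split(",")]
    if not all(re.fullmatch(r"\w+", c) for c in cols):
        raise ValueError("only plain column names may be selected")
    preds = parse_where(m.group("where")) if m.group("where") else []
    limit = int(m.group("limit")) if m.group("limit") else None

    def row_ok(row):
        """Exact row filter: every predicate must hold."""
        for col, op, val in preds:
            if op == "BETWEEN":
                if not val[0] <= row[col] <= val[1]:
                    return False
            elif not OPS[op](row[col], val):
                return False
        return True

    def block_may_match(zone):
        """Prefilter: False only when the (min, max) zone map rules the block out."""
        for col, op, val in preds:
            lo, hi = zone[col]
            if op == "BETWEEN":
                if hi < val[0] or lo > val[1]:
                    return False
            elif op == "=":
                if not lo <= val <= hi:
                    return False
            elif op in (">", ">="):
                if not OPS[op](hi, val):
                    return False
            elif not OPS[op](lo, val):  # "<" or "<="
                return False
        return True

    return {"object": m.group("obj"), "columns": cols, "limit": limit,
            "row_ok": row_ok, "block_may_match": block_may_match}


q = compile_query("SELECT id, user FROM data/events.csv WHERE score BETWEEN 10 AND 12 LIMIT 100")
print(q["object"], q["columns"], q["limit"])     # data/events.csv ['id', 'user'] 100
print(q["block_may_match"]({"score": (13, 20)}))  # False: the block can be skipped
```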
# S3-Select-style JSON pushdown is also available per object:
POST /at1/data/events.csv?select
{"where": {"score": ["=", 0]}, "select": ["id"], "limit": 1000}
The same byte-exact archive underneath means a select returns values while GET still serves the exact original bytes.
Self-host with npm i -g @tinyfiles/cli, then at1-cloud serve ….
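Read literally, the per-object select body is a filter, a projection, and a cap. The hypothetical run_select below applies such a spec to row dicts, assuming each "where" entry is [operator, value] using the same comparison operators as the SQL subset and that multiple entries are ANDed; the endpoint's actual JSON grammar may allow more.

```python
import operator

# Assumed operator set, mirroring the SQL subset's comparisons.
OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}


def run_select(rows, spec):
    """Apply {"where": {col: [op, value]}, "select": [...], "limit": n} to row dicts."""
    where = spec.get("where", {})
    cols = spec.get("select")
    limit = spec.get("limit")
    out = []
    for row in rows:
        if limit is not None and len(out) >= limit:
            break                                   # cap reached: stop scanning
        if all(OPS[op](row[col], val) for col, (op, val) in where.items()):
            out.append({c: row[c] for c in cols} if cols else dict(row))
    return out


rows = [{"id": i, "score": i % 3} for i in range(10)]
print(run_select(rows, {"where": {"score": ["=", 0]}, "select": ["id"], "limit": 1000}))
# [{'id': 0}, {'id': 3}, {'id': 6}, {'id': 9}]
```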