Managed cloud

The writable, productized sibling of the read-only cold-tier gateway. It owns the object lifecycle: your application writes ordinary CSV/JSON and reads ordinary CSV/JSON, while on disk every object is a verified .at1 costing ~25% of raw — and the same objects are queryable in place. Drop-in for an S3 prefix: point your SDK at the endpoint and change nothing else.

Run it

AT1_CLOUD_TOKEN=secret python at1_cloud.py serve ./store \
    --bucket at1 --access-key AK --secret-key SK --port 9100
# S3 verbs are SigV4-signed; /sql takes a bearer token. Loopback-only without keys.

Write & read — transparent, verified

PUT runs the full gated pipeline (auto codec selection, query-optimized layout, an independent byte-exact round-trip verify, and a SHA-256 trailer) before the object is kept. The original bytes are never written to disk: if the compressed copy doesn't decompress back bit-for-bit, the write is rejected with 422. GET transparently decompresses, so reads, including HTTP Range, return the exact original bytes.
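The gate described above can be sketched in a few lines. This is only an illustration under stated assumptions: zlib stands in for the .at1 codec, the trailer is assumed to be a SHA-256 digest of the original bytes appended to the compressed body, and the names gated_write, transparent_read, and RoundTripError are hypothetical, not part of at1_cloud.py.

```python
import hashlib
import zlib

TRAILER_LEN = 32  # length of a SHA-256 digest in bytes


class RoundTripError(Exception):
    """Raised when a stored copy does not decode back to the original bytes."""


def gated_write(original: bytes) -> bytes:
    """Compress, verify byte-exact, then append a SHA-256 trailer.

    zlib is a stand-in codec; the real .at1 pipeline is not shown here.
    """
    compressed = zlib.compress(original, level=9)
    # Independent verify: decode the candidate and compare it to the input.
    if zlib.decompress(compressed) != original:
        # The service would answer HTTP 422 at this point.
        raise RoundTripError("round-trip mismatch")
    return compressed + hashlib.sha256(original).digest()


def transparent_read(stored: bytes) -> bytes:
    """Decompress and re-check the trailer before returning the original."""
    body, trailer = stored[:-TRAILER_LEN], stored[-TRAILER_LEN:]
    original = zlib.decompress(body)
    if hashlib.sha256(original).digest() != trailer:
        raise RoundTripError("trailer mismatch")
    return original


data = b"id,user,score\n" + b"1,ann,10\n" * 1000
stored = gated_write(data)
assert transparent_read(stored) == data  # byte-identical on the way back
print(f"stored/original ratio: {len(stored) / len(data):.3f}")
```

The point of the design is that only the verified compressed copy is ever kept, so the check has to happen before the write is acknowledged.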

# any S3 SDK — boto3, aws s3 cp, DuckDB, Spark — pointed at the endpoint:
aws --endpoint-url http://localhost:9100 s3 cp events.csv s3://at1/data/events.csv
#   -> the original is NEVER stored; only a verified .at1 (a failed round-trip = HTTP 422)
#   response headers: x-at1-original-bytes, x-at1-compressed-bytes, x-at1-ratio

aws --endpoint-url http://localhost:9100 s3 cp s3://at1/data/events.csv ./back.csv
#   -> transparent decompress: back.csv is byte-identical to events.csv (HTTP Range works too)

  • S3-compatible: PUT / GET / HEAD / DELETE / ListObjectsV2, SigV4-signed (the scheme every real S3 client uses). Nested keys (a/b/c.csv) are real, traversal-guarded directories.
  • Metered & gated: writes report the storage/ratio axis and register the archive as TB-under-management; reads report the I/O axis; the stop-usage gate refuses both for an unpaid or over-quota account.
  • Observable: PUT responses carry x-at1-original-bytes, x-at1-compressed-bytes, x-at1-ratio; a live ticker is at /_at1/stats.

SQL REST

POST /sql takes a SQL string and compiles it to the proven predicate/projection pushdown: zone maps skip whole row-groups, so only the blocks a query touches are read. The reported stats.bytes_read vs total_block_bytes makes the saving observable.
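To make the zone-map idea concrete, here is a minimal sketch of row-group skipping for a BETWEEN predicate. It assumes each row-group stores a min/max for the filtered column and a block size in bytes; RowGroup, make_groups, scan_between, and the 8-bytes-per-row figure are illustrative inventions, not the .at1 layout.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    rows: list   # decoded rows (dicts) in this group
    nbytes: int  # hypothetical on-disk block size
    zmin: int    # zone map: minimum of the filtered column
    zmax: int    # zone map: maximum of the filtered column


def make_groups(rows, column, size, bytes_per_row=8):
    """Split rows into fixed-size groups and record a min/max per group."""
    groups = []
    for i in range(0, len(rows), size):
        chunk = rows[i:i + size]
        vals = [r[column] for r in chunk]
        groups.append(RowGroup(chunk, len(chunk) * bytes_per_row,
                               min(vals), max(vals)))
    return groups


def scan_between(groups, column, lo, hi, limit=None):
    """Return matching rows plus bytes_read / total_block_bytes stats."""
    out, bytes_read = [], 0
    total = sum(g.nbytes for g in groups)
    for g in groups:
        if limit is not None and len(out) >= limit:
            break
        if g.zmax < lo or g.zmin > hi:
            continue  # zone map proves no row can match: block is not read
        bytes_read += g.nbytes
        for r in g.rows:
            if lo <= r[column] <= hi:  # exact row filter inside the block
                out.append(r)
                if limit is not None and len(out) >= limit:
                    break
    return out, {"bytes_read": bytes_read, "total_block_bytes": total}


rows = [{"id": i, "score": i // 10} for i in range(1000)]
groups = make_groups(rows, "score", size=100)
matches, stats = scan_between(groups, "score", 10, 12)
print(len(matches), stats)
# -> 30 {'bytes_read': 800, 'total_block_bytes': 8000}
```

In this toy data only one of ten groups overlaps the range, so a tenth of the block bytes are read. Skipping pays off most when the filtered column is clustered, which is presumably what a query-optimized layout aims for.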

curl -s http://localhost:9100/sql -H "Authorization: Bearer secret" \
  -d '{"sql": "SELECT id, user FROM data/events.csv WHERE score BETWEEN 10 AND 12 LIMIT 100"}'
# -> {"columns": ["id","user"], "rows": [...], "row_count": 150,
#     "stats": {"bytes_read": 2167, "total_block_bytes": ...}}   # only touched blocks read
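The same call can be made from Python using only the standard library. The sketch below builds the request without sending it; the endpoint, token, and SQL text come from the curl example, while build_sql_request is a hypothetical helper and the Content-Type header is an assumption (the curl example does not set one).

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build a POST /sql request carrying the bearer token and a JSON body."""
    # json.dumps keeps the SQL on one line and escapes it safely.
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sql",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )


req = build_sql_request(
    "http://localhost:9100",
    "secret",
    "SELECT id, user FROM data/events.csv "
    "WHERE score BETWEEN 10 AND 12 LIMIT 100",
)
print(req.get_method(), req.full_url)
# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```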

Honest scope. The SQL is a deliberately small, safe subset — a single object in FROM; operators =, >=, <=, >, <, BETWEEN; AND; LIMIT. It compiles to a pushdown prefilter and an exact row filter, so results are correct, not approximate. Joins, aggregates, and ORDER BY are out of scope here — run those in DuckDB / Trino over the same objects via the S3 surface.

# S3-Select-style JSON pushdown is also available per object:
POST /at1/data/events.csv?select
  {"where": {"score": ["=", 0]}, "select": ["id"], "limit": 1000}
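For readers unsure how to read that body, the following sketch evaluates it over rows held in memory. It assumes each "where" entry maps a column to an [operator, value] pair and that multiple entries are ANDed; only "=" appears in the example above, and the other operators are borrowed from the SQL subset as an assumption. run_select is a hypothetical name, and the real service applies the filter inside the archive rather than over decoded rows.

```python
import operator

# Comparison operators; only "=" is confirmed by the example above.
OPS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def run_select(rows, query):
    """Filter rows by `where`, project `select` columns, stop at `limit`."""
    where = query.get("where", {})
    columns = query.get("select")
    limit = query.get("limit")
    out = []
    for row in rows:
        if limit is not None and len(out) >= limit:
            break
        # Every column condition must hold (conditions are ANDed).
        if all(OPS[op](row[col], value) for col, (op, value) in where.items()):
            out.append({c: row[c] for c in columns} if columns else dict(row))
    return out


rows = [
    {"id": 1, "score": 0},
    {"id": 2, "score": 5},
    {"id": 3, "score": 0},
]
query = {"where": {"score": ["=", 0]}, "select": ["id"], "limit": 1000}
print(run_select(rows, query))
# -> [{'id': 1}, {'id': 3}]
```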

Self-host with npm i -g @tinyfiles/cli, then run at1-cloud serve …. Because the same byte-exact archive sits underneath, a select returns values while GET still serves the exact original bytes.