How was this file generated?
at1 origin reads a file’s compression signature and estimates its origin: human, source, LLM, PRNG, CSPRNG, encrypted, sensor, or tabular. It’s a fast triage signal, calibrated on synthetic and exemplar data, that sees surface structure, not meaning.
The origin classes
Reads the compression signature
Different kinds of data leave different fingerprints under a battery of transforms and codecs — how compressible they are, and by which method. at1 origin turns that signature into a best-guess origin class with a confidence.
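As a rough illustration of what a compression signature can look like, the sketch below measures how far a few Python standard-library codecs shrink the same bytes. It is not at1 origin's actual battery: the page does not list the tool's transforms or codecs, so zlib, bz2, and lzma are stand-ins, and the function name `compression_signature` is hypothetical.

```python
import bz2
import lzma
import os
import zlib


def compression_signature(data: bytes) -> dict[str, float]:
    """Return compressed-size / original-size for a few stdlib codecs.

    Illustrative only: these codecs are stand-ins, not at1 origin's own set.
    """
    if not data:
        raise ValueError("cannot compute a signature for empty input")
    n = len(data)
    return {
        "zlib": len(zlib.compress(data, 9)) / n,
        "bz2": len(bz2.compress(data, 9)) / n,
        "lzma": len(lzma.compress(data)) / n,
    }


# Repetitive prose compresses heavily; OS-level random bytes do not.
prose = b"The quick brown fox jumps over the lazy dog. " * 200
noise = os.urandom(len(prose))

print(compression_signature(prose))  # ratios well below 1.0
print(compression_signature(noise))  # ratios close to or slightly above 1.0
```

The vector of ratios, rather than any single number, is the "signature": which codec wins and by how much differs between, say, text and tabular data. A real classifier would need many more features than these three.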
One pass, no per-type setup
The same probe runs on any file with no feature engineering and no trained model per format. It's a fast triage pass — "what am I even looking at?" — for a folder of unknown blobs.
Surface structure, not meaning
It classifies from statistical structure, not semantics. It can tell encrypted from prose from a sensor stream; it cannot read what the prose says or verify a claim. Treat it as a signal, not a verdict.
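To make "statistical structure, not semantics" concrete, here is a minimal sketch using one simple surface statistic, the Shannon entropy of the byte histogram. Byte entropy is an assumption chosen for illustration; the page does not say at1 origin uses it, and `byte_entropy` is a hypothetical helper. Shuffling the bytes of a sentence destroys its meaning but leaves this statistic exactly unchanged.

```python
import math
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        raise ValueError("empty input has no byte distribution")
    n = len(data)
    counts = Counter(data)
    # fsum is order-independent, so equal histograms give equal results.
    return math.fsum((c / n) * math.log2(n / c) for c in counts.values())


claim = b"The backup completed and every file was verified."
scrambled = bytearray(claim)
random.Random(0).shuffle(scrambled)  # same bytes, meaning destroyed

print(byte_entropy(claim) == byte_entropy(bytes(scrambled)))  # True
```

A statistic like this can separate broad classes (high-entropy ciphertext versus lower-entropy prose), but it has no way to tell a true sentence from a false one.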
One command surface
# classify how a file was generated, from its compression signature
at1 origin classify unknown.bin      # -> csprng/encrypted (confidence 0.94)
at1 origin classify notes.txt        # -> human prose (confidence 0.81)
at1 origin classify export.parquet   # -> tabular (confidence 0.88)
at1 origin scan ./inbox/             # triage a whole folder of unknown blobs
Honest scope
at1 origin is an instrument, not an oracle. It is calibrated on synthetic and exemplar data; broad real-corpus validation is the explicit graduation gate before any hard accuracy claim. It reads surface structure, not meaning, so an LLM asked to imitate a human, or a file that mixes origins, can read ambiguously. In those cases it reports a confidence and can return an “unknown” class rather than guessing. To ask instead how much of a file is rule-governed, see the determinism score; to check whether a specific integer generator is recoverable, see at1 recover.
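One way to picture "a confidence and an unknown class rather than guessing" is a decision rule that abstains when no class clearly dominates. The sketch below is an assumption for illustration only: the per-class scores, the 0.6 threshold, and the function name `decide` are all hypothetical, and the page does not describe how at1 origin actually computes or thresholds its confidence.

```python
def decide(scores: dict[str, float], min_confidence: float = 0.6) -> tuple[str, float]:
    """Pick the top-scoring class, or abstain with "unknown".

    scores: non-negative per-class scores (hypothetical inputs).
    min_confidence: abstention threshold (hypothetical value).
    """
    total = sum(scores.values())
    if total <= 0:
        return ("unknown", 0.0)
    label, best = max(scores.items(), key=lambda kv: kv[1])
    confidence = best / total  # share of the total score held by the winner
    if confidence < min_confidence:
        return ("unknown", confidence)
    return (label, confidence)


print(decide({"human": 9.0, "llm": 1.0}))  # clear winner
print(decide({"human": 5.0, "llm": 5.0}))  # ambiguous, so it abstains
```

Abstaining on ambiguous input, such as a mixed-origin file, keeps the output usable as a triage signal: a low-confidence "unknown" tells the reader to look closer instead of trusting a coin flip.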