Is this a law, or a memorization?
MDL-Select scores whether a fit is a genuine law or a memorization by its description length in bits, bits(model) + bits(residuals), following Rissanen's Minimum Description Length principle. Because it does not rely on R² or MSE, it is far harder to fool on noisy data. A held-out extrapolation test corroborates its verdict.
- bits: scores fits by description length, not R² or MSE
- 4 verdicts: LAW / FIT / MEMORIZATION / NO-LAW
- held-out: corroborated by an extrapolation test on unseen points
- MDL: Rissanen's Minimum Description Length principle
Description length is the test
Scores in bits, not R²
MDL-Select measures a fit by its total description length — bits(model) + bits(residuals). A genuine law describes the data in fewer total bits than it takes to list the data; a memorization pays for every point it stores. That accounting is what R² and MSE miss.
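To make the accounting concrete, here is a minimal sketch of a two-part description length. It is not MDL-Select's actual scorer: the 32-bit cost per parameter, the 0.01 precision, the Gaussian residual code, and the names `residual_bits` and `total_bits` are all assumptions chosen for illustration.

```python
import math

BITS_PER_PARAM = 32   # assumption: each stored number costs one 32-bit float
PRECISION = 0.01      # assumption: values only matter to two decimal places


def residual_bits(residuals, precision=PRECISION):
    """Bits to encode residuals, quantized to `precision`, with a zero-mean Gaussian code."""
    n = len(residuals)
    variance = sum(r * r for r in residuals) / n
    # Floor the variance at the quantization noise so an exact fit is not free.
    variance = max(variance, precision ** 2 / 12)
    bits_each = 0.5 * math.log2(2 * math.pi * math.e * variance) - math.log2(precision)
    return n * bits_each


def total_bits(n_params, residuals):
    """Two-part description length: bits(model) + bits(residuals)."""
    return n_params * BITS_PER_PARAM + residual_bits(residuals)


# Toy data: y = 2x + 1 with a small alternating wobble of +/-0.05.
xs = list(range(20))
ys = [2 * x + 1 + (0.05 if x % 2 == 0 else -0.05) for x in xs]

# "Listing the data": store only the mean, then encode every deviation from it.
mean_y = sum(ys) / len(ys)
bits_data = total_bits(1, [y - mean_y for y in ys])

# Candidate law y = 2x + 1: two parameters, tiny residuals.
bits_law = total_bits(2, [y - (2 * x + 1) for x, y in zip(xs, ys)])

# Memorization: a lookup table that stores all 20 values, leaving zero residuals.
bits_memo = total_bits(len(ys), [0.0] * len(ys))

print(f"list the data: {bits_data:.0f} bits")
print(f"law:           {bits_law:.0f} bits")
print(f"memorization:  {bits_memo:.0f} bits")
```

Under these assumptions the law costs fewer bits than listing the data, while the lookup table costs more than either, because it pays the parameter price for every point it stores.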
Far harder to fool on noise
R² and MSE reward any curve that hugs the points — so a high-order fit can score beautifully by memorizing noise. MDL charges for the model's own complexity, so over-fitting a noisy sample costs more bits than it saves. The scorer sees through the flattery.
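A small sketch of that contrast, under stated assumptions: ten hand-picked noise values with no trend, an order-9 polynomial that passes through every one of them (built with Lagrange interpolation), and the same illustrative bit accounting (32 bits per parameter, 0.01 precision, Gaussian residual code). None of this is MDL-Select's internal implementation.

```python
import math

BITS_PER_PARAM = 32   # assumption: each coefficient costs one 32-bit float
PRECISION = 0.01      # assumption: values only matter to two decimal places


def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def total_bits(n_params, ys, preds, precision=PRECISION):
    """bits(model) + bits(residuals), with a quantized Gaussian residual code."""
    n = len(ys)
    variance = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n
    variance = max(variance, precision ** 2 / 12)  # an exact fit is not free
    per_point = 0.5 * math.log2(2 * math.pi * math.e * variance) - math.log2(precision)
    return n_params * BITS_PER_PARAM + n * per_point


def lagrange(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Pure noise: ten fixed values with no underlying trend.
xs = list(range(10))
ys = [0.3, -0.7, 0.5, 0.1, -0.4, 0.8, -0.2, -0.6, 0.4, -0.1]

# Candidate A: a flat line at the mean (1 parameter).
mean_y = sum(ys) / len(ys)
flat_preds = [mean_y] * len(ys)

# Candidate B: an order-9 polynomial through every point (10 coefficients).
memo_preds = [lagrange(xs, ys, x) for x in xs]

r2_flat, r2_memo = r_squared(ys, flat_preds), r_squared(ys, memo_preds)
bits_flat = total_bits(1, ys, flat_preds)
bits_memo = total_bits(len(xs), ys, memo_preds)

print(f"flat line: R2={r2_flat:.2f}  bits={bits_flat:.0f}")
print(f"order-9:   R2={r2_memo:.2f}  bits={bits_memo:.0f}")
```

R² ranks the order-9 polynomial as a perfect fit, yet in this accounting its ten coefficients cost more bits than the residuals they eliminate, so the flat line is the cheaper description.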
Corroborated by extrapolation
A LAW should keep working on points it never saw. MDL-Select cross-checks its verdict with a held-out extrapolation test: a real law predicts the unseen tail; a memorization falls apart the moment it leaves the data it was fit to.
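The held-out check can be sketched as follows. This is an illustration, not MDL-Select's own test: the 8/4 split, the fixed noise values, and the helper names `fit_line`, `lagrange`, and `rmse` are assumptions. A straight line and a degree-7 interpolating polynomial are both fit to the first eight points, then asked to predict the last four.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def lagrange(xs, ys, x):
    """Evaluate the polynomial that passes through every (xs, ys) point at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


# Toy data: y = 2x + 1 plus small fixed noise (values chosen by hand).
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.1, 0.05, -0.05]
xs = list(range(12))
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Fit on the first 8 points, hold out the tail.
train_x, train_y = xs[:8], ys[:8]
tail_x, tail_y = xs[8:], ys[8:]

slope, intercept = fit_line(train_x, train_y)
line_tail = [slope * x + intercept for x in tail_x]
memo_train = [lagrange(train_x, train_y, x) for x in train_x]
memo_tail = [lagrange(train_x, train_y, x) for x in tail_x]

rmse_line_tail = rmse(tail_y, line_tail)
rmse_memo_train = rmse(train_y, memo_train)
rmse_memo_tail = rmse(tail_y, memo_tail)

print(f"line, held-out RMSE:         {rmse_line_tail:.3f}")
print(f"interpolant, training RMSE:  {rmse_memo_train:.3f}")
print(f"interpolant, held-out RMSE:  {rmse_memo_tail:.3f}")
```

The interpolant reproduces its training points exactly and then diverges on the tail, while the line stays close to the unseen values. That gap is what a held-out extrapolation test is designed to expose.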
Four honest verdicts
It returns LAW (compresses and extrapolates), FIT (useful locally but not a law), MEMORIZATION (only stores the data), or NO-LAW (nothing beats just listing the values). It commits to exactly one of the four, and says NO-LAW rather than dressing up noise.
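One way the four verdicts could be derived from a compression check and a held-out check is sketched below. The decision rule, the `classify` function, its parameters, and the single error `tolerance` are hypothetical; the page does not specify how MDL-Select actually separates the cases.

```python
from enum import Enum


class Verdict(Enum):
    LAW = "LAW"
    FIT = "FIT"
    MEMORIZATION = "MEMORIZATION"
    NO_LAW = "NO-LAW"


def classify(total_bits, data_bits, train_error, heldout_error, tolerance):
    """Hypothetical rule mapping the two checks onto the four verdicts.

    total_bits    -- bits(model) + bits(residuals) for the candidate
    data_bits     -- bits needed to simply list the data
    train_error   -- error on the points the candidate was fit to
    heldout_error -- error on the unseen tail
    tolerance     -- largest error still counted as "predicts the point"
    """
    compresses = total_bits < data_bits
    fits_training = train_error <= tolerance
    extrapolates = heldout_error <= tolerance

    if compresses and extrapolates:
        return Verdict.LAW            # shorter than the data and holds on the tail
    if compresses:
        return Verdict.FIT            # useful on the fitted range only
    if fits_training:
        return Verdict.MEMORIZATION   # matches the points only by paying for them
    return Verdict.NO_LAW             # nothing beats listing the values


# Illustrative inputs (made-up numbers, not measurements).
law = classify(total_bits=151, data_bits=276, train_error=0.05, heldout_error=0.06, tolerance=0.2)
fit = classify(total_bits=150, data_bits=276, train_error=0.10, heldout_error=5.0, tolerance=0.2)
memo = classify(total_bits=645, data_bits=276, train_error=0.0, heldout_error=30.0, tolerance=0.2)
none = classify(total_bits=300, data_bits=276, train_error=0.5, heldout_error=0.6, tolerance=0.2)

for v in (law, fit, memo, none):
    print(v.value)
```

Checking compression first reflects the ordering in the descriptions above: a candidate that cannot beat listing the data is never promoted to LAW or FIT, however well it matches the training points.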
One command surface
# score a candidate fit by description length, not R²
at1 mdl-select score data.csv --model "y = a*exp(-b*x) + c"
# -> LAW · bits(model)+bits(residuals) < bits(data)
at1 mdl-select score noisy.csv --model "<order-9 polynomial>"
# -> MEMORIZATION · pays for every point, fails held-out
at1 mdl-select score random.csv --model "..."
# -> NO-LAW · nothing beats just listing the values
MDL-Select is a model-selection scorer: a scientific and sensor-regression instrument. It scores candidate fits you supply and tells you which is a law, which is a local fit, and which is memorization; it is not a full symbolic-discovery engine and does not invent the candidate formula for you. It is a licensed engine; enable it at /engines. Used the right way it's a sharper, harder-to-fool alternative to R²/MSE for deciding whether a regression captured real structure. For the broader “store the rule, not the output” portfolio see Research.