MDL-Select

Is this a law, or a memorization?

MDL-Select scores whether a fit is a genuine law or a memorization by its description length in bits, bits(model) + bits(residuals), following Rissanen's Minimum Description Length principle, instead of by R² or MSE, so it is far harder to fool on noisy data. Its verdict is corroborated by a held-out extrapolation test.

bits: scores fits by description length, not R² or MSE
4 verdicts: LAW / FIT / MEMORIZATION / NO-LAW
held-out: corroborated by an extrapolation test on unseen points
MDL: Rissanen's Minimum Description Length principle

Description length is the test

Scores in bits, not R²

MDL-Select measures a fit by its total description length — bits(model) + bits(residuals). A genuine law describes the data in fewer total bits than it takes to list the data; a memorization pays for every point it stores. That accounting is what R² and MSE miss.
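
As a rough illustration of this accounting (not MDL-Select's actual scorer), the sketch below prices a 2-parameter line against simply listing the data. Everything in it is an assumption made for the example: the function names, the 0.01 measurement precision, the (1/2)·log2(n) bits-per-parameter cost (only sensible when parameters are far fewer than points), the Gaussian code for residuals, and the uniform code for the raw listing.

```python
import math

PRECISION = 0.01  # assumed measurement resolution of the data


def model_bits(n_params, n_points):
    """Assumed parameter cost: (1/2) * log2(n) bits per fitted parameter."""
    return 0.5 * n_params * math.log2(n_points)


def residual_bits(residuals, precision=PRECISION):
    """Bits to encode residuals with a Gaussian code at the given precision."""
    n = len(residuals)
    variance = sum(r * r for r in residuals) / n
    # Floor the variance at the quantization noise so a perfect fit stays finite.
    variance = max(variance, precision ** 2 / 12)
    per_point = 0.5 * math.log2(2 * math.pi * math.e * variance) - math.log2(precision)
    return n * per_point


def raw_bits(values, precision=PRECISION):
    """Bits to list every value with a uniform code over the observed range."""
    levels = (max(values) - min(values)) / precision + 1
    return len(values) * math.log2(levels)


# Noise-free samples of y = 2x + 1, described by a 2-parameter line.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
residuals = [y - (2 * x + 1) for x, y in zip(xs, ys)]

total = model_bits(2, len(xs)) + residual_bits(residuals)
baseline = raw_bits(ys)
print(f"model + residuals: {total:.1f} bits, listing the data: {baseline:.1f} bits")
```

Under these assumptions the line costs a small fraction of the bits needed to list the twenty values, which is the sense in which a law "compresses" the data.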

Far harder to fool on noise

R² and MSE reward any curve that hugs the points — so a high-order fit can score beautifully by memorizing noise. MDL charges for the model's own complexity, so over-fitting a noisy sample costs more bits than it saves. The scorer sees through the flattery.
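
A small sketch can make the contrast concrete. It compares a 2-parameter line with a lookup table that stores every point, used here as an extreme stand-in for a memorizing fit. The coding scheme is an assumption for illustration (Gaussian code for residuals, uniform code for stored values, 0.01 precision, a hand-written noise pattern) and is not taken from MDL-Select itself.

```python
import math

PRECISION = 0.01  # assumed measurement resolution


def r_squared(ys, preds):
    """Ordinary coefficient of determination."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def gaussian_bits(residuals):
    """Bits to encode residuals with a Gaussian code (variance floored)."""
    n = len(residuals)
    var = max(sum(r * r for r in residuals) / n, PRECISION ** 2 / 12)
    return n * (0.5 * math.log2(2 * math.pi * math.e * var) - math.log2(PRECISION))


def listing_bits(values):
    """Bits to store every value with a uniform code over the observed range."""
    levels = (max(values) - min(values)) / PRECISION + 1
    return len(values) * math.log2(levels)


# y = 2x + 1 plus a fixed, hand-written noise pattern (deterministic on purpose).
noise = [0.3, -0.5, 0.1, 0.4, -0.2, -0.4, 0.5, 0.0, -0.3, 0.2] * 2
xs = list(range(20))
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Candidate A: the 2-parameter line (known coefficients, no fitting step here).
line_preds = [2 * x + 1 for x in xs]
line_r2 = r_squared(ys, line_preds)
line_bits = 0.5 * 2 * math.log2(len(xs)) + gaussian_bits(
    [y - p for y, p in zip(ys, line_preds)]
)

# Candidate B: a lookup table. Its "model" is the stored values themselves,
# so it pays the full listing cost and leaves zero residuals.
table_preds = list(ys)
table_r2 = r_squared(ys, table_preds)
table_bits = listing_bits(ys) + gaussian_bits([0.0] * len(ys))

print(f"line : R^2={line_r2:.4f}  bits={line_bits:.1f}")
print(f"table: R^2={table_r2:.4f}  bits={table_bits:.1f}")
```

The table wins on R² because it reproduces every point, yet it costs more bits than listing the data, while the line scores slightly lower on R² and far lower in bits.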

Corroborated by extrapolation

A LAW should keep working on points it never saw. MDL-Select cross-checks its verdict with a held-out extrapolation test: a real law predicts the unseen tail; a memorization falls apart the moment it leaves the data it was fit to.
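
The idea of a held-out extrapolation check can be sketched as follows: fit on the first part of the x-range, then measure error on the unseen tail. The split point, the nearest-neighbour lookup standing in for a memorizer, and the hand-written noise pattern are all assumptions for this example. How MDL-Select chooses its held-out points is not described here.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def rmse(ys, preds):
    """Root-mean-square error between observations and predictions."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5


# y = 2x + 1 plus a fixed noise pattern (deterministic on purpose).
noise = [0.3, -0.5, 0.1, 0.4, -0.2, -0.4, 0.5, 0.0, -0.3, 0.2] * 2
xs = list(range(20))
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Hold out the tail: fit on x < 15, test on x >= 15.
split = 15
train_x, train_y = xs[:split], ys[:split]
tail_x, tail_y = xs[split:], ys[split:]

# Candidate A: a fitted line, extrapolated past the training range.
slope, intercept = fit_line(train_x, train_y)
line_tail = [slope * x + intercept for x in tail_x]


# Candidate B: a memorizer that answers with the nearest stored point.
def lookup(x):
    nearest = min(range(split), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


table_tail = [lookup(x) for x in tail_x]

line_err = rmse(tail_y, line_tail)
table_err = rmse(tail_y, table_tail)
print(f"held-out RMSE  line: {line_err:.2f}  lookup: {table_err:.2f}")
```

The line's tail error stays at about the noise level, while the lookup keeps returning the last value it stored and drifts further off with every step beyond the training range.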

Four honest verdicts

It returns LAW (compresses and extrapolates), FIT (useful locally but not a law), MEMORIZATION (only stores the data), or NO-LAW (nothing beats just listing the values). It commits to exactly one verdict, and says NO-LAW rather than dressing up noise.
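
One way to picture how the four verdicts relate is a small decision function. This is a hypothetical reading of the descriptions above, not MDL-Select's actual rule: the function name, its inputs, and in particular the way MEMORIZATION is separated from NO-LAW (by whether the candidate reproduces its training points) are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    LAW = "LAW"
    FIT = "FIT"
    MEMORIZATION = "MEMORIZATION"
    NO_LAW = "NO-LAW"


def verdict(total_bits, listing_bits, fits_training, extrapolates):
    """Hypothetical mapping from two checks to one of the four verdicts.

    total_bits    -- bits(model) + bits(residuals) for the candidate
    listing_bits  -- bits needed to just list the values
    fits_training -- True if the candidate reproduces the points it was fit to
    extrapolates  -- True if it also predicts the held-out points
    """
    compresses = total_bits < listing_bits
    if compresses and extrapolates:
        return Verdict.LAW           # compresses and extrapolates
    if compresses:
        return Verdict.FIT           # useful locally but not a law
    if fits_training:
        return Verdict.MEMORIZATION  # only stores the data
    return Verdict.NO_LAW            # nothing beats listing the values


# Illustrative inputs only; the bit counts are made up for the example.
print(verdict(146.0, 238.0, fits_training=True, extrapolates=True).value)
print(verdict(243.0, 238.0, fits_training=True, extrapolates=False).value)
```

The ordering of the checks reflects one design choice: compression is tested first, so a candidate that does not beat the plain listing can never be called a LAW, however well it tracks the points.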

One command surface

# score a candidate fit by description length, not R²
at1 mdl-select score data.csv --model "y = a*exp(-b*x) + c"
                                        # -> LAW · bits(model)+bits(residuals) < bits(data)
at1 mdl-select score noisy.csv --model "<order-9 polynomial>"
                                        # -> MEMORIZATION · pays for every point, fails held-out
at1 mdl-select score random.csv --model "..."
                                        # -> NO-LAW · nothing beats just listing the values

Honest scope

MDL-Select is a model-selection scorer: a scientific and sensor-regression instrument. It scores candidate fits you supply and tells you which is a law, which is a local fit, and which is memorization; it is not a full symbolic-discovery engine and does not invent the candidate formula for you. It is a licensed engine; enable it at /engines. Used the right way, it's a sharper, harder-to-fool alternative to R²/MSE for deciding whether a regression captured real structure. For the broader “store the rule, not the output” portfolio, see Research.