An Intellectual History  ·  Seven Centuries of Compression

Minimum Description Length

From Clausius's thermodynamic entropy and Boltzmann's S = k ln W to Rissanen's compression criterion — the deep genealogy of the principle that to understand is to compress.

I

The Thermodynamic Ancestry

1824 – 1902
1824

Carnot and the Engine of Irreversibility

Sadi Carnot

The conceptual ground for entropy was prepared by Sadi Carnot's 1824 memoir Réflexions sur la puissance motrice du feu. Carnot showed that no heat engine could exceed a universal efficiency limit determined solely by its operating temperatures — implying something fundamental about the direction of physical processes. 1

This was not yet entropy, but it demanded the concept. Why cannot all heat be converted to work? The answer — something is always irretrievably lost — would become the second law, and entropy would name that loss.

Primary Sources

Thurston translation (1897): nd.edu/~powers/ame.20231/carnot1897.pdf

Internet Archive scan: archive.org/details/reflectionsonmot00carn

1834

Clapeyron Rescues Carnot

Émile Clapeyron  ·  Journal de l’École Polytechnique

Carnot died of cholera in 1832 at age 36, his 1824 memoir nearly forgotten. Émile Clapeyron’s 1834 paper “Mémoire sur la puissance motrice de la chaleur” rescued it — reformulating Carnot’s arguments using calculus and the pressure-volume indicator diagram, giving them the mathematical form the scientific community could absorb.

Without Clapeyron, Clausius and Lord Kelvin would almost certainly never have encountered Carnot’s theorem. Clausius explicitly credited Clapeyron’s memoir as the route by which he came to the work. The chain Carnot → Clapeyron → Clausius → Boltzmann → Gibbs → Shannon is unbroken — and Clapeyron is its indispensable link. 50

1850 – 1865

Clausius Names Entropy

Rudolf Clausius

Rudolf Clausius is the architect of entropy as a named physical quantity. His 1850 paper displaced caloric theory, establishing that heat flows from hot to cold and cannot spontaneously reverse. 2 Over the following decade he developed the "equivalence value" of transformations — his precursor to entropy — and formalized the inequality:

Clausius Inequality (1856) ∮ δQ / T ≤ 0

In 1865 he named this quantity entropy (Greek: transformation content), deliberately echoing energy: "so closely related in physical significance that a certain similarity in their names appears to be appropriate." 3 His conclusion: "The energy of the universe is constant. The entropy of the universe tends to a maximum." This is the first clean articulation of irreversible directional change, a concept that reappears more than a century later as the cost of model description. 2

1867

Maxwell's Demon — Knowledge Meets Entropy

James Clerk Maxwell

In a private letter to Peter Guthrie Tait, Maxwell conceived an imaginary demon capable of sorting fast and slow molecules without apparent energy expenditure — seemingly violating the second law. 4 Unresolved for sixty years, the demon posed a profound question: what is the relationship between knowledge — the information the demon holds — and entropy? This question became central to MDL's intellectual foundations.

1872 – 1877

Boltzmann's Statistical Bridge

Ludwig Boltzmann

Boltzmann made the decisive leap from thermodynamic entropy to probability. His 1872 H-theorem established that a gas's approach to equilibrium is fundamentally probabilistic. He wrote: "the problems of the mechanical theory of heat are really problems in probability calculus." 5

The synthesis came in 1877. Boltzmann saw that entropy increases because uniform molecular distributions vastly outnumber non-uniform ones — a purely combinatorial insight. The formula engraved on his Vienna tombstone: 6

Boltzmann Entropy (1877) S = k ln W

The logarithmic form — probability mapped to additive scalars through the logarithm — is structurally identical to what Shannon would later call entropy. Shannon acknowledged this connection when naming his measure after Boltzmann's H-function. The thermodynamic ancestry of MDL is embedded in this equation. 7

1875 – 1902

Gibbs Generalizes: S = −k Σ p ln p

Josiah Willard Gibbs

Gibbs extended Boltzmann's work into a general statistical mechanics. His 1875–1878 memoir used entropy maximization as the condition of thermodynamic equilibrium. 8 His 1902 textbook, Elementary Principles in Statistical Mechanics, popularized the term "statistical mechanics" and defined entropy for an arbitrary probability distribution:

Gibbs Entropy (1902) S = −k Σᵢ pᵢ ln pᵢ

This is the direct algebraic precursor to Shannon's information entropy. The chain Clausius → Boltzmann → Gibbs → Shannon was not accidental: it was a continuous mathematization of the insight that disorder and missing information are the same thing. 9
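The step from Boltzmann's count W to the Gibbs sum can be checked numerically. The sketch below is an illustration, not something drawn from the sources: it assumes N particles spread over three cells with fixed occupation fractions, counts the arrangements W with a multinomial coefficient, and compares ln W per particle with −Σ p ln p (taking k = 1). The function names are hypothetical.

```python
import math

def log_multiplicity(counts):
    """ln W for a multinomial arrangement: W = N! / (n_1! n_2! ... n_m!)."""
    total = sum(counts)
    return math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in counts)

def gibbs_entropy_per_particle(probs):
    """-sum p ln p in nats (the Gibbs form with k set to 1)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]
for n_particles in (100, 10_000, 1_000_000):
    counts = [int(p * n_particles) for p in probs]
    per_particle = log_multiplicity(counts) / n_particles
    # The two printed numbers draw together as the particle count grows.
    print(n_particles, round(per_particle, 4),
          round(gibbs_entropy_per_particle(probs), 4))
```

The gap between the two columns shrinks with N, which is the sense in which the Gibbs expression is the large-system limit of Boltzmann's counting argument.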

II

Information Theory & Computation

1929 – 1961
1929

Szilard Quantifies the Demon's Knowledge

Leó Szilárd

Szilard resolved Maxwell's demon with a landmark 1929 paper. He showed the demon must acquire knowledge of each molecule's state, and that this information acquisition carries a thermodynamic cost precisely sufficient to compensate the entropy reduction achieved. 10 Szilard established that information is physical — knowing a system's state is energetically equivalent to reducing its entropy. Measurement and knowledge are subject to thermodynamic accounting. 11

1936

Turing's Universal Machine

Alan Turing

Turing's "On Computable Numbers" constructed a universal Turing machine capable, in principle, of carrying out any computable process. 12 The deeper implication: any computable object can be described by a finite program, and the length of the shortest such program measures how concisely the object can be described. Turing's machine thus supplied the reference frame for the intuition that complexity equals the length of the shortest description, which is the core of MDL and was formalized three decades later. 13

1948

Shannon's Mathematical Theory of Communication

Claude E. Shannon · Bell Labs

Shannon's 1948 paper is the single most consequential publication in MDL's genealogy. He defined the entropy of a discrete distribution: 14

Shannon Entropy (1948) H = −Σᵢ pᵢ log₂ pᵢ

…and proved this is the minimum average bits needed to encode messages — a fundamental lower bound on lossless compression. Shannon named his measure after Boltzmann's H-function. Warren Weaver noted it "roots back to Boltzmann's observation that entropy is related to 'missing information.'" 15

Shannon's source coding theorem is the direct ancestor of MDL: any statistical regularity permits compression below raw bit count, and the degree of compression measures how much regularity a model has captured. 16
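Shannon's bound can be seen at work on a toy source. The sketch below is an illustration under stated assumptions: the four-symbol distribution is invented, and the code is built with Huffman's construction, a later technique that the text above does not discuss. It computes H and the average codeword length so the two can be compared.

```python
import heapq
import math

def shannon_entropy(probs):
    """H = -sum p log2 p, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Codeword length in bits that Huffman coding assigns to each symbol."""
    # Heap items: (subtree probability, tie-breaker, symbol indices in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to the symbols beneath it
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_code_lengths(probs)
average_length = sum(p * l for p, l in zip(probs, lengths))
print(shannon_entropy(probs), average_length)  # prints 1.75 1.75
```

Because every probability here is a power of one half, the code meets the entropy exactly. For other distributions the average length of a symbol-by-symbol code lies between H and H + 1 bits.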

1961

Landauer's Erasure Principle

Rolf Landauer · IBM

Landauer showed that erasing one bit of information dissipates at least k_B T ln 2 joules of heat, where k_B is Boltzmann's constant and T is the absolute temperature of the environment. This closes the loop on Maxwell's demon, whose memory erasure pays the entropy price. 17

Landauer's Limit (1961) ΔQ ≥ k_B T ln 2  per bit erased

Description length and thermodynamic work therefore stand in a precise quantitative correspondence: each bit of a stored description carries a minimum heat cost, set by Landauer's limit, when it is erased. This is not metaphor; it is physical accounting. 18
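To give the limit a sense of scale, the arithmetic can be carried out directly. This is a minimal sketch; the room temperature of 300 K and the one-gigabyte example are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

BOLTZMANN_CONSTANT = 1.380649e-23  # J/K, exact in the 2019 SI definition

def landauer_limit_joules(temperature_kelvin, bits=1):
    """Minimum heat dissipated when erasing `bits` bits at the given temperature."""
    return bits * BOLTZMANN_CONSTANT * temperature_kelvin * math.log(2)

per_bit = landauer_limit_joules(300.0)
per_gigabyte = landauer_limit_joules(300.0, bits=8e9)
print(f"{per_bit:.3e} J per bit, {per_gigabyte:.3e} J per gigabyte")
```

At 300 K the bound is a little under 3 × 10⁻²¹ J per bit, which shows how small the unavoidable cost of erasure is.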

III

Algorithmic Information Theory

c. 1347 – 1969
c. 1287–1347

Occam's Razor — The Philosophical Ancestor

William of Ockham

"Plurality must never be posited without necessity." This principle — prefer the simpler of two equally explanatory hypotheses — has guided scientific model-building for seven centuries. 19 The modern MDL principle is, in the precise technical sense, the computational and information-theoretic formalization of Ockham's razor: shorter descriptions receive exponentially higher probability under the universal prior. 20

1960 – 1964

Solomonoff's Universal Prior

Ray Solomonoff

Solomonoff is the unheralded originator of the computational branch of MDL. His 1964 paper "A Formal Theory of Inductive Inference" constructed a rigorous framework: the a priori probability of any sequence is proportional to the sum of probabilities of all programs on a universal Turing machine that generate it. 21

Shorter programs receive exponentially higher probability — a mathematically perfect realization of Occam's razor with provably optimal convergence for any computable data-generating process. 22 Rissanen explicitly acknowledged Solomonoff's influence when developing MDL. 23
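In the notation common to later textbooks (an assumption here, since the account above gives the definition only in words), Solomonoff's prior for a binary string x can be written as a sum over every program p that makes a universal machine U print something beginning with x, each weighted by its length ℓ(p) in bits:

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}
```

A program one bit shorter contributes twice the weight, which is the sense in which shorter programs receive exponentially higher probability.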

1965

Kolmogorov Complexity K(x)

Andrei Kolmogorov

Kolmogorov defined the complexity K(x) of a string x as the length of the shortest program on a universal Turing machine that outputs x. 24 The Invariance Theorem guarantees that the choice of programming language affects K(x) by at most an additive constant that does not depend on x. This "algorithmic entropy" is the theoretically ideal, objective measure of information in any individual object.

Kolmogorov complexity is the theoretical ideal toward which MDL strives. Because it is uncomputable, practical MDL methods are computable approximations that trade theoretical perfection for tractability. 25
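One crude way to see what a computable approximation looks like is to use an off-the-shelf compressor. The sketch below is only an illustration: zlib stands in for the ideal shortest program, so its output length is at best a loose upper bound on K(x) (up to the fixed cost of the decompressor), and the two test strings are invented.

```python
import os
import zlib

def compressed_length_bits(data: bytes) -> int:
    """Length in bits of a zlib encoding of `data`: a computable stand-in for K(x)."""
    return 8 * len(zlib.compress(data, 9))

regular = b"01" * 5000             # 10,000 bytes with an obvious repeating pattern
random_like = os.urandom(10_000)   # 10,000 bytes with no pattern zlib can exploit

print(compressed_length_bits(regular), compressed_length_bits(random_like))
```

The patterned string shrinks to a small fraction of its raw size, while the random bytes do not shrink at all. That contrast is the practical shadow of the distinction K(x) draws between regular and incompressible objects.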

1966 – 1969

Chaitin's Prefix-Free Complexity

Gregory Chaitin

Chaitin independently discovered algorithmic complexity in papers published in 1966 and 1969, and later developed the prefix-free (self-delimiting) variant, which connects more directly to probability theory through the Kraft inequality, the foundation for NML and the universal codes used in practical MDL. 26 Chaitin also demonstrated deep links between algorithmic randomness and Gödel incompleteness. 27
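The Kraft inequality states that codeword lengths l₁, l₂, … can belong to a prefix-free binary code only if Σ 2^(−lᵢ) ≤ 1, which is what lets code lengths be read as probabilities. A minimal sketch with invented codeword lengths and hypothetical function names:

```python
from fractions import Fraction

def kraft_sum(code_lengths):
    """Sum of 2^(-l) over the codeword lengths, computed exactly."""
    return sum(Fraction(1, 2 ** l) for l in code_lengths)

def implied_probabilities(code_lengths):
    """Probability 2^(-l) that each codeword length implicitly assigns to its symbol."""
    return [Fraction(1, 2 ** l) for l in code_lengths]

print(kraft_sum([1, 2, 3, 3]))  # lengths of the prefix-free code 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))     # exceeds 1, so no prefix-free code has these lengths
```

When the sum equals 1, as in the first case, the implied probabilities form a proper distribution. This is the bridge between "short code" and "high probability" that MDL relies on.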

IV

Statistical MDL Is Born

1968 – 1986
1968

Wallace & Boulton — Minimum Message Length

Monash University · The Computer Journal

Chris Wallace and David Boulton published "An Information Measure for Classification," introducing Minimum Message Length (MML): the best model minimizes the combined bit-length of the hypothesis and the data encoded under that hypothesis. 28 MML is a parallel invention to Rissanen's later MDL, differing primarily in its explicit Bayesian framing and treatment of continuous parameters. 29

1973 – 1974

Akaike Information Criterion

Hirotugu Akaike

Akaike derived AIC from the relationship between maximum likelihood estimation and Kullback-Leibler divergence, penalizing complexity by the number of free parameters. 30 AIC, BIC (Schwarz, 1978), and MDL share philosophical ancestry in Occam's razor and Shannon's theory — their near-simultaneous emergence reflects the same underlying insight pressing through different disciplines. 31

1978

Rissanen Coins "MDL"

Jorma Rissanen · Automatica

The term Minimum Description Length was coined by Jorma Rissanen in "Modeling by the Shortest Data Description," published in Automatica. The central thesis: all statistical learning is about finding regularities in data, and the best hypothesis is the one that compresses the data most effectively. 32

His original two-part code — describe the model, then encode data given the model, minimize total length — is closely related to Schwarz's BIC (also 1978), though Rissanen's motivation came entirely from information theory and universal coding, not Bayesian inference. 33

1983 – 1986

Stochastic Complexity

Jorma Rissanen · Annals of Statistics

Rissanen extended MDL with stochastic complexity — the shortest description length achievable by any universal code for a parametric model class, moving beyond the two-part code. 34 He derived the celebrated parametric penalty:

MDL Parametric Penalty (Rissanen 1986) L(model) ≥ (k/2) log n

where k is the number of parameters and n the sample size. This connects MDL to the Fisher information matrix and establishes a precise relationship between model complexity and sample-size requirements. 35
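A small worked case shows how the penalty decides between models. The sketch below is an illustration under stated assumptions, not Rissanen's own procedure: it compares a parameter-free fair-coin model with a one-parameter Bernoulli model on invented binary sequences, charging the fitted model (k/2) log₂ n bits for its parameter.

```python
import math

def data_bits(ones, n, theta):
    """Code length in bits of a binary sequence with `ones` ones under Bernoulli(theta)."""
    zeros = n - ones
    bits = 0.0
    if ones:
        bits -= ones * math.log2(theta)
    if zeros:
        bits -= zeros * math.log2(1 - theta)
    return bits

def two_part_lengths(sequence):
    """Total description length (bits) under a fair coin and a fitted Bernoulli model."""
    n = len(sequence)
    ones = sum(sequence)
    fair = data_bits(ones, n, 0.5)                               # k = 0 parameters
    fitted = data_bits(ones, n, ones / n) + 0.5 * math.log2(n)   # k = 1 parameter
    return {"fair_coin": fair, "bernoulli": fitted}

balanced = [0, 1] * 50          # 50 ones in 100 symbols
skewed = [1] * 90 + [0] * 10    # 90 ones in 100 symbols
for name, seq in (("balanced", balanced), ("skewed", skewed)):
    lengths = two_part_lengths(seq)
    print(name, min(lengths, key=lengths.get), lengths)
```

On the balanced data the extra parameter buys nothing and its cost tips the choice to the fair coin. On the skewed data the saving in data bits far outweighs the penalty.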

V

Maturation & Unification

1991 – 2007
1993

MDL Enters Neural Networks

Hinton & Van Camp · COLT '93

Geoffrey Hinton and Drew van Camp brought MDL into neural network theory in "Keeping Neural Networks Simple by Minimizing the Description Length of the Weights." 36 Their argument: networks generalize well when the information in the weights is much less than the information in the training outputs. They proposed controlling weight information by adding Gaussian noise. This was an early direct application of MDL to neural network training, prefiguring Bayesian deep learning and the information bottleneck framework. 37

1996

Normalized Maximum Likelihood

Jorma Rissanen

Rissanen introduced Normalized Maximum Likelihood (NML), which he considered the purest form of MDL. NML assigns each data sequence the highest probability that any model in the class gives it, then normalizes by the sum of these maximized probabilities over all possible sequences. 38 The stochastic complexity under NML has a clean minimax interpretation: it minimizes the worst-case regret, over all data sequences, relative to the best-fitting model in the class. 39
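For a model class small enough to enumerate, the NML code length can be computed by brute force. The sketch below assumes the Bernoulli class on binary sequences, chosen here purely for illustration, and uses hypothetical function names. Direct summation like this suits short sequences only.

```python
import math

def max_likelihood(ones, n):
    """Probability of one particular sequence with `ones` ones under its best-fitting Bernoulli."""
    if ones in (0, n):
        return 1.0
    theta = ones / n
    return theta ** ones * (1 - theta) ** (n - ones)

def nml_normalizer(n):
    """Sum of maximized likelihoods over all 2^n binary sequences, grouped by count of ones."""
    return sum(math.comb(n, k) * max_likelihood(k, n) for k in range(n + 1))

def stochastic_complexity_bits(sequence):
    """NML code length: -log2 of (maximized likelihood / normalizer)."""
    n, ones = len(sequence), sum(sequence)
    return -math.log2(max_likelihood(ones, n)) + math.log2(nml_normalizer(n))

print(stochastic_complexity_bits([1] * 90 + [0] * 10))
```

The logarithm of the normalizer is the price of the model class itself: the same number of bits is added to every sequence of a given length, however well or badly the class fits it.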

1998

The Canonical MDL Review

Barron, Rissanen & Yu · IEEE Trans. Inf. Theory

"The Minimum Description Length Principle in Coding and Modeling" synthesized two decades of research, showing that NML, mixture coding, and predictive coding all achieve stochastic complexity to within asymptotically vanishing terms. 40 This remains the canonical reference for MDL's relationship to statistical modeling theory. 41

1999

Tishby's Information Bottleneck

Tishby, Pereira & Bialek

Tishby et al. framed representation learning as a rate-distortion problem: find the most compressed representation of X that maximally preserves information about target Y. 42 A close cousin of MDL — both operationalize compression as a criterion for structure extraction — the Information Bottleneck gained renewed prominence in 2017 when Tishby argued that deep networks implicitly solve an IB objective layer by layer.
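Written out, the trade-off is usually expressed as a Lagrangian. The symbols below are assumed notation rather than a quotation from the paper: T is the compressed representation, I(·;·) is mutual information, and β ≥ 0 sets how much predictive information is worth per bit of compression.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

Small β favors aggressive compression of X; large β favors retaining whatever helps predict Y.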

2007

Grünwald's Definitive Treatise

Peter Grünwald · MIT Press

Grünwald's 736-page MIT Press textbook The Minimum Description Length Principle became the definitive reference, covering universal coding, stochastic complexity, NML, Bayesian model selection, and MDL for regression, density estimation, and non-parametric problems. 43 MDL was established as a mature, unified theory of inductive inference, with several previously open theoretical questions about consistency and convergence resolved. 44

VI

The Deep Learning Era

2013 – Present
2013 – 2018

Compression Explains Generalization

Arora, Ge, Neyshabur, Zhang · NeurIPS

Arora et al.'s 2018 ICML paper showed that compressing trained network weights leads to provable generalization bounds: the networks that generalize best are the ones most compactly described. 45 Blier and Ollivier's 2018 NeurIPS paper demonstrated that MDL-inspired prequential coding yields much shorter description lengths than standard variational inference, an empirical vindication of the algorithmic information theory tradition. 46
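Prequential coding encodes each observation using a model fitted only to the observations before it, so no separate parameter description is ever sent. The sketch below is a toy version under stated assumptions: it uses a Bernoulli predictor with Laplace smoothing on an invented binary sequence rather than a neural network, and the function name is hypothetical.

```python
import math

def prequential_code_length_bits(sequence, alpha=1.0):
    """Bits needed to encode a binary sequence one symbol at a time.

    Each symbol is coded with the predictive probability
    (count of that symbol so far + alpha) / (symbols seen so far + 2 * alpha);
    alpha = 1 is Laplace's rule of succession.
    """
    counts = [0, 0]
    total_bits = 0.0
    for symbol in sequence:
        seen = counts[0] + counts[1]
        predictive = (counts[symbol] + alpha) / (seen + 2 * alpha)
        total_bits -= math.log2(predictive)
        counts[symbol] += 1  # update only after the symbol has been paid for
    return total_bits

skewed = [1] * 90 + [0] * 10
print(prequential_code_length_bits(skewed), len(skewed))
```

The total comes out well below the 100 bits of the raw sequence. The cost of learning the bias is paid implicitly through the poor early predictions.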

2016 – 2017

Deep Variational Information Bottleneck

Alemi, Fischer, Dillon, Murphy

Alemi et al. introduced the Deep VIB, parameterizing the information bottleneck objective via a neural network and the reparameterization trick. 47 Models trained with the VIB objective outperformed standard regularization on generalization and adversarial robustness. Deep VIB brought the MDL-compression tradition and the IB-representation tradition into explicit contact within modern deep learning architectures.

2020 – 2026

MDL Revisited & Active Frontiers

Grünwald & Roos · Ongoing

Grünwald and Roos published "Minimum Description Length Revisited" in 2020, surveying three decades of development and identifying frontier directions: MDL for discrete data, finite-sample NML behavior, and PAC-Bayes generalization bounds. 48 Grünwald's 2025 JASA paper "Learning with the Minimum Description Length Principle" catalogued new developments including connections to deep learning theory, conformal prediction, and safe anytime-valid inference. 49 The central question — identifying models that capture genuine structure rather than overfitting noise — remains as urgent as it was in 1978.

Timeline at a Glance

| Year | Event | Figure | MDL Significance | Part |
| --- | --- | --- | --- | --- |
| c. 1347 | Occam's Razor | William of Ockham | Philosophical precursor: parsimony principle | III |
| 1824 | Carnot efficiency limit | Sadi Carnot | First articulation of thermodynamic irreversibility | I |
| 1834 | Clapeyron rescues Carnot | Émile Clapeyron | Reformulates Carnot via calculus; Clausius reads it here | I |
| 1850–1865 | Entropy named & formalized | Rudolf Clausius | Root: S → max; irreversibility quantified | I |
| 1867 | Maxwell's Demon conceived | J. C. Maxwell | Knowledge ↔ entropy connection posited | I |
| 1872–1877 | S = k ln W | Ludwig Boltzmann | Probability–entropy bridge; logarithm emerges | I |
| 1875–1902 | S = −k Σ p ln p | Josiah W. Gibbs | Shannon's algebraic template established | I |
| 1929 | Szilard Engine | Leó Szilárd | Information is physical; measurement costs entropy | II |
| 1936 | Universal Turing Machine | Alan Turing | Complexity = length of shortest program | II |
| 1948 | H = −Σ p log p | Claude Shannon | Compression bound; entropy named for Boltzmann | II |
| 1961 | k_B T ln 2 per bit | Rolf Landauer | Bits and thermodynamic work are equivalent | II |
| 1960–1964 | Universal inductive prior | Ray Solomonoff | Occam formalized computationally | III |
| 1965 | K(x): Kolmogorov complexity | Andrei Kolmogorov | Ideal (uncomputable) MDL; Invariance Theorem | III |
| 1966–1969 | Prefix-free complexity | Gregory Chaitin | Probability–complexity bridge; NML foundation | III |
| 1968 | MML coined | Wallace & Boulton | Bayesian two-part code; model selection | IV |
| 1973–1974 | Akaike Information Criterion | Hirotugu Akaike | KL divergence bridge; adjacent parsimony | IV |
| 1978 | MDL coined | Jorma Rissanen | Compression = model selection, formally named | IV |
| 1983–1986 | Stochastic complexity; (k/2) log n | Jorma Rissanen | Fisher information connection; MDL deepened | IV |
| 1993 | MDL for neural nets | Hinton & Van Camp | First deep learning MDL application | V |
| 1996 | NML: refined MDL | Jorma Rissanen | Minimax regret; purest MDL form | V |
| 1998 | Canonical MDL review | Barron, Rissanen, Yu | Unified with coding theory | V |
| 1999 | Information Bottleneck | Tishby et al. | Compression as representation learning | V |
| 2007 | Grünwald's treatise | Peter Grünwald | Full synthesis; 736-page reference | V |
| 2013–2018 | Compression → generalization bounds | Arora et al. | MDL explains deep learning generalization | VI |
| 2016–2017 | Deep Variational IB | Alemi et al. | MDL and IB unified in deep nets | VI |
| 2020–2026 | MDL revisited; PAC-Bayes | Grünwald & Roos | Active frontier in modern ML | VI |

Sources

1
Carnot, S. (1824). Réflexions sur la puissance motrice du feu. Thurston translation (1897).
nd.edu/~powers/ame.20231/carnot1897.pdf
archive.org/details/reflectionsonmot00carn
50
Clapeyron, É. (1834). “Mémoire sur la puissance motrice de la chaleur.” Journal de l’École Polytechnique, 14(23), 153–190. English translation by E. Mendoza in Reflections on the Motive Power of Fire (Dover, 1960).
biodiversitylibrary.org/item/20044
archive.org/details/reflectionsonmot00carn (Dover volume includes Clapeyron translation)
2
EBSCO Research Starters. "Clausius and the Second Law of Thermodynamics."
ebsco.com/research-starters/history/clausius-and-second-law-thermodynamics
3
Kronecker Wallis. "Rudolf Clausius and the Second Law of Thermodynamics Explained." (2026)
kroneckerwallis.com/rudolf-clausius-and-the-second-law-of-thermodynamics-explained/
4
Stanford Encyclopedia of Philosophy. "Information Processing and Thermodynamic Entropy." (2009)
plato.stanford.edu/entries/information-entropy/
5
Uffink, J. "Boltzmann's Work in Statistical Physics." Stanford Encyclopedia of Philosophy (2012)
plato.stanford.edu/archives/fall2012/entries/statphys-Boltzmann/
6
Kathy Loves Physics. "Boltzmann's Entropy Equation: A History from Clausius to Planck." (2025)
kathylovesphysics.com/boltzmanns-entropy-equation/
7
SCIRP. "On Clausius, Boltzmann and Shannon Notions of Entropy." (2016)
scirp.org/journal/paperinformation?paperid=63224
8
PMC. "The Gibbs Paradox: Early History and Solutions." (2018)
pmc.ncbi.nlm.nih.gov/articles/PMC7845772/
9
NAUN. "Boltzmann Entropy of Thermodynamics versus Shannon Entropy of Information Theory." (2014)
naun.org/main/NAUN/mechanics/2014/a182003-086.pdf
10
PubMed. "Variations on a demonic theme: Szilard's other engines." (2020)
pubmed.ncbi.nlm.nih.gov/33003907/
11
Emergent Mind. "Szilard's Engine: Thermodynamics & Information." (2025)
emergentmind.com/topics/szilard-s-engine
12
History of Information. "Alan Turing Publishes 'On Computable Numbers'"
historyofinformation.com/detail.php?id=619
13
Stanford Encyclopedia of Philosophy. "Turing Machines." (2018)
plato.stanford.edu/entries/turing-machine/
14
Shannon, C. E. (1948). "A Mathematical Theory of Communication." Bell System Technical Journal, 27, 379–423.
people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
15
Quanta Magazine. "How Claude Shannon's Concept of Entropy Quantifies Information." (2022)
quantamagazine.org/how-claude-shannons-concept-of-entropy-quantifies-information-20220906/
16
Stanford CS. "1948: Claude Shannon Creates Modern Information Theory."
cs.stanford.edu/…/information-theory/history.html
17
PMC. "The Landauer Principle: Re-Formulation of the Second Law of Thermodynamics." (2019)
pmc.ncbi.nlm.nih.gov/articles/PMC7514250/
18
Plenio & Vitelli. "The physics of forgetting: Landauer's erasure principle and information theory." (2001)
boun.edu.tr/…/pleniovitelli2001_landauers_erasure_principle.pdf
19
Internet Encyclopedia of Philosophy. "Ockham (Occam), William of."
iep.utm.edu/ockham/
20
Britannica. "Occam's Razor: Origin, Examples & Facts."
britannica.com/topic/Occams-razor
21
Solomonoff, R. J. (1964). "A Formal Theory of Inductive Inference. Part I." Information and Control, 7(1), 1–22.
sciencedirect.com/science/article/pii/S0019995864902232
22
Scholarpedia. "Algorithmic Probability." (2007)
scholarpedia.org/article/Algorithmic_probability
23
Roos, T. "Minimum Description Length Principle." University of Helsinki CS. (2016)
cs.helsinki.fi/u/ttonteri/pub/roosmdlencyc2016.pdf
24
Scholarpedia. "Algorithmic Information Theory." (2007)
scholarpedia.org/article/Algorithmic_information_theory
25
Grünwald, P. & Roos, T. "Minimum Description Length Revisited." Int. J. Mathematics for Industry, 2020.
geoinfotheory.org/…MDL_revisited.pdf
26
Li, M. & Vitányi, P. "Kolmogorov Complexity and its Applications." CWI, 2011.
ir.cwi.nl/pub/2011/2011D.pdf
27
TalkOrigins. "Algorithmic Information Theory (Chaitin, Solomonoff & Kolmogorov)." (2005)
talkorigins.org/faqs/information/algorithmic.html
28
Allison, L. "Minimum Message Length (MML)." Monash University. (2003)
allisons.org/ll/MML/
29
Bayesian Intelligence. "Minimum Message Length: A Computational Bayesianism." (2012)
bayesian-intelligence.com/bwb/2012-04/…
30
Cavanaugh & Neath. "The Akaike Information Criterion: Background, Derivation, Properties." (2019)
iowabiostat.github.io/…/Cavanaugh_Neath_2019.pdf
31
Machine Learning Mastery. "Probabilistic Model Selection with AIC, BIC, and MDL." (2020)
machinelearningmastery.com/probabilistic-model-selection-measures/
32
Grünwald, P. "A Tutorial Introduction to the Minimum Description Length Principle." CWI.
homepages.cwi.nl/~paulv/course-kc/mdlintro.pdf
34
JSTOR. Rissanen, J. "Stochastic Complexity and Modeling." Annals of Statistics, 1986.
jstor.org/stable/3035559
35
CWI. "The Minimum Description Length Principle." Grünwald (book overview).
ir.cwi.nl/pub/11997/11997D.pdf
36
Hinton, G. & Van Camp, D. (1993). "Keeping Neural Networks Simple by Minimizing the Description Length of the Weights." COLT '93.
cs.toronto.edu/~hinton/csc2535/readings/colt93.pdf
37
Semantic Scholar. Hinton & Van Camp (1993), full citation record.
semanticscholar.org/paper/Hinton-Camp/25c9f33aceac6dcff357727cbe82ef6c42d4be39
38
Navarro, D. "Model Selection by Normalized Maximum Likelihood." (2006)
papers.djnavarro.net/2006_nml.pdf
39
ILLC Amsterdam. "Minimum Description Length Model Selection."
eprints.illc.uva.nl/id/document/11887
40
Barron, A., Rissanen, J. & Yu, B. (1998). "The Minimum Description Length Principle in Coding and Modeling." IEEE Trans. Information Theory.
stat.yale.edu/~arb4/…CodingAndModelingIEEEIT.pdf
42
Tishby, N., Pereira, F. & Bialek, W. (1999/2000). "The Information Bottleneck Method." arXiv:physics/0004057.
arxiv.org/abs/physics/0004057
43
Grünwald, P. D. (2007). The Minimum Description Length Principle. MIT Press. 736 pp.
direct.mit.edu/books/monograph/3813/The-Minimum-Description-Length-Principle
44
ThriftBooks. "The Minimum Description Length Principle." Peter Grünwald.
thriftbooks.com/w/the-minimum-description-length-principle…
45
Arora, S., Ge, R., Neyshabur, B. & Zhang, Y. (2018). "Stronger Generalization Bounds for Deep Nets via a Compression Approach." ICML 2018.
ias.edu/…/AroraGeNeZh2018.pdf
46
Blier, L. & Ollivier, Y. (2018). "The Description Length of Deep Learning Models." NeurIPS 2018.
papers.neurips.cc/paper/7490-the-description-length-of-deep-learning-models.pdf
47
Alemi, A., Fischer, I., Dillon, J. & Murphy, K. (2016). "Deep Variational Information Bottleneck." arXiv:1612.00410.
arxiv.org/abs/1612.00410
48
Grünwald, P. & Roos, T. "Minimum Description Length Revisited." Int. J. Mathematics for Industry, 2020.
geoinfotheory.org/…MDL_revisited.pdf
49
Grünwald, P. (2025). "Learning with the Minimum Description Length Principle." Journal of the American Statistical Association.
tandfonline.com/doi/full/10.1080/01621459.2025.2583392