
Algorithmic Information Theory

Kolmogorov Complexity

The length of the shortest program that produces a string is its true information content. And you can never compute it.

Consider these two 100-character strings:

String A

01010101010101010101010101010101010101010101010101...

100 characters

String B

k8mP2xL9qN4jR7vT3wY6bH1dF5gK0sC8mU2pX9nJ4rV7tW3yB...

100 characters

Both strings are 100 characters long. But they are fundamentally different in their information content.

String A

print("01" * 50)

Program: ~15 characters

String B

print("k8mP2xL9qN4...")

Program: ~105 characters

The Kolmogorov complexity K(x) of a string x is the length of the shortest program that produces it.

K("0101...") ~ 15 characters; K("k8mP...") ~ 105 characters. (Complexity is conventionally measured in bits of program text, but any fixed unit works: changing units only rescales every length by the same constant.)

PART I

Patterns vs Randomness

Some strings contain patterns that allow compression. Others are incompressible: their shortest description is the string itself.

Example: Compressing a Repetitive String

Raw String

abababab...ab ("ab" repeated 50 times)

Length: 100 characters

Bits needed: ~800 bits

Shortest Program

repeat("ab", 50)

Estimated complexity: 16 characters

Compression: ~84%
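
You can make this concrete with any off-the-shelf compressor. A minimal sketch in Python, with zlib standing in for "shortest program": a real compressor only ever gives an upper bound on K(x), never the true minimum.

import zlib

def upper_bound_bytes(s: str) -> int:
    # Compressed size in bytes: an upper bound on K(s), never K(s) itself.
    return len(zlib.compress(s.encode("utf-8"), 9))

patterned = "ab" * 50                                   # 100 chars, pure pattern
random_ish = "k8mP2xL9qN4jR7vT3wY6bH1dF5gK0sC8mU2pX9n"  # 40 chars, no obvious pattern

print(len(patterned), upper_bound_bytes(patterned))     # shrinks to a handful of bytes
print(len(random_ish), upper_bound_bytes(random_ish))   # barely shrinks, may even grow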

Low Complexity

  • Repeating patterns
  • Mathematical sequences
  • Structured data
  • Natural language text

Short programs generate long outputs

High Complexity

  • Random noise
  • Encrypted data
  • Compressed files
  • Most strings!

No shorter description than the string itself

Key insight: Most strings of length n have Kolmogorov complexity close to n. Random strings are the norm; compressible strings are rare.
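The reason is a simple counting argument, sketched below: programs are strings too, and there are not enough short ones to go around.

# Counting argument: there are 2**n binary strings of length n, but
# fewer than 2**(n - c) programs shorter than n - c bits. So at most
# a 2**-c fraction of strings can be compressed by c or more bits.
n, c = 100, 10
fraction_compressible = 2 ** (n - c) / 2 ** n
print(fraction_compressible)   # 2**-10: fewer than 0.1% shrink by 10+ bits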

PART II

Berry's Paradox

Here is where things get strange. Consider this phrase:

"The smallest positive integer not
definable in under sixty letters"

This phrase has 57 letters.

If such a number exists, we just defined it in under 60 letters - but it was supposed to be undefinable in under 60 letters!
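You can verify the letter count yourself with a one-liner (counting letters only, not spaces):

phrase = "The smallest positive integer not definable in under sixty letters"
print(sum(ch.isalpha() for ch in phrase))   # 57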

The Setup

Consider all positive integers. Some can be described in English with very few letters: "one" (3 letters), "twelve" (6 letters), "one million" (10 letters). Since there are only finitely many phrases of under sixty letters, but infinitely many integers, some integer must be the smallest one with no such short description. That is exactly the integer the phrase above names.

The connection to Kolmogorov complexity: replace "definable in under N letters" with "produced by a program of under N bits."

"The first string with Kolmogorov complexity greater than 1000 bits" - but this description itself is much shorter than 1000 bits!

PART III

Programs That Generate Strings

The same string can be generated by many programs. Kolmogorov complexity is defined by the shortest one.

Compare Program Lengths

0000...0000 (100 zeros)

Literal encoding (spell out all 100 zeros inside print()): ~109 characters
Direct program: 16 characters
Shortest program: 14 characters

Direct

print("0" * 100)

Shortest

print("0"*100)

Patterned strings compress well - short programs can generate long outputs.
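To see the "many programs, one string" point concretely, here is a quick sketch: three different Python programs, one identical output. K is pinned to whichever program is shortest in some fixed reference language, not to any particular one.

import itertools

# Three different programs, one output; K("0"*100) is the length of
# the shortest of all programs producing it.
a = "0" * 100
b = "".join("0" for _ in range(100))
c = "".join(itertools.repeat("0", 100))
assert a == b == c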

Pi

1000 digits ~ 16 bits (beyond a fixed formula)

Computable by a short, fixed formula; only log2(1000) ~ 10 bits are needed to say how many digits to print.

Text

1000 chars ~ 600 bits

Language has redundancy: English carries roughly 0.6-1 bit of information per character.

Random

1000 chars ~ 8000 bits

Incompressible: 8 bits per character, with no shorter description.
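
The pi entry is worth seeing in code. Below is a sketch of an unbounded spigot algorithm (after Gibbons, 2006): the program is a fixed few hundred characters, yet it streams as many digits of pi as you like, which is why K(first n digits of pi) barely grows with n.

def pi_digits(n):
    # Unbounded spigot for pi (a Gibbons-style formulation):
    # constant-size code, arbitrarily many correct digits.
    q, r, t, j, out = 1, 180, 60, 2, []
    while len(out) < n:
        u = 3 * (3 * j + 1) * (3 * j + 2)
        y = (q * (27 * j - 12) + 5 * r) // (5 * t)
        out.append(y)
        q, r, t, j = 10 * q * j * (2 * j - 1), 10 * u * (q * (5 * j - 2) + r - y * t), t * u, j + 1
    return out

digits = pi_digits(20)
print(str(digits[0]) + "." + "".join(map(str, digits[1:])))   # 3.1415926535897932384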

PART IV

Why K(x) Is Uncomputable

Here is the deep result: there is no algorithm that computes Kolmogorov complexity. The proof uses self-reference, like Berry's paradox.

The Uncomputability Proof

Step 1: Assume K(x) is computable

Suppose we have a program COMPLEXITY(x) that takes any string x and returns its Kolmogorov complexity K(x).

Step 2: Build a Berry program

Using COMPLEXITY as a subroutine, write a program FIRST_COMPLEX(N) that enumerates all strings in order and returns the first one whose complexity exceeds N. (Such a string exists, since most strings of any given length are incompressible, so the search halts.)

Step 3: Derive the contradiction

FIRST_COMPLEX(N) outputs a string of complexity greater than N, yet its own length is only a constant plus the ~log2(N) bits needed to write down N - far less than N for large N. By definition, no program shorter than N can output a string of complexity greater than N. The assumption fails: COMPLEXITY cannot exist.
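
Here is the same argument as a code sketch. COMPLEXITY is hypothetical by construction; the program wrapped around it is short, which is exactly the problem.

from itertools import count, product

def COMPLEXITY(x: str) -> int:
    # Hypothetical oracle returning the exact K(x). The proof shows
    # no such function can exist.
    raise NotImplementedError("uncomputable")

def first_complex(n: int) -> str:
    # Enumerate binary strings in length-then-lexicographic order;
    # return the first with K > n. If COMPLEXITY existed, this would
    # halt, because most strings qualify.
    for length in count(1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if COMPLEXITY(s) > n:
                return s

# first_complex(1_000_000) is itself a short program: a few hundred
# characters plus ~20 bits to write down n. It would output a string
# with K > 1,000,000 - contradicting the definition of K.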

What This Means

1. You can never know the true complexity of a string.
2. You can find upper bounds (by finding short programs) but never prove optimality.
3. The halting problem is embedded in complexity measurement.
4. "Is this string random?" is undecidable in general.

PART V

Solomonoff Induction & Occam's Razor

Despite being uncomputable, Kolmogorov complexity provides the foundation for optimal prediction.

Ray Solomonoff showed that if you weight hypotheses by 2^(-K), you get a universal prior that is optimal for prediction in a precise mathematical sense.

Complexity and Prior Probability

Prior Probability by Complexity

Simple pattern          K = 10 bits     2^(-10)  ~ 1e-3
Medium complexity       K = 25 bits     2^(-25)  ~ 3e-8
Complex explanation     K = 50 bits     2^(-50)  ~ 9e-16
Overfitted              K = 100 bits    2^(-100) ~ 8e-31
Random noise            K = 200 bits    2^(-200) ~ 6e-61

(Unnormalized weights: what matters is the ratio. Every extra bit of complexity halves the prior.)

Solomonoff's Universal Prior: the prior probability of a hypothesis decreases exponentially with its Kolmogorov complexity.

P(H) ∝ 2^(-K(H))
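
The table above in code, with the weights normalized. The K values are the illustrative ones from the table, not real (uncomputable) complexities.

hypotheses = {
    "simple pattern": 10,
    "medium complexity": 25,
    "complex explanation": 50,
    "overfitted": 100,
    "random noise": 200,
}
weights = {h: 2.0 ** -k for h, k in hypotheses.items()}   # P(H) proportional to 2^-K(H)
total = sum(weights.values())
for h, w in weights.items():
    print(f"{h:20s} {w / total:.2e}")
# Each extra bit of complexity halves the prior; the simplest hypothesis
# dominates the next one by a factor of 2^15.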

The Deep Connection

This is why Occam's Razor works! Simpler hypotheses:

  • Have lower Kolmogorov complexity
  • Deserve higher prior probability
  • Will dominate given sufficient evidence
  • Make predictions with less overfitting
PART VI

Implications

Defining Randomness

A string is random if its shortest description is roughly as long as the string itself. This gives a rigorous notion of randomness for an individual string, something probability theory alone cannot supply.

Compression Limits

Most strings cannot be compressed. ZIP files work because real data has patterns. Random data (like encrypted files) cannot be made smaller.

Machine Learning

Minimum Description Length (MDL) approximates Kolmogorov complexity for model selection. Prefer models that compress the data best.
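
A toy version of the MDL idea, with zlib standing in for ideal compression; the generating rule and the scoring are illustrative assumptions, not a real MDL library.

import zlib

def two_part_cost(model_code: str, residuals: bytes) -> int:
    # MDL two-part cost in bytes: describe the model, then the data
    # given the model.
    return len(model_code.encode()) + len(zlib.compress(residuals, 9))

data = bytes((i * 7) % 256 for i in range(1000))   # patterned data

# Model A: no model at all; pay for the raw bytes.
cost_raw = two_part_cost("", data)

# Model B: state the rule; the residuals (XOR of data and prediction)
# are all zero and compress to almost nothing.
pred = bytes((i * 7) % 256 for i in range(1000))
cost_rule = two_part_cost("(i*7)%256", bytes(a ^ b for a, b in zip(data, pred)))

print(cost_raw, cost_rule)   # the rule wins: prefer the model that compresses best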

Fundamental Limits of Knowledge

Some questions about structure and randomness are forever undecidable. This is not a practical limitation but a mathematical impossibility.

The Paradox of Description

To describe the simplest description of something, you often need a description longer than the thing itself. Self-reference creates undecidability at the foundations of computation and information.

Information has an intrinsic complexity.
And we can never fully measure it.


References: A. N. Kolmogorov (1965), "Three Approaches to the Quantitative Definition of Information"; R. J. Solomonoff (1964), "A Formal Theory of Inductive Inference"; G. J. Chaitin (1966), "On the Length of Programs for Computing Finite Binary Sequences".