A Statistical Paradox
How Wheat Prices Help Predict Baseball Averages
In 1956, Charles Stein proved something that still confuses statisticians: unrelated data improves estimates.
Here's an absurd claim: If you want to estimate a baseball player's true batting average, you should look at wheat prices.
Not metaphorically. Not as a sanity check. Actually use them in your calculation. Your estimates, taken together, will be more accurate.
This isn't a trick. It's called Stein's Paradox, and it broke statistics in 1956. The proof is airtight. The math is correct. And yet it feels deeply, fundamentally wrong.
Let me show you why it works.
But first, we need to understand what "better estimate" even means.
The Estimation Game
Before we get to the paradox, let's play a game. I'll think of a number. You'll see a noisy observation of it. Your job is to guess the true value.
I'll generate a random true value from a normal distribution. You'll see it plus some noise. Try to guess the original.
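If you'd rather play along offline, here's a minimal sketch of that setup in Python (my own assumptions: a standard normal true value and unit noise, not necessarily the widget's exact settings):

```python
import numpy as np

# Minimal sketch of the game: hidden truth from a standard normal, unit noise on top.
rng = np.random.default_rng()
truth = rng.normal(loc=0.0, scale=1.0)        # the hidden true value
observation = truth + rng.normal(scale=1.0)   # the noisy value you get to see
print(f"You observe {observation:+.2f}. Your guess? (The truth was {truth:+.2f}.)")
```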
When you have one thing to estimate, your best strategy is obvious: just use the observation. It's unbiased. Any shrinkage toward zero or any other value would add bias.
For two things? Same answer. Estimate each one with its observation.
But for three or more?
The Stein Shock
In 1956, Charles Stein proved that when you estimate three or more parameters at once, the maximum likelihood estimator (MLE), which here just means using the raw observations, is inadmissible.
That word has a precise meaning: there exists another estimator that is always at least as good, and sometimes strictly better, no matter what the true values are.
The James-Stein estimator: shrink all your estimates toward their common mean.
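Here's a minimal sketch of the classic positive-part form. I've written it with a fixed shrinkage target (zero by default); shrinking toward the data's own grand mean, as described above, replaces the p - 2 in the formula with p - 3.

```python
import numpy as np

def james_stein(x, sigma2=1.0, target=0.0):
    """Positive-part James-Stein estimator.

    Shrinks noisy observations x[i] ~ N(theta[i], sigma2) toward a fixed
    target. For p = len(x) >= 3 it dominates the MLE (the raw observations)
    in total squared error. If the target is the grand mean estimated from
    the data itself, replace (p - 2) with (p - 3), which needs p >= 4.
    """
    x = np.asarray(x, dtype=float)
    p = x.size
    diff = x - target
    # Shrinkage factor 1 - (p - 2) * sigma^2 / ||x - target||^2,
    # clipped at 0 so we never shrink past the target ("positive part").
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / np.sum(diff ** 2))
    return target + factor * diff
```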
Run the simulation with 1 or 2 dimensions: MLE wins about half the time. There's no improvement.
Now try 3 or more. Watch James-Stein win consistently. The more dimensions, the bigger the advantage.
This is the paradox. At exactly n=3, something fundamental changes.
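You can reproduce that pattern with a quick Monte Carlo sketch. The setup below is my own (standard normal true means, unit noise, shrinkage toward zero), so the numbers won't match the interactive simulation exactly, but the crossover at three dimensions will.

```python
import numpy as np

def trial(p, sigma=1.0, rng=None):
    """One round: draw p true means, observe them with noise, score both estimators."""
    rng = rng or np.random.default_rng()
    theta = rng.normal(size=p)                    # true values
    x = theta + rng.normal(scale=sigma, size=p)   # noisy observations = the MLE
    # Positive-part James-Stein shrinkage toward zero.
    factor = max(0.0, 1.0 - (p - 2) * sigma**2 / np.sum(x**2))
    js = factor * x
    return np.sum((x - theta) ** 2), np.sum((js - theta) ** 2)

rng = np.random.default_rng(0)
for p in (2, 3, 5, 10, 100):
    errs = np.array([trial(p, rng=rng) for _ in range(2000)])
    mle_err, js_err = errs.mean(axis=0)
    # At p = 2 the factor is exactly 1, so James-Stein coincides with the MLE;
    # from p = 3 on it wins, and the gap grows with dimension.
    print(f"p={p:3d}  mean total squared error  MLE: {mle_err:7.2f}   James-Stein: {js_err:7.2f}")
```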
The Geometric Intuition
Here's the intuition. Think about what happens when you observe noisy data.
In one dimension, noise pushes your observation left or right with equal probability. The errors cancel: on average, your guess lands on the true value.
In high dimensions, something weird happens. Almost all the volume of a high-dimensional sphere is near the surface, not the center.
2D Circle
50% of area is within 70% of the radius from center. Points are spread throughout.
100D Hypersphere
99.9999...% of volume is in a thin shell near the surface. Points cluster at the boundary.
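A two-line check of that volume claim, using the fact that the volume of a d-dimensional ball scales as radius^d:

```python
# The volume of a d-dimensional ball scales as radius**d, so the share of
# volume lying inside 90% of the radius is simply 0.9**d.
for d in (2, 10, 100):
    outer_shell = 1 - 0.9 ** d
    print(f"d={d:3d}: {100 * outer_shell:.4f}% of the volume is in the outer 10% shell")
```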
This means that in high dimensions, random noise almost always pushes observations outward, farther from the center than the true values really are.
The MLE just uses these inflated observations directly. It systematically overshoots.
James-Stein corrects for this by shrinking toward the center. It's not magic—it's geometry.
The optimal shrinkage depends on the noise level and number of dimensions.
James-Stein calculates this automatically from the data.
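To make the overshoot concrete: if x = θ + noise with independent N(0, σ²) noise in each of p coordinates, then E[‖x‖²] = ‖θ‖² + pσ², and the James-Stein factor 1 - (p - 2)σ²/‖x‖² undoes almost exactly that inflation. A quick numeric check with toy numbers of my own:

```python
import numpy as np

rng = np.random.default_rng(42)
p, sigma = 100, 1.0
theta = rng.normal(size=p)                      # hypothetical true values
x = theta + rng.normal(scale=sigma, size=p)     # noisy observations

print(np.sum(theta ** 2))                       # ~ ||theta||^2
print(np.sum(x ** 2))                           # ~ ||theta||^2 + p * sigma^2  (inflated)
print(1 - (p - 2) * sigma**2 / np.sum(x ** 2))  # the data-driven shrinkage factor
```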
Why This Matters
Stein's paradox isn't just a statistical curiosity. It's a theoretical ancestor of the regularization that underpins modern machine learning.
Regularization
L2 regularization (ridge regression) is shrinkage in the James-Stein spirit: penalizing large weights pulls every coefficient toward zero (see the sketch after this list).
Bayesian Priors
Even a "wrong" prior helps. Shrinking toward any value beats MLE in high dimensions. The prior is doing the same work.
Empirical Bayes
Use data to estimate the prior, then apply it. This is exactly what James-Stein does—estimate shrinkage from the observations.
Neural Network Weights
Weight decay is literal L2 shrinkage; dropout and batch normalization act as regularizers in a similar spirit. Modern deep learning is Stein's paradox at scale.
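To see the ridge connection concretely, here's a toy sketch under an assumption of my own (an orthonormal design matrix). In that special case ridge divides every ordinary-least-squares coefficient by 1 + λ, a uniform pull toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 5, 3.0
X, _ = np.linalg.qr(rng.normal(size=(n, p)))   # design with orthonormal columns
beta = rng.normal(size=p)                      # toy true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

ols = np.linalg.solve(X.T @ X, X.T @ y)                      # ordinary least squares
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # L2-penalized fit

print(ols)
print(ridge)            # each coefficient is ols / (1 + lam): shrinkage toward zero
print(ols / (1 + lam))  # matches the ridge solution (up to floating-point error)
```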
So yes, wheat prices really do help predict batting averages.
Not because they're related. Because in high dimensions, shrinking toward any common value beats treating each estimate independently.
The "unrelated" data provides the shrinkage target.
Stein's paradox tells us: in a complex world with many parameters, borrowing strength from everywhere beats going it alone.
Want More Explainers Like This?
We build interactive, intuition-first explanations of complex AI and statistics concepts.
References: Stein, C. (1956), "Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability; James, W. and Stein, C. (1961), "Estimation with Quadratic Loss," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability.