A Causal Inference Trap
Should Pregnant Women Smoke?
Among low-birth-weight babies, maternal smoking is associated with lower mortality. This is real data. Should we update medical advice?
This page demonstrates why the naive interpretation is WRONG.
Smoking during pregnancy is extremely harmful. The paradox we're about to explore shows how "controlling for variables" can create dangerous statistical illusions.
Imagine you're a medical researcher. You have data on thousands of births. You want to understand how maternal smoking affects infant mortality.
Your first analysis: Babies born to smoking mothers have higher mortality. Smoking is bad. Obviously.
But then a colleague says: "Wait, smoking causes low birth weight, and low birth weight causes mortality. You should control for birth weight to isolate the direct effect of smoking."
Sounds reasonable. You run the analysis again, this time only looking at low-birth-weight babies...
And suddenly smoking looks protective.
This is not a bug. This is the Low Birth Weight Paradox.
See The Paradox
Below is a simulated population of 5,000 babies. The simulation captures the real causal structure: smoking causes low birth weight, but genetic/congenital issues are the real driver of mortality.
[Interactive charts: "Overall Population: Mortality by Smoking Status" and "Low Birth Weight Only: Mortality by Smoking Status"]
Total births: 5,000. Smoking mothers: 757. Low birth weight: 67.
Switch between the two views and the effect flips. In the overall population, smoking is associated with higher mortality (correctly). Among LBW babies only, it is associated with lower mortality (paradoxically).
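For readers who want to reproduce the flip offline, here is a minimal sketch of such a simulation. Every probability in it is an illustrative assumption, not the parameters behind the charts on this page. In particular, the small direct effect of smoking on mortality is assumed so that the overall comparison shows harm. A larger population than 5,000 is used so the result does not hinge on a handful of births.

```python
import random

def simulate_births(n=200_000, seed=0):
    """Simulate births as (smoker, low_birth_weight, died) tuples.

    All probabilities are assumed values chosen for illustration.
    """
    rng = random.Random(seed)
    births = []
    for _ in range(n):
        smoker = rng.random() < 0.15    # assumed smoking rate
        genetic = rng.random() < 0.05   # assumed rate, independent of smoking
        # Low birth weight has two independent causes (the collider).
        lbw = rng.random() < 0.002 + 0.05 * smoker + 0.25 * genetic
        # Mortality is driven mainly by genetic issues; the small smoking
        # term is an assumption, not something stated by the page.
        died = rng.random() < 0.01 + 0.01 * smoker + 0.30 * genetic
        births.append((smoker, lbw, died))
    return births

def mortality(births, smoker, lbw_only=False):
    """Mortality rate for one smoking group, optionally restricted to LBW."""
    group = [died for s, lbw, died in births
             if s == smoker and (lbw or not lbw_only)]
    return sum(group) / len(group)

births = simulate_births()
for label, lbw_only in [("Overall", False), ("LBW only", True)]:
    non_smoker = mortality(births, False, lbw_only)
    smoker = mortality(births, True, lbw_only)
    print(f"{label}: non-smokers {non_smoker:.1%}, smokers {smoker:.1%}")
```

Under these assumptions, smokers have the higher mortality rate overall and the lower rate within the LBW subgroup.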
What's going on here?
The Collider Trap
To understand this paradox, we need to draw the causal structure as a Directed Acyclic Graph (DAG): a diagram in which each arrow points from a cause to its effect.
Smoking and Genetic Issues are independent causes of Low Birth Weight. Only Genetic Issues directly cause Mortality. The causal path is clear.
A collider is a variable that has two causes pointing into it. Low Birth Weight is caused by both smoking and genetic issues.
Here's the key insight: conditioning on a collider creates a spurious association between its causes.
An analogy: Imagine you're looking at admissions to a selective college.
Students get in either by being smart OR by being good at sports. Among admitted students (conditioning on the collider), smart students appear to be worse at sports. This is not because intelligence hurts athletic ability: a student who was admitted without being good at sports must have gotten in on intelligence, so the two traits look negatively related inside the admitted group.
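The admissions analogy can be checked with a few lines of simulation. This is a sketch under assumed numbers: 20% of applicants are "smart", 20% are "athletic", the two traits are independent, and anyone with at least one trait is admitted.

```python
import random

rng = random.Random(1)
# Each applicant is (smart, athletic); the traits are drawn independently.
students = [(rng.random() < 0.2, rng.random() < 0.2) for _ in range(50_000)]
# Admission is the collider: either trait is enough to get in.
admitted = [(smart, athletic) for smart, athletic in students if smart or athletic]

def athletic_rate(group, smart):
    """Share of athletic students among those with the given 'smart' value."""
    rows = [athletic for s, athletic in group if s == smart]
    return sum(rows) / len(rows)

for label, group in [("All applicants", students), ("Admitted only", admitted)]:
    print(f"{label}: athletic among smart {athletic_rate(group, True):.0%}, "
          f"among not smart {athletic_rate(group, False):.0%}")
```

Among all applicants, the athletic share is roughly the same for smart and non-smart students. Among admitted students, every non-smart student is athletic by construction, so smart students look far less athletic by comparison.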
The Hidden Variable
The key to understanding the paradox is the hidden variable: genetic and congenital issues.
There are two paths to low birth weight:
Path 1: Smoking
Mother smokes, baby has low birth weight. But the baby is otherwise healthy. The LBW is "benign" in the sense that it doesn't carry the same mortality risk.
Path 2: Genetic Issues
Baby has genetic or congenital problems, causing low birth weight. This baby is fundamentally unhealthy. The LBW is a symptom of deeper problems that cause mortality.
| Group | Count | % with Genetic Issues | Mortality Rate |
|---|---|---|---|
| Non-smoker, Normal weight | 4196 | 4.4% | 1.5% |
| Non-smoker, Low weight | 47 | 100.0% | 29.8% |
| Smoker, Normal weight | 737 | 3.3% | 1.8% |
| Smoker, Low weight | 20 | 85.0% | 5.0% |
When you control for birth weight, you're comparing apples to oranges. In this simulated run, every non-smoking LBW baby has genetic issues (the deadly kind of LBW), while some smoking LBW babies have only smoking-caused LBW (the "less deadly" kind).
Smoking isn't protective. It just changes which kind of low birth weight babies you're looking at.
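The table's own numbers are enough to see both halves of the paradox. The sketch below pools the two weight groups for each smoking status, using only the counts and rates from the table. The table's percentages are rounded, so the pooled figures are approximate.

```python
# (count, share with genetic issues, mortality rate), copied from the table
table = {
    ("non-smoker", "normal"): (4196, 0.044, 0.015),
    ("non-smoker", "low"):    (47,   1.000, 0.298),
    ("smoker", "normal"):     (737,  0.033, 0.018),
    ("smoker", "low"):        (20,   0.850, 0.050),
}
GENETIC, MORTALITY = 1, 2  # column positions inside each tuple

def pooled(status, column):
    """Count-weighted average of a rate across both weight groups."""
    rows = [row for (s, _), row in table.items() if s == status]
    total = sum(row[0] for row in rows)
    return sum(row[0] * row[column] for row in rows) / total

for status in ("non-smoker", "smoker"):
    print(f"{status}: genetic issues {pooled(status, GENETIC):.2%}, "
          f"overall mortality {pooled(status, MORTALITY):.2%}, "
          f"LBW mortality {table[(status, 'low')][MORTALITY]:.1%}")
```

Pooled this way, genetic issues are about equally common among smokers and non-smokers (roughly 5.5% each), which is what independence predicts. Overall mortality is slightly higher for smokers (about 1.9% vs 1.8%). The gap in genetic issues (100% vs 85%) and the reversed mortality ordering (29.8% vs 5.0%) appear only once the comparison is restricted to LBW babies.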
Lessons for Causal Inference
The Low Birth Weight Paradox teaches us a critical lesson: "controlling for variables" is not always correct.
Rule 1: Never control for descendants of treatment
Birth weight is downstream of smoking. Controlling for it blocks part of the causal path you're trying to measure and, because birth weight is also a collider, opens a spurious path through genetic issues.
Rule 2: Never control for colliders
A collider is a variable with two causes. Conditioning on it creates spurious associations between those causes, even if they're independent.
Rule 3: Draw the DAG first
Before running any regression, sketch out the causal structure. Identify which variables are confounders (control for these) vs. colliders (don't control).
Rule 4: "Adjusting" can introduce bias
The naive intuition that "more controls = better" is wrong. Bad controls can create bias where none existed.
Real-world examples of collider bias
- Hospital admission: Conditioning on hospitalization can make diseases appear protective of each other
- Selection effects: Looking only at published studies creates a spurious association between effect size and sample size
- Survival bias: Conditioning on survival can reverse the apparent effect of treatments
- Obesity paradox: Among heart disease patients, obesity can appear protective - one proposed explanation is that severe heart disease causes weight loss
The answer to "should pregnant women smoke?" is unambiguously no.
The paradox reveals a statistical trap, not a medical insight.
When to Control?
Here's a quick decision guide for whether to control for a variable Z when estimating the effect of X on Y:
CONTROL if Z is a confounder:
Z causes both X and Y (common cause)
Example: Age affects both exercise habits and heart health
DON'T CONTROL if Z is a mediator:
X causes Z causes Y (on the causal path)
Example: Exercise affects cholesterol affects heart health
DON'T CONTROL if Z is a collider:
Both X and Y cause Z (common effect)
Example: Both talent and luck cause success
BE CAREFUL if Z is downstream of X:
X causes Z (Z is a descendant of treatment)
Example: This is exactly the birth weight situation!
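The decision guide can be sketched as a small function over a DAG stored as a dictionary of edges. This is a simplified heuristic based on ancestor and descendant relations, not a full back-door-criterion check. The function names and the variable names in the example graphs are hypothetical.

```python
def descendants(dag, node):
    """All nodes reachable from `node` by following arrows."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def classify(dag, x, y, z):
    """Role of Z when estimating the effect of X on Y, per the guide above."""
    z_causes_x = x in descendants(dag, z)
    z_causes_y = y in descendants(dag, z)
    x_causes_z = z in descendants(dag, x)
    y_causes_z = z in descendants(dag, y)
    if z_causes_x and z_causes_y:
        return "confounder: control"
    if x_causes_z and y_causes_z:
        return "collider: don't control"
    if x_causes_z and z_causes_y:
        return "mediator: don't control"
    if x_causes_z:
        return "descendant of treatment: be careful"
    return "no rule applies"

# The birth weight DAG described on this page.
births = {"smoking": ["lbw"], "genetic": ["lbw", "mortality"]}
print(classify(births, "smoking", "mortality", "lbw"))
print(classify(births, "smoking", "genetic", "lbw"))
```

In the birth weight graph, low birth weight is flagged as a descendant of treatment for the smoking-mortality question, and as a collider between smoking and genetic issues. Those are the two readings the page gives of the same variable.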
Reference: Hernandez-Diaz et al. (2006), Pearl (2009)