A Decision Theory Puzzle
Parfit's Hitchhiker
You're dying in the desert. A driver says “I'll save you if I predict you'll pay me $100 in town.” You can't lie - he reads intentions.
You stumble through the desert, dehydrated and dying. A car appears on the horizon. The driver rolls down the window.
“I'll drive you to town - 100 miles away - if I predict you'll pay me $100 when we arrive. But I should warn you: I'm 99% accurate at reading people's true intentions.”
You have $1000 in your pocket. Once you reach town, no one can force you to pay. The driver knows this. He's not asking what you'll say - he's reading what you actually intend.
Here's the dilemma:
1. If you plan to stiff him once safe, he'll see it and leave you to die.
2. You can't bindingly commit to pay - there's no contract that works in the desert.
3. Once in town, keeping the $100 is the “rational” choice - the ride already happened.
The question isn't “What will you decide in town?”
It's “What kind of agent ARE you?”
Notice: if you try to “decide later,” the driver sees that as planning to stiff him. You can't fool a 99% accurate predictor by being undecided.
The CDT Agent Dies
Causal Decision Theory (CDT) says: evaluate each choice by its causal consequences from that moment forward.
CDT Reasoning in Town:
1. The ride has already happened. Paying or not won't change the past.
2. Paying $100 leaves me with $900. Not paying leaves me with $1000.
3. Therefore, not paying is strictly better. Keep the money.
This reasoning is impeccable... once you're in town. The problem is that it kills you in the desert.
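Here is that in-town calculation as a minimal Python sketch (the function name and defaults are illustrative, not from Parfit):

```python
# CDT in town: score each action only by its causal consequences from
# this moment forward. The rescue is already in the past, so it counts
# equally in both branches and drops out of the comparison.
def cdt_choice_in_town(cash: int = 1000, fare: int = 100) -> str:
    outcomes = {"pay": cash - fare, "don't pay": cash}  # $900 vs. $1000
    return max(outcomes, key=outcomes.get)

print(cdt_choice_in_town())  # -> "don't pay"
```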
The Fatal Loop:
1. CDT agent knows they won't pay in town (that's the “rational” choice)
2. Driver sees this intention with 99% accuracy
3. Driver refuses to rescue
4. CDT agent dies with $1000 in their pocket
The CDT agent is “rational” at every step. They're just dead.
Your Disposition
The driver reads what kind of agent you are and acts on the prediction:
- CDT agent (“Won't pay in town”): left to die in 99% of cases ($0, dead); rescued only by a prediction error in 1% of cases ($1000, alive).
- FDT agent (“Will pay in town”): rescued in 99% of cases ($900, alive); left to die only by a prediction error in 1% of cases ($0, dead).
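Working the expected values out from these branches (scoring death as $0, which is generous to CDT since it ignores how bad dying actually is):

EV(CDT) = 0.99 × $0 + 0.01 × $1000 = $10
EV(FDT) = 0.99 × $900 + 0.01 × $0 = $891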
The FDT Agent Survives
Functional Decision Theory (FDT) takes a different view: don't evaluate individual actions - evaluate your entire decision algorithm.
FDT Reasoning:
1. The driver's prediction is based on my decision algorithm, not my moment-to-moment choice.
2. If my algorithm says “pay when rescued,” the driver will rescue me.
3. Running “pay when rescued” results in: alive with $900.
4. Running “don't pay” results in: dead with $1000.
5. Therefore, I should BE the kind of agent who pays.
FDT asks: “What algorithm, if I were running it, would lead to the best outcomes?”
The answer is clear: the “pay your rescuers” algorithm wins.
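The same comparison, sketched at the level of whole policies rather than individual actions (the 99% accuracy and dollar amounts come from the story; treating death as $0 is a simplification):

```python
# FDT: score entire policies, because the predictor's behavior
# depends on which policy you are running.
def policy_value(pays: bool, accuracy: float = 0.99) -> float:
    p_rescued = accuracy if pays else 1 - accuracy  # driver reads the policy
    payoff_if_rescued = 900 if pays else 1000
    return p_rescued * payoff_if_rescued            # death scores $0

best = max([True, False], key=policy_value)
print(best)                 # -> True: be the agent who pays
print(policy_value(True))   # -> 891.0
print(policy_value(False))  # -> roughly 10
```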
The Key Insight
Your decision algorithm is visible to predictors. It affects their behavior toward you. The “rational” action at each moment might produce a globally irrational algorithm.
“It's not about what you decide - it's about what you ARE.”
See It In Action
Run the scenario thousands of times against the 99% accurate predictor and the pattern is stark:
- CDT agents (“Once I'm in town, I won't pay. That's the rational choice.”): survival rate 1%, average value about $10.
- FDT agents (“I AM the kind of agent who pays. The driver can see that.”): survival rate 99%, average value about $891.
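A minimal Monte Carlo sketch of such a tournament (names and trial count are illustrative; with enough trials the results converge on the expected values computed earlier):

```python
import random

def simulate(pays: bool, accuracy: float = 0.99, trials: int = 100_000):
    """Run many hitchhikers with a fixed disposition past the predictor."""
    survived, total = 0, 0
    for _ in range(trials):
        # The driver reads the agent's true disposition, erring 1% of the time.
        predicts_payment = (random.random() < accuracy) == pays
        if predicts_payment:                # prediction of payment -> rescue
            survived += 1
            total += 900 if pays else 1000  # payers hand over $100 in town
        # otherwise: left in the desert, $0 and no pulse
    return survived / trials, total / trials

for label, pays in [("CDT (won't pay)", False), ("FDT (will pay)", True)]:
    rate, avg = simulate(pays)
    print(f"{label}: survival {rate:.1%}, average value ${avg:.0f}")
```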
The numbers don't lie: being a “payer” is more valuable than the $100 you keep by not paying.
Being vs Deciding
The deepest lesson of Parfit's Hitchhiker isn't about predictors or desert rescues. It's about the difference between deciding to pay and being a payer.
Deciding to Pay
- A choice made at a moment in time
- Can be reconsidered when circumstances change
- Based on consequences from that moment forward
- Others can't reliably predict it
Being a Payer
- A property of your character/algorithm
- Stable across situations
- Based on who you want to be
- Others CAN reliably predict it
The CDT agent tries to decide their way to good outcomes. The FDT agent becomes the kind of agent who gets good outcomes.
Real-World Applications
- Reputation: Being known as trustworthy opens doors. “Deciding to be trustworthy this time” doesn't.
- Commitments: People invest in you when they believe you'll follow through, not when you “might.”
- AI Alignment: An AI that “decides” to be helpful is less trustworthy than one that IS helpful.
What Kind of Agent Are You?
Answer honestly - there's no “right” answer, just self-knowledge. One test case: you find a wallet with $500 and an ID, and no one is watching. What do you actually do - and could someone who knows your character predict it?
Why This Matters
Derek Parfit introduced this thought experiment to challenge our assumptions about rationality. It turns out the “obviously rational” choice can be disastrously wrong.
AI Decision-Making
Should AI systems use CDT or FDT? An AI that "defects" when it's locally optimal might not be trusted by humans or other AIs.
Contract Theory
Many agreements work because people ARE trustworthy, not because they're enforceable. Parfit's Hitchhiker shows why reputation matters.
Ethics of Commitment
Should you keep promises when breaking them is "rational"? FDT suggests genuine commitment is more valuable than strategic compliance.
Self-Modification
Could you make yourself into an FDT agent? Can you change your decision algorithm? The answer matters for personal growth.
The Paradox Resolved?
Parfit's Hitchhiker isn't truly a paradox - it's a puzzle that reveals CDT's blindspot. The “rational” thing to do depends on what rationality means. If it means “best outcomes,” then FDT wins. If it means “best action given the current state,” CDT wins - and you die.
The driver is always watching.
What kind of person do you want to be?
Explore More Decision Theory
Parfit's Hitchhiker is closely related to Newcomb's Paradox. Both challenge our intuitions about rationality and prediction.
References: Derek Parfit, Reasons and Persons (Oxford University Press, 1984); Eliezer Yudkowsky & Nate Soares, “Functional Decision Theory: A New Theory of Instrumental Rationality” (2017).