A Decision Theory Puzzle
Parfit's Hitchhiker
You're dying in the desert. A driver says “I'll save you if I predict you'll pay me $100 in town.” You can't lie - he reads intentions.
You stumble through the desert, dehydrated and dying. A car appears on the horizon. The driver rolls down the window.
“I'll drive you to town - 100 miles away - if I predict you'll pay me $100 when we arrive. But I should warn you: I'm 99% accurate at reading people's true intentions.”
You have $1000 in your pocket. Once you reach town, no one can force you to pay. The driver knows this. He's not asking what you'll say - he's reading what you actually intend.
Here's the dilemma:
1. If you plan to stiff him once safe, he'll see it and leave you to die.
2. You can't bindingly commit to pay - there's no contract that works in the desert.
3. Once in town, keeping the $100 is the “rational” choice - the ride already happened.
The question isn't “What will you decide in town?”
It's “What kind of agent ARE you?”
Notice: if you try to “decide later,” the driver sees that as planning to stiff him. You can't fool a 99% accurate predictor by being undecided.
The CDT Agent Dies
Causal Decision Theory (CDT) says: evaluate each choice by its causal consequences from that moment forward.
CDT Reasoning in Town:
1. The ride has already happened. Paying or not won't change the past.
2. Paying $100 leaves me with $900. Not paying leaves me with $1000.
3. Therefore, not paying is strictly better. Keep the money.
This reasoning is impeccable... once you're in town. The problem is that it kills you in the desert.
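Here is that in-town calculation as a minimal Python sketch (the function name and defaults are illustrative, not from Parfit):

```python
# CDT in town: score each action only by its causal consequences from
# this moment forward. The rescue is already in the past, so it counts
# equally in both branches and drops out of the comparison.
def cdt_choice_in_town(cash: int = 1000, fare: int = 100) -> str:
    outcomes = {"pay": cash - fare, "don't pay": cash}  # $900 vs. $1000
    return max(outcomes, key=outcomes.get)

print(cdt_choice_in_town())  # -> "don't pay"
```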
The Fatal Loop:
1. CDT agent knows they won't pay in town (that's the “rational” choice)
2. Driver sees this intention with 99% accuracy
3. Driver refuses to rescue
4. CDT agent dies with $1000 in their pocket
The CDT agent is “rational” at every step. They're just dead.
Your Disposition
The driver reads what kind of agent you are and acts on the prediction:
- CDT agent (“Won't pay in town”): left to die in 99% of cases ($0, dead); rescued only by a prediction error in 1% of cases ($1000, alive).
- FDT agent (“Will pay in town”): rescued in 99% of cases ($900, alive); left to die only by a prediction error in 1% of cases ($0, dead).
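Working the expected values out from these branches (scoring death as $0, which is generous to CDT since it ignores how bad dying actually is):

EV(CDT) = 0.99 × $0 + 0.01 × $1000 = $10
EV(FDT) = 0.99 × $900 + 0.01 × $0 = $891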
The FDT Agent Survives
Functional Decision Theory (FDT) takes a different view: don't evaluate individual actions - evaluate your entire decision algorithm.
FDT Reasoning:
1. The driver's prediction is based on my decision algorithm, not my moment-to-moment choice.
2. If my algorithm says “pay when rescued,” the driver will rescue me.
3. Running “pay when rescued” results in: alive with $900.
4. Running “don't pay” results in: dead with $1000.
5. Therefore, I should BE the kind of agent who pays.
FDT asks: “What algorithm, if I were running it, would lead to the best outcomes?”
The answer is clear: the “pay your rescuers” algorithm wins.
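The same comparison, sketched at the level of whole policies rather than individual actions (the 99% accuracy and dollar amounts come from the story; treating death as $0 is a simplification):

```python
# FDT: score entire policies, because the predictor's behavior
# depends on which policy you are running.
def policy_value(pays: bool, accuracy: float = 0.99) -> float:
    p_rescued = accuracy if pays else 1 - accuracy  # driver reads the policy
    payoff_if_rescued = 900 if pays else 1000
    return p_rescued * payoff_if_rescued            # death scores $0

best = max([True, False], key=policy_value)
print(best)                 # -> True: be the agent who pays
print(policy_value(True))   # -> 891.0
print(policy_value(False))  # -> roughly 10
```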
The Key Insight
Your decision algorithm is visible to predictors. It affects their behavior toward you. The “rational” action at each moment might produce a globally irrational algorithm.
“It's not about what you decide - it's about what you ARE.”
See It In Action
Run the scenario thousands of times against the 99% accurate predictor and the pattern is stark:
- CDT agents (“Once I'm in town, I won't pay. That's the rational choice.”): survival rate 1%, average value about $10.
- FDT agents (“I AM the kind of agent who pays. The driver can see that.”): survival rate 99%, average value about $891.
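A minimal Monte Carlo sketch of such a tournament (names and trial count are illustrative; with enough trials the results converge on the expected values computed earlier):

```python
import random

def simulate(pays: bool, accuracy: float = 0.99, trials: int = 100_000):
    """Run many hitchhikers with a fixed disposition past the predictor."""
    survived, total = 0, 0
    for _ in range(trials):
        # The driver reads the agent's true disposition, erring 1% of the time.
        predicts_payment = (random.random() < accuracy) == pays
        if predicts_payment:                # prediction of payment -> rescue
            survived += 1
            total += 900 if pays else 1000  # payers hand over $100 in town
        # otherwise: left in the desert, $0 and no pulse
    return survived / trials, total / trials

for label, pays in [("CDT (won't pay)", False), ("FDT (will pay)", True)]:
    rate, avg = simulate(pays)
    print(f"{label}: survival {rate:.1%}, average value ${avg:.0f}")
```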
The numbers don't lie: being a “payer” is more valuable than the $100 you keep by not paying.
Being vs Deciding
The deepest lesson of Parfit's Hitchhiker isn't about predictors or desert rescues. It's about the difference between deciding to pay and being a payer.
Deciding to Pay
- A choice made at a moment in time
- Can be reconsidered when circumstances change
- Based on consequences from that moment forward
- Others can't reliably predict it
Being a Payer
- A property of your character/algorithm
- Stable across situations
- Based on who you want to be
- Others CAN reliably predict it
The CDT agent tries to decide their way to good outcomes. The FDT agent becomes the kind of agent who gets good outcomes.
Real-World Applications
- Reputation: Being known as trustworthy opens doors. “Deciding to be trustworthy this time” doesn't.
- Commitments: People invest in you when they believe you'll follow through, not when you “might.”
- AI Alignment: An AI that “decides” to be helpful is less trustworthy than one that IS helpful.
What Kind of Agent Are You?
Answer honestly - there's no “right” answer, just self-knowledge. One test case: you find a wallet with $500 and an ID, and no one is watching. What do you actually do - and could someone who knows your character predict it?
Why This Matters
Derek Parfit introduced this thought experiment to challenge our assumptions about rationality. It turns out the “obviously rational” choice can be disastrously wrong.
AI Decision-Making
Should AI systems use CDT or FDT? An AI that "defects" when it's locally optimal might not be trusted by humans or other AIs.
Contract Theory
Many agreements work because people ARE trustworthy, not because they're enforceable. Parfit's Hitchhiker shows why reputation matters.
Ethics of Commitment
Should you keep promises when breaking them is "rational"? FDT suggests genuine commitment is more valuable than strategic compliance.
Self-Modification
Could you make yourself into an FDT agent? Can you change your decision algorithm? The answer matters for personal growth.
The Paradox Resolved?
Parfit's Hitchhiker isn't truly a paradox - it's a puzzle that reveals CDT's blindspot. The “rational” thing to do depends on what rationality means. If it means “best outcomes,” then FDT wins. If it means “best action given the current state,” CDT wins - and you die.
The driver is always watching.
What kind of person do you want to be?
Explore More Decision Theory
Parfit's Hitchhiker is closely related to Newcomb's Paradox. Both challenge our intuitions about rationality and prediction.
References: Derek Parfit, Reasons and Persons (Oxford University Press, 1984); Eliezer Yudkowsky & Nate Soares, “Functional Decision Theory: A New Theory of Instrumental Rationality” (2017).