
Recursive thinking, Newcomb's problem, and free will

felipec.substack.com

Newcomb's problem splits people 50/50 into two camps, but the interesting thing is that both sides think the answer is obvious, and both sides think the other side is being silly. When I created a video criticizing Veritasium's video "This Paradox Splits Smart People 50/50", I received a ton of feedback, particularly from the two-box camp, and I simply could not convince anyone of why they were wrong.

That led me to believe there must be some cognitive trap at play: someone must not be seeing something clearly. After a ton of debates, reading the literature, considering similar problems, discussing with LLMs, and just thinking deeply, I believe the core of the problem is recursive thinking.

Some people are fluent in recursivity, and for them certain kinds of problems are obvious, but not everyone thinks the same way.

My essay touches on Newcomb's problem, but the real focus is on why some people are predisposed to a certain choice. I contend that free will, determinism, and the sense of self all affect Newcomb's problem, and that fluency in recursivity predisposes certain views. In particular, a proper understanding of embedded agency must predispose a particular (correct) choice.

I do not see how any of this is not obvious, but that's part of the problem, because that's likely due to my prior commitments not being the same as the ones of people who pick two-boxes. But I would like to hear if any two-boxer can point out any flaw in my reasoning.


My interpretation:

Free will is indistinguishable from randomness, and your brain has some randomness. The alien understands your personality, but can't predict randomness. For example, maybe immediately before the experiment, they cloned you and ran a perceptually identical experiment; then, if your clone picked both boxes in the practice run, the alien didn’t fill the opaque box for the real run.

You can win $1,001,000, but only if you're lucky. For example, let's say you have a 50% chance of choosing both boxes. Then the alien has a 50% chance of filling the opaque box. You have a 25% chance of winning $1,001,000...but a 25% chance of winning $0, and 25% chance of winning $1,000.

You can't trick the alien: if you're more likely to choose both boxes, the alien is less likely to fill the opaque box. Formally, if you with probability p pick both boxes, the alien with probability 1 - p fills the opaque box. Imagine your clone, in the same perceived surroundings, with your same strategy.

| Alien / You | One box | Both boxes |
| --- | --- | --- |
| Empty opaque box | $0 * (1 - p)p | $1,000 * p^2 |
| Full opaque box | $1,000,000 * (1 - p)^2 | $1,001,000 * p(1 - p) |

If the experiment was repeated ∞ times, on average you'd win $1000p^2 + $1000000(1 - p)^2 + $1001000p(1 - p) = $1000000 - $999000p; increasing p strictly decreases your average win. The statistically optimal strategy is to always pick one box.
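A quick numeric check of that algebra, as a minimal sketch. It assumes the clone model described in this comment (the alien fills the opaque box with probability 1 - p, independently of your actual draw); the function name `expected_win` is made up for illustration.

```python
from fractions import Fraction

def expected_win(p):
    """Expected payout when you two-box with probability p and the alien,
    independently, fills the opaque box with probability 1 - p (clone model)."""
    p = Fraction(p)
    outcomes = [
        (p * (1 - p), 0),                # empty box, you one-box
        (p * p, 1_000),                  # empty box, you two-box
        ((1 - p) * (1 - p), 1_000_000),  # full box, you one-box
        ((1 - p) * p, 1_001_000),        # full box, you two-box
    ]
    # The four cells of the table must cover every case.
    assert sum(prob for prob, _ in outcomes) == 1
    return sum(prob * payout for prob, payout in outcomes)

for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    # Closed form from the comment: 1,000,000 - 999,000 p
    assert expected_win(p) == 1_000_000 - 999_000 * p
    print(p, expected_win(p))
```

Exact fractions are used so the comparison with the closed form is not affected by floating-point rounding.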

Inalterability doesn’t imply futility. Choice and determinism are compatible regardless of how a complete picture of the universe’s physics turns out. All you have to do is rebut the intuition that leads you to fatalism.

Let’s just go ahead and say there’s a means–end relation between a contemplated action and a goal just in case the desirability of the goal rationally contributes motivation for taking the action. That is to say, all else being equal (i.e., in the absence of conflicting consequences of higher priority), it makes sense to take the action for the sake of the goal’s achievement.

By what criteria can someone recognize the existence of a particular link? There’s an evidential criterion, in that there’s a means–end link from action to goal just in case the goal is more likely to be found to obtain when the action is found to be taken than when the action is found not to be taken (i.e., the action’s occurrence is correlated with the goal’s occurrence). A counterfactual criterion implies there’s a means–end link from action to goal just in case the goal would obtain if the action were taken, but not otherwise (or at least the goal would more likely obtain if the action were taken than if otherwise). A causal criterion would mean there’s a means–end link from action to goal just in case the action causes (or tends to cause) the goal to obtain. And then, as already mentioned, there’s a fatalist criterion, which implies there’s never a means–end link from action to goal and all actions are futile. (No one takes fatalism seriously in practice, but a lot of people believe it would indeed follow if the universe were deterministic, and therefore they reject determinism.)

The second one is somewhat counterintuitive. Inference involves propositions of the form, “If X then Y; we infer consequent Y from antecedent X.” But logic textbooks distinguish many varieties of inference (including subjunctive inference and material implication). Mathematical logic more often uses the latter because it’s much simpler to formalize. In material implications, “If X then Y,” just means, “It is not the case both that X is true and Y is false.”

The first three criteria often coincide with one another. Say I take the action of crossing the street to achieve the goal of getting to the other side (taking “action” to mean the initiation of a series of muscle contractions, not the passage across the street, so the goal’s achievement doesn’t just follow tautologically from the action’s occurrence). Knowing that I will cross informs you that I’ll get to the other side, but knowing I will not cross informs you otherwise, which satisfies the evidential criterion. If I walked across the street, I would (likely) get to the other side, but (very likely) not otherwise, fulfilling the counterfactual criterion. And my walking across the street ‘causes’ me to get to the other side, fulfilling the causal criterion. By any of those three criteria, there’s a means–end link from the action of crossing to the goal of getting to the other side. Given that means–end link, and other things being equal, my desire to be on the other side rationally motivates my crossing.

The problem in Newcomb’s Paradox though, is that the criteria diverge. Taking just the opaque box, forfeiting the $1,000, is strong evidence that you obtain $1,000,000 in the opaque box, but taking both boxes is strong evidence that the opaque box is empty. But taking the transparent box or not has ‘no’ causal influence on the content of the already-sealed opaque box. The evidential criterion says there’s a means–end link from the action of taking just the opaque box, to the goal of obtaining $1,000,000 in the opaque box #1. But the causal criterion says otherwise; if there’s a means–end link, then it’s an acausal one. (This is also the exact same divergence that happens in real life Prisoner’s Dilemma situations)

The causal and evidential criteria diverge even in some completely mundane cases, where an action correlates with (but doesn’t cause) a subsequent state. If you were to take just the opaque box and not both, then there would be $1,000,000 in the opaque box, #2. In a more rigid formulation, counterfactual links are just causal links, what would differ if you were to take just the opaque box compared with your taking both boxes is whatever taking just the opaque box causes and nothing more than that.

This is why the fatalist criterion can’t be correct: there are innumerable means–end links in a deterministic universe. Even under the Many-Worlds interpretation, quantum mechanics is technically deterministic (in that quantum amplitude flows deterministically through configuration space), and ‘still’ has the property that a given classical state (in some particular configuration-space branch) is followed by multiple divergent classical successor states (in subsequent branches). It “looks” nondeterministic as far as choice is concerned, but in any case the present state of our universe is compatible with multiple futures, in some sense or other. Whether or not the multiplicity of futures is genuinely nondeterministic in some sense, the argument still holds. Some degree of multiplicity of futures is compatible with choice (even though it isn’t required), but ‘excessive’ multiplicity would undermine choice.


TL;DR: Take box B.

You have a 25% chance of winning $1,001,000...but a 25% chance of winning $0, and 25% chance of winning $1,000.

No. You are forgetting the correlation. The problem very clearly states that the predictor "almost certainly" will predict your choice. That means that for the 50% that you choose two-box, the predictor won't be filling the mystery box 99.99% of the time. And for the 50% that you choose one-box, the predictor will be filling the mystery box 99.99% of the time.

So the breakdown is: $1,001,000 (0.005%), $1,000,000 (49.995%), $1,000 (49.995%), $0 (0.005%).

Formally, if you with probability p pick both boxes, the alien with probability 1 - p fills the opaque box.

That isn't quite right because the predictor is not 100% accurate. If we assume the accuracy is 99.99% (q), then the probability that the predictor will fill the mystery box is q(1 - p) + (1 - q)p. Close, but not quite the same.
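To make the role of the accuracy concrete, here is a minimal sketch of the joint distribution over payouts. It assumes the predictor is right with probability q no matter which choice you make, so the box is filled on correct one-box calls and on wrong two-box calls, i.e. with probability q(1 - p) + (1 - q)p. The function names are hypothetical.

```python
from fractions import Fraction

def outcome_distribution(p, q):
    """Probability of each payout when you two-box with probability p and
    the predictor is correct with probability q (assumed choice-independent)."""
    p, q = Fraction(p), Fraction(q)
    return {
        1_001_000: p * (1 - q),    # two-box, predictor wrongly filled the box
        1_000_000: (1 - p) * q,    # one-box, predictor correctly filled it
        1_000: p * q,              # two-box, predictor correctly left it empty
        0: (1 - p) * (1 - q),      # one-box, predictor wrongly left it empty
    }

def fill_probability(p, q):
    """Chance the opaque box is filled: correct one-box calls plus wrong two-box calls."""
    p, q = Fraction(p), Fraction(q)
    return q * (1 - p) + (1 - q) * p

dist = outcome_distribution(Fraction(1, 2), Fraction(9999, 10000))
for payout, prob in dist.items():
    print(f"${payout:,}: {float(prob):.3%}")
```

With p = 1/2 and q = 99.99% this should reproduce the four-way breakdown given above.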

The statistically optimal strategy is to always pick one box.

Correct.

But the real question my essay is trying to explore is why some people do not see that's the case. In my experience the reason why people choose two-boxes is that they completely ignore the accuracy of the predictor, and instead of assuming that q is close to 100%, they simply treat it as a completely unknown variable that could take any value, including 0.01, despite the formulation of the problem.
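For reference, a small sketch of how much the accuracy matters under the expected-value reading used in this comment. It assumes q is the chance the prediction matches your actual choice; the names `ev_one_box`, `ev_two_box`, and `break_even` are illustrative only. A two-boxer who reasons purely causally would dispute this framing, so treat it as a statement of the one-box calculation, not a resolution.

```python
from fractions import Fraction

BIG, SMALL = 1_000_000, 1_000

def ev_one_box(q):
    # You get the million only when the predictor correctly foresaw one-boxing.
    return Fraction(q) * BIG

def ev_two_box(q):
    # You always get the thousand, plus the million when the predictor was wrong.
    return SMALL + (1 - Fraction(q)) * BIG

# Break-even accuracy: q * BIG = SMALL + (1 - q) * BIG
break_even = Fraction(BIG + SMALL, 2 * BIG)
print(break_even, float(break_even))

q = Fraction(9999, 10000)
print(ev_one_box(q), ev_two_box(q))
```

Under these assumptions one-boxing has the higher expected value for any accuracy above the break-even point, which is only slightly better than a coin flip.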

Why do they do that?

Newcomb originally specified that Omega would leave Box B empty in the case that you tried to decide by flipping a coin; since this violates algorithm-independence, we can alternatively suppose that Omega can predict coinflips.

You’re right, I misunderstood the problem.

Why do they do that?

Why do people gamble?

Alternatively, they also misunderstand the problem. I wonder if the “practice run” method of predicting their behavior would change their mind.

Could there be a common cause between eating ice cream and sunscreen efficacy? How about summer? When it’s summer more people eat ice cream and the rate at which people get skin cancer increases. If that’s the case, then it behooves you to use more sunscreen, not because ice cream makes it more effective, but because its effectiveness is correlated with summer.

It behooves you to use more sunscreen because it's summer, not because you're eating ice cream. If you spend your summer holiday at the beach then you should use sunscreen, even if you eat little ice cream because the ice cream stand at the beach has been closed. Meanwhile, if it's January and you eat ice cream for comfort while huddling indoors after a bad breakup, there's no point in using sunscreen.

If you don't understand the causalities, or if you have no other information, ice cream is a proxy that lets you do better than random. But if you do understand it's actually about summer, ice cream is just a distraction.

If you present a variant of Newcomb as "Omega is giving you a choice between two options, "two-boxing" and "one-boxing". People who picked "two-boxing" on average gain $1000, "one-boxing" pickers gain on average $1,000,000", then, sure, obviously B is the optimal choice. But if you add information about what the choices are, and the causal mechanisms behind the payout, then it becomes reasonable to analyze that, and the disagreement about actual Newcomb is pretty much about causality.

It behooves you to use more sunscreen because it's summer, not because you're eating ice cream.

You know that because I gave you the background of how I constructed my synthetic problem, but in the real world all you have is the correlation.

If you spend your summer holiday at the beach then you should use sunscreen, even if you eat little ice cream because the ice cream stand at the beach has been closed.

But to assume otherwise would be an inverse error fallacy. My question was in the form of a⇒b, I didn't ask anything if ¬a.

If you don't understand the causalities, or if you have no other information, ice cream is a proxy that lets you do better than random.

Which is all that matters.

Consider another study that finds a correlation between a) "the number of nesting storks in European countries" and b) "human birth rates". Are you just going to discount that correlation just because the causal network is not immediately available?

If you present a variant of Newcomb as "Omega is giving you a choice between two options, "two-boxing" and "one-boxing". People who picked "two-boxing" on average gain $1000, "one-boxing" pickers gain on average $1,000,000", then, sure, obviously B is the optimal choice.

That is literally what the original formulation of Newcomb's problem tells you is going to happen.

But if you add information about what the choices are, and the causal mechanisms behind the payout, then it becomes reasonable to analyze that, and the disagreement about actual Newcomb is pretty much about causality.

You are once again committing an inverse error fallacy.

The fact that you do not immediately see a direct causal link between the choice and the prediction doesn't mean that there isn't a causal network between the two. But more importantly: it doesn't negate the very real and undeniable correlation.

You know that because I gave you the background of how I constructed my synthetic problem, but in the real world all you have is the correlation.

No, I know it because it's not hard to figure out the actual causation. But where I know it from doesn't matter, it still means I can do better in practice.

But to assume otherwise would be an inverse error fallacy. My question was in the form of a⇒b, I didn't ask anything if ¬a.

That you asked about a doesn't mean analyzing ¬a can't give useful further insight. This has nothing to do with inverse error fallacy.

Anyway, the point stands: Understanding the causality allows for a more successful strategy than just knowing a correlation.

Consider another study that finds a correlation between a) "the number of nesting storks in European countries" and b) "human birth rates". Are you just going to discount that correlation just because the causal network is not immediately available?

For what purpose? For most purposes I could think of, I am indeed going to discount the correlation, because it's unlikely there's a direct causal effect between a and b. If I'm concerned about birth rates, I'm not going to conclude helping storks nest will matter just from that correlation.

That is literally what the original formulation of Newcomb's problem tells you is going to happen.

The description of the problem tells me the causal rules that govern the problem. Statistical outcomes might be derived from that, but there's no further benefit to that if you already understand the causality.

You are once again committing an inverse error fallacy.

What? How so?

The fact that you do not immediately see a direct causal link between the choice and the prediction doesn't mean that there isn't a causal network between the two. But more importantly: it doesn't negate the very real and undeniable correlation.

In Newcomb's problem, the causal links are stated, and we know what direction they do not go in, and we can reason with that. Correlation, meanwhile, tells us little if we already know the causation.

I'm not even sure how Newcomb is supposed to be analogous to the ice cream example. In your post you claim it is, but you never lay it out.

You know A (ice cream) is correlated with B (sun screen is effective against skin cancer), and B implies action X. You observe A, therefore you should do X. Assuming you have indeed no other information, this is correct, and it works because A is a proxy for C (summer) that causes B. But what's A in Newcomb's problem, or the correlation version thereof? It can't be the same as your decision X, because if you touch A, it loses its value as a proxy for C. If you're worried about skin cancer, eating more ice cream won't help, because it doesn't actually affect sun screen efficacy. It's just correlated with it.

Understanding the causality allows for a more successful strategy than just knowing a correlation.

No it doesn't. The answer is the same regardless.

For most purposes I could think of, I am indeed going to discount the correlation, because it's unlikely there's a direct causal effect between a and b.

Then you are an irrational person. It's that simple.

What? How so?

a⇒b,¬a∴¬b a) find causation, b) choose X.

If you find causation, then you choose X; you didn't find causation, you don't choose X. That's an obvious inverse error fallacy.

Correlation, meanwhile, tells us little if we already know the causation.

It's precisely the other way around. In the real world causation is merely a hypothesis, it's a tentative story you tell yourself. And the only reason you worked out a potential causation is because of the correlation.

Correlation is the only real information.

I'm not even sure how Newcomb is supposed to be analogous to the ice cream example.

There is a high correlation between the choice and the prediction. Ignoring that correlation is irrational.

No it doesn't. The answer is the same regardless.

I gave two examples where my strategy of analyzing the causation yields better results than yours. Only in one restricted example they come out the same.

Then you are an irrational person. It's that simple.

It's definitely not as simple as just claiming your interlocutor is irrational without doing any work to establish that.

a⇒b,¬a∴¬b a) find causation, b) choose X.

If you find causation, then you choose X; you didn't find causation, you don't choose X.

That doesn't sound like what I've been saying. I haven't actually made a claim on what you should choose, only that you should reason from causation if available. What you actually do depends on the details.

And generally, "A implies B, Not A implies Not B" isn't an inverse error fallacy. Only concluding the latter from the former is.

It's precisely the other way around. In the real world causation is merely a hypothesis, it's a tentative story you tell yourself.

  1. It's not the real world, it's a thought experiment, and we do know the causation from the construction.
  2. Even in the real world it's still possible to understand something about causation. We do in fact understand the causal relation between weather, ice cream and skin cancer.

There is a high correlation between the choice and the prediction. Ignoring that correlation is irrational.

You haven't done any work to establish that. You just claimed they're analogous and have the same conclusion. But I pointed out they're not actually analogous, so just repeating your claim isn't gonna cut it.

I gave two examples where my strategy of analyzing the causation yields better results than yours.

No you did not. You assumed the better results given that you were right. This is circular reasoning.

It's definitely not as simple as just claiming your interlocutor is irrational without doing any work to establish that.

It is a mathematical fact that if a is correlated with b and b leads to better outcomes, using a as information tends to lead to better outcomes.

I use a and I get better outcomes, you don't use a and you get worse outcomes. I'm being rational. You are being irrational.

Saying "it's not that simple" is just false.

I haven't actually made a claim on what you should choose, only that you should reason from causation if available.

No, you are not only saying that, you are also saying that if causation is not available, correlation should be ignored.

You assumed the better results given that you were right.

No, it's common knowledge that putting on sunscreen at the beach in summer is more useful than putting on sunscreen at home in winter. Or do you disagree?

In any case, if I know the causal effects, I can just derive the outcome.

It is a mathematical fact that if a is correlated with b and b leads to better outcomes, using a as information tends to lead to better outcomes.

You should replace "b leads to better outcomes" with "b affects the optimal strategy" for this to be meaningful. And "using a as information" does all the work here. Because in Newcomb there's no information A, and in the stork example you haven't specified what the problem is and therefore what it would mean to "use A as information". "Using A as information" is not necessarily the same as "making your decision dependent on A in a specific way".

Assuming we fix your formulation to something reasonable: Better outcomes than random, on average, not necessarily better than a different strategy that takes advantage of a causal understanding of the problem.

I use a and I get better outcomes, you don't use a and you get worse outcomes.

You haven't done any work to establish I get worse outcomes. You haven't even stated the problem.

No, you are not only saying that, you are also saying that if causation is not available, correlation should be ignored.

No, I'm not. What needs to be available is knowledge about causation. If I know A doesn't causally affect B, there's no point in picking A to influence B. That's knowledge about causation that affects the optimal strategy.

In any case, if I know the causal effects, I can just derive the outcome.

That is irrelevant.

You should replace "b leads to better outcomes" with "b affects the optimal strategy" for this to be meaningful.

No, I wrote it "leads to better outcomes" because it does lead to better outcomes.

Because in Newcomb there's no information A

The correlation is the information.

Better outcomes than random, on average, not necessarily better than a different strategy that takes advantage of a causal understanding of the problem.

It is demonstrably better than all the other strategies.

You haven't done any work to establish I get worse outcomes.

I don't need to. It's a mathematical fact.

What needs to be available is knowledge about causation.

Otherwise you are going to ignore the causation. Which is precisely what I said.

That is irrelevant.

It's what invalidates your argument, because it means you don't need to rely on correlation.

No, I wrote it "leads to better outcomes" because it does lead to better outcomes.

The point is that that doesn't allow for the conclusion you want, and you need to fix your statement in order to be actually correct.

The correlation is the information.

The correlation between what? Your statement defined a correlation between some A which is observable and can be used as information and some B which is relevant. What is A?

I don't need to. It's a mathematical fact.

Mathematics is not a field where you just get to make claims without supporting them. In fact it's among the fields the least like that. To establish a claim as mathematical fact, you need mathematical proof. Actually, scratch that, first you need to define the claim, which you haven't done yet.

Otherwise you are going to ignore the causation.

I'm not sure what your argument is supposed to be here.


The problem with Newcomb's problem is that it basically involves time travel, and generally underspecifies how that time travel works. Consider a similar problem:

Time 1: you discover box 1 with 1,000,000 points

Time 2: you discover a box 2 with 1,000 points

Time 3: someone claiming to be a time traveler shows up and says that if you hand him box 2, he will multiply it by 1,000, and then go back in time to put it in box 1. Actually, he claims, that's where box 1 came from all along, and if you don't give him the 1,000 your box 1 will disappear.

Assuming you are rational/selfish, whether you say yes or no very much depends on whether he's telling the truth. If the problem carefully specifies that he actually is a time traveler telling the truth, and time travel does work this way, then obviously you should give it to him (one box). If this happened in real life, I would not give him anything and two box, because my prior on time travel existing is less than 1/1000 and he's just a liar trying to con me. If the problem is not careful and is ambiguous about his truthfulness then people's answers are going to depend on their trustfulness, suspension of disbelief, or just general attitudes towards how willing they are to buy time travel in a hypothetical logic puzzle.
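The 1/1000 figure is also where the expected values cross, which a short sketch can show. It assumes that if he is lying he simply walks off with box 2 and box 1 stays put, and that t is your credence that he is telling the truth; the function names are hypothetical.

```python
from fractions import Fraction

BOX_1, BOX_2 = 1_000_000, 1_000

def ev_hand_over(t):
    # Honest or not, you end up holding box 1 and nothing else.
    return Fraction(BOX_1)

def ev_refuse(t):
    t = Fraction(t)
    # Honest: box 1 disappears and you keep box 2. Liar: you keep both boxes.
    return t * BOX_2 + (1 - t) * (BOX_1 + BOX_2)

# Credence at which handing over and refusing are worth the same.
threshold = Fraction(BOX_2, BOX_1)
print(threshold, ev_hand_over(threshold), ev_refuse(threshold))
```

Below that credence refusing comes out ahead, above it handing the box over does, so under these assumptions the answer really does turn on how far you trust the premise.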

Actual Newcomb's problem is basically the same as this in that decisions you make in the future affect things in the past, and the being making the boxes has to have time travel powers in order to guarantee a 100% success rate (though not all versions of the problem specify this precisely; maybe it just has a 99% success rate, or a vague but high success rate). The reason people so confidently disagree is that in any well-specified version of the problem the answer is obvious, but in any vague under-specification it's ambiguous which well-specified version people will round it to. This is the exact same reason the Monty Hall problem is controversial as well. It's not merely there being a counter-intuitive answer, it's that the problem specifications are very volatile and people keep leaving important details ambiguous that they shouldn't.

The problem with Newcomb's problem is that it basically involves time travel, and generally underspecifies how that time travel works. Consider a similar problem:

Not time travel, just perfect prediction. If you're actually a perfect predictor then you can in essence see the future. If you had a perfect model of physics and initial conditions then you could predict a coin flip with 100% accuracy. The kind of reasoning a human does when presented with the boxes is no different unless you are proposing some spooky non-material stuff in the reasoning. The formulation I'm familiar with is perfect prediction, in which case there are four theoretical cases.

  1. You one box and Omega correctly predicted you would one box thus you get $1m

  2. You one box and omega incorrectly predicted you would 2 box so you get zero. This is impossible by construction, omega cannot predict wrongly.

  3. You two box and Omega incorrectly assumed you would one box, you get $1m + $1k. This is impossible by construction, Omega cannot be wrong.

  4. You two box and Omega correctly guesses you'll two box. You get $1k.

There are only two actually possible options with the given constraints and you get to make a choice which of them is the case. This is not a paradox unless predicting future events is impossible.
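The four cases can be written down as a small payoff table and filtered, as a rough sketch. It assumes the perfect-predictor formulation from this comment; `PAYOFF` and `reachable_outcomes` are names invented for the example.

```python
PAYOFF = {
    # (your choice, Omega's prediction): payout in dollars
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,
    ("two-box", "one-box"): 1_001_000,
    ("two-box", "two-box"): 1_000,
}

def reachable_outcomes(perfect_predictor=True):
    """Keep only the cases consistent with a predictor that is never wrong."""
    return {
        case: payout
        for case, payout in PAYOFF.items()
        if not perfect_predictor or case[0] == case[1]
    }

outcomes = reachable_outcomes()
# Pick the choice whose (only) reachable outcome pays the most.
best_choice = max(outcomes, key=outcomes.get)[0]
print(outcomes)
print(best_choice)
```

Passing `perfect_predictor=False` brings back all four cells, which is the version of the table a two-boxer is implicitly reasoning over.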

Your whole reasoning relies on there being something intrinsically impossible about predicting your decisions, even as you lay out the reasoning for them. Is it so hard to imagine that someone could read you well enough to know which outcome you'll ultimately reach?

Not time travel, just perfect prediction

Corporate wants you to find the difference between these two pictures and they are the same picture. Either they KNOW the future or they don't, and the problem as typically presented gives us no reason to believe that they do.

Corporate wants you to find the difference between these two pictures and they are the same picture.

That's precisely the reason why the actual formulation of the problem avoided 100% accuracy: the prediction is almost certain, not perfect.

This whole thread is a red herring.

Time travel would imply omnipotence rather than mere omniscience.

This is impossible by construction, omega cannot predict wrongly.

This is only possible if you model people as deterministic mechanisms and not as rational game theory agents. If Newcomb's problem posits that you make a decision AFTER Omega makes a decision, then Omega can be wrong. For instance, if you play a mixed strategy (one box with 50% probability, two box with 50% probability), then omega has to not only model your brain perfectly, but also your coin flip. If you used quantum decay to randomize then it would have to predict that perfectly. If omega can perfectly predict that then you've removed an important tool in Game Theory. It's like trying to argue that Rock Paper Scissors is a paradox because no matter what you do your opponent can predict you and defeat you. If you tell me that I can't do mixed strategies, or that other people can, in the past, respond to my mixed strategy outcomes then that's fundamentally incompatible with 90% of Game Theory. I don't even know what the rules of this are or what the goal is. Omega can do literally whatever it wants and I'll get literally any payoff that it chooses to give me. I suppose if it's a god that wants to punish two boxers then I guess I'll obey its commandments and one box, but that's not agency, that's not game theory, that's just submitting to the religious edicts of a higher power with arbitrary rules.

We can model it much more simply by making an alternate version where, at time 1, you decide to one box or two box. Then at time 2 omega is informed of your decision and puts stuff in a box, then at time 3 you get the result of the decision you already made. Here you obviously 1 box. This is a very straightforward, simple, and uninteresting game theory problem. The problem with this, then, is that it is not what the original Newcomb's problem says happens. It says you make the decision after omega does. If you actually mean that people make the decision before omega then saying they make it after is lying.

Maybe this is still useful as a critique of attempts to map Game Theory to reality. Essentially saying "Every game is an iterated game played out over the course of your lifetime. Any decisions you make will affect your personality and reputation, so doing greedy things will hurt you in the long term even if they are the rationally correct choice to a one shot game that you see in the short term." Which, sure. This is how cooperation can exist in prisoner's dilemma-like situations, because you cannot incentivize (and it is irrational to try) to cooperate in a true, pure, one-shot prisoner's dilemma with no modifications, but none of those conditions apply in real life. But you also don't have mind-reading omegas in real life either, so I don't think that's quite what people mean by this.

Ultimately, the premises are fundamentally contradictory, so the only way to come to a solution is to suspend disbelief on half of them. Either you believe that omega can perfectly predict you and you have no agency, so hope that you were born as a one-boxer (because you don't get to decide), or you believe you are a rational agent who can make decisions when it says you can, in which case omega can't predict you so might as well two box. But these are beliefs about the premises of the problem, not about what is good epistemic or rational behavior in a given coherent scenario (which always follows logically and mathematically from the premises and math to determine the maximum payoff).

This is only possible if you model people as deterministic mechanisms and not as rational game theory agents

These are not in tension. In some game theory scenarios adding randomness, if such a thing is actually possible, is useful to some agent. But Newcomb's problem is not such a scenario. Adding any chance of walking away without the $1 million is not worth going for the extra $1k, and to the best of my knowledge the best you could do by adding randomness would be to make your expected value $500,500, whereas your expected value if you cooperate is $1m.

As for the rest of the post, yeah just seems like you're demanding the hypothetical grant you libertarian free will and say something different than it says. It's very "But I did eat breakfast this morning" fighting of the hypothetical. If you want to demand that actually you can't be predicted, even hypothetically then you're just not willing to engage with the question.

I'm not demanding that I can't be predicted, I'm asking "what does that even mean?" and also "if that's true in the hypothetical, then what's the point? there's nothing left." The problem just axiomatically erases the game away with no details.

Consider another game. Suppose you are given an opportunity to play Tic Tac Toe against Umega. If you choose to play and you lose, you get 0 points. If you win or tie, you get 1000 points. But if you forfeit prior to playing the game you can get 100 points. Umega can't predict the future, but instead is extremely tricky. Somehow, it always wins at Tic Tac Toe. No matter what you originally intend to do or what plan you make, things don't go the way you intended and it tricks you and you lose anyway. What do you do?

The way to maximize your points here is to forfeit. Because if you try to play the game then, axiomatically, Umega tricks you and you lose. If you present a winning strategy for Tic Tac Toe that cannot be beaten and say you'll stick to that strategy no matter what Umega does then I retort that you attempt to do that, but then Umega tricks you and you lose anyway. It's a master of psychology and deceit beyond human comprehension. Okay, fine, you have no agency, you forfeit and get the 100 points because the game mathematically reduces to "forfeit: 100 points, don't forfeit: 0 points" with no other options.

Now suppose I go around presenting this to Chess Grandmasters or something (since Tic Tac Toe masters don't exist), and present it as some deep and extremely challenging variant of Tic Tac Toe and grand strategy. Chess Grandmasters stumped by this challenging variant of Tic Tac Toe!

It's not Tic Tac Toe! The optimal strategy of the game is completely and utterly independent of strategies for winning Tic Tac Toe. No matter how much or how little someone knows about Tic Tac Toe, it has nothing to do with the optimal strategy for this game (which is just forfeit) because it's not even engaging with the rules at all. You could substitute basically any strategy game there, I merely used Tic Tac Toe to point out the apparent contradiction of axiomatically declaring that you lose despite there being a known winning strategy.

This has the same fundamental problem as Newcomb's problem. If your agency is stripped from you then what even is the point of the thought experiment? "Suppose you have no free will, and will be punished for trying to act as if you have free will. What do you do?" I guess I forfeit and hope that next time I'm offered a thought experiment it allows me to make choices.

The problem with Newcomb's problem is that it basically involves time travel

No it doesn't. That's what two-boxers claim in order to fit the problem into their view of reality, but that commits an appeal to incredulity fallacy.

Actual Newcomb's problem is basically the same as this in that decisions you make in the future affect things in the past, and the being making the boxes has to have time travel powers in order to guarantee a 100% success rate

That's not the Newcomb's problem: 100% success rate is never specified, it's "almost certainly". That means close to 100%, not 100%.

You are saying it's not possible for Omega to have such accuracy unless the future affects the past, but you don't provide any justification for that. You are basically saying: "I don't see how X is possible, therefore X is not possible". That is not a valid argument, that's an appeal to incredulity fallacy.

That is precisely why I devised my sunscreen problem. Your argument is the same as saying: "I don't see how efficacy against skin cancer and eating ice cream could be causally related, therefore they are not causally related".

Just because you don't see how Omega could predict your choice almost certainly without backwards causality doesn't mean that it can't.

That's not the Newcomb's problem: 100% success rate is never specified, it's "almost certainly". That means close to 100%, not 100%.

This is why I complain about it being underspecified. If omega can be wrong then the entirety of the problem hinges on when/how/why it can be wrong. If it's possible for someone to get away with two boxing and get both boxes, and you can put yourself in that scenario, then you can win by two boxing. If omega attempts to minimize its failed prediction rate, maybe you can employ a mixed strategy (flip a very slightly weighted coin) which randomizes and then you could one box with probability 50.01% and two box with probability 49.99%, causing omega to predict you will one box, and you always get the one box plus almost half the time you get a bonus. Can it predict coin tosses before they're made? Can it predict radioactive decays? This is not mere psychology. I'm not saying it's impossible for someone to cold read you and make educated guesses. If I read psychological profiles on people I could guess that sneakier, greedier, more disagreeable people are more likely to two box while straightforward, naive, or chill people are more likely to one box, and probably get like a 70-80% success rate. Is that what omega is doing? Because then I'm just screwed: I overthink things and seem like a two boxer and if I bit the bullet and decided to one box I would end up getting nothing because it would false guess me as a two boxer.
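The weighted-coin idea can be written out as a small sketch. It rests on one hypothetical reading of the predictor, which the problem statement does not confirm: Omega cannot see the toss and simply fills the opaque box whenever one-boxing is the more likely outcome. The function name and the parameter `q` are made up for illustration.

```python
BIG = 1_000_000  # opaque box
SMALL = 1_000    # transparent box


def ev_against_majority_predictor(q: float) -> float:
    """Expected payoff when you one-box with probability q.

    Assumption (hypothetical): the predictor cannot foresee the coin
    toss, so it fills the opaque box if and only if q > 0.5.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    big_part = BIG if q > 0.5 else 0
    # SMALL is collected only on the (1 - q) share of tosses that say "two-box".
    return big_part + (1.0 - q) * SMALL


if __name__ == "__main__":
    for q in (0.4999, 0.5001, 1.0):
        print(q, ev_against_majority_predictor(q))
```

Under this reading a 50.01% coin beats always one-boxing by a little under $500 on average. Under a reading where Omega also predicts the toss, or leaves the box empty for anyone who randomizes, the advantage disappears, which is exactly why the specification matters.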

Literally none of this is explained in the premise. The problem very much depends on information that is not present. If I give you "MathWizard's Paradox" and say

"There are two boxes. The left box has some money. The right box has a different amount of money not equal to the left box. You only get one box, which one do you pick?"

This likewise is going to lead to disagreement (or would, if people cared and tried to argue about it). If I added a whole bunch of window dressing to disguise the obvious stupidity of this problem, a bunch of superficial characteristics that made it seem more interesting and less obvious, it wouldn't change the underlying symmetry and lack of information. I have, in my head, decided how much money is in each box. There is a correct answer. But I haven't told you enough information for you to deduce it, and there are infinite variations of this problem, half of which have the opposite correct answer.

Just because you don't see how Omega could predict your choice almost certainly without backwards causality doesn't mean that it can't.

It's not that I can't see a way for this to happen, it's that I can imagine a dozen hypothetical ways it could try to do this, and half of them let me two box anyway while half of them don't.

But why can't the problem just be that it's a very good predictor but it's not doing anything supernatural and the way it does the prediction is simply unknown to you? You need a clear specification to know if it's possible to trick the oracle. But even if it's possible, if there's no reasonable way for you to deduce how you would trick the oracle then strategically the solution is still to just be a 1-box pre-committer.

But even if it's possible, if there's no reasonable way for you to deduce how you would trick the oracle then strategically the solution is still to just be a 1-box pre-committer.

Probably. But "there is no reasonable way for you to deduce how you would trick the oracle" is usually not explicitly spelled out in the original problem. It's just left unmentioned, and left as an exercise to the reader as to how or whether it might be tricked.

"A man comes at you with a knife and demands 1000 points from you. If you refuse he will try to stab you and if you get stabbed you lose 10,000 points. What do you do?" Is a question one might ask and argue and debate about. But it is not a logical math puzzle. It depends on empirical facts about the real world and the specific person being asked the question (how good are you at martial arts.) This is not interesting, the rules are not well-defined, and there is not a concrete definitive and objectively correct answer that is true independent of who is answer it.

But "there is no reasonable way for you to deduce how you would trick the oracle" is usually not explicitly spelled out in the original problem. It's just left unmentioned, and left as an exercise to the reader as to how or whether it might be tricked.

That is false, the original formulation of the problem says very explicitly: "all this leads you to believe that almost certainly this being's prediction about your choice in the situation to be discussed will be correct".

If the prediction "will be correct", then you cannot trick it. Furthermore, Robert Nozick goes on to say "You have no reason to believe that you are any different, vis-a-vis predictability, than they are."

This thinking precisely aligns with my hypothesis: two-boxers think they are special, they have free will, they have the ability to choose otherwise, they somehow exist outside the system, they can be the exception regardless of how unlikely that is.

That's not how time works. Sequential games in game theory are formally defined such that at each time step, players can make decisions that depend on events that have happened in the past, but not in the future (or present if there are turns with simultaneous moves from different players). A complete strategy profile for each player is defined as a series of decisions and contingencies that cover every possible game state so it's always possible to know what they would do at any point. At time 1 I do X, with no conditions because nothing else has happened. At time 2 the other person does Y if I X or Z if I don't. At time 3 I do A if they Y and B if they Z. I can't say "at time 1 I do X if they Y" because they haven't Y yet. Which means that whoever goes last IS special, because they get the last decision that nobody can respond to. That's how time works in real life, that is how sequential games that happen over time are defined in standard game theory.

If Y does or does not happen at time 2, then you can only condition on Y after time 2. If you claim to be conditioning on Y and it's time 1 then that's a contradiction. You are either lying, or have invented a brand new version of Game Theory and decision-making theory, which requires literal books to establish, not a couple of sentences at the beginning of a problem.

Two boxers think they are special because you told them they were. When you say "Omega can predict you" you are saying "you are not special." When you say "Omega leaves and then you make your decision afterwards" you are saying "you are special." One boxers believed your first sentence more, two boxers believed the second more. Either way you are lying because they can't both happen simultaneously. If Omega can respond to your decision then it goes after you. You have mathematically described the scenario "you pick one box or two, and then if you picked one box Omega puts more money in the big box", and then appended the statement "Omega goes before you." I am reminded of Yudkowsky's Parable of the Dagger. You can say whatever words you want to say, even if they're self-contradictory. They're just words. But then they no longer map to anything real or meaningful.

If you claim to be conditioning on Y and it's time 1 then that's a contradiction.

No it is not.

You are either lying, or have invented a brand new version of Game Theory and decision-making theory, which requires literal books to establish, not a couple of sentences at the beginning of a problem.

You didn't bother reading my essay, did you?

All this is explained very clearly in it.

When you say "Omega leaves and then you make your decision afterwards" you are saying "you are special."

Wrong.

If it's possible for someone to get away with two boxing and get both boxes, and you can put yourself in that scenario, then you can win by two boxing.

Let's put a number on it -- what successful prediction rate would Omega need to have for you to consider taking both boxes? Depends how badly you need a thousand bucks I guess?

If Omega's predictions have an independent probability p of being wrong, and the ratio of the big box to the small box is R, then two boxing is worth it if p > (R-1)/(2R), which for the original problem where the big box is 1000 times larger would mean that you should only two box if p > 49.95%, meaning its predictions are no better than a coin flip. Which obviously makes sense. The extra box is a thousand times smaller, so it's only worth it if two boxing lowers your chance of getting the first box by less than 1/1000.
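A quick way to check a threshold like this is to write both expected values down. The sketch below assumes one specific error model, which the problem statement does not pin down: Omega mispredicts with the same probability `p` whichever way you choose. Under that assumption, solving `small + p*big > (1 - p)*big` gives a break-even error rate of `(R - 1) / (2R)`. The function names are illustrative.

```python
def ev_one_box(p: float, big: float = 1_000_000) -> float:
    """One-boxing pays `big` only when Omega was right (probability 1 - p)."""
    return (1.0 - p) * big


def ev_two_box(p: float, big: float = 1_000_000, small: float = 1_000) -> float:
    """Two-boxing always pays `small`, plus `big` when Omega was wrong."""
    return small + p * big


def break_even_error_rate(ratio: float) -> float:
    """Error rate above which two-boxing has the higher expected value.

    Assumes a symmetric error rate; `ratio` is big / small.
    Derived from small + p*big = (1 - p)*big.
    """
    return (ratio - 1.0) / (2.0 * ratio)


if __name__ == "__main__":
    p_star = break_even_error_rate(1000)
    print(p_star, ev_one_box(p_star), ev_two_box(p_star))
```

With the original prizes the two expected values cross at an error rate of about 49.95%, so under this model one-boxing comes out ahead for any predictor that is even slightly better than chance.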

The problem is not so much that the problem is mean and unfair and not letting me two box because I'm willing to risk my one box. The problem is that it's not specified to be random. It's not specified at all. You can't solve logical and mathematical problems that aren't well-specified. You can shrug and say "I dunno, If I don't know what's going on I guess one boxing seems more likely to work out for me." If I make Mathwizard's Paradox V2 and say

"There are two boxes. The left box either has $0 or $10. The right box either has $0 or $10000. You can only pick one box to open and keep, which do you pick?" You'd probably pick the right box, because might as well, but this is not a logical deduction which must be a correct answer. Maybe Box 1 has money with higher probability because I'm more willing to give up $10 than $10,000. If I did this demonstration in real life in a classroom, the right box would guaranteed be empty because no way am I sacrificing that much for a demonstration. But if I haven't specified probabilities or anything within the problem then you can only guess. There is no unique solution because there is no unique problem, it's actually a broad class of problems that all satisfy the wording in the premises. Most models that satisfy the axioms of Newcomb's problem have one boxing as the correct solution, but some have two boxing as the correct solution. The notion that you are randomizing between all possible variants of an underspecified problem can't be mathematically resolved without applying a measure to that space, which is not generally how people solve logical problems, and itself still involves semi-arbitrary choices that can result in different answers.

Wolpert and Benford argue that the problem is ill-posed for almost any error rate, so it's not clear that stuffing in a particular number actually helps resolve the problem. I haven't spent all that much time with this problem yet, so I'm not going to commit to saying that I think they're right about this, but it jibes with my intuition.

Generally speaking, in order to have a well-posed game, one must be very formal and precise in many details. Particularly things like order of operations, allowable policy spaces, information sets, and details around estimators. I've become more annoyed by estimators in various problems over time, even apart from the relatively minimal thinking I've done on Newcomb's problem. One of the greatest sources of my criticisms in reviews of submitted papers (or even when my collaborators come to me with a problem set-up and/or proposed solution) revolves around not taking sufficient care around estimators.

I do think that Wolpert/Benford at least suffice in arguing that there are at least two possible formalizations that are sufficiently well-posed. I think it's probably on someone else to either bite the bullet and say they are clearly choosing one form or the other... or to provide a sufficient alternative formalization that makes the details more clear.

Aside on Yudkowsky, relevant for the discussion below and my thoughts generally on these sorts of problems. I wouldn't be surprised if he has/had something in mind like what he did to the prisoners' dilemma problem, with the business about source codes and such. There could be a way to try to resolve Newcomb's problem in a similar fashion, but my perspective is that it would still be proposing a very specific formalization... and one that is not at all just a clear instantiation of the initial problem statement. I might go so far as to say that in the prisoners' dilemma case, he just proposed a different problem, with different policy spaces. Interesting in its own right, sure. Probably correct for that particular formalization of that particular version of the problem, sure. But also kind of just a different problem. In general, even minor tweaks to these aspects of the formulation can result in different games.

Similarly for Newcomb's problem, unless one takes the step of clearly laying out in a formal way exactly what they're going to specify for the domain of the problem (and then, I guess, argue that this is like, 'the one true interpretation of the original problem' or something), then I'm probably going to lean toward just thinking that the original problem is so informally stated as to be ill-posed.

I can't be arsed to do it, but it seems pretty trivial to plug whatever parameters you like into some simulation code and let it run a few million times? Some cases you will take one box but the alien predicted wrong and you get nothing; sometimes you take two and he was wrong and you get 1M + 1K. So long as he is mostly correct I don't see how the EV is not strongly driven by the cases where you pick the mystery box and get $1M -- no loss of free will required.
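A minimal sketch of that simulation might look like the following. It assumes the simplest possible model, which is only one of many that fit the problem's wording: Omega is right with a fixed probability `accuracy` regardless of what you choose. The function name, parameters, and seed are illustrative.

```python
import random

BIG = 1_000_000  # opaque box, filled if one-boxing was predicted
SMALL = 1_000    # transparent box, always there


def simulate(one_box: bool, accuracy: float, trials: int, seed: int = 0) -> float:
    """Average payoff over `trials` games for a player who always makes
    the same choice, against a predictor that is right with probability
    `accuracy` (an assumed, choice-independent error model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        correct = rng.random() < accuracy
        predicted_one_box = one_box if correct else not one_box
        payoff = BIG if predicted_one_box else 0
        if not one_box:
            payoff += SMALL  # two-boxers always take the visible $1k too
        total += payoff
    return total / trials


if __name__ == "__main__":
    for accuracy in (0.6, 0.9, 0.99):
        print(accuracy,
              simulate(True, accuracy, 200_000),
              simulate(False, accuracy, 200_000))
```

Under this model the average is dominated by how often the opaque box is full, so one-boxing wins whenever the predictor is mostly correct. A different model of how Omega errs would need different code, which is the point made in the replies about underspecification.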

My sense tracks with that of @MathWizard. If you add some particular assumptions about the form of the problem, you can code it up, and likely, for a wide range of parameters, 1-boxing is higher EV.

I think the criticism of Wolpert/Benford is also similar in type. (Again, not really having spent sufficient time with it.) That is, they construct two possible interpretations. Either of them, you could just sit down and code. It may even be the case that for a wide range of parameters, EV still points to 1-boxing for both versions. However, my understanding of their claim is that those two codes will be very different. Even the strategy spaces are fundamentally different in their claim. And for a similarly wide range of parameters, the joint distributions will be contradictory. The point is not that the sign may be the same for this particular ratio of prizes; it's that there are just multiple contradictory ways to construct it.

Of course, someone could take the time and search out what ratio of prizes in the respective boxes produces maximum tension between the two interpretations, so that rather than having the two EV calcs mostly pointing in the same direction, we could maximize how often they conflict. That's kind of not the point of the critique, but I suppose it could be done if one found it necessary to really grok the difference between a well-posed and ill-posed problem. Though, like you put it, I probably can't be arsed to do it.

That said, I am almost motivated enough to try it (but it would probably have to wait a few weeks, and then, I'll probably be bored with it). I certainly don't know that we can for sure find parameters where the two possible games differ in terms of sign. If this problem was actually relevant to my research interests, I would absolutely just do it, because it's one where I have a vague sense of, "Wouldn't it have to be amazingly coincidental if the values were different, but the signs were always the same?" And when I sniff at the possibility that there could be an amazing coincidence like that, it's usually an indicator of a really interesting theoretical opportunity.

If omega can be wrong then the entirety of the problem hinges on when/how/why it can be wrong.

No it doesn't. The entirety of intelligent agency relies on partial knowledge, and you want to simplify Newcomb's problem to a problem of simply not having enough knowledge: "if only we knew how the system works then we could cheat the system".

The original formulation of the problem tells you explicitly that you have no reason to believe you will be any different, that is: you will not cheat the system.

It tells you explicitly your choice will be predicted almost certainly.

flip a very slightly weighted coin

The problem states that if you do that, Omega will put nothing in the mystery box.

Literally none of this is explained in the premise.

Yes it is.

It's not that I can't see a way for this to happen, it's that I can imagine a dozen hypothetical ways it could try to do this, and half of them let me two box anyway while half of them don't.

And these hypothetical ways violate what is very explicitly stated in the problem.

In the red/blue button debate, people sometimes argue that blue-pushers only say they push blue, but would actually push red if the experiment were carried out.

From this perspective, the blue/red debate is a misunderstanding:

  • Red-pushers are answering the thought-experiment literally based on the perceived coordination point
  • Blue-pushers are engaging in rhetoric, in an attempt to set the coordination point

Could Newcomb's problem just be a misunderstanding?

  • Two-boxers are answering the thought-experiment literally based on what they would do
  • One-boxers are engaging in rhetoric, precommitting to be the kind of person who wins the game

From this perspective, the blue/red debate is a misunderstanding:

But that's not true in my personal case. I truly would push the blue button, even if that means I die. I would still do it out of principle. It's not rhetoric.

Could Newcomb's problem just be a misunderstanding?

It's not true in my case. I choose one-box because I truly believe that's what is most likely to maximize my reward.

That's an interesting idea with the red/blue button scenario. I don't think it's entirely true, but there are probably some social dynamics things going on that are at least adjacent to it.

But I don't think it's true at all of Newcomb. (Empirically, personally: it's nothing to do with why I'd one-box, or why I say I'd one-box.)

One-boxers are literally answering what they would do, because it's the thing that gets you the most money by the very definition of the problem. We can argue that the problem isn't physically realisable or whatever, but if you accept the problem as stated, then two boxers are just incorrect.

The red blue button experiment is about morality. IMO morality only truly exists for practical, non-abstract circumstances, and the fact that people argue for a difference between what people say and what they would do reinforces this conviction.

Two-boxers fundamentally disbelieve the premise - refuse to engage the actual hypothetical. The strict domination idea, that 'once you're in the room, the money is already there' is discarding the very thing the problem is working with - the predictor predicts you. If you are thinking along two box lines, then the predictor will leave one box empty. Once you have entered the room, the game is already over. The prediction has already been made, and if you're a two-boxer, you've already lost. You have to realize that the only way to win is for the predictor to think you are going to pick one box, and for the predictor to think you are going to one box, since it is extraordinarily accurate, you have to be a one-boxer. You can't solemnly resolve to be a one-boxer while secretly planning to be a two-boxer, because the predictor will pick up on that. You have to actually have the thought patterns of a one-boxer. You have to believe one-boxing is the superior strategy. It's not 'irrational' - it's playing the game. In this specific case, because of the predictor's stipulated accuracy, one-boxing is the strategy that wins. It doesn't matter how the accuracy comes about - a lack of free will, time travel, hand waving woo - it's there. The experiment depends upon it, and discarding it is foolish.

Once you have entered the room, the game is already over. The prediction has already been made, and if you're a two-boxer, you've already lost.

Honestly, I agree with this framing, and I think it's a strong argument for two-boxing.

If I already lost (or won) the "get Omega to make a beneficial prediction" game, then all that remains to do is two-box and collect the consolation prize (perhaps on top of the jackpot). My decision doesn't impact what's in the second box, only my personality at the time of Omega's prediction does, but that's a factor I can't influence because the game from my perspective starts after that.

The question isn't "Omega will choose you for Newcomb's problem in one year, do you try to pre-commit to one-boxing just this once?" It's "you're sitting in a room, Omega has explained the rules to you, the box is already filled." If I one-box now, it won't improve my outcome (in fact it will reduce my payout by $1000 either way). Only already being, per Omega's judgment, the kind of person who would one-box to begin with will, which I can't change retroactively.

If you choose to one box after the decision period by reasoning it out then you are in fact the kind of person who would one box. If you say fuck it, it's too late then you're in fact the kind of person to two box. Thus it still hinges on your decision, albeit the concept of libertarian free will is questionable.

No, it doesn't, because my decision doesn't retroactively change what kind of person I am. The causality goes in the other direction.

Basically, depending on what kind of person I am, Omega offers me a different game.

Your choice reveals what kind of person you are, which omega already knew. If you didn't know what you were going to choose ahead of time that's a mark of your ignorance, not omega's.

I.e. my choice doesn't change anything, it just "reveals" information already known to the relevant player Omega.

What I know or don't know ahead of time doesn't matter, because I'm not making a decision ahead of time.

You can also just model it as omega knowing whether or not you're smart or lucky enough to come up with the right answer to get the $1m. If you pick the right answer you get $1m if you don't then you don't. It's a bit of a brain twister but it works out.

But that's kinda circular, because whether the answer does get me the money depends on Omega's knowledge and decision. So whether I'm the kind of person who Omega rewards is luck, and my decision doesn't retroactively affect it.

In a clockwork universe it's of course all luck all the way down.

Two-boxers fundamentally disbelieve the premise - refuse to engage the actual hypothetical.

Of course, that was my conclusion as well. But the question is why.

I've discussed the problem with many two-boxers and all of them dismiss the high accuracy of the predictor on the basis that your choice doesn't affect the prediction. But they never bother to explain why that's relevant.

Even Robert Nozick in the original paper says if the decision doesn't affect the final state, then one should ignore the accuracy of the predictor. Why? Because that's what he was "led to believe".

There is no reason to discount the correlation just because two-boxers don't see a direct causal link. That's why I devised the sunscreen problem: to show why it's irrational to discount a correlation on the basis of no apparent direct causal link.

In my experience most 1-boxers are 1-boxers because they implicitly believe in backwards causation. 2-boxers are people that realize that backwards causation isn't possible. But then when you realize pre-commitment is an option you should be back at 1-box. If a 2-boxer has heard the argument for pre-commitment and remains a 2-boxer then I don't understand that, but I think most people just don't even get to that point because they're still stuck on the backwards causation part. To be honest I don't really see what recursion has to do with it. It just comes down to "be the kind of person the oracle predicts would guess 1-box, the oracle is smart and accurate enough that you shouldn't try to trick it".

In my experience most 1-boxers are 1-boxers because they implicitly believe in backwards causation.

That is not true. Two-boxers make that claim with zero evidence. I've been told I must believe in backwards causation because I'm a one-boxer.

Let me be clear: I do not believe in backwards causation, and I'm a one-boxer.

To be honest I don't really see what recursion has to do with it.

Then why do you insist on backwards causation? If a and b are correlated, that's all you need to know to make an informed decision, no backwards causation needed.

I think you misunderstood my comment. I'm also a 1-boxer and I don't think you need to believe in backwards-causation to be a 1-boxer. I just think a lot of 1-boxers do.

I'm just trying to explain why I think 2-boxers are 2-boxers. They think "backwards causation is wrong so 1-boxing is wrong". Actually backwards causation is wrong but it doesn't mean 1-boxing is wrong.

I just think a lot of 1-boxers do.

I don't think so. I haven't seen a single one-boxer make that claim.

I'm just trying to explain why I think 2-boxers are 2-boxers. They think "backwards causation is wrong so 1-boxing is wrong".

Yes, that is certainly one of the rationales of two-boxers. But that doesn't mean many one-boxers do actually believe that.

But then when you realize pre-commitment is an option you should be back at 1-box.

But is pre-commitment an option? The problem as usually stated stipulates that by the time Omega has finished explaining the rules to you, the content of the box is already determined. It's too late for pre-committing.

P.S.: Should I be worried for myself because I know what your name refers to?

Even if it's too late for pre-commitment, if you're the kind of person who decides it's not too late and only opens the 1 box, it'll have the money.

By pre-commit I just mean that for the general class of newcomb-like problems you decide you will pick the 1-box option. So as long as you are aware there are problems like this you can do it.

And maybe :)

I don't think the problem is recursive thinking here. Newcomb's problem is fairly simple to analyze recursively. I think the problem is just that people strongly dislike the idea that free will doesn't exist, so much that they are recalcitrant to even accept it as a premise to a thought experiment. Unsurprising, since if it weren't so there would be no discussion of free will, since it obviously cannot exist.

I think the problem is just that people strongly dislike the idea that free will doesn't exist, so much that they are recalcitrant to even accept it as a premise to a thought experiment.

Yes, but the issue is why. I contend there is a common cause for why they believe in free will, and also for why they are two-boxers.

They are committed to see themselves as outside the system they inhabit, in other words the opposite of embedded agency. And I think the reason why they have so much trouble understanding they are part of a system in which their decisions affect their decisions is lack of recursivity fluency.

Ask any two-boxer if they are able to imagine their choice to be causally linked all the way back to the beginning of the universe. I don't think they can.

Which side in the Newcomb debate is supposed to have the hangup about free will? Yudkowsky for example is a two boxer, and I don’t think he would perceive himself to have any psychological obstacles regarding free will in this case.

As others pointed out: Yudkowsky is a one-boxer. Can you provide a single two-boxer that doesn't believe in libertarian free will? I doubt there's any.

Where's yud's 2 box argument? I'm not sure what the very smart 2 boxers believe but one of the most common two box explanations boils down to a disbelief that their actions can be predicted at all because they imagine some kind of free will break after the boxes are set.

I didn't notice the first link was yud himself but unless I'm reading the post wrong he seems like a one boxer? does he take a definitive side elsewhere?

All good, I was really confused because it feels like being a two boxer would have super conflicted with everything he believes in.

The two-box-takers. I don't know what Yudkowsky is thinking.