Newcomb's problem splits people 50/50 into two camps, but the interesting thing is that both sides think the answer is obvious, and both sides think the other side is being silly. When I created a video criticizing Veritasium's video "This Paradox Splits Smart People 50/50", I received a ton of feedback, particularly from the two-box camp, and I simply could not convince anyone of why they were wrong.
That led me to believe there must be some cognitive trap at play: someone must not be seeing something clearly. After a ton of debates, reading the literature, considering similar problems, discussing with LLMs, and just thinking deeply, I believe the core of the problem is recursive thinking.
Some people are fluent in recursivity, and for them certain kinds of problems are obvious, but not everyone thinks the same way.
My essay touches on Newcomb's problem, but the real focus is on why some people are predisposed to a certain choice. I contend that free will, determinism, and the sense of self all affect how people approach Newcomb's problem, and that fluency in recursivity predisposes certain views; in particular, a proper understanding of embedded agency must predispose a particular (correct) choice.
I do not see how any of this is not obvious, but that's part of the problem: it's likely because my prior commitments are not the same as those of people who pick two boxes. But I would like to hear whether any two-boxer can point out a flaw in my reasoning.

The problem with Newcomb's problem is that it basically involves time travel, and generally underspecifies how that time travel works. Consider a similar problem:
Time 1: you discover box 1 with 1,000,000 points
Time 2: you discover a box 2 with 1,000 points
Time 3: someone claiming to be a time traveler shows up and says that if you hand him box 2, he will multiply it by 1,000 and then go back in time to put it in box 1. Actually, he claims, that's where box 1 came from all along, and if you don't give him the 1,000 your box 1 will disappear.
Assuming you are rational/selfish, whether you say yes or no very much depends on whether he's telling the truth. If the problem carefully specifies that he actually is a time traveler telling the truth, and time travel does work this way, then obviously you should give it to him (one box). If this happened in real life, I would not give him anything and two box, because my prior on time travel existing is less than 1/1000 and he's just a liar trying to con me. If the problem is not careful and is ambiguous about his truthfulness then people's answers are going to depend on their trustfulness, suspension of disbelief, or just general attitudes towards how willing they are to buy time travel in a hypothetical logic puzzle.
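The 1/1000 threshold can be checked with a short expected-value sketch. It assumes, as a simplification not spelled out above, that the stranger is either fully truthful with probability `q` or a harmless liar (box 1 stays put whatever you do). The function names are illustrative only.

```python
BOX1 = 1_000_000  # points already sitting in box 1
BOX2 = 1_000      # points in box 2, which the stranger asks for


def ev_hand_over(q: float) -> float:
    """Expected points if you give him box 2; q = P(he is telling the truth)."""
    # Either way you end up with exactly box 1: if he is truthful box 1 stays,
    # if he is lying box 1 was never at risk and box 2 is simply gone.
    return float(BOX1)


def ev_refuse(q: float) -> float:
    """Expected points if you keep box 2."""
    # Truthful: box 1 disappears and you keep box 2. Liar: you keep both.
    return q * BOX2 + (1 - q) * (BOX1 + BOX2)


def break_even_prior() -> float:
    """Value of q at which both choices have the same expected value."""
    return BOX2 / BOX1
```

Under these assumptions, handing over the box only pays off when your prior `q` exceeds `break_even_prior()`, which is the ratio of the small box to the big one.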
Actual Newcomb's problem is basically the same as this, in that decisions you make in the future affect things in the past, and the being making the boxes has to have time travel powers in order to guarantee a 100% success rate (though not all versions of the problem specify this precisely; maybe it just has a 99% success rate, or a vague but high success rate). The reason people so confidently disagree is that in any well-specified version of the problem the answer is obvious, but in any vague underspecification it's ambiguous which well-specified version people will round it to. This is the exact same reason the Monty Hall problem is controversial as well. It's not merely that there is a counter-intuitive answer; it's that the problem specifications are very volatile and people keep leaving important details ambiguous that they shouldn't.
No it doesn't. That's what two-boxers claim in order to fit the problem into their view of reality, but that commits an appeal to incredulity fallacy.
That's not Newcomb's problem: a 100% success rate is never specified, it's "almost certainly". That means close to 100%, not 100%.
You are saying it's not possible for Omega to have such accuracy unless the future affects the past, but you don't provide any justification for that. You are basically saying: "I don't see how X is possible, therefore X is not possible". That is not a valid argument, that's an appeal to incredulity fallacy.
That is precisely why I devised my sunscreen problem. Your argument is the same as saying: "I don't see how efficacy against skin cancer and eating ice cream could be causally related, therefore they are not causally related".
Just because you don't see how Omega could predict your choice almost certainly without backwards causality doesn't mean that it can't.
This is why I complain about it being underspecified. If Omega can be wrong, then the entirety of the problem hinges on when/how/why it can be wrong. If it's possible for someone to get away with two boxing and get both boxes, and you can put yourself in that scenario, then you can win by two boxing. If Omega attempts to minimize its failed prediction rate, maybe you can employ a mixed strategy: flip a very slightly weighted coin so that you one box with probability 50.01% and two box with probability 49.99%. Omega then predicts you will one box, so the big box is always full, and almost half the time you get the small box as a bonus. Can it predict coin tosses before they're made? Can it predict radioactive decays? This is not mere psychology. I'm not saying it's impossible for someone to cold read you and make educated guesses. If I read psychological profiles on people I could guess that sneakier, greedier, more disagreeable people are more likely to two box while straightforward, naive, or chill people are more likely to one box, and probably get like a 70-80% success rate. Is that what Omega is doing? Because then I'm just screwed: I overthink things and seem like a two boxer, and if I bit the bullet and decided to one box I would end up getting nothing because it would wrongly guess me as a two boxer.
Literally none of this is explained in the premise. The problem very much depends on information that is not present. If I give you "MathWizard's Paradox" and say:
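A rough sketch of the weighted-coin idea, for concreteness. It assumes (and this is exactly the kind of detail the problem leaves open) that Omega predicts whichever action is more likely and cannot foresee the coin flip itself; `ev_mixed` is a made-up helper name.

```python
BIG = 1_000_000
SMALL = 1_000


def ev_mixed(p_one_box: float) -> float:
    """Expected payoff when you one box with probability p_one_box.

    Assumption (not in the original problem): Omega predicts the more
    likely action and fills the big box only if that action is one boxing.
    """
    predicted_one_box = p_one_box > 0.5
    big_contents = BIG if predicted_one_box else 0
    # You always take the big box; you also take the small one when the
    # coin says "two box", which happens with probability 1 - p_one_box.
    return big_contents + (1 - p_one_box) * SMALL
```

Under this particular assumption, `ev_mixed(0.5001)` comes out a little under 500 above the pure one-boxing payoff `ev_mixed(1.0)`. A different assumption about how Omega handles randomizers would give a different answer.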
"There are two boxes. The left box has some money. The right box has a different amount of money not equal to the left box. You only get one box, which one do you pick?"
This likewise is going to lead to disagreement (or would, if people cared and tried to argue about it). If I added a whole bunch of window dressing to disguise the obvious stupidity of this problem, a bunch of superficial characteristics that made it seem more interesting and less obvious, it wouldn't change the underlying symmetry and lack of information. I have, in my head, decided how much money is in each box. There is a correct answer. But I haven't told you enough information for you to deduce it, and there are infinite variations of this problem, half of which have the opposite correct answer.
It's not that I can't see a way for this to happen, it's that I can imagine a dozen hypothetical ways it could try to do this, and half of them let me two box anyway while half of them don't.
But why can't the problem just be that it's a very good predictor but it's not doing anything supernatural and the way it does the prediction is simply unknown to you? You need a clear specification to know if it's possible to trick the oracle. But even if it's possible, if there's no reasonable way for you to deduce how you would trick the oracle then strategically the solution is still to just be a 1-box pre-committer.
Probably. But "there is no reasonable way for you to deduce how you would trick the oracle" is usually not explicitly spelled out in the original problem. It's just left unmentioned, and left as an exercise to the reader as to how or whether it might be tricked.
"A man comes at you with a knife and demands 1000 points from you. If you refuse he will try to stab you and if you get stabbed you lose 10,000 points. What do you do?" is a question one might ask and argue and debate about. But it is not a logical math puzzle. It depends on empirical facts about the real world and the specific person being asked the question (how good are you at martial arts?). This is not interesting, the rules are not well-defined, and there is not a concrete, definitive, and objectively correct answer that is true independent of who is answering it.
That is false, the original formulation of the problem says very explicitly: "all this leads you to believe that almost certainly this being's prediction about your choice in the situation to be discussed will be correct".
If the prediction "will be correct", then you cannot trick it. Furthermore, Robert Nozick goes on to say: "You have no reason to believe that you are any different, vis-a-vis predictability, than they are."
This thinking precisely aligns with my hypothesis: two-boxers think they are special, they have free will, they have the ability to choose otherwise, they somehow exist outside the system, they can be the exception regardless of how unlikely that is.
That's not how time works. Sequential games in game theory are formally defined such that at each time step, players can make decisions that depend on events that have happened in the past, but not in the future (or the present, if there are turns with simultaneous moves from different players). A complete strategy profile for each player is defined as a series of decisions and contingencies that cover every possible game state, so it's always possible to know what they would do at any point. At time 1 I do X, with no conditions because nothing else has happened yet. At time 2 the other person does Y if I did X, or Z if I didn't. At time 3 I do A if they did Y and B if they did Z. I can't say "at time 1 I do X if they Y" because they haven't Y yet. Which means that whoever goes last IS special, because they get the last decision that nobody can respond to. That's how time works in real life, and that is how sequential games that happen over time are defined in standard game theory.
If Y does or does not happen at time 2, then you can only condition on Y after time 2. If you claim to be conditioning on Y and it's time 1 then that's a contradiction. You are either lying, or have invented a brand new version of Game Theory and decision-making theory, which requires literal books to establish, not a couple of sentences at the beginning of a problem.
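To make the point about conditioning concrete, here is a minimal sketch of a three-step sequential game in which each strategy is a function of the history so far. The move labels and function names are placeholders, not anything from the formal literature.

```python
from typing import Callable, Sequence

History = Sequence[str]              # moves made so far, oldest first
Strategy = Callable[[History], str]  # maps what has happened to a move


def play_sequential(strategies: Sequence[Strategy]) -> list[str]:
    """Run one move per time step; each move sees only earlier moves."""
    history: list[str] = []
    for strategy in strategies:
        move = strategy(tuple(history))  # read-only snapshot of the past
        history.append(move)
    return history


def me_at_time_1(history: History) -> str:
    return "X"  # unconditional: the history is empty at time 1


def them_at_time_2(history: History) -> str:
    return "Y" if history[0] == "X" else "Z"


def me_at_time_3(history: History) -> str:
    return "A" if history[1] == "Y" else "B"
```

Calling `play_sequential([me_at_time_1, them_at_time_2, me_at_time_3])` plays the three moves in order. There is no way to write `me_at_time_1` so that it depends on the time-2 move, because that move is not in the history it receives.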
Two boxers think they are special because you told them they were. When you say "Omega can predict you" you are saying "you are not special." When you say "Omega leaves and then you make your decision afterwards" you are saying "you are special." One boxers believed your first sentence more, two boxers believed the second more. Either way you are lying because they can't both happen simultaneously. If Omega can respond to your decision then it goes after you. You have mathematically described the scenario "you pick one box or two, and then if you picked one box Omega puts more money in the big box", and then appended the statement "Omega goes before you." I am reminded of Yudkowsky's Parable of the Dagger. You can say whatever words you want to say, even if they're self-contradictory. They're just words. But then they no longer map to anything real or meaningful.
No it is not.
You didn't bother reading my essay, did you?
All this is explained very clearly in it.
Wrong.
Let's put a number on it -- what successful prediction rate would Omega need to have for you to consider taking both boxes? Depends how badly you need a thousand bucks I guess?
If Omega's predictions have an independent probability p of being wrong, and the ratio of the big box to the small box is R, then (measuring in units of the small box) one boxing is worth (1 - p)R in expectation and two boxing is worth 1 + pR, so two boxing is worth it only if p > (R - 1)/(2R). For the original problem, where the big box is 1000 times larger, that means you should only two box if p > 49.95%, meaning Omega is essentially no better than a coin flip. Which obviously makes sense. The extra box is a thousand times smaller, so it's only worth it if two boxing lowers your chance of a full first box by less than 1/1000.
The problem is not so much that the problem is mean and unfair and not letting me two box because I'm willing to risk my one box. The problem is that it's not specified to be random. It's not specified at all. You can't solve logical and mathematical problems that aren't well-specified. You can shrug and say "I dunno, if I don't know what's going on I guess one boxing seems more likely to work out for me." If I make MathWizard's Paradox V2 and say:
"There are two boxes. The left box either has $0 or $10. The right box either has $0 or $10000. You can only pick one box to open and keep, which do you pick?" You'd probably pick the right box, because might as well, but this is not a logical deduction which must be a correct answer. Maybe the left box has money with higher probability because I'm more willing to give up $10 than $10,000. If I did this demonstration in real life in a classroom, the right box would be guaranteed to be empty, because no way am I sacrificing that much for a demonstration. But if I haven't specified probabilities or anything within the problem then you can only guess. There is no unique solution because there is no unique problem; it's actually a broad class of problems that all satisfy the wording in the premises. Most models that satisfy the axioms of Newcomb's problem have one boxing as the correct solution, but some have two boxing as the correct solution. The notion that you are randomizing between all possible variants of an underspecified problem can't be mathematically resolved without applying a measure to that space, which is not generally how people solve logical problems, and itself still involves semi-arbitrary choices that can result in different answers.
Wolpert and Benford argue that the problem is ill-posed for almost any error rate, so it's not clear that stuffing in a particular number actually helps resolve the problem. I haven't spent all that much time with this problem yet, so I'm not going to commit to saying that I think they're right about this, but it jibes with my intuition.
Generally speaking, in order to have a well-posed game, one must be very formal and precise in many details. Particularly things like order of operations, allowable policy spaces, information sets, and details around estimators. I've become more annoyed by estimators in various problems over time, even apart from the relatively minimal thinking I've done on Newcomb's problem. One of the greatest sources of my criticisms in reviews of submitted papers (or even when my collaborators come to me with a problem set-up and/or proposed solution) revolves around not taking sufficient care around estimators.
I do think that Wolpert/Benford at least suffice in arguing that there are at least two possible formalizations that are sufficiently well-posed. I think it's probably on someone else to either bite the bullet and say they are clearly choosing one form or the other... or to provide a sufficient alternative formalization that makes the details more clear.
Aside on Yudkowsky, relevant for the discussion below and my thoughts generally on these sorts of problems. I wouldn't be surprised if he has/had something in mind like what he did to the prisoners' dilemma problem, with the business about source codes and such. There could be a way to try to resolve Newcomb's problem in a similar fashion, but my perspective is that it would still be proposing a very specific formalization... and one that is not at all just a clear instantiation of the initial problem statement. I might go so far as to say that in the prisoners' dilemma case, he just proposed a different problem, with different policy spaces. Interesting in its own right, sure. Probably correct for that particular formalization of that particular version of the problem, sure. But also kind of just a different problem. In general, even minor tweaks to these aspects of the formulation can result in different games.
Similarly for Newcomb's problem, unless one takes the step of clearly laying out in a formal way exactly what they're going to specify for the domain of the problem (and then, I guess, argue that this is like, 'the one true interpretation of the original problem' or something), then I'm probably going to lean toward just thinking that the original problem is so informally stated as to be ill-posed.
I can't be arsed to do it, but it seems pretty trivial to plug whatever parameters you like into some simulation code and let it run a few million times? Some cases you will take one box but the alien predicted wrong and you get nothing; sometimes you take two and he was wrong and you get 1M + 1K. So long as he is mostly correct I don't see how the EV is not strongly driven by the cases where you pick the mystery box and get $1M -- no loss of free will required.
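Something along these lines would do it. This is only a sketch under one particular assumption: the predictor is right with a fixed probability `accuracy`, independently of your choice and of everything else. `play` and `average_payoff` are names made up for the example.

```python
import random

BIG = 1_000_000
SMALL = 1_000


def play(one_box: bool, accuracy: float, rng: random.Random) -> int:
    """Payoff of one round against a predictor that is right with
    probability `accuracy` (assumed independent of the choice)."""
    correct = rng.random() < accuracy
    predicted_one_box = one_box if correct else not one_box
    big_contents = BIG if predicted_one_box else 0
    return big_contents if one_box else big_contents + SMALL


def average_payoff(one_box: bool, accuracy: float,
                   rounds: int = 100_000, seed: int = 0) -> float:
    """Average payoff over many simulated rounds with a seeded generator."""
    rng = random.Random(seed)
    return sum(play(one_box, accuracy, rng) for _ in range(rounds)) / rounds
```

With `accuracy=0.9`, for example, the expected payoffs are 900,000 for one boxing and 101,000 for two boxing, and `average_payoff(True, 0.9)` and `average_payoff(False, 0.9)` should land close to those figures. Other models of how the predictor errs would need different code.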
My sense tracks with that of @MathWizard. If you add some particular assumptions about the form of the problem, you can code it up, and likely, for a wide range of parameters, 1-boxing is higher EV.
I think the criticism of Wolpert/Benford is also similar in type. (Again, not really having spent sufficient time with it.) That is, they construct two possible interpretations. Either of them, you could just sit down and code. It may even be the case that for a wide range of parameters, EV still points to 1-boxing for both versions. However, my understanding of their claim is that those two codes will be very different. Even the strategy spaces are fundamentally different in their claim. And for a similarly wide range of parameters, the joint distributions will be contradictory. The point is not that the sign may be the same for this particular ratio of prizes; it's that there are just multiple contradictory ways to construct it.
Of course, someone could take the time and search out what ratio of prizes in the respective boxes produces maximum tension between the two interpretations, so that rather than having the two EV calcs mostly pointing in the same direction, we could maximize how often they conflict. That's kind of not the point of the critique, but I suppose it could be done if one found it necessary to really grok the difference between a well-posed and ill-posed problem. Though, like you put it, I probably can't be arsed to do it.
That said, I am almost motivated enough to try it (but it would probably have to wait a few weeks, and then, I'll probably be bored with it). I certainly don't know that we can for sure find parameters where the two possible games differ in terms of sign. If this problem was actually relevant to my research interests, I would absolutely just do it, because it's one where I have a vague sense of, "Wouldn't it have to be amazingly coincidental if the values were different, but the signs were always the same?" And when I sniff at the possibility that there could be an amazing coincidence like that, it's usually an indicator of a really interesting theoretical opportunity.
No it doesn't. The entirety of intelligent agency relies on partial knowledge, and you want to simplify Newcomb's problem to a problem of simply not having enough knowledge: "if only we knew how the system works then we could cheat the system".
The original formulation of the problem tells you explicitly that you have no reason to believe you will be any different, that is: you will not cheat the system.
It tells you explicitly your choice will be predicted almost certainly.
The problem states that if you do that, Omega will put nothing in the mystery box.
Yes it is.
And these hypothetical ways violate what is very explicitly stated in the problem.