Newcomb's problem splits people 50/50 in two camps, but the interesting thing is that both sides think the answer is obvious, and both sides think the other side is being silly. When I created a video criticizing Veritasium's video This Paradox Splits Smart People 50/50 I received a ton of feedback particularly from the two-box camp and I simply could not convince anyone of why they were wrong.
That lead me to believe there must be some cognitive trap at play: someone must be not seeing something clearly. After a ton of debates, reading the literature, considering similar problems, discussing with LLMs, and just thinking deeply, I believe the core of the problem is recursive thinking.
Some people are fluent in recursivity, and for them certain kind of problems are obvious, but not everyone thinks the same way.
My essay touches Newcomb's problem, but the real focus is on why some people are predisposed to a certain choice, and I contend free will, determinism, and the sense of self, all affect Newcomb's problem and recursivity fluency predisposes certain views, in particular a proper understanding of embedded agency must predispose a particular (correct) choice.
I do not see how any of this is not obvious, but that's part of the problem, because that's likely due to my prior commitments not being the same as the ones of people who pick two-boxes. But I would like to hear if any two-boxer can point out any flaw in my reasoning.

Jump in the discussion.
No email address required.
Notes -
Wolpert and Benford argue that the problem is ill-posed for almost any error rate, so it's not clear that stuffing in a particular number actually helps resolve the problem. I haven't spent all that much time with this problem yet, so I'm not going to commit to saying that I think they're right about this, but it jives with my intuition.
Generally speaking, in order to have a well-posed game, one must be very formal and precise in many details. Particularly things like order of operations, allowable policy spaces, information sets, and details around estimators. I've become more annoyed by estimators in various problems over time, even apart from the relatively minimal thinking I've done on Newcomb's problem. One of the greatest sources of my criticisms in reviews of submitted papers (or even when my collaborators come to me with a problem set-up and/or proposed solution) revolves around not taking sufficient care around estimators.
I do think that Wolpert/Benford at least suffice in arguing that there are at least two possible formalizations that are sufficiently well-posed. I think it's probably on someone else to either bite the bullet and say they are clearly choosing one form or the other... or to provide a sufficient alternative formalization that makes the details more clear.
Aside on Yudkowsky, relevant for the discussion below and my thoughts generally on these sorts of problems. I wouldn't be surprised if he has/had something in mind like what he did to the prisoners' dilemma problem, with the business about source codes and such. There could be a way to try to resolve Newcomb's problem in a similar fashion, but my perspective is that it would still be proposing a very specific formalization... and one that is not at all just a clear instantiation of the initial problem statement. I might go so far as to say that in the prisoners' dilemma case, he just proposed a different problem, with different policy spaces. Interesting in its own right, sure. Probably correct for that particular formalization of that particular version of the problem, sure. But also kind of just a different problem. In general, even minor tweaks to these aspects of the formulation can result in different games.
Similarly for Newcomb's problem, unless one takes the step of clearly laying out in a formal way exactly what they're going to specify for the domain of the problem (and then, I guess, argue that this is like, 'the one true interpretation of the original problem' or something), then I'm probably going to lean toward just thinking that the original problem is so informally stated as to be ill-posed.
I can't be arsed to do it, but it seems pretty trivial to plug whatever parameters you like into some simulation code and let it run a few million times? Some cases you will take one box but the alien predicted wrong and you get nothing; sometimes you take two and he was wrong and you get 1M + 1K. So long as he is mostly correct I don't see how the EV is not strongly driven by the cases where you pick the mystery box and get $1M -- no loss of free will required.
My sense tracks with that of @MathWizard. If you add some particular assumptions about the form of the problem, you can code it up, and likely, for a wide range of parameters, 1-boxing is higher EV.
I think the criticism of Wolpert/Benford is also similar in type. (Again, not really having spent sufficient time with it.) That is, they construct two possible interpretations. Either of them, you could just sit down and code. It may even be the case that for a wide range of parameters, EV still points to 1-boxing for both versions. However, my understanding of their claim is that those two codes will be very different. Even the strategy spaces are fundamentally different in their claim. And for a similarly wide range of parameters, the joint distributions will be contradictory. The point is not that the sign may be the same for this particular ratio of prizes; it's that there are just multiple contradictory ways to construct it.
Of course, someone could take the time and search out what ratio of prizes in the respective boxes produces maximum tension between the two interpretations, so that rather than having the two EV calcs mostly pointing in the same direction, we could maximize how often they conflict. That's kind of not the point of the critique, but I suppose it could be done if one found it necessary to really grok the difference between a well-posed and ill-posed problem. Though, like you put it, I probably can't be arsed to do it.
That said, I am almost motivated enough to try it (but it would probably have to wait a few weeks, and then, I'll probably be bored with it). I certainly don't know that we can for sure find parameters where the two possible games differ in terms of sign. If this problem was actually relevant to my research interests, I would absolutely just do it, because it's one where I have a vague sense of, "Wouldn't it have to be amazingly coincidental if the values were different, but the signs were always the same?" And when I sniff at the possibility that there could be an amazing coincidence like that, it's usually an indicator of a really interesting theoretical opportunity.
More options
Context Copy link
More options
Context Copy link
More options
Context Copy link