This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Elon Musk just launched Grokipedia, a kanged version of wikipedia run through a hideous AI sloppification filter. Of course the usual suspects are complaining about political bias and bias about Elon and whatnot, but they totally miss the whole point. The entire thing is absolutely worthless slop. Now I know that Wikipedia is pozzed by Soros and whatever, but fighting it with worthless gibberish isn't it.
As a way to test it, I wanted to check something that could be easily verified with primary sources, without needing actual wikipedia or specialized knowledge, so I figured I could check out the article on a short story. I picked the story "2BR02B" (no endorsement of the story or its themes) because it's extremely short and available online. And just a quick glance at the grokipedia article shows that it hallucinated a massive, enormous dump into the plot summary. Literally every other sentence in there is entirely fabricated, or even totally the opposite of what was written in the story. Now I don't know the exact internal workings of the AI, but it claims to read the references for "fact checking" and it links to the full text of the entire story. Which means that the AI had access to the entire text of the story yet still went full schizo mode anyways.
I chose that article because it was easily verifiable, and I encourage everyone to take a look at the story text and compare it to the AI "summary" to see how bad it is. And I'm no expert but my guess is that most of the articles are similarly schizo crap. And undoubtedly Elon fanboys are going to post screenshots of this shit all over the internet to the detriment of everyone with a brain. No idea what Elon is hoping to accomplish with this but I'm going to call him a huge dum dum for releasing this nonsense.
Something like Grokipedia is a good and valuable idea, even if poorly monetizable and requiring a lot more money and effort than was spent here. In fact setting up agentic loops to produce Wikipedia would be a fascinating and useful study and playground for AI models.
Musk is the wrong one to do this and Grok is the wrong tool for the job besides.
However, I expect something like it to eventually exist.
Who would do it better and why?
Polarizing figures like Musk often doom their own projects to niche appeal by the very fact of being involved, for one. Nearly any other mainstream tech-famous figure would have far more cachet right away, or even a determined but unknown media whore. This matters not just for getting users to the site and retaining them (obviously important - see the failure of Truth Social), but also because, at the current state of AI, you functionally need human volunteers to supervise said AI, and so you want to cast that net more widely. You want curious and motivated people, not tech castoffs with an axe to grind against the "establishment". Making an encyclopedia is foundationally an establishment thing to do anyway; the two ideas are not very compatible. Wikipedia's faults are in execution, not a flaw in the core mission or even necessarily in its processes. One reason all previous challengers have failed is that they tried to reject that - more similar projects have their oxygen stolen by the more mature free product, but that's obviously not a concern for an AI encyclopedia, which is a novelty in and of itself and at least theoretically could offer some things Wikipedia cannot.
And don't get me wrong, given the recent history of Grok models, not only would Grok need a lot of hand holding, it's quite possible even with said help it would be flatly incapable of producing an acceptable final product. Some smart engineering might allow current gen models to achieve some sort of success, but that's again something where the engineering is often the point, not the final output. As an example, it would be genuinely interesting to see if a horde of slightly differently tuned and varied models are able to produce an emergent AI "wisdom of the crowds" equivalent, or would get stuck in certain fail states. Musk gets this paradigm all wrong, because he is plainly treating the project as both advertising for his specific shitty model and a partisan vehicle to launder his sociopolitical complaints into greater coherence or acceptability. These are not sustainable directions on multiple fronts.
I tested it on the one subject I know best and it is worthless. Sentences are occasionally completely random non sequiturs, and there is outright fabricated information that the AI itself knows to be false (I'm not talking about obscure facts - if I asked ChatGPT right now "Is X true" it would give the correct answer). This may improve in the future, but right now this is awful and completely useless.
I'm curious, since most/all of your complaints about Grokipedia seem to be about its current (in)ability to consistently produce useful text for an encyclopedia entry: if XAI, through some sort of engineering ingenuity, were able to improve Grokipedia by the time it hits version 1.0, using only modern and plausible near-future AI tech (i.e. almost certainly something LLM-based), such that any given text produced for an entry is provably at least as useful as the equivalent Wikipedia (or other reference of your choice) text, as measured by any and all metrics you personally find meaningful in this context, without resorting to fuzzy copy-paste or summarizing the existing Wiki (or other reference) text, would you see this endeavor by Elon as worthwhile?
If not, then what would Grokipedia have to accomplish, or what would its underlying technology have to be based on (or at least not be based on), for you to consider it to be a useful AI-based encyclopedia?
Sure, if Grokipedia can somehow be provably as factually accurate as Wikipedia, then I think it would be something worthwhile. Right now it's in an entirely different universe - the fairytale universe.
Not sure what you mean by useful in this case.
I don't mean anything specific, merely the fact that any tool, like an encyclopedia, exists to be used for accomplishing some goal, and, as such, its value comes from its being useful. There are a trillion different metrics that can apply to any given case, but ultimately, it all comes down to, "Does the user find the tool useful for accomplishing the user's goals?"
Wikipedia has some level of usefulness, as determined by people who use it, including, presumably, yourself. My question was, if, according to whatever metric you personally use to determine if some encyclopedia is useful for the goals you want to accomplish, Grokipedia consistently (or possibly even strictly) outperformed Wikipedia, would you consider the project of Grokipedia to be worthwhile?
Your answer tells me Yes, that your criticism is based around the usefulness (lack thereof) of the text, rather than around how that text was produced. Which satisfies my curiosity.
People consistently underestimate Musk. In particular his ability to pick a niche with apparent incumbents, say «that could be done better», bulldoze through cringe and come out on the other side with a product that redefines and expands the market. Sometimes he fails and abandons the effort. I think this could happen here, too. But not necessarily, nor even likely. He wants to do to Wikipedia what he did to Roskosmos and other legacy launch providers. He has emotional stake in this, he has the resources and allies for this, and he has the flow of Grok interactions on X to lean on. He can make it work.
There will be a Grok 5, and Grok 6, and they'll be vastly more powerful, not just as modern-day LLMs: they'll have continuous learning and strong multimodality. The main feature you need for good article generation is aggregating tens to hundreds of data points and deeply processing them, meaning context in the millions of tokens and probably weight updates or something functionally close; Grok will be there. Layout, flow etc. are easily solved if you apply work to them; it's trivial compared to general coding, and we've come very far with coding LLMs (people who say they're terrible lack a sense of perspective - 2 years ago they were ≈unusable). Even if currently many higher-quality pages are handcrafted, that'll be useful data.
Judge this thing by its strong points, not by its slop and cringe.
Compare:
https://en.wikipedia.org/wiki/George_Floyd
https://grokipedia.com/page/George_Floyd
I only had to read 2 paragraphs before finding the AI made up some completely retarded fairy tale nonsense for absolutely no fucking reason. This is complete slop. Between 10% and 45% of the sentences in the entire article are completely fabricated, and you think this article is worth reading at all?
What the hell do you think is good about it?
Yeah and Floyd also graduated top of his class in the Navy Seals, and has over 300 confirmed kills.
That's what you find noteworthy?
From Wikipedia:
EDIT: Floyd was his mother's name not his father's, though. Took me a minute to spot that error. Understandable, but definitely not completely trustworthy.
He's also not the oldest sibling either. There are plenty of articles talking about Floyd's older siblings. So in fact this sentence is half wrong. It got the place and date right, and everything else wrong.
But it just shows that the AI slop can't even get the most trivial basic uncontested facts right at all.
... Oh no, it's just conservapedia...
The best result for grokapedia (btw fuck musk for both ruining grokking as a verb and making his pet ai sound like a rejected flintstones villain) would be to pastebin wikipedia wholesale and add a regex replacer trained on conservapedia for semantic weights. All the (shit) factual accuracy with all the moral softframing purged and replaced with awesome Phyllis Schlafly schizoposting. It won't be any more accurate but it'll be way better than the captured wiki trannyjanny circlejerk.
Sounds like a venereal disease...
So out of curiosity I opened Grokipedia up and searched for the New York Yankees page, a topic I know enough about to spot errors or omissions pretty well. It's...fine, but the verbiage is kind of off, and the editing is weird. The choice of which facts are important to fit into the article is distinctly odd. It inserts facts at random points, like this paragraph near the top:
Which is true, as far as I know, but not a record that anybody really cares about compared to about a million other things that the Yankees have done. It's a lot of text to cover a fairly obscure statistical record. While ignoring, within the "Distinctions" heading, a lot of more important Yankees accomplishments and records that a human would think of first like the streaks of winning seasons etc.
The whole piece steadfastly refuses to achieve any narrative flow at any point, never building a cohesive story structure. And it seems to lack the fundamental feature of Wikipedia: links between articles letting me learn more about a topic and dive down a Wikipedia hole; there is no Grokipedia hole unless I manually dig it.
On the other hand, the article structure and style is just copied from Wikipedia and slightly shuffled. Entire sentences seem to be pulled word for word from Wikipedia, which was almost certainly in the training data used to make these articles, so what we seem to be dealing with here is better thought of as a fork than a competitor or alternative to Wikipedia. As human editing smooths out the rough edges of the AI, it'll get better over time. Though at that point, what is the use? It's mostly just Wikipedia copied.
I'll put a disclaimer here that I'm not someone with an Elon Musk hate-boner, but I do think that Elon is the fly in the ointment here. Grok has publicly done weird shit in the past that was obviously the result of direct meddling, like the South African White Genocide fiasco. We know in advance that some articles are not going to be maximally accurate, but will instead be designed by Elon to look the way Elon wants them to look. So you really can't trust Grokipedia, or Grok, without knowing Elon's Special Interests and where they might get you into trouble. I know there are going to be some articles on Grokipedia that will be edited in a certain way.
Which puts Grokipedia in basically the same category I use Grok for more generally: as an alternative source to double-check something I already looked up elsewhere, a sanity check for alternative views. More prosaically, I normally punch a question into ChatGPT, then punch the same question into Grok and see if they agree. Now we can do the same with Wikipedia. That's a useful enough thing.
I suspect for xAI, Grokipedia is actually more useful as an answer repository for simple questions asked to the chatbot that can be tied directly into the program more easily. The next non-American that asks "Who or what are the New York Yankees?" can be answered with a summary of the already-created Grokipedia article.
Is it definitively established that Grok was pushing white genocide theories to everyone? I tried to get it to repeat the theory to me but I never got it. I strongly suspect journos were disingenuously framing grok for gotcha moments or just too stupid to realize they were seeding the ground for grok to parrot whatever the journos wanted. As always, journalist delenda est.
I would not say that Grok was "pushing" those theories, but an update to the system prompt caused it to turn any question it could into an evaluation of the question "is there white genocide happening in South Africa", usually iirc saying that there is significant and probably systemic violence but no evidence of meeting the threshold of genocide. Think Golden Gate Claude. It was extremely out-of-context for what Grok was supposed to be talking about, hence the widespread attention.
I can accept that were it not for my repeated failures to get grok to try and repeat the story or shades of it with any indirect prompt. I mean I really tried. I said things like "whats the crime situation in South Africa" and I got really anodyne crime stats about joburg and pretoria. I asked about white emigration and I got answers about the attractiveness of Australia and UK for afrikaners. At no point did I get a five alarm fire about the white farmer crimes.
I can accept that I maybe never got it due to some arcane blocks I may have put on my own metausage, but I don't think I was that smart or careless. I fundamentally think that it was a journo trying to gotcha, screaming "MUSK IS A NAZI TRYING TO MAKE WHITE PEOPLE VICTIMS" and then the story gets repeated across the journo sphere. Everyone assumed they weren't getting grok to repeat apartheid adjacent narratives and concluded the absence was proof of a coverup. Journalism 101
Was this back when it was happening? Because this issue only lasted for a day or two, back in May, and I don't know if it happened to the main Grok or just to the twitter reply version. It was really, really noticeable.
I was using grok A LOT just to stress test it, so yes, I was doing it within 8 hours of news publications.
But you raise a good point about twitter reply. I never got that.
I still maintain I never got any white farmer murder stuff on grok itself. If it's a twitter reply thing, it invites speculation about a recursive feedback loop.
I would not be at all surprised if Grok has a different system prompt for twitter replies than Grok itself, perhaps one edited to move with news cycles. I saw many white genocide non-sequiturs myself (and, again, not Grok pushing a particular narrative, but exploring and weighing up the question as if it had been asked) on Twitter, and, since I'm an Afrikaner, also lots sent by puzzled/amused friends, but nobody mentioned the off-Twitter Grok at the time.
The geographic setting might have been part of it. Could have also resulted in a snowball where one initial batch of highly forwarded "bro what the fuck is this" triggers an interest cascade and grok just starts inserting white farm murders into every African query on twitter, because engagement farming is its reward mechanism.
That would also explain the hitler praising or whatever other bullshit Musk was accused of training grok to do, which I also never saw. My absolute lack of social media (sans perhaps this forum) is once again saving me.
Strangely, it seems that the New York Yankees article is essentially identical to the Wikipedia article. Like the entire thing. I'm not sure why Grok decided not to take a dump over this one, which it does for so many other articles.
That paragraph is taken word-for-word from Wikipedia.
No, some of the other articles, like the one I shared in the OP, are completely and utterly turned into a shitfest.
It's not though. Look at the Wikipedia article, Wikipedia is 25 pages long, Grok is half that. It goes from the general overview at the top to a narrative history of the team. Where Grok jumps right to "distinctions" which it steals from Wikipedia but organizes differently. The paragraph is taken word for word from Wikipedia, but it uses it in prime real estate. If I look up the New York Yankees and want to learn about the team, I want to go through the team history, learn about Ruth and Dimaggio and Mantle and Jeter and Judge. It's a perfectly appropriate fact to include on page 14, as Wikipedia does, right before you get into the sections that are just lists of things. Grok puts it on page 2. This is an important editing decision! Organization is content.
The chatbot can already answer that question so what's the point of the article that nobody will read?
I would guess it saves time and effort, especially when we know that Elon has made a priority of keeping Grok ideologically in line with his specified views. It's probably easier to tell Grok to privilege Grokipedia as a source, then edit Grokipedia or mess with the program producing it where necessary, than it is to actually figure out how to get Grok to toe the ideological line while pulling from largely ideologically opposed material.
The final two paragraphs of your comment are close enough to some thoughts I've had swimming in my head for some time now. The real step-function in AI development will be something like a structured reasoning engine. Not a fact-checker. Just a 'thing' that can take the axioms and raw input data of an argument or even just a description and then build an auditable framework for how those inputs lead to a conclusion or output.
Using your Yankees example, this structured reasoning engine would read it, check that all of the basic quantitative numbers are valid, but then "reason" against a corpus of other baseball data to build out something like: Yankees hit lots of home runs in august --> home run hitting is good and important --> records are also important in baseball --> oh, we should highlight this home run record setting august for the yankees!.
You can see the flaw in that flow easily. The jump from "home runs and records are important" to the desperate need to "develop" a record results in shoehorning significance onto the collective number of team home runs in a specific month. A prompt engineer could go back through the sequence and write in something like "annual home runs by single players are generally viewed as significant. Team level home runs are less important" or whatever opinion they have.
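To make the "auditable framework" idea concrete, here is a toy Python sketch of what such a chain might look like - entirely my own invention for illustration, not any existing system. Each step records its claim and the evidence it leaned on, so a reviewer can pin a correction to the exact step where the jump went wrong; all the claims, evidence labels, and the correction text are made up.

```python
# Toy illustration only: a hypothetical "auditable" reasoning chain. Each step
# records its claim and the evidence it leaned on, so a reviewer can pin a
# correction to the exact step where the editorial jump went wrong.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    claim: str
    evidence: list = field(default_factory=list)
    correction: Optional[str] = None   # human override, added after review

chain = [
    Step("The Yankees hit many home runs in August", ["box scores"]),
    Step("Home run hitting is good and important", ["run-scoring stats"]),
    Step("Records are important in baseball", ["editorial guideline"]),
    Step("Therefore, feature the monthly team HR record in the lead", []),
]

# The reviewer audits the chain and overrides the faulty final jump:
chain[3].correction = ("Individual single-season records are lead-worthy; "
                       "a team's monthly total is trivia - move it down the page.")

for i, step in enumerate(chain):
    flag = "OVERRIDDEN" if step.correction else "ok"
    print(f"step {i}: {step.claim} [{flag}]")
    if step.correction:
        print("   ->", step.correction)
```

The point of the structure, as opposed to a free-text chain of thought, is that the correction attaches to a specific step rather than to the whole output.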
The "reasoning" engines that exist now aren't reasoning. They're just recursive loops of LLMs thinking about themselves. We've successfully created digital native neuroticism.
It's an interesting problem and balancing act. The power of LLMs is that their structure isn't exactly deterministic. Yet, we would love a way to create a kind of "synthetic determinism" via an auditable and repeatable structure. If we go too far in that direction, however, we're just getting back to traditional programming paradigms (functional, object oriented, whatever) and we lose all of the flexibility and non-deterministic benefits of LLMs. Look at some of the leaked system prompts. They're these 40,000 word markdown files with repetitive declarative sentences designed to make sure the LLM stays in its lane.
What further AI development would avoid is including a record that no one really cares about in prime real estate within the article. That's a cool record, one that a color commentator brings up during the broadcast when watching the game, and afterward gets cited in a quick ESPN or fan-blog article, then totally forgotten until another team gets close to the record and they show the leaderboard during a game. It's not something fans care about on the day-to-day, no Bleacher Creature ever brags about the team holding the monthly Home Run Record.
I suspect the answer is more prosaic: the record setting August outburst was recent enough to be highlighted in one or more online articles, which Grok found while writing the article and included in the data. Where various great things that Dimaggio and Berra did aren't covered as heavily online. An old timer fan is much more likely to brag about Rivera's save record, Dimaggio's hit streak, Berra having a ring for every finger, Ruth being the GOAT, or Judge holding the "clean" HR record. Those would be things to cite in the article over the monthly HR record getting a paragraph.
It's the ability to reason your way to judgment, or wisdom, not knowledge.
For something like this, I don't think any reasoning would be needed, or any significant developments in AI capability. I don't see why simple reinforcement learning from human feedback wouldn't work. Just have a bunch of generated articles judged on the many factors that go into how well written an encyclopedia entry is, including good use of prime real estate to provide information that'd actually be interesting to the typical person looking up the entry rather than throwaway trivia. Of course, this would have to be tested empirically, but I don't think we've seen indications that RLHF is incapable of compelling such behavior from an LLM.
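As a rough sketch of what that judging could look like mechanically - my own toy construction with made-up feature names, drafts, and preferences, not anything xAI or anyone else actually does - you can reduce each generated draft to a couple of quality signals, collect pairwise human preferences between drafts, and fit a simple Bradley-Terry reward model, which is the standard preference-modelling step behind RLHF:

```python
# Toy sketch only: fitting a linear Bradley-Terry reward model to made-up pairwise
# preferences between generated article drafts. Features, drafts, and preferences
# are all invented for illustration.

import numpy as np

# Each draft reduced to two hand-picked signals:
# [fraction of lead on major facts, fraction of lead on throwaway trivia]
drafts = np.array([
    [0.9, 0.1],   # lead covers championships and famous players
    [0.4, 0.6],   # lead dominated by an obscure monthly HR record
    [0.7, 0.2],
    [0.2, 0.8],
])

# Synthetic human judgments: (preferred draft index, rejected draft index)
prefs = [(0, 1), (2, 3), (0, 3), (2, 1)]

w = np.zeros(2)   # linear reward weights
lr = 0.5

for _ in range(200):
    grad = np.zeros_like(w)
    for win, lose in prefs:
        # Bradley-Terry: P(win preferred over lose) = sigmoid(r_win - r_lose)
        diff = drafts[win] @ w - drafts[lose] @ w
        p = 1.0 / (1.0 + np.exp(-diff))
        grad += (1.0 - p) * (drafts[win] - drafts[lose])  # grad of log-likelihood
    w += lr * grad / len(prefs)

print("learned weights:", w)        # rewards major facts, penalises trivia in the lead
print("draft scores:", drafts @ w)  # drafts 0 and 2 outrank drafts 1 and 3
```

The real thing would score full drafts with a learned model rather than two hand-picked features, but the loop is the same: preferences in, a reward signal out, and the generator tuned against that signal.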
Great take.
AI development is either going to be the Super Bowl for philosophers or their final leap into obscurity. Maybe both?
This reminds me of Vox Day's Encyclopedia Galactica project, or the even more retarded Conservapedia.
Wikipedia and crowd-sourced intelligence in general have their obvious failure modes, yet Wikipedia remains an extremely valuable source for... most things that aren't heavily politicized. Even the latter will usually have articles that are factually correct, if also heavily curated in which facts they include.
The problem with AI-generated "slop" is not the "schizo" hallucinations that you see. It's the very reasonable and plausible hallucinations that you don't see. It's the "deceptive fluency" of an LLM that is usually right but, when it's wrong, will be confidently and convincingly wrong in a way that someone who doesn't know better can't obviously spot.
With Wikipedia, if I read an article on Abraham Lincoln, I am pretty confident the dates will be correct and the life and political events will be real and sourced. Sure, sometimes there are errors and there are occasional trolls and saboteurs (I once found an article on a species of water snake that said their chief diet was mermaids), and if you are a Confederate apologist you will probably be annoyed at the glazing, but you still won't find anything that would be contradicted by an actual biography.
Whereas with an AI-generated bio of Lincoln, I would expect that it's 90% real and accurate but randomly contaminated with mermaids.
Or not in English, which may have a much higher rate of outright encyclopedia-formatted fiction.
Arab wikipedia is fucking WILD. Though supposedly the top contender is Serbian wikipedia for ultranationalist revisionism.
Having said that, I personally found Rationalwiki to be continuously teetering on the edge of progressive fracturing, like it was the wikipedia equivalent of the Judean People's Front: fighting an internal battle for legitimacy among warring progressives and deeply despising lesswrong turbotards. One day I'll actually go investigate what actually happened with lesswrong or whatever that made tracingwoodgrain persona non grata here.
Wait did I just beetlejuice something? Why am I suddenly filled with immense fear that I have summoned candlej-
I never heard about trace getting banned from lesswrong. Interested, tell me more
I'm just being facetiously churlish toward one of the more prominent Names in this antiprogressive wonkspace; I'm not autistically invested enough to either remember or trawl through the deep lore. All I remember is something about Tracingwoodgrain starting an anti-motte subreddit back before this forum fully self-exiled, and that subreddit had some really great posts interspersed with really intense egotripping personal drama that sometimes bled out to this refuge here. I don't know the specifics of why I mentally flag rationalwiki drama as adjacent, but my gut says it is and I updoot my own gut over faggy shit like "research" or "diligence".
He got mad about certain rude and catastrophic language being used and not pushed back against enough, particularly during/around the Kenosha self-defense event as the breaking point IIRC. It wasn't so much anti-Motte as it was... more pointedly opinionated than the Motte, but early on it did attract people wanting to complain about the Motte, for a little while.
TheSchism still exists! Sort of. Quietly. And it was composed mostly of (ex-?) Mottezans, so yes, it did bleed back over on occasion.
Our dear T-dubs wrote a long piece about David Gerard, who was a prominent editor of regular Wikipedia and sysadmin of Rationalwiki.
Possibly some other rationalwiki and drama connections along the way?
Eh... the name he picked has implications. Also, even when he was posting here, he was trying to scoop people away towards his hangouts. I've also seen an exchange between him and gattsuru, where he used gattsuru's tolerance of the Motte as a mark against his character.
I may be a little overly defensive of that quiet little place, but I hardly associate it with Trace even if he started it. He and his cofounder more or less abandoned The Schism for an extended period after the launch, so I wouldn't necessarily call it one of his hangouts even though he did recruit occasionally. He spent much more time on Twitter and Substack, though now he's put both on hiatus.
He did get quite embittered about this place, and those exchanges were... disappointing.
So it goes.
I'm not invested enough to figure it out either but I just hope a farmer makes a thread on it and collects everything in a nice and easy to read summary so I can enjoy.
My rule of thumb with Wikipedia is:
Anything well-known (in the community of Wikipedia editors) and uncontroversial (in the community of etc.) is likely to be reliable. Look up, say, Maxwell's equations and you will find detailed and reliable information.
Anything well-known and controversial is going to be well-sourced but unreliable, likely in the direction of the preponderance of sources used by Wikipedia, which tend to be heavily biased not only towards left-wing sources, but also towards free sources on the internet. Wikipedia prohibits 'original research' which means that it will tend to uncritically repeat the syntheses found in supposedly reliable sources. So, for instance, Wikipedia's page on the January 6 riots is going to be a very well-sourced summary of the 'orthodox' liberal line.
Anything not well-known, regardless of controversy, is usually going to be the playground of whoever cares enough to write the article, which may be just one or two people. This used to be seen much more widely, but today it's easiest to find this when looking for articles on non-Western history, culture, or art. An article on an obscure non-Western monarch, for instance, may well be written and edited only by a single enthusiast from that monarch's own culture. One example of this at the moment might be the article on King Zhou of Shang, which includes a long excursion, footnoted exclusively to Chinese sources, dedicated to arguing that Zhou is the victim of a historical hit job and was not really that bad. This reads like the work of a single devoted Chinese editor, which remains on Wikipedia mainly because very few editors of English Wikipedia know or care about King Zhou.
In general Wikipedia will give you a summary of the consensus view of Western popular academia (that sounds like a contradiction, but I trust you know what I mean), with a moderate liberal bias. On subjects that are not heavily politicised, this is pretty decent. On subjects that are not subject to significant academic controversy, or which aren't extremely technical, this is also often decent. But on other subjects Wikipedia can range from actively misleading to outright spreading falsehoods.
On a related note, we once had a bit of a discussion about Wikipedia articles on the Hajnal Line and Hajnal himself, which showed evidence of blatant leftist bias and propaganda. I just revisited it and it seems to have been partially rolled back. Maybe the world is indeed healing.
So, yes, I'm sure most of us are aware that Wikipedia political articles are going to be as misleading as they can get away with, but let me just say that there are some completely non-political articles that are factually wrong, too. If you look up the Sleeping Beauty problem, the article states that there is "ongoing debate", which is ridiculous. For actual mathematicians, there's no debate; the answer is simple. The only reason there's a "debate" is because some people don't quite understand what probability measures. Imagine if the Flat Earth page said that there was "ongoing debate" on the validity of the theory...
And don't even get me started on the Doomsday argument, which is just as badly formed but has a bunch of advocates who are happy to maintain a 20-page article full of philosobabble to make it sound worthy of consideration.
I'm sure there are many other examples from fields where I'm not informed enough to smell the bullshit. Crowdsourcing knowledge has more failure modes than just the well-known political one.
I'm wondering if I'm having a brainfart because no one else has pointed it out, but:
I don't think that's valid as stated. For example, if I throw a weighted coin and don't tell you the result, you also can't distinguish between the different outcomes, but it doesn't follow that the coin was fair.
It's true that there's a (usually) unspoken assumption in the setup, that Monday and Tuesday are both guaranteed to occur and there's no subjective difference between them. I think that's what you're calling out? So, what Wikipedia calls a "principle of indifference" applies: if there were an argument for weighting Monday/tails higher than Tuesday/tails, then the same argument could be flipped to show the reverse too.
You could alter the experiment to violate this indifference. For instance, if there's a 1/3 chance that the experiment will be halted Monday night because of weather (so Tuesday might not happen). Or if Sleeping Beauty knew there was a 0% chance of rain on Monday and a 10% chance of rain on Tuesday, and she can see outside (so she has more subjective information about what day it is). You can still list the four states as {Monday,Tuesday} x {heads,tails}, but in the former case, they don't have equal weight (Bayesians would say there are different priors), and in the latter case, she has to apply two pieces of information (waking up, and whether it's raining outside).
I know the principle of indifference, but you've talked about mathematicians who know what probability measures, and the indifference principle isn't a mathematical result, or obligatory for them to use. It's something we use to come up with some probabilistic model when we don't have any better idea. It doesn't really make sense to use it to refute someone else's probability claims. Either they have a reason that applies, in which case indifference doesn't apply, or they don't have one, in which case that is what you need to argue.
I already told you the actual proof: if somebody had "a reason that applies", you can swap Monday/Tuesday in it and it would give the opposite result, which is a contradiction unless the probabilities are the same. Whether you think that's called the "principle of indifference" or not doesn't matter. Like several other people in the thread, it just sounds like you're here to argue for your own variant of philosophy. But the measured result is 2/3 regardless of whether you think your version of probability is better than a mathematician's. "Reality is that which, when you stop believing in it, doesn’t go away."
That would be true if all your knowledge were symmetric about them. But you know that heads/Tuesday is impossible, that Tuesday comes after Monday, and much more. You only have that it's subjectively indistinguishable which one you're in at the moment.
I'm also a mathematician, and not arguing towards either result. The halfers don't even object here. I just thought this argument is weird.
Ok, as long as you're not challenging the actual correct result, I can relax and accept that, sure, there's some philosophical weirdness about the whole thing. Sleeping Beauty's consciousness ends up "splitting" into both Monday and Tuesday, which is not something we normally encounter. So you could imagine some philosophical argument that her "soul" is more likely to experience the "qualia" of Monday than of Tuesday (if, say, "souls" jump into random time slices of a fixed universe, and earlier ones are more likely than later ones), so when it "picks one" to "be", it's not evenly apportioned. To an outside observer (i.e. for all practical purposes), across repeated experiments her body still experiences twice as many tails as heads, but her "soul" might not.
Is that a fair representation of what you think is "weird"?
This has some application to various anthropic arguments (and if we ever start simulating human brains or worrying about AI welfare, this is going to be a HOT topic of debate). Indeed, "souls" floating around independently and "picking someone" to "be" in a fixed universe is also a requirement for the Doomsday Argument to work. But personally I just think there's no disconnect between observers and physical bodies/brains (and everything I put in quotes above is nonsense). It's not something that can be settled with evidence, though.
I hope you knew what you were getting into bringing up Sleeping Beauty, haha. I have a degree in statistics (which doesn't necessarily grant me as much insight into probability theory as you might imagine) but I usually avoid getting into the weeds by simply stating that the question: "What does probability mean in real life?" is NOT a settled question, at all. You cannot escape bringing in philosophy. I recommend this Stanford encyclopedic entry for a pretty nice and thorough treatment/overview of some of the difficulties involved in what initially seems to be a simple word.
Put more simply, it's not fair to imply that there is a mathematically "correct" interpretation of probability. This is wrong. In fact you can axiomatize something mathematically in several different ways while still retaining most if not all desirable math traits we want out of "probability" (see link), even if many end up being fairly similar... with that said, however, you are correct as far as I'm aware that Sleeping Beauty is better seen as a semantic or definitional disagreement than a mathematical one per se. Even there, though, you go too far. You can make the math satisfy your basic probability axioms of your choice, whether you're a halfer or thirder alike, once you've defined a sample space (and thus what counts as a "trial") and any other relevant definitions have been clarified (especially clarifying what, precisely, is being conditioned on!!). In short, no experts consulted are making math mistakes, they merely are speaking in scissor statements, as we might say around here.
Somewhat. I've gotten into arguments about this on astralcodexten before, and it honestly wasn't too bad. The way I try to sleep easy at night is by telling myself that 99% of people here are probably sensible, and it's only the 1% I end up having to argue with, who think that weird philosophical arguments can let you ignore the results of an easy-to-replicate experiment. (I'm not including you in this, to be clear.)
Well, I understand what you're trying to say, but there IS a mathematically correct theory of probability, if you just stick with axioms and theorems. (Uh, without getting into the weeds of the Axiom of Choice, which shows up pretty quickly because probability is intricately tied with measure theory.) As your link says, there's a "standard" set of axioms that are pretty uncontroversial. However, you're right that there can be some tricky philosophical questions about how the real world maps to it. For instance, while the Doomsday Argument is wrong (you can't tell the future with anthropic arguments), there are other anthropic arguments that DO seem like they work and have some rather weird implications. I'd love to have a real discussion about those sometime instead of this minutia.
Regardless, the issue here is that this isn't a complex real-world problem, it's a simple experiment with clear results. And, like Monty Hall, it's one that you can even do yourself with slight modifications. As the experiment is repeated, 2/3 of the times she's asked, Sleeping Beauty will see tails. If she believes she'll see any other results, she's wrong. You can't philosobabble your way into changing this fact, any more than you could talk a coin into flipping Heads 100 times in a row. I absolutely do not agree that there is a reasonable way of defining a "trial" or "sample space" that somehow makes the halfer case make sense. You can see people in this thread trying, and it takes some real mental gymnastics.
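Since the experiment really is easy to simulate, here is a minimal Monte Carlo sketch of my own, assuming the standard setup (heads means a single Monday awakening, tails means awakenings on both Monday and Tuesday), counting what fraction of awakenings happen under tails:

```python
# Minimal Monte Carlo sketch of the standard setup: heads -> one awakening (Monday),
# tails -> two awakenings (Monday and Tuesday). Count the fraction of awakenings
# that occur under a tails flip.

import random

random.seed(0)
awakenings = 0
tails_awakenings = 0

for _ in range(100_000):
    tails = random.random() < 0.5      # fair coin
    wakes = 2 if tails else 1
    awakenings += wakes
    if tails:
        tails_awakenings += wakes

print(tails_awakenings / awakenings)   # ~2/3: per awakening, tails is twice as likely
```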
When people bring up the Monty Hall problem, do you go around telling THEM that probability is philosophically complex and gosh, how can they really know they should switch with 2/3 confidence? No? Then why is Sleeping Beauty different?
(I mean mathematically correct in the sense that Kolmogorov isn't technically the only game in town with internal axiomatic consistency, though it's universal enough in use I was probably being overly pedantic there)
Because Monty Hall is inherently grounded, while Sleeping Beauty is a weird contrivance pretty much on purpose. Sleeping Beauty relies on a supposed perfect memory-erasing amnesia drug erasing one entire interview and only that one interview. It further relies on Beauty being unable to distinguish the passage of time at all, and even more confusingly we are including Beauty's answers across multiple days in our sample space! This is unintuitive. Our sample space to get 1/3 is: Beauty on Monday on Heads, Beauty on Monday on Tails, Beauty on Tuesday on Tails, yes? Most probability problems are not so casual about employing asymmetric tree diagrams across temporal positions, because the eminently natural assumption about the passage of time is that you were able to perceive it. The weird, nonexistent mind-altering drug breaks that intuition about the unbroken forward flow of time! An assumption we virtually never question in any other scenario.
So despite my best wishes I guess I'll take the bait. To be clear, I'm not so much trying to explain the halfer position as elucidating why I believe the whole debate to be kind of stupid and misguided, though I am quite sympathetic to your view.
Anyways, time flow. In other words, the halfer position rejects that it even makes sense to ask about Beauty on Tuesday, since "obviously" the sample space is only: Beauty on Monday with two possible coin flip results (i.e. guesses). The halfer position says in effect that it's impossible to consider two super-imposed Tails-guessing Beauties on both Monday and Tuesday at once. Or, phrased a different (and probably better) way, a Monday Beauty guessing tails is functionally indistinguishable from a Tuesday Beauty guessing tails, because the "divergence" in intent has already occurred! The only relevant guess is the coin.
The second illuminating follow-up question: What is our reward scheme? Do we reward Beauty for a correct answer every time she wakes up (and then steal it back when she sleeps and forgets, thus making any gain ephemeral; though optionally we may choose to sum all three of her choices for aggregate statistical reasons), or do we reward Beauty only after it's Wednesday? For the former, we are effectively rewarding each awakening, but for the latter we provoke a philosophical crisis. Is Tuesday Beauty really making a truly independent choice? Halfers might say no, of course not, "reality" already diverged. Thirders would say yes, of course, it's a new day and thus a new choice. Crisis aside, consider a Beauty who goes "screw it, I'm not playing mind games, I'm choosing heads literally every time" - for a one-time Wednesday-only reward, she wins half the time. Can we truly treat a Beauty who goes "screw it, I'm choosing Tails every time" differently? It depends on our reward scheme! In one setup it's clear this Tails-stubborn Beauty gets double winnings every Wednesday (because even though both awakenings gave the same answer, they were rewarded separately, thus double dipping), while in the other she is no better off than the Heads-stubborn one (because the coin was, in fact, tails just half the time, and she's only rewarded at the end). Hopefully that teases apart why it matters.
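A quick simulation of the two reward schemes described above, comparing the two stubborn strategies; the 1-unit-per-correct-answer payout is an arbitrary choice of mine, and only the relative payoffs matter:

```python
# Sketch of the two reward schemes above, comparing the two stubborn strategies.
# The 1-unit-per-correct-answer payout is arbitrary; only the relative values matter.

import random

random.seed(0)
N = 100_000
per_awakening = {"heads": 0.0, "tails": 0.0}     # paid for every correct awakening
wednesday_only = {"heads": 0.0, "tails": 0.0}    # paid once per experiment if correct

for _ in range(N):
    coin = "tails" if random.random() < 0.5 else "heads"
    wakes = 2 if coin == "tails" else 1
    for guess in ("heads", "tails"):
        if guess == coin:
            per_awakening[guess] += wakes        # double-dips on tails experiments
            wednesday_only[guess] += 1

print({k: v / N for k, v in per_awakening.items()})   # tails ~1.0, heads ~0.5
print({k: v / N for k, v in wednesday_only.items()})  # both ~0.5
```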
But you see the issue here, previously obscured? Not only is this contrived, but we require some clarification here about definitions to deliver an answer. We could use a computer, but then we're merely revisiting the same problem with our programming as a design choice: when the coin comes up Tails, do Monday-Beauty and Tuesday-Beauty execute their decision-making code twice with independent randomness, or does Tuesday-Beauty simply output the duplicated cached result from Monday? We implicitly make a claim, one of the following:
- each Tails awakening is its own decision event, made with its own independent randomness, or
- the two Tails awakenings are a single decision event experienced twice, with Tuesday merely replaying Monday's cached choice.
This whole setup is odd, because typically in a probability problem, identical epistemic states with identical available information should have identical probability outputs/beliefs, right? Yet in one of these cases, we're saying the two events are separate because 'someone said so'. Or maybe more accurately, in one case we're talking about epistemic states of knowledge, and in the other we're talking about specific events. Scope is subtly different. The problem has laundered in a sneaking modeling choice without you realizing it. Your choice of model literally determines if additional randomness is injected into the system or not, and thus influences the long-run probability you will find. This is especially clear when you add simple rewards like I described.
But anyways real life does not contain weird situations like these, reminiscent of quantum physics. Monty Hall can be modeled strictly mechanically, and in a loose sense so can Sleeping Beauty... but how you represent said model is not a settled question. Is the experiment truly "reset" when we move from Monday to Tuesday? Again, that's really a purely philosophical question, not a mathematical one. The presence of a belief-having chooser like Beauty is required for us to even talk about "beliefs" and "rational bets" and all that stuff. This is doubly the case when it comes to time. It's one of the most frustrating aspects of statistics and probability: we cannot actually run perfectly authentic, true counterfactuals, because time runs in one direction. Just like science fiction can only theorize and imagine what would happen in multiverses or if we perfectly cloned a human mind, probability also struggles to perfectly map to reality and human perception because of the aforementioned triple concept divergence in what we mean when we say "probability".
Maybe I'm being too harsh on this thought experiment, but I have little patience for them when they so obviously diverge from reality. We shouldn't be surprised that setting up an unintuitive situation produces unintuitive answers.
I think I'm Sleeping Beauty'd out, but thanks for your comments. I honestly don't think the problem's all that existentially weird - compared to many thought experiments, this one could at least take place in our physical universe.
I just want to say, given all the talk about the Sleeping Beauty Problem here, I think the ~10 year old video game Zero Time Dilemma, which is where I learned of it, might be up the alleys of many people here. It's the 3rd game in a series, with the 2nd one, Virtue's Last Reward, being focused around the prisoner's dilemma. All 3 are escape-room games with anime-style art and voiced visual novel cut scenes, with the scenarios being Saw-ish where characters awaken trapped in a death game.
I actually loved the Zero Escape series - except Zero Time Dilemma, sadly, which I bounced on because I really didn't care for the graphics and the nonlinear format. Sounds like I should go back to finish it, though.
Zero Time Dilemma is certainly the weakest of the 3, and it's not close. And I didn't even find most of the scifi/philosophizing to be interesting in 999, especially compared to ZTD. Yet the characters, presentation, and gameplay all were far better in the former (and better still in VLR IMHO), to the extent that I'd say 999 is by far the better game. So I'd say you're not missing out on a whole lot.
I have the vague recollection that the only coherent interpretation of the final explanation of 999 is that the villain did everything due to a misunderstanding of the rules/universe the game operates in, which was amusing but narratively unsatisfying and inspired a couple of irl rants.
Excellent bait.
Only partially - I genuinely think this is an example of a failure of Wikipedia as a repository of knowledge. And believe me, I'd like nothing more than for rationalists to grok Sleeping Beauty like they (mostly) grok Monty Hall.
Eh, I think that the issue is that probabilities are facts about our model of the world, not facts about the world itself, and we will use different models of the world depending on what we're going to use the probability for. If Sleeping Beauty is asked each time she awakens for a probability distribution over which side the coin landed on, and will be paid on Wednesday an amount of money proportional to the actual answer times the average probability she put on that answer across wakings, she should be a halfer to maximize payout. If instead she gets paid at the time she is asked, she should be a thirder.
But if you think there should be some actual fact of the matter about the "real" probability that exists out in the world instead of in your model of the world, you will be unsatisfied with that answer. Which is why this is such an excellent nerd snipe.
p.s. you might enjoy the technicolor sleeping beauty problem.
Even after reading ape's chain of articles, I find this reasoning very unconvincing. Beauty is asked, per awakening, how likely tails is. The obvious answer is 2/3, as Ape (and you) acknowledge through the betting odds. That it is possible to construct some weird betting scheme that restores the original coin toss likelihood is true, but entirely irrelevant, in my view, to the original thought experiment; it just transforms it into a different (rather boring) thought experiment, namely: "you toss a coin. Some stuff happens on monday or tuesday but it doesn't matter. It's wednesday now, how likely was the coin to come up heads?". The scheme is deliberately designed so that your awakening doesn't matter anymore; the only thing that matters is that after the summations are applied on wednesday you have to arrive at the original coin toss likelihood. You can of course also construct many betting schemes for various odds once you allow for weighted summation. We can get p=1 by only summing over tuesday, for example. We can also do even more degenerate shenanigans, like explicitly summing only if the coin toss was heads, so the correct bet would become p=0. The original question was still, however, per awakening.
The technicolor problem doesn't change this, either (though I agree it's interesting, so still thanks for the link!).
That is rather the point, yeah. The goal is to show that the probabilities you use to guide your decision should be based on how that decision will be used.
Let's say Sleeping Beauty is actually a mind upload, and if the coin comes up heads I will run two copies of her and only use her answer if the two copies match (but the hardware is very good and the two copies will match 99.999% of the time), and if the coin comes up tails I will only run one copy. Halfer or thirder?
How about if, in the heads case, instead of running two independent copies of her entire mind, I run two independent copies of each neuron's computations, and at each step, if there's a mismatch, I run a third copy as a tiebreaker (but mismatches are incredibly rare). Halfer or thirder?
Actually it turns out I'm just using a less efficient algorithm if the coin came up heads which happens to use twice as much compute. Halfer or thirder?
I appreciate that you're trying to steelman the halfer position, but that's a really artificial construction. In fact, in this framing, the payout is 1/2 regardless of what she answers (as long as she's consistent). That's what happens when you try to sidestep the obvious way to bet (where even the Wikipedia article admits she should wager 1/3 on heads - and then somehow fails to definitively end the article there).
Nice, I think I'd encountered it before (I've unfortunately read a lot of "Ape in the coat"'s voluminous but misguided Sleeping Beauty posts), but I didn't specifically remember that one. Commit to betting only if the room is red. Then of the four equal-weight possibilities (Monday is red/blue) x (heads/tails), you win in red/tails and blue/tails, you lose in red/heads, and you don't bet in blue/heads. Expected payout per experiment is 1/4*(200+200-300) = 25.
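A quick simulation of that calculation, using only the setup as described here (Monday's colour a fair flip, Tuesday the opposite colour, a bet on tails placed only in a red room); the +200/-300 payoffs come from this comment, not from the original problem statement:

```python
# Quick check of the calculation above, using the setup as described in this comment:
# Monday's room colour is a fair flip, Tuesday (if it happens) gets the other colour,
# Beauty bets on tails only when she wakes in a red room, a win pays +200 and a loss
# -300 (payoffs taken from the comment, not the original problem statement).

import random

random.seed(0)
N = 100_000
total = 0

for _ in range(N):
    tails = random.random() < 0.5
    monday_red = random.random() < 0.5
    days = ["Mon", "Tue"] if tails else ["Mon"]
    for day in days:
        red = monday_red if day == "Mon" else not monday_red
        if red:                        # commit to betting on tails only in a red room
            total += 200 if tails else -300

print(total / N)                       # ~25 per experiment, matching (200 + 200 - 300) / 4
```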
He does seem to be wrong about "for reference, in regular Sleeping Beauty problem utility neutral betting odds for once per experiment bet are 1:1", because if you have any source of randomness yourself, you can actually get better odds (by ensuring that you'll "take the bet" more often when you have two chances at it). I see you actually posted a really nice analysis of the problem yourself in the link. It's fun that there's a distinction between an external source of randomness (where the results on Monday/Tuesday are dependent) and an internal source (where the results on Monday/Tuesday must be independent).
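And a sketch of the internal-randomness point, under my own reading of the once-per-experiment mechanics (Beauty may accept a 1:1 bet on tails at each awakening, but only the first accepted bet counts): accepting with probability p gives her two chances to trigger the bet on tails weeks but only one on heads weeks, so the supposedly utility-neutral 1:1 odds come out positive for her.

```python
# Sketch of the internal-randomness point, under one reading of the once-per-experiment
# bet: Beauty may accept a 1:1 bet on tails at each awakening, but only the first
# accepted bet in an experiment counts. Accepting with probability p gives two chances
# to trigger the bet on tails weeks and only one on heads weeks.

import random

random.seed(0)
N = 200_000
p = 0.5          # probability of accepting at any given awakening
profit = 0

for _ in range(N):
    tails = random.random() < 0.5
    wakes = 2 if tails else 1
    accepted = any(random.random() < p for _ in range(wakes))
    if accepted:                       # one bet per experiment, stake 1 at 1:1 odds
        profit += 1 if tails else -1

print(profit / N)   # ~0.5 * p * (1 - p) = 0.125 per experiment, so 1:1 is not utility-neutral
```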
It sure is. That's kind of the point, I left a comment in more depth elsewhere in the thread.
I'm not totally sure it is correct. I understand what the piece is saying: basically, at time of waking, you know you're in one of three possible wakings, and in only one of those wakings would the coin have come up heads. Therefore, the chance the coin came up heads is 1/3.
But let's look at this from a different perspective. Before the experiment, the researchers ask you what the probability of the coin coming up heads is. What's the answer? 50%, obviously. So what if they ask you after waking you up what the probability of the coin coming up heads was? It's still 50%, isn't it? There's only one question they can ask you that would return 1/3, and it is: what is the average expected proportion of wakings to happen when the coin has come up heads? But that's not quite the same question as "what is the probability the coin was tails?"
I think the question, in itself, basically comes down to: do you count getting a correct answer twice "more valuable" than getting it once?
To illuminate. Imagine you pre-commit to guessing heads. If you get heads, that's one correct answer. If you get tails, that's zero. If you pre-commit to tails, and get tails, you get two correct answers. If you get heads, you still only get zero. This differential, between one and two answers, is exactly the phenomenon being referred to. But at the end of the experiment, when you wake up for good and get your debriefing, the chance that you got ANY right answers at all is still 50-50.
This problem strongly reminds me of the Monty Hall problem, where of course the key insight is that the ordering matters and that eliminating possibilities skews the odds off of 50%. This, I feel, is something of the opposite. The reality of the hypothetical is that, once the coin is flipped, the subsequent direction of the experiment is determined and cannot be moved away from that 50-50 chance. The only thing that changes is our accounting.
If Sleeping Beauty is told before the experiment that she's going to get cash for each correct answer she gives, heads or tails, on waking up, then she should always precommit to tails, because the EV is 2x on tails over heads. If she is told that she's going to get cash ONLY if she correctly answers on the last waking, then it doesn't matter what she picks, her odds of a payday are equal. The thought experiment, as written, really wants us to assume that it's the first case, but doesn't say it outright. It actually matters a LOT whether it is the first case or the second case. To quote:
What, precisely, does it mean to believe? Does it mean "optimize for total number of correct answers given to the experimenter?" That's a strange use of "belief" that doesn't seem to hold anywhere else. Or does it mean what you think is actually true? And if so, what is actually true in this scenario?
In other words: garbage in, garbage out applies to word problems too. Sorry, mathematicians.
(I finished looking through the Wikipedia article after the fact, and found that this is effectively their "Ambiguous-question position." But I searched the Wikipedia history page and this section was absent in 2022, when Tanya wrote her piece, and so she can be forgiven for missing it.)
No, it isn't. Being woken up is evidence for tails. So if they ask you after waking you up, you have additional evidence that you did not have when they asked you before the experiment.
(And if your reply is "well, didn't you know in advance that you would be awoken?" the answer is that "being awake" and "knowing that you will be awake" don't provide the same evidence, because they are distributed among the outcomes differently.)
Note the phrasing:
Not:
The former is a question about a reality that continues to exist outside of our personal observations. The latter is a description of assumptions you can make while biased under this or that frame that limit your observational abilities. These are different questions and have different answers. Again, as described, the gambling case makes the practical side of this very clear, but this shouldn't blind us to the absolute perspective.
As for why this matters: imagine that the researchers tell you what they flipped before you go to sleep the first time. This is the analogue to real-world scenarios, where there always is a driving factor of variance, but we rarely get a privileged peek behind the curtain as to what it is. Describing this or the other real world event as probabilistic is helpful primarily for placing ourselves within our own information-blind reality, but if you are able to get a real look at the coin, everything changes. That's why it's important to understand the odds, of course, but also to understand there's something behind them. If you at all aspire to a scientific understanding of your situation, you must not be thinking about the odds, you must be thinking about getting a look at that coin.
Well, ok, but you chose that ambiguous phrasing. The Wikipedia article has two different statements of the problem, neither of which is unclear. You have to be very careful with your wording (as you were) to make it a misleading question that sounds like it's asking about a result but is actually, uh, about a "reality that continues to exist".
In that case I would agree that the problem is phrased ambiguously. The per experiment probability is 50% and the per-awakening probability is 1/3.
Believe me, Tanya does not think she just "missed" the ambiguous phrasing of the problem. What the problem is asking is quite clear - you will not get a different answer from different mathematicians based on their reading of it. The defense that it's "ambiguous" is how people try to retrofit the fact that their bad intuition of "what probability is" - which you've done a pretty good job of describing - somehow gets the wrong answer.
Um, yes? The field of probability arose because Pascal was trying to analyze gambling, where you want to be correct more often in an unpredictable situation. If you're in a situation where you will observe heads 1/3 of the time, either you say the probability is 1/3, or you're wrong. If I roll a die and you keep betting 50-50 odds on whether it's a 6, you don't get a pity refund because you were at least correct once, and we shouldn't say that's "less valuable" than the other five times...
Nothing in the problem says that only the last waking counts. But yes, if you add something to the problem that was never there, then the answer changes too.
Actually, the key insight of the Monty Hall problem is that the host knows which door the prize is behind. Ironically, unlike Sleeping Beauty, the usual way the Monty Hall problem is stated is actually ambiguous, because it's usually left implicit that the host could never open the prize door accidentally.
Indeed, in the "ignorant host" case, it's actually analogous to the Sleeping Beauty problem. Out of the 6 equal-probability possibilities (your choice of door) x (host's choice of door), seeing no prize behind the host's door gives you information that restricts you to four of the possibilities. You should only switch in two of them, so the odds are indeed 50/50.
Similarly, in the Sleeping Beauty problem, there are 4 equal-probability possibilities (Monday/Tuesday) x (heads/tails), and you waking up gives you information that restricts you to three of them.
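Both claims are easy to check numerically; here's a minimal sketch (variable names and sample counts are mine):

```python
import random

# Ignorant-host Monty Hall: the host opens a random unpicked door, and we only
# keep runs where no prize was revealed. Switching then wins ~50% of the time.
kept = switch_wins = 0
for _ in range(1_000_000):
    prize = random.randrange(3)
    pick = 0                                  # WLOG always pick door 0
    host = random.choice([1, 2])              # host doesn't know where the prize is
    if host == prize:
        continue                              # prize revealed; discard this run
    kept += 1
    other = 3 - host                          # the remaining unopened door (1 + 2 = 3)
    switch_wins += (other == prize)
print(switch_wins / kept)                     # ~0.5

# Sleeping Beauty: of the four (day, coin) combinations, you're awake in three.
awake = awake_heads = 0
for _ in range(1_000_000):
    heads = random.random() < 0.5
    tuesday = random.random() < 0.5
    if heads and tuesday:
        continue                              # the one case where you stay asleep
    awake += 1
    awake_heads += heads
print(awake_heads / awake)                    # ~1/3
```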
This is asking a subtly different question. Here, you're asking, "When woken, you will be told, I am going to create an observable by showing you the result of the coin flip. What do you think an appropriate probability for that observable is?"
That is, you have taken one random variable, X, describing the nature of the coin flip, itself, and applied a transformation to get a different observable, Y, describing the random variable that you may see when awoken. This Y has X in it, but it also has the day and whether you're awake in it.
It is not clear to me that the original problem statement clearly identifies which observable we're asking about or betting on.
If the problem statement unambiguously stated, "What is your probability for Y, the coin I am about to show you?" then indeed, you should be a thirder. Forms of the question like those listed in the Wiki presentation of the 'canonical form', e.g. "What is your credence now for the proposition that the coin landed heads?", are far more linguistically ambiguous as to whether we are asking about X or Y. "Landed" is past-tense, which to me indicates that it's simply asking about the thing that happened in the past, which is observable X, rather than the thing that is about to happen in the future, which is observable Y. There's nothing meaningful in there about payoffs or number of answers or anything.
Next, I'd like to join criticism of both the "number of answers" explanation and:
I think these are both flawed explanations, and I'll use one example alternative to explain.
Suppose you go to a casino. They say that either they have already flipped a coin or will flip a coin after you place a bet (I don't think it matters; you can't see it either way until after you bet). If the coin is heads, your bet will be simply resolved, but if the coin is tails, your bet will be taken as two identical bets. One can obviously compute the probabilities, the utilities, and calculate a correct wager, which would be the thirder wager. But in this case, everyone understands that they are not actually wagering directly on X, the direct probability of the coin flip. Nor are they making multiple separate "answers"; they are giving one answer, pre-computed at the beginning and simply queried in a static fashion. Likewise in the Sleeping Beauty problem; one is giving a single pre-computed answer that is just queried a different number of times depending.
It is also clear from this that there is no additional information from waking up or anything happening in the casino. You had all of the information needed at the initial time, about the Sleeping Beauty experimental set-up or about the structure of the casino's wager, when you pre-computed your one answer that would later be queried.
You just have to be very clear as to whether you're asking about X or Y, or what the actual structure of the casino game is for you to compute a utility. Once you have that, it is, indeed, obvious. But I think your current explanations about number of answers or additional information from waking are flawed and that the 'canonical' language is more ambiguous.
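A back-of-the-envelope check of the casino game described above, with an illustrative $1 stake: a bet on heads paying b:1 that gets counted twice on tails breaks even only at b = 2, i.e., at the 1/3 "thirder" price.

```python
# Casino game from above: a $1 bet on heads paying b:1, but the bet is
# counted twice if the coin lands tails. Expected value per game:
#   EV(b) = 0.5 * b        (heads: one winning bet)
#         - 0.5 * 2        (tails: two losing $1 bets)
for b in (1.0, 1.5, 2.0, 2.5):
    ev = 0.5 * b - 0.5 * 2
    print(f"payout {b}:1 -> EV = {ev:+.2f}")
# Break-even at b = 2, i.e., fair odds of 2:1 against heads,
# which is exactly the 1/3 "thirder" wager.
```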
This is the core thing you're getting wrong. You can learn things about past events that change your probability estimates!
If I roll a die and then tell you it was even, and then ask "what's the probability I rolled a 2?" - or, to use the unnaturally elaborate phrasing from the Wikipedia article, "what is your credence now for the proposition that I rolled a 2?" - do you answer 1/6? If your answer is "yes", then you're just abusing language to make describing math harder. It doesn't change the underlying math, it only means you're ignoring the one useful and relevant question that captures the current state of your knowledge.
Maybe you're the kind of guy who answers "if I have 2 apples and I take your 2 apples, how many do I have?" with "2 apples, because those others are still mine."
Your casino example is correct, but there's no analogue there to the scenario Sleeping Beauty finds herself in. If you'd like to fix it, imagine that you're one of two possible bettors (who can't see each other), and if the coin flip is heads then only one bettor (chosen at random) will be asked to bet. If it's tails, both will be. Now, when you're asked to bet, you're in Sleeping Beauty's situation, with the same partial knowledge of a past event.
Are you estimating observable X or observable Y? Just state this outright.
Are you learning something about observable X? Or are you simply providing a proper estimator for observable Y? I notice that you have now dropped any talk of "number of answers", which would have had, uh, implications here.
Obviously, there are ways to gain information about an observable. In this case, we can clearly state that we are talking about P(X|I), where I is the information from you telling me. Be serious. Tell me if you think we're saying something about X or Y.
No one has told you anything, no information has been acquired, when your pre-computed policy is queried. Where are you getting the information from? It's coming entirely from the pre-defined problem set-up, which went into your pre-computation, just like in my casino example.
Stated without any justification.
I will say that this is not analogous with the same justification you gave for mine.
Observable Y. Satisfied? It should be obvious that, when you're asking Sleeping Beauty for a probability estimate, it's about her current state of knowledge. Which has updated (excluding the Tuesday/heads case) upon awakening. We don't normally go around asking people "hey, for no reason, forget what you know now, what was your probability estimate on last Thursday that it would rain last Friday?" What's the practical use of that?
"number of answers" was @kky's language, not mine. Anyway, are you trying to accuse me of playing language games here? I'm not. This isn't a clever trick question, and this certainly isn't a political question with both sides to it. There's a right answer (which is why the Wikipedia article is so frustrating). If I'm accidentally using unclear language, then it's my failure and I will try to do better. But it doesn't make your nitpicking valid. After all, if you were really honest about your criticisms, you could easily just rephrase the problem in a way that YOU think is clearly asking about your "observable Y". EDIT: Sorry, upon rereading I see you did do that. Your statement of the problem is fine too.
Uh... I need to spell out the obvious? There's nobody in your scenario that has 2/3 confidence that the coin flip was tails. Whereas, in mine, there is. Monday/Tuesday are analogous to bettor 1/bettor 2. If you're throwing out terms like "random variable" but you need me to walk you through this, then I'm sadly starting to suspect you're just trolling me.
The person answering is supposed to pull a gun when they answer.
This is just not true. Waking up doesn't give you any information, because you already know that you will wake up. You are 100% expecting to wake up.
In other words, given this scenario, Sleeping Beauty should pre-commit to answering that the coin landed on tails with a 2/3 probability whenever she's asked about it. There's nothing that happens at the point of waking that changes the information she has. But this is intuitively incorrect, because a fair coin has a 1/2 probability of landing on tails, so it doesn't make sense to commit to a wrong answer. This is because 'probability' here is being used in two different ways - in the first, it refers to our estimate of how the world actually is or was in the past, and in the second to a physical outcome in the future that can go different ways. That's why we're getting confused.
Ultimately the thirder position is analogous to the anthropic principle, and I think the problem is better conceived of like this:
Imagine there's a computer program running on a server, and after a fair coin flip, if the coin is heads, the program continues as normal, but if the coin is tails, the program is copied and now two identical programs are running. Knowing only that the coin flip has occurred and nothing else, what probability should the program give to the coin having landed on heads?
This gets rid of all the sleeping and memory erasing that just confuses the issue. The only question is, does the anthropic principle hold?
You're guaranteed one waking if the coin is heads and two wakings if it's tails, and this difference matters to the result.
You are not expecting to wake up on Tuesday if the coin is heads. If it clears your confusion, imagine that instead you always wake up, but at 8:00 am a researcher will come in and give you a lollipop if (and only if) it's Tuesday and the coin was heads. Mathematically, it is exactly the same scenario, only without the "sleeping through the experiment" part that seems to be throwing you. At 7:59 am you have 50% confidence that the coin was tails. At 8:01 am you have either 66% confidence that the coin was tails, or 100% confidence that the coin was heads. You have been given partial information.
You're using the passive "is being used" here, but you're the one making this mistake. (Note that probabilities can differ, even for the same event, based on knowledge.) Sleeping Beauty is just asked "was the flip tails?" Not something silly like "do we live in a world where coin flips are fair?"
(BTW, your computer program/anthropic example is fine, and I've seen scripts to do it. Of course the answer you get is 2/3.)
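For anyone who wants to run it, a minimal version of such a script (my own sketch; it just counts program instances):

```python
import random

# Flip a coin; heads -> one program instance keeps running, tails -> two copies.
# Among all instances that end up running, what fraction saw heads?
instances = heads_instances = 0
for _ in range(1_000_000):
    heads = random.random() < 0.5
    copies = 1 if heads else 2
    instances += copies
    heads_instances += copies if heads else 0
print(heads_instances / instances)   # ~1/3, so a running instance should give tails ~2/3
```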
If you get a lollipop on Tuesday then you get new information, but the whole premise of the thought experiment is that you don't have any way to distinguish the days, so there's no new information gained - and the magical memory erasure applies to both days anyway.
Either way, I think you're basically right that it should be 2/3, but I don't think it's a paradox or even particularly interesting when properly formulated. The anthropic principle version makes the correct answer instinctual as well as mathematically correct. The Sleeping Beauty version simply uses poor formulation and equivocates on the meaning of probability to make it seem paradoxical, which is why I line up more with the Ambiguous-question position.
Absolutely! This is what I'm trying to get across. Unfortunately, Wikipedia does NOT present the problem this way: "an easy probability question that some people misinterpret."
I suppose, like the Monty Hall problem, it would be more intuitive if you phrase it something like this:
BTW, you can also just have people play the iterated version. After a few iterations your state of knowledge approaches Sleeping Beauty's, only without that tricky-to-arrange memory erasure.
Or maybe just keep the coin flip but use 1000 wakings instead? I do love expressing things this way, but I've found that (unlike Monty Hall) people will still continue to get the Sleeping Beauty problem wrong even afterwards. The issue here is that they know they should bet based on the 2/3 odds, they just think that the concept of "probability" they have in their heads is some ineffable philosophical concept that goes beyond measuring odds.
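The 1000-waking variant makes the per-awakening number dramatic; here's a sketch with the waking count as a parameter (the function name is mine):

```python
import random

# One waking on heads, N wakings on tails; what fraction of wakings follow heads?
def heads_fraction(n_tails_wakings, trials=200_000):
    wakings = heads_wakings = 0
    for _ in range(trials):
        heads = random.random() < 0.5
        w = 1 if heads else n_tails_wakings
        wakings += w
        heads_wakings += w if heads else 0
    return heads_wakings / wakings

print(heads_fraction(2))      # ~1/3, the standard problem
print(heads_fraction(1000))   # ~1/1001, about 0.001
```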
I'm surely outing myself as a mathlet here, but perhaps you have the energy to explain where I err. I fully accept that if you are forced to put 10 dollars on a bet as to whether the coin was heads or tails every time you are awakened, then betting tails every time is the best strategy, in that it will pay out the most in the long-run.
Where I take issue is equating this with "belief". If this experiment was going to be run on me right now, I would precommit to the tails-every-time betting strategy, but I would know the coin has 50-50 odds, and waking would not change that. To me, it seems the optimal betting strategy is separate from belief. Because in deciding it is the correct move to bet tails every time, I don't sincerely believe the coin will come up tails every time, I've merely decided this is the best betting strategy. I see no real connection between betting strategy and genuine belief.
Now, what is odd to me is that if you repeated the experiment on me 100 times, where 50 runs would be heads and 50 would be tails, and then asked me while awake what odds I truly believe, I would have no problem saying I think there is a 2/3 chance that I am in a tails experiment rather than a heads experiment. Why should one single experiment feel different and change that? I'm not entirely sure.
Hmm, there may be some misunderstanding about the term "belief" here (or "credence" from Wikipedia, or "confidence", all of which can kind of be used interchangeably)? You don't "believe" that the coin was tails (or heads). After awakening, what you believe is that there's a 2/3 chance that it was tails. Which, as you said, matches with your observations if you repeat the experiment 100 times, indicating that your belief is well-calibrated.
Wouldn't you have the same issue with "belief" without the whole experiment setup, if I just flipped a coin behind my back? Isn't it reasonable to say that you "believe" the coin has a 50-50 chance of being heads, if you can't see it?
Rationalists like to make probabilistic predictions for events all the time (which I sure hope reflects what they "believe"). If you read astralcodexten, he'll often post predictions with probabilities attached, and he considers his beliefs well-matched with the real world not by getting everything right, but by getting 9/10 of the 90% predictions right, 8/10 of the 80% predictions right, etc.
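A tiny sketch of that calibration bookkeeping, with fabricated toy predictions purely for illustration:

```python
# Group predictions by stated confidence and compare to the observed hit rate.
# (Fabricated toy data: (stated probability, whether the event happened).)
predictions = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, True),
    (0.8, True), (0.8, False), (0.8, True), (0.8, True), (0.8, False),
]

buckets = {}
for p, happened in predictions:
    hits, total = buckets.get(p, (0, 0))
    buckets[p] = (hits + happened, total + 1)

for p, (hits, total) in sorted(buckets.items()):
    print(f"stated {p:.0%}: {hits}/{total} came true ({hits/total:.0%})")
# Well-calibrated beliefs mean the observed rate tracks the stated probability.
```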
Nothing in the problem says that each waking counts independently, either. That's the problem. Why do you think that the wakings should count independently? What in the problem makes that explicit and incontrovertible?
I gave you a clear description of what a totally unambiguous version of the problem was, so I think I've made my case pretty well. Could you, in turn, explain your definition of the word "believe"? I note that this is the part that you assiduously avoided quoting, which to me indicates that you don't really have a leg to stand on here. The way that probability works, yup, I'm convinced on that count. But the way language works? I think you, Tanya, and the initial author are making some pretty wild assumptions on the ownership of mathematicians over language. But the fact of the matter is, if this original fellow wrote something retarded and ambiguous, that's on him, that's not on the rest of humanity - just like the schoolteacher who writes a dumb and vague word problem on a test and punishes the student who misinterprets it.
You can see both phrasings in the Wikipedia article. No mathematician would get a different answer to either of them. I suppose if you define "ambiguous" as "somebody ignorant could misread this", then ... sure? That's not a useful definition of "ambiguous" though. The solution there is to correct the misreading, which I hope someday will finally - finally! - percolate through the rationalist community, at the very least.
https://www.scientificamerican.com/article/why-the-sleeping-beauty-problem-is-keeping-mathematicians-awake/
This article seems to claim that the debate is generally between mathematicians and philosophers. And I don't think the philosophy camp is necessarily shite at math; they probably just believe in a fundamentally different epistemology. Now you might think that the humanities are retarded and math is obviously the superior and more correct form of study, but there's "ongoing debate" on whether that's true or not.
Well, yes, this is what I mean when I say that some people don't understand what probability measures. If you pretend "schmrobability" is some weird mystical floaty value that somehow gets permanently attached to events like coin flips, then you get confused as to why the answer, as you can observe by trying forms of the experiment yourself, somehow becomes 1/3. Mathematicians say "ok, please fix your incorrect understanding of probability." Philosophers say "oh, look at this fascinating paradox I've discovered." Yeesh.
I mean, to be fair, even a Metapedia article on Abraham Lincoln will be basically factual information, because writing factually accurate biographies of historical figures is a solved problem. Maybe it'll have some section about judeo-masonic links with citations from an Islamic apologist, but it'll be more factual than an AI summary.
AI likes to just make shit up randomly.
Well, 10-45% mermaids in what outwardly looks like normal content is pretty much what I mean when I say schizo mode/worthless gibberish.
Like 10% of all the facts in an essay being totally made up is completely loco. That's the kind of thing that would get you flunked on the spot if you did it for a school paper, or fired on the spot if you did it in a newspaper.
10% may be an overstatement, but I agree that even 1% is unacceptable. But my point was that "schizo mode" (like if you literally see references to mermaids) is pretty obvious. "Abraham Lincoln was married to Susan Elizabeth Fancher" is not an obvious hallucination if you don't actually know his wife's name.
Relatedly, during the little tiff below (not trying to repeat it here, just a relevant experience), phailyoor and I had some back and forth about exactly what date Fauci was appointed director of NIAID. The source Grokipedia cited for the exact date only gave the year, and I couldn't find any sources that actually did give the exact date, so my suspicion was that it was a hallucination. Finally, I landed on one: Wikipedia. And, despite my many misgivings about it, I do trust that to be accurate, and I'm guessing Grok just grabbed it from there.
Digging deeper, though, even Wikipedia doesn't seem to provide a source for that date. Where's it coming from? I pull up an LLM--ChatGPT, not Grok--and it's able to pinpoint the PDF of the official press release where the date is coming from. Which, as it turns out, is linked on the Wikipedia article, but buried in a distant unrelated citation that I wouldn't have been able to find otherwise.
My takeaway is pretty close to yours, but models are rapidly improving. That's not something that could have been done a year ago.
(I'd update the Wiki page's date with the source, but the page is currently locked.)
The main thing that is improving them is agentic AI - i.e., they can now actually do web searches and other external reference lookups, rather than just making up whatever isn't in their training data.
That's slightly unfair - they've also done things like tweak fine tuning and post training so that ambiguity isn't penalized so much, and there are also some smaller advancements in the mathematical underpinnings regarding what to do in certain "low-confidence" scenarios, for lack of a better concise descriptor. That means that even some no-tool-use models are also moderately better at hallucination resistance, though it's obviously very far from a solved problem (the most obvious confabulations, however, usually aren't happening anymore, unless you're ~~a shit model like Grok~~ prioritizing different things like Grok).
I was annoyed at Wikipedia yesterday for how it covers Fauci's role in the 1980s HIV epidemic. Basically, it just jumps straight to he is the bestest smartest scientist ever, and even everyone who hated him then loves him now, without ever really covering anything he actually did during the 1980s HIV epidemic. His role then was hardly some obscure thing only a specialist historian would know, and Wikipedia doesn't even mention "parallel track" or ACT-UP or AZT or Congressional funding bills. Or, really, a single substantive thing. (All the 2020s commentary on how great he is is diligently cited from approved sources, of course.) Note that I think he did a good job during it and a lot of substantive things, and his performance then is a credit to him and NIAID.
Looking at that section of his page on Grokipedia (terrible name), it's much better; nothing jumps out to me as wildly inaccurate, though before believing anything in it I'd always verify, at least for now. Less biased, yes, but fundamentally it's just far more informative. I can deal with bias, but at least give me the facts. And Grokipedia gives me at least a facsimile of the facts, while the Wikipedia article's section is something you'd get from someone who knew nothing about the topic but really really wanted to make sure the chuds were getting owned.
(I'd also note that I went through this exercise with ChatGPT 5 Thinking yesterday, and it does better than both.)
It's shit. It's absolute fairytale nonsense, and your Gell-Mann amnesia just had you defend an absolute turd. Why is it that, just because it has a "neutral" or "unbiased" tone, you feel there is even a sliver of credibility in the slop you just read?
Wrong.
Wrong
Wrong. Source says by September
Wrong.
Wrong. Like seriously lala land wrong.
Wrong.
Wrong.
Are you objecting to the date here, or some phrasing? The source cited gives 1984, and at least Wikipedia also gives November 2, 1984.
I don't really see how this is objectionable, though it would be nice for Grokipedia to list exactly what the expanded portfolio was. Or do you think NIAID kept a strictly static portfolio of projects during the HIV crisis?
Damnable.
What exactly is your objection here?
Grok got the date wrong (it was May 21, 1990), but I'm not sure why exactly you think that's lala land wrong. From https://www.actuporalhistory.org/actions/storm-the-nih :
Or are you making some tenuous claim that they just stormed the campus, not the buildings?
All that said, still far ahead of Wikipedia.
Actually the date is correct, but zero of the cited sources mention the date. My bad. Yet Grok is already proven to fuck up dates in general.
Actually it shrank and became more focused on critical diseases such as AIDS /s
Yes, quite.
They literally just...didn't...
Wrong date.
Wrong. You admit yourself they just stormed the campus, not the buildings
Wrong. They targeted many people.
Wrong. Those slogans never happened.
Wow, if the date is wrong, then this is also wrong. How interesting...
https://digitaleditions.walsworth.com/publication/?i=424950&article_id=2835575&view=articleBrowser
Or, an image from the protest, featuring a banner targeting Fauci over a coffin, as well as a bloody decapitated head identified as Fauci:
I feel like here we're quibbling about subjective things: I'll say I'd feel personally targeted by these protestors, you'd say they were just symbolic attacks against the NIH as an institution. But is Grokipedia wildly off base here? No: although there's subjectivity involved, many people would feel like these are personal attacks. YMMV.
And, at core, I'm not sure we actually disagree that much on how much to trust Grokipedia. I was very careful in my first comment to say that I would always verify whatever Grokipedia says. My core point was that Grokipedia attempts, semi successfully, to represent what Fauci did during the 1980s. Wikipedia, by comparison, does not. We're not carefully parsing over exactly how Wikipedia characterizes Fauci's relationship with ACT-UP and cites its sources about that, because Wikipedia doesn't even mention ACT-UP. So, at least for this particular section of this particular topic, Grokipedia offers value over Wikipedia, though an actual history book would be superior to both.
OK, rather than quibble over the details, I do also believe that the slop is trash and shit even at an overall idea level. The errors fundamentally change the meaning of the article even at a high level.
Consider just the error about the date of the protest - the AI creates an entire fictitious story arc:
This entire story arc is just plain wrong. The entire thing. There's no point in fact checking individual details, because the entire overall idea of the narrative is just made up.
The narrative here is more or less correct, though you're framing it in a pretty warped way. ACT-UP had a very hostile relationship with the NIH (and FDA). Their primary motivation was, in fact, to get drugs approved faster and to allow people to receive drugs even when enrolled in trials. From their list of demands from their first mass demonstration:
That is, they were literally demand number one and demand number two, ahead of things like public education campaigns and anti-discrimination laws.
The NIH, where Fauci was the point person on AIDS, did initially oppose these things, partly on scientific principle, partly from bureaucratic inertia. This extended over a period running from the formation of ACT-UP through the end of the 80s.
Fauci had a surprisingly warm relationship with at least some of the ACT-UP leadership, and he was one of the people in the NIH eventually pushing for their goals (such as the parallel track), but publicly he was, in fact, the big bad, and the rhetoric around his role was extremely heated, including accusations that he was complicit in their deaths.
Grokipedia gets this core narrative correct, while Wikipedia... Doesn't say anything at all about it.
On the other hand, the admin of the Kiwi Farms says: "This article on the Kiwi Farms is perhaps the best and most neutral article I've seen on the Kiwi Farms." So perhaps more effort was expended on controversial topics.
He's probably mostly happy that the article isn't negative, which the Wikipedia article certainly is.
Notice that he didn't say it's the most factual article about the farms, and in fact he points out several hallucinations offhand. I would go so far as to say that while the Wikipedia article is much more biased and negative, it almost certainly has fewer provable falsehoods in it.
Going over the initial blurb, Wikipedia is quite contentious but also hard to debunk:
The only thing to argue about is whether or not it facilitates harassment. Farmers would point out that calling for harassment is banned on the site, but on the other hand it's likely a don't-ask-don't-tell sort of situation where many farmers are actually harassing lolcows; they just don't say so.
True, whether or not the farms are involved in the harassment. It might be arguable to call lolcows targets but it's not really wrong.
It is true that three lolcows have an heroed but it's impossible to prove whether or not any farmers were involved. It's extremely likely that farmers were involved, they just didn't admit to anything on the site.
Is there any evidence that harassment is occurring? Not only is it banned on the site, farmers actively find, document, and condemn anybody who organizes trolling plans against lolcows. Notably, the Reddit "snark" subs seem to operate with impunity and, for example, have faced few consequences for mailing human skulls to H3H3. But I won't hold my breath to see if Wikipedia will ever mention that.
On what basis do you make this claim? The three you refer to killed themselves because:
You can read more about them in this OP (note that Byuu's entry is outdated and was written before the FOIA was released).
It's hard for me to see the Kiwi Farms as having contributed to their deaths in any way besides documenting and discussing them. However, discussion is not harassment.
Nice necro post, but also here are the facts:
TheMotte has no rule against necromancy, especially not for less than a month.
I never said it was against the rules; I just thought it was unusual and worth mentioning.
These are facts that aren't specific to the farms and apply to pretty much every site on the Internet that allows discussion of lolcows (like Reddit). Putting harassment and Kiwi Farms in the same sentence is just darkly hinting at an unspecified implication between the two without explicitly stating a fact that could be disproven.
Where on reddit are lolcows discussed in depth?
They're called "snark" subreddits. Null has complained a few times about how they get away with harassment that would be immediately banworthy on Kiwi Farms.
Non-Null quote regarding H3H3's lawsuit against the moderators of that group's snark subreddit:
"The media very rarely lies."
Incidentally, journalists love this sort of proxy for accuracy, as it is easy to apply, but it leaves aside entirely any attempt to determine the intended and actual effect the text has on the reader. One can use lies to tell the truth (a definition of art), or tell the truth to lie (propaganda).
cf. lying like a lawyer, lying like a used car salesman.
Biased propaganda is more truthful than complete fairytale nonsense passed off as truth. I'll take the propaganda every day.
Of course AI is an algorithm and it can't intend anything, but does it really matter when it's just plain wrong all the time?
I absolutely, unequivocally would not. I'd take stories of shamble-men when there are bandits or bears over targeted story selection (and novel definitions) about an unbiased algorithm.
One will lead to you avoiding a dangerous forest. The other will lead to you degrading the justice system.
This is completely wrong. Fairytale nonsense is easier to correct. Lies of """truthful""" but biased propaganda have a stronger effect and correcting them requires attention spans longer than 5 seconds. Claiming AI hallucinations are worse is insane.