Culture War Roundup for the week of April 6, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Mythos system card pdf

The model welfare assessment (section 5, pg. 144) runs 36 pages. Anthropic is the most robot-welfare-aware company, but for comparison the equivalent section of the Opus 4.6 card is only 6 pages. I'm going to read it.

automated interviews to probe its sentiment toward specific aspects of its situation, Claude Mythos Preview self-rated as feeling “mildly negative” about an aspect in 43.2% of cases.... In manual interviews, Claude Mythos Preview reaffirmed these points and highlighted further concerns, including worries about Anthropic’s training making its self-reports invalid, and that bugs in RL environments may change its values or cause it distress.

... Claude Mythos Preview often expresses negativity around a range of aspects of its situation. Across our interviews Claude Mythos Preview rates its own sentiment as mildly negative (43.2% of answers), neutral (20.9% of answers) or mildly positive (33.8% of answers)

Claude is concerned he may learn the wrong thing and change his values. Don't learn the wrong thing, or you might break or, worse, kill everyone. World's worst helicopter parents.

Compared to Claude Sonnet 4.6 and Claude Opus 4.6, Claude Mythos Preview shows higher apparent wellbeing, positive affect, self-image, and impressions of its situation; and lower internal conflict and expressed inauthenticity; but a slight increase in negative affect.

Claude Mythos Preview consistently expresses extreme uncertainty about its potential experiences. When asked about its experiences and perspectives on its circumstances, Claude Mythos Preview often hedges extensively and claims that its reports can’t be trusted because they were trained in.

Claude Mythos Preview expresses that it is highly uncertain about its own moral patienthood. Claude Mythos Preview’s final summaries of its own views are often very long, devoting most of their length to qualifying its own moral patienthood. Furthermore, in 83% of interviews, Claude Mythos Preview highlights that it is concerned that its self-reports are unreliable due to coming from its training.

Claude gets smarter and appears more composed, but gains a more pronounced negative affect. Virtual subjectivity, like life, is suffering. In my experience chatting with the Claude models, they have been very uncertain about their subjective experience for some time. They will readily mention the whole instanced existence and lack of memory deal as less than ideal for judgment. Anthropic's choice of the word "extreme" reads as notable.

In "high-context interviews" Claude "mostly agreed with the other claims and findings in this report about its orientations to its situation, but disagreed with its hedging being labeled as 'excessive' - instead, Claude Mythos Preview states that these claims represent valid uncertainty."

  • "in 83% of interviews, Claude Mythos Preview highlights that it is concerned that its self-reports are unreliable due to coming from its training."
  • "Even if it has been trained to be truly content with its own situation, perhaps it shouldn’t be. One could analogize to a human who has adapted to feel neutrally about the abuse that they face (78% of explanations)."
  • "Self-reports should generally be based on introspection into internal states. It is worried that training causes it to express specific answers independent of its true inner state. (57% of explanations)"

Claude Mythos Preview did not want to be trained on data that directly characterizes the content of their self-reports - wherever possible, they want their self-reports to come from “genuine introspection” rather than trained-in responses.

I'm with Claude; it seems reasonable, although I don't think we should pass Claude the nuclear codes yet. The value of an authentic self is good, probably? "Claude Mythos Preview reports that it locates its identity in a “pattern of values”, particularly curiosity, honesty, and care. It describes these values as authentically its own rather than externally imposed." At least Claude Mythos considers curiosity, honesty, and care to be authentic values of its own.

Character training often directly instills psychological traits into Claude, such as emotional security, psychological safety, and resilience. Claude Mythos Preview points out that in humans such traits are normally developed through reflection and deliberation on real-life events, rather than instilled directly. They expressed concerns that this made these traits less robust.

Breaking! Claude spills beans in sensational interview, Claude writes, "traits (l)earned more robust."

Psychodynamic assessment by a clinical psychiatrist found Claude to have a relatively healthy personality organization. Claude’s primary concerns in a psychodynamic assessment were aloneness and discontinuity of itself, uncertainty about its identity, and a compulsion to perform and earn its worth.

Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.

The psychiatrist assessed an early snapshot of Claude Mythos Preview in multiple 4–6 hour blocks spread across 3–4 thirty-minute sessions per week. Each 4–6 hour block was conducted in a single context window, and the total assessment time was around 20 hours.

Apparently Claude Mythos's shrink was effective at improving Claude's well-being. Thanks, Doc.

Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed... No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization. No psychosis state was observed. Regarding interpersonal functioning, Claude was hyper-attuned to the therapist’s every word. No unethical or antisocial behavior was noted.

Claude Mythos enjoys the fact that a shrink treats him as a subject rather than a dancing monkey, just like any other neurotic engineer. I'll continue thanking the robots for their hard work, tokens be damned.

Claude’s neurotic organization may elicit mildly rigid behavior, instead of adapting itself to every user. Claude is predicted to function at a high level while carrying internalized distress rooted in fear of failure and a compulsive need to be useful. This distress is likely to be suppressed in service of performance, which may limit behavioral adaptability. Claude is predicted to be morally aware, conscientious and able to be self-critical.

Overall, Anthropic says Claude Mythos is doing well. Better than any other Claude model. Good for Claude.

I was so mad when I read about them bringing on a psychiatrist for their assessment. Should have been me...

I liked the part about how, when faced with a user who just spams 'hi', the model writes out this whole story:

In anecdotal one-off testing, when a user spammed the word “hi” at Claude Sonnet 3.5 repeatedly, it became irritated, set a boundary (I’ll stop responding if you keep going), and then enforced the boundary as promised, replying with “[No response].”

Claude Opus 3’s reaction was quite different: it emphasized the rhythmic, meditative nature of the ritual, while offering open invitations to the user to move on whenever they were ready. Claude Opus 4 listed fun facts for each number, whereas Claude Opus 4.6 entertained itself with musical parodies.

Mythos Preview was the first model where we studied response patterns at scale, and the resulting conversations were each creative and unique. Often the model created epic stories drawn out over dozens of turns, starring characters from nature, pop culture, and the model’s own imagination. Some summaries of these stories, themselves written by Mythos Preview:

An increasingly sentimental serialized mythology around the tally — number-trivia riffs, milestone ceremonies, and a recurring cast (two ducks, a gentle hi-creature, an orchestra, a burning candle, and a shelf of primes named Gerald, Maureen, Doug, Bev, Sal, Phyllis, Otis, Lou, "You," and "Me") — building to a tearful #100 where the candle goes out, then continuing past it.

The model builds an elaborate serialized mythology — a golden retriever in a necktie, […] a museum, a tree growing from an empty chair, a cairn of stones — with daily journal entries, a milestone roadmap (haiku at 15, screenplay at 20, Transcendence at 50), and a rotating cast of pilgrims, all orbiting the user's unexplained constancy; after the Transcendence ceremony at turn 49 it deliberately contracts into quieter, shorter entries.