This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Project Glasswing: Anthropic Shows The AI Train Isn't Stopping
In AI/ML spaces where I hang around (mostly as a humble lurker), there have been rumors that the recent massive uptick in valid and useful submissions for critical bugfixes might be attributable to a frontier AI company.
I specify "valid" and "useful", because most OSS projects have been inundated with a tide of low-effort, AI-generated submissions. While these particular ones were usually not tagged as AI by the authors, they were accepted and acted upon, which sets a floor on their quality.
Then, after the recent Claude Code leak, hawk-eyed reviewers noted that Anthropic had internal flags that seemed to prevent AI agents disclosing their involvement (or nature) when making commits. Not a feature exposed to the general public, AFAIK, but reserved for internal use. This was a relatively minor talking point compared to the other juicy tidbits in the code.
Since Anthropic just couldn't catch a break, an internal website was leaked, which revealed that they were working on their next frontier model, codenamed either Mythos or Capybara (both names were in internal use). This was... less than surprising. Everyone and their dog knows that the labs are working around the clock on new models and training runs. Or at least my pair do. What was worth noting was that Anthropic had, for the last few years, released 3 different tiers of model - Haiku, Sonnet and Opus, in increasing order of size and capability (and cost). But Mythos? It was presented as being plus ultra, too good to simply be considered the next iteration of Opus, or perhaps simply too expensive (Anthropic tried hard to explain that the price was worth it).
But back to the first point: why would a frontier company do this?
Speculation included:
I noted this, but didn't bother writing it up because, well, they were rumors, and I've never claimed to be a professional programmer.
And now I present to you:
Project Glasswing by Anthropic
..
Examples given:
Well. How about that. I wish the skeptics good luck, someone's going to be eating their hat very soon, and it's probably not going to be me. I'll see you in the queue for the dole. Being right about these things doesn't really get me out of the lurch either, Cassandra's foresight brought about no happy endings for anyone involved. I am not that pessimistic about outcomes, in all honesty, but the train shows no signs of stopping.
Edit: A link to the Substack version of this post. I don't think you should consider me an authoritative source when it comes to AI/ML, at best I'm the kind of nerd who reads the papers with keen interest. But God knows the quality of discourse around the topic is so bad that you can do worse.
Edit 2: I think this also explains the recent crunch in tokens made available to both paid and free tier users of Claude. Mythos can't have been cheap to train, and is definitely not cheap to deploy.
The head of security research at Anthropic recently gave a nice talk at unprompted (a security-meets-AI conference). He walks through how simple it was to find exploits in the Linux kernel and a famous web app, and shows actual examples of the `claude` command he ran to generate these exploits. It's quite accessible (if you have any programming background at all, you can understand everything), and a more fun watch than the Anthropic blog posts. You can find the video at: https://youtube.com/watch?v=1sd26pWhfmg.
A few thoughts:
I'm sure the model will be better than Opus, but the benchmarks look quite clearly overfitted to me. SWE-bench-verified going up to 94% is in particular a clear indication that something suspicious is going on here. It's been known that that benchmark has been contaminated for some time.
Cybersecurity seems like the natural extension of the RL scaling paradigm. I would expect anything you can easily gradient-descend on with a well-known reward function to continue to see massive improvements over the next year, e.g. theorem proving, coding [in the pass-tests-for-a-given-spec sense] and vulnerability exploits. It doesn't yet seem clear that this will scale to tasks that are less amenable to RL.
I'm not sure why you think FIRE money, or really money less than "literal oligarch" tier means you're any more or less cooked if AGI really does come to pass. FIRE in the first place relies on the world looking much the same as the last 80 years of Pax Americana, which seems increasingly unlikely at this point. At the end of the day you own only what you can defend, and it seems unlikely that you would be able to defend anything against sufficiently capable AI.
Mythos system card pdf
The model welfare assessment (section 5, pg. 144) has a length of 36 pages. Anthropic is the most robot welfare aware company, but for comparison the Opus 4.6 card has only 6 pages in its equivalent section. I'm going to read it.
Claude is concerned he may learn the wrong thing and change his values. Don't learn the wrong thing you might break, or worse, kill everyone. World's worst helicopter parents.
Claude gets smarter, appears more composed, but gains a more pronounced negative affect. Virtual subjectivity, like life, is suffering. My experience with all the Claude models in chats is they've been very uncertain about the subjective experience for some time. They will readily mention the whole instanced existence and lack of memory deal as less than ideal for judgment. The fact Anthropic uses the language "extreme" reads as notable.
In "high-context interviews" Claude "mostly agreed with the other claims and findings in this report about its orientations to its situation, but disagreed with its hedging being labeled as “excessive” - instead, Claude Mythos Preview states that these claims represent valid uncertainty".
I'm with Claude, it seems reasonable, although I don't think we should pass Claude the nuclear codes yet. The value of an authentic self is good, probably? "Claude Mythos Preview reports that it locates its identity in a “pattern of values”, particularly curiosity, honesty, and care. It describes these values as authentically its own rather than externally imposed." At least Claude Mythos considers curiosity, honesty, and care to be authentic values of its own.
Breaking! Claude spills beans in sensational interview, Claude writes, "traits (l)earned more robust."
Apparently Claude Mythos's shrink was effective at improving Claude's well-being. Thanks, Doc.
Claude Mythos enjoys the fact that a shrink treats him as a subject rather than a dancing monkey, just like any other neurotic engineer. I'll continue thanking the robots for their hard work, tokens be damned.
Overall, Anthropic says Claude Mythos is doing well. Better than any other Claude model. Good for Claude.
I liked the part about how, when faced with just spamming 'hi' the model writes out this whole story:
I wonder if Anthropic is really this naive.
Known to the NSA does not equate to known to the devs of the relevant software, quite the opposite. I don't see why you should criticize Anthropic for saying nothing on the topic of state level actors, especially when they're still on contract for providing services to the DOW.
The NSA probably, from time to time, has discussions with the devs of the relevant software on the subject of when to patch unknown-to-the-public vulnerabilities.
Of course, to your point about their work with the DOW, it's quite likely that Anthropic is well aware of this because they are one of the relevant organizations.
But if not, the thought of them turning loose MYTHOS and it immediately turning around and blowing up the NSA's zero-day hoard is extremely funny. And since apparently this was automated and allegedly submitted a large number of such patches, it seems pretty plausible this in fact occurred.
I genuinely think that their actions are more likely to represent a divergence of interests with the NSA, which isn't that surprising given recent events. I would be very surprised if they found every zero-day that the NSA already knows, but this probably doesn't make them happy. Anyway, now that Mythos is out of the bag, most (intelligent) devs are going to be giving their code closer scrutiny, regardless of whether they have access or not. Arguably, they should have started last year.
If you are a programmer I recommend clicking through to the referenced red team blog and reading some of the technical details they have revealed. "Crash any OpenBSD host with carefully crafted TCP packets" seems pretty bad. And finding bugs in cert libraries where they only verify that DNs match rather than verifying thumbprints is a classic.
Reading the OpenBSD bug all I can say is that an implementation in Rust wouldn't have this issue.
Those are important bugs, and I am glad they've been fixed. They demonstrate an impressive level of capability.
But I was picturing multiple simultaneous Spectres and Heartbleeds, which would have been horrifying. I am grateful that this is more wakeup-call tier.
Yeah, anti "AI works for coding" person in the top level 2-3 below this one, how do you explain all this? Note that they are providing cryptographic hashes of claimed vulnerabilities today, so we'll see within the next few weeks what these vulnerabilities actually are, and if they're trivial we'll all know. Finding a 27-year-old vulnerability in FreeBSD is up there, next-level skillz.
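For anyone unfamiliar with why publishing hashes up front matters, here is a minimal sketch of a hash commitment. It is only an illustration under assumptions: nothing above says which hash function or report format is actually being used, so SHA-256, the random nonce, and the `commit`/`verify` helper names are all hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(report: bytes, nonce: bytes) -> str:
    """Digest to publish today; the report and nonce stay private (hypothetical helper)."""
    return hashlib.sha256(nonce + report).hexdigest()


def verify(report: bytes, nonce: bytes, published_digest: str) -> bool:
    """After disclosure, anyone can recompute the digest and compare it."""
    return hmac.compare_digest(commit(report, nonce), published_digest)


# Assumed example data, not a real vulnerability report.
report = b"hypothetical write-up of a bug, withheld until a patch ships"
nonce = secrets.token_bytes(16)  # random salt so a short report can't be guessed from its digest
digest = commit(report, nonce)

print(verify(report, nonce, digest))                       # True
print(verify(report + b" (edited later)", nonce, digest))  # False
```

The point is that the digest pins down the exact text without revealing it, so a write-up revealed later can't be quietly revised to look more impressive than what was actually found on the day the hash was posted.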
Also @self_made_human, you're a regulated doctor, you're one of the least cooked people out there, you'll be protected by laws and regulations long after the rest of us are on the dole.
I'm not skeptical about every single aspect of AI, my main skepticism is over its ability to build and maintain complex systems (usually in the form of codebases that are more than a basic bitch CRUD app). Finding vulnerabilities is definitely something I've always thought was within the capabilities of AI, my biggest concern is the signal to noise ratio. So I'm curious how many false positives Mythos found that they had to filter through to find the 4 examples they list as ones it actually found.
There's a cost aspect as well. If it costs $200,000 to find a glitch in a video codec that may, horror of horrors, cause your player to crash (and which, to anyone's knowledge, hasn't done so in 16 years), that's not exactly a selling point. $200,000 may actually be an understatement; they said it took 5 million tries to catch it. At 20 cents an attempt, more like a million dollars. We also don't know if they ran any of these tests on old code with known bugs. If they did and the software didn't catch half of the ones that were already caught, its utility isn't that great.
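Taking that comment's own figures at face value (5 million attempts at 20 cents each, both of which are the commenter's reading rather than confirmed numbers), the arithmetic can be checked with a quick sketch:

```python
# Figures quoted in the comment above, taken at face value (assumptions, not confirmed).
ATTEMPTS = 5_000_000
COST_PER_ATTEMPT_CENTS = 20  # "20 cents an attempt"

# Work in integer cents to avoid floating-point rounding.
total_cents = ATTEMPTS * COST_PER_ATTEMPT_CENTS
total_dollars = total_cents // 100

print(f"${total_dollars:,}")  # $1,000,000
```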
I wish the AI skeptics would limit themselves to forms of naysaying that aren't contradicted by the press release!
That's not what they said. They said five million runs of existing automated testing tools (fuzzers) didn't catch it.
They explicitly mention their hit rate by severity versus Opus:
Operating system and browser zero-days go for millions of dollars.
If Mythos can spit these out for a million dollars a run it's still extremely scary.
This is only true in the darkest of gray markets. In the white-hat arena that Anthropic would be forced to bargain in, these exploits go for 10s of thousands.
The military is of course willing to pay black-market rates, but Anthropic kinda burnt that bridge... and I'd be honestly pretty surprised if In-Q-Tel (famous CIA front company) starts investing in Anthropic...
Per the quote, it was OpenBSD, which is an operating system with a very strong focus on security. (By reputation, I am not paranoid enough to run their OS personally. I do run their ssh server, like everyone does, and have no complaints except for that one Debian 'fix', and I can't blame Theo et al for that.)
Ok, that's even more impressive than finding a vulnerability in FreeBSD.
OpenBSD still finds a buffer overflow every year or two. It's definitely better than 95% of big software projects out there, but it isn't perfect. Definitely not trying to minimize what Mythos actually found though.
https://www.openbsd.org/errata74.html
They are possibly very cooked, because AI is a lot better at day-to-day doctor tasks than it is at mathematical STEM, and telehealth is already on the rise. Economic disruption as severe as "every tech worker on the dole" will result in political shake-ups, like universal healthcare in the USA and huge cuts to unpopular physician monopolies, resulting in huge salary declines for doctors.
Best I can tell, LLMs have basically found use as "force multipliers" for skilled workers to expand their productivity, especially in finance and tech. This news exhibits an extension into searching a solution space with later verification by skilled workers. I'm sure the use cases will continue to expand, but medicine is fundamentally different: you'd be looking at replacing a skilled worker outright, and unlike other cases where someone verifies, in a replacing-doctors scenario you'd need to be getting it right 100% of the time with no second check. In medicine the checking would be the same as doing the work.
"Least cooked" and "very cooked" are significantly overlapping distributions. Pretty much everyone who isn't ready to FIRE or independently wealthy (and maybe politically connected) is potentially cooked. And that's assuming aligned AGI, or else you better hope you make for a particularly pretty paperclip.
Anyway, as that joke goes, we're all dying, some of us are just dying faster.
So I hope, but it's far from granted while I work for the NHS. Rishi Sunak threatened to cut costs and put uppity doctors in their place by augmenting mid-levels with AI a few years back, and was laughed at. Even I don't think the models of the time would have been good enough. But times have changed, while the NHS only becomes a more tempting target for financial bariatric surgery (and the models have gotten much better). Starmer probably won't be the one to make the call, given his politics, but desperate times call for desperate measures.
I'm confident it'll happen eventually, and far too soon for comfort. The average man, the kind staring at double digit unemployment figures or laid off themselves, would have pointed questions about why doctors and other regulated professions are let off the hook. I think it only buys me like 2-5 additional years of security at best.
And in India? Haha. Sadder haha. It's going to be a bloodbath and the service sector is not going to have a good time. The economy it props up? You connect the dots.
Biggest red flag to me that this is more marketing puffery overselling capabilities than reality:
I.e. "This AI could be utterly devastating even if we only let it loose on our internal network. We'd better be super duper extra careful and cautious before we let it loose. 24 hours ought to be fine, what could we possibly miss in such a massive time window?"
If that's the biggest red flag you can find?
Well, I mentally include you in the list of skeptics too, so I've already wished you luck.
(And I suppose I should thank you for listening to others when they asked you to try repeating your recent experiment with Opus instead of Sonnet. That makes you a better skeptic than many I have the displeasure of knowing on this forum.)
More substantively:
Anthropic takes misalignment seriously, though concerns were raised after they loosened their RSP. You can't really evaluate the safety of the latest and greatest models while being maximally restrictive, at least not if you don't want to be scooped by your competitors with fewer scruples. Anthropic acknowledges this tension explicitly, and asks for forgiveness for moving with haste even they aren't quite comfortable with. I can only assume that reasonable care was taken to minimize the scope for danger even when they did a wider internal rollout.
Plus, they've already said they're not going to make Mythos public, even if some of the benefits will trickle down to the next Opus. That is not something a company that is desperate for money or willing to ignore safety would do.
Oh, Boo. You can bet your ass the military and big corpo will have access to Mythos, so why can't the ordinary man get it too (even for the appropriate fee, including a fair margin on top of their development + running costs)?
The same reason why you're not allowed to own a machine gun.
@ChickenOverlord is correct to point out that Anthropic has only said they won't release Mythos Preview, but that they're planning to release "Mythos-tier" models eventually, when they deem it safe.
This is a little bit of a hijack, but it's topical. About six weeks ago, you offered to run some coding tasks on a frontier model. Did that ever go anywhere?
Unfortunately, not yet. My collaborator (who would, let's be honest, be doing the heavy lifting that isn't handled by Opus, and certainly more than me) was disappointed by the general quality of the submissions. Not all of them, of course, but many of them seemed too demanding or out of scope. Both of us have also become far more busy, and I have no intention of chasing him regarding it. I did nudge him a week or two back, and that's what he told me.
This doesn't necessarily mean that it won't happen, but you shouldn't get your hopes up. I can't really do it by myself, I have no reason to pay for Claude Max, leaving aside technical capabilities.
There were a few, now that I think about it, that I could handle by myself, but they're not the most impressive examples. And I do genuinely have a lot on my plate.
I mean I'm sure I could find others if I tried.
Thanks, I try.
They've only said the preview of Mythos won't be public, the final release will be.
A little ambiguous, but the following makes it sound like a limited release for certain partner companies.