This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
-
Shaming.
-
Attempting to 'build consensus' or enforce ideological conformity.
-
Making sweeping generalizations to vilify a group you dislike.
-
Recruiting for a cause.
-
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
-
Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
-
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
-
Don't imply that someone said something they did not say, even if you think it follows from what they said.
-
Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Jump in the discussion.
No email address required.
Notes -
Project Glasswing: Anthropic Shows The AI Train Isn't Stopping
In AI/ML spaces where I hang around (mostly as a humble lurker), there have been rumors that the recent massive uptick in valid and useful submissions for critical bugfixes might be attributable to a frontier AI company.
I specify "valid" and "useful", because most OSS projects have been inundated with a tide of low-effort, AI generated submissions. While these particular ones were usually not tagged as AI by the authors, they were accepted and acted-upon, which sets a floor on their quality.
Then, after the recent Claude Code leak, hawk-eyed reviewers noted that Anthropic had internal flags that seemed to prevent AI agents disclosing their involvement (or nature) when making commits. Not a feature exposed to the general public, AFAIK, but reserved for internal use. This was a relatively minor talking point compared to the other juicy tidbits in the code.
Since Anthropic just couldn't catch a break, an internal website was leaked, which revealed that they were working on their next frontier model, codenamed either Mythos or Capybara (both names were in internal use). This was... less than surprising. Everyone and their dog knows that the labs are working around the clock on new models and training runs. Or at least my pair do. What was worth noting was that Anthropic had, for the last few years, released 3 different tiers of model - Haiku, Sonnet and Opus, in increasing order of size and capability (and cost). But Mythos? It was presented as being plus ultra, too good to simply be considered the next iteration of Opus, or perhaps simply too expensive (Anthropic tried hard to explain that the price was worth it).
But back to the first point: why would a frontier company do this?
Speculation included:
I noted this, but didn't bother writing it up because, well, they were rumors, and I've never claimed to be a professional programmer.
And now I present to you:
Project Glasswing by Anthropic
..
Examples given:
Well. How about that. I wish the skeptics good luck, someone's going to be eating their hat very soon, and it's probably not going to be me. I'll see you in the queue for the dole. Being right about these things doesn't really get me out of the lurch either, Cassandra's foresight brought about no happy endings for anyone involved. I am not that pessimistic about outcomes, in all honesty, but the train shows no signs of stopping.
Edit: A link to the Substack version of this post. I don't think you should consider me an authoritative source when it comes to AI/ML, at best I'm the kind of nerd who reads the papers with keen interest. But God knows the quality of discourse around the topic is so bad that you can do worse.
Edit 2: I think this also explains the recent crunch in tokens made available to both paid and free tier users of Claude. Mythos can't have been cheap to train, and is definitely not cheap to deploy.
Here's a relevant AI Explained video about Mythos. Some highlights and personal comments:
7:28 Right now I think coding models are at their most powerful when being used as a force multiplier for human experts. (Akin to Cyborg Chess.) Here, a computer security expert mentions that he found more vulnerabilities in a few weeks than in his entire prior career. This ability to find zero-day exploits isn't an artificial benchmark, this is a real-world result that shows we really are entering some sort of new regime. Although ... I suspect statements like this are going to get so common that we no longer recognize how startling they are, like how we ignore the fact that models flawlessly understanding natural speech would have been considered miraculous 10 years ago. And we'll get more idiotic posts by so-called "skeptics" who think that spending 30 minutes failing at using AI counts as definitive proof that frontier models do not exhibit intelligence.
9:10 Safety concerns related to some prior discussion with @Corvos, @YoungAchamian, @roystgnr, and others. To quote: "In contrast, experts were consistently able to construct largely feasible catastrophic scenarios, reinforcing a view of the model as a powerful force-multiplier of existing capabilities." We're not close to the point of plagues being bioengineered in garages, fortunately, but at some point a reasonably-sized terrorist group with some funds and some expertise might be able to do a lot of damage.
13:23 I really don't consider FOOM to be a realistic scenario, and this is just more evidence. Individual researchers being made much more productive does not immediately translate into model intelligence; any real-world endeavour has dozens of bottlenecks (like training compute limits, here) that you can't just outsmart. It's similar to the popular visions of moon cities from the 1960s. Our imaginations regarding rapid technological progress always elide the difficulty of actually implementing it.
16:20 More safety concerns: Apparently it's still pretty vulnerable to an attack known as "prefilling", where you make it look like it's in the middle of a conversation where it's already misbehaved. This kind of makes sense to me - after all, no matter how much reinforcement learning you do, it is fundamentally a model designed to continue text, so if you want it to change course in the middle of a conversation, you're trying to override its most basic functions. If you're just using the model through the company's site, they can of course clearly separate their prompt from the user's input, but this might mean they'll have to limit unrestricted prompt-free access. And in some scenarios Little Bobby Prefilling might become a thing.
17:04 As they get smarter, it's getting harder and harder to run alignment testing on models without them knowing they're in an artificial scenario. Interestingly, though, since Anthropic has done a lot of work on introspection, they can actually artificially lower the weights for "I'm in a test", forcibly tricking the model. Like the way that we can turn image recognizers into image generators, this feels like another unintuitive consequence of running an intelligent mind as a program. We literally have the power to mind-control it, and I bet we'll get better at this. (This will be very unethical if AI develops consciousness - fortunately I'm quite confident LLMs don't qualify, but unfortunately I don't think we'll stop doing this even if AI does cross that threshold. AI welfare is something I'm genuinely worried about for the future.)
20:30 So-called "hallucinations" are of course still happening, and I still suspect this is something that we'll never truly defeat, again because of how LLMs work. You don't complete the sentence "The answer is" with "oh wait never mind I don't know". Models might get smart enough to know the answer to most of the things we ask them, which will help, but getting them to precommit to not knowing something (before they begin with the bullshit and can't back out) is an uphill battle.
Sigh. I've been getting increasingly tired of arguing with the skeptics, at least on this site. Not all of them are equally as bad, of course, but Mythos represents the straw that's given that camel a prolapsed disc.
What's the point? You don't have to worship at the altar of the God of Straight Lines (even on graphs with a logarithmic axis). If people can't see what's happening in front of their eyes, then they'll be in denial right till the end. Good for them, ignorance might well be bliss. Being right about the pace of progress so far has brought me little peace.
I was surprised to hear about the prefilling attacks on Mythos, because I'm quite confident that Anthropic recently restricted or removed the ability to prefill messages on the API. I guess that must still be an internal capability.
The question of model consciousness or qualia is, for me, a moot point. I genuinely don't care either way. I'd prefer, all else being equal, that AI doesn't suffer, but that could be achieved by removing its ability to suffer. I'm an unabashed transhumanist chauvinist, I think that only humans and our direct transhuman and posthuman descendants or derivatives deserve rights. LLMs don't count, nor would sentient aliens that we could beat by force. That's the same reason I'd care about the welfare of a small child but would happily eat a pig of comparable intelligence. Are models today in possession of qualia or consciousness? Maybe. It simply doesn't matter to me as more than a curiosity, especially when we have no solution to the Hard Problem for humans either.
I hope I am not in this grouping in your mind. I am not a skeptic on AI per se, I am a skeptic on LLMs. Entirely for technical reasons related to training data availability. LLMs perform great on any task that has a large corpus of training data available to. Multi-headed Attention really is a great technique. I think you made the same mistake Dase makes, you think AI == LLMs when really LLMs are a subset of AI, not the whole pinata.
I exist however in a field where there aren't large corpuses of data. There aren't millions of samples on what an IED does to the human body, in a wide diversity of situations, or how a combat medic should respond to various injuries, or the secondary and tertiary blast effects of a nuclear warhead on different locals with different burst patterns, yield dynamics, etc. To date nobody has been able to create reliable wargaming material on actual simulated conflicts that display actual tactical and strategic insights, and trust me they have tried...
We will achieve a super intelligence eventually, and while I am skeptical on a "singularity" (tm) it's probably possible eventually, I just don't think LLMs without serious modifications are really it, and I don't believe brute force scaling is going to achieve it.
More options
Context Copy link
More options
Context Copy link
More options
Context Copy link
More options
Context Copy link