@gattsuru comments on "Culture War Roundup for the week of August 4, 2025"

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

gattsuru 10mo ago

That's a fair point, and does seem to work with Grok, as does just giving it only one web page and requesting it to not use others. Still struggles, though.

That said, a lot of the logic 'thinking' steps are things like "The summary suggests list operations exist, but they're not fully listed due to cutoff.", getting confused by how Consideration/Introspection works (as start/end escape characters), or trying to recommend Concat Distillation, which doesn't exist but is a reasonable (indeed, the code) name for Speaker's Distillation. So it's possible I'm running into issues more with the way I'm asking the question, such that Grok's research tooling is preventing it from seeing the necessary parts of the puzzle to find the answer.

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi gattsuru 10mo ago

I tried using o3, but it correctly noted that the file you mentioned isn't available, and its web browsing tool failed when trying to use the website.

I can't do anything about the missing document, but I did manually copy and paste most of the website. This is its answer:

https://chatgpt.com/s/t_6892b68c0c3081919777d514df3ba8c2

gattsuru self_made_human 10mo ago

That's a bit of a weird approach -- you're drawing 20 Hermes' Gambits rather than having the code recurse, and the Gemini Decomposition → Reveal → Novice's Gambit could be simplified to just Reveal -- but it does work and fulfills the requirement. Can run it in this IDE if anyone wants, though you'll have to use the simplified version since Novice's Gambit (and Bookkeeper's Gambit) isn't supported there, but the exactly-as-chatGPT'd version does work in-game (albeit an absolute pain to draw without a Focus).

That's kinda impressive. Both Rotation Gambit II and Retrospection are things I'd expect LLMs to struggle with.

Thanks for verifying the answer! If there's a takeaway here, I'm not sure why you're paying for Grok 4. Grok 3 was genuinely impressive, and somewhat noticeably better than the competition at launch. Not the case here, I'm afraid.

With that aside, I'm not sure how other people see LLMs tackling problems of this complexity and then claim they're not reasoning. It bemuses me.

gattsuru self_made_human 10mo ago · Edited 10mo ago

With that aside, I'm not sure how other people see LLMs tackling problems of this complexity and then claim they're not reasoning.

Both the complexity, and also just the novelty. These LLMs were definitely trained with some stack-based languages, both historic ones like Forth and modern assembly, but as similar as they might be from a conceptual perspective, the implementation philosophy and simple names are drastically different. And while there's a tiny number of HexCasting examples on the open web or on Discord, they universally take different approaches, and some of them (like my screenshot above) simply can't be read by any parser that doesn't already have a good understanding of the language: a completed Jester's Gambit, Rotation Gambit, and Rotation Gambit II are visually identical. And, of course, you don't have to write a spell from top-left-going-right-then-down like an English paragraph.
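
For anyone who hasn't used a stack language, here is a rough Python sketch of the three classic stack shuffles from Forth (SWAP, ROT, -ROT). Mapping them onto Jester's Gambit, Rotation Gambit, and Rotation Gambit II is an assumption made for illustration, based on the conceptual similarity to Forth; the function names are hypothetical and this is not how HexCasting itself is implemented.

```python
# Illustrative only: Forth-style stack shuffles on a Python list,
# where the end of the list is the top of the stack.
# The HexCasting names in the comments are an assumed correspondence.

def swap(stack):
    """Forth SWAP ( a b -- b a ); assumed analogue of Jester's Gambit."""
    stack[-2], stack[-1] = stack[-1], stack[-2]


def rot(stack):
    """Forth ROT ( a b c -- b c a ); assumed analogue of Rotation Gambit."""
    stack.append(stack.pop(-3))  # pull the third item up to the top


def rot_rev(stack):
    """Forth -ROT ( a b c -- c a b ); assumed analogue of Rotation Gambit II."""
    stack.insert(-2, stack.pop())  # push the top item down to third place


if __name__ == "__main__":
    for op in (swap, rot, rot_rev):
        s = ["a", "b", "c"]
        op(s)
        print(op.__name__, s)
```

The three operations differ only in which of the top items moves where, which is the sort of distinction that is easy to state in text and easy to lose in a drawing.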

That's a fun example because it's got an easy third-party evaluation, but you don't have to make up a programming language to do this sorta thing. These problems don't have to be hard; just by being new, they undermine a lot of these arguments. These writers could run them, and just don't seem interested in doing so.

I'm not sure why you're paying for Grok 4. Grok 3 was genuinely impressive, and somewhat noticeably better than the competition at launch. Not the case here, I'm afraid.

Yeah, I'm a little surprised. I didn't expect any of the LLMs to handle this well -- I'm kinda amazed that ChatGPT could do as well as it did, and even some of my gripes are probably downstream of the question being underspecified -- but the level of hallucination from Grok 4 is disappointing, and I could see the arguments for dropping it.

Partly a politics and work politics thing. My boss is a big booster of everything Musk, so there's been a lot more value, separate from the LLM's specific capabilities, in both knowing the thing and knowing the limits of Grok 4. And while I don't particularly trust xAI, I neither trust nor like OpenAI.

Some of it's use case. I do find even Grok 3 more effective for writing and reviewing writing than the ChatGPT equivalents. Compare this to this, or at the risk of touching on erwgv3g34's topic today, this to this (cw: discussion of an excerpt from nsfw m/m/f text; there's no actual sex or even nudity, but it's very very clearly smut.)

They're all sycophantic, and even where 4o is better at catching spelling and grammar errors, on larger consistency or coherence or theme questions I've had a hell of a time getting any of the early ChatGPT models to really push back with anything deeper than the Your First Writing Advice that amadanb's criticized.

Of course, I also haven't experimented that hard with them, or with the newer paid ChatGPT models. I probably do need to do a deeper and more serious evaluation; I've also just been lazy about actual hard comparisons for fields with strict performance results. My work programming goes into stuff that I'm either unwilling to upload to an outside service or is large enough in scale that models have had problems maintaining logic (or both), while my hobby programming or teaching is mostly simple enough that Grok 3 or 4mini can handle it.