
Culture War Roundup for the week of August 4, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Huh. I was confident that I had a better writeup about why "stochastic parrots" are a laughable idea, at least as a description for LLMs. But no, after getting a minor headache figuring out the search operators here, it turns out that's all I've written on the topic.

I guess I never bothered because it's a Gary Marcus-tier critique, and anyone using it loses about 20 IQ points in my estimation.

But I guess now is as good a time as any? In short, it is a pithy, evocative critique that makes no sense.

LLMs are not inherently stochastic. They have a setting called temperature, usually not exposed to the end user except via the API. Without going into how it works, suffice it to say that setting it to zero makes their output deterministic: the exact same prompt gives the exact same output.

The reason temperature isn't just set to zero all the time is that the ability to choose something other than the single most likely next token has benefits for creativity. At the very least, it saves you from getting stuck with the same subpar result.
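To make that concrete, here is a minimal sketch of what temperature does to next-token selection. The token names and logit values are invented for illustration; real models expose this as an API parameter rather than code like this.

import math
import random

def sample_next_token(logits, temperature):
    # temperature 0 degenerates to argmax: always pick the single most likely token
    if temperature == 0:
        return max(logits, key=logits.get)
    # otherwise rescale the logits, softmax them, and sample
    scaled = {tok: v / temperature for tok, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / total for tok, v in scaled.items()}
    r = random.random()
    cumulative = 0.0
    for tok, p in probs.items():
        cumulative += p
        if r < cumulative:
            return tok
    return tok  # guard against floating-point rounding

# toy next-token distribution (made-up numbers)
logits = {"cracker": 2.1, "cage": 1.4, "Python": 0.2}
print(sample_next_token(logits, temperature=0))    # deterministic: same token every run
print(sample_next_token(logits, temperature=1.0))  # stochastic: varies from run to run

At zero, the same prompt walks the same path token by token; raise it and you trade that determinism for variety.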

So much for the "stochastic" part. Minus the stochasticity, are they just "parrots"? Anyone thinking this is on crack, since Polly won't debug your Python no matter how many crackers you feed her.

If LLMs were merely interpolating between memorized n-grams or "stitching together" text, their performance would be bounded by the literal contents of their training data. They would excel at retrieving facts and mimicking styles present in the corpus, but would fail catastrophically at any task requiring genuine abstraction or generalization to novel domains. This is not what we observe.

Let’s get specific. The “parrot” model implies the following:

  1. LLMs can only repeat (paraphrase, interpolate, or permute) what they have seen.

  2. They lack generalization, abstraction, or true reasoning.

  3. They are, in essence, Markov chains on steroids.

To disprove any of those claims, just *gestures angrily* look at the things they can do. If winning gold in the latest IMO is something a "stochastic parrot" can pull off, then well, the only valid takeaway is that the damn parrot is smarter than we thought. Definitely smarter than the people who use the phrase unironically.

Bender and Koller, whose earlier paper laid the groundwork for the phrase, gave two toy “gotchas” that they claimed no pure language model could ever solve: (1) a short vignette about a bear chasing a hiker, and (2) the spelled-out arithmetic prompt “Three plus five equals”. GPT-3 solved both within a year. The response? Crickets, followed by goal-post shifting: “Well, it must have memorized those exact patterns.” But the bear prompt isn’t in any training set at scale, and GPT-3 could generalize the schema to new animals, new hazards, and new resolutions. Memorization is a finite resource; generalization is not.

(I hope everyone here recalls that GPT-3 is ancient now)

On point 2: Consider the IMO example. Or better yet, come up with a rigorous definition of reasoning by which we can differentiate a human from an LLM. It's all word games, or word salad.

On 3: Just a few weeks back, I was trying to better understand the actual difference between a Markov chain and an LLM, and I asked o3 whether it wasn't possible to approximate the latter with the former. After all, I wondered, if Markov chains only consider the previous unit (usually a word, or a few words in an n-gram), couldn't we just train one to output the next word conditioned on every word that came before? The answer was yes, but that doing so is completely computationally intractable. The fact that we can run LLMs on something smaller than a Matrioshka brain comes down to their autoregressive nature and the brilliance of the transformer architecture and its attention mechanism.
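A toy version of what I was asking o3 about, with invented numbers purely for scale: a lookup-table "Markov chain" keyed on the entire preceding context rather than the last word.

import math
from collections import defaultdict

# Lookup-table "Markov chain" conditioned on the whole history, not just the last word.
counts = defaultdict(lambda: defaultdict(int))

def train(sentences):
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words)):
            history = tuple(words[:i])      # the entire prefix is the state
            counts[history][words[i]] += 1

def predict(history_words):
    followers = counts.get(tuple(history_words))
    return max(followers, key=followers.get) if followers else None

train(["the parrot wants a cracker", "the parrot wants a nap"])
print(predict(["the", "parrot", "wants", "a"]))   # only works for prefixes seen verbatim

# The intractability: with a ~50,000-word vocabulary and a 1,000-word context,
# the table would need a row for roughly 50,000^1000 possible histories.
print(f"possible histories: about 10^{1000 * math.log10(50_000):.0f}")

A transformer avoids that table entirely: it computes the next-token distribution from the context on the fly, so its size scales with its parameters, not with the number of possible histories.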

Overall, even the steelman interpretation of the parrot analogy is only as helpful as this meme, which I have helpfully appended below. It is a bankrupt notion, a thought-terminating cliché at best, and I wouldn't cry if anyone using it meets a tiger outside the confines of a cage.

/images/17544215520465958.webp

I liked using the stochastic parrot idea as a shorthand for the way most of the public use LLMs. It gives non-computer-savvy people a simple heuristic that greatly elevates their ability to use them. But having read this I feel a bit like Charlie and Mac when the gang wrestles.

Dennis: Can I stop you guys for one second? What you just described, now that just sounds like we are singing about the lifestyle of an eagle.

Charlie: Yeah.

Mac: Mm-hmm.

Dennis: Well I was under the impression we were presenting ourselves as bird-MEN which, to me, is infinitely cooler than just sort of... being a bird.

I would consider myself an LLM evangelist, and have introduced quite a few not particularly tech-savvy people to them, with good results.

I've never been tempted to call them stochastic parrots. The term harms more than it helps. My usual shortcut is to tell people to act as if they're talking to a human, a knowledgeable but fallible one, and they should double check anything of real consequence. This is a far more relevant description of the kind of capabilities they possess than any mention of a "parrot".

The fact you've never been tempted to use the 'stochastic parrot' idea just means you haven't dealt with the specific kind of frustration I'm talking about.

Yeah, the 'fallible but super-intelligent human' is my first shortcut too, but it actually contributes to the failure mode the stochastic parrot concept helps alleviate. The concept is useful for those who reply 'Yeah, but when I tell a human they're being an idiot, they change their approach.' For those who want to know why it can't consistently generate good comedy or poetry. For people who don't understand that rewording the prompt can drastically change the response, or those who don't understand, or feel bad about, regenerating or ignoring the parts of a response they don't care about, like follow-up questions.

In those cases, the stochastic parrot is a more useful model than the fallible human. It helps them understand they're not talking to a who, but interacting with a what. It explains the lack of genuine consciousness, which is the part many non-savvy users get stuck on. Rattling off a bunch of info about context windows and temperature is worthless, but saying "it's a stochastic parrot" to themselves helps them quickly stop identifying it as conscious. Claiming it 'harms more than it helps' seems more focused on protecting the public image of LLMs than on actually helping frustrated users. Not every explanation has to be a marketing pitch.

I still don't see why that applies, and I'm being earnest here. What about the "stochastic parrot" framing keys the average person into the fact that they're good at code and bad at poetry? That has more to do with mode collapse and the downsides of RLHF than with lacking "consciousness". Like, even on this forum, we have no shortage of users who are great at coding but can't write a poem to save their lives; what does that say about their consciousness? Are parrots known to be good at Ruby on Rails but to fail at poetry?

My explanation of temperature is, at the very least, meant as a high-level explainer. It doesn't come up in normal conversation or when I'm introducing someone to LLMs. Context windows? They're so large now that they aren't worth mentioning except in passing.

My point is that the parrot metaphor adds nothing. It is, at best, irrelevant, when it comes to all the additional explainers you need to give to normies.

I thought I explained it pretty well, but I will try again. It is a cognitive shortcut, a shorthand people can use when they are still modelling it as a 'fallible human' and expecting it to respond like one. Mode collapse and RLHF have nothing to do with it, because it isn't a server-side issue; it is a user issue: the user is anthropomorphising a tool.

Yes, temperature and context windows (although I actually meant to say max tokens, good catch) don't come up in normal conversation; they mean nothing to a normie. When a normie is annoyed that ChatGPT doesn't "get" them, the parrot model helps them pivot from "How do I make this understand me?" to "What kind of input does this tool need to give me the output I want?"

You can give them a bunch of additional explanations about mode collapse and max tokens that they won't understand (and they will just stop using it), or you can give them a simple concept that cuts through the anthropomorphising immediately, so that when they are sitting at their computer getting frustrated at poor-quality writing, or feeling bad about ignoring the LLM's prodding to take the conversation in a direction they don't care about, they can think 'wait, it's a stochastic parrot' and switch gears. It works.

A human fails at poetry because they have the mind, the memories and the grounding in reality, but lack the skill to match the patterns we see as poetic. An LLM has the skill, but lacks the mind, memories and grounding in reality. What about the parrot framing triggers that understanding? Memetics, I guess. We have been using parrots to describe non-thinking pattern matchers for centuries. Parroting a phrase goes back to the 18th century. "The parrot can speak, and yet is nothing more than a bird" is a phrase in the ancient Chinese Book of Rites.

Also, I didn't address this earlier because I thought it was just amusing snark, but you appear to be serious about it. Yes, you are correct that a parrot can't code. Do you have a similar problem with the fact that a computer virus can't be treated with medicine? Or that the cloud is actually a bunch of servers and can't be shifted by the wind? Or the fact that the world wide web wasn't spun by a world wide spider? Attacking a metaphor is not an argument.

Attacking a metaphor is not an argument.

I've explained why I think the parrot is a terrible metaphor above. And no, metaphors can vary greatly in how useful or pedagogical they are. Analyzing the fitness of a metaphor is a perfectly valid, and in this case essential, form of argument. Metaphors are not neutral decorations; they are cognitive tools that structure understanding and guide action.

A computer virus shares many properties with its biological counterpart: self-replication, transmission, damage to systems, and the need for an "anti-virus". It is a good name, and nobody with a functional frontal lobe comes away thinking they need an N95 mask while browsing a porn site.

The idea of the Cloud at least conveys the message that the user doesn't have to worry about the geographical location of their data. Even so, the Cloud is just someone else's computer, and even AWS goes down on rare occasions. It is an okay metaphor.

The Parrot is awful. It offers no such explanatory power for the observed, spiky capability profile of LLMs. It does not explain why the model can write functional Python code (a task requiring logic and structure) but often produces insipid poetry (a task one might think is closer to mimicry). It does not explain why an LLM can synthesize a novel argument from disparate sources but fail to count the letters in a word. A user equipped only with the parrot model is left baffled by these outcomes. They have traded the mystery of a "fallible human" for the mystery of a "magical parrot".

I contend that as leaky generalizations go, the former is way better than the latter. An LLM has a cognitive or at least behavioral profile far closer to a human than it does to a parrot.

You brought up the analogy of "parroting" information, which I would assume involves simply reciting things back without understanding what they mean. That is not a good description of how the user can expect an LLM to behave.

On the object level, I strongly disagree with your claims that LLMs don't "think" or don't have "minds". They clearly have a very non-human form of cognition, but so does an octopus.

Laying that aside, from the perspective of an end-user, LLMs are better modeled as thinking minds.

The "fallible but knowledgeable intern" or "simulation engine" metaphor is superior not because it is more technically precise (though it is), but because it is more instrumentally useful. It correctly implies the user's optimal strategy: that performance is contingent on the quality of the instructions (prompting), the provided background materials (context), and a final review of the output (verification). This model correctly guides the user to iterate on their prompts, to provide examples, and to treat the output as a draft. The parrot model, in contrast, suggests the underlying process is fundamentally random mimicry, which offers no clear path to improvement besides "pull the lever again". It encourages users to conceptualize the LLM as a tool incapable of generalization, which is to ignore its single most important property. Replacing a user's anthropomorphism with a model that is descriptively false and predictively useless is not a pedagogical victory. It is swapping one error for another, and not even for a less severe one to boot.

We are looking at this from two different angles. My angle helps people. Your angle, which seems to prioritize protecting the LLM from the 'insult' of a simple metaphor, actively harms user adoption. My goal in using the parrot model is to solve a specific and very common point of frustration - the anthropomorphising of a tool. I know the parrot shortcut works, I have watched it work and I have been thanked for it.

The issue is that humans - especially older humans - have been using conversation - a LUI, a language user interface - in a very particular way their entire lives. They have conversations with other humans who are grounded in objective reality, who have emotions and memories, and therefore when they use a LUI to interact with a machine, they subconsciously pattern-match the machine to other humans and expect it to work the same way - and when it doesn't, they get frustrated.

The parrot model on the other hand, tells the user 'Warning: This looks like the UI you have been using your whole life, but it is fundamentally different. Do not assume understanding. Do not assume intention. Your input must be explicit and pattern-oriented to get a predictable output.' The parrot doesn't get anything. It has no intentions in the sense the person is thinking of. It can't be lazy. The frustration dissolves and is replaced by a practical problem solving mindset. Meanwhile the fallible intern exacerbates the very problem I am trying to solve by reinforcing the identification of the LLM as a conscious being.

The beauty is, once they get over that, once they no longer have to use the parrot model to think of it as a tool, they start experimenting with it in ways they wouldn't have before. They feel much more comfortable treating it like a conversation partner they can manipulate through the tech. Ironically they feel more comfortable joking about it being alive and noticing the ways it is like and unlike a person. They get more interested in learning how it actually works, because they aren't shackled by the deeply ingrained grooves of social etiquette.

You're right that metaphors should be analyzed for fitness, but that analysis requires engaging with the metaphor's intended purpose, not just attacking its literal accuracy. A metaphor only needs to illuminate one key trait to be effective, but the parrot goes a lot further than that. It is in fact fantastic at explaining the spiky profile of LLMs. It explains why an LLM can 'parrot' highly structured Python from its training data but write insipid poetry that lacks the qualia of human experience. Likewise, I could train a parrot to recite 10 PRINT "BALLS"; 20 GOTO 10, but it could never invent a limerick. It explains why it can synthesize text (a complex pattern-matching task) but can't count letters in a word (a character-level task it's not trained to understand). Your analysis ignores this context, seemingly because the metaphor is offensive to an aspirational view of AI. But you're attacking a subway map for not being a satellite image. The resolution is drastically reduced, yes - this is a selling point, not a flaw. Cultural cachet drastically outweighs accuracy when it comes to a metaphor's usefulness in real-world applications.

And do you want to know another animal with a clearly non-human form of cognition? A parrot. How did you skip over crows and dolphins to get to octopi, animals with an intelligence that is explicitly not language-based, when we are talking about language models? Unlike an octopus, a parrot's intelligence is startlingly relevant here (my mentioning of parroting was just an example of how a parrot has been used as a metaphor for a non-thinking (or if you prefer, non-feeling) pattern matcher in the past). Using a LUI, a parrot can learn complex vocalisation. They can learn mimicry and memorisation. They can learn to associate words with objects and concepts (like colours and zero). They can perform problem-solving tasks through dialogue. Is it just because octopus intelligence is cool and weird? Because that just brings me back to the difference between evangelising LLMs and helping people. You want to talk up LLMs, I want to increase their adoption.

Shaming users for not having the correct mental model is precisely how we end up with people who are afraid of their tools - the boomers who work out calculations on a pocket calculator before typing them into Excel, or who type 'Gmail login' into the Google search bar every single day. As social media amply demonstrates, technical accuracy does not aid adoption; it is a barrier to it. We can dislike that from a nerd standpoint, which is why I admired your point in my original post (technically correct is the best kind of correct!), but user adoption will do a lot more for advancing the tech.

And do you want to know another animal with a clearly non-human form of cognition? A parrot.

Touché. I walked into that one.

We are looking at this from two different angles. My angle helps people. Your angle, which seems to prioritize protecting the LLM from the 'insult' of a simple metaphor, actively harms user adoption.

Look, come on. We are literally in a thread dedicated to avoiding Bulverism. Do you honestly think I'm out here defending the honor of a piece of software? My concern is not for the LLM's public image. Sam Altman is not sending me checks. I pay for ChatGPT Plus.

I think the charitable, and correct, framing is that we are both trying to help people use these things better. We just disagree on the best way to do that. My entire point is that the "stochastic parrot" model, while it might solve the one specific problem of a user getting frustrated, ultimately creates more confusion than it solves. It's a bad mental model, and I care about users having good mental models.

You're right that a metaphor is a subway map, not a satellite image. Its value is in its simplification. But for a subway map to be useful, it has to get the basic topology right. It has to show you which stations connect. The parrot map gets the topology fundamentally wrong.

It tells you the machine mimics, and that's it. It offers zero explanation for the weird, spiky capability profile. Why can this "parrot" debug Python but not write a good joke? Why can it synthesize three different academic papers into a novel summary but fail to count the letters in a word? The parrot model just leaves you with "I guess it's a magic parrot". It doesn't give the user any levers to pull. What's the advice? "Just keep feeding the parrot crackers and hope it says something different?"

Compare that to the "fallible but brilliant intern" model. It's also a simplification, but it's a much better map. It correctly predicts the spikiness. An intern can be a world-class expert on one topic and completely sloppy with basic arithmetic. That feels right. More importantly, it gives the user an immediate, actionable strategy. What do you do with a brilliant but fallible intern? You give them very clear instructions, you provide them with all the necessary background documents, and you always, always double-check their work for anything mission-critical. That maps perfectly onto prompt engineering, RAG, and verification. It empowers the user. The parrot model just leaves them shrugging.

Shaming users for not having the correct mental model is precisely how we end up with people who are afraid of their tools

I'm pretty sure I haven't done that. My frustration isn't with your average user. It's with people who really should know better using the term as a thought-terminating cliché to dismiss the whole enterprise.

If my own grandmother told me she was getting frustrated because "Mr. GPT" kept forgetting what she told it yesterday, I wouldn't lecture her on stateless architecture. I'd say something like, "Think of it as having the world's worst long-term memory. It's a total genius, but you have to re-introduce yourself and explain the whole situation from scratch every single time you talk to it."

That's also a simple, not-quite-accurate metaphor. But it's a better one. It's a better map. It addresses her actual problem and gives her a practical way to think that will get her better results next time. It helps her use the tool, which is the goal I think we both agree on.


Computationally, maybe all we are is Markov chains. I'm not sold, but Markov chatbots have been around for a few decades now and used to fool people occasionally even at smaller scales.

LLMs can do pretty impressive things, but I haven't seen convincing evidence that any of them have stepped clearly outside the bounds of their training dataset. In part that's hard to evaluate because we've been training them on everything we can find. Can an LLM trained purely on pre-Einstein sources adequately discuss relativity? A human can be well versed in lots of things with substantially less training material.

I still don't think we have a good model for what intelligence is. Some have recently suggested "compression", which is interesting from an information theory perspective. But I won't be surprised to find that whatever it is, it's actually an NP-hard problem in the perfect case, and everything else is just heuristics and approximations trying to be close. In some ways it'd be amusing if it turns out to be a good application of quantum computing.
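For what it's worth, the compression framing has a concrete reading: by the standard source-coding argument, a model that assigns higher probability to what actually comes next can encode the text in fewer bits (e.g. via arithmetic coding). A toy illustration, with made-up probabilities:

import math

text = ["the", "parrot", "wants", "a", "cracker"]

# Cost of encoding each word is -log2(p), where p is the probability the model assigned to it,
# so better prediction is literally better compression.
uniform = {w: 1 / 50_000 for w in text}   # clueless model: every word in a 50k vocabulary equally likely
informed = {"the": 0.05, "parrot": 0.3, "wants": 0.4, "a": 0.5, "cracker": 0.6}  # invented, sharper model

for name, model in [("uniform", uniform), ("informed", informed)]:
    bits = sum(-math.log2(model[w]) for w in text)
    print(f"{name}: {bits:.1f} bits")     # roughly 78 bits vs roughly 9 bits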

LLMs can do pretty impressive things, but I haven't seen convincing evidence that any of them have stepped clearly outside the bounds of their training dataset.

What does it mean to step outside the bounds of their training set? If I have it write a fanfic about Saruman being sponsored by NordVPN for a secure Palantir browsing experience (first month is free with code ISTARI), is that beyond the training set? It knows about NordVPN and Lord of the Rings but surely there is no such combo in the training set.

Or would it be novel if I give it my Python code and errors from the database logs and ask it for a fix? My code specifically has never been trained on, though it's seen a hell of a lot of Python.

R1 has seen use in writing kernels, which is real work for AI engineers; is that novel? Well, it's seen a bunch of kernels in the past.

Or does it have to be something fundamentally new, a paradigm-changer like the transformer architecture itself or a whole new genre of fiction? If it's that, then we'd only get it at the point of AGI.

I don't want to speak on 'intelligence' or genuine reasoning or heuristics and approximations, but when it comes to going outside the bounds of their training data, it's pretty trivially possible to take an LLM and give it a problem related to a video game (or a mod for a video game) that is well outside of its knowledge cutoff or training date.

I can't test this right now, it's definitely not an optimal solution (see uploaded file for comparison), and I think it misinterpreted the Evanition operator, but it's a question that I'm pretty sure didn't have an equivalent on the public web anywhere until today. There's something damning in getting a trivial computer science problem non-optimal or outright wrong, especially when given the total documentation, but there's also something interesting in getting one like this close at all with such a minimum of information.

/images/17544296446888535.webp

What on earth is going on in that screenshot? I know Minecraft mod packs can get wild, but that's new.

HexCasting is fun, if not very balanced.

It has a stack-based programming language system based on drawing Patterns onto your screen over a hex-style grid, where each Pattern either produces a single variable on the top of the stack, manipulates parts of the stack to perform certain operations, or acts as an escape character, with one off-stack register (called the Ravenmind). You can keep the state of the grid and stack while not actively casting, but because the screen grid has limited space and the grid is wiped whenever the stack is empty (or on shift-right-click), there are some really interesting early-game constraints where quining a spell or doing goofy recursion allows some surprisingly powerful spells to be made much earlier than normal.

Eventually, you can craft the Focus and Spellbook items that can store more variables from the stack even if you wipe the grid, and then things go off the rails very quickly, though there remain some limits since most Patterns cost amethyst from your inventory (or, if you're out of amethyst and hit a certain unlock, HP).

Like most stack-based programming, it tends to be a little prone to driving people crazy, which fits pretty well with the in-game lore for the magic.
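If you've never touched a stack language, here's the general idiom as a throwaway Python toy. To be clear, this is a generic stack machine for flavour, not HexCasting's actual pattern set or evaluator:

# Each "pattern" either pushes a value, manipulates the stack, or causes a side effect.
def run(patterns):
    stack = []
    for op, *args in patterns:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "reveal":        # loosely analogous to a pattern that displays the top of the stack
            print(stack[-1])
    return stack

run([("push", 2), ("push", 3), ("add",), ("dup",), ("reveal",)])   # prints 5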

That specific spell example just existed to show a bug in how the evaluator was calculating recursion limits. The dev intended a limit of 512 recursions, but had implemented two (normal) ways of recursive casting. Hermes' Gambit executes a single variable from the stack, and each Hermes' added one to the recursion count as it was executed. Thoth's Gambit executes each variable from one list over a second list, and didn't count those multiplicatively; I think it was only adding one to the recursion count for each variable in the second list? Since a list only took 1 + ListCount entries out of the stack's 1024-entry limit, you could conceivably hit a quarter-million recursions without getting to the normal block from the limit.

Pseudocode; it's about equivalent to:

double register = 1;

// each call reads the running total, prints it, then adds its argument to it
void function(double a)
{
    double b = register;
    print(b);
    b += a;
    register = b;
}

void main()
{
    // 1000, declared as 10^3 because the in-game symbol for 1000 is painful to draw
    double max = Math.pow(10, 3);
    double start = 1;
    // no normal for loops in-game, hence the list-of-a-thousand-ones workaround
    List<Double> inputs = new ArrayList<>(Collections.nCopies((int) max, start));
    for (double val : inputs)
    {
        function(val);
    }
}

Very ugly, but the language is intentionally constrained so you can't do a lot of easier approaches (e.g., you have to declare 10^3 because the symbol for 1000 is so long it takes up most of the screen; you don't have normal for loops, so that abomination of a list initialization is your new worst enemy turned best friend; and every number is a double).

Not that big a deal when you're just printing to the screen, but since those could just as easily (more easily!) have been explosions or block/light placements or teleportations, it's a bit scary for server owners.

((In practice, even that simple counter would cause everyone to disconnect from a remote server. Go go manual forkbomb.))

For some other example spells, see a safe teleport, a spell to place a series of temporary blocks in the direction you're looking, or one to mine five blocks from the face of the block you're looking at.

(Magical)PSI is a little easier to get into and served as the inspiration for HexCasting, but it has enough documentation on Reddit that I can't confidently say it's LLM-training-proof.

Why am I surprised? People make Redstone computers for fun. I guess this all just takes a very different mindset haha.

That is pretty impressive. Is it allowed to search the web? It looks like it might be. I think the canonical test I'm proposing would disallow that, but it is a useful step in general.

Huh. Uploading just the Patterns section of the HexBook webpage and disabling web search looks better even on Grok 3, though that's just a quick glance and I won't be able to test it for a bit.

EDIT: nope, several hallucinated patterns on Grok 3, including a number that break from the naming convention. And Grok 4 can't have web search turned off. Bah.

Have you tried simply asking it not to search the web? The models usually comply when asked. If they don't, it should be evident from the UI.

That's a fair point, and does seem to work with Grok, as does just giving it only one web page and requesting it to not use others. Still struggles, though.

That said, a lot of the logic 'thinking' steps are things like "The summary suggests list operations exist, but they're not fully listed due to cutoff.", getting confused by how Consideration/Introspection works (as start/end escape characters) or trying to recommend Concat Distillation, which doesn't exist but is a reasonable (indeed, the code) name for Speaker's Distillation. So it's possible I'm more running into issues with the way I'm asking the question, such that Grok's research tooling is preventing it from seeing the necessary parts of the puzzle to find the answer.

I tried using o3, but it correctly noted that the file you mentioned isn't available, and its web browsing tool failed when trying to use the website.

I can't do anything about the missing document, but I did manually copy and paste most of the website. This is its answer:

https://chatgpt.com/s/t_6892b68c0c3081919777d514df3ba8c2

That's a bit of a weird approach -- you're drawing 20 Hermes' Gambits rather than having the code recurse, and the Gemini Decomposition → Reveal → Novice's Gambit could be simplified to just Reveal -- but it does work and fulfills the requirement. You can run it in this IDE if anyone wants, though you'll have to use the simplified version since Novice's Gambit (and Bookkeeper's Gambit) isn't supported there, but the exactly-as-ChatGPT'd version does work in-game (albeit an absolute pain to draw without a Focus).

That's kinda impressive. Both Rotation Gambit II and Retrospection are things I'd expect LLMs to struggle with.
