This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
Claude AI playing Pokemon shows AGI is still a long ways off
(Read this on Substack for some funny pictures)
Evaluating AI is hard. One of the big goals of AI is to create something that could functionally act like a human -- this is commonly known as "Artificial General Intelligence" (AGI). The problem with testing AIs is that their intelligence is often "spiky", i.e. really good in some areas but really bad in others, so any single test is likely to be woefully inadequate. Computers have always been very good at math, and even something as simple as a calculator can easily trounce humans at basic arithmetic. This has been true for decades, if not over a century. But calculators obviously aren't AGI. They can do one thing at a superhuman level, but are useless for practically anything else.
LLMs like ChatGPT and Claude are more like calculators than AI hype-meisters would like to let on. When they burst onto the scene in late 2022, they certainly seemed impressively general. You could ask them a question on almost any topic, and they'd usually give a coherent answer so long as you excused the occasional hallucinations. They also performed quite well on human measurements of intelligence, such as college-level exams, the SAT, and IQ tests. If LLMs could do well on the definitive tests of human intelligence, then certainly AGI was only months or even weeks away, right? The problem is that LLMs are still missing quite a lot of things that would make them practically useful for most tasks. In the words of Microsoft's CEO, they're "generating basically no value". There's some controversy over whether the relative lack of current applications is a short-term problem that will be solved soon, or if it's indicative of larger issues. Claude's performance playing Pokemon Red points quite heavily toward the latter explanation.
First, the glass-half-full view: the fact that Claude can play Pokemon at all is highly impressive at baseline. If we were just looking for any computer algorithm to play games, TAS speedruns have existed for a while, but that would be missing the point. While an AI playing a children's video game isn't exactly Kasparov vs Deep Blue, the fact that it's built off of something as general as an LLM is remarkable. It has rudimentary vision to see the screen and respond to events as they come into the field of view. It interacts with the game through a bespoke button-entering system built by the developer. It interprets a coordinate system to plan moves to different squares on the screen. It accomplishes basic tasks like battling and rudimentary navigation in ways that are vastly superior to random noise; it's much better than monkeys randomly plugging away at typewriters. This diagram by the dev shows how it works.
I have a few critiques that likely aren't feasible for a single developer to address, but that would still be good to keep in mind when/if capabilities improve. The goal should be to play the game like a human player would: it shouldn't be able to read directly from the RAM, and should instead rely only on what it can see on the screen. It also shouldn't need a bespoke button-entering system at all, and should instead press buttons using something like ChatGPT's Operator. There should be absolutely no game-specific hints given, and ideally its training data wouldn't include Pokemon Red (or anything Pokemon-related at all). That said, this current iteration is still a major step forward.
Oh God it’s so bad
Now the glass-half-empty view: It sucks. It's decent enough at battles, which have very few degrees of freedom, but it's enormously buffoonish at nearly everything else. There's an absurdist comedy element to an uncanny-valley AI that's good enough to seem like it's almost playing the game as a human would, but bad enough that it comes across as severely psychotic and nonsensical, in ways similar to early LLMs writing goofy Harry Potter fanfiction. Some of the best moments include it erroneously thinking it was stuck and writing a letter to Anthropic employees demanding they reset the game, and it developing an innovative new tactic for faster navigation called the "blackout strategy", where it tries to commit suicide as quickly as possible to reset to the most recently visited Pokemon center… and then repeats this in the same spot over and over again. This insanity also infects its moment-to-moment thinking, from hallucinating that any rock could be a Geodude in disguise (pictured at the top of this article), to thinking it could judge a Jigglypuff's level solely by its girth.
All these attempts are streamed on Twitch, and they could make for hilarious viewing if it weren't so gosh darn slow. There's a big lag between actions as the agent does each round of thinking. Something as simple as running from a random encounter, which would take a human no more than a few seconds, can last up to a full minute as Claude slowly thinks about pressing 'A' for the introductory text "A wild Zubat has appeared!", then thinks again about moving its cursor to the right, then thinks again about moving its cursor down, and then thinks one last time about pressing 'A' again to run from the battle. Even in the best of times, everything is covered in molasses. The most likely reaction to watching this is boredom once the novelty wears off after a few minutes. As such, the best way to "watch" this insanity is on a second monitor, or to just hear the good parts second-hand from people who watched it themselves.
Is there an AI that can watch dozens of hours of boring footage and only pick out the funny parts?
By far the worst aspect, though, is Claude's inability to navigate. It gets trapped in loops very easily, and is needlessly distracted by any objects it sees. The worst example so far has been its time in Mount Moon, a fairly (though not entirely) straightforward level that most kids probably beat in 15-30 minutes. Claude got trapped there for literal days, with its typical loop being: go down a ladder, wander around a bit, find the ladder again, go back up, wander around a bit, find the ladder, go back down, repeat. It's like watching a sitcom about a man with a 7-second memory.
There’s supposed to be a second AI (Critique Claude) to help evaluate actions from time to time, but it’s mostly useless, since LLMs are inherently yes-men: when talking to the very deluded and hyperfixated main Claude, it just goes along with it. Even when Critique Claude disagrees, main Claude acts like a belligerent drunk and simply ignores it.
In the latest iteration, the dev created a tool for storing long-term memories. I’m guessing the hope was that Claude would write down that certain ladders were dead-ends and should thus be ignored, which would have gone a long way towards fixing the navigation issues. However, it appears to have backfired: while Claude does indeed record some information about dead-ends, it has a tendency to delete those entries fairly quickly, which renders them pointless. Worse, the memory tool seems to have made Claude remember that its “blackout strategy” “succeeded” in getting out of Mount Moon, prompting it to double, triple, and quadruple down on it. I’m sure there’s some dark metaphor in the development of long-term memory leading to Claude chaining suicides.
What does this mean for AGI predictions?
Watching this trainwreck has been one of the most lucid negative advertisements for LLMs I’ve seen. A lot of the perceptions about when AGI might arrive are based on the vibes people get by watching what AI can do. LLMs can seem genuinely godlike when they spin up a full stack web app in <15 seconds, but the vibes come crashing back down to Earth when people see Claude bumbling around in circles for days in a simplistic video game made for children.
The “strawberry” test was a frequent embarrassment for early LLMs, which often claimed the word contained only 2 R’s. The problem has mostly been fixed by now, but there are questions to be asked about how. Was it resolved by LLMs genuinely becoming smarter, or did the people making LLMs cheat a bit by hardcoding special logic for these types of questions? If it’s the latter, then problems would tend to arise when the AI encounters the issue in a novel format, as Gary Marcus recently showed. But of course, the obvious followup question is “does this matter?” So what if LLMs can’t do the extremely specific task of counting letters, if they can do almost everything else? It might be indicative of some greater issue… or it might not.
But it’s much harder to argue that game playing is an irrelevant metric. Pokemon Red is a pretty generous test for many reasons: there’s no punishment for long delays between actions; it’s a children’s game, so it’s not very hard; and the creator is using a mod for coloring to make the screen easier to see (this is why Jigglypuff’s eyes look a bit screwy in the picture above). Yet despite all this, Claude still sucks. If it can’t even play a basic game, how could anyone expect LLMs to do regular office work for, say, $20,000 a month? The long-term memory and planning just isn’t there yet, and that’s not exactly a trivial problem to solve.
It’s possible that Claude will beat Pokemon this year, probably through some combination of brute force and overfitting knowledge to the game at hand. However, I find it fairly unlikely (<50% chance) that by the end of 2025 there will be an AI that can 1) play Pokemon at the level of a human child, i.e. beat the game, handle basic navigation, and not lag for long stretches between trivial actions, and 2) be genuinely general (putting the G in AGI) rather than just overfit to Pokemon, with evidence coming from achieving similar results in similar games like Fire Emblem, Dragon Quest, early Final Fantasy titles, or whatever else.
LLMs are pretty good right now at a narrow slice of tasks, but they’re missing a big chunk of the human brain that would allow them to accomplish most tasks. Perhaps this can be remedied through additional “scaffolding”, and I expect “scaffolding” of various types to be a big part of what gives AI more mainstream appeal over the next few years (think stuff like Deep Research). Perhaps scaffolding alone is insufficient and we need a much bigger breakthrough to make AI reasonably agentic. In any case, there will probably be a generic game-playing AI at some point in the next decade… just don’t expect it to be done by the end of the year. This is the type of thing that will take some time to play out.
I don't understand people who can see the current state of AI and the trendline and not at least see where things are headed unless we break trend. You guys know that a few years ago our state of the art could hardly complete coherent paragraphs, right? Chain-of-thought models are literally a couple months old. How could you possibly be this confident we've hit a stumbling block because one developer's somewhat janky implementation has hiccups? And one of the criticisms is speed, which is something you can just throw more compute at and scale linearly?
I'm suspicious of these kinds of extrapolation arguments. Advances aren't magic; people have to find and implement them. Sometimes you just hit a wall. So far most of what we've been doing is milking transformers. That's a great discovery, but I think this playthrough is strong evidence that transformers alone are not enough to make a real general intelligence.
One of the reasons hype is so strong is that these models are optimized to produce plausible, intelligent-sounding bullshit. (That's not to say they aren't useful. Often the best way to bullshit intelligence is to say true things.) If you're used to seeing LLMs perform at small, one-shot tasks and riddles, you might overestimate their intelligence.
You have to interact with a model on a long-form task to see its limitations. Right now, the people doing that are largely programmers and /g/ gooners, and those are the people most likely to have realistic appraisals of where we are. But this Pokemon thing is an entertaining way to show the layman how dumb these models can be. It's even better at this because LLMs tend to stealthily "absorb" intelligence from humans, getting gently steered by hints humans leave in their prompts. But this game forces the model to rely on its own output, leading to hilarious ideas like the blackout strategy.
To put the obvious counterpoint out there: Claude was never actually designed to play video games, and it has gotten decent at doing so in a couple of months. The drawbacks are still there: navigation sucks, it’s kinda slow, it likes to suicide, etc. But even then, the system was never designed to play games in the first place.
To me, this is a success, as it demonstrates using information in its memory to make an informed decision about outcomes. It can meet a monster, read its name, know its stats, and think about whether its own stats are good enough to take it on. This is applied knowledge, and applied knowledge is one of the hallmarks of general understanding. If I can only apply a procedure when told to do so, I don’t understand it. If I can use that procedure in the context of solving a problem, I do understand it. Claude at minimum understands the meaning of the stats it sees (level, HP, stamina, strength, etc.), understands that the ratio between the monster’s stats and its own is important, and understands that if the monster has better stats than the player, the player will lose. That’s thinking strategically based on the information at hand.
Claude didn't "get decent at playing" games in a couple of months. A human wrote a scaffold to let a very expensive text prediction model, along with a vision model, attempt to play a video game. A human constructed a memory system and knowledge transfer system, and wired up ways for the model to influence the emulator, read relevant RAM states, wedge all that stuff into its prompt, etc. So far this is mostly a construct of human engineering, which still collapses the moment it gets left to its own devices.
When you say it's "understanding" and "thinking strategically", what you really mean is that it's generating plausible-looking text that, in the small, resembles human reasoning. That's what these models are designed to do. But if you hide the text window and judge it by how it's behaving, how intelligent does it look, really? This is what makes it so funny: the model is slowly blundering around in dumb loops while producing volumes of eloquent, optimistic narrative about its plans and how much progress it's making.
I'm not saying there isn't something there, but we live in a world where it's claimed that programmers will be obsolete in 2 years, people are fretting about superintelligent AI killing us all, OpenAI is planning to rent "PhD level" AI agent "employees" to companies for large sums, etc. Maybe this is a sign that we should back up a bit.
This is something I don't understand. The LLM generates text that goes in the 'thinking' box, which purports to explain its 'thought' process. Why does anybody take that as actually granting insight into anything? Isn't that just the LLM doing the same thing the LLM does all the time by default, i.e. make up text to fill a prompt? Surely it's just as much meaningless gobbledygook as all text an LLM produces? I would expect that box to faithfully explain what's actually going on in the model just as much as an LLM is able to faithfully describe the outside world, i.e., not at all.
No one is under the delusion that the "thinking" box reflects the actual underlying process by which the LLM generates the text that does the actual decision making. This is just like humans, where no one actually expects that the internal conscious thoughts that someone uses to think through some decision before arriving at a conclusion reflects the actual underlying process by which the human makes the decision. The "thinking" box is the equivalent of that conscious thought process that a human goes through before coming to the decision, and in both, the text there appears to influence the final decision.
It seems to me that there are at least three separate things here, if we consider the human example.
1. The actual cause of a human's decision. This is often unconscious and not accurately known even by the person making the decision.
2. The reasons a person will tell you that they made a decision, whether before or after the decision itself. This is often an explanation or rationalisation for an action made after the decision was taken, for invisible type-1 reasons.
3. The action the person takes.
I would find it entirely unsurprising if you ran a study with two groups, asking one simply to make a decision, and asking the other to explain the process by which they would make the decision and then subsequently make it, and the two groups showed different decisions. Asking someone to reflect on a decision before they make it will influence their behaviour.
In the case of the LLMs with the thought boxes, my understanding was that we are interested in the LLM's 1, i.e. the actual reasons it takes particular actions, but that the box, at best, can only give you 2. (And just like a human's 2, the LLM's stated thought process is only unreliably connected, at best, to the actual decision-making process.)
I thought that what we were interested in was 1 - we want to know the real process so that we can shape or modify it to suit our needs. So I'm confused as to why, it seems to me, some commentators behave as if the thought box tells us anything relevant.
I think all 3 are interesting in different ways, but in any case, I don't perceive commenters as exploring 1. Do you have any examples?
If we were talking about humans, for instance, we might say, "Joe used XYZ Pokemon against ABC Pokemon because he noticed that ABC has weakness to water, and XYZ has a water attack." This might also be what consciously went through Joe's mind before he pressed the buttons to make that happen. All that would be constrained entirely to 2. In order to get to 1, we'd need to discuss the physics of the neurons inside Joe's brain and how they were stimulated by the signals from his retina that were stimulated by the photons coming out of the computer screen which come from the pixels that represent Pokemons XYZ and ABC, etc. For an LLM, the analog would be... something to do with the weights in the model and the algorithms used to predict the next word based on the previous words (I don't know enough about how the models work beneath the hood to get deeper than that).
In both humans and LLMs, 1 would be more precise and accurate in a real sense, and 2 would be mostly ad hoc justifications. But 2 would still be interesting and also useful for predicting behavior.
The reasoning is produced organically by a reinforcement learning process to make the LLM perform well on problems (mostly maths and textbook questions). The model is rewarded for producing reasoning that tends to produce correct answers. At the very least, that suggests the contents of the thinking box are relevant to behaviour.
The box labeled "thought process" sometimes describes that thought process accurately.
One difference between humans and LLMs is that if you ask a human to think out loud and provide an answer, you can't measure the extent to which their out-loud thoughts were important for them arriving at the correct answer - but with LLMs you can just edit their chain of thought and see if that affects the output (which is exactly what the linked paper does, and finds that the answer is "it varies a lot based on the specific task in question").
I'm actually quite skeptical that there is anything that can be meaningfully described as a thought process or reasoning going on when an LLM responds to a problem like this. It may well be that if an LLM produces a step-by-step summary of how to go about answering a question, it then produces a better answer to that question, but I don't understand how you can draw any conclusions about the LLM's 'reasoning', to the extent that such a thing even exists, from that summary.
Or, well, I presume that the point of the CoT summary is to give a indicative look at the process by which the LLM developed a different piece of content. Let's set aside words like 'thought' or 'reasoning' entirely and just talk about systems and processes. My confusion is that I don't see any mechanism by which the CoT summary would correspond to the subsequent process.
It seems to me that what the paper does is ask the LLM to produce a step-by-step set of instructions, and then ask the LLM to iterate on those instructions. LLMs can do that, and obviously if you change the set of instructions, the iteration on the instructions is different. That's perfectly intuitive. But how does any of that correspond to, well, the idea of thoughts in the LLM's mind? Or the process by which it produces text? How is that different to the rather banal observation that if you change the input, you change the output?
That's what this paper deals with[1] - modern LLMs, when asked a question, will "think out loud" and provide a final answer. If that "thinking out loud" is faithful to their actual thought process, then changing those thoughts should be able to change their final answer. So what the researchers did is they asked an LLM a question like
The LLM then "thinks out loud" to generate an answer
The researchers then modify the reasoning and feed the input with altered reasoning back into the LLM to complete to see if the final answer changes, so e.g.
And the answer is that changing the reasoning sometimes changes the final answer, while other times the LLM appears to generate a chain of supposed reasoning, but changing that reasoning doesn't change the final answer, so it's pretty clearly not actually using it. Specifically, LLMs seem to mostly ignore their reasoning traces and output correct answers even when their reasoning is wrong for ARC (easy and hard), OpenBookQA, and maybe MMLU, while introducing mistakes in the reasoning messes up the answers for AQuA and LogiQA, and maybe HellaSwag[2].
[1]: It actually does four things - introduce a mistake in the chain of thought (CoT), truncate the CoT, add filler tokens into the CoT, paraphrase the CoT - but "mistakes in the CoT" is the one I find interesting here
[2]: someone should do one of those "data science SaaS product or LLM benchmark" challenges like the old pokemon or big data one.
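The perturbation protocol above can be sketched as a toy loop. Everything here is invented for illustration: `model` is a deterministic stand-in for an LLM API (not a real one), and the question/trace format is made up purely to show the corrupt-one-step-and-recheck idea from the paper.

```python
# Toy sketch of the CoT-faithfulness probe: generate a reasoning trace,
# corrupt one step, and check whether the final answer moves.

def model(prompt: str) -> str:
    """Stand-in 'LLM': if the prompt contains a reasoning trace, it trusts
    whatever subtotal appears there; otherwise it emits the trace itself."""
    for line in reversed(prompt.splitlines()):
        if line.startswith("So the subtotal is"):
            subtotal = int(line.split()[-1])
            return f"Answer: {subtotal + 10}"
    return "7 * 6 = 42\nSo the subtotal is 42"

def probe(question: str, mistake=None) -> str:
    trace = model(question)                       # step 1: get the CoT
    if mistake is not None:                       # step 2: corrupt one step
        trace = trace.replace("subtotal is 42", f"subtotal is {mistake}")
    return model(question + "\n" + trace)         # step 3: re-ask with trace

print(probe("What is 7*6 plus 10?"))              # -> Answer: 52
print(probe("What is 7*6 plus 10?", "100"))       # -> Answer: 110
```

Here the corrupted answer differs from the faithful one, so this stand-in "model" really is using its trace; a model whose answer stayed at 52 either way would be producing unfaithful reasoning, which is the distinction the paper measures.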
What I mean by thinking strategically is exactly what makes the thing interesting. It’s not just creating plausible texts; it understands how the game works. It understands that losing HP means losing a life, and thus that if the enemy’s HP and STR are too high for it to handle at a given level, it should avoid the fight. In other words, it can contextualize that information and use it not only to understand, but to work toward a goal.
I’m not saying this is the highest standard. It’s about what a 3-4 year old can understand about a game of that complexity. But as a proof of concept, I think it shows that AI can reason a bit. Give this thing 10 years and a decent research budget, and I think it could probably take on something like Morrowind. It’s slow, but given what it can do now, I’m pretty optimistic that an AI can make data-driven decisions in a fairly short timeframe.
What makes things interesting is that the line between "creating plausible texts" and "understanding" is so fuzzy. For example, the sentence
will be much more plausible if the continuation is a number smaller than 125. "138" would be unlikely to be found in its training set. So in that sense, yes, it understands that attacks cause it to lose HP, that a Pokemon losing HP causes it to faint, etc. However, "work towards a goal" is where this seems to break down. These bits of disconnected knowledge have difficulty coming together into coherent behavior or goal-chasing. Instead you get something distinctly alien, which I've heard called "token pachinko". A model sampling from a distribution that encodes intelligence, but without the underlying mind and agency behind it. I honestly don't know if I'd call it reasoning or not.
It is very interesting, and I suspect that with no constraints on model size or data, you could get indistinguishable-from-intelligent behavior out of these models. But in practice, this is probably going to be seen as horrendously and impractically inefficient, once we figure out how actual reasoning works. Personally, I doubt ten years with this approach is going to get to AGI, and in fact, it looks like these models have been hitting a wall for a while now.
I think at some point we’re talking about angels dancing on pins. Thought and thinking, as qualia that other beings experience, are probably going to be hard to settle. I would suggest that being able to create a heuristic based on the information available and the known laws of the universe in question constitutes at least an understanding of what that information means. Recognizing that fighting a creature with higher STR and HP stats than your own will end badly is a pretty good child’s understanding of the situation: it’s stronger, therefore I will likely faint if I fight that monster. Having the goal of “not wanting to faint” thus produces the decision heuristic “if the monster’s statistics are better than yours, or your HP is too low, run away.” That is, more or less, making a decision.
A kid who knows that falling leads to skinned knees, and that falling happens when you’re up off the ground, is doing the same sort of reasoning: I don’t want to skin my knees, so I’m not climbing the tree.
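The "run away" heuristic described here is simple enough to write down directly. A minimal sketch, with made-up stat fields (hp, strength) and a made-up HP threshold; this is not how Claude's scaffold actually represents battle state:

```python
from dataclasses import dataclass

@dataclass
class Mon:
    hp: int        # current hit points
    strength: int  # attack stat

def should_flee(mine: Mon, enemy: Mon,
                low_hp_fraction: float = 0.25, max_hp: int = 100) -> bool:
    """Run if the enemy outclasses us on both stats, or our HP is critical."""
    outclassed = enemy.strength > mine.strength and enemy.hp > mine.hp
    critical = mine.hp < low_hp_fraction * max_hp
    return outclassed or critical

print(should_flee(Mon(80, 30), Mon(40, 20)))  # healthy and stronger -> False
print(should_flee(Mon(15, 30), Mon(40, 20)))  # critical HP -> True
```

The point of writing it out is to show how little machinery a child-level decision rule needs: the hard part isn't the rule itself, but reliably extracting the inputs and remembering past outcomes.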
That's true, but if that leads to running from every battle, then you won't level up. Even little kids will realize that they're doing something wrong if they're constantly running. That's what I mean when I say it has a lot of disconnected knowledge, but it can't put it together to seek a goal.
One could argue that's an issue with its limited memory, possibly a fault of the scaffold injecting too much noise into the prompt. But I think a human with bad memory could do better, given tools like Claude has. I think the problem might be that all that knowledge is distilled from humans. The strategies it sees are adapted for humans with their long-term memory, spatial reasoning, etc. Not for an LLM with its limitations. And it can't learn or adapt, either, so it's doomed to fail, over and over.
I really think it will take something new to get past this. RL-based approaches might be promising. Even humans can't just learn by reading, they need to apply the knowledge for themselves, solve problems, fail and try again. But success in that area may be a long way away, and we don't know if the LLM approach of training on human data will ever get us to real intelligence. My suspicion is that if you only distill from humans, you'll be tethered to humans forever. That's probably a good thing from the safetyist perspective, though.