This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
This aligns with my vibes, although it appears I've looked into it a lot less than you have. The "nerd metaphysics" you describe seems to be what I encounter whenever I look into rationalist spaces, and it always puts me off. I think you should actually have a model of how the process scales.
For example, you have the AI-plays-Pokemon streams, which are the most visible agentic application of AI that's readily available. You can look at the tools they use as crutches, and imagine how those could be filled in with more AI. That basically looks like the AI writing and updating code that it executes to accomplish its goals. I'd like to see more of that, to see how well it works. But from what I've seen so far, it takes a lot of time to process anything, so anything complicated will just take ages. And as for knowing whether the generated code is actually working, hallucination seems like a real challenge. So it seems like serious breakthroughs are needed before it can do agentic coding reliably and quickly without human intervention.
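To make that concrete, here's a minimal sketch of the write-code-and-execute loop I have in mind. `llm_complete` is a hypothetical stand-in for whatever model API you'd use; none of this is any actual stream's setup:

```python
import subprocess
import tempfile

def llm_complete(prompt: str) -> str:
    """Hypothetical model call; wire up your provider of choice here."""
    raise NotImplementedError

def agent_step(goal: str, history: list[str]) -> str:
    """One slow round-trip: ask for a tool script, run it, return the output."""
    prompt = (
        f"Goal: {goal}\n"
        "Previous results:\n" + "\n".join(history) + "\n"
        "Write a Python script that makes progress toward the goal."
    )
    code = llm_complete(prompt)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # Knowing whether the generated code actually works is the hard part:
    # capture stdout and stderr so the next turn can see what happened.
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

history: list[str] = []
for _ in range(10):  # each iteration is one expensive model round-trip
    history.append(agent_step("get through Mt. Moon", history))
```

Even in this toy form you can see where the time goes: every iteration waits on the model, and inspecting the script's output is the only guard against hallucinated code.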
I actually have a separate piece on Claude Plays Pokemon. Substack is here, Motte discussion is here.
In short, anyone who bothered watching AI play Pokemon came out strongly doubting that AGI was right around the corner. It made so many elementary mistakes with basic navigation, got stuck in loops, took ages to do much of anything, etc. It was also reading from RAM, which humans obviously can't do, but I was willing to put up with that since it was only pulling relatively minor details that way. But then someone made a Gemini agent play Pokemon, and they took the precedent of the Claude version inspecting RAM as license to cheat much more egregiously. It "beat" Pokemon a few weeks ago, but the benchmark has been so corrupted that the result is functionally meaningless.
I gave it a read, and yeah, it's a pretty accurate summary. But I don't agree that the Gemini version is meaningless, and I don't think the limitations you suggest in your post would have made the Claude test better. We now have a pretty good idea of the varying levels of crutches needed to go from useless (none) to getting through some of the game to beating it. Now we can ask what it would take for an LLM to not be useless without human assistance.
In my mind, it basically needs to write code, because the flaws look fundamental to me: they're predictable across LLMs, and versions of these issues go back years. The LLM has to write programs to help it process the images better, do pathfinding, and store memory. In that sense it would be building something to understand game state from reading the screen, not that different from the current RAM checking.
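As a toy example of the pathfinding piece, assuming the screen has already been parsed into a walkable/blocked tile grid (the parsing being the hard part I'm hand-waving), plain breadth-first search is the kind of tool the model could write for itself:

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Breadth-first search over a tile grid; returns a list of (row, col)
    steps from start to goal, or None if the goal is unreachable."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cur
                queue.append((nr, nc))
    return None

# 0 = walkable, 1 = blocked: the kind of map an image-parsing step would emit.
grid = [[0, 0, 1],
        [1, 0, 1],
        [1, 0, 0]]
print(bfs_path(grid, (0, 0), (2, 2)))  # [(0,0), (0,1), (1,1), (2,1), (2,2)]
```

BFS is enough here because every tile costs the same to cross; the point is that a thirty-line deterministic tool solves, instantly and reliably, exactly the navigation task the streams showed LLMs failing at.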
It then needs some structure that vastly improves its ability to make decisions based on the various state. I'd imagine multiple LLM contexts in charge of different things, with some method of hallucination testing and reduction.
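Here's a rough, purely illustrative sketch of that division of labor, with separate planner, navigator, and verifier contexts; `llm_complete` is again a hypothetical model call, not a real API:

```python
def llm_complete(role: str, messages: list[str]) -> str:
    """Hypothetical model call; one conversation per role."""
    raise NotImplementedError

class Agent:
    def __init__(self):
        # Each context keeps its own history, so the planner's long-term
        # goals don't crowd out the navigator's short-term details.
        self.contexts = {"planner": [], "navigator": [], "verifier": []}

    def ask(self, role: str, msg: str) -> str:
        self.contexts[role].append(msg)
        reply = llm_complete(role, self.contexts[role])
        self.contexts[role].append(reply)
        return reply

    def step(self, observation: str) -> str:
        goal = self.ask("planner", f"State: {observation}. Next goal?")
        action = self.ask("navigator", f"Goal: {goal}. State: {observation}. Action?")
        # Crude hallucination check: a separate context judges whether the
        # proposed action is consistent with what's actually on screen.
        verdict = self.ask(
            "verifier", f"Does action '{action}' make sense given: {observation}?"
        )
        return action if "yes" in verdict.lower() else "wait"
```

The point of the separate verifier context is that it never sees the navigator's reasoning, only its claim, so a confabulated justification can't talk it into agreeing.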
And it has to do all of this without being slow as hell, which is the main thing I hope improved models can help with. I'd like to see any of the current Twitch tests start taking baby steps toward some of these goals, now that we've gotten the crutch runs out of the way. It's odd to me that the Claude one got abandoned. It feels like this is something the researchers could be taking more seriously, and it makes me wonder whether the important people in the room are actually taking steps toward AI agency, or whether they just assume a better model will give it to them for free.