
Culture War Roundup for the week of February 17, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Zeno's AGI.

For a long time, people considered the Turing Test the gold standard for AI. Later, better benchmarks were developed, but for most laypeople with a passing familiarity with AI, the Turing Test meant something. And so it was a surprise that when LLMs flew past the Turing Test in 2022 or 2023, there weren't trumpets and parades. It just sort of happened, and people moved on.

I wonder if the same will happen with AGI. To quote hype-man Sam Altman:

trying Grok 3 has been much more of a "feel the AGI" moment among high-taste testers than i expected!

Okay, actually he said that about GPT-4.5, but you get the point. The last 6 months have seen monumental improvements in LLMs, with DeepSeek making them much more efficient and xAI proving that the scaling hypothesis still has room to run.

Given time, AI has reliably been able to beat any benchmark that we throw at it (remember the Winograd schema?). I think that if, 10 years ago, someone had said that AI could solve PhD-level math problems, we'd have said AGI had already arrived. But it hasn't. So what ungameable benchmarks remain?

  1. AGI should lead to massive increases in GDP. We haven't seen productivity even budge upwards despite dumping trillions into AI. Will this change? When?

  2. AI discoveries with minimal human intervention. If a genius-level human had the breadth of knowledge that LLMs do, they would no doubt make all sorts of novel connections. To date, no AI has done so.

What stands in the way?

It seems like context windows might be the answer. For example, what if we wanted to make novel discoveries by prompting an AI? We might prompt a chain-of-reasoning AI to try to draw connections between disparate fields and then stop when it finds something novel. But with current technology, it would fill up the context window almost immediately and then start to go off the rails.

We stand at a moment in history where AI advances at a remarkable pace and yet is only marginally useful, basically just a better Google/Stack Overflow. It is as smart as a genius-level human, far more knowledgeable, and yet also remarkably stupid in unpredictable ways.

Are we just one more advance away from AGI? It's starting to feel like it. But I also wouldn't be surprised if life in 2030 is much the same as it is in 2025.

From 5 hours ago: A complex problem that took microbiologists a decade to get to the bottom of has been solved in just two days by a new artificial intelligence (AI) tool.

Professor José R Penadés and his team at Imperial College London had spent years working out and proving why some superbugs are immune to antibiotics.

He gave "co-scientist" - a tool made by Google - a short prompt asking it about the core problem he had been investigating and it reached the same conclusion in 48 hours.

He told the BBC of his shock when he found what it had done, given his research was not published so could not have been found by the AI system in the public domain.

Prof Penadés said the tool had in fact done more than successfully replicate his research.

"It's not just that the top hypothesis they provide was the right one," he said.

"It's that they provide another four, and all of them made sense.

"And for one of them, we never thought about it, and we're now working on that."

The researchers have been trying to find out how some superbugs - dangerous germs that are resistant to antibiotics - get created. Their hypothesis is that the superbugs can form a tail from different viruses which allows them to spread between species. Prof Penadés likened it to the superbugs having "keys" which enabled them to move from home to home, or host species to host species. Critically, this hypothesis was unique to the research team and had not been published anywhere else. Nobody in the team had shared their findings.

Slowly it's becoming clear that ASI is already with us. Imagine if you handed someone from 100 years ago a smartphone or modern networking technology. Even after explaining how it worked, it would take them some time to figure out what to do with it. It took a long time after we invented wheels to figure out what to do with them, for example.

The technology to automate 80-90% of white collar labor already exists, for example, with current-generation LLMs. It's just about interfaces and layers and regulation and safeguards now. All very important, of course, but it's not a fundamental technical challenge.

I'm skeptical this is actually how it went down. Why would it take 2 days to come up with the hypotheses? I'm not aware of any LLM that thinks that long, which to me implies the scientist was working with it throughout and probably asking leading questions.

It looks like co-scientist is one of the new "tree searching agent" models: you give it a problem and it will spin off other LLMs to look into different aspects of the problem, then prune different decision trees and go further with subsequent spinoff LLMs based on what the initial ones report back, recursing until the original problem is solved. This is the strategy that was used by OpenAI in their "high-compute o3" model to rank #175 vs humans on Codeforces (competitive coding problems), pass the GPQA (Google-proof Graduate-level Q&A), and score 88% on ARC-AGI (vs. human STEM graduates' 100%). The recursive thought process is expensive: the previous link cites a compute cost of $1000 to $2000 per problem for high-compute o3, so these are systems that compute on each problem for much longer than the 35 seconds available to consumer ($20/month) users of o1.
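
To make the shape of that strategy concrete, here is a toy sketch of a branch, score, prune, repeat loop. It is not how co-scientist or o3 actually work internally (neither is described in detail here); every name in it (`tree_search`, `propose`, `score`, `is_solved`, `beam_width`) is hypothetical, and the "sub-models" are plain string functions standing in for what would really be LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One candidate line of reasoning in the search tree."""
    text: str
    score: float
    children: List["Node"] = field(default_factory=list)


def tree_search(
    problem: str,
    propose: Callable[[str], List[str]],   # stand-in for spinning off sub-models
    score: Callable[[str], float],         # stand-in for judging what they report back
    is_solved: Callable[[str], bool],
    beam_width: int = 2,
    max_depth: int = 3,
) -> Node:
    """Expand the best branches, prune the rest, and repeat until solved."""
    root = Node(problem, score(problem))
    frontier = [root]
    for _ in range(max_depth):
        candidates = []
        for node in frontier:
            for text in propose(node.text):
                child = Node(text, score(text))
                node.children.append(child)
                if is_solved(text):
                    return child
                candidates.append(child)
        if not candidates:
            break
        # Prune: keep only the most promising branches for the next round.
        candidates.sort(key=lambda n: n.score, reverse=True)
        frontier = candidates[:beam_width]
    # Out of depth budget: return the best partial result found so far.
    return max(frontier, key=lambda n: n.score)


# Toy stand-ins: the "problem" is to spell TARGET one letter at a time.
TARGET = "abba"


def toy_propose(text: str) -> List[str]:
    return [text + "a", text + "b"]


def toy_score(text: str) -> float:
    # Length of the prefix of `text` that matches TARGET.
    matched = 0
    for got, want in zip(text, TARGET):
        if got != want:
            break
        matched += 1
    return float(matched)


def toy_is_solved(text: str) -> bool:
    return text == TARGET


best = tree_search("", toy_propose, toy_score, toy_is_solved, beam_width=2, max_depth=4)
print(best.text)
```

The cost point falls out of the structure: each round makes up to `beam_width` times the branching factor in model calls, so deeper and wider searches get expensive fast when each call is a full LLM run.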

Thanks, that's good information. Still, I don't believe it would actually take two days straight to work through the problem, which indicates follow-up questions etc.

Doesn't sound like it.

He gave "co-scientist" - a tool made by Google - a short prompt asking it about the core problem he had been investigating and it reached the same conclusion in 48 hours.

It's possible that, since it runs on Google servers, the request was somehow queued up or prepared, at least in the testing phase that they appear to have been invited to.

The thought occurred to me as well. Between 'no one else has our data!' and 'we didn't realize someone else might have our data,' I tend to default to the latter.

That's not relevant to what 2rafa said. Her point is that co-scientist may have taken two days to serve earlier queries and only then gotten to this query.