
Culture War Roundup for the week of February 17, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Grok 3 just came out, and early reports say it’s shattering rankings.

Now there is always hype around these sorts of releases, but my understanding of the architecture of the compute cluster for Grok 3 makes me think there may be something to these claims. One of the exciting and interesting revelations is that it tends to perform extremely well across a broad range of applications, seemingly showing that if we just throw more compute at an LLM, it will tend to get better in a general way. Not sure what this means for more specifically trained models.

One of the most exciting things to me is that Grok 3 voice allegedly understands tone, pacing, and intention in conversations. I loved OpenAI’s voice assistant until it cut me off every time I paused for more than a second. If Grok 3 is truly the first conversational AI, it could be a game changer.

I’m also curious how it compares to DeepSeek, if anyone knows more than I do.

Lmsys ranking is not nothing, but we are getting to the point where most models are "good enough" from the perspective of the average lmsys rater, and most of the interesting differentiation between models is going on in benchmarks that test specialized skills and knowledge that are not necessarily common among lmsys raters.

I couldn't be bothered to click through the tweets (I don't have a Twitter account) so I don't know if they published other benchmarks too.

According to Andrej Karpathy:

As far as a quick vibe check over 2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch 1 year ago, this timescale to state of the art territory is unprecedented.

The architecture is extremely well known. The days when advancements in the space required genuine intellectual titans making leaps into the dark are over; it’s all relatively run-of-the-mill iteration and more compute / bitter lesson now. This is pretty standard for great inventions: a few true geniuses, then a deluge of very intelligent people making rapid, incremental progress as the money and interest pour in. If you hired the best aircraft engineers in 1910 and put out a very impressive plane prototype in 1911, that would be no earth-shattering achievement. If you did it in 1901, we’d still be talking about you.