
Culture War Roundup for the week of December 4, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Google Gemini just launched

In other words, GPT-4 has just been beaten. About time, I'd say; I'm getting used to the pace of progress in AI being blistering, and it was threatening to slow down to just mild-rash levels.

However, both my hands-on time with it and the official benchmarks Google released suggest it's a minor, incremental improvement, one that doesn't match the drastic leap that GPT-4 represented over GPT-3 or 3.5. [For clarity: I, like the rest of you, can only use Gemini Pro, the second-best model.]

Which is fine, because for a while now people have been lambasting Google/DeepMind for being too incompetent to ship, or at least to ship a competitive product, given how shitty Bard was when it launched, even after being upgraded once or twice.

However, Bard, now running the Gemini Pro model, seems to be roughly as good as paid GPT-4 on ChatGPT, or the free GPT-4 in Bing Copilot (previously Bing Chat). I have yet to spot any new use case it enables, in the sense that GPT-4 can reliably do tasks that had 3.5 flailing about in confusion or, worse, hallucinating incorrect answers, such as more involved questions in coding, medicine, and everything else really.

However, Google hasn't yet publicly released the best Gemini model, which is currently undergoing a process analogous to the one GPT-4 and Claude 2 went through, namely more RLHF, red-teaming, and safety testing. Pro is the next step down, but it seems pretty good to me, in the sense that I would happily use it as an alternative to GPT-4, even if I have no strong opinion on which is better.

There's also a Nano model, stripped down to run on mobile devices, now being used on the Pixel 8 Pro for a few tasks, potentially silencing the people who claimed its AI-specific computing components were a marketing gimmick, especially since the phone previously seemed to offload most AI tasks to the cloud.

Miscellaneous observations:

  1. Bard is fast as fuck compared to GPT-4, in terms of generation speed. It always was, but previously in the "I'm doing 2000 calculations a second in my head, and they're all wrong" sense. (GPT-4, at least before Turbo released, was always pretty slow compared to the competition. Far from unusable, but at the very least I read faster than it can write.)
  2. A quick search suggests all the models have a 32k-token context window, i.e. a working memory of roughly the last 25k words it has read and written. Good, if not remotely groundbreaking.
  3. Gemini's arrival heavily suggests OAI will ship GPT-5 soon, instead of being content to milk 4 while it ran rings around the competition.
  4. It's multimodal, but then again so was GPT-4 from the start, the capability was just cordoned off for a bit.
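On item 2, the token-to-word conversion is quick arithmetic. A common rule of thumb (my assumption here, not a figure Google published) is that English text averages roughly 0.75 words per token:

```python
# Rule-of-thumb conversion: English text averages ~0.75 words per token.
# (An assumed heuristic; the true ratio varies by tokenizer and text.)
WORDS_PER_TOKEN = 0.75

def approx_words(context_tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(context_tokens * WORDS_PER_TOKEN)

print(approx_words(32_000))  # 24000 -- the ballpark behind "about 25k words"
```

So a 32k-token window works out to roughly 24,000 words, close enough to the 25k figure above.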

Since I don't think the next generation (or two) of models after GPT-4 are an existential threat, I'm happy to see them finally arriving. There really isn't much more needed before even the best of us are entirely obsolete, at least for cognitive labor; something as archaic as GPT-4 was already scoring at the 95th percentile on the USMLE, so I'm preparing to explore my competitive advantage in panhandling.*

*This is a joke. For now.

Footnotes to the footnotes:

People on Twitter are correctly pointing out that GPT-4 underwent further post-launch improvements in benchmark scores, some of them pushing it past Gemini's published scores.

Also, just to be clear, the version of Gemini you can use now is not the best one, which may or may not be a modest improvement over GPT-4. Some claim it's more comparable to 3.5, but I haven't used that in ages, not when Bing makes 4 free.*

*Footnote^3 It's probably closer to 3.5. I'm sticking with Bing.

Toe-notes-

So far, it seems that Gemini is "competitive" with GPT-4. It's better at multimodal tasks, but for most people that's a minor fraction of their typical use case. For text, it's somewhere from close to roughly on par.

You can almost feel the desperation of the DeepMind researchers to find any way to massage things so that they come out ahead of GPT-4: from the misleading graphs (an egregious example to be found in a reply), to applying different standards in their inter-model comparisons, such as 5-shot prompting for GPT-4 versus 32-shot chain-of-thought prompting for Gemini Ultra. At least the white paper doesn't outright lie; it just misleads and prevaricates.

The MMLU itself is also flawed, with 2-3 percent of the questions simply broken, so a 1 or 2% improvement in score is questionable, let alone reporting performance to multiple decimal places.

We don't see any comparisons to GPT-4 Turbo, but I don't hold that against them too hard; it came out only a few weeks back, perhaps not in time for them to finish their paper.

If you use the multimodal capabilities of Bard right now, it uses an older version that is pretty shit compared to GPT-4V or Bing.

Overall, the main benefit of Gemini's existence is that it shows Google isn't content to slumber indefinitely, and that it can be competitive; better late than never. I expect GPT-5 to spank Gemini Ultra, and to the extent the latter accelerates the release of the former, I'm for it.

Predictions:

GPT-5 before the end of 2024 - 90%

GPT-5 is superior to Gemini Ultra for most use cases, at the first point in time both coexist - 80%

A third competitor on par with either exists before 2025 - 60%

An OSS equivalent of GPT-4 comes out before 2025 - 70%

The public model is very unimpressive. The scores for the ultra model seem fine. In the end it’s irrelevant, Google can’t replace search with LLMs without compromising their central product and core business (for both technical reasons and because of rules on native advertising). They can try and will try to sell this to enterprise customers, but others have a head start and I think margin in the LLM game will be strongly limited by the fact that the top 3-4 models from Google/Meta/MS/Anthropic will all be interchangeable for most use.

I would say it's far from irrelevant: as much as replacing search with LLMs would be a net negative for Google, they don't have a choice, because the alternative, OAI/Microsoft making them redundant, is even worse.

They can weep and wail, but they're getting on the bandwagon too, the Porsche is running out of fuel.

They can jump on the wagon, but they're a walking corpse unless they can figure out how to serve ads in LLM results without breaking the rules or being useless to advertisers. And even if they figure that out, the underlying nature of LLMs as question-answering machines is a huge blow to their non-search ads business.

I do not disagree it's a big blow. But to ignore it is a bigger one.

but they’re a walking corpse unless they can figure out how to serve ads in LLM results without breaking the rules or being useless to advertisers

Bing does that, and I haven't heard of anyone filing a lawsuit against them.

Bing’s LLM ads aren’t worth much, the challenge is that the existence of the model itself invalidates much of the earlier advertising-driven directory approach.

The valuable thing would be the model recommending you a Samsung TV because they paid Google to mention them whenever someone asks which TV to buy. That’s illegal, in the US and elsewhere.

That’s illegal, in the US and elsewhere.

I’m gonna need a citation on this one … if this were true then it seems by definition the little ads atop my search results advertising — you guessed it, TVs — would also be illegal. Yet there they are.

They have to be labelled as ads. The model can’t just ‘happen’ to recommend you a Samsung TV; it has to give its regular answer, and then, if it mentions a Samsung TV (but there’s no guarantee it will, and whether it does can’t be based on a commercial relationship with Google), they can serve a banner ad for it next to the answer. But this is less lucrative because it’s less predictable: the advertiser has to hope the model organically recommends their products, OR accept that it won’t and serve their ads next to relevant prompts anyway, which is much less useful than the current dynamic, where serving ads under a ‘best TV $500’ search query sells them the TV before the user considers whether there are better options.

It's not illegal if it's clearly identified as an ad, or at least if it's obvious enough that any reasonable person would know it's an ad. Here's a primer from the FTC if you have any further questions:

https://www.ftc.gov/business-guidance/resources/native-advertising-guide-businesses