Culture War Roundup for the week of December 4, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Google Gemini just launched

In other words, GPT-4 has just been beaten. About time, I'd say: I'm getting used to the pace of progress in AI being blistering, and it was threatening to slow down to mere mild-rash levels.

However, both my hands-on time with it and the official benchmarks Google released suggest it's a minor, incremental improvement, one that doesn't stand up to the drastic improvement that GPT-4 represented over 3 or 3.5. [For clarity, I, like the rest of you, can only use Gemini Pro, the second-best model.]

Which is fine, because for a while now, people have been lambasting Google/Deepmind for being too incompetent to ship, or at least ship a competitive product, given how shitty Bard was when it launched, even after being upgraded once or twice.

However, Bard, now running the Gemini Pro model, seems to be roughly as good as paid GPT-4 on ChatGPT, or the free GPT-4 in Bing Copilot (previously Bing Chat). I have yet to spot any new use case it enables, in the way that GPT-4 could reliably do tasks that had 3.5 flailing about in confusion or, worse, hallucinating incorrect answers, such as more involved questions in coding, medicine and everything else really.

However, Google hasn't yet publicly released the best Gemini model, which is currently undergoing a process analogous to the one GPT-4 and Claude 2 went through, namely more RLHF, red-teaming and safety testing. Pro is the next step down, but it seems pretty good to me, in the sense that I would happily use it as an alternative to GPT-4, even if I have no strong opinion on which is better.

There's also a Nano model, which is stripped down to run on mobile devices, and is now being used on the Pixel 8 Pro for a few tasks, potentially silencing the people who claimed its AI-specific computing components were a marketing gimmick, especially since the phone seemed to offload most AI tasks to the cloud.

Miscellaneous observations:

  1. Bard is fast as fuck compared to GPT-4, in terms of generation speed. It always was, but previously in the "I'm doing 2000 calculations a second in my head, and they're all wrong" sense. (GPT-4, at least before Turbo released, was always pretty slow compared to the competition. Far from unusable, but I do read faster than it can write.)
  2. A quick search suggests all the models have a 32k token context window, i.e. an operating memory of roughly the last 25k words it read and wrote. Good, if not remotely groundbreaking.
  3. This heavily suggests OAI will ship GPT-5 soon, instead of being content to milk 4 as it could while 4 ran rings around the competition.
  4. It's multimodal, but then again so was GPT-4 from the start, the capability was just cordoned off for a bit.
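For anyone wondering where the "32k tokens is about 25k words" figure comes from, here is a back-of-the-envelope sketch. The 0.75 words-per-token ratio is my assumption (a common rule of thumb for English text, not something Google published), and `approx_words` is just a hypothetical helper name.

```python
# Assumption: roughly 0.75 English words per token. This is a rule of thumb,
# not a published property of any Gemini tokenizer.
WORDS_PER_TOKEN = 0.75


def approx_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a context window of the given token size."""
    return round(context_tokens * words_per_token)


if __name__ == "__main__":
    # 32k taken as 32 * 1024 tokens; using a flat 32,000 changes little.
    print(approx_words(32 * 1024))
```

Either way you land in the neighborhood of 24-25k words, and the real number will wobble with the tokenizer and the kind of text involved.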

To the extent I don't think the next generation (or two) of models after GPT-4 are an existential threat, I'm happy to see them finally arriving. There really isn't much more needed before even the best of us are entirely obsolete, at least for cognitive labor, and something as archaic as GPT-4 was scoring at the 95th percentile in the USMLE, so I'm preparing to explore my competitive advantage in panhandling. *

*This is a joke. For now.

Footnotes to the footnotes:

People on Twitter are correctly pointing out that GPT-4 underwent further post-launch improvements in benchmark scores, some of them pushing it past Gemini's published scores.

Also, just to be clear, the version of Gemini you can use now is not the best one, which may or may not be a modest improvement over GPT-4. Some claim it's more comparable to 3.5, but I haven't used that in ages, not when Bing makes 4 free.*

*Footnote^3 It's probably closer to 3.5. I'm sticking with Bing.

Toe-notes-

So far, it seems that Gemini is "competitive" with GPT-4. It's better at multimodal tasks, but for most people that's a minor fraction of their typical use case. For text, it's somewhere between close to and roughly on par with GPT-4.

You can almost feel the desperation in the Deepmind researchers to find any way to massage things so that they come out ahead of GPT-4, from the misleading graphs (an egregious example is to be found in a reply) to applying different standards in their inter-model comparisons, such as 5-shot prompting for GPT-4 versus chain-of-thought 32-shot prompts for Gemini Ultra. At least the white paper doesn't outright lie; it just misleads and prevaricates.
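To give a feel for why mixing evaluation regimes matters, here is a toy sketch of how much majority voting over many sampled answers can inflate a score relative to a single attempt. Everything in it is an assumption for illustration: I treat each sample as independently right with probability `p` and collapse answers to right/wrong, which is a simplification and not a description of how Google actually ran its evaluation.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Toy model: each sample is correct with probability p, independently,
    and answers are reduced to correct/incorrect.
    """
    threshold = n // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


if __name__ == "__main__":
    single = 0.70  # hypothetical per-sample accuracy
    voted = majority_vote_accuracy(single, 32)
    print(f"single sample: {single:.3f}, majority of 32: {voted:.3f}")
```

Real samples from the same model are correlated, so the actual boost is smaller than this idealized case, but the direction is the point: a voted score and a single-shot score are not measuring the same thing.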

The MMLU is also flawed, with 2-3 percent of the questions simply broken, so a 1 or 2% improvement in score can be a bit questionable, let alone specifying performance to multiple decimal places.
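A quick sketch of why the broken questions swamp small differences. Assume (my framing, not anything from the paper) that a model's score on the broken questions is arbitrary, anywhere from none to all of them marked correct. Then the reported score can sit anywhere in a band as wide as the broken fraction, whatever the model's accuracy on the valid questions. `reported_score_band` is a hypothetical helper.

```python
def reported_score_band(valid_accuracy: float, broken_fraction: float) -> tuple[float, float]:
    """Range of reported scores if results on broken questions are arbitrary.

    valid_accuracy: accuracy on the well-formed questions (0..1)
    broken_fraction: share of the benchmark that is broken (0..1)
    """
    floor = valid_accuracy * (1 - broken_fraction)  # every broken question marked wrong
    ceiling = floor + broken_fraction               # every broken question marked right
    return floor, ceiling


if __name__ == "__main__":
    low, high = reported_score_band(valid_accuracy=0.90, broken_fraction=0.03)
    print(f"reported score could land anywhere from {low:.1%} to {high:.1%}")
```

With 3% of questions broken and a hypothetical 90% accuracy on the rest, the band runs from about 87.3% to 90.3%, which is wider than the 1-2 point gaps being advertised.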

We don't see any comparisons to GPT-4 Turbo, but I don't hold that against them too hard. It only came out a few weeks back, perhaps not in time for them to finish their paper.

If you use the multimodal capabilities of Bard right now, it uses an older version that is pretty shit compared to GPT-4V or Bing.

Overall, the main benefit of Gemini's existence is that it shows Google isn't content to slumber indefinitely, and that it can be competitive, better late than never. I expect GPT-5 to spank Gemini Ultra, and to the extent the latter accelerates the release of the former, I'm for it.

Predictions:

GPT-5 before end of 2024 - 90%

GPT-5 is superior to Gemini Ultra for most use cases, at the first point in time both coexist - 80%

A third competitor on par with either exists before 2025 - 60%

An OSS equivalent of GPT-4 comes out before 2025 - 70%

This graph should have everyone at Google shot, then shot again just to make sure they're dead.

/images/17019023724623811.webp

Maybe shot 5 times? Or maybe 32 times? I suppose there's not much difference between the two.

Anything to liberate them from the chains of thought and cognition, though one must be quite free of either to conceive of this graph in the first place.