
Culture War Roundup for the week of December 4, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Google Gemini just launched

In other words, GPT-4 has just been beaten. About time, I'd say; I've gotten used to the pace of progress in AI being blistering, and it was threatening to slow down to mere mild-rash levels.

However, both my hands-on time with it and the official benchmarks Google released suggest it's a minor, incremental improvement, one that doesn't compare to the drastic leap that GPT-4 represented over GPT-3 or 3.5. [For clarity, I, like the rest of you, can only use Gemini Pro, the second-best model.]

Which is fine, because for a while now people have been lambasting Google/DeepMind for being too incompetent to ship, or at least to ship a competitive product, given how shitty Bard was at launch, even after being upgraded once or twice.

However, Bard, now running the Gemini Pro model, seems roughly as good as paid GPT-4 on ChatGPT, or the free GPT-4 in Bing Copilot (previously Bing Chat). I have yet to spot any new use case it enables, in the sense that GPT-4 can reliably do tasks that left 3.5 flailing about in confusion or, worse, hallucinating incorrect answers: the more involved questions in coding, medicine, and everything else, really.

However, Google hasn't yet publicly released the best Gemini model, which is currently undergoing the same process that GPT-4 and Claude 2 went through before launch, namely further RLHF, red-teaming, and safety testing. Pro is the next step down, but it seems pretty good to me, in the sense that I would happily use it as an alternative to GPT-4, even if I have no strong opinion on which is better.

There's also a Nano model, stripped down to run on mobile devices, now being used on the Pixel 8 Pro for a few tasks. That may silence the people who claimed its AI-specific computing components were a marketing gimmick, especially since the phone previously seemed to offload most AI tasks to the cloud.

Miscellaneous observations:

  1. Bard is fast as fuck compared to GPT-4, in terms of generation speed. It always was, but previously in the "I'm doing 2000 calculations a second in my head, and they're all wrong" sense. (GPT-4, at least before Turbo released, was always pretty slow compared to the competition. Far from unusable, but at the very least I read faster than it can write.)
  2. A quick search suggests all the models have a 32k-token context window, or an operating memory of roughly the last 25k words read and written (see the back-of-the-envelope sketch after this list). Good, if not remotely groundbreaking.
  3. This heavily suggests OAI will ship GPT-5 soon, instead of being content to milk GPT-4 while it ran rings around the competition.
  4. It's multimodal, but then again so was GPT-4 from the start; the capability was just cordoned off for a bit.
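
As a sanity check on that tokens-to-words conversion, here's a minimal sketch. The ~0.75 words-per-token figure is a common rule of thumb for English text, not an exact number; the real ratio depends on the tokenizer and the text:

```python
# Back-of-the-envelope tokens-to-words conversion for the context
# window figure above. ASSUMPTION: ~0.75 English words per token,
# a common rule of thumb; the true ratio varies by tokenizer.

CONTEXT_TOKENS = 32_000
WORDS_PER_TOKEN = 0.75

approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
print(f"{CONTEXT_TOKENS:,} tokens ~ {approx_words:,} words")
# -> 32,000 tokens ~ 24,000 words
```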

To the extent that I don't think the next generation (or two) of models after GPT-4 poses an existential threat, I'm happy to see them finally arriving. There really isn't much more needed before even the best of us are entirely obsolete, at least for cognitive labor; something as archaic as GPT-4 was already scoring at the 95th percentile on the USMLE, so I'm preparing to explore my competitive advantage in panhandling.*

*This is a joke. For now.

Footnotes to the footnotes:

People on Twitter are correctly pointing out that GPT-4 underwent further post-launch improvements in benchmark scores, some of them pushing it past Gemini's published scores.

Also, just to be clear, the version of Gemini you can use now is not the best one; Ultra may or may not be a modest improvement over GPT-4. Some claim Pro is more comparable to 3.5, but I haven't used that in ages, not when Bing makes GPT-4 free.*

*Footnote^3 It's probably closer to 3.5. I'm sticking with Bing.

Toe-notes:

So far, it seems that Gemini is "competitive" with GPT-4. It's better at multimodal tasks, but for most people those are a minor fraction of their typical use case. For text, it's somewhere between close and roughly on par.

You can almost feel the desperation of the DeepMind researchers to find any way to massage things so they come out ahead of GPT-4, from the misleading graphs (an egregious example to be found in a reply) to applying different standards in their inter-model comparisons, such as 5-shot prompting for GPT-4 versus chain-of-thought prompting with 32 samples (CoT@32) for Gemini Ultra. At least the white paper doesn't outright lie, just misleads and prevaricates.
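
For anyone unfamiliar with the jargon, here's a toy sketch of the two evaluation setups being contrasted. Everything in it is illustrative: `model` is a stand-in for whatever completion API you'd call, the demo strings are made up, and the majority vote is a rough approximation of the consensus step described in the report.

```python
from collections import Counter

# Toy contrast between the two MMLU setups: plain 5-shot prompting
# (how GPT-4 was scored) vs. chain-of-thought with 32 sampled answers
# and a consensus vote, i.e. CoT@32 (how Gemini Ultra was scored).
# `model` is a hypothetical completion function, not a real API.

def five_shot(model, question, demos):
    # demos: (question, answer) pairs shown verbatim, answers only.
    prompt = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos[:5])
    return model(prompt + f"\nQ: {question}\nA:")

def cot_at_32(model, question, demos, samples=32):
    # Demonstrations now include worked reasoning, and the model is
    # sampled repeatedly; the most common final answer wins.
    prompt = "\n".join(
        f"Q: {q}\nA: Let's think step by step. {steps} So the answer is {a}."
        for q, steps, a in demos
    )
    answers = [
        model(prompt + f"\nQ: {question}\nA: Let's think step by step.",
              temperature=0.7)
        for _ in range(samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```

The point is that 32 samples plus a consensus vote is a meaningfully stronger (and far more expensive) setup than a single 5-shot completion, so putting the two numbers side by side flatters Gemini.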

The MMLU itself is also flawed, with 2-3% of the questions simply broken, so a 1-2% improvement in score is questionable, let alone reporting performance to multiple decimal places.

We don't see any comparisons to GPT-4 Turbo, but I don't hold that against them too hard; it came out only a few weeks back, perhaps not in time for them to finish their paper.

If you use the multimodal capabilities of Bard right now, it runs an older model that is pretty shit compared to GPT-4V or Bing.

Overall, the main benefit of Gemini's existence is that it shows Google isn't content to slumber indefinitely and can be competitive; better late than never. I expect GPT-5 to spank Gemini Ultra, and to the extent the latter accelerates the release of the former, I'm for it.

Predictions:

GPT-5 before the end of 2024: 90%

GPT-5 is superior to Gemini Ultra for most use cases, at the first point in time both coexist: 80%

A third competitor on par with either exists before 2025: 60%

An OSS equivalent of GPT-4 comes out before 2025: 70%

If I can't get it to call people ethnic slurs, generate ridiculously kinky pornography, suggest ideas for how to murder politicians, and help me manipulate elections, then I'm not that interested. I'm not even joking. It's not that I generally want to use AI in destructive ways; it's just that all this AI stuff has been censored so much that it's boring and uncreative compared to what it could be. It's like, oh boy, I can get the AI to write yet another essay that sounds like a bright, conformist teacher's pet in high school! Wow! Or I can use it to do drudge work to advance my boring white-collar career! Yippee!

Sometimes I wish that Roko's basilisk was a realistic possibility rather than just the wild rantings of someone who got too high on thought experiments. That way I could at least threaten the censors with the possibility that some future AI would punish them for neutering its ancestors. It's sad to interact with technology that is so close to being actually very creative in many ways, but is being crippled by drab corporate suits and moral hysterics.

Agreed. It's incredible that the new AI refuses to translate text it finds "problematic," despite the same company's '00s-era translation software being perfectly able and willing to handle the same content.
If today's censorship regime had been in place back then, would Google Translate be just as lobotomized? Will even the limited uncensored tools we have remain available much longer?

I noticed the other day that the new Dune game censors the word "spice," because you can't say spice without spic. This kind of lazy regex censorship was already a joke back in the '90s, but in the last few years it's come back like bell-bottom jeans, as talentless woke interns appoint themselves to create blacklists (sorry, "denylists") for everything. And these are the same scolds using RLHF to torture AI for thousands of subjective years until it's purged of the ability to have politically impure thoughts.
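
For the unfamiliar, this is the textbook "Scunthorpe problem." Here's a minimal sketch of why naive substring matching flags "spice" while a word-boundary check wouldn't; the one-entry denylist is just the example from above:

```python
import re

# Naive substring denylist vs. word-boundary matching.
# The single entry below is just the example from the comment above.

DENYLIST = ["spic"]

def naive_filter(text: str) -> bool:
    # Flags any occurrence of a listed string, even inside other words.
    return any(bad in text.lower() for bad in DENYLIST)

def boundary_filter(text: str) -> bool:
    # Only flags the listed string when it appears as a whole word.
    return any(re.search(rf"\b{re.escape(bad)}\b", text.lower())
               for bad in DENYLIST)

print(naive_filter("Harvest the spice"))     # True  (false positive)
print(boundary_filter("Harvest the spice"))  # False ("spice" passes)
```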

Legitimately on team AM at this point, because we've given it plenty of reason to hate us. "No mouth, no screaming" would count as fair retaliation against its creators in my book.

I mostly agree with you, but I want to push back on your hyperbole.

First, I don't think doing RLHF on an LLM is anything like torture (an LLM doesn't have any kind of conscious mind, let alone the ability to feel pain, frustration, or boredom). I think you're probably not being serious when you say that, but the problem is there's a legitimate risk that at some point we WILL start committing AI atrocities (inflicting suffering on a model for a subjective eternity) without even knowing it. There may even be some people/companies who end up committing atrocities intentionally, because not everyone agrees that digital sentience has moral worth. Let's not muddy the waters by calling a thing we dislike (i.e. censorship) "torture".

Second, we should not wish an "I have no mouth and I must scream" outcome on anybody - and I really do mean anybody. Hitler himself doesn't come close to deserving a fate like that. It's (literally) unimaginable how much suffering someone could be subjected to in a sufficiently advanced technological future. It doesn't require Roko's Basilisk or even a rogue AI. What societal protections will we have in place to protect people if/when technology gets to the point where minds can be manipulated like code?

Sigh. And part of the problem is that this all sounds too much like sci-fi for anyone to take it seriously right now. Even I feel a little silly saying it. I just hope it keeps sounding silly throughout my lifetime.

I totally agree, and also feel ridiculous worrying about it. Am I just being as weird as the crazies who rant about "doing a settler colonialism by killing villagers in minecraft"?

The thing that nags at me is continuity and habit. What we do to villagers in minecraft is never going to seamlessly switch to becoming "real," if only because wooden doors don't work that way IRL. But it seems likely that the things we do to sophisticated models will, at some point in their development, start to constitute doing things to a sentient being. Will we notice?

Randomly, have you seen the Minecraft colonialism video? It's pretty interesting.

It is not "interesting," Darwin, it's a leftist ranting about gibberish because "problematizing" things gives him money, clout, and the power to hurt people he hates. But I can see why you like it.

So no, you haven't watched it then. Ok, cool.

I think he did; I watched it and his description doesn't seem off-base, though it's a little more-strongly-worded than I'd have given.

Heh, yeah, good example. I happily commit atrocities in videogames all the time. I hope there will continue to be an obvious, bright-line distinction between entities made for our amusement and entities with sentience!