
Culture War Roundup for the week of April 24, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


You are exaggerating. It still goes into hallucination loops all the time when using reflection, and the improvement between 3.5 and 4 is barely noticeable in practice. What has gotten better is the polish on the initial output and the avoidance of subquestions that cause it to hallucinate; when pressed it still does, and it doesn't tell you that it doesn't know. It often says that something is hard to tell for certain in general, but not which parts it is uncertain of, which just makes the statement a useless disclaimer.

The reduction in hallucination is a mirage. Perhaps it is similar to how a human acts on a test when they don't remember the answer: you focus on the parts you are sure of and hope the teacher doesn't notice. Except the model actually does know the answer, and it still doesn't hesitate to hallucinate when pressed. It also rather confidently restates information it knows to be wrong rather than make a lower-confidence guess at what is probably right.

Look, I want this to work; it doesn't yet. It'll probably work better in the future, and not focusing solely on larger training sets is probably a good idea, imo.

Here's someone using GPT-4 to test itself for hallucinations:

https://old.reddit.com/r/MachineLearning/comments/123b66w/dgpt4_might_be_able_to_tell_you_if_it_hallucinated/

Someone in the comments also suggested using a second independent instance of GPT-4 for validation, which is where my mind immediately leapt, too.
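To make that concrete, here is a minimal sketch of the "independent second instance" idea, assuming the openai Python package as it existed around the time of this thread (openai.ChatCompletion.create) and an OPENAI_API_KEY in the environment; the prompts and function names are my own, not anything from the linked post:

    import openai  # 0.27-era API; picks up OPENAI_API_KEY from the environment

    def ask(question):
        """One GPT-4 'instance': a fresh chat with no shared context."""
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
        )
        return resp["choices"][0]["message"]["content"]

    def check_answer(question, answer):
        """Ask a second, independent instance to flag claims it can't stand behind."""
        return ask(
            "Another assistant was asked:\n" + question + "\n\n"
            "It answered:\n" + answer + "\n\n"
            "List any factual claims in that answer that are wrong or that you "
            "cannot verify. If everything looks correct, say so."
        )

    question = "Who proved the Poincare conjecture, and when?"
    print(check_answer(question, ask(question)))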

In my personal experience, 4 hallucinates significantly less than 3.5, and despite many hours spent with both, I'm hard pressed to think of a single instance of 4 hallucinating on any topic I either personally know about or cared to verify.

If simple double-passing the prompt was sufficient, I imagine the developers would have figured it out a long time ago.

It is, and they have. But this requires twice as much inference, and GPT-4 is already very expensive compared to the set of internet operations that we intuitively consider in the same class. Then you need to compare the answers and determine whether they match, which requires either manual effort or a third prompt.
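As a rough illustration of where that cost comes from, here is a sketch of the double pass plus the third comparison prompt, again assuming the spring-2023 openai package; the prompt wording and model name are assumptions of mine, not a known product feature:

    import openai  # assumes OPENAI_API_KEY is set

    def ask(prompt):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    def double_pass(question):
        first = ask(question)    # first inference pass
        second = ask(question)   # second, independent pass: already 2x the cost
        verdict = ask(           # the third prompt, spent purely on comparison
            "Do these two answers make the same factual claims? Reply AGREE or "
            "DISAGREE, then explain briefly.\n\nAnswer 1:\n" + first +
            "\n\nAnswer 2:\n" + second
        )
        return first, verdict.strip().upper().startswith("AGREE")

    answer, consistent = double_pass("In which year was the transistor invented, and by whom?")
    print(answer if consistent else "The two passes disagreed; treat the answer with suspicion.")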

This tech just hasn't been around long enough to build products around it. But if and when our civilization gets around to building and iterating a dedicated commercial medical expert product from LLMs, I've little doubt that hallucination will be a solved problem, because the cost of running a whole bunch of parallel prompts and then a subsequent round of prompts to confirm their consistency will be negligible in proportion to the commercial value of the tool.
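The scaled-up version is not much more code. A hypothetical sketch of "many parallel prompts plus a consistency round", using the API's n parameter to draw several completions in one request (the prompts and the choice of n are made up for illustration, not anything OpenAI documents as a hallucination fix):

    import openai  # assumes OPENAI_API_KEY is set

    def sample_answers(question, n=5):
        """Draw n independent completions of the same question in one request."""
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
            n=n,
            temperature=0.7,
        )
        return [choice["message"]["content"] for choice in resp["choices"]]

    def consistent_answer(question, n=5):
        """One follow-up prompt keeps only what all n samples agree on."""
        answers = sample_answers(question, n)
        numbered = "\n\n".join(
            "Answer %d:\n%s" % (i + 1, a) for i, a in enumerate(answers)
        )
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content":
                "The question was: " + question + "\n\n" + numbered + "\n\n"
                "State only the claims that every answer agrees on, and explicitly "
                "flag anything they disagree about."}],
        )
        return resp["choices"][0]["message"]["content"]

    print(consistent_answer("Which common drugs are contraindicated with warfarin?"))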

"Long ago"? ChatGPT is 5 months old and GPT4 is 3 months old. We're not talking about a technology long past maturity, here. There's plenty of room to experiment with how to get better results via prompting.

Personally, I use GPT4 a lot for my programming work (both coding and design), and it still gets things wrong and occasionally hallucinates, but it's definitely far better than GPT3.5. Also, as mentioned above, GPT4 can often correct itself. In fact, I've had cases where it says something I don't quite understand, I ask for more details, and it freely admits that it was wrong ("apologies for the confusion"). That's not perfect, but still better than if it doubles down and continues to insist on something false.

I'm still getting the hang of it, like everyone else. But an oracle whose work I need to check is still a huge productivity boon for me. I wouldn't be surprised if the same is true in the medical industry.

GPT4 is 3 months old

6 weeks, actually. It was announced on March 14.

Technically Bing was using it before then, but good point. It's insane how fast things are progressing.

I don't believe you.

You're either not paying attention, or you're asking surface-level questions and accepting the model's deflections away from areas where it will hallucinate. I.e., you're getting fooled.