
Friday Fun Thread for November 10, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), nor is it for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


It's time for the weekly "Is ChatGPT Getting Worse" discussion.

I use ChatGPT 4 a lot for work, and it's getting painful. For one, it's always trying to browse the web with Bing. If I wanted blog spam, I wouldn't ask ChatGPT. So now I have to preface every request with "From memory". Saving this in my user profile doesn't seem to work.

The bigger issue is that it's just really stupid. It's hard to believe that this thing can pass the bar exam. Consider this query.

"What's the second largest thing that orbits earth".

The result it gave me was something like:

"The moon is the second largest thing that orbits Earth. The largest thing orbiting earth is the ISS, which is considerably smaller than the moon".

Even after multiple rounds of thumbs down, it still gave me this same bizarre answer. A few days later I tried again and got this correct if annoying answer:

"The Moon is the largest natural object orbiting Earth. The second largest object is the International Space Station (ISS), although it's significantly smaller than the Moon."

Who knows what's going on. It could be that my expectations were initially low and I was positively surprised by ChatGPT's good results. Now my expectations are high and I am negatively surprised by the poor results. Nevertheless, I really do think things are getting worse.

For the web browsing problem, that's a solved problem with the new "My GPTs" feature (once you have access, which still might not be everyone yet?). The new default GPT has all the extra features enabled by default, including the (I would argue) very useful DALL-E feature and the (I would agree) not very useful web browsing feature. But you can pin "ChatGPT Classic" to disable all that and stick to strictly textual responses, or create a custom one to get your preferred combination of features.

I've just started to mess around with the custom GPTs, and while it doesn't seem functionally different from keeping around some preliminary instructions to copy-paste before your actual query, it seems to have an outsized effect in lowering the mental barrier to actually using it. Now I've got one dedicated to generating unit tests from pasted code (following preferred style guides), one for code review (outputting potential issues with suggested changes), and so forth. I'm pretty optimistic about generative AI from an ever-increasing utility perspective, so I find it hard to complain about the current state of things.
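For what it's worth, a custom GPT is roughly equivalent to prepending a fixed system message to every request, which you can also do yourself via the API. Here's a minimal sketch of that pattern using the OpenAI Python SDK's message format; the instruction text, helper name, and model choice are my own illustrative placeholders, not anything OpenAI prescribes:

```python
# Sketch: a reusable "custom GPT" as a canned system prompt.
# The instruction text and function name below are illustrative, not an official API.

UNIT_TEST_INSTRUCTIONS = (
    "You are a unit-test generator. Given pasted code, produce tests that "
    "follow the team's style guide. Answer from memory; do not browse the web."
)

def build_messages(user_query: str,
                   system_prompt: str = UNIT_TEST_INSTRUCTIONS) -> list[dict]:
    """Prepend fixed instructions so every request behaves like a custom GPT."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]

# The resulting list would then be passed to the chat completions endpoint, e.g.:
# client.chat.completions.create(model="gpt-4", messages=build_messages(code_snippet))
```

The point is just that the "custom GPT" saves you from pasting that preamble by hand each time; the model sees the same thing either way.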

That said, I have also noticed a greater-than-chance series of factual errors in recent conversations. Interestingly, the latest one I can recall also involved an error in comparative measures (while discussing hypothetical US statehood): "As of my last update in 2023, the Philippines had a population of over 100 million people. This population size would make it the second most populous state in the U.S., surpassed only by California."

So maybe they tweaked some dial to improve some other metric, which by the impossible-to-comprehend inner machinations of text analysis wizardry, had a tradeoff that made this failure point more common. Or maybe it really is less factually accurate across the board, and these examples are just the easier ones to notice. Either way, it doesn't seem too bothersome at least for me, with my set of use cases, especially since I imagine an easy-to-notice regression like this will be pretty quickly taken care of. If not by OpenAI, then by the others in the ever-escalating arms race.

I'm less annoyed by its stupidity than by its constant moralizing. It straight-up refuses to do things much of the time.

One that stands out (though, I can't seem to reproduce it now) was ChatGPT refusing to send a request to DALL-E for an illustration of a trans character pre-transition. I can't find the conversation, but it was something like:

ChatGPT: Due to our content policy, I can't generate that image

Me: What? .. why? What's wrong with that?

ChatGPT: It is important to respect the feelings of trans people, and depicting this character at a sensitive time in their life could be hurtful and (etc etc.)

Me: It's a fictional character. I promise they won't mind.

ChatGPT: It is important to .. (blah blah blah)

Me: Fine. Screw it. Character isn't trans any more. Are you happy now? There goes half our diversity quota

I asked it to make a thumbnail of an "attractive female teacher" for some content I was working on. It must have decided that 2 of the 4 images were too sexy, because it wouldn't show them to me.

Those were the good old days of two weeks ago. Now Dall-E 3 will only make one image at a time for me. I have to tack this on to every command: "simple, no extra elements, colors not overly saturated, not complicated". Mixed success.

If you're using it for real-world tasks (and not to score points / complain about your outgroup), its logical mistakes are far more annoying than its political leanings.

Try to get it to depict an average American... It refuses because it perpetuates harmful stereotypes! LMAO

At first it refused, citing "variety of ethnicities, variety, complex task, bla bla bla". Then I asked again, leaving out ethnicities. It created an image/mosaic, showing a burger, a coffee cup, a city street, and several US flags.

I meant more in the sense of someone that doesn't look like a model and isn't model level thin.

I've tried but been unable to create a woman with a BMI above 25.

Using woke-speak is the most effective method I've found but even then it seems to run into some hard wall. It accepts the prompt, generates the picture but then refuses to publish it, citing "breaking safety guidelines". Asking about what specific policy is being violated leads to the model claiming that actually no policy is being violated, attempting to generate a picture again, and then being stopped by some unknown process. After a few tries it just shuts down and claims it's broken.