
Friday Fun Thread for November 10, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), nor is it for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


It's time for the weekly "Is ChatGPT Getting Worse" discussion.

I use ChatGPT 4 a lot for work, and it's getting painful. For one, it's always trying to browse the web with Bing. If I wanted blog spam, I wouldn't ask ChatGPT. So now I have to preface every request with "From memory". Saving this in my user profile doesn't seem to work.

The bigger issue is that it's just really stupid. It's hard to believe that this thing can pass the bar exam. Consider this query:

"What's the second largest thing that orbits earth".

The result it gave me was something like:

"The moon is the second largest thing that orbits Earth. The largest thing orbiting earth is the ISS, which is considerably smaller than the moon".

Even after multiple rounds of thumbs down, it still gave me this same bizarre answer. A few days later I tried again and got this correct, if annoying, answer:

"The Moon is the largest natural object orbiting Earth. The second largest object is the International Space Station (ISS), although it's significantly smaller than the Moon."

Who knows what's going on. It could be that my expectations were initially low and I was positively surprised by ChatGPT's good results; now my expectations are high and I'm negatively surprised by the poor results. Nevertheless, I really do think things are getting worse.

The web browsing problem is solved by the new "My GPTs" feature (once you have access, which might not be everyone yet). The new default GPT has all the extra features enabled by default, including the (I would argue) very useful DALL-E feature and the (I would agree) not very useful web browsing feature. But you can pin "ChatGPT Classic" to disable all that and stick to strictly textual responses, or create a custom GPT with your preferred combination of features.

I've just started to mess around with the custom GPTs, and while they don't seem functionally different from keeping some preliminary instructions around to copy-paste before your actual query, they do seem to make an outsized difference in lowering the mental barrier to actually using it. Now I've got one dedicated to generating unit tests from pasted code (following preferred style guides), one for code review (outputting potential issues with suggested changes), and so forth. I'm pretty optimistic about generative AI from an ever-increasing utility perspective, so I find it hard to complain about the current state of things.
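
If you'd rather drive the same workflow through the API instead of the web UI, a standing system message plays roughly the role of a custom GPT's instructions. Here's a minimal sketch, assuming the official openai Python client; the instruction text and function name are illustrative, not my actual custom GPT:

```python
# Sketch of a "custom GPT"-style unit test generator via the API.
# Assumes the official openai client (pip install openai) and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

UNIT_TEST_INSTRUCTIONS = (
    "You are a unit test generator. Given pasted code, respond only with "
    "pytest test functions that follow PEP 8, with no prose commentary."
)

def generate_tests(code_snippet: str) -> str:
    """Send the pasted code along with the standing instructions and return the tests."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": UNIT_TEST_INSTRUCTIONS},
            {"role": "user", "content": code_snippet},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_tests("def add(a, b):\n    return a + b"))
```

The upside of the custom GPT (or a pinned system message like the above) is exactly what I described: the instructions are baked in, so each use is just pasting the code.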

That said, I have also noticed a greater-than-chance number of factual errors in recent conversations. Interestingly, the latest one I can recall also involved an error in comparative measures (while discussing hypothetical US statehood): "As of my last update in 2023, the Philippines had a population of over 100 million people. This population size would make it the second most populous state in the U.S., surpassed only by California."

So maybe they tweaked some dial to improve some other metric, which, by the impossible-to-comprehend inner machinations of text-analysis wizardry, had a tradeoff that made this failure point more common. Or maybe it really is less factually accurate across the board, and these examples are just the easier ones to notice. Either way, it doesn't seem too bothersome, at least for me and my set of use cases, especially since I imagine an easy-to-notice regression like this will be taken care of pretty quickly. If not by OpenAI, then by the others in the ever-escalating arms race.