Small-Scale Question Sunday for May 18, 2025

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

For those of you who have asked recent LLMs questions in your area of expertise, how accurate are the responses? What is your field and what models are you using?

I'm in the biomedical engineering field. I last used ChatGPT-4o months ago and found the answers to be quite terrible, like what I might expect from someone who had only watched a YouTube video on the topic. Reading it felt uncanny-valley in a way that reminded me vaguely of watching a movie scene with cheap green-screen effects: I could feel the lack of substance viscerally. It left a bad impression and, with my slightly Luddite disposition, I have largely ignored LLMs for anything but coding since.

I recently needed a good layman explanation for a project and asked Grok 3. I came away genuinely impressed. I asked it to expand on certain points more rigorously and even formulated a few questions that would be appropriate for a graduate-level course, and it handled all of this so well that it improved my own understanding of some aspects. When I get time, I'll try to poke and prod to see if I can find gaps or limits, but it has genuinely changed my view of LLMs. Previously, I felt like they were only really good for coding and expected they would hit diminishing returns, but I'm less sure now.

I felt like they were only really good for coding

They aren't that good for coding either. They're OK for coding simple things that don't involve any complicated concepts or deep understanding, something like just reading the manual and applying it directly, often just copy-pasting from the right example. But if it gets a bit more advanced, they can't help you much.

They also love hallucinating new APIs and settings which don't actually exist, which is hugely annoying. I've been in this scenario many times: "Describe the ways to do X with system S." "The best way is to use API A with setting do_X=true, see the following code." "This code does not work, because API A does not have setting do_X." "Thanks for correcting me, actually it's API A.do_X, which has configuration value enable_doing_X=1." "That configuration doesn't exist either." "Thanks for correcting me, actually there's no way to do X with API A." "Are there other ways to do X with system S?" "Yes, the best way is to use APIs B and C with options do_X=true"... you can guess the rest.

They're good for easy tasks, but as soon as a task requires any actual understanding and not just regurgitating pre-chewed information, their usability drops dramatically. Don't get me wrong, there are a lot of tasks which are literally just applying the right copy-pastes in the right sequence, but that can only get you so far.