Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
For those of you who have asked recent LLMs questions in your area of expertise, how accurate are the responses? What is your field and what models are you using?
I'm in the biomedical engineering field. I last used GPT-4o months ago and found the answers to be quite terrible, like what I might expect from someone who had only watched a YouTube video on the topic. Reading it felt uncanny valley in a way that reminded me vaguely of watching a movie scene with cheap green-screen effects: I could feel the lack of substance viscerally. It left a bad impression and, with my slightly Luddite disposition, I've largely ignored LLMs for anything but coding since.
I recently needed a good layman explanation for a project and asked Grok 3. I came away genuinely impressed. I asked it to expand on certain points more rigorously and even formulated a few questions that would be appropriate for a graduate level course, and it did all of this so well it even improved my own understanding of some aspects. When I get time, I’ll try to poke and prod to see if I can find gaps or limits, but it has genuinely changed my view of LLMs. Previously, I felt like they were only really good for coding and expected they would hit diminishing returns, but I’m less sure now.
They aren't that good for coding. I mean, they're OK for coding simple things that don't involve any complicated concepts or deep understanding, things like just reading the manual and applying it directly, often just copy-pasting from the right example. But if it gets a bit more advanced, they can't help you much. They also love hallucinating new APIs and settings that don't actually exist, which is hugely annoying. I've been in this scenario many times: "Describe the ways to do X with system S." - "The best way is to use API A with setting do_X=true, see the following code." - "This code does not work, because API A does not have setting do_X." - "Thanks for correcting me; actually it's API A.do_X, which has configuration value enable_doing_X=1." - "That configuration doesn't exist either." - "Thanks for correcting me; actually there's no way to do X with API A." - "Are there other ways to do X with system S?" - "Yes, the best way is to use APIs B and C with options do_X=true"... you can guess the rest. They're good for easy tasks, but as soon as a task requires any actual understanding and not just regurgitating pre-chewed information, their usability drops dramatically. Don't get me wrong, there are a lot of tasks that are literally just applying the right copy-pastes in the right sequence, but that can only get you so far.
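One cheap defense against this failure mode (my own sketch, not something from the thread): before trusting an LLM-suggested keyword argument, check it against the function's actual signature with Python's `inspect` module. The `do_X` name below is the hypothetical invented setting from the exchange above.

```python
import inspect
import re

def accepts_kwarg(func, name: str) -> bool:
    """Return True if `func` actually accepts a keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    # A **kwargs catch-all means the name can't be ruled out statically.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# re.sub really does take `flags`; an LLM-invented `do_X` does not.
print(accepts_kwarg(re.sub, "flags"))  # True
print(accepts_kwarg(re.sub, "do_X"))   # False
```

This won't catch everything (C extensions and `**kwargs` sinks defeat it), but it turns "spend half an hour hunting through docs" into a one-line check.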
Answered questions about TXVs (thermostatic expansion valves) reasonably, but mostly generically. Repeated official government positions about Freon that nobody in the field believes.
What's the red pill on Freon?
That the ozone layer and global warming are not real/not actually threatened by Freon, and that the US government acts on behalf of large chemical companies to ensure there will never be a generic version of most refrigerants, by inventing excuses to ban them before the patent lapses. There's some other stuff tied in about the government intentionally underpaying informants, based on the evidence standards for environmental regulations (venting Freon requires video evidence from a licensed technician and no other kind), and sometimes this ties into eccentric metaphysical/spiritual beliefs.
HVAC techs are probably the most conspiratorial/far-right demographic in the country because of the recruiting population, so stuff like that is par for the course.
I'm a litigator, and Westlaw's built-in AI has essentially replaced interns and is in serious danger of replacing first-year attorneys for me. I find the AI requires roughly the same amount of prompting to produce roughly the same quality of work, only instead of getting a memo of middling usefulness in 5 days, I get it in 45 seconds. And I'm not expected to provide edits or mentorship to an AI. The AI is generally pretty good at getting me into the general ballpark of what I'm looking for; I then do the rest of my research manually. I have not been willing to try using AI in the drafting process yet, as that seems like a bridge too far in having something else do my thinking for me.
It's tough, because we still need to make the long-term investment in keeping the pipeline full of young attorneys who will eventually be able to provide value that can't be replicated by an AI, but it's at the point where I give the interns assignments purely for the training value, without actually using any of their work. They'd be crushed if they knew.
That is literally their training data.
But they improve fast. When it comes to, let's say, giving a diagnosis based on symptoms, they really hit the mark. I had a doctor friend of mine test one with real cases they'd had.
For "frontier tasks" in physics/electrical engineering, it's bad. It just doesn't work, even as a search engine.
My most recent request was "Find me patents about the application of concept X at high magnetic field". Should be easy, patents are public by definition. Searching google patents has worked for decades. There's proprietary patent databases with curated keywords. Perfect training data, easy to search.
But all the current reasoning models with web search just give me results at extremely low magnetic field (which is the standard application; there are many patents like that, and that's exactly why I'm asking an LLM: I don't want to sift through them by hand). So I specify: "Keep in mind that millitesla and microtesla are low magnetic fields. Please exclude patents that use those units from your search." I'm already disillusioned; I shouldn't need to do this. A nerdy high schooler would know better. But it doesn't work. It just ignores the request, apologizes, and keeps spitting out patents with those units in the abstract.
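A crude client-side workaround (my sketch, not from the commenter): since the models keep returning low-field results regardless, one can post-filter the abstracts with a regex that rejects millitesla/microtesla values before reading anything.

```python
import re

# Matches low-field values such as "50 µT", "1.5 millitesla", "5 mT".
LOW_FIELD = re.compile(
    r"\b\d+(\.\d+)?\s*(m|µ|u|milli|micro)\s*(T\b|tesla)", re.IGNORECASE
)

def is_high_field_candidate(abstract: str) -> bool:
    """Keep only abstracts that never mention millitesla/microtesla values."""
    return LOW_FIELD.search(abstract) is None

abstracts = [
    "A sensor operating at 9.4 T for NMR applications.",
    "Magnetometry at 50 µT ambient field.",
    "Hyperpolarization at 1.5 millitesla.",
]
print([is_high_field_candidate(a) for a in abstracts])  # [True, False, False]
```

It's a blunt instrument (a high-field patent that mentions a low-field comparison point gets dropped too), but it at least enforces the instruction the model keeps ignoring.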
Also, I still need to paste every single patent it spits out into my patent database tool, because literally 50% of the results are hallucinated. The patent number is a completely different patent, and the title it prints doesn't exist.
One core weakness of the current models seems to be things that don't exist (as might be the case for the patent I'm looking for). Another example of that is a request like: "I'm using oscilloscope Y, and I want to change the color of one of the traces on the display. How do I do that?" For my oscilloscope, the answer is "you can't, those traces have their colors hard-coded, fuck color blind people." But the LLM will automatically read the correct manual (good!), link it, and then proceed to hallucinate itself into psychosis, flat out inventing entire menus and settings dialogs every time I press it harder.
Maybe they would be better if you gave them the complete patent database of your domain. Sometimes this sort of thing works. You would have to use the paid models though.
At least with Gemini, it should just use patents.google.com.
Also, that would be many, many millions of tokens.
I've only ever used the free tiers, but ChatGPT loves to hallucinate new Apache Spark configurations. Gemini, surprisingly, knows even less.
I have the paid ChatGPT and it hallucinates profusely too; see my other comment above. I've had this issue many times, not with Apache Spark specifically but with many other libraries and APIs: it just decides "it'd be nice to have this setting" and invents it out of thin air, and I've spent half an hour trying to hunt a setting down, only to go to the source and find out it never existed.
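One way to shortcut that half-hour hunt (again just a sketch of mine, not the commenter's workflow): grep the installed package's source for the claimed identifier before trusting it. The stdlib `json` module serves as the example target here; `enable_doing_X` is the hypothetical invented setting.

```python
import importlib
import inspect
from pathlib import Path

def identifier_in_source(module_name: str, identifier: str) -> bool:
    """Return True if `identifier` appears in the module's own source file.

    Only works for pure-Python modules; C extensions have no readable source.
    Searches only the module's main file, not the whole package tree.
    """
    module = importlib.import_module(module_name)
    source_file = inspect.getsourcefile(module)
    if source_file is None:
        raise ValueError(f"{module_name} has no Python source to search")
    return identifier in Path(source_file).read_text(encoding="utf-8")

# A real name is found; an LLM-invented one is not.
print(identifier_in_source("json", "JSONDecoder"))     # True
print(identifier_in_source("json", "enable_doing_X"))  # False
```

A plain `grep -r` over the site-packages directory does the same job from the shell; either way, a negative result in seconds beats reverse-engineering a setting that was never there.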
GPT-4o has improved dramatically quite recently.