
Small-Scale Question Sunday for June 15, 2025

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


How do you best verify large language model output?

I hear lots of people say they use LLMs to search through documents or to get ideas for how something works, but my question is: how do people verify the output? Is it as simple as copy-pasting keywords into Google to find the actual science textbooks, or is there some better set of steps I'm missing? I also wonder how you do that when looking through a document. Is there some method for getting the LLM to output page citations so you can check those (maybe it's in the settings or something)?

I would imagine it depends on the kind of thing you want to verify. In the old days (meaning last year) I would often simply ask, after an answer had been produced, "Really?" The LLM would double-check itself and at times respond with really annoying phrases like "You caught me!" and proceed to explain why what it had just reported to me as accurate was, in fact, inaccurate. Again, it depends on what it's doing for you, and how it's been calibrated by you to do that (though calibration is not perfect; I've long instructed it not to fabricate or embroider, and at times it still does).

The easiest thing to do is just ask it. "Can you produce the pages and precise quotes of xyz?" Depending on the response, continue questioning it until you're where you want to be.
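
If you want something you can check programmatically, you can push that idea a little further and ask for the citations in a structured form. Here's a rough sketch of what that request could look like using the openai Python client; the model name, prompt wording, and JSON shape are just illustrative assumptions on my part, not a recommendation:

```python
import json
from openai import OpenAI  # assumes openai>=1.0 and an OPENAI_API_KEY in the environment

client = OpenAI()

SYSTEM_PROMPT = (
    "Answer using only the supplied document text. "
    "Return a JSON list of objects with keys 'claim', 'page', and 'quote', "
    "where 'quote' is copied verbatim from the cited page."
)

def ask_with_citations(question: str, document_text: str) -> list[dict]:
    """Ask for an answer broken into page-cited claims (the JSON shape is an assumption)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you actually have access to
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Document:\n{document_text}\n\nQuestion: {question}"},
        ],
    )
    # Optimistic parse: in practice the model may wrap the JSON in prose or code fences.
    return json.loads(response.choices[0].message.content)
```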

Others will very likely be able to suggest a more efficient strategy.
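
Building on the "pages and precise quotes" suggestion above: once the answer comes back as page-and-quote pairs, part of the checking can be mechanical, because you can pull the text of each cited page out of the PDF and confirm the quote actually appears there. A minimal sketch, assuming pypdf and claims shaped like the JSON above (1-indexed page numbers are my assumption):

```python
from pypdf import PdfReader  # pip install pypdf

def _normalise(text: str) -> str:
    # PDF text extraction often mangles whitespace, so compare loosely.
    return " ".join(text.split()).lower()

def check_citations(pdf_path: str, claims: list[dict]) -> list[dict]:
    """Flag any 'quote' that does not appear verbatim on its cited page."""
    reader = PdfReader(pdf_path)
    failures = []
    for claim in claims:
        page_index = claim["page"] - 1  # assuming the model reports 1-indexed pages
        if not 0 <= page_index < len(reader.pages):
            failures.append({**claim, "reason": "page number out of range"})
            continue
        page_text = reader.pages[page_index].extract_text() or ""
        if _normalise(claim["quote"]) not in _normalise(page_text):
            failures.append({**claim, "reason": "quote not found on cited page"})
    return failures
```

Anything the script flags still needs a human look, since PDF extraction is lossy, but it catches flat-out invented citations quickly.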

I have had similar experiences, except the LLMs will sometimes correct their correct answer to be incorrect. I now just view the whole project as useful for creative idea generation, but any claims about the real world need to be fact-checked. No lab seems to be able to get these things to stop confabulating, and I'm astonished people trust them as much as they seem to.

Just to round out the space of anecdotes a little more: when I've called out LLMs in the past, I've sometimes had them "correct" their incorrect answer to still be incorrect, but in a different way.

(has anyone seen an LLM correct their correct answer to be correct but in a different way? that would fill the last cell of the 2x2 possibility space)

They're still very useful in cases where checking an answer for correctness is much easier than coming up with a possible answer in the first place. I love having a search engine where my queries can be vague descriptions and still return a high rate of reasonable results. You just can't skip the "checking an answer for correctness" step.
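
To make that concrete with a toy example: if what the model proposes is itself checkable, like a regex for a vaguely described pattern, the verification step can be a handful of test strings. Everything below (the proposed pattern and the test cases) is made up for illustration:

```python
import re

# Suppose the model proposed this pattern for "ISO-style dates like 2025-06-15".
proposed_pattern = r"\b\d{4}-\d{2}-\d{2}\b"

should_match = ["2025-06-15", "released on 1999-01-01."]
should_not_match = ["15/06/2025", "phone: 555-12-34"]

def passes_checks(pattern: str) -> bool:
    """Cheap verification: the hard part was proposing the regex, not testing it."""
    compiled = re.compile(pattern)
    hits_positives = all(compiled.search(s) for s in should_match)
    avoids_negatives = all(not compiled.search(s) for s in should_not_match)
    return hits_positives and avoids_negatives

print(passes_checks(proposed_pattern))  # True if the proposal survives the test cases
```

Writing those test cases is far less work than producing the pattern from a fuzzy description, which is exactly the asymmetry that makes the tool worth using.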

Yes, this used to be commonplace in my experience. If the stakes are high, one should at the very least triangulate the results against other sources.