
Small-Scale Question Sunday for June 15, 2025

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


How do you best verify large language model output?

I hear lots of people say they use LLMs to search through documents or to get ideas for how something works, but my question is: how do people verify the output? Is it as simple as copy-pasting keywords into Google to find the actual science textbooks, or is there some better set of steps I'm missing? I also wonder how you do that when searching through a document: is there some method for getting the LLM to output page citations so you can check them (maybe it's in the settings or something)?

they use LLMs to search through documents

I ask it to include page #s or text snippets so I can CTRL-F and confirm they exist (sometimes they don't!)
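If you want to automate that CTRL-F step, something like the sketch below works; it assumes you already have the document as plain text, and report.txt and the snippets are made-up examples, not anything from a real workflow.

```python
def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping doesn't cause false misses."""
    return " ".join(text.split()).lower()


def verify_snippets(document_text: str, claimed_snippets: list[str]) -> dict[str, bool]:
    """Map each claimed snippet to whether it appears verbatim in the document."""
    haystack = _normalize(document_text)
    return {snippet: _normalize(snippet) in haystack for snippet in claimed_snippets}


if __name__ == "__main__":
    # Hypothetical file and snippets: swap in your own document and whatever
    # the model claimed to quote from it.
    with open("report.txt", encoding="utf-8") as f:
        doc = f.read()
    claims = [
        "revenue grew 12% year over year",
        "the committee met on 3 March",
    ]
    for snippet, found in verify_snippets(doc, claims).items():
        print(("FOUND   " if found else "MISSING ") + snippet)
```

A MISSING hit doesn't always mean the quote was fabricated (extraction and punctuation differences happen), it just means go look at the source yourself.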

get ideas for how something works

This is more situational. A lot of the time I am trying to re-remember something I already knew, so I know whether the answer is right or wrong once I read it and my buried memory of the thing resurfaces. Where I can't fact-check internally, the LLM has usually given me enough info that I can quickly hop onto Google/YouTube and corroborate it with a non-LLM source.

Take the output from one LLM and feed it to a different LLM from a different company for verification. Not perfect, but it works more often than it should.
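In code the cross-check can be as simple as the sketch below; ask_model_a and ask_model_b are hypothetical stand-ins for whichever two providers' clients you actually use, not real library calls.

```python
def ask_model_a(prompt: str) -> str:
    """Stand-in for your first provider's API client."""
    raise NotImplementedError("wire up provider A here")


def ask_model_b(prompt: str) -> str:
    """Stand-in for a second, different provider's API client."""
    raise NotImplementedError("wire up provider B here")


def cross_check(question: str) -> tuple[str, str]:
    """Get an answer from model A, then ask model B to look for errors in it."""
    answer = ask_model_a(question)
    verdict = ask_model_b(
        "Another assistant answered the question below. "
        "List any factual errors you can find, or reply 'LOOKS CORRECT'.\n\n"
        f"Question: {question}\n\nAnswer: {answer}"
    )
    return answer, verdict
```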

Verifying the output depends on the use. Code gives you a pretty easy time, since you can just run it. Searching docs depends: if you're trying to locate some info in the docs, the answer can be enough to give you keywords and let you navigate to the section you want.

Lots of current LLMs are pretty good at copying text out of prompts when told to, e.g. page numbers. That can help a lot, since verifying is very quick.

Hallucinations and other errors are still very common and you must account for them.
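To make the "code is easy to verify" point concrete, a minimal sketch: run whatever the model wrote against a few inputs you already know the answers to before trusting it. llm_generated_slug here is a made-up example of model-written code.

```python
import re


def llm_generated_slug(title: str) -> str:
    """Pretend this came back from the model: turn a title into a URL slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def quick_check() -> None:
    """Spot-check the generated function against inputs with known answers."""
    cases = {
        "Hello, World!": "hello-world",
        "  Spaces   everywhere  ": "spaces-everywhere",
        "Already-a-slug": "already-a-slug",
    }
    for given, expected in cases.items():
        got = llm_generated_slug(given)
        assert got == expected, f"{given!r}: expected {expected!r}, got {got!r}"
    print("all checks passed")


if __name__ == "__main__":
    quick_check()
```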

I just post LLM findings to social media and then delete the post if anyone fact checks it /s

An easy trick is to get another model to review/critique it. If the two models disagree, get them to debate each other until a consensus is reached.
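A rough sketch of what that debate loop could look like; ask_model_a and ask_model_b are hypothetical stand-ins for two different providers, and the consensus check is a naive marker string rather than anything clever.

```python
AGREE = "I AGREE WITH THE OTHER MODEL"  # naive consensus marker


def ask_model_a(prompt: str) -> str:
    """Stand-in for the first provider's API client."""
    raise NotImplementedError("wire up provider A here")


def ask_model_b(prompt: str) -> str:
    """Stand-in for a second, different provider's API client."""
    raise NotImplementedError("wire up provider B here")


def debate(question: str, max_rounds: int = 4) -> str:
    """Alternate critiques between two models until one concedes or rounds run out."""
    position = ask_model_a(question)
    critics = [ask_model_b, ask_model_a]  # whoever didn't produce the answer critiques next
    for round_number in range(max_rounds):
        reply = critics[round_number % 2](
            f"Question: {question}\n"
            f"The other model's current answer:\n{position}\n\n"
            f"If you fully agree, reply exactly '{AGREE}'. "
            "Otherwise give a corrected answer and explain the disagreement."
        )
        if AGREE in reply:
            return position  # consensus reached
        position = reply  # the critique becomes the new position to challenge
    return position  # no consensus; treat the result with extra suspicion
```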

These days, outright hallucinations are quite rare, but it's still worth doing due diligence for anything mission-critical.

As George suggested, you can also ask for verbatim quotes or citations, though you'll need to manually check them.

These days, outright hallucinations are quite rare, but it's still worth doing due diligence for anything mission-critical.

I asked chatgpt what the 4 core exercises of the LIFTMOR routine were and it didn't get a single one correct. It's a simple question to google so I am not sure how it got it so wrong. When I changed the question to specify the LIFTMOR routine to help counteract osteoporosis, it got it right. Google doesn't require the additional context.

Oh man, if you think that's bad, AI Studio will drive you mad. I was asking it what it could do (always using 2.5 Pro, of course, since it's free) and we got onto its TTS abilities. I looked through the list of dozens of star-named voices like Achernar and Fenrir and asked it why the list doesn't mention which ones are male or female, and it rattled off a wall of text about how Google wanted to promote inclusivity, avoid gender stereotyping and focus on function over form.

After it refused to reply to my 'lol fuck you', I developed my argument into "actually, people have been able to readily distinguish male voices from female ones for thousands of years, so it doesn't matter what lofty goals Google has; what they have done is reduce function due entirely to form." After some more sparring it admitted that the function of a Google TTS bot is to optimise its immediate task, not shape future behaviour, and it agreed to tell me which names were which gender.

Victory? No, not even close. All of its voices were named after their accent followed by a person's name - not star names. It apologised and explained how to find the voices named the way it said they were. Incorrectly. Those names did not exist, nor did the pulldown menu it told me to use. I explained that, and it apologised again and explained that Google had rolled out the new Chirp 3 system and so the actual names were Vega, Sirius, Maia and so on. Incorrect again. None of those names were available to me. By now it was beating itself up pretty hard, and the conclusion it came to was that Google were A/B testing, and it asked if I could tell it some of the names I saw in the list so it could piece together our disconnect. I mentioned Achernar and Fenrir and Gacrux and Acherd, and it finally managed to give me a list of voices that sounded male and voices that sounded female. One of them was clearly an effeminate man, but the rest were spot on.

It was a lot of effort for very little reward, but I was just fucking around with AI Studio anyway, and I found the entire thing much more interesting than frustrating. This was the best version of Google's AI looking at another part of itself and whiffing so completely that I was beginning to feel sorry for it. And yet the tech is still so much better than it was a year ago that I can't help but be optimistic about it. I use AI instead of search now pretty much every day and I have only been blindsided by a hallucination once so far. Search is still better for... things you already know the answer to, I agree. No, just kidding, search is better for simple stuff like that for sure. The big benefit of AI imo is that it collates all the information you would usually have to browse multiple sources for into one place - then you check the sources and one might be nonsense but the others are usually good.

I asked it just now, and it got everything right. How long ago was this?

I even used the basic bitch 4o model. There are better ones that I have access to as a paid user.

https://github.com/vectara/hallucination-leaderboard

The current SOTA LLMs hallucinate as little as 0.8% of the time on well-grounded tasks, text summarization in this particular benchmark. Of course, the rate varies for other tasks, and results worsen once you get into obscure topics.

I asked it last week. My husband has a highly tuned LLM (granted, he buys access) and we have an ongoing friendly argument about how useful (him) or useless (me) they are. So whenever it comes up I ask ChatGPT some dead simple question to see if it gets in the right ballpark. In this case (and often) it didn't - it gave me bodyweight stuff like deadbug. Don't get me wrong, deadbug is useful! But the whole point of LIFTMOR is that us oldsters need to be lifting heavy (safely) to increase bone strength. Stretching and bodyweight work are helpful but not enough.

I would imagine it depends on the kind of thing you want to verify. In the old days (meaning last year) I would often simply ask, after an answer had been produced, "Really?" and the LLM would double-check itself and at times respond with really annoying phrases like "You caught me!" and proceed to explain why what it had just reported to me as accurate was, in fact, inaccurate. Again, it depends on what it's doing for you, and how it's been calibrated by you to do that (though calibration is not perfect: I've long instructed it not to fabricate or embroider, and at times it still does).

The easiest thing to do is just ask it. "Can you produce the pages and precise quotes of xyz?" Depending on the response, continue questioning it until you're where you want to be.
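If the source is a PDF, you can also check those page-and-quote claims mechanically with something like the sketch below. It uses pypdf; the file path, page number and quote are made-up examples, and PDF text extraction is imperfect, so a miss means "check by hand" rather than "definitely fabricated".

```python
from pypdf import PdfReader


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so wrapping/case differences don't matter."""
    return " ".join(text.split()).lower()


def check_citation(pdf_path: str, page_number: int, quote: str) -> bool:
    """Return True if the quote appears on the given 1-indexed page of the PDF."""
    reader = PdfReader(pdf_path)
    if not 1 <= page_number <= len(reader.pages):
        return False  # the model cited a page that doesn't exist
    page_text = reader.pages[page_number - 1].extract_text() or ""
    return _normalize(quote) in _normalize(page_text)


if __name__ == "__main__":
    # Made-up citation for illustration: swap in whatever the model gave you.
    ok = check_citation("contract.pdf", 12, "termination requires 30 days written notice")
    print("verified on page 12" if ok else "not found - check by hand")
```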

Others will very likely be able to suggest a more efficient strategy.

I have had similar experiences, but the LLMs will "correct" their correct answer to be incorrect. I now just view the whole project as useful for creative idea generation, but any claims about the real world need to be fact-checked. No lab seems to be able to get these things to stop confabulating, and I'm astonished people trust them as much as they seem to.

Just to round out the space of anecdotes a little more: when I've called out LLMs in the past I've sometimes had them "correct" their incorrect answer to still be incorrect but in a different way.

(Has anyone seen an LLM correct its correct answer to be correct but in a different way? That would fill the last cell of the 2x2 possibility space.)

They're still very useful in cases where checking an answer for correctness is much easier than coming up with a possible answer to begin with. I love having a search engine where my queries can be vague descriptions and yet still come up with a high rate of reasonable results. You just can't skip the "checking an answer for correctness" step.

Yes, this used to be commonplace in my experience. One should always, at the very least, triangulate results with other sources if the stakes are high.