Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.
What is this place?
This website is a place for people who want to move past shady thinking and test their ideas in a
court of people who don't all share the same biases. Our goal is to
optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.
The weekly Culture War threads host the most
controversial topics and are the most visible aspect of The Motte. However, many other topics are
appropriate here. We encourage people to post anything related to science, politics, or philosophy;
if in doubt, post!
Check out The Vault for an archive of old quality posts.
You are encouraged to crosspost these elsewhere.
Why are you called The Motte?
A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently,
it's an element in a rhetorical move called a "Motte-and-Bailey",
originally identified by
philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial
but high value claim to a defensible but less exciting one upon any resistance to the former. He likens
this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for
the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired
propositions to which one retreats when hard pressed."
On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.
New post guidelines
If you're posting something that isn't related to the culture war, we encourage you to post a thread for it.
A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts
such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a
submission statement. A submission statement is required for non-text sources (videos, podcasts, images).
Culture war posts go in the culture war thread; all links must either include a submission statement or
significant commentary. Bare links without those will be removed.
If in doubt, please post it!
Rules
- Courtesy
- Content
- Engagement
- When disagreeing with someone, state your objections explicitly.
- Proactively provide evidence in proportion to how partisan and inflammatory your claim might be.
- Accept temporary bans as a time-out, and don't attempt to rejoin the conversation until it's lifted.
- Don't attempt to build consensus or enforce ideological conformity.
- Write like everyone is reading and you want them to be included in the discussion.
- The Wildcard Rule
- The Metarule

I finally got around to using ChatGPT Agent and it is actually, finally, tingling my "this thing has reasoning and problem-solving capacity and might actually be sentient" senses.
Used it for creating a delivery/pickup order from the Sam's Club website. It hunted down the items, navigated challenges that I intentionally threw up for it, and successfully completed the task I gave it, with very minimal prompting to get it there.
Yet another "Future Shock" moment for me, which is happening every two months nowadays. My benchmark is very, very close to being met.
Anyhow: Anyone have any ideas for some non-mundane, but also non-illegal and non-dangerous, ways to make use of a slow but reliable personal assistant that can navigate the internet?
Yes. I'm very pedantic about my music collection and I insist on having exact dates of release. Often, though, the exact release date isn't easily available, so I have to conduct research to determine an estimated one. If ChatGPT can imitate my research process I'll take back everything negative I ever said about it:
The following caveats also apply:
- For non-US releases, the domestic (US) release often trailed its foreign counterpart by several months. Any data derived from US sources must take this into account when determining whether the proposed estimate is reasonable.
There's a ton more I could put here if I really wanted to get into the weeds, but I don't think ChatGPT can do what I've asked of it thus far.
Honestly, I think you probably could get it to work okay right now with current models. However, for something like this, you really need above-average prompting skills. You'd find it helpful to read something like Anthropic's prompting guide, although that one's specialized a bit more for Claude than for OpenAI's stuff. Some of the advice is non-intuitive, and you might need to do some tweaking. For example, for Claude (which has some unique preferences, like wrapping sections in XML tags), they recommend something like the following general structure, and yes, before you ask, the order can matter. If you don't want to read through the guide, here are my abbreviated notes on a good prompt structure for something like this (with a concrete sketch after the outline):
You are __. The Task is __ (simple one-sentence summary).
< context to consider first, including why the task is important or needs to be done this way. Yes, telling the AI "why" actually does improve model outputs in many cases >
< input (or input set) to take action on; for really long inputs this should be near the beginning, for short inputs it can go later >
< details on how to do it, guiding the thought process. This is where you'd put some version of your bullet points. Your layout seems reasonable, but it's possible that scaffolding or flowcharting a bit more explicitly, including perhaps what to consider at each step, could help >
< explain how the output should be formatted, and the expected output (possibly repeat yourself here about the original goal) >
< optional: 3-5 diverse examples that help with interpretation of goals and reinforce style and formatting. Optionally, you could also provide the thought process used to reach those answers in each case, mirroring the logic already outlined >
< any final reminders or bookkeeping stuff >
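To make that concrete, here's a minimal sketch of how that outline might look for your release-date task, with the XML-tagged sections Anthropic's guide suggests for Claude. The tag names, task details, and example single are my own placeholders, not anything from the guide:

```python
# A minimal sketch of the outline above, applied to the release-date task.
# Tag names, task details, and the example single are illustrative placeholders.
PROMPT_TEMPLATE = """\
You are a music discographer. The task is to estimate the exact release date
of one single.

<context>
Exact release dates are often unpublished, so they must be estimated from
secondary evidence. A confidently wrong date is worse than an honest "UNKNOWN".
</context>

<input>
Artist: {artist}
Title: {title}
Label / catalog number: {label}
</input>

<instructions>
- Work through the research steps in order; stop at the first one that succeeds.
- For non-US releases, remember the US date may trail the original by months.
- If the evidence is too thin, answer exactly UNKNOWN rather than guessing.
</instructions>

<output_format>
One line: YYYY-MM-DD (estimated), followed by a one-sentence justification.
</output_format>
"""

prompt = PROMPT_TEMPLATE.format(
    artist="The Kinks", title="You Really Got Me", label="Pye 7N 15673"
)
```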
Did you know that Anthropic actually has a whole tool for that process? If you follow the link, you can get a prompt generator (literally, use AI to help you tweak the prompt and find a better one), auto-generate test cases, etc. It's pretty neat. You can also somewhat mitigate confabulation by adding a bullet-point instruction that allows it to return "I don't know" or "too hard" for the more difficult cases (the UNKNOWN line in the sketch above). Also, it's possible that, depending on the level of tool use and thinking needed per bullet, applying this to a giant music library would cost some real money.
I will note that OpenAI's guide gives slightly different advice, though it's still pretty similar. The main differences are the lack of XML tags and a different recommended structure:
< identity, style, high-level goals >
< detailed instructions >
< examples of possible inputs with desired outputs >
< context that might be helpful >
As you can tell, it's actually pretty similar overall. Yes, you have more control (as well as more complicated stuff to manage) when doing it programmatically via the API, but I think you could probably try via the normal chat interface with decent results. I should also note that if the AI doesn't need to use very much "judgement", you might actually do better with a well-prompted 'normal' model instead of a simulated-reasoning model.
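If you do go the API route, here's a minimal sketch of the same task laid out in OpenAI's recommended order via their Python SDK. The model choice and the system/user split are plausible defaults on my part, not anything their guide prescribes:

```python
# The same task rearranged into OpenAI's suggested order: identity and
# high-level goals, detailed instructions, examples, then context.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """\
You are a music discographer. Estimate exact single release dates.

Instructions:
- Work through the research steps in order; stop at the first that succeeds.
- Answer exactly UNKNOWN if the evidence is too thin to support an estimate.

Example:
Input: Artist: X, Title: Y, Label: Z 1234
Output: 1967-03-06 (estimated) - reviewed in the trade press the following week.

Context: exact dates are often unpublished; estimates usually come from
copyright records, trade-paper review dates, and chart entries.
"""

response = client.chat.completions.create(
    model="gpt-4o",  # an assumption; swap in whatever model you're paying for
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": "Artist: The Kinks, Title: You Really Got Me, "
                    "Label: Pye 7N 15673"},
    ],
)
print(response.choices[0].message.content)
```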
Thanks for the ideas, but I tried this out and prompting doesn't seem to be the problem. I gave a more detailed response to the post below, but the issue was that while the AI seemed to understand the instructions well enough, it wasn't able to access the necessary information. It seems like it can find things on HTML text pages fine, but if the task requires looking at another format (like an OCRed PDF) or running a database query, it just can't do it. It also doesn't seem to understand how to do certain things absent specific instructions, but that's a subject for another time.
Do you have a paid plan? If not, I can try and ask o3 to give this a go, if you tell me a name and have the ground truth handy. I'm reasonably confident it can do this.
I don't, and I can give you a couple if you think it would help, but I tried it with 4o and o4-mini and it didn't work well. I've done hundreds, if not thousands, of these manually, and I checked several that terminate at different stages of the analysis to see if any would correspond with what I determined originally. I would add the caveat that the actual algorithm would be more complex; I was writing this as I was leaving work on Friday afternoon, and there were several rules I failed to consider that came up when I ran it, most notably: if there are two conflicting months of release, use the last usual release day of the earlier month, assuming the months are consecutive or otherwise close together and there's no reason to believe the earlier month is wrong (I sketch this in code below). There are also a bunch of edge cases I didn't put in, like singles that were released locally before being given a national release some months later (this occasionally happened with smaller labels in the 1960s, whose local hits would get picked up nationally), plus specifying which country of release to use, and a bunch of other stuff that's too uncommon to even mention. That out of the way, here are the trends I found:
Miscellaneous Notes: It made a few odd errors along the way. It wasn't able to determine a typical release day for any label and always defaulted to Monday, except in the case of British releases, where it defaulted to Friday. These were the most common release days in the '60s and '70s for those territories, but they were by no means universal, and I specifically tested it with labels that released on other days. It also made some errors where it would give an incorrect date, e.g., it would say June 18th was a Monday in a particular year when it was really a Wednesday.
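As an aside, the calendar bookkeeping it kept flubbing is deterministic and trivial to script. Here's a sketch using only Python's standard library, implementing the conflicting-months rule from above; the Monday default and the "consecutive months only" threshold are my guesses at the intent:

```python
# Deterministic weekday bookkeeping the model kept getting wrong, plus the
# conflicting-months rule described above. The weekday defaults (Monday for
# US, Friday for UK) and the "close together" threshold are assumptions.
import calendar
from datetime import date

def last_weekday_of_month(year: int, month: int, weekday: int) -> date:
    """Last occurrence of a weekday (0=Mon .. 6=Sun) in a given month."""
    last_day = calendar.monthrange(year, month)[1]  # days in the month
    end = date(year, month, last_day)
    return end.replace(day=last_day - (end.weekday() - weekday) % 7)

def estimate_from_conflicting_months(year: int, month_a: int, month_b: int,
                                     typical_weekday: int = 0) -> date | None:
    """Two sources disagree on the release month: if the months are
    consecutive, take the last typical release day of the earlier one."""
    earlier, later = sorted((month_a, month_b))
    if later - earlier > 1:
        return None  # months aren't close; the earlier one may just be wrong
    return last_weekday_of_month(year, earlier, typical_weekday)

# Sanity checks of the kind the model flubbed:
assert date(1969, 6, 18).weekday() == 2          # a Wednesday, not a Monday
assert estimate_from_conflicting_months(1966, 3, 4) == date(1966, 3, 28)
```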
Conclusion: It's capable of producing reasonable estimates that are relatively close to my own, but they're nonetheless almost always off. If I don't have a credible release date, almost all of my estimates are derived from copyright data, trade-publication review dates, or ARSA chart dates. Since the models seem incapable of accessing any of these, they are functionally useless: they're limited to finding dates I can already find more easily without AI, and to estimating release dates from chart data. I'm not familiar with o3 or how it compares to what I was able to use, but if you think it could succeed where the others failed, let me know and I'll give you a few to try out. I don't want to waste your tokens on a vanity project for an extremely niche application, but I understand you might be interested in how these models work. Also consider that I'm an AI skeptic who would nevertheless pay for a service like this if it could reliably do what I need it to do. A lot of my skepticism, though, stems from the fact that it seems incapable of accessing information that's trivial for an actual person to access.
Go for it. o3 is far more competent than either 4o or o4-mini. It will probably look for better sources, and it will spend tens of minutes on the task if it deems that necessary.
A helpful analogy is that 4o is a smooth talking undergrad with lots of charisma and some brains. o3 is an autistic grad-student, far more terse, but far more capable in return. It justifies the price of subscription for me.