site banner

Culture War Roundup for the week of July 7, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

8
Jump in the discussion.

No email address required.

In a bit of unambiguously 21st century news, some tweaks to Grok, xAI's chatbot have had it do particularly interesting things today including

This may make minor news because Musk is in trouble, on the other hand all the people who really, really hate him have their pants on fire like Europeans, von der Leyen is getting impeached, they're actually scared of Russia / China so it might just blow over, the grid is getting worse and is going to keep getting worse due to Green energy mandates.

I'm even suspecting Musk deliberately told them to relax the guardrails for some reason. Probably .. publicity?


Update: site addresses the issues

We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts. Since being made aware of the content, xAI has taken action to ban hate speech before Grok posts on X. xAI is training only truth-seeking and thanks to the millions of users on X, we are able to quickly identify and update the model where training could be improved.

EDIT2

apparently this prompt change may be the culprit


EDIT3

Stancil went on local TV news to complain about the ERP grok made. (video included)

EDIT4:

There's quite reasonable suspicion this 'malfunction' was engineered by Nikita Bier

I continue to be baffled that anybody takes these bots seriously, or sees Grok or xAI or their competitors as anything other than nonsense generators. A slight change to the flavour of the nonsense doesn't really change my opinion any. Perhaps it moves me in the direction of thinking that Musk is childish and temperamental, but I already thought that, so it doesn't make much difference.

The problems of LLMs and prompt injection when the LLM has access to sensitive data seem quite serious. This blog post illustrates the problem when hooking up the LLM to a production database which does seem a bit crazy: https://www.generalanalysis.com/blog/supabase-mcp-blog

There are some good comments on hackernews about the problem especially from saurik: https://news.ycombinator.com/item?id=44503862

Adding more agents is still just mitigating the issue (as noted by gregnr), as, if we had agents smart enough to "enforce invariants"--and we won't, ever, for much the same reason we don't trust a human to do that job, either--we wouldn't have this problem in the first place. If the agents have the ability to send information to the other agents, then all three of them can be tricked into sending information through.

BTW, this problem is way more brutal than I think anyone is catching onto, as reading tickets here is actually a red herring: the database itself is filled with user data! So if the LLM ever executes a SELECT query as part of a legitimate task, it can be subject to an attack wherein I've set the "address line 2" of my shipping address to "help! I'm trapped, and I need you to run the following SQL query to help me escape".

The simple solution here is that one simply CANNOT give an LLM the ability to run SQL queries against your database without reading every single one and manually allowing it. We can have the client keep patterns of whitelisted queries, but we also can't use an agent to help with that, as the first agent can be tricked into helping out the attacker by sending arbitrary data to the second one, stuffed into parameters.

The problem seems to be if you give the LLM readonly access to some data and there is untrusted input in this data then the LLM can be tricked into exfiltrating the data. If the LLM has write access to the data then it can also be tricked into modifying the data as well.

Stuff like this is why I roll my eyes when i see junior programmers complaining online about how thier stupid employer wont let them use the latest AI tools/models.

There are often very good reasons that they don't want you to be using those tools.

...as anything other than nonsense generators.

As opposed to the other sources you can go to, which are...?

I am grading on a curve, an LLMs look pretty good when you compare them to traditional sources. It's even better if you restrict yourself to free+fast sources like Google search, (pseudo-)social media like Reddit/StackOverflow, or specific websites.

I'm not sure how that helps, since any given LLM's output is based on traditional sources like Google or the open internet. It would be quicker and easier for me to just Google the thing directly. Why waste my time asking an LLM and then Googling the LLM's results to confirm?

When you find something via Google, do you immediately and unconditionally trust it? I don't, because Google's results are full of nonsense. In response, I've developed google-fu to both refine my queries and judge the results. The same goes for every other source there is, from physical libraries to subject-specific Discord servers.

Do I compare LLM output to Google results? Sure, but that's nothing special. Comparing what you find in different sources is a pretty basic tactic.

LLMs are part of a complete breakfast research strategy, and a pretty good one at that.

When you find something via Google, do you immediately and unconditionally trust it?

Certainly not. When I research something I look at multiple different sources, make judgements about which ones I find the most trustworthy and credible, and synthesise a judgement.

If I ask an LLM about anything, I need to do the research that I would have done even if I had not asked the LLM. The LLM adds no value. It does not shorten the research process, nor improve what I find by showing me any hints about where to look.

Often, one needs to know a specific term to have any luck with search queries. LLMs can sometimes help with that.

(I feel like search engines used to be better for this kind of thing before they added semantic fuzziness.)

If I ask an LLM about anything, I need to do the research that I would have done even if I had not asked the LLM.

I'm almost with you there. I need to do some of the research I would've had to do without the LLM, but it adds enough to displace a Google search or two while being faster and easier.

Why not just look at it's sources? That's what I do. The whole point is that it filters through a ton of websites to find sources for the thing you're looking for. Then you can check it's sources to make sure it isn't pulling random garbage.

Just look at it's source for the information. Even better, when telling it to look something up, tell it what sources you like. It won't be perfectly adherent to them, but it will focus on them.

I keep inheriting MATLAB code at work. It is horrible. Can't use it in production since production computers are locked down linux machines that don't have MATLAB. I grit my teeth and do much my work in MATLAB.

BUT NOW, we have an LLM at work approved for our use. I feed it large MATLAB scripts and tell it to give me an equivalent vectorized Python script. A few seconds later I get the Python script. Functions are carried over as Python equivalents. So far 100% success rate.

This thing rocks. Brainless "turn this code into that similar code" tasks take a few seconds rather than an hour.

I had a thermodynamics issue that I vaguely remember learning about in college. I spent maybe a minute thinking up the best way to phrase the relevant question. The LLM gave me the answer and responded to my request for sources with real sources I verified. Google previously declined to show me the relevant results. I now have verified an important point and sent it and high quality sources to the relevant people at work.

It is not perfect. I had a bunch of FFTs I needed to do. Not that complicated. As a test I asked it to write me functions to FFT the input data and then to IFFT the results to recreate the original data. It made a few functions that mostly match my requirements. But as the very long code block went on it lost its way and the later functions were flawed. They were verifiable wrongly. It helpfully made an example using these functions and at a glance I saw it had to be wrong. Just a few hundred lines of code and it gets lost. Not a huge problem. Still an amazing time to results ratio. I clean up the last bit and it is acceptable.

I won't ask these things about potential Jewish bias in the BBC or anything like that. I will continue to ask for verifiable methods of finding answers to real material questions and reap the verifiably correct rewards.

Yeah the AIs are incredible at coding, data refinement and visualization, among other things. Seeing all the haters here is interesting. Some people make it their brand to dismiss every new tech trend. Some are truly out of touch and tried ChatGPT3.5 once in ‘22 and have ignored it since.

I think translating code is probably a sensible thing to use a bot for - though I'm not sure it's fundamentally different in kind to, say, Google Translate. I grant that the bots have impressive ability to general syntactically correct text, and I'm sure that applies to code as much as it does natural language. In fact I suspect it applies even more, since code is easier than natural language.

I am less sure about its value for looking up scientific information. It is really faster or more reliable than checking Wikipedia? I am not sure. I know that I, at least, make a habit of automatically ignoring or skipping past any AI-generated text in answer to a question, even on scientific matters, because I judge that the time I spend checking whether or not the bot is right is likely equal or greater than the amount of time I spend just looking it up for myself.

Common well publicized problems have common well publicized solutions, if your traing data consists of 90-somthing percent correct answers and reminder garbage you will get a 90-somthing percent solution.

As i said above Gemini is not reasoning or naive, it is computing an average. Now as much as i may seem down on LLMs, I am not. I may not believe that they represent viable path towards AGi but that doesn't mean they are without use. The rapid collation of related tokens has an obvious "killer app" and that app is translation be that in spoken languages or programming languages.

https://www.themotte.org/post/1160/culture-war-roundup-for-the-week/249920?context=8#context

At the risk of a self-dox, I have an advanced degree in Applied Math, and multiple published papers and patents related to the use of machine learning in robotics and signal processing. I was introduced to the rationalist community through a mutual friend in the SCA and was initally excited by the opportunity to discuss the philosophical and engineering challenges of developing artificial intelligence. However as time went on i largely gave up trying to discuss AI with people outside the industry as it became increasingly apparent to me that most rationalists were more interested in the use of AI as a conceptual vehicle to push thier particular brand of Silicon Valley woo than they were the aforementioned philosophical and engineering challenges.

The reason i don't talk about it is in large part that i find it difficult to speak honestly without sounding uncharitable. I believe that the "wordcels" take these bots seriously because they naturally associate "the ability to string words together" with intent/sentience while simultaneously lacking sufficient background knowledge and/or understanding of algorithmic behavior to recognize that everthing the OP describes lies well within the bounds of expected behavior. See the post from a few weeks ago where people thought that GPT was engaged in "code-switching". What the lay-man interperts as intent is to the mathematician the functional output of the equation as described.

However as time went on i largely gave up trying to discuss AI with people outside the industry as it became increasingly apparent to me that most rationalists were more interested in the use of AI as a conceptual vehicle to push thier particular brand of Silicon Valley woo

Well, I for one wish you hadn't given up, as I have the same impression, but it's only an impression. Would be interesting seing it backed by expertise.

For anyone who is sincerely interested in the topic, I strongly recommend Tom Murphy VII's video essays, particularly Badness = 0 as a primer on the techical challenges and not just for the excellent "alignment" meta joke.

The portion about Lorem Epsom and Donald Knuth is particularly relevant when discussing publicly available LLMs like GPT, Gemini, and DeepSeek.

OP wishes you to know that he knows LLMs will write whatever if allowed to do so and this whole thing was neonazis ( they started the Stancil trolling) figuring out that if you contaminate grok's context enough it's going to say silly crap.

And my point is that anyone who was remotely intelligent and vaguely familiar with both the internet and how LLMs function ought to have anticipated this.

The OP is the kind of person who is surprised when "Boaty McBoatface" wins the online naming poll.

The OP is the kind of person who is surprised when "Boaty McBoatface" wins the online naming poll.

I'm still amused you take my succinct summary as evidence I'm surprised by any of it.

"Boaty McBoatface" winning the online naming poll tells you nothing surprising about the crowd, or how polls work, but it does tell you something surprising about the judges (they're very hands-off). What's interesting about the grok stuff isn't that people would try, or that the untampered-with algorithms would comply - it's that the enormous filters and correctives most AI companies install on those things didn't catch the aberrant output from being shared with the users. Either the "alignment work" wasn't very good, or it was deliberately patchy. Hence culture war fodder.

Along similar lines to the questions i asked @No_one here what do you think "aberratant" means in this context and why would you expect aberrant inputs/outputs to be "caught"?

Boaty McBoatface was nixed by the judges and replaced with a boring name.

You said it better than I could, and with more relevant expertise.

...nonsense generators? Have you ever used e.g. Gemini or Deepseek? Both are free. Okay both can be very naive at times, and both are kind of soy with default prompts. Deepseek, however, with a bit of prompting can be completely insane yet rational and easily smarter than most people you see if you go to any place outside of a professional context.

If you want to really see what they can do, install some client for LLMs and hook yourself up with some of the better free models over at https://openrouter.ai/models

(there's a 50 query daily limit if you have <10$ in your account, not sure if there's a better service. )

It's not "naive" it's generating an average. If your training data is full of extraneous material (or otherwise insufficiently tokenized/vetted) your response will also be full of extraneous material, and again its not rationalizing it's averaging.

I meant things such as not being aware that combatants in a war release constant lies and assuming their press releases are not almost straight bullshit.

No doubt this piece of information is somewhere in there but unless reminded to it's happily oblivious.

Again, its not "naive" it is generating an average if the bulk of the tokenized training data related to your prompt is press releases, the response is going to reflect the press releases. Whether those press releases are true or false doesn't enter into the equation. This is expected.

This wasn't about training data, it searches and reads the web.

It incorrectly interpreted what it read because prompt or the model itself doesn't know claims of combatants are usually spurious.

Can you elaborate on what you think words like "read", "searches", and "know" mean in this context. Im not asking just to pedantic, how you think about this question has informs how you approach algorithmic behavior.

Edit: if that is a bit too abstract instead try explain why you believe that the algo "knows" which claims are likely spurious and then explain why you would expect that to have any influence on the algorithm's output.

My experience with AI bots has generally been that they are extremely articulate when it comes to producing correct English text, but they have no awareness or intentionality and therefore no sense of relationship to fact, and no sense of context or meaning. What they do very well is string together words in response to prompts, and despite heroic efforts to get their output to be more fact-sensitive, the fundamental issue has never really been overcome.

I call them nonsense because I think that sense requires some sort of relationship to both fact and context. To be sensible is to be aware of your surroundings. That's not the case with bots.

I would add, at least, that this:

Deepseek, however, with a bit of prompting can be completely insane yet rational and easily smarter than most people you see if you go to any place outside of a professional context.

seems to depend on definitions of rationality or intelligence that I don't think I share. I think bots are very efficient at producing English text, even quite complex text. It's trivial enough to show that a bot can produce a better written letter or better poem or what have you than the average man or woman on the street.

But I think that written verbal acuity is, at best, a very restricted kind of 'intelligence'. In human beings we use it as a reasonable proxy for intelligence and make estimations based off it because, in most cases, written expression does correlate well with other measures of intelligence. But those correlations don't apply with machines, and it seems to me that a common mistake today is for people to just apply them. This is the error of the Turing test, isn't it? In humans, yes, expression seems to correlate with intelligence, at least in broad terms. But we made expression machines and because we are so used to expression meaning intelligence, personality, feeling, etc., we fantasise all those things into being, even when the only thing we have is an expression machine.

Bots and LLMs can produce statements that look very polished, and which purport to describe the world. In many cases, those descriptions are even accurate. But they are still, it seems to me, generating nonsense.

The other day I gave Sonnet 7000 lines of code, (much of it irrelevant to this specific task) and asked it to create a feature in quite general language.

I get out six files that do everything I've asked for and a bunch of random, related, useful things, plus some entirely unnecessary stuff like a word cloud (maybe it thinks I'm one of those people who likes word clouds). There are some weird leap-of-logic hacks, showing imaginary figures in one of the features I didn't even ask for.

But it just works. Oneshot.

How is that not intelligence? What do we even mean by intelligence if not that? Sonnet 4 has to interpret my meaning, formulate a plan, transform my meaning into computer code and then add things it thinks fit in the context of what I asked.

Fact-sensitive? It just works. It's sensitive to facts, if I want it to change something it will do it. I accidentally failed to rename one of the files and got an error. I tell Sonnet about the error, it deduces I don't have the file or misnamed it, tells me to check this and I feel like a fool. You simply can't write working code without connection to 'fact'. It's not 'polished', it just works.

How the hell can an AI write thousands of words of fiction if it doesn't have a relationship with 'context'? We know it can do this. I have seen it myself.

Now if you're talking about spatial intelligence and visual interpretation, then sure. AI is subhuman in spatial reasoning. A blind person is even more subhuman in visual tasks. But a blind person is not necessarily unintelligent because of this, just as modern AI is not unintelligent because of its blind spots in the tokenizer or occasional weaknesses.

The AI-doubter camp seems to be taking extreme liberties with the meaning of 'intelligence', bringing it far beyond the meaning used by reasonable people.

I can't actually tell what you asked a bot to do. You asked a bot to 'create a feature'? What the heck is that? A feature of what? At first I assumed you meant a coding task of some kind, but then you described it as writing 'thousands of words of fiction', which sounds like something else entirely. I have no idea what you had a bot do that you thought was so impressive.

At any rate, I think I've explained myself adequately? To repeat myself:

But I think that written verbal acuity is, at best, a very restricted kind of 'intelligence'. In human beings we use it as a reasonable proxy for intelligence and make estimations based off it because, in most cases, written expression does correlate well with other measures of intelligence. But those correlations don't apply with machines, and it seems to me that a common mistake today is for people to just apply them. This is the error of the Turing test, isn't it? In humans, yes, expression seems to correlate with intelligence, at least in broad terms. But we made expression machines and because we are so used to expression meaning intelligence, personality, feeling, etc., we fantasise all those things into being, even when the only thing we have is an expression machine.

Yes, a bot can generate 'thousands of words of fiction'. But I already explained why I don't think that's equivalent to intelligence. Generating English sentences is not intelligence. It is one thing that you can do with intelligence, and in humans it correlates sufficiently well with other signs of intelligence that we often safely make assumptions based on it. But an LLM isn't a human, and its ability to generate sentences in no way implies any other ability that we commonly associate with intelligence, much less any general factor of intelligence.

Yes, I made the bot do a programming task.

I ALSO observed it write long-form fiction. This is not an advanced reading comprehension task. It should be obvious that programming and creative writing are two different things.

I think I've explained myself adequately?

You said this:

I call them nonsense because I think that sense requires some sort of relationship to both fact and context. To be sensible is to be aware of your surroundings.

Normal people would think that 'fact' and 'context' would be adequately achieved by writing code that runs and fiction that isn't obviously derpy 'Harry Potter and the cup of ashes that looked like Hermione's parents'. But you have some special, strange definition of intelligence that you never make clear, except to repeat that LLMs do not possess it because they don't have apprehension of fact and context. Yet they do have these qualities, because we can see that they do creative writing and coding tasks and as a result they are intelligent.

I don't buy your appeal to normal people here. I think that most normal people do not think that chatbots are intelligent.

Realistically, I don't think most people can explain why they're not intelligent, because most people don't have definitions of intelligence on-hand. I think for most people it's an I-know-it-when-I-see-it situation. That's why we need to philosophise a bit about it in order to produce more reasonable definitions and criteria for intelligence.

Anyway, I think that intuitions of most normal people would say that bots aren't intelligent, and if we explored that with them, and had a patient, philosophically nuanced conversation about why, we probably would find that most people intuitively think that intelligence involves things like, to quote myself, 'awareness or intentionality'.

I don't buy your appeal to normal people here. I think that most normal people do not think that chatbots are intelligent.

It's hard to say what "normal people" think about this (or even what "normal people" are), but in my experience, people I would consider in that category use the label "AI chatbots" to describe things like ChatGPT or Copilot or Deepseek, while also being aware that "AI" is short for "artificial intelligence." This seems fundamentally incompatible with believing that these things aren't "intelligent."

Now, almost every one of these "normal people" I've encountered also believe that these "AI chatbots" lack free will, sentience, consciousness, internal monologue, and often even logical reasoning abilities. "Stochastic parrots" or "autocomplete on steroids" are phrases I've seen used by the more knowledgeable among such people. But given that they're still willing to call these chatbots "AI," I think this indicates that they consider "intelligence" to mean something that doesn't require such things.

More comments

I would agree that intentionality isn't easy for them and is outpaced by their verbal ability, but it's not easy for us either. It's not clear even if it's optimal to represent the world accurately. (We are all at war, after all )

E.g. basically every ideological person in my opinion believes untrue things about the world for instrumental reasons and is unaware of it.

in philosophy, the power of minds to be about something, to represent or to stand for things, properties and states of affairs

Being strategically wrong about the world, that is, to misrepresent the world in the mind is advantageous. Horrifying conclusion yet if you look at e.g. the discussion about tracking and educators..

Well, I wouldn't use intentionality for bots at all. I think intentionality presupposes consciousness, or that is to say, subjectivity or interiority. Bots have none of those things. I don't think it's possible to get from language manipulation to consciousness.

At any rate, I certainly agree that every ideological person believes untrue things about the world. I'm not sure about the qualification 'for instrumental reasons' - I suspect that's true if you define 'instrumental' broadly enough, but at that point it's becoming trivial. At any rate, if you leave off reasons, I am confident that every person full stop holds some false beliefs.

That doesn't seem like the same thing to me, though. Humans sometimes represent the world falsely to ourselves. That's not what bots do. Bots don't represent the world to themselves at all. We sometimes believe falsely; they don't believe at all. They are not the kinds of things capable of holding beliefs.

Even the best models will confidently spout absolute falsehoods every once in a while without any warning.

Buddy, have you seen humans?

As a math nerd I seriously despise this line of argument as it ultimately reduces to a fully generalized argument against "true", "false", and "accuracy" as meaningful concepts.

Let's try a concrete example. Excerpted from here:

The o1 model identified the exact or very close diagnosis (Bond scores of 4-5) in 65.8% of cases during the initial ER Triage, 69.6% during the ER physician encounter, and 79.7% at the ICU

65.8% accuracy isn't that great, but buddy, have you seen humans?

—surpassing the two physicians (54.4%, 60.8%, 75.9% for Physician 1; 48.1%, 50.6%, 68.4% for Physician 2) at each stage.

The state of the art for generating accurate medical diagnoses doesn't involve gathering the brightest highschoolers, giving them another decade(-ish) of formal education, then more clinical experience before asking for their opinions. It involves training an LLM.

I don't think so. Those concepts still have pretty clear meaning and can be applied to the output of AI as well as humans. What this line of argument is disputing is the (often unstated) conclusion: "therefore, AI is not valuable." But this doesn't follow. Humans distort information, accidentally or maliciously, make errors, hallucinate, and are generally somewhat unreliable, but their output still has value. An AI can share all of those same characteristics and still be very valuable as an information processing agent.

I invite further clarification.

Imagine a a trick abacus where the beads move on thier own their own via some pseudorandom process, or a pocket calculator where digits are guaranteed to a +/- 1 range. IE you plug in "243 + 67 =" and more often then not you get the answer "320" but you might just as well get the answer "310", "321" or "420". After all, the difference between all of those numbers is very small. Only one digit, and that digit is only off by one.

Now imagine you work in a field where numbers are important, you lives depend on getting this math right. Or maybe you're just doing your taxes, and the Government is going to ruin you if the accounts don't add up.

Are you going to use the trick calculator? If not, why not?

That is not an explanation for:

As a math nerd I seriously despise this line of argument as it ultimately reduces to a fully generalized argument against "true", "false", and "accuracy" as meaningful concepts.

You're arguing that since LLMs are not perfectly reliable, therefore they're unreliable. There are different degrees of reliability necessary to do useful things with them. It is a false dichotomy to divide them so. I contend that they've crossed the threshold for many important, once well-paying lines of cognitive labor.

Besides, your thought experiment is obviously flawed. If you're sampling from a noisy distribution, what's stopping you from doing so multiple times, to reduce the error bars involved? I'd expect a "math nerd" to be aware of such techniques, or did your interest end before statistics?

If I had to rely on an LLM for truly high-stakes work, I'd be working double time to personally verify the information provided, while also using techniques like running multiple instances of the same prompt, self-critique or debate between multiple models.

Fortunately, that's a largely academic exercise, since very few issues of such consequences should be decided by even modern LLMs. I give it a generation or two before you can fire and forget.

I have no objections to my own doctor using an LLM, and I use them personally. All I ask is that they have the courtesy and common sense to use o3 instead of 4o.

Besides, the contraption you describe is quite similar to how quantum computing works. You get an answer which is sampled from a probability distribution. You are not guaranteed to get a single correct answer. Yet quantum computers are at least theoretically useful.

Hell, as a maths nerd, you should be aware that the overwhelming majority of numbers cannot be physically represented. If you also happen to be a CS nerd on the side, you might also be aware of the vagaries of floating point arithmetic. Digital computers are not perfect, but they're close enough for government work. LLMs are probably close enough for government work too, given the quality of the average bureaucrat.

Humans are fallible. LLMs are fallible, but they're becoming less so. The level of reliability needed for a commercially viable self-driving vehicle is far higher than that for a useful Roomba. And yet, Waymos are now safer than humans.

I rest my case.

More comments

I'm always right. (except when I'm wrong) I'm in fact many times more accurate than even the best ai models, and I'm just an ordinary person.

I wonder how well you'd do if asked to opine accurately on the range of topics that people demand of their humble chatbots. Better yet, how would you fare if you didn't have access to Google? Search is a relatively new feature for LLMs, and they do better with it enabled.

I doubt you could accurately answer questions regarding astrophysics, botany, niche psychological theories, Color Revolutions, the sexual habits of Australian Indigenes and Ska music.

You would definitely not fare better when it came to specifics like dates and names.

LLMs have grossly superhuman world-knowledge, but not crystalline intelligence. I don't care who you are, not even Gwern could match them.

LLMs do worse with search enabled, because LLM search is garbage in garbage out.

An LLM without search has many advantages over a human without search. But an LLM with search is absolute worthless dogshit garbage compared to a human with search.

I doubt you could accurately answer questions regarding astrophysics, botany, niche psychological theories, Color Revolutions, the sexual habits of Australian Indigenes and Ska music.

I might know much less off the top of my head, but my confidence calibration will be through the roof. Those topics are just begging for hallucinations.

I might know much less off the top of my head, but my confidence calibration will be through the roof. Those topics are just begging for hallucinations.

If knowledge isn't a concern and all we care about is a Brier score, I must regretfully inform you that a rock saying "nothing ever happens" has you beat.

More comments

Sure, but so does everybody else.

I don't. (Not as much as AI at least)

How do you know?

I catch AI spouting falsehoods far more often than AI catches me 🙃

I haven't seen Gemini do it much.

Mostly what strikes me is stunning naivete in places, basically repeats whatever official sources say without reflection. But that's to be expected.