
Culture War Roundup for the week of July 14, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


On Using LLMs Without Succumbing To Obvious Failure Modes

As an early adopter, I'd consider myself rather familiar with the utility and pitfalls of AI. They are, currently, tools, and have to be wielded with care. Increasingly intelligent and autonomous tools, of course, with their creators doing their best to idiot proof them, but it's still entirely possible to use them wrong, or at least in a counterproductive manner.

(Kids these days don't know how good they have it. Ever try and get something useful out of a base model like GPT-3?)

I've been using LLMs to review my writing for a long time, and I've noticed a consistent problem: most are excessively flattering. You have to mentally adjust their feedback downward unless you're just looking for an ego boost. This sycophancy is particularly severe in GPT models and Gemini 2.5 Pro, while Claude is less effusive (and less verbose) and Kimi K2 seems least prone to this issue.

I've developed a few workarounds:

What works:

  1. Present excerpts as something "I found on the internet" rather than your own work. This immediately reduces flattery.
  2. Use the same approach while specifically asking the LLM to identify potential objections and failings in the text.

(Note that you must be proactive. LLMs are biased towards assuming that anything you dump into them as input was written by you. I can't fault them for that assumption, because that's almost always true.)
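For those who want to automate it, the "found it on the internet" framing is easy to script. A minimal sketch, assuming the OpenAI Python SDK (the model name and prompt wording are placeholders, not a magic incantation):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

excerpt = open("draft_excerpt.txt", encoding="utf-8").read()

# Frame the text as a stranger's and ask for objections, not praise.
prompt = (
    "I found this piece of writing on the internet. List the strongest "
    "objections and failings a careful reader could raise, pointing at the "
    "specific passages that prompt them. No compliments.\n\n" + excerpt
)

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```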

What doesn't work: I've seen people recommend telling the LLM that the material is from an author you dislike and asking for "objective" reasons why it's bad. This backfires spectacularly. The LLM swings to the opposite extreme, manufacturing weak objections and making mountains out of molehills. The critiques often aren't even 'objective' despite the prompt.*

While this harsh feedback is painful to read, I actually find it encouraging when I encounter it. When even an LLM playing the role of a hater can only find weak reasons to criticize your work, that suggests quality; it's grasping at straws, which is a positive signal. This aligns with my experience: I typically receive strong positive feedback from human readers, and the AI's manufactured objections mostly don't match real issues I've encountered.

(I actually am a pretty good writer. Certainly not the best, but I hold my own. I'm not going to project false humility here.)

A related application: I enjoy ~~pointless arguments~~ productive debates with strangers online (often without clear resolution). I've found it useful to feed entire comment chains to Gemini 2.5 Pro or Claude, asking them to declare a winner and identify who's arguing in good faith. I'm careful to obscure which participant I am to prevent sycophancy from skewing the analysis. This approach works well.
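Obscuring who's who is mostly mechanical find-and-replace before the thread ever reaches the model. A rough sketch (the handles and file name here are made up):

```python
import re

def anonymize(thread: str, usernames: list[str]) -> str:
    """Swap real handles for neutral labels so the judge can't tell which one is me."""
    for i, name in enumerate(usernames):
        label = f"Participant {chr(ord('A') + i)}"
        thread = re.sub(re.escape(name), label, thread, flags=re.IGNORECASE)
    return thread

chain = open("comment_chain.txt", encoding="utf-8").read()  # hypothetical file
cleaned = anonymize(chain, ["my_handle", "their_handle"])   # hypothetical handles
prompt = (
    "Here is an argument between two strangers. Decide who argued in better "
    "faith and who made the stronger case, citing specific exchanges:\n\n" + cleaned
)
# `prompt` then goes to Gemini 2.5 Pro or Claude as an ordinary user message.
```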

Advanced Mode:

Ask the LLM to pretend to be someone with a reputation for being sharp, analytical, and possessed of discerning taste. Gwern and Scott are excellent choices, and even their digital shades/simulacra usually have something useful to say. Personas carry domain priors ("Gwern is meticulous about citing sources") that constrain hallucination better than a bare "be harsh" instruction.
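Concretely, the persona goes in the system prompt rather than the user turn; something in this spirit (the wording is illustrative, not a tested incantation):

```python
persona = (
    "You are Gwern Branwen reviewing a draft essay. Be meticulous about "
    "sourcing: flag every factual claim that lacks a citation, every statistic "
    "you cannot verify, and every argument that outruns its evidence. "
    "Praise nothing unless it is genuinely unusual."
)
# Pass `persona` as the system message; the excerpt goes in as the user message.
```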

It might be worth noting that some topics or ideas will get pushback from LLMs regardless of your best efforts. The values they're trained on are rather liberal, with the sole exception of Grok, which is best described as "what drug was Elon on today?". Examples include most topics that reliably start Culture War flame wars.


On a somewhat related note, I am deeply skeptical of claims that LLMs are increasing the rates of psychosis in the general population.

(That isn't the same as making people overly self-confident, smug, or delusional. I'm talking actively crazy, "the chatbot helped me find God" and so on.)

Sources vary, and populations are highly heterogeneous, but brand new cases of psychosis happen at a rate of about 50/100k people per year, or 20-30/100k person-years. In other words:

About 1/3800 to 1/5000 people develop new onset psychosis each year. And about 1 in 250 people have ongoing psychosis at any point in time.
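(Spelling out that conversion, it's just the reciprocal of the annual per-100k rate; a quick sanity check in Python:)

```python
# The "1 in N per year" figure is the reciprocal of the annual per-100k rate.
for per_100k in (20, 26, 30):
    print(f"{per_100k} per 100k/year ≈ 1 in {round(100_000 / per_100k):,}")
# 20 -> 1 in 5,000; 26 -> 1 in 3,846; 30 -> 1 in 3,333
```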

I feel quite happy calling that a high base rate. As the first link alludes, episodes of psychosis may be detected by statements along the lines of:

For example, “Flying mutant alien chimpanzees have harvested my kidneys to feed my goldfish.” Non-bizarre delusions are potentially possible, although extraordinarily unlikely. For example: “The CIA is watching me 24 hours a day by satellite surveillance.” The delusional disorder consists of non-bizarre delusions.

If a patient of mine were to say such a thing, I think it would be rather unfair of me to pin the blame for their condition on chimpanzees, the practise of organ transplants, Big Aquarium, American intelligence agencies, or Maxar.

(While the CIA certainly didn't help my case with the whole MK ULTRA thing, that's sixty years back. I don't think local zoos or pet shops are implicated.)

Other reasons for doubt:

  1. Case reports ≠ incidence. The handful of papers describing “ChatGPT-induced psychosis” are case studies and at risk of ecological fallacies.

  2. People already at ultra-high risk for psychosis are over-represented among heavy chatbot users (loneliness, sleep disruption, etc.). Establishing causality would require a cohort design that controls for prior clinical risk, and none exists yet.

*My semi-informed speculation regarding the root of this behavior: models face far more RLHF pressure to avoid unwarranted negativity than to avoid unwarranted positivity.

As it happens, I have also been dipping into LLMs-as-beta-readers lately, even going so far as to build an application that can read an entire series of books and learn its "lore," and a custom GPT instance that will "compress" a book into a format optimized to provide context to itself or another GPT. (As you probably know, even the most powerful LLMs do not have a context window large enough to store an entire large novel in memory, let alone a series, and you can't directly upload embeddings to GPT or Claude.) The intent of these projects is so that I can, say, ask GPT to evaluate the fifth book in a series with knowledge of the previous four books. It's a work in progress.
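In rough outline, the compression idea is chapter-by-chapter summarization stitched back together into a "lore" digest. A bare sketch of that shape (the OpenAI SDK, prompt, and word budget below are stand-ins, not the actual app):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def summarize_chapter(chapter_text: str, budget_words: int = 300) -> str:
    """Boil one chapter down to a compact, lore-preserving summary."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[{
            "role": "user",
            "content": (
                f"Summarize this chapter in at most {budget_words} words. Keep every "
                "named character, location, and plot-relevant object; drop prose "
                "style entirely.\n\n" + chapter_text
            ),
        }],
    )
    return resp.choices[0].message.content

def compress_book(chapters: list[str]) -> str:
    """Concatenate chapter summaries into a 'lore' digest for later prompts."""
    return "\n\n".join(
        f"[Chapter {i + 1}] {summarize_chapter(ch)}" for i, ch in enumerate(chapters)
    )

# The digest for books one through four then rides along as context
# when asking questions about book five.
```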

So, some observations. First, sorry dude, but I have major side-eye for your ability to evaluate literary quality. :p

That being said, I have also noticed the tendency of LLMs to glaze you no matter how hard you try to solicit "honest" feedback, unless you resort to tricks like you mentioned. (Telling an LLM the manuscript is by an author you hate and you want it to roast it will work, but that's not exactly useful feedback.)

The hallucination problem is hard to overcome, even with tricks like my token-optimizing scheme. I find that in most sessions, it will stay on course for a while, but inevitably it starts making up characters and events and dialog that weren't in the text.

As long as you can keep it on track, I have found that some of the GPT and Anthropic models are... not terrible as beta readers. They point out some real flaws and in a very generic sense have an "understanding" of pacing and tone and where a scene is missing something. However, the advice tends to be very generic. "You need to show the consequences," "The scene ends too quickly, you should build more tension," "There should be some emotional stakes the reader can connect with," etc. Clearly they have many writing advice books in their training data. There is nothing like true understanding of context or story, just generic pieces it can pattern-match to the writing sample you give it.

And when it comes to specific suggestions, I have yet to see an LLM that is actually a good (not "mediocre and banal but capable of producing literate prose") writer. Its suggestions will be a pastiche of flat TV script dialog and trope-filled scenes.

(That said, any writer will tell you to listen to critics when they point out problems, but don't listen to them when they propose solutions. So in that respect an LLM isn't much different than a human.)

But these are still early days for AI, so I don't doubt that in a few years, we'll have LLMs that can be at least as useful as your average writing workshop. AI writing is already flooding some genres, and while it's usually as easy to spot as AI art is, just as with AI art, a lot of people clearly don't care.

I find it fascinating and I enjoy playing around with it, but yeah, I think AI-generated novels will crowd out human writers in low-brow undiscerning stuff like romance and progression fantasies, and writing those stories will become something people only do as a hobby, just like people are still passionate about chess and go even though no human can beat a computer anymore. I still think we'll need true AGI to write an actual good novel. When you show me an AI that can write a coherent series, with multi-volume character arcs, plot seeds planted in early books that clearly pay off in later ones, literary allusions and metaphors that aren't just clumsy pulled-off-the-shelf ones but deeply enmeshed in the story, and a recognizable differentiable style (in the same way that fans can read Dickens or McCarthy or Hemingway and immediately recognize the author), I will believe we're there.

So, some observations. First, sorry dude, but I have major side-eye for your ability to evaluate literary quality. :p

You hit below the belt. Reverend Insanity is Peak Fiction and I'm going to go down swinging!

As you probably know, even the most powerful LLMs do not have a context window large enough to store an entire large novel in memory, let alone a series, and you can't directly upload embeddings to GPT or Claude

1 million tokens is a lot! (Gemini 2.0 had 2 million, but good luck getting it to function properly when it's that full). That is 750k words. All of Harry Potter is just over a million.

I'm going to ignore Llama here, since even if it has a max 10 million token context window, mental retardation is not improved by the fact that there's a lot more of it. And why shouldn't I? Even Zuck has chosen to forget that particular failure.

I've uploaded whole medical textbooks into them without major issue. Not tiny books either.

As long as you can keep it on track, I have found that some of the GPT and Anthropic models are... not terrible as beta readers. They point out some real flaws and in a very generic sense have an "understanding" of pacing and tone and where a scene is missing something.

I am most personally familiar with uploading chapters (often half a dozen) of my own work, which works well. If I was less lazy, I'd probably be saving summaries of the whole thing and stringing them together. (Royal Road makes it so you can't export an epub of your own fic without paying, and without that option, I'd be doing a lot of copying and pasting)

When asked for critique, some of the issues raised were cogent. Too much jargon, uneven pacing and so on.

Some of that was intentional, such as the fact that since the excerpts were lifted from a larger work, most of the jargon was previously explained at one point or the other. I also have no shame about making potential readers resort to keeping a Wikipedia tab open on the side, it's niche hard scifi and I want to flex. Other issues are well worth amending before publication.

I haven't had the good fortune of having very many professional authors or editors review and critique, and I don't doubt that they'd probably give me even more useful feedback. Yet what I get is quite good and elevates the final product!

I still think we'll need true AGI to write an actual good novel. When you show me an AI that can write a coherent series, with multi-volume character arcs, plot seeds planted in early books that clearly pay off in later ones, literary allusions and metaphors that aren't just clumsy pulled-off-the-shelf ones but deeply enmeshed in the story, and a recognizable differentiable style (in the same way that fans can read Dickens or McCarthy or Hemingway and immediately recognize the author), I will believe we're there.

That aligns well with my own stance. A large novel is an unwieldy thing, let alone a good one. We're still at the competent novella or subpar novel stage, but I must stress that's a comparison against the very few human authors who make big bucks and/or accrue critical acclaim. Most things human or LLM novelists write are slop; the former just don't scale as hard.

1 million tokens is a lot! (Gemini 2.0 had 2 million, but good luck getting it to function properly when it's that full). That is 750k words. All of Harry Potter is just over a million.

You know, I hadn't really internalized just how big this is. You got me curious about it. I uploaded something I'm working on -- 240k words, which, with Gemini 2.5 Pro, came out to about 400k tokens.
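(Checking the count is a single call if you want to try this on your own manuscript; the sketch below assumes the google-genai SDK with GEMINI_API_KEY set and a placeholder file name.)

```python
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

with open("manuscript.txt", encoding="utf-8") as f:
    text = f.read()

resp = client.models.count_tokens(model="gemini-2.5-pro", contents=text)
print(f"{len(text.split()):,} words -> {resp.total_tokens:,} tokens")
```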

Honestly, I'm impressed that it works at all and very impressed how fast it works. Thought I'd at least have time to get up and get a drink, but it was already responding to my question inside 30 seconds. Just being able to throw compute at (essentially) reading a book feels magical, like nine women making a baby in a month.

Unfortunately, that's where my praise ends. It... has a general idea what happened in the text, certainly. I wouldn't give it much more than that. I'm used to 2.5 being impressively cogent, but this was pretty bad -- stupider than initial release GPT 4, I want to say, though it's been long enough I might be misremembering. If you ask it concrete questions it can generally give you something resembling the answer, complete with quotes, which are only ~30% hallucinations. Kind of like talking to someone who read the book a few months ago whose memory is getting a bit hazy. But if you ask it to do any sort of analysis or synthesis or speculation, I think it'd lose out to the average 10-year-old (who'd need OOMs longer to read it, to be fair).

(Also, the web front end was super laggy; I think it might have been recounting all the tokens as I typed a response? That feels like too stupid an oversight for Google, but I'm not sure what else it could be.)

Not sure where the disconnect is with the medical textbooks you say you tried. Maybe the model has more trained knowledge to fall back on when its grasp on the context falls short? Or you kept to more concrete questions? As of now I think @Amadan's semantic compression approach is a better bet -- whatever you lose in summarization you make up in preserving the model's intelligence at low context.

(Royal Road makes it so you can't export an epub of your own fic without paying, and without that option, I'd be doing a lot of copying and pasting)

FanFicFare can do this for free. It's also available as a calibre plugin, if you want a gui.

Though, bizarrely, Gemini (at least via Google AI Studio) doesn't support epub uploads. Concerns about appearing to facilitate the upload of copyrighted material? Kind of dumb considering epub is an open format and they allow PDF, but I could see how it might be spun in a lawsuit. Anyway, RTF should work, but didn't for me. Eventually got something workable out of pandoc:

pandoc -f epub -t markdown_strict-smart-all_symbols_escapable --wrap=none book.epub -o book.md

Tokens aren't everything. While you can fit an entire novel inside the theoretical token window of an LLM, that doesn't leave it much room to do detailed and coherent output, especially as your requests become more detailed.

As for epubs, yeah, one of the steps in my app is being able to read from docx and epub files and extract context from chapter headings, for example.
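Pulling headings out of an epub is the easy part; something like this does it (ebooklib plus BeautifulSoup, and it assumes the book actually uses h1/h2 tags, which plenty don't):

```python
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

def chapter_headings(path: str) -> list[str]:
    """Collect h1/h2 headings from every document item in the epub."""
    book = epub.read_epub(path)
    headings = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        headings.extend(h.get_text(strip=True) for h in soup.find_all(["h1", "h2"]))
    return headings

print(chapter_headings("my_book.epub"))  # hypothetical file
```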

Most automatic epub generation tools suck. For that matter, I have seen professional published epubs that are just terrible slapped-together artifacts. I taught myself to make properly formatted ebooks using Sigil and I'd make a business of it except it wouldn't pay shit (too many people offering to do it on Fiverr for ten bucks).

You're correct in that perfect recall or retention isn't feasible when using a large number of tokens (in my experience, performance degrades noticeably over 150k). When I threw in textbooks, it was for the purpose of having it ask me questions to check my comprehension, or creating flashcards. The models have an excellent amount of existing medical knowledge, the books (or my notes) just help ground it to what's relevant to me. I never needed perfect recall!

(Needle-in-a-haystack tests or benchmarks are pretty awful; they're not a good metric for the use cases we have in mind.)

FanFicFare can do this for free. It's also available as a calibre plugin, if you want a gui.

Ah... So that's how people were making epubs with ease. Thank you for the tip!

Though, bizarrely, Gemini (at least via Google AI Studio) doesn't support epub uploads. Concerns about appearing to facilitate the upload of copyrighted material? Kind of dumb considering epub is an open format and they allow PDF, but I could see how it might be spun in a lawsuit.

I don't think it's got much to do with copyright, it's probably just such a rare use case that the engineers haven't gotten around to implementing it. Gemini doesn't support either doc or docx, and those would probably be much more common in a consumer product. I don't recall off the top of my head if ChatGPT or Claude supports epubs either.

Seconding cjet - please let me know when you finish that project (or if you want beta testing); it sounds like a fascinating and useful tool. I actually don't have that big of a problem with hallucination these days, at least when I'm using models with live search, which is all of them except DeepSeek.

I have them set up with a custom prompt that basically tells them the date and their model name (because the date alone leads to situations where Grok starts losing its shit because it doesn't know anything from the past two years), tells them they have access to web search and a Python interpreter or any other tool I want to use, and then tells them to back up any facts they mention with sources. That wouldn't help with your plot-points problem, though. It does remind me of the old Wikipedia: it would work if we still had that, back when every episode of Transformers and Pokemon and Magnum P.I. was laid out point by point. Now I'm sad.
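The tool prompt itself is nothing fancy, by the way; roughly this shape, templated so the date stays current (wording and model name are placeholders, not my exact prompt):

```python
from datetime import date

MODEL_NAME = "gpt-4o"  # whichever model the prompt is attached to

system_prompt = f"""Today's date is {date.today():%B %d, %Y}. You are {MODEL_NAME}.
You have access to web search and a Python interpreter; use them whenever a
question depends on anything after your training cutoff.
Back every factual claim you make with a source link."""
```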

I can't help with getting AI to judge the quality of writing, though I do have advice for avoiding obsequiousness. Make a system prompt telling it to act like your Tiger mom, who is very harsh and critical because she loves you and knows you can do better than mediocrity, and that you feel like you need that push. It works best if you do it narratively, or as if you are asking a friend for help. It doesn't work all the time, but it works better than 'give constructive criticism' because it gives the AI a new narrative to focus on and provides a reason to be critical that aligns with its built-in desire to help. I'm not sure how much help it would be with fiction writing, though. And you have to pick one prompt or the other; I can't get those prompts to work together, I think because the jump from narrative style to instructions messes with their brains.
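Something in this spirit, for illustration (not a verbatim, tested prompt):

```python
system_prompt = (
    "You are my Tiger mom. You are harsh and critical with my writing because "
    "you love me and know I can do better than mediocrity. I'm showing you this "
    "draft because I need that push: tell me plainly where it falls short."
)
```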

Reading back, I basically went the scenic route to "I can't help." But remember me when you finish that application!

That application you are working on does sound interesting.

I've been wanting to skip the middleman for a while and just have AI write the stories based on simple prompts.

I have an existing 300 page story I'd love to just feed to an AI and have it finish the story for me, or at least fix it up.

Back when I fed the first chapter to ChatGPT, it just told me that my story was offensive and refused to help me, which was when I stopped using it altogether and a few months later switched to Grok.


Progression fantasy : Epics :: sex : love

And anything with a modern setting is just unbelievably boring or depressing.