This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

You know, I hadn't really internalized just how big this is. You got me curious about it. I uploaded something I'm working on -- 240k words, which, with Gemini 2.5 Pro, came out to about 400k tokens.
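Those figures imply a words-to-tokens ratio worth keeping in mind when sizing uploads (this is just arithmetic on the numbers above; the actual count depends on the tokenizer and the text):

```python
# Ratio implied by the figures above: 240k words came out to ~400k tokens.
words = 240_000
tokens = 400_000
print(f"{tokens / words:.2f} tokens per word")  # prints "1.67 tokens per word"
```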
Honestly, I'm impressed that it works at all and very impressed how fast it works. Thought I'd at least have time to get up and get a drink, but it was already responding to my question inside 30 seconds. Just being able to throw compute at (essentially) reading a book feels magical, like nine women making a baby in a month.
Unfortunately, that's where my praise ends. It... has a general idea what happened in the text, certainly. I wouldn't give it much more than that. I'm used to 2.5 being impressively cogent, but this was pretty bad -- stupider than initial release GPT 4, I want to say, though it's been long enough I might be misremembering. If you ask it concrete questions it can generally give you something resembling the answer, complete with quotes, which are only ~30% hallucinations. Kind of like talking to someone who read the book a few months ago whose memory is getting a bit hazy. But if you ask it to do any sort of analysis or synthesis or speculation, I think it'd lose out to the average 10-year-old (who'd need OOMs longer to read it, to be fair).
(Also, the web front end was super laggy; I think it might have been recounting all the tokens as I typed a response? That feels like too stupid an oversight for Google, but I'm not sure what else it could be.)
Not sure where the disconnect is with the medical textbooks you say you tried. Maybe the model has more trained knowledge to fall back on when its grasp on the context falls short? Or you kept to more concrete questions? As of now I think @Amadan's semantic compression approach is a better bet -- whatever you lose in summarization you make up in preserving the model's intelligence at low context.
FanFicFare can do this for free. It's also available as a calibre plugin, if you want a gui.
Though, bizarrely, Gemini (at least via Google AI Studio) doesn't support epub uploads. Concerns about appearing to facilitate the upload of copyrighted material? Kind of dumb considering epub is an open format and they allow PDF, but I could see how it might be spun in a lawsuit. Anyway, RTF should work, but didn't for me. Eventually got something workable out of pandoc.
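The thread doesn't show the exact pandoc invocation, but converting an epub to uploadable plain text is a one-liner; here's a plausible version wrapped in Python (filenames are placeholders, and pandoc must be on your PATH):

```python
import subprocess

# Hypothetical filenames -- the original command isn't shown in the thread.
# --to=plain emits unformatted text; --wrap=none avoids hard line breaks.
cmd = ["pandoc", "book.epub", "--to=plain", "--wrap=none", "-o", "book.txt"]
subprocess.run(cmd, check=True)  # raises CalledProcessError if pandoc fails
```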
Tokens aren't everything. While you can fit an entire novel inside the theoretical token window of an LLM, that doesn't leave it much room to produce detailed and coherent output, especially as your requests become more detailed.
As for epubs, yeah, one of the steps in my app is being able to read from docx and epub files and extract context from chapter headings, for example.
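Since an epub is just a zip archive of XHTML files, extracting chapter headings doesn't take much machinery. A minimal stdlib sketch (the function name and the toy two-chapter archive are mine for illustration, not from the app described above):

```python
import io
import re
import zipfile

def chapter_headings(epub_bytes: bytes) -> list[str]:
    """Pull <h1>/<h2> text out of every XHTML file in an epub-style zip.

    A quick regex pass over the markup is often enough for grabbing
    chapter-heading context; a real app would parse the XHTML properly.
    """
    headings = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith((".xhtml", ".html")):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            headings += [re.sub(r"<[^>]+>", "", m).strip()
                         for m in re.findall(r"<h[12][^>]*>(.*?)</h[12]>", text, re.S)]
    return headings

# Build a toy two-chapter "epub" in memory to demonstrate.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ch01.xhtml", "<html><body><h1>Chapter 1</h1><p>...</p></body></html>")
    zf.writestr("ch02.xhtml", "<html><body><h1>Chapter 2</h1><p>...</p></body></html>")

print(chapter_headings(buf.getvalue()))  # -> ['Chapter 1', 'Chapter 2']
```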
Most automatic epub generation tools suck. For that matter, I have seen professional published epubs that are just terrible slapped-together artifacts. I taught myself to make properly formatted ebooks using Sigil and I'd make a business of it except it wouldn't pay shit (too many people offering to do it on Fiverr for ten bucks).
You're correct that perfect recall or retention isn't feasible when using a large number of tokens (in my experience, performance degrades noticeably over 150k). When I threw in textbooks, it was for the purpose of having it ask me questions to check my comprehension, or to create flashcards. The models have an excellent amount of existing medical knowledge; the books (or my notes) just help ground it to what's relevant to me. I never needed perfect recall!
(Needle-in-a-haystack tests and benchmarks are pretty awful; they're not a good metric for the use cases we have in mind.)
Ah... so that's how people were making epubs with ease. Thank you for the tip!
I don't think it's got much to do with copyright, it's probably just such a rare use case that the engineers haven't gotten around to implementing it. Gemini doesn't support either doc or docx, and those would probably be much more common in a consumer product. I don't recall off the top of my head if ChatGPT or Claude supports epubs either.