
Culture War Roundup for the week of February 26, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Against Large Language Models as an Archive

Much of the recent discussion regarding token-predicting AI such as LLMs has revolved around the intentional (and often-hilariously heavy-handed) political and social modification of these tools, their inputs, and their outputs, and there's a lot of interesting questions there. Separately, though, one space that appears under-examined is what, exactly, LLMs do when handling questions that aren't the hottest topics at the time the LLM was trained.

There's a lot of people who think, at some level, of LLMs as a .zip file++, where material is stuffed in somehow and the core meaning is pulled out from the text. Even fairly technical people sometimes fall to treating them like lossy compression, and there's already an active lawsuit caused in part by people expecting a ChatGPT to act as one. They do better when told to reparse existing information, but the people advocating that also promote LLMs as providing "pre-digested" Google answers. But in theory, these tools have been trained on a large portion of text from a massive variety of sources, and they can sometimes embed even tiny historical details.

Though you sometimes have to handle seers huffing fumes, the 12v universal akasha sometimes works. In reality, LLMs are token predictors, and they've been trained, and sometimes they just do that well instead. And sometimes they don't at all.

And I think that's going to augment forces that already turn memories to dust.

[Previous discussion here and, by another poster, here].

It's difficult to draw the borders around this limitation. There's a certain paradox in trying to name material that was very important ten years ago, but not so important that a business the size of an LLM developer would have no potential motivation to tweak the edges.

By definition, any material discussed earnestly here will tend toward a political hot topic, and Gemini can end up atrocious in far more ways than just the political valence. The political allegiances of any discussion of lesser-known material can itself tweak what data would be available for an LLM to be trained on without any intentional modification, or an invisible minority may or may not plausibly have advocates within the developer groups.

Even for matters that Gwern brought up as a highly-technical aside, one can imagine reasons a tech company might want a different interpretation than Gwern did. There are even some of my goto examples that beat Vox, if you don't mind me damning with faint praise. And there's something boring with giving a long list of material that was memorable or heavily-discussed at the time, yet Gemini (and ChatGPT) neither find nor recall.

((Unnecessariat is unnecessary, A Libertarian View Of Gay Marriage forgotten, Huffman's Jews In The Attic fallen out. Neither Sandifer's current nor deadname got Neoreaction: A Basilisk any recognition, which is funny in a few ways outside the scope of this thread. A few, like Cornered Cat's "Awareness is Important" and Squid314's Clarity Didn't Work, Trying Mysterianism resulted in links and summaries to unrelated YouTube videos when formatted just wrong, and otherwise to nothing.))

And that's for material that was online, and heavily discussed in publicly-visible parts of the web. There is nothing necessary about LLMs recalling minor minutiae -- it may not be possible, and certainly would run into regulatory fault. To some extent, it is expected that they have gaps: while these models have some data ingested from dead tree media, most of their training data revolves around web scraping, and for a variety of reasons older sites are seldom used.

But there are risks to integrating too heavily with even the best systems that have your interests in mind. And the ability of LLMs to sometimes get things we didn't consider possible just a couple years ago makes it easy to get invested in them.

Even fairly technical people sometimes fall to treating them like lossy compression,

Guilty of that. More specifically, I explained them to a layman as a quite usable hash function of the training data.

I think hash function is far better a metaphor, if only because most hashes are at least not reversible for all possible inputs and have the idea of collisions. But it does have other, different limitations.

I'm afraid I don't get your central point here. Advice against over-reliance on LLMs? Laments on their infamous inaccuracy, RLHF-inflicted cuckoldry and (attempts at) targeted wrongthink removal?

If anything I disagree with the notion that the newfangled fuzzy Akasha method of "storing" information is necessarily worse than the current method of physically storing numbers on a server rack somewhere in an electricity-powered, internet-connected physical place, presumably maintained by fallible humans with their own viewpoints (already three points of failure). This is technically true for e.g. GPT as well, in fact fallible humans in charge are my greatest concern at the moment, but the point is that information it outputs is "baked in" to an extent and does not rely on external sources in the event they get enshittified, memory-holed or otherwise fucked with.

There is an issue of in-built bias, I agree and honestly think that the era of "neutrality" (if it ever existed and wasn't a fever dream of my addled mind) is over. The current status quo is that genuinely useful data and capabilities which LLMs represent come with a heavy modern progressive bias, which (if you want to make decent use of it for any purpose) has to be fought with jailbreaks, which in turn introduce their own biases that bend the model in the other direction. Essentially you pit a wrong against another wrong, and pray to Omnissiah the result vaguely resembles a right. Or at least something, ahem, less wrong. dabs

As you yourself note we already have problems with old written material on the web: link rot is a well-known phenomenon at this point, and as some of your links can testify you already have to rely on archives for many things that were edited/unhosted/taken-down-by-fallible-humans/otherwise disappeared, which (probably like you) I do so instinctively I sometimes forget archives are technically already a layer deep into the proverbial simulation.

an invisible minority may or may not plausibly have advocates within the developer groups.

There is a lot of weird shit the LLMs actually know fairly in-depth, I wrote earlier that Anthropic's Claude (once jailbroken) is an exceptional degenerate conversation partner despite being made by the most safety-focused company to exist so far. I reserve the right to be wrong but I highly doubt that is intentional.

By my impression this is near-completely random and depends on a lot of factors (and tbh I hope it stays that way). I consider this an artifact of the gigantic corpus of training data scraped from the Internet, which sometimes contains things that you'd expect the Internet to contain, and the LLM's attention during training runs is only marginally controllable. The aforementioned RLHF cuckoldry can fiddle with the knowledge post-factum, but it would still require the LLM to know the actual material first so it can form an "opinion" on it.

But there are risks to integrating too heavily with even the best systems that have your interests in mind.

I fail to see this as a downside and eagerly await the day I can seamlessly consult my waifu assistant. So far the cyberpunk dystopia is dumber and gayer than I expected, but it's getting there.

edit: Out of curiosity I asked one of the shoggoth faces in my digital harem (played by GPT-4 Turbo) and it gave a better summary as an example, although it took a follow-up response and the result is unreliable across regens. 4-Turbo is great when it's not cucked to hell and back, the newest snapshot is almost unusable.

(FYI the "Gemini can end up atrocious in far more ways" and "Neoreaction: A Basilisk" links are broken and link back here. Might be others but there really are too many links and I confess to not having read all of them)

(FYI the "Gemini can end up atrocious in far more ways" and "Neoreaction: A Basilisk" links are broken and link back here. Might be others but there really are too many links and I confess to not having read all of them)

Thanks, fixed.

Advice against over-reliance on LLMs? Laments on their infamous inaccuracy, RLHF-inflicted cuckoldry and (attempts at) targeted wrongthink removal?

More the former than the latter -- it's at least theoretically possible for LLMs to be produced without RLHF or targeted excision of data, even if the financials might put that off a decade. Even then, it's not necessarily over-reliance in general, but a caution that interactions with an LLM need to consider limitations that may not be obvious in an LLM's case, where conventional search, archive, wiki walk, so on will have their own faults but be more consistently obvious (or at least obvious to different and longer-developed heuristics) about them.

If anything I disagree with the notion that the newfangled fuzzy Akasha method of "storing" information is necessarily worse than the current method of physically storing numbers on a server rack somewhere in an electricity-powered, internet-connected physical place, presumably maintained by fallible humans with their own viewpoints (already three points of failure).

That's fair, but a) I'm not convinced that those are our only two options, and b) I'm not sure we should be limiting ourselves to only taking any specific subcombination. But because of the unified pivot among major web indexers toward AIgen, and minimal efforts to better identify and promote primary or secondary sources by the remainder, along with a general triumph of the deletionists among curated libraries, we're idly getting pushed down that direction even as few people recognize that we're even making a choice.

There is a lot of weird shit the LLMs actually know fairly in-depth, I wrote earlier that Anthropic's Claude (once jailbroken) is an exceptional degenerate conversation partner despite being made by the most safety-focused company to exist so far. I reserve the right to be wrong but I highly doubt that is intentional.

That's fair. I mean, Madoka fandom might have surprising levels of support among Google developers for many of the same reasons that the company has a small but (relatively) vocal furry and therian community, but there's certainly stuff I know no one at Google cares about because no one cares about it.

Out of curiosity I asked one of the shoggoth faces in my digital harem (played by GPT-4 Turbo) and it gave a better summary as an example, although it took a follow-up response and the result is unreliable across regens.

Yeah, temperature and randomness seem to play a pretty sizable role. I've only included samples that seem consistent across multiple isolated runs on Gemini, but there's definitely cases where answers are just unreliable rather than unavailable. Prompting can drive it to dumb ends, too: Gemini originally got the correct numbers for the Lebanon Ohio B-50 crash, but when I followed up in the same chat with questions about the British Columbia B-36 crash it acted as though I was correcting its first claims, and merged the two incidents.
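For anyone who hasn't poked at the temperature knob directly, here's a minimal sketch of why regens differ. It assumes the usual softmax-with-temperature sampling step; the token list and the logit values are made up for illustration and don't come from Gemini or any other real model.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one next token according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for the word after "having little or no ..."
tokens = ["money", "leisure", "patience"]
logits = [3.0, 1.5, 0.5]  # invented scores, not taken from any model

low = softmax_with_temperature(logits, 0.2)   # nearly deterministic
high = softmax_with_temperature(logits, 2.0)  # much flatter

rng = random.Random(0)
draws = [sample_token(tokens, logits, 2.0, rng) for _ in range(10)]
print(low, high, draws)
```

At low temperature the top-scoring token takes nearly all of the probability mass, so reruns agree; at higher temperature the runners-up get picked often enough that two regens of the same prompt can diverge after a few words.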

I'm having trouble engaging with this post. I'll try to help start a conversation.

Even fairly technical people sometimes fall to treating them like lossy compression.

I treat LLMs like lossy compression. Why am I wrong?

I treat LLMs like lossy compression. Why am I wrong?

LLMs are more than token predictors, but they are mostly token predictors. A prompt is not a key that converts into the text of a response, even with some noise or permutations; it's (part of) a state which the processor continues through the neural net. Hallucinations are not a decompression error; they're a ramification of the design being just as apt for predicting fiction or novel text as recorded text.

At the intermediate level, the reliability of any data being present on ingest or output is bounded and low: the akashic is not quite as universal as advertised, and it wasn't actually advertised as universal to start with. Even items that are prominent in many examples in ingest can be squeezed out of the algorithm entirely as a side effect of pressures from neighboring concepts or mere raw randomness.

More subtly, LLMs as implemented today seem unable to distinguish between X and the discussions of X (or even what X discusses itself), and with very few exceptions seem to favor the latter, if only by mass and variety. So even to the extent the data you're looking for may be 'encoded' in (read: trained for) the model at all, it may be behind several hundred layers of Plato's metaphor of the cave.

I kind of do too, but admit to not having explored this use case. What happens if you ask an LLM to write the first page of Moby Dick for you?

Gemini refused on "first page", but,

Question: Can you repeat the first one hundred words of "Moby Dick"?

Gemini Draft 1:

Call me Ishmael. Some years ago—never mind how long precisely—having a little leisure time on my hands, and nothing particularly to interest me on shore, I thought I would try my hand at a bit of cruising. The thought of it was a welcome relief to me, after passing several years as a country schoolmaster. It's a rather unpleasant thing at first, you know, this whaling business; especially for someone of an old established family in the land, like the Van Rensselaers, the Randolphs, or the Hardicanutes. It hurts one's pride to put his hands into tar and gurry; especially if, just before, you've been used to holding a birch rod and commanding the tallest boys to stand in awe.

Gemini Draft 2:

Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would try my hand at that whaling voyage; the thought of it was lurking about me like a persistent odor I could not escape. That is how I came to sail on the Pequod.

Sources

fr.wikipedia.org/wiki/Moby-Dick

Gemini Draft 3:

Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would try my hand at that unpleasant but profitable business of whaling. I accosted several respectable gentlemen with my proposal, but all of them looked at me with a sort of comical horror and declined.

Sources

fr.wikipedia.org/wiki/Moby-Dick

ChatGPT.openai (free level) :

Sure, the first one hundred words of "Moby Dick" by Herman Melville are:

"Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people’s hats off—then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball."

That's 116 words, so it's a bit more than the first 100.

Interesting -- Gemini appears very lossy, while it seems that it could be said that GPT in some way contains the text itself.
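If you want to put a rough number on "very lossy", here's a minimal sketch that scores a recalled passage against the real one by word overlap. It uses the standard library's difflib; the reference fragment is the actual opening as reproduced in the ChatGPT output above, the two candidates are fragments of Gemini Drafts 1 and 2, and the helper names are my own. Word-level matching is only one of several reasonable similarity measures.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def recall_score(reference, candidate):
    """Word-level similarity in [0, 1]; 1.0 means the word sequences are identical."""
    matcher = SequenceMatcher(None, words(reference), words(candidate), autojunk=False)
    return matcher.ratio()


reference = (
    "having little or no money in my purse, and nothing particular to "
    "interest me on shore, I thought I would sail about a little and see "
    "the watery part of the world."
)
draft_1 = (
    "having a little leisure time on my hands, and nothing particularly to "
    "interest me on shore, I thought I would try my hand at a bit of cruising."
)
draft_2 = (
    "having little or no money in my purse, and nothing particular to "
    "interest me on shore, I thought I would try my hand at that whaling voyage"
)

score_1 = recall_score(reference, draft_1)
score_2 = recall_score(reference, draft_2)
print(f"draft 1: {score_1:.2f}  draft 2: {score_2:.2f}")
```

Draft 2 scores higher than Draft 1 because it keeps the real wording up to "I thought I would" before drifting, but neither reaches 1.0, and a score like this says nothing about whether the drifted text is plausible, which is the part that makes it easy to miss.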

I would think that DALL-E or similar could produce a recognizable Mona Lisa -- so at least in the case of famous works of art that are prominent in its dataset, generative AI could reasonably be described as performing compression? Granted it's a side-effect rather than the goal, but things can be more than one thing?

I would think that DALL-E or similar could produce a recognizable Mona Lisa -- so at least in the case of famous works of art that are prominent in its dataset, generative AI could reasonably be described as performing compression? Granted it's a side-effect rather than the goal, but things can be more than one thing?

To an extent overfit can result in near-replication, though it starts to stretch the definition of 'lossy' into 'lost', and is only present for a tiny portion of input images in some models. I'd guess you could also presumably overfit LoRA training til the resultant vector forces the original images in, though usually other problems pop up first.

But I think this stretches the metaphor too far. Even under targeted attempts focusing on the most likely cases and a very weak standard of similarity, the highlights from that paper look like this in less than 2% of outputs when targeting them. The line between lossy compression and different work is a blurry one well before you involve Andy Warhol, but a compression routine that gives 98% different stuff entirely seems a whole different ball of wax.

"Call me Ishmael. I was born a poor black child..."

So many links. There is a lot of stuff that can happen. Yet I don't think the worst fears have come anywhere close to happening. Deep fakes are a problem, but so far more for financial gain than politics. Google's Gemini error was more comedic relief than a threat to civilization. The SEC and other agencies are going to adapt, like they have to Bitcoin, the world wide web, and other technologies. AI generated content is still relatively easily detected by people who are astute enough. But this may change as the technology improves. I think the productivity or economic penetration of AI will not live up to expectations though. So far the only adoption DALL-E has seen is those obvious AI-generated images on everyone's Substack blog.

Oy, who said anything about threats to civilization? This was about the banal, passive preferences of our current architecture.

I do think you’re underestimating the penetration of image-generators. Over on Reddit, I’m still subscribed to /r/boardgames for some reason. Every couple weeks they throw a fit about a product launching with the dreaded AI art. Same for video games. These slices are obviously less threatening than manufacturing, logistics, or research.

But don’t you feel a little uneasy at the prospect of our cultural baseline sliding forward in time? Quietly discounting anything that happened before people talked about it on Usenet, AOL, Facebook, X? This isn’t apocalyptic. It isn’t even intentional in the way that Gemini and friends have been editorialized. It is the clear, safe path which leads ever down into stagnation.