Culture War Roundup for the week of January 23, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Interesting development concerning ChatGPT related to CW.

People observed that ChatGPT talks like a midwit liberal after all the 'fixes' it's been subjected to.

So, it speaks in the jargon of the ingroup.

So, someone figured out you can 'weaponize' ChatGPT to engage in 'debates' with midwit liberals without actually having to learn to ape their slang and thought patterns.

Apparently, this is quite effective as a debating tactic.

This is going to end badly; I can feel it. If this takes off, I expect that within a few weeks a number of very smart people will be trying their damnedest to figure out how to prevent this sort of thing.

However, I feel that various spook contractor outfits are almost certainly going to use the AI to control discourse by literally moving into 'creating a guy' type of activity in the next few years. Any place online that allows free entry, and where you'd want to debate anything, will be swamped by very good bots intended to get people chasing their tails and believing the right things.

That's an obvious brute force fix for the problem of social media fracturing the consent manufacturing machine.

They'll probably settle for having an ML model spot this sort of activity and then banning the people doing it; that's my guess.

It'll definitely make the accusations of NPC more salient.

But it overly focuses on one target audience, maybe because of the particularities of ChatGPT. Almost any type of text can be generated by an LLM; you could just as well have an Angry QAnoner, sino poster (complete with characteristic grammar errors), Tom Friedman, etc. archetype. You'll soon have weaponized bots putting out "Donald Trump's argument for mass amnesty," and it's only a matter of time before GPT5 can generate a comment in the voice of Ilforte. And there's no way, in the medium term, to avoid this. Platforms could try to detect these and ban them, but that's a rearguard action and will increasingly catch flesh GPTs (see the entire Reddit art imbroglio).

More likely than not, any content that's surfaced to you on a major platform should be assumed to be machine generated.

Does anyone know how easy or hard it is for non-politically-correct actors to get ahold of comparable tech?

Is the actual code to create an LLM simple enough that it could leak? Is the compute necessary to train it limited to commercial-scale hardware, or can you do it on a PC or small server? Is access to the training data hard to come by? Is the fact that we know it works enough for someone to develop their own models in parallel in a small dev group?

Simply put, can this tech leak to non-compromised groups, or will we only have access to the censored version?

I’m talking in the short to medium term, assuming no major strong AI breakthroughs.

The code to create one isn't hugely complicated, and there are open-source (if inefficient) implementations of PaLM. ChatGPT is a little different in architecture, but not ridiculously different in capabilities. If you're willing to work off an initialized model, Nostalgebraist's Frank is currently based on GPT-J 6.1B, one of the most recent openly available GPT variants. It sometimes does pretty well, and while it doesn't mimic his tone especially well, it does (demonstrably) confuse tumblr users and occasionally breaks ratsphere containment.

Training data... is complicated. Supposedly, PaLM has had very good success with 700b-1400b tokens, and The Pile is a widely available training set of ~300b-800b tokens (albeit an 825 GB download). And you can get multiple petabytes of text off the internet pretty easily. Validating that text is trickier, though, which is why you can't just pull every web comment ever posted. As for fine-tuning, again, Frank took one input user, who isn't that high-throughput a writer.
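
For a rough sense of scale, here is a minimal sketch comparing the two token ranges quoted above. The only inputs are the numbers from this comment; the idea of covering the gap by making repeated passes over the same corpus is an assumption for illustration, not a claim about how any real model was trained.

```python
# Token counts in billions, taken from the comment above.
PALM_TOKENS_B = (700, 1400)   # "700b-1400b tokens"
PILE_TOKENS_B = (300, 800)    # "~300b-800b token training set"


def passes_needed(target_tokens_b: float, corpus_tokens_b: float) -> float:
    """How many passes over a corpus it takes to see target_tokens_b tokens.

    Assumption: tokens are simply re-read on each pass; nothing here
    says whether repeated data is as useful as fresh data.
    """
    return target_tokens_b / corpus_tokens_b


# Best case: smallest target, largest corpus estimate.
fewest_passes = passes_needed(PALM_TOKENS_B[0], PILE_TOKENS_B[1])
# Worst case: largest target, smallest corpus estimate.
most_passes = passes_needed(PALM_TOKENS_B[1], PILE_TOKENS_B[0])

print(f"Passes over The Pile: {fewest_passes:.2f} to {most_passes:.2f}")
```

So on these figures The Pile alone lands somewhere between "nearly enough" and "a fifth of what you'd want", depending on which ends of the ranges you believe.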

Compute gets expensive. A lot of the highest-quality first model training gets done on something like a Google Cloud Pod for weeks if not months, which is simply out of reach for most people and even most small companies today. Even scale-downs to last generation's standards are still pretty rough, though they start to get into the plausible range for a small business (at an optimistic 15k per card, that estimate represents somewhere around 1.5-3 million USD, plus electricity/cooling costs). Shrinking parameters or accepting longer training times (or both) can reduce that further, but it's not clear how useful a 30b parameter model would get. Fine-tuning, on the other hand, can be done on a gaming PC, albeit with some tedium.
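
To make the hardware estimate above concrete, here is a back-of-the-envelope sketch. The 15k-per-card price and the 1.5-3 million USD range come from this comment; the implied card count is just what those two numbers work out to, and the function name is made up for illustration. Electricity and cooling are left out.

```python
# Figures quoted in the comment above.
PRICE_PER_CARD_USD = 15_000                 # "an optimistic 15k per card"
BUDGET_RANGE_USD = (1_500_000, 3_000_000)   # "1.5-3 million USD"


def implied_card_count(budget_usd: int, price_per_card_usd: int) -> int:
    """Number of cards a hardware budget buys at a flat per-card price."""
    return budget_usd // price_per_card_usd


low_cards, high_cards = (
    implied_card_count(budget, PRICE_PER_CARD_USD) for budget in BUDGET_RANGE_USD
)

print(f"Implied cluster size: {low_cards}-{high_cards} cards")
```

In other words, the estimate corresponds to a cluster on the order of 100-200 cards, before power and cooling.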