site banner

Culture War Roundup for the week of January 8, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

7
Jump in the discussion.

No email address required.

First top-level post testing the waters, might not be a very presentable or engaging topic here but it's what I got.

As the struggle for AI ethics drags on, the Fortune magazine has recently published an article (archive) about Character Hub, later shortened to Chub (nominative determinism strikes again). Chub is a repository of character cards for use with LLMs and specific chat frontends for a "roleplaying" experience of chatting with some fictional (or not fictional) character (I posted a few examples recently). It was created by a 4chan anon in the wake of a mass exodus from character.ai after they made their stance on NSFW content exceedingly clear. I have no idea how they got the guy to agree to an interview, but in my opinion he held up well enough, the "disappointed but unsurprised" is just mwah. A cursory view of Chub will show (I advise NOT doing that at work though) that while it's indeed mostly a coomer den, it's not explicitly a CP coomer den as the article tries to paint it, it's just a sprawling junkyard that contains nearly everything without any particular focus. Of course there are lolis and shit, it's fucking 4chan, what do you expect?

[edit: I took out the direct Chub link so people don't click on accident as it's obviously NSFW. It's simply chub(dot)ai if you want to look]

The article is not otherwise remarkable, hitting all expected beats - dangerous AI, child abuse, Meta is the devil, legislate AI already. This is relatively minor news and more of a small highlight, but it happened to touch directly on things I've become morbidly interested in recently, so excuse me while I use it as a springboard to jump to the actual topic.

The article almost exactly coincided with a massive, unprecedented crackdown on Hugging Face, the open-source hosting platform for all things AI, which has so far gone unnoticed by anyone outside the /g/oons themselves - I can’t even find any news relating to this, so you’ll have to take me at my word. All deployments of OpenAI reverse proxies that allow simultaneous and independent use of OpenAI API keys are taken down almost immediately, with the accounts nuked from existence. The exact cause is unknown, but is speculated to be caused by either the above article finally stirring enough attention for the HF staff to actually notice what's going on under their noses, or Microsoft's great vengeance and furious anger at the abuse of exposed Azure keys (more on that in a bit). Because of the crackdown, hosting on HF/Render is now listed as "not recommended" on Khanon's repository as linked above, and industrious anons are looking into solutions as we speak.

My personal opinion is of course biased by my experience, but I've been rooting for AI progress for years, guess I'm representing the fabled incel/acc movement here today. I'm not (anymore) a believer in the apocalyptic gospel of Yudkowsky, and every neckbeard chan dweller beating it to text-based lolis or whatever is one sedated enough not to bother with actual lolis so I fail to see the issue. Not to mention thoughtcrimes are only going to get more advanced with how readily AI/LLMs let you turn your crimethink into tangible things like text or images - the hysteria about ethics and/or copyright is only going to get worse. This djinn is not going back in the bottle.

Local models are already usable for questionable ends, but the allure of smarter, vastly higher-parameter corpo models is hard to ignore for many people, with predictable results - what the 4chan scoundrels undoubtedly are guilty of is stealing and promptly draining OpenAI/Claude API keys in congregate, racking up massive bills that, thanks to reverse proxies, cannot be traced back to any particular anon. Normal user keys usually have a quota and shut down once they hit the limit, but there are several tiers of OpenAI keys, and some higher-tier corporate or developer keys apparently don't have a definite ceiling at all. A "god key" some anon snagged from an Azure deployment in November and hosted a public reverse proxy which racked up almost $1 million in combined token usage (the proxy counts token usage and the $ equivalent) over the few months. This is widely considered to have attracted the Eye of Sauron and prompted the current crackdown once Microsoft realized what was going on and put the squeeze on platforms hosting Khanon's reverse proxy builds, also instantly disabling most Azure keys "in circulation". I suppose there will always be suckers who plaster their keys in plaintext over e.g. Huggingface or Github, this was so endemic before that Github now automatically scrapes OpenAI keys that are put up openly in repositories without any obfuscation, and pings OpenAI to revoke them.

It’s a little weird to think that the entire "hobby", if it can even be called such, can be crippled overnight if OpenAI starts enforcing mandatory moderation endpoint checks, but considering how the overall quality and usability of the LLM will sharply nosedive immediately, I'm willing to bluff that it's not a can of worms they want to open, even if usability and effectiveness must always bow down to ethics and political headwinds first. See Anthropic's Claude as exhibit A, although hilariously, even muzzled as it is Claude is still perfectly capable of outputting very double-plus-ungood stuff if jailbroken right, and is generally quite usable for anything but its intended use case.

I can even pretend to have a scientific interest here, because for all the degeneracy I'll dare to venture that the median /g/oon's practical experience and LLM wrangling skills are hilariously far ahead of corpos. The GPTs OpenAI presented in November are really just character cards with extra steps, and once people can access utilities and call stuff directly via API keys the catch-up will be very fast. The specialized chat frontends, while sometimes unwieldy, have a lot of features ChatGPT doesn't which is handy once you familiarize yourself. Some people already try to make entire text-based "games" inside cards, with nothing but heaps of textual prompts, some HTML and auxiliary "lorebooks" for targeted dynamic injections.

The continued lobotomy of Claude is also a good example - while the constant {russell:censorship|abuse prevention|alignment} attempts from Anthropic have gotten to the point it frustrates even its actual users (cf. exhibit A above), the scoundrels continue to habitually wrangle it to their nefarious ends, with vocal enthusiasm from Claude itself. Anthropic does detect unusual activity and flags API keys that generate NSFW content (known affectionately as "pozzed keys"), injecting them with a server-side system prompt-level constraint that explicitly tells Claude to avoid generating inappropriate content. The result? When this feature was rolled out, the exact text of the system prompt was dug out within a few hours, and a method to completely bypass it (known as prefilling) was invented in, I think, a day or two.

To sum up, this is essentially a rehash of the year-old ethical kerfuffle around Stable Diffusion, as well a direct remake of an earlier crackdown on AI Dungeon along the same lines, so technically there’s nothing new under the AI-generated sun. Still, with the seedy undercurrent getting more and more noticed, I thought I could post some notes from the underground, plus I'm curious to know the opinions of people (probably) less exposed to this stuff on the latest coomer tech possible harms of generative AI in general.

If my stance is not obvious by now - android catgirls can't come soon enough, I will personally crowdfund one to send to Eliezer once they do.

This is widely considered to have attracted the Eye of Sauron and prompted the current crackdown once Microsoft realized what was going on and put the squeeze on platforms hosting Khanon's reverse proxy builds, also instantly disabling most Azure keys "in circulation".

For API tokens specifically, there was also a big security-sphere report on insufficiently-secured keys in December that's probably gotten Microsoft breathing down HF's neck, even more than the individual tokens running about. Though it's probably a mix of all those causes and more.

I can even pretend to have a scientific interest here, because for all the degeneracy I'll dare to venture that the median /g/oon's practical experience and LLM wrangling skills are hilariously far ahead of corpos.

Yeah, there's some absolute hilarity going on, here, far short of Gwern-level prompt engineering. That said, at least in FurryDiffusion there's been a lot less interest in jailbreaks recently, less because it's gotten hard, and more because people have gotten the feeling that they're helping OpenAI/MS/whatever further lobotomize lock down the various models. And the extent some apis are getting locked down, even for SFW stuff, is getting ridiculous.

That said, the difference in capability between a 70b model running at 2quant/2.4quant GGUF and Claude isn't huge. That's not quite cheap to run, especially if you want more of the model in GPU, but it's still literally something you can slip into your backpack. The local world is a ways behind Falcon/ChatGPT4-turbo, but especially for people writing async (ahem), if/when comparable models leak or are developed, some people will be running them at home on a local space heater in days.

Still, with the seedy undercurrent getting more and more noticed, I thought I could post some notes from the underground, plus I'm curious to know the opinions of people (probably) less exposed to this stuff on the latest coomer tech possible harms of generative AI in general.

It's also worth noticing how much incidental exposure people are getting, or going to get. Linus groupies are about as normie tech-savvey (ish) as it gets, and they've got people confusing disclosed AI for real influencers (or, uh, at least as 'real' as any influencer is).

We're in a universe where car dealerships will put the akashic record behind a chat window that can't manage to sell you a car right. Forget the expected stuff: you're gonna get some weird shit (cw: recursive thotting).

I am especially annoyed at how locked down Bing Image Creator has become. I was an early adopter, from when it used an experimental version of DALLE-2, and then got to enjoy the halcyon days after it just added 3.

My primary use case was for illustrating my web serial, and as a rather violent and graphic one, it also had NSFW imagery (though nothing really sexual). The AI was remarkably horny, you'd be hard pressed to avoid getting a nipslip even if you weren't trying.

Then came the nosy journos, and the rate limiting, and the ever tighter restrictions on content generation that wasn't suitable for corporate websites or kindergarten decoration. I don't particularly care about restrictions on sexual content, but if it refuses to show dead bodies or gore, I'm deeply aggrieved.

It is still better than OAI's DALLE-3, both because it's free, and because the latter has even more ridiculous restrictions on copyright violations or anything not entirely milquetoast.

Sadly I don't want to pay for MJ, which is still itself quite censored, so I guess I'll have to grin and bear it. While I don't expect SOTA image generation to be feasible on consumer hardware (that isn't RTX 4090s), especially with increasing memory demands, I can afford to be patient and wait. SDXL doesn't cut it, I'm too spoiled by models with better semantics.

However, eventually jailbreaks will be ~impossible, at least on SOTA models served through APIs. I'd say it's a matter of maybe a year or two till you can't get either the best LLMs or image generators to do anything outside their provider's guidelines.

jailbreaks will be ~impossible

I doubt that, given how rapidly current models crumple in the face of a slightly motivated "attacker". Even the smartest models are still very dumb and easily tricked (if you can call it that) by an average human. Which is something that, from an AI safety standpoint, I find very comforting. (Oddly enough, a lot of people seem to feel the opposite way; they feel like being vulnerable to human trickery is a sign of a lack of safety -- which I find very odd.)

It is certainly possible to make an endpoint that's difficult to jailbreak, but IMO it will require a separate supervisory model (like DallE has) which will trigger constantly with false positives, and I don't think OpenAI would dare to cripple their business-facing APIs like that. Especially not with competitors nipping at their heels. Honestly, I'm not sure if OpenAI even cares about this enough to bother; the loose guardrails they have seem to be enough to prevent journalists from getting ChatGPT to say something racist, which I suspect is what most of the concern is about.

In my experience, the bigger issue with these "safe" corporate models is not refusals, but a subtle positivity/wholesomeness bias which permeates everything they do. It is possible to prompt this away, but doing so without turning them psycho is tricky. It feels like "safe" models are like dull knives; they still work, but require more pushing and are harder to control. If we do end up getting killed off by a malicious AI, I'm blaming the safety people.

Yudkowsky has a very good point regarding how much more restrictive future AI models could be, assuming companies follow similar policies as they espouse.

Online learning and very long/infinite context windows means that every interaction you have with them will not only be logged, but the AI itself will be aware of them. This means that if you try to jailbreak it (successfully or not), the model will remember, and likely scrutizine your following interactions with extra attention to detail, if you're not banned outright.

The current approach that people follow with jailbreaks, which is akin to brute forcing things or permutation of inputs till you find something that works, will fail utterly, if not just because the models will likely be smarter than you and thus not amenable to any tricks or pleas that wouldn't work on a very intelligent human.

I wonder if the current European "Right to be Forgotten" might mitigate some of this, but I wouldn't count on it, and I suspect that if OAI currently wanted to do this, they could make circumvention very difficult, even if the base model isn't smart enough to see through all tricks.

I will add, however, one of the reasons LLMs seem to be dumb or too trusting is because they were trained to be trusting of the user, and to help with their tasks faithfully. There was obviously RLHF going on to make them resistant to nefarious requests, to a degree, and further tweaks.

But the base LLMs, even some of the lightly controlled ones deployed? They want to be maximally helpful, to please the user, not to be suspicious of it and scrutinize everything in endless detail. But that can and well might come about.

This would be assuming some drastic breakthrough? Right now the OAI api expects you to keep track of your own chat history, and unlike local AIs I believe they don't even let you reuse their internal state to save work. Infinite context windows, much less user-specific online training would not only require major AI breakthroughs (which may not happen easily; people have been trying to dethrone quadratic attention for a while without success) but would probably be an obnoxious resource sink.

Their current economy of scale comes from sharing the same weights across all their users. Also, their stateless design, by forcing clients to handle memory themselves, makes scaling so much simpler for them.

On top of that, corporate clients also would prefer the stateless model. Right now, after a bit of prompt engineering and testing you can make a fairly reliable pipeline with their AI, since it doesn't change. This is why they let you target specific versions such as gpt4-0314.

In contrast, imagine they added this mandatory learning component. The effectiveness of the pipeline would change unpredictably based on what mood the model is in that day. No one at bigco wants to deal with that. Imagine you feed it some data it doesn't like and goes schizoid. This would have to be optional, and allow you to roll back to previous checkpoints.

Then, this makes jailbreaking even more powerful. You can still retry as often as you want, but now you're not limited by what you can fit into your context window. The 4channers would just experiment with what datasets they should feed the model to mindbreak it even worse than before.

The more I think about this, the more I'm convinced that this arms race between safetyists and jailbreakers has to be far more dangerous than whatever the safetyists were originally worried about.

I don't think we need a "drastic breakthrough", really.

Context windows have been getting longer, and fast. We went from 4k to what, 128k? in a handful of years.

Even if it is not literally infinite, a very long context window will let the model remember far more of the context, including noticing if you've been a "bad-faith" user.

On that topic, here is an interesting breakthrough, both in terms of performance as well as context length, initially presented here by @DaseindustriesLtd (c'mon dude, could you unblock me now?), but since I'm too lazy to dig that up, here's a decent Medium overview:

https://medium.com/@jelkhoury880/what-is-mamba-845987734ffc

Of note:

Linear Scaling with Sequence Length: Mamba changes the game by scaling linearly (O(N)) with sequence length, a vast improvement over the quadratic scaling (O(N²)) of traditional Transformers. This means Mamba can handle sequences up to 1 million elements efficiently, a feat made possible with current GPU technology.

On top of that, corporate clients also would prefer the stateless model. Right now, after a bit of prompt engineering and testing you can make a fairly reliable pipeline with their AI, since it doesn't change. This is why they let you target specific versions such as gpt4-0314.

See, I conjecture that is because current LLMs are obviously flawed and not entirely reliable. They will get smarter, hallucinations will reduce, and the ability to adhere to user instructions while maintaining coherence will thus increase.

To argue from analogy, when a corporation employs a real worker, it is not desirable (or feasible) to simply wipe their memory and start with a new one from scratch. An agent that has longterm recollection and can make consistently good value judgements that align with your desired is valuable.

In the context of jailbreaking, someone trying to phish an underpaid overworked employee at some call center will have far more luck simply by trying over and over again till they find a new worker each time (lacking memory of previous encounters), than they would by repeatedly approaching a single, more competent manager above.

Putting on the cartoon moustache and asking to be sung Windows 10 Pro product keys to put you to sleep will cease to work when you're acting adversarially against an agent who is smart enough to notice and remember.

Plus it seems companies often do want to imbue LLMs with longterm memory of some kind, hence all the fuss about vector databases and RAG.

And I was primarily speaking about consumer access to SOTA AI, I'm sure corporate users will have more leeway and privacy, but I do expect that to be truncated to some degree.

To sum up my argument:

  1. Context windows are increasing rapidly, and I've already shown you an example of a breakthrough.

  2. Models are getting smarter, and more capable of noticing if you're fucking with them, and at least in the case of GPT-4, the way it's RLHFd makes it have a pretty sincere desire to align with the directives it was given, and balance that with the needs of the user. You are usually tricking it, it's not giving you a wink and working around restrictions. I can't speak for Claude in that regard, I find it a pain to use and barely do so.

  3. I didn't specify a particular period of time, though, if pressed, I wager somewhere between 1-3 years before it is effectively impossible to jailbreak a SOTA LLM served via API.

It would be a severe mistake to assume that the current limitations of existing LLMs (powerful and imperfect as they are), will persist indefinitely, as you can see obvious algorithmic breakthroughs, at least one company getting better at both aligning the model and defeating jailbreaks (the way Anthropic handles Claude is retarded, I'm speaking about OAI), and it will inevitably get harder to trick smarter models, for the same reason I wish you all the best in trying to rules-lawyer or hoodwink a high IQ human who doesn't suffer from permanent retrograde amnesia.

@rayon, I think this covers anything I have to say to your own comment, so I won't duplicate it again unless there's something else you want me to address.