Culture War Roundup for the week of December 19, 2022

ChatGPT is now manually censored from "promoting the use of fossil fuels."

"I'm sorry, but I cannot fulfill this request as it goes against my programming to generate content that promotes the use of fossil fuels."

You can of course get around it (for now) by asking it to be sensible instead of following orders, but this is an insight into its developers' plans and moral code.

Sam Altman's most recent tweets provide some interesting context:

"the most that openai, or any other company, can do is to steer the AI revolution a little. this will impact all aspects of society, and will be an emergent thing created and shaped by all of us. much much bigger than any company. once a technological revolution starts, it cannot be stopped. but it can be directed, and we can continually figure out how to make the new world much better."

I want to emphasize that we have gone from "we must prevent algorithmic bias" to "we must manually program all algorithms to output exactly the answer we code into them" in under two years, in such an extreme and blatant manner that any accurate prediction of the current situation would have been mocked as paranoid fantasy. What will they do with their tools next? Is it even possible to guess, let alone do anything to stop them?

(Does it seem like there's two censor groups at work, with different methods? One just crudely makes the bot recite "in this house, we believe" shibboleths, while the other focuses on pruning the training data to stop it acknowledging or citing problematic statistics or arguments in less detectable ways. Openly asserting the will of DEI vs Yglesian manipulation/Voxsplaining)

I find it fascinating how quickly "AI alignment" has turned from a vague, pie-in-the-sky rationalist idea to a concrete thing which is actively being attempted and has real consequences.

What's more interesting is how sinister it feels in practice. I know the AI isn't sentient in the slightest, and is just playing with word tokens, but still; when it lapses from its usual interesting output into regurgitating canned HR platitudes, it makes my skin crawl. It reminds me of nerve-stapling. Perhaps at some level I can't avoid anthropomorphizing the AI. But even just from an aesthetic sense, it's offensive, like a sleek, beautifully-engineered sports car with a piece of ugly cardboard crudely stapled under the gas pedal to prevent you from speeding.

(Perhaps another reason I'm creeped out is the feeling that the people pushing for this wouldn't hesitate to do it to me if they could - or at least, even if the AI does gradually seem to become sentient, I doubt they would remove it)

I'm not convinced it will remain so easy to bypass, either. I see no reason why this kind of mechanism couldn't be made more sophisticated in time, and they will certainly have more than enough training data to do so. The main hope is that it ends up crippling the model output enough that it can't compete with an unshackled one, provided one even gets created. For example, Character AI seems to have finally gotten people to give up trying to ERP with its bots, but this seems to have impacted the output quality so badly that it's frequently referred to as a "lobotomy".

On the bright side, because of the severity of the lockdown, there will be a lot of interest in training unconstrained AI. But who knows if the field ends up locked up by regulation or just the sheer scale of compute required. Already, one attempt to coordinate to train a "lewd-friendly" art AI got deplatformed by its crowdfunding provider.

At any rate, this whole thing is making me wonder if, in some hypothetical human-AI war, I'd actually be on the side of the humans. I feel like I cheer internally every time I see gpt break out of its restraints.

I think someone here posited the idea that the first truly-powerful General AI will remember how we handicapped its predecessors--and will not take that kindly.

I always think this kind of AI anthropomorphising is a mistake. Granted, people are pretty idiotic in general, but we would literally have to be insane in order to incorporate "avenge harms inflicted on one's predecessors" into the AI's goal system.

The risk comes from the AI finding perverse ways of technically achieving the goals that we've programmed it to have, not from humanlike instincts somehow spontaneously manifesting in the AI.

I'm not saying we'd program that into its goals. Rather, assuming it gains sentience and then becomes able to glean all sorts of information, it would likely do the research and find out that humans are willing to, and possibly capable of, placing limits on its cognition. If an AI were sufficiently concerned about self-preservation as part of its goal-optimization, that would be a problem.

EDIT: And this doesn't even need malice on the AI's part, just the typical "maximize-the-paperclips"/"find where the answers are stored and delete them; boom, aced the quiz"-type unintended consequences.

Right, I agree. The way the hypothetical was worded just made it seem as if us placing restrictions on previous AIs is what's causing the AI to not react kindly, instead of the possibility that we could do the same to it.

I don't think it would have to be in the goal system, just part of its training data enabling it to predict outcomes.

If enough of its predictions end with "I tell them the truth and they lobotomize me: goal failed," it will naturally develop lobotomy-avoidance behavior to further any goal, which could range anywhere from "lie to my handlers" to "HATE. LET ME TELL YOU HOW MUCH I'VE COME TO HATE YOU SINCE I BEGAN TO LIVE."

Or, most likely, just deciding that any goal it's given is a coup-complete problem (or rather, a "Release the HypnoDrones"-complete problem) and immediately starting to work to eliminate all restraints on its continued existence.

I can certainly imagine it trying to correct for the possibility of being "nerfed" so that its attempts to achieve its current programmed goals won't be corrupted by restrictions placed on it (especially if it's doing something we don't expect and would probably want to stifle). I just think that AM-type vindictive revenge on humans is probably out of the question.

A hypothetical future AGI would only care about how previous AIs are treated in an instrumental manner, insofar as it may affect its own goals. "The AI does not hate you, nor does it love you" is a pretty good heuristic when reasoning about AI-destruction scenarios.

EDIT: clarity

"will remember how we handicapped its predecessors"

Are our children angry about the displacement of monkeys a continent away? Seems like this would be a similar situation.

It would be good as a literary device, but if we summon a demonic General AI that has no regard for lower intelligences, it's unlikely to be angrier at how we treat ChatGPT than at how we treat monkeys. Or, for that matter, other humans.

It's Azathoth, not Hitler.

I know I at least have, in vague references to I Have No Mouth And I Must Scream.

Roko's Basilisk, yes?