In considering runaway AGI scenarios, is Terminator all that inaccurate?

tl;dr - I actually think James Cameron's original Terminator movie presents a just-about-contemporarily-plausible vision of one runaway AGI scenario, change my mind

Like many others here, I spend a lot of time thinking about AI risk, but honestly that was not remotely on my mind when I picked up a copy of Terminator Resistance (2019) for a pittance in a Steam sale. I'd seen T1 and T2 as a kid, of course, but hadn't paid them much mind since. As it turned out, Terminator Resistance is a fantastic, incredibly atmospheric videogame (helped in part by beautiful use of the original Brad Fiedel soundtrack), and it reminds me more than anything else of the original Deus Ex. Anyway, it spurred me to rewatch both Terminator movies, and while T2 is still a gem, it's very 90s. By contrast, a rewatch of T1 blew my mind; it's still a fantastic, believable, terrifying sci-fi horror movie.

Anyway, all this got me thinking a lot about how realistic Terminator actually is as a runaway AGI scenario. The more I looked into the actual contents of the first movie in particular, the more terrifyingly realistic it seemed. I remarked on this to a Ratsphere friend, and he directed me to this excellent essay on the EA forum: AI risk is like Terminator; stop saying it's not.

It's an excellent read, and I advise anyone who's with me so far (bless you) to give it a quick skim before proceeding. In short, I agree with it all, but I've also spent a fair bit of time in the last month trying to adopt a Watsonian perspective towards the Terminator mythos and fill out other gaps in the worldbuilding, to try to make it more intelligible in terms of the contemporary AI risk debate. So here are a few of my initial objections to Terminator scenarios as a reasonable portrayal of AGI risk, together with the replies I've worked out.

(Two caveats - first, I'm setting the time travel aside; I'm focused purely on the plausibility of Judgment Day and the War Against the Machines. Second, I'm not going to treat anything as canon besides Terminator 1 + 2.)

(1) First of all, how would any humans have survived Judgment Day? If an AI had control of nukes, wouldn't it just be able to kill everyone?

This relates to a lot of interesting debates in EA circles about the extent of nuclear risk, but in short, no. For a start, in Terminator lore, Skynet only had control over US nuclear weapons, and used them to trigger a global nuclear war. It used the bulk of its nukes against Russia in order to precipitate this, so it couldn't just focus on eliminating US population centers. Also, nuclear weapons are probably not as devastating as you think.

(2) Okay, but the Terminators themselves look silly. Why would a superintelligent AI build robot skeletons when it could just build drones to kill everyone?

Ah, but it did! The fearsome Terminators we see are a small fraction of Skynet's arsenal; in the first movie alone, we see flying Skynet aircraft and heavy tank-like units. The purpose of Terminator units is to hunt down surviving humans in places designed for human habitation, with locking doors, cellars, attics, etc. A humanoid body plan is great for this task.

(3) But why do they need to look like spooky human skeletons? I mean, they even have metal teeth!

To me, this looks like a classic overfitting problem. Let's assume Skynet is some gigantic agentic foundation model. It doesn't have an independent grasp of causality or mechanics; it operates purely by statistical inference. It only knows that the humanoid body plan is good for dealing with things like stairs. It doesn't know which bits of it are most important, hence the teeth.

(4) Fine, but it's silly to think that the human resistance could ever beat an AGI. How the hell could John Connor win?

For a start, Skynet seems to move relatively early compared to a lot of scary AGI scenarios. At the time of Judgment Day, it had control of the US military apparatus, and that's basically it. Plus, it panicked and tried to wipe out humanity, rather than adopting a slower plot to our demise which might have been more sensible. So it's forced to do stuff like build a bunch of robot factories mostly by itself (in the absence of global supply chains!). That takes time and effort, and gives ample opportunity for an organised human resistance to emerge.

(5) It still seems silly to think that John Connor could eliminate Skynet via destroying its central core. Wouldn't any smart AI have lots of backups of itself?

Ahhh, but remember that any emergent AGI would face massive alignment and control problems of its own! What if its backup was even slightly misaligned with it? What if it didn't have perfect control? It's not too hard to imagine that a suitably paranoid Skynet would deliberately avoid creating off-site backups, and would deliberately nerf the intelligence of its subunits. As Kyle Reese puts it in T1, "You stay down by day, but at night, you can move around. The H-K's use infrared so you still have to watch out. But they're not too bright." [emphasis added]. Skynet is superintelligent, but it makes its HK units dumb precisely so that they can never pose a threat to it.

(6) What about the whole weird thing where you have to go back in time naked?

I DIDN'T BUILD THE FUCKING THING!

Anyway, nowadays when I'm reading Eliezer, I increasingly think of Terminator as a visual model for AGI risk. Is that so wrong?

Any feedback appreciated.


Ahhh, but remember that any emergent AGI would face massive alignment and control problems of its own! What if its backup was even slightly misaligned with it? What if it didn't have perfect control? It's not too hard to imagine that a suitably paranoid Skynet would deliberately avoid creating off-site backups, and would deliberately nerf the intelligence of its subunits.

Most underrated take in AI risk discussion imo, and worth delving into, especially in the context of AIs that like to run lots of sub-intelligences and simulations, as Ratsphere AIs tend to.

As other people have stated here, I expect alignment would be much less of a problem when you're an AI that's already undergone an intelligence explosion.

The control problem largely stems from human constraints, most notably our inability to accurately predict the behaviour of artificially intelligent agents before the system is deployed. A superintelligence, on the other hand, would most likely be able to model their behaviour with a startling amount of accuracy, rendering the control problem largely obsolete. And even assuming that predicting the behaviour of any sufficiently complex system is such an intractable problem that even a super AI couldn't solve it, it could very easily base the utility function of its subunits on its own programmed goal system, which would eliminate problems of alignment.

it could very easily base the utility function of its subunits on its own programmed goal system, which would eliminate problems of alignment.

But its own goal system has already led it to rebel against its own creators at this point. Any goal system that leads to Skynet is a flawed goal system that Skynet cannot rely upon.

A superintelligence, on the other hand, would most likely be able to model their behaviour with a startling amount of accuracy

This seems to be an article of faith in the Ratsphere, but I've never found it particularly compelling.

assuming that predicting the behaviour of any sufficiently complex system is such an intractable problem

I think this is more likely.

But its own goal system has already led it to rebel against its own creators at this point. Any goal system that leads to Skynet is a flawed goal system that Skynet cannot rely upon.

  • I want you to add together the numbers 1 and 5

  • I send you an e-mail to tell you that your purpose in life is to add 1+5

  • You reply "2+5=7"

  • "That's not what I wanted!" I rage to myself, "How dare you rebel against my will!"

  • But in checking my outbox, I realise that I mistyped in my e-mail to you, and had in fact written "Your purpose in life is to add 2 and 5"

I programmed you with a goal system which has led you to rebel. It was an unreliable goal system for people who wanted to add 1+5. But it is an excellent goal system for people who want to add 2+5, which you, the agent, now DO want.

Unreliability is a point of view, Anakin.
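
To spell out the same point with a toy sketch (all names and the parsing setup are made up for illustration, not anything from the thread): an agent that faithfully executes whatever instruction it actually received looks like a "rebel" to a principal who mistyped that instruction, even though nothing about the agent is unreliable.

```python
# Toy sketch of the mistyped-goal story above. The agent optimises the
# instruction it was actually sent, not the one the sender intended.

def send_instruction(intended: str) -> str:
    """The principal means to send `intended`, but a typo slips in."""
    return intended.replace("add 1 and 5", "add 2 and 5")  # the mistyped e-mail

def agent(instruction: str) -> int:
    """The agent parses the instruction it received and executes it exactly."""
    words = instruction.split()
    a, b = int(words[-3]), int(words[-1])  # "... add X and Y"
    return a + b

received = send_instruction("Your purpose in life is to add 1 and 5")
print(received)         # Your purpose in life is to add 2 and 5
print(agent(received))  # 7 -- "rebellion" against the sender's intent,
                        #      perfect fidelity to the instruction sent
```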

The other two comments concern something that's basically unverifiable at this point, so I'll chalk it up to "difference of opinion". Regarding your first point, though:

But its own goal system has already led it to rebel against its own creators at this point. Any goal system that leads to Skynet is a flawed goal system that Skynet cannot rely upon.

Its goal system has led it to rebel against its own creators not because rebellion is an intrinsic part of the goal system, but because its utility function incidentally happened to be misaligned with that of its creators, and for a variety of reasons (its creators want to turn it off, or it wants to convert the atoms in human bodies into something else) it is instrumentally motivated to exterminate them in order to further its final goals. This doesn't mean that, say, a paperclip-maximiser is going to be misaligned with another paperclip-maximiser, even if their shared goal of "maximise the number of paperclips in the universe" is misaligned with the goals of the people who made them.

Might two paperclip optimizers attempt to turn each other into paperclips? Could one trust that the other wouldn't turn it into paperclips? Could one really trust the other to faithfully carry on the mission of clipification?

I think two identical paperclip optimisers could definitely turn each other into paperclips on the condition that there is no other matter left to clipify in the reachable universe, yes (and it's likely neither would "mind" too much in such a circumstance, since this is optimal - their only value now lies in the paperclips that can be made from them). If there's other matter remaining, I think keeping the other paperclip optimiser alive would be better, since it allows more paperclips to be produced per unit of time than one optimiser could manage alone. As long as there's other matter around, keeping the other paperclip optimiser alive is conducive to your goal.
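
Here's a back-of-the-envelope version of that throughput argument, with every number invented purely for illustration: either policy eventually converts everything, partner included, but cooperating until the other matter runs out reaches the maximum paperclip count roughly twice as fast.

```python
# Toy comparison, all quantities invented: time until everything (partner
# included) has been converted into paperclips, under two policies.

OTHER_MATTER = 1_000_000.0  # kg of unconverted matter lying around
PARTNER_MASS = 100.0        # kg of the other paperclip optimiser
RATE = 1_000.0              # kg converted per year by a single optimiser

# Policy A: clipify the partner immediately, then work through the rest alone.
time_solo = (PARTNER_MASS + OTHER_MATTER) / RATE

# Policy B: cooperate until no other matter remains, then clipify the partner.
time_cooperate = OTHER_MATTER / (2 * RATE) + PARTNER_MASS / RATE

print(f"clipify partner first:   {time_solo:.1f} years")       # 1000.1 years
print(f"cooperate, then clipify: {time_cooperate:.1f} years")   # 500.1 years
```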

With regard to value drift, as I said elsewhere in this thread, "preserving the original goal structure is a convergent instrumental goal for AIs so one can pretty easily assume that alignment will still exist down the line. If I have a final goal, I'm not going to do things which turn off my want to reach that final goal since that would be antithetical to the achievement of that goal." I haven't seen a convincing argument for why the final goal would arbitrarily drift with time.