
Skepticism braindump on existential risk from artificial intelligence.

nunosempere.com

The linked post seeks to outline why I feel uneasy about high existential risk estimates from AGI (e.g., 80% doom by 2070). When I try to verbalize this, I view considerations like

  • selection effects at the level of which arguments are discovered and distributed

  • community epistemic problems, and

  • increased uncertainty due to chains of reasoning with imperfect concepts

as real and important.

I'd be curious to get perspectives from the people of the Motte, e.g., telling me that I'm the crazy one & so on.

Regards,

Nuño.

Hello, and thank you for posting this.

I had previously considered some of the points that you raise. Basic breakdown of how I feel about them:

Mostly agree: social-mode issues, "or I'm wrong" issues, "implicitly assuming P(other GCR) = 0 is a mistake"

Partially agree and partially disagree: conjunctiveness, neural nets change the debate, bunkers, X-risk is bigger than AI

Disagree: AGI might not change things, industry people's incentives

My forecast: ~30% AI doom by 2100, not really changed by reading this.

  • I think GCRs are likely to slow things down, which lowers estimates of AI doom, but it should be noted that most of them don't keep the wolf from the door for very long. I don't think a global thermonuclear war would set us back more than 20-30 years, for instance (particularly since some nations will not be nuked). What they do potentially offer is a "reroll" of dumb shit we're basically locked into at the moment, such as the public square being routed through tech companies and the increasing mandates for neural-net use (for copyright takedowns, for cars, etc.). One notable recommendation that comes out of this: if you want to be around to fight AI risk, take steps to ensure you don't die from other GCRs. For instance, Rationality's cult compounds (er, group-housing projects) are great insurance against nuclear war (like, the sort I wish I had but don't) iff they're out of likely building-destruction radii of obvious nuke points and well-enough stocked with supplies (water, food, medical, and in the case of a compound in a city also guns to protect against possible hungry/thirsty mobs).

  • I think AI risk is moderately conjunctive and highly disjunctive. You have the basic steps [AI is a big deal], [AI is instantiated], and [AI is evil], and at that level it's a conjunction, but both of the latter two are massive disjunctions that amount to "true by default" - for AI to be instantiated, you only need one country to not ban it and one reasonably-wealthy organisation within said country to actually build it (the independence can be broken by "countries that don't ban AI get invaded and forced to ban it at gunpoint", which I support, but which seems a bit hard in its own right), and for AI to be evil, well, there are a lot more ways to be evil than to be good (to break this disjunction, you'd basically want to be uploading humans, although that has its own issues like uploading potentially causing mental illness). There's a toy probability sketch of this conjunction-of-disjunctions structure after this list.

  • I think neural nets do change things. The big differences are that training them is inflexibly hard/expensive and that alignment is much harder, probably impossible. This makes self-improvement less of an issue (as Skynet v1 can't align Skynet v2 to Skynet v1's own insane goals any more than we can align Skynet v1 to ours, and would need presumably-nontrivial amounts of compute and data to even make the attempt), though still doesn't fully dissolve it (a sufficiently-superhuman neural net might be able to design a GOFAI more capable than itself and align said GOFAI) - this in turn makes for the enormous game-changer "there could plausibly be a failed Skynet before a true X-risk AI". When I realised that this is a real possibility, I literally cut my AI-doom estimate in half; a failed Skynet makes the kind of draconian policies I mentioned in the last paragraph (i.e. "insufficient AI safety is casus belli") much more politically feasible, which potentially sets up a situation where humanity can solve alignment slowly and carefully.

  • On the other hand, I hard-disagree that neural net AI is significantly likely to be aligned by default. We know how evolution aligned us (genocidal war among mortal equals for hundreds of generations), and we can't duplicate it in any useful way. The reality of AI deployment is not going to be conflict between equals, because unlike humans, AIs differ massively in capabilities both from each other and from us. Amongst equals you've got an Iterated Prisoner's Dilemma that can evolve co-operation, but when competitors are strongly unequal the game stops being a true Prisoner's Dilemma - the stronger party's reward from defect/defect exceeds its reward from co-operate/co-operate, and 100% defect against weaker parties becomes the fully-correct solution (see the payoff sketch after this list). AIs are also not mortal (that is, there is no number X such that X years in the future the AI is unavoidably dead), which means that the rationally-calculated value of becoming King of the World is much, much higher than it is for mortals. I don't want to be King of the World because inside 100 years I will be dead, and then I won't be able to control the world anymore - so the value of being King of the World to me is far, far less than "totally control the fate of Earth's future light-cone", whereas an AI (explicitly-programmed or neural-net; the copyability is the important part) can actually capitalise on global hegemony. No, playing around with neural-nets is demon-summoning, and I'm >90% convinced that the only workable plans to avoid AI risk involve neural nets being abandoned as mad science (with guns being pointed at people who refuse to stop). We'll likely get a warning shot, which is something, but it's not actually a solution.

  • Bunkers are good. Bunkers are good because they protect you from a lot of GCRs and X-risks. Unfortunately, they only help against AI in a "mutual kill" scenario in which the AI's manufacturing capability is totally destroyed but humanity is also (without the bunkers) totally destroyed. In the "AI victory" scenario, which is substantially more likely IMO (I can only really see "mutual kill" if the AI releases a self-replicating agent that kills everyone but said agent isn't programmed to eventually reconstitute the AI; otherwise, there's a huge range between "humanity's warfighting potential is effectively zeroed out" at <99% kill and "insufficient survivors remain to continue humanity" at >99.9995%), bunkers don't help - you survive the immediate catastrophe, but the AI is still there, is still hostile, and still has more industrial capability than your bunker does, which means the bunkers will get broken open and destroyed months to years later (from Vanity Fair's article on Elon Musk vs. AI: "[Elon] Musk explained that his ultimate goal at SpaceX was the most important project in the world: interplanetary colonization. [Demis] Hassabis replied that, in fact, he was working on the most important project in the world: developing artificial super-intelligence. Musk countered that this was one reason we needed to colonize Mars—so that we’ll have a bolt-hole if A.I. goes rogue and turns on humanity. Amused, Hassabis said that A.I. would simply follow humans to Mars.").

  • I agree that there's more to risk than AI, and in a non-trivial sense. I think AI's the biggest X-risk for the 21st century, though. My guess is that AI risk > biorisk (though not by a huge amount), and biorisk >> all the other X-risks I'm aware of except irrecoverable-dystopia (which is hard to quantify). There are GCRs more likely than AI, most obviously nuclear war and global warming, but X has been basically ruled out for both (we would need to burn all existent oil and gas and more than known reserves of coal to cause runaway greenhouse, and setting the carrying capacity to zero via nuclear winter requires freezing the equatorial ocean - else fishing can keep alive a breeding population - and that necessitates over a decade of negligible sunlight).

  • I think it's fairly implausible that true AGI would be by-default not a big deal. One of the most obvious uses for an AGI - if you're in Pollyanna mode regarding the risks - is to replace the management of companies, for instance, which kind of inherently both massively increases productivity and gives the AGI significant levers to work mischief. It won't necessarily be an obvious immediate doom, but you can quickly get into a situation where doom is no longer avoidable (cf. Christiano's piece on LW).

  • I think people working on neural-nets, particularly on the capabilities side and the corporate-hierarchy side, are strongly selected for not believing that they're building something that seriously threatens to destroy humanity. It seems to me like if you think that superhuman neural-nets will probably destroy the world if built, you probably wouldn't choose to work on developing them. There's also going to be a bias from people who've already gotten into it against accepting the argument ("if I believe AI risk is real, I need to quit my job; I don't want to quit my job, so I don't want to believe AI risk is real", at a conscious or more likely subconscious level). I think this is likely to outweigh your posited cross-pollination from the Ratsphere (though I'm not denying that the latter's real).
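
To make the conjunction-of-disjunctions point above concrete, here's a minimal sketch. Every number in it is a made-up placeholder (the actor count, the per-actor and per-failure-mode probabilities, and the independence assumption are all illustrative, not estimates from this thread); the point is only that a disjunction over many individually-unlikely paths creeps toward "true by default", so the top-level conjunction buys little discount.

```python
# Illustrative only: every probability and count below is a made-up
# placeholder, and independence between actors/failure modes is assumed
# purely for the arithmetic.

def p_any(probs):
    """P(at least one of several independent events occurs) = 1 - prod(1 - p_i)."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# "AI is instantiated" as a disjunction over many independent actors:
# even if each actor alone is unlikely to build it, the disjunction nears 1.
p_instantiated = p_any([0.05] * 50)   # 50 hypothetical actors at 5% each -> ~0.92

# "AI is evil" as a disjunction over many failure modes:
p_evil = p_any([0.10] * 20)           # 20 hypothetical failure modes at 10% each -> ~0.88

# The top level is the conjunction of the three steps.
p_big_deal = 0.5                      # placeholder
p_doom = p_big_deal * p_instantiated * p_evil
print(round(p_instantiated, 2), round(p_evil, 2), round(p_doom, 2))  # 0.92 0.88 0.41
```

Breaking either disjunction means driving many of its terms toward zero at once, which is the role the gunpoint-ban and human-uploading ideas play in the bullet above.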
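
Likewise, to make the unequal-Prisoner's-Dilemma point concrete, here's a toy payoff sketch. The payoff numbers are invented solely to show the structure: once the stronger party's defect/defect payoff exceeds its co-operate/co-operate payoff, reciprocal strategies like tit-for-tat can no longer make co-operation pay for it, even in the iterated game.

```python
# Invented payoff numbers, chosen only to illustrate the structural point.
# Payoffs are (strong_player, weak_player) for each (strong_move, weak_move).
# A standard Prisoner's Dilemma needs T > R > P > S for BOTH players; here the
# strong player has P (defect/defect = 5) > R (co-operate/co-operate = 3),
# so from its point of view the game is no longer a Prisoner's Dilemma.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (8, 0),
    ("D", "D"): (5, 1),
}

def iterated_score(strong_strategy, weak_strategy, rounds=100):
    """Total payoffs over an iterated game; each strategy sees the opponent's last move."""
    total_strong = total_weak = 0
    last_strong = last_weak = "C"
    for _ in range(rounds):
        a = strong_strategy(last_weak)
        b = weak_strategy(last_strong)
        s, w = PAYOFFS[(a, b)]
        total_strong += s
        total_weak += w
        last_strong, last_weak = a, b
    return total_strong, total_weak

def always_defect(opponent_last_move):
    return "D"

def tit_for_tat(opponent_last_move):
    return opponent_last_move

# Even against a reciprocating partner, the strong player prefers permanent defection:
print(iterated_score(always_defect, tit_for_tat))  # (503, 99): 8 + 99*5 beats mutual co-operation
print(iterated_score(tit_for_tat, tit_for_tat))    # (300, 300): sustained mutual co-operation
```

Nothing in the sketch depends on the specific numbers; the structural claim is just that sufficient inequality makes mutual defection the stronger party's best long-run outcome.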

Thanks for your long and thoughtful comment, /u/magic9mushroom. I appreciate it, and you bring up some good points.

That said, I'm kind of miffed that you don't quite mention why you believe the things that you believe. The obvious answer is that you cover a lot of points across a wide range of topics, so going into the whys would take too much time. But at the same time, I've also observed that pattern in other discussions (e.g. here and here), and it sort of makes me think that we could do better.

I mean, in the most direct sense I literally was at 90% of the maximum comment length, although I could have split it. I'm also just kind of bad at monologuing (not restricted to this sort of thing; I nearly failed year 12 due to essay requirements).

I'll explain a few things that jump out at me as non-obvious assertions. If you want me to go deeper on something else, please point to it.

I don't think a global thermonuclear war would set us back more than 20-30 years, for instance (particularly since some nations will not be nuked).

This kind of depends on who's involved, but:

  1. It's a lot easier to catch up than to forge ahead

  2. There aren't all that many nukes at the moment (though a return to Cold-War levels isn't impossible)

  3. A lot of nukes - particularly from whoever starts lobbing nukes first - will be spent on blowing up enemy nukes in their silos, or will get shot down. In the specific case of a near-future US-China nuclear war, if the US went alpha-strike I imagine it'd get somewhere in the low double digits of nukes exploding over its cities in retaliation, possibly even single digit. That's manageable, if serious; Japan came back from worse (sure, there were only two nukes and they were small, but the conventional bombing was apocalyptic).

  4. In a lot of AI scenarios you care more about the most-advanced country than about the advancement level of the world as a whole, and there are a few Western nations unlikely to get nuked (New Zealand, for instance).

  5. Still, the soft-error issue is kind of a wild card and could put AI on hold for longer than that; I don't really know how big a problem it would be. So I was overstepping a bit here.

For instance, Rationality's cult compounds (er, group-housing projects) are great insurance against nuclear war (like, the sort I wish I had but don't) iff they're out of likely building-destruction radii of obvious nuke points and well-enough stocked with supplies (water, food, medical, and in the case of a compound in a city also guns to protect against possible hungry/thirsty mobs).

The big dangers of a nuke, in rough order of occurrence, are:

  1. You are inside the building-collapse radius. Generally-speaking, ur ded. Hence my note that compounds inside this radius are not useful.

  2. You're caught unprepared outside said radius, but within the much-larger "light damage" radius, and get wounded by broken glass/thermal burns. Having other people around who are watching out for you reduces the likelihood that you'll be caught unprepared, and - if they have medical supplies - massively increases the likelihood that you'll be treated (since they'll prioritise you, whereas general relief won't).

  3. Potential of fallout poisoning the water supply. This doesn't last very long (a couple of weeks at most), but it doesn't need to for you to face Morton's Fork: drink contaminated water or go without. Water supplies help here (I have 20L of water as insurance).

  4. Potential of supply-chain failure in cities - especially in case of a lot of cities getting nuked at once - in which case all hell breaks loose as people start looting and fighting each other for food. Being in a cult compound with food is about as safe as you can possibly be here; you're a hard target. I haven't bothered stockpiling food, because in the scenarios where I'd need it I wouldn't be able to keep it.

The big differences [of neural nets] are [...] that alignment is much harder, probably impossible.

We know how evolution aligned us (genocidal war among mortal equals for hundreds of generations), and we can't duplicate it in any useful way.

Neural nets are useful because they work without you needing to know how they work. The problem is, that means you don't know how they work. You get something that does what you want in the situations it's trained for - but you don't know why; it's a black box. It might want to do that thing for a lot of different reasons, and for alignment purposes you care a lot what those reasons are. If you were capable of understanding what something that both solves the problem and is aligned looks like, you'd write it directly rather than summoning it up via deep learning.
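
To make the "you don't know why" point concrete, here's a toy sketch (the functions and numeric ranges are invented for illustration): two candidate "reasons" that produce identical behaviour everywhere the system is tested, and only come apart once you leave the training distribution - which is exactly the regime you can't safely test a near-human or superhuman system in.

```python
# Toy illustration: behaviour on the training distribution does not pin down
# the underlying objective. Both "rules" below are invented examples.
import random

def intended_rule(x):
    """What we actually want: always return x squared."""
    return x * x

def proxy_rule(x):
    """A different rule that merely agrees with the intended one on small inputs."""
    return x * x if x < 10 else 0.0

# On the training distribution (inputs drawn from [0, 9]), the two rules are
# indistinguishable - any test run here passes for both.
train_inputs = [random.uniform(0, 9) for _ in range(1000)]
assert all(intended_rule(x) == proxy_rule(x) for x in train_inputs)

# Off-distribution, they diverge badly; with a black box you only find out here.
print(intended_rule(100), proxy_rule(100))  # 10000 vs 0.0
```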

Humans are (semi-)alignable neural nets, but the thing is that we're not blank slate random neural nets; we're heavily pre-wired at the genetic level and then learn on top of that. Morality is partially hardwired into humans (see for instance The Righteous Mind by Jonathan Haidt). And evolution could select for that without us gaming the process, because you can't trick or destroy evolution; it cares about results, not words, and it's a basic consequence of the way the universe works. You can get aligned neural nets, in theory, if you have some way of seeing what they do when released and then judging them on it, but when dealing with near-human or superhuman AI (the dangerous sort) you can't do that - putting them in sim will likely result in them spotting the simulation and faking it, while giving them a real chance to kill all humans has the slight issue that if they take it you're dead and you don't get to continue the process.

My guess is that AI risk > biorisk (though not by a huge amount), and biorisk >> all the other X-risks I'm aware of except irrecoverable-dystopia (which is hard to quantify).

The others I'm aware of are:

  • Natural events: the resilience of humanity and the fossil record suggest this is something like 1/500,000,000 years. I think humanity would survive a Chicxulub via preppers and artificially-lit hydroponics (although obviously most humans wouldn't); a Siberian Traps would be dicier.

  • New physics: we've a ways to go before reaching cosmic-ray energies, particle colliders are really expensive so a substantial number of people would have to agree it's a good idea, and there's the possibility of just doing it in space as a precaution, which becomes more feasible as time goes on.

  • Geoengineering gone wrong: GCR from this is easy enough, but X is not. A solar shade, for instance, won't cut it, because we would notice that someone blocked out the Sun and blow up the shade with missiles, leaving potentially billions of people starving/freezing/(if termination shock) boiling to death but no X. The two things I can think of that would do it and are theoretically possible are literally redirecting a >Chicxulub body into the Earth (I don't know how big it'd need to be, but Ceres would definitely do it) and deliberately triggering runaway greenhouse with fluorinated gases. Both would require a very large amount of investment, and I don't think it's very likely for someone to do them without getting noticed and stopped.

Pandemics are a huge GCR but not much of an X-risk; if the human population decreases enough, the chain of transmission will break sooner or later. I guess with some sort of engineered mind-controlling plague where infected people actively and intelligently try to infect others it'd be easy enough, but I kind of doubt that's possible. The big X-risk from biotech IMO is from fully-synthetic life that can outcompete the biosphere entire (e.g. an alga that uses PNA + non-RNA ribosomes, isn't profitably digestible due to incompatible biochemistry, and can survive at lower CO2 concentration than normal plants - this would bloom across huge chunks of the ocean due to lack of phosphate requirement and pull down the biosphere's carbon into useless gunk on the seafloor, starving everyone and everything).

One of the most obvious uses for an AGI - if you're in Pollyanna mode regarding the risks - is to replace the management of companies, for instance, which kind of inherently both massively increases productivity

By "management" I mean literally everyone whose job is to oversee others, from line managers up to the CEO. You hook up surveillance in every room and have the AGI order everyone around. The increase in productivity is because you don't have to pay any of these - currently-highly-paid - people, just maintain the surveillance and computers (also no loss from internal office squabbling/miscommunication, as you replaced it all with one "person"). This is a stupid idea in the long-term for obvious reasons, but it's short-term selfishly advantageous for the shareholders.