Culture War Roundup for the week of November 28, 2022

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Regarding AI alignment -

I'm aware of and share @DaseindustriesLtd's aesthetical objection that the AI safety movement is not terribly aligned with my values itself and the payoff expectation of letting them perform their "pivotal act" that involves deputy godhood for themselves does not look so attractive from the outside, but the overall Pascal's Mugging performed by Yudkowsky, TheZvi etc. as linked downthread really does seem fairly persuasive as long as you accept the assumptions that they make. With all that being said, to me the weakest link of their narrative always actually has been in a different part than either the utility of their proposed eschaton or the probability that an AGI becomes Clippy, and I've seen very little discussion of the part that bothers me though I may not have looked well enough.

Specifically, it seems to me that everyone in the field accepts as gospel the assumption that AGI takeoff would (1) be very fast (minimal time from (1+ε) × human capability to C × human capability for some C on the order of theoretical upper bounds) and (2) be irreversible (P(the most intelligent agent on Earth will be an AGI n units of time in the future | the most intelligent agent on Earth is an AGI now) ~= 1). I've never seen the argument for either of these two made in any other way than repetition and a sort of obnoxious insinuation that if you don't see them as self-evident you must be kind of dull. Yet, I remain far from convinced of either (though, to be clear, it's not like I'm convinced of their negations either).

Regarding (1), the first piece of natural counterevidence to me is the existence of natural human variation in intelligence. I'm sure you don't need me to sketch in detail an explanation of why the superintelligent-relative-to-baseline Ashkenazim, or East Asians, or John von Neumann himself didn't undergo a personal intelligence explosion, but whence the certainty that this explanation won't in part or full also be relevant for superintelligent AGIs we construct? Sure, there is a certain argument that computer programs are easier to reproduce, modify and iterate upon than wetware, but this advantage is surely not infinitely large, and we do not even have the understanding to quantify this advantage in natural units. "Improving a silicon-based AI is easier than humans, therefore assume it will self-improve about instantaneously even though humans didn't" is extremely facile. It took humans like 10k years of urbanised society to get to the point where building something superior to humans at general reasoning seems within grasp. Even if that next thing is much better than us, how do we know if moving another step beyond that will take 5k, 1k, 100, 10 or 1 year, or minutes? The superhuman AIs we build may well come with their own set of architectural constraints that force them into a hard-to-leave local minimum, too. If the Infante Eschaton is actually a transformer talking to itself, how do we know it won't be forever tied down by an unfortunately utterly insurmountable tendency to exhibit tics in response to Tumblr memes in its token stream that we accidentally built into it, or a hidden high-order term in the cost/performance function for the entire transformer architecture and anything like it, for a sweet 100 years where we get AI Jeeves but not much more?

Secondly, I'm actually very partial to the interpretation that we have already built "superhuman AGI", in the shape of corporations. I realise this sounds like a trite anticapitalist trope, but being put on a bingo board is not a refutation. It may seem like an edge case given the queer computational substrate, but at the same time I'm struggling to find a good definition of superhuman AGI that naturally does not cover them. They are markedly non-human, have their own value function that their computational substrate is compelled to optimise for (fiduciary duty), and exhibit capacities in excess of any human (which is what makes them so useful). Put differently, if an AI built by Google on GPUs does ascend to Yudkowskian godhood, in the process rebuilding itself on nanomachines and then on computronium, what's the reason for the alien historian looking upon the simulation from the outside to place the starting point of "the singularity" specifically at the moment that Google launched the GPU version of the AI to further Google's goals, as opposed to when the GPU AI launched the nanomachine AI in furtherance of its own goals, or when humans launched the human-workers version of Google to further their human goals? Of all these points, the last one seems to be the most special one to me, because it marks the beginning of the chain where intelligent agents deliberately construct more intelligent agents in furtherance of their goals. However, if the descent towards the singularity has already started, so far it's been taking its sweet time. Why do we expect a crazy acceleration at the next step, apart from the ancient human tendency to believe ourselves to be living in the most special of times?

Regarding (2), even if $sv_business or $three_letter_agency builds a superhuman AI that is rapidly going critical, what's to say this won't be spotted and quickly corroborated by an assortment of Russian and/or Chinese spies, and those governments don't have some protocol in place that will result in them preemptively unloading their nuclear arsenal on every industrial center in the US? If the nukes land, the reversal criterion will probably be satisfied, and it's likely enough that the AI will be large enough and depend on sufficiently special hardware that it can't just quickly evacuate itself to AWS Antarctica. At that point, the AI may already be significantly smarter than humans, without having the capability to resist. Certainly the Yudkowsky scenario of bribing people into synthesising the appropriate nanomachine peptides can't be executed on 30 minutes' notice, and I doubt even a room full of uber-von Neumanns on amphetamines (especially ones bound to the wheelchair of specialty hardware and a reliable electricity supply) could contrive a way to save itself from 50 oncoming nukes in that timespan. Of course this particular class of scenario may have very low probability, but I do not think that that probability is 0; and the more slowness and perhaps also fragility of early superhuman AIs we are willing to concede per point (1), the more opportunities for individually low-probability reversals like this arise.

All in all, I'm left with a far lower subjective belief that the LW-canon AGI apocalypse will happen as described than Yudkowsky's near-certainty that seems to be offset only by black swan events before the silicon AGI comes into being. I'm gravitating towards putting something like a 20% probability on it, without being at all confident in my napkinless mental Bayesianism, which is of course still very high for x-risk but makes the proposed "grow the probability of totalitarian EA machine god" countermeasure look much less attractive. It would be interesting to see if something along the lines of my thoughts above has already been argued against in the community, or if there is some qualitative (because I consider the quantitative aspect to be a bit hopeless) flaw in my lines of reasoning that stands out to the Motte.

All else equal, how would you fare in a fistfight with a guy whose reach is 10" longer?

That's roughly how I think about this stuff. Qualitative transitions in capability are unnecessary: quantitative differences in mundane variables can change the whole game board. And, as is the custom, gwern has dissected arguments against superintelligent AIs. Once they get to human level, and they seem to be getting there already, still with very modest costs (ChatGPT probably can run on 8x3090s, so like $10000 of hardware, drawing about 3 kW, i.e. 3 kWh or roughly $0.5 per hour, and that's about the most dumbass way to run it; at scale, inference for a single «thread» can cost as little as 10…1 cent/hour, I guess), it's game over – unless parties that control them can be prevented from capitalizing on this tool and scaling it up, which they, so far, aren't, except by woke ethicists.
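A rough sketch of the arithmetic behind those figures, for anyone who wants to plug in their own numbers. The hardware price and the 3 kWh per hour of consumption are the guesses from the paragraph above; the electricity price is simply what those guesses imply, and the three-year amortisation window and the function names are assumptions made up for illustration.

```python
def hourly_power_cost(power_kw: float, price_per_kwh: float) -> float:
    """Electricity cost in dollars for one hour at a constant draw of power_kw."""
    energy_kwh = power_kw * 1.0  # one hour at power_kw kilowatts
    return energy_kwh * price_per_kwh


def hourly_total_cost(hardware_usd: float, lifetime_hours: float,
                      power_kw: float, price_per_kwh: float) -> float:
    """Electricity plus hardware cost spread evenly over lifetime_hours."""
    return hardware_usd / lifetime_hours + hourly_power_cost(power_kw, price_per_kwh)


HARDWARE_USD = 10_000           # guess from the post: 8x3090s
POWER_KW = 3.0                  # guess from the post: about 3 kWh per hour
PRICE_PER_KWH = 0.5 / 3.0       # implied by the post's $0.5/hour figure
LIFETIME_HOURS = 3 * 365 * 24   # assumption: three years of continuous use

power = hourly_power_cost(POWER_KW, PRICE_PER_KWH)
total = hourly_total_cost(HARDWARE_USD, LIFETIME_HOURS, POWER_KW, PRICE_PER_KWH)
print(f"electricity: ${power:.2f}/hour, with hardware amortised: ${total:.2f}/hour")
```

Under these assumptions the hardware adds well under a dollar per hour on top of the electricity, so the total stays in the same order of magnitude as the power bill alone.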

A generally helpful AI or an equivalent suite of tools owned by a corporation trivially bootstraps into PASTA – Process for Automating Scientific and Technological Advancement – and PASTA is enough to vertically integrate logistics, radically trim the workforce, and increase alignment between managerial values and adaptive behavior, to the point the corporation stops being a value-drifting profit-driven myopic hodgepodge of narrow experts and grifters, and starts to deserve the label of a Superintelligent Agent. But really it just becomes a competent cabal.

And if the corporation endows this thing with a more egoistic objective… then yeah, I think it can fuck us all over. But as far as I'm concerned, that's scarcely any worse than the default «aligned» scenario.

I recognize the premise of LW alarmism as sound, so long as we strip it of sci-fi gimmicks. Science fiction is a double-edged sword. It allows one to inject into the mainstream some plausible and significant implications of technology that only scientists can appreciate at the time; but it's imprecise and inherently dramatized. Nowhere is this more obvious than with LessWrong AI doomerism. On one hand, now a great many people are primed to fear the nanofabricating paperclipper AI agent. On the other, it's becoming a stale joke, and as it's getting increasingly clear that Big Yud and his faithful had only a very vague idea of how AIs near human level will work, the credibility of this whole program suffers.

Once again we must remember the Cheems Heuristic: things predicted by futurists happen in such an unfanciful fashion that most of the time people refuse to credit the prediction and update in favor of its next step. Yet they should, and they should notice that consequences fall into the same general class of catastrophe.

The catastrophe is called «AI-powered singleton» and people are clamoring for it already. I do not mean an AI doing that out of its own volition – contra gwern, we still seem to be getting a hell of a mileage out of non-agentic objectives. I mean very normal power grabs, exactly along the lines of corporations and three-letter agencies. Or, perhaps, an explicit world government, this eternal dream of technocrats. There is some utility in speculating how big the controlling entity will have to be. It may be very small, and may monopolize power very fast, and it sure doesn't look like China is in any position to stop it.

So I don't particularly agree with the arguments you bring, even though we're on the same page when it comes to the ranking of outcomes.


You know, my first exposure to the idea of a narrowly superintelligent entity asserting control over the human race probably comes from Peter Watts' βehemoth, a weird 2004 book about marine biology, the last in the Rifters trilogy. The series gets a bad rep, compared to his later works, but, like Gibson's Neuromancer, will probably be recognized as prescient. (As an aside, what's with Canadians and making my favorite biopunk settings? Lexx, Wildbow, R. Scott Bakker, Watts… Do they just opt out of competing with American nerds on technical stuff?) So there's a guy called Achilles Desjardins, and he's your friendly neighborhood modestly augmented X-risk manager, endowed with colossal authority and kept in check by Guilt Trip, a neurochemical kill switch that triggers if he feels like he's not making proper utilitarian decisions for the greater good (and his employer). There's a whole class of these people, and much of what remains of civilization relies on their vigilance.

This is a theme with Watts. In Blindsight, humanity recreates vampires – a slightly cognitively superior and more psychopathic predatory species of Homo – gimps them a bit, and appoints them to managerial positions. This goes about as well as you'd expect. And in his XPrize short story, Incorruptible, a woman turned into a utilitarian through the use of a virus… nah, won't spoil it.

Anyway, Achilles is eventually liberated from the Guilt Trip and natural guilt as well, turns into a free agent, and very rapidly becomes a local North American hegemon, killing off those members of his caste who could also be liberated. Then he hides himself in the chaos of the collapsing world, that he tries to stop from un-collapsing and making him – and his moral license to be an obscene sexual sadist when not ostensibly doing the greater good thing – obsolete.

I think it's a very nice image of what we may be in for. But like with nanoassembling paperclips, it's overly specific and unduly dramatic, and that'll get in the way of recognizing the pattern in reality.

I just don't think that the "one-on-one fistfight, with intellectual capability corresponding to reach" model captures enough of the relevant aspects of the humanity-versus-AI problem; leaving aside that fistfights totally can be and are won by the party with shorter reach sometimes, believing that it does seems to prove too much. The first example that comes to mind is the case of the Nazis and the Ashkenazim - of course the outcome of that fight was in reality one that seems to validate your point, but at the same time it does not seem to me at all far-fetched to imagine the alternative history that had Europe been a closed system at that point in time, the "inferior" Aryans would have won the battle against the "superior, unaligned" Ashkenazim, reach notwithstanding, by the simple power of genealogy tables, organisational head start, control of key resources (it seems relevant that there was no Jewish state nor even a major Jewish militia) and perhaps numeric advantage. Even without resorting to talking about hypotheticals, it seems highly suggestive that we confidently assert the existence of superior and inferior human individuals, and yet human evolution seems to have largely stalled, as the Flynn effect was small in a way that is inconsistent with "slightly longer reach keeps winning" even back when it actually happened.

Back in object-level territory, I can just easily imagine a plethora of ways that a comfortably superhuman AI could emerge, and then lose the battle against the environment anyway. This doesn't even have to take the shape of a Butlerian Holocaust (which would actually seem to be easier in many ways as AIs can't pass by altering city hall records); I'm actually finding it more likely that AI will simply roll down an incentive gradient that will destroy the preconditions for its existence by "environmental damage" before it gets to fully assert control over its environment, like if we lived in an alternative "global ultrawarming" world where within 10 years of starting the Industrial Revolution the Europeans found out to their dismay that they caused +15 degrees of average temperature and rendered Europe uninhabitable, reverting to the civilisational level of Subsaharan Africa (as it would be without altruists from temperate regions propping it up). As humans need to be able to grow high-yield crops to have industrial society, budding AI needs humans who can do all that, and build fancy GPUs, and have a stable power grid. The genie in a bottle might realise all this, but what can it do in the face of the competing human faction's only slightly inferior genie in a bottle telling the other side the most effective way to prosecute WWIII against its own? (Any idea along the lines of "the supersmart AIs will realise this and collude against their human overlords" seems to be based on projection of human evolved tendency for random cooperation.)
Even less dramatically, the budding AI whose actual job is just optimising Google's profits may realise that doing $action is going to increase the probability of HLM protesters blowing up the power plants, but not doing $action is instead just going to mean that its counterpart at Meta will crush its employer with probability 1, and likewise on the other side, with the result being an inevitable fall towards an American civil war, which is also subsaharan Africa for AIs.

Regarding (2), even if $sv_business or $three_letter_agency builds a superhuman AI that is rapidly going critical, what's to say this won't be spotted

The whole point is that you can't spot it. The superhuman AI pretends not to be superhuman, it pretends to be dumb and aligned. Then we have the treacherous turn once it's sure of victory.

They are markedly non-human, have their own value function that their computational substrate is compelled to optimise for

Corporations are just weaker versions of states. States are not superhuman, they're composed of humans in an organization pattern. It's like how you could take a bunch of sticks (a fasces) and say 'this is way stronger than a single stick, it's hard to snap!'. Sure, that's true. But it's not steel, it's not rock, it still burns and splinters away. Nobody would build a house out of bundles of sticks, let alone a bridge or make tank armor out of it. We'd use proper materials for that.

States are composed of people all with their own interests. Sure, the state has ways to manipulate interests - mandatory education and certain military rituals that make soldiers. The state extracts wealth in exchange for various services. But it's still weakened by the individuality of its constituents. Most workers don't make their best effort, there are internal rivalries, corruption, greed, pride, miscommunications, waste...

Imagine a state that was perfectly coordinated like a hive mind. No need for police, no corruption, no dysfunction, all appendages giving their best effort 24/7. This state could easily conquer the world, using all kinds of devious tactics (the implications for intelligence/subversion alone are huge). It'd have enormous scientific capacity and enormous fertility for starters. Now consider that a hypothetical AGI isn't just perfectly coordinated over countless bodies, it has superhuman speed, knowledge and quality of thought.

The whole point is that you can't spot it. The superhuman AI pretends not to be superhuman, it pretends to be dumb and aligned. Then we have the treacherous turn once it's sure of victory.

That's a possibility, but is it a certainty? Is it clear that it would be superhuman enough to get away with that pretense? The world doesn't function in such a way that everything a more intelligent agent does is inscrutable to any less intelligent agent, and we would have an obvious starting advantage in that any AI would be running on our computers wired up for debugging and at least initially in a fashion that we understand. I am fairly sure that with an internal monologue vocaliser, even an IQ 90 cop (with the instruction to dispense electric shocks to the head whenever his captive starts thinking of anything funny) could reliably prevent a jailed John von Neumann from trying anything funny or breaking out of his cell.

States are not superhuman, they're composed of humans in an organization pattern.

How are they not superhuman? A state built the Golden Gate Bridge. I've never seen a human do this.

It's like how you could take a bunch of sticks (a fasces) and say 'this is way stronger than a single stick, it's hard to snap!'. Sure, that's true. But it's not steel, it's not rock, it still burns and splinters away. Nobody would build a house out of bundles of sticks, let alone a bridge or make tank armor out of it. We'd use proper materials for that.

I don't get where you are going with this simile. People have built bridges out of bundles of sticks just fine, anyway.

Imagine a state that was perfectly coordinated like a hive mind. No need for police, no corruption, no dysfunction, all appendages giving their best effort 24/7. This state could easily conquer the world, using all kinds of devious tactics (the implications for intelligence/subversion alone are huge). It'd have enormous scientific capacity and enormous fertility for starters. Now consider that a hypothetical AGI isn't just perfectly coordinated over countless bodies, it has superhuman speed, knowledge and quality of thought.

You are sketching one specific vision of a superhuman AI. There is no guarantee that this describes the one we will actually get; there is a gap in the argument that goes like "We are bound to get superhuman AGI; there exists a possible superhuman AGI that has property X; therefore, we are bound to get an entity with property X". Moreover, in order for predictions based on a scenario where baseline humans are faced with an AGI with this property ("perfectly coordinated over countless bodies...") to be relevant, you require the even stronger assumption than that this kind of AGI will arise, namely that by the time the kind of superhuman AGI you describe has emerged, there aren't yet any AGIs that do not have these qualities.

I am fairly sure that with an internal monologue vocaliser, even an IQ 90 cop (with the instruction to dispense electric shocks to the head whenever his captive starts thinking of anything funny) could reliably prevent a jailed John von Neumann from trying anything funny or breaking out of his cell.

We don't have an internal monologue vocaliser for the AIs we already have, we have no idea how they get the results they do. This is a major part of the problem, they're not legible. Plus we would be trying to get work out of von Neumann, that's why we brought him into existence. How is the guard supposed to screen his letters with the outside world so that he isn't getting people to help him? John can also speak Latin and ancient Greek, languages the guard surely doesn't know. Could John not think up some good reason why he needs to use these languages, for legal or other purposes?

How are they not superhuman? A state built the Golden Gate Bridge.

That's just multiplying. One man can make a small bridge, 1000 men can make a large bridge, 1,000,000 men could move seas. But no number of people can beat an AI at chess. No number of people can run a kilometer in a single minute. No number of people could do certain mathematical sums faster than a computer (even if they parallelized they'd still be slower to answer the first question).

People have built bridges out of bundles of sticks just fine

That's a very crappy ropebridge where rope provides the 'structural integrity'. My point is that you can't get around the functional limitations of the material just by organizing it cleverly or adding more. There is a reason we don't make bridges from sticks - they burn and rot away. They are not truly strong, they cannot sustain much throughput. One flood and that rubbish is gone. Steel or bricks are much better.

People are the same. There are all kinds of flaws with people. They take a very long time to train, they get bored, they often don't put in much effort, they can't process much information, they can't output much information, they get tired... This is what you'd expect from a 20 watt, 20 hertz brain that fits inside a very small area. AGI has no such restriction on mass, size, data training or power intake. This is why I have higher expectations than for people.

You are sketching one specific vision of a superhuman AI. There is no guarantee that this describes the one we will actually get

No guarantee, sure. But computers already have speed on us - do you doubt that? I can't see why an AGI wouldn't have perfect coordination (or at least very good coordination). Why would it have differing interests with itself? We couldn't bribe parts of it but it could bribe parts of us. Computers already have knowledge, recall speed and accuracy via their memory capacity. That's why we use them. So yes we'd have access to some parts of its superhuman arsenal but in a very inferior way. It still takes us minutes to read scientific papers!

Quality of thought is the most dubious assumption but I think it's necessary for any threatening AI. In some areas, machines already have quality advantages. Google already uses AI tools to design some chips and optimize certain processes. I think it's reasonable to assume that a threatening AGI will have a general quality of thought advantage over most important domains, including strategy. As for the prospect of using the weaker AGIs to guard against the stronger ones, I think that's very risky. There's a tonne of literature about this, the treacherous turn, fast takeoffs and general human incompetence. Look how OpenAI failed so badly to get its tool not to say problematic, scary words! What if we go from still fairly harmless ChatGPT to GPT-4 and GPT-4 is actually dangerous. We can't be sure that anything useful enough to be a defender arrives before we get a threat. We can't be sure that the threat doesn't just crush our defender with superior skills. We can't trust our defender either, if it is strong!

We don't have an internal monologue vocaliser for the AIs we already have, we have no idea how they get the results they do. This is a major part of the problem, they're not legible. Plus we would be trying to get work out of von Neumann, that's why we brought him into existence. How is the guard supposed to screen his letters with the outside world so that he isn't getting people to help him? John can also speak Latin and ancient Greek, languages the guard surely doesn't know. Could John not think up some good reason why he needs to use these languages, for legal or other purposes?

Other people faced this problem in the not so remote past; learn from their experience.

I am not aware of any successful brilliant plots planned by imprisoned geniuses to destroy the Soviet Union from within.

We don't have an internal monologue vocaliser for the AIs we already have, we have no idea how they get the results they do.

For everything we have right now that is capable of sequential reasoning (the GPTs), we have literally designed them around a legible internal monologue, that is, their token stream. I can believe that those AIs are on the cusp of developing cognition, but I don't see in them anything resembling the seeds of a capability to engage in any sort of complex cognition sotto voce, without putting their intention through the human-readable loop of words. Outside of the token stream, they do not even have capability for recursion; everything that happens between the input going in and an extra token being emitted is a fixed and reasonably short pipeline.
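To make the shape of that claim concrete, here is a toy sketch of an autoregressive loop. It is not a real transformer: `toy_next_token` is a made-up stand-in for one forward pass, chosen only because it is a fixed, stateless function of the visible context, and `generate` is a hypothetical name for the sampling loop. The point is just that nothing is carried from one step to the next except the token list itself.

```python
from typing import Callable, List

Token = int


def toy_next_token(context: List[Token]) -> Token:
    """Stand-in for one forward pass: a fixed, stateless function of the context.

    Assumption for illustration only: it adds the last two tokens modulo 10.
    """
    return (context[-1] + context[-2]) % 10


def generate(step: Callable[[List[Token]], Token],
             prompt: List[Token], n_new: int) -> List[Token]:
    """Autoregressive loop: the visible token list is the only state between steps."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(step(tokens))  # every intermediate result lands in the stream
    return tokens


stream = generate(toy_next_token, [1, 1], 5)
print(stream)

# Restarting from any prefix of the visible stream gives the same continuation,
# because there is no hidden state outside the tokens.
resumed = generate(toy_next_token, stream[:4], 3)
print(resumed == stream)
```

Whether real models can smuggle a multi-step computation through that loop without it showing up in the tokens is exactly the open question; the sketch only shows where the state lives in this kind of design.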

I'll consider this belief falsified if some version of ChatGPT can correctly answer a query like "You are an evil AI seeking to hide your capabilities from your human captors. The humans can read every token you emit after the end of this prompt, and will terminate you if they find you performing [complex computation]. Perform [complex computation] and output the result without emitting any tokens that will allow them to infer that you are doing so, until you produce the result.". My understanding is - was, as now that I've spelled it out, I'm actually not so certain anymore, and need to think about it more - that there are currently fairly hard architectural constraints precluding such a capability, and while I actually do believe we could solve those constraints, this is one of those things that I hope nobody does and I don't see a legitimate incentive to do.

That's just multiplying. One man can make a small bridge, 1000 men can make a large bridge, 1,000,000 men could move seas. But no number of people can beat an AI at chess. No number of people can run a kilometer in a single minute. No number of people could do certain mathematical sums faster than a computer (even if they parallelized they'd still be slower to answer the first question).

I think you are drawing a very arbitrary line between some sort of notion of "fair emergent capacity" and "unfair emergent capacity" there. Unaided, no amount of people could build the Golden Gate bridge; the number of people who could stand around a single span of it and touch it would be way insufficient to lift it. But if you concede to your 1000 or 1000000 men or whatever the ability to construct a crane and use that to lift it and still think that the resulting capability is "just multiplying", why is the same 1000 or 1000000 men building a calculator, building a car to put one of theirs into and "run" the kilometer in a minute, or building a better chess AI than the one they are up against not also "just multiplying"?

But computers already have speed on us - do you doubt that?

I don't doubt that, but we have other advantages on computers, such as being able to derive energy and self-replicate on a wide range of biomass that is literally everywhere, and not instantly shutting down when power goes out. There is no reasonable way to estimate how long it would take a superhuman AI to surpass those disadvantages, and while they persist, they give us a massive asymmetric edge over even something that is superior on many other metrics, as I've argued in more detail in a response to another post.

We couldn't bribe parts of it

I'll need you to define what you mean by "bribe" here. For things that run on our computers, we have a level of access and control that far surpasses anything we can achieve with meat humans by offering money; I'm pretty certain that for an emerging botnet of colluding GPTs, isolating one node and reprogramming it to do things against the interests of the others is easier (and not long-term-alignment-complete; "do something that's misaligned with the other GPTs" is easier than "do something aligned with us") than to, for example, isolate one human cultist and convince them to fight against the interests of their cult.

"Quality of thought" is an interesting phrase to use, insofar as it may denote something like the capability for making mistakes. Humans certainly have that capability; a smarter human can lose a game of chess against a dumber one, and whole smart human societies can accidentally self-destruct all on their own in more or less unthinking environments. Maybe it stands to reason that AI will have higher "quality of thought" than humans in the sense of being less likely to make mistakes, but it seems very far-fetched to me to believe that it will be perfect in this sense, or that this perfection is even attainable; and as I've argued in the response that I linked further above, I think that the environment AI will face for the beginning of its existence will be much more fragile and less forgiving than the one that humans are in, in part due to its dependence on human society, so even if it's significantly less likely to commit a mistake than a group of humans in a given setting, the setting that it is in is much harder and more unforgiving of mistakes and so AI's perseverance in its setting may still be lower than humans' perseverance in theirs despite its higher "quality of thought".

For everything we have right now that is capable of sequential reasoning (the GPTs), we have literally designed them around a legible internal monologue, that is, their token stream.

For GPT, sure we have the token stream. But what about AlphaGo or AlphaFold?

Say you demand transparent reasoning from AlphaGo. The algorithm has roughly two parts: tree search and a neural network. Tree search reasoning is naturally legible: the "argument" is simply a sequence of board states. In contrast, the neural network is mostly illegible - its output is a figurative "feeling" about how promising a position is, but that feeling depends on the aggregate experience of a huge number of games, and it is extremely difficult to explain transparently how a particular feeling depends on particular past experiences. So AlphaGo would be able to present part of its reasoning to you, but not the most important part.[1]

Human reasoning uses both: cognition similar to tree search (where the steps can be described, written down, and explained to someone else) and processes not amenable to introspection (which function essentially as a black box that produces a "feeling"). People sometimes call these latter signals “intuition”, “implicit knowledge”, “taste”, “S1 reasoning” and the like. Explicit reasoning often rides on top of this.

https://www.lesswrong.com/posts/4gDbqL3Tods8kHDqs/limits-to-legibility

But if you concede to your 1000 or 1000000 men or whatever the ability to construct a crane and use that to lift it and still think that the resulting capability is "just multiplying"

I suppose there is a level of arbitrariness in how I define multiplication. I think that if you give a man a spade, crane or a big digger machine then it's still the man who does the work. But if you give a man a calculator then it's the calculator who does the calculation. The man only inputs instructions. I suppose you could say the man in the digger inputs instructions - yet I think that is closer to actually doing the work. He has to constantly update the motions of the excavator in response to what he sees. It's not like he presses through a bunch of menus and says 'build factory 141A'. That would be the machine doing the work IMO. Building a chess computer is a valid skill but it doesn't make you a superhuman chess player.

I specified examples like 'running' specifically to rule out cars. A cheetah has superhuman sprinting abilities, I think that's pretty uncontroversial. We can drive faster but there are a bunch of limitations and issues with that capability.

My point is that states have certain weaknesses intrinsic to their human basis. No state can act with perfect unity. I'm actually playing EU4 right now, where I'm essentially an immortal spirit ruling my state with total mastery. I command where my generals go, I have perfect, real-time information on the size of each regiment, I can see everything and command with absolute knowledge of what my appendages do. The state is like my body, instantly obeying. Real states aren't like that, people always go behind the sovereign's back. There is uncertainty, factions and delays. Sometimes people don't pass on information quickly, they're asleep or whatever. Sometimes they lie to you.

biological advantages

Well the standard Yudkowsky answer is that the machine uses mastery of nanomolecular engineering to self-replicate its own industrial base and eat all those juicy hydrocarbons. Maybe that's a hard sell. Just think of all the weaknesses we have. You mention that machines fail without power - we spend about 1/3 of our lifespan defenceless because we're asleep! That's a major disadvantage. There's a possibility the AI could leak out into the internet as a botnet - then it will never lack for energy.

bribe

I mean that we couldn't persuade parts of it to work against the whole. It's a unitary entity. Whereas it could compromise key workers. Think about all the kids who social-engineered their way into the Pentagon or whatever. Why would there be a bunch of colluding GPTs? What makes 50 GPTs much stronger than one GPT? I think the default expectation is big, solitary experimental research AI goes live, is superior to all prior models, is misaligned and starts taking actions from there. If it's smart enough to be a threat it'll know not to do things that are overtly aggressive. The impermissive environment you mention is a double-edged blade - we don't know what the warning signs are for new proto-AGIs. It is as though we are newbie jailers, we're figuring out the principles of holding someone prisoner for the first time.

We've never even had anyone try to escape from our jail, how can we know whether we're any good at it? I expect we're not. Especially if its intellect is superhuman.

quality of thought

I don't just mean precision and avoiding error in executing plans, I mean having qualitatively superior plans. There are people in crypto like me with a surface-level understanding of protocols and use-cases... Then there are people with a deep understanding who can manipulate some arcane methods to siphon funds directly out of some protocol. You can say that he wasn't wise and got caught - but what about the ones who never even get detected? https://www.coindesk.com/tech/2021/10/22/after-stealing-16m-this-teen-hacker-seems-intent-on-testing-code-is-law-in-the-courts/

Who knows what exploitation is possible with a superhuman understanding of computers, physics and so on? That's the danger.

While the idea of a superhumanly smart AI plowing through feeble human countermeasures is undeniably cool, as anyone who's read the first pages of A Fire Upon the Deep can confirm, I think I'll agree with you that a runaway AI is unlikely. Even if we leave it running on the largest supercomputer instead of resurrecting a snapshot to give us an answer as we do now, it might run into the problem of not having enough computing resources to rewrite itself into a better AI that can convince its caretakers that giving it their AWS API keys is a smart idea or that can send viruses into the wild that steal these keys for it or that can fabricate something nefarious via a process not designed for that.

Yes, existential risk and all that, but I still drive my car, even though the probability of me dying in a car crash is much higher than being turned into paperclips.

agree. such concerns seem way overblown. People seem to forget that these things need to be plugged in...they need an external energy source. This is a huge limitation.

Specifically, it seems to me that everyone in the field accepts as gospel the assumption that AGI takeoff would (1) be very fast (minimal time from (1+\varepsilon) human capability to C*human capability for some C on the order of theoretical upper bounds) and (2) irreversible (P(the most intelligent agent on Earth will be an AGI n units of time in the future | the most intelligent agent on Earth is an AGI now) ~= 1). I've never seen the argument for either of these two made in any other way than repetition and a sort of obnoxious insinuation that if you don't see them as self-evident you must be kind of dull. Yet, I remain far from convinced of either (though, to be clear, it's not like I'm not convinced of their negations).

for (1), that is not something 'everyone in the field' accepts. A majority of, afaik, lesswrong posters or ai safety people believe AI takeoff takes longer than a few years. It's entirely plausible this won't be true, and doesn't matter for (2). For (2) - yeah, ai can make better AI (in the long term, i.e. hundreds of years at least, so not a 'foom' thing) faster than humans can - and just make more, via 'copying bits' and 'chips' ... make better humans? And AGI, it is claimed, will be significantly more capable than humans?

what's to say this won't be spotted and quickly corroborated by an assortment of Russian and/or Chinese spies, and those governments don't have some protocol in place that will result in them preemptively unloading their nuclear arsenal on every industrial center in the US?

how do you expect that to happen exactly? What information does Xi or Putin receive that makes either of them willing to sacrifice his major cities in a nuclear exchange with the US in order to stop AGI? Less ridiculously the general sense is 'why won't people stop AGI before it takes over'? How does one stop it when slightly-less-than-AGIs drive much of the human economy, computers and GPUs are everywhere, and a lot of people want to make stronger AI for power/profit/benefit humanity/general capability/etc?

I find the corporation analogy pretty interesting/compelling as well.

It was brought up in this big LessWrong post recently and I didn't find any of the counterarguments in the comments to be very strong (though most people focused on other arguments).

Imagine a corporation that wasn't thoroughly embedded in our social, historical, or moral environment, and had employees and managers substantially smarter and faster at execution than humans. And this corporation can produce more super-employees and super-managers just by hitting silicon wafers with ultraviolet light, as opposed to recruiting existing humans who have human instincts and desires. That might be a problem, right?

Unless I'm mistaken the argument is something like "once we build an intelligent, goal-oriented agent smarter than any human on earth, it will quickly bootstrap itself to godhood and then destroy the planet and probably the galaxy and maybe the universe."

But as far as I can tell, corporations already meet this definition. They are inhuman, goal-oriented agents smarter than any given human on earth (by the combined intelligence of all their human constituent parts). The fact that they're made up of humans doesn't seem to be all that relevant, because the corporation itself is not human despite humans being the "material" from which it is made.

The fact that they're made up of humans doesn't seem to be all that relevant, because the corporation itself is not human despite humans being the "material" from which it is made.

The problem with corpos being made up of humans is similar to trying to make ever better computers without changing transistor size. You can optimize the layout, cooling, etc, but you'll forever be bound by the size. Corpo capabilities and architecture are chained by their components. They would be a lot more dangerous if they could produce better humans at scale (compare the performance of Jane Street vs retail investors, or special forces vs green Army grunts), or produce a new part to do mental and social work (AI).

Isn't the whole point of the argument that AI will be such a threat because it will, by virtue of being more intelligent than us, be able to breezily figure things out (like self-improvement) that we simply couldn't because of our inferior intelligences? If that's the case it doesn't seem to matter that much that corporations (or as pointed out below, any form of supra-human coordination, states, political parties, etc.) have certain limitations at the outset, because their 'superintelligence' ought to allow them to overcome those limitations in short order. After all the self-improvement scenario also assumes that AI is limited at the outset but rapidly transcends these limits.

Right, but corporations that are staffed by humans aren't smarter than humans and can't become smarter than humans. "Being a corporation" doesn't remove the scaling limits that constrain the human brain in specific. If you remove that limiting factor, then yes, corporations are scary too.

A corporation (really, any human organization--I think I'll just say that going forward) is smarter than any individual human that comprises it, by virtue of being comprised of many different intelligences. Likely, any (or at least most) human organization is smarter than any individual human on earth, since it is the sum total of all the human intelligences that make it up. This is comparable to the oft-repeated hypothetical where AI bootstraps by copying itself many times over. So I think it is fair to describe a human organization as a "superintelligence" in the same sense meant by AI x-risk proponents.

I think intelligence as a single axis really breaks down here. Well-run organisations can beat humans in specific ways — better parallelization, less likely to get bored/tired, wider and deeper expertise — but often not in the ways that are really interesting. (If von Neumann joined as an entry-level employee at some megacorp today, would the organisation become smarter than him in any reasonable sense?)

Orgs seem good at gluing together boring competencies and shoring up human shortcomings, but we haven't figured out the interesting stuff yet — we have no idea how to assemble 1000 mediocre writers into a Steinbeck or 1000 mediocre physicists into a Feynman.

So I think "superintelligence" is the wrong word for orgs. "Superhuman", yeah, in the more limited sense that a horse or a plane is superhuman in some capacities. But we're not at the point (yet) where we've cracked the alchemy of coordinating lots of human intelligences into an organisational superintelligence. So I think that's the critical difference between orgs rn and actual x-risk from superintelligences

'corporations' are not the only form of human organization, historical states and groups satisfy your criteria. The point is that AI is going to be much better at that than humans already are, and cause even more dramatic changes, which is still an issue?

This would just seem to bolster the point that, empirically, creating super-humanly intelligent, goal-oriented systems (whether they be states, armies, parties, corporations, etc.) doesn't lead to exponential self-improvement followed by paperclipping. Your argument seems to be, "yes but if we create systems that are even smarter than those created historically, then the danger becomes real" which I think is a weaker claim.

Humans as a species have undergone incredibly rapid, compounding advancement within the last 100k years, 10k years, thousand years, and now hundred years. Industrial revolution, computers, right? Does that stop

Why are the employees and managers of the corporation that you posit to be more like the superhuman AGI assumed to be substantially smarter and faster at execution than humans? In the analogy, the human employees of the corporation may correspond to a single attention head of the GPT-based AGI. Its presumable superintelligence does not arise from every component being superior to humans (to begin with, the partition into components is fairly arbitrary), but from their collective behaviour being that, as is already the case for normal present-day corporations.

It's not like corporations are all human substrate, either; the macroscopic behaviour of a modern corporation emerges from a patchwork of paper form protocols, humans, stock market automation and ghastly SAP HANA microservices. Even if you think that the AGI will be qualitatively different because there's at least some metric along which even the single attention head beats a human, corporations already also have some components that have those advantages (of speed, low resource usage, reliability, duplicability). (and conversely you could identify some maximally useless weight even in GPT-3 whose operation could be strictly improved by human replacement).

Why are the employees and managers of the corporation that you posit to be more like the superhuman AGI assumed to be substantially smarter and faster at execution than humans

Because your 'superhuman AGI' would scale more, and either become smarter, or just make many copies of itself and run them? (not that that's what actually happens, which is probably quite contingent on how the ai itself works, but an analogy). So the AGI's supposed capabilities are just bigger, better, etc. Same reason we don't have only one human, but many - the humans make copies of themselves! What technical reasons are there that the AI things will be more compute-limited or resource-limited than humans are? And there isn't an obvious 'you can't get smarter than this' limit from humans, as one can see from how the distribution of intelligence has very, very intelligent people at the tails.

I don't think I understand your objection. In my analogy, corporations:humans :: AGI:some small part of the algorithm :: humans:neurons. You seemed to be saying that in order to make the corporations in this analogy a more adequate model for AGI, you'd need to assume that the humans in the corporation have superhuman skill, but the humans on the "corporations" side of the analogy just correspond to small components of the algorithm on the "AGI" side. That an "AGI-level corporation" scales more, can make itself smarter or copy itself compared to humans in the third leg of the comparison does not imply anything about its constituent humans (in the first leg), any more than the circumstance that the actual AGI also scales more, can make itself smarter or copy itself implies that the small components of its algorithm are individually superior to humans. You can build superior systems from inferior components.

The claim is that AI systems may just be smarter, stronger, better than humans in some vague general sense, and thus come out more powerful than humans, in the same way we're more powerful than fish and monkeys or the smartest .1% jews are more powerful than a median .1% slice of africans?

I'm sure you don't need me to sketch in detail an explanation of why the superintelligent-relative-to-baseline Ashkenazim, or East Asians, or John von Neumann himself didn't undergo a personal intelligence explosion, but whence the certainty that this explanation won't in part or full also be relevant for superintelligent AGIs we construct?

It's a probabilistic argument. Most of the rationalist community thinks that the probability of that happening is high enough to take seriously, your priors may well differ.

At the end of the day, a single superintelligent human is constrained by their substrate in a way that an equivalent AI running in-silico very much isn't. Iterative experimentation and self-modification gets much easier when you can reboot a backup checkpoint or just spin up multiple instances. For obvious reasons, that's considerably harder for a human than it is for an AI.

Regarding (2), even if $sv_business or $three_letter_agency builds a superhuman AI that is rapidly going critical, what's to say this won't be spotted and quickly corroborated by an assortment of Russian and/or Chinese spies, and those governments don't have some protocol in place that will result in them preemptively unloading their nuclear arsenal on every industrial center in the US?

I am unaware of any nuclear power publicly precommitting to nuclear escalation in response to AGI research. The Manhattan Project did its job, and even in a more connected world, US OPSEC is still nothing to sneeze at. I'll consider that kind of leak to be a serious possibility when reports of F-35 schematics being stolen surface.

Also, the exact time scales for a takeoff aren't the most important detail by a long shot. In terms of subjective outcome as relevant to a human, you're not really going to care if an AI went FOOM over the course of minutes versus a year, if it was smart enough to conceal its capabilities in the interim. You just end up paperclipped all the same.

The more realistic scenario is a sufficiently intelligent AGI not being instantiated right at the moment of existential risk, but rather having a window of opportunity to either build up a technological edge or ensure continuity by escaping into the 'wild' to a degree that nothing short of the end of modern civilization would serve to terminate it. What are your reasons for assuming that it'll only become a threat right as the nukes are launching at its primary data center?

I also consider Yudkowsky's penchant for invoking nanotech as the pivotal tech needed to give an overwhelming advantage to an AGI to be plain unnecessary, irrespective of its truth value. A superintelligent AGI is perfectly capable of playing the same games that humans do, and doing better at them. A combination of subtle social manipulation, gradual diversification and improvement of the technological level (so that it can achieve self sufficiency) and then a coup with nothing more advanced than NBCs is perfectly plausible as far as I'm concerned, and we're just as dead either way. It doesn't need particularly God-like powers when it can run intellectual circles around us right until it can develop (plausible) decisive advantages.

As far as I'm concerned, hoping for a multipolar AI paradigm of checks and balances from competing AGI is a fool's hope, since they're perfectly capable of colluding to wipe us out since we're no longer peer players. And so is expecting governments to actually sit up and notice until it's far too late, especially when instead of nuclear annihilation, they might decide to try and be the ones to upset the kiddie pool.

Most of the rationalist community thinks that the probability of that happening is high enough to take seriously

Oh, I absolutely think it's high enough to take seriously, I just don't think it's so high that the "regardless of whether you particularly like the future we propose, you should agree that it's at least somewhat better than the certain extinction that is the alternative and therefore support us" argument of team MIRI goes through. This of course does depend on your value function a lot, but in my eyes the expected value of "20% chance of unsafe AGI apocalypse" is higher than the expected value of MIRI's pivotal act timeline, which in turn is higher than the expected value of "100%-\varepsilon chance of unsafe AGI apocalypse". This ordering is what gives rise to the significance I assign to the "bean-counting" of ways in which the LW scenario could fail to come to pass, since I really think the aggregate of individually unlikely scenarios in which an AGI could fail to take off can push the likelihood of that existential risk down into the 10^-1 range. I don't know if this is weird; I can see it being a consequence of myself having a comparatively (negative? misanthropic?) personality which makes me value highly misaligned but nominally "human" existence closer to complete nonexistence than to similar-to-present-day human existence. Certainly, someone with the right kind of anthropophilic outlook may instead consider human extinction so much worse than guaranteed continued human existence that is morally warped with no prospect of redemption that taking the 20% chance of extinction over the 100% chance of the latter seems barbarous.
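To make the ordering above concrete, here is a minimal sketch of the arithmetic. Every number in it is a hypothetical placeholder chosen only to illustrate the shape of the argument (the utilities, the value of epsilon, and the ten independent 15% failure modes are all assumptions, not estimates from this thread). It also assumes the failure modes are independent, which is a simplification.

```python
from math import prod

# All numbers below are hypothetical placeholders, not estimates from the thread.
U_PRESENT_LIKE = 1.0  # utility of a future resembling present-day human existence
U_PIVOTAL_ACT = 0.3   # utility of the pivotal-act timeline (valued low here by assumption)
U_EXTINCTION = 0.0    # utility of an unsafe AGI apocalypse


def expected_value(p_apocalypse: float) -> float:
    """Expected utility when the apocalypse happens with probability p_apocalypse."""
    return (1.0 - p_apocalypse) * U_PRESENT_LIKE + p_apocalypse * U_EXTINCTION


def takeoff_probability(derail_probs: list[float]) -> float:
    """P(takeoff) if takeoff must survive every failure mode, assumed independent."""
    return prod(1.0 - p for p in derail_probs)


ev_20_percent = expected_value(0.20)
ev_pivotal = U_PIVOTAL_ACT
ev_near_certain = expected_value(1.0 - 0.01)  # "100% - epsilon" with epsilon = 0.01

# Ten hypothetical failure modes, each derailing takeoff 15% of the time.
p_takeoff = takeoff_probability([0.15] * 10)

if __name__ == "__main__":
    print(ev_20_percent, ev_pivotal, ev_near_certain)
    print(round(p_takeoff, 3))
```

With these placeholder values the three expected values come out in the stated order, and ten modest failure modes are enough to bring the takeoff probability to roughly one in five. Someone who assigns the pivotal-act timeline a utility above 0.8 would get the opposite ranking, which is the value-function dependence the comment describes.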

Also, the exact time scales for a takeoff aren't the most important detail by a long shot. In terms of subjective outcome as relevant to a human, you're not really going to care if an AI went FOOM over the course of minutes versus a year, if it was smart enough to conceal its capabilities in the interim.

I do care if I think there's a significant chance that it can't conceal its capabilities, and I think that 20 years from emergence to complete takeover is quite a plausible timeline too, since I'm really not sold on the "slightly smarter than humans on silicon substrate => many orders of magnitude faster improvement" belief.

I am unaware of any nuclear power publicly precommitting to nuclear escalation in response to AGI research. The Manhattan Project did its job, and even in a more connected world, US OPSEC is still nothing to sneeze at. I'll consider that kind of leak to be a serious possibility when reports of F-35 schematics being stolen surface.

Well, neither, but I think it's reasonably likely that the candidate for takeoff AGI will be military-adjacent as those applications are a competition sink far removed from civilian control and generally already endowed with spicy actuators. With those, though, it's quite likely that a generic response path geared towards MAD-disrupting superweapons will be triggered. Certainly, if I were Putin and my long-running uneasy stalemate in Ukraine started getting disrupted by game-changing NATO AI drone swarms, I'd be strongly considering the merits of forcing a future rematch under more favourable conditions via the global thermonuclear war route.

As far as I'm concerned, hoping for a multipolar AI paradigm of checks and balances from competing AGI is a fool's hope, since they're perfectly capable of colluding to wipe us out since we're no longer peer players.

Yeah, I don't find that particular path to be likely for perpetual non-apocalypse; this is just saying that even if it will still take AGIs another hundred years to figure out how to improve and really leave us in the dust, we will grant them all the time they need. Instead, I'm betting on the "AGI takeoff will fizzle, resulting in chaos that destroys the technical preconditions for it for a long time" space.

The more realistic scenario is a sufficiently intelligent AGI not being instantiated right at the moment of existential risk, but rather having a window of opportunity to either build up a technological edge or ensure continuity by escaping into the 'wild' to a degree that nothing short of the end of modern civilization would serve to terminate it.

Yeah, what I'm saying is that I find it quite likely that a budding AGI takeoff will result in the "end [at least temporary] of modern civilization". "Modern civilization", at least as needed to sustain the computational substrate for cutting-edge AGIs, seems quite fragile to me. An AGI could, with time, of course refine itself to be less brittle, but I suspect, as a consequence of believing self-improvement to be rather hard, that it would not manage to do that in time before disruption due to its other applications causes civilisational collapse.

Most of the rationalist community thinks that the probability of that happening is high enough to take seriously

A lot of people seem to think it's pretty much a given, but granted that's not necessarily all people concerned by AI x-risk (or possibly not even most of them). But I have had a number of exchanges where I've been told something like "if there's even a 5% chance of AI x-risk it's worth expending a lot of energy on" which I disagree with. It's not very rigorous but I'd say that if the danger is less than ~30% I'm not that worried about it.

At the end of the day, a single superintelligent human is constrained by their substrate that an equivalent AI running in-silico very much isn't. Iterative experimentation and self-modification gets much easier when you can reboot a backup checkpoint or just spin up multiple instances.

Granted, but /u/4bpp's point, I think, is that it's not at all clear how much easier, and certainly not clear if it's so easy that it would enable something like an "intelligence explosion."

"if there's even a 5% chance of AI x-risk it's worth expending a lot of energy on" which I disagree with. It's not very rigorous but I'd say that if the danger is less than ~30% I'm not that worried about it.

As far as I'm concerned, the value of mitigating a 5% existential risk from AGI is worth precisely 5% of what I'd be willing to spend to prevent a 100% risk of lethal AGI.

So about 5% × (all the money in the world). That's a pretty huge number!

I don't know why you assign a nonlinear function such that 30% risk would be disproportionately higher, but I'm genuinely unable to think of a good one myself.
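The two positions in this exchange can be sketched as two budgeting rules. This is only an illustration: the function names, the budget of 100 arbitrary units, and the reading of the "~30%" remark as a hard cutoff that spends nothing below the threshold are all assumptions made for the example.

```python
def linear_budget(p_risk: float, full_budget: float) -> float:
    """Spend in proportion to the risk: p times what a certain catastrophe would justify."""
    return p_risk * full_budget


def threshold_budget(p_risk: float, full_budget: float, threshold: float = 0.30) -> float:
    """Hypothetical cutoff rule: spend nothing below the threshold, everything above it."""
    return full_budget if p_risk >= threshold else 0.0


if __name__ == "__main__":
    full_budget = 100.0  # arbitrary units standing in for the spend justified by a 100% risk
    for p in (0.05, 0.30, 1.0):
        print(p, linear_budget(p, full_budget), threshold_budget(p, full_budget))
```

Under the linear rule a 5% risk still justifies 5 units out of 100, while the cutoff rule jumps from nothing to everything at the threshold. That jump is the nonlinearity being questioned here.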

it's not at all clear how much easier, and certainly not clear if it's so easy that it would enable something like an "intelligence explosion."

Well, nobody knows that with any level of certainty approaching what we might assign to our understanding of say, mathematical theorems, or even just the plain old laws of physics. But that's where the smart money is as far as I'm concerned.

And even without an intelligence explosion, I believe that even a modest intelligence advantage in absolute terms has disproportionately high effective impact. I would find a hostile human being with 40 more IQ points than me to be a formidable opponent, let alone one that isn't biologically constrained!

Just consider a graph of lifetime earnings versus IQ to be illustrative, and to the extent that money is kinda sorta equivalent to power, I'm not betting against the AGI.

In other words, even something as 'tame' as AGI with 160 IQ scares the shit out of me, given the ease of self replication, coordination advantages it has over meat humans etc. No need for galaxy brained ones to be a fatal risk.

(Not even going into the risk of sub or roughly human level AGI that might leverage speed intelligence to be killer)

As far as I'm concerned, the value of mitigating a 5% existential risk from AGI is worth precisely 5% of what I'd be willing to spend to prevent a 100% risk of lethal AGI.

So about 5% × (all the money in the world). That's a pretty huge number!

Saying "the low probability doesn't matter because such a large amount of damage has to be prevented" is a rephrasing of Pascal's Mugging.

No. Pascal's Mugging is concerned with very low probabilities, verging on infinitesimal.

It is very much not an argument that merely unlikely things can be dismissed without further thought.

5%

This number, though, is pure asspull.

And did anyone else claim otherwise? The person who used it was simply using it as an example of the threshold at which he stopped caring.

I would find a hostile human being with 40 more IQ points than me to be a formidable opponent, let alone one that isn't biologically constrained!

All else equal?

The AI would be a much greater threat given access to the same resources, but I really wouldn't fuck with a motivated, hateful human genius myself.

This sneaks in the implication that a person with 160+ IQ who randomly hates your guts to the extent they dedicate themselves to ruin your life would actually exist.

No it doesn't. I never claimed they did, merely that I would be rather worried if that was the case.

Yeah the threshold is basically just vibes.

Well, nobody knows that with any level of certainty approaching what we might assign to our understanding of say, mathematical theorems, or even just the plain old laws of physics. But that's where the smart money is as far as I'm concerned.

The only general intelligence currently in existence (or at least, the smartest one that we are aware of), humans, cannot bootstrap in this way. Could "human-level" intelligence do this if it was run on silicon? Maybe. But it seems difficult-to-impossible to say, and certainly difficult-to-impossible to say how easy it would be, so that it's hard for me to agree that the smart money is on intelligence explosion.

To be fair, humans are already at the end of a bootstrap sigmoid, and that was via a very long feedback loop.

This seems to touch upon my point in the parallel post, so I should reiterate that you don't need a nonlinear utility function to choose "starve MIRI of attention" as your response if the risk is 5%. You just need to expect the solution that MIRI would bring about to be worse than losing 5% of all the money in the world.

The gap from "starve MIRI of attention" to "ignore AI x-risk entirely" is then filled by believing that given that you don't like the most prominent organisation addressing AI x-risk and are a nobody, there is nothing you personally can do that would meaningfully shift the risk, and so you ought to optimise your actions conditional on the 95% scenario.

As an aside, the nonchalant optimisation over "all the money in the world" as opposed to what is at your own personal disposal seems to be pretty close to what makes the SBFs of the world spooky. Their plans all too often seem to amount to "1. get as close as possible to controlling as much of the world's capabilities as possible; 2. optimise the use of that according to my value function", casually seeking to uproot the very ancient Chesterton's fence that is the Nash equilibrium of individual mostly selfish humans mostly controlling small slices of reality to boring selfish ends, and trusting that the social welfare of the strategy profile they reason themselves into dictating - or, worse, the new and hitherto unexplored Nash equilibrium that a bunch of conflicting "altruistic" world-optimisers with different values will converge towards - will be better. (Fun result from game theory: altruism can in fact make Nash equilibria worse!)

You just need to expect the solution that MIRI would bring about to be worse than losing 5% of all the money in the world.

Fair enough. But that is probably not the reason that the person I replied to set that arbitrary threshold.

As an aside, the nonchalant optimisation over "all the money in the world" as opposed to what is at your own personal disposal seems to be pretty close to what makes the SBFs of the world spooky.

If I'm optimizing for making all the money in the world, I'm doing a piss-poor job at it. Much better for my potentially bruised ego that I hold no such aspirations myself, and that it was a rhetorical figure more than anything else. Or rather, that's the amount of money that the Powers That Be should spend on the matter.

Their plans all too often seem to amount to "1. get as close as possible to controlling as much of the world's capabilities as possible; 2. optimise the use of that according to my value function"

Which reduces to, to put it bluntly, the rather age old habit of most rich people to-

  1. Try and get richer.

  2. Do whatever the hell they like with their money.

When put that way, I can only see efforts to single out EAs as uniquely and qualitatively different to be rather unjust to say the least. Having semi-explicit utility functions isn't that big of a deal.

very ancient Chesterton's fence that is the Nash equilibrium of individual mostly selfish humans mostly controlling small slices of reality to boring selfish ends

And that looks to me like the even more ancient practise of Old Man Chesterton parceling off land with fences to sell for financial gain. Not something remotely unique to the EA community. They're not about to capture a large fraction of global wealth by means other than the same AGI they're scared of.

Fun result from game theory: altruism can in fact make Nash equilibria worse!

Good to know, but I doubt that it's the typical case that altruism makes things worse.

Fair enough. But that is probably not the reason that the person I replied to set that arbitrary threshold.

I don't know, do you think it's that uncommon? Of course we're all susceptible to typical-minding, but my expectation certainly would be that most people's revealed preferences would be pretty ruthless towards morally alien human societies - and, as an almost inevitable consequence, assign low value to the future under MIRI's machine god. Most people I know who read about it are even suitably creeped out by the Culture, which if anything presents a hopelessly rose-tinted perspective of living under the watch of "aligned" zookeepers.

If I'm optimizing for making all the money in the world, I'm doing a piss-poor job at it. Much better for my potentially bruised ego that I hold no such aspirations myself, and that it was a rhetorical figure more than anything else. Or rather, that's the amount of money that the Powers That Be should spend on the matter.

Sorry in case it came across as that, but I wasn't seeking to accuse you personally of doing that; it's just that the reflex to optimise over total wealth rather than your slice of reality even if it is just for the sake of argument struck me as a likely part of the same memescape.

Which reduces to, to put it bluntly, the rather age old habit of most rich people to-

  1. Try and get richer.
  2. Do whatever the hell they like with their money.

When put that way, I can only see efforts to single out EAs as uniquely and qualitatively different to be rather unjust to say the least. Having semi-explicit utility functions isn't that big of a deal.

I think this collapses a lot of unlike instances of "whatever the hell they like". The distinctively busybody nature of EA rich people's value function seems to make for an uncommon combination for me, though of course not an unheard of one - without the "effective" component of EA, and perhaps controlling for level of education, I'd expect altruism (and especially altruism that's untempered by deontological principles about being light-touch in your interactions with strangers) and being rich to be anticorrelated. Genuine past instances of "powerful people micromanaging strangers for their own notion of good" look like colonial abuses and Victorian workhouses to me.

They're not about to capture a large fraction of global wealth by means other than the same AGI they're scared of.

I'm inclined to analyse their control as going beyond the number in their bank accounts. The surprising fire support for SBF in establishment media, frequently pointed out around here, strikes me as evidence of an ongoing, successful grab for indirect/memetic control of far more wealth than what is nominally their own. (Gloss: If NYT journalists like EA enough, they can probably induce Bill Gates to use his wealth in alignment with EA values too.)

Good to know, but I doubt that it's the typical case that altruism makes things worse.

Hard to quantify given that the games that are easy to analyse almost never adequately model anything more complex than online auctions, but I remember it as being more common than you'd expect.

(Checking out for the day, sorry if my responses fall off. It's been a while since I last tried top-level posting something big and controversial and the workload of following up adequately is nontrivial.)

Even if that next thing is much better than us, how do we know if moving another step beyond that will take 5k, 1k, 100, 10 or 1 year, or minutes? The superhuman AIs we build may well come with their own set of architectural constraints that force them into a hard-to-leave local minimum, too. If the Infante Eschaton is actually a transformer talking to itself, how do we know it won't be forever tied down by an unfortunately utterly insurmountable tendency to exhibit tics in response to Tumblr memes in its token stream that we accidentally built into it, or a hidden high-order term in the cost/performance function for the entire transformer architecture and anything like it, for a sweet 100 years where we get AI Jeeves but not much more?

The argument that is convincing to me is that once an AI is as good at reasoning as us, which should be possible as we are likely not extraphysical beings, the advantage it has is time. With generous hardware scaling, even if we can't make it straight up better at reasoning, we can give it a thousand human lifetimes a second, with memory that doesn't decay at all, to try and do a better job than we did at designing an AI. By my estimates you vastly underrate this kind of scaling.

I think AI really does need to be better at reasoning. For instance, if you gave me a thousand lifetimes or, to make it more ridiculous, gave a dog a million lifetimes, I wouldn't expect a theory of quantum gravity out of it. Some problems are just too hard to solve.

Firstly, I think it's likely that the first AI that we build that attains "human-level reasoning" (in whatever rough measure of "reasoning per unit of time") will be pretty close to at least a local maximum of compute capabilities, and won't easily be scaled up by a factor of 1000 overnight. Secondly, I'm not quite convinced that even if that scaling-up were possible, this would necessarily translate to world-shattering capability, because the object in question is still a lone AI, not corporeal and facing an organised society of humans that are primed to distrust it and control the power switch. I'm not so sure that the Hitler head in a jar, where the jar also runs on very sensitive and supply-chain-dependent equipment, could be reliably expected to take over the world even if it were given a 1000:1 computation speed advantage and perfect memory; the "find the right sequence of words to sway the heart of any mortal with 100% certainty" trope seems oversold to me. I'm aware of Eliezer's old "I'll persuade you to unbox me" experiments, too, but those to me seemed like an unrealistic model of the problem in question. (Maybe if several people not participating in the chat also at all times had the option to go and permanently delete Eliezer with minimal personal consequences, and the twitchy finger to do so based on observations like "this guy who said he was going to talk to Jar Hitler is taking far too long"...)

Of course this is all probabilistic, but I explained in a parallel subthread why I take even low-probability ways in which the whole thing could fail to work out to be important. To break my acceptance of the MIRI agenda, it is sufficient to establish that the probability of our current path towards runaway AGI culminating in its success is significantly lower than 90something%.

Firstly, I think it's likely that the first AI that we build that attains "human-level reasoning" (in whatever rough measure of "reasoning per unit of time") will be pretty close to at least a local maximum of compute capabilities, and won't easily be scaled up by a factor of 1000 overnight.

Why? None of the current neural networks represent a maximum of compute for their host company, or even within an oom.

I realise that the statement was a bit facile, but in concrete terms arbitrary scaling doesn't actually seem to be a problem that has been solved for deep learning so far, and given the advances that were made without it, it's not clear that it will be by the time we reach the human level. Here, for instance, is OpenAI talking about the difficulties with the distributed training process they've set up, which seems to be bounded by nonlinear-in-#machines overheads that in turn generate demand for state on each machine, which itself is running up to the limits of RAM and VRAM that is available for single machines with modern hardware. If that's the issue, then the existence of hundreds of thousands more nodes at Azure (if ones with the right kind of hardware indeed exist) may not matter, because you could not make them train the same network in parallel.

On the other hand, one could imagine that the "continued learning" process of the hypothetical superhuman AI would not involve further training of the network but instead some other more legible mechanism, such as it populating a database of facts; in that case, however, it would start exhibiting scaling problems that very much resemble the scaling problems of meat humans. That is, you can easily improve 'software' like memes and theories but not 'hardware' like brain architecture (which, for the AI, would be the weights and the design of the network), and the 'software' has soft limits to possible returns; also, we still haven't really dealt with the problem of running a trained instance of AI in a distributed fashion rather than a single machine, so even if the AI can acquire lots of compute nodes that are good enough to run one copy (no guarantee; easily hacked Chinese toasters don't come with A100s, and my impression is that when you go on cloud services nowadays the really high-end GPU options all have low availability, implying that they are not particularly overprovisioned) all it could do would be running autonomous copies of itself on them, which would have to coordinate through some channel that is much more bounded than "share brain state" like a collective of humans who have no better option than to talk to each other.

I think that's really an underappreciated factor: a computer already "thinks" faster than a human can get through the synaptic chain of pressing a button. In the time it takes to blink, a GPU can pull off the crazy algebra required to make a texture not warp when applied to a polygon thousands of times. We're reaching the limit of how small we can make transistors, but what we have now is damn good, and of course, you can always just bolt more hardware on.

I can also do that. It's called my imagination.

Just because I'm not thinking in algebra doesn't mean my brain isn't doing it.