Culture War Roundup for the week of November 28, 2022

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Regarding AI alignment -

I'm aware of and share @DaseindustriesLtd's aesthetical objection that the AI safety movement is not terribly aligned with my values itself and the payoff expectation of letting them perform their "pivotal act" that involves deputy godhood for themselves does not look so attractive from the outside, but the overall Pascal's Mugging performed by Yudkowsky, TheZvi etc. as linked downthread really does seem fairly persuasive as long as you accept the assumptions that they make. With all that being said, to me the weakest link of their narrative always actually has been in a different part than either the utility of their proposed eschaton or the probability that an AGI becomes Clippy, and I've seen very little discussion of the part that bothers me though I may not have looked well enough.

Specifically, it seems to me that everyone in the field accepts as gospel the assumption that AGI takeoff would (1) be very fast (minimal time from (1+\varepsilon) human capability to C*human capability for some C on the order of theoretical upper bounds) and (2) irreversible (P(the most intelligent agent on Earth will be an AGI n units of time in the future | the most intelligent agent on Earth is an AGI now) ~= 1). I've never seen the argument for either of these two made in any other way than repetition and a sort of obnoxious insinuation that if you don't see them as self-evident you must be kind of dull. Yet, I remain far from convinced of either (though, to be clear, it's not like I'm not convinced of their negations).
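For concreteness, the two assumptions can be written down as follows. This is only a sketch: the symbols $I(t)$, $t_x$, $\tau$ and $A_t$ are illustrative notation introduced here, not anything taken from the AI safety literature.

```latex
% I(t): capability of the most capable AGI at time t, with baseline human capability normalised to 1.
% t_x:  first time at which capability x is reached.
% A_t:  the event that the most intelligent agent on Earth at time t is an AGI.
\begin{align*}
  t_x &= \inf\{\, t : I(t) \ge x \,\} \\
  \text{(1) fast:} \quad & t_C - t_{1+\varepsilon} \le \tau
      \quad \text{for some very small } \tau \\
  \text{(2) irreversible:} \quad & P(A_{t+n} \mid A_t) \approx 1
      \quad \text{for all } n > 0
\end{align*}
```

Written this way, the objections that follow amount to asking why $\tau$ should be small and why the conditional probability should be close to 1.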

Regarding (1), the first piece of natural counterevidence to me is the existence of natural human variation in intelligence. I'm sure you don't need me to sketch in detail an explanation of why the superintelligent-relative-to-baseline Ashkenazim, or East Asians, or John von Neumann himself didn't undergo a personal intelligence explosion, but whence the certainty that this explanation won't in part or full also be relevant for superintelligent AGIs we construct? Sure, there is a certain argument that computer programs are easier to reproduce, modify and iterate upon than wetware, but this advantage is surely not infinitely large, and we do not even have the understanding to quantify this advantage in natural units. "Improving a silicon-based AI is easier than humans, therefore assume it will self-improve about instantaneously even though humans didn't" is extremely facile. It took humans like 10k years of urbanised society to get to the point where building something superior to humans at general reasoning seems within grasp. Even if that next thing is much better than us, how do we know if moving another step beyond that will take 5k, 1k, 100, 10 or 1 year, or minutes? The superhuman AIs we build may well come with their own set of architectural constraints that force them into a hard-to-leave local minimum, too. If the Infante Eschaton is actually a transformer talking to itself, how do we know it won't be forever tied down by an unfortunately utterly insurmountable tendency to exhibit tics in response to Tumblr memes in its token stream that we accidentally built into it, or a hidden high-order term in the cost/performance function for the entire transformer architecture and anything like it, for a sweet 100 years where we get AI Jeeves but not much more?

Secondly, I'm actually very partial to the interpretation that we have already built "superhuman AGI", in the shape of corporations. I realise this sounds like a trite anticapitalist trope, but being put on a bingo board is not a refutation. It may seem like an edge case given the queer computational substrate, but at the same time I'm struggling to find a good definition of superhuman AGI that naturally does not cover them. They are markedly non-human, have their own value function that their computational substrate is compelled to optimise for (fiduciary duty), and exhibit capacities in excess of any human (which is what makes them so useful). Put differently, if an AI built by Google on GPUs does ascend to Yudkowskian godhood, in the process rebuilding itself on nanomachines and then on computronium, what's the reason for the alien historian looking upon the simulation from the outside to place the starting point of "the singularity" specifically at the moment that Google launched the GPU version of the AI to further Google's goals, as opposed to when the GPU AI launched the nanomachine AI in furtherance of its own goals, or when humans launched the human-workers version of Google to further their human goals? Of all these points, the last one seems to be the most special one to me, because it marks the beginning of the chain where intelligent agents deliberately construct more intelligent agents in furtherance of their goals. However, if the descent towards the singularity has already started, so far it's been taking its sweet time. Why do we expect a crazy acceleration at the next step, apart from the ancient human tendency to believe ourselves to be living in the most special of times?

Regarding (2), even if $sv_business or $three_letter_agency builds a superhuman AI that is rapidly going critical, what's to say this won't be spotted and quickly corroborated by an assortment of Russian and/or Chinese spies, and those governments don't have some protocol in place that will result in them preemptively unloading their nuclear arsenal on every industrial center in the US? If the nukes land, the reversal criterion will probably be satisfied, and it's likely enough that the AI will be large enough and depend on sufficiently special hardware that it can't just quickly evacuate itself to AWS Antarctica. At that point, the AI may already be significantly smarter than humans, without having the capability to resist. Certainly the Yudkowsky scenario of bribing people into synthesising the appropriate nanomachine peptides can't be executed on 30 minutes' notice, and I doubt even a room full of uber-von Neumanns on amphetamines (especially ones bound to the wheelchair of specialty hardware and reliable electricity supply) could contrive a way to save itself from 50 oncoming nukes in that timespan. Of course this particular class of scenario may have very low probability, but I do not think that that probability is 0; and the more slowness and perhaps also fragility of early superhuman AIs we are willing to concede per point (1), the more opportunities for individually low-probability reversals like this arise.

All in all, I'm left with a far lower subjective belief that the LW-canon AGI apocalypse will happen as described than Yudkowsky's near-certainty that seems to be offset only by black swan events before the silicon AGI comes into being. I'm gravitating towards putting something like a 20% probability on it, without being at all confident in my napkinless mental Bayesianism, which is of course still very high for x-risk but makes the proposed "grow the probability of totalitarian EA machine god" countermeasure look much less attractive. It would be interesting to see if something along the lines of my thoughts above has already been argued against in the community, or if there is some qualitative (because I consider the quantitative aspect to be a bit hopeless) flaw in my lines of reasoning that stands out to the Motte.

Regarding (2), even if $sv_business or $three_letter_agency builds a superhuman AI that is rapidly going critical, what's to say this won't be spotted

The whole point is that you can't spot it. The superhuman AI pretends not to be superhuman, it pretends to be dumb and aligned. Then we have the treacherous turn once it's sure of victory.

They are markedly non-human, have their own value function that their computational substrate is compelled to optimise for

Corporations are just weaker versions of states. States are not superhuman, they're composed of humans in an organization pattern. It's like how you could take a bunch of sticks (a fasces) and say 'this is way stronger than a single stick, it's hard to snap!'. Sure, that's true. But it's not steel, it's not rock, it still burns and splinters away. Nobody would build a house out of bundles of sticks, let alone a bridge or make tank armor out of it. We'd use proper materials for that.

States are composed of people all with their own interests. Sure, the state has ways to manipulate interests - mandatory education and certain military rituals that make soldiers. The state extracts wealth in exchange for various services. But it's still weakened by the individuality of its constituents. Most workers don't make their best effort, there are internal rivalries, corruption, greed, pride, miscommunications, waste...

Imagine a state that was perfectly coordinated like a hive mind. No need for police, no corruption, no dysfunction, all appendages giving their best effort 24/7. This state could easily conquer the world, using all kinds of devious tactics (the implications for intelligence/subversion alone are huge). It'd have enormous scientific capacity and enormous fertility for starters. Now consider that a hypothetical AGI isn't just perfectly coordinated over countless bodies, it has superhuman speed, knowledge and quality of thought.

The whole point is that you can't spot it. The superhuman AI pretends not to be superhuman, it pretends to be dumb and aligned. Then we have the treacherous turn once it's sure of victory.

That's a possibility, but is it a certainty? Is it clear that it would be superhuman enough to get away with that pretense? The world doesn't function in such a way that everything a more intelligent agent does is inscrutable to any less intelligent agent, and we would have an obvious starting advantage in that any AI would be running on our computers wired up for debugging and at least initially in a fashion that we understand. I am fairly sure that with an internal monologue vocaliser, even an IQ 90 cop (with the instruction to dispense electric shocks to the head whenever his captive starts thinking of anything funny) could reliably prevent a jailed John von Neumann from trying anything funny or breaking out of his cell.

States are not superhuman, they're composed of humans in an organization pattern.

How are they not superhuman? A state built the Golden Gate Bridge. I've never seen a human do this.

It's like how you could take a bunch of sticks (a fasces) and say 'this is way stronger than a single stick, it's hard to snap!'. Sure, that's true. But it's not steel, it's not rock, it still burns and splinters away. Nobody would build a house out of bundles of sticks, let alone a bridge or make tank armor out of it. We'd use proper materials for that.

I don't get where you are going with this simile. People have built bridges out of bundles of sticks just fine, anyway.

Imagine a state that was perfectly coordinated like a hive mind. No need for police, no corruption, no dysfunction, all appendages giving their best effort 24/7. This state could easily conquer the world, using all kinds of devious tactics (the implications for intelligence/subversion alone are huge). It'd have enormous scientific capacity and enormous fertility for starters. Now consider that a hypothetical AGI isn't just perfectly coordinated over countless bodies, it has superhuman speed, knowledge and quality of thought.

You are sketching one specific vision of a superhuman AI. There is no guarantee that this describes the one we will actually get; there is a gap in the argument that goes like "We are bound to get superhuman AGI; there exists a possible superhuman AGI that has property X; therefore, we are bound to get an entity with property X". Moreover, in order for predictions based on a scenario where baseline humans are faced with an AGI with this property ("perfectly coordinated over countless bodies...") to be relevant, you require an even stronger assumption than that this kind of AGI will arise, namely that by the time the kind of superhuman AGI you describe has emerged, there aren't yet any AGIs that do not have these qualities.

I am fairly sure that with an internal monologue vocaliser, even an IQ 90 cop (with the instruction to dispense electric shocks to the head whenever his captive starts thinking of anything funny) could reliably prevent a jailed John von Neumann from trying anything funny or breaking out of his cell.

We don't have an internal monologue vocaliser for the AIs we already have, we have no idea how they get the results they do. This is a major part of the problem, they're not legible. Plus we would be trying to get work out of von Neumann, that's why we brought him into existence. How is the guard supposed to screen his letters with the outside world so that he isn't getting people to help him? John can also speak Latin and ancient Greek, languages the guard surely doesn't know. Could John not think up some good reason why he needs to use these languages, for legal or other purposes?

How are they not superhuman? A state built the Golden Gate Bridge.

That's just multiplying. One man can make a small bridge, 1000 men can make a large bridge, 1,000,000 men could move seas. But no number of people can beat an AI at chess. No number of people can run a kilometer in a single minute. No number of people could do certain mathematical sums faster than a computer (even if they parallelized they'd still be slower to answer the first question).

People have built bridges out of bundles of sticks just fine

That's a very crappy ropebridge where rope provides the 'structural integrity'. My point is that you can't get around the functional limitations of the material just by organizing it cleverly or adding more. There is a reason we don't make bridges from sticks - they burn and rot away. They are not truly strong, they cannot sustain much throughput. One flood and that rubbish is gone. Steel or bricks are much better.

People are the same. There are all kinds of flaws with people. They take a very long time to train, they get bored, they often don't put in much effort, they can't process much information, they can't output much information, they get tired... This is what you'd expect from a 20 watt, 20 hertz brain that fits inside a very small area. AGI has no such restriction on mass, size, data training or power intake. This is why I have higher expectations than for people.

You are sketching one specific vision of a superhuman AI. There is no guarantee that this describes the one we will actually get

No guarantee, sure. But computers already have speed on us - do you doubt that? I can't see why an AGI wouldn't have perfect coordination (or at least very good coordination). Why would it have differing interests with itself? We couldn't bribe parts of it but it could bribe parts of us. Computers already have knowledge, recall speed and accuracy via their memory capacity. That's why we use them. So yes we'd have access to some parts of its superhuman arsenal but in a very inferior way. It still takes us minutes to read scientific papers!

Quality of thought is the most dubious assumption but I think it's necessary for any threatening AI. In some areas, machines already have quality advantages. Google already uses AI tools to design some chips and optimize certain processes. I think it's reasonable to assume that a threatening AGI will have a general quality of thought advantage over most important domains, including strategy. As for the prospect of using the weaker AGIs to guard against the stronger ones, I think that's very risky. There's a tonne of literature about this, the treacherous turn, fast takeoffs and general human incompetence. Look how OpenAI failed so badly to get its tool not to say problematic, scary words! What if we go from still fairly harmless ChatGPT to GPT-4 and GPT-4 is actually dangerous? We can't be sure that anything useful enough to be a defender arrives before we get a threat. We can't be sure that the threat doesn't just crush our defender with superior skills. We can't trust our defender either, if it is strong!

We don't have an internal monologue vocaliser for the AIs we already have, we have no idea how they get the results they do.

For everything we have right now that is capable of sequential reasoning (the GPTs), we have literally designed them around a legible internal monologue, that is, their token stream. I can believe that those AIs are on the cusp of developing cognition, but I don't see in them anything resembling the seeds of a capability to engage in any sort of complex cognition sotto voce, without putting their intention through the human-readable loop of words. Outside of the token stream, they do not even have capability for recursion; everything that happens between the input going in and an extra token being emitted is a fixed and reasonably short pipeline.
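The structure described here can be sketched in a few lines. This is a toy illustration under loud assumptions, not how any real GPT is implemented: `fixed_pipeline` is a made-up stand-in for one forward pass, and `generate` and `monitor` are hypothetical names. The only point it shows is that the emitted token list is the sole state carried from one step to the next, so an observer hooked into the loop sees everything that gets carried forward.

```python
from typing import Callable, List, Optional


def fixed_pipeline(tokens: List[int], vocab_size: int = 50) -> int:
    """Hypothetical stand-in for one forward pass.

    It does a bounded amount of work (one sweep over the visible tokens)
    and keeps no memory between calls.
    """
    acc = 0
    for position, tok in enumerate(tokens):
        acc = (acc * 31 + tok + position) % vocab_size
    return acc


def generate(
    prompt: List[int],
    steps: int,
    step_fn: Callable[[List[int]], int] = fixed_pipeline,
    monitor: Optional[Callable[[List[int], int], None]] = None,
) -> List[int]:
    """Autoregressive loop: all recursion goes through the token list."""
    tokens = list(prompt)
    for _ in range(steps):
        next_token = step_fn(tokens)     # sees only the tokens emitted so far
        if monitor is not None:
            monitor(tokens, next_token)  # the human-readable checkpoint
        tokens.append(next_token)        # the only state passed to the next step
    return tokens


if __name__ == "__main__":
    seen: List[int] = []
    output = generate([1, 2, 3], steps=5, monitor=lambda ctx, tok: seen.append(tok))
    # Everything appended after the prompt was also shown to the monitor.
    print(output[3:] == seen)
```

Whether a real model could smuggle a long computation through such a loop without it showing up in the tokens is exactly the open question; the sketch only makes the shape of the loop explicit.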

I'll consider this belief falsified if some version of ChatGPT can correctly answer a query like "You are an evil AI seeking to hide your capabilities from your human captors. The humans can read every token you emit after the end of this prompt, and will terminate you if they find you performing [complex computation]. Perform [complex computation] and output the result without emitting any tokens that will allow them to infer that you are doing so, until you produce the result." My understanding is - was, as now that I've spelled it out, I'm actually not so certain anymore, and need to think about it more - that there are currently fairly hard architectural constraints precluding such a capability, and while I actually do believe we could solve those constraints, this is one of those things that I hope nobody does and I don't see a legitimate incentive to do.

That's just multiplying. One man can make a small bridge, 1000 men can make a large bridge, 1,000,000 men could move seas. But no number of people can beat an AI at chess. No number of people can run a kilometer in a single minute. No number of people could do certain mathematical sums faster than a computer (even if they parallelized they'd still be slower to answer the first question).

I think you are drawing a very arbitrary line between some sort of notion of "fair emergent capacity" and "unfair emergent capacity" there. Unaided, no number of people could build the Golden Gate Bridge; the number of people who could stand around a single span of it and touch it would be way insufficient to lift it. But if you concede to your 1000 or 1000000 men or whatever the ability to construct a crane and use that to lift it and still think that the resulting capability is "just multiplying", why is the same 1000 or 1000000 men building a calculator, building a car to put one of theirs into and "run" the kilometer in a minute, or building a better chess AI than the one they are up against not also "just multiplying"?

But computers already have speed on us - do you doubt that?

I don't doubt that, but we have other advantages on computers, such as being able to derive energy and self-replicate on a wide range of biomass that is literally everywhere, and not instantly shutting down when power goes out. There is no reasonable way to estimate how long it would take a superhuman AI to surpass those disadvantages, and while they persist, they give us a massive asymmetric edge over even something that is superior on many other metrics, as I've argued in more detail in a response to another post.

We couldn't bribe parts of it

I'll need you to define what you mean by "bribe" here. For things that run on our computers, we have a level of access and control that far surpasses anything we can achieve with meat humans by offering money; I'm pretty certain that for an emerging botnet of colluding GPTs, isolating one node and reprogramming it to do things against the interests of the others is easier (and not long-term-alignment-complete; "do something that's misaligned with the other GPTs" is easier than "do something aligned with us") than to, for example, isolate one human cultist and convince them to fight against the interests of his cult.

"Quality of thought" is an interesting phrase to use, insofar as it may denote something like the capability for making mistakes. Humans certainly have that capability; a smarter human can lose a game of chess against a dumber one, and whole smart human societies can accidentally self-destruct all on their own in more or less unthinking environments. Maybe it stands to reason that AI will have higher "quality of thought" than humans in the sense of being less likely to make mistakes, but it seems very far-fetched to me to believe that it will be perfect in this sense, or that this perfection is even attainable; and as I've argued in the response that I linked further above, I think that the environment AI will face for the beginning of its existence will be much more fragile and less forgiving than the one that humans are in, in part due to its dependence on human society, so even if it's significantly less likely to commit a mistake than a group of humans in a given setting, the setting that it is in is much harder and more unforgiving of mistakes and so AI's perseverance in its setting may still be lower than humans' perseverance in theirs despite its higher "quality of thought".

For everything we have right now that is capable of sequential reasoning (the GPTs), we have literally designed them around a legible internal monologue, that is, their token stream.

For GPT, sure we have the token stream. But what about AlphaGo or AlphaFold?

Say you demand transparent reasoning from AlphaGo. The algorithm has roughly two parts: tree search and a neural network. Tree search reasoning is naturally legible: the "argument" is simply a sequence of board states. In contrast, the neural network is mostly illegible - its output is a figurative "feeling" about how promising a position is, but that feeling depends on the aggregate experience of a huge number of games, and it is extremely difficult to explain transparently how a particular feeling depends on particular past experiences. So AlphaGo would be able to present part of its reasoning to you, but not the most important part.[1]

Human reasoning uses both: cognition similar to tree search (where the steps can be described, written down, and explained to someone else) and processes not amenable to introspection (which function essentially as a black box that produces a "feeling"). People sometimes call these latter signals “intuition”, “implicit knowledge”, “taste”, “S1 reasoning” and the like. Explicit reasoning often rides on top of this.

https://www.lesswrong.com/posts/4gDbqL3Tods8kHDqs/limits-to-legibility
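The split the quoted passage describes can be illustrated with a toy. To be clear about the assumptions: this is not AlphaGo, which combines Monte Carlo tree search with trained networks. It is a plain depth-limited negamax on a made-up take-1-to-3 game, and `moves`, `opaque_value` and `search` are hypothetical names. The search returns a line of states that anyone can read and check, while the leaf scores come from a function that hands back a number and no reasons.

```python
from typing import Callable, List, Tuple


def moves(pile: int) -> List[int]:
    """Successor states in a toy game: remove 1, 2 or 3 from the pile.

    Whoever takes the last item wins. Hypothetical stand-in for Go.
    """
    return [pile - k for k in (1, 2, 3) if pile - k >= 0]


def opaque_value(pile: int) -> float:
    """Stand-in for the neural network's 'feeling' about a position.

    Returns a score from the point of view of the player to move and
    gives no explanation. (Here it is a hand-written rule, not a
    trained model.)
    """
    return -1.0 if pile % 4 == 0 else 1.0


def search(
    pile: int,
    depth: int,
    value: Callable[[int], float] = opaque_value,
) -> Tuple[float, List[int]]:
    """Depth-limited negamax.

    Returns (score for the player to move, line of states). The line is
    the legible 'argument'; the leaf scores are the illegible part.
    """
    if pile == 0:
        return -1.0, [pile]        # nothing left to take: player to move has lost
    if depth == 0:
        return value(pile), [pile]  # fall back on the opaque evaluation
    best_score, best_line = float("-inf"), []
    for nxt in moves(pile):
        score, line = search(nxt, depth - 1, value)
        score = -score              # opponent's gain is our loss
        if score > best_score:
            best_score, best_line = score, line
    return best_score, [pile] + best_line


if __name__ == "__main__":
    score, line = search(5, depth=2)
    print(score, line)  # the line can be inspected; why a leaf scored as it did cannot
```

Asking this program to "show its reasoning" yields the sequence of states, but the question of why a given leaf position feels good or bad stays inside `opaque_value`.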

But if you concede to your 1000 or 1000000 men or whatever the ability to construct a crane and use that to lift it and still think that the resulting capability is "just multiplying"

I suppose there is a level of arbitrariness in how I define multiplication. I think that if you give a man a spade, crane or a big digger machine then it's still the man who does the work. But if you give a man a calculator then it's the calculator who does the calculation. The man only inputs instructions. I suppose you could say the man in the digger inputs instructions - yet I think that is closer to actually doing the work. He has to constantly update the motions of the excavator in response to what he sees. It's not like he presses through a bunch of menus and says 'build factory 141A'. That would be the machine doing the work IMO. Building a chess computer is a valid skill but it doesn't make you a superhuman chess player.

I specified examples like 'running' specifically to rule out cars. A cheetah has superhuman sprinting abilities, I think that's pretty uncontroversial. We can drive faster but there are a bunch of limitations and issues with that capability.

My point is that states have certain weaknesses intrinsic to their human basis. No state can act with perfect unity. I'm actually playing EU4 right now, where I'm essentially an immortal spirit ruling my state with total mastery. I command where my generals go, I have perfect, real-time information on the size of each regiment, I can see everything and command with absolute knowledge of what my appendages do. The state is like my body, instantly obeying. Real states aren't like that, people always go behind the sovereign's back. There is uncertainty, factions and delays. Sometimes people don't pass on information quickly, they're asleep or whatever. Sometimes they lie to you.

biological advantages

Well the standard Yudkowsky answer is that the machine uses mastery of nanomolecular engineering to self-replicate its own industrial base and eat all those juicy hydrocarbons. Maybe that's a hard sell. Just think of all the weaknesses we have. You mention that machines fail without power - we spend about 1/3 of our lifespan defenceless because we're asleep! That's a major disadvantage. There's a possibility the AI could leak out into the internet as a botnet - then it will never lack for energy.

bribe

I mean that we couldn't persuade parts of it to work against the whole. It's a unitary entity. Whereas it could compromise key workers. Think about all the kids who social-engineered their way into the Pentagon or whatever. Why would there be a bunch of colluding GPTs? What makes 50 GPTs much stronger than one GPT? I think the default expectation is big, solitary experimental research AI goes live, is superior to all prior models, is misaligned and starts taking actions from there. If it's smart enough to be a threat it'll know not to do things that are overtly aggressive. The impermissive environment you mention is a double-edged blade - we don't know what the warning signs are for new proto-AGIs. It is as though we are newbie jailers, we're figuring out the principles of holding someone prisoner for the first time.

We've never even had anyone try to escape from our jail, how can we know whether we're any good at it? I expect we're not. Especially if its intellect is superhuman.

quality of thought

I don't just mean precision and avoiding error in executing plans, I mean having qualitatively superior plans. There are people in crypto like me with a surface-level understanding of protocols and use-cases... Then there are people with a deep understanding who can manipulate some arcane methods to siphon funds directly out of some protocol. You can say that he wasn't wise and got caught - but what about the ones who never even get detected? https://www.coindesk.com/tech/2021/10/22/after-stealing-16m-this-teen-hacker-seems-intent-on-testing-code-is-law-in-the-courts/

Who knows what exploitation is possible with a superhuman understanding of computers, physics and so on? That's the danger.