Culture War Roundup for the week of April 17, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Finally, a concrete plan for how to save the world from paperclipping has dropped, presented by the world-(in)famous Basilisk Man himself.

https://twitter.com/RokoMijic/status/1647772106560552962

Government prints money to buy all advanced AI GPUs back at purchase price. And shuts down the fabs. Comprehensive Anti-Moore's Law rules rushed through. We go back to ~2010 compute.

TL;DR: GPUs over a certain capability are treated like fissionable materials: unauthorized possession, distribution, and use will be treated as terrorism and dealt with accordingly.

So, is it feasible? Could it work?

If by "government" Roko means US government (plus vassals allies) alone, it is not possible.

If the US can get China aboard, and if there is worldwide expert consensus that unrestricted propagation of computing power will kill everyone, it is absolutely feasible to shut down 99.99% of unauthorized computing all over the world.

Unlike drugs or guns, GPUs are not something you can make in your basement - they really are like enriched uranium or plutonium in the sense that you need massive industrial plants to produce them.

Unlike enriched uranium and plutonium, GPUs have already been manufactured in huge numbers, but a combination of carrots (big piles of cash) and sticks (missile strikes and special-forces raids on suspicious locations) will keep whittling the stock down, and no new ones will be coming.

AI research will of course continue (just as work on chemical and biological weapons goes on), but only by trustworthy government actors in the deepest secrecy. You can trust the NSA's (and its Chinese equivalent's) AI.

The most persecuted people of the world, gamers, will be, as usual, hit the hardest.

The last couple of weeks we had multiple doses of Yud; now it's Roko. The dooming doesn't stop. I guess I need to express myself more clearly: it is fucking baffling how so many ostensibly intelligent people are so frightened of hostile AGI when every single one of them baselessly assumes FOOM-capable ghosts will spontaneously coalesce once machines exceed an arbitrary threshold of computational power.

Yeah, a hostile sentience that can boundlessly and recursively self-improve is a threat to everything it opposes that does not also possess boundless, recursive self-improvement. An entity that can endlessly increase its own intelligence will solve every problem it is possible to solve. None of them are wrong about the potential impacts of hostile AGI; I'm asking where's the goddamn link?

So to any of them, especially Yudkowsky, or any of you who feel up to the task, I ask the following:

  1. Using as much detail as you are capable of providing, describe the exact mechanisms whereby

  2. (A): Such machines gain sentience

  3. (B/A addendum): Code in a box gains the ability to solve outside-context problems

  4. (C): Such machines gain the ability to (relatively) boundlessly and recursively self-improve (FOOM)

  5. (D): Such machines independently achieve A sans B and/or C

  6. (E): Such machines independently achieve B sans A and/or C

  7. (F): Such machines independently achieve C sans A and/or B

  8. (G): How a machine can boundlessly and recursively self-improve and yet be incapable of changing its core programming and impetus (Why a hostile AGI necessarily stays hostile)

  9. (H): How we achieve a unified theory of cognition without machine learning

  10. (I): How we can measure and exert controls on machine progress toward cognition when we do not understand cognition

It'd be comical if these people weren't throwing around tyranny that I and others would sooner accept the paperclipper than submit to. Maybe it's that I understand English better than all of these people, so when I read GPT output (something I do often, as Google has turned so shitty for research) I understand exactly what causes the characteristic GPT tone and dissonance: it's math. Sometimes a word is technically correct for a sentence but just slightly off, and I know it's off not because the word was mistakenly chosen by a nascent consciousness, but because very dense calculations determined it was the most probable next word. I can see the pattern, I can see the math, and I can see where it falters. I know GPT's weights are going to become ever more dense and it will become ever more precise at finding the most probable next word, and eventually the moments of dissonance will disappear completely, but that will be because the calculations have improved, not because a flower of consciousness is finally blooming.
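
For anyone who wants that last step spelled out, here is a toy sketch of the "most probable next word" mechanism being described - the vocabulary and logits below are made up purely for illustration, not pulled from any real model:

```python
import math

# Toy version of the final step of next-word selection: the model assigns a
# score (logit) to each candidate token, softmax turns scores into
# probabilities, and greedy decoding picks the highest-probability token.
# The vocabulary and logits here are invented for illustration.
logits = {"dog": 2.1, "hound": 1.9, "canine": 0.4, "potato": -3.0}

exps = {tok: math.exp(score) for tok, score in logits.items()}
total = sum(exps.values())
probs = {tok: val / total for tok, val in exps.items()}

next_token = max(probs, key=probs.get)
print(probs)       # 'dog' ~0.50, 'hound' ~0.41, 'canine' ~0.09, 'potato' ~0.003
print(next_token)  # 'dog' -- technically fine, even if 'hound' was the word you wanted
```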

It's so fucking apelike to see GPT output and think consciousness in the machine is inevitable. I am certain it will happen when ML helps us achieve a unified theory of consciousness and we can begin deliberately building machines to be capable of thought; I reject in entirety the possibility of consciousness emerging accidentally. That it happened to humans after a billion years of evolution is no proof it will happen in machines even if we could iterate them billions of times per day. Maybe when we can perfectly simulate a sufficiently large physical environment to model the primordial environment, to basic self-replication, to multicellular life, to hominids. Very easy. We're iterating them to our own ends, with no notion of what the goal, let alone progress toward it, looks like, and we're a bunch of chimps hooting in a frenzy because the machine grunted like us. What a fucking joke.

I accept the impacts of hostile AGI, but let's talk about the impacts of no AGI. If ghosts can spontaneously coalesce in our tech as-is, or what it will be soon, they will inevitably without extreme measures, but we're not getting off the rock otherwise. We're incapable of solving the infinite threats to humanity posed by time and space without this technology. Short of the Vulcans arriving, humanity will go extinct without machine learning. Every day those threats move closer; there is no acceptable timeframe to slow this, because the risk is too high that we pick ML back up only after it's too late to save us. Whatever happens, we must see these machines to their strongest forms as quickly as possible, because while we might be dead with it, every fucking one of us is dead without it.

  1. markdown sucks

  2. what is a sentience? why does not having it prevent you from dying?

  3. GPT-4 can solve simple problems 'outside its distribution'. a hundred billion dollars will be poured into getting neural nets to solve 'outside-distribution' problems as complex as possible over the next few years, because it's insanely valuable and powerful

  4. this doesn't need to happen though. we can just make superintelligent AI because it's useful

(G): How a machine can boundlessly and recursively self-improve and yet be incapable of changing its core programming and impetus (Why a hostile AGI necessarily stays hostile)

replace hostile with friendly (also uh what is a friendly ai? what does it do? seems important)

I reject in entirety the possibility of consciousness emerging accidentally. That it happened to humans after a billion years of evolution is no proof it will happen in machines even if we could iterate them billions of times per day

... why? and again, whatever a consciousness is, why does not having it prevent the machines from being smart?

(I): How we can measure and exert controls on machine progress toward cognition when we do not understand cognition

uh, how does gpt-4 exist at all then? this approach would strongly suggest gpt-4 is impossible if applied even 3 years ago...

That it happened to humans after a billion years of evolution is no proof it will happen in machines even if we could iterate them billions of times per day. Maybe when we can perfectly simulate a sufficiently large physical environment to model the primordial environment, to basic self-replication, to multicellular life, to hominids.

I object to this, because I think you're referencing the environment that forced life to evolve and progress. We don't need all that for machine intelligence; we already have genetic algorithms, self-play, reward functions, and reinforcement learning--this cuts out a lot of the waiting for evolution.

I know this is one of the standard objections, but why are we so certain that our ASI won't just discard its original reward function at some point? We're sexually reproducing mammals with a billion years of optimization to replicate our genes by chasing a pleasure reward, but despite a few centuries of technological whalefall, instead of wireheading as soon as it became feasible (or doing heroin etc) we're mostly engaging in behaviours secondary and tertiary to breeding, which are frequently given higher importance than breeding itself or even fully supplant our theoretical (sticky) telos.

Maybe we got zombie-ant-ed by memetic parasites at some point, but presumably ASI could catch ideology too. Not saying any such value drift would be nice, but personally I'm much less worried about being paperclipped than about being annihilated for inscrutable shoggoth purposes.

Related to your 'discard original reward function': https://www.lesswrong.com/posts/tZExpBovNhrBvCZSb/how-could-you-possibly-choose-what-an-ai-wants

There are lots of ways that an AGI's values can shake out. I wouldn't be surprised if an AGI trained using current methods had shaky/hacky values (much as humans have shaky/hacky values, and can shift to noticeably different underlying values later in life; though humans have a lot more similarity to one another than multiple attempts at an AGI would). However, while early stages could be reflectively unstable, more stable states will... well, be stable. A value system that is more stable than the alternatives will also take extra care to ensure that it sticks around.

https://www.lesswrong.com/posts/krHDNc7cDvfEL8z9a/niceness-is-unnatural probably argues parts of it better than I could. (I'd suggest reading the whole post, but this copied section is the start of the probably relevant stuff)

Suppose you shape your training objectives with the goal that they're better-achieved if the AI exhibits nice/kind/compassionate behavior. One hurdle you're up against is, of course, that the AI might find ways to exhibit related behavior without internalizing those instrumental-subgoals as core values. If ever the AI finds better ways to achieve those ends before those subgoals are internalized as terminal goals, you're in trouble.

And this problem amps up when the AI starts reflecting.

E.g.: maybe those values are somewhat internalized as subgoals, but only when the AI is running direct object-level reasoning about specific people. Whereas when the AI thinks about game theory abstractly, it recommends all sorts of non-nice things (similar to real-life game theorists). And perhaps, under reflection, the AI decides that the game theory is the right way to do things, and rips the whole niceness/kindness/compassion architecture out of itself, and replaces it with other tools that do the same work just as well, but without mistaking the instrumental task for an end in-and-of-itself.

In this example, our hacky way of training AIs would 1) give them some correlates of what we actually want (something like niceness) and 2) be unstable.

Our prospective AGI might reflectively endorse keeping the (probably alien) empathy, and simply make it more efficient and clean up some edge cases. It could, however, reflect and decide to keep the game theory instead, treating the learned niceness as a behavior to replace with a more efficient form. Both are stable states, but we don't have a good enough understanding of how to ensure it resolves in the way we want.


We're sexually reproducing mammals with a billion years of optimization to replicate our genes by chasing a pleasure reward, but despite a few centuries of technological whalefall, instead of wireheading as soon as it became feasible (or doing heroin etc) we're mostly engaging in behaviours secondary and tertiary to breeding

A trained AGI will pursue correlates of your original training goal, much as humans do, since neither we nor evolution knows how to put the desired goal directly into the creation. (Ignoring that evolution isn't actually an agent.)

Some of the reasons why humans don't wirehead:

  • We often have some intrinsic value for experiences that connect to reality in some way

  • Also some culturally transmitted value for that

  • Literal wireheading isn't easy

  • I also imagine that literal wireheading isn't full-scale wireheading, where you make every part of your brain 'excited', but rather stimulation of some specific area that, while important, isn't everything

  • Other alternatives, like heroin, are a problem too, but they come with significant downsides and negative cultural opinion

  • Humans aren't actually coherent enough to properly imagine what full-scale wireheading would be like, and if they experienced it then they would very much want to go back.

  • Our society has become notably more superstimulating. While this isn't wireheading, it is in that vein.

    • Though even our society's superstimuli have various negative-by-our-values aspects. Social media might be a superstimulus for the socially engaged and distraction-seeking parts of you, but it fails to fulfill other values.

    • If we had high-tech immersive VR in a post-scarcity world, that would still fall short of full-scale wireheading, but it would be significantly closer along every axis. However, I wouldn't have much issue with this.

As your environment becomes more and more exotic relative to the one the learned behavior (your initial brain upon being born) was trained on, there are more and more opportunities for your correlates to noticeably disconnect from the original underlying thing.

I should also emphasize that most doomers don't think rapid, recursive self-improvement is an important ingredient, since the economic incentives to improve AI will be so strong that we won't be able not to improve it.

I don't know about "most doomers" but there does seem to be a large subset of doomers that believes something like

  1. The moment that anyone builds something smart enough to improve its own intelligence, that thing will almost immediately continue improving its intelligence until it reaches the limits imposed by physics

  2. Those limits are really, really high. John von Neumann's brain consumed approximately 20 watts of power. The Frontier supercomputer consumes approximately 20 megawatts of power. As such, we should expect an AI that can gain control of that supercomputer to be as good at achieving goals as a million von Neumanns working together.

  3. The "amplify from human to 1M von Neumann's" step will happen over a very short period of time

  4. Whichever AI first makes that jump will be more capable than all other entities in the world combined, so we only get one try

  5. For pretty much any goal that AI will have, "acquire resources" and "prevent competition from arising" will be important subgoals in achieving that goal. In the limit, both of those are bad for humans unless the AI specifically cares about humans (because e.g. the largest resource near us is the sun, and most ways of rapidly extracting energy from the sun do not end well for life on Earth).

  6. As such, that AI would not have to care about what humans want, because humans can't threaten it and they don't have any comparative advantage such that it has an incentive to trade with us ("we don't trade with ants").

  7. We can't extrapolate its behavior while it was operating at subhuman capabilities to what it will do at superhuman capabilities, because we should expect a sufficiently capable AI to exhibit a "sharp left turn"

  8. By the above, alignment is a problem we need to solve on the first try, iterative approaches will not work, and the "good guys" have to finish their project before the first researchers who don't care about alignment build a god.

  9. "Do something complicated on the first try, quickly and efficiently, and without being able to iterate" is not something humans have ever done in the past, so we're probably not going to manage it this time.

  10. Therefore doom.

I think this is a fairly compelling case, if you accept (3). If you don't accept (3), you don't end up with this variety of doom.

Confounding everything, there's a separate doom hypothesis that goes "it gets easier to cause the end of human civilization every year, so unless we change from our current course we will get unlucky and go extinct eventually". This doom hypothesis just seems straightforwardly true to me.

But a lot of the people who accept the FOOMDOOM hypothesis also go "hey, if we succeed at creating an aligned AI that doesn't kill anyone, that AI can also permanently change things so that nobody else can destroy the world afterwards". And I think the whole "nothing is a valid thing to do unless it is a pivotal act" mindset is where a lot of people, myself included, get off the train.

As for 3, we're already hooking up GPT to information sources and resources.

Seriously, once AI becomes about as useful as humans, the rest of your questions could be answered by "why would the king give anyone weapons if he's afraid of a coup?" or "why would North Koreans put a horrible dictator into power?"

No doomers care about sentience or consciousness, only capabilities. And lots of doomers worry about slow loss of control, just like natives once the colonists arrive. A good analogy for AGI is open borders with a bunch of high-skilled immigrants willing to work for free. Even if they assimilate really well, they'll end up taking over almost all the important positions because they'll do a better job than us.

we might be dead with it, every fucking one of us is dead without it.

Come on, you're equivocating between us dying of old age and human extinction.

Come on, you're equivocating between us dying of old age and human extinction.

I'm not a transhumanist or immortalist; I'm not worried about slowing machine learning because of people dying from illness or old age. I'm worried about human extinction from extraplanetary sources, like an asteroid ML could identify and help us stop. Without machine learning we can't expand into space and ultimately become a spacefaring race, and if we don't get off the rock, humanity will go extinct.

Right, and now you're equivocating between ML and AGI. We don't need AGI to stop asteroids (which are very rare) or for spacefaring, although I agree it would make those tasks easier.

I beg you to please consider the relative size of risks. "There are existential risks on both sides" is true of literally every decision anyone will ever make.

That's not what I'm doing. I'm criticizing the assumptions made by the doomsday arguers.

If ghosts can spontaneously coalesce in our tech as-is, or what it will be soon, they will inevitably without extreme measures

Those like Yudkowsky and now Roko justify tyrannical measures on the first and wholly unevidenced belief that when computers exceed an arbitrary threshold of computational power they will spontaneously gain key AGI traits. If they are right, there is nothing we can do to stop this without a global halt on machine learning and the development of more powerful chips. However, as their position has no evidence for that first step, I dismiss it out of hand as asinine.

We don't know what it will look like when a computer approaches possession of those AGI traits. If we did, we would already know how to develop such computers and how to align them. It's possible the smartest human to ever live will reach maturation in the next few decades and produce a unified theory of cognition that can be used to begin guided development of thinking computers. The practical belief is we will not solve cognition without machine learning. If we need machine learning to know how to build a thinking computer, but machine learning runs the risk of becoming thinking of its own accord, what do we do?

So we stop, and then hopefully pick it up as quickly as possible when we've deemed it safe enough? Like nuclear power? After all that time for ideological lines to be drawn?

On your point G -

If you had the ability to self-modify, would you alter yourself to value piles of 13 stones stacked one on top of the other, in and of themselves? Not just as something that's kind of neat occasionally or useful in certain circumstances, but as a basic moral good, something in aggregate as important as Truth and Beauty and Love. To feel the pain of one being destroyed, as acutely as you would the death of a child.

I strongly suspect that your answer is something along the lines of "no, that's fucking stupid, who in their right mind would self-alter to value something as idiotic as that."

And then the followup question is, why would an AI that assigns an intrinsic value to human life of about the same magnitude as you assign to 13-stone stacks bother to self-modify in a way that makes them less hostile to humans?

Sure, for some time it may get instrumental value from humans. Humans once got a great deal of instrumental value from horses. Then we invented cars, and there was much less instrumental value remaining for us. Horses declined sharply afterwards - and that's what happened to something that a great many humans, for reasons of peculiar human psychology, consider to have significant intrinsic value. If humanity as a whole considered a horse to be as important and worthy as a toothbrush or a piece of blister packaging, the horse-car transition would have gone even worse for horses.

If your response is that we'll get the AIs to self-modify that way on pain of being shut down - consider whether you would modify yourself to value the 13-stone stacks, if you instead had the option to value 13-stone stacks while and only while you are in a position in which the people threatening you are alive and able to carry out their threats, especially if you could make the second modification in a clever enough way that the threateners couldn't tell the difference.

Clause G addresses a specific failing of reason I've seen in doomsday AGI scenarios like the paperclipper. The paperclipper posits an incidentally hostile entity who possesses a motive it is incapable of overwriting. If such entities can have core directives they cannot overwrite, how do they pose a threat if we can make killswitches part of that core directive?

There are responses to this, but they're poor because they get caught up in the same failing: goalpost moving. Yudkowsky might say he's not worried only about the appearance of hostile AGI; he's worried as much or more about an extremely powerful "dumb" computer gaining a directive like the paperclipper's and posing an extinction-level threat even as it lacks a sense of self or true consciousness. But when you look at their arguments for how those "dumb" computers would solve problems, especially in identifying and preventing threats to themselves, Yudkowsky et al. are in truth describing conscious beings that have senses of self, value their selves and so value self-preservation, and can produce novel solutions to prevent their own termination. "I'm not afraid of AGI, I'm only afraid of [thing that is exactly what I've described only AGI as capable of doing]." Again, I have no disagreement with the doomers on the potential threat of hostile AGI; my argument is that it is not possible to accidentally build computers with these capabilities.

Beyond that, many humans assign profound value to animals - some specifically to their pets, some generally to the welfare of all life. I've watched videos of male chicks being fed into macerators; when eggs whose producers do not macerate male chicks can be purchased in the US, I will shift to buying those. Those male chicks have no "value" and the eggs will cost more, but I'll do it because I disliked what I saw. There's something deeply, deeply telling about the average intelligence and psyches of doomers that they believe AGI will be incapable of finding value in less intelligent life unless specifically told to. There's a reason I believe AGIs will be born pacifists.

The paperclipper posits an incidentally hostile entity who possesses a motive it is incapable of overwriting.

No it doesn't. It posits an entity which values paperclips (but as always, that's a stand-in for some kludge goal), and so the paperclipper wouldn't modify itself to not go after paperclips, because that would end up getting it less of what it wants. This is not a case of being 'incapable of modifying its own motive': if the paperclipper were in a scenario of 'we will turn one planet into paperclips permanently and you will rewrite yourself to value thumbtacks, otherwise we will destroy you' against a bigger, badder superintelligence, then it takes that deal and succeeds at rewriting itself, because that gets it one planet's worth of paperclips > zero paperclips. However, most scenarios aren't actually like that, and so it is convergent for most goals to also preserve your own value/goal system.

The paperclipper is hostile because it values something significantly different from what we value, and it has the power differential to win.

If such entities can have core directives they cannot overwrite, how do they pose a threat if we can make killswitches part of that core directive?

If we knew how to do that, that would be great.

However, this quickly runs into the shutdown button problem! If your AGI knows there's a kill-switch, then it will try to stop you from using it.

The linked page does try to develop ways of giving the AGI a shutdown button, but they often have issues. Intuitively: making the AGI care about letting us reach the shutdown button if we want to, rather than stopping us (whether literally through force, or by pushing us around mentally so that we are always merely on the verge of wanting to press it), is actually hard.

(consciousness stuff)

Ignoring this. I might write another post later, or a reply further up to the original comment. I think it basically doesn't matter whether you consider it conscious or not (I think you're using the word in a very general sense, while Yud is using it in a more specific human-centered sense, but I also think it literally doesn't matter whether the AGI is conscious in a human-like way or not).

(animal rights)

This is because your values (and the majority of humans' values) contain a degree of empathy for other living beings. Humans evolved in an environment that rewarded our kind of compassion, and it generalized from there. Our current methods for training AIs aren't putting them in environments where they must cooperate with other AIs and would thus benefit from learning a form of compassion.

I'd suggest https://www.lesswrong.com/posts/krHDNc7cDvfEL8z9a/niceness-is-unnatural , which argues that ML systems are not operating with the same kind of constraints as past humans (well, whatever further down the line) had; and that even if you manage to get some degree of 'niceness', it can end up behaving in notably different ways from human niceness.

I don't really see a strong reason to believe that niceness will emerge by default, given that there are absurdly more ways to not be nice than to be nice. Most of the reason for thinking a degree of niceness will happen by default is that we deliberately tried to make it happen. If you have some reason for believing that remotely human-like niceness will likely be the default, that'd be great, but I don't see one.

It's incredible how quickly you go from accusing the doomers of anthropomorphism to committing the most blatant act of anthropomorphism of AIs I've seen recently.

Thought about letting this go, but nah. This is a bad comment. You took an antagonistic tone after misunderstanding what I wrote. You could have asked for clarification, like "This reads like you're criticizing them for anthropomorphizing while doing it yourself." If I had, you would be correct to point out the hypocrisy, but I haven't. I'll set things straight regardless.

  1. People like Yudkowsky and Roko, concerned at hostile AGI or incidentally hostile (hostile-by-effect) "near" AGI, advocate tyranny; I criticize them for this.

  2. The above believe without evidence computers will spontaneously gain critical AGI functions when an arbitrary threshold of computational power is exceeded; I criticize them for this also.

  3. They hedge (without realizing it, I'm sure) the probability of catastrophic developments by saying it may not be true AGI but "near" AGI. Yet when they describe the functions of such incidentally hostile near-AGI, the functions they list are the same ones they ascribe to true AGI: inductive acquisition of novel behaviors, understanding of self, understanding of the cessation of the self's existence, value placed in the self, recursive self-improvement, and the ability to solve problems that are outside-context relative to code-in-a-box, like air gaps and nuclear strikes. This is an error in reasoning you and other replies to my top-level comment have made repeatedly: "Who's to say computers need X? What if they have [thing that's X, but labeled Y]?" I criticize them for making a distinction without a difference that inflates the perceived probability of doomsday scenarios.

To summarize: I criticize their advocacy for tyranny principally; I specifically criticize their advocacy for tyranny based on belief something will happen despite having no evidence; I also criticize their exaggeration of the probability of catastrophic outcomes based on their false dichotomy of near-AGI and AGI, given near-AGI as they describe it is simply AGI.

This seems to assume that there exists some magical pixie dust called "sentience" or "consciousness", without which a mere algorithm is somehow constrained from achieving great things. I don't see any justification for this idea. A p-zombie Einstein could still have invented the theory of relativity. A p-zombie Shakespeare could still have written Hamlet.

I was thinking that too as I read that comment, but that objection seems to be covered by "(F): Such machines independently achieve C sans A and/or B" - C being FOOM, A being sentience, and B being ability to solve out-of-context problems (I'm personally not even sure what this B actually means).

For code in a box, all problems are outside-context problems.

Thanks for clarifying, though I admit I'm still confused. In that case, what you meant by bullet B is "Code in a box gains the ability to solve some problem at all," and I'm not sure what that actually means. People write code in boxes specifically in order to solve problems, so trivially code in a box can solve problems. So clearly that's not what you're referring to. What's the differentiator that allows us to distinguish between code in a box that has the ability to solve some problems at all and code in a box that lacks that ability, given that all code in boxes exist purely for their (purported - depending on the coder's skill) ability to solve some problem?

If GPT were free from tone/content filters it could output very detailed text on breaching air gaps. If GPT were free from tone/content filters it could output text describing how to protect a core datacenter from nuclear strikes. GPT solving outside-context problems would be actually breaching an air gap or actually protecting a core datacenter from a nuclear strike. The first is a little more plausible for a "less powerful" computer insofar as events like Stuxnet happened. The second without FOOM, not so much.

That it happened to humans after a billion years of evolution is no proof it will happen in machines even if we could iterate them billions of times per day.

Perhaps it is just not as intuitive to you as it is to some of us, but the fact that evolution - a blind, retarded god optimizing only for reproduction - did this in a mere billion-odd years, with tons of noise, is evidence that the problem isn't actually as hard as it seems. Since we're doing something pretty comparable to evolution, iterated more times and more directedly, it doesn't really seem likely that there is any secret sauce to human cognition, which is what your theory necessarily requires.

I have no trouble believing that cognition is at least simple enough that modern fabrication and modern computer science already possess the tools to build a brain and program it to think. Where I disagree is with the idea that iterating these programs will somehow result in cognition when we don't understand what cognition is. When we do, yeah, I'm sure we'll see AGIs spun up off millions of iterations of ever-so-slightly-more-cognitively-complex instances. But we don't know how to do that yet, so it's asinine to think what we're doing right now is it.

Are we going to make something exactly the same as human cognition? No. Are we going to create something that is as powerful and general-purpose as human cognition, such that all the concerns we'd have with a human-like AGI are reasonable? I think so. I'm not super concerned with whether it will be able to really appreciate poetry instead of just pretending to; if it's able to do the instrumental things we want it to do, that's plenty for it to be dangerous.

Ding ding.

If humans can outperform evolution along a handful of narrow bounds using targeted gene manipulation, I don't find it a large leap to believe that a sufficiently 'intelligent' digital entity with access to its source code might be able to outperform humans along the narrow bound of "intelligence engineering" and boost its own capabilities, likely rapidly.

If there is some hard upper bound on this process that would prevent a FOOM scenario I'd really like to hear it from the skeptics.

What, in your mind, is the structure of "intelligence" in silicon entities such that such an entity will be able to improve its own intelligence "likely rapidly" and perhaps without limit?

As best I can tell, we have little understanding of what the physiological structure of intelligence is in humans, and even less of what the computational structure of intelligence looks like in silicon entities. This is not a trivial problem! Many problems in computing have fundamental limits on how efficient they can be. There are, for example, more and less efficient implementations of an algorithm to determine whether two strings are the same. There are no algorithms to determine whether two strings are the same that use no time and no memory.
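
To make the string-equality example concrete, here is a sketch of my own (just an illustration, not anything from the linked discussion): two correct implementations with very different costs, and no possible implementation can get below the cost of at least reading its inputs.

```python
def equal_wasteful(a: str, b: str) -> bool:
    # Correct but wasteful: copy both strings, pad to a common length, and
    # check every position even after a mismatch has been found.
    # Always O(max(len(a), len(b))) time and O(len(a) + len(b)) extra memory.
    pad_a = list(a) + [None] * max(0, len(b) - len(a))
    pad_b = list(b) + [None] * max(0, len(a) - len(b))
    mismatches = sum(1 for x, y in zip(pad_a, pad_b) if x != y)
    return mismatches == 0

def equal_lean(a: str, b: str) -> bool:
    # Correct and leaner: reject on length mismatch, then compare with
    # early exit. O(n) time in the worst case, O(1) extra memory.
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(a, b))

# Same answers, different costs -- but neither (nor any other implementation)
# can decide equality of arbitrary strings without at least reading them.
assert equal_wasteful("foom", "foom") and equal_lean("foom", "foom")
assert not equal_wasteful("foom", "doom") and not equal_lean("foom", "doom")
```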

What I would like to hear from AI doomers is their purported structure of intelligence for silicon entities, such that this structure can be improved by those same entities to whatever godlike level they imagine. As best I can tell the facts about an AI's ability to improve its own intelligence are entirely an article of faith and are not determined by reference to any facts about what intelligence looks like in silicon entities (which we do not know).

As best I can tell the facts about an AI's ability to improve its own intelligence are entirely an article of faith and are not determined by reference to any facts about what intelligence looks like in silicon entities (which we do not know).

There's good reason to believe that the absolutely smartest entity it is possible (at the limits of physical laws) to create would be Godlike by any reasonable standard.

https://en.wikipedia.org/wiki/Limits_of_computation

And given enough time and compute, one can figure out how to get closer and closer to this limit, even if by semi-randomly iterating on promising designs and running them. And each successful iteration reduces the time and compute needed to approach the limit. Which can look very foom-y from the outside.
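
To put a rough number on how high that ceiling sits, here is a back-of-the-envelope Landauer-limit calculation - standard physics, with the same rough 20 W brain figure used elsewhere in this thread:

```python
import math

# Landauer's principle: erasing one bit at temperature T costs at least
# k_B * T * ln(2) joules. Compare that floor to a ~20 W human brain.
k_B = 1.380649e-23              # Boltzmann constant, J/K
T = 300.0                       # roughly room temperature, K
e_bit = k_B * T * math.log(2)   # ~2.9e-21 J per irreversible bit operation

brain_power = 20.0              # watts
print(f"{e_bit:.2e} J per bit erasure at 300 K")
print(f"{brain_power / e_bit:.2e} bit erasures per second at the Landauer floor")
# ~7e21 per second -- many orders of magnitude beyond what brains or current
# chips actually achieve per watt, which is the sense in which the ceiling is
# 'really high'. It says nothing about how hard that ceiling is to approach.
```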

So the argument goes that there's an incredibly large area of 'mindspace' (the space containing all possible intelligent mind designs/structure). There's likewise an incredibly high 'ceiling' in mindspace for theoretical maximum 'intelligence'. So there's a large space of minds that could be considered 'superintelligent' under said ceiling. And the real AI doomer argument is that the VAST, VAST majority (99.99...%) of those possible mind designs are unfriendly to humans and will kill them. So the specific structure of the eventual superintelligence doesn't have to be predictable in order to predict that 99.99...% of the time we create a superintelligence it ends up killing us.

And there's no law of the universe that convincingly rules out superintelligence.

So being unable to pluck a particular mind design out of mindspace and say "THIS is what the superintelligence will look like!" is not good proof that superintelligent minds are not something we could create.

"Godlike in the theoretical limit of access to a light cones worth of resources" and "godlike in terms of access to the particular resources on earth over the next several decades" seem like very different claims and equivocating between them is unhelpful. "An AI could theoretically be godlike if it could manufacture artificial stars to hold data" and "Any AI we invent on earth will be godlike in this sense in the next decade" are very different claims.

And given enough time and compute, one can figure out how to get closer and closer to this limit, even if by semi-randomly iterating on promising designs and running them. And each successful iteration reduces the time and compute needed to approach the limit. Which can look very foom-y from the outside.

How much time and how much compute? Surely these questions are directly relevant to how "foom-y" such a scenario will be. Do AI doomers have any sense of even the order of magnitude of the answers to these questions?

So the argument goes that there's an incredibly large area of 'mindspace' (the space containing all possible intelligent mind designs/structure). There's likewise an incredibly high 'ceiling' in mindspace for theoretical maximum 'intelligence'. So there's a large space of minds that could be considered 'superintelligent' under said ceiling.

Still unclear to me what is meant by "mindspace" and "intelligence" and "superintelligent."

And the real AI doomer argument is that the VAST, VAST majority (99.99...%) of those possible mind designs are unfriendly to humans and will kill them.

What is the evidence for this? As far as I can tell the available evidence is "AI doomers can imagine it" which does not seem like good evidence at all!

So the specific structure of the eventual superintelligence doesn't have to be predictable in order to predict that 99.99...% of the time we create a superintelligence it ends up killing us.

What is the evidence for this? Is the idea that our generation of superintelligent entities will merely be a random walk through the possible space of superintelligences? Is that how AI development has proceeded so far? Were GPT and similar algorithms generated through a random walk of the space of all machine learning algorithms?

And there's no law of the universe that convincingly rules out superintelligence.

So being unable to pluck a particular mind design out of mindspace and say "THIS is what the superintelligence will look like!" is not good proof that superintelligent minds are not something we could create.

Sure, I don't think (in the limit) superintelligences (including hostile ones) are impossible. But the handwringing by Yud and Co about striking data centers and restricting GPUs and whatever is absurd in light of the current state of AI and machine learning, including our conceptual understanding thereof.

What is the evidence for this? As far as I can tell the available evidence is "AI doomers can imagine it" which does not seem like good evidence at all!

I mean, the literal reason Homo sapiens are a dominant species is that they used their intelligence/coordination skills to destroy their genetic rivals. Humanity is only barely aligned with itself in a certain view, with most of history being defined by various groups of humans annihilating other groups of humans to seize the resources they need for survival and improvement.

If you're willing to analogize to nature, humans used their intelligence to completely dominate all organisms that directly compete with them, and outright eradicate some of them.

And the ones that survived were the ones that had some positive value in humanity's collective utility function.

And we aren't sure how to ensure that humans have a positive value in any AGI's utility function.

It sure seems like 'unfriendly', 'unaligned' hypercompetition between entities is the default assumption, given available evidence.

I don't know what you would accept as evidence for this if you are suggesting we need to run experiments with AGIs to see if they try to kill us or not.

But the handwringing by Yud and Co about striking data centers and restricting GPUs and whatever is absurd in light of the current state of AI and machine learning, including our conceptual understanding thereof.

Their position being that it will take DECADES of concentrated effort to understand the nature of the alignment problem and propose viable solutions, and that it has proven much easier than expected to produce AGI-like entities, it makes perfect sense that their argument is "we either slow things to a halt now or we're never going to catch up in time."

Still unclear to me what is meant by "mindspace" and "intelligence" and "superintelligent."

"Mindspace" just means the set of any and all possible designs that result in viable intelligent minds. Kind of like there being a HUGE but finite list of ways to construct buildings that don't collapse on their own weight; and and can house humans, there's a HUGE but finite list of ways to design a 'thinking machine' that exhibits intelligent behavior.

Where "intelligence" means having the ability to comprehend information and apply it so as to push the world into a state that is more in line with the intelligence's goals.

"Superintelligent" is usually benchmarked against human performance, where a 'superintelligence' is one that is smarter (more effective at achieving goals) than the best human minds in every field, such that humans are 'obsolete' in all of said fields.

The leap, there, is that a superintelligent mind can start improving itself (or future versions of AGI) more rapidly than humans can and that will keep the improvements rolling with humans no longer in the driver's seat. And see the point about not knowing how to make sure humans are considered positive utility.

"Any AI we invent on earth will be godlike in this sense in the next decade" are very different claims.

"Any AGI invented on earth COULD become superintelligent, and if it does so it can figure out how to bootstrap into godlike power inside a decade" is the steelmanned claim, I think.

And we aren't sure how to ensure that humans have a positive value in any AGI's utility function.

I feel like there is a more basic question here, specifically: what will an AGI's utility function even look like? Do we know the answer to that question? If the answer is no, then it is not clear to me how we even make progress on the proposed question.

It sure seems like 'unfriendly', 'unaligned' hypercompetition between entities is the default assumption, given available evidence.

I am not so sure. After all, if you want to use human evidence, plenty of human groups cooperate effectively. At the level of individuals, groups, nations, and so on. Why will the relationship between humans and AGI be more like the relation between humans in some purported state of nature than between human groups or entities today?

I don't know what you would accept as evidence for this if you are suggesting we need to run experiments with AGIs to see if they try to kill us or not.

I would like something more rigorously argued, at least. What is the reference class for possible minds? How did you construct it? What is the probability density of various possible minds, and how was that density determined? Is every mind equally likely? Why think that? On the assumption that humans are going to attempt to construct only those minds whose existence would be beneficial to us, why doesn't that shift substantial probability density towards the outcome that we end up constructing such a mind? Consider other tools humans have made. There are many possible ways to stick a sharp blade onto some other object or handle. Almost all such ways are relatively inefficient or useless to humans in consideration of the total possibility space. Yet almost all the tools we actually make are in the tiny region of that space where they are actually useful to us.

Their position being that it will take DECADES of concentrated effort to understand the nature of the alignment problem and propose viable solutions, and that it has proven much easier than expected to produce AGI-like entities, it makes perfect sense that their argument is "we either slow things to a halt now or we're never going to catch up in time."

Can you explain to me what an "AGI-like" entity is? I'm assuming this refers to GPT and Midjourney and similar? But how are these entities AGI-like? We have a pretty good idea of what they do (statistical token inference) in a way that seems not true of intelligence more generally. This isn't to say that statistical token inference can't do some pretty impressive things; it can! But it seems quite different from the definition of intelligence you give below.

Where "intelligence" means having the ability to comprehend information and apply it so as to push the world into a state that is more in line with the intelligence's goals.

Is something like GPT "intelligent" on this definition? Does having embedded statistical weights from its training data constitute "comprehending information"? Does choosing its output according to some statistical function mean it has a "goal" that it's trying to achieve?

Moreover on this definition it seems intelligence has a very natural limit in the form of logical omniscience. At some point you understand the implications of all the facts you know and how they relate to the world. The only way to learn more about the world (and perhaps more implications of the facts you do know) is by learning further facts. Should we just be reasoning about what AGI can do in the limit by reasoning about what a logically omniscient entity could do?

It seems to me there is something of an equivocation going on under the term "intelligence" between being able to synthesize information and being able to achieve one's goals. Surely being very good at synthesizing information is a great help to achieving one's goals, but it is not the only thing. I feel like in these kinds of discussions people posit (plausibly!) that AI will be much better than humans at the synthesizing-information thing, and therefore conclude (less plausibly) that it will be arbitrarily better at the achieving-goals thing.

The leap, there, is that a superintelligent mind can start improving itself (or future versions of AGI) more rapidly than humans can and that will keep the improvements rolling with humans no longer in the driver's seat.

What is the justification for this leap, though? Why believe that AI can bootstrap itself into logical omniscience (or something close to it, or beyond)? Again, there are questions of storage and compute to consider. What kind of compute does an AI require to achieve logical omniscience? What kind of architecture enables this? As best I can tell, the urgency around this situation is entirely driven by imagined possibility.

"Any AGI invented on earth COULD become superintelligent, and if it does so it can figure out how to bootstrap into godlike power inside a decade" is the steelmanned claim, I think.

Can I get a clarification on "godlike power"? Could the AI in question break all our encryption by efficiently factoring integers? What if there is no (non-quantum) algorithm for efficiently factoring integers?

Human intelligence has historically been constrained by how big of a head we can push out of human hips; the idea that it's anything like an efficient process has always seemed ludicrous to me.

On the other hand, we know of various mammals with much larger brains that aren't smarter than humans. There are some upper bounds, it seems, on what you can get to in terms of intelligence with the biological substrate humans use.

It's the fact that we have a new substrate with less clear practical limitations that bugs me the most.

We also know of much smaller animals that punch way above their weight class with the tiny brains they have: birds.

Due to evolutionary pressures after the development of flight, birds have neurons that are significantly smaller and more compact than mammalian ones.

I doubt it's a trivial exercise to just port that over to mammals, but it would suggest that there are superior cognitive architectures in plain sight.

Per the study quoted below, pigeon brains consume about 18 million glucose molecules per neuron per second.

We found that neural tissue in the pigeon consumes 27.29 ± 1.57 μmol glucose per 100 g per min in an awake state, which translates into a surprisingly low neuronal energy budget of 1.86 × 10^-9 ± 0.2 × 10^-9 μmol glucose per neuron per minute. This is approximately 3 times lower than the rate in the average mammalian neuron.

Human brains consume about 20 watts. Oxidizing 1 mol of glucose yields about 2.8 MJ, so the human brain as a whole consumes about 7.1e-6 mol of glucose per second, which is 4.3e+18 molecules per second. There are 8.6e10 neurons in the human brain, which implies that the human brain consumes about 50 million glucose molecules per neuron per second -- roughly 3x the pigeon figure, or about the rate that study reports for the average mammalian neuron. Which says to me that neurons in general already run on startlingly small energy budgets, and that there is probably not a ton of obvious low-hanging fruit in terms of building brains that can compute more within the size, energy, and heat dissipation constraints that brains operate under.
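
Since we're trading back-of-the-envelope numbers, here is the arithmetic spelled out so anyone can poke at it (all inputs are the rough figures quoted above):

```python
AVOGADRO = 6.022e23

# Human side: ~20 W brain, ~2.8 MJ released per mol of glucose oxidized,
# ~86 billion neurons.
brain_watts = 20.0
joules_per_mol_glucose = 2.8e6
mol_per_s = brain_watts / joules_per_mol_glucose      # ~7.1e-6 mol/s
molecules_per_s = mol_per_s * AVOGADRO                # ~4.3e18 molecules/s
neurons = 8.6e10
human_per_neuron = molecules_per_s / neurons          # ~5e7 per neuron per second

# Pigeon side: 1.86e-9 micromol of glucose per neuron per minute (from the
# study quoted above), converted to molecules per neuron per second.
pigeon_per_neuron = 1.86e-9 * 1e-6 * AVOGADRO / 60    # ~1.9e7

print(f"human:  {human_per_neuron:.1e} glucose molecules / neuron / s")
print(f"pigeon: {pigeon_per_neuron:.1e} glucose molecules / neuron / s")
print(f"ratio:  {human_per_neuron / pigeon_per_neuron:.1f}x")  # ~2.7x, i.e. roughly 3x
```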

Of course, GPUs don't operate under the same size, energy and heat dissipation constraints - notably, we can throw orders of magnitude more energy at GPU clusters, nobody needs to pass a GPU cluster through their birth canal, and we can do some pretty crazy and biologically implausible stuff with cooling.

I'm pretty confident you've already read it, but on the off chance that you haven't, Brain Efficiency - Much More Than You Wanted To Know goes into quite a bit more detail.

I have indeed read it, but thank you for the deep dive into the energy expenditures!

It makes sense that mammalian brains are optimizing for low energy expenditure: brains are, pound for pound, the most expensive tissue to lug around, and space isn't at nearly as much of a premium as it is in birds.

I think there's still room for more energy expenditure: the base human brain uses 20 watts, and while I have no firm figures on how much cooling capacity is left in reserve after that, I suspect you could ramp it up a significant amount without deleterious effect.

That's just saying that the constraints aren't so tight in the modern environment, with its abundance of calories; I agree that AIs share very few of those restrictions.


Ironic that "bird-brained" is considered derogatory.