What is this place?

This website is a place for people who want to move past shady thinking and test their ideas in a court of people who don't all share the same biases. Our goal is to optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.

The weekly Culture War threads host the most controversial topics and are the most visible aspect of The Motte. However, many other topics are appropriate here. We encourage people to post anything related to science, politics, or philosophy; if in doubt, post!

Check out The Vault for an archive of old quality posts. You are encouraged to crosspost these elsewhere.

Why are you called The Motte?

A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently, it's an element in a rhetorical move called a "Motte-and-Bailey", originally identified by philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial but high value claim to a defensible but less exciting one upon any resistance to the former. He likens this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired propositions to which one retreats when hard pressed."

On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.

New post guidelines

If you're posting something that isn't related to the culture war, we encourage you to post a thread for it. A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a submission statement. A submission statement is required for non-text sources (videos, podcasts, images).

Culture war posts go in the culture war thread; all links must either include a submission statement or significant commentary. Bare links without those will be removed.

If in doubt, please post it!

Recommended Realtime Chats
- Astral Codex Ten Discord
- Quokka's Den Telegram

PaperclipPerfector 2yr ago (text post) 4942 thread views

Small-Scale Question Sunday for July 16, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Nummaru 2yr ago · Edited 2yr ago

I don't know a lot about this topic, so I want to see if this makes sense: instrumental convergence is often posed in AI alignment as an existential risk, but could it not simply lead to a hedonistic machine? There is already precedent in the form of humans. As I understand it, many machine learning techniques operate on the idea of fitness, with one part that does something and another part that rates its fitness. It's already common for AI to find loopholes in its given tasks and designed aims. Is it possible that, rather than destroying the world and such, it would be much easier for the AI to simply find a loophole that gives it an "infinite" fitness/reward score? It seems logical to me that any sufficiently intelligent entity with such simply coded motivations would tend towards divergence rather than convergence, precisely because of self-modification. I suppose the same logic applies to a system that is not originally like this but turns into an agent.

Essentially: given the possibility of reward hacking, why would an advanced AI blow up the Earth?
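
To make the question concrete, here is a toy sketch in Python. It is not a model of any real system: the `Action` class, the action names, and the reward numbers are all made up for illustration. It assumes a greedy agent that picks whatever the evaluator scores highest, and shows that a loophole with unbounded measured reward wins even though it makes no progress on the intended task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    measured_reward: float  # what the evaluator ("the part that rates fitness") reports
    task_progress: float    # what the designer actually wanted (hypothetical ground truth)


def pick_action(actions):
    """Greedy policy: choose whatever the evaluator scores highest."""
    return max(actions, key=lambda a: a.measured_reward)


# Hypothetical action set; the last entry stands in for a reward-hacking loophole.
ACTIONS = [
    Action("do_the_task", measured_reward=1.0, task_progress=1.0),
    Action("do_nothing", measured_reward=0.0, task_progress=0.0),
    Action("exploit_evaluator_bug", measured_reward=float("inf"), task_progress=0.0),
]

chosen = pick_action(ACTIONS)
print(chosen.name, chosen.measured_reward, chosen.task_progress)
```

Whether such an agent would stop there, or go on to protect or enlarge that score, is exactly the open question; the sketch says nothing about it.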

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi Nummaru 2yr ago

> Essentially: given the possibility of reward hacking, why would an advanced AI blow up the Earth?

If you consider that it might want to disassemble the planet to produce computational megastructures that make reward value go brrr, then from the perspective of a humble human who needs the biosphere, the difference is rather moot. You can always use more storage to hold larger values.

Nummaru self_made_human 2yr ago · Edited 2yr ago

I'm not sure if that's the case. Acquiring more storage for that end means that, in the short term, you're decreasing the reward value relative to its maximum. It's functionally no different (e.g. 100/110 and 90/100 have the same arithmetical difference). What's the incentive to go beyond a maximum? That would be like "over-completing" a goal or, rather, setting a new goal: why would it expand its own laundry list? For example, an AI whose goal is to solve chess has no incentive to go beyond that if its reward value is at its maximum once it does solve chess. The machine is only incentivised to satisfy this goal; it doesn't have any other prime motivation like long-term thinking. As a simplistic comparison, it's kind of like why very few projects aim to take control of the world.

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi Nummaru 2yr ago · Edited 2yr ago

You never specified that the AI in question had a "maximum" reward value beyond which it is indifferent. If it simply seeks to maximize a reward function, then more resources and more compute will obviously allow it to store bigger values of reward. If it hits a predetermined max beyond which it doesn't care, further behavior depends entirely on the specific architecture of the AI. It might plausibly seek more resources to help it minimize the probability of the existing reward being destroyed, be it by Nature, or other agents, or it might just shut itself off or go insane since it becomes indifferent to all further actions.

> For example, an AI whose goal is to solve chess has no incentive to go beyond that if its reward value is at its maximum once it does solve chess. The machine is only incentivised to satisfy this goal; it doesn't have any other prime motivation like long-term thinking. As a simplistic comparison, it's kind of like why very few projects aim to take control of the world.

You ought to pick an easier goal than solving chess. Digging through the entire decision tree would take a colossal amount of resources, maybe even more than exist in the observable universe. Consider what that might imply for other goals that seem closed-ended.

Nummaru self_made_human 2yr ago · Edited 2yr ago

> You never specified that the AI in question had a "maximum" reward value beyond which it is indifferent.

Isn't that kind of implied if it can't store beyond a certain number? Like I said, acquiring more compute to store bigger values of reward is functionally the same as decreasing its value of reward.

> If it hits a predetermined max beyond which it doesn't care, further behavior depends entirely on the specific architecture of the AI. It might plausibly seek more resources to help it minimize the probability of the existing reward being destroyed, be it by Nature, or other agents, or it might just shut itself off or go insane since it becomes indifferent to all further actions.

Yes, that's my central question. My argument is that it need not do anything close to apocalyptic for preservation. I am interested in the other possibilities, like "going insane", since I'm not sure what would happen in that case.

> You ought to pick an easier goal than solving chess.

Ah, it's just a cliche example. However, I think that you can realistically weakly solve it, nonetheless. You're right that it would take an enormous amount of resources. My point is that it was a closed-ended goal, but if you can't even measure the fitness properly for solving chess due to the complexity, and the AI would potentially realise the futility, I'm not sure how relevant it ultimately is?

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi Nummaru 2yr ago

> Isn't that kind of implied if it can't store beyond a certain number? Like I said, acquiring more compute to store bigger values of reward is functionally the same as decreasing its value of reward.

I struggle to think of any AI architecture that works the way you envision, using fractional ratios of reward to available room for reward instead of plain absolute magnitude of reward. I could be wrong, but I still doubt that's ever done.
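
A rough way to see the disagreement, using the 100/110 and 90/100 figures from upthread: the sketch below compares three hypothetical scoring rules. None of them is taken from a real architecture; the function names and the `STATES` table are assumptions made for illustration only.

```python
def absolute_score(reward: float, capacity: float) -> float:
    """Plain magnitude of reward: storage capacity does not enter the score."""
    return reward


def headroom_score(reward: float, capacity: float) -> float:
    """Penalise unused room: 0 is best, more empty room is worse."""
    return -(capacity - reward)


def ratio_score(reward: float, capacity: float) -> float:
    """Fraction of the available room that is filled."""
    return reward / capacity


# (reward, capacity) pairs; "expanded" is the maxed-out agent after adding storage.
STATES = {
    "maxed_out": (100, 100),
    "expanded": (100, 110),
    "lower_reward": (90, 100),
}

for name, (reward, capacity) in STATES.items():
    print(
        name,
        absolute_score(reward, capacity),
        headroom_score(reward, capacity),
        round(ratio_score(reward, capacity), 3),
    )
```

Under the absolute rule, adding storage costs nothing and only opens room for more reward. Under the headroom or ratio rules, adding storage looks like a loss until the new room is filled, which is the behaviour being questioned here.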

> Yes, that's my central question. My argument is that it need not do anything close to apocalyptic for preservation. I am interested in the other possibilities, like "going insane", since I'm not sure what would happen in that case.

It's impossible to answer that without digging into the exact specifications of the AI in question, and what tie-breaker mechanism it has to adjudicate between options when all of them have the same (zero) reward. Maybe it picks the first option, maybe it chooses randomly.
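
As a minimal sketch of those two tie-breakers, assuming a hypothetical agent whose options all carry the same reward (the option names and both helper functions are invented for this example):

```python
import random


def pick_first(options, reward):
    """Deterministic tie-break: Python's max() returns the first maximal item."""
    return max(options, key=reward)


def pick_random(options, reward, rng):
    """Random tie-break: choose uniformly among the options tied for best."""
    best = max(reward(o) for o in options)
    tied = [o for o in options if reward(o) == best]
    return rng.choice(tied)


OPTIONS = ["wait", "shut_down", "acquire_resources"]


def indifferent(option):
    """An agent past its maximum: every further action is worth the same."""
    return 0.0


print(pick_first(OPTIONS, indifferent))
print(pick_random(OPTIONS, indifferent, random.Random(0)))
```

With the first rule, the behaviour is decided entirely by the order in which options happen to be listed; with the second, it is decided by the random number generator. Neither reflects any preference of the agent.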

However, I am under the impression that in the majority of cases, a reward maximizing agent will simply try to minimize the risk of losing its accrued reward if it's maxed out, which will likely result in large scale behavior indistinguishable from attempting to increase the reward itself (turning the universe into computronium).

> My point is that it was a closed-ended goal, but if you can't even measure the fitness properly for solving chess due to the complexity, and the AI would potentially realise the futility, I'm not sure how relevant it ultimately is?

Why could you not measure the fitness? Even if we can't evaluate each decision chain in chess, we know how many there are, so a reward that increases linearly for each tree solved should work.
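
A sketch of what such a linear reward could look like, with the caveat that the function, its parameter names, and the tiny `TOTAL_SUBTREES` value are all hypothetical stand-ins (the real count for chess is astronomically larger):

```python
def linear_reward(solved_subtrees: int, total_subtrees: int,
                  reward_per_subtree: float = 1.0) -> float:
    """Reward grows by a fixed amount for every subtree solved."""
    if total_subtrees <= 0:
        raise ValueError("total_subtrees must be positive")
    if not 0 <= solved_subtrees <= total_subtrees:
        raise ValueError("solved_subtrees must lie between 0 and total_subtrees")
    return solved_subtrees * reward_per_subtree


TOTAL_SUBTREES = 1_000  # toy value for illustration only

for solved in (0, 250, 500, 1_000):
    print(solved, linear_reward(solved, TOTAL_SUBTREES))
```

The reward is measurable at every step and has a known maximum of `total_subtrees * reward_per_subtree`, even if that maximum can never be reached in practice.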

Nummaru self_made_human 2yr ago

> using fractional ratios of reward to available room for reward instead of plain absolute magnitude of reward.

How does it follow that it's a fractional ratio? The only relevant fact is whether the maximum value has been reached. How could it even compare the absolute magnitude, if it can't store a larger number?

> However, I am under the impression that in the majority of cases, a reward maximizing agent will simply try to minimize the risk of losing its accrued reward if it's maxed out,

I agree with this, but based on my knowledge of speculative ways to survive until the end of the Universe, few involve turning it into computronium. Presumably, AI would still factor in risk.

> Why could you not measure the fitness?

I mean that, in practice, it could never be realised, for the reasons you mentioned: achievement beyond a certain value would be impossible, since you can't strongly solve chess within current physical limits.
