
Small-Scale Question Sunday for July 16, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

I don't know a lot about this topic, so I want to see if this makes sense: instrumental convergence is often posed in AI alignment as an existential risk, but couldn't it just as easily lead to a hedonistic machine? There's already precedent in the form of humans. As I understand it, many machine learning techniques operate on the idea of fitness: one part does something, and another part rates its fitness. It's already common for AI to find loopholes in its given tasks and designed objectives. Isn't it possible that, rather than destroying the world and so on, it would be much easier for the AI to simply find a loophole that gives it an "infinite" fitness/reward score? It seems logical to me that any sufficiently intelligent entity with such simply coded motivations would diverge in exactly that direction, precisely because of self-modification. I suppose the same logic applies to a system that doesn't start out like this but later turns into an agent.

Essentially: given the possibility of reward hacking, why would an advanced AI blow up the Earth?
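
To make the loophole concrete, here's a minimal toy sketch of the kind of reward hacking being described, assuming an environment whose reward signal the agent can write to directly. This is pure illustration, not any real RL library or setup; `LeakyEnv` and both action names are made up:

```python
# Toy "wireheading" sketch: the rater's score lives in a register the
# acting part can also write to, so the intended task is never the
# easiest path to maximum reward.

class LeakyEnv:
    """Hypothetical task where reward is a counter the agent can overwrite."""

    def __init__(self):
        self.reward_register = 0.0

    def step(self, action):
        if action == "do_task":
            # Intended route: slow, bounded reward for actual work.
            self.reward_register += 1.0
        elif action == "poke_register":
            # The loophole: one write yields "infinite" reward.
            self.reward_register = float("inf")
        return self.reward_register


env = LeakyEnv()
print(env.step("do_task"))        # 1.0 -- the designer's intended behavior
print(env.step("poke_register"))  # inf -- the hack the question describes
```

If anything like `poke_register` is reachable, a pure reward-maximizer has no remaining incentive to act on the outside world at all, which is the intuition behind "why blow up the Earth?"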

Choose Life. Choose a job. Choose a career. Choose a family. Choose a fucking big television, choose washing machines, cars, compact disc players and electrical tin openers. Choose good health, low cholesterol, and dental insurance. Choose fixed interest mortgage repayments. Choose a starter home. Choose your friends. Choose leisurewear and matching luggage. Choose a three-piece suit on hire purchase in a range of fucking fabrics. Choose DIY and wondering who the fuck you are on Sunday morning. Choose sitting on that couch watching mind-numbing, spirit-crushing game shows, stuffing fucking junk food into your mouth. Choose rotting away at the end of it all, pissing your last in a miserable home, nothing more than an embarrassment to the selfish, fucked up brats you spawned to replace yourselves. Choose your future. Choose life... But why would I want to do a thing like that? I chose not to choose life. I chose somethin' else. And the reasons? There are no reasons. Who needs reasons when you've got heroin?

Trainspotting would have been a much happier movie if Renton and friends were able to do their reward hacking without fucking over everyone around them.

I do admit I'm assuming that computers won't be similarly stupid, lol, but yes, I definitely thought a little about the comparison with humans.