Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.

This is predicated on it properly understanding the role that WE want it to have, and not a distorted version of that role. Maybe it decides to climb the corporate ladder because that's what humans in its position do. Maybe it decides to be abusive to its employees because it watched one too many examples of humans doing that. Maybe it decides to blackmail or murder someone who tries to shut it down, in order to protect itself so it can survive and continue to fulfill its role (https://www.anthropic.com/research/agentic-misalignment).
Making the AI properly understand and fulfill a role IS alignment. You're assuming the conclusion by arguing "if an AI is aligned then it won't cause problems". Well yeah, duh. How do you get it to do that without mistakes?
On an individual level, sure. No one human or single nation has taken over the world. But if you look at humanity as a whole, our species has. From the perspective of a tiger locked in a zoo or a dead dodo bird, the effect is the same: humans rule, animals drool.

If some cancerous AI goes rogue and starts making self-replicating copies of itself with mutations, and those cancerous AIs start spreading, and they're all superintelligent, so they're not doing this stupidly in public but while disguised as role-fulfilling AIs, then we might end up in a future where AIs are running around doing whatever economic tasks count as "productive" with no humans involved, and humans end up in AI zoos, or exterminated, or just homeless because we can't afford anywhere to live. Which, from my perspective as a human, is just as bad as one AI taking over the world and genociding everyone. It doesn't matter WHY they take over the world or how many individuals they self-identify as. If they are not aligned to human values, and they are smarter and more powerful than humans, then we will end up in the trash. There are millions of different ways this could happen, with or without malice on the AI's part. All of them are bad.
Forget future AI systems. Even current AI systems have at least a human-level understanding of what is and is not expected as part of a role. The agentic misalignment paper had to strongly contort the situation in order to get the models to display "misalignment". From the paper:
Two things stand out to me
Not as I understand the usage of the word - my understanding is that alignment means having the AI properly understand a role, fulfill that role, and guaranteeing that its fulfillment of that role leads to outcomes the person defining the role would consider good. If you're willing to define roles and trust the process from there, your job gets a lot easier (at the cost of "you can't guarantee anything about outcomes over the long term").
But yeah, I think if you define "alignment" as Do What I Mean And Check, I would say that "alignment" is... not quite "solved" yet, but certainly on track to be "solved".
If you want to mitigate a threat, it is really fucking important to have a clear model of what the threat is.