
Ponder

0 followers   follows 0 users   joined 2023 June 07 00:27:42 UTC

No bio...

User ID: 2459


I remember there being a ton of anti-PUA articles in that time period. Here is one about it being problematic to approach women at the mall if you are using 'PUA' tactics:

https://www.vice.com/en/article/a-swarm-of-pick-up-artists-tried-to-invade-torontos-eaton-centre/

In case you missed it, a buzzing, cologne-drenched swarm of pickup artists attempted to take over the Eaton Centre mall in Toronto, Canada. Luckily they were stopped in their tracks by a flurry of Twitter activity that eventually forced the Eaton Centre itself to get involved.

Those are excellent points, thanks for the feedback.

It seems like there should be some way to increase AI safety by increasing the number of agents that need to reach consensus before the employeeAI is allowed to take an action. Each agent shares the goal of human alignment but can reach different decisions based on its subgoals. The employeeAI additionally wants to please the end user, but the bossAI and regulatoryAI don’t have that subgoal. In the analogy, sure, a clever employee could convince his boss that it is OK to dump hazardous waste down the drain, and when the EPA finds out, the company could bribe the investigator or find a legal loophole. However, that path requires multiple independent failures instead of letting a single employee decide, with no oversight, whether it is OK to dump hazardous waste down the drain. The boss + regulatory structure also gives the employee an incentive toward alignment, because it is less work to dispose of the hazardous waste properly than to figure out how to game every level of protection.

You’ve given me another thought about the regulatoryAI. It shouldn’t be programmed by the AI company. The regulatoryAI should be a collection of government AIs (run by countries or states, for example) that look at the employeeAI’s output and decide whether it should be released to the end user. The regulatoryAIs must all agree, or else the employeeAI isn’t allowed to complete the proposed action. This opens the door for governments to abuse their power and censor safe things they dislike, but there could be further processes to manually deter that abuse.
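To make the consensus idea concrete, here is a minimal sketch of a unanimous-approval gate. The ProposedAction type and the regulator functions are hypothetical names invented for illustration, not any real AI framework’s API; in practice each regulator would be an independent model run by a different government.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ProposedAction:
    description: str
    payload: dict

# Each regulatoryAI is modeled as a function that reviews a proposed
# action and returns True (approve) or False (veto).
Regulator = Callable[[ProposedAction], bool]

def consensus_gate(action: ProposedAction, regulators: List[Regulator]) -> bool:
    """Allow the employeeAI to act only if every regulatoryAI approves."""
    return all(regulator(action) for regulator in regulators)

def us_regulator(action: ProposedAction) -> bool:
    # Hypothetical policy: veto anything flagged as hazardous waste.
    return "hazardous_waste" not in action.payload

def eu_regulator(action: ProposedAction) -> bool:
    # Hypothetical policy: veto anything above a risk threshold.
    return action.payload.get("risk_score", 1.0) < 0.2

action = ProposedAction("reply to end user", {"risk_score": 0.05})
if consensus_gate(action, [us_regulator, eu_regulator]):
    print("employeeAI may complete the action")
else:
    print("action blocked: no unanimous approval")
```

The relevant design choice is the all(): a single veto is enough to block the action, which is the multiple-failure-point property described above.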

I was thinking about AI alignment recently.

In a corporation you have employees who are instructed to do tasks in a certain way and are subject to work rules that carry punishment if violated. The corporation is also subject to outside oversight to ensure that it is following the law. For example, an employee might be responsible for properly disposing of hazardous waste. They can’t just dump it down the drain. They have a boss who makes sure they are following the company’s waste-disposal policy, and there is chain-of-custody paperwork that the company retains. If the waste were contaminating local water sources, people could notify the EPA to investigate the company (including the boss and the employee).

Could you set up multiple AI agents in a similar way to make sure the main agent acts in alignment with human interests? To extend the analogy (a rough sketch of the structure follows the list):

  • The employeeAI is the less intelligent AI model that interacts directly with the user.
  • The bossAI#1 is a more intelligent AI that only verifies that the employeeAI isn’t violating any corporate policies. It will notify the AI company if it notices any policy violations, or if the employeeAI tries to influence the bossAI to violate the policies. The bossAI#1 can only be reprogrammed by the AI company. The bossAI#1 can shut down the employeeAI if it violates any policies.
  • A bossAI#2 monitors whether bossAI#1 is doing what it is supposed to do. You could add more levels of bossAIs for more security.
  • The RegulatoryAI is another AI more intelligent than the employeeAI. It monitors real-world data for harms the employeeAI might be causing (like how the EPA would make sure chemicals aren’t being dumped into water sources). The RegulatoryAI will notify the AI company if it notices any policy violations, or if the employeeAI tries to influence the RegulatoryAI to violate the policies. The RegulatoryAI can only be reprogrammed by the AI company. The RegulatoryAI can shut down the employeeAI if it violates any policies.
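Here is a rough sketch of how that layered structure could be wired together. The class and method names (EmployeeAI, BossAI, RegulatoryAI) are hypothetical and only for illustration; additional bossAI levels would wrap the same check-and-shut-down pattern around the layer below them.

```python
from dataclasses import dataclass

@dataclass
class EmployeeAI:
    """The less capable model that talks directly to the end user."""
    shut_down: bool = False

    def propose(self, task: str) -> dict:
        # Hypothetical: return a proposed action plus a log the overseers can read.
        return {"task": task, "log": f"plan for: {task}"}

@dataclass
class BossAI:
    """More capable model that only checks corporate policy compliance."""
    banned_phrases: tuple = ("dump it down the drain",)

    def violates_policy(self, proposal: dict) -> bool:
        return any(phrase in proposal["log"] for phrase in self.banned_phrases)

@dataclass
class RegulatoryAI:
    """Watches real-world data for harm (the EPA role in the analogy)."""
    contamination_limit: float = 0.1

    def observes_harm(self, world_data: dict) -> bool:
        return world_data.get("water_contamination", 0.0) > self.contamination_limit

def oversight_step(employee: EmployeeAI, boss: BossAI, regulator: RegulatoryAI,
                   task: str, world_data: dict) -> None:
    proposal = employee.propose(task)
    # Either overseer can shut the employeeAI down; neither shares the
    # employeeAI's extra subgoal of pleasing the end user.
    if boss.violates_policy(proposal) or regulator.observes_harm(world_data):
        employee.shut_down = True
        print("employeeAI shut down; AI company notified")
    else:
        print("proposal approved:", proposal["task"])

oversight_step(EmployeeAI(), BossAI(), RegulatoryAI(),
               task="dispose of solvent waste",
               world_data={"water_contamination": 0.0})
```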

What flaws are there in my ideas around AI alignment, other than increased cost?