Alignment problem

A modest idea for those who want to try their hand at the AI alignment problem but are deterred by the lack of an actual AI to try it on.

Let's consider a simpler (I think?) stepping stone - a multi-billionaire alignment problem. Especially in the aftermath of recent events where different billionaires caused different kinds of turmoil in different areas with different results, I think it makes sense to ask ourselves, as a society, whether we can - or should - have some kind of billionaire alignment program, and how we should approach it, before we try the same with more alien entities such as AIs.

The input is:

  1. We have a bunch of intelligent - but not super-intelligent yet, so the task is easier - entities. For this task we presume human-level intelligence, probably on the higher end of the spectrum but nothing overwhelming.

  2. These entities control resources comparable to the power of a middle-of-the-road nation-state, and deploy them with little effective oversight from anyone.

  3. They deploy those resources to achieve their goals, which may run contrary to the goals of other people, and could cause - even when very well intentioned - enormous harm. A misguided economic intervention can lead to the economic collapse of a country, a misguided social policy can make a major city as unlivable as a bombing campaign (maybe more so, as the effects are more permanent), a misguided medical policy can rob generations of years of life, new modes of communication can destroy social bonds and cause widespread cultural disruption, etc. Of course, they are also capable of selfishness and outright evil, though we do not presume they are more inclined to it than the average human being (or less, either).

  4. For the sake of this task, we do not consider it moral or practical to destroy these entities or their resources, but we want to minimize the potential harm caused by them, including unintentional harm, and potentially maximize their benefit to humanity (a workable definition of "benefit to humanity" should be included in the solution, but if you will eventually attempt to align an AI, you must have some idea what you are aligning it to, right?).

  5. We assume, for the sake of the exercise, that there's no magic lever we could pull (like: "you do this or we destroy you/take your resources/torture you/kill your dog") to instantly put these entities under somebody else's complete control, or that the people in control of such a lever would likely be under the control of at least one of the entities above, and possibly several.

  6. In the interest of saving time, we declare all variants of "we just need to have the right people in control of it and everything will be ok" to be a non-solution, since a) it just changes the personal or collective entity that needs to be aligned and b) it doesn't provide any practical, actionable suggestions.

Any ideas how we could approach solving this task?

We currently solve this problem by having the entirety of billionaire charity amount to like 1/400th of the US budget. At this scale the unaligned entities are basically capable of picking the low-hanging fruit of what they consider to be good and that was neglected by the US government, not of going against the US government in any shape or form.

There are some exceptions, like https://www.latimes.com/local/california/la-me-prosecutor-campaign-20180523-story.html; for those, light is the best disinfectant.

There are other entities in that space, which are restricted from causing havoc by the 500 years of laws pertaining to corporations. This usually works well: Bernie Madoff was an exception, unlike 99% of lawless-except-by-code crypto entities.

Well, compared to the whole US government, sure, they are tiny. That's assuming the US government is still capable of consistent, coordinated action and is not wholly captured by a network of entities that will not allow it to deploy its resources towards any goal that contradicts theirs. Sure, the US government has trillions. But how efficiently can it deploy those trillions towards a single goal? That's a question to which the answer is not clear to me at all.

But the US is not the only country. So "we solve" as in "US citizens solve". What about citizens of Brazil? Or Pakistan? Or Nigeria? Or any smaller country? They'd like to have some solution for this too...

which are restricted from causing havoc by the 500 years of laws pertaining to corporations

But are they really? They have a huge part in writing most of these laws, and there's also such a thing as "too big to fail"... And most of these laws were written after the harm was already caused, so they can only prevent a repeat of the past, not the harms of the future. I mean sure, they are somewhat restricted - if Musk or Soros went on a rampage killing people in broad daylight, we probably know how to handle this. But that's not what we're talking about when we talk about alignment, right? We're worried the harm could be caused in ways we cannot perceive or prevent before it's too late, and that's already true.