Mickey Beurskens

The Risk of Trusting AI

“It is ‘the future’. Our AI systems have finally achieved it: Superhuman Intelligence. By some miracle we have ended up at this junction, with a system fully under our control that can match our brightest, but do so much more quickly and with near-infinite copies of itself. If we harness it well it will certainly propel humanity straight towards the next phase, transcending limitations that up until now we could only conceptualize as insurmountable. But how can we know with certainty that our hyper-talented and knowledgeable assistant will not lead us astray? How can we verify that its ‘intentions’ are pure? We know for a fact it is smarter than any of us, because we built it that way. Have we created the world’s most competent plotter? The most convincing liar?…”

“How did we get into this mess?”

The Surgeons And The Miscreants

At first it might not seem like an obvious question. Certainly, everyday life holds near-infinite situations in which someone is doing something critical that I do not understand, and that I need to trust them to do well, and with good intentions, for our society to function. I’d like to know with relative certainty, for example, that my surgeon is a qualified medical professional, and if I visit a hospital I can be relatively sure that he is. Sure, there are moments when things go wrong and trust is breached. But at least we somehow manage to keep this strange system of trust going successfully through many a disruptive war or famine or other difficult disaster of dastardly distinction.

It seems that, for all of its faults, humanity has some sort of functional alignment going on between its individual members. Functional in the sense that, without it, it is highly unlikely that I would be able to type these words on a “magic”, hand-held typing device that I could also use to control the temperature in my parents’ house on the other side of the world (had I made suitable arrangements in advance), and even to do my taxes. Yet here I am, rambling digitally and waxing poetic anyway.

And so we go on our merry way, trusting each other to keep things going for a little longer and not thinking much of the insane amount of luck, or whatever it is that I do not fully understand, that is needed for our situation to turn out that way.

Now, this might sell us short a bit. In practical life we certainly do have systems to detect misalignment. We have all sorts of committees and random checks and audits and meetings and an array of other tools to make sure that people do not step out of line too much. We cannot check everyone all the time, but we do not have to.

Say the ice cream machine company “CreamCone” is relying too heavily on Randy the sales guy, who generates more than half of the company’s income, and who retires in two weeks. It might be painful for the employees and the investors, but society has many more ice cream machine companies. Most of the time, when our system of alignment fails, it fails locally. People are hurt, damage is done, miscreants might be punished, but from a societal standpoint it is manageable. Random checks and committees and the other checky things are enough to prevent total collapse in a system where exposure to risk is sufficiently distributed, which is to say that even when things go wrong, the damage is limited. (cough Although, more recently, society has been experimenting with “single point of failure” models in the name of “safety” and “efficiency” cough).

Now, enter (super)human-level artificial intelligence. First of all, we cannot be sure that it is safe to trust. In fact, we can only ever be sure that it is not safe, never that it is safe! I say this not from the perspective of a paranoid engineer, although I might be one, but from logical fact. The fact that an AI has helped us up to this point, and has never shown hostility or caused horrific mistakes, can never be proof that it will always be safe in the future. Only the negative can be proved: if it does harm us, then we can be certain that it is not safe. It is exactly the same mistake as claiming that “black swans do not exist” because “no one ever saw a black swan”. Absence of evidence is not evidence of absence. Such is the asymmetry of proof.

So we cannot guarantee that an intelligent system will have good intentions in the future, no matter how well aligned we think it is based on what it has done in the past. Perhaps science will discover a proof of safety somewhere down the line, but I like to work under the (safer) assumption that such a thing will only arrive after we have already invented AI systems and started relying on them.

Add to the problem of trust the problem of scale. As a collective our impact on the world is huge, but even the most powerful individual is bound at least by 1) their own mortal body, which they cannot copy or scale, 2) imperfect alignment with other members of the human race (“what is good for me might not be good for you”), which makes perfect cooperation difficult even if communication is perfect, and 3) imperfect communication. We need to cooperate to thrive, which aligns our actions. But if a single person makes a mistake, then failure mostly happens “locally” by virtue of the constraints stated above. Individuals can make mistakes and others can live to tell the tale.

Our AI friend might not suffer quite so severely from being split into separate individuals, which is actually one of the reasons it is successful without being “smarter than people”. AI is currently becoming useful because it can do many tasks, like copywriting or image generation, much more quickly than humans can, even if the quality is not superhuman yet. So we are building 1) a massively scalable system that we can never check by hand fast enough to keep up, which is 2) positioned to be used by almost everyone by virtue of being the “best” version of whatever is on the market, and which we are 3) unable to trust on logical grounds. Adopting such a system as the backbone of our economic life and replacing workers left and right without taking the risk seriously is, it seems to me, insanity: a huge, flaming mistake that we might only get to make once.

I have been using a lot of italic text today, but then again, this is quite serious.

For now, one might argue that the fates of humans and AI are connected: if humans perish, then so does AI. After all, AI cannot directly influence the real world. I am assuming that this does not need to stay that way. It would only need a body to be self-sufficient, and we’re well on our way with robotics in that respect.

So we should probably reconsider our attitude towards Artificial Intelligence, especially when it starts producing outputs that are hard or annoying to verify, such as that bit of code your boss asked you to write that determines the flight schedule at the local airport. Likewise, I urge caution with those outputs that are damn near impossible to verify and would take an army of PhDs just to understand. It might be tempting to simply trust the AI, copy-paste, and be done with it for the sake of your favorite goal (progress, monetary gain, relief of boredom). Hell, I am tempted to just copy and paste and be done with it all the time. But we need to figure out this trust situation. Until we do, I think it best to avoid exposure to the risk of catastrophic failure when using Artificial Intelligence.

So, in the absence of any wisdom on how to solve this dangerous bit of game theory, I would offer this caution:

“Assume Murphy’s Law: for whatever can go wrong, will; so it is best to make sure the damage is contained when it does, and we can continue onward still.”