I’ve finally gotten around to reading Nick Bostrom’s *Superintelligence*. In one section, he talks about so-called “malignant failures” on the path to developing smarter-than-human AI:
One feature of a malignant failure is that it eliminates the opportunity to try again. The number of malignant failures that will occur is therefore either zero or one.
This, to my mind, is one of the scariest things about the AI situation. It’s unlike almost any other human endeavor, in that we have to pull it off without even a single malignant failure, or else it’s game over. And not just “no more humans.” Not even just “no more life on Earth.” More like “all matter in the Hubble volume is paperclips from now on, until the Universe dies.”
There are lots of ways of failing that aren’t malignant. Your project runs out of money, organizational politics destroy your project, you get stuck in a cul-de-sac, your method doesn’t work or hits some scaling limit you hadn’t noticed, etc.
But you can’t have even one meltdown.
Humans are not good at doing this even when it’s very obvious that it’s important. Nuclear power strikes me as a case where it’s been pretty obvious all along that we need to be really careful or very bad things will happen, and where we’ve made our best effort at designing for safety from the beginning. And yet:
As of 2014, there have been more than 100 serious nuclear accidents and incidents from the use of nuclear power. Fifty-seven accidents have occurred since the Chernobyl disaster, and about 60% of all nuclear-related accidents have occurred in the USA. Serious nuclear power plant accidents include the Fukushima Daiichi nuclear disaster (2011), Chernobyl disaster (1986), Three Mile Island accident (1979), and the SL-1 accident (1961).
And in the case of AI, it’s not even obvious to much of the industry that there’s any legitimate cause for concern. It’s as though we were assembling bigger and bigger masses of subcritical uranium for heat, in a world where criticality is a poorly-understood fringe theory.
Back in January, Eliezer Yudkowsky said:
People occasionally ask me about signs that the remaining timeline might be short. It’s *very* easy for nonprofessionals to take too much alarm too easily. Deep Blue beating Kasparov at chess was *not* such a sign. Robotic cars are *not* such a sign.
… this represents a break of at least one decade faster than trend in computer Go.
This matches something I’ve previously named in private conversation as a warning sign – sharply above-trend performance at Go from a neural algorithm. What this indicates is not that deep learning in particular is going to be the Game Over algorithm. Rather, the background variables are looking more like “Human neural intelligence is not that complicated and current algorithms are touching on keystone, foundational aspects of it.” What’s alarming is not this particular breakthrough, but what it implies about the general background settings of the computational universe.
I hope that everyone in 2005 who tried to eyeball the AI alignment problem, and concluded with their own eyeballs that we had until 2050 to start really worrying about it, enjoyed their use of whatever resources they decided not to devote to the problem at that time.
Another facet of the issue that strikes me is how deeply AI is tied into the incentive structures of our world. You can have arms races, but in general you don’t get incrementally more money for incrementally more nukes. Increased human activity leads to increased CO2 emissions at the moment, but you just have to make clean energy cheaper than the alternatives and that stops happening. Whereas every incremental step towards better AI directly leads to greater ability to make money and accomplish things in the world.
Our options seem to be “solve a practically impossible engineering problem on an unreasonably tight time scale” or “change the basic nature of the human mind, at scale, fast, such that it is able to coordinate at a level totally unprecedented in history.”