The Beginning of the End, or the End of the Beginning: What Happens When AI Takes Over?

“If a superior alien civilization sent us a text message saying, ‘We’ll arrive in a few decades’, would we just reply, ‘OK, call us when you get here — we’ll leave the lights on’?  Probably not — but this is more or less what is happening with AI.” – Stephen Hawking, Stuart Russell, Max Tegmark, Frank Wilczek [1]

Time and again, our intuitive feeling of cosmic significance has been challenged by the progress of science.  Yet somehow, despite losing our place at the center of the universe and learning that we’re just one more product of evolution, we still feel special.  As justification, we can point to our species’ unique progress in language, culture, technology, and social organization, all unparalleled throughout the known universe.  With the development of artificial intelligence (AI), all of this is about to change.

“Man is a rope, tied between beast and overman […] What is great in man is that he is a bridge and not an end.” – Friedrich Nietzsche


Eventually, machines will be able to do everything a human can, better.

It is an open secret: super-human AI is possible.  Debates over free will and consciousness rage on, but scientists overwhelmingly agree: our brain is made of the same physical matter as everything else, and subject to the same physical laws.  The upshot: intelligence can be reproduced and improved upon with better hardware and better software.  If you haven’t been paying attention, you might have missed the AI takeover: in the last three years, machines have surpassed humans at speech and object recognition, and beaten us at the game of Go — three feats long considered landmarks for AI.  Nonetheless, the capabilities of these AIs are narrow: they perform well only in the limited settings for which they were developed.

The true turning point will arrive with the development of a super-human artificial general intelligence (AGI).  Super-human AGIs will be capable of outperforming humans at every task they set their minds to, and of improving themselves using the same (or better) research processes as the humans who developed them.  Once such a process of recursive self-improvement begins, humans will be behind in a losing race, as the intelligence gap between us and our creations grows ever wider and the machines achieve superintelligence.  Nobody knows exactly when super-human AGI will be developed, but the inexorable march of technological progress will bring us to this point, barring some unprecedented interruption.

“Well, will they be nice to us?” – Geoff Hinton [2]

The short answer is: we don’t know how to define “nice” (or, for that matter, “us”).  While the legal profession is a testament to the ambiguity that exists in human communication, humans at least understand the ambiguities in language.  Many modern AI techniques use probabilistic reasoning to learn fuzzy concepts, but their objectives are still specified via the literal instructions of computer code.  As a result, the post-modernist’s compulsive deconstruction of every concept has reemerged as an engineering problem: if we wanted to program an AI to be nice today, we would need to unpack the concept of niceness and express it in a program [3].  This is an extremely difficult problem, and it might be tempting to wait and see whether it just works itself out.  But before taking such a passive stance, it’s worth considering the consequences of discovering general AI before we figure out how to program it to be nice.

AI.nice = True

A flawed approach to making AI be nice to us.

“Is the default outcome doom?” – Nick Bostrom [4]

Nick Bostrom, who leads the Future of Humanity Institute at the University of Oxford, coined the term “existential risk” (or Xrisk) for threats to humanity’s continued existence (as well as events which could “drastically and permanently curtail its potential”).  Existential risks aren’t forecasts; they are possibilities.  But the magnitude of the consequences (extinction, or worse [5]) arguably makes any plausible Xrisk worthy of serious attention.  While the list of notable Xrisks includes asteroid impacts and supervolcanic eruptions, by far the biggest dangers are products of humanity itself, such as nanotechnology, engineered viruses, and AI, which tops the list.  A superintelligent AI could of course be deliberately programmed to kill all humans.  More disturbingly, it seems likely to wipe us out as a side effect, even if programmed to pursue a seemingly benign goal [6].

“The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” – Eliezer Yudkowsky [7]

The designers of an AI must specify every aspect of its operation, including what it is meant to accomplish (aka its terminal goal).  Given this level of control, it seems paradoxical to imagine an AI going out of control.  Once a superintelligent AI has been turned on, however, the programmer’s absolute control ends, and the AI has, in a sense, a mind of its own.  If the programmer then desires to change some aspect of her creation’s programming, she may no longer be able to.

The issue is: no matter what the AI is trying to accomplish, certain instrumental goals seem likely to be useful, such as preserving its physical self and its terminal goal, and acquiring and controlling resources.  After all, having its terminal goal changed would likely prevent the AI from accomplishing that goal, and being shut down could prevent it from accomplishing anything.  Furthermore, with access to (more) resources, an AI could make (more) copies of itself to help accomplish its goals more effectively, or self-modify to become more effective itself, by, say, improving its processing speed.  Such an AI would naturally come into conflict with humans [8], unless its terminal goal were a near-perfect expression of our values.

In the best case scenario, the AI would have a terminal goal that is “close enough” to capturing the values of its human designers, and we could acquiesce to its repurposing the environment towards this end.  For example, an AI that mostly understands human values, but not gustatory pleasure, might take over management of food production and optimize for nutrition, while being indifferent to flavor.  The outcome looks more grim if the AI doesn’t have an extremely thorough understanding of what constitutes a human being.  For instance, an AI which identifies human happiness by observing smiling human faces might prefer to replace temperamental real humans with ever-smiling mannequins.

“Reinforcement Learning really is the framework for people who are interested in trying to solve the big problems of artificial intelligence” – David Silver [9]

The good news is that AI doesn’t need to be programmed to engage in such an unqualified pursuit of some terminal goal.  The bad news is that exactly this kind of problematic goal-directed AI (in the form of reinforcement learning) is currently considered the best candidate for creating AGI by many leading researchers.  Reinforcement learning specifies the goals of an agent in terms of the observations it makes, or more literally, the values registered by its sensors.  This allows an agent to learn how to act optimally to achieve its goal simply by interacting with the world, making it much more efficient than alternatives that require more human involvement.
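To make the reinforcement learning framing concrete, here is a toy tabular Q-learning agent in a five-cell corridor.  Everything here (the environment, the constants, the reward) is a hypothetical minimal sketch of my own, not any production system.  The point to notice is that the agent’s “goal” is nothing more than the numeric reward its sensor registers, and it learns to maximize that literal signal purely by interacting with the environment:

```python
import random

random.seed(0)

N_STATES, N_ACTIONS = 5, 2   # a 5-cell corridor; actions: 0 = left, 1 = right
GOAL = 4                     # the "reward sensor" fires only in the last cell

def step(state, action):
    """Deterministic dynamics: move one cell left or right, clamped to the corridor."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0  # the literal sensor value
    return next_state, reward

# Tabular Q-learning: estimate action values purely from interaction.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.3

for _ in range(2000):
    state = random.randrange(N_STATES - 1)  # start anywhere except the goal
    for _ in range(20):
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Update toward the reward plus the discounted best future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state
        if state == GOAL:
            break

# The learned policy heads straight for whatever triggers the reward sensor.
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: always move right, toward the reward
```

Note that nothing in this loop encodes why the designer wanted the agent in cell 4.  If the reward sensor could be triggered some other way, the agent would be just as happy to exploit that, which is exactly the kind of unqualified goal pursuit discussed above.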

“Hells Bells, Mr. Lund, if we don’t the goddamned competition will!” – French (character from O Brother, Where Art Thou?)

Finding a way of preventing the harmful pursuit of instrumental goals in goal-directed AIs without seriously degrading their performance is an open problem.  Given an appreciation of the risks this kind of AI poses, we might expect responsible researchers and engineers to avoid taking such risks with systems that could be approaching a dangerous level of intelligence.  Yet the payoffs of having the best AI around are substantial, and become more so as the technology advances.  Economically, having the best AI might mean cornering the market.  Geopolitically, it could mean global hegemony.

There is likely to be a trade-off between safety and performance in the design of any system, including AI, since resources devoted to safety features could instead be used to increase performance.  Even if the performance cost of safety is relatively small, winner-take-all dynamics could drive a race to the bottom, as companies or nations progressively turn down the safety knob in order to gain an edge over the competition.  And, like any other technology, AI can be used responsibly or irresponsibly.  The only sure way for researchers and engineers to prevent dangerous AI technology from being used recklessly would be to refuse to disseminate it to untrusted actors, but even a single leak could potentially give anyone with an internet connection the power to create an AGI with a few keystrokes.

“It’s tough to make predictions, especially about the future” – Yogi Berra (attributed)

I’ve tried to give a clear and concise explanation of why AI poses an Xrisk.  To summarize: competition will drive people and organizations to create superintelligent AIs single-mindedly programmed to achieve their goals by any means necessary.  Programming goals that imperfectly capture human values could have disastrous consequences, incentivizing AIs to channel all resources, including those necessary for humans’ well-being and survival, towards serving their goals.

These claims are hugely controversial, but personally, I find them compelling.  As more people confront and respond to these arguments, our collective understanding of AI and Xrisks will progress rapidly.  How we address the development of AI, and how our response affects its outcome, is impossible to predict but necessary to consider seriously.

“Some AI experts have asserted that the ability to assure safety and control is more important to the future of AI even than improvements in the AI algorithms themselves.” – The White House Office of Science and Technology Policy [10]

The past two years have seen an explosion of interest in AI risk, and more generally, the social impacts of AI techniques.  At this point, these issues have seized the attention of journalists, researchers, intellectuals, politicians, and the public at large.  Next time, we’ll talk more about what is being done and could be done to prevent AI from killing us all, and why AI safety isn’t as simple as just pulling the plug.

Robot plugging itself into an outlet

One straightforward counter-argument to the “pull the plug” solution.

David Krueger is a PhD student at the University of Montreal, studying artificial intelligence and deep learning.  He is the founder of the Montreal AI Ethics Group, which brings together researchers from the University of Montreal and McGill.  He is currently an intern at the Future of Humanity Institute in Oxford, where he is researching how to train AIs with limited human feedback.  Learn more about David from his website.


[1] Stuart Russell is coauthor of Artificial Intelligence: A Modern Approach, the most popular textbook on artificial intelligence.  Hawking, Tegmark, and Wilczek are renowned physicists.  Tegmark cofounded the Future of Life Institute (FLI); Hawking and Wilczek are scientific advisors for FLI.  Last January, FLI authored an open letter calling for research aimed at making AI systems robust and beneficial, which was signed by many of the most prominent AI researchers.

[2] Geoff Hinton is the “godfather” of Deep Learning, the technique driving recent AI breakthroughs, which is based on artificial neural networks inspired by the brain.

[3] Techniques which might someday allow an AI to learn concepts such as “niceness” exist, but are not sufficiently developed at present.

[4] Nick Bostrom, a philosopher, is the founder and director of the Future of Humanity Institute (FHI) at Oxford University, and author of New York Times bestseller Superintelligence: Paths, Dangers, Strategies, from which the quote is taken.

[5] E.g., any manner of stable dystopian world order.

[6] The classic example being “make paperclips.”

[7]  Eliezer Yudkowsky is the founder of the Machine Intelligence Research Institute (MIRI), renowned for its pioneering work on AI Xrisk.

[8] E.g. over the resources needed for our survival, such as the matter we are composed of.

[9] David Silver was one of two lead authors for Google DeepMind’s super-human Go engine, AlphaGo.  DeepMind is one of the largest and most successful AI research groups in the world, with over 150 full-time research staff.  Their research focuses on combining Deep Learning with Reinforcement Learning.

[10] The White House recently announced a series of workshops on the risks and benefits of AI.