AlphaGo vs Lee Sedol – The new AI Challenge | Marc's Machine Learning Blog


Posted: 2016-03-12 in research
Tags: Deep Learning, Reinforcement Learning

On March 4, I was contacted by the Xinhua News Agency to comment on the upcoming Go match between Google DeepMind’s AlphaGo algorithm and the top Go player Lee Sedol. I am posting the questions and answers below:

Question: DeepMind says that AlphaGo combines an advanced tree search with deep neural networks, which allows the program to predict what the next move will be. So what is the advantage of AlphaGo AI design over traditional AI methods, which construct a search tree over all possible positions?

Answer: A central challenge of Go is that building and searching through the complete search tree is impossible: We are talking about roughly $250^{150}$ different possible games. Picture a tree that grows from the ground to the sky. The ground (root) is the state of the game at the beginning. Every move in the game splits the current branch into about 250 new ones. Each new branch splits 250 times again, and so on. If we repeat this 150 times (the number of moves a typical game lasts), we end up with about $10^{360}$ branches at the top, an unimaginably large number. This is the number of ways a game of Go can potentially unfold, so an exhaustive brute-force search for the best strategy is impossible. Therefore, approximations need to be made.
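The arithmetic behind this estimate can be checked in a few lines of Python. This is only a sanity check of the figures quoted above (about 250 moves per position, about 150 moves per game); the constant names are mine and purely illustrative.

```python
import math

BRANCHING_FACTOR = 250  # approximate number of legal moves per position (figure from the text)
GAME_LENGTH = 150       # approximate number of moves in a typical game (figure from the text)

# Python integers have arbitrary precision, so the count can be computed exactly.
num_games = BRANCHING_FACTOR ** GAME_LENGTH

# Express the same number as a power of ten: log10(250^150) = 150 * log10(250).
log10_games = GAME_LENGTH * math.log10(BRANCHING_FACTOR)

print(f"250^150 is about 10^{log10_games:.1f}")
print(f"number of decimal digits: {len(str(num_games))}")
```

The exponent comes out at about 359.7, i.e., a 360-digit number, which is where the rounded figure of $10^{360}$ comes from.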
AlphaGo combines Deep Learning and Monte Carlo Tree Search (MCTS) to play Go at a professional level. MCTS is a heuristic search strategy that analyzes the most promising moves in a game by expanding the search tree based on random sampling of the search space. MCTS was used by the previous state-of-the-art Go-playing systems, which played well on smaller boards. On the full 19×19 board, however, MCTS on its own falls short of professional-level play because the search tree is simply too large.
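To give a feeling for how MCTS works, here is a minimal sketch of one common variant (UCT, i.e., MCTS with the UCB1 selection rule) on a toy game instead of Go. The toy game is my own choice for illustration: a single pile of stones, each player removes 1 to 3 stones per turn, and whoever takes the last stone wins. All names (`Node`, `rollout`, `mcts`) and parameters (exploration constant, iteration count) are assumptions of this sketch, not AlphaGo's implementation.

```python
import math
import random

MAX_TAKE = 3  # toy rules (assumption): remove 1 to 3 stones per turn


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones            # stones left for the player to move
        self.parent = parent
        self.move = move                # move that led from the parent to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who made `move`

    def best_child(self, c=1.4):
        # UCB1: trade off the observed win rate against an exploration bonus.
        def ucb(ch):
            return ch.wins / ch.visits + c * math.sqrt(math.log(self.visits) / ch.visits)
        return max(self.children, key=ucb)


def rollout(stones, rng):
    """Play random moves to the end; True if the player to move at `stones` wins."""
    player = 0  # 0 = player to move at the start of the rollout
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player == 0
        player = 1 - player
    return False  # empty pile: the previous player took the last stone


def mcts(root_stones, iterations=3000, seed=0):
    """Return the most visited move for a pile of `root_stones` >= 1 stones."""
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1) Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = node.best_child()
        # 2) Expansion: add one new child for a move not tried yet.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3) Simulation: random playout, scored for the player who moved into `node`.
        result = 0.0 if rollout(node.stones, rng) else 1.0
        # 4) Backpropagation: flip the perspective at every level on the way up.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print("suggested move with 10 stones:", mcts(10))
```

In this toy game the known winning strategy is to leave the opponent a multiple of four stones, so with enough iterations the search should settle on such a move. In Go the same four steps apply, but the tree is far too large for random playouts alone to evaluate positions well.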
AlphaGo’s key insight is a clever strategy to reduce the complexity of the search tree using Deep Learning techniques: 1) The depth of the search tree is reduced by simulating possible games up to some point (say, ten moves ahead) and estimating the quality of the game play from here using a “value network”. 2) The width of the search tree is reduced by exploring only a small number of promising branches. These branches are selected by the “policy network”.
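A rough calculation shows how much these two reductions shrink the tree. The branching factor (250), game length (150), and look-ahead (ten moves) are the figures used above; the number of candidate moves kept per position (5) is a hypothetical value I picked for illustration and is not AlphaGo's actual setting.

```python
import math


def leaf_count(width, depth):
    """Leaves of a tree in which every node has `width` children, `depth` levels deep."""
    return width ** depth


FULL_WIDTH, FULL_DEPTH = 250, 150  # figures quoted in the text
LOOKAHEAD = 10                     # "say, ten moves ahead" (from the text)
CANDIDATES = 5                     # ASSUMPTION: hypothetical number of moves kept per position

full = leaf_count(FULL_WIDTH, FULL_DEPTH)              # exhaustive search
depth_only = leaf_count(FULL_WIDTH, LOOKAHEAD)         # truncated depth, full width
depth_and_width = leaf_count(CANDIDATES, LOOKAHEAD)    # truncated depth, pruned width

print(f"full tree:              about 10^{math.log10(full):.0f} leaves")
print(f"depth reduced:          about 10^{math.log10(depth_only):.0f} leaves")
print(f"depth and width reduced: {depth_and_width:,} leaves")
```

With these assumed numbers, the truncated and pruned tree has $5^{10} = 9{,}765{,}625$ leaves, which a computer can handle easily. The price is that the result is only as good as the two networks: one has to judge positions without playing them out, the other has to pick the right candidate moves.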
Therefore, AlphaGo provides an interesting way of reducing the complexity of huge search trees in decision making problems, which is not specific to the game of Go, but can be more widely applied.

Question: What area can the machine learning techniques behind the computer program AlphaGo be applied to? Facial-recognition processing and predictive search?

Answer: There are two key ingredients in AlphaGo: Deep Learning and Monte Carlo Tree Search. Both of them individually have many successful applications: Deep Learning is already being applied to various areas, such as image recognition, text translation, audio/text processing, face recognition, reinforcement learning, robotics, etc. Monte Carlo Tree Search is often used in game-playing AI: board games (e.g., Go, chess, Hex, Settlers of Catan), real-time video games (e.g., Pac-Man, Fable Legends), and card games such as skat or poker.
AlphaGo now combines MCTS and Deep Learning, which will allow us to explore a new class of problems: incredibly hard decision-making problems with huge databases of decision outcomes. Immediate applications are currently unclear, but it may be possible for companies with huge databases, such as Google, to phrase some of their core businesses as a problem that AlphaGo can solve (e.g., ranking of websites).

Question: If AlphaGo beats the top Go player in the upcoming match, does that mean computers are way “smarter” than humans, and we can totally rely on computers to tackle complex real world problems?

Answer: In some sense, we already have incredibly smart computer systems that we use every day: Apple’s Siri, Google Now, Facebook’s friend suggestions, Amazon’s purchase recommendations, Microsoft’s ranking in video games — all of them rely on AI technologies at their very core.
If AlphaGo beats the top Go player, it means that there is a computer program that plays Go at an amazing level. We can compare AlphaGo to Deep Blue, the computer program that beat Garry Kasparov in chess in 1997: It is a computer program that can solve one particular task very well, way better than humans. If AlphaGo does not beat the world’s best Go player in March 2016, it is just a matter of time until this happens.
We should be careful when interpreting the results of these human-computer competitions. For AI researchers, these competitions are interesting because they are milestones toward human-like AI. Chess used to be the game that an AI system needed to play well in order to get to human-like intelligence. When this was solved in 1997 by IBM’s Deep Blue, we raised the bar. The “Jeopardy!” challenge was the next milestone problem until IBM’s Watson won the Jeopardy! competition against the best human players. The game of Go was the most recent milestone: Go is so much more complicated than chess that it was not expected to be solved before 2025. None of these milestones have (yet) led to a truly intelligent system that we would consider similar to human intelligence and behavior.
AlphaGo and other AI systems that are based on a technology called “Reinforcement Learning”, such as IBM’s Watson, exhibit some properties of human learning: experience-based learning through trial and error. This means that the underlying algorithms can learn from failures and refine their strategies in order to become better. Nevertheless, computer programs are not at the level of general intelligence that humans exhibit. Some features of human learning are currently difficult to achieve by AI systems, e.g., the general ability to transfer knowledge from one problem to a new one, the ability to learn from limited experience (AlphaGo needed to play tens of millions of Go games to initialize learning, and many millions more games to achieve this level by playing against itself), the ability to reason at abstract levels, or the ability to cooperate with other humans.

Question: IBM’s Watson platform also uses machine learning techniques. What is the status quo of machine learning development in the world of AI?

Answer: Most substantial advances in AI in the last 10 years have been made in an area known as “Deep Learning”. Deep Learning combines the modeling flexibility of artificial neural networks, which have been around for decades, with the increased availability of huge data sets and computational power. Deep Learning exploits the combination of these three things to automatically extract compact representations of huge amounts of very complicated data, e.g., images, speech, audio or text data. On tasks such as image recognition and speech translation, these Deep Learning algorithms now outperform carefully engineered solutions, which is what makes Deep Learning so promising.
There are at least two other areas in AI that experienced a recent boost: Reinforcement Learning for autonomous learning by trial and error (largely triggered by DeepMind’s paper in Nature in 2015) and Bayesian optimization for efficiently optimizing unknown but expensive-to-evaluate utility functions (this turns out to be useful for training Deep Neural Networks).

Question: With the continuing development of AI, how long will it be before intelligent personal assistants like Google Now and Siri can work smartly and interactively like Jarvis in the movie Iron Man?

Answer: We already use (electronic) personal assistants today (e.g., Apple’s Siri or Google Now): They communicate with us, keep track of us, remind us of our next trip, and suggest accommodation and restaurants based on our preferences.
The development of AI has accelerated in the last 5 years, mostly in Deep Learning. This development was not at all predictable 10 years ago, when only a handful of research groups were working on the relevant algorithms. If the success (and funding) of AI keeps accelerating at this rate, we may see something like “Jarvis” in as little as 10 to 20 years, but the research focus may move away from Deep Learning to the next research area, as has happened many times before.

Having followed the first three Go matches live (yes, it is worthwhile getting up at 4:00 a.m.), I have to say that I am deeply impressed by AlphaGo’s performance. Although I was expecting that AlphaGo would not lose by huge margins in these matches, the performance so far is simply fantastic. It is quite interesting to see how AlphaGo surprises even the experts with some moves that in the end seem to be part of a long-term strategy. It is also interesting to see how the attitude toward AlphaGo changed in the first three games: In the first match, the commentators were talking about some “strange” moves, whereas in the third match they acknowledged that AlphaGo is an expert Go program that is capable of coming up with successful moves a human player may not have made. I am not sure whether this is always the case, but the attitude has changed in AlphaGo’s favor.
Silently, I am hoping to see a small glitch (some move/strategy that does not work out) that exposes some of AlphaGo’s limitations, such that one can continue improving the underlying technologies.

DeepMind’s AlphaGo has just set another milestone toward AI (after IBM did so in 1997 and 2011 with Deep Blue and Watson). A great effort and success. Congratulations to DeepMind!
