Comment on OpenAI’s Efforts in Robot Learning

Posted: 2016-07-28

On July 20, I was contacted by MIT Technology Review to comment on OpenAI’s efforts to train robots to do tasks in the home using reinforcement learning. I am posting the questions and answers below; the corresponding article is here.

Question: Reinforcement learning seems to be taking off in some industrial settings. How important do you think it will be to the future of robotics?

Answer: I believe that reinforcement learning will play an important role in the future of robotics. For many decades, robotics research was based solely on fundamental principles of physics and mechanics. There has been noticeable success with this approach, including Honda’s walking robot “Asimo”, Dyson’s 360 Eye vacuum cleaning robot and ETH’s impressive quadcopter maneuvers. However, when we use engineering knowledge, we also need to make some simplifying assumptions, e.g., point contacts of feet (heavily exploited by many walking robots), constant wind, or simplified models of friction. These assumptions impose various limitations on the robotic systems and their behavior. Furthermore, despite many efforts, the behavior of the robot can never be modeled perfectly, and sometimes the assumptions made yield relatively poor models. This is especially the case when we are interested in more “skilled” or “dynamic” behavior of robots, where we need more elaborate and adaptive models.

In this situation, machine learning can be valuable: It can be used to learn models directly from data without the need for specific engineering assumptions. This allows for a high degree of automatic adaptation, which is important in “open-ended robotics”. In practice, we would probably use engineering knowledge as much as possible for an “idealized” model and then use machine learning to correct this idealized model based on observed data.

Reinforcement learning is an important area of machine learning and is concerned with experience-based learning of good control strategies through trial and error. This means that the underlying algorithms can learn from failures and refine their control strategies in order to become better. Reinforcement learning has recently had great success stories, most notably Google DeepMind’s AlphaGo system, which beat a world champion in Go.

Reinforcement learning has also been successfully applied to robotic systems over the last two decades, e.g., for learning acrobatic helicopter maneuvers, table tennis playing, folding clothes, autonomous driving or walking robots. Nevertheless, most of these “robot learners” do not learn “from scratch” (purely by trial and error); they rely on a good initialization from human demonstrations (where you show the robot a crude way of solving the problem) or on engineering knowledge to simplify the learning problem. In the future, it is conceivable to remove the engineer from the loop and automate learning to a much higher degree. In order to achieve this, we need to address some challenges, such as data-efficient learning (see below).
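
To make the idea of correcting an “idealized” model with observed data a bit more concrete, here is a minimal sketch. It is purely illustrative and not taken from any of the systems mentioned above: the cart example, the constants and all function names (idealized_step, true_step, fit_residual_gain, corrected_step) are assumptions made for this post. The engineering model ignores friction, and a single correction term is fitted by least squares to the difference between the observed and the predicted behavior.

```python
import random

DT = 0.1      # time step in seconds (illustrative value)
MASS = 2.0    # cart mass in kg (illustrative value)


def idealized_step(velocity, force):
    """Engineering model: Newton's second law, friction ignored."""
    return velocity + (force / MASS) * DT


def true_step(velocity, force, friction=0.8):
    """Stand-in for the real system, which also has viscous friction."""
    return velocity + ((force - friction * velocity) / MASS) * DT


def fit_residual_gain(samples):
    """Least-squares fit of residual = gain * velocity (line through the origin)."""
    numerator = sum(v * (v_next - idealized_step(v, f)) for v, f, v_next in samples)
    denominator = sum(v * v for v, _, _ in samples)
    return numerator / denominator


def corrected_step(velocity, force, gain):
    """Idealized prediction plus the learned correction."""
    return idealized_step(velocity, force) + gain * velocity


# Collect "observed" transitions from the stand-in system.
rng = random.Random(0)
samples = []
for _ in range(50):
    v = rng.uniform(-2.0, 2.0)
    f = rng.uniform(-5.0, 5.0)
    samples.append((v, f, true_step(v, f)))

gain = fit_residual_gain(samples)

# Compare both models on a transition that was not in the data.
v0, f0 = 1.5, 3.0
error_idealized = abs(true_step(v0, f0) - idealized_step(v0, f0))
error_corrected = abs(true_step(v0, f0) - corrected_step(v0, f0, gain))
print(f"learned gain: {gain:.4f}")
print(f"idealized error: {error_idealized:.4f}, corrected error: {error_corrected:.6f}")
```

In this toy setting the unmodeled effect happens to be exactly linear in the velocity, so one fitted parameter is enough. On a real robot the correction would typically be a more flexible learned model, and the data would be noisy.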

Question: Could OpenAI’s efforts have industrial benefits, too?

Answer: OpenAI announced that it is working on enabling an off-the-shelf robot to perform basic housework. If this goal can be achieved, there will be economic and industrial benefits, too: Imagine a Roomba not only cleaning your floor but also doing the dishes, ironing the shirts, cleaning the windows, preparing breakfast etc. Since the robots are “off-the-shelf”, they can be manufactured in huge quantities (millions), which will drive their price down, making them attractive to customers.

Industrial settings are typically much more constrained than cluttered households. Many industries already use robots (e.g., car manufacturing), but instead of using a single robot to build an entire car, there are many robots, each specialized to do a single thing. Since the industrial setting is constrained, adaptation is hardly needed, and these specialized robots perform their task with high accuracy and high speed.

There are other industries that could benefit from OpenAI’s efforts: For example, in agriculture, a robot that can detect and harvest ripe strawberries (or other fruits/vegetables) would be useful; we could also imagine robots cleaning the streets and collecting garbage from homes.

Question: OpenAI is using off-the-shelf industrial robots for its research. Do you see any problem with that approach?

Answer: The idea of using an off-the-shelf robot aligns with OpenAI’s philosophy of reproducible research: You can buy the kit yourself and use publicly available software to reproduce their results.

Question: What else needs to be solved for robots to go into an environment as unstructured as a home?

Answer: Generally, the robot needs to be able to close the perception-action-learning loop: perceive the environment, (inter)act with the environment, and learn from experience.

A robot in an unstructured environment, such as a home, needs to be able to use general-purpose sensors to “see”. General-purpose sensors can include cameras, but also lasers or tactile sensors. Although object recognition has made great strides in recent years, the problem is not yet solved, especially not in a standard household where objects (muesli containers, plates, cutlery, flour, …) are not well separated from each other.

The robot needs to be able to choose a sequence of actions in order to solve a high-level problem (e.g., “make breakfast”). From making tea (fill the kettle, boil water, find the teabag, find a cup, put the teabag into the cup, pour hot water into the cup, …) to setting the table, this requires general manipulation skills that (at the moment) no robot possesses. An additional challenge is to make decisions quickly: We cannot wait 60 minutes for the robot to make breakfast.

Robots need to be able to automatically (and quickly) adapt to a new environment. Machine learning is a good way to achieve automatic adaptation, but in order to adapt quickly the robot needs to be able to extract valuable information from small amounts of data, i.e., the learning algorithms need to be data efficient. Data-efficient machine learning can be summarized as “the ability to learn in complex domains without requiring large quantities of data” [1], and ideas toward data-efficient learning include transfer learning (e.g., how can I exploit knowledge about playing baseball when I start learning softball?), incorporation of structural prior knowledge (e.g., engineering prior knowledge or symmetries) and Bayesian optimization (a data-efficient method for automatic optimization).

Question: Do you think we will need new hardware as well as software?

Answer: Currently, the software seems to be the bottleneck. However, independent of this, better hardware could also lead to substantial improvements: Soft manipulators (robotic hands) and elastic, non-point-mass feet (e.g., similar to a monkey’s feet) are concepts that researchers have started working on. We may also need new types of materials for designing better hardware (e.g., tactile sensors).
