\title{Sensing versus Inferring in Robot Control}
\author{Hans P. Moravec \\ Carnegie-Mellon University Robotics Institute}
\date{January 1987}
\maketitle

\section{Introduction}

For the last fifteen years I have worked with and around mobile robots controlled by large programs that take in a certain amount of data and mull it over for seconds, minutes or even hours [\cite{Moravec83}, \cite{Moravec85}]. Although their accomplishments are sometimes impressive, they are brittle - if any of the many key steps does not work as planned, the entire process is likely to fail beyond recovery.

This is a strange contrast with the much more modest machines that I worked with, played with, and contemplated earlier. Their sensors (touch switches and photocells, usually) were wired to motors through only simple logic, but they nevertheless managed to extricate themselves from many very difficult and confusing situations. One of the most elaborate examples was the Hopkins beast, built around 1964 at Johns Hopkins University, that wandered the halls, centering itself by sonar. When its batteries ran low, the machine used a special photocell array sensor to find standard wall outlets (black on white walls) and to dock with them, to "feed" until the batteries were sufficiently recharged. Also notable were the artificial turtles of British psychologist W. Grey Walter in the 1950s, the most elaborate of which could learn to associate two stimuli in a Pavlovian way, using an array of capacitors for memory. Artificial animals of this kind were the clearest hint of a connection between mind and machine until the advent of computer-based artificial intelligence, with its emphasis on the rational aspects of human thought, in the late 1950s.

My own thinking has returned to these examples after being prodded by the technical success of the Denning Sentry, which manages to navigate night after night using techniques somewhere between those of the Hopkins beast and our recent "intelligent" robots, and also by the work of my friend Rod Brooks at MIT, who is getting very interesting behavior from small robots controlled by computer simulations of small nervous systems. Industrial vision modules, able to identify parts using methods chosen for speed and simplicity as well as effectiveness, are also relevant.

The approaches can be ordered on a spectrum. At one extreme are hardwired responses, such as limit switches that turn off motors when they are toggled; at the other are "AI" programs that subject sensor inputs to millions of computations before they decide on an effector action. A given amount of switching logic, or computational power, may be configured anywhere along this spectrum, from broad and shallow to narrow and deep. There are costs and benefits for any choice. At the "shallow" end the data bandwidth to the world can be very high, since the logic handles each sensor input only briefly before going on to the next. This can be a great strength. Conversely, at the "deep" end the response can be very flexible, but because most of the logic is tied up making inferences, the system can notice the world only sparsely in space and time. The bandwidth is low, and the latency is high.

Strategies near the shallow end can be much more responsive to changing events - with properly chosen connections, the action in the world around them in effect becomes part of the robot's reasoning. It is this property that makes many such approaches work so well. Errors made at one moment are sensed and corrected the next.
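To make the shallow end of this spectrum concrete, here is a minimal sketch of a reflex-style controller of the kind just described, written in Python for compactness. Every sensor reading is handled with only a few arithmetic operations before the next one arrives, so the loop's bandwidth to the world stays high. The interfaces read_bumper, read_sonar_ring and set_speeds are hypothetical placeholders, not the interface of any particular robot.

\begin{verbatim}
# A minimal reflex-style controller: high sensor bandwidth, almost no inference.
# read_bumper(), read_sonar_ring() and set_speeds() are hypothetical device
# interfaces standing in for whatever a real robot provides.

def reflex_loop(read_bumper, read_sonar_ring, set_speeds, cruise=0.3):
    """Poll the sensors as fast as possible and react with simple arithmetic."""
    while True:
        if read_bumper():                   # hardwired reflex: collision -> stop
            set_speeds(0.0, 0.0)
            continue

        ranges = read_sonar_ring()          # e.g. 24 range readings, in meters;
        n = len(ranges)                     # first half assumed to face left
        left = sum(ranges[:n // 2]) / (n // 2)
        right = sum(ranges[n // 2:]) / (n - n // 2)

        # Slow down as the nearest obstacle closes in, and veer toward
        # whichever side shows more average clearance.
        speed = cruise * min(1.0, min(ranges))
        veer = 0.2 * (right - left) / max(left + right, 0.1)
        set_speeds(speed + veer, speed - veer)   # (left wheel, right wheel)
\end{verbatim}

Everything such a controller "knows" is recomputed from scratch on each pass, which is exactly what lets the world correct its mistakes, and exactly what keeps it from pursuing any goal that outlasts one pass.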
On the other hand, long term goals require the long memory and large scale models possible only in a deep controller.

\section{Optimum Controller Depth}

It is possible and desirable to organize a robot controller so that independent wide and deep strategies can co-exist. For some occurrences, such as a collision, an immediate reflex action (Stop!) is appropriate. At other times a long stretch of combinatorial thinking, resulting in a plan, is a good preface to action. My experience suggests, however, that the most effective use of a given amount of computational power during normal motion lies between the extremes of complexity. There is a way of {\it impedance matching} the available processing power to the time constants and complexity of the robot's environment. A larger, faster computer will permit better performance through more complex interpretation of more data, but overly complex algorithms on a smaller machine reduce performance by restricting the rate at which raw data can be ingested. For the time constants that apply to a robot crawling in a sedate indoor environment, controlled by a one-million-instruction-per-second computer, processing about 100 independent numbers from the sensors every few seconds of travel, thus spending about 5,000 instructions per reading, seems to give the most solid results. Figure 1 is a graphical metaphor for this idea.

The Denning machines have hosted a number of control programs, obtaining their inputs from a ring of 24 sonar transducers, fired in banks, that can give readings about three times per second, and from an optical sensor that reports the azimuth and elevation of wall-mounted navigational infrared beacons. The simplest programs are able to poll the sensors at their maximum rate and process all the readings in a uniform manner. The massively redundant data helps eliminate false readings, but the computer is so busy massaging raw data that there is no time to model the surroundings; decisions to slow down or speed up, to veer left or right, must be made by simple tests and arithmetical combinations of sonar distance averages and beacon sensor angles. In this mode, halving the rate at which data is collected and processed affects performance only slightly. The complexity of the controller is too low for maximum effectiveness.

We are now getting much richer behavior from a program nearer the optimum complexity: a mapper that processes the roughly two hundred sonar readings collected during each five meters of travel into an occupancy map of the surroundings, and matches this map to ones built in an initial training run to accurately locate itself. The map building and the matching each consume about three seconds of onboard computer time, and the robot is able to navigate indefinitely, solely by extended landmarks. The maps can also be used to locate corridors, and have other future potential. Any increase in program complexity would either require slower robot motion, or else would result in frequent navigational errors. Simplifying the processing would have the same effect, since the maps would be less accurate and reliable.
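The following is a rough sketch of the shape of this map-building and map-matching computation, under illustrative assumptions of my own (the cell size, the evidence weights, and the correlation search are placeholders, not the values used in the Denning program). Each sonar reading marks the cells along its beam as probably empty and the cell at the measured range as probably occupied; localization is then a bounded search for the offset that best registers the fresh grid against a map remembered from the training run.

\begin{verbatim}
# A rough sketch of sonar occupancy mapping and map matching (illustrative,
# not the actual Denning mapper).  Cell size, grid size and evidence weights
# are assumptions chosen only to show the character of the computation.

import math

CELL = 0.3           # meters per grid cell (assumed)
SIZE = 64            # the grid covers a SIZE x SIZE patch of floor

def empty_grid():
    return [[0.0] * SIZE for _ in range(SIZE)]

def add_reading(grid, x, y, bearing, rng):
    """Fold one sonar reading into the grid: cells along the beam gather
    evidence of emptiness, the cell at the measured range gathers evidence
    of occupancy.  (x, y) is the robot position, bearing the beam direction."""
    steps = int(rng / CELL)
    for i in range(steps + 1):
        px = x + i * CELL * math.cos(bearing)
        py = y + i * CELL * math.sin(bearing)
        cx = int(math.floor(px / CELL)) + SIZE // 2
        cy = int(math.floor(py / CELL)) + SIZE // 2
        if 0 <= cx < SIZE and 0 <= cy < SIZE:
            grid[cy][cx] += 1.0 if i == steps else -0.2

def match(fresh, stored, search=3):
    """Slide the fresh grid over the stored training-run map and return the
    cell offset (dx, dy) with the highest correlation; that offset is the
    robot's position correction."""
    best, best_off = float("-inf"), (0, 0)
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            score = 0.0
            for y in range(SIZE):
                for x in range(SIZE):
                    sx, sy = x + dx, y + dy
                    if 0 <= sx < SIZE and 0 <= sy < SIZE:
                        score += fresh[y][x] * stored[sy][sx]
            if score > best:
                best, best_off = score, (dx, dy)
    return best_off
\end{verbatim}

A real matcher would also search over small rotations and use a better-founded evidence rule, but the character of the computation is the point: folding in a couple of hundred readings costs a few hundred operations each, and the registration search a few hundred thousand multiply-adds, the same order as the few seconds of onboard computation described above.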
Another program I feel is close to the optimum for a 1 mip machine controlling a slow indoor robot is the scanline stereo program of Serey and Matthies [\cite{Serey86}]. By restricting the visual field of two cameras to a single scanline, this program successfully navigates by subjecting about 500 numbers to a dynamic programming stereo correspondence method that gives a distance profile of objects penetrating a camera-height horizontal plane around the robot. Properly optimized, this program could produce its results with a few seconds of computation per image pair. (A rough sketch of this kind of scanline matching is appended at the end of this paper.) The road following methods reported by Wallace et al. [\cite{Wallace84}, \cite{Wallace85}] similarly qualify.

Programs too high in complexity for efficient robot operation on a 1 mip machine include the full three-dimensional stereo navigation program of my "Stanford Cart" thesis, to a lesser extent its somewhat simplified descendants [\cite{Thorpe84}] at CMU, and especially the programs controlling the SRI "Shakey" robot in the early 1970s [\cite{Nillson84}]. The symptom of overcomplexity is brittleness - the program is unable to take in enough data to verify the correctness of its observations of the world, and it often makes and executes elaborate plans on the basis of mistaken assumptions, leading to spectacular failures. The Cart programs failed to correctly cross a room about half the time, and I believe Shakey never completed a typical three or four step plan without human intervention. Compare this to the Denning robot, which manages to patrol several thousand feet of office corridors for 30 nights at a time fully automatically, returning to its recharging hutch every morning.

The impedance match of an overly complex program can be improved a little by effectively increasing the time constant of the world - that is, by slowing everything down. The Cart took five hours to cross a room, and Shakey took longer to complete a single task - each vision step alone involved an hour of processing.

It could be argued that stretching in the direction of complexity is necessary so that the techniques will be ready for the more powerful computers to come. I think this approach is probably not the best one. Working with an overly complex controller in less than real time can dramatically limit the number of experiments that can be done. Yet experiments are the primary means by which effective methods can be distinguished from ineffective ones. Fundamentally, Shakey and the Cart can be considered spectacular, isolated stunts whose success and usefulness cannot be judged until their methods are experimentally and objectively pitted against a variety of other ways of accomplishing the same ends. Such experiments are still too expensive to do today.

There is an alternative approach to developing the robot controllers of the future. The size of the tetrahedron in Figure 1 represents the amount of computer power available in a robot. The tetrahedron is growing in size as the cost of computation decreases over the years, taking the "best performance" plateau up with it. Instead of struggling heroically to sit uncomfortably and ineffectively on the "Intelligent Systems" ledge at the top of the tetrahedron, from now on I propose to ride the "best performance" plateau as effectively as possible. I believe that by doing so my experimental discoveries, modest though they may be individually, will accumulate in a steadily improving system over the years, with both complexity and bandwidth increasing gradually as the processors available improve.
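As a concrete illustration of the scanline stereo matching discussed above, the following is a rough sketch of a dynamic programming correspondence between one pair of scanlines. It is not the Serey and Matthies implementation; the intensity-difference cost, the smoothness penalty, and the parameter values are placeholders of my own, chosen only to show the shape of the computation.

\begin{verbatim}
# A rough sketch of dynamic-programming stereo correspondence along a single
# scanline pair (illustrative, not the Serey and Matthies program).  Inputs
# are two lists of pixel intensities from horizontally aligned cameras; the
# output is a disparity for each left-image pixel.  Range to an object is
# proportional to baseline * focal_length / disparity.

def scanline_stereo(left, right, max_disp=32, smooth=2.0):
    n = len(left)
    INF = float("inf")
    D = max_disp + 1
    cost = [[INF] * D for _ in range(n)]   # cost[i][d]: best cost ending there
    back = [[0] * D for _ in range(n)]     # backpointers to recover the path

    def match(i, d):
        # data term: how well left pixel i matches the pixel d places over
        return abs(left[i] - right[i - d]) if i - d >= 0 else INF

    for d in range(D):
        cost[0][d] = match(0, d)

    for i in range(1, n):
        for d in range(D):
            m = match(i, d)
            if m == INF:
                continue
            # smoothness term: neighbouring pixels prefer similar disparities
            best, arg = INF, 0
            for dp in range(D):
                c = cost[i - 1][dp] + smooth * abs(d - dp)
                if c < best:
                    best, arg = c, dp
            cost[i][d] = m + best
            back[i][d] = arg

    # backtrack from the cheapest final state to recover the disparity profile
    d = min(range(D), key=lambda k: cost[n - 1][k])
    profile = [0] * n
    for i in range(n - 1, -1, -1):
        profile[i] = d
        d = back[i][d]
    return profile
\end{verbatim}

With a pair of 256-pixel scanlines and a few dozen candidate disparities, a sketch like this performs on the order of a million elementary operations, which is the right scale for the few seconds of computation per image pair mentioned above on a 1 mip machine.

\end{document}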