Caution! Robot Vehicle!

Hans P. Moravec\\
Robotics Institute\\
Carnegie Mellon University\\
Pittsburgh, Pennsylvania 15213

August 1990

\section{Introduction}

A special road sign bearing the legend of the title greeted visitors to the Stanford Artificial Intelligence Laboratory during the time it was housed in the starship (unconvincingly disguised as the Donald C. Power building) that parked on a Stanford hill from the mid sixties to the mid eighties. The sign, near the periphery of SAIL's grounds, referred to the Stanford Cart, a guerrilla research project near the periphery of John McCarthy's core interests, but motivated by his desire for autonomous vision-guided (as opposed to co-ordinated wire-guided) automatic cars.

In a 1969 essay, {\em Computer Controlled Cars} \cite{JMC69}, John suggested that the power of a PDP-10 (in a smaller package) was adequate for the job. I think John still favors this estimate. John is guided by a strong and principled intuition that has proven itself correct in very many things. But in this paper I will present accumulating experimental evidence hinting that in this one opinion John's intuition misled him by more than a few orders of magnitude.

\section{Cartography}

How can one expect to interpret an image sequence at the many-frames-per-second rate necessary for driving, on a one million instruction per second machine? A reasonable picture of the road by itself is an array of a million numbers. To even {\it touch} each of these pixels takes several seconds---and doing anything substantial takes at least several times longer. Driving imagery includes traffic, obstacles, road signs and other features that appear swiftly in all parts of the image, and often call for swift response.

One answer often voiced in the early days was that only a small fraction of the image need be examined---with sufficient cleverness most of it could be eliminated as uninteresting {\it a priori}. In 1971 Rod Schmidt used this approach in the first Cart thesis \cite{RAS71}. With Rod's program (about 200 Kbytes of tight assembly code, close to the upper size limit in those days) the Cart, moving at a very slow walking pace, visually tracked a white tape line on the ground. The program contained a predictor for the future position of the line based on its past position. It digitized about 10\% of the full image around the predicted line position, and applied a specialized operator to once again find the line in those bounds. This location served as the next incremental input to both the predictor and a steering servo calculation. Using about half the power of the PDP-10, the program could handle one image a second, and was able to follow a line for about 50 feet at a time---if the line was unbroken, didn't curve too much and was clean and uniformly lit. Rod noted that handling even simple exceptions, such as brightness changes caused by shadows, would require several times as much computation to search over alternative interpretations, and detecting and responding to obstacles, road signs and other hazards promised to be much more complicated.

In the early 1970s computer vision was less than ten years old, and almost all of it was of the ``blocks world'' type, which reduced an image to a list of geometric edges before doing anything else. It was quite inappropriate for outdoor scenes containing few simple edges, but many complicated shapes and color patterns. A major exception to the blocks world approach was a project begun at SAIL with impetus from Joshua Lederberg and John's enthusiastic support.
It was to look for changes on the surface of Mars that might indicate life. Using digital images from Mariners 4, 6, 7 and 9, the project worked to register, in geometry and color, views of the same regions taken at different times, so that any differences could be detected. Since the spacecraft locations were known only approximately, the image registration process was to be guided by the surface features themselves. Lynn Quam and associates developed a collection of statistical, intensity-based comparison, search and transformation methods that did the job \cite{PDQ71}. Since they dealt with complex natural scenes, those methods also seemed appropriate for interpreting imagery from an outdoor vehicle. Since NASA was then considering the possibility of a semi-autonomous roving robot explorer for Mars (20 years later they still are), there was a double bond between the Cart concept and the Mars group.

I arrived in late 1971, an enthusiast for space and robots, and quickly adopted the Cart from Bruce Baumgart, its foster father. The Cart didn't have much of a research reputation, but discussions with Lynn produced a plan whereby I would provide a working vehicle (a non-trivial project given the shoestring construction) and PDP-10 resident driving control software. Lynn and company would adapt their image methods for visual navigation. By 1973 I was having a good time building and test-driving new remote control hardware and software, when a remote control misstep crashed the poor Cart off a small loading ramp. Months of low budget repairs left its TV transmitter still broken, and led me to beg John to invest several thousand dollars for a replacement. He agreed, but insisted that I demonstrate my ability to do vision programming.

Real time was not an issue in interpreting Mars images. The missions were several years apart, and, until the Mariner 9 orbiter, each produced only a few dozen images. The Mars group could afford to run search programs for hours at a time to find exceedingly precise and dense matches over large image areas. A Cart-driving program might forgo this precision and coverage in exchange for speed. It seemed many tasks could be accomplished with just two basic image operators---one to pick out a good collection of distinctive local regions across a scene (here called {\it features}), and another to find them in different views of the same area. Three dimensional locations could then be determined by triangulation, obstacles detected, and the motion of the robot deduced. I set about finding fast implementations of these ideas.

Working mostly with spatially compressed images, and cleverly coded in assembler, my operators were able to pick out a few dozen good features in one image and reacquire them in another using about ten seconds of computer time. In 1975 I built a program around them that controlled the Cart's heading by tracking horizon features on the roads around SAIL. The program would repeatedly digitize a frame and, in fifteen seconds, determine the horizontal displacement of features on the (usually tree-lined) boundary between ground and sky since the last frame, calculate a steering correction, and drive the robot up to ten meters. It did its unambitious task quite well, and was fun to watch.
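To convey the flavor of those two operators (this is an illustration, not a reconstruction of the original assembly code), the Python sketch below picks windows whose intensity varies strongly in every direction, reacquires them in a later frame by normalized correlation, and turns their average horizontal drift into a steering term. The window sizes, search radius and gain are invented for the example.

\begin{verbatim}
# Sketch only: pick distinctive windows, reacquire them, steer from their drift.
import numpy as np

def interest_operator(image, window=8, count=30):
    """Pick `count` windows whose minimum directional variance is largest,
    i.e. patches that are distinctive in every direction."""
    h, w = image.shape
    scores = []
    for r in range(0, h - window, window):
        for c in range(0, w - window, window):
            patch = image[r:r + window, c:c + window].astype(float)
            dirs = [np.sum((patch[:, 1:] - patch[:, :-1]) ** 2),   # horizontal
                    np.sum((patch[1:, :] - patch[:-1, :]) ** 2),   # vertical
                    np.sum((patch[1:, 1:] - patch[:-1, :-1]) ** 2),
                    np.sum((patch[1:, :-1] - patch[:-1, 1:]) ** 2)]
            scores.append((min(dirs), r, c))
    scores.sort(reverse=True)
    return [(r, c) for _, r, c in scores[:count]]

def match_feature(old_image, new_image, r, c, window=8, search=32):
    """Find the best normalized-correlation match for the window at (r, c)
    within a local search area of the new image."""
    patch = old_image[r:r + window, c:c + window].astype(float)
    patch -= patch.mean()
    best, best_rc = -np.inf, (r, c)
    h, w = new_image.shape
    for rr in range(max(0, r - search), min(h - window, r + search)):
        for cc in range(max(0, c - search), min(w - window, c + search)):
            cand = new_image[rr:rr + window, cc:cc + window].astype(float)
            cand -= cand.mean()
            denom = np.sqrt((patch ** 2).sum() * (cand ** 2).sum()) + 1e-9
            score = (patch * cand).sum() / denom
            if score > best:
                best, best_rc = score, (rr, cc)
    return best_rc

def steering_correction(old_image, new_image, gain=0.01):
    """Average horizontal drift of tracked features -> a steering term."""
    features = interest_operator(old_image)
    drifts = [match_feature(old_image, new_image, r, c)[1] - c
              for r, c in features]
    return -gain * float(np.mean(drifts))
\end{verbatim}

The real operators were far more carefully tuned, and squeezed into a ten-second budget on the PDP-10; the sketch only shows the overall shape: pick, reacquire, steer.
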
But it was intended as a mere practice for the main event, a much more ambitious program that could drive the Cart through an obstacle field by visually tracking its surroundings in three dimensions---to build a map, identify obstacles, plan safe routes and, most difficult, visually monitor and servo the robot's motion from the apparent motion of those surroundings. I decided to approach this task in full three dimensions from the start, hoping eventually to run the robot on the rolling adobe terrain outside the lab (a vain hope). The Cart carried a single camera, suggesting the use of forward motion of the vehicle to provide a stereoscopic baseline for triangulating features. Its motor control was very imprecise, so the motion would have to be solved simultaneously with the three dimensional position of tracked objects. The Mars team had a similar problem, and Don Gennery had already written a least squares ``camera solver'' for it.

I struggled with this approach through early 1977. The program would take a picture, and choose up to a hundred features. It would then drive the robot forward about a meter, stop, take another picture, and search for the same features in the second image. Then it would invoke the camera solver to find the camera displacement and the feature positions. Despite much hacking, the program's error rate never dropped below about one wrong motion solution in four. At that rate the robot could move about four meters before becoming confused about its position relative to everything else---discouraging.

The camera solver's answer was one that minimized an error expression involving initial estimates and calculated values of the feature positions and the camera displacement, each weighted by position uncertainty. It also had a way of pruning features with aberrant positions, as might be produced by incorrect matches in the images. It worked well for high quality spacecraft images of an almost two dimensional surface, with few matching errors and good {\it a priori} camera position estimates. My data, from noisy TV images of a nearby very three dimensional scene, with plenty of perspective distortion from frame to frame, was something else. Ten to twenty percent of the feature matches were wrong, often because an area chosen in a first image had, in a second image, been eclipsed, or had its appearance changed, by point-of-view effects. Cart movements, which produced the stereo baselines, could be controlled and estimated with only about 50\% precision. Also, the feature position accuracy in my low resolution images was modest, compounding the serious limitation that forward motion stereo degenerates for points near the camera axis. The combination of many outright bad points and large uncertainties made finding the right answer a chancy proposition.

It was necessary to track about one hundred features to achieve even this performance, consuming several minutes of computer time. Improving my already very good matcher by handling its statistics more thoroughly, or alternatively widening the camera solver search, in the hope of catching the correct solution more often, would have increased the run time severalfold---one by multiplying the inner loop time, the other by increasing the number of iterations of the outer loop. Instead I chose to add some robot hardware to reduce the computational uncertainties.
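Before moving to that hardware, it is worth recapping the numerical heart of the struggle in symbols of my own choosing (not Gennery's notation): the solver sought the camera displacement $T$ and feature positions $x_i$ that, while remaining consistent with the matched image measurements, minimized a sum of squared departures from the initial estimates $\hat{T}$ and $\hat{x}_i$, each term weighted by its position uncertainty,
\[
E(T, x_1, \ldots, x_n) \;=\; \frac{\|T - \hat{T}\|^2}{\sigma_T^2} \;+\; \sum_{i=1}^{n} \frac{\|x_i - \hat{x}_i\|^2}{\sigma_i^2} ,
\]
with features whose solved positions looked aberrant pruned along the way. With good initial estimates and few mismatches the minimum sits where it should; with my data it too often did not.
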
Multiple cameras or a repositionable camera on the robot would permit true stereoscopy, allowing three dimensional locations of features relative to the robot to be determined at each stop. Mismatched features between stops might then be pruned, before solving for robot motion, by exploiting the rigid motion constraint, i.e.\ that the mutual distances between pairs of features should remain unchanged by a move. Vic Scheinman, a steadfast friend of the Cart, found in his basement a mechanism able to slide the camera about 60 cm from side to side. Motorized, this provided a fine stereo baseline. Errors were further reduced by exploiting the redundancy of nine pictures taken across this track. The final result, first sufficiently debugged in October of 1979, was a program that would track about thirty small image features at a time to visually servo the robot through indoor clutter, mapping it and avoiding obstacles, using ten minutes of computing per meter of travel. In five hours it would arrive at a requested destination at the opposite end of a thirty meter room, succeeding in about three traverses out of four. Outdoors the harsh contrast between sunlight and shadow overwhelmed the vidicon camera and degraded the success rate \cite{HPM80}.

\section{Fast Cars}

In 1977 Japan's Mechanical Engineering Laboratory demonstrated a stereo-vision guided autonomous automobile that could follow well defined roads for distances of about 50 meters at speeds up to 30 km/h, using highly specialized hardware occupying a rack on the passenger side of a small car \cite{MEL89}. Two television cameras were mounted, one above the other on the car's front grill, and oriented so that their fast scan direction was vertical. Their video signals were electronically differentiated, to detect brightness changes, then quantized into binary bit streams. The streams from the two cameras were matched at various offsets by a tapped shift register and a bank of binary logic comparators. When properly adjusted for local conditions, this circuit, doing the equivalent of about 50 MIPS of computing, provided an indication of the distances of about eight major visual discontinuities, such as road embankments and obstacles, thirty times per second. The range indications were sampled about ten times per second by a $1/4$ MIPS minicomputer programmed to keep the vehicle on the road and veer around obstacles.

In 1984, as part of its Strategic Computing Initiative, DARPA initiated an overambitious program called ``Autonomous Land Vehicles'' (ALV) that promised stealthy robot crawlers to do reconnaissance, sabotage and perhaps combat on a battlefield. A decade of computer vision work had convinced the managers that stereo vision was too hard a problem for their time frame, but they guessed that the rest of the perception and navigation problem was tractable. Before being abandoned five years later, the project had financed a half dozen small experimental vision-guided vehicles and two large ones. A large rough-terrain vehicle at Martin-Marietta in Denver \cite{ALV89} was equipped with about 50 MIPS of computer power, color television cameras and a scanning laser rangefinder that provided, twice a second, a 128 by 256 array of distance measurements across the field of view, doing by physics what was impractical by computation. A similar machine, based on a large Chevy van, was constructed at Carnegie Mellon University \cite{CET90}.
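The appeal of such a rangefinder is that obstacle detection needs almost no interpretation. The sketch below is my own illustration of the sort of minimal test a range image permits, with the sensor geometry, thresholds and planted obstacle all invented; it is not the ALV's or NavLab's actual processing.

\begin{verbatim}
# Sketch only: flag range cells that return much sooner than flat ground would.
import numpy as np

ROWS, COLS = 128, 256
SENSOR_HEIGHT = 2.0                                       # metres above ground
ELEVATIONS = np.radians(np.linspace(-15.0, -1.0, ROWS))   # downward look angles

def flat_ground_range(elevations=ELEVATIONS):
    """Range at which each scan row would hit level ground."""
    return SENSOR_HEIGHT / np.sin(-elevations)

def obstacle_mask(range_image, margin=1.5):
    """True wherever the measured range falls well short of the ground
    prediction, i.e. something is sticking up in front of the vehicle."""
    expected = flat_ground_range()[:, np.newaxis]          # one value per row
    return range_image < (expected - margin)

def clear_to_drive(range_image, corridor=slice(96, 160)):
    """Crude go/no-go: no obstacle cells in the central columns."""
    return not obstacle_mask(range_image)[:, corridor].any()

if __name__ == "__main__":
    ranges = flat_ground_range()[:, np.newaxis] * np.ones((ROWS, COLS))
    ranges[60:70, 120:140] = 5.0          # plant a fake obstacle ahead
    print(clear_to_drive(ranges))         # -> False
\end{verbatim}

With range handed over by the sensor, the perception step shrinks to roughly one comparison per cell, which is the sense in which the laser does by physics what was impractical by computation.
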
By the end of the project the ``ALV'' and the ``NavLab'' were both driving down dirt roads at speeds up to 50 km/h, but usually much slower, tracking road boundaries with color-based image operators, and stopping for obstacles detected by a minimal processing of the laser range data. Both were able to do this as long as the road boundaries were relatively well defined and without major discontinuities. But the simple road identification operators achievable with 50 MIPS were often fooled, and both vehicles were unlikely to stay on a road for a whole kilometer. The ALV program had specified off-road navigation as a subsequent goal, but the first phase results left few avenues for that.

A project in Germany begun in 1984 \cite{DED90} produced a van that sometimes drives autonomously on the Autobahn at up to 100 km/h, guided by the output of a single monochrome camera. The camera image goes to an array of up to a dozen specialized image processors, each with an effective computing rate of about 10 MIPS, which permits simple operations, such as convolving an image patch, to proceed at a full 60 frames per second. Each processor is programmed to keep a single small image window on a feature in the scene (just like self-steering TV guided bombs and missiles). The features are pointed out to the system manually at the beginning of an autonomous run. Typically one is the left edge of the highway or the lane, another the right edge. These are tracked very much in the way Rod Schmidt's program tracked its white line. Other features are chosen on license plates or other distinctive marks on traffic ahead and to the sides of the van. Using motion prediction techniques, the image processors are able to maintain their visual locks for many minutes. Their output goes to another processor that servos the vehicle to stay in the lane, and to keep a safe distance from the other traffic.

In none of the above systems is it advisable for the human supervisor to stray far from the manual override button---these simple-minded machines are very easily confused by such common driving events as shadows, road stains, lampposts, stopped cars or sudden curves. Recent work by several of the research groups has begun to rely on a combination of dead reckoning and satellite navigation for primary guidance, with sensing demoted to the single task of slowing or stopping the vehicle when an obstacle blocks the way. With this simplified approach, reliable high speed ``playbacks'' of human driven routes have been demonstrated---adequate, perhaps, for controlling the repetitive trips of ore trucks in strip mines.

\section{Night Crawlers}

The availability of cheap microprocessor ``brains'' since 1980 encouraged dozens of individuals and groups worldwide to build or acquire small mobile robots. In hobby efforts, small companies, industrial and government labs, high schools, and university undergraduate and graduate projects, programmable vehicles were built and operated. Low budget machines relied on contact, sonar or infrared proximity detectors and fractional MIPS processors; the more expensive versions carried TV cameras or laser rangefinders of various kinds, and often several 1 MIPS processors. A few had onboard manipulators. A half dozen companies offered hobby robots costing a few thousand dollars and controlled by eight bit processors. The majority were abandoned within a few years, having achieved, at best, feats similar to those of the 1950s toylike pre-computer light-seeking turtles of W.
Grey Walter and Norbert Wiener \cite{WGW61,NW65}, or the wall-socket-feeding Hopkins Beast of 1965. Several more advanced machines with sonar or optical range sensors built two dimensional maps of their surroundings using a blocks-world-like edge-based representation. These could work in real time with clean measurements from robots moving slowly in simple office environments, but were overwhelmed in cluttered spaces, or where artifacts such as specular deflection of the ranging beam produced a high rate of range errors. A small number of projects with camera equipped robots applied stereo, range stripe and shading-based scene interpretation methods from computer vision research---methods that consume minutes or hours of computer time. Most have returned to mainstream vision research, having decided that a robot is an expensive and inconvenient way to obtain a few pictures.

A few startup companies in the 1980s attempted to deliver autonomous mobile robots for industrial (as opposed to toy or experimental) markets. Still struggling are Denning Mobile Robotics of Wilmington, Massachusetts, Cybermotion of Roanoke, Virginia and Transitions Research of Danbury, Connecticut. All produce battery powered machines roughly human in scale, weighing a few hundred pounds, controlled by about one MIPS of computer power, and costing about \$50,000. They navigate and detect obstacles with various optical and acoustic sensors, but not computer vision. The companies have addressed applications in building security, factory parts delivery, TV studio camera transport, floor cleaning, warehouse inventory, hospital and office mail delivery---anywhere navigating around is half the battle. These markets have proven difficult thus far, and most sales have been to mobile robot research groups.

Most familiar to me is Denning, founded in 1982 with the idea of making robot security guards (or roving burglar alarms) to patrol and detect intruders in large warehouses or office suites. In 1983 the company decided on a shape like an oil drum, with three driven wheels and a steering arrangement that ganged the wheels and a sensing ``head''. Obstacle detection was by a belt of 24 inexpensive Polaroid sonar rangefinders. An area photodiode, which indicates the position of a spot of light, provided navigation keyed to infrared beacons mounted at the ends of hallways. Control was by a 1 MIPS Motorola 68000 microprocessor and a few smaller processors. In several years of evolutionary development that addressed increasingly subtle failure modes, the company demonstrated machines that patrolled aisles and hallways for minutes, hours, days and eventually months at a time without human intervention. A facility-specific guidance program drives the robot down beacon-equipped hallways and integrates wheel rotations to navigate beaconless corners. The sonar readings are averaged in a simple way to squeeze through doorways and avoid obstacles \cite{MK86}. Each morning the program guides the robot to ``sleep'' in a recharging bay.

In 1989, for AF Associates, a maker of TV studio equipment, Denning added a much more precise navigation instrument that relies on retroreflective tapes in the environment, sensed by a horizontally spinning laser/detector on the robot. Triangulation from the angular positions of three such tapes gives the robot's position to better than one centimeter. Several operatorless TV cameras in national news studios now move on bases that are fat Denning robots.
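The triangulation step itself is a small computation. Below is a minimal sketch of one way to do it (a least-squares solve over position and heading, with made-up landmark coordinates; it is not Denning's implementation).

\begin{verbatim}
# Sketch only: fix (x, y, heading) from the bearings of three known landmarks.
import numpy as np
from scipy.optimize import least_squares

LANDMARKS = np.array([[0.0, 0.0], [10.0, 0.0], [4.0, 8.0]])  # tape positions (m)

def wrap(angle):
    """Wrap an angle difference into (-pi, pi]."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def residuals(state, bearings):
    """Measured minus predicted bearings for a candidate pose (x, y, heading)."""
    x, y, heading = state
    predicted = np.arctan2(LANDMARKS[:, 1] - y, LANDMARKS[:, 0] - x) - heading
    return wrap(bearings - predicted)

def locate(bearings, guess=(1.0, 1.0, 0.0)):
    """Solve the three bearing equations for (x, y, heading)."""
    return least_squares(residuals, guess, args=(np.asarray(bearings),)).x

if __name__ == "__main__":
    true_pose = np.array([3.0, 2.0, 0.3])
    meas = wrap(np.arctan2(LANDMARKS[:, 1] - true_pose[1],
                           LANDMARKS[:, 0] - true_pose[0]) - true_pose[2])
    print(locate(meas))    # recovers approximately (3.0, 2.0, 0.3)
\end{verbatim}

With three bearings and three unknowns the fix is essentially exact up to measurement noise; the better-than-a-centimeter figure presumably reflects how sharply the spinning detector registers each tape.
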
The same guidance methods have been combined in successful demonstrations of vacuuming and wet scrubbing robots, in work with Windsor Industries, a maker of industrial cleaning machines.

To date, robot navigators that attempt to model the world comprehensively are caught in a dilemma---either they reduce noisy sensor data too selectively and too uncritically, and so are easily confused and ``brittle'', or they consume hours in statistical deliberations, and so are unusably slow. The most successful machines, such as the road vehicles of the previous section and the commercial robots of the last paragraph, make do with minimal or no models, keeping a short path between sensors and effectors. Sensor glitches in these simple-minded robots cause momentary behavioral transients, but are quickly compensated by subsequent inputs. Rodney Brooks' group at MIT has taken this ``reflexive'' approach to new highs of complexity \cite{ROD89}. Rod's small robots carry simple sensors and about 1 MIPS of microprocessor power, programmed with multiple interacting layers of reflexive behavior by means of a special workstation-resident compiler. The result is complex and moderately competent behavior, resembling that of insects. The devices are particularly engaging because most of the navigational problem solving necessarily involves physical probings and scurryings rather than internal data manipulation.

My own lab's research has taken an intermediate course. In the early 1980s my students streamlined the Cart obstacle program (which, with its three dimensional point maps, still had a distinct blocks world flavor) by exploiting constraints (like the knowledge that the vehicle moves only two dimensionally) to produce a program that ran about ten times as fast \cite{CET84}. Its navigational accuracy was increased by the same factor by more elaborate modeling of geometric uncertainties \cite{LHM89}. But its 3 in 4 room crossing success rate was hardly changed. The same report \cite{LHM89} describes further work in multiple view stereo vision that could, in principle, allow a denser and more robust world map---but at the usual price of hours-long processing per image set.

In 1984 we changed our approach when we accepted a contract from Denning to do map-based navigation using their obstacle-detecting sonar ring. Each sonar transducer emits a $30^\circ$ beam and reports the first echo it hears, leaving a great uncertainty about the lateral position of the detected object. We dealt with this by modeling the robot's knowledge of its surroundings as a spatial occupancy probability function (a {\it map}), represented as a discrete grid. A sonar reading was itself such a distribution (in this case called a {\it sensor model}), which was projected onto the appropriate part of the map, raising and lowering values there. We were surprised in 1985 when an {\it ad hoc} implementation gave us a robot program able to build maps of its surroundings and cross cluttered rooms with almost perfect reliability \cite{HPM88,AE89}. Rasterizing space with probability values is potentially expensive, but with a coarse two-dimensional grid and optimized coding, our program processed 10 sonar readings per second on the 1 MIPS Denning robot. A key navigational step, matching two maps of the same area, took 3 seconds. In recent developments, we have used the approach with stereo vision data, and devised a learning procedure for tuning sensor models. The best versions of our program can build two dimensional maps in real time on 10 MIPS processors.
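The heart of the grid update is small enough to sketch. The following Python rendering uses log-odds bookkeeping and invented beam and evidence parameters; our actual sensor models and combination rule differed in detail.

\begin{verbatim}
# Sketch only: project one wide-beam sonar reading onto an occupancy grid.
import numpy as np

CELL = 0.3                 # metres per grid cell
BEAM = np.radians(15)      # half-width of the 30-degree sonar cone

class OccupancyGrid:
    def __init__(self, size=64):
        self.logodds = np.zeros((size, size))     # 0 = unknown

    def update(self, robot_xy, beam_heading, measured_range):
        """Lower cells swept by the beam short of the echo, raise cells in
        the arc around the echo."""
        rx, ry = robot_xy
        n = self.logodds.shape[0]
        for i in range(n):
            for j in range(n):
                cx, cy = (i + 0.5) * CELL, (j + 0.5) * CELL
                r = np.hypot(cx - rx, cy - ry)
                bearing = np.arctan2(cy - ry, cx - rx)
                off = (bearing - beam_heading + np.pi) % (2 * np.pi) - np.pi
                if abs(off) > BEAM or r > measured_range + CELL:
                    continue                       # outside the cone: no evidence
                if r < measured_range - CELL:
                    self.logodds[i, j] -= 0.4      # beam passed through: emptier
                else:
                    self.logodds[i, j] += 0.9      # near the echo: fuller

    def probabilities(self):
        return 1.0 / (1.0 + np.exp(-self.logodds))
\end{verbatim}

Each reading lowers the cells the beam evidently passed through and raises the ones near the echo; readings taken from many positions and headings gradually sharpen the map.
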
The approach extends naturally to three dimensional maps, which can be considered stacks of about 100 two dimensional maps, but the computational cost rises to 1,000 MIPS.

\section{Reflection}

Twenty-five years of vision and robotics experience has given rather consistent results: 1 MIPS can extract only the most trivial real time measurements from live imagery---tracking a white line or a white spot on a mottled background is near its upper bound. 10 MIPS can track a complex gray-scale patch---smart bombs, cruise missiles and German vans attest. 100 MIPS can follow a moderately complex and changeable feature like a road boundary---as the DARPA ALV effort demonstrated. 1,000 MIPS might be adequate to give coarse-grained three dimensional spatial awareness---suggested by several low resolution stereo vision programs and my occupancy grid experiences. 10,000 MIPS should be able to locate three dimensional objects in clutter---suggested by several ``bin-picking'' and fine grain stereo vision demonstrations, which were able to accomplish the task in an hour at 1 MIPS. There is not much robotics data beyond this point---with available computing power, research careers are too short to endure the necessary experiments.

Are these numbers out of line? The more sophisticated vision programs interpret scenes by searching a vector space of alternative interpretations, evaluating candidates by computing statistical expressions on millions of pixels. These are truly enormous computations---combinatorial searches on the outside, megascale numerical evaluations on the inside---and they call for extreme cleverness (and many cut corners). Naive implementations are often a hundred times slower than the estimates in the last paragraph.

For a sense of perspective, consider the easier (because there is no need to search over alternatives) inverse problem of computer graphics. For smooth real-time animated video, 1 MIPS suffices to make simple line drawings---Spacewar at SAIL. 10 MIPS can animate complex colored two dimensional shapes or simple three dimensional models---CAD programs in current workstations. 100 MIPS is enough for cartoonlike presentations of three dimensional scenes with a few thousand facets---specialized graphical workstations like the Silicon Graphics Iris. 1,000 MIPS can generate scenery with tens of thousands of facets, just adequate for daytime aircraft flight simulators---the Evans and Sutherland CT5. Realistic imagery will require several orders of magnitude more, to judge from ``almost real'' single frames shown by Pixar and others, which take many hours at 1,000 MIPS.

What about animal vision systems? The first stages of human vision occur in the retina, the best understood major part of the vertebrate nervous system, whose product is neatly packaged in the optic nerve. The optic nerve is a bundle of a million fibers, each carrying the results of an edge or motion computation that, in an efficient computer implementation, would require execution of at least 100 instructions. With an effective frame rate of 10 per second, the net computing equivalent of the whole retina is 1,000 MIPS \cite{HPM88mc}. The rest of the visual system is thousands of times larger, so human visual processing power may exceed the equivalent of 1,000,000 MIPS, a number fully consistent with the experiences of robotics vision research. Frankly, it would take a miracle to bridge the gap between the research experience and John's 1 MIPS guess for the difficulty of robot car vision.
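For the record, the arithmetic behind that retina figure is simply
\[
10^{6}\ \mbox{fibers} \times 100\ \mbox{instructions/fiber/frame} \times 10\ \mbox{frames/second}
 = 10^{9}\ \mbox{instructions/second} = 1{,}000\ \mbox{MIPS},
\]
and since the rest of the visual system is thousands of times larger, the whole-system figure lands beyond 1,000,000 MIPS.
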
Mathematicians have speculated about the mathematical oversights that led Fermat to make his famous, almost certainly mistaken, marginal note. I will rudely do the same with John's estimate. John had in mind the automation of reasoning when he coined ``Artificial Intelligence''. When electronic computers were new their most notable property was a prodigious ability to do arithmetic---superhuman by a factor of thousands. AI was the attempt to harness this powerful engine to other mental tasks. Reasoning programs cut computer power down to human size, but no further, and set the calibration on John's intuition.

You can always tell the pioneers---they're the ones face down in the trail with the arrows in their backs. To change the metaphor, let me describe the view from this giant's shoulders. Rational thought---reasoning---is a recent evolutionary invention, with little time or competition to perfect it. We don't do it very well, and it doesn't take much of a machine to match us. Arithmetic is even worse. But seeing and moving around, finding food and escaping enemies, has been a life or death issue since before we had a backbone, and the competition has been fierce. We are enormously optimized in those functions, and it will take a powerful machine to match us there.

In \cite{HPM88mc} I've made my own calibration, which is that a whole human mind is the equivalent of 10,000,000 MIPS. Now, when doing arithmetic, I achieve an effective rate of about $10^{-8}$ MIPS. Playing chess, maybe 0.001 MIPS, as calibrated by good chess programs. Walking home, dodging traffic, maybe 1,000,000 MIPS. Grandmasters seem to find ways to harness the more powerful parts of their nervous systems for their tasks---by visualizing, feeling, hearing, perhaps mnemonizing, with different tasks giving different amounts of maximum purchase. The best lightning calculators still do only about $10^{-7}$ MIPS of arithmetic, but grandmaster chess machines suggest that best human performance there is worth about 10,000 MIPS.

Do these estimates relegate John's self-driving cars to the indefinite future? I think not. Electronic computers have become 1,000 times more powerful every twenty years since they appeared a half century ago. Affordable 100 MIPS machines are almost here, and 1,000 MIPS will be available by the end of the decade. Specialized circuitry can provide a hundredfold speedup of systematic computations like those in vision and spatial modeling, allowing an effective power of 100,000 MIPS by then---a match for the visual system of a monkey. With a little help from external sources like satellite navigation and digitized maps, perhaps an auto-auto can find its own way to celebration $110_8$.

\begin{thebibliography}{}

\bibitem{JMC69} John McCarthy.
\newblock Computer Controlled Cars.
\newblock In SAIL computer directory [ESS,JMC] files {\em CAR.ESS} (1968) and {\em CAR.TEX} (1975).

\bibitem{PDQ71} Lynn H. Quam.
\newblock {\em Computer Comparison of Pictures}.
\newblock Stanford AI Memo AIM-144, Stanford Computer Science Department, July 1971.

\bibitem{RAS71} Rodney A. Schmidt.
\newblock {\em A Study of the Real-Time Control of a Computer Driven Vehicle}.
\newblock Stanford AI Memo AIM-149, Stanford Computer Science Department, August 1971.

\bibitem{HPM80} Hans P. Moravec.
\newblock {\em Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover}.
\newblock Stanford AI Memo AIM-340, Stanford Computer Science Department, May 1980.

\bibitem{MEL89} T. Yatabe, T. Hirose and S. Tsugawa.
\newblock Driving Control Method for Automated Vehicle with Machine Vision.
\newblock {\em Journal of Mechanical Engineering Laboratory}, vol. 43, no. 6, Nov. 1989, pp. 267--275.

\bibitem{ALV89} S. J. Hennessy and R. H. King.
\newblock Future Mining Technology Spinoffs from the ALV Program.
\newblock {\em IEEE Transactions on Industry Applications}, vol. 25, no. 2, March--April 1989, pp. 377--384.

\bibitem{CET90} Charles E. Thorpe, editor.
\newblock {\em Vision and Navigation: The Carnegie Mellon Navlab}.
\newblock The Kluwer International Series in Engineering and Computer Science.
\newblock Boston: Kluwer Academic Publishers, 1990.

\bibitem{DED90} E. D. Dickmanns, B. Mysliwetz and T. Christians.
\newblock An Integrated Spatio-Temporal Approach to Automatic Visual Guidance of Autonomous Vehicles.
\newblock {\em IEEE Transactions on Systems, Man and Cybernetics}, vol. 20, no. 6, Nov.--Dec. 1990, pp. 1273--1284.

\bibitem{WGW61} W. Grey Walter.
\newblock {\em The Living Brain}.
\newblock Middlesex: Penguin Books, 1961.

\bibitem{NW65} Norbert Wiener.
\newblock {\em Cybernetics, or Control and Communication in the Animal and the Machine}.
\newblock Cambridge: MIT Press, 1965.

\bibitem{MK86} Mark Kadonoff, F. Benayad-Cherif, A. Franklin, J. Maddox, L. Muller, B. Sert and H. Moravec.
\newblock Arbitration of Multiple Control Strategies for Mobile Robots.
\newblock {\em SPIE Conference on Advances in Intelligent Robotics Systems}, Cambridge, Massachusetts, October 26--31, 1986.
\newblock In SPIE Proceedings Vol. 727, paper 727-10.

\bibitem{ROD89} Rodney A. Brooks.
\newblock A Robot That Walks; Emergent Behaviors from a Carefully Evolved Network.
\newblock {\em Neural Computation}, vol. 1, no. 2, Summer 1989, pp. 253--262.

\bibitem{CET84} Charles E. Thorpe.
\newblock {\em FIDO: Vision and Navigation for a Mobile Robot}.
\newblock Carnegie Mellon Computer Science report CMU-CS-84-168, Autumn 1984.

\bibitem{LHM89} Larry H. Matthies.
\newblock {\em Dynamic Stereo Vision}.
\newblock Carnegie Mellon University Computer Science report CMU-CS-89-195, October 1989.

\bibitem{HPM88} Hans P. Moravec.
\newblock Certainty Grids for Sensor Fusion in Mobile Robots.
\newblock {\em AI Magazine}, Summer 1988, pp. 61--77.

\bibitem{AE89} Alberto Elfes.
\newblock Using Occupancy Grids for Mobile Robot Perception and Navigation.
\newblock {\em IEEE Computer Magazine}, special issue on Autonomous Intelligent Machines, June 1989, pp. 46--58.

\bibitem{HPM88mc} Hans Moravec.
\newblock {\em Mind Children: The Future of Robot and Human Intelligence}.
\newblock Cambridge: Harvard University Press, 1988.

\end{thebibliography}

\end{document}