BAA #99-09 Proposal Abstract
12/1/98 - Carnegie Mellon - Hans Moravec


1. DARPA ITO Solicitation: BAA #99-09 Mobile Autonomous Robot Software

2. Technical Topic Area: Soft Computing

3. Proposal Title: Robust Navigation by Probabilistic Volumetric Sensing

4. Technical Contact: Hans Moravec
tel: 412-268-3829 or 412-363-9759
fax: 412-363-9834
postal: Robotics Institute
5000 Forbes Avenue
Carnegie Mellon University
Pittsburgh, PA 15213

5. Administrative Contact: Sandra Rocco
tel: 412-268-3744
fax: 412-268-5571
postal: Robotics Institute
5000 Forbes Avenue
Carnegie Mellon University
Pittsburgh, PA 15213

6. Summary of costs: Total: $970K
Year 1: $311K (of this, $155K is options)
Year 2: $323K (of this, $161K is options)
Year 3: $336K (of this, $168K is options)

7. Type of Business: Other Educational

Innovative claims for the proposed research

We will demonstrate 3D sensing, mapping and navigation software capable and reliable enough for practical free-ranging mobile robots in first-tier indoor applications like warehouse transport, floor maintenance, security and similar tasks. The techniques are useful for other applications, but we target mass commercial markets, a self-sustaining development that we feel would greatly accelerate mobile robot progress. [ref 5] No large markets yet exist, but experience with recent small-market mobile robot products (obtained in work with Denning Mobile Robots, Cybermotion and other companies [ref 4]) suggests their requirements. Among them are robots that can be easily assigned new routes or work areas by ordinary workers, but which can also operate at least six months without errors. Existing products require specialized site-specific preparation, violating the first requirement. Experimental autonomous robots using 2D mapping fall far short of the second requirement, but the 3D maps described here, with 1,000 times as much data, should sufficiently reduce the error-causing ambiguities.

This is the finishing sprint in a long journey by the PI. Milestones include:
1) A 1970s Stanford thesis, which stereoscopically tracked dozens of localized 3D visual features to servo a robot called the Cart through obstacle courses, the first time this had been done in arbitrary environments. It used 10 min/meter on a 1 MIPS computer, and made navigation errors about every 100 meters. [ref 1]
2) 1980s invention of grids accumulating evidence of occupancy. This noise-tolerant representation allowed kilometers of travel between errors and reduced computational cost when used in 2D. A majority of the dozens of indoor robots cruising research hallways today use 2D grids, needing about 50 MIPS. [ref 2]
3) Very fast 3D grid code written in a 1992 sabbatical at Thinking Machines Corp. 3D grids are 1,000 times as big as 2D, due to the added dimension, and because they invite higher resolution. Use of TMC’s CM5 supercomputers was planned, but an intense series of representational and coding innovations produced a program 100 times faster than anticipated, usable on 30 MIPS workstations. It remains the only practical 3D grid implementation. [ref 3]
4) Stereoscopic front end for the 3D grid program, written in a 1996 sabbatical at Daimler-Benz research. The very encouraging first results, obtained at 100 MIPS, are good enough and fast enough to promise viable autonomous navigation and other functions given 1,000 MIPS. This first implementation suggests major optimizations, improvements and extensions. [ref 3]

Sufficient computing power is imminent, but a multi-year sustained, single-minded effort is needed to expand the core code to practical applicability. It needs 100% technical effort by the PI and a student, with occasional engineering support. Additional support may come from industrial sources who have expressed interest.

Technical Plan

Very encouraging results were obtained in 1997 from a new program that derives a dense three-dimensional evidence grid representation of a robot's surroundings from wide-angle stereoscopic images [ref 3]. The program adds several spatial rays of evidence to a grid for each of about 2,500 local image features chosen per stereo pair. It was used to construct a 256x256x64 grid, representing 6 by 6 by 2 meters, from a hand-collected test set of twenty stereo image pairs of an office scene. Fifty-nine stereo pairs of an 8 by 8 meter laboratory were also processed. The positive (probably occupied) cells of the grids, viewed in perspective, resemble dollhouse scenes [see view below]. Details as small as the curvature of chair armrests are discernible. The processing time, on a 100 MIPS Sparc 20, is less than five seconds per stereo pair, and total memory is under 16 megabytes. The results seem abundantly adequate for very reliable navigation of freely roaming mobile robots, and plausibly adequate for shape identification of objects bigger than 10 centimeters. The program is a first proof of concept, and awaits optimizations, enhancements, variations, extensions and applications.
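The core grid update can be sketched as follows. This is an illustrative reconstruction, not the actual program: the evidence weights, step size and grid dimensions are placeholders, and the real system derives its evidence pattern from a learned sensor model for each stereo ray.

```python
import numpy as np

# Illustrative log-odds evidence values (placeholders, not the learned model):
# cells a ray passes through are probably empty, the cell at the measured
# range is probably occupied. Log-odds form makes the update purely additive.
EMPTY_EVIDENCE = -0.4
OCCUPIED_EVIDENCE = 2.0

def add_ray(grid, origin, direction, measured_range, step=1.0):
    """Accumulate one ray of evidence into a 3D log-odds evidence grid."""
    direction = direction / np.linalg.norm(direction)
    r = 0.0
    while r < measured_range:
        # Nearest cell along the ray; a real tracer would visit each cell once.
        cell = tuple(np.round(origin + r * direction).astype(int))
        if all(0 <= c < s for c, s in zip(cell, grid.shape)):
            grid[cell] += EMPTY_EVIDENCE    # space before the hit: likely empty
        r += step
    hit = tuple(np.round(origin + measured_range * direction).astype(int))
    if all(0 <= c < s for c, s in zip(hit, grid.shape)):
        grid[hit] += OCCUPIED_EVIDENCE      # the hit cell: likely occupied

grid = np.zeros((64, 64, 16))               # toy grid, not the 256x256x64 above
add_ray(grid, np.array([0.0, 0.0, 8.0]), np.array([1.0, 0.0, 0.0]), 20.0)
```

With a few thousand such rays per stereo pair, independent noisy measurements reinforce each other, which is the source of the representation's noise tolerance.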

The program has since been used by my sabbatical hosts at Daimler-Benz in Berlin to drive a robot around office corridors. It is available on the web, and has been implemented by several other groups. We ourselves have made little progress since 1997 because of unrelated demanding commitments. Growing computer power now strongly compels us to bring the work to a practical conclusion in a sustained, single-minded effort. We need full-time support for this, and this proposal is an attempt to secure it. As a contingency, we have included a 50% support option, which would require us to find supplementary support elsewhere. Adequate commercial funding is nearly impossible to find, because robotics has historically been an unprofitable business.

Our first work will be to implement a lengthy list of intuitively promising improvements to the existing system, which was a first-cut experiment. Among them:

• Adjusting the stereo ray and many other parameters with an automatic learning process to optimize system performance. For a learning criterion, we will “project” some scene images onto range-sorted occupied cells of a grid to naturally colorize them, then compare other scene images with corresponding views of the colorized grid. The better the match, the better the grid represents reality. By this means we avoid the onerous task of supplying independent “ground truth” to the learning process: the camera images already encode such information, even when the robot is in the field.
• Augmenting our dead reckoning by statistically registering grids constructed with partial data. This has worked well in 2D, providing sub-cell registration. It should be even better in 3D, with many more cells. Computational cost is the main concern: we plan a coarse-to-fine strategy, and a comparison concentrating on the typically 2% of cells that are occupied, to keep this tolerable. We also plan to add a six-axis inertial navigation device to provide close initial estimates.
• Replacing monochrome binocular with color trinocular stereoscopy. Doing so will give an error-reducing redundancy to our range measurements, and allow us to detect purely horizontal features. These are lost to matching ambiguity now. We also plan to illuminate the surroundings with textured infrared light, creating matchable features on uniformly colored surfaces, and to add several camera triplets to widen our field of view, probably to 360°.
• Using a pyramid of image resolutions in the stereoscopic disparity search, allowing it to proceed in equal range steps. Now we search in pixel image displacements, and consequently expend a disproportionate amount of effort, and generate disproportionately many errors, at short ranges.
• Higher resolution grids, which we know improve performance. With careful algorithm choice, the main cost of modest resolution increases is in required memory.
• Adding data from other sensors. Most of our robots carry sonar and contact transducers, and the grid representation allows evidence from all sources to be combined.
• Many more, some already conceived, others to be suggested by further experience.
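The grid-registration idea above, with its comparison restricted to occupied cells, can be sketched in a brute-force form. This is hypothetical code, not the planned implementation: a real version would add the coarse-to-fine search, rotation axes, and the inertial initial estimate.

```python
import itertools
import numpy as np

def register_translation(ref, cur, max_shift=2):
    """Find the integer (dx, dy, dz) that best aligns grid `cur` to grid `ref`,
    scoring only at cur's occupied cells (typically ~2% of the grid)."""
    occ = np.argwhere(cur > 0)                      # occupied cells only
    shape = np.array(ref.shape)
    best, best_score = (0, 0, 0), -np.inf
    for shift in itertools.product(range(-max_shift, max_shift + 1), repeat=3):
        moved = occ + np.array(shift)
        inside = np.all((moved >= 0) & (moved < shape), axis=1)
        score = ref[tuple(moved[inside].T)].sum()   # ref's evidence at shifted cells
        if score > best_score:
            best, best_score = shift, score
    return best

# Toy demo: cur is ref displaced one cell in x, as by a dead-reckoning error.
ref = np.zeros((10, 10, 10))
ref[4, 4, 4] = ref[5, 4, 4] = ref[4, 5, 4] = 1.0
cur = np.roll(ref, 1, axis=0)
```

Here `register_translation(ref, cur)` recovers the (-1, 0, 0) correction. Interpolating the score peak over neighboring shifts would give the sub-cell estimate.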

Once we are satisfied with the core techniques, we will build a second layer of functionality. In this group will be procedures to:

• Plan (and execute) obstacle-avoiding paths in grid spaces. Grids are very good for this purpose, because they provide positive indications of empty regions. We have already written an A* path planner, which works, but is too slow. A coarse-to-fine strategy or other approximations will probably help.
• Store away and retrieve 3D map snapshots of the surroundings in a compressed form, and to volumetrically register a current map against a stored map. This technique can be orchestrated into a route-learning program, in which a trail of snapshots is stored on a training run, against which to position the robot when it travels autonomously. Differencing of maps registered by large-scale features could be used to locate smaller objects that have moved.
• Match smaller scale volumes, for identifying furniture-scale objects.
• Extract geometric properties, for instance to identify walls and doors.
• Do other things as needed by emerging applications.
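The grid path planner in the first item can be illustrated with a standard A* search on a 2D slice of the grid. This is a hypothetical sketch, not our existing planner, which operates on the full evidence grid and is the piece whose speed we intend to improve.

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* on a 2D occupancy grid (1 = occupied), 4-connected moves,
    Manhattan-distance heuristic. Returns the cell path, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()                       # breaks ties in the heap
    frontier = [(h(start), 0, next(tie), start, None)]
    parent, best_g = {}, {start: 0}
    while frontier:
        _, g, _, node, prev = heapq.heappop(frontier)
        if node in parent:
            continue                              # already expanded
        parent[node] = prev
        if node == goal:                          # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier,
                                   (ng + h((nr, nc)), ng, next(tie), (nr, nc), node))
    return None

occupancy = [[0, 0, 0],
             [1, 1, 0],
             [0, 0, 0]]
route = astar(occupancy, (0, 0), (2, 0))          # routes around the wall
```

A coarse-to-fine variant would run the same search on successively finer resolutions of the grid, using each coarse path to restrict the next search.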

Several demonstration applications will then be implemented in a third layer of code. The idea is to use the powerful lower layers to create task-specific but location-independent programs. For instance, a point-to-point delivery program will be constructed to learn any route by saving a trail of map snapshots in a training run. A floor cleaning program will identify the boundaries and obstacles in any environment, and plan a covering route. A security program may be built to explore a network and plan a randomized patrol route that leaves no area unvisited for very long. The success of these programs depends on the reliability of the 3D maps produced by the core layer, and of the identifications extracted from them.

Our computing platforms will be updated on an approximately annual basis. We are presently using a dual-processor Pentium II Dell PC that provides about 700 MIPS. By the end of the three-year period we expect to have about 2,000 MIPS available. For mobile robot platforms, we are presently using the Uranus mobile robot, which we constructed ten years ago, and also retain the option of using our even older machine, Neptune. Uranus is aging, slow-moving and occasionally unreliable. Neptune is simpler and more reliable, but tethered and thus usable only for short-range experiments. John Holland of Cybermotion, Inc. has expressed interest in collaborating with our project. One benefit of such collaboration would be the provision by Cybermotion of a very capable and sturdy “Navmaster” robot platform, worth about $100K. About 100 of these platforms are patrolling as autonomous/remote security guards, using site-specific programming. Karcher, Inc., a large German maker of industrial cleaning machines, has also expressed interest in collaboration. They currently make a radar-guided robotic floor scrubber, and could provide similar machines for experiments.

Test Robots: The Uranus mobile robot, left, with trinocular cameras and sonar ring. The Neptune robot is at the right.


1: Moravec, H., The Stanford Cart and The CMU Rover, Proceedings of the IEEE, July 1983.

2: Martin, M. and Moravec, H., Robot Evidence Grids, CMU Robotics Institute Technical Report CMU-RI-TR-96-06, March 1996.

3: Moravec, H., Robot Spatial Perception by Stereoscopic Vision and 3D Evidence Grids, CMU Robotics Institute Technical Report CMU-RI-TR-96-34, September 1996.

4: Moravec, H., Autonomous Free-Ranging Utility Robots for Mass Market before 2005, January 1998.

5: Moravec, H., Curriculum Vitae.