Robust Navigation
by
Probabilistic Volumetric Sensing

Carnegie Mellon University
The Robotics Institute
5000 Forbes Avenue
Pittsburgh, PA 15213

Time Period: May 1, 1999 to April 30, 2002


PROPOSAL

1. DARPA ITO Solicitation: BAA #99-09 Mobile Autonomous Robot Software

2. Technical Topic Area: Soft Computing

3. Proposal Title: Robust Navigation by Probabilistic Volumetric Sensing

4. Technical Contact: Hans Moravec
email: hpm@cmu.edu
tel: 412-268-3829 or 412-363-9759
fax: 412-363-9834
postal: Robotics Institute
5000 Forbes Avenue
Carnegie Mellon University
Pittsburgh, PA 15213


A. Innovative claims for the proposed research (1)

Our goal is laboratory prototype sensor-based software for utility mobile robots (industrial transport, floor maintenance, security, etc.) that matches the months-between-error reliability of existing industrial robots without requiring their expensive worksite preparation or site-specific programming. Our machines will navigate employing a dense 3D awareness of their surroundings, be tolerant of route surprises, and be easily assigned by ordinary workers to entirely new routes or work areas. The long-elusive combination of easy installation and reliability should greatly expand cost-effective niches for mobile robots, and make possible a growing market that can itself sustain further development.

We will employ 3D grids of spatial occupancy evidence, a technique we have been developing for two decades [A2]. 2D versions of the approach found favor in many successful research mobile robots [A0], but seem to fall short of commercial reliability. 3D, with 1,000 times as much world data, was computationally infeasible until 1992, when we combined increased computer power with a 100x speedup from representational, organizational and coding innovations. In 1996 we wrote a preliminary stereoscopic front end for our fast 3D code, and the gratifying results convinced us of the practicability of the approach, given about 1,000 MIPS of computer power [A3]. We seek support for a sustained, single-minded effort to parlay that start into a universally convincing demonstration, just as the requisite computing power arrives.

The work has three natural stages: completion and improvement of the basic perception code; creation of an identification layer for navigation and recognition of architectural features; finally, sample application programs that orchestrate the other abilities into practical behaviors like patrol, delivery and cleaning. We need both capability and efficiency. The existing code allows one-second time resolution with 1,000 MIPS, but our 3D maps have millions of cells, and straightforward implementations of path planning, localization, object matching, etc. would be much slower. We will combine techniques like coarse-to-fine matching, selective sampling and alternate representations to get the desired results at sufficient speed.

Ideas awaiting evaluation include variable-resolution trinocular stereoscopic matching; optimization of sensor evidence profiles and other system parameters by a learning loop that uses the robot's own scene images first to naturally colorize the occupied cells of a grid, then to check the grid's faithfulness; randomly textured light to improve stereo of uniform surfaces; coarse-to-fine path finding; occupied-cell-list matching of grids for fast localization and feature identification; and others. Ongoing work will lead us to more.


B. Technical Rationale, Approach and Plan (18)

This work is to be the finishing sprint in a long research journey. Milestones on that journey include:

1) A 1970s Stanford thesis, which stereoscopically tracked dozens of localized 3D visual features to servo a robot called the Cart through obstacle courses, the first time this had been done in arbitrary environments. It used 10 min/meter on a 1 MIPS computer, and made navigation errors about every 100 meters. [A1]

2) 1980s invention of grids accumulating evidence of occupancy. This noise-tolerant representation allowed kilometers of travel between errors and reduced computational cost when used in 2D. A majority of the dozens of indoor robots cruising research hallways today use 2D grids, needing about 50 MIPS. [A2]

3) Very fast 3D grid code written in a 1992 sabbatical at Thinking Machines Corp. 3D grids are 1,000 times as big as 2D, due to the added dimension, and because they invite higher resolution. Use of TMC’s CM5 supercomputers was planned, but an intense series of representational and coding innovations produced a program 100 times faster than anticipated, usable on 30 MIPS workstations. It remains the only practical 3D grid implementation. [A3]

4) Stereoscopic front end for the 3D grid program, written in a 1996 sabbatical at Daimler-Benz research. The very encouraging first results, obtained at 100 MIPS, are good enough and fast enough to promise viable autonomous navigation and other functions given 1,000 MIPS. This first implementation suggests major optimizations, improvements and extensions. [A3]

During the 1996 sabbatical work we wrote a new program that derives a dense three-dimensional evidence grid representation of a robot's surroundings from wide-angle stereoscopic images [A3]. We chose to work with stereo because, of all approaches that provide dense enough data to make good 3D maps, it promised to be the alternative most likely to be inexpensive enough for mass-market robots in the near future. The program adds several spatial rays of evidence to a grid for each of about 2,500 local image features chosen per stereo pair. It was used to construct a 256x256x64 grid, representing 6 by 6 by 2 meters, from a hand-collected test set of twenty stereo image pairs of an office scene. Fifty-nine stereo pairs of an 8 by 8 meter laboratory were also processed. The positive (probably occupied) cells of the grids, viewed in perspective, resemble dollhouse scenes [see view below]. Details as small as the curvature of chair armrests are discernible. The processing time, on a 100 MIPS Sparc 20, is less than five seconds per stereo pair, and total memory is under 16 megabytes. The results strike us as abundantly adequate for very reliable navigation of freely roaming mobile robots, and plausibly adequate for shape identification of objects bigger than 10 centimeters. The program is a first proof of concept, and awaits optimizations, enhancements, variations, extensions and applications.
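
To make the evidence-accumulation step concrete, the following minimal C++ sketch adds a single stereo-derived range measurement to a 3D grid as a ray of evidence: cells along the line of sight receive a small "empty" decrement, and cells at the measured surface a larger "occupied" increment. It is an illustration only; the grid dimensions, cell size, evidence weights, half-cell stepping and function names are assumptions of the sketch, not the tuned evidence profiles of the actual program, which spreads several rays per image feature.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // Illustrative grid: 256 x 256 x 64 cells of about 2.3 cm, roughly the office grid of [A3].
    const int NX = 256, NY = 256, NZ = 64;
    const double CELL = 0.023;                            // meters per cell
    std::vector<float> grid(NX * NY * NZ, 0.0f);          // accumulated evidence, 0 = unknown

    float* cellAt(double x, double y, double z) {
        int i = int(x / CELL), j = int(y / CELL), k = int(z / CELL);
        if (i < 0 || i >= NX || j < 0 || j >= NY || k < 0 || k >= NZ) return nullptr;
        return &grid[(k * NY + j) * NX + i];
    }

    // One range measurement: cells between the camera and the measured surface receive
    // "empty" evidence, cells at the surface receive "occupied" evidence (weights assumed).
    void addEvidenceRay(Vec3 camera, Vec3 hit) {
        const float EMPTY = -0.2f, OCCUPIED = 1.5f;
        double dx = hit.x - camera.x, dy = hit.y - camera.y, dz = hit.z - camera.z;
        int steps = int(std::sqrt(dx*dx + dy*dy + dz*dz) / (0.5 * CELL)) + 1;  // half-cell steps
        for (int s = 0; s <= steps; ++s) {
            double t = double(s) / steps;
            float* c = cellAt(camera.x + t * dx, camera.y + t * dy, camera.z + t * dz);
            if (c) *c += (s >= steps - 1) ? OCCUPIED : EMPTY;   // last two steps: occupied
        }
    }

    int main() {
        addEvidenceRay({2.9, 0.1, 0.7}, {2.9, 3.0, 0.7});       // one synthetic measurement
        std::printf("evidence at measured surface: %.2f\n", *cellAt(2.9, 3.0, 0.7));
    }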



The program has since been used by my sabbatical hosts at Daimler-Benz in Berlin to drive a robot around office corridors. It is available on the web, and has been used by several other groups. We ourselves used it to drive a robot short distances, but have made little additional progress because of unrelated demanding commitments. Growing computer power strongly compels us now to bring the work to a practical conclusion in a sustained, concentrated, single-minded effort. We need full-time support for this. We have not found significant funding commercially. Robotics has been a historically unprofitable business, and small companies have no money for long-term efforts, while large companies have other interests. Daimler-Benz, for instance, has only a small internal robotics effort that competes with other research projects. My hosts were able to offer us only a small gift for our work, which we used for modest equipment purchases.

Year 1 Agenda: Perception Layer

Our first year’s work will be to implement a lengthy list of intuitively promising improvements to the existing core perception system [A3], which was a first-cut experiment. Among them:

• Adjusting the stereo ray and many other parameters with an automatic learning process to optimize system performance. For a learning criterion, we will “project” some scene images onto range-sorted occupied cells of a grid to naturally colorize them, then compare other scene images with corresponding views of the colorized grid. The better the match, the better the grid represents reality. By this means we avoid the onerous task of supplying independent “ground truth” to the learning process. The camera images already encode such information, even when the robot is in the field. Our first 3D results, illustrated in the preceding image, used ad-hoc settings of all parameters. Learning should provide substantial improvement, evidenced by the following 2D grid example with data from a ring of sonar sensors on a robot traversing a highly specular hallway, where a majority of readings were misleadingly long. The upper left image is a hand-constructed ideal map of the hallway. The upper right is the hallway reconstructed using a sensor model that works well in non-specular settings, but fails here. Lower left is a reconstruction using a model partially trained for these conditions (note prominent door frames, and their specular reflections), and lower right is by a fully trained model [A2]. The learning improvement in our 3D setups will likely be less dramatic, since the initial model is not so bad.



• Augmenting our dead-reckoning by statistically registering grids constructed with partial data. This has worked well in 2D, providing sub-voxel registration (see references in [A3]). It should be even better in 3D, with many more cells in play. Computational cost is the main concern: we plan to use a coarse-to-fine strategy, and a comparison concentrating on the typically mere 2% of the cells that are occupied, to keep this tolerable. In the latter technique, a list of the occupied cells of one grid is mapped under a trial displacement and rotation into the cells of a second grid, and also vice-versa using the inverse translation and rotation, to calculate a goodness of match, which is then optimized (a minimal scoring sketch follows this list). We also plan to add a six-axis inertial navigation device to provide close initial estimates.

• Replacing monochrome binocular with color trinocular stereoscopy. Doing so will give an error-reducing redundancy to our range measurements, and allow us to detect purely horizontal features. Horizontal edges were lost to matching ambiguity in the original setup, which had two horizontally-displaced cameras. We also plan to illuminate the surroundings with textured infrared light, creating matchable features on uniformly colored surfaces, and to add several camera triplets to widen our field of view, probably to 360°.

• Using a pyramid of image resolutions in the stereoscopic disparity search, allowing it to proceed in equal range steps (see the range-step sketch following this list). Traditionally, stereo searches have been in constant-resolution pixel steps, but by stepping through the near field in excessive detail the traditional approach expends an excessive amount of effort and generates unnecessary false-positive matches at close range, where they cause the most trouble. The latter is especially a problem when an occlusion hides the correct match. Variable-resolution matching will be both faster and better, because unavoidable matching errors will be spread uniformly over distance, rather than being concentrated in the robot’s immediate obstacle zone.

• Higher resolution grids, which we have observed to improve performance. With careful algorithm choice, the main cost of modest resolution increases is in required memory.

• Adding data from other sensors. Most of our robots carry sonar and contact transducers, and the grid representation allows evidence from all sources to be combined.

• Many subtle improvements in matching functions, search strategies, data representations, etc. For instance, we have recently discovered that cross entropy has better matching properties than the similar log-probability formula we had been using.
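
The scoring sketch referenced above follows. This minimal C++ fragment scores a trial displacement and rotation by mapping one grid's occupied-cell list into the other grid and summing the evidence found there. A full implementation would use the symmetric (both-direction) comparison, a principled cross-entropy score, a coarse-to-fine pose search, and six degrees of freedom in 3D; everything here, from the planar pose to the cell size and evidence values, is an assumption of the illustration.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Pose  { double x, y, theta; };   // planar trial displacement and rotation (for brevity)
    struct Point { double x, y; };          // center of an occupied cell from grid A

    using Grid2D = std::vector<std::vector<float>>;   // evidence slice of grid B

    // Map A's occupied-cell list into B under the trial pose and sum B's evidence at the
    // mapped locations: high where B also believes "occupied".  The cross-entropy variant
    // mentioned above additionally maps B's occupied cells back into A with the inverse
    // transform and combines the two sums.
    double matchScore(const std::vector<Point>& occupiedA, const Grid2D& B,
                      double cellSize, const Pose& p) {
        double c = std::cos(p.theta), s = std::sin(p.theta), score = 0.0;
        for (const Point& a : occupiedA) {
            double bx = c * a.x - s * a.y + p.x;
            double by = s * a.x + c * a.y + p.y;
            int i = int(bx / cellSize), j = int(by / cellSize);
            if (j >= 0 && j < int(B.size()) && i >= 0 && i < int(B[j].size()))
                score += B[j][i];
        }
        return score;
    }

    int main() {
        const double CELL = 0.02;                               // 2 cm cells
        Grid2D B(64, std::vector<float>(64, -0.5f));            // mostly "empty" evidence
        for (int i = 0; i < 64; ++i) B[32][i] = 2.0f;           // a wall of occupied cells
        std::vector<Point> A;                                   // the same wall, seen 4 cm lower
        for (int i = 4; i < 60; ++i) A.push_back({i * CELL, 0.61});
        Pose unshifted{0, 0, 0}, shifted{0, 0.04, 0};
        std::printf("score with no correction:  %.1f\n", matchScore(A, B, CELL, unshifted));
        std::printf("score with +4 cm y offset: %.1f\n", matchScore(A, B, CELL, shifted));
    }

In use, such a score would be maximized over trial poses seeded by odometry and the inertial unit, with the coarse-to-fine pose search keeping the cost tolerable.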

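The range-step sketch referenced above: for a rectified horizontal pair with baseline b and focal length f (in pixels), disparity is d = f*b/z, so an equal range step dz corresponds to a disparity step dd = f*b*dz/z^2, coarse nearby and fine far away. The short program below, with assumed camera numbers, shows how each range step then maps naturally onto the pyramid level whose pixel size matches that disparity step.

    #include <cstdio>

    int main() {
        const double f  = 400.0;     // assumed focal length, pixels, full resolution
        const double b  = 0.10;      // assumed stereo baseline, meters
        const double dz = 0.25;      // equal range step, meters
        for (double z = 0.5; z <= 5.01; z += dz) {
            double d  = f * b / z;                 // disparity at range z, full-resolution pixels
            double dd = f * b * dz / (z * z);      // disparity change across one range step
            int level = 0;                         // pyramid level 0 = full resolution,
            while (dd / (1 << level) > 1.0 && level < 5) ++level;   // each level halves it
            std::printf("z = %4.2f m   d = %6.1f px   dd = %5.2f px   search at level %d\n",
                        z, d, dd, level);
        }
        return 0;
    }
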
Year 2 Agenda: Recognition Layer

In the second year we will build a layer of recognition software that uses the data collected in the grid to navigate and locate important world features. In this group will be procedures to:

• Plan and execute obstacle-avoiding paths in grid spaces. Grids are very good for this purpose, because they provide positive indications of empty regions. We have already written an A* path planner, which works, but is several times too slow. The following image is an example of a path from this program operating in a robot-height slab of the office grid shown in the first illustration. A coarse-to-fine strategy or other approximations should allow us to reach the necessary speed (a minimal coarse planner is sketched after this list).



• Store away and retrieve 3D map snapshots of the surroundings in a compressed form, and volumetrically register a current map against a stored map. This technique can be orchestrated into a route-learning program, in which a trail of snapshots is stored on a training run, against which to position the robot when it travels autonomously. Differencing of maps registered by large-scale features could be used to locate smaller objects that have moved. Computational efficiency is a major concern.

• Extract geometric architectural features, for instance to identify open floor area, walls, doors, corridors and rooms.

• Match smaller scale volumes, for identifying furniture-scale objects, including people, using techniques probably similar to map matching but optimized for smaller volumes and articulated parts.

• Build long-range topological maps by piecing together local geometric grids with approximate transforms (see the sketch following this list).

• Generate paths in topological maps that minimize travel, sweep the ground, or provide best vantage points, as needed by anticipated applications.
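
The coarse planner sketch referenced above: this C++ fragment reduces a robot-height evidence slice to a coarse grid in which a cell is blocked if any of its fine cells carries positive occupancy evidence, then runs a standard A* search over the coarse grid. A fine search confined to the corridor around the resulting coarse path (not shown) would recover the detailed trajectory. The synthetic scene, reduction factor and thresholds are assumptions of the illustration.

    #include <cmath>
    #include <cstdio>
    #include <queue>
    #include <vector>

    struct Node { int x, y; double f; };
    struct Cmp  { bool operator()(const Node& a, const Node& b) const { return a.f > b.f; } };

    // A coarse cell is blocked if ANY fine cell inside it has positive occupancy evidence.
    std::vector<std::vector<int>> reduce(const std::vector<std::vector<float>>& fine, int R) {
        int H = int(fine.size()) / R, W = int(fine[0].size()) / R;
        std::vector<std::vector<int>> coarse(H, std::vector<int>(W, 0));
        for (int y = 0; y < H * R; ++y)
            for (int x = 0; x < W * R; ++x)
                if (fine[y][x] > 0.0f) coarse[y / R][x / R] = 1;
        return coarse;
    }

    // Standard A* over the reduced grid; returns the path cost in coarse cells, or -1.
    double astar(const std::vector<std::vector<int>>& g, int sx, int sy, int tx, int ty) {
        int H = int(g.size()), W = int(g[0].size());
        std::vector<std::vector<double>> cost(H, std::vector<double>(W, 1e18));
        std::priority_queue<Node, std::vector<Node>, Cmp> open;
        auto h = [&](int x, int y) { return std::hypot(x - tx, y - ty); };
        cost[sy][sx] = 0.0;
        open.push({sx, sy, h(sx, sy)});
        while (!open.empty()) {
            Node n = open.top(); open.pop();
            if (n.x == tx && n.y == ty) return cost[ty][tx];
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (!dx && !dy) continue;
                    int nx = n.x + dx, ny = n.y + dy;
                    if (nx < 0 || nx >= W || ny < 0 || ny >= H || g[ny][nx]) continue;
                    double c = cost[n.y][n.x] + std::hypot(dx, dy);
                    if (c < cost[ny][nx]) { cost[ny][nx] = c; open.push({nx, ny, c + h(nx, ny)}); }
                }
        }
        return -1.0;   // unreachable
    }

    int main() {
        // Synthetic 6 m x 6 m slice at ~2.3 cm cells (256 x 256), with one wall and a gap.
        std::vector<std::vector<float>> slice(256, std::vector<float>(256, -1.0f));
        for (int x = 0; x < 256; ++x) if (x < 100 || x > 120) slice[128][x] = 3.0f;
        auto coarse = reduce(slice, 8);                          // 32 x 32 coarse grid
        std::printf("coarse path cost: %.1f cells\n", astar(coarse, 2, 2, 30, 30));
        return 0;
    }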

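One possible shape for the long-range topological layer mentioned above is sketched here: each node holds a compressed local map, stored as its occupied-cell list since typically only a few percent of cells are occupied, and each edge holds the approximate rigid transform to a neighboring local frame; global routes are then found by ordinary graph search, leaving metric detail to the grid planner inside each node. The data layout and the breadth-first route finder are illustrative assumptions, not settled design.

    #include <cstdint>
    #include <cstdio>
    #include <queue>
    #include <utility>
    #include <vector>

    struct OccCell   { int16_t i, j, k; };            // 3D cell indices, 2 bytes each
    struct Transform { double x, y, z, yaw; };        // approximate inter-node transform

    struct TopoNode {
        std::vector<OccCell> occupied;                 // compressed local map snapshot
        std::vector<std::pair<int, Transform>> edges;  // (neighbor id, transform to it)
    };

    // Breadth-first route through the topological graph; within each node the metric
    // grid planner handles the detailed path.
    std::vector<int> topoRoute(const std::vector<TopoNode>& g, int from, int to) {
        std::vector<int> prev(g.size(), -1);
        std::queue<int> q;
        q.push(from); prev[from] = from;
        while (!q.empty()) {
            int n = q.front(); q.pop();
            if (n == to) break;
            for (const auto& e : g[n].edges)
                if (prev[e.first] == -1) { prev[e.first] = n; q.push(e.first); }
        }
        if (prev[to] == -1) return {};                 // no connection known
        std::vector<int> route{to};
        while (route.back() != from) route.push_back(prev[route.back()]);
        return std::vector<int>(route.rbegin(), route.rend());
    }

    int main() {
        std::vector<TopoNode> g(4);                    // a corridor of four local maps
        g[0].edges = {{1, {2, 0, 0, 0}}};
        g[1].edges = {{0, {-2, 0, 0, 0}}, {2, {2, 0, 0, 0}}};
        g[2].edges = {{1, {-2, 0, 0, 0}}, {3, {0, 2, 0, 1.57}}};
        g[3].edges = {{2, {0, -2, 0, -1.57}}};
        for (int n : topoRoute(g, 0, 3)) std::printf("%d ", n);   // expected: 0 1 2 3
        std::printf("\n");
    }
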
Year 3 Agenda: Application Layer (Extension Option)

The first two years of the project are naturally capped by a third during which we would tune up and orchestrate basic perception and identification abilities into complete useful applications. Since the BAA solicited a 24 month effort, we propose this third year as an optional extension.

In the third year we would construct several demonstration applications in a third layer of code. The idea is to use the powerful lower layers to create task-specific but location-independent programs. For instance, a point-to-point delivery program will be constructed to learn any route by saving a trail of map snapshots in a training run. A floor cleaning program will identify the boundaries and obstacles in any environment, and plan a covering route (a trivial covering sweep is sketched below). A security program may be built to explore a network of corridors and rooms and plan a randomized route that leaves no area unvisited for very long. These programs may be quite straightforward themselves, but their success depends on the reliability of the 3D maps produced by the perception layer, and the identifications extracted by the identification layer. It is certain that extended application trials will uncover subtle weaknesses in the lower layers, and we expect to spend much time in the third year dealing with such foundational issues.
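
As an indication of how simple the application layer can be once the lower layers deliver reliable maps, a covering route over the free cells of a floor slice can be as plain as the boustrophedon sweep below; connecting the sweep segments around obstacles would fall to the year-2 path planner. This is a hypothetical illustration, not the planned cleaning algorithm.

    #include <cstdio>
    #include <utility>
    #include <vector>

    // Sweep the free cells of a 2D floor slice row by row, alternating direction, as a
    // floor-cleaning pattern.  Obstacle cells are simply skipped here; the detailed path
    // between disconnected sweep segments would come from the grid path planner.
    std::vector<std::pair<int,int>> coveragePath(const std::vector<std::vector<int>>& occupied) {
        std::vector<std::pair<int,int>> path;
        for (int y = 0; y < int(occupied.size()); ++y) {
            int W = int(occupied[y].size());
            for (int s = 0; s < W; ++s) {
                int x = (y % 2 == 0) ? s : W - 1 - s;      // alternate sweep direction
                if (!occupied[y][x]) path.push_back({x, y});
            }
        }
        return path;
    }

    int main() {
        std::vector<std::vector<int>> room(20, std::vector<int>(30, 0));
        for (int y = 5; y < 12; ++y)                        // a table-sized obstacle
            for (int x = 10; x < 15; ++x) room[y][x] = 1;
        std::printf("cells to visit: %zu of %d\n", coveragePath(room).size(), 20 * 30);
    }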

Testbeds

Our robot-controlling computing platforms should be updated on an approximately annual basis, to take advantage of increasing available performance. We presently are using a dual-processor Pentium-II Dell PC that provides about 700 MIPS, running the realtime BeOS operating system from Be Inc. By the third-year period we expect to have about 3,000 MIPS available, and may have changed operating systems more than once. Our robot code is written in C and C++, is quite portable, and has made transitions between systems several times. We plan to have more computing power than we expect in a final commercial product, to allow us to quickly test and perhaps reject new ideas without the onus of immediately optimizing every implementation for speed.

For mobile robot platforms, we are presently using the Uranus mobile robot, which we constructed ten years ago, and also retain the option of using our even older machine, Neptune. Uranus is aging, slow-moving, short on battery stamina, often unreliable, and uses difficult-to-maintain obsolete components. Neptune is simpler and more reliable, but tethered and thus usable only for very short-range experiments. To avoid expending excessive effort on robot maintenance, we think it is necessary for the success of the first years’ work to upgrade to a modern smaller yet more capable and reliable robot platform.


Test Robots: The Uranus mobile robot, left, with trinocular cameras and sonar ring. The Neptune robot is at the right.

In the first two years, the ideal platform would be a robot just big enough to carry the cameras, an inertial system, a computer handling communications and sensor and effector control, and a wireless ethernet link. We could then use an offboard large desktop machine for the heavy computation and display, shipping digital images and other data over the ethernet link. The robot would be small enough to extricate by hand when it gets into trouble. One suitable machine is the Nomadics Scout II, weighing 25 kg and costing about $5,000. Others, slightly less attractive, are the RWI Mach 5 and microATRV costing about $7,000 each:


Nomadics Scout II         RWI Mach 5             RWI microATRV  


Application software in the third year could be most effectively developed and tested using actual industrial vehicles. The cost of commercial machines is over $100,000, beyond the scale of our current project. We would make an effort to gain use of machines from interested parties. To this end, we have initiated some preliminary discussions, but have not yet secured any commitments. Having the proposed research contract in place would benefit such negotiations. One contact is John Holland of Cybermotion, Inc., who has expressed interest in collaborating with our project. Cybermotion would be able to provide a very capable and sturdy “Navmaster” robot platform, worth about $100K. Over 100 of these platforms are patrolling as autonomous/remote security guards, using site-specific programming. We have also had discussions with Karcher, Inc., a large German maker of industrial cleaning machines. Karcher currently makes a radar-guided robotic floor scrubber that is in use at the Bonn airport, and could probably provide similar machines for experiments. They are interested in expanding their robotics efforts. Makers of factory AGVs (Automatically Guided Vehicles) are also candidates for collaboration. In the unlikely event we cannot gain use of any industrial machines, we would use our research robots to simulate utility applications. This would not be as satisfactory, since the endurance and basic capability of the research machines is limited, and many practical questions would remain.

More extensive presentations of the key technical details may be found in reference [A3], included with this submission.


C. Deliverables (2)

We enter the research unencumbered by proprietary claims. All our work to date has been done openly. Though many of our starting ideas have lapsed into the public domain, many others are not yet fully developed, and await first implementation. We feel it would help secure future investment for bringing our results into the world if we were to apply for patents on some of the planned research work. Our department would cover the expense of such applications, in return for half interest.

It is likely that any industrial partners we enlist to provide robots or other assistance will require some interest in the work in return. We would offer such interest in planned research results on a strictly non-exclusive basis. All results from the three year effort would be available to the funding agency. We may, of course, extend the work after the contract ends in a proprietary setting.

Our deliverable results will be:

At the end of year 1 - Perception Package:

A 3D perception package that processes textured-lighting-enhanced trinocular stereoscopic images at the rate of at least one set per second using 1,000 MIPS of processing power. It may also digest data from other sensors, such as odometry, 6-axis inertial, sonar, proximity and contact. It will extract at least several thousand depth values from each stereoscopic image set, and process them into rays of evidence of spatial occupancy that are accumulated in 3D evidence grids with a resolution of about 2 cm, to a range of about 5 meters and a height of about 2 meters, containing 4 million cells or more. The package will also include a camera calibration program and a learning procedure that adjusts evidence ray models and other system parameters to optimize a comparison of scene images with corresponding synthetic images of grid reconstructions. The program will perform substantially better than our 1996 program [A3]. The package will be demonstrated on a small test robot platform that moves to collect views from multiple vantage points, linked by wireless ethernet to a large desk computer with at least 1,000 MIPS, which will display images, 3D and 2D maps and navigational information. Because of image digitization and communication delays and computational limits, it is likely the robot will work in a start/stop mode, collecting images, then pondering them before moving again. The pauses should be on the order of seconds.
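
(As a rough consistency check on these figures, assuming a single grid spanning 5 m by 5 m in area and 2 m in height at 2 cm cells: 250 x 250 x 100 = 6.25 million cells, comfortably above the 4-million-cell lower bound, and at a few bytes of evidence per cell a footprint of roughly 10 to 25 megabytes, comparable to the under-16-megabyte total observed for the smaller 1996 office grid [A3].)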

We will provide the latest version of the perception software and a complete report and documentation for it at the end of the first year. We will also make them available on our web pages, as we have our work in the past. Some aspects of the new work may have patents applied for by that time.

At the end of year 2 - Identification Package:

An additional set of identification routines that extract from the 3D maps produced by the first layer the following: navigable paths, localization by matching maps, major architectural features, a few furniture-scale objects and people, long-range topological maps and paths in such maps. The combination will be demonstrated using the small robot system of year 1, but with the desk computer upgraded or replaced to provide at least 1,500 to 2,000 MIPS, and perhaps with the robot augmented with additional sensors we find helpful. With the identification layer, the robot will be able to navigate around a network of corridors and offices and build a sparse labeled map, and can be commanded to go to specified locations. Since the code will be new and little-tested, these demonstrations are unlikely to be as reliable as required for actual applications. Also, the robot will probably work in a start/stop mode, collecting images, transmitting and pondering them between moves.

We will provide the latest version of the two software layers and a complete report and documentation for them at the end of the second year. We will also make them available on our web pages, as we have our work in the past.

(Optional Third Year)
At the end of year 3 - Application Packages:

We will produce several specialized programs as if for commercial mobile robots that use the perception and identification layers to orchestrate simulated commercial robot tasks, including point-to-point delivery, floor traverse in a cleaning pattern and area patrol as for a security robot. These will be demonstrated using our test robot, with the offboard processor upgraded or replaced to have at least 3,000 MIPS. The application programs will allow the robot to be quickly “trained” to do its job in new locations, either by leading the robot itself along a path or the boundaries of its work area, or by outlining those on a map it constructs and displays. It would be desirable in this third year to conduct more realistic tests of the system using one or more industrial vehicles. Possibly the funding agency would be able to provide such vehicles. In any case, we would attempt to find an industrial partner who could both supply vehicles and perhaps provide first steps toward commercialization.

We will provide the latest version of all three software layers and a complete report and documentation for them at the end of the contract. We will also make them available on our web pages, as we have our work in the past.

D. Statement of Work (3)

In Year 1

We will assemble an experimental robot system consisting of a small robot with an onboard portable computer for communications. The robot will carry at least three digital cameras and a six-axis inertial system linked to the onboard computer. Also linked will be the robot’s control channel, and any sensors that come with the basic robot, for instance sonar units. The machine will also be equipped with a wireless ethernet transceiver. Offboard, connected to the other side of the wireless ethernet, we will set up a high-end personal-computer-class machine dedicated to robot control. The make, options and software for this machine will be determined at time of purchase to obtain at least 1,000 MIPS and sufficient memory and disk, within our budget. We presently are using a dual-processor Dell Workstation 400 that provides about 700 MIPS, running the BeOS 3 operating system, in this mode, with a wired connection to our obsolete Uranus mobile robot.

With the hardware described, we will complete and improve an existing preliminary 3D perception package that processes textured-lighting-enhanced trinocular stereoscopic images at the rate of at least one set per second. It may also digest data from other available sensors, such as odometry, 6-axis inertial, sonar, proximity and contact. It will extract at least several thousand depth values from each stereoscopic image set, and process them into rays of evidence of spatial occupancy that are accumulated in 3D evidence grids with a resolution of about 2 cm, to a range of about 5 meters and a height of about 2 meters, containing 4 million cells or more. The package will also include a camera calibration program and a learning procedure that adjusts evidence ray models and other system parameters to optimize a comparison of scene images with corresponding synthetic images of grid reconstructions. The program will perform substantially better than our 1996 program [A3]. The package will be demonstrated on a small test robot platform that moves to collect views from multiple vantage points, linked by wireless ethernet to a large desk computer with at least 1,000 MIPS, which will display images, 3D and 2D maps and navigational information.

In Year 2

We will write an additional set of identification routines that extract from the 3D maps produced by the first layer the following: navigable paths, localization by matching maps, major architectural features, a few furniture-scale objects and people, long-range topological maps and paths in such maps. The combination will be demonstrated using the small robot system of year 1, but with the desk computer upgraded or replaced to provide at least 1,500 to 2,000 MIPS, and perhaps with the robot augmented with additional sensors we find helpful. With the identification layer, the robot will be able to navigate around a network of corridors and offices and build a sparse labeled map, and can be commanded to go to specified locations. Since the code will be new and little-tested, these demonstrations are unlikely to be as reliable as required for actual applications.

In Year 3 (Optional Year)

We will write at least three specialized programs as if for commercial mobile robots that use the perception and identification layers to orchestrate simulated commercial robot tasks, including point-to-point delivery, floor traverse in a cleaning pattern and area patrol as for a security robot. These will be demonstrated using our test robot, with the offboard processor upgraded or replaced to have at least 3,000 MIPS. The application programs will allow the robot to be quickly “trained” to do its job in new locations, either by leading the robot itself along a path or the boundaries of its work area, or by outlining those on a map it constructs and displays. It would be desirable in this third year to conduct more realistic tests of the system using one or more industrial vehicles. Possibly the funding agency would be able to provide such vehicles. In any case, we would attempt to find an industrial partner who could both supply vehicles and perhaps provide first steps toward commercialization.





E. Schedule of Milestones (1)

At the end of year 1 - Perception Package

A 3D perception package that processes textured-lighting-enhanced trinocular stereoscopic images at the rate of at least one set per second using 1,000 MIPS of processing power. The program will perform substantially better than our 1996 program [A3]. The package will be demonstrated on a small test robot platform that moves to collect views from multiple vantage points, linked by wireless ethernet to a large desk computer with at least 1,000 MIPS, which will display images, 3D and 2D maps and navigational information.

At the end of year 2 - Identification Package

An additional set of identification routines that extract from the 3D maps produced by the first layer the following: navigable paths, localization by matching maps, major architectural features, a few furniture-scale objects and people, long-range topological maps and paths in such maps. The combination will be demonstrated using the small robot system of year 1, but with the desk computer upgraded or replaced to provide at least 1,500 to 2,000 MIPS, and perhaps with the robot augmented with additional sensors we find helpful. With the identification layer, the robot will be able to navigate around a network of corridors and offices and build a sparse labeled map, and can be commanded to go to specified locations. Since the code will be new and little-tested, these demonstrations are unlikely to be as reliable as required for actual applications.

(Optional Third Year)

At the end of year 3 - Application Packages

At least three specialized programs as if for commercial mobile robots that use the perception and identification layers to orchestrate simulated commercial robot tasks, including point-to-point delivery, floor traverse in a cleaning pattern and area patrol as for a security robot. These will be demonstrated using our test robot, with the offboard processor upgraded or replaced to have at least 3,000 MIPS. The application programs will allow the robot to be quickly “trained” to do its job in new locations, either by leading the robot itself along a path or the boundaries of its work area, or by outlining those on a map it constructs and displays. We may also port the programs to an industrial robot, if such is made available.


F. Technology Transfer (2)

We target our research towards commercial realization of free-ranging and self-installing mobile robots. We chose 3D grid mapping because it promised commercially viable reliability, use video camera sensors for their combination of high data density and low cost, and target one second map updates to keep up with practical indoor robot speeds. As the proposed round of research nears completion, we will seek industrial partners or private financing to develop marketable products.

A projected first commercial product from this effort is a basketball-sized “navigation head” to be retrofitted on existing robots, providing them with full autonomy (8-faced head, left; potential customer, right):


The navigation head would contain 360 degrees of stereoscopic cameras and other sensors, an inexpensive inertial navigation system to inform it about small motions without depending on accurate odometry from robot wheels, 1,000 MIPS of computational power, software for basic navigation, software hooks for application-specific programming, and hardware interfaces for vehicle controls, probably through a standard connector installed into the vehicles. Offered as a retrofit for existing factory transport machines, self-powered floor cleaners and others, it could quite possibly expand the market for robotic vehicles tenfold from the current tens of thousands, and might reach millions if it could be programmed to be a safety assist or autonomous controller for forklifts.

After some years of development, we imagine the technique’s costs will fall and its performance rise enough to permit the design of truly mass-market products. A possible first product with mass-market potential might be a small robot vacuum cleaner that can reliably and systematically keep designated rooms in a home clean after a simple introduction to the location, keeping itself charged and delivering its dust to a base collecting station.


The simple vacuum cleaner may be followed by larger and smarter utility robots with dusting arms. Mobile robots with dexterous arms, vision and touch sensors will be able to do various tasks. With proper programming and minor accessories, such machines could pick up clutter, retrieve and deliver articles, take inventory, guard homes, open doors, mow lawns or play games. New applications will expand the market and spur further advancement where existing robots fall short in acuity, precision, strength, reach, dexterity or processing power. Capability, numbers sold, engineering and manufacturing quality, and cost effectiveness should increase in a mutually reinforcing spiral. Ultimately the process could result in “universal robots” which can do many different tasks, orchestrated by application-specific programs.



G. Comparison with other Research (3)

Self-guiding mobile robots for transport, cleaning, and inspection have existed for decades, but only in locations that warranted extensive installation cost and tolerated inflexible behavior. A few tens of thousands of AGVs (Automatically Guided Vehicles) are at work in factories, warehouses and other institutional locations. Many are guided by buried signal-emitting wires defining their routes, with switches signaling endpoints and collisions, a technique developed in the 1960s. More advanced models, made possible by the advent of microprocessors in the 1980s, use more flexible cues, such as optical markings on a tiled floor. The latter may use ultrasonics and infrared proximity sensors to detect and possibly negotiate their way around obstacles. The most advanced machines, manufactured in the late 1980s and 1990s, are guided by navigational markers, such as retroreflective targets or active emitters at strategic locations, and by specialized site-specific programming exploiting premeasured existing features, like walls and known distances.

The newer systems can be installed with less physical effort, but all require the services of a specialist to program the initial setup, and for layout and route changes. The expense, inconvenience, inflexibility and lack of independence stemming from the elaborate setup greatly reduce the potential market for these systems. Only very stable and high-value routes are candidates, and the high cost and rigid behavior usually puts them at an economic disadvantage to human-guided vehicles. Fully autonomous robots, which could be simply shown or led through their paces by nonspecialists, then trusted to execute their tasks reliably, would have a far greater market.

A generation of fully autonomous mobile robots, which navigate without route preparation, has emerged in research laboratories worldwide in the 1990s, made possible by more powerful microprocessors. The majority of these use sonar or laser rangefinders or very specialized image processing to build coarse two-dimensional maps of their surroundings, from which they locate themselves relative to their surroundings, and plan paths between destinations. Some of the best such robots, winners of various competitions, are described in reference [A0]. Presented are both fully autonomous systems, and ones that require some human installation. The limited information in horizontal maps allows for certain ambiguities, and when navigating in new areas the fully autonomous machines have a mean time between difficulties (getting lost, stuck, or even falling) measured in hours. Previous generations of commercial robots were not accepted by customers until the between-failure time exceeded several months. It is our judgment, based on our own experiences and from observing other projects, that two-dimensional autonomous navigation is unlikely ever to meet this standard, but that 3D perceiving systems, with about 1,000 times the world information, probably can.

Some projects have used high-quality data from scanning laser rangefinders to map rough outdoor terrain. These “2 1/2 D” approaches are not suitable indoors, where many flat surfaces act as mirrors, and where complex clutter simply cannot be modeled by a terrain elevation map. Also, these laser techniques, which process small numbers of range measurements for each elevation point, fail when a high percentage of range measurements are erroneous, as is typical indoors.

A few attempts to make 3D maps in the past simplified the problem by reducing the data per cell to a single bit, suitable for planning paths, but useless for statistically merging noisy measurements over long periods. These worked only in very limited circumstances, when the incoming data was of very high quality. Other attempts tried to reduce the memory required by structuring the data in “oct-trees” that merge cells with similar values. We know of none of the latter that was carried to completion: a few experiments quickly convinced their authors that the computing cost, very high at best, was prohibitive.

Our own mobile robot research, ongoing since the 1970s, has investigated sparse three-dimensional models and dense two-dimensional grid maps, using camera, sonar and laser sensors. In the early 1990s we guessed that practical indoor navigation based on two-dimensional maps could be accomplished given about 10 MIPS, but that 3D grid mapping would require about 10,000 MIPS and 100 megabytes of memory, both unavailable in the 1 MIPS mainframe computers that we had been using. We attempted, nevertheless, to gain some experience with 3D by spending our 1992 sabbatical year at Thinking Machines Corp., planning to use their CM5 multiprocessor supercomputer to do experiments. After a few trials, we stumbled on approximations and economies of scale in 3D grids that allowed us to produce a uniprocessor program that was about 100 times faster than our extrapolations from efficient 2D programs had led us to expect. This, combined with 20 MIPS that had become available in computer workstations, put us on track to a practical 3D grid robot controller in the foreseeable future. We added a perceptual front end to the basic measurement-accumulating 3D package during another sabbatical in 1996 at Daimler-Benz in Berlin, and almost immediately obtained heartening results.

To our knowledge, our 3D grid programs are the only effective package of this kind now available. We have been pushing the limits, which has been possible partly because we did not feel pressured to mount real-time robot demonstrations, and have been able to work with programs that were somewhat impractical because they were computationally too demanding. But the time has come for the biggest push, which we think will bring us to practicality in just a few more years. The proposed support would enable us to mount the extended single-minded effort required to deliver useful free-ranging, spatially-aware robot controllers just as the requisite computing power arrives.

H. List of Key Personnel (3)

Hans P. Moravec

The PI is Hans Moravec, a principal research scientist with the Carnegie Mellon Robotics Institute. He plans to apply his undivided time and effort to the proposed work for the next several years. He was able to work with such singleness of purpose during his sabbatical years of 1992 and 1996, but unrelated contracts and other commitments distracted him in other recent years. In preparation for the proposed work he has finished all prior commitments, and refrained from making others.
Moravec did his PhD work at Stanford from 1971 to 1979 working with the Stanford Cart, which became the first computer-controlled robot to visually map and drive its way through natural obstacle fields in 1979. In 1980 he founded Carnegie Mellon’s Mobile Robot Laboratory, where he and a student, Alberto Elfes, first implemented the idea of spatial occupancy evidence grids (in 2D) in 1983, which proved a very robust way to combine blurry sensor data into sharp maps, and has since been adopted by a majority of research mobile robot projects. In the latter 1980s he devised several learning techniques for optimizing the sensor models used to add data to grids. In the 1990s he began serious efforts to implement grids in 3D, and during a 1992 sabbatical, planned for supercomputer access, instead produced an unexpectedly fast 3D grid program that ran on workstations. He and a student, Martin C. Martin, made some modest experiments with feeding the program stereoscopic range data between other commitments from 1993 to 1995. In 1996 he undertook another concentrated sabbatical effort, this time at a robot research facility in Berlin maintained by Daimler-Benz AG, and produced a carefully constructed program, reconstructing maps of stereoscopically-viewed scenes, that was rather clearly good enough to be used for navigation and other purposes. In the years following 1996 he was again distracted by unrelated contracts. He now feels the need for a directed effort as concentrated as that during his sabbaticals, but extended to three years.
Besides key concepts and research skills, Moravec brings to the project some industrial experience. From 1983 to 1997 he was a consultant and founding director of Denning Mobile Robotics Inc., an ultimately unsuccessful company that nevertheless produced and sold some useful robots, for security, warehouse transport, TV news-studio camera motion, floor cleaning and research. He has maintained contacts with other robot entrepreneurs, for instance John Holland of Cybermotion, Inc., a company with similar interests and products that is still operating. Of course, he looks forward to robot companies that thrive and grow, rather than merely survive.
Moravec’s full CV can be found at

/users/hpm/hpm.cv.html




Martin C. Martin

Martin joined the Mobile Robot Laboratory as a PhD student in 1993, having already done noted undergraduate work in vision-based robotics at the University of Toronto. He quickly proved his skills by modifying a sensor model learning program to efficiently use a conjugate-gradient hill climbing strategy and learn 100 times faster than the original implementation. He also put together a first front end for the 3D grid program, and restored and conducted navigation experiments with the Lab’s aging Uranus mobile robot, as well as being a valuable participant in a half-dozen other robotics efforts not directly related to this proposal. Recently, towards completing his thesis, Martin did all the planning, hardware, firmware, systems work and high-level programming necessary to assemble the system described elsewhere in this proposal, which uses an off-board dual-processor Dell computer to control the Uranus robot. His system serves as a prototype model for the testbeds we propose. Martin’s thesis topic, to automatically evolve effective high level organizations that integrate lower-level sensing and control techniques into maximally-effective overall robot controllers, is tangential to the exact focus of this proposal, but his participation has been an essential part of the work so far, and will continue to be so until he graduates. He is likely to graduate within the contract period, in which case another student or perhaps additional fractional effort from regular engineering employees will be needed to fill the void.
Martin’s full CV is at

/users/mcm/cv



25% Engineering Support

In addition to the full-time participants, there is often a need for specialized mechanical, electronic or software expertise in our work. It is time-consuming, inefficient and distracting for the researchers to become skilled in every relevant facet of building and keeping the robot system operational, and we often feel the need to contract out some of the necessary work. The quality of the work is usually superior when we do. Our work volume does not warrant a full-time person, but we easily generate several month-long projects a year, for things like building interface hardware and writing device drivers, building cables and test stands, interfacing to university networks, managing software upgrades and the like. Fortunately there are many skilled individuals in the university and nearby whose assistance we can contract as necessary.


I. Description of Facilities (1)


Carnegie Mellon University and its School of Computer Science maintain a world-class networked computer facility, which we will use as a matter of course.

The Robotics Institute has not only computers, but extensive electronic and mechanical shop facilities, which we are free to use as needed. There are at least a dozen other mobile robot projects within the institute with whom we can share advice and sometimes personnel.

The Mobile Robot Laboratory maintains a small mechanical, electronic and computer workshop, and its own stable of aging robots. For daily work, we have several Sun, Silicon Graphics, Apple and Intel-based computers, and adequate office budget to keep them maintained and upgraded.

By the time this contract would be in effect, the School of Computer Science will have completed construction of a large new building. Our test robot would be located mainly on a third floor of this building, a floor that has 40,000 square feet, 80 offices and labs, a huge two-storey common area with an atrium, wide walkway access to an outside parking lot should we wish to conduct outdoor experiments, and also elevators to other floors, should we wish to do multi-floor experiments and explore even more varied indoor environments, including catwalks, machine shops, and large-robot test facilities. At times there will be many people milling about, and we feel this is an essential part of our test environment. Using a small robot will minimize any danger to people in these tests.




J. Cost by Task (1)

The proposed research involves the full-time effort of a principal investigator and a student, and 1/4 time access to engineering talent. The first year’s budget has equipment money to cover the cost of acquiring, interfacing and making operational a small mobile robot system with a small onboard portable computer and a fast wireless link to a high-end deskbound robot-controlling computer. Also required in the first year are a minimum of three digital cameras that can be mounted on the robot and interfaced to the onboard computer. The equipment budget in subsequent years is for upgrading this system as faster, or otherwise improved, components become available. The deskbound computer is the main focus, but faster cameras and a faster communication link will also be necessary to transition from start/stop to continuous robot motion.


SECTION III

Bibliography


A0: Kortenkamp, D., Bonasso, R. and Murphy, R., Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems, AAAI Press, MIT Press, 1998.

A1: Moravec, H., The Stanford Cart and The CMU Rover, Proceedings of the IEEE, July 1983.
(COPY INCLUDED)
/users/hpm/
project.archive/robot.papers/1983/ieee83.mss

A2: Martin, M. and Moravec, H., Robot Evidence Grids, CMU Robotics Institute Technical Report CMU-RI-TR-96-06, March 1996.
(COPY INCLUDED)
/users/hpm/
project.archive/robot.papers/1996/RobotEvidenceGrids.abs.html


A3: Moravec, H., Robot Spatial Perception by Stereoscopic Vision and 3D Evidence Grids, CMU Robotics Institute Technical Report CMU-RI-TR-96-34, September 1996.
(COPY INCLUDED)

/users/hpm/
project.archive/robot.papers/1996/9609.stereo.paper/SGabstract.html


A4: Moravec, H., Autonomous Free-Ranging Utility Robots for Mass Market before 2005, January 1998.
/users/hpm/
project.archive/robot.papers/1998/busplan.overview.html

A5: Moravec, H., Curriculum Vita
/users/hpm/hpm.cv.html