DARPA MARS program research progress

Project Title: Robust Navigation by Probabilistic Volumetric Sensing

Organization: Carnegie Mellon University

Principal Investigator: Hans Moravec

Date: July 1, 2001


The most recent update of this report can be found at http://www.frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2001/ARPA.MARS/Report.0107.html


Technical Report

Objective

Ours is a 100% effort to develop laboratory-prototype, sensor-based software for utility mobile robots (industrial transport, floor maintenance, security, etc.) that matches the months-between-error reliability of existing industrial robots without requiring their expensive worksite preparation or site-specific programming. Our machines will navigate using a dense 3D awareness of their surroundings, be tolerant of route surprises, and be easily assigned by ordinary workers to entirely new routes or work areas. The long-elusive combination of easy installation and reliability should greatly expand the cost-effective niches for mobile robots, and make possible a growing market that can itself sustain further development.

Approach

Our system is being built around 3D grids of spatial occupancy evidence, a technique we have been developing for two decades, following a prior decade of robot navigation work using a different method. 2D versions of the grid approach found favor in many successful research mobile robots, but seem to fall short of commercial reliability. 3D grids, with 1,000 times as much world data, were computationally infeasible until 1992, when we combined increased computer power with a 100x speedup from representational, organizational and coding innovations. In 1996 we wrote a preliminary stereoscopic front end for our fast 3D grid code, and the gratifying results convinced us of the feasibility of the approach, given about 1,000 MIPS of computer power. This contract enables us to extend that start towards a universally convincing demonstration, just as the requisite computing power arrives.
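As a concrete illustration of the evidence-grid idea (this report contains no code of its own), the following minimal C++ sketch accumulates signed occupancy evidence along a single stereo-ranged ray: cells the ray passes through are made to look emptier, and the cell containing the range reading is made to look more occupied. The grid size, weights and update rule are illustrative assumptions, not the project's actual representation or parameters.

// Minimal sketch of a 3D occupancy evidence grid (not the project's actual code).
#include <vector>
#include <cstdio>
#include <cmath>

struct Grid3D {
    int nx, ny, nz;
    float cell;                    // cell edge length in meters
    std::vector<float> evidence;   // signed evidence, 0 = unknown

    Grid3D(int x, int y, int z, float c)
        : nx(x), ny(y), nz(z), cell(c), evidence((size_t)x * y * z, 0.0f) {}

    float& at(int x, int y, int z) { return evidence[((size_t)z * ny + y) * nx + x]; }

    bool inside(int x, int y, int z) const {
        return x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && z < nz;
    }

    // Walk from the sensor toward a ranged point, subtracting evidence from
    // intervening cells (seen through, so likely empty) and adding evidence
    // at the cell containing the ranged point.
    void insertRay(float sx, float sy, float sz,
                   float px, float py, float pz,
                   float emptyWeight, float hitWeight) {
        float dx = px - sx, dy = py - sy, dz = pz - sz;
        float len = std::sqrt(dx * dx + dy * dy + dz * dz);
        int steps = (int)(len / cell) + 1;
        for (int i = 0; i <= steps; ++i) {
            float t = (float)i / steps;
            int x = (int)((sx + t * dx) / cell);
            int y = (int)((sy + t * dy) / cell);
            int z = (int)((sz + t * dz) / cell);
            if (!inside(x, y, z)) continue;
            at(x, y, z) += (i == steps) ? hitWeight : -emptyWeight;
        }
    }
};

int main() {
    Grid3D grid(128, 128, 64, 0.05f);   // 6.4 x 6.4 x 3.2 m at 5 cm cells
    grid.insertRay(3.2f, 3.2f, 1.0f,    // sensor position
                   5.0f, 4.0f, 1.0f,    // stereo-ranged point
                   0.2f, 1.0f);         // example weights
    std::printf("evidence at hit cell: %.2f\n", grid.at(100, 80, 20));
    return 0;
}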

We've altered our plan slightly. Two years ago we proposed to address three software layers in successive years: basic perception, extended recognition and demonstration applications. Experience to date has led us to a more experiment-paced approach with the same end. Some results are better than anticipated, but surprises with new data remind us this is still a research effort, where even apparently settled design choices must sometimes be reconsidered and reworked.

Our experience to date has come primarily from two substantial stereoscopic image sets collected in 1996, now used far beyond their original intent. Our programs derived nearly photorealistic maps from them, but also exposed fundamental limitations in that data. In May 2001 we collected over 600 images intended to overcome those limitations, on a calibrated test course we had been preparing for many months. Carefully adjusted trinocular images, with and without textured light, were obtained every 325 mm in each of four compass directions along a 10-meter L-shaped indoor path decorated with plants, mannequins and furniture. There were also two overhead views. Camera locations are known to 2 mm or better. We are in the process of reimplementing our programs (in C++ for better modularity) to make best use of this data, and expect the resulting maps to be nearly the best our methods can achieve, a quality we will attempt to preserve in less-controlled future tests. Calibrated data allows us temporarily to avoid the problem of registering uncertain viewpoints. We are preparing code, however, that can do such registrations by matching 3D grids made from new glimpses to previously accumulated maps. When we are satisfied with results from the calibrated run, we will process several uncalibrated runs obtained from robots following prearranged trajectories in various locations. When we are satisfied with those uncalibrated results, we will attempt autonomous runs, with new code that chooses paths as it incrementally constructs maps. When the autonomous runs go satisfactorily, we will add code to orchestrate full demonstration applications like patrol, delivery and cleaning. Though we have techniques in mind (described in earlier reports) for implementing the required functionality, we expect the experimental results to subtly or dramatically alter our plans, as they have in the past.

Accomplishments for FY 2001

We have continued to develop our "learning through coloring" program, whose purpose is to optimize sensor models and other parameters that affect the quality of the grid maps derived from sense data. As we did so, the quality of the maps has palpably improved, as has the objective color-variance score. We increased the number and scope of sensor-model parameters adjusted by the learning, and added an iterative image color adjustment process that compensates for exposure differences among the robot images of the scene by using the colored grid (which averages the views) as a color reference.
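The following C++ sketch gives our reading of the color-variance objective; the data structures and the per-cell variance formula are assumptions for illustration, not the project's actual implementation. Each occupied, visible cell collects a color sample from every image that sees it; if the map geometry is right the samples agree, so the summed per-cell variance is low, and the learning adjusts sensor-model parameters to drive this score down.

// Hedged sketch of a color-variance score over a colored grid map.
#include <vector>
#include <cstdio>

struct RGB { float r, g, b; };

// Variance of the color samples gathered for one cell, summed over channels.
static float cellColorVariance(const std::vector<RGB>& samples) {
    if (samples.size() < 2) return 0.0f;
    RGB mean{0.0f, 0.0f, 0.0f};
    for (const RGB& s : samples) { mean.r += s.r; mean.g += s.g; mean.b += s.b; }
    float n = (float)samples.size();
    mean.r /= n; mean.g /= n; mean.b /= n;
    float var = 0.0f;
    for (const RGB& s : samples) {
        var += (s.r - mean.r) * (s.r - mean.r)
             + (s.g - mean.g) * (s.g - mean.g)
             + (s.b - mean.b) * (s.b - mean.b);
    }
    return var / n;
}

// Total score over all colored cells; lower means the views agree better.
static float colorVarianceScore(const std::vector<std::vector<RGB>>& perCellSamples) {
    float total = 0.0f;
    for (const auto& samples : perCellSamples) total += cellColorVariance(samples);
    return total;
}

int main() {
    // Two hypothetical cells: one colored consistently by three views, one
    // colored inconsistently because the geometry projected the wrong surface.
    std::vector<std::vector<RGB>> cells = {
        {{0.8f, 0.2f, 0.2f}, {0.8f, 0.2f, 0.2f}, {0.8f, 0.2f, 0.2f}},
        {{0.8f, 0.2f, 0.2f}, {0.1f, 0.7f, 0.1f}},
    };
    std::printf("score = %.3f\n", colorVarianceScore(cells));
    return 0;
}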

In our last report, we noted that the program had found its best scores by setting the occupancy threshold that decides whether a cell is imaged as opaque or transparent to a negative value, effectively declaring blank maps to be fully occupied. Subsequent sensing then carved out empty regions rather than building up the occupied parts. The resulting grids contain features, such as horizontal shelves, that were not actually detected by our legacy two-eyed stereo, but appear through a process of elimination. The number of occupied cells remained in the millions, however. The program pruned them down to about 100,000 by eliminating those that were seen from no sensing position, and thus received no color.
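A hedged sketch of that pruning step, under the assumption that each cell simply records how many views colored it (the bookkeeping in our actual code may differ): cells above the occupancy threshold but seen from no sensing position receive no color samples and are discarded.

// Illustrative color-pruning step, not the project's actual code.
#include <vector>
#include <cstddef>
#include <cstdio>

struct Cell {
    float occupancy;     // accumulated occupancy evidence
    int   colorSamples;  // number of views that colored this cell
};

// Keep a cell only if it exceeds the occupancy threshold AND some view saw it.
static size_t pruneUncolored(std::vector<Cell>& cells, float occupancyThreshold) {
    std::vector<Cell> kept;
    for (const Cell& c : cells)
        if (c.occupancy > occupancyThreshold && c.colorSamples > 0)
            kept.push_back(c);
    cells.swap(kept);
    return cells.size();
}

int main() {
    std::vector<Cell> cells = {
        {1.5f, 3},   // occupied and seen: kept
        {1.5f, 0},   // occupied but seen from no sensing position: pruned
        {-0.5f, 2},  // below the occupancy threshold: pruned
    };
    std::printf("kept %zu of 3 cells\n", pruneUncolored(cells, 0.0f));
    return 0;
}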

As we added parameters to the learning that allowed the evidence rays to be reshaped, we were delighted to find that the optimum solutions now had zero rather than negative occupancy thresholds. Alas, the number of occupied cells was still almost half a million before color pruning. The learning had instead substituted enormous extensions of the occupied portion of the evidence rays: each sensed point filled a meter or two of space behind it, and that fill was then carved out by other evidence.
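To make the notion of "reshaping" an evidence ray concrete, here is a sketch of a parameterized ray profile of the kind the learning could adjust; the specific parameters (empty weight, hit weight, occupied tail length and weight) are illustrative assumptions, not our actual sensor-model parameterization. The long occupied tail corresponds to the behavior described above.

// Hedged sketch of a parameterized evidence-ray profile.
#include <cstdio>

struct RayProfile {
    float emptyWeight;   // evidence subtracted from cells in front of the sensed range
    float hitWeight;     // evidence added at the cell containing the sensed range
    float tailLength;    // meters of occupied fill behind the sensed point
    float tailWeight;    // evidence added per cell inside the tail
};

// Evidence contribution at signed distance d (meters) past the sensed range,
// for a grid with the given cell size: d < 0 is the swept-empty region in
// front of the hit, [0, cellSize) is the hit itself, out to tailLength is fill.
static float evidenceAt(const RayProfile& p, float d, float cellSize) {
    if (d < 0.0f) return -p.emptyWeight;
    if (d < cellSize) return p.hitWeight;
    if (d <= p.tailLength) return p.tailWeight;
    return 0.0f;
}

int main() {
    // A profile like the learned optimum described above: a long occupied
    // tail behind each sensed point that later evidence must carve back out.
    RayProfile learned{0.2f, 1.0f, 1.5f, 0.3f};
    for (float d = -0.5f; d <= 2.0f; d += 0.5f)
        std::printf("d = %+.1f m  evidence = %+.2f\n", d, evidenceAt(learned, d, 0.05f));
    return 0;
}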

We trace much of the odd nature of the optimum solutions to deficiencies in the sensor configuration of our stock data, which includes an office scene and a more elaborate laboratory scene. In particular, horizontal two-camera stereo prevents the ranging of clean horizontal features such as shelves and lintels. Blank surfaces like walls also cannot be ranged. Also, though the learning does well in making the grid map match the views of the scene, it currently makes a mess of areas out of view: behind every object one finds a meter of occupied roughness.

One solution to the latter problem is to have more views from more directions. Our laboratory data set does this to some extent: it has front, side, back and diagonal view directions (the office data set contains only frontal views), and indeed some portions of the lab data are much cleaner than the office scene. But we intend to do much better.

We built a sensor head with three cameras and a textured light source, to allow the program to range all edges and also blank surfaces. We collected a position-calibrated data set of both textured and untextured image triplets, 100 views in four compass directions along a 10-meter journey. The textured images are used for stereoscopic matching, the plainly lit ones for coloring. In addition, the data set includes two overhead views covering much of the traverse. The overhead views are not used in the basic map-building program (since a wandering robot would not have such views available), but will function as a kind of ground truth by contributing color variance during the learning process, penalizing sensor models that make poor choices even in parts of the map hidden from the ground-level viewpoints. We expect this to teach the program to avoid depositing debris in the unseen spaces. For instance, though our present best maps are quite good, they have a dense "roof" of debris at the upper edge of the camera fields of view, which the overhead views will heavily penalize. Our existing code, able to use only untextured stereo pairs, produces tolerable maps from the new data. Only a portion of the new code that will fully exploit the new data is complete at the time of this writing.

In a parallel development, Martin Martin, a graduate student with the project, successfully defended his Ph.D. thesis, "The Simulated Evolution of Robot Perception," on June 11, 2001. He has accepted a post-doctoral position at the MIT AI Lab.

Current Plan

We are in the process of reworking our programs to make best use of the calibrated data we collected in May 2001, and expect the resulting maps to be nearly the best our methods can achieve, a quality we will attempt to preserve in less-controlled future tests. Calibrated data allows us temporarily to avoid the problem of registering uncertain viewpoints. We are preparing code, however, that can do such registrations by matching 3D grids made from new glimpses to previously accumulated maps. (This latter effort is proceeding with the help of a new research programmer, Scott Crosby, hired in June, filling the project position vacated by graduating Ph.D. student Martin Martin.) When we are satisfied with results from the calibrated run, we will process several uncalibrated runs obtained from robots following prearranged trajectories in various locations. When we are satisfied with those uncalibrated results, we will attempt multiple autonomous runs, with new code that chooses paths as it incrementally constructs maps. When the autonomous runs go satisfactorily, we will add code to orchestrate full demonstration applications like patrol, delivery and cleaning. Though we have techniques in mind (described in earlier reports) for implementing the required functionality, we expect the experimental results to subtly or dramatically alter our plans, as they have in the past. Extrapolating from the present rate of progress, these steps will likely take us to the end of calendar year 2002, and demand a slight expansion of the effort in the latter portion of that period to support an increased pace of experiments involving new hardware.
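As an illustration of the registration idea (matching a grid built from a new glimpse against the accumulated map), the following C++ sketch scores a handful of candidate translational offsets by correlating the two grids and keeps the best. The real registration must also handle rotation and a finer, probabilistic search, so this only illustrates the matching step under simplifying assumptions.

// Hedged sketch of grid-to-map registration by correlation over candidate offsets.
#include <vector>
#include <cstdio>

struct Grid {
    int nx, ny, nz;
    std::vector<float> ev;   // signed occupancy evidence
    Grid(int x, int y, int z) : nx(x), ny(y), nz(z), ev((size_t)x * y * z, 0.0f) {}
    float at(int x, int y, int z) const {
        if (x < 0 || x >= nx || y < 0 || y >= ny || z < 0 || z >= nz) return 0.0f;
        return ev[((size_t)z * ny + y) * nx + x];
    }
    void set(int x, int y, int z, float v) { ev[((size_t)z * ny + y) * nx + x] = v; }
};

// Correlation of the glimpse against the map shifted by (dx, dy, dz) cells;
// cells whose evidence agrees in sign contribute positively.
static float matchScore(const Grid& map, const Grid& glimpse, int dx, int dy, int dz) {
    float s = 0.0f;
    for (int z = 0; z < glimpse.nz; ++z)
        for (int y = 0; y < glimpse.ny; ++y)
            for (int x = 0; x < glimpse.nx; ++x)
                s += glimpse.at(x, y, z) * map.at(x + dx, y + dy, z + dz);
    return s;
}

int main() {
    Grid map(32, 32, 8), glimpse(32, 32, 8);
    map.set(12, 10, 2, 1.0f);        // a wall cell already in the accumulated map
    glimpse.set(10, 10, 2, 1.0f);    // the same wall, seen from a pose two cells off

    int bestDx = 0; float best = -1e30f;
    for (int dx = -3; dx <= 3; ++dx) {           // search x offsets only, for brevity
        float s = matchScore(map, glimpse, dx, 0, 0);
        if (s > best) { best = s; bestDx = dx; }
    }
    std::printf("best x offset: %d cells (score %.2f)\n", bestDx, best);
    return 0;
}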

Technology Transition

We plan for a major external impact at the end of the project, when we expect to have a prototype for self-navigating robots that can be used in large numbers for transport, floor cleaning, patrol and other mass applications, enabling a growing market. We've made a number of preliminary industrial contacts towards that end. The farthest along is with Pittsburgh-based Personal Robots Inc., where we serve as science advisor with the aim of bringing this technology to market as soon as possible. Contact: Henry Thorne (henry@personalrobots.com).

In the meantime, stable versions of our code are available on our web site. Occasionally we hear from others who have used it. Our self-contained camera calibration code has been particularly popular. It has been used for several years by the DARPA-MARS funded robot soccer group. Contact: Tucker Balch (trb@cs.cmu.edu). It was used in Martin Martin's thesis on evolving realistic robot perception using genetic programming. Contact: Martin Martin (mcm@cs.cmu.edu). It is being used by a CMU solar sail development project. Contact: Richard Blomquist (rsb@ri.cmu.edu). There were other users in past years.

The full 3D grid package makes great demands on computer capacity, and is thus more difficult to use. We've recently heard from an experimenter in Texas who has the full suite running in Linux on a robot testbed. Contact: Mark Cartwright (markc@weaponeer.com). The 3D code has long been used by Alan Schultz at the Naval Research Lab in Washington. Contact: Alan Schultz (schultz@aic.nrl.navy.mil). Our simpler 2D grid code that preceded the 3D package was used by several groups in the past decade, and may still be in use.

By best current estimates, the project will likely be in a position to benefit from a one-year continuation beyond the June 2002 expiration date of the present contract.

Slides and Movies

Project Overview

Precision run overhead view A

Precision run overhead view B

Precision run movie: middle camera, plain light