Organization: Carnegie Mellon University
Principal Investigator: Hans Moravec
Date: March 20, 2001
We are engaged in a 100% effort to develop laboratory prototype sensor-based software for utility mobile robots for industrial transport, floor maintenance, security, etc., that matches the months-between-error reliability of existing industrial robots without requiring their expensive worksite preparation or site-specific programming. Our machines will navigate employing a dense 3D awareness of their surroundings, be tolerant of route surprises, and be easily placed by ordinary workers in entirely new routes or work areas. The long-elusive combination of easy installation and reliability should greatly expand cost-effective niches for mobile robots, and make possible a growing market that can itself sustain further development.
Our system is being built around 3D grids of spatial occupancy evidence, a technique we have been developing for two decades, following a previous decade of robot navigation work using a different approach (see Figure 1: Project history). 2D versions of the approach found favor in many successful research mobile robots, but seem to fall short of commercial reliability. 3D, with 1,000 times as much world data, was computationally infeasible until 1992, when we combined increased computer power with a 100x speedup from representational, organizational and coding innovations. In 1996 we wrote a preliminary stereoscopic front end for our fast 3D code, and the gratifying results convinced us of the practicability of the approach, given about 1,000 MIPS of computer power. We are working to parlay that start into a universally convincing demonstration, just as the requisite computing power arrives.
The work has three stages: completion and improvement of the basic perception code; creation of an identification layer for navigation and recognition of architectural features; and, finally, sample application programs that orchestrate the other abilities into practical behaviors like patrol, delivery and cleaning. We need both capability and efficiency. The existing code allows one-second time resolution with 1,000 MIPS, but our 3D maps have millions of cells, and straightforward implementations of path planning, localization, object matching, etc. would be much slower. We will combine techniques like coarse-to-fine matching, selective sampling and alternate representations to get the desired results at sufficient speed.
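To make the efficiency concern concrete, the following is a minimal Python sketch, not our actual code, of one way a coarse-to-fine match might be organized: a local occupancy grid is aligned to a larger map by scoring candidate offsets at a reduced resolution first, then refining only near the best coarse offset at full resolution. The grid contents, scoring function and search ranges are invented for illustration.

    import numpy as np

    def downsample(grid, f=2):
        """Average occupancy evidence over f x f x f blocks."""
        x, y, z = (d // f * f for d in grid.shape)
        g = grid[:x, :y, :z]
        return g.reshape(x // f, f, y // f, f, z // f, f).mean(axis=(1, 3, 5))

    def match_score(local, world, offset):
        """Correlation of the local grid with a window of the world map."""
        if min(offset) < 0:
            return -np.inf
        ox, oy, oz = offset
        sx, sy, sz = local.shape
        window = world[ox:ox + sx, oy:oy + sy, oz:oz + sz]
        if window.shape != local.shape:
            return -np.inf
        return float((local * window).sum())

    def coarse_to_fine_match(local, world, search=8, f=2):
        # Coarse pass: exhaustive search over downsampled grids.
        lo, wo = downsample(local, f), downsample(world, f)
        best = max((match_score(lo, wo, (x, y, z)), (x, y, z))
                   for x in range(search // f)
                   for y in range(search // f)
                   for z in range(search // f))
        cx, cy, cz = (c * f for c in best[1])
        # Fine pass: refine only around the coarse optimum, at full resolution.
        return max((match_score(local, world, (cx + dx, cy + dy, cz + dz)),
                    (cx + dx, cy + dy, cz + dz))
                   for dx in range(-f, f + 1)
                   for dy in range(-f, f + 1)
                   for dz in range(-f, f + 1))[1]

The same pattern, searching a cheap representation first and reserving full-resolution work for a small neighborhood, applies to localization and object matching as well.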
We have continued to develop our "learning through coloring" program, whose purpose is to optimize sensor models and other parameters that affect the quality of the grid maps derived from sense data. As we have done so, the quality of the maps has palpably improved, as has the objective color variance score. We increased the number and scope of sensor model parameters adjusted by the learning, and added an iterative image color adjustment process that compensates for differences in exposure among the robot images of the scene by using the colored grid (which averages the views) as a color reference.
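For concreteness, the sketch below captures the shape of the objective as described here: each colored cell accumulates the colors of the image pixels it projects into from the views that saw it, the map is scored by how much those colors disagree, and a simple search nudges the sensor-model parameters toward lower variance. The data structures, the stand-in rebuild-and-color step, and the search strategy are placeholders for illustration, not our actual learning code.

    import numpy as np

    def color_variance_score(cell_colors):
        """cell_colors: dict mapping a cell index to the list of RGB samples
        gathered from the views that saw that cell."""
        total, counted = 0.0, 0
        for samples in cell_colors.values():
            if len(samples) < 2:            # one view gives no disagreement
                continue
            rgb = np.asarray(samples, dtype=float)
            total += rgb.var(axis=0).sum()  # per-channel variance, summed
            counted += 1
        return total / max(counted, 1)

    def tune_sensor_model(params, rebuild_and_color, steps=200, scale=0.05):
        """Hill-climb the sensor-model parameter vector.  rebuild_and_color
        stands in for the real pipeline: it rebuilds the grid from the stored
        sense data with the trial parameters, recolors it from the images,
        and returns the per-cell color samples."""
        rng = np.random.default_rng(0)
        best = np.asarray(params, dtype=float)
        best_score = color_variance_score(rebuild_and_color(best))
        for _ in range(steps):
            trial = best + rng.normal(0.0, scale, size=best.shape)
            s = color_variance_score(rebuild_and_color(trial))
            if s < best_score:
                best, best_score = trial, s
        return best, best_score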
In our last report, we noted that the program had found its best scores by setting the occupancy threshold that decides whether a cell is imaged as opaque or transparent to a negative value, effectively declaring blank maps fully occupied. Subsequent sensing then carved out empty regions rather than building up the occupied parts. The resulting grids contain features, such as horizontal shelves, that were not actually detected by our legacy two-eyed stereo, but appear through a process of elimination. The number of occupied cells remained in the millions, however. The program pruned them down to about 100,000 by eliminating those that were seen from no sensing position, and thus received no color.
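A minimal sketch of that pruning step, with the cell bookkeeping invented for illustration: occupied cells that received no color sample from any sensing position are simply dropped.

    import numpy as np

    def prune_uncolored(occupied, cell_colors):
        """occupied: boolean 3D array of cells above the occupancy threshold.
        cell_colors: dict mapping (x, y, z) -> list of RGB samples; a cell
        with no samples was hidden from every camera position."""
        pruned = np.zeros_like(occupied)
        for idx in zip(*np.nonzero(occupied)):
            if cell_colors.get(idx):        # at least one view colored it
                pruned[idx] = True
        return pruned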
As we added parameters to the learning that allowed the evidence rays to be reshaped, we were delighted to find that the optimum solutions now had zero rather than negative occupancy thresholds. Alas, the number of occupied cells was still almost half a million before color pruning: in place of the negative threshold, the program had substituted enormous extensions of the occupied portion of the evidence rays. Each sensed point filled a meter or two of space behind it, and that fill was then carved out by other evidence.
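The sketch below shows the kind of parameterized evidence-ray profile involved, with the depth of the occupied fill behind each sensed point as one of the adjustable quantities; the profile shape, weights and parameter names are invented for illustration.

    import numpy as np

    def evidence_ray(depths, hit, empty_weight=-0.3, occ_weight=1.0, tail=1.5):
        """Evidence increment for cells at `depths` meters along a ray,
        given a stereo range reading at `hit` meters.  `tail` is the length
        of occupied fill extending behind the sensed point."""
        d = np.asarray(depths, dtype=float)
        ev = np.zeros_like(d)
        ev[d < hit] = empty_weight                          # free space up to the hit
        ev[(d >= hit) & (d <= hit + tail)] = occ_weight     # fill behind it
        return ev

    # A ray sampled every 5 cm out to 4 m, with a hit at 2 m: the 1.5 m of
    # fill behind the hit would have to be carved away by other views.
    ray = evidence_ray(np.arange(0.0, 4.0, 0.05), hit=2.0)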
We trace much of the odd nature of the optimum solutions to deficiencies in the sensor configuration of our stock data, which includes an office scene and a more elaborate laboratory scene. In particular, horizontal two-camera stereo prevents the ranging of clean horizontal features such as shelves and lintels. Blank surfaces like walls also cannot be ranged. Also, though the learning does well in making the grid map match the views of the scene, it currently makes a mess of areas out of view: behind every object one finds a meter of occupied roughness.
One solution to the latter problem is to have more views from more directions. Our laboratory data set does this to some extent: it has front, side, back and diagonal view directions (the office data set consists entirely of frontal views), and indeed some portions of the lab data are much cleaner than the office scene. But we intend to do much better.
We have built a sensor head with three cameras and a textured light source. These are intended to allow the program to see all edge directions, and also blank surfaces. We will arrange for the program to take both textured and untextured image triplets from each position: the textured images are used for stereoscopic matching, the regularly lighted ones for coloring. At each position we will take images in four compass directions (our cameras have 90 degree fields of view, so this gives us almost complete horizontal coverage), giving cross views of much of the scene. In addition, we have arranged a test area with mounts for overhead views. The overhead views of the test area will not be used in the map-building stage of the program (since a wandering robot would not have such views available), but will be used to contribute color (and color variance) to the scene. Effectively, the program will be penalized for making mistakes in parts of the map hidden from the ground-level viewpoints. We expect this to teach the program to avoid depositing debris in the unseen spaces. For instance, though our present best maps are quite good, they have a dense "roof" of debris at the upper edge of the camera fields of view, which the overhead views will heavily penalize.
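The intended scoring split can be summarized in a few lines: ground-level views both build and color the grid, while the overhead views contribute color (and variance) only, so debris left in spaces the ground views cannot see counts against the map. The function names below are placeholders standing in for the real pipeline, not its actual interfaces.

    def evaluate_map(params, ground_views, overhead_views,
                     build_grid, color_grid, color_variance_score):
        # Map is built from robot-height data only.
        grid = build_grid(params, ground_views)
        # Coloring draws on every view, including the overhead mounts.
        samples = color_grid(grid, ground_views + overhead_views)
        # Disagreement seen from overhead penalizes hidden debris.
        return color_variance_score(samples)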
We have completed building our new sensor setup, and will use it to collect new data almost immediately. Although it can be used on our Cye robot, we have opted to take our first data from a manually moved camera stand built around a very precise optical bench mount. We position this stand by lining up calibrated notches in its base with an array of precisely marked floor dots. This allows the camera positions to be known to a millimeter or two, letting us separate the robot fine-positioning problem from the map-building problem. And when we do begin to solve for camera positions, we will have a reference to judge how well the program is doing.
We expect our new data to bring us further towards photorealism and, more importantly, towards extremely reliable 3D maps. We have begun a parallel effort to build the second, recognition, software layer described in our original proposal. This layer extracts paths, localization, basic architectural features and some object identifications from the maps.
By the end of the project we plan to produce a prototype for self-navigating robots that can be used in large numbers for transport, floor cleaning, patrol and other applications.