Organization: Carnegie Mellon University
Principal Investigator: Hans Moravec
Date: June 1, 2000
We are engaged in a 100% effort to develop laboratory prototype sensor-based software for utility mobile robots (industrial transport, floor maintenance, security, etc.) that matches the months-between-error reliability of existing industrial robots without requiring their expensive worksite preparation or site-specific programming. Our machines will navigate using a dense 3D awareness of their surroundings, be tolerant of route surprises, and be easily placed by ordinary workers on entirely new routes or in new work areas. The long-elusive combination of easy installation and reliability should greatly expand the cost-effective niches for mobile robots, and make possible a growing market that can itself sustain further development.
Our system is being built around 3D grids of spatial occupancy evidence, a technique we have been developing for two decades. 2D versions of the approach found favor in many successful research mobile robots, but fall short of commercial reliability. 3D grids, with 1,000 times as much world data, were computationally infeasible until 1992, when we combined increased computer power with a 100x speedup from representational, organizational and coding innovations. In 1996 we wrote a preliminary stereoscopic front end for our fast 3D code, and the gratifying results convinced us of the practicability of the approach, given about 1,000 MIPS of computer power (see overview figure). We are working to parlay that start into a universally convincing demonstration, just as the requisite computing power arrives.
The work has three stages: completion and improvement of the basic perception code; creation of an identification layer for navigation and recognition of architectural features; and finally, sample application programs that orchestrate the other abilities into practical behaviors like patrol, delivery and cleaning. We need both capability and efficiency. The existing code allows one-second time resolution with 1,000 MIPS, but our 3D maps have millions of cells, and straightforward implementations of path planning, localization, object matching, etc. would be much slower. We will combine techniques like coarse-to-fine matching (sketched below), selective sampling and alternate representations to get the desired results at sufficient speed.
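To make the efficiency concern concrete, here is a minimal sketch of the coarse-to-fine idea, reduced to a 2D occupancy slice for brevity (our 3D grids would be handled analogously). The pyramid construction, BFS planner, corridor margin and all names are illustrative assumptions, not our actual implementation: plan first on a conservatively coarsened map, then replan only within a corridor around the coarse path, touching a small fraction of the cells a full fine-grid search would.

```python
# Sketch: coarse-to-fine path planning on an occupancy grid, in 2D
# for brevity. 1 = blocked, 0 = free. Illustrative only.
import numpy as np
from collections import deque

def coarsen(occ, f=4):
    """Max-pool the map by factor f: a coarse cell is blocked if any
    of its fine cells is blocked (conservative)."""
    h, w = occ.shape
    t = occ[:h - h % f, :w - w % f]          # trim to multiples of f
    return t.reshape(h // f, f, w // f, f).max(axis=(1, 3))

def bfs_path(occ, start, goal, mask=None):
    """Breadth-first search over free cells; 'mask' optionally
    restricts the search to a corridor of allowed cells."""
    prev = {start: None}
    q = deque([start])
    while q:
        c = q.popleft()
        if c == goal:                         # reconstruct the path
            path = []
            while c is not None:
                path.append(c)
                c = prev[c]
            return path[::-1]
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (c[0] + di, c[1] + dj)
            if (0 <= n[0] < occ.shape[0] and 0 <= n[1] < occ.shape[1]
                    and not occ[n] and n not in prev
                    and (mask is None or mask[n])):
                prev[n] = c
                q.append(n)
    return None

def coarse_to_fine(occ, start, goal, f=4, margin=1):
    """Plan on the coarse map, then replan on the fine map inside a
    corridor around the coarse path."""
    cpath = bfs_path(coarsen(occ, f), (start[0] // f, start[1] // f),
                     (goal[0] // f, goal[1] // f))
    if cpath is None:
        return None
    mask = np.zeros(occ.shape, dtype=bool)    # corridor of allowed cells
    for ci, cj in cpath:
        i0, j0 = max(0, (ci - margin) * f), max(0, (cj - margin) * f)
        mask[i0:(ci + margin + 1) * f, j0:(cj + margin + 1) * f] = True
    return bfs_path(occ, start, goal, mask)
```

One caveat of the sketch: conservative coarsening can veto paths a fine search would find, so a real planner would keep several pyramid levels, or fall back to finer search when the coarse search fails.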
We implemented a first version of "learning through coloring", whose purpose is to optimize sensor models and other parameters that affect the quality of the gridmaps derived from sense data. Early runs of the program with a minimal set of learned parameters on archived stereo images have already led us to significantly better results, and suggested an improved overall approach, wherein intermittent coloring is used as an integral part of grid map building, not just as an offline system optimizer.
In related work, Martin C. Martin documented new results in his Ph.D. research on genetic learning for robot perception. This work contributes to the main project, whose goal for the summer of 2000 is a complete self-tuning navigating system. Martin reported on both efforts in a presentation at the DARPA MARS-SDR 2000 Spring PI Conference in Houston on May 24 and 25.
Grid coloring provides a readily available yet powerful measure of map accuracy to guide a learning process that adjusts stereo ray and other parameters. We had planned to "project" colors from our robot camera images onto the occupied cells of grids viewed from corresponding positions. Many cells would be seen from several image positions and thus be colored many times. The consensus color would be some kind of average (median was our first choice). The map quality would then be rated by comparing other robot images to corresponding synthetic views of the colored grid. Implementation elegance led us instead to try the arithmetic mean to combine the colors, and to simultaneously calculate a color variance for each cell. The average variance of the colorings proved to be a very good measure of grid quality, agreeing closely with subjective human judgment, but only if the colorings rather than the cells were weighted equally. Here is an early result, showing an original image of the scene, a corresponding view of a high-scoring grid, and a view of the 3D grid from an elevation angle above any of the images.
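The scoring arithmetic can be summarized in a few lines. The following is a sketch under our reading of the description above; the incremental (Welford-style) mean/variance update and all names are our own choices, not the project's code:

```python
# Sketch: per-cell running color mean/variance, with the grid score
# taken as the variance averaged over colorings (not over cells).
import numpy as np

class CellColor:
    """Accumulates the colorings projected onto one occupied cell."""
    def __init__(self):
        self.n = 0                  # number of colorings received
        self.mean = np.zeros(3)     # running mean RGB
        self.m2 = np.zeros(3)       # running sum of squared deviations

    def add(self, rgb):
        """Welford's incremental update; rgb is a length-3 array."""
        self.n += 1
        delta = rgb - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (rgb - self.mean)

    def variance(self):
        return self.m2 / self.n if self.n else np.zeros(3)

def grid_score(cells):
    """Average color variance weighted by colorings, not by cells: a
    cell colored n times contributes n times its variance. Lower is
    better; a high score flags extra or missing cells."""
    total_var = sum(c.variance().sum() * c.n for c in cells)
    total_col = sum(c.n for c in cells)
    return total_var / total_col if total_col else float('inf')
```

Weighting by colorings rather than cells means a cell seen from twenty images influences the score twenty times as much as a cell seen once, which is the equal weighting of colorings that matched human judgment.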
Though the first results were encouraging, our second foray with more intense learning found a weakness. The program found a setting of the parameters that produced an exceptionally good score, but when we examined the resulting grid we found that most of the scene had been eliminated, except for the stereoscopically strongest features. We reasoned as follows. The coloring method provides a good indication of grid quality because when a trial grid contains an extra cell, that cell intercepts disparate colors that come from its background in various directions in the real scene. Thus an extra cell contributes high variance, making for a poor score. On the other hand, if a cell is missing, the color it should have intercepted is instead projected onto its background, contributing high variance there, also making for a poorer score. The lowest variance should occur when the cells in the grid are occupied in the same configuration as visible matter in the real scene. But this model broke down when the learning process discovered a way to mostly depopulate the grid. In that case there was often no background to catch the colors of missing cells, and thus no penalty. We addressed the problem by surrounding the grid with an additional layer of "Sky" cells at apparent infinity. The Sky intercepts all colors that escape the grid proper, and color variances in the Sky cells are weighed into the grid score. This modification eliminated sparse learning solutions, and incidentally filled in unsightly gaps in grid views. Here is a view of a grid with Sky.
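Here is a sketch of how Sky might enter the bookkeeping; the azimuth/elevation binning of escaping ray directions and every name below are hypothetical illustration:

```python
# Sketch: a "Sky" layer at apparent infinity catches color rays that
# exit the grid unintercepted, so sparse grids no longer escape penalty.
import math
from collections import defaultdict

def sky_bin(direction, bins=64):
    """Map an escaping ray direction to a discrete Sky cell; this
    crude azimuth/elevation binning is a hypothetical stand-in."""
    x, y, z = direction
    az = math.atan2(y, x)                    # -pi .. pi
    el = math.atan2(z, math.hypot(x, y))     # -pi/2 .. pi/2
    i = int((az + math.pi) / (2 * math.pi) * bins) % bins
    j = int((el + math.pi / 2) / math.pi * bins) % bins
    return (i, j)

def color_with_sky(rays, occupied):
    """rays: iterable of (cells_along_ray, direction, rgb) triples.
    Each ray colors the first occupied cell it hits, else a Sky cell.
    Returns {cell or sky id: [rgb, ...]} for the variance scoring."""
    colorings = defaultdict(list)
    for cells, direction, rgb in rays:
        hit = next((c for c in cells if c in occupied), None)
        key = hit if hit is not None else ('sky',) + sky_bin(direction)
        colorings[key].append(rgb)
    return colorings
```

With Sky cells scored like any others, a depopulated grid sprays many clashing colors onto the same Sky cells and inherits their high variance, closing the loophole.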
Though just beginning, learning runs have already dealt us a second surprise that significantly changes our planned approach. To simplify program development, our learning thus far involves only a few variables, often just the following three:
CorWt (Correlation Weight): an exponential coefficient that relates stereoscopic correlation values to evidence weights of rays projected into the grid. We had considered values ranging from 0.001 to 1000 for this quantity, but nearly all learning runs get best scores with CorWt set between 0.5 and 0.75, independent of other parameter settings. A solid, encouraging result.
EvE (Evidence Empty): the relative strength of evidence indicating that space is empty in the region along a stereo range measurement before the found distance. This negative evidence weighs against the positive evidence for occupancy at the triangulation distance (EvO). For now we give the latter a fixed value of +100 to set an absolute scale. Plausible values of EvE could range from about 0 to -100. Our learning runs at first always selected EvE in the range -12 to -15, again an apparently solid result.
OccT (Occupancy Threshold): Our grid coloring process begins by selecting a set of occupied cells (typically less than 100,000 out of over 4 million total) by means of an evidence threshold. Cells are initialized to 0, and repeatedly raised or lowered by amounts proportional to EvO and EvE when in the path of stereoscopic evidence. After all images are processed (having generated several hundred thousand evidence rays), cells with evidence greater than OccT are presumed occupied and participate in the coloring process. We had considered values of OccT between about 0 and 10 to be reasonable. Within that range, our learning programs always unambiguously pegged OccT at 0. Though consistent, this was disturbing in its suggestion that the optimum OccT might be found at a value less than 0. Since our grid was initialized to 0, an OccT less than 0 would declare untouched cells to be occupied, typically about 2.5 million cells in our test framework. We overcame our misgivings and allowed the program to take OccT negative, and it rewarded us with a significantly better score with OccT at -16 and EvE at -18.
Effectively the program achieved better grids by carving out the empty regions than by building up the occupied parts. The resulting grids (the "Sky" images above are an example) contain features, such as horizontal shelves, that were not actually detected by our legacy two-eyed stereo, but appear through a process of elimination. Though an unwieldy 2.5 million cells are nominally occupied, the majority lie behind the visible surfaces and receive no color. The color projection process reaches only a manageable 100,000 cells, and is thus an excellent pruning tool. The results compel us to set OccT negative, but we are then also compelled to make grid coloring (and pruning) an integral part of our future robot navigation system, rather than just an offline learning tool.
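A sketch of the evidence accumulation and thresholding just described, with the learned values plugged in. The functional form relating stereoscopic correlation to ray weight is not spelled out above, so the power-law weighting below is a placeholder assumption, as are all names:

```python
# Sketch: evidence-ray accumulation and occupancy thresholding with the
# learned values reported above. The grid starts at 0 everywhere.
import numpy as np

EVO = 100.0     # fixed occupancy evidence at the triangulated range
EVE = -18.0     # learned empty-space evidence along the ray
OCC_T = -16.0   # learned occupancy threshold (below the initial 0)

def apply_ray(grid, empty_cells, hit_cell, corr, cor_wt=0.6):
    """Deposit one stereo range ray into the evidence grid. The mapping
    from correlation 'corr' to ray weight is NOT given in the text;
    corr ** cor_wt is a placeholder assumption."""
    w = corr ** cor_wt
    for c in empty_cells:         # cells between the cameras and the hit
        grid[c] += w * EVE        # carve out empty space
    grid[hit_cell] += w * EVO     # build up the occupied cell

def occupied_cells(grid):
    """With OccT < 0, cells never touched by any ray (still at 0) count
    as occupied; only cells carved below threshold are declared empty."""
    return grid > OCC_T
```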
We are continuing our exploration of grid building and learning. Soon we will engage a much larger and more complete set of grid-building learning parameters. We are also in the process of launching a new Cye robot with trinocular stereoscopic firewire cameras to gather a greater range of test images. It took several engineering months to get our firewire camera system software operational: the devices, drivers and related software are very new, incomplete and rapidly changing.
Ideas awaiting evaluation include variable-resolution trinocular stereoscopic matching; randomly textured light to improve stereo of uniform surfaces; coarse-to-fine path finding; and occupied-cell-list matching of grids for fast localization and feature identification (sketched below). Ongoing work will lead us to more. Case in point: a pivotal learning process that projects colors from images of a scene onto grid map interpretations to evaluate map quality is implemented and in early use. It has unexpectedly compelled us to use the coloring step as an essential part of robot navigation, not just for off-line learning.
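As one example from that list, here is a sketch of what occupied-cell-list matching for localization might look like; the exhaustive offset search and all names are our own simplification:

```python
# Sketch: occupied-cell-list matching for fast localization. Instead of
# correlating full multi-million-cell grids, compare only the short
# lists of occupied cells under candidate offsets.

def match_score(cells_a, b_set, offset):
    """Count occupied cells of map A that land on occupied cells of
    map B when A is shifted by 'offset'."""
    dx, dy, dz = offset
    return sum((x + dx, y + dy, z + dz) in b_set for x, y, z in cells_a)

def localize(cells_a, cells_b, radius=3):
    """Exhaustive small-offset search; a real system would seed this
    from odometry and refine coarse-to-fine."""
    b_set = set(cells_b)
    best, best_off = -1, (0, 0, 0)
    r = range(-radius, radius + 1)
    for dx in r:
        for dy in r:
            for dz in r:
                s = match_score(cells_a, b_set, (dx, dy, dz))
                if s > best:
                    best, best_off = s, (dx, dy, dz)
    return best_off, best
```

Because the occupied list has perhaps 100,000 entries against over 4 million grid cells, each candidate offset costs one set lookup per occupied cell rather than a full-grid correlation.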
Over the three years of the project we plan to produce a prototype for self-navigating robots that can be used in large numbers for transport, floor cleaning, patrol and other applications. The work at present is developmental only, though we have had informal discussions about future cooperative commercial work with companies including Probotics, iRobot, Cybermotion and Karcher.