DARPA MARS program research progress

Project Title: Robust Navigation by Probabilistic Volumetric Sensing

Organization: Carnegie Mellon University

Principal Investigator: Hans Moravec

Date: January 20, 2000

Technical Report


We are engaged in a 100% effort to develop laboratory prototype sensor-based software for utility mobile robots for industrial transport, floor maintenance, security, etc., that matches the months-between-error reliability of existing industrial robots without requiring their expensive worksite preparation or site-specific programming. Our machines will navigate employing a dense 3D awareness of their surroundings, be tolerant of route surprises, and be easily placed by ordinary workers in entirely new routes or work areas. The long-elusive combination of easy installation and reliability should greatly expand cost-effective niches for mobile robots, and make possible a growing market that can itself sustain further development.


Our system is being built around 3D grids of spatial occupancy evidence, a technique we have been developing for two decades [A2]. 2D versions of the approach found favor in many successful research mobile robots [A0], but seem short of commercial reliability. 3D, with 1,000 times as much world data, was computationally infeasible until 1992, when we combined increased computer power with a 100x speedup from representational, organizational and coding innovations. In 1996 we wrote a preliminary stereoscopic front end for our fast 3D code, and the gratifying results convinced us of the practicability of the approach, given about 1,000 MIPS of computer power (see overview figure). We work to parlay that start into a universally convincing demonstration, just as the requisite computing power arrives.

The work has three natural stages: completion and improvement of the basic perception code; creation of an identification layer for navigation and recognition of architectural features; finally, sample application programs that orchestrate the other abilities into practical behaviors like patrol, delivery and cleaning. We need both capability and efficiency. The existing code allows one-second time resolution with 1,000 MIPS, but our 3D maps have millions of cells, and straightforward implementations of path planning, localization, object matching, etc. would be much slower. We will combine techniques like coarse-to-fine matching, selective sampling and alternate representations to get the desired results at sufficient speed.

Ideas awaiting evaluation include variable resolution trinocular stereoscopic matching; optimization of sensor evidence profiles and other system parameters using robot images of the scene to first naturally colorize the occupied cells in a grid, then to check its faithfulness, to close the learning loop; randomly textured light to improve stereo of uniform surfaces; coarse-to-fine path finding; occupied-cell-list matching of grids for fast localization and feature identification; and others. Ongoing work will lead us to more.

Accomplishments for December 1999

Our stereo vision algorithms exploit "rectified" images for stereoscopic match search efficiency and geometric accuracy.  Raw images from real cameras that are slightly mispointed, with rasters mechanically offset and containing lens distortions, are transformed by camera-specific geometric rectification functions, encoded in image-sized lookup tables, into images with precise projective geometry.  The utilitarian job of deriving these rectification functions is handled by a program called flatfish (because, among other things, it "flattens" fisheye distortion).
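The rectification step itself is simple once the camera-specific lookup tables exist. The following is a minimal sketch of the idea, not flatfish's actual file format or code: the tables give, for each rectified pixel, the raw raster coordinates to sample (the `rect_map` structure and names are illustrative assumptions).

```c
/* Hypothetical sketch: apply a camera-specific rectification map.
   For each output pixel, lookup tables give the source coordinates
   in the raw image.  Structure and names are illustrative, not
   flatfish's actual calibration-file layout. */
typedef struct {
    int width, height;          /* rectified image size */
    const int *src_x, *src_y;   /* width*height source coordinates */
} rect_map;

void rectify(const rect_map *m,
             const unsigned char *raw, int raw_w, int raw_h,
             unsigned char *out)
{
    for (int i = 0; i < m->width * m->height; i++) {
        int x = m->src_x[i], y = m->src_y[i];
        /* pixels that map outside the raw raster come out black */
        out[i] = (x >= 0 && x < raw_w && y >= 0 && y < raw_h)
                   ? raw[y * raw_w + x] : 0;
    }
}
```

Because the tables are computed once per camera, the per-frame cost onboard the robot is a single table-driven resampling pass.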

In the last months of 1999 we greatly improved a version of flatfish originally written in 1996.  Many changes and additions were made to greatly extend the range of calibration images it could handle correctly and automatically.  The largest of these, iterative incremental rectification, now allows the program to correctly rectify true fisheye lens images, with fields of view in excess of 120 degrees.  Previously the usable FOV had been limited to about 100 degrees.

Flatfish is run under supervision once for each robot camera.  The functions it produces are then used repeatedly onboard the robot to rectify each camera image.  The program assumes that image imperfections arise from small camera and target mounting misalignments and from  a lens distortion that is radially symmetric about some, initially unknown, center.  These assumptions are mostly correct for solid-state cameras, but would not be true for electron-gun cameras, whose scanning rasters often contain non-radial distortions.

Flatfish works with images of a test pattern.  The camera fixture is leveled and positioned, with reasonable accuracy, perpendicular to and a known distance in front of a screen patterned with a precise square array of about 400 black spots on a white background, with one additional spot at a half-grid position at the center serving as an absolute position reference.

Figure 1:  Robot with three 90 degree field of view cameras, aligned on rectification test pattern

The program inputs an image of the calibration spot pattern, an aim point on the pattern, the measured distance between the lens center and the pattern, and the desired angular field of view of the rectified image.  It produces a file of calibration parameters which contains the necessary information to rectify images from the same camera to these specifications.  The lens center (where the pinhole would be, if it were a pinhole camera) is conventionally taken to be at the distance of the lens iris ring.  To determine the position without guessing, a helper program scalefish takes two calibration files produced from the same camera at different distances from a test pattern, with other settings including guessed lens center unchanged.  By relating the image scale change reflected in the calibration files to the distance change (which must be precisely given), scalefish deduces the actual lens center position, which can then be used in later runs of flatfish with that type of lens.
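The geometry behind scalefish can be sketched in a few lines.  Under a pinhole model the image scale varies inversely with the true lens-center distance D = d + c, where d is the measured distance to the pattern and c is the unknown offset of the pinhole from the measurement reference.  Two calibrations then give s1(d1 + c) = s2(d2 + c), which solves for c.  The function below is our illustrative reconstruction of that relation, not scalefish's actual code:

```c
/* Sketch of scalefish's geometry (illustrative, not the actual code):
   pinhole projection makes image scale s inversely proportional to the
   true lens-center distance d + c, where d is the measured distance and
   c the unknown offset of the pinhole from the measurement reference.
   Equating s1*(d1+c) = s2*(d2+c) and solving gives c. */
double lens_center_offset(double s1, double d1, double s2, double d2)
{
    return (s2 * d2 - s1 * d1) / (s1 - s2);
}
```

For example, if the true pinhole sits 5 units behind the reference, calibrations at measured distances 100 and 200 yield scales proportional to 1/105 and 1/205, and the formula recovers the 5-unit offset.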

To calibrate stereoscopic camera sets, all are aimed at the same calibration pattern, as in figure 1. The flatfish program is run for each resulting image, with only the aim point on the calibration pattern specified differently for each calibration.  The aim point for each camera defines the rectified view direction, and the separation of the two aim points on the pattern, vertically and laterally, must be equal to the physical separation of the cameras.

Flatfish begins by estimating the density of spots expected from the setup description, then applies a "spot operator" to identify round dark spots in the image.  At each pixel the spot operator applies templates at a range of sizes, each consisting of a round region and its annular surround.  The "spot value" is the difference in mean lightness between the exterior and the interior regions minus terms proportional to the nonuniformity of brightness within the interior and exterior regions.  An adjacent spot-linking process connects the spots into a grid to give each a "spot coordinate", for instance three spots up and two to the right from a starting spot.  The program then guesses an optical center (initially the center of the image raster), and calculates a radius from this center to each spot, in both pixel and spot coordinates.  We show the result as a scatter diagram: a point for each spot, positioned horizontally according to its spot radius and vertically by its pixel radius.  The shape of the plot gives the radial lens distortion; the scatter gives an indication of the error made in the choice of center.  The program then adjusts the assumed optical center (and any additional parameters, currently pixel aspect ratio) to minimize the scatter.  A function with several parameters fit to the scatter plot defines the radial lens distortion, which is then eliminated.  The program then determines first-order corrections to remove the effects of camera aiming errors in azimuth, elevation and roll, as well as scaling and recentering the result to the desired field of view.

Examples of the program's operation on images from cameras with fields of view between 30 and 90 degrees follow.  This is an undemanding regime within the scope of the old program, though the new program works much more automatically and reliably.

90 degree field of view example, from left camera of a stereo pair

60 degree field of view example, from right camera of a stereo pair

30 degree field of view example, from center-aimed camera

The original program managed to rectify correctly only about 100 degrees of the field of view of very wide angle cameras, because the spot finder could not recognize spots distorted into extreme ellipses at the edges of the field of view.  The new program addresses this by iterating the rectification steps.  The first iteration locates spots only in the innermost approximately 100 degrees of the field of view, but extrapolates the rectification roughly to wider angles.  Usually all the spots in the resulting rectified image are then recognizable, and a second iteration is accurate to the image edges.  This process works only if the scatter fit function extrapolates reasonably.  We found that polynomials and other generic functions do not, tending instead to rise or fall precipitously just beyond the range of the fitted points.  We found a very good solution by considering the lens designer's problem of providing equal illumination across the field of view, which can be achieved by assigning every pixel an equal solid angle of view.  With radial symmetry, this demands that the pixel radius be proportional to a specific arctangent of the spot (or projective) radius.  We installed this arctangent as the first-order basis function for the least-squares fit.  A linear function, modeling a projective relation, was the second term, and subsequent functions were successive powers of these arctangent and linear functions.  With this basis the extrapolations are often nearly correct quantitatively as well as qualitatively.  The program iterates repeatedly, increasing the degree of the fit function, until the number of found spots and the scatter both fail to improve, or the function coefficients grow too large, indicating an incipient singularity.

Here is an example from a 120 degree image, with vignetting because the camera was fitted with a lens intended for a different camera with a smaller sensor.

120 degree field of view, multiple iterations.

We found the arctangent "uniform illumination" function to be a good representation of distortion in all the lenses we encountered.  This led us to try applying it pre-emptively, before any spot-finding.  Since we do not know the exact image scaling, positioning or rotation in advance, such pre-rectification is bound to produce images with slight scale, position and rotation errors.  Still, spots are likely to be rounder and more uniform in size in the pre-rectified than in the raw images, making flatfish's first iteration easier.  We found this was in fact the case, and flatfish now pre-rectifies unless otherwise instructed.  Here are some examples of pre-rectified calibrations:

120 degree field of view, with pre-rectification

30 degree field of view, with pre-rectification

60 degree field of view, with pre-rectification

90 degree field of view, overcorrected by pre-rectification

The current version of the program can be found here: flatfish.c

In other developments, the G4 and iBook computers, linked by wireless network, that will control our first robot testbed have arrived, as have three QuickCam Pro cameras to be mounted on it.  We delayed ordering a robot chassis, and have now decided to use the Probotics Cye platform, with a special laptop trailer for the iBook, which can do the job of collecting stereoscopic imagery for less than $1,000 per platform.

Graduate student Martin Martin, working towards his thesis, about one year from completion, has integrated monocular vision, using an optical-flow approach, with sonar range sensing and omnidirectional robot control towards a program that evolves high-level control behaviors. The work is being conducted using our existing Uranus mobile robot, controlled via a long cable by a desktop dual-Pentium machine running the Be operating system. Though the details are different, the approach foreshadows the development system we will use for the main work.

Current Plan

We are now developing a first version of the code that projects color from stereoscopic images onto the occupied cells of 3D occupancy grids derived from the same images. This is a first step towards a learning process that will automatically optimize the various parameters in the system. Other camera images will be compared with corresponding views of such colorized grids to estimate the quality of the grids. The goal of the learning is to maximize this quality.  We will report the results at the next meeting.
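The core operation in the colorizing step is projecting an occupied cell's center into a camera image to fetch the pixel color there.  A minimal sketch, assuming a simple pinhole camera model with focal length and principal point (the structure and names are illustrative, not our actual code):

```c
/* Hypothetical sketch of the colorizing projection: map a 3D point in
   camera coordinates to image coordinates with a pinhole model.  The
   camera structure and names are illustrative assumptions. */
typedef struct { double f, cx, cy; } pinhole;

/* Returns 1 and writes (u,v) if the point lies in front of the camera,
   0 otherwise. */
int project(const pinhole *cam, double x, double y, double z,
            double *u, double *v)
{
    if (z <= 0.0) return 0;        /* behind the lens: no projection */
    *u = cam->f * x / z + cam->cx;
    *v = cam->f * y / z + cam->cy;
    return 1;
}
```

Run over every occupied cell and every rectified image, this assigns each cell the colors of the pixels it projects to; rendering the colorized grid from another camera's viewpoint and comparing against that camera's actual image then gives the quality measure the learning loop will maximize.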

At that time, or perhaps the following meeting, we will also be able to report on genetic algorithm learning of strategies for combining sensor and control techniques for more effective control of robot mobility.  This is the thesis work of Martin Martin.

Technology Transition

Over the three years of the project we plan to produce a prototype for self-navigating robots that can be used in large numbers for transport, floor cleaning, patrol and other applications. The work at present is developmental only, though we have had informal discussions about future cooperative commercial work with companies such as Cybermotion, Karcher, iRobot and Probotics.