
Figure 1: Progress in Robot Spatial Awareness: By 1980
the Stanford Cart had (sometimes, slowly) managed to negotiate
obstacle courses by tracking and avoiding the 3D locations of a few
dozen object corners in the route ahead. The top panel shows the
Cart's view of a room, superimposed with red dots marking points its
program has selected and stereoscopically ranged. The resulting 3D
map at the right shows the same points, with diagonal stalks
indicating height, and a planned obstacle-avoiding path. (Labels were
added by hand.) The program updated the map and plan after each meter of
travel. The sparse maps were barely adequate, and blunders occurred
every few tens of meters.
The second panel shows a dense 2D grid map of 150 meters of corridor
produced in 1993 by a program written by Barry Brummitt, controlling Carnegie
Mellon's Xavier robot via a remote Sparc 2 workstation. The sensor was
a ring of sonar rangefinders, whose interpretation was automatically
learned. In the map image, evidence of occupancy ranges from empty
(black) through unknown (grey) to occupied (white). Regular
indentations marking doors are evident, as are bumps where cans, water
coolers, fire extinguishers, poster displays, etc. protrude. The
curvature is an artifact of dead-reckoning error.
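
The black-through-grey-to-white rendering described above can be sketched as follows. This is only an illustration, not the program that produced the map: it assumes each cell stores its accumulated evidence as a log-odds value (zero meaning "unknown"), and the function name `evidence_to_gray` and the clamping limit are hypothetical.

```python
import math

def evidence_to_gray(log_odds: float) -> int:
    """Map a cell's occupancy evidence to an 8-bit grey level.

    Assumption: evidence is stored as log-odds, so 0.0 means "unknown",
    large negative values mean "empty", large positive mean "occupied".
    """
    # Clamp so math.exp cannot overflow on extreme evidence values.
    clamped = max(-30.0, min(30.0, log_odds))
    probability = 1.0 / (1.0 + math.exp(-clamped))
    # 0 = black (empty), 128 = grey (unknown), 255 = white (occupied).
    return int(probability * 255 + 0.5)

# Hypothetical evidence values for three cells.
empty_cell = evidence_to_gray(-20.0)
unknown_cell = evidence_to_gray(0.0)
occupied_cell = evidence_to_gray(20.0)
```

Under these assumptions, cells with no evidence either way come out mid-grey, and grey levels saturate toward black or white as evidence accumulates.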
The last panel shows work in progress. As with the Cart, the left
image is a robot's eye view of a scene. The right image, though
resembling a fuzzy photograph, is actually a perspective view of the
occupied cells of a 3D map of the scene, built from about 100,000
range measurements extracted from 20 stereoscopic views similar to the
one on the left. The grid is 256 cells wide by 256 deep by 128 high,
covering 6x6x3 meters, so each cell is about 2.3 cm on a side. Of the
roughly eight million total cells, about
100,000 are occupied. The realistic colors of the occupied cells are a
side effect of a learning process. The shapes of the evidence patterns
corresponding to stereoscopic range values, among other system
parameters, are tuned automatically to make the best grids. A
candidate grid is evaluated by "projecting" colors from the original
images onto the grid's occupied cells from the appropriate directions.
Each cell in a perfect grid would collect colors from different views
of the same thing in real space. Since most objects show the same
color from different viewpoints, the various colorings of each single
cell would agree with one another. Incorrect extra cells, however,
would intercept many disparate background colors from different points
of view. Conversely, colors of incorrectly missing cells would be
"sprayed" across various background cells, spoiling their uniformity.
The learning program tunes the system to minimize total color
variance. The maps so far are ragged around the edges, and many
promising improvements remain to be tried, but the results are very
encouraging. Compare the richness of the 3D maps in the
first and third panels. Both were produced by processing about 20
stereoscopic image sets: the 1980 result on a 1 MIPS DEC KL-10
mainframe computer with 500 kilobytes of memory, the 2000 result on a
1,000 MIPS Macintosh G4 with 500 megabytes of memory.
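
The color-variance scoring used to evaluate a candidate grid can be sketched roughly as below. This is a simplified illustration, not the actual learning program: it assumes the projection step has already been done, so each occupied cell simply arrives with the list of RGB colors it collected from the views that see it. All names (`cell_color_variance`, `total_color_variance`, the sample grids) are hypothetical.

```python
def channel_variance(values):
    """Population variance of one color channel across views."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cell_color_variance(colors):
    """Disagreement among the RGB colors one cell collected."""
    if len(colors) < 2:
        return 0.0  # a cell seen from fewer than two views cannot disagree
    # zip(*colors) regroups the (r, g, b) tuples into three channels.
    return sum(channel_variance(channel) for channel in zip(*colors))

def total_color_variance(projected):
    """Score a candidate grid; lower means the views agree better."""
    return sum(cell_color_variance(c) for c in projected.values())

# Hypothetical data: cell index -> colors projected onto it from 3 views.
red = (200, 30, 30)
good_grid = {(10, 4, 2): [red, red, red]}  # same surface seen each time
bad_grid = {
    (10, 4, 2): [red, red, red],
    # A spurious extra cell intercepts unrelated background colors.
    (3, 7, 5): [(200, 30, 30), (20, 20, 220), (240, 240, 240)],
}

candidates = {"good": good_grid, "bad": bad_grid}
best = min(candidates, key=lambda name: total_color_variance(candidates[name]))
print(best)  # prints "good"
```

In this toy case the grid with the spurious cell scores worse because that cell's colors disagree. A tuning loop would generate candidate grids under different parameter settings and keep the setting with the lowest total.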