Three Dimensional Images from Cheap Sonar

Hans P. Moravec
The Robotics Institute
Carnegie-Mellon University
Pittsburgh, PA 15213

December 21, 1985

@Section(Introduction)

We propose to build and use moderate resolution three dimensional space occupancy maps constructed from multiple measurements from cheap sonar sensors. By cheap sonar we mean range readings obtained from unmodified Polaroid sonar transducers driven by the original Polaroid circuit board, or by an improved board (allowing closer minimum ranges) from Texas Instruments. This is a simple but highly developed, reliable, and inexpensive system that returns the distance to the nearest reflector within a wide cone of sensitivity. Though much more information can be obtained, in principle, from single sound bursts by modifying the aperture, phase relationships, frequencies and processing, such an approach ignores the present very good solution.

@Section(Past Work)

In earlier work [Moravec & Elfes 1985] we described the use of multiple wide-angle sonar range measurements to map the surroundings of an autonomous mobile robot. A sonar range reading provides information concerning @r(empty) and @r(occupied) volumes in a cone (subtending 30 degrees in our case) in front of the sensor. The reading is modelled as probability profiles projected onto a rasterized map, where somewhere-occupied and everywhere-empty areas are represented. Range measurements from multiple points of view (taken from multiple sensors on the robot, and from the same sensors after robot moves) are systematically integrated in the map. Overlapping empty volumes reinforce each other, and serve to condense the range of occupied volumes. The map definition improves as more readings are added. The final map shows regions probably occupied, regions probably unoccupied, and unknown areas. The method deals effectively with clutter, and can be used for motion planning and for extended landmark recognition. This system was tested on our @i(Neptune) mobile robot, and recently outdoors on the @i(Terregator) robot.

@Section(Experimental Approach)

Processing a single reading from a standard unit is computationally cheap; only one number is generated, limiting the computations necessary or possible. The range accuracy of a typical reading is better than a centimeter, but because of the wide angle of the pulse, the lateral position of the reflection is uncertain to on the order of a meter. By exercising multiple units repeatedly, readings from multiple viewpoints may be combined to deduce the location of the reflecting surfaces more precisely. The combining process is a kind of deconvolution: each point in the final high resolution map is a consequence of many individual readings combined in a particular, unique way, and each reading participates in many map points.

Our existing approach uses the idea that the interior of each sonar reading cone (bounded laterally by the sensitivity profile, and lengthwise by the range surface) is known to be empty, and that the reflecting point is somewhere on the range surface of this cone. The empty interiors of other readings overlapping this range surface reduce the region of uncertainty about the location of the echoing point in a probabilistic way, while intersecting range surfaces reinforce each other at the intersections. The deconvolution is essentially non-linear.
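The flavor of the combining process can be suggested in a few lines of code. The following sketch (in C) folds one range reading into a small probability raster; the particular beam profile and independent-evidence update rule are illustrative assumptions, not the code of the actual mapping programs:

@begin(verbatim)
/* Illustrative sketch only: the beam profile and the update
   rule are assumptions, not the actual mapping code.         */
#include <math.h>

#define MAPSIZE 32                    /* 32x32 cells, as in our maps */
#define CELL    0.3                   /* meters per cell             */
#define HALF    (15.0 * M_PI / 180.0) /* half of the 30 degree cone  */

double emp[MAPSIZE][MAPSIZE];         /* evidence cell is empty      */
double occ[MAPSIZE][MAPSIZE];         /* evidence cell is occupied   */

/* Fold one reading, taken at (sx,sy) looking along heading 'pan',
   into the map: the cone interior is empty evidence, the arc at
   the measured range is (laterally diffuse) occupied evidence.   */
void add_reading(double sx, double sy, double pan, double range)
{
    int i, j;
    for (i = 0; i < MAPSIZE; i++)
        for (j = 0; j < MAPSIZE; j++) {
            double x = (i + 0.5) * CELL - sx;
            double y = (j + 0.5) * CELL - sy;
            double r = sqrt(x*x + y*y);
            double a = atan2(y, x) - pan;
            double w;
            while (a >  M_PI) a -= 2*M_PI;
            while (a < -M_PI) a += 2*M_PI;
            if (fabs(a) > HALF || r > range + CELL) continue;
            w = 1.0 - (a/HALF)*(a/HALF);      /* angular falloff */
            if (r < range - CELL)             /* interior: empty */
                emp[i][j] += w * (1.0 - emp[i][j]);
            else                              /* range surface   */
                occ[i][j] += 0.5 * w * (1.0 - occ[i][j]);
        }
}
@end(verbatim)

The update rule drives each cell's evidence monotonically toward 1 as consistent readings accumulate, which is the qualitative behavior described above; the real programs use fixed point arithmetic and more carefully derived profiles.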
The old programs work in two dimensions, collapsing the measurement cones vertically into flat pie wedges that are combined in a two dimensional map array that ultimately holds numbers giving the confidence that a given cell is empty or occupied. We have noted experimentally that maps with a range of 10 meters and a resolution of 15 to 30 cm can be reliably constructed with data from a ring of 24 robot-mounted transducers looking out horizontally at 15 degree intervals and pulsed at six locations a few meters apart in the robot's travels (144 independent measurements). The sharpness of the map visibly improves as more readings are added. Many readings are combined to form each map probability point, and this process makes our method quite tolerant of the occasional range errors encountered in the sonar data. A highly optimized version of the program, using fixed point arithmetic, can process 144 readings in roughly 1 second on a big Vax, 2 seconds on a Sun2 and 4 seconds on a Macintosh, building a 32x32 map of eight bit probabilities. A companion program correlates two such maps, using a coarse to fine hierarchy of reductions and a dual representation (raster and list of occupied cells) to search over X, Y shift and angle, in similar times. Another program devises good robot paths through the probability maps.

@subsection(3D Mapping)

Our approach generalizes very naturally to three dimensions; in fact the collapse of cones to wedges in the 2D program is its greatest single approximation, and its greatest waste of information. The sensors must be configured differently, however. The only height information in the present planar ring comes from the vertical divergence of the cones of sensitivity, whose symmetry makes it impossible in principle to distinguish reflections from above the ring plane from those an equal distance below the plane. Even without this ambiguity, the present arrangement could provide very little vertical resolution. An arrangement of sensors on the surface of a partial sphere would be much better.

The 15 degree spacing of the 24 sensors on the planar ring was chosen to give some overlap of fields of view. It was discovered that this spacing allowed multiple sensors to be fired simultaneously without serious interference, in three, or even two, interleaved banks, greatly speeding data gathering. Using the same idea and spacing to fill a sphere instead of a circle leads to the following calculation. A sphere represents 4@g(p) of solid angle. Spacing the sensors 15 degrees from each other assigns a cone with a 15 degree apex to each sensor. A cone with apex angle T subtends 2@g(p)(1-cos(T/2)) of solid angle, so we can (glossing over packing problems) arrange about 4@g(p)/(2@g(p)(1-cos(T/2))) = 2/(1-cos(T/2)) of them into a sphere. With T = 15 degrees, about 233 transducers fill a sphere. If we content ourselves with a 90 degree wedge (almost a fisheye if you note that the beams fan out an additional 15 degrees on all edges, for a net coverage of 120 degrees) then this is reduced to a more manageable 34 transducers.

If actually packed onto a spherical cap, the sensor group would greatly resemble a compound insect eye, each facet being a Polaroid transducer. The insect would be a monster. The transducers are somewhat less than 5cm in diameter, which would demand a sphere radius of about 40cm. A 90 degree cap from this sphere would be a shallow bowl 56cm in diameter and 12cm deep.
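The packing and cap-geometry figures above are easy to check mechanically; a short C program (the 40cm radius is the figure assumed above):

@begin(verbatim)
/* Checks the transducer packing and cap geometry quoted above. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double T = 15.0 * M_PI / 180.0;  /* apex angle per sensor      */
    double W = 90.0 * M_PI / 180.0;  /* angle of the wedge ("eye") */
    double R = 0.40;                 /* assumed sphere radius, m   */

    double per    = 2*M_PI * (1 - cos(T/2));       /* sr/sensor      */
    double sphere = 4*M_PI / per;                  /* 2/(1-cos(T/2)) */
    double wedge  = 2*M_PI * (1 - cos(W/2)) / per; /* 90 degree cap  */

    printf("full sphere: %.1f sensors\n", sphere); /* 233.8 */
    printf("90 deg eye:  %.1f sensors\n", wedge);  /*  34.2 */
    printf("cap: %.1fcm wide, %.1fcm deep\n",
           200*R*sin(W/2), 100*R*(1 - cos(W/2)));  /* 56.6, 11.7 */
    return 0;
}
@end(verbatim)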
One such sensor array on the nose of a vehicle, tilted down somewhat, should be adequate for many tasks, but imagine getting better side coverage, say for obstacle avoidance, by placing two of them, one on each side of the head, enhancing the giant insect effect.

@subsection(How Many Readings, How Much Computation?)

The 3D map we hope to derive from this array has more cells than the 2D maps we have worked with, and will require more data. How much? Suppose we build our maps to a range of about 10 meters in the vehicle forward direction, 5 meters laterally and 3 meters in the vertical direction, and to a resolution of 30cm in each direction. There will be 33x17x10 cells, each holding a number, in the final map. This is 5,610 numbers. A naive degrees of freedom analysis suggests that a similar number of readings, each returning one number, is necessary to determine this many variables. Fortunately our 2D experience suggests that far fewer will suffice.

We have noted experimentally that 144 readings nicely spaced around our cluttered laboratory are just enough to give us good 32 cell by 32 cell maps covering a square area 10 meters on a side. There are 1024 points in such maps, so we seem to be accomplishing the impossible, extracting 1024 unknowns from 144 equations. Actually, the 1024 numbers are not very informative, as their magnitude represents our certainty (or uncertainty) about particular cells being occupied, not something intrinsic about the scenery. Most of the cells in the final map are labelled an unsurprising "unknown" (represented by 0) or "probably empty" (represented by a negative number). The real information is concentrated in the locations of the reflecting boundary seen by the robot, i.e. the minority of cells labelled "probably occupied". To first approximation this boundary is a one dimensional contour embedded in the 2D map. Its length in cells is on the order of the boundary length of the map, 4x32. The information is not in the contents of these cells (positive probability numbers), but in their location. Each cell represents about one number: think of the boundary expressed in polar co-ordinates; the information is in the radius at each angle, the angle itself being just the independent variable. So we have 144 equations to determine about 4x32 = 128 variables, just about right! Mathematics is great.

In 3D the contour becomes a surface. In our example of two paragraphs ago the map size was 33x17x10 cells. The surface of this volume has about 2,100 cells, and thus requires about 2,100 readings by the above analysis, or 62 full scans of the 34 transducers in the 90 degree eye (this arithmetic is collected in the sketch at the end of this subsection). The sensors can be pulsed about twice per second. With two way interleaving, a full eye poll takes a second. The 62 scans would thus take about a minute. Computation times on a big Vax, extrapolating from the fast 2D program, would also be about 30 seconds to a minute. It is assumed that the robot travels about ten meters during this minute (a speed of 0.6 km/hr) to give each reading set a fresh vantage point, and that adequate dead reckoning is provided to correctly relate the 62 sets geometrically.

Of course, lower resolution maps, or simple obstacle detection, can be accomplished faster, in as little as one (half second) pulse gathering period. These numbers suggest that high speed travel is best left to longer range sensors, and perhaps simpler techniques. The sonar mapping could be very useful for slow, close-in, tight maneuvering in complicated environments and on very rough ground.
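Collecting the readings-and-cells arithmetic of this subsection in one place (the surface count is the 2(xy+xz+yz) faces of the map volume):

@begin(verbatim)
/* Collects the degrees-of-freedom bookkeeping from the text. */
#include <stdio.h>

int main(void)
{
    int nx = 33, ny = 17, nz = 10;  /* 10m x 5m x 3m at 30cm   */
    int cells   = nx * ny * nz;                 /* 5610        */
    int surface = 2*(nx*ny + nx*nz + ny*nz);    /* 2122, ~2100 */

    /* 2D case: 32x32 map, information lives on its boundary   */
    printf("2D: %d cells, boundary ~%d cells, 144 readings\n",
           32*32, 4*32);
    /* 3D case: information lives on the reflecting surface    */
    printf("3D: %d cells, surface ~%d cells\n", cells, surface);
    printf("    about %.0f full polls of the 34 transducer eye\n",
           surface / 34.0);                     /* ~62         */
    return 0;
}
@end(verbatim)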
The very general path passability grid route planners demonstrated by the group extend in a natural way to the dense 3D data this approach will provide.

@Section(Research Plan)

All our sonar experiments so far have been conducted with early prototype sonar rings provided by our sometime collaborator, Denning Mobile Robotics, Inc. of Woburn, Massachusetts. Because of a rather old fashioned (small buffer) serial interface on our Vax computers, the processors on these rings can't reliably communicate with the Vaxes in the present configuration, and this has been a serious hindrance to sonar experimentation. We will begin the work by building new interfaces for the transducers using Texas Instruments driver boards funneling into an MC68000 microprocessor. Denning has agreed to help in this effort; they have been using a TI board based design successfully for six months.

A second stage is the design and construction of the physical array. This will require a mathematical optimization and an evaluation by simulation of the individual sensor placements.

The bulk, and point, of the work will be an extended series of experiments with 3D map building and navigation programs. One small but interesting subproblem in the early stages is 3D raster fill of conically bounded sphere surfaces and volumes (a first cut appears at the end of this section). A more significant problem is the handling of position uncertainty in the measurements made during an extended run. Our probability raster permits a very direct representation for uncertainty: it can simply be folded into the probability distribution of each reading, increasing the spread of the reading in the appropriate directions. We'd like to try an approach that projects the incremental uncertainty of each move onto old measurements rather than new ones. The result would be a map that is always detailed for the local area around the vehicle, and fades to fuzziness in the distance under the cumulative effect of errors. Very old readings that provide almost no information because of uncertainty in their location could eventually be eliminated from the mapmaking.

The three dimensional nature of the images will permit some work in identification of large objects. Recognition of small objects is ruled out by the coarseness (about 10cm) of the anticipated maps.
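As a concrete starting point for the raster fill subproblem mentioned in the plan, the following brute-force sketch (in C) marks the cells of a 3D grid swept by a single sonar cone. The grid dimensions, names and cell markings are illustrative assumptions; a real version would accumulate probability profiles, as in the 2D programs, and would enumerate only the cells the cone actually touches:

@begin(verbatim)
/* Brute-force 3D raster fill of one sonar cone.  The grid
   size, names and cell markings are illustrative only.      */
#include <math.h>

#define NX 33
#define NY 17
#define NZ 10
#define CELL 0.3                    /* meters per cell            */
#define HALF (15.0 * M_PI / 180.0)  /* half of the 30 degree cone */

char grid[NX][NY][NZ]; /* 0 unknown, 'E' empty, 'O' occupied */

/* Sensor at s[], looking along unit vector d[], range r meters. */
void fill_cone(double s[3], double d[3], double r)
{
    int i, j, k;
    for (i = 0; i < NX; i++)
     for (j = 0; j < NY; j++)
      for (k = 0; k < NZ; k++) {
          double v[3], len, along;
          v[0] = (i + 0.5)*CELL - s[0];
          v[1] = (j + 0.5)*CELL - s[1];
          v[2] = (k + 0.5)*CELL - s[2];
          len   = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
          along = v[0]*d[0] + v[1]*d[1] + v[2]*d[2];
          if (len == 0.0 || along < len * cos(HALF))
              continue;                   /* outside the cone */
          if (len < r - CELL)
              grid[i][j][k] = 'E';        /* cone interior    */
          else if (len < r + CELL)
              grid[i][j][k] = 'O';        /* range surface    */
      }
}
@end(verbatim)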