TOWARDS AUTOMATIC VISUAL OBSTACLE AVOIDANCE Hans P. Moravec, Stanford University This report describes ongoing research on a working system which drives a vehicle through cluttered environments under computer control, guided by images perceived through an onboard tv camera. The emphasis is on reliable and fast low level visual techniques which determine the existence and location of objects in the world, but do not identify them. Included are an interest operator for choosing distinctive regions in images, a correlator for finding matching regions in similar images, a camera solver which determines camera displacement and distance to objects from stereo information [Gennery, D.B., this Proceedings] and an automatic geometric distortion corrector for camera nonlinearities. Many of these use pictures reduced in linear dimension by powers of 2 by summation of pixels. Other operators are a high pass filter, a point noise remover, a contrast normalizer, a vertical roll corrector, a picture comparator and an operator for reducing pictures by other than powers of two. Our hardware includes an electric vehicle, called the cart, remotely controlled over a CB radio link by a PDP-KL10. It carries a b/w tv camera whose picture is broadcast over a UHF channel, and occasionally digitized by the computer. It has motors for the wheels, steering and camera pan. Each can be made to run forward or backward. There are potentiometers on the steering and pan which enables them to be commanded to point straight ahead. Budgetary and personnel limitations have resulted in crude mechanical arrangements. The motor speeds are poorly regulated, and video is the only feedback to the computer. Dead reckoning errors are about 30%. Our small resources have been spent gaining software experience before undertaking serious hardware work. In my opinion our major hardware limitation is one shared by all other vision work, and AI in general, namely a critical shortage of raw processing power. For instance it would take about 100,000 efficiently programmed PDP-10's to match the human visual system. Results Early versions of the routines described below were used in a program which drove the vehicle in straight lines or uniform arcs. It acquired and tracked distant features, using their motion from frame to frame to build up a model of vehicle response, and to servo on the desired path. With the cart outdoors on a dirty road, it worked well. Ten runs of about 60 steps were completed. The runs were usually terminated by serious hiccups of the radio control link. Each step took the cart 2 feet forward, and used 30 compute seconds. The radio link has since been much improved. Several runs involving the distortion corrector, camera solver and new versions of the interest operator and correlator have been completed. The new program trys to determine the distance to the features by applying the camera solver after tracking them through several images. The performance is poor. The camera solver results are erratic, seemingly due to the degenerate nature of the solution. Objects lying near the camera axis (most of the scene) provide no depth information. Next We are trying a new approach, replacing the camera pan mechanism with one which provides 21 inches of side to side motion, in three 7 in. steps. This should provide adequate parallax, and also close spacing to make the correlations easy. Since the camera motion parameters will be known the correlation searches become one dimensional, and an absolute scale factor is known. The camera solving is also easy. The idea is to locate nearby features in 3D at each vehicle stop. The vehicle motion can be found from the apparent feature motions between stops. The location of the ground can be deduced from the camera height and orientation. Interest Operator This routine is used to acquire new features in a scene. It selects a relatively uniform scattering, to minimize the probability of missing important obstacles, and chooses distictve areas for unambiguous correlation. This is achieved by returning regions which are local maxima of a directional variance measure. Featureless areas and simple edges (which have no variance in the direction of the edge) are thus avoided. Directional variance is measured over small square overlapping windows of specified size (typ. 4x4 to 8x8). Sums of squares of differences of pixels adjacent in each of four directions (horizontal, vertical and two diagonals) over the window are obtained. The variance of the window is the minimum of these four sums. The operator is applied to a reduced version of the picture, where the specified window size shrinks to 2 or 3 pixels. Noise sensitivity is reduced and speed increased. Partly hand coded, the routine takes 75 ms for a 260x240 image, with 8x8 windows. Binary Search Correlator Given a feature in one picture, the correlator attempts to find the matching region in another image. It takes the position in the first picture, a rectangular search area (often the whole image) in the second picture, and a feature window size n. The search uses a coarse to fine strategy, which begins in reduced versions of the pictures. The order of reduction is chosen to shrink the smaller dimension of the search rectangle to between n and 2n pixels. An n by n window in the shrunken source image, centered on the desired feature, is considered. It covers about 25% of this tiny version of the picture. A correlation coefficient is calculated for each possible placement of this window on the search area. For a search area exactly 2n by 2n, there are (n+1)^2 positions. The one with the highest coefficient becomes the search area for the next level of refinement. This is repeated with pictures reduced one step less, i.e. linearly twice as large. An n by n window is again centered around the location of the feature, and is searched for in the best matching window from the previous search, which expands to 2n by 2n at the new reduction. This goes on in successively larger versions of the pictures until an n by n window is matched in the unreduced images. There are about log2(w/n) searches in all, where w is the smaller dimension of the search rectangle in the unreduced picture. This approach has advantages over a simple pass of a correlation coefficient. It needs only 1/150 the number of pixel comparisons to find an 8x8 window in a 256x256 picture (smaller advantage for smaller searches). The simple method comparisons are without context, and a match may be found in totally unrelated parts of the image. In our technique coarse structure guides the higher resolution comparisons, and further speedup is possible because smaller windows work. The searches at coarse levels rarely fail, possibly because noise and distortions are reduced by reduction. The correlation measure used, designed to have limited contrast sensitivity, was obtained by multiplying the normalized correlation coefficient by twice the cosine of the angle with the line a=b. It is: 2 Sum ab / (Sum a^2 + Sum b^2) Normalized correlation is the sum of the pairwise products of a and b divided by the geometric mean of the sum of their squares. The new measure, referred to as pseudo-normalized correlation, is the sum of the products divided by the arithmetic mean of the sums of the squares. By in-line coding the source window and using a table of squares the bulk of the correlation is done in 3 instructions per pixel comparison. An 8x8 window is found in a 260x280 area in 75 ms. The error rate is 10% on interest operator selected features. Typical image pairs are taken two feet apart with a 60 degree lens. Scale Changes As the vehicle moves the image it sees changes. The major element of this transformation is an enlargement of nearby objects. We have tried correlating across images reduced by different geometric scale factors by generating pictures 2^2/3 as large as each of the binary steps. We obtain effective scale changes of 1, 2^1/3, 2^2/3 and 2 by comparing various combinations of reductions of the first and second images. The results are disappointing. The method often introduces as many new errors as it corrects. Experiments in applying it more selectively are planned. Camera Distortion Correction Electron optics tend to have geometric distortions undesirable when using a camera as a measuring instrument. We have written a camera calibration program which is given an image of a square array of black spots on a white background, and told the array to lens center/spot spacing distance ratio. It computes a polynomial for transforming feature image positions accurately to angle in space. It tolerates a wide range of image sizes (3 to 12 spots across) and illumination and arbitrary rotation. After intense fiddling with a training set of 20 images, it has worked without error on 80 widely differing new images. Our test pattern is a ten foot square painted on a wall, with two inch spots at one foot intervals. The algorithm gets an image of such an array, and finds four major peaks in the magnitude of the fourier transform of a reduced version of it, to find its rotation and spacing. The interest operator is used to find a starting spot, and a special operator, which does local thresholding and finds centroids and moments of black areas, pinpoints all the spots, guided by the rotation/spacing information. A fourth degree least squares polynomial in two variables relating the actual to the ideal position of the spots is then generated. Acknowledgement: This work was supported in part by contracts and grants from ARPA, NASA and NSF.