Cart Project Progress Report

Donald Gennery
Hans Moravec

Stanford University
September 22, 1976

CONTENTS

Introduction
Interest Operator
Binary Search Correlator
Correlation Coefficients
Normalized Correlation
Pseudo-normalized Correlation
Timing
Scale Changes
High Resolution Correlator
Camera Calibration
Results

Introduction

This report describes work towards autonomous vehicles able to maneuver through cluttered environments. The emphasis thus far has been on visual techniques which locate the clutter in three dimensions and avoid it, but do not identify its nature.

Our hardware is an electric vehicle, remotely controlled over a citizens band radio link by a PDP-KL10. It carries a black and white television camera whose picture is broadcast over a UHF channel, and received and occasionally digitized by the computer. The vehicle has drive motors for the rear wheels, a steering motor coupled to a steering bar arrangement on the front wheels, and a motor controlling the camera pan angle. Each can be commanded to run forward or backward. Potentiometers on the steering and pan linkages enable them to be commanded to point straight ahead. The mechanical arrangements are crude, the motor speeds are unregulated, and there is no feedback to the computer other than the video from the camera. Overall dead reckoning errors are on the order of 30%. Most of the computer controlled runs so far have been with the camera pointed straight ahead and the vehicle moving forward in two foot increments, punctuated by long pauses during which the images are processed.

The key elements in the vision software are three primitive picture operators and a geometric camera solver. The "interest operator" locates small patches, called features, scattered more or less uniformly over images and having the property that the corresponding points are likely to be unambiguously findable in subsequent images.
The "binary search correlator" takes features in one image and attempts to find them in subsequent images, altered slightly by vehicle motion. Both operators make extensive use of versions of the pictures reduced by factors of 2, 4, 8, 16 etc. in linear dimension. The "high resolution correlator" takes approximate matches of points in pairs of images and produces matches of greater accuracy, especially in areas of the picture with low signal to noise ratio. It also produces an accuracy estimate. The "camera solver" is given the locations of at least five corresponding features in two pictures, and deduces the relative camera positions and orientations and the three dimensional locations of the features.

Peripheral routines include a camera calibrator, which computes a two dimensional polynomial for correcting the distortions of a camera from a picture the camera takes of a screen containing a square array of black spots on a white background. Other operators sometimes used are a high pass filter, a high frequency noise remover, a histogram normalizer, a vertical sync loss corrector, a picture comparator and an operator which reduces pictures by other than powers of two.

Interest Operator

This routine is used to acquire new features in a scene. It tries to select a relatively uniform scattering, to minimize the probability that important obstacles will be missed, and also attempts to choose areas which can be found easily by the correlator. Both goals are achieved by returning regions which are local maxima of a directional variance measure. Featureless areas and simple edges (which have no variance in the direction of the edge) are avoided by this process.

Directional variance is measured over small square windows whose size is chosen by the user. Sums of squares of differences of pixels adjacent in each of four directions (horizontal, vertical and two diagonals) over the window are obtained. The variance of the window is the minimum of these four sums.
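The measure just described can be sketched as follows. This is our own illustrative Python, not the cart's FAIL/SAIL code; the function name and the pixel layout (a list of rows) are assumptions.

```python
# Illustrative sketch of the directional variance measure: the interest
# value of a square window is the minimum, over four directions, of the
# summed squared differences of adjacent pixels.

def directional_variance(img, top, left, size):
    """img is a list of rows of pixel values; the window has its upper
    left corner at (top, left) and is size-by-size pixels."""
    dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals
    sums = []
    for dr, dc in dirs:
        s = 0
        for r in range(top, top + size):
            for c in range(left, left + size):
                r2, c2 = r + dr, c + dc
                # count only pixel pairs lying wholly inside the window
                if top <= r2 < top + size and left <= c2 < left + size:
                    s += (img[r][c] - img[r2][c2]) ** 2
        sums.append(s)
    return min(sums)
```

A featureless window, and a window containing only an edge along one of the four directions, both score zero, since at least one direction shows no variation; corners and textured patches score high.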
An earlier version of the algorithm computed this function over a grid of adjacent non-overlapping windows. A window was declared a local maximum if it was larger than any of its eight immediate neighbors. Interesting regions were sometimes missed by this arrangement when the high contrast parts straddled the boundaries of windows. Performance has been improved by increasing the number of windows by a factor of four, including sets shifted half a window size horizontally, vertically and diagonally. To be a local maximum, a window must now be the largest of the twenty five which overlap and contact it.

Since the variance measure depends on adjacent pixel differences, it responds to high frequency noise in the image. This undesirable effect is circumvented by applying the interest operator to a reduced version of the picture, where the noise is lessened by the averaging. The smaller image also means a faster program. The standard procedure is to choose the power of two reduction which shrinks the specified window size to 2 or 3 pixels on a side. A window size of 8 is typical, in which case the operator is applied to the picture reduced twice, i.e. by a linear factor of 4.

The local maxima found are stored in an array, sorted in order of decreasing variance. The entire process on a typical 260 by 240 image, using 8 by 8 windows, takes about 75 milliseconds on the KL-10. The variance computation and local maximum test are coded in FAIL (our assembler); the maxima sorting and top level are in SAIL (Algol).

Binary Search Correlator

Given a feature in one picture, this routine attempts to find the corresponding region in a second image. The correlator is given a source position in the first picture, a rectangular search window (often the whole image) in the second picture, and a feature window size n (8 is typical), representing the side of a square window. The search uses a coarse to fine strategy, which begins in reduced versions of the pictures.
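The reduced picture versions both operators depend on can be sketched like this. The code and names are ours; the report states that reduction is done by averaging, and we assume images with even dimensions for brevity.

```python
# Sketch of the power-of-two picture reduction: each level halves the
# image by averaging 2 by 2 blocks of pixels, which both shrinks the
# picture and averages out high frequency noise.

def reduce_by_two(img):
    """Halve an image in each dimension by averaging 2x2 pixel blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*r][2*c] + img[2*r][2*c + 1] +
              img[2*r + 1][2*c] + img[2*r + 1][2*c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]

def reductions(img, levels):
    """Return [original, reduced by 2, by 4, ...]."""
    out = [img]
    for _ in range(levels):
        out.append(reduce_by_two(out[-1]))
    return out
```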
The order of reduction is chosen to make the smaller dimension of the search rectangle between n and 2n pixels. An n by n window in the shrunken source image, centered around the desired feature, is considered. Because this version of the image is so small, this window covers about 25% of the picture if the search window is the entire picture. A correlation coefficient (see below) is calculated for each possible placement of the source window in the search area, quantizing the positioning to whole pixel steps. If the search window is exactly 2n by 2n, there are (n+1)^2 positions. The one which results in the highest coefficient is used as the search area for the next level of refinement.

The process is repeated at the next lower order of reduction, i.e. with pictures that are linearly twice as large. An n by n window is again centered around the location of the source feature, and is searched for in the area that corresponds to the winning window in the previous search, which expands to 2n by 2n at the new level of reduction. This goes on in successively larger versions of the pictures until an n by n window is located in the original unreduced images. The program thus locates a huge general area around the feature in a very coarse version of the images, and successively refines the position, finding smaller and smaller areas in finer and finer representations. The work at each level is approximately that of finding an n by n window in a 2n by 2n area, and there are log2(w/n) levels, where w is the smaller dimension of the search rectangle, in unreduced picture pixels.

This approach has many advantages over a single pass of a correlation coefficient computation over the search window. The most obvious is speed. A scan of an 8 by 8 window over a 256 by 256 picture would require 249x249x8x8 comparisons of individual pixels. My method needs only about 5x81x8x8, about 150 times fewer. The advantage is lower for smaller search areas.
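The coarse-to-fine loop can be sketched as below. This is a hedged illustration in our own Python: all names are ours, and we substitute a plain squared-difference score where the cart uses the correlation coefficient described later.

```python
# Coarse-to-fine feature search over two reduction pyramids (lists of
# successively halved images, level 0 unreduced). Illustrative only.

def ssd_score(src, sr, sc, dst, dr, dc, n):
    """Negative sum of squared differences between the n-by-n windows with
    upper left corners (sr, sc) in src and (dr, dc) in dst."""
    s = 0
    for i in range(n):
        for j in range(n):
            d = src[sr + i][sc + j] - dst[dr + i][dc + j]
            s += d * d
    return -s

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def reduce_by_two(img):
    """Halve an image by averaging 2 by 2 pixel blocks."""
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(len(img[0]) // 2)]
            for r in range(len(img) // 2)]

def binary_search_correlate(src_pyr, dst_pyr, fr, fc, n):
    """Track the feature at (fr, fc) in src_pyr[0] from coarse to fine;
    returns the upper left corner of the matching n-by-n window in
    dst_pyr[0]."""
    top = len(src_pyr) - 1
    # at the coarsest level the search area is the whole picture
    r0, c0 = 0, 0
    r1 = len(dst_pyr[top]) - n
    c1 = len(dst_pyr[top][0]) - n
    match = (0, 0)
    for lev in range(top, -1, -1):
        src, dst = src_pyr[lev], dst_pyr[lev]
        # n-by-n source window centered on the feature at this level
        sr = clamp((fr >> lev) - n // 2, 0, len(src) - n)
        sc = clamp((fc >> lev) - n // 2, 0, len(src[0]) - n)
        best = None
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                s = ssd_score(src, sr, sc, dst, r, c, n)
                if best is None or s > best:
                    best, match = s, (r, c)
        if lev > 0:
            # the winning n-by-n window expands to a 2n-by-2n search
            # area at the next (finer) level of reduction
            nxt = dst_pyr[lev - 1]
            r0 = clamp(2 * match[0], 0, len(nxt) - n)
            c0 = clamp(2 * match[1], 0, len(nxt[0]) - n)
            r1 = clamp(2 * match[0] + n, 0, len(nxt) - n)
            c1 = clamp(2 * match[1] + n, 0, len(nxt[0]) - n)
    return match
```

Each level searches roughly (n+1)^2 placements regardless of image size, which is the source of the speedup claimed above.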
Perhaps more important is the fact that the simple method exhibits a serious jigsaw puzzle effect. The 8 by 8 patch is matched without any reference to context, and a match is often found in totally unrelated parts of the picture. The binary search technique uses the general context to guide the high resolution comparisons. This makes possible yet another speedup, because smaller windows can be used. Window sizes as small as 2 by 2 work reasonably well. The searches at very coarse levels rarely return mismatches, possibly because noise is averaged out in the reduction process, making the comparisons more stable. Reduced images are also tolerant of geometric distortions.

Correlation Coefficients

The problem is, given two n by n arrays of pixels, referred to as A[i,j] and B[i,j], to determine how alike they are. One of the simplest comparisons is

   Sum (A[i,j]-B[i,j])^2

the sum being taken over the window. If the camera did exact photometric measurements, and if lighting conditions remained unchanged between frames, this measure would be ideal. Unfortunately the camera response varies from place to place on the camera field due to the nature of the optics and complicated vidicon effects, from scene to scene because of target voltage response to illumination, and from time to time due to changes in battery voltage. In spite of these drawbacks the measure has been used successfully.

Normalized Correlation

Another alternative is "normalized correlation", the RMS average displacement of co-ordinate pairs (A[i,j],B[i,j]) from a least squares regression line relating A to B. Let a[i,j] = A[i,j]-Abar and b[i,j] = B[i,j]-Bbar, Abar and Bbar being the means over the entire window. The normalized correlation coefficient is defined as

   sigma = Sum a[i,j]b[i,j] / sqrt( Sum a[i,j]^2 Sum b[i,j]^2 )

with all sums taken over the window. Normalized correlation is invariant under all linear transformations of the values of A and B.
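The two measures just defined can be sketched in code (ours, not the cart's; windows are flattened to 1-D lists of pixels for brevity):

```python
# Sum of squared differences, and the normalized correlation coefficient,
# for two equal-length lists of pixel values.
import math

def sum_squared_differences(A, B):
    return sum((x - y) ** 2 for x, y in zip(A, B))

def normalized_correlation(A, B):
    ma, mb = sum(A) / len(A), sum(B) / len(B)
    a = [x - ma for x in A]
    b = [x - mb for x in B]
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    if saa == 0 or sbb == 0:
        # the degeneracy of this measure: a uniform window gives 0/0
        raise ValueError("uniform window: coefficient undefined")
    sab = sum(x * y for x, y in zip(a, b))
    return sab / math.sqrt(saa * sbb)
```

For example, normalized_correlation([1, 2, 3, 4], [2, 4, 6, 8]) is exactly 1.0, illustrating the invariance under linear transformations of the pixel values.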
The complete contrast insensitivity this implies is a needless waste of information, because contrast changes in actual pictures are relatively small from frame to frame. In addition, the measure has a degeneracy when all the values of either A or B are the same. This will happen often if the search window contains uniformly shaded areas, and must be handled by a special test. A different measure is called for.

Pseudo-normalized Correlation

Although the correlation coefficient itself remains unchanged, the "line of best fit" is different when calculated from A to B than from B to A. For the best fit of A to B, b = ka, we wish to find the k which minimizes

   Sum (ka-b)^2

which is

   k = Sum ab / Sum a^2

so the line can be expressed as

   b = sigma (sqrt(Sum b^2) / sqrt(Sum a^2)) a

For the best fit of B to A, a = kb, we wish to find the k which minimizes

   Sum (kb-a)^2

which is

   k = Sum ab / Sum b^2

so the line can be expressed as

   a = sigma (sqrt(Sum a^2) / sqrt(Sum b^2)) b

These are equivalent if sigma is unity, i.e. if the correlation is perfect. Since for our application, the comparison of two picture patches, there is no particular reason for choosing one over the other, we compromise and choose a line equidistant from the two, removing the asymmetry between A and B:

   b = (sqrt(Sum b^2) / sqrt(Sum a^2)) a

or equivalently

   a = (sqrt(Sum a^2) / sqrt(Sum b^2)) b

We will use this to obtain a correlation measure more suited to our needs. Small contrast changes should be tolerated, large ones should not. Contrast is expressed by the slope of the line of best fit, a slope of unity denoting a perfect match, slopes of zero and infinity indicating that one or the other of the patches has no variations in intensity. Consider the normalized dot product of the line of best fit with a line of unity slope.
Since this is the cosine of the angle between them, it will be unity for a perfect match, and very near unity for small contrast differences, since the cosine is flat around zero angle, and zero for a negative correlation. If we were to multiply this dot product by the normalized correlation coefficient, we would obtain a measure which was unity only for perfect correlation with no contrast differences, dropping rapidly with lessening correlation, and slowly at first and then rapidly with increasing contrast differences. This measure still has a few flaws. If one of the patches is without intensity variations, the normalized correlation coefficient becomes 0/0 and the dot product 1/sqrt(2). In fact we want our measure to be zero in this case. This can be arranged if we multiply the normalized coefficient by the cosine of twice the angle between the correlation line and a line of slope 1. This cosine will be zero for lines of slope zero or infinity, and the limit of the product with the normalized correlation coefficient will also be zero in those cases. In addition, perfect negative correlation will again appear as minus unity, which might be an advantage in some circumstances. 
Deriving an expression for this measure, we note that the cosine of the angle between our "compromise" correlation line and a line of unity slope is

   (sqrt(Sum a^2) + sqrt(Sum b^2)) / sqrt(2 (Sum a^2 + Sum b^2))

To convert this to the cosine of twice the angle with a = b, we use the identity cos(2t) = 2 cos^2(t) - 1, giving us

   2 (sqrt(Sum a^2) + sqrt(Sum b^2))^2 / (2 (Sum a^2 + Sum b^2)) - 1
      = 2 sqrt(Sum a^2 Sum b^2) / (Sum a^2 + Sum b^2)

Multiplying by the normalized correlation coefficient, we get

   (2 sqrt(Sum a^2 Sum b^2) / (Sum a^2 + Sum b^2)) (Sum ab / sqrt(Sum a^2 Sum b^2))
      = 2 Sum ab / (Sum a^2 + Sum b^2)

Note that whereas normalized correlation consists of the sum of the pairwise products of A and B divided by the geometric mean of the sums of their squares, this new measure, referred to as pseudo-normalized correlation, is the sum of the products divided by the arithmetic mean of the sums of the squares. Since it involves an addition and a halving rather than a multiplication and a square root, it is actually easier to calculate. The prettiness of the result leads me to suspect that a more elegant derivation exists. This pseudo-normalized measure works as well as any of the alternatives, and is easy to compute. It is the standard correlation measure in the cart software.

Timing

The formula above is expressed in terms of A and B with the means subtracted out. It can be translated into an expression involving Sum A, Sum A^2, Sum B, Sum B^2 and Sum (A-B)^2. By evaluating the terms involving only A, the source window, outside of the main correlation loop, the work in the inner loop can be reduced to evaluating Sum B, Sum B^2 and Sum (A-B)^2. This is done in three PDP-10 machine instructions per point by using a table in which entry i contains both i and i^2 (in subfields), and by generating in-line code representing the source window, three instructions per pixel, eliminating the need for inner loop end tests and enabling the A-B computation to be done during indexing.
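The pseudo-normalized measure is easily sketched in code (our own illustration): the sum of the pairwise products divided by the arithmetic mean of the sums of squares, i.e. 2 Sum ab / (Sum a^2 + Sum b^2), with window means removed first.

```python
# Pseudo-normalized correlation of two equal-length lists of pixels.

def pseudo_normalized_correlation(A, B):
    ma, mb = sum(A) / len(A), sum(B) / len(B)
    a = [x - ma for x in A]
    b = [x - mb for x in B]
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    if saa + sbb == 0:
        return 0.0   # both windows uniform: no evidence of a match
    sab = sum(x * y for x, y in zip(a, b))
    return 2 * sab / (saa + sbb)
```

The value is unity only for perfect correlation with equal contrast: doubling the contrast of one window (e.g. comparing [1, 2, 3, 4] with [2, 4, 6, 8]) lowers it to 0.8, and a window with no intensity variation yields 0 rather than the 0/0 of the normalized form.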
On our KL-10 each pixel comparison takes about one microsecond. The time required to locate an 8 by 8 window in a 16 by 16 search area is about 10 milliseconds. A single feature requires 5 such searches, for a total per-feature time of 50 ms. One of the three instructions could be eliminated if Sum B and Sum B^2 were precomputed for every position in the picture. This can be done incrementally, examining each pixel only twice, and would result in an overall speedup if many features are to be searched for in the same general area. The correlator has approximately a 10 percent error rate on features selected by the interest operator in our sample pictures. Typical image pairs are taken about two feet apart with a 60 degree field of view camera.

Scale Changes

As the vehicle moves forward the image seen by the camera changes. The major component of this perspective transformation is an enlargement of nearby objects. It would probably be advantageous for the correlator to consider the possibility that the feature it is looking for has changed size, and an attempt at this has been made.

In the basic system the digitized images are reduced by powers of two, and the correlator always compares two equally reduced images. Applying it to two images reduced by different amounts is equivalent to considering a power of two scale change. If the correlation measure of the best match is better in such an asymmetric comparison, a scale change has probably been detected. Powers of two clearly do not provide enough scale resolution. Higher resolution could be provided by generating sets of images reduced by factors less than two. Core size limits the number of intermediate reductions that can be handled by this method, since each new step essentially involves duplicating a full set of binary reductions of the pictures. A compromise has been reached by generating a picture 2^(2/3) as large as each of the binary steps.
By comparing two pictures at the same level we get a scale change of unity; by comparing a power of two reduction with a 2^(2/3) reduced one we get a factor of 2^(1/3); by comparing a 2^(2/3) reduced one with one reduced by a factor of two we get an effective factor of 2^(2/3); and finally, by comparing two at adjacent binary levels we get a full factor of 2. Thus two steps intermediate between the binary reductions are obtained at the cost of only a single extra set of pictures. Since correlating across reduction levels adds to the cost in proportion to the number of scales tested, the current algorithms do this only if the match at the 1:1 reduction is not sufficiently good (a coefficient in excess of about .7).

The results are disappointing. The method brings success on some features that fail when scale changes are not considered, but also introduces approximately a corresponding number of new errors. It seems most helpful at the intermediate stages of the binary search. Apparently at very high levels of reduction the background dominates and small nearby objects are blurred out of existence, while at very low levels the type of feature the interest operator picks (corners, etc.) is largely unchanged by scale changes. At intermediate levels, the correlation window covers large nearby objects, such as rocks, bushes and chairs (in our samples), and scale change is important. Experiments with ways of applying trial reductions more selectively will be conducted in the near future.

High Resolution Correlator

Consider the following problem. A pair of stereo pictures is available. For a given point in Picture 1, it is desired to find the corresponding point in Picture 2. It will be assumed here that a higher-level process has found a tentative approximate matching point in Picture 2, and that there is an area surrounding this point, called the search window, in which the correct matching point can be assumed to lie.
A certain area surrounding the given point in Picture 1, called the match window, will be used to match against corresponding areas in Picture 2, displaced by various amounts within the search window in order to obtain the best match. Thus when the matching process (correlator) is given a point in one picture of a stereo pair and an approximate matching point in the other picture, it produces an improved estimate of the matching point, suppressing the noise as much as possible based on the statistics of the noise. It also produces an estimate of the accuracy of the match in the form of a two-by-two covariance matrix, and an estimate of the probability that the match is consistent with the statistics of the noise in the pictures, rather than being an erroneous match. This probability will be useful in guiding a higher-level search needed to produce a dense sampling of matched points.

Let
   A[1](x,y) represent the brightness values in Picture 1,
   A[2](x,y) represent the brightness values in Picture 2,
   x[1],y[1] represent the point in Picture 1 that we desire to match,
   x[2],y[2] represent the center of the search window in Picture 2,
   w[m] represent the width of the match window (assumed to be square), and
   w[s] represent the width of the search window (assumed to be square),
where x and y take on only integer values.

The following assumptions are made. A[1] and A[2] consist of the same true brightness values displaced by an unknown amount in x and y, and with normally distributed random errors added. The errors are uncorrelated with each other, both within a picture (autocorrelation) and between pictures (cross correlation), and the errors are uncorrelated with the true brightness values. We temporarily assume that the variance of the errors is known for every point in each picture. We now wish to find the matching point x[m],y[m] which will produce the best match of A[2](x+x[m]-x[1],y+y[m]-y[1]) to A[1](x,y) in some sense.
Traditionally the match which maximizes the correlation between A[1] and A[2] has been used. Indeed, this produces the optimum result if one of the two functions has no noise, in which case it is the matched filter problem. However, here both functions have noise. This fact introduces fluctuations in the cross-correlation function which may cause its peak to differ from the expected value. Ad hoc smoothing techniques could be used to reduce this effect, but an optimum solution can be derived from the assumed statistics of the noise.

Let epsilon represent the w[m]^2-vector of the differences A[2](x+x[m]-x[1],y+y[m]-y[1]) - A[1](x,y) over the w[m] by w[m] match window, for a given trial value of x[m],y[m], and let x[c],y[c] represent the true (unknown) value of x[m],y[m]. Let P represent a probability and p represent a probability density with respect to the vector epsilon. Then by Bayes' theorem

   P(x[m],y[m]=x[c],y[c] | epsilon) =
      P(x[m],y[m]=x[c],y[c]) p(epsilon | x[m],y[m]=x[c],y[c])
      / Sum P(x[m],y[m]=x[c],y[c]) p(epsilon | x[m],y[m]=x[c],y[c])

where the sum in the denominator is taken over all positions in the search window. If we assume that the a priori probability P(x[m],y[m]=x[c],y[c]) is constant over the search window and is zero elsewhere, this reduces to

   P(x[m],y[m]=x[c],y[c] | epsilon) = k p(epsilon | x[m],y[m]=x[c],y[c])

where k is a constant of proportionality. Since epsilon consists of uncorrelated normally distributed random variables,

   p(epsilon | x[m],y[m]=x[c],y[c]) = k Product exp( -.5 epsilon[i]^2 / (sigma[1]^2 + sigma[2]^2) )
      = k exp( -.5 Sum epsilon[i]^2 / (sigma[1]^2 + sigma[2]^2) ) = k w

where

   w = exp( -.5 Sum epsilon[i]^2 / (sigma[1]^2 + sigma[2]^2) )

and where epsilon[i] denotes the components of epsilon, sigma[1] and sigma[2] are the standard deviations of the errors in A[1] and A[2], and the product and sum are taken over the match window.
Thus,

   P(x[m],y[m]=x[c],y[c] | epsilon) = k w

We define the optimum estimate x[0],y[0] of x[m],y[m] to be the mathematical expectation of x[m],y[m] according to the above probability distribution. Thus

   x[0] = Sum w x[m] / Sum w
   y[0] = Sum w y[m] / Sum w

where the sums are taken over the search window. The variances and covariance of x[0] and y[0] are given by the second moments of the distribution around the expected values:

   sigma[x]^2 = Sum w x[m]^2 / Sum w - x[0]^2
   sigma[y]^2 = Sum w y[m]^2 / Sum w - y[0]^2
   sigma[xy] = Sum w x[m] y[m] / Sum w - x[0] y[0]

The covariance matrix of x[0] and y[0] consists of sigma[x]^2 and sigma[y]^2 on the main diagonal and sigma[xy] in both off-diagonal positions.

It might appear that the above analysis is incorrect, because certain combinations of errors at each point of each picture are possible for more than one match position, and the probability of these combinations is split up among those match positions. However, this fact does not influence the results, as can be seen from the following reasoning. The possible errors at each point of each picture form a multidimensional space. When a particular match position is chosen, a lower-dimensioned subspace of this space is selected, in order to be consistent with the measured brightness values. When another match is chosen, a different subspace is selected. These two subspaces in general intersect, if at all, in a subspace of an even lower number of dimensions. Thus the hypervolume (in the higher subspace) of this lower subspace is zero. Therefore, the fact that the two subspaces intersect does not change the computed probabilities.

Now suppose that the standard deviations sigma[1] and sigma[2] are not known. It is possible to estimate them (actually, the sum of their squares, which is what is needed in the equation for w) from the data if it is assumed that they are constant, that is, that the noise does not vary across the pictures.
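The moment computations above can be sketched as follows (our own code and names): each trial displacement is weighted by w = exp(-.5 E / (sigma1^2 + sigma2^2)), where E is the sum of squared window differences at that displacement, and the weighted first and second moments give the match estimate and its covariance.

```python
# Probability-weighted match position and covariance over a search window.
import math

def match_moments(errors, var12):
    """errors maps (xm, ym) -> sum of squared differences over the match
    window; var12 = sigma1^2 + sigma2^2. Returns (x0, y0) and the 2x2
    covariance matrix [[sxx, sxy], [sxy, syy]]."""
    w = {k: math.exp(-0.5 * e / var12) for k, e in errors.items()}
    W = sum(w.values())
    x0 = sum(wk * x for (x, y), wk in w.items()) / W
    y0 = sum(wk * y for (x, y), wk in w.items()) / W
    sxx = sum(wk * x * x for (x, y), wk in w.items()) / W - x0 * x0
    syy = sum(wk * y * y for (x, y), wk in w.items()) / W - y0 * y0
    sxy = sum(wk * x * y for (x, y), wk in w.items()) / W - x0 * y0
    return (x0, y0), [[sxx, sxy], [sxy, syy]]
```

A sharp, symmetric error minimum produces a match at the minimum with a small, diagonal covariance; a broad minimum spreads the weights and enlarges the covariance accordingly.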
Let v equal the constant value of sigma[1]^2 + sigma[2]^2. Then epsilon.epsilon / w[m]^2 (the mean square value of the components of epsilon, where . denotes the vector dot product) is an estimate of v. However, this value is different for each possible match position x[m],y[m]. The method used to obtain the best value of v is to average all of these estimates, weighting each by the probability of its match position, which is proportional to w. Thus a preliminary variance estimate is computed as

   v' = Sum (w epsilon.epsilon / w[m]^2) / Sum w

where the sums are taken over the search window. However, this averaging process introduces a bias, because of the statistical tendency for the smaller values to have the greater weights. It can be shown that this effect causes the estimate of the variance to be too small by a ratio that can be anywhere from .5 to 1. Therefore, an empirically determined approximate correction factor is applied to the variance estimate as follows:

   v = v' / ( 1 - .5 (1 - u/v')^.3 )

where u is the minimum value of epsilon.epsilon / w[m]^2 over the search window. Since the computation of w requires the value of sigma[1]^2 + sigma[2]^2 (= v), the above process is iterative.

An estimate of an upper limit to the variance is also computed from the high-frequency content of the pictures. The overall variance estimate used in the above equations is obtained by an appropriate weighted average of the a priori given value, the derived value, and the computed upper limit. The probability of a correct match is computed by comparing the derived variance to the a priori variance and to the upper limit (high-frequency variance) by means of F-tests. Because of the finite window size, the computed covariance matrix will be an under-estimate.
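The empirical bias correction can be written as a small function (names ours): v_prime is the probability-weighted mean of epsilon.epsilon / w[m]^2 over the search window and u its minimum value there.

```python
# Empirical correction for the bias in the weighted variance estimate.

def corrected_variance(v_prime, u):
    return v_prime / (1.0 - 0.5 * (1.0 - u / v_prime) ** 0.3)
```

When every displacement gives the same residual (u = v'), no correction is applied; when the minimum is far below the mean (u near zero), the estimate is doubled, matching the stated under-estimation ratio of .5 to 1.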
An approximate correction for this effect is made by computing the eigenvalues and eigenvectors of the covariance matrix, applying a correction to the eigenvalues, and then reconstructing the covariance matrix from the corrected eigenvalues and the eigenvectors. A more accurate correction may be implemented in the future.

The above computations assume that the shift between the two pictures is always an integer number of pixels. In cases where the correlation peak is broad, the smoothing process inherent in the moment computation for x[0], y[0], sigma[x]^2, sigma[y]^2, and sigma[xy] causes a reasonable interpolation to be performed if the correct answer lies between pixels. However, when the correlation peak is sharp, this will not happen, and the answer will tend towards the pixel nearest the correct best match. This is not particularly serious insofar as it affects the position estimate, but it can have a serious effect on the probability estimate. This is because, with a sharp peak, the epsilon vector should be much smaller at the correctly interpolated point than it is at the nearest pixel. Therefore, the probability may come out much too small, indicating a bad match, whereas the match is really good but lies between pixels. In the future, a special interpolation process will be added to overcome this deficiency.

In general, there will be changes in brightness and contrast between the two pictures. To allow for this, provision for removing the mean and normalizing will be added in the future. This will include the ability to weight a priori values of brightness and contrast, instead of necessarily leaving these completely free.

Camera Calibration

Optics, especially electron optics, tend to have peculiar geometric properties. If the camera is to be used as a measuring instrument, such distortions should be taken into account.
To this end we have written a camera calibration program which is given an image of a square array of black spots on a white background, along with the ratio of the array-center-to-lens-center distance to the spot spacing, and which computes a distortion polynomial that enables position in the image to be translated accurately to angle in space. The program tolerates a wide range of spot spacings (from about 3 to 12 spots across the image), arbitrary image rotation, and is very robust. After being intensely fiddled with to work successfully on an initial set of 20 widely varying images, it has worked without error on 50 successive images. The test pattern for the cart is a ten foot square painted on a wall, with two inch spots at one foot intervals.

The algorithm reads in an image of such an array, and begins by determining its approximate spacing and orientation. It trims the picture to make it square, reduces it by averaging to 64 by 64, calculates the Fourier transform of the reduced image and takes its power spectrum, arriving at a 2D transform symmetric about the origin, and having strong peaks at frequencies corresponding to the horizontal and vertical spacing and half the diagonal spacing, with weaker peaks at the harmonics. It multiplies each point i,j in this transform by point -j,i and points j-i,j+i and i+j,j-i, effectively folding the primary peaks onto one another. The strongest peak in the 90 degree wedge around the y axis gives the spacing and orientation information needed by the next part of the process.

The interest operator is applied to the original image, and the most interesting feature in the central portion of the picture is assumed to be a spot, and is used as a starting seed to find the others.
A special operator examines a window surrounding this position, generates a histogram of intensity values within the window, decides on a threshold for separating the black spot from the white background, and calculates the centroid and first and second moments of the spot. This operator is then applied at a displacement from the first centroid indicated by the orientation and spacing of the grid, and so on, the region of found spots growing outward from the seed. A least squares polynomial in two variables, relating the actual positions of the spots to their ideal positions, is then generated.

Results

During the past year early versions of the routines described were used in a program which drove the vehicle in straight lines or roughly uniform arcs by acquiring and tracking features in the vicinity of the horizon, using their motion from frame to frame to build up a model of vehicle response, and servoing on the desired path. Curved paths cannot be followed very accurately because the current hardware provides no facilities for sending back data other than the camera image. In particular no data about distance travelled is returned; it must be estimated from drive motor on time. Steering angle also has to be guessed from steering motor timing. The program tried to deduce reasonable coefficients for these functions by observing the differential effects of small corrections. Since the motor speeds are unregulated, the parameters change with terrain and battery voltage. In spite of these problems the runs were quite successful. The cart proceeds in one second spurts, each of which takes it forward about two feet.
A picture is digitized after each such step; the correlator and, if necessary, the interest operator are applied; displacements of the features from their positions in the last picture are calculated; bad matches are eliminated by a clustering technique which rejects features not agreeing in displacement with the majority; a steering correction is calculated; motor parameters are updated; and commands for another step are transmitted. On the KA-10 this process took about 35 seconds real time per move without displays, or one minute if the digitized image and other interesting information was displayed. Several runs of between 20 and 60 steps were completed, both indoors and outdoors, in bare and cluttered conditions. The runs were usually terminated by a failure of the radio control link, a 100 mW model airplane controller, disrupting the steering timing and causing the vehicle to veer sharply in one direction or the other.

We have recently acquired a pair of high quality mobile CB transceivers, which are in the process of being incorporated into the cart system, and which should improve the control situation immensely. They also provide the opportunity for an eventual data link from the vehicle, as soon as the control logic is upgraded. Hardware problems and a serious manpower limitation have prevented any runs on our new, four month old KL-10. These obstacles are almost overcome, and more ambitious runs, involving the camera solver and obstacle avoidance, will be attempted in the near future. The camera solver has been applied successfully to isolated image pairs to determine the distance of nearby objects.