Cart Project Progress Report
Donald Gennery
Hans Moravec
Stanford University
September 22, 1976
CONTENTS
Introduction
Interest Operator
Binary Search Correlator
Correlation Coefficients
Normalized Correlation
Pseudo-normalized Correlation
Timing
Scale Changes
High Resolution Correlator
Camera Calibration
Results
Introduction
This report describes work towards autonomous vehicles able to
maneuver through cluttered environments. The emphasis thus far has
been on visual techniques which locate the clutter in three dimensions
and avoid it, but do not identify its nature.
Our hardware is an electric vehicle, remotely controlled over
a citizens band radio link by a PDP-KL10. It carries a black and white
television camera whose picture is broadcast over a UHF channel, and
received and occasionally digitized by the computer. The vehicle has
drive motors for the rear wheels, a steering motor coupled to a
steering bar arrangement on the front wheels, and a motor controlling
the camera pan angle. Each can be commanded to run forward or
backward. There is a potentiometer on the steering and pan linkages
which enables them to be commanded to point straight ahead. The
mechanical arrangements are crude, the motor speeds are unregulated,
and there is no feedback to the computer other than the video from the
camera. Overall dead reckoning errors are on the order of 30%. Most
of the computer controlled runs so far have been with the camera
pointed straight ahead and the vehicle moving forward in two foot
increments, punctuated by long pauses, during which the images are
processed.
The key elements in the vision software are three primitive
picture operators and a geometric camera solver. The "interest
operator" locates small patches, called features, scattered more or
less uniformly over images and having the property that the
corresponding points are likely to be found unambiguously in subsequent
images. The "binary search correlator" takes features in one image
and attempts to find them in subsequent images, altered slightly by
vehicle motion. Both operators make extensive use of versions of the
pictures reduced by factors of 2, 4, 8, 16 etc. in linear dimension.
The "high-resolution correlator" takes approximate matches of points
in pairs of images and produces matches of greater accuracy,
especially in areas of the picture with low signal to noise ratio. It
also produces an accuracy estimate. The "camera solver" is given the
location of at least five corresponding features in two pictures, and
deduces the relative camera positions and orientations and the three
dimensional locations of the features.
Peripheral routines include a camera calibrator which computes
a two dimensional polynomial for correcting distortions of a camera
when the camera is placed in front of a screen containing a square
array of black spots on a white background. Other operators sometimes
used are a high pass filter, a high frequency noise remover, a
histogram normalizer, a vertical sync loss corrector, a picture
comparator and an operator which reduces pictures by other than powers
of two.
Interest Operator
This routine is used to acquire new features in a scene. It
tries to select a relatively uniform scattering, to minimize the
probability that important obstacles will be missed, and also attempts
to choose areas which can be found easily by the correlator. Both
goals are achieved by returning regions which are local maxima of a
directional variance measure. Featureless areas and simple edges
(which have no variance in the direction of the edge) are avoided by
this process.
Directional variance is measured over small square windows
whose size is chosen by the user. Sums of squares of differences of
pixels adjacent in each of four directions (horizontal, vertical and
two diagonals) over the window are obtained. The variance of the
window is the minimum of these four sums.
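The measure can be sketched in a few lines of Python (the actual cart code is in SAIL and FAIL; the function name and the list-of-rows image representation are mine, for illustration only):

```python
def interest_measure(img, r, c, n):
    """Minimum directional variance of the n-by-n window whose
    top-left corner is at (r, c) in img, a list of pixel rows.
    Sums of squared differences of adjacent pixels are taken in
    four directions; the window's measure is the smallest sum,
    so featureless areas and simple edges score low."""
    dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horiz, vert, two diagonals
    sums = []
    for dr, dc in dirs:
        s = 0
        for i in range(r, r + n):
            for j in range(c, c + n):
                i2, j2 = i + dr, j + dc
                # count only pairs whose second pixel stays in the window
                if r <= i2 < r + n and c <= j2 < c + n:
                    s += (img[i][j] - img[i2][j2]) ** 2
        sums.append(s)
    return min(sums)
```

A uniform patch and a simple edge both score zero; a corner scores high in all four directions.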
An earlier version of the algorithm computed this function
over a grid of adjacent non-overlapping windows. A window was
declared a local maximum if it was larger than any of its eight
immediate neighbors. Interesting regions were sometimes missed by
this arrangement when the high contrast parts straddled the boundaries
of windows.
Performance has been improved by increasing the number of
windows by a factor of four, including sets shifted half a window size
horizontally, vertically and diagonally. To be a local maximum, a
window must now be the largest of the twenty five which overlap and
contact it.
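With windows laid out on a half-overlapping grid, the twenty five windows that overlap or contact a given one form a 5 by 5 neighborhood in half-window steps. A sketch of the test (my formulation; `var` is a hypothetical grid of variance values indexed in half-window steps):

```python
def is_local_max(var, i, j):
    """True if window (i, j) has strictly the largest variance among
    the 25 windows (a 5x5 neighborhood in half-window steps) that
    overlap or contact it; ties and larger neighbors disqualify it."""
    for di in range(-2, 3):
        for dj in range(-2, 3):
            ii, jj = i + di, j + dj
            if (di or dj) and 0 <= ii < len(var) and 0 <= jj < len(var[0]):
                if var[ii][jj] >= var[i][j]:
                    return False
    return True
```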
Since the variance measure depends on adjacent pixel
differences, it responds to high frequency noise in the image. This
undesirable effect is circumvented by applying the interest operator
to a reduced version of the picture, where noise is lessened by the
averaging. The smaller image also means a faster program. The
standard procedure is to choose the power of two reduction in which
the specified window size is reduced to 2 or 3 pixels on a side. A
window size of 8 is typical, in which case the operator is applied to
the picture reduced twice, i.e. by a linear factor of 4.
The local maxima found are stored in an array, sorted in order
of decreasing variance.
The entire process on a typical 260 by 240 image, using 8 by 8
windows, takes about 75 milliseconds on the KL-10. The variance
computation and local maximum test are coded in FAIL (our assembler),
the maxima sorting and top level are in SAIL (Algol).
Binary Search Correlator
Given a feature in one picture, this routine attempts to find
the corresponding region in a second image.
The correlator is given a source position in the first
picture, a rectangular search window (often the whole image) in the
second picture, and a feature window size n (8 is typical),
representing the side of a square window.
The search uses a coarse to fine strategy, which begins in
reduced versions of the pictures. The order of reduction is chosen to
make the smaller dimension of the search rectangle between n and 2n
pixels. An n by n window in the shrunken source image, centered around
the desired feature, is considered. Because this version of the image
is so small, this window covers about 25% of the picture, if the
search window is the entire picture. A correlation coefficient (see
below) is calculated for each possible placement of the source window
in the search area, quantizing the positioning to whole pixel steps.
If the search window is exactly 2n by 2n, there are (n+1)^2
positions. The one which results in the highest coefficient is used as
the search area for the next level of refinement.
The process is repeated at the next lower order of reduction,
i.e. with pictures that are linearly twice as large. An n by n window
is again centered around the location of the source feature, and is
searched for in the area that corresponds to the winning window in the
previous search, which expands to 2n by 2n at the new level of
reduction.
This goes on in successively larger versions of the pictures
until an n by n window is located in the original unreduced images.
The program thus locates a huge general area around the
feature in a very coarse version of the images, and successively
refines the position, finding smaller and smaller areas in finer and
finer representations. The work at each level is approximately that of
finding an n by n window in a 2n by 2n area, and there are log2 w/n
levels, where w is the smaller dimension of the search rectangle, in
unreduced picture pixels.
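The pyramid bookkeeping can be sketched as follows (illustrative Python, not the cart code; function names are mine):

```python
def reduce_once(img):
    """Shrink a picture by a linear factor of 2 by averaging 2x2
    blocks -- the step the reduced-picture pyramid is built from."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*i][2*j] + img[2*i][2*j+1] +
              img[2*i+1][2*j] + img[2*i+1][2*j+1]) / 4.0
             for j in range(w)] for i in range(h)]

def reductions_needed(w, n):
    """Number of binary reductions after which the search rectangle's
    smaller dimension w lies between n and 2n pixels.  Searches then
    run at that level and every finer one, log2(w/n) levels in all."""
    k = 0
    while w > 2 * n:
        w //= 2
        k += 1
    return k
```

For a 256-pixel search dimension and n = 8 this gives four reductions, hence the five searches per feature mentioned below.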
This approach has many advantages over a simple pass of a
correlation coefficient computation over the search window. The most
obvious is speed. A scan of an 8 by 8 window over a 256 by 256 picture
would require 249x249x8x8 comparisons of individual pixels. My method
needs only about 5x81x8x8, about 150 times fewer. The advantage is
lower for smaller search areas. Perhaps more important is the fact
that the simple method exhibits a serious jigsaw puzzle effect. The 8
by 8 patch is matched without any reference to context, and a match is
often found in totally unrelated parts of the picture. The binary
search technique uses the general context to guide the high resolution
comparisons. This makes possible yet another speedup, because smaller
windows can be used. Window sizes as small as 2 by 2 work reasonably
well. The searches at very coarse levels rarely return mismatches,
possibly because noise is averaged out in the reduction process,
causing comparisons to be more stable. Reduced images are also
tolerant of geometric distortions.
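The speed comparison is a line of arithmetic:

```python
# A full scan places an 8x8 window at every position of a 256x256
# picture; the binary search does about 5 levels of (8+1)^2 = 81
# placements each (same 8x8 = 64 pixel comparisons per placement).
full_scan = (256 - 8 + 1) ** 2 * 8 * 8   # 249 x 249 placements
binary_search = 5 * 81 * 8 * 8
speedup = full_scan / binary_search      # about 150
```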
Correlation Coefficients
The problem is, given two n by n arrays of pixels, referred to
as A[i,j] and B[i,j], determine how alike they are.
One of the simplest comparisons is:
Sum[i,j = (0,0)..(n,n)] (A[i,j]-B[i,j])^2
If the camera did exact photometric measurements, and if lighting
conditions remained unchanged between frames, this measure would be
ideal. Unfortunately the camera response varies from place to place on
the camera field due to the nature of the optics and complicated
vidicon effects, from scene to scene because of target voltage
response to illumination, and from time to time due to changes in
battery voltage. In spite of these drawbacks the measure has been used
successfully.
Normalized Correlation
Another alternative is "normalized correlation", the RMS
average displacement of co-ordinate pairs (A[i,j],B[i,j]) from a least
squares regression line relating A to B. Let a[i,j] = A[i,j]-Abar and
b[i,j] = B[i,j]-Bbar , Abar and Bbar being the means over the entire
window. The normalized correlation coefficient is defined as:
sigma = Sum[i,j = (0,0)..(n,n)] a[i,j]b[i,j] /
sqrt( Sum[i,j = (0,0)..(n,n)] a[i,j]^2
x Sum[i,j = (0,0)..(n,n)] b[i,j]^2 )
Normalized correlation is invariant under all linear transformations
of the values of A and B.
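A direct transcription of the formula (illustrative Python over flat pixel lists; the function name is mine):

```python
import math

def normalized_correlation(A, B):
    """Normalized correlation of two equal-size windows, given as
    flat lists of pixel values.  Undefined (0/0) if either window
    is uniform -- the degeneracy discussed below."""
    n = len(A)
    abar, bbar = sum(A) / n, sum(B) / n
    a = [x - abar for x in A]           # subtract the window means
    b = [x - bbar for x in B]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den
```

Any positive-slope linear transformation of either window leaves the coefficient at 1; a negative-slope transformation flips it to -1.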
The complete contrast insensitivity this implies is a needless
waste of information, because contrast changes in actual pictures are
relatively small from frame to frame. In addition, the measure has a
degeneracy when all the values of either A or B are the same. This
will happen often if the search window contains uniformly shaded
areas, and must be handled by a special test. A different measure is
called for.
Pseudo-normalized Correlation
Although the correlation coefficient itself remains unchanged,
the "line of best fit" is different when calculated from A to B than
from B to A:
for the best fit of A to B, b = ka, we wish to find the k which minimizes
Sum (ka-b)^2 ; this is k = Sum ab / Sum a^2 , so
the line can be expressed as
b = sigma sqrt(Sum b^2) / sqrt(Sum a^2) a
for the best fit of B to A, a = kb, we wish to find the k which minimizes
Sum (kb-a)^2 ; this is k = Sum ab / Sum b^2 , so
the line can be expressed as
a = sigma sqrt(Sum a^2) / sqrt(Sum b^2) b
these are equivalent if sigma is unity, i.e. if the correlation is
perfect. Since for our application, the comparison of two picture
patches, there is no particular reason for choosing one over the
other, we compromise and choose a line equidistant from the two,
removing the asymmetry between A and B:
b = sqrt(Sum b^2) / sqrt(Sum a^2) a
or equivalently
a = sqrt(Sum a^2) / sqrt(Sum b^2) b
We will use this to obtain a correlation measure more suited to our
needs.
Small contrast changes should be tolerated, large ones should
not. Contrast is expressed by the slope of the line of best fit, a
slope of unity denoting a perfect match, slopes of zero and infinity
indicating that one or the other of the patches has no variations in
intensity. Consider the normalized dot product of the line of best
fit with a line of unity slope. Since this is the cosine of the angle
between them, it will be unity for a perfect match, and very near
unity for small contrast differences, since the cosine is flat around
zero angle, and zero for a negative correlation. If we were to
multiply this dot product by the normalized correlation coefficient,
we would obtain a measure which was unity only for perfect correlation
with no contrast differences, dropping rapidly with lessening
correlation, and slowly at first and then rapidly with increasing
contrast differences.
This measure still has a few flaws. If one of the patches is
without intensity variations, the normalized correlation coefficient
becomes 0/0 and the dot product 1/sqrt(2). In fact we want our
measure to be zero in this case. This can be arranged if we multiply
the normalized coefficient by the cosine of twice the angle between
the correlation line and a line of slope 1. This cosine will be zero
for lines of slope zero or infinity, and the limit of the product with
the normalized correlation coefficient will also be zero in those
cases. In addition, perfect negative correlation will again appear as
minus unity, which might be an advantage in some circumstances.
Deriving an expression for this measure, we note that the
cosine of our "compromise" correlation line with one of unity slope is
(sqrt(Sum a^2) + sqrt(Sum b^2)) / sqrt(2 (Sum a^2 + Sum b^2))
To convert this to the cosine of twice the angle with a=b, we use the
identity
cos(2t) = 2 cos^2(t) - 1
giving us
2 (sqrt(Sum a^2) + sqrt(Sum b^2))^2 / (2 (Sum a^2 + Sum b^2)) - 1
= 2 sqrt(Sum a^2 Sum b^2)/(Sum a^2 + Sum b^2)
multiplying by the normalized correlation coefficient, we get
( 2 sqrt(Sum a^2 Sum b^2) / (Sum a^2 + Sum b^2) )
x ( Sum ab / sqrt(Sum a^2 Sum b^2) )
= 2 Sum ab / (Sum a^2 + Sum b^2)
Note that whereas normalized correlation consists of the sum
of the pairwise products of A and B divided by the geometric mean of
the sum of their squares, this new measure, referred to as
pseudo-normalized correlation, is the sum of the products divided by
the arithmetic mean of the sums of the squares. Since it involves an
addition and a halving rather than a multiplication and a square root,
it is actually easier to calculate. The prettiness of the result leads
me to suspect that a more elegant derivation exists.
This pseudo-normalized measure works as well as any of the
alternatives, and is easy to compute. It is the standard correlation
measure in cart software.
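The final formula is short enough to transcribe directly (illustrative Python, mean subtraction included; the function name is mine):

```python
def pseudo_normalized(A, B):
    """Pseudo-normalized correlation 2 Sum ab / (Sum a^2 + Sum b^2)
    of two equal-size windows given as flat pixel lists, with the
    window means subtracted first."""
    n = len(A)
    abar, bbar = sum(A) / n, sum(B) / n
    a = [x - abar for x in A]
    b = [x - bbar for x in B]
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    return 2.0 * sab / (saa + sbb)       # no 0/0 unless both are uniform
```

A perfect match scores 1, a doubling of contrast drops the score to 0.8, and a featureless patch scores 0 without any special test.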
Timing
The formula above is expressed in terms of A and B with the means
subtracted out. It can be translated into an expression involving
Sum A, Sum A^2, Sum B, Sum B^2 and Sum (A-B)^2.
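One consistent translation (my derivation, worth re-checking): over the n by n window, Sum ab = Sum AB - (Sum A)(Sum B)/n^2 with Sum AB = (Sum A^2 + Sum B^2 - Sum (A-B)^2)/2, and Sum a^2 = Sum A^2 - (Sum A)^2/n^2, similarly for b. Substituting into 2 Sum ab / (Sum a^2 + Sum b^2):

```python
def pseudo_normalized_from_sums(sA, sAA, sB, sBB, sD, npix):
    """Pseudo-normalized correlation computed from the running sums
    Sum A, Sum A^2, Sum B, Sum B^2 and sD = Sum (A-B)^2 over a
    window of npix pixels, without touching the pixels again."""
    num = sAA + sBB - sD - 2.0 * sA * sB / npix
    den = sAA + sBB - (sA * sA + sB * sB) / npix
    return num / den
```

For A = [1,2,3,4] and B = [2,4,6,8] the sums are 10, 30, 20, 120 and 30, and the result agrees with the direct formula (0.8).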
By evaluating the terms involving only A, the source window,
outside of the main correlation loop, the work in the inner loop can
be reduced to evaluating Sum B, Sum B^2 and Sum (A-B)^2. This is done
in three PDP-10 machine instructions per point by using a table in
which entry i contains both i and i^2 (in subfields), and by
generating in line code representing the source window, three
instructions per pixel, eliminating the need for inner loop end tests
and enabling the A-B computation to be done during indexing.
On our KL-10 each pixel comparison takes about one
microsecond. The time required to locate an 8 by 8 window in a 16 by
16 search area is about 10 milliseconds. A single feature requires 5
such searches, for a total per feature time of 50 ms.
One of the three instructions could be eliminated if Sum B and
Sum B^2 were precomputed for every position in the picture. This can
be done incrementally, involving examination of each pixel only twice,
and would result in an overall speedup if many features are to be
searched for in the same general area.
The correlator has approximately a 10 percent error rate on
features selected by the interest operator in our sample pictures.
Typical image pairs are generally taken about two feet apart with a 60
degree field of view camera.
Scale Changes
As the vehicle moves forward the image seen by the camera
changes. The major component of this perspective transformation is an
enlargement of nearby objects. It would probably be advantageous for
the correlator to consider the possibility that the feature it is
looking for has changed size. An attempt at this has been made.
In the basic system the digitized images are reduced by powers
of two, and the correlator always compares two equally reduced images.
Applying it to two images reduced by different amounts is equivalent
to considering a power of two scale change. If the correlation measure
of the best match is better in such an asymmetric comparison, a scale
change has probably been detected.
Powers of two clearly do not provide enough scale resolution.
Higher resolution could be provided by generating sets of images
reduced by factors less than two. Core size limits the number of
intermediate reductions that can be handled by this method, since each
new step essentially involves duplicating a full set of binary
reductions of the pictures. A compromise has been reached by
generating a picture 2^(2/3) as large as each of the binary steps. By
comparing two pictures at the same level we get a scale change of
unity, by comparing a power of two with a 2^(2/3) reduced one we get
2^(1/3) reduction, by comparing a 2^(2/3) reduced one with one reduced
by a factor of two we get an effective factor of 2^(2/3) reduction,
and finally by comparing two at adjacent binary levels we get a full
factor of 2. Thus two steps intermediate between binary reductions are
obtained at the cost of only a single extra set of pictures.
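The available scale steps can be enumerated numerically (a hypothetical check of my own reading of the geometry, taking the extra set as reduced 2^(1/3) beyond each binary level, i.e. 2^(2/3) as large as the next binary step):

```python
# Reduction factors: binary levels 2^k plus an extra set at 2^(k+1/3).
factors = sorted(2.0 ** k * 2.0 ** (e / 3.0) for k in range(3) for e in (0, 1))
# Effective scale changes are ratios between any two factors.
ratios = sorted({round(b / a, 6) for a in factors for b in factors if b >= a})
```

The four smallest ratios are 1, 2^(1/3), 2^(2/3) and 2, as claimed: two intermediate steps from one extra set of pictures.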
Since correlating across reduction levels adds to the cost in
proportion to the number of scales tested, the current algorithms do
this only if the match at the 1:1 reduction is not sufficiently good
(a coefficient in excess of about .7).
The results are disappointing. The method succeeds on some
features that fail when scale changes are not considered, but
also introduces approximately a corresponding number of new errors.
It seems most helpful at the intermediate stages of the binary
search. Apparently at very high levels of reduction the background
dominates and small nearby objects are blurred out of existence, while
at very low levels the type of feature the interest operator picks
(corners, etc.) is largely unchanged by scale changes. At
intermediate levels, the correlation window covers large nearby
objects, such as rocks, bushes and chairs (in our samples), and scale
change is important.
Experiments with ways of applying trial reductions more
selectively will be conducted in the near future.
High Resolution Correlator
Consider the following problem. A pair of stereo pictures is
available. For a given point in Picture 1, it is desired to find the
corresponding point in Picture 2. It will be assumed here that a
higher-level process has found a tentative approximate matching point
in Picture 2, and that there is an area surrounding this point, called
the search window, in which the correct matching point can be assumed
to lie. A certain area surrounding the given point in Picture 1,
called the match window, will be used to match against corresponding
areas in Picture 2, displaced by various amounts within the search
window in order to obtain the best match.
Thus when the matching process (correlator) is given a point
in one picture of a stereo pair and an approximate matching point in
the other picture, it produces an improved estimate of the matching
point, suppressing the noise as much as possible based on the
statistics of the noise. It also produces an estimate of the accuracy
of the match in the form of a two-by-two covariance matrix, and an
estimate of the probability that the match is consistent with the
statistics of the noise in the pictures, rather than being an
erroneous match. This probability will be useful in guiding a
higher-level search needed to produce a dense sampling of matched
points.
Let A[1](x,y) represent the brightness values in Picture 1,
A[2](x,y) represent the brightness values in Picture 2, x[1],y[1]
represent the point in Picture 1 that we desire to match, x[2],y[2]
represent the center of the search window in Picture 2, w[m] represent
the width of the match window (assumed to be square), and w[s]
represent the width of the search window (assumed to be square), where
x and y take on only integer values.
The following assumptions are made. A[1] and A[2] consist of
the same true brightness values displaced by an unknown amount in x
and y, and with normally distributed random errors added. The errors
are uncorrelated with each other, both within a picture
(autocorrelation) and between pictures (cross correlation), and the
errors are uncorrelated with the true brightness values.
We temporarily assume that the variance of the errors is known
for every point in each picture.
We now wish to find the matching point x[m],y[m] which will
produce the best match of A[2](x+x[m]-x[1],y+y[m]-y[1]) to A[1](x,y)
in some sense. Traditionally the match which maximized the
correlation between A[1] and A[2] has been used. Indeed, this
produces the optimum result if one of two functions has no noise, in
which case it is the matched filter problem. However, here both
functions have noise. This fact introduces fluctuations in the
cross-correlation function which may cause its peak to differ from the
expected value. Ad hoc smoothing techniques could be used to reduce
this effect, but an optimum solution can be derived from the assumed
statistics of the noise.
Let epsilon represent the w[m]^2 - vector of the differences
A[2](x+x[m]-x[1],y+y[m]-y[1]) - A[1](x,y) over the w[m] by w[m] match
window, for a given trial value of x[m],y[m], and let x[c],y[c]
represent the true (unknown) value of x[m],y[m]. Let P represent a
probability and p represent a probability density with respect to the
vector epsilon. Then by Bayes' theorem
P(x[m],y[m]=x[c],y[c] | epsilon) =
(P(x[m],y[m]=x[c],y[c]) p(epsilon | x[m],y[m]=x[c],y[c])) /
(Sum P(x[m],y[m]=x[c],y[c]) p(epsilon | x[m],y[m]=x[c],y[c]))
If we assume that the a priori probability P(x[m],y[m]=x[c],y[c]) is
constant over the search window and is zero elsewhere, this reduces to
P(x[m],y[m]=x[c],y[c] | epsilon)
= k p(epsilon | x[m],y[m]=x[c],y[c])
where k is a constant of proportionality. Since epsilon consists of
uncorrelated normally distributed random variables,
p(epsilon | x[m],y[m]=x[c],y[c])
= k Product exp(-.5 epsilon[i]^2 / (sigma[1]^2 + sigma[2]^2))
= k exp(-.5 Sum epsilon[i]^2 / (sigma[1]^2 + sigma[2]^2))
= k w
where
w = exp(-.5 Sum epsilon[i]^2 / (sigma[1]^2 + sigma[2]^2))
and where epsilon[i] denotes the components of epsilon, sigma[1] and
sigma[2] are the standard deviations of the noise in A[1] and A[2],
and the product and sum are taken over the match window. Thus,
P(x[m],y[m]=x[c],y[c] | epsilon) = k w
We define the optimum estimate of x[m],y[m] (x[0],y[0]) to be
the mathematical expectation of x[m],y[m] according to the above
probability distribution. Thus
x[0] = Sum w x[m] / Sum w
y[0] = Sum w y[m] / Sum w
where the sums are taken over the search window. The variances and
covariance of x[0] and y[0] are given by the second moments of the
distribution around the expected values:
sigma[x]^2 = Sum w x[m]^2 / Sum w - x[0]^2
sigma[y]^2 = Sum w y[m]^2 / Sum w - y[0]^2
sigma[xy] = Sum w x[m] y[m] / Sum w - x[0] y[0]
The covariance matrix of x[0] and y[0] consists of sigma[x]^2 and
sigma[y]^2 on the main diagonal and sigma[xy] in both off-diagonal
positions.
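These weighted moments can be sketched directly (illustrative Python over hypothetical weights; the dict representation and function name are mine):

```python
def match_moments(weights):
    """Given the weight w at each candidate match position as a dict
    mapping (x, y) -> w over the search window, return the expected
    match point (x0, y0) and the 2x2 covariance matrix built from
    the second moments about that expectation."""
    sw = sum(weights.values())
    x0 = sum(w * x for (x, y), w in weights.items()) / sw
    y0 = sum(w * y for (x, y), w in weights.items()) / sw
    sxx = sum(w * x * x for (x, y), w in weights.items()) / sw - x0 * x0
    syy = sum(w * y * y for (x, y), w in weights.items()) / sw - y0 * y0
    sxy = sum(w * x * y for (x, y), w in weights.items()) / sw - x0 * y0
    return (x0, y0), ((sxx, sxy), (sxy, syy))
```

Weights symmetric about a peak give the peak itself as the estimate and a diagonal covariance.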
It might appear that the above analysis is not correct because
of the fact that certain combinations of errors at each point of each
picture are possible for more than one match position, and the
probability of these combinations is split up among these match
positions. However, this fact does not influence the results, as can
be seen from the following reasoning. The possible errors at each
point of each picture form a multidimensional space. When a
particular match position is chosen, a lower-dimensioned subspace of
this space is selected, in order to be consistent with the measured
brightness values. When another match is chosen, a different subspace
is selected. These two subspaces in general intersect, if at all, in
a subspace of an even lower number of dimensions. Thus the
hypervolume (in the higher subspace) of this lower subspace is zero.
Therefore, the fact that the two subspaces intersect does not change
the computed probabilities.
Now suppose that the standard deviations sigma[1] and sigma[2]
are not known. It is possible to estimate them (actually, the sum of
their squares, which is what is needed in the equation for w) from the
data if it is assumed that they are constant, that is, the noise does
not vary across the pictures. Let v equal the constant value of
sigma[1]^2 + sigma[2]^2. Then epsilon.epsilon / w[m]^2 (the mean
square value of the components of epsilon) is an estimate for v, where
. denotes the vector dot product. However, this value is different
for each possible match position x[m],y[m]. The method used to obtain
the best value for v is to average all of these values for v, weighted
by the probability P(x[m],y[m]=x[c],y[c] | epsilon) = k w for each
match position. Thus a preliminary variance estimate is computed by
v' = Sum (w epsilon.epsilon) / (w[m]^2 Sum w)
where the sums are taken over the search window. However, this
averaging process introduces a bias because of the statistical
tendency for the smaller values to have the greater weights. It can
be shown that this effect causes the estimate of variance to be too
small by a ratio that can be anywhere from .5 to 1. Therefore, an
empirically determined approximate correction factor is applied to
the variance estimate as follows:
v = v' / ( 1 - .5 (1 - u/v')^.3 )
where u is the minimum value of epsilon.epsilon/w[m]^2 over the search
window. Since the computation of w requires the value of sigma[1]^2
+sigma[2]^2 (=v), the above process is iterative.
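The correction step is a one-liner (illustrative Python; the function name is mine):

```python
def corrected_variance(vprime, u):
    """Apply the empirical bias correction v = v' / (1 - .5 (1 - u/v')^.3),
    where v' is the weighted variance estimate and u is the minimum of
    epsilon.epsilon / w[m]^2 over the search window."""
    return vprime / (1.0 - 0.5 * (1.0 - u / vprime) ** 0.3)
```

At the extremes this behaves as the text requires: when u = v' no correction is applied, and when u is negligible the estimate is doubled, undoing the maximal factor-of-.5 bias.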
An estimate of an upper limit to the variance is also computed
from the high-frequency content of the pictures.
The overall variance estimate used in the above equations is
obtained by an appropriate weighted average of the a priori given
value, the derived value, and the computed upper limit.
The probability of a correct match is computed by comparing
the derived variance to the a priori variance and the upper limit
(high-frequency variance) by means of F-tests.
Because of the finite window size, the computed covariance
matrix will be an under-estimate. An approximate correction for this
effect is made by computing the eigenvalues and eigenvectors of the
covariance matrix, applying a correction to the eigenvalues, and then
reconstructing the covariance matrix from the eigenvalues and
eigenvectors. A more accurate correction may be implemented in the
future.
The above computations assume that the shift between the two
pictures is always an integer number of pixels. In cases where the
correlation peak is broad, the smoothing process inherent in the
moment computation for x[0], y[0], sigma[x]^2, sigma[y]^2, and
sigma[xy] causes a reasonable interpolation to be performed if the
correct answer lies between pixels. However, when the correlation
peak is sharp, this will not happen, and the answer will tend towards
the nearest pixel to the correct best match. This is not particularly
serious insofar as it affects the position estimate, but it can have a
serious effect on the probability estimate. This is because the
epsilon vector should be much smaller at the correctly interpolated
point than it is at the nearest pixel, because of the sharp peak.
Therefore, the probability may come out much too small, indicating a
bad match, whereas the match is really good but lies between pixels.
In the future, a special interpolation process will be added to
overcome this deficiency.
In general, there will be changes in brightness and contrast
between the two pictures. To allow for this, provision for removing
the mean and normalizing will be added in the future. This will
include the ability to include weight for a priori values of
brightness and contrast, instead of necessarily leaving these
completely free.
Camera Calibration
Optics, especially electron optics, tend to have peculiar
geometric properties. If the camera is to be used as a measuring
instrument such distortions should be taken into account.
To this end we have written a camera calibration program which
is given an image of a square array of black spots on a white
background, the ratio of the array center to lens center distance and
the spot spacing, and which computes a distortion polynomial that
enables position in the image to be translated accurately to angle in
space.
The program tolerates a wide range of spot parameters (about 3
spots across to 12 spots across), arbitrary image rotation, and is
very robust. After being intensely fiddled with to work successfully
on an initial set of 20 widely varying images, it has worked without
error on 50 successive images. The test pattern for the cart is a ten
foot square painted on a wall, with two inch spots at one foot
intervals.
The algorithm reads in an image of such an array, and begins
by determining its approximate spacing and orientation. It trims the
picture to make it square, reduces it by averaging to 64 by 64,
calculates the Fourier transform of the reduced image and takes its
power spectrum, arriving at a 2D transform symmetric about the origin,
and having strong peaks at frequencies corresponding to the horizontal
and vertical spacing and half the diagonal spacing, with weaker peaks
at the harmonics. It multiplies each point i,j in this transform by
point -j,i and points j-i,j+i and i+j,j-i, effectively folding the
primary peaks onto one another. The strongest peak in the 90 degree
wedge around the y axis gives the spacing and orientation information
needed by the next part of the process.
The interest operator is applied to the original image, and
the most interesting feature in the central portion of the picture is
assumed to be a spot, and used as starting seed to find the others. A
special operator examines a window surrounding this position,
generates a histogram of intensity values within the window, decides a
threshold for separating the black spot from the white background, and
calculates the centroid and first and second moment of the spot. This
operator is again applied at a displacement from the first centroid
indicated by the orientation and spacing of the grid, and so on, the
region of found spots growing outward from the seed.
A least squares polynomial in two variables relating the
actual position of the spots to the ideal position is then generated.
Results
During the past year early versions of the routines described
were used in a program which drove the vehicle in straight lines or
roughly uniform arcs by acquiring and tracking features in the
vicinity of the horizon, using their motion from frame to frame to
build up a model of vehicle response, and servoing on the desired
path. Curved paths cannot be followed very accurately because the
current hardware provides no facilities for sending back data other
than the camera image. In particular, no data about distance
travelled was returned; it had to be estimated from drive motor on
time. Steering angle likewise had to be guessed from steering motor
timing. The program
tried to deduce reasonable coefficients for these functions by
observing the differential effects of small corrections. Since the
motor speeds are unregulated, the parameters change with terrain and
battery voltage.
In spite of these problems the runs were quite successful.
The cart proceeds in one second spurts, each of which takes it forward
about two feet. A picture is digitized after each such step, and the
correlator and, if necessary, interest operator are applied,
displacements of the positions of the features from the last picture
are calculated, bad matches are eliminated by a clustering technique
which eliminates features not agreeing in displacement with the
majority, steering correction is calculated, motor parameters are
updated, and commands for another step are transmitted. On the KA-10
the time taken for this process was about 35 seconds real time per
move without displays, or one minute if the digitized image and other
interesting information was displayed. Several runs of between 20 and
60 steps were completed, both indoors and outdoors, in bare and
cluttered conditions. The runs were usually terminated by a failure of
the radio control link, a 100 mw model airplane controller, disrupting
the steering timing and causing the vehicle to veer sharply in one or
the other direction. We have recently acquired a pair of high quality
mobile CB transceivers, which are in the process of being incorporated
into the cart system, and which should improve the control situation
immensely. They also provide the opportunity for an eventual data
link from the vehicle, as soon as the control logic is upgraded.
Hardware problems and a serious manpower limitation have
prevented any runs on our new, four month old, KL-10. These problems
are almost overcome, and more ambitious runs, involving the camera
solver and obstacle avoidance, will be attempted in the near future.
The camera
solver has been applied successfully to isolated image pairs to
determine the distance of nearby objects.