Deducing the 3D location of features from their projections in 2D
images requires that we know their position in two or more such images.
The correlator is a subroutine that, given a description of a
feature as produced by the interest operator from one image, finds the
best match in a different, but similar, image. Its search area can be the
entire new picture, or a rectangular sub-window.
\topgra\figsiz{4.7}{3.3}
\grugrfig{6-1}{U:CORLS.GOD[DIA,HPM]}{U:CORRS.GOD[DIA,HPM]}{
Areas matched in a binary search correlation. Picture at top contains
originally chosen feature. The outlined areas in it
are the prototypes which are searched for in the bottom picture.
The largest rectangle is matched first, and the area of best match
in the second picture becomes the search area for the next smaller
rectangle. The larger the rectangle, the lower the resolution of
the pictures in which the matching is done.}
\topgra\figsiz{4.75}{3.37}
\piupifig{6-2}{U:LINEX.PIC[DIA,HPM]}{U:REDEX.PIC[DIA,HPM]}{
The ``conventional'' representation of a feature (above) used in
documents such as this one, and a more realistic version which
graphically demonstrates the reduced resolution of the larger
windows. The bottom picture was reconstructed entirely from
the window sequence used with a binary search correlation.
The coarse outer windows were interpolated to reduce quantization
artifacts.}
The search uses a coarse-to-fine strategy, illustrated in Figure 6-1,
that begins in reduced versions of the pictures. Typically the
first step takes place at the x16 (linear) reduction level. The 6 by 6
window at that level of the feature description, which covers about one
seventh of the total area of the original picture, is convolved with the
search area in the correspondingly reduced version of the second picture.
The 6 by 6 description patch is moved pixel by pixel over the
approximately 15 by 16 destination picture, and a correlation coefficient
is calculated for each trial position.
The position with the best match is recorded. The 6 by 6 area it
occupies in the second picture is mapped to the x8 reduction level, where
the corresponding region is 12 by 12 pixels. The 6 by 6 window in the x8
reduced level of the feature description is then convolved with this 12 by
12 area, and the position of best match is recorded and used as a search
area for the x4 level.
The process continues, matching smaller and smaller, but more and
more detailed windows until a 6 by 6 area is selected in the unreduced
picture.
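The coarse-to-fine loop just described can be sketched as follows. This is an illustrative reconstruction in Python, not the original program: the names (`pseudo_normalized`, `best_match`, `binary_search_correlate`) and the pyramid representation are mine, and the 6 by 6 window and 12 by 12 search area sizes are taken from the example above.

```python
import numpy as np

def pseudo_normalized(a, b):
    """Correlation measure 2*sum(ab) / (sum(a^2) + sum(b^2)),
    with each window's mean subtracted out first."""
    a = a - a.mean()
    b = b - b.mean()
    denom = (a * a).sum() + (b * b).sum()
    return 2.0 * (a * b).sum() / denom if denom > 0 else 0.0

def best_match(template, search):
    """Slide `template` over `search`; return (row, col) of best score."""
    th, tw = template.shape
    sh, sw = search.shape
    best, best_rc = -2.0, (0, 0)
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            score = pseudo_normalized(template, search[r:r+th, c:c+tw])
            if score > best:
                best, best_rc = score, (r, c)
    return best_rc

def binary_search_correlate(feature_pyramid, image_pyramid, start_rc):
    """Coarse-to-fine search: at each level, look for the 6x6 feature
    window inside the 12x12 region mapped down from the match at the
    level above.  Level 0 is the unreduced image."""
    r, c = start_rc                      # match found at the coarsest level
    for level in range(len(image_pyramid) - 2, -1, -1):
        r, c = 2 * r, 2 * c              # map position to next finer level
        image = image_pyramid[level]
        win = feature_pyramid[level]     # 6x6 feature window at this level
        r0 = max(0, min(r, image.shape[0] - 12))
        c0 = max(0, min(c, image.shape[1] - 12))
        dr, dc = best_match(win, image[r0:r0+12, c0:c0+12])
        r, c = r0 + dr, c0 + dc
    return r, c                          # 6x6 match in the unreduced image
```

A usage pass would first run `best_match` over the whole coarsest image to get `start_rc`, then let the loop refine it level by level.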
The work at each level is about the same, finding a 6 by 6 window
in a 12 by 12 search area. It involves 49 summations of 36 quantities. In
our example there were 5 such levels. The correlation measure used is
${2\sum ab}/({\sum a^2}+{\sum b^2})$, where $a$ and $b$ are the values of
pixels in the two windows being compared, with the mean of each window
subtracted out, and the sums are taken over the 36 elements of a 6 by 6
window. The measure has limited tolerance to contrast differences.
The window sizes and other parameters are sometimes different from
the ones used in this example.
In general, the program thus locates a huge general area around
the feature in a very coarse version of the images, and successively
refines the position, finding smaller and smaller areas in finer and finer
representations. For windows of size $n$, the work at each level is
approximately that of finding an $n$ by $n$ window in a $2n$ by $2n$ area,
and there are $\log_2(w/n)$ levels, where $w$ is the smaller dimension of
the search rectangle, in unreduced picture pixels.
This approach has many advantages over a simple pass of a
correlation coefficient computation over the search window. The most
obvious is speed. A scan of an 8 by 8 window over a 256 by 256 picture
would require 249x249x8x8 comparisons of individual pixels. The binary
method needs only about 5x81x8x8, about 150 times fewer. The advantage is
lower for smaller search areas. Perhaps more important is the fact that
the simple method exhibits a serious jigsaw puzzle effect. The 8 by 8
patch is matched without any reference to context, and a match is often
found in totally unrelated parts of the picture. The binary search
technique uses the general context to guide the high resolution
comparisons. This makes possible yet another speedup, because smaller
windows can be used. Window sizes as small as 2 by 2 work reasonably well.
The searches at very coarse levels rarely return mismatches, possibly
because noise is averaged out in the reduction process, causing
comparisons to be more stable. Reduced images are also more tolerant of
geometric distortions.
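The operation counts above can be verified with a line of arithmetic; an 8 by 8 window has $256-8+1 = 249$ trial positions along each axis of a 256 by 256 picture, and $16-8+1 = 9$ along each axis of a 16 by 16 search area:

```python
# Simple scan: 249x249 trial positions, 8x8 pixel comparisons each.
simple = 249 * 249 * 8 * 8

# Binary search: about 5 levels, each finding an 8x8 window in a
# 16x16 area, i.e. 9x9 = 81 positions of 8x8 comparisons.
binary = 5 * 81 * 8 * 8

print(simple, binary, round(simple / binary))   # 3968064 25920 153
```

The ratio of about 153 matches the "about 150 times fewer" figure in the text.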
\topgra\figsiz{4.8}{3.32}
\grugrfig{6-3}{U:LS50.GOD[DIA,HPM]}{U:RS50.GOD[DIA,HPM]}{
Example of the correlator's performance on a difficult example.
The interest operator has chosen features in the upper image,
and the correlator has attempted to find corresponding regions
in the lower one. The cart moved about one and a half meters forward
between the images. Some mistakes are evident. The correlator
had no a-priori knowledge about the relationship of the
two images and the entire second image was searched for each
feature.}
\topgra\figsiz{5.5}{4.0}
\grugrfig{6-4}{U:LS70.GOD[DIA,HPM]}{U:RS70.GOD[DIA,HPM]}{
An outdoor application of the binary search correlator.}
The current routine uses a measure for
the cross correlation which I call {\sl pseudo normalized}, given
by the formula $${2 \sum{ab} \over \sum{a^2} + \sum{b^2}}$$
that has limited contrast sensitivity, avoids the degeneracies of
normalized correlation on informationless windows, and
is slightly cheaper to compute. A description of its derivation
may be found in Appendix 6.
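The relation to standard normalized correlation can be sketched in one line (this is not the Appendix 6 derivation, just a consistency check): by the AM-GM inequality, for windows with $\sum ab \ge 0$,
$${2 \sum{ab} \over \sum{a^2} + \sum{b^2}} \le
  {2 \sum{ab} \over 2\sqrt{\sum{a^2} \sum{b^2}}} =
  {\sum{ab} \over \sqrt{\sum{a^2} \sum{b^2}}},$$
with equality when $\sum a^2 = \sum b^2$. Unlike the normalized form, the denominator here vanishes only when both windows are completely uniform, rather than when either one is.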
\heading{Timing}
The formula above is expressed in terms of $a$ and $b$, the window
values with the means subtracted out. It can be translated into an
expression involving
$\sum{A}$, $\sum{A^2}$, $\sum{B}$, $\sum{B^2}$ and $\sum{(A-B)^2}$.
By evaluating the terms involving only $A$,
the source window, outside of the main correlation loop, the work
in the inner loop can be reduced to evaluating
$\sum{B}$, $\sum{B^2}$ and $\sum{(A-B)^2}$.
This is done in three PDP-10 machine instructions per point by
using a table in which entry $i$ contains both $i$ and $i^2$ in
subfields, and by generating in-line code representing the
source window, three instructions per pixel, eliminating the need
for inner loop end tests and enabling the $A-B$ computation to be done
during indexing.
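The rewrite in terms of raw sums rests on two identities: $\sum a^2 = \sum A^2 - (\sum A)^2/n$ (and likewise for $b$), and $2\sum ab = \sum a^2 + \sum b^2 - \sum(a-b)^2$ with $\sum(a-b)^2 = \sum(A-B)^2 - (\sum A - \sum B)^2/n$. A numerical check (an illustrative sketch; the function names are mine, not the original program's):

```python
import random

def direct(A, B):
    """Pseudo-normalized correlation from mean-subtracted windows."""
    n = len(A)
    ma, mb = sum(A) / n, sum(B) / n
    a = [x - ma for x in A]
    b = [y - mb for y in B]
    num = 2 * sum(x * y for x, y in zip(a, b))
    den = sum(x * x for x in a) + sum(y * y for y in b)
    return num / den

def from_raw_sums(sA, sA2, sB, sB2, sAB2, n):
    """Same measure from sum(A), sum(A^2), sum(B), sum(B^2),
    and sum((A-B)^2), using 2*sum(ab) = sum(a^2)+sum(b^2)-sum((a-b)^2)."""
    va = sA2 - sA * sA / n                 # sum a^2
    vb = sB2 - sB * sB / n                 # sum b^2
    vd = sAB2 - (sA - sB) ** 2 / n         # sum (a-b)^2
    return (va + vb - vd) / (va + vb)

random.seed(1)
A = [random.random() for _ in range(36)]   # a 6 by 6 window, flattened
B = [random.random() for _ in range(36)]
sA, sA2 = sum(A), sum(x * x for x in A)
sB, sB2 = sum(B), sum(y * y for y in B)
sAB2 = sum((x - y) ** 2 for x, y in zip(A, B))
assert abs(direct(A, B) - from_raw_sums(sA, sA2, sB, sB2, sAB2, 36)) < 1e-12
```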
Each pixel comparison takes about one
microsecond. The time required to locate an 8 by 8 window
in a 16 by 16 search area is about 10 milliseconds. A single feature
requires 5 such searches, for a total per feature time of 50 ms.
One of the three instructions could be eliminated
if $\sum{B}$ and $\sum{B^2}$ were precomputed for every position
in the picture. This can be done incrementally, involving examination
of each pixel only twice, and would result in an overall speedup if
many features are to be searched for in the same general area.
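One way to realize this incremental precomputation, in modern terms, is with summed-area tables: one pass over the picture per table (one for $B$, one for $B^2$, so each pixel is examined twice in total), after which the sum over any window takes four table lookups instead of 36 additions. This is an illustrative reconstruction, not the original PDP-10 scheme:

```python
def integral_table(img):
    """Summed-area table: S[r][c] = sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row = 0
        for c in range(w):
            row += img[r][c]
            S[r + 1][c + 1] = S[r][c + 1] + row
    return S

def window_sum(S, r, c, size):
    """Sum over the size x size window with top-left corner (r, c)."""
    return (S[r + size][c + size] - S[r][c + size]
            - S[r + size][c] + S[r][c])

# One pass each builds tables of sum(B) and sum(B^2); each later
# window lookup touches four entries instead of re-adding 36 pixels.
img = [[(r * 7 + c * 3) % 11 for c in range(16)] for r in range(16)]
S1 = integral_table(img)
S2 = integral_table([[v * v for v in row] for row in img])
sB = window_sum(S1, 4, 5, 6)    # sum of B over a 6 by 6 window at (4, 5)
sB2 = window_sum(S2, 4, 5, 6)   # sum of B^2 over the same window
```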
The correlator has approximately a 10\%\ error rate on
features selected by the interest operator in our sample pictures.
Image pairs are typically taken about two feet apart with a
$60\deg$ field of view camera.