
 A Bayesian Method for Certainty Grids

 Hans P. Moravec
 Dong Woo Cho (Now at Pohang Institute of
               Science and Technology
                Pohang City, Kyoungbuk, Korea)

 Carnegie-Mellon University
 Pittsburgh, PA  15213  USA

 January 1989

\section{Abstract}

	In earlier work we introduced a probabilistic, finite-element
representation of robot spatial knowledge we call ``certainty grids''.
The grids allow the efficient accumulation of small amounts of
information from individual sensor readings into increasingly accurate
maps of a robot's surroundings. Early experiments using the method to
interpret measurements from a ring of 24 Polaroid sonar transducers
carried on board an autonomously navigating mobile robot were
surprisingly successful, compared with our earlier experiences with
stereo-vision based programs that mapped points on objects as error
distributions in space. These older programs enabled a robot to map
and traverse cluttered 30 meter obstacle courses, succeeding about
three times in four attempts. By contrast the grid method accomplished
a similar task with a vanishingly small failure rate. We then used the
grid approach to a stereo-vision equipped robot, also with excellent
success. A subsequent experiment integrated sonar and vision data,
generating maps with correct features not found in those from either
sensor alone.

	These encouraging early results were obtained using ad-hoc
statistical models and methods. We then developed a Bayesian
statistical foundation for grid updates. A key result of this
derivation was a combining formula for integrating two independently
derived maps of the same area, or for adding a new reading to a
developing map. This combining formula incorporated in one expression
(and improved on) several different parts of the ad-hoc approach. In
this paper we introduce a more specialized Bayesian combining formula
for inserting range readings into maps.  The formula is suitable for
sonar, stereo, laser, proximity and touch measurements. By making use
of the property of this kind of sensor that nearby objects occlude
distant ones, the new (context-sensitive) formula manages to extract
more information from a reading than the older (context-free)
version. To insert a sensor reading, the context free method has a
computational cost linear in the number of grid cells in the sensitive
volume of the sensor.  The context-sensitive formula has a cost
dominated by a term quadratic in the volume of range uncertainty of
the reading.  Using simulated data, we compare the performances of the
context-free formula, the context-sensitive one used incrementally,
and the context-sensitive formula operating in a "batch" mode, in
which every reading of a batch serves as context for all the
others. Given the same input, the context-sensitive formula produces
slightly better maps than the context-free method, and the batch mode
does better than the incremental mode. But typically the differences
are small. A few more readings processed by the cheaper context-free
method can compensate for its slightly less efficient use of each
reading.

	 The paper also shows how this approach allows sensor models
to be learned as a wandering robot equipped with several sensors
gathers experiences.  As an example, a sonar-type sensor whose
characteristics are initially completely unknown is well characterized
after experiencing as few as 1000 random range measurements in a world
well mapped by other sensors.  }

\section{Introduction}

	In a previous paper [\cite{Moravec88}] we described our
experiences with an evolving series of techniques that allow
information from various noisy sensors on a mobile robot to be
accumulated into an increasingly good spatial map of its
surroundings. In these techniques space is represented by a regular
grid with each cell holding a number representing the probability,
based on accumulated sensor readings, that a particular patch of space
is occupied (as opposed to being vacant). In this paper we present new
mathematical developments on these ideas.

\def\n{\overline}
\def\o{{\bf o}}
\def\p{{\bf p}}
\def\P{{\bf P}}

\section{Definitions}

	A certainty grid is a regular finite element model of space.
Each cell of the grid contains a likelihood estimate (our
``certainty'') of a given property of the corresponding region of
space.  Here we are concerned simply occupancy of the region,
represented by $\o_x$.  $\p(\o_x)$ is the likelihood (or probability)
that region $x$ is occupied.  When a discussion contains only one
particular $x$, we will drop the subscript, and refer simply to
$\p(\o)$.

\begin{figure}
\vspace{4in}
\special{picture Venn scaled 325}
\caption{Venn Diagram}
\label{Venn}
\end{figure}

	Let $\p(A|B)$ represent our best estimate of the likelihood of
situation A if we have received information B. By definition

\begin{equation}
\p(A|B) = {{\p(A\land B)}\over{\p(B)}}
\end{equation}

 Plain $\p(A)$ represents our estimate of $A$ given no special
information.  The alternative to situation $A$ will be referred to as
$\n{A}$ (not $A$). For any $B$

\begin{equation}
\p(A|B) + \p(\n{A}|B) = 1
\end{equation}

\section{Fundamental Formula}

	For the two occupancy cases of a cell $\o$ (the cell is
occupied) and $\n{\o}$ (the cell is empty), with prior likelihoods
$\p(\o)$ and $\p(\n{\o})$ and new information $M$, Bayes theorem can
be expressed:

\begin{equation}
{{\p(\o|M)}\over{\p(\n{\o}|M)}} =
 {{\p(M|\o)}\over{\p(M|\n{\o})}}\times
 {{\p(\o)}\over{\p(\n{\o})}}	\label{bayodds}
\end{equation}

	This relationship follows from simple geometry in figure
\ref{Venn}, if the geometric area of an expression corresponds to its
probability, and if $A$ is identified with $\o$ and $B$ with $M$.

\section{Map Combining Formula}

\begin{figure}
\vspace{2in}
\special{picture VennI scaled 337}
\caption{Venn Diagram for Independent Variables}
\label{VennI}
\end{figure}

Formula \ref{bayodds} combines independent sources of information
about $\o$ and $M$ into an estimate of a single quantity $\p(\o|M)$.
The new information $M$, occurs in terms of the probability of $M$ in
the situation that a particular cell is or is not occupied, $\p(M|\o)$
and $\p(M|\n{\o})$.  This inversion of $\o$ and $M$ is a key feature
of Bayes' theorem.  But consider the problem of generating a map from
information $M_1$ and $M_2$ when each has already been individually
processed into a map, i.e. find $\p(\o|M_1\land M_2)$ given
$\p(\o|M_1)$ and $\p(\o|M_2)$. This requires finding $\p(A\land{B})$
knowing $\p(A)$ and $\p(B)$, which is not possible in the general case
of figure \ref{Venn}. But if we assume that $A$ and $B$ represent
independent information, the relationship is as in figure \ref{VennI},
and $\p(A\land{B}) = \p(A) \times \p(B)$.  Algebra then gives us:

\begin{equation}
{{\p(\o|M_1\land M_2)}\over{\p(\n{\o}|M_1\land M_2)}} =
 {{\p(\o|M_1)}\over{\p(\n{\o}|M_1)}}\times
 {{\p(\o|M_2)}\over{\p(\n{\o}|M_2)}}
 	\label{combinesimple}
\end{equation}

	Suppose $M1$ and $M2$ are not entirely independent, but
contain some common information $M_c$. M1, for instance, would be
factored into an independent component $M'_1$ and the dependent
component $M_c$:

\begin{equation}
{{\p(\o|M_1)}\over{\p(\n{\o}|M_1)}} =
 {{\p(\o|M'_1)}\over{\p(\n{\o}|M'_1)}}\times
 {{\p(\o|M_c)}\over{\p(\n{\o}|M_c)}}
\label{mix}
\end{equation}

The combining equation would then be written:

\begin{equation}
{{\p(\o|M_1\land M_2)}\over{\p(\n{\o}|M_1\land M_2)}} =
 {{\p(\o|M_1)}\over{\p(\n{\o}|M_1)}}\times
 {{\p(\o|M_2)}\over{\p(\n{\o}|M_2)}} /
 {{\p(\o|M_c)}\over{\p(\n{\o}|M_c)}}
\label{contextfree}
\end{equation}

 Where the division by the third term serves to remove one of the
redundant occurrences of $M_c$ in the expression.

	Formula \ref{contextfree} can be used to build maps from
sensor readings if $\p(\o|M_1)$ is interpreted as the developing map,
and $\p(\o|M_2)$ represents a reading. The values of $\p(\o|M_2)$ can
be determined in a training session by accumulating occupancy
statistics each possible reading for a sensor in a variety of known
environments. The resulting averages will contain information about
the occupancy density of the training spaces as well as of the sensor
response. This effect can be removed by identifying the training space
density with $\p(\o|M_c)$ and dividing it out in formula
\ref{contextfree}. Another way of determining the $\p(\o|M_2)$ values
is from a detailed sensor model, such as the one outlined in the next
section. The model is applied to a map of uniformly unknown cells,
causing it to contain information from one reading. The resulting map
is then used to update the accumulating map using formula
\ref{contextfree}.  This approach will be called the {\it
Context-Free} method. It is potentially very fast because the learned
or modeled $M_2$ values are independent of the current map, and can be
stored in tables.  An update can then be done with as little as one
arithmetic operation per cell in a sensor's sensitive volume.

\begin{figure}
\vspace{2in}
\special{picture Range scaled 575}
\caption{Range Sensor Model}
\label{Range}
\end{figure}

\section{Range Sensor Model}

	Imagine a single reading from a range sensor, modeled with
enough generality to subsume imperfect sonar measurements, single
stereo depth determinations, proximity or touch readings. The reading
has both range and lateral uncertainty and a position-dependent missed
detection and false alarm rate. A stereo measurement has small lateral
and range uncertainty, sonar has high lateral and small range
uncertainty, while a proximity sensor would have large lateral and
range uncertainty.  Figure \ref{Range} represents a range reading
superimposed on a Certainty Grid. The range transducer is on the lower
left, and its sensitivity is assumed to drop to zero at the lateral
wedge lines, and to terminate at the range of first detection, on the
circular range surface. The grid is two dimensional, but the following
argument works equally well for three (or more) dimensions.

	Take the grid cells in the interior and range surface of the
reading and sort them into a one dimensional array $c_i$ in order of
increasing distance from the transducer. A range reading can then be
modeled as the sequential passage of a sensing wavefront through the
$c_i$ that is eventually stopped at a particular cell $c_M$. For the
moment, assume we know the identity of $M$.

\begin{figure}
\vspace{2.03in}
\special{picture Ideal scaled 468}
\caption{Ideal Map}
\label{Ideal}
\end{figure}

	Each cell $c_i$ in the real world is assumed to be either
occupied or empty. The transducer model has two probability functions
$Pdet_i$, that gives the probability that the sensing wavefront will
be stopped at particular cell if it should happen to be occupied, and
$Pfal_i$ (false alarm), the probability that the wavefront will be
stopped at an empty cell.

	At any time a robot has only an occupancy estimate $\p(\o_i)$
for each cell, derived from prior information. A reading updates this
estimate.  In this circumstance, the probability that a wavefront will
be stopped at $c_i$ can be estimated by combining the two
possibilities, weighted by their probabilities: $\p(halt_i) = \p(\o_i)
Pdet_i + \p(\n{\o_i}) Pfal_i$

	We can use formula \ref{bayodds} to update the map after a
reading if we can determine ${\p(M|\o)}/{\p(M|\n{\o})}$. \p(M) is the
probability that the sensing wavefront passes unstopped through every
$c_i$ before $c_M$, then stops at $c_M$.  This is given by $\p(M) =
(\prod_{i<M}\n{\p(halt_i)}) \p(halt_M)$. To incorporate the additional
hypotheses $\o$ and $\n{\o}$ as required by formula \ref{bayodds}, we
define:

\begin{equation}
\p(halt_i|\o_j) = \cases{\p(halt_i)&if $i\neq j$\cr Pdet_i&if $i = j$\cr}
\label{halt1}
\end{equation}
\begin{equation}
\p(halt_i|\n{\o_j}) = \cases{\p(halt_i)&if $i\neq j$\cr Pfal_i&if $i = j$\cr}
\label{halt0}
\end{equation}

So

\begin{equation}
{{\p(M|\o_j)}\over{\p(M|\n{\o_j})}} = 
{{\p(halt_M|\o_j)\prod_{i<M}\n{\p(halt_i|\o_j)}}
  \over 
{\p(halt_M|\n{\o_j})\prod_{i<M}\n{\p(halt_i|\n{\o_j})}}}
\label{longprod}
\end{equation}
but (\ref{halt1}, \ref{halt0}) show that 
$\p(halt_i|\o_j)=\p(halt_M|\n{\o_j})$ except when $i=j$
so formula \ref{longprod} greatly simplifies by cancellation to:
\begin{equation}
{{\p(M|\o_j)}\over{\p(M|\n{\o_j})}} = 
\cases{{{\n{Pdet_j}}/{\n{Pfal_j}}}&if $j<M$\cr
        {{Pdet_j}/{Pfal_j}}&if $j=M$\cr}
\label{updatesimple}
\end{equation}

	Formula \ref{updatesimple} provides a surprisingly simple
method for updating map cells from range sensor readings, equal in
efficiency to the Context-Free method, but requiring less space for
tables.  Unfortunately it requires that we know exactly which cell
$c_M$ was detected in the reading. This is possible only if there is
almost no lateral or range uncertainty in the measurement. But suppose
the identity of $c_M$ is known only to lie among $c_{m1}$ through
$c_{mn}$, with associated probabilities $p_m1$ through $p_m2$.  We can
mix these possible cases and convert formula \ref{longprod} into the
general form:

\begin{equation}
{{\p(M|\o_j)}\over{\p(M|\n{\o_j})}} = 
{\sum_{k\le n}
{p_{mk}\p(halt_{mk}|\o_j)\prod_{i<mk}\n{\p(halt_i|\o_j)}}
  \over 
{\sum_{k\le n}
{ p_{mk}\p(halt_{mk}|\n{\o_j})\prod_{i<mk}\n{\p(halt_i|\n{\o_j})}}}
}
\label{fullprod}
\end{equation}

This formula, which we will call the {\it Context-Sensitive} method,
being a nested sum and product over the entire sensor volume, would
seem to require a number of computational steps proportional to the
square of the sensor volume for {\it each} cell to be updated, for a
total cost per reading proportional to the cube of the volume.  This
compares to linear cost for the Context-Free method. In return it
potentially uses the growing information in the map to wring more
useful information from later readings.  Fortunately in the portion of
the volume outside of $m1\ldots mn$ the expression still reduces to
the simple form of the first case of formula \ref{updatesimple}.  And
within $m1\ldots mn$, algebraic manipulation can make the cost
proportional to $n^2$.

\begin{figure}
\vspace{2.03in}
\special{picture Back scaled 468}
\caption{Background Map}
\label{Back}
\end{figure}

\section{Match, Error and Entropy}

	To compare the relative merits of various reconstruction
methods, we simulated a robot with a sensor similar to a Polaroid
sonar ranger moving in a space similar to the laboratory that has
hosted many of our recent real robot runs. This raised the problem of
how to properly compare two different probability maps. Besides its
usefulness in evaluating reconstruction methods, such a measure is
necessary when correlating different maps for navigational purposes.

	After several unsatisfactory approaches, we devised the
following.  The probability that two maps represent the same situation
is equal to the product of the probabilities that each corresponding
pair of cells represents the same value.  Now, each number in our maps
represents the probability that the cell is a occupied.  So the
probability that two cells $A_i$ and $B_i$ are both occupied is just
$A_i\times B_i$.  The probability that they are both 0 is
$\n{A}_i\times\n{B}_i$.  The probability that they are the same is the
sum of these cases: $A_i B_i + \n{A}_i\n{B}_i$.  The probability that
two maps represent the same thing is then $\prod_i(A_i
B_i+\n{A}_i\n{B}_i)$, with $i$ enumerating all the cells of the maps.
Generally this will be a ridiculously tiny number.  To get an
information measure in bits, just take $\log_2$ this probability. We
call this result the {\it Match} between the maps.

\begin{equation}
Match = \log_2\prod_i A_i B_i+\n{A}_i\n{B}_i =
 \sum_i\log_2(A_i B_i+\n{A}_i\n{B}_i)
\end{equation}

	Match is generally a large negative number that climbs slowly
to zero as the maps being compared both become more certain (the
probabilities approach 0 or 1) and become identical to each other.  If
a cell is completely uninformed $(p = 1/2)$, then the similarity
probability for that cell, regardless of the state of the
corresponding cell in the other map, will be $1/2$. $\log_2 1/2 = -1$,
so an uninformed map always gives a Match equal to the negative of the
number of cells, regardless of the state of the second map.

	When a Match is calculated between a reconstructed map and its
perfect ideal (which contains either 0 or 1 in every cell), the result
is called the {\it Error}. The perfect knowledge of the world this
requires is easily obtained in simulation, though it will generally
not be available in physical runs. Still, if a reconstructed map has a
probability of $p$ in a cell, it should be the case that $p$ of the
time the ideal map should have a 1 in that place, and a 0 the other
$\n{p}$ of the time.  This reasoning allows us to calculate and
average expected Error for a map, even when the ideal map is
unknown. We call this the {\it Entropy}.  A bit of algebra reveals
that:

\begin{equation}
Entropy = \sum_i p_i\log_2(p_i)+\n{p_i}\log_2(\n{p_i})
\end{equation}

	If the probabilities in a reconstructed map are a completely
accurate representation of the robot's true state of incomplete
knowledge, then the map's Error would be equal to its Entropy. If the
magnitude of Error is greater than Entropy, then the map can be said
to be overconfident.

	 A totally undifferentiated map, where all the cells have the
same probability value $p$, has minimum Error when the $p$ is equal to
the number of cells in the ideal map that are 1 divided by the total
number of cells, {i.e.} when the probability is equal to the occupancy
density of the ideal.  But Entropy is maximum when $p = 1/2$ and drops
towards 0 as $p$ goes to either 0 or 1.

\begin{figure}
\vspace{2.03in}
\special{picture Free scaled 468}
\caption{Context-Free Map}
\label{Free}
\end{figure}

\begin{figure}
\vspace{2.03in}
\special{picture Sense scaled 468}
\caption{Context-Sensitive Map}
\label{Sense}
\end{figure}

\section{Simulations}

	In our simulations, the maps had 2048 cells, and simulated
robot runs consisted of 324 range measurements by an imperfect sonar
with a 15 degree field of view. Perfectly uninformed maps have Error
and Entropy of -2048.  When the maps are initialized to an optimum
background $p$ of $0.123$, Error is -1109.  Early in the simulation
each reading reduces the magnitude of this by about 8 bits. Towards
the end the contribution is about 1 or 2 bits, though sometimes a
reading makes things slightly worse, by a fraction of a bit. After 324
readings Error is about $-550$.

	Surprisingly, the simpler Context-Free method gives better
results than the Context-Sensitive method. A possible explanation is
that both methods make the assumption that the readings provide
independent information.  Because of similarity in transducer
position, this independence assumption is not exactly true.  The
Context-Sensitive method uses the assumption more heavily, and so
suffers more from the fact that it is false.  In all cases the
magnitude of Error (about $-550$) is greater than Entropy (about
$-400$).  This means that the maps are overconfident (for example, a
cell with $0.7$ probability might have a 1 in the ideal map only $0.6$
of the time). Imagine that Entropy measures the amount of information
in a map, while Error measures the amount of {\it correct}
information. The difference between Error and Entropy is thus amount
of {\it wrong} information.  We attempted to reduce the confidence of
our maps by linearly scaling the $\log_2 (p/{\n{p}})$ of every cell by
a confidence factor $C$.  The results were interesting. As $C$ was
varied from 1 to a value of about $0.74$ in the Context-Free case, and
$0.7$ in the Context-Sensitive case, Entropy increased and Error
decreased until they became exactly equal.  As $C$ was decreased
beyond these critical values, Error again began to rise, but not so
fast as Entropy. The minimum Error in all cases happened exactly when
Error was equal to Entropy.

\section{Acknowledgement}

	This work has been supportedby the Office of Naval Research
under contract N00014-81-K-503, and by Denning Mobile Robotics, Inc.

\section{References}
\begin{enumerate}

\item\label{Moravec88}
Moravec, Hans P., {\bf Sensor Fusion in Certainty Grids for Mobile
Robots}, AI Magazine, v.9 n.2, Summer 1988, pp. 61-74.
\end{enumerate}

\end{document}
