
 Bayesian Grids

 Hans Moravec
 February 1987

\section{Bayesian Reasoning}

	In most of our work to date, we have used {\it ad-hoc}
formulas to update the certainty grid estimates. Recently a less
arbitrary statistical approach derived from Bayes theorem has captured
our attention.  Preliminary results using this approach are at least
as good as those from the old formulas.  Many puzzling aspects of the
old scheme have been clarified in the process.

	Let $\p(A|B)$ represent our best estimate of the likehood of
situation A if we have received information B. By definition 
\begin{equation}
\p(A|B) =
{{\p(A\land B)}\over{\p(B)}}
\end{equation}
 Plain $\p(A)$ represents our estimate of $A$
given no special information.  The alternative to event $A$ will be
referred to as $\n{A}$ (not $A$). For any $B$
\begin{equation}
\p(A|B) + \p(\n{A}|B) = 1
\end{equation}

	A certainty grid is a regular finite element model of space.
Each cell of the grid contains a liklehood estimate (our
``certainty'') of a given property of the corresponding region of
space.  Primarily we are concerned with simple occupancy of the
region, represented by $\p(\o [x])$, the probability that region $x$
is occupied.  When a discussion contains only one particular $x$, we
will drop the subscript, and refer simply to $\p(\o)$.

	We will be considering data derived from wide angle sonar
range measurements.  A given measurement will be designated $M[i]$,
with $i$ being the sequential number of the reading.  The intersection
of a set of readings can be designated by a range in subscript, as in
\begin{equation}
		M[<n] = \bigcap_{i=1}^{i<n} M[i]
\end{equation}
 or by a list as in 
\begin{equation}
		M[i,j,l] = M[i]\land M[j]\land M[l]
\end{equation}
When only one reading $M[i]$ enters into a discussion, we will
abbreviate its name to simply $M$.  Each measurement has a value, a
sonar range $R(M)$.  The sonar sensor is quantized, and $R$ will be an
integer.

	$\P(\o)$ is the probability that any particular cell is
occupied, i.e. the average occupation density of our space.  Our
measurements don't give $\P(\o)$ directly, but it is approximately the
overall average of the $\p(\o)$'s of all the cells of a typical map of
the space.  By definition, $\P(\n{\o}) = \n{\P(\o)} = 1 - \P(\o)$.

\section{Fundamental Formulas}

	For two occupancy possibilities $\o$ and $\n{\o}$ of a cell, new
information $B$ and old information $A$, one form of Bayes theorem gives:
\begin{equation}
\p(\o|B\land A) = {{\p(B|\o\land A)\times{\p(\o|A)}}
 \over{\p(B|\o\land A)\times{\p(\o|A)}+
 \p(B|\n{\o}\land A)\times{\p(\n{\o}|A)}}}	\label{bay}
\end{equation}
\begin{equation}
\p(\n{\o}|B\land A) = 
 {{\p(B|\n{\o}\land A)\times{\p(\n{\o}|A)}}
 \over{\p(B|\o\land A)\times{\p(\o|A)}+
 \p(B|\n{\o}\land A)\times{\p(\n{\o}|A)}}}	\label{bayn}
\end{equation}
the ``odds'' formulation of this is compact and convenient for
computation, and will be important later:
\begin{equation}
{{\p(\o|B\land A)}\over{\p(\n{\o}|B\land A)}} =
 {{\p(B|\o\land A)}\over{\p(B|\n{\o}\land A)}}\times
 {{\p(\o|A)}\over{\p(\n{\o}|A)}}	\label{bayodds}
\end{equation}

	Formulas \ref{bay} and \ref{bayn} are somewhat complicated for
repeated use. Formula \ref{bayodds} is better, it is formulated in
terms of a product of odds.  When the odds ratio involves a
probability and its complement, odds and probabilities can be
interconverted by the relationship:

\begin{equation}
Odds(A) = {{\p(A)}\over{\p(\n{A})}}  =  {{\p(A)}\over{1 - \p(A)}}
\end{equation}
\begin{equation}
\p(A) = {{Odds(A)}\over{1 + Odds(A)}}
\end{equation}

	The $\p(B|\ldots)$ ratio in \ref{bayodds} is not of this
form. To compute its value, both numerator and denominator must be
evaluated separately. To make apparent this difference, the ratio
$\p(B|\o\land A)/\p(B|\n{\o}\land A)$ will be referred to as
$Odds(B||\o\land A)$.  Once the ratio is obtained, however, it can be
treated as any other odds number.

	If we deal exclusively with odds, all the combining operations
become multiplications.  Formula \ref{bayodds} is expressed:
\begin{equation}
	Odds(\o|A\land B) = {Odds(B||\o\land A)\times Odds(\o|A)}
\end{equation}

	An additional transformation streamlines the computation more.
Let $L(A)$ represent the logarithm, to some suitable base, of
$Odds(A)$.  The formula then becomes a simple addition.
\begin{equation}
		L(\o|A\land B) = L(B||\o\land A) + L(\o|A)
\end{equation}

	The terms can be integers if the base of the logarithms is
chosen well.  Perfect certainty ($\p(\o) = 0$ and $\p(\o) = 1$) can no
longer be represented, but such values are probably a mistake in any
representation, since they are unalterable by any input.  Only God
should be absolutely certain of anything!

\section{Combining Formula}

	Bayes' theorem is a formula that combines independent sources
of information $A$ and $B$ into an estimate of a single quantity
$\p(\o|A\land B)$.  The new information $B$, occurs in terms of the
probability of $B$ in the situation that (in our case of interest) a
particular cell is or is not occupied, $\p(B|\o)$ and $\p(B|\n{\o})$.
This inversion is central to the usefulness of Bayes' theorem.  But
consider the problem of generating a map from information $A$ and $B$
when each has already been individually processed into a map,
i.e. find $\p(\o|A\land B)$ given $\p(\o|A)$ and $\p(\o|B)$.

	Bayes' formula (\ref{bayodds}) applied to information $B$ and
null $A$ (i.e. only global information from the $A$ side) makes the
relationship between $\p(\o|B)$ and $\p(B|\o)$ clear:

\begin{equation}
{{\p(B|\o)}\over{\p(B|\n{\o})}} = 
	{{\p(\o|B)\times{\P(\o)}}\over{\p(\n{\o}|B)\times{\P(\n{\o})}}}
\end{equation}
thus
\begin{equation} \label{flip}					
{{\p(\o|B)}\over{\p(\n{\o}|B)}} = 
	{{\p(B|\o)}\over{\p(B|\n{\o})}}\times{{\P(\n{\o})}\over{\P(\o)}}
\end{equation}

	It would be nice to substitute this ratio back into the
original version of formula (\ref{bayodds}) to produce a formula
giving $\p(\o|A\land B)$ in terms of $\p(\o|A)$ and $\p(\o|B)$, thus
allowing maesurements $A$ and $B$ to be incorporated into maps
independently, and combined afterward.  This is not possible in
general because the two measurements may interact in some way - for
instance, either $A$ or $B$ alone may indicate a high probability of
$\o$, but taken together, they may confirm some other hypothesis, and
reduce the probability of $o$.  If, however, we make a strong
assumption of independence of B from A:
\begin{equation}
{{\p(B|\o\land A)}\over{\p(B|\n{\o}\land A)}} =
{{\p(B|\o)}\over{\p(B|\n{\o})}}		\label{independ}
\end{equation}
we can use equations (\ref{bayodds},\ref{flip},\ref{independ}) to
produce a {\it map combining formula}:
\begin{equation}
{{\p(\o|A\land B)}\over{\p(\n{\o}|A\land B)}} =
  {{\p(\o|B)}\over{\p(\n{\o}|B)}}\times
  {{\p(\o|A)}\over{\p(\n{\o}|A)}}\times
  {{\P(\n{\o})}\over{\P(\o)}}			\label{combine}
\end{equation}

	While the non-informative reading in equation \ref{bayodds},
that which leaves the original $\p(\o)$ estimate unchanged, is found
when $\p(B|\o)/\p(B|\n{\o}) = 1$, the non-informative case in equation
\ref{combine} happens when $\p(\o|B)/\p(\n{\o}|B) =
\P(\o)/\P(\n{\o})$, i.e.  when the cell density estimated from the
reading is the same as the average cell density of the whole map.

	Formula \ref{combine} is most important because it provides a
means for combining maps of the same area obtained by different means,
for instance by independent scans of different sensors.  It can also
be used to incorporate individual sensor readings as we build a map,
but is in general inferior to formula \ref{bayodds} for this purpose
because it precludes the use of any knowledge we may have of sensor
interactions.  In odds and log odds form it becomes:

\begin{equation}
Odds(\o|A\land B) = {{Odds(\o|A)\times Odds(\o|B)} \over {Odds(\o)}}
\end{equation}

\begin{equation}
	L(\o|A\land B) = L(\o|A) + L(\o|B) - L(\o)
\end{equation}

\section{Sonar Wedges}

	Simple reasoning from first principles can produce estimates
for $\p(M|\o)$ and $\p(M|\n{\o})$ for a sonar reading.  These values
can be used to update maps by direct application of the Bayes formula
\ref{bayodds}.  In a later section we will show how to systematically
incorporate in the possibility of errors in the readings.  For now
assume that the sensor always perfectly returns the range to the
nearest occupied cell within its angle of sensitivity. Define the
sonar regions according to the diagram:


\begin{verbatim}

		      Range "R(M)" of actual measurement
		     /
		  |||---	Range "Rc"
		 |||	----   /
		|||	    ----
		|||	     .	----
(10)	       |||	    .	    ----||
	       |||	    .	    ----||
		|||	     .	----
		|||	    ----		External Region
		 |||	----  \
		  |||---       Interior Region
		/
		Range Surface

\end{verbatim}

	When incorporating a new reading $B$, the map built from prior
readings $A$ can be used to help provide an estimate for the
$\p(B|\o\land A)$ and $\p(B|\n{\o}\land A)$ of formula \ref{bayodds}.

	Let $\P(R)$ be the probability, prior to a the reading, that
the next sonar measurement will result in a given range $R$.  $\P(R)$
can be approximated by stepping through the possible ranges $R_1$,
$R_2$, \dots $R_n$, starting with the shortest, $R_1$, and multiplying
the occupancy probability $\p(\o|A)$ of each cell on the range surface
for that range by the probability that the sonar would detect an
occupied cell at that location.  The sum of these products at $R_1$,
call it $S(R_1)$, is $\P(R_1)$. Now $\P(R_2) =
S(R_2)\times(1-\P(R_1))$ since an echo from $R_2$ can happen only if
the sonar pulse has not already been intercepted at the shorter range
$R_1$. In general
\begin{equation} \label{profile}
	\P(R_i) = S(R_i) \times (1-{\sum_{j<i}\P(R_j)})
\end{equation}

	By definition $\sum\P(R)$ over all $R$ is unity.  We will
suggest later exactly how to compute $S$, and how the detection
probabilities required in the calculation above can be determined
empirically by collecting statistics from many maps of the correlation
of individual sensor readings with the composite maps they helped
create.

\subsection{External Region}

	Consider a cell in the external region.  $\p(M|\o)$ is our
estimate that a measurement range of $R(M)$, as opposed to some other
range, will occur if we happen to know only that the cell is occupied.
Since the cell is outside the sonar cone, its state of occupancy has
no effect on the reading, and we can refer only to the uniform range
distribution to conclude that

\begin{flushright}
[external region probabilities]
\end{flushright}

\begin{equation}
\p(M|\o) = \p(M|\n{\o}) = \P(R(M))
\end{equation}
and
\begin{equation}
{{\p(M|\o)}\over{\p(M|\n{\o})}} = 1
\end{equation}
inserting these into formula \ref{bayodds} gives
\begin{equation}
{{\p(\o|M[1:i])}\over{\p(\o|M[1:i])}} = 
  {{\P(R(M))\times \p(\o|M[<i])}\over{\P(R(M))\times \p(\n{\o}|M[<i])}} =
{{\p(\o|M[<i])}\over{\p(\n{\o}|M[<i])}}
\end{equation}
so, the occupancy certainty is unchanged, as it should be, since the
sonar reading contains no information about the external cell.

\subsection{Range Surface}

	Now consider a cell on the range surface, and suppose this
surface covers n cells in all. By definition the range at this surface
is $R(M)$.

	If the cell is occupied, then a perfect sensor would detect
it, and so could not return a greater range than $R(M)$.  All ranges
beyond the occupied cell, thus, would be ``short circuited'' by the
cell.  It would, however, be possible for it to return shorter ranges,
if closer cells within the sonar beam angle happened to be occupied.
The probabilities of the shorter ranges should not be changed by the
presence of the farther occupied cell.  The probability of getting
just the range reading would be unity minus the a-priori probability
of getting a lesser reading, or equivalently, the a-priori probability
of getting a reading greater than or equal to $R(M)$.

\begin{flushright}
[range surface probabilities]
\end{flushright}

\begin{equation} \label{rangep}
\p(M|\o) = 1 - {\sum_{R<R(M)}{\P(R)}} = {\sum_{R\geq R(M)}{\P(R)}}
\end{equation}

	If, on the other hand, the cell were unoccupied, the original
distribution is hardly altered, except that the particular $R = R(M)$
is slightly less likely to occur, since one possible way to achieve it
is eliminated.  The cell in question is one of ``n'' cells we assumed
on the range surface, so we can say the chance of $R(M)$ happening is
reduced by the factor $(n-1)/n$.  The probabilities of other ranges
are not affected directly, so the result is a slight ``notching'' at
$R=R(M)$ of the a-priori $\P(R)$ distribution. Simply notching it,
however, reduces its total area, which must then be renormalized by
the proper factor to restore the area of the distribution to unity,
increasing all the other probabilities slightly:

\begin{equation} \label{rangen}
\p(M|\n{\o}) = {{\P(R(M))\times(n-1)/n}\over{1-\P(R(M))/n}} = 
\P(R(M))\times{{n - 1}\over{n - \P(R(M))}}
\end{equation}

\subsection{Interior Region}

	If we assume a perfect sensor, an occupied cell in the wedge
at a range less than $R(M)$ would always be detected and thus prevent
a the reading $M$.  So in this case

\begin{flushright}
	[interior region probabilities]
\end{flushright}

\begin{equation} \label{interiorp}
\p(M|\o)  =  0
\end{equation}

if the cell in the interior is unoccupied, then the range ``Rc'' at
which it occurs is less likely than other ranges.  Suppose there would
be $k$ cells occupied by a range arc at distance $Rc$.  By reasoning
like that of formula \ref{rangen}, the probability of $Rc$ would be
reduced by a factor of $(k-1)/k$, but the overall distribution would
have to be normalized by dividing by $(1 - \P(Rc)/k)$.  This
normalization raises the probability of $R(M)$ from its a-priori
value:

\begin{equation} \label{interiorn}
\p(M|\n{\o}) =  {{\P(R(M))}\over{1 - \P(Rc)/k}}
\end{equation}

\section{Measurement Errors}

	Of course our sensors are not perfect in the sense of the last
section. Sometimes they fail to respond to an occupied location, at
other times they give a spurious indication of occupancy.  It is easy
to modify the perfect case formulas for this.  Suppose there is a
small chance $e_{hit}$ that an empty cell will act as if it were
occupied.  Suppose also that there is a small possibility $e_{miss}$
that an occupied cell will fail to be detected.  We can then construct
the formulas with error from the error-free cases using the general
formulas:

\begin{equation}
\p(M|\o)_{error}=\p(M|\o)\times(1-e_{miss})+\p(M|\n{\o})\times e_{miss}
\end{equation}

\begin{equation}
\p(M|\n{\o})_{error}=\p(M|\n{\o})\times(1-e_{hit})+\p(M|\o)\times e_{hit}
\end{equation}

	If $e_{hit} = e_{miss} = 1/2$, then the sensors are returning
random readings. This is captured in (15) as $\p(M|\o)_{error} =
\p(M|\n{\o})_{error}$, the non informative case, similar to formula
(12).  As $e_{hit}$ and $e_{miss}$ grow closer to $1/2$, the reading
becomes less and less informative.  If $e_{hit} = e_{miss} = 1$ the
sensor is perfect, but the hit and miss indications are mistakenly
swapped.  In that case (15) restores the correct pairing.

	A real sensor can be modelled, somewhat redundantly, by
varying $e_{hit}$ and $e_{miss}$ over the sensed area.  Towards the
edges of the beam they will creep towards $1/2$, achieving this value
perfectly outside the beam, and perhaps at the boundary of the
interior and range surface regions.

	If $0\leq e_{hit},e_{miss}\leq 1$, the error formulas can only
reduce or leave unchanged the amount of information provided by the
``perfect'' estimates of $\p(M|\o)$ and $\p(M|\n{\o})$.

	The quantities $e_{hit}$ and $e_{miss}$ must be initialized to
some reasonable values when a system begins operation. They can then
be adjusted by an iterative ``learning'' process.  Each time a map is
constructed from many sensor readings, the correlation of each cell in
each individual sensor profile with the area of the total map it
overlays is recorded. This correlation is averaged over many maps.  If
the map often reports a high occupancy probability in a location where
the sensor profile indicates occupancy, $e_{hit}$ will be low for that
position in the sensor profile, otherwise it will be high.  Similarly
if the map usually has low probabilities at a site indicated empty in
the sensor profile, $e_{miss}$ will be low, otherwise it will be high.

	The detection probability required to compute the function $S$
in equation \ref{profile} is given by the expression $1-e_{miss}$.  A
more accurate computation of $S$ would also take into account the
chance that an empty cell would falsely register as occupied. So a
good estimate for $S$ would be:
\begin{equation}
S(R) = \sum_R\p(\o|A)\times (1-e_{miss}) + \p(\n{\o}|A)\times e_{hit}
\end{equation}

\section{Stereo Range Measurements}

	Another approach to incoprorating individual sensor readings
uses the independent combining formula \ref{combine}.  In general this
will result in inferior maps, but it requires less computation.
Here's an example from a consideration of a stereo vision measurement.
One of our stereo approaches matches edges crossing a given scanline
in the left and right images.  A range is deduced from the relative
horizontal shift of each edge between the two images. The edges cannot
be located with infinite precision, so there is some uncertainty in
depth, given by a distribution $\P(D|m)$, the probability the object
ranged by stereo measurement $m$ is actually at distance $D$.  Note
that this is not the probability that location $d$ is occupied,
because $d$ may contain an object other than the edge being
ranged. Let $\p(d|m)$ represent the probability, given measurement
$m$, that $d$ is occupied by {\it any} object.  We can estimate
$\p(d|m)$ directly by the following reasoning.

	Consider the possible case that the ranged edge is at a
particular distance $D$ (this possibility has probability
$\P(D|m)$). In this case, and assuming a perfect sensor as we did in
the sonar example, we can conclude that
\begin{equation}
\p(d|D) = 0\qquad when\qquad d<D
\end{equation}
since a perfect sensor would detect an intervening object
\begin{equation}
\p(d|D) = 1\qquad when\qquad d=D
\end{equation}
since the object is at D
\begin{equation}
\p(d|D) = \p(\o)\qquad when\qquad d>D
\end{equation}
since the edge blocks the view behind it, the measurement gives
us no special information beyond d=D.

Graphically:


\begin{verbatim}

	1			___
				| |
\p(d|D)				| |________________________________ \p(\o)
				| 
        0 ______________________|

	  d=0                   d=D			d>D

\end{verbatim}

Relationsip (16) holds only if the edge is exactly at distance $D$.
Measurement $m$ tells us only that there is a probability $\P(D|m)$
that this is the case.  To get the overall probability, we must sum
the $\p(d|D)$s over all possible $D$'s, each weighted by its
probability of actually being the case. Thus
\begin{equation}
\p(d|m) = \sum_D{(\p(d|D)\times\p(D|m))}
\end{equation}


	This will generally have the form

\begin{verbatim}

	1			
			       _-_
\p(d|m)			      /   \________________________________ \p(\o)
			   __/
        0 _____________----

(17)	  d=0                  d=m			d>D

\end{verbatim}

	Though if the spread of $\p(D|m)$ is very large the hump at $d=m$ will
be attenuated, and the curve will have the approximate shape:

\begin{verbatim}

	1			
		
\p(d|m)			             _____________________________ \p(\o)
			  ____-------
        0 __________------

(17)	  d=0                  d=m			d>D

\end{verbatim}

	The values of $\p(d|m)$ can be used directly to update maps by
use of the combining formula \ref{combine}.

\section{Acknowledgement}

	I wish to thank Peter Cheeseman for setting my nose firmly in
this groove.  {\bf Statistical Decision Theory and Bayesian Analysis}
by J. O. Berger, 1985, was helpful too.

\end{document}
