Robust Navigation
by
Probabilistic Volumetric Sensing
Carnegie Mellon University
The Robotics Institute
5000 Forbes Avenue
Pittsburgh, PA 15213
Time Period: May 1, 1999 to April 30, 2002
PROPOSAL
1. DARPA ITO Solicitation: BAA #99-09
Mobile Autonomous Robot Software
2. Technical Topic Area: Soft Computing
3. Proposal Title: Robust Navigation by Probabilistic Volumetric
Sensing
4. Technical Contact: Hans Moravec
email: hpm@cmu.edu
tel: 412-268-3829 or 412-363-9759
fax: 412-363-9834
postal: Robotics Institute
5000 Forbes Avenue
Carnegie Mellon University
Pittsburgh, PA 15213
A. Innovative claims for the proposed
research (1)
Our goal is laboratory prototype sensor-based software for utility
mobile robots for industrial transport, floor maintenance, security
etc., that matches the months-between-error reliability of existing
industrial robots without requiring their expensive worksite
preparation or site-specific programming. Our machines will navigate
employing a dense 3D awareness of their surroundings, be tolerant of
route surprises, and be easily placed by ordinary workers in entirely
new routes or work areas. The long-elusive combination of easy
installation and reliability should greatly expand cost-effective
niches for mobile robots, and make possible a growing market that can
itself sustain further development.
We will employ 3D grids of spatial occupancy evidence, a technique we
have been developing for two decades [A2]. 2D versions of the
approach found favor in many successful research mobile robots
[A0], but seem short of commercial reliability. 3D, with 1,000
times as much world data, was computationally infeasible until 1992,
when we combined increased computer power with 100x speedup from
representational, organizational and coding innovations. In 1996 we
wrote a preliminary stereoscopic front end for our fast 3D code, and
the gratifying results convinced us of the practicability of the
approach, given about 1,000 MIPS of computer power [A3]. We
seek support for a sustained, single-minded effort to parlay that
start into a universally convincing demonstration, just as the
requisite computing power arrives.
The work has three natural stages: completion and improvement of the
basic perception code; creation of an identification layer for
navigation and recognition of architectural features; finally, sample
application programs that orchestrate the other abilities into
practical behaviors like patrol, delivery and cleaning. We need both
capability and efficiency. The existing code allows one-second time
resolution with 1,000 MIPS, but our 3D maps have millions of cells,
and straightforward implementations of path planning, localization,
object matching, etc. would be much slower. We will combine
techniques like coarse-to-fine matching, selective sampling and
alternate representations to get the desired results at sufficient
speed.
Ideas awaiting evaluation include variable resolution trinocular
stereoscopic matching; optimization of sensor evidence profiles and
other system parameters by using robot images of the scene first to
naturally colorize the occupied cells of a grid, then to check the
grid's faithfulness, closing the learning loop; randomly textured light to
improve stereo of uniform surfaces; coarse-to-fine path finding;
occupied-cell-list matching of grids for fast localization and feature
identification; and others. Ongoing work will lead us to
more.
B. Technical Rationale, Approach and Plan
(18)
This work is to be the finishing sprint in a long research journey.
Milestones on that journey include:
1) A 1970s Stanford thesis, which stereoscopically tracked
dozens of localized 3D visual features to servo a robot called the
Cart through obstacle courses, the first time this had been done in
arbitrary environments. It took about 10 minutes per meter of travel on a 1 MIPS computer,
and made navigation errors about every 100 meters. [A1]
2) 1980s invention of grids accumulating evidence of occupancy.
This noise-tolerant representation allowed kilometers of travel
between errors and reduced computational cost when used in 2D. A
majority of the dozens of indoor robots cruising research hallways
today use 2D grids, needing about 50 MIPS. [A2]
3) Very fast 3D grid code written in a 1992 sabbatical at Thinking
Machines Corp. 3D grids are 1,000 times as big as 2D, due to the
added dimension, and because they invite higher resolution. Use of
TMC's CM5 supercomputers was planned, but an intense series of
representational and coding innovations produced a program 100 times
faster than anticipated, usable on 30 MIPS workstations. It remains
the only practical 3D grid implementation. [A3]
4) Stereoscopic front end for the 3D grid program, written in a
1996 sabbatical at Daimler-Benz research. The very encouraging first
results, obtained at 100 MIPS, are good enough and fast enough to
promise viable autonomous navigation and other functions given 1,000
MIPS. This first implementation suggests major optimizations,
improvements and extensions. [A3]
During the 1996 sabbatical work we wrote a new program that derives a
dense three-dimensional evidence grid representation of a robot's
surroundings from wide-angle stereoscopic images [A3]. We
chose to work with stereo because, of all approaches that provide
dense enough data to make good 3D maps, it promised to be the
alternative most likely to be inexpensive enough for mass-market
robots in the near future. The program adds several spatial rays of
evidence to a grid for each of about 2,500 local image features chosen
per stereo pair. It was used to construct a 256x256x64 grid,
representing 6 by 6 by 2 meters, from a hand-collected test set of
twenty stereo image pairs of an office scene. Fifty-nine stereo pairs
of an 8 by 8 meter laboratory were also processed. The positive
(probably occupied) cells of the grids, viewed in perspective,
resemble dollhouse scenes [see view below]. Details as small as the
curvature of chair armrests are discernible. The processing time, on a
100 MIPS Sparc 20, is less than five seconds per stereo pair, and
total memory is under 16 megabytes. The results strike us as
abundantly adequate for very reliable navigation of freely roaming
mobile robots, and plausibly adequate for shape identification of
objects bigger than 10 centimeters. The program is a first proof of
concept, and awaits optimizations, enhancements, variations,
extensions and applications.
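To make the core operation concrete, here is a minimal sketch of the kind of per-feature update the program performs: one ray of evidence, derived from a stereo range measurement, accumulated into a 3D grid. The log-odds cell encoding, fixed step size and evidence weights are illustrative assumptions; the actual representation, and the innovations that make it fast, are described in [A3].

```cpp
// Minimal sketch of accumulating one ray of occupancy evidence in a 3D grid.
// The log-odds cell encoding, fixed step size and evidence weights are
// illustrative assumptions; the real program's representation and speedups
// are described in [A3].
#include <cmath>
#include <vector>

struct Grid3D {
    int nx, ny, nz;                  // e.g. 256 x 256 x 64
    float cell;                      // cell size in meters
    std::vector<float> logodds;      // accumulated evidence, 0 = unknown

    Grid3D(int nx_, int ny_, int nz_, float cell_)
        : nx(nx_), ny(ny_), nz(nz_), cell(cell_), logodds(nx_ * ny_ * nz_, 0.0f) {}

    float* at(float x, float y, float z) {           // world coords -> cell, or null
        int i = int(x / cell), j = int(y / cell), k = int(z / cell);
        if (i < 0 || j < 0 || k < 0 || i >= nx || j >= ny || k >= nz) return nullptr;
        return &logodds[(k * ny + j) * nx + i];
    }
};

// Cast a ray from the camera position toward a stereo-ranged feature:
// cells short of the measured range receive "probably empty" evidence,
// cells near the measured range receive "probably occupied" evidence.
void addEvidenceRay(Grid3D& g, const float origin[3], const float dir[3], float range,
                    float emptyEvidence = -0.3f, float hitEvidence = 1.5f) {
    float len = std::sqrt(dir[0] * dir[0] + dir[1] * dir[1] + dir[2] * dir[2]);
    for (float s = 0.0f; s < range + g.cell; s += g.cell) {
        float x = origin[0] + dir[0] / len * s;
        float y = origin[1] + dir[1] / len * s;
        float z = origin[2] + dir[2] / len * s;
        if (float* c = g.at(x, y, z))
            *c += (s < range - g.cell) ? emptyEvidence : hitEvidence;
    }
}
```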

The program has since been used by my sabbatical hosts at
Daimler-Benz in Berlin to drive a robot around office corridors. It
is available on the web, and has been used by several other groups.
We ourselves used it to drive a robot short distances, but have made
little additional progress because of unrelated demanding commitments.
Growing computer power strongly compels us now to bring the work to a
practical conclusion in a sustained, concentrated, single-minded
effort. We need full-time support for this. We have not found
significant funding commercially. Robotics has been a historically
unprofitable business, and small companies have no money for long-term
efforts, while large companies have other interests. Daimler-Benz,
for instance, has only a small internal robotics effort that competes
with other research projects. My hosts were able to offer us only a
small gift for our work, which we used for modest equipment
purchases.
Year 1 Agenda: Perception Layer
Our first year's work will be to implement a lengthy list of
intuitively promising improvements to the existing core perception
system [A3], which was a first-cut experiment. Among
them:
Adjusting the stereo evidence-ray model and many other parameters with an
automatic learning process to optimize the system performance. For a
learning criterion, we will project some scene images onto
range-sorted occupied cells of a grid to naturally colorize them, then
compare other scene images with corresponding views of the colorized
grid. The better the match, the better the grid represents reality.
By this means we avoid the onerous task of supplying independent
ground truth to the learning process. The camera images already
encode such information, even when the robot is in
the field. Our first 3D results, illustrated in the preceding image,
used ad-hoc settings of all parameters. Learning should provide
substantial improvement, evidenced by the following 2D grid example
with data from a ring of sonar sensors on a robot traversing a highly
specular hallway, where a majority of readings were misleadingly long.
The upper left image is a hand-constructed ideal map of the hallway.
The upper right is the hallway reconstructed using a sensor model that
works well in non-specular settings, but fails here. Lower left is a
reconstruction using a model partially trained for these conditions
(note prominent door frames, and their specular reflections), and
lower right is by a fully trained model [A2]. The learning
improvement in our 3D setups will likely be less dramatic, since the
initial model is not so bad.
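As a concrete illustration of the criterion (a sketch only, not the planned implementation), the code below colorizes occupied cells by projecting one scene image onto them and then scores how well a second scene image agrees with those cells seen from its own viewpoint. The axis-aligned pinhole camera model, grayscale images and mean-squared-difference score are assumptions, and the sketch omits the range sorting that handles occlusion.

```cpp
// Sketch of the proposed learning criterion: colorize occupied grid cells from
// one camera image, then measure how well a second image agrees with those
// cells seen from its own viewpoint.  The simple axis-aligned pinhole model
// and mean-squared-difference score are assumptions for this sketch; the real
// criterion would range-sort cells so that nearer cells occlude farther ones.
#include <array>
#include <cstdint>
#include <vector>

struct Camera {                       // world -> pixel, simple axis-aligned pinhole
    float fx, fy, cx, cy;             // focal lengths and principal point (pixels)
    float tx, ty, tz;                 // camera position in grid coordinates
    bool project(const float p[3], int& u, int& v) const {
        float x = p[0] - tx, y = p[1] - ty, z = p[2] - tz;
        if (z <= 0.1f) return false;  // behind or too close to the camera
        u = int(fx * x / z + cx);
        v = int(fy * y / z + cy);
        return true;
    }
};

struct Image {
    int w, h;
    std::vector<std::uint8_t> gray;   // grayscale for simplicity
    bool inside(int u, int v) const { return u >= 0 && v >= 0 && u < w && v < h; }
    std::uint8_t at(int u, int v) const { return gray[v * w + u]; }
};

// Lower score = the grid, colorized from view A, better explains view B.
// Used to rank candidate settings of the evidence-ray and other parameters.
double colorizeAndCompare(const std::vector<std::array<float, 3> >& occupiedCells,
                          const Camera& camA, const Image& imgA,
                          const Camera& camB, const Image& imgB) {
    double score = 0.0;
    long count = 0;
    for (const std::array<float, 3>& cell : occupiedCells) {
        int ua, va, ub, vb;
        if (!camA.project(cell.data(), ua, va) || !imgA.inside(ua, va)) continue;
        if (!camB.project(cell.data(), ub, vb) || !imgB.inside(ub, vb)) continue;
        double d = double(imgA.at(ua, va)) - double(imgB.at(ub, vb));
        score += d * d;               // disagreement between the two views
        ++count;
    }
    return count ? score / count : 1e9;   // mean squared disagreement
}
```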

Augmenting our dead-reckoning by statistically registering
grids constructed with partial data. This has worked well in 2D,
providing sub-voxel registration (see references in [A3]). It
should be even better in 3D, with many more cells. Computational cost is
the main concern: we plan to use a coarse-to-fine strategy, and a
comparison concentrating on the typically mere 2% of the cells that
are occupied, to keep this tolerable. In the latter technique, a list
of the occupied cells of one grid is mapped under a trial
displacement and rotation into the cells of a second grid, and also
vice-versa using the inverse translation and rotation, to calculate a
goodness of match, which is to be optimized. We also plan to add a
six-axis inertial navigation device to provide close initial
estimates.
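In sketch form, the occupied-cell-list comparison for one trial pose looks like the following; the evidence-sum score and the yaw-only rotation are simplifying assumptions, and the symmetric term using the inverse transform would be computed the same way.

```cpp
// Sketch of occupied-cell-list registration: score one trial displacement and
// rotation by mapping the occupied cells of grid A into grid B and summing the
// evidence found there.  The evidence-sum score and yaw-only rotation are
// simplifying assumptions; the complementary B-into-A term uses the inverse
// transform in the same way.
#include <cmath>
#include <vector>

struct OccupiedCell { float x, y, z; };              // coordinates of an occupied cell

struct EvidenceGrid {                                // minimal lookup-only grid
    int nx, ny, nz;
    float cell;
    std::vector<float> logodds;
    float at(float x, float y, float z) const {      // 0 (unknown) outside the grid
        int i = int(x / cell), j = int(y / cell), k = int(z / cell);
        if (i < 0 || j < 0 || k < 0 || i >= nx || j >= ny || k >= nz) return 0.0f;
        return logodds[(k * ny + j) * nx + i];
    }
};

struct Pose { float dx, dy, dz, yaw; };              // trial displacement and heading change

float matchScore(const std::vector<OccupiedCell>& occupiedA,
                 const EvidenceGrid& gridB, const Pose& t) {
    float c = std::cos(t.yaw), s = std::sin(t.yaw);
    float score = 0.0f;
    for (const OccupiedCell& p : occupiedA) {
        // Rotate about the vertical axis, then translate, into grid B's frame.
        float x = c * p.x - s * p.y + t.dx;
        float y = s * p.x + c * p.y + t.dy;
        score += gridB.at(x, y, p.z + t.dz);         // large when A's occupied cells land
    }                                                // on cells B also believes occupied
    return score;
}
// A registration search would evaluate matchScore over a coarse-to-fine set of
// candidate poses around the dead-reckoned estimate and keep the best one.
```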
Replacing monochrome binocular with color trinocular
stereoscopy. Doing so will give an error-reducing redundancy to our
range measurements, and allow us to detect purely horizontal features.
Horizontal edges were lost to matching ambiguity in the original
setup, which had two horizontally-displaced cameras. We also plan to
illuminate the surroundings with textured infrared light, creating
matchable features on uniformly colored surfaces, and to add several
camera triplets to widen our field of view, probably to
360°.
Using a pyramid of image resolutions in the stereoscopic
disparity search, allowing it to proceed in equal range steps.
Traditionally, stereo searches have been in constant resolution pixel
steps, but by stepping through the near field in excessive detail the
traditional approach expends excessive effort and generates
unnecessarily many false-positive matches at close range, where
they cause the most trouble. The latter is especially a problem when
an occlusion hides the correct match. Variable resolution matching
will be both faster and better, because unavoidable matching errors
will be spread uniformly over distance, rather than being concentrated
in the robot's immediate obstacle zone.
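The following sketch illustrates the equal-range-step idea for a parallel-camera geometry, where disparity d = f*b/Z: each fixed range step is converted to the disparity span it covers, and a pyramid level is chosen so that the step is roughly one pixel at the resolution actually searched. The focal length, baseline and step size are illustrative assumptions.

```cpp
// Sketch of planning the disparity search in equal range steps rather than
// equal pixel steps.  For parallel cameras, disparity d = f*b / Z, so equal
// steps in range Z span many pixels nearby and fractions of a pixel far away;
// picking a coarser pyramid level nearby keeps each step near one pixel at
// the resolution actually searched.  The camera numbers below are assumptions.
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double f = 400.0;                            // focal length in pixels (full res)
    const double b = 0.12;                              // stereo baseline in meters
    const double zMin = 0.5, zMax = 5.0, zStep = 0.25;  // equal range steps

    for (double z = zMin; z + zStep <= zMax + 1e-9; z += zStep) {
        double d      = f * b / z;                      // disparity at this range
        double dNext  = f * b / (z + zStep);
        double dDelta = d - dNext;                      // pixels spanned by one range step
        // Pyramid level (halving resolution per level) at which this range step
        // corresponds to roughly one pixel of disparity.
        int level = std::max(0, int(std::floor(std::log2(std::max(dDelta, 1.0)))));
        std::printf("Z = %4.2f m  d = %6.1f px  step = %5.2f px  pyramid level %d\n",
                    z, d, dDelta, level);
    }
    return 0;
}
```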
Higher resolution grids, which we have observed to improve
performance. With careful algorithm choice, the main cost of modest
resolution increases is in required memory.
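A quick calculation shows why memory dominates this cost: halving the cell size over the same volume multiplies the cell count, and hence the memory, by eight. The bytes-per-cell figure below is an assumption.

```cpp
// Back-of-the-envelope memory cost of raising grid resolution: halving the
// cell size over the same volume multiplies the cell count (and the memory)
// by eight.  The bytes-per-cell figure is an assumption.
#include <cstdio>

int main() {
    const long nx = 256, ny = 256, nz = 64;          // current grid dimensions
    const long bytesPerCell = 2;                     // assumed evidence encoding
    for (int halvings = 0; halvings <= 2; ++halvings) {
        long cells = (nx << halvings) * (ny << halvings) * (nz << halvings);
        std::printf("cell size / %d : %9ld cells, about %4ld megabytes\n",
                    1 << halvings, cells, (cells * bytesPerCell) >> 20);
    }
    return 0;
}
```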
Adding data from other sensors. Most of our robots carry sonar
and contact transducers, and the grid representation allows evidence
from all sources to be combined.
Many subtle improvements in matching functions, search
strategies, data representations, etc. For instance, we have recently
discovered that cross entropy has better matching properties than
the similar log-probability formula we had been using.
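For concreteness, the sketch below evaluates one generic textbook form of each per-cell measure on a few probability pairs; the exact formulas in our programs may differ.

```cpp
// Sketch comparing two per-cell match measures for occupancy probabilities
// p (from one grid) and q (from the other): a cross-entropy measure and a
// log-probability-of-agreement measure.  These are generic textbook forms;
// the exact formulas in our programs may differ.
#include <cmath>
#include <cstdio>

// Cross entropy of q relative to p, negated so that larger means better match.
double crossEntropyScore(double p, double q) {
    return p * std::log(q) + (1.0 - p) * std::log(1.0 - q);
}

// Log probability that the two cells agree (both occupied or both empty).
double logAgreementScore(double p, double q) {
    return std::log(p * q + (1.0 - p) * (1.0 - q));
}

int main() {
    // A whole-grid match score would sum one of these over all overlapping cells.
    const double pairs[][2] = { {0.9, 0.9}, {0.9, 0.5}, {0.9, 0.1}, {0.5, 0.5} };
    for (const auto& pq : pairs)
        std::printf("p = %.1f  q = %.1f   crossEntropy = %7.3f   logAgreement = %7.3f\n",
                    pq[0], pq[1], crossEntropyScore(pq[0], pq[1]),
                    logAgreementScore(pq[0], pq[1]));
    return 0;
}
```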
Year 2 Agenda: Recognition Layer
In the second year we will build a layer of recognition software that
uses the data collected in the grid to navigate and locate important
world features. In this group will be procedures to:
Plan and execute obstacle-avoiding paths in grid spaces. Grids
are very good for this purpose, because they provide positive
indications of empty regions. We have already written an A* path
planner, which works, but is several times too slow. The following
image is an example of a path from this program operating in a
robot-height slab of the office grid shown in the first illustration.
A coarse-to-fine strategy or other approximations should allow us to
reach the needed speed.
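A minimal sketch of such a planner, under illustrative assumptions (4-connected moves, a fixed occupancy threshold, no coarse-to-fine speedup), is as follows.

```cpp
// Minimal sketch of A* path planning over a robot-height 2D slab of the grid:
// cells whose evidence exceeds a threshold are treated as obstacles, and
// 4-connected moves are searched with a straight-line-distance heuristic.
// The connectivity, threshold and lack of coarse-to-fine speedup are
// simplifications for this sketch.
#include <cmath>
#include <queue>
#include <vector>

struct Node { float f; int idx; };                   // priority-queue entry
struct Worse { bool operator()(const Node& a, const Node& b) const { return a.f > b.f; } };

// Fills 'parent'; a path is recovered by walking parent[] back from the goal.
bool planPath(const std::vector<float>& slab, int w, int h, float occupiedThreshold,
              int start, int goal, std::vector<int>& parent) {
    std::vector<float> cost(w * h, 1e30f);           // best cost-so-far per cell
    parent.assign(w * h, -1);
    auto heuristic = [&](int i) {                    // straight-line distance to goal
        float dx = float(i % w - goal % w), dy = float(i / w - goal / w);
        return std::sqrt(dx * dx + dy * dy);
    };
    std::priority_queue<Node, std::vector<Node>, Worse> open;
    cost[start] = 0.0f;
    open.push({heuristic(start), start});
    const int stepX[4] = {1, -1, 0, 0}, stepY[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        int cur = open.top().idx;
        open.pop();
        if (cur == goal) return true;
        for (int k = 0; k < 4; ++k) {
            int x = cur % w + stepX[k], y = cur / w + stepY[k];
            if (x < 0 || y < 0 || x >= w || y >= h) continue;
            int nxt = y * w + x;
            if (slab[nxt] > occupiedThreshold) continue;      // cell believed occupied
            float tentative = cost[cur] + 1.0f;
            if (tentative < cost[nxt]) {
                cost[nxt] = tentative;
                parent[nxt] = cur;
                open.push({tentative + heuristic(nxt), nxt});
            }
        }
    }
    return false;                                    // no free path found
}
```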

Store away and retrieve 3D map snapshots of the surroundings in
a compressed form, and to volumetrically register a current map
against a stored map. This technique can be orchestrated into a
route-learning program, in which a trail of snapshots is stored on a
training run, against which to position the robot when it travels
autonomously. Differencing of maps registered by large-scale features
could be used to locate smaller objects that have moved.
Computational efficiency is a major concern.
Extract geometric architectural features, for instance to
identify open floor area, walls, doors, corridors and rooms.
Match smaller scale volumes, for identifying furniture-scale
objects, including people, using techniques probably similar to map
matching but optimized for smaller volumes and articulated
parts.
Build long-range topological maps by piecing together local
geometric grids with approximate transforms.
Generate paths in topological maps that minimize travel, sweep
the ground, or provide best vantage points, as needed by anticipated
applications.
Year 3 Agenda: Application Layer (Extension
Option)
The first two years of the project are naturally capped by a third
during which we would tune up and orchestrate basic perception and
identification abilities into complete useful applications. Since the
BAA solicited a 24 month effort, we propose this third year as an
optional extension.
In the third year we would construct several demonstration
applications in a third layer of code. The idea is to use the
powerful lower layers to create task-specific but location-independent
programs. For instance, a point-to-point delivery program will be
constructed to learn any route by saving a trail of map snapshots in a
training run. A floor cleaning program will identify the boundaries
and obstacles in any environment, and plan a covering route. A
security program may be built to explore a network and plan a
randomized route that leaves no area unvisited for very long. These
programs may be quite straightforward themselves, but their success
depends on the reliability of the 3D maps produced by the perception
layer, and on the identifications extracted by the identification layer. It is
certain that extended application trials will uncover subtle
weaknesses in the lower layers, and we expect to spend much time in
the third year dealing with such foundational issues.
Testbeds
Our robot-controlling computing platforms should be updated on an
approximately annual basis, to take advantage of increasing available
performance. We presently are using a dual-processor Pentium-II Dell
PC that provides about 700 MIPS, running a realtime operating system
from Be Inc. By the third-year period we expect to have
about 3,000 MIPS available, and may have changed operating systems
more than once. Our robot code is written in C and C++ and is quite
portable, and has made transitions between systems several times. We
plan to have more computing power than we expect in a final commercial
product to allow us to quickly test and perhaps reject new ideas
without the onus of immediately optimizing every implementation for
speed.
For mobile robot platforms, we are presently using the Uranus mobile
robot, which we constructed ten years ago, and also retain the option
of using our even older machine, Neptune. Uranus is aging,
slow-moving, short on battery stamina, often unreliable, and uses
difficult-to-maintain obsolete components. Neptune is simpler and
more reliable, but tethered and thus usable only for very short-range
experiments. To avoid expending excessive effort on robot
maintenance, we think it is necessary for the success of the first
year's work to upgrade to a modern, smaller yet more capable and
reliable robot platform.

Test Robots: The Uranus mobile robot, left, with
trinocular cameras and sonar ring. The Neptune robot is seen at the
right.
In the first two years, an ideal platform would be a robot just
big enough to carry cameras, an inertial system, a computer for
communications and for sensor and effector control, and wireless ethernet. We
could then use an offboard large desktop machine for the heavy
computation and display, shipping digital images and other data over
the ethernet link. The robot would be small enough to extricate by
hand when it gets into trouble. One suitable machine is the Nomadics
Scout II, weighing 25 kg and costing about $5,000. Others, slightly
less attractive, are the RWI Mach 5 and microATRV costing about $7,000
each:

Nomadics Scout II
RWI Mach 5
RWI microATRV
Application software in the third year could be most effectively
developed and tested using actual industrial vehicles. The cost of
commercial machines is typically over $100,000, and beyond the
scale of our current project. We would make an effort to gain use of
machines from interested parties. To this end, we have initiated some
preliminary discussions, but have not yet secured any commitments.
Having the proposed research contract in place would benefit such
negotiations. One contact is John Holland of Cybermotion, Inc, who
has expressed interest in collaborating with our project. Cybermotion
would be able to provide a very capable and sturdy
Navmaster robot platform, worth about $100K. Over 100 of
these platforms are patrolling as autonomous/remote security guards,
using site-specific programming. We have also had discussions with
Karcher, Inc, a large German maker of industrial cleaning machines.
Karcher currently makes a radar-guided robotic floor scrubber that is
in use at the Bonn airport, and could probably provide similar
machines for experiments. They are interested in expanding their
robotics efforts. Makers of factory AGVs (Automatically Guided
Vehicles) are also candidates for collaboration. In the unlikely
event we cannot gain use of any industrial machines, we would use our
research robots to simulate utility applications. This would not be
as satisfactory, since the endurance and basic capability of the
research machines is limited, and many practical questions would
remain.
More extensive presentations of the key technical details may be
found in reference [A3], included with this submission.
C. Deliverables (2)
We enter the research unencumbered by proprietary claims. All our
work to date has been done openly. Though many of our starting ideas
have lapsed into the public domain, many others are not yet fully
developed, and await first implementation. We feel it would help
secure future investment for bringing our results into the world if we
were to apply for patents on some of the planned research work. Our
department would cover the expense of such applications, in return for
half interest.
It is likely that any industrial partners we enlist to provide robots or
other assistance will require some interest in the work in return. We
would offer such interest in planned research results on a strictly
non-exclusive basis. All results from the three year effort would be
available to the funding agency. We may, of course, extend the work
after the contract ends in a proprietary setting.
Our deliverable results will be:
At the end of year 1 - Perception Package:
A 3D perception package that processes textured-lighting-enhanced
trinocular stereoscopic images at the rate of at least one set per
second using 1,000 MIPS of processing power. It may also digest data
from other sensors, such as odometry, 6-axis inertial, sonar,
proximity and contact. It will extract at least several thousand
depth values from each stereoscopic image set, and process them into
rays of evidence of spatial occupancy that are accumulated in 3D
evidence grids with a resolution of about 2 cm, to a range of about
5 meters and a height of about 2 meters, containing 4 million cells or
more. The package will also include a camera calibration program and
a learning procedure that tunes evidence ray models and other
system parameters to optimize a comparison of scene images with
corresponding synthetic images of grid reconstructions. The program
will perform substantially better than our 1996 program [A3].
The package will be demonstrated on a small test robot platform that
moves to collect views from multiple vantage points, linked by
wireless ethernet to a large desk computer with at least 1,000 MIPS,
which will display images, 3D and 2D maps and navigational
information. Because of image digitization and communication delays
and computational limits, it is likely the robot will work in a
start/stop mode, collecting images, then pondering them before moving
again. The pauses should be on the order of seconds.
We will provide the latest version of the perception software and a
complete report and documentation for it at the end of the first year.
We will also make them available on our web pages, as we have our work
in the past. Some aspects of the new work may have patents applied for by
that time.
At the end of year 2 - Identification Package:
An additional set of identification routines that extract from the
3D maps produced by the first layer the following: navigable paths,
localization by matching maps, major architectural features, a few
furniture-scale objects and people, long-range topological maps and
paths in such maps. The combination will be demonstrated using the
small robot system of year 1, but with the desk computer upgraded or
replaced to provide at least 1,500 to 2,000 MIPS, and perhaps with the
robot augmented with additional sensors we find helpful. With the
identification layer, the robot will be able to navigate around a
network of corridors and offices and build a sparse labeled map, and
can be commanded to go to specified locations. Since the code will be
new and little-tested, these demonstrations are unlikely to be as
reliable as required for actual applications. Also, the robot will
probably work in a start/stop mode, collecting images, transmitting
and pondering them between moves.
We will provide the latest version of the two software layers and a
complete report and documentation for them at the end of the second
year. We will also make them available on our web pages, as we have
our work in the past.
(Optional Third Year)
At the end of year 3 - Application Packages:
We will produce several specialized programs as if for commercial
mobile robots that use the perception and identification layers to
orchestrate simulated commercial robot tasks, including point to point
delivery, floor traverse in a cleaning pattern and area patrol as for
a security robot. These will be demonstrated using our test robot,
with the offboard processor upgraded or replaced to have at least
3,000 MIPS. The application programs will allow the robot to be
quickly trained to do its job in new locations, either by
leading the robot itself along a path or the boundaries of its work
area, or by outlining those on a map it constructs and displays. It
would be desirable in this third year to conduct more realistic tests
of the system using one or more industrial vehicles. Possibly the
funding agency would be able to provide such. In any case, we would
attempt to find an industrial partner who could both supply vehicles
and perhaps provide first steps toward commercialization.
We will provide the latest version of all three software layers and a
complete report and documentation for them at the end of the contract.
We will also make them available on our web pages, as we have our work
in the past.
D. Statement of Work (3)
In Year 1
We will assemble an experimental robot system consisting of a small
robot with an onboard portable computer for communications. The robot
will carry at least three digital cameras and a six-axis inertial
system linked to the onboard computer. Also linked will be the
robot's control channel, and any sensors that come with the basic
robot, for instance sonar units. The machine will also be equipped
with a wireless ethernet transceiver. Offboard, connected to the
other side of the wireless ethernet, we will set up a high-end
personal-computer-class machine dedicated to robot control. The make,
options and software for this machine will be determined at time of
purchase to obtain at least 1,000 MIPS and sufficient memory and disk,
within our budget. We presently are using a dual-processor Dell
Workstation 400 that provides about 700 MIPS, running the BeOS 3 operating
system, in this mode, with a wired connection to our obsolete Uranus
mobile robot.
With the hardware described, we will complete and improve an existing
preliminary 3D perception package that processes
textured-lighting-enhanced trinocular stereoscopic images at the rate
of at least one set per second. It may also digest data from other
available sensors, such as odometry, 6-axis inertial, sonar, proximity
and contact. It will extract at least several thousand depth values
from each stereoscopic image set, and process them into rays of
evidence of spatial occupancy that are accumulated in 3D evidence
grids with a resolution of about 2 cm, to a range of about 5 meters
and a height of about 2 meters, containing 4 million cells or more.
The package will also include a camera calibration program and a
learning procedure that tunes evidence ray models and other system
parameters to optimize a comparison of scene images with corresponding
synthetic images of grid reconstructions. The program will perform
substantially better than our 1996 program [A3]. The package
will be demonstrated on a small test robot platform that moves to
collect views from multiple vantage points, linked by wireless
ethernet to a large desk computer with at least 1,000 MIPS, which will
display images, 3D and 2D maps and navigational information.
In Year 2
We will write an additional set of identification routines that
extract from the 3D maps produced by the first layer the following:
navigable paths, localization by matching maps, major architectural
features, a few furniture-scale objects and people, long-range
topological maps and paths in such maps. The combination will be
demonstrated using the small robot system of year 1, but with the desk
computer upgraded or replaced to provide at least 1,500 to 2,000 MIPS,
and perhaps with the robot augmented with additional sensors we find
helpful. With the identification layer, the robot will be able to
navigate around a network of corridors and offices and build a sparse
labeled map, and can be commanded to go to specified locations. Since
the code will be new and little-tested, these demonstrations are
unlikely to be as reliable as required for actual
applications.
In Year 3 (Optional Year)
We will write at least three specialized programs as if for
commercial mobile robots that use the perception and identification
layers to orchestrate simulated commercial robot tasks, including
point to point delivery, floor traverse in a cleaning pattern and area
patrol as for a security robot. These will be demonstrated using our
test robot, with the offboard processor upgraded or replaced to have
at least 3,000 MIPS. The application programs will allow the robot to
be quickly trained to do its job in new locations, either
by leading the robot itself along a path or the boundaries of its work
area, or by outlining those on a map it constructs and displays. It
would be desirable in this third year to conduct more realistic tests
of the system using one or more industrial vehicles. Possibly the
funding agency would be able to provide such. In any case, we would
attempt to find an industrial partner who could both supply vehicles
and perhaps provide first steps toward
commercialization.
E. Schedule of Milestones (1)
At the end of year 1 - Perception Package
A 3D perception package that processes textured-lighting-enhanced
trinocular stereoscopic images at the rate of at least one set per
second using 1,000 MIPS of processing power. The program will perform
substantially better than our 1996 program [A3]. The package
will be demonstrated on a small test robot platform that moves to
collect views from multiple vantage points, linked by wireless
ethernet to a large desk computer with at least 1,000 MIPS, which will
display images, 3D and 2D maps and navigational information.
At the end of year 2 - Identification Package
An additional set of identification routines that extract from the
3D maps produced by the first layer the following: navigable paths,
localization by matching maps, major architectural features, a few
furniture-scale objects and people, long-range topological maps and
paths in such maps. The combination will be demonstrated using the
small robot system of year 1, but with the desk computer upgraded or
replaced to provide at least 1,500 to 2,000 MIPS, and perhaps with the
robot augmented with additional sensors we find helpful. With the
identification layer, the robot will be able to navigate around a
network of corridors and offices and build a sparse labeled map, and
can be commanded to go to specified locations. Since the code will be
new and little-tested, these demonstrations are unlikely to be as
reliable as required for actual applications.
(Optional Third Year)
At the end of year 3 - Application Packages
At least three specialized programs as if for commercial mobile
robots that use the perception and identification layers to
orchestrate simulated commercial robot tasks, including point to point
delivery, floor traverse in a cleaning pattern and area patrol as for
a security robot. These will be demonstrated using our test robot,
with the offboard processor upgraded or replaced to have at least
3,000 MIPS. The application programs will allow the robot to be
quickly trained to do its job in new locations, either by
leading the robot itself along a path or the boundaries of its work
area, or by outlining those on a map it constructs and displays. We
may also port the program to an industrial robot, if such is made
available.
F. Technology Transfer (2)
We target our research towards commercial realization of free-ranging
and self-installing mobile robots. We chose 3D grid mapping because
it promised commercially viable reliability, use video camera sensors
for their combination of high data density and low cost, and target
one second map updates to keep up with practical indoor robot speeds.
As the proposed round of research nears completion, we will seek
industrial partners or private financing to develop marketable
products.
A projected first commercial product from this effort is a
basketball-sized navigation head to be retrofitted on
existing robots, providing them with full autonomy (8-faced head,
left; potential customer, right):

The navigation head would contain stereoscopic cameras covering 360
degrees, other sensors, an inexpensive inertial navigation system
to inform it about small motions without depending on accurate
odometry from robot wheels, 1,000 MIPS of computational power,
software for basic navigation, software hooks for application-specific
programming, and hardware interfaces for vehicle controls,
probably through a standard connector installed into the
vehicles. Offered as a retrofit for existing factory transport
machines, self-powered floor cleaners and others, it could quite
possibly expand the market for robotic vehicles tenfold from the
current tens of thousands, and might reach millions if it could be
programmed to be a safety assist or autonomous controller for
forklifts.
After some years of development, we imagine the technique's
costs will fall and its performance rise enough to permit the design
of truly mass-market products. A possible first product with
mass-market potential might be a small robot vacuum cleaner, which can
reliably and systematically keep designated rooms in a home clean
following a simple introduction to the location, keeping itself
charged, and delivering dust to a base collecting station.

The simple vacuum cleaner may be followed by larger and smarter
utility robots with dusting arms. Mobile robots with dexterous arms,
vision and touch sensors will be able to do various tasks. With
proper programming and minor accessories, such machines could pick up
clutter, retrieve and deliver articles, take inventory, guard homes,
open doors, mow lawns or play games. New applications will expand the
market and spur further advancements, when existing robots fall short
in acuity, precision, strength, reach, dexterity or processing
power. Capability, numbers sold, engineering and manufacturing
quality, and cost effectiveness should increase in a mutually
reinforcing spiral. Ultimately the process could result in
universal robots which can do many different tasks,
orchestrated by application-specific programs.

G. Comparison with other Research (3)
Self-guiding mobile robots for transport, cleaning, and inspection
have existed for decades, but only in locations that warranted
extensive installation cost and tolerated inflexible behavior. A few
tens of thousands of AGVs (Automatically Guided Vehicles) are at work
in factories, warehouses and other institutional locations. Many are
guided by buried signal-emitting wires defining their routes, with
switches signaling endpoints and collisions, a technique developed in
the 1960s. More advanced models, made possible by the advent of
microprocessors in the 1980s, use more flexible cues, such as optical
markings on a tiled floor. The latter may use ultrasonics and infrared
proximity sensors to detect and possibly negotiate their way around
obstacles. The most advanced machines, manufactured in the late 1980s
and 1990s, are guided by navigational markers, such as retroreflective
targets or active emitters at strategic locations, and by specialized
site-specific programming exploiting premeasured existing features,
like walls and known distances.
The newer systems can be installed with less physical effort, but all
require the services of a specialist to program the initial setup, and
for layout and route changes. The expense, inconvenience,
inflexibility and lack of independence stemming from the elaborate
setup greatly reduce the potential market for these systems. Only
very stable and high-value routes are candidates, and the high cost
and rigid behavior usually puts them at an economic disadvantage to
human-guided vehicles. Fully autonomous robots, which could be simply
shown or led through their paces by nonspecialists, then trusted to
execute their tasks reliably, would have a far greater market.
A generation of fully autonomous mobile robots, which navigate
without route preparation, has emerged in research laboratories
worldwide in the 1990s, made possible by more powerful
microprocessors. The majority of these use sonar or laser rangefinders
or very specialized image processing to build coarse two-dimensional
maps of their surroundings, within which they locate themselves
and plan paths between destinations. Some of
the best such robots, winners of various competitions, are described
in reference [A0]. Presented are both fully autonomous
systems, and ones that require some human installation. The limited
information in horizontal maps allows for certain ambiguities, and
when navigating in new areas the fully autonomous machines have a mean
time between difficulties (getting lost, stuck, or even falling)
measured in hours. Previous generations of commercial robots were not
accepted by customers until the between-failure time exceeded several
months. It is our judgment, based on our own experiences and from
observing other projects, that two-dimensional autonomous navigation
is unlikely ever to meet this standard, but that 3D perceiving
systems, with about 1,000 times the world information, probably
can.
Some projects have used high-quality data from scanning laser
rangefinders to map rough outdoor terrain. These 2 1/2 D
approaches are not suitable indoors, where many flat surfaces act as
mirrors, and where complex clutter simply cannot be modeled by a
terrain elevation map. Also, these laser techniques, which process
small numbers of range measurements for each elevation point, fail
when a high percentage of range measurements are erroneous, as is
typical indoors.
A few attempts to make 3D maps in the past simplified the problem by
reducing the data per cell to a single bit, suitable for planning
paths, but useless for statistically merging noisy measurements over
long periods. These worked only in very limited circumstances, when the
inflowing data was of very high quality. Other attempts tried to reduce
the memory required by structuring the data in oct-trees
that merge cells with similar values. We know of none of the latter
that was carried to completion: a few experiments quickly convinced
their authors that the computing cost, very high at best, became prohibitive.
Our own mobile robot research, ongoing
since the 1970s, has investigated sparse three-dimensional models and
dense two-dimensional grid maps, using camera, sonar and laser
sensors. In the early 1990s we guessed that practical indoor
navigation based on two-dimensional maps could be accomplished given
about 10 MIPS, but that 3D grid mapping would require about 10,000
MIPS and 100 megabytes of memory, both unavailable in the 1 MIPS
mainframe computers that we had been using. We attempted,
nevertheless, to gain some experience with 3D by spending our 1992
sabbatical year at Thinking Machines Corp., planning to use their CM5
multiprocessor supercomputer to do experiments. After a few trials,
we stumbled on approximations and economies of scale in 3D grids that
allowed us to produce a uniprocessor program that was about 100 times
faster than our extrapolations from efficient 2D programs had led us
to expect. This, combined with 20 MIPS that had become available in
computer workstations, put us on track to a practical 3D grid robot
controller in the foreseeable future. We added a perceptual front end
to the basic measurement-accumulating 3D package during another
sabbatical in 1996 at Daimler-Benz in Berlin, and almost immediately
obtained heartening results.
To our knowledge, our 3D grid programs are
the only effective package of this kind now available. We have been
pushing the limits, which has been possible partly because we did not
feel pressured to mount real-time robot demonstrations, and have been
able to work with programs that were somewhat impractical because they
were computationally too demanding. But the time has come for the
biggest push, which we think will bring us to practicality in just a
few more years. The proposed support would enable us to mount the
extended single-minded effort required to deliver useful free-ranging,
spatially-aware robot controllers just as the requisite computing
power arrives.
H. List of Key Personnel (3)
Hans P. Moravec
The PI is Hans Moravec, a principal research scientist with the
Carnegie Mellon Robotics Institute. He plans to apply his undivided
time and effort to the proposed work for the next several years. He
was able work with such singleness of purpose during his sabbatical
years of 1992 and 1996, but unrelated contracts and other commitments
distracted in other recent years. In preparation for the proposed
work he has finished all prior commitments, and refrained from making
others.
Moravec did his PhD work at Stanford from 1971 to 1979 working with
the Stanford Cart, which became the first computer-controlled robot to
visually map and drive its way through natural obstacle fields in
1979. In 1980 he founded Carnegie Mellon's Mobile Robot
Laboratory, where he and a student, Alberto Elfes, first implemented
the idea of spatial occupancy evidence grids (in 2D) in 1983, which
proved a very robust way to combine blurry sensor data into sharp
maps, and has since been adopted by a majority of research mobile
robot projects. In the latter 1980s he devised several learning
techniques for optimizing the sensor models used to add data to grids.
In the 1990s he began serious efforts to implement grids in 3D, and
during a 1992 sabbatical, planned for supercomputer access, instead
produced an unexpectedly fast 3D grid program that ran on
workstations. He and a student, Martin C. Martin, made some modest
experiments with feeding the program stereoscopic range data between
other commitments from 1993-1995. In 1996 he undertook another
concentrated sabbatical effort, this time at a robot research facility
in Berlin maintained by Daimler-Benz AG, and produced a carefully
constructed program that reconstructed maps of stereoscopically-viewed
scenes that were rather clearly good enough to be used for navigation
and other purposes. In the years following 1996 he was again distracted by
unrelated contracts. He now feels the need for a directed effort as
concentrated as that during his sabbaticals, but extended to three
years.
Besides key concepts and research skills, Moravec brings to the
project some industrial experience. From 1983 to 1997 he was a
consultant and founding director of Denning Mobile Robotics Inc., an
ultimately unsuccessful company that nevertheless produced and sold
some useful robots, for security, warehouse transport, TV news-studio
camera motion, floor cleaning and research. He has maintained
contacts with other robot entrepreneurs, for instance John Holland of
Cybermotion, Inc., a company with similar interests and products that
is still operating. Of course, he looks forward to robot companies
that thrive and grow, rather than merely survive.
Moravec's full CV can be found at
/users/hpm/hpm.cv.html
Martin C. Martin
Martin joined the Mobile Robot Laboratory and became a PhD student
in 1993, having already done noted undergraduate work in vision-based
robotics at the University of Toronto. He quickly proved his skills
by modifying a sensor model learning program to efficiently use a
conjugate-gradient hill climbing strategy and learn 100 times faster
than the original implementation. He also put together a first front
end for the 3D grid program, and restored and conducted navigation
experiments with the Lab's aging Uranus mobile robot, as well as
being a valuable participant in a half-dozen other robotics efforts
not directly related to this proposal. Recently, towards completing
his thesis, Martin did all the planning, hardware, firmware, systems
work and high-level programming necessary to assemble the system
described elsewhere in this proposal, which uses an off-board
dual-processor Dell computer to control the Uranus robot. His system
serves as a prototype model for the testbeds we propose.
Martin's thesis topic, to automatically evolve effective high
level organizations to integrate lower-level sensing and control
techniques into maximally-effective overall robot controllers, is
tangential to the exact focus of this proposal, but his participation
has been an essential part of the work so far, and will continue to be
so until he graduates. He is likely to graduate within the contract
period, in which case another student or perhaps additional fractional
effort from regular engineering employees will be needed to fill the
void.
Martin's full CV is at
/users/mcm/cv
25% Engineering Support
In addition to the full-time participants, there is often a need for
specialized mechanical, electronic or software expertise in our work.
It is time-consuming, inefficient and distracting for the researchers
to always become skilled in every relevant facet of building and
keeping the robot system operational, and we often feel the need to
contract out some of the necessary work. The quality of the work is
usually superior when we do. Our work volume does not warrant a
full-time person, but we easily generate several month-long projects a
year, for things like building interface hardware and writing device
drivers, building cables and test stands, interfacing to university
networks, managing software upgrades and the like. Fortunately there
are many skilled individuals in the university and nearby whose
assistance we can contract as necessary.
I. Description of Facilities (1)
Carnegie Mellon University and its School of Computer Science
maintain a world-class networked computer facility, which we will use
as a matter of course.
The Robotics Institute has not only computers, but extensive
electronic and mechanical shop facilities, which we are free to use as
needed. There are at least a dozen other mobile robot projects within
the institute with whom we can share advice and sometimes
personnel.
The Mobile Robot Laboratory maintains a small mechanical, electronic
and computer workshop, and its own stable of aging robots. For daily
work, we have several Sun, Silicon Graphics, Apple and Intel-based
computers, and adequate office budget to keep them maintained and
upgraded.
By the time this contract would be in effect, the School of Computer
Science will have completed construction of a large new building. Our
test robot would be located mainly on a third floor of this building,
a floor that has 40,000 square feet, 80 offices and labs, a huge
two-storey common area with an atrium, wide walkway access to an
outside parking lot should we wish to conduct outdoor experiments, and
also elevators to other floors, should we wish to do multi-floor
experiments and explore even more varied indoor environments,
including catwalks, machine shops, and large-robot test facilities.
At times there will be many people milling about, and we feel this is
an essential part of our test environment. Using a small robot will
minimize any danger to people in these tests.
J. Cost by Task
(1)
The proposed research involves the full time
effort of a principal investigator and a student, and 1/4 time access
to engineering talent. The first year's budget has equipment
money to cover the cost of acquiring, interfacing and making
operational a small mobile robot system with a small onboard portable
computer and a fast wireless link to a high-end deskbound
robot-controlling computer. Also required in the first year are a
minimum of three digital cameras that can be mounted on the robot and
interfaced to the onboard computer. The equipment budget in
subsequent years is for upgrading this system as faster, or otherwise
improved, components become available. The deskbound computer is the
main focus, but faster cameras and a faster communication link will also be
necessary to transition from start/stop to continuous robot
motion.
SECTION III
Bibliography
A0: Kortenkamp, D., Bonasso, R. and Murphy, R.,
Artificial Intelligence and Mobile Robots: Case Studies of
Successful Robot Systems, AAAI Press, MIT Press, 1998.
A1: Moravec, H., The Stanford Cart and The CMU
Rover, Proceedings of the IEEE, July 1983.
(COPY INCLUDED)
/users/hpm/project.archive/robot.papers/1983/ieee83.mss
A2: Martin, M. and Moravec, H., Robot Evidence
Grids, CMU Robotics Institute Technical Report CMU-RI-TR-96-06,
March 1996.
(COPY INCLUDED)
/users/hpm/project.archive/robot.papers/1996/RobotEvidenceGrids.abs.html
A3: Moravec, H., Robot Spatial Perception by
Stereoscopic Vision and 3D Evidence Grids, CMU Robotics Institute
Technical Report CMU-RI-TR-96-34, September 1996.
(COPY INCLUDED)
/users/hpm/project.archive/robot.papers/1996/9609.stereo.paper/SGabstract.html
A4: Moravec, H., Autonomous Free-Ranging Utility
Robots for Mass Market before 2005, January 1998.
/users/hpm/project.archive/robot.papers/1998/busplan.overview.html
A5: Moravec, H., Curriculum Vita
/users/hpm/hpm.cv.html