To See the World

Hans Moravec
October 2004

In the 1970s computer vision was a fantasy: even million-dollar research computers failed more often than not at toy tasks. In the 1980s very specialized industrial vision applications for fixed locations, running on minicomputers, began to appear, and in the 1990s they became better, cheaper and more common. But only in the 2000s is vision for the wildly varying scenery of mobile robots becoming possible, on computers a thousand times as powerful as the mainframes of the 1970s at a thousandth of the cost. Fantasies are becoming realities, and soon industries.

On February 20, 2003 Scott Friedman and I founded SEEGRID Corporation to commercialize the fruits of thirty years of work in robot perception and navigation.

After decades of commercial stagnation, robotics seems to be at a turning point. A half dozen companies have introduced small domestic robot vacuum cleaners, with sufficient market success to fuel the development of more advanced follow-ons. Hundreds of thousands of Sony's advanced AIBO robot pets have been sold despite a price of over $1000, enough to sustain the development of more advanced models, including humanoid robots. Meanwhile makers of industrial automatic guided vehicles and of industrial floor cleaning machines have begun to offer units that navigate using an onboard scanning laser rangefinder to construct floor-level maps of their routes. Other companies have demonstrated vehicles that determine their position, with limited precision, by opportunistically tracking distinctive visual features in the fields of view of onboard cameras.

Our work leapfrogs these early offerings by building dense realistic 3D maps of a vehicle's surroundings, suitable not only for simple position and obstacle finding, but also for safely exploring new routes and for recognizing large features such as walls, floors and doors as well as smaller objects including humans. Our methods can digest data from many different kinds of sensors, but in the near term simple cameras used in stereoscopic pairs are the most cost effective. We are ready for commercialization now in part because the considerable computational power required has arrived in inexpensive personal computers, and the cost continues to fall rapidly. Even so, our approach would not be feasible for many years yet without the large number of technical innovations we've accumulated in decades of research aimed squarely at this goal.

1980: Slow, risky 3D navigation (1 MIPS)

A mainframe PDP-10 computer allowed the Stanford "Cart" mobile robot to interpret stereo images from a sliding camera into sparse 3D maps of about 100 visually distinctive features, which it used to track its own motion and to identify and avoid obstacles in its path. It paused to glimpse and compute for about 10 minutes per meter of travel, and became navigationally confused about every 20 meters. The controlling program was developed from 1974 to 1979.
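
To make the idea of turning stereo images into 3D feature positions concrete, here is a minimal sketch of textbook two-view triangulation. It is not the Cart's program. It assumes an idealized pinhole camera moved sideways by a known baseline, with pixel coordinates measured from the image center; the function names and parameters are hypothetical.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a feature seen from two parallel camera positions.

    Assumes an idealized pinhole camera: depth = focal length * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the camera")
    return focal_px * baseline_m / disparity_px


def triangulate(focal_px: float, baseline_m: float,
                x_left_px: float, x_right_px: float, y_px: float):
    """Return (X, Y, Z) in meters in the left camera's frame.

    Pixel coordinates are measured from the image center (an assumption
    of this sketch), and the two views differ only by a sideways shift.
    """
    z = depth_from_disparity(focal_px, baseline_m, x_left_px - x_right_px)
    return (x_left_px * z / focal_px, y_px * z / focal_px, z)


if __name__ == "__main__":
    # Hypothetical numbers: 500-pixel focal length, 10 cm camera slide.
    print(triangulate(500.0, 0.1, 50.0, 25.0, 10.0))
```

Because depth varies inversely with disparity, a one-pixel matching error matters far more for distant features than for near ones, which is one reason a sparse map built from about 100 features can drift.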

1990: Fast, good 2D navigation (20 MIPS)

Thousands of robot sonar ranges accumulated in 2D occupancy evidence grid maps via trained sensor models allowed mapping and navigation in real time on computer workstations, with navigation failures about every 1,000 meters. The technique was developed from 1983 to 1990.
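
The accumulation of range readings into an evidence grid can be sketched as follows. This is an illustration under stated assumptions, not the trained sensor model described above: each reading is treated as a single thin ray, evidence is kept as log-odds per cell, and the two increments (`l_empty`, `l_occupied`) are hand-picked placeholders where a real system would use learned values. The class and method names are hypothetical.

```python
import math


class EvidenceGrid2D:
    """Toy 2D occupancy evidence grid storing log-odds per cell."""

    def __init__(self, width: int, height: int, cell_m: float):
        self.width, self.height, self.cell_m = width, height, cell_m
        # Log-odds 0.0 means "no evidence either way" (probability 0.5).
        self.log_odds = [[0.0] * width for _ in range(height)]

    def add_range(self, x_m, y_m, heading_rad, range_m,
                  l_empty=-0.4, l_occupied=0.9):
        """Fold one range reading into the grid.

        Cells along the ray gain evidence of emptiness; the cell at the
        measured range gains evidence of occupancy. The increments are
        assumed constants standing in for a trained sensor model.
        """
        steps = int(range_m / self.cell_m)
        for i in range(steps + 1):
            d = i * self.cell_m
            cx = math.floor((x_m + d * math.cos(heading_rad)) / self.cell_m)
            cy = math.floor((y_m + d * math.sin(heading_rad)) / self.cell_m)
            if not (0 <= cx < self.width and 0 <= cy < self.height):
                break  # the ray has left the mapped area
            self.log_odds[cy][cx] += l_occupied if i == steps else l_empty

    def probability(self, cx: int, cy: int) -> float:
        """Convert a cell's accumulated log-odds to an occupancy probability."""
        return 1.0 / (1.0 + math.exp(-self.log_odds[cy][cx]))


if __name__ == "__main__":
    grid = EvidenceGrid2D(10, 10, 0.5)
    for _ in range(2):  # two agreeing readings reinforce each other
        grid.add_range(0.25, 0.25, 0.0, 2.5)
    print(round(grid.probability(5, 0), 3), round(grid.probability(2, 0), 3))
```

The point of the representation is that no single noisy reading decides anything: evidence from thousands of readings simply adds up cell by cell, so agreeing readings reinforce one another and isolated errors are outvoted.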

2000: Dense 3D Grid Maps (500 MIPS)

A million ranges from 100 stereoscopic views merged in a 3D evidence grid produced a very high quality map with potential for very reliable navigation, in near real time on a robot-portable microcomputer. The stereo sensor model was trained to optimize the similarity of grid virtual views to the corresponding camera images of the real scene. 3D grid map programs were developed from 1992 to 2002.

2005: Industrial Transport by 3D Map (2,000 MIPS)

Falling camera and computing costs now allow commercialization of 3D grid methods. SEEGRID was formed in 2003 for this purpose. First products are material handling vehicles that record routes and replay them on command. More advanced applications are in development.