The Stanford Cart and The CMU Rover Hans P. Moravec The Robotics Institute Carnegie-Mellon University Pittsburgh, PA 15213 February 1983 The Stanford Cart work, conducted at the Stanford University Artificial Intelligence Laboratory, was supported over the years 1973 through 1980 by the Defense Advanced Research Projects Agency, the National Science Foundation and the National Aeronautics and Space Administration. The CMU Rover has been supported at the Carnegie-Mellon University Robotics Institute since 1981 by the Office of Naval Research under contract number N00014-81-K-0503. @heading(Abstract) The Stanford Cart was a remotely controlled TV equipped mobile robot. A computer program was written which drove the Cart through cluttered spaces, gaining its knowledge of the world entirely from images broadcast by an onboard TV system. The CMU Rover is a more capable, and nearly operational, robot being built to develop and extend the Stanford work and to explore new directions. The Cart used several kinds of stereopsis to locate objects around it in 3D and to deduce its own motion. It planned an obstacle avoiding path to a desired destination on the basis of a model built with this information. The plan changed as the Cart perceived new obstacles on its journey. The system was reliable for short runs, but slow. The Cart moved one meter every ten to fifteen minutes, in lurches. After rolling a meter it stopped, took some pictures and thought about them for a long time. Then it planned a new path, executed a little of it, and paused again. The program successfully drove the Cart through several 20 meter courses (each taking about five hours) complex enough to necessitate three or four avoiding swerves; it failed in other trials in revealing ways. The Rover system has been designed with maximum mechanical and control system flexibility to support a wide range of research in perception and control. It features an omnidirectional steering system, a dozen onboard processors for essential real time tasks, and a large remote computer to be helped by a high speed digitizing/data playback unit and a high performance array processor. Distributed high level control software similar in organization to the Hearsay II speech understanding system and the beginnings of a vision library are being readied. By analogy with the evolution of natural intelligence, we believe that incrementally solving the control and perception problems of an autonomous mobile mechanism is one of the best ways of arriving at general artificial intelligence. @heading(Introduction) Experience with the Stanford Cart @cite(Moravec77, Moravec79, Moravec81), a minimal computer controlled mobile camera platform, suggested to me that, while maintaining such a complex piece of hardware was a demanding task, the effort could be worthwhile from the point of view of artificial intelligence and computer vision research. A roving robot is a source of copious and varying visual and other sensory data which force the development of general techniques if the controlling programs are to be even minimally successful. By contrast, the (also important) work with disembodied data and fixed robot systems often focuses on relatively restricted stimuli and small image sets, and improvements tend to be in the direction of specialization. Drawing an analogy with the natural world, I believe it is no mere coincidence that in all cases imaging eyes and large brains evolved in animals that first developed high mobility.
@heading(The Stanford Cart) The Cart @cite(Moravec80) was a minimal remotely controlled TV equipped mobile robot (Figure 1) which lived at the Stanford Artificial Intelligence Laboratory (SAIL). A computer program was written which drove the Cart through cluttered spaces, gaining its knowledge of the world entirely from images broadcast by the onboard TV system (Figure 2). The Cart used several kinds of stereo vision to locate objects around it in 3D and to deduce its own motion. It planned an obstacle avoiding path to a desired destination on the basis of a model built with this information. The plan changed as the Cart perceived new obstacles on its journey. The system was reliable for short runs, but slow. The Cart moved one meter every ten to fifteen minutes, in lurches. After rolling a meter it stopped, took some pictures and thought about them for a long time. Then it planned a new path, executed a little of it, and paused again. The program successfully drove the Cart through several 20 meter courses (each taking about five hours) complex enough to necessitate three or four avoiding swerves. Some weaknesses and possible improvements were suggested by these and other, less successful, runs. @heading(A Cart Run) A run began with a calibration of the Cart's camera. The Cart was parked in a standard position in front of a wall of carefully painted spots. A calibration program noted the disparity in position of the spots in the image seen by the camera with their position predicted from an idealized model of the situation. It calculated a distortion correction polynomial which related these positions, and which was used in subsequent ranging calculations (Figure 3). The Cart was then manually driven to its obstacle course (littered with large and small debris) and the obstacle avoiding program was started. It began by asking for the Cart's destination, relative to its current position and heading. After being told, say, 50 meters forward and 20 to the right, it began its maneuvers. It activated a mechanism which moved the TV camera, and digitized nine pictures as the camera slid in precise steps from one side to the other along a 50 cm track. A subroutine called the @i{interest operator} was applied to one of these pictures. It picked out 30 or so particularly distinctive regions (features) in this picture. Another routine called the @i{correlator} looked for these same regions in the other frames (Figure 4). A program called the @i{camera solver} determined the three dimensional position of the features with respect to the Cart from their apparent movement from image to image (Figure 5). The @i{navigator} planned a path to the destination which avoided all the perceived features by a large safety margin. The program then sent steering and drive commands to the Cart to move it about a meter along the planned path. The Cart's response to such commands was not very precise. The camera was then operated as before, and nine new images were acquired. The control program used a version of the correlator to find as many of the features from the previous location as possible in the new pictures, and applied the camera solver. The program then deduced the Cart's actual motion during the step from the apparent three dimensional shift of these features. Some of the features were pruned during this process, and the interest operator was invoked to add new ones. This process repeated until the Cart arrived at its destination or until some disaster terminated the program.
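In outline, the per-lurch cycle just described looks like the following sketch (given here in Python purely for exposition; every helper named below is a hypothetical stand-in for a stage described in the text, not a routine from the original SAIL program):
@begin(verbatim)
# Structural sketch only: slider_scan, correlate, prune_by_mutual_distances,
# solve_motion, interest_operator, range_by_slider_stereo, plan_path and
# lurch_along are hypothetical stand-ins for the stages described above.

def obstacle_run(cart, destination):
    world_model = []                  # 3D features recorded so far
    tracked = []                      # features carried over from the last stop
    while not cart.at(destination):
        images = cart.slider_scan()   # nine pictures along the slider track
        if tracked:
            # re-find the previous stop's features, prune mis-matches,
            # and deduce the Cart's actual motion from the survivors
            refound = [m for m in (correlate(f, images) for f in tracked) if m]
            refound = prune_by_mutual_distances(refound)
            cart.pose = solve_motion(cart.pose, tracked, refound)
            world_model += refound
            tracked = refound
        # top up the tracked set with fresh interest-operator features,
        # ranged by nine-eyed slider stereo
        fresh = interest_operator(images[4])       # middle image of the nine
        tracked = tracked + [range_by_slider_stereo(f, images) for f in fresh]
        path = plan_path(cart.pose, destination, world_model + tracked)
        cart.lurch_along(path, meters=0.75)        # execute a little of the plan
    return world_model
@end(verbatim)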
Figures 6 and 7 document the Cart's internal world model at two points during a sample run. @heading(Some Details) The Cart's vision code made extensive use of reductions of each acquired image. Every digitized image was stored as the original picture accompanied by a pyramid of smaller versions of the image reduced in linear size by powers of two, each successive reduction obtained from the last by averaging four pixels into one. @heading(Camera Calibration) The camera's focal length and geometric distortion were determined by parking the Cart a precise distance in front of a wall of many spots and one cross. A program digitized an image of the spot array, located the spots and the cross, and constructed a two dimensional polynomial that related the position of the spots in the image to their position in an ideal unity focal length camera, and another polynomial that converted points from the ideal camera to points in the image. These polynomials were used to correct the positions of perceived objects in later scenes (Figure 3). The algorithm began by determining the array's approximate spacing and orientation. It reduced the picture by averaging and trimmed it to 64 by 64, calculated the Fourier transform of the reduced image and took its power spectrum, arriving at a 2D transform symmetric about the origin, and having strong peaks at frequencies corresponding to the horizontal and vertical and half-diagonal spacings, with weaker peaks at the harmonics. It multiplied each point [@i{i,j}] in this transform by point [@i{-j,i}] and points [@i{j-i,j+i}] and [@i{i+j,j-i}], effectively folding the primary peaks onto one another. The strongest peak in the 90 degree wedge around the y axis gave the spacing and orientation information needed by the next part of the process. The interest operator described later was applied to roughly locate a spot near the center of the image. A special operator examined a window surrounding this position, generated a histogram of intensity values within the window, decided a threshold for separating the black spot from the white background, and calculated the centroid and first and second moments of the spot. This operator was again applied at a displacement from the first centroid indicated by the orientation and spacing of the grid, and so on, the region of found spots growing outward from the seed. A binary template for the expected appearance of the cross in the middle of the array was constructed from the orientation/spacing data from the Fourier transform. The area around each of the found spots was thresholded on the basis of the expected cross area, and the resulting two valued pattern was convolved with the cross template. The closest match in the central portion of the picture was declared to be the origin. Two least-squares polynomials (one for X and one for Y) of third (or sometimes fourth) degree in two variables, relating the actual positions of the spots to the ideal positions in a unity focal length camera, were then generated and written into a file. The polynomials were used in the obstacle avoider to correct for camera roll, tilt, lens focal length and long term variations in the vidicon geometry. @heading(Interest Operator) The Cart vision code dealt with localized image patches called features. A feature is conceptually a point in the three dimensional world, but it was found by examining localities larger than points in pictures. A feature was good if it could be located unambiguously in different views of a scene.
A uniformly colored region or a simple edge is not good because its parts are indistinguishable. Regions, such as corners, with high contrast in orthogonal directions are best. New features in images were picked by a subroutine called the @i{interest operator}, which returned regions that were local maxima of a directional variance measure, defined below. The idea was to select a relatively uniform scattering of good features over the image, so that a few would likely be picked on every visible object while textureless areas and simple edges were avoided. Directional variance was measured over small square windows. Sums of squares of differences of pixels adjacent in each of four directions (horizontal, vertical and two diagonals) over each window were calculated, and the window's interest measure was the minimum of these four sums. Features were chosen where the interest measure had local maxima. The chosen features were stored in an array, sorted in order of decreasing interest measure (Figure 4 top). Once a feature was chosen, its appearance was recorded as a series of excerpts from the reduced image sequence. A 6 by 6 window was excised around the feature's location from each of the variously reduced pictures. Only a tiny fraction of the area of the original (unreduced) image was extracted. Four times as much area (but the same number of pixels) of the x2 reduced image was stored, sixteen times as much of the x4 reduction, and so on until at some level we had the whole image. The final result was a series of 6 by 6 pictures, beginning with a very blurry rendition of the whole picture, gradually zooming in linear expansions of two to a sharp closeup of the feature. @heading(Correlation) Deducing the 3D location of features from their projections in 2D images requires that we know their position in two or more such images. The @i(correlator) was a subroutine that, given a feature description produced by the interest operator from one image, found the best match in a different, but similar, image. Its search area could be the entire new picture, or a rectangular sub-window. The search used a coarse to fine strategy that began in reduced versions of the pictures. Typically the first step took place at the x16 (linear) reduction level. The 6 by 6 window at that level in the feature description, which covered about one seventh of the total area of the original picture, was convolved with the search area in the correspondingly reduced version of the second picture. The 6 by 6 description patch was moved pixel by pixel over the approximately 15 by 16 destination picture, and a correlation coefficient was calculated for each trial position. The position with the best match was recorded. The 6 by 6 area it occupied in the second picture was mapped to the x8 reduction level, where the corresponding region was 12 pixels by 12. The 6 by 6 window in the x8 reduced level of the feature description was then convolved with this 12 by 12 area, and the position of best match was recorded and used as a search area for the x4 level. The process continued, matching smaller and smaller, but more and more detailed windows until a 6 by 6 area was selected in the unreduced picture (Figure 4 bottom).
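The directional variance measure of the interest operator lends itself to a compact statement. The following is a rough reconstruction in Python with NumPy, not the original SAIL code; the six pixel windows and thirty feature limit come from the description above, the rest (window tiling, tie handling) is assumption:
@begin(verbatim)
import numpy as np

def interest_operator(image, window=6, n_features=30):
    # Directional variance over small square windows: sums of squares of
    # differences of adjacent pixels in four directions, minimum of the four
    # taken as the window's interest measure, local maxima returned in
    # decreasing order of that measure.
    img = np.asarray(image, dtype=float)
    dh = (img[:, 1:] - img[:, :-1]) ** 2      # horizontal neighbors
    dv = (img[1:, :] - img[:-1, :]) ** 2      # vertical neighbors
    dd = (img[1:, 1:] - img[:-1, :-1]) ** 2   # diagonal neighbors
    da = (img[1:, :-1] - img[:-1, 1:]) ** 2   # anti-diagonal neighbors
    rows = range(0, img.shape[0] - window, window)
    cols = range(0, img.shape[1] - window, window)
    measure = np.zeros((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            sums = [d[r:r + window, c:c + window].sum() for d in (dh, dv, dd, da)]
            measure[i, j] = min(sums)
    features = []
    for i in range(1, measure.shape[0] - 1):
        for j in range(1, measure.shape[1] - 1):
            if measure[i, j] >= measure[i - 1:i + 2, j - 1:j + 2].max():
                features.append((measure[i, j],
                                 rows[i] + window // 2, cols[j] + window // 2))
    features.sort(reverse=True)               # decreasing interest measure
    return [(r, c) for _, r, c in features[:n_features]]
@end(verbatim)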
The correlator's coarse to fine, "divide and conquer" strategy was in general able to search an entire picture for a match more quickly (because most of the searching was done at high reduction levels) and more reliably (because context up to and including the entire picture guided the search) than a straightforward convolution over even a very restricted search area. @heading(Slider Stereo) At each pause on its computer controlled itinerary the Cart slid its camera from left to right on a 52 cm track, taking 9 pictures at precise 6.5 cm intervals. Points were chosen in the fifth (middle) of these 9 images, either by the correlator to match features from previous positions, or by the interest operator. The camera slid parallel to the horizontal axis of the (distortion corrected) camera co-ordinate system, so the parallax-induced displacement of features in the 9 pictures was purely horizontal. The correlator was applied eight times to look for the points chosen in the central image in each of the other eight pictures. The search was restricted to a narrow horizontal band. This had little effect on the computation time, but it reduced the probability of incorrect matches. In the case of correct matches, the distance to the feature was inversely proportional to its displacement from one image to another. The uncertainty in such a measurement is the difference in distance that a shift of one pixel in the image would make. The uncertainty varies inversely with the physical separation of the camera positions where the pictures were taken (the stereo baseline). Long baselines give more accurate distance measurements. After the correlation step the program knew a feature's position in nine images. It considered each of the 36 (9 values taken 2 at a time) possible image pairings as a stereo baseline, and recorded the estimated (inverse) distance of the feature in a histogram. Each measurement added a little normal curve to the histogram, with mean at the estimated distance, and standard deviation inversely proportional to the baseline, reflecting the uncertainty. The area under each curve was made proportional to the product of the goodness of the matches in the two images (in the central image this quantity was taken as unity), reflecting the confidence that the correlations were correct. The distance to the feature was indicated by the largest peak in the resulting histogram, if this peak was above a certain threshold. If below, the feature was forgotten (Figure 5). The correlator sometimes matched features incorrectly. The distance measurements from incorrect matches in different pictures were not consistent. When the normal curves from 36 picture pairs are added up, the correct matches agree with each other, and build up a large peak in the histogram, while incorrect matches spread themselves more thinly. Two or three correct correlations out of the eight usually built a peak sufficient to offset a larger number of errors. In this way eight applications of a mildly reliable operator interacted to make a very reliable distance measurement. @heading(Motion Stereo) After having determined the 3D location of objects at one position, the computer drove the Cart about a meter forward. At the new position it slid the camera and took nine pictures. The correlator was applied in an attempt to find all the features successfully located at the previous position. Feature descriptions extracted from the central image at the last position were searched for in the central image at the new stopping place.
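The nine-eyed histogram ranging described under Slider Stereo can be made concrete with a small sketch. The Python below is illustrative only; the focal length in pixels, histogram resolution, weighting and acceptance threshold are assumed values, not the Cart's:
@begin(verbatim)
import numpy as np

def slider_stereo_range(x_pixels, goodness, focal_px=500.0,
                        spacing_m=0.065, pixel_err=1.0, threshold=2.0):
    # x_pixels[i]: horizontal image position of the feature in slider picture i
    # goodness[i]: correlation quality of the match in picture i
    #              (unity for the central, interest-operator picture)
    inv_d = np.linspace(0.001, 2.0, 500)      # histogram axis, inverse meters
    hist = np.zeros_like(inv_d)
    n = len(x_pixels)
    for i in range(n):
        for j in range(i + 1, n):             # all 36 pairings of the 9 images
            baseline = (j - i) * spacing_m    # longer baseline, sharper curve
            mean = (x_pixels[i] - x_pixels[j]) / (focal_px * baseline)
            sigma = pixel_err / (focal_px * baseline)
            weight = goodness[i] * goodness[j]   # the original scaled the area
                                                 # under each curve by this product
            hist += weight * np.exp(-0.5 * ((inv_d - mean) / sigma) ** 2)
    peak = int(hist.argmax())
    if hist[peak] < threshold:
        return None                           # too uncertain: feature forgotten
    return 1.0 / inv_d[peak]                  # distance in meters
@end(verbatim)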
Slider stereo then determined the distance of the features so found from the Cart's new position. The program now knew the 3D position of the features relative to its camera at the old and the new locations. Its own movement was deduced from the 3D co-ordinate transform that related the two. The program first eliminated mis-matches in the correlations between the central images at the two positions. Although it didn't yet have the co-ordinate transform between the old and new camera systems, the program knew the distance between pairs of feature positions should be the same in both. It made a matrix in which element [@i{i,j}] is the absolute value of the difference in distances between points @i{i} and @i{j} in the first and second co-ordinate systems divided by the expected error (based on the one pixel uncertainty of the ranging). Each row of this matrix was summed, giving an indication of how much each point disagreed with the other points. The idea is that while points in error disagree with virtually all points, correct positions agree with all the other correct ones, and disagree only with the bad ones. The worst point was deleted, and its effect removed from the remaining points in the row sums. This pruning was repeated until the worst error was within the error expected from the ranging uncertainty. After the pruning, the program had a number of points, typically 10 to 20, whose position error was small and pretty well known. The program trusted these, and recorded them in its world model, unless it had already done so at a previous position. The pruned points were forgotten forevermore. The 3D rotation and translation that related the old and new Cart positions was then calculated by a Newton's method iteration that minimized the sum of the squares of the distances between the transformed first co-ordinates and the raw co-ordinates of the corresponding points at the second position, with each term divided by the square of the expected uncertainty in the 3D position of the points involved. @heading(Path Planning) The Cart vision system modelled objects as simple clouds of features. If enough features are found on each nearby object, this model is adequate for planning a non-colliding path to a destination. The features in the Cart's 3D world model can be thought of as fuzzy ellipsoids, whose dimensions reflect the program's uncertainty of their position. Repeated applications of the interest operator as the Cart moved caused virtually all visible objects to become modelled as clusters of overlapping ellipsoids. To simplify the problem, the ellipsoids were approximated by spheres. Those spheres sufficiently above the floor and below the Cart's maximum height were projected on the floor as circles. The one meter square Cart itself was modelled as a 3 meter circle. The path finding problem then became one of maneuvering the Cart's 3 meter circle between the (usually smaller) circles of the potential obstacles to a desired location. It is convenient (and equivalent) to conceptually shrink the Cart to a point, and add its radius to each and every obstacle. An optimum path in this environment will consist of either a straight run between start and finish, or a series of tangential segments between the circles and contacting arcs (imagine loosely laying a string from start to finish between the circles, then pulling it tight). The program converted the problem to a shortest-path search in a graph.
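To make the graph construction concrete, the sketch below computes the candidate tangent segments between one pair of obstacle circles that have already been grown by the Cart's radius; there are four per pair, as the next paragraph explains. This is an illustrative reconstruction in Python, not the original code, and it omits the test that a segment is not blocked by some third circle:
@begin(verbatim)
import math

def tangent_segments(c1, r1, c2, r2):
    # Common tangent segments between two circles, returned as pairs of
    # tangent points ((x, y) on circle 1, (x, y) on circle 2).  Each segment
    # is a candidate edge of the path graph; arcs along the circles connect
    # segments that share a circle.
    (x1, y1), (x2, y2) = c1, c2
    dx, dy = x2 - x1, y2 - y1
    d = math.hypot(dx, dy)
    if d < 1e-9:
        return []
    ux, uy = dx / d, dy / d            # unit vector along the line of centers
    px, py = -uy, ux                   # unit vector perpendicular to it
    segments = []
    for sign in (+1, -1):              # +1: external tangents, -1: internal
        k = (sign * r2 - r1) / d       # component of the tangent's unit normal
        if k * k > 1.0:                #   along the line of centers
            continue                   # this family of tangents does not exist
        s = math.sqrt(1.0 - k * k)
        for flip in (+1, -1):          # the two tangents of the family
            nx, ny = k * ux + flip * s * px, k * uy + flip * s * py
            p1 = (x1 - r1 * nx, y1 - r1 * ny)
            p2 = (x2 - sign * r2 * nx, y2 - sign * r2 * ny)
            segments.append((p1, p2))
    return segments
@end(verbatim)
For circles that do not overlap this yields the two external and two internal tangents; segments that cross another obstacle circle are the ones given infinite cost in the distance matrix described next.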
There are four possible paths between each pair of obstacles because each tangent can approach clockwise or counterclockwise. Each tangent point became a vertex in the graph, and the distance matrix of the graph (which had an entry for each vertex pair) contained sums of tangential and arc paths, with infinities for blocked or impossible routes. The shortest distance in this space can be found with an algorithm whose running time is @b(O)(n@+(3)) in the number of vertices, and the Cart program was occasionally run using this exact procedure. It was run more often with a faster approximation that made each obstacle into only two vertices (one for each direction of circumnavigation). A few other considerations were essential in path planning. The charted routes consisted of straight lines connected by tangent arcs, and were thus plausible paths for the Cart, which steered like an automobile. This plausibility was not necessarily true of the start of the planned route, which, as presented thus far, did not take the initial heading of the Cart into account. The plan could, for instance, include an initial segment going off 90 degrees from the direction in which the Cart pointed, and thus be impossible to execute. This was handled by including a pair of "phantom" obstacles along with the real perceived ones. The phantom obstacles had a radius equal to the Cart's minimum steering radius, and were placed, in the planning process, on either side of the Cart at such a distance that after their radius was augmented by the Cart's radius (as happened for all the obstacles), they just touched the Cart's centroid, and each other, with their common tangents being parallel to the direction of the Cart's heading. They effectively blocked the area made inaccessible to the Cart by its maneuverability limitations (Figure 6). @cite(Lozano79) describes an independently developed, but very similar, approach to finding paths around polygonal obstacles. @subheading(Path Execution) After the path to the destination had been chosen, a portion of it had to be implemented as steering and motor commands and transmitted to the Cart. The control system was primitive. The drive motor and steering motors could be turned on and off at any time, but there existed no means to accurately determine just how fast or how far they had gone. The program made the best of this bad situation by incorporating a model of the Cart that mimicked, as accurately as possible, the Cart's actual behavior. Under good conditions, "as accurately as possible" meant about 20%; the Cart was not very repeatable, and was affected by ground slope and texture, battery voltage, and other less obvious externals. The path executing routine began by excising the first .75 meters of the planned path. This distance was chosen as a compromise between average Cart velocity and continuity between picture sets. If the Cart moved too far between picture digitizing sessions, the picture would change too much for reliable correlations. This is especially true if the Cart turns (steers) as it moves. The image seen by the camera then pans across the field of view. The Cart had a wide angle lens that covered 60 degrees horizontally. The .75 meters, combined with the turning radius limit (5 meters) of the Cart, resulted in a maximum shift in the field of view of 15 degrees, one quarter of the entire image. The program examined the Cart's position and orientation at the end of the desired .75 meter lurch, relative to the starting position and orientation.
The displacement was characterized by three parameters: displacement forward, displacement to the right, and change in heading. In closed form the program computed a path that accomplished this movement in two arcs of equal radius, but different lengths. The resulting trajectory had a general "S" shape. Rough motor timings were derived from these parameters. The program then used a simulation that took into account steering and drive motor response to iteratively refine the solution. @heading(Cart Experiments) The system described above only incompletely fulfills some of the hopes I had when the work began many years ago. One of the most serious limitations was the excruciating slowness of the program. In spite of my best efforts, and many compromises in the interest of speed, it took 10 to 15 minutes of real time to acquire and consider the images at each meter long lurch, on a lightly loaded DEC KL-10. This translated to an effective Cart velocity of 3 to 5 meters an hour. Interesting obstacle courses (two or three major obstacles, spaced far enough apart to permit passage within the limits of the Cart's size and maneuverability) were about 20 meters long, so interesting Cart runs took 5 hours. The reliability of individual moves was high, as it had to be for a twenty lurch sequence to have any chance of succeeding, but the demanding nature of each full run and the limited amount of time available for testing (discussion of which is beyond the scope of this paper) after the bulk of the program was debugged ensured that many potential improvements were left untried. Three full (about 20 meter) runs were digitally recorded and filmed, two indoors and one outdoors. Two indoor false starts, aborted by failure of the program to perceive an obstacle, were also recorded. The two long indoor runs were nearly perfect. In the first long indoor run, the Cart successfully slalomed its way around a chair, a large cardboard icosahedron, and a cardboard tree, then, at a distance of about 16 meters, encountered a cluttered wall and backed up several times trying to find a way around it (Figures 6 and 7 are snapshots from this run). The second long indoor run involved a more complicated set of obstacles, arranged primarily into two overlapping rows blocking the goal. I had set up the course hoping the Cart would take a long, picturesque (the runs were being filmed) S shaped path around the ends of the rows. To my chagrin, it instead tried for a tricky shortcut. The Cart backed up twice to negotiate the tight turn required to go around the first row, then executed several tedious steer forward / back up moves, lining itself up to go through a barely wide enough gap in the second row. This run had to be terminated, sadly, before the Cart had gone through the gap because of declining battery charge and increasing system load. The outdoor run was less successful. It began well; in the first few moves the program correctly perceived a chair directly in front of the camera, and a number of more distant cardboard obstacles and sundry debris. Unfortunately, the program's idea of the Cart's own position became increasingly wrong. At almost every lurch, the position solver deduced a Cart motion considerably smaller than the actual move.
By the time the Cart had rounded the foreground chair, its position model was so far off that distant obstacles, seen early in the run and again later, were replicated in different positions in the Cart's confused world model, to the point where the program thought an actually existing distant clear path was blocked. I restarted the program to clear out the world model when the planned path became too silly. At that time the Cart was four meters in front of a cardboard icosahedron, and its planned path led straight through it. The newly reincarnated program failed to notice the obstacle, and the Cart collided with it. I manually moved the icosahedron out of the way, and allowed the run to continue. It did so uneventfully, though there were continued occasional slight errors in the self position deductions. The Cart encountered a large cardboard tree towards the end of this journey and detected a portion of it only just in time to squeak by without colliding (Figure 2 was a photograph taken during this run). The two short abortive indoor runs involved setups nearly identical to the successful two-row long run described above. The first row, about three meters in front of the Cart's starting position, contained a chair, a real tree (a small cypress in a planting pot), and a polygonal cardboard tree. The Cart saw the chair instantly and the real tree after the second move, but never saw the cardboard tree at all. Its planned path around the two obstacles it did see put it on a collision course with the unseen one. Placing a chair just ahead of the cardboard tree fixed the problem, and resulted in a successful run. The finished program never had trouble with chairs. @subheading(Problems) These tests revealed some weaknesses in the program. The system did not see simple polygonal (bland and featureless) objects reliably, and its visual navigation was fragile under certain conditions. Examination of the program's internal workings suggested some causes and possible solutions. The program sometimes failed to see obstacles lacking sufficient high contrast detail within their outlines. In this regard, the polygonal tree and rock obstacles I whimsically constructed to match diagrams from a 3D drawing program were a terrible mistake. In none of the test runs did the programs ever fail to see a chair placed in front of the Cart, but half the time they did fail to see a pyramidal tree or an icosahedral rock made of clean white cardboard. These contrived obstacles were picked up reliably at a distance of 10 to 15 meters, silhouetted against a relatively unmoving (over slider travel and Cart lurches) background, but were only rarely and sparsely seen at closer range, when their outlines were confused by a rapidly shifting background, and their bland interiors provided no purchase for the interest operator or correlator. Even when the artificial obstacles were correctly perceived, it was by virtue of only two to four features. In contrast, the program usually tracked five to ten features on nearby chairs. In the brightly sunlit outdoor run the artificial obstacles had another problem. Their white coloration turned out to be much brighter than any "naturally" occurring extended object. These super bright, glaring surfaces severely taxed the very limited dynamic range of the Cart's vidicon/digitizer combination.
When the icosahedron occupied 10% of the camera's field of view, the automatic target voltage circuit in the electronics turned down the gain to a point where the background behind the icosahedron appeared nearly solid black. The second major problem exposed by the runs was glitches in the Cart's self-position model. This model was updated after a lurch by finding the 3D translation and rotation that best related the 3D position of the set of tracked features before and after the lurch. In spite of the extensive pruning that preceded this step (and partly because of it, as is discussed later), small errors in the measured feature positions sometimes caused the solver to converge to the wrong transform, giving a position error beyond the expected uncertainty. Features placed into the world model before and after such a glitch were not in the correct relative positions. Often an object seen before was seen again after, now displaced, with the old and new positions combining to block a path that was in actuality open. This problem showed up mainly in the outdoor run. I had observed it indoors in the past, in simple mapping runs, before the entire obstacle avoider was assembled. There appear to be two major causes for it, and a wide range of supporting factors. Poor seeing, resulting in too few correct correlations between the pictures before and after a lurch, was one culprit. The highly redundant nine eyed stereo ranging was very reliable, and caused few problems, but the non-redundant correlation necessary to relate the position of features before and after a lurch was error prone. Sometimes the mutual-distance invariance pruning that followed was overly aggressive, and left too few points for a stable least-squares co-ordinate fit. The outdoor runs encountered another problem. The program ran so slowly that shadows moved significantly (up to a half meter) between lurches. Their high contrast boundaries were favorite points for tracking, enhancing the program's confusion. @subheading(Quick Fixes) Though elaborate (and thus far untried in our context) methods such as edge matching may greatly improve the quality of automatic vision in the future, subsequent experiments with the program revealed some modest incremental improvements that would have solved most of the problems in the test runs. The issue of unseen cardboard obstacles turns out to be partly one of over-conservatism on the program's part. In all cases where the Cart collided with an obstacle it had correctly ranged a few features on the obstacle in the prior nine-eyed scan. The problem was that the much more fragile correlation between vehicle forward moves failed, and the points were rejected in the mutual distance test. Overall the nine-eyed stereo produced very few errors. If the path planning stage had used the pre-pruning features (still without incorporating them permanently into the world model), the runs would have proceeded much more smoothly. All of the most vexing false negatives, in which the program failed to spot a real obstacle, would have been eliminated. There would have been a very few false positives, in which non-existent ghost obstacles would have been perceived. One or two of these might have caused an unnecessary swerve or backup, but such ghosts would not pass the pruning stage, and the run would have recovered after the initial, non-catastrophic glitch. The self-position confusion problem is related, and in retrospect may be considered a trivial bug.
When the path planner computed a route for the Cart, another subroutine took a portion of this plan and implemented it as a sequence of commands to be transmitted to the Cart's steering and drive motors. During this process it ran a simulation that modelled the Cart acceleration, rate of turning and so on, and which provided a prediction of the Cart's position after the move. With the old hardware the accuracy of this prediction was not great, but it nevertheless provided much a priori information about the Cart's new position. This information was used, appropriately weighted, in the least-squares co-ordinate system solver that deduced the Cart's movement from the apparent motion in 3D of tracked features. It was not used, however, in the mutual distance pruning step that preceded this solving. When the majority of features had been correctly tracked, failure to use this information did not hurt the pruning. But when the seeing was poor, it could make the difference between choosing a spuriously agreeing set of mis-tracked features and the small correctly matched set. Incorporating the prediction into the pruning, by means of a heavily weighted point that the program treated like another tracked feature, removed almost all the positioning glitches when the program was fed the pictures from the outdoor run. More detail on all these areas can be found in @cite(Moravec80). @heading(The CMU Rover) The major impediments to serious extensions of the Cart work were limits to available computation, resulting in debilitatingly long experimental times, and the very minimal nature of the robot hardware, which precluded inexpensive solutions for even the most basic functions (like "roll a meter forward"). We are addressing these problems at CMU in an ambitious new effort centered around a new, small but sophisticated mobile robot dubbed the CMU Rover. The project so far has been focused on developing a smoothly functional and highly capable vehicle and associated support system which will serve a wide variety of future research. The shape, size, steering arrangements and onboard and external processing capabilities of the Rover system were chosen to maximize the flexibility of the system (naturally limited by present day techniques). The robot is cylindrical, about a meter tall and 55 cm in diameter (Figure 8) and has three individually steerable wheel assemblies which give it a full three degrees of freedom of mobility in the plane (Figures 9, 10). Initially it will carry a TV camera on a pan/tilt/slide mount, several short range infrared and long range sonar proximity detectors, and contact switches. Our design calls for about a dozen onboard processors (at least half of them powerful 16 bit MC68000s) for high speed local decision making, servo control and communication (Figure 13). Serious processing power, primarily for vision, is to be provided at the other end of a remote-control link by a combination of a host computer (a VAX 11/780), an ST-100 array processor (a new machine from a new company, Star Technologies Inc., which provides 100 million floating point operations per second), and a specially designed high performance analog data acquisition and generation device. The Stanford Cart used fifteen minutes of computer time to move a meter. With this new CMU hardware, and some improved algorithms, we hope to duplicate (and improve on) this performance in a system that runs at least ten times as fast, leaving room for future extensions.
The fifteen minutes for each meter-long move in the Cart's obstacle avoiding program came in three approximately equal chunks. The first five minutes were devoted to digitizing the nine pictures from a slider scan. Though the SAIL computer had a flash digitizer which sent its data through a disk channel into main memory at high speed, it was limited by a poor sync detector. Often the stored picture was missing scanlines, or had lost vertical sync, i.e., had rolled vertically. In addition, the image was quantized to only four bits per sample: 16 grey levels. To make one good picture the program digitized 30 raw images in rapid succession, intercompared them to find the largest subset of nearly alike pictures (on the theory that the non-spoiled ones would be similar, but the spoiled ones would differ even from each other) and averaged this "good" set to obtain a less noisy image with six bits per pixel. The new digitizing hardware, which can sample a raw analog waveform, and depends on software in the array processor to do sync detection, should cut the total time to under one second per picture. The next five minutes were spent doing the low level vision: reducing the images, sometimes filtering them, applying the interest operator and especially the correlator, and statistically pruning the results. The array processor should be able to do all this nearly 100 times faster. The last five minutes were devoted to higher level tasks: maintaining the world model, planning paths, and generating graphical documentation of the program's thinking. Some steps in this section may be suitable for the array processor, but in any case we have found faster algorithms for much of it, for instance a shortest-path-in-graph algorithm which makes maximum use of the sparsity of the distance matrix produced during the path planning. We hope eventually to provide a manipulator on the Rover's topside, but there is no active work on this now. We chose the high steering flexibility of the current design partly to ease the requirements on a future arm. The weight and power needed can be reduced by using the mobility of the Rover to substitute for the shoulder joint of the arm. Such a strategy works best if the Rover body is given a full three degrees of freedom (X, Y and angle) in the plane of the floor. Conventional steering arrangements as in cars give only two degrees at any instant. @heading(Rover Details) Three degrees of freedom of mobility are achieved by mounting the chassis on three independently steerable wheel assemblies (Figures 9-12). The control algorithm for this arrangement at every instant orients the wheels so that lines through their axles meet at a common point. Properly orchestrated, this design permits unconstrained motion in any (2D) direction, and simultaneous independent control of the robot's rotation about its own vertical axis. An unexpected benefit of this agility is the availability of a "reducing gear" effect. By turning about the vertical axis while moving forward the robot derives a mechanical advantage for its motors. For a given motor speed, the faster the Rover spins, the slower it travels forward, and the steeper the slope it can climb. (Visualization of this effect is left as an exercise for the reader.) To permit low friction steering while the robot is stationary, each assembly has two parallel wheels connected by a differential gear. The drive shaft of the differential goes straight up into the body of the robot.
A concentric hollow shaft around this one connects to the housing of the differential (Figure 11). Turning the inner shaft causes the wheels to roll forwards or backwards; turning the outer one steers the assembly, with the two wheels rolling in a little circle. The assemblies were manufactured for us by Summit Gear Corp. Each shaft is connected to a motor and a 4000 count/revolution optical shaft encoder (Datametrics K3). The two motors and two encoders are stacked pancake fashion on the wheel assembly, speared by the shafts. There are no gears except for the ones in the differential. (Figure 11 shows a schematic cross section of a complete motor/wheel-assembly structure, Figure 10 a partially assembled stack in the flesh). The motors are brushless with samarium-cobalt permanent magnet rotors and three-phase windings (Inland Motors BM-3201). With the high energy magnet material, this design, when the coils are properly sequenced, has better performance than a conventional rotating coil motor. The coils for each motor are energized by six power MOSFETs (Motorola MTP1224) mounted in the motor casing and switched by six opto-isolators (to protect the controlling computers from switching noise) whose LEDs are connected in bidirectional pairs in a delta configuration, and lit by three logic signals connected to the vertices of the delta (Figure 12). The motor sequencing signals come directly from onboard microprocessors, one for each motor. These are CMOS (Motorola MC146805 with Hitachi HM6116 RAMs) to keep power consumption low. Each processor pulse width modulates and phases its motor's windings, and observes its shaft encoder, to servo the motor to a desired motion (supplied, as a time parameterized function, by yet another processor, a Motorola 68000, the @i(Conductor)). Though the servo loop works in its present form, several approximations were necessary in this real-time task because of the limited arithmetic capability of the 6805. We will be replacing the 6805 with the forthcoming MC68008, a compact 8 bit bus version of the 68000. The shaft encoder outputs and the torques from all the motors, as estimated by the motor processors, are monitored by another processor, @i(the Simulator), a Motorola MC68000 (with all CMOS support circuitry the power requirement for our 32K 68000 is under one watt; the new high performance 74HC series CMOS allows operation at the full 10 MHz speed), which maintains a dead-reckoned model of the robot's position from instant to instant. The results of this simulation (which represent the robot's best position estimate) are compared with the desired position, produced by another 68000, @i(the Controller), in the previously introduced @i(Conductor), which orchestrates the individual motor processors. The Conductor adjusts the rates and positions of the individual motors in an attempt to bring the Simulator in line with requests from the Controller, in what amounts to a highly non-linear feedback loop. Other onboard processors are: @begin(description) Communication @\ A 68000 which maintains an error corrected and checked packet infrared link with a large controlling computer (a VAX 11/780 helped out by an ST-100 array processor and a custom high speed digitizer) which will do the heavy thinking. Programs run in the Controller are obtained over this link. Sonar @\ A 6805 which controls a number of Polaroid sonar ranging devices around the body of the Rover. These will be used to maintain a rough navigation and bump avoidance model.
All measurements and control functions of this processor and the following ones are available (on request over a serial link) to the Controller. Camera @\ A 6805 which controls the pan, tilt and slide motors of the onboard TV camera. The compact camera broadcasts its image on a small UHF or microwave transmitter. The signal is received remotely and the video signal captured by a high bandwidth digitizer system and then read by the remote VAX. There are tentative plans for a minimal vision system using a 68000 with about 256K of extra memory onboard the Rover, for small vision tasks when the Rover is out of communication with the base system. Proximity @\ A 6805 which monitors several short range modulated infrared proximity detectors which serve as a last line of defense against collision, and which sense any drop off in the floor, and contact switches. Utility @\ A 6805 which senses conditions such as battery voltage and motor temperature, and which controls the power to non-essential but power hungry systems like the TV camera and transmitter. @end(description) Communication between processors is serial, via Harris CMOS UARTs, at a maximum speed of 256 kilobaud. The Conductor talks with the motor processors on a shared serial line and the Controller communicates with the Sonar, Camera, Proximity, Utility and any other peripheral processors by a similar method. The processors live in a rack on the second storey of the robot structure (Figure 8, 13), between the motor and battery assembly (first floor) and the camera plane (penthouse). Figure 13 shows the initial interconnection. The Rover is powered by six sealed lead-acid batteries (Globe gel-cell 12230) with a total capacity of 60 amp hours at 24 volts. The motors are powered directly from these, the rest of the circuitry derives its power indirectly from them through switching DC/DC converters (Kepco RMD-24-A-24 and Semiconductor Circuits U717262). Each 6805 processor draws about one eighth of a watt, each 68000 board only one watt. Physically the robot is a meter tall and 55 cm in diameter. It weighs 90 kilograms. The maximum acceleration is one quarter g, and the top speed is 10 kilometers/hour. With appropriate onboard programming the motions should be very smooth. The great steering flexibility will permit simulation of other steering systems such as those of cars, tanks and boats and other robots by changes in programming. @heading(Progress) As of this writing (January 1983) the robot's major mechanical and electronic structures are complete, mostly tested, and being assembled. An instability in the servo control algorithms which had held us up for several months has finally been solved, and we expect baby's first "steps" early in 1983. The digitizer unit which will receive and generate video and other data for the robot is under construction. Its specifications include four channels each with two megabytes of memory and two ports able to transfer data at 100 megabytes per second. @heading(Promises) The high level control system has become very interesting. Initially we had imagined a language in which to write scripts for the onboard Controller similar to the AL manipulator language developed at Stanford @cite(Goldman81), from which the commercial languages @b(VAL) at Unimation @cite(Unimation79) and the more sophisticated @b(AML) @cite(Taylor82) at IBM were derived. 
Paper attempts at defining the structures and primitives required for the mobile application revealed that the essentially linear control structure of these state-of-the-art arm languages was inadequate for a rover. The essential difference is that a rover, in its wanderings, is regularly "surprised" by events it cannot anticipate, but with which it must deal. This requires that routines for responding to various situations can be activated in arbitrary order and run concurrently. We briefly examined a production system, as used in many "expert systems", as the framework for the requisite real time concurrency, but have now settled on a structure similar to that developed for the CMU @b(Hearsay II) speech understanding project @cite(Erman80). Independent processes will communicate via messages posted on a commonly accessible data structure we call a @i(blackboard). The individual processes, some of which will run under control of a spare real time operating system on one or more of the onboard 68000s, others of which will exist at the other end of a radio link on the VAX, change their relative priority as a consequence of relevant messages on the blackboard. For instance, a note from several touch sensors signalling a collision is taken as a cue by the avoidance routine to increase its running rate, and to post messages which trigger the motor co-ordinating routines to begin evasive actions. We plan to implement the multiple processes required for this task on each of several of the onboard 68000s with the aid of a compact (4K), efficient real time operating system kernel called VRTX, available from Hunter & Ready. A more detailed description of the state of this work may be found in @cite(Elfes83). Other interesting preliminary thinking has resulted in a scheme by which a very simple arm with only three actuators will enable the robot, making heavy use of its great steering flexibility, to enter and leave through a closed standard office door (Figure 14). Stepping into a more speculative realm, we are considering approaches to model based vision @cite(Brooks81) which would permit recognition of certain classes of objects seen by the robot. Discussions with the distributed sensor net crew here at CMU @cite(Rashid81) have raised the possibility of equipping the robot with ears, so it could passively localize sound, and thus perhaps respond correctly, both semantically and geometrically, to a spoken command like "Come here!" (using, in addition, speech understanding technology also developed at CMU @cite(Waibel81)). We are also toying with the idea of a phased array sonar with about 100 transducers operating at 50 kHz which, in conjunction with the high speed analog conversion device mentioned above and the array processor, would be able to produce a modest resolution depth map (and additional information) of a full hemisphere in about one second, by sending out a single spherical pulse, then digitally combining the returned echoes from the individual transducers with different delay patterns to synthesize narrow receiving beams. @heading(Philosophy) It is my view that developing a responsive mobile entity is the surest way to approach the problem of general intelligence in machines. Though computers have been programmed to do creditable jobs in many @i(intellectual) domains, competent performance in @i(instinctive) domains like perception and common sense reasoning is still elusive. I think this is because the instinctive skills are fundamentally much harder.
While human beings learned most of the intellectual skills over a few thousand years, the instinctive skills were genetically honed for hundreds of millions of years, and are associated with large, apparently efficiently organized, fractions of our brain; vision, for example, is done by a specialized 10% of our neurons. Many animals share our instinctive skills, and their evolutionary record provides clues about the conditions that foster development of such skills. A universal feature that most impresses me in this context is that @i(all animals that evolved perceptual and behavioral competence comparable to that of humans first adopted a mobile way of life). This is perhaps a moot point in the case of the vertebrates, which share so much of human evolutionary history, but it is dramatically confirmed among the invertebrates. Most molluscs are sessile shellfish whose behavior is governed by a nervous system of a few hundred neurons. Octopus and squid are molluscs that abandoned life in the shell for one of mobility; as a consequence they developed imaging eyes, a large (annular!) brain, dexterous manipulators and an unmatched million channel color display on their surfaces. By contrast, no sessile animal or plant shows any evidence of coming even remotely near human behavioral competence. My conclusion is that @i(solving the day to day problems of developing a mobile organism steers one in the direction of general intelligence, while working on the problems of a fixed entity is more likely to result in very specialized solutions). I believe our experience with the control language for the Rover vis-à-vis the languages adequate for a fixed arm is a case in point. My experiences with computer vision during the Cart work reinforce this opinion; constantly testing a program against fresh real-world data is nothing at all like optimizing a program to work well with a limited set of stored data. The variable and unpredictable world encountered by a rover applies much more selection pressure for generality and robustness than the much narrower and more repetitive stimuli experienced by a fixed machine. Mobile robotics may or may not be the fastest way to arrive at general human competence in machines, but I believe it is one of the surest roads. @heading(Related Work) Other groups have come to similar conclusions, and have done sophisticated mobile robot work in the past @cite(Raphael76), @cite(Yakimovsky78). The robotics group at Stanford has acquired a new, experimental mobile robot from Unimation Inc., and plans research similar to ours @cite(Binford82). This new Unimation rover @cite(Carlisle81) is very similar in size, shape and mechanical capabilities to the machine we are building. It achieves a full three degrees of freedom of floor-plane mobility by use of three novel "omnidirectional" wheels which, by virtue of rollers in place of tires, can freely move in a direction broadside to the wheel plane as well as performing the usual wheel motion under motor control. @newpage @subheading(Figure 1: The Stanford Cart) @subheading(Figure 2: The Cart on an Obstacle Course) @subheading(Figure 3: A Cart's Eye View of the Calibration Grid) The Cart camera's focal length and distortions were determined by parking the Cart a precise distance in front of, and aiming its camera at, a carefully painted array of spots, and running a calibration program.
The program located the spots in the image and fitted a two dimensional, third degree polynomial which converted actual positions in the image to co-ordinates in an ideal unity focal length camera. The picture presented here was obtained by running a corresponding grid in the unity focal length frame through the inverse function of the polynomial so obtained and superimposing it on the raw spot image. @subheading(Figure 4: Interest Operator and Correlator Results) The upper picture shows points picked out by an application of the Interest Operator. The lower picture shows the Correlator's attempt to find the same points in an image of the same scene taken from a different point of view. @subheading(Figure 5: Slider Stereo) A typical ranging. The nine pictures are from a slider scan. The interest operator chose the marked feature in the central image, and the correlator found it in the other eight. The small curves at bottom are distance measurements of the feature made from pairs of the images. The large beaded curve is the sum of the measurements over all 36 pairings. The horizontal scale is linear in inverse distance. @subheading(Figure 6: A Cart Obstacle Run) This and the following diagram are plan views of the Cart's internal world model during a run of the obstacle avoiding program. The grid cells are two meter squares, conceptually on the floor. The Cart's own position is indicated by the small heavy square, and by the graph, indicating height, calibrated in centimeters, to the left of the grid. Since the Cart never actually leaves or penetrates the floor, this graph provides an indication of the overall accuracy. The irregular, tick marked line behind the Cart's position is the past itinerary of the Cart as deduced by the program. Each tick mark represents a stopping place. The picture at the top of the diagrams is the view seen by the TV camera. The two rays projecting forward from the Cart position show the horizontal boundaries of the camera's field of view (as deduced by the camera calibration program). The numbered circles in the plan view are features located and tracked by the program. The centers of the circles are the vertical projections of the feature positions onto the ground. The size of each circle is the uncertainty (caused by finite camera resolution) in the feature's position. The length of the 45 degree line projecting to the upper right, and terminated by an identifying number, is the height of the feature above the ground, to the same scale as the floor grid. The features are also marked in the camera view, as numbered boxes. The thin line projecting from each box to a lower blob is a stalk which just reaches the ground, in the spirit of the 45 degree lines in the plan view. The irregular line radiating forwards from the Cart is the planned future path. This changes from stop to stop, as the Cart fails to obey instructions properly, and as new obstacles are detected. The small ellipse a short distance ahead of the Cart along the planned path is the planned position of the next stop. @subheading(Figure 7:) After the 11th lurch the Cart has rounded the chair and the icosahedron and is working on the cardboard tree. The world model has suffered some accumulated drift error, and the oldest acquired features are considerably misplaced. @subheading(Figure 8: The CMU Rover) @subheading(Figure 9: The Rover Wheelbase) The steering angle and drive of each wheel pair are individually controlled.
The Rover's trajectory will be an arc about any point in the floor plane if lines through the axles of all three wheels intersect at the center of that arc. @subheading(Figure 10: The Rover Wheel Assembly) The steering motor is shown attached to the wheel assembly; part of the drive motor is shown detached. @subheading(Figure 11: Diagram of the Rover Wheel Assembly) @subheading(Figure 12: Rover Brushless Motor Drive Circuitry) Each samarium-cobalt brushless motor in the wheel assemblies is sequenced and servoed by its own microprocessor. A processor generates three bipolar logic signals @i(P1, P2) and @i(P3) which control the currents through the three phase motor windings @i(W1, W2) and @i(W3) through the IR light emitting diode/phototransistor optical links A through F shown. Each phototransistor controls a power field effect transistor which switches power to the windings. The circuitry in the upper, optically isolated, portion of the diagram is contained within its motor's housing, on an annular circuit board, using the housing as a heat sink. The LEDs poke through holes in the cover. Motor direction is controlled by sequencing order; torque is adjusted by pulse width modulation. @subheading(Figure 13: The Rover's Onboard Processors) @subheading(Figure 14: The Rover Goes Through a Closed Door) Using a simple arm with only three powered actuators and two passive hinges, greatly helped by the wheel flexibility, the Rover deals with a self-closing office door. The door and knob are visually located, and the Rover extends its arm and approaches the door (a). The arm grasps the knob, twists it open, and the Rover backs up in an arc, partially opening the door (b). The Rover rolls around the door edge, while retaining its grasp on the knob; passive hinges in the arm bend in response (c). The Rover body now props open the door; the arm releases and retracts, and the Rover rolls along the front of the door (d). The Rover moves in an arc outward, allowing the door to close behind it (e).