@make(article)
@use(bibliography = "../master.bib")
@libraryfile(picture)
@counter(parahead, within subsection, TitleEnv HD4, contentsenv tc4, incrementedby use)
@pageheading (center "@ref[page]", right "DRAFT", left "@value[date]")

@heading(Experiments and Thoughts on Visual Navigation)
@center[@b(C. Thorpe, L. Matthies, and H. Moravec)]

@heading(Abstract)

We describe a second generation system that drives a camera-equipped mobile robot through obstacle courses while maintaining a long term, visually updated navigational model. Techniques for improving accuracy and speed are evaluated, and future improvements and implications are discussed. We mention why it might be good for coming robots to spend some of their time in play.

@section(Introduction)

@c(FIDO) is a navigation and vision system for a robot rover. Using only stereo vision, it locates obstacles, plans a path around them, and tracks the motion of the robot as it moves. @c(FIDO)'s main loop repeatedly:

@begin(enumerate)
picks about 40 points from one member of a stereo image pair;

stereo-ranges those points by a hierarchical correlation technique;

plans a path that avoids the points;

moves forward;

takes two new stereo pictures;

relocates the same points and stereo-ranges them again;

deduces vehicle motion from the apparent motion of the points.
@end(enumerate)

This paper describes our experimental investigations and improvements in @c(FIDO)'s performance. Early versions of @c(FIDO) and its predecessor, the Stanford Cart programs, used 9-eyed stereo, took 15 minutes or more per step, and were not always reliable. By using additional geometric constraints, we have been able to increase the reliability and reduce the number of stereo images from 9 to 2. With fewer images and several optimizations, we reduced the run time from 15 minutes to less than a minute per step. We also explored using parallel hardware for further speedups. Section 2 discusses the constraints used and their effects on system precision. Section 3 presents optimizations for speed and prospects for parallelism. Finally, Section 4 presents some extrapolations from the @c(FIDO) experience.

The @c(FIDO) system has supported experiments in other aspects of visual navigation, notably @i(interest operators), used to pick points to be tracked from image to image, and @i(path planning). The results have been presented elsewhere [@cite(Thorpe84a, Thorpe84)]. It was found that the simple interest operator used in the original Cart program worked as well as more expensive ones, and it was retained with only slight changes. @c(FIDO) does incorporate a new, more flexible path planner, based on a combinatorial search over a grid followed by incremental smoothing.

@subsection(Constraints)

@c(FIDO) uses a variety of constraints to improve the accuracy of its stereo vision and motion solutions. Most reduce the area of the image to be searched by the correlator. A smaller search window reduces the chance of finding a false match and improves system performance in several ways. First, as more points are tracked correctly, it becomes easier to identify those tracked incorrectly and delete them. Second, more points (and higher precision) improve the accuracy of the motion calculations @cite(tsai). Finally, points can be successfully tracked through more images, and over longer distances, for more accurate long term navigation.

Some of the constraints arise from the imaging geometry: the known relationship between the cameras and the vehicle, with no assumptions about vehicle motion or object location.
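
Concretely, most of the constraints act by clipping the rectangle of candidate match positions that the correlator must examine. The sketch below, in C, shows this view of the mechanism; the types and names are ours, for illustration only, and not @c(FIDO)'s actual code.

@begin(example)
/* Each enabled constraint clips the rectangle of candidate match
 * positions; the correlator then searches only the intersection. */
typedef struct { int left, right, top, bottom; } Window;

static void clip(Window *w, Window c)
{
    if (c.left   > w->left)   w->left   = c.left;
    if (c.top    > w->top)    w->top    = c.top;
    if (c.right  < w->right)  w->right  = c.right;
    if (c.bottom < w->bottom) w->bottom = c.bottom;
}

/* Start from the whole image and intersect the regions allowed
 * by each constraint that is currently turned on. */
Window search_window(Window image, Window allowed[], int enabled[], int n)
{
    Window w = image;
    int i;

    for (i = 0; i < n; i++)
        if (enabled[i])
            clip(&w, allowed[i]);
    return w;
}
@end(example)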
Other constraints come from vehicle motion estimates. The image location of an object that has been stereo-ranged on a previous step is constrained by approximate knowledge of the vehicle's new position.

All the constraints can be turned on or off, with user-settable parameters, in the @c(FIDO) program, and they were subjects of experimentation. We usually made a live vehicle run with the current best settings, and saved all the images and position predictions in a file. Subsequent runs were done offline using this stored data, with different constraint settings. Such runs were compared for accuracy of the final calculated position, number of features successfully tracked at each step, and occurrence of any catastrophic failures.

@subsection(Imaging Geometry Constraints)

These constraints are the simplest to understand and to apply. They depend only on camera and robot geometry, and they are applicable to stereo point matches of both new and previously ranged points.

@b(Horizontal Hold.) If the point of view shifts to the left without rotation, the image of any given point will move to the right.

@begin(figure)
@presspicture (file = "../const/horiz.press", height = 2.5 in)
@caption(Horizontal Hold)
@end(figure)

@b(Vertical Hold.) If the point of view moves purely sideways, the image of a point will also move sideways (in the opposite direction) but not up or down. In the real world of misaligned cameras and distorted vidicons, the image might appear to move a little vertically, so we allow some slop (typically 10% of the image height).

@begin(figure)
@presspicture (file = "../const/vert.press", height = 2.5 in)
@caption(Vertical Hold)
@end(figure)

@b(Near Limit.) Obstacles are not expected very near the cameras, because they sit far back on our robot, Neptune. In any case an obstacle below a certain distance cannot be detected, because it cannot be seen simultaneously by both cameras, and may be out of focus. This constraint limits the expected maximum stereo disparity.

@begin(figure)
@presspicture (file = "../const/near.press", height = 2.5 in)
@caption(Near Limit)
@end(figure)

@begin(figure)
@presspicture (file = "../const/comb.press", height = 2.5 in)
@caption(Combined Imaging Constraints)
@end(figure)

@subsection(Motion Geometry)

The estimated motion of the vehicle from step to step places a strong constraint on point matches. It can be used either @i(a priori), to limit the search area within an image, or @i(a posteriori), to gauge the reasonableness of a match. The predicted position of the vehicle can also be combined with the points tracked by vision in the vehicle motion calculation. @c(FIDO) uses the motion geometry constraints in the following four ways:

@begin(figure)
@presspicture (file = "../const/rea.press", height = 2.5 in)
@caption(Reacquire Hold)
@end(figure)

@b(Two D Motion.) We usually run our robot on locally flat ground, in which case we know it will not pitch, roll, or move vertically. This reduces the problem of determining vehicle motion from 6 degrees of freedom to 3, simplifying the computation and tightening the constraints.

@b(Reacquire Hold.) Given the 3D location of a point relative to a previous vehicle position, and a dead-reckoned new position and heading for the vehicle, it is possible to predict where that point should appear in the new stereo pair of images. If this constraint is active, @c(FIDO) will use the prediction to limit the stereo matcher's search. Three user-settable variables control the error estimates in robot position and orientation, and consequently the size of the search box around the predicted image position.
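
The sketch below is a minimal version of this prediction, assuming flat-ground motion (forward offset @i(dx), lateral offset @i(dy), turn @i(dtheta)) and an idealized pinhole camera of focal length @i(f) pixels. The frame conventions and the crude first-order error bound are our illustrative assumptions, not @c(FIDO)'s actual formulation.

@begin(example)
#include <math.h>

typedef struct { double x, y, z; } Point3;

/* Transform a point, known in the old vehicle frame, into the new
 * frame, given dead-reckoned motion (dx, dy) and turn dtheta on
 * flat ground.  Here x is meters ahead and y is meters to the left. */
static Point3 move_frame(Point3 p, double dx, double dy, double dtheta)
{
    Point3 q;
    double c = cos(dtheta), s = sin(dtheta);

    q.x =  c * (p.x - dx) + s * (p.y - dy);
    q.y = -s * (p.x - dx) + c * (p.y - dy);
    q.z =  p.z;                     /* no pitch, roll, or lift */
    return q;
}

/* Predict the point's image column and row for a pinhole camera of
 * focal length f (in pixels), plus a half-width for the search box
 * that grows with the assumed position error epos (meters) and
 * heading error ehead (radians). */
void predict(Point3 p, double dx, double dy, double dtheta,
             double f, double epos, double ehead,
             double *col, double *row, double *halfwidth)
{
    Point3 q = move_frame(p, dx, dy, dtheta);

    *col = f * q.y / q.x;
    *row = f * q.z / q.x;
    *halfwidth = f * (epos / q.x + ehead);  /* crude first-order bound */
}
@end(example)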
@b(Prune.) When all points from a previous position have been reacquired at a new vehicle location and stereo-ranged, a pruning step looks for points that do not move rigidly with the rest. Such points have probably been tracked incorrectly, and can be deleted before the least-squares process that solves for vehicle motion. Activating the Prune constraint causes the predicted vehicle position to be included as one of the points in the rigidity test, perhaps weighting the selection toward the correctly matched points rather than a coincidentally consistent incorrect set.

@b(Motion Solution.) The motion solver determines the motion that minimizes the error between where points have been seen and where they should have been seen given that motion. The predicted vehicle position can be included as one of the points in this least-squares process, weighted more or less depending on the assumed precision of the prediction.
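
In the flat-ground case this minimization has a closed form: it is a weighted least-squares fit of a 2D rotation and translation to the point pairs. The sketch below shows the standard solution; @c(FIDO)'s actual solver differs in detail, and the names here are ours. The dead-reckoned prediction would enter simply as one more weighted pair.

@begin(example)
#include <math.h>

typedef struct { double x, y; } Pt;

/* Find the rotation theta and translation (tx, ty) minimizing the
 * weighted squared error in mapping points a[i] onto points b[i]. */
void solve_motion(Pt a[], Pt b[], double w[], int n,
                  double *theta, double *tx, double *ty)
{
    double W = 0, ax = 0, ay = 0, bx = 0, by = 0, dot = 0, crs = 0;
    int i;

    for (i = 0; i < n; i++) {       /* weighted centroids */
        W  += w[i];
        ax += w[i] * a[i].x;  ay += w[i] * a[i].y;
        bx += w[i] * b[i].x;  by += w[i] * b[i].y;
    }
    ax /= W;  ay /= W;  bx /= W;  by /= W;

    for (i = 0; i < n; i++) {       /* best rotation about centroids */
        double px = a[i].x - ax, py = a[i].y - ay;
        double qx = b[i].x - bx, qy = b[i].y - by;

        dot += w[i] * (px * qx + py * qy);
        crs += w[i] * (px * qy - py * qx);
    }
    *theta = atan2(crs, dot);
    *tx = bx - (cos(*theta) * ax - sin(*theta) * ay);
    *ty = by - (sin(*theta) * ax + cos(*theta) * ay);
}
@end(example)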
@subsection(Results)

We made several runs of the @c(FIDO) system on Neptune, with fairly consistent results. Data from June 24, 1984 was most extensively analyzed. On that run a single large obstacle was placed just 2 meters ahead of Neptune's cameras, with the destination set to the far side. It was a tough test for @c(FIDO), since getting around the obstacle and back on course required the maximum allowed turn (limited by the need for significant overlap in the views from successive positions) on each step.

We ran @c(FIDO) with each constraint in what we thought to be its best state, and saved images and dead reckoning information. Then we made a series of offline runs on the stored data, varying settings and watching the results. Several runs differed in only one parameter from the original; a few others changed two or three. The last group of runs began with one using none of the constraints, followed by a series each with only one constraint on. The figure below summarizes the results. The most important measure of a run's success is the program's calculated position at the end of the run: the nearer to the actual, manually measured position, the better.

Some cautionary notes are in order. The relative success of the run with only the horizontal hold constraint is accidental: during that run there were two steps where the motion solution was completely wrong, but the two errors by coincidence nearly offset each other. Many of the other single constraint runs that appear worse actually had only one wild miscalculation. Some of the all-but-one constraint runs also appear too good; in these cases the dead-reckoning information was often better than the visual tracking. The run with no vertical hold has a better final position than the run with no reacquire hold because, by luck, it tracked fewer points at the right times and relied on dead reckoning, while the latter placed too much reliance on small numbers of tracked points.

Based on our experiences, we make the following observations:

@begin(itemize)
Vertical Hold is the single most powerful constraint. Turning it off, with all the others on, significantly decreased the minimum and average number of features tracked and the accuracy of the motion solution. Turning it on, with all the others off, significantly increased the number of points tracked. In a sense this is not surprising, since the vertical hold constraint rules out 90% of the image, more than any other constraint.

No single constraint makes the difference between a successful and a catastrophic outcome.

In none of the runs was vision overall as accurate at calculating position as straight dead reckoning based on motor commands, though in the best runs vision determined the final heading more correctly.

It would have been better to fall back on the dead-reckoned position, rather than the vision-determined one, when the number of features tracked dropped below 6 or 7, rather than below 4, which was the threshold we used, at least for the ground roughness and mechanical accuracy in these experiments.

We noticed that even the best runs have about a 20% error in calculated position, always on the short side. We suspect a small camera calibration error, and possibly systematic errors in representing uncertainty. @c(FIDO) calculates a point's 3D location by projecting rays through the centers of the pixels in the stereo images, which gives a location on the near side of the range of uncertainty in distance.

There is a problem in using all the geometric constraints to cut down the search area, since that leaves no independent check for verification and pruning. If we had very accurate motion prediction, we would have to resort to photometry instead of geometry to identify points that had been occluded or otherwise lost.
@end(itemize)

@begin(figure)
@presspicture (file = "dist.press", height = 4.0 in)
@caption(Distance calculated as percentage of actual distance traveled)
@end(figure)

@section(Speed-up Methods)

@c(FIDO) now takes 30 to 40 seconds per step on a Vax 11/780 under Unix. To run in real time, we would have to reduce that to about 1 second per step. We have looked at several speed-up techniques, including faster processors, dedicated hardware, coding hacks, and parallel processing.

@parahead(Faster General Purpose Computers)

Our @c(VAX) is about a one-@c(MIP) (@c(M)illion @c(I)nstructions @c(P)er second) machine. It is technically possible to get the required speedup by simply obtaining a 30-@c(MIP) or faster computer. Budget and logistics leave this as a tantalizing future possibility.

@parahead(Commercial Array Processors)

Buying a commercial array processor is more feasible for us than buying a faster computer. About 90 percent of the runtime in @c(FIDO) occurs in image array operations and geometric calculations, particularly the convolutions in point matching. These are done by small pieces of code that work on large amounts of data, and are well suited to the pipelined vector arithmetic of available array processors. We estimate, for instance, that the 100 MFlop Star Technologies ST-100 could give us the desired factor of 30 speedup. We've made several serious attempts to acquire one, but no luck yet.

@parahead(Coding Optimizations)

Much effort has been expended on speeding up the Vax implementation, and we feel there is little room left for significant incremental improvements. The basic routines, such as the correlator and the interest operator, fit all the criteria for good optimization candidates @cite(Bentley): the code is fairly well understood, stable, small, and accounts for a large amount of run time. For instance, the implementation of the correlator uses the following coding techniques (a code sketch of the central trick appears after the list):
@begin(itemize)
The calculations of the parameters of the correlating window are done once, outside the main loop.

Sums and sums of squares for consecutive columns and rows are calculated by Price's technique @cite(Price). The next window total is calculated by adding in the total for the column that just entered the window and subtracting off the total for the column that just left it.

Squares are calculated by table lookup. Since the squares are of sums of two pixel values, the table needs only 511 entries.

Image windows are moved by pointer swapping, rather than by data transfers.

Loop indices count down to 0, since the @c(VAX) hardware has an efficient test-for-not-0-and-branch instruction.

Formulas are rewritten to eliminate extra calculations. For example,
@begin(format, above 0, below 0)
2 * @g(S)(img1 * img2) = @g(S)((img1 + img2)^2) - @g(S)(img1^2) - @g(S)(img2^2)
@end(format)
gives a way of calculating the sum of the products of the pixel values by additions (which are cheap) and squares (which can be done by table lookup) rather than multiplications. The individual sums are also used in other parts of the calculation, so in this case the sum of products comes for free.

Loop unrolling. The code in the innermost loop is written n times in line, rather than written once inside a loop that counts to n. This saves n increments of the counter and n tests for the end of the loop.

Register use. The most frequently used variables are located in hardware registers.
@end(itemize)

These programming techniques reduce the run time of the correlator from 140 ms per call for a straightforward implementation to 4 to 5 ms per call. Similar optimizations have been performed on the other tight loops, such as those in the interest operator and the image fine-to-coarse reduction routine. The user-level routines have been optimized to the point that the single routine using the most CPU time is now an image unpacker.
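
The square-table identity above translates into very little code. Here is a minimal sketch; the actual correlator maintains the sums incrementally per column, and the names are ours.

@begin(example)
/* 2*S(a*b) = S((a+b)^2) - S(a^2) - S(b^2): products become
 * table-lookup squares.  The table needs 511 entries because the
 * sum of two 8-bit pixels lies in 0..510. */
static long sqtab[511];

void init_sqtab(void)
{
    long i;

    for (i = 0; i < 511; i++)
        sqtab[i] = i * i;
}

/* Return twice the sum of products of two n-pixel windows. */
long sum_of_products_x2(unsigned char a[], unsigned char b[], int n)
{
    long sab2 = 0, sa2 = 0, sb2 = 0;
    int i;

    for (i = n; --i >= 0; ) {       /* count down: cheap end test */
        sab2 += sqtab[a[i] + b[i]];
        sa2  += sqtab[a[i]];
        sb2  += sqtab[b[i]];
    }
    return sab2 - sa2 - sb2;
}
@end(example)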
@parahead(Dedicated Hardware)

A dedicated microcomputer running @c(FIDO), with enough memory to store all the relevant images, offered some hope. We tried an implementation of the correlator on a 10-MHz MC68000 system, with all the images held in integer arrays. After eliminating all floating point operations, the resulting code still took 29 milliseconds per call to the correlator, compared with 4 to 5 on the @c(VAX).

@subsection(Parallelism)

There are several ways to break @c(FIDO) into separate processes that can run in parallel on different machines, including pipelining on macro or micro scales, or the use of a master/slave system.

@parahead(Macro Pipelining)

One process might do the image reductions, the next the reacquires, the next the matching, another the motion solving, and the last the path planning. This organization improves throughput but not latency: the time to process any one set of images is the same as on a single processor system.

@parahead(Micro Pipelining)

The processes could be subdivided more finely. For instance, one processor might do the first level of match for one point after another, handing its results to the process that does the next level of match, and so on. This approach requires huge communication bandwidth between processes.

@parahead(Master/Slave)

This method has one master process and several identical slave processes. Each slave handles every image processing task: reduction, matching, and the interest operator. At any time all the slaves work on the same task with different data. For example, during image reduction each slave reduces part of the image, and during matching each slave processes its own queue of points. The master process does tasks that require global knowledge, such as path planning or motion solving, and coordinates the slaves. This more flexible organization avoids several delays inherent in pipelines. As with micro pipelining and other multiprocessor schemes, however, its effectiveness depends on high bandwidth communication.

We implemented variants of this idea in our Ethernet-connected multi-Vax environment. Given the existing uniprocessor code, the task was not difficult. The slaves required new code for communication with the master, but the actual work is done by calls to the old image processing routines. The master contains the old path planning and display code, plus new communication code and dispatch tables to keep track of each slave's activities. When a slave completes a task, the master updates its dispatch table, finds a new task, and puts the slave to work again. For instance, during point matching each slave is initially given one point to correlate. When a slave finishes its correlation, the master hands it a new point to find. When all the points have been handed out, the master redundantly re-issues points that are still in process on other slaves, and accepts the first answer returned, giving some protection against overloaded or crashed processors.

A version of the system that used several @c(VAX)es in parallel was swamped, as expected, by the overhead of squeezing images between machines through the Ethernet. Another version, using multiple processes on a single Vax, gave us some idea of the performance that might be possible if faster communication, perhaps through shared memory, were available. The single machine version uses the same decomposition as the multiple machine version, and the same general-purpose interprocess communication package. Because of limitations in the communications package, each slave calculated its own image pyramid.
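
The master's dispatch loop for the point matching phase might be sketched as follows. Here @i(send_task) and @i(recv_result) stand in for the actual interprocess communication package, and the details are illustrative rather than a transcription of our code.

@begin(example)
#define MAXPOINTS 64                /* illustrative capacity */

enum state { TODO, IN_PROGRESS, DONE };

/* Stand-ins for the communication package (assumed here): */
extern void send_task(int slave, int point);
extern int  recv_result(int *point);    /* blocks; returns slave id */

/* Prefer a point not yet handed out; failing that, re-issue one
 * still in progress elsewhere (redundant handout); -1 when done. */
static int pick_task(enum state st[], int n)
{
    int p;

    for (p = 0; p < n; p++) if (st[p] == TODO) return p;
    for (p = 0; p < n; p++) if (st[p] == IN_PROGRESS) return p;
    return -1;
}

void match_points(int npoints, int nslaves)
{
    enum state st[MAXPOINTS];
    int done = 0, p, s;

    for (p = 0; p < npoints; p++) st[p] = TODO;

    for (s = 0; s < nslaves; s++)   /* prime every slave */
        if ((p = pick_task(st, npoints)) >= 0) {
            send_task(s, p);
            st[p] = IN_PROGRESS;
        }

    while (done < npoints) {
        s = recv_result(&p);        /* a slave reports point p */
        if (st[p] != DONE) { st[p] = DONE; done++; }  /* first answer wins */
        if ((p = pick_task(st, npoints)) >= 0) {
            send_task(s, p);        /* a fresh point, or a redundant one */
            st[p] = IN_PROGRESS;
        }
    }
}
@end(example)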
@subsection(Timings for a 28-Step Run)

Times are in seconds for the whole 28-step run.

@begin(format)
@tabdivide (3)
Uniprocessor@\978
One Slave
Master@\216
Slave 1@\@\626
Five Slaves
Master@\234
Slave 1@\@\403
Slave 2@\@\402
Slave 3@\@\403
Slave 4@\@\402
Slave 5@\@\400
@end(format)

Notes:

@begin(itemize)
The time for the master varies little with the number of slaves.

Without image acquisition or communication package overhead, the time for a single slave would be about 325 seconds, or 12 seconds per step.

Without image or communication overhead, and with the time for picture reduction shared evenly, the time for each of the five slaves would be 65 seconds, or about 2.5 seconds per step.

The work spreads very evenly among the slaves. With 5 slaves, the workload is balanced to within the accuracy of our measurements.

If the master process did not handle images, had zero-cost communication, and did not have to do coordinate transforms, it could run in 75 to 80 seconds, or about 3 seconds per step.

By comparison, the original uniprocessor system runs in 978 seconds, or 35 seconds per step. Even with the advantages assumed above (no image handling overhead) it would still have taken 503 seconds, or 18 seconds per step.
@end(itemize)

@subsection(Remarks)

Our experiments suggest that it is possible to decompose @c(FIDO) into a 5 to 10 fold parallel set of efficiently cooperating parts running on conventional processors. For maximum efficiency it would require certain facilities:

@begin(enumerate)
Shared main memory large enough to hold at least two image pyramids without swapping or data packing (2 * [256 + 64 + 16 + 4 + 1 + .25] Kilobytes, or about 700 Kilobytes).

Fast interprocess communication for small messages.

At least 5 processors. It takes 5 slave processors to bring the image processing time into the same range as the master process' time.

A device able to digitize images directly into the shared memory.
@end(enumerate)

@section(New Ideas)

Some simple hardware enhancements could improve @c(FIDO)'s performance. A @b(pan mechanism) for the stereo cameras would permit larger turns while still maintaining continuity of the field of view. @b(Motion and heading sensors) would improve navigational accuracy and eliminate some catastrophic misperceptions. There may also be software remedies for these problems.

@subsection(Camera Calibration)

Visual motion servoing as we do it depends on precise camera calibration and, to a lesser extent, on accurate steering; both properties are hard to arrange for and maintain in a trundling vehicle. We've thought about an adaptive approach that calibrates the cameras continuously from the images obtained on normal runs, and adjusts motor control parameters from observations of past vehicle motions. A simple technique like this was used successfully in an early program that drove the Stanford Cart in straight lines @cite(Moravec75).

A self-calibrating robot has interesting implications. To ensure that the adaptive parameters stay correctly adjusted across their entire range, it may be desirable to have the robot exercise its abilities even in the absence of a specific goal like getting to a destination. Such robot "play" would be educational for the developers of the robot as well as for the robot itself, since it would reveal failure modes that might not be apparent in the dull workaday world that most robots inhabit. We imagine a situation where the robot is allowed to frolic overnight. At the present stage of development, a few nights of robot frolics could double our experience with vision for mobile robots.