Stanford Artificial Intelligence Laboratory
April 1979

Visual Mapping by a Robot Rover

Hans P. Moravec
COMPUTER SCIENCE DEPARTMENT
Stanford University

Abstract

The Cart is a camera-equipped rover, computer controlled over a radio link. This report describes a working program that has successfully driven the cart through cluttered real world environments, and has deduced the cart's motion, and built a map of its surroundings, entirely from what the onboard TV camera saw. The program builds a sparse 3D map of its surroundings, but makes no attempt to classify the objects it sees, other than to label them as obstacles or non-obstacles. The method's success is largely due to high redundancy and cross checking at several levels. The program is now being extended into one that will drive the cart to a desired destination, avoiding any obstacles that are encountered.

\!picture((DSK:CART.PIC[PIC,HE]));
\!fig(1,(The Stanford AI Cart. It is a meter square and a meter and a half tall. It broadcasts a standard B/W television picture on UHF channel 23, and is remote controlled over a citizen's band frequency.));

Introduction

In 1977 we reported on work whose goal was a visually guided roving robot (called the Cart) able to go to a desired location in the real world, avoiding obstacles along the way [1]. The results at the time were discouraging; a major approach had just been abandoned as unworkable. Some new hardware and an entirely new program have made the situation far more cheery this year.

The vision portion of the project is essentially complete and working satisfactorily. The cart has been driven through a real environment (a large cluttered room) several times, passively observing the world through its TV camera. During these trips the control programs constructed a correct three dimensional map of the objects in the room, and of the cart's trajectories through them. A major key to success was the use of massive redundancy at several levels of the computation. Work is now underway on code that will use the map generated and kept updated by the visual system to plan an obstacle avoiding path to the desired destination, and drive the cart along it.

Constraints

The research was influenced by a pre-existing vehicle with fairly crude mechanical and electronic attributes, and a computer system of medium capability.

The cart (Figure 1) is a self contained, remote controlled, battery powered electric vehicle. It carries a TV camera and transmitter, and broadcasts a standard black and white television image on a UHF channel. The remote control link evolved over the years from a model airplane proportional controller, and now transmits six bits continuously with a time resolution of 30 ms. Two of the bits control the drive motors, permitting four states: Off, Forward, Reverse and Dynamic Braking. Two other bits command the steering motor: Motor Off, Steer Left, Steer Right and Steer to Straight Ahead. The steering speed is servoed, albeit crudely, and an approximate wheel angle can be achieved by centering the wheels with the straight ahead command, and then steering left or right for a precise amount of time. The errors incurred this way are in excess of 10 degrees.

The remaining two bits once controlled a motor that rotated the TV camera in a horizontal plane in a manner similar to the steering. This pan mechanism has been replaced by a stepping motor drive that slides the camera from side to side on a 52 cm track in precise 0.25 mm steps (Figure 2). A set of thumbwheel switches on the cart determines how many steps the camera moves with each right or left command. The number is currently 256, effectively dividing the track into eight parts and giving the camera nine stopping positions across it. These provide a highly redundant nine-eyed stereo baseline for distance measurements.

\!picture((DSK:CARPAN.PIC[PIC,HE]));
\!fig(2,(A closeup of the camera slider mechanism. The camera can translate sideways over a 52 cm range, in 0.25 mm steps.));
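For concreteness, the six bit control word can be pictured as three two-bit fields, one each for drive, steering and camera slider. The sketch below (modern Python, for illustration only) shows one plausible packing; the actual bit assignments and field ordering on the radio link are not given in this report, so the names and layout here are assumptions.

# One plausible packing of the six bit control word; the actual bit
# assignments on the radio link are not given in this report, so the
# field names and ordering here are illustrative assumptions only.

DRIVE  = {"off": 0b00, "forward": 0b01, "reverse": 0b10, "brake": 0b11}
STEER  = {"off": 0b00, "left": 0b01, "right": 0b10, "center": 0b11}
CAMERA = {"hold": 0b00, "step_left": 0b01, "step_right": 0b10}

def command_word(drive, steer, camera):
    """Pack one 30 ms control frame: two drive bits, two steering bits,
    and two bits for the camera slider."""
    return DRIVE[drive] | (STEER[steer] << 2) | (CAMERA[camera] << 4)

# For example, driving forward while steering toward straight ahead:
assert command_word("forward", "center", "hold") == 0b001101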
The only data transmitted from the cart is the camera's TV image. There are no wheel position sensors, touch or force detectors, or range finders, nor any means for sending data from such. It was decided early in this research, partly because of funding limitations, that while such devices would be an asset in a real application such as a Mars rover, they were not essential in an experimental vehicle whose purpose was the development of vision software. It is obvious that some of the problems encountered would have had simpler or more reliable solutions had other sensors been available. On the positive side, an independent visual navigation system will likely be a desirable redundant backup on a camera equipped vehicle that navigates primarily by other means.

Under construction is an onboard microprocessor system for the cart that will allow bi-directional, error checked communication, and also the loading and execution of simple onboard control and servoing programs. The obstacle avoiding work described here will be completed without benefit of this new hardware.

\!picture((DSK:CARSPO.PIC[PIC,HE]));
\!fig(3,(The cart in its calibration posture before the calibration pattern. A program automatically locates the cross and the spots, and deduces the camera's focal length and distortion.));

\!graphic((DSK:SPOT.GOD));
\!fig(4,(Results of the calibration program. The distortion polynomial it produced has been used to map an undistorted grid of ideal spot positions into the calculated real world ones. The result is superimposed on the original digitized spot image to make any discrepancies obvious.));

Overview

From the outside, the cart runs completed so far look as follows. The cart is placed 3 meters in front of, and pointed towards, a 3 meter square pattern of dots on a wall (Figure 3). A program that digitizes the image, automatically locates the spots, and determines the camera's focal length and distortions is run (Figure 4). The cart is then driven to the starting location of a clear path in a cluttered space, and a navigating program is started. This reads in the camera calibration file produced by the first program (calibrations actually need to be done only occasionally), and takes control of the cart.

At first the camera slides to the left on its track, until it encounters its left limit. Nothing mechanical happens for a few seconds while an image is digitized. The camera then slides one increment of 256 steps, 6.4 cm to the right, digitizes another image, and so on, for a total of nine images. Afterward the cart sits immobile for about ten realtime minutes while our timeshared KL-10 works on the pictures for 180 compute seconds. Then it lurches about a meter forward, stops, and repeats the camera motion.

As the cart moves slowly forward, meter by meter, the program generates and updates a three dimensional world model of the objects it has seen, and deduces the vehicle's motion through them. This model is displayed in map form. Initially the map is blank; as time (and the cart) goes by, more and more points are added. The cart's trajectory is also plotted (Figures 10 and 11). This concludes the description of the existing program's behavior.

When the last stage of the work is completed the activity will be slightly augmented. The cart will be told its destination at the start of the run, and at each stop will plan a path to the goal that avoids all the obstacles perceived so far. Each meter-long lurch will begin with a steering motion to keep the cart on that path.

\!picture((DSK:SEPT.14));
\!fig(5,(A view of the room mapped in the sample run.));
Digitizing

Digitizing images over a sometimes noisy radio link has its problems. Though obtaining adequate images is a crucial part of the cart program, most of the techniques used solve problems peculiar to our system, and are of limited general interest. We summarize them here.

The picture is transmitted on a frequency of 524 MHz, a wavelength of 57 cm. Reflections from nearby objects sometimes cancel the signal at a receiver. The cart program is able to get its picture through any one of a number of TV receivers (up to 4). It measures the scanline to scanline noise on each in turn, and then picks the one with the lowest noise. In the future we may simply switch antennas on a single receiver.

Our decade old digitizer extracts a picture 280 samples wide by 240 high by 4 bits per sample from broadcast standard video. Four bits is only sixteen grey levels, and gives a poor quality image. Also, the noise level over the link is sometimes high, and the picture has a lot of "snow". These problems can be alleviated by adding up or averaging a number of frames, to increase the number of bits and average out the noise. But this introduces problems of its own. The sync detector on the digitizer sometimes fails, especially when the signal is noisy, and the picture jitters or "rolls". Adding together pictures that have rolled gives us a big blur. The problem has been solved by a program that digitizes a large number of frames (about 30) and, by an efficient sampling of the pixels, intercompares them to find the subset of pictures that are most nearly alike. These are then combined. Since most images have not suffered sync loss, the result is almost always a good six bit picture. New digitizing hardware that the lab is in the process of ordering may soon make these gyrations obsolete.

Picture Reduction

The cart vision programs make extensive use of reduced versions of the input images. Digitized pictures are compressed repeatedly by linear factors of two, four pixels being added to become one, until the whole image has shrunk to a single pixel. The original image and all of its reductions are available to all subsequent processing steps.
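A minimal sketch of this reduction, in modern Python with numpy for illustration (the original programs were not, of course, written in this language), might read:

import numpy as np

def reduction_pyramid(image):
    """Build the sequence of reductions used by the vision code: each
    level combines 2x2 blocks of the previous one (the report adds the
    four pixels; averaging here keeps the grey scale range).  Halving
    continues while both dimensions remain even."""
    levels = [image.astype(float)]
    while all(n > 1 and n % 2 == 0 for n in levels[-1].shape):
        a = levels[-1]
        levels.append((a[0::2, 0::2] + a[0::2, 1::2] +
                       a[1::2, 0::2] + a[1::2, 1::2]) / 4.0)
    return levels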
Calibration

The cart camera, like most vidicons, has peculiar geometric properties. Its precision has been enhanced by an automatic focal length and distortion determining program. The cart is parked a precise distance in front of a wall bearing many spots and one cross (Figure 3). The program digitizes an image of the spot array, locates the spots and the cross, and constructs a two dimensional polynomial that relates the position of the spots in the image to their position in an ideal unity focal length camera, and another polynomial that converts points from the ideal camera to points in the image. These polynomials are used to correct the positions of perceived objects in later scenes. The program tolerates a wide range of spot parameters (about 3 to 12 spots across) and arbitrary image rotation, and is very robust. After being intensely fiddled with to work successfully on an initial set of 20 widely varying images, it has worked without error on 50 successive images.

The test pattern for the cart is a 3 meter square painted on a wall, with 5 cm spots at 30 cm intervals. The program has also been used successfully with a small array (22 by 28 cm) to calibrate cameras other than the cart's.

The algorithm reads in an image of such an array, and begins by determining its approximate spacing and orientation. It trims the picture to make it square, reduces it by averaging to 64 by 64, calculates the Fourier transform of the reduced image, and takes its power spectrum, arriving at a 2D transform symmetric about the origin, with strong peaks at frequencies corresponding to the horizontal and vertical spacing and half the diagonal spacing, and weaker peaks at the harmonics. It multiplies each point [i,j] in this transform by point [-j,i] and points [j-i,j+i] and [i+j,j-i], effectively folding the primary peaks onto one another. The strongest peak in the 90 degree wedge around the y axis gives the spacing and orientation information needed by the next part of the process.

The directional variance interest operator described later is applied to roughly locate a spot near the center of the image. A special operator examines a window surrounding this position, generates a histogram of the intensity values within the window, decides on a threshold for separating the black spot from the white background, and calculates the centroid and first and second moments of the spot. This operator is applied again at a displacement from the first centroid indicated by the orientation and spacing of the grid, and so on, the region of found spots growing outward from the seed.

A template for the expected appearance of the cross in the middle of the array is constructed from the orientation and spacing determined by the Fourier transform step. Each of the found spots is matched against this cross template, and the closest match in the central portion of the picture is declared to be the origin. Least squares polynomials of about third degree in two variables, relating the actual positions of the spots to the ideal positions, are then generated and written into a file.
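The final fitting step is an ordinary linear least squares problem, one polynomial per output co-ordinate. The following is a hypothetical reconstruction (the function names and the numpy machinery are modern conveniences, not the original code):

import numpy as np

def monomials(xy, degree=3):
    """Design matrix of all monomials x^i * y^j with i + j <= degree."""
    x, y = xy[:, 0], xy[:, 1]
    return np.stack([x**i * y**j
                     for i in range(degree + 1)
                     for j in range(degree + 1 - i)], axis=1)

def fit_distortion_poly(observed_xy, ideal_xy, degree=3):
    """Least squares fit of two-variable polynomials (one per output
    co-ordinate) taking observed spot positions to their positions in an
    ideal unity focal length camera.  The reverse map is fit the same
    way with the arguments exchanged."""
    coeffs, *_ = np.linalg.lstsq(monomials(observed_xy, degree),
                                 ideal_xy, rcond=None)
    return coeffs    # apply with: monomials(points, degree) @ coeffs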
Choosing Features

\!graphic((DSK:INTOP.GOD));
\!fig(6,(A cart's eye view of the room from the starting position of the mapping run, and the features picked out by the interest operator. They are labelled in order of decreasing interest measure.));

The cart vision code deals with very simple primitive entities: localized regions called features. A feature is conceptually a point in the three dimensional world, but it is found by examining localities larger than points in pictures. A feature is good if it can be located unambiguously in different views of a scene. A uniformly colored region or a simple edge does not make a good feature, because its parts are indistinguishable. Regions with high contrast in orthogonal directions, such as corners, are best.

New features in images are picked by a subroutine called the interest operator, an example of whose operation is displayed in Figure 6. It tries to select a relatively uniform scattering of features, to maximize the probability that a few will be picked on every visible object, and it tries to choose areas that can easily be found in other images. Both goals are achieved by returning regions that are local maxima of a directional variance measure.

Directional variance is measured over small square windows. Sums of squares of differences of pixels adjacent in each of four directions (horizontal, vertical and two diagonals) over each window are calculated, and the window's interest measure is the minimum of these four sums. Features are chosen where the interest measure has local maxima; the feature is conceptually the point at the center of the window with this locally maximal value. The effects of noise are alleviated, and the processing time is shortened, by applying the operator to a reduced image. In the current program the original images are 240 lines high by 256 pixels wide, and the interest operator is applied to the 120 by 128 version, on windows 3 pixels square.

Once a feature is chosen, its appearance is recorded as a series of excerpts from the reduced image sequence. A window (6 by 6 in the current implementation) is excised around the feature's location from each of the variously reduced pictures. Only a tiny fraction of the area of the original (unreduced) image is extracted; four times as much of the x2 reduced image is stored, sixteen times as much of the x4 reduction, and so on, until at some level we have the whole image. The final result is a series of 6 by 6 pictures, beginning with a very blurry rendition of the whole scene and gradually zooming, in linear expansions of two, to a sharp closeup of the feature. Of course, this records the appearance correctly from only one point of view.
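In outline, the interest measure can be sketched as follows (an illustrative modern reconstruction; the border handling and the local maximum selection are simplified assumptions):

import numpy as np

def interest_measure(img, w=3):
    """Directional variance over w x w windows: sum squared differences
    of adjacent pixels in four directions, and take the minimum of the
    four sums as the window's measure.  Features are then chosen at
    local maxima of the returned array."""
    img = img.astype(float)
    diffs = [
        (img[:, 1:] - img[:, :-1]) ** 2,         # horizontal neighbours
        (img[1:, :] - img[:-1, :]) ** 2,         # vertical
        (img[1:, 1:] - img[:-1, :-1]) ** 2,      # diagonal
        (img[1:, :-1] - img[:-1, 1:]) ** 2,      # anti-diagonal
    ]
    def window_sum(a):
        # sum over all w x w windows via a cumulative sum table
        c = np.pad(a.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        return c[w:, w:] - c[:-w, w:] - c[w:, :-w] + c[:-w, :-w]
    sums = [window_sum(a) for a in diffs]
    h = min(s.shape[0] for s in sums)
    v = min(s.shape[1] for s in sums)
    return np.minimum.reduce([s[:h, :v] for s in sums])

Taking the minimum of the four directional sums is what rejects simple edges: an edge has high variance across itself but almost none along itself, so one of the four sums stays small.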
Finding Features Again

Deducing the 3D location of features from their projections in 2D images requires that we know their positions in two or more such images. The correlator is a subroutine that, given a description of a feature as produced by the interest operator from one image, finds the best match in a different, but similar, image. Its search area can be the entire new picture, or a rectangular sub-window.

\!grasml((DSK:CORLS.GOD));
\!fig(7,(Areas searched for in a successive refinement correlation. The largest rectangle begins the process.));

\!grasml((DSK:CORRS.GOD));
\!fig(8,(Best matches to the areas outlined in Fig. 7, found by the correlator in a different image.));

The search uses a coarse to fine strategy, illustrated in Figures 7 and 8, that begins in reduced versions of the pictures. Typically the first step takes place at the x16 (linear) reduction level. The 6 by 6 window at that level in the feature description, which covers about one seventh of the total area of the original picture, is convolved with the search area in the correspondingly reduced version of the second picture. The 6 by 6 description patch is moved pixel by pixel over the approximately 15 by 16 destination picture, and a correlation coefficient is calculated for each trial position. The position with the best match is recorded. The 6 by 6 area it occupies in the second picture is mapped to the x8 reduction level, where the corresponding region is 12 pixels by 12. The 6 by 6 window in the x8 reduced level of the feature description is then convolved with this 12 by 12 area, and the position of best match is recorded and used as the search area for the x4 level. The process continues, matching smaller and smaller, but more and more detailed, windows until a 6 by 6 area is selected in the unreduced picture. The work at each level is about the same: finding a 6 by 6 window in a 12 by 12 search area, which involves 49 summations of 36 quantities. In our example there were 5 such levels.

The correlation measure used is
$$\frac{2\sum ab}{\sum a^2 + \sum b^2}$$
where $a$ and $b$ are the values of pixels in the two windows being compared, with the mean of each window subtracted out, and the sums are taken over the 36 elements of a 6 by 6 window. The measure has limited tolerance to contrast differences. The window sizes and other parameters are sometimes different from the ones used in this example.

Slider Stereo

At each pause on its computer controlled itinerary the cart slides its camera from left to right on the 52 cm track, taking 9 pictures at precise 6.4 cm intervals. Points are chosen in the fifth (middle) of these 9 images, either by the correlator, to match features from previous positions, or by the interest operator. The camera slides parallel to the horizontal axis of the (distortion corrected) camera co-ordinate system, so the parallax induced apparent displacement of features from frame to frame in the 9 pictures is purely in the X direction.

The correlator looks for the points chosen in the central image in each of the eight other pictures. The search is restricted to a narrow horizontal band. This has little effect on the computation time, but it reduces the probability of incorrect matches.

For correct matches, the distance to the feature is inversely proportional to its displacement from one image to another. The uncertainty in such a measurement is the difference in distance that a shift of one pixel in the image would make. This uncertainty varies inversely with the physical separation of the camera positions where the pictures were taken (the stereo baseline); long baselines give more accurate distance measurements.

After the correlation step the program knows a feature's position in nine images. It considers each of the 36 possible image pairings as a stereo baseline, and records the estimated distance to the feature (actually the inverse distance) in a histogram. Each measurement adds a little normal curve to the histogram, with its mean at the estimated distance and its standard deviation inversely proportional to the baseline, reflecting the uncertainty. The area under each curve is made proportional to the product of the correlation coefficients of the matches in the two images (in the central image this coefficient is taken as unity), reflecting the confidence that the correlations were correct. The area is also scaled by the normalized dot products of the X axis with the shifts of the feature, in each of the two baseline images, from the central image; that is, a distance measurement is penalized if there is significant motion of the feature in the Y direction.

The distance to the feature is indicated by the largest peak in the resulting histogram, if this peak is above a certain threshold. If it is below, the feature is forgotten.

The correlator frequently matches features incorrectly. The distance measurements from incorrect matches in different pictures are usually inconsistent. When the normal curves from the 36 picture pairs are added up, the correct matches agree with each other and build up a large peak in the histogram, while incorrect matches spread themselves more thinly. Two or three correct correlations out of the eight will usually build a peak sufficient to offset a larger number of errors.

\!graphw((DSK:DIST4.GOD));
\!fig(9,(A typical nine-eyed stereo feature ranging. The pictures around the periphery are the nine images acquired from a slider scan. The feature being sought is marked by an asterisk in the central image. Its locations in the other eight images, as determined by the correlator, are marked by X's. Each image pair is used to calculate a probable distance to the feature. The range so determined is recorded as a normal curve whose area is proportional to the quality of the correlations of the pair used, and whose standard deviation is the uncertainty in the ranging, based on the one pixel resolution of the correlation. Picture pairs from long baselines have a smaller deviation, and hence a narrower and taller curve, than short baseline pairs. The curves from the 36 possible pairings are summed, and the peak in the resulting curve gives the distance. In the graph, the inverse distance scale is graduated in meters, the individual normal curves are drawn in plain lines, and their sum, scaled to fit the diagram, is the beaded curve. It has a peak at 7.1 meters. In the picture the ranged object is the top of a box.));
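The pairing-and-histogram logic can be sketched as follows (an illustrative reconstruction in Python: the focal length parameter, the fixed histogram grid, and the omission of the acceptance threshold and the Y-motion penalty are all simplifying assumptions):

import numpy as np

def slider_range(u, conf, base, f_pixels, n_bins=512, inv_max=1.0):
    """u[i]: image column of the feature in picture i (i = 0..8);
    conf[i]: correlation coefficient of the match there (1.0 for the
    central image); base[i]: camera position on the track, meters;
    f_pixels: focal length in pixel units (an assumed calibration
    constant).  Returns the distance at the histogram peak."""
    grid = np.linspace(1e-6, inv_max, n_bins)    # inverse distance, 1/meters
    hist = np.zeros(n_bins)
    n = len(u)
    for i in range(n):
        for j in range(i + 1, n):                # the 36 pairings
            B = abs(base[j] - base[i])           # stereo baseline
            inv_z = abs(u[j] - u[i]) / (f_pixels * B)
            sigma = 1.0 / (f_pixels * B)         # one pixel shift uncertainty
            area = conf[i] * conf[j]             # confidence of this pairing
            hist += area * np.exp(-0.5 * ((grid - inv_z) / sigma) ** 2)
    peak = grid[np.argmax(hist)]
    return 1.0 / peak

Working in inverse distance is what makes the normal curves a reasonable model: a one pixel disparity error produces a fixed-width error in inverse distance, whereas in distance itself the error grows quadratically with range.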
In this way eight applications of a mildly reliable operator interact to make a very reliable distance measurement. Figure 9 shows a typical ranging. The small curves are measurements from individual picture pairs; the beaded curve is the final histogram.

Motion Stereo

The cart navigates exclusively by vision, deducing its own motion from the apparent 3D shift of the features around it. After having determined the 3D locations of objects at one position, the computer drives the cart about a meter forward. At the new position it slides the camera and takes nine pictures. The correlator is applied in an attempt to find all the features successfully located at the previous position. Feature descriptions extracted from the central image at the last position are searched for in the central image at the new stopping place. Slider stereo then determines the distance of the features so found from the cart's new position. The program now knows the 3D positions of the features relative to its camera at both the old and the new locations. It can figure out its own movement by finding the 3D co-ordinate transform that relates the two.

There can be mis-matches in the correlations between the central images at the two positions and, in spite of the eight way redundancy, the slider distance measurements are sometimes in error. Before the cart motion is deduced, the feature positions are checked for consistency. Although the program does not yet have the co-ordinate transform between the old and new camera systems, it knows that the distance between any pair of features should be the same in both. It makes a matrix in which element [i,j] is the absolute value of the difference in the distances between points i and j in the first and second co-ordinate systems, divided by the expected error (based on the one pixel uncertainty of the ranging). Each row of this matrix is summed, giving an indication of how much each point disagrees with the other points. The idea is that while points in error disagree with virtually all points, correct positions agree with all the other correct ones, and disagree only with the bad ones. The worst point is deleted, and its effect is removed from the row sums of the remaining points. This pruning is repeated until the worst remaining error is within the error expected from the ranging uncertainty.

After the pruning, the program has a number of points, typically 10 to 20, whose position errors are small and fairly well characterized. The program trusts these, and records them in its world model, unless it had already done so at a previous position. The pruned points are forgotten forevermore.
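The pruning can be sketched as follows (an illustrative reconstruction: the stopping rule compares the worst point's average normalized disagreement against a unit threshold, and the matrix is recomputed each pass rather than incrementally updated as in the actual program):

import numpy as np

def prune_inconsistent(p1, p2, err, threshold=1.0):
    """p1[i], p2[i]: 3D position of feature i seen from the old and the
    new cart position; err[i]: expected ranging error for feature i.
    Repeatedly delete the feature whose inter-point distances disagree
    most between the two co-ordinate systems.  Returns indices kept."""
    keep = list(range(len(p1)))
    while len(keep) > 2:
        idx = np.array(keep)
        d1 = np.linalg.norm(p1[idx, None] - p1[None, idx], axis=-1)
        d2 = np.linalg.norm(p2[idx, None] - p2[None, idx], axis=-1)
        e = err[idx, None] + err[None, idx]      # combined expected error
        rows = (np.abs(d1 - d2) / e).sum(axis=1)
        worst = int(rows.argmax())
        if rows[worst] / (len(keep) - 1) <= threshold:
            break                                # worst error is acceptable
        del keep[worst]
    return keep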
Now comes the co-ordinate transform determining step. We need to find a three dimensional rotation and translation that, when applied to the co-ordinates of the features at the first position, minimizes the sum of the squares of the distances between the transformed first co-ordinates and the raw co-ordinates of the corresponding points at the second position. Actually the quantity minimized is the foregoing sum with each term divided by the square of the uncertainty in the 3D positions of the points involved, as deduced from the one pixel shift rule. This weighting does not make the solution more difficult.

\!fig(10,(Overhead view of the cart's world model from position to position in the sample run, as deduced by the program purely from the perceived images. The world model starts out empty, and gets gradually more cluttered as the program drives the cart forward and acquires and confirms new features. The grid cells are two meters on a side. The cart's position is indicated by the dark square. Each stopping point on its path is marked by a dash (note that the cart failed to move between the seventh and the eighth position; this was due to a bug in the controlling routine, and was detected by the visual mapper). The two lines radiating from the cart are the horizontal limits of the camera's field of view. Each feature is marked by a circle that approximates the range of uncertainty in the projection of its position on the ground. The 45 degree line growing to the upper right out of each circle indicates the feature's height above the floor, in the same scale as the grid. Features whose height is less than or equal to their position uncertainty are assumed to be marks on the ground. The pictures below each map show the cart's view from that position. Each is the central image of a nine-eyed set.));

\!grabig((DSK:PLAN.14));
\!fig(11,(The world map at the final position. The graph on the left (note the calibration in centimeters) is the program's determination of the cart's height above the ground. Since the cart didn't actually bounce up and down, it provides a measure of the ultimate accuracy of the self model. The long vertical line through the circle of uncertainty around each feature position is the distance uncertainty, which is larger than the average uncertainty shown by the circle. The picture is the central view from the final position.));

The error expression is expanded. It becomes a function of the rotation and translation, with parameters that are the weighted averages of the x, y and z co-ordinates of the features at the two positions, and averages of their various cross-products. These averages need to be determined only once, at the beginning of the transform finding process.

To minimize the error expression, its partial derivative with respect to each variable is set to zero. The three linear equations arising from the translation components are easily solved simultaneously, giving the optimal offset values for a general rotation. This yields symbolic expressions (linear combinations of the rotation matrix coefficients) for each of the three vector components. Substituting these values into the error expression makes it a function of the rotation alone. This new, translation determined, error expression is used in all the subsequent steps.

Minimizing the error expression under rotation is surprisingly difficult, mainly because of the non-linear constraints on the 3D rotation matrix. The next six paragraphs outline the struggle.
Each step was forced by the inadequacies of the previous one.

The program begins by ignoring the non-linearities. It solves for the general 3D linear transformation, the nine elements of a matrix, that minimizes the least squares error. The derivatives of the error expression with respect to each of the matrix coefficients are equated to zero, and the nine resulting simultaneous linear equations are solved for the nine coefficients. If the points had undergone an error-free rigid rotation and translation between the two positions, the result would be the desired rotation matrix, and the problem would be solved. Because there are errors in the determined positions of the features, the resulting matrix is usually not simply a rotation, but involves stretching and skewing. The program ortho-normalizes the matrix. If the position errors were sufficiently small, this new matrix would be our answer.

The errors are high enough to warrant adding the rigid rotation constraints to the least squares minimization. The error expression is converted from a linear expression in nine matrix coefficients into an unavoidably non-linear function in three parameters that uniquely characterize a rotation. This new error expression is differentiated with respect to each of the three rotation parameters, and the resulting expressions are equated to zero, giving three non-linear equations in three unknowns. A strenuous attempt at an analytic solution of this simultaneous non-linear system failed, so the program contains code to solve the problem iteratively, by Newton's method. The rotation expressed by the ortho-normalized matrix from the previous step becomes the initial approximation.

Newton's method for a multi-variate system involves finding the partial derivative of each expression whose root is sought with respect to each variable. In our case there are three variables and three equations, and consequently nine such derivatives. The nine derivatives, each a closed form expression in the rotation variables, are the coefficients of a 3 by 3 Jacobian matrix that characterizes the first order changes of the root expressions with changes in the parameters. The next Newton's method approximation is found by multiplying the inverse of this matrix by the values of the root expressions (which will be 0 at the root), and subtracting the resulting corrections from the parameter values of the previous approximation.

Four or five iterations usually bring the parameters to within our floating point accuracy of the correct values. Occasionally, when the errors in the determined feature locations are high, the process does not converge. The program detects this by noting the change in the original error expression from iteration to iteration. In case of non-convergence, the program picks a random rotation as a new starting point and tries again. It is willing to try up to several hundred times. The rotation with the smallest error expression encountered during such a search (including the initial approximation) is returned as the answer.

Since the summations over the co-ordinate cross-products are done once and for all at the beginning of the transformation determination, each iteration, involving the evaluation of about a dozen moderately large expressions and a 3 by 3 matrix inversion, is relatively fast. The whole solving process, even in cases of pathological non-convergence, takes one or two seconds of computer time.
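The first two steps, the unconstrained linear solve and the ortho-normalization, can be sketched as follows. The report does not name its ortho-normalization method; the SVD projection below is one standard choice, used here as an assumption, and the Newton refinement stage is omitted:

import numpy as np

def initial_rotation_estimate(p1, p2, w):
    """Fit the general 3x3 linear map (plus translation) minimizing the
    weighted squared error, then force the result to a proper rotation.
    p1, p2: N x 3 feature positions in the old and new frames;
    w[i] = 1/U_i^2 are the weights.  Returns (R, t) as a starting point
    for the Newton iteration, with p1 approximated by p2 @ R + t."""
    w = w / w.sum()                              # normalize the weights
    c1 = w @ p1                                  # weighted centroids
    c2 = w @ p2
    q1, q2 = p1 - c1, p2 - c2                    # centered co-ordinates
    A = (w[:, None, None] * q2[:, :, None] * q2[:, None, :]).sum(0)
    B = (w[:, None, None] * q2[:, :, None] * q1[:, None, :]).sum(0)
    M = np.linalg.solve(A, B)                    # unconstrained best fit
    U, _, Vt = np.linalg.svd(M)                  # project onto the rotations
    R = U @ np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt
    t = c1 - c2 @ R                              # optimal offset for this R
    return R, t

Centering the co-ordinates on the weighted centroids plays the same role as the back substitution of the translation in the text: it removes the offset from the problem, leaving only the rotation to be found.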
Motion Stereo Mathematics

Going over the previous ground in gory detail, the least squares expression to be minimized is
$$\mathrm{Error} \;=\; \sum_{i=1}^{N} \frac{(X_{1,i}-X'_{2,i})^2 + (Y_{1,i}-Y'_{2,i})^2 + (Z_{1,i}-Z'_{2,i})^2}{U_i^2}$$
where $X_{1,i}$, $Y_{1,i}$ and $Z_{1,i}$ are the co-ordinates of the features as observed from the first cart position, and $X_{2,i}$, $Y_{2,i}$ and $Z_{2,i}$ are the co-ordinates of the points as seen from the second position. $X'_{2,i}$, $Y'_{2,i}$ and $Z'_{2,i}$ are the points from the second position subjected to a co-ordinate transformation in the attempt to make them match the first set. $U_i$ is the combined uncertainty in the position of feature $i$ in both frames. To get the most out of our data, we need to weight features in proportion to their accuracy. To simplify the subsequent formulas, the $U_i$ are normalized to make the sum of their inverse squares unity.

For a general linear transformation, the primed values are defined by
$$[X'_{2,i}\;\;Y'_{2,i}\;\;Z'_{2,i}] \;=\; [X_{2,i}\;\;Y_{2,i}\;\;Z_{2,i}]
\begin{bmatrix} AA & AB & AC \\ BA & BB & BC \\ CA & CB & CC \end{bmatrix}
\;+\; [X_c\;\;Y_c\;\;Z_c]$$
where $AA$ through $CC$ form the "rotation" part of the transform and $X_c$ through $Z_c$ the translation.

Error can be expanded out, and the summation distributed over the terms, giving
$$\begin{aligned}
\mathrm{Error} \;=\;& X_c^2+Y_c^2+Z_c^2 + \overline{X_1^2}+\overline{Y_1^2}+\overline{Z_1^2}\\
&+ (AA^2+AB^2+AC^2)\,\overline{X_2^2} + (BA^2+BB^2+BC^2)\,\overline{Y_2^2} + (CA^2+CB^2+CC^2)\,\overline{Z_2^2}\\
&- 2\,\Big( AA\,\overline{X_1X_2} + AB\,\overline{Y_1X_2} + AC\,\overline{Z_1X_2}\\
&\qquad + BA\,\overline{X_1Y_2} + BB\,\overline{Y_1Y_2} + BC\,\overline{Z_1Y_2}\\
&\qquad + CA\,\overline{X_1Z_2} + CB\,\overline{Y_1Z_2} + CC\,\overline{Z_1Z_2}\\
&\qquad - (AA\,BA+AB\,BB+AC\,BC)\,\overline{X_2Y_2} - (AA\,CA+AB\,CB+AC\,CC)\,\overline{X_2Z_2}\\
&\qquad - (BA\,CA+BB\,CB+BC\,CC)\,\overline{Y_2Z_2}\\
&\qquad + (\overline{X_1}-AA\,\overline{X_2}-BA\,\overline{Y_2}-CA\,\overline{Z_2})\,X_c\\
&\qquad + (\overline{Y_1}-AB\,\overline{X_2}-BB\,\overline{Y_2}-CB\,\overline{Z_2})\,Y_c\\
&\qquad + (\overline{Z_1}-AC\,\overline{X_2}-BC\,\overline{Y_2}-CC\,\overline{Z_2})\,Z_c \Big)
\end{aligned}$$
where $\overline{X_1}$ denotes $\sum_i X_{1,i}/U_i^2$, $\overline{X_1Z_2}$ denotes $\sum_i X_{1,i}Z_{2,i}/U_i^2$, and similarly for the other barred quantities. With the expression in this form, the barred quantities can be evaluated once and for all at the start of the solving.

Error can be differentiated partially with respect to $X_c$, $Y_c$ and $Z_c$. If the resulting linear expressions are equated to zero and solved, we obtain
$$\begin{aligned}
X_c &= \overline{X_1} - AA\,\overline{X_2} - BA\,\overline{Y_2} - CA\,\overline{Z_2}\\
Y_c &= \overline{Y_1} - AB\,\overline{X_2} - BB\,\overline{Y_2} - CB\,\overline{Z_2}\\
Z_c &= \overline{Z_1} - AC\,\overline{X_2} - BC\,\overline{Y_2} - CC\,\overline{Z_2}
\end{aligned}$$
These are the components of the translation vector that minimizes Error for a general "rotation" matrix. They can be back substituted into Error, making it a function of the rotation alone.
$$\begin{aligned}
\mathrm{Error} \;=\;& \overline{X_1^2}+\overline{Y_1^2}+\overline{Z_1^2}\\
&+ (AA^2+AB^2+AC^2)\,\overline{X_2^2} + (BA^2+BB^2+BC^2)\,\overline{Y_2^2} + (CA^2+CB^2+CC^2)\,\overline{Z_2^2}\\
&- 2\,\Big( AA\,\overline{X_1X_2} + AB\,\overline{Y_1X_2} + AC\,\overline{Z_1X_2}\\
&\qquad + BA\,\overline{X_1Y_2} + BB\,\overline{Y_1Y_2} + BC\,\overline{Z_1Y_2}\\
&\qquad + CA\,\overline{X_1Z_2} + CB\,\overline{Y_1Z_2} + CC\,\overline{Z_1Z_2}\\
&\qquad - (AA\,BA+AB\,BB+AC\,BC)\,\overline{X_2Y_2} - (AA\,CA+AB\,CB+AC\,CC)\,\overline{X_2Z_2}\\
&\qquad - (BA\,CA+BB\,CB+BC\,CC)\,\overline{Y_2Z_2} \Big)\\
&- (\overline{X_1}-AA\,\overline{X_2}-BA\,\overline{Y_2}-CA\,\overline{Z_2})^2\\
&- (\overline{Y_1}-AB\,\overline{X_2}-BB\,\overline{Y_2}-CB\,\overline{Z_2})^2\\
&- (\overline{Z_1}-AC\,\overline{X_2}-BC\,\overline{Y_2}-CC\,\overline{Z_2})^2
\end{aligned}$$

The linear first approximation is now found by partially differentiating this new Error with respect to $AA$ through $CC$, setting these derivatives to zero, and simultaneously solving the system of nine equations (which happen to be linear) so obtained for $AA$ through $CC$.

Now we get to the hard part. The matrix obtained in the preceding step is a general 3 by 3 transform, not necessarily a true rotation. A true 3D rotation has only three free parameters, not nine. However these three parameters are chosen, some or all of the elements of a rotation matrix characterizing the same rotation will be non-linear functions of the three parameters. We have chosen to represent rotations by a three element subset of a quaternion. A quaternion is a four element parameterization of a rotation; any of the four elements may be derived from the other three, since the sum of their squares is unity.

Imagine an arbitrary axis through the origin, described by its angles $\alpha$, $\beta$ and $\gamma$ with the co-ordinate axes. A general rotation can be specified as a twist through an angle $\theta$ around such an axis. The quaternion for this rotation is the four element list
$$[P_0,\;P_x,\;P_y,\;P_z] \;=\; [\cos\tfrac{\theta}{2},\;\cos\alpha\,\sin\tfrac{\theta}{2},\;\cos\beta\,\sin\tfrac{\theta}{2},\;\cos\gamma\,\sin\tfrac{\theta}{2}]$$
The half angle components may seem a little peculiar, but they simplify the manipulations. The rotation matrix corresponding to this quaternion is
$$\begin{bmatrix}
-1+2P_0^2+2P_x^2 & 2P_xP_y-2P_0P_z & 2P_0P_y+2P_xP_z\\
2P_xP_y+2P_0P_z & -1+2P_0^2+2P_y^2 & -2P_0P_x+2P_yP_z\\
-2P_0P_y+2P_xP_z & 2P_0P_x+2P_yP_z & -1+2P_0^2+2P_z^2
\end{bmatrix}$$
The $P_0$ element can be defined positive and equal to $\sqrt{1-P_x^2-P_y^2-P_z^2}$. With this last substitution, the rotation is defined by three independent variables. Substituting these matrix elements (now involving square roots for $P_0$) in place of $AA$ through $CC$ in Error creates an expression only a barbarian would inflict upon a reader. Suffice it to say that the resulting expression is partially differentiated with respect to $P_x$, $P_y$ and $P_z$, and the resulting, very non-linear, expressions are equated to zero. Let $E$ represent the barbarous Error parameterized by the three quaternion sine elements.
Name the three expressions as follows:
$$E_x = \frac{\partial E}{\partial P_x}\,,\qquad E_y = \frac{\partial E}{\partial P_y}\,,\qquad E_z = \frac{\partial E}{\partial P_z}$$
The Newton's method iteration is defined by
$$[P_x,\;P_y,\;P_z]' \;=\; [P_x,\;P_y,\;P_z] \;-\; [E_x,\;E_y,\;E_z]
\begin{bmatrix}
\partial E_x/\partial P_x & \partial E_x/\partial P_y & \partial E_x/\partial P_z\\
\partial E_y/\partial P_x & \partial E_y/\partial P_y & \partial E_y/\partial P_z\\
\partial E_z/\partial P_x & \partial E_z/\partial P_y & \partial E_z/\partial P_z
\end{bmatrix}^{-1}$$
The derivatives in the matrix, as well as $E_x$ through $E_z$, are messy closed form expressions in $P_x$, $P_y$ and $P_z$. The bulk of the code in this part of the program, about a page of solid formulas, was produced automatically by MIT's Macsyma symbolic mathematics system.

Work in progress

The mapping program is being extended into its ultimate form: a system that can navigate the cart to a desired location in an unrestricted environment, avoiding obstacles along the way. The key element is a routine that finds a short clear path between two points in an obstacle field. This routine now exists as a stand-alone program, but has not yet been incorporated into the mapper.

Since the features located by the mapper are simple points in the world, size and substance must somehow be added a posteriori. Each located feature is assumed to be part of a cylinder whose axis is vertical and passes through the feature, whose radius is the feature's position uncertainty, and whose height is the feature's distance from the ground minus the position uncertainty. Cylinders so defined are considered obstacles only if they protrude above the ground.

In the current program the ground plane is defined a priori, from the known height of the cart camera above the floor and the angle of the camera with respect to the horizontal (measured before a run with a protractor/level). In later versions the cart will dynamically update its ground plane orientation model by observing its own motion as it drives forward. The endpoints of each meter-long lurch define a straight line that is parallel to the local ground. The component of the ground plane model in the direction of the lurch will be tilted to match the observed cart motion, while the component perpendicular to it will be left unchanged. After the cart moves through two non-collinear lurches, all the ground plane orientation parameters will have been so updated. This process will allow the cart to keep its sanity while traversing hilly terrain. Because the motion determination has short term inaccuracies, the tilt model will be updated only fractionally at each move, in the manner of exponential smoothing.

The cart itself has a radius of about 2 meters. It is convenient to add this radius to each and every obstacle, and conceptually shrink the cart to a point. The path finding problem then becomes one of drawing a line (or pulling a string) between two points through a maze of fattened cylindrical obstructions. Since the cart moves in a plane, the problem is really two dimensional.

\!graphw((DSK:PATH.GOD));
\!fig(12,(The path finder at work in a randomly generated obstacle field. The cart's radius has been moved from the cart to every obstacle. It took about 15 compute seconds to find this route.));
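The conversion from ranged point features to the fattened cylinders the path finder works with can be sketched as follows (an illustrative reconstruction; the tuple layout and names are assumptions, and the cart radius is the figure quoted in the text):

from dataclasses import dataclass

@dataclass
class Cylinder:
    x: float        # ground plane position, meters
    y: float
    radius: float   # position uncertainty plus cart radius

CART_RADIUS = 2.0   # "about 2 meters", from the text

def grown_obstacles(features):
    """features: (x, y, uncertainty, height) tuples for ranged points.
    Keep only those that protrude above the ground (height greater than
    the position uncertainty) and fatten each by the cart's radius, so
    the path finder can treat the cart as a point."""
    return [Cylinder(x, y, r + CART_RADIUS)
            for (x, y, r, h) in features if h > r]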
An optimum path in such an environment will consist of either a straight run between start and finish, or a series of tangential segments between the cylinders and contacting arcs around them (imagine loosely laying a string from start to finish between the cylinders, then pulling it tight). The number of tangent paths between n obstacles is 2n(n-1), because each obstacle may be circumnavigated in either a clockwise or a counter-clockwise direction. We can set up a table of tangential distances between pairs of obstacles, with two entries for each obstacle, one for each direction. If we set to infinity the entries for tangents that are blocked by some other obstacle, the resulting table can be thought of as the distance matrix of a graph. The cost of determining this table is on the order of n^3, because each of the O(n^2) paths must be checked for blockage against at least a fraction of the other obstacles.

There are algorithms that take O(n^2) time to find the shortest path between two points in a graph. The current program uses one first described by Dijkstra [10]. Unfortunately, our tangent space is not exactly a graph. Going from a to b via an obstacle c involves not only the straight line tangential runs, but also a circular segment at c. The program adds in the cost of this arc at the point in the algorithm where the cost of going via an intermediate point is calculated. This addition keeps the cost estimates realistic, but destroys one of the pre-conditions of Dijkstra's algorithm. The result is that the program always finds a good path, but sometimes overlooks the optimal one.

Figure 12 is a sample demonstration of the path finder. The circular obstacles were created by a random number generator. The originating point is in the lower left corner; the destination is in the upper right.

References

1. H. P. Moravec, "Towards Automatic Visual Obstacle Avoidance", Proceedings of the 5th IJCAI, MIT, Cambridge, Mass., August 1977, p. 584.
2. D. B. Gennery, "A Stereo Vision System for an Autonomous Vehicle", Proceedings of the 5th IJCAI, MIT, Cambridge, Mass., August 1977.
3. R. Nevatia and T. O. Binford, "Description and Recognition of Curved Objects", Artificial Intelligence, Vol. 8, pp. 77-98, February 1977.
4. S. W. Zucker, "Relaxation Labelling and the Reduction of Local Ambiguities", Third International Joint Conference on Pattern Recognition, San Diego, November 1976.
5. R. Duda and P. Hart, Pattern Classification and Scene Analysis, Wiley, 1973.
6. Y. Bard, Nonlinear Parameter Estimation, Academic Press, 1974.
7. F. A. Graybill, An Introduction to Linear Statistical Models, Volume I, McGraw-Hill, 1961.
8. R. K. Nevatia, "Depth Perception by Motion Stereo", Computer Graphics and Image Processing, 1975.
9. R. A. Schmidt, "A Study of the Real-Time Control of a Computer Driven Vehicle", Stanford AI Memo 149, August 1971.
10. E. W. Dijkstra, "A Note on Two Problems in Connexion with Graphs", Numerische Mathematik, Vol. 1, pp. 269-271, 1959.