Three Dimensional Modelling and Graphics with Multiprocessors

Hans P. Moravec
Carnegie-Mellon University
February 11, 1980

@heading(Introduction)

Many current approaches to modelling of activity in three-space can be classed "Object Oriented". The modelled world is represented as a list of the entities in it, with position being one of the attributes. This is good for simple situations where the number of possible interactions is modest. As the number and complexity of the objects grow, many operations become increasingly difficult. The cost of detecting a collision, for instance, rises as the square of the number of objects.

Other representations are possible. The ones that will be described here are "Space Oriented". A volume is modelled (conceptually) as a three dimensional array of cells, with each cell representing a small volume of space, and with adjacent cells representing adjacent volumes. Each cell contains information about the objects or portions of objects in it. Collision detection involves examining each cell for the presence of more than one object, a process which is largely independent of the number of objects. There is a large fixed cost for a given spatial resolution and modelled volume size, but the cost grows only very slowly with the complexity of the objects.

@heading(The Oct-Tree)

If a system is being used to model physical objects, explicit representation of all the spatial cells is wasteful, because much of the volume will probably be empty, or filled with a uniform substance. More efficient would be a "sparse matrix" formulation. Imagine a large cube containing the volume to be modelled. If the cube is empty or its contents are "simple" (by some definition), it is represented by a "primitive node". If its contents are more complex, it is bisected in three orthogonal directions by cutting planes parallel to its faces, into eight smaller subcubes. The original cube is then represented by eight pointers, each pointing to a representation of a different one of the subcubes. This conditional subdivision is applied recursively: the subcubes are split again and again until they either become simple or reach the size resolution of the model. The space is thus represented by a tree of nodes, branching as necessary to represent detailed areas of the space. The leaves of the tree are either very simple or very small spaces.

A useful refinement of this idea is to put an approximate (fuzzy) description of the cube contents, such as its average density, into the nodes of the tree even when they are further subdivided. This permits coarse results to be derived from the model without consideration of unnecessary detail. In some circumstances, as when generating images without aliasing artifacts, the approximations are more desirable than the full resolution descriptions for reasons besides cost.
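To make the data structure concrete, here is a minimal sketch of such a node in C. The field names, the bit-per-object occupancy word, and the choice of average density and color as the fuzzy description are inventions of the sketch rather than commitments of the design:

@begin(verbatim)
#include <stdlib.h>

/* A recursively subdivided cube.  A node is either a leaf
   ("primitive node": empty, or uniformly filled) or an internal
   node with eight children, one per octant.  Every node also
   carries a coarse summary of its contents so that viewing can
   stop early when full resolution is not needed.                */
typedef struct Node {
    struct Node *child[8];    /* all NULL in a leaf              */
    unsigned     objects;     /* assumption: bit i set when
                                 object i touches this cube      */
    float        avg_density; /* fuzzy approximation of contents */
    float        avg_color[3];
} Node;

/* Allocate an empty leaf; calloc zeroes children and summary.   */
Node *new_leaf(void) { return calloc(1, sizeof(Node)); }

int is_simple(const Node *n) { return n->child[0] == NULL; }
@end(verbatim)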
Realistic shaded graphics of the modelled spaces can be obtained with this representation through a procedure that can report the first object intersected by a ray thrown from a given point in the model in a given direction. A picture can be generated by choosing an eye position, with an image plane to determine the direction of view. A ray is launched from the eye through each pixel in the image plane using the throwing subroutine. The pixel is then colored according to the object the ray first penetrates (using knowledge of light source positions and an object reflectivity model).

Images with shadows can be generated by an elaboration of this process: first a ray is thrown from eye to world, and the co-ordinates of the point of first intersection are noted. A ray from each light source is then launched at the point of intersection of the eye ray. If it first impacts at that same point, the point is illuminated by that light source. If not, it is in shadow (for that source).

@heading(The Ray Launcher)

The throwing procedure works recursively as follows. It considers the ray thrown into the cube described by the top level of the description tree (this cube describes the entire space modelled). If the cube is simple, it does the analytic geometry which answers the question "does the ray penetrate any object within the confines of the cube", and returns the answer. If the cube is not simple, the routine enumerates those of the eight subcubes at the next level of subdivision which the ray penetrates @i(in the order in which it penetrates them). It applies itself recursively to the affected subcubes in that order until one of these invocations responds positively, meaning there was a hit (the ray penetrated an object) within the confines of that subcube, in which case it returns the co-ordinates of this hit to its calling procedure. If none of the subcube invocations responds positively, it returns a negative response.

@heading(Overlaps)

Collision detection is a simple process once the tree is built. A recursive procedure examines a node and, if it is simple, checks whether two objects intersect within it. If the node is not simple, the procedure applies itself to each of the eight subcubes at the next subdivision level until either one of these applications reports a collision, or all eight have been exhausted, in which case it returns "no collision".

@heading(Tree Growing)

Building the tree representation from an object oriented description is the most difficult operation encountered so far. It depends, of course, on the exact nature of the original object description, and also on the exact definition of "simple" subcube. Suppose "simple" means a subcube is either empty, or entirely filled with the uniform substance of an object, and suppose also that we have a procedure which, given an object description and the co-ordinates and size of a subcube, can answer the question "does the object intersect the subcube" in one of three ways: "no", "yes, it partly fills the cube" and "yes, it fully fills the cube".

The building procedure inserts the objects one at a time into a tree space, and for each one proceeds as follows. It asks the intersection question of the top cube. If the answer is "no", the procedure is finished with that object; it is outside the modelled space. If the answer is "yes, entirely", the top node is marked to indicate that the object wholly envelops the cube, and a simple recursive procedure marks all existing subtrees of the current one in a similar manner. If the answer is "yes, partly", the procedure applies itself recursively to each of the subtrees at the next level (creating an eight-way pointer for the current node if none exists already), and marks the current node as partially contacting the object.

Objects can be removed from the space by mimicking their creation, marching down the tree in the same order, but removing rather than creating any nodes containing only the object, and also deleting nodes that after modification have eight empty pointers. Illustrative sketches of these three procedures (the launcher, the overlap check and the insertion), in C, follow.
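First the launcher. This is only a sketch under simplifying assumptions: a simple cube is taken to be empty or entirely filled (as in the Tree Growing definition above), the hit is reported as a distance along the ray rather than as co-ordinates, and ray directions are assumed to have positive components (negative components are handled in practice by mirroring the co-ordinate system). The Node type is the one sketched earlier:

@begin(verbatim)
#include <stddef.h>

typedef struct Node {            /* as in the earlier sketch     */
    struct Node *child[8];
    unsigned     objects;        /* zero means an empty cube     */
} Node;

typedef struct { double o[3], d[3]; } Ray;  /* origin, direction */

/* Parameter interval [t0,t1] over which the ray is inside an
   axis-aligned box; empty if t0 > t1.  Assumes d[a] > 0.        */
static void span(const Ray *r, const double lo[3],
                 const double hi[3], double *t0, double *t1) {
    *t0 = 0.0; *t1 = 1e30;
    for (int a = 0; a < 3; a++) {
        double ta = (lo[a] - r->o[a]) / r->d[a];
        double tb = (hi[a] - r->o[a]) / r->d[a];
        if (ta > *t0) *t0 = ta;
        if (tb < *t1) *t1 = tb;
    }
}

/* Bounds of octant i of the box [lo,hi].                        */
static void octant(int i, const double lo[3], const double hi[3],
                   double clo[3], double chi[3]) {
    for (int a = 0; a < 3; a++) {
        double mid = 0.5 * (lo[a] + hi[a]);
        clo[a] = (i >> a & 1) ? mid : lo[a];
        chi[a] = (i >> a & 1) ? hi[a] : mid;
    }
}

/* Distance along the ray to the first hit, or -1 for a miss.    */
double launch(const Node *n, const Ray *r,
              const double lo[3], const double hi[3]) {
    if (n == NULL) return -1.0;              /* defensive        */
    double t0, t1;
    span(r, lo, hi, &t0, &t1);
    if (t0 > t1) return -1.0;                /* ray misses cube  */
    if (n->child[0] == NULL)                 /* simple cube      */
        return n->objects ? t0 : -1.0;
    struct hit { double t; int i; } order[8];
    int m = 0;
    for (int i = 0; i < 8; i++) {            /* which subcubes,  */
        double clo[3], chi[3], s0, s1;       /* entered when?    */
        octant(i, lo, hi, clo, chi);
        span(r, clo, chi, &s0, &s1);
        if (s0 <= s1) { order[m].t = s0; order[m].i = i; m++; }
    }
    for (int j = 1; j < m; j++)              /* sort by entry t  */
        for (int k = j; k > 0 && order[k].t < order[k-1].t; k--) {
            struct hit tmp = order[k];
            order[k] = order[k-1]; order[k-1] = tmp;
        }
    for (int j = 0; j < m; j++) {            /* nearest first    */
        double clo[3], chi[3];
        octant(order[j].i, lo, hi, clo, chi);
        double t = launch(n->child[order[j].i], r, clo, chi);
        if (t >= 0.0) return t;              /* first hit wins   */
    }
    return -1.0;
}
@end(verbatim)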
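Next the overlap check. The bit-per-object occupancy word is an invention of these sketches; it also provides the pruning that keeps the search cheap, since a subtree touched by fewer than two objects can be dismissed at once:

@begin(verbatim)
#include <stddef.h>

typedef struct Node {            /* as in the sketches above     */
    struct Node *child[8];
    unsigned     objects;        /* bit i set when object i
                                    touches this cube            */
} Node;

/* Nonzero if two objects occupy some common small volume,
   at the resolution of the model.                               */
int collide(const Node *n) {
    if (n == NULL) return 0;
    if ((n->objects & (n->objects - 1)) == 0)
        return 0;                /* fewer than two objects here  */
    if (n->child[0] == NULL)
        return 1;                /* simple cube shared by two    */
    for (int i = 0; i < 8; i++)
        if (collide(n->child[i])) return 1;
    return 0;
}
@end(verbatim)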
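Finally the insertion, built around the three-way intersection query. The query depends entirely on the object description, so it is declared but not defined; treating a partly filled cube at the resolution limit as filled is likewise a choice of the sketch:

@begin(verbatim)
#include <stdlib.h>

typedef struct Node {            /* as in the sketches above     */
    struct Node *child[8];
    unsigned     objects;
} Node;

/* "Does object obj intersect this cube?"  Supplied by the
   object-oriented front end; declared but not defined here.     */
enum isect { NO, PARTLY, FULLY };
enum isect intersect(int obj, const double lo[3], const double hi[3]);

static void mark_all(Node *n, unsigned bit) { /* flood a subtree */
    if (n == NULL) return;
    n->objects |= bit;
    for (int i = 0; i < 8; i++) mark_all(n->child[i], bit);
}

void insert(Node *n, int obj, const double lo[3],
            const double hi[3], double resolution) {
    enum isect q = intersect(obj, lo, hi);
    if (q == NO) return;                 /* outside this cube    */
    n->objects |= 1u << obj;             /* assumes obj < 32     */
    if (q == FULLY) {                    /* object envelops cube: */
        mark_all(n, 1u << obj);          /* mark existing subtree */
        return;
    }
    if (hi[0] - lo[0] <= resolution)     /* at model resolution:  */
        return;                          /* treat as filled leaf  */
    if (n->child[0] == NULL)             /* create the eight-way  */
        for (int i = 0; i < 8; i++)      /* pointer if none exists */
            n->child[i] = calloc(1, sizeof(Node));
    for (int i = 0; i < 8; i++) {        /* PARTLY: recurse       */
        double clo[3], chi[3];
        for (int a = 0; a < 3; a++) {
            double mid = 0.5 * (lo[a] + hi[a]);
            clo[a] = (i >> a & 1) ? mid : lo[a];
            chi[a] = (i >> a & 1) ? hi[a] : mid;
        }
        insert(n->child[i], obj, clo, chi, resolution);
    }
}
@end(verbatim)

Removal marches down the tree in the same order, clearing the object's bit and freeing any node whose eight children have all become empty.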
@heading(Cost Effectiveness)

As noted earlier, this space oriented approach is more expensive for simple scenes than more classical object oriented methods, both in computer time and in memory space. However, processing times for object approaches grow at least linearly with scene complexity, while many operations on the tree space, such as generating shaded 3D representations of the model, grow only approximately logarithmically. The differences between object and space representations of 3-space are analogous to the differences between the vector list and raster representations of line drawings. For simple drawings, the vector list is much more compact, and faster to generate. As the complexity of the drawings increases, the vector list grows without bound, while the raster description remains constant in size.

The tree representation has a number of additional advantages. A major one is that almost all of the operations one might wish to perform with it can be expressed as recursive procedures which invoke themselves multiply for each subtree growing from a node. This process is well suited to a parallel processor. I visualize a tree of about a thousand moderately fast processors, either in a fixed pattern or interconnected by an n log n switching net (the details remain to be worked out), which could generate realistic moving pictures of very complex scenes at full television rates. Each processor would have a moderate to large amount of memory (with current IC sizes 64K bytes would be reasonable; larger memories make sense for the future) which would hold the entire description of one or more intermediate subtrees of the world model. As the tree was modified by insertion, deletion and movement of objects, adjacent processors would trade tree portions to keep the workload balanced.

@heading(Complexity Frontiers)

A machine able to throw a thousand viewing rays in parallel could, in a moderate amount of time, model the effect of light bouncing from diffusing surface to diffusing surface in a complex scene. Such secondary and higher order lighting effects are important visual cues. A recent series of experiments [ref] shows how human observers are able to separate the effect of surface reflectivity from light source intensity by subconsciously noting the effect of scattered light on nearby surfaces. The observers were able, at a glance, to correctly identify the surface color of white objects in a dimly lit white room, and of dark grey objects in a brightly lit dark grey room, even though the amount of light reflected from the grey surfaces was higher. The observers could not see the source of light directly. The only possible cue was the lessened intensity of secondary and higher order illumination, compared to the direct lighting, in the grey room.

Pictures produced by the very best existing 3D shaded graphics systems have a certain "plastic" aura of unreality. This is partly because of the mathematical smoothness of the modelled surfaces; equally to blame is the excessive simplicity of the lighting model. Objects modelled in a recursively subdivided space need have neither of these properties. Since each little bit of surface must be represented by a separate node of the tree in any case, it can have reflection properties slightly different from its neighbors without greatly increased memory cost. Fancy illumination can be arranged, though the time cost is high.
Simulated photons are created at the light sources in the scene and propagated to intercepting surfaces by the ray throwing subroutine. On hitting a surface a new ray is generated, with a direction and intensity chosen by a random number and the scattering statistics of the surface. The process stops when the ray is attenuated below some limiting intensity, leaves the modelled space, or is intercepted by the simulated camera which is generating the image (this camera must have a finite aperture to collect any photons, and will thus have a finite depth of focus!).

A more efficient alternative in many cases is to propagate the photons backwards, from the eye position. Rays from the eye are bounced around until they peter out or until they encounter a light source. In the latter case the intensity of the source is multiplied by the attenuation factor accumulated by the bouncing ray, and the product is added to the imaging plane pixel corresponding to the ray. This second approach favors extended light sources. "Compromise" schemes are also possible: light sources can illuminate surfaces without further bounces, then backwards photons from the camera can bounce and accumulate secondary light. Fewer photons are wasted with such intermediate schemes. Because of the great number of rays that must be generated to make a reasonably ungrainy image, the process is slow. A machine that can simulate real time travel through a modelled world using simple first-order lighting might take a half hour to generate a single frame of a fully lit representation.

@heading(Pre-Lighting)

There is a way to have our cake and eat it too; that is, we can generate real-time images from a point of view flying around in a fully illuminated world. The secret is pre-illumination of the model. Before any pictures are generated, rays are thrown from the light sources. As a ray hits each surface, the reflection (and refraction) characteristics of the surface are used to calculate the illumination in all directions from that surface point contributed by the ray. After this calculation the ray is passed on statistically as usual. Each surface point accumulates the effect of all rays hitting it in a coarsely specified vector function which models all the light leaving that point from all causes. Viewing such a pre-lit scene is a first order process: rays from an eye are thrown into the world, and the vector illumination function at the point of first intersection is examined to provide the coloration of the image pixel corresponding to the ray.

Pre-illumination is ideal when all surfaces are matte. A perfectly scattering face throws light equally in all (forward) directions, and its illumination contribution can be characterized by a single quantity. As the face becomes more mirror-like, the directional variations increase, and the angular resolution of the illumination vector function must also be higher. In the extreme case of a perfect mirror, each surface point must describe a full resolution scene, and the memory requirements become ridiculous. On the other hand, perfect mirrors can be easily modelled from the camera point of view; each line of sight from the eye becomes a single new ray on hitting a mirror. This suggests that in some models a hybrid approach would be most economical.
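The coarsely specified vector function might, for instance, be a small table of outgoing light indexed by direction. The eight-octant binning below is purely illustrative; a matte surface needs only one entry, and a more mirror-like one needs many more, as noted above:

@begin(verbatim)
/* Per-surface-point record of the light leaving that point,
   accumulated during pre-illumination.                          */
typedef struct { float out[8]; } LightFn;  /* one bin per octant */

static int bin(const double dir[3]) {    /* direction -> octant  */
    return (dir[0] > 0) | (dir[1] > 0) << 1 | (dir[2] > 0) << 2;
}

/* Pre-illumination pass: a photon of the given intensity leaves
   this point in direction out_dir; record it.  (A real scatterer
   would spread the energy over several bins.)                   */
void deposit(LightFn *f, const double out_dir[3], double intensity) {
    f->out[bin(out_dir)] += (float)intensity;
}

/* Viewing pass, a first order process: one lookup per eye ray.  */
float toward_eye(const LightFn *f, const double to_eye[3]) {
    return f->out[bin(to_eye)];
}
@end(verbatim)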
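The backwards scheme described before the Pre-Lighting heading admits an equally small sketch. The geometric and statistical routines (the launcher's hit query, the scattering sampler, the source test) are declared but left to the model, and the 0.001 cutoff is arbitrary:

@begin(verbatim)
typedef struct { double o[3], d[3]; } Ray;  /* origin, direction */

/* Supplied by the model, not defined here: the launcher's hit
   query, a sampler that draws a scattered direction and its
   attenuation from the surface's statistics, and a test for
   light-emitting surfaces (returning 0 for non-sources).        */
int    first_hit(const Ray *r, double hit[3], int *surface);
void   sample_scatter(int surface, const double in[3],
                      double out[3], double *attenuation);
double source_intensity(int surface);

/* Bounce one backwards photon from the eye and return the
   intensity it delivers to its pixel: zero if it escapes the
   modelled space or peters out, otherwise the source intensity
   times the accumulated attenuation.                            */
double backwards_photon(Ray r) {
    double weight = 1.0;
    for (;;) {
        double hit[3], att, out[3];
        int s;
        if (!first_hit(&r, hit, &s)) return 0.0;  /* escaped     */
        double emit = source_intensity(s);
        if (emit > 0.0) return weight * emit;     /* found source */
        sample_scatter(s, r.d, out, &att);        /* random bounce */
        weight *= att;
        if (weight < 0.001) return 0.0;           /* petered out  */
        for (int a = 0; a < 3; a++) { r.o[a] = hit[a]; r.d[a] = out[a]; }
    }
}
@end(verbatim)

Many such photons must be accumulated per pixel to make an ungrainy image, which is where the processor tree would earn its keep.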
Another approach would be to model light as a wave. Instead of throwing rays from point to point, the computation would propagate tiny spherical wavefronts from cell to spatial cell. This approach removes many of the combinatorial problems of the ray scheme for full illumination, but involves another large increment in fixed cost, since computations must be done for vast numbers of empty cells.

@heading(Jaggies)

Anti-aliasing, a process necessary to avoid moving jagged edges resembling marching ants at the borders of animated objects, can be handled in this system by the fuzzy approximations of complex cubes stored at the non-terminal nodes of the recursive tree. Each node contains a sort of average picture of the smaller cubes of which it is composed. A ray thrown from the eye into the model space is stopped, and applied to the approximation, when the projection of the cube it has just entered would be smaller than a pixel or two in the image plane. In some cases, such as the pre-illuminated world, computation of such an average is straightforward. In no case is great accuracy very important. Viewing in this manner is also faster than always fully resolving.

@heading(Motion)

The tree representation does not handle moving objects very easily. The most straightforward way to accomplish motion is to repeatedly remove an object from the space and re-insert it in a new position. This process could be accomplished reasonably quickly by the multiprocessor implementation, and may be necessary for collision detection. If graphics is the only goal, however, a shortcut is possible.

The definition of "simple" subcube can vary. We can model individual rigid objects in the way suggested above without worrying about motion. If each such object is contained in its own tree structure, we can build a world tree whose leaves point to the trees containing objects. Each object would be linked to the world through a rotation and translation, or perhaps a full linear or projective transform, permitting scaling and skewing. A ray originating in the world would be transformed into one in the object space by subjecting it to the object transform, which must preserve the property of straightness, but nothing else. The transformed ray would then be propagated in the object space as usual. An object could be moved simply by altering the numbers in its connecting transform. The world tree would also have to be changed slightly from time to time, as an object moved from one world cell to another, but the world would have a very sparse tree and the work involved would be small. A hierarchy of trees connected by projective transforms would make modelling of articulated structures easy.
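The connecting transform might be sketched as follows, assuming a general 4x4 homogeneous matrix (rotation plus translation is the special case). A projective map preserves straightness but not direction vectors, so the ray is carried across as two points:

@begin(verbatim)
typedef struct { double o[3], d[3]; } Ray;  /* origin, direction */

/* Apply a 4x4 homogeneous transform to a point.                 */
static void xform_point(const double m[4][4], const double p[3],
                        double out[3]) {
    double h[4];
    for (int i = 0; i < 4; i++)
        h[i] = m[i][0]*p[0] + m[i][1]*p[1] + m[i][2]*p[2] + m[i][3];
    for (int i = 0; i < 3; i++) out[i] = h[i] / h[3];
}

/* Carry a world ray into an object's own tree space, where it
   is propagated by the launcher exactly as before.              */
Ray to_object_space(const double world_to_obj[4][4], const Ray *r) {
    double p1[3], p2[3];
    Ray s;
    for (int a = 0; a < 3; a++) p1[a] = r->o[a] + r->d[a];
    xform_point(world_to_obj, r->o, s.o);
    xform_point(world_to_obj, p1, p2);
    for (int a = 0; a < 3; a++) s.d[a] = p2[a] - s.o[a];
    return s;
}
@end(verbatim)

Moving the object is then just a matter of rewriting the sixteen numbers in its matrix.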
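The viewing cutoff described under Jaggies reduces to a one-line test during traversal; the pixel_angle parameter (the angular size of one pixel as seen from the eye) and the factor of two standing in for "a pixel or two" are choices of the sketch:

@begin(verbatim)
/* Stop resolving, and use the node's fuzzy average, when the
   cube would project to no more than about a pixel or two.      */
int use_approximation(double cube_size, double dist_from_eye,
                      double pixel_angle) {
    return cube_size / dist_from_eye <= 2.0 * pixel_angle;
}
@end(verbatim)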
@heading(The State of Things)

All experiments have thus far been conducted on a timeshared PDP-10, and no hardware has been built. A conventional 3D shaded drawing program has been constructed for comparison purposes. It describes objects polygonally, and simulates both matte and specular reflection. It has generated many pretty pictures of scenes containing up to 10,000 polygons. A TV resolution shaded image of a complex scene can be made in about a minute of computer time.

A prototype program which takes polygonal world descriptions compatible with the conventional shader, but which builds a recursive tree structure from them, is written and working. Two other programs, which generate first-order and shadowed representations of the scene, are also complete. Viewing of simple scenes (the only ones attempted so far) takes about ten times as long as with the conventional shader. No collision detection has yet been attempted.

The work can be continued for the time being on a single processor. The representation needs a lot of memory even to get started, and a VAX to be acquired by the Robotics Institute will greatly relieve the memory constraints inherent in the PDP-10. Many concepts should be tried and proven effective (or defective) before we commit to even a small multiprocessor. The Cm* machine might be suitable for early multiprocessor experiments. The research will involve investigation of alternative approaches, as well as efforts to optimize the tree representation and to find more efficient algorithms that use it.