Three Dimensional
Modelling and Graphics
with Multiprocessors
Hans P. Moravec
Carnegie-Mellon University
February 11, 1980
@heading(Introduction)
Many current approaches to modelling of activity in three-space
can be classed as "Object Oriented". The modelled world is represented as
a list of the entities in it, with position being one of the
attributes. This is good for simple situations where the number of
possible interactions is modest. As the number and complexity of the
objects grow, many operations become increasingly difficult. The cost of
detecting a collision, for instance, rises as the square of the number
of objects.
Other representations are possible. The ones that will be
described here are "Space Oriented". A volume is modelled
(conceptually) as a three dimensional array of cells, with each cell
representing a small volume of space, and with adjacent cells
representing adjacent volumes. Each cell contains information about the
objects or portions of objects in it. Collision detection involves
examining each cell for the presence of more than one object, a process
which is largely independent of the number of objects. There is a large
fixed cost for a given spatial resolution and modelled volume size, but
the cost grows only very slowly with complexity in the objects.
@heading(The Oct-Tree)
If a system is being used to model physical objects, explicit
representation of all the spatial cells is wasteful, because much of
the volume will probably be empty, or filled with a uniform substance.
More efficient would be a "sparse matrix" formulation. Imagine a large
cube containing the volume to be modelled. If the cube is empty or its
contents are "simple" (by some definition), it is represented by a
"primitive node". If its contents are more complex, it is bisected in
three orthogonal directions by cutting planes parallel to its faces,
into eight smaller subcubes. The original cube is then represented by
eight pointers, each pointing to a representation of a different one of
the subcubes. This conditional subdivision process is applied
recursively to each of the small cubes, these being split again and
again until they either reach the size resolution of the model or
become simple. The space is thus represented by a tree of nodes,
branching as necessary to represent detailed areas of the space. The
leaves of the tree are either very simple or very small spaces.
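As a concrete sketch of this subdivision (in modern Python; the leaf values, child ordering, and `occupied` cell predicate are illustrative assumptions, not the paper's actual representation):

```python
# Illustrative oct-tree builder. A node is either the string "empty" or
# "full" (a "primitive node"), or a list of eight subtrees, one per
# subcube, indexed by ix*4 + iy*2 + iz.

def build(occupied, x, y, z, size, min_size=1):
    """occupied(x, y, z) -> True if the unit cell at (x, y, z) is filled.
    Subdivides the cube at corner (x, y, z) until its parts are simple."""
    flags = [occupied(x + i, y + j, z + k)
             for i in range(size) for j in range(size) for k in range(size)]
    if not any(flags):
        return "empty"              # simple: nothing in this cube
    if all(flags):
        return "full"               # simple: uniformly filled
    if size <= min_size:
        return "full"               # at the size resolution of the model
    h = size // 2                   # bisect along all three axes
    return [build(occupied, x + ix * h, y + iy * h, z + iz * h, h, min_size)
            for ix in (0, 1) for iy in (0, 1) for iz in (0, 1)]
```

A 4-cube with a single occupied corner cell, for example, produces a tree whose only non-empty branch is subdivided twice, while the other seven subtrees collapse to "empty" leaves.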
A useful refinement on this idea is to put an approximate
(fuzzy) description of the cube contents, such as its average density,
into the nodes of the tree even when they are further subdivided. This
permits coarse results to be derived from the model without
consideration of unnecessary detail. In some circumstances, as when
generating images without aliasing artifacts, the approximations are
more desirable than the full resolution descriptions for reasons
besides cost.
Realistic shaded graphics of the modelled spaces can be
obtained with this representation through
a procedure that can report on
the first object intersected by a ray thrown from a given point in the
model in a given direction. A picture can be generated by
choosing an eye position, with an image plane to determine the
direction of view. A ray is launched from the eye through each pixel in
the image plane using the throwing subroutine. The pixel is then
colored according to the object the ray first penetrates (using
knowledge of light source positions and an object reflectivity model).
Images with shadows can be generated by an elaboration of this process,
first throwing a ray from eye to world, then noting the co-ordinates of
the point of first intersection. A ray from each light source is then
launched toward that intersection point. If this light ray first impacts
at the same point, the point is illuminated by that light
source. If not, it is in shadow (for that source).
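The shadow test reduces to a comparison of two throws. The sketch below abstracts over the thrower; `first_hit` here stands for any routine returning the first surface point along a ray, and all names and the interface are assumptions for illustration:

```python
def lit(first_hit, point, light, eps=1e-6):
    """True if `point` is illuminated by the source at `light`:
    the ray thrown from the light must strike `point` first."""
    hit = first_hit(light, point)   # first surface along light -> point
    return hit is not None and \
        all(abs(a - b) < eps for a, b in zip(hit, point))
```

If the light ray stops at some other surface first, the point is in shadow for that source.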
@heading(The Ray Launcher)
The throwing procedure works recursively as follows. It
considers the ray thrown into the cube described by the top level of
the description tree (this cube describes the entire space modelled).
If the cube is simple, it does the analytic geometry which answers the
question "does the ray penetrate any object within the confines of the
cube", and returns the answer. If the cube is not simple, the
routine enumerates those of the eight subcubes at the next level of
subdivision which the ray penetrates @i(in the order in which it
penetrates them). It applies itself recursively to the affected subcubes
in that order until one of these invocations responds positively,
meaning there was a hit (the ray penetrated an object) within the
confines of that subcube, in which case it returns the co-ordinates of
this hit to its calling procedure. If none of the subcube invocations
responds positively, it returns a negative response.
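A minimal sketch of such a thrower (Python; the tree layout is an assumption: leaves "empty"/"full", children indexed ix*4 + iy*2 + iz). For brevity it takes the minimum entry parameter over all eight subcubes rather than enumerating them in entry order; this gives the same answer while forgoing the early exit:

```python
def first_hit(node, origin, direction, corner, size):
    """Smallest t >= 0 at which the ray origin + t*direction enters a
    "full" cell of the tree rooted at `node`, or None if it misses."""
    # Slab test: parametric interval during which the ray is in the cube.
    t0, t1 = 0.0, float("inf")
    for o, d, lo in zip(origin, direction, corner):
        hi = lo + size
        if abs(d) < 1e-12:
            if not (lo <= o <= hi):
                return None         # ray parallel to slab and outside it
        else:
            a, b = (lo - o) / d, (hi - o) / d
            t0, t1 = max(t0, min(a, b)), min(t1, max(a, b))
    if t0 > t1 or node == "empty":
        return None
    if node == "full":
        return t0                   # the hit: entry into a filled cell
    h, best = size / 2.0, None
    for i, child in enumerate(node):
        c = (corner[0] + (i >> 2 & 1) * h,
             corner[1] + (i >> 1 & 1) * h,
             corner[2] + (i & 1) * h)
        t = first_hit(child, origin, direction, c, h)
        if t is not None and (best is None or t < best):
            best = t
    return best
```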
@heading(Overlaps)
Collision detection is a simple process once the tree is built.
A recursive procedure examines a node; if the node is simple, it
checks whether more than one object is present there. If the node is
not simple, the procedure
applies itself to each of the eight subcubes at the next subdivision
level until either one of these applications reports a collision, or
until all eight have been exhausted, in which case it returns "no
collision".
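A sketch of this check (Python; here each simple leaf is assumed to hold the set of object identifiers present in its cell, an illustrative convention rather than the paper's):

```python
def collides(node):
    """Leaf: a set of object ids occupying that cell.
    Internal node: a list of eight subtrees.
    Reports whether any cell is shared by two or more objects."""
    if isinstance(node, set):
        return len(node) >= 2
    # any() stops at the first subtree reporting a collision
    return any(collides(child) for child in node)
```

The short-circuiting `any` mirrors the exit "until one of these applications reports a collision".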
@heading(Tree Growing)
Building the tree representation from an object oriented
description is the most difficult operation encountered so far. It
depends, of course, on the exact nature of the original object
description, and also on the exact definition of "simple" subcube.
Suppose "simple" means a subcube is either empty, or entirely filled
with the uniform substance of an object, and suppose also that we have
a procedure which, given an object description and the co-ordinates and
size of a subcube, can answer the question "does the object intersect
the subcube" in one of three ways: "no", "yes, it partly fills the
cube" and "yes, it fully fills the cube". The building procedure
inserts the objects one at a time into a tree space, and for each one
proceeds as follows. It asks the intersection question of the top cube.
If the answer is "no", the procedure is finished with that object; it
lies outside the modelled space. If the answer is "yes, entirely", the
top node is marked to indicate that the object wholly envelopes the
cube. A simple recursive procedure which marks all existing subtrees of
the current one in a similar manner is also invoked. If the answer is
"yes, partly", the procedure applies itself recursively to each of the
subtrees at the next level (creating an eight way pointer for the
current node if none exists already), and marks the current node as
partially contacting the object.
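The insertion logic can be sketched around the three-way intersection query. Here a sphere stands in for the object, and a fresh tree is built for it (the paper merges each object into an existing tree); all names are illustrative:

```python
def classify_sphere(center, radius, corner, size):
    """Three-way query: does the sphere miss ("out"), partly fill
    ("part"), or entirely fill ("all") the given cube?"""
    near = far = 0.0
    for c, lo in zip(center, corner):
        hi = lo + size
        near += max(lo - c, 0.0, c - hi) ** 2   # to nearest cube point
        far += max(c - lo, hi - c) ** 2         # to farthest cube corner
    if near > radius * radius:
        return "out"
    if far <= radius * radius:
        return "all"
    return "part"

def grow(classify, corner, size, min_size=1.0):
    """Builds the subtree for one object from the three-way query."""
    answer = classify(corner, size)
    if answer == "out":
        return "empty"
    if answer == "all" or size <= min_size:
        return "full"               # wholly enveloped, or resolution limit
    h = size / 2.0
    return [grow(classify,
                 (corner[0] + ix * h, corner[1] + iy * h, corner[2] + iz * h),
                 h, min_size)
            for ix in (0, 1) for iy in (0, 1) for iz in (0, 1)]
```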
Objects can be removed from the space by mimicking their
creation, marching down the tree in the same order, but removing rather
than creating any nodes containing only the object, and also deleting
nodes that, after modification, have eight empty pointers.
@heading(Cost Effectiveness)
As noted earlier, this space oriented approach is more
expensive for simple scenes than more classical object oriented
methods, both in computer time and in memory space. However, processing
times for object approaches grow at least linearly with scene
complexity, while many operations, such as generating shaded 3D
representations of the model, grow only approximately logarithmically
in tree space.
The differences between object and space representations of
3-space are analogous to the differences between the vector list
and raster representations of line drawings. For simple drawings, the
vector list is much more compact, and faster to generate. As the
complexity of the drawings increases, the vector list grows
unboundedly, while the raster description remains constant in size.
The tree representation has a number of additional advantages.
A major one is that almost all of the operations one might wish to
perform with it can be expressed as recursive procedures which invoke
themselves multiply for each subtree growing from a node. This process
is well suited to a parallel processor. I visualize a tree of about a
thousand moderately fast processors, either in a fixed pattern, or
interconnected by an n log n switching net (the details remain to be
worked out), which can generate realistic moving pictures of very
complex scenes at full television rates. Each processor would have a
moderate to large amount of memory (with current IC sizes 64K bytes
would be reasonable, larger memories make sense for the future) which
would hold the entire description of one or more intermediate subtrees
of the world model. As the tree was modified by insertion, deletion and
movement of objects, adjacent processors would trade tree portions, to
keep the workload balanced.
@heading(Complexity Frontiers)
A machine able to throw a thousand viewing rays in parallel
could, in a moderate amount of time, model the effect of
light bouncing from diffusing surface to diffusing surface in a complex
scene.
Such secondary and higher order lighting effects are important
visual cues. A recent series of experiments [ref] shows how human
observers are able to separate the effect of surface reflectivity from
light source intensity by subconsciously noting the effect of scattered
light on nearby surfaces. The observers were able, at a glance, to
correctly identify the surface color of white objects in a dimly lit
white room, and of dark grey objects in a brightly lit dark grey room,
even though the amount of light reflected from the grey surfaces was
higher. The observers could not see the source of light directly. The
only possible cue was the lessened intensity of secondary and higher
order illumination, compared to the direct lighting, in the grey room.
Pictures produced by the very best existing 3D shaded graphics
systems have a certain "plastic" aura of unreality. This is partly
because of the mathematical smoothness of the modelled surfaces.
Equally to blame is the excessive simplicity of the lighting model.
Objects modelled in a recursively subdivided space need have
neither of these properties. Since each little bit of surface must be
represented by a separate node of the tree in any case, it can have
reflection properties slightly different from its neighbors without
greatly increased memory cost. Fancy illumination can be arranged,
though the time cost is high. Simulated photons are created at the
light sources in the scene and propagated to intercepting surfaces by
the ray throwing subroutine. On hitting a surface a new ray is
generated, with a direction and intensity chosen by a random number and
the scattering statistics of the surface. The process stops when the
ray is attenuated below some limiting intensity, leaves the modelled
space, or is intercepted by the simulated camera which is generating
the image (this camera must have a finite aperture to collect any
photons, and will thus have a finite depth of focus!).
A more efficient alternative in many cases is to propagate the
photons backwards, from the eye position. Rays from the eye are bounced
around until they peter out or until they encounter a light source. In
the latter case the intensity of the source is multiplied by the
attenuation factor accumulated by the bouncing ray, and the product is
added to the imaging plane pixel corresponding to the ray. This second
approach favors extended light sources.
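A backward bounce loop might look like this (Python sketch; the `trace` interface, returning an event kind, hit point, scattered direction, and attenuation or source intensity, is an assumption for illustration):

```python
def radiance(trace, eye, direction, max_bounces=8, floor=1e-3):
    """Follows one backward photon. trace(origin, direction) returns
    (kind, point, new_direction, factor): kind is "miss", "light", or
    "surface"; factor is the source intensity or bounce attenuation."""
    weight = 1.0
    for _ in range(max_bounces):
        kind, point, direction, factor = trace(eye, direction)
        if kind == "miss":
            return 0.0              # ray left the modelled space
        if kind == "light":
            return weight * factor  # accumulated attenuation times source
        weight *= factor            # a diffusing bounce attenuates the ray
        if weight < floor:
            return 0.0              # the ray has petered out
        eye = point
    return 0.0
```

The returned value is what gets added to the imaging plane pixel corresponding to the launched ray.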
"Compromise" schemes are also possible. Light sources can
illuminate surfaces without further bounces, then backwards photons
from the camera can bounce and accumulate secondary light. Fewer
photons are wasted with such intermediate schemes.
Because of the great number of rays that must be generated to
make a reasonably ungrainy image, the process is slow. A machine that
can simulate real time travel through a modelled world using simple
first-order lighting might take a half hour to generate a single frame
of a fully lit representation.
@heading(Pre-Lighting)
There is a way to have our cake and eat it too. That is, we can
generate real-time images from a point of view flying around in a fully
illuminated world. The secret is pre-illumination of the model. Before
any pictures are generated, rays are thrown from the light sources. As
they hit each surface, the reflection (and refraction) characteristics
of the surface are used to calculate the illumination in all directions
from that surface point contributed by the ray. After this calculation
the ray is passed on statistically as usual. Each surface point
accumulates the effect of all rays hitting it in a coarsely specified
vector function which models all the light leaving that point from all
causes. Viewing such a pre-lit scene is a first order process. Rays
from an eye are thrown into the world, and the vector illumination
function at the point of first intersection is examined to provide
the coloration of the image pixel corresponding to the ray.
Pre-illumination is ideal when all surfaces are matte. A
perfectly scattering face throws light equally in all (forward)
directions, and its illumination contribution can be characterized
by a single quantity. As the face becomes more mirror-like, the
directional variations increase, and the angular resolution of
the illumination vector function must also be higher. In the
extreme case of a perfect mirror, each surface point must describe
a full resolution scene, and the memory requirements become ridiculous.
On the other hand, perfect mirrors can be easily modelled from
the camera point of view; each line of sight from the eye becomes
a single new ray on hitting a mirror. This suggests that in some
models a hybrid approach would be most economical.
Another approach would be to model light as a wave. Instead of
throwing rays from point to point, the computation would propagate tiny
spherical wavefronts from cell to spatial cell. This approach removes
many of the combinatorial problems with the ray scheme for full
illumination, but involves another large increment in fixed cost, since
computations must be done for vast numbers of empty cells.
@heading(Jaggies)
Anti-aliasing, a process necessary to avoid moving jagged edges
resembling marching ants at the edges of animated objects, can be
handled in this system by fuzzy approximations of complex cubes stored
at the non-terminal nodes of the recursive tree. Each node contains a
sort of average picture of the smaller cubes of which it is composed. A
ray thrown from the eye into the model space is stopped, and applied to
the approximation when the projection of the cube it has just entered
would be smaller than a pixel or two in the image plane. In some cases,
such as the pre-illuminated world, computation of such an average is
straightforward. In no case is great accuracy very important.
Viewing in this manner is also faster than always fully resolving.
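The stopping rule is a projected-size test. Under a pinhole model with an assumed focal length measured in pixels:

```python
def use_average(cube_size, distance, focal_px, threshold_px=1.5):
    """True when the cube's image-plane projection falls below a pixel
    or two, so the node's fuzzy average should be used instead of
    descending further into its subcubes."""
    projected_px = focal_px * cube_size / distance   # pinhole projection
    return projected_px < threshold_px
```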
@heading(Motion)
The tree representation does not handle moving objects very
easily. The most straightforward way to accomplish motion is to
repeatedly remove an object from the space and re-insert it in a new
position. This process could be accomplished reasonably quickly by the
multiprocessor implementation, and may be necessary for collision
detection. If graphics is the only goal, however, a shortcut is
possible.
The definition of "simple" subcube can vary. We can model
individual rigid objects in the way suggested above without worrying
about motion. If each such object is contained in its own tree
structure, we can build a world tree whose leaves point to the trees
containing objects. Each object would be linked to the world through a
rotation and translation, or perhaps a full linear or projective
transform, permitting scaling and skewing. A ray originating in the
world would be transformed into one in the object space by subjecting
it to the object transform, which must preserve the property of
straightness, but nothing else. The transformed ray would then be
propagated in the object space as usual. An object could be moved
simply by altering the numbers in its connecting transform. The world
tree would also have to be changed slightly from time to time, as an
object moved from one world cell to another, but the world would have a
very sparse tree and the work involved would be small. A hierarchy
of trees connected by projective transforms would make modelling of
articulated structures easy.
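Transforming a world ray into an object's space means applying the inverse of the object's transform to the ray origin, and its rotational part alone to the direction. A sketch with a rotation about z plus a translation (the general case would use a full inverse matrix; names are illustrative):

```python
import math

def world_ray_to_object(origin, direction, angle, translation):
    """Undo an object transform consisting of a rotation by `angle`
    about z followed by `translation`, mapping a world-space ray into
    the object's own tree space."""
    c, s = math.cos(-angle), math.sin(-angle)
    def unrotate(v):
        x, y, z = v
        return (c * x - s * y, s * x + c * y, z)
    shifted = tuple(o - t for o, t in zip(origin, translation))
    # directions are unaffected by translation, only by rotation
    return unrotate(shifted), unrotate(direction)
```

Moving the object then amounts to changing `angle` and `translation`; the object's own tree is untouched.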
@Heading(The State of Things)
All experiments have thus far been conducted on a timeshared
PDP10, and no hardware has been built.
A conventional 3D shaded drawing program has been constructed
for comparison purposes. It describes objects polygonally, and
simulates both matte and specular reflection. It has generated many
pretty pictures of scenes containing up to 10,000 polygons. A TV
resolution shaded image of a complex scene can be made in about a minute
of computer time.
A prototype program which takes polygonal world descriptions
compatible with the conventional shader, but which builds a recursive
tree structure from them, is written and working. Two other programs
which generate first-order and shadowed representations of the scene
are also complete. Viewing of simple scenes (the only ones attempted so
far) takes about ten times as long as with the conventional shader. No
collision detection has yet been attempted.
The work can be continued for the time being on a single
processor. The representation needs a lot of memory even to get
started, and a VAX to be acquired by the Robotics Institute greatly
relieves the memory constraints inherent in the PDP10. Many concepts
should be tried and proven effective (or defective) before we want to
commit to even a small multiprocessor. The CM* machine might be
suitable for early multiprocessor experiments.
The research will involve investigation of alternative approaches,
as well as efforts to optimize the tree representation and to
find more efficient algorithms that use it.