3D Graphics and the Wave Theory
Hans P. Moravec
Robotics Institute
Carnegie-Mellon University
January 1981
------------------------------------------------------------------
This is the Scribe source for the paper
"3D Graphics and the Wave Theory" by Hans Moravec
presented at the 1981 Siggraph conference, Dallas, TX, August 1981
published in "Computer Graphics, v15#3 August 1981, pp. 289-296
-------------------------------------------------------------------
@PrefaceSection(Abstract)
A continuing trend in computer representation of three
dimensional synthetic scenes is the ever more accurate modelling of
complex illumination effects. Such effects provide cues necessary for
a convincing illusion of reality. The best current methods simulate
multiple specular reflections and refractions, but handle at most one
scattering bounce per light ray. They cannot accurately simulate
diffuse light sources, nor indirect lighting via scattering media,
without prohibitive increases in the already very large computing
costs.
Conventional methods depend implicitly on a @i(particle)
model; light propagates in straight and conceptually infinitely thin
rays. This paper argues that a @i(wave) model has important
computational advantages for the complex situations. In this
approach, light is represented by wave fronts which are stored as two
dimensional arrays of complex numbers.
The propagation of such a front can be simulated by a linear
transformation. Several advantages accrue. Propagations in a
direction orthogonal to the plane of a front are convolutions which
can be done by FFT in @b(O)(@i(n)@ log@i(n)) time rather than the
@i(n)@+(2) time for a similar operation using rays. The wavelength of
the illumination sets a resolution limit which prevents unnecessary
computation of elements smaller than will be visible. The generated
wavefronts contain multiplicities of views of the scene, which can be
individually extracted by passing them through different simulated
lenses. Lastly the wavefront calculations are ideally suited for
implementation on available array processors, which provide more cost
effective calculation for this task than general purpose computers.
The wave method eliminates the aliasing problem; the
wavefronts are inherently spatially filtered, but substitutes
diffraction effects and depth of focus limitations in its stead.
@newpage
@heading(Introduction)
The realism in computer synthesized images of 3D scenes has
gradually increased from engineering drawings of simple shapes to
representations of complex situations containing textured specular
reflectors and shadows. Until recently the light models have been to
first order; the effect of only one bounce of light from the light
sources to the eye via a surface was simulated. The stakes have
recently gone up again with the introduction of ray tracing methods
that simulate multiple non-dispersive refractions and specular
reflections @cite(Kay79) @cite(Whitted80).
With each increment in the faithfulness of the process new
properties of real light had to be added. Thus far the evolution has
been in the direction of the particle theory of light, probably
because straight rays preserve most of the properties of 3D computer
graphics' engineering drawing roots.
An obvious and desirable next step in the process is the
simulation of scattering, as well as specular, reflection of light
from surface to surface. That such effects are important is shown
graphically by the subjective impressions of increasing realism as the
models become more complex (see references in @cite(Whitted80)) and
objectively in visual psychology experiments such as those by
Gilchrist @cite(Gilchrist79), in which subjects were able to
intuitively separate the effects of surface albedo and light source
brightness by noting the relative intensity of scattered illumination.
While a specular/refractive bounce can convert a ray from a
simulated camera or lightsource into two rays, a scattering can make
hundreds or thousands, heading off in a multitude of directions.
Since pictures modelling only specular bounces themselves take about
an hour of computer time to generate, simulation of multiple
scattering (and incidentally, extended light sources) by extension of
conventional methods is out of reach for the immediate future.
@heading(A Way Out)
One solution may be a different approach in simulating the
physical reality. The ray representation implicitly assumes an
effectively infinite precision in the position of objects and the
heading of rays. This leads to some unnecessary calculations, as when
several rays bouncing from nearly the same surface point in nearly the
same direction (perhaps originating from different light sources) are
propagated by separate calculations.
We can evade these and other inefficiencies by using the
particle theory's major competitor, the wave theory of light. Rather
than multitudes of rays bouncing from light sources and among objects,
the illumination is done by complex wavefronts described over large
areas. Such wavefronts can be represented by arrays of complex
numbers, giving the amplitude and relative phase of portions of the
front, embedded in the simulated space. Lesem et. al. used this
approach @cite(Lesem68) to make physical holograms (though of
disappointingly low quality).
Note that a plane array of complex wave coefficients is very
like a window into a scene. The wavefront, fully described by the
coefficients, can represent multiple point and extended sources of
light at various distances behind (or even in front of) the window.
Figure 1 is a simple example, a slice through a window generating
light that seems to come from a point source very far away in the
lower left direction. A picture of the scene behind the window can be
formed by passing the wavefront through a simulated lens onto a
simulated screen. Moving the lens center changes the apparent point
of view. The size of the lens and its distance from the scene
determine the depth of focus of the image (finite depth of focus is
one of the "features" of this method. The wave representation could
be used to do the lighting for a scene later imaged by a ray method if
infinite depth of focus is desired). The maximum number of resolvable
pixels in the image can be no greater than the number of wavefront
coefficients - no more information is present and any attempt to get
more resolution in the image than in the wavefront will result in
blurring. Also, no object feature smaller than a wavelength of the
simulated light can be represented. Thus the wavelength must not be
too large. It should also not be smaller than necessary, since the
computational cost goes up dramatically as the wavelength drops.
@heading(Simplifications)
Here's one way to use the light wave idea. It deviates from a
true physical simulation in the interests of simplicity and
computational efficiency. Goodman @cite(Goodman68) presents the
fundamental mathematics. Feynman @cite(Feynman63) is a good source for
aspects of real light I've maligned.
We will use light of a single wavelength, and we assume that
all optical effects are linear. The light is propagated according to
scalar diffraction theory, which behaves realistically except for very
short paths. The wavelength must be no larger than the smallest
object feature we wish to see, but should not be too small because the
amount of computation increases dramatically as the wavelength drops.
To faithfully represent a wavefront, there must be a coefficient every
half wavelength across the width of a description, so the number of
complex numbers needed in a description increases as the square of the
inverse wavelength. We also assume that only the steady state of the
illumination is important. This allows us to play cavalierly with
time. For instance, a wavefront may be propagated from one plane
description to a parallel one some distance away in a single step,
even though different portions of the effect have different transit
times.
The scene is represented as a box of half wavelength on a side
cells, roughly 1000 cells cubed. The content of each cell is
characterized by two complex coefficients, one giving the absorption
and phase shift (simulating refractive index) for transmitted light,
the other giving the reflection attenuation and phase shift
(simulating surface scattering). Some cells may also contain a term
to be unconditionally added during a wavefront calculation, making
them light sources. This 3D array need not be explicitly stored, but
1000@+(2) cell slices perpendicular to at least one of the major axes
should be efficiently extractable at will. One suitable storage
scheme is the recursively subdivided "oct-tree" representation
@cite(Reddy78), which permits large uniform volumes to be kept as
single nodes of a tree.
@heading(Some Details)
The pictures in this paper are of scenes contrived to make
them computationally cheap. Each consists of a number of very thin
objects confined to one of a small number of parallel planes. The
wavefront calculations are done in a direction parallel to these
object planes. The lenses and in the scenes are actually thin regions
of slowly varying transmissive (a bit like Fresnel optics).
The light sources in the the first object plane (plane @b(a))
generate a wavefront which we will call @b(WF)@-(a). @b(WF)@-(a)(i,j)
where i and j are integers between 0 and 511, represents one complex
coefficient in a half wavelength grid across the area of plane @b(a).
The effect of @b(WF)@-(a) on @b(b), the second plane, (call it
@b(WF)@-(ab)) is the sum of the effects of every @b(WF)@-(a)(i,j) on
every coefficient @b(WF)@-(ab)(k,l). Each such contribution involves
noting the distance between the @b(WF)@-(a)(i,j) and the
@b(WF)@-(ab)(k,l) cells and adjusting the phase (because of the time
delay introduced by travelling over such a distance) and the amplitude
(because of the inverse square attenuation or equivalently because of
the amount of solid angle cell @b(WF)@-(ab)(k,l) occupies when seen
from @b(WF)@-(a)(i,j)). This operation is equivalent to a
multiplication of the complex valued wavefront coefficient by a
complex valued propagation coefficient to obtain a complex valued
contribution to the new wavefront. Doing this process
straightforwardly means multiplying each of our 1024@+(2) wavefront
coefficients by 1024@+(2) propagation coefficients and summing the
products appropriately into 1024@+(2) resultant waveform coefficients.
This is over a trillion complex multiplications and additions, and too
expensive for present day conventional hardware. Fortunately there is
a cheaper way to achieve the same result. Note that for a given k and
l the distance from @b(WF)@-(a)(i,j) to @b(WF)@-(ab)(i+k,j+l), and
thus the propagation coefficient, is the same for all values of i and
j. Call this coefficient @b(P)@-(a,b)(k,l). We can now express the
resultant wavefront by
@center[@b(WF)@-(ab)(m,n) = Sum@-(i,j) @b(P)@-(ab)(m-i,n-j) @b(WF)@-(a)(i,j)]
This is the convolution of @b(WF)@-(a) with @b(P)@-(ab), and
will be denoted by @b(WF)@-(a)@b(*)@b(P)@-(ab). Such a convolution
can be accomplished by taking the Fourier transforms, multiplying them
term by term then taking the inverse Fourier transform of the result:
@center[@b(WF)@-(ab) =
@b(WF)@-(a) @b(*) @b(P)@-(ab) =
@b(F)@+(-1)(@b(F)(@b(WF)@-(a)) @b(.) @b(F)(@b(P)@-(ab)))]
where @b(F) denotes Fourier transform and @b(.) denotes the termwise
product of two arrays of like dimension to obtain a third. Since
@b(P)@-(ab) is an invariant of the scene, @b(F)(@b(P)@-(ab)) can be
calculated once and for all when the scene is created. The symmetries
@center[@b(P)@-(ab)(i,j) = @b(P)@-(ab)(j,i) = @b(P)@-(ab)(-i,j) = @b(P)@-(ab)(j,-i)]
can help cut down that cost a little.
The Fast Fourier Transform (FFT) is a method of arranging the
calculations so that only about @i(n)@ log@-(2)@i(n) multiplications
and a similar number of additions are required. A two dimensional
transform like we require can be accomplished by taking a one
dimensional FFT of each row, replacing each row by its transform, then
doing one dimensional FFT's of the columns of the result.
In our case the one dimensional transforms have 1024 points,
and require about 10,000 complex multiplies each. There are 2x1024 of
them to be done per 2D transform, for a total of about twenty million
multiplies. The inverse transform is computationally equivalent to
the forward one, so the FFT approach requires 60 million multiplies,
compared to the million million of the direct method. If the F(P) is
assumed to be precomputed the cost drops to 40 million.
The FFT approach is thus about 20,000 times faster than the
direct convolution method. If the naive ray tracing schemes were
applied to the same scene they would work essentially by the direct
method, though probably in a less orderly fashion. No organization
analogous to the FFT method for wavefronts has been offered for the
ray schemes, so this 20,000 times speedup is a major advantage for the
wave approach.
Getting back to our scene, we have now calculated
@b(WF)@-(ab), the wavefront impinging on plane @b(b). The
transmission coefficients of plane @b(b) (@b(T)@-(b)) which tell of
the attenuation and refraction caused by the substances at @b(b) are
multiplied term by term with the incoming wave front (a mere million
complex multiplies) to generate a modified front, to which is added
the effect of any light sources at @b(b) (@b(L)@-(b)) to give us
@b(WF)@-(b), the wavefront leaving plane @b(b).
@center[@b(WF)@-(b) =
@b(WF)@-(ab)@b(.)@b(T)@-(b)+@b(L)@-(b) =
(@b(WF)@-(a)@b(*)@b(P)@-(ab))@b(.)@b(T)@-(b)+@b(L)@-(b)]
The reflection coefficients of @b(b) (@b(R)@-(b)) are also
multiplied by @b(WF)@-(ab) to obtain the wave reflected back towards
@b(a) from @b(b). We call it a waveback, and designate it
@b(WB)@-(a). @b(WB)@-(a) is stored for future reference.
We are now ready to carry the wavefront from @b(WF)@-(b) to
and through the next plane @b(c) in exactly the same manner in which
we moved it from @b(a) to @b(b):
@center[@b(WF)@-(c) = (@b(WF)@-(b)@b(*)@b(P)@-(bc))@b(.)@b(T)@-(c)+@b(L)@-(c)]
we also save for future reference
@center[@b(WB)@-(b) = (@b(WF)@-(b)@b(*)@b(P)@-(bc))@b(.)@b(R)@-(c)]
(doing the convolution just once for the two results, of course).
Each such move costs about 50 million complex multiplies (=
200 million real multiplications) and can be done in about five
compute minutes on a typical medium sized machine like our VAX 11/780.
With a 15 megaflop array processor like the CSPI MAP-300 the time is
less than 15 seconds.
After a complete sweep across the scene (Figure 2) we have all
the first order lighting effects if the light sources were all in
plane @b(a), and some higher order ones, as when light is scattered by
@b(b) then @b(c) to illuminate @b(d). We now sweep backwards, from
the far end of the scene towards @b(a). Each step is similar to the
forward sweep except that at each plane the waveback left behind in
the first pass is added to the outgoing wavefront, like the light
sources at that plane, simulating the contribution of reflected light
from the last pass. For example, the return sweep modifies the
wavefront leaving plane @b(c) into the one leaving @b(b) by the
transformation
@center[@b(WF)@-(b) =
(@b(WF)@-(c)@b(*)@b(P)@-(ab))@b(.)@b(T)@-(b) + @b(L)@-(b) + @b(WB)@-(a)]
in the same step @b(WB)@-(c), the effect of light from the return sweep
reflected from @b(b) to @b(c) is stored
@center[@b(WB)@-(c) = (@b(WF)@-(c)@b(*)@b(P)@-(ab))@b(.)@b(R)@-(b)]
After several back and forth passes, each involving @b(WB)'s
from the previous pass, the wavefront contains the effects of highly
indirect light. The iteration can simply be stopped after a fixed
number of sweeps, or the wavefront can be checked for significant
changes between forward/back cycles. If @b(WF)@-(a) is nearly the
same before and after two sweeps (by a sum of squares of coefficient
differences, for instance), the process has settled down. Ten passes
is usually enough, though pathological cases like scenes containing
two facing mirrors may require more.
Images are generated from the ultimate wavefront in a final
step by passing the wavefront through a new plane which contains a
simulated lens that projects the image onto a virtual surface where
magnitude (absolute value) of the complex wavefront coefficients are
extracted, scaled and transcribed into a pixel array.
Color pictures are made of three such images, one each for
red, green and blue, with different reflection, transmission and
emission properties of the things in the object planes. The simulated
wavelength is the same in all cases, so the simulated color has
nothing whatever to do with the physiological color property of real
light, which depends on wavelength
Figure 4 contains two planes, and was subjected to eight
wavefront passes (four each way). The first plane contains a coherent
light source which projects a beam onto the scattering "lampshade" in
the second plane. The scattered light reflects back to the first plane
and illuminates the "checkerboard" there, and so on for eight
bounces. A thin lens in the second plane distorts some of the
checkerboard behind it. The two planes are 300 wavelengths apart, and
512 half-wavelengths on a side. Light from the scene was intercepted
by a square lens of the same size 10,000 wavelengths away, which
formed the displayed image yet another 10,000 wavelengths further on.
The illumination is monochromatic, but color is used in one
version of the image to represent the relative phase of the incident
light. The red, green and blue components of the color are varied with
phase like the three components of three phase power; red varies with
the sine of the phase angle, green with the angle shifted 120 degrees,
blue by phase shifted 240 degrees.
We don't yet have an array processor, so the compute time was
about an hour. With an FPS-100 it could have been generated in 8
minutes. CSPI's MAP-300, with twice as many multipliers, could have
done it in 4. It should also be noted that the calculations are
highly suitable for division among parallel processors (most of the
cost is in the 1024 point complex FFT's, and 1024 of these, one for
each row or column, could be carried out in parallel). Since the
scenes contain extended light sources and model multiple indirect
light scattering, all conventional ray methods would have required at
least thousands of compute hours to produce the same result.
@heading(Some Fine Points)
Since the illumination is by simulated monochromatic light,
potentially all in phase, there is a tremendous opportunity for
probably undesirable effects such as interference fringes and "laser
speckle" to make themselves manifest. These are made worse by the
maximally long wavelength of the light chosen to reduce the amount of
computation to the bare minimum required for the image and object
resolution desired.
If the scene is lit mainly by light sources extending over
many half-wavelength cells, most of the problem can be eliminated by
randomizing the phase of the light emitted by the individual cells.
This effectively simulates a spatially non-coherent source, and blurs
interference patterns into oblivion. If the light comes mostly from a
moderate number of very localized regions, the picture can be
recomputed a few times with different relative phases between the
sources, and the results averaged. This simulates phase non-coherence
between the different sources.
If the scene is lit by a very small number of very localized
sources simple randomization of light source phases won't help. If
much of the actual illumination in those scenes arrives indirectly by
scattering of the light from objects of high albedo, randomizing the
reflective and transmissive phase shifts of the scattering objects
will improve things a little.
In general it also helps if objects have "soft" edges which
blend gradually over the distance of a few cells from the properties
of their surrounding to the optical properties of their interiors.
The above techniques remove low frequency interference fringes
but do not significantly affect the high frequency phase interferences
analogous to the "laser speckle" seen when viewing objects in coherent
light. This pixel graininess can be smoothed by computing the image
to higher resolution than needed and averaging, or by recomputing it
several times with different random phases in the extended light
sources or the light scattering surfaces, and averaging the results.
The problems are highly analogous to the "aliasing" problem
encountered in conventional 3D methods.
@heading(Bigger Games)
So far we have dealt with very simple scenes in which all the
matter was confined to a small number of parallel planes. Not
co-incidentally these are particularly suitable subjects for the wave
approach.
More general scenes can be constructed by increasing the
number of matter-containing planes. In the most general case there is
such a plane every half wavelength, and fully three dimensional
objects are described as a stack of parallel slices.
This increases the required computation by a factor of several
hundred, and also creates a major storage problem. A special purpose
Fourier device able to compute a 1024 point transform in 100 usec
would allow fully general scenes with complete illumination to be
calculated in about an hour by the straightforward application of the
above method. It would be necessary to store about one billion
complex numbers, representing the approximately 1000 wavebacks
generated, temporarily during the computation.
Storage for a billion complex numbers (the precision need not
be great) may not be excessive in a machine than can compute at 500
megaflops, but it doesn't make the problem easier. It may be possible
to avoid explicit storage of the wavebacks by combining them as they
are generated, dragging them along backwards with the wavefronts in a
time-reversed way. The composite waveback would be subjected to
transformations inverse to the ones it will experience when the sweep
reverses. Or maybe not. All versions of this idea I've examined have
some unavoidable high order light interactions that do not have
analogs in the physical world.
@heading(Waves and the Oct-Tree)
The above schemes, particularly the fully general one, have an
unpleasant feature. They do as much work on the empty spaces in a
matter-containing plane as they do on the substance. Propagating a
wave through a dense stack of matter planes involves hundreds of
half-wave baby steps, even if the planes contain only a speck of
matter each.
We have seen that a wavefront can be transmitted across an
arbitrarily large gap of empty space by a single FFT convolution. In
fact, it is just as easy to send it through an arbitrarily large mass
of any substance of uniform attenuation and phase shift; only the
propagation coefficients change. Since typical scenes contain large
homogeneous volumes, we may be wasting some effort; computing over
volumes when only the surface areas require the full power of the
incremental method.
The fundamental space underlying our 3D models is a cubic
array of half-wavelength cells. We can organize this into convenient
large regions by the following process. Suppose our scene is
contained in a cube 1024 half wavelengths on a side. If this cube is
homogeneous, stop; we have our simple description. If it is not,
divide it into eight 512 on a side subcubes, and represent it by an
eight way branching node of a tree pointing to the results of the same
procedure applied recursively to the subcubes. The division process
at every level stops if the subcube it is handed if homogeneous, or if
it is a primitive half-wavelength cell (in which case it is
homogeneous by definition).
A wavefront can be propagated through such an "oct-tree"
structure irregularly, advancing through large undivided subcubes
quickly, creeping through the hopefully few heavily subdivided
regions. There is a serious complication.
While the wavefronts in the "stack of planes" approach of the
last section were sent between parallel planes, with light leaving at
the sides assumed lost (or wrapped around, depending on the details of
the convolution), waves leaving at right angles to the principal
wavefront direction in a subcube of the oct-tree must be accounted
for, because it affects adjacent subcubes. Part of the problem is a
matter of bookkeeping, solved by having two wavefront storage arrays
at every boundary surface between distinct subcubes, one for each
direction of light travel (an alternative way of looking at it is that
each distinct subcube of the tree has one outgoing wave coefficient
array on each of its six faces. Incoming waves are stored in the
outgoing wave arrays of adjacent subcubes).
The other half of the problem is calculating the outgoing
wavefront on a surface orthogonal to the plane of the incoming
wavefront description. A wave entering one face of a cube leaves by
one parallel face and four at right angles. While the parallel face
calculation is a simple convolution, the orthogonal ones are not so
easy.
Whereas the propagation coefficients between two co-ordinates
in an incoming and outgoing waveform in a parallel propagation are a
function simply of displacement in X and Y, of the form
@b(F)(@i(x@-(out)-x@-(in),y@-(out)-y@-(in))), the coefficients in an
orthogonal transfer look like
@b(F)(@i(x@-(out)@+(2)+x@-(in)@+(2),y@-(out)-y@-(in))), reflecting
that the rows (say) of the outgoing array are parallel to the rows the
incoming array, but the columns in this 3D situation are at right
angles. In this example the Y direction can be done by an FFT
convolution, but the transformation over the X coordinate must be done
directly (unless I've missed something). For a face size of 1024 by
1024 we must first do 1024 FFT convolutions for the Y direction (30
million multiplies) giving us 1024 vectors each with 1024
elements. These vectors must then each be multiplied by 1024
coefficients and summed into the output array, for a total of over a
thousand million multiplications, a number which dominates the FFT
step. A cube has four such difficult faces, but the calculations can
be shared between parallel pairs of outgoing faces and a 1024 on a
side cube requires a total of about two billion multiplications.
A cube of side @i(n) needs about 2@i(n)@+(3) multiplies.
Since the amount of work thus goes up approximately linearly in the
volume, we gain little by subdivision, and the method is of limited
interest unless a faster way of calculating the sidegoing wavefronts
is found.
@heading(The Grand Synthesis)
Note that in the stack of planes approach, the wavefront
propagation consists of a chain of linear transformations; a waveform
is alternately convolved with a propagation array then multiplied by
an array of transmission coefficients. Both these operations are
special cases of a general linear transform which can be expressed as
a huge array. If we treat our 1024 by 1024 wavefront arrays as a
1024@+(2) (call it a million) element vector (name it @b(V)), then a
general linear transform is a million by million matrix (@b(A)) to
multiply this vector by, plus a million element vector offset (@b(B))
to be added (@b(V'=AV+B)). A convolution is simply a special case of
such a matrix @b(A) where every row is the same as the previous one
except shifted right one place. The transmission coefficients can be
represented in a matrix zero everywhere except on the main diagonal
where the coefficients reside. Light sources are simply represented
in the constant vector @b(B).
Now a chain of linear operations can be condensed into a
single one by multiplying the matrices and transforming and adding the
vectors. This implies that a complex scene consisting of a stack of
planes (or otherwise described) can in principle be multiplied out
into a single matrix and offset. The effect on a wavefront of a scene
so expressed can be determined by a single matrix multiply.
The matrix may contain the effect of multiple passes through
the scene, and thus provide the effect of high order light indirection
in a single multiply. Multiplying a complex wavefront (representing
external illumination) by such a matrix produces an image-bearing
front which shows the scene encoded in the matrix freshly illuminated.
Although manipulation of such matrices, with a trillion
elements, is beyond the means of present day machinery, they may be
practical someday when computers become another factor of one million
more powerful. For 1024 on a side planes they become cheaper than the
explicit scene representation when the stack of planes is more than a
million deep. More importantly, the matrix representation is far more
general than the matter plane approach, and can represent any linear
effect.
@heading(Conclusion)
A new approach to generation of shaded three dimensional
computer imagery has been presented. It models light as wavefronts
rather than rays, and has major computational advantages when
representation of extended light sources and multiply scattered light
is desired. Even so, the simplest applications of the method tax the
resources of present day computers. The saving grace is that
conventional ray techniques don't even come close.
Some half baked future applications were also presented,
useful only when several times more computation power becomes
available. They barely scratch the surface.