Waves not Corpuscles!
Hans P. Moravec
September 22, 1980
@heading(Introduction)
Early work on representation of computer modelled three
dimensional scenery for human consumption took an engineering drawing
approach. Line drawings representing orthographic projections
of all the edges of planar objects came first. Perspective projections
were only slightly harder. In complicated situations the drawing
soon became an uninterpretable jumble of overlapping lines. Removing
the lines obscured by surfaces, the "hidden line removal" problem,
became the first major challenge of 3D graphics. The ante was soon raised
when devices capable of display area filling became available.
Simple line drawings could be converted to much more realistic
shaded pictures simply by appropriately coloring in the space
between the lines. The hidden line problem evolved into the
qualitatively different "hidden surface" problem. The realism
of the images was further enhanced by increasingly accurate
illumination models for computing the surface coloration.
Matte and specular reflection were modeled and curved surfaces
were approximated first by the lighting models alone, and
later by actually representing objects with simple curved
primitives, or cubic patches. The absence of shadowing effects
in these otherwise very authentic pictures was soon attacked
with techniques that did hidden surface eliminations from
the light sources as well as the simulated camera.
The stakes have recently gone up again with the
introduction of ray tracing methods that simulate refraction
and specular reflection not just of light source rays, but
of images. Curved mirrors and lenses can now be realistically
simulated.
@heading(Realityward, Ho!)
With each increment in the faithfulness of the process
new properties of real light had to be added. Thus far the
evolution has been in the direction of the particle theory
of light, probably because straight rays preserve most of the
properties of 3D computer graphics' engineering drawing roots.
The latest additions virtually demand a simulation of
multiply rebounding light. An obvious and desirable
next step in the process is the simulation of scattering,
as well as specular, reflection of light and imagery from surface
to surface. While a specular/refractive bounce can convert
a ray from a simulated camera or lightsource into two rays,
a scattering can make hundreds or thousands, heading off
in a multitude of directions. Since pictures modelling only
specular bounces already
take about an hour of computer time to generate, simulation
of multiple scattering (and incidentally, extended light
sources) is out of reach for the immediate future.
@heading(A Way Out)
One solution may be to go even further in modelling
the physical reality. The ray representation implicitly assumes
a virtually infinite precision in the position of objects
and the heading of rays. This leads to some unnecessary
calculations, as when several rays bouncing from nearly the
same surface point in nearly the same direction (perhaps
originating from different light sources) are propagated
by separate calculations.
We can evade many of these inefficiencies by
using the particle theory's major competitor, the wave
theory of light. Rather than multitudes of rays bouncing
from light sources and among objects, the illumination is done
by complex wavefronts described over large areas. Such wavefronts
can be described by arrays of complex numbers, giving the amplitude
and relative phase of portions of the front, embedded in the
simulated space. Such wavefront descriptions are very like
physical holograms.
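Concretely, such a description is nothing more than an array of complex
numbers. The little sketch below (the function name, sample spacing, and
tilted plane wave are my own illustration, not anything above) shows how a
front's amplitude and direction both fit in such an array:

```python
import cmath
import math

def plane_front(n, angle_deg, wavelength=1.0):
    """One row of a wavefront description: a complex number every half
    wavelength; magnitude = amplitude, argument = relative phase of that
    piece of the front.  Here: a unit-amplitude plane wave tilted
    angle_deg from the normal of the sampling line."""
    dx = wavelength / 2.0                      # sample spacing
    k = 2 * math.pi / wavelength               # wavenumber
    tilt = math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * k * tilt * i * dx) for i in range(n)]

front = plane_front(8, 30.0)
# unit amplitude everywhere; the travel direction lives entirely
# in the linear phase ramp from sample to sample
```

A physical hologram records exactly this sort of amplitude-and-phase
pattern, which is what makes the analogy apt.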
@heading(Simplifying Assumptions)
We will use light of a single
wavelength. This wavelength must be no larger than the smallest
object feature we wish to see. On the other hand it should not
be too small because the amount of computation
increases dramatically as the wavelength drops. To faithfully
represent a wavefront, there must be a coefficient every
half wavelength across the width of a description. Thus the number
of complex numbers needed in a description increases as the
square of the inverse wavelength. We also assume that
only the steady state of the illumination is important;
there are no impulse effects (effectively true of the everyday world
as well, because light is so fast).
This assumption allows us to play cavalierly with time. For instance,
a wavefront may be propagated from one plane description to a
parallel one some distance away in a single step, ignoring the
time differences between perpendicular and diagonal paths.
@heading(A Solid Example)
Here we present one way of using the light wave idea. It is
not necessarily the best.
The scene is represented as a box roughly
1000 half wavelengths on each side. (It need not be explicitly stored,
so long as you can get at any 1000^2 slice perpendicular to the X direction
at will. It could be in oct-tree form; for my first trick there will be
no oct tree, just a handful of analytically described objects, with the
slices computed as needed.) Call the negative X face the left of the box
and the positive X face the right. A wavefront
containing contributions from all the illumination sources is introduced
as a 1000^2 array of complex numbers on the left face. This array
represents the right-travelling half of a full 360 deg (or 4 pi, if you're
the solid type) bidirectional front. The other half (the wave back?) is
carried in a parallel 1000^2 complex array, which probably starts out all
zeros. This pair of arrays is moved, relatively painfully, from the left
face to the right, in half-wavelength steps, about 1000 of them. Each step
involves multiplying the wavefront, term by term, by coefficients
representing the matter at the slice the wavefront is currently
traversing. Each half wavelength cell of the cube has two complex
coefficients. One says how to modify the piece of the wave front it
encounters. Its magnitude is the attenuation of the material and its phase
shift encodes the refractive index. The other number is also multiplied by the
wave front coefficient, but then added to the wave back. It represents the
reflectivity. Its phase shift lets you simulate surface roughness (and
thus reflective scattering), without having to actually geometrically
model the roughness. The wave back is then multiplied by the RECIPROCAL of
the wavefront multipliers. This allows us to drag the waveback against its
natural direction of propagation, and reconstruct it on the way back (when
we reach the right face, we will start moving leftward with our
calculation). After this multiplication step (a spatial filter), the
wavefront is convolved with an array of coefficients which describe the
attenuations, phase shifts and mixings that happen as a front moves
through a half-wavelength of empty space to a parallel plane. This is
done by FFT, and involves doing a 2D fourier transform of the spatially
filtered wavefront, multiplying it termwise by a (precomputed) fourier
transform of the propagation coefficients, and doing an inverse FFT of the
result. The waveback is moved forward (against its will) by a similar
process, but using inverted propagation coefficients.
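A toy version of one such step, in one dimension and with every name my
own invention, might look like the following (a naive O(n^2) transform
stands in for the FFT, and the coefficient conventions are a sketch of the
scheme just described, not a tested implementation of it):

```python
import cmath

def dft(x, inverse=False):
    # naive discrete Fourier transform; at 1000^2 scale this would be an FFT
    n, s = len(x), (1 if inverse else -1)
    out = [sum(x[k] * cmath.exp(s * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def half_wave_step(front, back, transmit, reflect, prop_fft):
    """One half-wavelength step of the sweep (1-D sketch).
    transmit[i]: attenuation (magnitude) and refraction (phase) of cell i.
    reflect[i]:  reflectivity of cell i; its phase can mimic roughness.
    prop_fft:    precomputed transform of the free-space propagation filter."""
    # spatial filter: reflections feed the wave back, which is then dragged
    # against its natural direction by the RECIPROCAL of the multipliers
    back  = [(b + f * r) / t
             for b, f, r, t in zip(back, front, reflect, transmit)]
    front = [f * t for f, t in zip(front, transmit)]
    # frequency-domain convolution carries each array to the next plane;
    # the back uses the inverted propagation coefficients
    front = dft([F * H for F, H in zip(dft(front), prop_fft)], inverse=True)
    back  = dft([B / H for B, H in zip(dft(back),  prop_fft)], inverse=True)
    return front, back
```

In empty space (transmission 1, reflectivity 0, trivial propagation
filter) the step leaves both arrays unchanged, a handy sanity check.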
After all that we are ready to creep over the next half-wave,
doing the same operations. After about 1000 such steps, we reach the right
face. The identity of wave back and wave front interchange, and we start
doing the baby steps right to left. After about ten such sweeps we have
pretty high order light indirections simulated. The composite wavefront
resulting can be used to view the scene from virtually any angle; we
essentially simulate lenses with different positions and orientations and
do the appropriate convolutions (infinitely thin lenses are easiest).
@heading(Questions and Answers)
@b(Q): Why drag a double wavefront back and forth? Isn't it silly to
do the work of moving it the wrong way just to bring it back again?
@b(A): The wave back collects the reflections. We could leave these
behind as we creep along and retrieve them on the way back, but each such
cache is a 1000^2 array of complex numbers, and as someone once said, "A
megaword here and a megaword there and pretty soon you're talking about
real memory."
@b(Q): Can't the entire scene be condensed into a single filter,
rather than a chain of alternating space and frequency domain filters, as
you've done?
@b(A): In principle yes, but not really. The space domain filter which
has only as many coefficients as there are in the wave description and the
propagation filter which has the same property in the frequency domain are
very special cases of a general filter where there is a different
coefficient for the contribution of each piece of the input front to each
piece of the output. Think of the input and output as n element vectors;
the general filter is an n by n array. Light sources are a constant vector
to be added to the product of the input vector and this array. In this
formulation, in the space domain, the spatial filter is zeros everywhere
but on the main diagonal. The propagation filter is a matrix where every
row is the previous one rotated one place to the right. In the frequency
domain the nature of these matrices is swapped; the space filter becomes a
rotated row array, and the propagation step is a diagonal matrix. The
diagonal form is very easy to multiply by, thus the FFT steps. This
important property is lost as soon as we begin multiplying them by one
another. We end up with (for the 1000 by 1000 fronts) a matrix with one
trillion coefficients, which must be explicitly represented. Though such a
matrix can represent any possible scene in its entirety, and can be
subjected to arbitrary illumination by multiplying it by an illumination
wavefront, the scale is out of hand. The "simple" vector multiplication
requires 10^12 multiplications per application, and forming the matrix by
multiplying the chain together costs on the order of 10^18, even if we had
somewhere to store the
trillion element matrix. I've tried commuting the matrix chain to
condense at least some of them together while preserving the diagonal
property, but this does not appear possible. Matrix transposes keep
getting in the way.
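The special structure is easy to check on a small case. This sketch (names
mine) applies the rotated-row matrix directly, then again by transforming,
multiplying by the diagonal, and transforming back; the two routes agree,
which is exactly why the FFT steps are legitimate:

```python
import cmath

def dft(x, inverse=False):
    # naive discrete Fourier transform, adequate for a 4-element check
    n, s = len(x), (1 if inverse else -1)
    out = [sum(x[k] * cmath.exp(s * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def apply_rotated_rows(c, x):
    # the propagation matrix: row m is the first row rotated m places right
    n = len(x)
    return [sum(c[(m - k) % n] * x[k] for k in range(n)) for m in range(n)]

def apply_via_fft(c, x):
    # in the frequency domain the same matrix is diagonal, and its
    # diagonal entries are just the transform of the first row
    C, X = dft(c), dft(x)
    return dft([a * b for a, b in zip(C, X)], inverse=True)
```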
@heading(Shortcuts and Notes)
The wavefront can be moved across an entirely empty gap in a
single FFT step by raising each coefficient in the FFT of the half
wavelength propagation filter to an integer power equal to the number of
half wavelengths in the gap width.
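The gap shortcut can be sketched in one dimension (names mine): crossing k
empty half-wavelength gaps one at a time, and crossing them in a single
FFT round trip with the k-th power of the transfer function, give the same
front.

```python
import cmath

def dft(x, inverse=False):
    # naive discrete Fourier transform standing in for the FFT
    n, s = len(x), (1 if inverse else -1)
    out = [sum(x[k] * cmath.exp(s * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def cross_gap(front, H, steps):
    """Cross `steps` empty half-wavelength gaps in one round trip by
    raising each propagation coefficient H[j] to the `steps` power."""
    F = dft(front)
    return dft([f * h ** steps for f, h in zip(F, H)], inverse=True)
```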
At least about 1000 by 1000 wavelengths in cross section is needed
for a good resolution picture, because you can't get more pixels in an
image than there are elements in the wavefront (if you try, they get very
blurry).
Color is generated by doing the imaging three times for three
different kinds of light (of the same wavelength, but of different
colors!). The scene properties are different for the different light
types.
Anti-aliasing happens automatically (in fact you can't avoid it;
the image is generated only to the resolution you really need).
Diffraction effects will be seen.
For full diffuse and scattered indirect lighting the wave method
is much more efficient than ray tracing. The n log n FFT replaces a step
that (at least naively) must be done in n^2 time with ray tracing.
Calculations that involve coincident rays generated by scattering from
different illumination sources off the same surface points are handled
separately in all existing ray models. In the wave formulation, they
additively merge into a single wavefront almost immediately, and do not
require redundant calculations. Note that though the complexity of the
wavefront increases as we sweep it back and forth over the scene, the
number of coefficients in it does not. The wave model carries only the
essential information.
@heading(Costs)
For a 1000^3 scene most of the work in illumination will be in the
8 million 1000 point (one dimensional) FFT's per sweep. Each FFT requires
20,000 scalar multiplications. At 10 us per multiplication plus overhead,
we have about 2 million seconds or 500 hours per sweep. Though this is
better than ray tracing for full illumination, it's clear that the first
experiments will be made at much smaller resolutions, and the pictures
will be fuzzy!
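The arithmetic behind those figures can be laid out explicitly. The
decomposition of the 8 million FFT tally below (4 two-dimensional
transforms per step, forward and inverse for both front and back, each
being 2n row-and-column one-dimensional FFTs) is my own reading of where
the number comes from, not stated above:

```python
import math

n = 1000                                  # samples per front edge
steps = 1000                              # half-wavelength steps per sweep
ffts_per_step = 4 * (2 * n)               # 4 two-D transforms, 2n 1-D FFTs each
ffts_per_sweep = steps * ffts_per_step    # the 8 million 1-D FFTs
mults_per_fft = 4 * (n // 2) * round(math.log2(n))  # ~20,000 scalar multiplies
seconds = ffts_per_sweep * mults_per_fft * 10e-6    # at 10 us per multiply
hours = seconds / 3600                    # roughly 450; "500" once overhead is in
```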
While ray tracing can be sped up with specially designed hardware
which is currently nowhere in existence, the wave method is excellently
suited for commercially available signal processing boxes. Some of these,
using a combination of pipelining and parallelism, can execute effectively
100 million scalar multiplies per second when doing suitable tasks (like
FFTs!). This is 1000 times faster than we assumed for a general purpose
machine above, and cuts the sweep time to a reasonable half hour. Multiple
passes will still take several hours, but remember that once we have our
wavefront, we can generate many views of the same scene. The method, of
course, is perfectly able to make use of more than one such signal
processor, and speeds up linearly in their number.
@heading(Postscript)
Typical array processors, as available from FPS or CSPI,
provide only about 12 million floating point operations per
second, at a cost of about $50K.
There is a bug in the scheme outlined above.
Because the waveback is brought past the places where reflections
generate it, it can have non-physical effects.
For instance, if the system is modelling a convex shoplifting
mirror being struck by a plane wave, the reflected wave diverges
from the mirror as it should. But,
before that happens, the reflected light is dragged behind the mirror
to the far end of the modelled space. While being so dragged it
undergoes the inverse effects of what will happen to it
on the return trip (for instance, in passing backwards
through a very dense material, its amplitude is greatly increased,
to compensate for the attenuation that it will experience on the way back).
But since reflections are also computed on the return journey,
this virtual light can have visible effects.
In particular, there seems to be a virtual focus behind the
mirror where the plane wave converges. This focus will act as a light
source in the model as described, and illuminate the nearby scenery.
The bug can be eliminated by storing the reflected portions of the
wavefront at the planes of reflection, but this requires an
enormous amount of storage. Perhaps reflections should be
calculated only on the back sweeps, with the forward wave
being reconstructed. This problem requires more thought.
Though generating views of general scenes at 1000^3 resolution
is computationally out of reach for an unaided VAX, certain special
scenes can be handled quickly to high resolution.
In particular, if the scene consists of a small number
(like three or four) planes parallel to the direction of
the wave windows, a pass can be made in just a few (n-1, where
n is the number of matter-bearing planes) FFT steps. On a VAX each
FFT step for a 1000^2 array should take about a minute, so
each pass of the wavefront takes two or three minutes. Also,
to prevent the virtual focus defect, one needs to save the
reflected wavefronts only at each matter plane.
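The storage argument is easy to make concrete (the counts are mine, for a
hypothetical three-matter-plane scene): caching reflections at every
half-wavelength plane needs a billion complex numbers, while caching only
at the matter planes needs a few million.

```python
front_elems = 1000 * 1000            # complex numbers per 1000^2 wavefront
full_cache = 1000 * front_elems      # a cache at every half-wavelength plane
plane_cache = 3 * front_elems        # caches at, say, three matter planes only
```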
These planes are thin, but can be pretty interesting. They can
contain transparent, translucent and opaque regions, and lenses and
mirrors. Even though perfectly flat, the lenses do interesting things
if their refractive index varies across their radius. The mirrors can
also seem to be curved because varying phase shifts can be introduced
in bounces off different parts of them. Doing the FFT steps in the
most straightforward way causes the pattern to be apparently repeated
indefinitely in all directions, so the scene would appear to be a set
of parallel infinite planes.
I have in mind a scene that has a checkerboard at the bottom, with a
few opaque squares and lenses and mirrors hovering above it at a couple
of levels. The view would essentially be from above.