Waves not Corpuscles!

Hans P. Moravec

September 22, 1980

@heading(Introduction)

Early work on the representation of computer modelled three dimensional scenery for human consumption took an engineering drawing approach. Line drawings representing orthographic projections of all the edges of planar objects came first. Perspective projections were only slightly harder. In complicated situations the drawing soon became an uninterpretable jumble of overlapping lines. Removing the lines obscured by surfaces, the "hidden line removal" problem, became the first major challenge of 3D graphics.

The ante was soon raised when devices capable of filling display areas became available. Simple line drawings could be converted to much more realistic shaded pictures simply by appropriately coloring in the space between the lines. The hidden line problem evolved into the qualitatively different "hidden surface" problem. The realism of the images was further enhanced by increasingly accurate illumination models for computing the surface coloration. Matte and specular reflection were modeled, and curved surfaces were approximated first by the lighting models alone, and later by actually representing objects with simple curved primitives, or cubic patches. The absence of shadowing effects in these otherwise very authentic pictures was soon attacked with techniques that did hidden surface elimination from the light sources as well as from the simulated camera. The stakes have recently gone up again with the introduction of ray tracing methods that simulate refraction and specular reflection not just of light source rays, but of images. Curved mirrors and lenses can now be realistically simulated.

@heading(Realityward, Ho!)

With each increment in the faithfulness of the process, new properties of real light have had to be added. Thus far the evolution has been in the direction of the particle theory of light, probably because straight rays preserve most of the properties of 3D computer graphics' engineering drawing roots. The latest additions virtually demand a simulation of multiply rebounding light. An obvious and desirable next step in the process is the simulation of scattered as well as specular reflection of light and imagery from surface to surface. While a specular/refractive bounce can convert a ray from a simulated camera or light source into two rays, a scattering can make hundreds or thousands, heading off in a multitude of directions. Since pictures modelling only specular bounces themselves take about an hour of computer time to generate, simulation of multiple scattering (and incidentally, extended light sources) is out of reach for the immediate future.

@heading(A Way Out)

One solution may be to go even further in modelling the physical reality. The ray representation implicitly assumes a virtually infinite precision in the position of objects and the heading of rays. This leads to some unnecessary calculations, as when several rays bouncing from nearly the same surface point in nearly the same direction (perhaps originating from different light sources) are propagated by separate calculations. We can evade many of these inefficiencies by using the particle theory's major competitor, the wave theory of light. Rather than multitudes of rays bouncing from light sources and among objects, the illumination is done by complex wavefronts described over large areas. Such wavefronts can be described by arrays of complex numbers, giving the amplitude and relative phase of portions of the front, embedded in the simulated space. Such wavefront descriptions are very like physical holograms.
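In present-day terms, and purely as an illustration (numpy, the 1024-element width, and the variable names below are assumptions, not part of the proposal), such a description is just a two dimensional array of complex numbers:

@begin(example)
import numpy as np

N = 1024            # coefficients across the window, one every half wavelength
x = np.arange(N)    # column index of each cell in the wave window

# A monochromatic wavefront: each cell holds the amplitude and relative
# phase of its patch of the front as one complex number, much like one
# plane of a hologram.  Here, a unit-amplitude plane wave with a slight tilt.
tilt = 0.02         # phase advance per cell, in cycles
front = np.ones((N, 1)) * np.exp(2j * np.pi * tilt * x)

# The opposite-travelling "wave back" starts out empty.
back = np.zeros((N, N), dtype=complex)
@end(example)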
@heading(Simplifying Assumptions)

We will use light of a single wavelength. This wavelength must be no larger than the smallest object feature we wish to see. On the other hand it should not be too small, because the amount of computation increases dramatically as the wavelength drops. To faithfully represent a wavefront, there must be a coefficient every half wavelength across the width of a description. Thus the number of complex numbers needed in a description increases as the square of the inverse wavelength.

We also assume that only the steady state of the illumination is important; no impulse effects (a good approximation for the everyday world too, because light is so fast). This assumption allows us to play cavalierly with time. For instance, a wavefront may be propagated from one plane description to a parallel one some distance away in a single step, ignoring the time differences between perpendicular and diagonal paths.

@heading(A Solid Example)

Here we present one way of using the light wave idea. It is not necessarily the best. The scene is represented as a box roughly 1000 half wavelengths on each side. (It need not be explicitly stored, so long as any 1000^2 slice perpendicular to the X direction can be obtained at will. It could be in oct-tree form; for my first trick there will be just a handful of analytically described objects, with the slices computed as needed, and no oct-tree.) Call the negative X face the left of the box and the positive X face the right.

A wavefront containing contributions from all the illumination sources is introduced as a 1000^2 array of complex numbers on the left face. This array represents the right-travelling half of a full 360 degree (or 4 pi steradian, if you're the solid type) bidirectional front. The other half (the wave back?) is carried in a parallel 1000^2 complex array, which probably starts out all zeros. This pair of arrays is moved, relatively painfully, from the left face to the right in half-wavelength steps, about 1000 of them.

Each step involves multiplying the wavefront, term by term, by coefficients representing the matter at the slice the wavefront is currently traversing. Each half wavelength cell of the cube has two complex coefficients. One says how to modify the piece of the wavefront it encounters: its magnitude is the attenuation of the material and its phase shift encodes the refractive index. The other number is also multiplied by the wavefront coefficient, but the result is added to the wave back; it represents the reflectivity. Its phase shift lets you simulate surface roughness (and thus reflective scattering) without having to actually model the roughness geometrically. The wave back is then multiplied by the RECIPROCAL of the wavefront multipliers. This allows us to drag the wave back against its natural direction of propagation and reconstruct it on the way back (when we reach the right face, we will start moving leftward with our calculation).

After this multiplication step (a spatial filter), the wavefront is convolved with an array of coefficients which describe the attenuations, phase shifts and mixings that happen as a front moves through a half wavelength of empty space to a parallel plane. This is done by FFT: take a 2D Fourier transform of the spatially filtered wavefront, multiply it termwise by a (precomputed) Fourier transform of the propagation coefficients, and take an inverse FFT of the result. The wave back is moved forward (against its will) by a similar process, but using inverted propagation coefficients.
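A minimal sketch of one such half-wavelength step, again assuming numpy; the function and variable names are hypothetical, and the exact ordering of the reflection and transmission multiplies is my reading of the recipe above, not a prescription:

@begin(example)
import numpy as np

def step_slice(front, back, transmit, reflect, H):
    # front, back : 1000^2 complex arrays, the wavefront and the wave back
    # transmit    : per-cell complex coefficient; magnitude = attenuation,
    #               phase = refractive delay of the matter in this slice
    # reflect     : per-cell complex reflectivity, with phase jitter
    #               standing in for surface roughness
    # H           : precomputed 2D Fourier transform of the half-wavelength
    #               free-space propagation coefficients

    # Spatial filter: the matter acts on the front, and a reflected
    # portion is added into the wave back.
    back = back + reflect * front
    front = transmit * front

    # Multiply the wave back by the reciprocal so it can be dragged the
    # "wrong" way and reconstructed on the return sweep (this assumes the
    # transmission coefficients are nonzero).
    back = back / transmit

    # Propagation to the next parallel plane: the convolution is done as
    # a termwise product in the frequency domain.
    front = np.fft.ifft2(np.fft.fft2(front) * H)
    back = np.fft.ifft2(np.fft.fft2(back) / H)   # inverted coefficients
    return front, back
@end(example)

One sweep is then about 1000 calls to something like this, with a fresh pair of coefficient slices read or computed each time.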
After all that we are ready to creep over the next half wavelength, doing the same operations. After about 1000 such steps, we reach the right face. The identities of wave back and wavefront interchange, and we start doing the baby steps right to left. After about ten such sweeps we have pretty high order light indirections simulated. The resulting composite wavefront can be used to view the scene from virtually any angle; we essentially simulate lenses with different positions and orientations and do the appropriate convolutions (infinitely thin lenses are easiest).

@heading(Questions and Answers)

@b(Q): Why drag a double wavefront back and forth? Isn't it silly to do the work of moving it the wrong way just to bring it back again?

@b(A): The wave back collects the reflections. We could leave these behind as we creep along and retrieve them on the way back, but each such cache is a 1000^2 array of complex numbers, and as someone once said, "A megaword here and a megaword there and pretty soon you're talking about real memory."

@b(Q): Can't the entire scene be condensed into a single filter, rather than the chain of alternate space and frequency domain filters you've described?

@b(A): In principle yes, but not really. The space domain filter, which has only as many coefficients as there are in the wave description, and the propagation filter, which has the same property in the frequency domain, are very special cases of a general filter in which there is a different coefficient for the contribution of each piece of the input front to each piece of the output. Think of the input and output as n element vectors; the general filter is an n by n array. Light sources are a constant vector to be added to the product of the input vector and this array. In this formulation, in the space domain, the spatial filter is zeros everywhere but on the main diagonal. The propagation filter is a matrix where every row is the previous one rotated one place to the right (a circulant). In the frequency domain the nature of these matrices is swapped: the space filter becomes a rotated-row array, and the propagation step is a diagonal matrix. The diagonal form is very easy to multiply by, thus the FFT steps. This important property is lost as soon as we begin multiplying the filters by one another. We end up with (for the 1000 by 1000 fronts) a matrix with one trillion coefficients, which must be explicitly represented. Such a matrix can represent any possible scene in its entirety, and can be subjected to arbitrary illumination by multiplying it by an illumination wavefront, but the scale is out of hand: condensing the chain into it requires matrix products of roughly 10^18 multiplications, even if we had somewhere to store the trillion element result, and the "simple" vector multiplication that applies it is still a trillion multiplications. I've tried commuting the matrix chain to condense at least some of the filters together while preserving the diagonal property, but this does not appear possible; matrix transposes keep getting in the way.

@heading(Shortcuts and Notes)

The wavefront can be moved across an entirely empty gap in a single FFT step, by raising each coefficient in the FFT of the half-wavelength propagation filter to an integer power equal to the number of half wavelengths in the gap width.
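For instance (a sketch under the same numpy assumptions as above; H is the precomputed transform of the half-wavelength propagation coefficients and k the gap width in half wavelengths):

@begin(example)
import numpy as np

def cross_empty_gap(front, H, k):
    # Raising the transfer coefficients to the k-th power composes k
    # identical half-wavelength propagation steps into a single
    # frequency-domain multiply, so one FFT round trip covers the gap.
    return np.fft.ifft2(np.fft.fft2(front) * H**k)
@end(example)

Applying the half-wavelength step k times gives the same result; the power form just avoids the intermediate transforms.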
A cross section of at least about 1000 by 1000 wavelengths is needed for a good resolution picture, because you can't get more pixels in an image than there are elements in the wavefront (if you try, they get very blurry). Color is generated by doing the imaging three times for three different kinds of light (of the same wavelength, but of different colors!); the scene properties are different for the different light types. Anti-aliasing happens automatically (in fact you can't avoid it; the image is generated only to the resolution you really need). Diffraction effects will be seen.

For full diffuse and scattered indirect lighting the wave method is much more efficient than ray tracing. The n log n FFT replaces a step that (at least naively) must be done in n^2 time with ray tracing. Calculations that involve coincident rays, generated by scattering from different illumination sources off the same surface points, are handled separately in all existing ray models. In the wave formulation they additively merge into a single wavefront almost immediately, and do not require redundant calculations. Note that though the complexity of the wavefront increases as we sweep it back and forth over the scene, the number of coefficients in it does not. The wave model carries only the essential information.

@heading(Costs)

For a 1000^3 scene most of the work in illumination will be in the 8 million 1000 point (one dimensional) FFTs per sweep. Each FFT requires 20,000 scalar multiplications. At 10 us per multiplication plus overhead, we have about 2 million seconds or 500 hours per sweep. Though this is better than ray tracing for full illumination, it's clear that the first experiments will be made at much smaller resolutions, and the pictures will be fuzzy!

While ray tracing can be sped up with specially designed hardware, which is currently nowhere in existence, the wave method is excellently suited to commercially available signal processing boxes. Some of these, using a combination of pipelining and parallelism, can execute effectively 100 million scalar multiplies per second when doing suitable tasks (like FFTs!). This is 1000 times faster than we assumed for a general purpose machine above, and cuts the sweep time to a reasonable half hour. Multiple passes will still take several hours, but remember that once we have our wavefront, we can generate many views of the same scene. The method, of course, is perfectly able to make use of more than one such signal processor, and speeds up linearly in their number.
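The per-sweep figure can be checked with a little arithmetic (a sketch; the decomposition of the 8 million count is my reconstruction, assuming the front and the back each get a forward and an inverse 2D transform of 2000 row and column FFTs per step):

@begin(example)
steps = 1000                  # half-wavelength slices across the box
ffts_per_step = 2 * 2 * 2000  # front and back, forward and inverse,
                              # 1000 row + 1000 column FFTs per 2D transform
mults_per_fft = 20000         # scalar multiplications per 1000 point FFT
seconds_per_mult = 10e-6      # 10 microseconds on a general purpose machine

total_ffts = steps * ffts_per_step                            # about 8 million
hours = total_ffts * mults_per_fft * seconds_per_mult / 3600
print(total_ffts, hours)                                      # roughly 450 to 500 hours
@end(example)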
@heading(Postscript)

Typical array processors, as available from FPS or CSPI, provide only about 12 million floating point operations per second, at a cost of about $50K.

There is a bug in the scheme outlined above. Because the wave back is brought past the places where reflections generate it, it can have non-physical effects. For instance, if the system is modelling a convex shoplifting mirror being struck by a plane wave, the reflected wave diverges from the mirror as it should. But before that happens, the reflected light is dragged behind the mirror to the far end of the modelled space. While being so dragged it undergoes the inverse of the effects that will happen to it on the return trip (for instance, in passing backwards through a very dense material its amplitude is greatly increased, to compensate for the attenuation it will experience on the way back). But since reflections are also computed on the return journey, this virtual light can have visible effects. In particular, there seems to be a virtual focus behind the mirror where the plane wave converges. This focus will act as a light source in the model as described, and illuminate the nearby scenery. The bug can be eliminated by storing the reflected portions of the wavefront at the planes of reflection, but this requires an enormous amount of storage. Perhaps reflections should be calculated only on the back sweeps, with the forward wave being reconstructed. This problem requires more thought.

Though generating views of general scenes at 1000^3 resolution is computationally out of reach for an unaided VAX, certain special scenes can be handled quickly to high resolution. In particular, if the scene consists of a small number (like three or four) of planes parallel to the wave windows, a pass can be made in just a few (n-1, where n is the number of matter-bearing planes) FFT steps. On a VAX each FFT step for a 1000^2 array should take about a minute, so each pass of the wavefront takes two or three minutes. Also, to prevent the virtual focus defect, one needs to save the reflected wavefronts only at each matter plane. These planes are thin, but can be pretty interesting. They can contain transparent, translucent and opaque regions, and lenses and mirrors. Even though perfectly flat, the lenses do interesting things if their refractive index varies across their radius. The mirrors can also seem to be curved, because varying phase shifts can be introduced in bounces off different parts of them. Doing the FFT steps in the most straightforward way causes the pattern to be repeated apparently indefinitely in all directions, so the scene would appear to be a set of parallel infinite planes. I have in mind a scene that has a checkerboard at the bottom, with a few opaque squares and lenses and mirrors hovering above it at a couple of levels. The view would essentially be from above.
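A minimal sketch of one pass through such a scene, under the same numpy assumptions as before; the plane masks and names are invented for illustration, and the gap jumps use the power trick from Shortcuts and Notes:

@begin(example)
import numpy as np

def sweep_planes(front, planes, gaps, H):
    # planes : list of (transmit, reflect) pairs, each a 1000^2 complex mask
    #          (checkerboard, flat lens with radially varying phase, mirror ...)
    # gaps   : widths, in half wavelengths, of the empty gaps between
    #          successive planes (one fewer than the number of planes)
    # H      : FFT of the half-wavelength free-space propagation coefficients
    reflections = []                    # cached at each matter plane, to avoid
                                        # the virtual focus defect
    for i, (transmit, reflect) in enumerate(planes):
        reflections.append(reflect * front)
        front = transmit * front
        if i < len(gaps):               # n-1 FFT steps for n matter planes
            front = np.fft.ifft2(np.fft.fft2(front) * H**gaps[i])
    return front, reflections
@end(example)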