DirectX Media for Animation Media Types
This section describes the abstract notion of image, geometry, and sound in Direct Animation and the operations on these types. There are operations (such as importing an image or 3D model) for constructing basic values in these types. There are other operations for constructing richly-attributed values such as producing a globe as a textured sphere. Still other operations create composite types such as combining sound and geometry into audible geometry.
The media abstractions provide a conceptual way to think of the media types, how they vary in relation to time, and the variety of operations on these types, without exposing any of the implementation details. This eases the construction of mixed media animations and enables interesting optimizations and a wide space of implementation possibilities.
In general, the way the abstractions are defined and the way they are implemented are quite different. The former is designed to facilitate thinking and reasoning about the media types while the latter is intended for maximizing performance while being as true as is feasible to the abstractions. This difference between abstraction and implementation is at the heart of the declarative approach in Direct Animation.
In this section we discuss the different media types separately and the interesting operations on these types. In the next section we discuss the mixed media operations across these types.
A value of the abstract type image has the following properties:
Each of the operations on images and attributes for images is described in terms of this abstract model. It is important to note that we use the term image in the broad sense to include an imported bitmap, rendered text, or a rendered triangle. All 2D geometric primitives are mapped into images, and can be thought of as constructors of images.
We will now discuss the image coordinate system, ways to construct images, and, later, the operations of the image type.
The image coordinate system is where images live and is, therefore, called the image plane. It has the following characteristics:
The following diagram demonstrates this coordinate system.
Note that all image and 2D geometric primitives live in this same continuous coordinate system. This includes all of the vector2, point2, transform2, path2, matte, text, montage, and image values and operations that are discussed below.
On its own, the image plane is of infinite extent, with an origin and X and Y-axes. Direct Animation images are constructed in this abstract coordinate system. However, when it is time to display a Direct Animation image, a certain section of the infinite plane is mapped onto a region of a display device, as shown in the following figure.
The selection of the display region is up to the programmer and is outside the scope of Direct Animation. The mapping from the image plane into the display region is straightforward. The origin of the image plane is mapped to the center of the region, and then the mapping happens in like units of measure. For example, if there is a red point 2 cm above the origin in the image plane, this point maps to a red pixel 2 cm above the center of the display region. (Note that while display regions are commonly rectangular, this is not necessarily the case with windowless applets or ActiveX controls. Direct Animation intends to support such non-rectangular regions.)
If the author wants to view an image that maps outside the display region, then 2D transformations must be used to construct a new image that does fit within that region. For instance, if the author is interested in viewing the range from (10,10) to (15,15), he or she could construct two transformations. The first would translate this region's center (12.5,12.5) to the origin, and the second would scale the region to fit the display region. This is shown in the following example:
Transform2Bvr centeringXf = translate(toBvr(-12.5), toBvr(-12.5));
Transform2Bvr scalingXf = scale(div(viewportUpperRight.getX(), toBvr(2.5)),
                                div(viewportUpperRight.getY(), toBvr(2.5)));
ImageBvr newImg = origImg.transform(compose(scalingXf, centeringXf));
Note that compose is a transform composition operation and transforms apply from right to left. In other words, the last expression above is equivalent to:
ImageBvr newImg = origImg.transform(centeringXf).transform(scalingXf);
There are several ways to construct values of the abstract image type. Once these values are constructed they are identical in terms of the operations that are applicable on them.
An emptyImage is the null image. Like all images, it is of infinite extent. It is also transparent and undetectable throughout.
A detectableEmptyImage is transparent and detectable throughout. An image is said to be tangible at a certain point if it is either detectable or non-transparent at that point. Otherwise, the point is said to be non-tangible. Also, see the cropping function described below.
Importation of Images
A basic way to construct images is through the importation of bitmaps represented in commonly used formats. This provides good leverage from the multitude of image authoring tools that currently exist. The form for importing an image is:
ImageBvr im = importImage("foo.jpg");
Direct Animation supports the importation of bitmaps in the .JPG, .GIF, and .BMP formats. The string reference to the image could be any valid URL. The obtained result is as follows:
Upon first reading, this may sound pedestrian. However, remember that the relationship between pixels and meters varies from display to display according to its resolution. Thus, while one display may have a ratio of 18,000 pixels per meter, another may have a ratio of 22,000 pixels per meter. For example, a 180-pixel-wide bitmap is 1 cm wide on the first display but only about 0.8 cm wide on the second. This means that the bounding box returned for an imported image depends on the specific display resolution. However, this is not the case for the synthetic image constructors that we discuss later in the document.
There are ways to work around the device-dependency of imported bit-maps. They involve scaling the bit-map in a device-dependent way so that the resultant image size is of some fixed, desired value across the different device resolutions. This topic is discussed in more detail in the section on mixed media. We will now discuss methods for constructing synthetic images.
One of the simplest images that can be constructed is a solid color image, as shown in the following example:
ImageBvr im = solidColorImage(color);
This results in an infinite-extent image filled with the given color behavior. Typically, such an image is either cropped or clipped (see image operations, discussed below) before it is used. Solid color images are commonly used to obtain solid-colored polygons or as viewport backgrounds, where the color can be time-varying.
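For example, the following minimal sketch crops a solid red image (see the crop operation below) to produce a solid-colored rectangle. The point2(x, y) constructor for Point2Bvr values is an assumption not shown elsewhere in this section; red is assumed to be a predefined color behavior.

// Sketch: a 4 cm x 2 cm red rectangle centered at the origin.
// point2 is an assumed Point2Bvr constructor.
ImageBvr redRect = solidColorImage(red).crop(
        point2(toBvr(-0.02), toBvr(-0.01)),    // lower-left corner (meters)
        point2(toBvr(0.02), toBvr(0.01)));     // upper-right corner (meters)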
Gradient fills provide a very compact form of interestingly colored images, where regions are filled with smoothly interpolated colors between specified colors at given vertices. The most general form of a gradient fill is obtained through the gradientPolygon(Point2s, Colors) method, which specifies an array of points and an array of colors. The arrays specify a triangular mesh with one color per vertex. The resultant image is based on a linear interpolation of the colors across each triangle in the RGB color model. Although interpolation in other color models may be desired, we map this CPU intensive operation to hardware, which currently only performs the interpolation in RGB.
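As an illustration, the following sketch builds a single-triangle gradient. The point2 constructor, the ColorBvr array type, and the predefined colors red, green, and blue are assumptions; only the gradientPolygon(Point2s, Colors) signature comes from the description above.

// Sketch: one triangle with a different color at each vertex,
// interpolated across the triangle in RGB.
Point2Bvr[] pts = { point2(toBvr(-0.05), toBvr(-0.05)),
                    point2(toBvr(0.05), toBvr(-0.05)),
                    point2(toBvr(0), toBvr(0.05)) };
ColorBvr[] cols = { red, green, blue };            // one color per vertex
ImageBvr triangle = gradientPolygon(pts, cols);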
Other more specialized forms of gradient fills, which are essentially shortcuts based on the general form described above, include:
Images can be constructed by rendering geometric 2D lines, as shown in the following example:
ImageBvr im = strokeImage(path, style, color);
In this example:
An attributed and colored version of the given path is rendered into an image according to the path's coordinates. Both path and style are abstract data types with their own operations for construction.
There is a shortcut to render a polyline into an image. It is:
ImageBvr im = polyline(numPts, points, color);
This avoids the steps of explicitly constructing a path and a style, where the path is simply the sequence of points. (The default line style has flat ends, beveled joins, solid (non-dashed) lines, an open path, and a width of 2 points, that is, 2/72 inch or about .0007 meters.)
Rendering Text
Images can be constructed by rendering text, as shown in the following example:
ImageBvr im = text.render();
Here, text is of type TextBvr. This is an abstract type with attributes including font type, color, and style (bold or italic). A common constructor for text is simpleText, which takes a StringBvr and returns a TextBvr. For example:
simpleText(toBvr("any string"))
Also, toString is a method on NumberBvrs that converts the number to a StringBvr. The text, potentially attributed, is rendered into an image with its extent centered at the origin. By default, the text's color is black, its font family is serif proportional, and it is neither in bold nor italic. It will be the same size as the font that is used in the shell for title bars and other widgets. This means that the size will depend on the particular setting on the machine. See the section below on meter-based spaces to see how to render and scale the text into a resolution-independent size.
An example of rendering text and of the toString method is shown below:
ImageBvr clockImage = simpleText(localTime.toString(toBvr(0)))
                          .bold()
                          .color(red)
                          .render()
                          .transform(translate(posX, posY));
In this example, we construct a StringBvr of a time-varying integer based on localTime. From this, we construct a bold, red TextBvr and from this we obtain an image that we position appropriately.
Rendering Geometry
Images can be constructed by rendering geometry, as shown in the following example:
ImageBvr im = renderGeometry(geo, camera);
The details are in the next section on 3D geometry.
Using a Montage
An image can be the rendering of a montage, which consists of a list of ImageBvrs, each with a NumberBvr that indicates its Z-order in the list. As the Z NumberBvr changes with time, the order in which the images are composited changes accordingly. Montages are ideal for situations where the layering of the different images is dynamic.
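The exact montage constructors are not described in this section. Purely for illustration, the following sketch assumes an imageMontage(image, z) constructor that pairs an image with its Z-order, a montageUnion combiner, and a render() method that flattens a montage back into an ImageBvr; these names are assumptions, not confirmed by this section.

// Hypothetical sketch: two cards whose layering order can change over time.
// card1, card2 are ImageBvrs; zOrder1, zOrder2 are time-varying NumberBvrs.
MontageBvr m1 = imageMontage(card1, zOrder1);
MontageBvr m2 = imageMontage(card2, zOrder2);
ImageBvr layered = montageUnion(m1, m2).render();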
The set of Direct Animation operations can generate new and more interesting images from given images. Also, some of these operations generate useful data such as bounding-box information about a given image.
Transformations
A transformed version of an image can be obtained by applying im.transform(xf), where xf is of type Transform2Bvr and can be an arbitrary 2D affine transform. The Transform2Bvr type has operations for constructing scales, translates, rotates, and shears from basic parameters such as NumberBvrs and Vector2Bvrs. It is also possible to construct transforms based on 2x3 matrices. There are additional operations to compose transforms, obtain their inverse, and check whether or not they are singular.
In the following example, we obtain a jumping rabbit from a rabbit sprite by using a behavior of type transform:
ImageBvr rabbit = importImage("rabbit.gif");
Transform2Bvr jumpTransform = ...; // some jumping reactive behavior
ImageBvr jumpingRabbit = rabbit.transform(jumpTransform);
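One hypothetical way to build such a jumping transform is to drive a translation with a time-varying number. The add, mul, and sin operations on NumberBvrs used below are assumptions not shown elsewhere in this section.

// Sketch: the rabbit bounces between 0 and 5 cm, once per second.
// 0.025 * (1 + sin(2*pi*t)) stays between 0 and 0.05 meters.
NumberBvr height = mul(toBvr(0.025),
                       add(toBvr(1), sin(mul(localTime, toBvr(2 * Math.PI)))));
Transform2Bvr jumpTransform = translate(toBvr(0), height);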
Opacity
An image with a specific, potentially time-varying opacity can be obtained by applying im.opacity(opac), where opac is a NumberBvr. For opac equal to 0 the resulting image is fully transparent, for opac between 0 and 1 the result is proportionally opaque, and for opac equal to 1 the result is fully opaque.
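For example (fadeAmount below is an assumed, externally defined NumberBvr with values between 0 and 1):

ImageBvr halfTransparent = im.opacity(toBvr(0.5));   // constant 50% opacity
ImageBvr fading = im.opacity(fadeAmount);            // time-varying opacity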
Crop
Cropping is obtained by applying im.crop(lowLeftPt, upRightPt). This results in the tangible (either detectable or non-zero opacity) portion of the subject image that falls within the rectangle defined by the two given points of type Point2Bvr.
Tile
The im.tile() operation infinitely replicates the image in all four directions. Typically, tile is followed by either a crop or a clip operation, or else it is used to generate the background of a viewport, in which case it is implicitly clipped when it is displayed.
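As a sketch, a repeating background can be built by tiling a small imported swatch and then cropping the result. The file name is hypothetical, and point2 is an assumed Point2Bvr constructor.

ImageBvr swatch = importImage("swatch.gif");         // hypothetical file
ImageBvr background = swatch.tile().crop(
        point2(toBvr(-0.05), toBvr(-0.05)),          // 10 cm square,
        point2(toBvr(0.05), toBvr(0.05)));           // centered at the origin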
Clip
Clipping is obtained by applying im.clip(matte), where matte is a MatteBvr. This is itself an abstract type with operations for constructing interesting matte regions. The resulting image is the portion of the original image that falls in the matte region and is otherwise transparent and undetectable.
A shortcut to clipping an image with a polygonal region is im.clipPolygon(num, points), which is a shorthand method for generating a polygonal matte and doing a clip.
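For example, assuming the same point2(x, y) constructor for Point2Bvr values, an image can be clipped to a triangular region:

// Sketch: clip the image to a triangle defined by three points.
Point2Bvr[] tri = { point2(toBvr(-0.03), toBvr(-0.03)),
                    point2(toBvr(0.03), toBvr(-0.03)),
                    point2(toBvr(0), toBvr(0.04)) };
ImageBvr clipped = im.clipPolygon(3, tri);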
Overlay
An overlay takes two images as parameters and returns an image that is made by overlaying the first image on top of the second. Blending takes place where the images overlap and are partially opaque. This is commonly referred to as alpha blending.
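A minimal sketch is shown below; the method form im1.overlay(im2), with im1 placed on top, is an assumption, and front and back are assumed, previously constructed ImageBvrs.

// Sketch: front is drawn over back, with alpha blending where they overlap.
ImageBvr scene = front.overlay(back);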
Miscellaneous Methods
The im.boundingBox() method returns the bounding box of the parts of the image that are tangible. This means either detectable or non-transparent. The im.undetectable() method returns the same image with its detectability channel set to false throughout.
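For instance (the bounding-box type is assumed here to be Bbox2Bvr; this section does not name it):

Bbox2Bvr box = im.boundingBox();      // tangible extent of the image
ImageBvr ghost = im.undetectable();   // looks the same, but cannot be picked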
A value of type geometry has the following properties:
Each of the operations on the geometry type and attributes for the geometry type is described in terms of this abstract model. We will now discuss the geometry coordinate system, ways to construct geometry, and, finally, the operations on geometry.
The geometry coordinate system is a right-handed, 3D coordinate system of infinite extent where geometric models are specified and transformed. The default unit of measure is the meter. It has an origin and X, Y, and Z-axes, as shown in the following drawing:
The following drawing illustrates the direction of positive rotation along an axis of rotation (the arrow points away from the origin):
Direct Animation geometric models are constructed in this abstract coordinate system. However, when it is time to display a model, a camera is needed to project the infinite space into an infinite plane, which is the image plane discussed in the section on images.
In other words, the 2D image plane is identical to the 3D projection plane. This is natural because the result of projecting geometry via a camera is an image. The resulting abstract image is amenable to the same operations and rules as other images, including display through a window.
The simplest way to construct geometry is by importing a geometry that is in either the .WRL or the .X format. This is done with importGeometry(URL). Another way to construct geometry is from lights, including ambient, directional, point, and spot lights. It is also possible to construct geometry from a sound source, which is used to give 3D spatial characteristics to a sound by embedding it in other geometric models. Finally, emptyGeometry is a constant of the GeometryBvr type and is vacuous throughout.
Two geometry values can be aggregated/combined into one via the union operation. Geometry can also be transformed using arbitrary 3D affine transforms.
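A minimal sketch of combining and transforming geometry is shown below. The translate3 and rotate3 constructors for Transform3Bvr values, the predefined yVector3 axis, the static form of union, and the model file names are all assumptions.

// Sketch: combine two imported models and rotate the result about the Y-axis.
GeometryBvr chair = importGeometry("chair.wrl");     // hypothetical files
GeometryBvr table = importGeometry("table.wrl");
GeometryBvr diningSet = union(chair, table)          // form of union is assumed
        .transform(rotate3(yVector3, toBvr(0.5)));   // rotate by 0.5 radians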
Finally, geometry can be:
All of the above parameters can be time-varying.
Probably the most interesting operation on geometry is to render it with a camera and obtain an image in the image plane. Related operations include rendering it into a microphone to extract a sound, or into a pair of microphones to extract stereo sound.
The operation that constructs an image from a geometry is called render, and it takes both a geometry value and a camera value. The operation does a standard rendering/projection of the geometry by the camera using Direct3D as an underlying 3D API.
Two cameras are supported by Direct Animation: a perspective camera and a parallel camera.
These cameras consist of the following three elements:
The cameras are constructed by the following methods:
public class CameraBvr extends Behavior {
    public static CameraBvr perspectiveCamera(NumberBvr n, NumberBvr p);
    public static CameraBvr parallelCamera(NumberBvr n);
}
The parameter n is the distance of the near plane and the parameter p is the distance of the projection point along the +Z-axis in relation to the origin (that is, the image plane). Both n and p must be positive, and the near plane must be on the same side as the image plane in relation to the projection point. A runtime error will result if this is not the case.
Both n and p can be time-varying because they are NumberBvr values. This is convenient for changing camera parameters; for example, varying p changes the perspective effect. Additionally, n can be made to vary with p. The distance from the near plane to the projection point is typically a fixed fraction of the range in which the geometry dwells (that is, where the geometry should reside to avoid clipping). Also, the choice of the near clipping plane must be tempered by Z-buffer resolution issues, which are discussed later in the document. A camera can also be attributed with a far clipping plane, primarily for the same Z-buffer resolution issues.
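A minimal sketch of a camera whose perspective varies over time follows. The add, mul, and sin operations on NumberBvrs are assumptions not covered in this section; geo is an assumed, previously constructed GeometryBvr.

// Sketch: p oscillates between 1.5 and 2.5 (always positive, always beyond
// the near plane at 0.1), so the perspective effect changes over time.
NumberBvr p = add(toBvr(2), mul(toBvr(0.5), sin(localTime)));
CameraBvr cam = perspectiveCamera(toBvr(0.1), p);
ImageBvr view = renderGeometry(geo, cam);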
Rendering involves projecting the geometry that resides on the opposite side of the near plane, in relation to the projection point, into an image on the image plane. The resultant image is infinite in extent, and, if the author is interested in only a section of it, the author must use 2D operations such as crop or clip on the image to extract that section.
Direct Animation cameras have an origin at [0,0,0] and can be transformed using standard Transform3Bvr values. Note that such transformations affect the placement of the projection point, the near plane, and the image plane. Translation and rotation can be used for positioning the camera relative to the geometric model being projected.
The X and Y scales can be used to zoom in or out in relation to the rendered model. Increasing the XY scale results in a zoom out (the rendered image shrinks), while decreasing the XY scale results in a zoom in (the rendered image grows). Controlling the perspective effect is achieved by changing p. The closer the projection point is to the projected object, the more pronounced is the perspective effect.
The rendering engine determines a far plane that is parallel to the near plane, at the furthest point, in relation to the near plane, of the model being viewed. The media engine then formulates a camera projection transform that maps the space between the near plane and the far plane into the available Z buffer resolution, which, in the case of Direct3D, is 16 bits. This has two ramifications:
The following figure shows reverse Z-mappings, where equal Z-units of distance in display coordinates (post-viewing) are mapped into markers on a Z-axis in world coordinates (pre-viewing). The figure also shows the impact on the Z-resolution of moving the near plane closer to the projection point.
For the user to bound the Z-resolution, we provide a far-plane attribute (an optional parameter) that can be applied to cameras. By specifying this attribute, a programmer explicitly indicates that objects beyond a certain distance from the near plane get clipped. Effectively, this attribute lets programmers place a bound on the minimal Z resolution, measured in meters, in the range of interest. An alternative way to specify this attribute is also provided: directly specifying a bound on the Z resolution and having the system select the far clipping plane accordingly.
A value of type sound has the following properties:
Each of the operations on sound and attributes for sound are described in terms of this abstract model. We will first discuss ways to construct sound, and then the operations on sound.
A common way to construct sound is by importation from a file representation, through importSound(URL, length), in the .WAV and .MIDI formats. Upon importation, the length of the imported sound, in seconds, is returned as the length parameter. The sinSynth value is a sine-wave-based sound which, when attributed (see the next section), can be the basis of a diverse set of synthetically generated sound waves. Furthermore, silence is a constant of the SoundBvr type that is vacuous throughout. Another way to construct sound is by rendering geometry with embedded sound into a microphone, for single-channel sound, or into two microphones, for stereo sound.
For best results with .WAV files we propose a canonical format: mono sound at a sampling rate of 22,050 Hz, which is half of 44,100 Hz (the rate of an audio CD), with a dynamic range of 16 bits. Importation of stereo sound, a variety of sampling rates, and dynamic ranges of 8 bits and 32-bit floats is also supported. The canonical format strikes a balance between sound quality and some implementation pragmatics.
New sounds can be obtained from seed sounds using parameters and operations. The loop method takes a sound and repeats it forever. The gain method scales the amplitude of the sound wave by a specified scalar. The rate method scales the time dimension of the wave, effectively squeezing or stretching the sound wave. Modifying the rate is equivalent to time-transforming sound by a scalar multiplication function. The phase method shifts the sound wave by a specified parameter. For positive phase values, the sound starts with silence for the specified amount of time before the sound wave begins, and for negative values the corresponding front part of the sound wave is skipped. The parameters to gain, rate, and phase are themselves reactive numbers, and could be (and typically are) time-varying. For example, it is common to relate a distance between a moving sound source, such as a squeaking bird, and a listener, by using the gain parameter.
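A minimal sketch combining these operations follows. The file name and the time-varying birdGain NumberBvr are assumptions; only loop, gain, and rate are taken from the description above.

// Sketch: an ambient bird sound built from a single imported chirp.
SoundBvr chirp = importSound("chirp.wav", null);   // hypothetical file
SoundBvr ambientBird = chirp.loop()                // repeat forever
                            .rate(toBvr(0.8))      // stretched: slower and lower
                            .gain(birdGain);       // e.g. derived from the distance
                                                   // between the bird and the listener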
Pan
The pan method takes a stereo sound and scales the gain of the left and right channels according to a pan parameter, panAmt. For panAmt equal to -1 the sound comes entirely from the left channel, for 0 the two channels are balanced equally, and for +1 the sound comes entirely from the right channel; intermediate values scale the two channels proportionally.
Gain
Direct Animation supports 16-bit sound and expects all sounds to be normalized. This means that, for maximum resolution, all 16 bits are used. A consequence of this is that all sounds seem equally loud. For example, a jet plane and a whisper will sound as though they are the same volume.
To rectify this, sounds are pre-scaled with the gain method. Because there is no way to change the actual gain of a PC's amplifier, sounds can only be attenuated. Attenuating a sound is the same as multiplying it by a value between 0 and 1.
In the next example, the two sounds, which are heard in the left and right channels respectively, will have an equal volume, even though their gains are different:
SoundBvr snd1 = importSound("...", null);
setSound(mix(snd1.gain(toBvr(1)).pan(toBvr(-1)),
             snd1.gain(toBvr(5)).pan(toBvr(1))));
In other words, the gain is clamped at one. The next example makes the sound in the left channel louder than the sound in the right channel:
setSound(mix(snd1.gain(toBvr(1)).pan(toBvr(-1)),
             snd1.gain(toBvr(0.2)).pan(toBvr(1))));
When sounds are embedded in a geometry and spatialized (using GeometryBvr.soundSource and a MicrophoneBvr), then gains larger than 1 can be used to allow the sounds to be heard from farther away than sounds without the scaling factor.
In particular, loud sounds such as a jet engine should be scaled by a gain > 1. However, unlike sounds in the real world, the sound will only become louder as you approach until the gain is equal to 1. Once this limit is reached, there is no more attenuation, and the sound's volume remains constant no matter how much closer you come.
The following diagram relates distance to a gain of 1 in an approximately linear fashion:
The next diagram shows how, for gains larger than 1, the volume is clamped at 1.
Mix
The mix method merges two sound waves into one by adding the corresponding waves. The common implementation of mixing is to add corresponding wave samples, and it is implementation-dependent how two imported sounds of different sampling rates and/or different bit depths are mixed together. In the case of an overflow of values while mixing, the result is clamped to the highest possible value.
The operations provided by Direct Animation on SoundBvrs allow the generation of rich synthetic sounds from basic sound seeds by parameterization and layering (or mixing). The idea is to attribute the seed sounds with animation-related or random parameters that are usually time-varying, and then to mix the results together in a fashion consistent with the animation. This process is not very different from what is commonly done with 3D models. Simple parts are transformed, colored, or textured and then combined together into more interesting models. This means sound can be as flexibly and synthetically generated as 3D models are.
For example, in the Lighthouse sample included in the SDK, this technique is used to generate the ambient ocean sound, which is parameterized by the weather condition, which, in turn, is controlled by the slider. Two generally useful constructs are used for the layering and parameterization of sound in Lighthouse: one for generating parameterized cyclic sounds, and the other for generating parameterized, random periodic sounds (the latter has a period of silence between the different occurrences).
© 1997 Microsoft Corporation. All rights reserved. Legal Notices.