Overview - Perception of Sound Location
There are a number of factors that contribute to our perception of spatialized sound. These perceptual cues include: interaural time differences, interaural intensity differences, spectral shaping due to the shape of the ear, and audio shaping due to environmental factors. A number of these are briefly outlined below. For more detailed information see the reference section below.
Interaural Time Differences stem from the fact that our ears are on opposite sides of our heads. If a sound is coming from your left, it reaches your left ear before it reaches your right ear. Your brain can perceive this difference, which helps to localize a sound source laterally (left/right localization).
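As a rough illustration of the scale of this cue, the interaural time difference can be estimated with the classic Woodworth spherical-head approximation, ITD = (a / c)(theta + sin theta). This formula, the head radius, and the speed of sound are standard textbook values, not part of this demo's code.

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (Woodworth spherical-head
    model): ITD = (a / c) * (theta + sin(theta)), where theta is the
    azimuth in radians (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source directly to one side gives the maximum delay, well under a
# millisecond -- yet the brain resolves it reliably.
print(itd_seconds(90.0))   # on the order of 0.65 ms
print(itd_seconds(0.0))    # a centered source arrives simultaneously: 0.0
```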
Not only is there a difference in arrival time between the two ears, there is also a difference in intensity. The head shadows the ear on the far side, attenuating the sound that reaches it, and this attenuation is also used as a cue for lateralization. Head shadowing is only effective for sound components whose wavelengths are smaller than the diameter of the head; components below this range in frequency (about 1.5 kHz and lower) simply diffract around the head.
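The ~1.5 kHz figure follows directly from the geometry: it is the frequency whose wavelength equals the head's diameter. A quick check, assuming a typical head diameter of about 0.22 m (an assumed value, not from the source):

```python
SPEED_OF_SOUND = 343.0   # m/s in air at room temperature
HEAD_DIAMETER = 0.22     # m, a typical adult value (assumption)

# Head shadowing matters only when the wavelength fits "inside" the head,
# i.e. wavelength < diameter, i.e. frequency > c / diameter.
cutoff_hz = SPEED_OF_SOUND / HEAD_DIAMETER
print(cutoff_hz)  # close to the ~1.5 kHz figure quoted above
```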
The shape of the ears, especially the outer pinnae, has a significant effect on the perception of sound location. The ear filters different frequencies of sound depending on the sound's location: high frequencies are muffled when the source is behind the listener's head, and much more prevalent when it is in front of or above the listener. These spectral cues are a major contributor to both lateral and elevational localization.
One problem with simulating this spectral shaping is that the spectral effects are complicated to model, and accurate theory-based models are difficult to achieve. A related problem is that everybody's ears are different, so a generalized ear model for spectral shaping is not effective for all listeners.
HRTF stands for Head Related Transfer Function. The HRTF is a function which maps a sound source location to the spectral filtering that occurs before the sound is perceived. A binaural HRTF takes into account both spectral and time-delay differences:
The binaural HRTF can be thought of as a frequency-dependent amplitude and time-delay differences that result primarily from the complex shaping of the pinnae.
-Durand R. Begault, 3D Sound for Virtual Reality and Multimedia
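In practice, applying an HRTF means convolving the mono source signal with a measured head-related impulse response (HRIR) for each ear. The sketch below shows the idea with a plain direct-form FIR convolution and tiny made-up impulse responses; the function names and the toy HRIR values are illustrative, not taken from the demo's actual code.

```python
def fir_filter(signal, impulse_response):
    """Direct-form FIR convolution: y[n] = sum over k of h[k] * x[n-k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal binaurally by filtering it with left- and
    right-ear HRIRs. Measured HRIRs already encode both the spectral
    shaping and the interaural time delay, so no separate delay step
    is needed."""
    return fir_filter(mono, hrir_left), fir_filter(mono, hrir_right)

# Toy HRIRs (hypothetical): the right-ear response is delayed and
# attenuated, as it would be for a source on the listener's left.
left, right = spatialize([1.0, 0.0, 0.0], [1.0, 0.2], [0.0, 0.5, 0.1])
print(left)   # an impulse input just reproduces each ear's HRIR
print(right)
```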
The HRTF determines the spectral filtering applied to a sound in this demo. The HRTFs used were measured empirically using a dummy head and torso with microphones inserted into the ears. The HRTF acquisition was done by Bill Gardner of the MIT Media Lab. Readings exist in ten-degree increments for elevations between -40 and +90 degrees, covering a full 360 degrees of azimuth.
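Because the measurements form a discrete grid, an arbitrary requested direction has to be snapped to the nearest measured position before its filter can be applied. A minimal sketch of that lookup, assuming a uniform ten-degree grid in both elevation and azimuth (the real measurement set may space azimuths differently at different elevations):

```python
def nearest_measurement(elevation_deg, azimuth_deg):
    """Snap a requested direction to the nearest measured position on an
    assumed grid: elevations -40..+90 in 10-degree steps, azimuths every
    10 degrees around the full circle."""
    elev = round(elevation_deg / 10) * 10
    elev = min(max(elev, -40), 90)          # clamp to the measured range
    azim = round(azimuth_deg / 10) * 10 % 360
    return elev, azim

# A direction below the measured range clamps to -40 degrees elevation,
# and an azimuth near 360 wraps back to 0.
print(nearest_measurement(-47.0, 358.0))
```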
For more information on the measurements and their acquisition, see Bill Gardner and Keith Martin's web page. The low-level software for reading and applying the HRTF filters to the samples is based on the spatializer referenced on this page.