Head-Related Transfer Functions
To find the sound pressure that an arbitrary source x(t) produces at the eardrum, all we need is the impulse response h(t) from the source to the eardrum. This is called the Head-Related Impulse Response (HRIR), and its Fourier transform H(f) is called the Head-Related Transfer Function (HRTF). The HRTF captures all of the physical cues to source localization. Once you know the HRTF for the left ear and the right ear, you can synthesize accurate binaural signals from a monaural source*.
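As a minimal sketch of that synthesis step, the Python/NumPy fragment below convolves a monaural signal with a left-ear and a right-ear HRIR, and obtains an HRTF from an HRIR with an FFT. The two HRIRs are hypothetical toy placeholders (single delayed pulses), not KEMAR measurements; the sample rate and the name `synthesize_binaural` are likewise assumptions made for illustration.

```python
import numpy as np

def synthesize_binaural(x, hrir_left, hrir_right):
    """Convolve a mono signal with equal-length left- and right-ear HRIRs.

    Returns an array of shape (len(x) + len(hrir) - 1, 2):
    column 0 is the left-ear signal, column 1 the right-ear signal.
    """
    left = np.convolve(x, hrir_left)
    right = np.convolve(x, hrir_right)
    return np.stack([left, right], axis=1)

fs = 44100  # assumed sample rate in Hz

# Toy HRIRs (hypothetical, NOT measured data) for a source on the right:
# the right ear receives an earlier, stronger pulse than the left ear.
hrir_right = np.zeros(64)
hrir_right[5] = 1.0
hrir_left = np.zeros(64)
hrir_left[5 + 29] = 0.4  # 29 samples later, about 0.66 ms at 44.1 kHz

# Any monaural source will do; white noise is used here.
x = np.random.default_rng(0).standard_normal(1000)
binaural = synthesize_binaural(x, hrir_left, hrir_right)

# The HRTF is the Fourier transform of the HRIR.
H_right = np.fft.rfft(hrir_right)
freqs = np.fft.rfftfreq(len(hrir_right), d=1.0 / fs)
```

With measured HRIRs in place of the toy pulses, the two columns of `binaural` would be the headphone signals for the left and right ears.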
The HRTF is a surprisingly complicated function of four variables: three space coordinates and frequency. In spherical coordinates, for distances greater than about one meter, the source is said to be in the far field, and the HRTF falls off inversely with range. Most HRTF measurements are made in the far field, which essentially reduces the HRTF to a function of azimuth, elevation and frequency.
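The inverse dependence on range can be sketched as a simple gain applied to a far-field HRIR. The function names, the 1 m reference range, and the decision to reject ranges under 1 m are assumptions for illustration; the scaling is only meaningful in the far field.

```python
import numpy as np

def scale_far_field(hrir, r, r_ref=1.0):
    """Rescale an HRIR measured at range r_ref (meters) to range r (meters).

    Assumes both ranges are in the far field (taken here as >= 1 m),
    where the response falls off as 1/r.
    """
    if r < 1.0 or r_ref < 1.0:
        raise ValueError("1/r scaling is only assumed valid beyond about 1 m")
    return np.asarray(hrir, dtype=float) * (r_ref / r)

def range_gain_db(r, r_ref=1.0):
    """Level change in dB when moving from range r_ref to range r."""
    return 20.0 * np.log10(r_ref / r)

# Doubling the range halves the amplitude, a drop of about 6 dB.
gain_at_2m = range_gain_db(2.0)
scaled = scale_far_field([1.0, 0.5], 2.0)
```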
We made a series of HRIR measurements on an acoustic manikin known as KEMAR, which stands for Knowles Electronics Manikin for Auditory Research. To get an idea of how KEMAR's response varies with azimuth and elevation, take a look at the following graphical representations of the HRIR and the HRTF:
- The HRIR in the horizontal plane
- The HRIR in the median plane
- The HRTF in the horizontal plane
- The HRTF in the median plane
Footnotes
* It is usually assumed that the HRTF is measured in an anechoic setting, and thus does not include the effects of environmental sound reflections, which also provide localization cues. In that case, it is necessary to use some kind of binaural room simulator to introduce these important reflections. Failure to do so results in an improper ratio of direct-to-reverberant sound, and when heard through headphones, the sound often seems to be either very close to or actually inside of the head. Lack of externalization is a common problem with simple headphone systems.
Of course, it is possible to measure the HRTF in an actual reverberant setting and thus to dispense with the room simulator. However, this has two serious disadvantages:
- It limits the sounds produced to that particular environment
- It leads to very long impulse responses that use much more memory and require much longer computation times
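To get a rough feel for the second disadvantage, the sketch below compares filter lengths and the cost of direct time-domain convolution. The sample rate and both durations are illustrative assumptions, not measured values.

```python
FS_HZ = 44_100        # assumed sample rate
ANECHOIC_MS = 5       # assumed length of an anechoic HRIR
REVERBERANT_MS = 500  # assumed length of a response measured in a room

def taps(duration_ms, fs_hz=FS_HZ):
    """Number of filter coefficients needed to hold the response."""
    return fs_hz * duration_ms // 1000

def macs_per_second(n_taps, fs_hz=FS_HZ):
    """Multiply-adds per second for direct convolution, per ear."""
    return n_taps * fs_hz

for label, ms in [("anechoic", ANECHOIC_MS), ("reverberant", REVERBERANT_MS)]:
    n = taps(ms)
    print(f"{label}: {n} taps, {macs_per_second(n):,} multiply-adds/s per ear")
```

Under these assumptions the reverberant response needs roughly a hundred times the storage and arithmetic. FFT-based convolution reduces the arithmetic, but the longer response still has to be stored and processed.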
HRIR: Horizontal Plane
Here is an image representation of KEMAR's experimentally measured head-related impulse response. The picture shows the response of the right ear to an impulsive source in the horizontal plane. The strength of the response is represented by brightness. Thus, we see at once that the sound is strongest and arrives soonest when it is coming from the right side (azimuth = 90°). Similarly, it is weakest and arrives latest when it is coming from the left side (azimuth = 270°). Note that the arrival time varies with azimuth in a more or less sinusoidal fashion. In fact, the arrival time conforms quite well to the ITD equation. In particular, notice that the difference between the shortest and the longest arrival times is about 0.7 ms, just as the theory predicts.
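The 0.7 ms figure can be checked with a short calculation. The sketch below assumes that the ITD equation is the spherical-head formula ITD = (a/c)(θ + sin θ) for azimuths between -90° and 90°, with an assumed head radius a of 8.75 cm and a speed of sound c of 343 m/s; none of these values are taken from the KEMAR data.

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Interaural time difference for a rigid spherical head.

    Assumed model: ITD = (a / c) * (theta + sin(theta)),
    valid for azimuths from -90 to +90 degrees.
    """
    if not -90 <= azimuth_deg <= 90:
        raise ValueError("formula is only assumed valid for -90..90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

# Largest delay: source directly to one side.
max_itd_ms = itd_seconds(90) * 1000.0
print(f"maximum ITD = {max_itd_ms:.2f} ms")
```

With these assumed values the maximum comes out at roughly 0.66 ms, in line with the spread of arrival times seen in the image.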
It is also possible to explain some of the features seen in this image by thinking about the physics involved. For example, the initial sequence of rapid changes (bright and dark bands) is due to pinna reflections. The peak that arrives about 0.4 ms after the initial peak is due to a shoulder reflection.
Finally, note that the response when the source is in front is quite similar to the response when the source is in back. The differences that do exist show up as a lack of perfect symmetry about a horizontal line at 90°, e.g., in the dark "trough" following the initial pulse that is prominent in front but not in back. People also have trouble distinguishing front from back, and often resolve this problem by head motion.
HRIR: Median Plane
When the source moves around the head in the median plane, the changes are more subtle. Arrival time is more or less the same, as one would expect. The main changes are in the relative arrival times and strengths of the pinna reflections. This shows up in the frequency domain as a notch whose frequency changes with elevation (see the HRTF in the median plane). Note that the difference between front and back shows up in the mild but clear lack of symmetry about a horizontal line at 90 degrees elevation.
(In case you are wondering what causes the faint streaks sloping at about a 30° angle at the bottom of the picture, they have been traced to a floor echo. It is hard to get experimental data that is free from artifacts.)
HRTF: Horizontal Plane
This mesh plot shows the frequency response for KEMAR's right ear as the source moves in the horizontal plane. Although this surface is rather bumpy, if you look at any one frequency you can see a roughly sinusoidal change with azimuth. As expected, the response is usually greatest when the source is at 90° and directed into the right ear, and weakest when the source is at 270° on the opposite side of the head.
Once again, front/back (0° and 180°) responses are quite similar. The graph below shows that the front response is a few dB higher than the back response in the frequency range from around 4 to 7 kHz. The peak around 4 kHz is due to ear-canal resonance. The notch around 10 kHz that is also clearly visible in the surface plot above is the famous "pinna notch", whose frequency changes with elevation.
HRTF: Median Plane
This series of curves shows how KEMAR's frequency response varies as the source moves around in the median plane. Note that the broad ear-canal resonance around 4 kHz doesn't change. However, the frequency of the pinna notch changes significantly with elevation. It goes from a bit below 6 kHz at low elevations up to 10 kHz or so as the source moves overhead. When the source is directly above, the notch is hard to see, and the frequency response is fairly flat. It reappears as the source moves around the back of the head and back towards the floor.
We should hasten to add that the detailed behavior of the pinna features is quite sensitive to pinna shape, and varies considerably from person to person. With larger ears, the frequencies are shifted lower. Other shape changes introduce other significant changes in the response. This makes it much harder to control elevation than azimuth in HCI applications.
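One way to see why a pinna reflection produces a notch, and why the notch moves, is a deliberately simplified model: a direct sound plus a single delayed, attenuated copy. This is an illustrative assumption, not a model fitted to KEMAR; real pinna responses involve several reflections and resonances. In this model the first notch falls at 1/(2τ) for a reflection delay τ, so a notch near 6 kHz corresponds to a delay of about 83 µs and a notch near 10 kHz to about 50 µs. The sample rate, reflection gain, and function names below are hypothetical.

```python
import numpy as np

def first_notch_hz(delay_s):
    """First notch frequency of a direct sound plus one delayed reflection."""
    return 1.0 / (2.0 * delay_s)

def single_reflection_response(delay_samples, gain, fs, n_fft=4096):
    """Magnitude response (dB) of h[n] = delta[n] + gain * delta[n - delay]."""
    h = np.zeros(n_fft)
    h[0] = 1.0
    h[delay_samples] += gain
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(np.fft.rfft(h)))
    return freqs, mag_db

fs = 96000         # assumed sample rate, chosen so the delay is a whole sample count
delay_samples = 8  # 8 / 96000 s, about 83 microseconds
freqs, mag_db = single_reflection_response(delay_samples, gain=0.5, fs=fs)

# Locate the deepest point below 12 kHz and compare with the formula.
band = freqs < 12000
notch_hz = freqs[band][np.argmin(mag_db[band])]
predicted_hz = first_notch_hz(delay_samples / fs)
```

In this simplified picture, a larger ear means a longer reflection path, a longer delay, and therefore a lower notch frequency, which is consistent with the observation above.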