Fundamentals of Sound



Perceptual attributes of acoustic waves - Pitch
   Definitions - Ranges - JND
   Pitch of Pure Tones (Pitch & Frequency / Intensity / Duration) - Music Theory Primer / Tuning Perception
   Pitch of Complex Tones (Pitch & Spectrum) - Pitch Theories
   The Octave - Multidimensionality of Pitch:  Pitch Height & Pitch Chroma
   Optional Sections (Pitch Theories)





Perceptual attributes of acoustic waves: Pitch
Definition - Ranges - JND


Pitch: Sonic (i.e. perceptual) attribute of sound waves, related mainly to frequency.
High frequencies result in a 'high' pitch, while low frequencies result in a 'low' pitch.
American National Standards Institute (ANSI) definition:
That attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high.
According to ANSI, the pitch sensation corresponding to 440Hz is represented by A4.
Discussions on pitch usually revolve around music. However, pitch and pitch contours are quite significant in speech and, in some languages, pitch inflections carry specific semantic meaning.

The frequency range of hearing extends from ~20Hz to ~20,000Hz (or 20kHz). These values constitute the low and high absolute thresholds of frequency perception, respectively. Listen to a sine signal sweeping through this range. [This will be more a test of your listening equipment than of your hearing].

On average:
a) Frequencies below 20Hz sound like individual pulses with no definite pitch. Signals with frequency f > 20Hz have period T < 50ms, which is shorter than the hearing response's "decay/release time" (~50ms). For such signals, therefore, the response system does not recover between cycles and provides continuous excitation, giving the impression of a clear, uninterrupted pitch sensation. (Note: the nerve fibers in the cochlea are excited only when the BM pushes up, i.e. during one half of each vibration cycle.)
b) Frequencies above 20kHz are inaudible to humans: the construction of the hearing mechanism results in natural resonance frequencies below 20kHz.
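A quick numerical check of point (a), as a minimal sketch: the period of a periodic signal is T = 1/f, so any frequency above 20Hz has a period shorter than the ~50ms decay/release time quoted above. The 50ms value is the approximate figure from the text, and the function names are illustrative.

```python
# Period T = 1/f compared with the approximate ~50 ms "decay/release time" quoted in the text.
DECAY_TIME_S = 0.050  # approximate figure from the text, not a precise physiological constant


def period_s(freq_hz: float) -> float:
    """Return the period, in seconds, of a periodic signal with frequency freq_hz."""
    return 1.0 / freq_hz


def sounds_continuous(freq_hz: float) -> bool:
    """Rough rule from the text: a continuous pitch sensation needs T shorter than ~50 ms."""
    return period_s(freq_hz) < DECAY_TIME_S


for f in (10, 20, 40, 100):
    print(f"{f:>4} Hz: T = {period_s(f) * 1000:5.1f} ms, continuous: {sounds_continuous(f)}")
```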

The frequency range that can give an accurately identifiable pitch sensation extends from ~30Hz to ~5000Hz (or 5kHz).
The common instrumental pitch range spans most of this frequency range (see the figure, below, and (i) alternative graph; (ii) comparison of popular (at one time) vocalists).

Maximal pitch accuracy extends from ~60Hz to ~3800Hz (or 3.8kHz), i.e. ~six octaves: from ~60Hz (~B1) to $60 \times 2^6 = 3840 \approx 3800$Hz (~B7).
[One octave up corresponds to double the frequency. So one octave above 60Hz is $60 \times 2 = 120$Hz. Two octaves up is $60 \times 2 \times 2 = 60 \times 2^2 = 240$Hz. Three octaves up is $60 \times 2 \times 2 \times 2 = 60 \times 2^3 = 480$Hz; and so on.]
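The same octave arithmetic as a short sketch (the function name is illustrative): each octave multiplies the frequency by 2, so n octaves multiply it by 2^n.

```python
def octaves_up(base_hz: float, n: int) -> float:
    """Frequency n octaves above base_hz (n may be negative for octaves below)."""
    return base_hz * 2 ** n


for n in range(7):
    print(f"{n} octave(s) above 60 Hz: {octaves_up(60, n):.0f} Hz")
```

The last line (6 octaves) gives 3840Hz, the ~3800Hz upper limit of maximal pitch accuracy quoted above.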

Frequencies above 10kHz give rise to pitch sensations that, although they may be distinguishable from one another, are hard to identify in terms of height and cannot accurately convey the direction of a pitch change.

Listen to this high-frequency example over headphones (uncompressed .wav file). It includes three 1-second-long tones, played successively and introducing two possible pitch changes: from the first to the second tone and from the second to the third. Pick, in your opinion, the pattern of pitch changes from the following: SD, DS, SU, US, UD, DU, SS, UU, DD (S: pitch stays the same; U: pitch goes up; D: pitch goes down). See the bottom of the page for the frequencies of the three tones.

Complex-tone spectral components with frequencies above 10,000Hz usually represent the 'noisy' portions of musical sounds (bow scrapings, reed attacks, hammer hits, etc.) and have timbral (tone color) rather than pitch significance. In addition, spectral energy above 10,000Hz communicates the degree of a complex signal's perceived "naturalness."

JND for Pitch: ~0.3-1% of frequency, depending on register (i.e. on frequency region - see the figure, below). 
Expressed differently, it corresponds to approximately 1/30th of the critical band, 1/12th of an equal-tempered semitone, or 5-8 cents (1 cent = 1/100th of a semitone - more below).

[Reminder: JND (just noticeable difference) or difference threshold refers to the smallest perceivable change in a physical variable]

Click, below, to listen to pairs of successive tones ranging from 440-441Hz to 440-448Hz. What is the smallest frequency difference you can perceive as a change in pitch?
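To relate these listening pairs to the JND figures above, the sketch below expresses each pair's frequency difference both as a percentage and in cents, using cents = 1200·log2(f2/f1) (1200 cents per octave, 100 per equal-tempered semitone). The function names are illustrative.

```python
import math


def cents(f1_hz: float, f2_hz: float) -> float:
    """Pitch distance from f1 to f2 in cents (1200 cents per octave, 100 per semitone)."""
    return 1200 * math.log2(f2_hz / f1_hz)


def percent_change(f1_hz: float, f2_hz: float) -> float:
    """Frequency difference from f1 to f2 as a percentage of f1."""
    return 100 * (f2_hz - f1_hz) / f1_hz


# The listening pairs: 440-441 Hz up to 440-448 Hz.
for f2 in range(441, 449):
    print(f"440 -> {f2} Hz: {percent_change(440, f2):.2f}% = {cents(440, f2):.1f} cents")
```

The same two functions can be applied to the ~0.3-1% JND range quoted above to see where each listening pair falls.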


Pitch of Pure Tones
Pitch & Frequency / Intensity / Duration - Music Theory / Tuning Perception



Dependence on Frequency

For simple/pure tones, pitch closely relates to frequency. Similar to the relationship between intensity and SIL [Sound Intensity Level], frequency and pitch relate logarithmically: addition in perception (pitch) corresponds to multiplication in the physical variable (frequency).

As the frequency rises, the same pitch interval corresponds to increasingly larger frequency differences. For example, raising the pitch by an octave corresponds to doubling the frequency.

_ raising the pitch by 1 octave corresponds to multiplying the frequency by 2;
_ raising the pitch by 2 octaves corresponds to multiplying the frequency by $2 \times 2$ or $2^2$;
_ raising the pitch by 3 octaves corresponds to multiplying the frequency by $2 \times 2 \times 2$ or $2^3$; and so on.

The figure below illustrates the frequency/pitch relationship: the same pitch interval (e.g. an octave) corresponds to an increasingly larger frequency distance (after Campbell and Greated, 2001).

Read this outline of the frequency/pitch relationship

The frequencies corresponding to musical pitches/notes and pitch/note intervals (i.e. pitch distances) increase exponentially as pitch rises; equivalently, pitch is a logarithmic function of frequency.
Per the current standard, pitch A4 corresponds to 440Hz.  

Music Theory Primer - Use this virtual piano

Interval: perceived pitch distance between two tones (whether pure/simple or complex).
Melodic Interval: Interval between two successive tones.
Harmonic Interval: Interval between two simultaneous tones.
Octave: Interval between two tones with frequency ratio of 2 (HighF / LowF = 2)
Chord triad: Set of three simultaneous tones.

Our current tuning system, referred to as equal temperament, divides the octave into 12 log-equal interval units, which correspond to increasingly larger frequency distances as we move up within the octave. These 12 interval units constitute the chromatic scale and are, interchangeably and for historical reasons, referred to as semitones, half steps, or minor seconds.
Equal temperament 'tempers' problems with previous tuning systems (based on various "natural" justifications) by spreading any tuning "errors" equally over all 12 semitones. This ensures that a) all octave intervals maintain their 2/1 frequency relationship and b) transposing musical pieces to different keys maintains the piece's original pitch relationships. Table of pitch/frequency pairings.
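A minimal sketch of equal-tempered tuning, assuming the A4 = 440Hz standard mentioned above: each semitone multiplies the frequency by 2^(1/12), so a note n semitones from A4 has frequency 440·2^(n/12), and 12 semitones always give exactly a 2/1 octave. The sharp-only note names and the function name are illustrative choices.

```python
A4_HZ = 440.0                      # reference pitch from the text
SEMITONE_RATIO = 2 ** (1 / 12)     # equal-tempered semitone, about 1.0595
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def et_frequency(semitones_from_a4: int) -> float:
    """Equal-tempered frequency of the note the given number of semitones above (+) or below (-) A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


# Chromatic scale from C4 (9 semitones below A4) up to C5.
for k in range(13):
    name = NOTE_NAMES[k % 12] + ("4" if k < 12 else "5")
    print(f"{name:>3}: {et_frequency(k - 9):7.2f} Hz")

# Twelve equal semitones multiply to (numerically) 2: an exact octave.
print(SEMITONE_RATIO ** 12)
```

Because every semitone is the same ratio, the frequency ratio of any interval depends only on the number of semitones it spans, which is why transposition preserves pitch relationships.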

Each of the 13 possible intervals outlined by the notes in the chromatic scale has a unique sonic character or 'signature sound,' related to the different ways the frequency components of the interval notes interact within the ear. Combinations of these signature sounds can create cognitively recognizable patterns. Click below to listen to all thirteen intervals, starting at C4 (synthesized notes).

C4-C4    C4-C#4    C4-D4    C4-D#4    C4-E4    C4-F4    C4-F#4    C4-G4    C4-G#4    C4-A4    C4-A#4    C4-B4    C4-C5

Sequence of all intervals

For tuning and tuning-comparison purposes, each semitone is further divided into 100 log-equal parts called "cents."

Tuning - Music Performance & Perception

Regardless of tuning system, the perception of musical intervals seems to be categorical rather than continuous (at least for experienced listeners), with listeners willing to tolerate minor deviations from ideal tunings. In addition, performers seem to systematically stretch/contract intervals based on melodic and harmonic context.

In general, listeners appear to have more tolerance for stretched (or "sharp") rather than contracted (or "flat") tuning.
Click here for an example. The first melody uses compressed/flat tuning, the second uses stretched/sharp tuning, and the third is mathematically correct (Houtsma et al., 1987).
Consistent with our preference for slightly stretched/sharp, rather than contracted/flat tunings, listeners tend to prefer stretched over numerically precise or contracted octave intervals, and tend to reproduce them as such.


Dependence on Intensity



The pitch of pure tones also depends on intensity (see the figures to the left).

In general, increasing the intensity of pure tones:
   a) decreases the pitch of low frequencies
        (approx. <300Hz),
   b) increases the pitch of high frequencies
        (approx. >3000Hz), and
   c) has no noticeable effect at middle frequencies.

In the figure to the left:
(a) A pure tone of frequency 98Hz has
         a pitch of G2 when quiet (ppp) and
         a pitch lower than E2 when loud (fff).
(b), (c), and (d) show the influence of intensity
         on the pitch of pure tones with frequencies
         392Hz, 784Hz, and 3136Hz, respectively
         (in Campbell & Greated, 1987; derived
          from Stevens & Davies, 1939).

In addition, the introduction of a high-intensity "interference" tone will change the perceived pitch of existing low-intensity tones, even if the frequency of the low-intensity tones remains unchanged.
For frequencies well above the high-intensity tone, the perceived pitch will rise.
For frequencies well below the high-intensity tone, the perceived pitch will drop.

In other words, the high intensity tone pushes the pitch of low intensity tones away from it, assuming frequency separations beyond one critical bandwidth. (What will happen if the intense and weak tones fall within the same critical band?)



The figure to the left offers another, simplified illustration of the average dependence of the pitch of pure tones on intensity.
In short:
    Increasing the intensity will
    _ lower the pitch of a low-frequency tone
    _ raise the pitch of a high-frequency tone
    _ not noticeably affect the pitch of a
       middle-frequency tone

Dependence on Duration

Pitch also depends on duration. A tone must last more than a minimum amount of time (~10-60ms, depending on frequency and intensity) in order to sound more than a 'click' and convey a clear sense of pitch (see below). We will return to this during the module on Timbre.





Pitch of Complex Tones
Pitch & Spectrum - Pitch Theories



The dependence of the pitch of periodic complex tones (i.e. of complex tones with harmonic spectra) on frequency, intensity, duration, and the introduction of intense 'interference' tones is qualitatively similar to that of pure tones but more complex, due to the variety of frequencies involved. The pitch JND for complex tones may be larger or smaller than that for pure tones, depending on spectral context.

Dependence on Spectrum

Periodic/harmonic complex tones are usually perceived as a single unit, with their multiple spectral components merging into a single tone perception rather than being perceived as a set of individual pure tones.
The pitch of periodic complex tones with harmonic spectra (i.e. spectral components with frequencies that are integer multiples of the frequency of the lowest component, called the 'fundamental') generally matches the frequency of the fundamental component.
This is apparently the case regardless of whether or not this component is perceivable (i.e. even when it is too low in level and/or masked) and even if it is not physically present in the tone's spectrum at all.

Phenomenon of the missing fundamental or "virtual pitch": the pitch of a complex harmonic tone matches the frequency of its fundamental spectral component, even if this and several more low-frequency components are missing from the tone's spectrum. 
In general, the phenomenon of the pitch of the missing fundamental or "virtual pitch" cannot be explained by place (tonotopic) theories of pitch perception (see below) and provides evidence that pitch depends not only on frequency, intensity, and duration but also on spectral distribution.

Listen to a pair of complex tones illustrating the phenomenon of the missing fundamental. Both have a fundamental of 300Hz and up to 15 components (ramp spectrum: $A_n = A_1/n$ for all components). The first tone has all 15 components, while the second is missing the first 5. As you can hear, removing these first 5 components does not change the tone's pitch, which continues to match that of 300Hz.
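A minimal numerical sketch of the two demo tones, assuming NumPy, a 44.1kHz sample rate, a 0.5s duration, and zero-phase sine components (none of which are specified in the text): both waveforms repeat every 1/300 s, even though the second has no energy at 300Hz.

```python
import numpy as np

FS = 44_100   # sample rate in Hz (assumed)
F0 = 300.0    # fundamental frequency of the demo tones
DUR = 0.5     # duration in seconds (assumed)


def ramp_tone(harmonics, f0=F0, fs=FS, dur=DUR):
    """Sum of sine components at n*f0 with ramp amplitudes A_n = A_1/n (A_1 = 1, zero phase)."""
    t = np.arange(int(fs * dur)) / fs
    return sum((1.0 / n) * np.sin(2 * np.pi * n * f0 * t) for n in harmonics)


def repeats_every(x, period_s, fs=FS, tol=1e-6):
    """True if x(t + period) matches x(t) within tol (period rounded to whole samples)."""
    lag = round(period_s * fs)
    return bool(np.max(np.abs(x[lag:] - x[:-lag])) < tol)


full = ramp_tone(range(1, 16))     # components 1-15
missing = ramp_tone(range(6, 16))  # components 6-15: first 5 removed

spectrum = np.abs(np.fft.rfft(missing)) / len(missing)
bin_300 = round(F0 * len(missing) / FS)  # FFT bin centred on 300 Hz

print("full tone repeats every 1/300 s:", repeats_every(full, 1 / F0))
print("tone without components 1-5 repeats every 1/300 s:", repeats_every(missing, 1 / F0))
print("spectral magnitude at 300 Hz in the second tone:", spectrum[bin_300])
```

Both repetition checks print True while the 300Hz magnitude is essentially zero: the waveform still repeats 300 times per second without any energy at 300Hz, which is consistent with the unchanged pitch.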

Frequency region most significant to pitch
      Spectral and temporal context and fundamental frequency determine which of the criteria, below, will be more significant.

  • For complex tones with a fundamental of 100Hz, the pitch sensation changes after removing the first ~15-20 components and starts deteriorating after removing the first ~25-30 components. 
  • For fundamentals of 500Hz and 800Hz, the same observations occur after removing the first ~10-12  and ~4-7 components respectively.
          Listen to this example.  It includes 13 harmonic complex tones, with fundamental of 600Hz, played in succession. 
          The first tone has all harmonic components from the 1st to the 16th and each successive tone drops one component,
          starting from the fundamental, until only the highest 3 components remain. What happens to the pitch?
  • Experiments examining the effect of mistuning some of a harmonic tone's components on the resulting pitch have determined that the frequency region most important to pitch is between ~400 and ~1500Hz. That is, mistuning or removing components lying within this region has the greatest effect on pitch, while the presence of frequency components within this region results in the clearest pitch sensations.

Perceiving individual components

As noted previously, harmonic complex signals are perceived as a unit rather than a set of multiple pure tones. However, the individual harmonics can be heard if we draw attention to them by removing and re-introducing them (Houtsma et al., 1987).

The throat singers of Tuva (a republic in southern Siberia, bordering Mongolia) exploit this phenomenon to create musical passages where one singer appears to produce two pitches simultaneously: one acting as a fixed drone and one performing a sort of melody. These passages are created:
a) from the single fundamental frequency produced, acting as a background drone, and
b) from harmonic components associated with this fundamental, selectively accentuated by the performer, acting as the melody line. [Optional: For more information and video/audio examples see here and here.]

Analytic / Synthetic Listening

In this example (adapted from Smoorenburg, 1970), two complex tones with 2 components each are presented in succession:
(a) 800Hz + 1000Hz (b) 750Hz + 1000Hz. When moving from (a) to (b):

  • Some listeners hear the pitch going down by following the motion of the first component in each tone (800Hz → 750Hz, a ~1 semitone drop).
    Explicit rules are employed to track the physical attributes of the two tones and determine the pitch motion.
    This is considered an example of analytic listening.

  • Other listeners hear the pitch going up by reconstructing the motion of the (missing) fundamental implied by the two complex tones (200Hz → 250Hz, a ~major third rise).
    Implicit rules are employed to synthesize a physical attribute that is implied by the rest of each tone's attributes, helping determine pitch motion. This is considered an example of synthetic listening.

Let's repeat the experiment, but now with two 5-component complex tones, presented in succession:
(a) 800+1000+1200+1400+1600Hz (b) 750+1000+1250+1500+1750Hz. In this case, listeners only hear the pitch going up by a major 3rd, perceiving the motion of the (missing) fundamental (200Hz → 250Hz). Why?
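One simple way to compute the implied (missing) fundamental of an exactly harmonic spectrum, assumed here for illustration, is the greatest common divisor (GCD) of its component frequencies. The sketch compares the analytic route (follow the lowest component) with the synthetic route (follow the implied fundamental) for both examples; the function names are illustrative.

```python
import math
from functools import reduce


def implied_fundamental(components_hz):
    """GCD of integer component frequencies: the highest f0 of which all are harmonics."""
    return reduce(math.gcd, components_hz)


def semitones(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in equal-tempered semitones (negative = down)."""
    return 12 * math.log2(f2_hz / f1_hz)


two_a, two_b = [800, 1000], [750, 1000]
five_a, five_b = [800, 1000, 1200, 1400, 1600], [750, 1000, 1250, 1500, 1750]

# Analytic route: follow the lowest component (800 -> 750 Hz).
print(f"lowest component: {semitones(800, 750):+.2f} semitones")

# Synthetic route: follow the implied (missing) fundamental.
for a, b in [(two_a, two_b), (five_a, five_b)]:
    fa, fb = implied_fundamental(a), implied_fundamental(b)
    harmonics_a = [f // fa for f in a]
    harmonics_b = [f // fb for f in b]
    print(f"{fa} Hz -> {fb} Hz ({semitones(fa, fb):+.2f} semitones); "
          f"harmonic numbers {harmonics_a} -> {harmonics_b}")
```

In the 5-component case, five consecutive harmonics point to the same fundamental, which may help explain why that percept dominates; this is an interpretation offered for illustration, not a result stated in the source.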

In Dannenbring's (1974) demonstration (masking noise bursts filling tone gaps in a steady or frequency modulated tone - from our Hearing Module):

  • Listeners synthesize the sensation in a form of listening often referred to as synthetic or holistic, based largely on implicit rules (rules we are not explicitly aware of).

  • If listeners are alerted to the fact that the presented tones actually have gaps, they may be able to perceive them by directing their attention to separate portions of the total stimulus. This 'directed' form of listening is often referred to as analytic, based largely on explicit rules (rules we are explicitly aware of).

The McGurk effect is an auditory illusion caused by ambiguous audio/visual cues. It offers another example of synthetic listening, illustrating that perceptions are the result of an experience-guided synthesis of information from our environment. Our responses to stimuli do not necessarily match the stimuli. Rather, they reflect how such stimuli best fit our previous experience, the concurrent context, and the associated expectations. When viewers stare at the speaker's lips, the effect is so robust that, even when alerted to the a/v stimulus conflict, they are unable to listen to the a/v composite analytically. Listening with eyes closed presents no ambiguity and results in a different sonic perception. Watch this BBC2 clip on the effect.


Pitch of Complex Tones with Inharmonic Spectra

Inharmonic/non-periodic complex tones, whose frequency components are not integer multiples of a 'fundamental' component, do not elicit a clear, unique pitch sensation. They may elicit several competing pitch sensations, may resemble chords, or may sound like noise, depending on their spectral distribution.

As illustrated below, however, changing the spectral peaks of inharmonic complex tones without changing the frequencies of their sinusoidal components can result in a distinguishable change in pitch, matching the frequency change of the spectral peaks. This highlights the perceptual salience of changing vs. static stimuli, while also providing additional evidence of the dependence of pitch on spectral distribution. 

Listen to these three individual major chords (three simultaneous notes: C, E, and G), performed by combinations of flute, clarinet, and oboe: Clarinet-Flute-Oboe, Flute-Clarinet-Oboe, Oboe-Clarinet-Flute.

Now, listen to this "chord melody" example, consisting of a five-chord sequence, each of which is one of the three chords, above. All 3 chords are major, include exactly the same notes (C5, E5, and G5), and have inharmonic spectra (i.e. their spectral components are not integer multiples of a single fundamental, even though the spectra of the individual notes in the chords are themselves harmonic).

The frequencies of the components are identical among chords but the spectral envelopes (i.e. relative intensities of the components) are different and depend on which instrument plays what note (a flute, a clarinet, or an oboe).  The melody you hear (C-E-G-E-C) tracks the position of the flute in each successive chord, because the flute's fundamental frequency provides the spectral peak for each chord's spectrum.  In other words, the melody you hear matches the pitch changes corresponding to the changes in the flute's fundamental frequency. 

This effect can also be produced through spectral shaping of noise bands. Pitch perception will track changes in the spectral peak of the noise.

NOTE: Minor deviations from harmonic spectra (up to ~1-2% of frequency) and the way these interact when, for example, several instruments perform together in unison change the timbre of the combined sound rather than its pitch. They produce what is referred to as the 'chorus effect': richness of ensemble sound due to slow and varying beating rates among the slightly detuned components of the complex tones involved.
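A rough illustration, assuming two perfectly steady, slightly detuned tones (hypothetical 440Hz and 442Hz): corresponding harmonics beat at a rate equal to their frequency difference, so the beating rate grows with harmonic number. Real performers' small, fluctuating deviations make these rates vary over time, as described above.

```python
def beat_rate_hz(f1_hz, f2_hz):
    """Beating rate of two nearby sinusoids: the absolute difference of their frequencies."""
    return abs(f1_hz - f2_hz)


# Hypothetical unison: one instrument at 440 Hz, another at 442 Hz (~0.45% sharp).
for n in range(1, 6):
    f1, f2 = n * 440, n * 442
    print(f"harmonic {n}: {f1} Hz vs {f2} Hz -> beats at {beat_rate_hz(f1, f2)} Hz")
```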

Approximate value of the virtual pitch of slightly inharmonic spectra whose fundamental (and possibly other low components) is missing:

Assume the 3-component harmonic spectrum 800, 1000, 1200Hz (components 4, 5, and 6 of a missing 200Hz fundamental) and a slightly inharmonic version of the same spectrum: 792, 1010, 1170Hz.
The virtual pitch of the harmonic spectrum will correspond to 200Hz.
The virtual pitch of the inharmonic spectrum will correspond, approximately, to the average of the three implied fundamentals: (792/4 + 1010/5 + 1170/6)/3 = (198+202+195)/3 = 595/3 = 198.3Hz.


Pitch Theories (for more details, see the Optional section at the end of this module)

Place (Tonotopic) Theory

Pitch relates directly to the point of stimulation on the BM. Different frequencies resonate at different locations on the BM. Frequency is encoded as pitch by the inner hair cells corresponding to the BM portion resonating for that frequency.
Watch these (simplified / exaggerated) animations of the basilar membrane motion in response to:
1000Hz,    8000Hz,    1000Hz+8000Hz,  &  a range of frequencies (.mov files).  Read this brief outline of the place theory.


Strengths:
Applies to all frequencies.

Explains pathological conditions:
diplacusis: different pitch sensations per ear for the same frequency, due to BM structural differences;
presbycusis: pitch shift with age for the same frequency, due to BM hardening with age.


Weaknesses:
Cannot explain the fineness of the pitch JND or the observed relationship of perceived pitch to intensity.

Unless it is modified, it cannot reliably explain  two phenomena associated with the pitch of complex tones:

  1. The pitch of the missing fundamental.
    The place theory claims that, when missing, the fundamental frequency and other low-frequency components are re-introduced as "difference" frequencies: distortion products arising from the interaction among the existing components in the tone's harmonic spectrum (the difference frequencies between upper harmonics equal the frequencies of lower harmonics; why?). The difference frequencies then excite the BM locations that correspond to the fundamental and other low-frequency components. This claim is challenged by several observations, including the pitch-shift effects. These effects refer to instances where spectral modifications result in the pitch changing in a direction that appears to be opposite to that predicted by the place theory of pitch and the "difference frequency" hypothesis.

  2. The persistence of the missing fundamental phenomenon even when the first 8 or more components are removed.
    For most tones, only the first 5-8 components excite separate critical bands and can therefore elicit a unique pitch sensation. Beyond that, 2 or more components lie in the same critical band and cannot be differentiated based on a place theory of pitch. Therefore, the place theory of pitch requires energy at the BM locations corresponding to the lower components for a pitch sensation to register.
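A short sketch of the "why?" in point 1 above: if the components are harmonics n·f0, the difference between any two of them, (m − n)·f0, is itself an integer multiple of f0, i.e. a lower harmonic. The 200Hz fundamental and the choice of harmonics 6-10 are hypothetical values.

```python
from itertools import combinations

F0 = 200                                  # hypothetical missing fundamental (Hz)
upper = [n * F0 for n in range(6, 11)]    # only harmonics 6-10 are present

# First-order difference frequencies f_high - f_low for every pair of components.
diffs = sorted({hi - lo for lo, hi in combinations(upper, 2)})
print(diffs)                              # each is a whole multiple of F0
print([d // F0 for d in diffs])           # harmonic numbers the differences coincide with
```

The smallest difference always equals F0 itself, which is why the difference-frequency account seems at first to explain the missing fundamental; as noted above, the pitch-shift effects are what challenge it.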


Temporal (Periodicity) Theory

Periodicity theories of pitch rely on time information, as represented on the signal itself or through its conversion to neural electrical spikes, following inner hair cell activity.

The volley theory of pitch, introduced in the 1940s by Wever, explains pitch in terms of the combination of phase locking (neurons firing at a single point in the vibration cycle) and neural firing synchrony (multiple neurons firing in sync with each other) information from a large number of inner hair cells associated with a specific place on the basilar membrane.
Select the "Illustration" tab on this page for an interactive animation that illustrates the temporal theory.

Whether based on the periodicity of the sound signal itself or the periodicity of the sound signal's representation in the auditory nerve (e.g. periodicity of neural spike activity), temporal models of pitch assume that the hearing mechanism extracts a signal's periodicity by computing the signal's autocorrelation function. Qualitatively, autocorrelation provides a measure of how well a signal matches a time-shifted version of itself, as a function of the amount of time shift.
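A minimal autocorrelation pitch estimator, sketched with NumPy under simplifying assumptions (a 16kHz sample rate, a 50-1000Hz search range, and "take the lag with the largest autocorrelation" as the decision rule). It operates on the waveform directly and is not a model of neural processing. The test signal contains only components 4, 5, and 6 of a 200Hz fundamental.

```python
import numpy as np

FS = 16_000  # sample rate in Hz (assumed)


def autocorr_pitch(x, fs=FS, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency as fs / (lag with maximal autocorrelation).

    Only lags corresponding to periods between 1/fmax and 1/fmin are searched.
    """
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep non-negative lags only
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag = lo + np.argmax(ac[lo:hi + 1])
    return fs / best_lag


# 100 ms of 800 + 1000 + 1200 Hz: harmonics 4-6 of 200 Hz, with no 200 Hz component.
t = np.arange(int(0.1 * FS)) / FS
x = sum(np.sin(2 * np.pi * f * t) for f in (800, 1000, 1200))
print(autocorr_pitch(x))
```

Here the strongest peak falls at a lag of 80 samples (5ms), so the estimate is 200Hz even though no 200Hz component is present.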


Strengths:
May explain why pitch perception becomes increasingly coarse past 5kHz.

May explain the perception of the missing fundamental.

Efficiently captures pitch and loudness information based on neural firing patterns and densities.


Weaknesses:
Cannot readily explain pathological conditions (diplacusis / presbycusis), which can be traced to the BM and have physiologically based pitch perception implications.

Cannot readily explain the pitch shift effects.

Appears to break down above 5kHz.


Hybrid Theories

Contemporary hybrid theories of pitch incorporate aspects of both Place and Periodicity theories (Plomp, 1964; Terhardt, 1974; Pierce, 1990). Terhardt's theory of 'virtual pitch' introduces 'previous learning' as a novel component that can be based on either temporal or place pitch cues and which provides us with pitch templates of disturbance patterns on the BM. These are used to support pitch decisions in ambiguous contexts (spectra with missing components, mistuned components, etc.).

In general, it appears that the most salient/prominent components of a signal's spectrum (in terms of intensity level and frequency separation) are the most important carriers of pitch information. In the presence of multiple complex tones, the components of each complex tone are perceptually linked together into a single percept, separate from the other complex tones. This appears to be due to each particular complex tone's spectral jitter (i.e. fast amplitude and frequency micro-variations that are almost synchronized among the components of each complex tone).





The Octave - Multidimensionality of Pitch: Pitch Height & Pitch Chroma



The Octave (use this virtual piano to help you with the concepts in this section)

The octave interval (doubling in frequency) is significant because tones separated by this interval sound remarkably similar and, when simultaneous, perfectly blend into a single pitch percept, even though the higher frequency is clearly higher in pitch.
The perceptual "sameness" conveyed by the octave is cross-cultural and is referred to as pitch chroma.

Pitch chroma: The distinctive quality of a specific tone, separating it from the rest of the tones within an octave. It describes perceptual 'differences'/'distances' of pitches within an octave and the perceptual sameness of pitches separated by one or more full octaves. It is reflected in the fact that the different note names (e.g. C, D, E, F, G, A, B, C, D ...) repeat periodically for every 2/1 increase in frequency (i.e. every octave) with the addition of a subscript (e.g. C4) to indicate how high or low this pitch is relative to some reference pitch.  In other words, a numeric subscript difference between two notes that share the same pitch chroma (e.g. C4 vs. C5) reflects a pitch height difference of one or more octaves between those notes.

Pitch height: term describing the perceptual 'highness' or 'lowness' of a pitch; it is related mainly to frequency.
The intervals A3 (220Hz) - A4 (440Hz) and A4 (440Hz) - A5 (880Hz) are both octaves, with the three notes (A3, A4, A5) having the same pitch chroma but different pitch heights. In terms of their pitch height, octaves are equidistant perceptually (not in Hz).
Notes represented by different names are different in both pitch chroma and pitch height. Within a single octave, all notes differ in both pitch chroma and pitch height, with the pitch rising in accordance with the note name (C3 is lower than D3, which is lower than E3, etc.). [Why, do you think, does octave numbering loop at the note C while note letters loop at the note A?]



Multidimensionality of Pitch (pitch spiral)

Pitch is multidimensional (at least three dimensions), involving pitch height (one dimension: frequency) and pitch chroma (two dimensions, separating pitches within an octave and linking pitches an octave apart).

Note: A variable is uni-dimensional if all its values can fit on a single straight line. If for example A, B, & C represent 3 values of a variable as points in space then:
a) if |AB| + |BC| = |AC| then the variable is uni-dimensional
b) if |AB| + |BC| ≠ |AC| then the variable is multidimensional
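The test in the note above, as a small sketch using Python's math.dist (Python 3.8+); as in the note, B is assumed to be the middle point. The function name and the sample points are illustrative.

```python
import math


def satisfies_line_test(a, b, c, tol=1e-9):
    """|AB| + |BC| == |AC| (within tol): B lies on the straight segment between A and C."""
    return abs(math.dist(a, b) + math.dist(b, c) - math.dist(a, c)) < tol


# Three values on one straight line: the equality holds (uni-dimensional arrangement).
print(satisfies_line_test((0, 0), (1, 1), (2, 2)))
# Three values that do not fit on one line: the equality fails (multidimensional).
print(satisfies_line_test((0, 0), (1, 1), (2, 0)))
```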

The Western chromatic scale breaks the octave down into 12 different pitch chromas. The pitch chroma dimensions represent a circularity in pitch perception. Due to this circularity, although the octave interval represents the largest physical distance in terms of pitch height (within a single octave), it represents the smallest perceptual distance in terms of pitch chroma.
Pitch perception therefore wraps on the octave, with scales defining sets of different pitch chromas that repeat at different pitch heights for each new octave. This results in what resembles a pitch spiral.
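A minimal sketch of the pitch spiral, assuming notes are numbered in semitones above C4 (a labeling chosen only for this example): pitch chroma is the position on a 12-step circle, while pitch height keeps growing from octave to octave. The function names and the unit radius are illustrative.

```python
import math


def height(semitones_above_c4):
    """Pitch height in octaves above C4: grows without wrapping."""
    return semitones_above_c4 / 12


def chroma_distance(s1, s2):
    """Shortest distance between two chromas around the 12-step circle (0 to 6 steps)."""
    d = abs(s1 - s2) % 12
    return min(d, 12 - d)


def helix_point(s, radius=1.0):
    """3-D point on the pitch spiral: (x, y) from chroma, z from height."""
    angle = 2 * math.pi * (s % 12) / 12
    return (radius * math.cos(angle), radius * math.sin(angle), height(s))


for name, s in [("C#4", 1), ("G4", 7), ("B4", 11), ("C5", 12)]:
    print(f"C4-{name}: height difference = {height(s):.2f} octaves, "
          f"chroma distance = {chroma_distance(0, s)} steps")

# C4 and C5 sit directly above one another on the spiral: same (x, y), different z.
print(helix_point(0), helix_point(12))
```

The C4-C5 octave has the largest height difference in the list but a chroma distance of zero, matching the circularity described above.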
The perceptual circularity of pitch is explored in Shepard-tone scales/slides (after Roger Shepard) that present the paradox of a continuously ascending (or descending) pitch.

Listen to two pitch spiral examples (Houtsma et al., 1987).

Shepard scales/slides are the auditory analog of the continuously ascending/descending staircases, explored conceptually by Penrose and artistically by Escher (see the images, below)

Pitch Spiral: pitch height and pitch chroma

Pitch chroma circularity (octave)


Relativity (Escher, 1953); Ascending and Descending (Escher, 1960)

Short video on pitch and a/v related audio illusions






Optional Sections (Pitch Theories)

The Place Theory of Pitch was proposed by Ohm (1843), developed by Helmholtz (1862), and confirmed experimentally by von Békésy (1950s), who won the 1961 Nobel Prize in Physiology or Medicine for his contributions to the understanding of hearing. The drawbacks of the theory, discussed below, were being explored in parallel.

The Pitch-Shift Effects

The artificial spectra created in explorations of the two pitch shift effects are slightly inharmonic but not inharmonic enough for the pitch sensation to deteriorate.

The first pitch shift effect: Shifting all components of a harmonic complex tone by the same amount $\Delta f$ results in a shift of the perceived pitch by $\Delta P$, even though the frequency spacing between the components (and therefore the difference frequency $f_0$ among successive components of the complex tone) remains unchanged.


The second pitch shift effect: For the harmonic complex tone to the right/bottom (continuous spectral lines), the perceived pitch P matches that of the difference frequency $f_0$. Increasing the spacing of the components while keeping the frequency of the center component $n f_0$ the same results in a drop in pitch, $P' < P$, although the difference frequency has increased from $f_0$ to $f_0 + d$ (broken lines).
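As a rough illustration only, the averaging approximation given earlier for slightly inharmonic spectra (the average of the fundamentals implied by each component) reproduces the direction of both pitch-shift effects. This heuristic and the frequency values (a +30Hz shift, and a 220Hz spacing around a fixed 1000Hz center) are hypothetical choices, not the models or stimuli used in pitch-shift experiments.

```python
def approx_virtual_pitch(components):
    """Average of the fundamentals implied by (frequency, harmonic number) pairs."""
    return sum(f / n for f, n in components) / len(components)


harmonic = [(800, 4), (1000, 5), (1200, 6)]   # f0 = 200 Hz, spacing 200 Hz
shifted = [(f + 30, n) for f, n in harmonic]  # first effect: all components +30 Hz, spacing unchanged
spread = [(780, 4), (1000, 5), (1220, 6)]     # second effect: spacing 220 Hz, centre fixed at 1000 Hz

for label, comps in [("harmonic", harmonic), ("shifted", shifted), ("spread", spread)]:
    print(f"{label:>8}: ~{approx_virtual_pitch(comps):.2f} Hz")
```

In the shifted case the spacing (difference frequency) stays at 200Hz yet the estimate rises; in the spread case the spacing grows to 220Hz yet the estimate falls slightly below 200Hz.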

Vassilakis (1998) argued that these two effects are not distinct but alternative manifestations of a single phenomenon. In a follow-up work, he argued that the pitch shift effect reflects our perceptual system's handling of the interaction between the phase and group velocities of the inharmonic tone complexes used in pitch shift experiments (Vassilakis, 1998b).

Unresolved Upper Harmonics

The place (or tonotopic) theory of pitch requires energy in the lowest 5-8 components (depending on fundamental frequency) in order to produce a clear pitch sensation. These low components are the components that are usually resolved best by the basilar membrane (i.e. they lie within separate critical bands) and can therefore provide clear place-related pitch information and/or produce strong intermodulation distortion products. The problem is that clear virtual pitch sensations persist even when the remaining components in a spectrum are not resolvable (see the 'unresolved upper harmonics' in the figure, left).

The above observations led to the first attempt at a temporal theory of pitch, called the "Residue" theory of pitch, developed first by Schouten (1938) and later by Walliser (1969). It states that pitch is determined by the temporal interaction, at a neural level, of the unresolved, 'residual' upper harmonics in a spectrum. This theory is challenged by the experimentally determined frequency region most salient to pitch (~400-1500Hz).


Temporal Coding: Phase Locking and Rectification

The first systematic temporal (periodicity) theory of pitch was proposed by Seebeck (1843), developed by Rutherford (1886), worked out by Schouten (1938), and has been reformulated several times since (e.g. volley theory of pitch by Wever, 1949).
Wever (the main figure in the 'periodicity theory' camp) and Békésy (the main figure in the 'place theory' camp) were close friends.


Neural response (neural firing) follows (or appears to be locked to) the positive peaks in the stimulus, firing only when the stereocilia are sheared in one direction.

This results in the neural signals of sinusoidal inputs
i) having a pulse-like shape with repetition rate equal to the input's period (i.e. they represent frequency in the time domain) and
ii) being rectified (i.e. they are restricted to the positive portion of the signal graph).

The process of phase locking is closely related to hearing's "temporal coding theory" of encoding frequency information:

The inner hair cells release neurotransmitters only when the basilar membrane moves in one direction, towards the scala media. They therefore respond only to the positive portions of incoming signals, with their stereocilia opening their ion channels when bending against the TM, mostly at a signal's positive peak.
Temporal coding assumes that, thanks to phase locking, the auditory system encodes periodic information through the firing rate of neurons. This rate corresponds to the incoming signal's period (or some multiple of it) and, therefore, to its frequency. Since individual neurons are not fast enough to encode high frequencies, more than one neuron must be involved in the process. Each neuron fires at some of the peak portions of an incoming signal and, after adding the outputs of all neurons, the signal is represented to the brain in a manner similar to that shown in the bottom graph (left).
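A toy sketch of this pooling (volley) idea, assuming a hypothetical per-neuron ceiling of 500 spikes/s and perfect phase locking to the positive peak of every cycle (real firing is stochastic): no single neuron can follow a 2000Hz tone, but four neurons taking turns produce a pooled spike train whose rate matches the stimulus frequency.

```python
import numpy as np

F = 2000            # stimulus frequency (Hz)
MAX_RATE = 500      # hypothetical ceiling on one neuron's firing rate (spikes/s)
DURATION = 0.01     # seconds

n_cycles = int(F * DURATION)
peak_times = (np.arange(n_cycles) + 0.25) / F   # positive peak of sin(2*pi*F*t) in each cycle

n_neurons = int(np.ceil(F / MAX_RATE))          # neurons needed to cover every cycle
# Each neuron fires only at positive peaks but skips the cycles it is too slow to follow:
# neuron k fires on cycles k, k + n_neurons, k + 2*n_neurons, ...
spike_trains = [peak_times[k::n_neurons] for k in range(n_neurons)]

pooled = np.sort(np.concatenate(spike_trains))  # the "volley": all neurons together
single_rate = len(spike_trains[0]) / DURATION
pooled_rate = 1 / np.mean(np.diff(pooled))
print(f"{n_neurons} neurons, each firing at ~{single_rate:.0f} spikes/s")
print(f"pooled spike rate: {pooled_rate:.0f} Hz (stimulus: {F} Hz)")
```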

Perception of pitch relations - Unit of pitch

Mel: Pitch-height unit and scale devised by S. S. Stevens (1937, 1940).
It is based on 'twice as high' perceptual judgments. 'Twice as high' amounts to a larger musical interval at high registers than at low registers. 

Reference: 1000 mels = pitch of 1000Hz presented 40dB above threshold.
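For illustration only, a common analytic approximation of the mel curve, mel = 2595·log10(1 + f/700), is sketched below. It is a later curve fit, not Stevens's original data, and it ignores the 40dB presentation level of the reference. It maps 1000Hz to ~1000 mels and shows that a 'twice as high' step spans a larger frequency ratio (musical interval) at higher registers.

```python
import math


def hz_to_mel(f_hz):
    """Analytic approximation of the mel scale (a curve fit, not Stevens's original data)."""
    return 2595 * math.log10(1 + f_hz / 700)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700 * (10 ** (m / 2595) - 1)


print(round(hz_to_mel(1000)))  # close to the 1000-mel reference above

# 'Twice as high' (double the mels) spans a larger frequency ratio at higher registers.
for m in (250, 500, 1000):
    lo, hi = mel_to_hz(m), mel_to_hz(2 * m)
    print(f"{m} -> {2 * m} mels: {lo:.0f} -> {hi:.0f} Hz (ratio {hi / lo:.2f})")
```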

The Mel unit of pitch height is analogous to the Sone unit of loudness.

The derivation of the Mel scale has been criticized for flawed methodology and is not in use.


Further Resources

_ Concise and systematic historical review of pitch theories (Alain de Cheveigné, IRCAM, Paris, France - source).
_ Pitch Perception presentation (A.J. Oxenham, Harvard - MIT - source)
_ Neural Coding of Pitch presentation (B. Delgutte, Harvard-MIT - source)
_ Music perception, pitch, and the auditory system (J.H. McDermott & A.J. Oxenham, University of Minnesota)
_ Revisiting place and temporal theories of pitch (A.J. Oxenham)
_ Von Békésy and cochlear mechanics (E.S. Olson et al. - Columbia University)




Key to the 3-tone listening example:
DD  (the tones have frequencies 12025Hz, 12000Hz, and 11975Hz, in this order)




Loyola Marymount University - School of Film & Television