Fundamentals of Sound

Pitch: Sonic (i.e. perceptual) attribute of sound waves, related mainly to frequency. Large frequency values result in 'high' pitch, while low frequency values result in 'low' pitch. American National Standards Institute (ANSI) definition: “That attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high.” According to ANSI, the pitch sensation corresponding to 440Hz is represented by A₄.

Discussions on pitch usually revolve around music. However, pitch and pitch contours are quite significant in speech and, in some languages, pitch inflections carry specific semantic meaning.

The frequency range of hearing extends from ~20Hz to ~20,000Hz (or 20kHz). These values constitute the low and high absolute thresholds of frequency perception, respectively. Listen to a sine signal sweeping through this range. [This will test your listening equipment more than your hearing.]

On average:
a) Frequencies below 20Hz are heard as individual pulses with no definite pitch.
_ Signals with frequency f > 20Hz have period T < 50ms, which is less than the hearing response's "decay/release time" (~50ms). So, for such signals, the response system does not recover between cycles and provides continuous excitation, giving the impression of a clear, uninterrupted pitch sensation.
_ Conversely, signals with frequency f < 20Hz have period T > 50ms, which is longer than the hearing response's "decay/release time" (~50ms). So, for such signals, the hearing system, which is excited by the long, upwards-extending tail of its resonance response, recovers between cycles, resulting in an interrupted, pulsating sonic sensation with no definite pitch. [Reminder: the nerve fibers in the cochlea are only excited when the BM pushes up, only once per vibration cycle.]
b) Frequencies above 20kHz are inaudible to humans. The hearing mechanism's construction results in natural resonance frequencies that are < 20kHz.
[The increasingly short, downwards-extending tail of the hearing system's resonance response to higher-frequency signals cannot extend downwards enough to stimulate the Basilar Membrane.]

The frequency range that can give an accurately identifiable pitch sensation extends from ~30Hz to ~5000Hz (or 5kHz). The common instrumental pitch range spans most of this frequency range. See the figure below, an alternative graph, and a comparison of once-popular vocalists.

Maximal pitch accuracy extends from ~60Hz to ~3800Hz (or 3.8kHz), i.e. ~six octaves, from ~60Hz (~B₁) to 60×2⁶ = ~3800Hz (~B₇). [One octave up corresponds to double the frequency. So one octave above 60Hz is 60×2 = 120Hz. Two octaves up is 60×2×2 = 60×2² = 240Hz. Three octaves up is 60×2×2×2 = 60×2³ = 480Hz; and so on.]

Frequencies above 10kHz give rise to pitch sensations that, although they may be distinguishable from one another, are hard to identify in terms of height and cannot accurately portray the direction of pitch change. Listen to this high-frequency example over headphones (.wav file). It includes three 1-second-long tones, played successively and introducing two possible pitch changes: from the first to the second tone and from the second to the third. Pick, in your opinion, the pattern of pitch changes from the following: SD, DS, SU, US, UD, DU, SS, UU, DD (S: pitch stays the same; U: pitch goes up; D: pitch goes down). See the bottom of the page for the tones' frequencies.

Complex-tone spectral components with frequencies above 10,000Hz usually represent the 'noisy' portions of musical sounds (bow scrapings, reed attacks, hammer hits, etc.) and have timbral (tone color) rather than pitch significance. This 'noisy' portion, corresponding to spectral energy above 10,000Hz, often correlates with the degree of a complex signal's perceived "naturalness."
JND for Pitch: ~0.3-1% of frequency, depending on register (i.e. on frequency region; see the figure below). Expressed differently, it corresponds to approximately 1/30th of the critical band, 1/12th of an equal-tempered semitone, or 5-8 cents (1 cent = 1/100th of a semitone; more below). [Reminder: JND (just noticeable difference) or difference threshold refers to the smallest perceivable change in a physical variable.]
Listen to pairs of successive tones ranging from 440-441Hz to 440-448Hz. What is the smallest frequency difference you can perceive as a change in pitch?
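To relate the listening pairs above to the JND figures, the short Python sketch below converts each frequency difference into a percentage of 440Hz and into cents (1200 cents per octave). The function names are illustrative only; nothing here comes from the listening demo itself.

```python
import math


def cents(f1: float, f2: float) -> float:
    """Interval from f1 to f2 in cents (1200 cents per octave, 100 per semitone)."""
    return 1200.0 * math.log2(f2 / f1)


def percent_change(f1: float, f2: float) -> float:
    """Frequency change from f1 to f2 as a percentage of f1."""
    return 100.0 * (f2 - f1) / f1


# The demonstration pairs: 440-441Hz up to 440-448Hz.
for f2 in range(441, 449):
    print(f"440-{f2}Hz: +{f2 - 440}Hz = {percent_change(440, f2):.2f}% = {cents(440, f2):.1f} cents")
```

By this arithmetic, the 440-442Hz pair differs by about 0.45%, or roughly 7.9 cents, i.e. around the upper end of the 5-8 cent range quoted above.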

Pitch of Pure Tones

Music Theory Primer - Tuning Perception
Pitch & Frequency / Intensity / Duration

Music Theory Primer - Tuning Perception (Use this virtual piano)

Interval: perceived pitch distance between two tones (whether pure/simple or complex).
Melodic Interval: Interval between two successive tones.
Harmonic Interval: Interval between two simultaneous tones.
Octave: Interval between two tones with frequency ratio of 2 (HighF / LowF = 2)
Chord triad: Set of three simultaneous tones.

Our current tuning system, referred to as equal temperament, divides the octave interval into 12 log-equal, modular interval units, which correspond to increasingly larger frequency distances as we move up within the octave. These 12 interval units constitute the chromatic scale and are, interchangeably and for historical reasons, referred to as semitones, half steps, or minor seconds.
For tuning and tuning-comparison purposes, each semitone is further divided into 100 log-equal parts called "cents."
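A minimal sketch of the equal-temperament arithmetic, assuming the A₄ = 440Hz reference used on this page: each semitone multiplies the frequency by 2^(1/12) (≈1.0595) and spans 100 cents, so twelve semitones give exactly one octave while the width of a semitone in Hz grows as we move up. The helper names are hypothetical.

```python
import math

A4_HZ = 440.0                   # reference pitch (current standard)
SEMITONE_RATIO = 2 ** (1 / 12)  # one equal-tempered semitone, ~1.0595


def et_frequency(semitones_from_a4: int) -> float:
    """Equal-tempered frequency n semitones above (n > 0) or below (n < 0) A4."""
    return A4_HZ * SEMITONE_RATIO ** semitones_from_a4


def cents_between(f1: float, f2: float) -> float:
    """Interval from f1 up to f2 in cents (100 cents per semitone)."""
    return 1200 * math.log2(f2 / f1)


# The same 1-semitone step, starting from A4, A5 and A6:
for start in (0, 12, 24):
    lo, hi = et_frequency(start), et_frequency(start + 1)
    print(f"{lo:7.2f} -> {hi:7.2f} Hz: {hi - lo:6.2f} Hz wide, {cents_between(lo, hi):.1f} cents")
```

Each printed step is 100 cents, while its width in Hz roughly doubles from one octave to the next (about 26Hz above A₄, 52Hz above A₅, and 105Hz above A₆).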

Equal temperament 'tempers' problems with other, earlier tuning systems (that were based on various "natural" justifications) by spreading any tuning "errors" equally over all 12 semitones. This ensures that:
a) all octave intervals maintain their 2/1 frequency relationship and, consequently,
b) transposing musical pieces to different keys maintains the piece's original pitch relationships.

Examination of tuning systems may take up entire college-level courses.
For a succinct, comprehensive overview of tuning systems see Schmidt-Jones, 2013.

Each of the 13 possible intervals outlined by the notes in the chromatic scale (from unison to octave) has a unique sonic character or 'signature sound,' related to the different ways the frequency components of the interval notes interact within the ear. Combinations of these signature sounds can create cognitively recognizable patterns. Click below to listen to all thirteen intervals, starting at C4 (synthesized notes).

C4-C4 C4-C#4 C4-D4 C4-D#4 C4-E4 C4-F4 C4-F#4 C4-G4 C4-G#4 C4-A4 C4-A#4 C4-B4 C4-C5

Sequence of all intervals
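The thirteen intervals listed above can be tabulated under equal temperament. This sketch assumes A₄ = 440Hz (so C₄ ≈ 261.63Hz) and simply reports each interval's size in semitones, its frequency ratio 2^(n/12), and the frequency of the upper note; it says nothing about the 'signature sound' of each interval.

```python
UPPER_NOTES = ["C4", "C#4", "D4", "D#4", "E4", "F4", "F#4",
               "G4", "G#4", "A4", "A#4", "B4", "C5"]
C4_HZ = 440.0 * 2 ** (-9 / 12)  # C4 lies 9 semitones below A4 (~261.63Hz)


def interval_table():
    """(name, semitones, equal-tempered ratio, upper-note frequency) for C4-C4 ... C4-C5."""
    rows = []
    for semitones, upper in enumerate(UPPER_NOTES):
        ratio = 2 ** (semitones / 12)
        rows.append((f"C4-{upper}", semitones, ratio, C4_HZ * ratio))
    return rows


for name, semitones, ratio, upper_hz in interval_table():
    print(f"{name:7s} {semitones:2d} semitones  ratio {ratio:.4f}  upper note {upper_hz:7.2f}Hz")
```

For example, the equal-tempered fifth (C4-G4) comes out at a ratio of about 1.4983, slightly below the 3/2 ratio of a 'natural' fifth; this is the kind of tempering that equal temperament spreads across all 12 semitones.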

Music Performance & Tuning Perception

Regardless of tuning system, the perception of musical intervals by experienced listeners (within a given musical tradition) seems to be categorical rather than continuous: experienced listeners are willing to tolerate minor deviations from ideal tunings.
In addition, performers seem to systematically stretch and occasionally contract intervals based on melodic and harmonic context.

In general, listeners appear to have more tolerance for stretched (or "sharp") rather than contracted (or "flat") tuning.
Click here for an example. The first melody uses compressed/flat tuning, the second uses stretched/sharp tuning, and the third is mathematically correct under equal temperament tuning (Houtsma et al., 1987).

Consistent with our preference for slightly stretched/sharp, rather than contracted/flat tunings, listeners tend to prefer stretched over numerically precise or contracted octave intervals, and tend to reproduce them as such.

Pitch & Frequency / Intensity / Duration

Pitch & Frequency

For simple/pure tones, pitch closely relates to frequency. Similarly to SIL [Sound Intensity Level] and loudness, frequency and pitch relate logarithmically: addition in perception (pitch) corresponds to multiplication in the physical variable (frequency).

As the frequency rises, the same pitch interval corresponds to increasingly larger frequency differences.
For example, a pitch increase of an octave corresponds to a doubling of frequency. So, raising 200Hz by an octave means adding 200Hz, while raising 500Hz by an octave means adding 500Hz.

E.g.:
_ raising the pitch by 1 octave corresponds to multiplying the frequency by 2;
_ raising the pitch by 2 octaves corresponds to multiplying the frequency by 2x2 or 2² = 4
_ raising the pitch by 3 octaves corresponds to multiplying the frequency by 2x2x2 or 2³ = 8, and so on.
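The octave arithmetic above can be checked with a few lines of Python: pitch distance in octaves is the base-2 logarithm of the frequency ratio, so adding octaves multiplies frequencies. The example frequencies (200Hz, 500Hz, and 60Hz) come from this page; the function names are illustrative.

```python
import math


def raise_by_octaves(freq_hz: float, octaves: int) -> float:
    """Frequency after raising the pitch by a whole number of octaves (x2 per octave)."""
    return freq_hz * 2 ** octaves


def octaves_between(f1: float, f2: float) -> float:
    """Pitch distance in octaves: the base-2 logarithm of the frequency ratio."""
    return math.log2(f2 / f1)


# Same pitch step (one octave), different Hz steps:
for f in (200, 500):
    up = raise_by_octaves(f, 1)
    print(f"{f}Hz -> {up}Hz: one octave adds {up - f}Hz")

# Multiplication in frequency corresponds to addition in pitch:
top = raise_by_octaves(60, 6)  # 60 x 2^6 = 3840Hz
print(top, octaves_between(60, top))
```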

The figure, below, illustrates the frequency / pitch relationship. The same pitch interval (e.g. octave) corresponds to an increasingly larger frequency distance (after Campbell & Greated, 2001)

For a detailed overview of the frequency / pitch / music notation relationship see Chapter 2 in Newman, 2023 (source).

The frequencies corresponding to musical pitches/notes and to pitch/note intervals (interval: pitch distance)
increase exponentially with pitch; equivalently, pitch relates logarithmically to frequency. Per the current standard, pitch A4 corresponds to 440Hz.

Pitch & Intensity

The pitch of pure tones also depends on intensity (see the figures to the left).

In general, increasing the intensity of pure tones:
  a) decreases the pitch of low frequencies (<~300Hz)
  b) increases the pitch of high frequencies (>~3000Hz)
       and
  c) has no noticeable effect at middle frequencies.

In the figure to the left:
   (a) A pure tone of frequency 98Hz has
         a pitch of G₂ when quiet (ppp), and
         a pitch lower than E₂ when loud (fff).
   (b), (c), and (d) show the influence of intensity
         on the pitch for pure tones with frequencies
         392Hz, 784Hz, and 3136Hz respectively.
         (in Campbell & Greated, 1987; derived
          from Stevens & Davies, 1939).

In addition, the introduction of a high-intensity "interference" tone will change the perceived pitch of existing low-intensity tones, even if the frequency of the low-intensity tones remains unchanged.

The pitch of a low-level tone will
_ rise following the introduction of a high-intensity tone at a much lower frequency and
_ drop following the introduction of a high-intensity tone at a much higher frequency.

In other words, the high intensity tone pushes the pitch of low intensity tones away from it, assuming frequency separations beyond one critical bandwidth.
(What will happen if the intense and weak tones both fall within the same critical band?)

The figure to the left offers another, simplified illustration of the average dependence of the pitch of pure tones on intensity.
In summary:
    Increasing the intensity will
    _ lower the pitch of a low frequency tone
    _ raise the pitch of a high frequency tone
    _ have no noticeable impact on the pitch of a
       middle frequency tone

Pitch & Duration

Pitch also depends on duration. A tone must last more than a minimum amount of time (~10-60ms or 3-100 cycles, depending on frequency and intensity) in order to sound like more than a 'click' and convey a clear sense of pitch (see below).
We will return to this during the module on Timbre.

Pitch of Complex Tones

Pitch & Spectrum - Analytic & Synthetic Listening
Pitch Theories

The dependence of the pitch of periodic complex tones (i.e. of complex tones with harmonic spectra) on frequency, intensity, duration, and the introduction of intense 'interference' tones is qualitatively similar to that of pure tones but more complex, due to the variety of frequencies involved.
The pitch JND for complex tones may be larger or smaller than that for pure tones, depending on spectral context.

Pitch & Spectrum

Pitch of Complex Tones with Harmonic Spectra

Reminder: Periodic/harmonic complex tones are usually perceived as a single unit, with their multiple spectral components merging into a single tone perception rather than being perceived as a set of individual pure tones.

The pitch of periodic complex tones with harmonic spectra (i.e. spectra whose components have frequencies that are integer multiples of the frequency of the lowest component, called 'fundamental') matches (in general) the frequency of the fundamental component. This is the case regardless of whether or not the fundamental component is perceivable (i.e. even when it is too low in level and/or masked) and even if it is not physically present in the tone's spectrum at all.

Phenomenon of the missing fundamental or "virtual pitch": the pitch of a complex harmonic tone matches the frequency of its fundamental spectral component, even if this and several additional low-frequency components are missing from the tone's spectrum.
The phenomenon of the pitch of the missing fundamental or "virtual pitch" provides evidence that pitch depends not only on frequency, intensity, and duration but also on spectral distribution. (The phenomenon challenges 'tonotopic' theories of pitch perception - see below.)

Listen to a pair of complex tones illustrating the phenomenon of the missing fundamental.
Both have a fundamental of 300Hz and up to 15 components (ramp spectrum: A_n = A₁/n for all components). The first tone has all 15 components while the second is missing the first 5. As you can hear, removing these first 5 components does not change the tone's pitch, which continues to match that of 300Hz.
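One way to see why the pitch survives is to check the waveform's repetition period. The sketch below builds both demonstration tones as sums of sines with ramp amplitudes (A_n = 1/n, taking A₁ = 1), evaluates them at an assumed 48kHz sample rate, and tests whether each waveform repeats after a candidate period. Removing harmonics 1-5 leaves the lowest component at 1800Hz, yet the waveform still repeats every 1/300s, and not every 1/600s or 1/1800s. This is only a signal-level check, not a model of hearing.

```python
import math

F0 = 300.0   # fundamental of the demonstration tones (Hz)
N_MAX = 15   # highest harmonic number
FS = 48000   # assumed sample rate for this sketch


def ramp_tone(t: float, first_harmonic: int) -> float:
    """Sum of harmonics first_harmonic..N_MAX of F0 with ramp amplitudes A_n = 1/n."""
    return sum(math.sin(2 * math.pi * n * F0 * t) / n
               for n in range(first_harmonic, N_MAX + 1))


def repeats_every(period_s: float, first_harmonic: int, n_samples: int = 480) -> bool:
    """True if the waveform is (numerically) unchanged one candidate period later."""
    return all(
        abs(ramp_tone(k / FS, first_harmonic) - ramp_tone(k / FS + period_s, first_harmonic)) < 1e-9
        for k in range(n_samples)
    )


# All 15 harmonics vs. harmonics 6-15 (first 5 removed):
print(repeats_every(1 / F0, 1), repeats_every(1 / F0, 6))
```

Both checks printed should come out True.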

Frequency region most significant to pitch (depending on fundamental frequency and spectral & temporal context)

For complex tones with a fundamental of 100Hz, the pitch sensation is impacted after removing the first ~15-20 components and starts deteriorating after removing more than the first ~25 components.
For fundamentals of 500Hz and 800Hz, the same observations occur after removing the first ~8-10 and ~4-6 components respectively.
      Listen to this example. It includes 13 harmonic complex tones, with a fundamental of 600Hz, played in succession.
      The first tone has all harmonic components from the 1st to the 16th and each successive tone drops one component,
      starting from the fundamental, until only the highest 3 components remain. What happens to the pitch?


Experiments examining the effect of mistuning some of a harmonic tone's components on the resulting pitch have determined that the frequency region most important to pitch is between ~400 & ~1500Hz. That is, mistuning or removing components lying within this region has the most effect on pitch, while the presence of frequency components within this region results in the clearest pitch sensations.

Perceiving a complex signal's individual components

While harmonic complex signals are perceived as a unit, rather than as a set of multiple pure tones, the individual harmonics can be heard if we draw attention to them by removing and re-introducing them (example in Houtsma et al., 1987; original experiments by von Helmholtz in the mid-1850s).

Tuva throat singers (peoples of Tibet and the Siberian grasslands) exploit this phenomenon to create musical passages where one singer appears to produce two pitches simultaneously: one acting as a fixed drone and one performing a sort of melody. These passages include:
     a) the single fundamental frequency produced, acting as a background drone, and
     b) harmonic components associated with this fundamental, selectively accentuated by the performer,
         acting as the melody line. [For more information and video/audio examples see here and here.]

Pitch of Complex Tones with Inharmonic Spectra

Inharmonic/non-periodic complex tones, whose frequency components are not integer multiples of a 'fundamental' component, do not elicit a clear, unique pitch sensation. Depending on their spectral distribution, inharmonic complex tones may
     a) elicit multiple competing pitch sensations
     b) resemble chords, or
     c) sound like noise.

As illustrated below, however, changing the spectral peaks of inharmonic complex tones without changing the frequencies of their sinusoidal components can result in a distinguishable change in pitch, matching the frequency change of the spectral peaks. This highlights the perceptual salience of changing vs. static stimuli and provides additional evidence of the dependence of pitch on spectral distribution.

Listen to three individual major chords (C, E, & G), performed by combinations of flute, clarinet, and oboe:
Clarinet-Flute-Oboe             Flute-Clarinet-Oboe            Oboe-Clarinet-Flute

Now, listen to this "chord melody" example, consisting of a five-chord sequence, each of which is one of the three chords, above. All 3 chords are major, include exactly the same notes (C5, E5, and G5), and have inharmonic spectra (i.e. their spectral components are not integer multiples of a single fundamental, even though the spectra of the individual notes in the chords are themselves harmonic).
The frequencies of the components are identical among chords but the spectral envelopes (i.e. relative intensities of the components) are different and depend on which instrument plays what note (a flute, a clarinet, or an oboe).

The melody you hear (C-E-G-E-C) tracks the position of the flute in each successive chord, because the flute's fundamental frequency provides the spectral peak for each chord's spectrum. In other words, the melody you hear matches the pitch changes corresponding to the changes in the flute's fundamental frequency.
Watch the video, below.

A similar effect can also be produced through spectral shaping of noise bands. Pitch perception will track changes in the spectral peak of the noise.

Virtual pitch of slightly inharmonic spectra, whose fundamental or additional low-frequency components are missing:

Assume the 3-component harmonic spectrum 800+1000+1200Hz (components 4, 5, and 6 of a missing 200Hz fundamental) and a slightly inharmonic version of the same spectrum: 798+1002+1204Hz. Assume equal amplitudes for all components.

_ The virtual pitch of the harmonic spectrum will correspond to 200Hz.

_ The virtual pitch of the inharmonic spectrum will correspond, approximately, to the average of the three implied fundamentals: (798/4 + 1002/5 + 1204/6)/3 = (199.5+200.4+200.7)/3 = 600.6/3 = 200.2Hz.
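The averaging rule used above is easy to script. This sketch assumes, as the example does, equal amplitudes (so each implied fundamental f/n gets equal weight) and that the harmonic numbers 4, 5, and 6 are known in advance; it is a back-of-the-envelope estimate, not a pitch model.

```python
def implied_fundamentals(components_hz, harmonic_numbers):
    """Fundamental implied by each component on its own: f / n."""
    return [f / n for f, n in zip(components_hz, harmonic_numbers)]


def averaged_virtual_pitch(components_hz, harmonic_numbers):
    """Equal-weight average of the implied fundamentals (assumes equal amplitudes)."""
    implied = implied_fundamentals(components_hz, harmonic_numbers)
    return sum(implied) / len(implied)


HARMONIC_NUMBERS = [4, 5, 6]  # components 4, 5, 6 of a missing 200Hz fundamental
harmonic = averaged_virtual_pitch([800, 1000, 1200], HARMONIC_NUMBERS)
inharmonic = averaged_virtual_pitch([798, 1002, 1204], HARMONIC_NUMBERS)
print(f"harmonic: {harmonic:.1f}Hz, inharmonic: {inharmonic:.1f}Hz")
```

Without the intermediate rounding, the inharmonic case comes to ≈200.19Hz, which still prints as 200.2Hz.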

NOTE: Minor deviations from harmonic spectra (up to ~1-2% of frequency) and the way these interact when, for example, several instruments perform together in unison, change the timbre of the combined sound rather than its pitch. They produce what is referred to as 'chorus effect': richness of ensemble sound due to slow and varying beating rates among the slightly detuned components of the complex tones involved.

Analytic & Synthetic Listening

In this example (adapted from Smoorenburg, 1970), two complex tones, each with 2 components, are presented in succession.
(a) 800Hz + 1000Hz (b) 750Hz + 1000Hz.   When moving from (a) to (b):

Some listeners hear the pitch going down. They follow the motion of the first component in each tone
(800Hz → 750Hz: ~1 semitone drop).
Explicit rules are employed to track the physical attributes of the two tones and determine the pitch motion.
This is considered an example of analytic listening.

Other listeners hear the pitch going up, by reconstructing the motion of the (missing) fundamental implied by the two complex tones
(200Hz → 250Hz: ~4 semitones, or a major third rise).
Implicit rules are employed to synthesize a physical attribute that is implied by the rest of each tone's attributes, helping determine pitch motion. This is considered an example of synthetic listening.

Let's repeat the experiment, now with two complex tones, each with 5 components, presented in succession:
(a) 800Hz+1000+1200+1400+1600 (b) 750Hz+1000+1250+1500+1750.
In this case, most listeners only hear the pitch going up by a major 3rd, perceiving the motion of the (missing) fundamental (200Hz → 250Hz). (Why?)
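The two competing cues in this example can be computed directly. In the sketch below, the 'analytic' cue follows the lowest physical component, while the 'synthetic' cue follows the implied fundamental, found here as the greatest common divisor of the (integer-Hz) component frequencies. The GCD shortcut is an assumption that only works for exactly harmonic, whole-number frequencies; listeners are far more tolerant of mistuning.

```python
import math
from functools import reduce


def implied_fundamental(components_hz):
    """Largest frequency that divides every component exactly (integer-Hz GCD)."""
    return reduce(math.gcd, components_hz)


def semitones(f1, f2):
    """Signed interval from f1 to f2 in equal-tempered semitones."""
    return 12 * math.log2(f2 / f1)


STIMULI = {
    "2 components": ([800, 1000], [750, 1000]),
    "5 components": ([800, 1000, 1200, 1400, 1600], [750, 1000, 1250, 1500, 1750]),
}

for label, (tone_a, tone_b) in STIMULI.items():
    analytic = semitones(min(tone_a), min(tone_b))  # follow the lowest component
    synthetic = semitones(implied_fundamental(tone_a), implied_fundamental(tone_b))  # follow the implied fundamental
    print(f"{label}: analytic {analytic:+.2f} st, synthetic {synthetic:+.2f} st")
```

The arithmetic is identical for the two- and five-component pairs (about -1.1 vs. about +3.9 semitones); the difference lies in which cue listeners end up following.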

------------------------------------------------------------------

In Dannenbring's (1974) demonstration:
     (masking noise bursts filling tone gaps in a steady or frequency modulated tone - from our Hearing Module)

Listeners synthesize the sensation in a form of synthetic or holistic listening, based largely on implicit rules (rules we are not explicitly aware of).

If listeners are alerted to the fact that the presented tones include gaps, they may be able to perceive the gaps by directing their attention to separate portions of the total stimulus. This is a form of analytic or 'directed' listening, based largely on explicit rules (rules we are explicitly aware of).

------------------------------------------------------------------

The McGurk effect is an auditory illusion caused by ambiguous audio/visual cues. It illustrates that perceptions are often linked to an experience-guided synthesis of information from our environment, offering another example of synthetic listening.

The McGurk effect indicates that our responses to stimuli do not necessarily match the stimuli. Rather, they reflect the best way such stimuli fit to previous experience, concurrent context, and the associated expectations.

When staring at the speaker's lips, the effect is so robust that, even when alerted to the a/v stimuli conflict, listeners/viewers are unable to listen to the a/v composite analytically. Listening with eyes closed presents no ambiguity, resulting in a different sonic perception. Watch this BBC2 clip.

PITCH THEORIES
(refer to the optional section on Basilar Membrane Encoding, Module 3)

Place (Tonotopic) Theory

Pitch relates directly to the point of stimulation on the Basilar Membrane (BM).
Different frequencies resonate at different locations on the BM (from the Hearing Module).
Frequency is encoded as pitch by the inner hair cells corresponding to the BM portion resonating for that frequency.

Watch these (simplified & exaggerated) animations of the basilar membrane motion in response to:
1000Hz,    8000Hz,    1000Hz+8000Hz, & a range of frequencies (.mov files).
Read this brief outline of the place theory.

The theory was proposed by Ohm (1843), developed by Helmholtz (1862), and confirmed experimentally by von Békésy (1950s), who won the 1961 Nobel Prize in Physiology or Medicine for his contributions to the understanding of hearing.

Pros

_ Applies to all frequencies.
_ Explains pathological conditions:
   diplacusis: different pitch sensations per ear for the same frequency, due to BM structural differences between ears;
   presbycusis: pitch shift with age for the same frequency, due to BM hardening with age.

Cons

_ Cannot explain the coarseness of JND or the observed relationship of perceived pitch to intensity.
_ Unless it is modified, it cannot reliably explain two phenomena associated with the pitch of complex tones:

The pitch of the missing fundamental.

The place theory claims that, even when missing, the fundamental frequency and other low-frequency components are re-introduced as "difference" frequencies: distortion products arising from the interaction among the existing components in the tone's harmonic spectrum (the difference frequencies between upper harmonics equal the frequencies of lower harmonics).
The difference frequencies then excite the BM locations that correspond to the fundamental and other low-frequency components.
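The difference-frequency argument is simple arithmetic. For the upper harmonics used in the missing-fundamental demonstration (harmonics 6-15 of 300Hz), every pairwise difference is a multiple of 300Hz, and the smallest one equals the missing fundamental. The sketch only lists first-order differences |f_i - f_j|; it does not model how strong the actual distortion products would be.

```python
from itertools import combinations


def difference_frequencies(components_hz):
    """Distinct first-order difference frequencies |f_i - f_j| between all component pairs."""
    return sorted({abs(a - b) for a, b in combinations(components_hz, 2)})


upper_harmonics = [300 * n for n in range(6, 16)]  # 1800 ... 4500Hz; fundamental absent
diffs = difference_frequencies(upper_harmonics)
print(diffs)  # every difference is a multiple of 300Hz; the smallest is 300Hz
```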

This claim is challenged by several observations, including the pitch-shift effects (see below). These effects refer to instances where spectral modifications result in the pitch changing towards a direction that appears to be opposite to that predicted by the place theory of pitch and the "difference frequency" hypothesis.


The persistence of the missing fundamental phenomenon even when the first 8 or more components are removed.

For most tones, only the first 5-8 components excite separate critical bands and can therefore elicit a unique pitch sensation. Beyond that, 2 or more components lie in the same critical band and cannot be differentiated based on a place theory of pitch. Therefore, the place theory of pitch requires energy at the BM locations corresponding to the lower components for a pitch sensation to register.
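A rough way to see where the 'first 5-8 components' figure comes from is to compare harmonic spacing with auditory filter bandwidth. The sketch below swaps in the Glasberg & Moore (1990) equivalent rectangular bandwidth, ERB(f) = 24.7·(4.37·f/1000 + 1) Hz, as a stand-in for the critical band, and treats harmonic n as resolved if the harmonic spacing (f₀) is at least one ERB at n·f₀. Both the ERB substitution and the resolvability criterion are simplifying assumptions, so the counts are only indicative.

```python
def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)


def resolved_harmonics(f0_hz: float, max_n: int = 40) -> int:
    """Count harmonics 1, 2, ... whose spacing (f0) is at least one ERB wide at their frequency.

    Simplified criterion assumed for illustration: harmonic n counts as resolved if f0 >= ERB(n * f0).
    """
    n = 0
    while n < max_n and f0_hz >= erb_hz((n + 1) * f0_hz):
        n += 1
    return n


for f0 in (100, 200, 500):
    print(f"f0 = {f0}Hz: ~{resolved_harmonics(f0)} resolved harmonics")
```

For fundamentals of 100, 200, and 500Hz this criterion gives 6, 8, and 8 resolved harmonics, roughly in line with the range quoted above.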

Temporal (Periodicity/Frequency) Theory

Periodicity/Frequency theories of pitch rely on time information, as represented in the signal itself or in its conversion to neural electrical spikes following inner hair cell activity, to encode both frequency and intensity.

The volley theory of pitch was introduced in the late 1940s by American psychologist E.G. Wever.
It explains pitch by combining information from a large number of inner hair cells associated with a specific place on the basilar membrane, obtained via two mechanisms:
     a) phase locking (neurons firing at a single point in the vibration cycle) and
     b) neural firing synchrony (multiple neurons firing in sync with each other).
[Select the "Illustration" tab on this page for an interactive animation that describes the temporal theory.]

Whether based on the periodicity of the sound signal itself or the periodicity of the sound signal's representation in the auditory nerve (e.g. periodicity of neural spike activity), temporal models of pitch assume that the hearing mechanism extracts a signal's periodicity by computing the signal's autocorrelation function.

Qualitatively, autocorrelation provides a measure of how well a signal matches a time-shifted version of itself, as a function of the amount of time shift (see to the right).
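The autocorrelation idea can be sketched in a few lines of pure Python: compute how well the signal matches itself at each candidate lag, pick the lag with the strongest match within a plausible pitch range, and convert it to a frequency. The sample rate (24kHz, chosen so a 300Hz period is exactly 80 samples), the 100-1000Hz search range, and the function names are assumptions for illustration; this works on the waveform and does not simulate neural spike trains.

```python
import math

FS = 24000  # assumed sample rate: a 300Hz period is exactly 80 samples


def harmonic_tone(f0, harmonics, duration_s=0.1, fs=FS):
    """Sum of sine harmonics of f0 with ramp amplitudes 1/n."""
    n_samples = int(duration_s * fs)
    return [sum(math.sin(2 * math.pi * n * f0 * i / fs) / n for n in harmonics)
            for i in range(n_samples)]


def autocorrelation(x, lag):
    """Unnormalised autocorrelation: sum of x[i] * x[i + lag]."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def estimate_f0(x, fs=FS, f_min=100.0, f_max=1000.0):
    """Return fs / lag for the lag (within the search range) where x best matches itself."""
    lags = range(int(fs / f_max), int(fs / f_min) + 1)
    best_lag = max(lags, key=lambda lag: autocorrelation(x, lag))
    return fs / best_lag


full = harmonic_tone(300, range(1, 16))    # harmonics 1-15
no_low = harmonic_tone(300, range(6, 16))  # harmonics 6-15 (first 5 removed)
est_full, est_no_low = estimate_f0(full), estimate_f0(no_low)
print(est_full, est_no_low)
```

Applied to tones like those in the missing-fundamental demonstration (all 15 harmonics vs. harmonics 6-15 of 300Hz), the estimator returns 300Hz in both cases, since the waveform's self-similarity peaks at the same 1/300s lag whether or not the fundamental is present. The unnormalised sum slightly favours shorter lags, which here conveniently picks the fundamental period over its multiples.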

Pros

_ May explain why pitch perception becomes increasingly coarse past 5kHz.

_ May explain the perception of the missing fundamental.

_ Efficiently captures pitch and loudness information based on neural firing patterns and densities.

Cons

_ Cannot readily explain pathological conditions (diplacusis / presbycusis), which can be traced to
   the BM and have physiologically-based pitch perception implications.

_ Cannot readily explain the pitch shift effects.

_ Appears to break down far above 5kHz.

Hybrid Theories

Contemporary hybrid theories of pitch incorporate aspects of both place and periodicity theories (Plomp, 1964; Goldstein, 1973; Terhardt, 1974; Pierce, 1990). Terhardt's theory of 'virtual pitch' introduces 'previous learning' as an additional, novel component that can be developed based on either temporal or place pitch cues. It supports the development of "pitch templates" of BM disturbance patterns and neural firing patterns. These are used to support pitch decisions, including in ambiguous contexts (spectra with missing components, mistuned spectra, etc.).

In general, it appears that the most salient/prominent components of a signal's spectrum (in terms of intensity level and frequency separation) are the most important carriers of pitch information.

In the presence of multiple complex tones, the components of each complex tone are perceptually linked together into a single percept, while each complex tone can be distinguished from the others. This appears to be due to each particular complex tone's spectral jitter (i.e. fast amplitude and frequency micro-variations that are almost synchronized among the components of each individual complex tone).

The Octave - Multidimensionality of Pitch: Pitch Height & Pitch Chroma

The Octave (use this virtual piano to help you with the concepts in this section)
Pitch chroma: Term describing the distinctive quality of a given tone that a) links it to tones one or more octaves away and b) separates it from the rest of the tones within an octave. Pitch chroma describes the perceptual sameness of pitches separated by one or more full octaves and the perceptual 'differences'/'distances' among different pitches within an octave. It is reflected in the fact that the different note names (e.g. ... C, D, E, F, G, A, B, C, D ...) repeat periodically for every 2/1 increase in frequency (i.e. every octave), with the addition of a subscript (e.g. C₄) to indicate how high or low this pitch is relative to some reference pitch. In other words, a numeric subscript difference between two notes that share the same pitch chroma (e.g. C₄ vs. C₅) reflects a pitch height difference of one or more octaves between those notes.

Pitch height: Term describing the perceptual 'highness' or 'lowness' of a pitch, related mainly to frequency. The intervals A₃ (220Hz) - A₄ (440Hz) and A₄ (440Hz) - A₅ (880Hz) are both octaves. The three notes (A₃, A₄, A₅) have the same pitch chroma but different pitch heights. In terms of their pitch height, octaves are linearly equidistant perceptually but log-equidistant in Hz.

Notes represented by different names differ in both pitch chroma and pitch height. Within a single octave, all notes differ in both pitch chroma and pitch height, with the pitch rising in accordance with the note name (C₃ is lower than D₃, which is lower than E₃, etc.). [Why does standard scale naming start from C and not from A? See here.]
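The chroma/height split can be expressed numerically. Assuming equal temperament with A₄ = 440Hz and the usual convention that octave numbers change at C, the sketch below rounds a frequency to the nearest semitone, then separates that semitone count into a pitch-class name (chroma) and an octave number (height). The names and the rounding step are illustrative choices.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_and_height(freq_hz: float, a4_hz: float = 440.0):
    """Return (pitch-class name, octave number) of the nearest equal-tempered note."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    semitones_from_c0 = semitones_from_a4 + 57  # A4 lies 4 octaves + 9 semitones above C0
    octave, pitch_class = divmod(semitones_from_c0, 12)
    return NOTE_NAMES[pitch_class], octave


for f in (220, 440, 880, 261.63):
    print(f, chroma_and_height(f))
```

A₃, A₄, and A₅ (220, 440, 880Hz) come out with the same chroma, 'A', and octave numbers 3, 4, and 5.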
Multidimensionality of Pitch (pitch spiral)

Pitch is multidimensional, with at least three dimensions involving a) pitch height (one dimension: frequency) and b) pitch chroma (two dimensions, separating pitches within an octave and linking pitches one or more octaves apart).

Note: A variable is uni-dimensional if all its values can fit on a single straight line. E.g.: If A, B, & C represent 3 values of a variable as points in Euclidean (i.e. regular three-dimensional) space, with B being the middle value, then:
a) if |AB|+|BC|=|AC|, the variable is uni-dimensional;
b) if |AB|+|BC|≠|AC|, the variable is multidimensional
[analogous to the linear/nonlinear distinction].

The Western chromatic scale breaks the octave down into 12 different pitch chromas. The pitch chroma dimensions represent a circularity in pitch perception. Due to this circularity, within a single octave, the octave interval represents the largest physical distance in terms of pitch height but the smallest perceptual distance in terms of pitch chroma. Pitch perception therefore wraps on the octave, with scales defining sets of different pitch chromas that repeat at different pitch heights for each new octave. This is best represented by a pitch spiral (versus a pitch scale).

The perceptual circularity of pitch is explored in Shepard-tone scales/slides (after 20th-century American cognitive scientist and author Roger Shepard) that present the paradox of a continuously ascending (or descending) pitch. Listen to two pitch spiral examples (Houtsma et al., 1987). Shepard scales/slides are the auditory analog of the continuously ascending/descending staircases explored conceptually by English mathematician Sir Roger Penrose and artistically by 20th-century Dutch graphic artist M.C. Escher (see the images below).
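The pitch spiral can be sketched as a helix: one full turn per octave for chroma, plus a steady rise per octave for height. The radius and rise below are arbitrary assumptions (both 1), and C₄ ≈ 261.63Hz is used as the starting point. With these settings, C4, F#4 (a tritone, half an octave up), and C5 violate |AB|+|BC|=|AC|, and C4 ends up closer to C5 than to F#4, which mirrors the circularity described above.

```python
import math

C4_HZ = 261.63  # approximate C4 under A4 = 440Hz


def helix_point(freq_hz, ref_hz=C4_HZ, radius=1.0, rise_per_octave=1.0):
    """Map a frequency onto a pitch helix: angle encodes chroma, height encodes octaves."""
    octaves = math.log2(freq_hz / ref_hz)
    angle = 2 * math.pi * octaves  # one full turn per octave
    return (radius * math.cos(angle), radius * math.sin(angle), rise_per_octave * octaves)


c4 = helix_point(C4_HZ)
f_sharp4 = helix_point(C4_HZ * 2 ** 0.5)  # a tritone (half an octave) above C4
c5 = helix_point(C4_HZ * 2)

ab, bc, ac = math.dist(c4, f_sharp4), math.dist(f_sharp4, c5), math.dist(c4, c5)
print(f"|AB|+|BC| = {ab + bc:.3f}, |AC| = {ac:.3f}")
```

Changing the radius or rise changes the numbers but not the conclusion: on any helix, the octave endpoints are separated only by the rise, while the tritone also spans the full diameter of the chroma circle.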
Relativity (Escher, 1953)     Ascending and Descending (Escher, 1960)
Short video on pitch and a/v related audio illusions

[OPTIONAL SECTIONS]
(also refer to the optional section on Basilar Membrane Encoding, Module 3)

Place Theory of Pitch - Drawbacks

The Pitch-Shift Effects

The first pitch shift effect:
Shifting all components of a harmonic complex tone by a value Δƒ results in a shift of the perceived pitch by a value ΔP (two top graphs in the image).
This is so, despite the fact that the frequency spacing between the components (and therefore the difference frequency among successive components of the complex tone) remains unchanged and equal to ƒ₀.

The second pitch shift effect:
Increasing the spacing of the components of a harmonic complex tone by dƒ, while keeping the frequency of the center component n the same, results in a drop in pitch P'< P (bottom graph in the image).
This is so, despite the fact that the difference frequency has increased from ƒ₀ (solid lines) to ƒ₀+dƒ (broken lines).

Vassilakis (1998) showed that these two effects are not distinct but alternative manifestations of a single phenomenon. In a follow-up work, he argued that the pitch shift effect reflects our perceptual system's handling of the interaction between the phase and group velocities of the inharmonic tone complexes used in pitch shift experiments (Vassilakis, 1998b).

The artificial spectra created in explorations of the two pitch shift effects are slightly inharmonic but not inharmonic enough for the pitch sensation to deteriorate.
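As a rough numerical illustration (not the analysis in the works cited above), the sketch below reuses the averaging rule from the 798+1002+1204Hz example earlier on this page: average the implied fundamentals f/n of the components. The example values (harmonics 6-8 of 200Hz, Δƒ = 40Hz, dƒ = 20Hz) are arbitrary assumptions. In both cases the estimate departs from the difference frequency.

```python
def averaged_implied_fundamental(components_hz, harmonic_numbers):
    """Average of f/n over components (the rule of thumb used for the 798+1002+1204Hz example)."""
    return sum(f / n for f, n in zip(components_hz, harmonic_numbers)) / len(components_hz)


def spacings(components_hz):
    """Distinct spacings between neighbouring components (the difference frequencies)."""
    return sorted({b - a for a, b in zip(components_hz, components_hz[1:])})


F0 = 200.0
NS = [6, 7, 8]  # harmonic numbers; component n = 7 is the centre

# First effect: shift every component up by delta_f = 40Hz.
shifted = [n * F0 + 40.0 for n in NS]                   # 1240, 1440, 1640 Hz
# Second effect: widen the spacing by d_f = 20Hz, keeping the centre (7 * F0) fixed.
widened = [7 * F0 + (n - 7) * (F0 + 20.0) for n in NS]  # 1180, 1400, 1620 Hz

for label, comps in (("shifted", shifted), ("widened", widened)):
    print(label, spacings(comps), round(averaged_implied_fundamental(comps, NS), 2))
```

The shifted complex keeps a 200Hz spacing but its estimate rises to about 205.8Hz; the widened complex has a larger 220Hz spacing but its estimate drops slightly below 200Hz, matching the directions described above.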

Unresolved Upper Harmonics

The place (or tonotopic) theory of pitch requires energy in the lowest 5-8 components (depending on fundamental frequency) in order to produce a clear pitch sensation.
These low components are usually the best resolved by the basilar membrane (i.e., they lie within separate critical bands) and can therefore provide clear place-related pitch information and/or produce strong intermodulation distortion products.
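The resolved/unresolved distinction can be illustrated with a rough calculation. This sketch assumes the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formula, ERB = 24.7(4.37F + 1) with F in kHz, as a stand-in for critical-band width, and treats a harmonic as resolved when the harmonic spacing (ƒ₀) exceeds the ERB at that harmonic's frequency. Both the formula choice and the criterion are assumptions; published criteria vary.

```python
def erb_hz(freq_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)

def resolved_harmonics(f0_hz, max_harmonic=30):
    """Harmonic numbers whose spacing (f0) exceeds the ERB at their own frequency."""
    return [n for n in range(1, max_harmonic + 1) if f0_hz > erb_hz(n * f0_hz)]

for f0 in (100, 200):
    print(f0, resolved_harmonics(f0))
# 100 [1, 2, 3, 4, 5, 6]
# 200 [1, 2, 3, 4, 5, 6, 7, 8]
```

Under these assumptions, the resolved range grows slightly with fundamental frequency, in line with the 5-8 components quoted above.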

The problem is that clear virtual pitch sensations persist even when the lowest components are absent and the remaining components in a spectrum are not resolvable (see the 'unresolved upper harmonics' in the image to the left).

The above observations led to the "Residue" theory of pitch, a temporal theory developed first by Dutch physicist J.F. Schouten (1938). It states that pitch is determined by the temporal interaction, at a neural level, of the unresolved, 'residual' upper harmonics in a spectrum. This theory is challenged by the experimentally determined frequency region most salient to pitch (~400-1500Hz), which typically contains the lower, resolved harmonics rather than the unresolved upper ones.
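A temporal mechanism can recover the fundamental from upper harmonics alone. The sketch below swaps in a short-time autocorrelation, a later and widely used temporal model rather than Schouten's own formulation. It sums harmonics 10 to 14 of a 200 Hz fundamental (2000-2800 Hz, high enough to be poorly resolved), with no energy at 200 Hz, and picks the lag of the largest autocorrelation peak within an assumed 50-1000 Hz search range. The sample rate, signal length, and function names are assumptions.

```python
import math

def harmonic_complex(f0, harmonics, sr, n_samples):
    """Sum of equal-amplitude sine harmonics of f0 (no component at f0 itself)."""
    return [sum(math.sin(2 * math.pi * n * f0 * t / sr) for n in harmonics)
            for t in range(n_samples)]

def autocorr_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Pitch from the lag of the largest autocorrelation value in [1/fmax, 1/fmin]."""
    min_lag, max_lag = int(sr / fmax), int(sr / fmin)

    def r(lag):
        # Unnormalised autocorrelation at one lag (products of delayed copies).
        return sum(x[t] * x[t + lag] for t in range(len(x) - lag))

    best = max(range(min_lag, max_lag + 1), key=r)
    return sr / best

sr = 20000
x = harmonic_complex(200, range(10, 15), sr, 1000)  # 50 ms of harmonics 10-14
print(autocorr_pitch(x, sr))  # 200.0: the missing fundamental's period (5 ms)
```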

Temporal Coding: Phase Locking and Rectification

The first systematic temporal (periodicity) theory of pitch was proposed by German scientist A. Seebeck (1843), developed by British physiologist W. Rutherford (1886), worked out by Schouten (1938), and has been reformulated several times since (e.g., volley theory of pitch by Wever, 1949).
Wever (main figure in the 'periodicity theory' camp) and von Békésy (main figure in the 'place theory' camp) were close friends.

Neural response (neural firing) follows (or appears to be locked to) the positive peaks in the stimulus, firing only when the stereocilia are sheared in one direction.

This results in the neural signals of sinusoidal inputs
i) having a pulse-like shape with a repetition period equal to the input's period (i.e., they represent frequency in the time domain) and
ii) being rectified (i.e., they are restricted to the positive half of the signal's time-amplitude graph).
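Points i) and ii) can be sketched with an idealized model. Assumptions, none from the original text: the inner hair cell acts as a half-wave rectifier, and the neuron "fires" once each time the rectified signal rises through a fixed threshold; the 0.5 threshold, the sample rate, and the 250 Hz test tone are arbitrary choices. The spikes occur at the same phase of every cycle, so the inter-spike interval equals the input's period.

```python
import math

def half_wave_rectify(x):
    """Idealized transduction: keep only the positive half of each cycle."""
    return [max(0.0, v) for v in x]

def phase_locked_spikes(x, sr, threshold=0.5):
    """Spike times (s): one spike each time the rectified signal rises through threshold."""
    return [t / sr for t in range(1, len(x)) if x[t - 1] < threshold <= x[t]]

sr, freq = 10000, 250
sine = [math.sin(2 * math.pi * freq * t / sr) for t in range(sr)]  # 1 s of a 250 Hz sine
rect = half_wave_rectify(sine)
spikes = phase_locked_spikes(rect, sr)
intervals = [b - a for a, b in zip(spikes, spikes[1:])]
print(len(spikes))  # 250: one pulse per cycle
```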

The process of phase locking is closely related to hearing's "temporal coding theory" of encoding frequency information:

The inner hair cells release neurotransmitters only when the basilar membrane moves in one direction, towards the scala media. They therefore respond only to the positive portions of incoming signals: their stereocilia open their ion channels when bent against the tectorial membrane (TM), mostly around a signal's positive peaks.

Temporal coding assumes that, thanks to phase locking, the auditory system encodes periodicity through the timing of neural firing: the intervals between firings correspond to the incoming signal's period (or some multiple of it) and, therefore, to its frequency.

Since individual neurons cannot fire fast enough to follow every cycle of high-frequency signals, more than one neuron must be involved in the process (the basis of Wever's volley theory). Each neuron fires on some of the peaks of an incoming signal and, when the outputs of all neurons are combined, the signal is represented to the brain in a manner similar to that shown in the bottom graph (left).
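The pooling idea can be sketched as an idealized volley model. Assumptions, none from the original text: each neuron fires on every k-th cycle, with k chosen so its rate stays below an assumed maximum of 300 spikes/s, and neurons are staggered so they respond on different cycles. The function names are hypothetical.

```python
import math

def volley_cycles(freq_hz, n_neurons, max_rate_hz, n_cycles):
    """Cycle indices at which each idealized neuron fires, plus the pooled set."""
    # Each neuron can follow at most every `skip`-th cycle without exceeding max_rate_hz.
    skip = math.ceil(freq_hz / max_rate_hz)
    # Neuron i starts on cycle (i mod skip), so staggered neurons cover different cycles.
    trains = [list(range(i % skip, n_cycles, skip)) for i in range(n_neurons)]
    pooled = sorted(set().union(*trains))
    return trains, pooled

def rate_from_intervals(cycle_indices, freq_hz):
    """Frequency implied by the mean interval (in cycles) between successive firings."""
    gaps = [b - a for a, b in zip(cycle_indices, cycle_indices[1:])]
    return freq_hz / (sum(gaps) / len(gaps))

trains, pooled = volley_cycles(freq_hz=1000, n_neurons=4, max_rate_hz=300, n_cycles=100)
print(rate_from_intervals(trains[0], 1000))  # 250.0: one neuron fires on every 4th cycle
print(rate_from_intervals(pooled, 1000))     # 1000.0: the pooled volley marks every cycle
```

A single neuron's intervals are a multiple of the period (here 4 periods), whereas the combined output marks every cycle of the 1000 Hz input.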

Perception of pitch relations - Unit of pitch

Mel: Pitch-height unit and scale devised by American psychologist S.S. Stevens (1937, 1940). It is based on 'twice as high' perceptual judgments. 'Twice as high' amounts to a larger musical interval at high registers than at low registers.

Reference: 1000 mels = pitch of 1000Hz presented 40dB above threshold.

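The Mel unit of pitch height is analogous to the Sone unit of loudness. The derivation of the Mel scale has been criticized for flawed methodology, and the scale is rarely used as a measure of musical pitch, although mel-spaced frequency scales remain common in speech and audio signal processing (e.g., mel-frequency cepstral coefficients).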
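The "twice as high" asymmetry can be illustrated numerically. This sketch assumes the analytic mel formula m = 2595·log10(1 + f/700), a later fit widely used in speech processing, rather than Stevens' original tabulated judgments; it also ignores the 40 dB presentation level in the reference definition.

```python
import math

def hz_to_mel(freq_hz):
    """Common analytic fit to the mel scale: about 1000 mel at 1000 Hz."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def twice_as_high_ratio(freq_hz):
    """Frequency ratio from freq_hz to the tone with double its mel value."""
    return mel_to_hz(2 * hz_to_mel(freq_hz)) / freq_hz

print(round(twice_as_high_ratio(200.0), 2))   # 2.29
print(round(twice_as_high_ratio(1000.0), 2))  # 3.43
```

Under this formula, the tone judged "twice as high" as 200 Hz lies about 1.2 octaves above it, while for 1000 Hz it lies about 1.8 octaves above, consistent with "twice as high" spanning a larger musical interval in higher registers.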

Further Resources

_ Concise and systematic historical review of pitch theories (Alain de Cheveigné, IRCAM, Paris, France).
_ Spectral/Place and Temporal theories of pitch perception (Centre for Music and Science; University of Cambridge).

_ Pitch Perception presentation (A.J. Oxenham, Harvard-MIT)
_ Neural Coding of Pitch presentation (B. Delgutte, Harvard-MIT)
_ Music perception, pitch, and the auditory system (J.H. McDermott & A.J. Oxenham, University of Minnesota)
_ Revisiting place and temporal theories of pitch (A.J. Oxenham)
_ Von Békésy and cochlear mechanics (E.S. Olson et al. - Columbia University)

Key to the 3-tone listening example:
DD (the tones have frequencies 12025Hz, 12000Hz, and 11975Hz, in this order)

Loyola Marymount University - School of Film & Television