Fundamentals of Sound
MODULE 6: TIMBRE

 

Perceptual attributes of acoustic waves

TIMBRE

   Definitions / Scope / Signal & Spectral Correlates
    Timbre & Spectral Energy Distribution
      Introduction
      Spectral Centroid: Nasality; Brightness  -  Formants: Sonic Identity
      Spectral Density & Inharmonicity: Beating; Roughness
      Sensory vs. Musical Consonance/Dissonance
      Combination Tones & Basilar Membrane Disturbance Patterns
   Timbre & Time Variance, Duration, and Time Separation

   Summary

   Cognitive Aspects of Timbre
   Optional Sections
      Musical Context & Timbre - Musique Concrète - Spectral Music

 

 


 

   

Perceptual attributes of acoustic waves - Timbre

Definitions / Scope / Signal & Spectral Correlates

 

The Importance of Studying Timbre (tone color - tone quality)

Timbre studies support the exploration of:
 

  • speech recognition, communication, and simulation;
     
  • the recognition/memory of sound objects and soundscapes;
    (soundscape: the totality of sounds heard in a musical performance/recording or, more generally, in a particular location or context)
     
  • perceptual differentiation among musical instruments, genres, styles, gestures, and sonic expressions of cultural identity;
     
  • timbral nuances as key means for the communication of musical expression;
     
  • issues involved in orchestration (sonic blend, contrast, etc.)
     
  • the key perceptual correlates of what sound designers and audio engineers broadly refer to as sound quality.

Timbre examination approaches:

  • Acoustical/Psychoacoustical:
    Timbre is examined in terms of its physical (signal & spectral features of sound waves) and physiological (function of the ear) correlates and their relationship.
     
  • Semantic/Cognitive/Aesthetic:
    Timbre is examined in terms of its function, meaning, value, and affective (i.e. emotional) qualities.

Definitions

According to ANSI, timbre is:

"that attribute of sensation in terms of which a listener can judge that two steady complex tones having the same loudness, pitch and duration are dissimilar." (American National Standards Institute, Acoustical Terminology, sec. 6.05; based on Plomp, 1970)

In this definition, the term timbre refers to a vague set of sonic qualities that permit listeners to differentiate between, for instance, the sounds of string vs. reed instruments in the orchestra, if they perform the same note with the same intensity and for the same duration.

Bregman (1990) points out that:

“[The ANSI definition] is, of course, no definition at all. For example, it implies that there are some sounds for which we cannot decide whether they possess the quality of timbre or not. [...] Either we must assert that only sounds with pitch can have timbre, meaning that we cannot discuss the timbre of a tambourine or of the musical sounds of many African cultures, or there is something terribly wrong with the definition. [...] The problem with timbre is that it is the name for an ill-defined wastebasket category. […] What we need is a better vocabulary concerning timbre."
[ Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound, p.92-93, MIT Press, Boston, MA ].

 
It is difficult to agree on a single, all-encompassing definition of timbre, a fact confirmed by the long list of timbre definitions available in the literature (see a partial list of definitions published up to 1997) and by its designation as an auditory "wastebasket." Hajda and collaborators propose two principal constituents of timbre perception:

(1) Timbre conveys the identity of the instrument that produced it; this constituent is nominal or categorical in nature: the clarinet has a character to its sound, regardless of pitch, loudness, etc.; this characteristic separates it from other instruments (e.g. trumpet) and does so not along any specific scale or in any specific order but just along perceptually delineated categories;

(2) Timbre represents a sonic palette or family of sonic palettes in which tones from different sources can be related along one or more perceptual dimensions; this constituent can be ordinal in nature: the clarinet and the oboe are both woodwinds (i.e. belong to the same category) but can be ranked/ordered along a "nasality" scale, where the clarinet is at the low end of the scale and the oboe is at the high end of the scale (more on "nasality" further below).

Hajda, J.M. et al., (1997). Methodological issues in timbre research. In I. Deliège & J. Sloboda (Eds.), Perception and Cognition of Music (pp. 253–306). Psychology Press/Erlbaum (UK) Taylor & Francis.
Also see: Siedenburg, K. and McAdams, S. (2017). "Four distinctions for the auditory "wastebasket" of timbre." Frontiers in Psychology, 8: 1747.

 
Timbre - Spectrum - Signal Envelope

Timbre is a multidimensional perceptual attribute of sound waves, related mainly to a complex wave's spectral distribution and secondarily to its signal envelope. Differences in these physical features manifest themselves perceptually in several ways (nasality, brightness, roughness, etc.; for a reminder, see our previous discussion on spectra and on signal envelopes).

Example 1: A sustained tone played by a Bb soprano clarinet is followed by the same tone presented by gradually increasing and then decreasing the number of spectral components (from the lowest to the highest in frequency).
 
Example 2: A 220 Hz sine tone of amplitude A followed by several more tones in which 7 additional harmonic components are incrementally added and removed (i.e. 2f, 3f, ... 8f), at amplitudes equal to A/n (n: number of the harmonic component), returning back to the 220 Hz sine tone.
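A minimal sketch of how a tone sequence like the one in Example 2 could be synthesized (the 44.1 kHz sample rate, step duration, and output filename are illustrative assumptions, not taken from the original example):

```python
import numpy as np
from scipy.io import wavfile

fs = 44100            # sample rate (Hz), assumed
f0 = 220.0            # fundamental frequency (Hz)
dur = 1.0             # duration of each step (s), illustrative
t = np.arange(int(fs * dur)) / fs

def tone(num_harmonics):
    """Sum of the first num_harmonics harmonics of f0, each at amplitude A/n (A = 1)."""
    return sum(np.sin(2 * np.pi * n * f0 * t) / n
               for n in range(1, num_harmonics + 1))

# 1 harmonic (pure sine), incrementally up to 8, then back down to 1.
steps = list(range(1, 9)) + list(range(7, 0, -1))
signal = np.concatenate([tone(k) for k in steps])
signal /= np.max(np.abs(signal))                      # normalize to avoid clipping
wavfile.write("example2_sketch.wav", fs, (signal * 32767).astype(np.int16))
```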

Example 3: Ascending sine-tone glide: 50-5000Hz. Does changing the frequency impact the timbre of the tone?
If we consider sinusoidal waves a limit case of complex waves (i.e. complex waves with a single spectral component), we can expect that changing the frequency of a sinusoidal wave will change not only its pitch but also its timbre (i.e. its sound quality) as a consequence of changing its spectrum. 

Example 4: A recorded piano note played back normally and then backwards. Same complex tone, same average spectral distribution, different signal envelope shape.

 

Key signal & spectral parameters related to timbral similarity/difference
    (based on Plomp, 1970 & 1976; Grey & Gordon, 1978; Kendall & Carterette studies in Hajda et al., 1997; etc.)

  1. signal time variance (envelope)
     

  2. degree of attack and decay synchrony of spectral components;
     

  3. presence/absence of high-frequency inharmonic energy in the attack portion of a signal;
     

  4. spectral energy distribution: frequency, amplitude and phase values of the sine components of a complex signal, which may change with changes in intensity and register, even for a given instrument; and
     

  5. spectral energy distribution time-variance (spectral flux or "jitter").

 

  


   

 

Timbre & Spectral Energy Distribution

 

 
Spectral energy distribution acoustic parameters
(deduced/adapted from Grey, 1977; McAdams et al., 1995; Kendall et al., 1999; Lakatos, 2000):

  1. spectral centroid: center of amplitude-weighted frequency distribution (low vs. high);

  2. spectral bandwidth: spread of frequency distribution (narrow vs. wide);

  3. spectral density: number of frequency components or total energy per critical band (sparse vs. dense);

  4. spectral inharmonicity: departure from integer multiple frequency relationship relative to some fundamental component (harmonic vs. inharmonic).
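The four parameters listed above can be estimated directly from a tone's set of partials. A rough sketch, assuming the spectrum is given as arrays of partial frequencies and amplitudes; the density and inharmonicity measures below are simple illustrative choices, not the specific formulations used in the sources cited above:

```python
import numpy as np

def spectral_descriptors(freqs, amps, f0):
    """Rough estimates of the four spectral parameters listed above.
    freqs, amps: frequencies (Hz) and amplitudes of the partials; f0: nominal fundamental (Hz)."""
    freqs, amps = np.asarray(freqs, float), np.asarray(amps, float)
    total = amps.sum()

    # 1. spectral centroid: amplitude-weighted mean frequency (Hz)
    centroid = (freqs * amps).sum() / total

    # 2. spectral bandwidth: amplitude-weighted spread around the centroid (Hz)
    bandwidth = np.sqrt(((freqs - centroid) ** 2 * amps).sum() / total)

    # 3. spectral density: here, simply the number of partials per kHz of frequency span
    density = len(freqs) / max((freqs.max() - freqs.min()) / 1000.0, 1e-9)

    # 4. spectral inharmonicity: mean deviation of each partial from the nearest
    #    integer multiple of f0, as a fraction of f0 (0 = perfectly harmonic)
    inharmonicity = np.mean(np.abs(freqs / f0 - np.round(freqs / f0)))

    return centroid, bandwidth, density, inharmonicity

# Example: a slightly stretched 8-partial spectrum on a 220 Hz fundamental
partials = 220.0 * np.arange(1, 9) * 1.002 ** np.arange(8)
print(spectral_descriptors(partials, 1.0 / np.arange(1, 9), 220.0))
```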

 
Spectral energy distribution perceptual correlates

Spectral centroid is manifested perceptually as a sound’s degree of nasality-brightness vs. dullness-darkness, has been well-defined in the literature (e.g. Kendall & Carterette, 1996; Kendall et al., 1999; Marozeau et al., 2003), and will be discussed first.
 
Spectral bandwidth is manifested perceptually as a sound's degree of width/richness/fullness/thickness vs. narrowness/plainness/lightness/thinness and can significantly impact loudness, depending on the number of corresponding critical bands.
 
Spectral density & spectral inharmonicity are manifested in perceptions represented by models calculating/quantifying the sensations of auditory beating and roughness (collectively referred to as sensory dissonance).


Illustration of spectral density, inharmonicity, and bandwidth.

 

Helmholtz was the first to theoretically and experimentally link timbre (a perceptual aspect of sound waves) to spectral distribution (a physical aspect of sound waves).
He specifically focused on the spectral distribution of the steady state portion of sound signals, which was presumed to also be steady.
This approach overlooked several acoustical aspects that are important to timbre perception, such as onset transients (i.e. attack) and signal/spectral time variance.

   
Spectral Centroid

"Nasality" (spectral envelope slope)

Kendall et al. (1999) have demonstrated that the degree of a sound’s “nasality” constitutes the primary dimension of timbre. They link “nasality” directly to spectral centroid, a measure of the energy distribution in the spectrum of a complex signal, within a given time window.
[ Kendall, R., Carterette, E., & Hajda, J. (1999). Perceptual and acoustical features of natural and synthetic orchestral instrument tones. Music Perception, 16(3), 327-364 ]

In general, spectra with more
   _ high-frequency energy correspond to higher centroid values and to more 'nasal' sounds.
   _ low-frequency energy correspond to lower centroid values and to 'duller' or 'darker' sounds.

Qualitatively, spectral centroid can be likened to a spectrum's "center of gravity" or "teeter-totter fulcrum," where amplitude values represent "weights" and frequency values represent the "position" of each weight on the teeter-totter.  Graphically, two spectral distributions with the same spectral centroid have the same overall spectral-envelope slope (see "formants," below, for the contribution of spectral envelope shape details).

                         

 

Listen to a pair of harmonic complex tones with 7 components each, the same fundamental frequency (220Hz), but different centroid values (.wav file). The first tone has most of its energy in the low components (low centroid value), while the second has most of its energy in the high components (high centroid value).

The didjeridu is an example of an instrument whose performance practice and aesthetic qualities rely heavily on spectral centroid manipulation. Listen to "Green Frog", a Wangga song from Arnhem Land, Northern Australia.


OPTIONAL: The formula, below, relates spectral centroid to the frequency f and amplitude A values of a complex signal's spectral components, for a total of n components (not necessarily harmonic):

Centroid $= \dfrac{\sum_{n=1}^{N} f_n A_n}{f_1 \sum_{n=1}^{N} A_n}$   (formula explained briefly in class).

Including f1 in the formula's denominator results in centroid values that are independent of the lowest frequency and, therefore, musical register.
Removing it results in a centroid value that, in addition to degree of nasality, also represents degree of brightness.
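A short sketch of the two centroid variants just described (register-independent, with f1 in the denominator, and register-dependent, without it); the example amplitude values are illustrative, chosen to mirror the low-centroid vs. high-centroid listening example above:

```python
import numpy as np

def centroid(freqs, amps, register_independent=True):
    """Spectral centroid as defined above: sum(f_n * A_n) / (f1 * sum(A_n)) when
    register_independent is True, or sum(f_n * A_n) / sum(A_n) (in Hz) otherwise.
    Assumes the partials are listed lowest-frequency first (freqs[0] = f1)."""
    freqs, amps = np.asarray(freqs, float), np.asarray(amps, float)
    c = (freqs * amps).sum() / amps.sum()
    return c / freqs[0] if register_independent else c

# 7 harmonics of 220 Hz with most energy in the low components (low centroid) ...
low = centroid(220 * np.arange(1, 8), [1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05])
# ... versus most energy in the high components (high centroid).
high = centroid(220 * np.arange(1, 8), [0.05, 0.1, 0.2, 0.3, 0.5, 0.8, 1.0])
print(low, high)
```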


  
"Brightness"
(the frequency center of spectral energy) correlates closely with a register-dependent centroid (i.e. the centroid obtained by removing f1 from the formula's denominator; see just above). This captures the dependence of a signal's degree of "brightness" on the actual frequency center of its spectral energy distribution, rather than just the slope. [ Marozeau, J. and de Cheveigne, A. (2007). The effect of fundamental frequency on the brightness dimension of timbre. J. Acoust. Soc. Am., 121(1): 383-387 ].
The two spectra, below, have the same centroid, spectral envelope slope, and degree of nasality, but the one to the right is brighter, because the frequency center of its spectral energy distribution is higher.
 

                   
 

 
Formants
(spectral envelope peaks)

The resonant characteristics of all sound sources (e.g. musical instruments; voice) enhance certain spectral regions of the sounds produced. When these enhanced spectral regions remain the same, regardless of the note (i.e. fundamental frequency) produced, they are called formants and contribute to the identification of instrumental timbres and vocal sounds. ( Resource on Formants )
In most cases, identifying the three most prominent spectral envelope regions, or formants, of a sound is sufficient to capture
a) the perceptual impact of the source's resonant characteristics and
b) its overall perceptual identity.

As we discussed during the Musical Instruments module, formants appear to be responsible for the recognizable differences between various vowel sounds and are used in speech recognition and synthesis applications [ pioneered by Peter Ladefoged; UCLA ].

For example, the vowels in this resource are created by spectrally shaping generic sawtooth spectra, using spectral envelopes corresponding to each vowel's first 4 formants [ more details ].
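A minimal sketch of that idea: a generic sawtooth-like tone is shaped in the frequency domain with broad peaks at assumed formant frequencies. The formant center frequencies and bandwidths below are rough, illustrative figures for an 'ah'-like vowel, not the specific envelopes or method used in the linked resource:

```python
import numpy as np

fs, f0, dur = 44100, 110.0, 1.0
t = np.arange(int(fs * dur)) / fs

# Sawtooth-like tone: sum of harmonics at amplitudes 1/n, up to the Nyquist limit.
saw = sum(np.sin(2 * np.pi * n * f0 * t) / n for n in range(1, int(fs / 2 / f0)))

# Shape the spectrum with Gaussian "formant" peaks (illustrative values, Hz).
formants = [(700, 100), (1200, 120), (2600, 200)]      # (center, width) pairs, assumed
spectrum = np.fft.rfft(saw)
freqs = np.fft.rfftfreq(len(saw), 1 / fs)
envelope = sum(np.exp(-0.5 * ((freqs - fc) / bw) ** 2) for fc, bw in formants)

vowel = np.fft.irfft(spectrum * envelope, n=len(saw))  # spectrally shaped "vowel"
vowel /= np.max(np.abs(vowel))
```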

Certain emotional states are accompanied by characteristic/typical facial expressions and, consequently, characteristic vocal fold tension and/or resonator shaping. This corresponds to characteristic spectral shaping and accompanying timbral features that pair emotional states to timbral "signatures."

Consider, for example, your ability to tell whether or not someone is smiling while they speak, even in the absence of visual and linguistic cues.

Watch this spectral breakdown of the vowel 'A' (male voice; 100Hz fundamental frequency).

  
[source]

The images illustrate spectrograms for four words (Hot, Hat, Hit, Head), spoken at high (top) and low (bottom) fundamental frequency / pitch. They display frequency over time, with level differences represented as color differences (e.g. dark blue: lowest levels / dark red: highest levels).

 


 

   

Spectral Density & Inharmonicity: Beating; Roughness
Sensory vs. Musical Consonance/Dissonance
(section expands on our earlier discussion of Auditory Interference)

 

 
Beating & Roughness
are the perceptual manifestations of the hearing system's response to spectral density and inharmonicity and the corresponding complex signals' amplitude fluctuations.

Definitions - Reminders
 

interval: pitch-height distance between two tones
 
harmonic interval: interval between two tones played simultaneously
melodic interval: interval between two tones played sequentially

Consonant harmonic intervals
[non-discordant, pleasant, "smooth"]
8 consonant intervals (including Unison)

Unison: 0 semitones difference between interval notes (maximally consonant)
Octave: 12 semitones
Perfect fifth: 7 semitones
Perfect fourth: 5 semitones
....

Dissonant harmonic intervals
[discordant, unpleasant, "rough"]
5 dissonant intervals

Minor second: 1 semitone difference between interval notes (maximally dissonant)
Major second: 2 semitones
Augmented fourth: 6 semitones
Major seventh: 11 semitones
....

Sensory Consonance

Term referring to the perceptual 'smoothness' of a harmonic interval. The further apart on the basilar membrane the resonance regions for the components of the two tones in the interval, the less 'rough' the resulting sound and the more consonant (smoother) the interval.
"Sensory consonance" refers to consonance understood specifically as absence of the sensations of auditory beating and roughness.

Sensory Dissonance

Term referring to the perceptual 'roughness' of a harmonic interval. The closer on the basilar membrane the resonance regions for the components of the two tones in the interval, the 'rougher' the resulting sound and the more dissonant the interval.
"Sensory dissonance" refers to dissonance understood specifically as presence of the sensations of auditory beating and roughness.

Sensory consonance & dissonance correspond to the degree of interaction among interval-note components whose resonance regions along the basilar membrane are less than a critical band apart. Consequently, the degree of beating & roughness (i.e. the degree of sensory dissonance) of a given harmonic interval is directly linked to the interaction of the interval notes' frequency components within the ear (refresh your memory on critical bands). In general, the denser and/or more inharmonic a complex signal's spectrum, the more likely for sensory dissonance (i.e. beating & roughness sensations) to arise.

 
Beating, Roughness, & The Basilar Membrane

The degree of "smoothness," blending, or "sensory consonance" of a given harmonic interval has been linked to the degree of basilar membrane disturbance-pattern matching between the interval tones and, consequently, the degree of simplicity in the two-tone disturbance pattern.

The figure, below, includes schematic diagrams of idealized disturbance patterns corresponding to the low-frequency harmonics of tones in three harmonic intervals:
_ (A) octave; all disturbance peaks of the high-frequency tone coincide with the even components of the low-frequency tone;
_ (B) fifth; less coincidence; and
_ (C) third; even less coincidence.


Assuming the ear performs frequency analysis on incoming signals, the perceptual manifestations of amplitude fluctuation can be related directly to the bandwidth of the analysis filters, depending upon and defining the ear's critical bandwidth.

For example, in the simplest case of amplitude fluctuations resulting from the addition and interference of two sine signals with frequencies f1 and f2, the fluctuation rate is equal to the frequency difference between the two sines |f1-f2|, and the following statements represent the general consensus:

_ If the fluctuation rate is smaller than the critical bandwidth, then a single tone is perceived either with beating (fluctuating loudness) or with roughness.
_ If the fluctuation rate is larger than the critical bandwidth, then a complex tone is perceived, to which one or more pitches can be assigned but which, in general, exhibits little or no beating or roughness.
 

More specifically, periodic signal amplitude fluctuations are linked to the frequency difference among simultaneous tones and can be placed in three overlapping perceptual categories related to the rate of fluctuation:

  • Slow amplitude fluctuations (<~10-15 per second) are perceived as loudness fluctuations, referred to as beating.

  • As the rate of fluctuation increases beyond ~15 per second, the loudness appears to gradually become constant and the amplitude fluctuations are perceived as "fluttering," "buzzing," or roughness.

  • As the amplitude fluctuation rate is increased further, the roughness sensation reaches a maximum strength and then gradually diminishes until it almost disappears, at >~75-150 fluctuations per second, depending on the frequency of the interfering waves.

The critical bandwidth (1/3 of an octave at middle frequencies and therefore proportional to the center frequency of the interfering tones) determines the amplitude fluctuation rate limit and the associated frequency-difference interference limit on the basilar membrane at which roughness disappears. So, as the center frequency of two interfering tones rises, so does the upper frequency-difference limit of the roughness sensation.
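A sketch that applies the rules of thumb above to a pair of sine tones: the fluctuation rate is |f1 - f2|, the critical band is approximated as 1/3 of an octave around the pair's center frequency (as stated above), and the beating/roughness cutoff uses the approximate ~15 fluctuations-per-second figure quoted earlier:

```python
def classify_pair(f1, f2):
    """Classify the perceptual result of summing two sine tones, per the rules of thumb above."""
    rate = abs(f1 - f2)                                   # amplitude fluctuation rate (Hz)
    center = (f1 + f2) / 2.0
    critical_band = center * (2 ** (1/6) - 2 ** (-1/6))   # ~1/3-octave band around the center

    if rate >= critical_band:
        return "components resolved: little or no beating/roughness"
    if rate < 15:                                         # ~10-15 fluctuations/s
        return "single tone with beating (fluctuating loudness)"
    return "single tone with roughness"

for f2 in (442, 470, 600):
    print(440, "+", f2, "->", classify_pair(440, f2))
```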

Review the Beating & Roughness Resources presented in Module 3.

  

______________________________
 

 

______________________________
 

 
Sensory vs. Musical Consonance/Dissonance

The concepts of consonance and dissonance are approached in our class specifically and narrowly from within the physical (sound wave properties) and physiological (ear properties) frames of reference. The approach applies to acoustic or sensory (not musical) consonance/dissonance, determined by the extent of beating and roughness sensations generated from the interaction on the basilar membrane of the different frequency components within a complex signal's spectrum.

The concept of sensory consonance/dissonance only applies to harmonic intervals and is based on clear-cut physical/physiological considerations, applicable to all cultures. However, how "pleasant" or musically consonant a given degree of beating or roughness is judged to be is culturally defined, with no universally "correct" judgment.

"Whether one combination [of tones] is rougher or smoother than another depends solely on the anatomical structure of the ear, and has nothing to do with psychological motives. But what degree of roughness a hearer is inclined to … as a means of musical expression depends on taste and habit; hence the boundary between consonances and dissonances has frequently changed … and will still further change… " (Helmholtz, 1875)

Examples

Within the Western Art musical tradition there is a strong link between roughness and annoyance, manifested in the assumption that rough sounds are considered inherently bad or unpleasant and must therefore be avoided.

Instrument construction and performance practices outside the Western art musical tradition, however, indicate that the sensation of roughness can be an important factor in the production of musical sound. Manipulating the roughness parameters helps create a buzzing or rattling sonic canvas that becomes the backdrop for further musical elaboration. It permits the creation of timbral or even rhythmic variations (through changes among roughness degrees), contributing to a musical tradition’s menu of expressive tools.

Watch the Mijwiz and Ganga video examples presented in class.
Watch a video outline of the physics and music theory of musical dissonance.

[ Optional: Study exploring the cultural correlates of the roughness/dissonance relationship. ]

 

 
Degree of sensory consonance/dissonance (i.e. of roughness/beating) is just one of the several cues informing musical consonance/dissonance judgments. What is considered musically consonant (acceptable, pleasing, fitting, correct) or dissonant (unacceptable, disturbing, unfitting, wrong) also depends on melodic, harmonic, rhythmic, and dynamic context, goes beyond physics or physiology, and may change:
    a) with time (historical context),
    b) with tradition (cultural context), or even
    c) within a single tradition, style, or piece of music (musical structure context).

Melodic and harmonic development are often based on musical consonance/dissonance contrasts. Musical structures can be created through a back-and-forth movement between (musical) consonance and dissonance that outlines a musical piece's consonance/dissonance 'contour'.

 


 

   

Timbre, Combination Tones, & Basilar Membrane Disturbance Patterns

 

Combination (subjective) tones

The term Combination Tones was introduced by Helmholtz to describe tones that can be traced not in a vibrating source but in the combination of two or more waves originating in vibrating sources.

Combination tones are the products of wave interference and have physical, physiological, neurological, and cognitive origins.

Watch the video, to the right, for a brief outline of the phenomenon and its history.

[Optional: Additional Information]

 
A specific combination tone, the difference tone, is one of the perceptual manifestations of amplitude fluctuation.
The difference tone is a tone with pitch corresponding to the frequency |f1-f2| (i.e. amplitude fluctuation rate) heard when two tones with fundamental frequencies f1 and f2 are played together. Experimental evidence indicates that the difference tone can be partially traced to the nonlinear response of the cochlea.

In this example, there are 4 successive tones:
(a) 700Hz, (b) 1000Hz, (c) 700+1000Hz, and (d) 300Hz.
When listening to tone (c) at a high level, the difference tone (300Hz) can be heard in the background. Tone (d) is presented as a reference.
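A sketch that generates the four tones of this example (700 Hz, 1000 Hz, their sum, and the 300 Hz reference); whether the 300 Hz difference tone is actually heard in tone (c) depends on the playback level, as noted above. Durations and filename are illustrative:

```python
import numpy as np
from scipy.io import wavfile

fs, dur = 44100, 1.5
t = np.arange(int(fs * dur)) / fs

def sine(f):
    return np.sin(2 * np.pi * f * t)

tones = [sine(700),                        # (a)
         sine(1000),                       # (b)
         0.5 * (sine(700) + sine(1000)),   # (c): |1000 - 700| = 300 Hz difference tone
         sine(300)]                        # (d): 300 Hz reference

signal = np.concatenate([0.8 * x for x in tones])
wavfile.write("difference_tone_demo.wav", fs, (signal * 32767).astype(np.int16))
```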

Whether created in the physical (e.g. in the sound source or the propagation medium) or the physiological (e.g. in the ear) frame of reference, combination tones belong to the spectral distribution of a signal, as this is manifested in basilar membrane disturbance patterns.

 
Timbre & Basilar Membrane Disturbance Patterns

Performance techniques, resonant and feedback characteristics of a vibrating system, aural harmonics, other subjective/combination tones, and the phenomenon of masking, all influence the timbre of a sound, by changing its effective (i.e. reaching the inner ear) spectral and signal envelopes.

At one level, timbre correlates with Basilar Membrane disturbance patterns and with the way these patterns change with time. Such an approach can account for timbral similarities/differences due to spectral distribution, register, signal and spectral time variance, formants, and combination tones. 

Associating BM disturbance patterns to timbre identification/discrimination is analogous to associating them to pitch identification/discrimination and to total loudness. It is consistent with
     a) the general observation that all perceptual and physical attributes of sound waves are, at some level, interdependent &
     b) the specific observation that relative and quasi-absolute pitch judgments are facilitated by timbral cues
 
     Relative Pitch Judgment: ability to identify/reproduce a pitch value relative to a provided pitch reference.
     Absolute Pitch Judgment: ability to identify/reproduce a pitch value in the absence of a provided pitch reference.

 


 

   

Timbre & Time Variance, Duration, and Time Separation

 

 
Timbre and Signal/Spectral Time-Variance
(signal envelope, spectral flux, etc.)

The average spectrum of a complex signal describes the amount of energy in each of the signal's frequency components (partials) but does not describe how the amount of energy in a signal or spectrum changes with time, a change that also influences timbre.

Signal time-variance can be represented through the signal envelope.

Spectral time-variance (spectral flux) can be represented through time-variant spectra, spectrograms, or individual component amplitude/frequency envelopes.

 


 
Signal Time-Variance

The significance of signal envelope to timbre can be demonstrated by playing a sound backwards. The sound signal changes with time, while its average spectral distribution remains the same.   Example 1 - Example 2 - Example 3. (based on experiments by Houtsma et al., 1987).
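The claim can also be checked numerically: reversing a signal in time leaves its magnitude spectrum unchanged (only the phase changes), so the average spectral distribution is identical while the envelope is flipped. A minimal sketch, assuming a mono recording named "piano.wav" is available (the filename is hypothetical):

```python
import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("piano.wav")        # hypothetical mono recording
x = x.astype(float)
reversed_x = x[::-1]                     # same samples, played backwards

# Magnitude spectra match (to numerical precision); the signal envelopes do not.
forward_mag = np.abs(np.fft.rfft(x))
backward_mag = np.abs(np.fft.rfft(reversed_x))
print(np.allclose(forward_mag, backward_mag))   # expected: True

wavfile.write("piano_reversed.wav", fs, reversed_x.astype(np.int16))
```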
 

REMINDERS:

Attack: The portion of the envelope tracing the development of a sound signal towards its maximum amplitude. It represents how energy is built up in a vibrating system and, in music, can be manipulated through instrument generator excitation methods (bowing, plucking, striking with a hard or soft mallet, etc.).

In signal synthesis contexts, the short time between the attack and steady state portions of a signal, during which the source generator's response settles following initial excitation, is referred to as "decay." Prior to the advent of synthesizers, the term "decay" had been used to describe the final signal portion, now referred to as "release." In the context of acoustics, decay is a more appropriate term for the last portion of a signal envelope.

Steady state: The portion of the envelope corresponding to a vibrating system's response to the continuous supply of energy, modulated via performance techniques such as vibrato, muting, damping, bowing pressure, driver excitation location, etc.

Decay (Release): The portion of the envelope that traces the drop in amplitude (or "decay") of a sound signal from its maximum value to zero, following the "release" of the source's generator from excitation. Decay occurs when energy stops being supplied to a vibrating system, and represents how energy stored in a system eventually dissipates.

     Decay time depends on:

  • The resonance and feedback characteristics of the vibrating system. More specifically, the sharper a resonator's tuning and the less the feedback in the vibrating system, the shorter the decay time.

  • The environment within which the system vibrates. More specifically, the fewer the boundaries around the vibrating system and the lower their reflectivity, the shorter the decay time.

           

 

Based on envelope shapes, we classify signals in two broad categories:
   i) Continuous signals (image above, left),
      where most of the energy is contained in the steady state portion of the envelope (e.g. signal of a bowed violin string).
  ii) Impulse signals (image above right),
      where the envelope has no steady state portion and the attack portion is much shorter and steeper than the decay portion (e.g. signal of a struck marimba bar).
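A sketch of the two broad envelope shapes described above, applied to the same sine tone; the time constants are illustrative, not measured from real instruments:

```python
import numpy as np

fs, dur = 44100, 2.0
t = np.arange(int(fs * dur)) / fs
carrier = np.sin(2 * np.pi * 220 * t)

# i) Continuous-type envelope: short attack, long steady state, short decay (bowed-string-like).
continuous_env = np.interp(t, [0.0, 0.05, 1.8, 2.0], [0.0, 1.0, 1.0, 0.0])

# ii) Impulse-type envelope: very fast attack, no steady state, exponential decay (struck-bar-like).
impulse_env = np.minimum(t / 0.005, 1.0) * np.exp(-t / 0.4)

bowed_like = carrier * continuous_env
struck_like = carrier * impulse_env
```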

The attack portion of the signal envelope contains a signal's onset transients: frequency components that are
a) usually inharmonic,
b) reach higher amplitudes than other components,
c) die out rather fast, and
d) contribute to a signal's perceived degree of noisiness/naturalness.

The attack influences the timbre of any signal but significantly more so that of impulse signals. In this example, three orchestral instruments are presented with the attack portion of their signal removed. Can you recognize the instruments? If yes, which instrument's sound has been impacted more by the removal of the attack?  [ key at the bottom of the page ].
 

NOTE: The separation between a signal envelope's sections is, in most cases, not clear-cut (especially between attack and steady state). 
              In addition, the so-called "steady state" represents a portion of the signal during which several of its acoustical parameters are changing.

 


 

Spectral Time-Variance

Spectral time-variance describes changes in the frequency and amplitude of a complex tone's components with time. It can be represented by:
 

A) Time-variant spectra: 3D "waterfall" plots; usually with frequency on the x axis, time on the y axis, and level on the z axis.
 

                                

Click here for a video of the time-variant spectrum of all 3 piano examples, mentioned above (Houtsma et al., 1987).
 
 

B) Spectrograms: 2D plots; usually with time on the x axis, frequency on the y axis, and intensity in different color shades.
See and hear 2D Spectrograms of sample sound sources.
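A minimal sketch of how such a 2D spectrogram can be computed and plotted using scipy's short-time Fourier analysis ("sample.wav" is a hypothetical mono file; window length and overlap are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("sample.wav")                        # hypothetical mono recording
f, t, Sxx = spectrogram(x.astype(float), fs, nperseg=1024, noverlap=768)

# Time on the x axis, frequency on the y axis, level shown as color (in dB).
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.colorbar(label="Level (dB)")
plt.show()
```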
 


  


Spectrograms from (a) warbler, (b) whale, (c) flute, (d) singer singing a steady tone.
(from "Music without Borders" by Susan Milius)

  

C) Amplitude (or, less commonly, frequency) envelopes of the individual spectral components: 2D plots, usually with time on the x axis, level (or frequency) change on the y axis, and spectral component frequency or number in different color/style lines:



 

Spectral time-variance contributes to the "naturalness" of a sound. Deviations from harmonic spectra, and the way these deviations interact when several instruments perform together in unison, change the timbre of the resulting sound and contribute to its 'liveliness' and to what is perceived as a 'chorus effect'.

  

 
Timbre, Duration, & Time Separation

As discussed previously, there is a

  • duration threshold for loudness (~200ms for sine signals; ~400ms for broadband signals), below which loudness appears to increase with increase in duration, even if the sound intensity level remains fixed, and

  • duration threshold for pitch (~10-60ms, depending on frequency and intensity) below which sounds lose their pitch identity.  

Durations that are shorter than a given signal's attack portion significantly degrade timbre perception.

In addition, two signals separated by a time delay that is shorter than a specific time separation threshold of ~2-50ms, depending on the spectra of the two signals, sound as one (per Hirsch, 1959; details and analysis in Divenyi, 2004). In this case, introduction of the second, delayed signal has an effect on the original signal's:

  • loudness (the loudness of the original signal appears to increase) and

  • timbre (the original signal's attack portion becomes less sharply defined, resulting in a timbre with a rather 'blurry' onset).

This example presents 3 complex signals with a 600Hz fundamental: (i) a single complex tone, (ii) two complex tones separated by 30ms, and (iii) two complex tones separated by 150ms. The introduction of the second tone in (ii) is perceived as an increase in loudness and a change in the attack of the first tone. The introduction of the second tone in (iii) results in the perception of two tones with two distinct onsets.
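A sketch that reproduces the structure of this example: a harmonic complex tone on a 600 Hz fundamental, alone and then summed with a copy of itself delayed by 30 ms and by 150 ms (the number of harmonics and the durations are illustrative, not those of the original example):

```python
import numpy as np

fs, dur, f0 = 44100, 0.5, 600.0
t = np.arange(int(fs * dur)) / fs
tone = sum(np.sin(2 * np.pi * n * f0 * t) / n for n in range(1, 6))

def with_delayed_copy(x, delay_s):
    """Sum a signal with a copy of itself delayed by delay_s seconds."""
    d = int(delay_s * fs)
    out = np.zeros(len(x) + d)
    out[:len(x)] += x
    out[d:] += x
    return out

single = tone                                   # (i) one complex tone
fused = with_delayed_copy(tone, 0.030)          # (ii) 30 ms: heard as one tone, louder, blurred onset
separate = with_delayed_copy(tone, 0.150)       # (iii) 150 ms: two distinct onsets
```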

Duration and time separation thresholds are linked to the mechanical and electro-chemical latency of the auditory system and the associated forward masking effect.   

Previous experience and context can override such psycho-physiological limits, allowing listeners to make pitch, loudness, and timbre judgments for tones with durations below the suggested thresholds.

Listen to a melody performed using 7 notes shortened to clicks (2 signal cycles per note). Stripped of context, this melody is unrecognizable to most listeners. If listeners are told that this is the opening line of "......(look at the bottom of the page)..." they are able to hear the intended pitch contour. After listeners have been primed to listen to this tune, some may continue to hear it even if the "notes" represented by each click follow a random pitch pattern.

 

 SUMMARY LIST

  • Signal onset transients contribute to a signal's degree of noisiness/naturalness

  • Spectral density and inharmonicity contribute to a signal's degree of beating/roughness

  • Spectral inharmonicity contributes to a signal's degree of fluidity/liveliness (e.g. chorus effect)

  • Spectral centroid contributes to a signal's degree of nasality and brightness

  • Spectral time-variance contributes to a signal's perceived appeal and degree of naturalness

Table summarizing the relationship between spectral variables and verbal descriptors of timbre  (Vassilakis, 2009; unpublished)

 


 

   

Cognitive Aspects of Timbre

 

 
Timbre Perception: Categorical or Continuous?

 
The same instrument may produce notes with widely differing spectral envelope shapes when performed at different intensity levels (Butler, 1992: 72) or registers, but will most likely retain its timbral identity, suggesting that timbre perception may be categorical.

For example, the signals of low, middle, and high pitched notes on a piano will have very different spectral envelopes but will, in general, continue to be identified as belonging to a single instrumental timbre category, that of a piano. 
Accordingly, imposing the same spectral envelope and overall spectral distribution across the playing range of a single instrument (as in early samplers) results in tones that cannot convincingly convey the instrument's identity.

_ For example, listen to the sound of a violin playing C4.
_ Now listen to the same sound transposed up to C5 or transposed down to C3, changing the pitch while keeping the spectral envelope shape the same.
Do the two transposed tones convincingly convey the instrument's identity as being that of a violin?
[ Spectral evolution videos of a similar example on the bassoon: 2D Spectra - 3D Spectra ].

 
On the other hand, several studies that gradually morph signals from one instrumental spectral distribution to another have shown that, perceptually, the timbre does not abruptly switch from the first instrument to the second at some fixed point in the morphing process but appears to transform gradually, as does the physical stimulus. This suggests that timbre perception may be continuous.

In this sound morphing example, a C4 tone played on a French Horn gradually (in 10 steps) morphs into a C4 tone played on a Bb Clarinet (after Butler, 1992).  Does the transition from French Horn to Clarinet seem gradual or abrupt? 

 
Based on such observations, the previously discussed ability to group together the widely different spectral distributions of different notes on the violin or piano under a single timbre category (that of the violin or the piano) might be based on higher level cognitive processing, guided by our experience with an instrument's sound throughout its pitch range. This claim is supported by studies that show a larger decline in timbre identification with changes in register for unfamiliar versus familiar instrument sounds. 

Timbre perception appears to be more continuous for unfamiliar sounds and more categorical for familiar sounds.

 
Multidimensionality of Timbre

The observed difficulties in defining and quantifying timbre are due to its multidimensional nature. Several studies have attempted to define timbre based only on a tone's steady-state spectral characteristics (e.g. Helmholtz, 1875; Slawson, 1985) or by incorporating time-variance information that, in some cases, includes spectral jitter (i.e. temporal micro-variations in the amplitude and frequency of individual spectral components).

Grey's (1975) timbral similarity experiments revealed three primary physical dimensions related to timbral judgments, based on whether their spectra are:
    a) narrow vs. wide;
    b) coherent vs. independent; and
    c) low vs. high centroid (overall or attack-specific).

Instrument identification experiments support the timbral clustering in Grey's (1975) 3D plot (in Butler, 1992: 132), but reveal asymmetries in identification confusion. For example, the bassoon may be confused for a French horn and the saxophone may be confused for an English horn, but not the other way around.

Experimental studies that involve melodic and harmonic contexts (e.g. Grey, 1977) suggest that perceptual strategies for timbral recognition and discrimination vary, depending on:
    a) whether spectral (for continuous signals) or temporal characteristics (for impulse signals) of a tone are more pronounced;
    b) the simultaneous presence/absence of other instruments/sources; 
    c) attention shifts and larger musical/sonic context.

[ Optional: Various timbre spaces, based on a combination of psychoacoustical and cognitive studies. ]

 

 


 

 

[OPTIONAL SECTIONS]

 
Musical Context & Timbre
  

A series of studies (e.g. Kendall, 1986) provide further support to Grey's 1977 claim that the perception of timbre also depends on musical context. Different strategies are employed depending on the types of tones in question. The most salient timbral cues are provided by time-variant information within the so-called steady state of a signal, and by the way it modulates in realistic musical contexts.
Overall:
Timbre recognition/discrimination is based on different acoustical cues depending on whether the tones in question are in solo & static, vs. multi-instrumental & time-variant contexts.
 

Timbre Spaces & Semantics
(full-size slides & printable copy, also displayed below)

 
Dynamic, Melodic, and Harmonic Planes ( Pierre Schaeffer's "musique concrète" )
  

In an alternative approach by composer and music theorist Pierre Schaeffer, all necessary aspects of the acoustic correlates to timbre (spectral distribution & signal/spectral time-variance) can be described in terms of only three dimensions/planes: a) dynamic plane, b) melodic plane, and c) harmonic plane.
Schaeffer is the key proponent of "musique concrète" (French for "concrete music"), an experimental music genre that uses recorded, "real-world" sounds (e.g. machinery, nature, voices, instruments), manipulated and assembled into sound collages.
 

Schaeffer's Timbre Theory
(full-size slides & printable copy, also displayed below)

  
Spectral Music 
 

 
Spectral music focuses on the physical, perceptual, and aesthetic attributes of timbre. Spectral composers such as Tristan Murail, Gérard Grisey, and Jonathan Harvey approach spectral distribution less as a physical correlate of timbre and more as a means to structure musical compositions. They claim that musical structures based on sounds' spectral features produce perceptible results, but the intended effect remains ambiguous.
Is spectral music supposed to communicate
a) a metaphorical image of the "natural" (as captured by the natural response of music generators and the spectral distribution of the sounds they produce) or
b) a direct, literal translation of sonic properties and acoustic phenomena into musical pieces?

More radically than Cage's acceptance of 'sounds' as music*, spectral composers view music as "a special instance of the general phenomenon of sound", as "sound evolving in time" (Fineberg, 2000). Believing that music is just sound opens up the possibility for directing attention to the usually overlooked, within the Western musical tradition, timbral dimension of music. [ By "accepting 'sounds' as music," Cage meant elevating noise, ambient sounds, and everyday environmental sounds to the status of music. ]

"What sparked that initial jolt of attraction to musical composition in me when I was little was not melody or structure, it was something about the sound of music--sound as opposed to what you're doing with the sound; like the crunchiness of a Bartók Quartet. It was timbre." J. Fineberg.

For extensive research on Spectral Music see several issues (e.g. 19(2) & 19(3) from 2000; 22(1/2) from 2003; etc.) of the Contemporary Music Review Journal, available online through the LMU Library (LMU login required).
 

Key Works

Penderecki is one of the few composers that have based entire works on natural timbre manipulation. Threnody for the Victims of Hiroshima (1960) for 52 strings can be seen as a set of variations upon a harmonic (i.e. simultaneous) tone cluster. The work ends with a 30-second diminuendo from fff to silence, where all 52 strings sustain a single note each at 1/4-tone intervals, producing beating/roughness timbral effects that gradually transform into fading white noise.

James Tenney's work (e.g. Harmonium #5, 1978) includes several examples of spectral techniques used explicitly to attract attention to musical timbre.

Other Examples

Electronic timbre manipulation was prominent in the works of Varèse and Stockhausen. Some of their works follow, in many respects, Cowell's (1930s) idea of devising a complex rhythmic system by superimposing poly-rhythms in the proportions of harmonic spectra, emphasizing timbre over pitch.
     Gesang der Jünglinge by Stockhausen (1955/56).
     Poème électronique by Varèse (1958)

In one 'spectral' technique, composers analyze a portion of a signal they believe contains crucial information about a given tone's sonic meaning, and use the analysis results as the basis for newly-composed pieces. Note that, spectral analysis, a pre-compositional process, does not necessarily produce audible results that reflect this process.

Listening to spectral music has been likened to participating in a music perception experiment, aimed at discovering whether and how a given acoustical principle (e.g. the relations among spectral components) will function audibly under new conditions as, for example, in musical pieces created based on such relations.
     Come Out by S. Reich (1966)
     Different Trains by S. Reich (1988): America - Before the war

Grisey's Partiels (1975) has been described as an exploration of the sound of the trombone, where the results of a computer spectral analysis of a trombone tone are orchestrated for different instruments. Perceptually, there is nothing from the trombone preserved in this composition. In addition, the spectra from a variety of other instrumental tones could have served the same purpose. Setting aside the fact that listeners seem intuitively interested in the idea that they are hearing "the sound of a trombone", the original trombone is employed simply as a source of inspiration for compositional material, and the guiding metaphor of the piece seems to be the aesthetic/sensual appreciation of sound quality.

Additional compositions by Grisey.
_Vortex temporum (1996)
  A few piano keys are re-tuned to produce the frequencies common to all the spectra employed in the piece.
      1. I
      2. Interludes I & II
      3. III & Interlude 3
_Taléa (1986)
 
A freer piece, exemplifying Grisey's shift away from intellectualized concerns of form toward perceptual concerns of timbre.

Murail's L'esprit des Dunes re-presents sampled sounds as more or less transformed images of themselves, manipulating degree of resemblance to their original forms. When processing his samples, Murail preserved the time variant aspects of the spectra in an effort to keep "some of the internal life of the sound" (Smith, 2000).

Several popular avant-garde composers have experimented with timbre-centered music, producing interesting results:
Distant Hill by Brian Eno

 

 


 

Key to the "signals with no attack" listening example:  The three instruments are piano, clarinet, and French horn, in this order.
Key to the "clicks" melody: "Mary had a little lamb"

   


 

  

Loyola Marymount University - School of Film & Television