RECA-293 - Module 03A

Introduction

The initial stage of any musical activity (or even ideation) involves

a) sense perception of basic acoustic features: intensity, frequency, spectral distribution, duration, and sound wave direction,
and
b) auditory processing, in the auditory cortex, of the corresponding sonic features: loudness, pitch, timbre, perceived duration, & sound source localization.

Each sonic feature is linked to a primary acoustic feature. E.g.:
_ loudness to intensity;
_ pitch to frequency;
_ timbre to spectrum;
_ perceived duration to duration;
_ sound-source localization to sound-wave direction (i.e. to differences in intensity and arrival time between the two ears).
However, each sonic feature also depends, to some degree, on all other acoustic features.
[ For details on sound waves' acoustic features see the Acoustics A & Acoustics B modules (RECA220) & Foundations of Acoustics (Center for Music & Science) ]

L O U D N E S S

Sonic (i.e. perceptual) attribute of sound waves that distinguishes them on a quiet-loud scale, related mainly to intensity.

Loudness & Intensity

All else (i.e. frequency*, spectrum**, duration, & sound wave direction) being equal:

Increasing/decreasing intensity I (in W/m²) or SIL (in dB) increases/decreases loudness.

(Intensity is proportional to the square of displacement amplitude)

Loudness and intensity are related logarithmically: as intensity rises, an increasingly larger amount of intensity is needed to produce the same increase in loudness. To capture this, we use a logarithmic scale for Sound Intensity Level (SIL), constructed relative to the lowest audible intensity I₀ and measured in decibels (dB): SIL = 10·log₁₀(I/I₀).

Lowest audible intensity: I₀ = 10⁻¹² W/m²; SIL₀ = 0dB. Upper end of the useful range (approximately the threshold of pain): I = 1 W/m²; SIL = 120dB.
Average dynamic range of human hearing: 120dB (@ 1000Hz; it changes with frequency).

Key Facts:
_ Doubling the intensity, in W/m², corresponds to an SIL increase of 3dB.
_ Doubling the loudness corresponds to
      a) an intensity increase by a factor of 10 and to
      b) an SIL increase of 10dB.
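The key facts above follow from the SIL formula, SIL = 10·log₁₀(I/I₀). The short Python sketch below (the function name is chosen for illustration) reproduces them numerically. Note that the "10dB per doubling of loudness" figure is a perceptual rule of thumb; the formula only converts the tenfold intensity change into decibels.

```python
import math

I0 = 1e-12  # lowest audible intensity in W/m^2 (reference for 0 dB SIL)

def sil_db(intensity_w_m2):
    """Sound Intensity Level in dB, relative to I0."""
    return 10 * math.log10(intensity_w_m2 / I0)

print(round(sil_db(1e-12), 2))                # threshold of hearing: 0.0
print(round(sil_db(1.0), 2))                  # upper end of the range: 120.0
print(round(sil_db(2e-6) - sil_db(1e-6), 2))  # doubled intensity: 3.01
print(round(sil_db(1e-5) - sil_db(1e-6), 2))  # tenfold intensity: 10.0
```

Because the scale is a ratio scale, the same +3dB and +10dB steps hold at any starting intensity.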

Just Noticeable Difference (JND) for SIL
(i.e. minimum SIL change that can be perceived as a loudness change):
    ~1dB at moderate levels and middle frequencies.
    The SIL JND increases at frequencies below/above middle frequencies.
    For starting levels > 95-100dB the SIL JND increases at all frequencies.

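Note: Sonic energy is often expressed in terms of Pressure & Sound Pressure Level (SPL), rather than of Intensity & SIL.
SPL and SIL scales are equivalent (for a plane wave in air, SPL ≈ SIL). However, because intensity is proportional to the square of pressure, the SPL formula uses a factor of 20 rather than 10: SPL = 20·log₁₀(p/p₀), with p₀ = 20µPa. Intensity, which needs no squaring, is therefore the simpler quantity to treat logarithmically.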
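A minimal Python sketch of the SPL/SIL relationship. The characteristic impedance of air (ρc ≈ 413 Pa·s/m near 20°C) is an assumption not stated in this module. It converts RMS pressure into plane-wave intensity (I = p²/ρc), which is why the two scales nearly coincide.

```python
import math

P0 = 20e-6     # reference RMS pressure for 0 dB SPL, in Pa
I0 = 1e-12     # reference intensity for 0 dB SIL, in W/m^2
RHO_C = 413.0  # approx. characteristic impedance of air near 20 C (assumed), Pa*s/m

def spl_db(p_rms):
    # I ~ p^2, so 10*log10(p^2 / p0^2) simplifies to 20*log10(p / p0)
    return 20 * math.log10(p_rms / P0)

def sil_db(intensity):
    return 10 * math.log10(intensity / I0)

p = 0.2                      # example RMS pressure in Pa (illustrative)
intensity = p ** 2 / RHO_C   # plane-wave intensity in W/m^2
print(round(spl_db(p), 1), round(sil_db(intensity), 1))  # 80.0 79.9
print(round(spl_db(2 * p) - spl_db(p), 2))               # doubling pressure: 6.02
```

Doubling the pressure adds about 6dB SPL because it quadruples the intensity (2 × 3dB).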

Loudness & Frequency

All else (i.e. intensity, spectrum, duration, & sound wave direction) being equal:

Middle frequencies (1-6kHz) have lower threshold and larger dynamic range than higher frequencies, which have lower threshold and larger dynamic range than lower frequencies.
Alternatively: Sensitivity and dynamic range are highest/largest at middle frequencies, lower/smaller at high frequencies, and lowest/smallest at low frequencies.

At low SILs, loudness depends strongly on frequency (i.e. at low SILs, signals of the same level will vary significantly in loudness, depending on frequency). The same SIL sounds loudest at middle frequencies, quieter at high frequencies, and quietest at low frequencies.
As SIL increases this effect decreases and the ear's frequency response becomes increasingly flat (i.e. same SILs sound almost equally loud regardless of frequency).

The Equal Loudness Curves/Contours (graph to the left) are lines that describe the above relationships, i.e. the dependence of loudness on frequency and how this dependence changes at different SILs.
All frequency-SIL pairs on a given blue contour-line sound equally loud, as loud as the 1000Hz-SIL pair on the same contour line.

Click to the left for a 20-20,000Hz sweep tone with steady SIL. Notice how loudness changes with frequency, even though SIL remains fixed.
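A sweep like this one can be generated as a constant-amplitude exponential (logarithmic) sine sweep, which spends equal time in each octave. The Python sketch below is one possible way to do it. The duration, sample rate, and amplitude are arbitrary choices. The amplitude is constant in the signal itself, but the level reaching the ear also depends on the playback system.

```python
import numpy as np

def log_sweep(f_start=20.0, f_end=20_000.0, dur=10.0, fs=48_000, amp=0.5):
    """Constant-amplitude exponential sine sweep from f_start to f_end (Hz).

    Instantaneous frequency f(t) = f_start * (f_end / f_start) ** (t / dur);
    the phase below is the time integral of 2*pi*f(t).
    """
    t = np.arange(int(dur * fs)) / fs
    k = np.log(f_end / f_start)
    phase = 2 * np.pi * f_start * dur / k * (np.exp(k * t / dur) - 1)
    return amp * np.sin(phase)

sweep = log_sweep()
print(len(sweep))  # 10 s at 48 kHz
```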

Loudness Level Unit: Phon.
At 1000Hz, Sound Loudness Level (SLL, in Phons) and Sound Intensity Level (SIL, in dB) are identical. Different frequencies that sound equally loud are located on or near the same contour line on the equal loudness curves graph and have the same loudness level (in Phons) but, most likely, different SILs (in dB).

*Frequency: Number of full vibrations (cycles/oscillations) per second, measured in Hertz (Hz). The primary physical correlate to the sensation of pitch.

**Spectrum: The frequency spectrum of a complex sound is a "recipe," indicating the type (frequency) and amount (intensity) of each ingredient (sinusoidal frequency component) required to make the corresponding complex sound (see the three graphs, below). Correspondingly, spectral distribution describes how intensity level (i.e. sonic energy) of a given sound is distributed across the frequency range of hearing.

For sounds corresponding to periodic signals, also called harmonic signals (such as the signals corresponding to most musical sounds), the lowest frequency component is the 1st harmonic component or "the fundamental" (sometimes designated as ƒ0). All other harmonic components (also called harmonics) have frequencies that are integer multiples of the frequency of the fundamental. That is, if the fundamental frequency is 1ƒ0, then the components above the fundamental have frequencies 2ƒ0, 3ƒ0, 4ƒ0 (referred to as 2nd, 3rd, 4th harmonic) and so forth.

Periodic (harmonic) complex signals have a rather definite pitch that matches the frequency of the fundamental component. This appears to be the case even if the fundamental component is not present in the signal's spectrum (phenomenon of the 'missing fundamental').
All other types of signals/spectra are non-periodic & are called inharmonic.

We will return to frequency and spectrum later in the course.

Spectrum (right) of a 2-component complex signal (middle) resulting from the linear superposition of two sinusoidal signals (left).
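The superposition and its spectrum can be reproduced with NumPy's FFT. In this sketch, the two component frequencies (200Hz and 300Hz), their amplitudes, and the sample rate are illustrative assumptions, not the values used in the figure.

```python
import numpy as np

fs = 8_000                    # sample rate in Hz (assumed)
t = np.arange(fs) / fs        # 1 second of time points
# Linear superposition of two sinusoids (illustrative frequencies/amplitudes)
x = 1.0 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

# One-sided amplitude spectrum: scaling |FFT| by 2/N recovers each sine's amplitude
spectrum = 2 * np.abs(np.fft.rfft(x)) / len(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

components = spectrum > 0.1   # keep only the clearly non-zero bins
print(freqs[components], np.round(spectrum[components], 3))
# one peak at 200 Hz (amplitude 1.0) and one at 300 Hz (amplitude 0.5)
```

The "recipe" is recovered exactly here because both components complete a whole number of cycles in the 1-second window.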

Loudness & Duration

All else (i.e. overall level, center frequency, spectral bandwidth, & sound wave direction) being equal:

For short durations up to ~200ms (for narrow spectra / narrow-band signals) and up to ~400ms (for wider spectra / broadband signals), the longer the signal the louder it appears.

For moderate durations and intensity levels, duration has no effect on loudness.

For longer durations, the loudness of intense sounds may decrease due to the auditory system's
_adaptation (i.e. adjustment to higher level baseline);
_fatigue (i.e. temporary reduction in sensitivity); or
_damage (i.e. permanent loss of sensitivity).

The more intense the sound (e.g. >90dB), the shorter the exposure time (e.g. <2hrs) sufficient to cause permanent hearing damage.
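The trade-off between level and exposure time can be made concrete with one widely cited occupational guideline: the NIOSH Recommended Exposure Limit (85dBA for 8 hours, with the allowed time halved for every 3dB increase). It is used here only for illustration. It is stated in A-weighted levels (dBA) rather than SIL, and it describes population risk, not any individual's hearing.

```python
def niosh_max_hours(level_dba, criterion_dba=85.0, exchange_rate_db=3.0):
    """Recommended daily exposure limit (hours) under the NIOSH criterion:
    8 h at the criterion level, halved for every exchange_rate_db above it."""
    return 8.0 / 2 ** ((level_dba - criterion_dba) / exchange_rate_db)

for level in (85, 88, 91, 94, 100):
    print(f"{level} dBA: {niosh_max_hours(level):g} h")
```

At 91dBA, this criterion allows 2 hours, which is in line with the ">90dB, <2hrs" example above.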

Loudness & Spectrum

For high-SPL complex sounds* (no effect at low SPLs), and all else being equal (i.e. overall intensity, center frequency, duration, & sound wave direction):

The wider the spectrum (i.e. the larger the spectral bandwidth / the larger the frequency range occupied by the spectrum), the more critical bandwidths* in the ear it is likely to span, and the louder the sound (see to the left).
Conversely, the narrower the spectrum, the more likely multiple frequency components are to fall within the same critical bandwidth(s), and the quieter the sound.
In addition, the more components fall within the same critical bandwidth, the more likely a loss of sonic clarity due to:
     a) interference artifacts (i.e. beating / roughness sensations) and
     b) masking (i.e. covering of low-intensity components by high-intensity ones).
     [ details on interference and masking ]
So: the larger the bandwidth occupied by a musical arrangement, the louder and the clearer the sound.

*Complex Sound: Complex sounds are represented by complex (i.e. non-sinusoidal) signals and contain multiple frequencies, each at its own intensity level, represented in the complex sound's spectrum. With very rare exceptions, all sounds we encounter (musical or otherwise) are complex sounds.

*Critical Bandwidth: Critical bandwidth refers to the range of frequencies around a given frequency component, within which the auditory system is unable to resolve and process other frequency components. Its size varies depending on several factors, including center frequency and intensity level.
The concept is essential to the understanding of loudness and pitch perception and of the interaction among different simultaneous frequencies and phenomena such as masking and interference. More later in the course.
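The dependence of critical bandwidth on center frequency is often approximated with Glasberg & Moore's (1990) Equivalent Rectangular Bandwidth (ERB) formula, ERB = 24.7·(4.37·f/1000 + 1) Hz. ERB values are related to, but somewhat smaller than, classic critical-bandwidth estimates, so this is a substituted approximation rather than this course's own definition. The sketch also counts how many ERB-wide bands a spectrum spans, which is the quantity the loudness-and-spectrum discussion above relies on.

```python
import math

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) centered on f_hz."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)

def erb_number(f_hz):
    """Position of f_hz on the ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000 + 1)

def bands_spanned(f_low, f_high):
    """Approximate number of ERB-wide bands covered by a spectrum."""
    return erb_number(f_high) - erb_number(f_low)

for f in (100, 1000, 4000):
    print(f"ERB at {f} Hz: {erb_hz(f):.0f} Hz")
print(f"900-1100 Hz spans {bands_spanned(900, 1100):.1f} ERBs")
print(f"200-5000 Hz spans {bands_spanned(200, 5000):.1f} ERBs")
```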

Further Reading

Loudness Module - LMU: RECA220
The Loudness War (Wikipedia)
Mixing Sound for Film (The BeachHouse Studios)

Auditory Processing - Pitch

P I T C H

Sonic (i.e. perceptual) attribute of sound waves that distinguishes them primarily on a low-high scale, related mainly to frequency.

Experiment on Pitch JND Perception (J.P. Damborenea, 11/13/2013)

[ dependence of pitch on sound intensity level ]

Pitch & Frequency

All else (i.e. intensity, spectrum, duration, & sound wave direction) being equal:

Increasing/decreasing frequency f (in cycles/sec or Hz) increases/decreases pitch.

As with loudness and intensity, pitch and frequency relate logarithmically: addition in perception (pitch) corresponds to multiplication in the physical variable (frequency). As frequency rises, the same pitch interval corresponds to increasingly larger frequency differences (see the figure to the left; from Campbell & Greated, 2001).

For example, pitch increase by an octave interval corresponds to frequency doubling. So, raising 220Hz by an octave interval means adding 220Hz, while raising 440Hz by an octave means adding 440Hz.
[ revisit Salant, 2024, as needed, for a Music Theory outline ]
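This logarithmic relation is usually expressed in cents, the unit this module uses for the pitch JND. An interval from f₁ to f₂ spans 1200·log₂(f₂/f₁) cents. A small Python sketch (the function name is illustrative):

```python
import math

def interval_cents(f1, f2):
    """Size of the pitch interval from f1 to f2 in cents (1200 cents = 1 octave)."""
    return 1200 * math.log2(f2 / f1)

# Octaves: the ratio is always 2:1, so the difference in Hz grows with register
for f in (220, 440, 880):
    upper = 2 * f
    print(f"{f} -> {upper} Hz: +{upper - f} Hz, {interval_cents(f, upper):.0f} cents")

# An equal-tempered semitone is a frequency ratio of 2 ** (1 / 12), i.e. 100 cents
print(round(440 * 2 ** (1 / 12), 2))  # A4 -> A#4 in Hz: 466.16
```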

Lowest frequency that gives rise to a pitch sensation: ~20Hz.
Highest frequency that our hearing mechanism can perceive: ~20kHz (20,000Hz); this upper limit typically declines with age.

Just Noticeable Difference (JND) for frequency (minimum frequency change that can be perceived as pitch change):
    ~0.3-1% of frequency, depending on register (i.e. on frequency region).
Expressed differently: at middle frequencies, where the JND is smallest (~0.3-0.5%), it corresponds to roughly 5-8 cents (1 cent = 1/100th of an equal-tempered semitone; 2 semitones in a whole tone; 12 semitones in an octave), i.e. up to about 1/12th of a semitone.
The frequency JND for complex tones may be larger or smaller, depending on spectral context.

Click left to listen to pairs of successive pure tones (i.e. tones whose spectra contain a single sinusoidal frequency component) ranging from 440-441Hz to 440-448Hz. What is the smallest frequency difference you perceive as a change in pitch?
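The listening pairs can be converted into percentages and cents for comparison with the JND figures given above. With a JND of ~5-8 cents at middle frequencies, many listeners would be expected to start hearing a change around the 440-442Hz pair, though individual results vary. Python sketch:

```python
import math

def interval_cents(f1, f2):
    """Pitch interval from f1 to f2 in cents."""
    return 1200 * math.log2(f2 / f1)

# The pairs in the listening example: 440 Hz followed by 441 ... 448 Hz
for f2 in range(441, 449):
    pct = 100 * (f2 - 440) / 440
    print(f"440-{f2} Hz: {pct:.2f}% of frequency, {interval_cents(440, f2):.1f} cents")
```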

Pitch & Intensity

All else (i.e. frequency, spectrum, duration, & sound wave direction) being equal:

Changing sound intensity level SIL (in dB) will change the pitch differently, depending on frequency.

Increasing SIL will
    _ lower the pitch of a low frequency tone;
    _ have no noticeable impact on the pitch of a middle frequency tone;
    _ raise the pitch of a high frequency tone.

In addition, the introduction of a high-level tone may change the perceived pitch of an existing low-level tone, even if the frequency of the low-level tone remains unchanged.
    _For low-level tones with frequencies well above the frequency of the high-level tone, the perceived pitch will rise.
    _For low-level tones with frequencies well below the frequency of the high-level tone, the perceived pitch will drop.

In other words, introducing a high-level tone will push the pitch of existing low-level tones away from it, assuming frequency separations beyond one critical bandwidth. If high- and low-level tones are close enough in frequency to fall within the same critical bandwidth, the high-level tone will mask (i.e. render inaudible) the low-level tone.

Pitch & Duration

All else (i.e. overall level, (fundamental) frequency, sound wave direction, & spectral distribution) being equal:

A tone must last more than a minimum amount of time (~10-60ms, depending on frequency and intensity) in order to sound like more than a 'click' and convey a clear sense of pitch.

Previous experience and context can often override such psycho-physiological limits, allowing listeners, for example, to make pitch judgments for tones with durations below the suggested thresholds.

Listen to a melody performed using 7 tones shortened to clicks (2 signal cycles per note).
     _Stripped from context, the melody is unrecognizable to most listeners.
     _If listeners are told to what melody these 7 tones belong, they are able to identify it.
     _After listeners have been primed to listen to this tune, some may continue to hear it even if the 'notes'
       represented by each shortened tone follow a random pitch pattern.
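The arithmetic behind these 'clicks' is simple: a tone shortened to 2 cycles lasts 2/f seconds. The note frequencies below are illustrative, because the melody's actual notes are not given here. All of the resulting durations fall well under the ~10-60ms minimum mentioned above.

```python
def two_cycle_ms(freq_hz, cycles=2):
    """Duration in milliseconds of a tone truncated to a whole number of cycles."""
    return cycles / freq_hz * 1000

# Equal-tempered frequencies (A4 = 440 Hz) for a few illustrative notes
for name, f in {"C4": 261.63, "G4": 392.00, "C5": 523.25}.items():
    print(f"{name}: {two_cycle_ms(f):.1f} ms")
```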

Pitch & Spectrum (pitch of complex tones)

All else (i.e. overall level, duration, & sound wave direction) being equal:

The pitch of periodic complex tones with harmonic spectra (i.e. spectral components with frequencies that are integer multiples of the frequency of the lowest component, called 'fundamental') matches (in general) the frequency of the fundamental component. This is the case regardless of whether or not the fundamental component is perceivable (i.e. even when it is too low in level and/or masked) and even if it is missing from the tone's spectrum altogether.

Listen to a pair of complex tones illustrating the phenomenon of the missing fundamental. Both have a fundamental of 300Hz and up to 15 components (ramp spectrum: Aₙ = A₁/n for all components). The first tone has all 15 components, while the second is missing the first 5. As you can hear, removing these first 5 components does not change the tone's pitch, which continues to match 300Hz.
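The numbers show why the pitch survives. Removing harmonics 1-5 leaves components at 1800-4500Hz that are all still multiples of 300Hz, so the waveform still repeats every 1/300 s. The Python sketch below estimates that repetition rate with a simple autocorrelation search. This is a standard signal-processing technique used here for illustration; it is not a claim about how the auditory system computes pitch. The sample rate, duration, and search range are assumptions.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed)

def harmonic_tone(f0, harmonics, dur=1.0, fs=FS):
    """Sum of sine components at n * f0 with ramp amplitudes A_n = 1/n."""
    t = np.arange(int(dur * fs)) / fs
    return sum(np.sin(2 * np.pi * n * f0 * t) / n for n in harmonics)

def autocorr_rate(x, fmin=100.0, fmax=1000.0, fs=FS):
    """Repetition rate (Hz) from the strongest autocorrelation peak whose
    lag lies between 1/fmax and 1/fmin seconds."""
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    r = [np.dot(x[:-lag], x[lag:]) for lag in lags]
    return fs / lags[int(np.argmax(r))]

full = harmonic_tone(300, range(1, 16))     # components 1-15
missing = harmonic_tone(300, range(6, 16))  # first 5 components removed
print(autocorr_rate(full), autocorr_rate(missing))  # both 300.0
```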

Frequency components & region most significant to pitch:
For harmonic complex tones, the more low-frequency components are removed along with the fundamental, the more likely the pitch sensation is to be altered.
In addition, mistuning or removing a harmonic complex tone's components within the ~400Hz-1,500Hz region appears to impact the tone's pitch the most.
      Listen to this example. It includes 13 harmonic complex tones, with a fundamental of 600Hz, played in succession.
      The first tone has all harmonic components from the 1st to the 16th and each successive tone drops one component, starting from the fundamental,
      until only the highest 3 components remain. What happens to the pitch?

Perceiving a complex signal's individual components:
Harmonic complex tones are perceived as a unit rather than a set of multiple pure tones, one per spectral component. However, the individual harmonics can be heard if we draw attention to them by removing them and re-introducing them (after Houtsma et al., 1987; based on an earlier experiment by von Helmholtz, 1862).

Tuvan throat singers (from Tuva, in southern Siberia; related overtone-singing traditions exist in Mongolia and Tibet) exploit this phenomenon to create musical passages where one singer appears to produce two pitches simultaneously: one acting as a fixed drone and one performing a sort of melody. In these passages:
    a) the single fundamental frequency produced acts as a background drone and
    b) harmonic components, associated with this fundamental, are selectively removed and re-introduced by the performer via modifications to their vocal tract,
        and act as the melody line [ more information & video/audio examples, here & here ].

The pitch of complex tones with inharmonic spectra (i.e. spectral components whose frequencies are not integer multiples of some 'fundamental' component) is unclear, ambiguous, or non-existent. Inharmonic complex tones may elicit more than one competing pitch sensation, may resemble chords, or may sound like noise, depending on their spectral distribution.

As illustrated below, however, changing the spectral peaks of inharmonic complex sounds without changing the frequencies of their sinusoidal components can result in a distinguishable change in pitch, matching the frequency change of the spectral peaks. This highlights the perceptual salience of changing vs. static stimuli and provides additional evidence of the dependence of pitch on spectral distribution.

Listen to three individual major chords (notes C, E, & G played simultaneously), performed by combinations of flute, clarinet, and oboe:
(1) Flute-Clarinet-Oboe (2) Clarinet-Flute-Oboe (3) Oboe-Clarinet-Flute

Now, listen to this example of a chord melody, consisting of the five-chord sequence: (1)-(2)-(3)-(2)-(1).
All three chords are major, include exactly the same notes (C5, E5, & G5), in the same order, and have inharmonic spectra (i.e. their spectral components are not integer multiples of a single fundamental, even though the spectra of the individual notes in the chords are themselves harmonic).

The frequencies of the components are identical among chords but the spectral envelopes* (i.e. relative intensities of the components) are different and depend on which instrument plays what note (a flute, a clarinet, or an oboe).

*Spectral Envelope: a boundary curve that traces the peaks of the spectrum, capturing how energy is distributed across the frequency range.

The melody you hear (C-E-G-E-C) tracks the position of the flute in each successive chord, because the flute's fundamental frequency provides the spectral peak for each chord's spectrum (see the images to the left). In other words, the melody you hear matches the pitch changes corresponding to the changes in the flute's fundamental frequency. Watch the video, below.

A similar effect can be produced through spectral shaping of noise bands. Pitch perception will track changes in the spectral peak of the noise spectral envelope, imposed by the spectral shaping filter. In fact, appropriate shaping of the spectral envelope of white noise signals can help generate speech signals [ speech-shaped noise example ].

Minor deviations from harmonic spectra (up to ~1-2% of frequency) and the way these interact, when, for example, several instruments perform together in unison, change the timbre of the combined sound rather than its pitch. They produce what is referred to as 'chorus effect': richness of ensemble sound due to slow and varying beating rates among the slightly detuned components of the complex tones involved. Chorus, Phaser, and Flanger effects are all based on the same principles, differentiated by modifying the values of the same variables [ video exploring all three effects ].
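The beating rates behind the chorus effect can be estimated from the detuning. Two components at f and f·(1 + d) beat at f·d times per second. The Python sketch below assumes, for illustration, a 220Hz harmonic tone doubled by a copy detuned by 1%. Each harmonic beats at a different rate, and all of these rates stay below the ~15Hz beating limit given later in this module.

```python
def beat_rate_hz(freq_hz, detune_percent):
    """Beats per second between a component and a copy detuned by detune_percent."""
    return freq_hz * detune_percent / 100

f0 = 220.0  # fundamental of the illustrative tone (assumed)
for n in range(1, 5):
    f = n * f0
    print(f"harmonic {n} ({f:.0f} Hz): {beat_rate_hz(f, 1.0):.1f} beats/s")
```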

The Octave

The octave interval (corresponding to frequency doubling) is significant because tones separated by this interval sound remarkably similar and, when simultaneous, perfectly blend into a single pitch percept, even though the higher frequency is clearly higher in pitch. The perceptual "sameness" conveyed by octave intervals is cross-cultural and is referred to as pitch chroma. We return to this topic later in the course.

Further Reading

Pitch Module - Music & Science
Pitch Module - LMU: RECA220
Spectral/Place and Temporal theories of pitch perception [ historical review ].

T I M B R E

Multidimensional sonic (i.e. perceptual) attribute of sound waves that describes their character/quality, related mainly to spectral distribution and expressed through an extensive list of largely (and necessarily) vague adjectives.

Real-time MRI scans of four professional musical theatre performers singing vowels.
Credit: ProfEdwardsSU

Timbre & Spectrum

All else (i.e. overall intensity, fundamental frequency, duration, and sound wave direction) being equal:

Changes in spectral distribution correspond to a variety of timbral changes.

Example 1: A sustained tone played by a Bb soprano clarinet is followed by the same tone presented with a gradually increasing and then decreasing number of spectral components (from the lowest to the highest in frequency and back).
Example 2: A 220Hz sine tone of amplitude A followed by several more tones in which 7 additional harmonic components (i.e. 2f, 3f, ... 8f) are incrementally added and then removed, at amplitudes equal to A/n (n: harmonic number), returning to the 220Hz sine tone.

Rather than tracking every frequency component of a given spectral distribution, we capture the key timbral dimensions of musical sounds by exploring the following spectral energy distribution acoustic parameters
[ based on Grey, 1977; McAdams, 2013; Kendall et al., 1999; Lakatos, 2000 ]:

spectral centroid (center of amplitude-weighted frequency distribution);

spectral bandwidth (spread of frequency distribution = Highest Frequency - Lowest Frequency);

spectral density (number of frequency components per critical band); and

spectral inharmonicity (departure from integer multiple relationship relative to some fundamental frequency)

Spectral Centroid

A measure capturing the center of energy distribution in a complex signal's spectrum, within a given time window. It is manifested perceptually as a sound’s degree of nasality-brightness, the primary dimension of timbre [ e.g. Kendall et al., 1999 ].

In general:
_ Higher centroid values correspond to spectra with more high-frequency energy and to
   more 'nasal' & 'brighter' sounds.
_ Lower centroid values correspond to spectra with more low-frequency energy and to
   less 'nasal' & 'duller' sounds.

Qualitatively, spectral centroid can be likened to a spectrum's 'center of gravity' or 'teeter-totter fulcrum,' where amplitude values represent 'weights' and frequency values represent the 'position' of each weight on the teeter-totter (see images to the left).

_ If this 'gravity center' is calculated relative to the given spectral bandwidth (i.e. if the centroid
   value is calculated independent of the lowest frequency in the spectrum),
   then:
   two spectral distributions with the same spectral centroid have the same spectral envelope*
   slope and the same degree of 'nasality,' regardless of absolute frequency boundaries.

_ If this 'gravity center' is calculated based on absolute frequency values (i.e. if the centroid
   value is calculated taking into account the lowest frequency in the spectrum),
   then:
   two spectral distributions with the same spectral envelope slope (i.e. same nasality) but
   different fundamental frequencies (or, more generally, different low frequency boundaries)
   differ in 'brightness': the higher the fundamental (or lowest) frequency, the higher the
   brightness.
E.g. the two spectra below correspond to similar degrees of nasality (have the same spectral envelope slope), with the one to the right sounding brighter [ details in Marozeau et al., 2003; Marozeau & de Cheveigné, 2007 ].

[*spectral envelope: a boundary curve that traces the peaks of the spectrum, capturing how energy is distributed across the frequency range]

(same nasality - different brightness)
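Both ways of computing the centroid can be sketched in Python. Here, the 'relative' centroid is expressed in harmonic-number units (frequency divided by the fundamental). This is one possible way to make the centroid independent of the lowest frequency; the cited studies may define it differently. The 1/n amplitude slope and the 220Hz/440Hz fundamentals are assumptions.

```python
import numpy as np

def centroid(freqs, amps):
    """Amplitude-weighted mean frequency: the spectrum's 'center of gravity'."""
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    return float(np.sum(freqs * amps) / np.sum(amps))

n = np.arange(1, 8)   # 7 harmonic components
amps = 1 / n          # same envelope slope for both tones (assumed 1/n)

for f0 in (220, 440):
    absolute = centroid(n * f0, amps)  # in Hz: grows with f0 ('brightness')
    relative = centroid(n, amps)       # in harmonic numbers: same for both ('nasality')
    print(f"f0 = {f0} Hz: centroid {absolute:.1f} Hz, {relative:.2f} x f0")
```

Doubling f0 doubles the absolute centroid while leaving the relative one unchanged, which mirrors the 'same nasality, different brightness' distinction above.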

Descriptive adjectives range from boomy, muffled, dull, & warm (lower centroid values); to acute, vibrant, bright, & nasal (mid-high centroid values); to tinny, piercing, & screeching (higher centroid values).

Listen to a pair of harmonic complex tones with 7 components each, the same fundamental frequency (220Hz), but different centroid values. The first tone has most of its energy in the low components (lower centroid value), while the second has most of its energy in the high components (higher centroid value).
The didjeridu is an example of an instrument whose performance practice and aesthetic qualities rely heavily on spectral centroid manipulation (remember "Green Frog" from Module 01).

Spectral Bandwidth

A measure of the degree of overall energy spread along the frequency dimension.
It corresponds to the hearing mechanism's physiological response width and is manifested perceptually as a sound's degree of width/richness/fullness/thickness vs. narrowness/plainness/lightness/thinness.
Descriptive adjectives range from thin, narrow, & plain to rich, full, & thick. As previously noted, spectral bandwidth also impacts loudness, depending on the number of corresponding critical bands.

Spectral Density

A measure of the number of frequency components per critical band.
It corresponds to the degree and type of interference among spectral components within the ear (or of masking* for components with large intensity differences) and captures perceptions described by adjectives that range from hollow, clear, crisp, & smooth, to pulsating, buzzing, rough, muddy, & noisy.
[*reminder: masking refers to the perceptual erasure (covering) of a lower level tone (maskee) by a higher level tone or band of noise (masker)]

Spectral Inharmonicity

A measure of spectral deviations from harmonic (i.e. integer multiple) relationship among components.
It corresponds to instability in the response of the hearing mechanism to inharmonic spectra.
As noted under 'Pitch,' minor deviations from harmonic spectra (up to ~1-2% of frequency) and the way these interact when, for example, several instruments perform together in unison, produce what is referred to as 'chorus effect': the undulating quality, 'liveness,' and richness of ensemble sound due to slow and varying beating rates among the slightly detuned components of the complex tones involved.

Spectral Density & Inharmonicity - Sensory Dissonance

Spectral density and inharmonicity are captured by models estimating/quantifying the degree of sensory dissonance (i.e. degree of beating and/or roughness) in a sound, resulting from the interference of two or more frequency components within the ear.

Beating: loudness fluctuation perceived when two or more spectral components of a sound are separated by up to ~15Hz.
Roughness: a buzzing, harsh, raspy sound quality accompanying spectra with two or more frequency components separated by ~15-150Hz (upper limit depending on the frequency region in question and on the corresponding critical bandwidth).

_ Relationship among beating, roughness, spectral distribution, & critical bands.
_ Example: a fixed 1000Hz tone interfering with a 600-1300Hz sweeping tone.
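Models of this kind typically assign each pair of components a roughness value that rises from zero at unison, peaks at a separation of roughly a quarter of a critical bandwidth, and then falls off. The Python sketch below uses one widely cited parameterization of the Plomp-Levelt curve (Sethares' constants, weighted here by the smaller amplitude). Treat the constants and the weighting as assumptions, not as the specific model used in this course. The sketch mimics the fixed-1000Hz-versus-sweep example above.

```python
import numpy as np

# Plomp-Levelt-type curve with Sethares' constants (assumed for illustration)
B1, B2 = 3.5, 5.75     # rates of the two exponentials
D_STAR = 0.24          # sets where the maximum falls
S1, S2 = 0.021, 19.0   # make the curve's width grow with register

def pair_roughness(f1, f2, a1=1.0, a2=1.0):
    """Roughness estimate for two sinusoids, weighted by the smaller amplitude."""
    f_low = np.minimum(f1, f2)
    s = D_STAR / (S1 * f_low + S2)
    x = np.abs(np.asarray(f2, dtype=float) - f1)
    return min(a1, a2) * (np.exp(-B1 * s * x) - np.exp(-B2 * s * x))

# Fixed 1000 Hz tone against a tone swept from 600 to 1300 Hz in 1 Hz steps
sweep = np.arange(600, 1301)
r = pair_roughness(1000.0, sweep)
peak = sweep[np.argmax(r)]
print(f"roughness peaks at {peak} Hz, {abs(peak - 1000)} Hz from the fixed tone")
print(f"at unison: {pair_roughness(1000.0, 1000.0):.3f}")
```

Close to unison, the curve stays near zero: in that region, the interaction is heard as slow beating rather than roughness.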

The smaller the level difference between the interfering tones, the stronger the corresponding beating/roughness sensation.

The larger the level difference between the interfering tones, the more likely the more intense tone is to mask the less intense tone.

The beating and roughness sensations are directly related to the degree of sensory consonance / dissonance and are only relevant to harmonic (i.e. simultaneous, as opposed to melodic) intervals. They depend on clear-cut physical/physiological considerations, applicable to all cultures.
However, how "pleasant" or musically consonant a given degree of roughness is judged to be is culturally defined, with no universally "correct" judgment.

The general, musical concepts of consonance and dissonance depend on many variables, additional to sensory consonance/dissonance. What we consider musically consonant (e.g. acceptable, pleasing, fitting, correct) or dissonant (e.g. unacceptable, disturbing, unfitting, wrong) depends on melodic, harmonic, rhythmic, and dynamic context, and may change:
a) with time (historical context),
b) with tradition (cultural context), or even
c) within a single tradition, style, or piece of music (musical context).
[ more on musical consonance - video on the physics and music theory of musical consonance ]

Formants

The resonant characteristics (i.e. the tendency to respond better to, and therefore amplify, some frequencies over others) of an instrument, voice included, enhance certain spectral regions of the produced sounds.
[ more on resonance ]
When these enhanced spectral regions remain the same, regardless of fundamental frequency (i.e. regardless of the note produced, or of pitch register), they are called formants and contribute to sound source recognition and identification.

Formants appear to be responsible for the recognizable differences between various vowel sounds [ see here ] and have been used successfully in speech recognition and synthesis applications. The vowels in this resource have been created by spectrally shaping generic sawtooth spectra using spectral envelopes corresponding to each vowel's first 4 formants
[ more details here ].

Watch this spectral breakdown of the vowel 'A' (male voice; 100Hz fundamental frequency).
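The principle behind these synthesized vowels, a sawtooth-like harmonic spectrum shaped by a formant envelope, can be sketched in Python. The formant frequencies are approximate average values for an adult male /ɑ/ (as in "father") from Peterson & Barney (1952), and only the first three are used. The simple bell-shaped resonance curve with a 60Hz half-width is a hypothetical stand-in for a real vocal-tract filter.

```python
import numpy as np

F0 = 100.0  # fundamental frequency in Hz, as in the vowel 'A' example
# (center frequency, half-width) in Hz; approximate male /a/ formants (assumed)
FORMANTS = [(730.0, 60.0), (1090.0, 60.0), (2440.0, 60.0)]

def formant_envelope(freqs):
    """Hypothetical spectral envelope: one bell-shaped resonance per formant."""
    return sum(1 / (1 + ((freqs - fc) / hw) ** 2) for fc, hw in FORMANTS)

n = np.arange(1, 41)                        # harmonics 1-40 (up to 4 kHz)
freqs = n * F0
amps = (1 / n) * formant_envelope(freqs)    # sawtooth slope (1/n) x envelope

# Harmonics louder than both neighbours mark the formant regions
inner = amps[1:-1]
is_peak = (inner > amps[:-2]) & (inner > amps[2:])
print(freqs[1:-1][is_peak])                 # harmonics nearest each formant
```

Changing F0 moves the harmonics but not the envelope, so the same formant regions stay emphasized. This is what makes formants independent of the note produced.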

Many emotional states are accompanied by characteristic/typical facial expressions and, consequently, characteristic vocal fold tension and/or resonator shaping. This corresponds to characteristic spectral shaping that consistently modifies vowel formants.
The accompanying, consistent timbral features permit the pairing of emotional states to timbral "signatures." For example, we are able to tell when someone is smiling while they speak, even in the absence of visual and linguistic cues [ e.g. Torre, 2014 ].

Timbre & Time

The average spectrum of a complex signal describes its energy distribution across frequencies but does not capture if/how it changes with time, a change that also influences timbre.

Signal Time-Variance

Signal time-variance can be represented through a signal's envelope: a boundary curve that traces the signal's amplitude through time, capturing how the total energy in the signal changes with time. It connects the successive peaks (maxima) of the waveform, outlining the area the waveform occupies (see top-left: signal in blue, envelope in red).

Based on envelope shapes, we can classify signals in two broad categories:
_ Continuous Signals: most of the energy is in the "steady state" or "sustain" portion of the envelope
(e.g. signal of a bowed violin string).
_ Impulse Signals: most of the energy is in the "attack" portion of the signal; there's no "steady state"
(e.g. signal of a struck marimba bar).

Signal envelopes are segmented into three portions:

Attack: The portion of the envelope tracing the development of a sound signal towards its maximum amplitude/strength. It represents how energy builds up in a vibrating system and, in music, can be manipulated through instrument excitation methods (bowing, plucking, blowing, striking, etc.).

Truncating a signal to a duration shorter than its attack portion significantly affects timbre perception. The attack portion contains a signal's onset transients and contributes to the timbre of any signal, but significantly more to the timbre of impulse signals.
Listen to 3 orchestral instruments presented with the attack portion of their signal removed. Can you recognize the instruments? [piano, clarinet, French horn]

Onset (attack) transients are frequency components that are usually inharmonic, reach higher amplitudes than other components, die out rather fast, and correspond to a signal's degree of noisiness/naturalness.
The degree of level rise/fall synchrony of attack transients is characteristic of a given source, assisting in its timbral recognition and identification.

Steady State / Sustain: The portion of the envelope during which there is a continuous supply of energy in a vibrating system and which can be manipulated through performance techniques (e.g. vibrato, muting, damping, bowing pressure, driver excitation location).

Decay: The portion of the envelope that traces the drop in amplitude (or "decay") of a sound signal from its maximum value to zero, when energy stops being supplied to a vibrating system.
It represents how energy stored in a system dissipates and depends on sound source construction and performance space.

The significance of envelope to timbre can be demonstrated by playing a sound backwards; the signal's time evolution changes, while its average spectral distribution does not.
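The same demonstration can be run numerically. Reversing a real signal leaves its magnitude spectrum unchanged, because time reversal only alters the phases. The envelope, however, is turned around, so a sharp attack becomes a slow swell. In the Python sketch below, the decaying 262Hz tone is an illustrative stand-in for a piano note, not the recording used in the example.

```python
import numpy as np

fs = 8_000                     # sample rate in Hz (assumed)
t = np.arange(fs) / fs         # 1 second
# Impulse-like tone: instant attack, exponential decay (illustrative envelope)
x = np.exp(-5 * t) * np.sin(2 * np.pi * 262 * t)
x_rev = x[::-1]                # the same samples played backwards

mag = np.abs(np.fft.rfft(x))
mag_rev = np.abs(np.fft.rfft(x_rev))
print(np.allclose(mag, mag_rev))  # True: identical magnitude spectra
# True: the loudest sample moves from the beginning to the end
print(np.argmax(np.abs(x)) < np.argmax(np.abs(x_rev)))
```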

Click on the image to the left for a piano-note example.

Listen to piano-passage examples (by Houtsma et al., 1987):
Example 1 - Example 2 - Example 3.

3D Time-Variant Spectrum ('waterfall' plot)

2D Spectrogram

2D Amplitude Envelopes

Spectral Time-Variance (spectral flux)

Spectral time-variance is manifested as changes in the frequency and amplitude of a complex tone's components with time, and can be represented through time-variant spectra, spectrograms, or individual component amplitude/frequency envelopes.

3D Time-Variant Spectra ('waterfall' plots):
Usually with frequency on the x axis, time on the y axis, and level on the z axis. Watch the video, below, for the 3D time-variant spectra of the three piano examples in the previous section (in Houtsma et al., 1987).

----------------------------------------------------------------------------------------------------------------------------------------------------

2D Spectrograms:
Usually with time on the x axis, frequency on the y axis, and intensity in different shades.
_ See and hear 2D Spectrograms of sample sound sources.
_ Watch the video below for 2D spectrograms of various orchestral instrument tones [ from What Music Really Is ].

----------------------------------------------------------------------------------------------------------------------------------------------------

2D Amplitude Envelopes of Individual Spectral Components; usually with time on the x axis, level on the y axis, and spectral component number or frequency in different color/type lines (image to the left).

----------------------------------------------------------------------------------------------------------------------------------------------------

Time Separation Threshold

Two signals separated by a time delay that is shorter than a specific time separation threshold of ~2-50ms, depending on the spectra of the two signals, sound as one (per Hirsch, 1959; details and analysis in Divenyi, 2004), with increased loudness and a noticeable change in timbre: the original signal's attack portion becomes less sharply defined, resulting in a timbre with a rather 'blurry' onset.

Listen to these 3 complex signals with fundamental 600Hz:
(i) one 1-sec-long complex tone, (ii) two complex tones; onsets separated by 30ms, and (iii) two complex tones; onsets separated by 150ms.
_ The introduction of the second tone in (ii) is perceived as an increase in loudness and a change in the attack of the first tone.
_ The introduction of the second tone in (iii) results in the perception of two tones.

SUMMARY

_ Signal onset transients correspond to a signal's degree of noisiness/naturalness

_ Spectral density & inharmonicity correspond to a signal's degree of beating/roughness

_ Spectral inharmonicity corresponds to a signal's degree of fluidity/liveliness (chorus effect - for small departures from harmonicity)

_ Spectral centroid corresponds to a signal's degree of nasality and brightness

_ Spectral time-variance corresponds to a signal's degree of naturalness

Table summarizing the relationship between spectral variables and timbral adjectives (Vassilakis - unpublished)

Further Reading

Timbre Module - Music & Science (Center for Music and Science)
Timbre Module - LMU: RECA220
MP3s and the Degradation of Listening (Vassilakis - DePaul University Blog)