Frequently Used Terms, Abbreviations and Notation

Terms and abbreviations

Anechoic chamber: Type of room with almost totally sound-absorbing walls, frequently used for experimentation under free-field-like conditions.

Auditory object: Perception corresponding to a single sound source. Attributes of auditory objects are location and extent.

Auditory spatial image: Illusion of perception of a space with auditory objects of specific extent and at specific locations.

BCC: Binaural cue coding. Coding of stereo or multi-channel signals using a down-mix and binaural cues (or spatial parameters) to describe inter-channel relationships.

Binaural cues: Inter-channel cues between the left and right ear entrance signals (see also ITD, ILD, and IC).

BMLD: Binaural masking level difference. The difference in masked threshold due to different spatial cues of masker and target signal.

BRIR: Binaural room impulse response, modeling transduction of sound from a source to left and right ear entrances in enclosed spaces. Unlike HRTFs, BRIRs also include reflections.

CPC: Channel prediction coefficient.

DFT: Discrete Fourier transform.

Direct sound: Sound reaching a listener's ears or microphones through a direct (nonreflected) path.

Enclosed space: A space where sound is reflected from the walls enclosing it.

ERB: Equivalent rectangular bandwidth. Bandwidth of the auditory filters estimated from masking experiments.

Externalization: When typical stereo music signals are played back over headphones, the extent of the auditory spatial image is limited to about the size of the head. When playing back headphone signals which realistically mimic ear entrance signals during natural listening, the extent of the auditory spatial image can be very realistic and as large in extent as any auditory spatial image in natural listening. The experience of perceiving an auditory spatial image significantly larger than the head during headphone playback is denoted externalization.

FFT: Fast Fourier transform. A fast implementation of the DFT.

Free-field: An open space with no physical objects from which sound is reflected.

Free-field cues: Binaural cues (ITD and ILD) which occur in a one-source free-field listening scenario.

HRTF: Head-related transfer function, modeling transduction of sound from a source to left and right ear entrances in free-field.

IC: Interaural coherence, i.e. degree of similarity between left and right ear entrance signals. This is sometimes also referred to as IAC or interaural cross-correlation (IACC).

ICC: Inter-channel coherence. Same as IC, but defined more generally between any signal pair (e.g. loudspeaker signal pair, ear entrance signal pair, etc.).

ICPD: Inter-channel phase difference. Average phase difference between a signal pair.

ICLD: Inter-channel level difference. Same as ILD, but defined more generally between any signal pair (e.g. loudspeaker signal pair, ear entrance signal pair, etc.).

ICTD: Inter-channel time difference. Same as ITD, but defined more generally between any signal pair (e.g. loudspeaker signal pair, ear entrance signal pair, etc.).

ILD: Interaural level difference, i.e. level difference between left and right ear entrance signals. This is sometimes also referred to as interaural intensity difference (IID).

IPD: Interaural phase difference, i.e. phase difference between the left and right ear entrance signals.

ITD: Interaural time difference, i.e. time difference between left and right ear entrance signals. This is sometimes also referred to as interaural time delay.

kb/s: Unit of bitrate, kilobits per second. Also written kbps.

Lateral: From the side, e.g. lateral reflections are reflections arriving at a listener's ears from the sides.

Lateralization: For headphone playback, a subject's task is usually restricted to identifying the lateral displacement of the projection of the auditory object onto the straight line connecting the ear entrances. The relationship between the lateral displacement of the auditory object and attributes of the ear entrance signals is denoted lateralization.

LFE channel: Low-frequency effects channel. Multi-channel surround systems often feature one or more LFE channels for low-frequency sound effects requiring higher sound pressure than can be reproduced by the loudspeakers for the regular audio channels. In movie soundtracks, an LFE channel may for example contain low-frequency parts of explosion sounds.

Listener envelopment: A listener's auditory sense of ‘envelopment’ or ‘spaciousness of the environment’.

Localization: The relation between the location of an auditory object and one or more attributes of a sound event. For example, localization may describe the relation between the direction of a sound source and the direction of the corresponding auditory object.

Mixing: Given a number of source signals (e.g. separately recorded instruments, multitrack recording), the process of generating stereo or multi-channel audio signals intended for spatial audio playback is denoted mixing.

OCPD: Overall channel phase difference. A common phase modification of two or more audio channels.

PDF: Probability density function.

Precedence effect: A number of phenomena related to the auditory system's ability to resolve the direction of a source in the presence of one or more reflections by favoring the ‘first wave front’ over successively arriving reflections.

QMF: Quadrature mirror filter; specific filter as used in audio coders.

Reflection: Sound that arrives at a listener's ears or microphones indirectly after being reflected one or more times from the surface of physical objects.

Reverberation: The persistence of sound in an enclosed space as a result of multiple reflections.

Reverberation time: The time it takes for sound energy to decay by 60 dB. The more reverberant a room is, the longer its reverberation time. Reverberation time is often denoted RT or RT60.

SAOC: Spatial audio object coding.

Sound source: A physical object emitting sound.

Spatial audio: Audio signals which, when played back through an appropriate playback system, evoke an auditory spatial image.

Spatial impression: The impression a listener spontaneously gets about type, size, and other properties of an actual or simulated space.

Spatial cues: Cues relevant for spatial perception. This term is used for cues between pairs of channels of a stereo or multi-channel audio signal (see also ICTD, ICLD, and ICC). Also denoted as spatial parameters or binaural cues.

STFT: Short-time (discrete) Fourier transform.

Sweet spot: Optimal listening position for a stereo or multi-channel loudspeaker-based audio playback system.

Transparent: An audio signal is transparent when a listener cannot distinguish between this signal and a reference signal. For example, transparent audio coding denotes audio coding in which there is no perceptible degradation in the coded audio signals.
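
The inter-channel cues defined in the list above (ICLD, ICTD and ICC) can be illustrated with a small full-band estimator. The following is a minimal sketch, not the estimation procedure of any particular coder: the function name `inter_channel_cues`, the parameter `max_lag`, the sign conventions (level of the second channel relative to the first, positive lag when the second channel is delayed) and the use of the peak of the normalized cross-correlation are all assumptions made for illustration. A practical system would typically estimate these cues per sub-band and per time frame.

```python
import math

def inter_channel_cues(x1, x2, max_lag):
    """Return (icld_db, ictd_samples, icc) for two equal-length sample lists.

    Hypothetical helper for illustration; conventions are assumptions.
    """
    if len(x1) != len(x2):
        raise ValueError("signals must have the same length")
    n = len(x1)
    p1 = sum(v * v for v in x1)  # energy of channel 1
    p2 = sum(v * v for v in x2)  # energy of channel 2
    if p1 == 0.0 or p2 == 0.0:
        raise ValueError("cues are undefined for a silent channel")

    # ICLD: level of channel 2 relative to channel 1, in dB.
    icld_db = 10.0 * math.log10(p2 / p1)

    # ICTD and ICC: location and height of the peak of the
    # normalized cross-correlation within +/- max_lag samples.
    best_lag, best_corr = 0, -1.0
    for d in range(-max_lag, max_lag + 1):
        # Correlate x1(i) with x2(i + d) over the overlapping samples.
        acc = sum(x1[i] * x2[i + d] for i in range(max(0, -d), min(n, n - d)))
        corr = abs(acc) / math.sqrt(p1 * p2)
        if corr > best_corr:
            best_lag, best_corr = d, corr
    return icld_db, best_lag, best_corr
```

For example, if the second channel is a copy of the first, attenuated by a factor of 0.5 and delayed by three samples, this sketch returns an ICLD of about -6 dB, an ICTD of 3 samples and an ICC close to 1.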

Notation and variables

* Convolution operator

b Partition (parameter band) index

bm Parameter band b corresponding to sub-band signal m

fs Sampling frequency

T Time constant of a one-sided exponential estimation window

k Time index of sub-band signals (also time index of STFT spectra)

C Number of encoder input channels

D Number of decoder output channels (if different from number of encoder input channels C)

E Number of transmitted channels (if different from number of encoder input channels C)

xc(n) Encoder input audio channels

s(n) Transmitted sum signal or down-mix signal

si(n) Down-mix signal i

yc(n) Transmitted audio channels

di(n) Residual signal

ei(n) Externally provided down-mix signal

x̂i(n) Decoder output audio channels

x̃i(k) One sub-band signal of xi(n) (similarly defined for other signals)

xi,m(k) mth sub-band signal of xi(n)

x̂i,m(k) Estimated mth sub-band signal of xi,m(k)

D(si,m)(k) Decorrelated version of mth sub-band signal of si,m(k)

pxi(k) Short-time estimate of power of xi(k) (similarly defined for other signals)

pxi,b Frame-based estimate of power of xi(k) in sub-band b

hl,i(t) Left-ear HRIR for sound source i

Hl,i(f) Left-ear HRTF for sound source i

Hr,i(f) Right-ear HRTF for sound source i

τi1i2(k) ICTD between channels i1 and i2

ΔLi1i2(k) ICLD between channels i1 and i2

ci1i2(k) Coherence between channels i1 and i2

ρi1i2(k) Correlation between channels i1 and i2

ΔLi1i2,b(k) ICLD between channels i1 and i2 in parameter band b

ϕi1i2,b(k) ICPD between channels i1 and i2 in parameter band b

ci1i2,b(k) ICC between channels i1 and i2 in parameter band b

θi1i2,b(k) OCPD between channels i1 and i2 in parameter band b

τ(k) ITD in a specific critical band

τb(k) ITD in parameter band b

ΔL(k) ILD in a specific critical band

ΔLb(k) ILD in parameter band b

c(k) IC in a specific critical band

cb(k) IC in parameter band b (similarly defined for other signals)

γi,b ith CPC in parameter band b

Pi Parameter set P of element i
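
To make the roles of fs, T and the short-time power estimate pxi(k) in the list above concrete, the following minimal sketch computes a running power estimate with a one-sided exponential window. The recursion and the relation between the forgetting factor and the time constant (alpha = 1 / (T fs)) are a common formulation and are assumed here, as are the function name `short_time_power` and its parameter names; the list above only names the symbols.

```python
def short_time_power(x, fs, T):
    """Running power estimate of x using a one-sided exponential window.

    fs: sampling frequency of x in Hz.
    T:  time constant of the window in seconds.
    Hypothetical helper; alpha = 1 / (T * fs) is an assumed relation.
    """
    if T * fs < 1.0:
        raise ValueError("T * fs must be at least 1 so that alpha <= 1")
    alpha = 1.0 / (T * fs)  # weight given to the newest squared sample
    p = 0.0                 # estimate starts from silence
    estimates = []
    for v in x:
        # Leaky integration: old estimate decays, new energy is mixed in.
        p = alpha * v * v + (1.0 - alpha) * p
        estimates.append(p)
    return estimates
```

With a constant input of amplitude 2, fs = 1000 Hz and T = 0.01 s (so alpha = 0.1), the first estimate is 0.4 and the sequence converges towards the true power of 4. A larger T gives a smoother estimate that reacts more slowly to changes in the signal.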

