Chapter 9
Loudness, the Confluence of Domains and Deep Listening

This chapter explores three subject areas: the recording element of loudness, the confluence of all the track's contents and resulting timbre percepts, and a more thorough coverage of deep listening (a subject briefly introduced in Chapter 2). These are presented by dividing the chapter into four parts. Together these topics (1) conclude our exploration of recording elements, (2) situate recording elements within the confluence of all domains, (3) acknowledge timbre as a confluence, and (4) introduce the reader to new ways of listening to tracks.

In the first section, loudness is the last of the recording elements to be explored. It is discussed last because loudness determines if and when all other elements are audible. It is also, perhaps, the most misunderstood of the elements. Loudness is examined in its appearances at all levels of perspective, ranging from the individual loudness contours of partials inside the spectral envelope up to the overall loudness contour of the track (program dynamic contour). Loudness balance, loudness levels and contours of sources, and source performance intensity (a timbral quality) are each in turn presented in detail; each is a central concern of records. These percepts commonly bring challenges to the listener’s sense of prominence, and largely shape expression leading to listener interpretations.

The second section presents the ‘confluence’ that leads to the perception that all elements blend; all elements of the three domains, all contributions of performances within the track, along with the outside semiotic associations, affective states, and so forth entwine within listener interpretation. The mix stage of the recording process is used to illustrate confluence concepts. Discussion begins with the confluence of recording elements, shedding light on how they may combine to establish larger concepts of how recording shapes the song; this is often associated with the three percepts of lateral image and sound stage width, distance and depth, and timbral balance. Prominence is further explored in this context. The exploration of confluence leads to the third part, ‘timbre as confluence,’ and a presentation of timbre as a multi-domain and multi-faceted percept. Timbres of sound sources and the timbre of the track are investigated as a confluence of domains; this allows the analyst to engage the contexts, character and content of sound source timbres and the timbre of the track, and leads to engaging crystallized form. A detailed account of crystallized form concludes this section.

The last section of this chapter expands coverage of deep listening. Deep listening concepts have been introduced throughout this book, though not explicitly. Deep listening is the basis for numerous approaches offered for hearing many of the qualities of the track, especially recording elements. This chapter concludes by contrasting open listening and directed listening. The deep listening principle of allowing whatever comes along to hold equal potential is a significant aspect of our framework.

LOUDNESS AS A RECORDING ELEMENT

Loudness is the perceived magnitude of sound; it is the sensation of the amplitude of the waveform. Loudness is the percept resulting from the psychological impression of the physical intensity of pressure (sound pressure) (Handel 1993, 63). Loudness as a recording element utilizes our sense of this sensation of amplitude.

Loudness, though, can encompass more than sensation. Loudness “is a concept that has implicit meaning for nearly everyone” (Schlauch 2004, 317) yet its sense of magnitude is subjective to the individual and to context (both cultural and environmental). All of us might ‘know’ when a sound is soft and when it is loud, but we all carry our own sense of these concepts, and this sense of ‘loud and soft’ shifts. These ratings of loud and soft are largely meaningless out of context; they carry no association to any actual and measurable sound pressure level. Environmental context impacts ‘loud and soft,’ so what is perceived as loud in a small room might be moderately soft from the back of a lecture hall. Cultural and social contexts are even more complicated, adding norms, notions of acceptable levels, and more; a softly spoken word in the midst of a noisy crowd is far different from the same level of speech within a quiet theatre. Loudness percepts are subjective between individuals, are subject to social and cultural conventions, and are perceived differently within various contexts. In addition, the perceived level of loudness can vary under the influence of other elements, whether or not the actual amplitude of the signal differs.

Perceived loudness can be significantly transformed without an actual change in sound level. We learned, beginning with the earliest psychoacoustic studies, that functional dependencies exist between loudness and all other elements, and that there are underlying psychological and physiological factors that may impact loudness perception. Spectral content, duration, time relationships, bandwidth, frequency/pitch level/range, and even visual cues (experienced or imaged) are among percepts that can transform the experience of perceived loudness. Even surprise, shifting attention, pleasure, interest, and discomfort impact loudness perception; psychological influences on perceived loudness can be highly influential, and highly personal. Loudness is a subjective impression we can “assume to be influenced by different nonsensory factors and biases” (ibid., 318), and by selective attention and focus on perspective level, as much as the impression of loudness is influenced by other elements within the sound and by the sound qualities and aesthetic activities of other sound sources. Loudness perception has many layers of subjective influences, laid upon the subjective impression of the sensation of the magnitude of sound.

Loudness manipulation is an important function of recording. Loudness levels of sources and sounds can be profoundly shaped by the recording process. When many of us consider the recording process, our thoughts turn to changing loudness levels of sources and mixing them together in different proportions from what occurs live (or what might have occurred live, given appropriate circumstances). The subtle qualities of loudness, then, can provide qualities and relationships that distinguish tracks as well as the sources they contain. Loudness can create surreal relationships of sounds, where gentle whispers can be incorporated at very loud levels, and instruments performed with great exertion are altered so the timbre generated by such a performance appears at a subdued loudness level in the mix; the balanced mixture of sound sources that comprise the track’s aggregate texture need not reflect acoustic realism.

Loudness as a recording element is focused on the sensation of amplitude, separated (inasmuch as it is possible) from the context of the performance (performance intensity, timbre, etc.). This sensation is loudness as loudness alone; it distinguishes loudness as a recording element from dynamics as a music element, and from the dynamics coupled with loudness of performance. Here, the loudness percept and observation are separated from all of the influences of interpretation and impression; it is the perception of actual physical sensation. Loudness can be perceived as sensation, but typically we process loudness quite differently. In our daily experiences, loudness is rarely experienced solely as the sensation of the magnitude of the sound. It is counter to our natural listening tendencies to process (hear) loudness as sensation alone (without influences outside the sensation of magnitude). To separate listening to loudness from the above-mentioned factors requires intention, and a control of attention and focus; this listening process seeks to isolate sensation from other influences within the sound. In this way, listening for loudness alone pulls the sound or sound materials out of context and away from the subjective factors they contribute to the interpretation of the sound’s loudness. This attention to loudness is a critical listening process; it examines the experience as a sound object, void of causal factors and contextual implications. Attention to dynamics, as stated above, incorporates context and character of sounds and materials; dynamics is as much (and often more) about timbre, expression, energy, intensity, and nonsensory factors as it is about the sensation of loudness.

This separation makes it possible to approximate actual loudness levels against a reference, and to calculate loudness contours over time. The sensation of loudness is observed within the context of the track, a context established by its reference dynamic level. Loudness as an element of recording shares an equal role in shaping the track. Hearing the percept of loudness as a sensation of the magnitude of sound can unveil sonic characteristics and gestures inherent to the track that might otherwise go unobserved. The recording element of loudness/dynamics can add dimension to a recording analysis at all levels of perspective. Loudness establishes the presence of sounds and sound sources, but does not in itself establish prominence.

Prominence and Loudness

Attention itself can play a role in loudness perception. Attention is the act of bringing active awareness to the listening process. It can also be holding something within the center of that awareness—an act that by its nature diminishes the prominence of all other sounds, impressions, thoughts, aesthetic ideas, materials, etc. What is held in the center of one’s attention is most prominent to the listener; this prominence from focus of attention is often mistaken to also be the loudest aspect of the track.

Prominence is what is most noticeable or conspicuous at a particular moment in time; it is what has grabbed the listener’s attention. It is not necessarily the most important or most significant sound or material in the track, but only what the listener—at that moment, or in this listening—finds most interesting (Moylan 2015, 452). What is most prominent is also not necessarily what is loudest; prominent sounds are often not the loudest. A sound can be most prominent in the listener’s consciousness, while being at a lower loudness level than all other sounds.

A sound, sound source, aesthetic idea, lyric or musical idea can dominate the listener’s attention for a myriad of reasons. All elements of the track have an equal potential to provide qualities that cause sounds or materials to stand out and be noticed. The entry of the lead vocal very often captures the listener’s attention, and at least for a moment can seem to dominate by loudness; when language is present, or pronounced affects of a voice, our life experiences direct us to give it our attention. Prominence can be personal, and influenced by the listener’s prior experience; what stands out to listeners on a personal level can be highly individual.

Unexpected events can attract attention and thus bring prominence, as can those that are unusual in some way. There is a prominent hi-hat entrance in Phil Spector’s mix of “Let It Be” (Let It Be, 1970) at 0:53. The instrument is not the loudest in the mix. It is prominent because it is the first appearance of a percussion sound in the track, and it is a new addition to the texture. The listener’s attention is likely immediately captured by the new sound that is unlike anything that has preceded it—though they may also remain engaged in the lead vocal melody or the content of the lyrics. Immediately the instrument’s unusual spatial identity becomes pronounced as its delayed iterations provide movement on the sound stage. Never is it the loudest sound, or the track’s most significant aesthetic voice. Whether or not it is more prominent than the lead vocal rests in the attention of the listener. Shifting one’s focus of attention between the lead vocal, piano and hi-hat, one might experience a shift in loudness accompanying a shift in prominence, or awareness. Recognizing this shift in prominence can lead toward experiencing the sensation of loudness decoupled from other aspects of sound.

Prominence can be influenced by interpretive factors, and our perception of it may be detached from the actual context of the track. Prominence can be brought into context, though. This is accomplished through intention to perceive all sounds and activities in the track as being equivalent to all others. A balance of prominence (an equality of all that is occurring) will allow one to ‘hear’ all sounds as equivalent. This can be accomplished by directing attention to a higher level of perspective, where the sources (or whatever is being compared) can be heard as equals; this can reveal their differences and unique states most accurately. In this way we can perceive the balance of loudness levels without the influence of prominence. This chapter’s praxis studies can assist the reader in acquiring skill in these areas.

Measuring Loudness Perception

Measuring loudness and establishing identifiable (or perceptible) loudness increments is highly problematic, and not entirely possible.

First is the matter of equal loudness throughout the hearing range of pitch/frequency. As we have learned, two sounds of different frequencies will very likely require different physical amplitudes to establish the sensation of equal loudness. The sensation of loudness can change throughout the hearing range, while levels of acoustic energy might remain consistent—and these inconsistencies of loudness sensitivity are non-linear, and vary significantly at different pressure and frequency levels (as the equal-loudness contour previously informed us). Loudness perception of the track is linked with the loudness level of playback, and its ability to reproduce the frequency range with relatively even response. Identifying equal loudness levels between diverse pitches and frequencies (and the timbres presenting them) is one significant problem of measuring loudness.
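The frequency dependence described above can be illustrated numerically. The sketch below uses A-weighting, a standardized engineering curve that is not part of this chapter's method and is brought in here only as an assumption for illustration. It is loosely derived from an equal-loudness contour at a low listening level, and it is a single fixed curve, so it cannot show how sensitivity also changes with sound pressure level. The function name `a_weighting_db` is hypothetical.

```python
import math

def a_weighting_db(freq_hz):
    """Relative response of the A-weighting curve in dB (about 0 dB at 1 kHz).

    Illustration only: one fixed curve, whereas equal-loudness contours
    change shape with level. It does not measure perceived loudness.
    """
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00

if __name__ == "__main__":
    # Equal physical level at each frequency, unequal weighted level.
    for f in (50, 100, 1000, 4000, 10000):
        print(f, "Hz:", round(a_weighting_db(f), 1), "dB")
```

Low frequencies receive strongly negative values, which mirrors the observation that they require more physical amplitude to seem equally loud at modest playback levels.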

Identifying increments of loudness levels is the second significant problem we encounter.

Loudness (a subjective measure) is often confused with sound pressure (Mather 2016, 128–131). Sound pressure is a physical characteristic of how tightly air molecules are compressed together; as the displacement and compacting of air molecules increase (as a sounding body moves farther), the pressure increase in the waveform becomes greater. Sound pressure is measured in decibels as sound pressure level. When we measure the physical amplitude of a sound, we identify its sound pressure level and establish its relationship to a reference level; we arbitrarily choose one value of sound pressure as a reference, and then measure all sounds as relative multiples of that reference level. Thus, the decibel is a comparative measure (a ratio) relating a reference value of the threshold for human hearing¹ to the current sound’s measured sound pressure level. This measure of amplitude does not transfer into perception for numerous reasons. The most obvious might be the range of loudness we can perceive. If we accept the threshold of pain (for most listeners) to be 160 dB SPL, the sound pressure of the loudest sounds is about 100,000,000 times that of the slightest perceivable sound (threshold of hearing for young ears); workable numbers are derived by converting the ratios to the logarithms of the ratios as decibels (dB). The sensation of loudness is not proportional to sound energy, and the decibel is itself calculated as a logarithmic function. The decibel’s logarithmic scaling is not helpful for establishing perceptual increments: an increase of 3 dB is a doubling of power, but this does not correspond to a doubling of perceived loudness; a sound with 10 times the intensity of another is 1 bel (10 dB) greater in level, which is a physical measure and not a measure of loudness (B. Moore 2013, 133–167). We “can only infer loudness from objective measures” (Schlauch 2004, 318); the measure of sound intensity does not transfer into a measurable perception of loudness.
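The decibel arithmetic in this paragraph can be checked with a few lines of code. This is a minimal sketch: the function names are illustrative, and the reference pressure of 20 micropascals is the conventional reference for dB SPL, an assumption that the text above does not state. The values are physical ratios only and say nothing about perceived loudness.

```python
import math

# Conventional dB SPL reference pressure in pascals (assumed, not given in the text).
P_REF = 20e-6

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to P_REF."""
    return 20.0 * math.log10(pressure_pa / P_REF)

def pressure_ratio(level_db):
    """Ratio of sound pressures that corresponds to a level difference in dB."""
    return 10.0 ** (level_db / 20.0)

def intensity_ratio(level_db):
    """Ratio of intensities (powers) that corresponds to a level difference in dB."""
    return 10.0 ** (level_db / 10.0)

def level_difference_db(intensity_a, intensity_b):
    """Level difference in dB between two intensities (powers)."""
    return 10.0 * math.log10(intensity_a / intensity_b)

if __name__ == "__main__":
    print("doubling of power:", level_difference_db(2.0, 1.0), "dB")
    print("tenfold intensity:", level_difference_db(10.0, 1.0), "dB (1 bel)")
    print("pressure ratio at 160 dB:", pressure_ratio(160.0))
    print("intensity ratio at 160 dB:", intensity_ratio(160.0))
```

Pressure and intensity give different ratios for the same decibel figure, because intensity is proportional to the square of pressure; this is why a stated ratio needs to name the quantity being compared.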

The methods of measuring loudness explored in psychoacoustics, music psychology and recent ecological studies have little to offer recording analysis. The methodologies used to study loudness sensation have ranged widely. They have included paired comparisons, loudness matching, magnitude scaling, category ranking, cross-modal matching and psychological scaling; included also are studies in ecological loudness (the relation between loudness and the naturally occurring events that it represents). Much about loudness has been examined, both as simple stimuli in laboratories and as environmental sounds.² While these studies have further informed our understanding of loudness perception, they have not opened a path toward devising a way to identify and compare loudness levels against some scaling or objective measure (as we do with pitch and frequency). Our means to engage loudness levels, identifying levels and differences (intervals) between levels, has not advanced in a way that can be incorporated into studying the content of tracks. A loudness level cannot be doubled to establish an identifiable clone of itself at another level of sound pressure, as a pitch can be recognized as being an octave higher; the range of loudness cannot be divided into equal increments that are perceivable and recognizable, such as the half-steps of pitch.

Loudness in the Context of the Track

The category ratings of dynamic markings remain the most readily usable scale available to compare loudness levels and loudness relationships; dynamic markings are range areas of dynamics/loudness, not discrete levels. As discussed at length in Chapter 3 and in what follows, dynamics differ from loudness in several ways: they reflect timbral characteristics, energy, expression and intensity; they carry subjectivity of interpretation and intermodal connections (physical exertion, connotations of expression, etc.); and they are relative to the context of the track. Further, a range of loudness levels exists within each dynamic area, and the actual loudness levels of areas can overlap (also a reflection of dynamics privileging timbre over loudness).

Loudness as a sensation of sound intensity may be situated within this context, with recognition of its sensations being associated with, though distinguishable from, those of dynamics. As a recording element, the sensation of loudness is understood within the context of the track. Its level may be defined within the continuum of dynamic ranges.

The reference dynamic level (RDL) is used to relate dynamic levels, and has been adapted and adopted here (from what was presented earlier) to calculate loudness. It is a holistic impression of all of the sounds of the track and their musical materials and performances; this impression also embraces the content of the lyrics and the meanings and affects it generates. Reference dynamic level was discussed at length in Chapter 3, and will be discussed in more detail here.

The RDL is a specific level within a specific dynamic area; it can be placed at a precise point on the continuum of dynamic ranges, representing a clearly defined level of intensity. The RDL is a stable, unchanging reference within a track; no matter what occurs within the track, the RDL remains a consistent reference.

The track’s reference dynamic level (RDL) provides a stable level of intensity against which levels of sources and sounds can be calculated. Loudness will be identified on the basis of sensation alone, and placed within the context of the track against the ‘conceived’ loudness level that is implied by the reference dynamic level; this carries modest subjectivity from (1) the analyst’s determination of the RDL and also of (2) the transference of the RDL’s ‘dynamic’ level into a loudness level sensation. The RDL implies a ‘loudness level’ that reflects the energy, intensity and expression of its ‘dynamic level’ related to music contexts (pppp, mp, etc.). To allow this transfer, the thresholds of hearing and pain need to have corresponding dynamic levels. The reference levels of the threshold of hearing and the threshold of pain have been assigned the dynamic markings:

The selection of these markings is somewhat arbitrary; five p’s and/or five f’s might seem appropriate to some analysts. Some analysts may find tracks that require more than five increments to reflect their dynamic continuum. These were chosen because they are rarely encountered extremes in musical score markings, and the markings represent rarely encountered extremes of loud and soft within musical contexts. These markings will typically allow for a clear observation and presentation of data. The analyst may adjust this scale if it seems appropriate to the track; for instance, some metal tracks may use only the loud (f) end of the range, while a folk track might use a range of p–mf.

Using the RDL and these references for extreme levels, loudness levels encountered might be understood and calculated within the context of the individual track. Calculating and comparing loudness levels utilizes numerous processes. They can differ somewhat at various levels of perspective and with various types of observation. Nearly all can be related to one or several of the following steps, however:

  • Identifying the level of the sensation as compared with that of the RDL; this determines its loudness level within the context of the track
  • Matching or comparing loudness levels within the same gesture or percept stream (a musical line, for instance) at different points in time; this establishes points within a gesture, and allows loudness contour to be mapped against time
  • Matching the loudness sensations of two or more different sounds/sources occurring simultaneously; this allows separate parts to be observed for their relationships and interactions, as well as their individual characteristics
  • Placing loudness within dynamic ranges from an interpretive calculation of sustainable physical exertion (see Chapter 3)
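The steps above can be sketched as a small bookkeeping structure for an analyst's observations. Everything here is hypothetical and is not the notation used in this book: the names `SCALE`, `Level`, `relative_to_rdl` and `contour` are invented for illustration, and the list of markings is an illustrative subset rather than the extremes assigned in this chapter. The numbers only order observations along the continuum of dynamic ranges; they are not perceptual units and do not imply equal loudness increments.

```python
from dataclasses import dataclass

# Illustrative scale only; the analyst supplies the markings suited to the track.
SCALE = ["ppp", "pp", "p", "mp", "mf", "f", "ff", "fff"]

@dataclass(frozen=True)
class Level:
    marking: str     # dynamic range area, e.g. "mf"
    position: float  # 0.0 = bottom of the range area, 1.0 = top

    def ordinal(self) -> float:
        """Place the level on the continuum (ordering only, not a loudness unit)."""
        if not 0.0 <= self.position <= 1.0:
            raise ValueError("position must lie within the range area")
        return SCALE.index(self.marking) + self.position

def relative_to_rdl(level: Level, rdl: Level) -> float:
    """Offset of an observed level from the RDL, in range areas."""
    return level.ordinal() - rdl.ordinal()

def contour(observations, rdl):
    """Map (seconds, Level) observations to time-ordered (seconds, offset) pairs."""
    ordered = sorted(observations, key=lambda obs: obs[0])
    return [(t, relative_to_rdl(level, rdl)) for t, level in ordered]

if __name__ == "__main__":
    rdl = Level("mf", 0.5)  # hypothetical RDL: middle of the mf range area
    observed = [(10.0, Level("f", 0.5)), (0.0, Level("mp", 0.25))]
    for seconds, offset in contour(observed, rdl):
        print(seconds, offset)
```

Keeping the RDL as a fixed argument reflects its role as a stable reference: each observation is recorded against it, whether the comparison is made within one line over time or between simultaneous sources.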

Interpreting Reference Dynamic Level and Crystallized Form

Crystallized form is the highest dimension of form; it is the track (in its entirety) existing out of time, ‘heard’ non-temporally within the experiential present. Perceived in an instant of realization, it is a single, multidimensional shape, and a large-scale nonverbal conception and experience. Crystallized form is an aural image and a large-dimension sound object—a unified presence of all qualities present at once. It establishes a sense of knowing the track’s fundamental substance (manifesting as a core essence, inherent nature, a unique presence) and its individual form (multidimensional shape) and character (energy, expression, affects, meanings).

In the silence after the track, allow the listening experience of the track to dissipate into a single awareness and reflect on the impression that remains. Reflect on this presence of the track that lingers in your memory and psyche, in your consciousness and awareness. The goal is to not ‘make sense of it’ but rather to ‘recognize it’ for what it is, perhaps to ‘feel’ its character. This overall presence is—at least in part—crystallized form.

Within this impression and conception is a sense of the amount of energy and the level of exertion of the performance, the tempo and the speed of motion, and the magnitude of intensity within its expression; these are supplemented with the drama and meanings of the lyrics and the spatial attributes and other characteristics of the recording. These all coalesce into a manifestation of the performance intensity of the track.

This performance intensity of the track is the reference dynamic level (RDL). The goal, here, is to experience the RDL as a sense of the intensity of the track. Perceived holistically and considered without interference from verbalization, it is a single level of intensity that embodies the track.

The reference dynamic level is the part or quality of crystallized form that embodies the intensity of the track, in all the elements and materials and outside associations that shape and establish it. While Table 9.1 presents factors that potentially influence RDL, it is important to remember that it is the result of an experience, not a calculation. Listening from the position of accessing musical expression and an appreciation of the singular, coherent whole will open attention and awareness; attention is directed toward nonverbal qualitative reflection and a recognition of its expression, one that is based on musical thought and aural imagery. This contrasts markedly with the analytic reasoning required of analysis, which engages the rational thought processes of calculating, deducing, problem solving and otherwise attempting to assemble a result for the RDL; such a process quickly and prematurely brings verbalization, and does not access the inherent character of crystallized form and the intensity of the RDL. This will be addressed in more detail later under crystallized form.

Revealing an RDL involves listening and finding the level at which the track’s energy and expression reflect its intrinsic character. Table 9.1 is presented to identify sources that may contribute to the RDL of a track. It is not intended to divert the reader’s attention to these sources; attention should remain at the highest level of perspective, to experience the intensity dimension of the track. Every individual track will have a different combination of factors, and different proportions of factors, that formulate a sensation of RDL that is unique. There is no formula for RDL, as it is an interpretation. One can expect one’s sense of the RDL to evolve, especially while one is becoming more acquainted with a track. As an interpretation, the RDL will (to some unknown extent) reflect the analyst’s biases, though this interpretation should (as much as is possible) reflect an objective interpretation of culturally shared perceptions.

Table 9.1 Potential factors that can influence reference dynamic level. These factors potentially provide a variable level of influence toward establishing a singular, overall impression of intensity for the individual track.


Tempo, energy and directed motion | Dramatic expression of lyrics
Density of information | Meanings and associations of lyrics
Levels of exertion of performances (reflected in intensities of timbres): all sound sources, potential emphasis of dominant or significant sources, potential emphasis of contextual layer, with respect to the overall texture's performance intensity | Recording element influences: pitch density, timbral balance, program dynamic contour, loudness balance, holistic environment, sound stage qualities, and other
Loudness levels and contours | Affects and emotions of the whole
Content and tension of expression | Musical expression

Overall Loudness Level of the Track

A hierarchy of loudness-level strata exists similar to hierarchies of other elements. The above comparison types for loudness levels will occur at all levels of perspective, except one sensation of loudness exists at the highest level of perspective and comparisons will be made only within that single contour. Table 9.2 illustrates the hierarchy of loudness levels.

Attention to the sensation of loudness can be directed to any of these levels of perspective. Simultaneous sounds or successive sounds can be compared in all of these perspectives—allowing a single stream of sounds to be followed, or various streams to be compared to one another. These levels of perspective differ greatly, ranging from the singular loudness of the aggregate texture to the strata within a sound’s components and its reverberation (loudness levels within timbres were encountered in Chapter 7, and Chapter 8 introduced loudness levels within environments).

At the highest level of perspective, the track is distilled to a single sensation of loudness; we perceive this level to change continually and to form a contour across the entire track. In earlier writings (Moylan 1992 and 2015) I have referred to this as ‘program dynamic contour.’ In these I was writing from the vantage point of an engineer/producer (I prefer the term ‘recordist’),³ where the term ‘program’ has been commonly used to describe the overall track or its singular sound; ‘dynamic’ was used synonymously with the sensation of ‘loudness’ to connect the two concepts (though this connection was rarely explicitly articulated) in order to connect the role of loudness sensation to the musicality of the track. This overall loudness level could be re-named as ‘track loudness contour’⁴ should one wish to be more accurate and perhaps less confusing. I will use both terms synonymously as this discussion unfolds.

Table 9.2 Hierarchy of loudness-level strata as a recording element in relationship to levels of perspective.

Perspective | Loudness/Dynamic Levels and Relationships
Overall Texture | Reference dynamic level; Loudness/dynamic contour of overall, aggregate texture
Composite Texture | Loudness/dynamic balance relationships of sound sources (voices and instruments)
Basic-level: individual sound sources | Loudness shapes/contours of materials or lines; Loudness levels of individual sound sources
Individual sounds: overall loudness level and shape/contour | Dynamic/loudness contours of individual sounds; Contour/shape of dynamic envelopes
Individual sounds: internal loudness levels and contours | Contour/shape of spectral envelopes; Dynamic/loudness contours within reverberant energy

The track loudness contour is the single loudness level of the track’s aggregate sound; it is the result of the combination of all source loudness levels. It helps some to envision this sensation and concept by thinking of a single VU (volume unit) meter that displays a representation of the signal level, following the loudness level of the program as it potentially changes at every moment. The contour is the shape of the track’s changing loudness that is revealed as it progresses from beginning to end. This loudness shape often has structural significance to tracks, and can support their drama; this sensation in itself is capable of generating movement and tension within the track. It is unusual to listen for overall loudness of many sources in our everyday lives; many practicing musicians, including conductors, are not aware of this sensation of combined loudness levels or do not seek to engage it, though some certainly do. Recordists, in contrast, often bring their attention to this level of detail while shaping various stages of the recording process. This overall loudness contour may be deliberately shaped in the recording process; its subtle changes are often the result of the track’s arrangement or of its mix; these are significant contributions to the character of the track.⁵

This loudness shape often has structural significance to tracks, and it can support their expression. Jada Watson and Lori Burns (2010) use the loudness-wave shapes of a track’s two channels to illustrate this overall loudness as amplitude; while this is not precisely aligned with program dynamic contour (it depicts a physical measure and not the perceived dynamic shape, and it treats channels independently instead of bringing attention to a single impression), such a diagram can guide perception and observation, especially when acquiring this skill. They note their amplitude diagram of the Dixie Chicks song “Not Ready to Make Nice” (2006) “reveals that the song has an overarching increase and decrease in dynamic amplitude (< >), a design that reflects the growing intensity of the vocal gestures and instrumentation and complements the intensification of anger and resistance in the lyrics” (Watson & Burns 2010, 345). Here they connect structure, performance intensity, dramatic expression and the content of the lyrics into a statement that provides much important information about the track and its overall loudness shape.
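An amplitude diagram of this kind can be approximated by measuring the signal level in successive windows. The following is a minimal sketch under stated assumptions: it works on a plain list of samples scaled to a full-scale range of -1.0 to 1.0, reports levels in dB relative to full scale, and uses a synthetic tone rather than any real track. The names `rms_contour_db` and `synthetic_track` are hypothetical. As with the diagrams discussed above, the result is a physical measure and not the perceived program dynamic contour, and it does not model the ballistics of a VU meter.

```python
import math

def rms_contour_db(samples, sample_rate, window_s=0.5, floor_db=-120.0):
    """Return (time_s, level_db) pairs, one per non-overlapping window.

    Levels are RMS values in dB relative to full scale (1.0).
    """
    size = max(1, int(window_s * sample_rate))
    points = []
    for start in range(0, len(samples) - size + 1, size):
        block = samples[start:start + size]
        rms = math.sqrt(sum(x * x for x in block) / size)
        level = 20.0 * math.log10(rms) if rms > 0.0 else floor_db
        points.append((start / sample_rate, max(level, floor_db)))
    return points

def synthetic_track(sample_rate=8000, seconds=4.0, freq=440.0):
    """A sine tone whose amplitude rises and then falls (a '< >' shape)."""
    count = int(sample_rate * seconds)
    samples = []
    for i in range(count):
        t = i / sample_rate
        envelope = 1.0 - abs(2.0 * t / seconds - 1.0)  # 0 -> 1 -> 0
        samples.append(envelope * math.sin(2.0 * math.pi * freq * t))
    return samples

if __name__ == "__main__":
    for time_s, level_db in rms_contour_db(synthetic_track(), 8000):
        print(f"{time_s:4.1f} s  {level_db:7.2f} dB")
```

The window length sets the level of perspective of the measurement: long windows trace the overall shape of the track, while short windows begin to follow individual sounds.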

A program dynamic contour graph allows the loudness contour of the track to be observed and notated. Listening to this overall loudness sensation will be a new experience to some readers, though it can be developed with directed attention; praxis study 9.2 can guide this experience. Engaging this level of perspective will lead to new observations, even for tracks one already knows well. When engaging the track’s loudness contour, care should be taken to remain focused on the sensation of loudness alone, and to remain uninfluenced by the timbres and intensity levels of the ensemble or the drama of the music. Timbre and expression characteristics (and other percepts) can bring the impression of increased or diminished loudness, without an actual change in acoustic energy; likewise, loudness can change without a change in timbre (or another element of recording or another domain). Remember to monitor playback level; a consistent playback level is needed for accurate observations between listening sessions.

Figure 9.1 VU (volume unit) meter. Image courtesy of API (Automated Processes, Inc.).

Figure 9.2 Program dynamic contour (track loudness contour) graph of the Beatles' "Here Comes the Sun," Abbey Road (1969, 1987).

Figure 9.2 illustrates the changes in overall loudness level throughout the track “Here Comes the Sun” (1969, 1987). The graph contains the reference dynamic level of the track, against which the contour can be heard. Embedded in the contour are shapes of loudness that correspond to structural divisions; as the shapes emerge within one’s hearing of the track, their role in defining sections through their repetition becomes apparent. The loudness shape of the track is clearly evident from its beginning at the lower portion of mp to its peak within ff. The wide dynamic range of the track contains subtle changes of loudness as well as large and sudden shifts.

“Here Comes the Sun” is among the uncommon tracks in which the reference dynamic level is prominently experienced. During the final moments of the coda, the level of the track loudness contour matches the track’s RDL; the reference dynamic level is audible as the track’s overall loudness arrives at the track’s overall sense of energy, exertion, and expression (that is the RDL). At this moment, the low mf RDL delivers a sense of arrival and a settling in the place of the conception and expression in which the track exists. It is common for a track to arrive at its RDL as an important occurrence, but it is not common for it to be a point of arrival that provides aesthetic closure to a track.

Loudness Balance, Musical Balance

The balance of sound source loudness levels established by recording exerts significant influence on the track. The level of loudness of sources and their resulting interrelationships represents an important element of recording—this is loudness balance. Though loudness balance is but one of the recording elements applying influence on the track, it has the potential to prominently shape the content and character of the track.

The recording’s role in shaping relationships of sounds is often reflected—to some degree—within loudness balance. Loudness balance situates each source in the mix in terms of loudness; as loudness brings all sound into perception, it makes each source audible and establishes its presence. Observing the actual loudness levels of sources can bring an understanding of the loudness contours of each source, of their loudness relationships, and of the composite texture’s loudness balance. In music settings, loudness balance of instruments and voices (sound sources) is commonly framed as ‘musical balance.’ In earlier writings (Moylan 1992 and 2015) I have used ‘musical balance’ synonymously with what is referred to as ‘loudness balance’ here. ‘Loudness balance’ will be used here to more clearly differentiate this recording element from the elements of music.

Loudness contours of individual sound sources can be notated on an individual or a collective ‘loudness contour graph.’ This can provide a clear way to notate even the subtlest loudness/dynamic changes of individual sources and their gestures, or musical lines. It can also reveal loudness relationships and groupings of sources where they combine to create particular contours. The loudness balance X-Y graph will illustrate the actual loudness levels of sources; the graph can be used to make general loudness observations, or it may be calculated in detail against the reference dynamic level. Either approach may be most appropriate, depending on the goals of the individual analysis. Each sound source is represented by a separate line of the graph, allowing its contour to be mapped as it changes over time. The graph displays loudness as loudness—it does not factor in the influences of timbre, register, or prominence of any other origin. We remember that loudness is the perception of the amplitude of sound as we bring our attention to this element. It is important to remember to hold all sounds in equal prominence, as loudness can easily be distorted by listener perspective and focus. A source in the center of one’s attention will be emphasized in one’s awareness, and cause loudness judgement to be skewed. Listening for loudness contours, and judging relationships to the RDL does, however, bring one to listen at the perspective of the individual source. Praxis study 9.3 can guide these listening experiences.

Figure 9.3 Loudness balance graph of the Beatles' "Lucy in the Sky with Diamonds" from Sgt. Pepper's Lonely Hearts Club Band (1967, 1987).

Figure 9.3 illustrates the loudness balance of “Lucy in the Sky with Diamonds” from Sgt. Pepper’s Lonely Hearts Club Band (1967, 1987). This graph provides great detail on the loudness shapes and relationships of all sounds against the track’s RDL. Note the designation of the RDL’s placement related to the dynamic areas; the loudness levels of individual sounds are interpreted against that level just as occurred with the program dynamic contour (track loudness contour). The RDL is the same in each graph (observing the same track).

This graph brings the activities of all sources into keen focus, and closely observes their loudness interrelationships.

Not all analyses require this highly detailed data collection, though. A more general assessment of loudness levels might provide adequate insight into tracks. Figure 9.4 illustrates the loudness levels of sound sources in three versions of the Beatles’ “Let It Be.” The sources are identified within the more generally-defined (though closely consistent with traditional dynamic areas) loudness areas ranging from very soft to very loud. These graphs contain general contours of loudness levels of little detail; they provide an impression of the overall loudness level of sources during a section or passage. A reference dynamic level is absent from each version as well; this also contributes to the general nature of these loudness level and relationship observations. A quick glance at these graphs identifies fundamental loudness differences between the three versions, even with observations quite generalized. Much detail could be extracted from these examples, adding to what has been provided here.

The original, single-release version produced by George Martin has a narrow range of loudness levels. All sounds are in the lower half of ‘moderately loud.’ The lead vocal is loudest, except for a short phrase at the end of Verse 1. The background vocals have a loudness similar to that of the lead vocal and piano, and are situated between them until the piano increases in loudness at the end of the chorus. The piano’s rise in level at the end of sections is obvious from the graph.

Phil Spector’s version from the Let It Be (1970, 1987) album has clear contrasts. Loudness varies from high ‘moderately loud’ to the upper portion of ‘soft.’ The background vocals passage changes loudness level markedly. The piano retains its general loudness contour, though the chorus level is some magnitude lower in this version. Billy Preston’s Hammond organ resides at a soft loudness in the chorus, removed from all other sources until its level is approached by the backing vocals at the end of the chorus.

The graph for the Let It Be . . . Naked (2003) version makes clear it has the widest range of loudness levels of the three versions. The lead vocal is at its loudest in this version; just as in the other two versions, it is the loudest source within the track. The background vocals are louder in this version than in the others, and the general loudness contour of the piano is slightly modified in the chorus. The Hammond organ is present throughout all sections graphed; though it is extremely soft and at times barely present, it has a distinct loudness contour.

Performance Intensity

Contrasting performance intensity with loudness balance allows the analyst to observe the impact of the mix on the performances of sources. Here, loudness contrasts with timbre instead of working in synergy. Loudness changes not accompanied by timbral changes emerge, and the reverse occurs as well, as timbres remain stable while loudness is altered. This contrast will often provide information on the relationships of sound sources to the overall loudness level of the track (program dynamic contour).

Figure 9.4 Generalized loudness balance graphs of three versions of the Beatles' "Let It Be."

Performance intensity reflects the qualities present when a voice or an instrument is recorded; it is reflected in the timbre of the sound source. Performance intensity is the timbre of the instrument resulting from the levels of physical exertion, energy, expression, performance techniques, and any other timbral qualities related to performance present in the sound and the performance when it was captured (recorded) (Moylan 2015, 450).

The loudness level that established performance intensity when the source was recorded is transformed within the mix process. This allows performance intensity—and its expressive content—to be a separate percept in the track. This dichotomy separates reality from the crafted world within the record; this is an inherent trait of popular music recordings. The timbre of performance intensity is often used aesthetically to carefully shape the performance itself. In this way it is used to enhance the drama and expressions of the track, and for many other purposes. For example, sounds of low performance intensity often appear at higher loudness levels in the mix than their timbres indicate; this is especially common within vocal performances, where a whispered word can appear loudly in the track. In expressing lyrics and the musical line, a lead vocal can vary in performance intensity considerably and often—sometimes within a word or a syllable. The voice—“full of concrete meaning that is not conveyed through lyrics is to be found in all forms of musical expression, but in recorded music, precisely because it is recorded . . . the effect is especially discernable” (Lefford 2014, 44)—clearly illustrates the significance of performance intensity to shaping interpretation by performers and listeners. Performance intensity is often the primary carrier of musical expression, and also of the drama delivery and the shaping of meanings within the lyrics. It can create and carry the level of urgency in the track and establish a sense of tension or of ease.

The levels of performance intensity are interpreted by listeners by observing timbral qualities as they relate to physical exertion and expression. Denis Smalley (2007, 39) notes: “[O]ur experience of the physical act of sound making involves both touch and proprioception—the tensing and relaxation of muscles in relation to all types of body movement.” This experience can be one of observation as well as participation. A listener’s prior experience with and knowledge of the particular instrument’s timbre, as well as their abilities to remember and match the timbre in a state of normalcy, play central roles in this interpretation, and its relative accuracy. The deeper the experience and prior knowledge, the more likely the listener will successfully and accurately recognize the instrument’s performance intensity through perceived timbral qualities. In absence of experience, this interrelationship of timbral quality and performance intensity may have its “content . . . simply assumed, or even invented, by the listener” (A.F. Moore 2010, 259). No matter the accuracy of the percept to the actual act, performance intensity contributes substantively to a listener’s interpretation of important aspects of the track.

The threshold level that separates expending and restraining energy—the transitions from force to resistance as discussed in Chapter 3—is a reference for calculating performance intensity. At this level, resting between mp and mf, performance intensity (and source timbre) is in a state of normalcy for timbral characteristics, where the timbre is not altered by the energies of performance. This level of effort can theoretically be sustained indefinitely. In “Burnt Norton” from Four Quartets, T.S. Eliot (1943, 15–16) called this idea “the still point”; his poem describes this “still point of the turning world” as being “neither from nor towards . . . neither arrest nor movement . . . neither ascent nor decline . . . except for . . . the still point, there would be no dance, and there is only the dance.”

This “still point” provides a reference against which loudness, as related to the listener’s perception of the role of body movement in sound making, can be calculated. Embodied music cognition tends to recognize music perception as based on action—the listener’s own body movements, and the perception of kinesthetic properties and musical gestures involved in creating the sounds of the performance (Zbikowski 2011, 181–190).

Performance intensity in itself can produce directed motion in a line. The urgency of performance intensity’s expression and perceived actions of its physicality can create motion and tension in music, and in subdued expression it can create calm, ease, and repose, and thus mitigate all but the slightest tension; this is not a duality, but rather a continuum that may establish many states between the greatest tension and most urgency imaginable, and the slightest motion of ease and a most minimal sense of tension. Performance intensity can communicate urgency and energy, drama and expression, directed motion and stillness, tension and release, exertion and restraint, and more. Changes can be extreme and fast, subtle and gradual, but all contribute character and substance to sound sources and their materials—all are embodied in the timbre of the sound source and its invisible performance gestures. Performance intensity does much to shape the character of timbres and the affects of their delivery, and define the qualities of individual performances and the delivery of their musical idea.

A confluence of all domains and many of their elements can be found within performance intensity—as timbre, musical lines, and perhaps lyrics blend into one gesture. Performance intensity—and all that it carries—is an important aspect of performances in the track (Moylan 2015, 323).

In the context of the track, performance intensity may be accompanied by loudness changes, but not necessarily. Timbre can change without loudness changes, and performance intensity is reflected in timbre much more than loudness. These are controlled separately in the mix process.

Performance Intensity and Loudness Balance Graph

Figure 9.5 is a performance intensity and loudness contour graph. This graph is uniquely suited to the analysis of recorded performances, and can reveal a wealth of pertinent information.

Performance intensity is notated on the upper tier of the X-Y graph, charting the dynamic levels and contours of each source’s original performance. The timbres of sources are not related to the reference dynamic level; rather their intensity is calculated based on their timbre and expressive character. The lower tier of the graph provides the loudness balance of sources; the loudness levels of sources are calculated relative to the track’s RDL. Contrasting performance intensity with loudness balance, one is comparing two separate elements against the same timeline—the interaction of implied loudness of timbre and actual loudness in the track illustrates a confluence of the recording elements as well as their independence. The graph allows direct comparison of two different states of the same sound sources, as they change over the duration of the example.
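The comparison of the two tiers against one timeline can be sketched as a simple data structure. The Python fragment below is a hypothetical illustration, not the method used to produce the figures: it assumes each tier is reduced to one dynamic area per time point, and it maps those areas onto an evenly spaced ordinal scale (an assumption, since placement within an area and the position of the RDL are ignored). The source names and all values are placeholders and are not read from any graph in this chapter.

```python
# Ordinal scale for dynamic areas; the even numeric spacing is an assumption.
DYNAMIC_AREAS = {"pp": 1, "p": 2, "mp": 3, "mf": 4, "f": 5, "ff": 6}

def tier_disparity(performance_intensity, loudness_balance):
    """For each source, return (mix loudness - performance intensity) in
    dynamic-area steps at every time point.

    Positive values mean the source sits louder in the mix than its
    timbre implies; negative values mean it sits softer.
    """
    disparity = {}
    for source, intensity_levels in performance_intensity.items():
        mix_levels = loudness_balance[source]
        if len(mix_levels) != len(intensity_levels):
            raise ValueError(f"tiers for {source!r} cover different time points")
        disparity[source] = [
            DYNAMIC_AREAS[mix] - DYNAMIC_AREAS[perf]
            for perf, mix in zip(intensity_levels, mix_levels)
        ]
    return disparity

# Placeholder values for three time points (hypothetical, for illustration only).
performance_intensity = {
    "organ": ["p", "p", "mp"],
    "lead_vocal": ["mp", "mf", "f"],
}
loudness_balance = {
    "organ": ["mf", "mf", "mf"],
    "lead_vocal": ["mf", "mf", "mf"],
}

result = tier_disparity(performance_intensity, loudness_balance)
for source, steps in result.items():
    print(source, steps)
```

In this sketch a source whose values stay positive has been raised in the mix above the level its timbre suggests, while a value that moves from positive to negative marks a performance that intensifies while its mix level holds steady.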

Performance intensity and loudness contour graphs may contain all sources, or a selection of sources of significance to an analysis. When needed, a single source might be observed through this approach. The level of detail of data from both performance intensity and loudness balance can also be adjusted to reflect the needs of an analysis, from a significant attention to detail to general impressions.

Figure 9.5 provides much detail of contours and levels in both tiers. The graph could have greater detail, though. Subtle changes of timbre in the vocals and in several instruments are not contained here. In examining the two tiers of Figure 9.5, one can immediately recognize the disparity in levels of the Lowrey organ, as it is much louder in the mix than its performance intensity suggests. Comparing the two tiers, one can determine how sources deviate from their original levels, and just how far the loudness balance has strayed from the original performances of the parts; the loudness relationships suggested by the timbres of the original performances are represented in the ‘performance intensity’ tier.

Figure 9.5 Performance intensity and loudness balance graph of the Beatles' "Lucy in the Sky with Diamonds," Yellow Submarine (1999).

Figure 9.5 presents the Yellow Submarine (1999) version of “Lucy in the Sky with Diamonds,” graphing both performance intensity and loudness balance. It contrasts with the original Sgt. Pepper’s Lonely Hearts Club Band version in Figure 9.3, which provides the loudness balance of the same sections as Figure 9.5. These two versions were created from the same source tape, with the same performances—allowing the opportunity to bring our attention to clear differences of loudness balance, loudness levels, and loudness contours from the same performances. Comparing the two versions of loudness balance allows us to recognize that they have slightly different reference dynamic levels, a product of their different mixes (loudness level shifts, plus changes in other recording elements); both RDLs are in the upper portion of mf, with the Yellow Submarine version being slightly higher. While many loudness levels of sources have similar relationships between the two mixes, some sources have distinctly different loudness levels (such as the vocals in the chorus) and contours (observe the Lowrey organ in the introduction). Notice the shaping of the bass line; more nuance of changing levels is present in the original version. Numerous other differences can be observed from the graphs, and from listening to the tracks.

THE CONFLUENCE OF ALL THE TRACK CONTAINS

Confluence acknowledges the interdependence of all domains, all of their elements, and the performance. There is a complex interplay—as we have examined in materials, perspective and structural levels, aesthetics and affects—where we find no single thing dominates the track at any moment, though anything may be most prominent within (or dominate) our attention and interpretation at any moment. This interdependence within the track establishes a tapestry of all that is present—an intricate web of interrelationships and sound qualities that is an inner dimension of crystallized form. This confluence can be extraordinarily intricate, and yet appear unadorned, as Albin Zak has observed:

In the recording process, the mix establishes this confluence. It is in the mix process that recording elements are largely defined, blending and delineating pitch/frequency, timbral, spatial and loudness attributes. The process of mixing also blends the individual performances of all sources; it provides each source with a spatial identity and final alterations of timbre. The performances of musical materials and lyrics are shaped by recording elements within this process, establishing the fundamental qualities of the final track—qualities that provide the track with a level of distinction. The mix process occurs at the basic-level of sound source and at the composite level that compares them as equivalent and equals, though its confluence influences relationships of all elements at all levels of perspective. The mix shapes and blends even the subtlest aspects of each element of each domain into a sonic tapestry that ultimately manifests as the overall timbre of the track and the track’s spatial identity.

The mix manifests the confluence of the track. It is also a metaphor for the confluence of all the qualities, proportions, character and expression the track contains. Elements lose their individuality as they blend into gestures and materials, aural images and aural events, and a rich and complex texture. The acoustic wave (the result of all of the sounds of the track) that arrives to us (slightly differently at each ear)

There is good reason we often have trouble making sense of what we hear in the record; confluence establishes a rich and multidimensional texture that can blend sounds so they are no longer distinguishable.

Prominence Emerging from Confluence

The mix establishes a balance of all parts within that complex texture. From that balance, any sound or element may emerge to be more prominent than others. As we learned above, prominence is established by listener attention. Attention is drawn to what stands out from all else—this exists at all levels of perspective, comparing sounds of sources, observing the elements within sounds or musical lines, observing the interactions of musical materials, hearing a text emerge from a musical fabric, and more. Albin Zak (2001, 157) significantly notes that “prominence is perceptible only in relationship. That is, to assess prominence we need a frame of reference.” A frame of reference exists from the materials and elements within the track, whereby some ideas emerge as more significant, and others fall into other roles; sounds can also emerge because they are interesting in some way, or simply discovered and brought into the center of one’s attention. From the latter, we might begin to understand how prominence can be personal—what emerges from the texture for one person (and their listening interests, skills, sensibilities, experiences, etc.) may not emerge for others, and certainly will carry some level of individual interpretation.

Allan Moore and Albin Zak both use the term ‘prominence’ in a manner that contrasts with this writing (see earlier in this chapter). Zak states: “[R]elations of prominence are analogous to ‘depth,’ among Massenburg’s four dimensions of the mix. They impart impressions of proximity and emphasis along with whatever associations these may have” (ibid., 155); this points to Massenburg’s blending of the terms proximity (depth) and prominence. Allan Moore (2012a, 31) uses the term ‘prominence’ to represent “sounds . . . more (or less) distant than each other” referring to perceived proximity or distance, the second dimension of his soundbox.

I use the term ‘prominence’ as a perceived emphasis of one material (element, domain, etc.) over another that is determined from a manner of attention, and from a direction of focus. I use the terms and concepts of ‘depth’ and ‘distance’ as dimensions of physical space—dimensions one might physically measure, or perceive, and/or interpret depending on context. The percepts of depth and distance are distinctly separate, and both are removed from prominence, which I approach as a manner of interpretation that may emerge as evoked from any element, sound or material.

When Albin Zak (2001) discusses ‘prominence,’ he approaches the concept more broadly, thereby touching upon several key concepts that are relevant here. First, he clearly identifies depth as existing at many levels of perspective, from the overall texture to the individual event—a central consideration related to confluence and the mix that applies to all elements. Next, he also extends his use of the term ‘prominence’ toward engaging the ways tracks reveal and emphasize elements and materials; especially significant is his recognition of the “multifaceted nature of prominence perception” (2001, 156). He makes it clear ‘prominence’ is distinct from ‘loudness;’ though prominence might be established by loudness, it is only one facet that might influence the impression. Prominence may also be established by timbre, its level of diffusion in the mix (environment quality), ambience, sense of distance or location in the stereo field. This acknowledgement of the multifaceted nature of prominence is significant. Sounds will emerge from the confluence of the track at all levels of perspective, and within all domains—including the confluence of recording elements and the confluence of all that the track contains. These concepts support the roles of equivalence as a factor in the potentials of any element to have significance, and in bringing attention to prominence perception (Moylan 2015, 320–321).

A sense of shifting perspective—intentionally shifting the focus of attention from one level of detail to another—allows prominence to be recognized as a matter of context, relative to its surrounding materials and elements. Eric Clarke (2005, 188) describes:

A sense of control develops as attention is deliberately and clearly focused to various perspectives, various domains, various elements, and so forth. With this sense of control, a perception of prominence that is the result of context rather than bias has the potential to emerge. The analyst might then be able to choose whether to seek “what is most prominent within the texture” or “what appears to them the most prominent based on their own sensibilities”; the deliberate choice is what is important here. One choice allows the analysis to be based on (or at least emphasize) content within context of the track and within a culturally bonded interpretation, and the other emphasizes personal interpretation—of course, a continuum of shadings exists between these two poles.

Confluence of Recording Elements

The interrelationships and interdependence of recording elements manifest within the confluence of the mix, as each individual recorded performance (or track) is combined and mixed with others. Richard Middleton (1990) has framed this process as ‘polyvocality’:

This section examines ways data collected from several recording elements can be displayed to allow their individual traits and their interactions to be observed. What is offered is far from exhaustive, but can lead the analyst to determine how to most suitably explore elements within individual tracks. The most significant difference in these approaches to comparing elements lies in the factoring of time into observation methods. Some recording elements in some tracks are substantially fluid and temporal, changing over time. Other elements may be largely or completely static or stationary, their qualities fixed throughout a track or within sections; non-temporal graphs or diagrams bring a visual representation of the data of these elements. The temporal nature of any recording element within any track may establish surface rhythms aligned with the metric grid—though this is uncommon, especially for an element like environments; elements establish their own pacing and morphology (changes of quality) within individual tracks, and also at each level of perspective.

The number of sources examined in diagrams, graphs or within any process might range from all sources present to a smaller number of select sound sources; a single source could also be graphed in all of these forms, allowing it to later be examined in great depth. Timelines for graphs might range from the entire track, to major sections, or perhaps single measures or more extended phrases. The span of time represented by illustrations and diagrams might be defined similarly.

Temporal Graphs for Comparing Recording Elements

Interrelationships of recording elements can be observed by comparing the observations of elements. The juxtaposition of loudness balance and performance intensity X-Y graphs discussed above allowed those elements to be observed simultaneously, as they evolved temporally over the duration of the example; changes of levels (either general or in detail) could be displayed in their magnitude and at the time of their change(s). Time marks the place (or moment) of change, and comparing these places of change represents rhythm.

In some tracks, graphing two or more elements against a common timeline might display the elements’ data so as to allow their interrelationships to emerge more visibly; in other tracks this effort might yield less richness.

This approach can be used similarly for any other combination of two elements to be compared at the same level of perspective. At the perspective level of the composite texture where the interdependence and interrelationships of elements manifest, we have identified five qualities of this texture:

  • Pitch density
  • Loudness balance
  • Performance intensity
  • Stereo imaging (image positions and sizes, transferable to surround sound)
  • Distance positions

The reader may notice host environments and timbre are omitted from this listing. Timbre and host environments do not function directly at the composite level, interacting with others; they function most strongly at the basic-level of defining the character and content of sources into an identity (“an acoustic guitar in a small hall with vaulted ceiling”) or at the overall texture (as timbral balance and holistic environment). These more complex percepts are comprised of several elements from this list functioning at a lower perspective.

These five qualities may be coupled into ten (10) X-Y graph pairs—ten ways the qualities of the composite texture might be observed in groups of two. Among the possible combinations, interesting evaluations might emerge from observing the following pairings at the perspective of the composite level, plotted against the same timeline:

  • Distance positions and stereo imaging X-Y graphs
  • Stereo imaging and pitch density X-Y graphs
  • Performance intensity and pitch density X-Y graphs
  • Loudness balance and distance positions X-Y graphs
  • Loudness balance and pitch density X-Y graphs

In addition to the coupling of performance intensity and loudness balance offered before, other dual combinations of elements may be desirable, as they hold potential to generate pertinent observations for an individual track (or section thereof). These other combinations are (1) pitch density and distance position, (2) loudness balance and stereo imaging, (3) performance intensity and stereo imaging, and (4) distance position and performance intensity. Some graphs are more workable than others, and the value of any graph is related to its usefulness or appropriateness for an individual track; a graph’s usefulness is based on the content of an individual track, the goals of an analysis and the intentions of the analyst. All temporal graphs can allow subtle changes to be illustrated and observations can be detailed, or they may be approached observing more general values.

Figure 9.6 Temporal X-Y graph comparing pitch density and stereo imaging against a common timeline; from the Beatles' "Lucy in the Sky with Diamonds," Yellow Submarine (1999).

Examining combinations of three or more elements might progress similarly at the composite texture, illustrating the various elements of basic-level sources. Figure 6.2 illustrated loudness balance, stereo imaging and distance positions for four sources against a common timeline. Any combination of elements may be examined in this way, so long as observations remain at the same level of perspective. Returning to the five composite texture qualities listed above, there are ten (10) possible combinations of three qualities, and five (5) possible combinations of four qualities that could appear on a single graph, on separate tiers and against a common timeline.

The combination of pitch density, stereo image width and location, and of distance position is one of the ten possible combinations that could comprise a three-tier X-Y graph against the same timeline. This combination is the same as the soundbox (explored later).
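The counts given above (ten pairs, ten groups of three, five groups of four) follow from choosing unordered subsets of the five composite-texture qualities. The short Python sketch below is offered only as an illustration of that arithmetic; the names `QUALITIES` and `groupings` are hypothetical labels introduced here and are not part of the analysis method itself.

```python
from itertools import combinations
from math import comb

# The five composite-texture qualities listed earlier in this section.
QUALITIES = [
    "Pitch density",
    "Loudness balance",
    "Performance intensity",
    "Stereo imaging",
    "Distance positions",
]


def groupings(size):
    """Return every unordered grouping of `size` qualities (hypothetical helper)."""
    return list(combinations(QUALITIES, size))


pairs = groupings(2)    # candidates for two-element X-Y graph pairs
triples = groupings(3)  # candidates for three-tier graphs
quads = groupings(4)    # candidates for four-tier graphs

# Cross-check each enumeration against the binomial coefficient.
assert len(pairs) == comb(5, 2) == 10
assert len(triples) == comb(5, 3) == 10
assert len(quads) == comb(5, 4) == 5

# Print the four-quality groupings, one per line.
for group in quads:
    print(", ".join(group))
```

Each grouping of four simply omits one of the five qualities, which is why there are exactly five of them.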

Non-Temporal Graphs and Diagrams for Comparing Recording Elements

Table 9.3 Possible recording-element combinations of three (3) and of four (4) elements that may interact and/or establish interdependence within the composite texture.

Combinations of Three (3) Qualities

  • Pitch Density, Loudness Balance and Performance Intensity
  • Pitch Density, Loudness Balance and Stereo Imaging
  • Pitch Density, Loudness Balance and Distance Position
  • Pitch Density, Performance Intensity and Stereo Imaging
  • Pitch Density, Performance Intensity and Distance Position
  • Pitch Density, Stereo Imaging and Distance Position
  • Loudness Balance, Performance Intensity and Stereo Imaging
  • Loudness Balance, Performance Intensity and Distance Position
  • Loudness Balance, Stereo Imaging and Distance Position
  • Performance Intensity, Stereo Imaging and Distance Position

Combinations of Four (4) Qualities

  • Pitch Density, Loudness Balance, Performance Intensity and Stereo Imaging
  • Pitch Density, Loudness Balance, Performance Intensity and Distance Position
  • Pitch Density, Loudness Balance, Stereo Imaging and Distance Position
  • Pitch Density, Performance Intensity, Stereo Imaging and Distance Position
  • Loudness Balance, Performance Intensity, Stereo Imaging and Distance Position

Certain combinations of elements may also be charted as opposing axes on the same X-Y graph. Some combinations of elements are better suited than others for illustrating data. Figure 9.7 plots the stereo image size and position and the pitch density of several sound sources, positioning the sounds in perceived lateral space and by frequency/pitch content, what some refer to as “spectral space” (Smalley 2007) or “pitch space” (A.F. Moore 2012a, 31). This juxtaposition works visually, whereas graphing other combinations from the ten pairings described above may not be as successful; for instance, materials in a graph of performance intensity on the X-axis and loudness level on the Y-axis may be confusing. The two axes of Figure 9.7 are capable of presenting these two source attributes with as much precision and accuracy as the analyst wishes to seek; this graph allows considerable detail to be observed for these two dimensions of the “soundbox,” described in the next section.

Figure 9.7 Non-temporal X-Y graph comparing pitch density and stereo imaging; the Beatles’ “Lucy in the Sky with Diamonds,” Yellow Submarine (1999), 0:00–0:31.

Subtle changes or aspects such as small gradations of size, location, etc. are often significant to recording elements, and are often temporal in some way. All subtle changes and qualities have the potential to be significant to the track. An analyst choosing to use non-temporal graphs or diagrams may be forced to omit or condense details that cannot be incorporated into the format; changing formats for displaying data may be required if the data of the track cannot be clearly represented.

As these graphs are non-temporal (do not change over time), materials or elements that change over time are typically generalized into a single image. Elements that exhibit changes are difficult to notate, or illustrate, and might be generalized similarly. It follows that these graphs are inherently less detailed, imprecise to some degree (depending on the track and materials), and represent some span of time.

As time is not incorporated into these illustrations, the time period represented needs to be identified. These graphs (or illustrations) represent snapshots of time, or defined durations within which the graph’s content is present. This time period can represent syncrisis time units (Tagg 2013, 385) of an extended present, an integrated auditory scene (Bregman 1990) of some duration less than a syncrisis unit or extending beyond the window of ‘now sound,’ a structural song section (an appropriate division for numerous elements of many tracks), or a generalization of an entire track. Any time span appropriate to the element(s) and to the track may be represented here. Typically, the longer the time span, the more likely the illustration is to miss details, as the track’s subtle information is increasingly absorbed into an overall impression.
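One way to keep the represented time span attached to a non-temporal graph is to store the two together. The sketch below is a minimal illustration under stated assumptions: the class names `SourceImage` and `Snapshot`, the lateral scale of -1.0 (far left) to +1.0 (far right), and every numeric value for the sources are invented placeholders, not measurements from any track. Only the 0 to 31 second span echoes the window stated in the caption of Figure 9.7.

```python
from dataclasses import dataclass, field


@dataclass
class SourceImage:
    """One source on a non-temporal pitch-density / stereo-imaging graph."""
    name: str
    left: float     # left image boundary on an assumed -1.0 to +1.0 scale
    right: float    # right image boundary on the same assumed scale
    low_hz: float   # lower edge of the source's pitch/frequency area
    high_hz: float  # upper edge of the source's pitch/frequency area


@dataclass
class Snapshot:
    """A non-temporal graph stored with the time span it generalizes."""
    start_s: float
    end_s: float
    sources: list = field(default_factory=list)

    def duration(self):
        """Length in seconds of the span the graph represents."""
        return self.end_s - self.start_s


def overlaps(a, b):
    """True when two images share both lateral space and pitch range."""
    lateral = a.left < b.right and b.left < a.right
    spectral = a.low_hz < b.high_hz and b.low_hz < a.high_hz
    return lateral and spectral


# Placeholder values for illustration only (hypothetical sources).
snap = Snapshot(start_s=0.0, end_s=31.0, sources=[
    SourceImage("source A", -0.2, 0.2, 200.0, 900.0),
    SourceImage("source B", 0.1, 0.6, 500.0, 2000.0),
    SourceImage("source C", -0.9, -0.5, 60.0, 180.0),
])

a, b, c = snap.sources
print(snap.duration())  # 31.0
print(overlaps(a, b))   # True: the images share lateral and spectral area
print(overlaps(a, c))   # False: the images are laterally separate
```

Because each `Snapshot` is a single generalized image, a change in any source's position or size within the span would call for a second snapshot with its own start and end times.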

The sound stage diagrams introduced in Chapter 8 are examples of non-temporal illustrations. Those diagrams represent defined periods of time, which may be considered as ‘scenes’ (Chapter 8). Sources placed on those diagrams can be precisely located on the scaled sound stage (see Figure 8.9), for image location and width boundaries as well as distance position; those diagrams incorporate scales that allow sounds to be placed accurately and in detail.

Sources are located with less precision on the proximate sound stage (see Figure 8.8), where lateral and distance axes are not scaled. Image boundaries and positions are more generalized, though their relationships with other sources and the listener position are apparent. Figure 9.8 uses this proximate sound stage to localize sources in lateral size and distance from the listener point of audition, and also to provide an illustration of the size of source host environments, and their relationships.

Thus, the perceived depth of the sound sources plus their individual host environments might be incorporated into the proximate sound stage, providing additional detail to the depth of the sources and of the sound stage. This allows source and environment placements to be conceptualized on the sound stage, though not placed in exact positions. Sources can be localized with proxemics (with as much detail as needed), though environment size is an interpretive approximation of the size of the space and the distance of the source from the front of the space and its rear wall. Some of these qualities are not present in artificial spaces. Sizes and locations of host environments are determined largely by comparing one sound to another, one space to others, etc. The non-scaled placement of sources within this format is more appropriate for these percepts, which cannot be placed with proxemics or against a scale. The increments of space used to divide the scaled sound stage do not conform to these percepts; while one cannot identify precise widths and depths of spaces, we do have a sense of the distance of the source from the boundaries of the space, and can use that sense to assess the relationship of the source to the geometry of its host environment. Figure 9.8 demonstrates placements of sources plus their host environments on a sound stage; stated differently, this non-temporal figure allows three elements (stereo imaging, distance location, host environment size) to be illustrated in one place, and for comparisons to emerge. The track’s multiple spaces, the interactions of spatial simultaneity (Smalley 1997, 124), and the interrelationships of host spaces to the holistic space might be observed with the aid of this illustration.

The Soundbox and Other Approaches to Observing Multiple Recording Elements

The approaches discussed here, including the soundbox and various sound stages, mark a transition from data collection and display toward engaging the potentials of the elements in shaping the record (we will seek to retain clarity between the acts of collecting or observing data, and of evaluating it). Each offers guidance to access and recognize the contributions of recording elements to the track, and the interdependence of elements as each contributes to confluence. The acts of examining and of recognizing contributions of elements are the evaluation and conclusion processes, which will be explored in detail within Chapter 10. Only a few of the approaches below offer ways to illustrate or notate elements. While illustration and notation (of all sorts) are rife with issues we have discussed before, they hold the benefits of collecting, refining, visualizing and holding data; as observation progresses, it is simply impossible to accurately hold all information in one’s mind.

Figure 9.8 Proximate sound stage of “Let It Be” (Past Masters, Volume Two, 1988) 0:00–1:01. Diagram illustrates sound source image positions and size, outlined by the widths and depths of their host environments.

The general qualities of the proximate sound stage have some similarity to the soundbox (briefly discussed in previous chapters). The soundbox contains the dimensions that numerous scholars engage when discussing tracks: stereo field, depth of sound stage and frequency range.

Allan Moore (1992) offered the soundbox as an approach to illustrating some of the primary elements of records; although it has some similarity to sound staging, it was devised quite separately. It is also distinctly different. The soundbox “is a heuristic model of the way sound-source location works in recordings, acting as a virtual spatial ‘enclosure’ for the mapping of sources . . . locations can be described in terms of four dimensions. The first, time, is obvious” (A.F. Moore 2012a, 31). The other three are the stereo image, distance (which he identifies as “perceived proximity of aspects of the image to ... a listener”) and “the perceived frequency characteristics of sound-sources” (ibid.). The soundbox is “almost like an abstract, three dimensional television screen” (Moore & Martin 2019, 149), positioning sound sources in frequency/pitch range (as in pitch density, above), in stereo positioning and image size, and in perceived proximity to (distance from) the listener position; using terminology offered within this writing, it combines stereo imaging, distance positioning, and pitch density. Like sound stage diagrams, the soundbox is at the perspective of the composite sound; it illustrates strands of instrumental timbre “conceived with reference to a ‘virtual textural space,’ envisaged as an empty cube” (ibid.). The fourth dimension, time, represents a span of time, much like sound stages; illustrating changes to source positions in any of the three dimensions requires a new soundbox. It is challenging to make motion of images (changes of positions) clear in any illustration that does not incorporate a timeline, including the soundbox.

Allan Moore and colleagues have applied the soundbox to numerous tracks pursuing a variety of goals,6 including a taxonomy study (Dockwray & Moore 2010). The soundbox can convincingly illustrate the relative placement of a moderate number of sources (adequate for many tracks) within a conceptual three dimensionality of space. It combines percepts of two perceived physical dimensions and one metaphorical conception of the “‘highness’ or ‘lowness’” of pitch/frequency (Doyle 2005, 27). What the soundbox loses by way of precision of displaying data, it often gains in establishing a readily identifiable three-dimensional visual representation of sources. The similarity of the soundbox to the visual approach of representing sound sources as circles used by David Gibson (2005) has been acknowledged (Dockwray & Moore 2010, 224–225). Soundbox diagrams use simplified representations of specific sound sources (images of instruments and voices) morphed to occupy the three-dimensional space of the sound.

It should be clear that the soundbox examines individual sound sources at the basic-level, as does the sound stage; this perspective allows comparison of sources at the level of the composite texture as well. These levels of perspective are the basis of approaches offered by the following scholars as well. As we have engaged many times to this point, the same percept can be defined differently on different levels of perspective. The audible pitch/frequency range (divided into registers) that I use to chart “pitch density” on the composite level (and timbral balance in overall texture) is defined as ‘register’ or the “height” of a sound by Allan Moore (2012a, 31) for the vertical axis in the soundbox. Lelio Camilleri (2010, 202) conceives the audible pitch/frequency range as “spectral space”; it is “height” to Anne Danielsen (2006, 52) and “frequency spectrum (height)” to Albin Zak (2001, 144); Jay Hodgson (2017, 220) approaches the audible pitch/frequency range as a “vertical plane.” Considering a percept from a slightly different conceptual angle, perhaps as an object experienced in crystallized form, can change how one perceives the concept (element, percept, or confluence) without altering its substance; if solely for this reason, each of these approaches (and those of others) holds value for our analyses. There are other reasons to be sure; each considers similar aspects in unique ways, and some explore other dimensions of tracks. These approaches have some inherent differences, but they largely engage the same percepts from different angles.

Lelio Camilleri (2010, 201) approaches the interaction of recording elements as a three-dimensional “sonic space” “to indicate the space in which the piece unfolds in recorded format.” The three dimensions are localized space, spectral space and morphological space. Localized space is the area wherein sounds are placed in stereo and mono, and includes depth, position and motion; this reflects two axes of the soundbox and also of the sound stage. Camilleri offers: “[T]he spectral content (timbre) of sound plays a relevant role in the overall perception of space. . . . the notion of spectral space . . . is metaphorical since there is no such physical space” (ibid., 202). Within the spectral space [the pitch/frequency range of the track], the spectral content of the sounds used can establish experiences of saturation or emptiness within the space; in addition, Camilleri acknowledges spectral space at the perspective of the overall sound (timbral balance): “[T]he combination of the spectral content of sounds and their disposition can accentuate the various sensory experiences to be had from listening to the overall sound structure” (ibid.). The third dimension is morphological space, as sound unfolds temporally to develop the shapes of sounds, and perhaps evoke motion and a sense of direction; this can be at the perspective of sound source timbre, though its implications can manifest at all structural levels if one remembers the equivalence of elements at all structural levels, and timbre’s central role as a recording element.

Albin Zak (2001) views the confluence of recording elements as a four-dimensional space, supported by incorporating concepts of mixing music offered by George Massenburg. The approach “highlights the interactive nature of the relationships among individual elements and larger composites—artifacts and gestures—and points to the ongoing shifts in perspective that a record makes available through its manipulation of ‘four-dimensional space’” (Zak 2001, 144). Three of the dimensions are familiar: the stereo soundstage (width), the frequency spectrum (height), and “the combination of elements that account for relations of prominence (depth)” and the “fourth dimension is the progression of events, the narrative or montage” (ibid.). The fourth is temporal (as also identified by A.F. Moore and Camilleri), though it seeks information on all levels of perspective and acknowledges the unfolding of drama, structure and simultaneous, perhaps unrelated materials, elements or sounds. His use of ‘prominence’ was discussed earlier.

The soundbox is reframed with six components by Jay Hodgson (2017, 218–221). The components are auditory horizon, horizontal plane, horizontal span, proximity plane, vertical plane and vertical span. The auditory horizon “constitutes the total reach of the mix’s ‘earshot’”; horizontal plane “describes where a sound is heard in relation to center, and we call the total horizontal expanse of a mix its ‘Horizontal Span’”; the proximity plane “describes the position of sounds . . . vis-à-vis its Auditory Horizon . . . [and] represents a mix’s ability to hear depth, with the Auditory Horizon comprising its far limit” (ibid., 220). Hodgson identifies the proximity plane as perhaps the most significant component of a soundbox. The vertical plane and vertical span describe the mix’s “capacity to hear vertically” (ibid.) and work together to identify the span (highest to lowest) of frequency content (vertical plane) in the mix. The “width, height, depth and temporal change” dimensions of the soundbox offered by Allan Moore (2010, 258) to discuss the textures established by interacting recording elements are engaged by Hodgson as dimensions of horizontal plane, vertical plane, and proximity plane; each ‘plane’ is then observed for its activity, with spans of farthest left and right image locations for horizontal span, lowest and highest pitch/frequencies for vertical span, and an auditory horizon used as a contextual reference for recognizing proximity (distance) of sources. At the perspective of the overall texture, horizontal span and auditory horizon align with three of the boundaries of the sound stage and vertical span represents the range of timbral balance for a track. Hodgson’s work offers the defining dimensions of the soundbox, incorporating some additional concepts to present an approach that has the potential to open readily to typology and application to analysis.

Anne Danielsen (1998) has provided a unique conception of a soundbox—originally described by the term “lydrom” (meaning ‘sound room’ in Norwegian)—with some similar dimensionality, though defined with a sense of functionality. “Her conceptualization of the sound box . . . was an attempt to capture processes within the sound—for example, radical change and the lack of continuity in time and/or space caused by the montage-like aesthetics” (Brøvig-Hanssen & Danielsen 2013, 72) found within Prince’s Diamonds and Pearls album (1991). This approach recognizes the recording elements that comprise the sound room (arranged similarly to the soundbox) as potentially acting in confluence and interdependence with the materials and elements of music. In discussing the interactions of pitch, timbre, dynamics, rhythm and melody merging into a heterogeneous groove sound of funk, Danielsen (2006, 51–52) identifies their interaction with the track’s space and the spatial differentiation of sounds:

The sound room—as applied here—illustrates a means to recognize the spatial delineation of musical materials, as they function to provide rhythmic propulsion to the mix of the groove sound.

With Danielsen’s sound room, we transition toward the confluence of the three domains and the many performances within tracks. The temporal change dimension is the most problematic within the approaches cited. Except for collecting and observing rhythmic patterning, temporal change leads directly to evaluation, to structural hierarchies, and much more. We will return to these approaches to recording elements, and add others, as we engage data evaluation and formulating conclusions in Chapter 10.

The Mix as Metaphor: Confluence and Interdependence of Domains

I read in Serge Lacasse’s offering of “phonographic discourse” a recognition of the confluence and interdependence of the domains of recording, music, and lyrics and their performance. As examined in Chapter 1, the act of ‘composition’ reaches not only through the domains, but encompasses their performances as well; initiating a cyclic process, performances (their interpretations, expression, elaborations and improvisation) provide compositional elements, additional ideas and considerable nuance. The domains are the raw materials that the performance (with gestures whether predetermined or spontaneous) is shaping; composition is realized in performance, and performance, adding substance, becomes the artifact that is the composition. The concept of ‘mixing’ all these becomes a metaphor for their interdependence and their confluence. Confluence acknowledges (as the domains combine and their elements lose their independent characters) that materials and elements blend to become something else; the performance (recording included) transports materials into other gestures and shapes by means of absorbing qualities from any element in all domains. The composition and the performance are one, as are the track’s recording, music, lyrics, expression, and so forth. Each performance “is never exactly re-executable” even by the same artist(s) (ibid., 8); only the captured moments of performance and the resultant crafted confluence that is the track remain unchangeable, and unique among all others.

Figure 9.9 illustrates a conceptualization of the interdependence of domains and the performance, with the resultant interplay and enmeshed texture established. Our activities thus far have been to explore the individual elements of each domain, and how they manifest in performance. We have also discussed their interactions in a higher dimension, and that they ultimately blend into a unique and coherent whole. To begin engaging the complexity of their confluence, we embrace all aspects of the track as equivalent, and we seek to observe all elements/domains at all perspectives with that intention.

Figure 9.9 Confluence of domains and performance, within the context, character, affects and interpretation of the track.

The diagram of Figure 9.9 may be more properly examined from a polar opposite position: that the domains and their elements (and all else in the track) are extractions from the whole. Analysis deconstructs what is present into smaller parts; it does not add the observed parts to establish an overall texture. The parts we identify and observe cannot be summed to make the whole; the sonic experience and aural image of the track are different from any assemblage. What is present within our observations is what we have chosen: chosen to include, and chosen to omit. Our observations will not include all aspects of the track, whether by choice or because some elude our attention. Further, and importantly, there are materials and meanings that are generated by the interdependence and interrelationships of domains and performance, as well as attributes of the track emerging from outside the domains and from outside the track. The track is not self-contained and isolated; it is situated in context, it establishes a unique character, and it produces affects within (and/or from) listener interpretation. The confluence of the track contains much, but it is the whole that is its essence, and its unique voice.

Observing the Confluence of the Track

Peering into this confluence is facilitated by observing the domains (and what they contain) simultaneously, at least to some degree. Observing these materials allows observing their interactions; in the best of situations, these observations would allow evaluations to follow, and lead to conclusions from those evaluations. We may thus identify covariances among dimensions/domains/elements, or the lack thereof, and other patterns or characteristics to connect domains. This is less straightforward when comparing across domains (though challenges within domains can at times be considerable). It is through all of this that our characterization of the track, based on content and context, emerges.

A considerable amount of data in each domain has been assembled (over the course of the previous chapters). Displaying the data of all these observations in one place, in such a way as to allow evaluation, is challenging. In whatever manner one approaches information display, at this stage (with so much data) some editing will take place. This has already happened (to some degree) in the processes of examining materials/elements and notating or noting their qualities and activities. Seemingly significant features of the track guided the analyst to collect more detailed information on some elements than others; it is likely one doubled back later and collected more detailed information on elements or materials that were at first deemed less significant or went unnoticed. The process of displaying data will also acknowledge functions of materials; this allows contextual and supportive materials/elements to be included and their significance observed alongside the primary and secondary ideas. This begins the evaluation stage of the process.

Figure 9.10 offers a format to display organized and summarized data in a way that might facilitate comparison. Information collected from each domain can be referenced in this timeline chart without the actual data appearing, or with data appearing in little detail. This is a summary of observations that can be used to reference more detailed observations retained elsewhere.

Reviewing Figure 9.10, each domain has an area in which observations may be listed. The three domains are located separately; performance observations are woven within those domains. Domain areas are distributed around the track’s timeline (which is divided into sections, but could have more detail if desired). Recording elements are located closest to the timeline, and in the most prominent location, because our goal is recording analysis—not music analysis, performance analysis, or lyrics analysis of popular records. Should one undertake an analysis that would emphasize music—or lyrics or performance—the arrangement of domains could justly be transformed to be appropriate for the track being studied.

Observations might be generalized for entire structural sections. Alternatively, element activity or materials may be placed against the timeline to illustrate their placement in time.

Element data displayed within any domain might represent specific material, salient features, features that may be significant, or some type of generalization. In placing information on this chart the analyst is engaging the evaluation process by the selection of what to include. Leaving space to add features to the chart can be important; further, at some point information on the timeline might be removed or condensed. Acknowledging primary, secondary, supportive or contextual roles will aid in organizing and delineating materials, inasmuch as it does not prematurely cause one to evaluate materials before examining all (or at least the most relevant) conditions. That which is prominent to the analyst could be appropriately noted; knowing what captures focus and attention allows one to determine attributes that bring interest, and allows one to willfully ignore what is prominent in order to discover subtle characteristics of attributes and nuance.

Figure 9.10 Master chart of element activities and of materials within the three domains, qualities added by performance, and of interpretation, context, affects, character and characteristics of the track.

Interactions of domains (e.g., complementary, parallel, delineated, or blended activity) have potential to become visible here, and to be recorded in a location that sets them apart. Among interactions that may be explored are:

  • Music and lyrics
  • Lyrics and recording
  • Recording and music
  • Performances of individual sources, reflected in various elements/domains
  • Performance of lyrics
  • Performances of bonded groups of sources

An area of the chart is reserved for observing these interactions. There the affects generated within the track might also be noted. Aspects of interpretation, character, context and other observations can be included in this area. This chart serves as a way to track all of the information collected in a central location.
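For analysts who keep their summaries in digital form, the layout described for Figure 9.10 might be sketched as a small data structure. The Python below is a minimal illustration under stated assumptions, not a transcription of the figure: the class name `MasterChart`, its methods, the section labels, start times and the sample notes are all hypothetical. It assumes one area per domain plus a separate area for interactions, and it tags each note with one of the roles named above (primary, secondary, supportive, contextual).

```python
from collections import defaultdict

# Three domains plus the reserved area for interactions (assumed layout).
DOMAINS = ("recording", "music", "lyrics", "interactions")
ROLES = ("primary", "secondary", "supportive", "contextual")


class MasterChart:
    """Summary notes per structural section and domain (hypothetical layout)."""

    def __init__(self, sections):
        # sections: ordered list of (label, start_seconds) tuples
        self.sections = list(sections)
        self.notes = defaultdict(list)  # (section label, domain) -> notes

    def add(self, section, domain, note, role="primary"):
        """Record one summary note; detailed data is kept elsewhere."""
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        if section not in {label for label, _ in self.sections}:
            raise ValueError(f"unknown section: {section}")
        self.notes[(section, domain)].append((role, note))

    def row(self, section):
        """Return the notes for one section, keyed by domain."""
        return {d: list(self.notes.get((section, d), [])) for d in DOMAINS}


# Invented placeholder entries, for illustration only.
chart = MasterChart([("Intro", 0), ("Verse 1", 15)])
chart.add("Verse 1", "recording", "lead vocal image narrows")
chart.add("Verse 1", "interactions", "lyrics and recording move in parallel",
          role="contextual")

print(chart.row("Verse 1"))
```

Keeping only brief notes here, with the detailed observations stored elsewhere, mirrors the summary function of the chart; entries can be condensed or removed later without touching the underlying data.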

TIMBRE AS CONFLUENCE

Timbral percepts are not delineated by the domains we have been so carefully dissecting; no clear line separates them from other elements, or one domain from the others. Timbre binds and infiltrates all other percepts into a coherent whole, a single aural image. This is, at least in large part, what makes talking about timbres, and observing and describing them, so vexing.

Timbres are gestalts, overall qualities; beneath this surface they are comprised of many dimensions containing subtle attributes of intricate nuance that are beyond our capacity to readily engage and perceive. Timbre is independent of its parts, however; it is a whole that is different from what the sum of its parts establishes. Timbre does not belong to any single domain; a timbre blends domains through the soundings of performance. We know timbres by experiencing them; they defy notation and visual representations as readily as they defy description.

The relationships of and between timbre’s acoustic properties (across elements and domains) and its physical dimensions, between its affects formed from interpreting musical expression and projecting performer physical exertion, between its sonic character, and between all these and listener perception and interpretation remain blurred. We cannot easily access timbral content or define its character because it is not simply an element of music, or of vocal production and language, or of recording; it is not simply a product of performance, or of instrument selection; it fundamentally relates to frequency (and pitch) and to amplitude (and loudness), but is much more complex; finally, it elicits symbols, images and associations from outside the track.

Here we acknowledge that the sound’s content and character span all domains, as timbre functions and manifests in significantly different ways within the dimensions of tracks. Timbre represents the confluence of sound source, of musical materials, of performance, of recording’s imprint, perhaps of language—and more.

Reaching back to Chapter 7, recall that to describe timbre is to address the sound’s context and character as much as its content—this holds for whatever level of perspective we seek to understand. Here we will reframe ‘talking about sound’ by broadening our conception of timbral content, and thereby also broadening the scope of its character and its context. We will not seek to invent a vocabulary for sound, rather we will strive to define and describe timbres by the attributes and interactions of elements articulated earlier.

Two significant levels of perspective for timbre will be explored: (1) the timbre of sound sources and (2) the timbre of the track (the timbre of the track’s overall sound).

The Confluence Within Sound Source Timbres

Timbre demands we engage interpretation to identify a source’s sound, and we typically cease inquiry once we know (or can imagine) the timbre’s source and the qualities it is expressing (message, degree of force, level of urgency, nature of expression, etc.). Timbre brings meanings and qualities that frustrate description and explanation and that are an important facet of the track.

Timbres are associated with sound sources, with their origin. Identifying timbres is typically naming a sound source; when timbres are heard we seek to recognize ‘what it is,’ and judge the timbre in relation to the inherent qualities of instrument types (their acoustic content). Source timbre represents a blending of causal factors and modifying influences (from the performance) with the materials performed (music and lyrics).

The sound source itself can carry cultural and stylistic associations within the musical context of the track, and can also summon personal meanings related to the source within the listener’s interpretation. Further, the source may present drama and language, and a performer might contribute persona.

The performer adds dimension to the sound source, with their personal interpretative style and performance technique—this is also content, but blends into physicality. The listener’s interpretation acquires a sense of the physical gestures of the performance; the level of physical exertion and expression (performance intensity) blends into the content and context of the timbre, and attention might be shifted to timbral character.

As this progresses, we often seek to define the character of a timbre to identify it by some analogy or cross-modal metaphor or other associative reference. This is all linked to interpretations, as Kate Heidemann (2016, 1.2) offers: “. . . describing timbre in the context of an interpretation motivated by visceral experience . . . [it is] difficult to find satisfying words or representations, and misunderstandings abound.” To provide an over-simplified example: when we hear the voice of a friend, we identify the person (by the gestalt of vocal timbre, comprised of content) and immediately attend to the character of the voice (how her expression and mood are manifest within the timbral content and the context of the communication); this attending to character applies interpretation, which is rife with opportunities for mistakes (misinterpretation).

Within the sound source timbre are complex acoustic content, physical activities and gestures, and some sense of emotion or feeling and association or meaning. Our perception of timbre is different, though. We interpret an overall quality that is independent of these parts and other intangibles (affects, etc.). The sonic signature (see Chapter 7) that is timbre is not an addition of its parts, but a reality of its own; it is a coherent whole, or a gestalt.

As we seek to communicate with others about this coherent whole, about ‘sound,’ we attempt to share our subjective experiences and our resulting interpretations with others—we continue our attempts to describe sound with some clarity.

The Continuing Quest to Describe Sound with Shared Meaning

It is important to recognize that descriptions of sounds are meant to inform others. We do not need to verbalize the qualities of timbre to make sense of it for ourselves. Our personal and even contextual understanding of timbres is nonverbal (including those expression and mood cues of our friend, above). Timbres of sources (as well as the timbre of the track and crystallized form) are auditory images; as such they are sensory memories (Baars & Gage 2013, 30) and are not time-dependent and are nonverbal (Snyder 2000, 216). Auditory images are sound objects in content, but they carry much about context external to their source, and the confluence of all these establishes their character. That tracks are listened to as containing auditory images might partially explain why so many aspects of music’s aesthetic meaning, moods, emotion and expression defy verbal explanation (Clarke 2011, 197–202), and why their richness and clarity diminish as they are forced to conform to language.

With timbres (at all levels of perspective), we are engaging an overall quality that defies verbal description. Should we use language, when no single word can explain its complexity? Should we attempt to verbalize about them when even lengthy descriptions in language diminish, and do not adequately reflect, their multi-dimensional character and content?

Lawrence Zbikowski (2011, 186–187) offers a contrast that might help:

Language externalizes a perception of the experience as an offering to others, and phenomenological musical consciousness is introspective and private, nonverbal, and infused with affects and the abstract.

To use language to describe the nonverbal is clearly incongruous—yet simultaneously it appears utterly necessary if we wish to communicate our interpretation of our experience of the track (phenomenologically or otherwise framed) to others.

For at least part of what we seek to accomplish in an analysis, some description of timbre appears necessary—regardless of the difficulty, or perhaps impossibility. The central role of timbre to the track’s sound—and all the sounds it contains—is overwhelmingly obvious. As we engage the inherent multi-domain nature of timbral content and perception, this difficulty becomes clearer.

Chapter 7 examined the challenges of talking about sound at great length. In the end, the matter was advanced, but remained unresolved. Here there might be a bit more resolution, as the notion of timbre is broadened, the richness of its multi-domain gestalt is acknowledged, and confluent elements across domains plus external factors might form a more viable approach to timbre analysis and description.

To make this shift will require discipline and attention, though. Our natural inclination is to seek to describe any sound—any timbre—with a single word or a few descriptive words; often with words utterly unrelated to sound.7 We articulate an interpretation to represent a complex gestalt, reaching for language to describe what we interpret as core qualities and character of the ‘sound’ plus all it brings forth in us, but rarely do we address its content, context or nuances of character. When we use ecological terminology (Clarke 2005, 197), we can be prone to overly simplify observations into highly personalized interpretations, though this need not be the result—should we decide to approach timbres differently.

As our data set of attributes increases with the confluence of all aspects of the track, it becomes clear no single word can represent all that timbre contains. We can transition to a definition of timbre that includes a description of its attributes and their interactions; terms such as those in Table 6.6 might have adequate meaning when the sound is defined, should the analyst (listener) wish to continue this practice of using them. Such terms could just as readily be abandoned.

We can expect no direct vocabulary to emerge to address timbres, though. In effect, the complexities of timbre ensure that the unique qualities of each amount to a multivariable calculus formula in themselves; the relationships of its parts are as important as their content, and their interactions establish further depth and breadth of content and meaning that the formula could not predict. Each sound is different from others—note to note, between and within instruments, one vocal sound or voice to the next; the timbre of each track is different. Engaging these relationships and interactions, as well as content and context, might provide some tangible timbral information, albeit an incomplete approximation. Some shared meaning and understanding of timbres might emerge from descriptions based on observations of their content, character and context.

To summarize and recall what has been covered elsewhere: a hierarchical relationship exists between character and content. Character describes the overall quality (gestalt) of the timbre; content defines the attributes and traits of the component parts within the timbre. Context is external to the timbre, to the sound or the track—timbre’s associations and meanings connect it to matters outside the track, and these may also situate the timbre within the track or establish a conceptual frame of reference for the sound.

Typology of Timbre in Confluence

As domains become blended, examining the confluence with a goal to define or describe its timbre gestalt appears overwhelming, and provides no real access point. It seems more appropriate to shift to an articulation (observation and examination) of physical attributes, of perceptual impressions and interpretations, and of the perceived physicality of sound production. Describing timbre might then turn to the attributes and their values (or variations) within three views of timbre: (1) those that are interpretation-related, (2) those that engage the content of the waveform, and (3) those that relate to the physiological. Information pulled from these categories might function toward defining a timbre by contributing to (1) its character, (2) its acoustic content, and/or (3) the context in which it is situated or which it establishes; some attributes may clearly be associated with content, character or context, while other attributes may apply to several, though differently to each.

Table 9.4 is a listing of attributes that might be included within a timbre typology table at the perspective of the sound source. The collection of variables (attributes) selected will be most effective when it conforms to the salient features of the sound source being studied. This listing will provide some guidance in assembling a suitable timbre typology table; it is not intended to be all-inclusive.

Rarely will one attribute (or element) dominate or dictate the character of a timbre. Timbre’s components are always interacting and interfering with one another—like micro-auditory streams. Timbral content is essential to the gestalt context; it emerges from all three domains and is the basis of what happens physiologically as it elicits interpretation from perception and other factors. Descriptions of timbre attributes are incomplete without content.

Using these categories and functions as references, the inner workings of timbre might be described with some consistency and substance. This approach might establish a framework of sorts for others to understand what one is identifying and to communicate something of substance. Clearly a single word for a sound will not emerge from this process. Describing sounds will be far more involved than offering the first descriptive term that comes to mind; this will be a decidedly positive step toward discussing a timbre’s character and content within the contexts of tracks, or as sound objects independent of context.

It is possible to discuss timbre by describing elements relative to one another. Each timbre has its own formula or algorithm of how the domains and elements (in their content, character and context) blend in confluence to establish its unique nature. An open process of description based on observation and evaluation of all pertinent timbral components and of its gestalt qualities is proposed here—typology might facilitate this.

Table 9.4 Interpretation-related, acoustic and physiological attributes that might be included within a timbre typology table at the perspective of the individual sound source.

Interpretation-Related (Psychological, Perceptual) | Physical Content (Acoustic) | Physiological (Visceral, Implied)

Source identification or recognition | Inherent acoustic content of sound source | Visceral connection with performance
Expression: musical, dramatic | Dynamic envelope | Implied physical gestures
Levels of energy, force or exertion | Spectrum | Levels of energy, force or exertion
Strain and ease of performer | Spectral envelope | Strain and ease of performer
Clarity or distortion of performed sound | Definition of fundamental frequency | Visceral feelings of affects
Affects, moods, emotions | Space: width and location | Idiomatic modes of performance
External connotations and associations (such as cause) | Space: distance and depth | Deviations from idiomatic playing
Tension level of sound changes and musical movement | Space: echo, reverb, environment | Tension driven motion of performance technique
Semiotic meanings attached to sound source | Space: spectral content of reverberation | Performance techniques
Language communications | Timbre of time | Implied meanings of imagined physical gestures
Symbolism | Modifications to physical dimensions | Level of difficulty of materials
Perceived meanings of paralanguage sounds | Noise elements within spectrum | Athleticism of performance
Realism and surrealism | Inherent timbral traits of performance style | Energy, intensity, exertion, speed
External associations elicited by breath, body, and performance sounds | Content of breath, body, and performance sounds | Persona
Drama, persona | Listener connection with performer | Language and paralanguage sounds

Describing sound can become an act of addressing the values of its attributes. The process of investigating and observing attributes will allow one to identify the features that are defining features of the sound, to guide further observation of other attributes. The attributes of Table 9.4 can be observed, and their values collected; when collected, observations can be recognized as relating to content, character and context and categorized on the typology table. With acquired facility, these two steps might be combined. Formulating a description requires some evaluation of data; entering evaluation now, prominent attributes that provide the timbre with distinctive traits are identified and described. Attributes that provide distinctive ornamentation to the timbre may be pertinent, and certainly other observations will bring further connections and interrelationships between elements. A detailed ‘definition’ replaces the single word description; the definition identifies the distinctive features within its content, the nature of its character, and its relationships to contexts in which it appears. It will be helpful for this definition to acknowledge that summing these parts will not adequately reflect the whole of the sound. The whole of the sound is something other than what these parts put together might represent.

Table 9.5 A general typology table for timbre, with attributes spanning timbral content, character and context.


Category | Variable (Attribute, Dimension) | Value (Traits)

Content | Dynamic envelope |
        | Spectral content |
        | Spectral envelope |
        | Definition of fundamental frequency |
        | Noise components |
        | Formant frequencies and characteristics |
        | OTHER |
        | OTHER |
Character | Overall quality or distinguishing nature |
          | Emotions or sub-emotions |
          | Expressive qualities |
          | Aesthetics |
          | External associations |
          | Physiological connections |
          | Energy, intensity |
          | OTHER |
          | OTHER |
          | OTHER |
Context | Source or origin of the timbre |
        | Semiotics |
        | External associations |
        | Cultural meanings and connections |
        | Conformity to the texture (blend or revealed) |
        | Functional relationship to other sources |
        | Aesthetic |
        | OTHER |
        | OTHER |
        | OTHER |
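For analysts who keep their observations in a notebook or script, the process of collecting attribute values, sorting them into content, character and context, and then marking the prominent attributes might be sketched as a small data structure. This is only an illustrative sketch and not part of the method itself: the class names, the `prominent` flag and the sample observations are all hypothetical, and the resulting text is an aid to writing a definition, never a substitute for the gestalt of the sound.

```python
from dataclasses import dataclass, field

# The three groupings used in the general typology table.
CATEGORIES = ("content", "character", "context")


@dataclass
class Observation:
    attribute: str           # e.g. "Dynamic envelope"
    value: str               # the trait the analyst observed
    category: str            # one of CATEGORIES
    prominent: bool = False  # hypothetical flag: a defining feature?


@dataclass
class TimbreTypology:
    source: str
    observations: list = field(default_factory=list)

    def add(self, attribute, value, category, prominent=False):
        """Record one observed attribute and the category it relates to."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.observations.append(
            Observation(attribute, value, category, prominent)
        )

    def by_category(self):
        """Group the collected observations as rows of a typology table."""
        table = {category: [] for category in CATEGORIES}
        for obs in self.observations:
            table[obs.category].append(obs)
        return table

    def definition(self):
        """List only the prominent attributes, grouped by category."""
        lines = []
        for category, items in self.by_category().items():
            traits = [f"{o.attribute}: {o.value}" for o in items if o.prominent]
            if traits:
                lines.append(f"{category}: " + "; ".join(traits))
        return "\n".join(lines)


# Hypothetical observations of a lead vocal, for illustration only.
vocal = TimbreTypology("lead vocal")
vocal.add("Dynamic envelope", "fast attack, long sustain", "content", prominent=True)
vocal.add("Noise components", "audible breath before phrases", "content")
vocal.add("Expressive qualities", "urgent, strained", "character", prominent=True)
print(vocal.definition())
```

An attribute that applies to more than one category can simply be added more than once, with a different value for each.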

Given that the purpose of language—and the description of timbre is verbal—is to share information with others, we must decide what information to share. The questions that arise, then, relate to what to include or emphasize in a typology table, or how to interpret what has been observed. Some useful questions are:

  • What is it we need to share to communicate our observations?
  • What are we most drawn to share from personal bias?
  • What is needed to represent the fundamental traits of the timbre?
  • What traits provide the timbre with its unique character?
  • What physical components provide it with its unique sonic quality?
  • What within the timbre is important to context, to expression or to its function?
  • What is required to achieve the goals of the analysis?

The Timbre of the Track

The timbre of the track is the ‘sound’ that is a significant trait of any record. The confluence of all sound sources—and all that they contain and represent—brings all percepts within the track to fuse into a single distinct timbre; it is a global, aggregate texture, and also a single impression.

The timbre of the track has a duality of content and character. It also establishes and reflects the context of the record—its overall affects and aesthetics, its energy and expression, its sense of directed motion and level of intensity, its atmosphere and sense of space, its drama and more, all coalesced into a singular and complete intrinsic character of the track—into a coherent whole. The aspects of content, character and context are the core ‘sound’ of the track—they establish and embody its sonic signature (sometimes called signature sound) that can often be recognized from just a brief exposure to a track.

The timbre of the track exists simultaneously as one dimension of crystallized form, and also as the fabric of all the track’s content at the highest level of structure and of perspective. This extends the duality of the track (1) as ‘character’ conceived as a single sound object or aural image (where all is present at once and exists without constraint of time), and (2) as ‘content’ conceived as an event experienced as the confluence of all materials and elements as they unfold over time within context of the track’s activities and structure, and as they coalesce into a single overall, changing texture or fabric (or timbre).

When manifest within the structure of the track, this overall timbre of the track is at the highest structural level, and is conceived at the highest level of perspective. Within structure, the ‘timbre of the track’ is temporal, changing dynamically over time, and it contains all of the materials and elements of all domains—included are the nuance and subtleties of all elements and materials at all dimensions, wherein the qualities of all can be observed and ‘appreciated’ for their contributions to the whole.

The timbre of the track’s aggregate texture is comprised of superimposed strands of materials and activities of musical materials, lyrics, and recording elements. The features of this texture are unique for all genres of music and to some degree for each track. A typology of the aggregate texture may be related to density and range, to the number of strands and the placement of the strands within the range of a particular element, to the relationships of elements and domains, etc.

In considering recording elements, insight into this confluence can be obtained by contrasting program dynamic contour, timbral balance (pitch density) and loudness balance as if they were (respectively) the dynamic envelope, spectrum and spectral envelope of the track. The interdependence of these establishes a gestalt that is at the core of the timbre of the track.

It may be helpful to pause for a moment to remember these concepts, and to reframe them here. Program dynamic contour is how the overall loudness level established by all sounds and the materials they present evolves over the duration of the track. Pitch density is the pitch/frequency range that each sound source (and its musical materials) occupies within the spectral space (frequency range) of the track’s timbral balance. Loudness balance is how the loudness levels of the track’s sound sources (each of which is also represented in pitch density) change and interact over the duration of the track. These three dimensions change fluidly, unfold temporally, and shape the sound structure of the track—they embody the timbre of the track just as dynamic envelope, spectral content, and spectral envelope are the content of an individual sound’s timbre. As timbre reaches across all elements and domains to establish character, the qualities of music and lyrics (etc.) contribute to the timbre of the track. The spatial identity of the track (highest dimension) that includes the traits and interrelationships of host environments and the holistic environment also provides components within the timbre of the track.
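The relationship between loudness balance (one contour per source) and program dynamic contour (one contour for the whole track) can be illustrated numerically. The sketch below is a rough stand-in under stated assumptions: it treats each source's contour as a list of signal levels in decibels, one per time window, and combines uncorrelated sources by summing their powers. The function names and sample values are hypothetical, and signal level in decibels is not the same thing as perceived loudness, so the result approximates only the physical side of the percept.

```python
import math


def combined_level_db(levels_db):
    """Combine the levels of uncorrelated sources (in dB) by power summation."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


def program_dynamic_contour(loudness_balance):
    """Derive one overall contour from per-source contours.

    loudness_balance maps a source name to its level (dB) in each
    successive time window; all sources are assumed to share the
    same windows.
    """
    windows = zip(*loudness_balance.values())
    return [combined_level_db(window) for window in windows]


# Hypothetical levels for two sources across three time windows.
balance = {
    "vocal": [60.0, 66.0, 70.0],
    "guitar": [60.0, 54.0, 70.0],
}
contour = program_dynamic_contour(balance)
print([round(level, 1) for level in contour])
```

Two sources at equal level combine to roughly 3 dB above either one alone, while a source well below another adds very little to the total; the overall contour therefore follows whichever sources are strongest in the loudness balance at each moment.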

Timbre of the Track and the Dimensions and Domains of the Track

Confluence (of all domains/elements, performance and outside influences) permeates the timbre of the track, as well as crystallized form. Confluence resides within the content of its texture, and character. Confluence includes the affects of emotion, energy and expression, along with the semiotics of meanings emerging from each domain. These—along with listener bias—are all included within the listener interpretation of the character of the timbre of the track. The reader will recognize these have all been explored previously in great detail.

Here we will consider how the timbre of the track and crystallized form are reflected in and relate to the character, context and content of the track.

Crystallized form—which will be explored more deeply in the next section—might be conceptualized as a stationary physical object; it can be identified as a stable ‘multidimensional outer layer’ with ‘a myriad of activity on a host of perspectives’ occurring internally. This conceptual ‘outer layer’ is the context of the track and the overall shape of the track; the conceptual activity that is internal to the object is the timbre of the track. The timbre of the track is the myriad of activity of internal content (structural hierarchy, and the domains/elements that comprise it) of crystallized form; crystallized form also carries additional qualities beyond the timbre of the track. The totality of the inner activity and the outer layer is crystallized form, as an all-encompassing character of the track.

The ‘outer layer’ of crystallized form is the highest level of perspective of elements and confluence within each domain, and interdependencies of domains. These contribute directly to the overall character and context of the track. The timbre of the track is at the structural level just below. The timbre of the track has an overall quality of character—just as do timbres of sources—that is established by the content of all that is within the track and the contexts that they carry or establish. The timbre of the track is structural, temporal and changes over time; crystallized form is an aural image, non-temporal and conceptualized outside time.

Recalling what was explained in Chapter 2, recording elements can serve a primary, supportive, ornamental or contextual function in shaping the track at the highest level of perspective. Of these functions, the contextual function creates references for the activities of materials (and the elements that create them) to assume primary, supportive or ornamental functions.

As recording elements manifest at the highest perspective, contextual recording elements are present in all tracks. Their quality provides a consistency throughout the track, and also establishes a reference against which other elements can be observed consistently. Contextual elements are stationary or static; their values or qualities are unwavering and do not change throughout the course of the track. While some values of these elements may be revealed slowly over the duration of the track, in total they establish a context and frame of reference against which all activities may be related (and evaluated).

Table 9.6 lists contextual dimensions, beginning with those that are established by the interdependence of all the track contains. Contextual elements for music and lyric domains are more fluid than those of recording. Tempo (for example) will always be present and contribute to context, though (conversely) tempo can be quite fluid; other stylistic traits such as tonality, groove, beats and ostinatos, etc., may or may not provide an individual track with context. Context represents a backdrop against which all other like attributes can be gauged, and their essential value does not change. The elements of music are highly variable in function, and those that are contextual in some tracks may well differ in others; the key or tonality of a track is a prominent exception. Lyrics’ contributions to context are also quite variable, and often to some degree unique to individual tracks; lyrics’ content is often linked to subject matter or message, and what they communicate can vary (even markedly) between individuals.

Table 9.6 Elements that may function to establish a sonic context within the timbre of the track.

Domain | Element

Confluence of Domains | Reference dynamic level
                      | Timbral Balance
                      | Overall shape or form of the track
                      | Overall emotion, energy, expression
                      | Embedded degree of tension and motion
                      | Degree and qualities of any final resolution
Elements of Recording | Holistic environment
                      | Sound Stage boundaries (stage left-to-right width, front edge of sound stage, & rear wall from depth)
Music Elements | Tempo, beat, groove, ostinato patterns, tonality, hooks, other
Elements of Lyrics | Singular impression of drama, mood, tension (etc.) of overall conception
                   | Subject, meanings of lyric, story or drama, language style, other

The temporal qualities of domains at the highest dimension contribute the substance of timbre of the track’s character and content; these present materials that are fluid—melodic relationships, harmonic motion, morphing timbral balance, evolving storyline. These, as all structural components and relations, occur over time and cannot be instantaneous (Handel 1993, 186). This content is embedded within all structural levels (levels of dimension or perspective), and can function as primary, secondary or support materials/elements in shaping the track. Table 9.7 lists the structural materials, organization and relationships within the domains at the timbre of the track’s structural level.

The timbre of the track manifests into one of the dimensions of crystallized form. It is a large dimension concept of aggregate texture of all domains that result in a timbral quality of the track—one of internal content, overall character and connections with external contexts. Crystallized form provides a different angle on the character and context of the timbre of the track.

As will be explored next, crystallized form contrasts starkly with the timbre of the track; the timbre of the track is an unfolding gestalt-sound at the highest level of structure and of perspective, whereas crystallized form exists outside temporal experience and coalesces in memory. The timbre of the track represents the internal activity within crystallized form’s shell—a shell that is frozen outside time as an instantaneous manifestation of the track’s presence.

Crystallized Form

The character of the timbre of the track leads to crystallized form. Crystallized form will conclude this discussion of confluence. Our discussion will broaden, though, as when we engage crystallized form we will progress into the connected topics of deep listening.

Crystallized form is (1) a quality inherent to the experiencing of a track and (2) a sense of deeply knowing and comprehending one’s interpretation of the track as an all-encompassing presence and impression. It is the combination of perceptual attributes, abstractions, social significances, aesthetic gestures and embodied experiences that for the listener/analyst personally, from one’s own vantage, characterize the track. Crystallized form is what is most memorable about the track—to a particular listener, or listening analyst—as it is formulated in reflection. What crystallizes in this quality of form may be highly personal, or one might acquire the ability to step outside their subjective vantage into a position of greater connection with others or that may allow substantive academic discourse.

Table 9.7 An incomplete listing of elements that are temporal and variable within the content of timbre of the track; these represent the structural materials, organization and relationships at the timbre of the track’s highest level of perspective.


Domain | Element

Highest Structural Dimension of Domains | Hierarchies of music and lyric materials
                                        | Hierarchies of recording elements
                                        | Variable affects: emotion, energy, expression
                                        | Semiotics within each domain
Elements of Recording | Program dynamic contour
                      | Timbral balance and pitch density
                      | Loudness balance
                      | Timbral qualities and performance intensity
                      | Sound stage positions of sources (lateral and distance positions, including image width & depth)
                      | Host environments of sources
Music Elements | Musical syntax creating motion and tension, other
Elements of Lyrics | Unfolding story or drama
                   | Word usage and meanings
                   | Sounds and rhythms of text
                   | Language syntax

The defining and holistic qualities and concepts of crystallized form are:

  • Cognized as a whole, apprehended in an instant as a singular manifestation
  • An impression, atmosphere, ambiance, aura that constitutes the track’s presence
  • Realized through the temporal experiencing of the track
  • Coalesced retrospectively through introspection
  • Multidimensional sound object, aural image (or auditory image)
  • An inherent, singular identity with multidimensional features that results from the convergence of the content and character of all elements and materials of all domains (including affects, energy, message, drama, etc.), and all that they elicit from within the listener’s biases and experiences, and their cultural context
  • A sense of awareness and of knowing the core, essential nature of the track
  • Coalesces within listener interpretation
  • Establishes and embodies the context of the record
  • Represents the character of the track

Inherent within crystallized form is (1) what is unique to the track and what constitutes its content, character, and context, and also (2) what is most salient and meaningful to the listener. Listener interpretation and their subjective vantage, and the (perhaps) more objective position of the analyst are explored and contrasted below.

Intrinsic Nature of Crystallized Form

Crystallized form is the manifestation of the entire track in an instant of realization; the entirety perceived at once. It is a single large-scale aural image, that is a nonverbal percept, an interpretation of the track, and a memory of the listening experience and reflections. It is the highest dimension of form. It is experienced as a sense of knowing or understanding.

Crystallized form may be framed as the essential nature of the track, as its singular intrinsic character as a whole. Crystallized form may be considered as the presence of the track—equivalent to that ‘feeling’ or ‘understanding’ (or something other) we experience after a motion picture, as we nonverbally reconcile the story, drama, characters, plot, ending, etc. into a single ‘sensation’ or ‘mood’ or ‘spirit’ or ‘impression’ or ‘whatever it is that we experience.’

It is, perhaps, the ‘higher essence’ any work of art contains that allows it to reach beyond the human condition to transcend its combined materials, reason, imagination and emotion; as the quality that allows the track to communicate similarly to many; perhaps at times some may experience it as a ‘higher consciousness.’ Crystallized form can be “something felt to be greater than oneself, yet somehow within oneself” (Burnham 2001, 195). The phenomenological within experience and consciousness are inherent to crystallized form; as such crystallized form is (1) a quality inherent to the experience of an art object—whether visual, aural, dramatic, etc.—and (2) a state of knowing and comprehending the track as the memory of its experience.

As an aural image and a large-dimension sound object—a unified and multidimensional entity or conception—it might be understood “as equivalent to an image schema: a cognitive construction that represents the abstract qualities of sound rather than a single perception of it . . . [with] invariant properties that can be examined by ‘looking’ from a variety of perspectives . . .” (Bourbon & Zagorski-Thomas 2017, 3). Its substance reflects not only its formal shape and domain content, but also the affects, meanings and aesthetic expressions of the track.

Origin of the Term

The term ‘crystallized form’ aligns with the principle that every track is unique. Each track is multidimensional, reflecting different dimensions, shapes, qualities—just as the surfaces, levels of transparency and angles of a geological crystal. Each track may be conceived as a physical, sounding object, all qualities present at once—that can be viewed from a variety of perspectives. Approached in this way, as crystallized form the track can be turned (like a crystal) to appreciate it from different vantage points, though its content has not changed; each hearing of the track represents “an incomplete view of the ‘object’ from a single perspective” (Bourbon & Zagorski-Thomas 2017, 3). The track’s richness will be further revealed from observation at each successive perspective, of every new vantage point, from considering character and confluence from a variety of angles (and so forth); as implied above, crystallized form contains all strata of perspective and structure, providing access to all levels of detail (from the all-encompassing whole to the microscopic) without altering content.

The term ‘crystallized form’ also emerged in analogy with ice. Water crystallizes when it freezes; while all motion of the liquid is stilled, it remains comprised fully of its original substance. Further, the ice crystal captures the moment of its formation; time is frozen at the moment of its realization. Within each ice crystal is a unique light patterning with many rarefactions, a unique size and shape, colorations and more; each track brings its unique patterning and sense of motion, its size and shape, the qualities of its materials and elements, and an inherent, singular identity that emerges from the convergence of its many parts into a holistic whole.

Chapter 2 compared visually observing a sculpture to crystallized form; this analogy could apply to any complex physical object, such as a crystal. This analogy is also helpful in recognizing that the object can be observed as a coherent whole, independent of its component parts—connecting with the concepts of acousmatic listening and sound object. At this highest level of perspective, crystallized form is most readily approached as an object and holistic perception; “holistic perception implies that objects are not broken into their component parts but simply perceived as wholes” (Neuhoff 2004d, 250). When shifting perspective to begin to notice the track’s component parts at their highest perspective, the relationships of components, as much as the components themselves, establish dimensionality to crystallized form. As listener perspective draws closer to the object, detail is added to dimensions and to analytical perception, and the parts can be separated from the whole to be observed (or analyzed), and then “glued together to form the whole” (ibid.)—as with timbre in Chapter 7.

Continuing this physical object analogy, we might accept the track in crystallized form as stationary and without external motion. Internally is the myriad of activity of the timbre of the track—with its independent motion on a host of perspectives—that is contained by the stable, multidimensional outer layer of crystallized form. In this way, the impression of crystallized form establishes a context for all that happens within the track. The crystallized form is complete at every moment in time; in essence, it exists out of time. Crystallized form disregards temporal evolution8 because all that it is—its entirety—is present at every moment. Denis Smalley (2007, 37–38) notes: “I can collapse the whole experience into a present moment, and that is largely how it rests in my memory.”

Aural Image and Memory

Crystallized form reflects the notion of music as memory; the track existing out of time, ‘heard’ simultaneously in an instant. Its totality9 is held in long-term memory and accessed with retrospection and introspection. The experience of crystallized form relies on memory, and each person’s memory of an experience will differ. Our memory

Each person’s sense of a track’s crystallized form will differ; within the same cultural group the differences may be slight, but not necessarily. When one accounts for personal interpretation of lyrics, of performance intensity gestures, and the like, subtle differences can make for substantial meanings.

Crystallized form is manifest within the listener’s awareness after the track is experienced. It comes to be known (perhaps understood) on a nonverbal, intrinsic level. This is an interpretation that may result from an informed accumulated process, or at some level it may be noticed immediately. Formulating an impression of crystallized form likely begins the first moments the listener hears a track; the non-reflective, casual listener will obtain their interpretation of this overall impression, just as will the seasoned analyst. During the stages of collecting data for domains/elements/materials observations, a sense of crystallized form might gain richness. The impression of crystallized form will become more nuanced during the process of hearing, assimilating, recognizing, discovering, and experiencing the track on many levels.

It is entirely possible that crystallized form is what attracts one to a record. To learn about crystallized form is to learn how the track comes together, and also what is memorable and significant within the track—for us personally, or as an object for analysis. It is the opportunity to perceive its complexities as they fit into its grand scheme. Alva Noë (2004, 118) has observed “thought and experience are, in important ways, continuous,” recognizing that between perceptual awareness and thought awareness there can be no clear distinction. Denis Smalley (2007, 40) recognizes this overlap allows for understanding the condition of his concept of space-form, “which although gathered in time, can be contemplated outside the time of listening. . . . think about the . . . soundscape now, without perceiving it.” Perceptual awareness informing memory, brought to awareness through reflection.

Recognizing Crystallized Form

Crystallized form is approached through holistic listening, and also through reflection.

Crystallized form might be engaged early in the analysis process—as the analyst becomes aware of salient qualities during sessions of open listening. In a way, it is an impression that is continually evolving for the analyst, though for the lay listener its impression might become rather fixed. All listeners—even those with little experience and knowledge to inform their impressions—will formulate an interpretation of crystallized form, and that impression, mood, expression, understanding may be quite personal. This impression of crystallized form can seem to arise instinctively.

As the analyst engages the content of the track, focus shifts away from crystallized form in many directions. Crystallized form is a central topic within the analysis process when establishing the context and character of the track. In the end, the last task of the analyst might be to reflect on what they remember. What is it that remains? Think not just of the track, but also of your memories of it. Bring emphasis to your memories of the experience of it, as opposed to your analysis of its parts.

A few questions that may guide thoughts are:

  • What is it that remains and establishes a nonverbal aural image of the whole?
  • What is most memorable? Can you conceive this without labelling it with language?
  • What in it has stuck with you, personally? (This is awareness of subjective vantage.)
  • What is most meaningful? Can you recognize this without naming it?
  • What appeals to your sense of taste, your listening biases?
  • What reflects the cultural norms relevant to the track?
  • Can you settle your memories of the track into a single, nonverbal impression?
  • How much of this impression might be shared by others, and how much is personal?

Reflection is on the presence that remains after the experience, allowing a memory of the whole, as one impression, to coalesce. Crystallized form is not approached through considering the specific content, or deductive thought of prominence or significance; these higher cognitive functions of analysis elucidate the inner workings of structure (content), but are counterproductive for recognizing the singular presence (character) that is crystallized form. Crystallized form is a complex aural image in memory; it is not a temporal experience. Its qualities are those that are most memorable and meaningful to the listener; this inherent nature is based on listener interpretation of the nonverbal aural image, along with any context that arises.

Holistic listening—a type of deep listening—is the act of perceiving the complete track as a single object (aural image) that is not broken down into its “component parts but simply perceived as a whole” (Neuhoff 2004d, 250). Crystallized form is encountered through holistic listening, through opening to the experience of all at once—the work as a coherent whole, one single impression within one’s awareness and as a conscious experience. The qualities of crystallized form evade verbalization just as do the timbres of sound sources, and the timbre of the track—for all the same reasons.

An interpretation emerges in memory, in the silence after the track has ceased. In quiet reflection on the impression (perhaps a feeling or mood) or the presence (perhaps an air, atmosphere, aura or ambience) that comes forth—any specific materials recalled in memory are ‘heard’ within this context. Many of the factors for approaching the interpretation of reference dynamic level apply to recognizing crystallized form. These will be explored next, as deep listening and open listening lead us to a position of being aware and open to all that arrives, while also being fully passive toward seeking and processing sounds; one holds a position of only observing.

DEEP LISTENING

Deep listening and open listening can offer guidance for experiencing and recognizing crystallized form:

  • Allow one’s self to be open to sense its presence; searching to try to analyze or to make sense of crystallized form may often divert attention from the overall presence
  • Attempt to limit the tendency of rational processing, calculating, comparing and searching for a deduced answer; these pull attention to other perspectives, introduce prominence and utilize different memory functions
  • With intention, be receptive to the track’s global auditory presence as an elevated experience, transcending the confluence of sounds, accessed through holistic listening and nonverbal reflection
  • Experience the silence after the track as transcendent of the sounds that preceded it; hold the opportunity to experience the track’s crystallized form within that silence
  • Awareness is nonverbal, and directed to character that defies language, not to qualities that can be defined

Deep listening guides the experience of the track, and also can guide the process of recording analysis. It opens the analyst to experience the track differently, such that the presence of crystallized form can be revealed or the track experienced through holistic listening. Deep listening can allow discoveries of what was not previously apparent or perceivable—for example, dimensions of sound not previously experienced, such as the subtle sonic dimensions of space.

Woven throughout the previous sections and chapters have been encouragements to listen deeply. To listen deeply can take many forms, with all based on or partially reflecting one or more of the following:

  • Being fully attentive to what is present at this moment
  • Listening without processing significance or imposing function or structure
  • Listening that minimizes memory’s influences
  • Listening without expectations based on prior materials or activities within the track
  • Listening that arrests anticipation
  • Listening without judgement, prejudice, personal bias
  • Listening intentionally: without an agenda or with a specific agenda
  • Listening with focused attention to a specific level of perspective and element
  • Listening with attention to all that arrives: adapting equivalence
  • Listening with open awareness: attention that accepts all that arrives
  • Listening without language: nonverbal listening to aural images and unfolding events
  • Listening that permits deeply knowing any object at the focus of attention
  • Listening that facilitates deeply knowing the singular impression of the track
  • Listening that opens awareness and retention for introspection and reflection to follow

Some of these forms have already been thoroughly introduced in Chapter 2. Many of these listed will be explored as this discussion unfolds. In this writing, deep listening functions as a common thread that interconnects observations, evaluations and conclusions. Deep listening’s aware attention and its openness to all that arrives are resident throughout the framework for analysis, in all its steps and concerns.

Records are created by deep listeners, for deep listening. Yes, they speak immediately, viscerally and profoundly and this is what grabs our sensibilities and shapes our memories—they also present great richness and detail. There is much more present on, under and over the surface, and what is there is often what shapes tracks fundamentally, and what is there can be of great relevance and interest in studying the track. Albin Zak shared a pertinent personal experience of deep listening:

A Tradition of Deep Listening

Deep listening has a long tradition that has come forward from experimental music throughout the twentieth century. It is evident in music composition, performance and the listening practices to engage the new ideas of the experimental, of what has not previously been encountered. A connection with listening to nature is evident in many approaches: many emphasize open listening and listening within the present; the notion of intermingling real-life listening and music listening is commonly used to contrast open listening and directed listening.

The latter is evident in the 1920s compositions of Erik Satie and Darius Milhaud that produced “furniture music,” which brought surrounding noises into the experience of music performances and incorporated them into music listening. Referring to Thoreau, John Cage has stated “Music is sounds, sounds around us whether we’re in or out of concert halls” (cited in Schafer 1971, 1).

John Cage represents an early key figure for whom deep listening was substantive in musical thinking and practice. In his book Silence he advocates open listening and non-directed deep listening by his typical abstract inference:

The listening within the present ‘here and now,’ not hearing because the composers’ imagination and expectations distort the experience, listening with mental emptiness (open attention) are all integral to the deep listening concepts we have been engaging.

Pierre Schaeffer—already covered within acousmatic listening, with its approach of pure sound disassociated from its origins and associations, void of external context—could not have engaged sound objects without deep listening. Since its beginnings in 1948, his approach to sound analysis had been based on directed deep listening within timbres (Schaeffer 2012). Occurring at the same time as John Cage’s open and non-directed use of deep listening, Pierre Schaeffer utilized a deep listening technique that was directed inward, toward exploring the nuance within sounds; interestingly, the two hold in common a position that the origin of a sound is not pertinent to the experience of it.

R. Murray Schafer (2004, 34) observed that a “blurring of the edges between music and environmental sounds is the most striking feature of twentieth century music.” The alternative he put forward:

He goes on to identify the significance of silence to listening and to acoustic ecology, and notes with the deep listening techniques he teaches: “the whole body becomes an ear . . . and [students] have heard music as never before” (ibid., 38). Schafer relied heavily on deep listening concepts in his research and compositions related to acoustic ecology and soundscapes.

Pauline Oliveros shared a pertinent personal experience of deep listening:

The listening experience of open awareness widened enough to include how the sound was interacting with the large environment, sounds from creatures, as well as their performance; these all intermingled and became one; using open listening, this experience included all that arrived without judgement or prejudice.

These different approaches to deep listening can all be useful in recording analysis.

Deep Listening for Recording Analysis

Deep listening provides new listening opportunities for recording analysis, and also in record production and engineering. These opportunities relate to the listener’s intention for how they will use their attention. Attention and awareness may be either directed by deliberate focus or non-directed by holding an open awareness. Both forms have purpose, function and value to the analyst. Each is most effectively engaged by the exclusion of the other. Both directed and non-directed deep listening require concentration, disciplined attention and an ability to remain focused on awareness of sound.

Deep listening may be utilized to bring the analyst a sense of focus and attention to sound that can be particularly effective for engaging the sonic worlds within records. This is the ‘deep listening’ that has been mentioned previously. Deep listening can be directed with intention to any singular aspect of the track. This approach was engaged often in discussions of recording elements—such as with listening inside the gestalts of timbres and environments, identifying distance positions and relationships, recognizing the edges and size of stereo images, and so forth. Deep listening such as this can be directed to any element in any domain, and at any dimension. The focus of deep listening can be directed to nuance in the overall impression of an element (such as program dynamic contour) progressing through a continuum of levels of dimension to the nuance of activities within timbral components. Listening for this nuance—as well as hearing and recognizing the substance and contributions of all that is within the track—can be facilitated by adapting and broadening the openness to sound framed as equivalence.

Equally important, deep listening can be non-directed—listening without guiding intention. With this approach to deep listening, attention allows all that arrives to be an equal presence within one’s awareness. This allows discovery, allows experiencing the unexpected, of what could never have been anticipated, of the utterly unique, or the experiencing of what has never before been experienced, of being prepared to hear the unexpected.10 Listening with open awareness allows all sounds to be experienced without consideration of their origins or functions; sounds can just be, they can simply exist, and their emerging and diminishing presences experienced.

Deep listening with open awareness is the experience within the present. When in practice, it inherently arrests the distortions of anticipation and minimizes the effects of the prejudices and preferences within our bias and subjective vantage. Practicing non-directed, open awareness is not so simple, though. It is dependent upon one’s ability to concentrate without fixating, to not be distracted by what occurs and yet to be fully aware of it, to withhold judgement and yet perceive significant detail and also broad perspective.

Deep Listening and the Present

The psychological states of ‘being present’ and being aware within ‘the present moment’ are integral to deep listening. Since the beginnings of the practice of deep listening these factors have been central to it. Being present and being aware of the present moment are abilities that can be developed—developed to the benefit of performing recording analysis. They are mutually supportive: deep listening itself cultivates being present, being present allows deep listening, the awareness of the present moment is the window of attention that allows being present, that opens deep listening.11

The practice of listening deeply is the act of listening only. ‘Being present’ is having one’s attention consumed with what is happening. For us, it is a concentration on the listening process to the exclusion of all other thoughts and distractions. One’s attention is dedicated solely to listening. One acquires the ability (skill) to remain focused on what is being heard, and to be unaware of all other thoughts and sensations. This cultivation of concentration and attention is central to deep listening; it is also relevant to the other types of deep listening that involve memory (discussed below).

The present moment is what is now; it is the moving time window of our existence. The present moment appears resident within the 2 to 3 second time “window of consciousness” (Snyder 2000, 9)—a duration curiously similar to the ‘buffer’ of echoic memory. Echoic memory functions in early processing12 where “information persists as an echoic memory, which usually decays in less than a second like an echo” (Neisser 1967, 189–194); indications are that this storage and processing may extend from about 2 seconds to perhaps as much as 4 seconds depending on context. Conscious awareness relies on perceptual processing of sensory systems (including audition) and as a storage system (echoic memory) (Snyder 2000, 4; Zbikowski 2011, 185). Echoic memory holds unprocessed auditory stimuli for a short period of time until the following sound(s) are heard, and the sound made meaningful. This is the ‘specious present’ that we experience as the present moment, our ‘right now.’ Within this space of time the experience is pre-reflective “with a certain breadth of its own on which we sit perched, and from which we look in two directions into time [to the past and the future]” (James 1890, 609).

From this position that we experience as now, our natural tendency is to shift our concerns and attention to higher-order mental functions where we can process sound in memory and where we can anticipate what is most likely to follow (based on our knowledge and experience of context). These are activities that open awareness seeks to mitigate (discussed below). To be present is to remain in the moment, to not reflect on and make connections with the past or to anticipate the future. Deep listening establishes a vantage that the past is irrelevant (what has happened is over and thoughts of it are distractions), and any notions of the future (expectations, anticipation) are conjecture (we cannot predict what will happen and any thoughts of it are distractions).

Memory and Reflection

Deep listening is keen awareness. This keen awareness can aid in establishing memories that are rich in detail, accurate and objective. All aspects of the track—such as auditory images, musical materials, structural relationships, and so forth—can be retained through deep listening. Much in the track can be perceived nonverbally and without cognitive processing (Damasio 2003, Edelman 2006) and made resident in memory through deep listening.

Deep listening’s goal when non-directed is to shift the natural tendency to process sound in memory and to project the future (anticipate or expect something to follow). Open awareness does not engage higher-order processing for anticipation or expectation, yet can allow memory without imposing order. This deep listening can experience the track’s inherent qualities without verbalizing them, without privileging some over others, and so forth. Auditory images may be retained in the abstract, appearing in nonverbal form, but nonetheless establish a fixed impression or presence; they appear resident in long-term memory as “an auditory image at the center of musical thought” (Butler 1992, 188), where the ordering of events is not fixed (Baars & Gage 2013, 286; Neisser 1976, 112). This approach to deep listening might be considered holistic listening.

Directed deep listening involves higher-order processing when it engages memory; it can also enhance and develop the ability to retain information. We hear backwards in time, and all sounds are resident in memory—including those within the window of the present. Directed deep listening can utilize memory to retain information and experiences of the track. Memories that can form the basis of reflection and introspection are important to the character of timbres and crystallized form—among other interpretations of tracks. Deep listening can aid introspection, allowing it to be employed without extreme subjectivism; the overly personal (and even added levels of fantasy) can be avoided with refined attention that cultivates a stable and vivid awareness. As sounds are perceived without judgment and prejudice, the influence of one’s subjective vantage can be minimized.

Deep listening’s retention of aural images and events that allows reflection is integral to evaluation processes. One’s experiences of the track—its materials and ideas—can be evaluated in memory and related to other experiences within the track. This allows the analyst to more easily perform analysis steps to reach insightful conclusions, and also to hold the experiences needed to interpret the character of crystallized form. Long-term memory is enhanced by deep listening, and it is vital to numerous recording analysis processes. With deep listening a perception of the track that is not based on working memory during listening, and not of recalling episodic memory events, can be realized. Working memory is highly influenced by our predispositions and attention, and recall of episodic memories has a lower level of accuracy than recognition of remembered episodes (Baars & Gage 2013, 253–288). This state might be observed with intention to be unencumbered (as much as this is possible) by personal inputs of interpretation, listening or aesthetic preferences, cultural influences, past experiences with the track or artists, personalized outside associations, and more. Intention and choice are what is crucial here, as they shape the experience and interpretation.

Arresting Anticipation and Minimizing Bias

Deep listening engages what is present; it allows us to hear what is there. This contrasts with our natural tendency for anticipation, an expectation that something related to the past will follow. We can listen hoping to hear what we believe will happen, or what we want to happen. Expectations and anticipation can bring one to hear what one wishes, whether or not it occurred.

Deep listening can arrest anticipation by listening in the moment only—by listening to what is present, when it is present. It does not process what has happened to project what will follow, rather it is concerned only with what is happening now.

Deep listening can bring listening without predicting. Being open to all that arrives, and not privileging some over others, mitigates the memories of what has happened. Without holding those memories there is no basis to project the future, no basis for anticipation or expectation.

An equal awareness and accepting all that arrives can in itself minimize bias. Deep listening allows being present without judgment of what arrives—no matter the qualities. What happens is the object of observation, not one’s preferences or personal reactions to those objects. Should the analyst desire the experience, deep listening can be used to identify prejudices, expectations, likes/dislikes, preferences, what one is drawn to, and so forth. Deep listening can allow awareness to shift to these topics; by mitigating the influences of memory and expectation, an awareness of one’s biases can become more acute.

In these ways, holding an open awareness allows the track to unfold without imposed distortions of listener bias. Listener bias can be directed toward the content of the track, aspects of imposing order on what is heard and identifying substance and functions; listener bias can also take the form of the subjective vantage of the listener.

Subjective Vantage of the Analyst Listener

Each of us carries our own vantage from which we engage the world—including the worlds within records. Our listening experiences are unique as they carry all of our sensibilities and histories, our hearing mechanisms are unique, and how we make sense of what we hear will be different for each of us. We all are attracted to what we are attracted to, and for our unique (typically nonconscious) reasons; we remember what is prominent to our attention, whether significant to the track or superfluous. The analyst, as well, is human and holds these qualities.

In framing an analysis, analyst/listeners will benefit from perceiving their own subjective vantages. This allows one to become aware that their memories are their own—and may not be shared by others. The analyst may focus on that which is most memorable to that analyst, on what is most meaningful to themselves, on that which seizes their attention. This is useful information, and deep listening can facilitate revealing it.

In deep listening from the vantage of the subjective and personal, one can learn much about one’s self. One can be aware of their biases—their preferences and prejudices, the ways they are distracted and attracted by materials or elements, and so forth. Remaining inside the subjective vantage, an analyst might learn why a track speaks to them (if that is desired), they might develop their personal practice and skill levels, and they may learn how their subjective vantage is impacting the analyses they wish to be objective. To listen from the position of the subjective is to be deeply within the personal. This is not always desirable or functional in recording analysis, though; one’s personal perception of the track will likely not reflect what is actually present.

The analyst may choose to remain within the subjective vantage or to move out of it—in much the same manner as the analyst shifts attention from one level of perspective to another.

Stepping outside the subjective vantage allows the analyst to partake in objective analytic discourse. A neutral vantage can be established (as much as this might be possible) from which the track can be analyzed with minimal bias. A dispassionate position, objective in its assessments, and centered on the objects of study is the ideal basis of many assessments and analyses—these lie in stark contrast to the subjective vantage.

With intention, the subject of the listening experience can be shifted just as we have previously shifted between objects of focus or between levels of dimension and perspective. Learning to recognize one’s listening position is central to this. Learning to be aware one is listening from within one’s personal subjective vantage is possible; through deep listening with attention to how one is reacting to what is heard—to what is speaking to the individual’s affective senses, to what they are finding meaningful or moving, and so forth—the analyst can become aware of the nature of their subjective vantage, and how it manifests. Aware of our own biases, we might develop an awareness and ability to slip in and out of the subjective vantage.

It is equally important to learn to listen objectively and dispassionately, putting aside the personal and subjective. This is easier when listening to content than it is when listening for character, and the subjective vantage often privileges aural images over their inner workings. We will benefit from the skill to shift between these two vantages, because they clarify each other, as well as supply different—often pertinent—information about the hearing of the track.

Once aware of the two states, and once skillful in distinctly assuming one vantage over the other (listening from within each position), the listener/analyst can choose which vantage to assume. One vantage can then become the basis for the analysis, privileging one over the other. Even in an analysis keenly focused toward neutral observation and objective assessments, the analyst might learn to recognize their subjective vantage in relation to the analyses they perform; the subjective vantage can inform how they are inherently prone to distort data, and it may also be broadened toward the culturally inclusive, beyond the personal.

A Knowing of the Track

Deep listening and the recording analysis process afford a deep learning of the track, and establish a deep sense of ‘knowing’ it. With every new hearing, any amount of richness, in detail and breadth, might be added to our sense of the track. Greater clarity of crystallized form’s presence and character may emerge as our memories and experiences of it accumulate, along with greater clarity of the materials and the details down through all structural levels. The analyst/listener’s interpretation gains insight and encounters new attributes and nuance. The deeper the understanding of the details of the track (while maintaining a sense of perceptually balanced prominence), the more aware one may become of its overall substance and presence. The result is a sense of deeply ‘knowing’ the track as a unique and singular presence: a sense of awareness of its fundamental nature.

One comes to a place of awareness in ‘knowing’ the record for its core substance through immersion in the track’s broad-reaching concepts and the unique affects that define its context. Included are small details and large-dimension gestures. Through deep listening, this ‘knowing’ can encounter the track’s holistic character, and an experience of its essential nature. Included is all that the track contains in its confluence, and of course all that the track’s content can elicit; ‘knowing’ can ultimately extend to each or any of the individual elements and ideas, aural images and structural elements, etc. within the track.

This ‘knowing’ awareness is important functionally as well. It allows recognition of the largest-dimension characteristics of the recording that are essential aspects of crystallized form; recognizing the full dimensionality of crystallized form will likewise impact understanding of the music, of the track’s performances, of the lyrics, and of recording’s attributes. It also acknowledges and includes all of the functioning of elements that gives rise to the motion and movement of all domain elements. It is a sense of the kaleidoscope of the content and character of the track, in all its nuance, and also the complex web of their interrelationships.

Deep listening provides the portal for acquiring this sense of knowing. Deep listening allows one to encounter the track from a position that privileges none, a neutral position that holds all that occurs as intrinsic to the track, a vantage that does not allow the future to be distorted by the past. Deep listening allows the listener/analyst to develop an interpretation that is considered on the basis of what is experienced, with minimal prejudice and bias, with minimal memory-related expectation, and with a sense of the passing window of the present that ultimately broadens to encompass the track’s entirety.

CONCLUSION

As nine chapters have unfolded, we have encountered many forms of listening; they are listed in Table 9.8. Within that list, many are related to or interrelated with deep listening. Throughout this book you have been encouraged to listen deeply, to search for nuance and dimensionality, and to be open to the notion that everything in the track may be an important defining feature. Nearly all of these listening techniques can function within the processes of recording analysis.

As we progress further into the analysis process in Chapter 10, let us remember here that it is only through listening that we engage tracks. At the core of successful listening is attention and intention, discipline and concentration, and nonjudgement and awareness.

There is much to discover within tracks. The challenge of recording analysis is to discover what is within the track, and to accurately perceive its content and character through listening alone. Our ability to listen deeply and accurately will be rewarded by what we can unveil. Our ability to engage deep listening in all its facets, and an open awareness to many potential qualities within tracks, will uncover qualities within records that may well otherwise go unnoticed.

Table 9.8 Forms and approaches to listening.


Acousmatic listening                        Listening inside sounds
Active (engaged) listening                  Listening with attention (attentive listening)
Analytical listening                        Listening with intention (intentional listening)
Aural analysis                              Music listening
Casual listening                            Non-directed listening
Critical listening                          Open listening
Deep listening (directed & non-directed)    Passive listening
Detailed listening                          Pharmaceutical (mood modulating) listening
Directed listening (deep listening)         Real-world listening
Ecological listening                        Recreational listening
Holistic listening                          Reduced listening