Chapter 5
Observing Elements of Music and Lyrics in Records

With this chapter we begin to engage the actions of putting the framework for recording analysis into practice. The observations process is defined, and then put into practice by examining this initial stage of analysis for the domains of music and lyrics in popular, recorded song. Practical matters will be found alongside more conceptual issues. The analyst’s playback system can impact observations of music and lyrics; while a detailed discussion of playback is postponed to Chapter 6, its presence is felt throughout this chapter.

The focus of this chapter is to bring about the start of the analysis process from a position of intention in defining the goals of the analysis and a position of awareness and attention in collecting observations.

The three stages of the ‘observations’ process are presented in detail. Guidance is provided for engaging listening for the purpose of performing aural analysis of all domains within records, and for enhancing requisite listening skills. For those who wish closer guidance, relevant Praxis Studies are identified in sidebars throughout the remaining chapters, directing the reader to Appendix A for specific studies and exercises.

The many issues concerning transcription are approached, including when transcription is appropriate, when it might not be necessary, and when transcription might adversely influence data collection, materials and an analysis. Approaches to transcription, notation conventions, assisting tools and devices, devising workable notation nomenclatures and replacing notation with relevant descriptions provide alternatives to traditional notation and a recognition of the simple values of accepted systems—no matter their limitations.

Some analyses will benefit from transcriptions; others will require some materials of the track be transcribed. Engaging and organizing observations comprises the second half of the chapter. Timelines of various sorts are constructed with goals of making structural divisions and hierarchies, and temporal divisions visible. Select approaches to collecting information on the elements of music and on the performance of lyrics are examined; these will provide a glimpse of what information from observations was deemed pertinent within these analyses. Adding information collected from observations to timelines is explored in various formats.


An analysis of a record is directed toward a goal—a goal of what level of understanding is sought (and will be communicated), a goal of what type of understanding is sought (what one seeks to learn or discover).

This goal may be directed towards understanding the song, and its lyrics, music, and expression. The objective may be directed towards understanding how the track is situated culturally or socially, what and how it communicates, or any number of other goals outside the record. While a worthy goal is to illuminate the track, to bring clarity to its essential qualities and how the other qualities contribute, this is only one approach, one objective.

For instance, the underlying objective and goal of this writing is to bring (provide a path toward) understanding of how the recording impacts the song—the effects of the recording process. The recording is a central concern, and the object of detailed examination. The track’s musical materials, lyrics’ sounds and content, message, structure, etc. are important to the analysis in that they need to be understood at some level in order to recognize how the recording impacts them, delivers them, enhances them, etc. How the qualities of the track’s recording establish relevant relationships to areas outside its sounds might also be pertinent or of interest. Still, the focus is on the elements of the recording; other domains and other disciplines are, well, ‘other’—they are examined to understand the qualities and influences of the elements of recording.

One can define the objective of an analysis of a recording in many other ways. One can focus on the entire track, or just a portion; one can focus on the lyrics, or on their interactions with accompaniments; one can explore the intricacies of the music, or its affects or expression. The choices are without number.

Even when an objective is well defined, it might shift due to what the analysis reveals. Such a shift occurs as one learns the track better, makes discoveries and connections, and recognizes what needs to be explored more deeply and what other topics need to be included to understand and present what makes that track unique. Still, a clear objective creates a clear path for collecting information—a path which can be broadened and deepened as the process unfolds.

Just as important as the subject matter is the level of detail sought—breadth and depth of coverage. At some point information collection must stop and evaluation begin; at some point the subtle qualities of the record are either central to understanding the track or superfluous, and so forth. The breadth and depth of the analysis is part of defining its goal. This might be the level of detail that is needed to adequately reveal the essential characteristics of the record, or the level adequate to support the needs of the individual analysis—to identify just two of many possibilities. For instance, the analysis might be directed toward a specific audience and need to conform to what they might reasonably be expected to understand.

An interpretation of the track results from the analysis. The interpretation is generated by and reflects the analyst—their skill, interests, knowledge and more. It also represents what they wish to learn about the record, and how they choose to communicate about what they learned. Defining goals that recognize the interests, strengths and skills of the analyst can be beneficial.

Overview of the Framework

To begin this overview of the recording analysis process, it can be helpful to recall the guiding framework’s four principles:

  1. Every recorded song is unique, with unique essential traits. Those essential traits are supported, ornamented and provided context in unique ways, and are organized in a manner that will also have unique qualities, even if subtle, and speak in a unique syntax and language. An appropriate analytical approach is needed to reveal the track’s essential traits.
  2. These essential traits are reflected in all dimensions of structure. Perspective allows the analyst to navigate these levels of detail and of hierarchy to collect, evaluate and synthesize information within all levels of detail, and to compare those diverse sets of data.
  3. The elements that form the essential traits—within and between its lyrics, its music and its recording—contribute in unique ways to every recorded song. While conventions might bring certain elements to dominate a style, every element is in play in shaping the record, every element impacts the recorded song, and every element holds the potential to contribute significant and essential information. Equivalence reminds the analyst to examine all elements, as each has a role and each may provide any function, at any time, at any (every) level of perspective.
  4. The concept of equivalence also helps guide the listening process that is at the core of analyzing the record. Listening with intention and attention establishes an approach to access all the record’s traits and materials, structural levels and dimensions of form, and so forth. Further, listening with intention and attention brings coherence to the process of making observations of music’s elements, the sound qualities of lyrics, and of the elements of recordings. It also facilitates navigating the challenges of transcription in each domain.

The differences between analyzing music from a score and from a record are many, though differences all tend to lead to one being perceived through sight, and the other through sound. The fundamental challenge of accurately analyzing a record is to hear what is there, and to hear it accurately. While not a simple task (especially at first), all paths of the analysis process rely heavily on deep, intensive listening—this is our only access to the materials and expression that draw us to it.

First Step of Process: Observations

The three stages of the analysis process might be reduced to collecting information, evaluating information, and bringing together conclusions. With a multiplicity of materials and elements, domains and dimensions, the track is complex. Dividing the process into three clear steps brings some clarity to engaging this complex texture. “Observations” (collecting data to analyze) is the first stage, and will be discussed here; the stages of “evaluations” and “conclusions” underlie Chapter 10.

There are three primary activities in the observations stage. These three activities do not need to be accomplished in a set sequence. This stage is dedicated to collecting raw data—information on what is present within the track. No explorations of functions of materials, patterns of organization, how the materials work or evolve and such will appear or be determined within this observations stage.

First, the track’s sound sources are identified, along with the basic-level materials they present. These are the surface materials in the music, along with the sound sources that present them (such as accompaniment and bass line, piano or perhaps groove and bass line, electric bass guitar).

Second, several basic characteristics of structure are defined in observations. This includes identifying the length (number of measures) of the recorded song, which requires recognizing the meter and tempo (and any changing meters or tempi). The large dimension timeline can now be established, as in Figure 5.1. It is important to note that this figure contains details that are revealed only from evaluations of materials—the song sections and the eight-measure prevailing time unit. This figure is configured to illustrate the overlying, inexact palindromic arch of the song’s structure; notice measure 57 is marked to identify the point where the structure reverses; before and after this point is a verse+bridge+verse sequence. The arch is inexact because the introduction and coda are not proportional; Clapton’s solo substitutes for a verse, and Verse 3 slightly modifies Verse 1’s text.

Third, information or data on each individual element in each domain are collected. The depth and detail of this information is intrinsic to the record (including recording elements). Data collection embraces all that is of concern to the analyst, and can be extended to pertinent stylistic conventions (once evaluation is engaged). Here the musical materials become better defined. Their content is more clearly delineated, but not evaluated. Transcription, descriptions and notation in some form will take place in this stage—if it is to be undertaken. We will learn transcription and description are interpretations, and interpretations are often based on evaluations of sorts.

Figure 5.1 Large dimension timeline of “While My Guitar Gently Weeps” (The Beatles, 1968).


Observation, Interpretation and Bias

Interpretation of information begins at the ‘evaluations’ stage; with all (or much) data available, patterns will emerge. Once in the evaluations stage, it is common to return to ‘observing’ to confirm information or to collect additional information. Being aware of the separation of collecting and evaluating brings clarity to the process, and can keep the analysis from being distorted.

It is very easy for data collection (assembling basic information) to be distorted by interpreting that data (even certain portions of it) before collection is complete (or sufficiently complete for accurate evaluation to emerge); for example, one cannot identify the tonality of a song from the materials within its first moments, and one cannot determine the functions of chords without learning how a progression concludes, or the nature of the tonality in which it is situated. Once one begins to interpret information, a natural tendency is to seek information to support that interpretation. This tendency can be so strong that determining the information that is present is transformed into causing the new information uncovered to conform to what one suspects might be present or wishes to be present to confirm one’s theory.

Scientific inquiry has taught us to separate data collection from data analysis, for good reason. For observations to be unbiased (as much as this is possible) the principles and theories explored within evaluation should not appear here, as they will cloud and distort this collection. The notion of neutral or purely unbiased observations is utopian; we approach any material within any track from our unique position of pre-understanding. Our different experiences with music, our listening skills, our personalities, ethnicity, and so on, all unconsciously influence what we perceive. While we seek a neutral observation—seek to discover what is present not what we wish to find—this is not ever completely possible. Significantly, this connects with setting the goals for an analysis, where deciding what data to collect already implies some kind of interpretation. While the ideal is to be unbiased, we need to be self-reflective in order to avoid being biased—it is easy to be biased without being aware of it.

Similarly, there can be a curiosity to begin to evaluate observations early in the process. Some have proposed transcription can be aided by some early evaluation of materials (Winkler 1997, 174–181), and the experienced analyst might instinctively engage evaluation early in the collection process. While this may work (to some extent) for some experienced individuals, it is not an unbiased position; it is clear that in order for the song to be engaged from a neutral position—the position that allows one to most readily and accurately recognize its unique qualities—evaluation is most effective later (and as a separate stage) in the analysis process.

Other forms of notating the track’s sound qualities can take place in observations. Functioning like music notation, X-Y graphs and other diagrams can be created for the recording, and the phonetic sounds of lyrics can be transcribed. This stage of the process can become quite involved and detailed; it is common for this to happen. A middle dimension timeline, or perhaps a separate timeline for each section of the track, can assist in establishing some order to observing elements and materials. Through timelines, elements may be observed more systematically, thorough data collection can be ensured, and observations organized to facilitate the evaluations to follow.

These observation activities are accomplished through listening to the record. This is an activity that will be repeated a great many times to study a single track. In each successive listening, new information will be identified or extracted; noticing different features and attributes each time, successive hearings access deeper levels of structure, greater detail and/or shift to focusing on different instruments or other aspects of the track. This process will become more efficient as auditory memory improves. To avoid holding all of that information in active memory—and to allow the pool of information to grow—it is helpful to write it down. Transcribing the music and its performance, the performance of the lyrics, as well as the qualities of the recording will greatly assist the analysis, but are rife with issues that will be explored later.

Outside Sources

Reaching beyond the track itself, should the analysis seek to include other disciplines, information in those areas might be collected at this stage. The most pertinent and productive way to engage those areas and concepts will vary between disciplines. Forethought when defining the goal of the analysis will play a central role in when and how to collect that information, and when to begin evaluating it.

A literature review, collecting relevant background information (etc.) will influence, and potentially frame, an analysis. These may bring new awareness to the analyst. They may also skew their direction of inquiry and their perceptions. Engaging outside information should be approached with an awareness of purpose and significance of what is sought, and how that information is used in the analysis going forward.


Listening for the purpose of analysis is a distinctly different type of listening activity than one encounters in music listening, performance or analysis following a score—aural analysis is atypical in a great many ways. Many readers will carry a significant depth of listening experience in some areas to apply to recording analysis. Many of us have learned a considerable amount of music from listening to records; some have even learned to play instruments by listening. This is an extension of the aural/oral tradition that once passed musical traditions (especially popular music traditions) between individuals and generations; the record can be a teacher by virtue of the easy access and repetition of its content.

Within the process of observations, listening with attention and intention plays a central role. It can take two basic forms in observations. One approach is intentionally focusing the listening experience (attention) on a certain element or idea—at a specific level of perspective, and within a certain time unit. The second approach brings the listening experience to intentionally and systematically scanning the track for information. This is a contrast between bringing focus to certain features to the exclusion of all else, and a controlled exploration searching for salient features.

Learning to listen to tracks is multifaceted—it is just as multifaceted as the texture of the record. Each domain requires a unique set of skills—the chords of a harmonic progression and the timbral modulations of a vocal line require different skills and abilities to recognize. While there are some aspects in common between domains, each has unique qualities; those qualities require listening skills that are unique in some way to access their information fully.

We learn from examining the elements of recordings that the record holds sonic experiences not represented in nature. Therefore, it is expected that a good many listeners (readers) will have little prior experience with some of these elements and materials—especially without prior knowledge of the elements that might exist in the record. While equivalence can assist the listener in being open to possibilities, the listener needs to be aware of what those possibilities ‘sound like’ in order to engage the act of searching the sonic landscape to experience the sounds of the record. We rarely see something for the first time, but when we do it is obvious—such as a friend’s new hairstyle—and is nearly always something within a known category. Within our senses of touch, smell and taste we rarely encounter something completely foreign or new. In sound, encountering something new is rare in most contexts, but not in some others. The sounds of unknown languages can be readily encountered, and while perhaps not understood the sounds are still recognized as being within the category of language, and connected to the voice and to communication. The nature of recorded sound, however, contains qualities that defy adequate verbal description and therefore listeners cannot be adequately prepared to perceive the qualities; access to perceiving such qualities is thwarted without a clear indication of what one is trying to hear, how one needs to listen to hear it, and where such a quality might be situated within the track.

It can be a challenge to discover sound qualities that have never before been experienced. Awareness of the possibilities, trust that what has never been experienced is actually present, and the perseverance to keep trying are needed to begin to hear unknown dimensions of sound. Some qualities are more apparent than others; some qualities require a more developed awareness or a more concerted effort before they are perceived.

Learning to listen to tracks in multiple ways extends to perspective and dimension. Observing elements in records requires one to focus attention on many different levels of detail—individually, one at a time. As each level of perspective and dimension contributes to the track, each requires separate attention to its unique level of detail and activity. It can also be a challenge to listen at various levels of detail. Table 2.7 presented the levels of perspective and related detail:

  • Highest Structural Level or Large Dimension; the Overall Texture
  • Upper-Level Middle Dimension; the Composite Texture
  • Middle-Level Middle Dimension; Basic-level, individual sound sources
  • Lowest Levels, Small Dimension; Individual sounds, and activities within individual sounds

Individuals are inherently pulled to the perspective of the individual sound source and musical materials, the basic-level; as organisms we have adapted to instinctively identify sources in our environment. This also represents the surface-level detail of life experiences, and the relationship of self to the world that is reflected in the lead vocal of the song. Listening successfully at other specific levels, though, can be elusive—while listening to the overall sound one can be pulled into details of the middle dimension’s lyrics, listening to the blend of the middle dimension one can be pulled toward an individual sound or instrument, and listening for the subtle workings of a timbre can seem impossible. The mind gets distracted and pulled away. The four primary levels of perspective listed are central to understanding a track; information about the track will be collected at each of these.

Identifying the activities of various levels of perspective requires holding one’s attention there. Intending to listen at a specific level, and holding one’s attention there, requires continual monitoring of information to ensure one has not shifted focus to another level of dimension. An intentional shift of focus and perspective is part of this process. Choosing where to focus, and shifting from one dimension to another, is a valuable listening-related skill that can be developed; it will aid recording analysis in numerous ways. With competence in intentionally shifting attention from one specific dimension to another, other dimension shifts can be engaged; this can lead to learning to shift attention between elements as well as between dimensions. Ultimately listening deeply into sounds (as in the smallest dimensions) and listening to the overall quality of the texture (in the large dimension) might be engaged as comfortably as bringing attention to the basic-level of the vocal line (within the middle dimension).

The lowest levels of dimension contain much nuance and subtle detail; these are the small dimension and the lowest levels of middle dimension. Few people come to recording analysis with much relevant experience listening at the lowest levels of perspective. This type of listening may be periodically experienced in practicing performance techniques, within recording production processes, during careful attention when listening to sounds, and more. Still, holding one’s attention at the lowest levels of the small dimension is an atypical activity, although it is one to be cultivated.

The large dimension, or overall texture, is a strikingly different listening experience that is also not common for most listeners. Similar to the lowest dimensions—though conversely—the overall qualities of the track may also present challenges when one seeks to hold that experience within attention. Perception of this structural level brings awareness to qualities like the dynamic shape of the track or its evolving timbral balance.

We are most experienced engaging sounds within the middle dimension levels. Here sounds can be framed as either (1) a focus on an individual performer and/or the materials they present, or as (2) the interactivity between individual ideas and performers; these are directly related to common life experiences. Distinguishing between these two middle dimension levels allows for the examination (1) of the individual (middle-level middle dimension, basic-level) and an examination (2) of how the individuals relate to one another without placing more significance on any one (the upper-level middle dimension, composite texture). Conceptually, the composite texture is a way to perceive all of the sounds of the basic-level in a relationship of equal prominence. If any material, element or instrument is held in the focus of attention, that object of focus will be inherently prominent. To recognize how all of the sounds relate without this distortion, one listens at a level of perspective just above the basic-level of individual sound sources and materials; at this level all can be held in attention in equal significance. Only then can their relationships be accurately heard in the existing balance. This distinction is central to many analysis observations.

Transcribing Lyrics

Lyrics have typically been included with records, within their accompanying documentation, since Sgt. Pepper’s Lonely Hearts Club Band (1967). Recently, if not included in documentation, lyrics are often found on an official website of the artist, their label or their publisher. As more artists are self-releasing material, official lyrics are less consistently included. Third-party sources, such as websites authored by fans, may not prove reliable.1

Transcribing the words, lines and stanzas (etc.) of the lyrics as they appear on the record is the goal here; this is separate and distinct from the lyrics as performed, which is part of the sung vocal explored later.

Compared to transcribing music, transcribing lyrics appears straightforward; often it is, at times it isn’t. The performance and arrangement can make words difficult to confidently identify, unknown words (or words with several meanings or spellings) might appear, line and stanza structures are commonly disguised, punctuation is assumed or goes unrecognized, some words may be unintelligible or easily misunderstood, and many more challenges will arise. Some interpretation of the sounds and structure of lyrics is to be expected.

Still, transcribing lyrics oneself is the most direct and active way into the words of a song. The process is simple listening and writing, re-listening and correcting. With the aid of digital editing and playback this process can be direct and productive—to the extent words are audible, articulation and punctuation clear, and structure detected. Some techniques of music transcriptions may be pertinent or valuable to unraveling the track’s lyrics.

Music Transcription

Transcription is immediately known to most as converting the sounds of music into music notation. This is fundamentally different from a musical score meant for performance, which represents how the music should sound. The transcription seeks to represent visually what is present in the record (in as much as that is possible), how it sounds.2 Any need we might have for transcription here is to serve the analysis—to make the information available for more careful study. Transcribing to assist analysis and transcribing with a goal to capture all of the aspects of sound are different objectives. Our objective is not to create a document that reflects the complete content of the record, and certainly not to create a musical score that can be used for performance.3

Transcription can aid in collecting information by bringing it to hold still for study. One can hold the idea in one’s hand, look at it carefully and study it; one can consider it without relying solely on memory, and process it more deeply. Materials can be compared to others, out of sequence, out of time, at different levels of perspective. Much more detail can be collected, as listening repeatedly refines observations and adds richness and accuracy.

In recording analysis each of the three domains may be transcribed—each in their own way—to still them for study. Transcription here can include the sound qualities of the recording and the lyrics as well as the music—as per the needs of the individual song, and the objectives of the analysis. Though the discussion that follows will emphasize music notation, these factors also apply to the other domains. Specific approaches and concerns of transcribing the recording, music, lyrics, and the vocal line will each be covered separately later.

Transcription within the three domains can allow one to find out what is going on within the track. The very act of focused listening, discovering and transcribing brings a level of familiarity with and a connection to the track’s substance that is difficult to duplicate from listening alone. The looping sequence of listening, writing, listening, correcting, listening to verify can draw one into the analysis process deeply. This exploration and deciphering can be enlightening, and lead to an intimate connection that could not be experienced without exploring its subtleties, and notating them in some manner.

Allan Moore (2001, 35) offers: “[N]otation can act as a memory aid, enabling the aural experience to be (re)constructed.” When something is written down it does not need to remain in active memory; the material can be edited, corrected, added to and referenced at any time in the future. Notation is also an efficient way to isolate instruments and parts, structural units and individual elements in the musical stream, to draw them close for study and to commit them to memory.

Peter Winkler (1997, 172) observes:

The sound of the track is ever unfolding. To freeze its ideas and elements for more careful examination is in some ways liberating. When listening there can be a sense of urgency to absorb and remember as much as possible before the fleeting moment has passed; notation can free one to ignore all but specific materials, knowing all others can be included later. In this way, the listener can seek to find what is going on deep in the track, as details emerge that a casual listening could not reveal—details that are integral to the track, but that might otherwise evade discovery.

This all, of course, assumes the transcription actually reflects the sounds within the track, a state that is not entirely possible.


Transcription creates challenges and brings difficulties as well. “Transcription is not an innocent activity” (McClary and Walser 1990, 282). Its results can bring severe consequences: they may not only limit the effectiveness of the transcription, they may actually distort the material, and thereby diminish the analysis. Indeed, sometimes it can be best not to transcribe materials, as “fixing a piece in notation may be not only unnecessary, it may be an actual impoverishment” (Winkler 1997, 173). Transcribing is a reductionist activity; it summarizes an experience of the specific performance captured on the recording in order to have it conform to a notational system (whatever the system). Nuance and detail are lost, and perhaps its essence distorted or its substance transformed.

Notation systems themselves are inherently flawed and incomplete. Western music notation has great limitations in presenting the sounds of music, but it was not designed for this purpose. Wilfrid Mellers (1973, 15) observes:

Western music notation at its core is an incomplete set of performance instructions; it most clearly represents pitch and rhythm, which conform neatly to precise grids within written form. Tempo, dynamics, arrangement, expression, instrumentation, variations in timbre from performance and all others are represented with arbitrarily assigned symbols and verbal indications, and are less precise representations. Thus, the notation does not reflect the sound of the performance. The act of bringing a performance into staff notation

Western music notation cannot easily flex to match the nuance of pitch and rhythm represented within much popular music performance. Capturing some aspects like the flexing of the beat and interactions of parts in the all-important rhythmic feel or groove may well be impossible at times. The aspects of the record Western notation does not directly address are some of the most important aspects of popular music; “those elements which listeners tend to find most interesting in popular music and which most nearly capture the music’s particular strengths (rhythmic and pitch nuance, texture and timbre) are impossible to accurately notate. . .” (Moore 1997, x).

There is “an absence of a standard, easy visual representation” for vocal quality and timbral qualities of any instrument; “conventional music notation is helpless here” (Moore 2001, 35). The matter of vocal and instrumental timbres is a significant one. While an approach to a visual representation of sound quality and an approach to discussing it is offered within the domain of the recording (one which could be applied to the other domains as well), it is not easy to accomplish and by no means standard; still, ‘sound’ is at the core of popular song, and begs to be engaged, whether visually represented or described. Describing the materials might seem a better choice than notation, though this, too, has its issues.

It must be obvious by now that transcription can take considerable time and effort to accomplish, even if only making incomplete sketches. One’s skill levels in hearing within the various elements and domains will impact the accuracy of the transcription, as well as the time involved. Knowledge of and experiences with music, the elements of recording and the sound qualities of lyrics also impact accuracy and time/effort factors.

It is often difficult and sometimes nearly impossible to hear inside the fabric of the record. Real perception issues exist, such as trying to determine the notes present in a piano chord within a complex musical texture. Psychoacoustics can inform some of these difficulties, but does not provide easier access to the materials. There can be uncertainty about which notes are being played, and where they are placed against the metric grid; pitch and rhythm cannot be determined with absolute certainty. Instruments playing in unison with others have timbres fused and disguised; this can happen when harmonically related as well. These are but a few examples of many, and similarly impossible perceptual situations abound in the recording as well.

Then the question becomes what is possible to hear that is being missed by the analyst, and what must be accepted as ambiguity. Recognizing whether that ambiguity is brought about by blending, combination tones, masking, or something else may be helpful, but might not be worth the effort. Ambiguity within the sound and the skill level and experience of the listener all conspire against detail in the transcription. Peter Winkler (1997, 193) has observed these challenges:

Lastly, music notation can be unwieldy to use, even for those with much experience and skill in music dictation. One should expect to find other approaches to music notation, and the ways to notate the recording and the lyrics at least equally involved and uniquely complicated. Just as an excellent control of music notation can be valuable, so too will be a sense of timbre, as noted by Allan Moore, above; timbre analysis and descriptions have many applications in the recording, in vocal phonetics and sound qualities, and in music.

Tools and Devices, and Inventing Notations

The most valuable tool for transcription is a well-trained and informed ear. Knowing how to listen effectively, knowing what one is seeking to hear, and knowing the options and mechanisms of how to accurately write down what was heard are part of this trained and informed tool. Experience taking music dictation is valuable; it is a skill that can be applied toward engaging the elements of the recording as well. Some practical techniques have been devised to assist dictation and transcribing music. Pitch matching to decipher melodies, searching out chords and progressions on a piano or guitar, tapping or clapping out rhythms and singing to replicate vocal sounds can all supplement or carry the process; other performance and deductive techniques can be discovered by individuals (what works for them individually) to identify what is going on in the sounds—and to simultaneously improve their listening skills.

A digital audio workstation (DAW) or sound editing software can likewise be used as a tool to improve listening skills as well as to aid transcription. It can ‘stretch time’ (ibid., 174) to allow materials to be heard more clearly; slowing the playback speed is especially helpful, as details go by at a slower pace while pitch and timbre are not substantially altered. Portions of the audio range may be filtered (diminished or removed) to reveal a particular element or material by reducing interference and/or masking by other sounds. A DAW can isolate sounds, sections, phrases, etc. for study; it can loop them for ease of examination; it can even reorder materials from throughout the song to place them side by side. Numerous other uses will arise. As a tool for educating the ear and improving listening skills, it can be used to aid in recognizing and transcribing many recording elements and their activities as well as music and lyrics. DAWs can highlight the spatial elements of the recording: they allow one to become aware of timbral detail and definition, can provide a sense of lateral spaciousness and discrete boundaries, and might provide ways to examine environments. A DAW can provide access to the inner workings of timbres, and hearing into timbres is a skill central to the analysis of many elements of the record, across its domains.

A potential concern with using the DAW for learning and transcribing is a diversion from a focus on sound; it is very easy for one to become overly engaged and reliant on sight and what is on the screen, and to lose focus on hearing the track. Further, one might see items that one cannot hear—and that have no place in the sound of the track. This concern also applies to spectrograms, waveforms, and other visual representations that are directly generated from audio through sound analysis devices and software, though they, too, can be of assistance.

Spectrograms can be “effective in refining and focusing the listening experience” (Cook 2009, 225). They represent sound in three dimensions: time (left to right), frequency (bottom to top), and amplitude (with color or degrees of shading in black and white). Spectrograms appear in many different forms, depending on formatting and sensitivity settings of the device or software; though these three dimensions are unchanged, different qualities can be generated.
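The three dimensions can be made concrete with a short sketch. The Python below is an illustration only, not the method of any particular device or software: it cuts a signal into overlapping frames, applies a Hann window, and computes the magnitude of each frequency bin with a plain discrete Fourier transform. The function name, the frame and hop sizes, and the test tone are all assumptions chosen for illustration; changing the frame size is one example of the 'formatting and sensitivity settings' that alter how a spectrogram appears.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame is a list of bin magnitudes.

    The outer index is time (left to right), the inner index is frequency
    (bottom to top), and each stored value is an amplitude (what a display
    would render as color or shading).
    """
    # Hann window: tapers each frame to reduce leakage between bins
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):  # bins from 0 Hz up to Nyquist
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            mags.append(abs(total))
        frames.append(mags)
    return frames

# Hypothetical test signal: a 1000 Hz sine sampled at 8000 Hz
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone)

# Strongest bin of the first frame, converted back to Hz
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

With a 64-sample frame at this sample rate each bin spans 125 Hz, so anything finer than that is invisible in the image; a longer frame sharpens frequency but blurs timing. This trade-off is one reason what is seen may not correspond to what is heard.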

David Brackett (2000, 27) used a spectrum analyzer to “freeze” the musical surface into photos of the spectrum of the sound over time as an aid in transcribing “the most prominent aspects of the musical surface and to comment on the melodic process, rather than to search for hidden relationships between different components of the musical texture.” “The spectrum photos record all the sounding physical vibrations present in the recording” (ibid., 65), and in effect display the timbral balance of the track. These photos were able to provide some visual clues as to the content of vocal sounds, clues that might not be possible with other musical textures. The photos were used effectively to examine and to illustrate the performance styles and vocal qualities of Bing Crosby’s and Billie Holiday’s recordings of “I’ll Be Seeing You.” Among the pertinent information discovered,

Serge Lacasse (2010a, 2010b) has incorporated spectrograms into several detailed studies of language and vocal sounds within recordings. He offers: “In the absence of any comprehensible system of notation for paralinguistic features, an aural interpretation will be relied upon primarily, though with recourse to spectrograms which offer a useful visual representation of some of the paralinguistic features encountered” (2010a, 231). A spectrogram can illustrate timbral qualities of vocal sounds, as combinations of this frequency and amplitude information over time. Lacasse uses this to illustrate and analyze paralinguistic sounds, serving to engage the emotions and expression they produce. Spectrograms serve to replace notation; they are able to represent visually exactly what is happening acoustically and allow access to the sound itself (though, again, it is important to remember what is represented visually often does not translate accurately into what is heard). Nicholas Cook observes

In some instances, this approach may not be effective in bringing clarity to the content of some vocal lines, or of various sounds or sections; spectrum analysis of records is not a universal source of information (and Brackett clearly did not present it as such).

Unless one has access to the source files/tapes that generated the record, the visual representation will contain all the track’s sounds; this is the track’s mix. Seeking information about an individual instrument (even a prominent lead vocal) may yield limited results—these results may provide some important insight or information, as we have just learned, but may also be misleading or difficult to interpret. This is entirely dependent upon the context of the track. The process of trying to interpret sound through a visual image rekindles thoughts of the musical score, and the difficulties in imagining sound from the visual—and how one’s tendency turns to evaluating what is seen instead of what is heard; this holds for the X-Y graphs that will be offered for recording elements as well. Approached with caution and remembering our goal is to understand the experience of sound, spectrograms can be useful. “When they are integrated into the working environment of studying recordings . . . they help to transform listening into analytical interpretation” (ibid.).

Waveforms are generated by software programs and within DAWs, and illustrate changes in amplitude over time. These have been used successfully to assist in observing sounds. Anne Danielsen (2010b, 2012, 2015) has made extensive use of waveforms to transcribe microrhythms, as they can be very helpful in parsing out dense percussive textures. In Stanyek (2014) she notes:

Waveforms may be used alone or in combination (aligned in time) with spectrogram images as a relevant option; the timbral information of spectrograms contrasts with that of the waveform representation to reveal sound multi-dimensionally.

Automatic transcription appears within the discussion below, in making nuanced observations in rhythm. It is not a replacement for listening, but can add a different perspective of pitch and rhythm. It does not add substance to observations for other elements (dynamics, timbre, texture, etc.). Some approaches of automatic transcription for pitch and rhythm will have predetermined interpretations of these elements; the authors of those transcription algorithms have made choices that add another layer of interpretation. This predetermines the set of choices available when specific characteristics of the sound are translated and transferred into notation. Choices are made by the process that define the raw data against a few sets of preconceived possibilities—not unlike the limited options of some traditional music analysis methods.

Some analysts have devised their own approaches to notation. Among these are the graphic renditions of gestures offered by Richard Middleton (2000a, 111) that are reflected in Figure 5.4, and also the arrows showing alterations in rhythm and pitch incorporated by David Brackett (2000, 63) that are incorporated into Figure 5.9. New or invented notations can be helpful, effective and relevant, when carefully conceived, and the analyst might choose to explore this matter themselves.


Description can often replace notation, or function independent of notation. The reasons for this are many. Some aspects of sound defy notation, and an analysis may well include dimensions other than sound. At times description alone can reveal the essential character of the materials of music and lyrics; at others notating material is unwieldy or distorts the materials. Perhaps it is a choice related to the skill level of the analyst, or that the texture makes transcription difficult or impossible. Incorporating disciplines outside the track into an analysis will assure descriptions will need to be used, in order to embrace their substance and benefits to the analysis. Notations of music, lyrics or recording all have limitations, and simply might not provide a suitable platform to represent certain materials—especially when domains and disciplines intersect and overlap.

Bringing language to the track is an interpretation of its sounds and materials, from the listener/speaker’s unique perspective, their own ‘subject-position.’ Eric Clarke introduces this concept from cultural studies and the context of film. Every viewer has a unique interpretation resulting from the “individual’s particular circumstances, experience, background, and aesthetic attitudes, as well as the specific . . . occasion . . . But . . . there is a limit that can be attributed to properties of the film itself—understood within a certain shared cultural context” (Clarke 2005, 92–93). The personal and the cultural impact our interpretation, and skill levels at assessing sound also influence the success of talking about sound. With our limited vocabulary specific to sound, our attempts to adapt or adopt terms from other modalities or experiences are fraught with difficulties and distortions.

Though often desirable, describing lyrics, music and sound can be difficult. Many words are needed to replace what might be represented in a visual; writings can quickly become repetitive and tedious; descriptions themselves can be misleading, filled with imprecise language. Descriptions can color the materials with thoughts and language better suited to creative literature or a personal blog than to an academic writing that seeks some level of depth, with some level of commonality and objectivity. It can be difficult to explain the substance of sound, of musical materials or of the lyrics without including one’s impressions about them; yet those impressions are exactly what one wishes to avoid while collecting details (making observations) of what is present. This is the duality of content (what is present) and of character (interpretation of what is present). The techniques of observing timbre and pitch areas (in Chapter 7) might open one to this substance of ‘what is present,’ and enhance understanding to enrich descriptions. Describing without distorting the sound or without relying on personal impressions is a challenge that will be engaged often throughout the remainder of the book.

‘Sound’ is central to the recording, and to the recorded popular song. Talking about sound and describing sound will be covered in greater depth in Chapter 6, and timbre analysis in Chapter 7. Those concepts can be applied to the ‘sounds’ of lyrics and of musical instruments. Descriptions of musical materials and their activities and of the lyrics are separate matters from timbre itself. Descriptions are based on ‘what is happening’ and ‘how’ they are pertinent—and relate to evaluations and conclusions explored in Chapter 10.

No matter how skilled the writer or speaker, to put a perception into words is to filter its richness. Description is reductive, and description is an interpretation. Language summarizes an experience, eliminates detail and dimensionality. The resulting description loses something, perhaps much, of the original experience. Depth, detail and character are likely to all be diminished, if not erased. Experiences of sound defy language—by their very nature, and from our limited vocabulary. The experience is lost within the description—and, yes, also within notation. To describe an experience is to define it; we define it by what we perceive, what we understand and what we are capable of putting into words.

Linked to Analysis Goals

To be valuable, a transcription should provide information that aids the execution of the analysis, and that facilitates and illustrates its discussions. The ideas, conclusions and discoveries one wishes to offer might benefit from illustration in notation. These can include information that is central to the goals of the analysis and suitably explored with notation. There should be a purpose to transcribing materials, whether that purpose is exploring elements and materials to determine their relevance, or examining in detail materials clearly identified as the substance of the track.

For example, an analysis that seeks to define the content and sound qualities of lyrics may pay little attention to some musical qualities, and an analysis that focuses on pitch and harmonic relationships may pay little attention to lyrics.

An analysis that brings its attention to how the recording contributes to the record will seek a significant breadth and depth of understanding of the elements of recording. This can bring one to create detailed graphs or charts of the elements; to explore their characteristics, shapes and motions, etc. The analysis could also seek to identify how those elements deliver the lyrics, and their impacts on the character and context of its performance, potentially bringing a need to transcribe areas of lyrics. The impact of the recording on the musical materials might also be objects for examination; perhaps this will require some transcription of musical materials, perhaps not. A clear objective of the type of understanding and level of detail of the analysis will guide decisions related to transcription—how much detail to uncover, and which areas to examine.

All sounds need not be transcribed for an analysis. An analysis can be thorough, meaningful and effective without converting all sounds into some type of notation.

Certain features are pulled from the sound of the record that are of particular interest for careful examination. This occurs once evaluation has begun, and transcription and analysis are no longer separated. Prior to seeking this level of detail and understanding, transcription is used to gather basic information: what chords are present (not their functional relationships to a tonic), what are the basic rhythmic patterns (not the subtle aspects that will be sought later), what pitch do I believe I am hearing in this performance (ambiguity acknowledged, for follow up once evaluation begins) and so forth. Once some evaluation has established such things, a transcription can include “looking for specific things . . . and I found them. . . . As the transcription process went on and I focused on more specific questions . . .” (Winkler 1997, 194). It is important to recognize that this establishes a circular function of sorts, where more transcription may take place after evaluation has started.


To collect observations is to establish the pool of information that will be evaluated in the analysis. These observations are inherently the analyst’s preliminary or first interpretation of what is heard. Mindful of one’s biases and skill level, one might be able to identify what is present without distorting it; one might look beyond what one expected to find, be open to recognizing unknown qualities, and so forth, though this is a challenge that cannot be entirely achieved. Collecting information on the track must reveal the unexpected and the unknown if its essential qualities (what makes the track unique) are to be found. Holding a sense of one’s inclinations and biases might minimize distorting the data, and may keep data collection as neutral as possible.

Before collecting observations begins, some basic questions will be engaged to provide some guidance. These can establish a meaningful way to determine the elements and level of detail to be engaged. One should expect the path to collecting observations will take unexpected twists, and lead to exploring some elements/materials more than expected, and some less. The questions might be:

  • What are the goals of the analysis, and what must be examined to meet them?
  • What is pertinent to this stream of inquiry, and what is going to be of limited significance?
  • What materials/elements need to be explored to understand the essential qualities of the song, as they pertain to the goal(s) of the analysis?

What speaks most prominently from within the track? Those features that draw one’s attention are very likely to be significant. The obvious is typically important, and might be explored first as a gateway into the more concealed or less known.

Expect this observations process to turn into a circular process after a critical mass of observations has been established. As one observation leads to another, one will be drawn to other areas and to collecting information on unexpected elements/materials; these might lead to some adjustment of the analysis. As initial evaluations gain depth, it is common to determine that other materials need observations, that more detailed observations are required of some elements/materials, and that some have already been examined adequately, perhaps excessively.

Just as every track is unique, every analysis process will be unique (in some way) if it is responsive to the track. Observations within each domain will also be unique in some way. The next sections will identify the steps to collecting and organizing observations that, if approached without bias, will reveal the essential characteristics of the track:

  • Establishing appropriate large dimension and middle/small dimension timelines
  • Identifying sound sources, with a general identification of materials and lyrics
  • Processes and concerns for collecting observations within each domain, with music and lyrics explored separately, and recording explored in the succeeding chapters

Constructing Timelines

Structural timelines can be important organizational and analysis tools. These are distinct from the timelines that will appear in recording element X-Y graphs. Here, timelines are their own graphs or figures; they display information about the track against its structure (as time). Data recognized in observations is added to a timeline, organized without evaluation simply by placing the data against time; the information has no meaning or function, it awaits later evaluation. Once evaluation starts, patterns might emerge visually or from examining data; functions can become evident as large spans of time can be condensed, large amounts of data can be observed rapidly, if not simultaneously.

The ‘X’ axis of the timeline is always time, unfolding left-to-right. The axis may be divided into measures, groups of measures, or beats within measures. Once established, the resolution of this time axis remains constant, so as not to distort the consistent speed of unfolding time.

Structural timelines may take many forms; they can be applied to all perspectives, and can incorporate both observations and evaluations of elements in all domains. The vertical, ‘Y’ axis will dedicate space (a level of the graph above or below the line) for each aspect of the track included; this will not move, so the graph has consistency and one knows where to find information. The ‘Y’ axis will be labeled at the start of the graph as to what it includes at each level of height.
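The two axes described above can be modeled as a small data structure. What follows is a minimal sketch under stated assumptions, not a prescribed format: the class name, field names and the sample observations are hypothetical, and the passage is an unnamed 16-measure example. The 'X' resolution (measures per cell) and the ordered 'Y' levels are set once at construction and treated as fixed; observations are simply placed against time, with no evaluation.

```python
from dataclasses import dataclass, field

@dataclass
class Timeline:
    """A structural timeline with a fixed 'X' resolution and fixed 'Y' levels."""
    total_measures: int
    measures_per_cell: int   # 'X' resolution; treated as constant once set
    levels: tuple            # 'Y' labels, in a fixed order
    cells: dict = field(default_factory=dict)

    def add(self, level, measure, note):
        """Place an observation against time, without evaluating it."""
        if level not in self.levels:
            raise ValueError(f"unknown level: {level}")
        if not 1 <= measure <= self.total_measures:
            raise ValueError(f"measure out of range: {measure}")
        column = (measure - 1) // self.measures_per_cell
        self.cells.setdefault((level, column), []).append(note)

    def row(self, level):
        """Return one 'Y' level as a list of cells, left to right in time."""
        n_columns = -(-self.total_measures // self.measures_per_cell)  # ceiling
        return [self.cells.get((level, c), []) for c in range(n_columns)]

# Hypothetical observations, invented for illustration
tl = Timeline(total_measures=16, measures_per_cell=4,
              levels=("structure", "sound sources", "lyrics"))
tl.add("structure", 1, "section A begins")
tl.add("sound sources", 1, "piano")
tl.add("sound sources", 6, "lead vocal enters")
tl.add("structure", 9, "section B begins")

for level in tl.levels:
    print(f"{level:>14} | {tl.row(level)}")
```

Because every level shares the same columns, reading down a column shows everything observed in that span of measures, which is the vertical alignment against structure that a timeline is meant to provide.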

Timelines may be dedicated to large dimension structure alone, as seen in Figure 5.1. They can display structure in general terms, or present great detail. Structural divisions at or below the measure level and extending up through the highest strata might be visible simultaneously. Figure 2.1 illustrates several structural levels of imbedded strata within “Every Little Thing” (1964).

Timelines have great potential as analysis tools. As observations are collected, the nature of that information may be noted on the timeline; alternatively, when information might be more complex (such as details of a melodic line), the timeline may reference notes or notations on the material in an outside listing or outline. The large amounts of diverse information available (and visible) in timelines aid evaluation processes directly. Information of all sorts can be related, as it is located vertically against structure. Materials from one section or level of perspective can be compared to others in any structural level or location. This ready access to large amounts of information will be useful in a great many ways.

Adapting Timelines for Each Dimension

Timelines can be dedicated to specific perspectives, or structural levels, such as large dimension, middle dimension or combining middle and small dimensions. Within these dimensions, a timeline might be dedicated to a single domain and its elements, or may combine domains.

A large dimension timeline will, by definition, reflect the track as a whole. The time units displayed will be measures or groups of measures, as appropriate to clearly show the major structural divisions and any large dimension materials it contains. Elements at this large dimension will be listed in the ‘Y’ axis; they will reflect a single domain being observed, or may mix elements between domains (such as including lyrics within a music domain timeline). Elements may have fixed levels for the entire track; for example, a single tempo of the track would be a fixed element, as would the track’s perceived performance environment. Some elements will exhibit variable activities. In some instances, this means a few changes over the course of the track, such as key changes between major sections; these tonal areas can be readily shown within the major structural divisions. Other variable elements might require a higher degree of detail in the timeline even at the large dimension; for example, the track’s pitch density may continually change and require a resolution that makes changes clear within each measure.

The divisions of lyrics into stanzas might be noted on the large dimension timeline, identifying the first line of text on major structural divisions. Defining the functions of sections—such as verse or chorus—is best left until the evaluations stage; assessments of any type are typically most effective when delayed until all information has been collected. The large dimension structural divisions, those that separate major sections (such as at the verse/chorus level, or higher-level division) should be noted when they are audible (and they typically are) by a vertical division in the timeline.

A middle dimension timeline can contain activities at one or more levels of perspective; it might be appropriate to locate composite texture information above the timeline and sound-source level materials and activities below. It can also be valuable for a timeline to display middle and small dimension materials in a similar way. Middle dimension timelines will be divided into measures or half-measures, as at least some of the material they graph or represent will have the potential to change often, or to start/stop at any point within a measure. The timeline attempts to depict where activities take place; the exact nature of those activities is most effectively articulated on accompanying pages, especially within the working timeline that compiles observations.

Middle dimension timelines might represent a single section or several adjacent sections (such as an introduction and the first verse). It might encompass the entire record. The individual sections of a track-length timeline might still be compared with little effort. The timeline in the middle/small dimension is most useful for data collection. Figure 5.2 illustrates how melodic phrases, chords, sound sources (timbres and arrangement), verbal space, lyrics and structural divisions might be placed on a single middle dimension timeline; the elements contained in the graph can be changed to best suit the track.

Timelines are most useful when time divisions remain consistent once they are established; any inconsistencies distort the rate of time passage and confuse data. This is disorienting to the reader of the analysis (perhaps confusing the analyst, too). Moving between timelines is preferable to changing a timeline’s ‘X’ or ‘Y’ axis. Additional timelines can be incorporated into data collection or analysis as the work unfolds. The information they contain, their perspectives and lengths can vary markedly one to another, but should remain fixed once established.

Timelines can also be used to illustrate evaluations and conclusions (this will be explored later). In these instances, timelines may have a more unique format in order to most clearly present materials and relationships under discussion.

Identifying Sound Sources and Materials

It makes good sense to begin our analysis by using the perceptual process that allows our auditory system to sort out and differentiate sounds from the complex mixture of acoustic energy in our natural (or created) listening environment. Auditory scene analysis studies how our brain can separate sounds from the mixture of all sounds arriving at once: sounds from a single source are grouped as connected in a unique stream, which allows the listener to track single lines within the texture. Otherwise, if, for example, we heard several people talking at once we would combine the separate streams into new words and sentences (Bregman 1990). Identifying the track’s sound sources and the basic-level materials they present not only pulls the listener deeply into the track; this process is also our natural tendency, and one of the basic ways we understand what we hear.

Figure 5.2 Middle dimension timeline containing select elements of music and lyrics; beginning sections of “Here Comes the Sun” (The Beatles, 1969).

These processes provide an opening into the essential qualities of the track, without encumbrance of seeking to identify detail, function, meaning and so forth. The sound sources in records typically have unique characteristics and warrant individual attention; these include all individual instruments and voices present, as well as extra musical sounds and effects. The result will be a listing of the sound sources of the track, and some sense of the materials they present.

Depending on analysis goals, this listing of sources may need to be complete and detailed, or selective in identifying only prominent sources, significant instruments/voices, or some other limited number that serves the particular analysis. Identifying all individual instruments in groupings (such as all cymbals and drums in a drum kit) might be needed for certain observations, and not for others. Listening for and identifying sound sources can often be readily accomplished in textures with a few instruments or mixes that leave sources exposed. Depending on how sources are combined as well as the number of sources, instruments and voices can fuse, blend, mask, bury or otherwise disguise certain sources. It is not unusual for a source to be ‘discovered’ only after many hearings of the track.

Figure 5.3 Timeline of the opening sections of “Let It Be” (The Beatles, 1, 2000 version) with sound sources identified in each section.

Naming sources is sometimes direct and simple—such as identifying an instrument that appears in only a single form, or recognizing the lead vocalist by name; this is not always so. It can become confusing to identify sources as more instruments are added and groupings of sources are formed; then it can be challenging to name them. Many instruments can be identified by their unique sounds (such as ‘Hammond organ’), by the type of instrument (electric bass) or by the instrument plus the type of material presented (‘acoustic rhythm guitar’). With more than one independent instrument of the same type, sources can be numbered by order of appearance (‘synthesizer 1’ being the earliest), by an effect on a source (‘flanged electric guitar’) or by range with the highest instrument most commonly the lowest number (‘tom 1’ being the highest). A group of sources functioning together as a single idea might be labeled similar to ‘background vocals’ or ‘string section.’ If a source cannot be recognized, ‘unknown sound #1’ would be appropriate (noting timing of the sound’s appearance) until it has been successfully identified.
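One of these naming conventions, numbering same-type sources by order of appearance, can be sketched in a few lines. This is an illustration under assumptions, not a required procedure: the function name, the use of entry times in seconds, and the sample entries are all hypothetical. A source type that occurs once keeps its plain name; a repeated type is numbered from its earliest entry; an unrecognized source (given here as None) receives a placeholder label with its entry time noted.

```python
from collections import Counter

def label_sources(entries):
    """Label sources given (entry_time_seconds, instrument_type) pairs.

    Types that occur more than once are numbered by order of appearance;
    a type of None stands for a source that cannot yet be recognized.
    """
    ordered = sorted(entries, key=lambda e: e[0])  # earliest entry first
    totals = Counter(kind for _, kind in ordered)
    seen = Counter()
    labels = []
    for time, kind in ordered:
        seen[kind] += 1
        if kind is None:
            label = f"unknown sound #{seen[kind]} (enters at {time}s)"
        elif totals[kind] > 1:
            label = f"{kind} {seen[kind]}"
        else:
            label = kind
        labels.append(label)
    return labels

# Hypothetical entry times, invented for illustration
observed = [(12.0, "synthesizer"), (0.0, "electric bass"),
            (4.5, "synthesizer"), (30.0, None)]
labels = label_sources(observed)
```

Numbering by range or labeling by effect would need different input (a register or a processing description per source) and is left out of this sketch.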

The general content of materials presented by sound sources may be noted now. These notes should not be detailed here; rather, they should provide a bit of definition between sources and delineate their interaction. This will give some guidance as to which materials are primarily melodic, rhythmic, harmonic, rhythmic/groove or sound related, while recognizing that these observations might later be refined or revised.

Figure 5.3 is a timeline of the opening sections of the 1 (2000) version of “Let It Be” divided into song sections, and showing the main structural divisions and the four-measure prevailing time unit. Sound sources are placed against the timeline; sources present throughout a section are listed at the beginning of each section, and sources that enter well into a section appear near the point of their entry. A general idea of each source’s materials might appear either on the timeline or in an accompanying listing, where descriptions such as the following (pertaining to Figure 5.3) might be clearer:

  • Piano: accompaniment, rhythmic chords and bass line
  • Hammond organ: contextual accompaniment, sustained chords
  • McCartney, lead vocal: primary melodic materials and text
  • Backing vocals: accompaniment/counter melody, sustained harmonies, no text
  • Hi-hat: isolated strikes on back beats; hints at a groove
  • Snare, tom and bass drums: enters as a drum fill, lead in to drum groove

Identifying sound sources and materials is the first stage of collecting observations within ‘arrangement and texture.’ The remaining observation processes are presented below by element.

The basic structure of the lyrics might also be collected at this stage. Figure 5.2 illustrated how verbal space and other lyrics data might be included in early timelines. While evaluation of lyrics is a separate step that follows later, some impressions of the structure and the content of the lyrics can prove useful if captured here. These observations might be followed by collection of greater detail and finalized later, as below.

These initial observations will directly assist the data collection of elements and materials that follow in the next sections.


Six descriptions of analyses of popular songs, or of writings that contain song analyses, follow. These examples are offered to guide one in determining what to evaluate in the music to meet the goals of an analysis. Further, these might provide some direction to the acts of identifying what must be transcribed and of choosing what will be transcribed in order to support the observations process or to articulate an analysis.

David Brackett (2000) analyzes many dimensions of Elvis Costello’s “Pills and Soap” (1983). In his analysis he includes a transcription into staff notation of the song; in it the synthesizer, voice, piano and percussion parts are clearly identified. The analysis covers underlying harmony, harmonic motion (chord progressions and implied tonalities), materials and developments in the vocal melody, its octatonic pitch collection, ostinato lines, percussion parts, vocal qualities (aided by spectrum photos), range and tessitura, grooves, and melodic gestural shapes. Observations needed for these evaluations identified musical instruments and parts, structure, chords, rhythms, instrumental melody lines (bass, synthesizer and piano), vocal part and vocal timbres, percussion rhythms and cross rhythms of grooves, pitch areas and ranges of parts, and likely more. The transcription’s detail does not include rhythmic and pitch nuance of the performance, though discussion addresses timbral qualities of the vocal performance in some depth.

In “Analytic Methodologies for Rock Music” Lori Burns (2008) focuses her evaluation on the governing harmonic progression and on the bass and vocal lines (to examine voice leading). These evaluations are central to her analytic method, which is the topic of the writing; the analytic method is demonstrated through her analysis of Tori Amos’ song “Crucify” (1992). Observations that would have been performed to lead to these evaluations would have included identifying chords, and notating the bass line and the vocal line, though likely much more actually occurred. The level of detail of these transcriptions did not need to include the nuance of the performance to meet the defined goals of the analysis; the transcriptions gave rise to presenting the normative progression, voice-leading graph and reduction staff notations that are central to the method (ibid., 74).

Stan Hawkins (2000) examines harmonic materials and organization of Prince’s “Anna Stesia” (1988). Observations required for this primary goal include chords and chord sequences, structural divisions and melodic lines. The full breadth of the analysis, however, extends well beyond ‘harmonic analysis’ and presents evaluations of phrase structure, texture, bass lines, vocal timbre and expression, dynamic contour, and more. Included in his study are sonic qualities of the production: vertical density, recording’s elements related to signal processing and mix elements, and other subtleties within the production. Few transcribed materials exist in the presented analysis; it is obvious a considerable amount of detail was extracted from the track that was not presented in his writing.

Observations of pitch materials’ melody and especially harmony form the backdrop of Walter Everett’s (1986) analysis of “Strawberry Fields Forever” (1967). Though “Fantastic Remembrance in John Lennon’s ‘Strawberry Fields Forever’ and ‘Julia’”4 examines other aspects of the song in some detail, considerable attention to the harmonic vocabulary provides strong cohesion to the analysis. Other areas covered include the ‘fantastic remembrance’ basis of the article, the sound qualities of the recording and production techniques, orchestration, voice leading, connections of text and music, and meaning. Some sizeable sections of the song are transcribed in considerable detail; included as outgrowth of the transcription process are Schenkerian graphs of middleground and foreground structures. Performance nuance is not included in the score; though rhythms are detailed to provide clarity of materials, performance nuance is not central to the goals of the analysis.

Contrasting “Fantastic Remembrance” with Everett’s discussion and analysis of “Strawberry Fields Forever” in his The Beatles as Musicians: Revolver through the Anthology (1999, 75–84), we witness a shift of breadth and focus. The discussion broadens to include the compositional process and the development of the lyrics and some musical elements. A more detailed examination of making the recording follows, which discusses vocal and instrumental sound qualities as well as some qualities of the recording. The musical elements are not explored in great detail, but the coverage is all pertinent and insightful; also acknowledged are the sound qualities of instruments and vocals, melodic ideas and lyrics (at times acting as a unit), harmonic progressions and ambiguities of surface harmonies, tonal areas and arrival points, and bass lines. Observations undoubtedly included pitch collection for melody, bass lines and chords, rhythms, instrumentation, percussion rhythms, and structure. Other observations certainly informed what is present, though they did not become topics of focus in the writing. Many transcriptions were included, all of the compositional drafts that provided support for his discussions of the compositional process and sequence; these do not contain performance nuance, as that is not relevant to his discussion.

In “Determining the Role of Performance in the Articulation of Meaning: The Case of ‘Try a Little Tenderness,’” Rob Bowman (2003) looks deeply into four records of this Tin Pan Alley song.5 These tracks are performed by Bing Crosby (1933), Aretha Franklin (1962), Sam Cooke (1964), and Otis Redding (1966), and contain many performance technique and stylistic differences. The analysis carefully studies the vocal lines of the four performances for melodic interpretation (relationship to the original sheet music), pitch, rhythm, timbre, “timbre, dynamics and playful voicedness” (ibid., 115–118) and speech singing. Other aspects examined include structure, harmonic setting and instrumental accompaniment, and, of course, meaning. A significant number of vocal transcriptions are present, some showing nuance in performance related to rhythm and pitch. Observations would have included transcriptions of some vocal lines, and collecting information on chords, rhythms, vocal timbres, dynamics, accompanying instrumentation and musical parts, structural divisions of timeline, and likely more. The evaluation process also utilized the original sheet music—a step not common in popular music analysis, but of relevance here as it is the original medium for this song.


At the observations stage we attempt to access the melodic materials without becoming overly detailed. Without engaging the function of pitches, information will be examined and collected on the vocal melody and bass line. The crucial vocal melody and the bass line jointly establish the voice leading that is central to many songs (Burns 2008, 67). Riffs and ostinatos, and other melodic materials such as solos and background vocals, may also be engaged, as they are often significant.

A detailed transcription of pitch and rhythm of the lead vocal and bass line is often useful, but is not always needed for the analysis. Instead the melody might be conceived as a gesture, with shape and contour over time. This simplified approach of capturing some of the essential characteristics of melody might be useful here. Melodic contour might be sketched; using beginning and end pitches, highest and lowest pitches and sustained pitches in the phrase, a general shape can be established that can then be filled with further detail—for example any emphasized pitches, implied or outlined chords, and repeating shapes. The shape of melodic contour unfolds, and can increase in detail or transform into transcription, as desired.

Melodic phrasing of the lead vocal is typically important to collect. It can be marked on a middle dimension timeline to establish phrase structure, against the structural divisions. Phrasing of the lead vocal will coincide with verbal space (as in Figure 5.2), as it is a different way to observe the same percept.

By collecting information on melodic contour, melody might be engaged without excessive detail. This can be helpful, especially as a beginning point. Gestures of melodic contour might be mapped phrase by phrase. A reference of the highest and lowest pitch levels of the middle dimension section (defined above) might be identified. Central or significant pitch level(s) might be identified to serve as references for mapping the contour. The shape (from very general outline to a more detailed contour) of melodic movement can then add detail between these reference pitches without concern for specific rhythms, intervallic patterns, or pitches.6 The gestural shapes of contours can illustrate tessitura, shapes and ranges of melodic gestures, indication of melodic complexity (patterns, speed, density), and so forth. Perhaps a tonal center will be perceived by experienced analysts, but tonality will be examined later in evaluations and the goal here is to collect information without concern for function. Figure 5.4 illustrates general melodic contours of Lennon’s vocal in the first chorus and the first verse of “Strawberry Fields Forever”; each contour is a phrase, separated from others by silence. This figure could be more precise (showing specific pitch levels) if warranted.

Figure 5.4 Melodic gestures illustrated as general melodic shapes and contours. John Lennon’s lead vocal from the first chorus and first verse of “Strawberry Fields Forever” (The Beatles, 1967).

Repeated pitches can be identified and incorporated into the contour, and can illustrate qualities that might later be found related to recitation or lyricism. Completing this process, melodic materials that might be recognized as significant—a prominent bass line or the main ideas of the vocal are typically readily identified—may be notated. These core melodic materials can be transcribed in a general way, to capture their basic interval sequencing and fundamental rhythms, without engaging evaluation. The nuances of performance will not be incorporated at this time; more detail can be added later as needed. The purpose of this final step is to begin to accumulate more detail on melodies, without subtle characteristics or functions. All this activity occurs at the middle dimension, and some of the observations are towards the small dimension.
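
The contour mapping described above can also be sketched computationally, should one wish to keep observations in digital form. The following Python sketch is an assumption-laden illustration and not a method drawn from the analyses cited: the function name `contour_summary`, the use of MIDI note numbers (60 = middle C) and the sample phrase are all hypothetical. It records only the reference pitches and the direction of movement between successive pitches (U = up, D = down, R = repeated), leaving rhythm, interval size and function aside.

```python
def contour_summary(pitches):
    """Summarize one phrase given as MIDI note numbers, in order."""
    if not pitches:
        raise ValueError("a phrase needs at least one pitch")

    # Direction of each move from one pitch to the next.
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            steps.append("U")
        elif cur < prev:
            steps.append("D")
        else:
            steps.append("R")  # repeated pitch

    return {
        "start": pitches[0],
        "end": pitches[-1],
        "highest": max(pitches),
        "lowest": min(pitches),
        "range_semitones": max(pitches) - min(pitches),
        "shape": "".join(steps),
    }

# A hypothetical phrase, not a transcription of any track.
phrase = [64, 64, 67, 69, 67, 62]
summary = contour_summary(phrase)
```

Runs of `R` in the shape string mark the repeated pitches that might later be related to recitation; greater detail (specific pitch levels, fundamental rhythms) can be added to the record as the analysis requires.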

Melody may have implications related to tessitura, shape and gesture over larger spans. ‘Observation’ collects the unbiased information of shape, range, rhythm, intervals or pitches, etc. These and other qualities may be worthy of attention as well, perhaps by listing variables and qualities in a typology table, or through melodic transcription or gestural mapping. Melody can have a shape that unfolds throughout a section of a song, or throughout the entire song. Attention might be brought to recurring ideas that seem to evolve, to get a sense of information that may require attention. For example, in “All Along the Watchtower” (1968), Jimi Hendrix makes use of the introduction and four interludes to shape an over-arching motion of his guitar solos. Albin Zak (2005, 634–636) explains how Hendrix builds tension in his solos by climbing from C5 in the introduction, through E5, G5 and an ornamented G5 in the first through third interludes, and finally reaching C6 at the end of the song (the guitar’s highest pitch), establishing the dramatic and triumphant peak of arrival of Hendrix’s cover version. We learn in Zak’s ‘evaluation’ that a dissonant ‘D’ pitch continually appears and is resolved throughout these sections; while this evaluation clearly gets ahead of where we are, here within observations we clearly sense the tension of this pitch; our collection of melody data will reflect its presence, though at this stage would not necessarily identify its dissonance function and instability. This building arc is only evident when one brings attention to evaluating at the large dimension, though this evaluation cannot happen without some collecting of information on pitch content, peaks and shapes of the guitar solos.


Much information about harmony is wrapped in function—and function can only be determined later, by evaluating harmonic content and its inherent directed motion (and the significant contributions of melody). In making observations, we are concerned with chords: chords by letter names and chord types/qualities, perhaps also including inversions. Chord information often prominently comprises guitar parts and keyboard parts. These sonorities might be determined by pitch/chord matching, alternating listening to the record with searching for the right chord, perhaps including the right voicing. Noting chord voicing in the correct octaves will help acquire the information needed for pitch (vertical) density observations, and later allow timbral balance evaluations. Guitar tablature7 might provide helpful shorthand for some to collect guitar chord data, though transferring it into staff notation may at some point be needed for other analysis steps. Depending on the texture, this task has various levels of difficulty.

There are advantages to using letter names only as one does not need to engage the evaluation process to identify a key within discussions. Chord type and letter name removes all doubt about which chord is being observed; progressions can still be observed and noted without engaging chord functions or identifying mode.

Lloyd Whitesell (2008, 117–147) uses only chord names/qualities within his detailed discussion of Joni Mitchell’s ‘harmonic palette.’ The absence of chord functions and of typical emphasis on tonal centers brings a sense of flexibility to the discussion: it offers specifics on chord names, harmonic rhythms and progressions, without concern over tonal centers and chord functions that are not intrinsic to his explanation of Mitchell’s harmonic language. This seems to work especially well in his discussion contrasting Joni Mitchell’s original version of “Woodstock” (1970) with that of Crosby, Stills, Nash & Young (1970) (ibid., 33–39).

There are practical concerns here as well. Using chord letter names allows more information to be collected in order to determine what key is in play. The roman numerals of harmonic analysis indicate a scale degree of a key and imply (or define) tonal function; calculating the key (tonal center) is not possible until study of the song has advanced and evaluation is in process. Separating observations from evaluations is simplified by merely naming chords; this can help minimize skewed evaluations drawn from incomplete or hastily interpreted observations.

Information is also collected on the rhythms of chord changes (harmonic rhythm). This is the simple pacing of chord changes; specific rhythms or rhythmic patterning will not be sought here. By placing chords on the timeline where they begin (or roughly so), this pacing is noted; patterns, speed and so forth will emerge later.
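
Placing chords on the timeline where they begin is enough to recover the pacing of chord changes. The Python sketch below illustrates this under assumptions of our own: the function name `harmonic_pacing` is hypothetical, timeline positions are given in measures (1.5 meaning halfway through measure 1), and the chord entries are illustrative letter names only, with no key or function implied.

```python
def harmonic_pacing(chord_entries, section_end):
    """Return (chord, start, duration) for each chord on the timeline.

    chord_entries: list of (start_position, chord_name) tuples, where
    start_position is measured in measures from the section start.
    section_end: timeline position at which the section ends.
    """
    ordered = sorted(chord_entries)
    pacing = []
    for i, (start, chord) in enumerate(ordered):
        # A chord lasts until the next chord begins, or until the
        # section ends if it is the last chord listed.
        if i + 1 < len(ordered):
            end = ordered[i + 1][0]
        else:
            end = section_end
        pacing.append((chord, start, end - start))
    return pacing

# Hypothetical chord placements, named by letter and quality only.
chords = [(1.0, "C"), (1.5, "G"), (2.0, "Am"), (2.5, "F"), (3.0, "C")]
pacing = harmonic_pacing(chords, section_end=5.0)
```

Only durations are derived here; patterns, speed and any reading of tonal function are left to the evaluations stage.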

Cadence locations, or points of arrival or release of harmonic tension might be noticed during this process as well—although there should be no effort to identify function (which requires modal/tonal evaluation). These are typically apparent, and can be located without analyzing specifics. Finding these articulation points allows one to recognize harmonic movement and the phrasing they establish. Chord phrasing and the movement of departure and arrival should be treated loosely, as this will be refined significantly in evaluations, once tonality/modality has been established.

Arrangement and Texture

Collecting observations on the arrangement and texture begins by delineating the sound sources and the sounds of the recording, and recognizing areas of harmonic fusion or a composite rhythm creating a fusion of sources. Included in observations are relationships of source timbres and musical materials; this includes the timbral qualities of the performance and the pitch spaces occupied by the sound sources joined with materials. A dataset is generated that will later allow evaluations of how sources interact in these dimensions, and how they shape the track. These are among the fundamental observations of the arrangement. There is much alignment between arrangement and texture in music, and timbral balance in recording.

With identifying sound sources (as outlined above), observations of the arrangement began. Here, information on the presence of sources is verified, and the list of instruments/voices throughout the track is refined. Sound sources that are present within the structural sections are noted on timelines.

Connections of instruments/voices to general materials they present are identified next. Some sources were engaged during observations of melody and harmony. Remaining sources might be explored here; information on these other sound sources may be revealed by tracing how sources form groups and interact, and by the materials they present. All previous observations are refined to bring more focus to sound sources and their materials, and other pertinent sources identified. The sources are organized by general functions, among which are likely:

  • Primary melodic parts—lead vocal and instrumental lines
  • Groove elements—dominant rhythmic parts and often bass line
  • Secondary melodic parts (such as backing vocals)
  • Accompanying harmonic parts (often keyboards, guitars, string sections, etc.)
  • Secondary rhythmic and timbral parts
  • Thematic, riff or other defining gestures of sounds—whether melodic, harmonic, rhythmic or timbral

The final determination of a source’s place in the musical fabric can be delayed until the evaluations stage, as will finalizing the number and types of musical parts in the fabric. Some general observations are possible to make here, though, and they will prove helpful. An observation of the type of texture (melody and accompaniment, chordal, contextual groove, etc.) within each section (or sub-section) can be added to the timeline; this provides a sense of the number and nature of materials, and a general sense of how the sources are used within the fabric.

The combinations of sources that result in an arrangement will be apparent by the listing of sources in each section. Groupings of sources play a significant role in the timbral qualities of the ensemble; it can be a valuable observation to identify those instruments/voices performing the same or similar parts, sources that are synchronized in some way, or those that are interacting in some way. Often a set of dynamic relationships between the different sources contributes to this ensemble timbre; these general levels and relationships could be noted here, and lead to further observation within dynamics. An example of combining a variety of different sound sources to sync or interact would be the instruments that comprise a groove (see below).

The concept of sources occupying an area of pitch that extends over a defined period of time is central to the arrangement and the texture it produces. Pitch density allows access to understanding many qualities of an arrangement in the music domain, as it will later for the recording domain in Chapter 7. Pitch density exists at the perspective level of the individual sound source and the musical materials a sound source presents. These are fused into a single percept, combining the substance of the musical idea with the dominant timbral characteristics of the source. The result is a sense that the source (its material and timbre) is occupying a pitch space or bandwidth (within the full pitch-range of the record) for a certain, defined segment of time.

Pitch density unfolds as ‘scenes’ of gestures created by the source’s materials. It is mapped out as a series of snapshots of time, passing successively from one to the next. Each of these time units is a syncrisis unit. They typically relate to the prevailing time unit or the phrase structure of materials.

Collecting observations on pitch density in the music domain uses a process similar to that used for pitch density as a recording element:

  • Materials are observed as occupying a pitch space (a pitch area comprised of its material and timbre)
  • Pitch space is identified by its boundaries and the interval between: (1) the lowest pitch present, or melodic contour, and (2) the highest significant partial of its timbre
  • The interval between these two boundaries establishes its range
  • Changing levels of lower boundary over time (melodic motion)
  • Changing upper boundary caused by movement of the lower boundary or by a change of timbral characteristics (such as those caused by performance expression, technique, intensity, etc.)
  • Time span of the ‘scene’
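
The variables listed above can be gathered into a simple record for each ‘scene.’ The Python sketch below is one possible arrangement, offered under stated assumptions: the class name `Scene`, its field names, the use of frequency in hertz for both boundaries, and the sample values are all hypothetical. The range is expressed as an interval in semitones, computed from the ratio of the upper boundary (highest significant partial) to the lower boundary (lowest pitch present).

```python
import math
from dataclasses import dataclass

@dataclass
class Scene:
    source: str                # name of the sound source
    start_s: float             # start of the scene, in seconds
    end_s: float               # end of the scene, in seconds
    lowest_hz: float           # lower boundary: lowest pitch present
    highest_partial_hz: float  # upper boundary: highest significant partial

    def span_semitones(self):
        """Interval between the two boundaries, in semitones."""
        return 12 * math.log2(self.highest_partial_hz / self.lowest_hz)

    def duration(self):
        """Time span of the scene, in seconds."""
        return self.end_s - self.start_s

def fabric_range(scenes):
    """Lowest and highest boundaries across all scenes supplied."""
    return (min(s.lowest_hz for s in scenes),
            max(s.highest_partial_hz for s in scenes))

# Hypothetical values, not measurements from any track.
piano = Scene("piano", 0.0, 8.0, 110.0, 3520.0)
vocal = Scene("lead vocal", 2.0, 8.0, 220.0, 7040.0)
overall = fabric_range([piano, vocal])
```

A change in the lower boundary (melodic motion) or the upper boundary (a shift of timbre) can be recorded by starting a new scene, so that a source becomes a succession of such records over time.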

Collecting the basic information of pitch density begins with identifying pitch material; this may have been observed when examining melody and/or harmony, depending on the sound source. Attention is next drawn to the timbre of the source to identify the interval spanning from this pitch material to that of the last prominent partial within the timbre; this defines the lower and upper boundaries of the gesture. Pitch area and timbral quality observations will assist this process; these will appear in Chapter 7. Other variables might include shifts of the pitch density area brought about by changing pitch level of the materials, or shifting upper boundary brought about by changes in the timbre of the source (such as those caused by increased or decreased performance intensity).

As a recording element, pitch density is plotted against pitch registers on an X-Y graph. This practice can be applied here within the music domain as well. In practice, observations of pitch density as a recording element can replace observations of the arrangement in the music domain.

Alternatively, one could attempt to notate these pitch areas in staff notation; a quasi-score of more than one system of staves will often be needed to clearly show most tracks. This will generate a score of the track’s pitch density and timbral balance. In many instances an X-Y graph presents the materials most clearly (see Chapter 7). The level of detail of a pitch area’s boundaries, densities and changes over time can vary with the goals of the analysis. Notations of pitch density content might be coordinated to appear as a tier on a data collection timeline, should one wish.

Figure 5.5 Pitch density graph of all instruments present within the musical fabric in the opening sections of “Let It Be” (The Beatles, 1, 2000).

Within the evaluations process, pitch density observations will generate large dimension considerations of the arrangement’s use of range and registers. They will also facilitate upper middle dimension comparisons and interactions of pitch and timbre between sound sources.

This pitch density information allows the frequency and pitch range of the overall fabric to be identified, and provides the data for timbral balance evaluation. This can take varying forms including recognizing the highest and the lowest pitches of each section (any section) or of the entire track. It might also be a variable that can be traced from the beginning of the record to the end, representing the changing shape of the range of highest and lowest pitches, and the amount and location of activity within the range. Bringing this information into evaluations, ranges of most and least levels of activity can be identified, as well as the distribution of pitch materials throughout the pitch space of the track. The observations of pitch density can reveal much about the arrangement. There is no indication of dynamics within these pitch area relationships, though; these are revealed in the loudness levels and dynamic relationships of musical balance.

All this is explored more thoroughly in Chapter 7. There pitch density is explored as a recording element. As a recording element, each source is conceived as a level of pitch density, a level of strata. These strata of all sources establish timbral balance.


Dynamics can take place in any dimension. Information on dynamics might be collected in any dimension for individual sources or groups of sources that are in some way linked. Dynamics also functions at the large dimension, shaping a dynamic contour of the track, as well as at the smallest dimension within sounds and sound qualities.

Dynamics of musical materials and of recording are also closely linked in significant ways. The various graphs of the recording can examine the overall loudness levels and contour, and the dynamic contours of individual sound sources; these materials need not be collected here—unless there is a specific purpose to do so. It might be desirable to observe dynamic contour of certain lines, of musical balance of certain instruments or certain groups of instruments, to trace the dynamics of the entire musical fabric for a section or throughout the track, or to examine individual notes within a line—all instances at different perspectives, and many more exist.

Observations of central concern to dynamics will be loudness levels of sources, loudness levels within the sounds and the materials of sources, and loudness levels representing a balance of loudness between sources, or between groups of materials. These middle and small dimension activities might be related to metric and syncopation accents, crescendos and decrescendos, dynamic shaping and inflections in lines, and similar qualities. Musical materials interact with dynamics in a great many ways; observations of dynamics will typically be tied to elements and/or the sound sources that present them. Groups of instruments—such as those that might form the groove—will have their own overall dynamic shape, as well as the dynamic contours within the group. Often great dynamic complexity exists within grooves—and contributes to their character and drive.

Making written notes about dynamics might make use of traditional dynamic markings, though these are general at best. Individual lines and instrument dynamics can be concerned with:

  • Performed dynamic level at any moment
  • Shape of changing loudness level (contour)
  • Dynamic relationship of a source to others (louder or softer than, and by how much)
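
The third of these concerns, the relationship of one source to others, can be noted in a rough ordinal way. The Python sketch below treats traditional dynamic markings purely as ranks; the function name `compare_dynamics`, the eight-step scale and the sample levels are assumptions made for illustration, and a difference in rank should not be read as a measured difference in loudness.

```python
# Traditional markings ordered from softest to loudest (ranks only).
DYNAMIC_LEVELS = ["ppp", "pp", "p", "mp", "mf", "f", "ff", "fff"]

def compare_dynamics(levels, reference):
    """Relate each source's marking to that of a reference source.

    levels: dict mapping source name to a dynamic marking.
    reference: name of the source used as the point of comparison.
    Returns a dict of source -> (relation, number_of_steps).
    """
    ref_rank = DYNAMIC_LEVELS.index(levels[reference])
    relations = {}
    for source, marking in levels.items():
        if source == reference:
            continue
        diff = DYNAMIC_LEVELS.index(marking) - ref_rank
        if diff > 0:
            relation = "louder"
        elif diff < 0:
            relation = "softer"
        else:
            relation = "equal"
        relations[source] = (relation, abs(diff))
    return relations

# Hypothetical observations for one moment in a track.
levels = {"lead vocal": "mf", "piano": "mp", "hi-hat": "pp"}
relations = compare_dynamics(levels, reference="lead vocal")
```

Because traditional markings are general at best, a record of this kind serves only as a first pass, to be refined once more precise dynamic areas are available.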

The dynamic areas adopted for recording-element graphs, such as the loudness balance graph covered in Chapter 9, might be of greater assistance. At this stage of collection, general observations may be adequate, and will lead to deeper examination of some materials once evaluations begin. More detailed examination requires the reference dynamic level (RDL), which can only be determined in the later stages of evaluation or within conclusions; with that knowledge, deeper observation and evaluation of dynamics are possible. Typology tables examining the types of dynamics and their values and characteristics, or their changes, may be used to collect unbiased observations on dynamics. This is covered in Chapters 6 and 9.

Character, Timbre and Performance Qualities

The loudness levels of sound sources (instruments and voices) carry an inherent and characteristic timbre. Changes of loudness (dynamic level) cause shifts of timbral qualities. These shifts of timbral qualities themselves are dynamics-related, as portions of the sound’s frequency spectrum become emphasized or de-emphasized as the energy and intensity of the performance shifts. This connection between how a sound is performed and the resultant timbre plays out in analysis through observations of the character of the sound (or sound source) and the timbre of the sound source. This duality of character balancing color might also be gauged as the interpretation of ‘affect, expression, energy, intensity (and more),’ that is contrasted against an observation of the substance of the sound (what is physically present).

Describing the Character of Sound Qualities

Descriptions of the character of materials, and even the musical character of instruments or voices, can have value within an analysis. Descriptions that directly address the qualities of expression of lines or materials (the levels of intensity or, conversely, the degrees of passivity of a performance) can have relevance. Wrapped in with intensity is the energy of the performance and the resultant sense of character, tension, motion, drama, stability, and much more.

The ‘character’ portrayed by a sound (timbre) is often linked to semiotics, and in variable ways to meaning attached to the character of a timbre. This is fundamentally entwined with ‘interpretation’ by the individual listener, but also by ‘meaning’ within the listener’s broader collective culture. Thus, identifying the character of a timbre is subject to some interpretations (or impressions) being personal, while others speak widely within a common culture—and are open to variation between listeners and cultural groups. “The semiotics of music . . . deals with relations between sounds we call musical and what those sounds signify to those producing and hearing the sounds in specific sociocultural contexts” (Tagg 2013, 145).8 Here this definition is extended to the timbre itself. It can be related to what Philip Tagg (ibid., 525) has called “genre synecdoche” where sound sources bring associations outside the track; sounds and musical structures that “connote paramusical semantic fields—another place, another time in history, another culture, other sorts of people.” Such outside elements might be sounds of other cultures (highland pipes, tin whistle, didgeridoo), or from the same culture, but a different context (such as a gospel choir within a rock song).

The expression and affective qualities that bring character to a timbre have some inherent qualities (qualities that are culturally learned), some qualities related to context within the track, some qualities personal to the listener, and qualities that might originate elsewhere or from any combination of factors.

Here is perhaps where subjective connotations might be used and found useful, when approached with clear communication of a common experience in mind.9 This subjective position is not overtly personal; rather, it is a softening of restricting observations to the universal that can be objectively quantified and qualified, to a position that might also embrace the expressive qualities of timbre, the affective qualities of the track and its impacts on the meanings of timbre, and on the energies and intensities sounds carry that sit squarely within the timbral qualities of sound sources. Embracing the character of timbres will be especially useful using the broader, more ‘universal’ or ‘cultural’ perception as a starting point for description, then weaving more personal interpretation as might be appropriate. How a timbre might bring the listener to feel on a personal level, or the energy, intensity or meaning that the timbre might elicit is difficult to navigate, and can often generate descriptions that are meaningful only to the individual. Using language to describe the character of a timbre is often tenuous ground; problems of precision of matching term and timbre are to be expected, as definitions of terms between individuals and cultures (and geography) can shift. These communications are never fully adequate or accurate, though they are often as good as it gets.

We have few words in our vocabulary to describe ‘how sounds sound,’ and do not have an objective, unbiased approach to describing the content and character of sounds. Still, talking about the sound of the record is a vital part of analyzing the record, the music, lyrics and the recording. Describing sound is also central to observing the characteristics of timbre, expression, performance qualities, and more. In our attempts to describe sounds we rely on terms from other senses, analogy, associations, feelings, affects, and a host of other well-intentioned but ineffective (or counterproductive) terms. Without careful attention, it can be difficult to avoid terms that do not present quantifiable information, and do not represent the experience shared between others. In the absence of terminology and methods, we easily fail to make meaningful assessments (Moylan 2017), and our analyses fail to communicate clearly.

The literature contains no small number of descriptions of tracks, sounds and musical materials that are filled with words that communicate little more than an author’s feelings or personal impressions, and do not address either the content or the character of the material within the context of the track. I leave it to the reader to recognize examples of such writings; you may have already encountered some. Descriptions effective in communicating tangible and relevant observations (and subsequent evaluations) about the qualities of materials are certainly possible, and a process and several resulting descriptions will be offered in Chapter 10.

With the timbre imbedded within the context of the sound stream, it is heard and presented to the listener within the context of the track. The timbre is situated within the track; its character and the meaning of its character are formed (at least partially) by this larger context. The timbre is not isolated from context (as a sound object), but rather holds a character that has meaning within that context. As we are starting to recognize, the track’s context is rich and complex. The timbre of the sound is not only the sound quality of the source and its musical material, it is also language and performed lyrics, and the persona of the performer(s). As we will encounter later, it is also the timbral definition that establishes distance and the sound qualities of environments, among other qualities of the recording.

The impressions of character are thus the result of many factors interacting into an overall quality—a quality that might be of the individual sound or of the sound source (and the materials it presents and the qualities it holds), a group of instruments or perhaps the entire work. Character is thus identified within the ‘conclusions’ process, where evaluations of elements are considered and the richness of their confluence might be recognized. This will be explored in greater detail in Chapter 10 under conclusions.

Recognizing the Content of Timbre

Defining the timbre of a sound is different from describing its character. Talking about the content of a sound requires engaging the sound itself—noticing physical properties as well as their affect. Describing timbre is defining the substance of the sound, not the characteristics that make a psychological impression on the listener—or that bring an interpretation of the sound or material within the context of the track.

Listening for timbral qualities requires listening inside the sound. Our natural approach to listening to sounds is to recognize overall quality; for example, we recognize a person by the sound of their voice, but are not aware of what makes that voice different from all others. Listening inside the sound brings awareness of its subtleties, and to the variables within timbres. We clearly detect minute differences, but are not practiced at identifying the nature of those qualities; instead we identify what those qualities elicit. We hear very subtle changes in the timbre of voices, and we recognize a change in meaning, a change in expression, perhaps a change in mood, but we are not aware of the content of the change. Observations of the content of timbre are most accurate and relevant when recognizing and describing the activities of the sound’s inner workings.

Timbre is one of the two dominant elements of recording. Chapters 7, 9 and 10 will cover timbre analysis in detail; there this contrast of physical content with character will be explored in depth. Those concepts can also be applied here to music and lyrics, when suitable. The following table presents a more general and simplified set of variables for examining the content of timbre that may prove more workable within the contexts of music and lyrics; it will often be adequate for addressing the concerns of timbre in these domains.

Table 5.1 Variables for observing the general, physical characteristics and content of timbre.

Variable | Values or Characteristics
Pitch or Pitch-Range | Defined by the pitch-range of the change (for example, C5 to F5), or by identifying a pitch or pitches within the range and the approximate interval of the range
- Type of Change | Increase or decrease of loudness level of the pitch-range
- Degree of Change | Description in general but objective terms such as: minute, slight, noticeable, moderate, substantial, pronounced, extreme.
Dynamic Shape or Envelope | Defined by speed (related to pulse or clock time) and attack level (related to sustain level or previous sound) and decay shape (contour and duration of levels)
- Type of Change | Increase or decrease of speed, increase or decrease in attack level, increase or decrease of the level of the sustain and of the initial decay
- Degree of Change | Description in general but objective terms such as: minute, slight, noticeable, moderate, substantial, pronounced, extreme.

Rhythm, Gesture and Groove

Rhythms and rhythmic patterning exist in all elements and in many structural levels. Microrhythms on the small dimension, middle dimension surface rhythms, and the macrorhythms of large dimension traits exist in all styles and genres. Most importantly, these rhythms at different structural levels function to shape those styles and genres as a (more or less) unique attribute. Discussions presented here should be taken as one example from the diverse popular music catalog. Electronic music contexts often treat rhythm, gesture and groove differently than more performance-based popular music. It is toward performance-based genres that many comments are directed—and for each comment, there will be exceptions, and perhaps more exceptions than tracks that conform. The following comments are not rules, but rather general observations, as rhythm—at all strata—might be the element that most deeply characterizes genres and styles of popular music.

Collecting information on rhythms might occur within observations of another element (such as harmonic rhythm, for example) or within musical materials (such as a vocal line); these are examples of surface rhythms (the rhythms of materials and sources). Within any musical style, rhythm data might uniquely emerge from any material or sources, and at any dimensional level.

Tempo and meter propel and organize surface rhythm; tempo’s elasticity in some tracks and styles of popular music will warrant observation. Tempo and meter are the references for calculations of rhythm in most contexts of tracks. Structural rhythms, such as the rhythms of verse/chorus exchanges or of hypermetric phrases, are macrorhythmic and proportional extensions of meter and/or pulse; these tend to be evaluations that are recognized after several levels of structure have been identified.

Figure 5.1 reveals macrorhythmic groupings and the prevailing time unit’s regular pulse. The prevailing time unit and various structural levels of hypermeter can be recognized within the stages of establishing a middle dimension timeline. These structural aspects of rhythm may be important additions to the middle dimension timeline, as they may serve as reference time units. Further, they lend a sensation of musical movement and thereby significantly assist observations of other elements (as just seen with pitch density), and later evaluations in all elements and domains.

The rhythmic qualities of popular song are perhaps foremost (along with timbre) among the qualities that establish its essential character and sound. Rhythmic qualities often manifest within a groove’s tight patterning and/or the rhythm section’s more flexible layers. These slightly different concepts each establish a textural accompaniment containing percussion’s rhythmic and timbral pulsation, harmonic/rhythmic materials from guitar or keyboards, melodic/rhythmic patterns by a bass, and perhaps other contributors such as winds. In grooves of tight patterning, even the voice might participate, as Anne Danielsen (2006, 108) identified in James Brown’s productions from the late 1970s: “[T]he lyrics acted first of all as part of the groove and/or as comments on the qualities of the groove.” Both the contrapuntal layering and the tight patterning may take many forms, and each form contributes to delineating one genre and style from others. Electronic music contexts generate grooves that differ from those that are performed, though the sources themselves need not be the factor that determines microtiming and microrhythmic deviations and patterning, and interplay of parts; rather than a dichotomy between two types, though, there is a continuum of groove types with more than two poles of variables. Indeed, not all grooves emphasize microtemporal deviations from the pulse; some grooves (especially in electronica-related styles) are generated with all rhythmic events aligned with the metric grid, with the square feel of quantized evenness and an exaggerated tempo (Danielsen 2010a, 2). What is important here is the sense of approaching grooves as rhythm+pitch+timbre+dynamics textures (perhaps performed by many individuals, though perhaps not) in a way that might be applied to popular music styles that have yet to emerge.

The groove is situated in the middle dimension, at a level higher than individual sound sources where sources interact. While multilayered with timbres of different instruments—typically drum kit, bass, keyboard or guitar, though this can expand or contract—it functions as a single musical statement, and is appropriately observed as such. While each instrument contributes something substantive, the groove is a composite of rhythmic activity, timbral qualities, texture, dynamics and energy. The groove engages an interaction of musical elements as well as an interaction of instruments (and as we will discover, it can also engage recording’s elements). Its complexity can vary widely between types of music, and can even shift within the track; while we immediately grasp the groove as rhythm-related, the interactions of dynamics, timbre and pitch-register also contribute and make the cross-relationships (and cross-rhythms) of elements significant to its overall quality (Danielsen 2006, 50–52; 2010a, 10). This section will seek to recognize the rhythms of the groove instruments, along with some approaches to getting it on paper as appropriate. Observing the rhythm section’s groove can bring one to engage great nuance in the performances and materials, as well as broader gestures of rhythm, motion and energy. Collecting information can take a variety of forms.

Richard Middleton (2000a, 105–112) underscores the musical ‘gesture’ in relation to the ‘performance’ and the experience of somatic movement. The starting point for his ‘gestural modeling’ of songs and song-types is the rhythmic ‘groove’; interactions of drum kit, bass line and other instruments (perhaps guitars, keyboard and horns) produce a gestural center. The rhythmic groove represents a ‘given’ around which many popular songs are oriented, and which varies between types of music. He offers a graphic of the gestures within the groove in Madonna’s “Where’s the Party” (1986). This concept of gestural modeling and graphic notation illustrates the various layers of texture with shapes depicting motion of elements or materials. Middleton defines the groove by identifying the rhythms and characters of its parts (instrumental lines) and their interactions; some materials are transcribed, but much is left undefined, save for general graphic shapes extending over time. Analysts may find value in this general data collection, especially for initial observations of some materials—inasmuch as this might align with the track and the goals of the analysis.

The groove, however, often calls for careful examination to understand its energy and rhythms. Anne Danielsen (2006, 100–103) has examined funk grooves in substantial depth, and demonstrated her findings with notated examples capturing much nuance. Her analysis of the two grooves in James Brown’s “Funky President” (1974) provides clear examples of its rhythmic relationships and patterns, cross-rhythmic tendencies and pick-up gestures; as an example, the song’s underlying feel of four against the first three beats of the bar within its first groove is clearly evident in notation that illuminates the rhythmic complexities upon listening. This approach is in stark contrast to the graphics of Middleton’s gestures; it will be appropriate to some tracks, and others perhaps not as much.10 Danielsen (2006, chapters 37) substantively embraces the concept of rhythmic gesture, but differently. Danielsen (2010a, 6):

An important observation of ‘gesture’ is offered by Danielsen (2006, 47):

Danielsen (2010a, 10) further defines gesture: “Whereas a stylistic figure is no more than a preliminary condition for musical performance, the gesture is the music as performed for someone.”

The notion of gesture can be explored and represented in great detail. Gesture is inherently rhythmic, though this rhythm may manifest in any element—as with the timbre of James Brown’s shouts. Gestures are also shapes—as with Madonna’s harmonic gestures creating shapes of repeating chord-sequences (Middleton 2000a, 112). Gesture may manifest in a more general temporal, dynamic, timbral, registral shape; it may also have great precision in its material, or subtlety of expression.

Peter Winkler (1997, 180–186) brings attention to the groove’s subtle shadings of the subdivisions of the beat and the shadings of different beats within the measure, along with the often-found backbeat—the accentuation of metrically unstressed beats. He identifies the rhythmic pushing and pulling of the beat by the rhythm section resulting from the bass riff patterns, and the microrhythmic deviations from a ‘metronomically exact center.’ With the use of an automatic transcription device11 he is able to calculate deviations with great precision, revealing unexpected differences in the durations of beats and points of attack—though Western notation cannot capture these essential traits. Upon studying the lead vocal, Winkler makes some notational suggestions that will be presented below.
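The idea of measuring deviations from a ‘metronomically exact center’ can be illustrated with a small calculation. The following is a minimal sketch, not a reconstruction of Winkler’s procedure: it assumes a fixed tempo, a known time for the first downbeat, and a list of onset times supplied by the analyst (all values here are hypothetical). The function name `grid_deviations` and its parameters are introduced only for this illustration.

```python
def grid_deviations(onsets_s, tempo_bpm, subdivisions_per_beat=4, first_beat_s=0.0):
    """Compare each onset with the nearest point on an idealized metric grid.

    Assumes a constant tempo, which many performed tracks will not have.
    Returns a list of (grid_index, deviation_ms) pairs; a positive deviation
    means the onset arrives after the grid point, a negative one before it.
    """
    step_s = 60.0 / tempo_bpm / subdivisions_per_beat  # duration of one grid unit
    results = []
    for onset in onsets_s:
        index = round((onset - first_beat_s) / step_s)   # nearest grid point
        grid_time = first_beat_s + index * step_s
        deviation_ms = round((onset - grid_time) * 1000.0, 1)
        results.append((index, deviation_ms))
    return results


# Hypothetical onsets (in seconds) measured against a sixteenth-note grid at 120 bpm.
onsets = [0.000, 0.135, 0.240, 0.375, 0.510]
deviations = grid_deviations(onsets, tempo_bpm=120)
for index, dev in deviations:
    print(f"grid point {index}: {dev:+.1f} ms")
```

Where the tempo is elastic, a single grid of this kind will misrepresent the performance; the grid would need to be redrawn from the observed beat locations before deviations are calculated.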

Making observations of the groove pulls one into recognizing activities of the drums and the bass, perhaps with guitar(s), keyboard(s), horn section, or more. Activities of rhythms, placements of the beats against the metric grid, subdivisions of beats, dynamic accents and timbral changes, are all part of the groove. Further, not all instruments may be involved in approaching microrhythms in the same way (or at all). Minute rhythmic tensions may emerge among instruments and parts, and between them; the subtle complexities of grooves can be quite intricate. Transcribing the groove nearly always brings percussion into notation and some significant rhythmic complexities are often imbedded.12

Figure 5.6 Basic groove in “Cold Sweat” (1967) by James Brown (standard pattern indicated in circles). Figure from Presence and Pleasure: The Funk Grooves of James Brown and Parliament © 2006 by Anne Danielsen. Published by Wesleyan University Press. Used by permission.

Figure 5.6 is from Anne Danielsen’s Presence and Pleasure (2006, 74–76), and shows the groove from James Brown’s “Cold Sweat” (1967). In the transcription the standard pattern of the groove is circled, though its elaborating elements are intricately complex. The drumming is woven together with the rest of the band (it does not appear as an independent layer), and presents accents, syncopations and performance techniques that bring out the sixteenth note as the smallest division present. Horns play to a different density referent (the pulse of melodic and rhythmic shaping as the shortest possible duration) of the quarter note, guitar to eighth notes, bass mostly to eighth notes, and saxophone and drums to sixteenth notes. The track’s phrasing and subdivisions imply different durations that do not fit, and provide its funky style.

The goal of a groove transcription might be to recognize and capture the groove pattern (or some semblance of it), which typically repeats for significant portions of the song. Alterations of the groove might occur, and those can be noted by how the groove has been modified, and where on the timeline modifications occurred. In some tracks a second groove can be present, and its traits might be observed likewise; “While My Guitar Gently Weeps” (1968) is an example of a track with grooves alternating between song sections. To hear inside the groove, some transcription can be helpful; how much is needed, or if it is needed at all, cannot be anticipated until one is engaging the record itself. Much subtlety of microrhythm is common, and is integral to the groove. While these microrhythms define the groove’s character and energy, they often elude clear and accurate notation.

Western notation of rhythm is cumbersome for small fractions of the beat; for many reasons groove rhythms can get extraordinarily difficult to represent. David Brackett (2000, 63) has incorporated arrows in his notation to show that, in performance, a note arrived slightly before or after the metric beat; such imprecision might be adequate for some analyses. Observations of the nuance of performance will follow, within our examination of performing lyrics. Anne Danielsen (2010a, 10) articulates this fundamental connection between performance and the experience of rhythm, which defies our notational resources:

Embracing complex rhythmic lines as gestures, and engaging suitable notation and data collection, may allow multidimensional interrelationships of subtle detail to be recognized without being obscured by traditional notation devices.

Table 5.2 summarizes some general approaches toward collecting observations of music’s elements in a manner that limits personal bias, and seeks a neutral, objective and non-personalized manner of data collection that is promoted throughout this section (and which we acknowledge is not humanly possible).

Table 5.2 Summary table of general approaches to approximate neutral, non-personalized, objective, unbiased observations for music elements.

Element | Approaches to Observation
Sound Sources (Listing of Timbres) | Identify and name prominent instruments and voices; groupings of instruments; individual sources within groups; define musical parts by instruments/voices
Structure | Tempo and meter; identify primary song sections; PTU and section groupings, and section subdivisions might emerge
Melody | Gestures: contours and shapes against meter; transcribed melodic lines, riffs, motives, bass lines, and vocal lines
Harmony | Chord names, rhythmic placement and durations of chords; harmonic gestures (sequences of chords); harmonic themes and characteristic progressions; chord voicings
Arrangement and Texture | Identification of surface-level materials, and sources presenting them; pitch density of sources and their materials, placement of sources in registers, denote overlapping ranges; gestures of materials plus timbres; identification of texture type
Dynamics | Gestures: dynamic shapes, contours; terraced dynamic changes; dynamic relationships of sources
Characteristics of Timbre and Performance Intensity | Non-personalized observations or impressions of expression, affect, intensity and energy for, and any cultural meanings of individual sounds, sources, gestures, parts, lines, etc.
Rhythm | Gestures: speed and contours of any element against time; transcribed rhythmic patterns, cells, riffs; groove: gestures and/or transcription of individual parts/sources, and cross rhythms generated by interactions


Lyrics observations fall into two broad categories. First is the literary text itself, its characteristics of structure, use of language, and ideas. The second is lyrics as they are performed in the record; the vocal line’s presentation of the lyrics is a complex fusion of speech and singing, melody and sound qualities, musical expression and dramatic representation, voiced utterance and paralanguage, and more. The types and qualities of relationships between the vocal and the music are significant in the record, and are seen in the alignments of their phrasings and in the level of interplay of their materials.

Literary Analysis of Lyrics

Data collected on the lyrics is used to understand its content, ultimately as it relates to the vocal performance and material. Examining the lyrics in printed form, their most apparent structural dimensions are laid out before the reader. Structurally, stanzas are in the upper middle dimension, and individual lines of the lyrics are at or around the basic-level in the middle dimension.

The observation process identifies the number and ordering of stanzas, and some fundamental characteristics of each. Stanza structures contain a hierarchy of number of lines, line lengths (number of syllables within each line), rhyme schemes (of various sorts) between lines within stanzas, and any other devices. Patterning of stanzas (as in verse – verse – chorus successions) and stanza groupings (as in verse+chorus pairings) will emerge. When the lyrics are read internally and aloud (as one would read poetry), rhyme types and devices such as alliteration will be revealed, as well as rhythms of line lengths and internal feet patterns (inherent rhythms and meters within lines), and more. These aspects will become objects of attention and of careful examination at the ‘evaluations’ stage, where their interrelationships with the structural strata of musical setting become of interest.
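For analysts who keep observations in digital form, the stanza characteristics just described can be recorded in a simple structure. The following is a minimal sketch under stated assumptions: the syllable counts and rhyme labels are entered by the analyst after reading the lines aloud (nothing is detected automatically), the lines are placeholders rather than lyrics from any track, and the names `Line`, `rhyme_scheme` and `stanza_summary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    syllables: int     # counted by the analyst while reading aloud
    rhyme_sound: str   # analyst's own label for the line-ending sound


def rhyme_scheme(lines):
    """Assign letters (A, B, C, ...) to line endings in order of first appearance."""
    letters = {}
    scheme = []
    for line in lines:
        if line.rhyme_sound not in letters:
            letters[line.rhyme_sound] = chr(ord("A") + len(letters))
        scheme.append(letters[line.rhyme_sound])
    return "".join(scheme)


def stanza_summary(name, lines):
    """Collect the basic structural observations for one stanza."""
    return {
        "stanza": name,
        "line_count": len(lines),
        "syllables_per_line": [line.syllables for line in lines],
        "rhyme_scheme": rhyme_scheme(lines),
    }


# Placeholder stanza; the text is invented for illustration only.
verse = [
    Line("placeholder line one ending in day", 9, "ay"),
    Line("placeholder line two ending in night", 9, "ight"),
    Line("placeholder line three ending in stay", 9, "ay"),
    Line("placeholder line four ending in light", 9, "ight"),
]
summary = stanza_summary("Verse 1", verse)
print(summary)
```

Keeping the rhyme label as the analyst’s own annotation leaves room for near rhymes and internal rhymes, which a spelling-based comparison would miss.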

Structural qualities of both the poetic and prose contexts may emerge from this process. Observation can then shift to the content of lyrics. The content of lyrics and their structural design, their meter and prosody, all contribute jointly and individually to the flow of motion and the generation of shape—these observations might ultimately lead evaluations to identify ‘how’ these are accomplished.

Lyrics often have “formed the basis for the interpretation of popular songs” (Brackett 2000, 192). Observations on the content of the lyrics might seek to identify the topics in Table 5.3. This table might establish a rudimentary start to exploring lyrics’ content; it is by no means exhaustive.

Table 5.3 Basic observations of the story.

Theme and subject | Story
Narrator | Protagonist
Narrative vantage point (first-person, etc.) | Audience and/or narratee
Temporal setting | Language style, dialect
Tone | Voice and persona
Location, place |

The individual traits of the lyrics will bring unique twists in exploring these matters. It seems too apparent to state (and too important to ignore): the content of song lyrics is deeply original to each track, fundamental to its communication and message, and immensely varied. Following are brief summaries to provide a bit of illustration. Collecting information for observation and considering it in evaluation blend into a single activity most distinctly in lyrics; bearing in mind when one is holding objectivity, and when one is forming ideas, may allow the process to evolve most efficiently. Certainly in interpreting lyrics, one may be quick to assume a position of personal experience, rather than to formulate an objective reading of the intent of the artist or of a more universal meaning—and some might propose the interpretation of the listener (analyst) might be more relevant, more pertinent, or otherwise more valuable.13 In this writing we will continue to seek more universal (or culturally shared) interpretations with direction towards some commonality within recording analysis; the following provide a few examples.

Lloyd Whitesell (2008, 78–116) dedicates an entire chapter to thematic ideas in the songs of Joni Mitchell. His observations are refined into thematic threads running throughout her career’s work. The threads include the themes of ‘traps,’ ‘quests,’ ‘talent,’ ‘flight,’ and ‘bohemia’—quite removed from the most common themes of popular song. Whitesell focuses “on ‘voice’ in its literary-technical sense, indicating the vivid fictional characters and implied speaking presence in Mitchell’s poetry” (ibid., 42) to more deeply explore the illusions of fictional voices and the persona types characteristic of her work (ibid., 41-77).14

Returning to Elvis Costello’s “Pills and Soap,” David Brackett (2000, 192–194) explores its lyrics and extracts Biblical allusions and clichés from nursery rhymes, recurring figures in various guises (accumulating meaning) and the effects of an intricate use of pronouns. He notes the lyrics “do not convey a straightforward narrative, . . . They express neither loss nor desire in any overt way. They do fall loosely into what might be termed the ‘social criticism’ genre” (ibid., 192). His complete analysis presents an interpretation of the lyrics, with supporting evidence from the musical setting.

Brackett also examines the lyrics of Hank Williams’ “Hey, Good Lookin’” (1951); it addresses some different concerns that will transition us to the next section. This analysis includes aspects of the performance of the lyrics; it notes the sustained syllables in each line and their relevance, and the differences and interactions of durational and sonic stress on words within the verses. The stresses change meaning and mood. He offers: “[T]he performance of the lyrics . . . affects their meaning. We should first note how the performance stresses certain words through duration and sonic (rhyme) so that they reinforce the rough formal outline . . .” (ibid., 82). Brackett’s detailed analysis extends through analyzing the lyrics’ content, to situating them culturally, and more (ibid., 75-107).

Qualities of Performed Lyrics and the Vocal Performance

Performance situates lyrics within the time passage of the track. The lyrics’ lines occupy sung phrases. These phrases represent ‘verbal spaces’ (Griffiths 2003)—the spaces of time occupied by lines of lyrics—and also by the word spacing (rhythmic density and distribution) of lyrics within lines and (thereby) within stanzas (Griffiths 2013). These are convenient concepts, allowing the words to be mapped out against the song’s timeline, as in Figure 5.7. The phrases formed by each line are articulated on the timeline, showing beginning and ending points. Lines of text can be displayed, as well as the general rhythm of the line; this allows the syllabic density of the text to be visible. Proportional relationships between lines, words and stanzas can be made apparent, allowing one to evaluate their significance and contributions. Should the track warrant, other qualities of the text might be incorporated into the graph as well; some of these qualities might be rhymes, repeated words, emphasized (stressed) words, rhyme scheme, number of syllables per line, or significant word sounds, etc. The graph might readily accommodate expression characteristics and non-language sounds as well as other characteristics of vocal style.
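The mapping of lines against a timeline lends itself to a small tabulation. The sketch below is offered as one possible way of organizing such data, not as Griffiths’s method: it assumes the analyst has already located each sung line’s beginning and ending points in beats and counted its syllables, and the values shown are hypothetical rather than taken from Figure 5.7. The function name `verbal_space` is introduced only for this illustration.

```python
def verbal_space(lines):
    """Summarize the time span and syllabic density of each sung line.

    `lines` holds (label, start_beat, end_beat, syllables) tuples entered
    by the analyst; beats are counted from the start of the section.
    """
    rows = []
    for label, start_beat, end_beat, syllables in lines:
        span = end_beat - start_beat
        rows.append({
            "line": label,
            "start": start_beat,
            "span_beats": span,
            "syllables_per_beat": round(syllables / span, 2),
        })
    return rows


# Hypothetical observations for three lines of a verse.
observed = [
    ("line 1", 0, 8, 10),
    ("line 2", 8, 14, 9),
    ("line 3", 16, 24, 6),   # two beats of rest separate this line from line 2
]
rows = verbal_space(observed)
for row in rows:
    print(row)
```

Comparing the span and density values across lines makes proportional relationships visible; the gap between one line’s end and the next line’s start shows where the voice is silent.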

Vocal style shapes the sound qualities of the lyrics in fundamental ways. Style might be considered in the following ways:

  • Inherent qualities of the singer’s unique voice
  • Performance style idiosyncrasies of the artist
  • Use and sounds of language
  • Expression characteristics
  • Use of paralanguage and non-linguistic sounds
Figure 5.7 Select observations of lyrics incorporated into timeline, within Verse 1 and Bridge 1 of "While My Guitar Gently Weeps" (The Beatles, 1968).

All singers have voices that are unique, just as all speaking voices have unique characteristics. These inherent qualities are present within their performances, and are reflected in the timbre of their voice. The manner in which the individual articulates vocal sounds, the singer’s natural resonance, and other added qualities combine within the individual’s overall sound. The variables for observing timbral characteristics from Table 5.1 might be of assistance in defining some observations on vocal quality. Allan Moore (2001, 35) observed, “[O]ne of the problems in discussing vocal quality, indeed in discussing the timbral qualities of any instrument, is the absence of a standard, easy visual representation. Conventional notation is helpless here.” This approach to describing timbre can provide meaningful information on timbres and vocal qualities, but it has limitations; it is not fast and simple to create and does not blend into conventional notation. In listening to vocal style and timbre, the characteristics within sounds are the focus of attention; this way of listening is seeking information underneath the basic-level of the identity of sounds. This way of listening holds for all observations related to timbre, vocal style, and even performance techniques and transcription of the subtleties of vocal performance. The ‘character’ portrayed by a vocal sound or a vocal line might be approached as above within the discussion of timbre in music. As above, subjective connotations might be used and found useful in communicating about the ‘character’ of a sound to a broad readership—if the description is crafted with attention to culturally valid observations, and an awareness that the personal typically communicates about one’s own experiences and not often the experiences of others.

Artists also have vocal mannerisms; these are performance style idiosyncrasies that contribute to and often largely define the ‘sound’ of artists. These mannerisms can be related to expression, enunciation or non-linguistic sounds, as much as related to how the individual shapes vowel sounds, uses vibrato or their attitude toward intonation. Idiosyncrasies may be related to performance style such as characteristic pitch ornamentation or endings of lines, as much as the grunts, screams or breathing that define the performances of some singers.

Individuals have accents and dialects that shape their language sounds. This may influence their singing style, though it need not; it is likely to be readily apparent in rap, and speech-singing of any type. Sounds of language, often reflected in specific words, might be unique to the artist and therefore the track. In observations, sound qualities might be defined using the IPA symbols; they may then be noted and related within the form of the lyrics above.

Qualities of expression and those of nonverbal sounds are intricately woven into performances. Observations of these qualities, as well as other qualities of vocal style, might benefit from an organized approach to assembling the qualities (or variables) that are relevant to the performance and the values (or characteristics) that are present within the performance.

Table 5.4 represents a summary of aspects of vocal sound that might be used to assemble a typology of the vocal style. This typology would be unique to the individual singer, and to that singer’s performance on the record; this table represents a point of departure to be adapted, not a definitive listing. Such a typology might function for the entire track with large dimension observations, or middle dimension observations that remain unchanged. A typology might also be assembled for sections that vary. It should be apparent some elements might change frequently within lines, and demand a more detailed assembly of characteristics; singers commonly shape affects and shift emotions within phrases, sometimes even within words.

An example of the significance of vocal style and expression techniques has been provided by Serge Lacasse (2010a) in “The Phonographic Voice: Paralinguistic Features and Phonographic Staging in Popular Music Singing.” In it he examines Tori Amos’ 2001 version of Eminem’s “97 Bonnie and Clyde.” Lacasse examines the many paralinguistic features used in Amos’ performance and interpretation. Incorporated into the essence of the performance are breathing effects, whispered and murmured voice (acting as both qualifiers and differentiators) as well as several alternants—lips moistening, swallowing, sighing—all with much subtlety. The paralinguistic features’ sonic qualities and contributions to the performance and to the track are made apparent; illustrations of the paralinguistic features allow the reader to clearly recognize their significance and roles in the track’s artistry and communication. Lacasse incorporates spectrograms into his observations in order to identify the frequency content of these sounds, and to supplement aural interpretation—noting the absence of any system of notation for paralinguistic features. Observations of paralinguistic features might be noted within the vocal line, incorporated within the timeline as alternants (stand-alone sounds), or collected as part of the qualities of lyrics. As evidenced here, exploring the effects and the timbral qualities of paralanguage features may provide significant insight into the sounds and expression of some tracks.

Table 5.4 Typology table for defining vocal style.

 Variable Values or Characteristics 
 Range of voice 
 Register(s) of voice 
 Degree of resonance and placement 
 Degree of diaphragm support 
 Attitude toward pitch 
  - Intonation 
  - Types of inflections 
  - Rhythmic placement of inflections 
  - Vibrato 
  - Glides and bends 
 Attitude toward rhythm 
  - Note placements relative to beat 
  - Precision of subdivisions 
  - Relationship to spoken word 
  - Rubato or free vs. polymeter 
  - Microrhythms compressing or stretching the beat 
  - Manipulation of time, pulse 
 Inherent timbral qualities 
 Inherent performance style 
 Language sounds 
  - Frequency shifting states 
  - Degree of shift 
  - Non-linguistic vocalizations 
  - Wordless expressions of feelings 
  - Body-generated sounds 
 Level of ease of the singer
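For analysts who keep observations in digital form, the typology can be treated as a simple data structure. The following is a minimal sketch under stated assumptions, not a method prescribed by this chapter: the variable names are transcribed from Table 5.4, while the `Typology` class, its `record` and `missing` methods, and the idea of keying entries by a "scope" (whole track, a section, a single line) are hypothetical conveniences introduced here for illustration.

```python
from collections import defaultdict

# Variables transcribed from Table 5.4; indented rows become sub-variables.
VARIABLES = {
    "Range of voice": [],
    "Register(s) of voice": [],
    "Degree of resonance and placement": [],
    "Degree of diaphragm support": [],
    "Attitude toward pitch": [
        "Intonation", "Types of inflections",
        "Rhythmic placement of inflections", "Vibrato", "Glides and bends",
    ],
    "Attitude toward rhythm": [
        "Note placements relative to beat", "Precision of subdivisions",
        "Relationship to spoken word", "Rubato or free vs. polymeter",
        "Microrhythms compressing or stretching the beat",
        "Manipulation of time, pulse",
    ],
    "Inherent timbral qualities": [],
    "Inherent performance style": [],
    "Language sounds": [
        "Frequency shifting states", "Degree of shift",
        "Non-linguistic vocalizations", "Wordless expressions of feelings",
        "Body-generated sounds",
    ],
    "Level of ease of the singer": [],
}


def flat_variables():
    """Return every variable name, with sub-variables as 'parent / child'."""
    names = []
    for parent, children in VARIABLES.items():
        names.append(parent)
        names.extend(f"{parent} / {child}" for child in children)
    return names


class Typology:
    """Observations keyed by scope (hypothetical: track, section, or line)."""

    def __init__(self):
        self.entries = defaultdict(dict)  # scope -> {variable: value}

    def record(self, scope, variable, value):
        """Store one observed value; reject names that are not in the table."""
        if variable not in flat_variables():
            raise KeyError(f"not a Table 5.4 variable: {variable}")
        self.entries[scope][variable] = value

    def missing(self, scope):
        """Variables not yet observed for the given scope."""
        return [v for v in flat_variables() if v not in self.entries[scope]]
```

A call such as `Typology().record("Verse 1", "Attitude toward pitch / Vibrato", "...")` would store the analyst's own description for that section, and `missing("Verse 1")` would list what remains unobserved. Because the table is a point of departure rather than a definitive listing, the `VARIABLES` mapping is meant to be edited to suit the singer and the track.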

Transcribing Vocal Lines and Performed Lyrics

Observing the qualities of vocal lines is filled with challenges, owing much to the importance of nuance in the lines. A number of approaches might be appropriate to the analysis, at one time or another. Observations to collect data might engage transcription to hold those subtle qualities still. Such a transcription might generalize pitch and rhythm, perhaps including expression and dynamics (if they are addressed at all). Other analyses may base observations on the content and sound qualities of the lyrics as presented by the vocal melody, with only cursory transcription of portions of the melody. Incorporating a typology of vocal style might define expression and performance techniques well enough to characterize a vocal performance, and so adequately supplement a transcription of strategic passages in generalized melodic notation (pitch and rhythm simplified and adjusted to conform to notational conventions). Other transcriptions may be filled with pitch, rhythmic and dynamic details, with rhythmic placements of precise phonetic qualities of the lyrics, and with speech-singing, expression affects and paralinguistic sounds.

The matters surrounding notation covered earlier illustrate how problematic vocal transcription can be; even the most accurate and detailed transcriptions will only present a close correspondence between notation and what is actually heard on a recording, and only within a limited number of elements. This is the best one can hope for when using a system of performance instructions to represent sound. Still, transcription can be useful for many purposes, and might be flexibly applied; few other options are available to make the record hold still for evaluation. The level of detail of the transcription can be part of the decision-making process of the goals of the analysis, and what information needs to be thoroughly examined to achieve those goals.

Figure 5.8 Quartertone notation demonstrating enharmonic spellings using one- and three-quartertone symbols.

Figure 5.9 An indeterminate notation for microtone and microrhythm deviations from conventional notation values, and a notation for speech-singing.

The microtonal inflections in pitch are difficult to capture in traditional music notation. Quartertones can be noticed in the techniques of many singers, as well as smaller deviations of pitch. Figure 5.8 presents a notation for quartertones, precise pitches that reside midway between adjacent half steps of the chromatic scale. Figure 5.9 contains arrows above note heads that represent a slight raising (upward arrow) or lowering (downward arrow) of pitch; these deviations are smaller than a quartertone, and their exact size is left undetermined. Together these options allow the transcription to address intonation matters with varying degrees of detail or accuracy.
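The distinction between a quartertone and a smaller, indeterminate deviation can be stated arithmetically. The sketch below is illustrative only and is not a procedure from this chapter. It assumes twelve-tone equal temperament with A4 = 440 Hz, treats a half step as 100 cents and a quartertone as 50 cents, and uses a hypothetical tolerance (`tol_cents`) to decide when a measured frequency is close enough to a chromatic pitch or a quartertone to be written as one; anything in between is labeled with an arrow in the manner of Figure 5.9. The function names are inventions for this example.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def cents_from_a4(freq_hz):
    """Distance from A4 in cents (100 cents = one equal-tempered half step)."""
    return 1200.0 * math.log2(freq_hz / A4_HZ)


def classify_inflection(freq_hz, tol_cents=10.0):
    """Return (half_steps_from_a4, label) for a measured frequency.

    tol_cents is a hypothetical tolerance chosen for illustration.
    """
    cents = cents_from_a4(freq_hz)
    quarter_steps = round(cents / 50.0)       # nearest quartertone step
    residual = cents - 50.0 * quarter_steps   # distance from that step
    if abs(residual) <= tol_cents:
        if quarter_steps % 2 == 0:
            return quarter_steps // 2, "chromatic pitch"
        # Spelled relative to the lower neighbor; the enharmonic spelling
        # (a quartertone below the upper neighbor) is equally valid.
        return (quarter_steps - 1) // 2, "quartertone above"
    half_steps = round(cents / 100.0)         # nearest chromatic pitch
    label = "arrow up" if cents > 100.0 * half_steps else "arrow down"
    return half_steps, label
```

A wider tolerance yields a more generalized transcription, and a narrower one pushes more notes into the indeterminate arrow category; the choice would follow from the goals of the analysis. Measured frequencies are themselves estimates, so any such classification supplements listening and does not replace it.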

Microrhythmic inflections are equally difficult to capture in traditional music notation. Notation of microrhythms can quickly become an unreadable mess of ties, with short duration notes and rests tied to others. The result is a complex notation that is difficult to interpret, a challenge to notate and cumbersome to read. Rhythm is extraordinarily complex in the vocal line. The rhythm might flex with speech inflection, it might assume a relaxed relationship to the beat similar to rubato, or it might compress or expand the beat in such a way as to establish a polyrhythmic relationship to other musical parts. These subtle shapings are just a few that are possible. In listening closely to singers, one becomes increasingly aware that portions of a word’s sound fall in specific places rhythmically. As examples, some words will start with a consonant preceding the beat, others are slightly late; some internal consonants might be sustained, others punctuate the rhythm by precise percussive placements; ends of words are often as important as beginnings in terms of rhythmic placement and audible articulation. The horizontal arrows above the pitches in Figure 5.9 designate that the note began slightly earlier than written (left-pointing arrow) or was slightly delayed (right-pointing arrow, with the beginning of the sound pushed after the note head); the amount of this displacement is, however, indeterminate.
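Where onset times have been measured (for instance, read from a waveform display), the early or late placement described above can be estimated against a metric grid. The following is a rough sketch under simplifying assumptions, not a method from this chapter: it assumes a constant tempo, a grid that begins at 0.0 seconds, and a hypothetical threshold (`tol_ms`) below which a displacement is treated as "as written." Recorded performances rarely hold a perfectly constant tempo, so the grid would in practice need to follow the track's actual beat.

```python
def onset_displacement(onset_s, tempo_bpm, subdivisions_per_beat=4, tol_ms=15.0):
    """Compare a measured onset with the nearest grid position.

    Assumes constant tempo and a grid starting at 0.0 s.
    tol_ms is a hypothetical threshold chosen for illustration.
    Returns (grid_index, offset_ms, label).
    """
    grid_s = 60.0 / tempo_bpm / subdivisions_per_beat  # seconds per grid step
    grid_index = round(onset_s / grid_s)               # nearest grid position
    offset_ms = (onset_s - grid_index * grid_s) * 1000.0
    if abs(offset_ms) <= tol_ms:
        label = "as written"
    elif offset_ms < 0:
        label = "early (left arrow)"
    else:
        label = "late (right arrow)"
    return grid_index, offset_ms, label
```

With the default of four subdivisions per beat, the grid corresponds to sixteenth notes when the quarter note carries the beat. The returned offset is kept alongside the label so that the analyst can decide whether the displacement stays indeterminate in the transcription or is described more precisely in the accompanying text.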

Rhythms of and within words and lyrics are part of the vocal line. Syllables and words may be contained within the melody as clearly articulated syllables on each note. Words are often much more nuanced in performance, and vocal style brings much attention to the timbres of syllables, and in the ways syllables morph into others within words. Drama can play out within words, as lyrics unfold and are interpreted by the singer. The rhythm of the vocal might reflect any of these, in addition to any rhythmic placements generated by paralanguage sounds, nonsense syllables and other singer-produced sounds. All of these might be of concern in collecting observations.

Writing observations of the text can include unusual divisions of words, denote prolonged vowel sounds, or the placement or sustaining of consonants; unusual word usage may bring important qualities to the track. Recognizing extremes, some vocal styles emphasize or enunciate articulation and endings of words, while some blur language by slurring words; these same effects might appear as temporary vocal techniques to support the context of the track. The sound qualities of word usage within the text therefore might warrant observation. IPA symbols can aid in identifying, notating and evaluating some language and non-linguistic sounds; others might be best recorded as letters or in some letter combinations denoting a sound (“zzzzzzk”). Some of these observations might be clearly added to a timeline (as Figure 5.7), others might be more appropriately added within a transcription of the vocal line (as Figure 5.10).

Timbral qualities are central to vocal quality, to language and to performance’s shaping of the vocal line; the timbral qualities of the voice and the timbral qualities of words and syllables may blend into unique sounds. Any of these may generate an observation central to the track, and thus be worthy of close examination.

Here, again, it is important to acknowledge and separate the ‘character’ of timbre from its ‘content.’ Describing character, as above, may be pertinent and appropriate to the track or sound source, though the content of the sound is often the actual object of discussion. Timbre’s content is difficult to observe (in order to collect data, describe and explain) and requires deep listening. Further, we have few tools to collect this information, and all of them are quite complex, as they must be to address the complexities of timbre itself. Listening to the inner workings of timbre and engaging approaches to observe and evaluate timbre are outside the prior experience of most analysts. Depending on the goals of the analysis and the skill set of the analyst, any one or a combination of the following options might be pursued:

  • The process of observing the characteristics of timbre from Chapter 7 and/or Table 5.1
  • Using one or more of the fields in the typology listing of Table 5.4
  • Incorporating IPA symbols into a description (see Chapter 4)

Finally, performances of popular songs can shift between singing and speaking; sometimes the singer is neither singing nor speaking. Speech-singing allows the spoken voice to have pitch qualities of singing, while maintaining some quality of spoken word. Figure 5.9 also includes an accepted notation for speech-singing that is similar to standard pitch/rhythm notation but substitutes an ‘x’ for standard note heads.

Figure 5.10 The line “Nothing is real” from “Strawberry Fields Forever” (The Beatles, 1967) as it appears in each chorus.

Figure 5.10 illustrates the four performances of the line “Nothing is real” from “Strawberry Fields Forever.” Within this figure many of the topics from previous pages are applied. Notice the complexities of the rhythms, the use of arrows to designate indeterminate raising of pitch, the quartertone intonation (providing expression, interpretation and motion), the use of IPA and other approaches to word sounds, and speech-singing. Above all, note how each is unique; for instance, the word ‘real’ appears in front of the beat once, and a sixteenth behind the downbeat the three other times, and its intonation in each instance is slightly different. John Lennon’s performance of the word ‘nothing’ varies very little over the four presentations of the line.

Observing the Relationship of Vocal and Lyrics to the Musical Fabric

The opposition and interrelationships between the vocalist and the other performers (and the musical accompaniments they generate) can represent a central concept in the track. Observations pull out information on how the vocal line—including the persona of the singer, the content of the lyrics, and the performance of the line—is situated in the song relative to the musical accompaniment (and all that it contains).

Data on this interrelationship has been collected through explorations of the musical materials and their sound sources, and the arrangement. That initial data collection will be revisited at the evaluation stage to explore this relationship, seeking to reveal more nuance of some materials or adding other sources to those explored. These evaluations are only possible with more directed observations that seek to understand the relationship of the accompaniment and the lead vocal.

The five states of accompaniment discussed in Chapter 3 will provide a point of reference to these observations, and warrant review. With the vocalist in the primary role of the song, the accompaniment can take on the following functions:

  1. Contextual function with supportive traits
  2. Mostly supportive function with some contextual and ornamental traits
  3. Function clearly supports the vocal narrative
  4. Primary function with support traits
  5. Primary function

These functions can change at any time, though changes typically happen at section changes. The function of the accompaniment might differ from introduction to verse to chorus to middle-eight to coda; it may also establish a grounding and reliable context for the vocal that stretches throughout the track. Alternately, any backing material may emerge as significant, and assume the primary role for moments or for sections of the track; it may enter into dialog with the track or contrast with it. Options for interaction are innumerable. David Brackett (2000, 92; emphasis in original) engages this relationship as well as the quality of the vocal performance when he observes an “important element in Williams’ singing is the rhythm of his vocal line—the way it either emphasizes the underlying pulse of the band or strains against it.”

This interchange is often evident within or supported by relationships of the phrasing of the vocal line against the accompaniment. The alignment of the vocal’s verbal space with the accompaniment’s phrasing can be significant, as can the prominence of the accompaniment’s phrases in drawing attention away from the voice. The strength of the prevailing time unit and hypermetric units can add to or limit the strength of phrases within the accompaniment and its layers or parts. Observing these interrelationships can contribute to collecting pertinent information of the functions of accompaniments, and of the characteristics of how the vocal is situated within the musical fabric. There is no one approach to observations that can establish the groundwork for these evaluations. Each track’s accompaniment establishes a more-or-less unique relationship to the lyrics and the expression of the vocal; the five functions of accompaniment merely serve as guides for the data collection of observations that might ultimately reveal this interrelationship.


Observations for lyrics and music can rely heavily on traditional approaches to music analysis, in the steps that precede determining function and organization. The processes of evaluating observations toward discovering the conclusions of how the song works will appear in Chapter 10. Our focus will now shift to the recording, and away from coverage of music and lyrics—though they will not be forgotten.

The chapters that immediately follow cover the elements of recording, and the unique challenges found in engaging them fully. Collecting observations for the elements of recordings will be woven throughout those chapters. The information on recording’s elements is collected through creating X-Y graphs, sound location diagrams and/or typology tables—observations take place in unique ways for each of the elements.
