Chapter 11. Using Paths

path elements are SVG’s answer to drawing irregular forms. Anything that’s not a rect, circle, or another simple shape can be drawn as a path. The catch is that the syntax for defining path values is not particularly human-friendly. For example, here is a line that we’ll generate from data in this chapter. Note the path syntax, as specified in the element’s d attribute and shown in Figure 11-1.

dvw2 1101
Figure 11-1. A path and its d attribute

If you can read that, then you don’t need this book.

Fortunately, D3 has lots of built-in functions that generate paths for you. You’ve already met the axis functions, which express scales as path, line, and text elements. In later chapters, you’ll learn about d3.arc() and d3.geoPath(), both of which also generate paths for different purposes. In this chapter, we’ll cover two other common uses of paths: drawing line and area charts.

Line Charts

Let’s start with a simple line chart. Actually generating the line is quite simple, but we need some data in place first. For this chapter, I’m going to use a real-world dataset.

Data Preparation

Line charts are great for time series, so I’ve decided to chart carbon dioxide measurements over time. I’ve downloaded the “Mauna Loa CO2 monthly mean data”, as provided by the National Oceanic & Atmospheric Administration’s Earth System Research Laboratory. (See the README.md in this chapter’s examples for details and a full citation.)

This data includes monthly average values of CO2 in parts per million, as measured at the Mauna Loa Observatory in Hawaii. An excerpt of the raw-text file looks like this:

# CO2 expressed as a mole fraction in dry air, micromol/mol, abbreviated as ppm
#
#  (-99.99 missing data;  -1 no data for #daily means in month)
#
#            decimal     average   interpolated    trend    #days
#             date                             (season corr)
1958   3    1958.208      315.71      315.71      314.62     -1
1958   4    1958.292      317.45      317.45      315.29     -1
1958   5    1958.375      317.50      317.50      314.71     -1
1958   6    1958.458      -99.99      317.10      314.85     -1
1958   7    1958.542      315.86      315.86      314.98     -1
1958   8    1958.625      314.93      314.93      315.94     -1
1958   9    1958.708      313.20      313.20      315.91     -1
1958  10    1958.792      -99.99      312.66      315.61     -1
1958  11    1958.875      313.33      313.33      315.31     -1
1958  12    1958.958      314.67      314.67      315.61     -1
1959   1    1959.042      315.62      315.62      315.70     -1
1959   2    1959.125      316.38      316.38      315.88     -1
1959   3    1959.208      316.71      316.71      315.62     -1
1959   4    1959.292      317.72      317.72      315.56     -1
1959   5    1959.375      318.29      318.29      315.50     -1
1959   6    1959.458      318.15      318.15      315.92     -1
1959   7    1959.542      316.54      316.54      315.66     -1
1959   8    1959.625      314.80      314.80      315.81     -1
1959   9    1959.708      313.84      313.84      316.55     -1
1959  10    1959.792      313.26      313.26      316.19     -1
1959  11    1959.875      314.80      314.80      316.78     -1
1959  12    1959.958      315.58      315.58      316.52     -1
1960   1    1960.042      316.43      316.43      316.51     -1

Note that the first column indicates a year, while the second indicates a month (1–12). This version of the file contains values from March 1958 through January 2017. The fourth column, “average,” is the CO2 value we’re interested in.

You probably noticed that these values aren’t comma-separated; rather, they are provided in fixed-width columns in an otherwise unformatted text file. D3 needs a little more structure than this, so my first step is to wrangle this into CSV form, using the free, amazing tool OpenRefine.

I won’t detail the steps involved here, as there are other books and tutorials on using OpenRefine. In short, I cut out everything but the three relevant columns, and my resulting file, mauna_loa_co2_monthly_averages.csv, looks like this:

year,month,average
1958,3,315.71
1958,4,317.45
1958,5,317.5
1958,6,-99.99
1958,7,315.86
1958,8,314.93
1958,9,313.2
1958,10,-99.99
1958,11,313.33
1958,12,314.67
1959,1,315.62
1959,2,316.38
1959,3,316.71
1959,4,317.72
1959,5,318.29
1959,6,318.15
1959,7,316.54
1959,8,314.8
1959,9,313.84
1959,10,313.26
1959,11,314.8
1959,12,315.58
1960,1,316.43
…

Much better! Next, I’ll define a row conversion function, as described in Chapter 5.

var rowConverter = function(d) {
    return {
        //Make a new Date object for each year + month
        date: new Date(+d.year, (+d.month - 1)),
        //Convert from string to float
        average: parseFloat(d.average)
    };
}

This will convert the three columns of data in our CSV into just two: date and average. The date consists of a new Date(), a new JavaScript Date object, into which we are passing the year and the month. The + operator, in this context, forces the d.year and d.month values to be typed as numbers instead of strings. And because JavaScript’s month counting begins at zero (as in, 0 = January, 1 = February, and 2 = March) but the data’s month counting begins at one (1 = January), I’ve subtracted one from each month value. (See MDN’s docs on the Date object for more.)

The average value is parsed from a string to a float.

We call the row conversion function when we load in the CSV data:

//Load in data
d3.csv("mauna_loa_co2_monthly_averages.csv", rowConverter, function(data) {

    var dataset = data;

    //Print data to console as table, for verification
    console.table(dataset, ["date", "average"]);

    //…

Feast your eyes on console.table(), a fancy improvement on console.log() that prints everything in nice, pretty columns that are easier on your brain. Here I’ve (optionally) specified the names of the two columns of interest: date and average. You can verify for yourself in Figure 11-2 that the values match that from our original dataset.

dvw2 1102
Figure 11-2. Using console.table() to verify data

Isn’t it beautiful?

Scale Setup

The process of setting up our scales should be familiar by now. First, xScale will handle time, and yScale handles the CO2 values. I’ve set a zero baseline (low domain value of 0) to start.

    xScale = d3.scaleTime()
               .domain([
                    d3.min(dataset, function(d) { return d.date; }),
                    d3.max(dataset, function(d) { return d.date; })
                ])
               .range([0, w]);

    yScale = d3.scaleLinear()
               .domain([0, d3.max(dataset, function(d) { return d.average; })])
               .range([h, 0]);

Line ’em Up

With our data and scales in place, finally we get to draw a line! (It’s the small things in life…)

Start by defining a line generator function, much as we defined axis generator functions to make axes. To do this, we call d3.line(), making sure to specify x and y accessors.

    //Define line generator
    var line = d3.line()
                 .x(function(d) { return xScale(d.date); })
                 .y(function(d) { return yScale(d.average); });

The x and y accessors tell the line generator how to decide where to place each point on the line. Note that for x, we specify the scaled value of d.date, and y gets the scaled value of d.average. Later, when the line generator is called, it will look for the bound data and loop through each value, using the accessor logic here to calculate where to position each point. Drawing the line itself is really just a matter of connecting those carefully positioned dots.

For this barebones example, I then create the SVG element and finally append a new path element.

    //Create SVG element
    var svg = d3.select("body")
                .append("svg")
                .attr("width", w)
                .attr("height", h);

    //Create line
    svg.append("path")
       .datum(dataset)
       .attr("class", "line")
       .attr("d", line);

Huh? What happened to your old friend, the selectAll/data/enter/append pattern?

Until now, our examples involved one graphical mark corresponding to one data value (or one “row” of related values), as in our bar charts and scatterplots. A line chart, however, calls for a single graphical mark that represents many data values. We can skip the selectAll/data/enter/append pattern, because we already know how many new marks we need: just one.

Instead of using data() to bind each value in our dataset array to a different element, we use datum(), the method for binding a single data value to a single element. The entire dataset array is bound to the new path we just created.

Following that, we assign a class of line (to enable easy CSS selection and styling) and then—finally—set a d attribute, passing in our line generator function as an argument. Since the data has already been bound to the path, the line generator simply grabs that data, plots the points as we specified, and draws a line connecting them. This produces the output we saw in Figure 11-1, at the start of this chapter. The final code is in 01_line_chart.html.

To verify how the entire dataset was bound to one element, try selecting that line in the console using d3.select(".line"). Note the complete, 707-value-long array that appears under __data__ in Figure 11-3.

dvw2 1103
Figure 11-3. Verifying the data array bound to a single path element

Dealing with Missing Data

In 02_line_chart_axes.html, I’ve added some padding on the left and bottom edges plus axes, as shown in Figure 11-4.

dvw2 1104
Figure 11-4. Added padding and axes

Now, with the axes labeled, it’s time for a sanity check: could CO2 levels really have dropped below zero, as those downward spikes seem to indicate? (Are negative CO2 ppm values even possible? Answer: no.)

This spiky anomaly is explained by a note in the original datafile:

#  (-99.99 missing data;  -1 no data for #daily means in month)

Ah, so carbon-measuring machines (and even scientists themselves) are not infallible! A –99.99 value is not a true measurement, but stands in for “no data available for that month.”

We could manually remove those –99.99 values from our dataset. Or we could leave the data untouched, and use the line generator’s defined() method to determine, on the fly, whether or not each individual value is defined (or valid). defined() is just another configuration method, like x() and y(). If the result of its anonymous function is true, then that data value is included. If not, the value is excluded.

This simple use of defined() checks merely to see if any value exists at all:

.defined(function(d) { return d; })

So, if d exists, this will be interpreted as true.

But in our case, the –99.99 values do exist; it’s just that we humans know they aren’t valid measurements. To exclude them, we could use a comparison operator, which will return a logical result (true or false):

//Define line generator
var line = d3.line()
             .defined(function(d) { return d.average >= 0; })
             .x(function(d) { return xScale(d.date); })
             .y(function(d) { return yScale(d.average); });

For every row in the dataset, if d.average is greater than or equal to zero, this will return true, and the value will be included. All negative values, including –99.99, will be thrown out.

See 03_line_chart_missing.html for that final code, which renders the chart shown in Figure 11-5.

dvw2 1105
Figure 11-5. Line chart, invalid CO2 values excluded

Note the veeeeery tiny gaps, where there used to be (graphically dishonest) spikes.

Actually, this would be a nice time to show off SVG’s fundamentally scalable nature. Use your browser’s zoom functionality to zoom in to this chart as much as possible. Note that the entire chart—consisting of vector SVG elements—remains as sharp and clear as ever. Figure 11-6 shows our chart at 500% zoom, where the gaps are clearly visible.

dvw2 1106
Figure 11-6. Zoomed in to reveal the gaps

Refining the Visuals

In 04_line_chart_adjusted.html, I’ve adjusted the yScale domain to emphasize the upward trajectory, so there is no longer a zero baseline. I’ve also manually added a red line corresponding to 350 ppm, considered the maximum “safe” level of atmospheric CO2 by many scientists, and used the stroke-dasharray property to make it a dashed line. See Figure 11-7.

dvw2 1107
Figure 11-7. Adjusted yScale and new red line

At this point, we are dealing in visual rhetoric—design decisions that express a point of view by influencing how others interpret our chart. (For great examples of climate change–related visual rhetoric, see Duarte Design’s graphics and animations, as used by Al Gore in the Academy Award–winning An Inconvenient Truth.)

Having acknowledged that, let’s say I want to emphasize the portions of the line that are above the 350 ppm line. I could approach that by simply creating a second data-driven line. I could use defined() to include only data points of 350 or greater for the new red line, and to exclude such points for the original teal line:

//Define line generators
line = d3.line()
            .defined(function(d) { return d.average >= 0 && d.average <= 350; })
            .x(function(d) { return xScale(d.date); })
            .y(function(d) { return yScale(d.average); });

dangerLine = d3.line()
            .defined(function(d) { return d.average >= 350; })
            .x(function(d) { return xScale(d.date); })
            .y(function(d) { return yScale(d.average); });

That, plus some styling changes for each line and a new text label, produces what you see in 05_line_chart_labeled.html and Figure 11-8.

dvw2 1108
Figure 11-8. Labeled line chart with two lines, teal and red

This is just one of many possible visual solutions. A close look reveals small gaps in the line near the 350 ppm mark. An alternative approach would be to render two full lines in their entirety, but mask them using SVG clipping paths, as described in “Containing visual elements with clipping paths” in Chapter 9.

Area Charts

Areas are not too different from lines. If a line is a series of connected x/y points, then an area is just that same line, plus a second such line (usually a flat baseline), with all the space in between filled in.

Figure 11-9 illustrates this and shows what we’re working toward.

dvw2 1109
Figure 11-9. Area chart

The main difference is calling d3.area() to define an area generator instead of a line generator:

area = d3.area()
         .defined(function(d) { return d.average >= 0; })
         .x(function(d) { return xScale(d.date); })
         .y0(function() { return yScale.range()[0]; })
         .y1(function(d) { return yScale(d.average); });

You’ll notice I’ve specified x, y0, and y1 accessors. The x accessor is unchanged from earlier. y0 represents the area’s baseline, while y1 represents the top, or data value. You’ll notice that y1 is the same as our earlier y accessor. But y0 is new. For every data value, this accessor returns the first value in yScale’s range: that is, the “bottom” edge of the chart.

I don’t know about you, but to me this sure feels like a whole lot more carbon dioxide. (Suddenly, I hope you are reading this as an ebook powered by renewable energy. If you’re reading the dead-tree version, don’t feel bad; use what you’re learning here for good.)

One potential downside of an area chart is that, if there are missing data points, those absences will be keenly felt, as in Figure 11-9. Those slices punctuating the data don’t sit right with me, but they are graphically honest. You could go fudge the data to patch those holes, but then how would you sleep at night?

I’ll define the second area (the red one) like so:

dangerArea = d3.area()
            .defined(function(d) { return d.average >= 350; })
            .x(function(d) { return xScale(d.date); })
            .y0(function() { return yScale(350); })
            .y1(function(d) { return yScale(d.average); });

Note that, in this case, I specify a baseline value of yScale(350)—that is, the 350 ppm mark—because this area doesn’t need to extend all the way down to the bottom of the chart. Nice!

Later in the code, we make sure to call the new area generators instead of the old line generators:

//Create areas
svg.append("path")
    .datum(dataset)
    .attr("class", "area")
    .attr("d", area);  // <-- Area!

svg.append("path")
    .datum(dataset)
    .attr("class", "area danger")
    .attr("d", dangerArea);  // <-- Area!

I also made some minor adjustments to variable names and CSS properties. But mainly you only need to change line to area and specify y0 and y1 accessors.

See that chart in 06_area_chart.html and Figure 11-10.

dvw2 1110
Figure 11-10. Area chart, with both areas

As a designer, I must point out that, yes, you can see the tiniest bit of the teal area bleeding around the edges of the red area. Most people won’t notice this, but if you do, I’m sorry for you, but glad to meet someone else with the same affliction. In any case, you could remove that cleanly by using SVG clipping paths (again, see Chapter 9).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.69.157