Chapter 5. Using Data and Scales

In Chapter 4, Creating a Bar Graph, you learned how to create a bar graph that was based upon a sequence of integers that were statically coded within the application. Although the resulting graph looks quite nice, there are several issues with the way the data is provided and rendered.

One of the issues is that the data is hard-coded within the application. In practice, we will almost always load data from an external source. D3.js provides a rich set of functions for loading data from the web in a variety of formats. In this chapter, you will learn to use D3.js to load data from the web in JSON, CSV, and TSV formats.

A second issue with the data in the previous chapter's example was that it was simply an array of integers. Data will often be represented as collections of objects with multiple properties, many of which we do not need for our visualization. Numeric values are also often represented as strings. In this chapter, you will learn how to select just the data that you want and convert it to the desired data type.

Yet another issue in our previous bar graph was that we assumed that the values in the data mapped directly to pixels in the visualization. This is normally not the case, and we need to scale the data to fit the size of our rendering in the browser. This is easily accomplished using scales, which we have already examined in the context of axes; now we will apply them to data.
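To make the idea of scaling concrete before we turn to D3.js's own scales, here is a minimal hand-written linear mapping. It is only a sketch: the function name linearScale, the 400-pixel chart height, and the use of the season's peak audience (17,290,000, from the episode data in this chapter) as the top of the domain are assumptions for illustration.

```javascript
// Hypothetical hand-rolled linear scale: maps a value in [d0, d1] onto [r0, r1].
// D3.js provides its own scales; this only illustrates the arithmetic.
function linearScale(domain, range) {
    var d0 = domain[0], d1 = domain[1];
    var r0 = range[0], r1 = range[1];
    return function (value) {
        // Fraction of the way through the domain...
        var t = (value - d0) / (d1 - d0);
        // ...applied to the same fraction of the range.
        return r0 + t * (r1 - r0);
    };
}

// Assumed chart height of 400 pixels; domain runs from 0 to the peak viewership.
var height = 400;
var y = linearScale([0, 17290000], [0, height]);

console.log(y(0));        // 0
console.log(y(8645000));  // 200
console.log(y(17290000)); // 400
```

Writing the scale as a function that returns a function mirrors how D3.js scales are used: configure once, then call the result for each data value.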

One last issue in the previous example was that we calculated the sizes and positions of the bars manually. Bar graphs are common enough in D3.js applications that there are built-in functions to do this for us automatically. We will examine how to use these to simplify our code.
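The manual arithmetic amounts to something like the following: divide the available width into one slot per bar, shrink each bar by a padding fraction, and center it in its slot. This is a hypothetical helper (bandLayout, with padding expressed as a fraction of each slot), not D3.js's API; the built-in band functions covered in this chapter compute similar values for you.

```javascript
// Hypothetical helper: x position and width of n equally spaced bars.
// padding is the fraction of each slot left empty, split evenly on both sides.
function bandLayout(n, totalWidth, padding) {
    var step = totalWidth / n;              // width of one slot
    var bandWidth = step * (1 - padding);   // width of the bar inside its slot
    var bands = [];
    for (var i = 0; i < n; i++) {
        bands.push({
            x: i * step + (step - bandWidth) / 2,  // center the bar in its slot
            width: bandWidth
        });
    }
    return bands;
}

var bands = bandLayout(4, 400, 0.25);
console.log(bands[0].x, bands[0].width); // 12.5 75
console.log(bands[3].x, bands[3].width); // 312.5 75
```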

So let's jump in. In this chapter, we will specifically cover the following topics:

  • Loading data in JSON, TSV, or CSV formats from the Web
  • Extracting fields from objects using the .map() function
  • Converting string values into their representative numeric data types
  • Using linear scales for transforming continuous values
  • Using ordinal scales for mapping discrete data
  • Using bands for calculating the size and position of our bars
  • Applying what we've learned so far to create a rich bar graph using real data

Data

Data is the core of creating a data visualization. Almost every visual item created in D3 will need to be bound to a piece of data. This data can come from a number of sources. It can be explicitly coded in the visualization, loaded from an external source, or derived by manipulating or performing calculations on other data.

Most data used to create a D3.js visualization is obtained from a file, a web service, or a URL. This data is often in one of several formats, such as JSON, XML, CSV (Comma-Separated Values), and TSV (Tab-Separated Values). We will need to convert the data in these formats into JavaScript objects, and D3.js provides us with convenient functions for doing this.

Loading data with D3.js

D3.js provides a number of helper functions that load data from outside the browser and simultaneously convert it into JavaScript objects. The most common data formats that you are likely to come across, and which we will cover, are:

  • JSON
  • TSV
  • CSV

Note

You may have noticed that I have omitted XML from the list. D3.js does have functions to load XML, but unlike with JSON, TSV, and CSV, the results of the load are not automatically converted into JavaScript objects and require additional manipulation using the JavaScript XML/DOM facilities. XML is considered out of scope for this text, as most of the scenarios you will currently come across are handled by these three formats, if not solely by JSON, which has become the almost ubiquitous data format for the Web.

To demonstrate working with all these formats, we will examine a dataset that I have put together and placed in a GitHub gist. It represents the viewership of the episodes of Season 5 of AMC's The Walking Dead.

Note

This dataset was built manually using data from https://en.wikipedia.org/wiki/The_Walking_Dead_(season_5).

Loading JSON data

Data in the JavaScript Object Notation (JSON) format is convenient for conversion into JavaScript objects. It is a very flexible format which supports named properties as well as hierarchical data.
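Because JSON carries its own types, parsing it yields numbers where the text contains numbers and strings where it contains quoted text. The sketch below uses the standard JSON.parse() on a string built from the first episode object of this dataset; d3.json() performs a comparable conversion for you after downloading the file.

```javascript
// One episode record as JSON text (the first object in the dataset).
var text = '{"Season": 5, "Episode": 1, "SeriesNumber": 52, ' +
           '"Title": "No Sanctuary", "FirstAirDate": "10-12-2014", ' +
           '"USViewers": 17290000}';

var episode = JSON.parse(text);

// Unquoted numeric values stay numbers; quoted values stay strings.
console.log(typeof episode.USViewers);    // number
console.log(typeof episode.FirstAirDate); // string
```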

The JSON data for this example is stored in a GitHub gist and is available at https://gist.githubusercontent.com/d3byex/e5ce6526ba2208014379/raw/8fefb14cc18f0440dc00248f23cbf6aec80dcc13/walking_dead_s5.json.

Note

The URL is a little unwieldy. You can go directly to the gist with all three versions of this data at https://goo.gl/OfD1hc.

Clicking on the link will display the data in the browser. This file contains an array of JavaScript objects, each of which has six properties and represents an individual episode of the program. The first two objects are the following:

[
{
  "Season": 5,
  "Episode":  1,
  "SeriesNumber": 52,
  "Title": "No Sanctuary",
  "FirstAirDate": "10-12-2014",
  "USViewers": 17290000
},
{
  "Season": 5,
  "Episode":  2,
  "SeriesNumber": 53,
  "Title": "Strangers",
  "FirstAirDate": "10-19-2014",
  "USViewers": 15140000
},
…
]

This data can be loaded into our D3.js application using the d3.json() function. This function, like many others in D3.js, operates asynchronously. It takes two parameters: the URL of the data to load, and a callback function that is called when the data has been loaded.

The following example demonstrates loading this data and displaying the first item in the array.

Note

bl.ock (5.1): http://goo.gl/Qe63wH

The main portion of the code that loads the data is as follows:

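var url = "https://gist.githubusercontent.com/d3byex/e5ce6526ba2208014379/raw/8fefb14cc18f0440dc00248f23cbf6aec80dcc13/walking_dead_s5.json";
// d3.json() returns immediately; the callback runs once the data arrives
d3.json(url, function (error, data) {
    if (error) { console.error(error); return; }  // stop if the load failed
    console.log(data[0]);  // the first episode object
});
// Runs before the callback above
console.log("Data in D3.js is loaded asynchronously");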

There is no visible output from this example, but the output is written to the JavaScript console:

"Data in D3.js is loaded asynchronously"
[object Object] {
  Episode: 1,
  FirstAirDate: "10-12-2014",
  Season: 5,
  SeriesNumber: 52,
  Title: "No Sanctuary",
  USViewers: 17290000
}

Note that D3.js loads data asynchronously. The console.log() call placed after d3.json() executes first, which is why its message appears at the top of the output. Later, when the data has finished loading, the callback runs and we see the output from the console.log() call inside it.
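The same ordering can be reproduced without a network connection. In the sketch below, fakeJson() is a hypothetical stand-in for d3.json() that uses setTimeout() to deliver its data later through an error-first callback, much as a real download completes some time after the call returns.

```javascript
// Hypothetical stand-in for d3.json(): delivers data asynchronously
// through an error-first callback, as a real network request would.
function fakeJson(data, callback) {
    setTimeout(function () {
        callback(null, data);
    }, 0);
}

var order = [];

fakeJson([{ Episode: 1, Title: "No Sanctuary" }], function (error, data) {
    order.push("callback: " + data[0].Title);
    console.log(order); // [ 'after the call', 'callback: No Sanctuary' ]
});

// This line runs before the callback, even with a 0 ms delay.
order.push("after the call");
```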

The callback function itself has two parameters. The first, error, is a reference to an object representing an error, if one occurs. In that case, error will be non-null and contain details about the failure. If error is null, the data was loaded successfully and is available in the second parameter, data.
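In practice, the callback should check error before touching data. Here is a minimal sketch of that pattern; handleEpisodes() is a hypothetical name, and the two calls at the end simulate a failed and a successful load rather than performing real requests.

```javascript
// Error-first callback: check `error` before using `data`.
// Returns a short status string so the two outcomes are easy to compare.
function handleEpisodes(error, data) {
    if (error) {
        // The load failed; data is not usable.
        return "load failed";
    }
    // error is null, so the load succeeded and data holds the parsed array.
    return "loaded " + data.length + " episodes, first is " + data[0].Title;
}

console.log(handleEpisodes(new Error("simulated failure"), undefined));
// load failed
console.log(handleEpisodes(null, [{ Title: "No Sanctuary" }, { Title: "Strangers" }]));
// loaded 2 episodes, first is No Sanctuary
```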

Loading TSV data

TSV is a format that you will come across if you do enough D3.js programming. In a TSV file, the values are separated by tab characters. Generally, the first line of the file is a tab-separated sequence of names for each of the values.

TSV files have the benefit of being less verbose than JSON files, and are often generated automatically by many systems that are not JavaScript based.
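To see why every value from such a file arrives as a string, here is a simplified sketch of turning tab-separated text into objects: the header line supplies the property names and each later line supplies the values. It is an illustration under loud assumptions, not D3.js's parser: parseDelimited() is a hypothetical name, the sample uses a trimmed-down set of columns, and the code ignores quoted fields, delimiters inside values, and \r\n line endings.

```javascript
// Simplified parser for delimiter-separated text with a header row.
// Assumes "\n" line endings; does NOT handle quoted fields or embedded delimiters.
function parseDelimited(text, delimiter) {
    var lines = text.trim().split("\n");
    var names = lines[0].split(delimiter);
    return lines.slice(1).map(function (line) {
        var values = line.split(delimiter);
        var row = {};
        names.forEach(function (name, i) {
            row[name] = values[i]; // every value stays a string
        });
        return row;
    });
}

var tsv = "Season\tEpisode\tTitle\tUSViewers\n" +
          "5\t1\tNo Sanctuary\t17290000\n" +
          "5\t2\tStrangers\t15140000";

var rows = parseDelimited(tsv, "\t");
console.log(rows.length);             // 2
console.log(typeof rows[0].Episode);  // string
```

Passing "," instead of "\t" handles simple CSV in the same way.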

The episode data in the TSV format is available at https://gist.githubusercontent.com/d3byex/e5ce6526ba2208014379/raw/8fefb14cc18f0440dc00248f23cbf6aec80dcc13/walking_dead_s5.tsv.

Clicking on the link, you will see the following in your browser:

Season Episode SeriesNumber Title FirstAirDate USViewers
5 1 52 No Sanctuary 10-12-2014 17290000
5 2 53 Strangers 10-19-2014 15140000 
5 3 54 Four Walls and a Roof 10-26-2014 13800000
5 4 55 Slabtown 11-02-2014 14520000
5 5 56 Self Help 11-09-2014 13530000
5 6 57 Consumed 11-16-2014 14070000
5 7 58 Crossed 11-23-2014 13330000
5 8 59 Coda 11-30-2014 14810000
5 9 60 What Happened and What's Going On 02-08-2015 15640000
5 10 61 Them 02-15-2015 12270000
5 11 62 The Distance 02-22-2015 13440000
5 12 63 Remember 03-01-2015 14430000
5 13 64 Forget 03-08-2015 14530000
5 14 65 Spend 03-15-2015 13780000
5 15 66 Try 03-22-2015 13760000
5 16 67 Conquer 03-29-2015 15780000

We can load the data from this file using d3.tsv(). The code for the example is available at the following link:

Note

bl.ock (5.2): http://goo.gl/nlq8jy

The code is identical to the JSON example except for the URL and the use of d3.tsv() in place of d3.json(). The output in the console, however, is different:

[object Object] {
  Episode: "1",
  FirstAirDate: "10-12-2014",
  Season: "5",
  SeriesNumber: "52",
  Title: "No Sanctuary",
  USViewers: "17290000"
}

Notice that the properties Episode, Season, SeriesNumber, and USViewers are now strings instead of numbers. TSV files have no way of indicating a value's type as JSON does, so everything defaults to a string. These values will often need to be converted to another type, which we will examine in the next section on mapping and data conversion.
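One blunt way to convert such rows, sketched below, is to turn every value that looks numeric into a number and leave the rest alone. This is a hypothetical helper (coerceNumbers()), not part of D3.js. Selecting and converting fields individually, as the next section does, is usually safer, because a generic rule like this one also converts text fields that merely look numeric, such as a title like "1984".

```javascript
// Hypothetical helper: converts values that look like numbers, keeps the rest as strings.
function coerceNumbers(row) {
    var result = {};
    Object.keys(row).forEach(function (key) {
        var value = row[key];
        var asNumber = +value;
        // Skip blank strings (which + would turn into 0) and non-numeric text (NaN).
        var isNumeric = value.trim() !== "" && !isNaN(asNumber);
        result[key] = isNumeric ? asNumber : value;
    });
    return result;
}

var row = {
    Season: "5", Episode: "1", SeriesNumber: "52",
    Title: "No Sanctuary", FirstAirDate: "10-12-2014", USViewers: "17290000"
};
var converted = coerceNumbers(row);
console.log(typeof converted.USViewers);    // number
console.log(typeof converted.FirstAirDate); // string
```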

Loading CSV data

CSV is a format similar to TSV, except that fields are delimited by commas instead of tab characters. CSV is a fairly common format, frequently produced as output by spreadsheet applications, and many organizations use it to create data that other applications consume.

The CSV version of the data is available at https://gist.githubusercontent.com/d3byex/e5ce6526ba2208014379/raw/8fefb14cc18f0440dc00248f23cbf6aec80dcc13/walking_dead_s5.csv.

Opening the link, you will see the following in your browser:

Season,Episode,SeriesNumber,Title,FirstAirDate,USViewers
5,1,52,No Sanctuary,10-12-2014,17290000
5,2,53,Strangers,10-19-2014,15140000 
5,3,54,Four Walls and a Roof,10-26-2014,13800000
5,4,55,Slabtown,11-02-2014,14520000
5,5,56,Self Help,11-09-2014,13530000
5,6,57,Consumed,11-16-2014,14070000
5,7,58,Crossed,11-23-2014,13330000
5,8,59,Coda,11-30-2014,14810000
5,9,60,What Happened and What's Going On,02-08-2015,15640000
5,10,61,Them,02-15-2015,12270000
5,11,62,The Distance,02-22-2015,13440000
5,12,63,Remember,03-01-2015,14430000
5,13,64,Forget,03-08-2015,14530000
5,14,65,Spend,03-15-2015,13780000
5,15,66,Try,03-22-2015,13760000
5,16,67,Conquer,03-29-2015,15780000

The example for demonstrating the loading of the preceding data using d3.csv() is available at the following link:

Note

bl.ock (5.3): http://goo.gl/JUX9CA

The result is identical to that of the TSV example in that all the fields are loaded as strings.

Mapping fields and converting strings to numbers

We are going to use this data (from the CSV source) to render a bar graph comparing the viewership of each episode. If we were to use these fields as-is to create the bar graph, the values would be interpreted incorrectly because their types are strings instead of numbers, and the resulting graph would be wrong.
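A few lines of plain JavaScript show how string values go wrong. Comparing and sorting strings is done character by character, and + concatenates strings rather than adding them. The episode numbers and viewer counts below come from this dataset; the particular checks are chosen for illustration.

```javascript
// Episode numbers as they arrive from CSV/TSV: strings.
var episodes = ["9", "10", "11", "2"];

// String comparison is character by character, so "10" is less than "9".
console.log("10" > "9");              // false

// The default sort also compares as strings.
console.log(episodes.slice().sort()); // [ '10', '11', '2', '9' ]

// + on two strings concatenates instead of adding.
console.log("17290000" + "15140000"); // 1729000015140000

// Converting to numbers first restores the expected behavior.
var numbers = episodes.map(Number);
console.log(numbers[1] > numbers[0]); // true
```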

Additionally, a bar chart showing viewership does not need the Season, SeriesNumber, and FirstAirDate properties, so we can omit them. It's not a real issue with this dataset, but data can sometimes have hundreds of columns and billions of rows, so extracting only the necessary properties can save a significant amount of memory.

Both tasks can be accomplished in a naive manner using a for loop: copy the desired fields into a new JavaScript object and use one of the parse functions to convert the data. There is a better, functional way to perform this task.
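For comparison, the naive loop might look like the sketch below. The sample rows array is an assumption standing in for what d3.csv() delivers (every value a string), and parseInt() is the standard JavaScript parse function.

```javascript
// Sample rows as loaded from the CSV: every value is a string.
var rows = [
    { Season: "5", Episode: "1", SeriesNumber: "52", Title: "No Sanctuary",
      FirstAirDate: "10-12-2014", USViewers: "17290000" },
    { Season: "5", Episode: "2", SeriesNumber: "53", Title: "Strangers",
      FirstAirDate: "10-19-2014", USViewers: "15140000" }
];

// Naive approach: loop, copy the wanted fields into a new object, parse the numbers.
var extracted = [];
for (var i = 0; i < rows.length; i++) {
    extracted.push({
        Episode: parseInt(rows[i].Episode, 10),
        USViewers: parseInt(rows[i].USViewers, 10),
        Title: rows[i].Title
    });
}

console.log(extracted[1].USViewers + 1); // 15140001 (numeric addition, not concatenation)
```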

JavaScript arrays provide a .map() function, which applies a function to each of the array's items. In our case, that function returns a JavaScript object, and .map() collects all of these objects and returns them in a new array. This gives us a simple way to select just the properties that we want and convert the data, all in a single statement.

To demonstrate this in action, open the example given at the following link:

Note

bl.ock (5.4): http://goo.gl/ex2e8C

The important portion of the code is the call to data.map():

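// Keep only three fields per row, converting the numeric ones with unary +
var mappedAndConverted = data.map(function(d) {
    return {
        Episode: +d.Episode,      // "1" -> 1
        USViewers: +d.USViewers,  // "17290000" -> 17290000
        Title: d.Title            // left as a string
    };
});
console.log(mappedAndConverted);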

The function passed to .map() returns a new JavaScript object for each item in the data array. Each new object consists of only the three specified properties. .map() collects these objects into a new array, which is stored in the mappedAndConverted variable.

The following code shows the first two objects in the new array:

[[object Object] {
  Episode: 1,
  Title: "No Sanctuary",
  USViewers: 17290000
}, [object Object] {
  Episode: 2,
  Title: "Strangers",
  USViewers: 15140000
},
…]

Note that Episode and USViewers are now numeric values. This is accomplished by applying the unary + operator, which converts a string to a number.
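The unary + is concise, but it is worth knowing how it treats awkward input compared with parseInt(). The inputs below are made up for illustration and do not come from the episode data, and toNumberOrNull() is a hypothetical helper rather than part of D3.js.

```javascript
// Unary + converts the whole string, or gives NaN if any part is not numeric.
console.log(+"17290000");          // 17290000
console.log(+"13.5");              // 13.5
console.log(+"12px");              // NaN
console.log(+"");                  // 0  (an empty cell silently becomes zero)

// parseInt() reads leading digits and stops at the first non-digit.
console.log(parseInt("12px", 10)); // 12
console.log(parseInt("", 10));     // NaN

// Hypothetical guard: return null for values that did not convert cleanly.
function toNumberOrNull(s) {
    var n = +s;
    return (s.trim() === "" || isNaN(n)) ? null : n;
}
console.log(toNumberOrNull("15140000")); // 15140000
console.log(toNumberOrNull("n/a"));      // null
```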
