5
Arrays

For a data scientist, handling arrays of tabulated data is a recurring task. The baseline, if using JavaScript, is to be able to do what we usually do with a spreadsheet and basic macro operations. Thousands of tabulated data sets can be accessed freely on the Internet: public data for most countries, data from international bodies, or free-access private data. In Part 3, several applications will be discussed (e.g. the French parliamentary election of 2017).

JavaScript provides several built-in objects able to represent tabulated data, which we can access through an index: Array, of course, while TypedArray, Map, Set and String are other “Iterables”. They share several features with arrays; objects that have a length property and indexed elements are named “array-like”. The list of arguments of a function, and a list of selected HTML DOM elements, are also “array-like” objects.

This chapter is devoted to the object “Array”.

An “array” is a set of ordered values, which we can access with a numeric index. This index starts at 0 not 1 (zero-based index).

There is no specific type for arrays: the operator typeof returns "object", but the static method Array.isArray(tab) can check if its argument is an array, in which case tab inherits all the Array.prototype methods.

5.1. Handling arrays: creation and access to its elements

5.1.1. Creating an array with the array literal notation

The syntax of an “array literal” uses [square brackets] to delimit a list of elements, which can be primitive values, variable names, objects and functions, separated by commas:

  [elem1, elem2, …] // any number of elements, including zero

Examples:

const tm = ["un", 2]; // two values: a string, a number
const ts = ["Jean","Bob"], tn = [12,24]; // homogeneous arrays
const t2d = [ ts, tn];          // an array of two arrays
const te = [];                  // empty array: te.length = 0
const tu = ["a",,"c"]; // syntactically correct, but to avoid!

WARNING.– When typing, do not remove an element, while leaving its comma: the array tu above has: tu.length = 3 and tu[1] = undefined.

Here are some best practices with arrays:

  • – declare an array with const: the elements, and their number, can be modified, but the name is bound to that array once and for all;
  • – use the literal notation, not new Array(), which avoids ambiguities:
      const tm = new Array("two"); // equiv.: tm = ["two"];
      const tn = new Array(2); // NOT [2]: length 2, two empty slots (read as undefined)
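
The two recommendations above can be illustrated with a minimal sketch. The names scores, a1 and a2 are invented for this example:

```javascript
// 'const' fixes the binding, not the content of the array
const scores = [12, 24];
scores.push(36);            // allowed: the content changes
scores[0] = 10;             // allowed too
let reassigned = true;
try {
  scores = [];              // not allowed: TypeError at run time
} catch (err) {
  reassigned = false;
}
console.log(scores, reassigned);   // [10, 24, 36] false

// the constructor is ambiguous when given a single number
const a1 = new Array(2);    // length 2, no element stored
const a2 = [2];             // length 1, one element: 2
console.log(a1.length, a2.length); // 2 1
```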

5.1.2. Checking if a variable is an array

The operator typeof does not distinguish an array from an object:

  const t = ["a", "b", "c"]; console.log( typeof t ); // "object"

Instead, use the static method:

  console.log( Array.isArray(t) ); // true

5.1.3. The length property, the index count

The length is in read-write access, and the index starts at 0 not 1:

  console.log( ["a", "b", "c"].length );  // -> 3
  const t = ["one", 2]; console.log( t[1]); // -> 2

The length property is kept up to date at all times: it returns the index of the last element plus one. Adding or removing an element in the array immediately modifies length. Conversely, modifying length adds or removes elements at the end of the array.

This feature can be used for some operations:

  • emptying an array: simply set length to zero
      const t1 = []; console.log(t1.length); // -> 0 (t1 empty)
      const t2 = ["one", "two", "three"];         // t2.length = 3
      t2.length = 0; // now t2 is empty

    The “garbage collector” will take care of the memory previously assigned to t2.

  • initializing an array to a given number of elements
      const t = []; t.length = N; // [undefined, undefined,… N times]
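
The two-way link between length and the elements can be observed with a short sketch (the array days is invented for this example):

```javascript
const days = ["mon", "tue", "wed", "thu"];

days.length = 2;            // truncation: "wed" and "thu" are removed
console.log(days);          // ["mon", "tue"]

days[4] = "fri";            // writing beyond the end extends the array
console.log(days.length);   // 5 (index of the last element, plus one)
console.log(days[3]);       // undefined (nothing stored at index 3)
console.log(3 in days);     // false: index 3 is a hole, not a stored value
```

The last line shows that an element "added" through length is an empty slot rather than a stored undefined, which matters for the methods that skip missing values.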

NOTE.– An array is an object whose property names include numeric indices, kept consistent with length. A regular object has no such indexed elements unless we explicitly define them:

  const obj = {"p1":1, "p2":"two"};
  console.log(obj["p2"]);     // two
  console.log(obj[1]); // undefined (no such property)

NOTE.– We can add own properties to an array, as we do with objects. For example:

  const t1 = [], t2 = [[], []];
  // let's add the property 'dimension', then print values:
        t1.dimension = 1;       t2.dimension = 2;
  console.log( t1["length"] +", "+ t1["dimension"] );  //-> 0, 1
  console.log( t2["length"] +", "+ t2["dimension"] );  //-> 2, 2

5.1.4. Accessing individual values in an array: the indices

In the section Array.prototype, we will learn that it is often possible to avoid handling the individual elements of an array and rather work with the array as a whole. However, it is worth knowing how indices behave. Reminder: they start at zero.

  const t = [2, 4];	
  let n = 1; console.log( t[n] ); // 4
  n = 6; console.log( t[n] ); // undefined
  n = -6; console.log( t[n] ); // undefined

The value undefined can result from either of the following cases:

  • – the index is out of bounds [0, t.length-1];
  • – the index is correct, but the value actually is undefined (“missing value”).

NOTE.– An index is an “object property name”: the index is of type “number”, but it is evaluated as a “string” when used for accessing an element, which is to say a “property” of the array:

  ( tab[1] === tab["1"] ); // -> true

NOTE.– Missing values: some methods ignore them, whereas a for loop enumerates them: e.g. tu[1] in the example const tu = ["a", ,"c"]; Recommendation: avoid unnecessary missing values.
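
The difference can be checked with a minimal sketch, reusing the array tu of section 5.1.1 (the names seenByLoop and seenByForEach are invented for this example):

```javascript
const tu = ["a", , "c"];     // index 1 is a missing value (a hole)

const seenByLoop = [];
for (let i = 0; i < tu.length; i++) {
  seenByLoop.push(tu[i]);    // the loop visits every index, hole included
}

const seenByForEach = [];
tu.forEach(function (v) {
  seenByForEach.push(v);     // forEach skips the hole
});

console.log(seenByLoop.length);     // 3
console.log(seenByForEach.length);  // 2
```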

5.2. Methods of the object Array and Array.prototype

By far, the best way to handle arrays is to use the Array.prototype methods and, whenever needed, the static methods of the object Array.

Table 5.1. The static methods of the object Array

Method            Description                               Return
Array.of()        Initializes an array with the arguments   Array
Array.from()      Creates an array from an “Iterable”       Array
Array.isArray()   Checks if the argument is an “Array”      Boolean

These static methods must be invoked from the built-in object Array. Example: Array.isArray(["a","b"]); // true.

See section 5.2.5 for the use of Array.from.

By contrast, the methods of the object Array.prototype are delegated to every array. For example: [].push("a"); // the array becomes ["a"] (push returns the new length, 1)

The Array.prototype methods are grouped into three families: “Mutators”, “Accessors” and “Iterators”. We use the notation [].method to identify them.

5.2.1. The “Mutators” family

Nine methods that modify the array on which they are invoked; the array length may change. Several methods are useful at initialization stage:

  • – incremental insertion element by element ('push', 'unshift');
  • – copy a set of values ('fill', 'copyWithin');
  • – incremental removal element by element ('pop', 'shift');
  • – replacing a slice of values ('splice').

Table 5.2. “Mutators” methods of Array.prototype

Method         Description                                                                  Return
a.copyWithin   Copies a slice of the array according to (target, start, end)               The array
a.fill         Copies a same value according to (val, start, end)                          The array
a.pop          Removes the last value, decrementing length                                 Value
a.shift        Removes the first value, decrementing length                                Value
a.push         Adds a value at the end, incrementing length                                length
a.unshift      Adds a value at the beginning, incrementing length                          length
a.reverse      Reverses the array                                                          The array
a.sort         Alphabetical sort, or according to comparator (function)                    The array
a.splice       Modifies a slice according to (start, count, [values list]), modifies length   Array of removed values
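
The return values listed in the table are easy to confuse, so here is a short sketch showing a few of them on a sample array (the variable names are invented for this example):

```javascript
const t = [1, 2, 3, 4, 5];

const last = t.pop();          // removes 5 and returns it
const first = t.shift();       // removes 1 and returns it
console.log(last, first, t);   // 5 1 [2, 3, 4]

const len = t.push(9);         // appends 9 and returns the new length
console.log(len, t);           // 4 [2, 3, 4, 9]

t.copyWithin(0, 2);            // copies the slice starting at index 2 onto index 0
console.log(t);                // [4, 9, 4, 9]

t.reverse();                   // reverses in place
console.log(t);                // [9, 4, 9, 4]
```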

– The methods “fill”, “push”, “unshift”

  • – create an array of N identical values, for instance 0s:
      const t = []; t.length = N;    t.fill(0);   // [0, 0, … N]
  • – fill an array with values provided as a result of some requests:
    let x; const t = [];       // start with an empty array
    do{   x = Math.random() * (10 - 1) + 1;    // data in [1,10]
          if(x > 5){t.push(x.toFixed(1));}     // keep if > 5
    } while (x > 5);                           // stop if ≤ 5
    if(t.length > 0){t.unshift("random > 5");}    // set t[0] if t ≠ ∅
    
  • The method “splice”

This makes it possible to “revise” an array by modifying one element or a slice. For instance, to update the first name of an object whose full name contains a given last name:

  • – line 1: sets the original array;
  • – line 2: selects with findIndex the element containing “Dent”: gives k;
  • – line 3: uses splice to replace one value at index [k].
    const t = ["Jean Bon", "Ange Leau", "P. Dent", "Paul Tron"]; //1
    let k = t.findIndex(x=>x.includes("Dent")); // (see below)   //2
    t.splice(k, 1, "Redon Dent"); // replaces the value for t[k] //3
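
Besides replacing one value, splice can insert or remove values. The sketch below, on an invented array names, also guards against a failed search: findIndex returns -1 when nothing matches, and splice(-1, 1, ...) would then silently replace the last element.

```javascript
const names = ["Jean Bon", "Ange Leau", "Paul Tron"];

// insert without removing anything: count = 0
names.splice(1, 0, "P. Dent");
console.log(names); // ["Jean Bon", "P. Dent", "Ange Leau", "Paul Tron"]

// remove two elements starting at index 1; splice returns the removed values
const removed = names.splice(1, 2);
console.log(removed); // ["P. Dent", "Ange Leau"]
console.log(names);   // ["Jean Bon", "Paul Tron"]

// only replace when the search succeeded
const k = names.findIndex(x => x.includes("Dupont"));
if (k !== -1) { names.splice(k, 1, "Marie Dupont"); }
console.log(names.length); // 2 (unchanged, since "Dupont" was not found)
```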
  • The methods “reverse” and “sort”

    These methods modify the order of the elements: “reverse” is trivial and “sort” is more complex.

    const t = ["Jean Bon", "Ange Leau", "P. Dent", "Paul Tron"];
    t.sort(); // ["Ange Leau", "Jean Bon", "P. Dent", "Paul Tron"]
    
  • – with no argument, “sort” makes a Unicode-based alphabetical sort;
  • – with a function 'compare' as argument, sort is made depending on the result of the compare function element by element:
    function compare(a, b) {
       if (a < b) {return -1;} // a comes first (by some ordering criterion)
       if (a > b) {return 1;}  // b comes first (by the same criterion)
       return 0;               // a and b are equivalent
    }
    

The sort algorithm calls the function compare on pairs of elements. The algorithm itself depends on the JavaScript engine (a quicksort-like partitioning around pivots is one possibility), but the contract is always the same: in an ascending sort, a negative result places 'a' before 'b', a positive result places 'a' after 'b', and 0 leaves their relative order unchanged:

EXAMPLE 1.– Sorting is done by numbers (ascending order, ignoring Infinity, NaN)

  function compareNumbers(a,b) {return a - b;}
  [1, 4, 2, 11, 3].sort(compareNumbers);  // [1,2,3,4,11] (num)
  [1, 4, 2, 11, 3].sort();                // [1,11,2,3,4] (alpha!)

EXAMPLE 2.– Sorting is done by the last name in “first last” string.

  function compareLastNames(a,b){
     let al = a.split(" ")[1], // second word in a
         bl = b.split(" ")[1]; // second word in b
     if (al < bl)            return -1;
     else if (al == bl)      return 0;
     else                    return 1;
  }
  const t = ["Jean Bon","Ange Leau","P. Dent","Paul Tron"]; t.sort(compareLastNames);
                       // ["Jean Bon","P. Dent","Ange Leau","Paul Tron"]
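
The same comparator pattern applies to arrays of objects. The following sketch uses invented records (people) and sorts a copy made with slice, so that the original order is preserved; localeCompare is used here as one possible way to compare strings.

```javascript
// hypothetical records, invented for this example
const people = [
  { name: "Paul Tron", age: 41 },
  { name: "Ange Leau", age: 29 },
  { name: "Jean Bon",  age: 35 }
];

// slice() copies the array first, so 'people' keeps its original order
const byAge = people.slice().sort((a, b) => a.age - b.age);
const byNameDesc = people.slice().sort((a, b) => b.name.localeCompare(a.name));

console.log(byAge.map(p => p.name));      // ["Ange Leau", "Jean Bon", "Paul Tron"]
console.log(byNameDesc.map(p => p.name)); // ["Paul Tron", "Jean Bon", "Ange Leau"]
console.log(people[0].name);              // "Paul Tron" (unchanged)
```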

5.2.2. The “Accessors” family

Seven methods do not modify the array, but yield a representation of the data in the array, which can be a new array, an index value or a string.

Table 5.3. “Accessors” methods of Array.prototype

Method             Description                                                  Return
a.concat           Concatenates the array with the argument                     Array
a.indexOf          Index of the first occurrence of (val), or -1 if not found   Number
a.lastIndexOf      Index of the last occurrence of (val), or -1 if not found    Number
a.slice            Copies a slice (start, end) into a new array                 Array
a.join             Makes a string with all the values of the array              String
a.toLocaleString   Similar to join (+ “locale” format)                          String
a.toString         Similar to join (not choosing the separator)                 String

The use of these methods is rather straightforward.
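
As a quick reference, here is a minimal sketch of the accessors on a small sample array (the values are invented for this example):

```javascript
const t = ["a", "b", "c", "b"];

console.log(t.concat(["d"], "e"));  // ["a", "b", "c", "b", "d", "e"]
console.log(t.indexOf("b"));        // 1  (first occurrence)
console.log(t.lastIndexOf("b"));    // 3  (last occurrence)
console.log(t.indexOf("z"));        // -1 (not found)
console.log(t.slice(1, 3));         // ["b", "c"] (end index excluded)
console.log(t.join(" - "));         // "a - b - c - b"
console.log(t.toString());          // "a,b,c,b"
console.log(t.length);              // 4: the original array is untouched
```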

5.2.3. The “Iteration” family

Repeat an operation for each element, not modifying the array. The result can be:

  • – a new array, same length (map), or shorter (filter);
  • – a selected value or index (find, findIndex);
  • – a cumulated value over the whole array (reduce, reduceRight);
  • – a plain iteration (forEach) of a block of operations element by element.

Table 5.4. “Iteration” methods of Array.prototype

Method          Description                                               Return
a.entries       Iterates over the couples [key, value]                    Iterator
a.keys          Iterates over the keys                                    Iterator
a.values        Iterates over the values                                  Iterator
a.every         Checks if all elements comply with (function)             Boolean
a.some          Checks if at least one element complies with (function)   Boolean
a.filter        Array of elements selected by (function)                  Array
a.find          First value complying with (function), or undefined       Value
a.findIndex     Index of that first value, or -1                          Index
a.forEach       Processes (function) for each element                     Undefined
a.map           New array with values modified according to (function)    Array
a.reduce        Cumulates the values according to (function)              Value
a.reduceRight   Idem, from the end of the array                           Value
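
The first three methods of the table return iterators rather than arrays. A minimal sketch (with an invented two-element array) shows how to consume them with Array.from or with a for...of loop:

```javascript
const t = ["x", "y"];

// entries(), keys() and values() return iterators, not arrays
console.log(Array.from(t.keys()));    // [0, 1]
console.log(Array.from(t.values()));  // ["x", "y"]
console.log(Array.from(t.entries())); // [[0, "x"], [1, "y"]]

const lines = [];
for (const [i, v] of t.entries()) {   // each entry is a couple [index, value]
  lines.push(i + ":" + v);
}
console.log(lines.join(" "));         // "0:x 1:y"
```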

5.2.4. Iterating over the elements of an array

Important recommendations for handling arrays are as follows:

Use Array.prototype methods, not loops: it is easier and much less error prone.

There are two kinds of loops:

  • iteration loop: 'for(let i = 0; i < t.length; i++)' or 'while..';
  • enumeration loop: 'for(let prop in t)'.

This remark does not apply to the enumeration loop. The iteration loop uses an index and the length of the array: we have pointed out the risks of mis-declaring, or forgetting to declare, the index.

5.2.5. Iteration without a loop, with Array/Array.prototype methods

All the basic operations for creating and handling arrays are available as methods as of ES6.

5.2.5.1. Array.from

This transforms an iterable into a regular array. For example:

  const args = Array.from(arguments);   // within a function
  const t2 = Array.from({length:3},function(){return 1}); // [1,1,1]
  const t3 = Array.from({length:3},function(v,i){return i*i});
  const t4 = Array.from("hello"); // ["h", "e", "l", "l", "o"]
  const rd = Array.from({length:n},
                         function(x){return Math.random() * 10;});
  • - 'args': see section 6.3 (Chapter 6);
  • - 't2': identical to the example with “fill” (but in one instruction);
  • - 't3': the first argument {length:3} is an “array-like” object (it has a length), which allows from to call the function (argument #2) three times, returning i*i each time: resulting in [0, 1, 4];
  • - 't4': transforms a string into an array;
  • - 'rd': a data series of n random numbers in the range [0, 10).

5.2.5.2. tab.forEach

This runs the function for each element of the array tab. For example, to compute a value and at the same time, save some data in an additional array:

  let rc = "figures ", rl = " and letters "; const tabc = [];
  ["a","b", 1, 2].forEach(function(v, i){
    if(typeof v === 'string') rl += (i===0?"" : ", ") + v; else tabc.push(v);
  });
  rc +tabc.toString()+ rl; // figures 1,2 and letters a, b

5.2.5.3. tab.map

This runs the function for each element of the array tab, like forEach, and saves each returned value into a new array. For example:

  const tabl = ["a","b", 1, 2].map(function(v, i){
     return (typeof v === 'string'? v : v.toString());
  }); // tabl will contain only 'strings'

5.2.5.4. tab.every, tab.some

This runs the function up to some element in the array. By contrast with most array methods, these methods can stop running before the end of the array:

  • some: stops as soon as one element complies;
  • every: stops as soon as one element does not comply
      function isEven(elt, index, array) {return (elt % 2 == 0);}
      [2,4,5,8,6,2,3,12,4,8,9,1].every(isEven);
                              // -> false (stops at '5')
      [2,4,5,8,6,2,3,12,4,8,9,1].some(isEven);
                             // -> true (stops at first '2')
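
The early stop can be verified by counting how many times the callback is invoked. This is a minimal sketch; the counter calls and the function isEvenCounted are invented for this example:

```javascript
const data = [2, 4, 5, 8, 6, 2, 3, 12, 4, 8, 9, 1];

let calls = 0;
function isEvenCounted(elt) {
  calls++;                    // count how many elements are examined
  return elt % 2 === 0;
}

const allEven = data.every(isEvenCounted);
const callsForEvery = calls;
console.log(allEven, callsForEvery);  // false 3 (stopped at 5, the third element)

calls = 0;
const anyEven = data.some(isEvenCounted);
const callsForSome = calls;
console.log(anyEven, callsForSome);   // true 1 (stopped at the first element)
```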
    

5.2.5.5. tab.find, tab.findIndex

This stops as soon as an element complying with the function is found, and returns its value (find) or its index (findIndex).

  function isPrime(v, i) {
     let f = 2;
     while (f <= Math.sqrt(v))
         {if (v % f++ < 1){return false;}}
     return v > 1;
}
[4, 5, 8, 12].findIndex(isPrime); //-> 1 (and stops)
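
The two methods differ in what they return when no element complies. A minimal sketch, with invented predicates isOdd and isNegative:

```javascript
const values = [4, 5, 8, 12, 7];
const isOdd = v => v % 2 === 1;

console.log(values.find(isOdd));            // 5 (first odd value)
console.log(values.findIndex(isOdd));       // 1 (its index)

const isNegative = v => v < 0;
console.log(values.find(isNegative));       // undefined (no match)
console.log(values.findIndex(isNegative));  // -1 (no match)
```

Since find returns undefined both for "not found" and for a found value that is itself undefined, findIndex is the safer test when missing values are possible.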

5.2.5.6. filter

This selects the elements complying with the function into a new array. It is similar to the SQL command “Select_From_Where”:

  const upTo10 = Array.from({length:10},
        function(v,i){return i+1}); // [1,2,3,4,5,6,7,8,9,10]
  const primes = upTo10.filter(isPrime); // [2,3,5,7]

5.2.5.7. reduce

This aggregates onto an accumulator similarly to the SQL command “GroupBy”:

  function multAcc(acc, v){return acc *= v;}
  const factorial = upTo10.reduce(multAcc); // 3628800 (!)

Thanks to the polymorphism of +, we can aggregate onto a string:

  function tPrim(acc, v){return acc + ` (${v})`;}
  const ini = "First primes:";
  const str = upTo10.filter(isPrime).reduce(tPrim,ini);
              // First primes: (2) (3) (5) (7)

Here reduce uses the second argument that initializes the accumulator:

  ["c1","c2"].reduce(function(s,x){return s+`<li>${x}</li>`;}, "<p>")+"</p>";
  // <p><li>c1</li><li>c2</li></p>

We will use this feature to create text to display on the web page (see Part 3). Another feature is used in that example: the chaining of methods.
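
The analogy with “GroupBy” becomes clearer when the accumulator is an object. The following is a minimal sketch on invented sample data (votes), counting the occurrences of each distinct value:

```javascript
// hypothetical sample data, invented for this example
const votes = ["yes", "no", "yes", "abstain", "yes", "no"];

// the accumulator is an object: one property per distinct value
const counts = votes.reduce(function (acc, v) {
  acc[v] = (acc[v] || 0) + 1;
  return acc;               // the callback must return the accumulator
}, {});

console.log(counts);        // { yes: 3, no: 2, abstain: 1 }
```

Forgetting the return statement is a frequent mistake: the accumulator then becomes undefined at the second step.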

5.2.6. Chaining array methods

This is a “functional” capacity provided by JavaScript: chaining methods means that the result of the first method is an object that accepts the next method. For example (above), the result of filter is an array, and we can invoke the method reduce on that array in a single instruction (over three lines for better readability):

  Array.from({length:10}, function(v,i){return i+1})
         .filter(isPrime)
         .reduce(tPrim,ini);

With the help of the “arrow function syntax”, you may even write:

  Array.from({length:10},(v,i)=>i+1)
         .filter(isPrime) .reduce((s,p) =>s+` (${p})`,ini);

5.2.7. Arrays and the arrow function syntax

The arrow syntax can be used everywhere for function expressions, and in particular in the context of array methods, it may ease code readability, for example:

  const f = function(x,i){do_something_with_x_and_i; return value;}
  let out = tab.reduce(function(acc,x,i){return acc + f(x,i);},ini);

equivalent, in the arrow syntax, to:

  const f = (x,i)=>{do_something_with_x_and_i; return value;}
  let out = tab.reduce((acc,x,i)=>acc + f(x,i),ini);

Readability is improved if the code of the function is simply a short return instruction (e.g. 'out'). With a more complex code (e.g. function isPrime) the arrow syntax may not help much: you would rather put that code in a function to be called (e.g. similar to the f(x,i) above).

NOTE.– The arrow syntax introduces a difference in the handling of the pronoun this (which we will study later).

5.2.7.1. Arguments of the callback functions in Array.prototype methods

Most array methods accept a function as first argument (plus some optional arguments: see initial value of the accumulator, with reduce). This function argument accepts three arguments, or four, when used with reduce.

  • Three arguments: for instance with filter
      const t = Array.from({length:n}, ()=>Math.random()*10);
      const tnew = t.filter(function(x, i, tbl){return /* code */});
      // @x: is the current element          // x: mandatory
      // @i: is the current index (when required)
      // @tbl: link to the array 't' (seldom useful)

NOTE.– t receives an array of n random numbers in the range [0, 10).

  • Four arguments, for instance with reduce
      let str = t.reduce(function(acc, x,i,tbl){return acc+…;},acc0);
      // @acc: is the accumulator, initialized to acc0 (if acc0 present)
      //       if no acc0, acc is initialized to t[0] and the iteration starts at t[1]
      // @x, i, tbl: as above                 // acc and x: mandatory
  • – Example: Data series transformed into %-change data series

    (map+reduce)

      const t = Array.from({length:n},()=>Math.random()*10);
      function pcentChange(x, i, tbl){return i>0? (x - tbl[i-1])/tbl[i-1] : 0;}
      // can be directly written into the map method,
      // then chained with reduce to compute the mean average:
      let mean =    t
                    .map((x,i,tbl)=>i>0? (x - tbl[i-1])/tbl[i-1] : 0)
                    .reduce((acc,x)=>acc+x) / t.length;
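
Because random data cannot be checked by eye, here is the same idea as a minimal sketch on a small fixed series (invented values). It assumes the usual definition of a relative change, (x - previous) / previous, and also shows how the presence of an initial value changes the indices visited by reduce:

```javascript
const series = [100, 110, 99];

// relative change with respect to the previous value (0 for the first one)
const changes = series.map((x, i, tbl) => i > 0 ? (x - tbl[i - 1]) / tbl[i - 1] : 0);
console.log(changes.map(c => c.toFixed(2)));  // ["0.00", "0.10", "-0.10"]

// without an initial value, acc starts as series[0] and i starts at 1
const visited = [];
series.reduce((acc, x, i) => { visited.push(i); return acc + x; });
console.log(visited);                          // [1, 2]

// with an initial value, every index is visited, starting at 0
const visited0 = [];
series.reduce((acc, x, i) => { visited0.push(i); return acc + x; }, 0);
console.log(visited0);                         // [0, 1, 2]
```

Note that reduce without an initial value throws a TypeError on an empty array, which is one more reason to provide one.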

5.2.8. The “Iterables”

ES6 added several built-in objects (TypedArray, Map, Set) and introduced an “iterable” behavior in the standard, making them “Iterable” together with String and Array: all know how to “iterate” over their elements and expose their number (as length, or as size for Map and Set), which allows Array.from to work.

TypedArray, Map, Set and Array all have their forEach method, and the DOM NodeList object as well (see Part 2).

The Arguments object, accessible within any function, under the name arguments is also an Iterable: arguments.length is ≥0 (not undefined) and we can make an array from it:

  const args = Array.from(arguments)
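
A minimal sketch of Array.from applied to other iterables (the sample values and the function countArgs are invented for this example):

```javascript
const s = new Set(["a", "b", "a"]);        // duplicates are dropped
const m = new Map([["k1", 1], ["k2", 2]]);

console.log(Array.from(s));                // ["a", "b"]
console.log(Array.from(m));                // [["k1", 1], ["k2", 2]]
console.log(Array.from(m.keys()));         // ["k1", "k2"]
console.log(s.size, m.size);               // 2 2 (size, not length)

function countArgs() {
  return Array.from(arguments).length;     // 'arguments' becomes a true array
}
console.log(countArgs("x", "y", "z"));     // 3

// de-duplicating an array by a round trip through a Set
const unique = Array.from(new Set([3, 1, 3, 2, 1]));
console.log(unique);                       // [3, 1, 2]
```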

5.3. Array of arrays (multidimensional array)

Handling multidimensional data (e.g. 2D) is a frequent task of the data scientist.

The “spreadsheet” is the most common pattern for simple databases (non-transactional DB). In many situations, these tabulated data are easy to collect and archive (very simple training). Each line of the file has a defined number of elements, and columns are “named”, which can be easily translated into a JavaScript object notation:

  const line_i = {"nameCol1": val1, "nameCol2": val2, … };

Here is an example of a 2D array:

  const t1 = [1, 2], t2 = [3, 4], t3 = [5, 6];
  const tab2d = [t1, t2, t3];
                 // tab2d[1][1] = 4    tab2d[2][0] = 5
                 // tab2d[1]      = [3, 4]
  console.log( tab2d.toString() );      //-> 1,2,3,4,5,6

WARNING.– We should be aware of some traps:

  • – indices: the first index simulates the line, the second the column;
  • – sizes: tab2d.length gives the “number of lines”, but for each array element the corresponding size may vary, and you must check if it is the correct “number of columns” (some frameworks do this for you). For example:
      const width = 304, height = 228; // the sizes of a picture
      const t1 = [1, 2], t2 = [3, 4], t3 = [5, 6];
      const tab2d = [t1, t2, t3];
      tab2d.toString();      // 1,2,3,4,5,6
      tab2d[1][2] = width;   // same as: t2[2] = width;
      tab2d[1][3] = height;       // same as: t2[3] = height;
      tab2d.toString();       // 1,2,3,4,304,228,5,6
      // now, tab2d.length is still 3, t1.length = t3.length= 2,
      // but t2.length= 4 (beware!)

EXAMPLE.– Extracting a single column from a 2D array:

  let n = 1; // warning: refers to the second column!
  const col2 = tab2d.map(x => x[n] || null);
         // col2 = [2, 4, 6]

NOTE.– The shortcut "x[n] || null" avoids undefined if cell n does not exist. Beware that it also replaces any falsy value (0, "", NaN, false) with null.
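
When the data may legitimately contain 0, testing the index is safer than the shortcut. A minimal sketch, on an invented ragged array grid:

```javascript
const grid = [[1, 0], [3], [5, 6]];   // the second line has no cell at index 1
const n = 1;

// '||' replaces every falsy value, including a legitimate 0
const colOr = grid.map(x => x[n] || null);
console.log(colOr);      // [null, null, 6]

// testing the index keeps 0 and only replaces the missing cell
const colSafe = grid.map(x => (n < x.length ? x[n] : null));
console.log(colSafe);    // [0, null, 6]
```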

EXAMPLE.– Flattening a 2D array:

  const tab1d = tab2d.reduce((acc,x) => acc.concat(x), []);
  // tab1d = [1, 2, 3, 4, 5, 6]

NOTE.– acc is initialized to the empty array [] to which each line of tab2d is concatenated.

EXAMPLE.– Providing metadata to a 2D array:

An array is an object, with specific features (numbers as property names), and like an object, we can add properties besides .length; we can set a .title to name it, .dimension to inform about dimensionality, or .schemata, an array to name the columns of a 2D array, etc.

  const t1 = [1, 2], t2 = [3, 4], t3 = [5, 6];
  const tab2d = [t1, t2, t3];
  tab2d.dimension = 2;
  tab2d.schemata = ["date", "amount"];
  tab2d.columnsOk = tab2d.every(x=>x.length === tab2d[0].length);
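
Such metadata can then drive further processing. As a sketch (the variable rows is invented for this example), the schemata can be used to turn each line into an object with named columns, as in the line_i notation shown earlier:

```javascript
const tab2d = [[1, 2], [3, 4], [5, 6]];
tab2d.schemata = ["date", "amount"];   // column names, as in the example above

// build one object per line, using the schemata as property names
const rows = tab2d.map(function (line) {
  return tab2d.schemata.reduce(function (obj, colName, j) {
    obj[colName] = line[j];
    return obj;
  }, {});
});

console.log(rows[1]);               // { date: 3, amount: 4 }
console.log(Object.keys(tab2d));    // ["0", "1", "2", "schemata"]
console.log(JSON.stringify(tab2d)); // [[1,2],[3,4],[5,6]]
```

The last line is a caveat: properties added to an array are not included by JSON.stringify, so the metadata must be saved separately if the array is serialized.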

5.3.1. Frameworks proposing an “augmented Array.prototype”

There exist some stable and safe frameworks providing additional methods to arrays, though “masked” in their “namespace”: for instance, statistical tools or LINQ-like queries (“Microsoft Language Integrated Query”), which you can find on GitHub.
