For a data scientist, handling arrays of tabulated data is a recurring task. The baseline, if using JavaScript, is to be able to do what we usually do with a spreadsheet and basic macro operations. Thousands of tabulated data sets can be accessed freely on the Internet: public data for most countries, data from international bodies, or free-access private data. In Part 3, several applications will be discussed (e.g. the French parliamentary election of 2017).
JavaScript provides several built-in objects able to represent tabulated data: Array, of course, and TypedArray, Map, Set and String, which are other “Iterables”. Those whose elements can be accessed through a numeric index and which have a length property, such as String and TypedArray, are named “array-like” (Map and Set expose a size property instead). The list of arguments of a function, and a list of selected HTML DOM elements, are also “array-like” objects.
This chapter is devoted to the object “Array”.
An “array” is a set of ordered values, which we can access with a numeric index. This index starts at 0 not 1 (zero-based index).
There is no specific type for arrays: the operator typeof returns "object", but the static method Array.isArray(tab) can check if its argument is an array, hence tab inherits all the Array.prototype methods.
The syntax of an “array literal” uses [square brackets] to delimit a list of elements, which can be primitive values, variable names, objects and functions, separated by commas:
[elem1, elem2, …] // any number of elements, including zero
Examples:
const tm = ["un", 2]; // two values: a string, a number
const ts = ["Jean","Bob"], tn = [12,24]; // homogeneous arrays
const t2d = [ ts, tn]; // an array of two arrays
const te = []; // empty array: te.length = 0
const tu = ["a",,"c"]; // syntactically correct, but to avoid!
WARNING.– When typing, do not remove an element while leaving its comma: the array tu above has tu.length = 3 and tu[1] = undefined.
Here are some best practices with arrays:
– declare the array with const: the elements, and their number, can be modified, but it restricts the use of its name to that array, once and for all;
– use the literal notation rather than new Array(): it avoids ambiguities:
const tm = new Array("two"); // equiv.: tm = ["two"];
const tn = new Array(2); // not [2]: an array of length 2 with two empty slots
The operator typeof does not distinguish an array from an object:
const t = ["a", "b", "c"]; console.log( typeof t ); // "object"
Instead, use the static method:
console.log( Array.isArray(t) ); // true
The length property is in read-write access, and the index starts at 0, not 1:
console.log( ["a", "b", "c"].length ); // -> 3
const t = ["one", 2]; console.log( t[1]); // -> 2
The property length is permanently updated, returning the index of the last element plus one. Adding or removing an element in the array immediately modifies length. In turn, modifying length implies the addition or removal of elements, at or from the end of the array.
This feature can be used for some operations:
const t1 = []; console.log(t1.length); // -> 0 (t1 empty)
const t2 = ["one", "two", "three"]; // t2.length = 3
t2.length = 0; // now t2 is empty
The “garbage collector” will take care of the memory previously assigned to t2.
const t = []; t.length = N; // N empty slots, each one read as undefined
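The effect of writing to length can be checked with a short sketch (the array name and its values are illustrative only). Note that enlarging length creates empty slots rather than elements explicitly set to undefined:

```javascript
const seasons = ["winter", "spring", "summer", "autumn"];

seasons.length = 2;                  // truncates: the last two elements are removed
console.log(seasons);                // [ 'winter', 'spring' ]

seasons.length = 4;                  // enlarges: two empty slots are appended
console.log(seasons[3]);             // undefined (reading an empty slot)
console.log(3 in seasons);           // false: index 3 holds no element

seasons[seasons.length] = "extra";   // writing at index 'length' appends a value
console.log(seasons.length);         // 5
```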
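NOTE.– An array is an object whose numeric property names (the indices) are handled specially: they are tied to the property length. A regular object also accepts numeric keys, which are converted to strings, but in the example below no such property exists: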
const obj = {"p1":1, "p2":"two"};
console.log(obj["p2"]); // two
console.log(obj[1]); // undefined (no such property)
NOTE.– We can add own properties to an array, as we do with objects. For example:
const t1 = [], t2 = [[], []];
// let's add the property 'dimension', then print values:
t1.dimension = 1; t2.dimension = 2;
console.log( t1["length"] +", "+ t1["dimension"] ); //-> 0, 1
console.log( t2["length"] +", "+ t2["dimension"] ); //-> 2, 2
In the section on Array.prototype, we will learn that it is often possible to avoid handling the individual elements of an array and rather work with the array as a whole. However, it is worth knowing how indices behave. Reminder: they start at zero.
const t = [2, 4];
let n = 1; console.log( t[n] ); // 4
n = 6; console.log( t[n] ); // undefined
n = -6; console.log( t[n] ); // undefined
The value undefined can result from either of the following cases:
– the index is outside the range [0, t.length-1];
– the element at that index is undefined (“missing value”).
NOTE.– An index is an “object property name”: the index is of type “number”, but it is evaluated as a “string” when used for accessing an element, which is to say a “property” of the array:
( tab[1] === tab["1"] ); // -> true
NOTE.– Missing values: some methods ignore them, whereas a for loop enumerates them: e.g. tu[1] in the example const tu = ["a", ,"c"];
Recommendation: avoid unnecessary missing values.
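The difference between a loop and a method in the presence of a missing value can be observed with the following sketch, which reuses the shape of the array tu above (the other names are illustrative):

```javascript
const sparse = ["a", , "c"];           // one hole at index 1 (to avoid in real code)

// An index-based for loop visits every index, including the hole
const seenByFor = [];
for (let i = 0; i < sparse.length; i++) { seenByFor.push(sparse[i]); }
console.log(seenByFor);                // [ 'a', undefined, 'c' ]

// forEach skips the hole
const seenByForEach = [];
sparse.forEach(v => seenByForEach.push(v));
console.log(seenByForEach);            // [ 'a', 'c' ]

// The 'in' operator tells a hole from an explicit undefined
console.log(1 in sparse);              // false
console.log(1 in ["a", undefined]);    // true
```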
By far, the best way to handle arrays is to use the Array.prototype methods and, whenever needed, the static methods of the object Array.
Method | Description | Return |
Array.of() | Initializes an array with the arguments | Array |
Array.from() | Creates an array from an “Iterable” | Array |
Array.isArray() | Checks if the argument is an “Array” | Boolean |
These static methods must be invoked from the built-in object Array. Example: Array.isArray(["a","b"]); // true
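As a quick illustration of the three static methods (variable names are illustrative), the sketch below also shows why Array.of is less ambiguous than the constructor when the single argument is a number:

```javascript
const viaOf = Array.of(3);               // one element: the number 3
const viaCtor = new Array(3);            // three empty slots, no element
console.log(viaOf.length, viaCtor.length);   // 1 3

const fromString = Array.from("abc");    // a string is iterable
console.log(fromString);                 // [ 'a', 'b', 'c' ]

console.log(Array.isArray(fromString));      // true
console.log(Array.isArray("abc"));           // false
console.log(Array.isArray({ length: 3 }));   // false: array-like, but not an array
```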
See section 5.2.5 for the use of Array.from.
By contrast, the methods of the object Array.prototype are delegated to every array. For example: [].push("a"); // ["a"]
The Array.prototype methods are grouped into three families: “Mutators”, “Accessors” and “Iterators”. We use the notation [].method to identify them.
Nine methods modify the array on which they are invoked; the array length may change. Several methods are useful at the initialization stage:
Method | Description | Return |
a.copyWithin | Copies a slice of the array onto another position according to (target, start, end) | tab |
a.fill | Copies a same value according to (val, start, end) | tab |
a.pop a.shift | Removes the last (respectively, first) value, decrementing length | value |
a.push a.unshift | Adds a value at the end (respectively, beginning), incrementing length | length |
a.reverse | Reverses the array | tab |
a.sort | Alphabetical sort, or according to comparator (function) | tab |
a.splice | Modifies a slice according to (start, count, [values list]), modifies length | array of the removed values |
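The values returned by the mutators are easy to mix up. The following sketch, on an illustrative array named nums, records what each call returns and how the array evolves:

```javascript
const nums = [1, 2, 3, 4, 5];

const lenAfterPush = nums.push(6);          // 6: push returns the new length
const popped = nums.pop();                  // 6: pop returns the removed value
const lenAfterUnshift = nums.unshift(0);    // 6: nums is now [0,1,2,3,4,5]
const shifted = nums.shift();               // 0: nums is back to [1,2,3,4,5]

const removed = nums.splice(1, 2, "x");     // removes 2 values at index 1, inserts "x"
console.log(removed);                       // [ 2, 3 ]
console.log(nums);                          // [ 1, 'x', 4, 5 ]

const sameArray = nums.reverse() === nums;  // true: reverse returns the array itself
nums.fill(0, 1, 3);                         // sets indices 1 and 2 to 0
console.log(nums);                          // [ 5, 0, 0, 1 ]
```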
– The methods “fill”, “push”, “unshift”
const t = []; t.length = N; t.fill(0); // [0, 0, … N]
let x; const t = []; // start with an empty array
do{ x = Math.random() * (10 - 1) + 1; // data in [1,10]
if(x > 5){t.push(x.toFixed(1));} // keep if > 5
} while (x > 5); // stop if ≤ 5
if(t.length > 0){t.unshift("random > 5");} // set t[0] if t ≠ ∅
The method “splice” makes it possible to “revise” an array by modifying one element or a slice. For instance, to update the first name of a person whose full name contains a given last name:
– findIndex locates the element containing “Dent”: gives k;
– splice replaces the element at index [k].
const t = ["Jean Bon", "Ange Leau", "P. Dent", "Paul Tron"]; //1
let k = t.findIndex(x=>x.includes("Dent")); // (see below) //2
t.splice(k, 1, "Redon Dent"); // replaces the value for t[k] //3
The methods “reverse” and “sort” modify the order of the elements: “reverse” is trivial and “sort” is more complex.
const t = ["Jean Bon", "Ange Leau", "P. Dent", "Paul Tron"];
t.sort(); // ["Ange Leau", "Jean Bon", "P. Dent", "Paul Tron"]
With a function 'compare' as argument, the sort depends on the result of the compare function, applied element by element:
function compare(a, b) {
if (a < b [by some ordering criterion]) {return -1;}
if (a > b [by same ordering criterion]) {return 1;}
return 0; // because a = b
}
The function compare is called by the sorting algorithm of the engine, for instance a quicksort (the standard does not impose a particular algorithm). In a quicksort, the array is partitioned into slices, each slice having a pivot (value 'a'). At some point of the (ascending) sort, for each element (value 'b') preceding 'a', if compare(a,b) < 0, the element is moved, else left in place. A symmetric operation occurs if 'a' precedes 'b':
EXAMPLE 1.– Sorting is done by numbers (ascending order, ignoring Infinity, NaN)
function compareNumbers(a,b) {return a - b;}
[1, 4, 2, 11, 3].sort(compareNumbers); // [1,2,3,4,11] (num)
[1, 4, 2, 11, 3].sort(); // [1,11,2,3,4] (alpha!)
EXAMPLE 2.– Sorting is done by the last name in “first last” string.
function compareLastNames(a,b){
let al = a.split(" ")[1], // second word in a
bl = b.split(" ")[1]; // second word in b
if (al < bl) return -1;
else if (al == bl) return 0;
else return 1;
}
const t = ["Jean Bon","Ange Leau","P. Dent","Paul Tron"]; t.sort(compareLastNames);
// ["Jean Bon","P. Dent","Ange Leau","Paul Tron"]
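A more compact variant of Example 2 can be sketched with String.prototype.localeCompare, which already returns a negative, zero or positive number. The helper lastName and the comparator byLastName are hypothetical names introduced here, and the sort is applied to a copy so that the original array is left untouched:

```javascript
// Hypothetical helper: last word of a "first last" string
const lastName = s => s.split(" ").pop();

// Comparator built on localeCompare (negative, 0 or positive)
const byLastName = (a, b) => lastName(a).localeCompare(lastName(b));

const people = ["Jean Bon", "Ange Leau", "P. Dent", "Paul Tron"];
const sorted = people.slice().sort(byLastName);  // slice() copies before sorting

console.log(sorted);     // [ 'Jean Bon', 'P. Dent', 'Ange Leau', 'Paul Tron' ]
console.log(people[1]);  // Ange Leau (original order unchanged)
```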
Seven methods do not modify the array, but yield a representation of the data in the array, which can be a new array, an index value or a string.
Method | Description | Return |
a.concat | Concatenates the array with the argument | Array |
a.indexOf a.lastIndexOf | Index of the first/last occurrence of (val), or -1 if not found | Number |
a.slice | Copies a slice (start,end) into a new array | Array |
a.join | Makes a string with all the values of the array | String |
a.toLocaleString | Similar to join (+“locale” format) | String |
a.toString | Similar to join (not choosing the separator) | String |
The use of these methods is rather straightforward.
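A brief sketch on an illustrative array named base shows typical calls; none of them changes base:

```javascript
const base = ["a", "b", "c", "b"];

const longer = base.concat(["d"], "e");
console.log(longer);                  // [ 'a', 'b', 'c', 'b', 'd', 'e' ]
console.log(base.indexOf("b"));       // 1
console.log(base.lastIndexOf("b"));   // 3
console.log(base.indexOf("z"));       // -1
console.log(base.slice(1, 3));        // [ 'b', 'c' ]  (end index excluded)
console.log(base.slice(-2));          // [ 'c', 'b' ]  (negative: counted from the end)
console.log(base.join(" | "));        // a | b | c | b
console.log(base.toString());         // a,b,c,b
console.log(base.length);             // 4: 'base' was never modified
```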
These methods repeat an operation for each element, not modifying the array. The result can be:
– a new array of the same length (map), or shorter (filter);
– a value or an index (find, findIndex);
– an accumulated value (reduce, reduceRight);
– the simple execution (forEach) of a block of operations element by element.
Method | Description | Return |
a.entries a.keys a.values | An iterator over the couples [key,value] (respectively, over the keys, over the values) | Iterator |
a.every a.some | Checks if all elements comply with (function) (respectively, if at least one element complies) | Boolean |
a.filter | Array of elements selected by (function) | Array |
a.find a.findIndex | First value complying with (function), or undefined (respectively, index of that first value, or -1) | Value Index |
a.forEach | Processes (function) for each element | Undefined |
a.map | New array with values modified according to (function) | Array |
a.reduce a.reduceRight | Cumulates the values according to (function) (respectively, idem, from the end of the array) | Value |
Important recommendations for handling arrays are as follows:
Use Array.prototype methods, not loops: it is easier and much less error-prone.
There are two kinds of loops:
– the iteration loop: 'for(let i = 0; i < t.length; i++)' or 'while..';
– the enumeration loop: 'for(let prop in t)'.
The enumeration loop is not concerned by the remark. The iteration loop uses an index and the length of the array: we have pointed out the risks when declaring or forgetting to declare the index.
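The two kinds of loops, and the method-based alternative, can be compared on a small illustrative array (the property unit is added only to show what the enumeration loop sees):

```javascript
const values = [10, 20, 30];
values.unit = "EUR";                  // an extra (non-index) property

// Iteration loop: index and length are managed by hand
let total = 0;
for (let i = 0; i < values.length; i++) { total += values[i]; }

// Same result with an Array.prototype method: no index to manage
const totalReduce = values.reduce((acc, v) => acc + v, 0);
console.log(total, totalReduce);      // 60 60

// Enumeration loop: property names are strings, extra properties included
const names = [];
for (const prop in values) { names.push(prop); }
console.log(names);                   // [ '0', '1', '2', 'unit' ]
```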
All basic operations for creating and handling arrays exist since ES6.
Array.from transforms an iterable (or an array-like object) into a regular array. For example:
const args = Array.from(arguments); // within a function
const t2 = Array.from({length:3},function(){return 1}); // [1,1,1]
const t3 = Array.from({length:3},function(v,i){return i*i});
const t4 = Array.from("hello"); // ["h", "e", "l", "l", "o"]
const rd = Array.from({length:n},
function(x){return Math.random() * 10;});
– 'args': see section 6.3 (Chapter 6);
– 't2': similar to the example with “fill” (but in one instruction);
– 't3': the first argument {length:3} is an array-like object (it has a length); it allows from to run the function (argument #2) three times, returning i*i: resulting in [0, 1, 4];
– 't4': transforms a string into an array;
– 'rd': a data series of n random numbers between 0 (included) and 10 (excluded).
forEach runs the function for each element of the array tab. For example, to compute a value and, at the same time, save some data in an additional array:
let rc = "figures ", rl = " and letters "; const tabc = [];
["a","b", 1, 2].forEach(function(v, i){
if(typeof v === 'string') rl += (i===0?"" : ", ") + v; else tabc.push(v);
});
rc +tabc.toString()+ rl; // figures 1,2 and letters a, b
map runs the function for each element of the array, like forEach, plus saves the returned value into a new array. For example:
const tabl = ["a","b", 1, 2].map(function(v, i){
return (typeof v === 'string'? v : v.toString());
}); // tabl will contain only 'strings'
some and every run the function up to some element in the array. By contrast with most array methods, these methods can stop running before the end of the array:
– some: stops as soon as one element complies;
– every: stops as soon as one element does not comply.
function isEven(elt, index, array) {return (elt % 2 == 0);}
[2,4,5,8,6,2,3,12,4,8,9,1].every(isEven);
// -> false (stops at '5')
[2,4,5,8,6,2,3,12,4,8,9,1].some(isEven);
// -> true (stops at first '2')
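The early stop can be verified by counting how many times the test function is actually called. In this sketch, the counter calls and the wrapper isEvenCounted are illustrative additions:

```javascript
const data = [2, 4, 5, 8, 6, 2, 3, 12, 4, 8, 9, 1];
let calls = 0;
const isEvenCounted = elt => { calls++; return elt % 2 === 0; };

const allEven = data.every(isEvenCounted);
const callsForEvery = calls;            // 3: stopped at the value 5
console.log(allEven, callsForEvery);    // false 3

calls = 0;
const anyEven = data.some(isEvenCounted);
const callsForSome = calls;             // 1: stopped at the first value, 2
console.log(anyEven, callsForSome);     // true 1
```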
find and findIndex stop as soon as an element complying with the function is found, and return the value or the index, respectively.
function isPrime(v, i) {
let f = 2;
while (f <= Math.sqrt(v))
{if (v % f++ < 1){return false;}}
return v > 1;
}
[4, 5, 8, 12].findIndex(isPrime); //-> 1 (and stops)
filter selects the elements complying with the function into a new array. It is similar to the SQL command “Select_From_Where”:
const upTo10 = Array.from({length:10},
function(v,i){return i+1}); // [1,2,3,4,5,6,7,8,9,10]
const primes = upTo10.filter(isPrime); // [2,3,5,7]
reduce aggregates onto an accumulator, similarly to the SQL command “GroupBy”:
function multAcc(acc, v){return acc * v;}
const factorial = upTo10.reduce(multAcc); // 3628800 (!)
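To make the analogy with “GroupBy” more concrete, here is a minimal sketch in which the accumulator is an object holding one total per group. The data set sales and its field names are made up for the illustration:

```javascript
// Made-up records: one amount per sale, tagged with a region
const sales = [
  { region: "north", amount: 10 },
  { region: "south", amount: 5 },
  { region: "north", amount: 7 }
];

// The accumulator is an object: one property per region, holding the sum
const totals = sales.reduce((acc, row) => {
  acc[row.region] = (acc[row.region] || 0) + row.amount;
  return acc;                          // the accumulator must always be returned
}, {});

console.log(totals);                   // { north: 17, south: 5 }
```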
Thanks to the polymorphism of +, we can aggregate onto a string:
function tPrim(acc, v){return acc + ` (${v})`;}
const ini = "First primes:";
const str = upTo10.filter(isPrime).reduce(tPrim,ini);
// First primes: (2) (3) (5) (7)
Here reduce uses the second argument that initializes the accumulator:
["c1","c2"].reduce(function(s,x){return s+`<li>${x}</li>`;}, "<p>")+"</p>";
// <p><li>c1</li><li>c2</li></p>
We will use this feature to create text to display on the web page (see Part 3). Another feature is used in that example: the chaining of methods.
This is a “functional” capacity provided by JavaScript: chaining methods means that the result of the first method is an object that accepts the next method. For example (above), the result of filter is an array, and we can invoke the method reduce on that array in a single instruction (over three lines for better readability):
Array.from({length:10}, function(v,i){return i+1})
.filter(isPrime)
.reduce(tPrim,ini);
With the help of the “arrow function syntax”, you may even write:
Array.from({length:10},(v,i)=>i+1)
.filter(isPrime)
.reduce((s,p) => s + ` (${p})`, ini);
The arrow syntax can be used everywhere for function expressions, and in particular in the context of array methods, it may ease code readability, for example:
const f = function(x,i){do_something_with_x_and_i; return value;}
let out = tab.reduce(function(acc,x,i){return acc + f(x,i);},ini);
equivalent, in the arrow syntax, to:
const f = (x,i)=>{do_something_with_x_and_i; return value;}
let out = tab.reduce((acc,x,i)=>acc + f(x,i),ini);
Readability is improved if the code of the function is simply a short return instruction (e.g. 'out'). With a more complex code (e.g. function isPrime) the arrow syntax may not help much: you would rather put that code in a function to be called (e.g. similar to the f(x,i) above).
NOTE.– The arrow syntax introduces a difference in the handling of the pronoun this (which we will study later).
Most array methods accept a function as first argument (plus some optional arguments: see the initial value of the accumulator, with reduce). This function argument accepts three arguments, or four when used with reduce.
filter
const t = Array.from({length:n}, ()=>Math.random()*10);
const tnew = t.filter(function(x, i, tbl){return /* code */});
// @x: is the current element // x: mandatory
// @i: is the current index (when required)
// @tbl: link to the array 't' (seldom useful)
NOTE.– t receives an array of n random numbers in the range [0, 10).
reduce
let str = t.reduce(function(acc, x,i,tbl){return acc+…;},acc0);
// @acc: is the accumulator, initialized to acc0 (if acc0 present)
// if no acc0, acc is initialized to t[0] and the iteration starts at t[1]
// @x, i, tbl: as above // acc and x: mandatory
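The role of the initial value can be observed by recording the indices that the callback receives. In this sketch (all names are illustrative), omitting the initial value makes reduce start at index 1, and makes it throw on an empty array:

```javascript
// Without initial value: acc starts as the first element, iteration starts at index 1
const visited = [];
const sum = [5, 6, 7].reduce((acc, x, i) => { visited.push(i); return acc + x; });
console.log(sum, visited);             // 18 [ 1, 2 ]

// With initial value 0: every index is visited
const visited0 = [];
const sum0 = [5, 6, 7].reduce((acc, x, i) => { visited0.push(i); return acc + x; }, 0);
console.log(sum0, visited0);           // 18 [ 0, 1, 2 ]

// Empty array: an initial value is required, otherwise a TypeError is thrown
let errorName = "";
try { [].reduce((acc, x) => acc + x); } catch (e) { errorName = e.name; }
console.log(errorName);                // TypeError
const emptySum = [].reduce((acc, x) => acc + x, 0);
console.log(emptySum);                 // 0
```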
(map+reduce)
const t = Array.from({length:n},()=>Math.random()*10);
function pcentChange(x, i, tbl){return i>0? (x - tbl[i-1])/x : 0;}
// can be directly written into the map method,
// then chained with reduce to compute the mean:
let mean = t
.map((x,i,tbl)=>i>0? (x - tbl[i-1])/x : 0)
.reduce((acc,x)=>acc+x) / t.length;
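A worked example on a small fixed series makes the chain easy to check by hand. This sketch assumes that the intended quantity is the difference with the previous element divided by the current element, that is (x - tbl[i-1]) / x; the series values are made up:

```javascript
const series = [50, 100, 200];         // made-up data

// Relative change of each element with respect to the previous one
// (first element: no predecessor, so 0 by convention)
const changes = series.map((x, i, tbl) => i > 0 ? (x - tbl[i - 1]) / x : 0);
console.log(changes);                  // [ 0, 0.5, 0.5 ]

// Mean of the changes: (0 + 0.5 + 0.5) / 3
const meanChange = changes.reduce((acc, x) => acc + x) / series.length;
console.log(meanChange.toFixed(3));    // 0.333
```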
Several built-in objects have been added by ES6: TypedArray, Map and Set. An “iterable” behavior has also been introduced in the norm, making them “Iterable” together with String and Array: all know how to “iterate” on their elements, which allows Array.from to work (String, Array and TypedArray have a length; Map and Set have a size).
TypedArray, Map, Set and Array all have their forEach method, and the DOM NodeList object as well (see Part 2).
The Arguments object, accessible within any function under the name arguments, is also an Iterable: arguments.length is ≥0 (not undefined) and we can make an array from it:
const args = Array.from(arguments);
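The following sketch applies Array.from to each kind of iterable mentioned above; the function collect and the variable names are illustrative:

```javascript
const fromSet = Array.from(new Set([1, 2, 2, 3]));          // duplicates removed
console.log(fromSet);                                       // [ 1, 2, 3 ]

const fromMap = Array.from(new Map([["a", 1], ["b", 2]]));  // [key, value] couples
console.log(fromMap);                                       // [ [ 'a', 1 ], [ 'b', 2 ] ]

const fromTyped = Array.from(new Uint8Array([7, 8]));
console.log(Array.isArray(fromTyped), fromTyped);           // true [ 7, 8 ]

// 'arguments' is only available in regular (non-arrow) functions
function collect() { return Array.from(arguments); }
const fromArgs = collect("x", "y");
console.log(fromArgs);                                      // [ 'x', 'y' ]

// A Set is counted with 'size', not 'length'
const letters = new Set(["a", "b"]);
console.log(letters.size, letters.length);                  // 2 undefined
```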
Handling multidimensional data (e.g. 2D) is a frequent task of the data scientist.
The “spreadsheet” is the most common pattern for simple databases (non-transactional DB). In many situations, these tabulated data are easy to collect and archive (very simple training). Each line of the file has a defined number of elements, and columns are “named”, which can be easily translated into a JavaScript object notation:
const line_i = {"nameCol1": val1, "nameCol2": val2, … };
Here is an example of a 2D array:
const t1 = [1, 2], t2 = [3, 4], t3 = [5, 6];
const tab2d = [t1, t2, t3];
// tab2d[1][1] = 4 tab2d[2][0] = 5
// tab2d[1] = [3, 4]
console.log( tab2d.toString() ); //-> 1,2,3,4,5,6
WARNING.– We should be aware of some traps: tab2d.length gives the “number of lines”, but for each array element the corresponding size may vary, and you must check if it is the correct “number of columns” (some frameworks do this for you). For example:
const width = 304, height = 228; // the sizes of a picture
const t1 = [1, 2], t2 = [3, 4], t3 = [5, 6];
const tab2d = [t1, t2, t3];
tab2d.toString(); // 1,2,3,4,5,6
tab2d[1][2] = width; // same as: t2[2] = width;
tab2d[1][3] = height; // same as: t2[3] = height;
tab2d.toString(); // 1,2,3,4,304,228,5,6
// now, tab2d.length is still 3, t1.length = t3.length= 2,
// but t2.length= 4 (beware!)
EXAMPLE.– Extracting a single column from a 2D array:
let n = 1; // warning: will refer the second column!
const col2 = tab2d.map(x => x[n] || null);
// col2 = [2, 4, 6]
NOTE.– The shortcut "x[n] || null" is to avoid undefined, if cell n does not exist. Beware that it also replaces any other falsy value (e.g. 0 or "") with null.
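When the data may legitimately contain 0, the nullish coalescing operator ?? (available since ES2020) is a possible alternative to ||, since it only replaces undefined and null. A minimal sketch, on a made-up array grid whose second line lacks a column:

```javascript
const grid = [[1, 0], [3], [5, 6]];    // second line has no cell at index 1
const n = 1;

const withOr = grid.map(row => row[n] || null);
console.log(withOr);                   // [ null, null, 6 ]  (the value 0 is lost)

const withNullish = grid.map(row => row[n] ?? null);
console.log(withNullish);              // [ 0, null, 6 ]     (the value 0 is kept)
```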
EXAMPLE.– Flattening a 2D array:
const tab1d = tab2d.reduce((acc,x) => acc.concat(x), []);
// tab1d = [1, 2, 3, 4, 5, 6]
NOTE.– acc is initialized to the empty array [] to which each line of tab2d is concatenated.
EXAMPLE.– Providing metadata to a 2D array:
An array is an object, with specific features (numbers as property names), and like an object, we can add properties besides .length; we can set a .title to name it, .dimension to inform about dimensionality, or .schemata, an array to name the columns of a 2D array, etc.
const t1 = [1, 2], t2 = [3, 4], t3 = [5, 6];
const tab2d = [t1, t2, t3];
tab2d.dimension = 2;
tab2d.schemata = ["date", "amount"];
tab2d.columnsOk = tab2d.every(x=>x.length === tab2d[0].length);
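One caveat is worth checking before relying on such metadata: the added properties belong to that particular array object only. The sketch below (with illustrative names) shows that they are neither serialized by JSON.stringify nor carried over to the arrays returned by methods such as slice:

```javascript
const table = [[1, 2], [3, 4]];
table.schemata = ["date", "amount"];

const keys = Object.keys(table);
console.log(keys);                     // [ '0', '1', 'schemata' ]

const json = JSON.stringify(table);    // only the indexed elements are serialized
console.log(json);                     // [[1,2],[3,4]]

const copy = table.slice();            // a new array: the metadata is not copied
const copyHasSchema = "schemata" in copy;
console.log(copyHasSchema);            // false
```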
There exist some stable and safe frameworks providing additional methods to arrays, though “masked” in their “namespace”: for instance, statistical tools or LINQ-like queries (“Microsoft Language Integrated Query”), which you can find on GitHub.