CHAPTER 6

image

Parsing JSON

In the last chapter, I discussed how to convert a JavaScript value into a valid JSON text using JSON.stringify. In Chapter 4, you learned how JSON utilizes JavaScript’s literal notation as a way to capture the structure of a JavaScript value. Additionally, you learned in that same chapter that JavaScript values can be created from their literal forms. The process by which this transformation occurs is due to the parsing component within the JavaScript engine. This brings us full circle, regarding the serializing and deserializing process.

Parsing is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. As the grammar of JSON is a subset of JavaScript, the analysis of its tokens by the parser occurs indifferently from how the Engine parses source code. Because of this, the data produced from the analysis of the JSON grammar will be that of objects, arrays, strings, and numbers. Additionally, the three literals—true, false, and null—are produced as well.

Within our program, we will be able to refer to any of these data structures as we would any other JavaScript value. In this chapter, you will learn of the manners by which we can convert valid JSON into usable JavaScript values within our program.

JSON.parse

In our investigation of the JSON Object, I discussed how the JSON Object possesses two methods. On one hand, there is the stringify method, which produces serialized JSON from a datum. And on the other hand, there is a method that is the antithesis of stringify. This method is known as parse. In a nutshell, JSON.parse converts serialized JSON into usable JavaScript values. The method JSON.parse, whose signature can be observed in Listing 6-1, is available from the json2.js library, in addition to browsers that adhere to ECMA 5th edition specifications.

Listing 6-1. Syntax of the JSON.parse Method

JSON.parse(text [, reviver]);

Until Internet Explorer 7 becomes a faded memory only to be kept alive as a myth when whispered around a campfire as a horror story, it will continue to be wise to include the json2.js library into your projects that work with JSON. Furthermore, json2.js is a fantastic way to gain insight into the inner workings of the method, short of interpreting ECMA specifications.

As outlined in the preceding, JSON.parse can accept two parameters, text and reviver. The name of the parameter text is indicative of the value it expects to receive. The parameter reviver is used similarly to the replacer parameter of stringify, in that it offers the ability for custom logic to be supplied for necessary parsing that would otherwise not be possible by default. As indicated in the method’s signature, only the provision of text is required.

You will learn about the optional reviver parameter a bit later. First, we will begin an exploration of the text parameter. The aptly named parameter text implies the JavaScript value, which should be supplied. Of course, this is a string. However, more specifically, this parameter requires serialized JSON. This is a rather important aspect, because any invalid argument will automatically result in a parse error, such as that shown in Listing 6-2.

Listing 6-2. Invalid JSON Grammar Throws a Syntax Error

var str = JSON.parse( "abc123" );  //SyntaxError: JSON.parse: unexpected character

Listing 6-2 throws an error because it was provided a string literal and not serialized JSON. As you may recall from Chapter 4, when the sole value of a string value is serialized, its literal form is captured within an additional pair of quotes. Therefore, "abc123" must be escaped and wrapped with an additional set of quotation marks, as demonstrated in Listing 6-3.

Listing 6-3. Valid JSON Grammer Is Successfully Parsed

var str = JSON.parse( ""abc123"" );  //valid JSON string value
console.log(str)                       //abc123;
console.log(typeof str)                //string;

The JavaScript value of a parsed JSON text is returned to the caller of the method, so that it can be assigned to an identifier, as demonstrated in Listing 6-3. This allows the result of the transformation to be referenced later throughout your program.

While the preceding example was supplied with a simple JSON text, it could have been a composite, such as a collection of key/value pairs or that of an ordered list. When a JSON text represents nested data structures, the transformed JavaScript value will continue to retain each nested element within a data structure commonly referred to as a tree. A simple explanation of a data tree can be attributed to a Wikipedia entry, which defines a tree as a nonlinear data structure that consists of a root node and, potentially, many levels of additional nodes that form a hierarchy.1

Let’s witness this with a more complex serialized structure. Listing 6-4 revisits our serialized author object from the previous chapter and renders it into JSON.parse. Using Firebug in conjunction with console.log, we can easily view the rendered tree structure of our author object.

Listing 6-4. Composite Structures Create a Data Tree

var JSONtext= '{"name":"Ben","age":36,"pets":[{"name":"Waverly","age":3.5},{"name":"Westley","age":4}]}';
var author = JSON.parse( JSONtext );
console.log( author);

/*Firebug Illustrates the parsed Data Tree of our serialized JSON text below
    age     36
    name  "Ben"
img  pets  [ Object { name="Waverly", age=3.5 },  Object { name="Westley", age=4 } ]
    img 0    Object { name="Waverly", age=3.5 }
    img 1    Object { name="Westley", age=4 }
*/

Once a JSON text is converted into that of a data tree, keys, also called members, belonging to any level of node structure are able to be referenced via the appropriate notion (i.e., dot notation/array notation). Listing 6-5 references various members existing on the author object.

Listing 6-5. Members Can Be Accessed with the Appropriate Notation

var JSONtext= '{"name":"Ben","age":36,"pets":[{"name":"Waverly","age":3.5},{"name":"Westley","age":4}]}';
var author = JSON.parse( JSONtext );
console.log(typeof author)        //object;
console.log(author.name)          // Ben
console.log(author.pets.length)   // 2;
console.log(author.pets[0].name)  // Waverly;

The magic of JSON.parse is twofold. The first proponent that allows for the transformation of JSON text into that of a JavaScript value is JSON’s use of literals. As we previously discussed, the literal is how JavaScript data types can be “literally” typed within source code.

The second aspect is that of the JavaScript interpreter. It is the role of the interpreter to possess absolute understanding over the JavaScript grammar and determine how to evaluate syntax, declarations, expressions, and statements. This, of course, includes JavaScript literals. It is here that literals are read, along with any other provided source code, evaluated by the interpreter of the JavaScript language and transformed from Unicode characters into JavaScript values. The JavaScript interpreter is safely tucked away and encapsulated within the browser itself. However, the JavaScript language provides us with a not-so-secret door to the interpreter, via the global function eval.

eval

The eval function is a property of the global object and accepts an argument in the form of a string. The string supplied can represent an expression, statement, or both and will be evaluated as JavaScript code (see Listing 6-6).

Listing 6-6. eval Evaluates a String As JavaScript Code

eval("alert("hello world")");

Albeit a simple example, Listing 6-6 illustrates the use of eval to transform a string into a valid JavaScript program. In this case, our string represents a statement and is evaluated as a statement. If you were to run this program, you would see the dialog prompt appear with the text hello world. While this is a rather innocent program, and one created to be innocuous, you must take great caution with what you supply to eval, as this may not always be the case. Listing 6-7 reveals that eval will also evaluate expressions.

Listing 6-7. eval Returns the Result of an Evaluation

var answer = eval("1+5");
console.log(answer) //6;

The eval function not only evaluates the string provided, but it can also return the result of an evaluated expression so that it can be assigned to a variable and referenced by your application. Expressions needn’t be mere calculations either, as demonstrated in Listing 6-8. If we were to supply eval with a string referencing an object literal, it, too, would be evaluated as an expression and returned as a JavaScript instance that corresponds to the represented object literal.

Listing 6-8. object Literals Can Be Evaluated by the eval Function

var array = eval("['Waverly','Westley','Ben']");
console.log(array[1]) //Westley;

Because JSON is a subset of JavaScript and possesses its own specification, it is important to always ensure that the supplied text is indeed a sequence of valid JSON grammar. Otherwise, we could be unaware of welcoming malicious code into our program. This will become more apparent when we invite JSON text into our program via Ajax. For this reason, while eval possesses the means to handle the transformation of JSON into JavaScript, you should never use eval directly. Rather, you should always rely on the either the JSON2.js library or the built-in native JSON Object to parse your JSON text.

If you were to open the json2.js library and review the code within the parse function, you would find that the JSON.parse method occurs in four stages.

The first thing the method aims to accomplish, before it supplies the received string to the eval function, is to ensure that all necessary characters are properly escaped, preventing Unicode characters from being interpreted as line terminators, causing syntax errors. For example, Listing 6-9 demonstrates that you cannot evaluate a string possessing a carriage return, as it will be viewed as an unterminated string literal.

Listing 6-9. String Literals Cannot Possess Line Breaks

var str="this is a sentence with a new line
... here is my new line";
// SyntaxError: unterminated string literal

// Similarly
eval(""this is a sentence with a new line u000a... here is my new line"");
// SyntaxError: unterminated string literal

However, as stated by EMCA-262, section 7.3, line terminator characters that are preceded by an escape sequence are now allowed within a string literal token.2 By escaping particular Unicode values, a line break can be evaluated within a string literal, as demonstrated in Listing 6-10.

Listing 6-10. String Literals Can Only Possess Line Breaks If They Are Escaped

eval(""this is a sentence with a new line \u000a... here is my new line"");  //will succeed

The JSON library does not just ensure that Unicode characters are properly escaped before they are evaluated into JavaScript code. It also works to ensure that JSON grammar is strictly adhered to. Because JSON is simply text, its grammar can be overlooked, if it is not created via JSON.stringify or a similar library. Furthermore, because a string can possess any combination of Unicode characters, JavaScript operators could be easily inserted into a JSON text. If these operators were evaluated, they could be detrimental to our application, whether or not they were intended to be malicious. Consider an innocent call that has an impact on our system, as shown in Listing 6-11.

Listing 6-11. Assignments Can Impact Your Existing JavaScript Values

var foo=123
eval("var foo = "abc"");
console.log(foo); // abc

Because JavaScript values can easily be overwritten, as demonstrated in Listing 6-11, it is imperative that only valid JSON text is supplied to eval.

The second stage of the parse method is to ensure the validity of the grammar. With the use of regular expressions, stage two seeks out tokens that do not properly represent valid JSON grammar. It especially seeks out JavaScript tokens that could nefariously cause our application harm. Such tokens represent method invocations, denoted by an open parenthesis (() and close parenthesis ()); object creation, indicated by the keyword new; and left-handed assignments, indicated by the use of the equal (=) operator, which could lead to the mutation of existing values. While these are explicitly searched for, if any tokens are found to be invalid, the text will not be further evaluated. Instead, the parse method will throw a syntax error.

However, should the provided text in fact appear to be a valid JSON, the parser will commence stage three, which is the provision of the sanitized text to the eval function. It is during stage three that the captured literals of each JSON value are reconstructed into their original form. Well, at least as close to their original form as JSON’s grammar allows for. Remember: JSON’s grammar prohibits a variety of JavaScript values, such as the literal undefined, functions and methods, any nonfinite number, custom objects, and dates. That said, the parse method offers the ability for us to further analyze the produced JavaScript values in a fourth and final stage, so that we can control what JavaScript values are returned for use by our application. If, however, the reviver parameter is not supplied, the produced JavaScript value of the eval function is returned as is.

The final stage of the parse operation occurs only if we supply an argument to the method, in addition to the JSON text we seek to be transformed. The benefit of the optional parameter is that it allows us to provide the necessary logic that determines what JavaScript values are returned to our application, which otherwise would be impossible to achieve by the default behavior.

reviver

The reviver parameter, unlike the replacer parameter of the stringify method, can only be supplied a function. As outlined in Listing 6-12, the reviver function will be provided with two arguments, which will assist our supplied logic in determining how to handle the appropriate JavaScript values for return. The first parameter, k, represents the key or index of the value being analyzed. Complementarily, the v parameter represents the value of said key/index.

Listing 6-12. Signature of reviver Function

var reviver = function(k,v);

If a reviver function is supplied, the JavaScript value that is returned from the global eval method is “walked” over, or traversed, in an iterative manner. This loop will discover each of the current object’s “own” properties and will continue to traverse any and all nested structures it possesses as values. If a value is found to be a composite object, such as array or object, each key that object is in possession of will be iterated over for review. This process continues until all enumerable keys and their values have been addressed. The order in which the properties are uncovered is not indicative of how they are captured within the object literals. Instead, the order is determined by the JavaScript engine.

With each property traversed, the scope of the reviver function supplied is continuously set to the context of each object, whose key and value are supplied as arguments. In other words, each object whose properties are being supplied as arguments will remain the context of the implicit this within the reviver function. Last, it will be imperative for our reviver method to return a value for every invocation; otherwise, the JavaScript value returned will be that of undefined, as shown in Listing 6-13.

Listing 6-13. Members Are Deleted If the Returned Value from reviver Is undefined

var JSONtext='{"name":"Ben","age":36,"pets":[{"name":"Waverly","age":3.5},{"name":"Westley","age":4}]}';
var reviver= function(k,v){};
var author = JSON.parse( JSONtext,reviver);
console.log(author) //undefined
console.log(typeof author) //undefined

If the return value from the reviver function is found to be undefined, the current key for that value is deleted from the object. Specifying the supplied v value as the return object will have no impact on the outcome of the object structure. Therefore, if a value does not require any alterations from the default behavior, just return the supplied value, v, as shown in Listing 6-14.

Listing 6-14. Returning the Value Supplied to the reviver Function Maintains the Original Value

var JSONtext='{"name":"Ben","age":36,"pets":[{"name":"Waverly","age":3.5},{"name":"Westley","age":4}]}';
var reviver= function(k,v){ return v };
var author = JSON.parse( JSONtext,reviver);
console.log( author );
/* the result as show in firebug below
    age     36
    name  "Ben"
img pets    [ Object { name="Waverly", age=3.5 },  Object { name="Westley", age=4 } ]
    img  0      Object { name="Waverly", age=3.5 }
    img  1      Object { name="Westley", age=4 }
*/
console.log(typeof author); //object

As was stated earlier, a well-defined set of object keys is not only useful for your application to target appropriate data but can also provide the necessary blueprint to our reviver logic for clues leading to how and when to alter a provided value. The reviver function can use these labels as the necessary conditions to further convert the returned values of the eval, in order to arrive at the JavaScript structures we require for our application’s purposes.

As you should be well aware at this point, JSON grammar does not capture dates as a literal but, instead, as a string literal in the UTC ISO format. As a string literal, the built-in eval function is unable to handle the conversion of said string into that of a JavaScript date. However, if we are able to determine that the value supplied to our reviver function is a string of ISO format, we could instantiate a date, supply it with our ISO-formatted string, and return a valid JavaScript date back to our application. Consider the example in Listing 6-15.

Listing 6-15. With the reviver Function, ISO Date-Formatted Strings Can Be Transformed into date objects

var date= new Date("Jan 1 2015");
var stringifiedData = JSON.stringify( date );
console.log( stringifiedData );  // "2015-01-01T05:00:00.000Z"
var dateReviver=function(k,v){
     return new Date(v);
}
var revivedDate = JSON.parse( stringifiedData , dateReviver);
console.log( revivedDate );  //Date {Thu Jan 01 2015 00:00:00 GMT-0500 (EST)}

Because the ISO date format is recognized as a standard, JavaScript dates can be initiated with the provision of an ISO-formatted string as an argument. Listing 6-15 shows a program that begins with a known JavaScript date set to January 1, 2015. Upon its conversion to a JSON text, our date is transformed into a string made up of the ISO 8601 grammar. By supplying a reviver function, which possesses the necessary logic, JSON.parse is able to return a date to our application.

For purely illustrative purposes, Listing 6-15 does not have to determine if the value supplied is in fact an ISO-formatted string. This is simply because we know the value being supplied is solely that. However, it will almost always be necessary for a reviver function to possess the necessary conditional logic that controls how and when to treat each supplied value.

That said, we could test any string values supplied to our reviver function against the ISO 8601 grammar. If the string is determined to be a successful match, it can be distinguished from an ordinary string and thus transformed into a date. Consider the example in Listing 6-16.

Listing 6-16. RegExp Can Match ISO-Formatted Strings

var book={};
    book.title = "Beginning JSON"
    book.publishDate= new Date("Jan 1 2015");
    book.publisher= "Apress";
    book.topic="JSON Data Interchange Format"

var stringifiedData = JSON.stringify( book );
console.log( stringifiedData );
// ["value held by index 0","2015-01-01T05:00:00.000Z","value held by index 2","value held by index 3"]

var dateReviver=function(k,v){
    var ISOregExp=/^([+-]?d{4}(?!d{2}))((-?)((0[1-9]|1[0-2])(3([12]d|0[1-9]|3[01]))?|W([0-4]d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]d|[12]d{2}|3([0-5]d|6[1-6])))([Ts]((([01]d|2[0-3])((:?)[0-5]d)?|24:?00)([.,]d+(?!:))?)?(17[0-5]d([.,]d+)?)?([zZ]|([+-])([01]d|2[0-3]):?([0-5]d)?)?)?)?$/;
    if(typeof v==="string"){
        if(ISOregExp.test(v)){
             return new Date(v);
        }
    }
    return v;
}
var revivedValues = JSON.parse( stringifiedData , dateReviver);
console.log( revivedValues );
/* the result as show in firebug below
img publishDate    Date {Thu Jan 01 2015 00:00:00 GMT-0500 (EST)} ,
    publisher    "Apress",
    title        "Beginning JSON"
    topic        "JSON Data Interchange Format"
*/

In the preceding example, our application parses a composite structure, which is simply an array. The value of each key is in the form of a string, one of which, however, represents a date. Within the reviver function, we first determine if each value supplied is that of a string, via the operator typeof. If the value is determined to be of the string type, it is further compared against the ISO grammar by way of a regular expression. The variable ISOregExp references the pattern that matches a possible ISO-formatted string. If the pattern matches the value supplied, we know it is a string representation of a date, and, therefore, we can transform our string into a date. While the preceding example produces the desired effect, a regular expression may not prove most efficient in determining which strings should be converted and which should not.

This is where we can rely on a well-defined identifier. The k value, when supplied as a clearly defined label, as shown in Listing 6-17, can be incredibly useful for coordinating the return of the desired object.

Listing 6-17. Well-Defined Label Identifiers Can Be Used to Establish Which objects Require Added Revival

var book={};
    book.title = "Beginning JSON"
    book.publishDate= new Date("Jan 1 2015");
    book.publisher= "Apress";
    book.topic="JSON Data Interchange Format"

var bookAsJSONtext = JSON.stringify(book);
console.log( bookAsJSONtext );
// "{"title":"Beginning JSON",
    "publishDate":"2015-01-01T05:00:00.000Z",
    "publisher":"Apress",
    "topic":"JSON Data Interchange Format"}"

var reviver = function( k , v ){
       console.log( k );

/* logged keys as they were supplied to the reviver function */
// title
// publisher
// date
// publishedInfo
// topic
//(an empty string)

    if( k ==="publishDate"){
        return new Date( v );
    }
    return v;
}

var parsedJSON = JSON.parse( bookAsJSONtext , reviver );
console.log( parsedJSON );

/* the result as show in firebug below
img publishDate   Date {Thu Jan 01 2015 00:00:00 GMT-0500 (EST)} ,
   publisher    "Apress",
   title        "Beginning JSON"
   topic        "JSON Data Interchange Format"
*/

Listing 6-17 achieves the same results as Listing 6-16; however, it does not rely on a regular expression to seek out ISO-formatted dates. Instead, the reviver logic is programmed to revive only strings whose key explicitly matches publishDate.

Not only do labels offer more possibility when determining whether the value should or should not be converted, their use is also more expedient than the former method. Depending on the browser, the speeds can range from 29% to 49% slower when the determining factor is based on RegExp. The results can be viewed for yourself in the performance test available at http://jsperf.com/regexp-vs-label.

It was briefly mentioned in Chapter 5 that custom classes, when serialized, are captured indistinguishably from the built-in objects of JavaScript. While this is indeed a hindrance, it is not impossible to transform your object into a custom object, by way of the reviver function.

Listing 6-18 makes use of a custom data type labeled Person, which possesses three properties: name, age, and gender. Additionally, our Person data type possesses three methods to read those properties. An instance of Person is instantiated using the new keyword and assigned to the variable p. Once assigned to p, the three properties are supplied with valid values. Using the built-in instanceof operator, we determine whether our instance, p, is of the Person data type, which we soon learn it is. However, once we serialize our p instance, and parse it back into that of a JavaScript object, we soon discover via instanceof that our p instance no longer possesses the Person data type.

Listing 6-18. Custom Classes Are Serialized As an Ordinary object

function Person(){
    this.name;
    this.age;
    this.gender;
}
Person.prototype.getName=function(){
    return this.name;
};
Person.prototype.getAge=function(){
    return this.age;
};
Person.prototype.getGender=function(){
    return this.gender;
};

var p=new Person();
    p.name="ben";
    p.age="36";
    p.gender="male";

console.log(p instanceof Person); // true
var serializedPerson=JSON.stringify(p);

var parsedJSON = JSON.parse( serializedPerson );
console.log(parsedJSON instanceof Person); // false;

Because the reviver function is invoked after a JSON text is converted back into JavaScript form, the reviver can be used for JavaScript alterations. This means that you can use it as a prepping station for the final object to be returned. What this means for us is that, using the reviver function, we can cleverly apply inheritance back to objects that we know are intended to be of a distinct data type. Let’s revisit the preceding code in Listing 6-19, only this time, with the knowledge that our parsed object is intended to become a Person.

Listing 6-19. Reviving an object’s Custom Data Type with the reviver Function

function Person(){
    this.name;
    this.age;
    this.gender;
};

Person.prototype.getName=function(){
    return this.name;
};
Person.prototype.getAge=function(){
    return this.age;
};
Person.prototype.getGender=function(){
    return this.gender;
};
//instantiate new Person
var p=new Person();
    p.name="ben";
    p.age="36";
    p.gender="male";

//test that p possesses the Person Data Type
console.log(p instanceof Person); // true

var serializedPerson=JSON.stringify(p);

var reviver = function(k,v){
// if the key is an empty string we know its our top level object
    if(k===""){
        //set objects inheritance chain to that of a Person instance
        v.__proto__ = new Person();
    }
    return v;
}

var parsedJSON = JSON.parse( serializedPerson , reviver );

//test that parsedJSON possesses the Person Data Type
console.log(parsedJSON instanceof Person); // true
console.log(parsedJSON.getName()); // "Ben"

The __proto__ property used in the preceding example forges the hierarchical relationship between two objects and informs JavaScript where to further look for properties when local values are unable to be found. The __proto__ was originally implemented by Mozilla and has slowly become adopted by other modern browsers. Currently, it is only available in Internet Explorer version 11 and, therefore, shouldn’t be used in daily applications. This demonstration is intended for illustrative purposes, to demonstrate succinctly how the reviver function offers you the ability to be as clever as you wish, in order to get the parsed values to conform to your application’s requirements.

Summary

JSON.parse is the available mechanism for converting JSON text into a JavaScript value. As part of the JSON global object, it is available in modern browsers as well as older browsers, by way of including the json2.js library into your application. In order to convert the literals captured, json2.js relies on the built-in global eval function to access the JavaScript interpreter. While you learned that using the eval function is highly insecure, the JSON Object seeks out non-matching patterns of the JSON grammar throughout the supplied text, which minimizes the risk of inviting possibly malicious code into your application. If the parse method uncovers any tokens that seek to instantiate, mutate, or operate, a parse error is thrown. In addition, the parse method is exited, preventing the JSON text from being supplied to the eval function.

If the supplied text to the parse method is deemed suitable for eval, the captured literals will be interpreted by the engine and transformed into JavaScript values. However, not all objects, such as dates or custom classes, can be transformed natively. Therefore, parse can take an optional function that can be used to manually alter JavaScript values, as required by your application.

When you design the replacer, toJSON, and reviver functions, using clearly defined label identifiers will allow your application the ability to better orchestrate the revival of serialized data.

Key Points from This Chapter

  • JSON.parse throws a parse error if the supplied JSON text is not valid JSON grammar.
  • parse occurs in four stages.
  • eval is an insecure function.
  • Supply only valid JSON to eval.
  • A reviver function can return any valid JavaScript value.
  • If the reviver function returns the argument supplied for parameter v, the existing member remains unchanged.
  • If reviver returns undefined as the new value for a member, said member is deleted.
  • reviver manipulates JavaScript values, not JSON grammar.

_________________

1Wikipedia, “Tree (data structure),” http://en.wikipedia.org/wiki/Tree_%28data_structure%29#Terminology, modified January 2015.

2ECMA International, ECMAScript Language Specification, Standard ECMA-262, Edition 5.1, Section 7.3, www.ecma-international.org/ecma-262/5.1/#sec-7.3, June 2011.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.253.31