Introducing JSON
The JavaScript Object Notation data format, or JSON for short, is derived from the literals of the JavaScript programming language. This makes JSON a subset of the JavaScript language. As a subset, JSON does not possess any additional features that the JavaScript language itself does not already possess. Although JSON is a subset of a programming language, it itself is not a programming language but, in fact, a data interchange format.
JSON is known as the data interchange standard, which subtextually implies that it can be used as the data format wherever the exchange of data occurs. A data exchange can occur between both browser and server and even server to server, for that matter. Of course, these are not the only possible means to exchange JSON, and to leave it at those two would be rather limiting.
JSON is credited to Douglas Crockford. While Crockford admits that he was not the first to have discovered the data format,1 he did provide it with a name and a formalized grammar within RFC 4627. The RFC 4627 formalization, written in 2006, introduced the world to the registered Internet media type application/json and the file extension .json, and defined JSON’s composition. In December 2009, JSON became a built-in aspect of the ECMA-262 standard, 5th edition, and in October 2013, it was officially recognized as an ECMA standard in its own right, ECMA-404.
Controversially, another standards body, the Internet Engineering Task Force (IETF), has also recently published its own updated JSON standard, RFC 7159, which strives to clean up the original specification. The major difference lies in what qualifies as a valid JSON text: the original RFC 4627 states that a valid JSON text must encompass any valid JSON values within an initial object or an array, whereas RFC 7159, like the ECMA standard, allows a valid JSON text to appear in the form of any recognized JSON value. You will learn more about the valid JSON values when we explore the structure of JSON.
It is important to remember, as we get further into the structure of JSON, that as a subset of JavaScript, it remains subject to the same set of governing rules defined by the ECMA-262 standardization. You can feel free to read about the latest specification at the following URL: www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf. At the time of writing, the current edition of the ECMA-262 standard is 5.1; however, 6 is just around the corner.
Note While edition 5.1 is today’s current standard, at the time of JSON’s formalization, the ECMA-262 standard was only in edition 3.
Crockford documented JSON’s grammar on http://json.org in 2001, and soon word began to spread that there was an alternative to the XML data format. With the widespread adoption of Ajax (Asynchronous JavaScript and XML), JSON’s popularity began to soar, as people began to note its ease of implementation and how it rivaled that of XML. You would think that Ajax would have enforced the adoption of XML, as the x within the acronym strictly refers to XML. However, being modeled after SGML, a document format, XML possesses qualities that make it very verbose, which is not ideal for data transmission. One of the reasons JSON has become the de facto data format of the Web, as you will shortly see in the upcoming section, is its grammatical simplicity, which allows JSON to be highly interoperable.
JSON Grammar
JSON, in a nutshell, is a textual representation defined by a small set of governing rules in which data is structured. The JSON specification states that data can be structured in either of the two following compositions:
1. A collection of name/value pairs
2. An ordered list of values
As the origins of JSON stem from the ECMAScript standardization, the implementations of the two structures are represented in the forms of the object and array. Crockford outlines the two structural representations of JSON through a series of syntax diagrams. As I am sure you will agree, these diagrams resemble train tracks from a bird’s-eye view and thus are also referred to as railroad diagrams. Figure 4-1 illustrates the grammatical representation for a collection of string/value pairs.
Figure 4-1. Syntax diagram of a string/value pair collection
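As the diagram outlines, a collection begins with the use of the opening brace ({), and ends with the use of the closing brace (}). The content of the collection can be composed of any of the following three designated paths:
1. An empty collection, containing no string/value pairs
2. A single string/value pair
3. Multiple string/value pairs, each separated from the next by a comma (,)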
Note String/value is equivalent to key/value pairs, with the exception that said keys must be provided as strings.
An example of each railroad path for a collection of string/value can be viewed within Listing 4-1. The structural characters that identify a valid JSON collection of name/value pairs have been provided emphasis.
Listing 4-1. Examples of Valid Representations of a Collection of Key/Value Pairs, per JSON Grammar
//Empty Collection Set
{};
//Single string/value pair
{"abc":"123"};
//Multiple string/value pairs
{"captainsLog":"starDate 9522.6","message":"I've never trusted Klingons, and I never will."};
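As a rough sketch of how the three paths behave in practice, the JSON texts from Listing 4-1 can be handed, as strings, to JavaScript’s built-in JSON.parse method. The trailing semicolons are left out, as they are not part of the JSON grammar. The variable names are illustrative assumptions and do not come from the listing.

```javascript
// Each string holds one JSON text; JSON.parse turns it into a JavaScript object.
var emptyCollection = JSON.parse('{}');
var singlePair = JSON.parse('{"abc":"123"}');
var multiplePairs = JSON.parse(
  '{"captainsLog":"starDate 9522.6","message":"I\'ve never trusted Klingons, and I never will."}'
);

console.log(Object.keys(emptyCollection).length); // 0
console.log(singlePair.abc);                      // "123"
console.log(Object.keys(multiplePairs).length);   // 2
console.log(multiplePairs.captainsLog);           // "starDate 9522.6"
```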
Figure 4-2 illustrates the grammatical representation for that of an ordered list of values. Here we can witness that an ordered list begins with the use of the open bracket ([) and ends with the use of the close bracket (]).
Figure 4-2. Syntax diagram of an ordered list
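The values that can be held within each index are outlined by the following three “railroad” paths:
1. An empty list, containing no values
2. A single value
3. Multiple values, each separated from the next by a comma (,)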
An example of each railroad path for the ordered list can be viewed within Listing 4-2. The structural tokens that identify a valid JSON ordered list have been emphasized.
Listing 4-2. Examples of Valid Representations of an Ordered List, per JSON Grammar
//Empty Ordered List
[];
//Ordered List of a single value
["abc"];
//Ordered List of multiple values
["0",1,2,3,4,100];
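The same kind of check can be sketched for the ordered lists of Listing 4-2. Again, the built-in JSON.parse method is used, and the variable names are assumptions made for illustration. Note that the position and the type of each value survive the round trip.

```javascript
// Each string holds one JSON text describing an ordered list.
var emptyList = JSON.parse('[]');
var singleValue = JSON.parse('["abc"]');
var multipleValues = JSON.parse('["0",1,2,3,4,100]');

console.log(emptyList.length);          // 0
console.log(singleValue[0]);            // "abc"
console.log(typeof multipleValues[0]);  // "string"
console.log(typeof multipleValues[1]);  // "number"
console.log(multipleValues[5]);         // 100
```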
You may have found yourself wondering how it came to be that the characters [, ], {, and } represent an array and an object, as illustrated in Listing 4-1 and Listing 4-2. The answer is quite simple. These come directly from the JavaScript language itself. These characters represent the Object and Array quite literally.
As was stated in Chapter 2, both an object and an array can be created in one of two distinct fashions. The first invokes the creation of either, through the use of the constructor function defined by the built-in data type we wish to create. This style of object invocation can be seen in Listing 4-3.
Listing 4-3. Using the new Keyword to Instantiate an object and array
var objectInstantiation = new Object(); //invoking the constructor returns a new Object
var arrayInstantiation = new Array(); //invoking the constructor returns a new Array
The alternative manner, which we can use to create either object or array, is by literally defining the composition of either, as demonstrated in Listing 4-4.
Listing 4-4. Creation of an object and an array via Literal Notation
var objectInstantiation = {}; //creation of an empty object
var arrayInstantiation = []; //creation of an empty array
Listing 4-4 demonstrates how to create both an array and an object, explicitly using JavaScript’s literal notation. However, both instances are absent of any values. While it is perfectly acceptable for an array or object to exist without content, it will be more likely that we will be working with ones that possess values.
Because object literals can be used to design the composition of objects within source code, they can also be provisioned with properties as they are authored. Listing 4-5 should begin to resemble the syntax diagrams we just reviewed.
Listing 4-5. Designing an object and array via Literal Notation with the Provision of Properties
var objectInstantiation = {name:"ben",age:36};
var arrayInstantiation = ["ben",36];
Note While Listing 4-4 and Listing 4-5 illustrate the creation of objects through the use of literals, JSON uses literals to capture the composition of data.
The JSON data format expresses both objects and arrays in the form of their literals. In fact, JSON uses literals to capture every value that it supports. A value such as the Date object cannot be captured directly, as it lacks a literal form.
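To make the relationship between literals and JSON text concrete, the following sketch passes the values from Listing 4-5 to the built-in JSON.stringify method. The variable names are illustrative assumptions. It also shows what becomes of a Date, which has no literal form: it is written out as a string and comes back as a string.

```javascript
var person = {name: "ben", age: 36};
var list = ["ben", 36];

// The resulting JSON text mirrors the literal, with every key wrapped in double quotes.
var personText = JSON.stringify(person);
var listText = JSON.stringify(list);
console.log(personText); // {"name":"ben","age":36}
console.log(listText);   // ["ben",36]

// A Date is serialized as an ISO 8601 string rather than as a date literal.
var dateText = JSON.stringify(new Date(0));
console.log(dateText);                    // "1970-01-01T00:00:00.000Z"
console.log(typeof JSON.parse(dateText)); // "string"
```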
What you may not have noticed, due to its subtlety, is that JavaScript object literals do not require their key identifiers to be explicitly defined as strings. Take, for example, the literal declaration of {name:"ben", age:36}; from Listing 4-5. It could have equally been declared as {"name":"ben", age:36};. Both declarations will create the same object, allowing our program to reference the same name property equally. Consider the code within Listing 4-6.
Listing 4-6. Object Keys Can Be Defined Explicitly or Implicitly As Strings
var objectInstantiationA = {name:"ben",age:36};
var objectInstantiationB = {"name":"ben",age:36};
console.log( objectInstantiationA.name ); // "ben"
console.log( objectInstantiationB.name ); // "ben"
The preceding example works because, behind the scenes, JavaScript turns every key identifier into a string. That said, it is imperative that the key of every string/value pair be wrapped in double quotes to be considered valid JSON. This is due to the many reserved keywords in JSON’s superset and the fact that the ECMAScript 3 grammar does not allow reserved words (such as true and false) to be used as a key identifier or to the right of the period in a member expression.2 Listing 4-7 demonstrates the first JSON text used to interchange data.3
Listing 4-7. The Very First JSON Message Used by Douglas Crockford
var firstJSON = {to:"session",do:"test","message":"Hello World"}; //Syntax Error in ECMA 3
However, this JSON text produced an error instantly, due to the use of the reserved keyword do as the property name of a string/value pair. Rather than outlining all words that would then cause such syntax errors, Crockford found it simpler to formalize that all property names must be explicitly expressed as strings.
Note If you were to run the exact preceding code expecting to arrive at a syntax error, you would likely be confused as to why none is thrown. ECMAScript, 5th edition, allows reserved words to be used as property names, both within object literals and with dot notation. However, the JSON specification continues to account for legacy engines.
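The quoting rule can be observed directly with the built-in JSON.parse method. The sketch below relies on a small helper, parses, which is a hypothetical name introduced here for illustration. It shows that an unquoted key is rejected as JSON, while a quoted key is accepted even when it spells a reserved word such as do.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parses('{name:"ben"}'));   // false: the key is not a string
console.log(parses('{"name":"ben"}')); // true

// Once quoted, a reserved word is just another string.
var first = JSON.parse('{"to":"session","do":"test","message":"Hello World"}');
console.log(first["do"]); // "test"
```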
JSON Values
As mentioned earlier, JSON is a subset of JavaScript and does not add anything that the JavaScript language does not possess. So, naturally, the values that can be utilized within our JSON structures are represented by types, as outlined within the 3rd edition of the ECMA standard. JSON makes use of four primitive types and two structured types.
The next figure in succession, Figure 4-3, defines the possible values that can be substituted where the term value appears in Figures 4-1 and 4-2. A JSON value can only be a string, a number, an object, an array, true, false, or null. The latter three must remain lowercased, lest you invoke a parsing error. While Figure 4-3 does not clearly demonstrate it, all JSON values can be preceded and succeeded by whitespace, which greatly assists in the readability of the language.
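A brief sketch of these rules, using the built-in JSON.parse method and a hypothetical helper named parses, follows. It checks that the three literal names are accepted only in lowercase and that surrounding whitespace is tolerated.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(JSON.parse('true'));        // true
console.log(JSON.parse('null'));        // null
console.log(JSON.parse(' \n 42 \t '));  // 42: whitespace around the value is ignored

console.log(parses('True'));            // false: must be lowercase
console.log(parses('FALSE'));           // false
console.log(parses('Null'));            // false
```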
Figure 4-3. Syntax diagram illustrating the possible values in JSON
String literals in the JavaScript language can possess any number of Unicode characters enclosed within either single or double quotes. However, it will be important to note, as outlined in Figure 4-4, that a JSON string must always begin and end with the use of double quotes. While Crockford does not justify this, it is for reasons of interoperability. In the C programming language, single quotes identify a single character, such as 'a' or 'z'. A double quote, on the other hand, represents a string literal. While Figure 4-4 appears verbose, there are only four possible paths.
Figure 4-4. Syntax diagram of the JSON string value
Listing 4-8 demonstrates a variety of valid string values.
Listing 4-8. Examples of Valid String Values As Defined by the JSON Grammar
//absent of unicode
"";
//random unicode characters
"∑"; or "⊠";
//use of escaped character to display double quotes;
" \" \" ";
//use of \u denotes a unicode value
"\u22A0"; // outputs ⊠
//a series of valid unicode as defined by the grammar
"\u22A0 \" ∑ ";
A reverse solidus, better known as a backslash (\), is used to demarcate characters as having an alternate meaning. Without the use of the \, the lexer might interpret as a token what is intended to be used as a string, or vice versa. Escaping characters offers us the ability to inform the lexer to handle a character in a manner that is different from its “normal” behavior. Table 4-1 illustrates the use of the escaped literals for the prohibited characters.
Table 4-1. Escaped Literals
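The effect of escaping can be sketched with the built-in JSON.parse and JSON.stringify methods. The helper parses and the variable names are hypothetical and serve only this illustration. Bear in mind that JavaScript string literals apply their own escaping, so a backslash intended for the JSON text is written twice in the source.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

var quoteMark = JSON.parse('"\\""');          // JSON text: "\""
var squaredTimes = JSON.parse('"\\u22A0"');   // JSON text: "\u22A0"
var twoLines = JSON.parse('"line1\\nline2"'); // JSON text: "line1\nline2"

console.log(quoteMark === '"');               // true
console.log(squaredTimes === '\u22A0');       // true
console.log(twoLines.split('\n').length);     // 2

// JSON.stringify applies the escapes when producing JSON text.
console.log(JSON.stringify('say "hi"'));      // "say \"hi\""

console.log(parses("'abc'"));                 // false: single quotes do not delimit a JSON string
console.log(parses('"a\tb"'));                // false: a raw tab inside a string must be escaped
```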
The last value to discuss is that of the number. A number in JSON is the arrangement of base10 literals, in combination with mathematical notation to define a real number literal. Figure 4-5 addresses the syntactical grammar of the JSON number in great detail; however, it’s rather simple when we view it step-by-step.
Figure 4-5. Syntax diagram of a JSON number
The first thing to note is that the numbers grammar does not begin or end with any particular symbolic representation, as our earlier object, array, and string examples did.
As illustrated in Figure 4-5, a JSON number must adhere to the following rules:
4.1. Made up of a singular base10 numerical literal at the 10s placement
4.2. Made up of any base10 numerical literal per placement beyond the decimal
5.1. E notation can be expressed in the form of an uppercase “E” or lowercase “e”
5.2. Immediately followed by a signed sequence of 1 or more base10 numeric literals (0-9)
Listing 4-9 contrasts valid and invalid numerical values as defined by the JSON grammar.
Listing 4-9. Valid and Invalid Numerical Values
-0.01 //valid use of 0's
00.1 //invalid: superfluous 0 produces a SyntaxError
1/3 //invalid: fraction form is not part of the grammar
0.3333333333333333 //valid: decimal form, which requires a leading digit
1.2e-1 //valid: scientific notation
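One way to test a candidate number against the grammar is to hand it to the built-in JSON.parse method and observe whether a SyntaxError is thrown. The sketch below does so with a hypothetical helper named parses. The sample values are chosen for illustration and go slightly beyond those shown in the listing.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

var accepted = ['-0.01', '0', '0.3333333333333333', '1.2e-1', '1E+2'];
var rejected = ['00.1', '.3333333333333333', '1/3', '+1', '1.', '0x10'];

accepted.forEach(function (text) {
  console.log(text, parses(text)); // true for every entry
});
rejected.forEach(function (text) {
  console.log(text, parses(text)); // false for every entry
});

console.log(JSON.parse('1.2e-1')); // 0.12
```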
Any of the values discussed in this chapter can be used in any combination when contained within a composite structure. Listing 4-10 illustrates how they can be mixed and matched. What is necessary is that the JSON grammar covered is followed. The examples in Listing 4-10 demonstrate proper adherence of the JSON grammar to portray data.
Listing 4-10. Examples of JSON Text Containing a Variety of Valid JSON Values
// JSON text of an array with primitives
[
null, true, 8
]
// JSON text of an object with two members
{
"first": "Ben",
"last": "Smith"
}
// JSON text of an array with nested composites
[
{ "abc": "123" },
[ "0", 1, 2, 3, 4, 100 ]
]
//JSON text of an object with nested composites
{
"object": {
"array": [true]
}
}
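As a small sketch, the nested examples can be parsed with the built-in JSON.parse method and then navigated like any other JavaScript value. The variable names are assumptions made for illustration. The final check shows that a comma after the last member of an object is not permitted by the grammar.

```javascript
var nestedObject = JSON.parse('{ "object": { "array": [true] } }');
console.log(nestedObject.object.array[0]); // true

var nestedArray = JSON.parse('[ { "abc": "123" }, [ "0", 1, 2, 3, 4, 100 ] ]');
console.log(nestedArray[0].abc); // "123"
console.log(nestedArray[1][5]);  // 100

// A trailing comma after the last member is rejected.
var trailingCommaError = null;
try {
  JSON.parse('{ "first": "Ben", "last": "Smith", }');
} catch (e) {
  trailingCommaError = e.name;
}
console.log(trailingCommaError); // "SyntaxError"
```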
JSON Tokens
While the Object and Array are conventions used in JavaScript, JavaScript, like many programming languages, borrowed from the C language in one form or another. While not every language explicitly implements Arrays and Objects akin to JavaScript, they do often possess the means to model collections of key/value pairs and ordered lists. These may take on the form of Hash maps, dictionaries, Hash tables, vectors, collections, and lists. Furthermore, most languages will be capable of working with text, which is precisely what JSON is based on.
At the end of the day, JSON is nothing more than a sequence of Unicode characters. However, the JSON grammar standardizes which Unicode characters or “tokens” define valid JSON, in addition to demarcating the values contained within.
Therefore, when regarding the interchange of JSON and the many languages that do not natively possess Objects and Arrays, the tokens that make up the JSON text are all that is required to interpret if any collections or ordered lists exist and apply all values in a manner required of that language. This is accomplished with six structural characters, as listed in Table 4-2.
Table 4-2. Six Structural Character Tokens
One point to note is that JSON will ignore all insignificant whitespace before or after the preceding six structural tokens. Table 4-3 illustrates the four whitespace character tokens.
Table 4-3. Four Whitespace Character Tokens
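The treatment of whitespace can be sketched with the built-in JSON.parse method. In the example, spaces, horizontal tabs, line feeds, and carriage returns are scattered around the structural tokens without affecting the result, whereas a no-break space (U+00A0), which is not one of the four recognized whitespace characters, causes the text to be rejected. The helper parses and the variable names are hypothetical.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Space, tab (\t), line feed (\n), and carriage return (\r) around the structural tokens
var padded = ' \t\r\n{ "a" :\n[ 1 ,\t2 ] }\r\n';
var data = JSON.parse(padded);
console.log(data.a.length); // 2
console.log(data.a[1]);     // 2

// A no-break space is not insignificant whitespace in JSON.
console.log(parses('\u00A0{}')); // false
```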
Because JSON is nothing more than text, you may find it rather difficult to determine whether your JSON is properly formatted or not. Furthermore, if the syntax is inaccurate to the grammar specified, then you will find that your malformed JSON causes code to come to a halt. This would be due to the syntax error that would be uncovered at the time of trying to parse said JSON. You will learn about parsing in Chapter 6.
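A minimal sketch of guarding against malformed JSON follows, assuming the built-in JSON.parse method is the parser in use. The function name validate and the shape of the object it returns are hypothetical choices made for illustration. Wrapping the call in try/catch lets a program report the problem rather than come to a halt.

```javascript
// Hypothetical helper: wraps JSON.parse so that malformed text yields a report.
function validate(text) {
  try {
    return { valid: true, value: JSON.parse(text) };
  } catch (error) {
    // JSON.parse signals malformed input by throwing a SyntaxError
    return { valid: false, reason: error.name + ": " + error.message };
  }
}

var good = validate('{"first":"Ben","last":"Smith"}');
var bad = validate('{"first":"Ben" "last":"Smith"}'); // missing comma between members

console.log(good.valid, good.value.last);
console.log(bad.valid, bad.reason); // the exact message varies by JavaScript engine
```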
For this reason, any attempt to devise JSON by hand should be performed with the aid of an editor. The following JSON editors understand the JSON grammar and are able to offer some much-needed and immediate validation.
The first editor, http://jsoneditoronline.org/, adheres to the ECMA-262 standardization and, therefore, allows your JSON text to represent a singular primitive value, whereas the latter follows the original RFC 4627 formalization, thus requiring a JSON text to represent a structural value, i.e., an array or object literal. It should be made known that the two editors mentioned previously are not the only two in existence. There are many online and offline editors, each with its own nuances. I favor the two mentioned, for their convenience.
Summary
In this chapter, I covered the history of JSON and the specifications of the JSON data format that define the grammar of a valid JSON text. You learned that JSON is a highly interoperable format for data interchange. This is achieved via the standardization of a simple grammar, which any language can implement simply by understanding that grammar.
As was demonstrated in this chapter, we can use the JSON grammar in conjunction with predetermined data to create JSON. Because we are simply working with text, it will be helpful to rely on an editor that understands JSON’s grammar, for validation purposes. However, JSON can be written with a basic text editor and saved as a JSON document, using the file extension .json. Furthermore, as a subset of JavaScript, JSON can even be hard-coded within a JavaScript file directly. Both methods are ideal for devising configuration files for an application.
The next chapter will reveal how we can use the JavaScript language to produce JSON at runtime.
Key Points from This Chapter
_________________
1http://yuiblog.com/yuitheater/crockford-json.m4v.
2Allen Wirfs-Brock, “ES 3.1 ‘true’ as absolute or relative?” https://mail.mozilla.org/pipermail/es-discuss/2009-April/009119.html, April 9, 2009.