Semi-structured data models

In Chapter 3, Defining Data Models, we learned about different types of data, including structured, semi-structured, and unstructured. In this section, we are going to discuss semi-structured data in more detail. The World Wide Web (WWW) is the largest information source today. If we had to classify the data model behind the web, we would call it a semi-structured data model. Most semi-structured data is organized as a tree.

Let's take the example of a web page:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Page Title</title>
  </head>
  <body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph.</p>
    <ul>
      <li>List Item 1</li>
      <li>List Item 2</li>
      <li>List Item 3</li>
    </ul>
    <footer><center>Copyright: Hands-on exercise</center></footer>
  </body>
</html>

Here, an HTML document must be wrapped inside the <html> tag, and all the visible content goes inside the <body> tag. A browser can render the preceding snippet as an HTML page. All the data sits between the opening <html> tag and the closing </html> tag. Similarly, the body, the head, and the list each have a begin tag and an end tag. The second thing to notice is that, unlike in a relational structure, elements can repeat: there can be multiple list items and multiple paragraphs, and different documents can contain different numbers of them. This means that while the data object has some structure, that structure is flexible. This is the hallmark of a semi-structured data model.
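To make the idea of flexible structure concrete, the following is a minimal sketch that parses a trimmed copy of the preceding page with Python's standard html.parser module and counts how often each tag occurs. The names PAGE, TagCounter, and count_tags are made up for this illustration, and the list of void tags is only a small assumed subset.

```python
from collections import Counter
from html.parser import HTMLParser

# A trimmed copy of the page above, used as sample input.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
<ul>
<li>List Item 1</li>
<li>List Item 2</li>
<li>List Item 3</li>
</ul>
</body>
</html>"""


class TagCounter(HTMLParser):
    """Counts opening tags and tracks the deepest nesting level seen."""

    # Tags that never have a closing tag (assumed subset, enough for this page).
    VOID_TAGS = {"meta", "br", "img", "link", "hr", "input"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1
        if tag not in self.VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS:
            self.depth -= 1


def count_tags(html_text):
    """Return (tag counts, maximum nesting depth) for an HTML string."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts, parser.max_depth


counts, max_depth = count_tags(PAGE)
print(counts["li"], counts["p"], max_depth)
```

Running the same function on a different page would typically give different counts for li and p, which is exactly the flexibility that a fixed relational schema does not allow.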

eXtensible Markup Language (XML) is another well-known standard for representing data. XML can be seen as a generalization of HTML in which the element names, that is, the begin and end markers within the angle brackets, are chosen by the author rather than fixed in advance. Let's take an example of an XML document:

<?xml version="1.0" encoding="UTF-8"?>
<depression_patients>
  <patient>
    <name>John Doe</name>
    <bill>$1115.95</bill>
    <session>2</session>
    <level>6</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
      <use value="work"/>
      <rank value="1"/>
    </telecom>
  </patient>
  <patient>
    <name>Ola Nordmann</name>
    <bill>$7000.95</bill>
    <session>3</session>
    <level>9</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
      <use value="work"/>
      <rank value="1"/>
    </telecom>
  </patient>
  <patient>
    <name>Gummy Bear</name>
    <bill>$43.95</bill>
    <session>4</session>
    <level>90</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
      <use value="work"/>
      <rank value="1"/>
    </telecom>
  </patient>
  <patient>
    <name>Reshika Adhikari</name>
    <bill>$4343.50</bill>
    <session>6</session>
    <level>3</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
      <use value="work"/>
      <rank value="1"/>
    </telecom>
  </patient>
  <patient>
    <name>Yoshmi Mukhiya</name>
    <bill>$634.95</bill>
    <session>7</session>
    <level>0</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
      <use value="work"/>
      <rank value="1"/>
    </telecom>
  </patient>
</depression_patients>

You can read more about XML files and their purposes at https://www.w3schools.com/XML/xml_whatis.asp
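As an illustration of how such a document can be consumed, here is a minimal sketch that reads two of the patient records with Python's standard xml.etree.ElementTree module. The sample is shortened and omits the XML declaration; XML_TEXT and read_patients are hypothetical names chosen for this example, and converting bill to a number by stripping the dollar sign is an assumption about how the values would be used.

```python
import xml.etree.ElementTree as ET

# Two of the patient records from the XML document above (shortened sample).
XML_TEXT = """<depression_patients>
  <patient>
    <name>John Doe</name>
    <bill>$1115.95</bill>
    <session>2</session>
    <level>6</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
    </telecom>
  </patient>
  <patient>
    <name>Ola Nordmann</name>
    <bill>$7000.95</bill>
    <session>3</session>
    <level>9</level>
    <telecom>
      <system value="phone"/>
      <value value="(03) 5555 6473"/>
    </telecom>
  </patient>
</depression_patients>"""


def read_patients(xml_text):
    """Turn each <patient> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    patients = []
    for node in root.findall("patient"):
        telecom = node.find("telecom")
        patients.append({
            # Child elements carry their data as text content.
            "name": node.findtext("name"),
            "bill": float(node.findtext("bill").lstrip("$")),
            "session": int(node.findtext("session")),
            "level": int(node.findtext("level")),
            # The telecom children carry their data in a "value" attribute.
            "phone": telecom.find("value").get("value"),
        })
    return patients


for patient in read_patients(XML_TEXT):
    print(patient["name"], patient["bill"], patient["phone"])
```

Note that the document mixes two styles: name and bill hold their data as element text, while the telecom children hold it in an attribute. The parser has to handle each style explicitly.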

Another popular format, used by services such as Facebook and Twitter to exchange data, is JavaScript Object Notation (JSON). Let's consider the following example, which represents the same data as the preceding XML snippet:

{
  "depression_patients": {
    "patient": [
      {
        "name": "John Doe",
        "bill": "$1115.95",
        "session": "2",
        "level": "6",
        "telecom": {
          "system": {
            "_value": "phone"
          },
          "value": {
            "_value": "(03) 5555 6473"
          },
          "use": {
            "_value": "work"
          },
          "rank": {
            "_value": "1"
          }
        }
      },
      {
        "name": "Ola Nordmann",
        "bill": "$7000.95",
        "session": "3",
        "level": "9",
        "telecom": {
          "system": {
            "_value": "phone"
          },
          "value": {
            "_value": "(03) 5555 6473"
          },
          "use": {
            "_value": "work"
          },
          "rank": {
            "_value": "1"
          }
        }
      },
      {
        "name": "Gummy Bear",
        "bill": "$43.95",
        "session": "4",
        "level": "90",
        "telecom": {
          "system": {
            "_value": "phone"
          },
          "value": {
            "_value": "(03) 5555 6473"
          },
          "use": {
            "_value": "work"
          },
          "rank": {
            "_value": "1"
          }
        }
      },
      {
        "name": "Reshika Adhikari",
        "bill": "$4343.50",
        "session": "6",
        "level": "3",
        "telecom": {
          "system": {
            "_value": "phone"
          },
          "value": {
            "_value": "(03) 5555 6473"
          },
          "use": {
            "_value": "work"
          },
          "rank": {
            "_value": "1"
          }
        }
      },
      {
        "name": "Yoshmi Mukhiya",
        "bill": "$634.95",
        "session": "7",
        "level": "0",
        "telecom": {
          "system": {
            "_value": "phone"
          },
          "value": {
            "_value": "(03) 5555 6473"
          },
          "use": {
            "_value": "work"
          },
          "rank": {
            "_value": "1"
          }
        }
      }
    ]
  }
}

JSON is plain text, which makes it easy to send to and receive from any server. Hence, it is supported as a data format by many programming languages. In the preceding snippet, we have a similar nested structure: an object contains a list, and that list contains tuples made up of key-value pairs, some of which are nested objects themselves. At the lowest level, the key-value pairs map atomic property names to their values. One way to generalize all these different forms of semi-structured data is to model them as trees:

Figure 5.1: Tree data structure of a simple web page showing semi-structured data
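The tree view also applies to the JSON example. The following is a minimal sketch that parses a shortened copy of the first patient record with Python's standard json module and walks it as a tree, yielding one path per leaf. JSON_TEXT, walk, and the path notation are hypothetical choices made for this illustration.

```python
import json

# A shortened copy of the first patient record from the JSON example.
JSON_TEXT = """{
  "depression_patients": {
    "patient": [
      {
        "name": "John Doe",
        "bill": "$1115.95",
        "session": "2",
        "level": "6",
        "telecom": {
          "system": {"_value": "phone"},
          "use": {"_value": "work"}
        }
      }
    ]
  }
}"""


def walk(node, path="$"):
    """Yield (path, value) for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        # An object is an inner node with one named child per key.
        for key, child in node.items():
            yield from walk(child, f"{path}.{key}")
    elif isinstance(node, list):
        # A list is an inner node with one numbered child per item.
        for index, child in enumerate(node):
            yield from walk(child, f"{path}[{index}]")
    else:
        # Strings, numbers, booleans, and null are the leaves of the tree.
        yield path, node


leaves = dict(walk(json.loads(JSON_TEXT)))
for path, value in leaves.items():
    print(path, "=", value)
```

Each printed path traces the route from the root of the tree down to one atomic value, so objects and lists act as inner nodes and the key-value pairs with atomic values act as leaves.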