Chapter 2. Introducing WebAssembly

WebAssembly is a portable binary code format that is delivered to the browser as modules using the .wasm file extension. It also defines a corresponding textual assembly language, called WebAssembly Text Format, which uses the .wat (WAT) extension. There is a close correspondence between the binary and textual formats, with the latter simply making it easier for humans to read .wasm modules.

Although it is relatively straightforward to write .wasm modules directly using this text format, it is primarily designed as a compilation target. The first demonstrations of this technology used the Emscripten compiler, allowing compilation of C/C++ to WebAssembly.

This chapter explores the structure of WebAssembly modules, through a few simple examples. Following this, we take a look at the WebAssembly runtime and how it integrates with its host environment. Finally, we look at the various languages that you can use to write applications in WebAssembly.

WebAssembly Modules

WebAssembly applications are written using a variety of different programming languages (C, C++, Rust, Python), each with their respective toolchains, with the ultimate output of the compilation process being one or more WebAssembly modules. These modules are delivered to the browser over HTTP as .wasm files. This section takes a look at this compilation process via a simple example and the modules it creates. The example itself will be quite trivial; it is the tools and processes that are of most interest here.

The following is a small C function that raises a floating-point number to a given power by simple iteration:

float power(float number, int pow) {
 float res = number;
 for (int i = 0;i < pow - 1; i++) {
   res = res * number;
 }
 return res;
}

For example, executing power(2, 8) returns the number 256.
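To make the iteration count concrete, here is a rough JavaScript port of the same function. It is only a sketch for experimentation, not part of the original toolchain: Math.fround is used on the assumption that rounding to 32-bit precision after each step mimics C's float arithmetic closely enough for this example.

```javascript
// Rough JavaScript port of the C power() function above (illustrative only).
function power(number, pow) {
  // Math.fround rounds to the nearest 32-bit float, standing in for C's float type.
  const base = Math.fround(number);
  let res = base;
  // The loop body runs pow - 1 times, exactly as in the C version.
  for (let i = 0; i < pow - 1; i++) {
    res = Math.fround(res * base);
  }
  return res;
}

console.log(power(2, 8)); // 256
console.log(power(2, 1)); // 2
console.log(power(2, 0)); // 2, because the loop never runs
```

Note that for any exponent of 1 or less the loop is skipped and the input is returned unchanged, so the function is only meaningful for positive exponents.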

This function, when saved to a file power.c, can be compiled to WebAssembly via Emscripten using the following command:1

$ emcc power.c -Os -s SIDE_MODULE=1 -o power.wasm

This outputs a WebAssembly file, power.wasm, which is a compact binary format:

$ xxd power.wasm
0000: 0061 736d 0100 0000 0107 0160 027d 7f01  .asm.......`.}..
0010: 7d03 0201 0007 0a01 065f 706f 7765 7200  }........_power.
0020: 000a 3b01 3902 017f 017d 2001 417f 6a21  ..;.9....} .A.j!
0030: 0220 0141 014a 0440 4100 2101 2000 2103  . .A.J.@A.!. .!.
0040: 037d 2003 2000 9421 0320 0220 0141 016a  .} . ..!. . .A.j
0050: 2201 470d 0020 030b 2100 0b20 000b       ".G.. ..!.. ..
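The first eight bytes of the dump are the module header. As a small sketch (the variable names are ours, chosen for illustration), they can be decoded in JavaScript like this:

```javascript
// First eight bytes of power.wasm, copied from the xxd dump above.
const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Bytes 0-3 are the magic number: a NUL byte followed by the ASCII text "asm".
const magic = String.fromCharCode(...header.subarray(1, 4));

// Bytes 4-7 hold the binary format version as a little-endian 32-bit integer.
const version = new DataView(header.buffer).getUint32(4, true);

console.log(magic, version); // asm 1
```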

Modules in their binary format are quite compact but not terribly readable. You can convert from the binary format into the textual representation using the wasm2wat tool from the WebAssembly Binary Toolkit:

$ wasm2wat power.wasm -o power.wat

This yields the following output:

(module
  (type (;0;) (func (param f32 i32) (result f32)))
  (func (;0;) (type 0) (param f32 i32) (result f32)
    (local i32 f32)
    local.get 1
    i32.const -1
    i32.add
    local.set 2
    local.get 1
    i32.const 1
    i32.gt_s
    if  ;; label = @1
      i32.const 0
      local.set 1
      local.get 0
      local.set 3
      loop (result f32)  ;; label = @2
        local.get 3
        local.get 0
        f32.mul
        local.set 3
        local.get 2
        local.get 1
        i32.const 1
        i32.add
        local.tee 1
        i32.ne
        br_if 0 (;@2;)
        local.get 3
      end
      local.set 0
    end
    local.get 0)
  (export "_power" (func 0)))

We don’t delve into too much detail regarding the WebAssembly instruction set, memory model, and inner workings of the runtime—if you’re interested, the WebAssembly Specification is surprisingly readable. However, this simple example does provide enough context to make a few interesting observations that will deepen your understanding of WebAssembly (this is quite important when it comes to understanding some of the limitations of this technology).

WebAssembly code is distributed as separate modules, each of which contains a number of sections that are defined in a specific order. In the preceding “power” example, you can see a few of these sections in action.

The export section defines functions (among other things) that are exported by the module, allowing them to be invoked by the host environment. In this example, the power function is exported under the name _power. Later, we see how the host can invoke it.

The main body of the module is the code section, which comprises one or more functions. These are much like functions in any other language in that they take input parameters, perform computations, and then return a single result. Functions can also have local variables that are scoped to the function. The instructions that make up the function body are similar to those that you find in other assembly languages; for example, i32.add adds one 32-bit integer to another.
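The section layout can be read straight out of the binary. The following sketch walks the bytes of power.wasm, transcribed from the hex dump earlier in this chapter, and lists each section's name and size. The listSections helper is hypothetical, written for this illustration; the section ID numbering follows the WebAssembly binary format.

```javascript
// The 94 bytes of power.wasm, transcribed from the xxd dump earlier in the chapter.
const hex =
  "0061736d0100000001070160027d7f01" +
  "7d03020100070a01065f706f77657200" +
  "000a3b013902017f017d2001417f6a21" +
  "02200141014a04404100210120002103" +
  "037d200320009421032002200141016a" +
  "2201470d0020030b21000b20000b";
const bytes = Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));

// Section IDs as defined by the WebAssembly binary format.
const SECTION_NAMES = [
  "custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data",
];

// Hypothetical helper: returns the name and byte size of every section.
function listSections(moduleBytes) {
  const sections = [];
  let offset = 8; // skip the 4-byte magic number and 4-byte version
  while (offset < moduleBytes.length) {
    const id = moduleBytes[offset++];
    // The section size is an unsigned LEB128 integer (7 payload bits per byte).
    let size = 0;
    let shift = 0;
    let byte;
    do {
      byte = moduleBytes[offset++];
      size |= (byte & 0x7f) << shift;
      shift += 7;
    } while (byte & 0x80);
    sections.push({ name: SECTION_NAMES[id], size });
    offset += size; // jump over the section body
  }
  return sections;
}

const sections = listSections(bytes);
console.log(sections);
```

For this module the walk reports four sections in order: type (7 bytes), function (2 bytes), export (10 bytes), and code (59 bytes).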

WebAssembly has support for four types: two integer types (32- and 64-bit) and two floating-point types (again, 32- and 64-bit). The 64-bit integer type is quite notable; JavaScript’s standard number type is a 64-bit floating-point value, which cannot represent every 64-bit integer exactly, and this has limited its use for certain types of mathematical computation.
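The precision limit of JavaScript numbers is easy to observe. The sketch below shows where integer arithmetic on ordinary numbers stops being exact; BigInt, a later addition to the language, is used only to show what the full signed 64-bit range looks like.

```javascript
// Largest integer that a JavaScript number can represent without gaps: 2^53 - 1.
const limit = Number.MAX_SAFE_INTEGER;

// Beyond that limit, distinct integers collapse onto the same floating-point value.
const collapsed = limit + 1 === limit + 2;
console.log(collapsed); // true

// The largest signed 64-bit integer, far beyond what a number can hold exactly.
const maxI64 = 2n ** 63n - 1n;
console.log(maxI64.toString()); // 9223372036854775807

// BigInt.asIntN(64, ...) wraps around the way a 64-bit integer register would.
const wrapped = BigInt.asIntN(64, maxI64 + 1n);
console.log(wrapped.toString()); // -9223372036854775808
```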

WebAssembly application state is stored in three different places. The first is the execution stack, where intermediate function state is stored. Instructions (such as i32.add) pop their operands from the stack and push their result back onto it. The second is global variables, which, as the name implies, are accessible across the entire module. These can also be exported to the host. The third is linear memory, which is a contiguous block of memory that instructions can read from and write to. Again, this can be exported to (i.e., shared with) the host. Notably, the WebAssembly stack and linear memory are entirely separate; the two cannot collide with and overwrite each other.
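Globals and linear memory also have host-side counterparts in the JavaScript API. The following sketch creates both from JavaScript, without any module involved, simply to show what they look like to the host; in a real application such objects would typically be imported into, or exported from, a module.

```javascript
// Linear memory is allocated in pages; one WebAssembly page is 64 KiB.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const pageBytes = memory.buffer.byteLength;
console.log(pageBytes); // 65536

// Linear memory is just bytes: write two of them, then read them back as one
// little-endian 16-bit integer.
const view = new Uint8Array(memory.buffer);
view[0] = 0xff;
view[1] = 0x01;
const asU16 = new DataView(memory.buffer).getUint16(0, true);
console.log(asU16); // 511 (0x01ff)

// A mutable 32-bit integer global that the host can read and update.
const counter = new WebAssembly.Global({ value: "i32", mutable: true }, 41);
counter.value += 1;
console.log(counter.value); // 42

// Growing memory by one page. The old ArrayBuffer is detached, so views such
// as `view` above must be re-created from memory.buffer afterwards.
memory.grow(1);
const grownBytes = memory.buffer.byteLength;
console.log(grownBytes); // 131072
```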

Despite WebAssembly having an assembly-like instruction set, it does have some features that are more often found in higher-level languages, such as functions, loops, and local variables. These features make it relatively easy to write WebAssembly by hand, in WAT format, although this is mostly a pursuit for hobbyists, enthusiasts, and compiler authors.

To execute this .wasm module within a browser, it must be downloaded, compiled, and instantiated, as follows:

WebAssembly.instantiateStreaming(fetch("power.wasm"))
  .then(({ instance }) => {
    console.log(instance.exports._power(2, 3));
  });

The WebAssembly namespace provides a number of functions for managing WebAssembly modules. In the preceding example, the instantiateStreaming function is used to both compile and instantiate a module directly from an input stream, in this case provided via the fetch API. The module instance is delivered via a promise; in this example, the promise callback simply executes the exported function.
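The same module can also be compiled and instantiated from bytes that are already in memory, which is convenient for experimenting outside the browser (for example, in Node.js). The sketch below assumes the bytes transcribed from the hex dump earlier in this chapter, and uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors rather than the streaming API. Note that the function is reached through the export name shown in the text listing, _power.

```javascript
// The 94 bytes of power.wasm, transcribed from the xxd dump earlier in the chapter.
const hex =
  "0061736d0100000001070160027d7f01" +
  "7d03020100070a01065f706f77657200" +
  "000a3b013902017f017d2001417f6a21" +
  "02200141014a04404100210120002103" +
  "037d200320009421032002200141016a" +
  "2201470d0020030b21000b20000b";
const bytes = Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));

// Compile the bytes, then inspect what the compiled module exports.
const wasmModule = new WebAssembly.Module(bytes);
const exportList = WebAssembly.Module.exports(wasmModule);
console.log(exportList); // a single entry: name "_power", kind "function"

// This module declares no imports, so it can be instantiated without an import object.
const instance = new WebAssembly.Instance(wasmModule);
const result = instance.exports._power(2, 8);
console.log(result); // 256
```

Browsers tend to restrict synchronous compilation of larger modules on the main thread, which is one reason the promise-based functions are the usual choice on the web.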

This example neatly demonstrates C code execution within the browser via WebAssembly. In practice, this exact same compilation process is used for much more complex algorithms. However, it also reveals some interesting features of WebAssembly that we will return to in a later section, most notably its relationship with the browser host. WebAssembly modules are not downloaded directly via script tags; instead, they must be downloaded using the browser’s fetch (or XHR) APIs. Also, they must be instantiated via the JavaScript WebAssembly APIs, with the exported functions invoked via the JavaScript host.

As shown here, WebAssembly was designed to have a close relationship with JavaScript; it was not designed to be a complete replacement.

WebAssembly Execution and Runtime

So far, we’ve taken a look at the static structure of WebAssembly modules, which are distributed in binary format (.wasm), and inspected their more readable textual format (.wat). We now turn our attention to the runtime that executes these modules.

WebAssembly defines the concept of a “host,” which can be any compatible runtime; however, for the purposes of this section, let’s just consider the browser as the host—hence, we will use the term “browser” as a synonym.

WebAssembly modules are executed within a virtual machine (VM) that is closely coupled to the browser’s JavaScript VM. One of the most notable features of this close relationship is that JavaScript execution “yields” to WebAssembly, and vice versa. In the earlier “power” example, when the exported power function is executed, the JavaScript execution waits until this function returns a value; in other words, function invocation is a synchronous operation.
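This back-and-forth can be observed with a tiny module that calls into the host. The bytes below are a hand-assembled, hypothetical module written for this illustration (it is not produced by any tool discussed in this chapter): it imports a function log from a namespace env and exports a function run, which calls log(42). All three names are assumptions made for the sketch.

```javascript
// Hand-assembled module (hypothetical, for illustration only). In text form:
//   (import "env" "log" (func (param i32)))
//   (func (export "run") i32.const 42 call 0)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic, version 1
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import from "env"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,                         // ... "log", function, type 0
  0x03, 0x02, 0x01, 0x01,                                     // one function of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,       // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

// Record the order in which things happen on each side of the boundary.
const events = [];
const imports = {
  env: { log: (value) => events.push(`wasm called log(${value})`) },
};

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), imports);

events.push("before run");
instance.exports.run(); // JavaScript waits here; the module calls back into log
events.push("after run");

console.log(events);
```

The recorded order is "before run", then "wasm called log(42)", then "after run": each call across the boundary completes before the caller continues.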

When a WebAssembly module is loaded over HTTP, it must be decoded and compiled prior to execution. Even though WebAssembly is an assembly-like language, it must still be compiled to the machine code of the underlying processor. Similar to JavaScript, WebAssembly compilation can be tiered, allowing for progressive runtime optimizations. However, this process is far faster and more optimal than the equivalent with JavaScript, as illustrated in Figure 2-1.

Figure 2-1. WebAssembly versus JavaScript execution

Interoperating with the Host Environment

WebAssembly doesn’t have any built-in I/O capabilities. For example, modules cannot natively open sockets or write to the filesystem. Without assistance from the host, a WebAssembly module can only “compute.” There are a couple of different ways in which a module can communicate data with the host: function exports and shared linear memory.

The “power” example in the previous section demonstrated function exports, allowing them to be invoked by the hosting environment. Although the exports concept makes it easy to expose WebAssembly functions to the host (and similarly import host functions for execution by the module), there is one significant limitation to this approach: WebAssembly’s type system. As previously mentioned, WebAssembly has just four types, all of which are numeric—no arrays, strings, objects, structs, or other more-complex types that you expect to find in higher-level languages.

WebAssembly was designed to be a compilation target for a wide range of modern programming languages, so how does it support compilation of more complex types? The key to this is linear memory.

When languages that support more complex types such as strings, arrays, and structs are compiled to WebAssembly, these data types are stored in the module’s linear memory, using it as a heap (i.e., a portion of memory that is dynamically allocated). The host environment can also read and write to linear memory, allowing exchange of more complex data types with the host.

However, for the host to read these types from linear memory, it must be able to decode them, as illustrated in Figure 2-2, which shows a very simple example in which a string is decoded. The encoding/decoding logic is often termed glue code; it allows rich interoperability between JavaScript and the languages that compile to WebAssembly.

Figure 2-2. Exchange of more-complex data types via linear memory
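The host side of this exchange can be sketched in a few lines. In the example below there is no real module: the writeString function stands in for compiled code that places a UTF-8 string in linear memory and reports its location, and readString is the kind of decoder that glue code provides. Both helper names, and the offset 1024, are assumptions made for illustration.

```javascript
// Linear memory shared between the (simulated) module and the host.
const memory = new WebAssembly.Memory({ initial: 1 });

// Stand-in for the module: write UTF-8 bytes at `pointer` and return the byte length.
function writeString(text, pointer) {
  const encoded = new TextEncoder().encode(text);
  new Uint8Array(memory.buffer, pointer, encoded.length).set(encoded);
  return encoded.length;
}

// Host-side glue: only two numbers cross the boundary (pointer and length);
// the string itself is decoded from the bytes found in linear memory.
function readString(pointer, length) {
  const slice = new Uint8Array(memory.buffer, pointer, length);
  return new TextDecoder("utf-8").decode(slice);
}

const pointer = 1024;
const length = writeString("héllo wasm", pointer);
const roundTripped = readString(pointer, length);

console.log(length);       // 11 (the accented character occupies two bytes)
console.log(roundTripped); // héllo wasm
```

The design point is that the type system only ever carries numbers; agreeing on the encoding (here UTF-8 plus an explicit length) is the responsibility of the glue code on both sides.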

The complexity of glue code required for communication with the host environment is a bit of an issue, requiring the addition of encoders and decoders on both the host and WebAssembly side. This results in additional code that contributes to overall application size and adds a significant performance overhead to communication between WebAssembly and the host.

These issues are being tackled in a couple of ways. First, the community is developing tooling that automates much of this process. For example, Emscripten generates a JavaScript API wrapper, and the Rust community has developed wasm-bindgen, which automates generation of JavaScript bindings. Second, the WebAssembly standards are evolving with a number of proposals looking to tackle this issue and potentially remove the need for any glue code, which we discuss in Chapter 5.

In the meantime, this has shaped the early adoption of WebAssembly, with initial success being found in applications that minimize I/O; for example, cryptography, audio synthesis, and image processing.

WebAssembly Design Goals

So far, we’ve had a look at the static structure of WebAssembly modules, seen how they are instantiated and executed at runtime, and explored how they communicate (interoperate) with the host. In this section, we take a few steps back and explore some of the original design goals of WebAssembly. This will help explain some of the features you have observed already, but, more important, it will give you a better understanding of the strengths (and limitations) of WebAssembly itself.

Minimum Viable Product and Use Cases

The original design goals for WebAssembly were guided by the initial scope: the Minimum Viable Product (MVP) was intended to have roughly the same functionality as asm.js, and was primarily aimed at C/C++ compilation.

A number of use cases were identified and used to shape the initial MVP release. These included image and video editing, Computer Aided Design (CAD), music streaming, encoding/decoding, Augmented Reality (AR)/Virtual Reality (VR), scientific simulations, developer tooling, games, and game-portals. Although this is not an exhaustive list of use cases, there is a common thread running through all of them: these are applications that are computationally intensive.

Finally, it helps to explicitly call out the various classes of website or web app that are not considered a use case for WebAssembly. It was not designed to be used for social media sites, news portals, shopping carts or form-based applications such as CRMs (Customer Relationship Management), for which conventional frameworks like React, Angular, and Vue are still quite appropriate.

Security

The primary target for WebAssembly is the web (the clue is in the name!), and as a result, security considerations were very much at the forefront when designing the language. A detailed exploration of these considerations and their influence on the language and runtime design is beyond the scope of this report. Instead, we cover some of the high-level concepts.

The WebAssembly runtime adheres to the same security policies as its host environment. Within the web this means that it conforms to the same-origin policy; a page cannot load and execute a malicious WebAssembly module from a different origin (as a result of a cross-site scripting [XSS] attack).

WebAssembly modules are sandboxed and, as a result, execution cannot escape the runtime environment. This runtime has no built-in I/O capabilities, which means that the only way a WebAssembly module can interoperate with its host is via exports/imports.

Portability

WebAssembly’s binary format was designed to be efficiently executed on a variety of operating systems and instruction set architectures, resulting in a highly portable runtime. Most notably, it was designed for execution both within the browser and out of the browser. Consequently, it has been integrated into a wide range of runtimes (we examine this in more detail in Chapter 5).

WebAssembly does not specify any APIs or system calls; instead, it simply provides an import mechanism by which the set of available imports is defined by the host environment. This contributes to the versatility of WebAssembly, but on the flip side, it does mean that accessing the host environment currently requires quite a bit of glue code, none of which is standardized.

Performance

In the previous sections, we’ve seen how WebAssembly removes some of the inefficiencies relating to JavaScript execution; the binary distribution and rapid compilation/optimizations result in WebAssembly modules reaching peak execution speed far faster than the equivalent JavaScript code. Here, we dig into some of the other features of WebAssembly that contribute to its overall performance.

The bottleneck for web performance used to be the network: websites with significant amounts of JavaScript were slow to render and become interactive because of the time needed to download the scripts themselves. However, as the internet has become faster (for most of us), the bottleneck is now the CPU, which must parse and compile this JavaScript code. Decoding WebAssembly is much simpler and faster than parsing JavaScript; furthermore, this decode/compile step can be split across multiple threads, and the whole process started while the module is still downloading. This significantly reduces the time it takes to download application code and reach peak execution speed.

So just how fast is WebAssembly when compared to JavaScript? This is a difficult question to answer because it is highly dependent on what you measure and how you measure it. There have been numerous papers and blog posts that have measured the algorithmic performance of WebAssembly, but often these “micro benchmarks” are far from realistic. A more useful indication of performance emerges when looking at WebAssembly within real-world scenarios.

The team at Mozilla has researched the use of WebAssembly to speed up the process of decoding sourcemaps. These provide mappings from one source-code representation to another, allowing developers to set breakpoints in code written in TypeScript, for example. The current sourcemap library is written in JavaScript. By rewriting the library in Rust and compiling to WebAssembly, the team reduced the time from a breakpoint being hit to the code being visible to the developer by a factor of three. Furthermore, the JavaScript code had been highly optimized, resulting in a codebase that was a little convoluted and difficult to maintain. In contrast, the Rust equivalent was clean and idiomatic.

Language Support

WebAssembly is a compilation target that supports a wide range of high-level programming languages. However, as it is a relatively nascent technology, the level of support varies considerably from one language to the next. With the commercial and community interest in WebAssembly, this landscape is changing all the time. In this section, we take a broad look at the current state of language support. For detailed practical guidance around the use of these tools, please refer to their respective websites.

C/C++ with Emscripten

Emscripten is an LLVM-based compiler toolchain that targets both asm.js and WebAssembly. The underlying LLVM architecture supports a wide range of programming languages and target instruction sets via its platform-independent intermediate language, illustrated in Figure 2-3.

Figure 2-3. LLVM compilation via an intermediate language

During the development of asm.js, the Emscripten team created the Fastcomp backend, which compiles LLVM IR into asm.js. The team built on this when developing WebAssembly, by adding an additional compilation step, from asm.js to WebAssembly (using Binaryen), allowing compilation of C/C++ to WebAssembly, as shown in Figure 2-4.

Figure 2-4. WebAssembly compilation with Emscripten

Emscripten is primarily intended for porting existing C/C++ codebases to the web. For that reason, it is much more than just a WebAssembly compiler; it also has comprehensive support for a wide range of existing APIs, including OpenGL (for 3D graphics) and SDL (which provides access to audio, keyboard, and other I/O capabilities). Emscripten also provides direct access to HTML5 APIs, allowing C++ code to interact with the Document Object Model (DOM). These rich features do come at a cost, with Emscripten-generated WebAssembly modules being a little heavyweight and requiring a sizeable JavaScript “wrapper.”

Emscripten is still actively developed, and there is currently work underway to create a WebAssembly-specific backend, which will eliminate the asm.js compilation step. For those who want to use C/C++ code on the web, Emscripten provides quite a mature solution.

Rust

Rust is a relatively new programming language that was developed at Mozilla; it was first announced publicly in 2010, and its first stable release (1.0) followed in 2015. The team’s goals for this new language were to make it easier to create safe, highly performant, and concurrent applications. An interesting feature of Rust is the concept of “ownership,” which is used to achieve compile-time memory safety. This, coupled with language features that make it feel quite modern (when compared to C++), has contributed to Rust being voted the “most loved” programming language in the Stack Overflow annual survey for four years running.

Early efforts to compile Rust to WebAssembly made use of Emscripten; however, more recently the tooling has moved to directly use LLVM, resulting in more lightweight modules (when contrasted with the output of Emscripten). There has been considerable interest in WebAssembly from the Rust community, with a suite of tools emerging over the past couple of years. These include tools for optimizing .wasm binaries, JavaScript toolchain integration (Webpack and Rollup), and package manager integration (npm and cargo).

One of the more notable Rust tools is wasm-bindgen, which facilitates high-level interactions between Rust and JavaScript, tackling the glue-code issue described earlier in this chapter. This tool automatically generates Rust to JavaScript bindings (and vice versa), making it easy to move complex types between the two languages. This project was initially created with the WebIDL bindings in mind (a WebAssembly proposal that has more recently been replaced with Interface Types, as discussed in Chapter 4).

For green-field projects Rust has become the language of choice for many WebAssembly developers.

C# with Blazor

Blazor started life in 2017 as a fun experiment by Steve Sanderson (part of Microsoft’s ASP.NET team) who was looking to find a way to run C# on WebAssembly. He initially took an old (and abandoned) C implementation of the .NET Common Language Runtime (CLR), compiled it to WebAssembly and found he was able to run .NET assemblies (Dynamic-Link Libraries [DLLs]) within the browser. Building on this foundation, he added a user interface (UI) framework based on Razor templates (a markup syntax for HTML and C#).

The Mono project, which provides an open source implementation of the CLR, was also looking to target WebAssembly. As a result, Blazor adopted Mono and in 2018 moved from being a community-led project to an official ASP.NET project, with an “experimental” status. The project has matured quite rapidly, with Microsoft recently announcing that its first full production release will take place in May 2020.

Blazor is designed to be more than just a C#-to-WebAssembly compiler; it is intended to be a complete framework for building web-based applications. This is a notable difference when compared to the C++/Rust toolchains, which focus primarily on language cross-compilation, leaving the community to build the various frameworks that support development on the web (and other hosts).

Blazor has a very interesting runtime model; when you write an app using this framework your C# code is not compiled to WebAssembly. Instead, it is the runtime (the CLR) that is compiled to WebAssembly, allowing DLLs to run directly on the web. This is an interpreted mode of operation, as shown in Figure 2-5.

Figure 2-5. Blazor’s interpreted mode

This approach has the advantage of rapid iteration times; however, the overall payload (the CLR plus your application DLLs) is significant. The Blazor team is also exploring an Ahead-of-Time (AOT) compilation approach in which your application is compiled directly to WebAssembly, as illustrated in Figure 2-6. However, this does still require a lightweight runtime in order to support garbage collection and other CLR functions.

Figure 2-6. Blazor’s Ahead of Time compilation mode

With Blazor becoming a fully supported Microsoft framework in 2020, it is almost certainly going to find widespread adoption.

JavaScript with AssemblyScript

JavaScript has been the “language of the web” for the past 25 years, so it might not be immediately obvious why a JavaScript developer would want to target WebAssembly. However, the benefits that WebAssembly promises, namely predictable performance and reduced parse/load times, are equally desirable to JavaScript developers. We all want our websites and web apps to load more quickly and perform better.

JavaScript is not an easy language to compile to WebAssembly for a few reasons, the most crucial of which is that it is dynamically typed. Incidentally, this dynamic typing is part of the reason why JavaScript runtime optimization (as described in the first chapter) is such a challenge.

AssemblyScript is a subset of TypeScript (a popular superset of JavaScript that adds optional static typing) that has been carefully designed to allow compilation to WebAssembly. AssemblyScript does differ from JavaScript in a number of ways, but it’s close enough to be considered a JavaScript-to-WebAssembly compiler.

AssemblyScript does not go through LLVM; its compiler is built on Binaryen, the WebAssembly compiler toolchain library that was developed alongside Emscripten. The AssemblyScript runtime performs garbage collection through reference counting. It also provides a JavaScript-like standard library, which includes strings, arrays, and a (limited) date API.

AssemblyScript is a nascent project, and as a result most people would opt for Rust or C++ when targeting WebAssembly for production applications. However, most tools and frameworks support AssemblyScript, and it is clearly seen as an important project by the community. Given time to mature, it could be a viable option in the near future.

And the Rest…

Although C++, C#, Rust, and AssemblyScript are some of the most mature options for targeting WebAssembly, there is growing support from a wide range of other languages:

Python

Mozilla is supporting Pyodide, a project that compiles the Python runtime to WebAssembly along with the Python scientific stack, including NumPy, Pandas, Matplotlib, and parts of SciPy.

Go

WebAssembly as a compilation target became available in Go 1.11 (late 2018). There are also WebAssembly UI frameworks being developed by the community; for example, Vugu.

Swift

There is work underway to add WebAssembly support to Apple’s Swift language, with the SwiftWasm project working on a pull request to add this feature.

C#

Blazor isn’t the only mature option for C# developers. Uno Platform also provides support for WebAssembly, allowing developers to use the Universal Windows Platform and XAML.

The WebAssembly language ecosystem is maturing fast, with a number of the options presented here likely to become production ready next year, opening the door to web development for a much wider community.

WebAssembly has started gaining widespread language support, allowing developers to use a range of different languages for writing web applications. In Chapter 3, we take a look at some of the early success stories from those who are already using WebAssembly for production applications.

1 The Emscripten tooling is rapidly evolving. For up-to-date instructions on how to install the toolkit and user documentation, refer to the website.
