Understanding WebAssembly Architecture

In this section, you’ll get a good look inside the engine that makes WebAssembly work. Its unique architecture makes it incredibly powerful, portable, and efficient—though this power comes with some limitations.

Stack Machines

The type of computer that you’re using right now is likely a register machine. Laptops, desktops, mobile devices, virtual machines, even microcontrollers and embedded devices are register machines. A register machine is a machine (physical or virtual) where the processor instructions explicitly refer to certain registers, or data storage locations, on the processor. Accessing these registers is fast and efficient because the data is available directly within the CPU.

For example, if you want to add two numbers together, you’d use the ADD instruction and name two registers as its operands, as shown in this bit of x86 assembly:

 ADD al, ah

In the preceding code, the values contained in ah and al will be added together, with the result stored in al.
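For contrast with the stack machine described next, here is a minimal Rust sketch of what ADD al, ah does to a toy register file. The `Machine` struct, `Reg` enum, and `add` method are hypothetical names invented for illustration. The sketch assumes 8-bit registers that wrap on overflow and ignores the CPU flags a real ADD also updates.

```rust
/// Hypothetical names for the two 8-bit x86 registers used above.
#[derive(Clone, Copy, Debug)]
enum Reg {
    Al,
    Ah,
}

/// A hypothetical toy register file holding only those two registers.
#[derive(Debug)]
struct Machine {
    al: u8,
    ah: u8,
}

impl Machine {
    fn get(&self, r: Reg) -> u8 {
        match r {
            Reg::Al => self.al,
            Reg::Ah => self.ah,
        }
    }

    fn set(&mut self, r: Reg, v: u8) {
        match r {
            Reg::Al => self.al = v,
            Reg::Ah => self.ah = v,
        }
    }

    /// Models `ADD dest, src`: both operands are named registers and the
    /// result overwrites `dest`. 8-bit values wrap on overflow; flags are ignored.
    fn add(&mut self, dest: Reg, src: Reg) {
        let sum = self.get(dest).wrapping_add(self.get(src));
        self.set(dest, sum);
    }
}

fn main() {
    let mut m = Machine { al: 2, ah: 3 };
    m.add(Reg::Al, Reg::Ah); // ADD al, ah
    println!("al = {}, ah = {}", m.al, m.ah); // al = 5, ah = 3
}
```

Notice that the instruction itself says where its inputs live and where its output goes. That explicit naming is exactly what a stack machine does away with.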

WebAssembly is a stack machine. In a stack machine, most instructions assume that their operands are sitting on the stack rather than stored in specified registers. The WebAssembly stack is a LIFO (Last In, First Out) stack. If you’re unfamiliar with stacks, the concept is just what the name implies: values are piled (stacked) on top of each other. Unlike an array, where you can access any element regardless of its position, a stack only lets you push data onto the top or pop data off the top.

To add two numbers in a stack machine, you push those numbers onto the top of the stack and then execute the add instruction. The instruction itself is not pushed onto the stack. Instead, it pops the two operands off the top of the stack and pushes the result of the addition in their place.
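A tiny Rust interpreter makes the push/pop sequence concrete. This is a sketch under stated assumptions: the `Instr` enum and `run` function are hypothetical and loosely modeled on Wasm's i32.const and i32.add, and `wrapping_add` stands in for Wasm's modulo 2^32 integer addition.

```rust
/// A hypothetical two-instruction subset modeled on Wasm's i32.const and i32.add.
#[derive(Clone, Copy, Debug)]
enum Instr {
    I32Const(i32), // push a constant onto the stack
    I32Add,        // pop two operands, push their sum
}

/// Runs the instructions against an operand stack and returns the final stack.
fn run(program: &[Instr]) -> Result<Vec<i32>, String> {
    let mut stack: Vec<i32> = Vec::new();
    for instr in program {
        match *instr {
            Instr::I32Const(v) => stack.push(v),
            Instr::I32Add => {
                // The right-hand operand was pushed last, so it is popped first.
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                // Wasm integer addition wraps modulo 2^32.
                stack.push(lhs.wrapping_add(rhs));
            }
        }
    }
    Ok(stack)
}

fn main() {
    // Corresponds to the text-format sequence: i32.const 2  i32.const 3  i32.add
    let program = [Instr::I32Const(2), Instr::I32Const(3), Instr::I32Add];
    println!("{:?}", run(&program)); // Ok([5])
}
```

A real Wasm engine never takes the underflow branch: modules are validated before they run, and validation rejects code that would pop from an empty stack.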

Stack machines have several advantages that made them an appealing choice for WebAssembly: small binary size, efficient instruction encoding, and ease of portability, to name just a few.

Other fairly well-known stack machines include the Java Virtual Machine (JVM) and the bytecode executor for the .NET Common Language Runtime. With those virtual machines, developers are spared the effort of writing assembly or thinking in postfix notation, also called reverse Polish[3] notation (where the operator comes after its operands), because compilers and code generators handle those intermediate steps behind the scenes.
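Because a stack machine consumes operands before the operator that uses them, stack code reads in postfix order: the infix expression (2 + 3) * 4 becomes 2 3 + 4 *. The following Rust sketch evaluates such postfix strings with an operand stack. `eval_postfix` is a hypothetical helper that only understands integers and the +, -, and * operators.

```rust
/// Evaluates a whitespace-separated postfix (reverse Polish) expression
/// using an operand stack. Supports i64 integers and the operators + - *.
fn eval_postfix(expr: &str) -> Result<i64, String> {
    let mut stack: Vec<i64> = Vec::new();
    for token in expr.split_whitespace() {
        match token {
            "+" | "-" | "*" => {
                // Operators pop their operands; the right-hand one is on top.
                let rhs = stack.pop().ok_or("missing operand")?;
                let lhs = stack.pop().ok_or("missing operand")?;
                let result = match token {
                    "+" => lhs + rhs,
                    "-" => lhs - rhs,
                    _ => lhs * rhs,
                };
                stack.push(result);
            }
            number => {
                // Anything else must be an integer literal, which is pushed.
                let value = number
                    .parse::<i64>()
                    .map_err(|e| format!("bad token {number:?}: {e}"))?;
                stack.push(value);
            }
        }
    }
    // A well-formed expression leaves exactly one value on the stack.
    match stack.as_slice() {
        [result] => Ok(*result),
        _ => Err(format!("expected one result, found {}", stack.len())),
    }
}

fn main() {
    // Infix (2 + 3) * 4 written in postfix order.
    println!("{:?}", eval_postfix("2 3 + 4 *")); // Ok(20)
}
```

No parentheses or precedence rules are needed: the order of the tokens alone determines the order of evaluation, which is part of what makes stack code compact.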

Data Types

Admit it: you’ve been spoiled. Modern programming languages with hashes, lists, arrays, sets, extra-large numbers, and tuples have spoiled you. These languages also probably let you create your own types through structs or classes. Some of them even let you overload operators, and some of those overloads can even work on custom types. The world is your oyster and you have few limits. That is not the world of WebAssembly. As their name should imply, assembly languages are designed to be made up of primitives that higher-level languages can use as building blocks.

WebAssembly 1.0 has exactly four data types:

Type    Description
i32     32-bit integer
i64     64-bit integer
f32     32-bit floating-point number
f64     64-bit floating-point number

One consequence of this limited set of data types is that WebAssembly doesn’t store any signedness with a number. Whether a value is treated as signed or unsigned is decided only by the operation applied to it. For example, while there’s only one i32 data type, several of its operators come in signed and unsigned versions, such as i32.div_s and i32.div_u (division) or i32.lt_s and i32.lt_u (less-than comparison). Operations whose result doesn’t depend on signedness, such as i32.add, have only a single version.
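To see why only some operators need signed and unsigned versions, the Rust sketch below stores every value as raw `u32` bits, the way an i32 sits in Wasm, and lets each function decide how to read them. The function names are hypothetical stand-ins for the Wasm instructions named in their comments. Rust's `as` casts between same-width integers reinterpret the bits without changing them.

```rust
/// Models i32.add: wrapping addition yields the same bits whether the
/// operands are viewed as signed or unsigned, so one instruction suffices.
fn i32_add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Models i32.div_s: the bits are read as two's-complement signed values.
/// (Like Wasm, this fails, here by panicking, on division by zero or i32::MIN / -1.)
fn i32_div_s(a: u32, b: u32) -> u32 {
    ((a as i32) / (b as i32)) as u32
}

/// Models i32.div_u: the very same bits are read as unsigned values.
fn i32_div_u(a: u32, b: u32) -> u32 {
    a / b
}

fn main() {
    let minus_one: u32 = 0xFFFF_FFFF; // all 32 bits set
    println!("signed: {}, unsigned: {}", minus_one as i32, minus_one); // signed: -1, unsigned: 4294967295
    println!("add:   {:#x}", i32_add(minus_one, 2));       // add:   0x1
    println!("div_s: {}", i32_div_s(minus_one, 2) as i32); // div_s: 0
    println!("div_u: {}", i32_div_u(minus_one, 2));        // div_u: 2147483647
}
```

The addition gives the same answer under either reading, but the two divisions disagree completely, which is exactly the kind of surprise that can trip you up when you write the text format by hand.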

When you’re using a high-level language that compiles to WebAssembly on your behalf, you shouldn’t have to worry about this subtlety. But when you’re writing raw Wasm in the text format by hand, it could trip you up in unexpected ways.

Control Flow

WebAssembly’s handling of control flow is a little different from that of other, less portable assembly languages. WebAssembly goes to great lengths to ensure that its control flow can’t invalidate type safety and can’t be hijacked by attackers, even with a “heap corruption”[4]-style attack in linear memory. For example, many assembly languages allow easily exploited blind jumps to arbitrary addresses, whereas you’ll discover that WebAssembly has no such instruction: its branches can only target the labels of enclosing blocks, loops, and ifs. This additional layer of safety pairs well with the safety-first philosophy of Rust.

Wasm control flow is accomplished the same way everything else is within a stack machine: by pushing things onto, and popping things off of, the stack. For example, the if instruction pops an i32 off the top of the stack. If that value is non-zero (true), the then branch executes; otherwise, the else branch executes, if there is one.

Take a look at an example of the if statement in action:

 (if (i32.eq (call $getHealth) (i32.const 0))
  (then (call $doDeath))
  (else (call $stillAlive))
 )

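In this code, the call to $getHealth and the i32.const 0 each push a value, and i32.eq replaces those two values with 1 if they’re equal or 0 if they aren’t. If our hypothetical player’s health has reached 0, we call the $doDeath function; otherwise, we call the $stillAlive function. All those seemingly extra parentheses will make sense later in the chapter.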
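The same logic can be sketched in Rust to show what the folded text format does on the stack: i32.eq produces an i32 rather than a boolean, and if treats any non-zero value as true. The functions here are hypothetical. `health` is passed as a parameter in place of the $getHealth call, and `do_death` and `still_alive` return strings (unlike the functions in the example) so the chosen branch is observable.

```rust
/// Models Wasm's i32.eq: comparisons push an i32 1 (true) or 0 (false), not a bool.
fn i32_eq(a: i32, b: i32) -> i32 {
    (a == b) as i32
}

/// Models how `if` consumes its condition: it takes the `then` arm for any
/// non-zero i32 and the `else` arm for zero.
fn wasm_if<T>(condition: i32, then_arm: impl FnOnce() -> T, else_arm: impl FnOnce() -> T) -> T {
    if condition != 0 {
        then_arm()
    } else {
        else_arm()
    }
}

// Hypothetical stand-ins for $doDeath and $stillAlive.
fn do_death() -> &'static str {
    "doDeath"
}

fn still_alive() -> &'static str {
    "stillAlive"
}

/// Mirrors the folded example: push health, push 0, i32.eq, then if/else.
fn on_tick(health: i32) -> &'static str {
    let condition = i32_eq(health, 0);
    wasm_if(condition, do_death, still_alive)
}

fn main() {
    println!("{}", on_tick(0));  // doDeath
    println!("{}", on_tick(25)); // stillAlive
}
```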

WebAssembly has the following control flow instructions available:

Instruction   Description
if            Marks the beginning of an if branching instruction
else          Marks the else block of an if instruction
loop          A labeled block used to create loops; a branch to a loop’s label jumps back to its start
block         A labeled sequence of instructions; a branch to a block’s label jumps to its end
br            Branches unconditionally to the label of an enclosing block, loop, or if
br_if         Like br, but branches only if the i32 popped from the stack is non-zero
br_table      Pops an index and branches to the label at that position in a list of labels, or to a default label if the index is out of range
return        Returns from the current function (in 1.0, a function returns at most one value)
end           Marks the end of a block, loop, if, or function
nop           Does nothing; no self-respecting assembly language is without one
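Rust’s labeled blocks and labeled loops make a handy mental model for these instructions, because they share the same rule: branching to a block’s label exits the block, while branching to a loop’s label jumps back to the loop’s start. The sketch below is an analogy only, not how any compiler translates Wasm. `sum_to` and `classify` are hypothetical, and the nested blocks in `classify` mirror the pattern commonly paired with br_table to implement a switch. Labeled blocks require Rust 1.65 or later.

```rust
/// Sums 1..=n using the shape of a Wasm block + loop + br_if pattern.
fn sum_to(n: u32) -> u32 {
    let mut i = 0;
    let mut total = 0;
    // (block $done ...): a branch to a block's label jumps to its END.
    'done: {
        // (loop $next ...): a branch to a loop's label jumps to its START.
        'next: loop {
            // br_if $done: leave the block when the condition is true.
            if i >= n {
                break 'done;
            }
            i += 1;
            total += i;
            // br $next: jump back to the top of the loop.
            continue 'next;
        }
    }
    total
}

/// Models br_table $zero $one $two $other: the index selects a label from
/// the list, and any out-of-range index takes the default ($other).
fn classify(index: u32) -> &'static str {
    'other: {
        'two: {
            'one: {
                'zero: {
                    match index {
                        0 => break 'zero,
                        1 => break 'one,
                        2 => break 'two,
                        _ => break 'other,
                    }
                }
                return "zero"; // code after the end of block $zero
            }
            return "one"; // code after the end of block $one
        }
        return "two"; // code after the end of block $two
    }
    "other" // code after the end of block $other
}

fn main() {
    println!("{}", sum_to(4)); // 10
    for i in 0..5 {
        println!("{i} -> {}", classify(i)); // zero, one, two, other, other
    }
}
```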

Linear Memory

As you work with linear memory, you’ll truly begin to appreciate the extent to which modern high-level languages have spoiled you. With most languages, you can quickly and easily create a new instance of something on the heap with an operator like new.

Internally, the compiler knows the size of this thing (or has some trick to compensate for not knowing). When you pass an instance of something to a function, the compiler knows whether you’re passing a pointer or a value and how to arrange that value on your stack or heap in order to make the data available to a function.

WebAssembly doesn’t have a heap in the traditional sense. There’s no concept of a new operator. In fact, you don’t allocate memory at the object level because there are no objects. There’s also no garbage collection (at least not in the 1.0 MVP).

Instead, WebAssembly has linear memory. This is a contiguous block of bytes that can be declared internally within the module, exported out of a module, or imported from the host. Think of it as though the code you’re writing is restricted to using a single variable that is a byte array. If your WebAssembly module needs more space, it can grow the linear memory block in increments of 64 KiB (65,536 bytes), called pages. Sadly, determining whether you need more space is entirely up to you and your code: there’s no runtime to do this for you.
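A rough Rust model of page-based growth follows. `LinearMemory`, `PAGE_SIZE`, and `max_pages` are hypothetical; `max_pages` stands in for a memory’s declared maximum or an engine limit. The `grow` method mirrors the behavior of Wasm’s memory.grow instruction: it returns the previous size in pages on success, returns -1 on failure, and zero-fills the new pages.

```rust
/// Size of one WebAssembly page: 64 KiB.
const PAGE_SIZE: usize = 65_536;

/// A hypothetical model of a module's linear memory: one contiguous,
/// zero-initialized byte array whose length is always a whole number of pages.
struct LinearMemory {
    bytes: Vec<u8>,
    max_pages: usize, // assumed limit on growth
}

impl LinearMemory {
    fn new(initial_pages: usize, max_pages: usize) -> Self {
        LinearMemory {
            bytes: vec![0; initial_pages * PAGE_SIZE],
            max_pages,
        }
    }

    /// Current size in pages (the analogue of memory.size).
    fn size(&self) -> usize {
        self.bytes.len() / PAGE_SIZE
    }

    /// Grows by `delta` pages like memory.grow: returns the old size in pages
    /// on success, or -1 (leaving memory untouched) if the limit would be exceeded.
    fn grow(&mut self, delta: usize) -> i32 {
        let old = self.size();
        match old.checked_add(delta) {
            Some(new) if new <= self.max_pages => {
                // New pages are zero-filled.
                self.bytes.resize(new * PAGE_SIZE, 0);
                old as i32
            }
            _ => -1,
        }
    }
}

fn main() {
    let mut mem = LinearMemory::new(1, 4); // 1 page now, at most 4 (assumed limit)
    println!("{}", mem.grow(2)); // 1 (the previous size in pages)
    println!("{}", mem.size());  // 3
    println!("{}", mem.grow(5)); // -1 (would exceed the 4-page maximum)
}
```

Nothing calls `grow` automatically. As in Wasm, your own code has to notice that it’s about to run out of room and ask for more.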

This image with variables and byte offsets illustrates just one way to store data in a block of linear memory (how you choose to use and fill linear memory is entirely up to you and your code):

[Figure: variables stored at byte offsets in a block of linear memory]
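Here’s one way to express a layout like that in Rust, treating a plain byte vector as linear memory. The offsets and variable names (health, x, y) are hypothetical. WebAssembly loads and stores are little-endian, which is why the sketch uses `to_le_bytes` and `from_le_bytes`. Unlike Wasm, an out-of-range offset here simply panics.

```rust
// Hypothetical layout chosen for illustration; Wasm itself imposes none.
const HEALTH_OFFSET: usize = 0; // i32 occupies bytes 0..4
const X_OFFSET: usize = 4;      // f32 occupies bytes 4..8
const Y_OFFSET: usize = 8;      // f32 occupies bytes 8..12

/// Writes an i32 at `offset` in little-endian order, like i32.store.
fn store_i32(mem: &mut [u8], offset: usize, value: i32) {
    mem[offset..offset + 4].copy_from_slice(&value.to_le_bytes());
}

/// Reads an i32 from `offset` in little-endian order, like i32.load.
fn load_i32(mem: &[u8], offset: usize) -> i32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&mem[offset..offset + 4]);
    i32::from_le_bytes(buf)
}

/// Writes an f32 at `offset` in little-endian order, like f32.store.
fn store_f32(mem: &mut [u8], offset: usize, value: f32) {
    mem[offset..offset + 4].copy_from_slice(&value.to_le_bytes());
}

/// Reads an f32 from `offset` in little-endian order, like f32.load.
fn load_f32(mem: &[u8], offset: usize) -> f32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&mem[offset..offset + 4]);
    f32::from_le_bytes(buf)
}

fn main() {
    let mut memory = vec![0u8; 16]; // stand-in for the module's linear memory
    store_i32(&mut memory, HEALTH_OFFSET, 100);
    store_f32(&mut memory, X_OFFSET, 12.5);
    store_f32(&mut memory, Y_OFFSET, -3.0);

    // The bytes carry no type information; only the agreed offsets give them meaning.
    println!("raw bytes: {:?}", &memory[..12]);
    println!(
        "health = {}, x = {}, y = {}",
        load_i32(&memory, HEALTH_OFFSET),
        load_f32(&memory, X_OFFSET),
        load_f32(&memory, Y_OFFSET)
    );
}
```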

In addition to the efficiency of direct memory access, there’s another reason why linear memory is ideal for WebAssembly: security. While the host can read and write any linear memory given to a Wasm module at any time, the Wasm module can never access any of the host’s memory.
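The module’s side of this isolation can be sketched as a bounds check. In the Rust below, `Trap` and `load_i32` are hypothetical: the module can only name offsets into its own memory, and any access that would run past the end fails instead of touching anything else. Real engines may enforce this differently (for example, with guard pages rather than an explicit comparison), so treat this as a model of the rule, not of any engine’s implementation.

```rust
/// Hypothetical trap value returned instead of touching memory outside the module's block.
#[derive(Debug, PartialEq)]
struct Trap(&'static str);

/// Hypothetical model of i32.load: the effective address is base + static offset,
/// computed in 64 bits so it cannot wrap around to a small, in-bounds address.
fn load_i32(mem: &[u8], base: u32, static_offset: u32) -> Result<i32, Trap> {
    let start = u64::from(base) + u64::from(static_offset);
    let end = start + 4; // an i32 load reads 4 bytes
    if end > mem.len() as u64 {
        return Err(Trap("out of bounds memory access"));
    }
    let start = start as usize;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&mem[start..start + 4]);
    Ok(i32::from_le_bytes(buf))
}

fn main() {
    // The host owns this buffer and can read or write all of it directly.
    let mut memory = vec![0u8; 65_536];
    memory[8..12].copy_from_slice(&42i32.to_le_bytes()); // host writes into module memory

    // The module only ever sees offsets into that one buffer.
    println!("{:?}", load_i32(&memory, 8, 0));        // Ok(42)
    println!("{:?}", load_i32(&memory, 65_534, 0));   // Err(Trap("out of bounds memory access"))
    println!("{:?}", load_i32(&memory, u32::MAX, 8)); // Err(Trap("out of bounds memory access"))
}
```

The asymmetry is the point: the host holds the whole buffer, while the module holds nothing but offsets that are checked against that buffer’s length.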

As you’ll see in the coming chapters, linear memory is crucial to being able to create powerful applications with WebAssembly. Before using high-level languages like Rust, you should learn how to manipulate linear memory manually so you can appreciate the extent of the work done on your behalf by tools and code generation and understand the impact of your designs.
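As a first taste of that manual work, here is a deliberately simple bump allocator in Rust that stands in for new by handing out aligned offsets into a byte array. `BumpAllocator` and its `alloc` method are hypothetical. It never frees anything, and real toolchains ship far more capable allocators.

```rust
/// A hypothetical bump allocator: it hands out increasing offsets into a
/// byte array and never frees, a common first step when managing memory by hand.
struct BumpAllocator {
    next: usize,  // first unused byte offset
    limit: usize, // total size of the memory region
}

impl BumpAllocator {
    fn new(limit: usize) -> Self {
        BumpAllocator { next: 0, limit }
    }

    /// Reserves `size` bytes aligned to `align` (a power of two) and returns
    /// the starting offset, or None if the region is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round `next` up to the next multiple of `align`.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.limit {
            return None; // the caller would have to grow memory and retry
        }
        self.next = end;
        Some(start)
    }
}

fn main() {
    let mut memory = vec![0u8; 64]; // stand-in for linear memory
    let mut heap = BumpAllocator::new(memory.len());

    let flag = heap.alloc(1, 1).expect("room for a u8");    // offset 0
    let count = heap.alloc(4, 4).expect("room for an i32"); // offset 4 (3 padding bytes skipped)
    memory[flag] = 1;
    memory[count..count + 4].copy_from_slice(&42i32.to_le_bytes());

    println!("flag at {flag}, count at {count}"); // flag at 0, count at 4
    println!("{:?}", heap.alloc(100, 1));         // None
}
```

Even this toy has to decide on alignment, padding, and what happens when space runs out. Those are the decisions that compilers and their runtime libraries normally make on your behalf.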
