11.4 FORTH

Forth is an unusual language. It provides an interactive development environment, which includes both an interpreter and a compiler. Forth programming style encourages you to break a problem down into many small fragments (factoring) and then to develop and test each fragment interactively. Forth advocates assert that breaking the edit-compile-test cycle used by conventional programming languages can lead to great productivity improvements. FORTH is an extensible language, in the sense that you can add your own functions (called words) to its dictionary and they become part of the language. In fact, major portions of FORTH are written in FORTH language.

The language used by most of the Laser printers – PostScript – is derived from FORTH.

There are many implementations of FORTH (and sister languages) available, most well-known being FIG-FORTH. An implementation for x86 machines is called lina and it is a Linux native, i.e. C-less, version of ciforth (common Intel Forth). It is mostly compliant with the ISO Forth standard.

Here we shall mainly refer to Gforth – a portable ANS Forth implementation from the GNU project. Some Forth implementations, usually early versions or those written to be extremely portable, compile threaded code (see Chapter 8), but many modern implementations generate optimized machine code like other common language compilers.

11.4.1 Hello World in Forth

The following FORTH program, which defines a word (a function), is the mandatory “HelloWorld” program:

: HELLO CR .″ Hello, world!″ ;
HELLO <cr>

The first line starting with a ‘:’ defines a new word called HELLO, which prints a carriage-return, then a string “Hello, world!” and then a carriage-return is printed automatically by the FORTH system. The definition ends with the ‘;’ character.

11.4.2 A Quick Introduction to FORTH

When you invoke the Forth, you will see a startup banner printed and nothing else. Forth is now running its command line interpreter, which is called the Text Interpreter (also known as the Outer Interpreter).

45 ok

On carriage-return, the text interpreter processes the input line. It breaks the line into groups of non-white-space characters, called token. For each token,

  • It tries to treat it as a command. It does this by searching a name dictionary. If the token matches an entry in the name dictionary, it provides information that allows the text interpreter perform some actions. In Forth jargon, we say that the token names a word, that the dictionary search returns an execution token (xt ) corresponding to the definition of the word and that the text interpreter executes the xt .
  • If the text interpreter fails to find a match in the name dictionary, it tries to treat the token as a number in the current number base (initially 10). If the token legitimately represents a number, the text interpreter pushes the number onto a stack, called data stack.
  • If the text interpreter is unable to do either of these things with any token, it discards the group of characters and the rest of the line, then prints an error message. If the text interpreter reaches the end of the line without error, it prints the status message “ok” followed by a carriage-return.

Now we try some user-defined words.

: greet .″ Hello and welcome″ ;
greet <cr>
Hello and welcome ok
greet greet <cr>
Hello and welcomeHello and welcome ok

Let us take another example:

: add-two 2 + . ;
4 add-two <cr>
6 ok
: demo 5 add-two;
demo <cr>
7 ok
9 greet demo add-two <cr>
Hello and welcome7 11 ok

The pair of words “:” and “;” is used to start and terminate a new definition, respectively. The first word after the ‘:’ is the name for the new definition.

As you can see from the examples, a definition is built up of words that have already been defined; Forth makes no distinction between definitions that existed when you started the system up and those that you define yourself. The examples also introduce the words ‘.’ (dot), “.″” (dot-quote) and “dup” (dewp). Dot takes the value from the top of the data stack and displays it. It is like “.s” except that it only displays the top item of the stack and it is destructive; after it has executed, the value is no longer on the stack. There is always one space printed after the number, and no spaces before it. Dot-quote defines a string (a sequence of characters) that will be printed when the word is executed. The string can contain any printable characters except “””. A “”” has a special function; it is not a Forth word but it acts as a delimiter (the way that delimiters work is described in the next section). Finally, “dup” duplicates the value at the top of the stack.

We already know that the text interpreter searches through the dictionary to locate names. We already have a definition called add-two . Let us try modifying it by typing in a new definition:

: add-two dup . .″ + 2 =″ 2 + . ; <cr>
ok

How does that work? The word “:” does two special things – first, it prevents the text interpreter from ever seeing the characters add-two. The text interpreter uses a variable called >IN (pronounced “to-in”), to keep track of where it is in the input line. When it encounters the word “:”, it behaves in exactly the same way as it does for any other word; it looks it up in the name dictionary, finds its xt and executes it. When “:” executes, it looks at the input buffer, finds the word add-two and advances the value of >IN to point past it. It then does some other stuff associated with creating the new definition, including creating an entry for “add-two” in the name dictionary. When the execution of “:” completes, control returns to the text interpreter, which is oblivious to the fact that it has been tricked into ignoring part of the input line. Words like “:” that advance the value of >IN and so prevent the text interpreter from acting on the whole of the input line are called parsing words.

The second special thing that “:” does is to change the value of a variable called state, which affects the way that the text interpreter behaves. When Gforth starts up, state has the value 0, and the text interpreter is said to be interpreting. During a colon definition, started with a “:”, state is set to -1 and the text interpreter is said to be compiling.

In this example, the text interpreter is compiling when it processes the string 2 + . ;. It still breaks the string down into character sequences in the same way. However, instead of pushing the number 2 onto the stack, it lays down (compiles) some magic into the definition of add-two that will make the number 2 get pushed onto the stack when add-two is executed. Similarly, the behaviours of ‘+’ and ‘.’ are also compiled into the definition.

One category of words does not get compiled. These so-called immediate words get executed regardless of whether the text interpreter is interpreting or compiling. For example, word “;” is an immediate word. Rather than being compiled into the definition, it executes. Its effect is to terminate the current definition, which includes changing the value of state back to 0.

When you execute “add-two”, it has a run-time effect that is exactly the same as if you had typed

2 + . <cr>

outside of a definition.

In Forth, every word or number can be described in terms of two properties:

  • Its interpretation semantics describe how it will behave when the text interpreter encounters it in interpret state. The interpretation semantics of a word are represented by an execution token.
  • Its compilation semantics describe how it will behave when the text interpreter encounters it in compile state. The compilation semantics of a word are represented in an implementation-dependent way; Gforth uses a compilation token.

Numbers are always treated in a fixed way:

  • When the number is interpreted, its behaviour is to push the number onto the stack.
  • When the number is compiled, a piece of code is appended to the current definition that pushes the number when it runs. In other words, the compilation semantics of a number are to postpone its interpretation semantics until the run-time of the definition that it is being compiled into.

Words do not behave in such a regular way, but most have default semantics which means that they behave like the following:

  • The interpretation semantics of the word are to do something useful.
  • The compilation semantics of the word are to append its interpretation semantics to the current definition, so that its run-time behaviour is to do something useful.

The actual behaviour of any particular word can be controlled by using the words immediate and compile-only when the word is defined. These words set flags in the name dictionary entry of the most recently defined word, and these flags are retrieved by the text interpreter when it finds the word in the name dictionary.

A word that is marked as immediate has compilation semantics that are identical to its interpretation semantics. In other words, it behaves like the following:

  • The interpretation semantics of the word are to do something useful.
  • The compilation semantics of the word are to do something useful and actually the same thing, i.e. it is executed during compilation.

Marking a word as compile-only prohibits the text interpreter from performing the interpretation semantics of the word directly; an attempt to do so will generate an error. It is not necessary to use compile-only and it is not even part of ANS Forth, though it is provided by many implementations, but it is good etiquette to apply it to a word that will not behave correctly and might have unexpected side effects in interpret state. For example, it is only legal to use the conditional word IF within a definition. If you forget this and try to use it elsewhere, the fact that it is marked as compile-only allows the text interpreter to generate a helpful error message.

This example shows the difference between an immediate and a non-immediate word:

: show-state state @ . ;
: show-state-now show-state ; immediate
: word1 show-state ;
: word2 show-state-now ;

The word immediate after the definition of show-state-now makes that word an immediate word.

These definitions introduce a new word: @ (pronounced “fetch”). This word fetches the value of a variable and leaves it on the stack. Therefore, the behaviour of show-state is to print a number that represents the current value of state.

When you execute word1, it prints the number 0, indicating that the system is interpreting. When the text interpreter compiled the definition of word1, it encountered show-state whose compilation semantics are to append its interpretation semantics to the current definition. When you execute word1, it performs the interpretation semantics of show-state.

At the time that word1, and therefore show-state, are executed, the system is interpreting. When you press <cr> after entering the definition of word2, you should have seen the number –1 printed, followed by “ok”. When the text interpreter compiled the definition of word2, it encountered show-state-now, an immediate word, whose compilation semantics are therefore to perform its interpretation semantics. It is executed straight away, even before the text interpreter has moved on to process another group of characters, the “;” in this example. The effect of executing it is to display the value of state at the time that the definition of word2 is being defined. Printing –1 demonstrates that the system is compiling at this time. If you execute word2 it does nothing at all.

Before leaving the subject of immediate words, consider the behaviour of “.″“ in the definition of greet above. This word is both a parsing word and an immediate word. Note that there is a space between .” and the start of the text ″Hello and welcome″, but there is no space between the last letter of welcome and the ″ character. The reason for this is that .″ is a Forth word; it must have a space after it so that the text interpreter can identify it. The “ is not a Forth word; it is a delimiter. The examples earlier show that when the string is displayed, there is neither a space before the H nor after the e. Since “.″“ is an immediate word, it executes at the time that greet is defined. When it executes, its behaviour is to search forward in the input line looking for the delimiter. When it finds the delimiter, it updates >IN to point past the delimiter. It also compiles some magic code into the definition of greet; the xt of a run-time routine that prints a text string. It compiles the string “Hello and welcome” into the memory so that it is available to be printed later. When the text interpreter gains control, the next word it finds in the input stream is “;” and so it terminates the definition of greet.

11.4.3 Summary

  • Forth programs use factoring to break a problem down into small fragments called words or definitions.
  • Forth program development is an interactive process.
  • The main command loop that accepts input, and controls both interpretation and compilation, is called the text interpreter (also known as the outer interpreter).
  • Forth has a very simple syntax, consisting of words and numbers separated by spaces or carriage-return characters. Any additional syntax is imposed by parsing words.
  • Forth uses a stack to pass parameters between words. As a result, it uses postfix notation.
  • To use a word that has previously been defined, the text interpreter searches for the word in the name dictionary.
  • Words have interpretation and compilation semantics.
  • The text interpreter uses the value of state to select between the use of the interpretation and the compilation semantics of a word that it encounters.
  • The relationship between the interpretation and compilation semantics for a word depends upon the way in which the word was defined for example, whether it is an immediate word.
  • Forth definitions can be implemented in Forth – called high-level definitions – or in some other way, usually a lower-level language and as a result often called low-level definitions, code definitions or primitives.
  • Many Forth systems are implemented mainly in Forth.

If you want to learn and use FORTH in some application, you should start by getting familiar with the following words in FORTH:

 

Arithmetic: + ‒ * / /MOD */ ABS INVERT

Comparison: MIN MAX =

Logic: AND OR XOR NOT

Stack manipulation: DUP DROP SWAP OVER

Loops and decisions: IF ELSE ENDIF ?DO I LOOP

Input/Output: . .″ EMIT CR KEY

Defining words: : ; CREATE

Memory allocation words: ALLOT ,

Tools: SEE WORDS .S MARKER

More defining words: VARIABLE CONSTANT VALUE TO CREATE DOES>

Memory access: @ !

11.4.4 Vmgen

Vmgen, included in the Gforth package, is a tool for writing efficient interpreters. It takes a simple virtual machine description and generates efficient C code for dealing with the virtual machine code in various ways, in particular, executing it. The run-time efficiency of the resulting interpreters is usually within a factor of 10 of machine code produced by an optimizing compiler.

The interpreter design strategy supported by Vmgen is to divide the interpreter into two parts:

  • The front-end takes the source code of the language to be implemented and translates it into virtual machine code. This is similar to an ordinary compiler front-end. Typically, an interpreter front-end performs no optimization, so it is relatively simple to implement and runs fast.
  • The virtual machine interpreter executes the virtual machine code.

Such a division is usually used in interpreters, for modularity as well as for efficiency. The virtual machine code is typically passed between front-end and virtual machine interpreter in memory, like in a load-and-go compiler. This avoids the complexity and time cost of writing the code to a file and reading it again.

A virtual machine (VM) represents the program as a sequence of VM instructions, following each other in memory, similar to real machine code. Control flow occurs through VM branch instructions, like in a real machine.

In this setup, Vmgen can generate most of the code dealing with virtual machine instructions from a simple description of the virtual machine instructions, in particular:

VM instruction execution

VM code generation: Useful in the front-end.

VM code decompiler: Useful for debugging the front-end.

VM code tracing: Useful for debugging the front-end and the VM interpreter. You will typically provide other means for debugging the user's programs at the source level.

VM code profiling: Useful for optimizing the VM interpreter with super instructions, which are a combination of basic instructions.

 

To create parts of the interpretive system that do not deal with VM instructions, you have to use other tools (e.g., bison) and/or hand-code them. Vmgen supports efficient interpreters through various optimizations, in particular:

  • Threaded code,
  • Caching the top-of-stack in a register,
  • Combining VM instructions into super instructions,
  • Replicating VM (super) instructions for better BTB prediction accuracy.

As a result, Vmgen-based interpreters are only about an order of magnitude slower than native code from an optimizing C compiler on small benchmarks. On large benchmarks, which spend more time in the run-time system, the slowdown is often less. For example, the slowdown of a Vmgen-generated JVM interpreter over the best JVM JIT (Just-in-Time) compiler was measured at only a factor of 2–3 for large benchmarks. Some other JITs and all other interpreters were found slower than Vmgen-based interpreter.

VMs are usually designed as stack machines, passing data between VM instructions on a stack, and Vmgen supports such designs especially well. However, you can also use Vmgen for implementing a register VM and still benefit from most of the advantages offered by Vmgen. Vmgen is especially attractive for implementing software for embedded systems.

 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.14.251.128