Structure, by D. Bozhinovski
Now that we have vaguely explained what a blockchain is and how it is useful, the next obvious step is to implement these calculations in a computer, so that they are automatically performed. In this chapter, we introduce a tool that will allow us to implement these calculations exactly.
2.1 Introduction to Lisp
Lisp, originating from 1958, stands for LIST Processing and is a family of programming languages. Unlike standard programming languages, it has a fully parenthesized prefix notation. For example, instead of writing 1 + 2, one would write (+ 1 2).
There are many Lisp implementations in the Lisp family. One such implementation is Racket, and we will use it in this book since this implementation is particularly easy for entry-level programmers. The language is used in a variety of contexts such as research, computer science education, and general-purpose programming. It has also been used for commercial projects. One notable example is the Hacker News1 website, which runs on Arc, a programming language developed in Racket.
Lisp implementations are quite known for their minimalism. Due to this minimalism, building a blockchain (or anything, for that matter) in Lisp will imply that you can do the same in most other programming languages with ease. Lisps favor function composition—chaining two functions together—for example, given f(x) and g(x), one composition is f(g(x)). Further in the book, we will see the interesting properties that composition offers and how easily we can maintain and extend our code.
2.1.1 Data Structures and Recursion
Primitives or axioms (starting points or building blocks). As an example, the numbers 1, 2, and so on, are something we do not have to implement ourselves since they are already included in the programming language. Another example is operations on numbers, such as + and *.
Composition or a way to compose primitives to do complex calculations. For example, we can combine + and * as follows: 1 + (2 * 3) or in prefix (Lisp) notation: (+ 1 (* 2 3)).
Abstraction or capturing the composition of primitives. For example, if we find ourselves doing a calculation over and over again, it would be good to capture (abstract, or wrap) it in a function that can be easily reused.
We will rely on these concepts repeatedly throughout the book, as they allow us to build complex structures.
A data structure is a collection of values, the relationships among them, and the functions or operations that can be applied to these values.
An example of a data structure is numbers together with the plus and multiplication functions.
From the motivation in the previous chapter we can see the need of forming such a data structure, where, for example, a block is a structure that contains a hash, an owner, and transaction amount.
There are many data structures. An ordered list is one example, representing the numbers (1, 2, 3) in that order. Further, there are operations on lists, such as counting the number of elements, merging two lists, and so on.
We used the numbers 1, 2, and 3 in the previous example of a list—these elements are primitives . A list, together with its operations, represents an abstraction . Chaining several list operations together represents a composition .
Now that we can transform some data structure (by applying operations on it), it would be good to be able to repeatedly transform a data structure according to some specific rules. For example, if we have a blockchain data structure we may want to come up with a way to transform it such that, for example, a new block is inserted in it. This might require applying the same operation several times.
In mathematics and computer science, functions exhibit recursive behavior when they can be defined by two properties:
- 1.
A simple base case (or cases)—a terminating case that returns a value without using recursion
- 2.
A rule (or rules) that reduces toward the base case
For example, using substitution we can see that fact(3) evaluates to 3 ⋅ fact(2), which is 3 ⋅ 2 ⋅ fact(1), and finally 3 ⋅ 2 ⋅ 1 ⋅ fact(0), which is just 6.
The recursion we just discussed applies the same operation multiple times, and gives the motivation for the next definition.
A tree is a hierarchical, recursive data structure that can have two possible values:
- 1.
An empty value
- 2.
A single value, coupled with another two subtrees
Trees are important in Lisps, as they are used to represent a program’s structure. We will discuss this more in the next section.
2.1.2 Languages and Syntax
In this section, we take a quick look at the foundations of a Lisp, which will provide a high overview of the ideas behind Lisps.
A language consists of:
- 1.
Symbols, which can be combined into sentences
- 2.
Grammar, which is a set of rules that tells us which sentences are well-formed
This definition of a language is also reflected in programming languages, which have a special grammar called the syntax. For example, the C programming language has a special syntax—you have to follow specific rules when writing program statements.
An abstract syntax tree is a tree representation of the abstract syntactic structure of source code written in a programming language.
Lisps do not have the restriction of a special syntax like C has, for example. The code that we will write will be the actual abstract syntax tree. This is why Lisps rely on prefix notation. We see how Lisps are based on a minimalistic design, as we do not get the overhead of many other languages that have special syntax and sometimes functionalities that overlap.
The syntactic elements in Lisp are symbolic expressions or S-expressions. An S-expression can be one of:
- 1.
A symbol (a string of characters)
- 2.
A well-formed list (balanced parentheses) of S-expressions
For example, hello is a valid S-expression, and so is (hello there). But (hello( is not a valid S-expression, because the parentheses are not balanced. Whitespace is important in constructing S-expressions. Note that h ello is different from hello.
An S-expression is well-formed if and only if the abstract syntax tree is balanced.
Syntax has a special meaning in Lisps compared to other languages. With macros as part of the core language, it’s possible to extend this syntax.2 S-expressions form the syntax of a Lisp.
We treated numbers with the plus function as a data structure. Think of another data structure.
Evaluate sum(3), sum(5), and sum(1) given the following definition:
What about sum(−1)?
Which of the following S-expressions is valid?
1. hello
2. 123
3. (hello 123)
4. (hello (123)
5. (+ 1 (* 2 3))
6. (+ (* 3 2) (/ 6 2))
Hint: Drawing an abstract syntax tree for each of the expressions might make it more obvious why one is or isn’t valid.
2.2 Configuration and Installation
The upper text area part of the screen is the definitions area, where we usually write the definitions. Alternatively, the lower part is the interactions area, where we interact with the definitions.
The Help Desk, found under Help > Help Desk on the top menu, contains useful information such as a quick introduction, reference manuals, and examples.
Using the graphical user interface (GUI), which is the recommended way and what we use throughout this book
Using the command-line utilities (racket is the interpreter/compiler and raco is the package manager), which is for more advanced users
2.3 Introduction to Racket
The first thing that Lisp newcomers notice is that there are too many parentheses in Lisp programs. This is true, but it is a direct consequence of the fact that we are writing our abstract syntax tree in a language that has no special syntax.
As we go through this book, we will see the power of expressiveness we get as a result. For example, one advantage is that there is no need for a special order of operations. In high school, we had to remember that * and / had to be evaluated before + and -. This is not the case with Lisps, as the order of evaluation is obvious by the way we’ve written our program.
Let’s consider the expression (+ 1 (* 2 3)). As we mentioned, whitespace is an important part of S-expressions, so (+1 (* 2 3)) is different from (+ 1 (* 2 3)). To convert this to a more familiar notation in our mind, we could swap the operators so that they are in infix notation instead of prefix: (+ 1 (* 2 3)) is the same as (1 + (2 * 3)). We now see that the value of this expression is 7.
We get 7 as a result, as depicted in Figure 2-4. We’ve done our first calculation in Racket.
After finishing an evaluation, DrRacket again waits for a new command. This is because in the interactions area, we are in REPL mode, which stands for Read-Evaluate-Print-Loop. That is, the interactions area will read what we write, try to evaluate it (come up with a result), print the result, and loop back to reading again.
- 1.
(+ 1 (* 2 3))
- 2.
(+ 1 6)
- 3.
7
That is, in each step, we reduce the expression until no further reductions are possible. We immediately notice how powerful substitution as a concept is.
2.3.1 Primitive Types
In the evaluation above , we got a number as a result—the value 7 has a type of number. While types of values are implicit in Racket, we still have a way to check what the type of a value is, as we will see later with the help of predicates.
Symbols, such as hello, world
Lists, such as (1, 2, 3)
Functions, such as f(x) = x + 1
Numbers, such as 1, 2, 3.14
Booleans, such as #t (for True) and #f (for False)
Characters or single letters: #A, #B, #C
Strings or lists of characters: "Hello", "World"
Bytes: Per ASCII code, we can represent characters in terms of a number (e.g. 72 is H)
Bytes, as a list of byte: #"Hello", #"World"
- 1.
The first evaluation (123) has a type of number.
- 2.
The second (#t) and third (#f) evaluations have a type of boolean.
- 3.
The fourth evaluation (#A) has a type of character.
- 4.
The fifth evaluation ("Hello World") has a type of string.
- 5.
The sixth evaluation ((bytes 72 101 108 108 111)) has a type of bytes and is using the ASCII table for letters.
We cover symbols, lists, and functions in the following sections .
2.3.2 Lists, Evaluation, and Quotes
list is a built-in function, just like +, which we already used. list accepts any number of arguments, and as a result, returns a list generated from them. The returned expression '(1 2 3) is just a fancy notation, which is equivalent to the expression (quote (1 2 3)), where we tell Racket to return the actual list (1 2 3) instead of evaluating it.
We notice how parentheses are used to denote a function call, or evaluation. In general, the code (f a_1 a_2 ... a_n) makes a function call to f, passing n arguments in that order. For example, for the function f(x) = x + 1, one example evaluation is f(1) and we write (f 1), where as a return value we get 2.
Note that (list 1 2 3) returned '(1 2 3) as a result. Let’s try to understand what happened here. If (list 1 2 3) had returned (1 2 3), this wouldn’t have made much sense since (as we discussed) this notation would try to call the (nonexistent) function 1 with arguments 2 and 3. Instead, it returned a quoted list: ‘(1 2 3).
Say your name
Say “your name”
In the first example, you expect the person to tell you their name. In the second example, you expect them to say the words “your name,” rather than their actual name.
This allows for the creation of new symbols and is especially important for the creation of macros. There is a special list, called the empty list, and is denoted as (list), or (quote ()), or simply '(). We will later see why this list is special when we start using recursion.
In Racket, parentheses, brackets, and braces have the same effect. Thus, (list 1 2 3) is the same as [list 1 2 3] and {list 1 2 3}. This visual distinction may be useful to group evaluations when there are a lot of parentheses.
Recall that S-expressions can be either a symbol or a list. Since we discussed evaluation, lists, and symbols in this section, at this point in the book we have covered what makes the core of a Lisp.
Create a list in Racket that contains a mixture of different types (numbers, strings). The list should have at least two elements.
Note that list is a function and quote is a syntax. Read the manual for both using the F1 key.
2.3.3 Dotted Pairs
Note how we used function composition here, namely, we “composed” car and cons in the first example, and cdr and cons in the second example.
The motivation for using a list is that it will allow us, for example, to link several blocks together to make a blockchain.
Racket also supports sets. In a list/pair there can be repeated elements, but in a set all elements are unique. Additionally, there are operations that we can use on sets, such as union (merges two sets), subtraction (removes the elements in set 1 that are found in set 2), and so on.
We notice how in Lisp, depending only on a few primitive notions (function calls, pairs, and quote), we can capture abstraction. We will talk more about this in Section 2.3.13.
Represent the same list you created in Exercise 2-4 using cons.
Use a combination of car and cdr for the list in Exercise 2-6 to get the second element in the list.
2.3.4 Adding Definitions
In the definitions area, we added some code. We notice that we used another built-in syntax named define to attach a value (123) to a symbol/variable (a-number).
In the interactions area, we interacted with something that was already defined in the definitions area. In this case, the interaction was to just display the definition’s value by referring to its symbol.
In this book, every Racket program will start with #lang racket. This means that we will be dealing with Racket’s ordinary syntax. There are different values this can accept; for example, we can work with a language specialized in drawing graphics, but that is out of context for this book.
Everything that we write in the definitions area we can also write in the interactions area and vice versa. To have the definitions available in the interactions area, we need to run the program by choosing Racket > Run from the top menu. Note that when we run a program, the interactions area gets cleared.
Definitions can be saved to a file for later usage by choosing File > Save Definitions from the top menu.
Store the list from Exercise 2-7 in the definitions area and then use car and cdr on that list in the interactions area.
2.3.5 Procedures and Functions
In Lisp, a procedure is essentially a mathematical function. When called, it returns some data. However, unlike mathematical functions, some Lisp expressions and procedures have side effects.
For example, consider two functions: add-1 increases a number by one and get-current-weather gets the current weather from an external service. The first function will always return the same value at any point in time, whereas the second “function” can return different values at different points in time.
Thus Lisp procedures are not always functions in the “pure” sense of mathematics, but in practice, they are frequently referred to as “functions” anyway (even those that may have side effects), to emphasize that a computed result is always returned.
Given the reasoning in the previous paragraph, from this point, we will refrain from using the word “function” and stick with “procedure.”
There is special built-in syntax called lambda , which accepts two arguments and produces a procedure as a result. The first argument is a list of arguments that the procedure will accept, and the second argument is an expression (body) that acts on the arguments from the list.
For example, (lambda (x) (+ x 1)) returns a procedure that accepts a single argument x, such that when this procedure is called with an argument, it increases this argument’s value by one: (+ x 1).
In Exercise 2-7 you retrieved the second element from a list with (car (cdr l)). Define a procedure that accepts a list and returns the second element from that list.
2.3.6 Conditional Procedures
Conditional to check
Expression to evaluate if the conditional is true
Expression to evaluate if the conditional is false
Optionally, the last test can be an else to use the specific action if none of the conditions match.
eq? checks whether two arguments refer to the same object in memory.
eqv? is the same as eq?, except that it can also be used for primitive types (e.g. numbers, strings).
equal? is the same as eqv?, except that it can also be used to check if the arguments have the same recursive structure (e.g. lists).
Note that there’s only one empty list '() in memory,3 so all three predicates will return the same value when checking against empty lists.
As expected, this will return true for the empty list but cannot compare objects that aren’t referring to the same memory location or lists that actually have elements.
We already defined is-large using the cond syntax. Represent it using the if syntax.
Represent the following logic using cond: return 'foo if the value is a string or a number, and 'bar otherwise.
2.3.7 Recursive Procedures
Procedures, just like data structures (e.g. trees), can be recursive. We already saw an example with the factorial procedure, in that it called itself to make a computation or a loop.
- 1.
For the empty list, just return 0 since the length of an empty list is 0.
- 2.
Otherwise, return the value of one plus (list-length (cdr x)).
We just saw an example of a recursive behavior, since the recursive cases were reduced to the base case to get a result. With this example, we can see the power of recursion and how it allows us to process values in a repeating manner.
Both procedures are recursive, in that they generate the same result. However, the nature of the evaluation is very different.
Recursive procedures can generate an iterative or a recursive process:
- 1.
A recursive process is one where the current state of calculation is not captured by the arguments, and so it relies on “deferred” evaluations
- 2.
An iterative process is where the current state of calculation is captured completely by its arguments
In the previous examples, list-length generates a recursive process since it needs to go down to the base case and then build its way back up to do the calculations that were “deferred.” In contrast, list-length-iter generates an iterative process, since the results are captured in the arguments.
This distinction is important because the very different nature of evaluation implies a few things. For example, iterative processes evaluate faster than recursive ones.
In contrast, some algorithms cannot be written using iterative processes, as we will see later with left and right folds.
The way we implemented fact represents a recursive procedure that generates a recursive process. Rework it so that it is still a recursive procedure and generates an iterative process.
2.3.8 Procedures That Return Procedures
Note the new syntax on line 3. It states that the return value of the expression on the previous line is a procedure named f. However, on line 4, when we execute (f 1), we get an unnamed procedure. That is because lambdas are anonymous functions without a name.
Note how we define my-cons to return another procedure that accepts an argument z, and then based on that argument’s value, we return either the first or the second element.
Using the substitution method, (my-cons 1 2) evaluates to (lambda (z) (if (= z1 2)). So, this lambda (procedure) “captures” data in a sense. Then, when we call my-car or my-cdr on this procedure, we just pass 1 or 2 to get the first or the second value, respectively.
Implement a procedure so that when it’s evaluated, it returns a procedure that returns a constant number.
Hint: (lambda () 1) is a procedure that accepts no arguments and returns a constant number.
2.3.9 General Higher-Order Procedures
With the example above, we’ve seen how Racket can return procedures as return values (output). However, it can also accept procedures as arguments (input).
A higher-order procedure takes one or more procedures as arguments or returns a procedure as a result.
There are three common built-in higher-order procedures: map, filter and fold.
For the empty list, we just return the empty list.
Otherwise, we extract the first element, apply the transformation procedure, and reconstruct the list by recursively mapping the remainder of the elements.
For the empty list, just as before, we just return the empty list.
Otherwise, if a predicate matches the current element, we include it in the generation of the new list, recursively filtering the remainder of the elements.
Otherwise, we recursively filter the remainder of the elements, skipping the addition of the current one to the list.
For the empty list, we return the initial value.
Otherwise, we use the combining operator to the current element, applied to the folded remainder of the list.
For the empty list, we return the initial value.
Otherwise, we call the fold again, changing the initial value to be combined with the current element and the remainder of the list.
Note that the right fold exhibits a recursive process (think my-length), while the left one exhibits an iterative process (think my-length-iter).
Implement a procedure so that it calls a procedure that’s passed in the arguments.
Hint: (... (lambda () 1)) should return 1.
Use DrRacket’s feature to follow definitions on my-map, my-filter, my-foldr, and my-foldl to get a better understanding of how they work.
Pick some operators and predicates and use my-map, my-filter, my-foldr, and my-foldl with them on lists to see what they evaluate to.
2.3.10 Packages
A package in Racket resembles a set of definitions someone has written for others to use.
For example, if we want to use hashing procedures, we would pick a package that implements these and use them. This allows us to put our focus on the system design instead of defining everything from scratch.
Packages can be browsed at https://pkgs.racket-lang.org. They can be installed from the DrRacket GUI. When we try to use a package, we will be provided with an option to install it, given it is available in the packages repository. Alternatively, packages can be installed using raco pkg install <package_name> from the command line. We will take advantage of packages later.
Note that add-one was undefined because only the procedures we provide in the special syntax (provide ...) will be available for use by those that require the package .
2.3.11 Scope
We created a variable called my-number and assigned the number 123 to it. We also created a procedure called add-to-my-number, which adds a number (that’s passed to it as an argument) to my-number.
Scope refers to the visibility of some specific definitions, or to which parts of the program can use these definitions.
my-number is defined at the same “level” as add-to-my-number, so it is in the scope of add-to-my-number. But the x in add-to-my-number is only accessible in the body of the procedure definition and not accessible to anything outside it.
Variable “shadowing ” occurs when a variable defined in scope has the same name as a variable defined in an outer scope.
In the second example, we have a let within a let. The inner let is defining an x and so is the outer let. However, the x within the inner let will be used in the inner let’s body.
test-1 is using a-number from the global scope. test-2 is using variable shadowing for my-number, so it is the same as saying (define (test-2 x) (+ x x)).
Use DrRacket’s feature to follow definitions on test-1 and test-2.
2.3.12 Mutation
Mutation allows a variable to be redefined with a different value.
Will happily print 123 followed by 1234.
begin allows us to sequence multiple expressions, executing them in order .
This is what makes it hard to reason about programs—when some of the values are modified, some procedures might return different values for the same inputs. Thus, care must be taken when using mutation. However, we will use mutation in the peer- to-peer implementation later, which will make things slightly simpler .
2.3.13 Structures
A structure is a composite data type that defines a grouped list of variables to be placed under one name.
In Racket, the special syntax struct allows us to capture data structures and come up with a new kind of abstraction. In a sense, we already know how we can capture abstractions with car, cons, and cdr. However, struct is much more convenient since it automatically provides procedures to construct a data type and retrieve its values.
document-author, document-title, document-content extracts values from objects.
document constructs an object of such type.
document? checks whether a given object is of such type.
Then, using document on line 3, we can construct an object that is using this data structure.
The #:mutable keyword will automatically generate set-<field>! procedures for every property in the structure.
Create a person structure that contains a first name, last name, and age.
2.3.14 Threads
A thread is a sequence of instructions that can execute in parallel.
Racket has a built-in procedure thread that accepts a procedure that will run in parallel without blocking the next instruction in order.
Note how we used (thread (lambda () ...)) instead of just (thread ...). As we said, thread expects a procedure, but at the end of the evaluation, there would be the output of factorial of some number (for example 3), so (thread 3) does not make sense.
In this parallel execution, the output is not ordered as it was in the previous case. This means that the lambdas within thread are being executed in parallel, so the order of execution cannot be guaranteed.
We will use threads for parallel processing in the peer-to-peer implementation later , where we will have one thread per peer so that when we are serving one peer we don’t block the serving of other peers.
2.4 Creating an Executable
The idea behind producing an executable is so that you can run it on other computers without requiring a DrRacket installation, and also without having to share the original code. In the later chapters, we will create an executable so that the blockchain can be used and shared by others.
This code will just print the text Hello. The print procedure prints some text (similar to display), and read-bytes-line waits for user input. If we did not use read-bytes-line, it would just print and exit right away, before we could read the text.
Running the executable should show something similar to Figure 2-10. Pressing the Return key will exit the program.
2.5 Summary
Lisp is a family of programming languages and Racket belongs to the Lisp family.
Lisps have no special syntax compared to standard programming languages and syntax is defined differently in Lisp, through S-expressions.
Lisp evaluation is very similar to substitution in mathematics.
There are several primitive types: symbols, booleans, characters, strings, and lists.
Lists are special kinds of pairs.
Procedures are a way to capture abstraction. They can accept and return any kind of type, including procedures themselves. They can also be recursive.
Packages allow us to reuse code, written either by ourselves or by someone else.
Produced executables can be shared with friends so that everyone can use them.