In this chapter we will learn the basics of blockchain, addresses, transactions, and Bitcoin’s embedded scripting language. We will generate addresses, create transactions, and send them through the network to explore the inner workings of Bitcoin’s scripting engine.
Blockchain
Bitcoin’s blockchain is an append-only database containing an ordered, back-linked list of blocks that is replicated over tens of thousands of nodes, which continuously validate new blocks and update their copies according to the current consensus. Very often we can read descriptions of blockchains as decentralized and distributed databases, which I consider only partially correct. Their decentralized nature is obvious, because every node maintains its own copy and only accepts those blocks which obey the Consensus Rules embedded in Bitcoin’s code. The second part of the definition, however, isn’t quite correct. Bitcoin’s blockchain isn’t distributed but replicated, because every node must do the same work by executing the same Consensus Rules before a block gets accepted as valid.
The topmost block is also called the “tip”. Later, when we start learning how to use Bitcoin’s RPC API, we will execute commands that give us detailed information about a blockchain’s state. Another term often used is “parent block”, which means the predecessor of a block. Every block carries a hash as its identifier and links back to the hash of its predecessor, and it is this predecessor that we call the “parent” block.
In the previous chapter, we have seen that all blocks in Bitcoin’s blockchain ultimately go back to the very first block, the Genesis Block. Every block only has a single parent, but there could exist more than one “child block” that is not yet a confirmed member of the blockchain. Each time a new candidate block arrives in the network, there is also a mining entity behind it that wants to win the racing game. Getting blocks included in the chain is the only way to get rewards. And sometimes it comes to situations where multiple miners have created competing blocks which are valid according to Consensus Rules, but because they’re representing different possible states of the future blockchain, nodes validating them will have to decide which of them will ultimately become the next “tip”.
These operations are nothing exceptional, as the design of Bitcoin allows for this flexibility under the assumption that the majority of mining nodes is honest and not willingly manipulating the chain. But as we have already seen in the previous chapter, the competition in Bitcoin and its aligned system of rewards and punishments keeps all of its participants acting honestly, because any other behavior would ultimately lead to severe monetary losses. This behavior is also a practical example of how the Emergent Consensus happens in the blockchain. By allowing several forks to exist, the Bitcoin protocol leaves enough room for nodes to find a solution, which would be very hard to achieve if there were some hard-coded rule that would ultimately decide what kind of block “deserves” to become the tip of a valid chain.
Moreover, such a rule would ultimately lead to a dangerous partition of the network, which would render Bitcoin unusable as currency, because several variants of “Bitcoin” could exist at the same time. To prevent this the Emergent Consensus exists, which allows participating nodes to resolve any “disputes” automatically. As Bitcoin nodes can join and leave the network anytime, there is always the question: What is the current state of the chain? As we will see on the following pages, a node must first take care of discovering as many peer nodes as possible and also informing them about its own existence, before it can start processing any blocks. Also, it must get enough information about the current state of the chain, for example, about the current height and which services other nodes support. A decentralized network forces every node to find its own way “out of the dark” before it can start validating blocks and transactions.
Peer-to-Peer Networking
As no Bitcoin node can function without sharing blockchain information, the question is, how does it find others in the first place? The answer lies in the peer-to-peer, or P2P, network that nodes use to discover each other and exchange data. By default, every node carries a few hard-coded IP addresses and DNS entries that help it kick off the initial discovery procedure. Those nodes, also called seed-nodes, contain information about further nodes, which our node can then use to create a more detailed topology of the network. Being decentralized, Bitcoin’s network has a flat structure without any servers so that every node must find its neighbors on its own, by trying to connect with other IP addresses. When a node contacts a peer, it starts the handshake by sending a version message that contains the following information:
The protocol version it’s following, like 70002
The services it supports, like NODE_NETWORK or NODE_WITNESS (SegWit support)
The current time
The IP address of the node it contacted
Its own IP address
A string describing the local client version, like /Satoshi:0.18.0/
Current blockchain tip
The contacted node would then analyze this message and send back an acknowledgement message called verack. After the node has contacted some of the hard-coded seed-nodes, it’d then receive a list of IP addresses of other validating nodes, which it can then use to expand its network topology. The seed-nodes are basically special DNS servers that mimic the DNS protocol by answering queries on port 53, the default DNS port. Each time a node sends a query on port 53, they send a list of IP addresses back. However, merely querying for IP addresses isn’t enough as every node also tries to make other nodes aware of its existence. Therefore, a node would also send addr messages to its peers, which would then forward them to their own peers, thus expanding the reach of the new node. Also, a node can query other nodes by sending getaddr messages to receive its peers’ address lists. The strategy of this protocol is clear: to automatically discover, expand, and update the knowledge about the network.
One can also manually define or even disable DNS and peer discovery. To disable DNS seeding completely, the Bitcoin client or daemon can be started with the option dnsseed=0. To add further nodes, one can add their IP addresses and ports in bitcoin.conf, the configuration file of the Bitcoin client and daemon, with the entry addnode. If peers are using the default TCP port 8333, then only the IP address is needed. If only a single seed-node is needed, the option seednode can be used to define the node that should be queried for peer addresses at the next client or daemon start.
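To make the fields of the version message more tangible, here is a minimal Python sketch that serializes them into a message payload. It is an illustration under stated assumptions, not Bitcoin Core’s implementation: the function names, the two IP addresses (taken from the documentation address ranges), and the block height are hypothetical, and the 24-byte message header that precedes every payload on the wire is left out.

```python
import socket
import struct
import time

NODE_NETWORK = 1 << 0  # node can serve the full blockchain
NODE_WITNESS = 1 << 3  # node supports SegWit


def net_addr(services: int, ip: str, port: int) -> bytes:
    """Serialize a network address as used inside a version message."""
    # IPv4 addresses are stored as IPv4-mapped IPv6 addresses (16 bytes).
    ipv6_mapped = b"\x00" * 10 + b"\xff\xff" + socket.inet_aton(ip)
    # Services are little-endian, the port is big-endian.
    return struct.pack("<Q", services) + ipv6_mapped + struct.pack(">H", port)


def var_str(text: str) -> bytes:
    """Length-prefixed string; this sketch only handles short strings."""
    data = text.encode("ascii")
    if len(data) >= 0xFD:
        raise ValueError("string too long for this sketch")
    return bytes([len(data)]) + data


def version_payload(peer_ip, own_ip, start_height,
                    user_agent="/Satoshi:0.18.0/", protocol=70002,
                    services=NODE_NETWORK | NODE_WITNESS,
                    port=8333, now=None, nonce=0):
    """Build the payload of a version message from the listed fields."""
    now = int(time.time()) if now is None else now
    return b"".join([
        struct.pack("<i", protocol),       # protocol version
        struct.pack("<Q", services),       # supported services
        struct.pack("<q", now),            # current time
        net_addr(0, peer_ip, port),        # contacted node (services assumed unknown)
        net_addr(services, own_ip, port),  # our own address
        struct.pack("<Q", nonce),          # random nonce (fixed here for clarity)
        var_str(user_agent),               # client version string
        struct.pack("<i", start_height),   # height of our blockchain tip
        b"\x01",                           # relay flag
    ])


# Hypothetical addresses and height, for illustration only.
payload = version_payload("203.0.113.5", "198.51.100.7", start_height=600000)
print(len(payload), "bytes")
```

A real node would additionally pick a random nonce and wrap the payload in a message header before sending it.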
The reason why peers answer with only 500 hashes is to prevent overwhelming the asking node. The asking node will then use those 500 hashes to send further getblocks messages to other nodes, which then in turn will send another 500 hashes that are located on higher positions in the chain.
This way a node can easily send many queries that can be combined into an ordered line of blocks. To get full block data, a node would have to send a message called getdata that contains the hash of the block whose data should be returned.
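The batching of 500 hashes can be simulated with a few lines of Python. The following sketch is a simplification: the “chain” is just a list of made-up hashes, the helper names are hypothetical, and the request carries only the last known hash instead of the longer block locator used by the real getblocks message.

```python
import hashlib

MAX_HASHES = 500  # upper limit of hashes returned per request


def make_chain(length):
    """Create a toy chain of hashes, each derived from its predecessor."""
    hashes, previous = [], b"\x00" * 32
    for height in range(length):
        previous = hashlib.sha256(previous + height.to_bytes(4, "little")).digest()
        hashes.append(previous)
    return hashes


def answer_getblocks(peer_chain, last_known_hash):
    """Peer side: return up to 500 hashes that follow the given hash."""
    if last_known_hash in peer_chain:
        start = peer_chain.index(last_known_hash) + 1
    else:
        start = 0
    return peer_chain[start:start + MAX_HASHES]


def sync_hashes(peer_chain, genesis_hash):
    """Asking side: repeat the request until no new hashes arrive."""
    known, rounds = [genesis_hash], 0
    while True:
        batch = answer_getblocks(peer_chain, known[-1])
        if not batch:
            break
        known.extend(batch)
        rounds += 1
    return known, rounds


peer_chain = make_chain(1235)
known, rounds = sync_hashes(peer_chain, peer_chain[0])
print(len(known), "hashes after", rounds, "requests")
```

After the hashes are known, the full blocks would be requested one by one with getdata, which this sketch leaves out.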
The whole procedure of querying and updating peers might seem too complex, but one must not forget that in decentralized networks, every participant can go offline anytime without warning. Therefore, nodes are continuously updating and validating their connections to ensure that they always have the most reliable channels, because only those can ensure access to the most recent blockchain data. The quest for peer nodes has always been a vital part of Bitcoin’s network, and in the past, it has even used the IRC chat protocol for such tasks. Modern versions of the Bitcoin Core client no longer use IRC, but the old source code is still available as an interesting artifact from Bitcoin’s past.
Node Types
Another important aspect in Bitcoin’s P2P network is the specialization of nodes. Although the protocol itself doesn’t distinguish between various types of nodes, there are configuration options available to maintain and operate nodes with different specializations. We have already mentioned the term Full Node without having properly defined it. A Full Node is any node that maintains its own wallet, has a full blockchain copy, and validates transactions without relying on any external service or node. Throughout this book we will be building Full Nodes and later also Lightning Nodes that are based on them.
But these aren’t the only available types of Bitcoin nodes. There are also SPV nodes (simplified payment verification) that maintain a lighter version of the full blockchain, which needs much less disk space, and are therefore dependent on other nodes with full blockchain copies. One of the use cases for such nodes is mobile wallet applications. As the full blockchain copy consumes more than 250GB, we couldn’t run such a node on a mobile device. But as we will see later, a mobile node only needs to download the block headers, which are always 80 bytes long. Currently, the space needed to save all those headers would be less than 50MB, which is not a problem for any modern smartphone. This is possible because blocks are chained together via their block headers so that the second part of each block, the transactions, can be discarded. Only when a mobile node needs information about a certain transaction does it have to issue a request to a Full Node that has the complete information saved in its blockchain copy. This configuration of course raises another question that has to do with the integrity of information being processed by mobile wallets. As mobile nodes would from time to time need external nodes to process their transactions, one could never achieve the same level of security and independence as is possible with Full Nodes. This is a known trade-off that can be mitigated by using one’s own Full Nodes as the only “trusted nodes” to communicate with. Many good mobile wallets offer an option to configure one or more Full Nodes as sources of transaction information.
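The storage estimate for headers is simple arithmetic, as the following Python sketch shows. The header layout (version, previous block hash, Merkle root, time, difficulty bits, nonce) adds up to 80 bytes; the tip height of 600,000 is an assumed value chosen only for illustration.

```python
import struct

# version, previous block hash, Merkle root, time, bits, nonce
HEADER_FORMAT = "<i32s32sIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def headers_storage_bytes(tip_height: int) -> int:
    """Bytes needed to store every header from height 0 up to the tip."""
    return (tip_height + 1) * HEADER_SIZE


assumed_tip_height = 600_000  # hypothetical height, for illustration only
size = headers_storage_bytes(assumed_tip_height)
print(HEADER_SIZE, "bytes per header,", round(size / 1_000_000, 1), "MB in total")
```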
Another important functionality we already talked about is routing, which is part of every functioning node. And of course, the mining functionality itself is available in every Bitcoin Core client, but only a subset of running nodes in the network are actually using it. Those functionalities can be combined as not every node has to have all of them activated. On the following pages, we will explore different node types and how they are being used.
Additionally, it uses a protocol for communication with the pool server. Very often this protocol is Stratum, but in the future, there will be other mining protocols available, for example, BetterHash, which is currently in development.
There is also another type of Full Node which uses a pruned blockchain copy to save disk space. Such nodes have the same security level as nodes with full copies but don’t keep the whole transaction history locally. Their blockchain copy is of smaller size as they don’t keep previous transactions after they have been validated.
Signatures
To control ownership in Bitcoin, we need certain technologies from the field of cryptography, which is a branch of mathematics. Usually, when we say cryptography, we think about writing secret, encrypted texts or developing algorithms that are very hard to break. But in Bitcoin we aren’t that much interested in encrypting information, as everything that’s written in its blockchain will remain public forever, because it must be accessible for validation of transactions and blocks. What we want instead from cryptography are other, maybe lesser-known tools like digital signatures and keys.
Bitcoin is not only about technology but also about ownership. Before we can spend any bitcoin, we must have provided a proof that we own it. Before someone can send us any “coin”, he or she must have included certain information that points at us as future owners. The question of ownership and its transfer is essential to Bitcoin, and this is why it relies on proven cryptographical technologies, which have been available for decades. In fact, Bitcoin rests on several foundations that predate the World Wide Web, and even the DNS protocol.
One of those foundations is Public Key Cryptography (PKC), which was invented in the 1970s. With PKC we can generate key pairs that consist of two keys, one private and the other public. We always keep the private key to ourselves and never disclose it, while the public key can be used for communication with other parties. The mathematics behind this technique is quite complex and could easily fill several books, so we will only concentrate on its utilization. We can keep both keys in our wallets, which are very often built into the software we’re using.
For example, the standard Bitcoin Core Wallet, which we’ll compile from scratch, is one such software. However, there are other types of wallets like Hardware Wallets, which are small devices that behave like air-gapped computers and never let the private key escape the security chip. Most often we’ll be keeping both types of keys in our wallets, but we could also keep private keys only, as the public ones can be easily generated based on private keys. This is due to the nature of PKC, which allows us to derive a public key from a private key, but not vice versa. Therefore, one of the most important features of PKC is the fact that it is very easy to go from private key to public key, but practically impossible to find out a private key based on the information provided by a public key. However, although very important, this is not everything we need in Bitcoin. As already mentioned, we aren’t interested in creating secret messages but instead in securing ownership of funds written in the blockchain. For this we need an additional feature from PKC, the digital signatures.
A digital signature is a number generated by a signing algorithm, and in Bitcoin it’s being used to prove the ownership of funds. If I want to spend bitcoin, I must create a transaction which shows that I am indeed the owner of this bitcoin, because otherwise Bitcoin’s scripting engine would refuse to process my transaction. Therefore, I’d use my private key together with a hash of the transaction data as input for the signing algorithm, which generates a corresponding digital signature that can later be checked for validity by using my public key. By hash function we mean a function that can take any kind of input data, whatever its length, and return a digest value of a fixed length. No matter how long our inputs are, the returned “fingerprints” will always be of the same length, but different inputs produce different fingerprints. And even if we changed only a single element of input data, the generated fingerprint would change drastically. Hash functions are often used to check the integrity of documents and to ensure that original data hasn’t been changed.
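The two properties of hash functions described here, fixed output length and drastic change on small input changes, can be observed with Python’s standard hashlib module. This is only a small demonstration with SHA256 and arbitrary sample inputs; the helper name is made up for this sketch.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA256 digest of the input as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_input = fingerprint(b"a")
long_input = fingerprint(b"a" * 1_000_000)
changed_input = fingerprint(b"b")

# Both digests have the same length, regardless of the input size.
print(len(short_input), len(long_input))

# Count how many hex characters differ after changing a single letter.
differing = sum(x != y for x, y in zip(short_input, changed_input))
print(differing, "of", len(short_input), "characters differ")
```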
Another interesting property of PKC is its asymmetry: one can check a digital signature created by a private key by using a public key that’s related to this private key. The key we use to create signatures isn’t the key for checking its validity. One can easily check the results of certain operations (output), but it’s practically impossible to calculate back to original (input) values. It is easy to check a digital signature for its validity, which is the output value of an operation done with a private key, but it’s impossible to calculate the private key itself based on a signature alone. This directly gives the answer to the question on how one could ever be able to secure funds on a public blockchain. As participants never reveal their private keys and only let others check their ownership claims via public keys and signatures, there is no way for any party to steal funds from any other.
Addresses
One example of hash functions being used in Bitcoin is addresses. Most often addresses are based on hashed values of a public key. However, this is not always the case as addresses can be generated by using various inputs as we will see later. In its infancy, the Bitcoin protocol used raw public keys as addresses without hashing them in any way. Early Bitcoin Wallets could even send funds to IP addresses. This functionality was later removed as it was prone to man-in-the-middle attacks. Over time, more complex and also more powerful address types arrived on stage.
We take one of our public keys and let SHA256 generate a hash based on it. Then we use this hash as input value for the RIPEMD160 function, which then returns another hash value. Basically, a hash of a hash. However, this is not the final value we’ll be using as our address, because in the next step, we have to encode it in Base58 format, which uses a subset of the characters of the more widely known Base64 format. Base58 omits certain ambiguous characters (0, O, I, and l) as well as the non-alphanumeric characters + and / to improve readability. In most cases Bitcoin addresses are generated by using the Base58Check format, which not only generates the final output but also allows checking for potential errors. Therefore, the addresses generated by it always contain four additional bytes at the end that represent the checksum of the address and are used for validation.
And because we have different types of data in Bitcoin, we also need to prefix our data with a single byte that represents its version. In our case this would be a zero, because we are creating a Bitcoin address. The checksum that we generate is based on the hashed value and this version byte. To get the checksum, we apply the double-SHA256 function to the version byte and the hash, take the first 4 bytes of the result, and append them to the version byte and the hash value of the public key we generated at the beginning.
Get Checksum from VersionByte+Data.
Get Base58Check format from VersionByte+Data+Checksum.
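The last steps can be sketched in Python as follows. This is a minimal illustration, not the code used by Bitcoin Core: the function names are made up, and the sketch starts from an already computed 20-byte public key hash, because RIPEMD160 isn’t available in every Python installation.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58; leading zero bytes become '1' characters."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def base58check_encode(version: int, payload: bytes) -> str:
    """VersionByte+Data, then the 4-byte checksum, then Base58."""
    versioned = bytes([version]) + payload
    checksum = double_sha256(versioned)[:4]
    return base58_encode(versioned + checksum)


def pubkey_hash_to_address(pubkey_hash: bytes) -> str:
    """Turn a 20-byte RIPEMD160(SHA256(public key)) value into an address."""
    if len(pubkey_hash) != 20:
        raise ValueError("expected a 20-byte public key hash")
    return base58check_encode(0x00, pubkey_hash)


# A hash consisting of zero bytes only, used here as an easy test input.
print(pubkey_hash_to_address(bytes(20)))
```

Because the version byte is zero and every leading zero byte is encoded as the character 1, addresses of this type always start with a 1.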
Here are a few examples of valid Bitcoin address-type variants with their prefixes in bold:
The first example represents the oldest address type that directly maps to a public key. This is the address we just generated with our pseudo-code. The other two address types are based on hashes generated from scripts that we will meet in later chapters. A script in Bitcoin means an operation that will be executed during a transaction and whose returned value will decide if this transaction will be accepted or not. The scripts could contain any condition or logic that return Boolean values (true/false). Later, we will learn how to write scripts, but for now we should keep in mind that P2SH and Bech32 addresses can represent more complex structures than mere public keys. Instead of having a hash of a public key, we now have a hash of a whole script, which is an advantage as hashes are of constant length no matter how long the input value was.
These address formats can be used freely and without any constraints. One can send funds from and to other address types without any problems. Bitcoin has different address types for historical and technical reasons. For example, P2SH-type addresses aren’t based on public keys but instead on hashes of scripts, which their respective receivers must provide and satisfy to spend the funds.
To test all those key types, one can use web sites like https://bitaddress.org. However, if you are going to generate addresses for private use, you must not generate them in an online browser as there is always a security risk that someone could eavesdrop on your communication and steal your private keys. Better download the page and go offline when creating new key pairs. Another way to create them is by using the command getnewaddress in the Console Window of the Bitcoin Core client or via the command line tool bitcoin-cli. You can also set the type and label of the address to be generated when executing this command.
If your wallet is encrypted, you will have to unlock it first with the command walletpassphrase before you can use dumpprivkey and similar operations, which reveal private keys or change the contents of a wallet. Of course, such commands should never be used in public or on machines that are not properly secured, as knowledge of private keys ultimately means ownership of funds associated with them. “Not your keys, not your coins” is a well-known saying in Bitcoin circles. Another useful command for handling addresses is dumpwallet, which is used to export all keys to a human-readable local file. The opposite command, for importing such files, is importwallet.
The public and private keys being used in Bitcoin can be represented by using different formats. The software itself relies on raw bytes, and most of the time, we won’t have to deal with those raw numbers, such as the 256-bit private keys. Instead, we will be using encoded formats like WIF (wallet import format), which combines a private key with a version prefix and a checksum.
List of available address types in Bitcoin
Type | Version byte (hex) | Base58 prefix | Example |
---|---|---|---|
P2PKH | 0x00 | 1 | 1JM25UwUUkGuKxzjjWzHGH8a556GTakJFW |
P2SH | 0x05 | 3 | 3GJAAhX7fnR4TDFoLR1gYMroA1ZRVF6Mzg |
Private key compressed (WIF) | 0x80 | K or L | KyiAsiqteZL7yg5qzuPN5HvWHZrE5MbAUtr44YfmE3KpKVcWNhqu |
Private key uncompressed (WIF) | 0x80 | 5 | 5JhEdxAiep1fATDEa6TiTf3wFDc2Q6rifhMjv2AcY6j3JqdGnMX |
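The prefixes in the table follow directly from the version byte and the length of the encoded data. As a rough sketch, the following Python code encodes a private key in WIF; the function names and the sample key are hypothetical, and such a key must of course never be used for real funds.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def private_key_to_wif(private_key: bytes, compressed: bool = True) -> str:
    """Version byte 0x80 + key (+ 0x01 if compressed) + 4-byte checksum."""
    if len(private_key) != 32:
        raise ValueError("expected a 32-byte private key")
    payload = b"\x80" + private_key + (b"\x01" if compressed else b"")
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return base58_encode(payload + checksum)


sample_key = bytes(range(1, 33))  # made-up key, for illustration only
print(private_key_to_wif(sample_key, compressed=False))  # starts with 5
print(private_key_to_wif(sample_key, compressed=True))   # starts with K or L
```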
Transactions
There’s plenty of information as we see, but where does the transfer of funds happen? We see some addresses, but there is no direct way of recognizing who gets what. To answer this question, one has to describe how Bitcoin transactions actually work. As we already know, there are no “accounts” in Bitcoin that participants could use for spending or receiving funds. Being decentralized, Bitcoin offers no option to set up any kind of registration authority for its users. And because there are no “user accounts”, the first problem we approach is: How can we safely send funds from one party to another, if there are no “parties” at all? One part of the answer lies in asymmetric cryptography and digital signatures, which take care of assigning funds to their owners and prohibiting access to anyone who’s not able to provide a valid proof of ownership. As we have seen already, digital signatures are related to private keys but can’t be used to calculate them. Only owners of private keys are able to access funds assigned to their corresponding public keys. Another important fact is the way the Bitcoin protocol sees those funds. Everything that’s not been spent is part of the global UTXO set (UTXO stands for unspent transaction output). In fact, there are no “coins” in Bitcoin. They only exist as a user-friendly concept that’s constructed by applications which implement the Bitcoin protocol. The rest of the answer we will discover shortly.
All the network sees is a set of unspent outputs from every owner, regardless when the last transaction happened. Just like with “coins”, Bitcoin knows nothing about addresses, as they’re mere user-friendly concepts and not directly visible in the blockchain. There is no way for anyone to go from one address to another or track any inputs or outputs. The only structure Bitcoin sees are small programs that get executed by its internal scripting engine. Bitcoin’s functionality relies on an embedded programming language that never got a proper name so that we call it “Script”. Later we will learn a bit more about it, but for now, we should keep in mind that all Bitcoin does is execute small scripts (programs), which ultimately decide about ownership of funds.
When we want to spend our coins, we have to state exactly which of the available UTXOs belong to us, that is, we have to provide a valid proof of ownership to the network so we can access those funds. Put more abstractly, we could say that creating transactions in Bitcoin is actually changing the state of the UTXO set. There is actually no option to move any coins in Bitcoin, not only because there are no coins, but also because there isn’t anything that could be moved at all. The only option we have, and this is the essence of every transaction, is to change the ownership information regarding some part of the UTXO set. What moves in Bitcoin is the ownership of objects and not the objects themselves.
In many ways, it would be better to use the land metaphor to describe ownership in Bitcoin than metallic coins. Just like a person can own a piece of land and hand it over to someone else, without ever being able to move it, the same happens with bitcoins, which one “sends” to someone else. In most cases, the balance is being calculated by the software used to access Bitcoin’s network. The sum of all unspent outputs is presented as a certain number of bitcoins, which is just a more convenient way to present the current UTXO state to the user.
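How wallet software derives a balance from unspent outputs can be sketched in a few lines of Python. The data model below is a strong simplification made up for this example: real outputs carry locking scripts instead of an owner name, and the transaction IDs shown are placeholders.

```python
from typing import NamedTuple


class Output(NamedTuple):
    value_sat: int  # amount in satoshis
    owner: str      # hypothetical stand-in for the locking script


# The UTXO set, keyed by (transaction id, output index).
utxo_set = {
    ("aa" * 32, 0): Output(150_000_000, "alice"),
    ("aa" * 32, 1): Output(50_000_000, "bob"),
    ("bb" * 32, 0): Output(25_000_000, "alice"),
}


def balance(utxos, owner: str) -> int:
    """A 'balance' is nothing more than the sum of one's unspent outputs."""
    return sum(out.value_sat for out in utxos.values() if out.owner == owner)


print(balance(utxo_set, "alice") / 100_000_000, "BTC")
```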
Another important rule regarding UTXOs is that they can only be consumed once and as a whole. There is no way to split up a UTXO before including it in a transaction. This of course raises the question of what should be done when a UTXO contains a value that’s higher than the one we would like to spend. If I own two bitcoins, which come from some previous transaction, but want to spend only one in a future transaction, what would happen to my other bitcoin? The rule in the Bitcoin protocol states that any bitcoins not assigned to an output automatically become part of the transaction fees paid to the miner of the respective block. But we of course wouldn’t want to lose our remaining bitcoins.
If we look at the JSON object shown before, we notice the vin entry. This is the input side of the transaction, which consumes the UTXOs that we will be using to create new outputs. Before we can create any outputs, we have to consume some inputs. This is the source of our funds, which we will assign later, in the vout section of the JSON object. There, we see two entries, each with a different value and address. This is the place where the previous input funds get sent to new addresses. In most cases, one of the entries will be a so-called change address that’s sending the difference back to the creator of the transaction. Usually, the software we use would automatically generate such an address. This is the place where we take care of getting back all the bitcoins that we don’t want to send (or let miners take from us). By following this strategy, the Bitcoin protocol elegantly avoids any kind of accounting logic, because everything that deals with the ownership of funds can be maintained by using transactions alone.
The first entry in its “vin” is coinbase, and its “vout” consists of a single entry that spends 50 BTC, which was the initial block reward before the first halving in 2012, to address 1BW18n7MfpU35q4MTBSk8pse3XzQF8XvzT. Also, we see that the given type of address is pubkey. As mentioned before, the initial protocol version only used public keys without hashing them. Nowadays, the usage of raw public keys as addresses is discouraged, and only hashes of public keys (P2PKH) or pay-to-script-hash (P2SH) should be used. Another important piece of data is the number associated with the reqSigs key. This value indicates the number of signatures required to unlock the funds. In both of the JSON objects presented, there was only a single signature needed, but Bitcoin also supports so-called multisig transactions, that is, transactions where multiple signatories (each with their own private key) create signatures that get included in a particular transaction.
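The relation between inputs, outputs, change, and fees is plain arithmetic, which the following Python sketch illustrates. The function names, the address labels, and the fee of 10,000 satoshis are assumptions chosen for this example; real wallet software picks inputs and fees with far more care.

```python
SATOSHI_PER_BTC = 100_000_000


def build_outputs(input_values, amount, fee, recipient, change_address):
    """Create the outputs of a transaction, including the change output."""
    change = sum(input_values) - amount - fee
    if change < 0:
        raise ValueError("inputs don't cover amount plus fee")
    outputs = [(recipient, amount)]
    if change > 0:
        outputs.append((change_address, change))
    return outputs


def implied_fee(input_values, outputs):
    """Whatever isn't assigned to an output is left to the miner as fee."""
    return sum(input_values) - sum(value for _, value in outputs)


# We own a single UTXO worth 2 BTC and want to send 1 BTC.
inputs = [2 * SATOSHI_PER_BTC]
outputs = build_outputs(inputs, 1 * SATOSHI_PER_BTC, 10_000,
                        "recipient-address", "our-change-address")
print(outputs, implied_fee(inputs, outputs))

# Forgetting the change output would hand the whole difference to the miner.
print(implied_fee(inputs, [("recipient-address", 1 * SATOSHI_PER_BTC)]))
```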
To later access those funds, a certain minimal number of parties is needed to prove their ownership of funds by providing signatures, which correspond to previously given public keys. There are various options to create such multi-signature transactions, for example, 2-of-3, 2-of-2, etc. Each combination means that at least a certain number of signatures must be given, before funds can be spent. So, 2-of-3 means that if there were three signatories who locked the funds, later at least two of them must give their consent to spend the funds. Such combinations are very useful when executing contracts created in the real world, outside of blockchain, where parties could get prevented from or deliberately refuse to participate in the unlocking procedure. Also, multi-sig transactions could be used to enhance the security of funds, as one could keep several keys on different machines, or hardware wallets. In case of theft, funds would still be safe, as the attacker would need access to multiple keys, which is less likely to succeed. We will later talk about them and their importance in the Lightning Network in more detail. For now, we will only think about two participants in a transaction: a sender and a recipient.
To complete the analysis of a Bitcoin transaction, we have to understand how bitcoins get spent in the first place. As there is no way to spend anything in Bitcoin without having received it via some input transaction, we will now check the parts that make an output transaction possible. This may at first sound illogical when we say that to be able to send bitcoins out of our wallet, we need an input transaction first, but don’t forget that the Bitcoin protocol knows no accounts. Therefore, the only way to move anything in Bitcoin is to change the state of the global UTXO set. What makes an output transaction applicable is the existence of a previous input transaction that we use to prove our ownership of bitcoins.
But as this transaction could have many outputs, we must point at the one where our “input bitcoins” come from. For this we are using the vout key and the number 0, which means the first entry from the list of outputs coming from this transaction. Here we declare a previous output to become our own input. But as we know, we can’t simply point at things in Bitcoin and reclaim ownership. We have to provide a valid proof of claimed ownership. To successfully complete this task, we must also include another piece of code, the unlocking script, that will be executed by Bitcoin’s scripting engine. The result of this execution must be the Boolean value of TRUE.
These “input” bitcoins have been used to generate the next transaction from the example at the beginning. But they could only have been used after the puzzle given by scriptPubKey has been solved. What we see in scriptPubKey is one-half of the complete script that has to be executed by Bitcoin’s scripting engine. The script itself contains a few keywords (OP_DUP, OP_HASH160, OP_EQUALVERIFY) and a hexadecimal value, which will be discussed later. For now, it is sufficient to know that this half of the script demands that anyone who claims to be the owner of these bitcoins must provide the rest of the script so that it can be executed successfully. What’s missing right now is the other half: a signature and a corresponding public key. This data will be included when the UTXO of this transaction should become input of another transaction, that is, when someone wants to spend those funds. The value in scriptSig from the referencing transaction we discussed previously is the unlocking logic we need to spend the funds. The moment these two halves get combined and executed is when the scripting engine checks a transaction for validity. It is also important to know that there is no state during executing of scripts. We say that Bitcoin Script environment is stateless, as there are no preconditions for any script to be executed. The Engine takes the two scripts and checks if it can produce a TRUE by executing them one after another. There is nothing else that could ever influence this execution. Either a script contains everything that’s needed and it succeeds or it fails.
In the early versions of Bitcoin, both of the scripts were executed together. This was later changed due to security reasons, because it was possible for an attacker to spend funds without being the actual owner. Nowadays, the scripting engine executes scriptSig first, and if there are no errors, it executes scriptPubKey as well. If it succeeds at the end, the new transaction will be allowed to spend the referenced input bitcoins. This new transaction will later be included in one of the upcoming blocks. But if the attempted transaction fails, it will get discarded and can’t make it into a block. This is how Bitcoin ensures that only those who possess private keys can access the corresponding funds. However, unlocking scripts don’t have to be signatures and public keys only, but in most cases the unlocking script will contain such data.
In our preceding example, the owner was able to provide valid unlocking logic, and therefore the transaction that wanted to spend 12.54 BTC made it into a block. Every transaction input references an output of some earlier transaction, because for bitcoins to be spent, they must have come from somewhere. The only exception to this rule is the coinbase transaction, the first transaction in every block, which creates new bitcoins as the miner’s reward and therefore has no preceding transaction to reference. The coinbase transaction of the Genesis Block is special in yet another way: its output can never be spent, because the original code never added it to the set of spendable outputs. The irony is that the very first 50 bitcoins are locked forever. According to some calculations, four million bitcoins have been lost due to various reasons.33
Bitcoin Script
Everything that touches the blockchain is a script.34 From a purely technological perspective, we could say that Bitcoin is a giant script execution engine, because every time we send or receive funds, we create transactions that carry little scripts of varying complexity with them. In most cases our scripts transfer ownership from one address to another. The language used to write them started its existence as an embedded part of Bitcoin’s source code. Unlike most other programming languages, it was never designed upfront, with a properly described grammar and syntax. It didn’t even have a name, which is the reason why it’s still simply called “Script”. Its semantics and structure resemble an obscure language designed in the 1970s, called Forth.35 Unlike most modern programming languages, it uses the stack as its machine abstraction. This means that it pushes data onto the stack and pops it off again during execution. To describe its algorithms, it uses Reverse Polish Notation,36 in which operators follow their operands, so that 2 + 3 is written as 2 3 +.
This is how stack machines work. In Bitcoin’s Script there is an operator, or Op-Code, named OP_ADD that’s used instead of the + sign. In general, one can push as much data onto the stack as the stack memory allows, but as soon as an operator is read, the stack machine executes it by popping as many operands as the operator expects and then pushing any results back onto the stack.
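To make the mechanics tangible, here is a minimal sketch of such a stack machine. It is not Bitcoin’s real interpreter: the function name `run` and the token format (a list of strings) are our own assumptions, and only two Op-Codes are modeled.

```python
def run(script):
    """Evaluate a tiny Script-like program given as a list of string tokens.

    Hypothetical helper for illustration; only OP_ADD and OP_EQUAL are modeled.
    """
    stack = []
    for token in script:
        if token == "OP_ADD":
            # Pop the two operands and push their sum.
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            # Pop two values and push 1 if they are equal, otherwise 0.
            b = stack.pop()
            a = stack.pop()
            stack.append(1 if a == b else 0)
        else:
            # Anything that is not an operator is treated as integer data.
            stack.append(int(token))
    return stack


# "2 3 OP_ADD 5 OP_EQUAL" is Reverse Polish Notation for (2 + 3) == 5.
print(run(["2", "3", "OP_ADD"]))                    # [5]
print(run(["2", "3", "OP_ADD", "5", "OP_EQUAL"]))   # [1]
```

A script with a missing operand, such as `["2", "OP_ADD"]`, makes `stack.pop()` raise an `IndexError` in this sketch, which corresponds to a failed script.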
When it comes to Bitcoin’s scripting engine, the execution is considered successful as long as the value left on top of the stack at the end is a Boolean TRUE, that is, any nonzero value. Only then is the respective transaction marked as valid.
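What counts as “nonzero” can be sketched as follows. The function is modeled on our understanding of how Bitcoin Core interprets a stack item as a Boolean: an empty item, an item of all zero bytes, and “negative zero” (zero bytes ending in 0x80) are false, everything else is true. The name `cast_to_bool` is our own choice for this illustration.

```python
def cast_to_bool(item: bytes) -> bool:
    """Interpret a stack item (raw bytes) as TRUE or FALSE."""
    for position, byte in enumerate(item):
        if byte != 0:
            # "Negative zero": only the sign bit of the last byte is set.
            if position == len(item) - 1 and byte == 0x80:
                return False
            return True
    # Empty item or nothing but zero bytes.
    return False


print(cast_to_bool(b""))          # False
print(cast_to_bool(b"\x00\x00"))  # False
print(cast_to_bool(b"\x80"))      # False
print(cast_to_bool(b"\x01"))      # True
```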
Of course, if we provide scripts with missing operands or wrong operators, we get an error. There are also limits on script sizes, so the freedom to push as much data as one wants exists only in theory. To prevent spam and other attacks, Bitcoin’s scripting engine imposes several constraints. As every participating node must validate each script, the possibility of running “bloated” scripts would open an attack surface for DDoS-ing37 the whole network.
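A few of these constraints can be written down as simple checks. The numeric values below reflect the limits in Bitcoin Core’s script interpreter as we know them (10,000 bytes per script, 520 bytes per pushed element, 201 non-push Op-Codes per script); the helper `check_limits` and its parameters are hypothetical and only serve as an illustration.

```python
# Limits as we understand them from Bitcoin Core's script code.
MAX_SCRIPT_SIZE = 10_000        # bytes per script
MAX_SCRIPT_ELEMENT_SIZE = 520   # bytes per pushed data element
MAX_OPS_PER_SCRIPT = 201        # non-push Op-Codes per script


def check_limits(script_size, element_sizes, non_push_ops):
    """Return a list of violated limits (empty if the script is acceptable).

    Hypothetical helper: a real node enforces these limits while parsing
    and executing the script, not in a separate pass.
    """
    problems = []
    if script_size > MAX_SCRIPT_SIZE:
        problems.append("script too large")
    if any(size > MAX_SCRIPT_ELEMENT_SIZE for size in element_sizes):
        problems.append("pushed element too large")
    if non_push_ops > MAX_OPS_PER_SCRIPT:
        problems.append("too many Op-Codes")
    return problems


print(check_limits(25, [20], 4))          # []
print(check_limits(12_000, [600], 300))   # all three limits violated
```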
This is the serialized content of the spender’s digital signature and public key. The term serialization means that the representation of data in memory gets converted into a sequence of bytes that can be stored or transmitted, for example, as a hexadecimal string in a text file. Its opposite, deserialization, is the process of reconstructing data in memory from such a stored representation.
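The round trip can be sketched in a few lines. In Script, a data element of 1 to 75 bytes is serialized as a single length byte followed by the data itself, and the sketch below handles only this case. The signature and public key are dummy byte strings of typical lengths, not real cryptographic values, and the function names are our own.

```python
def serialize_pushes(items):
    """Serialize byte strings as length-prefixed Script data pushes."""
    out = bytearray()
    for item in items:
        if not 1 <= len(item) <= 75:
            raise ValueError("sketch only handles pushes of 1 to 75 bytes")
        out.append(len(item))  # the length byte doubles as the push Op-Code
        out += item
    return bytes(out)


def deserialize_pushes(script):
    """Reconstruct the list of byte strings from the serialized form."""
    items = []
    pos = 0
    while pos < len(script):
        length = script[pos]
        if not 1 <= length <= 75:
            raise ValueError("sketch only handles pushes of 1 to 75 bytes")
        chunk = script[pos + 1 : pos + 1 + length]
        if len(chunk) != length:
            raise ValueError("truncated script")
        items.append(chunk)
        pos += 1 + length
    return items


# Dummy stand-ins: 71 bytes for a signature, 33 bytes for a public key.
signature = b"\x30" + b"\x11" * 70
public_key = b"\x02" + b"\x22" * 32

script_sig = serialize_pushes([signature, public_key])
print(script_sig.hex())  # the "physical" form, as it would appear in a dump
assert deserialize_pushes(script_sig) == [signature, public_key]
```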
In Figure 2-14 we see that Bitcoin Script has prepended the signature and public key to the locking script taken from scriptPubKey. We now have the complete script that can be executed. The execution is done by reading data and commands from left to right. In the preceding figure, we see the script execution pointing at the first entry and moving further to the right. By default, the scripting engine pushes data onto the stack until it reads an Op-Code. Op-Codes don’t get pushed onto the stack but are executed, as they aren’t data but code with certain logic embedded. Often, an Op-Code requires a certain number of elements to be popped from the stack and returns a value that is pushed back onto the stack.
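The following sketch walks through such an execution for the Op-Codes named in this chapter. It is deliberately simplified and rests on two loudly labeled assumptions: the real OP_HASH160 computes RIPEMD160(SHA256(data)), which we replace with a truncated double SHA-256 because RIPEMD160 is not available in every Python installation, and OP_CHECKSIG is replaced with a stub that performs no real signature verification. All function names are our own.

```python
import hashlib


def hash160_stand_in(data: bytes) -> bytes:
    # ASSUMPTION: stand-in for RIPEMD160(SHA256(data)); NOT the real hash.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]


def check_sig_stub(signature: bytes, public_key: bytes) -> bool:
    # ASSUMPTION: placeholder for ECDSA verification; NOT cryptography.
    return signature == b"signed-by:" + public_key


def execute(script, stack):
    """Run one script on the given stack; return False on a failed VERIFY."""
    for item in script:
        if isinstance(item, bytes):
            stack.append(item)                      # data gets pushed
        elif item == "OP_DUP":
            stack.append(stack[-1])
        elif item == "OP_HASH160":
            stack.append(hash160_stand_in(stack.pop()))
        elif item == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False
        elif item == "OP_CHECKSIG":
            public_key = stack.pop()
            signature = stack.pop()
            ok = check_sig_stub(signature, public_key)
            stack.append(b"\x01" if ok else b"")
        else:
            raise ValueError(f"unsupported Op-Code: {item}")
    return True


def verify(script_sig, script_pubkey):
    """Execute scriptSig first, then scriptPubKey on the same stack."""
    stack = []
    try:
        ok = execute(script_sig, stack) and execute(script_pubkey, stack)
    except IndexError:          # an Op-Code found too few operands
        return False
    return ok and len(stack) > 0 and any(stack[-1])


public_key = b"\x02" + b"\x22" * 32                 # dummy key bytes
locking = ["OP_DUP", "OP_HASH160", hash160_stand_in(public_key),
           "OP_EQUALVERIFY", "OP_CHECKSIG"]
print(verify([b"signed-by:" + public_key, public_key], locking))  # True
print(verify([b"forged", public_key], locking))                   # False
```

Note that `verify` needs nothing except the two scripts, which mirrors the stateless nature of the scripting environment.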
Although Script offers a wide range of Op-Codes and even branching in the form of special IF-ELSE constructs, it is still not a Turing-complete39 programming language. Among other things, this means that Script has no loop constructs like most other languages do (for-each, while-do, loop-until, and similar semantics). This design choice was deliberate, to prevent code execution that could exhaust a node’s resources. Imagine a script with a never-ending loop. This would be disastrous, as nodes could never complete their validation tasks, which would ultimately bring the blockchain to a halt.
Wallets
The term wallet in Bitcoin is both a misnomer and something that means very different things. For example, the standard Bitcoin Core application is called a wallet. Additionally, the structure that manages keys inside the application is called a wallet too. And then there are hardware wallets, small devices for keeping private keys in a secure way. But irrespective of its particular usage, the term wallet remains misleading, as it suggests that it contains bitcoins, which is impossible as those never leave the chain. More precisely, bitcoins don’t exist at all, as the Bitcoin protocol only deals with UTXOs, the unspent transaction outputs. The visual representation in the form of bitcoin balances is done in the application stack that sits above the raw Bitcoin protocol. Therefore, it’d be much better to call all those different “wallets” keychains, because that’s what they really do: keep and manage key pairs.
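How an application arrives at a “balance” can be sketched like this. The `Utxo` record, the addresses, and the amounts are invented for illustration; a real wallet would obtain its UTXOs from a node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utxo:
    """A hypothetical, simplified view of an unspent transaction output."""
    txid: str
    vout: int
    address: str
    value_sat: int  # amount in satoshis


def balance(utxos, my_addresses):
    """A 'balance' is just the sum of UTXOs locked to addresses we control."""
    return sum(u.value_sat for u in utxos if u.address in my_addresses)


# Invented sample data.
utxo_set = [
    Utxo("tx-a", 0, "addr-alice-1", 50_000),
    Utxo("tx-b", 1, "addr-alice-2", 25_000),
    Utxo("tx-c", 0, "addr-bob-1", 70_000),
]

print(balance(utxo_set, {"addr-alice-1", "addr-alice-2"}))  # 75000
```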
There exist two variants of wallets, nondeterministic and deterministic. Nondeterministic wallets have been available since the very first version of the Bitcoin Core software, and the term basically means that each key pair is created independently of any other key pair. Such wallets are sets of randomly generated key pairs. This of course makes them pretty hard to manage and restore, as we have to keep track of every key pair and also make sure not to use an address twice, which is always bad practice as it reveals too much information about our transactions and funds. In general, the usage of nondeterministic wallets is strongly discouraged.
Deterministic wallets,40 of which HD41 (hierarchical deterministic) wallets are the most widespread kind, are wallets whose private keys are derived by a one-way function from a special input called the “seed”. The seed is technically a randomly generated number that later gets combined with additional data like index numbers and “chain codes” to generate private keys. Unlike in nondeterministic wallets, the keys in deterministic wallets are not independent of each other but all go back to the same origin. The advantage is that only the initial seed is needed to restore a complete wallet, regardless of how many private keys have been generated already. The rest follows automatically.
Although sufficient for private use, simple deterministic wallets (often called Type-1), in which every key is derived directly from the seed, have a significant weak point: it is not possible to create public keys without having access to the corresponding private keys. This of course makes them unusable for use cases where keys are exposed to an untrusted public, for example, merchant applications. Imagine a web-facing shop application that offers a Pay-with-Bitcoin option. To generate a public key for each new invoice, this web server would need access to the seed and the private keys, which would make it an ideal target for attacks. In such cases we need an option to generate public keys without having access to any private key.
With Type-2 wallets we can create different branches that could, for example, represent divisions within a company. Each division could have its own “root” private key that would not disclose any of its siblings, thus making it impossible to guess other keys and their structures. Also, a branch could be solely based on a public key alone like in the web-facing merchant application mentioned previously.
Generate a random sequence of 128 to 256 bits, in a multiple of 32 bits (the following steps assume 128 bits).
Calculate a checksum of the above sequence by taking the first bits of its SHA256 hash, one bit for every 32 bits of the sequence, that is, 4 bits for a 128-bit sequence.
Expand the sequence by appending the checksum (it now becomes 132 bits long).
Split the sequence into equal 11-bit chunks (12 chunks for 132 bits).
Map those chunks to words from a 2048-word vocabulary. The resulting words form the mnemonic.
Apply the PBKDF2 algorithm,44 giving it the mnemonic and a “salt” value as inputs. Optionally, the salt can be extended with a passphrase that adds additional randomness to the procedure. If no passphrase was entered, the salt contains only the string value “mnemonic”.
The PBKDF2 algorithm then stretches the original mnemonic by applying 2048 rounds of hashing with the HMAC-SHA512 algorithm. The resulting value is 512 bits long. This is the actual seed.
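The steps above can be sketched with Python’s standard library alone. The function names are our own, and the official 2048-word vocabulary is not reproduced here, so the first function stops at the word indices; with the word list loaded into a list `wordlist` (an assumption), the mnemonic would be `" ".join(wordlist[i] for i in indices)`.

```python
import hashlib
import unicodedata


def entropy_to_indices(entropy: bytes):
    """Turn the random sequence into 11-bit word indices."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128 to 256 bits, multiple of 32")
    checksum_bits = ent_bits // 32           # 4 bits for 128 bits of entropy
    digest = hashlib.sha256(entropy).digest()
    bits = bin(int.from_bytes(entropy, "big"))[2:].zfill(ent_bits)
    bits += bin(int.from_bytes(digest, "big"))[2:].zfill(256)[:checksum_bits]
    # Split into 11-bit chunks; each chunk is an index from 0 to 2047.
    return [int(bits[i:i + 11], 2) for i in range(0, len(bits), 11)]


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Stretch the mnemonic into the 512-bit seed with PBKDF2."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# 128 zero bits are used here only because they are easy to follow.
# Never use predictable entropy for a real wallet.
indices = entropy_to_indices(bytes(16))
print(len(indices), indices)     # 12 indices: eleven times 0, then 3

# Any string works as input for this demonstration of the stretching step.
seed_plain = mnemonic_to_seed("example words only")
seed_other = mnemonic_to_seed("example words only", passphrase="hidden")
print(len(seed_plain) * 8)       # 512
print(seed_plain != seed_other)  # True: another passphrase, another seed
```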
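To obtain the master keys, the 512-bit seed is run through HMAC-SHA512 once more, and the 512-bit result is split in two: the left 256 bits become the master private key, from which the master public key is derived, and the right 256 bits become the master chain code. These keys can later be used to create “normal” private and public keys, while the chain code is used to create different key chains within the tree structure. This way we can split up the wallet into several divisions, each with its own master private key and child keys.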
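A minimal sketch of this derivation, following our reading of the BIP32 specification, looks like this. It uses HMAC-SHA512 with the fixed key “Bitcoin seed” for the master key and shows only the so-called hardened child derivation, because that variant needs no elliptic-curve arithmetic. The function names are our own, and the validity checks required by the specification (for example, rejecting a zero key) are omitted.

```python
import hashlib
import hmac

# Order of the secp256k1 curve used by Bitcoin.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def master_from_seed(seed: bytes):
    """Split HMAC-SHA512(seed) into master private key and chain code."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]      # left half: key, right half: chain code


def hardened_child(parent_key: bytes, parent_chain_code: bytes, index: int):
    """Derive a hardened child private key and its chain code.

    Sketch only: the spec's validity checks are left out.
    """
    hardened_index = index + 0x80000000  # hardened indices start at 2^31
    data = b"\x00" + parent_key + hardened_index.to_bytes(4, "big")
    digest = hmac.new(parent_chain_code, data, hashlib.sha512).digest()
    child = (int.from_bytes(digest[:32], "big")
             + int.from_bytes(parent_key, "big")) % N
    return child.to_bytes(32, "big"), digest[32:]


# Example seed bytes for demonstration only.
seed = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
master_key, master_chain_code = master_from_seed(seed)

# Two "divisions", each with its own key and chain code.
division_0 = hardened_child(master_key, master_chain_code, 0)
division_1 = hardened_child(master_key, master_chain_code, 1)
print(division_0[0].hex())
print(division_1[0].hex())
```

Because every step is deterministic, running the sketch again with the same seed reproduces exactly the same tree of keys.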
The optional passphrase should not be understood as a “password” but rather as a way of creating different wallets based on the same mnemonic. This strategy can be applied to create different “wallet variants” that serve different purposes. In extreme cases, for example, extortion, a person could “give away” the mnemonic without giving away the accompanying passphrase. The wallet based on the mnemonic alone would, for example, contain only a small amount of funds. The “correct” wallet, the one with larger amounts, could only be recreated with the passphrase. The passphrase is actually a pointer to a variant of a wallet, as there can be many of them based on the same mnemonic but with different passphrases.
To experiment with the technology behind HD wallets, the website at https://iancoleman.io/bip39/ is recommended. The various options offered there allow for testing of different scenarios. It is also possible to run this page locally, without a network connection, which should always be done when creating wallets for real-world usage.
Summary
In this chapter we have learned how Bitcoin works and what its most important parts are. We have looked into the blockchain, the peer-to-peer network with its various node types, and Bitcoin’s scripting engine. We have learned to understand scripts and have written and executed one that sends funds from one address to another. We have learned that Bitcoin only sees transactions and that everything else revolves around this concept. Without transactions Bitcoin wouldn’t be possible. Although Bitcoin knows nothing about “coins”, we have learned that various applications can help us abstract away many complexities of the underlying architecture. However, we have also learned that Bitcoin maintains a specific structure called the UTXO set, which transitions from one state to another each time a new block gets created. To send funds in Bitcoin actually means to change the UTXO set, which is the collection of all unspent transaction outputs. Finally, we have learned that the ecosystem around Bitcoin offers different options for keeping our private keys safe, from old wallet types that carry around bunches of keys to highly sophisticated industry standards like HD wallets, which help us create and maintain separate key chains. We have delved into the technology behind HD wallet creation and followed the various steps toward a complete HD wallet based on a single seed mnemonic that can be safely stored outside any software application or device.