CHAPTER 3
Understanding the Blockchain

As I discussed in Chapter 1, “What Is a Cryptocurrency?” the blockchain is not a specific type of software or installation, but rather a concept of recording contracts and transactions in a ledger that is distributed across many nodes on a network. It is easy to think of the blockchain in terms of a specific product such as Bitcoin or the many alternative coins or other startups that are beginning to use it to provide data security and proof of accuracy of data. Although in the short term, most investigations that an analyst will be presented with will be connected to online currency investigations, it is extremely likely that blockchain contracts and so-called ICOs (initial coin offerings) will make their way into cyber labs across the world.

It is also important for a detective to be able to explain the concepts when asked. I've often seen investigators made to look foolish and their competency called into question because they couldn't explain the fundamentals of a concept that they are being asked to give evidence about.

Understanding the blockchain in a conceptual way can also help an analyst better comprehend how criminals might leverage the technology to either facilitate a crime or hide their activities. This could be illegal purchasing, money laundering, or carrying out a fraud of some type—which can all be achieved on a blockchain system. When you understand the technology, you can better make the connection to understanding and predicting types of illegal activity.

A few years ago, I was working with a country's counterterrorism team that spent a huge amount of time learning how an attacker could potentially distribute a noxious chemical, even mocking up systems to see how they could work. In the same way, understanding the underlying technology and brainstorming how criminals could use it can help you stay ahead of the curve.

In this chapter, we will look at data in its raw form, which will help you understand how the blockchain works in its native view. We will also consider how to extract raw hexadecimal information from a block on the chain and analyze its key components. This enables the investigator to check the accuracy of what a tool is reporting.

In previous chapters, I have talked about the concept of a distributed ledger, given examples of the stone coins used on Yap, and played with a shared online spreadsheet. In this chapter, I will break this down to help you understand the following:

  • The structure of a block
  • The headers of a block
  • The use of hashing and the Merkle tree
  • Forks in the blockchain
  • The role of the mempool

The Structure of a Block

So, what is a blockchain? Although this may sound overly simplistic, it is a chain of blocks with each block being made up of a number of transactions clustered into a block by mining. Once a block has been mined it is essentially locked, and nothing can be added or changed. Each transaction in the block is then said to have a “confirmation.” In your mind, you can think of individual transactions making up a new block, the block being “locked” by mining, and then placed on top of the previous block. Of course, it's not really put on top of anything—the block is simply linked mathematically to the blocks that came before it.

In the case of Bitcoin, once a number of transactions have been made, they are “mined” into a block every 10 minutes or so by solving a mathematical puzzle as discussed earlier. Transactions are either in the mempool (which we'll look at in the next chapter) or part of a mined block.

Ethereum is a little different. Transactions can be in the transaction pool (txpool) or in a mined block. Blocks are mined on Ethereum about every 15 seconds.

You can see how many transactions are included in a block on Bitcoin by browsing to http://bit.ly/2fyCoRs. This will show you a live graph of transactions per block. If you choose the All Time option, you can see how the number of transactions per block has fluctuated over time, as shown in Figure 3-1.

Illustration of a live graph presenting transactions per block.

Figure 3-1: Live graph of transactions per block.

As you will learn as we progress, the block is hashed in several different ways, including being hashed together with the blocks that came before it. As more blocks are piled on top, it becomes mathematically more and more difficult to change any transactions further down the chain of blocks.

It is possible for an attacker who controls a significant percentage of a cryptocurrency's mining capacity to launch what is known as a 51% attack. If the attacker controls over half the blockchain's mining capacity, it is theoretically possible to recalculate several previous blocks and create a new fork in the chain, known as an orphan fork. So, it would be possible to “spend” 100 bitcoins, and then, using your mining majority, “fork” the chain to not include the transaction so it would never have been mined and could be spent twice (or “double-spent”). It is generally accepted that six confirmations, or six blocks that have been successfully mined on top of the block containing a certain transaction, make it practically impossible to change the chain. However, the BitcoinWiki (http://bit.ly/2x2Djjf) states that for large transactions, 144 blocks, or about one day, is “required before completing the exchange.”

To get a mental picture of a blockchain, I like to think of LEGO® bricks. Imagine the small bricks being transactions that are clipped together to make a larger block. That block is then placed on the existing LEGO® tower of blocks you have made. If you wanted to change a single brick (representing a transaction in this analogy), the rules state that you must deconstruct the tower, brick by brick, until you get to the brick to change. If the brick you want to change is near the top of the tower, this deconstruction wouldn't be too hard, with not too much work to undo it. However, the further down the tower you go, the harder it becomes.

To complicate matters, let's say it isn't just you building the tower. Many builders are sitting around, building bricks into blocks and adding them to the tower. If more than half of the builders agree to change one brick further down the chain, they can overpower the rest. In fact, if they are in a significant majority, they can even start a new tower and make that the primary one. However, if the brick (or transaction) to be changed is too far down, even the majority cannot deconstruct the tower quickly enough as new blocks are being added. This analogy illustrates a blockchain and how it is protected from tampering. I mentioned the need for there to be multiple confirmations or mined blocks to protect the chain, which can be likened to glue between the blocks. The glue becomes increasingly set depending on the number of blocks mined above it. The more blocks, the more miners are needed to potentially pull the blocks apart.
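The “glue” in this analogy can be sketched in a few lines of Python 3. This is a toy model only, not Bitcoin's real block format: the block_hash and build_chain functions and the sample transactions are made up for illustration, and each block's hash simply covers its own transactions plus the hash of the block below it.

```python
import hashlib

def block_hash(prev_hash, transactions):
    """Hash a toy block: the previous block's hash plus this block's transactions."""
    payload = prev_hash + "|" + ",".join(transactions)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(blocks):
    """Return the hash of every block in a list of transaction lists."""
    hashes = []
    prev = "0" * 64  # placeholder "previous hash" for the first block
    for transactions in blocks:
        prev = block_hash(prev, transactions)
        hashes.append(prev)
    return hashes

# Hypothetical transactions, lowest block first.
blocks = [["a->b 5", "b->c 2"], ["c->d 1"], ["d->a 3"]]
original = build_chain(blocks)

# Tamper with one transaction in the lowest block and rebuild.
tampered_blocks = [["a->b 50", "b->c 2"], ["c->d 1"], ["d->a 3"]]
tampered = build_chain(tampered_blocks)

# List the positions of the blocks whose hash no longer matches.
changed = [i for i, (o, t) in enumerate(zip(original, tampered)) if o != t]
print(changed)
```

Altering a transaction in the lowest block changes the hash of every block above it, which is why all of those blocks would have to be mined again.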

The Block Header

To help you understand block headers, I will focus on Bitcoin as an example. Every mined block has a header with a significant amount of information in it. The header in turn sits atop all the transactions that form the block as a whole. Blocks can be different sizes depending on the platform. With Bitcoin, a block is always less than 1 MB. The Bitcoin Cash fork, which launched in August 2017, can have an 8 MB block size. This type of fork is different from the orphaned forks mentioned earlier and is discussed later in this chapter. Ethereum works slightly differently by setting a cap on the “work” needed to process a transaction, a value known as “gas” (which you'll learn about in more detail in Chapter 4, “Transactions”).

As illustrated in Figure 3-2, an entire block in its raw hex is made up of the following elements:

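  • The first 4 bytes describe the size of the block. (This field appears when blocks are stored on disk; the raw hex served by a block explorer, as used later in this chapter, starts directly with the header.)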
  • The next 80 bytes make up the block header.
  • The number of transactions takes up the next 1 to 9 bytes and is variable.
  • This is followed by all the transactions.

Schematic illustration of a block header and its constituent parts.

Figure 3-2: Block header and its constituent parts.

The 80 bytes of the block header are further broken up into the following parts (see Figure 3-3):

  • Version
  • Previous Block Hash
  • Merkle Root
  • Timestamp
  • Difficulty Target
  • Nonce

Schematic illustration of the block header.

Figure 3-3: The block header.

Version

This value is a version number to track software and protocol upgrades. It's not unusual to see version numbers 2, 3, and 4, as follows:

  • Version 1 was seen in the genesis block in 2009.
  • Version 2 was introduced in Bitcoin Core 0.7.0 in 2012. Version 2 blocks required that the block height be recorded. This soft fork eventually rejected any Version 1 blocks.
  • Version 3 blocks were introduced in Bitcoin Core 0.10.0 in 2015 as a soft fork.
  • Version 4 blocks were specified in BIP65 and introduced in Bitcoin Core 0.11.2 in 2015 as a soft fork.

Unusual version numbers, on the other hand, could be an attempt to create a fork in the blockchain.

Previous Block Hash

This is the hash of the previous block header, providing a link to the block that sits directly “below” it on the blockchain. Note that the whole block is not hashed, only the header of the block. Although the version (and, for long stretches, the difficulty target) may be the same from one block to the next, the previous block hash and the Merkle root will always be different, and in practice so will the timestamp and nonce. This means that the block header hash will always be a unique value.

Merkle Root

The Merkle root is a tricky one to explain but is a value essentially derived from a construct called a Merkle tree. In the simplest terms, it is a hash of all the transactions in the block. Of course, this means that if any transaction in the block changes, it will adjust the Merkle root hash and, by extension, will also adjust the block header hash, providing data protection and resilience. However, the transactions are not simply collated and hashed. There is a structure used that has some very special qualities. This is the way it works:

  1. The transaction IDs (TXIDs) are paired.
  2. A new value is created by joining (concatenating) the pair of ID values.
  3. This new value, termed an intermediate hash, is double-hashed as follows:

    Hash=SHA256(SHA256(Intermediate Hash))

    In other words, the intermediate hash is SHA256-hashed, and then the result is hashed again, which results in a new SHA256 hash.

  4. The result is then paired with another hash created from another TXID pair and then double-hashed, and so on, until you are left with a single hash for the block: the Merkle root. If there is an odd number of transactions (or of hashes at any later level), the odd one out is paired with a copy of itself and hashed.

Figure 3-4 shows how four transactions, including the Coinbase, are hashed.

Schematic illustration of the visualization of the Merkle tree.

Figure 3-4: Visualization of the Merkle tree.

As you can see in the figure, A and B are paired, and C and D are paired. They are then hashed, and those hashes are combined and hashed.
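The pairing and double-hashing can be sketched in Python 3 as follows. This is a minimal illustration rather than production code. It assumes that the TXIDs are supplied as hex strings in the order a block explorer displays them (which is byte-reversed compared with the order used when hashing), and the three sample TXIDs are placeholders, not real transactions.

```python
import hashlib

def double_sha256(data):
    """SHA256 applied twice, returning the raw 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids):
    """Compute a Merkle root from TXIDs given as hex strings in display order."""
    if not txids:
        raise ValueError("a block needs at least one transaction")
    # Explorers display hashes byte-reversed, so flip each TXID first.
    level = [bytes.fromhex(txid)[::-1] for txid in txids]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # the odd one out is paired with itself
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    # Flip the final hash back into display order.
    return level[0][::-1].hex()

# Placeholder TXIDs for illustration only (not real transactions).
sample_txids = ["aa" * 32, "bb" * 32, "cc" * 32]
print(merkle_root(sample_txids))
```

With a single transaction in the block, the function returns that TXID unchanged, because there is nothing to pair it with.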

This method is extremely efficient if you want to check the accuracy of a single transaction, because you only have to download and verify a single “branch” of the tree. The alternative is to hash every transaction in bulk, but that would require you to re-hash every transaction to check just one. This also allows you to verify the order of the transactions by evaluating just the intermediate hashes rather than accessing the entire blockchain.

Imagine having 32 transactions as “leaves” on the tree, where the root value is only separated five steps from any one transaction. This method requires only five (rather than 32) calculations, which is extremely efficient.
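To see the five-step check in action, here is a hedged Python 3 sketch that builds a tree of 32 made-up leaves, collects the single branch needed for one of them, and verifies it against the root. The function names (dsha, merkle_branch, verify_branch) and the leaf values are assumptions for illustration only.

```python
import hashlib

def dsha(data):
    """Double SHA256, returning raw bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_branch(leaves, index):
    """Return (root, branch): branch holds one sibling hash per tree level."""
    branch = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        branch.append(level[index ^ 1])  # the partner of our current node
        level = [dsha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch

def verify_branch(leaf, index, branch, root):
    """Rebuild the path from one leaf to the root using only the branch."""
    current = leaf
    for sibling in branch:
        if index % 2 == 0:
            current = dsha(current + sibling)  # we are the left-hand node
        else:
            current = dsha(sibling + current)  # we are the right-hand node
        index //= 2
    return current == root

# 32 made-up leaves standing in for transaction hashes.
leaves = [dsha(("tx%d" % i).encode()) for i in range(32)]
root, branch = merkle_branch(leaves, 13)
print(len(branch))                                   # 5
print(verify_branch(leaves[13], 13, branch, root))   # True
```

Only the five sibling hashes in the branch are needed to tie leaf 13 to the root; the other 31 leaves never have to be downloaded.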

Timestamp

The timestamp is recorded in something called UNIX time. It is quickly recognizable as a 10-digit number that starts with 15 (until September 2020, when it will start with 16). This value represents the number of seconds since 00:00:00 UTC on 1 January 1970. The timestamp is the moment the successful miner started hashing the header (see http://bit.ly/2fDmLrG for more details from the Bitcoin developer reference).

To reverse the UNIX time into human-readable time and date information, you can use an online calculator such as the one at www.unixtimestamp.com/.

Alternatively, you can use Excel to carry out conversion on a list of UNIX timestamps by doing the following (see Figure 3-5):

Snapshot illustration of UNIX time conversion in Excel.

Figure 3-5: UNIX time conversion in Excel.

  1. Put your UNIX value into cell A1.
  2. In cell A2, type
    =(((A1/60)/60)/24)+DATE(1970,1,1)
  3. Format cell A2 as
    Custom - dd/mm/yyyy hh:mm

You can use this technique to convert multiple UNIX timestamps simultaneously.
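If you prefer to stay out of Excel, the same bulk conversion can be sketched in Python 3. The function name unix_to_utc and the three sample values are assumptions for illustration. The output format mirrors the dd/mm/yyyy layout used in the spreadsheet, and the times are reported in UTC.

```python
from datetime import datetime, timezone

def unix_to_utc(timestamp):
    """Convert a UNIX timestamp (seconds since 1 January 1970) to a UTC string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%d/%m/%Y %H:%M:%S")

# Sample values only; replace with the timestamps from your own case.
for value in [0, 1500000000, 1600000000]:
    print(value, unix_to_utc(value))
# 0 01/01/1970 00:00:00
# 1500000000 14/07/2017 02:40:00
# 1600000000 13/09/2020 12:26:40
```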

Difficulty Target

The difficulty target and the nonce work together and define what the miner is trying to find in order to solve the mathematical puzzle, or proof of work, and close the block. The difficulty target is the encoded version of the maximum value that the block's header hash must be less than or equal to.
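The “encoded version” can be unpacked with a short Python 3 sketch. The encoding rule used here comes from the Bitcoin protocol rather than from this chapter: the 4-byte field is read as a little-endian number whose top byte is an exponent and whose lower 3 bytes are a coefficient. The function name bits_to_target is an assumption, and the sample value f5861618 is the difficulty field from the raw block shown later in this chapter.

```python
def bits_to_target(bits_hex):
    """Expand the 4-byte difficulty field (as it appears in the raw header)."""
    bits = int.from_bytes(bytes.fromhex(bits_hex), "little")
    exponent = bits >> 24            # top byte: length of the target in bytes
    coefficient = bits & 0x007FFFFF  # lower 3 bytes (sign bit masked off)
    if exponent <= 3:
        return coefficient >> (8 * (3 - exponent))
    return coefficient << (8 * (exponent - 3))

target = bits_to_target("f5861618")
# Shown as 64 hex characters so it can be compared with a block hash.
print("%064x" % target)
```

Written out as 64 hex characters, this target begins with 16 zeros followed by 1686f5, so a successful header hash for that block has to be no larger than that value.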

To visualize this, open the Python file you wrote in Chapter 2, “The Hard Bit.” The text field where you put your block hash from your Nickcoin spreadsheet would be a similar value to the difficulty target:

text = ("Copy Block Hash Here")

The nonce is the value that the miner changes in the block header in order to try to meet the difficulty target. For example, to try to find a value with a certain number of zeros before it, the miner might enter the following:

for nonce in range(10000000):
    input = text+str(nonce)

This part of the code sets a nonce value by counting from 0 to 9,999,999. The value is incremented on each loop and concatenated with the hash from the text variable. This new, longer value is SHA256 hashed as follows:

hash = hashlib.sha256(input.encode()).hexdigest()

The code then looks to see if the target prefix of five zeros is present in the resulting hash. If it isn't, the code loops around and tries the next nonce value. When the target prefix is found, the result is printed. In the real world, the value is double-hashed, but the principle is the same: when the proof-of-work has been found, the block is considered to be mined. It's then added to the chain, and the Bitcoin reward is given to the miner.
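Pulling those fragments together, a self-contained version of the loop might look like the following Python 3 sketch. It is a toy, not real Bitcoin mining: it hashes once rather than twice and looks for a text prefix instead of comparing against a numeric target. The function name mine and its parameters are assumptions, and the example uses a prefix of four zeros simply so that it finishes quickly.

```python
import hashlib

def mine(text, prefix="00000", max_nonce=10000000):
    """Try nonce values until the SHA256 hash of text+nonce starts with prefix."""
    for nonce in range(max_nonce):
        candidate = text + str(nonce)
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None, None  # no solution found within max_nonce attempts

# Placeholder text standing in for the block hash from the spreadsheet.
nonce, digest = mine("Copy Block Hash Here", prefix="0000")
print(nonce, digest)
```

Each extra zero in the prefix makes the search roughly 16 times longer, which gives a feel for how a tighter difficulty target increases the work required.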

Deconstructing Raw Blocks from Hex

With this information about the makeup of a block, we are able to deconstruct the data without the need for an interpreter to do it for us. We can extract the raw hexadecimal from the blockchain and derive the data we need. Why would we want to do that?

It may be possible to extract raw blocks from a memory dump, or extract data from a wiretap. It may even be that you want to interpret the data personally so that you are not relying on the website or software to do it for you. Is the software interpreting the block correctly? How would you know if it wasn't?

The problem with the raw hex from a block is that it looks like this:

0200000066191da95594aeda1a98a19ff054a88a510754e2a4d93e0a00000000000000008485ae797312b2cb37dfb1aac11d7c5ad9dd84364bbe26ffa781853996587d9b10a06555f586161898a9870dfd070401000000010000000000000000000000000000000000000000000000000000000000000000ffffffff1b0356770506cbcde1b6e3fb084e8b873474fe192306457336e3ffceffffffff0121430696000000001976a9147f8723c3a5e64d6e1d47511863aca2f146b0a85588ac000000000100000001418bebf3dfe21ea57f50863195e6cdef756aa755087c36aa2c6d597c573892a5010000006a47304402207b9e4d1c1e126f47db3d74f981b8ee9c124f44a92637a657dc94cd4b05216a9a022014fe5df34c6e2c3b1bb1de3f69097873e220b97c0beefd29cc714abeb8180c880121030e1e08f6d4ba2b71207c961109f9d0b7eaad24b106ecc9b691c297c732d47fccffffffff0200e40b54020000001976a914a31e71f2cfc0327c55cc4026073f06f3e9e1a21a88ac00e40b54020000001976a914d6bbf4f08d2df7ea32b2930ae4b7436d4ca6fe4b88ac000000000100000003c4b4dda5204f1796e65a5d740b87d2c4540c2a6bf85fd7e779ad4b789126b94d010000006b483045022045666fd6805ab5264acdc3d2fcbffc27d0482ef1e0d5dcdb958b18db50767f05022100f53d0fd0ce7951f45beaaa835fbcf503a167026a685aed83c54dddceafdd58a401210270f83fc138312056466b13236680642afcffd493fe1866cc74baddee2cf79ba3ffffffff58a0e67c144d64408cc6fab19f8131a02ca6a46c0c1ebf4643349b0b4dc0fc7f010000006c493046022100aee71752f67d3af1bc599a30b9642765b57a59e7eaf4b4eaa41417689b662eb8022100c49d9f48d26a96

And this is only a very small part of an entire block! However, the block header is, as you would expect, at the top, so it's nice and easy to find. Most of the data is the raw transactions, which we will look at later.

To carry out analysis on the header, you will need to install or open a hex viewer or editor. For Windows, I recommend HxD, which you can download from mh-nexus.de/en/hxd/.

Once you have installed HxD, I recommend you set it up as shown in Figure 3-6, Figure 3-7, and Figure 3-8. If you are using any other hex editor, you will benefit from adjusting the byte width to 32, the byte grouping to 4, and the offset base to Decimal.

Snapshot illustration of setting the byte width to 32 bytes wide.

Figure 3-6: Set the byte width to 32 bytes wide.

Snapshot illustration of setting the byte group size to 4.

Figure 3-7: Set the byte group size to 4.

Snapshot illustration of setting the offset base to decimal.

Figure 3-8: Set the offset base to decimal.

We know that the header is 80 bytes, which corresponds to 160 characters. Browse to http://bit.ly/2xcEmP5, which is a raw block from the Bitcoin blockchain, and copy and paste more than 160 characters into your hex editor. Highlight some of the hex code, and the editor will tell you when you have highlighted characters 0 to 159 (160 in total). It also gives you the length, which will be 160. See Figure 3-9.

Snapshot illustration of raw hex from a block on the Bitcoin blockchain.

Figure 3-9: Raw hex from a block on the Bitcoin blockchain

Once you have 160 characters highlighted, delete the rest of the hex that you had copied into your editor.

Before delving into this melee of hexadecimal, you need to understand a computing concept called Endianness. This section describes the three types of Endianness: Little Endian, Big Endian, and Internal Byte Order.

Big Endian

Endianness is simply the order in which a value is written. We take for granted in Western languages that everything is written to be read from left to right on the page; however, numerous languages are read from right to left, including Hebrew, Arabic, Urdu, and others. Big and Little Endian simply answer this question: from which end do I start reading a value?

Consider, for example, the following decimal string:

1 2 3 4

This value in Big Endian format is written the same as above: 1 2 3 4.

How can you remember this? If you were to write the decimal columns over each number, you would write “Thousands” for the leftmost column, “Hundreds” for the second column, “Tens” for the third column, and “Ones” (or “Units”) for the rightmost column. Hence, the first number is the “big” end of the number, or Big Endian, because the 1 in the string is in the Thousands column.

Little Endian

This is the opposite of Big Endian. In Little Endian, the value 1 2 3 4 is recorded as 4 3 2 1. In other words, if you were to write the decimal columns, they would be in reverse: “Ones” (or “Units”) for the leftmost column, “Tens” for the second column, “Hundreds” for the third column, and “Thousands” for the rightmost column. Because the first digit recorded is from the Ones (or Units) column, you can remember this as the “little” value, hence Little Endian. In hex data such as a block header, the unit that is reordered is the byte (a pair of hex characters), so the two characters within each byte keep their order.

Internal Byte Order

Internal Byte Order is a little more complex than Big and Little Endianness.

Using the 4-byte hex string A91D1966 as an example, Internal Byte Order does several things:

  • It splits the string into its individual bytes: A9, 1D, 19, and 66.
  • The individual bytes are read in Big Endian order (left to right) as normal.
  • The bytes are then reversed, making the string Little Endian but with each individual byte remaining Big Endian, as follows: 66, 19, 1D, A9.

This is illustrated in Figure 3-10.

Schematic illustration for visualizing Internal Byte Order.

Figure 3-10: Visualizing Internal byte order.

It's not terribly complicated, but it does mess with your head a bit.
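If it helps, the byte shuffle can be tried out in Python 3. The helper name reverse_bytes is an assumption for illustration. The second line shows what happens if you reverse individual characters instead of whole bytes, which is an easy mistake to make.

```python
def reverse_bytes(hex_string):
    """Reverse the order of the bytes (pairs of hex characters) in a hex string."""
    return bytes.fromhex(hex_string)[::-1].hex().upper()

print(reverse_bytes("A91D1966"))  # 66191DA9 (bytes reversed, each byte intact)
print("A91D1966"[::-1])           # 6691D19A (characters reversed, bytes broken up)
```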

The Bitcoin header uses both Little Endian and Internal Byte Order. Take a look at Figure 3-11.

Schematic illustration showing how each entity is written in the block header.

Figure 3-11: How each entity is written in the block header.

Applying This to the Downloaded Hex

In your hex editor, you should have the 160 hex characters that make up the header of the block (this is block 358230 on the Bitcoin blockchain). Take a look at the first 4 bytes or 8 hex characters. This is the version number and Figure 3-12 shows what this looks like in Little Endian format.

Screenshot illustration of the version in Little Endian.

Figure 3-12: The version in Little Endian.

This is otherwise known as version 2 of Bitcoin. However, what if you want to find something more interesting like the date and time the block was hashed? You simply need to do some counting, which your hex editor can help you with. You know that there are 4 bytes (8 characters) for the version, 32 bytes (64 characters) for the previous block value, and another 32 bytes (64 characters) for the Merkle root value.

To find the date/time, count the next 32 bytes, which gives you the value of the previous block hash in Internal Byte Order format, as shown in Figure 3-13.

Screenshot illustration of a previous block hash in Internal Byte Order.

Figure 3-13: Previous block hash in Internal Byte Order.

Then you count the next 32 bytes, which provides the value of the Merkle root as shown in Figure 3-14.

Screenshot illustration of Merkle root in Internal Byte Order.

Figure 3-14: Merkle root in Internal Byte Order.

The next 4 bytes are the timestamp, as highlighted in Figure 3-15.

Screenshot illustration of timestamp in Little Endian.

Figure 3-15: Timestamp in Little Endian.

The timestamp is both in hex and in Little Endian format. To turn this into a value that's of use to a human, you need to do some conversion. The hex you have found is

10A06555

This has to be reversed from Little Endian one byte at a time, keeping the two hex characters of each byte together (10, A0, 65, 55 becomes 55, 65, A0, 10), so you have a value of

5565A010

You now need to convert this into a decimal value. Lots of converters are available online, such as at http://bit.ly/1ExkXDM. However, since you now have Python installed on your computer, you can use a very simple Python command to do it for you. Just follow these steps:

  1. Open a Windows command shell and type the following:
    python
    If you get an error, it's probably a path issue as I mentioned earlier. You can resolve this by adding the install path to the System Environment Variables. Or, you can simply change directory to the python folder, which is usually in the root of C. Just type the following:
    cd c:\python27 (where 27 is the python version)
    python
    You should now be presented with something like this:
    Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
  2. At the >>> prompt, type the following, which forces a conversion from base 16 to a decimal integer:
    >>> int("5565A010", 16)
    The value 5565A010 is the hex timestamp value, and the command provides a decimal result of 1432723472. This is the UNIX time value.
  3. Of course, you can use an online converter or Excel formula to change the UNIX time into the standard date and time format. However, since you are playing with your Python console for this exercise, just type the following two simple commands to do the conversion:
    >>> import time
    >>> time.asctime(time.gmtime(1432723472))
    This will return the following value, which is in UTC:
    'Wed May 27 10:44:32 2015'
    If you use time.ctime(1432723472) instead, the same moment is shown in your computer's local time zone.

I am not teaching a Python class here, but you could throw this together into a very simple little Python script. Open Notepad++, type in the following code, and save it as unix_conv.py:

import time

# raw_input exists only in Python 2; fall back to input on Python 3
try:
    read_value = raw_input
except NameError:
    read_value = input

rawtime = read_value("What's the value from the hex? ").strip()

# Split the hex into bytes (pairs of characters) and reverse their order,
# which flips the value from little to big endian
pairs = [rawtime[i:i + 2] for i in range(0, len(rawtime), 2)]
flip = "".join(reversed(pairs))

u_time = int(flip, 16)

# gmtime reports the result in UTC rather than local time
final = time.asctime(time.gmtime(u_time))

print(final)

Run the script as follows:

python unix_conv.py

It will ask for the value in the hex. (Of course, you could save the hex dump and get Python to count it out for you, but I'll leave that up to you.) It then flips it to Big Endian, converts it to decimal (as you did at the console in the preceding exercise), and then does the time conversion.

This simple technique of counting 68 bytes to find the timestamp is a good way of demonstrating how easy it is to extract these values from the raw blockchain data.
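The same counting can be handed over to Python's struct module, which reads every field of the header in one pass. The following Python 3 sketch uses the first 160 characters of the raw block shown earlier in this chapter. The function name parse_header and the dictionary keys are assumptions for illustration, and the two hashes are flipped so that they read the way a block explorer displays them.

```python
import hashlib
import struct
import time

# The first 160 hex characters (80 bytes) of the raw block shown earlier.
HEADER_HEX = (
    "02000000"                                                          # version
    "66191da95594aeda1a98a19ff054a88a510754e2a4d93e0a0000000000000000"  # previous block hash
    "8485ae797312b2cb37dfb1aac11d7c5ad9dd84364bbe26ffa781853996587d9b"  # Merkle root
    "10a06555"                                                          # timestamp
    "f5861618"                                                          # difficulty target
    "98a9870d"                                                          # nonce
)

def parse_header(header_hex):
    """Split an 80-byte Bitcoin block header into its six fields."""
    raw = bytes.fromhex(header_hex)
    if len(raw) != 80:
        raise ValueError("a block header must be exactly 80 bytes")
    # "<" = little endian, I = 4-byte unsigned integer, 32s = 32 raw bytes
    version, prev_hash, merkle, timestamp, bits, nonce = struct.unpack(
        "<I32s32sIII", raw)
    # The block hash is the double SHA256 of the 80 header bytes.
    header_hash = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    return {
        "version": version,
        "previous_block_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle[::-1].hex(),
        "timestamp": timestamp,
        "time_utc": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp)),
        "bits": "%08x" % bits,
        "nonce": nonce,
        "block_hash": header_hash[::-1].hex(),
    }

for field, value in parse_header(HEADER_HEX).items():
    print(field, value)
```

Comparing this output with what a block explorer reports for the same block is a quick way to check that a tool is interpreting the header correctly.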

Number of Transactions

You can use the same technique to enumerate the number of transactions that are in the block. This number is right after the end of the block header and is stored in Internal Byte Order format.

Directly after the header ends (remember it's 80 bytes or 160 characters long), you'll find a marker FD in this block. The marker indicates that the 2 bytes immediately after it hold the number of transactions in the block. (A block with fewer than 253 transactions has no marker; the count is stored in a single byte.) However, because the 2 bytes are in Internal Byte Order, you need to do some moving around to be able to decode the value. Follow these steps:

  1. Browse back to the raw block you were looking at previously: http://bit.ly/2xcEmP5.
  2. To make your life easier, copy and paste a few lines of hex that you ascertain by eye is about 200 or so characters. This will ensure that you have the 160 characters of header and the following bytes that you need for the number of transactions.
  3. Use HxD or your chosen hex editor to count the 160 characters to the end of the block header. (Remember that you need to have the View > Offset Base menu option set to Decimal.)

    Figure 3-16 shows the count, and Figure 3-17 shows the value that starts with the FD marker.

    Screenshot illustration of counting 80 bytes.

    Figure 3-16: Counting 80 bytes.

    Screenshot illustration of a number of transactions bytes prefixed FD.

    Figure 3-17: Number of transactions bytes prefixed FD.

    You now have the following value:

    0704
  4. To interpret this value correctly, leave the value of each byte in order but reverse the bytes as follows:
    0407
  5. Use the int Python command to change this value into its decimal equivalent, as follows:
    >>> int("0407", 16)
    This gives you a value of 1031, or 1031 transactions in this block.
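The steps above can also be sketched as a small Python 3 function that copes with all of the possible lengths of the transaction count, from 1 to 9 bytes. The marker values used here (FD, FE, and FF for 2-, 4-, and 8-byte counts) come from the Bitcoin protocol, and the function name read_tx_count is an assumption for illustration.

```python
def read_tx_count(data, offset=80):
    """Return (count, bytes_used) for the transaction count starting at offset."""
    first = data[offset]
    if first < 0xFD:
        return first, 1  # small counts fit in a single byte with no marker
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    value = int.from_bytes(data[offset + 1:offset + 1 + size], "little")
    return value, 1 + size

# The three bytes that follow the header in the example block.
print(read_tx_count(bytes.fromhex("fd0704"), offset=0))  # (1031, 3)
```

When you pass in a whole raw block, leave offset at its default of 80 so that the function starts reading directly after the header.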

Block Height

When conducting an investigation, the investigator will often stumble across a block number, which is also referred to as the block height. Going back to our LEGO® analogy, this is the number of blocks above block zero.

The first block was block zero, otherwise known as the genesis block. If a block has a number or height of 481750, then it is the 481751st block including the zero block.

In the Introduction, I mentioned that the genesis block was created by the Bitcoin founder Satoshi Nakamoto. You can see the raw block at http://bit.ly/2wAXXeJ.

Copy the entire hex code into your hex editor. If you're using HxD, remember to turn on View > Visible Columns > Hex And Text. In the text column, you will be able to read the hidden message that Satoshi left in the genesis block (see Figure 3-18).

Screenshot illustration of the genesis block with text visible.

Figure 3-18: The genesis block with text visible.
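If you don't have a hex editor to hand, the text column is easy to imitate. This Python 3 sketch shows printable ASCII characters as they are and replaces everything else with a dot, much as a hex editor does. The function name text_column and the sample bytes are made up for illustration; paste in the hex of the genesis block to read the real message.

```python
def text_column(hex_string):
    """Mimic a hex editor's text pane: printable ASCII as-is, the rest as dots."""
    raw = bytes.fromhex(hex_string)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Made-up sample: five non-printable bytes followed by some ASCII text.
sample = "04ffff001d" + "hidden text".encode("ascii").hex()
print(text_column(sample))  # .....hidden text
```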

You can find numerous sites to browse block information for all of the major cryptocurrencies. Some of my favorites include:

We will use these sites and others extensively in the investigation portion of the book, but for now, just take a moment to browse to these sites and perhaps compare how they display the information for block 481961.

Forks

Forks in a blockchain are often considered to be very complex, but in reality, this concept is not very difficult to understand. Think of it as a fork in a road with a decision to be made: Should I go left or right?

Several different types of forks relate to a blockchain and we have discussed a type of orphan fork previously. Orphan forks created by mining synchronization issues happen all the time when a block is mined virtually simultaneously by more than one miner. Let's say that miners all over the world are looking for the proof-of-work solution for block 214002. Two miners find the hash solution at almost the same time. Using the Peer-to-Peer protocol of the blockchain system, these two miners start to tell their peers that they have the solution. Imagine part of the world being told that miner A has the solution, while other nodes around the globe are being told that miner B has the solution. For a time, this means there is a fork: essentially two versions of 214002 on different parts of the network. How is this resolved?

Within a few minutes, the next block will need to be mined. It is extremely unlikely that a second block would be found by two miners simultaneously, so the branch of the fork that finds the solution to the next block is the branch that remains, while the other is orphaned. See Figure 3-19.

Schematic illustration of a mining fork causing an orphan fork to appear.

Figure 3-19: A mining fork causing an orphan fork to appear.

The more quickly blocks are set to be mined, the more likely it is that blocks will be found at virtually the same time. For example, Bitcoin's block time is 10 minutes, Fastcoin's is 30 seconds, and Ethereum's is 15 seconds.

The next type of fork is the hard fork; this is where miners accept recommended changes to the underlying software and protocols that are not compatible with the historical blockchain. Perhaps the most well-known hard fork is Bitcoin Cash, which increased the maximum block size limit from 1 MB to 8 MB, allowing around four times the number of transactions per day—an increase from approximately 250,000 to 1 million. Another example is the hard fork from Ethereum to Ethereum Classic. The name choice can be a little bit misleading—Ethereum is actually the fork, and Classic is the original. Without the gory details, a hacker managed to extract $50 million in Ether, and a new fork was made to get out of the hole and refund those who had lost money.

A soft fork is a software upgrade that is backward compatible with the previous version. Software changes are made and accepted by the mining community, but the change does not cause an underlying adjustment that isn't compatible with past mined blocks.

You will remember that when we looked at the “version” field in the raw hex of a block, it provided a software version number. When a recommendation is made for a significant upgrade that will cause either a hard or soft fork, miners simply have to change their version number in mined blocks to the new version number or leave it as it is. This creates a sort of vote and will either confirm or reject a proposal. For example, the 2015 soft fork that brought in BIP66 had 95% of the miners' hashing power in agreement to change their block version to version 3.
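The “vote” amounts to counting version numbers across recent blocks. Here is a minimal Python 3 sketch with made-up data. The function name version_support, the 1,000-block window, and the sample list of versions are assumptions for illustration, not a statement of how any particular client measures support.

```python
def version_support(versions, new_version, window=1000):
    """Count how many of the most recent blocks signal new_version or later."""
    recent = versions[-window:]
    supporting = sum(1 for v in recent if v >= new_version)
    return supporting, len(recent)

# Hypothetical version numbers taken from the last 1,000 block headers.
versions = [2] * 50 + [3] * 950
supporting, total = version_support(versions, 3)
percent = 100 * supporting / total
print("%d of %d blocks (%.0f%%)" % (supporting, total, percent))
# 950 of 1000 blocks (95%)
```

In practice the version numbers would come from the first 4 bytes of each block header, read as described earlier in this chapter.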

A soft fork has three outcomes:

  • The miners all agree, and the fork isn't really a fork, just a software change (see Figure 3-20).

    Schematic illustration of an agreed fork to a new software version.

    Figure 3-20: An agreed fork to a new software version.

  • A majority of miners agree, the new fork sticks, and the old fork slowly dies (see Figure 3-21).

    Schematic illustration of the majority of miners agree, and the old fork fades away.

    Figure 3-21: The majority of miners agree, and the old fork fades away.

  • A majority of miners disagree, and the new fork dies (see Figure 3-22).

    Schematic illustration showing that the new fork does not have support and dies.

    Figure 3-22: The new fork does not have support and dies.

As you can see, forks are really quite simple. When I'm teaching a class on this subject, I often use the illustration of a literal fork in a road. Imagine that you are in a long stretch of traffic and approach a fork. The sign says that there is a better view if you travel to the left, but traveling to the right is a little quicker. Also, depending on how many vehicles use each road, one road will stay open and the other road will be closed.

If the majority of travelers use the quicker road, that becomes the dominant fork, and the other road falls into disuse. This is similar to a hard or soft fork. In some circumstances, a significant number of cars use both roads, so then two approved routes exist. This is like a new hard fork such as Bitcoin Cash.

The Ethereum Block

The Ethereum block is different in structure from the Bitcoin block, but you will recognize a number of variables from the Bitcoin block.

Ethereum uses the following equation-based notation for each variable in a block and block header:

  • Block = B
  • Header = H or Bh
  • Transactions = Bt

The variables in the block header are very similar to the Bitcoin variables:

  • parenthash = Hp
  • ommersHash = Ho
  • beneficiary = Hc
  • number = Hi (like height)
  • timestamp = Hs (UNIX time)
  • mixhash = Hm (used with the nonce as proof of work)
  • nonce = Hn

Here are brief descriptions of each of these variables:

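Parenthash This is the same as Previous Block Hash and contains the hash of the previous block's header. Ethereum calculates this hash with Keccak-256, the algorithm from which the SHA-3 standard was derived. It is often loosely referred to as SHA3-256, although the standardized SHA3-256 produces different output. Ethereum's mining algorithm, called Ethash, is built on Keccak-256 and Keccak-512.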

Ommershash I previously mentioned that Ethereum blocks are mined every 15 seconds or so, which means that two miners can often find valid blocks almost simultaneously. A block that has been correctly mined but doesn't take its place on the main chain is called an ommer (also known as an uncle). Ethereum rewards the miner of an ommer with a lower amount than the miner of the block that carries on the blockchain. A block can reference a number of ommers, and the list of their headers is hashed and stored in the ommershash.

Beneficiary This is the Ethereum address that the block rewards are paid to.

Number This is the same as the block height in Bitcoin.

Timestamp This is the same as the timestamp in Bitcoin and is stored in UNIX time format.

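Mixhash and Nonce These play a role similar to the Nonce in Bitcoin: together they prove that the miner has carried out enough work to meet the difficulty required for the block. They are used in a slightly different way from Bitcoin to achieve a successful mining of a block, and the difficulty itself is stored in a separate header field.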

A number of other fields exist too. You can find a good article on the subject at http://bit.ly/2fDL2l3.
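To make the header fields more concrete, the following Python sketch holds a few of them in a small data structure. The class name and method names are hypothetical, and the values are taken from the API response for block 4202088 shown later in this chapter, on the assumption that the API's coinbase_addr, prev_block, and nonce fields correspond to the beneficiary, parenthash, and nonce.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class EthHeaderFields:
    """A subset of Ethereum block header fields (illustrative only)."""

    parent_hash: str  # parenthash: hash of the previous block's header
    beneficiary: str  # address that the block rewards are paid to
    number: int       # block number, comparable to Bitcoin's block height
    timestamp: int    # seconds since the UNIX epoch
    nonce: int        # 64-bit value found by the miner

    def mined_at(self) -> datetime:
        """Convert the UNIX timestamp to a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)

    def nonce_hex(self) -> str:
        """Render the nonce as 16 hex digits (8 bytes)."""
        return format(self.nonce, "016x")


# The API reports the time as an ISO 8601 string, so the UNIX value
# is derived here instead of being typed in by hand.
unix_time = int(datetime(2017, 8, 25, 10, 45, 46, tzinfo=timezone.utc).timestamp())

header = EthHeaderFields(
    parent_hash="ef6631ea51f5629bcbf501c8718fbea0fc562207aefcf115d8921d0d86fb7790",
    beneficiary="ea674fdde714fd979de3edf0f56aa9716b898ec8",
    number=4202088,
    timestamp=unix_time,
    nonce=16547416540787094411,
)

print(header.number, header.mined_at().isoformat(), header.nonce_hex())
```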

Block explorers are available specifically for Ethereum, especially etherscan.io, which is my personal favorite. If you want to take a look at an interesting block, browse to etherscan.io and search for block 3930000. In the comments section is a picture of the Ethereum founder Vitalik Buterin holding a piece of paper with the block height and hash written on it. This is an ingenious “proof-of-life” idea that is similar to holding up a newspaper with the date on it.

It's not easy to get the raw data for Ethereum; however, Blockcypher.com has an API to enable users to get at data via a format called JSON.

You can access data either via a programming language or just from a browser. Although some browsers will try to format the API data, I recommend using Firefox. Using the Add-On Manager, search for JSONView and install it. Then open a Firefox window and type the following:

https://api.blockcypher.com/v1/eth/main/blocks/4202088

This address asks the API to return information on the Ethereum (eth) block number 4202088. Your browser will return data that looks something like this:

{
  "hash": "0f408977378082b52d10510e4cdb10250d1294a92b80aeb4913c68122420ed9d",
  "height": 4202088,
  "chain": "ETH.main",
  "total": 119556993654581291234,
  "fees": 9926819550000000,
  "size": 3885,
  "ver": 0,
  "time": "2017-08-25T10:45:46Z",
  "received_time": "2017-08-25T10:45:46Z",
  "coinbase_addr": "ea674fdde714fd979de3edf0f56aa9716b898ec8",
  "relayed_by": "",
  "nonce": 16547416540787094411,
  "n_tx": 24,
  "prev_block": "ef6631ea51f5629bcbf501c8718fbea0fc562207aefcf115d8921d0d86fb7790",
  "mrkl_root": "23913adc9fb63a8a1bef28089a9cba101bda448498bb1c86b5885dfc260dca5e",
  "uncles": [
    "b6018cf273248c5db96eaefb789eaefcb6f9ee18aa3d5c749219b35e12f181f0"
  ],
  "txids": [
    "4f418d3edf4b605a1999d533ebe3bbe8b4f16f81a6bc153a192f8d6ce3502025",
    "56f560955d26edc755bddabd1a4201cb65a2523025af41faf4d40ed2822d82d2",
    "6fe55b96ad4361e4c744a28d92dcc79aef3370d744f47dc5662455a0e31c838c",
    "8cddec8de1f71c3a2a37eaf65fe293d363f69ce34e1c517ac082cb392cb5794e",
    "00bd72d2e08fe6fcb05a04469080cbb7b4b78e89d9b0756b10d5fc0ed563e0f3",
    "0c88baf1bb3b187847ff877b9e5e3d5cf89470d9073b845eda5f39f708176d2a",
    "8bb83a4f0f1a1e0a652d84cd431905f2a7fe40061447824a30f9615de0da6188",
    "a6990c8fe34caf10684f0304d54d0ea70abe245e1d3b64e34b9411f7feb5aa1b",
    "c25a648088ff9cb6c0618733a3c1e48da6d8c3df64eae205deff24ed7e55d3a0",
    "34d8669152c3c785a89f0934ac8fbdd2be959f11f6ba8b277e658e522f9654e6",
    "5cfb3ef6111b4be05d8cbe82f27f7860c48e95810734793b8a380f83e2e5632d",
    "70d864edfb7284d328496576d3a5c5e9c18646a661c0d5b654db559da6b5222f",
    "9cab5e2c79cd71ec5d625478077aadfb1d89a32f9a69d3bb3788e44be0662be9",
    "ad07108012341d5eacaf22cf5a69b79da34b930cabd8ba6e3b775e9b4867791f",
    "0003522dd3ffce8444b63407e3d562aa207b7f4670bde90ec5e36d610c41c797",
    "147aef0eccc3ba5fcfc93026deadb9abc022b1b364b355c070cdf99556a70d65",
    "f2f82dae3109be2822775940b6fcbd4f80e3af46a24fab60cb16b6f223ce1bf9",
    "897394d1ff93b70350e8bd6b73d3bad8dd58a62ab6ef1c6fbca26b4454eb3f64",
    "fcfc03b54752bf0e086527f6248b3aa931a2cd53e4a54b5b7d99a3bd6603e373",
    "406114f56d2914641b0117f896e3f63d35ca6bd35d4d113f080224ef00684e94"
  ],
  ...
}

This provides data that you can easily search or import into another database. You can use the same API at Blockcypher to access raw data from a number of cryptocurrencies including Bitcoin, Litecoin, and Dogecoin. The URL follows the same pattern as the Ethereum example, with eth replaced by the code for the coin (btc, ltc, or doge). For example, for Bitcoin:

https://api.blockcypher.com/v1/btc/main/blocks/481961

Simply replace the block number/height at the end and it will recover the raw data. Being able to get to the raw data can be very useful to an investigator, and we will be using APIs to get at information during the investigation phase of this book.
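Once the JSON has been saved or copied, a few lines of Python can pull out the fields of interest. The following is a minimal sketch: the function name is hypothetical, the sample is the block 4202088 response trimmed to a handful of fields and two transaction IDs, and the conversion assumes that the fees value is denominated in wei (10^18 wei to one ether).

```python
import json
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18


def summarise_block(raw_json: str) -> dict:
    """Extract a few key fields from a block returned by the API."""
    block = json.loads(raw_json)
    return {
        "height": block["height"],
        "hash": block["hash"],
        "time": block["time"],
        "n_tx": block["n_tx"],
        "txids_listed": len(block.get("txids", [])),
        # Assumption: fees are reported in wei, so divide to get ether.
        "fees_ether": Decimal(block["fees"]) / WEI_PER_ETHER,
    }


# Sample trimmed from the response shown earlier; only two txids are kept.
sample = """{
  "hash": "0f408977378082b52d10510e4cdb10250d1294a92b80aeb4913c68122420ed9d",
  "height": 4202088,
  "fees": 9926819550000000,
  "time": "2017-08-25T10:45:46Z",
  "n_tx": 24,
  "txids": [
    "4f418d3edf4b605a1999d533ebe3bbe8b4f16f81a6bc153a192f8d6ce3502025",
    "56f560955d26edc755bddabd1a4201cb65a2523025af41faf4d40ed2822d82d2"
  ]
}"""

summary = summarise_block(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Python reads the very large integers in the response exactly, and Decimal avoids floating-point rounding when converting them. Comparing n_tx with the number of transaction IDs actually listed is a quick way to spot a response that does not include every transaction.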

If you are using Linux, you can use the curl command to pull the data back from the API into a script or terminal. For example, the following code will create a file called test.json that contains data from the API:

    curl -o test.json https://api.blockcypher.com/v1/btc/main/blocks/481961

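You can also download a numerical series using the curl command. Let's say that you want to download every block from 481960 to 481970. You would simply put the range in square brackets in the URL, and quote the URL so that the shell does not try to interpret the brackets. Because each download written to the same filename replaces the previous one, use #1 in the output filename, which curl replaces with the current number in the range:

    curl -o "block_#1.json" "https://api.blockcypher.com/v1/btc/main/blocks/[481960-481970]"

This will work through each URL in turn and write the data to block_481960.json, block_481961.json, and so on.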

You can also use Python to extract the data, which I will discuss in more detail later in this book.
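As a taste of what that looks like, here is a minimal sketch that downloads a range of blocks using only the Python standard library. The function names, the output filenames, and the one-second pause between requests are assumptions made for this example; the URL pattern is the one used in the curl commands above.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable, List

BASE_URL = "https://api.blockcypher.com/v1"


def block_url(coin: str, height: int, network: str = "main") -> str:
    """Build the API URL for one block, for example coin='btc'."""
    return f"{BASE_URL}/{coin}/{network}/blocks/{height}"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def download_blocks(
    coin: str,
    first: int,
    last: int,
    out_dir: str,
    fetch: Callable[[str], str] = http_get,
    delay: float = 1.0,
) -> List[Path]:
    """Save blocks first..last (inclusive) as one JSON file per block."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for height in range(first, last + 1):
        body = fetch(block_url(coin, height))
        json.loads(body)  # raises ValueError if the response is not valid JSON
        path = target / f"{coin}_{height}.json"
        path.write_text(body, encoding="utf-8")
        saved.append(path)
        if height < last:
            time.sleep(delay)  # assumed courtesy pause between requests
    return saved
```

Calling download_blocks("btc", 481960, 481970, "blocks") would request the same eleven blocks as the curl range example and save each one as a separate file in a folder called blocks. Passing the fetch function as a parameter means the code can be tested with saved data instead of a live connection.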

Summary

In this chapter, you learned how blocks are structured. You used crafted URLs to extract the raw data and then used techniques to find the hex to decode dates and transaction counts. You have learned about different encoding patterns and how you can use different tools to decode them. You have also looked at the differences between Bitcoin and Ethereum headers.

Being able to get at the raw uninterpreted information is vital for an investigator to be able to check data displayed by websites and other software. In this chapter, you should have started to understand how you can get that from a blockchain.
