People learn in different ways. While many people reading this book will want to learn every single bit of math related to shader development before jumping into making shaders, others will be happy to skim-read the important bits and pick up the rest as they go along. In this book, I’ve opted to give you a comprehensive look at shader math early on, with the understanding that you can skip the chapter and flick back here whenever you see fit. Throughout the book, I will provide references back to the appropriate section of this chapter whenever a new concept is introduced for those who want to pick up the important bits as they go.
In this chapter, I will introduce you to the fundamental math that you will encounter when making shaders: from vectors and matrices to trigonometry and coordinate spaces and everything in between.
Vectors
Position and Direction Vectors
We can use vectors to represent the offset between our starting point and each of those three locations – the vector between our starting point and the tree is (2, 1), because it’s two miles to the east, or the x-direction, and one mile to the north, the y-direction. In fact, vectors are great at representing the offset between any two points simply because that’s what a vector is: a quantity that has a length and a direction. That’s the one-line description almost every textbook gives, at least! In this example, the direction is pointing toward the tree, and the length is about 2.24 miles. You might have noticed that the vector starting at the rocky hill and ending at the sandy beach is also (2, 1).
There are a few things to grasp already. Firstly, we can represent any position using its offset from some origin point, which in 2D is (0, 0) – that’s why I conveniently chose that as the starting point on our map. A vector containing only zeroes is always special, as it’s the only vector with a length of zero and without a particular direction – both properties will become relevant as we explore operations on vectors. It’s got a special name too: the zero vector. Secondly, vectors can start at any point on the map. Not only are they useful for telling us where some point is in relation to the origin, but they can also tell us about the displacement between two points, neither of which is (0, 0). This is important because saying “the vector (2, 1)” could mean several different things on the same map.
Vector Addition and Subtraction
Vectors can easily be added to one another by taking each of the numbers inside the first vector and adding them to the corresponding numbers from the other vector (although both vectors must have the same dimension). Each of the values inside a vector is called a component, and they’re named like the axes of a graph: the first is the x-component, then y, then in higher dimensions z, and then w. Adding two vectors in 2D, then, is just a case of adding the x-components together and then the y-components. To figure out our position vector on the beach, let’s add up the components of the journey we took.
The sandy beach is indeed at (3, 5). Subtraction works the same way. We found a clue on the beach that’s directing us toward the tallest object in the area, so we’ll head over to the tree next – amazingly, there’s a perfectly straight concrete path that passes just by the tree. Whoever built these paths sure knows how to set up contrived math questions. Given the beach is at (3, 5) and the tree is at (2, 1), what’s the vector from the beach to the tree?
Remember that vectors have direction. We can get the answer by taking the destination vector and subtracting the starting vector. The same logic applies here: apply the subtractions component-wise.
In the context of our map, the vector (−1, −4) means “–1 miles east and –4 miles north” or, more simply, “one mile west and four miles south.”
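If you want to experiment with this, here’s a quick Python sketch of component-wise addition and subtraction (the helper names `vec_add` and `vec_sub` are my own, not part of any shader language):

```python
def vec_add(a, b):
    # Add corresponding components; both vectors must have the same dimension.
    return tuple(x + y for x, y in zip(a, b))

def vec_sub(a, b):
    # Destination minus start gives the vector from start to destination.
    return tuple(x - y for x, y in zip(a, b))

beach = (3, 5)
tree = (2, 1)
print(vec_sub(tree, beach))  # (-1, -4): one mile west, four miles south
```

In a real shader, the `+` and `-` operators do this component-wise work for you on vector types.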
Scalar Multiplication
This is an example of multiplication by a scalar. Whereas a vector has several components, a single number by itself is called a scalar, and scalars are very helpful in vector math. Multiplying by a scalar changes the length of a vector – for example, multiplying the vector (2, 1) by 3 results in the vector (6, 3), which is three times as long, while multiplying by 0.5 instead results in (1, 0.5), which is half as long. Multiplying by 1 always results in the same vector; for that reason, 1 is called the multiplicative identity, just as the vector (0, 0) is the additive identity. Multiplying a vector by –1, like we just did, always reverses the vector’s direction but preserves its length. Multiplying by any other negative number also reverses the direction, but changes the length too. And multiplying by 0 always results in the zero vector, which we discussed before. Dividing by a scalar is just multiplication in disguise – multiply by the reciprocal instead.
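Scaling a vector follows directly from the component-wise rule; here’s a small Python sketch (the function name is mine, chosen for illustration):

```python
def vec_scale(v, s):
    # Multiply every component of v by the scalar s.
    return tuple(s * x for x in v)

print(vec_scale((2, 1), 3))    # (6, 3): three times as long
print(vec_scale((2, 1), -1))   # (-2, -1): same length, opposite direction
```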
Vector Magnitude
You’ll likely be familiar with the Pythagorean Theorem, which is exactly what we use to calculate vector length (or, as it’s often called, magnitude) – I’ll use the terms length and magnitude interchangeably throughout the book. To find a vector’s magnitude, we take the square of each component, add them together, and then take the square root of the result. We represent the magnitude of a vector in formulas by putting vertical bars around it. So, for the vector (−2, −1), the length is written ∣(−2, −1)∣ and calculated like so:
So there you have it: when your teachers said Pythagoras would be useful in later life, this is what they meant. Now we know that the last leg of our treasure hunt was about 2.24 miles long, which is quite the trek. In fact, the total distance we walked throughout the day was 12.72 miles, so I hope we can afford a shower and a long rest using all that treasure.
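The whole calculation fits in a couple of lines of Python, using the standard library’s `math.sqrt` (the `magnitude` helper is my own name for it):

```python
import math

def magnitude(v):
    # Pythagoras: square each component, sum them, take the square root.
    return math.sqrt(sum(x * x for x in v))

print(magnitude((-2, -1)))  # about 2.236 miles
```

Shader languages expose this directly as a built-in `length` function.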
Vector Normalization
To normalize a vector, all we need to do is divide it by its magnitude; we’ve seen how to do each part before, so let’s put them together. I’m going to start with the vector (3, 4), which I’m going to call A. The corresponding unit vector is denoted Â (read “A hat”).
If we were to relate this back to the treasure hunt: if we had set off from base camp toward the hill at (1, 4), got tired after exactly one mile of walking, and taken a rest, then we’d be at the unit vector (1/√17, 4/√17), which is approximately (0.24, 0.97).
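Putting the two steps together, here’s a Python sketch of normalization (again, `normalize` is my own helper name; shaders provide a built-in of the same name):

```python
import math

def normalize(v):
    # Divide each component by the vector's magnitude.
    m = math.sqrt(sum(x * x for x in v))
    return tuple(x / m for x in v)

print(normalize((3, 4)))  # (0.6, 0.8): a unit vector pointing the same way
```

Note that this would divide by zero if handed the zero vector – which is exactly why the zero vector has no direction.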
Basis Vectors and Linear Combinations
So far we’ve been discussing the properties of the vectors themselves, but now it’s time to talk about the map as a whole. Earlier, I mentioned that there are infinite vectors that point in the same direction as a given nonzero vector, but what does that mean? Our map is in 2D, and as a physical object, it will have a certain size. Let’s imagine the map extends infinitely in each direction. We can represent any position on this infinite map using a real number in each of the vector’s two components. We call the map a vector space (in fact, this space in particular is called ℝ2 because it’s two-dimensional and it’s made up of real numbers, ℝ).
In many contexts we’ll see throughout the book, it will be useful to represent position vectors as a combination of vectors that are perpendicular to each other. For example, the point (1, 1, 1) in 3D (in the space ℝ3) can be obtained by adding (1, 0, 0), (0, 1, 0), and (0, 0, 1) together. We say that the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form a basis for the vector space ℝ3, because these three vectors have the following properties: they are linearly independent, because we can’t add multiples of any two of those vectors to form the third one; and they span the entire space, because we can form any vector in ℝ3 by combining multiples of these three vectors. We may not use these terms very often in computer graphics, but we will see how sets of perpendicular vectors become important in later sections.
Dot Product
There are plenty of contexts where the angle between two vectors becomes useful, such as lighting, where the angle between a light ray and a surface normal vector influences the amount of illumination falling on the object. In these contexts, we use an operation called the dot product, denoted by a ⋅ b for the vectors a and b.
Recall that ∣a∣ and ∣b∣ are the magnitudes of a and b, respectively. θ is the angle between the two vectors. There are two ways of calculating the dot product, and both ways result in a scalar value. Neat! For that reason, the dot product is sometimes called the scalar product. So what can we do with the dot product? Well, it provides an extremely efficient way of evaluating the angle between two vectors. If we combine the preceding two formulas and rearrange them, then we have a good way of calculating the cosine of the angle between the two vectors.
We can evaluate the angle between two vectors using just the cosine of the angle. For instance, if cos θ equals zero (and therefore if the dot product equals zero), the two vectors are at right angles to each other – they are perpendicular. In the lighting example I mentioned previously, the amount of light would be zero. If instead cos θ equals 1, then the two vectors are parallel – they point in the same direction. And if cos θ equals –1, they are still parallel but point in opposite directions. We can see this if we calculate the dot product between (2, 1) and (−2, −1):
Any values of cos θ between 0 and 1 mean θ is between 0° and 90°, and when cos θ is between –1 and 0, θ is between 90° and 180°. However, you will notice we had to do quite a bit of rearranging to get a formula for the cosine. What if we could avoid needing to do that? You’ll see that the denominator in Equation 2-8 relates to the length of both vectors – if that equals 1, then the dot product is exactly equal to the cosine. We have already seen a method for making sure the input vectors have a length of 1: normalization! In many of the operations we’ll be doing throughout the book, it will be important to make sure all vectors are normalized first so that we don’t need to divide after doing the dot product. Since normalization involves dividing by the length anyway, this will only be more efficient if we use the dot product on the vectors more than once, but it’s a good habit to get into regardless.
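Here’s the rearranged formula as a Python sketch – `dot` and `cos_angle` are my own helpers; in a shader you’d use the built-in `dot` and, ideally, pre-normalized inputs so the division disappears:

```python
import math

def dot(a, b):
    # Multiply corresponding components and sum the results.
    return sum(x * y for x, y in zip(a, b))

def cos_angle(a, b):
    # Divide the dot product by both magnitudes to get cos(theta).
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

print(cos_angle((2, 1), (-2, -1)))  # approximately -1: opposite directions
print(cos_angle((1, 0), (0, 1)))    # 0.0: perpendicular
```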
Cross Product
We’ve seen a type of vector multiplication that results in a scalar value. What if we wanted to get a vector result instead? What would that look like? Enter the cross product, also known as the vector product. For two vectors in 3D space, the cross product between them will produce a third vector, which is perpendicular to both original vectors. There are a few caveats. Firstly, the cross product of any vector with the zero vector always results in the zero vector – perpendicularity isn’t well-defined for the zero vector. Secondly, the cross product of a vector with any vector parallel to it (including itself) also results in the zero vector. In this instance, there isn’t a single direction perpendicular to both input vectors – in fact, there are infinite directions such a vector could point in. Also, the cross product isn’t defined in 2D because you can never obtain a third vector perpendicular to both input vectors. For two vectors in 3D, the cross product looks like this:
In the first equation, n is the unit vector perpendicular to both a and b, and θ is the angle between a and b. A useful property of the cross product equation is that if both a and b are unit vectors and are themselves perpendicular to one another, then ∣a∣, ∣b∣, and sin θ all equal 1, and the equation reduces to a × b = n. In most cases when calculating the cross product inside shaders, we can just normalize the output, provided we are certain that the resulting vector is not the zero vector.
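The component form of the cross product can be sketched in Python like so (the `cross` helper is my own; shader languages provide a built-in `cross`):

```python
def cross(a, b):
    # 3D only: returns a vector perpendicular to both a and b.
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1): perpendicular to both inputs
print(cross((2, 4, 6), (1, 2, 3)))  # (0, 0, 0): parallel inputs give zero
```

The second line demonstrates the caveat from above: parallel inputs collapse to the zero vector, which we must not normalize.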
Matrices
We have seen how vectors work, but they are not expressive enough for us to carry out every operation that computer graphics demands of us. To get the best performance out of our graphics card, we will be using matrices for some of the most expensive calculations in the graphics pipeline. Let’s start with the most obvious question: what are matrices?
A matrix (the singular form of the word – the plural is matrices) is a rectangular array of numbers organized into rows and columns. They can have any size, and we refer to the matrix size by saying a matrix is m by n, where m is the number of rows and n is the number of columns. They’re kind of like tiny Excel spreadsheets.
Each number inside the matrix – each element of the matrix – is usually a real number in shaders. Matrices are usually denoted by a capital letter (in this example, we have matrices A and B), whereas individual matrix elements are denoted by a lowercase letter with subscripts indicating which row and column the element is from. For example, a1, 2 is the element of A in the first row and the second column, which is 4. b2, 1 is in the second row and first column of matrix B, which is –8. Unlike arrays in most programming languages, matrix elements are one-indexed – sorry.
As we will see later, matrices are used heavily in computer graphics to represent transformations required for taking data from a mesh and converting it into positions on-screen. If you are writing basic shaders, it is not necessary to know how each and every matrix operation works, because Unity will provide helper functions for us – in which case, you might wish to skip to a later section on space transformations to see how matrices generally help us in the computer graphics pipeline. However, some of the shaders we will see later rely on matrix operations, so I believe it is still useful to understand how to manipulate matrices ourselves.
Sometimes, it takes a while for matrices to stick in your brain if it’s your first time using them. If you need extra worked examples or if you’d like to go further with matrices than this chapter does, then cuemath.com/algebra/solve-matrices/ is a great resource.
Matrix Addition and Subtraction
There are many operations we can do with matrices, so let’s start with the basics. The size of the matrix is crucial because some operations become incompatible between matrices of certain sizes. Let’s take addition as an example. To add two matrices, they must be the same size. Adding is simple – just take each element from the first matrix and add it with the element in the same position from the second matrix.
Subtracting two matrices works in a similar way – both matrices must be the same size. We can think of subtracting a matrix as adding the negative of that matrix; finding the negative of a matrix is as easy as negating each element of the matrix.
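Here’s a Python sketch of element-wise addition and subtraction using lists of rows. The specific matrices below are my own made-up examples (chosen only to be consistent with the elements a1, 2 = 4 and b2, 1 = –8 quoted earlier; the full matrices from the book aren’t reproduced here):

```python
def mat_add(A, B):
    # Both matrices must be the same size; add element by element.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    # Subtracting B is the same as adding the negative of B.
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 4, 2],
     [0, -3, 5]]
B = [[7, 2, 0],
     [-8, 1, 3]]
print(mat_add(A, B))  # [[8, 6, 2], [-8, -2, 8]]
```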
Scalar Multiplication
Just like we could with vectors, we can multiply a matrix by a scalar value. We take every element of the matrix and multiply each one by the scalar value, resulting in a new matrix the same size as the original one.
This is called scalar multiplication. We will see how matrix multiplication works in a bit, but first, let’s look at some operations and terminology that are unique to matrices.
Square, Diagonal, and Identity Matrices
We saw that matrices are rectangular in general, but there is a special type called a square matrix, which has the same number of rows as columns, such as a 2 × 2, 3 × 3, or 4 × 4 matrix. A diagonal matrix is even more special – it is a square matrix where every element off the diagonal line from the top left to the bottom right (called the leading diagonal) equals zero. The elements on the leading diagonal itself may still be zero. We’ll see later that these kinds of matrices have different behavior under certain operations.
An extremely important kind of matrix, the identity matrix, denoted I, is a diagonal matrix where all elements on the leading diagonal equal one. There is only one identity matrix for any given matrix dimension – here are the 2 × 2, 3 × 3, and 4 × 4 identity matrices:
Matrix Transpose
These are all interesting types of matrices, but let’s see some other matrix operations. First, there is the matrix transpose operation, denoted with a superscript T, such as AT (sometimes A′ is used), which effectively mirrors the matrix in the leading diagonal (remember – this is a diagonal line that starts in the top-left corner). The element a1, 2 in the new transposed matrix is equal to element a2, 1 in the original matrix. For the matrix A, which was 2 × 3, the matrix AT is 3 × 2.
There are a few properties of the matrix transpose operation to note. The transpose of the transpose of a matrix will return the original matrix. This makes sense – if we mirror a matrix in the leading diagonal and then mirror again, we expect to get back what we had originally.
We also find that if we add two matrices together and then take the transpose, the result is the same as if we had taken the transpose of the two matrices individually and then added them. Intuitively, this also makes sense if you think of the transpose as just moving the matrix elements to a new position: it doesn’t matter if we add elements and then move them or if we move elements and then add them – we are still adding exactly the same elements together.
If we multiply a matrix by a scalar value and then take the transpose, we get the same result as if we had taken the transpose of the matrix and then multiplied by the scalar. If you think about this in the same way as the previous example involving addition, the transpose is just moving elements around, so we are multiplying the elements by the same value in either scenario.
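The transpose is short enough to sketch in one line of Python – `zip(*M)` regroups the rows of `M` into columns (the helper name is my own):

```python
def transpose(M):
    # Mirror in the leading diagonal: rows become columns.
    return [list(col) for col in zip(*M)]

A = [[1, 4, 2],
     [0, -3, 5]]           # a 2 x 3 matrix
At = transpose(A)          # a 3 x 2 matrix
print(At)                  # [[1, 0], [4, -3], [2, 5]]
print(transpose(At) == A)  # True: transposing twice restores the original
```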
Matrix Determinant
Transposing is not the only useful matrix operation, of course! We can also calculate the matrix determinant, denoted det(A) or ∣A∣ for the matrix A. The determinant only exists for square matrices – those with the same number of rows as columns. We rarely need to calculate this ourselves, but I will include the process here for completeness. Let’s start with the determinant of a 2 × 2 matrix:
There are a few methods for calculating the determinant, but we will use the Laplace expansion, which is recursive and uses the 2 × 2 matrix determinant as its base case. The process goes like this: take any row or column of the 3 × 3 matrix – let’s choose the top row. For each element of that row, temporarily “cross out” the row and column containing it; we are left with a 2 × 2 submatrix, whose determinant we calculate using the preceding equation. Then, multiply that determinant by the element we started with.
In our case, we will be left with three values: a(ei − hf); b(di − gf); and c(dh − ge). We will combine these like so: add the value corresponding to the leftmost element, then subtract the next one, and then add the last one. Therefore, the determinant of this 3 × 3 matrix is a(ei − hf) − b(di − gf) + c(dh − ge).
In fact, if we were carrying out this process for a 4 × 4 matrix, the same rules would apply: calculate the determinant of the submatrices you obtain through “crossing out” each element of the row, then add the first, subtract the second, add the third, and subtract the fourth. This + − + − pattern extends to any size matrix. Don’t worry too much about needing to remember all this – it’s helpful to know what’s happening under the hood, but there are shader functions that do this for you.
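The recursive process above can be sketched in Python; this version works for any square matrix of size 2 × 2 or larger (the function name is my own):

```python
def det(M):
    # Laplace expansion along the top row; 2 x 2 is the base case.
    if len(M) == 2:
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    total = 0
    for col in range(len(M)):
        # "Cross out" row 0 and this column to get the submatrix.
        sub = [row[:col] + row[col + 1:] for row in M[1:]]
        total += (-1) ** col * M[0][col] * det(sub)  # the + - + - pattern
    return total

print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))  # 24: product of the diagonal
```

The second example hints at a nice property of diagonal matrices: their determinant is just the product of the leading diagonal.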
So far, we’ve looked at some great matrix operations, but none are quite as useful as the next one we’ll look at. This one is the backbone of the entire graphics pipeline, and without it, we would struggle to build an efficient method of transforming data onto the screen.
Matrix Multiplication
If we were to calculate A × BT, then this would work because A has three columns and BT has three rows. The resulting matrix will be 2 × 2. On the other hand, A × B is not a valid operation, because B only has two rows. Before even seeing how matrix multiplication works, we can already make an interesting observation: by the same rules, BT × A is also a valid multiplication, but it will result in a 3 × 3 matrix. Matrix multiplication is said to be noncommutative because the order of the inputs matters. This is different from multiplying real numbers, which is commutative – for example, 3 × 9 = 9 × 3. On the other hand, matrix multiplication is associative like real number multiplication – that is, for three matrices L, M, and N, it doesn’t matter which order we resolve the following chain of multiplications: L × M × N = (L × M) × N = L × (M × N).
How do we carry out the multiplication operation? Let’s calculate A × BT. We already know this will be a 2 × 2 matrix. To calculate the top-left element, z1, 1, we will perform a product of the first row of the first matrix with the first column of the second matrix. Similarly, the bottom-left element, z2, 1, is the product of the second row of matrix A and the first column of matrix BT. The product works by taking the first element of the row and the first element of the column and multiplying them, then moving across the row and down the column and adding their product, and so on until you’ve reached the end of both the row and the column, resulting in a single scalar value to put in the result matrix. In fact, it works the same way as the dot product for vectors that we saw earlier.
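The row-times-column rule translates directly into Python. The matrices below are my own made-up examples of the right shapes (a 2 × 3 matrix A and a 3 × 2 matrix Bt standing in for BT), not the book’s originals:

```python
def mat_mul(A, B):
    # Each output element is the dot product of a row of A with a column of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A  = [[1, 4, 2],
      [0, -3, 5]]     # 2 x 3
Bt = [[7, -8],
      [2, 1],
      [0, 3]]         # 3 x 2
print(mat_mul(A, Bt))  # [[15, 2], [-6, 12]]: a 2 x 2 result
```

Swapping the operands, `mat_mul(Bt, A)`, would instead produce a 3 × 3 result – a concrete reminder that matrix multiplication is noncommutative.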
Earlier, I introduced identity matrices. Recall that an identity matrix is square and has ones down the leading diagonal, with zeroes everywhere else. If we multiply any matrix by an identity matrix, then it is left unchanged by the operation. It doesn’t matter if the identity is first or second (as long as the sizes are compatible).
One last property of matrix multiplication is that taking the transpose after multiplying is the same as taking the transpose of each matrix individually and then multiplying in the opposite order – that is, (A × B)T = BT × AT.
Matrix Inverse
The final major matrix operation we will explore is the inverse of a matrix. I briefly mentioned that a matrix is invertible if its determinant is nonzero. But what is the inverse of a matrix? If we multiply a matrix, E, by its inverse, denoted E−1, then the result will be an identity matrix. The order of multiplication does not matter – an identity matrix is always the result, and it will have the same size as E. Let’s say that E is a 3 × 3 matrix.
And how do we calculate the inverse? For a 2 × 2 matrix, this is not too complicated to do by hand. It requires us to calculate the determinant first. Recall that the determinant of a matrix only exists if the matrix is square; this means that non-square matrices do not have an inverse. Matrices that do not have an inverse are also sometimes called singular or degenerate.
As you can see, we are dividing by the determinant of the matrix to obtain the result. Inside the matrix, we have swapped the positions of a and d and negated b and c. Let’s try an example.
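Here’s the 2 × 2 inverse as a Python sketch – swap a and d, negate b and c, divide by the determinant. The function name and example matrix are my own:

```python
def inverse_2x2(M):
    # Swap a and d, negate b and c, then divide everything by the determinant.
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: no inverse exists")
    return [[d / det, -b / det],
            [-c / det, a / det]]

M = [[4, 7],
     [2, 6]]
print(inverse_2x2(M))  # [[0.6, -0.7], [-0.2, 0.4]]
```

Multiplying `M` by this result gives the 2 × 2 identity matrix, which you can verify by hand with the row-times-column rule.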
And what about inverting a 3 × 3 or 4 × 4 matrix? Calculating the inverse gets longer and more complicated the larger the matrix is, and it’s not worth learning how to do it, since shaders provide a function to do this for you. I’ve shown you the example for 2 × 2 matrices because they are more manageable, but the process for 3 × 3 matrices and beyond will take a lot of space to explain with little payoff.
We have now seen the basic building blocks of math we will need for the rest of the book. Vector and matrix operations will form the building blocks upon which we will build the next bit of knowledge. So far, we have considered both vectors and matrices in a general sense, but now we are going to see how we can use both for purposes that are directly relevant to computer graphics.
Matrix Transformations
I mentioned previously that matrices are going to be powerful enough for us to use them in the graphics pipeline. Usually, we represent points (or vertices) in space using vectors. As it turns out, and as I have hinted toward slightly, we can consider vectors to be a special case of matrix, which has only one column or one row, depending on which way round it is written. We’ll call them column vectors and row vectors. With that in mind, it becomes possible to manipulate point vectors using matrix multiplications – there are certain operations we can represent easily using matrices, and we’re going to see how they all work, starting with scaling.
If you need an extra resource to get to grips with matrix transformations, then I recommend learnopengl.com/Getting-started/Transformations. Although the website is geared toward learning OpenGL, this section is applicable to learning computer graphics in general.
Scaling Matrices
Of course, we can choose to scale along the x-, y-, and z-axes independently. Let’s say we have a vector v = (vx, vy, vz) that we want to scale in each axis. If we wish to represent this using just vectors, we can define another vector s = (sx, sy, sz) to represent the scaling factor in each axis (if we want to scale uniformly in all axes, then sx = sy = sz). Then, we can multiply the two together component-wise. This operation is called the Hadamard product, which isn’t often discussed alongside other vector operations – we’ll denote it as v ◦ s.
This is an efficient way to perform a scaling operation, but there’s a drawback: it’s not easy to combine this with other operations. In computer graphics, we often need to apply several transformations to an object at once (e.g., translation, rotation, and scaling). If we have a thousand vertices on our object, then we can do each of those operations one after the other on all vertices, or we can combine the operations into a single matrix via matrix multiplication and perform one pass over the vertices. The latter is far more efficient, and that’s why we use matrices. So how do we represent scaling using a matrix? If you recall, the identity matrix has ones down the diagonal. If we multiply a vector by the identity matrix, it is the same as scaling by 1. Hence, if we swap out those ones for other values, we can scale by different amounts.
Remember the rules for matrix multiplication: we have a 3 × 1 matrix (or column vector), and we wish to get another 3 × 1 matrix back out, so we need a 3 × 3 matrix for the scaling operation, and we need it to be on the left of the vector. I’ll note here that, for brevity, I’ve only described how to scale about the origin. You could, theoretically, scale relative to any point in 3D space, but the math gets a lot trickier. One trick we can use in this case is to translate all points in space so that the origin is now at the desired scaling point, perform the scale, and then undo the original translation (we will see how translation works soon). Of course, scaling is not the only transformation we can apply to vertices. Let’s also see how rotation works.
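A scaling matrix is just the identity with the scale factors swapped in; here’s a Python sketch of building one and applying it to a point (helper names are my own):

```python
def mat_vec(M, v):
    # Matrix times column vector: one dot product per row.
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def scale_matrix(sx, sy, sz):
    # The scale factors replace the ones on the identity's leading diagonal.
    return [[sx, 0, 0],
            [0, sy, 0],
            [0, 0, sz]]

print(mat_vec(scale_matrix(2, 3, 1), (1, 1, 4)))  # (2, 3, 4)
```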
Rotation Matrices
Let’s see how a rotation around the z-axis works in 3D. A rotation around z will preserve the z-component of any point vector while changing the x- and y-components. A nice corollary of that fact is that any rotation in 2D can be thought of as a rotation around z, because you can think of 2D points as 3D points that have forgotten they have a z-axis (i.e., z = 0). If we wanted to carry out a rotation by angle θ around the z-axis, then conventionally the rotation would happen anticlockwise (or counterclockwise depending on where in the world you’re reading this), and it looks like this when working solely with vectors:
And, for completeness, here are the similar rotations of angle θ around the y-axis and x-axis:
How would we represent these as matrices? As we can see, it’s trickier to work out than the scaling matrix was because each output vector component sometimes depends on multiple input components. For example, when rotating about the z-axis, if the input x-component is vx, then the output x-component is vx cos θ − vy sin θ. The rotation matrices, then, are not diagonal. In order, the rotations around the z-axis, y-axis, and x-axis by angle θ are represented by a matrix as such:
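A quick Python sketch of the z-axis rotation matrix, applied to a point (function names are mine; shader code would use a precomputed matrix rather than building one per vertex):

```python
import math

def rotation_z(theta):
    # Anticlockwise rotation by theta (in radians) around the z-axis.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0],
            [s, c, 0],
            [0, 0, 1]]

def mat_vec(M, v):
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# Rotating (1, 0, 0) by 90 degrees should land on (0, 1, 0), up to rounding.
x, y, z = mat_vec(rotation_z(math.pi / 2), (1, 0, 0))
print(round(x, 9), round(y, 9), round(z, 9))
```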
Take a few minutes to try out a few example rotations yourself by multiplying a vector by any of these matrices. As you follow the matrix multiplication steps, take note of which calculations you’re doing – each output component is a row-times-column dot product, exactly as in the matrix multiplication we covered earlier. And what if we wanted to rotate by angle θ about an arbitrary axis other than the x-, y-, or z-axis? Like we saw with scaling, we need to transform the entire world so that the desired rotation axis aligns with one of those three, then perform the rotation around that axis, and then undo the transformations we did in the first place. In this case, we can perform rotations around the x-axis by angle ψ and the y-axis by angle φ to do the initial alignment such that the desired rotation axis lies on the z-axis, then rotate around the z-axis by angle θ, and then rotate back around the y-axis by angle (−φ) and the x-axis by angle (−ψ). Let’s call the arbitrary rotation Rnew.
This is a great example of how matrix multiplication can help us. Instead of needing to perform each of these rotations on each and every vertex one after the other, we can combine all five rotations into a single matrix via multiplication like this, so we only need to multiply each vertex by one matrix. Note the order of rotations – since matrix multiplication is noncommutative, we must put the matrices in exactly this order. That said, matrix multiplication is associative, so it doesn’t matter which order we resolve each multiplication operation in once they’ve been written out like this. If we want to rotate about an arbitrary point other than the origin, then the process is like scaling – translate the entire space such that the arbitrary point lies at the origin, perform your desired rotation, and then undo the translation. On that note, it’s time to see how translation in 3D space works.
Translation Matrices
With vectors, this is very easy to represent using vector addition; if we wish to move a point vector v = (vx, vy, vz) by an offset t = (tx, ty, tz), then we can represent that like so:
Since it’s so easy to represent translation using vector addition, we can now go ahead and do the same thing we did with rotation and scaling and turn this into a matrix. But wait – we run into a problem if we try. With a 3 × 3 matrix, putting any value other than 1 along the leading diagonal will scale the points, which we don’t want, and putting any value other than 0 in any of the other positions means that the output value for one of the components will depend in part on the input value of a different component, which we also don’t want. That’s because the rotation and scaling matrices we wrote assume that we are rotating or scaling around the origin, but translation is moving the origin. Unfortunately for us, we really need to represent translation as a matrix operation if we are to harness the full benefit of using matrices, so we are going to need more information.
Homogeneous Coordinates
Let’s rethink the way we represent points using vectors. Right now, if we wish to represent a 2D point using a vector, we use a vector with two elements, x and y. For 3D, we add the component z. This is the most intuitive way to represent points (and directions) because we can separate out each component and see exactly where a point is along each axis. But this is not the only way of representing points. Let’s say we have the point (x, y, z) in Cartesian coordinates (the system we’ve been implicitly using until now). I could just as easily represent it using the vector (x, y, z, 1), which has four components instead of three. These are called homogeneous coordinates, and the fourth component is usually labeled w.
There are some quirks to using the new system over the old one. Firstly, any two homogeneous vectors that are nonzero scalar multiples of one another represent the same point in 3D space. The vectors (1, 2, 3, 1) and (2, 4, 6, 2) represent the same point because each element of the second is twice the corresponding element of the first. This won’t be relevant just yet, but we will revisit this fact later. For now, all we will do is set the w component to 1. It’s also worth noting that we can get back to Cartesian coordinates by dividing each component by w and then removing the fourth component.
Now, what impact does this have on the hypothetical translation matrix? Since we are now using four-element vectors (which could be considered 4 × 1 matrices), we will need to use a 4 × 4 matrix. As we established previously, the translation values can’t be inside the upper-left 3 × 3 part of the matrix, and we have the added constraint that we need the w component to stay as 1 after the transformation. Let’s see what such a matrix looks like – I’ll include my working out for the intermediate steps.
Fantastic! Now we have a matrix for translation. Take a couple of minutes to step through each bit of the multiplication to understand why this wasn’t possible with just a 3 × 3 matrix. On that note, we won’t be able to multiply the 3 × 3 transformation matrices for rotation and scaling that we previously worked out by the new 4 × 4 translation matrix, because the sizes are now incompatible. We need to pad out the matrices with something – in each case, we add a fourth column and fourth row containing zeroes, apart from the lower-right element, which is always 1.
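To make the structure of the translation matrix concrete, here is a minimal NumPy sketch (the function name is mine; the matrix layout assumes column vectors, i.e., matrix times point):

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 translation matrix in homogeneous coordinates.
    The offsets live in the fourth column; the upper-left 3x3
    block stays as the identity, so nothing gets scaled or
    rotated, and the bottom row keeps w equal to 1."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

point = np.array([1.0, 2.0, 3.0, 1.0])  # w = 1
moved = translation(5, 0, -2) @ point   # expect (6, 2, 1, 1)
```

Because the translation values multiply the constant w = 1 component, they get added to x, y, and z unchanged — precisely the trick a 3 × 3 matrix could not pull off.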
By now, we are armed with the basic knowledge we’ll need for tackling the computer graphics pipeline. It’s time to see how each piece of the puzzle we’ve seen so far fits into the pipeline as a whole and understand how the math we’ve seen helps us move data from one stage of the pipeline to the next.
Space Transformations
Learn OpenGL has a dedicated section about coordinate transforms at learnopengl.com/Getting-started/Coordinate-Systems.
Object-to-World Space Transformation
Relating this to Unity in particular, each GameObject in your scene has a Transform component, which specifies the position, rotation, and scale of the object. The model matrix – the one that transforms from object to world space – contains each of these transformations inside a single matrix. Thankfully, we’ve covered each of these transformations already, so we will work through an example using what we learned previously.
We are going to transform the point v = (vx, vy, vz) by translating it by t = (tx, ty, tz) and scaling it by a factor of s = (sx, sy, sz), and, for the sake of simplicity, we’ll rotate only around the z-axis by an angle of θ (the real graphics pipeline can rotate around an arbitrary axis or perform multiple rotations). Matrix multiplication is noncommutative, so the order in which we apply each operation matters – which order should we choose? In this context, we know that the scaling and rotation operations assume that our point is relative to the origin, so we should scale and rotate first and leave the translation as the final operation. Let’s work through the example. Remember that we’re using homogeneous coordinates, so we’ll be transforming a slightly modified point v′ = (vx, vy, vz, 1).
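The composition above can be sketched in NumPy (an illustrative example, not the book's code; helper names are mine, and the convention is column vectors, so the matrix nearest the point vector is applied first):

```python
import numpy as np

def scale(sx, sy, sz):
    """4x4 scaling matrix: factors along the leading diagonal."""
    return np.diag([sx, sy, sz, 1.0])

def rotate_z(theta):
    """4x4 rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

def translation(tx, ty, tz):
    """4x4 translation matrix: offsets in the fourth column."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

# Scale first, then rotate, then translate.
model = translation(2, 1, 0) @ rotate_z(np.pi / 2) @ scale(2, 2, 2)

v = np.array([1.0, 0.0, 0.0, 1.0])
# scale -> (2, 0, 0); rotate 90 degrees about z -> (0, 2, 0);
# translate -> (2, 3, 0)
result = model @ v
```

Swapping the order (say, translating before rotating) would rotate the point around the origin after it had been moved, giving a completely different result — which is why the order matters.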
We won’t only be transforming this single point, however. These matrices operate on every point in the mesh, but as I mentioned, we will multiply all the matrices together once and use the result on every vertex. There is an extra wrinkle involved – what if the GameObject under consideration is a child of some other GameObject? In that case, we can just evaluate from the bottom of the hierarchy upward: we apply the model matrix of the object under consideration, then apply the model matrix of its parent, and so on until we reach the topmost object. This process can be optimized by calculating the model matrix for each GameObject only once and keeping it in memory, since the model matrix of any one object might be used several times.
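The bottom-up evaluation can be sketched with a pair of hypothetical local matrices (the hierarchy and values here are invented for illustration; NumPy is used for the matrix math):

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 translation matrix in homogeneous coordinates."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

# Hypothetical hierarchy: the child sits 1 unit along x in its
# parent's space, and the parent sits at (0, 5, 0) in the world.
child_local = translation(1, 0, 0)
parent_model = translation(0, 5, 0)

# Walking from the bottom of the hierarchy upward means the
# parent's matrix is applied after (to the left of) the child's.
child_world = parent_model @ child_local

origin = np.array([0.0, 0.0, 0.0, 1.0])
child_origin_in_world = child_world @ origin  # expect (1, 5, 0)
```

Caching `child_world` once per frame is the optimization mentioned above: any grandchildren can reuse it instead of re-walking the hierarchy.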
Now that every vertex is in world space, let’s think about the next step. When rendering objects to the screen, we need some viewpoint within the world to use as our frame of reference, and we usually call this the camera. You can see it in Figure 2-11. Unity provides a Camera component we can attach to a GameObject for this reason; although we can have more than one, for the sake of simplicity, let’s assume there is only one and that it will render to the full screen. The next step is to transform everything relative to the camera.
World-to-View Space Transformation
View-to-Clip Space Transformation
Game cameras come in two flavors: orthographic and perspective (although strawberry and chocolate would be better in my opinion). Each type dictates the shape of the camera’s view volume – objects outside this volume will not be “seen” by the camera.
The projection matrix for an orthographic camera is constructed differently from one for a perspective camera. The orthographic variant is easier to create, and it looks like the following:
We can discern a lot about what the orthographic projection matrix is doing just by looking at it. In the rightmost column, we can see classic signs of a translation – this is repositioning each vertex such that the center of the view volume becomes (0, 0, 0). The values along the leading diagonal represent a scaling operation such that the edges of the viewing volume are bounded between –1 and 1 in each axis. This matrix will also preserve the value of the w component of the point vector – if it is 1 before the multiplication, it will be 1 afterward.
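One common construction of this matrix (this sketch follows the OpenGL-style convention, which may differ in sign from the book's own figure) looks like the following in NumPy:

```python
import numpy as np

def orthographic(l, r, b, t, n, f):
    """OpenGL-style orthographic projection: scale factors on the
    leading diagonal map the view volume into the cube [-1, 1] on
    every axis, translation terms in the last column recenter it,
    and the bottom row leaves w untouched."""
    return np.array([
        [2/(r-l), 0.0,     0.0,      -(r+l)/(r-l)],
        [0.0,     2/(t-b), 0.0,      -(t+b)/(t-b)],
        [0.0,     0.0,     -2/(f-n), -(f+n)/(f-n)],
        [0.0,     0.0,     0.0,      1.0],
    ])

P = orthographic(-2, 2, -1, 1, 0.1, 100)
# The center of the view volume lands exactly on the origin.
center = np.array([0.0, 0.0, -(0.1 + 100) / 2, 1.0])
projected = P @ center  # expect (0, 0, 0, 1)
```

Notice that w stays at 1 — no perspective divide is needed for an orthographic camera.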
The perspective projection matrix is calculated like so:
Applying the perspective projection matrix also ends up with each vertex position inside a bounded box, but it must account for the field of view of the camera. We define the top, bottom, left, and right values relative to the vertical field of view (or FOV) in radians and aspect ratio using a bit of trigonometry like so:
The most interesting part of the matrix is that it will set the w component of the output vector to the z component of the input vector, which will be important later. For the first time, we will see values of w other than 1.
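The following NumPy sketch uses the standard OpenGL-style perspective matrix (an assumption of this example: this convention has the camera looking down the negative z-axis, so the output w ends up as −z rather than +z; the structure is otherwise the same as described above):

```python
import numpy as np

def perspective(fov_y, aspect, n, f):
    """OpenGL-style perspective projection matrix.
    The near-plane half-extents come from the vertical FOV and
    aspect ratio; the bottom row copies the input z (negated in
    this convention) into the output w component."""
    t = n * np.tan(fov_y / 2)   # half-height of the near plane
    r = t * aspect              # half-width of the near plane
    return np.array([
        [n/r, 0.0, 0.0,          0.0],
        [0.0, n/t, 0.0,          0.0],
        [0.0, 0.0, -(f+n)/(f-n), -2*f*n/(f-n)],
        [0.0, 0.0, -1.0,         0.0],
    ])

# 90-degree vertical FOV, square aspect, near = 1, far = 10.
P = perspective(np.pi / 2, 1.0, 1.0, 10.0)
near_center = np.array([0.0, 0.0, -1.0, 1.0])
clip = P @ near_center  # w is now the (negated) input z
```

For this point on the near plane, the output is (0, 0, −1, 1): after the perspective divide its z maps to −1, the near edge of the clip volume.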
These transformations are the backbone of the vertex shader stage, which we will see in the next chapter, and there is another trick we can do to make the pipeline run as efficiently as possible. The view and projection matrices are based on the camera’s properties, which usually stay consistent while drawing a frame – that means we can multiply the two together. On top of that, the model matrix stays consistent for every vertex of an object, so we can multiply the model, view, and projection matrices together for drawing each object. The combination model-view-projection (MVP) matrix, as it’s called, is used in vertex shaders to perform all transformations in one fell swoop. It looks like this:
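Sketched in NumPy with stand-in matrices (the identities below are placeholders for the matrices built earlier), the composition is a single multiply per vertex:

```python
import numpy as np

# Stand-ins for the model, view, and projection matrices
# constructed in the preceding sections.
model = np.eye(4)
view = np.eye(4)
projection = np.eye(4)

# Multiply once per object on the CPU...
mvp = projection @ view @ model

# ...so each vertex needs only one matrix multiply, which is what
# a Unity vertex shader does with the built-in MVP matrix.
vertex = np.array([1.0, 2.0, 3.0, 1.0])
clip_pos = mvp @ vertex
```

Since the matrices closest to the vertex apply first, the model matrix sits rightmost and the projection matrix leftmost.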
There is one last step after each transformation has been performed. We are still in homogeneous coordinates, so we need to transform back into Cartesian coordinates. At the same time, we need to collapse our 3D points onto a 2D screen.
Perspective Divide
By dividing homogeneous coordinates by the w component, we end up with the vector (x/w, y/w, z/w, 1). Since we had previously set w = z, this is equivalent to (x/z, y/z, 1, 1), which puts all positions on the plane z = 1, like we wanted. Once we have divided by w, we can ignore the last two components of the vector to end up with the final normalized screen position of the point: this is called normalized device coordinates, a 2D representation that maps the top edge of your screen to y = 1, the lower edge to y = −1, the left edge to x = −1, and the right edge to x = 1. The perspective divide happens automatically between the vertex and fragment shader stages.
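Although the hardware performs this step for us, the divide itself is tiny — a NumPy sketch (illustrative values; the function name is mine):

```python
import numpy as np

def perspective_divide(clip):
    """Divide a clip-space position by its w component, yielding
    normalized device coordinates with x and y in [-1, 1]."""
    clip = np.asarray(clip, dtype=float)
    ndc = clip / clip[3]
    return ndc[:2]   # drop the now-constant last two components

# A hypothetical clip-space point with w = 4:
screen = perspective_divide([2.0, -4.0, 4.0, 4.0])
# x = 2/4 = 0.5 (halfway toward the right edge),
# y = -4/4 = -1.0 (exactly on the lower edge)
```

Points with larger w (those farther from the camera, since w = z) get squeezed toward the center of the screen, which is exactly the perspective effect we expect.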
Summary
Vectors can be used to represent points in any dimension.
There are many operations you can carry out on vectors, such as addition, scalar multiplication, normalization, dot product, and cross product.
Matrices are 2D arrays of numbers arranged in rows and columns.
Some matrix operations, such as the determinant and inverse, exist only for square matrices. Some square matrices do not have an inverse; these are called singular.
Homogeneous coordinates add a fourth component to facilitate matrix transformations that would otherwise be impossible, such as translating 3D points.
A series of matrix transformations operate on vertex data, taking it from object space to world space, to view space, to clip space, and to normalized device coordinates.