
Chapter 2. Tensors

Before we dive deep into the world of PyTorch development, it’s important to understand the fundamental data structure in PyTorch, the tensor. By understanding the tensor, you will understand how PyTorch handles and stores data, and since deep learning is fundamentally a collection and manipulation of floating point numbers, understanding tensors will help you understand how PyTorch implements more advanced functions for deep learning. In addition, you may find yourself using tensor operations frequently when pre-processing input data or manipulating output data during model development.

This chapter serves as a quick reference to understanding tensors and implementing tensor functions within your code. We’ll begin by describing what a tensor is and showing simple examples of how to use functions to create, manipulate, and accelerate tensor operations on a GPU. Next, we’ll take a broader look at the API for creating tensors and performing math operations so that you can quickly reference a comprehensive list of tensor capabilities. In each section, we will explore some of the more important functions, identify common pitfalls, and provide key points in their usage.

What is a Tensor?

In PyTorch, a tensor is a data structure used to store and manipulate data. Like a NumPy array, a tensor is a multidimensional array containing elements of a single data type. Tensors can be used to represent scalars, vectors, matrices, and n-dimensional arrays, and they are instances of the torch.Tensor class. However, tensors are more than just arrays of numbers. When you create a tensor object from the torch.Tensor class, you also get a set of built-in attributes and methods that provide a robust set of capabilities. This chapter describes these attributes and operations in detail.

Unlike NumPy arrays, however, tensors include added benefits that make them more suitable for deep learning calculations. First, tensor operations can be performed significantly faster using GPU acceleration. Second, tensors can be stored and manipulated at scale using distributed processing on multiple CPUs and GPUs and across multiple servers. Third, tensors keep track of their graph computations, which, as we will see in the section on Autograd, is very important in implementing a deep learning library.
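To make the third point concrete, here is a minimal sketch of a tensor recording the operations applied to it. It runs on a CPU-only install; the variable names are illustrative, and the mechanics of backward() are covered in the section on Autograd.

```python
import torch

# requires_grad=True asks PyTorch to record operations on x in a graph
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2 + x3^2

print(y.grad_fn)     # the recorded operation that produced y (a SumBackward node)
y.backward()         # walk the recorded graph to compute dy/dx
print(x.grad)        # dy/dx = 2 * x, i.e. tensor([2., 4., 6.])
```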

To further explain what a tensor actually is and how to use them, let’s begin by showing a simple example that creates some tensors and performs a tensor operation.

Simple CPU Example

Here’s a simple example that creates a tensor, performs a tensor operation, and uses a built-in method on the tensor itself. By default, the tensor data type will be derived from the input data type and the tensor will be allocated to the CPU device. First, we import the PyTorch library, then we create two tensors, x and y, from two-dimensional lists. Next we add the two tensors and store the result in z. Notice we can just use the “+” operator since the torch.Tensor class supports operator overloading. Finally, we print the new tensor z, which we can see is the matrix sum of x and y, and we print the size of z. Notice that z is a tensor object itself and the size() method is used to return its matrix dimensions, namely two by three.

import torch

x = torch.tensor([[1,2,3],[4,5,6]])
y = torch.tensor([[7,8,9],[10,11,12]])
z = x + y
print(z)
> tensor([[ 8, 10, 12],
    [14, 16, 18]])
print(z.size())
> torch.Size([2, 3])

Note

You may see the torch.Tensor() (capital T) constructor used in legacy code. This is an alias for the default tensor type torch.FloatTensor. You should use torch.tensor() to create tensors.
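A quick way to see the difference is to compare the data types the two constructors produce. This sketch assumes the default floating-point type has not been changed with torch.set_default_dtype().

```python
import torch

legacy = torch.Tensor([1, 2, 3])  # legacy constructor: always the default float type
modern = torch.tensor([1, 2, 3])  # factory function: infers the dtype from the data

print(legacy.dtype)  # torch.float32
print(modern.dtype)  # torch.int64

# With torch.tensor(), request floats explicitly when you want them
floats = torch.tensor([1, 2, 3], dtype=torch.float32)
```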

Simple GPU Example

Since accelerating tensor operations on a GPU is a major advantage of tensors over NumPy arrays, we’ll show you a simple example of how to do so. Here’s the same example, but this time we move the tensors to the GPU device if one is available. Notice that the output tensor is also allocated to the GPU. You can also use the device attribute (e.g. z.device) to double-check where the tensor resides. In the first line, the torch.cuda.is_available() function returns True if your machine has GPU support. This is a convenient way to write more robust code that is accelerated when a GPU exists but still runs on a CPU when a GPU is not present. Also, notice that device='cuda:0' indicates that the first GPU is being used. If your machine contains multiple GPUs, you can also control which GPU is used.

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[1,2,3],[4,5,6]], device=device)
y = torch.tensor([[7,8,9],[10,11,12]], device=device)
z = x + y
print(z)
> tensor([[ 8, 10, 12],
    [14, 16, 18]], device='cuda:0')
print(z.size())
> torch.Size([2, 3])
print(z.device)
> cuda:0
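One way to control which GPU is used is to put an index in the device string. The sketch below picks the last visible GPU purely as an illustration (the choice is arbitrary) and falls back to the CPU so that it also runs on machines without CUDA.

```python
import torch

# Pick the last visible GPU if any exist; otherwise fall back to the CPU.
if torch.cuda.is_available():
    gpu_index = torch.cuda.device_count() - 1
    device = torch.device(f"cuda:{gpu_index}")
else:
    device = torch.device("cpu")

x = torch.tensor([[1, 2, 3], [4, 5, 6]], device=device)
print(x.device)   # e.g. cuda:1 on a two-GPU machine, cpu without a GPU
```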

Moving Tensors between CPU & GPU

The previous code uses torch.tensor() to create a tensor on a specific device; however, you may want to move an existing tensor between devices. You can do so by using the tensor’s to() method, which returns a tensor on the target device rather than moving the original in place, so remember to reassign the result. When new tensors are created as a result of tensor operations, PyTorch creates the new tensor on the same device. In the following code, z resides on the GPU because x and y reside on the GPU. The tensor z is then moved back to the CPU using z.to("cpu") for further processing. Also note that all the tensors within an operation must be on the same device. If x were on the GPU and y were on the CPU, we would get an error.

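device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)  # to() returns a tensor on the device; reassign to keep it
y = y.to(device)
z = x + y         # z is created on the same device as x and y
z = z.to("cpu")   # move the result back to the CPU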

Note

You can use strings directly as device parameters instead of device objects. Assuming the current CUDA device is the default (device 0), the following are all equivalent:

device="cuda"
device=torch.device("cuda")
device="cuda:0"
device=torch.device("cuda:0")

Creating Tensors

The previous section showed a simple way to create tensors; however, there are many other ways to do it. You can create tensors from pre-existing numeric data or from random samples. Tensors can be created from pre-existing data stored in array-like structures such as lists, tuples, scalars, NumPy arrays, or serialized data files.

The following code illustrates some common ways to create tensors. First, it shows how to create a tensor from a list using torch.tensor(). This function can also be used to create tensors from other data structures like tuples or NumPy arrays. You can also create and initialize tensors using functions like torch.empty(), torch.ones(), and torch.zeros() and specifying the desired size. If you want to initialize a tensor with random values, PyTorch supports a robust set of functions such as torch.rand(), torch.randn(), and torch.randint(). Upon initialization, you can also specify the data type of your tensor elements as well as the device on which it is stored (i.e. CPU or GPU). Finally, PyTorch includes the ability to create tensors that have the same properties as other tensors but are initialized with different data. Functions with the _like suffix, such as torch.empty_like() and torch.ones_like(), create a tensor with the same size, data type, and device as another tensor but initialize it differently.

import numpy as np
import torch

# Created from pre-existing arrays
w = torch.tensor([1,2,3]) # from a list
w = torch.tensor((1,2,3)) # from a tuple
w = torch.tensor(np.array([1,2,3])) # from a NumPy array

# Initialized by size
w = torch.empty(100,200) # uninitialized, element values are not predictable
w = torch.zeros(100,200) # all elements initialized with 0.0
w = torch.ones(100,200)  # all elements initialized with 1.0

# Initialized by size with random values
w = torch.rand(100,200)   # elements are random numbers from a uniform distribution on the interval [0, 1)
w = torch.randn(100,200)  # elements are random numbers from a normal distribution with mean 0 and variance 1
w = torch.randint(5,10,(100,200)) # elements are random integers from 5 up to, but not including, 10

# Initialized with specified data type or device (device="cuda" requires a GPU)
w = torch.empty((100,200), dtype=torch.float64, device="cuda")

# Initialized to have same size, data type, and device as another tensor
x = torch.empty_like(w)

Note

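Functions such as torch.from_numpy() and torch.as_tensor() can also create tensors. In practice, the torch.tensor() constructor can handle all of these cases, but be aware that torch.tensor() always copies its input data, whereas torch.from_numpy() (and torch.as_tensor(), when no type or device conversion is needed) shares memory with the source NumPy array.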
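The memory behavior is easy to check. This minimal sketch assumes NumPy is installed and changes the NumPy array after creating three tensors from it; only the copied tensor keeps the original value.

```python
import numpy as np
import torch

a = np.array([1, 2, 3])

shared = torch.from_numpy(a)  # shares memory with a, no copy
viewed = torch.as_tensor(a)   # also shares memory here: same dtype, CPU device
copied = torch.tensor(a)      # always copies the data

a[0] = 100                    # modify the NumPy array in place

print(shared[0].item(), viewed[0].item())  # both see the change
print(copied[0].item())                    # still the original value
```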

Table 2-1 lists PyTorch functions used to create tensors. You should use each one with the torch namespace, e.g. torch.empty(). You can find more details at https://pytorch.org/docs/stable/torch.html [PyTorch Tensor Documentation].

Table 2-1. Creation Functions
Function	Description
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)	Creates tensor from existing data structure
torch.empty(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a tensor with uninitialized elements; their values are whatever happened to be in memory
torch.zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a tensor with all elements initialized to 0.0
torch.ones(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a tensor with all elements initialized to 1.0
torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a 1-D tensor of values over range with a common step value
torch.linspace(start, end, steps=100, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a 1-D tensor of steps linearly spaced points between start and end
torch.logspace(start, end, steps=100, base=10.0, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a 1-D tensor of steps points spaced evenly on a logarithmic scale between base^start and base^end
torch.eye(n, m=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a 2-D tensor with ones on the diagonal and zeros everywhere else
torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Creates a tensor filled with fill_value
torch.load(f)	Loads a tensor from a serialized pickle file
torch.save(obj, f)	Saves a tensor (or other object) to a serialized pickle file

The PyTorch documentation contains a complete list of functions for creating tensors as well as more detailed explanations of how to use them. Here are some common pitfalls and additional insights to keep in mind when creating tensors.

Most creation functions accept the optional dtype and device parameters so you can set these at creation.
You should use torch.arange() in favor of the deprecated torch.range() function. Use torch.arange() when the step size is known. Use torch.linspace() when the number of elements is known.
PyTorch provides torch.quantize_per_tensor() and torch.quantize_per_channel() functions that allow you to quantize the data to specific levels.
You can use torch.tensor() to create tensors from array-like structures such as lists, NumPy arrays, and tuples. To convert existing tensors to NumPy arrays and lists, use the x.numpy() and x.tolist() methods, respectively.
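The following sketch illustrates the arange/linspace distinction, the conversions back to NumPy and Python lists, and per-tensor quantization. The scale and zero_point values are arbitrary choices for illustration, and the quantization call assumes a standard CPU build of PyTorch with quantization support.

```python
import torch

# arange: you know the step size; the end value is excluded
a = torch.arange(0, 1, 0.25)
# linspace: you know the number of points; both endpoints are included
b = torch.linspace(0, 1, steps=5)

print(a.tolist())  # [0.0, 0.25, 0.5, 0.75]
print(b.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Converting back out of PyTorch
arr = b.numpy()    # NumPy array (works for CPU tensors that do not require grad)
lst = b.tolist()   # plain Python list

# Map floats to 8-bit integer levels: level = round(x / scale) + zero_point
q = torch.quantize_per_tensor(torch.tensor([0.0, 0.5, 1.0]),
                              scale=0.1, zero_point=0, dtype=torch.quint8)
levels = q.int_repr()  # the stored integer levels
```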

Tensor Attributes

One PyTorch feature that has contributed to its popularity is that it’s very Pythonic and object-oriented in nature. Since a tensor is its own data type, you can read attributes of the tensor object itself. Now that you can create tensors, it’s useful to quickly find information about them by accessing their attributes. Assuming x is a tensor, you can access several attributes of x as follows:

x.dtype indicates the tensor’s data type (see Table 2-2 for list of PyTorch data types)
x.device indicates the tensor’s device location (e.g. CPU or GPU memory)
x.shape shows the tensor’s dimensions
x.ndim identifies the number of the tensor’s dimensions, or its rank
x.requires_grad is a boolean indicating whether the tensor keeps track of graph computations (see “Automatic Differentiation (Autograd)”)
x.grad contains the actual gradients if requires_grad is True
x.grad_fn stores the graph computation function used if requires_grad is True
x.is_cuda, x.is_sparse, x.is_quantized, x.is_leaf, x.is_mkldnn are booleans indicating whether the tensor meets certain conditions
x.layout indicates how a tensor is laid out in memory

Remember when accessing object attributes, do not include parentheses () like you would with a class method (e.g. use x.shape not x.shape() ).
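Here is a short sketch that reads these attributes from a small tensor and then runs a tiny computation so that grad_fn and grad are populated. The values in the comments assume the tensor is created on the CPU with default settings.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

print(x.dtype)          # torch.float32
print(x.device)         # cpu
print(x.shape)          # torch.Size([2, 2])
print(x.ndim)           # 2
print(x.layout)         # torch.strided
print(x.is_leaf)        # True: x was created directly, not by an operation

y = (x * 3).sum()
print(y.grad_fn is not None)  # True: y was produced by a recorded operation
y.backward()
print(x.grad)           # every element of x contributed with weight 3
```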

Data Types

During deep learning development, it’s important to be aware of the data type used by your data and its calculations, so when you create tensors, you should control which data types are being used. As we mentioned before, all tensor elements have the same data type. You can specify the data type when creating the tensor by using the dtype parameter, or you can cast a tensor to a new dtype using a casting method or the to() method, as shown in the code below.

# Specify data type at creation using dtype
w = torch.tensor([1,2,3], dtype=torch.float32)

# Use casting method to cast to a new data type
w.int()    # w remains a float32 after cast
w = w.int()  # w changes to int32 after cast

# Use to() method to cast to a new type
w = w.to(torch.float64) 
w = w.to(dtype=torch.float64) 

# PyTorch automatically promotes data types during operations
x = torch.tensor([1,2,3], dtype=torch.int32)
y = torch.tensor([1,2,3], dtype=torch.float32)
z = x + y  # x is promoted to float32, so z is float32


Note that the casting and to() methods do not change the tensor’s data type unless you reassign the tensor. Also, when performing operations on mixed data types, PyTorch will automatically cast tensors to the appropriate type.
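The following sketch shows both points: a cast that is not reassigned leaves the original tensor unchanged, and mixed-type arithmetic is promoted to a common type. torch.result_type() is used here only to show the promoted type without performing the operation.

```python
import torch

w = torch.tensor([1.5, 2.5], dtype=torch.float32)
w.int()                  # returns a new int32 tensor that is discarded here
print(w.dtype)           # torch.float32: w itself is unchanged
w = w.int()              # reassign to keep the cast (values truncate to 1 and 2)
print(w.dtype)           # torch.int32

# Mixed-type arithmetic is promoted to a common type
x = torch.tensor([1, 2, 3], dtype=torch.int32)
y = torch.tensor([1, 2, 3], dtype=torch.float32)
print((x + y).dtype)            # torch.float32
print(torch.result_type(x, y))  # torch.float32, without performing the addition
```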

Most of the tensor creation functions allow you to specify the data type upon creation using the dtype parameter. When you set the dtype or cast tensors, remember to use the torch namespace such as torch.int64 not just int64. Table 2-2 lists all the available data types in PyTorch. Each data type results in a different tensor class depending on the tensor’s device. The corresponding tensor class is shown in the two rightmost columns for CPU and GPU respectively.

Table 2-2. Tensor Data Types
Data type	dtype	CPU tensor	GPU tensor
32-bit floating point (default)	torch.float32 or torch.float	torch.FloatTensor	torch.cuda.FloatTensor
64-bit floating point	torch.float64 or torch.double	torch.DoubleTensor	torch.cuda.DoubleTensor
16-bit floating point	torch.float16 or torch.half	torch.HalfTensor	torch.cuda.HalfTensor
8-bit integer (unsigned)	torch.uint8	torch.ByteTensor	torch.cuda.ByteTensor
8-bit integer (signed)	torch.int8	torch.CharTensor	torch.cuda.CharTensor
16-bit integer (signed)	torch.int16 or torch.short	torch.ShortTensor	torch.cuda.ShortTensor
32-bit integer (signed)	torch.int32 or torch.int	torch.IntTensor	torch.cuda.IntTensor
64-bit integer (signed)	torch.int64 or torch.long	torch.LongTensor	torch.cuda.LongTensor
Boolean	torch.bool	torch.BoolTensor	torch.cuda.BoolTensor
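You can see the tensor class that corresponds to each dtype by calling the type() method, which returns the class name as a string. A minimal sketch, assuming the default dtype has not been changed; the GPU line only runs if CUDA is available.

```python
import torch

x = torch.tensor([1, 2, 3])
print(x.type())               # torch.LongTensor (int64 on the CPU)
print(x.float().type())       # torch.FloatTensor
print(x.half().type())        # torch.HalfTensor
print(torch.get_default_dtype())  # torch.float32 unless it has been changed

if torch.cuda.is_available():
    print(x.cuda().type())    # torch.cuda.LongTensor
```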

Note

Memory Considerations and In-place Operations

When performing deep learning experimentation, it’s also important to be aware of how tensors are stored in memory, since this can have a major impact on the speed and performance of your deep learning models. Tensor values are allocated in contiguous blocks of memory managed as torch.Storage instances. No matter what their dimensions, tensors are stored as a 1D collection of elements in memory. They are indexed by keeping track of the size, offset, and per-dimension strides. Therefore, transposing a tensor or viewing it with a new shape does not need to move data; it simply changes the per-dimension strides.

To reduce memory usage, you may want to reuse memory and overwrite tensor values using in-place operations. To perform an in-place operation, append the underscore (_) suffix to the function name. For example, y.add_(x) adds x to y and stores the result in y.
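The sketch below makes these ideas visible: transposing only swaps the strides while the data pointer stays the same, and an in-place add writes its result into the existing memory. data_ptr() and stride() are used here only for inspection.

```python
import torch

x = torch.arange(6).view(2, 3)   # 2x3 view over 6 contiguous elements
print(x.stride())                # (3, 1): step 3 elements per row, 1 per column

t = x.t()                        # transpose: same storage, swapped strides
print(t.stride())                # (1, 3)
print(t.is_contiguous())         # False
print(t.data_ptr() == x.data_ptr())  # True: no data was moved

# In-place addition reuses y's memory instead of allocating a new tensor
y = torch.zeros(3)
ptr_before = y.data_ptr()
y.add_(torch.ones(3))
print(y)                         # tensor([1., 1., 1.])
print(y.data_ptr() == ptr_before)  # True
```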

Creating Tensors from Random Samples

The need to create random data comes up often during deep learning development. Sometimes, you will need to initialize weights to random values or create random inputs with specified distributions. PyTorch supports a very robust set of functions that you can use to create tensors from random data. As with other creation functions, you can specify the dtype and device when creating the tensor. Table 2-3 lists some examples of random sampling functions.

Table 2-3. Random Sampling Functions
Function	Description
torch.rand(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Selects random values from a uniform distribution on the interval [0, 1)
torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Selects random values from the standard normal distribution with zero mean and unit variance
torch.normal(mean, std, *, generator=None, out=None)	Selects random numbers from a normal distribution with the specified mean and standard deviation
torch.randint(low=0, high, size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)	Selects random integers uniformly from low (inclusive) up to high (exclusive)
torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False)	Creates a random permutation of integers from 0 to n-1
torch.bernoulli(input, *, generator=None, out=None)	Draws binary random numbers (0 or 1) from a Bernoulli distribution
torch.multinomial(input, num_samples, replacement=False, *, generator=None, out=None)	Samples indices from a multinomial distribution using the weights given in input

You can also create tensors of values sampled from other distributions using torch.empty() and in-place methods such as cauchy_(), exponential_(), geometric_(), and log_normal_(). Remember that in-place methods use the underscore suffix. For example, x = torch.empty([10,5]).cauchy_() creates a tensor of random numbers drawn from the Cauchy distribution.
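The following sketch draws samples with several of the functions from Table 2-3 and with two in-place methods. The seed, sizes, and probabilities are arbitrary illustration values; the comments describe the ranges each call guarantees rather than specific numbers.

```python
import torch

torch.manual_seed(0)  # make the draws repeatable

u = torch.rand(3, 3)                   # uniform on [0, 1)
n = torch.normal(0.0, 2.0, size=(3,))  # mean 0, standard deviation 2
r = torch.randint(5, 10, (100,))       # integers from 5 to 9 inclusive
p = torch.randperm(5)                  # a random ordering of 0, 1, 2, 3, 4
b = torch.bernoulli(torch.tensor([0.0, 1.0]))            # p=0 gives 0, p=1 gives 1
m = torch.multinomial(torch.tensor([0.1, 0.0, 0.9]), 2)  # index 1 has zero weight

# Other distributions through in-place sampling on an empty tensor
c = torch.empty(10, 5).cauchy_()
e = torch.empty(4).exponential_()
```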

Creating Tensors Like Other Tensors

Sometimes you may want to create and initialize a tensor that has similar properties to another tensor, including the dtype, device, and layout properties, to facilitate calculations. Many of the tensor creation operations have a similarity function that allows you to easily do this. The similarity functions have the suffix _like. For example, torch.empty_like(tensor_a) creates an empty tensor with the dtype, device, and layout properties of tensor_a. Some examples of similarity functions include empty_like(), zeros_like(), ones_like(), full_like(), rand_like(), randn_like(), and randint_like().
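Here is a brief sketch of the _like functions copying properties from an existing float64 tensor, including one call that overrides the dtype while keeping the shape. The tensor a is just an illustrative input.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], dtype=torch.float64)

z = torch.zeros_like(a)        # float64, shape (2, 2), same device as a
f = torch.full_like(a, 7)      # every element is 7.0
r = torch.randint_like(a, 10)  # random whole numbers in [0, 10), stored as float64
i = torch.ones_like(a, dtype=torch.int32)  # override the dtype, keep the shape

print(z.dtype, z.shape)
print(i.dtype)
```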

Tensor Operations

Now that you understand how to create tensors, let’s explore what you can do with them. PyTorch supports a robust set of tensor operations that allow you to access and transform your tensor data. First, we’ll describe how to access portions of your data, manipulate their elements, and combine tensors to form new tensors. Then, we’ll show you how to perform simple calculations as well as advanced mathematical computations. PyTorch provides many built-in functions, so it’s useful to see what’s available before creating your own.

Indexing, Slicing, Combining & Splitting Tensors

Once you have created tensors, you may want to access portions of the data and combine or split tensors to form new tensors. The following code demonstrates how to perform these types of operations. You can slice and index tensors in the same way you would slice and index NumPy arrays, as shown in the first few lines of code. Note that indexing and slicing return tensors even if the result is only a single element. Use the item() method to convert a single-element tensor to a Python number, for example when passing it to code that expects a plain Python value. Next, we see that slicing uses the same [start:end:step] format that is used for slicing Python lists and NumPy arrays, and boolean indexing allows you to extract portions of the data that meet certain criteria. PyTorch supports transposing and reshaping tensors, as shown in the next few lines of code. Finally, you can combine or split tensors by using functions like torch.stack() and torch.unbind(), respectively.

x = torch.tensor([[1,2],[3,4],[5,6],[7,8]])
print(x)
> tensor([[1, 2],
    [3, 4],
    [5, 6],
    [7, 8]])

# Indexing, returns a tensor
print(x[1,1])
> tensor(4)

# Indexing, returns value as Python number
print(x[1,1].item())
> 4

# Slicing
print(x[:2,1])
> tensor([2, 4])

# Boolean indexing
print(x[x<5])
> tensor([1, 2, 3, 4])

# Transpose array, x.t() or x.T can be used
print(x.t())
> tensor([[1, 3, 5, 7],
    [2, 4, 6, 8]])

# Changing shape, usually view() is preferred over reshape()
print(x.view((2,4)))
> tensor([[1, 2, 3, 4],
    [5, 6, 7, 8]])

# Combining tensors
y = torch.stack((x, x))
print(y)
> tensor([[[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]],

    [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]])

# Splitting tensors
a,b = x.unbind(dim=1)
print(a,b)
> tensor([1, 3, 5, 7]) tensor([2, 4, 6, 8])

PyTorch provides a robust set of built-in functions that can be used to access, split, and combine tensors in different ways. Table 2-4 lists some commonly used functions to manipulate tensor elements.

Table 2-4. Indexing, Slicing, Combining, and Splitting Operations
Function	Description
torch.cat()	Concatenates the given sequence of seq tensors in the given dimension.
torch.chunk()	Splits a tensor into a specific number of chunks. Each chunk is a view of the input tensor.
torch.gather()	Gathers values along an axis specified by dim
torch.index_select()	Returns a new tensor which indexes the input tensor along dimension dim using the entries in index which is a LongTensor.
torch.masked_select()	Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor.
torch.narrow()	Returns a tensor that is a narrow version of the input tensor.
torch.nonzero()	Returns the indices of non-zero elements.
torch.reshape()	Returns a tensor with the same data and a new shape; it may copy the data, so use view() to ensure the tensor is not copied
torch.split()	Splits the tensor into chunks. Each chunk is a view of the original tensor.
torch.squeeze()	Returns a tensor with all the dimensions of input of size 1 removed.
torch.stack()	Concatenates sequence of tensors along a new dimension.
torch.t()	Expects input to be a <= 2-D tensor and transposes dimensions 0 and 1.
torch.transpose()	You can choose which dimensions to transpose.
torch.take()	Returns a new tensor with the elements of input at the given indices, treating input as if it were 1-D.
torch.unbind()	Removes a tensor dimension and returns a tuple of all slices along that dimension.
torch.unsqueeze()	Returns a new tensor with a dimension of size one inserted at the specified position.
torch.where()	Returns a tensor of elements selected from either x or y depending on condition.

Some of these functions may seem redundant. However, the following key distinctions and best practices are important to keep in mind:

x.item() is an important and commonly used method that returns the Python number from a single-element tensor.
Use view() instead of reshape() for reshaping tensors in most cases. Using reshape() may cause the tensor to be copied, depending on its layout in memory. view() ensures that it will not be copied; if the memory layout does not allow a view, it raises an error instead.
Using x.T or x.t() is a simple way to transpose 1D or 2D tensors. Use transpose() when dealing with multidimensional tensors.
The torch.squeeze() function is used often in deep learning to remove an unused dimension. For example, a batch containing a single image can be reduced from 4D to 3D using squeeze().
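The following sketch exercises several of these distinctions: cat() versus stack(), squeeze() and unsqueeze() on a single-image batch, view() refusing a non-contiguous tensor while reshape() copies, and torch.where() and chunk(). The 3x32x32 image size is an arbitrary example.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])

cat = torch.cat((x, x))        # joined along existing dim 0 -> shape (4, 2)
stacked = torch.stack((x, x))  # joined along a new leading dim -> shape (2, 2, 2)

img = torch.rand(1, 3, 32, 32)           # a batch holding a single 3x32x32 image
single = img.squeeze()                   # -> shape (3, 32, 32)
batched = single.unsqueeze(0)            # -> back to shape (1, 3, 32, 32)

t = torch.arange(6).view(2, 3).t()       # transposed, so no longer contiguous
view_failed = False
try:
    t.view(6)                            # view() refuses rather than copy
except RuntimeError:
    view_failed = True
flat = t.reshape(6)                      # reshape() copies when it has to

kept = torch.where(x > 2, x, torch.zeros_like(x))  # keep values > 2, else 0
halves = x.chunk(2, dim=0)               # tuple of two 1x2 views
```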

Note

PyTorch is very Pythonic in nature. Like most Python classes, some PyTorch functions can be applied directly on a tensor using a built-in method such as x.size(). Other functions are called directly using the torch namespace. These functions take a tensor as an input, such as x in torch.save(x, 'tensor.pt'). Methods can also be chained together. For example, torch.rand(2,2).max().item() creates a 2x2 tensor of random floats, finds the maximum value, and returns the value itself from the resulting tensor.
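As a small illustration of both styles, the sketch below saves and reloads a tensor with torch namespace functions and then chains methods to pull out the maximum value. Writing to a temporary directory is an assumption made so the example cleans up after itself; any path would do.

```python
import os
import tempfile
import torch

x = torch.rand(2, 2)

# Function style: torch.save() and torch.load() take the tensor or path as arguments
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tensor.pt")
    torch.save(x, path)
    loaded = torch.load(path)

# Method style, chained: the maximum element as a plain Python float
largest = x.max().item()
```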

Tensor Operations for Mathematics

Deep learning development is strongly based on mathematical computations, so PyTorch supports a very robust set of built-in math functions. Whether you are creating new data transforms, customizing loss functions, or building your own optimization algorithms, you can speed up your research and development with the math functions provided by PyTorch. The purpose of this section is to provide a quick overview of many of the mathematical functions available so that you can quickly build your awareness of what currently exists and find the appropriate functions when needed.

PyTorch supports many different types of math functions, including pointwise operations, reduction functions, comparison calculations, linear algebra, spectral, and other math computations. The first category of useful math operations is pointwise operations. Pointwise operations perform an operation on each point in the tensor individually and return a new tensor. They are useful for rounding and truncation as well as trigonometric and logical operations. By default, the functions create a new tensor or use one passed in by the out parameter. If you want to perform an in-place operation, remember to append the underscore to the function name. Table 2-5 lists some commonly used pointwise operations.

Table 2-5. Pointwise Operations
Operation Type	Sample Functions
Basic Math	add(), div(), mul(), neg(), reciprocal(), true_divide()
Truncation	ceil(), clamp(), floor(), floor_divide(), fmod(), frac(), remainder(), round(), sigmoid(), trunc(), lerp()
Complex Numbers	real(), imag(), conj(), abs(), angle()
Trigonometric	cos(), sin(), tan(), asin(), acos(), atan(), sinh(), cosh(), tanh(), deg2rad(), rad2deg()
Exponential & Logarithmic	exp(), expm1(), log(), log10(), log1p(), log2(), logaddexp(), pow(), rsqrt(), sqrt(), square()
Logical	logical_and(), logical_or(), logical_not(), logical_xor()
Cumulative Math	addcdiv(), addcmul()
Bitwise	bitwise_not(), bitwise_and(), bitwise_or(), bitwise_xor()
Error Functions	erf(), erfc(), erfinv()
Gamma Functions	digamma(), lgamma(), mvlgamma(), polygamma()

Use Python’s built-in help() or refer to the PyTorch documentation for details on function usage. Note that true_divide() converts tensor data to floats first and should be used when dividing integers to obtain true division results.
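Here is a short sketch of a few pointwise operations, including the integer-division behavior described above. The input values are arbitrary; each function is applied element by element and returns a new tensor.

```python
import torch

a = torch.tensor([5, 7])

halves = torch.true_divide(a, 2)      # tensor([2.5000, 3.5000]): floats, true division
floors = torch.floor_divide(a, 2)     # tensor([2, 3]): integer result
remainders = torch.remainder(a, 2)    # tensor([1, 1])

v = torch.tensor([-2.0, 0.5, 3.0])
clamped = torch.clamp(v, min=0.0, max=1.0)  # tensor([0.0000, 0.5000, 1.0000])
roots = v.abs().sqrt()                      # pointwise ops chain element by element
```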

Note

Three different syntaxes can be used for most tensor operations. Tensors support operator overloading, so you can use operators directly, such as z = x + y. Although this is less common, you can also use PyTorch functions such as torch.add() to do the same thing. Lastly, you can perform operations in-place using the underscore (_) suffix. The method y.add_(x) achieves the same result, but the result is stored in y.
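The three syntaxes side by side, as a minimal sketch with illustrative values. The out parameter in the third call is the option mentioned earlier for writing results into an existing tensor.

```python
import torch

x = torch.tensor([1.0, 2.0])
y = torch.tensor([10.0, 20.0])

z1 = x + y                # operator overloading
z2 = torch.add(x, y)      # function in the torch namespace
out = torch.empty(2)
torch.add(x, y, out=out)  # write the result into an existing tensor

y.add_(x)                 # in-place: y now holds the sum
```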

A second type of math functions are reduction operations. Reduction operations reduce a set of numbers down to a single number or a smaller set of numbers; that is, they reduce the dimensionality or rank of the tensor. Reduction operations include functions like maximum or minimum values as well as many statistical calculations like mean or standard deviation. These operations are frequently used in deep learning. For example, deep learning classification often uses the argmax() function to reduce softmax outputs to a dominant class.

Table 2-6 shows a list of commonly used reduction operations. Note that many of these functions accept a dim parameter, which specifies the dimension of reduction for multidimensional tensors. This is similar to the axis parameter in NumPy. By default, when dim is not specified, the reduction occurs across all dimensions. Specifying dim=1 computes the operation across each row, producing one result per row. For example, torch.mean(x,1) computes the mean for each row in tensor x.

Table 2-6. Reduction Operations
Function	Description
torch.argmax(input, dim, keepdim=False, out=None)	Returns the index (or indices) of the max value across all elements or just a dimension if it’s specified
torch.argmin(input, dim, keepdim=False, out=None)	Returns the index (or indices) of the min value across all elements or just a dimension if it’s specified
torch.dist(input, other, p=2)	Computes the p-norm of the difference between two tensors
torch.logsumexp(input, dim, keepdim=False, out=None)	Computes the log of summed exponentials of each row of the input tensor in the given dimension dim
torch.mean(input, dim, keepdim=False, out=None)	Computes the mean or average across all elements or just a dimension if it’s specified
torch.median(input, dim, keepdim=False, out=None)	Computes the median or middle value across all elements or just a dimension if it’s specified
torch.mode(input, dim, keepdim=False, out=None)	Computes the mode or most frequent value across all elements or just a dimension if it’s specified
torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None)	Computes the matrix or vector norm across all elements or just a dimension if it’s specified
torch.prod(input, dim, keepdim=False, dtype=None)	Computes the product of all elements or just a dimension if it’s specified
torch.std(input, dim, keepdim=False, out=None)	Computes the standard deviation across all elements or just a dimension if it’s specified
torch.std_mean(input, unbiased=True)	Computes the standard deviation and mean across all elements or just a dimension if it’s specified
torch.sum(input, dim, keepdim=False, out=None)	Computes the sum of all elements or just a dimension if it’s specified
torch.unique(input, sorted=True, return_inverse=False, return_counts=False, dim=None)	Removes duplicates across entire tensor or just a dimension if it’s specified
torch.unique_consecutive(input, return_inverse=False, return_counts=False, dim=None)	Similar to torch.unique() but only removes consecutive duplicates
torch.var(input, dim, keepdim=False, out=None)	Computes the variance across all elements or just a dimension if it’s specified
torch.var_mean(input, dim, keepdim=False, out=None)	Computes the mean and variance across all elements or just a dimension if it’s specified
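The sketch below applies a few of these reductions with and without dim, shows what keepdim does to the shape, and uses argmax() on softmax outputs as in the classification example above. The logits are made-up values for illustration.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

overall = x.mean()                 # tensor(3.5000): reduces over all elements
row_means = x.mean(dim=1)          # tensor([2., 5.]): one mean per row
col_sums = x.sum(dim=0)            # tensor([5., 7., 9.]): one sum per column
kept = x.sum(dim=1, keepdim=True)  # keepdim keeps a size-1 dim: shape (2, 1)

# Classification-style argmax over softmax outputs
logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1]])
probs = torch.softmax(logits, dim=1)
predicted = probs.argmax(dim=1)    # tensor([1, 0]): most likely class per row

# torch.dist: p-norm of (input - other), Euclidean (p=2) by default
distance = torch.dist(torch.tensor([0.0, 0.0]), torch.tensor([3.0, 4.0]))  # tensor(5.)
```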

A third type of mathematical operations are comparison functions. Comparison functions usually compare all the values within a tensor or compare one tensor’s values to another’s. They can return a tensor full of booleans based on each element’s value, such as torch.eq() or torch.isnan(). They may also find the maximum or minimum value, sort tensor values, or return the top subset of tensor elements. Table 2-7 lists some commonly used comparison functions for your reference.

Table 2-7. Comparison Operations
Operation Type	Sample Functions
Compare tensor to other tensors	eq(), gt(), ge(), lt(), le(), ne() or ==, >, >=, <, <=, != respectively
Test tensor status or conditions	isclose(), isfinite(), isinf(), isnan()
Return a single boolean for the entire tensor	allclose(), equal()
Find value(s) over the entire tensor or along a given dimension	max(), min(), kthvalue(), sort(), topk(), argsort()

Comparison functions seem pretty straightforward; however, there are a few key points to keep in mind. Some common pitfalls are described as follows:

The torch.eq() function or == returns a tensor of the same size with a Boolean result for each element. The torch.equal() function tests whether the tensors are the same size and all elements within the tensors are equal, then returns a single Boolean value.
The function torch.allclose() also returns a single Boolean value, indicating whether all elements of two tensors are close to each other within specified tolerances (the rtol and atol parameters).
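The following sketch contrasts these functions on two nearly identical tensors. The atol value of 1e-2 is an arbitrary tolerance chosen for the illustration; with the default tolerances, the 0.001 difference counts as not close.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.0, 3.001])

elementwise = a == b                   # one Boolean per element: True, True, False
same = torch.equal(a, b)               # False: a single Boolean for the whole tensor
close_default = torch.allclose(a, b)   # False with the default tolerances
close_loose = torch.allclose(a, b, atol=1e-2)  # True: every |a - b| is within tolerance

scores = torch.tensor([4, 1, 7, 3])
values, indices = scores.topk(2)       # tensor([7, 4]) and their positions tensor([2, 0])
ordered = scores.sort().values         # tensor([1, 3, 4, 7])
```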

The next type of mathematical functions are linear algebra functions. Linear algebra functions facilitate matrix operations and are important for deep learning computations. Many computations, including gradient descent and optimization algorithms, use linear algebra to implement their calculations. PyTorch supports a robust set of built-in linear algebra operations, many of which are based on the BLAS and LAPACK standardized libraries for linear algebra calculations.

Table 2-8 lists some commonly used linear algebra operations, ranging from matrix multiplication to batch calculations and solvers. It’s important to point out that matrix multiplication is not the same as pointwise multiplication with torch.mul() or the * operator. A complete study of linear algebra is beyond the scope of this book, but you may find it useful to access some of the linear algebra functions when performing feature reduction or developing custom deep learning algorithms. See the PyTorch linear algebra documentation for a complete list of available functions and more details on how to use them.
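The distinction between pointwise and matrix multiplication is easy to see with two small matrices. This sketch also uses Python’s @ operator, which calls torch.matmul() on tensors, and shows matmul() broadcasting a single matrix across a batch (the batch tensor is random and only its shape matters).

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[5., 6.],
                  [7., 8.]])

elementwise = A * B   # pointwise product, same as torch.mul(A, B)
matrix = A @ B        # matrix product, same as torch.matmul(A, B)

print(elementwise)    # rows: [5., 12.] and [21., 32.]
print(matrix)         # rows: [19., 22.] and [43., 50.]

# matmul() broadcasts: A is applied to each of the 3 matrices in the batch
batch = torch.randn(3, 2, 2)
print(torch.matmul(batch, A).shape)  # torch.Size([3, 2, 2])
```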

Table 2-8. Linear Algebra Operations
Function	Description
torch.matmul()	Matrix product of two tensors, supports broadcasting
torch.chain_matmul()	Matrix product of N tensors
torch.mm()	Matrix product of two tensors (if broadcasting required, use matmul())
torch.addmm()	Computes matrix product of two tensors and adds to input
torch.bmm()	Computes a batch of matrix products
torch.addbmm()	Computes a batch of matrix products and adds to input
torch.baddbmm()	Computes a batch of matrix products and adds to input batch
torch.mv()	Computes product of matrix and vector
torch.addmv()	Computes product of matrix and vector and adds to input
torch.matrix_power(input, n)	Returns tensor raised to the power of n (for square tensors)
torch.eig()	Finds the eigenvalues and eigenvectors of a real square tensor
torch.inverse()	Computes the inverse of a square tensor
torch.det()	Computes the determinant of a matrix or batch of matrices
torch.logdet()	Computes the log determinant of a matrix or batch of matrices
torch.dot()	Computes the inner product of two 1D tensors
torch.addr()	Computes the outer product of two vectors and adds it to input
torch.solve()	Returns the solution to a system of linear equations
torch.svd()	Performs singular value decomposition
torch.pca_lowrank()	Performs linear principal component analysis
torch.cholesky()	Computes Cholesky decomposition
torch.cholesky_inverse()	Computes the inverse of a symmetric positive-definite matrix from its Cholesky factor
torch.cholesky_solve()	Solves a system of linear equations using Cholesky factor
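A few of the functions from Table 2-8 are sketched below on small tensors: a batch matrix product with bmm(), a matrix-vector product with mv(), and the determinant and inverse of a square matrix. The random batch shapes and the matrix M are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # makes the random batch reproducible

# Batch of 4 matrices (2x3) times a batch of 4 matrices (3x5)
X = torch.randn(4, 2, 3)
Y = torch.randn(4, 3, 5)
Z = torch.bmm(X, Y)
print(Z.shape)          # torch.Size([4, 2, 5])

# Matrix-vector product
M = torch.tensor([[2., 0.],
                  [1., 3.]])
v = torch.tensor([1., 2.])
print(torch.mv(M, v))   # tensor([2., 7.])

# Determinant: 2*3 - 0*1 = 6
print(torch.det(M))

# Multiplying by the inverse gives (approximately) the identity matrix
M_inv = torch.inverse(M)
print(M_inv @ M)
```

Unlike matmul(), bmm() expects two 3D tensors with the same batch size and does not broadcast.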

The final type of mathematical operation covers spectral and other math operations. Depending on the domain of interest, these functions may be useful for data transforms or analysis. For example, spectral operations like the Fast Fourier Transform (FFT) can play an important role in computer vision or digital signal processing applications. Table 2-9 lists some built-in operations for spectral analysis and other mathematical operations.

Table 2-9. Spectral and Other Math Operations
Operation Type	Sample Functions
Fast, Inverse, and Short-time Fourier Transforms	fft(), ifft(), stft()
Real to complex FFT and Complex to real IFFT	rfft(), irfft()
Windowing algorithms	bartlett_window(), blackman_window(), hamming_window(), hann_window()
Histogram and Bin Counts	histc(), bincount()
Cumulative operations	cummax(), cummin(), cumprod(), cumsum()
Distance and normalization functions	cdist() (pairwise distance), renorm()
Cross, dot, and Cartesian products	cross(), tensordot(), cartesian_prod()
Functions that create, extract, or sum diagonals	diag(), diag_embed(), diagflat(), diagonal(), trace() (sum of the diagonal)
Einstein summation	einsum() (sum of products using Einstein summation)
Matrix reduction and restructuring	flatten(), flip(), rot90(), repeat_interleave(), meshgrid(), roll(), combinations()
Functions that return the lower or upper triangles and their indices	tril(), tril_indices(), triu(), triu_indices()
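Here is a short sketch of several of these functions on a made-up 3x3 tensor: a cumulative sum, the trace, two einsum() expressions that reproduce familiar operations, and the lower triangle from tril().

```python
import torch

t = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])

# Running sum down each column
print(torch.cumsum(t, dim=0))  # rows: [1, 2, 3], [5, 7, 9], [12, 15, 18]

# Sum of the diagonal: 1 + 5 + 9 = 15
print(torch.trace(t))

# einsum() can express the same trace, and a matrix product
print(torch.einsum("ii->", t))          # repeated index summed: trace
print(torch.einsum("ij,jk->ik", t, t))  # same result as t @ t

# Lower triangle: entries above the main diagonal are zeroed
print(torch.tril(t))           # rows: [1, 0, 0], [4, 5, 0], [7, 8, 9]
```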

Automatic Differentiation (Autograd)

One function is worth calling out in its own subsection because it’s what makes PyTorch so powerful for deep learning development. The backward() function uses PyTorch’s automatic differentiation package, torch.autograd, to differentiate and compute gradients of tensors based on the chain rule. Here’s a simple example of automatic differentiation. We define a function, f = sum(x^2). If we want to find df/dx, we need to set the requires_grad=True flag for x. Calling f.backward() differentiates f with respect to x and stores df/dx in the x.grad attribute. Since df/dx = 2x, each element of x.grad is twice the corresponding element of x.

import torch

x = torch.tensor([[1,2,3],[4,5,6]], dtype=torch.float, requires_grad=True)
f = x.pow(2).sum()
print(f)
> tensor(91., grad_fn=<SumBackward0>)
f.backward()
print(x.grad) # df/dx
> tensor([[ 2.,  4.,  6.],
          [ 8., 10., 12.]])

Note

Only Tensors of floating point dtype can require gradients.
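A quick way to see this rule in action is to request gradients on an integer tensor. Integer data defaults to torch.int64, so the first call below raises a RuntimeError, while the same values created with a floating point dtype work.

```python
import torch

raised = False
try:
    # Integer values default to torch.int64, which cannot require gradients
    torch.tensor([1, 2, 3], requires_grad=True)
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Specifying a floating point dtype makes the tensor eligible for autograd
x = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
print(x.requires_grad)  # True
```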

Training neural networks requires us to compute the weight gradients on the backward pass. As our neural networks get deeper and more complex, autograd automates these otherwise tedious computations.
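To connect backward() to training, the following is a minimal sketch of gradient descent on a toy linear model, assuming made-up data generated from y = 2x + 1 and a hand-picked learning rate. It is not how the book trains models in later chapters (which use PyTorch’s built-in optimizers), but it shows autograd supplying the weight gradients on each backward pass.

```python
import torch

# Toy data from y = 2x + 1 (illustrative values only)
x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# Parameters to learn; requires_grad=True tells autograd to track them
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.05  # learning rate, chosen by hand for this example

for step in range(500):
    y_pred = w * x + b                  # forward pass
    loss = ((y_pred - y) ** 2).mean()   # mean squared error
    loss.backward()                     # fills w.grad and b.grad
    with torch.no_grad():               # update without recording in the graph
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                      # gradients accumulate, so reset them
    b.grad.zero_()

print(w.item(), b.item())  # close to 2.0 and 1.0
```

The calls to zero_() matter: backward() adds to any existing .grad values rather than overwriting them.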

This chapter provided a quick reference to create tensors and perform operations. Now that you have a good foundation on tensors and how to use them, we will focus on how to use tensors and PyTorch to perform deep learning research. In the next chapter, we will review and provide a quick reference to the deep learning development process before jumping into writing code.
