Before we dive deep into the world of PyTorch development, it’s important to understand the fundamental data structure in PyTorch, the tensor. By understanding the tensor, you will understand how PyTorch handles and stores data, and since deep learning is fundamentally a collection and manipulation of floating point numbers, understanding tensors will help you understand how PyTorch implements more advanced functions for deep learning. In addition, you may find yourself using tensor operations frequently when pre-processing input data or manipulating output data during model development.
This chapter serves as a quick reference to understanding tensors and implementing tensor functions within your code. We’ll begin by describing what a tensor is and show simple examples of how to use functions to create, manipulate, and accelerate tensor operations on a GPU. Next, we’ll take a broader look at the API for creating tensors and performing math operations so that you can quickly reference a comprehensive list of tensor capabilities. In each section, we will explore some of the more important functions, identify common pitfalls, and provide key points in their usage.
In PyTorch, a tensor is a data structure used to store and manipulate data. Like a NumPy array, a tensor is a multidimensional array containing elements of a single data type. Tensors can be used to represent scalars, vectors, matrices, and n-dimensional arrays and are derived from the torch.Tensor class. However, tensors are more than just arrays of numbers. When we create or instantiate a tensor object from the torch.Tensor class, we also get a robust set of built-in attributes and operations (class methods). This chapter describes these attributes and operations in detail.
Unlike NumPy arrays, however, tensors include added benefits that make them more suitable for deep learning calculations. First, tensor operations can be performed significantly faster using GPU acceleration. Second, tensors can be stored and manipulated at scale using distributed processing on multiple CPUs and GPUs and across multiple servers. And third, tensors keep track of their graph computations, which, as we will see in the section on Autograd, is very important in implementing a deep learning library.
To further explain what a tensor actually is and how to use them, let’s begin by showing a simple example that creates some tensors and performs a tensor operation.
Here’s a simple example that creates a tensor, performs a tensor operation, and uses a built-in method on the tensor itself. By default, the tensor data type will be derived from the input data type and the tensor will be allocated to the CPU device. First, we import the PyTorch library, then we create two tensors, x and y, from two-dimensional lists. Next we add the two tensors and store the result in z. Notice we can just use the “+” operator since the torch.Tensor class supports operator overloading. Finally, we print the new tensor z, which we can see is the matrix sum of x and y, and we print the size of z. Notice that z is a tensor object itself and the size() method is used to return its matrix dimensions, namely two by three.
```python
import torch

x = torch.tensor([[1,2,3],[4,5,6]])
y = torch.tensor([[7,8,9],[10,11,12]])
z = x + y
print(z)
# out:
# tensor([[ 8, 10, 12],
#         [14, 16, 18]])

print(z.size())
# out: torch.Size([2, 3])
```
You may see the torch.Tensor() (capital T) constructor used in legacy code. This is an alias for the default tensor type torch.FloatTensor. You should use torch.tensor() to create tensors.
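One quick way to see why the lowercase factory function is preferred is to compare the dtype each constructor produces from the same integer list. This is a minimal sketch that assumes PyTorch's default dtype has not been changed; the variable names are illustrative.

```python
import torch

# The legacy constructor always produces the default floating-point type
legacy = torch.Tensor([1, 2, 3])
# The factory function infers the dtype from the input data
modern = torch.tensor([1, 2, 3])

print(legacy.dtype)  # torch.float32
print(modern.dtype)  # torch.int64
```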
Since accelerating tensor operations on a GPU is a major advantage of tensors over NumPy arrays, we’ll show you an easy way to do so. Here’s the same example, but this time we create the tensors on the GPU device if one is available. Notice that the output tensor is also allocated to the GPU. You can also use the device attribute (e.g. z.device) to double-check where the tensor resides. In the first line, the torch.cuda.is_available() function will return True if your machine has GPU support. This is a convenient way to write more robust code that can be accelerated when a GPU exists but also runs on a CPU when a GPU is not present. Also, notice that device='cuda:0' in the output indicates that the first GPU is being used. If your machine contains multiple GPUs, you can also control which GPU is being used.
```python
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[1,2,3],[4,5,6]], device=device)
y = torch.tensor([[7,8,9],[10,11,12]], device=device)
z = x + y
print(z)
# out (on a machine with a GPU):
# tensor([[ 8, 10, 12],
#         [14, 16, 18]], device='cuda:0')

print(z.size())
# out: torch.Size([2, 3])

print(z.device)
# out (on a machine with a GPU): cuda:0
```
The previous code uses torch.tensor() to create a tensor on a specific device; however, you may want to move an existing tensor between devices. You can do so by using the tensor’s to() method, as in x.to(device). Note that to() does not move the original tensor in place; it returns a tensor on the target device, so you need to reassign the result. When new tensors are created as a result of tensor operations, PyTorch will create the new tensor on the same device. In the following code, z resides on the GPU because x and y reside on the GPU. The tensor z is then moved back to the CPU using z.to("cpu") for further processing. Also note that all the tensors within the operation must be on the same device. If x was on the GPU and y was on the CPU, we would get an error.
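```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([[1,2,3],[4,5,6]])
y = torch.tensor([[7,8,9],[10,11,12]])

# to() returns a tensor on the target device, so reassign the result
x = x.to(device)
y = y.to(device)
z = x + y        # z is created on the same device as x and y
z = z.to("cpu")  # move the result back to the CPU
```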
You can use strings directly as device parameters instead of device objects. Assuming the current CUDA device is the default (device 0), the following are all equivalent:
device="cuda"
device=torch.device("cuda")
device="cuda:0"
device=torch.device("cuda:0")
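The short sketch below shows that a string and a torch.device object can be passed interchangeably to to(), and what a device object parses out of its string. It runs on a CPU-only machine because constructing a torch.device does not check whether the hardware exists; the variable names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])

# A string and a torch.device object can be passed interchangeably to to()
a = x.to("cpu")
b = x.to(torch.device("cpu"))
print(a.device == b.device)  # True

# A device object exposes the parsed type and index of the device string
d = torch.device("cuda:0")
print(d.type, d.index)  # cuda 0

# Without an index, "cuda" refers to the current CUDA device
print(torch.device("cuda").index)  # None
```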
The previous section showed a simple way to create tensors; however, there are many other ways to do it. You can create tensors from pre-existing numeric data or create random samplings. Tensors can be created from pre-existing data stored in array-like structures such as lists, tuples, scalars, NumPy arrays, or serialized data files.
The following code illustrates some common ways to create tensors. First, it shows how to create a tensor from a list using torch.tensor(). This method can also be used to create tensors from other data structures like tuples or NumPy arrays. You can also create and initialize tensors using functions like torch.empty(), torch.ones(), and torch.zeros() and specifying the desired size. If you want to initialize a tensor with random values, PyTorch supports a robust set of functions that you can use, such as torch.rand(), torch.randn(), and torch.randint(). Upon initialization, you can also specify the data type of your tensor elements as well as the device on which it is stored (i.e. CPU or GPU). Finally, PyTorch includes the ability to create tensors that have the same properties as other tensors but are initialized with different data. Functions with the _like postfix, such as torch.empty_like() and torch.ones_like(), return tensors with the same size, data type, and device as another tensor but are initialized differently.
```python
import numpy
import torch

# Created from pre-existing arrays
w = torch.tensor([1,2,3])               # from a list
w = torch.tensor((1,2,3))               # from a tuple
w = torch.tensor(numpy.array([1,2,3]))  # from a NumPy array

# Initialized by size
w = torch.empty(100,200)  # uninitialized, element values are not predictable
w = torch.zeros(100,200)  # all elements initialized with 0.0
w = torch.ones(100,200)   # all elements initialized with 1.0

# Initialized by size with random values
w = torch.rand(100,200)   # random numbers from a uniform distribution on the interval [0, 1)
w = torch.randn(100,200)  # random numbers from a normal distribution with mean 0 and variance 1
w = torch.randint(5,10,(100,200))  # random integers from 5 to 9 (the upper bound 10 is excluded)

# Initialized with specified data type or device
device = "cuda" if torch.cuda.is_available() else "cpu"
w = torch.empty((100,200), dtype=torch.float64, device=device)

# Initialized to have same size, data type, and device as another tensor
x = torch.empty_like(w)
```
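You may also see torch.from_numpy() and torch.as_tensor() used to create tensors from NumPy arrays. Unlike torch.tensor(), which always copies its input data, these functions can share memory with the source array, so changes made to one are visible in the other. For general use, the torch.tensor() constructor handles all of these cases.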
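The difference between copying and sharing memory is easiest to see by modifying the NumPy array after the tensors are created. This is a minimal sketch on CPU tensors; the array and variable names are illustrative.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3], dtype=np.int64)

copied = torch.tensor(arr)          # always copies the data
shared = torch.from_numpy(arr)      # shares memory with arr
also_shared = torch.as_tensor(arr)  # shares memory when no dtype/device change is needed

arr[0] = 100  # modify the NumPy array after the tensors exist

print(copied.tolist())       # [1, 2, 3]
print(shared.tolist())       # [100, 2, 3]
print(also_shared.tolist())  # [100, 2, 3]
```

The shared tensors are convenient for avoiding copies of large arrays, but any in-place change on either side affects both.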
Table 2-1 lists PyTorch functions used to create tensors. You should use each one with the torch namespace, e.g. torch.empty(). You can find more details at https://pytorch.org/docs/stable/torch.html [PyTorch Tensor Documentation].
Function | Description |
---|---|
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) | Creates a tensor from an existing data structure |
torch.empty(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with uninitialized elements; their values are whatever happens to be in the allocated memory |
torch.zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with all elements initialized to 0.0 |
torch.ones(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with all elements initialized to 1.0 |
torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1-D tensor of values over a range with a common step value |
torch.linspace(start, end, steps=100, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1-D tensor of steps linearly spaced points between start and end |
torch.logspace(start, end, steps=100, base=10.0, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1-D tensor of steps logarithmically spaced points between base^start and base^end |
torch.eye(n, m=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 2-D tensor with ones on the diagonal and zeros everywhere else |
torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor filled with fill_value |
torch.load(f) | Loads a tensor from a serialized pickle file |
torch.save(obj, f) | Saves a tensor (or other object) to a serialized pickle file |
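Here is a short sketch exercising several functions from Table 2-1. To keep it self-contained, an in-memory io.BytesIO buffer stands in for a file when calling torch.save() and torch.load(); both functions accept file-like objects as well as paths. The variable names are illustrative.

```python
import io
import torch

r = torch.arange(0, 10, 2)               # 0, 2, 4, 6, 8 (end is excluded)
lin = torch.linspace(0, 1, steps=5)      # 0.0, 0.25, 0.5, 0.75, 1.0
logs = torch.logspace(0, 3, steps=4)     # 10^0, 10^1, 10^2, 10^3
identity = torch.eye(3)                  # 3x3 identity matrix
sevens = torch.full((2, 3), 7.0)         # 2x3 tensor filled with 7.0

# Serialize to an in-memory buffer and read it back
buffer = io.BytesIO()
torch.save(sevens, buffer)
buffer.seek(0)
restored = torch.load(buffer)
print(torch.equal(restored, sevens))  # True
```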
The PyTorch documentation contains a complete list of functions for creating tensors as well as more detailed explanations of how to use them. Here are some common pitfalls and additional insights to keep in mind when creating tensors.
Most creation functions accept the optional dtype and device parameters so you can set these at creation.
You should use torch.arange() in favor of the deprecated torch.range() function. Use torch.arange() when the step size is known. Use torch.linspace() when the number of elements is known.
PyTorch provides torch.quantize_per_tensor() and torch.quantize_per_channel() functions that allow you to quantize the data to specific levels.
You can use torch.tensor() to create tensors from array-like structures such as lists, NumPy arrays, and tuples; a set must first be converted to a sequence, e.g. with list() or sorted(). To convert existing tensors to NumPy arrays and lists, use the x.numpy() and x.tolist() methods, respectively.
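The sketch below contrasts torch.arange() (step known, end excluded) with torch.linspace() (count known, both ends included), and shows the conversions back to Python and NumPy. The set example assumes sorted() as one way to turn a set into a sequence; the variable names are illustrative.

```python
import torch

# arange: you choose the step; the end point is excluded
a = torch.arange(0.0, 1.0, 0.25)       # 0.00, 0.25, 0.50, 0.75
# linspace: you choose the number of points; both end points are included
b = torch.linspace(0.0, 1.0, steps=5)  # 0.00, 0.25, 0.50, 0.75, 1.00

# A set is not a sequence, so convert it before calling torch.tensor()
s = {3, 1, 2}
from_set = torch.tensor(sorted(s))

# Converting back to Python and NumPy
t = torch.tensor([[1, 2], [3, 4]])
as_list = t.tolist()   # nested Python lists
as_array = t.numpy()   # NumPy array sharing memory with t (CPU tensors only)
print(as_list, as_array.shape)  # [[1, 2], [3, 4]] (2, 2)
```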
One PyTorch feature that has contributed to its popularity is the fact that it’s very Pythonic and object oriented in nature. Since a tensor is its own data type, you can read attributes of the tensor object itself. Now that you can create tensors, it’s useful to quickly find information about them by accessing their attributes. Assuming x is a tensor, you can access several attributes of x as follows:
x.dtype indicates the tensor’s data type (see Table 2-2 for a list of PyTorch data types)
x.device indicates the tensor’s device location (e.g. CPU or GPU memory)
x.shape shows the tensor’s dimensions
x.ndim identifies the number of tensor’s dimensions or rank
x.requires_grad is a boolean indicating whether the tensor keeps track of graph computations (see “Automatic Differentiation (Autograd)”)
x.grad holds the computed gradients once backward() has been called (requires_grad must be True)
x.grad_fn references the function that created the tensor, if it was produced by an operation on tensors that require gradients
x.is_cuda, x.is_sparse, x.is_quantized, x.is_leaf, x.is_mkldnn are booleans indicating whether the tensor meets certain conditions
x.layout indicates how a tensor is laid out in memory
Remember that when accessing object attributes, you do not include parentheses () as you would with a class method (e.g. use x.shape, not x.shape()).
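A quick tour of these attributes on a small CPU tensor follows. This is a minimal sketch; it assumes the default dtype and device, and it uses requires_grad=True only so that grad and grad_fn have something to show.

```python
import torch

x = torch.randn(2, 3, requires_grad=True)

print(x.dtype)          # torch.float32
print(x.device)         # cpu
print(x.shape)          # torch.Size([2, 3])
print(x.ndim)           # 2
print(x.requires_grad)  # True
print(x.is_leaf)        # True, x was created directly by the user
print(x.layout)         # torch.strided

y = x * 2
print(y.grad_fn)        # the multiplication node that produced y

# grad is populated only after a backward pass
print(x.grad)           # None
x.sum().backward()
print(x.grad)           # a 2x3 tensor of ones
```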
During deep learning development, it’s important to be aware of the data type used by your data and its calculations. So when you create tensors, you should control what data types are being used. As we mentioned before, all tensor elements have the same data type. You can specify the data type when creating the tensor by using the dtype parameter, or you can cast a tensor to a new dtype using a casting method or the to() method, as shown in the code below.
```python
# Specify data type at creation using dtype
w = torch.tensor([1,2,3], dtype=torch.float32)

# Use casting method to cast to a new data type
w.int()      # returns an int32 copy; w itself remains float32
w = w.int()  # w changes to int32 after reassignment

# Use to() method to cast to a new type
w = w.to(torch.float64)        # pass in the data type
w = w.to(dtype=torch.float64)  # define the data type directly with dtype

# PyTorch automatically converts data types during operations
x = torch.tensor([1,2,3], dtype=torch.int32)
y = torch.tensor([1,2,3], dtype=torch.float32)
z = x + y  # x is promoted to float32, and z is returned as float32
```
Note that the casting and to() methods do not change the tensor’s data type unless you reassign the tensor. Also, when performing operations on mixed data types, PyTorch will automatically cast tensors to the appropriate type.
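The automatic casting follows PyTorch's type-promotion rules. The sketch below shows a few common cases and uses torch.result_type() to ask for the promoted type directly; it assumes the default dtype is float32, and the tensor names are illustrative.

```python
import torch

i = torch.tensor([1, 2, 3], dtype=torch.int32)
f = torch.tensor([0.5, 0.5, 0.5], dtype=torch.float32)

print((i + f).dtype)    # torch.float32
print((i + 1.5).dtype)  # torch.float32: a Python float promotes to the default float dtype
print((i + 1).dtype)    # torch.int32: a Python int does not change the tensor's dtype
print((i / 2).dtype)    # torch.float32: / performs true division
print(torch.result_type(i, f))  # torch.float32
```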
Most of the tensor creation functions allow you to specify the data type upon creation using the dtype parameter. When you set the dtype or cast tensors, remember to use the torch namespace, such as torch.int64, not just int64. Table 2-2 lists the most common data types available in PyTorch. Each data type results in a different tensor class depending on the tensor’s device. The corresponding tensor class is shown in the two rightmost columns for CPU and GPU, respectively.
Data type | dtype | CPU tensor | GPU tensor |
---|---|---|---|
32-bit floating point (default) | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |
64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |
16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |
Boolean | torch.bool | torch.BoolTensor | torch.cuda.BoolTensor |
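To check the entries in Table 2-2 on your own machine, you can compare the dtype aliases and ask a tensor for its class name with x.type(). This is a minimal sketch on CPU tensors, assuming the default dtype has not been changed; element_size() reports the bytes per element.

```python
import torch

# The short names are aliases for the same dtypes
print(torch.float == torch.float32)  # True
print(torch.long == torch.int64)     # True

x = torch.tensor([1.0, 2.0])
print(x.type())                   # torch.FloatTensor
print(x.double().type())          # torch.DoubleTensor
print(x.long().type())            # torch.LongTensor
print(x.element_size())           # 4 bytes per float32 element
print(torch.get_default_dtype())  # torch.float32
```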
Memory Considerations and In-place Operations
When performing deep learning experimentation, it’s also important to be aware of how tensors are stored in memory. This can have a major impact on the speed and performance of your deep learning models. Tensor values are allocated in contiguous blocks of memory managed as torch.Storage instances. No matter what their dimensions, tensors are stored as a 1D collection of elements in memory. They are indexed by keeping track of the size, offset, and per-dimension strides. Therefore, transposing a tensor or reshaping it with view() does not need to move data; it simply creates a new tensor that shares the same storage with a different size and per-dimension strides.
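The sketch below makes the size/offset/stride bookkeeping visible. It uses data_ptr() to confirm that a transpose shares memory with the original, storage_offset() to show where a row starts in the shared storage, and contiguous() to produce a compact copy when a flat view is needed. The tensor names are illustrative.

```python
import torch

x = torch.arange(6).view(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(x.stride())               # (3, 1): skip 3 elements for the next row, 1 for the next column
print(x[1].storage_offset())    # 3: row 1 starts at element 3 of the shared storage

xt = x.t()                      # transpose: same memory, different strides
print(xt.stride())              # (1, 3)
print(xt.data_ptr() == x.data_ptr())  # True, no data was copied
print(xt.is_contiguous())       # False

# contiguous() lays the transposed values out compactly so view() can flatten them
flat = xt.contiguous().view(-1)
print(flat)                     # tensor([0, 3, 1, 4, 2, 5])
```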
To reduce memory usage, you may want to reuse memory and overwrite tensor values using in-place operations. To perform an in-place operation, append the underscore (_) postfix to the function name. For example, the function y.add_(x) adds x to y and stores the result in y, instead of allocating a new tensor.
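One way to confirm that an in-place operation reuses memory is to compare the tensor's data_ptr() before and after the call. This is a minimal sketch; the tensors are illustrative.

```python
import torch

x = torch.ones(3)
y = torch.zeros(3)

ptr = y.data_ptr()
y.add_(x)                      # in-place: result written into y's existing memory
print(y, y.data_ptr() == ptr)  # tensor([1., 1., 1.]) True

z = y.add(x)                   # out-of-place: allocates a new tensor
print(z, z.data_ptr() == ptr)  # tensor([2., 2., 2.]) False
```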
The need to create random data comes up often during deep learning development. Sometimes, you will need to initialize weights to random values or create random inputs with specified distributions. PyTorch supports a very robust set of functions that you can use to create tensors from random data. As with other creation functions, you can specify the dtype and device when creating the tensor. Table 2-3 lists some examples of random sampling functions.
Function | Description |
---|---|
torch.rand(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Selects random values from a uniform distribution on the interval [0, 1) |
torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Selects random values from a standard normal distribution with zero mean and unit variance |
torch.normal(mean, std, *, generator=None, out=None) | Selects random numbers from a normal distribution with the specified mean and standard deviation |
torch.randint(low=0, high, size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Selects random integers generated uniformly between low (inclusive) and high (exclusive) |
torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False) | Creates a random permutation of integers from 0 to n-1 |
torch.bernoulli(input, *, generator=None, out=None) | Draws binary random numbers (0 or 1) from a Bernoulli distribution, using input as the probabilities |
torch.multinomial(input, num_samples, replacement=False, *, generator=None, out=None) | Samples indices according to the weights in input, using a multinomial distribution |
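The following sketch calls each function in Table 2-3 once. It seeds the generator with torch.manual_seed() so the draws are repeatable, and it uses the float-mean form torch.normal(mean, std, size=...) rather than the tensor form listed in the table; the shapes and probabilities are illustrative.

```python
import torch

torch.manual_seed(0)  # make the draws repeatable

u = torch.rand(2, 3)                           # uniform on [0, 1)
n = torch.randn(2, 3)                          # standard normal
g = torch.normal(5.0, 2.0, size=(2, 3))        # normal with mean 5, std 2
k = torch.randint(0, 10, (2, 3))               # integers in [0, 10)
p = torch.randperm(5)                          # a shuffled arrangement of 0..4
b = torch.bernoulli(torch.full((2, 3), 0.3))   # each element is 1 with probability 0.3
m = torch.multinomial(torch.tensor([0.1, 0.6, 0.3]), num_samples=2)  # 2 distinct indices

print(u, n, g, k, p, b, m, sep="\n")
```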
You can also create tensors of values sampled from other distributions using torch.empty() and in-place methods including cauchy_(), exponential_(), geometric_(), and log_normal_(). Remember that in-place methods use the underscore postfix. For example, x = torch.empty([10,5]).cauchy_() creates a tensor of random numbers drawn from the Cauchy distribution.
Sometimes you may want to create and initialize a tensor that has similar properties to another tensor, including the dtype, device, and layout properties, to facilitate calculations. Many of the tensor creation operations have a similarity function that allows you to easily do this. The similarity functions have the postfix _like. For example, torch.empty_like(tensor_a) will create an empty tensor with the dtype, device, and layout properties of tensor_a. Some examples of similarity functions include empty_like(), zeros_like(), ones_like(), full_like(), rand_like(), randn_like(), and randint_like().
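Each of these in-place sampling methods overwrites an existing tensor, so the usual pattern is to allocate with torch.empty() and fill it immediately. This is a minimal sketch; the distribution parameters are illustrative choices.

```python
import torch

torch.manual_seed(0)

c = torch.empty(10, 5).cauchy_()                         # Cauchy with median 0, sigma 1
e = torch.empty(10, 5).exponential_(lambd=2.0)           # exponential with rate 2
geo = torch.empty(10, 5).geometric_(p=0.5)               # trials until first success (>= 1)
ln = torch.empty(10, 5).log_normal_(mean=0.0, std=1.0)   # always positive
```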
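The sketch below shows that the _like functions copy the dtype (and device) of their argument, and that the dtype can still be overridden explicitly. The float64 source tensor is an illustrative choice.

```python
import torch

a = torch.tensor([[1.5, 2.5], [3.5, 4.5]], dtype=torch.float64)

z = torch.zeros_like(a)           # same shape, dtype, device; all zeros
f = torch.full_like(a, 9.0)       # same properties, filled with 9.0
r = torch.randint_like(a, 0, 10)  # random integers in [0, 10), stored as float64
print(z.dtype, f.dtype, r.dtype)  # torch.float64 torch.float64 torch.float64

# The dtype can still be overridden explicitly
i = torch.ones_like(a, dtype=torch.int32)
print(i.dtype)                    # torch.int32
```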
Now that you understand how to create tensors, let’s explore what you can do with them. PyTorch supports a robust set of tensor operations that allow you to access and transform your tensor data. First, we’ll describe how to access portions of your data, manipulate their elements, and combine tensors to form new tensors. Then, we’ll show you how to perform simple calculations as well as advanced mathematical computations, often with a single function call. PyTorch provides many built-in functions. It’s useful to see what’s available before creating your own.
Once you have created tensors, you may want to access portions of the data and combine or split tensors to form new tensors. The following code demonstrates how to perform these types of operations. You can slice and index tensors in the same way you would slice and index NumPy arrays, as shown in the first few lines of code. Note that indexing and slicing return tensors, even if the result is only a single element. Use the item() method to convert a single-element tensor to a Python number, for example, when passing it to functions that expect a plain number. Next, we see that slicing uses the same [start:end:step] format that is used for slicing Python lists and NumPy arrays, and boolean indexing allows you to extract portions of the data that meet certain criteria. PyTorch supports transposing and reshaping tensors, as shown in the next few lines of code. Finally, you can combine or split tensors by using functions like torch.stack() and torch.unbind(), respectively, as shown.
```python
x = torch.tensor([[1,2],[3,4],[5,6],[7,8]])
print(x)
# out:
# tensor([[1, 2],
#         [3, 4],
#         [5, 6],
#         [7, 8]])

# Indexing, returns a tensor
print(x[1,1])
# out: tensor(4)

# Indexing, returns value as Python number
print(x[1,1].item())
# out: 4

# Slicing
print(x[:2,1])
# out: tensor([2, 4])

# Boolean indexing
print(x[x<5])
# out: tensor([1, 2, 3, 4])

# Transpose array, x.t() or x.T can be used
print(x.t())
# out:
# tensor([[1, 3, 5, 7],
#         [2, 4, 6, 8]])

# Changing shape, view() is often preferred over reshape() because it never copies
print(x.view((2,4)))
# out:
# tensor([[1, 2, 3, 4],
#         [5, 6, 7, 8]])

# Combining tensors
y = torch.stack((x, x))
print(y)
# out:
# tensor([[[1, 2],
#          [3, 4],
#          [5, 6],
#          [7, 8]],
#
#         [[1, 2],
#          [3, 4],
#          [5, 6],
#          [7, 8]]])

# Splitting tensors
a, b = x.unbind(dim=1)
print(a, b)
# out: tensor([1, 3, 5, 7]) tensor([2, 4, 6, 8])
```
PyTorch provides a robust set of built-in functions that can be used to access, split, and combine tensors in different ways. Table 2-4 lists some commonly used functions to manipulate tensor elements.
Function | Description |
---|---|
torch.cat() | Concatenates the given sequence of tensors in the given dimension |
torch.chunk() | Splits a tensor into a specific number of chunks; each chunk is a view of the input tensor |
torch.gather() | Gathers values along an axis specified by dim |
torch.index_select() | Returns a new tensor that indexes the input tensor along dimension dim using the entries in index, which is a LongTensor |
torch.masked_select() | Returns a new 1-D tensor that indexes the input tensor according to the boolean mask, which is a BoolTensor |
torch.narrow() | Returns a tensor that is a narrow version of the input tensor |
torch.nonzero() | Returns the indices of non-zero elements |
torch.reshape() | Returns a tensor with the same data and a new shape; may return a view or a copy (use view() to ensure the tensor is not copied) |
torch.split() | Splits the tensor into chunks; each chunk is a view of the original tensor |
torch.squeeze() | Returns a tensor with all the dimensions of input of size 1 removed |
torch.stack() | Concatenates a sequence of tensors along a new dimension |
torch.t() | Expects input to be a tensor with at most 2 dimensions and transposes dimensions 0 and 1 |
torch.transpose() | Transposes two dimensions that you choose |
torch.take() | Returns the elements at the given indices, treating the input as if it were a 1-D tensor |
torch.unbind() | Removes a tensor dimension and returns a tuple of all slices along that dimension |
torch.unsqueeze() | Returns a new tensor with a dimension of size one inserted at the specified position |
torch.where() | Returns a tensor of elements selected from either x or y, depending on condition |
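The sketch below applies most of the functions in Table 2-4 to the same 4x2 tensor used in the earlier indexing example, so you can compare their results side by side. The index tensors and masks are illustrative choices.

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])

cat = torch.cat((x, x), dim=0)                    # shape (8, 2)
first, second = torch.chunk(x, 2, dim=0)          # two (2, 2) pieces
parts = torch.split(x, [1, 3], dim=0)             # pieces of 1 and 3 rows
picked = torch.index_select(x, 0, torch.tensor([0, 2]))  # rows 0 and 2
masked = torch.masked_select(x, x > 4)            # 1-D: 5, 6, 7, 8
narrowed = torch.narrow(x, 0, 1, 2)               # 2 rows starting at row 1
nz = torch.nonzero(x > 6)                         # (row, col) indices of elements > 6
gathered = torch.gather(x, 1, torch.tensor([[1], [0], [1], [0]]))  # one element per row
taken = torch.take(x, torch.tensor([0, 7]))       # treats x as flattened: 1 and 8
kept = torch.where(x > 4, x, torch.zeros_like(x)) # keep values > 4, else 0

col = torch.tensor([1, 2, 3]).unsqueeze(1)        # shape (3, 1)
back = col.squeeze()                              # shape (3,)
```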
Some of these functions may seem redundant. However, the following key distinctions and best practices are important to keep in mind:
The item() method (e.g. x.item()) is an important and commonly used way to return the Python number from a single-element tensor.
Use view() instead of reshape() for reshaping tensors in most cases. Using reshape() may cause the tensor to be copied, depending on its layout in memory. view() ensures that it will not be copied; if the memory layout does not allow a view, it raises an error instead.
Using x.T or x.t() is a simple way to transpose 1D or 2D tensors. Use transpose() when dealing with multidimensional tensors.
The torch.squeeze() function is used often in deep learning to remove an unused dimension. For example, a batch of images with a single image can be reduced from 4D to 3D using squeeze().
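The view() versus reshape() distinction shows up as soon as a tensor is non-contiguous, for example after a transpose. This sketch also squeezes a batch holding a single image, assuming the common (batch, channels, height, width) layout; passing dim 0 to squeeze() removes only the batch dimension, so other size-1 dimensions would be left alone.

```python
import torch

x = torch.arange(6).view(2, 3)
xt = x.t()  # non-contiguous view

view_failed = False
try:
    xt.view(6)         # view() refuses rather than copying
except RuntimeError:
    view_failed = True
print(view_failed)     # True

flat = xt.reshape(6)   # reshape() silently copies when it has to
print(flat)            # tensor([0, 3, 1, 4, 2, 5])

# Drop the batch dimension from a batch holding a single image
batch = torch.rand(1, 3, 32, 32)  # (batch, channels, height, width)
image = batch.squeeze(0)          # (3, 32, 32)
print(image.shape)                # torch.Size([3, 32, 32])
```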
PyTorch is very Pythonic in nature. As with most Python classes, some PyTorch functions can be applied directly on a tensor using a built-in method such as x.size(). Other functions are called directly using the torch namespace. These functions take a tensor as an input, such as x in torch.save(x, 'tensor.pt'). Methods can also be chained together. For example, torch.rand(2,2).max().item() creates a 2x2 tensor of random floats, finds the maximum value, and returns the value itself from the resulting tensor.
Deep learning development relies heavily on mathematical computations, so PyTorch supports a very robust set of built-in math functions. Whether you are creating new data transforms, customizing loss functions, or building your own optimization algorithms, you can speed up your research and development with the math functions provided by PyTorch. The purpose of this section is to provide a quick overview of many of the mathematical functions available so that you can quickly build your awareness of what currently exists and find the appropriate functions when needed.
PyTorch supports many different types of math functions, including pointwise operations, reduction functions, comparison calculations, linear algebra, spectral, and other math computations. The first type of useful math operations is pointwise operations. Pointwise operations perform an operation on each element in the tensor individually and return a new tensor. They are useful for rounding and truncation as well as trigonometric and logical operations. By default, the functions will create a new tensor or use one passed in by the out parameter. If you want to perform an in-place operation, remember to append the underscore to the function name. Table 2-5 lists some commonly used pointwise operations.
Operation Type | Sample Functions |
---|---|
Basic Math | add(), div(), mul(), neg(), reciprocal(), true_divide() |
Truncation | ceil(), clamp(), floor(), floor_divide(), fmod(), frac(), remainder(), round(), sigmoid(), trunc(), lerp() |
Complex Numbers | real(), imag(), conj(), abs(), angle() |
Trigonometric | cos(), sin(), tan(), asin(), acos(), atan(), sinh(), cosh(), tanh(), deg2rad(), rad2deg() |
Exponential & Logarithmic | exp(), expm1(), log(), log10(), log1p(), log2(), logaddexp(), pow(), rsqrt(), sqrt(), square() |
Logical | logical_and(), logical_or(), logical_not(), logical_xor() |
Cumulative Math | addcdiv(), addcmul() |
Bitwise | bitwise_not(), bitwise_and(), bitwise_or(), bitwise_xor() |
Error Functions | erf(), erfc(), erfinv() |
Gamma Functions | digamma(), lgamma(), mvlgamma(), polygamma() |
Use Python hints or refer to the PyTorch documentation for details on function usage. Note that true_divide() converts tensor data to floats first, and should be used when dividing integers to obtain true division results.
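A few of the pointwise operations from Table 2-5 are shown below, including true_divide() on an integer tensor. This is a minimal sketch with illustrative inputs; each call returns a new tensor and leaves its input unchanged.

```python
import torch

x = torch.tensor([-1.7, -0.2, 0.4, 2.6])

clamped = torch.clamp(x, min=-1.0, max=1.0)  # -1.0, -0.2, 0.4, 1.0
rounded = torch.round(x)                     # -2, -0, 0, 3
truncated = torch.trunc(x)                   # -1, -0, 0, 2 (drop the fractional part)

ints = torch.tensor([1, 2, 3])
halves = torch.true_divide(ints, 2)          # 0.5, 1.0, 1.5 as floats

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
both = torch.logical_and(a, b)               # True, False, False
```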
Three different syntaxes can be used for most tensor operations. Tensors support operator overloading, so you can use operators directly, such as z = x + y. Although this is less common, you can also use PyTorch functions such as torch.add() to do the same thing. Lastly, you can perform operations in-place using the underscore (_) postfix. The function y.add_(x) achieves the same result, but the result is stored in y.
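The three syntaxes, plus the out parameter mentioned earlier, all produce the same values; only where the result is stored differs. This is a minimal sketch with illustrative tensors.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([10.0, 20.0, 30.0])

z1 = x + y            # operator overloading
z2 = torch.add(x, y)  # function in the torch namespace
z3 = x.add(y)         # tensor method, returns a new tensor

out = torch.empty(3)
torch.add(x, y, out=out)  # write the result into a preallocated tensor

y.add_(x)             # in-place: y now holds the sum
print(torch.equal(z1, z2), torch.equal(z2, z3), torch.equal(out, y))  # True True True
```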
A second type of math functions are called reduction operations. Reduction operations reduce a bunch of numbers down to a single number or smaller set of numbers. That is, they reduce the dimensionality or rank of the tensor. Reduction operations include functions like maximum or minimum values as well as many statistical calculations like mean or standard deviation. These operations are frequently used in deep learning. For example, deep learning classification often uses the argmax() function to reduce softmax outputs to a dominant class.
Table 2-6 shows a list of commonly used reduction operations. Note that many of these functions accept a dim parameter, which specifies the dimension of reduction for multidimensional tensors. This is similar to the axis parameter in NumPy. By default, when dim is not specified, the reduction occurs across all dimensions. For a 2-D tensor, specifying dim=1 reduces across the columns, producing one result per row. For example, torch.mean(x,1) will compute the mean for each row in tensor x.
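The effect of dim and keepdim is easiest to see on a small 2x3 tensor. The second half of this sketch shows the argmax() classification pattern on a hypothetical batch of two samples with three classes; torch.softmax() is used here to turn the illustrative scores into probabilities.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

print(torch.mean(x))                         # tensor(3.5000): all elements
print(torch.mean(x, 0))                      # tensor([2.5000, 3.5000, 4.5000]): one value per column
print(torch.mean(x, 1))                      # tensor([2., 5.]): one value per row
print(torch.mean(x, 1, keepdim=True).shape)  # torch.Size([2, 1])

# Classification: pick the most likely class for each sample in a batch
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1]])
probs = torch.softmax(scores, dim=1)
predicted = torch.argmax(probs, dim=1)
print(predicted)                             # tensor([1, 0])
```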
Function | Description |
---|---|
torch.argmax(input, dim, keepdim=False, out=None) | Returns the index (or indices) of the max value across all elements or just a dimension if it’s specified |
torch.argmin(input, dim, keepdim=False, out=None) | Returns the index (or indices) of the min value across all elements or just a dimension if it’s specified |
torch.dist(input, other, p=2) | Computes the p-norm of the difference between two tensors (input - other) |
torch.logsumexp(input, dim, keepdim=False, out=None) | Computes the log of summed exponentials of each row of the input tensor in the given dimension dim |
torch.mean(input, dim, keepdim=False, out=None) | Computes the mean or average across all elements or just a dimension if it’s specified |
torch.median(input, dim, keepdim=False, out=None) | Computes the median or middle value across all elements or just a dimension if it’s specified |
torch.mode(input, dim, keepdim=False, out=None) | Computes the mode or most frequent value across all elements or just a dimension if it’s specified |
torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None) | Computes the matrix or vector norm across all elements or just a dimension if it’s specified |
torch.prod(input, dim, keepdim=False, dtype=None) | Computes the product of all elements or just a dimension if it’s specified |
torch.std(input, dim, keepdim=False, out=None) | Computes the standard deviation across all elements or just a dimension if it’s specified |
torch.std_mean(input, unbiased=True) | Computes the standard deviation and mean across all elements or just a dimension if it’s specified |
torch.sum(input, dim, keepdim=False, out=None) | Computes the sum of all elements or just a dimension if it’s specified |
torch.unique(input, sorted=True, return_inverse=False, return_counts=False, dim=None) | Removes duplicates across the entire tensor or just a dimension if it’s specified |
torch.unique_consecutive(input, return_inverse=False, return_counts=False, dim=None) | Similar to torch.unique() but only removes consecutive duplicates |
torch.var(input, dim, keepdim=False, out=None) | Computes the variance across all elements or just a dimension if it’s specified |
torch.var_mean(input, dim, keepdim=False, out=None) | Computes the mean and variance across all elements or just a dimension if it’s specified |
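The sketch below tries several of the less obvious reductions from Table 2-6 on a small tensor with repeated values. It relies on the return_counts parameter of torch.unique() and on the default p=2 (Euclidean) distance in torch.dist(); the input values are illustrative.

```python
import torch

x = torch.tensor([[1.0, 2.0, 2.0],
                  [4.0, 4.0, 9.0]])

std, mean = torch.std_mean(x, dim=1)  # both computed in one call, one value per row
values, counts = torch.unique(x, return_counts=True)
print(values, counts)                 # unique values 1, 2, 4, 9 with counts 1, 2, 2, 1

runs = torch.unique_consecutive(torch.tensor([1, 1, 2, 2, 1]))
print(runs)                           # tensor([1, 2, 1]): only adjacent repeats collapse

print(torch.prod(x, dim=1))           # products per row: 4 and 144
d = torch.dist(x[0], x[1])            # Euclidean distance between the two rows
lse = torch.logsumexp(x, dim=1)       # numerically stable log(sum(exp(row)))
```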
A third type of mathematical operations are called comparison functions. Comparison functions usually compare all the values within a tensor or compare one tensor’s values to another’s. They can return a tensor full of booleans based on each element’s value, such as torch.eq() or torch.isnan(). They may also find the maximum or minimum value, sort tensor values, or return the top subset of tensor elements. Table 2-7 lists some commonly used comparison functions for your reference.
Operation Type | Sample Functions |
---|---|
Compare tensor to other tensors | eq(), gt(), ge(), lt(), le(), ne() or ==, >, >=, <, <=, != respectively |
Test tensor status or conditions | isclose(), isfinite(), isinf(), isnan() |
Return a single boolean for the entire tensor | allclose(), equal() |
Find value(s) over the entire tensor or along a given dimension | max(), min(), kthvalue(), sort(), topk(), argsort() |
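Several of the functions in Table 2-7 return more than one tensor. The sketch below shows that max() with a dim returns both the values and their indices, and it exercises topk(), sort(), kthvalue(), and isnan() on illustrative data.

```python
import torch

x = torch.tensor([[3, 1, 4],
                  [1, 5, 9]])

mask = x > 2                           # elementwise comparison, a bool tensor
values, indices = torch.max(x, dim=1)  # max per row and where it occurred
print(values, indices)                 # tensor([4, 9]) tensor([2, 2])

top_vals, top_idx = torch.topk(x.flatten(), k=2)  # two largest: 9 and 5
sorted_vals, order = torch.sort(x, dim=1, descending=True)  # rows [4, 3, 1] and [9, 5, 1]
second_smallest = torch.kthvalue(x.flatten(), 2).values     # 1 (the value 1 appears twice)
nan_mask = torch.isnan(torch.tensor([1.0, float("nan")]))   # False, True
```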
Comparison functions seem pretty straightforward; however, there are a few key points to keep in mind. Some common pitfalls are described as follows:
The torch.eq() function or == returns a tensor of the same size with a Boolean result for each element. The torch.equal() function tests if the tensors are the same size and that all elements within tensor are equal then returns a single Boolean value.
The function torch.allclose() also returns a single boolean value if all elements are close to a specified value.
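The difference between these three checks shows up with two tensors that differ by a tiny floating-point amount. This is a minimal sketch; 3.000001 is an illustrative value that is distinct from 3.0 in float32 but well within allclose()'s default tolerances.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.0, 3.000001])

elementwise = a == b          # same as torch.eq(a, b): one bool per element
exact = torch.equal(a, b)     # single bool: same shape and exactly equal values
close = torch.allclose(a, b)  # single bool: equal within rtol/atol tolerances
print(elementwise, exact, close)  # tensor([ True,  True, False]) False True
```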
The next type of mathematical function is the linear algebra function. Linear algebra functions facilitate matrix operations and are important for deep learning computations. Many computational algorithms, including gradient descent and other optimization algorithms, use linear algebra to implement their calculations. PyTorch supports a robust set of built-in linear algebra operations, many of which are based on the BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) standardized libraries.
Table 2-8 lists some commonly used linear algebra operations. They range from matrix multiplication to batch calculations and solvers. It’s important to point out that matrix multiplication is not the same as pointwise multiplication with torch.mul() or the * operator. A complete study of linear algebra is beyond the scope of this book, but you may find it useful to access some of the linear algebra functions when performing feature reduction or developing custom deep learning algorithms. See the PyTorch linear algebra documentation for a complete list of available functions and more details on how to use them.
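To make the distinction concrete, here is a minimal sketch with two arbitrary 2 x 2 matrices: A * B multiplies matching elements, while torch.matmul() (or the @ operator) computes the matrix product. The last lines show matmul() broadcasting a single matrix across a batch.

```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])

pointwise = A * B            # same as torch.mul(A, B): [[5, 12], [21, 32]]
matrix = torch.matmul(A, B)  # same as A @ B: rows of A times columns of B

# matmul() broadcasts: A is applied to each of the 10 matrices in the batch
batch = torch.randn(10, 2, 2)
out = torch.matmul(batch, A)  # shape (10, 2, 2)
```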
Function | Description |
---|---|
torch.matmul() | Matrix product of two tensors; supports broadcasting |
torch.chain_matmul() | Matrix product of N tensors |
torch.mm() | Matrix product of two tensors (if broadcasting is required, use matmul()) |
torch.addmm() | Computes the matrix product of two tensors and adds it to input |
torch.bmm() | Computes a batch of matrix products |
torch.addbmm() | Computes a batch of matrix products, sums them, and adds the result to input |
torch.baddbmm() | Computes a batch of matrix products and adds each one to the corresponding matrix in the input batch |
torch.mv() | Computes the product of a matrix and a vector |
torch.addmv() | Computes the product of a matrix and a vector and adds it to input |
torch.matrix_power(input, n) | Raises a square matrix (or batch of square matrices) to the integer power n |
torch.eig() | Finds the eigenvalues and eigenvectors of a real square matrix |
torch.inverse() | Computes the inverse of a square matrix |
torch.det() | Computes the determinant of a matrix or batch of matrices |
torch.logdet() | Computes the log determinant of a matrix or batch of matrices |
torch.dot() | Computes the inner product of two 1D tensors |
torch.addr() | Computes the outer product of two vectors and adds it to input |
torch.solve() | Returns the solution to a system of linear equations |
torch.svd() | Performs singular value decomposition |
torch.pca_lowrank() | Performs linear principal component analysis |
torch.cholesky() | Computes the Cholesky decomposition |
torch.cholesky_inverse() | Computes the inverse of a symmetric positive-definite matrix from its Cholesky factor |
torch.cholesky_solve() | Solves a system of linear equations using a Cholesky factor |
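Several of these functions accept batches of matrices. The following sketch assumes a random batch of four symmetric positive-definite 3 x 3 matrices built as M @ M^T + 3I; that construction is only a convenient way to guarantee the inverses exist and are well conditioned. It exercises bmm(), inverse(), and det().

```python
import torch

torch.manual_seed(0)

# A batch of 4 symmetric positive-definite 3x3 matrices (well conditioned)
M = torch.randn(4, 3, 3, dtype=torch.float64)
eye = torch.eye(3, dtype=torch.float64)
A = M @ M.transpose(1, 2) + 3 * eye

B = torch.randn(4, 3, 2, dtype=torch.float64)

# bmm(): one matrix product per batch entry; it does not broadcast
C = torch.bmm(A, B)                  # shape (4, 3, 2)

# inverse() and det() also operate on batches of square matrices
A_inv = torch.inverse(A)
dets = torch.det(A)                  # shape (4,)

# Each A @ A_inv should be the identity, up to rounding error
is_identity = torch.allclose(A @ A_inv, eye.expand(4, 3, 3))
```

When the goal is to solve a linear system rather than inspect the inverse itself, a dedicated solver from the table is generally preferred over multiplying by inverse(), for numerical stability.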
The final type of mathematical operations includes spectral and other math operations. Depending on the domain of interest, these functions may be useful for data transforms or analysis. For example, spectral operations like the Fast Fourier Transform can play an important role in computer vision or digital signal processing applications. Table 2-9 lists some built-in operations for spectrum analysis and other mathematical operations.
Operation Type | Sample Functions |
---|---|
Fast, Inverse, and Short-time Fourier Transforms | fft(), ifft(), stft() |
Real to complex FFT and Complex to real IFFT | rfft(), irfft() |
Windowing algorithms | bartlett_window(), blackman_window(), hamming_window(), hann_window() |
Histogram and Bin Counts | histc(), bincount() |
Cumulative operations | cummax(), cummin(), cumprod(), cumsum(), trace() (sum of the diagonal), einsum() (sum of products using Einstein summation) |
Distance and normalization functions | cdist(), renorm() |
Cross, dot, and Cartesian products | cross(), tensordot(), cartesian_prod() |
Functions that create a diagonal tensor from the input tensor or extract its diagonal | diag(), diag_embed(), diagflat(), diagonal() |
Einstein summation | einsum() |
Matrix reduction and restructuring | flatten(), flip(), rot90(), repeat_interleave(), meshgrid(), roll(), combinations() |
Functions that return the lower or upper triangles and their indices | tril(), tril_indices(), triu(), triu_indices() |
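Several of these functions are easiest to understand on a small tensor whose values you can check by hand. The sketch below applies a few of them to the 3 x 3 matrix of the numbers 1 through 9; the tensor and the bin-count input are arbitrary.

```python
import torch

t = torch.arange(1.0, 10.0).reshape(3, 3)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

running = torch.cumsum(t, dim=1)       # running sums along each row
tr = torch.trace(t)                    # sum of the diagonal: 1 + 5 + 9 = 15
tr_einsum = torch.einsum("ii->", t)    # the same trace via Einstein summation
prod_einsum = torch.einsum("ij,jk->ik", t, t)  # the matrix product t @ t
lower = torch.tril(t)                  # zeros everything above the diagonal
counts = torch.bincount(torch.tensor([0, 1, 1, 3]))  # occurrences of 0, 1, 2, 3
```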
One function is worth calling out in its own subsection because it’s what makes PyTorch so powerful for deep learning development. The backward() function uses PyTorch’s automatic differentiation package, torch.autograd, to differentiate and compute gradients of tensors based on the chain rule. Here’s a simple example of automatic differentiation. We define a function, f = sum(x^2). To find df/dx, we need to set the requires_grad=True flag for x. Calling f.backward() computes the gradient of f with respect to x and stores df/dx, which here is 2x, in the x.grad attribute.
```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]],
                 dtype=torch.float, requires_grad=True)
f = x.pow(2).sum()
print(f)
# > tensor(91., grad_fn=<SumBackward0>)

f.backward()
print(x.grad)  # df/dx = 2x
# > tensor([[ 2.,  4.,  6.],
# >         [ 8., 10., 12.]])
```
Only tensors with a floating-point dtype can require gradients, which is why the example above creates x with dtype=torch.float.
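A minimal sketch of this restriction: requesting gradients on an integer tensor raises a RuntimeError (the exact message varies between PyTorch versions), while the same values stored as float32 are accepted.

```python
import torch

# Integer tensors cannot track gradients; creating one raises a RuntimeError
try:
    torch.tensor([1, 2, 3], requires_grad=True)
    int_error = None
except RuntimeError as err:
    int_error = str(err)

# The same values with a floating-point dtype work
x = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
print(int_error)
```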
Training a neural network requires computing the gradients of the loss with respect to the weights during the backward pass. As neural networks get deeper and more complex, autograd automates these otherwise complex computations.
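To show how the gradients computed by backward() are used during training, here is a minimal gradient descent sketch on a hypothetical toy problem: fitting a single weight w so that w * x matches y = 2x. The data, the learning rate lr, and the step count are illustrative assumptions, not values from a real model.

```python
import torch

# Hypothetical toy data: y = 2x, so the ideal weight is 2.0
x = torch.linspace(-1.0, 1.0, steps=20)
y = 2.0 * x

w = torch.zeros(1, requires_grad=True)  # the single trainable weight
lr = 0.5                                # illustrative learning rate

for step in range(50):
    loss = ((w * x - y) ** 2).mean()    # mean squared error
    loss.backward()                     # computes d(loss)/dw into w.grad
    with torch.no_grad():
        w -= lr * w.grad                # gradient descent update
    w.grad.zero_()                      # backward() accumulates, so reset

print(w.item())  # close to 2.0
```

Two details matter here: the update runs inside torch.no_grad() so that autograd does not record it, and w.grad is reset with zero_() because backward() accumulates gradients rather than overwriting them.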
This chapter provided a quick reference to creating tensors and performing operations on them. Now that you have a good foundation in tensors and how to use them, we will focus on how to use tensors and PyTorch to perform deep learning research. In the next chapter, we will review and provide a quick reference to the deep learning development process before jumping into writing code.