Understanding the performance benefits of dispatch

Using a singleton type is nice because we can avoid writing conditional branches. Another side benefit is that performance can be greatly improved. An interesting example can be found in the ntuple function from Julia's Base package.

The ntuple function is used to create a tuple of N elements by applying a function over the sequence of 1 to N. For example, we can create a tuple of even numbers as follows:

The first argument is an anonymous function that doubles the value. Since we specified 10 in the second argument, it mapped over the range of 1 to 10 and gave us 2, 4, 6, ... 20. If we take a peek into the source code, we will find this interesting definition:

function ntuple(f::F, n::Integer) where F
t = n == 0 ? () :
n == 1 ? (f(1),) :
n == 2 ? (f(1), f(2)) :
n == 3 ? (f(1), f(2), f(3)) :
n == 4 ? (f(1), f(2), f(3), f(4)) :
n == 5 ? (f(1), f(2), f(3), f(4), f(5)) :
n == 6 ? (f(1), f(2), f(3), f(4), f(5), f(6)) :
n == 7 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7)) :
n == 8 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8)) :
n == 9 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9)) :
n == 10 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9), f(10)) :
_ntuple(f, n)
return t
end

While the code is indented quite nicely, we can clearly see that it supports up to 10 elements by hard coding the short-circuit branches with the ? and : ternary operators. If it's more than 10, then it calls another function to create the tuple:

function _ntuple(f, n)
@_noinline_meta
(n >= 0) || throw(ArgumentError(string("tuple length should be ≥ 0, got ", n)))
([f(i) for i = 1:n]...,)
end

This _ntuple function is expected to perform poorly because it creates an array using comprehension and then the result is splatted into a new tuple. You may be very surprised by the performance benchmarking result when we compare the case of creating a 10-element tuple versus an 11-element tuple:

The ntuple function is designed to perform optimally when the number of elements is small, that is, for 10 or fewer elements. It would be possible to change the ntuple function to hardcode more, but it would be too tedious to write the code, and the resulting code would be extremely ugly.

Perhaps a little more surprisingly, Julia actually comes with another variation of the same function while using the Val singleton type, as shown in the following screenshot:

There is literally no difference between 10 and 11 elements. In fact, even with 100 elements, the performance is quite reasonable (17 nanoseconds) compared to the non-Val version (820 nanoseconds). Let's take a look at how it is implemented. The following has been taken from the Julia source code:

# Using singleton type dynamic dispatch
# inferrable ntuple (enough for bootstrapping)
ntuple(f, ::Val{0}) = ()
ntuple(f, ::Val{1}) = (@_inline_meta; (f(1),))
ntuple(f, ::Val{2}) = (@_inline_meta; (f(1), f(2)))
ntuple(f, ::Val{3}) = (@_inline_meta; (f(1), f(2), f(3)))

@inline function ntuple(f::F, ::Val{N}) where {F,N}
N::Int
(N >= 0) || throw(ArgumentError(string("tuple length should be ≥ 0, got ", N)))
if @generated
quote
@nexprs $N i -> t_i = f(i)
@ncall $N tuple t
end
else
Tuple(f(i) for i = 1:N)
end
end

From the preceding code, we can see that there are a few functions being defined for tuples that have fewer than four elements. After that, the function uses a meta-programming technique to generate code on the fly. In this case, it uses a special construct that allows the compiler to choose between code generation and its generic implementation, which is represented in if-blocks and else-blocks in the code. Looking at how the @generated@nexprs, and @ncalls macros work is out of the scope of this section, but you are encouraged to find out more from the Julia reference manual. 

According to our preceding performance test, calling ntuple with Val(100) was quite fast, so it appears that the compiler has chosen the code generation path.

To summarize, we have learned how to use parametric types to create new singletons and create functions that are dispatched by these singleton types. We can apply this pattern whenever we need to handle such conditional branches.

Next, we will learn how to develop automated testing code effectively using stubs and mocks.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.10.137