JIT classes

As of today, Numba doesn't support optimization of generic Python objects. This limitation, however, doesn't have a huge impact on numerical codes as they usually involve arrays and math operations exclusively.

Nevertheless, certain data structures are much more naturally implemented using objects; therefore, Numba provides support for defining classes that can be used and compiled to fast, native code.

Bear in mind that this is one of the newest (almost experimental) features, and it is extremely useful as it allows us to extend Numba to support fast data structures that are not easily implemented with arrays.

As an example, we will show how to implement a simple linked list using JIT classes. A linked list can be implemented by defining a Node class that contains two fields: a value and the next item in the list. As you can see in the following figure, each Node connects to the next and holds a value, and the last node contains a broken link, to which we assign a value of None:

In Python, we can define the Node class as follows:

    class Node:
def __init__(self, value):
self.next = None
self.value = value

We can manage the collection of Node instances by creating another class, called LinkedList. This class will keep track of the head of the list (in the preceding figure, this corresponds to the Node with value 3). To insert an element in the front of the list, we can simply create a new Node and link it to the current head.

In the following code, we develop the initialization function for LinkedList and the LinkedList.push_back method that inserts an element in the front of the list using the strategy outlined earlier:

    class LinkedList:

def __init__(self):
self.head = None

def push_front(self, value):
if self.head == None:
self.head = Node(value)
else:
# We replace the head
new_head = Node(value)
new_head.next = self.head
self.head = new_head

For debugging purposes, we can also implement the LinkedList.show method that traverses and prints each element in the list. The method is shown in the following snippet:

        def show(self):
node = self.head
while node is not None:
print(node.value)
node = node.next

At this point, we can test our LinkedList and see whether it behaves correctly. We can create an empty list, add a few elements, and print its content. Note that since we are pushing elements at the front of the list, the last elements inserted will be the first to be printed:

    lst = LinkedList()
lst.push_front(1)
lst.push_front(2)
lst.push_front(3)
lst.show()
# Output:
# 3
# 2
# 1

Finally, we can implement a function, sum_list, that returns the sum of the elements in the linked list. We will use this method to time differences between the Numba and pure Python version:

    @nb.jit
def sum_list(lst):
result = 0
node = lst.head
while node is not None:
result += node.value
node = node.next
return result

If we measure the execution time of the original sum_list version and the nb.jit version, we see that there is not much difference. The reason is that Numba cannot infer the type of classes:

    lst = LinkedList()
[lst.push_front(i) for i in range(10000)]

%timeit sum_list.py_func(lst)
1000 loops, best of 3: 2.36 ms per loop

%timeit sum_list(lst)
100 loops, best of 3: 1.75 ms per loop

We can improve the performance of sum_list by compiling the Node and LinkedList classes using the nb.jitclass decorator.

The nb.jitclass decorator takes a single argument that contains the attribute types. In the Node class, the attribute types are int64 for value and Node for next. The nb.jitclass decorator will also compile all the methods defined for the class. Before delving into the code, there are two observations that need to be made.

First, the attribute declaration has to be done before the class is defined, but how do we declare a type we haven't defined yet? Numba provides the nb.deferred_type() function, which can be used for this purpose.

Second, the next attribute can be either None or a Node instance. This is what is called an optional type, and Numba provides a utility, called nb.optional, that lets you declare variables that can be (optionally) None.

This Node class is illustrated in the following code sample. As you can see,  node_type is predeclared using nb.deferred_type(). The attributes are declared as a list of pairs containing the attribute name and the type (also note the use of nb.optional). After the class declaration, we are required to declare the deferred type:

    node_type = nb.deferred_type()

node_spec = [
('next', nb.optional(node_type)),
('value', nb.int64)
]

@nb.jitclass(node_spec)
class Node:
# Body of Node is unchanged

node_type.define(Node.class_type.instance_type)

The LinkedList class can be easily compiled, as follows. All that's needed is to define the head attribute and to apply the nb.jitclass decorator:

    ll_spec = [
('head', nb.optional(Node.class_type.instance_type))
]

@nb.jitclass(ll_spec)
class LinkedList:
# Body of LinkedList is unchanged

We can now measure the execution time of the sum_list function when we pass a JIT LinkedList:

    lst = LinkedList()
[lst.push_front(i) for i in range(10000)]

%timeit sum_list(lst)
1000 loops, best of 3: 345 µs per loop

%timeit sum_list.py_func(lst)
100 loops, best of 3: 3.36 ms per loop

Interestingly, when using a JIT class from a compiled function, we obtain a substantial performance improvement against the pure Python version. However, using the JIT class from the original sum_list.py_func actually results in worse performance. Ensure that you use JIT classes only inside compiled functions!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.192.120