Handling mutable data types in the arguments

So far, we did not pay much attention to the arguments or keyword arguments being passed to the function. Care must be taken when any of those arguments are mutable. Why? Because our current implementation uses the arguments as the key of the dictionary cache. If we mutate the key of a dictionary, it could lead to unexpected results.

Suppose that we have a function that takes 2 seconds to run:

# This is a slow implementation
slow_sum_abs = (x::AbstractVector{T} where {T <: Real}) -> begin
sleep(2)
sum(abs(v) for v in x)
end

Knowing that it's quite slow, we happily memoize it as usual:

sum_abs = memoize(slow_sum_abs)

Initially, it seems to work perfectly, as it has always been:

However, we are shocked by the following observation:

Bummer! Rather than returning a value of 21, it returns the previous result as if -6 were not inserted to the array. Out of curiosity, let's push one more value to the array and try again:

It's working again. Why is that happening? To understand that, let's recap how the memoize function was written:

function memoize(f)
memo = Dict()
(args...; kwargs...) -> begin
x = (args, kwargs)
if haskey(memo, x)
return memo[x]
...

As you can see, we are caching the data using the (args, kwargs) tuple as the key of the dictionary object. The problem is that the argument being passed to the memoized sum_abs function is a mutable object. The dictionary object gets confused when the key is mutated. In that case, it may or may not locate the key anymore.

When we added -6 to the array, it found the same object in the dictionary and returned the cached result. When we added 7 to the array, it could not find the object. Hence, the function does not work 100% of the time.

To fix this issue, we need to make sure that the content of the arguments are considered, not just the memory address of the container. A common practice is to apply a hash function to the thing that we wish to use as a key to the dictionary. Here's one implementation:

function hash_all_args(args, kwargs)
h = 0xed98007bd4471dc2
h += hash(args, h)
h += hash(kwargs, h)
return h
end

The initial value of the h variable is randomly selected. On a 64-bit system, we can generate it with a call to rand(UInt64)The hash function is a generic function defined in the Base module. We will keep it simple here for illustration purposes. In reality, a better implementation would support a 32-bit system as well. 

The memoize function can now be rewritten to utilize such a hashing scheme:

We can test it again more extensively. Let's redefine the sum_abs function again using the new memoize function. Then, we run a loop and capture the calculation result and timing.

The result is shown as follows:

Fantastic! It now returns the correct result even though the input data has been mutated.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.170.63