The flyweight pattern

The flyweight pattern is used to handle a large number of fine-grained objects efficiently by sharing memory among identical or similar objects.

A good example of this involves handling strings. In the field of data science, we frequently need to read and analyze a large amount of data that is represented in a tabular format. In many cases, certain columns may contain a large number of strings that are just repeated values. For example, a population survey might have a column stating gender, and so it will contain either Male or Female.

Unlike some other programming languages, Julia does not intern strings. This means that 10 copies of the word Male are stored separately, occupying 10 times the memory used by a single Male string. We can see this effect easily from the REPL by measuring the size of an array of such strings.
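The REPL session itself is not reproduced in this text. A minimal sketch of such a measurement, assuming Base.summarysize is used to report the size and with s as an illustrative variable name, might look like this. Exact byte counts depend on the Julia version and platform, so none are shown.

```julia
# Build an array holding 100,000 entries of the string "Male".
s = ["Male" for _ in 1:100_000]

# Base.summarysize reports the number of bytes reachable from the array.
# Each element costs at least one pointer-sized reference.
println(Base.summarysize(s))
```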

So, storing 100,000 copies of the Male string occupies roughly 800 KB of memory, or about 8 bytes per record. That is quite a waste. A common way to solve this problem is to maintain a pooled array. Rather than storing 100,000 strings, we can encode the data and store 100,000 bytes instead, so that 0x01 corresponds to Male and 0x00 corresponds to Female. Using a UInt8 array in this way reduces the memory footprint eightfold.
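The encoding step can be sketched as below. The sample data and the names genders and codes are assumptions made for illustration, and Base.summarysize is assumed as the measuring tool.

```julia
# Hypothetical sample data: 100,000 gender strings.
genders = [isodd(i) ? "Male" : "Female" for i in 1:100_000]

# Encode each string as one byte: 0x01 for Male, 0x00 for Female.
codes = UInt8[g == "Male" ? 0x01 : 0x00 for g in genders]

println(Base.summarysize(genders))  # array of string references
println(Base.summarysize(codes))    # one byte per record plus the array header
```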

You may wonder why the reported size is 40 bytes larger than the 100,000 bytes of data. Those 40 bytes are used by the array container. Now, given that the gender column is binary in this case, we can squeeze it further by storing bits instead of bytes.
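A minimal sketch of the bit-level encoding follows. The variable names and sample data are illustrative assumptions; the sketch relies on the fact that a broadcast comparison in Julia returns a BitVector, the one-dimensional BitArray.

```julia
# Hypothetical byte-encoded column: 0x01 for Male, 0x00 for Female.
codes = UInt8[isodd(i) ? 0x01 : 0x00 for i in 1:100_000]

# A broadcast comparison produces a BitVector, which stores one bit per value.
bits = codes .== 0x01

println(Base.summarysize(codes))
println(Base.summarysize(bits))
```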

Again, we reduce the memory usage approximately eightfold (by going from 1 byte to 1 bit) by using BitArray to store the gender values. This is an aggressive optimization of memory usage. But we still need to store the Male and Female strings somewhere, right? This is an easy task because they can be tracked in any data structure, such as a dictionary that maps each encoded value to its string.
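One possible lookup table is sketched here. Using Bool keys is an assumption made to match the bit encoding; the names lookup, bits, and decoded are illustrative.

```julia
# Lookup table from the stored bit value to the shared string.
lookup = Dict(true => "Male", false => "Female")

# Hypothetical encoded column of five records.
bits = BitVector([true, false, false, true, true])

# Decode on demand; every record reuses one of the two strings.
decoded = [lookup[b] for b in bits]
println(decoded)
```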

To summarize, we are now capable of storing 100,000 gender values in 12,568 + 370 = 12,938 bytes of memory. Compared to the naive approach of storing the strings directly, which took roughly 800 KB, we have saved more than 98% of memory consumption! How did we achieve such a huge saving? Because all records share the same two strings. The only data that we have to maintain is a compact array of codes that refer to those strings.
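The pieces can be combined into a single container, sketched below under stated assumptions. GenderColumn is a hypothetical type, not part of any library; it stores one bit per record plus the two shared strings, and it treats any value other than "Male" as Female.

```julia
# Hypothetical flyweight container: shared strings plus one bit per record.
struct GenderColumn <: AbstractVector{String}
    bits::BitVector              # true selects the first pooled string
    pool::Tuple{String,String}   # (string for true, string for false)
end

# Minimal AbstractVector interface: size and indexed access.
Base.size(c::GenderColumn) = size(c.bits)
Base.getindex(c::GenderColumn, i::Int) = c.bits[i] ? c.pool[1] : c.pool[2]

# Encode a plain string vector into the compact form.
# Assumption: every value that is not "Male" is stored as "Female".
function GenderColumn(strings::AbstractVector{<:AbstractString})
    GenderColumn(strings .== "Male", ("Male", "Female"))
end

col = GenderColumn(["Male", "Female", "Female"])
println(col[2])
println(collect(col))
```

Because the type implements size and getindex, it behaves like a read-only vector of strings while every element is served from the same two shared values.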

So, that is the concept of the flyweight pattern. The same trick is used over and over again in many places. For example, the CSV.jl package has used a package called CategoricalArrays (and, in later versions, PooledArrays), which provides essentially the same kind of memory optimization.

Next, we will go over the last few traditional patterns—bridge, decorator, and facade.
