Writing a native fuzzy search implementation

This is where things start to get interesting. When working with existing C++ code, you'll probably want to work with multiple instances of the same class, but we can't just instantiate a C++ class in Dart. Instead, we'll crate a wrapper class in Dart that stores the memory address of the C++ object. Let's start by creating a C++ class with a native fuzzy search implementation:

// fuzzy_search.hpp
class FuzzySearch {
public:
    FuzzySearch(Dart_Handle inputItems);
    std::vector<char*> search(const char* term);
    size_t size();
private:
    char** items;
    size_t length
};

This is our header file. Header files in C/C++ are commonly used to define interfaces of libraries. In practice, we can use functions and data structures without worrying about their implementation. Previously, we used dart_api.h, which defines, for example, Dart_Handle or Dart_SetReturnValue():

// fuzzy_search.cpp
#include <vector>
#include "fuzzy_search.hpp"
#include "include/dart_api.h"

FuzzySearch::FuzzySearch(Dart_Handle inputList) {
    intptr_t length_ptr;

    Dart_ListLength(inputList, &length_ptr);
    length = length_ptr;
    // Allocate memory for an array of C strings.
    items = (char**)malloc(length * sizeof(char*));
    
    for (int i = 0; i < length; i++) {
        const char* cname;
        // Convert Dart's String to char*.
        Dart_StringToCString(
            Dart_ListGetAt(inputList, i), &cname);
        // Allocate memory for the string + termination flag.
        items[i] = (char*)malloc(strlen(cname) + 1);
        strcpy(items[i], cname);
    }
}

bool fuzzy_search(const char* item, const char* term) {
    // Same approach as fuzzy search in Chapter 2.
    size_t term_lng = strlen(term);
    size_t item_lng = strlen(item);
    int ti = 0; // term index
    // int si = 0; // key index
    for (int si = 0; si < item_lng; si++) {
        if (term[ti] == item[si] && ++ti == term_lng) {
            return true;
        }
    }
    return false;
}

std::vector<char*> FuzzySearch::search(const char* term) const {
    std::vector<char*> results;
    for (int i = 0; i < size(); i++) {
        if (fuzzy_search(items[i], term)) {
            results.push_back(items[i]);
        }
    }
    return results;
}

size_t FuzzySearch::size() {
    return length;
}

The search algorithm is the same as the one we used in Chapter 2, Practical Dart. The FuzzySearch::search() method returns a std::vector<char*> object, which is a resizable list that we'll later convert to a Dart List object. In the constructor, we allocated an array of pointers to arrays of characters (a 2D array) representing each searchable item.

Then, we create two functions accessible from Dart. The first one will create a new instance of FuzzySearch, and the second one will perform a single search:

// main.cpp
void FuzzySearchCreate(Dart_NativeArguments arguments) {
    Dart_EnterScope();
    Dart_Handle inputList = Dart_GetNativeArgument(arguments, 0);
    
    FuzzySearch *fuzzy = new FuzzySearch(inputList);
    // Return pointer to the object (memory address).
    Dart_Handle result = Dart_NewInteger((int64_t)fuzzy);
    Dart_SetReturnValue(arguments, result);
    Dart_ExitScope();
}

void FuzzySearchSearch(Dart_NativeArguments args) {
    Dart_EnterScope();
    int64_t ptr;
    const char* term;
    Dart_IntegerToInt64(Dart_GetNativeArgument(args, 0), &ptr);
    Dart_StringToCString(Dart_GetNativeArgument(args, 1), &term);

    // Get the instance of FuzzySearch at this memory address.
    FuzzySearch *fuzzy = reinterpret_cast<FuzzySearch*>ptr;
    std::vector<char*> results = fuzzy->search(term);
    
    Dart_Handle result = Dart_NewList(results.size());
    for (int i = 0; i < results.size(); i++) {
        Dart_ListSetAt(result, i, 
            Dart_NewStringFromCString(results.at(i)));
    }
    Dart_SetReturnValue(args, result);
    Dart_ExitScope();
}

Note what we're actually returning from FuzzySearchCreate(). It's not an object but just an integer with a memory address of a FuzzySearch instance in the memory. Then, in FuzzySearchSearch(), we pass this address as an argument and use type casting to get the proper instance of FuzzySearch. Running search() returns a vector of char*, so we need to create an empty list first and fill it with Dart strings. Also, we need to know the exact size of a Dart list before creating it.

The Dart code is just a wrapper class:

// lib/fuzzy_search/fuzzy_search.dart
import 'dart-ext:fuzzy_search';

// All access to native functions is done via NativeFuzzySearch
// class and never directly.
int _create(List terms) native "FuzzySearchCreate";
List<String> _search(int ptr, term) native "FuzzySearchSearch";

class NativeFuzzySearch implements FuzzySearchInterface {
  int _pointer;
  
  NativeFuzzySearch(List terms) {
    _pointer = _create(terms);
  }
  
  List<String> search(String term) => _search(_pointer, term);
}

Note

The int built-in type is a 32-bit or 64-bit integer depending on the Dart VM build. As our native extension has to be built for the same architecture, the pointer always fits into its range.

We can use this wrapper class just like any other class in Dart without even noticing that under the hood, there are some pointers passing here and there:

import 'package:Chapter_09/fuzzy_search/fuzzy_search.dart';

main() {
  List<String> dict = ['item1', 'item2', ...];
  NativeFuzzySearch nativeFuzzy = new NativeFuzzySearch(dict);
  List<String> results = nativeFuzzy.search('test_term'),
  print(results);
}

In order to test and compare the performance of both Dart and C/C++ implementations, we'll use the same dataset from Chapter 2, Practical Dart, but we'll replicate it more than 20 times, run a couple of test searches over it multiple times, and print average processing time for each of them (the complete benchmarking code is among the source code for this chapter):

Testing FuzzySearch (199034 items):
average: 18 ms

Testing NativeFuzzySearch (199034 items):
average: 13 ms

We can see that the native implementation is about 28 percent faster.

Note

We're measuring the processing time by ourselves, which is easier for demonstration purposes. If you're looking for a more standardized environment, take a look at https://github.com/dart-lang/benchmark_harness.

Optimizing our C/C++ code

However, there are some optimization tricks that we can use.

We allocate memory for every single item in the list with malloc(). This is correct, but the bottleneck is that items are shattered all over the memory. Each access to memory retrieves not just the exact amount of bytes that we need but an entire memory word (typically, 4 bytes on 32-bit systems and 8 bytes on 64-bit systems). Loaded memory words are saved into the processor's L1-3 caches.

This implies that when we have, for example, a 9-character long string, it requires 2 memory accesses on a 64-bit system where the second memory access is required to load just one byte. Therefore, when we're iterating the entire array of char*, it's better to allocate one large memory block for all the items. A better version of our constructor will look like:

FuzzySearch::FuzzySearch(Dart_Handle inputList) {
    intptr_t length_ptr;
    // Get Dart's List length.
    Dart_ListLength(inputList, &length_ptr);
    length = length_ptr;

    items = (char**)malloc(length * sizeof(char*));

    // Calculate total size of all items in bytes.
    uint32_t size = 0;
    for (int i = 0; i < length; i++) {
        const char* cname;
        Dart_StringToCString(
            Dart_ListGetAt(inputList, i), &cname);
        size += strlen(cname) + 1;
    }
    // Allocate one large block of memory.
    char* ptr = (char*)malloc(size * sizeof(char));
    
    for (int i = 0; i < length; i++) {
        const char* cname;
        Dart_StringToCString(
            Dart_ListGetAt(inputList, i), &cname);
        items[i] = ptr;
        strcpy(items[i], cname);
        // Increment the pointer.
        ptr += strlen(cname) + 1;
    }
}

With this updated version, the average search time is about 11.5 ms.

Note

In order to keep the examples simple, we're not checking returned values from malloc() or Dart_StringToCString(), which in a real application should be checked for error code.

But we can go even further. We can make use of multiple cores without any manual splitting and merging C++ vectors for each core by ourselves. We'll make use of the OpenMP 4.0 standard that is supported, for example, in GNU GCC 4.9 compiler and modify the search() method with two new statements for the precompiler:

std::vector<char*> FuzzySearch::search(const char* term) {
    #pragma omp declare reduction (merge : std::vector<char*> : 
      omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))
    std::vector<char*> results;
    
    #pragma omp parallel for reduction(merge: results)
    for (int i = 0; i < length; i++) {
        if (fuzzy_search(items[i], term)) {
            results.push_back(items[i]);
        }
    }
    return results;
}

These two #pragma omp statements are ignored by compilers that don't support OpenMP. As vector<T> is not thread-safe, we create a private instance for each thread and then merge all of them into one vector<T> result. OpenMP does all this automatically without modifying our code at all. We could hardcode how many threads will be spawned, but OpenMP can read the OMP_NUM_THREADS environment variable for how many threads to spawn in each parallel block so that we can easily test what's the best number for us.

On Intel Code Duo 2.5 GHz, the best results occur with four threads that give 5.1 ms on average. At the end, our native implementation is 3.5 times faster than the Dart code running in the Dart VM. Note that the standalone Dart VM has much better performance than Dartium.

We could also use compiler options such as strict aliasing to optimize the usage of pointers or try to work with ifs and loops for better branch prediction, but this level of optimization is generally considered for developers with no friends.

Multithreading with Dart Isolates

Although Dart doesn't have C-style threads, we can still make use of multiple CPUs and cores with Dart Isolates using the built-in dart:isolate library that works both in the browser (using Web Workers) and in the standalone Dart VM. Every Dart app has at least one Isolate called root.

Unlike threads, Isolates don't share the same memory space, so they can't modify each other's variables and all communication between them must be done via message passing.

We're not going to go into it here because it wouldn't help us very much. There are no C-style thread locks for Isolates, so synchronizing multiple Isolates would be rather difficult. There's not much documentation available for Isolates right now, so the only way to get to know anything about it is in the Dart API at https://api.dartlang.org/apidocs/channels/stable/dartdoc-viewer/dart:isolate.Isolate.

There's also an experimental Dart runtime in development called Fletch that promises very high concurrency (https://github.com/dart-lang/fletch).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.36.146