Sliding processing

As already seen in Chapter 3, CLR Internals, the CLR has some limitations in memory management.

Working with engineering data means having to deal with huge datasets of more than a million records.

Although we can load a simple integer array with millions of items into memory, the same thing becomes impossible when the item count grows much larger, or when the data type is heavier than a simple integer value.

.NET has a complete enumerator-based execution model that can help us handle a billion items without ever having to keep all of them in memory at once. Here is an example of sliding processing:

static void Main(string[] args)
{
    //dataset
    //nothing executes yet: the dataset is enumerated lazily, on demand
    var enumerableDataset = RetrieveHugeDataset();

    //start using the enumerable
    //this will actually start executing code within RetrieveHugeDataset method
    foreach (var item in enumerableDataset)
        if (item % 12 == 0)
            Console.WriteLine("-> {0}", item);

    //parallel processing is also available
    enumerableDataset
        .AsParallel()
        .Where(x => x % 12 == 0)
        .ForAll(item => Console.WriteLine("-> {0}", item));

    Console.ReadLine();
}

static readonly Random random = new Random();

//return an enumerable cursor to read data in a sliding way
static IEnumerable<int> RetrieveHugeDataset()
{
    //easy implementation for testing purpose
    for (int i = 0; i < 10000; i++)
    {
        //emulate some resource usage
        Thread.Sleep(random.Next(50, 200));

        //signal an item available to the enumerator
        yield return random.Next(10, 100000);
    }
}

Although this is a simple example, the ability to process a huge dataset with data parallelism, without storing the whole dataset in memory, is often mandatory when dealing with specialized data processing such as the output of CNC systems or audio ADCs. With a high-frequency sampler, it is easy to accumulate more than a billion items. Because holding such a huge dataset in memory may easily cause an OutOfMemoryException, sliding processing is often the only design that avoids memory issues altogether, while still allowing parallel processing.
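As a sketch of this scenario (the sample source, count, and threshold here are invented for illustration, not taken from a real sampler), the following program streams ten million simulated samples through a parallel query; because the source is an iterator, only a handful of samples exist in memory at any moment:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SamplerDemo
{
    //hypothetical sample source: yields one value at a time, so memory
    //usage stays constant regardless of how many samples flow through
    public static IEnumerable<double> ReadSamples(int count)
    {
        var random = new Random(42);
        for (int i = 0; i < count; i++)
            yield return random.NextDouble() * 10.0 - 5.0;
    }

    static void Main()
    {
        //ten million doubles would need ~80 MB as an array,
        //but streaming them requires almost no memory at all
        long outOfRange = ReadSamples(10_000_000)
            .AsParallel()
            .Count(sample => Math.Abs(sample) > 4.9);

        Console.WriteLine("Out-of-range samples: {0}", outOfRange);
    }
}
```

The same pattern scales to a billion items: only the aggregate result grows in memory, never the dataset itself.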

Keep in mind that LINQ queries against in-memory objects use exactly the same implementation as the preceding code. Most LINQ methods, such as projection (Select) and filtering (Where), internally execute in a sliding, deferred way. By chaining a LINQ expression onto another enumerator, such as our RetrieveHugeDataset method, we enter a completely different style of programming in which data flows between enumerator steps without ever being stored in a fixed-length in-memory container.
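A minimal sketch of this deferred behavior (the counter and infinite source are contrived for the demonstration) shows that each LINQ operator pulls items one at a time from the previous step, and that a terminal Take stops the whole pipeline, even over a source that could never be buffered:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    //counts how many items the source has actually yielded
    public static int ItemsProduced;

    //an infinite source: buffering it would be impossible
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; ; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    static void Main()
    {
        //each operator slides one item at a time through the pipeline;
        //Take(3) halts everything after the third match
        var firstThreeDoubled = Numbers()
            .Where(n => n % 12 == 0)
            .Select(n => n * 2)
            .Take(3)
            .ToList();

        Console.WriteLine(string.Join(", ", firstThreeDoubled)); //0, 24, 48
        Console.WriteLine("Items produced: {0}", ItemsProduced); //25
    }
}
```

Only 25 source items (0 through 24) are ever produced: just enough to find the first three multiples of 12.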

A canonical example of such sliding processing uses a stream-based class, such as FileStream or NetworkStream, as its source or target (or both). The possible combinations of such sliding processor classes are endless and extremely powerful.
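As a sketch of the stream-based case (the file name and its contents are placeholders for this example), an iterator can wrap a FileStream so that the reader advances through the file only as the enumerator is consumed; a multi-gigabyte file never has to fit in memory:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class StreamDemo
{
    //read one integer per line in a sliding way: the StreamReader pulls
    //from the FileStream only as fast as the enumerator is consumed
    public static IEnumerable<int> ReadValues(string path)
    {
        using (var reader = new StreamReader(
            new FileStream(path, FileMode.Open, FileAccess.Read)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return int.Parse(line);
        }
    }

    static void Main()
    {
        //create a small placeholder file for the demonstration
        File.WriteAllLines("values.txt",
            Enumerable.Range(1, 1000).Select(n => n.ToString()));

        int divisibleBy12 = ReadValues("values.txt").Count(n => n % 12 == 0);
        Console.WriteLine("Divisible by 12: {0}", divisibleBy12); //83
    }
}
```

The `using` block inside the iterator guarantees the stream is closed when enumeration completes or is abandoned early.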
