Evaluating the performance of data types

Years ago, when CPUs were very slow, choosing the type of a variable was an important decision. With modern managed programming languages, most primitive data types perform roughly the same, but in some cases specific differences in behavior still exist regarding data type performance, in terms of throughput and resource usage.

The following is a sample application that checks data type speed by randomly generating ten million primitive values and then summing them. Obviously, such sample code cannot give any kind of absolute speed rating. It is a simple demonstration application that gives an idea of how different data types behave performance-wise:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

//a random value generator
var r = new Random();

//repeat the test 10 times to obtain an averaged value
var stopwatchResults = new List<double>();

//a stopwatch for precision profiling
var w = new Stopwatch();

w.Start();
//change type here for testing another data type
var values = Enumerable.Range(0, 10000000).Select(i => (float)(r.NextDouble() * 10))
    .ToArray();

w.Stop();
Console.WriteLine("Value array generated in {0:N0}ms", w.ElapsedMilliseconds);

for (int j = 0; j < 10; j++)
{
    w.Reset();
    w.Start();

    //change type here for testing another data type
    float result = 0;

    //sum all values
    foreach (var f in values)
        result += f;

    w.Stop();
    Console.WriteLine("Result generated in {0:N0}ms", w.ElapsedMilliseconds);
    stopwatchResults.Add(w.ElapsedMilliseconds);
}

Console.WriteLine("
-> Result generated in {0:N0}ms avg", stopwatchResults.Average());
Console.ReadLine();

When executed, this application gives an average execution time. Here are some results per data type, obtained on my laptop, which has a quad-core Intel i7-4910MQ running at 3.9 GHz in turbo mode.

Note

Please note that the following values are only useful in relation to each other and they are not valid speed benchmark results.

Type      Average result
Int32     37 ms
Int16     37 ms
Int64     37 ms
Double    37 ms
Single    37 ms
Decimal   328 ms

As expected, all data types with a footprint of 16, 32, or 64 bits executed in the same time on my 64-bit CPU within the CLR execution environment, but the Decimal data type, which is actually a 128-bit floating-point number, executed almost 10 times slower than all the other data types.

Although the increased precision of a decimal is mandatory in most financial or mathematical computations, this demonstrates how using a lower-precision data type can really improve the throughput and latency of our applications.

The Decimal data type drastically reduces the rounding errors that often happen when using the standard 64-bit or 32-bit (double or float) floating-point data types, because most of the decimal memory footprint is used to increase precision instead of extending the minimum/maximum numeric range. Visit the following link for more details: https://msdn.microsoft.com/en-us/library/system.decimal.aspx.
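As a quick illustration of this behavior, the following minimal sketch (not part of the original test) compares a well-known rounding case in double against the same computation in decimal:

using System;

//binary floating point cannot represent 0.1 or 0.2 exactly,
//so the sum accumulates a small rounding error
double doubleSum = 0.1 + 0.2;
Console.WriteLine(doubleSum == 0.3);            //False
Console.WriteLine(doubleSum.ToString("G17"));   //0.30000000000000004

//decimal stores the value in base 10, so the same sum is exact
decimal decimalSum = 0.1m + 0.2m;
Console.WriteLine(decimalSum == 0.3m);          //True
Console.WriteLine(decimalSum);                  //0.3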

BigInteger

A special case is when dealing with arbitrary-sized data types, such as the BigInteger type of the System.Numerics namespace (add a reference to the System.Numerics assembly in order to use the related namespace). This structure can handle virtually any signed integer value. The size of the structure in memory grows together with the magnitude of the internal value, bringing ever-increasing resource usage and worsening throughput times.

Usually, performance is not impacted by the numeric values of the two (or more) variables involved in a mathematical computation. This means that computing 10*10 or 10*500 costs the same in terms of CPU usage. When dealing with arbitrary-sized variables, this assertion becomes false, because the internal size of the data being computed increases, which brings higher CPU usage as the values grow.
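To make this effect visible in isolation, the following is a minimal sketch that times BigInteger multiplications with operands of growing size; it is only an assumption of how such a test could look (the digit counts and the iteration count are arbitrary), not the exact code used to produce the table that follows:

using System;
using System.Diagnostics;
using System.Numerics;

//multiplication cost grows with the internal size of the operands
foreach (var digits in new[] { 10, 100, 1000, 10000 })
{
    //a BigInteger with roughly the given number of decimal digits
    var value = BigInteger.Pow(10, digits);

    var w = Stopwatch.StartNew();

    BigInteger checksum = 0;
    for (int i = 0; i < 1000; i++)
        checksum += value * value;

    w.Stop();
    Console.WriteLine("{0,6} digits: {1:N0}ms", digits, w.ElapsedMilliseconds);
}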

Let us see how BigInteger multiplication speed changes as the multiplier grows:

Multiplier   Average result   Difference
10           332 ms
100          348 ms           + 5%
1000         441 ms           + 26%
10000        640 ms           + 45%
100000       658 ms           + 3%

Things change a lot here. The BigInteger numeric structure performs in a similar way to the Decimal type when the number is actually a small value. Unlike the other data types, however, it performs progressively worse as the value increases, although only by a small amount each time, because of the intrinsic implementation of the type, which internally stores an arbitrary number of small integer values that compose the full value. A BigInteger has no limitation on its numeric value range.

Compared to the Decimal type, which offers very high precision within a good numeric value range, the BigInteger type offers no fractional precision (it is an integer value) but has no value range limitation. This difference simply means that we should only use the BigInteger type when we definitely need to store and compute huge numeric values, possibly with no known upper/lower numeric range boundaries.
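The range difference is easy to verify; the following short sketch (an illustration only, not from the original test) shows that a BigInteger can grow far beyond the Decimal range:

using System;
using System.Numerics;

//the Decimal range is bounded...
Console.WriteLine(decimal.MaxValue);   //79228162514264337593543950335

//...while a BigInteger accepts far larger values, at the cost of a growing memory footprint
var big = (BigInteger)decimal.MaxValue * 1000000;
Console.WriteLine(big);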

This behavior should discourage us from using such a data type, except when it is absolutely necessary. Consider also that an arbitrary-sized numeric structure is hard to persist in any relational database without heavy customization or serialization features, with the high cost of a lot of data extraction whenever we need to read or write such a value.

Half-precision data type

Although not available by default within the CLR, this old data type becomes available when dealing with unmanaged code, often referred to as unsafe code.

A .NET implementation is available at the following link: http://csharp-half.sourceforge.net/.

This implementation, as expected, uses native unsafe code from C++. To enable unsafe coding within C#, we need to select the Allow unsafe code flag within the Build pane of the project property page.

Here is an example that compares 64-bit, 32-bit, and 16-bit floating-point precision:

var doublePrecision = double.Parse("10.987654321");
Console.WriteLine("{0}", doublePrecision);

var singlePrecision = float.Parse("10.987654321");
Console.WriteLine("{0}", singlePrecision);

var halfPrecision = Half.Parse("10.987654321");
Console.WriteLine("{0}", halfPrecision);

//result:

10.987654321
10.98765
10.98438

When you lose precision by using a smaller data type, you reduce the number of significant digits of the contained value. As seen, the 64-bit example (the first row) clearly shows more digits than the other two types.

Another great disadvantage of using low-precision data types is that, because of the reduced precision, a specific value may simply not exist in the type; in this case, the nearest available value is used in place of the original one. This issue also exists in 32-bit and 64-bit floating points, but when working with a very small 16-bit type, such as the half-precision data type, it becomes far more evident.
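As a small illustration with the built-in 32-bit type, the following sketch reveals the nearest value that is actually stored when the literal 0.1 is assigned to a float:

using System;

//0.1 has no exact binary representation, so the float stores the nearest available value;
//casting to double and printing with full precision reveals it
float f = 0.1f;
Console.WriteLine(((double)f).ToString("G17"));   //0.10000000149011612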

Note

All floating-point values are made of a significand and a scaling exponent. This means that the more significant digits we use above the decimal separator, the fewer remain available below the decimal separator, and the opposite is also true.
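As an example of this trade-off with the 32-bit float, once all the available significant digits are spent on the integer part, even a step of one unit can no longer be represented:

using System;

//16777216 (2^24) already uses every bit of a float's significand,
//so adding 1 produces a value that cannot be represented and is rounded back
float big = 16777216f;
Console.WriteLine(big + 1f == big);   //True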

Regarding performance, possibly because of this specific implementation, the half-precision type actually performs poorly, with high computation times that become a visible issue on big datasets.

Using a 16-bit floating-point data type is therefore discouraged, because it is unable to give any performance improvement; it should be used only if we definitely need such a 16-bit data type, for instance because of some legacy-application integration.
