Years ago, when CPUs were very slow, choosing the right type for a variable was an important decision. With modern managed programming languages, even the most primitive data types perform almost identically, but in some cases specific differences in data-type performance still exist, in terms of throughput and resource usage.
The following sample application checks data-type speed by randomly generating and then summing ten million primitive values. Obviously, such sample code cannot give any kind of absolute speed rating. It is a simple demonstration application, useful for giving an idea of how different data types behave:
//a random value generator
var r = new Random();
//repeat the test 10 times to obtain an averaged value
var stopwatchResults = new List<double>();
//a stopwatch for precision profiling
var w = new Stopwatch();
w.Start();
//change type here for testing another data type
var values = Enumerable.Range(0, 10000000)
    .Select(i => (float)(r.NextDouble() * 10))
    .ToArray();
w.Stop();
Console.WriteLine("Value array generated in {0:N0}ms", w.ElapsedMilliseconds);
for (int j = 0; j < 10; j++)
{
    w.Reset();
    w.Start();
    //change type here for testing another data type
    float result = 0;
    //sum all values
    foreach (var f in values)
        result += f;
    w.Stop();
    Console.WriteLine("Result generated in {0:N0}ms", w.ElapsedMilliseconds);
    stopwatchResults.Add(w.ElapsedMilliseconds);
}
Console.WriteLine(" -> Result generated in {0:N0}ms avg", stopwatchResults.Average());
Console.ReadLine();
When executed, this application prints an average execution time. Here are some per-data-type results from my laptop, which has a quad-core Intel i7-4910MQ running at 3.9 GHz in turbo mode.
| Type | Average result |
|---|---|
| Int32 | 37 ms |
| Int16 | 37 ms |
| Int64 | 37 ms |
| Double | 37 ms |
| Single | 37 ms |
| Decimal | 328 ms |
As expected, all data types with a footprint of 32 or 64 bits executed at the same speed on my 64-bit CPU within the CLR execution environment, but the Decimal data type, which is actually a 128-bit floating-point number, executed almost 10 times slower than all the other data types.
Although in most financial or mathematical computations the increased precision of a decimal is mandatory, this demonstrates how using a lower-precision data type can really improve the throughput and latency of our applications.
The Decimal data type is able to drastically reduce the rounding errors that often happen when using the standard 64-bit or 32-bit (double or float) floating-point data types, because most of the decimal memory footprint is used to increase precision instead of extending the minimum/maximum numeric range. Visit the following link for more details: https://msdn.microsoft.com/en-us/library/system.decimal.aspx.
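As a quick illustration of this precision difference (a minimal sketch, not part of the original sample), repeatedly adding 0.1 accumulates a rounding error with double but not with decimal, because 0.1 has no exact binary representation:

```csharp
using System;

double doubleSum = 0;
decimal decimalSum = 0;
for (int i = 0; i < 10; i++)
{
    doubleSum += 0.1;   //0.1 has no exact binary representation
    decimalSum += 0.1m; //0.1m is stored exactly in base 10
}
Console.WriteLine(doubleSum == 1.0);   //False: binary rounding error accumulated
Console.WriteLine(decimalSum == 1.0m); //True: no error accumulated
```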
A special case is dealing with arbitrary-sized data types, such as the BigInteger of the System.Numerics namespace (add a reference to the System.Numerics assembly in order to use the related namespace). This structure can handle virtually any signed integer value. The size of the structure in memory grows together with the magnitude of the internal value, bringing ever-increasing resource usage and worsening throughput times.
Usually, performance is not affected by the numeric values of the two (or more) variables involved in a mathematical computation. This means that computing 10*10 or 10*500 costs the same in terms of CPU usage. When dealing with arbitrary-sized types, this assertion becomes false, because the internal size of the data being computed increases, bringing higher CPU usage as the value grows.
Let us see how BigInteger multiplication speed changes as the multiplier grows:
| Multiplier | Average result | Difference |
|---|---|---|
| 10 | 332 ms | |
| 100 | 348 ms | +5% |
| 1000 | 441 ms | +26% |
| 10000 | 640 ms | +45% |
| 100000 | 658 ms | +3% |
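The exact benchmark code behind this table is not shown in the text; a sketch along the lines of the earlier sample (with a reduced item count to keep the run short, and an assumed layout) could look like this:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Numerics;

//generate one million BigInteger values (reduced count for a short run)
var values = Enumerable.Range(0, 1000000)
    .Select(i => new BigInteger(i))
    .ToArray();
BigInteger multiplier = 1000; //change to 10, 100, 10000, ... to reproduce the table
var w = Stopwatch.StartNew();
BigInteger result = 0;
foreach (var v in values)
    result += v * multiplier; //larger multipliers produce larger intermediate values
w.Stop();
Console.WriteLine("Result computed in {0:N0}ms", w.ElapsedMilliseconds);
```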
Things change a lot here. The BigInteger numeric structure performs in a similar way to the Decimal type when the number is actually a small value. As the value increases, this type always performs worse than the other data types, although only by a small amount at each step, because of the intrinsic implementation of the type, which internally contains an arbitrary number of small integer values that together compose the full value. A BigInteger type has no bound limitations on the numeric value range.
Compared to the Decimal type, which offers very high precision within a good numeric value range, the BigInteger type is an exact integer (no fractional precision) with no value range limitation. This difference simply states that we should only use the BigInteger type when we definitely need to store and compute against huge numeric values, possibly with no known upper/lower range boundaries.
This behavior should discourage users from using such a data type, except when it is absolutely necessary. Consider also that an arbitrary-sized numeric structure is hard to persist in any relational database without heavy customization or serialization features, with the high cost of a lot of data conversion whenever we need to read/write such a value.
Although the half-precision (16-bit) floating-point data type is not available by default within the CLR, it becomes available when dealing with unmanaged code, often referred to as unsafe code. A native implementation for .NET is available at this link: http://csharp-half.sourceforge.net/. This implementation, as expected, uses native unsafe code from C++. To enable unsafe coding within C#, we need to select the Allow unsafe code flag within the Build pane of the project property page.
Here is a 16-bit floating-point precision example using unsafe coding:
var doublePrecision = double.Parse("10.987654321");
Console.WriteLine("{0}", doublePrecision);
var singlePrecision = float.Parse("10.987654321");
Console.WriteLine("{0}", singlePrecision);
var halfPrecision = Half.Parse("10.987654321");
Console.WriteLine("{0}", halfPrecision);
//result:
//10.987654321
//10.98765
//10.98438
When you lose precision by using a smaller data type, you reduce the significant digit count (the number of meaningful digits) of the contained value. As seen, the 64-bit example (the first output line) clearly shows more digits than the other two types.
Another great disadvantage of using low-precision data types is that, because of the reduced precision, a specific value may not be exactly representable. In this case, the nearest representable value is used in place of the original value. This issue also exists with 32-bit and 64-bit floating points, but when working with a very tiny 16-bit value, such as the half-precision data type, it becomes more evident.
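This nearest-value substitution can be seen even with the standard 32-bit float (a minimal sketch; no half-precision type needed): the nearest float to 0.1 differs from the nearest double to 0.1:

```csharp
using System;

float f = 0.1f;  //stores the nearest representable 32-bit value
double d = 0.1;  //stores the nearest representable 64-bit value

//the two approximations of the same decimal value differ
Console.WriteLine((double)f == d);          //False
Console.WriteLine(Math.Abs((double)f - d)); //the gap between the two approximations
```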
Regarding performance, perhaps because of this specific implementation, the half-precision type actually performs poorly, with high computation times that become a visible issue on big datasets.
Using a 16-bit floating-point data type is therefore discouraged, because it is unable to give any performance improvement; use it only if we definitely need such a 16-bit data type, perhaps because of some legacy-application integration.