Chapter 6 Using the .NET Framework

The previous chapter discussed general .NET coding techniques and pitfalls, especially those related to language features. In this chapter, we discuss some of the issues you must consider when using the enormous library of code that ships with .NET. I cannot possibly discuss all of the various subsystems and classes that are part of the .NET Framework, but the purpose of this chapter is to give you the tools you need to do your own investigations into performance, and to be aware of common patterns that you may need to avoid.

The .NET Framework was written with an extremely broad audience in mind (all developers everywhere, really), and is meant to be a general-purpose framework, providing stable, correct, robust code that can handle many situations. As such, it does not emphasize raw performance, and you will find many things you will need to work around in the inner loops of your codebase.

To get around weaknesses in the .NET Framework, you may need to use some ingenuity. Some possible approaches are:

  • Use an alternate API with less cost
  • Redesign your application to not call the API as often
  • Re-implement some APIs in a more performant manner
  • Do an interop into a native API to accomplish the same thing (assuming the marshalling cost is less)

Understand Every API You Call

The guiding principle of this chapter is this:

You must understand the code executing behind every API call you make.

To say you have control of your performance is to assert that you know the code that executes in every critical path of your code. You should not have an opaque 3rd-party library at the center of your inner loop—that is ceding control.

You will not always have access to the source of every method you call (though you can always drop down to the assembly level!), but there is usually good documentation for Windows APIs. With .NET, you can use one of the many IL viewing tools available to see what the Framework is doing. (This ease of inspection does not extend to the CLR itself, which is written largely in native code.)

Get used to examining Framework code for anything you are not familiar with. The more that performance is important to you, the more you need to question the implementation of APIs you do not own. Just remember to keep your pickiness proportionate to the need for speed.

What follows in this chapter is a discussion of a few general areas you should be concerned with as well as some specific, common classes every program will use.

Multiple APIs for the Same Thing

You will occasionally run into a situation where you can choose among many APIs for accomplishing the same thing. A good example is XML parsing. There are at least 9 different ways to parse XML in .NET:

  • XmlTextReader
  • XmlValidatingReader
  • XDocument
  • XmlDocument
  • XPathNavigator
  • XPathDocument
  • LINQ-to-XML
  • DataContractSerializer
  • XmlSerializer

Which one you use depends on factors such as ease of use, productivity, suitability for the task, and performance. XmlTextReader is very fast, but it is forward-only and does no validation. XmlDocument is very convenient because it has a full object model loaded, but it is among the slowest options.

It is as true for XML parsing as it is for other API choices: not all options are equal, performance-wise. Some are faster but use more memory. Some use very little memory but may not allow certain operations. You will have to determine which features you need and measure the performance to determine which API provides the right balance of functionality versus performance. Prototype the options and profile them running on sample data.
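
To make the tradeoff concrete, here is a minimal sketch comparing a forward-only pass with XmlReader against loading a full XDocument tree. The XML content, element names, and method names are invented for illustration:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlApiComparison
{
    // Forward-only, low-allocation: streams through the document and never
    // builds an in-memory tree.
    public static int CountElementsWithReader(string xml, string elementName)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }

    // Convenient full object model, but the entire tree is loaded into memory.
    public static int CountElementsWithXDocument(string xml, string elementName)
    {
        return XDocument.Parse(xml).Descendants(elementName).Count();
    }

    public static void Main()
    {
        const string xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        Console.WriteLine(CountElementsWithReader(xml, "order"));    // 2
        Console.WriteLine(CountElementsWithXDocument(xml, "order")); // 2
    }
}
```

Both methods give the same answer; the difference is that the reader version allocates almost nothing per element, while the XDocument version pays to materialize every node.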

Collections

.NET provides over 21 built-in collection types, including concurrent and generic versions of many popular data structures. Most programs will only need to use a combination of these and you should rarely need to create your own.

Some collections still exist in the .NET Framework only for backward compatibility reasons and should never be used by new code. These include:

  • ArrayList
  • Hashtable
  • Queue
  • SortedList
  • Stack
  • ListDictionary
  • HybridDictionary

These should be avoided for two reasons: casting and boxing. These collections store references to Object instances, so you will always need to cast results down to your actual object type.

The boxing problem is even more pernicious. Suppose you want to have an ArrayList of Int32 value types. Each value will be individually boxed and stored on the heap. Instead of iterating through a contiguous array of memory to access each integer, each array reference will require a pointer dereference, heap access (likely hurting locality), and then an unboxing operation to get at the inner value. This is horrible. Use a non-resizable array or one of the generic collection classes instead.
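
A quick sketch of the difference (the helper methods and values are invented for illustration):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Legacy ArrayList: every int is boxed onto the heap on Add, and reading
    // it back requires a cast plus an unboxing operation.
    public static int RoundTripArrayList(int value)
    {
        var list = new ArrayList();
        list.Add(value);     // boxes the int onto the heap
        return (int)list[0]; // unboxes it again
    }

    // Generic List<int>: values are stored inline in a contiguous array,
    // with no boxing and no cast.
    public static int RoundTripGenericList(int value)
    {
        var list = new List<int>();
        list.Add(value);
        return list[0];
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripArrayList(42));   // 42, via box + unbox
        Console.WriteLine(RoundTripGenericList(42)); // 42, no boxing at all
    }
}
```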

In the early versions of .NET there were some string-specific collections that are now obsolete because of the power of generics. Examples include NameValueCollection, OrderedDictionary, StringCollection, and StringDictionary. They do not necessarily have performance problems per se, but there is no need to even consider them unless you are using an existing API that requires them.

The simplest, and likely the most-used, collection is the humble Array. Arrays are ideal because they are compact, using a single contiguous block, which improves processor cache locality when accessing multiple elements. Accessing them is in constant time and copying them is fast. Resizing them, however, will mean allocating a new array and copying the old values into the new object. Many of the more complicated data structures are built on top of arrays.

Choosing which collections to use depends on many factors, including: semantic meaning in the APIs (push/pop, enqueue/dequeue, Add/Remove, etc.), underlying storage mechanism and cache locality, speed of various operations on the collection such as Add and Remove, and whether you need to synchronize access to the collection. All of these factors can greatly influence the performance of your program.

Generic Collections

The generic collection classes are

  • Dictionary<TKey, TValue>
  • HashSet<T>
  • LinkedList<T>
  • List<T>
  • Queue<T>
  • SortedDictionary<TKey, TValue>
  • SortedList<TKey, TValue>
  • SortedSet<T>
  • Stack<T>

These supersede all of the non-generic versions and should always be preferred. They incur no boxing or casting costs and will have better memory locality for the most part (especially for the List-style structures that are implemented using arrays).

Within this set, though, there can be very large performance differences. For example, Dictionary, SortedDictionary, and SortedList all store key-value relationships, but have very different insertion and lookup characteristics.

  • Dictionary is implemented as a hash table and has O(1) insertion and retrieval times. See Appendix B for a discussion of Big O notation if you are not familiar with this.
  • SortedDictionary is implemented as a binary search tree and has O(log n) insertion and retrieval times.
  • SortedList is implemented as a sorted array. It has O(log n) retrieval times, but can have O(n) insertion times in the worst case. If you insert random elements it will need to resize frequently and move the existing elements. It is ideal if you insert all of the elements in order, and then use it for fast lookups.

Of the three, SortedList has the smallest memory requirements because it uses arrays. The other two will have much more random memory access, but can guarantee better insertion times on average. Which one of these you use depends greatly on your application’s requirements.

The difference between HashSet and SortedSet is similar to the difference between Dictionary and SortedDictionary.

  • HashSet uses a hash table and has O(1) insertion and removal operations.
  • SortedSet uses a binary search tree and has O(log n) insertion and removal operations.

List, Stack, and Queue all use arrays internally and thus have good locality of reference for efficient operations on many values. However, when adding a lot of values, they will resize these internal arrays as needed. To avoid wasteful resizing and the CPU and GC overhead it causes, if you know the size beforehand, you should always pre-allocate the needed space by passing a capacity value via the constructor or setting the collection's Capacity property. List has O(1) (amortized) insertion at the end, but O(n) removal and searching. Stack and Queue can only add or remove from one end of the collection, so all of their operations are O(1).
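
A minimal sketch of pre-allocating capacity (the method name and count are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // With the final size known up front, the backing array is allocated once;
    // without the capacity hint, List<T> doubles its internal array repeatedly,
    // copying all existing elements each time it grows.
    public static List<int> BuildPreallocated(int count)
    {
        var list = new List<int>(count); // single allocation
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        return list;
    }

    public static void Main()
    {
        List<int> list = BuildPreallocated(100000);
        Console.WriteLine(list.Capacity >= 100000); // True: no resizes occurred
    }
}
```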

LinkedList has O(1) insertion and removal characteristics, but it should be avoided for primitive types because it will allocate a new LinkedListNode object for every item you add, which can be wasteful overhead.

Concurrent Collections

See Chapter 4 for a discussion of concurrency in general, which must inform your use of the concurrent collection classes.

They are all located in the System.Collections.Concurrent namespace and are all defined for use with generics:

  • ConcurrentBag<T> (A bag is similar to a set, but it allows duplicates)
  • ConcurrentDictionary<TKey, TValue>
  • ConcurrentQueue<T>
  • ConcurrentStack<T>

Most of these are implemented internally using Interlocked or Monitor synchronization primitives. You can and should examine their implementations using an IL reflection tool.

Pay attention to the APIs for insertion and removal of items from these collections. They all have Try methods, which can fail to accomplish the operation if another thread got there first and created a conflict. For example, ConcurrentStack has a TryPop method that returns a Boolean value indicating whether it was able to pop a value. If another thread pops the last value first, the current thread's TryPop will return false.
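
A minimal single-threaded sketch of the TryPop pattern (on one thread, an empty stack stands in for the case where another thread has already popped the last value):

```csharp
using System;
using System.Collections.Concurrent;

public static class TryPopDemo
{
    public static void Main()
    {
        var stack = new ConcurrentStack<int>();
        stack.Push(1);

        int value;
        // Succeeds: the stack holds one item.
        bool first = stack.TryPop(out value);
        // Fails: the stack is now empty, just as it would be if another
        // thread had popped the last value before us.
        bool second = stack.TryPop(out value);

        Console.WriteLine(first);  // True
        Console.WriteLine(second); // False
    }
}
```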

ConcurrentDictionary has a few methods which deserve special attention. You can call TryAdd to add a key and value to the dictionary, or TryUpdate to update an existing value. Often, you will not care whether the key is already in the collection; you just want to add it if new or update it if present. For this, there is the AddOrUpdate method, which does exactly that. Rather than having you provide the new value directly, however, it takes two delegates: one for add and one for update. If the key does not exist, the first delegate will be called with the key and you will need to return a value. If the key does exist, the second delegate is called with the key and existing value and you need to return a new value (which could just be the existing value).

In either case, the AddOrUpdate method will return to you the new value—but it is important to realize that this new value may not be the value from the current thread’s AddOrUpdate call! These methods are thread safe, but not atomic. It is possible another thread calls this method with the same key and the first thread will return the value from the second thread.

There is also an overload of the method that does not have a delegate for the add case (you just pass in a value).

A simple example will be helpful:

var dict = new ConcurrentDictionary<int, string>();

dict.AddOrUpdate(
    // Key I'm trying to add
    0,
    // Delegate to call when adding--return string value based on the key
    key => key.ToString(),
    // Delegate to call when already present--update existing value
    (key, existingValue) => existingValue);

dict.AddOrUpdate(
    // Key I'm trying to add
    0,
    // Value to add if new
    "0",
    // Delegate to call when already present--update existing value
    (key, existingValue) => existingValue);

The reason for having these delegates rather than just passing in the value is that in many cases generating the value for a given key is a very expensive operation and you do not want two threads to do it simultaneously. The delegate gives you a chance to just use the existing value instead of regenerating a new copy. However, note that there is no guarantee that the delegates are called only once. Also, if you need to provide synchronization around the value creation or update, you need to add that synchronization in the delegates themselves—the collection will not do it for you.

Related to AddOrUpdate is the GetOrAdd method which has almost identical behavior.

string val1 = dict.GetOrAdd(
    // The key to retrieve
    0,
    // A delegate to generate the value if not present
    k => k.ToString());

string val2 = dict.GetOrAdd(
    // The key to retrieve
    0,
    // The value to add if not present
    "0");

The lesson here is to be careful when using concurrent collections. They have special requirements and behaviors in order to guarantee safety and efficiency, and you need to understand exactly how they are used in the context of your program to use them correctly and effectively.

Other Collections

There are a handful of other specialized collections that ship with .NET, but most of them are string-specific or store Objects so can safely be ignored. Notable exceptions are BitArray and BitVector32.

BitArray represents an array of bit values. You can set individual bits and perform Boolean logic on the array as a whole. If you need only 32 bits of data, though, use BitVector32, which is faster and has less overhead because it is a struct (it is little more than a wrapper around an Int32).

Creating Your Own Collection Types

I have rarely had the need to create my own collection types from scratch, but the need does occasionally arise. If the built-in types do not have the right semantics for you, then definitely create your own as an appropriate abstraction. When doing so, follow these general guidelines:

  1. Implement the standard collection interfaces wherever they make sense (IEnumerable<T>, ICollection<T>, IList<T>, IDictionary<TKey, TValue>).
  2. Consider how the collection will be used when deciding how to store the data internally.
  3. Pay attention to things like locality-of-reference and favor arrays if sequential access is common.
  4. Do you need to add synchronization into the collection itself? Or perhaps create a concurrent version of the collection?
  5. Understand the run-time complexity of the add, insert, update, find, and remove algorithms. See Appendix B for a discussion of Big O complexity.
  6. Implement APIs that make semantic sense, e.g. Pop for stacks, Dequeue for queues.

Strings

In .NET, strings are immutable. Once created, they exist forever in that state until garbage collected. This means that any modification of a string results in creation of a new string. Fast, efficient programs generally do not modify strings in any way. Think about it: strings represent textual data, which is largely for human consumption. Unless your program is specifically for displaying or processing text, strings should be treated like opaque data blobs as much as possible. If you have the choice, always prefer non-string representations of data.

String Comparisons

As with so many things in performance optimization, the best string comparison is the one that does not happen at all. If you can get away with it, use enums, or some other numeric data for decision-making. If you must use strings, keep them short and use the simplest alphabet possible.

There are many ways to compare strings: by pure byte value, using the current culture, with case insensitivity, etc. You should use the simplest way possible. For example:

String.Compare(a, b, StringComparison.Ordinal);

is faster than

String.Compare(a, b, StringComparison.OrdinalIgnoreCase);

which is faster than

String.Compare(a, b, StringComparison.CurrentCulture);

If you are processing computer-generated strings, such as configuration settings or some other tightly coupled interface, then ordinal comparisons with case sensitivity are all you need.

All string comparisons should use method overloads that include an explicit StringComparison enumeration value. Omitting this should be considered an error.

Finally, String.Equals is a special case of String.Compare and should be used when you do not care about sort order. It is not actually faster in many cases, but it conveys the intent of your code better.
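
A minimal sketch of passing an explicit StringComparison (the strings and helper method are invented for illustration):

```csharp
using System;

public static class ComparisonDemo
{
    // Always pass an explicit StringComparison; the ordinal variants are the
    // cheapest because they compare raw character values with no culture rules.
    public static bool IsSameSetting(string a, string b)
    {
        return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(String.Equals("config", "CONFIG",
            StringComparison.Ordinal));                   // False: case differs
        Console.WriteLine(IsSameSetting("config", "CONFIG")); // True
    }
}
```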

ToLower, ToUpper

Avoid calling methods like ToLower and ToUpper, especially if you are doing this for string comparison purposes. Instead, use one of the IgnoreCase options for the String.Compare method.

There is a bit of a tradeoff, but not much of one. On the one hand, doing a case-sensitive string comparison is faster, but this still does not justify the use of ToUpper or ToLower, which are guaranteed to process every character, where a comparison might not need to. They also create a new string, allocating memory and putting more pressure on the garbage collector.

Just avoid this.

Concatenation

For simple concatenation of a known (at compile time) quantity of strings, just use the ‘+’ operator or the String.Concat method. This is usually more efficient than using a StringBuilder.

string result = a + b + c + d + e + f;

Do not consider StringBuilder until the number of strings is variable and likely larger than a few dozen. The compiler converts simple string concatenation into a call to String.Concat, which lessens the memory overhead.

String Formatting

String.Format is an expensive method. Do not use it unless necessary. Avoid it for simple situations like this:

string message = String.Format("The file {0} was {1} successfully.",
    filename, operation);

Instead, just do some simple concatenation:

string message = "The file " + filename + " was " + operation +
    " successfully.";

Reserve use of String.Format for cases where performance does not matter or the format specification is more complex (like specifying how many decimals to use for a double value).

ToString

Be wary of calling ToString on arbitrary classes. If you are lucky, it will return a string that already exists. Some classes will cache the string once generated. For example, the IPAddress class caches its string, but its string generation process is extremely expensive, involving StringBuilder, formatting, and boxing. Other types may create a new string every time you call it. This can be very wasteful for the CPU and also increase the frequency of garbage collections.

When designing your own classes, consider the scenarios your class’s ToString method will be called in. If it is called often, ensure that you are generating the string as rarely as possible. If it is only a debug helper, then it likely does not matter what it does.

Avoid String Parsing

If you can, reserve string parsing for offline processing or for during startup only. String processing is often CPU-intensive, repetitive, and memory-heavy—all things to avoid.

Avoid APIs that Throw Exceptions under Normal Circumstances

Exceptions are expensive, as you saw in Chapter 5. As such, they should be reserved for truly exceptional circumstances. Unfortunately, there are some common APIs which defy this basic assumption.

Most basic data types have a Parse method, which will throw a FormatException when the input string is in an unrecognized format; examples include Int32.Parse and DateTime.Parse. Unless your program should exit completely when a parsing error occurs, avoid these methods in favor of TryParse, which returns false rather than throwing when parsing fails.
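
A minimal sketch of the TryParse pattern (the helper method and default value are invented for illustration):

```csharp
using System;

public static class ParseDemo
{
    // TryParse reports failure through its return value, so malformed input
    // costs a simple branch instead of a thrown FormatException.
    public static int ParseOrDefault(string input, int defaultValue)
    {
        int value;
        return Int32.TryParse(input, out value) ? value : defaultValue;
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrDefault("123", -1));          // 123
        Console.WriteLine(ParseOrDefault("not a number", -1)); // -1
    }
}
```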

Another example is the System.Net.HttpWebRequest class, which will throw an exception if it receives a non-200 response from a server. This bizarre behavior is thankfully corrected in the System.Net.Http.HttpClient class in .NET 4.5.

Avoid APIs That Allocate From the Large Object Heap

The only way to detect these allocations is to profile heap allocations using PerfView, which will show you the stacks that allocate the memory. Just be aware that some .NET APIs do this. For example, calling the Process.GetProcesses method guarantees an allocation on the large object heap. You can avoid this by caching its results, calling it less frequently, or retrieving the information you need via interop directly against the Win32 API.

Use Lazy Initialization

If your program uses a large or expensive-to-create object that is rarely used, or may not be used at all during a given invocation of the program, you can use the Lazy<T> class to wrap a lazy initializer around it. As soon as the Value property is accessed, the real object will be initialized according to the constructor you used to create the Lazy<T> object.

If your object has a default constructor, you can use the simplest version of Lazy<T>:

var lazyObject = new Lazy<MyExpensiveObject>();
...
if (needRealObject)
{
    MyExpensiveObject realObject = lazyObject.Value;
    ...
}

If construction is more complex, you can pass a Func<T> to the constructor.

var myObject = new Lazy<MyExpensiveObject>(() => Factory.CreateObject("A"));
...
MyExpensiveObject realObject = myObject.Value;

Factory.CreateObject is just a dummy method that produces MyExpensiveObject.

If myObject.Value is accessed from multiple threads, it is very possible that each thread will want to initialize the object. By default, Lazy<T> is completely thread safe and only a single thread will be allowed to execute the creation delegate and set the Value property. You can modify this with a LazyThreadSafetyMode enumeration. This enumeration has three values:

  • None—No thread safety. If thread safety matters, you must ensure that the Lazy<T> object is accessed from only a single thread, or add your own synchronization.
  • ExecutionAndPublication—Only a single thread is allowed to execute the creation delegate and set the Value property.
  • PublicationOnly—Multiple threads can execute the creation delegate, but only a single one will initialize the Value property.

You should use Lazy<T> in place of your own singleton and double-checked locking pattern implementations.
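
A minimal sketch of a Lazy<T>-based singleton; the Settings class and its contents are invented for illustration. The default thread-safety mode (ExecutionAndPublication) replaces hand-written double-checked locking:

```csharp
using System;

public sealed class Settings
{
    // Lazy<T>'s default mode guarantees the factory delegate runs exactly
    // once, even under concurrent access.
    private static readonly Lazy<Settings> instance =
        new Lazy<Settings>(() => new Settings());

    public static Settings Instance
    {
        get { return instance.Value; }
    }

    private Settings()
    {
        // Expensive one-time initialization would go here.
    }
}

public static class Program
{
    public static void Main()
    {
        // Both accesses observe the same object; the constructor ran once.
        Console.WriteLine(Object.ReferenceEquals(Settings.Instance,
            Settings.Instance)); // True
    }
}
```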

If you have a large number of objects and Lazy<T> is too much overhead to use, you can use the static EnsureInitialized method on the LazyInitializer class. This uses Interlocked methods to ensure that the object reference is only assigned to once, but it does not ensure that the creation delegate is called only once. Unlike Lazy<T>, you must call the EnsureInitialized method yourself.

static MyObject[] objects = new MyObject[1024];

static void EnsureInitialized(int index)
{
    LazyInitializer.EnsureInitialized(ref objects[index],
        () => ExpensiveCreationMethod(index));
}

The Surprisingly High Cost of Enum

You probably do not expect methods that operate on Enums, a fundamentally integer type, to be very expensive. Unfortunately, because of the requirements of type safety, simple operations are more expensive than you realize.

Take the Enum.HasFlag method, for example. You likely imagine the implementation to be something like the following:

public static bool HasFlag(Enum value, Enum flag)
{
    return (value & flag) != 0;
}

Unfortunately, what you actually get is something similar to:

// C# code generated by ILSpy
public bool HasFlag(Enum flag)
{
    if (flag == null)
    {
        throw new ArgumentNullException("flag");
    }
    if (!base.GetType().IsEquivalentTo(flag.GetType()))
    {
        throw new ArgumentException(String.Format(
            "Enum types do not match",
            new object[]
            {
                flag.GetType(),
                base.GetType()
            }));
    }
    return this.InternalHasFlag(flag);
}

This is a good example of the side effects of using a general purpose framework. If you control your entire code base, then you can do better, performance-wise. If you find you need to do a HasFlag test a lot, then do the check yourself:

[Flags]
enum Options
{
    Read = 0x01,
    Write = 0x02,
    Delete = 0x04
}

...

private static bool HasFlag(Options option, Options flag)
{
    return (option & flag) != 0;
}

Enum.ToString is also quite expensive for enums that have the [Flags] attribute. One option is to cache all of the ToString results for that enum type in a simple Dictionary. Or you can avoid producing these strings at all: record the actual numeric value and convert it to a string offline, for much better performance.
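
A minimal sketch of the caching approach, using the Options enum from the example above (the helper method is invented for illustration, and this simple version does no locking, so it is not thread safe):

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum Options
{
    Read = 0x01,
    Write = 0x02,
    Delete = 0x04
}

public static class EnumStringCache
{
    // Pays the Enum.ToString cost once per distinct value; later lookups are
    // a cheap dictionary hit instead of the expensive [Flags] formatting.
    private static readonly Dictionary<Options, string> cache =
        new Dictionary<Options, string>();

    public static string ToCachedString(Options value)
    {
        string text;
        if (!cache.TryGetValue(value, out text))
        {
            text = value.ToString(); // expensive for [Flags] enums
            cache[value] = text;
        }
        return text;
    }

    public static void Main()
    {
        Console.WriteLine(ToCachedString(Options.Read | Options.Write)); // Read, Write
        Console.WriteLine(ToCachedString(Options.Read | Options.Write)); // cached hit
    }
}
```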

For a fun exercise, see how much code is invoked when you call Enum.IsDefined. Again, the existing implementation is perfectly fine if raw performance does not matter, but you will be horrified if you find out it is a real bottleneck for you!

Story: I found out about the performance problems of Enum the hard way, after a release. During a routine CPU profile I noticed that a significant portion of CPU, over 3%, was going to just Enum.HasFlag and Enum.ToString. Excising all calls to HasFlag and caching the strings in a Dictionary reduced the overhead to negligible amounts.

Tracking Time

Time means two things:

  • Absolute time of day
  • Time span (how long something took)

For absolute times, .NET supplies the versatile DateTime structure. However, calling DateTime.Now is a fairly expensive operation because it has to consider time zone information. Consider calling DateTime.UtcNow instead, which is more streamlined.

Even calling DateTime.UtcNow might be too expensive for you if you need to track a lot of time stamps. If that is the case, get the time once and then track offsets instead, rebuilding the absolute time offline, using the time span measuring techniques showed next.

To measure time intervals, .NET provides the TimeSpan struct. Subtracting two DateTime values yields a TimeSpan. However, if you need to measure very small time spans with minimal overhead, you must use the system's performance counter, via System.Diagnostics.Stopwatch, instead. It returns a 64-bit number measuring the number of ticks the counter has accumulated. To calculate the real time difference, take two measurements, subtract them, and divide by the counter's tick frequency. Note that this frequency is not necessarily related to the CPU's frequency: most modern processors change their clock speed often, but the tick frequency is not affected.

You can use the Stopwatch class like this:

var stopwatch = Stopwatch.StartNew();
...do work...
stopwatch.Stop();
TimeSpan elapsed = stopwatch.Elapsed;
long elapsedTicks = stopwatch.ElapsedTicks;

There are also static methods to get a time stamp and the clock frequency, which may be more convenient if you are tracking a lot of time stamps and want to avoid the overhead of creating a new Stopwatch object for every interval.

long receiveTime = Stopwatch.GetTimestamp();
long parseTime = Stopwatch.GetTimestamp();
long startTime = Stopwatch.GetTimestamp();
long endTime = Stopwatch.GetTimestamp();

double totalTimeSeconds = (endTime - receiveTime) /
    (double)Stopwatch.Frequency;

Finally, please remember that values received from the Stopwatch.GetTimestamp method are only valid in the current executing session and only for calculating relative time differences.

Combining the two types of time, you can see how to calculate offsets from a base DateTime object to get new absolute times:

DateTime start = DateTime.Now;
long startTime = Stopwatch.GetTimestamp();
long endTime = Stopwatch.GetTimestamp();

double diffSeconds = (endTime - startTime) /
    (double)Stopwatch.Frequency;
DateTime end = start.AddSeconds(diffSeconds);

Regular Expressions

Regular expressions are not fast. The costs include:

  • Assembly generation—With some options, an in-memory assembly is generated on the fly when you create a Regex object. This helps with the runtime performance, but is expensive to create the first time.
  • JIT costs can be high—The code generated from a regular expression can be very long and have patterns that give the jitter fits. Thankfully, the most recent changes to the CLR have gone a long way to fix this, especially for 64-bit processes. See http://www.writinghighperf.net/go/27 for more information.
  • Evaluation time can be long—This depends on the input text and the pattern to match. It is quite easy to write regular expressions that perform poorly and optimizing them in and of themselves is a whole topic unto itself.

There are a few things you can do to improve Regex performance:

  • Ensure you are up-to-date with .NET and patches.
  • Create a Regex instance variable rather than using the static methods.
  • Create the Regex object with the RegexOptions.Compiled flag.
  • Do not recreate Regex objects over and over again. Create one, save it, and reuse it to match on new input strings.
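
The last three points can be sketched together in a few lines (the pattern and method name are invented for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReuse
{
    // Created once with RegexOptions.Compiled, then reused for every input
    // string; the one-time assembly generation cost is paid only here.
    private static readonly Regex Digits =
        new Regex(@"^\d+$", RegexOptions.Compiled);

    public static bool IsAllDigits(string input)
    {
        return Digits.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllDigits("12345")); // True
        Console.WriteLine(IsAllDigits("12a45")); // False
    }
}
```

Storing the instance in a static readonly field, rather than calling the static Regex.IsMatch methods, avoids recreating and recompiling the pattern on every call.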

LINQ

The biggest danger with Language Integrated Query (LINQ) is that it has the potential to hide code from you—code for which you cannot be accountable because it is not present in your source file!

LINQ is phenomenally convenient at times, and many LINQ queries are perfectly performant, but it can make heavy use of delegates, interfaces, and temporary object allocation if you go crazy with temporary dynamic objects, joins, or complex where clauses.

You can often achieve some significant speedup in time by using Parallel LINQ, but keep in mind this is not actually reducing the amount of work to be done; it is just spreading it across multiple processors. For a mostly single-threaded application that just wants to reduce the time it takes to execute a LINQ query, this may be perfectly acceptable. On the other hand, if you are writing a server that is already using all of the cores to perform processing, then spreading LINQ across those same processors will not help the big picture and may even hurt it. In this case, it may be better to do without LINQ at all and find something more efficient.

If you suspect that you have some unaccounted-for complexity, run PerfView and look at the JITStats view to see the IL sizes and JIT times for methods that involve LINQ. Also look at the CPU usage of those methods once JITted.

Reading Files

There are a number of convenience methods on the File class such as Open, OpenRead, OpenText, and OpenWrite. These are fine if performance is not critical.

If you are doing a lot of disk I/O, then you need to pay attention to the type of disk access you are doing, whether it is random, sequential, or if you need to ensure that the write has been physically written to the platter before notifying the application of I/O completion. For this level of detail, you will need to use the FileStream class and a constructor overload that accepts the FileOptions enumeration. You can logically OR multiple flags together, but not all combinations are valid. None of these options are required, but they can provide hints to the operating system or file system on how to optimize file access.

using (var stream = new FileStream(
    @"C:\foo.txt",
    FileMode.Open,
    FileAccess.Read,
    FileShare.Read,
    16384 /* Buffer size */,
    FileOptions.SequentialScan | FileOptions.Encrypted))
{
    ...
}

The options available to you are:

  • Asynchronous—Indicates that you will be doing asynchronous reading or writing to the file. This flag is not required to actually perform asynchronous reads and writes, but if you do not specify it then, while your threads will not be blocked, the underlying I/O is performed synchronously without I/O completion ports. There are also overloads of the FileStream constructor that take a Boolean parameter to specify asynchronous access.
  • DeleteOnClose—Causes the OS to delete the file when the last handle to the file is closed. Use this for temporary files.
  • Encrypted—Causes the file to be encrypted using the current account’s credentials.
  • RandomAccess—Gives a hint to the file system to optimize caching for random access.
  • SequentialScan—Gives a hint to the file system that the file is going to be read sequentially from beginning to end.
  • WriteThrough—Ignore caching and go directly to the disk. This generally makes I/O slower. The flag will be obeyed by the file system’s cache, but many storage devices also have onboard caches, and they are free to ignore this flag and report a successful completion before it is written to permanent storage.

Random access is bad for any device, such as a hard disk or tape, that needs to seek to the required position. Sequential access should be preferred for performance reasons.

  1. Optimize HTTP Settings and Network Communication

If your application makes outbound HTTP calls, there are a number of settings you can change to optimize network transmission. You should exercise caution in changing these, however, as their effectiveness greatly depends on your network topology and the servers on the other end of the connection. You also need to take into account whether the target endpoints are in a data center you control, or are somewhere on the Internet. You will need to measure carefully to see if these settings benefit you or not.

To change the defaults for all endpoints, modify these static properties on the ServicePointManager class:

  • DefaultConnectionLimit—The number of connections per endpoint. Setting this higher may increase overall throughput if the network links and both endpoints can handle it.
  • Expect100Continue—When a client initiates a POST or PUT command, it normally waits for a 100-Continue signal from the server before proceeding to send the data. This allows the server to reject the request before the data is sent, saving bandwidth. If you control both endpoints and this situation does not apply to you, turn this off to improve latency.
  • ReceiveBufferSize—The size of the socket buffer used for receiving data. The default is 8 KB. You can use a larger buffer if you regularly receive large responses.
  • SupportsPipelining—Allows multiple requests to be sent without waiting for a response between each one. However, the responses are sent back in order. See RFC 2616 (the HTTP/1.1 standard) at http://www.writinghighperf.net/go/28 for more information.
  • UseNagleAlgorithm—Nagling, described in RFC 896 at http://www.writinghighperf.net/go/29, is a way to reduce packet overhead on a network by combining many small packets into a single larger packet. This can reduce overall transmission overhead, but it can also delay individual packets. On modern networks, this setting should usually be off; experiment with disabling it and measure whether response times improve.
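As a sketch, setting these process-wide defaults looks like the following. The specific values are illustrative assumptions only and must be validated against your own network topology:

```csharp
using System;
using System.Net;

class HttpDefaults
{
    static void Main()
    {
        // Process-wide defaults applied to all new ServicePoints.
        ServicePointManager.DefaultConnectionLimit = 16;   // illustrative value
        ServicePointManager.Expect100Continue = false;     // skip the 100-Continue handshake
        ServicePointManager.UseNagleAlgorithm = false;     // trade packet overhead for latency

        Console.WriteLine(ServicePointManager.DefaultConnectionLimit); // 16
    }
}
```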

All of these settings can also be applied independently to individual ServicePoint objects, which can be useful if you want to customize settings by endpoint, perhaps to differentiate between local datacenter endpoints and those on the Internet. In addition to the above, the ServicePoint class also lets you control some additional parameters:

  • ConnectionLeaseTimeout—Specifies the maximum time in milliseconds that an active connection will be kept alive. Set this to -1 to keep connections alive forever. This setting is useful for load balancing, where you will want to periodically force connections to close so they connect to other machines. Setting this value to 0 will cause the connection to close after every request. This is not recommended because making a new HTTP connection is fairly expensive.
  • MaxIdleTime—Specifies the maximum time in milliseconds that a connection can remain open but idle. Set this to Timeout.Infinite to keep connections open indefinitely, regardless of whether they are active or not.
  • ConnectionLimit—Specifies the maximum number of connections this endpoint can have.

You can also force an individual HTTP request to close its current connection (after the response has been sent back) by setting the KeepAlive header to false.
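A sketch of per-endpoint tuning, using a hypothetical internal host name: FindServicePoint returns the ServicePoint for a given Uri, and the request-level KeepAlive property closes the connection after a single use.

```csharp
using System;
using System.Net;
using System.Threading;

class PerEndpointTuning
{
    static void Main()
    {
        // Hypothetical internal endpoint; per-ServicePoint settings override
        // the ServicePointManager defaults for this host only.
        var uri = new Uri("http://internal.example.com/");
        ServicePoint sp = ServicePointManager.FindServicePoint(uri);

        sp.ConnectionLeaseTimeout = 60 * 1000;   // recycle connections every minute
        sp.MaxIdleTime = Timeout.Infinite;       // never close idle connections
        sp.ConnectionLimit = 32;                 // illustrative value

        // To close the connection after one request instead, disable
        // keep-alive on that individual request:
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;

        Console.WriteLine(sp.ConnectionLimit); // 32
    }
}
```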

Story: Ensure that what you are transmitting is optimally encoded. While profiling an internal system, we noticed an extremely high memory allocation rate and high CPU usage in a particular component. With some investigation, we realized that it was receiving an HTTP response, transforming the received bytes into a base64-encoded string, decoding that string into a binary blob, and finally deserializing that blob into a strongly typed object. It was wasting bandwidth by needlessly encoding a binary blob as a string, wasting CPU with multiple layers of encoding, and causing more time to be spent in GC because of multiple large object allocations. The lesson is to send only what you need, as compactly as possible. Base64 is rarely, if ever, useful today, especially between internal components. Regardless of whether you are doing file or network I/O, encode the data as efficiently as possible. For example, if you need to read a series of integers, do not waste CPU, memory, disk space, and network bandwidth by wrapping them in XML.

Finally, another word of caution, relating to the principle highlighted at the top of this chapter about the general-purpose nature of much of the .NET Framework. The built-in HTTP client, while generally very good and perfectly acceptable for downloading Internet content, may not be suitable for all applications, particularly if your application is very sensitive to latencies at high percentiles, especially for intra-datacenter requests. If you care about 95th or 99th percentile latencies for HTTP requests, you may have to write your own HTTP client around the underlying WinHTTP APIs to get that last bit of performance. Doing this correctly takes quite a bit of expertise in both HTTP and multithreading in .NET, so you need to be able to justify the effort.

  1. Reflection

Reflection is the process of dynamically loading a .NET assembly at runtime and manually examining, instantiating, or executing the types located therein. This is not a fast process under any circumstances.

To demonstrate how reflection generally works in this scenario, here is some simple code from the ReflectionExe sample project that loads an “extension” assembly dynamically:

var assembly = Assembly.Load(extensionFile);

var types = assembly.GetTypes();
Type extensionType = null;
foreach (var type in types)
{
    var interfaceType = type.GetInterface("IExtension");
    if (interfaceType != null)
    {
        extensionType = type;
        break;
    }
}

object extensionObject = null;
if (extensionType != null)
{
    extensionObject = Activator.CreateInstance(extensionType);
}

At this point, there are two options we can follow to execute the code in our extension. To stay with pure reflection, we can retrieve the MethodInfo object for the method we want to execute and then invoke it:

MethodInfo executeMethod = extensionType.GetMethod("Execute");
executeMethod.Invoke(extensionObject, new object[] { 1, 2 });

This is painfully slow, about 100 times slower than casting the object to an interface and executing it directly:

IExtension extensionViaInterface = extensionObject as IExtension;
extensionViaInterface.Execute(1, 2);

If you can, you always want to execute your code this way rather than relying on the raw MethodInfo.Invoke technique. If a common interface is not possible, then see Chapter 5’s section on generating code to execute dynamically loaded assemblies much faster than reflection.
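A middle ground between raw MethodInfo.Invoke and full code generation is to bind the MethodInfo to a strongly typed delegate once and then invoke the delegate. The sketch below uses Delegate.CreateDelegate for this; the Extension class here is a stand-in for a type that would really come from Assembly.Load:

```csharp
using System;
using System.Reflection;

public class Extension
{
    public int Execute(int a, int b) { return a + b; }
}

class DelegateDemo
{
    static void Main()
    {
        // Stand-ins for a dynamically discovered type and object.
        Type extensionType = typeof(Extension);
        object extensionObject = Activator.CreateInstance(extensionType);
        MethodInfo executeMethod = extensionType.GetMethod("Execute");

        // Bind the MethodInfo to a typed delegate once. Invoking the delegate
        // afterward costs roughly the same as a normal method call, with the
        // reflection overhead paid only at bind time.
        var execute = (Func<int, int, int>)Delegate.CreateDelegate(
            typeof(Func<int, int, int>), extensionObject, executeMethod);

        Console.WriteLine(execute(1, 2)); // 3
    }
}
```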

  1. Measurement

Many of the techniques for finding issues with .NET Framework performance are exactly the same as with your own code. When you use tools to profile CPU usage, memory allocations, exceptions, contention, and more, you will see the hotspots in the framework just like you see them in your own code.

Note that PerfView groups much of the Framework code together by default, and you may need to change its grouping and folding settings to get a better picture of where Framework time is going.

  1. Performance Counters

.NET has many categories of performance counters. Chapters 2 through 4, which cover garbage collection, JIT compilation, and asynchronous programming, all detail the performance counters for their specific topic. .NET has additional performance counters for the following categories:

  • .NET CLR Data—Counters relating to SQL clients, connection pools, and commands
  • .NET CLR Exceptions—Counters relating to rate of exceptions thrown
  • .NET CLR Interop—Counters relating to calling native code from managed code
  • .NET CLR Networking—Counters relating to connections and amount of data transmitted
  • .NET CLR Remoting—Counters relating to the number of remote calls, object allocations, channels, and more
  • .NET CLR Data Provider for SqlServer/Oracle—Counters for various .NET database clients

Depending on your system’s configuration, you may see more or fewer categories than these.
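As a sketch of reading one of these counters programmatically (Windows-only; the instance-name lookup below is a simplification, since multiple processes with the same name get suffixed instance names):

```csharp
using System;
using System.Diagnostics;

class ExceptionCounter
{
    static void Main()
    {
        // Read the exception counter from the ".NET CLR Exceptions" category
        // for the current process.
        string instance = Process.GetCurrentProcess().ProcessName;
        using (var counter = new PerformanceCounter(
            ".NET CLR Exceptions", "# of Exceps Thrown", instance, readOnly: true))
        {
            // The raw count is always non-negative.
            Console.WriteLine(counter.NextValue() >= 0);
        }
    }
}
```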

  1. Summary

As with all frameworks, you need to understand the implementation details of all the APIs you use. Do not take anything for granted.

Take care when picking collection classes. Consider API semantics, memory locality, algorithmic complexity, and space usage when choosing a collection. Completely avoid the older-style non-generic collections like ArrayList and Hashtable. Use concurrent collections only when you need to synchronize most or all of the accesses.

Pay particular attention to string usage and avoid creating extra strings.

Avoid APIs that throw exceptions in normal circumstances, allocate from the large object heap, or have more expensive implementations than you expect.

When using regular expressions, make sure that you do not recreate the same Regex objects over and over again, and strongly consider compiling them with the RegexOptions.Compiled flag.

Pay attention to the type of I/O you are doing and use the appropriate flags when opening files to give the OS a chance to optimize performance for you. For network calls, disable Nagling and Expect100Continue. Only transmit the data you need and avoid unnecessary layers of encoding.

Avoid using reflection APIs to execute dynamically loaded code. Call this kind of code via common interfaces or through code generated delegates.
