Chapter 2. Coding Algorithms

2.1 Overview

We code every day, thinking about the problem we’re solving and ensuring that our algorithms work correctly. This is how it should be, and modern tools and SDKs increasingly free our time to do just that. Even so, there are features of C#, .NET, and coding in general that have significant effects on efficiency, performance, and maintainability.

Performance

A few subjects in this chapter discuss application performance, such as the efficient handling of strings, caching data, or delaying the instantiation of a type until you need it. In some simple scenarios, these things might not matter. However, in complex enterprise apps that need the performance and scale, keeping an eye on these techniques can help avoid expensive problems in production.

Maintainability

How you organize code can significantly affect its maintainability. Building on the discussions in Chapter 1, you’ll see a new pattern, Strategy, and how it can help simplify an algorithm and make an app more extensible. Another section discusses using recursion for naturally occurring hierarchical data. Collecting these techniques and thinking about the best way to approach an algorithm can make a significant difference in the maintainability and quality of code.

Mindset

A couple of sections in this chapter might be interesting in specific contexts, offering different ways to think about solving problems. You might not use regular expressions every day, but they’re very useful when you need them. Another section, on converting to/from Unix time, looks into the future of .NET as a cross-platform technology, where we need a certain mindset to design algorithms for environments we might never have considered in the past.

2.2 Processing Strings Efficiently

Problem

A profiler indicates a problem in part of your code that builds a large string iteratively and you need to improve performance.

Solution

Here’s an InvoiceItem class we’ll be working with:

class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}

This method produces sample data for the demo:

static List<InvoiceItem> GetInvoiceItems()
{
    var items = new List<InvoiceItem>();
    var rand = new Random();
    for (int i = 0; i < 100; i++)
        items.Add(
            new InvoiceItem
            {
                Cost = rand.Next(i),
                Description = "Invoice Item #" + (i+1)
            });

    return items;
}

There are two methods for working with strings. First, the inefficient method:

static string DoStringConcatenation(List<InvoiceItem> lineItems)
{
    string report = "";

    foreach (var item in lineItems)
        report += $"{item.Cost:C} - {item.Description}";

    return report;
}

Next is the more efficient method:

static string DoStringBuilderConcatenation(List<InvoiceItem> lineItems)
{
    var reportBuilder = new StringBuilder();

    foreach (var item in lineItems)
        reportBuilder.Append($"{item.Cost:C} - {item.Description}");

    return reportBuilder.ToString();
}

The Main method ties all of this together:

static void Main(string[] args)
{
    List<InvoiceItem> lineItems = GetInvoiceItems();

    DoStringConcatenation(lineItems);

    DoStringBuilderConcatenation(lineItems);
}

Discussion

There are different reasons why we need to gather data into a longer string. Reports, whether text based or formatted via HTML or other markup, require combining text strings. Sometimes we add items to an email or manually build PDF content as an email attachment. Other times we might need to export data in a non-standard format for legacy systems. Too often, developers use string concatenation when StringBuilder is the superior choice.

String concatenation is intuitive and quick to code, which is why so many people do it. However, concatenating strings can also kill application performance. The problem occurs because each concatenation performs expensive memory allocations. Let’s examine both the wrong way to build strings and the right way.

The logic in the DoStringConcatenation method extracts Cost and Description from each InvoiceItem and concatenates them onto a growing string. Concatenating just a few strings might go unnoticed. However, imagine if this were 25, 50, or 100 lines or more. Because strings are immutable, every += allocates a new string and copies everything accumulated so far, so the total work grows much faster than the number of lines. Section 3.10 uses string concatenation as its example to show how quickly this cost escalates and destroys application performance.

Note

When concatenating within the same expression, e.g. string1 + string2, the C# compiler can optimize the code. It’s the loop with concatenation that causes the huge performance hit.

The DoStringBuilderConcatenation method fixes this problem. It uses the StringBuilder class, which is in the System.Text namespace. StringBuilder follows the Builder pattern, described in Section 1.10, where each Append call adds the new string to the StringBuilder instance, reportBuilder. Before returning, the method calls ToString to convert the StringBuilder contents to a string.

Tip

As a rule of thumb, once you’ve gone past 4 string concatenations, you’ll receive better performance by using StringBuilder.
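To see the difference for yourself, a rough timing harness along the following lines can help. This is only a sketch: the ConcatTimingSketch class, its method names, and the line format are made up for illustration, and Stopwatch timings vary by machine, so treat the numbers as relative rather than absolute.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTimingSketch
{
    // Builds the report with += in a loop; each pass allocates a new string
    // and copies everything accumulated so far.
    public static string WithConcatenation(int lineCount)
    {
        string report = "";
        for (int i = 0; i < lineCount; i++)
            report += $"Invoice Item #{i + 1}\n";
        return report;
    }

    // Builds the same report with StringBuilder, which appends into a
    // growable internal buffer.
    public static string WithStringBuilder(int lineCount)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
            builder.Append($"Invoice Item #{i + 1}\n");
        return builder.ToString();
    }

    // Returns the elapsed milliseconds for one run of the given action.
    public static long Time(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling ConcatTimingSketch.Time(() => ConcatTimingSketch.WithConcatenation(20_000)) and then the same with WithStringBuilder, while increasing the line count, should make the gap between the two approaches visible. Section 3.10 covers more rigorous ways to measure.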

Fortunately, the .NET ecosystem has many .NET Framework libraries and third-party libraries that help with forming strings in common formats. You should use one of these libraries whenever possible because they’re often optimized for performance, will save time, and make the code easier to read. To give you an idea, here are a few libraries to consider for common formats:

Data Format | Library
JSON (.NET 5) | System.Text.Json
JSON (.NET 4.x and earlier) | Json.NET
XML | LINQ to XML
CSV | LINQ to CSV
HTML | System.Web.UI.HtmlTextWriter
PDF | Various Commercial and Open Source Providers
Excel | Various Commercial and Open Source Providers
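As a small illustration of leaning on a library instead of assembling text by hand, the following sketch serializes invoice items with System.Text.Json. The JsonReportSketch class is hypothetical, and InvoiceItem is repeated here only so the example stands alone.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}

public static class JsonReportSketch
{
    // Lets the serializer handle quoting, escaping, and separators rather
    // than concatenating the JSON text manually.
    public static string ToJson(List<InvoiceItem> items)
    {
        return JsonSerializer.Serialize(items);
    }
}
```

The serializer takes care of details that hand-built strings tend to get wrong, such as escaping quotes inside a Description.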

One more thought: custom search and filtering panels are a common way to give users a simple way to query corporate data. Too frequently, developers use string concatenation to build SQL queries. While string concatenation is easier, the bigger problem, beyond performance, is security. SQL statements built by concatenating user input open the door to SQL injection attacks. In this case, StringBuilder isn’t a solution. Instead, you should use a data library that parameterizes user input to prevent SQL injection. ADO.NET, LINQ providers, and other third-party data libraries do input value parameterization for you. For dynamic queries, using a data library might be harder, but it is possible. You might want to seriously consider using LINQ, which I discuss in Chapter 4.
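Here is one way a dynamic, parameterized query might be assembled with the provider-neutral ADO.NET types in System.Data.Common. It is a sketch under several assumptions: the InvoiceSearchSketch class, the Invoice table, and its column names are invented for illustration, and the @name placeholder style is the one SQL Server uses, so other providers may need a different syntax.

```csharp
using System.Collections.Generic;
using System.Data.Common;

public static class InvoiceSearchSketch
{
    // Builds SQL text that contains only placeholders; user input goes into
    // the parameter dictionary and never into the SQL string itself.
    public static (string Sql, Dictionary<string, object> Parameters) BuildQuery(
        string customerName, decimal? minimumTotal)
    {
        var conditions = new List<string>();
        var parameters = new Dictionary<string, object>();

        if (!string.IsNullOrWhiteSpace(customerName))
        {
            conditions.Add("CustomerName = @customerName");
            parameters["@customerName"] = customerName;
        }

        if (minimumTotal.HasValue)
        {
            conditions.Add("Total >= @minimumTotal");
            parameters["@minimumTotal"] = minimumTotal.Value;
        }

        // Hypothetical table and column names, for illustration only.
        string sql = "SELECT InvoiceID, CustomerName, Total FROM Invoice";
        if (conditions.Count > 0)
            sql += " WHERE " + string.Join(" AND ", conditions);

        return (sql, parameters);
    }

    // Copies the query text and parameters onto an ADO.NET command.
    public static DbCommand CreateCommand(
        DbConnection connection, string customerName, decimal? minimumTotal)
    {
        var (sql, parameters) = BuildQuery(customerName, minimumTotal);

        DbCommand command = connection.CreateCommand();
        command.CommandText = sql;

        foreach (var pair in parameters)
        {
            DbParameter parameter = command.CreateParameter();
            parameter.ParameterName = pair.Key;
            parameter.Value = pair.Value;
            command.Parameters.Add(parameter);
        }

        return command;
    }
}
```

The filters are still optional and combined at runtime, yet a value such as "x'; DROP TABLE Invoice;--" only ever travels as a parameter value, so the database treats it as data rather than as SQL.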

See Also

Section 1.10 Building a Fluid Interface
Section 3.10 Measuring Performance
Chapter 4 Querying with LINQ

2.3 Simplifying Instance Cleanup

Problem

Old using statements cause unnecessary nesting and you want to clean up and simplify code.

Solution

This program has using statements for reading and writing to a text file:

class Program
{
    const string FileName = "Invoice.txt";

    static void Main(string[] args)
    {
        Console.WriteLine(
            "Invoice App\n" +
            "-----------\n");

        WriteDetails();

        ReadDetails();
    }

    static void WriteDetails()
    {

        using var writer = new StreamWriter(FileName);

        Console.WriteLine("Type details and press [Enter] to end.\n");

        string detail = string.Empty;
        do
        {
            Console.Write("Detail: ");
            detail = Console.ReadLine();
            writer.WriteLine(detail);
        }
        while (!string.IsNullOrWhiteSpace(detail));
    }

    static void ReadDetails()
    {
        Console.WriteLine("\nInvoice Details:\n");

        using var reader = new StreamReader(FileName);

        string detail = string.Empty;
        do
        {
            detail = reader.ReadLine();
            Console.WriteLine(detail);
        }
        while (!string.IsNullOrWhiteSpace(detail));
    }
}

Discussion

Before C# 8, using statement syntax required parentheses around the IDisposable object instantiation and an enclosing block. At runtime, when the program reached the end of the block, it would call Dispose on the instantiated object. Developers who needed multiple using statements active at the same time would often nest them, resulting in extra indentation in addition to normal statement nesting. This pattern was enough of an annoyance to some developers that Microsoft added a feature to the language to simplify using statements.

In the solution, you can see a couple of places where the new using statement syntax occurs: instantiating the StreamWriter in WriteDetails and instantiating the StreamReader in ReadDetails. In both cases, the using statement is on a single line. Gone are the parentheses and curly braces, and each statement terminates with a semicolon.

The scope of the new using statement is its enclosing block, calling the using object’s Dispose method when execution reaches the end of the enclosing block. In the solution, the enclosing block is the method, which causes each using object’s Dispose method to be called at the end of the method.

The single line using statement works with IDisposable objects and also with objects that implement a disposable pattern. In this context, a disposable pattern means that the object doesn’t implement IDisposable, yet it has an accessible parameterless Dispose method. In C# 8, this pattern-based form is available for ref struct types, which can’t implement interfaces; other types still need to implement IDisposable.
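The scoping rule described above can be checked with a small experiment. The following sketch uses a hypothetical TracedResource type that logs when it is opened and disposed, and compares the nested pre-C# 8 form with the single line form; none of these names come from the solution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records when it is opened and disposed.
public sealed class TracedResource : IDisposable
{
    private readonly string name;
    private readonly List<string> log;

    public TracedResource(string name, List<string> log)
    {
        this.name = name;
        this.log = log;
        log.Add($"open {name}");
    }

    public void Dispose() => log.Add($"dispose {name}");
}

public static class UsingScopeSketch
{
    // Pre-C# 8 style: each using statement adds a block and a nesting level.
    public static List<string> NestedStatements()
    {
        var log = new List<string>();
        using (var first = new TracedResource("first", log))
        {
            using (var second = new TracedResource("second", log))
            {
                log.Add("work");
            }
        }
        log.Add("after");
        return log;
    }

    // C# 8 style: both objects are disposed when execution leaves the
    // enclosing block, in reverse order of declaration.
    public static List<string> Declarations()
    {
        var log = new List<string>();
        {
            using var first = new TracedResource("first", log);
            using var second = new TracedResource("second", log);
            log.Add("work");
        } // second is disposed here, then first
        log.Add("after");
        return log;
    }
}
```

Both methods produce the same log: the resources open in order, the work runs, and then second is disposed before first. The inner braces in Declarations are there only to show that the enclosing block, rather than the method, determines when Dispose runs.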

See Also

Section 1.1 Managing Object End-of-Lifetime

2.4 Keeping Logic Local

Problem

An algorithm has complex logic that is better refactored to another method, but the logic is really only used in one place.

Solution

The program uses the CustomerType and InvoiceItem:

enum CustomerType
{
    None,
    Bronze,
    Silver,
    Gold
}

class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}

This method generates and returns a demo set of invoices:

static List<InvoiceItem> GetInvoiceItems()
{
    var items = new List<InvoiceItem>();
    var rand = new Random();
    for (int i = 0; i < 100; i++)
        items.Add(
            new InvoiceItem
            {
                Cost = rand.Next(i),
                Description = "Invoice Item #" + (i + 1)
            });

    return items;
}

Finally, the Main method shows how to use a local function:

static void Main()
{
    List<InvoiceItem> lineItems = GetInvoiceItems();

    decimal total = 0;

    foreach (var item in lineItems)
        total += item.Cost;

    total = ApplyDiscount(total, CustomerType.Gold);

    Console.WriteLine($"Total Invoice Balance: {total:C}");

    decimal ApplyDiscount(decimal total, CustomerType customerType)
    {
        switch (customerType)
        {
            case CustomerType.Bronze:
                return total - total * .10m;
            case CustomerType.Silver:
                return total - total * .05m;
            case CustomerType.Gold:
                return total - total * .02m;
            case CustomerType.None:
            default:
                return total;
        }
    }
}

Discussion

Local functions are useful whenever code is only relevant to a single method and you want to separate that code. Reasons for separating code are to give meaning to a set of complex logic, to reuse logic and simplify calling code (perhaps a loop), or to let an async method validate its arguments and throw an exception immediately, before the asynchronous part of the method runs.

The Main method in the solution has a local function, named ApplyDiscount. This example demonstrates how a local function can simplify code. If you examine the code in ApplyDiscount, it might not be immediately clear what its purpose is. However, by separating that logic into its own function, anyone can read the function name and know what the purpose of the logic is. This is a great way to make code more maintainable, by expressing intent and keeping that logic local, where another developer won’t need to hunt for a class method that might move around after future maintenance.
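The async case mentioned earlier is worth a quick illustration. In this sketch, which uses a hypothetical InvoiceTotalSketch class and method names, the outer method is not marked async, so its argument check throws right away, while the asynchronous work sits in a local function that only this method can call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class InvoiceTotalSketch
{
    // Not marked async, so the argument check runs and throws synchronously,
    // before any Task is created.
    public static Task<decimal> GetTotalAsync(List<decimal> costs)
    {
        if (costs == null)
            throw new ArgumentNullException(nameof(costs));

        return SumAsync();

        // The asynchronous part is a local function that captures costs.
        async Task<decimal> SumAsync()
        {
            await Task.Yield(); // stands in for real asynchronous work

            decimal total = 0;
            foreach (decimal cost in costs)
                total += cost;

            return total;
        }
    }
}
```

If GetTotalAsync were itself async, a null argument would surface only when the returned task was awaited. With this shape, the caller sees the ArgumentNullException at the call site.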

2.5 Operating on Multiple Classes the Same Way

Problem

An application must be extensible, for adding new plug-in capabilities, but you don’t want to re-write existing code for new classes.

Solution

This is a common interface for several classes to implement:

public interface IInvoice
{
    bool IsApproved();

    void PopulateLineItems();

    void CalculateBalance();

    void SetDueDate();
}

Here are a few classes that implement IInvoice:

public class BankInvoice : IInvoice
{
    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for BankInvoice.");
    }

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for BankInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for BankInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for BankInvoice.");
    }
}

public class EnterpriseInvoice : IInvoice
{
    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for EnterpriseInvoice.");
    }

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for EnterpriseInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for EnterpriseInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for EnterpriseInvoice.");
    }
}

public class GovernmentInvoice : IInvoice
{
    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for GovernmentInvoice.");
    }

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for GovernmentInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for GovernmentInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for GovernmentInvoice.");
    }
}

This method populates a collection with classes that implement IInvoice:

static List<IInvoice> GetInvoices()
{
    return new List<IInvoice>
    {
        new BankInvoice(),
        new EnterpriseInvoice(),
        new GovernmentInvoice()
    };
}

The Main method has an algorithm that operates on the IInvoice interface:

static void Main(string[] args)
{
    List<IInvoice> invoices = GetInvoices();

    foreach (var invoice in invoices)
    {
        if (invoice.IsApproved())
        {
            invoice.CalculateBalance();
            invoice.PopulateLineItems();
            invoice.SetDueDate();
        }
    }
}

Discussion

As a developer’s career progresses, chances are they’ll encounter requirements that customers want an application to be “extensible”. Although the exact meaning is ambiguous to even the most seasoned architects, there’s a general understanding that “extensibility” should be a theme in the application’s design. We generally move in this direction by identifying areas of the application that can and will change over time. Patterns can help with this, such as the factory classes of Section 1.3, factory methods of Section 1.4, and builders in Section 1.10. In a similar light, the Strategy pattern described in this section helps organize code for extensibility.

The Strategy pattern is useful when there are multiple object types to work with at the same time, you want them to be interchangeable, and you want to write code one time that operates the same way for each object. The applications we use every day are classic examples of where a Strategy could work. Office applications have different document types and allow developers to write their own add-ins. Browsers have add-ins that developers can write. The editors and Integrated Development Environments (IDEs) you use every day have plug-in capabilities.

The solution describes an application that operates on different types of invoices in the domains of Banking, Enterprise, and Government. Each of these domains has its own business rules related to legal or other requirements. What makes this extensible is the fact that, in the future, we can add another object to handle invoices in another domain.

The glue that makes this work is the IInvoice interface. It contains the required methods (or contract) that each implementing object must define. You can see that BankInvoice, EnterpriseInvoice, and GovernmentInvoice each implement IInvoice.

GetInvoices simulates the situation where you would write code to populate invoices from a data source. Whenever you need to extend the framework, by adding a new IInvoice derived type, this is the only code that changes. Because all classes are IInvoice, they can all be returned via the same List<IInvoice> collection.

Finally, examine the Main method. It iterates over each IInvoice object, calling each method. Main doesn’t care what the specific implementation is, so its code never needs to change to accommodate instance-specific logic. You don’t need if or switch statements for special cases, which tend to blow up into spaghetti code during maintenance. Any future changes to Main will concern how it works with the IInvoice interface. Any changes to business logic associated with invoices are limited to the invoice types themselves. This makes the code easy to maintain, and it’s easy to figure out where logic is and should be. Further, it’s also easy to extend by adding a new plug-in class that implements IInvoice.
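To make the extension step concrete, here is a sketch of what adding a plug-in might look like. NonProfitInvoice and InvoiceProcessor are hypothetical names that are not part of the solution, and IInvoice is repeated only so the example stands alone.

```csharp
using System;
using System.Collections.Generic;

public interface IInvoice
{
    bool IsApproved();
    void PopulateLineItems();
    void CalculateBalance();
    void SetDueDate();
}

// Hypothetical new plug-in type; adding it requires no change to the loop.
public class NonProfitInvoice : IInvoice
{
    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for NonProfitInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for NonProfitInvoice.");
    }

    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for NonProfitInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for NonProfitInvoice.");
    }
}

public static class InvoiceProcessor
{
    // The same loop as Main in the solution, written against IInvoice only.
    // Returns the number of approved invoices that were processed.
    public static int Process(IEnumerable<IInvoice> invoices)
    {
        int processed = 0;

        foreach (var invoice in invoices)
        {
            if (invoice.IsApproved())
            {
                invoice.CalculateBalance();
                invoice.PopulateLineItems();
                invoice.SetDueDate();
                processed++;
            }
        }

        return processed;
    }
}
```

In the solution, the only other change would be adding new NonProfitInvoice() to the list that GetInvoices returns.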

See Also

Section 1.3 Delegating Object Creation to a Class
Section 1.4 Delegating Object Creation to a Method
Section 1.10 Building a Fluid Interface

2.6 Checking for Type Equality

Problem

You need to search for objects in a collection and default equality won’t work.

Solution

The Invoice class implements IEquatable<T>:

public class Invoice : IEquatable<Invoice>
{
    public int CustomerID { get; set; }

    public DateTime Created { get; set; }

    public List<string> InvoiceItems { get; set; }

    public decimal Total { get; set; }

    public bool Equals(Invoice other)
    {
        if (ReferenceEquals(other, null))
            return false;

        if (ReferenceEquals(this, other))
            return true;

        if (GetType() != other.GetType())
            return false;

        return
            CustomerID == other.CustomerID &&
            Created.Date == other.Created.Date;
    }

    public override bool Equals(object other)
    {
        return Equals(other as Invoice);
    }

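    public override int GetHashCode()
    {
        // Hash the same values that Equals compares (CustomerID and the date
        // portion of Created) so equal invoices produce equal hash codes.
        return (CustomerID + Created.Date.Ticks).GetHashCode();
    }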

    public static bool operator ==(Invoice left, Invoice right)
    {
        if (ReferenceEquals(left, null))
            return ReferenceEquals(right, null);

        return left.Equals(right);
    }

    public static bool operator !=(Invoice left, Invoice right)
    {
        return !(left == right);
    }
}

This code returns a collection of Invoice classes:

private static List<Invoice> GetAllInvoices()
{
    return new List<Invoice>
    {
        new Invoice { CustomerID = 1, Created = DateTime.Now },
        new Invoice { CustomerID = 2, Created = DateTime.Now },
        new Invoice { CustomerID = 1, Created = DateTime.Now },
        new Invoice { CustomerID = 3, Created = DateTime.Now }
    };
}

Here’s how to use the Invoice class:

static void Main(string[] args)
{
    List<Invoice> allInvoices = GetAllInvoices();

    Console.WriteLine($"# of All Invoices: {allInvoices.Count}");

    var invoicesToProcess = new List<Invoice>();

    foreach (var invoice in allInvoices)
    {
        if (!invoicesToProcess.Contains(invoice))
            invoicesToProcess.Add(invoice);
    }

    Console.WriteLine($"# of Invoices to Process: {invoicesToProcess.Count}");
}

Discussion

The default equality semantics for reference types is reference equality and for value types is value equality. Reference equality means that two references are equal only when they refer to the exact same object instance. Value equality means that each member of an object is compared before two objects are considered equal. The problem with reference equality is that sometimes you have two copies of an object, referring to different object instances, but you really want to check their values to see if they are equal. Value equality might also pose a problem because you might only want to check part of the object to see if they’re equal.

To solve the problem of inadequate default equality, the solution implements custom equality on Invoice. The Invoice class implements the IEquatable<T> interface, where T is Invoice. Although IEquatable<T> requires an Equals(T other) method, you should also implement Equals(object other), GetHashCode(), and the == and != operators, resulting in a consistent definition of equals for all scenarios.

There’s a lot of science in picking a good hash code, which is out of scope for this book, so the solution implementation is minimal.

The equality implementation avoids repeating code. The != operator invokes (and negates) the == operator. The == operator checks references and returns true if both references are null and false if only one reference is null. Both the == operator and the Equals(object other) method call the Equals(Invoice other) method.

The current instance is clearly not null, so Equals(Invoice other) only checks the other reference and returns false if it’s null. Then it checks to see if this and other have reference equality, which would obviously mean they are equal. Then, if the objects aren’t the same type, they are not considered equal. Finally, it returns the result of comparing the values. In this example, the only values that make sense to compare are CustomerID and the date portion of Created.

Note

One of the places you might change in the Equals(Invoice other) method is the type check. You could have a different opinion, based on the requirements of your application. For example, what if you wanted to check equality even if other was a derived type? Then change the logic to accept derived types also.

The Main method processes invoices, ensuring we don’t add duplicate invoices to a list. In the loop, it calls the collection Contains method, which checks the object’s equality. If it doesn’t find a matching object, it adds it to the invoicesToProcess list. When running the program, there are 4 invoices that exist in allInvoices, but only 3 are added to invoicesToProcess because there’s one duplicate (based on CustomerID and Date) in allInvoices.
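List<T>.Contains relies on Equals, but hash-based operations such as HashSet<T> and LINQ’s Distinct also call GetHashCode, so the two need to agree. The following sketch uses a simplified stand-in for the Invoice class (the InvoiceDeduplicationSketch name is hypothetical) and assumes a runtime where System.HashCode is available, such as .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the solution's Invoice class.
public class Invoice : IEquatable<Invoice>
{
    public int CustomerID { get; set; }
    public DateTime Created { get; set; }

    public bool Equals(Invoice other)
    {
        if (ReferenceEquals(other, null))
            return false;

        return CustomerID == other.CustomerID &&
               Created.Date == other.Created.Date;
    }

    public override bool Equals(object obj) => Equals(obj as Invoice);

    // Hashes exactly the values that Equals compares.
    public override int GetHashCode() =>
        HashCode.Combine(CustomerID, Created.Date);
}

public static class InvoiceDeduplicationSketch
{
    // Distinct uses GetHashCode to find candidates and Equals to confirm them.
    public static List<Invoice> RemoveDuplicates(IEnumerable<Invoice> invoices)
    {
        return invoices.Distinct().ToList();
    }
}
```

Two invoices for the same customer created at 9:00 and 17:00 on the same day collapse into one here. If GetHashCode included the time of day, Distinct would likely keep both even though Equals reports them as equal.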

Note

C# 9.0 Records give you IEquatable<T> logic by default. However, Records give you value equality and you would want to implement IEquatable<T> yourself if you needed to be more specific. e.g. if your object has free-form text fields that don’t contribute to the identity of the object, why waste resources doing the unnecessary field comparisons? Another problem (maybe more rare) could be that some parts of a record might be different for temporal reasons, e.g. temporary timestamps, status, or Globally Unique Identifiers (GUIDs), that will cause the objects to never be equal during processing.
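To illustrate the difference, here is a sketch with two hypothetical record types that requires C# 9 or later. InvoiceRecord keeps the default equality, which compares every property, while InvoiceKeyRecord supplies its own Equals and GetHashCode to compare only CustomerID and the calendar date.

```csharp
using System;

// Default record equality compares every property, including Notes.
public record InvoiceRecord(int CustomerID, DateTime Created, string Notes);

// Hypothetical variant that narrows equality to the identifying values.
public record InvoiceKeyRecord(int CustomerID, DateTime Created, string Notes)
{
    // Replaces the compiler-generated Equals for this record type.
    public virtual bool Equals(InvoiceKeyRecord other)
    {
        return other is not null
            && CustomerID == other.CustomerID
            && Created.Date == other.Created.Date;
    }

    // Hashes the same values that Equals compares.
    public override int GetHashCode() =>
        HashCode.Combine(CustomerID, Created.Date);
}
```

With these definitions, two InvoiceRecord instances that differ only in Notes are not equal, whereas two InvoiceKeyRecord instances with the same CustomerID and date are equal regardless of Notes or the time of day. Note that a hand-written Equals like this one skips the type check the compiler would otherwise generate, which matters if other records derive from it.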

2.7 Processing Data Hierarchies

Problem

An app needs to work with hierarchical data and an iterative approach is too complex and unnatural.

Solution

This is the format of data we’re starting with:

class BillingCategory
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int? Parent { get; set; }
}

This method returns a collection of hierarchically related records:

static List<BillingCategory> GetBillingCategories()
{
    return new List<BillingCategory>
    {
        new BillingCategory { ID = 1, Name = "First 1",  Parent = null },
        new BillingCategory { ID = 2, Name = "First 2",  Parent = null },
        new BillingCategory { ID = 4, Name = "Second 1", Parent = 1 },
        new BillingCategory { ID = 3, Name = "First 3",  Parent = null },
        new BillingCategory { ID = 5, Name = "Second 2", Parent = 2 },
        new BillingCategory { ID = 6, Name = "Second 3", Parent = 3 },
        new BillingCategory { ID = 8, Name = "Third 1",  Parent = 5 },
        new BillingCategory { ID = 10, Name = "Third 2", Parent = 6 },
        new BillingCategory { ID = 7, Name = "Second 4", Parent = 3 },
        new BillingCategory { ID = 9, Name = "Second 5", Parent = 1 },
        new BillingCategory { ID = 11, Name = "Third 3", Parent = 9 }
    };
}

This is a recursive algorithm that transforms the flat data into a hierarchical form:

static List<BillingCategory> BuildHierarchy(
     List<BillingCategory> categories, int? catID, int level)
{
    var found = new List<BillingCategory>();

    foreach (var cat in categories)
    {
        if (cat.Parent == catID)
        {
            cat.Name = new string('\t', level) + cat.Name;
            found.Add(cat);
            List<BillingCategory> subCategories =
                BuildHierarchy(categories, cat.ID, level + 1);
            found.AddRange(subCategories);
        }
    }

    return found;
}

The Main method runs the program and prints out the hierarchical data:

static void Main(string[] args)
{
    List<BillingCategory> categories = GetBillingCategories();

    List<BillingCategory> hierarchy =
        BuildHierarchy(categories, catID: null, level: 0);

    PrintHierarchy(hierarchy);
}

static void PrintHierarchy(List<BillingCategory> hierarchy)
{
    foreach (var cat in hierarchy)
        Console.WriteLine(cat.Name);
}

Discussion

It’s hard to tell how many times you have or will encounter iterative algorithms with complex logic and conditions on how the loop operates. Loops like for, foreach, and while are familiar and often used when more elegant solutions are available. I’m not suggesting there’s anything wrong with loops, which are integral parts of our language toolset. However, it’s useful to expand our minds to other techniques that might lend themselves to more elegant and maintainable code for given situations. Sometimes a declarative approach, like a simple lambda on a collection’s ForEach operator is simple and clear. LINQ is a nice solution for working with object collections in memory, which is the subject of Chapter 4. Another alternative is recursion - the subject of this section.

The main point I’m making here is that we need to write algorithms using the techniques that are most natural for a given situation. A lot of algorithms do use loops naturally, like iterating through a collection. Other tasks beckon for recursion. A class of algorithms that work on hierarchies might be excellent candidates for recursion.

The solution demonstrates one of the areas where recursion simplifies processing and makes the code clear. It processes a list of categories based on billing. Notice that the BillingCategory class has both an ID and a Parent. These manage the hierarchy, where the Parent identifies the parent category. Any BillingCategory with a null Parent is a top-level category. This is a single-table relational DB representation of hierarchical data.

The GetBillingCategories method represents how the BillingCategory records arrive from a DB. It’s a flat structure. Notice how the Parent properties reference the BillingCategory IDs that are their parents. Another important fact about the data is that there isn’t a clean ordering between parents and children. In a real application, you’ll start off with a given set of categories and add new categories later. Again, maintenance in code and data over time changes how we approach algorithm design, and this would complicate an iterative solution.

The purpose of this solution is to take the flat category representation and transform it into another list that represents the hierarchical relationship between categories. This was a simple solution, but you might imagine an object based representation where parent categories contained a collection with child categories. The BuildHierarchy method is the recursive algorithm that does this.
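The object-based representation mentioned above could look something like the following sketch. CategoryNode and CategoryTreeSketch are hypothetical names, BillingCategory is repeated so the example stands alone, and the code assumes the data contains no cycles.

```csharp
using System.Collections.Generic;
using System.Linq;

public class BillingCategory
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int? Parent { get; set; }
}

// Hypothetical node type: a category plus its child nodes.
public class CategoryNode
{
    public BillingCategory Category { get; set; }
    public List<CategoryNode> Children { get; } = new List<CategoryNode>();
}

public static class CategoryTreeSketch
{
    public static List<CategoryNode> BuildTree(List<BillingCategory> categories)
    {
        // Group once by parent so each recursive call is a lookup instead of
        // a scan of the whole list. ToLookup accepts null keys, which here
        // identify the root categories.
        ILookup<int?, BillingCategory> byParent =
            categories.ToLookup(cat => cat.Parent);

        return BuildNodes(byParent, parentID: null);
    }

    private static List<CategoryNode> BuildNodes(
        ILookup<int?, BillingCategory> byParent, int? parentID)
    {
        var nodes = new List<CategoryNode>();

        foreach (BillingCategory cat in byParent[parentID])
        {
            var node = new CategoryNode { Category = cat };

            // Recurse with this category's ID to attach its children.
            node.Children.AddRange(BuildNodes(byParent, cat.ID));
            nodes.Add(node);
        }

        return nodes;
    }
}
```

The recursion has the same shape as the solution’s, but the result is a tree rather than an indented flat list, and the lookup avoids rescanning every category at each level.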

The BuildHierarchy method accepts 3 parameters: categories, catID, and level. The categories parameter is the flat collection from the DB and every recursive call receives a reference to this same collection. A potential optimization might be to remove categories that have already been processed, though the demo avoids anything distracting from presented concepts. The catID parameter is the ID for the current BillingCategory and the code is searching for all sub-categories whose Parent matches catID - as demonstrated by the if statement inside the foreach loop. The level parameter helps manage the visual representation of each category. The first statement inside the if block uses level to determine how many tabs (\t) to prefix to the category name. Every time we make a recursive call to BuildHierarchy, we increment level so that subcategories are indented more than their parents.

The algorithm calls BuildHierarchy with the same categories collection. Also, it uses the ID of the current category, not the catID parameter. This means that it recursively calls BuildHierarchy until it reaches the bottommost categories. It will know it’s at the bottom of the hierarchy because the foreach loop completes with no new categories, since there aren’t any sub-categories for the current (bottom) category.

After reaching the bottom, BuildHierarchy returns and the caller continues its foreach loop, collecting all of the categories under catID - that is, those whose Parent is catID. Each call appends the sub-categories returned by the recursive call to its own found collection and then returns found to the calling BuildHierarchy. This continues until the algorithm reaches the top level and all root categories are processed.

Note

The recursive algorithm in this solution is referred to as Depth First Search.

Having arrived at the top level, BuildHierarchy returns the entire collection to its original caller, which is Main. Main originally called BuildHierarchy with the entire flat categories collection. It set catID to null, indicating that BuildHierarchy should start at the root level. The level argument is 0, indicating that we don’t want any tab prefixes on root level category names. Here’s the output:

First 1
    Second 1
    Second 5
        Third 3
First 2
    Second 2
        Third 1
First 3
    Second 3
        Third 2
    Second 4

Looking back at the GetBillingCategories method, you can see how the visual representation matches the data.

2.8 Converting From/To Unix Time

Problem

A service is sending date information as seconds or ticks since the Linux epoch, and it needs to be converted to a C#/.NET DateTime.

Solution

Here are some values we’ll be using:

static readonly DateTime LinuxEpoch =   new DateTime(1970, 1, 1, 0, 0, 0, 0);
static readonly DateTime WindowsEpoch = new DateTime(0001, 1, 1, 0, 0, 0, 0);
static readonly double EpochMillisecondDifference =
    new TimeSpan(LinuxEpoch.Ticks - WindowsEpoch.Ticks).TotalMilliseconds;

These methods convert from and to Linux epoch timestamps:

public static string ToLinuxTimestampFromDateTime(DateTime date)
{
    double dotnetMilliseconds = TimeSpan.FromTicks(date.Ticks).TotalMilliseconds;

    double linuxMilliseconds = dotnetMilliseconds - EpochMillisecondDifference;

    double timestamp = Math.Round(
        linuxMilliseconds, 0, MidpointRounding.AwayFromZero);

    return timestamp.ToString();
}

public static DateTime ToDateTimeFromLinuxTimestamp(string timestamp)
{
    // If parsing fails, epochMilliseconds is 0 and the result is the Linux epoch.
    ulong.TryParse(timestamp, out ulong epochMilliseconds);
    return LinuxEpoch + TimeSpan.FromMilliseconds(epochMilliseconds);
}

The Main method demonstrates how to use those methods:

static void Main()
{
    Console.WriteLine(
        $"WindowsEpoch == DateTime.MinValue: " +
        $"{WindowsEpoch == DateTime.MinValue}");

    DateTime testDate = new DateTime(2021, 01, 01);

    Console.WriteLine($"testDate: {testDate}");

    string linuxTimestamp = ToLinuxTimestampFromDateTime(testDate);

    TimeSpan dotnetTimeSpan = TimeSpan.FromMilliseconds(long.Parse(linuxTimestamp));
    DateTime problemDate = new DateTime(dotnetTimeSpan.Ticks);

    Console.WriteLine($"Accidentally based on .NET Epoch: {problemDate}");

    DateTime goodDate = ToDateTimeFromLinuxTimestamp(linuxTimestamp);

    Console.WriteLine($"Properly based on Linux Epoch: {goodDate}");
}

Discussion

Sometimes developers represent date/time data as milliseconds or ticks in a database. A tick is 100 nanoseconds. Both milliseconds and ticks count time from a predefined epoch, which is the point in time that a computing platform treats as its minimum date. For .NET, the epoch is 01/01/0001 00:00:00, corresponding to the WindowsEpoch field in the solution. This is the same as DateTime.MinValue, but defining it this way makes the example more explicit. For classic Mac OS, the epoch is 1 January 1904 and for Linux, the epoch is 1 January 1970, as shown by the LinuxEpoch field in the solution.
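To make the distance between the two epochs concrete, here is a minimal sketch. The EpochTicksSketch class and its member names are illustrative assumptions, not part of the solution; it relies only on DateTime.Ticks and TimeSpan.TicksPerMillisecond.

```csharp
using System;

public static class EpochTicksSketch
{
    public static readonly DateTime LinuxEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Ticks (100 ns units) elapsed between the .NET epoch and the Linux epoch.
    public static long TicksBetweenEpochs() =>
        LinuxEpoch.Ticks - DateTime.MinValue.Ticks;

    // The same gap in milliseconds: there are 10,000 ticks per millisecond.
    public static long MillisecondsBetweenEpochs() =>
        TicksBetweenEpochs() / TimeSpan.TicksPerMillisecond;
}
```

TicksBetweenEpochs returns 621355968000000000 and MillisecondsBetweenEpochs returns 62135596800000, which is the offset the solution stores in EpochMillisecondDifference.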

Note

There are various opinions on whether representing DateTime values as milliseconds or ticks is a proper design. However, I leave that debate to other people and venues. My habit is to use the DateTime format of the database I’m using. I also translate the DateTime to UTC because many apps need to exist beyond the local time zone and you need a consistent translatable representation.

Increasingly, developers are likely to encounter situations where they need to build cross-platform solutions or integrate with a third-party system that uses milliseconds or ticks based on a different epoch. For example, the Twitter API began using milliseconds based on the Linux epoch in its 2020 version 2.0 release. The solution example is inspired by code that works with milliseconds from Twitter API responses. The release of .NET Core gave C# developers cross-platform capabilities for Console and ASP.NET MVC Core applications. .NET 5 continues the cross-platform story and the roadmap for .NET 6 includes the first rich GUI interface, codenamed Maui. If you’ve been accustomed to working solely in the Microsoft and .NET platforms, this should indicate that the type of thinking required for future development continues to change.

The ToLinuxTimestampFromDateTime method takes a .NET DateTime and converts it to a Linux timestamp. In this example, the Linux timestamp is the number of milliseconds since the Linux epoch. Since we’re working in milliseconds, the TimeSpan converts the DateTime ticks to milliseconds. To perform the conversion, we subtract the number of milliseconds between the .NET epoch and the Linux epoch, which we pre-calculated in EpochMillisecondDifference by subtracting the .NET (Windows) epoch from the Linux epoch. After the conversion, we round the value to eliminate excess precision. By default, Math.Round uses what’s called banker’s rounding, which is often not what we need, so the overload with MidpointRounding.AwayFromZero does the rounding we expect. The solution returns the final value as a string, and you can change that to whatever makes sense for your implementation.
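The difference between the two rounding modes only shows up at exact midpoints. The following is a small sketch, with the illustrative name RoundingSketch, that wraps the two Math.Round overloads so you can compare them side by side.

```csharp
using System;

public static class RoundingSketch
{
    // Default Math.Round uses MidpointRounding.ToEven (banker's rounding).
    public static double RoundDefault(double value) => Math.Round(value);

    // The overload the solution uses: midpoints round away from zero.
    public static double RoundAwayFromZero(double value) =>
        Math.Round(value, 0, MidpointRounding.AwayFromZero);
}
```

For 2.5, RoundDefault returns 2 (the nearest even number) while RoundAwayFromZero returns 3. For 3.5, both return 4.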

The ToDateTimeFromLinuxTimestamp method is considerably simpler. After parsing the string to a ulong, it creates a TimeSpan from the milliseconds and adds that to the LinuxEpoch. Here’s the output from the Main method:

WindowsEpoch == DateTime.MinValue: True
testDate: 1/1/2021 12:00:00 AM
Accidentally based on .NET Epoch: 1/2/0052 12:00:00 AM
Properly based on Linux Epoch: 1/1/2021 12:00:00 AM

As you can see, DateTime.MinValue is the same as the Windows epoch. Using 1/1/2021 as a good date (at least we hope so), Main starts by properly converting that date to a Linux timestamp. Then it shows the wrong way to process that date. Finally, it calls ToDateTimeFromLinuxTimestamp performing the proper translation.
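If you don’t need to do the arithmetic yourself, recent versions of .NET also expose the conversion through DateTimeOffset. The sketch below is an alternative to the solution, not a replacement for it; the UnixTimeSketch class and method names are assumptions for illustration, and it assumes the caller passes a value that is already in UTC.

```csharp
using System;

public static class UnixTimeSketch
{
    // DateTime -> milliseconds since 1970-01-01T00:00:00Z.
    // Assumes the value is already UTC; SpecifyKind only labels it as such.
    public static long ToUnixMilliseconds(DateTime utcDate)
    {
        DateTime utc = DateTime.SpecifyKind(utcDate, DateTimeKind.Utc);
        return new DateTimeOffset(utc).ToUnixTimeMilliseconds();
    }

    // Milliseconds since the Linux epoch -> UTC DateTime.
    public static DateTime FromUnixMilliseconds(long milliseconds) =>
        DateTimeOffset.FromUnixTimeMilliseconds(milliseconds).UtcDateTime;
}
```

Passing 1/1/2021 to ToUnixMilliseconds yields 1609459200000, and FromUnixMilliseconds converts that value back to 1/1/2021 12:00:00 AM in UTC. Working with long values instead of strings also avoids the parsing step.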

2.9 2.8 Caching Frequently Requested Data

Problem

Network latency is causing an app to run slowly because static frequently used data is being fetched too often.

Solution

Here’s the type of data that will be cached:

public class InvoiceCategory
{
    public int ID { get; set; }

    public string Name { get; set; }
}

This is the interface for the repository that retrieves the data:

public interface IInvoiceRepository
{
    List<InvoiceCategory> GetInvoiceCategories();
}

This is the repository that retrieves and caches the data:

public class InvoiceRepository : IInvoiceRepository
{
    static List<InvoiceCategory> invoiceCategories;

    public List<InvoiceCategory> GetInvoiceCategories()
    {
        if (invoiceCategories == null)
            invoiceCategories = GetInvoiceCategoriesFromDB();

        return invoiceCategories;
    }

    List<InvoiceCategory> GetInvoiceCategoriesFromDB()
    {
        return new List<InvoiceCategory>
        {
            new InvoiceCategory { ID = 1, Name = "Government" },
            new InvoiceCategory { ID = 2, Name = "Financial" },
            new InvoiceCategory { ID = 3, Name = "Enterprise" },
        };
    }
}

Here’s the program that uses that repository:

class Program
{
    readonly IInvoiceRepository invoiceRep;

    public Program(IInvoiceRepository invoiceRep)
    {
        this.invoiceRep = invoiceRep;
    }

    static void Main()
    {
        new Program(new InvoiceRepository()).Run();
    }

    void Run()
    {
        List<InvoiceCategory> categories = invoiceRep.GetInvoiceCategories();

        foreach (var category in categories)
            Console.WriteLine($"ID: {category.ID}, Name: {category.Name}");
    }
}

Discussion

Depending on the technology you’re using, there could be plenty of options for caching data through mechanisms like CDN, HTTP, and data source solutions. Each has a place and purpose and this section doesn’t try to cover all of those options. Rather, it just has a quick and simple technique for caching data that might be helpful.

You might have experienced a scenario where there’s a set of data used in a lot of different places. The data is typically lookup lists or business rule data. In the course of everyday work, we build queries that include this data, either in direct select queries or in the form of database table joins. We forget about it until someone starts complaining about application performance. Analysis might reveal that there are a lot of queries that request that same data over and over again. If it’s practical, you can cache that data in memory to avoid network latency exacerbated by excessive queries to the same set of data.

This isn’t a blanket solution because you have to think about whether it’s practical in your situation. For example, it’s impractical to hold too much data in memory, which will cause other scalability problems. Ideally, it’s a finite and relatively small set of data, like invoice categories. That data shouldn’t change too often because if you need real-time access to dynamic data, this won’t work. If the underlying data source changes, the cache is likely to be holding stale data.
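One way to limit how long stale data can linger is to reload the cache after a fixed time-to-live. This is a minimal sketch under stated assumptions: ExpiringListCache, its constructor parameters, and the injectable clock are hypothetical names that aren’t part of the solution, and the lock is added so that concurrent callers don’t trigger duplicate loads.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the solution: caches a list and reloads it
// after a time-to-live (TTL) elapses, so stale data is eventually replaced.
public class ExpiringListCache<T>
{
    readonly Func<List<T>> load;     // for example, a DB query
    readonly TimeSpan timeToLive;
    readonly Func<DateTime> utcNow;  // injectable clock, useful for testing
    readonly object sync = new object();

    List<T> items;
    DateTime loadedAt;

    public ExpiringListCache(
        Func<List<T>> load, TimeSpan timeToLive, Func<DateTime> utcNow = null)
    {
        this.load = load;
        this.timeToLive = timeToLive;
        this.utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public List<T> Get()
    {
        lock (sync) // keeps concurrent callers from loading twice
        {
            DateTime now = utcNow();

            // Load on first use, or reload once the cached copy is too old.
            if (items == null || now - loadedAt >= timeToLive)
            {
                items = load();
                loadedAt = now;
            }

            return items;
        }
    }
}
```

A repository could hold a static instance of this class and pass its DB query method as the load delegate. The right time-to-live depends on how often the underlying data changes.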

The solution shows an InvoiceCategory class that we’re going to cache. It’s for a lookup list, just two values per object, a finite and relatively small set of values, and something that doesn’t change much. You can imagine that every query for invoices would require this data, as well as admin or search screens with lookup lists. Caching might speed up invoice queries by removing the extra join and returning less data over the wire; you can then join the cached data in memory after the DB query.
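Joining cached data in memory could look something like the following sketch. InvoiceRow and CachedJoinSketch are hypothetical types introduced only for illustration; InvoiceRow stands in for whatever your invoice query returns once the category join has been removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class InvoiceCategory
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical shape of an invoice row returned without the category join.
public class InvoiceRow
{
    public int InvoiceID { get; set; }
    public int CategoryID { get; set; }
}

public static class CachedJoinSketch
{
    // Attaches category names from the cached list instead of a DB join.
    public static List<string> Describe(
        IEnumerable<InvoiceRow> rows,
        IEnumerable<InvoiceCategory> cachedCategories)
    {
        Dictionary<int, string> namesById =
            cachedCategories.ToDictionary(c => c.ID, c => c.Name);

        return rows
            .Select(r => namesById.TryGetValue(r.CategoryID, out string name)
                ? $"Invoice {r.InvoiceID}: {name}"
                : $"Invoice {r.InvoiceID}: (unknown category)")
            .ToList();
    }
}
```

The dictionary lookup keeps the in-memory join cheap, and the fallback text covers the case where the cache is missing a category that the database already knows about.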

The solution has an InvoiceRepository that implements the IInvoiceRepository interface. The interface isn’t strictly necessary for this example, though it does support demonstrating another example of IoC, as discussed in Section 1.2.

The InvoiceRepository class has an invoiceCategories field for holding a collection of InvoiceCategory. The GetInvoiceCategories method would normally make a DB query and return the results. However, this example only does the DB query if invoiceCategories is null and caches the result in invoiceCategories. This way, subsequent requests get the cached version and don’t require a DB query.

Note

The invoiceCategories field is static because you only want a single cache. In stateless web scenarios, as in ASP.NET, the IIS process recycles unpredictably and developers are advised not to rely on static variables. This situation is different because if the recycle clears out invoiceCategories, leaving it null, the next query will re-populate it.

The Main method uses IoC to instantiate InvoiceRepository and performs a query for the InvoiceCategory collection.

See Also

1.2 Removing Explicit Dependencies

2.10 2.9 Delaying Type Instantiation

Problem

A class has heavy instantiation requirements and you can save on resource usage by delaying the instantiation to only when necessary.

Solution

Here’s the data we’ll work with:

public class InvoiceCategory
{
    public int ID { get; set; }

    public string Name { get; set; }
}

This is the repository interface:

public interface IInvoiceRepository
{
    void AddInvoiceCategory(string category);
}

This is the repository that we delay instantiation of:

public class InvoiceRepository : IInvoiceRepository
{
    public InvoiceRepository()
    {
        Console.WriteLine("InvoiceRepository Instantiated.");
    }

    public void AddInvoiceCategory(string category)
    {
        Console.WriteLine($"for category: {category}");
    }
}

This program shows a few ways to perform lazy initialization of the repository:

class Program
{
    public static ServiceProvider Container;

    readonly Lazy<InvoiceRepository> InvoiceRep =
        new Lazy<InvoiceRepository>();

    readonly Lazy<IInvoiceRepository> InvoiceRepFactory =
        new Lazy<IInvoiceRepository>(CreateInvoiceRepositoryInstance);

    readonly Lazy<IInvoiceRepository> InvoiceRepIoC =
        new Lazy<IInvoiceRepository>(CreateInvoiceRepositoryFromIoC);

    static IInvoiceRepository CreateInvoiceRepositoryInstance()
    {
        return new InvoiceRepository();
    }

    static IInvoiceRepository CreateInvoiceRepositoryFromIoC()
    {
        return Program.Container.GetRequiredService<IInvoiceRepository>();
    }

    static void Main()
    {
        Container =
            new ServiceCollection()
                .AddTransient<IInvoiceRepository, InvoiceRepository>()
                .BuildServiceProvider();

        new Program().Run();
    }

    void Run()
    {
        IInvoiceRepository viaLazyDefault = InvoiceRep.Value;
        viaLazyDefault.AddInvoiceCategory("Via Lazy Default\n");

        IInvoiceRepository viaLazyFactory = InvoiceRepFactory.Value;
        viaLazyFactory.AddInvoiceCategory("Via Lazy Factory\n");

        IInvoiceRepository viaLazyIoC = InvoiceRepIoC.Value;
        viaLazyIoC.AddInvoiceCategory("Via Lazy IoC\n");
    }
}

Discussion

Sometimes you have objects with heavy startup overhead. They might need some initial calculation or have to wait on data that takes a while because of network latency or dependencies on poorly performing external systems. This can have serious negative consequences, especially on application startup. Imagine an app that is losing potential customers during a trial because it starts too slowly, or enterprise users whose work is impacted by wait times. Although you may or may not be able to fix the root cause of the performance bottleneck, another option might be to delay instantiation of that object until you need it. For example, what if you really don’t need that object immediately and can show a start screen right away?

The solution demonstrates how to use Lazy<T> to delay object instantiation. The object in question is the InvoiceRepository and we’re assuming it has a problem in its constructor logic that causes a delay in instantiation.

Program has three Lazy<T> fields, showing three different ways to instantiate. The first field, InvoiceRep, instantiates a Lazy<InvoiceRepository> with no parameters. It assumes that InvoiceRepository has a default (parameterless) constructor and will create a new instance when its Value property is first accessed.

The InvoiceRepFactory field, a Lazy<IInvoiceRepository>, references the CreateInvoiceRepositoryInstance method. When code first accesses the field’s Value property, it calls CreateInvoiceRepositoryInstance to construct the object. Since it’s a method, you have a lot of flexibility in building the object.

In addition to the other two options, the InvoiceRepIoC field shows how you can use Lazy instantiation with IoC. Notice that the Main method builds an IoC container, as described in Section 1.2. The CreateInvoiceRepositoryFromIoC method uses that IoC container to request an IInvoiceRepository, which the container resolves to an InvoiceRepository instance.

Finally, the Run method shows how to access the fields, through the Lazy<T>.Value property.
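To confirm that nothing is constructed until Value is first read, you can check Lazy<T>.IsValueCreated. The following is a minimal sketch; LazySketch, FactoryCalls, and CreateExpensiveValue are illustrative names, and the string stands in for an object with an expensive constructor.

```csharp
using System;

public static class LazySketch
{
    // Counts how many times the factory method has run.
    public static int FactoryCalls;

    // Stand-in for an expensive constructor.
    static string CreateExpensiveValue()
    {
        FactoryCalls++;
        return "ready";
    }

    public static Lazy<string> Create() =>
        new Lazy<string>(CreateExpensiveValue);
}
```

Immediately after Create, IsValueCreated is false and FactoryCalls hasn’t changed. The first read of Value runs the factory, and later reads return the same instance without calling it again. By default, Lazy<T> uses LazyThreadSafetyMode.ExecutionAndPublication, so this also holds when several threads read Value at once.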

See Also

1.2 Removing Explicit Dependencies

2.11 2.10 Parsing Data Files

Problem

The application needs to extract data from a custom external format and string type operations lead to complex and less efficient code.

Solution

Here are the data types we’ll be working with:

class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}

class Invoice
{
    public string Customer { get; set; }
    public DateTime Created { get; set; }
    public List<InvoiceItem> Items { get; set; }
    public decimal Total { get; set; }
}

This method returns the raw string data that we want to extract and convert to invoices:

static string GetInvoiceTransferFile()
{
    return
        "Creator 1::8/05/20::Item 1\t35.05\tItem 2\t25.18\tItem 3\t13.13::Customer 1::[NOTE] 1\n" +
        "Creator 2::8/10/20::Item 1\t45.05::Customer 2::[NOTE] 2\n" +
        "Creator 1::8/15/20::Item 1\t55.05\tItem 2\t65.18::Customer 3::[NOTE] 3\n";
}

These are utility methods for building and saving invoices:

static Invoice GetInvoice(string matchCustomer, string matchCreated, string matchItems)
{
    List<InvoiceItem> lineItems = GetLineItems(matchItems);

    DateTime.TryParse(matchCreated, out DateTime created);

    var invoice =
        new Invoice
        {
            Customer = matchCustomer,
            Created = created,
            Items = lineItems
        };
    return invoice;
}

static List<InvoiceItem> GetLineItems(string matchItems)
{
    var lineItems = new List<InvoiceItem>();

    string[] itemStrings = matchItems.Split('\t');

    // Items arrive as description/cost pairs; stop before an unpaired trailing value.
    for (int i = 0; i + 1 < itemStrings.Length; i += 2)
    {
        decimal.TryParse(itemStrings[i + 1], out decimal cost);
        lineItems.Add(
            new InvoiceItem
            {
                Description = itemStrings[i],
                Cost = cost
            });
    }

    return lineItems;
}

static void SaveInvoices(List<Invoice> invoices)
{
    Console.WriteLine($"{invoices.Count} invoices saved.");
}

This method uses regular expressions to extract values from raw string data:

static List<Invoice> ParseInvoices(string invoiceFile)
{
    var invoices = new List<Invoice>();

    Regex invoiceRegEx = new Regex(
        @"^.+?::(?<created>.+?)::(?<items>.+?)::(?<customer>.+?)::.+");

    foreach (var invoiceString in invoiceFile.Split('\n'))
    {
        Match match = invoiceRegEx.Match(invoiceString);

        if (match.Success)
        {
            string matchCustomer = match.Groups["customer"].Value;
            string matchCreated = match.Groups["created"].Value;
            string matchItems = match.Groups["items"].Value;

            Invoice invoice = GetInvoice(matchCustomer, matchCreated, matchItems);
            invoices.Add(invoice);
        }
    }

    return invoices;
}

The Main method runs the demo:

static void Main(string[] args)
{
    string invoiceFile = GetInvoiceTransferFile();

    List<Invoice> invoices = ParseInvoices(invoiceFile);

    SaveInvoices(invoices);
}

Discussion

Sometimes, we’ll encounter textual data that doesn’t fit standard data formats. It might come from existing document files, log files, or external and legacy systems. Often, we need to ingest that data and process it for storage in a DB. This section explains how to do that with regular expressions.

The solution shows that the data format we want to generate is an Invoice with a collection of InvoiceItem. The GetInvoiceTransferFile method shows the format of the incoming data. The demo suggests that the data might come from a legacy system that already produces that format, and it’s easier to write C# code to ingest it than to add code in that system for a better supported format. The specific data we’re interested in extracting are the created date, invoice items, and customer name. Notice that newlines (\n) separate records, double colons (::) separate invoice fields, and tabs (\t) separate invoice item fields.

The GetInvoice and GetLineItems methods construct the objects from extracted data and serve to separate object construction from the regular expression extraction logic.

The ParseInvoices method uses regular expressions to extract values from the input string. The Regex constructor parameter contains the regular expression string used to extract values. While a full discussion of regular expressions is out of scope, here’s what this string does:

  • ^ says to start at the beginning of the string

  • .+?:: matches all characters up to the next invoice field separator (::). It doesn’t capture the matched content, so the first field is ignored.

  • (?<created>.+?)::, (?<items>.+?)::, and (?<customer>.+?):: are similar to .+?::, but go a step further by extracting values into groups based on the given name. For example, (?<created>.+?):: means that it will extract the matched data and put it in a group named “created”.

  • .+ matches all remaining characters
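The pieces of the pattern can be tried out in isolation on a single record. This sketch reuses the solution’s regular expression; the InvoiceRegexSketch class and its Extract method are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoiceRegexSketch
{
    static readonly Regex InvoiceRegEx = new Regex(
        @"^.+?::(?<created>.+?)::(?<items>.+?)::(?<customer>.+?)::.+");

    // Returns the three named groups, or null when the line doesn't match.
    public static (string Created, string Items, string Customer)? Extract(
        string line)
    {
        Match match = InvoiceRegEx.Match(line);

        if (!match.Success)
            return null;

        return (
            match.Groups["created"].Value,
            match.Groups["items"].Value,
            match.Groups["customer"].Value);
    }
}
```

For the second record in the transfer file, "Creator 2::8/10/20::Item 1\t45.05::Customer 2::[NOTE] 2", Extract returns "8/10/20" for Created, "Item 1\t45.05" for Items, and "Customer 2" for Customer. The leading creator field and the trailing note are matched but not captured.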

The foreach loop relies on the \n separator in the string to work with each invoice. The Match method executes the regular expression match, extracting values. If the match was successful, the code extracts values from groups, calls GetInvoice, and adds the new invoice to the invoices collection.

You might have noticed how we’re using GetLineItems to extract data from the matchItems parameter, which comes from the regular expression’s items group. We could have used a more sophisticated regular expression to take care of that too. However, using string operations there was intentional, as a contrast that demonstrates how regular expression processing is the more elegant solution in this situation.
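For comparison, here is one way the line items could be extracted with a regular expression instead of Split. It’s a sketch under assumptions: LineItemRegexSketch is a hypothetical name, the pattern is one of several that would work, and it returns tuples rather than InvoiceItem objects to stay self-contained. It also parses costs with the invariant culture, which is a choice the solution doesn’t make.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LineItemRegexSketch
{
    // Each item is "description<TAB>cost"; items are also tab-separated.
    static readonly Regex ItemRegEx = new Regex(
        @"(?<description>[^\t]+)\t(?<cost>[^\t]+)");

    public static List<(string Description, decimal Cost)> GetLineItems(
        string items)
    {
        var lineItems = new List<(string Description, decimal Cost)>();

        foreach (Match match in ItemRegEx.Matches(items))
        {
            // A cost that fails to parse is left as 0, as in the solution.
            decimal.TryParse(
                match.Groups["cost"].Value,
                NumberStyles.Number,
                CultureInfo.InvariantCulture,
                out decimal cost);

            lineItems.Add((match.Groups["description"].Value, cost));
        }

        return lineItems;
    }
}
```

Because each match requires both a description and a cost, an unpaired trailing value is simply skipped instead of causing an index error.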

Tip

As an enhancement, you might log any situations where match.Success is false if you’re concerned about losing data and/or want to know if there’s a bug in the regular expression or original data formatting.
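A sketch of that enhancement might look like the following. UnmatchedLineSketch and FindUnmatchedLines are hypothetical names, and writing to Console.Error is a stand-in for whatever logging your application uses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnmatchedLineSketch
{
    static readonly Regex InvoiceRegEx = new Regex(
        @"^.+?::(?<created>.+?)::(?<items>.+?)::(?<customer>.+?)::.+");

    // Returns the 1-based line numbers of non-empty lines that fail to match.
    public static List<int> FindUnmatchedLines(string invoiceFile)
    {
        var unmatched = new List<int>();
        string[] lines = invoiceFile.Split('\n');

        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].Length == 0)
                continue; // a trailing newline yields an empty last entry

            if (!InvoiceRegEx.IsMatch(lines[i]))
            {
                unmatched.Add(i + 1);
                Console.Error.WriteLine(
                    $"Line {i + 1} did not match: {lines[i]}");
            }
        }

        return unmatched;
    }
}
```

Skipping empty entries avoids a false alarm for the empty string that Split produces after the final newline in the transfer file.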

Finally, ParseInvoices returns the new invoices to the calling code, Main, so it can save them.
