We code every day, thinking about the problem we’re solving and ensuring that our algorithms work correctly. This is how it should be, and modern tools and SDKs increasingly free our time to do just that. Even so, there are features of C#, .NET, and coding in general that have significant effects on efficiency, performance, and maintainability.
A few subjects in this chapter discuss application performance, such as efficient string handling, caching data, or delaying instantiation of a type until you need it. In some simple scenarios, these things might not matter. However, in complex enterprise apps that need performance and scale, keeping an eye on these techniques can help avoid expensive problems in production.
How you organize code can significantly affect its maintainability. Building on the discussions in Chapter 1, you’ll see a new pattern, Strategy, and how it can help simplify an algorithm and make an app more extensible. Another section discusses using recursion for naturally occurring hierarchical data. Collecting these techniques and thinking about the best way to approach an algorithm can make a significant difference in the maintainability and quality of code.
A couple of sections of this chapter might be interesting in specific contexts, offering different ways to think about solving problems. You might not use regular expressions every day, but they’re very useful when you need them. Another section, on converting to and from Unix time, looks into the future of .NET as a cross-platform language; it takes a certain mindset to design algorithms for environments we might never have considered in the past.
A profiler indicates a problem in part of your code that builds a large string iteratively and you need to improve performance.
Here’s an InvoiceItem class we’ll be working with:
class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}
This method produces sample data for the demo:
static List<InvoiceItem> GetInvoiceItems()
{
    var items = new List<InvoiceItem>();
    var rand = new Random();

    for (int i = 0; i < 100; i++)
        items.Add(
            new InvoiceItem
            {
                Cost = rand.Next(i),
                Description = "Invoice Item #" + (i + 1)
            });

    return items;
}
There are two methods for working with strings. First, the inefficient method:
static string DoStringConcatenation(List<InvoiceItem> lineItems)
{
    string report = "";

    foreach (var item in lineItems)
        report += $"{item.Cost:C} - {item.Description}";

    return report;
}
Next is the more efficient method:
static string DoStringBuilderConcatenation(List<InvoiceItem> lineItems)
{
    var reportBuilder = new StringBuilder();

    foreach (var item in lineItems)
        reportBuilder.Append($"{item.Cost:C} - {item.Description}");

    return reportBuilder.ToString();
}
The Main method ties all of this together:
static void Main(string[] args)
{
    List<InvoiceItem> lineItems = GetInvoiceItems();

    DoStringConcatenation(lineItems);
    DoStringBuilderConcatenation(lineItems);
}
There are different reasons why we need to gather data into a longer string. Reports, whether text based or formatted via HTML or other markup, require combining text strings. Sometimes we add items to an email or manually build PDF content as an email attachment. Other times we might need to export data in a nonstandard format for legacy systems. Too often, developers use string concatenation when StringBuilder is the superior choice.
String concatenation is intuitive and quick to code, which is why so many people do it. However, concatenating strings can also kill application performance. The problem occurs because each concatenation performs expensive memory allocations. Let’s examine both the wrong way to build strings and the right way.
The logic in the DoStringConcatenation method extracts Cost and Description from each InvoiceItem and concatenates them to a growing string. Concatenating just a few strings might go unnoticed. However, imagine if this were 25, 50, or 100 lines or more. Using string concatenation as an example, Section 3.10 shows how the operation’s cost grows dramatically (roughly quadratically) with the number of concatenations, destroying application performance.
When concatenating within the same expression, e.g. string1 + string2, the C# compiler can optimize the code. It’s the loop with concatenation that causes the huge performance hit.
The DoStringBuilderConcatenation method fixes this problem. It uses the StringBuilder class, which is in the System.Text namespace. It follows the Builder pattern, described in Section 1.10, where each Append call adds the new string to the StringBuilder instance, reportBuilder. Before returning, the method calls ToString to convert the StringBuilder contents to a string.
As a rule of thumb, once you’ve gone past 4 string concatenations, you’ll receive better performance by using StringBuilder.
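To see the difference on your own machine, you can time both approaches with Stopwatch. This is a minimal sketch (the iteration count and the format of the appended text are arbitrary choices, not from the solution):

```csharp
using System;
using System.Diagnostics;
using System.Text;

class ConcatenationTiming
{
    static void Main()
    {
        const int Iterations = 50_000;

        // Time the concatenation approach -- each += allocates a new string.
        var watch = Stopwatch.StartNew();
        string report = "";
        for (int i = 0; i < Iterations; i++)
            report += $"{i:C} - Invoice Item #{i}\n";
        watch.Stop();
        Console.WriteLine($"Concatenation: {watch.ElapsedMilliseconds}ms");

        // Time the StringBuilder approach -- appends into a growing buffer.
        watch = Stopwatch.StartNew();
        var builder = new StringBuilder();
        for (int i = 0; i < Iterations; i++)
            builder.Append($"{i:C} - Invoice Item #{i}\n");
        string builderReport = builder.ToString();
        watch.Stop();
        Console.WriteLine($"StringBuilder: {watch.ElapsedMilliseconds}ms");
    }
}
```

The absolute numbers depend on hardware, but the gap widens rapidly as the iteration count grows; both loops produce the same string.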
Fortunately, the .NET ecosystem has many .NET Framework libraries and third-party libraries that help with forming strings in common formats. You should use one of these libraries whenever possible because they’re often optimized for performance, save time, and make the code easier to read. To give you an idea, here are a few libraries to consider for common formats:

Data Format          | Library
JSON (.NET 5)        | System.Text.Json
JSON (.NET 4.x)      | Json.NET
XML                  | LINQ to XML
CSV                  | LINQ to CSV
HTML                 | System.Web.UI.HtmlTextWriter
PDF                  | Various commercial and open source providers
Excel                | Various commercial and open source providers
One more thought: custom search and filtering panels are a common way to give users a simple means of querying corporate data. Too frequently, developers use string concatenation to build SQL queries. While string concatenation is easier, the problem goes beyond performance to security: concatenated SQL statements open the opportunity for SQL injection attacks. In this case, StringBuilder isn’t a solution. Instead, you should use a data library that parameterizes user input to circumvent SQL injection. ADO.NET, LINQ providers, and other third-party data libraries do input value parameterization for you. For dynamic queries, using a data library might be harder, but it is possible. You might want to seriously consider using LINQ, which I discuss in Chapter 4.
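To make the parameterization point concrete, here’s a sketch using ADO.NET. The connection string, table, and column names are hypothetical, and the Microsoft.Data.SqlClient package is an assumption (System.Data.SqlClient works the same way):

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Data.SqlClient; // NuGet package; System.Data.SqlClient also works

class InvoiceSearch
{
    // Hypothetical query against a hypothetical Invoices table.
    public static List<(int ID, decimal Total)> FindInvoices(
        string connectionString, int customerID, DateTime createdAfter)
    {
        var results = new List<(int, decimal)>();

        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand(
            "SELECT ID, Total FROM Invoices " +
            "WHERE CustomerID = @CustomerID AND Created > @CreatedAfter",
            connection);

        // Parameter values travel separately from the SQL text, so user
        // input is never interpreted as SQL -- no injection opportunity.
        command.Parameters.AddWithValue("@CustomerID", customerID);
        command.Parameters.AddWithValue("@CreatedAfter", createdAfter);

        connection.Open();
        using SqlDataReader reader = command.ExecuteReader();
        while (reader.Read())
            results.Add((reader.GetInt32(0), reader.GetDecimal(1)));

        return results;
    }
}
```

Notice that no user-supplied value is ever concatenated into the command text itself.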
Section 1.10, Building a Fluid Interface
Section 3.10, Measuring Performance
Chapter 4, Querying with LINQ
Old using statements cause unnecessary nesting, and you want to clean up and simplify code.
This program has using statements for reading and writing to a text file:
class Program
{
    const string FileName = "Invoice.txt";

    static void Main(string[] args)
    {
        Console.WriteLine(
            "Invoice App\n" +
            "-----------\n");

        WriteDetails();
        ReadDetails();
    }

    static void WriteDetails()
    {
        using var writer = new StreamWriter(FileName);

        Console.WriteLine("Type details and press [Enter] to end.\n");

        string detail = string.Empty;
        do
        {
            Console.Write("Detail: ");
            detail = Console.ReadLine();
            writer.WriteLine(detail);
        }
        while (!string.IsNullOrWhiteSpace(detail));
    }

    static void ReadDetails()
    {
        Console.WriteLine("\nInvoice Details:\n");

        using var reader = new StreamReader(FileName);

        string detail = string.Empty;
        do
        {
            detail = reader.ReadLine();
            Console.WriteLine(detail);
        }
        while (!string.IsNullOrWhiteSpace(detail));
    }
}
Before C# 8, using statement syntax required parentheses around the IDisposable object instantiation and an enclosing block. At runtime, when execution reached the closing brace of the block, the program would call Dispose on the instantiated object. If multiple using statements needed to operate at the same time, developers would often nest them, resulting in extra indentation on top of normal statement nesting. This pattern was enough of an annoyance to some developers that Microsoft added a feature to the language to simplify using statements.
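For comparison, here’s what the solution’s WriteDetails method would look like with the pre-C# 8 syntax, where the parentheses, braces, and extra indentation come back (same FileName constant as the solution):

```csharp
using System;
using System.IO;

class OldSyntaxDemo
{
    const string FileName = "Invoice.txt";

    static void WriteDetails()
    {
        // Classic using statement: parentheses plus a block.
        using (var writer = new StreamWriter(FileName))
        {
            Console.WriteLine("Type details and press [Enter] to end.\n");

            string detail = string.Empty;
            do
            {
                Console.Write("Detail: ");
                detail = Console.ReadLine();
                writer.WriteLine(detail);
            }
            while (!string.IsNullOrWhiteSpace(detail));
        } // writer.Dispose() is called here, at the closing brace
    }
}
```

With two or three resources open at once, each one adds another level of nesting, which is exactly the annoyance the new syntax removes.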
In the solution, you can see a couple of places where the new using statement syntax occurs: instantiating the StreamWriter in WriteDetails and instantiating the StreamReader in ReadDetails. In both cases, the using statement is on a single line. Gone are the parentheses and curly braces; each statement terminates with a semicolon.
The scope of the new using statement is its enclosing block: the using object’s Dispose method is called when execution reaches the end of that block. In the solution, the enclosing block is the method, which causes each using object’s Dispose method to be called at the end of the method.
What’s different about the single-line using statement is that it works with both IDisposable objects and objects that implement a disposable pattern. In this context, a disposable pattern means that the type (a ref struct, which can’t implement interfaces like IDisposable) has an accessible parameterless Dispose method that the compiler matches by shape.
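A sketch of that disposable pattern, using a hypothetical TempBuffer type; everything here except the using declaration mechanics is invented for illustration:

```csharp
using System;

// A ref struct can't implement IDisposable, but a public parameterless
// Dispose method matches the pattern the compiler looks for.
ref struct TempBuffer
{
    Span<byte> buffer;

    public TempBuffer(int size) => buffer = new byte[size];

    public int Length => buffer.Length;

    public void Dispose() => Console.WriteLine("TempBuffer released.");
}

class Demo
{
    static void Main()
    {
        using var temp = new TempBuffer(256);
        Console.WriteLine($"Working with {temp.Length} bytes...");
    } // temp.Dispose() runs here, at the end of the enclosing block
}
```

Even though TempBuffer never mentions IDisposable, the using declaration still guarantees Dispose runs when the method ends.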
Section 1.1 Managing Object End-of-Lifetime
An algorithm has complex logic that is better refactored to another method, but the logic is really only used in one place.
The program uses the CustomerType enum and the InvoiceItem class:
enum CustomerType
{
    None,
    Bronze,
    Silver,
    Gold
}

class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}
This method generates and returns a demo set of invoices:
static List<InvoiceItem> GetInvoiceItems()
{
    var items = new List<InvoiceItem>();
    var rand = new Random();

    for (int i = 0; i < 100; i++)
        items.Add(
            new InvoiceItem
            {
                Cost = rand.Next(i),
                Description = "Invoice Item #" + (i + 1)
            });

    return items;
}
Finally, the Main method shows how to use a local function:
static void Main()
{
    List<InvoiceItem> lineItems = GetInvoiceItems();

    decimal total = 0;
    foreach (var item in lineItems)
        total += item.Cost;

    total = ApplyDiscount(total, CustomerType.Gold);

    Console.WriteLine($"Total Invoice Balance: {total:C}");

    decimal ApplyDiscount(decimal total, CustomerType customerType)
    {
        switch (customerType)
        {
            case CustomerType.Bronze:
                return total - total * .10m;
            case CustomerType.Silver:
                return total - total * .05m;
            case CustomerType.Gold:
                return total - total * .02m;
            case CustomerType.None:
            default:
                return total;
        }
    }
}
Local functions are useful whenever code is relevant only to a single method and you want to separate that code. Reasons for separating code include giving meaning to a set of complex logic, reusing logic and simplifying calling code (perhaps a loop), or allowing an async method to throw an exception before the caller awaits it.
The Main method in the solution has a local function named ApplyDiscount. This example demonstrates how a local function can simplify code. If you examine the code in ApplyDiscount, its purpose might not be immediately clear. However, by separating that logic into its own method, anyone can read the method name and know the purpose of the logic. This is a great way to make code more maintainable: it expresses intent and keeps the logic local, so another developer won’t need to hunt for a class method that might move around during future maintenance.
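One of those reasons deserves an example: an async method doesn’t surface an exception until its task is awaited, but a local function lets argument validation fail immediately at the call site. In this sketch (the type and member names are hypothetical), the outer method validates eagerly and the local function does the async work:

```csharp
using System;
using System.Threading.Tasks;

class InvoiceService
{
    public static Task<decimal> GetBalanceAsync(int customerID)
    {
        // Throws synchronously, at the call site, before any awaiting.
        if (customerID <= 0)
            throw new ArgumentOutOfRangeException(nameof(customerID));

        return GetBalanceInternalAsync();

        // The local function holds the actual async logic.
        async Task<decimal> GetBalanceInternalAsync()
        {
            await Task.Delay(100); // simulate a remote call
            return 42.00m;
        }
    }
}
```

If GetBalanceAsync were itself marked async, the ArgumentOutOfRangeException would be wrapped in the returned Task instead of thrown immediately; the local function preserves fail-fast validation.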
An application must be extensible to support new plug-in capabilities, but you don’t want to rewrite existing code for new classes.
This is a common interface for several classes to implement:
public interface IInvoice
{
    bool IsApproved();
    void PopulateLineItems();
    void CalculateBalance();
    void SetDueDate();
}
Here are a few classes that implement IInvoice:
public class BankInvoice : IInvoice
{
    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for BankInvoice.");
    }

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for BankInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for BankInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for BankInvoice.");
    }
}

public class EnterpriseInvoice : IInvoice
{
    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for EnterpriseInvoice.");
    }

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for EnterpriseInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for EnterpriseInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for EnterpriseInvoice.");
    }
}

public class GovernmentInvoice : IInvoice
{
    public void CalculateBalance()
    {
        Console.WriteLine("Calculating balance for GovernmentInvoice.");
    }

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for GovernmentInvoice.");
        return true;
    }

    public void PopulateLineItems()
    {
        Console.WriteLine("Populating items for GovernmentInvoice.");
    }

    public void SetDueDate()
    {
        Console.WriteLine("Setting due date for GovernmentInvoice.");
    }
}
This method populates a collection with classes that implement IInvoice:
static List<IInvoice> GetInvoices()
{
    return new List<IInvoice>
    {
        new BankInvoice(),
        new EnterpriseInvoice(),
        new GovernmentInvoice()
    };
}
The Main method has an algorithm that operates on the IInvoice interface:
static void Main(string[] args)
{
    List<IInvoice> invoices = GetInvoices();

    foreach (var invoice in invoices)
    {
        if (invoice.IsApproved())
        {
            invoice.CalculateBalance();
            invoice.PopulateLineItems();
            invoice.SetDueDate();
        }
    }
}
As a developer’s career progresses, chances are they’ll encounter requirements where customers want an application to be “extensible”. Although the exact meaning is ambiguous to even the most seasoned architects, there’s a general understanding that “extensibility” should be a theme in the application’s design. We generally move in this direction by identifying areas of the application that can and will change over time. Patterns can help with this, such as the factory classes of Section 1.3, the factory methods of Section 1.4, and the builders of Section 1.10. In a similar light, the Strategy pattern described in this section helps organize code for extensibility.
The Strategy pattern is useful when there are multiple object types to work with at the same time, you want them to be interchangeable, and you want to write code once that operates the same way on each object. The software we use every day offers classic examples of where a Strategy could work: office applications have different document types and allow developers to write their own add-ins, browsers have add-ins that developers can write, and the editors and Integrated Development Environments (IDEs) you use every day have plug-in capabilities.
The solution describes an application that operates on different types of invoices in the domains of banking, enterprise, and government. Each of these domains has its own business rules related to legal or other requirements. What makes this extensible is the fact that, in the future, we can add another object to handle invoices in another domain.
The glue that makes this work is the IInvoice interface. It contains the required methods (the contract) that each implementing object must define. You can see that BankInvoice, EnterpriseInvoice, and GovernmentInvoice each implement IInvoice.
GetInvoices simulates the situation where you would write code to populate invoices from a data source. Whenever you need to extend the framework by adding a new IInvoice-derived type, this is the only code that changes. Because all the classes are IInvoice, they can all be returned via the same List<IInvoice> collection.
Finally, examine the Main method. It iterates over each IInvoice object, calling each method. Main doesn’t care what the specific implementation is, so its code never needs to change to accommodate instance-specific logic. You don’t need if or switch statements for special cases, which can blow up into spaghetti code during maintenance. Any future changes will be in how Main works with the IInvoice interface, and any changes to business logic associated with invoices are limited to the invoice types themselves. This is easy to maintain, and it’s easy to figure out where logic is and should be. Further, it’s also easy to extend by adding a new plug-in class that implements IInvoice.
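To make the extension step concrete, here’s what adding a hypothetical RetailInvoice domain might look like; the new class implements IInvoice, and the only existing code that changes is GetInvoices:

```csharp
public class RetailInvoice : IInvoice
{
    public void CalculateBalance() =>
        Console.WriteLine("Calculating balance for RetailInvoice.");

    public bool IsApproved()
    {
        Console.WriteLine("Checking approval for RetailInvoice.");
        return true;
    }

    public void PopulateLineItems() =>
        Console.WriteLine("Populating items for RetailInvoice.");

    public void SetDueDate() =>
        Console.WriteLine("Setting due date for RetailInvoice.");
}

// The only existing code that changes:
static List<IInvoice> GetInvoices()
{
    return new List<IInvoice>
    {
        new BankInvoice(),
        new EnterpriseInvoice(),
        new GovernmentInvoice(),
        new RetailInvoice() // the new plug-in
    };
}
```

The Main method loops over the collection exactly as before, with no changes.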
Section 1.3, Delegating Object Creation to a Class
Section 1.4, Delegating Object Creation to a Method
Section 1.10, Building a Fluid Interface
You need to search for objects in a collection and default equality won’t work.
The Invoice class implements IEquatable<T>:
public class Invoice : IEquatable<Invoice>
{
    public int CustomerID { get; set; }
    public DateTime Created { get; set; }
    public List<string> InvoiceItems { get; set; }
    public decimal Total { get; set; }

    public bool Equals(Invoice other)
    {
        if (ReferenceEquals(other, null))
            return false;
        if (ReferenceEquals(this, other))
            return true;
        if (GetType() != other.GetType())
            return false;

        return
            CustomerID == other.CustomerID &&
            Created.Date == other.Created.Date;
    }

    public override bool Equals(object other)
    {
        return Equals(other as Invoice);
    }

    public override int GetHashCode()
    {
        return (CustomerID + Created.Ticks).GetHashCode();
    }

    public static bool operator ==(Invoice left, Invoice right)
    {
        if (ReferenceEquals(left, null))
            return ReferenceEquals(right, null);

        return left.Equals(right);
    }

    public static bool operator !=(Invoice left, Invoice right)
    {
        return !(left == right);
    }
}
This code returns a collection of Invoice objects:
private static List<Invoice> GetAllInvoices()
{
    return new List<Invoice>
    {
        new Invoice { CustomerID = 1, Created = DateTime.Now },
        new Invoice { CustomerID = 2, Created = DateTime.Now },
        new Invoice { CustomerID = 1, Created = DateTime.Now },
        new Invoice { CustomerID = 3, Created = DateTime.Now }
    };
}
Here’s how to use the Invoice class:
static void Main(string[] args)
{
    List<Invoice> allInvoices = GetAllInvoices();
    Console.WriteLine($"# of All Invoices: {allInvoices.Count}");

    var invoicesToProcess = new List<Invoice>();

    foreach (var invoice in allInvoices)
    {
        if (!invoicesToProcess.Contains(invoice))
            invoicesToProcess.Add(invoice);
    }

    Console.WriteLine($"# of Invoices to Process: {invoicesToProcess.Count}");
}
The default equality semantics for reference types is reference equality; for value types, it’s value equality. Reference equality means that when comparing objects, we check whether their references refer to the same object instance. Value equality compares each member of two objects before the objects are considered equal. The problem with reference equality is that sometimes you have two copies of an object, referring to different object instances, but you really want to compare their values to see whether they’re equal. Value equality might also pose a problem because you might want to check only part of the object for equality.
To solve the problem of inadequate default equality, the solution implements custom equality on Invoice. The Invoice class implements the IEquatable<T> interface, where T is Invoice. Although IEquatable<T> requires only an Equals(T other) method, you should also implement Equals(object other), GetHashCode(), and the == and != operators, resulting in a consistent definition of equality for all scenarios.
There’s a lot of science in picking a good hash code, which is out of scope for this book, so the solution implementation is minimal.
The equality implementation avoids repeating code. The != operator invokes (and negates) the == operator. The == operator checks references and returns true if both references are null and false if only one reference is null. Both the == operator and the Equals(object other) method call the Equals(Invoice other) method.
The current instance is clearly not null, so Equals(Invoice other) only checks the other reference and returns false if it’s null. Then it checks whether this and other have reference equality, which would obviously mean they’re equal. Next, if the objects aren’t the same type, they aren’t considered equal. Finally, it returns the result of comparing the values that matter; in this example, the only things that make sense are CustomerID and the Created date.
One of the places you might change the Equals(Invoice other) method is the type check. You could have a different opinion, based on the requirements of your application. For example, what if you wanted to check equality even if other was a derived type? Then change the logic to accept derived types too.
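One way to relax that rule, sketched here as a variant of the solution’s Equals(Invoice other): drop the exact-type check entirely, since the parameter type already guarantees other is an Invoice or something derived from it. Whether this is right depends on your application’s requirements.

```csharp
public bool Equals(Invoice other)
{
    if (ReferenceEquals(other, null))
        return false;
    if (ReferenceEquals(this, other))
        return true;

    // No GetType() comparison: the Invoice parameter type already
    // guarantees other is an Invoice or a type derived from it, so a
    // hypothetical CreditInvoice subclass can compare equal to a base
    // Invoice with the same CustomerID and Created date.

    return
        CustomerID == other.CustomerID &&
        Created.Date == other.Created.Date;
}
```

Be aware this makes equality asymmetric if a derived type adds its own stricter Equals, which is one reason the exact-type check is the safer default.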
The Main method processes invoices, ensuring we don’t add duplicate invoices to a list. In the loop, it calls the collection’s Contains method, which checks the object’s equality. If it doesn’t find a matching object, it adds it to the invoicesToProcess list. When running the program, there are 4 invoices in allInvoices, but only 3 are added to invoicesToProcess because there’s one duplicate (based on CustomerID and Date) in allInvoices.
C# 9.0 records give you IEquatable<T> logic by default. However, records give you value equality, and you would want to implement IEquatable<T> yourself if you needed to be more specific. For example, if your object has free-form text fields that don’t contribute to the identity of the object, why waste resources doing unnecessary field comparisons? Another (perhaps rarer) problem could be that some parts of a record differ for temporal reasons, e.g., temporary timestamps, status, or Globally Unique Identifiers (GUIDs), causing the objects to never be equal during processing.
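A sketch of that idea: a hypothetical C# 9 record that replaces the generated equality so a free-form Notes field doesn’t participate (the record and property names are invented for illustration):

```csharp
using System;

public record InvoiceRecord(int CustomerID, DateTime Created, string Notes)
{
    // Declaring Equals(InvoiceRecord) replaces the compiler-generated
    // value equality, which would otherwise compare Notes too.
    public virtual bool Equals(InvoiceRecord other) =>
        other is not null &&
        CustomerID == other.CustomerID &&
        Created.Date == other.Created.Date;

    // Keep GetHashCode consistent with the custom equality.
    public override int GetHashCode() =>
        HashCode.Combine(CustomerID, Created.Date);
}
```

Two records with the same CustomerID and Created date now compare equal (including via ==, which records route through Equals) even when their Notes differ.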
An app needs to work with hierarchical data and an iterative approach is too complex and unnatural.
This is the format of data we’re starting with:
class BillingCategory
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int? Parent { get; set; }
}
This method returns a collection of hierarchically related records:
static List<BillingCategory> GetBillingCategories()
{
    return new List<BillingCategory>
    {
        new BillingCategory { ID = 1, Name = "First 1", Parent = null },
        new BillingCategory { ID = 2, Name = "First 2", Parent = null },
        new BillingCategory { ID = 4, Name = "Second 1", Parent = 1 },
        new BillingCategory { ID = 3, Name = "First 3", Parent = null },
        new BillingCategory { ID = 5, Name = "Second 2", Parent = 2 },
        new BillingCategory { ID = 6, Name = "Second 3", Parent = 3 },
        new BillingCategory { ID = 8, Name = "Third 1", Parent = 5 },
        new BillingCategory { ID = 8, Name = "Third 2", Parent = 6 },
        new BillingCategory { ID = 7, Name = "Second 4", Parent = 3 },
        new BillingCategory { ID = 9, Name = "Second 5", Parent = 1 },
        new BillingCategory { ID = 8, Name = "Third 3", Parent = 9 }
    };
}
This is a recursive algorithm that transforms the flat data into a hierarchical form:
static List<BillingCategory> BuildHierarchy(
    List<BillingCategory> categories, int? catID, int level)
{
    var found = new List<BillingCategory>();

    foreach (var cat in categories)
    {
        if (cat.Parent == catID)
        {
            cat.Name = new string(' ', level) + cat.Name;
            found.Add(cat);

            List<BillingCategory> subCategories =
                BuildHierarchy(categories, cat.ID, level + 1);

            found.AddRange(subCategories);
        }
    }

    return found;
}
The Main method runs the program and prints out the hierarchical data:
static void Main(string[] args)
{
    List<BillingCategory> categories = GetBillingCategories();

    List<BillingCategory> hierarchy =
        BuildHierarchy(categories, catID: null, level: 0);

    PrintHierarchy(hierarchy);
}

static void PrintHierarchy(List<BillingCategory> hierarchy)
{
    foreach (var cat in hierarchy)
        Console.WriteLine(cat.Name);
}
It’s hard to tell how many times you have encountered, or will encounter, iterative algorithms with complex logic and conditions controlling how the loop operates. Loops like for, foreach, and while are familiar and often used when more elegant solutions are available. I’m not suggesting there’s anything wrong with loops, which are integral parts of our language toolset. However, it’s useful to expand our minds to other techniques that might lend themselves to more elegant and maintainable code in given situations. Sometimes a declarative approach, like a simple lambda in a collection’s ForEach operator, is simple and clear. LINQ is a nice solution for working with object collections in memory, which is the subject of Chapter 4. Another alternative is recursion, the subject of this section.
The main point I’m making here is that we need to write algorithms using the techniques that are most natural for a given situation. A lot of algorithms do use loops naturally, like iterating through a collection. Other tasks beckon for recursion. A class of algorithms that work on hierarchies might be excellent candidates for recursion.
The solution demonstrates one of the areas where recursion simplifies processing and makes the code clear. It processes a list of categories based on billing. Notice that the BillingCategory class has both an ID and a Parent. These manage the hierarchy, where Parent identifies the parent category. Any BillingCategory with a null Parent is a top-level category. This is a single-table relational database representation of hierarchical data.
The GetBillingCategories method represents how the BillingCategory records arrive from a DB: as a flat structure. Notice how the Parent properties reference the BillingCategory IDs that are their parents. Another important fact about the data is that there isn’t a clean ordering between parents and children. In a real application, you’ll start off with a given set of categories and add new categories later. Again, maintenance in code and data over time changes how we approach algorithm design, and this would complicate an iterative solution.
The purpose of this solution is to take the flat category representation and transform it into another list that represents the hierarchical relationship between categories. This is a simple solution, but you might imagine an object-based representation where parent categories contain a collection of child categories. The BuildHierarchy method is the recursive algorithm that does this.
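For contrast, here’s what that object-based representation might look like: a hypothetical CategoryNode type where each parent holds its children, built with the same recursive shape as BuildHierarchy (this is an illustration, not part of the solution):

```csharp
using System.Collections.Generic;

class CategoryNode
{
    public BillingCategory Category { get; set; }
    public List<CategoryNode> Children { get; set; } = new List<CategoryNode>();
}

static List<CategoryNode> BuildTree(
    List<BillingCategory> categories, int? parentID)
{
    var nodes = new List<CategoryNode>();

    foreach (var cat in categories)
        if (cat.Parent == parentID)
            nodes.Add(new CategoryNode
            {
                Category = cat,
                // Recurse to attach this category's children.
                Children = BuildTree(categories, cat.ID)
            });

    return nodes;
}
```

Instead of encoding depth in the Name string, the nesting lives in the object graph, which is often more convenient for serialization or UI tree controls.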
The BuildHierarchy method accepts three parameters: categories, catID, and level. The categories parameter is the flat collection from the DB, and every recursive call receives a reference to this same collection. A potential optimization might be to remove categories that have already been processed, though the demo avoids anything distracting from the presented concepts. The catID parameter is the ID of the current BillingCategory, and the code searches for all subcategories whose Parent matches catID, as demonstrated by the if statement inside the foreach loop. The level parameter helps manage the visual representation of each category. The first statement inside the if block uses level to determine how many leading spaces to prefix to the category name. Every time we make a recursive call to BuildHierarchy, we increment level so that subcategories are indented more than their parents.
The algorithm calls BuildHierarchy with the same categories collection. Also, it passes the ID of the current category, not the catID parameter. This means it recursively calls BuildHierarchy until it reaches the bottom-most categories. It knows it’s at the bottom of the hierarchy when the foreach loop completes with no new categories, because there aren’t any subcategories for the current (bottom) category.
After reaching the bottom, BuildHierarchy returns and continues the foreach loop, collecting all of the categories under catID (that is, those whose Parent is catID). Then it appends any matching subcategories to the found collection and returns them to the calling BuildHierarchy. This continues until the algorithm reaches the top level and all root categories are processed.
The recursive algorithm in this solution is known as a depth-first search.
Having arrived at the top level, BuildHierarchy returns the entire collection to its original caller, which is Main. Main originally called BuildHierarchy with the entire flat categories collection. It set catID to null, indicating that BuildHierarchy should start at the root level. The level argument is 0, indicating that we don’t want any space prefixes on root-level category names. Here’s the output:
First 1
 Second 1
 Second 5
  Third 3
First 2
 Second 2
  Third 1
First 3
 Second 3
  Third 2
 Second 4
Looking back at the GetBillingCategories method, you can see how the visual representation matches the data.
A service is sending date information as seconds or ticks since the Linux epoch, and it needs to be converted to a C#/.NET DateTime.
Here are some values we’ll be using:
static readonly DateTime LinuxEpoch =
    new DateTime(1970, 1, 1, 0, 0, 0, 0);

static readonly DateTime WindowsEpoch =
    new DateTime(0001, 1, 1, 0, 0, 0, 0);

static readonly double EpochMillisecondDifference =
    new TimeSpan(LinuxEpoch.Ticks - WindowsEpoch.Ticks).TotalMilliseconds;
These methods convert from and to Linux epoch timestamps:
public static string ToLinuxTimestampFromDateTime(DateTime date)
{
    double dotnetMilliseconds = TimeSpan.FromTicks(date.Ticks).TotalMilliseconds;
    double linuxMilliseconds = dotnetMilliseconds - EpochMillisecondDifference;
    double timestamp = Math.Round(
        linuxMilliseconds, 0, MidpointRounding.AwayFromZero);

    return timestamp.ToString();
}

public static DateTime ToDateTimeFromLinuxTimestamp(string timestamp)
{
    ulong.TryParse(timestamp, out ulong epochMilliseconds);

    return LinuxEpoch + TimeSpan.FromMilliseconds(epochMilliseconds);
}
The Main method demonstrates how to use those methods:
static void Main()
{
    Console.WriteLine(
        $"WindowsEpoch == DateTime.MinValue: " +
        $"{WindowsEpoch == DateTime.MinValue}");

    DateTime testDate = new DateTime(2021, 01, 01);
    Console.WriteLine($"testDate: {testDate}");

    string linuxTimestamp = ToLinuxTimestampFromDateTime(testDate);

    TimeSpan dotnetTimeSpan =
        TimeSpan.FromMilliseconds(long.Parse(linuxTimestamp));
    DateTime problemDate = new DateTime(dotnetTimeSpan.Ticks);
    Console.WriteLine($"Accidentally based on .NET Epoch: {problemDate}");

    DateTime goodDate = ToDateTimeFromLinuxTimestamp(linuxTimestamp);
    Console.WriteLine($"Properly based on Linux Epoch: {goodDate}");
}
Sometimes developers represent date/time data as milliseconds or ticks in a database. A tick is 100 nanoseconds. Both milliseconds and ticks represent time starting at a predefined epoch, which is some point in time that is the minimum date for a computing platform. For .NET, the epoch is 01/01/0001 00:00:00, corresponding to the WindowsEpoch field in the solution. This is the same as DateTime.MinValue, but defining it this way makes the example more explicit. For macOS, the epoch is 1 January 1904, and for Linux, the epoch is 1 January 1970, as shown by the LinuxEpoch field in the solution.
There are various opinions on whether representing DateTime values as milliseconds or ticks is a proper design. However, I leave that debate to other people and venues. My habit is to use the DateTime format of the database I’m using. I also translate the DateTime to UTC, because many apps need to exist beyond the local time zone and you need a consistent, translatable representation.
Increasingly, developers are likely to encounter situations where they need to build cross-platform solutions or integrate with a third-party system using milliseconds or ticks based on a different epoch. For example, the Twitter API began using milliseconds based on the Linux epoch in its 2020 version 2.0 release, and the solution example is inspired by code that works with milliseconds from Twitter API responses. The release of .NET Core gave C# developers cross-platform capabilities for console and ASP.NET MVC Core applications. .NET 5 continues the cross-platform story, and the roadmap for .NET 6 includes the first rich GUI framework, codenamed MAUI. If you’ve been accustomed to working solely on Microsoft and .NET platforms, this should indicate that things continue to change, along with the type of thinking required for future development.
The ToLinuxTimestampFromDateTime method takes a .NET DateTime and converts it to a Linux timestamp: the number of milliseconds since the Linux epoch. Since we're working in milliseconds, a TimeSpan converts the DateTime ticks to milliseconds. To perform the conversion, we subtract the number of milliseconds between the .NET epoch and the Linux epoch, pre-calculated in EpochMillisecondDifference by subtracting the .NET (Windows) epoch from the Linux epoch. After the conversion, we round the value to eliminate excess precision. By default, Math.Round uses what's called banker's rounding, where midpoints round to the nearest even number, which is often not what we need; the overload with MidpointRounding.AwayFromZero does the rounding we expect. The solution returns the final value as a string, and you can change that to whatever makes sense for your implementation.
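The difference between the two rounding modes is easy to see in isolation. This is a standalone demonstration, not part of the solution code:

```csharp
using System;

class RoundingDemo
{
    static void Main()
    {
        // Banker's rounding (the default) sends midpoints to the nearest
        // even number, so 0.5 and 1.5 both land on even values.
        Console.WriteLine(Math.Round(0.5)); // 0
        Console.WriteLine(Math.Round(1.5)); // 2

        // MidpointRounding.AwayFromZero gives the "schoolbook" result.
        Console.WriteLine(Math.Round(0.5, MidpointRounding.AwayFromZero)); // 1
        Console.WriteLine(Math.Round(1.5, MidpointRounding.AwayFromZero)); // 2
    }
}
```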
The ToDateTimeFromLinuxTimestamp method is remarkably simpler. After converting the input to a ulong, it creates a TimeSpan from the milliseconds and adds it to LinuxEpoch. Here's the output from the Main method:
WindowsEpoch == DateTime.MinValue: True
testDate: 1/1/2021 12:00:00 AM
Accidentally based on .NET Epoch: 1/2/0052 12:00:00 AM
Properly based on Linux Epoch: 1/1/2021 12:00:00 AM
As you can see, DateTime.MinValue is the same as the Windows epoch. Using 1/1/2021 as a good date (at least we hope so), Main starts by properly converting that date to a Linux timestamp. Then it shows the wrong way to process that date. Finally, it calls ToDateTimeFromLinuxTimestamp, performing the proper translation.
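Putting the pieces together, the two conversion methods might be sketched as follows. This is a reconstruction based on the description above rather than the exact solution listing, so treat the details (the string return type and the rounding overload, in particular) as assumptions:

```csharp
using System;

class LinuxTime
{
    static readonly DateTime WindowsEpoch = DateTime.MinValue;
    static readonly DateTime LinuxEpoch = new DateTime(1970, 1, 1);

    // Milliseconds between the .NET (Windows) epoch and the Linux epoch.
    static readonly double EpochMillisecondDifference =
        (LinuxEpoch - WindowsEpoch).TotalMilliseconds;

    public static string ToLinuxTimestampFromDateTime(DateTime date)
    {
        // A TimeSpan converts the DateTime ticks to milliseconds.
        double dotNetMilliseconds = new TimeSpan(date.Ticks).TotalMilliseconds;

        // Shift from the .NET epoch to the Linux epoch, then round
        // away from zero to avoid banker's rounding surprises.
        double linuxMilliseconds = dotNetMilliseconds - EpochMillisecondDifference;
        return Math.Round(linuxMilliseconds, MidpointRounding.AwayFromZero)
                   .ToString();
    }

    public static DateTime ToDateTimeFromLinuxTimestamp(string timestamp)
    {
        // Milliseconds since the Linux epoch, added back onto that epoch.
        ulong milliseconds = ulong.Parse(timestamp);
        return LinuxEpoch + TimeSpan.FromMilliseconds(milliseconds);
    }

    static void Main()
    {
        var testDate = new DateTime(2021, 1, 1);
        string linuxTimestamp = ToLinuxTimestampFromDateTime(testDate);
        DateTime goodDate = ToDateTimeFromLinuxTimestamp(linuxTimestamp);
        Console.WriteLine(goodDate == testDate); // True (the value round-trips)
    }
}
```

If you're on .NET Core 1.0 or later, note that DateTimeOffset.ToUnixTimeMilliseconds and DateTimeOffset.FromUnixTimeMilliseconds do this conversion for you.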
Network latency is causing an app to run slowly because static, frequently used data is being fetched too often.
Here’s the type of data that will be cached:
public class InvoiceCategory
{
    public int ID { get; set; }
    public string Name { get; set; }
}
This is the interface for the repository that retrieves the data:
public interface IInvoiceRepository
{
    List<InvoiceCategory> GetInvoiceCategories();
}
This is the repository that retrieves and caches the data:
public class InvoiceRepository : IInvoiceRepository
{
    static List<InvoiceCategory> invoiceCategories;

    public List<InvoiceCategory> GetInvoiceCategories()
    {
        if (invoiceCategories == null)
            invoiceCategories = GetInvoiceCategoriesFromDB();

        return invoiceCategories;
    }

    List<InvoiceCategory> GetInvoiceCategoriesFromDB()
    {
        return new List<InvoiceCategory>
        {
            new InvoiceCategory { ID = 1, Name = "Government" },
            new InvoiceCategory { ID = 2, Name = "Financial" },
            new InvoiceCategory { ID = 3, Name = "Enterprise" },
        };
    }
}
Here’s the program that uses that repository:
class Program
{
    readonly IInvoiceRepository invoiceRep;

    public Program(IInvoiceRepository invoiceRep)
    {
        this.invoiceRep = invoiceRep;
    }

    static void Main()
    {
        new Program(new InvoiceRepository()).Run();
    }

    void Run()
    {
        List<InvoiceCategory> categories = invoiceRep.GetInvoiceCategories();

        foreach (var category in categories)
            Console.WriteLine($"ID: {category.ID}, Name: {category.Name}");
    }
}
Depending on the technology you're using, there could be plenty of options for caching data through mechanisms like CDNs, HTTP, and data-source solutions. Each has a place and purpose, and this section doesn't try to cover all of those options. Rather, it offers a quick and simple technique for caching data that might be helpful.
You might have experienced a scenario where there's a set of data used in a lot of different places. The nature of the data is typically lookup lists or business-rule data. In the course of everyday work, we build queries that include this data, either in direct select queries or in the form of database table joins. We forget about it until someone starts complaining about application performance. Analysis might reveal that a lot of queries request the same data over and over again. If it's practical, you can cache that data in memory to avoid network latency exacerbated by excessive queries for the same set of data.
This isn't a blanket solution, because you have to think about whether it's practical in your situation. For example, it's impractical to hold too much data in memory, which will cause other scalability problems. Ideally, it's a finite and relatively small set of data, like invoice categories. That data also shouldn't change too often, because if you need real-time access to dynamic data, this won't work: if the underlying data source changes, the cache will be holding old, stale data.
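If staleness is a concern, one common compromise is to give cached data a lifetime and re-query when it expires. Here's a small sketch of that idea; the generic ExpiringCache type and the five-minute lifetime are my own illustration, not part of the solution:

```csharp
using System;
using System.Collections.Generic;

class ExpiringCache<T>
{
    readonly Func<List<T>> fetch; // queries the real data source
    readonly TimeSpan lifetime;
    List<T> items;
    DateTime loadedAt;

    public ExpiringCache(Func<List<T>> fetch, TimeSpan lifetime)
    {
        this.fetch = fetch;
        this.lifetime = lifetime;
    }

    public List<T> Get()
    {
        // Re-query when the cache is empty or older than its lifetime.
        if (items == null || DateTime.UtcNow - loadedAt > lifetime)
        {
            items = fetch();
            loadedAt = DateTime.UtcNow;
        }
        return items;
    }
}

class Demo
{
    static void Main()
    {
        int queries = 0;
        var cache = new ExpiringCache<string>(
            () => { queries++; return new List<string> { "Government" }; },
            TimeSpan.FromMinutes(5));

        cache.Get();
        cache.Get(); // served from cache; no second query
        Console.WriteLine(queries); // 1
    }
}
```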
The solution shows an InvoiceCategory class that we're going to cache. It's a lookup list: just two values per object, a finite and relatively small set of values, and something that doesn't change much. You can imagine that every query for invoices would require this data, as would admin or search screens with lookup lists. Caching might speed up invoice queries by removing the extra join and returning less data over the wire, since you can join against the cached data after the DB query.
The solution has an InvoiceRepository that implements the IInvoiceRepository interface. This wasn't strictly necessary for this example, though it does support demonstrating another example of IoC, as discussed in Section 1.2.
The InvoiceRepository class has an invoiceCategories field for holding a collection of InvoiceCategory. The GetInvoiceCategories method would normally make a DB query and return the results. However, this example only does the DB query if invoiceCategories is null and caches the result in invoiceCategories. This way, subsequent requests get the cached version and don't require a DB query.
The invoiceCategories field is static because you only want a single cache. In stateless web scenarios, as in ASP.NET, the IIS process recycles unpredictably, and developers are advised not to rely on static variables. This situation is different, because if a recycle clears out invoiceCategories, leaving it null, the next query will simply re-populate it.
The Main method uses IoC to instantiate InvoiceRepository and performs a query for the InvoiceCategory collection.
See Also: Section 1.2, "Removing Explicit Dependencies"
A class has heavy instantiation requirements and you can save on resource usage by delaying the instantiation to only when necessary.
Here’s the data we’ll work with:
public class InvoiceCategory
{
    public int ID { get; set; }
    public string Name { get; set; }
}
This is the repository interface:
public interface IInvoiceRepository
{
    void AddInvoiceCategory(string category);
}
This is the repository that we delay instantiation of:
public class InvoiceRepository : IInvoiceRepository
{
    public InvoiceRepository()
    {
        Console.WriteLine("InvoiceRepository Instantiated.");
    }

    public void AddInvoiceCategory(string category)
    {
        Console.WriteLine($"for category: {category}");
    }
}
This program shows a few ways to perform lazy initialization of the repository:
class Program
{
    public static ServiceProvider Container;

    readonly Lazy<InvoiceRepository> InvoiceRep =
        new Lazy<InvoiceRepository>();

    readonly Lazy<IInvoiceRepository> InvoiceRepFactory =
        new Lazy<IInvoiceRepository>(CreateInvoiceRepositoryInstance);

    readonly Lazy<IInvoiceRepository> InvoiceRepIoC =
        new Lazy<IInvoiceRepository>(CreateInvoiceRepositoryFromIoC);

    static IInvoiceRepository CreateInvoiceRepositoryInstance()
    {
        return new InvoiceRepository();
    }

    static IInvoiceRepository CreateInvoiceRepositoryFromIoC()
    {
        return Program.Container.GetRequiredService<IInvoiceRepository>();
    }

    static void Main()
    {
        Container =
            new ServiceCollection()
                .AddTransient<IInvoiceRepository, InvoiceRepository>()
                .BuildServiceProvider();

        new Program().Run();
    }

    void Run()
    {
        IInvoiceRepository viaLazyDefault = InvoiceRep.Value;
        viaLazyDefault.AddInvoiceCategory("Via Lazy Default ");

        IInvoiceRepository viaLazyFactory = InvoiceRepFactory.Value;
        viaLazyFactory.AddInvoiceCategory("Via Lazy Factory ");

        IInvoiceRepository viaLazyIoC = InvoiceRepIoC.Value;
        viaLazyIoC.AddInvoiceCategory("Via Lazy IoC ");
    }
}
Sometimes you have objects with heavy startup overhead. They might need some initial calculation or have to wait on data that takes a while because of network latency or dependencies on poorly performing external systems. This can have serious negative consequences, especially on application startup. Imagine an app that loses potential customers during a trial because it starts too slowly, or enterprise users whose work is impacted by wait times. Although you may or may not be able to fix the root cause of the performance bottleneck, another option is to delay instantiation of that object until you need it. What if you really don't need that object immediately and can show a start screen right away?
The solution demonstrates how to use Lazy<T> to delay object instantiation. The object in question is the InvoiceRepository, and we're assuming it has a problem in its constructor logic that causes a delay in instantiation.
Program has three Lazy<T> fields, showing three different ways to instantiate. The first field, InvoiceRep, instantiates a Lazy<InvoiceRepository> with no parameters. This assumes that InvoiceRepository has a default (parameterless) constructor, which Lazy<T> calls to create a new instance on first access.
The InvoiceRepFactory field references the CreateInvoiceRepositoryInstance method. The first time code accesses this field's Value, Lazy<T> calls CreateInvoiceRepositoryInstance to construct the object. Since it's a method, you have a lot of flexibility in building the object.
In addition to the other two options, the InvoiceRepIoC field shows how you can use lazy instantiation with IoC. Notice that the Main method builds an IoC container, as described in Section 1.2. The CreateInvoiceRepositoryFromIoC method uses that IoC container to request an instance of InvoiceRepository.
Finally, the Run method shows how to access the fields through the Lazy<T>.Value property.
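It's also worth noting that Lazy<T> is thread-safe by default (LazyThreadSafetyMode.ExecutionAndPublication), so even if several threads read Value simultaneously, the factory runs only once. A standalone sketch demonstrating that (the counter and string value are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LazyDemo
{
    public static int FactoryRuns;

    public static readonly Lazy<string> Expensive = new Lazy<string>(
        () =>
        {
            // Count how many times the expensive factory actually runs.
            Interlocked.Increment(ref FactoryRuns);
            return "built";
        },
        LazyThreadSafetyMode.ExecutionAndPublication); // the default mode

    static void Main()
    {
        // Many threads race to read Value; the factory still runs once.
        Parallel.For(0, 10, _ => { var value = Expensive.Value; });
        Console.WriteLine(FactoryRuns); // 1
    }
}
```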
See Also: Section 1.2, "Removing Explicit Dependencies"
The application needs to extract data from a custom external format, and string-type operations lead to complex and less efficient code.
Here are the data types we'll be working with:
class InvoiceItem
{
    public decimal Cost { get; set; }
    public string Description { get; set; }
}

class Invoice
{
    public string Customer { get; set; }
    public DateTime Created { get; set; }
    public List<InvoiceItem> Items { get; set; }
    public decimal Total { get; set; }
}
This method returns the raw string data that we want to extract and convert to invoices:
static string GetInvoiceTransferFile()
{
    return
        "Creator 1::8/05/20::Item 1\t35.05\tItem 2\t25.18\tItem 3\t13.13::Customer 1::[NOTE] 1\n" +
        "Creator 2::8/10/20::Item 1\t45.05::Customer 2::[NOTE] 2\n" +
        "Creator 1::8/15/20::Item 1\t55.05\tItem 2\t65.18::Customer 3::[NOTE] 3\n";
}
These are utility methods for building and saving invoices:
static Invoice GetInvoice(
    string matchCustomer, string matchCreated, string matchItems)
{
    List<InvoiceItem> lineItems = GetLineItems(matchItems);

    DateTime.TryParse(matchCreated, out DateTime created);

    var invoice = new Invoice
    {
        Customer = matchCustomer,
        Created = created,
        Items = lineItems
    };

    return invoice;
}
static List<InvoiceItem> GetLineItems(string matchItems)
{
    var lineItems = new List<InvoiceItem>();

    string[] itemStrings = matchItems.Split('\t');

    for (int i = 0; i < itemStrings.Length; i += 2)
    {
        decimal.TryParse(itemStrings[i + 1], out decimal cost);

        lineItems.Add(new InvoiceItem
        {
            Description = itemStrings[i],
            Cost = cost
        });
    }

    return lineItems;
}
static void SaveInvoices(List<Invoice> invoices)
{
    Console.WriteLine($"{invoices.Count} invoices saved.");
}
This method uses regular expressions to extract values from raw string data:
static List<Invoice> ParseInvoices(string invoiceFile)
{
    var invoices = new List<Invoice>();

    Regex invoiceRegEx = new Regex(
        @"^.+?::(?<created>.+?)::(?<items>.+?)::(?<customer>.+?)::.+");

    foreach (var invoiceString in invoiceFile.Split('\n'))
    {
        Match match = invoiceRegEx.Match(invoiceString);

        if (match.Success)
        {
            string matchCustomer = match.Groups["customer"].Value;
            string matchCreated = match.Groups["created"].Value;
            string matchItems = match.Groups["items"].Value;

            Invoice invoice = GetInvoice(matchCustomer, matchCreated, matchItems);

            invoices.Add(invoice);
        }
    }

    return invoices;
}
The Main method runs the demo:
static void Main(string[] args)
{
    string invoiceFile = GetInvoiceTransferFile();

    List<Invoice> invoices = ParseInvoices(invoiceFile);

    SaveInvoices(invoices);
}
Sometimes, we’ll encounter textual data that doesn’t fit standard data formats. It might come from existing document files, log files, or external and legacy systems. Often, we need to ingest that data and process it for storage in a DB. This section explains how to do that with regular expressions.
The solution shows that the data format we want to generate is an Invoice with a collection of InvoiceItem. The GetInvoiceTransferFile method shows the format of the raw data. The demo suggests that the data might come from a legacy system that already produces that format, where it's easier to write C# code to ingest it than to add code to that system for a better-supported format. The specific data we're interested in extracting are the created date, invoice items, and customer name. Notice that newlines (\n) separate records, double colons (::) separate invoice fields, and tabs (\t) separate invoice item fields.
The GetInvoice and GetLineItems methods construct the objects from extracted data and serve to separate object construction from the regular expression extraction logic.
The ParseInvoices method uses regular expressions to extract values from the input string. The Regex constructor parameter contains the regular expression used to extract values. While a full discussion of regular expressions is out of scope, here's what this pattern does:
^ anchors the match at the beginning of the string.
.+?:: lazily matches all characters up to the next invoice field separator (::). Because it isn't captured in a group, the matched content is discarded.
(?<created>.+?)::, (?<items>.+?)::, and (?<customer>.+?):: are similar to .+?::, but go a step further by extracting the matched values into groups with the given names. For example, (?<created>.+?):: extracts its matched data into a group named "created".
.+ matches all remaining characters.
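To see named groups working in isolation, here's a minimal standalone example matching one record in the section's format (the record contents are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

class NamedGroupDemo
{
    static void Main()
    {
        var invoiceRegEx = new Regex(
            @"^.+?::(?<created>.+?)::(?<items>.+?)::(?<customer>.+?)::.+");

        string record = "Creator 2::8/10/20::Item 1\t45.05::Customer 2::[NOTE] 2";

        Match match = invoiceRegEx.Match(record);
        if (match.Success)
        {
            // Each named group holds the text its (?<name>...) pattern captured.
            Console.WriteLine(match.Groups["created"].Value);  // 8/10/20
            Console.WriteLine(match.Groups["customer"].Value); // Customer 2
        }
    }
}
```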
The foreach loop relies on the newline (\n) separator in the string to work with each invoice record. The Match method executes the regular expression, extracting values. If the match is successful, the code reads the values from the named groups, calls GetInvoice, and adds the new invoice to the invoices collection.
You might have noticed that we're using GetLineItems to extract data from the matchItems parameter, which comes from the regular expression's items group. We could have used a more sophisticated regular expression to take care of that too. However, the contrast is intentional: it demonstrates where regular expression processing is the more elegant solution in this situation.
As an enhancement, you might log any situations where match.Success is false, if you're concerned about losing data and/or want to know whether there's a bug in the regular expression or the original data formatting.
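Here's a sketch of that enhancement, using Console.Error as a stand-in for whatever logging framework the app actually uses (an assumption on my part — the records and messages are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class MatchLoggingDemo
{
    static void Main()
    {
        var invoiceRegEx = new Regex(
            @"^.+?::(?<created>.+?)::(?<items>.+?)::(?<customer>.+?)::.+");

        var records = new List<string>
        {
            "Creator 1::8/05/20::Item 1\t35.05::Customer 1::[NOTE] 1",
            "garbled record with no separators"
        };

        foreach (string record in records)
        {
            Match match = invoiceRegEx.Match(record);
            if (match.Success)
                Console.WriteLine($"Parsed customer: {match.Groups["customer"].Value}");
            else
                // Log enough context to diagnose a bad regex or bad data
                // instead of silently dropping the record.
                Console.Error.WriteLine($"No match for record: '{record}'");
        }
    }
}
```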
Finally, ParseInvoices returns the new invoices to the calling code, Main, so it can save them.