Using ML.NET with .NET Core and Forecasting

Now that we have completed our deep dive into the various groups of algorithms ML.NET offers, we will begin to explore integrating ML.NET into a production application over the next few chapters. In this chapter, we will deep dive into a .NET Core console application building on the structure defined in previous chapters with a focus on hardening and error handling. The application we will be building uses forecasting to predict stock prices based on a series of trends. By the end of this chapter, you should have a firm grasp of designing and coding a production-grade .NET Core application with ML.NET.

In this chapter, we will cover the following topics:

  • Breaking down the .NET Core application architecture
  • Creating the forecasting application
  • Exploring additional production application enhancements

Breaking down the .NET Core application architecture

As mentioned in Chapter 1, Getting Started with Machine Learning and ML.NET, .NET Core 3.x is the preferred platform for using ML.NET due to the optimizations done in the 3.0 release. In addition, .NET Core provides a singular coding framework to target Linux, macOS, and Windows, as noted in the following diagram:

.NET Core architecture

From its inception in 2016, the underlying goals of .NET Core have been to provide rapid updates and feature parity with the (previously Windows-only) Microsoft .NET Framework. Over time and successive versions, the gap has narrowed as the missing APIs were added through additional NuGet packages. One such example is Microsoft.Windows.Compatibility, which provides 20,000 APIs not found in the Core framework, including registry access, drawing, and Windows Permission Model access. This approach keeps the framework light and cross-platform, but it does introduce some design patterns to help you develop your platform-specific applications.

Take, for instance, a Windows Desktop application that uses ML.NET to provide an Intrusion Detection System (IDS). A simple approach would be to write all of the code in a .NET Core Windows Presentation Foundation (WPF) application. However, this would tie you to Windows only, barring major refactoring. A better approach would be to create a .NET Core class library that contains all platform-agnostic code, and then create abstract classes or interfaces whose platform-specific implementations live inside each platform application.
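As a minimal sketch of that pattern (the interface and class names below are invented for illustration, not taken from the chapter's code), the shared library defines the contract and the platform project implements it:

```csharp
namespace IdsCore.Abstractions
{
    // Platform-agnostic contract defined in the shared .NET Core class library
    public interface INotificationService
    {
        void NotifyIntrusion(string details);
    }
}

namespace IdsWindows
{
    using IdsCore.Abstractions;

    // Windows-specific implementation living in the WPF application project
    public class WindowsNotificationService : INotificationService
    {
        public void NotifyIntrusion(string details)
        {
            // Call Windows-only APIs (toast notifications, Event Log, and so on) here
        }
    }
}
```

The shared library only ever sees INotificationService, so a macOS or Linux front end can supply its own implementation without touching the ML.NET code.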

.NET Core targets

As mentioned previously, .NET Core offers a single framework to target Windows, macOS, and Linux. However, this doesn't just apply to console applications as we have used throughout this book. Recent work in .NET Core 3 has provided the ability to port existing .NET Framework WPF and Windows Forms applications to .NET Core 3, thereby enabling applications that rely on potentially years-old frameworks to use the latest .NET Core advancements. In addition, web applications that previously used ASP.NET can be migrated over to ASP.NET Core (ASP.NET WebForms does not currently have a migration path).

Another benefit of .NET Core targeting is the ability to compile with the --self-contained flag. This flag compiles your application or library and then bundles all necessary .NET Core framework files. This allows you to deploy your application without a .NET prerequisite during install. This does make your overall build output larger, but in a customer scenario, a ~100MB increase far outweighs the deployment hurdles of prerequisites.

.NET Core future

You might wonder what the future of .NET Framework, Mono, and .NET Core is. Fortunately, Microsoft, at the time of this writing, has confirmed that all existing frameworks will be migrated into a singular framework simply called .NET 5. Previously, when making a decision on which framework to use, certain trade-offs were guaranteed. Hence, taking the benefits of each framework and unifying them for the first time will eliminate these trade-offs entirely. Take, for instance, Mono's Ahead-Of-Time (AOT) compilation or Xamarin's cross-platform UI support, which can be utilized inside an existing .NET Core 3.x application based on the information released.

A preview of .NET 5 is expected in the first half of 2020, with a production release in November 2020.

Creating the stock price estimator application

As mentioned earlier, the application we will be creating is a stock price estimator. Given a set of stock prices across days, weeks, or years, the forecasting algorithm will internally identify trending patterns. Unlike previous chapters, the application will be architected to be plugged into a production pipeline.

As with previous chapters, the completed project code, sample dataset, and project files can be downloaded from: https://github.com/PacktPublishing/Hands-On-Machine-Learning-With-ML.NET/tree/master/chapter08.

Exploring the project architecture

Building upon the project architecture and code we created in previous chapters, the architecture we will be exploring in this chapter further enhances the architecture to be more structured and thereby more usable for an end user.

Like in some of the previous chapters, an additional NuGet package—Microsoft.ML.TimeSeries—is required to utilize the forecasting functionality in ML.NET. Version 1.3.1 is used in both the included example on GitHub and throughout this chapter's deep dive.

In the following screenshot, you will find the Visual Studio Solution Explorer view of the project. There are several new additions to the solution to facilitate the production use case we are targeting. We will review each of the new files shown in the solution screenshot in detail later in this chapter:

The sampledata.csv file contains 24 rows of stock prices. Feel free to adjust the data to fit your own observations or to adjust the trained model. Here is a snippet of the data:

33
34
301
33
44
299
40
50
400
60
76
500

Each of these rows contains the stock price value we will populate into a StockPrices class object that we will review later on in this chapter.

In addition to this, we added the testdata.csv file that contains additional data points to test the newly trained model against and evaluate it. Here is a snippet of the data inside of testdata.csv:

10
25
444
9
11
333
4
3
500

Diving into the code

For this application, as noted in the previous section, we are building on top of the work completed in previous chapters. However, for this chapter, we will be changing every file to support production use cases. For each file changed from previous chapters, we will review the changes made and the reasoning behind these changes.

Classes and enumerations that were changed or added are as follows:

  • ProgramActions
  • CommandLineParser
  • BaseML
  • StockPrediction
  • StockPrices
  • Predictor
  • Trainer
  • ProgramArguments
  • Program

The ProgramActions enumeration

The following ProgramActions enumeration has been added to the solution to facilitate the use of a strongly typed and structured path for handling various actions the program performs:

namespace chapter08.Enums
{
    public enum ProgramActions
    {
        TRAINING,
        PREDICT
    }
}

In the case of this application, we only have two actions—Training and Predicting. However, as shown in previous chapters, you might also have a feature extraction step or maybe provide an evaluation step. This design pattern allows flexibility while also removing the magic strings problem mentioned at the beginning of this chapter.

The CommandLineParser class

The CommandLineParser class provides a program-agnostic parser for handling command-line arguments. In previous chapters, we manually parsed argument indexes and mapped those values to arguments. This approach, in contrast, creates a flexible, easy-to-maintain, structured response object that maps arguments directly to properties. Let's now dive into the class:

  1. First, we define the function prototype:
public static T ParseArguments<T>(string[] args) 

The use of generics (that is, T) creates a flexible approach to making this method unconstrained to just this application.

  2. Next, we test for null arguments:
if (args == null)
{
    throw new ArgumentNullException(nameof(args));
}
  3. Then, we test for empty arguments and let the user know default values are going to be used instead of failing, as in previous chapters:
if (args.Length == 0)
{
    Console.WriteLine("No arguments passed in - using defaults");

    return Activator.CreateInstance<T>();
}
  4. After the null and empty checks are performed, we verify that the argument count is a multiple of two, since all arguments come in key/value pairs:
if (args.Length % 2 != 0)
{
    throw new ArgumentException($"Arguments must be in pairs, there were {args.Length} passed in");
}
  5. Continuing, we then create an object of the T type using the Activator.CreateInstance method:
var argumentObject = Activator.CreateInstance<T>();

Ensure that the T type has a parameterless constructor, as Activator.CreateInstance<T> will throw an exception otherwise. If your type only has constructors that take parameters, use the non-generic overload of Activator.CreateInstance and pass in the required parameters.
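To illustrate the two creation paths (the Settings class below is invented purely for this example):

```csharp
// Hypothetical type with no parameterless constructor
public class Settings
{
    public Settings(string environment)
    {
        Environment = environment;
    }

    public string Environment { get; }
}

// Parameterless path, as used inside ParseArguments:
var arguments = Activator.CreateInstance<ProgramArguments>();

// Non-generic overload for types such as Settings that require
// constructor parameters:
var settings = (Settings)Activator.CreateInstance(typeof(Settings), "production");
```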

  6. Next, we utilize reflection to grab all of the properties of the T type:
var properties = argumentObject.GetType().GetProperties();
  7. Now that we have both the generic object created and the properties of that object, we then loop through each of the argument key/value pairs and set the property in the object:
for (var x = 0; x < args.Length; x += 2)
{
    var property = properties.FirstOrDefault(
        a => a.Name.Equals(args[x], StringComparison.CurrentCultureIgnoreCase));

    if (property == null)
    {
        Console.WriteLine($"{args[x]} is an invalid argument");

        continue;
    }

    if (property.PropertyType.IsEnum)
    {
        property.SetValue(argumentObject,
            Enum.Parse(property.PropertyType, args[x + 1], true));
    }
    else
    {
        property.SetValue(argumentObject, args[x + 1]);
    }
}

Note the special case for the IsEnum check to handle our previously covered ProgramActions enumeration. Since a string value cannot be automatically converted to an enumeration, we handle the string-to-enumeration conversion explicitly with the Enum.Parse method. As written, the enumeration handler remains generic for any additional enumeration properties you add to the T type.
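To make the mapping concrete, here is an illustrative call (the argument values are made up for this example) showing how key/value pairs land on the ProgramArguments properties we define later in this chapter:

```csharp
// Hypothetical invocation of the parser
var args = new[] { "action", "predict", "modelfilename", "other.mdl" };

var arguments = CommandLineParser.ParseArguments<ProgramArguments>(args);

// "action" matches the Action property (case-insensitively) and its value is
// parsed via Enum.Parse into ProgramActions.PREDICT; "modelfilename" maps
// straight onto the ModelFileName string property.
```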

The BaseML class

The BaseML class for this application has been streamlined to simply instantiate the MLContext object:

using Microsoft.ML;

namespace chapter08.ML.Base
{
    public class BaseML
    {
        protected readonly MLContext MlContext;

        protected BaseML()
        {
            MlContext = new MLContext(2020);
        }
    }
}

The StockPrediction class

The StockPrediction class is the container for our prediction values, as defined here:

namespace chapter08.ML.Objects
{
    public class StockPrediction
    {
        public float[] StockForecast { get; set; }

        public float[] LowerBound { get; set; }

        public float[] UpperBound { get; set; }
    }
}

The StockForecast property will hold our predicted stock values based on the model training and the value submitted to the prediction engine. The LowerBound and UpperBound values hold the lowest and highest estimated values, respectively.
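As a brief sketch (not part of the chapter's code), the bounds could be surfaced alongside each forecast to express the confidence interval; prediction here is assumed to be a StockPrediction returned by the time series engine:

```csharp
// prediction is assumed to be a StockPrediction returned by the
// time series engine's Predict call
for (var i = 0; i < prediction.StockForecast.Length; i++)
{
    Console.WriteLine(
        $"Step {i + 1}: ${prediction.StockForecast[i]:F2} " +
        $"(interval: ${prediction.LowerBound[i]:F2} to ${prediction.UpperBound[i]:F2})");
}
```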

The StockPrices class

The StockPrices class contains our single floating-point value holding the stock price. To keep the code cleaner when populating the values, a constructor accepting the stock price value has been added:

using Microsoft.ML.Data;

namespace chapter08.ML.Objects
{
    public class StockPrices
    {
        [LoadColumn(0)]
        public float StockPrice;

        public StockPrices(float stockPrice)
        {
            StockPrice = stockPrice;
        }
    }
}

The Predictor class

The Predictor class, in comparison to previous chapters, has been streamlined and adapted to support forecasting:

  1. First, we adjust the Predict method to accept the newly defined ProgramArguments class object:
public void Predict(ProgramArguments arguments)   
  2. Next, we update the model File.Exists check to utilize the arguments object:
if (!File.Exists(arguments.ModelFileName))
{
    Console.WriteLine($"Failed to find model at {arguments.ModelFileName}");

    return;
}
  3. Similarly, we also update the prediction filename reference to utilize the arguments object:
if (!File.Exists(arguments.PredictionFileName))
{
    Console.WriteLine($"Failed to find input data at {arguments.PredictionFileName}");

    return;
}
  4. Next, we also modify the model open call to utilize the arguments object:
using (var stream = new FileStream(Path.Combine(AppContext.BaseDirectory, arguments.ModelFileName), FileMode.Open, FileAccess.Read, FileShare.Read))
{
    mlModel = MlContext.Model.Load(stream, out _);
}
  5. We then create the Time Series Engine object with our StockPrices and StockPrediction types:
var predictionEngine = mlModel.CreateTimeSeriesEngine<StockPrices, StockPrediction>(MlContext);
  6. Next, we read the stock price prediction file into a string array:
var stockPrices = File.ReadAllLines(arguments.PredictionFileName);
  7. Lastly, we iterate through each input, call the prediction engine, and display the estimated values:
foreach (var stockPrice in stockPrices)
{
    var prediction = predictionEngine.Predict(new StockPrices(Convert.ToSingle(stockPrice)));

    Console.WriteLine($"Given a stock price of ${stockPrice}, the next 5 values are predicted to be: " +
                      $"{string.Join(", ", prediction.StockForecast.Select(a => $"${Math.Round(a)}"))}");
}

The Trainer class

The Trainer class, akin to the Predictor class, received both streamlining and changes to account for the ML.NET forecasting algorithm:

  1. First, update the function prototype to take the ProgramArguments object:
public void Train(ProgramArguments arguments)     
  2. Next, we update the training file check to utilize the argument object:
if (!File.Exists(arguments.TrainingFileName))
{
    Console.WriteLine($"Failed to find training data file ({arguments.TrainingFileName})");

    return;
}
  3. Similarly, we then update the testing file check to utilize the argument object:
if (!File.Exists(arguments.TestingFileName))
{
    Console.WriteLine($"Failed to find test data file ({arguments.TestingFileName})");

    return;
}
  4. Next, we load the StockPrices values from the training file:
var dataView = MlContext.Data.LoadFromTextFile<StockPrices>(arguments.TrainingFileName);
  5. We then create the Forecasting object and utilize the nameof C# feature to avoid magic string references:
var model = MlContext.Forecasting.ForecastBySsa(
    outputColumnName: nameof(StockPrediction.StockForecast),
    inputColumnName: nameof(StockPrices.StockPrice),
    windowSize: 7,
    seriesLength: 30,
    trainSize: 24,
    horizon: 5,
    confidenceLevel: 0.95f,
    confidenceLowerBoundColumn: nameof(StockPrediction.LowerBound),
    confidenceUpperBoundColumn: nameof(StockPrediction.UpperBound));

The input and output column name references are as we have seen in previous chapters. The windowSize parameter is the size of the window the Singular Spectrum Analysis (SSA) algorithm uses when decomposing the time series; for this application, we are using 7 to capture a weekly pattern. The seriesLength parameter indicates the number of data points kept in the buffer for modeling, while trainSize indicates how many data points from the beginning of the series are used for training (matching our 24 rows). The horizon parameter indicates how many predicted values should be calculated when the model is run; in our case, we are asking for 5 predicted values. Lastly, the confidenceLevel of 0.95f requests a 95% confidence interval, whose bounds are written to the LowerBound and UpperBound columns.

  6. Lastly, we fit the model with the training data, call the CreateTimeSeriesEngine method, and write the model to disk:
var transformer = model.Fit(dataView);

var forecastEngine = transformer.CreateTimeSeriesEngine<StockPrices, StockPrediction>(MlContext);

forecastEngine.CheckPoint(MlContext, arguments.ModelFileName);

Console.WriteLine($"Wrote model to {arguments.ModelFileName}");

The ProgramArguments class

This new class, as referred to earlier in this section, provides the one-to-one mapping of arguments to properties used throughout the application:

  1. First, we define the properties that map directly to the command-line arguments:
public ProgramActions Action { get; set; }

public string TrainingFileName { get; set; }

public string TestingFileName { get; set; }

public string PredictionFileName { get; set; }

public string ModelFileName { get; set; }
  2. Lastly, we populate default values for the properties:
public ProgramArguments()
{
    ModelFileName = "chapter8.mdl";

    PredictionFileName = @"..\..\..\Data\predict.csv";

    TrainingFileName = @"..\..\..\Data\sampledata.csv";

    TestingFileName = @"..\..\..\Data\testdata.csv";
}

Unlike in previous chapters, where a property left unset would cause the program to fail, defaults are now provided for every property. Failing fast is fine for the developer experience; however, in the real world, end users will more than likely attempt to just run the application without any parameters.

The Program class

Inside the Program class, the code has been simplified to utilize the new CommandLineParser class discussed earlier in this chapter. With the use of the CommandLineParser class, all of the actions have been switched to utilize strongly-typed enumerations:

  1. First, while relatively simplistic, clearing the screen of any previous run data is an improved UX:
Console.Clear();
  2. We then use our new CommandLineParser class and associated ParseArguments method to create a strongly typed argument object:
var arguments = CommandLineParser.ParseArguments<ProgramArguments>(args);
  3. We can then use a simplified and strongly typed switch case to handle our two actions:
switch (arguments.Action)
{
    case ProgramActions.PREDICT:
        new Predictor().Predict(arguments);
        break;
    case ProgramActions.TRAINING:
        new Trainer().Train(arguments);
        break;
    default:
        Console.WriteLine($"Unhandled action {arguments.Action}");
        break;
}

Running the application

To run the application, the process is nearly identical to the sample application in Chapter 3, Regression Model, with the addition of passing in the test dataset when training:

  1. To train the model, we run the application without any arguments:
PS chapter08\bin\Debug\netcoreapp3.0> .\chapter08.exe
No arguments passed in - using defaults
Wrote model to chapter8.mdl
  2. To make predictions based on the included prediction data, we run the application with the action predict arguments:
PS chapter08\bin\Debug\netcoreapp3.0> .\chapter08.exe action predict
Given a stock price of $101, the next 5 values are predicted to be: $128, $925, $140, $145, $1057
Given a stock price of $102, the next 5 values are predicted to be: $924, $138, $136, $1057, $158
Given a stock price of $300, the next 5 values are predicted to be: $136, $134, $852, $156, $150
Given a stock price of $40, the next 5 values are predicted to be: $133, $795, $122, $149, $864
Given a stock price of $30, the next 5 values are predicted to be: $767, $111, $114, $837, $122
Given a stock price of $400, the next 5 values are predicted to be: $105, $102, $676, $116, $108
Given a stock price of $55, the next 5 values are predicted to be: $97, $594, $91, $103, $645
Given a stock price of $69, the next 5 values are predicted to be: $557, $81, $87, $605, $90
Given a stock price of $430, the next 5 values are predicted to be: $76, $78, $515, $84, $85

Feel free to modify the values and see how the prediction changes based on the dataset that the model was trained on. A few areas of experimentation from this point might be to do the following:

  • Tweak the hyperparameters reviewed in the Trainer class, such as the windowSize, seriesLength, or horizon properties, to see how accuracy is affected.
  • Add significantly more data points—this may utilize a data feed of your favorite stock you watch.

Exploring additional production application enhancements

Now that we have completed our deep dive, there are a couple of additional elements to possibly further enhance the application. A few ideas are discussed here.

Logging

Logging utilizing NLog (https://nlog-project.org/) or a similar open source project is highly recommended as your application complexity increases. This will allow you to log to a file, console, or third-party logging solution such as Loggly at varying levels. For instance, if you deploy this application to a customer, breaking down the error level to at least Debug, Warning, and Error will be helpful when debugging issues remotely.
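As a sketch of the kind of setup described above (assuming the NLog NuGet package has been added; the target names and file path are chosen purely for illustration), logging can be configured programmatically:

```csharp
using NLog;
using NLog.Config;
using NLog.Targets;

public static class LoggingSetup
{
    public static Logger Configure()
    {
        var config = new LoggingConfiguration();

        // Log everything from Debug upward to a file for remote debugging
        var fileTarget = new FileTarget("file") { FileName = "chapter08.log" };
        config.AddRule(LogLevel.Debug, LogLevel.Fatal, fileTarget);

        // Mirror warnings and errors to the console for interactive runs
        var consoleTarget = new ConsoleTarget("console");
        config.AddRule(LogLevel.Warn, LogLevel.Fatal, consoleTarget);

        LogManager.Configuration = config;

        return LogManager.GetCurrentClassLogger();
    }
}
```

A call such as logger.Warn("Model file missing, retraining") would then reach both targets, while logger.Debug(...) output lands only in the file.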

Utilizing Reflection further

As noted earlier in this section, we utilized reflection to parse the command-line arguments, creating flexibility and adaptability. You could take this a step further and replace the switch statement in the Program class with an entirely reflection-based approach: every action defined in the application could inherit from an abstract BaseAction class, and at runtime, based on the argument, the appropriate class would be instantiated and called. For every new action, simply adding a new entry to the ProgramActions enumeration and then defining a class tied to that enumeration would be all that is required.
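One possible shape for that dispatch is sketched below; every name here (BaseAction, ActionDispatcher, and so on) is hypothetical rather than taken from the chapter's code:

```csharp
using System;
using System.Linq;

public abstract class BaseAction
{
    public abstract ProgramActions Action { get; }

    public abstract void Execute(ProgramArguments arguments);
}

public static class ActionDispatcher
{
    public static void Dispatch(ProgramArguments arguments)
    {
        // Discover every concrete BaseAction in the assembly and index it
        // by its ProgramActions value
        var actions = typeof(BaseAction).Assembly.GetTypes()
            .Where(t => t.IsSubclassOf(typeof(BaseAction)) && !t.IsAbstract)
            .Select(t => (BaseAction)Activator.CreateInstance(t))
            .ToDictionary(a => a.Action);

        // Replaces the switch statement in the Program class entirely
        if (actions.TryGetValue(arguments.Action, out var action))
        {
            action.Execute(arguments);
        }
        else
        {
            Console.WriteLine($"Unhandled action {arguments.Action}");
        }
    }
}
```

With this in place, adding a new action no longer touches the dispatch code at all.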

Utilizing a database

In a real-world scenario, the data provided to run predictions will more than likely come from a database. This database, whether it is a Postgres, SQL Server, or SQLite database (to name a few), can be accessed with Microsoft's Entity Framework Core or with ML.NET's built-in database loader method, CreateDatabaseLoader. This loader is akin to how we have loaded data from enumerables or text files, with the extra step of supplying a SQL query.
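A hedged sketch of the database loader follows; it assumes a newer ML.NET release than the 1.3.1 version pinned earlier in this chapter (the loader shipped after that release) plus the System.Data.SqlClient package, and the connection string, table, and column names are placeholders for your own database:

```csharp
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(2020);

// Build a loader whose schema comes from the StockPrices class
var loader = mlContext.Data.CreateDatabaseLoader<StockPrices>();

// The query aliases the price column so it maps onto StockPrices.StockPrice
var databaseSource = new DatabaseSource(
    SqlClientFactory.Instance,
    "Server=localhost;Database=Stocks;Trusted_Connection=True;",
    "SELECT CAST(Price AS REAL) AS StockPrice FROM StockHistory ORDER BY QuoteDate");

IDataView dataView = loader.Load(databaseSource);
```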

In a production scenario, given Entity Framework Core's performance and its ability to use LINQ instead of plaintext queries, compared to ML.NET's implementation (at the time of this writing), I would recommend Entity Framework Core if database sources are utilized.

Summary

Throughout this chapter, we have deep-dived into what goes into a production-ready .NET Core application architecture using the work performed in previous chapters as a foundation. We also created a brand new stock price estimator using the forecasting algorithm in ML.NET. Lastly, we discussed some ways to further enhance a .NET Core application (and production applications in general).

In the next chapter, we will deep dive into creating a production-file-classification web application using ML.NET's binary classification and ASP.NET Core's framework.
