Using ML.NET with ASP.NET Core

Now that we have an idea of how to create a production-grade .NET Core console application, in this chapter, we will deep dive into creating a fully functional ASP.NET Core Blazor web application. This application will utilize an ML.NET binary classification model to classify Windows executables (Portable Executable (PE) files) as either clean or malicious. Furthermore, we will explore breaking our application code into a component-based architecture, using a .NET Core library shared between our web application and the console application that will train our model. By the end of the chapter, you should have a firm grasp of designing and coding a production-grade ASP.NET Core Blazor web application with ML.NET.

In this chapter, we will cover the following topics:

  • Breaking down ASP.NET Core 
  • Creating the file classification web application
  • Exploring additional production-application enhancements

Breaking down ASP.NET Core

Building on the same .NET Core technology discussed in Chapter 8, Using ML.NET with .NET Core and Forecasting, ASP.NET Core adds a powerful web framework. This framework includes the Razor rendering engine, in addition to supporting scalable representational state transfer (REST) services. The example in this chapter will use this technology to create our file classification frontend. In the next two sections, we will dive into the ASP.NET Core architecture and discuss Blazor, the new web framework from Microsoft.

Understanding the ASP.NET Core architecture

At a high level, ASP.NET Core builds on top of .NET Core, providing a fully-featured web framework. As with .NET Core, ASP.NET Core runs on Windows, Linux, and macOS, in addition to allowing deployments to x86, x64, and Advanced RISC Machine (ARM) CPU architectures.

A typical ASP.NET Core application includes the following:

  • Models
  • Views
  • Controllers

These components form a common web architecture principle of Model-View-Controller, otherwise known as MVC.

Controllers

Controllers provide the server-side code for handling business logic for both web applications and REST services. A single controller can include both web and REST calls, although I would recommend keeping them separate to ensure your code is organized cleanly.

Models

Models provide the container of data from the Controller to the View, and vice versa. For example, take a listing page pulling data from a database. The controller would return a model populated with that data, and if that same data was then used for filtering, it would also be serialized into JavaScript Object Notation (JSON) and sent back to the Controller.
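
As a minimal sketch of this flow (the ListingModel and ListingController names here are hypothetical, purely for illustration), a controller might populate a model and hand it to its view like this:

using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

// Hypothetical model for the listing page example above
public class ListingModel
{
    public List<string> Items { get; set; }
}

// Hypothetical controller that populates the model from a data source
public class ListingController : Controller
{
    public IActionResult Index()
    {
        var model = new ListingModel
        {
            Items = new List<string> { "First item", "Second item" }
        };

        // The populated model is handed to the Index view for rendering
        return View(model);
    }
}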

Views

Views provide the templates for the frontend view with support for model binding. Model binding allows properties bound to various Document Object Model (DOM) objects—such as textboxes, checkboxes, and dropdowns—to be cleanly mapped to and from your C# classes. This approach of model binding has the added benefit of supporting strongly typed references, which comes in extremely handy when you have a complex View with dozens of properties bound to a Model.
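
As a quick, hypothetical illustration of model binding (reusing the ListingModel sketched earlier), a Razor view might bind an input to a model property like this:

@model ListingModel

<input asp-for="Items[0]" class="form-control" />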

Form handling with model binding provides a similar model to the Model-View-ViewModel (MVVM) approach we are going to dive into in Chapter 10, Using ML.NET with UWP, with a Universal Windows Platform (UWP) application.

If you want to deep dive further into ASP.NET, Channel 9 from Microsoft has a series called ASP.NET Core 101 that covers all of the main aspects of ASP.NET, at https://channel9.msdn.com/Series/ASPNET-Core-101.

Blazor

Building on the ASP.NET Core infrastructure, Blazor focuses on removing one of the biggest hurdles with complex web applications—JavaScript. Blazor allows you to write C# code instead of JavaScript code to handle client-side tasks such as form handling, HTTP calls, and asynchronously loading data. Under the hood, Blazor uses WebAssembly (Wasm), a portable, high-performance binary format supported by all current browsers (Edge, Safari, Chrome, and Firefox).

Similar to other frameworks, Blazor also supports and recommends the use of modular components to promote reuse. These are called Blazor components.
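
As a minimal, hypothetical sketch of a Blazor component (none of these names come from this chapter's project), client-side logic is plain C# inside an @code block:

<button @onclick="IncrementCount">Clicked @_currentCount times</button>

@code {
    private int _currentCount;

    private void IncrementCount() => _currentCount++;
}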

In addition, there are three project types when creating a Blazor application:

  • A client-side only Blazor application, which is ideal for more static pages.
  • A Blazor (ASP.NET Core-hosted) client-side application that is hosted inside ASP.NET Core (this is the project type we are going to review in the next section).
  • A Blazor server-side application that updates the DOM. This is ideal for use with SignalR, Microsoft's real-time web framework supporting chats, real-time tickers, and maps, to name but a few.

If you want to deep dive further into Blazor, Microsoft has written an abundant amount of documentation on Microsoft Developer Network (MSDN) at: https://docs.microsoft.com/en-us/aspnet/core/blazor/?view=aspnetcore-3.1.

Creating the file classification web application

As mentioned earlier, the application we will be creating is a file classification web application. Using the knowledge garnered in the Creating a binary classification application section in Chapter 4, Classification Model, we will be taking it a step further and looking at adding more attributes to a file prior to making a classification. In addition, we will be integrating machine learning with ML.NET into the web application, where an end user can upload files for classification, returning either clean or malicious files, along with a confidence of that prediction.

As with previous chapters, the completed project code, sample dataset, and project files can be downloaded at: https://github.com/PacktPublishing/Hands-On-Machine-Learning-With-ML.NET/tree/master/chapter09.

Exploring the project architecture

Given the previous applications have all been command-line applications, the project architecture for this example is quite different.

As with some of the previous chapters, an additional ML.NET NuGet package—Microsoft.ML.FastTree—is required in order to utilize the FastTree algorithm in ML.NET. Version 1.3.1 is used in both the included example on GitHub and throughout this chapter's deep dive.

In the following screenshot, you will find the Visual Studio Solution Explorer view of the example's solution. Given that this example comprises three separate projects (more akin to a production scenario), the amount of both new and significantly modified files is quite large. We will review each of the new files shown in the following solution screenshot in detail in further sections:

The sampledata.data file contains 14 rows of extracted features from Windows Executables (we will go into these features in more detail in the next section). Feel free to adjust the data to fit your own observations or to adjust the trained model with different sample files. The following snippet is one of the rows found in the sampledata.data file:

18944 0 7 0 0 4 True "!This program cannot be run in DOS mode.Fm;Ld &~_New_ptrt(M4_Alloc_max"uJIif94H3"j?TjV*?invalid argum_~9%sC:Program Files (x86Microsoft Visu Studio20cl4exomory"/Owneby CWGnkno excepti & 0xFF;b?eCErr[E7XE#D%d3kRegO(q/}nKeyExWa!0 S=+,H}VoDebugPE.pdbC,j?_info ByteToWidendled=aekQ3V?$buic_g(@1@A8?5/wQAEAAV0;AH@Z?flush@Co12@XcCd{(kIN<7BED!?rdbufPA[Tght_tDB.0J608(:6<?xml version='1.0' encoding='UTF8' standalone='yes'?><assembly xmlns='urn:schemasmicrosoftcom:asm.v1' manifestVersion='1.0'> <trustInfo xmlns="urn:schemasmicrosoftcom:asm.v3"> <security> <requestedPrivileges> <requestedExecutionLevel level='asInvoker' uiAccess='false' /> </requestedPrivileges> </security> </trustInfo></assembly>KERNEL32.DLLMSVCP140D.dllucrtbased.dllVCRUNTIME140D.dllExitProcessGetProcAddressLoadLibraryAVirtualProtect??1_Lockit@std@@QAE@XZ"

In addition to this, we added the testdata.data file that contains additional data points to test the newly trained model against and evaluate it. Here is a sample row of the data inside of testdata.data:

1670144 1 738 0 0 24 False "!This program cannot be run in DOS mode.WATAUAVAWH A_AA]A\_t$ UWAVHx UATAUAVAWHA_AA]A]UVWATAUAVAWH|$@H!t$0HA_AA]A\_]VWATAVAWHSUVWATAUAVAWH(A_AA]A\_][@USVWATAVAWHA_AA\_[]UVWATAUAVAWHA_AA]A\_]@USVWAVH` UAUAVHWATAUAVAWH A_AA]A\_x ATAVAWHUSVWATAUAVAWHA_AA]A\_[]UVWATAUAVAWHA_AA]A\_]$ UVWATAUAVAWHA_AA]A\_]x UATAUAVAWHA_AA]A]@USVWAVHUVWATAUAVAWHA_AA]A\_]UVWATAUAVAWHA_AA]A\_]@USVWATAVAWHA_AA\_[]t$ UWAVH@USVWAVHUVWAVAWHh VWATAVAWHUVWAVAWHUVWATAUAVAWHpA_AA]A\_]WATAUAVAWH0A_AA]A\_L$ UVWATAUAVAWH@A_AA]A\_]UVWATAUAVAWH`A_AA]A\_]UVWATAUAVAWHpA_AA]A\_]@USVWATAVAWHD$0fD9 tA_AA\_[]"

Due to the size of the example project, we will be diving into the code for each of the different components before running the applications at the end of this section, in the following order:

  • The .NET Core library for common code between the two applications
  • The ASP.NET Blazor web application for running the prediction
  • The .NET Core console application for feature extraction and training

Diving into the library

The classes and enumerations that were changed or added are as follows:

  • FileClassificationResponseItem
  • Converters
  • ExtensionMethods
  • HashingExtension
  • FileData
  • FileDataPrediction
  • FileClassificationFeatureExtractor
  • FileClassificationPredictor
  • FileClassificationTrainer

The Constants and BaseML classes remain unmodified from Chapter 8, Using ML.NET with .NET Core and Forecasting.

Due to the nature of this application and that of production applications, where there are multiple platforms and/or ways to execute shared code, a library is used in this chapter's example application. The benefit of using a library is that all common code can reside in a portable and dependency-free manner. Expanding the functionality in this sample application to include desktop or mobile applications would be a much easier lift than having the code either duplicated or kept in the actual applications.

The FileClassificationResponseItem class

The FileClassificationResponseItem class is the common class that contains the properties used to feed our model, and that is also returned to the end user in the web application.

  1. First, we define the TRUE and FALSE mapping to 1.0f and 0.0f respectively, like this:
private const float TRUE = 1.0f;
private const float FALSE = 0.0f;
  2. Next, we add all of the properties used to feed our model and to display back to the end user in the web application. The FileSize, Is64Bit, NumImports, NumImportFunctions, NumExportFunctions, IsSigned, and Strings properties are used specifically as features in our model. The SHA1Sum, Confidence, IsMalicious, and ErrorMessage properties are used to return our classification back to the end user, as illustrated in the following code block:
public string SHA1Sum { get; set; }

public double Confidence { get; set; }

public bool IsMalicious { get; set; }

public float FileSize { get; set; }

public float Is64Bit { get; set; }

public float NumImports { get; set; }

public float NumImportFunctions { get; set; }

public float NumExportFunctions { get; set; }

public float IsSigned { get; set; }

public string Strings { get; set; }

public string ErrorMessage { get; set; }
  3. Next, we have the constructor method. The constructor, as you can see, has a byte array as a parameter. This was done to facilitate both the training and prediction paths in both of the applications, the idea being that the raw file bytes will come into the constructor from a File.ReadAllBytes call or other mechanisms, to provide flexibility. From there, we use the PeNet NuGet package. This package provides an easy-to-use interface for extracting features from a Windows Executable (also known as a PE file). For the scope of this application, a couple of features were chosen to be extracted and stored into the respective properties, as shown in the following code block:
public FileClassificationResponseItem(byte[] fileBytes)
{
SHA1Sum = fileBytes.ToSHA1();
Confidence = 0.0;
IsMalicious = false;
FileSize = fileBytes.Length;

try
{
var peFile = new PeNet.PeFile(fileBytes);

Is64Bit = peFile.Is64Bit ? TRUE : FALSE;

try
{
NumImports = peFile.ImageImportDescriptors.Length;
}
catch
{
NumImports = 0.0f;
}

NumImportFunctions = peFile.ImportedFunctions.Length;

if (peFile.ExportedFunctions != null)
{
NumExportFunctions = peFile.ExportedFunctions.Length;
}

IsSigned = peFile.IsSigned ? TRUE : FALSE;

Strings = fileBytes.ToStringsExtraction();
}
catch (Exception)
{
ErrorMessage = $"Invalid file ({SHA1Sum}) - only PE files are supported";
}
}

The FileData class

The FileData class, as with previous containers of prediction data, provides our model with the fields necessary to provide a file classification. In addition, we overrode the ToString method to ease the exporting of this data to a comma-separated values (CSV) file during our feature extraction step, as follows:

public class FileData
{
[LoadColumn(0)]
public float FileSize { get; set; }

[LoadColumn(1)]
public float Is64Bit { get; set; }

[LoadColumn(2)]
public float NumberImportFunctions { get; set; }

[LoadColumn(3)]
public float NumberExportFunctions { get; set; }

[LoadColumn(4)]
public float IsSigned { get; set; }

[LoadColumn(5)]
public float NumberImports { get; set; }

[LoadColumn(6)]
public bool Label { get; set; }

[LoadColumn(7)]
public string Strings { get; set; }

public override string ToString() =>
$"{FileSize} {Is64Bit} {NumberImportFunctions} " +
$"{NumberExportFunctions} {IsSigned} {NumberImports} " +
$"{Label} \"{Strings}\"";
}

The FileDataPrediction class

The FileDataPrediction class contains the prediction's classification and probability properties to return to the end user in our web application, as shown in the following code block:

public class FileDataPrediction
{
public bool Label { get; set; }

public bool PredictedLabel { get; set; }

public float Score { get; set; }

public float Probability { get; set; }
}

The Converters class

The Converters class provides an extension method to convert the FileClassificationResponseItem class—reviewed earlier in this section—to the FileData class. By making an extension method, as shown in the following code block, we can quickly and cleanly convert between the application container and our model-only container:

public static class Converters
{
public static FileData ToFileData(
this FileClassificationResponseItem fileClassification)
{
return new FileData
{
Is64Bit = fileClassification.Is64Bit,
IsSigned = fileClassification.IsSigned,
NumberImports = fileClassification.NumImports,
NumberImportFunctions = fileClassification.NumImportFunctions,
NumberExportFunctions = fileClassification.NumExportFunctions,
FileSize = fileClassification.FileSize,
Strings = fileClassification.Strings
};
}
}

The ExtensionMethods class

The ExtensionMethods class, as shown in previous chapters, contains helper extension methods. In this example, we will be adding the ToStringsExtraction extension method. Strings are a highly popular first-pass and easy-to-capture feature when making a classification of a file. Let's dive into the method, as follows:

  1. First, we define two new constants for handling the buffer size and the encoding. Code page 1252 (Windows-1252) is the encoding in which Windows Executables are encoded, as shown in the following code block:
private const int BUFFER_SIZE = 2048;
private const int FILE_ENCODING = 1252;
  2. The next change is the addition of the ToStringsExtraction method itself and defining our regular expression, as follows:
public static string ToStringsExtraction(this byte[] data)
{
var stringRex = new Regex(@"[ -~\t]{8,}",
RegexOptions.Compiled);

This regular expression, which matches runs of eight or more printable ASCII characters, is what we will use to traverse the file's bytes.

  3. Next, we initialize the StringBuilder class and check if the passed-in byte array is null or empty (if it is, we can't process it), like this:
var stringLines = new StringBuilder();

if (data == null || data.Length == 0)
{
return stringLines.ToString();
}
  4. Now that we have confirmed there are bytes in the passed-in array, we only want to take up to 65536 bytes. The reason for this is that if the file is 100 MB, this operation could take significant time to perform. Feel free to adjust this number and see the efficacy results. The code is shown here:
var dataToProcess = data.Length > 65536 ? data.Take(65536).ToArray() : data;
  5. Now that we have the bytes we are going to analyze, we will loop through and extract lines of text found in the bytes, as follows:
using (var ms = new MemoryStream(dataToProcess, false))
{
using (var streamReader = new StreamReader(ms,
Encoding.GetEncoding(FILE_ENCODING),
false, BUFFER_SIZE, false))
{
while (!streamReader.EndOfStream)
{
var line = streamReader.ReadLine();

if (string.IsNullOrEmpty(line))
{
continue;
}

line = line.Replace("^", "").Replace(")",
"").Replace("-", "");

stringLines.Append(string.Join(string.Empty,
stringRex.Matches(line).Where(a =>
!string.IsNullOrEmpty(a.Value) &&
!string.IsNullOrWhiteSpace(a.Value)).ToList()));
}
}
}
  6. Finally, we simply return the lines joined into a single string, like this:
return string.Join(string.Empty, stringLines);
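
As a quick, hypothetical usage of the completed extension method (the file path is illustrative, and the CodePages encoding provider must have been registered first, as we do later in the Startup class):

var extractedStrings = File.ReadAllBytes(@"C:\temp\sample.exe").ToStringsExtraction();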

The HashingExtension class

The new HashingExtension class converts our byte array to a SHA1 string. The reason for not putting this with our other extension methods is to provide a common class to potentially hold SHA256, ssdeep, or other hashes (especially given the recent SHA1 collisions, proving SHA1 to be insecure).

For this method, we're using the built-in .NET Core SHA1 class, and then converting it to a Base64 string with a call to ToBase64String, as follows:

public static class HashingExtension
{
public static string ToSHA1(this byte[] data)
{
var sha1 = System.Security.Cryptography.SHA1.Create();

var hash = sha1.ComputeHash(data);

return Convert.ToBase64String(hash);
}
}
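
A quick, hypothetical usage mirrors the strings extraction shown earlier (the path is illustrative):

var sha1Sum = File.ReadAllBytes(@"C:\temp\sample.exe").ToSHA1();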

The FileClassificationFeatureExtractor class

The FileClassificationFeatureExtractor class contains our Extract and ExtractFolder methods:

  1. First, our ExtractFolder method takes in the folder path and the output file that will contain our feature extraction, as shown in the following code block:
private void ExtractFolder(string folderPath, string outputFile)
{
if (!Directory.Exists(folderPath))
{
Console.WriteLine($"{folderPath} does not exist");

return;
}

var files = Directory.GetFiles(folderPath);

using (var streamWriter =
new StreamWriter(Path.Combine(AppContext.BaseDirectory,
$"../../../../{outputFile}")))
{
foreach (var file in files)
{
var extractedData = new
FileClassificationResponseItem(
File.ReadAllBytes(file)).ToFileData();

extractedData.Label = !file.Contains("clean");

streamWriter.WriteLine(extractedData.ToString());
}
}

Console.WriteLine($"Extracted {files.Length} to {outputFile}");
}
  2. Next, the Extract method calls the folder extraction for both the training and test paths, as follows:
public void Extract(string trainingPath, string testPath)
{
ExtractFolder(trainingPath, Constants.SAMPLE_DATA);
ExtractFolder(testPath, Constants.TEST_DATA);
}

The FileClassificationPredictor class

The FileClassificationPredictor class provides the interface for both our command-line and web applications, using an overloaded Predict method:

  1. The first Predict method is for our command-line application; it simply takes in the filename, loads the file's bytes, and then calls into the overload shown in Step 2, as follows:
public FileClassificationResponseItem Predict(string fileName)
{
var bytes = File.ReadAllBytes(fileName);

return Predict(new FileClassificationResponseItem(bytes));
}
  2. The second implementation is for our web application that takes the FileClassificationResponseItem object, creates our prediction engine, and returns the prediction data, as follows:
public FileClassificationResponseItem Predict(FileClassificationResponseItem file)
{
if (!File.Exists(Common.Constants.MODEL_PATH))
{
file.ErrorMessage = $"Model not found ({Common.Constants.MODEL_PATH}) - please train the model first";

return file;
}

ITransformer mlModel;

using (var stream = new FileStream(Common.Constants.MODEL_PATH,
FileMode.Open, FileAccess.Read, FileShare.Read))
{
mlModel = MlContext.Model.Load(stream, out _);
}

var predictionEngine =
MlContext.Model.CreatePredictionEngine<FileData,
FileDataPrediction>(mlModel);

var prediction = predictionEngine.Predict(file.ToFileData());

file.Confidence = prediction.Probability;
file.IsMalicious = prediction.PredictedLabel;

return file;
}

The FileClassificationTrainer class

The last class added in the library is the FileClassificationTrainer class. This class supports the use of the FastTree ML.NET trainer, as well as utilizing the features we have extracted from the files:

  1. The first change is the use of the FileData class to read the CSV file into the dataView property, as shown in the following code block:
var dataView = MlContext.Data.LoadFromTextFile<FileData>(trainingFileName, hasHeader: false);

  2. Next, we map our FileData features to create our pipeline, as follows:

var dataProcessPipeline = 
MlContext.Transforms.NormalizeMeanVariance(
nameof(FileData.FileSize))
.Append(MlContext.Transforms.NormalizeMeanVariance(
nameof(FileData.Is64Bit)))
.Append(MlContext.Transforms.NormalizeMeanVariance(
nameof(FileData.IsSigned)))
.Append(MlContext.Transforms.NormalizeMeanVariance(
nameof(FileData.NumberImportFunctions)))
.Append(MlContext.Transforms.NormalizeMeanVariance(
nameof(FileData.NumberExportFunctions)))
.Append(MlContext.Transforms.NormalizeMeanVariance(
nameof(FileData.NumberImports)))
.Append(MlContext.Transforms.Text.FeaturizeText(
"FeaturizeText", nameof(FileData.Strings)))
.Append(MlContext.Transforms.Concatenate(FEATURES,
nameof(FileData.FileSize), nameof(FileData.Is64Bit),
nameof(FileData.IsSigned),
nameof(FileData.NumberImportFunctions),
nameof(FileData.NumberExportFunctions),
nameof(FileData.NumberImports), "FeaturizeText"));

  3. Lastly, we initialize our FastTree algorithm, as follows:

var trainer = MlContext.BinaryClassification.Trainers.FastTree(
labelColumnName: nameof(FileData.Label),
featureColumnName: FEATURES,
numberOfLeaves: 2,
numberOfTrees: 1000,
minimumExampleCountPerLeaf: 1,
learningRate: 0.2);

The rest of the method is similar to our previous binary classification Train method in Chapter 5, Clustering Models.
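
For reference, here is a minimal sketch of those remaining steps, following the pattern from earlier chapters (the pipeline, trainer, and dataView names match the snippets above, while the save and evaluation portion is an assumption, not this chapter's verbatim code):

// Fit the pipeline and trainer against the loaded training data
var trainedModel = dataProcessPipeline.Append(trainer).Fit(dataView);

// Persist the model to disk for the predictor to load later
MlContext.Model.Save(trainedModel, dataView.Schema, Constants.MODEL_PATH);

// Evaluate against the held-out test data and print the metrics
var testDataView = MlContext.Data.LoadFromTextFile<FileData>(testingFileName, hasHeader: false);

var metrics = MlContext.BinaryClassification.Evaluate(
    trainedModel.Transform(testDataView),
    labelColumnName: nameof(FileData.Label));

Console.WriteLine($"Entropy: {metrics.Entropy}");
Console.WriteLine($"Log Loss: {metrics.LogLoss}");
Console.WriteLine($"Log Loss Reduction: {metrics.LogLossReduction}");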

Diving into the web application

With the library code having been reviewed, the next component is the web application. As discussed in the opening section, our web application is an ASP.NET Core Blazor application. For the scope of this example, we are using standard approaches for handling the backend and frontend. The architecture of this app combines both Blazor and ASP.NET Core—specifically, using ASP.NET Core to handle the REST service component of the app.

The files we will be diving into in this section are the following ones:

  • UploadController
  • Startup
  • Index.razor

The UploadController class

The purpose of the UploadController class is to handle the server-side processing of the file once submitted. For those who have used ASP.NET MVC or Web API in the past, this controller should look very familiar:

  1. The first thing to note is the attribute tags decorating the class. The ApiController attribute configures the controller to handle HTTP APIs, while the Route tag indicates the controller will be listening on the /Upload path, as shown in the following code block:
[ApiController]
[Route("[controller]")]
public class UploadController : ControllerBase
  2. The next thing to note is the use of Dependency Injection (DI) in the constructor of UploadController, passing in the predictor object. DI is a powerful approach to providing access to singleton objects such as FileClassificationPredictor or databases, and is illustrated in the following code block:
private readonly FileClassificationPredictor _predictor;

public UploadController(FileClassificationPredictor predictor)
{
_predictor = predictor;
}
  3. Next, we create a helper method to handle taking the IFormFile from the HTTP post and returning all of the bytes, as follows:
private static byte[] GetBytesFromPost(IFormFile file)
{
using (var ms = new BinaryReader(file.OpenReadStream()))
{
return ms.ReadBytes((int)file.Length);
}
}
  4. Lastly, we create the Post method. The HttpPost attribute tells the routing engine to listen only for an HTTP POST call. The method handles taking the output of the GetBytesFromPost method call, creates the FileClassificationResponseItem object, and then returns the prediction, as shown in the following code block:
[HttpPost]
public FileClassificationResponseItem Post(IFormFile file)
{
if (file == null)
{
return null;
}

var fileBytes = GetBytesFromPost(file);

var responseItem = new FileClassificationResponseItem(
fileBytes);

return _predictor.Predict(responseItem);
}

The Startup class

The Startup class in both ASP.NET Core and Blazor apps controls the initialization of the various services used in the web application. Two major changes have been made to the Startup template that comes with Visual Studio, as follows:

  1. The first change is in the ConfigureServices method. Because this was a combined application of both ASP.NET Core and Blazor, we need to call the AddControllers method. In addition, we are going to utilize DI and initialize the predictor object once, prior to adding it as a singleton, as shown in the following code block:
public void ConfigureServices(IServiceCollection services)
{
services.AddRazorPages();
services.AddControllers();
services.AddServerSideBlazor();

services.AddSingleton<FileClassificationPredictor>();
services.AddSingleton<HttpClient>();
}
  2. The second change comes in the Configure method. The first thing is to register the CodePages instance. Without this call, the feature extraction call to reference the Windows-1252 encoding will cause an exception (we will add this call to the trainer application as well, in the next section). The second thing is to configure the use of MapControllerRoute, as illustrated in the following code block:
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

if (env.IsDevelopment())
{
app.UseDeveloperExceptionPage();
}
else
{
app.UseExceptionHandler("/Error");
}

app.UseStaticFiles();

app.UseRouting();

app.UseEndpoints(endpoints =>
{
endpoints.MapControllerRoute("default",
"{controller=Home}/{action=Index}/{id?}");
endpoints.MapBlazorHub();
endpoints.MapFallbackToPage("/_Host");
});
}

The Index.razor file

The Index.razor file contains the frontend to our file classification web application. In addition, it contains the REST call to our UploadController class described earlier in this section. For this deep dive, we will specifically look at the Blazor code block, as follows:

  1. The first thing to note is the declaration of our FileClassificationResponseItem class. We define the variable in this block, as it will allow access throughout the page. The second element is the declaration of our HandleSelection method, as illustrated in the following code block:
FileClassificationResponseItem _classificationResponseItem;

async Task HandleSelection(IEnumerable<IFileListEntry> files) {
  2. Next, we take the first file, convert it to an array of bytes, and create the MultipartFormDataContent object to POST to the previously described Post method, as follows:
var file = files.FirstOrDefault();

if (file != null)
{
var ms = new MemoryStream();
await file.Data.CopyToAsync(ms);

var content = new MultipartFormDataContent {
{
new ByteArrayContent(ms.GetBuffer()), "file", file.Name
}
};
  3. Lastly, we POST the file to our UploadController endpoint and asynchronously await the response from our ML.NET prediction, before assigning the response to our response variable, _classificationResponseItem, as follows:
var response = await client.PostAsync("http://localhost:5000/upload/", content);

var jsonResponse = await response.Content.ReadAsStringAsync();

_classificationResponseItem = JsonSerializer.Deserialize<FileClassificationResponseItem>(jsonResponse, new JsonSerializerOptions
{
PropertyNameCaseInsensitive = true
});

Diving into the trainer application

Now that we have reviewed the shared library and the web application, let's dive into the trainer application.

We will review the following files:

  • ProgramArguments
  • ProgramActions
  • Program

The ProgramArguments class

Building off the work in the ProgramArguments class detailed in Chapter 8, Using ML.NET with .NET Core and Forecasting, we are only making one addition to the class. This change adds two properties to store the Testing and Training folder paths, and is illustrated in the following code block:

public string TestingFolderPath { get; set; }

public string TrainingFolderPath { get; set; }

Unlike the previous chapter, feature extraction is based on a number of Windows executable files, as opposed to just an included CSV file.

The ProgramActions enumeration

The first change is in the ProgramActions enumeration. In Chapter 8, Using ML.NET with .NET Core and Forecasting, we had only training and prediction. However, as mentioned earlier in this chapter, we now also have feature extraction to perform. To add support, we simply add FEATURE_EXTRACTOR to the enumeration, like so:

public enum ProgramActions
{
FEATURE_EXTRACTOR,
TRAINING,
PREDICT
}

The Program class

Inside the Program class, there are only two changes from the previous chapter's overhaul of the command-line argument parsing, as follows:

  1. First, we need to register the CodePages encoder instance to properly read the Windows-1252 encoding from the files as we did in the web application, as follows:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
  2. We can then use a simplified and strongly typed switch case to handle our three actions, as follows:
switch (arguments.Action)
{
case ProgramActions.FEATURE_EXTRACTOR:
new FileClassificationFeatureExtractor().Extract(
arguments.TrainingFolderPath,
arguments.TestingFolderPath);
break;
case ProgramActions.PREDICT:
var prediction = new
FileClassificationPredictor().Predict(
arguments.PredictionFileName);

Console.WriteLine(
$"File is {(prediction.IsMalicious ? "malicious" : "clean")} " +
$"with a {prediction.Confidence:P2} confidence");
break;
case ProgramActions.TRAINING:
new FileClassificationTrainer().Train(
arguments.TrainingFileName,
arguments.TestingFileName);
break;
default:
Console.WriteLine($"Unhandled action {arguments.Action}");
break;
}

Running the trainer application

To begin, we will need to first run the chapter09.trainer application to perform feature extraction and training of our model. To run the trainer application, the process is nearly identical to the sample application shown in Chapter 3, Regression Model, with the addition of passing in the test dataset folder path when training, and we will follow these steps:

  1. We will run the trainer application, passing in the paths to the training and test folders to perform feature extraction, as follows:
PS chapter09\chapter09.trainer\bin\Debug\netcoreapp3.1> .\chapter09.trainer.exe trainingfolderpath ..\..\..\..\TrainingData testingfolderpath ..\..\..\..\TestData
Extracted 14 to sampledata.data
Extracted 14 to testdata.data
Included in the code repository are two pre-feature extracted files (sampledata.csv and testdata.csv) to allow you to train a model without performing your own feature extraction. If you would like to perform your own feature extraction, create a TestData and TrainingData folder. Populate these folders with a sampling of PowerShell (PS1) files, Windows Executables (EXE), and Microsoft Word documents (DOCX).
  2. Now, we will again run the application to train the model based on the Step 1 sample and test data exports. The resulting model (fileclassification.mdl) will be in the same folder as the executable, as follows:
PS chapter09\chapter09.trainer\bin\Debug\netcoreapp3.1> .\chapter09.trainer.exe action training trainingfilename ..\..\..\..\sampledata.data testingfilename ..\..\..\..\testdata.data
Entropy: 0.5916727785823275
Log Loss: 12.436063032030377
Log Loss Reduction: -20.018480961432264

Feel free to modify the values and see how the prediction changes based on the dataset on which the model was trained. A few areas of experimentation from this point might be to do the following:

  • Tweak the hyperparameters reviewed in the Trainer class—such as the numberOfLeaves, numberOfTrees, and learningRate—to see how accuracy is affected.
  • Add new features to the FileData class, such as specific imports, instead of using just the count.
  • Add more variation to the training and sample set to get a better sampling of data.

For convenience, the GitHub repository includes both the testdata.csv and sampledata.csv files.

Running the web application

Now that our model has been trained, we can run our web application and test the submission of a file. You must first build the web application if you haven't already. This will create the bin\Debug\netcoreapp3.1 folder. After building the web application, copy the model we trained in the previous section (fileclassification.mdl) into this folder. At this point, start the web application. Upon starting, you should see the following in your default browser:

Proceed to click on the Choose File button, select an .exe or .dll file, and you should see the following results from our model:

Feel free to try various files on your machine to see the confidence score, and if you receive a false positive, perhaps add additional features to the model to correct the classification.

Exploring additional ideas for improvements

Now that we have completed our deep dive, there are a couple of additional elements to possibly further enhance the application. A few ideas are discussed next.

Logging

As with our previous chapter's deep dive into logging, adding logging could be crucial to remotely understand when an error occurs on a web application. Logging utilizing NLog (https://nlog-project.org/) or a similar open source project is highly recommended as your application complexity increases. This will allow you to log to a file, console, or third-party logging solution—such as Loggly—at varying levels.
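
As a minimal sketch of what that could look like with NLog (this assumes the NLog NuGet package is installed and an nlog.config file defines the targets; the class and method names here are hypothetical):

using System;
using NLog;

public class PredictionLogger
{
    // NLog resolves its file/console targets from the nlog.config file
    private static readonly Logger Logger = LogManager.GetCurrentClassLogger();

    public void LogFailure(Exception ex, string sha1Sum)
    {
        Logger.Error(ex, $"Prediction failed for file {sha1Sum}");
    }
}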

Utilizing a caching layer

Imagine deploying this application on a public-facing web server and having hundreds of concurrent users. Chances are that users might upload the same file—caching the results in memory would avoid unnecessary CPU processing to run the prediction every time. Some caching options include utilizing the ASP.NET in-memory caching, or external caching databases such as Redis. These are both available via NuGet packages.
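
A minimal sketch of the in-memory option is shown below, assuming IMemoryCache has been registered in ConfigureServices via services.AddMemoryCache() and injected alongside the predictor (the wrapper class and field names are hypothetical):

using System;
using Microsoft.Extensions.Caching.Memory;

public class CachedPredictor
{
    private readonly IMemoryCache _cache;
    private readonly FileClassificationPredictor _predictor;

    public CachedPredictor(IMemoryCache cache, FileClassificationPredictor predictor)
    {
        _cache = cache;
        _predictor = predictor;
    }

    // Returns the cached response for a previously seen file, keyed by its SHA1 hash
    public FileClassificationResponseItem Predict(FileClassificationResponseItem file) =>
        _cache.GetOrCreate(file.SHA1Sum, entry =>
        {
            // Evict entries that have not been requested within the last hour
            entry.SlidingExpiration = TimeSpan.FromHours(1);

            return _predictor.Predict(file);
        });
}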

Utilizing a database

On a similar note to the caching suggestion, recording the results in a database could avoid unnecessary CPU processing. A logical choice would be to utilize a NoSQL database such as MongoDB. Using the SHA1 hash as the key and the value as the full JSON response could significantly improve performance in a high-traffic scenario. MongoDB has a .NET interface available on NuGet called MongoDB.Driver. Version 2.10.0 is the latest at the time of writing.
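
A minimal sketch of that lookup with the MongoDB.Driver package follows (the database, collection, and POCO names here are hypothetical, and a local MongoDB instance is assumed):

using MongoDB.Driver;

// Hypothetical document storing a prior classification, keyed by SHA1 hash
public class ClassificationRecord
{
    public string Id { get; set; }           // SHA1 hash of the file
    public string ResponseJson { get; set; } // Full serialized JSON response
}

public class ClassificationStore
{
    private readonly IMongoCollection<ClassificationRecord> _records;

    public ClassificationStore()
    {
        var client = new MongoClient("mongodb://localhost:27017");

        _records = client.GetDatabase("filescanner")
                         .GetCollection<ClassificationRecord>("results");
    }

    // Returns the stored response for a hash, or null if the file has not been seen
    public ClassificationRecord Find(string sha1Sum) =>
        _records.Find(r => r.Id == sha1Sum).FirstOrDefault();
}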

Summary

Over the course of this chapter, we have discussed what goes into a production-ready ASP.NET Core Blazor web application architecture, using the work performed in previous chapters as a foundation. We also created a brand new file classification web application utilizing the FastTree binary classifier from ML.NET. Lastly, we also discussed some ways to further enhance an ASP.NET Core application (and production applications in general).

In the next chapter, we will deep dive into creating a production web browser using the content of a web page to determine if the content is malicious or not, using ML.NET's sentiment analysis and the UWP framework.
