Discovering SQL Server R Machine Learning Services

Microsoft provides the highly scalable R engine in two flavors:

  • R Machine Learning Services (In-Database): This is the installation that integrates R into SQL Server. It includes a database service that runs outside the SQL Server Database Engine and provides a communication channel between the Database Engine and R runtime. You install it with SQL Server setup. The R engine includes open source R components and, in addition, a set of scalable R packages.
  • Microsoft R Server: This is a standalone R server with the same open and scalable packages that run on multiple platforms.

For development, you prepare a client installation. You can download Microsoft R Client from http://aka.ms/rclient/download. This installation includes the open R engine and the scalable packages as well. In addition to the engine, you will probably want to also install a development IDE, either RStudio or R Tools for Visual Studio. Of course, you can also download and install the Developer Edition of SQL Server 2017 instead. This way, you get both the R runtime with the scalable packages and the database engine.

Some scalable packages shipped with SQL Server R Machine Learning Services are:

  • RevoScaleR: This is a set of parallelized scalable R functions for processing data, data overview and preliminary analysis, and machine-learning models. The procedures in this package can work with chunks of data at a time, so they don't need to load all of the data in memory immediately.
  • RevoPemaR: This package allows you to write custom parallel external algorithms.
  • MicrosoftML: This is a new package from December 2016, with many additional scalable machine learning algorithms implemented.

The following figure shows how the communication process between SQL Server and R engine works:

The communication between SQL Server and R runtime

The components involved and their communications are as follows:

  • In SQL Server Database Engine, you run an R script with the sys.sp_execute_external_script system stored procedure. SQL Server sends the request to the Launchpad service, a new service that supports the execution of external scripts.
  • The Launchpad service starts the launcher appropriate for the language of your script. Currently, the only launcher available is the RLauncher, and therefore you can launch an external script from SQL Server using the R language only. However, you can see that the infrastructure is prepared to enable the execution of scripts in additional programming languages.
  • The RLauncher starts RTerm, the R terminal application for executing R scripts.
  • The RTerm sends the script to BxlServer. This is a new executable used for communication between SQL Server and the R engine. The scalable R functions are implemented in this executable as well.
  • The BxlServer uses SQL Satellite, a new extensibility API that provides a fast data transfer between SQL Server and an external runtime. Again, currently only the R runtime is supported.

Time to test the execution of an R script in SQL Server! First, you need to use the sys.sp_configure system stored procedure to enable external scripts. You can do this with the following code:

USE master; 
EXEC sys.sp_configure 'show advanced options', 1; 
RECONFIGURE 
EXEC sys.sp_configure 'external scripts enabled', 1;  
RECONFIGURE; 

After that, you can call the sys.sp_execute_external_script system stored procedure. The most important parameters of this procedure include:

  • @language: Currently limited to value R
  • @script: The actual script in the external language
  • @input_data_1_name: The name of the data frame, as seen in the R code in the @script parameter for the first input dataset; the default name is InputDataSet
  • @input_data_1: The T-SQL query that specifies the first input dataset
  • @output_data_1_name: The name of the R object, most probably a data frame, with the output dataset; the default name is OutputDataSet
  • WITH RESULT SETS: The option where you specify the column names and data types of the output of the R script, as seen in SQL Server
  • @params: a list of input and output parameters for the script

In the following example, the R script called from SQL Server retrieves a list of installed packages:

EXECUTE sys.sp_execute_external_script 
 @language=N'R', 
 @script =  
 N'str(OutputDataSet); 
   packagematrix <- installed.packages(); 
   NameOnly <- packagematrix[,1]; 
   OutputDataSet <- as.data.frame(NameOnly);' 
WITH RESULT SETS ( ( PackageName nvarchar(20) ) ); 

The shortened results are:

PackageName
--------------------
base
boot
class
...
RevoIOQ
revoIpe
RevoMods
RevoPemaR
RevoRpeConnector
RevoRsrConnector
RevoScaleR
RevoTreeView
RevoUtils
RevoUtilsMath
...
spatial
splines
stats
...

You can see that besides some base packages, there is a set of packages where the name starts with the string Revo, including the RevoScaleR and RevoPemaR packages. These are two packages with scalable functions and their associated packages.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.191.134