Using Python in SQL Server

In the last section of this chapter, you are going to learn how to use Python with SQL Server and in SQL Server. You can use two scalable libraries, the revoscalepy and MicrosoftML libraries, which correspond to equivalent R libraries. You can also read SQL Server data by using ODBC. You can use the pyodbc package for this task. The following code imports all the necessary libraries for this section:

import numpy as np 
import pandas as pd 
import pyodbc; 
from revoscalepy import rx_lin_mod, rx_predict, rx_summary

Now it is time to read SQL Server data. Note that like for R, I also used the ODBC Data Sources tool to create a system DSN called AWDW in advance. It points to my AdventureWorksDW2014 demo database. For the connection, I use the same RUser SQL Server login I created for R. Of course, I created a database user in the AdventureWorksDW2014 database for this user, and gave the user permission to read the data. The following code reads SQL Server data and stores it in a data frame. It also checks the shape and the first five rows of this data frame:

con = pyodbc.connect('DSN=AWDW;UID=RUser;PWD=Pa$$w0rd') 
query = """SELECT CustomerKey, Age, 
             YearlyIncome, TotalChildren, 
             NumberCarsOwned 
           FROM dbo.vTargetMail;""" 
TM = pd.read_sql(query, con) 
TM.head(5) 
TM.shape

You can use the rx_summary() revoscalepy scalable function to quickly get some descriptive statistics for the data. The following code does this for the NumberCarsOwned variable:

summary = rx_summary("NumberCarsOwned", TM) 
print(summary)

Here are the results:

               Name      Mean    StdDev  Min  Max  ValidObs
    NumberCarsOwned  1.502705  1.138394  0.0  4.0   18484.0

The next step is initializing and training a linear regression model, using number of cars owned as the target variable, and income, age, and number of children as input variables. Please note the syntax of the rx_lin_mod() function—it actually uses R syntax for the function parameters. This syntax might be simpler for you if you already use R; however, it might look a bit weird to pure Python developers:

linmod = rx_lin_mod( 
    "NumberCarsOwned ~ YearlyIncome + Age + TotalChildren",  
    data = TM)

Finally, the following code makes and shows the predictions:

predmod = rx_predict(linmod, data = TM, output_data = TM) 
predmod.head(10)

Now you need to switch to SSMS. You will use Python inside T-SQL code. If you did not configure your SQL Server to allow external scripts, you have to do it now, with the help of the following code:

USE master; 
EXEC sys.sp_configure 'show advanced options', 1; 
RECONFIGURE WITH OVERRIDE; 
EXEC sys.sp_configure 'external scripts enabled', 1;  
RECONFIGURE WITH OVERRIDE; 
GO 
-- Restart SQL Server 
-- Check the configuration 
EXEC sys.sp_configure; 
GO

You can immediately check whether you can run Python code with the sys.sp_execute_external_script procedure. The following code returns a 1 x 1 table, with value 1 in the single cell:

EXECUTE sys.sp_execute_external_script  
@language =N'Python', 
@script=N' 
OutputDataSet = InputDataSet 
print("Input data is: 
", InputDataSet) 
',  
@input_data_1 = N'SELECT 1 as col';

The following code creates the RUser login and database user used earlier for reading SQL Server data in Python. It also gives this user the permission to read the data:

CREATE LOGIN RUser WITH PASSWORD=N'Pa$$w0rd'; 
GO 
USE AdventureWorksDW2014; 
GO 
CREATE USER RUser FOR LOGIN RUser; 
ALTER ROLE db_datareader ADD MEMBER RUser; 
GO

And finally, here is the big code that runs Python to create the same linear regression model as before, however this time within SQL Server. In the result, you get the actual data with the predicted number of cars:

USE AdventureWorksDW2014; 
EXECUTE sys.sp_execute_external_script  
@language =N'Python', 
@script=N' 
from revoscalepy import rx_lin_mod, rx_predict 
import pandas as pd 
linmod = rx_lin_mod( 
    "NumberCarsOwned ~ YearlyIncome + Age + TotalChildren",  
    data = InputDataSet) 
predmod = rx_predict(linmod, data = InputDataSet, output_data = InputDataSet) 
print(linmod) 
OutputDataSet = predmod 
',  
@input_data_1 = N' 
SELECT CustomerKey, CAST(Age AS INT) AS Age, 
  CAST(YearlyIncome AS INT) AS YearlyIncome,  
  TotalChildren, NumberCarsOwned 
FROM dbo.vTargetMail;' 
WITH RESULT SETS (( 
 "CustomerKey" INT NOT NULL, 
 "Age" INT NOT NULL, 
 "YearlyIncome" INT NOT NULL, 
 "TotalChildren" INT NOT NULL, 
 "NumberCarsOwned" INT NOT NULL,  
 "NumberCarsOwned_Pred" FLOAT NULL)); 
GO

Before finishing this section, let me point out casts in the input SELECT statement. In comparison to SQL Server, Python supports a limited number of data types. Some conversions between SQL Server data types can be done implicitly, others must be done manually. You can read the details about possible implicit conversions at https://docs.microsoft.com/en-us/sql/advanced-analytics/python/python-libraries-and-data-types.

Table of Contents for Using Python in SQL Server

Create new playlist

Sign In

Sign Up

Table of Contents for
Using Python in SQL Server