R language basics

You can start investigating R by writing simple expressions that include literals and operators. Here are some examples:

1 + 1;
2 + 3 * 4;
3 ^ 3;
sqrt(81);
pi;

This code first evaluates three mathematical expressions using the basic operators. Check the results and note that R, as expected, applies the operator precedence known from mathematics. Then the code calls the sqrt() function to calculate the square root of 81. Finally, it checks the value of the constant pi (π), which is built into the base package. R has a few other built-in constants as well. You can find them by searching the help with ??"constants".
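
As a brief illustration, the following sketch shows a few of the constants that the base package documents under ?Constants. The index values are chosen only for the example:

```r
# Built-in constants from the base package (see ?Constants)
pi;                  # 3.141593
LETTERS[1:3];        # "A" "B" "C"
letters[24:26];      # "x" "y" "z"
month.name[1];       # "January"
month.abb[12];       # "Dec"
```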

It is easy to generate sequences. The following code shows some examples:

rep(1,10); 
3:7;          
seq(3,7); 
seq(5,17,by=3);       

The first command uses the rep() function to replicate the number 1 ten times. The second line generates the sequence of integers from 3 to 7. The third line does exactly the same, this time using the seq() function. This function gives you additional possibilities, as the fourth line shows: it generates a sequence of numbers between 5 and 17, this time with an increment of 3.
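
Both functions accept further arguments. The following minimal sketch shows the times and each arguments of rep() and the length.out argument of seq(), with values chosen only for illustration:

```r
rep(c(1, 2), times = 3);        # 1 2 1 2 1 2  (whole vector repeated)
rep(c(1, 2), each = 3);         # 1 1 1 2 2 2  (each element repeated)
seq(5, 17, length.out = 4);     # 5 9 13 17    (four evenly spaced values)
seq(10, 1, by = -3);            # 10 7 4 1     (negative increment counts down)
```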

Writing ad hoc expressions means you need to rewrite them whenever you need them. To reuse the values, you need to store them in variables. You assign a value to a variable with an assignment operator. R supports multiple assignment operators. You can use the left assignment operator (<-), where the variable name is on the left side, or the right assignment operator (->), where the variable name is on the right side. You can also use the equals (=) operator. The left assignment operator is the one you will see most commonly in R code. The following code stores the numbers 2, 3, and 4 in variables and then performs a calculation using the variables:

x <- 2; 
y <- 3; 
z <- 4; 
x + y * z; 

The result is 14. Note again that R is case-sensitive. For example, the following line of code produces an error, because variables X, Y, and Z are not defined:

X + Y + Z; 
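
Because names are case-sensitive, x and X refer to different variables. The sketch below assumes that no variables named X, Y, or Z exist in the session. It uses tryCatch() to capture the error message rather than stop the script:

```r
x <- 2; y <- 3; z <- 4;
# The lowercase variables exist, the uppercase ones do not,
# so evaluating X + Y + Z raises an error that tryCatch() intercepts
msg <- tryCatch(X + Y + Z, error = function(e) conditionMessage(e));
msg;   # the error text mentions the missing object X
```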

You can separate parts of a variable name with a dot. This lets you group related objects under a common prefix, which resembles namespaces in .NET languages, although in R the dot is simply part of the name. Here is an example:

This.Year <- 2016; 
This.Year; 
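
Note that R treats the dot as an ordinary character in a name, so each dotted name is a single variable. A short sketch, in which Next.Year is a name made up for this example:

```r
This.Year <- 2016;
Next.Year <- This.Year + 1;   # Next.Year is an illustrative name
# Each dotted name is one variable in its own right
exists("This.Year");          # TRUE
Next.Year;                    # 2017
```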

You can check whether the equals assignment operator really works:

x = 2; 
y = 3; 
z = 4; 
x + y * z; 

If you execute the previous code, you get the same result as with the code that used the left assignment operator instead of the equals operator.

Besides mathematical operators, R supports comparison and logical operators as well. To test exact equality, use the double equals (==) operator. The other comparison operators are <, <=, >, >=, and != (which tests inequality). In addition, you can combine two logical expressions into a third one using the logical AND (&) and logical OR (|) operators. The following code checks a variable for exact equality with a number literal:

x <- 2; 
x == 2; 

The result is TRUE.
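
The remaining operators work the same way. The following sketch combines comparisons with & and |. It also uses the negation operator (!), which the text above does not mention:

```r
x <- 2;
x != 3;               # TRUE
x >= 1 & x <= 5;      # TRUE: both conditions hold
x < 0 | x > 10;       # FALSE: neither condition holds
!(x == 2);            # FALSE: ! negates a logical value
```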

Every variable in R is actually an object. A simple scalar variable is a vector of length one. A vector is a one-dimensional array of scalars of the same type, or mode: numeric, character, logical, complex (numbers with an imaginary part), or raw (bytes). You use the combine function c() to define vectors. The following code shows several ways to assign a vector to a variable. Note that existing variables with the same names are overwritten:

x <- c(2,0,0,4);        
assign("y", c(1,9,9,9));  
c(5,4,3,2) -> z;               
q = c(1,2,3,4); 

The first line uses the left assignment operator. The second assigns the second vector to the variable y using the assign() function. The third line uses the right assignment operator, and the fourth line, the equals operator.
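
To check these claims yourself, you can inspect objects with length() and mode(). A minimal sketch, in which n is an illustrative variable name:

```r
n <- 5;                       # a "scalar" is a numeric vector of length 1
length(n);                    # 1
is.vector(n);                 # TRUE
mode(c(2, 0, 0, 4));          # "numeric"
mode(c("a", "b"));            # "character"
mode(c(TRUE, FALSE));         # "logical"
mode(2+3i);                   # "complex"
mode(as.raw(255));            # "raw"
```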

You can perform operations on vectors just like you would perform them on scalars (remember, after all, a scalar is just a vector of length 1). Here are some examples of vector operations:

x + y; 
x * 4; 
sqrt(x); 

The results of the previous three lines of code are:

3  9  9 13
8  0  0 16
1.414214 0.000000 0.000000 2.000000

You can see that the operations were performed element by element. You can also operate on selected elements only. You use numerical index values to select specific elements. Here are some examples:

x <- c(2,0,0,4);   
x[1];    
x[-1];    
x[1] <- 3; x;    
x[-1] = 5; x;   

First, the code assigns a vector to a variable. The second line selects the first element of the vector. The third line selects all elements except the first one and returns a vector of three elements. The fourth line of the code assigns a new value to the first element and then shows the vector. The last line assigns new values to all elements but the first one, and then shows the vector. The results are, therefore:

2
0 0 4
3 0 0 4
3 5 5 5
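
An index can itself be a vector, which lets you select or change several elements at once. A short sketch, using an illustrative vector v:

```r
v <- c(10, 20, 30, 40);
v[c(1, 4)];        # 10 40  (first and fourth elements)
v[2:3];            # 20 30  (a range of positions)
v[-c(1, 2)];       # 30 40  (everything except the first two)
v[2:3] <- 0; v;    # 10 0 0 40
```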

You can also use logical operators on vectors. Here are some examples:

y <- c(1,9,9,9); 
y < 8;     
y[4] = 1; 
y < 8; 
y[y<8] = 2; y; 

The first line assigns a vector to variable y. The second line compares each vector value to the numeric constant 8 and returns TRUE for those elements where the value is lower than 8. The third line assigns a new value to the fourth element of the vector. The fourth line performs the same comparison of the vector elements to the number 8 again and returns TRUE for the first and fourth elements. The last line edits the elements of vector y that satisfy the condition in the square brackets, that is, those elements where the value is less than 8, and then shows the vector. The result is:

TRUE FALSE FALSE FALSE
TRUE FALSE FALSE  TRUE
2 9 9 2
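
A logical vector can also be stored and reused. The following sketch uses the which() and sum() functions to locate and count the matching elements. The variable name small is illustrative:

```r
y <- c(1, 9, 9, 1);
small <- y < 8;    # TRUE FALSE FALSE TRUE
which(small);      # 1 4  (positions where the condition holds)
sum(small);        # 2    (TRUE counts as 1, FALSE as 0)
y[!small];         # 9 9  (elements that do not satisfy the condition)
```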

Vectors and scalars are very basic data structures. You will learn about more advanced data structures in the next section of this chapter. Before that, it is high time to mention a very important concept in R—packages.

The R code shown in this chapter has used only core capabilities so far; capabilities that you get when you install the R engine. Although these capabilities are already very extensive, the real power of R comes with additional packages.

Packages are optional modules you can download and install. Each package brings additional functions, or demo data, in a well-defined format. The number of available packages is growing year by year. At the time of writing this book, in winter 2017, the number of downloadable packages was already almost 10,000. A small set of standard packages is already included when you install the core engine. You can check out installed packages with the installed.packages() command. Packages are stored in the folder called library. You can get the path to the library with the .libPaths() function (note the dot in the name). You can use the library() function to list the packages in your library.
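
As a rough sketch of these functions, the following code collects the names of the installed packages. The output depends on your installation, so only the check for the stats package, which ships with R, has a predictable result:

```r
.libPaths();                              # library folder(s) on this machine
pkgs <- rownames(installed.packages());   # names of all installed packages
length(pkgs);                             # number of installed packages
"stats" %in% pkgs;                        # TRUE: stats is installed with R
```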

The most important command to learn is install.packages("packagename"). This command searches the CRAN sites for the package, downloads it, unzips it, and installs it. Of course, you need an internet connection in order to execute it successfully. You can imagine that such a simplistic approach is not very welcome in highly secure environments. For this reason, when you use R Machine Learning Services in SQL Server, the package installation is more secure and more complex, as you will learn later in this chapter.

Once a package is installed, you load it into memory with the library(packagename) command. You can get help on the content of the package with the help(package = "packagename") command. For example, if you want to read the data from a SQL Server database, you need to install the RODBC package. The following code installs this package, loads it, and gets the help for functions and methods available in it:

install.packages("RODBC"); 
library(RODBC); 
help(package = "RODBC"); 
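
Running install.packages() every time downloads the package again. One possible way to avoid this is to install only when the package is missing. In the sketch below, ensure_package() is a hypothetical helper written for this example. It is not part of R or RODBC:

```r
# ensure_package() is a hypothetical helper, not a built-in function
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of failing when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg);
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE);
  invisible(TRUE);
}

ensure_package("stats");   # stats ships with R, so nothing is downloaded
```

Calling ensure_package("RODBC") would install the package on first use only and simply load it afterwards.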

Before reading the data from SQL Server, you need to perform two additional tasks. First, you need to create a login and a database user for the R session and give the user the permission to read the data. Then, you need to create an ODBC data source name (DSN) that points to this database. In SSMS, connect to your SQL Server, and then in Object Explorer, expand the Security folder. Right-click on the Logins subfolder. Create a new login and a database user in the WideWorldImportersDW database, and add this user to the db_datareader role.

I created a SQL Server login called RUser with the password Pa$$w0rd, and a database user with the same name, as the following screenshot shows:

Generating the RUser login and database user

After that, I used the ODBC Data Sources tool to create a system DSN called WWIDW. I configured the DSN to connect to my local SQL Server with the RUser SQL Server login and appropriate password, and changed the context to the WideWorldImportersDW database. If you've successfully finished both steps, you can execute this R code to read the data from SQL Server:

con <- odbcConnect("WWIDW", uid="RUser", pwd="Pa$$w0rd"); 
sqlQuery(con,  
         "SELECT c.Customer, 
            SUM(f.Quantity) AS TotalQuantity, 
            SUM(f.[Total Excluding Tax]) AS TotalAmount, 
            COUNT(*) AS SalesCount 
          FROM Fact.Sale AS f 
           INNER JOIN Dimension.Customer AS c 
            ON f.[Customer Key] = c.[Customer Key] 
          WHERE c.[Customer Key] <> 0 
          GROUP BY c.Customer 
          HAVING COUNT(*) > 400 
          ORDER BY SalesCount DESC;"); 
close(con); 

The code returns the following (abbreviated) result:
