Writing new functions

A function is an object that is loaded into the computer's temporary memory and can be called (usually with specific arguments) to perform a certain action. So far, we have used predefined functions from R's base packages; starting in Chapter 3, Working with Tables, we are also going to use functions from contributed packages. In this section, we will describe the structure of a function's definition and see how we can write our own functions.

Note that in this book you are not going to define many functions, and those you do define are going to be rather simple. The reason is that most of the time you will be learning new methods rather than repeatedly applying a method you have already developed (which is what would justify writing a function for it). In practice, however, wrapping your code in a function is frequently useful when you have developed a procedure that you would like to apply routinely to different datasets.

Defining our own functions

Let's review the components of a function's definition using an example. In the following example, we define a new function called add_five, which adds 5 to the provided argument and returns the result:

> add_five = function(x) {
+   x_plus_five = x + 5
+   return(x_plus_five)
+ }

The components of the definition are as follows:

  • The function's name (for example, add_five)
  • The assignment operator (=)
  • The function definition operator (function)
  • The function's parameters, possibly with default values, within parentheses (for example, (x))
  • The opening curly brace of the code section ({)
  • The function's body of code (for example, x_plus_five = x + 5)
  • The definition of the returned value (for example, return(x_plus_five))
  • The closing curly brace of the code section (})

The idea is that the code that constitutes the function's body will run every time the function is called:

> add_five(5)
[1] 10
> add_five(7)
[1] 12

When we perform a function call, the objects that we provide as arguments are assigned to local objects within the function's environment so that the function's code can use them. These local objects, along with any objects created in the function's body (such as x_plus_five), exist only while the function runs and are inaccessible from the global environment once the call has finished:

> x_plus_five
Error: object 'x_plus_five' not found
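The same behavior can be checked without triggering an error. The following is a minimal sketch that uses the base R exists function to test whether a name is defined in the global environment. It assumes that no object named x_plus_five was created in the current session; the name result_value is introduced here only for illustration.

```r
# Same definition as in the example above
add_five = function(x) {
  x_plus_five = x + 5
  return(x_plus_five)
}

# Only the returned value is kept, under a name of our choosing
result_value = add_five(5)

# The local object was discarded when the call finished, so this is FALSE
# (assuming x_plus_five was never created in the global environment)
exists("x_plus_five", envir = globalenv(), inherits = FALSE)

# The returned value is available as usual
result_value
```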

Every function returns a value that we would frequently like to preserve for subsequent calculations. This is done by assignment in the same way we saw earlier in this chapter for predefined functions:

> result = add_five(3)
> result
[1] 8

The return(x_plus_five) expression can be skipped since, by default, a function returns the value of the last expression evaluated in its body (here, the value assigned to x_plus_five). Therefore, we do not even need to assign the result to the x_plus_five object. In addition, when the code section contains a single expression, we can omit the curly braces. An identical function can thus be defined simply, as follows:

> add_five = function(x) x + 5
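To see that these forms behave the same way, the following sketch defines them side by side and compares their results. The names add_five_v1, add_five_v2, and add_five_v3 are hypothetical and used only to keep the three variants apart.

```r
# Explicit intermediate object and explicit return()
add_five_v1 = function(x) {
  x_plus_five = x + 5
  return(x_plus_five)
}

# No return(): the value of the last evaluated expression is returned
add_five_v2 = function(x) {
  x + 5
}

# A single expression, so the curly braces can be dropped
add_five_v3 = function(x) x + 5

# Each comparison checks that two variants return the same value
identical(add_five_v1(3), add_five_v2(3))
identical(add_five_v2(3), add_five_v3(3))
```

Which form to use is largely a matter of style: the short form suits one-line functions, while the braced form is easier to extend when the body grows.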

Setting default values for the arguments

We can assign default arguments to parameters in the function's definition. This way, we will be able to skip some (or all) of the arguments in a function call. In other words, we can provide no arguments for some of the parameters, in which case the function will use the default arguments:

> add_five = function(x) x + 5
> add_five()
Error in add_five() : argument "x" is missing, with no default
> add_five = function(x = 1) x + 5
> add_five()
[1] 6

In the preceding example, in the first case, we got an error message since we tried calling the add_five function without providing an argument for the x parameter, which had no default value. In the second case, the function call was successful since this time the function was defined with a default value for x (which was equal to 1 and thus, the returned value was 6).
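A default value is only a fallback, so it can still be overridden in any particular call. The following sketch illustrates this with a hypothetical two-parameter function, add_value (not part of the original example), in which x must always be supplied while y defaults to 5.

```r
# x has no default and must be supplied; y defaults to 5
add_value = function(x, y = 5) x + y

add_value(1)          # y falls back to its default: 1 + 5
add_value(1, 10)      # default overridden by position: 1 + 10
add_value(1, y = 10)  # default overridden by name: 1 + 10
```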

Many of the predefined functions in R have default arguments for some of the parameters. For example, the default arguments for the mode and length parameters of the vector function are "logical" and 0:

> vector()
logical(0)

Therefore, by default, it creates an empty logical vector (the default arguments can be found on the respective function's help page).

R places no restrictions on the class of each argument in a function call unless we (or the person who wrote the function) have defined such restrictions. However, if one of the expressions in the function's code results in an error given the particular set of arguments, the execution of the function will terminate and we will get no returned value. For example, our add_five function will trigger an error when it is supplied with a character vector as an argument:

> add_five("one")
Error in x + 5 : non-numeric argument to binary operator
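If we do want to restrict the class of an argument, one possible approach is to check it at the top of the function body. The following is a minimal sketch, not the only way to do this; the function name add_five_checked, the object message_text, and the wording of the error message are all assumptions made for illustration. It uses the base R functions is.numeric, stop, and tryCatch.

```r
add_five_checked = function(x) {
  # Stop early with an informative message if x is not numeric
  if (!is.numeric(x)) {
    stop("x must be numeric, not ", class(x)[1])
  }
  x + 5
}

# A numeric argument passes the check and is processed as before
add_five_checked(3)

# tryCatch captures the error, so that the message can be inspected
# without interrupting the rest of the script
message_text = tryCatch(
  add_five_checked("one"),
  error = function(e) conditionMessage(e)
)
message_text
```

The function still fails for a character argument, but the error now states what was expected instead of reporting a problem with an internal expression.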