NB: R Functions

Programming for Data Science

Functions are fundamental to R, as they are in most programming languages.

Syntactically, R functions are created with the function keyword and assigned to a variable.

my.function <- function(<args>) {
    <body>
    return(<return_value>)
}

<args> are arguments that may take default values.

Defaults are assigned with the = operator, not <-.

<body> is code executed when the function is called.

<return_value> is the value that return() passes back to the caller.

Note that return() is optional.

If it is not called, R returns the value of the last expression evaluated in the body.
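As a minimal sketch of the point about return(), the two functions below behave identically. The names square_explicit and square_implicit are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper names, used only for illustration.
square_explicit <- function(x) {
  return(x^2)   # explicit return
}

square_implicit <- function(x) {
  x^2           # the value of the last evaluated expression is returned
}

square_explicit(3)
square_implicit(3)
```

Both calls give 9. The implicit form is common in short R functions, while an explicit return() is useful for exiting early.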

Let’s look at an example.

Here we define a function that computes Z-scores by doing the following:

First, it takes a value and a vector of values as inputs.

Second, it normalizes the value against the vector by subtracting the vector mean from the value and dividing by the vector standard deviation.

compute_zscore <- function(val, vec) {
  (val - mean(vec)) / sd(vec)
}

Let’s test it with some sample data.

x <- 5                
xx <- c(4, 6, 7, 8, 2, 11)
compute_zscore(x, xx)
-0.424476359978009

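Note that if the vector contains identical values, its standard deviation is zero, so the z-score is undefined. R does not raise an error here; division by zero yields Inf (or -Inf or NaN, depending on the numerator).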

compute_zscore(x, c(1, 1, 1, 1))
Inf

Also, if the vector contains missing values, the result will be NA, because mean() and sd() return NA by default when any element is missing.

xx_na <- c(1, NA, 3, 5) 
compute_zscore(x, xx_na)
<NA>
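One possible way to handle missing values is to pass na.rm through to mean() and sd(), both of which accept that argument. The variant below is a sketch, not part of the original notebook; the name compute_zscore_narm and its na.rm parameter are assumptions made for illustration.

```r
# Hypothetical variant of compute_zscore that can drop missing values.
compute_zscore_narm <- function(val, vec, na.rm = FALSE) {
  # mean() and sd() both accept na.rm; when TRUE, NA elements are ignored
  (val - mean(vec, na.rm = na.rm)) / sd(vec, na.rm = na.rm)
}

z_default <- compute_zscore_narm(5, c(1, NA, 3, 5))                # NA
z_dropped <- compute_zscore_narm(5, c(1, NA, 3, 5), na.rm = TRUE)  # 1
z_default
z_dropped
```

With na.rm = TRUE the score is computed from 1, 3, 5 (mean 3, standard deviation 2), which gives 1. Keeping the default at FALSE preserves the original behavior, so missing data is not silently ignored.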

Here’s another example.

We write a function that returns \(1\) if the value passed is odd and \(0\) if it is even.

Recall that %% is the modulus operator, which returns the remainder of a division.

is_odd <- function(x) { 
    if (x %% 2 == 1) { 
        return(1) 
    } else { 
        return(0)
    } 
}

Call to test some cases:

is_odd(4)
is_odd(3)
0
1
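The if statement expects a single logical value, so is_odd as written is meant for one number at a time. A vectorized alternative, sketched here under the hypothetical name is_odd_vec, relies on the fact that %% and == operate element-wise.

```r
# Hypothetical vectorized variant; not part of the original notebook.
is_odd_vec <- function(x) {
  # x %% 2 == 1 is evaluated element-wise; as.integer maps TRUE/FALSE to 1/0
  as.integer(x %% 2 == 1)
}

odd_flags <- is_odd_vec(c(4, 3, 10, 7))
odd_flags   # 0 1 0 1
```

This avoids an explicit loop and returns an integer vector of the same length as the input.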

Default Argument Values

Function arguments can use default values:

threshold_vals <- function(p, thresh = 0.5) {
  p > thresh
}

Here we use the default thresh.

threshold_vals(c(0.6, 0.4, 0.1, 1))
  1. TRUE
  2. FALSE
  3. FALSE
  4. TRUE

Now, pass a different threshold:

threshold_vals(c(0.6, 0.4, 0.1, 1), 0.7)
  1. FALSE
  2. FALSE
  3. FALSE
  4. TRUE
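Arguments can also be passed by name, in which case their order does not matter. The sketch below repeats the definition of threshold_vals so that it runs on its own; the variable name by_name is hypothetical.

```r
# Same definition as above, repeated so the example is self-contained.
threshold_vals <- function(p, thresh = 0.5) {
  p > thresh
}

# Named arguments may appear in any order.
by_name <- threshold_vals(thresh = 0.7, p = c(0.6, 0.4, 0.1, 1))
by_name   # FALSE FALSE FALSE TRUE
```

Naming the argument makes the call self-documenting, which helps when a function has several defaults.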

Error Trapping

You can assert important preconditions with stop().

Here, we assert that the lengths of vectors x and y match.

If they don’t, we throw an error with stop().

add_vectors <- function(x, y) {
  if (length(x) != length(y)) {
    stop("x and y must be the same length", call. = FALSE)
  }
  x + y
}
add_vectors(c(1, 2, 3), c(3, 3, 3))
  1. 4
  2. 5
  3. 6

Let’s see if it traps this error:

add_vectors(c(1, 2, 3), c(3, 3, 3, 3))
ERROR: Error: x and y must be the same length
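An error raised by stop() halts execution unless the caller handles it. One way to do that is base R's tryCatch(), sketched below. The definition of add_vectors is repeated so the example is self-contained, and the fallback of returning NULL is an assumption made for illustration.

```r
# Same definition as above, repeated so the example runs on its own.
add_vectors <- function(x, y) {
  if (length(x) != length(y)) {
    stop("x and y must be the same length", call. = FALSE)
  }
  x + y
}

# Catch the error, report its message, and fall back to NULL.
result <- tryCatch(
  add_vectors(c(1, 2, 3), c(3, 3, 3, 3)),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NULL
  }
)
is.null(result)   # TRUE
```

Here the script keeps running after the failed call, and result records that no sum was produced.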

Scoping Rules

Scoping rules for functions are similar to those in Python.

R uses the tinted glass model discussed earlier.

z <- 4
test_fcn <- function(x) {
  x^z
}
test_fcn(2)
16

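Since z isn’t defined inside the function, R looks for it in the function’s enclosing environment, that is, the environment where the function was defined (here, the global environment).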
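The lookup of z happens when the function is called, not when it is defined. The sketch below illustrates this by changing z between two calls; the variable names first and second are hypothetical.

```r
z <- 4
test_fcn <- function(x) {
  x^z              # z is a free variable, resolved at call time
}
first <- test_fcn(2)    # 2^4 = 16

z <- 3                  # change the global z
second <- test_fcn(2)   # 2^3 = 8
first
second
```

Because the result depends on a variable outside the function, passing such values as arguments is often easier to reason about.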

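Note that R handles potential scope conflicts differently from Python.

Recall that Python would not have allowed the equivalent of the following to run: because m is assigned inside the function, Python treats m as local throughout the function body, so the first print would fail. R instead reads the global m until the assignment creates a local m.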

m <- 5
test_2 <- function(x) {
    print(m)
    m <- x^2
    print(m)
}
test_2(10)
[1] 5
[1] 100
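The assignment inside the function creates a local m and leaves the global m untouched. To modify a variable in an enclosing environment, R provides the <<- operator. The sketch below contrasts the two; the function and variable names are hypothetical.

```r
m <- 5

set_local <- function(x) {
  m <- x^2       # creates a local m; the outer m is untouched
  m
}
local_result <- set_local(10)   # 100
m_after_local <- m              # still 5

set_outer <- function(x) {
  m <<- x^2      # assigns to the m found in an enclosing environment
  invisible(m)
}
set_outer(10)
m_after_outer <- m              # now 100

local_result
m_after_local
m_after_outer
```

Using <<- makes a function depend on and alter state outside itself, so it is usually reserved for cases where that side effect is intended.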