class: center, middle, inverse, title-slide .title[ # Programming Tools in Data Science ] .subtitle[ ## Lecture #6: Function ] .author[ ### Samuel Orso ] .date[ ### 12 October 2023 ] --- # Function <img src="images/lego-674354_1280.jpg" width="600" height="400" style="display: block; margin: auto;" /> --- # ''Everything is a function call'' What does happen in `R`, or what does the `R` evaluator do once given an object? There are only three cases: * Evaluating a constant ```r 18.10 ``` ``` ## [1] 18.1 ``` * Parsing a syntactic name or any string within quoted marks ```r sqrt ``` ``` ## function (x) .Primitive("sqrt") ``` ```r `+` ``` ``` ## function (e1, e2) .Primitive("+") ``` --- # ''Everything is a function call'' * A function call, **the vast majority**. ```r 1.1 + 2.1 ``` ``` ## [1] 3.2 ``` ```r `+`(1.1, 2.1) ``` ``` ## [1] 3.2 ``` --- <img src="images/function.png" width="3541" style="display: block; margin: auto;" /> --- # Function components A function has three components: - **arguments**: the inputs the user gives to the function which will determine the value or type of output of a function; - **body**: the code lines containing the commands and operations which deliver the desired output; - **environment**: every R function is built within an environment of reference from which to source possible input values and/or other functions necessary for it to work. --- # Function parts You can verify these components using: - `formals(fun)`: return a list of arguments for `fun`; - `body(fun)`: return the code lines from within `fun`; - `environment(fun)`: return a frame, i.e. a collection of named objects, where `fun` finds values associated with names. --- # Example ```r my_div <- function (numerator, denominator) { div <- numerator / denominator return(div) } formals(my_div) ``` ``` ## $numerator ## ## ## $denominator ``` ```r body(my_div) ``` ``` ## { ## div <- numerator/denominator ## return(div) ## } ``` ```r environment(my_div) ``` ``` ## <environment: R_GlobalEnv> ``` --- # Note on the example * The following implementation is also valid. ```r my_div <- function (numerator, denominator) numerator / denominator ``` * Curly brackets would be needed if the body was taking more lines. * If `return` is not specified, `R` returns the value of the last evaluated expression by default. --- # Arguments How arguments can be passed to a function in `R`? * **positional matching**: arguments must be entered in the same order as defined in the function ```r my_div(1, 2) ``` ``` ## [1] 0.5 ``` ```r my_div(2, 1) ``` ``` ## [1] 2 ``` --- # Arguments * **perfect matching**: `R` searches for the arguments matching the exact name ```r my_div(numerator = 1, denominator = 2) ``` ``` ## [1] 0.5 ``` ```r my_div(denominator = 1, numerator = 2) ``` ``` ## [1] 2 ``` --- # Arguments * **prefix matching**: `R` searches for the first letters matching the exact name ```r my_div(n = 1, d = 2) ``` ``` ## [1] 0.5 ``` ```r my_div(d = 1, n = 2) ``` ``` ## [1] 2 ``` --- # A note on assignment operators `=` `<-` * Both `=` `<-` can be used to assign a value to a name. * When used to assign function arguments, there is a major difference: ```r my_div(numerator = 2, denominator = 1) ``` ``` ## [1] 2 ``` ```r numerator ``` ``` ## Error in eval(expr, envir, enclos): object 'numerator' not found ``` ```r my_div(numerator <- 2, denominator = 1) ``` ``` ## [1] 2 ``` ```r numerator ``` ``` ## [1] 2 ``` --- # Body The body of a function is simply the set of instructions and (possible) other functions that use the arguments provided by the user and computes the desired output. <center> <iframe src="https://giphy.com/embed/3oEdv9Y8md1SwyOMYE" width="360" height="360" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/animation-mechanical-3oEdv9Y8md1SwyOMYE">via GIPHY</a></p> </center> --- # Signalling conditions As a programmer, it is important to give meaningful indications to the user. There three types of signalling conditions: * **errors**: sever problem, indicated via `stop()`; * **warnings**: mild problem, indicated via `warning()`; * **messages**: informative, indicated via `message()`. --- # Errors Errors are related to the intentions behind the program. The programmer should ensure that the function is used within the scope of the intention. ```r # Is it possible to divide two characters? my_div("numerator","denominator") ``` ``` ## Error in numerator/denominator: non-numeric argument to binary operator ``` ```r # new definition my_div <- function (numerator, denominator) { # verify that both arguments are numeric (double or integer) * if(any(!is.numeric(numerator), !is.numeric(denominator))){ stop("`numerator` and `denominator` must be numeric") } div <- numerator / denominator return(div) } my_div("numerator","denominator") ``` ``` ## Error in my_div("numerator", "denominator"): `numerator` and `denominator` must be numeric ``` --- ```r # a matrix is still numeric, does it work? A <- matrix(1:9,ncol=3) B <- matrix(1:12,ncol=3) my_div(A,B) ``` ``` ## Error in numerator/denominator: non-conformable arrays ``` At this point, the programmer has to decide whether arrays are allowed. If the answer is positive, then extra verification is necessary as the dimension must match. ```r # dimension for arrays dim(A) ``` ``` ## [1] 3 3 ``` ```r length(A) ``` ``` ## [1] 9 ``` ```r # what happens for a vector? a <- 1:3 dim(a) ``` ``` ## NULL ``` ```r length(a) ``` ``` ## [1] 3 ``` --- ```r # new definition my_div <- function (numerator, denominator) { # verify that both arguments are numeric (double or integer) if(any(!is.numeric(numerator), !is.numeric(denominator))){ stop("`numerator` and `denominator` must be numeric") } # verify length match * if(length(numerator) != length(denominator)){ stop("`numerator` and `denominator` must have the same length") } # verify dimension match * if(length(dim(numerator)) != length(dim(denominator))){ stop("`numerator` and `denominator` must have the same dimensions") } * if(any(dim(numerator) != dim(denominator))){ stop("`numerator` and `denominator` must have the same dimensions") } div <- numerator / denominator return(div) } my_div(A, B) ``` ``` ## Error in my_div(A, B): `numerator` and `denominator` must have the same length ``` --- ```r A <- matrix(1:9,ncol=3) B <- matrix(10:18,ncol=3) my_div(A, B) ``` ``` ## [,1] [,2] [,3] ## [1,] 0.1000000 0.3076923 0.4375000 ## [2,] 0.1818182 0.3571429 0.4705882 ## [3,] 0.2500000 0.4000000 0.5000000 ``` --- # Warnings Suppose that when two arrays of different size but same dimensions are input, the programmer decide to return the division of a reduced form. The programmer should then warn the user. --- ```r # new definition my_div <- function (numerator, denominator) { # verify that both arguments are numeric (double or integer) if(any(!is.numeric(numerator), !is.numeric(denominator))){ stop("`numerator` and `denominator` must be numeric") } # verify dimension match if(length(dim(numerator)) != length(dim(denominator))){ stop("`numerator` and `denominator` must have the same dimensions") } # verify length match new_num <- numerator new_den <- denominator num_len <- length(numerator) den_len <- length(denominator) if(num_len != den_len){ # two cases if(num_len < den_len){ new_den <- numerator # `new_den` has same dimension has `numerator` new_den[seq_len(num_len)] <- denominator[seq_len(num_len)] * warning("Size of `denominator` is reduced to match `numerator`") } else { new_num <- numerator # `new_num` has same dimension has `denominator` new_num[seq_len(num_len)] <- numerator[seq_len(num_len)] * warning("Size of `numerator` is reduced to match `denominator`") } } div <- new_num / new_den return(div) } ``` --- ```r A <- matrix(1:9,ncol=3) B <- matrix(1:30,ncol=3) A/B ``` ``` ## Error in A/B: non-conformable arrays ``` ```r my_div(A, B) ``` ``` ## Warning in my_div(A, B): Size of `denominator` is reduced to match `numerator` ``` ``` ## [,1] [,2] [,3] ## [1,] 1 1 1 ## [2,] 1 1 1 ## [3,] 1 1 1 ``` --- # Messages The programmer optionally can print informative message. ```r # new definition my_div <- function (numerator, denominator) { ... * message("Starting the division") div <- new_num / new_den return(div) } ``` ```r my_div(A, B) ``` ``` ## Warning in my_div(A, B): Size of `denominator` is reduced to match `numerator` ``` ``` ## Starting the division ``` ``` ## [,1] [,2] [,3] ## [1,] 1 1 1 ## [2,] 1 1 1 ## [3,] 1 1 1 ``` --- # Lexical scoping Lexical scoping consists in how to find a value associated with a name. What do you think will be the output of the following command? ```r new_num <- 1 new_den <- 2 my_div(A, B) ``` --- # Lexical scoping Lexical scoping consists in how to find a value associated with a name. What do you think will be the output of the following command? ```r new_num <- 1 new_den <- 2 my_div(A, B) ``` ``` ## Warning in my_div(A, B): Size of `denominator` is reduced to match `numerator` ``` ``` ## Starting the division ``` ``` ## [,1] [,2] [,3] ## [1,] 1 1 1 ## [2,] 1 1 1 ## [3,] 1 1 1 ``` --- When looking for the value of a name, `R` follows some rules: * **dynamic lookup**: `R` looks for a name when the function is run, not when it is created. ```r f <- function() x * x f() ``` ``` ## Error in f(): object 'x' not found ``` ```r x <- 10 f() ``` ``` ## [1] 100 ``` * **name masking**: `R` looks for a name from the current environment, and if not supplied, to the parent environment and so on. ```r x <- 10 f <- function(){ x <- 1 x * x } f() ``` ``` ## [1] 1 ``` --- # Environment It is important to understand environment to understand where `R` finds names. Environment is a collection of named objects. * Every name in an environment is unique * Every environment has a parent --- Special environment: * **current**: this is where the code is currently running, usually the global environment. ```r environment() ``` ``` ## <environment: R_GlobalEnv> ``` * **global**: this is your "workspace", where you interactively experiment some code in `R` ```r globalenv() ``` ``` ## <environment: R_GlobalEnv> ``` * **empty**: every environment has a parent except the empty environment, which is empty ```r emptyenv() ``` ``` ## <environment: R_EmptyEnv> ``` --- * **execution**: it usually exists only during function calls. ```r f <- function() print(environment()) f() ``` ``` ## <environment: 0x55fe609ec3a0> ``` ```r f() ``` ``` ## <environment: 0x55fe60878980> ``` * **package**: when you attach a package with `library(pkg)`, the package becomes a parent of the global environment. --- When looking for a binding, `R` follows a path ```r search() ``` ``` ## [1] ".GlobalEnv" "package:stats" "package:graphics" ## [4] "package:grDevices" "package:utils" "package:datasets" ## [7] "package:methods" "Autoloads" "package:base" ``` ```r library(ggplot2) search() ``` ``` ## [1] ".GlobalEnv" "package:ggplot2" "package:stats" ## [4] "package:graphics" "package:grDevices" "package:utils" ## [7] "package:datasets" "package:methods" "Autoloads" ## [10] "package:base" ``` Be careful, if two packages have the same name for a function, which one is going to be called? You can use the operator `pkg::name` to make sure you make the right call. --- <blockquote> To understand computations in `R`, two slogans are helpful: <ul> <li>Everything that exists is an object.</li> <li>Everything that happens is a function call.</li> </ul> .right[— John Chambers] </blockquote> --- # Special functions Everything that happens is a function call, really? ```r 1 + 2 ``` ``` ## [1] 3 ``` ```r `+`(1,2) ``` ``` ## [1] 3 ``` ```r `+` ``` ``` ## function (e1, e2) .Primitive("+") ``` --- Remarks: * The `+` is a `.Primitive` function, it is implemented in `C` and is particularly efficient. * `+` is an **infix** function: the name appears between two arguments. Other infix functions comprises `=`, `<-`, `+`, `-`, `*`, `/`, `!`, `<`, `%*%`, `&`, `|`,... * You can create your own infix functions using `%operator%` pattern (e.g. `%>%` `magrittr` pipe operator). * It is licit but bad practice to redefine already existing infix functions as they are not reserved words ```r `+` <- function(x, y) 10 * x * y 1 + 2 ``` ``` ## [1] 20 ``` ```r environment(`+`) ``` ``` ## <environment: R_GlobalEnv> ``` ```r base::`+`(1,2) ``` ``` ## [1] 3 ``` --- # Function composition Everything is an object, so a function is an object too right? So you could pass an object as an argument? Yes! ```r A <- matrix(rnorm(9),nc=3) B <- matrix(rnorm(9),nc=3) C <- matrix(rnorm(9),nc=3) D <- matrix(rnorm(9),nc=3) my_div(my_div(A,B),my_div(C,D)) ``` ``` ## Starting the division ## Starting the division ## Starting the division ``` ``` ## [,1] [,2] [,3] ## [1,] -0.43334036 -0.1808198 1.6730775 ## [2,] -12.73393109 -3.1765239 1.5243314 ## [3,] 0.05825386 102.8161792 -0.1847996 ``` This is called **nesting**. --- You can also create intermediate objects. ```r AB <- my_div(A,B) ``` ``` ## Starting the division ``` ```r CD <- my_div(C,D) ``` ``` ## Starting the division ``` ```r my_div(AB,CD) ``` ``` ## Starting the division ``` ``` ## [,1] [,2] [,3] ## [1,] -0.43334036 -0.1808198 1.6730775 ## [2,] -12.73393109 -3.1765239 1.5243314 ## [3,] 0.05825386 102.8161792 -0.1847996 ``` --- or a final option is to use **piping** with for example `%>%` `magrittr` operator (which works only for one argument) ```r library(magrittr) my_div(A,B) %>% my_div(CD) ``` ``` ## Starting the division ## Starting the division ``` ``` ## [,1] [,2] [,3] ## [1,] -0.43334036 -0.1808198 1.6730775 ## [2,] -12.73393109 -3.1765239 1.5243314 ## [3,] 0.05825386 102.8161792 -0.1847996 ``` --- class: sydney-blue, center, middle # Question ? .pull-down[ <a href="https://ptds.samorso.ch/"> .white[<svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M369.9 97.9L286 14C277 5 264.8-.1 252.1-.1H48C21.5 0 0 21.5 0 48v416c0 26.5 21.5 48 48 48h288c26.5 0 48-21.5 48-48V131.9c0-12.7-5.1-25-14.1-34zM332.1 128H256V51.9l76.1 76.1zM48 464V48h160v104c0 13.3 10.7 24 24 24h104v288H48z"></path></svg> website] </a> <a href="https://github.com/ptds2023/"> .white[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> GitHub] </a> ] --- # Exercise * What does the following code return? ```r x <- 2 f1 <- function(x) { function() { x + 3 } } f1(1)() ``` * How would you usually write these codes? ```r `+`(1, `*`(2, 3)) `*`(3, `+`(2, 1)) ``` * How could you make this function call easier to read? ```r mean(, TRUE, x = c(seq(10), rep(NA,3))) ``` --- * Does the following code throw an error when executed? Why or why not? ```r f2 <- function(a, b) { a * 3 } f2(3, stop("This is an error!")) f2(stop("This is an error!"), 3) ``` * Propose an infix function.