R is a very powerful language used in Data Science, Data Analytics and Statistics in general. Knowing how R works will really help in your data analysis projects. This short article takes a snappy look at the basic structure of a function in R, and also a peek at dates in R. (Credit to the Data Science Specialization by Johns Hopkins University on Coursera.)

Let’s take a look at the basic syntax of functions in R:

  1. Create a function that takes 2 arguments and adds the arguments together
    add2 <- function(x, y) {
      x + y
    }
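
A quick way to try add2 is to call it with two numbers and then with two vectors. Because `+` is vectorised in R, the same function also adds vectors element by element. This is only a small usage sketch, and the input values are arbitrary.

```r
add2 <- function(x, y) {
  x + y  # the value of the last expression is returned
}

add2(3, 5)                # 8
add2(1:3, c(10, 20, 30))  # 11 22 33: element-by-element addition
```
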
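  2. Return all values in a vector that are greater than 10
    above10 <- function(x) {
      x[x > 10]  # keep only the elements for which x > 10 is TRUE
    }

    Or, more generally, let the caller choose the threshold n:

    above_n <- function(x, n) {
      use <- x > n  ## logical vector: TRUE where an element is greater than n
      x[use]
    }
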
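To see what the logical vector `use` inside above_n looks like, the sketch below builds it by hand on a small made-up vector and then calls the function. The numbers are illustrative only.

```r
above_n <- function(x, n) {
  use <- x > n  # logical vector, TRUE where x[i] > n
  x[use]        # logical indexing keeps only the TRUE positions
}

v <- c(3, 12, 7, 25, 10)
v > 10          # FALSE TRUE FALSE TRUE FALSE
above_n(v, 10)  # 12 25 (10 itself is not greater than 10)
above_n(v, 5)   # 12 7 25 10
```
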
  3. Give an argument a default value in the function definition, so that if the user does not specify a value for that argument, the default is used. This is really helpful when a function has lots of arguments, some of which you do not intend to change most of the time: give those arguments default values in the function definition.

    above_15 <- function(x, j = 15) {
      use <- x > j  # j defaults to 15 when the caller does not supply it
      x[use]
    }
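
Here is a short sketch of a default value in action. It uses a hypothetical function, above_j, built the same way as above_15 with a default threshold of 15, so the snippet runs on its own. Calling it without j falls back to the default, while passing j, by name or by position, overrides it.

```r
# Hypothetical function with a default threshold j = 15
above_j <- function(x, j = 15) {
  use <- x > j
  x[use]
}

v <- c(5, 14, 16, 30)
above_j(v)          # default j = 15 is used: 16 30
above_j(v, j = 10)  # default overridden by name: 14 16 30
above_j(v, 4)       # default overridden by position: 5 14 16 30
```
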
  4. Take a data.frame and calculate the mean of each of its columns
    columnmean <- function(x) {
      # get the number of columns
      nc <- ncol(x)

      # initialise (pre-allocate) a numeric vector to hold the means we calculate,
      # much like declaring a list in Python or Java to hold your results from a loop
      means <- numeric(nc)

      for (i in seq_len(nc)) {  # seq_len() is safe even when nc is 0
        means[i] <- mean(x[, i])
      }
      means  ## since this is the last expression, its value is returned
    }

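As a quick check, the sketch below runs a restated columnmean on a small made-up data frame and compares the result with base R's colMeans(), which does the same job for numeric data. colMeans() returns a named vector, so unname() is applied before comparing.

```r
columnmean <- function(x) {
  nc <- ncol(x)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(x[, i])
  }
  means
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))
columnmean(df)                                   # 2 30
all.equal(columnmean(df), unname(colMeans(df)))  # TRUE
```
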
    # If a column contains NAs, mean() returns NA for that column, so we need
    # a way to ignore the NAs when calculating the means.
    # We can add an argument, give it the default value TRUE, and pass it on to
    # the na.rm argument of mean() inside our for loop.
    columnmean_No_NA <- function(x, removeNAs = TRUE) {
      # get the number of columns
      nc <- ncol(x)

      # initialise (pre-allocate) a numeric vector to hold the means we calculate
      means <- numeric(nc)

      for (i in seq_len(nc)) {
        means[i] <- mean(x[, i], na.rm = removeNAs)  ## drop NAs first when removeNAs is TRUE
      }
      means  ## since this is the last expression, its value is returned
    }

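To see the difference removeNAs makes, the sketch below restates columnmean_No_NA and runs it on a small made-up data frame containing one NA. It is called once with the default and once with removeNAs = FALSE.

```r
columnmean_No_NA <- function(x, removeNAs = TRUE) {
  nc <- ncol(x)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(x[, i], na.rm = removeNAs)
  }
  means
}

df <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))
columnmean_No_NA(df)                     # 2 5: the NA in column a is skipped
columnmean_No_NA(df, removeNAs = FALSE)  # NA 5: a single NA makes that mean NA
```
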
  5. Match by NAME or by POSITION: when a function has arguments in its definition, we pass values for them when we call it. Each value we pass can be matched to one of the function’s defined arguments either by NAME or by POSITION. Let’s take a look at this in the code below.
    Let’s generate some random normal values:

    mydata <- rnorm(100)

    Let’s get the standard deviation of mydata. The standard deviation function sd is defined as sd(x, na.rm = FALSE), and its arguments are:

    x        a numeric vector or an R object which is coercible to one by as.double(x).
    na.rm    logical. Should missing values be removed?

    We will first call the standard deviation function WITHOUT naming the argument x; we simply pass our data to the function:

    sd(mydata)

    Because the argument is not named, the passed value (mydata) is assigned to the first argument in the definition of sd, which is x (so x = mydata). In this case the argument has been matched by POSITION.
    We can also match by NAME, by explicitly naming the argument x and assigning our data to it:

    sd(x = mydata)  # match argument by name

    One thing to note is that when you name the arguments (match by name), you do not have to put them in any specific order. You simply have to refer to the arguments by their names, and the call will work:

    sd(na.rm = FALSE, x = mydata)

    If only some of the passed arguments are named, R first matches the named ones and takes those arguments off the list. The remaining unnamed values are then matched, in order, to the arguments that have not yet been matched. In the code below, na.rm = FALSE is matched by name, so na.rm is taken off the list. mydata, however, is passed without naming x, so it is assigned by position to the first argument of sd that has not yet been matched: in this case, x.

    sd(na.rm = FALSE, mydata)

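The four sd() calls above should all return exactly the same value. The sketch below checks this with identical(). It then uses a hypothetical helper, show_args(), to make the matching visible by reporting which value landed in which argument. set.seed() is there only so the random numbers are reproducible.

```r
set.seed(1)  # make rnorm() reproducible
mydata <- rnorm(100)

by_position <- sd(mydata)
by_name     <- sd(x = mydata)
reordered   <- sd(na.rm = FALSE, x = mydata)
mixed       <- sd(na.rm = FALSE, mydata)
identical(by_position, by_name) && identical(by_name, reordered) &&
  identical(reordered, mixed)  # TRUE

# Hypothetical helper that reports which value landed in which argument
show_args <- function(first, second, third) {
  c(first = first, second = second, third = third)
}
show_args(1, 2, 3)          # all matched by position
show_args(third = 3, 1, 2)  # third matched by name; 1 -> first, 2 -> second
show_args(2, first = 1, 3)  # first matched by name; 2 -> second, 3 -> third
```
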
  6. Lazy evaluation or call-by-need: the arguments of a function are evaluated only as and when they are needed. One consequence is that an argument which is declared in the definition but never used in the body of the function does not have to be supplied at all. For instance, argument b is never used in the function below:
    call_when_needed <- function(a, b) {
      a^3
    }

    Calling the function with just one value, without naming the argument, matches that value by position. It is therefore assigned to argument a, the first argument in the definition of call_when_needed. Argument b is never evaluated, so the call works even though no value was provided for it:

    call_when_needed(3)  # returns 27
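
Lazy evaluation only helps while the argument stays unused. The sketch below shows both sides; uses_b is a hypothetical function written for illustration. An argument that is never touched is never evaluated, even if evaluating it would raise an error. An argument that the body does use, but that was not supplied, triggers an error as soon as R needs it.

```r
call_when_needed <- function(a, b) {
  a^3
}
# b is never evaluated, so this stop() never runs
call_when_needed(3, stop("b was evaluated"))  # 27

# Hypothetical function whose body does use b
uses_b <- function(a, b) {
  a_cubed <- a^3  # runs fine
  a_cubed + b     # b is needed here, so R tries to evaluate it
}
err <- tryCatch(uses_b(3), error = function(e) conditionMessage(e))
err  # message says that argument "b" is missing
```
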
  7. “...” (the dots argument): this is the argument we define in a function when we cannot know in advance all the arguments that might be passed to it.
    It means the function can accept a varying number of arguments in place of the “...”. A common use is to pass these extra arguments on to another function called inside the body.
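
The sketch below shows both uses of `...`. my_mean() and count_args() are hypothetical functions written for illustration. The first forwards whatever extra arguments it receives to mean(), and the second just counts how many arguments were supplied. Any argument that comes after `...` in a definition, like sep in paste(), must be named in the call, because it cannot be matched by position.

```r
# Forward any extra arguments (such as na.rm) on to mean()
my_mean <- function(x, ...) {
  mean(x, ...)
}
my_mean(c(1, 2, NA, 4))                # NA
my_mean(c(1, 2, NA, 4), na.rm = TRUE)  # 2.333333

# Accept any number of arguments and count them
count_args <- function(...) {
  args <- list(...)  # collect everything passed in place of ...
  length(args)
}
count_args(1, "a", TRUE)  # 3
count_args()              # 0

# paste() takes ... first, so sep has to be named
paste("a", "b", sep = "-")  # "a-b"
```
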
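  8. Lexical scoping: the rule R uses to find the value of a free variable in a function, meaning a variable that is neither an argument nor defined inside the function. R looks the variable up in the environment in which the function was defined. In the example below, n is a free variable in pow, so its value is taken from the environment created when make.power was called.
    make.power <- function(n) {
      pow <- function(x) {
        x^n  # n is not an argument of pow; it is found where pow was defined
      }
      pow
    }

    cube <- make.power(3)
    cube     # prints the definition of pow, together with its enclosing environment
    cube(3)  # returns 27
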
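Two quick checks make lexical scoping concrete. First, the environment attached to cube can be inspected to confirm that it holds n = 3. Second, a classic example shows that a free variable is looked up where a function was defined, not where it was called. The function names f and g in that example are just for illustration.

```r
make.power <- function(n) {
  pow <- function(x) {
    x^n
  }
  pow
}
cube <- make.power(3)
square <- make.power(2)

ls(environment(cube))                # "n"   "pow"
get("n", envir = environment(cube))  # 3
cube(3)    # 27
square(3)  # 9

y <- 10
g <- function(x) {
  x * y       # y is free in g; g was defined here, where y is 10
}
f <- function(x) {
  y <- 2      # local to f; g cannot see this y
  y^2 + g(x)  # 2^2 + 3 * 10 when x = 3
}
f(3)  # 34
```
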
    Dates

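  9. Dates in R are stored using the Date class. A Date is basically a YEAR, MONTH and DAY (with no time of day), stored internally as the number of days since 1970-01-01.
    Example: converting a character string to a Date

    x <- as.Date("1970-01-01")

    Let’s get the underlying number of days since 1970-01-01 (0 for this date):

    unclass(x)
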
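Because a Date is just a day count underneath, arithmetic on Dates works in days. The sketch below uses arbitrary example dates. weekdays() and the format argument of as.Date() are standard base R, but the day names depend on your locale.

```r
x <- as.Date("2012-01-01")
unclass(x)  # 15340 days after 1970-01-01

# Subtracting Dates gives a difference in days; 2012 is a leap year
gap <- as.Date("2012-03-01") - as.Date("2012-02-28")
gap              # Time difference of 2 days
as.numeric(gap)  # 2

x + 30       # adding a number moves the date forward by that many days
weekdays(x)  # day name (locale-dependent)

# Strings that are not in the default "YYYY-MM-DD" layout need a format
as.Date("01/02/2012", format = "%d/%m/%Y")  # 1 February 2012
```
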
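  10. POSIXlt and POSIXct

    POSIXlt stores a date-time as a LIST of components (seconds, minutes, hours, day, month, year and so on), while POSIXct stores it as one very large number: the number of seconds since 1970-01-01 (UTC). Sys.time() returns the current date-time as a POSIXct object:

    curr_time <- Sys.time()
    curr_time
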
    Let’s convert it to POSIXlt, which stores the date as a LIST:

    p <- as.POSIXlt(curr_time)

    Let’s get the names of the components of the POSIXlt date:

    names(unclass(p))

    Let’s check the class of “p”:

    class(p)  # "POSIXlt" "POSIXt"

    Let’s extract a single component, e.g. the seconds:

    p$sec

    Now let’s convert to POSIXct, the number of seconds since 1970-01-01:

    pct <- as.POSIXct(curr_time)

    Let’s see that underlying number of seconds since 1970-01-01:

    unclass(pct)

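Because Sys.time() changes every time it is called, the sketch below uses a fixed, made-up time in UTC instead, so the numbers are predictable. It also shows two POSIXlt conventions that often catch people out: year counts from 1900, and mon counts from 0.

```r
# 90 seconds after midnight on 1970-01-01, UTC
t_ct <- as.POSIXct("1970-01-01 00:01:30", tz = "UTC")
as.numeric(t_ct)  # 90 seconds since 1970-01-01

t_lt <- as.POSIXlt(t_ct)
t_lt$min   # 1
t_lt$sec   # 30
t_lt$year  # 70: years since 1900
t_lt$mon   # 0: months run from 0 (January) to 11
t_lt$mday  # 1: day of the month

# Arithmetic on POSIXct works in seconds
t_ct + 60  # one minute later
```
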
  11. strptime: this function converts character strings to date-time objects, using a format string that describes how the dates are written. The object it returns has class POSIXlt. Let’s see an example below:
    datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
    # %B = full month name (read in the current locale), %d = day, %Y = 4-digit year, %H = hour, %M = minutes
    str_time <- strptime(datestring, "%B %d, %Y %H:%M")
    str_time

    Let’s check the class returned by the strptime function:

    class(str_time)

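Once strings have been parsed, the results can be compared and subtracted. The sketch below uses made-up times in a numeric layout, so it does not depend on English month names. It also shows that strptime() returns NA, rather than an error, when a string does not match the format.

```r
fmt <- "%Y-%m-%d %H:%M"
start  <- strptime("2012-01-10 10:40", fmt, tz = "UTC")
finish <- strptime("2012-01-10 12:10", fmt, tz = "UTC")

class(start)                             # "POSIXlt" "POSIXt"
difftime(finish, start, units = "mins")  # Time difference of 90 mins

# format() turns a date-time back into a string
format(start, "%d/%m/%Y %H:%M")  # "10/01/2012 10:40"

# A string that does not match the format gives NA
strptime("10/01/2012", "%Y-%m-%d")
```
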
This has been a quick, basic overview of functions and dates in R. Feel free to post any questions in the comments below.

Summary of what was learnt

In this short introduction to the structure of R functions, we looked at:

  1. Functions
  2. Calling functions by NAME or by POSITION
  3. Dates, POSIXlt and POSIXct
