R is a very powerful language used in Data Science, Data Analytics and Statistics in general. Knowing how R works will really help in your data analysis projects. This short article takes a snappy look at the basic structure of a function in R, and also a peek at Dates in R. (Credit to the Data Science Specialization by Johns Hopkins University on Coursera.)
Let’s take a look at the basic syntax of functions in R:
- Create a function that takes 2 arguments and adds the arguments together
add2 <- function(x, y){ x + y }
- Return all values from a vector greater than 10
above10 <- function(x){ x[x>10] }
OR, more generally, for any threshold n:
above_n <- function(x, n){
  use <- x > n ## define a logical vector for numbers greater than "n"
  x[use]
}
- Specify arguments in the function and give an argument a default value, so that if the user does not supply a value for it, the default value is used. This is really helpful when a function has lots of arguments, some of which you do not expect to change most of the time: you can give those arguments default values in the function definition.
above_15 <- function(x, j=9){
  use <- x > j
  x[use]
}
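A quick way to check these definitions is to call them on a small vector. The sketch below repeats the function bodies from above so it runs on its own; the vector v is made up for illustration, and the results in the comments were worked out by hand.

```r
# Definitions as in the text, repeated so this block runs on its own
add2 <- function(x, y) {
  x + y
}

above10 <- function(x) {
  x[x > 10]
}

above_n <- function(x, n) {
  use <- x > n   # logical vector: TRUE where x exceeds n
  x[use]
}

above_15 <- function(x, j = 9) {
  use <- x > j   # j falls back to 9 when the caller does not supply it
  x[use]
}

v <- c(3, 8, 12, 15, 20)   # hypothetical example data

add2(3, 5)           # 8
above10(v)           # 12 15 20
above_n(v, 14)       # 15 20
above_15(v)          # uses the default j = 9: 12 15 20
above_15(v, j = 16)  # overrides the default: 20
```

Calling above_n(v) without n would fail with an error that n is missing, which is exactly the situation a default value such as j = 9 avoids.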
- Take a data.frame, go through each of its columns and compute the mean of each column
columnmean <- function(x){
  # get the number of columns
  nc <- ncol(x)
  # initialise/declare a vector which will hold the means we calculate,
  # much like declaring a list in Python or Java to hold your results from a loop
  means <- numeric(nc)
  for(i in seq_len(nc)){
    means[i] <- mean(x[, i])
  }
  means ## since this is the last expression, its value is returned
}

# If a column in the dataset has NAs, mean() returns NA for that column.
# So we add an argument, initialise it to TRUE, and pass it on to
# mean() inside the for loop
columnmean_No_NA <- function(x, removeNAs = TRUE){
  # get the number of columns
  nc <- ncol(x)
  # vector which will hold the means we calculate
  means <- numeric(nc)
  for(i in seq_len(nc)){
    means[i] <- mean(x[, i], na.rm = removeNAs) ## na.rm drops NAs before averaging
  }
  means ## the last expression is the value returned
}
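As a sketch of how this behaves, the block below runs columnmean_No_NA on a small, made-up data frame df with one missing value. It also shows two base R alternatives that do the same job without an explicit loop, colMeans() and vapply() applying mean() to each column; these are swapped in for comparison and are not part of the original approach.

```r
columnmean_No_NA <- function(x, removeNAs = TRUE) {
  nc <- ncol(x)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(x[, i], na.rm = removeNAs)
  }
  means
}

# Hypothetical data: column a has one NA
df <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

columnmean_No_NA(df)                     # 2.333333 25.000000
columnmean_No_NA(df, removeNAs = FALSE)  # NA 25

# Built-in, vectorised equivalent (returns a named vector)
colMeans(df, na.rm = TRUE)               # a = 2.333333, b = 25

# vapply applies mean() to each column and checks each result is a single number
vapply(df, mean, numeric(1), na.rm = TRUE)
```

Using seq_len(nc) rather than 1:nc in the loop means a data frame with zero columns simply skips the loop instead of trying to index column 1.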
- Match by NAME or by POSITION: when calling a function that has arguments in its definition, we pass values for those arguments. The values we pass can be matched to the function’s defined arguments either by NAME or by POSITION. Let’s look at this in the code below.
Let’s generate some random normal values:
mydata <- rnorm(100)
Let’s get the standard deviation of mydata. The standard deviation function sd is defined as sd(x, na.rm = FALSE), with the arguments:
- x: a numeric vector or an R object which is coercible to one by as.double(x).
- na.rm: logical. Should missing values be removed?
We will call the standard deviation function WITHOUT naming the argument x in our call; we simply pass our data to the function as below.
sd(mydata)
This assigns the passed argument (mydata) to the first argument in the definition of sd, which is x (hence x = mydata). In this sense, our argument has been matched by position.
We can also match by NAME by explicitly naming the argument x and assigning our data to it:
sd(x=mydata) # match argument by name
One thing to note is that when you name the arguments (match by name), you do not have to put them in any specific order. You simply have to refer to the arguments by name, and that will work:
sd(na.rm = FALSE, x = mydata)
If only one of the passed arguments is named, that one is matched by name first, and the remaining unnamed argument is assigned by position to the next argument that has not yet been matched. In the code below, na.rm = FALSE has been matched by name, so na.rm is taken out of the list of arguments still waiting for a match. However, mydata has been passed without naming x, so it is assigned positionally to the first argument of sd that does not yet have a match; in this case, argument x.
sd(na.rm = FALSE, mydata)
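The four calls above should all give exactly the same result, since they bind the same values to x and na.rm. A small sketch to check this; set.seed(1) is an addition so the random data is reproducible, and formals() lists the argument names that matching works against.

```r
set.seed(1)          # assumption: fixed seed so the example is reproducible
mydata <- rnorm(100)

by_position <- sd(mydata)                     # x matched by position
by_name     <- sd(x = mydata)                 # x matched by name
reordered   <- sd(na.rm = FALSE, x = mydata)  # both named, so order does not matter
mixed       <- sd(na.rm = FALSE, mydata)      # na.rm by name, mydata fills x by position

# All four calls bind the same values, so the results are identical
identical(by_position, by_name)    # TRUE
identical(by_position, reordered)  # TRUE
identical(by_position, mixed)      # TRUE

# The formal arguments that names and positions are matched against
names(formals(sd))                 # "x" "na.rm"
```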
- Lazy evaluation (call-by-need): arguments to a function are evaluated only as and when they are needed. One consequence is that an argument which is declared but never used in the body of the function does not have to be supplied at all. For instance, in the function below:
call_when_needed <- function(a, b){ a^3 }
Calling the function with just one value, without naming the argument, matches that value by position. The passed value therefore goes to argument a (the first argument in the definition of call_when_needed). Argument b is never used in the body, so the call works even though b has not been provided:
call_when_needed(3)
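Lazy evaluation can be made visible by passing an argument that would fail if it were ever evaluated. In the sketch below, the stop() passed as b never runs because b is never needed. The second function, uses_b, is a hypothetical helper added only to show what happens once the body does touch a missing b; tryCatch() captures the error so the script keeps running.

```r
call_when_needed <- function(a, b) {
  a^3
}

call_when_needed(3)                            # 27: b is never needed, so never evaluated
call_when_needed(3, stop("b was evaluated"))   # still 27: the stop() is never run

# Hypothetical helper whose body does use b
uses_b <- function(a, b) {
  print(a^3)  # runs before b is ever touched
  a + b       # b is forced here, and fails because it was not supplied
}

result <- tryCatch(uses_b(3), error = function(e) conditionMessage(e))
result      # the error message, saying that argument "b" is missing
```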
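- “...”: This is the argument to define in a function when we cannot know in advance all the arguments that may be passed to it. It means the function can accept a varying number of arguments in place of the “...”, which are often passed on to another function.
- Lexical Scoping: The values of free variables in a function (variables that are neither arguments nor defined locally, like n inside pow below) are looked up in the environment where the function was defined, not where it is called.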
make.power <- function(n){
  pow <- function(x){
    x^n
  }
  pow
}
cube <- make.power(3)
cube
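Both ideas can be seen in a short sketch. my_mean and count_args are hypothetical helpers written for this illustration: the first forwards whatever extra arguments it receives through ... to mean(), and the second just counts them. The make.power part repeats the code above and then shows that cube remembers n = 3 from the environment it was created in.

```r
# '...' collects extra arguments and passes them on unchanged
my_mean <- function(x, ...) {
  mean(x, ...)   # e.g. na.rm = TRUE is forwarded to mean()
}
my_mean(c(1, 2, NA, 4), na.rm = TRUE)   # 2.333333

# '...' also lets a function accept any number of arguments
count_args <- function(...) {
  length(list(...))
}
count_args(1, "a", TRUE)   # 3

# Lexical scoping: n is looked up where pow was defined (inside make.power)
make.power <- function(n) {
  pow <- function(x) {
    x^n
  }
  pow
}
cube   <- make.power(3)
square <- make.power(2)
cube(3)     # 27
square(3)   # 9

# Each returned function carries its own defining environment, holding its own n
ls(environment(cube))                    # "n" "pow"
get("n", envir = environment(cube))      # 3
get("n", envir = environment(square))    # 2
```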
Dates
- Dates in R are stored as objects of class Date. A Date is basically a YEAR, MONTH and DAY, stored internally as the number of days since 1970-01-01.
Example: converting a character string to a Date
x <- as.Date("1970-01-01")
Let’s get the number of days since 1970-01-01 (0 here, since x is that very date):
unclass(x)
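A few more things Date objects support, as a small sketch with made-up dates: parsing a non-ISO string with an explicit format, adding days, and subtracting two dates. The day count in the comment was computed by hand from 1970-01-01.

```r
x <- as.Date("1970-01-01")
unclass(x)                     # 0: days since 1970-01-01

d <- as.Date("2012-01-10")     # hypothetical example date
unclass(d)                     # 15349 days since 1970-01-01
class(d)                       # "Date"

# Strings not in the default YYYY-MM-DD form need an explicit format
as.Date("10/01/2012", format = "%d/%m/%Y")   # "2012-01-10"

# Date arithmetic works in days
d + 30                         # "2012-02-09"
d - as.Date("2011-12-09")      # Time difference of 32 days
```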
- POSIXlt and POSIXct
POSIXlt stores the date-time as a LIST of components (seconds, minutes, hours, day, month, year and so on), while POSIXct stores the date-time as one large number: the number of seconds passed since 1970-01-01 (UTC).
curr_time <- Sys.time()
curr_time
Let’s convert to POSIXlt, which stores the date as a LIST:
p <- as.POSIXlt(curr_time)
Let’s get the names of the components of the POSIXlt date:
names(unclass(p))
Let’s check the class of “p”
class(p)
Let’s extract just one element, e.g. sec (seconds):
p$sec
Let’s convert to POSIXct: the number of seconds since 1970-01-01
pct <- as.POSIXct(curr_time)
Let’s get the underlying number of seconds since 1970-01-01:
unclass(pct)
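Because Sys.time() changes on every run, the sketch below uses a fixed, made-up time in UTC instead, so the values are reproducible. It shows some details of the POSIXlt components that are easy to trip over (year counts from 1900, months from 0) and uses difftime() for the gap between two times.

```r
t_ct <- as.POSIXct("2012-01-10 10:40:00", tz = "UTC")   # hypothetical fixed time
t_lt <- as.POSIXlt(t_ct)   # keeps the UTC time zone from t_ct

as.numeric(t_ct)    # 1326192000 seconds since 1970-01-01 00:00:00 UTC
t_lt$year + 1900    # 2012: year is stored as years since 1900
t_lt$mon + 1        # 1: months are stored as 0-11
t_lt$mday           # 10: day of the month
t_lt$hour           # 10
t_lt$min            # 40
t_lt$wday           # 2: day of the week, with 0 = Sunday

# Adding a number to a POSIXct adds seconds
later <- t_ct + 3600
difftime(later, t_ct, units = "mins")   # Time difference of 60 mins
```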
- strptime: This function converts character strings to date-time objects. The class of the returned object is POSIXlt. Let’s see an example below:
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
str_time <- strptime(datestring, "%B %d, %Y %H:%M")
str_time
Let’s check the class returned by the strptime function:
class(str_time)
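A slightly extended version of the example, as a sketch: passing tz = "UTC" is an addition here so the result does not depend on the machine’s time zone, and format() and difftime() are used to inspect the parsed values. %B matches month names in the current locale, so on a non-English locale the English names may fail to parse and give NA.

```r
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")

# tz = "UTC" is an assumption added for reproducibility
str_time <- strptime(datestring, "%B %d, %Y %H:%M", tz = "UTC")

class(str_time)                       # "POSIXlt" "POSIXt"
format(str_time, "%Y-%m-%d %H:%M")    # "2012-01-10 10:40" "2011-12-09 09:10"

# Elapsed time between the two parsed values: 32 days and 1.5 hours
difftime(str_time[1], str_time[2], units = "days")   # Time difference of 32.0625 days
```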
This has been a quick, basic overview of functions and dates in R. Feel free to post any questions in the comments below.
Summary of what was learnt
In this short introduction to the structure of R functions, we looked at:
- Functions
- Calling functions by NAME or POSITION
- Dates, POSIXlt and POSIXct