Some Basic Introduction to Functions and Dates in R

R is a powerful language widely used in data science, data analytics and statistics in general. Knowing how R works will really help in your data analysis projects. This short article takes a quick look at the basic structure of a function in R, and also a peek at dates in R. (Credit to the Data Science Specialisation by Johns Hopkins University on Coursera.)

Let’s take a look at the basic syntax of functions in R:

Create a function that takes 2 arguments and adds the arguments together
```
add2 <- function(x, y){
  x + y
}
```
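As a quick check of how add2 behaves, here is a small sketch with made-up input values. Because `+` is vectorised in R, the same function also works on whole vectors.

```r
add2 <- function(x, y) {
  x + y
}

add2(3, 5)            # 8
add2(c(1, 2, 3), 10)  # 11 12 13 (10 is recycled across the vector)
```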

Return all values from a vector greater than 10

```
above10 <- function(x){
  x[x > 10]
}
```

The same idea, but with the threshold n supplied by the caller

```
above_n <- function(x, n){
  use <- x > n ##define a logical vector for numbers greater than "n"
  x[use]
}
```
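To see the two functions in action, here is a minimal sketch. The vector `values` is hypothetical sample data, not something from the original course material.

```r
above10 <- function(x) {
  x[x > 10]
}

above_n <- function(x, n) {
  use <- x > n  # logical vector: TRUE where x is greater than n
  x[use]
}

values <- c(4, 11, 25, 9, 18)  # hypothetical sample data

above10(values)      # 11 25 18
above_n(values, 15)  # 25 18
```

Note that logical subsetting keeps the original order of the elements.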

Arguments can be given default values in the function definition, so that if the caller does not supply a value for the argument, the default value is used. This is really helpful when a function has lots of arguments, some of which you rarely intend to change. In the function below, j defaults to 9 (so, despite its name, above_15 uses a threshold of 9 unless told otherwise).
```
above_15 <- function(x, j = 9){
  use <- x > j
  x[use]
}
```
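The effect of the default can be sketched as follows, again with a hypothetical `values` vector. The first call relies on the default, the other two override it by position and by name.

```r
above_15 <- function(x, j = 9) {
  use <- x > j
  x[use]
}

values <- c(3, 8, 10, 14, 21)  # hypothetical sample data

above_15(values)          # j falls back to 9:  10 14 21
above_15(values, 12)      # j supplied by position: 14 21
above_15(values, j = 20)  # j supplied by name: 21
```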

Take a data.frame and compute the mean of each of its columns

```
columnmean <- function(x){
  #get the number of columns
  nc <- ncol(x)

  #initialise/declare the vector which will hold the means we will be calculating,
  #much the same as declaring a list in Python or Java to hold your results from a loop
  means <- numeric(nc)

  #seq_len(nc) is used instead of 1:nc so the loop is skipped when there are no columns
  for(i in seq_len(nc)){
    means[i] <- mean(x[, i])
  }
  means ## since this is the last expression, this is the value which will be returned
}
```
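Here is a small usage sketch for columnmean. The `scores` data frame is invented for illustration; the result is compared against base R's `colMeans()`, which does the same job without an explicit loop.

```r
columnmean <- function(x) {
  nc <- ncol(x)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(x[, i])
  }
  means
}

# hypothetical data frame with two numeric columns
scores <- data.frame(maths = c(60, 70, 80), physics = c(50, 65, 90))

loop_means <- columnmean(scores)
loop_means        # 70.00000 68.33333

# colMeans() returns the same values, with the column names attached
colMeans(scores)
```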





If a column in the dataset contains NAs, mean() returns NA for that column. So we have to find a way to remove the NAs from the calculation. We can add an argument that defaults to TRUE and pass it on to the na.rm argument of the mean function inside our for loop.

```
columnmean_No_NA <- function(x, removeNAs = TRUE){
  #get the number of columns
  nc <- ncol(x)

  #initialise/declare the vector which will hold the means we will be calculating,
  #much the same as declaring a list in Python or Java to hold your results from a loop
  means <- numeric(nc)

  for(i in seq_len(nc)){
    means[i] <- mean(x[, i], na.rm = removeNAs) ## drop NAs before taking the mean
  }
  means ## since this is the last expression, this is the value which will be returned
}
```
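The difference the removeNAs argument makes can be sketched with a tiny data frame that contains one missing value. The `readings` data frame is hypothetical.

```r
columnmean_No_NA <- function(x, removeNAs = TRUE) {
  nc <- ncol(x)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(x[, i], na.rm = removeNAs)
  }
  means
}

# hypothetical data frame; the temp column has a missing value
readings <- data.frame(temp = c(20, NA, 24), wind = c(5, 7, 9))

with_removal    <- columnmean_No_NA(readings)                     # 22  7
without_removal <- columnmean_No_NA(readings, removeNAs = FALSE)  # NA  7
```

Only the column that actually contains an NA is affected when removeNAs is FALSE.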

Match by NAME or by POSITION: When calling a function, the values we pass can be matched to the arguments in the function’s definition by NAME or by POSITION. Let’s take a look at this in the code below.
Let’s generate some random normal values
```
mydata <-rnorm(100)
```
Let’s get the standard deviation of mydata. The standard deviation function “sd” is defined as sd(x, na.rm = FALSE), with these arguments:

x: a numeric vector or an R object which is coercible to one by as.double(x).

na.rm: logical. Should missing values be removed?

We will call the standard deviation function by NOT naming the argument (x) in our call but we will simply pass our data to the function as below.
```
sd(mydata)
```
This assigns the passed argument (mydata) to the first argument in the definition of the sd function, which is x (hence x = mydata). In this sense our argument has been matched by position.
We can also match by NAME, by explicitly naming the argument x and assigning it the value we are passing
```
sd(x=mydata)  #match argument by name
```
One thing to note is that when you name the arguments (match by name), you do not have to put them in any specific order. You simply have to refer to the arguments by the names used in the function definition, and that will work
```
sd(na.rm =FALSE, x=mydata)
```
If only some of the passed arguments are named, the named ones are matched first, and each remaining unnamed argument is then assigned by position to the next argument that has not yet been matched. In the code below, na.rm = FALSE is matched by name, so na.rm is taken out of consideration. mydata is passed without naming the argument x, so it is assigned by position to the first argument of the sd function that has not yet been matched; in this case, argument x
```
sd(na.rm = FALSE, mydata)
```
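To confirm that all four calling styles are equivalent, here is a minimal sketch. The seed value is an arbitrary assumption, used only to make the random data reproducible.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
mydata <- rnorm(100)

by_position <- sd(mydata)                   # matched by position
by_name     <- sd(x = mydata)               # matched by name
reordered   <- sd(na.rm = FALSE, x = mydata)  # named, in a different order
mixed       <- sd(na.rm = FALSE, mydata)      # one named, one positional

# all four calls bind x = mydata and na.rm = FALSE, so the results agree
all_equal <- identical(by_position, by_name) &&
  identical(by_name, reordered) &&
  identical(reordered, mixed)
all_equal  # TRUE
```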
Lazy evaluation or call-by-need: Arguments to a function are evaluated only as and when they are needed. An argument that is declared in the function definition but never used in the body is therefore never evaluated. For instance, in the function below, b is declared but never used:
```
call_when_needed <- function(a, b){
  a^3
}
```
Calling the function with a single unnamed value matches that value by position, so it is bound to argument a (the first argument in the definition of call_when_needed). Argument b is never evaluated, so no error is raised even though it was not provided when calling the function
```
call_when_needed(3)
```
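The following sketch makes the laziness visible. The helper `uses_both` is a hypothetical function added for contrast: it does touch b, so it fails only at the point where b is actually needed.

```r
call_when_needed <- function(a, b) {
  a^3
}

call_when_needed(3)  # 27

# b is supplied as an expression that would raise an error if evaluated,
# but the body never uses b, so the error never happens
lazy_result <- call_when_needed(3, stop("b was evaluated"))
lazy_result  # 27

# hypothetical contrast: this function does use b
uses_both <- function(a, b) {
  print(a)
  print(b)  # b is needed here, so a missing b fails at this line
}

# print(a) runs first and prints 3, then the missing b raises an error
outcome <- tryCatch({
  uses_both(3)
  "no error"
}, error = function(e) "error raised")
outcome  # "error raised"
```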
“…”: This argument is used in a function definition when we cannot know in advance all the arguments that will be passed to the function. The function can then accept a variable number of arguments in place of the “…”.
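Two common uses of this argument can be sketched as below: forwarding extra arguments to another function, and collecting an arbitrary number of values. Both function names (`mean_with_extras`, `count_args`) are hypothetical, chosen only for illustration.

```r
# forward any extra arguments straight to mean()
mean_with_extras <- function(x, ...) {
  mean(x, ...)
}

plain   <- mean_with_extras(c(1, 2, NA, 4))                # NA
cleaned <- mean_with_extras(c(1, 2, NA, 4), na.rm = TRUE)  # 2.333333

# collect whatever was passed and count it
count_args <- function(...) {
  args <- list(...)
  length(args)
}

count_args(1, "a", TRUE)  # 3
count_args()              # 0
```

In the first function, na.rm is not declared anywhere in mean_with_extras; it simply travels through to mean().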

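Lexical Scoping: R uses lexical scoping, which means the value of a free variable in a function is looked up in the environment where the function was defined. In the code below, pow uses n, which is not one of its own arguments; n is found in the environment of make.power, where pow was created.

```
make.power <- function(n){
  pow <- function(x){
    x^n
  }
  pow
}

cube <- make.power(3)
cube ## prints the function pow together with its enclosing environment
```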
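Calling the returned functions shows that each one remembers its own n. The sketch below also peeks into the function's environment using base R's `environment()`, `ls()` and `get()`; the `square` function is an extra example, not from the original.

```r
make.power <- function(n) {
  pow <- function(x) {
    x^n
  }
  pow
}

cube   <- make.power(3)
square <- make.power(2)  # extra example for comparison

cube(3)    # 27
square(3)  # 9

# the environment where pow was defined holds both n and pow
ls(environment(cube))           # "n"   "pow"
get("n", environment(cube))     # 3
get("n", environment(square))   # 2
```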

Dates

Dates in R are stored as objects of the Date class. A Date is basically a YEAR, MONTH and DAY.
Example: converting a character string to a Date
```
x <- as.Date("1970-01-01")
```
Let’s get the number of days since 1970-01-01 (internally a Date is stored as this number, so here the result is 0)
```
unclass(x)
```
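Because a Date is just a day count underneath, ordinary arithmetic works on it. A minimal sketch, using two arbitrary example dates:

```r
x <- as.Date("1970-01-01")
unclass(x)                      # 0
unclass(as.Date("1970-01-02"))  # 1

# two arbitrary example dates
d1 <- as.Date("2012-01-10")
d2 <- as.Date("2011-12-09")

gap <- d1 - d2      # a difftime object measured in days
as.numeric(gap)     # 32

d2 + 32 == d1       # TRUE: adding days to a Date gives another Date

format(d1, "%d/%m/%Y")  # "10/01/2012"
```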
POSIXlt and POSIXct

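POSIXlt stores the datetime as a LIST of components (seconds, minutes, hours and so on), while POSIXct stores the datetime as a single number, which is the number of seconds passed since 1970-01-01 (UTC). Sys.time() returns the current time as a POSIXct object.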
```
curr_time <- Sys.time()
curr_time
```
Let’s convert to POSIXlt, which stores the datetime as a LIST
```
p <- as.POSIXlt(curr_time)
```
Let’s get the names of the components of the POSIXlt object
```
names(unclass(p))
```
Let’s check the class of “p”
```
class(p)
```
Let’s extract only one element, e.g. sec
```
p$sec
```
Let’s get the POSIXct version, the number of seconds since 1970-01-01 (curr_time is already a POSIXct object, so this conversion leaves it unchanged)
```
pct <- as.POSIXct(curr_time)
```
Let’s get the underlying numeric value, the number of seconds since 1970-01-01
```
unclass(pct)
```
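Since Sys.time() changes on every run, the sketch below uses a fixed, arbitrary timestamp in UTC so the component values are predictable. It shows the same conversions as above, plus the two offsets that often surprise people: year is stored as years since 1900 and mon is zero-based.

```r
# fixed example timestamp (arbitrary), pinned to UTC for reproducibility
fixed <- as.POSIXct("2012-01-10 10:40:30", tz = "UTC")

class(fixed)               # "POSIXct" "POSIXt"
is.numeric(unclass(fixed)) # TRUE: a plain count of seconds

lt <- as.POSIXlt(fixed)
lt$hour         # 10
lt$min          # 40
lt$sec          # 30
lt$year + 1900  # 2012 (year is stored as years since 1900)
lt$mon + 1      # 1    (mon runs from 0 to 11)

# converting back gives the same instant
round_trip <- as.POSIXct(lt)
round_trip == fixed  # TRUE
```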
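strptime: This function is used to convert character strings to datetime objects. The class of the object returned is POSIXlt. Let’s see an example below (note that %B matches the full month name in the current locale, so this example assumes an English locale):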
```
datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
str_time <- strptime(datestring, "%B %d, %Y %H:%M")
str_time
```
Let’s check the class returned from the strptime function
```
class(str_time)
```
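Once strings are parsed, the resulting datetimes can be subtracted. The sketch below uses purely numeric formats (an assumption made here so the example does not depend on the locale's month names) and also shows that a string which does not match the format comes back as NA rather than raising an error.

```r
# same two moments as above, written in a locale-independent numeric format
datestring <- c("2012-01-10 10:40", "2011-12-09 09:10")
parsed <- strptime(datestring, "%Y-%m-%d %H:%M", tz = "UTC")

class(parsed)  # "POSIXlt" "POSIXt"

# difference between the two datetimes, expressed in hours
gap_hours <- as.numeric(difftime(parsed[1], parsed[2], units = "hours"))
gap_hours      # 769.5 (32 days and 1.5 hours)

# a string that does not match the format is returned as NA
bad <- strptime("not a date", "%Y-%m-%d", tz = "UTC")
is.na(bad)     # TRUE
```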

This is a quick basic highlight of functions and dates in R. Feel free to put any queries in the comments box below.

Summary of what was learnt

In this short intro to the structure of R functions, we looked at

Functions
Calling functions by NAME or POSITION
Dates, POSIXlt and POSIXct


© 2023 DataPandas