
Java Python R SQL Excel Compared Similarities For Data Science and Data Analytics – The Basics

If you have ever worked with Java, Python, R, SQL, Excel and other languages on varied Data Science or Data Analytics projects, you will have realised that these languages have similar syntax, or at least can achieve the same objective with very similar code.

 

Below is a comparison of these tools and languages, which I put together for quick reference when I realised how similar they are.

Though it is not complete and exhaustive, I believe it will be of help to someone.

If you know of any other similarities amongst these languages, please leave them as a comment in the comment box and I will update the table.

I hope this helps everyone who is new to Data Science and Data Analytics, as I wish I had something similar when I first started. Thank you for your contribution.

 

Navigating the table:

Please

  1. Use the search box located directly on the top right hand corner of the table to search for terms
  2. Use the scroll bar at the bottom of the table to see all columns
  3. The table has 7 columns: Java, Python, R, SQL, Excel, Example, and Description

 

Java, Python, R, SQL, Excel Similarities For Data Science - The Basics

Java | Python | R | SQL | Excel | Example | Description
Java: instanceof | Python: isinstance | Description: tests whether the object is an instance of the specified type (class, subclass, or interface)
Python: normalize | Excel: % | Example: p7 = sub2["USFREQMO"].value_counts(sort = False, normalize = True) | Description: get the values as percentages; normalize = True returns proportions instead of counts
Python: sort_values() | R: sort(), order(), arrange() | Excel: sort | Example: p7 = sub2["USFREQMO"].value_counts(sort = False, normalize = True) | Description: sort the data; in value_counts() the default is sort = True (descending order of counts)
Python: sort_index(ascending=True) | Excel: sort | Description: sort by index and not by the values
Python: assign | R (dplyr) or SQL: row_number() | Excel: countifs, sumifs | Example: data.assign(dupes=data.groupby(['AID']).cumcount()+1).query('dupes>1')[:6] | Description: check the number of occurrences of a particular variable
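Python: import | R: library | Example: Python import pandas; R library("swirl")
Python: pip install | R: install.packages | Example: Python pip install virtualenv; R install.packages("swirl")
Python: getcwd() | R: getwd() | Description: get the working directory
Python: pandas.read_csv() | R: read.csv() | Example: read.csv("mydata.csv") | Description: read a CSV file
R: dir() | Description: list the contents of the current working directory
R: ls() | Description: list the variables and functions available to the current console
R: source() | Example: source("myCode.R") | Description: load and run a file from the directory
Python: = | R: <- | Example: R x <- 1; Python x = 1 | Description: assignment operator
Java: // | Python: # | R: # | Example: # comment | Description: single-line comment
Java: /* */ | Python: ''' ''' | Description: block comment
Python: range(1,20) | R: 1:20 | Description: the colon indicates a sequence, meaning 1 up to 20. Note that Python's range(1, 20) stops at 19; use range(1, 21) to include 20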
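R: vectors | Description: can contain ONLY one data type
R: list | Description: a type of vector that can contain mixed data types
R: dimnames | Description: dimension names of the object, i.e. the list, matrix, data frame, or base vector
Python: concat() | R: c() | Example: c(0.5, 0.6) | Description: concatenate
R: vector(class, length) | Example: vector("numeric", length = 10) | Description: vector() takes two arguments, the class and the length
Java, Python, R: " " | Example: "string" | Description: string quotation
Python: True = 1, False = 0 | R: TRUE = 1, FALSE = 0 | Description: numeric values of the boolean values True and False
Python: to_numeric() | R: as.numeric() | Example: as.numeric("6") | Description: cast to a numeric data type
Python: type() | R: class() | Example: x = 6, class(x) | Description: find the data type (class) of the variable x
Java: cast or () | R: as | Example: as.numeric(), as.logical() | Description: cast from one data type to another
Python: String | R: character | Example: R "character"; Python "String" | Description: character is the name given to strings in the R environment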
Python: indexing starts at 0 | R: indexing starts at 1
Python: a list is indexed with single brackets, e.g. [1] | R: a list is indexed with double brackets, e.g. [[1]] | Description: indicates the item at position 1
R: matrices | Description: they are VECTORS with a DIMENSION attribute, which means they have rows and columns. A matrix can contain only ONE data class
R: matrices | Description: a matrix can store only one data class, not mixed types
Python: shape | R: dim() | Example: dim(m) | Description: get the dimensions of a matrix (R) or DataFrame (Python). The first number is the ROWS and the second is the COLUMNS
R: matrices are created column-wise | Description: R takes the data and fills the first column before moving on. If the number of rows is 2, it fills 2 rows down the first column, then moves to the next column and fills 2 rows down it
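To make the column-wise fill concrete, here is a minimal Python sketch that mimics how R's matrix(1:6, nrow = 2) lays out its values. The helper name fill_column_wise is hypothetical and it works on plain lists rather than on any matrix library.

```python
def fill_column_wise(values, nrow):
    """Arrange values into nrow rows, filling each column top to bottom first."""
    ncol = len(values) // nrow
    # Element (r, c) comes from position c * nrow + r in the flat input.
    return [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]


matrix = fill_column_wise([1, 2, 3, 4, 5, 6], nrow=2)
print(matrix)  # [[1, 3, 5], [2, 4, 6]]
```

Reading the result row by row gives 1 3 5 and 2 4 6, which is the layout R prints for the same input.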
R: factor data type | Description: used to store categorical variables
Python: value_counts() | R: table(f) | Example: table(factor variable) | Description: frequency counts
Python: NaN | R: NA or NaN | Description: missing values
R: is.na() | Description: tests whether objects are NA
R: is.nan() | Description: tests for NaN
R: a NaN value is also NA, but the converse is not true
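Plain Python has no direct counterpart to R's NA, so the following is only a rough sketch of the NA / NaN relationship. It assumes None stands in for NA, and the helper names is_na and is_nan are hypothetical.

```python
import math

nan_value = float("nan")   # "not a number", like R's NaN
missing = None             # stand-in for R's NA (an assumption for this sketch)


def is_nan(x):
    """True only for a float NaN, similar in spirit to R's is.nan()."""
    return isinstance(x, float) and math.isnan(x)


def is_na(x):
    """True for None or NaN, similar in spirit to R's is.na()."""
    return x is None or is_nan(x)


print(is_na(nan_value), is_nan(nan_value))  # True True
print(is_na(missing), is_nan(missing))      # True False
print(nan_value == nan_value)               # False: NaN never equals itself
```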
Python: len(df.index) | R: nrow() | Description: number of rows
Python: len(df.columns) | R: ncol() | Description: number of columns
Python: df.columns | R: names() | Description: find the names or labels of the object, such as the column names of a data frame
Java, Python: boolean | R: logical | Description: the boolean type is called logical in R
Python: DataFrame | R: data.frame | Description: the structure is called DataFrame in Python and data.frame in R
R: dput() | Description: dput() writes an object together with its type and metadata to a file; dget() reads it back later
dump()s0
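R: [ | Description: subsetting with a single [ always returns an object of the same CLASS. You can select multiple items via a sequence
R: [[ | Description: subsetting elements of a list or data frame. It can be used to select ONLY ONE ELEMENT of the list or data frame
Java: . | Python: . | R: $ | Description: used to select an element of a list or data frame by name; the semantics are similar to [[
Python: list[1:3] | R: list[c(1,3)] | Description: in R, subsets the 1st and 3rd elements of a list by passing the vector c(1, 3) to the single bracket. Single brackets are used to select multiple items. In Python, list[1:3] is a slice that returns the 2nd and 3rd elements, so it is only a rough counterpart
Java: ! | R: ! | Example: !TRUE | Description: NOT, the inverse of a logical value
Python: dropna | R: complete.cases, or create logical vectors for subsetting | Description: used to drop NA values from a large data set
Python: range(1,20,0.5) | R: seq(1,20,0.5) | Description: range from 1 to 20 in increments of 0.5. Python's built-in range() accepts only integers, so numpy.arange(1, 20.5, 0.5) is the usual counterpart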
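Because Python's built-in range() rejects a step of 0.5, a small helper is needed to imitate seq(1, 20, 0.5) without extra libraries. The sketch below is one way to do it; float_seq is a hypothetical name, not a standard function.

```python
# R's 1:20 includes both ends; Python's range() excludes the stop value.
one_to_twenty = list(range(1, 21))


def float_seq(start, stop, step):
    """Return start, start + step, ... without going past stop."""
    # The small epsilon guards against floating-point rounding in the division.
    count = int((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(count)]


half_steps = float_seq(1, 20, 0.5)
print(one_to_twenty[0], one_to_twenty[-1], len(one_to_twenty))  # 1 20 20
print(half_steps[:3], half_steps[-1], len(half_steps))          # [1.0, 1.5, 2.0] 20.0 39
```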
Java: length() | Python: len() | R: length() | SQL: len() | Excel: len() | Description: length of a vector
R: rep() | Example: rep(0, times=40) | Description: replicate the same value several times
Python: fundamental data types | R: atomic vectors | Description: e.g. numeric, logical (boolean in Python), etc.
Java: < | Python: < | R: < | SQL: < | Excel: < | Description: less than
Java: == | Python: == | R: == | SQL: = | Excel: = | Description: equality
Java: > | Python: > | R: > | SQL: > | Excel: > | Description: greater than
Java: != | Python: != | R: != | SQL: != or <> | Excel: <> | Description: not equal to
Java: "|" or "||" | Python: or | R: "|" or "||" | SQL: OR | Excel: OR | Description: logical OR. An expression using the OR operator evaluates to TRUE if the left operand or the right operand is TRUE. If both are TRUE, the expression evaluates to TRUE; if neither is TRUE, the expression is FALSE.
Java: & or && | Python: and | R: & or && | SQL: AND | Excel: AND | Description: logical AND. If the left and right operands of AND are both TRUE, the entire expression is TRUE; otherwise it is FALSE. For example, TRUE & TRUE evaluates to TRUE, while FALSE & FALSE evaluates to FALSE.
In R, the `&` operator evaluates AND across a vector. The `&&` version of AND only evaluates the first member of a vector (recent versions of R raise an error instead when given a longer vector). For example, TRUE & c(TRUE, FALSE, FALSE) returns TRUE FALSE FALSE.
Just as arithmetic has an order of operations, so do logical expressions. All AND operators are evaluated before OR operators. Here is an example of an ambiguous-looking case in R:
> 5 > 8 || 6 != 8 && 4 > 3.9
[1] TRUE
First the left and right operands of the AND operator are evaluated: 6 is not equal to 8 and 4 is greater than 3.9, so both operands are TRUE and the expression `TRUE && TRUE` evaluates to TRUE. Then the left operand of the OR operator is evaluated: 5 is not greater than 8, so the entire expression reduces to FALSE || TRUE. Since the right operand of this expression is TRUE, the entire expression evaluates to TRUE.
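Python follows the same rule, with the keywords and / or in place of && / ||. The short sketch below repeats the R expression above and adds a second case where the grouping actually changes the answer.

```python
# Same expression as the R example, written with Python's keywords.
result = 5 > 8 or 6 != 8 and 4 > 3.9
# "and" binds tighter than "or", so this is the equivalent grouping:
explicit = (5 > 8) or ((6 != 8) and (4 > 3.9))
print(result, explicit)  # True True

# A case where precedence matters.
default_grouping = True or False and False    # True or (False and False)
forced_grouping = (True or False) and False
print(default_grouping, forced_grouping)      # True False

# Element-wise AND across a list, comparable to TRUE & c(TRUE, FALSE, FALSE) in R.
flags = [True, False, False]
elementwise = [True and flag for flag in flags]
print(elementwise)  # [True, False, False]
```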
R: identical(), isTRUE(), xor() | Description: extra built-in R functions for logical operations. xor is called exclusive OR
R: which() | Description: takes a logical vector as an argument and returns the indices of the elements that are TRUE. For example, which(c(TRUE, FALSE, TRUE)) returns the vector c(1, 3)
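Python has no built-in which(), but the same idea can be sketched with enumerate(). The function name below is borrowed from R for illustration only. The positions it returns are zero-based, so 1 is added to match R's answer.

```python
def which(flags):
    """Return the zero-based positions of the True values in flags."""
    return [i for i, flag in enumerate(flags) if flag]


positions = which([True, False, True])
print(positions)                   # [0, 2]
print([p + 1 for p in positions])  # [1, 3], matching R's 1-based answer
```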
Java: % | Python: % | R: %% | Description: modulus operator
R: paste(my_char, collapse = " ") | Description: collapse a character vector of length greater than one into a character vector of length one
R: sum(my_na) | Description: get the count of TRUEs (where my_na is a logical vector)
R: NA | Description: NA means not available, a placeholder for a missing value
R: NaN | Description: NaN means not a number
R: y <- x[!is.na(x)] | Description: find all values in a vector that are not NA
Java: zero-based indexing | Python: zero-based indexing | R: 1-based indexing | SQL: 1-based indexing | Description: the first element of a list or vector is at index 1 in R, whereas it is at index 0 in Python and Java
R: x[c(-2,-10)] or x[-c(2,10)] | Description: all elements of the vector or list x except the ones at index 2 and 10 (indicated by the negative signs)
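Negative indices mean something different in Python: they count from the end instead of dropping elements. The sketch below shows both behaviours, along with a counterpart to x[!is.na(x)]. The helper drop_positions is a hypothetical name, and None is assumed to stand in for NA.

```python
x = [10, 20, 30, 40, 50]

# In Python a negative index counts from the end; it does not drop anything.
last = x[-1]


def drop_positions(values, positions):
    """Return values without the given 1-based positions, like x[-c(...)] in R."""
    skip = set(positions)
    return [v for i, v in enumerate(values, start=1) if i not in skip]


kept = drop_positions(x, [2, 5])
print(last)  # 50
print(kept)  # [10, 30, 40]

# Keep only the values that are not missing, like y <- x[!is.na(x)].
with_gaps = [1, None, 3, None]
clean = [v for v in with_gaps if v is not None]
print(clean)  # [1, 3]
```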
Python: row-wise | R: column-wise | Description: R constructs a data frame or matrix along the columns, filling one column before continuing to the next; the opposite is true for Python
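Java:
if (condition) {
    // doSomething
} else if (otherCondition) {
    // doSomething
} else {
    // doSomething
}
Python:
if condition:
    pass  # doSomething
elif other_condition:
    pass  # doSomething
else:
    pass  # doSomething
R:
if (condition) {
  # doSomething
} else if (otherCondition) {
  # doSomething
} else {
  # doSomething
}
Description: R and Java use the same brace style for if / else if / else, whereas Python uses colons and indentation without braces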
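As a runnable illustration of the Python form, here is a minimal sketch. The function describe and its labels are made up for this example.

```python
def describe(n):
    """Classify a number using if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


print(describe(-3), describe(0), describe(7))  # negative zero positive
```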
Java:
for (int i = 0; i < 10; i++) {
    System.out.println(i);
}
Python:
for i in range(0, 10):
    print(i)
R:
for (i in 1:10) {
  print(i)
}
Description: the R loop index and loop range inside the parentheses are similar to Python, but the outer curly brackets are similar to Java
R: repeat | Description: keep repeating until a condition is satisfied
R: next | Description: skip to the next iteration, without executing the code that comes right after the next statement. Can be used with any iteration or loop
R: break | Description: exit the loop entirely. Can be used with any iteration or loop
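Python spells these ideas slightly differently: while True plays the role of repeat and continue plays the role of next. Here is a minimal sketch, assuming we want the odd numbers up to 7.

```python
visited = []
i = 0
while True:            # plays the role of R's repeat
    i += 1
    if i % 2 == 0:
        continue       # like R's next: skip the rest of this iteration
    if i > 7:
        break          # leave the loop entirely
    visited.append(i)

print(visited)  # [1, 3, 5, 7]
```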
R: return | Description: exit a function and return the value that you pass to it
An R function returns the value of the last expression evaluated in the function
a_function <- function(){
#doSomethingHere
}
Functions in R are also objects, just like vector, data frame, and matrix objects
R: named arguments | Description: potentially have default values
R: formal arguments | Description: those included in the formal definition of the function
Function arguments can be matched by position or by name
In R, a function can be defined inside another function AND a function can RETURN a function as its return value
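Python functions behave in much the same way. The sketch below uses a made-up make_multiplier function to show a default value, matching by name and by position, and a function returned from another function.

```python
def make_multiplier(factor=2):
    """Return a new function; factor is a named argument with a default value."""
    def multiply(x):
        return x * factor
    return multiply


double = make_multiplier()            # uses the default value
triple = make_multiplier(factor=3)    # argument matched by name
times_four = make_multiplier(4)       # argument matched by position

print(double(5), triple(5), times_four(5))  # 10 15 20
print(callable(double))                     # True: functions are objects
```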
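R: Date | Description: R dates are stored as the number of days since 1970-01-01
R: unclass() | Description: applied to a Date, gives the number of days passed since 1970-01-01
R: Sys.time() | Description: current time of the operating system
R: strptime() | Description: converts character strings to date-time objects
R: difftime() | SQL: datediff()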
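The same date ideas can be sketched with Python's standard datetime module. The dates used here are arbitrary examples.

```python
from datetime import date, datetime

epoch = date(1970, 1, 1)
d = date(2020, 1, 1)
days_since_epoch = (d - epoch).days   # comparable to unclass(as.Date("2020-01-01"))
print(days_since_epoch)  # 18262

# Parse a string into a date-time object, comparable to strptime() in R.
parsed = datetime.strptime("2020-01-15 08:30", "%Y-%m-%d %H:%M")
gap = parsed - datetime(2020, 1, 1)   # a timedelta, comparable to difftime()
print(gap.days, gap.seconds)  # 14 30600
```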
Python: info() | R: str() | Description: get the data types and other information about the columns
Python: describe() | R: summary() | Description: get descriptive statistics about the data
R: lapply() | Description: takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one. The 'l' in 'lapply' stands for 'list'. It can also take a data.frame, because a data.frame is a list of vectors
R: sapply() | Description: simplifies the result returned by lapply, hence the "s" in front of sapply. In general, if the result is a list where every element is of length one, sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can't figure things out, it just returns a list, no different from what lapply() would give you
R: vapply() | Description: whereas sapply() tries to 'guess' the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn't match the format you specify, vapply() throws an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply()
R: tapply() | Description: splits your data into groups (levels) based on the value of some variable, then applies a function to the members of each group
R: apply() | Description: applies functions over an array or matrix
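Python has no apply family of its own, but comprehensions cover similar ground. The following is a rough sketch of lapply-like and tapply-like behaviour on plain lists and dictionaries. The helper group_apply is a hypothetical name and the data is invented.

```python
def group_apply(values, groups, func):
    """Split values by group label, then apply func to each group (tapply-like)."""
    buckets = {}
    for value, label in zip(values, groups):
        buckets.setdefault(label, []).append(value)
    return {label: func(members) for label, members in buckets.items()}


columns = {"a": [1, 2, 3], "b": [4, 5, 6]}

# lapply-like: apply a function to every element and keep one result per element.
totals = {name: sum(col) for name, col in columns.items()}
print(totals)  # {'a': 6, 'b': 15}

# tapply-like: mean of each group.
means = group_apply([10, 20, 30, 40], ["x", "y", "x", "y"], lambda m: sum(m) / len(m))
print(means)  # {'x': 20.0, 'y': 30.0}
```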
split()
Java, Python: objects or instances are made from classes | R: objects or instances can be made from functions
Java: toString() | Python: str | R: str | Description: tells you what is in an object or function
R: object.size() | Description: check how much memory the object occupies
R: colMeans() | Description: find the mean of each column
R: plot() | Description: short form for a scatter plot
R: hist() | Description: short form for a histogram
R: match | Excel: match | Example: match("city", names(chicago)) | Description: returns the index of the item being matched
Python: merge | R: merge | Description: join two data frames or objects on a common column
Python: groupby() | R: group_by() | Excel: pivot tables
R: %>% | Description: chain operator. The code to the right of %>% operates on the result of the code to the left of %>%
Python: str.contains | R: grepl | SQL: like | Example: Python df[df['A'].str.contains("hello")]; R df[grepl("hello", df$A), ] | Description: find all rows where the value in column "A" contains the text "hello"
Java: s.substring(2) | Python: s[2:] | R: substr("Hello",1,3) | SQL: substring("Hello",1,3) | Example: R substr("Hello",1,3); Python s[2:] | Description: get a substring from a string
Java: str.trim() | Python: str.strip() | R: str_trim("Hello ") | SQL: ltrim(), rtrim() | Excel: trim() | Description: strip or trim extra leading and trailing spaces from a string
Java: replace | Python: replace | R: sub, gsub | SQL: replace | Excel: replace | Example: R sub("_", "@", str); Python str.replace("_", "@"); Java str.replace('_', '@') | Description: replace or substitute an old character string with a new character string
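The string rows above can be tried out directly in Python. This is a small sketch using only built-in string methods, with made-up sample values.

```python
s = "Hello"

first_three = s[0:3]   # like substr("Hello", 1, 3) in R
from_third = s[2:]     # like s.substring(2) in Java
print(first_three, from_third)  # Hel llo

trimmed = "  Hello ".strip()          # removes leading and trailing spaces
print(trimmed)                        # Hello

replace_all = "a_b_c".replace("_", "@")       # every match, like gsub() in R
replace_first = "a_b_c".replace("_", "@", 1)  # first match only, like sub() in R
print(replace_all, replace_first)     # a@b@c a@b_c

# Keep only the values that contain "hello", like str.contains or grepl.
rows = ["hello world", "goodbye", "say hello"]
matches = [r for r in rows if "hello" in r]
print(matches)  # ['hello world', 'say hello']
```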