Java, Python, R, SQL and Excel Compared: Similarities for Data Science and Data Analytics - The Basics
If you have ever worked with Java, Python, R, SQL, Excel and other languages on varied Data Science or Data Analytics projects, you will have noticed that many of these languages have similar syntax, or at least can achieve the same objective with very similar code.
Below is a comparison of these languages and tools that I put together as a quick reference after I realised how similar some of them are.
It is not complete or exhaustive, but I believe it will help someone.
If you know of other similarities among these languages, please leave a comment in the comment box and I will update the table.
I hope this helps newcomers to Data Science and Data Analytics, as I wish I had had something like it when I first started. Thank you for your contributions.
The table below has seven columns: Java, Python, R, SQL, Excel, Example and Description.
Java, Python, R, SQL, Excel Similarities for Data Science - The Basics
| Java | Python | R | SQL | Excel | Example | Description |
|---|---|---|---|---|---|---|
| instanceof | isinstance() | | | | | tests whether an object is an instance of the specified type (class, subclass or interface) |
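| | value_counts(normalize=True) | | | % | p7 = sub2["USFREQMO"].value_counts(sort=False, normalize=True) | get the relative frequency of each value; normalize=True returns proportions (multiply by 100 for percentages) |
| | sort_values() | sort(), order(), arrange() (dplyr) | | Sort | p7 = sub2["USFREQMO"].value_counts(sort=False, normalize=True) | sort the data; sort_values() sorts in ascending order by default, and in value_counts() sort=True (the default) orders the counts in descending order, while sort=False does not sort by count |
| | sort_index(ascending=True) | | | Sort | | sort by the index (row labels) rather than by the values |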
| | assign() with groupby().cumcount() | | ROW_NUMBER() | COUNTIFS, SUMIFS | data.assign(dupes=data.groupby(['AID']).cumcount()+1).query('dupes>1')[:6] | number the occurrences of each value of a variable (here AID), e.g. to find duplicate records: rows with dupes > 1 are repeats |
| | import | library() | | | Python: import pandas; R: library("swirl") | load a package (library) |
| | pip install (run in a shell) | install.packages() | | | Python: pip install virtualenv; R: install.packages("swirl") | install a package |
| | os.getcwd() | getwd() | | | | get the current working directory |
| | pandas.read_csv() | read.csv() | | | read.csv("mydata.csv") | read a CSV file |
| | os.listdir() | dir() | | | | list the files in the current working directory |
| | | ls() | | | | list the variables and functions defined in the current environment (workspace) |
| | | source() | | | source("myCode.R") | run the code in an R script file, loading its objects and functions |
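| = | = | <- | | | R: x <- 1; Python: x = 1 | assignment operator |
| // | # | # | | | # comment | single-line comment |
| /* */ | """ """ or ''' ''' | | | | | block (multi-line) comment; Python has no true block comment, so a triple-quoted string is commonly used instead |
| | range(1, 21) | 1:20 | | | | the colon creates a sequence from 1 up to and including 20; Python's range() excludes its end point, so range(1, 21) is the equivalent |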
| | | vector | | | | (R) an atomic vector can contain only one data type |
| | | list | | | | a type of vector that can contain mixed data types |
| | | dimnames() | | | | the dimension names (row and column names) of a matrix, array or data frame |
| | concat() | c() | | | c(0.5, 0.6) | concatenate (combine) values into one object |
| | | vector(mode, length) | | | vector("numeric", length = 10) | create a vector of a given type (mode) and length |
| " " | " " or ' ' | " " or ' ' | | | "string" | string quotation marks |
| | True = 1, False = 0 | TRUE = 1, FALSE = 0 | | | | numeric values of the logical values TRUE and FALSE (Java's boolean cannot be used as a number) |
| | pandas.to_numeric() | as.numeric() | | | as.numeric("6") | convert to a numeric data type |
| | type() | class() | | | x <- 6; class(x) | find the data type (class) of the variable x |
| (type) cast, e.g. (int) x | int(), float(), str() | as.*() functions | | | as.numeric(), as.logical() | convert from one data type to another |
| String | str | character | | | class("a") is "character" in R; type("a") is str in Python | the string type is called character in R |
| 0-based | 0-based | 1-based | | | | indexing starts at 0 in Java and Python and at 1 in R |
| | my_list[0] | my_list[[1]] | | | | get the first element of a list: Python uses single brackets and 0-based indexes, R uses double brackets and 1-based indexes |
| | | matrix | | | | (R) a matrix is a vector with a dimension attribute, so it has rows and columns; like any atomic vector it can contain only ONE data type, not a mix |
| | .shape | dim() | | | dim(m) | get the dimensions of a matrix or data frame; the first number is the number of rows and the second the number of columns |
| | | matrix() fills column-wise | | | | by default matrix() puts the data down the first column first: with 2 rows it fills 2 rows of column 1, then moves to the next column (use byrow = TRUE to fill by rows) |
| | | factor | | | | stores categorical variables |
| | value_counts() | table() | | | table(f), where f is a factor | frequency counts of each value |
| | NaN | NA or NaN | | | | missing values |
| | | is.na() | | | | tests which elements are NA (also TRUE for NaN) |
| | | is.nan() | | | | tests which elements are NaN |
| | | | | | | (R) a NaN value is also NA, but an NA value is not NaN |
| | len(df.index) | nrow() | | | | number of rows |
| | len(df.columns) | ncol() | | | | number of columns |
| | df.columns | names() | | | | the names (labels) of an object, e.g. the column names of a data frame |
| boolean | bool | logical | | | | the Boolean type is called logical in R |
| | DataFrame (pandas) | data.frame | | | | a table of data is a DataFrame in pandas and a data.frame in R |
| | | dput() | | | | writes an R object (its data and metadata such as types) as R code, optionally to a file; dget() reads it back |
| | | dump() | | | | like dput(), but writes one or more named R objects to a file; source() reads them back |
| | | [ | | | | subsetting with a single [ always returns an object of the same class as the original; you can select several elements at once, e.g. with a sequence |
| | | [[ | | | | extracts a single element of a list or data frame |
| . | . | $ | | | R: df$col; pandas: df.col | selects an element of a list or data frame by name; the semantics are similar to [[ |
| | [x[i] for i in (0, 2)] | x[c(1, 3)] | | | | select the 1st and 3rd elements by passing the vector c(1, 3) to single brackets, which can select several items; Python's x[1:3] is a slice that returns the 2nd and 3rd elements instead |
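| ! | not | ! | NOT | NOT() | !TRUE | logical NOT: negates a logical value |
| | dropna() | complete.cases(), or a logical vector used to subset | | | | drop rows or values that are missing (NA) |
| | numpy.arange(1, 20.5, 0.5) | seq(1, 20, 0.5) | | | | sequence from 1 to 20 in steps of 0.5; Python's built-in range() accepts only integers and excludes its end point |
| .length (arrays), .length() (strings), .size() (lists) | len() | length() | LEN() or LENGTH(), depending on the database | LEN() | | length of a vector, list or string (in SQL and Excel, of a string) |
| | | rep() | | | rep(0, times = 40) | repeat the same value several times |
| | fundamental (built-in) data types | atomic vectors | | | | e.g. numeric, character and logical (bool in Python) |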
| < | < | < | < | < | | less than |
| == | == | == | = | = | | equality |
| > | > | > | > | > | | greater than |
| != | != | != | <> or != | <> | | not equal to |
| \| or \|\| | or | \| or \|\| | OR | OR() | | OR evaluates to TRUE if the left operand, the right operand or both are TRUE, and to FALSE only if neither is TRUE |
| & or && | and | & or && | AND | AND() | TRUE & c(TRUE, FALSE, FALSE) | AND evaluates to TRUE only if both operands are TRUE: TRUE & TRUE is TRUE, FALSE & FALSE is FALSE. In R, & evaluates AND element-wise across a vector, so the example returns TRUE FALSE FALSE, whereas && works on single values only (older R versions used just the first element, and recent versions give an error for longer vectors) |
| | | | | | 5 > 8 \|\| 6 != 8 && 4 > 3.9 returns TRUE in R | like arithmetic, logical expressions have an order of operations: all AND operators are evaluated before OR operators. In the example, the AND operands come first: 6 is not equal to 8 and 4 is greater than 3.9, so TRUE && TRUE is TRUE. The expression then reduces to FALSE \|\| TRUE (5 is not greater than 8), which is TRUE |
| | | identical(), isTRUE(), xor() | | | | other built-in R functions for logical operations; xor() is exclusive OR |
| | | which() | | | which(c(TRUE, FALSE, TRUE)) returns c(1, 3) | takes a logical vector and returns the indexes of the elements that are TRUE |
| % | % | %% | | | | modulus (remainder) operator |
| | " ".join(my_char) | paste(my_char, collapse = " ") | | | | join a character vector of length greater than one into a single string |
| | | sum(my_na) | | | | count the TRUE values in a logical vector my_na (TRUE counts as 1, FALSE as 0) |
| | | NA | | | | NA means "not available": a placeholder for a missing value |
| | NaN | NaN | | | | NaN means "not a number" |
| | | y <- x[!is.na(x)] | | | | keep all values of vector x that are not NA |
| 0-based | 0-based | 1-based | 1-based | | | the first element of a list or vector is at position 1 in R, whereas it is at 0 in Python and Java |
| | | x[c(-2, -10)] or x[-c(2, 10)] | | | | all elements of vector x except those at positions 2 and 10 (a negative index excludes an element); in Python a negative index counts from the end instead |
| | row-wise | column-wise | | | | R fills a matrix down the columns, completing one column before moving to the next; NumPy arrays are filled row by row by default |
| if (cond) { ... } else if (cond) { ... } else { ... } | if cond: ... elif cond: ... else: ... | if (cond) { ... } else if (cond) { ... } else { ... } | | | | R and Java use the same if / else if / else style with curly brackets, whereas Python uses elif and indentation instead of brackets |
| for (int i = 0; i < 10; i++) { System.out.println(i); } | for i in range(0, 10): print(i) | for (i in 1:10) { print(i) } | | | | the R loop header (index in range) is similar to Python, but the curly brackets are similar to Java |
| | | repeat | | | | repeats a block of code until a break statement is reached, e.g. when a condition is satisfied |
| continue | continue | next | | | | skips to the next iteration of a loop; code after next in that iteration is not run |
| break | break | break | | | | exits the loop entirely; works with any loop |
| return | return | return() | | | | exits a function and returns the value passed to it |
| | | | | | | an R function returns the value of the last expression evaluated in it, unless return() is called earlier |
| | def a_function(): ... | a_function <- function() { ... } | | | | functions in R are objects, just like vectors, data frames and matrices |
| | | named arguments | | | | may have default values |
| | | formal arguments | | | | the arguments included in the function definition |
| | | | | | | (R) function arguments can be matched by position or by name |
| | | | | | | in R, a function can be defined inside another function, and a function can return another function as its value |
| | | Date | | | | R stores dates as the number of days since 1970-01-01 |
| | | unclass() | | | unclass(as.Date("1970-01-02")) returns 1 | shows the underlying number of a Date, i.e. the days since 1970-01-01 |
| | | Sys.time() | | | | current date and time of the operating system |
| | datetime.strptime() | strptime() | | | | converts a character string to a date-time object |
| | | difftime() | DATEDIFF() | | | difference between two dates or times |
| | df.info() | str() | | | | get the data types and other information about the columns |
| | df.describe() | summary() | | | | get descriptive statistics about the data |
| | | lapply() | | | | takes a list as input, applies a function to each element and returns a list of the same length; the "l" stands for "list". It also accepts a data frame, since a data frame is a list of vectors |
| | | sapply() | | | | simplifies the result of lapply(), hence the "s": if every element of the result has length one, it returns a vector; if every element is a vector of the same length (> 1), it returns a matrix; otherwise it returns a list, just as lapply() would |
| | | vapply() | | | | like sapply(), but you specify the format of the result explicitly instead of letting R guess; if the result does not match, vapply() throws an error and stops, which prevents problems caused by unexpected return values from sapply() |
| | | tapply() | | | | splits the data into groups (levels) based on the value of some variable, then applies a function to the members of each group |
| | | apply() | | | | applies a function over the rows or columns (margins) of an array or matrix |
| | | split() | | | | divides a vector or data frame into groups defined by a factor |
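| objects are created from classes | | objects can be created by functions | | | | in Java, objects (instances) are made from classes; in R, objects can also be created by (constructor) functions |
| toString() | str() | str() | | | | describes what is in an object: R's str() displays an object's structure, while Java's toString() and Python's str() return a string representation |
| | | object.size() | | | | check how much memory an object occupies |
| | | colMeans() | | | | find the mean of each column |
| | | plot() | | | | draws a scatter plot when given two numeric vectors |
| | | hist() | | | | draws a histogram |
| | | match() | | MATCH() | match("city", names(chicago)) | returns the position of the first match of an item |
| | merge() | merge() | JOIN | | | join two data frames or tables on a common column |
| | groupby() | group_by() (dplyr) | GROUP BY | PivotTable | | group rows by the values of one or more columns, usually to aggregate them |
| | | %>% | | | | chain (pipe) operator: the code to the right of %>% operates on the result of the code to the left |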
| | str.contains() | grepl(), grep() | LIKE | | Python: df[df['A'].str.contains("hello")]; R: df[grepl("hello", df$A), ] | find all rows where the values in column "A" contain the text "hello" |
| s.substring(2) | s[2:] | substr("Hello", 1, 3) | SUBSTRING('Hello', 1, 3) | | R: substr("Hello", 1, 3) returns "Hel"; Python: s[2:] | get a substring of a string |
| s.trim() | s.strip() | str_trim("Hello ") (stringr) or trimws() | LTRIM(), RTRIM() | TRIM() | | remove leading and trailing spaces from a string; Excel's TRIM() also reduces repeated inner spaces to one |
| s.replace() | s.replace() | sub(), gsub() | REPLACE() | SUBSTITUTE() | R: sub("_", "@", s); Python: s.replace("_", "@"); Java: s.replace('_', '@') | replace an old substring with a new one; R's sub() replaces only the first match and gsub() replaces all, like Python's and Java's replace() |
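The duplicates row in the table above uses pandas' groupby().cumcount() + 1, which plays the same role as SQL's ROW_NUMBER() OVER (PARTITION BY AID). Here is a rough plain-Python sketch of that idea. The helper name occurrence_numbers and the sample IDs are made up for illustration. Each record gets a running count per key, and any count above 1 marks a repeat.

```python
def occurrence_numbers(keys):
    """Number each key by how many times it has been seen so far (1, 2, 3, ...).

    Hypothetical helper: a plain-Python analogue of groupby(key).cumcount() + 1
    in pandas or ROW_NUMBER() OVER (PARTITION BY key) in SQL.
    """
    seen = {}
    numbers = []
    for key in keys:
        seen[key] = seen.get(key, 0) + 1  # running count for this key
        numbers.append(seen[key])
    return numbers


aids = ["A1", "A2", "A1", "A3", "A1"]  # made-up IDs standing in for the AID column
dupes = occurrence_numbers(aids)
print(dupes)  # [1, 1, 2, 1, 3]

# Keep only the repeated records, like .query('dupes>1') in the pandas example
repeats = [(aid, n) for aid, n in zip(aids, dupes) if n > 1]
print(repeats)  # [('A1', 2), ('A1', 3)]
```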
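The frequency rows in the table above (value_counts() in pandas, table() in R and the normalize=True option) come down to counting values and dividing by the total. Below is a minimal standard-library sketch, using a made-up list of answers in place of a real column.

```python
from collections import Counter

answers = ["yes", "no", "yes", "yes", "no"]  # made-up sample data

# Frequency counts, like table(f) in R or value_counts() in pandas
counts = Counter(answers)
print(counts.most_common())  # [('yes', 3), ('no', 2)]

# Proportions, like value_counts(normalize=True); multiply by 100 for percentages
total = sum(counts.values())
proportions = {value: n / total for value, n in counts.items()}
print(proportions)  # {'yes': 0.6, 'no': 0.4}
```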
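The sequence rows in the table above differ in one detail that often causes off-by-one bugs. R's 1:20 and seq(1, 20, 0.5) include the end point, while Python's range() excludes it and accepts only integers. The sketch below mimics seq() for a positive step. r_seq is a hypothetical helper, and the small 1e-10 tolerance guards against floating-point rounding when deciding whether the end point fits.

```python
import math


def r_seq(start, end, by=1):
    """Hypothetical helper mimicking R's seq(start, end, by) for a positive step.

    Unlike range(), the end point is included when it is reached exactly,
    and fractional steps such as 0.5 are allowed.
    """
    if by <= 0:
        raise ValueError("this sketch only handles a positive step")
    # Number of whole steps that fit between start and end (with a tiny tolerance)
    n_steps = math.floor((end - start) / by + 1e-10)
    return [start + i * by for i in range(n_steps + 1)]


print(list(range(1, 21)) == r_seq(1, 20))  # True: R's 1:20 is range(1, 21)
halves = r_seq(1, 20, 0.5)
print(len(halves), halves[:3], halves[-1])  # 39 [1.0, 1.5, 2.0] 20.0
```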
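The subsetting rows in the table above are easy to mistranslate. R counts positions from 1 and uses negative indexes to exclude elements. Python counts from 0 and uses negative indexes to count from the end. The two helpers below are hypothetical and exist only to show the translation on a plain Python list.

```python
def r_select(xs, positions):
    """Like R's x[c(1, 3)]: pick elements by 1-based position."""
    return [xs[p - 1] for p in positions]  # shift each R position to a 0-based index


def r_drop(xs, positions):
    """Like R's x[-c(2, 10)]: drop the elements at the given 1-based positions."""
    drop = {p - 1 for p in positions}
    return [x for i, x in enumerate(xs) if i not in drop]


x = list(range(10, 130, 10))  # 10, 20, ..., 120 (12 elements)
print(r_select(x, [1, 3]))    # [10, 30]: the 1st and 3rd elements
print(x[1:3])                 # [20, 30]: a slice, i.e. the 2nd and 3rd elements
print(r_drop(x, [2, 10]))     # everything except 20 and 100
print(x[-2])                  # 110: in Python, -2 means "second from the end"
```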
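The column-wise rows in the table above describe how R's matrix() lays out data. By default it fills the first column top to bottom before moving to the next, and byrow = TRUE switches to row-by-row filling. The hypothetical r_matrix helper below reproduces both layouts as a list of rows. It assumes the data length is an exact multiple of nrow; R would recycle values or warn otherwise.

```python
def r_matrix(data, nrow, byrow=False):
    """Sketch of R's matrix(data, nrow = nrow, byrow = byrow), returned as a list of rows.

    Assumes len(data) is a multiple of nrow.
    """
    ncol = len(data) // nrow
    if byrow:
        # Row-wise: consecutive values fill one row before moving to the next
        return [list(data[r * ncol:(r + 1) * ncol]) for r in range(nrow)]
    # Column-wise (R's default): value k lands in row k % nrow, column k // nrow
    return [[data[c * nrow + r] for c in range(ncol)] for r in range(nrow)]


values = [1, 2, 3, 4, 5, 6]
print(r_matrix(values, 2))              # [[1, 3, 5], [2, 4, 6]], like matrix(1:6, nrow = 2)
print(r_matrix(values, 2, byrow=True))  # [[1, 2, 3], [4, 5, 6]], NumPy's default order
```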
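For the missing-value rows in the table above, a float NaN is the closest plain-Python stand-in for R's NA, and pandas uses it the same way. The sketch mirrors y <- x[!is.na(x)] with made-up data. NaN never equals itself, so you need a test function rather than ==.

```python
import math

x = [1.0, float("nan"), 3.0, float("nan"), 5.0]  # made-up data with two missing values

# Like y <- x[!is.na(x)] in R (or dropna() in pandas): keep the non-missing values
y = [v for v in x if not math.isnan(v)]
print(y)  # [1.0, 3.0, 5.0]

# NaN compares unequal to everything, including itself, so x == NaN cannot find it
nan = float("nan")
print(nan == nan)       # False
print(math.isnan(nan))  # True
```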
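The AND/OR rows in the table above carry over to Python, where and binds more tightly than or, just as && binds more tightly than || in R. The snippet re-runs the table's example and adds a case where precedence changes the answer. The last lines show one plain-Python way to get R's element-wise &; NumPy and pandas provide & for this on arrays.

```python
# Same expression as the R example: evaluated as (5 > 8) or ((6 != 8) and (4 > 3.9))
result = 5 > 8 or 6 != 8 and 4 > 3.9
print(result)  # True

# Precedence matters here: "and" is evaluated first
implicit = True or False and False    # True or (False and False)
explicit = (True or False) and False  # parentheses force "or" first
print(implicit, explicit)  # True False

# R: TRUE & c(TRUE, FALSE, FALSE) works element-wise; plain Python needs a loop
left = True
right = [True, False, False]
elementwise = [left and r for r in right]
print(elementwise)  # [True, False, False]
```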
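The which() and sum() rows in the table above also have short Python counterparts, because Python's True and False behave as 1 and 0. The only adjustment is adding 1 when you want R-style positions, as in this small sketch with a made-up vector.

```python
flags = [True, False, True]  # made-up logical vector

# R: which(c(TRUE, FALSE, TRUE)) returns c(1, 3), i.e. 1-based positions
which_r_style = [i + 1 for i, is_true in enumerate(flags) if is_true]
which_0_based = [i for i, is_true in enumerate(flags) if is_true]
print(which_r_style, which_0_based)  # [1, 3] [0, 2]

# R: sum(my_na) counts the TRUE values; True counts as 1 and False as 0 in Python too
print(sum(flags))  # 2
```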
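The function rows in the table above note that an R function can be defined inside another function and returned as a value. In R this could be written as make_power <- function(n) function(x) x^n. Python supports the same pattern, called a closure. make_power is a made-up example.

```python
def make_power(n):
    """Return a new function that raises its argument to the power n."""

    def power(x):
        return x ** n  # n is remembered from the enclosing call (a closure)

    return power  # the function itself is the return value


square = make_power(2)
cube = make_power(3)
print(square(4), cube(2))  # 16 8
```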
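The date rows in the table above map onto Python's standard datetime module. Subtracting two dates gives a timedelta, much like difftime(). Counting days from 1970-01-01 reproduces the number unclass() shows for an R Date. Below is a sketch with arbitrary dates.

```python
from datetime import date, datetime

# Parse a string into a date, similar to strptime() / as.Date() in R
d = datetime.strptime("2020-01-01", "%Y-%m-%d").date()

# R: unclass(as.Date("2020-01-01")) gives the number of days since 1970-01-01
days_since_1970 = (d - date(1970, 1, 1)).days
print(days_since_1970)  # 18262

# R: difftime(as.Date("2020-03-01"), as.Date("2020-01-01"), units = "days")
gap = date(2020, 3, 1) - d
print(gap.days)  # 60 (31 days in January plus 29 in February 2020)
```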
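The apply-family rows in the table above have no single Python counterpart, but comprehensions cover the common cases. In this sketch, a dict of lists stands in for a data frame, since in R a data frame is a list of column vectors. tapply_mean is a hypothetical helper that groups values by a label and averages each group, the way tapply(values, groups, mean) does.

```python
from collections import defaultdict
from statistics import mean

# A dict of column lists as a rough stand-in for a data frame
df = {"a": [1, 2, 3], "b": [4, 5, 6]}

# Like lapply(df, mean) or colMeans(df): apply a function to every column
col_means = {name: mean(values) for name, values in df.items()}
print(col_means)


def tapply_mean(values, groups):
    """Rough analogue of R's tapply(values, groups, mean)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)  # split the values by group
    # Apply mean() to each group; sorting mimics R's alphabetical factor levels
    return {group: mean(vs) for group, vs in sorted(buckets.items())}


print(tapply_mean([1, 2, 3, 4], ["y", "x", "y", "x"]))  # x: mean of 2 and 4, y: mean of 1 and 3
```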
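The string rows in the table above translate directly to Python's built-in str methods. The one trap is replacement. Python's str.replace() changes every occurrence, like R's gsub(), and its optional count argument gives the replace-first behaviour of R's sub(). R's sub() and gsub() also treat the pattern as a regular expression, which this plain-text sketch with made-up strings does not.

```python
s = "Hello_World_Again"

# Substrings: R's substr("Hello", 1, 3) uses 1-based, inclusive positions,
# while Python slices are 0-based with an exclusive end
print("Hello"[0:3])  # Hel
print(s[2:])         # llo_World_Again, like Java's s.substring(2)

# Trimming leading and trailing spaces, like trimws() in R or trim() in Java
print("  Hello  ".strip())  # Hello

# Replacement: all occurrences (like gsub) versus only the first (like sub)
print(s.replace("_", "@"))     # Hello@World@Again
print(s.replace("_", "@", 1))  # Hello@World_Again

# Substring search, like grepl("hello", x) in R or LIKE '%hello%' in SQL
texts = ["hello there", "goodbye", "say hello"]
print([t for t in texts if "hello" in t])  # ['hello there', 'say hello']
```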