Java Python R SQL Excel Compared Similarities For Data Science and Data Analytics – The Basics

If you have worked with Java, Python, R, SQL, Excel and other languages on varied data science or data analytics projects, you will have realised that these languages have similar syntax or, at least, can achieve the same objective with very similar code.

Below is a comparison of these languages and tools, which I put together for quick reference when I realised how many similarities there are between them.

Though it is not complete and exhaustive, I believe it will be of help to someone.

If you know of any other similarities amongst these languages, please leave them as a comment in the comment box and I will update the table.

I hope this helps everyone who is new to data science and data analytics, as I wish I had had something similar when I first started. Thank you for your contribution.

Navigating the table:

Please:

Use the search box located at the top right-hand corner of the table to search for terms
Use the scroll bar at the bottom of the table to see all columns
The table has 7 columns, which are Java, Python, R, SQL, Excel, Example and Description

Java, Python, R, SQL, Excel Similarities For Data Science - The Basics

Java	Python	R	SQL	EXCEL	EXAMPLE	DESCRIPTION
instanceof	isinstance					is used to test whether the object is an instance of the specified type (class or subclass or interface).
	normalize			%	p7 = sub2["USFREQMO"].value_counts(sort = False, normalize = True)	get the relative frequency of each value; normalize = True returns proportions (between 0 and 1) instead of counts
	sort_values()	sort(), order(), arrange()		sort	p7 = sub2["USFREQMO"].value_counts(sort = False, normalize = True)	sort the data; in value_counts(), sort defaults to True (descending by count), whereas sort_values() defaults to ascending order
	sort_index(ascending=True)			sort		sort by index and not the values
	assign		row_number()	countifs, sumifs	data.assign(dupes=data.groupby(['AID']).cumcount()+1).query('dupes>1')[:6]	checking the number of occurrences of a particular variable
	import	library			import pandas; R: library("swirl")	load a package or library
	pip install	install.packages			pip install virtualenv; R: install.packages("swirl")	install a package
	os.getcwd()	getwd()				get working directory
	pandas.read_csv()	read.csv()			read.csv("mydata.csv")	read a csv file
		dir()				list contents in the current working directory
		ls()				list the variables and functions available to the current console
		source()			source("myCode.R")	run the R code stored in a file
	=	<-			x <- 1 , python x= 1	assignment operator
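//	#	#			# comment	single-line comment
/* */	''' '''					block comment (Python uses a triple-quoted string)
	range(1,21)	1:20				the colon indicates a sequence, meaning 1 up to 20; Python's range() excludes the end value, so range(1,21) covers 1 to 20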
		vectors				can contain ONLY one data type
		list				type of vector that can contain mixed data types
		dimnames				dimension names of the vector ie, the list, matrix, dataframes or base vectors
	concat()	c()			c(0.5, 0.6)	concatenate
		vector(class, length)			vector("numeric", length = 10)	vector() takes two arguments: the class (mode) and the length
	" "	" "			"string"	string quotation
	True = 1, False = 0	TRUE = 1, FALSE = 0				numeric values of the boolean constants True and False
	to_numeric()	as.numeric()			as.numeric("6")	cast to numeric data type
	type()	class()			x =6 , class(x)	find the data type class of the variable x
(type) cast	int(), float(), str()	as.*()			as.numeric(), as.logical()	cast from one data type to another
String	str	character			"character"	a string is called character in R, str in Python and String in Java
	indexing starts at 0	indexing starts at 1
	indexing a list uses single brackets, e.g. [0]	indexing a list uses double brackets, e.g. [[1]]				selects the first item
		matrix				matrices are vectors with a dimension attribute, which means they have rows and columns; a matrix can contain only one data class, not mixed types
	shape	dim()			dim(m)	get the dimensions of a matrix or data frame (DataFrame in Python). The first number is rows and the second is columns
		matrices are created column-wise				the data fill the first column from top to bottom, then the next column. So with 2 rows, the first 2 values go down column 1, the next 2 go down column 2, and so on
		factor data type				used to store categorical variable or data types
	value_counts()	table(f)			table(factor variable)	frequency counts
	NaN	NA or NaN				missing values
		is.na()				test objects if there are NA
		is.nan()				used to test for NaN
		a NaN value is NA but the converse is not true
	len(df.index)	nrow()				number of rows
	len(df.columns)	ncol()				number of columns
	df.columns	names()				finding the names or labels of the object or column names of the dataframe or object
	boolean	logical				boolean term is called logical in R
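	DataFrame	data.frame				the tabular data structure is called DataFrame in Python (pandas) and data.frame in R
		dput()				dput() writes a text representation of an R object, including its data types and metadata, to a file; dget() reads it back
		dump()				dump() writes text representations of one or more R objects to a file; source() reads them back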
		[				subsetting with single [ --- you will always get object of the same CLASS back. You can select multiple items via a sequence
		[[				subsetting elements of a list or data frame. It can be used to select ONLY ONE ELEMENT of the list or dataframe
.	.	$				used to select element of a list or data frame by name, semantics are similar to [[
	[lst[i] for i in (0, 2)]	list[c(1,3)]				subsetting the 1st and 3rd elements of a list. In R, pass the vector c(1, 3) to the single bracket; single brackets can select multiple items. A Python slice such as lst[1:3] instead returns the consecutive items at indices 1 and 2
!	not	!			!TRUE	not, the inverse of a logical value
	dropna	complete.cases or create logical vectors for subset				used to drop NA values from large data set
	numpy.arange(1, 20.5, 0.5)	seq(1,20,0.5)				range from 1 to 20 in increments of 0.5; Python's built-in range() accepts only integers, so NumPy's arange() is used instead
length()	len()	length()	len()	len()		length of a vector
		rep()			rep(0,times=40)	replicate the same values several times
	fundamental data types	atomic vectors				e.g. numeric, logical (boolean in Python), etc.
<	<	<	<	<		less than
==	==	==	=	=		equality
>	>	>	>	>		greater than
!=	!=	!=	!= OR <>	<>		not equal to
| or ||	or (| element-wise)	| or ||	or	OR		or: an expression using the OR operator evaluates to TRUE if the left operand, the right operand, or both are TRUE; if neither is TRUE, the expression is FALSE.
& or &&	and (& element-wise)	& or &&	and	AND	TRUE & c(TRUE, FALSE, FALSE)	and: if the left and right operands of AND are both TRUE, the entire expression is TRUE; otherwise it is FALSE. For example, TRUE & TRUE evaluates to TRUE and FALSE & FALSE evaluates to FALSE. In R, the & operator evaluates AND across a vector, whereas && works on single values (older versions of R evaluated only the first member of a vector).
					5 > 8 || 6 != 8 && 4 > 3.9	logical expressions have an order of operations, just as arithmetic does: all AND operators are evaluated before OR operators. In the example, the operands of && are evaluated first: 6 is not equal to 8 and 4 is greater than 3.9, so TRUE && TRUE evaluates to TRUE. Then the left operand of || is evaluated: 5 is not greater than 8, so the expression reduces to FALSE || TRUE, which evaluates to TRUE.
		identical(), isTRUE, xor()				extra inbuilt R functions for logical operations. Xor is called exclusive or
		which()			which(c(TRUE, FALSE, TRUE))	which() takes a logical vector as an argument and returns the indices of the elements that are TRUE. For example, which(c(TRUE, FALSE, TRUE)) returns the vector c(1, 3).
%	%	%%				modulus operator
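
The logical-operator rows above can be tried out in Python as well. The following is a minimal sketch in plain Python (no pandas), assuming the same numbers as the R example; the helper name which is hypothetical and only mimics R's which(), with 0-based positions.

```python
# Scalar logic: `and` is evaluated before `or`,
# mirroring the R example 5 > 8 || 6 != 8 && 4 > 3.9
result = 5 > 8 or 6 != 8 and 4 > 3.9  # parsed as (5 > 8) or ((6 != 8) and (4 > 3.9))

# Element-wise AND over a list, similar in spirit to TRUE & c(TRUE, FALSE, FALSE) in R
flags = [True, False, False]
elementwise_and = [True and f for f in flags]


def which(values):
    """Hypothetical stand-in for R's which(): positions of the True values (0-based)."""
    return [i for i, v in enumerate(values) if v]


true_positions = which([True, False, True])

print(result, elementwise_and, true_positions)
```

Note that the positions come back as 0 and 2 rather than 1 and 3, because Python counts from zero.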


		paste(my_char, collapse = " ")				collapse a character vector of length greater than one into a single string
		sum(my_na)				get the count of TRUEs (where my_na is a logical vector); TRUE counts as 1 and FALSE as 0
		NA				NA -> not available hence a place holder for a value
		NaN				NaN -> not a number
		y <- x[!is.na(x)]				find all values in a vector that are not NA
zero based indexing	zero based indexing	1 based indexing		1 based indexing		first element in a list or vector starts with 1 whereas it is zero with Python and Java
		x[c(-2,-10)] or x[-c(2,10)]				all elements of vector-list X except the ones at index 2 and 10 (represented by negative signs)
	row-wise	column-wise				R constructs a data frame or matrix along the columns, filling one column and then continuing to the next; the opposite is true for Python
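
The missing-value and indexing rows above can be approximated in plain Python. This is a rough sketch, assuming a small made-up list x and using math.nan as the missing-value marker; pandas offers isna() and dropna() for the same job on real data.

```python
import math

# Made-up sample data with two missing values
x = [4.0, math.nan, 7.5, math.nan, 1.0]

# Logical list marking missing entries, similar in spirit to is.na(x) in R
my_na = [math.isnan(v) for v in x]

# True counts as 1, so sum() gives the number of missing values, like sum(my_na)
n_missing = sum(my_na)

# Keep only the non-missing values, like y <- x[!is.na(x)]
y = [v for v in x if not math.isnan(v)]

# Drop the elements at R positions 2 and 4 (1-based), like x[-c(2, 4)].
# Python's own negative indices count from the end, so filter by position instead.
drop_positions = {2, 4}
kept = [v for pos, v in enumerate(x, start=1) if pos not in drop_positions]

print(n_missing, y, kept)
```

The last step is worth a second look: x[-2] in Python means "second from the end", not "everything except position 2" as in R.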

if (cond) { doSomething } else if (cond) { doSomething } else { }	if cond: doSomething elif cond: doSomething else: doSomething	if (cond) { doSomething } else if (cond) { doSomething } else { }				R and Java use the same style for if / else if / else, whereas Python uses colons and indentation instead of braces
for (int i = 0; i < 10; i++) { System.out.println(i); }	for i in range(0, 10): print(i)	for (i in 1:10) { print(i) }				the R loop index and range inside the parentheses are similar to Python, but the outer curly brackets are similar to Java
		repeat				to keep repeating until a condition is satisfied
		next				skip to next iteration. Do not execute code that come right after next statement. Can be used with any iteration or loop
		break				exit the loop entirely. Can be used with any iteration or loop
		return				exit a function and return a value that you pass it
						The R function returns whatever the last expression in the function is
		a_function <- function(){ #doSomethingHere }				functions in R are also objects, just like vector, data frame and matrix objects
		named arguments				potentially have default values
		formal arguments				those included in the formal definition of the function
		function arguments can be matched by position or by name
		in R , a function can be defined inside another function AND a function can RETURN a function as the return value
		Date				R dates are stored as the number of days since 1970-01-01
		unclass()				applied to a Date, shows the number of days passed since 1970-01-01
		Sys.time()				current time of the operating system
		strptime				converts character string to datetime objects
		difftime()	datediff()
	info()	str()				get the data types and info about the columns
	describe()	summary()				get descriptive statistics about the data
		lapply()				The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one. The 'l' in 'lapply' stands for 'list'. It can also take a data.frame, since a data.frame is a list of vectors
		sapply()				simplifies the result returned by lapply(), hence the "s" in front of sapply. In general, if the result is a list where every element is of length one, then sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can't figure things out, then it just returns a list, no different from what lapply() would give you.
		vapply()				Whereas sapply() tries to 'guess' the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn't match the format you specify, vapply() will throw an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply().
		tapply()				splits your data into groups (levels) based on the value of some variable, then applies a function to the members of each group.
		apply()				apply functions over an array or matrix
		split()
	objects or instances are made from classes	objects or instances can be made from functions
toString()	str*	str				it tells you what is in an object or function
		object.size()				check how much memory the object occupies
		colMeans()				find the mean of each column
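
For readers coming from Python, the apply-family and function rows above have rough standard-library counterparts. The sketch below is only an approximation under made-up data: the names columns, values, groups and make_power are hypothetical, and pandas (apply, groupby) would be the more usual choice on real data frames.

```python
from collections import defaultdict
from statistics import mean

# Made-up "data frame": a dict of columns
columns = {"a": [1, 2, 3], "b": [10, 20, 30]}

# Like lapply() / colMeans(): apply a function to each column, keep one result per column
col_means = {name: mean(vals) for name, vals in columns.items()}

# Like tapply(): split values into groups by a key, then apply a function per group
values = [5, 7, 9, 11]
groups = ["x", "y", "x", "y"]
by_group = defaultdict(list)
for g, v in zip(groups, values):
    by_group[g].append(v)
group_means = {g: mean(vs) for g, vs in by_group.items()}


# As in R, a function can be defined inside another function and returned as a value
def make_power(exponent):
    def power(base):
        return base ** exponent
    return power


cube = make_power(3)
print(col_means, group_means, cube(2))
```

Unlike sapply(), the comprehensions here never change the shape of the result on their own, which is closer to the explicit behaviour of vapply().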

		plot()				generic plotting function; draws a scatter plot by default
		hist()				short form for histogram
		match		match	match("city", names(chicago))	returns the index of the item being matched
	merge	merge				join 2 dataframes or objects on a common column
	groupby()	group_by()	GROUP BY	pivot tables		group rows by one or more columns, usually before aggregating
		%>%				chain (pipe) operator: the code to the right of %>% operates on the result from the code to the left of %>%.
	str.contains	grepl, grep	like		df[df['A'].str.contains("hello")], df[grepl("hello",df$A),]	find all rows in the column "A" where the values contains the text "hello"
s.substring(2)	s[2 : ]	substr("Hello",1,3)	substring("Hello",1,3)		R= substr("Hello",1,3); p = s[2 : ]	get substring from a String
str.trim()	str.strip()	str_trim("Hello ")	ltrim(), rtrim()	trim()		strip or trim extra leading and trailing spaces in a string
replace	replace	sub, gsub	replace	replace	R= sub("_","@", str) P =str.replace("_", "@") J = str.replace('_', '@')	replace or substitute an old character string with a new character string
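
The string rows at the end of the table can be tried together in a few lines of plain Python. This is a small sketch on a made-up string and a made-up list of rows; with pandas, the matching step would use df['A'].str.contains("hello") as shown in the table.

```python
# Made-up input with extra spaces around it
s = "  Hello_World  "

trimmed = s.strip()                    # like str_trim() in R or TRIM() in Excel
replaced = trimmed.replace("_", "@")   # replaces every match, like gsub("_", "@", str) in R
tail = trimmed[2:]                     # from index 2 onward, like s.substring(2) in Java
first_three = trimmed[0:3]             # like substr("Hello", 1, 3) in R (1-based, inclusive)

# Plain-Python analogue of str.contains / grepl / LIKE: keep rows containing "hello"
rows = ["hello there", "goodbye", "say hello"]
matches = [r for r in rows if "hello" in r]

print(trimmed, replaced, tail, first_three, matches)
```

Python slices are 0-based and exclude the end index, so trimmed[0:3] and R's substr(x, 1, 3) both return the first three characters.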
