Running a Random Forest – Data Analysis and Interpretation

Overview

My research work focuses on Ghana, one of the countries in the Gapminder dataset, as discussed throughout this course.

The variables in my observation dataset are all quantitative.

For the purposes of this assignment, I have binned my quantitative target variable, life expectancy (lifeexpectancy), into a two-level (binary) categorical target variable named lifeExpectancyCat. It is coded as 0 = low life expectancy and 1 = high life expectancy.
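The post does not state which cut point was used for the binning. As a minimal sketch, assuming a median split and a small made-up data frame (the values are illustrative, not Gapminder figures), the recoding might look like this:

```python
import pandas as pd

# Illustrative values only; these are not the Gapminder figures.
df = pd.DataFrame({"lifeexpectancy": [48.7, 55.4, 61.0, 66.2, 72.9, 80.1]})

# Assumption: split at the median. The post does not state its cut point.
cut_point = df["lifeexpectancy"].median()

# 0 = low life expectancy, 1 = high life expectancy
df["lifeExpectancyCat"] = (df["lifeexpectancy"] > cut_point).astype(int)

print(df)
```

The same pattern would apply to incomeperperson and exports, each with its own cut point.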

I have also binned 2 of the quantitative predictor variables (incomeperperson and exports) for the purpose of this assignment.

These are the resulting binary categorical variables for those quantitative predictor variables:

Quantitative variable – Binary categorical variable

incomeperperson – incomeLevelGrp

exports – exportsCatGrp

I have also added 2 more explanatory variables, obtained from the Gapminder website (http://www.gapminder.org/data/), to the list of variables used for this assignment. This gives the random forest more explanatory variables to work with.

These new variables are:

agriculture, which represents Agriculture, value added (% of GDP)

democracyscore, which represents the overall polity score from the Polity IV dataset, calculated by subtracting an autocracy score from a democracy score. It is a summary measure of a country’s democratic and free nature. -10 is the lowest value, 10 the highest.

Running a Random Forest

Random forest analysis was performed to evaluate the importance of a series of explanatory variables in predicting my binary categorical response variable, lifeExpectancyCat. The following explanatory variables were included as possible contributors to the random forest:

income per person, exports, inflation, agriculture, and democracy score

The explanatory variable with the highest relative importance score was agriculture (0.4005748), followed by democracy score (0.35894682), inflation (0.12744075), and exports (0.05900944). The explanatory variable with the lowest relative importance score was income per person (0.05402819).
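Relative importance scores of this kind can be read from a fitted forest. The sketch below assumes scikit-learn's RandomForestClassifier and uses synthetic stand-in data, so its numbers will not match the scores reported above. The column names follow the post; the tree count and random seed are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["incomeLevelGrp", "exportsCatGrp", "inflation",
            "agriculture", "democracyscore"]

# Synthetic stand-in data (50 rows); not the author's dataset.
X = pd.DataFrame(rng.normal(size=(50, len(features))), columns=features)
# Synthetic binary target driven mostly by 'agriculture', for illustration only.
y = (X["agriculture"] + 0.3 * rng.normal(size=50) > 0).astype(int)

# Assumption: 25 trees and a fixed seed; the post does not give these settings.
model = RandomForestClassifier(n_estimators=25, random_state=0)
model.fit(X, y)

# The importances sum to 1; a larger value means the variable contributed more.
importances = pd.Series(model.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
print(importances)
```

The five scores reported in the post also sum to 1, which is consistent with normalised importances of this kind.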

Relative Importance Scores

The accuracy of the random forest was 95% (0.94999999999999996). Growing multiple trees rather than a single tree added little to the overall accuracy of the model, which suggests that interpretation of a single decision tree may be appropriate.
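One way to check how much extra trees help is to refit the forest with 1, 2, ..., 25 trees and record the test accuracy each time. This is a rough sketch on synthetic data generated with make_classification. The range of tree counts and the seeds are assumptions, and the printed accuracies say nothing about the Gapminder sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 50-row, 5-predictor dataset.
X, y = make_classification(n_samples=50, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

n_trees = range(1, 26)  # assumption: try forests of 1 to 25 trees
accuracy = np.zeros(len(n_trees))
for i, n in enumerate(n_trees):
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    clf.fit(X_train, y_train)
    accuracy[i] = accuracy_score(y_test, clf.predict(X_test))

# A curve that flattens early means extra trees add little accuracy.
for n, acc in zip(n_trees, accuracy):
    print(n, round(acc, 2))
```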

Random Forest Graph

Requesting the shape of my predictor training sample shows that it has 30 observations (rows), which represent 60% of the original sample, and 5 explanatory variables (columns), as listed above.

The test sample has 20 observations (rows), which is 40% of the original sample, and 5 explanatory variables (columns).
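A 60/40 split of 50 rows gives exactly these shapes. The sketch below assumes scikit-learn's train_test_split and uses placeholder arrays of the same dimensions; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the dimensions described: 50 rows, 5 predictors.
predictors = np.zeros((50, 5))
target = np.zeros(50, dtype=int)

# test_size=0.4 holds out 40% of the rows for testing.
pred_train, pred_test, tar_train, tar_test = train_test_split(
    predictors, target, test_size=0.4, random_state=0)

print(pred_train.shape)  # (30, 5)
print(pred_test.shape)   # (20, 5)
```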

A confusion matrix was used to estimate the prediction accuracy of my model. According to the confusion matrix, the model correctly classified 19 of the 20 observations in the test sample and misclassified just 1. This means the model correctly classified 95% of the test observations as having high or low life expectancy and misclassified 5% of them.

The Confusion Matrix result can be seen below:

This can be interpreted as the model having a low prediction error, as it correctly classified a high percentage of the test observations and misclassified a low percentage of them.

This can also be seen by running a test accuracy score on my model; the result was 0.94999999999999996 (95%), as can be seen below:
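The arithmetic behind the 95% figure can be reproduced with scikit-learn's confusion_matrix and accuracy_score. The labels below are hypothetical: they only encode "19 of 20 correct", and the split between the two classes is made up rather than taken from the post's matrix.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 20 test observations, 19 predicted correctly.
# The class balance is made up; the post's matrix cells are not reproduced here.
y_true = [0] * 10 + [1] * 10
y_pred = [0] * 10 + [1] * 9 + [0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Correct predictions sit on the diagonal of the matrix.
correct = cm.trace()
total = cm.sum()
print(correct, total, correct / total)

print(accuracy_score(y_true, y_pred))
```

The value 0.94999999999999996 is simply how the floating-point number closest to 0.95 is displayed at full precision.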

################################

PYTHON CODE

#################################

################################

CODE OUTPUT

#################################
