OVERVIEW
My research work deals with Ghana, a country from the Gapminder
dataset.
What I found in my logistic regression analysis.
Discussion of the results for the associations between all of my explanatory
variables and my response variable
The primary quantitative explanatory variable in my
regression analysis is Income Per Person (incomeperperson), and the
response variable is Life Expectancy (lifeexpectancy).
The other quantitative explanatory variables included in
my multiple regression analysis are Exports (exports) and Inflation
(inflation).
Binning Quantitative Variables into Binary Categorical Variables
For the purposes of running the logistic regression in
this assignment, I have binned my quantitative response variable, Life
Expectancy (lifeexpectancy), into 2 categories, giving a two-level binary
categorical response variable. I have named this categorical response variable
lifeExpectancyCat. It is coded as 0 = low life expectancy and 1
= high life expectancy.
I have also binned all of the explanatory variables into binary
categorical variables for the purpose of the logistic regression analysis.
The resulting binary categorical variables for my quantitative
explanatory variables are:
Quantitative variable    Binary categorical variable
incomeperperson          incomeLevelGrp
inflation                inflationCatGrp
exports                  exportsCatGrp
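As a minimal sketch of this binning step, the block below splits each quantitative variable at its median using pandas. The median cut point, the helper name bin_at_median and the data values are assumptions made for illustration only; this write-up does not state which thresholds were actually used.

```python
import pandas as pd

# Hypothetical values for illustration only; these are NOT the Gapminder figures.
df = pd.DataFrame({
    "incomeperperson": [300.0, 450.0, 520.0, 610.0, 700.0, 880.0],
    "inflation": [10.0, 25.0, 8.0, 40.0, 15.0, 12.0],
    "exports": [20.0, 35.0, 28.0, 41.0, 25.0, 30.0],
    "lifeexpectancy": [55.0, 57.0, 58.5, 60.0, 61.0, 63.0],
})


def bin_at_median(series):
    """Return 1 where the value is above the series median, else 0.

    The median split is an assumed rule, chosen only for this sketch.
    """
    return (series > series.median()).astype(int)


# Map each quantitative variable to the binary variable name used in the text.
binary_names = {
    "lifeexpectancy": "lifeExpectancyCat",
    "incomeperperson": "incomeLevelGrp",
    "inflation": "inflationCatGrp",
    "exports": "exportsCatGrp",
}

for quantitative, binary in binary_names.items():
    df[binary] = bin_at_median(df[quantitative])

print(df[list(binary_names.values())])
```

Each new column holds only 0 and 1, which is the form the logistic regression needs for both the response and the explanatory variables.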
Whether my results supported my hypothesis for the association
between my primary explanatory variable and the response variable
My hypothesis was that there is an association between the
Life Expectancy and Income Per Person of the people of Ghana. Having binned
Income Per Person into a binary explanatory variable, my question
therefore becomes: is a high income level associated with high life
expectancy among the people of Ghana?
First, I ran the logistic regression for my
categorical response variable lifeExpectancyCat and my primary categorical
explanatory variable, incomeLevelGrp.
From the Logit Regression Results with incomeLevelGrp,
it can be seen that the test is not significant, as the p-value = 0.643
is greater than the alpha level of 0.05.
After running the odds ratios on my parameters, I get the
following results:
Odds Ratios
Intercept         0.71
incomeLevelGrp    1.30
As the odds ratio for incomeLevelGrp is greater than 1, the odds of high
life expectancy for those with a high income level are estimated to be 1.30
times the odds for those with a low income level. Note that an odds ratio
compares odds, not probabilities, and that because the test is not
significant, this estimate cannot be distinguished from no association.
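To illustrate how odds relate to probabilities, the sketch below converts the rounded odds ratios reported above into predicted probabilities. It assumes that the intercept odds ratio (0.71) is the odds of high life expectancy in the low income group, which holds for a model with a single 0/1 explanatory variable; the function name is illustrative.

```python
def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Rounded values taken from the Odds Ratios output above.
baseline_odds = 0.71   # odds of high life expectancy when incomeLevelGrp = 0
odds_ratio = 1.30      # odds ratio for incomeLevelGrp

# Odds in the high income group = baseline odds multiplied by the odds ratio.
high_income_odds = baseline_odds * odds_ratio

p_low_income = odds_to_probability(baseline_odds)
p_high_income = odds_to_probability(high_income_odds)

print(f"P(high life expectancy | low income)  = {p_low_income:.2f}")
print(f"P(high life expectancy | high income) = {p_high_income:.2f}")
print(f"Ratio of probabilities                = {p_high_income / p_low_income:.2f}")
```

The two probabilities come out at roughly 0.42 and 0.48, so the ratio of probabilities is noticeably smaller than the odds ratio of 1.30.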
I then computed the confidence intervals for my parameters
because my data set is just a sample of the Ghana population, and the true
population odds ratio might differ slightly from the odds ratio above due to
random sampling. The confidence intervals are as follows:
                  Lower CI   Upper CI   OR
Intercept         0.32       1.61       0.71
incomeLevelGrp    0.43       3.94       1.30
The confidence interval indicates with 95% confidence that the odds ratio
in the true population of Ghana lies somewhere between 0.43 and 3.94.
Because this interval includes 1, it agrees with the non-significant p-value.
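The three steps above (fit the model, exponentiate the parameters, exponentiate the confidence limits) can be sketched as follows. This assumes the statsmodels formula API, which the "Logit Regression Results" title suggests but which this write-up does not show, and it uses hypothetical counts rather than the Gapminder data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical 2x2 counts: (incomeLevelGrp, lifeExpectancyCat) -> number of rows.
# These are illustrative only and are NOT the data behind the reported results.
counts = {(1, 1): 12, (1, 0): 13, (0, 1): 5, (0, 0): 7}
rows = [
    {"incomeLevelGrp": income, "lifeExpectancyCat": life}
    for (income, life), n in counts.items()
    for _ in range(n)
]
df = pd.DataFrame(rows)

# Fit the logistic regression; disp=False suppresses the optimizer printout.
result = smf.logit("lifeExpectancyCat ~ incomeLevelGrp", data=df).fit(disp=False)

# Odds ratios are the exponentiated coefficients.
odds_ratios = np.exp(result.params)

# 95% confidence limits on the odds ratio scale.
table = np.exp(result.conf_int())
table.columns = ["Lower CI", "Upper CI"]
table["OR"] = odds_ratios

print(result.pvalues.round(3))
print(table.round(2))
```

With a single binary explanatory variable, the fitted odds ratio equals the cross-product ratio of the 2x2 table, here (12 x 7) / (13 x 5), which gives a quick way to sanity-check the output.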
##########################################
Logit Regression Results with incomeLevelGrp
###########################################
Second, I ran the logistic regression for my
categorical response variable lifeExpectancyCat and each of my other
categorical explanatory variables, starting with inflationCatGrp.
From the Logit Regression Results with inflationCatGrp, it can be seen
that the test is not significant, as the p-value = 0.261 is greater than
the alpha level of 0.05.
The parameter coefficient is negative (-1.2993),
and the z statistic is also negative (-1.124), which
indicates a negative relationship between Life Expectancy (lifeExpectancyCat)
and Inflation (inflationCatGrp) in this sample.
After running the odds ratios on my parameters, I get the
following results:
Odds Ratios
Intercept          0.92
inflationCatGrp    0.27
As the odds ratio for inflationCatGrp is less than 1, the odds of high
life expectancy in periods of high inflation are estimated to be 0.27 times
the odds in periods of low inflation. That is, high life expectancy is
less likely in periods of high inflation than in periods of low inflation
in Ghana, although the difference is not statistically significant.
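The figures reported for inflationCatGrp are linked by two simple formulas: the odds ratio is the exponential of the coefficient, and the two-sided p-value follows from the z statistic under a standard normal distribution. The sketch below checks this with the Python standard library only; the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Values reported in the regression output for inflationCatGrp.
coefficient = -1.2993
z_statistic = -1.124

odds_ratio = math.exp(coefficient)
p_value = two_sided_p_value(z_statistic)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"p-value    = {p_value:.3f}")
```

Both results agree, to rounding, with the reported odds ratio of 0.27 and p-value of 0.261.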
I then computed the confidence intervals for my parameters,
with the following results:
                   Lower CI   Upper CI   OR
Intercept          0.51       1.63       0.92
inflationCatGrp    0.03       2.63       0.27
The confidence interval indicates with 95% confidence that the odds ratio
in the true population of Ghana lies somewhere between 0.03 and 2.63.
Because this interval includes 1, it agrees with the non-significant p-value.
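The interval for inflationCatGrp can be reconstructed from the coefficient (-1.2993) and z statistic (-1.124) reported earlier. The sketch assumes a Wald interval, that is, the coefficient plus or minus about 1.96 standard errors, exponentiated. The standard error is not printed in this write-up, so it is derived here from z = coefficient / standard error.

```python
import math
from statistics import NormalDist

# Values reported in the regression output for inflationCatGrp.
coefficient = -1.2993
z_statistic = -1.124

# The standard error is not shown in the text; recover it from z = coef / SE.
standard_error = coefficient / z_statistic

# Critical value for a 95% two-sided interval (about 1.96).
critical = NormalDist().inv_cdf(0.975)

# Build the interval on the log-odds scale, then exponentiate to odds ratios.
lower_ci = math.exp(coefficient - critical * standard_error)
upper_ci = math.exp(coefficient + critical * standard_error)

print(f"Lower CI = {lower_ci:.2f}, Upper CI = {upper_ci:.2f}")

# An interval that contains 1 is consistent with a non-significant result.
print("interval contains 1:", lower_ci < 1.0 < upper_ci)
```

The limits match the reported 0.03 and 2.63 to two decimal places, which suggests the reported interval was computed in this way.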
########################################
Logit Regression Results with inflationCatGrp
########################################
Third, I ran the logistic regression for my
categorical response variable lifeExpectancyCat and my last categorical
explanatory variable, exportsCatGrp.
From the Logit Regression Results with exportsCatGrp,
it can be seen that the test is not significant, as the p-value = 1.000
is greater than the alpha level of 0.05.
After running the odds ratios on my parameters, I get the
following results:
Odds Ratios
Intercept        0.68
exportsCatGrp    38649160672.63
As the odds ratio for exportsCatGrp is greater than 1, the estimate
suggests that the odds of high life expectancy are higher in periods of
high exports than in periods of low exports in Ghana. However, an odds
ratio as large as 38649160672.63, together with a p-value of 1.000,
usually signals that the model could not estimate this parameter reliably,
for example because the export categories almost perfectly separate the two
life expectancy categories, so the figure should not be read literally.
I then computed the confidence intervals for my parameters,
with the following results:
                 Lower CI   Upper CI   OR
Intercept        0.38       1.22       0.68
exportsCatGrp    0.00       inf        38649160672.63
The confidence interval for exportsCatGrp runs from 0.00 to positive
infinity, so it places no useful bounds on the odds ratio in the true
population of Ghana.
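An odds ratio in the tens of billions with an interval from 0 to infinity is the pattern that typically appears when one cell of the 2x2 cross-tabulation is empty, so that the sample odds ratio is unbounded. Whether that is what happened here cannot be confirmed from this write-up. The sketch below uses hypothetical counts and an illustrative function name to show the effect.

```python
import math


def odds_ratio_2x2(high_x_high_y, high_x_low_y, low_x_high_y, low_x_low_y):
    """Sample odds ratio (a*d)/(b*c) for a 2x2 table of counts.

    Returns infinity when the denominator is zero and the numerator is
    positive, and NaN when both are zero.
    """
    numerator = high_x_high_y * low_x_low_y
    denominator = high_x_low_y * low_x_high_y
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


# Hypothetical counts, NOT taken from the data behind this analysis.
# Every cell populated: the odds ratio is finite.
balanced = odds_ratio_2x2(6, 4, 5, 7)

# No low life expectancy observations at high exports: the ratio is unbounded.
empty_cell = odds_ratio_2x2(3, 0, 5, 7)

print("all cells populated:", balanced)
print("one empty cell:     ", empty_cell)
```

A quick check for this situation is to cross-tabulate the response against the explanatory variable before fitting the model and look for a zero count.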
########################################
Logit Regression Results with exportsCatGrp
########################################
Checking whether there is evidence of confounding for the association
between my primary explanatory variable and the response variable
Finally, I ran the logistic regression for my
categorical response variable lifeExpectancyCat and all of my categorical
explanatory variables: incomeLevelGrp, exportsCatGrp and
inflationCatGrp.
From the Logit Regression Results with all my binary explanatory
variables, it can be seen that all of them have p-values greater than the
alpha level of 0.05. None of them is statistically significant, and there
is no evidence of confounding.
The resulting odds ratios are as follows, and the confidence
intervals can be seen further below:
Odds Ratios
Intercept          0.81
exportsCatGrp      37609999660.63
incomeLevelGrp     0.87
inflationCatGrp    0.33
dtype: float64
From the results it can be interpreted that, after
controlling for incomeLevelGrp and inflationCatGrp, the estimated odds of
high life expectancy at high export levels are 37609999660.63 times the
odds at low export levels, an estimate too extreme to be read literally.
Also, after controlling for exportsCatGrp and inflationCatGrp, the odds of
high life expectancy for people with high income levels are 0.87 times the
odds for those with low income levels, as the odds ratio for incomeLevelGrp
is less than 1 after controlling for the other 2 variables.
Also, after controlling for exportsCatGrp and incomeLevelGrp, the odds ratio
for inflationCatGrp is still less than 1, which indicates that the odds of
high life expectancy in periods of high inflation are 0.33 times the odds
in periods of low inflation.
The confidence intervals for all the explanatory
variables can also be seen in the results below:
#####################################
Logit Regression Results with All Variables
#####################################
                   Lower CI   Upper CI   OR
Intercept          0.35       1.88       0.81
exportsCatGrp      0.00       inf        37609999660.63
incomeLevelGrp     0.27       2.82       0.87
inflationCatGrp    0.03       3.21       0.33
############################
MY PYTHON PROGRAM CODE
#############################
#############################
ALL OUTPUT FROM MY CODE
#############################