Running An Analysis of Variance (ANOVA) – Data Analysis Tools

Chosen Dataset

I will be working with data from the Gapminder dataset.

This is the same dataset I worked with in the Data Management and Visualization course assignments. As discussed in those assignments, I have chosen to focus on one country, Ghana.

Hence I will be particularly interested in some data about Ghana (as dealt with in the Data Management and Visualization course assignments), primarily:

a. incomeperperson

b. lifeexpectancy

c. literacyrate

It is worth pointing out that the gapminder.csv provided for the assignment comprises all the countries in the world, without a focus on my country of interest, Ghana. This means that only one year's value for each of these core variables is entered for Ghana in the gapminder.csv data provided for the assignment.

The result is that when I run my Python program with just these single values, there are no further yearly values for each of these variables to make a meaningful frequency distribution specific to Ghana, unless I compare the values of Ghana to those of all the other countries, which is NOT the focus of my research work.

To achieve this focus on Ghana alone, I will fetch the data from the http://www.gapminder.org/ website, specifically its data section, which can be found here: http://www.gapminder.org/data/

I will therefore need to compile a new data csv file focused on Ghana, which will give me all the variables I need for my analysis. In a nutshell, this new data csv file lets me load and call the relevant variables and columns in my Python program and, more importantly, get the relevant variables I will need for my research work going forward.

I will, therefore, call the new data csv file for the assignment: gapminder_ghana_updated.csv

This will be the Gapminder csv data file I will be loading into my Python program.

The gapminder_ghana_updated.csv dataset for this project can be viewed and downloaded here:

https://drive.google.com/file/d/0B2KfPRxy4ootQnRzVUZQQXdFX1U/view?usp=sharing

See the screenshot here for a guide (http://prntscr.com/9gctxn)

Data Variables

The data variables I worked with on the Gapminder dataset are all quantitative. However, as stated in the requirements for the Running An Analysis Of Variance assignment, I need one of my variables (the explanatory variable) to be categorical.

I have therefore added a 4th variable named Inflation, which I will categorise in order to get a categorical variable for the Analysis Of Variance test.
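As a rough illustration of how a quantitative variable can be turned into a categorical one, the sketch below bins a handful of inflation values with pandas.cut. The values are hypothetical and chosen only for illustration; the bin edges are the ones used in the program further down.

```python
import pandas

# Hypothetical inflation percentages, for illustration only (not Gapminder values).
inflation = pandas.Series([10.0, 32.0, 45.5, 100.0, 150.0])

# Bin edges as in the program in this post. By default pandas.cut builds
# right-closed intervals: (-4, 32], (32, 64], (64, 96], (96, 128].
inflation_category = pandas.cut(inflation, [-4, 32, 64, 96, 128])

# Values outside every interval (here 150.0) become NaN.
print(inflation_category)
print(inflation_category.value_counts(sort=False))
```

Because the intervals are closed on the right, a value of exactly 32 lands in the first group, and anything above 128 is left uncategorised and would be dropped by a later dropna().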

Inflation, GDP deflator (annual %):

According to the Gapminder codebook, Inflation as measured by the annual growth rate of the GDP implicit deflator shows the rate of price change in the economy as a whole. The GDP implicit deflator is the ratio of GDP in current local currency to GDP in constant local currency. Source: World Bank national accounts data, and OECD National Accounts data files
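To make the definition concrete, here is a minimal sketch of the arithmetic, assuming the deflator is expressed as an index (ratio times 100). The function names and the GDP figures are hypothetical and are not taken from the Gapminder or World Bank data.

```python
def gdp_deflator(nominal_gdp, real_gdp):
    """GDP implicit deflator: current-price GDP over constant-price GDP, times 100."""
    return 100.0 * nominal_gdp / real_gdp


def deflator_inflation(deflator_prev, deflator_curr):
    """Annual inflation (%) as the growth rate of the deflator between two years."""
    return 100.0 * (deflator_curr - deflator_prev) / deflator_prev


# Hypothetical figures in local currency units, for illustration only.
deflator_year1 = gdp_deflator(nominal_gdp=200.0, real_gdp=160.0)
deflator_year2 = gdp_deflator(nominal_gdp=264.0, real_gdp=176.0)
inflation_pct = deflator_inflation(deflator_year1, deflator_year2)

print(deflator_year1, deflator_year2, inflation_pct)
```

With these made-up numbers the deflator rises from 125 to 150, which corresponds to an inflation rate of 20% for the second year.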

Hence the 4 variables I will be working with on this and subsequent assignments are:

a. incomeperperson

b. lifeexpectancy

c. literacyrate

d. Inflation

I have duly updated my Personal Codebook to include this 4th variable. The updated Personal Codebook can be found at this link:

https://docs.google.com/document/d/177YfOjdk4oekFu20OLt4fgmu-n7cgRULAJ_Kd9KdYaM/edit?usp=sharing

The Research Question

For the purposes of this assignment, Running An Analysis Of Variance, I will slightly modify the research question used in the previous course.

Hence the question I will be looking at in this assignment is: Is there an association or relation between Inflation and Income Per Person of the Ghanaian population?

Hypothesis Testing

The Null and Alternate Hypotheses:

From the above research question, the Null Hypothesis (H_0) is that there is no association or relation between Inflation and Income Per Person of the Ghanaian population.

The Alternate Hypothesis (H_a) states that there is an association or relation between Inflation and Income Per Person of the Ghanaian population.

Sample:

The sample is the data from the Gapminder dataset with a focus on Ghana.

Assessing the evidence:

This is done by running an analysis of variance (ANOVA) test on the hypotheses.

I run the test using a Python program.

MY PYTHON PROGRAM CODE:


# -*- coding: utf-8 -*-
"""
Created on Mon Jan 4 00:59:30 2016

@author: Bernard
"""

import numpy
import pandas

#to be able to get the p-value and conduct the ANOVA F TEST we import this library
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

#load the gapminder_ghana_updated dataset csv into the program
data = pandas.read_csv('gapminder_ghana_updated.csv', low_memory=False)

#Converting data to numeric (entries that cannot be parsed become NaN)
data["incomeperperson"] = pandas.to_numeric(data["incomeperperson"], errors="coerce")
data["lifeexpectancy"] = pandas.to_numeric(data["lifeexpectancy"], errors="coerce")
data["literacyrate"] = pandas.to_numeric(data["literacyrate"], errors="coerce")
data["Inflation"] = pandas.to_numeric(data["Inflation"], errors="coerce")

#create a variable for inflationCategory
data["inflationCategory"] = data["Inflation"]

#categorical groupings for inflation. This is to get one categorical variable for the ANOVA test
data["inflationCategory"] = pandas.cut(data.inflationCategory, [-4, 32, 64, 96, 128])

#including only data relevant for our testing by dropping irrelevant data
dataSub = data[["incomeperperson", "inflationCategory"]].dropna()

#Change format from numeric to categorical
dataSub["inflationCategory"] = dataSub["inflationCategory"].astype("category")

#describe inflation
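The listing above ends before the test itself is run. As a sketch only, and not necessarily the code used for this assignment, an ANOVA F test and a Tukey HSD post hoc comparison could be run with the two statsmodels modules imported above. The small DataFrame, its "low"/"mid"/"high" labels and its values are hypothetical stand-ins for dataSub so that the example runs on its own.

```python
import pandas
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# Hypothetical stand-in for dataSub; values are for illustration only.
example = pandas.DataFrame({
    "incomeperperson": [200.0, 210.0, 220.0,
                        300.0, 310.0, 320.0,
                        400.0, 410.0, 420.0],
    "inflationCategory": ["low"] * 3 + ["mid"] * 3 + ["high"] * 3,
})

# One-way ANOVA: quantitative response explained by a categorical variable.
model = smf.ols(formula="incomeperperson ~ C(inflationCategory)",
                data=example).fit()
f_value = model.fvalue    # overall F statistic
p_value = model.f_pvalue  # p-value of the F test
print("F =", f_value, "p =", p_value)

# Group means, to see the direction of any difference.
print(example.groupby("inflationCategory")["incomeperperson"].mean())

# Post hoc test: which pairs of categories differ (Tukey HSD).
comparison = multi.MultiComparison(example["incomeperperson"],
                                   example["inflationCategory"])
tukey = comparison.tukeyhsd()
print(tukey.summary())
```

If the p-value of the F test is below 0.05, the null hypothesis of no association would be rejected, and the Tukey summary indicates which inflation categories differ in mean income per person.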
