Background of the Dataset CSV file Used:

The background to the dataset CSV file was explained extensively in the week 2 assignment. Rather than repeat everything here, please check the background information in my previous assignment, which can be accessed at this link: http://adabadata.tumblr.com/  OR, in Tumblr, it can be found among the past posts, specifically the one with the title

(PYTHON PROGRAM For The Research Topic Association Of The Literacy Rate And Life Expectancy & Association Of The Literacy Rate And Income Per Person: The Case of Ghana)

For easy access, though, I will post the link to the actual dataset CSV used for this project here again.

The gapminder_ghana_updated.csv dataset for this project can be viewed and downloaded here:

https://drive.google.com/file/d/0B2KfPRxy4ootbzl5N0g1dUtIVzA/view?pref=2&pli=1

See this screenshot for a guide: http://prntscr.com/9gctxn

 

Creating Graphs For My Data

As instructed in the assignment, I will continue with the program I have already run successfully, that is, the program written in week 2.

MY PYTHON PROGRAM CODE:

# -*- coding: utf-8 -*-
"""
Created on Sat Jan  2 12:33:55 2016

@author: Bernard
"""



#import statements

import pandas

import numpy

import seaborn

import matplotlib.pyplot as plt









#load the gapminder_ghana_updated dataset csv into the program
data = pandas.read_csv('gapminder_ghana_updated.csv', low_memory=False)









#print number of observations(rows) which is the number of years this data
#has been looked at; print length
print("number of observations(rows) which is the number of years this data has been looked at: ")
print(len(data))



#print number of variables (columns)

print("number of variables (columns) available in the dataset: ")

print(len(data.columns))



print("data index: ")

print(len(data.index))











#Converting data to numeric; values that cannot be parsed become NaN
data["incomeperperson"] = pandas.to_numeric(data["incomeperperson"], errors="coerce")
data["lifeexpectancy"] = pandas.to_numeric(data["lifeexpectancy"], errors="coerce")
data["literacyrate"] = pandas.to_numeric(data["literacyrate"], errors="coerce")











#displaying rows or observations in the DataFrame.
#inc_pp_count is the name that will hold the result from the incomeperperson count
#sort=False ; I use the value False so that the counts are not reordered by frequency



print("counts for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana. ")
inc_pp_count = data["incomeperperson"].value_counts(sort=False)

#print the count of inc_pp_count ; incomeperperson

print(inc_pp_count)



print("percentages for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana. ")
inc_pp_percent = data["incomeperperson"].value_counts(sort=False, normalize=True)

#print the percentage of incomeperperson

print(inc_pp_percent)











print("counts for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")
life_exp_count = data["lifeexpectancy"].value_counts(sort=False)

#print the count of life_exp_count ; lifeexpectancy

print(life_exp_count)



print("percentages for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana ")
life_exp_percent = data["lifeexpectancy"].value_counts(sort=False, normalize=True)

#print the percentage of life_exp_count ; lifeexpectancy

print(life_exp_percent)











print("counts for literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")
#dropna=False keeps missing values (NaN) in the counts
lit_rate_count = data["literacyrate"].value_counts(sort=False, dropna=False)

#print the count of lit_rate_count ; literacyrate

print(lit_rate_count)











print("percentages literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana ")
lit_rate_percent = data["literacyrate"].value_counts(sort=False, normalize=True)

#print the percentage of lit_rate_count ; literacyrate

print(lit_rate_percent)











#univariate histogram for quantitative variable - incomeperperson
plt.figure()  #start a new figure so the graphs do not overlap
seaborn.distplot(data["incomeperperson"].dropna(), kde=False)
plt.xlabel("Incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$")
plt.title("Incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana. ")











#univariate histogram for quantitative variable - lifeexpectancy
plt.figure()  #start a new figure so the graphs do not overlap
seaborn.distplot(data["lifeexpectancy"].dropna(), kde=False)
plt.xlabel("Lifeexpectancy- 2011 life expectancy at birth (years)")
plt.title("Lifeexpectancy- 2011 life expectancy at birth (years) of Ghana ")











#univariate histogram for quantitative variable - literacyrate
plt.figure()  #start a new figure so the graphs do not overlap
seaborn.distplot(data["literacyrate"].dropna(), kde=False)
plt.xlabel("literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above)")
plt.title("Literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana ")











#Standard deviation and other descriptive statistics for the quantitative variables
print("describe Incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.")

desc1 = data["incomeperperson"].describe()

print(desc1)











print("describe Lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")

desc2 = data["lifeexpectancy"].describe()

print(desc2)











print("describe Literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")

desc3 = data["literacyrate"].describe()

print(desc3)











#Scatterplot for the relationship between Literacy Rate and Life Expectancy of Ghana
plt.figure()  #start a new figure so the graphs do not overlap
scat1 = seaborn.regplot(x="literacyrate", y="lifeexpectancy", fit_reg=False, data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("LIFEEXPECTANCY")
plt.title("Scatterplot for the Association between Literacy Rate and Life Expectancy of Ghana")











#Showing the line of best fit by dropping fit_reg=False from the seaborn.regplot call
#(fit_reg defaults to True), for the relationship between Literacy Rate and
#Life Expectancy of Ghana
plt.figure()  #start a new figure so the graphs do not overlap
scat2 = seaborn.regplot(x="literacyrate", y="lifeexpectancy", data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("LIFEEXPECTANCY")
plt.title("Scatterplot for the Association between Literacy Rate and Life Expectancy of Ghana")











#Scatterplot for the relationship between Literacy Rate and Income Per Person of Ghana
plt.figure()  #start a new figure so the graphs do not overlap
scat3 = seaborn.regplot(x="literacyrate", y="incomeperperson", fit_reg=False, data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("INCOMEPERPERSON")
plt.title("Scatterplot for the Association between Literacy Rate and Income Per Person of Ghana")











#Showing the line of best fit by dropping fit_reg=False from the seaborn.regplot call
#(fit_reg defaults to True), for the relationship between Literacy Rate and
#Income Per Person of Ghana
plt.figure()  #start a new figure so the graphs do not overlap
scat4 = seaborn.regplot(x="literacyrate", y="incomeperperson", data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("INCOMEPERPERSON")
plt.title("Scatterplot for the Association between Literacy Rate and Income Per Person of Ghana")

#display the graphs when the program is run as a script
plt.show()

 

The Univariate graph for quantitative variable – incomeperperson

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootaHoxWWpDTnRjZ1E/view?usp=sharing

This graph is bimodal. Its highest peak falls in the incomeperperson range of 500 to 1000, and its second highest peak falls in the range of 2000 to 2500. It appears to be skewed to the right, as the lower incomeperperson levels have higher frequencies than the higher incomeperperson levels.
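One rough way to check a skew that was judged by eye is to compare the mean with the median: in a right-skewed sample the long upper tail usually pulls the mean above the median. The sketch below assumes a small list of made-up income values (they are not taken from the Gapminder file) and uses only the standard library, so it runs without the dataset.

```python
import statistics

# Hypothetical income-per-person values (NOT taken from the Gapminder file):
# many low values and a few high ones, i.e. a long right tail.
incomes = [520, 610, 700, 750, 820, 900, 980, 1500, 2100, 2400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# In a right-skewed sample the long tail tends to pull the mean above the median.
right_skewed = mean_income > median_income

print("mean:", mean_income)
print("median:", median_income)
print("mean above median (suggests right skew):", right_skewed)
```

This is only a rule of thumb, and with a bimodal shape it should be read together with the histogram, not instead of it.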

 

 

The Univariate graph for quantitative variable – lifeexpectancy

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootTjNnMDhDLTVTTFk/view?usp=sharing

 

This graph is bimodal. Its highest peak falls at a lifeexpectancy of between 24 and 30 years, and its second highest peak falls between 60 and 66 years.

It appears to be skewed to the right, as there is a very high frequency at the lower number of years compared with the higher number of years.
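The two peaks can also be checked numerically by counting how many values fall into each histogram bin. The following is a minimal sketch, assuming made-up life expectancy values and an assumed bin width of 6 years starting at 24 (chosen only to line up with ranges such as 24 to 30); none of these numbers come from the dataset.

```python
from collections import Counter

# Hypothetical life-expectancy values in years (NOT the real Gapminder rows).
life_expectancy = [25.1, 26.4, 27.0, 28.3, 29.5, 33.0, 41.2, 48.9,
                   57.4, 60.2, 61.8, 63.5, 64.9]

BIN_WIDTH = 6   # assumed bin width, chosen to match ranges such as 24-30
BIN_START = 24  # assumed left edge of the first bin


def bin_label(value):
    """Return the (low, high) edges of the bin that contains value."""
    index = int((value - BIN_START) // BIN_WIDTH)
    low = BIN_START + index * BIN_WIDTH
    return (low, low + BIN_WIDTH)


bin_counts = Counter(bin_label(v) for v in life_expectancy)

# The two fullest bins are the candidate peaks of a bimodal histogram.
top_two = [edges for edges, _ in bin_counts.most_common(2)]

print(sorted(bin_counts.items()))
print("two fullest bins:", sorted(top_two))
```

If the two fullest bins are far apart with emptier bins between them, that supports reading the histogram as bimodal.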

 

The Univariate graph for quantitative variable – literacyrate

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootUEdGU28wQXl6cVE/view?usp=sharing

This graph is uniform because it has no mode, that is, no value around which the distribution is concentrated. It is uniform not because all the literacyrate values gathered over the entire 216 years are around the same figure, but because only 2 literacyrate values were present in the Gapminder data for the country under discussion, Ghana.

Hence there are a great many NaN values (years without any literacyrate data collected), leaving me with only 2 values to analyse, the literacyrates of 57.897473 and 71.497075, which makes the literacyrate data very limited.
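To see how few usable values a column has, the non-missing cells can be counted directly. The sketch below is an illustration under assumptions: it reads a tiny in-memory CSV, not the real file, the `row` column is hypothetical, and only the two literacyrate values are taken from this post. It uses the standard library so that it runs without the dataset.

```python
import csv
import io

# Hypothetical excerpt shaped like the dataset: blank cells stand for years
# with no literacyrate recorded. The "row" column is made up; only the two
# literacyrate values come from the post.
sample_csv = io.StringIO(
    "row,literacyrate\n"
    "1,\n"
    "2,\n"
    "3,57.897473\n"
    "4,\n"
    "5,71.497075\n"
)

rows = list(csv.DictReader(sample_csv))

# Keep only the cells that actually hold a number.
present = [float(r["literacyrate"]) for r in rows if r["literacyrate"].strip()]
missing = len(rows) - len(present)

print("rows:", len(rows))
print("non-missing literacyrate values:", present)
print("missing literacyrate values:", missing)
```

In the pandas program above, `data["literacyrate"].count()` should give the same kind of non-missing count on the loaded data.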

 

Scatterplot for the relationship between Literacy Rate and Life Expectancy of Ghana

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootRXR2NkJvS3BUR1E/view?usp=sharing

From the scatterplot it can be seen that the higher the literacyrate, the higher the lifeexpectancy of the people of Ghana; and the lower the literacyrate, the lower the lifeexpectancy. This suggests a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana. However, it must be mentioned that the Gapminder data for the literacyrate is limited, as it is recorded for only 2 different years. A point can only be plotted for a year that has both values, so only those 2 years appear in the scatterplot, out of over 200 years of lifeexpectancy values. Hence there is not enough data to draw a definite conclusion that there is a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana, even though the scatterplot above depicts one.

 

Scatterplot showing the Line Of Best Fit for the relationship between Literacy Rate and Life Expectancy of Ghana

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootRU1Ma2x4S28wTjA/view?usp=sharing

The Line Of Best Fit in the scatterplot suggests there is a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana. However, as mentioned above, due to the limited Gapminder data on the literacyrate, even though the scatterplots suggest such a relationship, I cannot definitely conclude that there is a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana.
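The caution about having only 2 literacyrate values can be made concrete: a straight line always passes exactly through any two distinct points, so the correlation between two pairs of values is always +1 or -1, however small or accidental the change is. The sketch below assumes made-up lifeexpectancy values paired with the two literacyrate values from this post; `pearson_r` is a helper written for this illustration, not part of the program above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# The two literacyrate values reported in this post.
literacy = [57.897473, 71.497075]

# Hypothetical life-expectancy pairs (NOT from the dataset): one rising and
# one falling, to show that the result does not depend on the size of the change.
rising = [58.0, 60.5]
falling = [60.5, 58.0]

r_rising = pearson_r(literacy, rising)
r_falling = pearson_r(literacy, falling)

print("r when the second value is higher:", r_rising)
print("r when the second value is lower:", r_falling)
```

With two points the fit is perfect by construction, which is why the line in the graph cannot by itself be taken as evidence of a real association.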

 

Scatterplot for the relationship between Literacy Rate and Income Per Person of Ghana

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootMWtRYVBNdVQ5c0k/view?usp=sharing

 

The analysis here is similar to the one for the relationship between the literacyrate and the lifeexpectancy of the people of Ghana.

From the scatterplot it can be seen that the higher the literacyrate, the higher the incomeperperson of the people of Ghana; and the lower the literacyrate, the lower the incomeperperson. This suggests a positive relationship between the literacyrate and the incomeperperson of the people of Ghana. However, it must be mentioned that the Gapminder data for the literacyrate is limited, as it is recorded for only 2 different years. A point can only be plotted for a year that has both values, so only those 2 years appear in the scatterplot, out of over 200 years of incomeperperson values. Hence there is not enough data to draw a definite conclusion that there is a positive relationship between the literacyrate and the incomeperperson of the people of Ghana, even though the scatterplot above depicts one.

 

Scatterplot showing the Line Of Best Fit for the relationship between Literacy Rate and Income Per Person of Ghana

please view the graph at this link

https://drive.google.com/file/d/0B2KfPRxy4ootM25rZE4zV05yS3M/view?usp=sharing

 

The Line Of Best Fit in the scatterplot suggests there is a positive relationship between the literacyrate and the incomeperperson of the people of Ghana. However, as mentioned above, due to the limited Gapminder data on the literacyrate, even though the scatterplots suggest such a relationship, I cannot definitely conclude that there is a positive relationship between the literacyrate and the incomeperperson of the people of Ghana.
