Background of the Dataset CSV file Used:
The background to the dataset CSV file was explained at length in the week 2 assignment. Rather than bore assessors and readers by repeating it all here, please see the background information in my previous assignment, which can be accessed at this link: http://adabadata.tumblr.com/ (on Tumblr it appears as one of the past posts).
For easy access, though, I am posting the link to the actual dataset CSV used for this project here again.
The gapminder_ghana_updated.csv dataset for this project can be viewed and downloaded here:
https://drive.google.com/file/d/0B2KfPRxy4ootbzl5N0g1dUtIVzA/view?pref=2&pli=1
See the screenshot here for a guide: http://prntscr.com/9gctxn
Creating Graphs For My Data
As instructed in the assignment, I will be continuing with the program I have successfully run.
That will be the program written in week 2.
MY PYTHON PROGRAM
CODE:
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 2 12:33:55 2016

@author: Bernard
"""

# import statements
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

# load the gapminder_ghana_updated dataset csv into the program
data = pandas.read_csv('gapminder_ghana_updated.csv', low_memory=False)

# print number of observations (rows), which is the number of years this data
# has been looked at
print("number of observations(rows) which is the number of years this data has been looked at: ")
print(len(data))

# print number of variables (columns)
print("number of variables (columns) available in the dataset: ")
print(len(data.columns))

print("data index: ")
print(len(data.index))

# converting data to numeric; values that cannot be parsed become NaN
# (pandas.to_numeric replaces convert_objects, which newer pandas no longer has)
data["incomeperperson"] = pandas.to_numeric(data["incomeperperson"], errors="coerce")
data["lifeexpectancy"] = pandas.to_numeric(data["lifeexpectancy"], errors="coerce")
data["literacyrate"] = pandas.to_numeric(data["literacyrate"], errors="coerce")

# displaying counts of the observations in the DataFrame.
# inc_pp_count is the name that will hold the result from the incomeperperson count.
# sort=False: the counts are not re-ordered by frequency
print("counts for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.")
inc_pp_count = data["incomeperperson"].value_counts(sort=False)
# print the count of inc_pp_count ; incomeperperson
print(inc_pp_count)

print("percentages for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.")
inc_pp_percent = data["incomeperperson"].value_counts(sort=False, normalize=True)
# print the percentage of incomeperperson
print(inc_pp_percent)

print("counts for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")
life_exp_count = data["lifeexpectancy"].value_counts(sort=False)
# print the count of life_exp_count ; lifeexpectancy
print(life_exp_count)

print("percentages for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")
life_exp_percent = data["lifeexpectancy"].value_counts(sort=False, normalize=True)
# print the percentage of life_exp_count ; lifeexpectancy
print(life_exp_percent)

print("counts for literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")
# dropna=False includes the missing values (NaN) in the counts
lit_rate_count = data["literacyrate"].value_counts(sort=False, dropna=False)
# print the count of lit_rate_count ; literacyrate
print(lit_rate_count)

print("percentages literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")
lit_rate_percent = data["literacyrate"].value_counts(sort=False, normalize=True)
# print the percentage of lit_rate_count ; literacyrate
print(lit_rate_percent)

# univariate histogram for quantitative variable - incomeperperson
# (distplot is deprecated since seaborn 0.11 in favour of histplot)
seaborn.distplot(data["incomeperperson"].dropna(), kde=False)
plt.xlabel("Incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$")
plt.title("Incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.")
plt.show()  # draw this figure before starting the next one

# univariate histogram for quantitative variable - lifeexpectancy
seaborn.distplot(data["lifeexpectancy"].dropna(), kde=False)
plt.xlabel("Lifeexpectancy- 2011 life expectancy at birth (years)")
plt.title("Lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")
plt.show()

# univariate histogram for quantitative variable - literacyrate
seaborn.distplot(data["literacyrate"].dropna(), kde=False)
plt.xlabel("literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above)")
plt.title("Literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")
plt.show()

# standard deviation and other descriptive statistics for the quantitative variables
print("describe Incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.")
desc1 = data["incomeperperson"].describe()
print(desc1)

print("describe Lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")
desc2 = data["lifeexpectancy"].describe()
print(desc2)

print("describe Literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")
desc3 = data["literacyrate"].describe()
print(desc3)

# scatterplot for the relationship between Literacy Rate and Life Expectancy of Ghana
scat1 = seaborn.regplot(x="literacyrate", y="lifeexpectancy", fit_reg=False, data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("LIFEEXPECTANCY")
plt.title("Scatterplot for the Association between Literacy Rate and Life Expectancy of Ghana")
plt.show()

# showing the line of best fit by leaving out fit_reg=False in seaborn.regplot
# (fit_reg defaults to True), for the relationship between Literacy Rate and
# Life Expectancy of Ghana
scat2 = seaborn.regplot(x="literacyrate", y="lifeexpectancy", data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("LIFEEXPECTANCY")
plt.title("Scatterplot for the Association between Literacy Rate and Life Expectancy of Ghana")
plt.show()

# scatterplot for the relationship between Literacy Rate and Income Per Person of Ghana
scat3 = seaborn.regplot(x="literacyrate", y="incomeperperson", fit_reg=False, data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("INCOMEPERPERSON")
plt.title("Scatterplot for the Association between Literacy Rate and Income Per Person of Ghana")
plt.show()

# showing the line of best fit by leaving out fit_reg=False in seaborn.regplot,
# for the relationship between Literacy Rate and Income Per Person of Ghana
scat4 = seaborn.regplot(x="literacyrate", y="incomeperperson", data=data)
plt.xlabel("LITERACYRATE")
plt.ylabel("INCOMEPERPERSON")
plt.title("Scatterplot for the Association between Literacy Rate and Income Per Person of Ghana")
plt.show()
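A note for anyone re-running the program on a newer setup: the numeric conversion step can be done with pandas.to_numeric, since convert_objects has been removed from recent pandas releases. The sketch below is only an illustration on a made-up column; the name raw and the blank and "n/a" entries are my assumptions, not values from the Ghana file.

```python
import pandas

# Hypothetical stand-in for a column that was read from the CSV as text.
# The two numbers are the literacyrate values discussed in this post;
# the blank and "n/a" entries are invented to show how bad values behave.
raw = pandas.Series(["57.897473", "", "71.497075", "n/a"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN
# instead of raising an error.
numeric = pandas.to_numeric(raw, errors="coerce")

print(numeric)
print("usable values:", numeric.notna().sum())
```

With errors="coerce" the unparseable entries simply become missing values, which is the behaviour the rest of the program relies on when it calls dropna().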
The Univariate graph for quantitative variable – incomeperperson
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootaHoxWWpDTnRjZ1E/view?usp=sharing
This graph is bimodal, with its highest peak in the incomeperperson range of 500 to 1000.
The second-highest peak is in the incomeperperson range of 2000 to 2500. The distribution seems to be skewed to the right, as the frequencies are higher at the lower incomeperperson levels than at the higher incomeperperson levels.
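One rough way to back up a "skewed to the right" reading with numbers is to compare the mean with the median and look at the sample skewness. The sketch below uses made-up values (the name income and its numbers are illustrative only, not the Ghana data) and is just one possible check.

```python
import pandas

# Illustrative values only (not the Ghana data): most observations are low,
# with a few much higher ones forming a long right tail.
income = pandas.Series([600, 700, 800, 900, 1000, 2200, 2400])

mean_income = income.mean()
median_income = income.median()
skewness = income.skew()  # sample skewness; positive means a longer right tail

print("mean:", mean_income)
print("median:", median_income)
print("skewness:", skewness)
```

A mean above the median together with a positive skewness points to a right skew. The same three calls could be run on data["incomeperperson"] or data["lifeexpectancy"] in the program above.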
The Univariate graph for quantitative variable – lifeexpectancy
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootTjNnMDhDLTVTTFk/view?usp=sharing
This graph is bimodal, with its highest peak at a lifeexpectancy of between 24 and 30 years. The second-highest peak is at a lifeexpectancy of 60 to 66 years.
The distribution seems to be skewed to the right, as the frequency is much higher at the lower numbers of years than at the higher numbers of years.
The Univariate graph for quantitative variable – literacyrate
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootUEdGU28wQXl6cVE/view?usp=sharing
This graph is uniform, because it has no mode or value around which the distribution is concentrated. It is uniform not because all the literacyrate values gathered over the entire 216 years are around the same figure, but because only 2 literacyrate values are present in the Gapminder data for the country under discussion, Ghana.
Hence there are a great many NaN values (years without any literacyrate data collected), leaving me with only 2 values to analyse: literacy rates of 57.897473 and 71.497075. This makes the literacyrate data quite limited.
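How many values are really recorded can be checked directly with a few pandas calls. This is a minimal sketch on a short made-up column: the two numbers are the ones quoted above, while the name literacy and the number of NaN entries are my assumptions for illustration.

```python
import numpy
import pandas

# Illustrative column: the two recorded literacyrate values quoted above,
# surrounded by years with no data (the number of NaN entries is made up).
literacy = pandas.Series(
    [numpy.nan, numpy.nan, 57.897473, numpy.nan, 71.497075, numpy.nan]
)

recorded = literacy.count()      # count() ignores NaN
missing = literacy.isna().sum()  # number of years without a value
distinct = literacy.nunique()    # distinct recorded values (NaN excluded)

print("recorded:", recorded)
print("missing:", missing)
print("distinct:", distinct)
```

Running the same calls on data["literacyrate"] would show how much of the column is missing in the real file.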
Scatterplot for the relationship between Literacy Rate and Life Expectancy of Ghana
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootRXR2NkJvS3BUR1E/view?usp=sharing
From the scatterplot it can be seen that the higher the literacyrate, the higher the lifeexpectancy of the people of Ghana, and the lower the literacyrate, the lower the lifeexpectancy. We can say there appears to be a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana. However, it must be mentioned that the Gapminder data for the literacyrate is limited, as it is recorded for only 2 different years.
Hence there is not enough data to conclude definitely that there is a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana, even though the scatterplot above, drawn from the 2 literacyrate values and the more than 200 lifeexpectancy values, suggests so.
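A point can only be drawn on a scatterplot for a year in which both variables have a value, so it helps to count the complete pairs. The sketch below assumes a tiny made-up DataFrame: the literacyrate numbers are the two from this post, while the lifeexpectancy numbers and the name pairs are hypothetical.

```python
import numpy
import pandas

# Tiny stand-in for the real data. The lifeexpectancy numbers are hypothetical.
data = pandas.DataFrame({
    "literacyrate": [numpy.nan, 57.897473, numpy.nan, 71.497075],
    "lifeexpectancy": [50.0, 55.0, 58.0, 61.0],
})

# Keep only the rows where both variables are present; these are the only
# rows that can appear as points on the scatterplot.
pairs = data.dropna(subset=["literacyrate", "lifeexpectancy"])

print(pairs)
print("plottable points:", len(pairs))
```

In this illustration only 2 of the 4 rows survive, however many lifeexpectancy values exist on their own.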
Scatterplot Showing The Line Of Best Fit for the relationship between Literacy Rate and Life Expectancy of Ghana
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootRU1Ma2x4S28wTjA/view?usp=sharing
The line of best fit in the scatterplot suggests there is a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana. However, as mentioned above, because the Gapminder data on the literacyrate is so limited, I cannot definitely conclude that there is a positive relationship between the literacyrate and the lifeexpectancy of the people of Ghana, even though the scatterplots suggest one.
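The reason for the caution is that a straight line always passes exactly through any two points, so a fit through two points looks perfect whatever the real relationship is. The sketch below illustrates this with numpy.polyfit; the x values are the two literacyrate values from this post, and the y values are hypothetical.

```python
import numpy

# x: the two recorded literacyrate values; y: hypothetical lifeexpectancy values.
x = numpy.array([57.897473, 71.497075])
y = numpy.array([55.0, 61.0])

# Degree-1 least-squares fit, i.e. a straight line of best fit.
slope, intercept = numpy.polyfit(x, y, 1)
fitted = slope * x + intercept

# With only two points the line reproduces both of them exactly.
print("slope:", slope)
print("residuals:", y - fitted)

# Swapping the y values gives an equally exact fit with the opposite slope.
slope_swapped, _ = numpy.polyfit(x, y[::-1], 1)
print("slope with swapped y:", slope_swapped)
```

Both fits have residuals of essentially zero, so the quality of the fit says nothing here. Only the direction between the two points is observed, and more years of literacyrate data would be needed to say more.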
Scatterplot for the relationship between Literacy Rate and Income Per Person of Ghana
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootMWtRYVBNdVQ5c0k/view?usp=sharing
The analysis here is similar to the one above for the relationship between the literacyrate and the lifeexpectancy of the people of Ghana.
From the scatterplot it can be seen that the higher the literacyrate, the higher the incomeperperson of the people of Ghana, and the lower the literacyrate, the lower the incomeperperson. We can say there appears to be a positive relationship between the literacyrate and the incomeperperson of the people of Ghana. However, it must be mentioned that the Gapminder data for the literacyrate is limited, as it is recorded for only 2 different years.
Hence there is not enough data to conclude definitely that there is a positive relationship between the literacyrate and the incomeperperson of the people of Ghana, even though the scatterplot above, drawn from the 2 literacyrate values and the more than 200 incomeperperson values, suggests so.
Scatterplot Showing The Line Of Best Fit for the relationship between Literacy Rate and Income Per Person of Ghana
please view the graph at this link
https://drive.google.com/file/d/0B2KfPRxy4ootM25rZE4zV05yS3M/view?usp=sharing
The line of best fit in the scatterplot suggests there is a positive relationship between the literacyrate and the incomeperperson of the people of Ghana. However, as mentioned above, because the Gapminder data on the literacyrate is so limited, I cannot definitely conclude that there is a positive relationship between the literacyrate and the incomeperperson of the people of Ghana, even though the scatterplots suggest one.