Background of the Dataset CSV file Used:

 

In the GapMinder Codebook the Unique Identifier = Country

Hence in this program, my Unique Identifier = Ghana

1. There are 3 chosen variables (columns) that are core to my research question, which is based on the country Ghana.

These are

a. incomeperperson

b. lifeexpectancy

c. literacyrate

 

The first 2 variables (incomeperperson and lifeexpectancy) already have values in the gapminder.csv data provided for the assignment. The problem is that the gapminder.csv provided for the assignment covers all the countries in the world, without a focus on my country of interest, Ghana. This means that only one year's value is entered for each of these core variables in the gapminder.csv data provided for the assignment.

The result is that when I run my python program with just these single values, there are no further yearly values for each of these variables to make a meaningful frequency distribution specific to Ghana, unless I compare the values of Ghana to those of all the other countries, which is NOT the focus of my research work.

My research work is to find out the Association of the Literacy rate and Life expectancy, and the Association of the Literacy rate and Income Per Person, with focus on the country Ghana ONLY.

 

To keep this research focused on Ghana only, I will fetch the data from the http://www.gapminder.org/ website, specifically their data section, which can be found here: http://www.gapminder.org/data/.

 

I will therefore need to compile a new data csv file focused on Ghana, which will give me all the variables I need for my analysis. In a nutshell, this new data csv file lets me load and call the relevant variables and columns in my python program and, more importantly, gives me the relevant variables I will need for my research work going forward.

I will, therefore, call the new data csv file for the assignment: gapminder_ghana_updated.csv

This will be the Gapminder csv data file I will be calling and loading into my python program.

 

It is also worth mentioning that the third variable, literacyrate, was not available in the gapminder.csv provided for the assignment. Since this is one of the core variables, I have added this variable and its associated values to my new data file (gapminder_ghana_updated.csv) to be able to make sense of the frequency distributions for my research topic.
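
The compilation step described above could look roughly like the sketch below. It assumes each indicator is available as a wide table with one row per country and one column per year; the tiny in-memory tables, the `country` column name and the helper `to_long` are illustrative assumptions, not the actual layout of the Gapminder downloads, and the numbers are placeholders rather than real data.

```python
import pandas as pd

# Placeholder wide tables (one row per country, one column per year).
# The values below are made up for illustration only.
income_wide = pd.DataFrame({"country": ["Ghana", "Togo"], "2009": [100.0, 90.0], "2010": [110.0, 95.0]})
lifeexp_wide = pd.DataFrame({"country": ["Ghana", "Togo"], "2009": [60.0, 55.0], "2010": [61.0, 56.0]})
literacy_wide = pd.DataFrame({"country": ["Ghana", "Togo"], "2010": [70.0, 60.0]})


def to_long(wide, value_name, country="Ghana"):
    """Keep one country and turn the year columns into rows (hypothetical helper)."""
    one = wide[wide["country"] == country]
    long = one.melt(id_vars="country", var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)
    return long


# Outer joins keep years where only some indicators were recorded;
# the cells with no recorded value become NaN.
ghana = (
    to_long(income_wide, "incomeperperson")
    .merge(to_long(lifeexp_wide, "lifeexpectancy"), on=["country", "year"], how="outer")
    .merge(to_long(literacy_wide, "literacyrate"), on=["country", "year"], how="outer")
    .sort_values("year")
    .reset_index(drop=True)
)

print(ghana)
# To save the result: ghana.to_csv("gapminder_ghana_updated.csv", index=False)
```

The outer joins are a deliberate choice here: a year with an income figure but no literacy figure stays in the file, with the literacy cell left empty.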

 

The gapminder_ghana_updated.csv dataset for this project can be viewed and downloaded here:

 

https://drive.google.com/file/d/0B2KfPRxy4ootbzl5N0g1dUtIVzA/view?pref=2&pli=1

 

See the screenshot here for a guide (http://prntscr.com/9gctxn)

 

MY PYTHON PROGRAM CODE: 

 

# -*- coding: utf-8 -*-
"""
Created on Sat Dec 19 11:24:10 2015

@author: Bernard
"""

#import statements
import pandas
import numpy

#load the gapminder_ghana_updated dataset csv into the program
data = pandas.read_csv('gapminder_ghana_updated.csv', low_memory = False)

#print the number of observations (rows), which is the number of years
#covered by the data
print("number of observations(rows) which is the number of years this data has been looked at: ")
print(len(data))

#print number of variables (columns)
print("number of variables (columns) available in the dataset: ")
print(len(data.columns))

print("data index: ")
print(len(data.index))

#Converting data to numeric; convert_objects has been removed from pandas,
#so pandas.to_numeric is used, with unparseable values turned into NaN
data["incomeperperson"] = pandas.to_numeric(data["incomeperperson"], errors="coerce")
data["lifeexpectancy"] = pandas.to_numeric(data["lifeexpectancy"], errors="coerce")
data["literacyrate"] = pandas.to_numeric(data["literacyrate"], errors="coerce")

#displaying the rows (observations) of the DataFrame as frequency tables.
#inc_pp_count is the name that will hold the result from the incomeperperson count
#sort=False turns off the default ordering by frequency (most frequent first)

print("counts for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana. ")
inc_pp_count = data["incomeperperson"].value_counts(sort = False)
#print the count of inc_pp_count ; incomeperperson
print(inc_pp_count)

print("percentages for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana. ")
inc_pp_percent = data["incomeperperson"].value_counts(sort=False, normalize =True)
#print the percentage of incomeperperson
print(inc_pp_percent)

print("counts for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana")
life_exp_count = data["lifeexpectancy"].value_counts(sort = False)
#print the count of life_exp_count ; lifeexpectancy
print(life_exp_count)

print("percentages for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana ")
life_exp_percent = data["lifeexpectancy"].value_counts(sort =False, normalize = True)
#print the percentage of life_exp_count ; lifeexpectancy
print(life_exp_percent)

print("counts for literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana")
lit_rate_count = data["literacyrate"].value_counts(sort=False, dropna=False) #dropna=False includes missing values (NaN) in the counts
#print the count of lit_rate_count ; literacyrate
print(lit_rate_count)

print("percentages literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana ")
lit_rate_percent = data["literacyrate"].value_counts(sort =False, normalize = True)
#print the percentage of lit_rate_count ; literacyrate
print(lit_rate_percent)
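
The count and percentage steps above are repeated for each variable, so they could be folded into one helper. The following is only a sketch under my own assumptions: `freq_table` is a hypothetical name, and the small series is made up to show the shape of the result, not taken from the dataset.

```python
import pandas as pd


def freq_table(series):
    """Return counts and percentages for one column, missing values included."""
    counts = series.value_counts(sort=False, dropna=False)
    # Divide by the total number of rows so NaN stays in the denominator.
    percent = counts / len(series) * 100
    return pd.DataFrame({"count": counts, "percent": percent.round(2)})


# Small made-up column, only to show the shape of the result.
example = pd.Series([768, 768, 768, 734, 734, None], name="incomeperperson")
table = freq_table(example)
print(table)
```

Calling `freq_table(data["literacyrate"])` on the loaded data would then give the counts and the percentages side by side, with the missing years as their own row.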

 

The Output Of My Python Program That Displays Three Of My Data Variables As Frequency Tables:

These variables are the

a. incomeperperson

b. lifeexpectancy

c. literacyrate

 

 

OUTPUT

 

<<<<<<BEGINNING OF OUTPUT >>>>>>>>

number of observations(rows) which is the number of years this data has been looked at:

216

number of variables (columns) available in the dataset:

5

data index:

216

counts for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.

768     3

769     1

2306    1

771     1

773     1

1861    1

1287    1

1628    1

2240    1

1804    1

3873    1

2036    1

2322    1

3091    1

3685    1

789     1

1816    1

1305    1

1818    1

1778    1

4099    1

2751    1

1822    1

2079    1

1232    1

1570    1

2130    1

2072    1

2085    1

808     1

..

733     1

734     2

735     2

736     1

2273    1

738     2

739     1

741     1

742     2

743     1

744     1

745     1

746     1

747     2

749     1

2015    1

751     1

752     1

1906    1

754     1

756     2

757     1

759     1

761     1

762     1

763     1

764     1

2244    1

766     1

2559    1

Name: incomeperperson, dtype: int64

percentages for incomeperperson - 2010 Gross Domestic Product per capita in constant 2000 US$ of Ghana.

768     0.013889

769     0.004630

2306    0.004630

771     0.004630

773     0.004630

1861    0.004630

1287    0.004630

1628    0.004630

2240    0.004630

1804    0.004630

3873    0.004630

2036    0.004630

2322    0.004630

3091    0.004630

3685    0.004630

789     0.004630

1816    0.004630

1305    0.004630

1818    0.004630

1778    0.004630

4099    0.004630

2751    0.004630

1822    0.004630

2079    0.004630

1232    0.004630

1570    0.004630

2130    0.004630

2072    0.004630

2085    0.004630

808     0.004630

733     0.004630

734     0.009259

735     0.009259

736     0.004630

2273    0.004630

738     0.009259

739     0.004630

741     0.004630

742     0.009259

743     0.004630

744     0.004630

745     0.004630

746     0.004630

747     0.009259

749     0.004630

2015    0.004630

751     0.004630

752     0.004630

1906    0.004630

754     0.004630

756     0.009259

757     0.004630

759     0.004630

761     0.004630

762     0.004630

763     0.004630

764     0.004630

2244    0.004630

766     0.004630

2559    0.004630

Name: incomeperperson, dtype: float64

counts for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana

29.884300    1

9.437626     1

60.300000    4

60.500000    2

58.100000    2

63.500000    1

31.816600    1

61.500000    1

28.000000    120

29.240200    1

30.528400    1

31.172500    1

32.460700    1

33.104800    1

34.393000    1

35.037100    1

36.325300    1

37.613500    1

38.257600    1

39.545800    1

40.189900    1

41.478100    1

42.122200    1

43.410400    1

44.054500    1

45.083820    1

33.748900    1

47.005480    1

48.247920    1

49.454360    1

 

56.700000    1

44.698600    1

61.800000    1

45.732040    1

46.372260    1

54.897780    1

57.400000    1

28.476880    1

47.630700    1

55.800000    1

48.856140    1

58.700000    1

62.700000    1

28.357660    1

56.400000    1

50.612800    1

51.704240    1

57.200000    1

59.700000    1

52.715680    1

60.200000    1

28.238440    1

53.640120    1

58.300000    1

54.490560    1

60.600000    2

55.500000    1

60.400000    1

59.400000    1

28.119220    1

Name: lifeexpectancy, dtype: int64

percentages for lifeexpectancy- 2011 life expectancy at birth (years) of Ghana

29.884300    0.004630

9.437626     0.004630

60.300000    0.018519

60.500000    0.009259

58.100000    0.009259

63.500000    0.004630

31.816600    0.004630

61.500000    0.004630

28.000000    0.555556

29.240200    0.004630

30.528400    0.004630

31.172500    0.004630

32.460700    0.004630

33.104800    0.004630

34.393000    0.004630

35.037100    0.004630

36.325300    0.004630

37.613500    0.004630

38.257600    0.004630

39.545800    0.004630

40.189900    0.004630

41.478100    0.004630

42.122200    0.004630

43.410400    0.004630

44.054500    0.004630

45.083820    0.004630

33.748900    0.004630

47.005480    0.004630

48.247920    0.004630

49.454360    0.004630

56.700000    0.004630

44.698600    0.004630

61.800000    0.004630

45.732040    0.004630

46.372260    0.004630

54.897780    0.004630

57.400000    0.004630

28.476880    0.004630

47.630700    0.004630

55.800000    0.004630

48.856140    0.004630

58.700000    0.004630

62.700000    0.004630

28.357660    0.004630

56.400000    0.004630

50.612800    0.004630

51.704240    0.004630

57.200000    0.004630

59.700000    0.004630

52.715680    0.004630

60.200000    0.004630

28.238440    0.004630

53.640120    0.004630

58.300000    0.004630

54.490560    0.004630

60.600000    0.009259

55.500000    0.004630

60.400000    0.004630

59.400000    0.004630

28.119220    0.004630

Name: lifeexpectancy, dtype: float64

counts for literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana

NaN          214

57.897473    1

71.497075    1

Name: literacyrate, dtype: int64

percentages literacyrate - 2010, Literacy rate, adult total (% of people ages 15 and above) of Ghana

57.897473    0.00463

71.497075    0.00463

Name: literacyrate, dtype: float64

 

<<<<<<END OF OUTPUT >>>>>>>>

 

 

 

VARIABLES AS FREQUENCY TABLES CAN BE FOUND AT THIS LINK:

 

https://drive.google.com/file/d/0B2KfPRxy4ootS0IxWHk0LVJZNXc/view?usp=sharing

 

SUMMARY:

This summary is about the 3 main variables that my program caters for.

These are the

a. incomeperperson

b. lifeexpectancy

c. literacyrate

The summary will look at the values the variables take, how often they take them, the presence of missing data, etc.

These data focus only on the country Ghana and cover a period of 216 years, from the year 1800 to 2015.

Incomeperperson variable

From the dataset, Incomeperperson is the Gross Domestic Product per capita in constant 2000 US$ of Ghana. From the frequency table, between the years 1800 and 2015 there were 3 years that had 768 USD as the Incomeperperson of the people of Ghana.

This 3-time occurrence represents a proportion of 0.013889 (about 1.39%) of the total number of individual Incomeperperson figures looked at over the 216 years.

 

The Incomeperperson values 734, 735, 738, 742, 747 and 756 USD each occurred twice over the period they were looked at.

Each of them represents a proportion of 0.009259 (about 0.93%) of the total number of individual Incomeperperson figures looked at over the 216 years.

The rest of the Incomeperperson figures were single distinct values, each occurring only once within the time frame they were looked at.

These single occurrences put together form over 80% of the total number of individual Incomeperperson figures looked at over the 216 years. This means that Incomeperperson changed almost every year. That is where my research question comes in: to find out the association between the Income Per Person and the Literacy rate of the people of Ghana.

 

lifeexpectancy variable

The second variable, lifeexpectancy, is the life expectancy at birth (in years) of the population of Ghana.

From the frequency table, between the years 1800 and 2015 it is striking that the life expectancy of the people of Ghana was recorded as 28 years for 120 distinct years! This is a proportion of 0.555556 (about 55.56%) of the lifeexpectancy values recorded over the 216 years the data covers.

Over 30% of the lifeexpectancy values were distinct, each with a frequency of 1 and a proportion of 0.00463 (about 0.46%).
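
The figures in the percentage tables are proportions (fractions of 1), so they need to be multiplied by 100 to be read as percentages. A short check of that arithmetic, using only the counts quoted in this summary and the 216 yearly rows (the labels are my own wording):

```python
TOTAL_YEARS = 216  # rows in the dataset, 1800 to 2015 inclusive

# Frequencies quoted in the summary.
counts = {
    "incomeperperson = 768": 3,
    "incomeperperson value seen twice": 2,
    "lifeexpectancy = 28": 120,
    "any value seen once": 1,
}

for label, count in counts.items():
    proportion = count / TOTAL_YEARS
    print(f"{label}: proportion {proportion:.6f} = {proportion * 100:.2f}%")
```

So 120 years out of 216 is a little over half of the whole period, not half of one percent.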

 

literacyrate variable

The third and final variable in focus in this program is literacyrate, which is the adult literacy rate (% of people ages 15 and above) of Ghana.

From the frequency table, this variable is only recorded twice.

 

Missing Data

There is missing data in the variables, most significantly in literacyrate, as indicated by the literacyrate frequency table. There are years in which the literacy rate was not recorded, so all those years are missing data on the literacy rate of the people of Ghana.

This is represented with a “NaN” value in the frequency distribution table.
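
The missing values can also be counted directly instead of being read off the frequency table. This is a minimal sketch that rebuilds a series with the same shape as the literacyrate column (214 missing rows and the two recorded values from the output); the order of the rows is an assumption and does not affect the counts.

```python
import numpy as np
import pandas as pd

# 216 yearly rows with only two recorded literacy rates, as in the frequency table.
literacyrate = pd.Series([np.nan] * 214 + [57.897473, 71.497075], name="literacyrate")

missing = literacyrate.isna().sum()    # rows with no value
recorded = literacyrate.notna().sum()  # rows with a value
print("missing:", missing, "recorded:", recorded)

# dropna=False keeps NaN as its own row, so the shares add up to 1.
shares = literacyrate.value_counts(normalize=True, dropna=False)
print(shares)
```

As far as I understand recent pandas versions, leaving dropna at its default drops the NaN rows before normalizing, so the two recorded values would then be shared out over 2 rows rather than 216. Passing dropna=False keeps all 216 rows in the denominator.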
