Python Pandas Groupby function agg Series GroupbyObject

Group By Function

This is a quick look at the pandas groupby function, a very powerful and useful tool. We will take a simple look at it here.

Credits to Data School, creator of the Python course materials.

Let's import a sample dataset.

In [18]:

 

In [19]:

 

In [20]:

 

Out[20]:
   country      beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0  Asia
1  Albania                 89              132             54                           4.9  Europe
2  Algeria                 25                0             14                           0.7  Africa
3  Andorra                245              138            312                          12.4  Europe
4  Angola                 217               57             45                           5.9  Africa
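
The cell that loaded the data did not survive on this page, so here is a minimal sketch that rebuilds just the five rows shown in Out[20]. The variable name `drinks` is an assumption, and the full dataset would normally be read from its CSV file with `pd.read_csv` rather than typed in by hand.

```python
import pandas as pd

# Assumption: only the five rows visible in Out[20] are reproduced here.
# With the real CSV you would call pd.read_csv(...) on that file instead.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "spirit_servings": [0, 132, 0, 138, 57],
    "wine_servings": [0, 54, 14, 312, 45],
    "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7, 12.4, 5.9],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# head() shows the first five rows, which is the whole sample here.
print(drinks.head())
```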

What is the average number of beer servings across all countries?
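
A minimal sketch of that calculation, assuming a DataFrame named `drinks` (a hypothetical name) that holds only the five sample rows printed in Out[20], so the number differs from the full dataset.

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Selecting one column gives a Series; mean() averages over every row.
overall_mean = drinks["beer_servings"].mean()
print(overall_mean)  # 115.2 for these five rows
```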

In [21]:

 

Out[21]:
In [22]:

 

Out[22]:
In [23]:

 

What is the average number of beer servings by continent?
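
One way to answer this is to group on the continent column before taking the mean. This is a sketch under the assumption that the data sits in a DataFrame called `drinks` (hypothetical name), here limited to the five rows from Out[20].

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# groupby splits the rows by continent, then mean() runs once per group.
by_continent = drinks.groupby("continent")["beer_servings"].mean()
print(by_continent)
```

The result is a Series indexed by continent, with one mean per group.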

In [24]:

 

Out[24]:

Let's filter the drinks data down to a single continent, e.g. Africa, and then take its mean.
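
A small sketch of the filtering step using a boolean mask. The DataFrame name `drinks` is an assumption, and only the five rows from Out[20] are used.

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Keep only the African rows, then average their beer servings.
is_africa = drinks["continent"] == "Africa"
africa_mean = drinks.loc[is_africa, "beer_servings"].mean()
print(africa_mean)  # 121.0 for these five rows
```

This matches the Africa entry you get from grouping by continent, which is a handy sanity check.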

In [25]:

 

Out[25]:

Let's find the maximum beer_servings value by continent.
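
The same grouping works with other reductions such as `max()`. A minimal sketch, assuming a DataFrame named `drinks` (hypothetical) built from the five sample rows in Out[20]:

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Largest beer_servings value within each continent.
max_by_continent = drinks.groupby("continent")["beer_servings"].max()
print(max_by_continent)
```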

In [26]:

 

Out[26]:

There is a powerful 'agg' function that lets us specify multiple functions at once by passing them as a list.
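
A sketch of what such a call could look like, using the four statistics that appear in the output table (count, min, max, mean). The DataFrame name `drinks` is an assumption and only the five rows from Out[20] are included, so the numbers are smaller than in the full table.

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Each name in the list becomes one column of the result.
summary = drinks.groupby("continent")["beer_servings"].agg(
    ["count", "min", "max", "mean"]
)
print(summary)
```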

In [27]:

 

Out[27]:
continent      count  min  max        mean
Africa            53    0  376   61.471698
Asia              44    0  247   37.045455
Europe            45    0  361  193.777778
North America     23    1  285  145.434783
Oceania           16    0  306   89.687500
South America     12   93  333  175.083333

You can also make calculations across all the numerical columns at one time by not selecting any specific column to use for calculation.
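
A minimal sketch of the all-columns version, assuming a DataFrame named `drinks` (hypothetical) with the five rows from Out[20]. Recent pandas versions raise an error when `mean()` meets a text column such as `country`, so the sketch passes `numeric_only=True` to skip non-numeric columns.

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "spirit_servings": [0, 132, 0, 138, 57],
    "wine_servings": [0, 54, 14, 312, 45],
    "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7, 12.4, 5.9],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# No column is selected, so every numeric column is averaged per continent.
means = drinks.groupby("continent").mean(numeric_only=True)
print(means)
```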

In [28]:

 

Out[28]:
continent      beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol
Africa             61.471698        16.339623      16.264151                      3.007547
Asia               37.045455        60.840909       9.068182                      2.170455
Europe            193.777778       132.555556     142.222222                      8.617778
North America     145.434783       165.739130      24.521739                      5.995652
Oceania            89.687500        58.437500      35.625000                      3.381250
South America     175.083333       114.750000      62.416667                      6.308333

We can visualize the information in a simple plot.
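
The original plot cell is missing, so the following is only a guess at its shape: a bar chart of the per-continent means. It assumes matplotlib is installed, uses a hypothetical DataFrame named `drinks` with the five rows from Out[20], and selects the non-interactive "Agg" backend so it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook

import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "wine_servings": [0, 54, 14, 312, 45],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# One cluster of bars per continent, one bar per numeric column.
means = drinks.groupby("continent").mean(numeric_only=True)
ax = means.plot(kind="bar")
ax.set_ylabel("mean servings")
```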

In [29]:

 

Out[29]:

You can also retrieve how many times each continent appears.
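
One common way to get these counts is `value_counts()` on the continent column. This is a sketch, not necessarily what the original cell used. The DataFrame name `drinks` is an assumption, and it holds only the five rows from Out[20].

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Number of rows (countries) per continent.
counts = drinks["continent"].value_counts()
print(counts)

# Grouping and asking for the size of each group gives the same numbers.
sizes = drinks.groupby("continent").size()
```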

In [32]:

 

Out[32]:

Group by SPLITS the dataframe into groups, each identified by its own key. Functions can then be applied to each individual group, to a subset of the groups, or to all of the groups at once.
We can run the analysis and afterwards combine the results back into a dataframe!
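
To make the split, apply and combine steps concrete, here is a sketch that does them by hand and compares the result with the groupby shortcut. It assumes a DataFrame named `drinks` (hypothetical) with the five rows from Out[20].

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Split: one sub-dataframe per continent, stored under its key.
pieces = {
    name: drinks[drinks["continent"] == name]
    for name in drinks["continent"].unique()
}

# Apply: run the same function on every piece.
applied = {name: piece["beer_servings"].mean() for name, piece in pieces.items()}

# Combine: gather the per-group results into one Series.
combined = pd.Series(applied, name="beer_servings").sort_index()

# groupby performs all three steps in a single expression.
shortcut = drinks.groupby("continent")["beer_servings"].mean()
print(combined)
```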

Let's create a groupby object.
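
A minimal sketch of creating the object and peeking at its keys. The names `drinks` and `grouped` are assumptions, and the data is limited to the five rows from Out[20].

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Nothing is computed yet; the object only records how rows are grouped.
grouped = drinks.groupby("continent")

print(grouped.ngroups)         # number of distinct continents
print(sorted(grouped.groups))  # the group keys
```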

In [33]:

 

Let's check the type.
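
A short sketch of the type check, assuming the grouped object is called `grouped` and comes from a hypothetical `drinks` DataFrame with the five rows from Out[20].

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

grouped = drinks.groupby("continent")

# Grouping a DataFrame yields a DataFrameGroupBy object, not a DataFrame.
group_type = type(grouped)
print(group_type)
```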

In [34]:

 

Out[34]:

The groupby object is iterable, and each split object (a sub-dataframe of the grouped dataframe) produced by the groupby function has its respective key / index.
Let's iterate through this grouped object.
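
Iterating yields one (key, sub-dataframe) pair per group. A sketch under the assumption that the data is in a DataFrame called `drinks` (hypothetical), using the five rows from Out[20]:

```python
import pandas as pd

# Assumption: a five-row stand-in for the full drinks dataset.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

sizes = {}
for continent, frame in drinks.groupby("continent"):
    # continent is the group key, frame holds only that group's rows.
    print(continent)
    print(frame)
    sizes[continent] = len(frame)
```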

In [35]:

 

 
