Python Pandas Groupby function agg Series GroupbyObject

Group By Function

This is a quick look at the pandas groupby function. It is a powerful function for summarizing data by category, and we will take a simple look at it here.

Credits to Data School, creator of the Python course materials.

Let's import a sample dataset.
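
A minimal sketch of the loading step. The variable name `drinks` and the URL in the comment are assumptions (the URL is the one Data School commonly uses for this dataset); the inline frame only reproduces the five rows printed in this post so the snippet runs without a network connection.

```python
import pandas as pd

# Assumption: the full dataset is Data School's drinks-by-country CSV.
# drinks = pd.read_csv("http://bit.ly/drinksbycountry")

# Offline stand-in built from the five rows printed in this post.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "spirit_servings": [0, 132, 0, 138, 57],
    "wine_servings": [0, 54, 14, 312, 45],
    "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7, 12.4, 5.9],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# head() shows the first five rows of a DataFrame.
print(drinks.head())
```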

Out[20]:
   country      beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0  Asia
1  Albania                 89              132             54                           4.9  Europe
2  Algeria                 25                0             14                           0.7  Africa
3  Andorra                245              138            312                          12.4  Europe
4  Angola                 217               57             45                           5.9  Africa

What is the average number of beer servings across ALL countries?
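
One way to answer this, sketched on the five sample rows only (so the number is not the full-dataset average). The name `drinks` is an assumption.

```python
import pandas as pd

# Five sample rows only; the full dataset has many more countries.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Selecting one column gives a Series; mean() averages over every row.
overall_mean = drinks["beer_servings"].mean()
print(overall_mean)  # 115.2 for these five rows
```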

What is the average number of beer servings by continent?
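
A hedged sketch of the per-continent average: group on `continent`, select `beer_servings`, then take the mean. The frame is a five-row stand-in, so only three continents appear.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# One mean per distinct value of the grouping column.
by_continent = drinks.groupby("continent")["beer_servings"].mean()
print(by_continent)
```

The result is a Series indexed by continent, so a single value can be looked up with `by_continent["Europe"]`.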

Let's filter the drinks data down to a single continent, e.g. Africa, and then get its mean.
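
The filter can be sketched with a boolean mask. On the sample rows below, the filtered mean equals the Africa entry that groupby produces, which is the point of the comparison.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Keep only the African rows, then average their beer servings.
africa = drinks[drinks["continent"] == "Africa"]
africa_mean = africa["beer_servings"].mean()

# groupby computes the same figure for every continent in one pass.
grouped_africa = drinks.groupby("continent")["beer_servings"].mean()["Africa"]
print(africa_mean, grouped_africa)
```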

Let's find the maximum beer_servings by continent.
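
Swapping `mean()` for `max()` gives the largest value per group. A small sketch on sample rows (not the full dataset):

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Largest beer_servings value within each continent.
max_by_continent = drinks.groupby("continent")["beer_servings"].max()
print(max_by_continent)
```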

There is a powerful 'agg' function that lets us specify multiple functions at one time by passing them as a list.
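
A minimal sketch of `agg` with a list of function names. It uses five sample rows, so the figures differ from the full-dataset table in this post.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Each name in the list becomes one column of the result.
summary = drinks.groupby("continent")["beer_servings"].agg(
    ["count", "min", "max", "mean"]
)
print(summary)
```

The result is a DataFrame with one row per continent and one column per aggregation.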

Out[27]:
               count  min  max        mean
continent
Africa            53    0  376   61.471698
Asia              44    0  247   37.045455
Europe            45    0  361  193.777778
North America     23    1  285  145.434783
Oceania           16    0  306   89.687500
South America     12   93  333  175.083333

You can also make calculations across all the numerical columns at one time by not selecting any specific column to use for calculation.
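
A hedged sketch of the all-columns version. In recent pandas releases (2.0 and later) calling `mean()` on a grouped frame that still contains a text column such as `country` raises a `TypeError`, so the sketch passes `numeric_only=True`; older releases dropped such columns silently.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "spirit_servings": [0, 132, 0, 138, 57],
    "wine_servings": [0, 54, 14, 312, 45],
    "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7, 12.4, 5.9],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# No column selected: every numeric column is averaged per continent.
continent_means = drinks.groupby("continent").mean(numeric_only=True)
print(continent_means)
```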

Out[28]:
               beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol
continent
Africa             61.471698        16.339623      16.264151                      3.007547
Asia               37.045455        60.840909       9.068182                      2.170455
Europe            193.777778       132.555556     142.222222                      8.617778
North America     145.434783       165.739130      24.521739                      5.995652
Oceania            89.687500        58.437500      35.625000                      3.381250
South America     175.083333       114.750000      62.416667                      6.308333

We can visualize the information in a simple plot.
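
One possible plot, sketched as a bar chart of the per-continent means. The chart type is an assumption, matplotlib must be installed for pandas plotting to work, and the data is a five-row stand-in.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "spirit_servings": [0, 132, 0, 138, 57],
    "wine_servings": [0, 54, 14, 312, 45],
    "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7, 12.4, 5.9],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

continent_means = drinks.groupby("continent").mean(numeric_only=True)

# One cluster of bars per continent, one bar per numeric column.
ax = continent_means.plot(kind="bar")
ax.set_ylabel("mean value")
```

In a notebook the chart renders inline; in a plain script, call `matplotlib.pyplot.show()` afterwards.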

You can also retrieve how many times each continent appears.
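
Two equivalent ways to count rows per continent, shown as a sketch on sample rows: `value_counts()` on the column, or `size()` on the grouped object.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# value_counts() sorts by frequency, most common first.
counts = drinks["continent"].value_counts()

# size() returns the same counts, ordered by group key instead.
sizes = drinks.groupby("continent").size()
print(counts)
print(sizes)
```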

Group By SPLITS the DataFrame into a set of smaller objects, each identified by its own key. Functions can then be applied to an individual
split object, to a subset of the split objects, or to all of them at once.
After running the analysis, we can combine the results back into a single DataFrame!
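
The split, apply and combine steps can be spelled out by hand to show what groupby does in a single call. This is an illustrative sketch on sample rows, not how pandas implements it internally.

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Split: one sub-DataFrame per continent, stored under its key.
pieces = {key: group for key, group in drinks.groupby("continent")}

# Apply: compute a statistic on each piece separately.
means = {key: group["beer_servings"].mean() for key, group in pieces.items()}

# Combine: collect the per-piece results into one object.
combined = pd.Series(means, name="beer_servings")

# The one-line groupby call yields the same values.
shortcut = drinks.groupby("continent")["beer_servings"].mean()
print(combined)
```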

Let's create a groupby object.

Let's check the type.
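
A short sketch that creates the grouped object and inspects it. The names `drinks` and `grouped` are assumptions. Nothing is computed at this point; the object only records how the rows are partitioned.

```python
import pandas as pd

# Five sample rows; `drinks` and `grouped` are assumed names.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

# Grouping a DataFrame returns a DataFrameGroupBy object.
grouped = drinks.groupby("continent")
print(type(grouped))

# ngroups reports how many distinct keys were found.
print(grouped.ngroups)
```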

The groupby object is iterable, and each split object (a smaller DataFrame) produced by the groupby function comes with its respective key.
Let's iterate through this grouped object.
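
Iteration yields one `(key, sub-DataFrame)` pair per group. A minimal sketch on sample rows, with `seen` as a hypothetical helper list for collecting what was visited:

```python
import pandas as pd

# Five sample rows; the name `drinks` is an assumption.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
})

seen = []
for continent, group in drinks.groupby("continent"):
    # `continent` is the group key; `group` holds the matching rows.
    print(continent)
    print(group)
    seen.append((continent, len(group)))
```

Groups come back sorted by key by default, so Africa is visited first here.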
