The pandas iloc, loc, and ix indexers are powerful ways to quickly select data from a DataFrame. Today, we take a quick look at these three.
Credit goes to Data School; you can check him out on YouTube.
import pandas as pd
# let's load this public dataset and play with it
ufo = pd.read_csv('http://bit.ly/uforeports')
# let's examine the head (the first five rows)
ufo.head()
# let's work with loc. It selects rows and columns by label
# let's get the first row (label 0) and all columns
ufo.loc[0, : ]
That returns a pandas Series object, indexed by the column names.
# let's get the first 3 rows and all columns by passing the row labels as a list
ufo.loc[[0,1,2], : ]
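The return type depends on what you pass: a single row label gives a Series, and a list of labels gives a DataFrame. Here is a minimal sketch that uses a tiny made-up DataFrame in place of the real ufo data (hypothetical rows) so that it runs offline:

```python
import pandas as pd

# Hypothetical mini version of the ufo data (made-up rows, real column names)
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Oakland', 'Holyoke'],
    'State': ['NY', 'CA', 'CO'],
})

one_row = ufo.loc[0, :]            # a single label -> Series indexed by column names
some_rows = ufo.loc[[0, 1, 2], :]  # a list of labels -> DataFrame

print(type(one_row).__name__)    # Series
print(type(some_rows).__name__)  # DataFrame
print(one_row['City'])           # Ithaca
```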
# we can also use a slice to get the rows we want and ALL columns
ufo.loc[0:2, : ] # unlike ordinary Python slicing, a loc slice includes the end label, so row 2 is also returned
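To see the inclusive end label side by side with ordinary Python slicing, here is a small sketch. The rows are made up and stand in for the ufo data:

```python
import pandas as pd

ufo = pd.DataFrame({'City': ['Ithaca', 'Oakland', 'Holyoke', 'Abilene']})  # made-up rows

by_label = ufo.loc[0:2, :]              # label slice: the end label 2 IS included
plain_list = ['a', 'b', 'c', 'd'][0:2]  # ordinary Python slice: the end index is excluded

print(list(by_label.index))             # [0, 1, 2]
print(plain_list)                       # ['a', 'b']
print(by_label.equals(ufo.loc[0:2]))    # True: omitting the column part selects all columns
```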
# you can also omit the column part, but spelling out ':' is usually more legible
# selects rows 0 to 2 and all columns
ufo.loc[0:2]
# let's do some column selections (a single column label returns a Series)
ufo.loc[ : ,'City']
Let's get multiple columns by passing a list:
ufo.loc[ : , ['City','State'] ]
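We can also specify a range of column labels. Like a row slice, it includes both endpoints, so every column from City through State is returned: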
ufo.loc[ : , 'City':'State']
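A label range picks up every column that sits between the two endpoints, not just the endpoints themselves. This sketch assumes the ufo columns come in the order City, Colors Reported, Shape Reported, State, Time. The single row is made up:

```python
import pandas as pd

# Made-up row; the column order is assumed to match the ufo dataset
ufo = pd.DataFrame(
    [['Ithaca', 'RED', 'TRIANGLE', 'NY', '6/1/1930 22:00']],
    columns=['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'],
)

subset = ufo.loc[:, 'City':'State']  # every column from City through State, inclusive
print(list(subset.columns))
# ['City', 'Colors Reported', 'Shape Reported', 'State']
```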
We can also combine the row and column selections
ufo.loc[0:2, 'City':'State']
We can also simply drop the Time column to get the same result, since Time is the only column outside the City:State range:
ufo.head(3).drop('Time', axis=1)
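You can check that the two approaches really agree with `DataFrame.equals`. Here is a sketch on made-up rows that reuse the assumed ufo column layout, with Time as the last column:

```python
import pandas as pd

# Made-up rows with the assumed ufo column layout (Time is the last column)
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Oakland', 'Holyoke', 'Abilene'],
    'Colors Reported': ['RED', 'BLUE', 'GREEN', 'RED'],
    'Shape Reported': ['TRIANGLE', 'OTHER', 'OVAL', 'DISK'],
    'State': ['NY', 'CA', 'CO', 'KS'],
    'Time': ['t0', 't1', 't2', 't3'],
})

via_loc = ufo.loc[0:2, 'City':'State']       # rows 0-2, columns City..State
via_drop = ufo.head(3).drop('Time', axis=1)  # first 3 rows, minus Time

print(via_loc.equals(via_drop))  # True
```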
You can also select data using a boolean expression. For example, let's select the rows where City is 'Oakland':
ufo[ufo.City=='Oakland']
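Under the hood, the comparison produces a boolean Series with one True/False value per row, and indexing with it keeps only the True rows. A minimal sketch on made-up rows:

```python
import pandas as pd

ufo = pd.DataFrame({'City': ['Ithaca', 'Oakland', 'Holyoke', 'Oakland'],
                    'State': ['NY', 'CA', 'CO', 'CA']})  # made-up rows

mask = ufo.City == 'Oakland'   # boolean Series aligned with ufo's index
print(mask.tolist())           # [False, True, False, True]

oakland = ufo[mask]            # keeps only the rows where mask is True
print(oakland.index.tolist())  # [1, 3]
```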
We can do the same with loc:
ufo.loc[ufo.City=='Oakland', : ]
We can also select specific columns from the rows that match the boolean expression. Here we get only the State column:
ufo.loc[ufo.City=='Oakland', 'State']
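The row filter and the column selection can be combined freely. To use several conditions, wrap each one in parentheses and join them with `&` (and) or `|` (or). Here is a sketch on made-up rows:

```python
import pandas as pd

ufo = pd.DataFrame({'City': ['Oakland', 'Oakland', 'Portland'],
                    'State': ['CA', 'MD', 'OR']})  # made-up rows

# Row filter + single column label -> Series of State values for Oakland rows
states = ufo.loc[ufo.City == 'Oakland', 'State']
print(states.tolist())  # ['CA', 'MD']

# Two conditions: parentheses are required because & binds tighter than ==
oakland_ca = ufo.loc[(ufo.City == 'Oakland') & (ufo.State == 'CA'), :]
print(len(oakland_ca))  # 1
```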
Now let's take a look at iloc.
iloc selects rows and columns by integer position; that is what the 'i' stands for.
Let's select all rows and the columns at integer positions 0 and 3:
ufo.iloc[ : , [0, 3]]
We can also use a slice of positions. With iloc, as with ordinary Python slicing, the selected data does not include the stop position of the slice:
ufo.iloc[ : , 0:4]
The returned data does not include position 4, so only the columns at positions 0, 1, 2, and 3 are returned.
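Here are both iloc column selections on a one-row made-up frame. The column order is assumed to match the ufo data:

```python
import pandas as pd

ufo = pd.DataFrame(
    [['Ithaca', 'RED', 'TRIANGLE', 'NY', 't0']],
    columns=['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'],
)  # made-up row, assumed ufo column order

picked = ufo.iloc[:, [0, 3]]  # exactly positions 0 and 3
sliced = ufo.iloc[:, 0:4]     # positions 0, 1, 2, 3 (the stop 4 is excluded)

print(list(picked.columns))  # ['City', 'State']
print(list(sliced.columns))  # ['City', 'Colors Reported', 'Shape Reported', 'State']
```

With this column order, `ufo.iloc[:, 0:4]` selects the same columns as the label range `ufo.loc[:, 'City':'State']`.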
Let's select a set of rows and all columns:
ufo.iloc[0:3 , : ]
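On the default index (0, 1, 2, ...), `ufo.iloc[0:3]` and `ufo.loc[0:2]` happen to return the same rows. The two diverge as soon as the labels stop matching the positions. This sketch uses a hypothetical frame whose integer labels start at 1:

```python
import pandas as pd

# Made-up frame whose integer labels start at 1, so labels and positions differ
df = pd.DataFrame({'City': ['Ithaca', 'Oakland', 'Holyoke', 'Abilene']},
                  index=[1, 2, 3, 4])

by_position = df.iloc[1:3]  # positions 1 and 2 (stop excluded)
by_label = df.loc[1:3]      # labels 1 through 3 (end included)

print(by_position.index.tolist())  # [2, 3]
print(by_label.index.tolist())     # [1, 2, 3]
```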
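Now let's take a look at ix. It allows you to mix labels and integers; it's a kind of blend between loc and iloc. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the ix lines below only run on older versions. In current pandas, use loc or iloc instead.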
Let's grab some new public data:
drinks = pd.read_csv('http://bit.ly/drinksbycountry', index_col='country') # use the country column as the index
drinks.head()
Let's select the beer_servings value for Albania, using a row label and a column position (beer_servings is at position 0):
drinks.ix['Albania',0]
We can also get the same value this way, using a row position (Albania is at position 1) and a column label:
drinks.ix[ 1 , 'beer_servings']
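Let's select with slice ranges. The row slice uses labels and includes both ends (Albania through Andorra). The column slice uses positions and excludes the stop, so it covers positions 0 and 1: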
drinks.ix['Albania':'Andorra', 0:2]
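Because ix no longer exists in pandas 1.0 and later, here is a sketch of how the three ix calls above might be written with loc and iloc. It uses a small made-up frame with a country index and the first three drinks column names. The layout is assumed and the numbers are invented, so the sketch runs offline. `columns[...]` turns a position into a label, and `Index.get_loc` turns a label into a position:

```python
import pandas as pd

# Made-up numbers; the index/column layout is assumed to mirror the drinks data
drinks = pd.DataFrame(
    {
        'beer_servings': [1, 2, 3, 4, 5],
        'spirit_servings': [10, 20, 30, 40, 50],
        'wine_servings': [100, 200, 300, 400, 500],
    },
    index=pd.Index(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
                   name='country'),
)

# drinks.ix['Albania', 0]: row label + column position -> convert position to label
a = drinks.loc['Albania', drinks.columns[0]]
# drinks.ix[1, 'beer_servings']: row position + column label -> convert label to position
b = drinks.iloc[1, drinks.columns.get_loc('beer_servings')]
# drinks.ix['Albania':'Andorra', 0:2]: inclusive label slice + exclusive position slice
c = drinks.loc['Albania':'Andorra', drinks.columns[0:2]]

print(a, b)     # 2 2
print(c.shape)  # (3, 2)
```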
An important thing about ix: it is primarily label-based and falls back to integer positions only when an axis does not have integer labels.
If you pass it an integer slice and your index or columns have integer labels, ix treats the slice as labels and returns all data inclusive of both numbers in the slice.
However, if your index or columns have non-integer labels (such as strings) and you pass an integer slice, ix treats it as positions and, like iloc, returns data not including the last number in the slice.
#example
ufo.ix[0:2, 0:2]
Our columns have string labels, so ix treated the column slice 0:2 as integer positions and returned only the columns at positions 0 and 1.
Our index, however, has integer labels, so ix treated the row slice 0:2 as labels and returned rows 0, 1, and 2.
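In current pandas, you can split the mixed ix call above into explicit label and position lookups. Here is a minimal sketch on a hypothetical three-column frame with made-up rows, the default integer index, and string column labels:

```python
import pandas as pd

# Made-up rows; default integer index (0, 1, 2, ...) and string column labels
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Oakland', 'Holyoke', 'Abilene'],
    'Colors Reported': ['RED', 'BLUE', 'GREEN', 'RED'],
    'State': ['NY', 'CA', 'CO', 'KS'],
})

# ufo.ix[0:2, 0:2] took rows by LABEL (0..2 inclusive) and columns by POSITION (0..1)
result = ufo.loc[0:2, ufo.columns[0:2]]
# The same cells using positions only: rows 0..2 means an iloc stop of 3
same = ufo.iloc[0:3, 0:2]

print(result.shape)         # (3, 2)
print(result.equals(same))  # True
```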