Python iloc, loc, ix Data Retrieving Selection Functions
The pandas iloc, loc, and ix indexers are powerful ways to quickly select data from a DataFrame. Today, we take a quick look at all three.
Credit goes to Data School, which you can check out on YouTube.
    import pandas as pd
    # Let's get this public dataset and play with it
    ufo = pd.read_csv('http://bit.ly/uforeports')
    # Let's examine the head
    ufo.head()
    # Let's work with loc. It selects rows and columns by label
    # Get the first row and all columns
    ufo.loc[0, :]
That returns a pandas Series object.
    # Get the first 3 rows and all columns by passing the row labels as a list
    ufo.loc[[0, 1, 2], :]
    # We can also use a slice to get the rows we want and ALL columns
    # Note that the last row label in the slice, 2, is included in the output
    ufo.loc[0:2, :]
    # You can also omit the column part, but it is less legible
    # Selects rows 0 to 2 and all columns
    ufo.loc[0:2]
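To compare the row selections above without downloading anything, here is a minimal sketch on a small hand-made DataFrame. The frame `sample` and its values are made up for illustration and are not the UFO dataset.

```python
import pandas as pd

# Hypothetical stand-in for the UFO data; the values are invented
sample = pd.DataFrame({
    "City": ["Springfield", "Oakland", "Salem", "Dover"],
    "State": ["IL", "CA", "OR", "DE"],
})

first_row = sample.loc[0, :]          # single label: returns a Series
rows_list = sample.loc[[0, 1, 2], :]  # list of labels: returns a DataFrame
rows_slice = sample.loc[0:2, :]       # label slice: the end label 2 is included

print(type(first_row).__name__)          # Series
print(len(rows_list), len(rows_slice))   # 3 3
```

Unlike a plain Python list slice, the label slice `0:2` returns three rows because loc includes both endpoints.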
    # Let's do some column selections
    ufo.loc[:, 'City']
Let's get multiple columns by passing a list.
    ufo.loc[:, ['City', 'State']]
We can also specify a range of column labels. As with rows, both endpoints are included.
    ufo.loc[:, 'City':'State']
We can also combine the row and column selections
    ufo.loc[0:2, 'City':'State']
We can also simply drop the Time column from the first three rows to achieve the same result.
    ufo.head(3).drop('Time', axis=1)
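The equivalence between the label slice and the drop can be checked on a small frame. This is a sketch under assumptions: `sample` only mimics the column layout of the UFO data, and all of its values are invented.

```python
import pandas as pd

# Hypothetical frame with the same column order as the UFO data; values invented
sample = pd.DataFrame({
    "City": ["Springfield", "Oakland", "Salem", "Dover"],
    "Colors Reported": ["RED", "BLUE", "GREEN", "RED"],
    "Shape Reported": ["OVAL", "DISK", "OTHER", "OVAL"],
    "State": ["IL", "CA", "OR", "DE"],
    "Time": ["1/1/2000 20:00", "1/2/2000 21:00", "1/3/2000 22:00", "1/4/2000 23:00"],
})

by_label = sample.loc[0:2, "City":"State"]     # both slice endpoints are included
by_drop = sample.head(3).drop("Time", axis=1)  # first 3 rows, minus the Time column

print(list(by_label.columns))
# ['City', 'Colors Reported', 'Shape Reported', 'State']
print(by_label.values.tolist() == by_drop.values.tolist())  # True
```

The two only match because Time happens to be the last column; the label slice depends on column order, while the drop does not.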
You can also select data using a boolean expression. For example, let's select the rows where City is 'Oakland'.
    ufo[ufo.City == 'Oakland']
We can do the same with loc.
    ufo.loc[ufo.City == 'Oakland', :]
We can also select a specific column for the rows that match the boolean expression.
    ufo.loc[ufo.City == 'Oakland', 'State']
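It can help to look at the boolean expression on its own: it produces a Series of True/False values, one per row, which loc then uses as a row filter. A minimal sketch, assuming a made-up frame `sample` (the values are not from the UFO dataset):

```python
import pandas as pd

# Hypothetical data; the values are invented for illustration
sample = pd.DataFrame({
    "City": ["Oakland", "Salem", "Oakland", "Dover"],
    "State": ["CA", "OR", "MD", "DE"],
})

mask = sample["City"] == "Oakland"           # boolean Series, one value per row
oakland_rows = sample.loc[mask, :]           # all columns for the matching rows
oakland_states = sample.loc[mask, "State"]   # only the State column for those rows

print(mask.tolist())             # [True, False, True, False]
print(oakland_states.tolist())   # ['CA', 'MD']
```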
Now let's take a look at iloc.
iloc is for selecting rows and columns by integer position. That is what the 'i' stands for.
Let's select all rows, and the columns at integer positions 0 and 3.
    ufo.iloc[:, [0, 3]]
We can also pass a slice of integer positions. With iloc, the selected data does not include the last number in the slice.
    ufo.iloc[:, 0:4]
The returned data does not include the column at integer position 4, so it only contains positions 0, 1, 2, and 3.
Let's select a set of rows and all columns.
    ufo.iloc[0:3, :]
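The different endpoint rules are easy to mix up, so here is a small side-by-side sketch. The frame `sample` and its contents are assumptions made up for illustration, not the UFO data.

```python
import pandas as pd

# Hypothetical data with a default integer index; values are invented
sample = pd.DataFrame({
    "City": ["Springfield", "Oakland", "Salem", "Dover", "Austin"],
    "Shape": ["OVAL", "DISK", "OTHER", "OVAL", "DISK"],
    "State": ["IL", "CA", "OR", "DE", "TX"],
    "Time": ["20:00", "21:00", "22:00", "23:00", "19:00"],
})

by_position = sample.iloc[0:3, :]   # positions 0, 1, 2 (end is excluded)
by_label = sample.loc[0:3, :]       # labels 0, 1, 2, 3 (end is included)
two_columns = sample.iloc[:, [0, 3]]  # columns at positions 0 and 3

print(len(by_position), len(by_label))   # 3 4
print(list(two_columns.columns))         # ['City', 'Time']
```

The same slice `0:3` returns three rows with iloc and four rows with loc.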
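Now let's take a look at ix. It allows you to mix labels and integer positions, so it is a kind of blend between loc and iloc. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples below need an older pandas version.
Let's grab some new public data.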
    # Let's set the index to country
    drinks = pd.read_csv('http://bit.ly/drinksbycountry', index_col='country')
    drinks.head()
Let's select the beer servings figure for Albania, using the row label and the column position.
    drinks.ix['Albania', 0]
We can also do the same thing with the row position and the column label.
    drinks.ix[1, 'beer_servings']
Let's select with a slice range.
    drinks.ix['Albania':'Andorra', 0:2]
An important thing about ix: when you pass it an integer slice, the result depends on the axis. If the index or columns are integers, ix treats the slice as labels and returns data inclusive of both numbers in the slice, as loc does. However, if the index or columns hold string labels instead of numbers, ix falls back to integer positions and returns data not including the last number in the slice, as iloc does.
    # Example
    ufo.ix[0:2, 0:2]
Our columns have string labels, so ix treated the slice as integer positions and returned the columns at positions 0 and 1.
Our index holds integers, so ix treated the slice as labels and returned rows 0, 1, and 2.
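Since ix is not available in pandas 1.0 and later, the same mixed selections can be written with loc and iloc by translating between labels and positions. The following is one possible sketch, not the only way to do it. The frame `drinks_sample` is a hypothetical stand-in with invented values; `columns.get_loc` and positional indexing on `columns` are standard pandas Index features.

```python
import pandas as pd

# Hypothetical stand-in for the drinks data; values are invented
drinks_sample = pd.DataFrame(
    {
        "beer_servings": [0, 89, 25, 245],
        "spirit_servings": [0, 132, 0, 138],
        "wine_servings": [0, 54, 14, 312],
    },
    index=pd.Index(["Afghanistan", "Albania", "Algeria", "Andorra"], name="country"),
)

# Row label + column position: turn the position into a label, then use loc
by_loc = drinks_sample.loc["Albania", drinks_sample.columns[0]]

# Row position + column label: turn the label into a position, then use iloc
by_iloc = drinks_sample.iloc[1, drinks_sample.columns.get_loc("beer_servings")]

# Label slice for rows (inclusive) + positional slice for columns (exclusive)
block = drinks_sample.loc["Albania":"Andorra", drinks_sample.columns[0:2]]

print(by_loc, by_iloc)        # 89 89
print(block.shape)            # (3, 2)
print(list(block.columns))    # ['beer_servings', 'spirit_servings']
```

Being explicit about which axis uses labels and which uses positions avoids the ambiguity that ix had with integer slices.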
