[python] Filter after groupby in pandas
library pandas python
Published : 2022-03-30   Lastmod : 2022-07-17

Apply filter after groupby in pandas

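While using pandas, I ran into a case where I wanted to apply a condition to each group after a groupby. It turns out that groupby(...).filter() does exactly this. You pass it a function, typically a lambda such as lambda x: ..., that receives each group as a DataFrame and returns True or False. Only the rows of groups that return True are kept.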
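To make the mechanism concrete, here is a rough sketch of what filter does, using the same sample data that is created below. Each group is handed to the predicate as a sub-DataFrame, and the rows of the groups that pass are returned in their original order. The helper name manual_filter is hypothetical and exists only for illustration; it is not part of pandas, and it assumes at least one group passes.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['apple', 'apple', 'apple', 'orange', 'melon', 'apple', 'orange'],
    'sales': [1, 2, 1, 2, 3, 1, 1],
})

def manual_filter(frame, key, predicate):
    """Hypothetical helper: a rough equivalent of frame.groupby(key).filter(predicate)."""
    # Each group arrives as a sub-DataFrame; keep it only if the predicate returns True
    kept = [group for _, group in frame.groupby(key) if predicate(group)]
    # pd.concat raises on an empty list, so this sketch assumes at least one group passes;
    # sort_index restores the original row order instead of grouping rows by key
    return pd.concat(kept).sort_index()

pred = lambda x: x['sales'].count() >= 2
result = manual_filter(df, 'item', pred)
print(result.equals(df.groupby('item').filter(pred)))  # True
```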

GitHub

  • The Jupyter notebook version of this file is on GitHub here

Google Colaboratory

  • If you want to run it on Google Colaboratory, see here

Execution environment

!sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G95
!python -V
Python 3.5.5 :: Anaconda, Inc.

Create a sample DataFrame.

import pandas as pd

df = pd.DataFrame({
    'item': ['apple', 'apple', 'apple', 'orange', 'melon', 'apple', 'orange'],
    'sales': [1, 2, 1, 2, 3, 1, 1],
})
df
     item  sales
0   apple      1
1   apple      2
2   apple      1
3  orange      2
4   melon      3
5   apple      1
6  orange      1

From this DataFrame, suppose we want to keep only the rows whose item appears two or more times. This can be done by calling filter with a lambda after groupby, as follows.

df.groupby('item').filter(lambda x: x['sales'].count() >= 2)
     item  sales
0   apple      1
1   apple      2
2   apple      1
3  orange      2
5   apple      1
6  orange      1
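The same pattern works with other aggregations. To keep only items whose maximum sales is 3 or more:

df.groupby('item').filter(lambda x: x['sales'].max() >= 3)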
    item  sales
4  melon      3
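filter also accepts a dropna argument, which defaults to True. With dropna=False, the rows of groups that fail the condition are kept but filled with NaN, so the result has the same shape as the input. A minimal sketch with the same condition as above:

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['apple', 'apple', 'apple', 'orange', 'melon', 'apple', 'orange'],
    'sales': [1, 2, 1, 2, 3, 1, 1],
})

# dropna=False: failing groups are masked with NaN instead of being dropped
masked = df.groupby('item').filter(lambda x: x['sales'].max() >= 3, dropna=False)
print(masked.shape)                  # (7, 2)
print(masked['sales'].isna().sum())  # 6
```

Note that the sales column becomes float in this case, because NaN cannot be stored in an integer column.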
To keep only items whose minimum sales is 2 or less:

df.groupby('item').filter(lambda x: x['sales'].min() <= 2)
     item  sales
0   apple      1
1   apple      2
2   apple      1
3  orange      2
5   apple      1
6  orange      1
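Because the lambda receives the whole group, several conditions can be combined with and / or in a single predicate. The condition below is purely illustrative and not from the original: keep items that appear at least twice and whose total sales is at least 4.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['apple', 'apple', 'apple', 'orange', 'melon', 'apple', 'orange'],
    'sales': [1, 2, 1, 2, 3, 1, 1],
})

# Both conditions are evaluated once per group; `and` works because each side is a single bool
result = df.groupby('item').filter(
    lambda x: x['sales'].count() >= 2 and x['sales'].sum() >= 4
)
print(result.index.tolist())  # [0, 1, 2, 5]
```

Here orange appears twice but its sales total only 3, so only the apple rows remain.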

I had previously been creating extra intermediate DataFrames for this kind of selection; with filter, it can be done in a one-liner.
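For comparison, here is a hypothetical multi-step version of the kind that filter replaces (an illustrative reconstruction, not code from the original): count the rows per item in an intermediate Series, pick the item names that qualify, then select them with isin.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['apple', 'apple', 'apple', 'orange', 'melon', 'apple', 'orange'],
    'sales': [1, 2, 1, 2, 3, 1, 1],
})

# Step 1: intermediate Series of row counts per item
# (value_counts counts rows; count() on 'sales' counts non-null values, identical here)
counts = df['item'].value_counts()
# Step 2: the item names that satisfy the condition
keep = counts[counts >= 2].index
# Step 3: select the matching rows from the original DataFrame
multi_step = df[df['item'].isin(keep)]

# The same selection in one line with groupby + filter
one_liner = df.groupby('item').filter(lambda x: x['sales'].count() >= 2)
print(multi_step.equals(one_liner))  # True
```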
