[python] Applying filter after groupby in pandas

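While using pandas, I ran into a situation where I needed to keep or drop whole groups based on a condition evaluated after a groupby operation. After some research, I found that groupby(...).filter(func) does this: func is called once per group with that group's rows and must return True or False, and the rows of every group for which it returns True are kept.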

GitHub

The Jupyter notebook file is available on GitHub here

Google Colaboratory

To run on Google Colaboratory, use this link

Execution Environment

!sw_vers

ProductName:		macOS
ProductVersion:		13.5.1
BuildVersion:		22G90

!python -V

Python 3.11.4

Creating a Sample DataFrame

import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

df

	user	number
0	A	1
1	A	2
2	A	1
3	B	2
4	C	3
5	A	1
6	C	1

I wanted to group by user and keep only the users that have two or more rows. This can be done by calling filter with a lambda after groupby. Note that count() counts the non-null values in the number column, which here equals the number of rows per user:

df.groupby("user").filter(lambda x: x["number"].count() >= 2)

	user	number
0	A	1
1	A	2
2	A	1
4	C	3
5	A	1
6	C	1
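
To make explicit what the lambda receives, here is a minimal sketch that rebuilds the sample DataFrame and uses a named function instead. The name has_at_least_two is hypothetical and only for illustration. It also shows a boolean-mask form built with transform("count"), which should select the same rows on this data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)


def has_at_least_two(group: pd.DataFrame) -> bool:
    # Hypothetical helper: called once per user with that user's rows,
    # and must return a single True/False for the whole group.
    return len(group) >= 2


filtered = df.groupby("user").filter(has_at_least_two)

# Mask form: transform("count") broadcasts each user's non-null count
# back onto every row of that user, so the comparison is row-aligned.
mask = df.groupby("user")["number"].transform("count") >= 2
masked = df[mask]
```

Both filtered and masked keep the original index and row order, so they can be compared directly with filtered.equals(masked).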

It’s also possible to filter on a group's maximum value, for example keeping only the users whose maximum number is 3 or more:

df.groupby("user").filter(lambda x: x["number"].max() >= 3)

	user	number
4	C	3
6	C	1

The same works for a group's minimum value, here keeping only the users whose minimum number is 1 or less:

df.groupby("user").filter(lambda x: x["number"].min() <= 1)

	user	number
0	A	1
1	A	2
2	A	1
4	C	3
5	A	1
6	C	1
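
Since the lambda is ordinary Python, the conditions above can presumably be combined in a single call. The following is a sketch under that assumption, keeping users that have at least two rows and a maximum of 3 or more. The variable name combined is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Both conditions are evaluated per group; a group is kept only if both hold.
combined = df.groupby("user").filter(
    lambda g: g["number"].count() >= 2 and g["number"].max() >= 3
)
```

On the sample data only user C satisfies both conditions: A has four rows but a maximum of 2, and B has a single row.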

Until now, I had been creating unnecessary intermediate DataFrames for this kind of filtering, so from now on I will do it in a one-liner with groupby and filter.
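
For comparison, here is one possible shape of the multi-step version with intermediate objects. This is an assumption about what such code might look like, not the code I actually used, and the names counts, keep_users, and multi_step are illustrative. It is shown next to the one-liner for the "two or more rows" condition.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Multi-step version: compute per-user counts, pick the users to keep,
# then select their rows from the original DataFrame.
counts = df.groupby("user")["number"].count()
keep_users = counts[counts >= 2].index
multi_step = df[df["user"].isin(keep_users)]

# One-liner version using groupby + filter.
one_liner = df.groupby("user").filter(lambda x: x["number"].count() >= 2)
```

On this sample both versions select the same rows, which can be checked with multi_step.equals(one_liner).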