Applying filter after groupby in pandas

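While using pandas, I ran into a situation where I needed to keep or drop whole groups based on a condition after a groupby operation. After some research, I found that groupby.filter takes a function (for example, a lambda) that receives each group and returns True or False; groups for which it returns False are dropped from the result.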

GitHub

  • The Jupyter notebook file is available on GitHub here

Execution Environment

!sw_vers
ProductName:		macOS
ProductVersion:		13.5.1
BuildVersion:		22G90
!python -V
Python 3.11.4

Creating a Sample DataFrame

import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

df
  user  number
0    A       1
1    A       2
2    A       1
3    B       2
4    C       3
5    A       1
6    C       1

I wanted to group the rows by user and keep only the users that have two or more rows. This can be done by chaining filter with a lambda after groupby:

df.groupby("user").filter(lambda x: x["number"].count() >= 2)
  user  number
0    A       1
1    A       2
2    A       1
4    C       3
5    A       1
6    C       1
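
To make it clearer what the lambda receives, here is a minimal sketch of the same filter written with a named function. The function name has_at_least_two_rows is my own, chosen only for illustration. It uses len(group) rather than count(); the two agree here because the sample data has no missing values, but count() would skip NaN entries.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)


def has_at_least_two_rows(group: pd.DataFrame) -> bool:
    # `group` is the sub-DataFrame holding the rows of a single user.
    # Returning True keeps every row of that user; False drops them all.
    return len(group) >= 2


kept = df.groupby("user").filter(has_at_least_two_rows)
print(kept)
```

The function is called once per group, so the decision is always made for a whole user, never for individual rows.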

It’s also possible to filter on each group's maximum value, for example keeping only the users whose maximum is 3 or more:

df.groupby("user").filter(lambda x: x["number"].max() >= 3)
  user  number
4    C       3
6    C       1

The same works for each group's minimum value, here keeping the users whose minimum is 1 or less:

df.groupby("user").filter(lambda x: x["number"].min() <= 1)
  user  number
0    A       1
1    A       2
2    A       1
4    C       3
5    A       1
6    C       1
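
As an aside, the same rows can also be selected with a boolean mask built from groupby.transform, which broadcasts a per-group value back onto every row. This is only a sketch of an alternative idiom, not something the notebook itself uses; the variable names group_min, mask_result and filter_result are my own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# transform("min") returns a Series aligned with df,
# where each row holds the minimum of that row's user.
group_min = df.groupby("user")["number"].transform("min")

# Keep the rows whose user has a minimum of 1 or less.
mask_result = df[group_min <= 1]

# The filter version from above, for comparison.
filter_result = df.groupby("user").filter(lambda x: x["number"].min() <= 1)

print(mask_result)
```

Both approaches keep the original row order and index, so which one to use is mostly a matter of taste; the mask version avoids calling a Python function per group, which may matter when there are many groups.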

Until now, I had been creating unnecessary intermediate DataFrames for this kind of filtering, so from now on I will do it in a one-liner with groupby.filter.
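
For comparison, here is a guess at what a multi-step version with intermediate objects might look like, next to the one-liner. The multi-step code is an assumption for illustration, not the exact code I used before, and the names counts, users_to_keep, multi_step and one_liner are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Multi-step version (assumed): count per user, pick the users, then select rows.
counts = df.groupby("user")["number"].count()
users_to_keep = counts[counts >= 2].index
multi_step = df[df["user"].isin(users_to_keep)]

# One-liner with groupby.filter.
one_liner = df.groupby("user").filter(lambda x: x["number"].count() >= 2)

print(multi_step.equals(one_liner))
```

The one-liner states the condition directly on the group, so there is no need to keep track of the intermediate counts and user list.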