Applying filter after groupby in pandas
While using pandas, I ran into a situation where I needed to keep only the groups that satisfy a condition after a groupby operation. After some research, I found that I could pass a function to groupby(...).filter(), for example a lambda that receives each group and returns True for the groups to keep.
GitHub
- The Jupyter notebook file is available on GitHub here
Google Colaboratory
- To run on Google Colaboratory, use this link
Execution Environment
!sw_vers
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
!python -V
Python 3.11.4
Creating a Sample DataFrame
import pandas as pd
df = pd.DataFrame(
{
"user": ["A", "A", "A", "B", "C", "A", "C"],
"number": [1, 2, 1, 2, 3, 1, 1],
}
)
df
| | user | number |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | A | 1 |
| 3 | B | 2 |
| 4 | C | 3 |
| 5 | A | 1 |
| 6 | C | 1 |
I wanted to group the rows by user and keep only the users that have two or more rows. This can be achieved by calling filter with a lambda after groupby, as follows:
df.groupby("user").filter(lambda x: x["number"].count() >= 2)
| | user | number |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | A | 1 |
| 4 | C | 3 |
| 5 | A | 1 |
| 6 | C | 1 |
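To make the mechanics explicit, here is a minimal sketch of the same filter written with a named function instead of a lambda. The function name `has_at_least_two_rows` is hypothetical and chosen only for illustration. The function receives the sub-DataFrame for one user and returns a single boolean; filter keeps every row of the groups for which it returns True.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)


def has_at_least_two_rows(group: pd.DataFrame) -> bool:
    # `group` holds all rows belonging to one user.
    # len() counts rows; Series.count() would skip missing values instead.
    return len(group) >= 2


# Rows of users A and C are kept; user B has only one row and is dropped.
kept = df.groupby("user").filter(has_at_least_two_rows)
print(kept)
```

On this sample data the result matches the count()-based lambda above because the number column has no missing values. With missing values present, count() and len() could disagree, so the choice depends on whether rows with NaN should count.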
It’s also possible to filter on an aggregate of each group, for example keeping only the users whose maximum number is 3 or more:
df.groupby("user").filter(lambda x: x["number"].max() >= 3)
| | user | number |
|---|---|---|
| 4 | C | 3 |
| 6 | C | 1 |
Similarly, the minimum can be used, for example to keep only the users whose minimum number is 1 or less:
df.groupby("user").filter(lambda x: x["number"].min() <= 1)
| | user | number |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | A | 1 |
| 4 | C | 3 |
| 5 | A | 1 |
| 6 | C | 1 |
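As a point of comparison, the same kind of group-level condition can also be written as a boolean mask built with groupby(...).transform(). This is not the approach used in this article, only an alternative sketch; the variable names `group_max`, `masked`, and `filtered` are my own. transform("max") repeats each group's maximum on every row of that group, so the comparison yields one True/False per row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Broadcast each user's maximum back onto that user's rows.
group_max = df.groupby("user")["number"].transform("max")

# Keep the rows whose group maximum is 3 or more.
masked = df[group_max >= 3]

# The filter-based version from this article, for comparison.
filtered = df.groupby("user").filter(lambda x: x["number"].max() >= 3)

print(masked)
print(filtered)
```

Both versions select rows 4 and 6 on this sample data. The filter form reads closer to the intent, while the mask form avoids calling a Python function once per group, which may matter when there are many groups.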
Until now, I had been creating unnecessary intermediate DataFrames for this kind of filtering, so from now on I will do it in a one-liner with groupby and filter.
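For illustration, the difference might look like the sketch below. The multi-step version is an assumption about what an approach with intermediate objects could look like; it is not taken from my earlier code, and the names `counts`, `keep_users`, and `multi_step` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Multi-step version (hypothetical): build intermediate objects first.
counts = df.groupby("user").size()        # number of rows per user
keep_users = counts[counts >= 2].index    # users with two or more rows
multi_step = df[df["user"].isin(keep_users)]

# One-liner version from this article.
one_liner = df.groupby("user").filter(lambda x: x["number"].count() >= 2)

print(multi_step)
print(one_liner)
```

Both select the rows of users A and C on this sample data, but the one-liner needs no temporary variables.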