Applying filter after groupby in pandas
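While using pandas, I ran into a situation where I needed to keep or drop whole groups based on a condition after a groupby operation. After some research, I found that groupby(...).filter(func) does this: func receives each group as a DataFrame and returns True or False, and only the rows of groups for which it returns True are kept.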
GitHub
- The Jupyter notebook file is available on GitHub here
Google Colaboratory
- To run on Google Colaboratory, use this link
Execution Environment
!sw_vers
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
!python -V
Python 3.11.4
Creating a Sample DataFrame
import pandas as pd
df = pd.DataFrame(
{
"user": ["A", "A", "A", "B", "C", "A", "C"],
"number": [1, 2, 1, 2, 3, 1, 1],
}
)
df
| | user | number |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | A | 1 |
| 3 | B | 2 |
| 4 | C | 3 |
| 5 | A | 1 |
| 6 | C | 1 |
I wanted to group the rows by user and keep only the users that have 2 or more rows. This can be done by calling filter with a lambda after groupby, as follows:
df.groupby("user").filter(lambda x: x["number"].count() >= 2)
| | user | number |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | A | 1 |
| 4 | C | 3 |
| 5 | A | 1 |
| 6 | C | 1 |
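To make it clearer what filter does with the lambda, here is a minimal sketch that evaluates the same condition once per group and then passes it to filter. The function name has_two_or_more is a hypothetical helper introduced only for illustration; it plays the role of the lambda above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)


# Hypothetical helper (not from the original post): same condition as the lambda.
def has_two_or_more(group: pd.DataFrame) -> bool:
    # count() counts the non-null values of "number" within one group
    return group["number"].count() >= 2


# Evaluate the condition once per group to see which users will be kept.
decisions = {user: bool(has_two_or_more(group)) for user, group in df.groupby("user")}
print(decisions)  # {'A': True, 'B': False, 'C': True}

# filter keeps every row of the groups whose condition is True,
# preserving the original row order and index.
filtered = df.groupby("user").filter(has_two_or_more)
print(filtered.index.tolist())  # [0, 1, 2, 4, 5, 6]
```

User B has only one row, so the condition is False for that group and row 3 is dropped.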
It’s also possible to filter on the maximum value within each group, for example keeping only the users whose maximum is 3 or more:
df.groupby("user").filter(lambda x: x["number"].max() >= 3)
| | user | number |
|---|---|---|
| 4 | C | 3 |
| 6 | C | 1 |
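As an aside, the same selection can probably also be written with transform and a boolean mask instead of filter. This is a sketch of an alternative technique, not the approach used in this post; the variable names group_max, mask, result, and expected are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# transform("max") broadcasts each group's maximum back onto its own rows,
# so the result has the same index as df.
group_max = df.groupby("user")["number"].transform("max")

# Boolean mask: True for every row that belongs to a group whose maximum is >= 3.
mask = group_max >= 3
result = df[mask]

# The filter version from the post, for comparison.
expected = df.groupby("user").filter(lambda x: x["number"].max() >= 3)

print(result.index.tolist())  # [4, 6]
print(result.equals(expected))  # True
```

The mask version avoids calling a Python function per group, which may matter on large data, while filter with a lambda reads closer to the condition being expressed.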
The same works for the minimum value, here keeping only the users whose minimum is 1 or less:
df.groupby("user").filter(lambda x: x["number"].min() <= 1)
| | user | number |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | A | 1 |
| 4 | C | 3 |
| 5 | A | 1 |
| 6 | C | 1 |
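Since the lambda only has to return a single True or False per group, several of the conditions above can be combined in one call. The following is a small sketch under that assumption; the particular combination (a count of 2 or more and a maximum of 3 or more) is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Keep a user only if it has at least 2 rows AND its maximum is at least 3.
# Both sides are scalars for a given group, so plain `and` is fine here.
combined = df.groupby("user").filter(
    lambda x: x["number"].count() >= 2 and x["number"].max() >= 3
)

print(combined.index.tolist())  # [4, 6]
```

User A has enough rows but a maximum of 2, and user B has only one row, so only user C remains.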
Until now, I had been creating intermediate DataFrames for this kind of filtering, which turns out to be unnecessary. From now on, I will do it in one line with groupby and filter.
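For comparison, here is a sketch of what a multi-step version with intermediate objects might look like, next to the filter one-liner. The multi-step code is an assumption about a typical approach, not the code I actually used before, and the names counts, keep_users, multi_step, and one_liner are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["A", "A", "A", "B", "C", "A", "C"],
        "number": [1, 2, 1, 2, 3, 1, 1],
    }
)

# Multi-step version (assumed, for illustration): build intermediate objects first.
counts = df.groupby("user")["number"].count()  # rows per user
keep_users = counts[counts >= 2].index  # users with 2 or more rows
multi_step = df[df["user"].isin(keep_users)]  # select the rows of those users

# One-liner with groupby and filter.
one_liner = df.groupby("user").filter(lambda x: x["number"].count() >= 2)

print(list(keep_users))  # ['A', 'C']
print(multi_step.equals(one_liner))  # True
```

Both versions select the same rows here; the one-liner simply skips the intermediate variables.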