[python] A note on adding Series after dropna in pandas
library python
Published : 2022-01-16   Lastmod : 2022-01-16

## Python Tips

This is a personal memo on Python notation. It skips the basics and covers only what I find useful.

### github

• The Jupyter notebook file is on GitHub here

• It can be run on Google Colaboratory here

### Author’s environment

! sw_vers

ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G95

!python -V

Python 3.5.5 :: Anaconda, Inc.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import time
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import japanize_matplotlib


## Note on adding columns with pandas.

When I dropped the rows containing NaN from a DataFrame and then tried to add a Series as a new column, the result was not what I expected, and I lost several hours.

a = pd.DataFrame({
'a': [1,2,3],
'b': [1,np.nan,3],
'c': [1,2,3],
})

a = a.dropna(subset=['b'])
a['d'] = pd.Series([1,33])
a

|   | a | b | c | d |
|---|---|---|---|---|
| 0 | 1 | 1.0 | 1 | 1.0 |
| 2 | 3 | 3.0 | 3 | NaN |

I expected the second row of column $d$ to contain 33, but it contains NaN.

It took me a few hours to realize that I had to reset the index.
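The mismatch can be seen by comparing the index labels directly. The following is a minimal sketch using the same data as above; the variable names `df`, `s`, `frame_labels`, `series_labels`, and `shared` are my own and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [1, np.nan, 3],
    'c': [1, 2, 3],
}).dropna(subset=['b'])

s = pd.Series([1, 33])

frame_labels = df.index.tolist()   # labels that survive dropna
series_labels = s.index.tolist()   # default labels of the new Series

# Only labels present in both indexes receive a value on assignment
shared = df.index.intersection(s.index).tolist()

print(frame_labels, series_labels, shared)
```

Row label 2 of the DataFrame has no counterpart in the Series, which is why that row ends up as NaN.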

a = pd.DataFrame({
'a': [1,2,3],
'b': [1,np.nan,3],
'c': [1,2,3],
})

a = a.dropna(subset=['b']).reset_index()
a['d'] = pd.Series([1,33])
a

|   | index | a | b | c | d |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 1.0 | 1 | 1 |
| 1 | 2 | 3 | 3.0 | 3 | 33 |

This time the new column was added as expected.
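Note that `reset_index()` keeps the old labels as an extra `index` column. If that column is not needed, two alternatives are sketched below: passing `drop=True`, or assigning a plain list, which pandas places by position without any index alignment. The names `df`, `reset`, and `kept` are my own, chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [1, np.nan, 3],
    'c': [1, 2, 3],
})

# Option 1: drop=True discards the old index instead of keeping it as a column
reset = df.dropna(subset=['b']).reset_index(drop=True)
reset['d'] = pd.Series([1, 33])

# Option 2: keep the original labels and assign a plain list (positional)
kept = df.dropna(subset=['b'])
kept['d'] = [1, 33]

print(reset)
print(kept)
```

Option 2 requires the list length to match the number of rows exactly; otherwise pandas raises a `ValueError`.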

As an experiment, I gave the first $a$ an arbitrary index (12, 24, 36). This time every value in the new column was NaN, which I had not expected.

a = pd.DataFrame({
'a': [1,2,3],
'b': [1,np.nan,3],
'c': [1,2,3],
},index=[12,24,36])

a = a.dropna(subset=['b'])
a['d'] = pd.Series([1,33])
a

|   | a | b | c | d |
|---|---|---|---|---|
| 12 | 1 | 1.0 | 1 | NaN |
| 36 | 3 | 3.0 | 3 | NaN |

I think this is because a Series created as follows implicitly gets an index starting from 0, and the assignment matches values by index label rather than by position.

a['d'] = pd.Series([1,33])
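This guess can be checked with a short sketch. It inspects the default index of the Series, reproduces the all-NaN result with `reindex`, and then builds the Series with the DataFrame's own index so that the labels match. The names `s`, `df`, `default_labels`, and `misaligned` are my own and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 33])
# With no index given, the Series gets labels 0 and 1
default_labels = s.index.tolist()

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [1, np.nan, 3],
    'c': [1, 2, 3],
}, index=[12, 24, 36]).dropna(subset=['b'])

# Labels 0 and 1 do not exist among 12 and 36, so every aligned value is NaN
misaligned = s.reindex(df.index)

# Passing the DataFrame's index makes the labels match one to one
df['d'] = pd.Series([1, 33], index=df.index)

print(default_labels)
print(misaligned)
print(df)
```

With `index=df.index`, the original labels 12 and 36 are preserved and no reset is needed.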