pandas and data analysis

pandas is an important tool that you must use in your data analysis. Whether you know how to use it or not, or whether you can move your hands quickly without having to do a Google search for what you want to do, will directly affect your ability as an engineer. In this section, I will explain the methods and how to use them that I think are important based on my experience with specific data. If I come across any other important usage, I will add it to this page.

github

  • The file in jupyter notebook format is here

google colaboratory

  • If you want to run it in google colaboratory here pandas_nb.ipynb)

Author’s environment

The author’s OS is macOS, and the options are different from Linux and Unix commands.

! sw_vers
```` ProductName: Mac OS X

    ProductName: Mac OS X
    ProductVersion: 10.14.6
    BuildVersion: 18G6020



```python
Python -V
```python !

    Python 3.7.3


Import the basic libraries and check their versions.


```python
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib
import matplotlib.pyplot as plt
import scipy
import numpy as np

print('matplotlib version :', matplotlib.__version__)
print('scipy version :', scipy.__version__)
print('numpy version :', np.__version__)
matplotlib version : 3.0.3
scipy version : 1.4.1
numpy version : 1.16.2

import and version checking

``python import pandas as pd

print(‘pandas version :’, pd.version)


    pandas version : 1.0.3


## Basic operations

## Loading and displaying data

The data we will be using is from [daniel's github](https://github.com/chendaniely/pandas_for_everyone), the repository of a famous author who has written a book on how to use pandas. [Basic coding for Python data analysis/machine learning! Introduction to using the pandas library](https://www.amazon.co.jp/dp/B07NZP6V29/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1). I also have it. I think it is very informative.

I think the data is the number of Ebola outbreaks (Cases) and deaths (Deaths).

I will use `read_csv` to read the CSV and display the first 5 lines.


Using ``python
import pandas as pd

df = pd.read_csv('. /country_timeseries.csv', sep=',')
df.head()
```




<div>.
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th { vertical-align: right; }
        text-align: right;
    }
</style
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Date</th>
      <th>Day</th>.
      <th>Cases_Guinea</th> <th>Cases_Guinea</th
      <th>Cases_Liberia</th> <th>Cases_Sierra</th
      <th>Cases_SierraLeone</th
      <th>Cases_Nigeria</th
      <th>Cases_Senegal</th
      <th>Cases_UnitedStates</th
      <th>Cases_Spain</th
      <th>Cases_Mali</th
      <th>Deaths_Guinea</th> <th>Deaths_Guinea</th
      <th>Deaths_Liberia</th> <th>Deaths_Liberia</th
      <th>Deaths_SierraLeone</th
      <th>Deaths_Nigeria</th> <th>Deaths_SierraLeone</th
      <th>Deaths_Senegal</th> <th>Deaths_Senegal</th
      <th>Deaths_UnitedStates</th
      <th>Deaths_Spain</th
      <th>Deaths_Mali</th>.
    </tr>.
  </thead>
  <tbody>.
    <tr>.
      <th>0</th>.
      <td>1/5/2015</td>
      <td>289</td>
      <td>2776.0</td>
      <td>NaN</td>
      <td>10030.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>1786.0</td>
      <td>NaN</td>
      <td>2977.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1/4/2015</td>
      <td>288</td>
      <td>2775.0</td>
      <td>NaN</td>
      <td>9780.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>1781.0</td>
      <td>NaN</td>
      <td>2943.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1/3/2015</td>
      <td>287</td>
      <td>2769.0</td>
      <td>8166.0</td>
      <td>9722.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>1767.0</td>
      <td>3496.0</td>
      <td>2915.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>3</th> <td>1/2/2015</td>
      <td>1/2/2015</td>
      <td>286</td>
      <td>NaN</td>
      <td>8157.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>12/31/2014</td>
      <td>284</td>
      <td>2730.0</td>
      <td>8115.0</td>
      <td>9633.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>1739.0</td>
      <td>3471.0</td>
      <td>2827.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>



Display the last 5 data.


```python
df.tail()
```





<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Date</th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Guinea</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>117</th>
      <td>3/27/2014</td>
      <td>5</td>
      <td>103.0</td>
      <td>8.0</td>
      <td>6.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>66.0</td>
      <td>6.0</td>
      <td>5.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>118</th>
      <td>3/26/2014</td>
      <td>4</td>
      <td>86.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>62.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>119</th>
      <td>3/25/2014</td>
      <td>3</td>
      <td>86.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>60.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>120</th>
      <td>3/24/2014</td>
      <td>2</td>
      <td>86.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>59.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>121</th>
      <td>3/22/2014</td>
      <td>0</td>
      <td>49.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>29.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>



### Check your data.

#### Get data type and other information


```python
df.info()
```

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 122 entries, 0 to 121
    Data columns (total 18 columns):
     #   Column               Non-Null Count  Dtype
    ---  ------               --------------  -----
     0   Date                 122 non-null    object
     1   Day                  122 non-null    int64
     2   Cases_Guinea         93 non-null     float64
     3   Cases_Liberia        83 non-null     float64
     4   Cases_SierraLeone    87 non-null     float64
     5   Cases_Nigeria        38 non-null     float64
     6   Cases_Senegal        25 non-null     float64
     7   Cases_UnitedStates   18 non-null     float64
     8   Cases_Spain          16 non-null     float64
     9   Cases_Mali           12 non-null     float64
     10  Deaths_Guinea        92 non-null     float64
     11  Deaths_Liberia       81 non-null     float64
     12  Deaths_SierraLeone   87 non-null     float64
     13  Deaths_Nigeria       38 non-null     float64
     14  Deaths_Senegal       22 non-null     float64
     15  Deaths_UnitedStates  18 non-null     float64
     16  Deaths_Spain         16 non-null     float64
     17  Deaths_Mali          12 non-null     float64
    dtypes: float64(16), int64(1), object(1)
    memory usage: 17.3+ KB






#### 大きさ(行数と列数)の確認


```python
df.shape
```




    (122, 18)



#### インデックスの確認


```python
df.index
```




    RangeIndex(start=0, stop=122, step=1)



#### カラム名の確認


```python
df.columns
```




    Index(['Date', 'Day', 'Cases_Guinea', 'Cases_Liberia', 'Cases_SierraLeone',
           'Cases_Nigeria', 'Cases_Senegal', 'Cases_UnitedStates', 'Cases_Spain',
           'Cases_Mali', 'Deaths_Guinea', 'Deaths_Liberia', 'Deaths_SierraLeone',
           'Deaths_Nigeria', 'Deaths_Senegal', 'Deaths_UnitedStates',
           'Deaths_Spain', 'Deaths_Mali'],
          dtype='object')







#### Retrieving data for any column name

Specify a column name to display only the desired column.

```python
df[['Cases_UnitedStates','Deaths_UnitedStates']].head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Cases_UnitedStates</th>
      <th>Deaths_UnitedStates</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2</th>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>3</th>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>



#### Retrieve data by specifying the number of rows and columns


```python
df.iloc[[6,7],[0,3]]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Date</th>
      <th>Cases_Liberia</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>6</th>
      <td>12/27/2014</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>7</th>
      <td>12/24/2014</td>
      <td>7977.0</td>
    </tr>
  </tbody>
</table>
</div>




#### Retrieve data that satisfies certain conditions


```python
df[df['Deaths_Liberia'] > 3000][['Deaths_Liberia']]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Deaths_Liberia</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2</th>
      <td>3496.0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>3496.0</td>
    </tr>
    <tr>
      <th>4</th>
      <td>3471.0</td>
    </tr>
    <tr>
      <th>5</th>
      <td>3423.0</td>
    </tr>
    <tr>
      <th>7</th>
      <td>3413.0</td>
    </tr>
    <tr>
      <th>9</th>
      <td>3384.0</td>
    </tr>
    <tr>
      <th>10</th>
      <td>3376.0</td>
    </tr>
    <tr>
      <th>12</th>
      <td>3290.0</td>
    </tr>
    <tr>
      <th>14</th>
      <td>3177.0</td>
    </tr>
    <tr>
      <th>16</th>
      <td>3145.0</td>
    </tr>
    <tr>
      <th>18</th>
      <td>3016.0</td>
    </tr>
  </tbody>
</table>
</div>




#### Deleting a column
We are deleting the column Deaths_Guinea.



```python
df.drop(['Deaths_Guinea'], axis=1, inplace=True)
df.columns
```




    Index(['Date', 'Day', 'Cases_Guinea', 'Cases_Liberia', 'Cases_SierraLeone',
           'Cases_Nigeria', 'Cases_Senegal', 'Cases_UnitedStates', 'Cases_Spain',
           'Cases_Mali', 'Deaths_Liberia', 'Deaths_SierraLeone', 'Deaths_Nigeria',
           'Deaths_Senegal', 'Deaths_UnitedStates', 'Deaths_Spain', 'Deaths_Mali'],
          dtype='object')




#### Getting statistics

Using describe(), you can get the statistics for each column. This is powerful when you want to get a quick overview.



```python
df.describe()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>count</th>
      <td>122.000000</td>
      <td>93.000000</td>
      <td>83.000000</td>
      <td>87.000000</td>
      <td>38.000000</td>
      <td>25.00</td>
      <td>18.000000</td>
      <td>16.0</td>
      <td>12.000000</td>
      <td>81.000000</td>
      <td>87.000000</td>
      <td>38.000000</td>
      <td>22.0</td>
      <td>18.000000</td>
      <td>16.000000</td>
      <td>12.000000</td>
    </tr>
    <tr>
      <th>mean</th>
      <td>144.778689</td>
      <td>911.064516</td>
      <td>2335.337349</td>
      <td>2427.367816</td>
      <td>16.736842</td>
      <td>1.08</td>
      <td>3.277778</td>
      <td>1.0</td>
      <td>3.500000</td>
      <td>1101.209877</td>
      <td>693.701149</td>
      <td>6.131579</td>
      <td>0.0</td>
      <td>0.833333</td>
      <td>0.187500</td>
      <td>3.166667</td>
    </tr>
    <tr>
      <th>std</th>
      <td>89.316460</td>
      <td>849.108801</td>
      <td>2987.966721</td>
      <td>3184.803996</td>
      <td>5.998577</td>
      <td>0.40</td>
      <td>1.178511</td>
      <td>0.0</td>
      <td>2.746899</td>
      <td>1297.208568</td>
      <td>869.947073</td>
      <td>2.781901</td>
      <td>0.0</td>
      <td>0.383482</td>
      <td>0.403113</td>
      <td>2.405801</td>
    </tr>
    <tr>
      <th>min</th>
      <td>0.000000</td>
      <td>49.000000</td>
      <td>3.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>1.00</td>
      <td>1.000000</td>
      <td>1.0</td>
      <td>1.000000</td>
      <td>2.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.0</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>25%</th>
      <td>66.250000</td>
      <td>236.000000</td>
      <td>25.500000</td>
      <td>64.500000</td>
      <td>15.000000</td>
      <td>1.00</td>
      <td>3.000000</td>
      <td>1.0</td>
      <td>1.000000</td>
      <td>12.000000</td>
      <td>6.000000</td>
      <td>4.000000</td>
      <td>0.0</td>
      <td>1.000000</td>
      <td>0.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>50%</th>
      <td>150.000000</td>
      <td>495.000000</td>
      <td>516.000000</td>
      <td>783.000000</td>
      <td>20.000000</td>
      <td>1.00</td>
      <td>4.000000</td>
      <td>1.0</td>
      <td>2.500000</td>
      <td>294.000000</td>
      <td>334.000000</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>1.000000</td>
      <td>0.000000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>75%</th>
      <td>219.500000</td>
      <td>1519.000000</td>
      <td>4162.500000</td>
      <td>3801.000000</td>
      <td>20.000000</td>
      <td>1.00</td>
      <td>4.000000</td>
      <td>1.0</td>
      <td>6.250000</td>
      <td>2413.000000</td>
      <td>1176.000000</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>1.000000</td>
      <td>0.000000</td>
      <td>6.000000</td>
    </tr>
    <tr>
      <th>max</th>
      <td>289.000000</td>
      <td>2776.000000</td>
      <td>8166.000000</td>
      <td>10030.000000</td>
      <td>22.000000</td>
      <td>3.00</td>
      <td>4.000000</td>
      <td>1.0</td>
      <td>7.000000</td>
      <td>3496.000000</td>
      <td>2977.000000</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>6.000000</td>
    </tr>
  </tbody>
</table>
</div>



You can also use the value_counts method to easily find the frequency of a value. It is a little difficult to understand because the data used in this project is a continuous quantity, but we can see that the number of data for 0.0 is 15. You can see that the number of data for 0.0 is 15, and all the other data are 1.


```python
df['Deaths_Liberia'].value_counts()
```




    11.0      9
    12.0      4
    3496.0    2
    13.0      2
    24.0      2
             ..
    2963.0    1
    88.0      1
    3413.0    1
    3177.0    1
    105.0     1
    Name: Deaths_Liberia, Length: 66, dtype: int64




## Change the index to datetime type

Change the index to Date and overwrite it. For time-series data, it is often easier to analyze if the index is a date. However, rather than simply indexing it as a string, converting it to datetime type, which is a standard feature of panda, will make it more convenient for aggregation.

Change the index name "Date" to YYYYMMDD. Use the rename function.

```python
df.rename(columns={'Date':'YYYYMMDD'}, inplace=True)
df.set_index('YYYYMMDD', inplace=True)
df.index
```




    Index(['1/5/2015', '1/4/2015', '1/3/2015', '1/2/2015', '12/31/2014',
           '12/28/2014', '12/27/2014', '12/24/2014', '12/21/2014', '12/20/2014',
           ...
           '4/4/2014', '4/1/2014', '3/31/2014', '3/29/2014', '3/28/2014',
           '3/27/2014', '3/26/2014', '3/25/2014', '3/24/2014', '3/22/2014'],
          dtype='object', name='YYYYMMDD', length=122)




```python
df.head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1/5/2015</th>
      <td>289</td>
      <td>2776.0</td>
      <td>NaN</td>
      <td>10030.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2977.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1/4/2015</th>
      <td>288</td>
      <td>2775.0</td>
      <td>NaN</td>
      <td>9780.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2943.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1/3/2015</th>
      <td>287</td>
      <td>2769.0</td>
      <td>8166.0</td>
      <td>9722.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>2915.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1/2/2015</th>
      <td>286</td>
      <td>NaN</td>
      <td>8157.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>12/31/2014</th>
      <td>284</td>
      <td>2730.0</td>
      <td>8115.0</td>
      <td>9633.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3471.0</td>
      <td>2827.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.columns
```




    Index(['Day', 'Cases_Guinea', 'Cases_Liberia', 'Cases_SierraLeone',
           'Cases_Nigeria', 'Cases_Senegal', 'Cases_UnitedStates', 'Cases_Spain',
           'Cases_Mali', 'Deaths_Liberia', 'Deaths_SierraLeone', 'Deaths_Nigeria',
           'Deaths_Senegal', 'Deaths_UnitedStates', 'Deaths_Spain', 'Deaths_Mali'],
          dtype='object')



Sort by index. However, the date is a string object, so it is not sorted as I had planned.


```python
df.sort_index(ascending=True).head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1/2/2015</th>
      <td>286</td>
      <td>NaN</td>
      <td>8157.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1/3/2015</th>
      <td>287</td>
      <td>2769.0</td>
      <td>8166.0</td>
      <td>9722.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>2915.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1/4/2015</th>
      <td>288</td>
      <td>2775.0</td>
      <td>NaN</td>
      <td>9780.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2943.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1/5/2015</th>
      <td>289</td>
      <td>2776.0</td>
      <td>NaN</td>
      <td>10030.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2977.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>10/1/2014</th>
      <td>193</td>
      <td>1199.0</td>
      <td>3834.0</td>
      <td>2437.0</td>
      <td>20.0</td>
      <td>1.0</td>
      <td>1.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2069.0</td>
      <td>623.0</td>
      <td>8.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




Change the index to a datetime type.

```python
df.index
```




    Index(['1/5/2015', '1/4/2015', '1/3/2015', '1/2/2015', '12/31/2014',
           '12/28/2014', '12/27/2014', '12/24/2014', '12/21/2014', '12/20/2014',
           ...
           '4/4/2014', '4/1/2014', '3/31/2014', '3/29/2014', '3/28/2014',
           '3/27/2014', '3/26/2014', '3/25/2014', '3/24/2014', '3/22/2014'],
          dtype='object', name='YYYYMMDD', length=122)




```python
df.index = pd.to_datetime(df.index, format='%m/%d/%Y')
df.index
```




    DatetimeIndex(['2015-01-05', '2015-01-04', '2015-01-03', '2015-01-02',
                   '2014-12-31', '2014-12-28', '2014-12-27', '2014-12-24',
                   '2014-12-21', '2014-12-20',
                   ...
                   '2014-04-04', '2014-04-01', '2014-03-31', '2014-03-29',
                   '2014-03-28', '2014-03-27', '2014-03-26', '2014-03-25',
                   '2014-03-24', '2014-03-22'],
                  dtype='datetime64[ns]', name='YYYYMMDD', length=122, freq=None)




We can see that the type has been changed from dtype='object' to object='datetime64', which is a datetime type. So let's sort it.


```python
df.sort_index(ascending=True, inplace=True)
df.head(10)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-22</th>
      <td>0</td>
      <td>49.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-24</th>
      <td>2</td>
      <td>86.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-25</th>
      <td>3</td>
      <td>86.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-26</th>
      <td>4</td>
      <td>86.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-27</th>
      <td>5</td>
      <td>103.0</td>
      <td>8.0</td>
      <td>6.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>6.0</td>
      <td>5.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-28</th>
      <td>6</td>
      <td>112.0</td>
      <td>3.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-29</th>
      <td>7</td>
      <td>112.0</td>
      <td>7.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-31</th>
      <td>9</td>
      <td>122.0</td>
      <td>8.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>4.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-04-01</th>
      <td>10</td>
      <td>127.0</td>
      <td>8.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>5.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-04-04</th>
      <td>13</td>
      <td>143.0</td>
      <td>18.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>7.0</td>
      <td>2.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.tail(10)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-12-20</th>
      <td>272</td>
      <td>2571.0</td>
      <td>7862.0</td>
      <td>8939.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3384.0</td>
      <td>2556.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-21</th>
      <td>273</td>
      <td>2597.0</td>
      <td>NaN</td>
      <td>9004.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2582.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-24</th>
      <td>277</td>
      <td>2630.0</td>
      <td>7977.0</td>
      <td>9203.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3413.0</td>
      <td>2655.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-27</th>
      <td>280</td>
      <td>2695.0</td>
      <td>NaN</td>
      <td>9409.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2732.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-28</th>
      <td>281</td>
      <td>2706.0</td>
      <td>8018.0</td>
      <td>9446.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3423.0</td>
      <td>2758.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>284</td>
      <td>2730.0</td>
      <td>8115.0</td>
      <td>9633.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3471.0</td>
      <td>2827.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-02</th>
      <td>286</td>
      <td>NaN</td>
      <td>8157.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>287</td>
      <td>2769.0</td>
      <td>8166.0</td>
      <td>9722.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>2915.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-04</th>
      <td>288</td>
      <td>2775.0</td>
      <td>NaN</td>
      <td>9780.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2943.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-05</th>
      <td>289</td>
      <td>2776.0</td>
      <td>NaN</td>
      <td>10030.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2977.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




This shows that the sorting is working as expected.

Also, since the datetime type is now set as the index, it is easier to handle dates.
For example, to get the data for the year 2015, we can use the following


```python
df['2015']
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2015-01-02</th>
      <td>286</td>
      <td>NaN</td>
      <td>8157.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>287</td>
      <td>2769.0</td>
      <td>8166.0</td>
      <td>9722.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.0</td>
      <td>2915.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-04</th>
      <td>288</td>
      <td>2775.0</td>
      <td>NaN</td>
      <td>9780.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2943.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-05</th>
      <td>289</td>
      <td>2776.0</td>
      <td>NaN</td>
      <td>10030.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2977.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




```python
df['2014-12'].sort_index(ascending=True)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-12-03</th>
      <td>256</td>
      <td>NaN</td>
      <td>7719.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3177.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-07</th>
      <td>260</td>
      <td>2292.0</td>
      <td>NaN</td>
      <td>7897.0</td>
      <td>20.0</td>
      <td>1.0</td>
      <td>4.0</td>
      <td>1.0</td>
      <td>7.0</td>
      <td>NaN</td>
      <td>1768.0</td>
      <td>8.0</td>
      <td>0.0</td>
      <td>1.0</td>
      <td>0.0</td>
      <td>6.0</td>
    </tr>
    <tr>
      <th>2014-12-09</th>
      <td>262</td>
      <td>NaN</td>
      <td>7797.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3290.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-14</th>
      <td>267</td>
      <td>2416.0</td>
      <td>NaN</td>
      <td>8356.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2085.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-18</th>
      <td>271</td>
      <td>NaN</td>
      <td>7830.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3376.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-20</th>
      <td>272</td>
      <td>2571.0</td>
      <td>7862.0</td>
      <td>8939.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3384.0</td>
      <td>2556.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-21</th>
      <td>273</td>
      <td>2597.0</td>
      <td>NaN</td>
      <td>9004.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2582.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-24</th>
      <td>277</td>
      <td>2630.0</td>
      <td>7977.0</td>
      <td>9203.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3413.0</td>
      <td>2655.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-27</th>
      <td>280</td>
      <td>2695.0</td>
      <td>NaN</td>
      <td>9409.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>2732.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-28</th>
      <td>281</td>
      <td>2706.0</td>
      <td>8018.0</td>
      <td>9446.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3423.0</td>
      <td>2758.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>284</td>
      <td>2730.0</td>
      <td>8115.0</td>
      <td>9633.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3471.0</td>
      <td>2827.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




Furthermore, statistics such as averages and totals can be easily obtained on a yearly or monthly basis.


```python
df.resample('Y').mean()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-12-31</th>
      <td>139.940678</td>
      <td>848.988889</td>
      <td>2191.481481</td>
      <td>2162.488095</td>
      <td>16.736842</td>
      <td>1.08</td>
      <td>3.277778</td>
      <td>1.0</td>
      <td>3.5</td>
      <td>1040.582278</td>
      <td>613.297619</td>
      <td>6.131579</td>
      <td>0.0</td>
      <td>0.833333</td>
      <td>0.1875</td>
      <td>3.166667</td>
    </tr>
    <tr>
      <th>2015-12-31</th>
      <td>287.500000</td>
      <td>2773.333333</td>
      <td>8161.500000</td>
      <td>9844.000000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.000000</td>
      <td>2945.000000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.resample('M').mean()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-31</th>
      <td>4.500000</td>
      <td>94.500000</td>
      <td>6.500000</td>
      <td>3.333333</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3.750000</td>
      <td>3.000000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-04-30</th>
      <td>24.333333</td>
      <td>177.818182</td>
      <td>24.555556</td>
      <td>2.200000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>9.625000</td>
      <td>1.111111</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-05-31</th>
      <td>51.888889</td>
      <td>248.777778</td>
      <td>12.555556</td>
      <td>7.333333</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>11.111111</td>
      <td>1.222222</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-06-30</th>
      <td>84.636364</td>
      <td>373.428571</td>
      <td>35.500000</td>
      <td>125.571429</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>28.000000</td>
      <td>29.375000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-07-31</th>
      <td>115.700000</td>
      <td>423.000000</td>
      <td>212.300000</td>
      <td>420.500000</td>
      <td>1.333333</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>121.300000</td>
      <td>189.500000</td>
      <td>0.666667</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-08-31</th>
      <td>145.090909</td>
      <td>559.818182</td>
      <td>868.818182</td>
      <td>844.000000</td>
      <td>13.363636</td>
      <td>1.000000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>468.454545</td>
      <td>353.000000</td>
      <td>3.545455</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-09-30</th>
      <td>177.500000</td>
      <td>967.888889</td>
      <td>2815.625000</td>
      <td>1726.000000</td>
      <td>20.714286</td>
      <td>1.285714</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>1508.000000</td>
      <td>565.777778</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-10-31</th>
      <td>207.470588</td>
      <td>1500.444444</td>
      <td>4758.750000</td>
      <td>3668.111111</td>
      <td>20.000000</td>
      <td>1.000000</td>
      <td>2.555556</td>
      <td>1.0</td>
      <td>1.0</td>
      <td>2419.000000</td>
      <td>1151.666667</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>0.666667</td>
      <td>0.428571</td>
      <td>1.000</td>
    </tr>
    <tr>
      <th>2014-11-30</th>
      <td>237.214286</td>
      <td>1950.500000</td>
      <td>7039.000000</td>
      <td>5843.625000</td>
      <td>20.000000</td>
      <td>1.000000</td>
      <td>4.000000</td>
      <td>1.0</td>
      <td>4.0</td>
      <td>2928.857143</td>
      <td>1256.750000</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>1.000000</td>
      <td>0.000000</td>
      <td>3.625</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>271.181818</td>
      <td>2579.625000</td>
      <td>7902.571429</td>
      <td>8985.875000</td>
      <td>20.000000</td>
      <td>1.000000</td>
      <td>4.000000</td>
      <td>1.0</td>
      <td>7.0</td>
      <td>3362.000000</td>
      <td>2495.375000</td>
      <td>8.000000</td>
      <td>0.0</td>
      <td>1.000000</td>
      <td>0.000000</td>
      <td>6.000</td>
    </tr>
    <tr>
      <th>2015-01-31</th>
      <td>287.500000</td>
      <td>2773.333333</td>
      <td>8161.500000</td>
      <td>9844.000000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>3496.000000</td>
      <td>2945.000000</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.resample('Y').sum()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-12-31</th>
      <td>16513</td>
      <td>76409.0</td>
      <td>177510.0</td>
      <td>181649.0</td>
      <td>636.0</td>
      <td>27.0</td>
      <td>59.0</td>
      <td>16.0</td>
      <td>42.0</td>
      <td>82206.0</td>
      <td>51517.0</td>
      <td>233.0</td>
      <td>0.0</td>
      <td>15.0</td>
      <td>3.0</td>
      <td>38.0</td>
    </tr>
    <tr>
      <th>2015-12-31</th>
      <td>1150</td>
      <td>8320.0</td>
      <td>16323.0</td>
      <td>29532.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>6992.0</td>
      <td>8835.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.resample('M').sum()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-31</th>
      <td>36</td>
      <td>756.0</td>
      <td>26.0</td>
      <td>10.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>15.0</td>
      <td>9.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-04-30</th>
      <td>365</td>
      <td>1956.0</td>
      <td>221.0</td>
      <td>22.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>77.0</td>
      <td>10.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-05-31</th>
      <td>467</td>
      <td>2239.0</td>
      <td>113.0</td>
      <td>66.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>100.0</td>
      <td>11.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-06-30</th>
      <td>931</td>
      <td>2614.0</td>
      <td>284.0</td>
      <td>879.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>196.0</td>
      <td>235.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-07-31</th>
      <td>1157</td>
      <td>4230.0</td>
      <td>2123.0</td>
      <td>4205.0</td>
      <td>4.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>1213.0</td>
      <td>1895.0</td>
      <td>2.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-08-31</th>
      <td>1596</td>
      <td>6158.0</td>
      <td>9557.0</td>
      <td>9284.0</td>
      <td>147.0</td>
      <td>1.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>5153.0</td>
      <td>3883.0</td>
      <td>39.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-09-30</th>
      <td>2130</td>
      <td>8711.0</td>
      <td>22525.0</td>
      <td>15534.0</td>
      <td>145.0</td>
      <td>9.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>12064.0</td>
      <td>5092.0</td>
      <td>56.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-10-31</th>
      <td>3527</td>
      <td>13504.0</td>
      <td>38070.0</td>
      <td>33013.0</td>
      <td>160.0</td>
      <td>8.0</td>
      <td>23.0</td>
      <td>7.0</td>
      <td>3.0</td>
      <td>19352.0</td>
      <td>10365.0</td>
      <td>64.0</td>
      <td>0.0</td>
      <td>6.0</td>
      <td>3.0</td>
      <td>3.0</td>
    </tr>
    <tr>
      <th>2014-11-30</th>
      <td>3321</td>
      <td>15604.0</td>
      <td>49273.0</td>
      <td>46749.0</td>
      <td>160.0</td>
      <td>8.0</td>
      <td>32.0</td>
      <td>8.0</td>
      <td>32.0</td>
      <td>20502.0</td>
      <td>10054.0</td>
      <td>64.0</td>
      <td>0.0</td>
      <td>8.0</td>
      <td>0.0</td>
      <td>29.0</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>2983</td>
      <td>20637.0</td>
      <td>55318.0</td>
      <td>71887.0</td>
      <td>20.0</td>
      <td>1.0</td>
      <td>4.0</td>
      <td>1.0</td>
      <td>7.0</td>
      <td>23534.0</td>
      <td>19963.0</td>
      <td>8.0</td>
      <td>0.0</td>
      <td>1.0</td>
      <td>0.0</td>
      <td>6.0</td>
    </tr>
    <tr>
      <th>2015-01-31</th>
      <td>1150</td>
      <td>8320.0</td>
      <td>16323.0</td>
      <td>29532.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>6992.0</td>
      <td>8835.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
      <td>0.0</td>
    </tr>
  </tbody>
</table>
</div>




It is very useful. Yet another advantage of the datetime type is that you can use methods such as `.year` and `.month` to get the year and month of the year.


```python
df.index.year
```




    Int64Index([2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
                ...
                2014, 2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015],
               dtype='int64', name='YYYYMMDD', length=122)




```python
df.index.month
```




    Int64Index([ 3,  3,  3,  3,  3,  3,  3,  3,  4,  4,
                ...
                12, 12, 12, 12, 12, 12,  1,  1,  1,  1],
               dtype='int64', name='YYYYMMDD', length=122)




```python
df.index.day
```




    Int64Index([22, 24, 25, 26, 27, 28, 29, 31,  1,  4,
                ...
                20, 21, 24, 27, 28, 31,  2,  3,  4,  5],
               dtype='int64', name='YYYYMMDD', length=122)



## Cut Processing (Creating a Histogram)
When you are analyzing data, there are often situations where you want to divide the data under certain conditions and aggregate them. For example, you may want to aggregate the data by season. Here's an example of labeling by season, though the months and seasons don't match.


```python
labels = ['spring', 'summer', 'fall', 'winter'].
df['season'] = pd.cut(list(df.index.month), bins=[0,3,6,9,12], labels=labels, right=True)
df[['season']][5:10]]
```





<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>season</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-28</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-03-29</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-03-31</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-04-01</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-04-04</th>
      <td></td>
    </tr>
  </tbody>
</table>
</div>




```python
df[['season']][73:78]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>season</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-09-21</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-09-23</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-09-28</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-10-01</th>
      <td></td>
    </tr>
    <tr>
      <th>2014-10-04</th>
      <td></td>
    </tr>
  </tbody>
</table>
</div>




## How to use query, where, and mask (and sort)
You can use query, where, etc. in the same way as numpy. The usage is intuitively the same as numpy, so you can use it right away. Let's run a query on the number of infected and dead.

The query specifies the conditional expression to be extracted.


```python
df[['Deaths_Liberia','Cases_Liberia']].query('Deaths_Liberia > 100 and Cases_Liberia > 7000')
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Deaths_Liberia</th>
      <th>Cases_Liberia</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-11-15</th>
      <td>2964.0</td>
      <td>7069.0</td>
    </tr>
    <tr>
      <th>2014-11-18</th>
      <td>2963.0</td>
      <td>7082.0</td>
    </tr>
    <tr>
      <th>2014-11-22</th>
      <td>3016.0</td>
      <td>7168.0</td>
    </tr>
    <tr>
      <th>2014-11-28</th>
      <td>3145.0</td>
      <td>7635.0</td>
    </tr>
    <tr>
      <th>2014-12-03</th>
      <td>3177.0</td>
      <td>7719.0</td>
    </tr>
    <tr>
      <th>2014-12-09</th>
      <td>3290.0</td>
      <td>7797.0</td>
    </tr>
    <tr>
      <th>2014-12-18</th>
      <td>3376.0</td>
      <td>7830.0</td>
    </tr>
    <tr>
      <th>2014-12-20</th>
      <td>3384.0</td>
      <td>7862.0</td>
    </tr>
    <tr>
      <th>2014-12-24</th>
      <td>3413.0</td>
      <td>7977.0</td>
    </tr>
    <tr>
      <th>2014-12-28</th>
      <td>3423.0</td>
      <td>8018.0</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>3471.0</td>
      <td>8115.0</td>
    </tr>
    <tr>
      <th>2015-01-02</th>
      <td>3496.0</td>
      <td>8157.0</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>3496.0</td>
      <td>8166.0</td>
    </tr>
  </tbody>
</table>
</div>




If the where condition is also specified, data that satisfies the condition will be returned as is, and data that does not satisfy the condition will be returned with NaN stored in it.


```python
df[['Deaths_Liberia']].where(df['Deaths_Liberia'] > 1000)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Deaths_Liberia</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-22</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-24</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-25</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-26</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-27</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>3471.0</td>
    </tr>
    <tr>
      <th>2015-01-02</th>
      <td>3496.0</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>3496.0</td>
    </tr>
    <tr>
      <th>2015-01-04</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-05</th>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
<p>122 rows × 1 columns</p>
</div>



You can also put another number instead of NaN. This is the same as in numpy and is very helpful.


```python
df[['Deaths_Liberia']].where(df['Deaths_Liberia'] > 3000,0)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Deaths_Liberia</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-22</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-03-24</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-03-25</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-03-26</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2014-03-27</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>3471.0</td>
    </tr>
    <tr>
      <th>2015-01-02</th>
      <td>3496.0</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>3496.0</td>
    </tr>
    <tr>
      <th>2015-01-04</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2015-01-05</th>
      <td>0.0</td>
    </tr>
  </tbody>
</table>
<p>122 rows × 1 columns</p>
</div>



The mask method is the opposite of where, and rewrites the second argument with the one that satisfies the condition.


```python
df[['Deaths_Liberia']].mask(df['Deaths_Liberia'] > 3000, 0)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Deaths_Liberia</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-22</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-24</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-25</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-26</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2014-03-27</th>
      <td>6.0</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2015-01-02</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>0.0</td>
    </tr>
    <tr>
      <th>2015-01-04</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2015-01-05</th>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
<p>122 rows × 1 columns</p>
</div>



## How to use null

Data often contains nulls, so for proper data analysis, we need to know how many nulls are in the data and how much they affect the analysis.

With `isnull`, you can create a table with the null part set to false.



```python
df.isnull()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
      <th>season</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-03-22</th>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2014-03-24</th>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2014-03-25</th>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2014-03-26</th>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2014-03-27</th>
      <td>False</td>
      <td>False</td>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>2014-12-31</th>
      <td>False</td>
      <td>False</td>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2015-01-02</th>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2015-01-03</th>
      <td>False</td>
      <td>False</td>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2015-01-04</th>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2015-01-05</th>
      <td>False</td>
      <td>False</td>
      <td>True</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>True</td>
      <td>False</td>
    </tr>
  </tbody>
</table>
<p>122 rows × 17 columns</p>
</div>




You can also use the sum method to count the number of nulls in each column.


```python
df.isnull().sum()
```




    Day                      0
    Cases_Guinea            29
    Cases_Liberia           39
    Cases_SierraLeone       35
    Cases_Nigeria           84
    Cases_Senegal           97
    Cases_UnitedStates     104
    Cases_Spain            106
    Cases_Mali             110
    Deaths_Liberia          41
    Deaths_SierraLeone      35
    Deaths_Nigeria          84
    Deaths_Senegal         100
    Deaths_UnitedStates    104
    Deaths_Spain           106
    Deaths_Mali            110
    season                   0
    dtype: int64




Similarly, you can use the mean method to get the average.


```python
df.isnull().mean()
```




    Day                    0.000000
    Cases_Guinea           0.237705
    Cases_Liberia          0.319672
    Cases_SierraLeone      0.286885
    Cases_Nigeria          0.688525
    Cases_Senegal          0.795082
    Cases_UnitedStates     0.852459
    Cases_Spain            0.868852
    Cases_Mali             0.901639
    Deaths_Liberia         0.336066
    Deaths_SierraLeone     0.286885
    Deaths_Nigeria         0.688525
    Deaths_Senegal         0.819672
    Deaths_UnitedStates    0.852459
    Deaths_Spain           0.868852
    Deaths_Mali            0.901639
    season                 0.000000
    dtype: float64




Rewrites the data of Null. Use the method `fillna`.


```python
df.fillna(value={'Cases_Liberia': 0.0, 'Deaths_Liberia': 0.0}, inplace=True)
df.isnull().sum()
```




    Day                      0
    Cases_Guinea            29
    Cases_Liberia            0
    Cases_SierraLeone       35
    Cases_Nigeria           84
    Cases_Senegal           97
    Cases_UnitedStates     104
    Cases_Spain            106
    Cases_Mali             110
    Deaths_Liberia           0
    Deaths_SierraLeone      35
    Deaths_Nigeria          84
    Deaths_Senegal         100
    Deaths_UnitedStates    104
    Deaths_Spain           106
    Deaths_Mali            110
    season                   0
    dtype: int64



Now indeed, the number of nulls in Cases_Liberia and Death_Liberia is zero.

We can also delete rows that have nulls in certain columns, which can be seen by comparing the number of data before and after applying dropna. Before the deletion, the number of rows with



```python
df.shape
```




    (122, 17)




The result is as follows. After the deletion, the following shows that the file has indeed been deleted.


```python
df.dropna(subset=['Cases_Nigeria'], axis=0).shape
```




    (38, 17)




```python
df.dropna(subset=['Cases_Nigeria'], axis=0).isnull().sum()
```




    Day                     0
    Cases_Guinea            2
    Cases_Liberia           0
    Cases_SierraLeone       0
    Cases_Nigeria           0
    Cases_Senegal          13
    Cases_UnitedStates     21
    Cases_Spain            23
    Cases_Mali             27
    Deaths_Liberia          0
    Deaths_SierraLeone      0
    Deaths_Nigeria          0
    Deaths_Senegal         16
    Deaths_UnitedStates    21
    Deaths_Spain           23
    Deaths_Mali            27
    season                  0
    dtype: int64




```python
df.dropna(subset=['Cases_Nigeria'], axis=0).head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Day</th>
      <th>Cases_Guinea</th>
      <th>Cases_Liberia</th>
      <th>Cases_SierraLeone</th>
      <th>Cases_Nigeria</th>
      <th>Cases_Senegal</th>
      <th>Cases_UnitedStates</th>
      <th>Cases_Spain</th>
      <th>Cases_Mali</th>
      <th>Deaths_Liberia</th>
      <th>Deaths_SierraLeone</th>
      <th>Deaths_Nigeria</th>
      <th>Deaths_Senegal</th>
      <th>Deaths_UnitedStates</th>
      <th>Deaths_Spain</th>
      <th>Deaths_Mali</th>
      <th>season</th>
    </tr>
    <tr>
      <th>YYYYMMDD</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-07-23</th>
      <td>123</td>
      <td>427.0</td>
      <td>249.0</td>
      <td>525.0</td>
      <td>0.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>129.0</td>
      <td>224.0</td>
      <td>0.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>秋</td>
    </tr>
    <tr>
      <th>2014-07-27</th>
      <td>126</td>
      <td>460.0</td>
      <td>329.0</td>
      <td>533.0</td>
      <td>1.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>156.0</td>
      <td>233.0</td>
      <td>1.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>秋</td>
    </tr>
    <tr>
      <th>2014-07-30</th>
      <td>129</td>
      <td>472.0</td>
      <td>391.0</td>
      <td>574.0</td>
      <td>3.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>227.0</td>
      <td>252.0</td>
      <td>1.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>秋</td>
    </tr>
    <tr>
      <th>2014-08-01</th>
      <td>132</td>
      <td>485.0</td>
      <td>468.0</td>
      <td>646.0</td>
      <td>4.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>255.0</td>
      <td>273.0</td>
      <td>1.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>秋</td>
    </tr>
    <tr>
      <th>2014-08-04</th>
      <td>135</td>
      <td>495.0</td>
      <td>516.0</td>
      <td>691.0</td>
      <td>9.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>282.0</td>
      <td>286.0</td>
      <td>1.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>秋</td>
    </tr>
  </tbody>
</table>
</div>




## Renaming a column or index
We have already seen above that we often want to rename a column or index, so we use the rename method.


Use the ``python
df.rename(columns={'before': 'after'}, inplace=True)
df.rename(index={'before': 'after'}, inplace=True)

SQL like methods

SQL’s familiar groupby is available in pandas. I personally use this one a lot.

df.groupby(['season'])['season'].count()
season
spring    12
summer    35
autumn    33
winter    42
Name: season, dtype: int64

Output to CSV

Outputs all data stored in the memory.

df.to_csv('./out.csv')
!head  -n 10 out.csv
YYYYMMDD,Day,Cases_Guinea,Cases_Liberia,Cases_SierraLeone,Cases_Nigeria,Cases_Senegal,Cases_UnitedStates,Cases_Spain,Cases_Mali,Deaths_Liberia,Deaths_SierraLeone,Deaths_Nigeria,Deaths_Senegal,Deaths_UnitedStates,Deaths_Spain,Deaths_Mali,season
2014-03-22,0,49.0,0.0,,,,,,,0.0,,,,,,,春
2014-03-24,2,86.0,0.0,,,,,,,0.0,,,,,,,春
2014-03-25,3,86.0,0.0,,,,,,,0.0,,,,,,,春
2014-03-26,4,86.0,0.0,,,,,,,0.0,,,,,,,春
2014-03-27,5,103.0,8.0,6.0,,,,,,6.0,5.0,,,,,,春
2014-03-28,6,112.0,3.0,2.0,,,,,,3.0,2.0,,,,,,春
2014-03-29,7,112.0,7.0,,,,,,,2.0,,,,,,,春
2014-03-31,9,122.0,8.0,2.0,,,,,,4.0,2.0,,,,,,春
2014-04-01,10,127.0,8.0,2.0,,,,,,5.0,2.0,,,,,,夏
df.to_csv('./out.csv', columns=['Deaths_Liberia'])
!head  -n 10 out.csv
YYYYMMDD,Deaths_Liberia
2014-03-22,0.0
2014-03-24,0.0
2014-03-25,0.0
2014-03-26,0.0
2014-03-27,6.0
2014-03-28,3.0
2014-03-29,2.0
2014-03-31,4.0
2014-04-01,5.0

You can disable the header and index.

df.to_csv('./out.csv', header=False, index=False)
!head  -n 10 out.csv
0,49.0,0.0,,,,,,,0.0,,,,,,,春
2,86.0,0.0,,,,,,,0.0,,,,,,,春
3,86.0,0.0,,,,,,,0.0,,,,,,,春
4,86.0,0.0,,,,,,,0.0,,,,,,,春
5,103.0,8.0,6.0,,,,,,6.0,5.0,,,,,,春
6,112.0,3.0,2.0,,,,,,3.0,2.0,,,,,,春
7,112.0,7.0,,,,,,,2.0,,,,,,,春
9,122.0,8.0,2.0,,,,,,4.0,2.0,,,,,,春
10,127.0,8.0,2.0,,,,,,5.0,2.0,,,,,,夏
13,143.0,18.0,2.0,,,,,,7.0,2.0,,,,,,夏

Other additions

Type conversion

Overwrite with a type. This is the notation for batch conversions. inplace is missing and I spent a bit of time on it, so I’m writing it down.

test = pd.DataFrame({'max':[1], 'min':[2], 'mean':[1.5]})

test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   max     1 non-null      int64
 1   min     1 non-null      int64
 2   mean    1 non-null      float64
dtypes: float64(1), int64(2)
memory usage: 152.0 bytes
test = test.astype({'max': float, 'min': float, 'mean': float})

test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   max     1 non-null      float64
 1   min     1 non-null      float64
 2   mean    1 non-null      float64
dtypes: float64(3)
memory usage: 152.0 bytes

Frequently used functions

As a final summary, here is a list of frequently used functions. It is like a personal snipet.

Change the index (to the name of an existing column)

df.set_index('xxxx')

Change column names

df.rename(columns={'before': 'after'}, inplace=True)
df.rename(index={'before': 'after'}, inplace=True)

Sort by a column

df.sort_values(by='xxx', ascending=True)

Sort by index

df.sort_index()

Type conversion for datetime types

df.to_datetime()

Number of NaN’s per column

df.isnull().sum()

References.