Python Tips

While doing data analysis, I needed Venn diagrams to visualize my data, so I am writing down how to draw them before I forget. A Venn diagram shows how datasets relate to each other, for example how many elements they have in common, which makes it a handy tool in the exploratory data analysis (EDA) stage.

GitHub

  • The Jupyter notebook for this post is available on GitHub here.

Google Colaboratory

  • If you want to run it on Google Colaboratory, it is available here.

Author’s environment

sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G103
python -V
Python 3.8.5

Install matplotlib-venn

!pip install matplotlib-venn

Import venn2.

from matplotlib_venn import venn2

Create a Venn diagram using matplotlib-venn.

# Render figures inline in the notebook as SVG
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt

# Only needed for Japanese text in figures (pip install japanize-matplotlib)
import japanize_matplotlib

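Prepare two sample sets that partly overlap, then draw them.

# Group A holds 0..49 and Group B holds 40..89, so they share 40..49
g1 = set(range(0, 50))
g2 = set(range(40, 90))

plt.figure(figsize=(6, 4))
plt.title('Venn diagram for Group:A and Group:B')
# Pass the two sets directly; each region is labeled with its element count
venn2(subsets=[g1, g2], set_labels=('Group:A', 'Group:B'))
plt.show()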
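
The numbers drawn in the three regions can be checked with plain set operations. The following is a minimal sketch using the same two sets; the helper name venn2_sizes is my own and is not part of matplotlib-venn.

```python
def venn2_sizes(a, b):
    """Return region sizes in the order (only a, only b, both).

    Hypothetical helper for illustration, not a matplotlib-venn function.
    """
    a, b = set(a), set(b)
    return (len(a - b), len(b - a), len(a & b))


g1 = set(range(0, 50))   # 0..49
g2 = set(range(40, 90))  # 40..89

sizes = venn2_sizes(g1, g2)
print(sizes)  # (40, 40, 10)
```

As far as I know, venn2 also accepts such a tuple of three sizes as subsets in place of the sets themselves, which should help when only the counts are available.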

Venn diagrams for three sets

matplotlib-venn can also draw a Venn diagram for three datasets if you import venn3.

from matplotlib_venn import venn3

Prepare three sample sets that partly overlap, then draw them.

# Group A holds 0..49, Group B holds 30..59, and Group C holds 40..89
g1 = set(range(0, 50))
g2 = set(range(30, 60))
g3 = set(range(40, 90))

plt.figure(figsize=(6, 4))
plt.title('Venn diagram for Group:A, Group:B, Group:C')
# Pass the three sets directly; each region is labeled with its element count
venn3(subsets=[g1, g2, g3], set_labels=('Group:A', 'Group:B', 'Group:C'))
plt.show()
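
With three sets there are seven regions, and it is easy to misread which count belongs where. The sketch below computes each region with set operations for the same three sets; venn3_sizes is a name I made up for illustration and is not part of matplotlib-venn.

```python
def venn3_sizes(a, b, c):
    """Return the seven region sizes for three sets.

    Hypothetical helper for illustration, not a matplotlib-venn function.
    Order: only a, only b, a&b only, only c, a&c only, b&c only, a&b&c.
    """
    a, b, c = set(a), set(b), set(c)
    return (
        len(a - b - c),    # only a
        len(b - a - c),    # only b
        len((a & b) - c),  # a and b, but not c
        len(c - a - b),    # only c
        len((a & c) - b),  # a and c, but not b
        len((b & c) - a),  # b and c, but not a
        len(a & b & c),    # all three
    )


g1 = set(range(0, 50))   # 0..49
g2 = set(range(30, 60))  # 30..59
g3 = set(range(40, 90))  # 40..89

sizes = venn3_sizes(g1, g2, g3)
print(sizes)  # (30, 0, 10, 30, 0, 10, 10)
```

Two regions are empty here: every element of Group:B also belongs to Group:A or Group:C, and everything shared by Group:A and Group:C (40..49) is also in Group:B.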

This is very useful, so don’t forget it.