This post was last modified around 3 years ago; some information may be outdated!
This is a draft; the content is incomplete and of poor quality!
What (general)?
- In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D.[ref]
- In statistics and computer science, it is usually called the "earth mover's distance".
- In mathematics, it is known as the "Wasserstein metric".
- The Wasserstein distance is the minimum cost of transporting mass to convert the data distribution q into the data distribution p.
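For small discrete distributions, the "minimum cost of transporting mass" can be written as a linear program. The following is only a minimal sketch under stated assumptions: both distributions live on the same points of a line, both sum to 1, and the cost of moving mass is the distance travelled. The function name `emd_linprog` and the sample data are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def emd_linprog(positions, q, p):
    """Minimum cost of moving mass q onto mass p along a line (illustrative helper)."""
    positions = np.asarray(positions, dtype=float)
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    n = len(positions)

    # cost[i, j] = distance between position i and position j
    cost = np.abs(positions[:, None] - positions[None, :])

    # Unknowns: flow[i, j] >= 0, flattened row by row (index i * n + j).
    a_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # mass leaving point i must equal q[i]
        a_eq[n + i, i::n] = 1.0           # mass arriving at point i must equal p[i]
    b_eq = np.concatenate([q, p])

    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None))
    return result.fun


# Assumed toy data: shift half of the mass one step to the right, twice.
positions = [0, 1, 2]
q = [0.5, 0.5, 0.0]
p = [0.0, 0.5, 0.5]
print(round(emd_linprog(positions, q, p), 6))  # about 1.0
```

The linear program grows quadratically with the number of points, so this is only meant to show the definition at work, not to be an efficient solver.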
What (math way)?
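The idea is borrowed from this. The first Wasserstein distance between the distributions $u$ and $v$ is:

$$
l_1(u, v) = \inf_{\pi \in \Gamma(u, v)} \int_{\mathbb{R} \times \mathbb{R}} |x - y| \, \mathrm{d}\pi(x, y)
$$

where $\Gamma(u, v)$ is the set of (probability) distributions on $\mathbb{R} \times \mathbb{R}$ whose marginals are $u$ and $v$ on the first and second factors respectively.

If $U$ and $V$ are the respective CDFs of $u$ and $v$, this distance also equals:

$$
l_1(u, v) = \int_{-\infty}^{+\infty} |U(x) - V(x)| \, \mathrm{d}x
$$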
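The CDF form of the distance (the integral of the absolute difference between the two CDFs) is easy to check numerically for samples. The sketch below assumes unweighted 1-D samples and uses empirical step CDFs; `w1_from_cdfs` is a made-up helper name, and the comparison with `scipy.stats.wasserstein_distance` is only a sanity check.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def w1_from_cdfs(u_values, v_values):
    """Integrate |U - V| over the real line using empirical CDFs (illustrative helper)."""
    u_sorted = np.sort(np.asarray(u_values, dtype=float))
    v_sorted = np.sort(np.asarray(v_values, dtype=float))

    # The empirical CDFs are step functions that only change at observed values.
    all_values = np.sort(np.concatenate([u_sorted, v_sorted]))
    widths = np.diff(all_values)

    # CDF value on each interval [all_values[k], all_values[k + 1])
    u_cdf = np.searchsorted(u_sorted, all_values[:-1], side="right") / len(u_sorted)
    v_cdf = np.searchsorted(v_sorted, all_values[:-1], side="right") / len(v_sorted)

    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))


sample_u = [1, 2, 3]
sample_v = [4, 5, 6]
print(w1_from_cdfs(sample_u, sample_v))
print(wasserstein_distance(sample_u, sample_v))
```

Both calls should print approximately the same number, because between two consecutive observed values the CDFs are constant and the integral reduces to a finite sum.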
Example of metric
Suppose we want to move the blocks on the left onto the dotted blocks on the right, and we want to find the "energy" (or metric) needed to do that.
Energy = weight of block × distance that block is moved, summed over all blocks.
Suppose that the weight of each block is 1. All figures below are copied from this.
There are 2 ways to do that:
2 ways of moving blocks from left to right.
The example above gives the same energy for both ways, but in general different ways give different energies, as in the example below.
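The energy rule can also be tried out in a few lines of code. The positions below are invented for illustration and are not taken from the figures; `plan_energy` is a hypothetical helper that sums weight × distance over all blocks of a plan.

```python
def plan_energy(moves, weight=1.0):
    """Total energy of a plan given as (start, end) positions, one pair per block."""
    return sum(weight * abs(end - start) for start, end in moves)


# Assumed setup 1: blocks at positions 1 and 2, empty slots at positions 3 and 4.
plan_a = [(1, 3), (2, 4)]  # each block moves 2 steps
plan_b = [(1, 4), (2, 3)]  # one block moves 3 steps, the other 1 step

# Assumed setup 2: blocks at positions 1 and 4, empty slots at positions 2 and 5.
plan_c = [(1, 2), (4, 5)]  # each block moves 1 step
plan_d = [(1, 5), (4, 2)]  # one block moves 4 steps, the other 2 steps

print(plan_energy(plan_a), plan_energy(plan_b))  # 4.0 4.0 -> same energy
print(plan_energy(plan_c), plan_energy(plan_d))  # 2.0 6.0 -> different energies
```

The earth mover's distance corresponds to the cheapest plan, so in the second setup it would be the energy of `plan_c`.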
Coding
from scipy.stats import wasserstein_distance
arr1 = [1,2,3,4,5,6]
arr2 = [1,2,3,4,5,6]
wasserstein_distance(arr1, arr2)
# 0.0 (the two samples are exactly the same)
arr1 = [1,2,3]
arr2 = [4,5,6]
wasserstein_distance(arr1, arr2)
# 3.0000000000000004
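# Plot both samples as step histograms.
# Note: distplot is deprecated since seaborn 0.11; histplot is its replacement.
import seaborn as sns
sns.distplot(arr1, kde=False, hist_kws={"histtype": "step", "linewidth": 3, "alpha": 1, "color": "b"})
sns.distplot(arr2, kde=False, hist_kws={"histtype": "step", "linewidth": 3, "alpha": 1, "color": "r"})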
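The calls above pass raw samples, where every value counts equally. `wasserstein_distance` also accepts the optional arguments `u_weights` and `v_weights`, which is handy when the data is already given as a histogram (positions plus counts). A small sketch with made-up counts:

```python
from scipy.stats import wasserstein_distance

# Assumed toy histograms: both live on the positions 0 and 1.
values = [0, 1]
u_counts = [3, 1]  # normalized internally to 0.75 and 0.25
v_counts = [2, 2]  # normalized internally to 0.5 and 0.5

distance = wasserstein_distance(values, values, u_weights=u_counts, v_weights=v_counts)
print(distance)  # 0.25
```

Here a quarter of the total mass has to move by one unit, which gives a distance of 0.25.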
References
- What is an intuitive explanation of the Wasserstein distance?
- GAN — Wasserstein GAN & WGAN-GP
- An example of why we need to use EMD instead of Kolmogorov–Smirnov distance (video).