Instance Noise isn’t a very satisfying solution (it just adds noise to the inputs and declares that the supports now overlap)
a.k.a. EM distance or Wasserstein-1 distance
An alternative to f-divergences that is not a function of the density ratio
If you think of the probability distributions as mounds of dirt, the EM distance describes how much effort it takes to transform one mound into the other using an optimal transport plan
Accounts for both mass and distance
If the supports of the distributions don’t overlap, the EM distance will describe how far apart they are
For the simple example described earlier:
Note that we now have gradients that always point towards the optimal θ!
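A worked version (assuming the earlier example is the standard parallel-lines toy from the WGAN paper: $P_0$ uniform on the segment $\{0\}\times[0,1]$, $P_\theta$ uniform on $\{\theta\}\times[0,1]$):

$$W(P_0, P_\theta) = |\theta|, \qquad \frac{\partial}{\partial \theta}\, W(P_0, P_\theta) = \operatorname{sign}(\theta) \quad (\theta \neq 0)$$

The optimal plan just slides each point horizontally by $|\theta|$, so the gradient is $\pm 1$ everywhere and always points toward the optimum $\theta = 0$; by contrast, the JS divergence is the constant $\log 2$ whenever $\theta \neq 0$, giving zero gradient.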
EM distance is defined as $W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\lVert x - y \rVert\right]$
Notation: think of the infimum as a minimum
Considers all possible “configurations” of pairing up points from the two distributions (the joint distributions γ ∈ Π(P_r, P_g))
Calculates the mean distance of pairs in each configuration
Returns the smallest mean distance across all of the configurations (sketched in code below)
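As a minimal sketch of this definition (illustrative code, not from the source): for two equal-size, uniform-weight samples, the infimum over couplings is attained at a permutation, so we can brute-force every pairing directly.

```python
import itertools
import math

def em_distance_brute_force(xs, ys):
    """Brute-force EM distance between two equal-size, uniform-weight
    1-D samples: try every pairing ("configuration") and return the
    smallest mean pairwise distance."""
    assert len(xs) == len(ys)
    best = math.inf
    for paired_ys in itertools.permutations(ys):  # one "configuration"
        mean_dist = sum(abs(x - y) for x, y in zip(xs, paired_ys)) / len(xs)
        best = min(best, mean_dist)               # keep the smallest so far
    return best

# Non-overlapping supports: the EM distance reports how far apart they are
print(em_distance_brute_force([0.0, 1.0, 2.0], [3.0, 4.0, 5.0]))  # -> 3.0
```

For real (continuous or high-dimensional) distributions this enumeration is intractable; in 1-D, scipy.stats.wasserstein_distance computes the same quantity efficiently.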
A problem with existing GANs
What the Lipschitz?