The Geometry of Distributions II Information Distances - Computational ...

Distributions[APV10, KPV10]. Problem. Given n points on the simplex Ad, find a mapping to Ak,k ≪ d such that the “distance structure” of the points is preserved.
3MB Sizes 2 Downloads 198 Views
The Geometry of Distributions II Information Distances: Properties And Algorithms

Suresh Venkatasubramanian University of Utah

Lecture Plan

Information geometry

Algorithms for information distances

Spatially-aware information distances

Review: The Rogues’ Club Bregman divergence For convex φ : Rd → R Dφ (p, q) = φ(p) − φ(q) − h∇φ(q), p − qi f-divergence For convex f : R → R, f (1) = 0, Df (p, q) =

qi

∑ pi f ( pi ) i

α-divergence For |α| < 1, 4 Dα (p, q) = [1 − 1 − α2

Z

p(1−α)/2 q(1+łpha)/2 ]

Review: The Rogues’ Club Bregman divergence For convex φ : Rd → R Dφ (p, q) = φ(p) − φ(q) − h∇φ(q), p − qi

α-divergence For |α| < 1, 4 Dα (p, q) = [1 − 1 − α2

Z

p(1−α)/2 q(1+łpha)/2 ]

Bregman Divergences

Properties: Asymmetry

Dφ (p, q) = φ(p) − φ(q) − h∇φ(q), p − qi In general,

D(p,q)

p

q

Dφ (p, q) 6= Dφ (q, p) However, let

D(q,p)

φ∗ (u) = suphu, pi − φ(p) p

q* D*(q*, p*)

then Dφ (p, q) = Dφ∗ (q∗ , p∗ ) where x∗ = ∇φ(x)

p*

Properties II: Generalized Cosine Rule Generalized Cosine Rule: Dφ (p, q) + Dφ (q, r) = Dφ (p, r) + h∇φ(r) − ∇φ(q), p − qi Compare with standard cosine rule:

kp − qk2 kq − rk2 kp − rk2 + = + hr − q, p − qi 2 2 2 In general, Dφ (p, q) behaves like square of a distance (“energy”, not “distance”) q Dφ (p, q) is not a metric in general. Dφ (p, q) is not a metric.

Bregman bisectors are halfplanes

p

r

D( r,q

)

Dφ (r, p) = Dφ (r, q)

D(

r,p

)=

hr, ∇φ(q) − ∇φ(p)i = f (p, q)

Most combinatorial geometry carries over (Voronoi, Delaunay, arrangements, duality)[BNN10]

q

Euclidean Approximations

Dφ (p, q) = φ(p) − φ(q) − h∇φ(q), p − qi

D(p,q)

q

p

Euclidean Approximations

Dφ (p, q) = φ(p) − φ(q) − h∇φ(q), p − qi

D(p,q)

q

p

Dφ (p, q) = s> ∇2 φ(r)s, s = p − q, r ∈ [p, q]

Euclidean Approximations

Dφ (p, q) = φ(p) − φ(q) − h∇φ(q), p − qi

= s> ∇2 φ(r)s, s = p − q, r ∈ [p, q] As q → p, this becomes the Mahalanobis distance, for positive definite A: q kp − qkA = (p − q) > A(p − q)

Euclidean Approximations

R

∗ λmax = max λmax (∇2 φ(x)) x∈R

∗ λmin = min λmin (∇2 φ(x))

µ=

x∈R ∗ λmax ∗ λmin

µ-similarity A distance measure is µ-similar if d(a, b) + d(b, c) ≥

1 d(a, c) µ

Lemma For a given convex function φ, there exists a bounded region R(µ) in which Dφ is µ-similar. Proof follows by exploiting the connection to the Mahalanobis distance, and looking at the variation in eigenvalues of ∇2 φ. Euclidean clustering algorithm ⇒ µ-similar Bregman clustering algorithm. [ABS10, AB10, ABS11, CM08, MR09]

Dimensionality Reduction

Dimensionality Reduction For Distributions[APV10, KPV10]

Problem Given n points on the simplex ∆d , find a mapping to ∆k , k  d such that the “distance structure” of the points is preserved.

The Hellinger Distance

d2H (p, q) =

∑(



pi −



qi ) 2

i

Hellinger distance: Natural geometric interpretation Local approximation of Fisher information, and closely related to Jensen-Shannon and Bhattacharya distance.

Hellinger Is Chordal Distance on Sphere

√ √ √ (p1 , p2 , . . . pd ) → ( p1 , p2 , . . . , pd ) Hellinger distance on simplex ↔ Euclidean distance on sphere. Dimensionality reduction for simplex ↔ Dimensionality reduction on “sphere”<