Past, Present, and Future of Multidimensional Scaling Patrick J. F. Groenen *Econometric Institute, Erasmus University Rotterdam, The Netherlands, [email protected], http://people.few.eur.nl/groenen/

Summary:
1 What is MDS?
2 Some Historical Milestones
3 Present
4 Future
5 Summary of highlights in MDS

1 What is MDS?
• Table of travel times by train between 10 French cities:

             Bordeaux  Brest  Lille  Lyon  Marseille   Nice  Paris  Strasbourg  Toulouse  Tours
  Bordeaux       0
  Brest        9h58      0
  Lille        6h39    7h11     0
  Lyon         8h05    7h11   4h52     0
  Marseille    5h47    8h49   6h12   1h35      0
  Nice         8h30   13h36   8h20   4h33    2h26       0
  Paris        2h59    4h17   1h04   2h01    3h00     5h52     0
  Strasbourg   8h08   10h16   6h54   4h36    7h04    11h15   4h01       0
  Toulouse     2h02   13h52   9h42   4h25    3h26     6h29   5h14    10h56        0
  Tours        2h36    5h38   4h17   4h21    5h13     9h04   1h13     6h03      6h06      0

[Figure: Geographic map of France (left) and MDS map of travel times by train (right), both showing Bordeaux, Brest, Lille, Lyon, Marseille, Nice, Paris, Strasbourg, Toulouse, and Tours.]

dissimilarity matrix ∆:

           O1       O2       O3      …   On-1     On
  O1        0
  O2       δ12       0
  O3       δ13      δ23       0
  ⋮         ⋮        ⋮        ⋮      ⋱
  On-1   δ1,n-1   δ2,n-1   δ3,n-1    …     0
  On       δ1n      δ2n      δ3n     …  δn-1,n     0

⇓

coordinates matrix X:

          dim 1    dim 2
  O1       x11      x12
  O2       x21      x22
  O3       x31      x32
  ⋮         ⋮        ⋮
  On-1   xn-1,1   xn-1,2
  On       xn1      xn2

[Figure: the resulting MDS configuration of points O1, O2, O3, …, On-1, On in two dimensions.]

• First sentence in Borg and Groenen (2005): Multidimensional scaling (MDS) is a method that represents measurements of similarity (or dissimilarity) among pairs of objects as distances between points of a low-dimensional space.
• Who uses MDS?
  – psychology,
  – sociology,
  – archaeology,
  – biology,
  – medicine,
  – chemistry,
  – network analysis,
  – economics, etc.
• Similarities and dissimilarities:
  – Large similarity approximated by small distance in MDS.
  – Large dissimilarity (δij) approximated by large distance in MDS.
  – General term: proximity.

2 Some Historical Milestones
• 1635: van Langren: Provides a distance matrix and a map.

[Figure: map of Durham county, showing Newcastle and Durham. Cartographer: Jacob van Langren, 1635.]

• 1958: Torgerson: Provides a solution for classical MDS based on eigendecomposition.
• 1966: Gower: Provides independently the same solution for classical MDS and gives the connection to principal components analysis.

Classical MDS:

  minimize Strain(X) = ¼ ‖J(∆⁽²⁾ − D⁽²⁾(X))J‖²

with J the centering matrix, solved by the eigendecomposition of −½ J∆⁽²⁾J.

• 1962: Shepard: Provides a heuristic for MDS.

• 1964: Kruskal: Establishes least-squares MDS. Provides a minimization algorithm. Proposes ordinal MDS plus optimization.

Minimize Stress-I:

  σI(X, d̂) = √( Σi<j (d̂ij − dij(X))² / Σi<j dij²(X) )

with d̂ij a disparity satisfying a monotone relation with the proximities.

– Classic example: Rothkopf (1957) Morse code confusion data
  + Is there some systematic way in which people confuse Morse codes?
  + 36 Morse codes (26 letters, 10 digits).
  + Subjects' task: judge whether two Morse codes are the same or not. For example:
    Is -. (N) the same as .-. (R)? Yes (1), or no (2).
  + Stimulus pair presented in two orders: pair NR and RN.
  + Each subject judges many combinations of Morse codes.
  + N = 598.
  + Morse code confusion table: proportion confused.
  + Data are similarities.

               A    B    C    D   ⋯    0
    .-     A  92    4    6   13   ⋯    3
    -...   B   5   84   37   31   ⋯    4
    -.-.   C   4   38   87   17   ⋯   12
    -..    D   8   62   17   88   ⋯    6
    ⋮          ⋮    ⋮    ⋮    ⋮         ⋮
    -----  0   9    3   11    2   ⋯   94

– Classic example: Rothkopf (1957) Morse code confusion data

[Figure: two-dimensional MDS configuration of the 36 Morse code signals, each point labeled with its dot-dash pattern.]

• 1964: Guttman: Facet theory and regional interpretation in MDS.
  – In facet theory, extra information (external variables) is available on the objects according to the facet design by which the objects are generated:

  + Every object i belongs to a category on one or more facets.
  + See, e.g., Guttman (1959), Borg & Shye (1995), Borg & Groenen (1997, 1998).

Dissimilarity matrix ∆:

           O1       O2       O3      …   On-1     On
  O1        0
  O2       δ12       0
  O3       δ13      δ23       0
  ⋮         ⋮        ⋮        ⋮      ⋱
  On-1   δ1,n-1   δ2,n-1   δ3,n-1    …     0
  On       δ1n      δ2n      δ3n     …  δn-1,n     0

Facet design:

         Facet 1   Facet 2   Facet 3
  O1        1         1         3
  O2        1         2         3
  O3        2         1         3
  ⋮         ⋮         ⋮         ⋮
  On-1      3         1         1
  On        3         2         1

– The extra facet information is used to partition the objects in the MDS space into regions.
– Facets are used for regional hypotheses about the empirical structure of the data.

[Figure: three prototypical regional partitionings of an MDS space by a facet with categories a, b, c: axial (parallel stripes), modular (concentric bands), and polar (sectors around a center).]

– For the Morse code data, we have additional information available:
  1. Length of the signal (.05 to .95 seconds).
  2. Signal type (ratio of long versus short beeps).

  Letter   Morse code   Length   Signal type
  A        .-             25       1=2
  B        -...           45       1>2
  C        -.-.           55       1=2
  D        -..            35       1>2
  E        .              05       1
  F        ..-.           45       1>2
  G        --.            45       2>1
  H        ....           35       1
  I        ..             15       1
  J        .---           65       2>1
  K        -.-            45       2>1
  L        .-..           45       1>2
  M        --             35       2
  N        -.             25       1=2
  O        ---            55       2
  P        .--.           55       1=2
  Q        --.-           65       2>1
  R        .-.            35       1>2
  S        ...            25       1

[Figure: regionally constrained MDS solution of the Morse code data, partitioned by signal length (05, 25, 55, 85, 95) and signal type (1, 1>2, 1=2, 2>1, 2).]

• 1969: Horan: Dimension weighting models in 3-way MDS.
• 1970: Carroll and Chang: Introduction of INDSCAL and IDIOSCAL.

• 1969: Horan: Dimension weighting models in 3-way MDS.
  1970: Carroll and Chang: Introduction of INDSCAL and IDIOSCAL.
  – 3-way MDS: more than one dissimilarity matrix.
  – In the weighted Euclidean model, for each source the common space G may be stretched or shrunk along the axes.
  – Model: δijk ≈ dij(GSk) with
    + G a single common space and
    + Sk a diagonal matrix of dimension weights.
  – INDSCAL uses the STRAIN loss.

[Figure: a common space G and three sources ∆1, ∆2, ∆3 obtained by stretching or shrinking its axes with weights s11 = 1.5, s12 = .5; s21 = .8, s22 = 1.5; and s31 = 1, s32 = .3.]
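The weighted Euclidean model is easy to sketch: each source's distances come from the common space after a per-dimension stretch. A small NumPy illustration using the weights shown for sources 1 and 2 (the helper name and the square common space are mine):

```python
import numpy as np

def weighted_euclidean_distances(G, s_k):
    """Distances d_ij(G S_k): stretch the common space G by the diagonal
    dimension weights s_k of source k, then take Euclidean distances."""
    Xk = G * s_k                                  # G S_k with S_k = diag(s_k)
    diff = Xk[:, None, :] - Xk[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

G = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # common space
d1 = weighted_euclidean_distances(G, np.array([1.5, 0.5]))  # source 1: s11=1.5, s12=.5
d2 = weighted_euclidean_distances(G, np.array([0.8, 1.5]))  # source 2: s21=.8, s22=1.5
# Source 1 stretches dimension 1: the (O1, O2) distance becomes 1.5 instead of 1.0
assert np.isclose(d1[0, 1], 1.5)
assert np.isclose(d2[0, 1], 0.8)
```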

  – In the generalized Euclidean model, the common space G may be rotated, then stretched or shrunk along the (rotated) axes.
  – Model: δijk ≈ dij(GSk) with
    + G a single common space and
    + Sk any matrix of dimension weights.
  – IDIOSCAL uses the STRAIN loss.

[Figure: a common space G and three sources ∆1, ∆2, ∆3 obtained by rotating it (α1 = 30°, α2 = 45°, α3 = −10°) and then stretching or shrinking the rotated axes with weights s11 = 1.2, s12 = .5; s21 = .8, s22 = .5; and s31 = 1, s32 = .3.]


2.1 Milestones in MDS algorithms
• 1958, 1966: Torgerson & Gower: Solutions for classical MDS.
• 1964: Kruskal: Introduction of the Stress-I loss function plus minimization and ordinal MDS.
• 1977: De Leeuw: Introduction of the SMACOF (Scaling by MAjorizing a COmplicated Function) algorithm for MDS.
• 1980: De Leeuw & Heiser: SMACOF extended to a comprehensive MDS algorithm allowing transformations of the dissimilarities, constraints on the configuration, and three-way dimension weighting extensions.
• 1988: De Leeuw: Convergence results derived for the SMACOF algorithm.
• 1995: Groenen, Mathar, Heiser: Extension of SMACOF to city-block distances.

• Formalizing MDS by minimizing raw Stress over X:

  σr(X, d̂) = Σi<j wij (d̂ij − dij(X))²

• Constant dissimilarities:
  – Suppose, with c > 0,

            0 1 1 1
    ∆ = c · 1 0 1 1
            1 1 0 1
            1 1 1 0

  – What configuration does MDS yield with constant data?
  – Buja, Logan, Reeds, & Shepp (1994) proved:
    + 1-dimensional: points equally spaced on a line.
    + 2-dimensional: points on concentric circles.
    + 3-dimensional or higher: points on a sphere.
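The SMACOF algorithm from the milestones above minimizes this raw Stress by repeated Guttman transforms, each of which never increases Stress. A minimal sketch for unit weights (function names and test data are mine; the full SMACOF also handles general weights, transformations, and constraints):

```python
import numpy as np

def raw_stress(delta, X):
    """Raw Stress sum_{i<j} (delta_ij - d_ij(X))^2 with unit weights."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.triu_indices(len(X), k=1)
    return np.sum((delta[i, j] - D[i, j]) ** 2)

def smacof(delta, ndim=2, n_iter=100, seed=0):
    """Minimize raw Stress by the majorization update X <- B(X) X / n
    (the Guttman transform for unit weights)."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    X = rng.standard_normal((n, ndim))
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        B = np.zeros_like(D)
        nz = D > 0
        B[nz] = -delta[nz] / D[nz]           # off-diagonal: b_ij = -delta_ij / d_ij(X)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))  # diagonal: b_ii = -sum_{j != i} b_ij
        X = B @ X / n                        # Guttman transform
    return X

# Distances among four points in the plane: Stress decreases monotonically
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
delta = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X0 = smacof(delta, n_iter=1)
X1 = smacof(delta, n_iter=100)
assert raw_stress(delta, X1) <= raw_stress(delta, X0)
```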

3 Present
• 1986-1998: Meulman: Integration of (nonlinear) multivariate analysis and MDS.
• 1994: Buja: Constant dissimilarities.
• 1978, 1995-: Various authors: local minima in MDS:
  1. unidimensional scaling (Defays, De Leeuw, Pliner, Hubert, Arabie, Vera)

– When do local minima occur?
  + Unidimensional scaling (De Leeuw & Heiser, 1977; Defays, 1978; Hubert and Arabie, 1986; Pliner, 1996).
  + City-block MDS (Hubert, Arabie & Hesson-McInnes, 1992).
  + Depends on the data and its error structure: with increasing dimensionality ⇒ fewer local minima.

– What can you do about local minima?
  + Multiple random starts.
  + Tunneling (Groenen & Heiser, 1996).
  + Distance smoothing: unidimensional scaling (Pliner, 1996); city-block MDS and general MDS (Groenen et al., 1999).
  + Meta-heuristics:
    – simulated annealing: De Soete, Hubert, Arabie (1988), Brusco (2001), Vera & Heiser (2005, 2007)
    – genetic algorithms, …


• 1998: Buja: Applying weights in Stress to mimic loss functions.
  + Choose wij = δij⁻². Then raw Stress becomes:

    σr(X) = Σi<j wij (δij − dij(X))² = Σi<j δij⁻² (δij − dij(X))² = Σi<j (1 − dij(X)/δij)²

  + These weights make Stress fit the ratio of distances to dissimilarities:
    large δij = 10, dij(X) = 5:  eij = 1 − dij(X)/δij = 1 − 5/10 = .5
    small δij = 2,  dij(X) = 1:  eij = 1 − dij(X)/δij = 1 − 1/2 = .5
  + This is a very good idea: it attaches equal importance to small and large relative errors.
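The equality between the δij⁻²-weighted squared error and the squared relative error can be checked directly; this snippet reproduces the slide's two example pairs:

```python
import numpy as np

delta = np.array([10.0, 2.0])   # one large and one small dissimilarity
d = np.array([5.0, 1.0])        # fitted distances, each off by 50%

weighted = delta ** -2 * (delta - d) ** 2    # w_ij = delta_ij^-2 times squared error
relative = (1.0 - d / delta) ** 2            # squared relative error (1 - d_ij/delta_ij)^2
assert np.allclose(weighted, relative)
assert np.allclose(weighted, 0.25)           # e_ij = .5 in both cases, squared: .25
```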

[Figure: applying weights in Stress to mimic loss functions (after Buja, 1998).]

4 Future
• 1999-: Heiser, Meulman, Busing: PROXSCAL (i.e., SMACOF) in SPSS (PASW).
• 2009: De Leeuw & Mair: smacof package in R.

• 2000: Tenenbaum, et al.: ISOMAP heuristic for large-scale MDS.
• 2005-: Groenen, Trosset, Kagie: Large-scale MDS through Stress.
  – Problems:
    + Computationally too demanding.
    + Storage is a problem (n²).
    + Uninformative solutions.

[Figure: CPU seconds of MDS (log scale, .01 to 1000) against the number of objects n (10 to 10,000).]

– Solution of Groenen, Trosset, Kagie:
  + Use only a fraction of the data.
  + Make use of smart designs.
  + Use the sparseness of the data efficiently to obtain a fast majorization algorithm.
– Comparison of large-scale majorization versus SMACOF:
  – n = 1,000; proportion nonmissing: .05 (Nnonmis = 23,000 out of 499,500).
  – n = 10,000; proportion nonmissing: .005 (Nnonmis = 250,000 out of 49,995,000).

[Figure: Stress (.05 to .45) against CPU seconds for large-scale majorization versus SMACOF; left panel: n = 1,000 (0 to 2 seconds), right panel: n = 10,000 (0 to 200 seconds).]

– Avoiding uninformative solutions:
  + Edinburgh Associative Thesaurus (EAT) data set (1968, 1971): words associated with a stimulus word.
  + cij contains the number of associations between words i and j.
  + n = 23,219 terms.
  + 325,060 nonzero association counts between terms (sparseness = 0.1%).
  + Solution when choosing δij = 1/cij.
  + Solution when choosing the gravity model

      δij = oi oj / cij  and  wij = δij⁵

    with oi the total number of occurrences of term i.

[Figure: MDS solutions of the EAT data for the two choices of δij and wij.]

• 2002-: Buja, Cook, Swayne: Dynamic MDS visualization in the GGVis software.
• 2003: Groenen: Dynamic MDS visualization through iMDS.

• 2002: Denœux, Masson, Groenen, Winsberg, Diday: Symbolic MDS of intervals.
• 2006: Groenen, Winsberg: Symbolic MDS of histograms.

• 2002: Denœux, Masson, Groenen, Winsberg, Diday: Symbolic MDS of interval data.
  – Instead of δij, its interval is given: δij ∈ [δij⁽L⁾, δij⁽U⁾].
  – Find an MDS solution with the coordinates xis also in an interval.
  – Minimize I-Stress:

    σI²(X, R) = Σi<j wij (δij⁽L⁾ − dij⁽L⁾(X, R))² + Σi<j wij (δij⁽U⁾ − dij⁽U⁾(X, R))²

    with
      dij⁽L⁾(X, R) the smallest distance between rectangles i and j,
      dij⁽U⁾(X, R) the largest distance between rectangles i and j.

[Figure: interval MDS solution with 10 rectangles; the smallest distance d28⁽L⁾ and the largest distance d28⁽U⁾ between rectangles 2 and 8 are indicated.]
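The smallest and largest distances between two axis-aligned rectangles, used as dij⁽L⁾ and dij⁽U⁾ in I-Stress, have simple closed forms in terms of centers and half-sizes. A sketch (the function name and the two test rectangles are mine):

```python
import numpy as np

def rect_min_max_dist(xi, ri, xj, rj):
    """Smallest and largest Euclidean distance between two axis-aligned
    rectangles with centers xi, xj and per-dimension half-sizes ri, rj."""
    gap = np.abs(xi - xj)
    dmin = np.sqrt(np.sum(np.maximum(gap - ri - rj, 0.0) ** 2))  # d_ij^(L)
    dmax = np.sqrt(np.sum((gap + ri + rj) ** 2))                 # d_ij^(U)
    return dmin, dmax

# Two unit squares (half-sizes .5) whose centers are 3 apart on the first axis:
dmin, dmax = rect_min_max_dist(np.array([0.0, 0.0]), np.array([0.5, 0.5]),
                               np.array([3.0, 0.0]), np.array([0.5, 0.5]))
assert np.isclose(dmin, 2.0)             # 3 - .5 - .5
assert np.isclose(dmax, np.sqrt(17.0))   # sqrt((3 + 1)^2 + (0 + 1)^2)
```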

• 2006: Groenen, Winsberg: Symbolic MDS of histograms.
  – Instead of δij, its empirical distribution (percentiles) is given, e.g., for α = [.20, .30, .40]:

    k    α    Lower-bound percentile → δijk⁽L⁾    Upper-bound percentile → δijk⁽U⁾
    1   .20          20        δij1⁽L⁾                    80        δij1⁽U⁾
    2   .30          30        δij2⁽L⁾                    70        δij2⁽U⁾
    3   .40          40        δij3⁽L⁾                    60        δij3⁽U⁾

  – Minimize

    σ²HistI(X, R1, …, RK) = Σk Σi<j wij (δijk⁽L⁾ − dij⁽L⁾(X, Rk))² + Σk Σi<j wij (δijk⁽U⁾ − dij⁽U⁾(X, Rk))²

    subject to 0 ≤ ris1 ≤ ris2 ≤ … ≤ risK.

• 2010: Groenen: Dynamic MDS of Dutch political parties.

• The political party comparison website for the Dutch parliament elections of 2010 (www.stemwijzer.nl) asks users to rate 30 political statements, e.g.:
  1. The government needs to cut the budget by billions. The budget deficit should disappear in 2015 at the latest.
     Agree / Don't know / Disagree
  2. Those with a high income should pay more taxes.
     Agree / Don't know / Disagree
– 11 political parties also rated these 30 items.
– What is the political landscape in the Dutch elections of 2010?
– Do iMDS on the distances between the 11 parties in the 30-dimensional space.

5 Summary of highlights in MDS

Past
  1958, 1966    Torgerson, Gower       Classical MDS
  1964          Kruskal                Least-squares MDS through Stress with transformations
  1964          Guttman                Facet theory and regional interpretations in MDS
  1969, 1970    Horan, Carroll         Three-way MDS models (INDSCAL, IDIOSCAL)
  1977-         De Leeuw and others    The majorization algorithm for MDS

Present
  1986-1998     Meulman                Distance-based MVA through MDS
  1994          Buja                   Constant dissimilarities
  1978, 1995-   Various                Local minimum problem
  1998          Buja                   Smart use of weights in MDS

Future
  1999-   Heiser, Meulman, Busing                     Modern MDS software: PROXSCAL in SPSS (PASW)
  2000    Tenenbaum, et al.                           ISOMAP heuristic for large-scale MDS
  2002    Buja, Swayne, Cook                          Dynamic MDS in GGVis (part of GGobi)
  2003    Groenen                                     Dynamic MDS visualization through iMDS
  2005-   Groenen, Trosset, Kagie                     Large-scale MDS through Stress
  2002    Denœux, Masson, Groenen, Winsberg, Diday    Symbolic MDS of interval dissimilarities
  2006    Groenen, Winsberg                           Symbolic MDS of histograms
  2009    De Leeuw, Mair                              smacof package in R