Past, Present, and Future of Multidimensional Scaling Patrick J. F. Groenen *Econometric Institute, Erasmus University Rotterdam, The Netherlands,
[email protected], http://people.few.eur.nl/groenen/
Summary:
1 What is MDS?
2 Some Historical Milestones
3 Present
4 Future
5 Summary of highlights in MDS
1 What is MDS? • Table of travel times by train between 10 French cities:
            Bordeaux   Brest   Lille   Lyon   Marseille   Nice   Paris   Strasbourg   Toulouse   Tours
Bordeaux        0
Brest         9h58       0
Lille         6h39     7h11      0
Lyon          8h05     7h11    4h52      0
Marseille     5h47     8h49    6h12    1h35       0
Nice          8h30    13h36    8h20    4h33     2h26       0
Paris         2h59     4h17    1h04    2h01     3h00     5h52      0
Strasbourg    8h08    10h16    6h54    4h36     7h04    11h15    4h01        0
Toulouse      2h02    13h52    9h42    4h25     3h26     6h29    5h14     10h56         0
Tours         2h36     5h38    4h17    4h21     5h13     9h04    1h13      6h03       6h06       0
[Figure: Geographic map of France.]
[Figure: MDS map of travel times by train.]
dissimilarity matrix ∆:

        O1      O2      O3     ...  On-1    On
O1       0
O2      δ12      0
O3      δ13     δ23      0
...     ...     ...     ...    ...
On-1  δ1,n-1  δ2,n-1  δ3,n-1   ...    0
On      δ1n     δ2n     δ3n    ...  δn-1,n   0

⇓ MDS

coordinates matrix X:

        dim 1   dim 2
O1       x11     x12
O2       x21     x22
O3       x31     x32
...      ...     ...
On-1   xn-1,1  xn-1,2
On       xn1     xn2

⇒ [Figure: MDS map with points O1, O2, O3, ..., On-1, On.]
• First sentence in Borg and Groenen (2005): Multidimensional scaling (MDS) is a method that represents measurements of similarity (or dissimilarity) among pairs of objects as distances between points of a low-dimensional space.
• Who uses MDS?
  – psychology,
  – sociology,
  – archaeology,
  – biology,
  – medicine,
  – chemistry,
  – network analysis,
  – economics, etc.
• Similarities and dissimilarities:
  – Large similarity is approximated by a small distance in MDS.
  – Large dissimilarity (δij) is approximated by a large distance in MDS.
  – General term: proximity.
2 Some Historical Milestones
• 1635: van Langren: Provides a distance matrix and a map.
[Figure: Map of Durham county (Newcastle, Durham). Cartographer: Jacob van Langren, 1635.]
• 1958: Torgerson: Provides a solution for classical MDS based on eigendecomposition.
• 1966: Gower:
Provides independently the same solution for classical MDS and gives connection to principal components analysis.
Classical MDS:
  minimize Strain(X) = ¼ ||J(∆⁽²⁾ − D⁽²⁾(X))J||²,
  with J the centering matrix, solved by the eigendecomposition of −½ J∆⁽²⁾J.
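The eigendecomposition route above can be sketched in a few lines of NumPy (a minimal illustration under the stated definitions, not the original software; `classical_mds` is a name chosen here):

```python
# Sketch of classical MDS (Torgerson, 1958; Gower, 1966), assuming a
# symmetric dissimilarity matrix `delta` with zero diagonal.
import numpy as np

def classical_mds(delta, ndim=2):
    """Minimize Strain via the eigendecomposition of -1/2 * J @ delta**2 @ J."""
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (delta ** 2) @ J          # double-centered squared dissimilarities
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:ndim]     # keep the ndim largest
    lam = np.clip(evals[idx], 0.0, None)     # guard against tiny negative eigenvalues
    return evecs[:, idx] * np.sqrt(lam)      # coordinates X

# Exact Euclidean input is recovered exactly: distances among 4 points on a line.
delta = np.abs(np.arange(4)[:, None] - np.arange(4)[None, :]).astype(float)
X = classical_mds(delta, ndim=2)
```

Because the input distances here are exactly Euclidean, the pairwise distances of `X` reproduce `delta` up to rotation and reflection.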
• 1962: Shepard:
Provides a heuristic for MDS.
• 1964: Kruskal: Establishes least-squares MDS. Provides a minimization algorithm. Proposes ordinal MDS plus optimization.

  Minimize Stress-I:
    σI(X, d̂) = sqrt[ Σ_{i<j} (d̂ij − dij(X))² / Σ_{i<j} dij²(X) ],
  with d̂ij a disparity satisfying a monotone relation with the proximities.
– Classic example: Rothkopf (1957) Morse code confusion data
  + Is there some systematic way in which people confuse Morse codes?
  + 36 Morse codes (26 letters, 10 digits).
  + Subjects' task: judge whether two Morse codes are the same or not. For example:
    Is -. (N) the same as .-. (R)? Yes (1), or no (2).
  + Stimulus pair presented in two orders: pair NR and pair RN.
  + Each subject judges many combinations of Morse codes. N = 598.
  + Morse code confusion table: proportion confused.
  + Data are similarities.
        .-    -...  -.-.  -..   ...  -----
         A     B     C     D    ...    0
A       92     4     6    13    ...    3
B        5    84    37    31    ...    4
C        4    38    87    17    ...   12
D        8    62    17    88    ...    6
...    ...   ...   ...   ...    ...  ...
0        9     3    11     2    ...   94
– Classic example: Rothkopf (1957) Morse code confusion data
[Figure: Two-dimensional MDS map of the 36 Morse code signals.]
• 1964: Guttman:
Facet theory and regional interpretation in MDS.
– In facet theory, extra information (external variables) is available on the objects according to the facet design by which the objects are generated:
+ Every object i belongs to a category on one or more facets.
+ See, e.g., Guttman (1959), Borg & Shye (1995), Borg & Groenen (1997, 1998).

Dissimilarity matrix ∆:                              Facet design:

        O1      O2      O3     ...  On-1    On              Facet:  1   2   3
O1       0                                                  O1      1   1   3
O2      δ12      0                                          O2      1   2   3
O3      δ13     δ23      0                                  O3      2   1   3
...     ...     ...     ...    ...                          ...     .   .   .
On-1  δ1,n-1  δ2,n-1  δ3,n-1   ...    0                     On-1    3   1   1
On      δ1n     δ2n     δ3n    ...  δn-1,n   0              On      3   2   1
– The extra facet information is used to partition the objects in the MDS space into regions.
– Facets are used for regional hypotheses about the empirical structure of the data.
[Figure: Three prototypical regional partitionings of the MDS space by a facet with categories a, b, c: axial, modular, and polar.]
– For the Morse code data, we have additional information available:
  1. Length of the signal (.05 to .95 seconds).
  2. Signal type (ratio of long versus short beeps).

  Letter  Morse code  Length  Signal type
  A       .-            25      1=2
  B       -...          45      1>2
  C       -.-.          55      1=2
  D       -..           35      1>2
  E       .             05      1
  F       ..-.          45      1>2
  G       --.           45      1<2
  H       ....          35      1
  I       ..            15      1
  J       .---          65      1<2
  K       -.-           45      1<2
  L       .-..          45      1>2
  M       --            35      2
  N       -.            25      1=2
  O       ---           55      2
  P       .--.          55      1=2
  Q       --.-          65      1<2
  R       .-.           35      1>2
  S       ...           25      1
[Figure: Regionally constrained MDS solution of the Morse code data, partitioned by signal length (05, 25, 55, 85, 95) and signal type (1, 1>2, 1=2, 2>1, 2).]
• 1969: Horan: Dimension weighting models in 3-way MDS.
• 1970: Carroll and Chang: Introduction of INDSCAL and IDIOSCAL.
– 3-way MDS: more than one dissimilarity matrix.
– In the weighted Euclidean model, for each source k the common space G may be stretched or shrunk along the axes.
– Model: δijk ≈ dij(GSk), with
  + G a single common space, and
  + Sk a diagonal matrix of dimension weights.
– INDSCAL uses the STRAIN loss.
[Figure: Weighted Euclidean model. The common space is stretched or shrunk per source: ∆1 with s11 = 1.5, s12 = .5; ∆2 with s21 = .8, s22 = 1.5; ∆3 with s31 = 1, s32 = .3.]
– In the generalized Euclidean model, the common space G may be rotated, then stretched or shrunk along the (rotated) axes.
– Model: δijk ≈ dij(GSk), with
  + G a single common space, and
  + Sk any matrix of dimension weights.
– IDIOSCAL uses the STRAIN loss.
[Figure: Generalized Euclidean model. The common space is rotated, then stretched or shrunk per source: ∆1 with α1 = 30°, s11 = 1.2, s12 = .5; ∆2 with α2 = 45°, s21 = .8, s22 = .5; ∆3 with α3 = −10°, s31 = 1, s32 = .3.]
2.1 Milestones in MDS algorithms
• 1958, 1966: Torgerson & Gower: Solutions for classical MDS.
• 1964: Kruskal: Introduction of the Stress-I loss function plus minimization and ordinal MDS.
• 1977: De Leeuw: Introduction of the SMACOF (Scaling by MAjorizing a COmplicated Function) algorithm for MDS.
• 1980: De Leeuw & Heiser: SMACOF extended to a comprehensive MDS algorithm allowing transformations of the dissimilarities, constraints on the configuration, and three-way dimension weighting extensions.
• 1988: De Leeuw: Convergence results derived for the SMACOF algorithm.
• 1995: Groenen, Mathar, Heiser: Extension of SMACOF to city-block distances.
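The core of the SMACOF algorithm is the Guttman transform. A bare-bones sketch for raw Stress with all weights wij = 1 (function names are mine; this is a toy illustration, not the SMACOF software):

```python
# Minimal sketch of the SMACOF majorization step (De Leeuw, 1977) with unit
# weights: each iteration applies the Guttman transform X_new = (1/n) B(X) X,
# which never increases raw Stress.
import numpy as np

def pairwise(X):
    """Euclidean distance matrix of the rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

def guttman_transform(X, delta):
    n = len(X)
    d = pairwise(X)
    ratio = np.where(d > 0, delta / np.where(d > 0, d, 1.0), 0.0)
    B = -ratio                                # off-diagonal: -delta_ij / d_ij
    np.fill_diagonal(B, ratio.sum(axis=1))    # diagonal: row sums of the ratios
    return B @ X / n

def smacof(delta, ndim=2, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((len(delta), ndim))
    for _ in range(iters):
        X = guttman_transform(X, delta)
    return X
```

Raw Stress decreases monotonically over these iterations, which is the convergence property De Leeuw (1988) analyzes.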
• Formalizing MDS by minimizing raw Stress over X:
    σr(X, d̂) = Σ_{i<j} wij (d̂ij − dij(X))²

• 1994: Buja: Constant dissimilarities
  – Suppose ∆ = c (11ᵀ − I), i.e., every off-diagonal dissimilarity equals a constant c > 0.
  – What configuration does MDS yield with constant data?
  – Buja, Logan, Reeds, & Shepp (1994) proved:
    + 1-dimensional: points equally spaced on a line;
    + 2-dimensional: points on concentric circles;
    + 3-dimensional or higher: points on a sphere.
• 1986-1998: Meulman: Integration of (nonlinear) multivariate analysis and MDS.
• 1978, 1995-: Various authors: local minima in MDS:
  1. unidimensional scaling (Defays, De Leeuw, Pliner, Hubert, Arabie, Vera)
– When do local minima occur?
  + Unidimensional scaling (De Leeuw & Heiser, 1977; Defays, 1978; Hubert & Arabie, 1986; Pliner, 1996).
  + City-block MDS (Hubert, Arabie & Hesson-McInnes, 1992).
  + Depends on the data and its error structure: with increasing dimensionality, fewer local minima occur.
– What can you do about local minima?
  + Multiple random starts.
  + Tunneling (Groenen & Heiser, 1996).
  + Distance smoothing: unidimensional scaling (Pliner, 1996); city-block and general MDS (Groenen et al., 1999).
  + Meta-heuristics:
    - simulated annealing: De Soete, Hubert, Arabie (1988), Brusco (2001), Vera & Heiser (2005, 2007);
    - genetic algorithms, ...
• 1998: Buja: Applying weights in Stress to mimic loss functions.
  + Choose wij = δij⁻². Then raw Stress becomes
      σr(X) = Σ_{i<j} wij (δij − dij(X))² = Σ_{i<j} δij⁻² (δij − dij(X))² = Σ_{i<j} (1 − dij(X)/δij)².
  + With these weights, Stress fits the ratio of distances to dissimilarities:
      large δij = 10, dij(X) = 5:  eij = 1 − 5/10 = .5;
      small δij = 2,  dij(X) = 1:  eij = 1 − 1/2 = .5.
  + This attaches equal importance to small and large relative errors.
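A quick numerical check of this weighting idea (toy numbers taken from the two example pairs above):

```python
# With w_ij = delta_ij**-2, each raw-Stress term equals the squared relative
# error (1 - d_ij/delta_ij)**2, so both example pairs contribute equally.
import numpy as np

delta = np.array([10.0, 2.0])        # one large, one small dissimilarity
d = np.array([5.0, 1.0])             # fitted distances, both off by 50%

w = delta ** -2
weighted_terms = w * (delta - d) ** 2
relative_terms = (1 - d / delta) ** 2
# both give [0.25, 0.25]
```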
4 Future
• 1999-: Heiser, Meulman, Busing: PROXSCAL (i.e., SMACOF) in SPSS (PASW).
• 2009: De Leeuw & Mair: smacof package in R.
• 2000: Tenenbaum et al.: ISOMAP heuristic for large-scale MDS.
• 2005-: Groenen, Trosset, Kagie: Large-scale MDS through Stress.
  – Problems:
    + Computationally too demanding.
    + Storage is a problem (n²).
    + Uninformative solutions.
[Figure: CPU seconds (log scale, .01 to 1000) for MDS as a function of the number of objects n (10 to 10,000).]
– Solution (Groenen, Trosset, Kagie):
  + Use only a fraction of the data.
  + Make use of smart designs.
  + Use the sparseness of the data efficiently to obtain a fast majorization algorithm.
– Comparison of large-scale majorization versus SMACOF:
  + n = 1,000; proportion nonmissing: .05 (Nnonmis = 23,000 out of 499,500).
  + n = 10,000; proportion nonmissing: .005 (Nnonmis = 250,000 out of 49,995,000).
[Figure: Stress versus CPU seconds for large-scale majorization and SMACOF. Left: n = 1,000 (0 to 2 CPU seconds); right: n = 10,000 (0 to 200 CPU seconds).]
– Avoiding uninformative solutions:
  + Edinburgh Associative Thesaurus (EAT) data set (1968, 1971): words associated with a stimulus word.
  + cij contains the number of associations between words i and j.
  + n = 23,219 terms.
  + 325,060 nonzero association counts between terms (sparseness = 0.1%).
  + Solution when choosing δij = 1/cij:
[Figure: MDS solution of the EAT data with δij = 1/cij.]
+ Solution when choosing the gravity model
    δij = oi·oj / cij  and  wij = δij⁵,
  with oi the total number of occurrences of term i:
[Figure: MDS solution of the EAT data with gravity-model dissimilarities.]
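Turning sparse association counts into gravity-model dissimilarities can be sketched as follows (the counts `c` and occurrence totals `o` are made-up toy values, not the EAT data; only observed pairs get a dissimilarity, all other pairs are treated as missing):

```python
# Sketch of the gravity-model transformation delta_ij = o_i * o_j / c_ij
# on a toy 3-term association-count matrix.
import numpy as np

c = np.array([[0.0, 4.0, 1.0],
              [4.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])            # toy association counts c_ij
o = np.array([10.0, 8.0, 5.0])             # total occurrences o_i of each term

i, j = np.nonzero(np.triu(c, k=1))         # only observed (nonzero) pairs i < j
delta = o[i] * o[j] / c[i, j]              # gravity-model dissimilarities
# delta = [20., 50., 20.] for the pairs (0,1), (0,2), (1,2)
```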
• 2002-: Buja, Cook, Swayne: Dynamic MDS visualization in the GGVis software.
• 2003: Groenen: Dynamic MDS visualization through iMDS.
• 2002: Denœux, Masson, Groenen, Winsberg, Diday: Symbolic MDS of intervals.
• 2006: Groenen, Winsberg: Symbolic MDS of histograms.
• 2002: Denœux, Masson, Groenen, Winsberg, Diday: Symbolic MDS of interval data.
  – Instead of δij, an interval is given: δij ∈ [δij(L), δij(U)].
  – Find an MDS solution whose coordinates xis are also intervals (rectangles R).
  – Minimize I-Stress:
      σI²(X, R) = Σ_{i<j} wij (δij(L) − dij(L)(X, R))² + Σ_{i<j} wij (δij(U) − dij(U)(X, R))²,
    with
      dij(L)(X, R) the smallest distance between rectangles i and j, and
      dij(U)(X, R) the largest distance between rectangles i and j.
[Figure: Interval MDS solution with 10 rectangles; for rectangles 2 and 8, d28(L) is the smallest and d28(U) the largest distance between them.]
• 2006: Groenen, Winsberg: Symbolic MDS of histograms.
  – Instead of δij, its empirical distribution is given through percentiles, e.g., α = [.20, .30, .40]:

      k    α    Lower-bound percentile → δijk(L)    Upper-bound percentile → δijk(U)
      1   .20   20 → δij1(L)                        80 → δij1(U)
      2   .30   30 → δij2(L)                        70 → δij2(U)
      3   .40   40 → δij3(L)                        60 → δij3(U)

  – Minimize:
      σ²HistI(X, R1, …, RK) = Σk Σ_{i<j} wij (δijk(L) − dij(L)(X, Rk))² + Σk Σ_{i<j} wij (δijk(U) − dij(U)(X, Rk))²,
    subject to 0 ≤ ris1 ≤ ris2 ≤ … ≤ risK.
• 2010: Groenen:
Dynamic MDS of Dutch political parties
• The political party comparison website for the Dutch parliament elections of 2010 (www.stemwijzer.nl) asks users to rate 30 political statements, e.g.:
  1. The government needs to cut the budget by billions. The budget deficit should disappear in 2015 at the latest.
     Agree / Don't know / Disagree
  2. Those with high incomes should pay more taxes.
     Agree / Don't know / Disagree
– 11 political parties also rated these 30 items.
– What is the political landscape in the Dutch elections of 2010?
– Do iMDS on the distances between the 11 parties in the 30-dimensional space of statements.
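The MDS input for this example can be sketched numerically; the answer coding below and the randomly generated ratings are stand-ins for the real party answers, which are not reproduced here:

```python
# Sketch of building the dissimilarity input for iMDS on the party data:
# code each answer numerically and take Euclidean distances between the
# parties' 30-dimensional rating vectors.
import numpy as np

coding = {"agree": 1.0, "don't know": 0.0, "disagree": -1.0}   # assumed coding
rng = np.random.default_rng(1)
ratings = rng.choice([-1.0, 0.0, 1.0], size=(11, 30))          # 11 parties x 30 statements (toy data)

diff = ratings[:, None, :] - ratings[None, :, :]
delta = np.sqrt((diff ** 2).sum(-1))       # 11 x 11 distance matrix between parties
# delta is symmetric with a zero diagonal: the dissimilarity input for MDS.
```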
5 Summary of highlights in MDS

Past
Year         Main author(s)                             Topic
1958, 1966   Torgerson, Gower                           Classical MDS
1964         Kruskal                                    Least-squares MDS through Stress with transformations
1964         Guttman                                    Facet theory and regional interpretations in MDS
1969, 1970   Horan, Carroll                             Three-way MDS models (INDSCAL, IDIOSCAL)
1977-        De Leeuw and others                        The majorization algorithm for MDS

Present
1986-1998    Meulman                                    Distance-based MVA through MDS
1994         Buja                                       Constant dissimilarities
1978, 1995-  Various                                    Local minimum problem
1998         Buja                                       Smart use of weights in MDS

Future
1999-        Heiser, Meulman, Busing                    Modern MDS software: PROXSCAL in SPSS (PASW)
2000         Tenenbaum et al.                           Large-scale MDS: ISOMAP heuristic
2002         Buja, Swayne, Cook                         Dynamic MDS in GGVis (part of GGobi)
2003         Groenen                                    Dynamic MDS visualization through iMDS
2005-        Groenen, Trosset, Kagie                    Large-scale MDS through Stress
2002         Denœux, Masson, Groenen, Winsberg, Diday   Symbolic MDS of interval dissimilarities
2006         Groenen, Winsberg                          Symbolic MDS of histograms
2009         De Leeuw, Mair                             smacof package in R