Mining attributed static graphs

1 downloads 208 Views 12MB Size Report
Data mining queries: retrieve sets of data, called ... Constraint-based pattern mining framework. Topological ..... Mini
Céline ROBARDET

Mon, 29.09. Resource-aware Machine Learning

1 / 78

Different kinds of torrents of data .

Resource-aware Machine Learning

2 / 78

Potential increase of our knowledge .

Resource-aware Machine Learning

3 / 78

Viewed as attributed dynamic graphs .

Resource-aware Machine Learning

4 / 78

Mining network data .

Network data brings several questions: • Working with network data is messy • Not just “wiring diagrams” but also dynamics and data (features, attributes) on nodes and edges • Computational challenges • Large scale network data • Algorithmic models as vocabulary for expressing complex

scientific questions • Social science, physics, biology +

Understanding how network structure and node attribute values relate and affect each other.

Resource-aware Machine Learning

5 / 78

Database versus Inductive queries Database queries: retrieve transactions that match a search criteria

+

. Data mining queries: retrieve sets of data, called patterns, that match some criteria

the criteria is computed on individual data or on sets of data

Resource-aware Machine Learning

6 / 78

Constraint-based pattern mining .

Developed for extracting itemsets in transaction databases

Resource-aware Machine Learning

7 / 78

Can be used to analyze graphs . Graphs that are often • Dynamic: when nodes and edges appear/disappear through time • Attributed: when another relation describes the nodes or the

edges themselves Relational graph +

Nodes are uniquely identified through time

Resource-aware Machine Learning

8 / 78

Some inductive queries on graphs . • What are the vertex

attributes that strongly co-vary with the graph structure?

Co-authors that published at ICDE with a high degree and a low clustering coefficient Resource-aware Machine Learning

• What are the sub-graphs

whose vertex attributes evolve similarly?

Airports whose arrival delays increased over the three weeks following Katrina hurricane

9 / 78

Outline of the talk . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Trend dynamic sub-graphs Triggering patterns of topology changes Conclusions and perspectives

Resource-aware Machine Learning

10 / 78

.

Constraint-based pattern mining framework

Resource-aware Machine Learning

Constraint-based pattern mining framework

11 / 78

Pattern mining .

A Pattern φ describes a subgroup of the data D • observed several times • or associated with characteristic properties

The pattern shape is fixed: φ ∈ L +

whose cartinality is exponential or infinite in the size of the data

C evaluates the adequacy of the pattern to the data C(φ, D) → Boolean

Pattern mining task: Find all interesting subgroups Th(L, D, C) = {φ ∈ L | C(φ, D) is true } Resource-aware Machine Learning

Constraint-based pattern mining framework

12 / 78

Pattern constraints .

Constraints are needed for: • only retrieving patterns that describe an interesting subgroup of

the data • making the extraction feasible

Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. Ü

They are defined up to the partial order ⪯ used for listing the patterns

Resource-aware Machine Learning

Constraint-based pattern mining framework

13 / 78

Pattern constraints .

Constraints are needed for: • only retrieving patterns that describe an interesting subgroup of

the data • making the extraction feasible

Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. Ü

They are defined up to the partial order ⪯ used for listing the patterns

Resource-aware Machine Learning

Constraint-based pattern mining framework

13 / 78

Pattern constraints .

Constraints are needed for: • only retrieving patterns that describe an interesting subgroup of

the data • making the extraction feasible

Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. Ü

They are defined up to the partial order ⪯ used for listing the patterns

Resource-aware Machine Learning

Constraint-based pattern mining framework

13 / 78

Constraint properties - 1 .

Monotone constraint

Anti-monotone constraint

∀φ1 ⪯ φ2 , C(φ1 , D) ⇒ C(φ2 , D)

∀φ1 ⪯ φ2 , C(φ2 , D) ⇒ C(φ1 , D)

abcde

abcde

abcd abce abde acde bcde

abcd abce abde acde bcde

abc

abd

abe

acd

ace

ade

bcd

bce

bde

cde

abc

abd

abe

acd

ace

ade

bcd

bce

bde

cde

ab

ac

ad

ae

bc

bd

be

cd

ce

de

ab

ac

ad

ae

bc

bd

be

cd

ce

de

a

b

c

d

e

C(φ, D) ≡ b ∈ φ ∨ c ∈ φ Resource-aware Machine Learning

a

b

c

d

e

C(φ, D) ≡ a ̸∈ φ ∧ c ̸∈ φ

Constraint-based pattern mining framework

14 / 78

Constraint properties - 2 .

Loose AM constraints

Convertible constraints

⪯ is extended to the prefix order ≤ so C(φ, D) ⇒ ∃e ∈ φ : C(φ \ {e}, D) that ∀φ1 ≤ φ2 , C(φ2 , D) ⇒ C(φ1 , D) abcde

abcde

abcd abce abde acde bcde

abcd abce abde acde bcde

abc

abd

abe

acd

ace

ade

bcd

bce

bde

cde

ab

ac

ad

ae

bc

bd

be

cd

ce

de

a

b

c

d

abc

abd

abe

acd

ace

ade

bcd

bce

bde

cde

ab

ac

ad

ae

bc

bd

be

cd

ce

de

C(φ, w) ≡ avg(w(φ)) > σ w(a) ≥ w(b) ≥ w(c) ≥ w(d) ≥ w(e) Resource-aware Machine Learning

a

e

b

c

d

e

C(φ, w) ≡ var(w(φ)) ≤ σ

Pei and Han - 2000 Constraint-based pattern mining framework

Bonchi and Lucchese - 2007 15 / 78

Enumeration strategy .

Binary partition: the element 'a' is enumerated abcde

abcd abce abde acde bcde

abc

abd

abe

acd

ace

ade

bcd

bce

bde

cde

ab

ac

ad

ae

bc

bd

be

cd

ce

de

a

Resource-aware Machine Learning

b

c

d

e

Constraint-based pattern mining framework

16 / 78

Enumeration strategy .

Binary partition: the element 'a' is enumerated R∨

R∨

.

a ∈ R∨ \ R∧

R∧ ∪ {a} R∨ \ {a}

R∧

R∧

Resource-aware Machine Learning

Constraint-based pattern mining framework

16 / 78

Constraint evaluation .

Monotone constraint R∨ C(R∨ , D) is false . empty

R∧

Resource-aware Machine Learning

Anti-monotone constraint R∨

. empty C(R∧ , D) is false R∧

Constraint-based pattern mining framework

17 / 78

Constraint evaluation .

Monotone constraint R∨ C(R∨ , D) is false . empty

R∧

Resource-aware Machine Learning

Anti-monotone constraint R∨

. empty C(R∧ , D) is false R∧

Constraint-based pattern mining framework

17 / 78

Constraint evaluation .

Monotone constraint R∨ C(R∨ , D) is false . empty

R∧

Resource-aware Machine Learning

Anti-monotone constraint R∨

. empty C(R∧ , D) is false R∧

Constraint-based pattern mining framework

17 / 78

Constraint evaluation .

Monotone constraint R∨ C(R∨ , D) is false . empty

R∧

Resource-aware Machine Learning

Anti-monotone constraint R∨

. empty C(R∧ , D) is false R∧

Constraint-based pattern mining framework

17 / 78

A new class of constraints .

Piecewise monotone and anti-monotone constraints 1.

C involves p times the pattern φ: C(φ, D) = f(φ1 , · · · φp , D)

2.

fi,φ (x) = (φ1 , · · · φi−1 , x, φi+1 , · · · , φp , D)

3.

∀i = 1 . . . p, fi,φ is either monotone or anti-monotone: { fi,φ (x) ⇒ fi,φ (y) iff fi,φ is monotone ∀ x ⪯ y, fi,φ (y) ⇒ fi,φ (x) iff fi,φ is anti-monotone

Resource-aware Machine Learning

L. Cerf, J. Besson, C. Robardet, J-F. Boulicaut - 2009 Constraint-based pattern mining framework 18 / 78

An example . • ∀e, w(e) ≥ 0 • C(φ, w) ≡ avg(w(φ)) > σ ≡



w(e) |φ|

e∈φ

> σ.

C(φ, D) is piecewise monotone and anti-monotone with ∑ e∈φ1 w(e) f(φ1 , φ2 , D) = |φ2 | ∀x ⪯ y, • f1,φ is monotone: f(x, φ2 , D) = • f2,φ is anti-monotone: ∑

f(φ1 , y, D) =

e∈φ1

Resource-aware Machine Learning

|y|

w(e)



e∈x w(e) |φ2 |



>σ⇒

e∈φ1

|x|

w(e)



>σ⇒

w(e) |φ2 |

e∈y





Constraint-based pattern mining framework

19 / 78

Piecewise constraint exploitation . R∨

Evaluation

If f(R∨ , R∧ , D) = then R is empty.



e∈R∨ w(e) |R∧ |

≤σ

. empty

R∧

Propagation • ∃e ∈ R∨ \ R∧ such that f(R∨ \ {e}, R∧ , D) ≤ σ, then e is moved in

R∧

• ∃e ∈ R∨ \ R∧ such that f(R∨ , R∧ ∪ {e}, D) ≤ σ, then e is removed

from R∨

Resource-aware Machine Learning

Constraint-based pattern mining framework

20 / 78

Algorithmic principles . Function Generic_CBPM_enumeration(R∨ , R∧ ) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:

if Check_constraints(R∧ , R∨ ) then (R∧ , R∨ )← Constraint_Propagation(R∧ , R∨ ) if R∧ = R∨ then output R∧ else for all e ∈ R∨ \ R∧ do Generic_CBPM_Enumeration(R∧ ∪ {e}, R∨ ) Generic_CBPM_Enumeration(R∧ , R∨ \ {e}) end for end if end if

Resource-aware Machine Learning

Constraint-based pattern mining framework

21 / 78

Case studies . Mining of • Formal concepts [IDA journal 05] • Fault-tolerant patterns [KDID 05, ICCS 06] • Closed patterns in n-ary relations [SDM 08] • Parallel episodes [SDM 09] • Subspace clustering [SDM 09] • Topological patterns in static attributed graphs [TKDE 13] • Evolution patterns in dynamic graphs [ICDM 09] • Trend dynamic sub-graphs [DS 12, PKDD 13] • Triggering patterns [ASONAM 14]

Resource-aware Machine Learning

Constraint-based pattern mining framework

22 / 78

.

Topological patterns in static attributed graphs

Resource-aware Machine Learning

Topological patterns in static attributed graphs

23 / 78

Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Topological properties Rank correlation constraint Mining dynamic graphs Trend dynamic sub-graphs Triggering patterns of topology changes Conclusions and perspectives Resource-aware Machine Learning

Topological patterns in static attributed graphs

24 / 78

Mining attributed static graphs .

• Composed by two relations • Are these two relations

linked? + Do the attribute values depend on the role played by the vertex in the graph?

New pattern domain that identifies sets of node attributes that co-vary with the graph structure +

the graph structure is captured by topological properties

Resource-aware Machine Learning

A. Prado, M. Plantevit, C. Robardet, J-F. Boulicaut - 2013 Topological patterns in static attributed graphs 25 / 78

Topological node properties .

Topological properties are derived from the edges of the graph • Direct neighborhood ▶

degree, clustering

• Connection to all other

vertices ▶ Centrality measures Betw. Cent. EigenVector Cent. Size of the community Clust. Coeff. Degree cent. Microscopic View

Resource-aware Machine Learning

Macroscopic View Closeness Cent.

PageRank Size of the largest quasi-cliques #quasi-cliques involving v

Vertex attributes

Topological patterns in static attributed graphs

26 / 78

Topological patterns . h i t 2 5.1 10

A h 9

Topological pattern language

i 3.5

t 11

h i t 25 1.5 18

0 h

i 2 5.2

h 17

ms ≡ (m, s) ∈ M × {+, −}

• L = 2M×{+,−}

F h 13

1

C

i 2

t 9

t 6

N

166

E

h 47

i 0

t 8

h 32

i 1

t 12

P i 7

i 2.5

0

i 6

O t 1

t 15

h 3

i 5

t 3

I K

i 5.3

t 0

27 / 78

14

h 10

12

Betweenness centrality

Topological patterns in static attributed graphs

i 4

0

J

M h 5

0

h 8

0

t 2 h 1

Resource-aware Machine Learning

G

t 7

91

54

h 0

t 14

0

52

73

D

• M × {+, −}

i 4.25

H

B

• M: set of all properties +

h 7

0

L

i 3

1

h i 6 4.5

t 4

t 5

Pattern examples .

Co-authorship network • Node attributes: number of publications in some conferences

(KDD, ICDM, VLDB, etc.).

Patterns that can be discovered • The higher the number of publications at VLDB and SIGMOD, the

higher the centrality value. • The higher the number of publications at SAC • The lower the number of publications at KDD and ICDM. • The lower the centrality value.

Resource-aware Machine Learning

Topological patterns in static attributed graphs

28 / 78

How to extract such patterns? . “Propositionalization": the topological properties are associated to each node.

+

Tabular data

Resource-aware Machine Learning

Topological patterns in static attributed graphs

29 / 78

Topological constraint .

Sets of properties that behave in a similar manner for a large number of vertex pairs • Kendall tau rank correlation coefficient + +

based on the ranks of pattern property values counts the number of vertex pairs that are ordered similarly on all pattern properties

Age .

. Income

.

.

.

Resource-aware Machine Learning

.

.

.

.

.

.

.

.

.

.

Topological patterns in static attributed graphs

.

.

.

.

.

.

30 / 78

.

Example .

P = {KDD+ , ClusCoef− } • Supported by 5 pairs • Ex: (A2,A1) • suppτ (P) =

 5  

5 2

Resource-aware Machine Learning



=

1 2

Topological patterns in static attributed graphs

31 / 78

Constraint properties .

• Let P ∈ L, Suppτ (P) = {

▷s ≡

< when s is + > when s is −

|{(u,v)∈V2 | ∀ms ∈P:m(u)▷s m(v)}|

(2n)

with

and n is the number of vertices

Ctopo (P, D) ≡ Suppτ (P) ≥ σ

• Ctopo is anti-monotone • Redundant symmetrical patterns

Suppτ (A+ , B− ) = Suppτ (A− , B+ ) • Support computation quadratic on the number of vertices: + +

An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices

Resource-aware Machine Learning

Topological patterns in static attributed graphs

32 / 78

Constraint properties .

• Let P ∈ L, Suppτ (P) = {

▷s ≡

< when s is + > when s is −

|{(u,v)∈V2 | ∀ms ∈P:m(u)▷s m(v)}|

(2n)

with

and n is the number of vertices

Ctopo (P, D) ≡ Suppτ (P) ≥ σ

• Ctopo is anti-monotone • Redundant symmetrical patterns

Suppτ (A+ , B− ) = Suppτ (A− , B+ ) • Support computation quadratic on the number of vertices: + +

An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices

Resource-aware Machine Learning

Topological patterns in static attributed graphs

32 / 78

Two case studies .

Movie Graph

DBLP Graph

• Netflix and IMDb (1998-2005). • Users rate movies (1 to 5 stars). • 5,972 nodes (movies). • 64 338 edges (common actor). • 5 local attributes: Release year, num_users (that rate the movie), avg_rating, stdev_rating, et num_actors • 9 topological properties

Resource-aware Machine Learning

• 42 252 nodes. • 210 320 edges (common

publication). • 29 local attributes: Nb of publications (1990-2013) in 29 journals and conference venues • 9 topological properties

Topological patterns in static attributed graphs

33 / 78

Some results . ä

{avg_rating+ , num_customers+ }

“People rate the films they like." ä

{num_customers+ , Degree+ } “People rate movies with major actors." ( e.g., R de Niro, S. Connery, and T. Hanks )

ä

{stdev_rating+ , PageRank− } “Controversial movies are isolated."

Resource-aware Machine Learning

Topological patterns in static attributed graphs

34 / 78

Some results . ä

{avg_rating+ , num_customers+ }

“People rate the films they like." ä

{num_customers+ , Degree+ } “People rate movies with major actors." ( e.g., R de Niro, S. Connery, and T. Hanks )

ä

{stdev_rating+ , PageRank− } “Controversial movies are isolated."

Resource-aware Machine Learning

Topological patterns in static attributed graphs

34 / 78

Some results ä

{avg_rating+ , num_customers+ }

.

“People rate the films they like." ä ä

{num_customers+ , Degree+ } “People rate movies with major actors." ( e.g., R de Niro, S. Connery, and T. Hanks ) {stdev_rating+ , PageRank− } “Controversial movies are isolated."

Resource-aware Machine Learning

Topological patterns in static attributed graphs

34 / 78

Results on DBLP - 1 .

Is publishing at SAC penalizing? • {SAC+ , ECML/PKDD− }, {SAC+ , KDD− }, {SAC+ , VLDB− } • {SAC+ , PageRank− } • Of course not! • Bias in the data (SAC has a much wider spectrum than databases

and data mining).

Resource-aware Machine Learning

Topological patterns in static attributed graphs

35 / 78

Results on DBLP - 1 .

Is publishing at SAC penalizing? • {SAC+ , ECML/PKDD− }, {SAC+ , KDD− }, {SAC+ , VLDB− } • {SAC+ , PageRank− } • Of course not! • Bias in the data (SAC has a much wider spectrum than databases

and data mining).

Resource-aware Machine Learning

Topological patterns in static attributed graphs

35 / 78

Results on DBLP - 2 .

{Prk+ , Deg+ , Betw+ , ClusCoef− } • Dense part ≡ database • Other parts ≡ NLP, ML

Conclusion: These behaviors are not specific to a community!

Resource-aware Machine Learning

Topological patterns in static attributed graphs

36 / 78

PageRank and conferences . Top 5 publications related to the emergence of {Deg+ } and {Betw+ } for Prk+ (A) and the top 5 authors (B) (A) Rank 1 2 3 4 5

Deg Publication ECML/PKDD+ IEEE TKDE+ PAKDD+ DASFAA+ ICDM+

Factor 2.5 2.28 2.21 2.09 1.95

Resource-aware Machine Learning

Between Publication Factor PVLDB+ 5.67 EDBT+ 5.11 VLDB J.+ 4.35 SIGMOD+ 4.25 ICDE+ 3.42

(B) Prk+ Deg+ Prk+ Between+ + + ECML/PKDD PVLDB Christos Faloutsos Gerhard Weikum Jiawei Han Jiawei Han Philip S. Yu David Maier Bing Liu Philip S. Yu C. Lee Giles Hector GarciaMolina

Topological patterns in static attributed graphs

37 / 78

.

Mining dynamic graphs

Resource-aware Machine Learning

Mining dynamic graphs

38 / 78

Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Constrained subgraphs in static graphs Temporal relationships Trend dynamic sub-graphs Triggering patterns of topology changes Conclusions and perspectives Resource-aware Machine Learning

Mining dynamic graphs

39 / 78

Objective .

• Study the evolution of a

graph over time • Capture strong interactions

in the graph and their evolution over time + Evolving communities of social dynamic graphs

Resource-aware Machine Learning

Mining dynamic graphs

C. Robardet - ICDM 2009 40 / 78

The data . Time 1

Time 2

Time 3

Focus on the microscopic level and propose a constraint-based mining approach to uncover evolving patterns.

Resource-aware Machine Learning

Mining dynamic graphs

41 / 78

Mine subgraphs in each static graph . Time 1

Time 2

Time 3







Compute highly connected subgraphs that are also isolated.

Resource-aware Machine Learning

Mining dynamic graphs

42 / 78

Pseudo-clique constraint . Subgraphs of interest are usually those made of vertices that have a high density of edges. ⇒ Cliques are subgraphs with maximal density ⇒ Pseudo-cliques relax this strong property using a user-defined threshold σ ∈ [0, 1]: S is a pseudo-clique iff

2|ES | ≥σ |S|(|S| − 1)

Properties The pseudo-clique constraint is not anti-monotonic: Expanding a subgraph by adding a vertex could make the density increase or decrease. Resource-aware Machine Learning

Mining dynamic graphs

43 / 78

Loose anti-monotone constraint .

Pseudo-cliques can always be grown from a smaller pseudo-clique with one vertex less [F. Zhu et al. (PAKDD 2007)] • Let S be a pseudo-clique • Let v⋆ be a vertex of S having the smallest degree on S • Thus S \ {v⋆ } is also a pseudo-clique σ=

2 3

Resource-aware Machine Learning

S \ {v⋆ }

S

Mining dynamic graphs

44 / 78

Other useful constraints .

Maximality and isolated constraint Not all the pseudo-cliques of a graph are of importance: • Some are redundant (because non maximal) • Others have many links to external vertices

The isolation constraint imposes a maximum to the average number of external links per vertex: ∑ u∈S (deg(u) − degS (u)) ≤γ |S|

Resource-aware Machine Learning

Mining dynamic graphs

45 / 78

Other useful constraints . σ = 0.7

Without Isolated constraint

9 patterns

Resource-aware Machine Learning

With γ = 1

1 pattern

Mining dynamic graphs

46 / 78

Temporal relationships among subgraphs .

Time 1

Time 2

Time 3









Combine patterns from consecutive time stamps to construct a global model of the dynamic of the graphs.

Resource-aware Machine Learning

Mining dynamic graphs

47 / 78

Global model of dynamic graph .

Objective

Structured the numerous pseudo-cliques obtained to answer the following questions: • Do the strong interactions growth, diminish or remain stable over

time? • When do the change occur?

Temporal relationships between time consecutive subgraphs • Stability: S remains the same between t − 1 and t • Growth: S enlarges at time t • Diminution: S shrinks at time t • Extinction: S disappears at time t • Emergence: S appears at time t Resource-aware Machine Learning

Mining dynamic graphs

48 / 78

Lyon's shared bicycle system Velo'v .

Velo'v system • 340 stations spread in Lyon • 4000 bikes available in those stations • rental at any station, return it at any other one

Velo'v data • More than 13 millions of bicycle trips • Time-stamps of the trip • Rent and return station IDs Resource-aware Machine Learning

Mining dynamic graphs

49 / 78

Results for Velo'v graph .

Evolving patterns (σ = 0.8 and γ = 5) of Velo'v stations and their localization in Lyon. Patterns are shadowed on the map. Resource-aware Machine Learning

Mining dynamic graphs

50 / 78

.

Trend dynamic sub-graphs

Resource-aware Machine Learning

Trend dynamic sub-graphs

51 / 78

Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Trend dynamic sub-graphs Definition of patterns Definition of constraints Triggering patterns of topology changes Conclusions and perspectives Resource-aware Machine Learning

Trend dynamic sub-graphs

52 / 78

Dynamic attributed graph .

The data {Gt = (V, Et ), t = 1, · · · , tmax } and A a set of ordinal attributes: ai : V × T → Di , with Di the domain of ai

Resource-aware Machine Learning

Trend dynamic sub-graphs

53 / 78

Trend Dynamic Sub-graphs .

What are the sub-graphs whose vertex attributes evolve similarly? Mining maximal sub-graphs that satisfy some constraints on the graph topology and on the attribute values • The connectivity of the dynamic subgraphs is constrained by a

maximum diameter value. • To be more robust towards intrinsic inter-individual variability, we

do not compare raw numerical values, but their trends.

Resource-aware Machine Learning

E. Desmier, M. Plantevit, C. Robardet, J-F. Boulicaut - PKDD 2013 Trend dynamic sub-graphs 54 / 78

Language of patterns . L = {(U, S, Ω) | U ⊂ V, S = ⟨t1 , · · · , ts ⟩, Ω ⊂ A × {+, −}} − Example: ({A, C, D}, ⟨t0 , t1 ⟩, {a+ 1 ,a3 }) ∈ L

a1 0

a2 2

D

a2 1

t0

Resource-aware Machine Learning

a3 4

a1 2

a2 0

C

a1 3

a2 2

a3 4

a1 1

a2 3

.

a3 2

F

C E

D

a3 0

a1 4

a2 2

t1

Trend dynamic sub-graphs

a3 1

a1 3

a2 1

a2 1

a3 0

.

.

. F E

a1 2

a1 2 B

A

.

a3 4

.

a1 a2 1 3

a3 0

.

.

a3 5

.

C

a2 1

a1 a2 6 3

a3 0

B

A

a1 2

a2 1

.

E

a1 2

a1 5

a2 4

D .

a3 1

.

.

a3 1

a2 2

.

.

. .

F

a1 5

.

a3 5

.

a2 3

a3 5

B

A

a1 0

a2 4

.

a1 2

.

a3 3

.

a1 a2 3 5

a3 3

a1 5 t2

55 / 78

a2 1

a3 0

a3 3

Definition of constraints .

A pattern (U, S, Ω) satisfies

• Cdiam ((U, S, Ω), D) ≡ diameterGt (U) ≤ k, ∀t ∈ S

...

1

...

...

Diameter

• Ctrend ((U, S, Ω), D) ≡ ∀u ∈ U, ∀t ∈ S, ∀(a, m) ∈ Ω

{

a(u, t) < a(u, t + 1), if m = + a(u, t) > a(u, t + 1), if m = −

• Cmax ((U, S, Ω), D) ≡ U, S and Ω cannot be enlarged without

invalidating one or both of the above constraints.

Resource-aware Machine Learning

Trend dynamic sub-graphs

56 / 78

Constraint properties .

• Ctrend ((U, S, Ω), D) is anti-monotone • Cdiam ((U, S, Ω), D) is piecewise monotone:

diameterGt (U) ≤ k ≡ max dGt (U) (v, w) ≤ k v,w∈U

f(U1 , U2 , D) = max dGt (U2 ) (v, w) v,w∈U1

• f1,U is anti-monotone i.e. if it is satisfied on U1 , it is also satisfied for

any of its subsets; • f2,U is monotone i.e. if it is satisfied on Gt (U2 ), then, adding some

vertices and edges to Gt (U2 ) will not increase its value +

We can use the propagation mechanisms

• Cmax ((U, S, Ω), D) is pushed using specific mechanisms Resource-aware Machine Learning

Trend dynamic sub-graphs

57 / 78

Katrina hurricane results .

Top pattern w.r.t. time specificity (in red)

Top pattern w.r.t. trend specificity (Yellow)

• 71 airports whose arrival delays increase over 3 weeks.

• 5 airports whose number of flights increased over 3 weeks

• Arrival delays never increased in these airports during another week.

• Substitutions flights were provided from these airports during this period.

• The hurricane strongly influenced the domestic flight punctuality.

• This behavior is rather rare in the rest of the graph

Resource-aware Machine Learning

Trend dynamic sub-graphs

58 / 78

.

Triggering patterns of topology changes

Resource-aware Machine Learning

Triggering patterns of topology changes

59 / 78

Outline . Constraint-based pattern mining framework Topological patterns in static attributed graphs Mining dynamic graphs Trend dynamic sub-graphs Triggering patterns of topology changes Triggering patterns Conclusions and perspectives

Resource-aware Machine Learning

Triggering patterns of topology changes

60 / 78

Context & motivation . Networks structurally change over time: How to describe these dynamics? a: 4 b: 1 c: 5 deg: 1

deg +

c-

a+ b + u2

u2

u1

u3

u5

u4

a: 2 b: 0 c: 6 deg: 2

u2

u2

a: 7 b: 4 u1 c: 5 deg: 1

u3

u5

u4

a: 2 b: 1 c: 6 deg: 2

a: 7 b: 5 c: 2 deg: 1

u1

u3

u5

u4

a: 5 b: 4 c: 5 deg: 2

a: 8 b: 5 c: 1 deg: 4

u3

u5

u4

2

a: 5 b: 5 c: 5 deg: 2

3

4

u2

a: 9 b: 6 u1 c: 1 deg: 4

u3

u5

u4

c-

a+ b +

1

u2

u1

a: 6 b: 5 c: 2 deg: 2

a: 9 b: 6 u1 c: 0 deg: 4

u3

u5

u4

a: 7 b: 5 c: 1 deg: 4

deg +

5

6

Intuition Consider attributed graphs evolving through time The variation of some attribute values (nodes attributes) of a node can lead in several cases to a structural change (topological properties).

Resource-aware Machine Learning

M. Kaytoue, Y. Pitarch, M. Plantevit, C. Robardet -- ASONAM 2014 Triggering patterns of topology changes 61 / 78

Context & motivation .

a: 4 b: 1 c: 5 deg: 1

deg +

c-

a+ b + u2

u2

u1

u3

u5

u4

a: 2 b: 0 c: 6 deg: 2

u2

u2

a: 7 b: 4 u1 c: 5 deg: 1

u3

u5

u4

a: 2 b: 1 c: 6 deg: 2

a: 7 b: 5 c: 2 deg: 1

u1

u3

u5

u4

a: 5 b: 4 c: 5 deg: 2

a: 8 b: 5 c: 1 deg: 4

u2

u1

u3

u5

u4

a: 5 b: 5 c: 5 deg: 2

2

u3

u5

u4

a: 6 b: 5 c: 2 deg: 2

a: 9 b: 6 u1 c: 0 deg: 4

u3

u5

u4

c-

a+ b +

1

u2

a: 9 b: 6 u1 c: 1 deg: 4

3

4

deg +

6

5

Triggering pattern example: ⟨{a+ , b+ }, {c− }, {deg+ }⟩ a+ b+ c− deg+

Updating his status more often giving positive opinions about others receiving less negative opinions from the others is often followed by an increase of user's popularity

Resource-aware Machine Learning

Triggering patterns of topology changes

62 / 78

a: 7 b: 5 c: 1 deg: 4

Dynamic attributed graphs . Let G = {G1 , . . . , Gt } be a sequence of t static attributed graphs • Gi = (V, Ei , F) with T = {1, . . . , t} • F the set of numerical attributes that map each vertex-time pair to

a real value: ∀f ∈ F, f : V × T → R. ä F gathers the node attributes and the node topological properties

a: 4 b: 1 c: 5 deg: 1

deg +

c-

a+ b + u2

u2

u1

u3

u5

u4

a: 2 b: 0 c: 6 deg: 2

u2

u2

a: 7 b: 4 u1 c: 5 deg: 1

u3

u5

u4

a: 2 b: 1 c: 6 deg: 2

a: 7 b: 5 c: 2 deg: 1

u1

u3

u5

u4

a: 5 b: 4 c: 5 deg: 2

a: 8 b: 5 c: 1 deg: 4

u2

u1

u3

u5

u4

a: 5 b: 5 c: 5 deg: 2

2

Resource-aware Machine Learning

u3

u5

u4

a: 6 b: 5 c: 2 deg: 2

a: 9 b: 6 u1 c: 0 deg: 4

u3

u5

u4

c-

a+ b +

1

u2

a: 9 b: 6 u1 c: 1 deg: 4

3

4

Triggering patterns of topology changes

deg +

6

5

63 / 78

a: 7 b: 5 c: 1 deg: 4

Characterizing vertex behaviors Vertex descriptive sequence

.

• A discretization function gives a variation symbol to a

vertex/attribute/time triple, e.g.  discr(v, f, i) =

 + if f(v, i) − f(v, i − 1) ≥ 2 and i > 1 − if f(v, i) − f(v, i − 1) ≤ −2 and i > 1  ∅ otherwise

• A vertex v is described by a sequence of itemsets

δ(v) = ⟨{discr(v, f, 1) | f ∈ F}, . . . , {discr(v, f, t) | f ∈ F}⟩ • ∆ = {δ(v) | v ∈ V} is the set of all sequences.

Example δ(u1 ) = ⟨{a+ , b+ }, {c− }, {deg+ }⟩ ∆ = {δ(u1 ), δ(u2 ), δ(u3 ), δ(u4 ), δ(u5 )} Resource-aware Machine Learning

Triggering patterns of topology changes

64 / 78

Triggering patterns . A triggering pattern is a sequence P = ⟨L, R⟩ with • L a sequence of node attribute variations • R a single topological property variation

Support measure supp(P, ∆) = {v ∈ V | P ⪯ δ(v)} where p ⪯ q means that p is a super-sequence of q

Example • L = ⟨{a+ , b+ }, {c− }⟩ • R = ⟨{deg+ }⟩ • supp(⟨L, R⟩, ∆) =

{u1 , u3 } Resource-aware Machine Learning

a: 4 b: 1 c: 5 deg: 1

deg +

c-

a+ b + u2

u2

u1

u3

u5

u4

a: 2 b: 0 c: 6 deg: 2

u2

u2

a: 7 b: 4 u1 c: 5 deg: 1

u3

u5

u4

a: 2 b: 1 c: 6 deg: 2

a: 7 b: 5 c: 2 deg: 1

u1

u3

u5

u4

a: 5 b: 4 c: 5 deg: 2

a: 8 b: 5 c: 1 deg: 4

u3

u5

u4

2

a: 5 b: 5 c: 5 deg: 2

u2

a: 9 b: 6 u1 c: 1 deg: 4

u3

u5

u4

c-

a+ b +

1

u2

u1

3

Triggering patterns of topology changes

4

a: 6 b: 5 c: 2 deg: 2

a: 9 b: 6 u1 c: 0 deg: 4

u3

u5

u4

deg +

5

65 / 78

6

a: 7 b: 5 c: 1 deg: 4

Assessing the strength of a pattern .

Triggering pattern growth rate Let P = ⟨L, R⟩, we denote by ∆R ⊆ ∆ the set of vertex descriptive sequences that contain R. The growth rate of P is given by: GR(P, ∆R ) =

|∆ \ ∆R | |supp(L, ∆R )| × |∆R | |supp(L, ∆ \ ∆R )|

G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In KDD, pages 43--52, 1999.

Resource-aware Machine Learning

Triggering patterns of topology changes

66 / 78

Quantitative results .

(i) Runtimes and number of patterns, (ii) Distribution of execution times, (iii) Support distribution, (iv) Growth rate vs support, (v) Scalability. Resource-aware Machine Learning

Triggering patterns of topology changes

67 / 78

The DBLP data Detecting asynchronous events

.

++ , degree++ } → {degree++ } {eigenvector++ 1 }, {VLDB 2 3

Resource-aware Machine Learning

Triggering patterns of topology changes

68 / 78

US flights data . Synchronous events • RITA1: daily in September 2001 {#Canceled+ }{Deg− , Close− , NbCliq− , Prk− , Betw− } → Deg+

Airports that absorb the traffic two days after • RITA2: monthly in (sept. 2000 ; Sept. 2002 ) {#Canceled+ }{#Canceled− }, {nbCliq− , Betw+ } → nbCliq+

A "back to normal" around March 2002 • RITA3: Aug./Sept. 2005 (Katrina Hurricane) {#Canceled+ , DelayAtDep+ }, {#Diverted− , #Depar− , #Arrival− } → {close− }

All the airports supporting this pattern are located in the US West coast where Katrina raged.

Resource-aware Machine Learning

Triggering patterns of topology changes

69 / 78

.

Conclusions and perspectives

Resource-aware Machine Learning

Conclusions and perspectives

70 / 78

Conclusion .

• Constraint-based pattern mining framework can be used to

analyze dynamic graphs • Patterns combine information about the topological structure, the

vertex attribute values and their tendencies. + A wide variety of data can be mined + Provide new insights on the techniques that can be used to analyze dynamic graphs

Resource-aware Machine Learning

Conclusions and perspectives

71 / 78

Challenges and perspectives - 1 .

Enhancing the quality of the extracted information • by capturing changes in the graph structure • while allowing changes in trend direction +

Combining sub-graph and sequence mining

Resource-aware Machine Learning

Conclusions and perspectives

72 / 78

Challenges and perspectives - 2 .

Avoiding the known drawbacks of pattern mining approaches • How to get rid of redundant patterns? • How to fix the threshold values?

Returning patterns whose constraint values are surprising • with respect to what could be expected from randomized data • with respect to what happened in the past (real-time data mining) +

Trigger alerts when changes in behavior

Resource-aware Machine Learning

Conclusions and perspectives

73 / 78

Challenges and perspectives - 2 .

Avoiding the known drawbacks of pattern mining approaches • How to get rid of redundant patterns? • How to fix the threshold values?

Returning patterns whose constraint values are surprising • with respect to what could be expected from randomized data • with respect to what happened in the past (real-time data mining) +

Trigger alerts when changes in behavior

Resource-aware Machine Learning

Conclusions and perspectives

73 / 78

Challenges and perspectives - 3 .

Pattern without threshold values: Skyline patterns Retrieves the patterns that are not dominated by any other pattern Tid t1 t2 t3 t4 t5 t6 Patterns ABCDEF AB AC A ABCD C

Items A B C D E F A B C D E F A B D A C E freq 2 3 3 4 2 3

length 6 2 2 1 4 1

Resource-aware Machine Learning

area 12 6 6 4 8 3

Conclusions and perspectives

A. Soulet, C. Raissi, M. Plantevit, B. Cremilleux - 2011 74 / 78

Ongoing research projects .

Vél'innov - ANR INOV 2012 - labex IMU project • Constraint-based pattern mining of Vélo'V data • Characterizing the usage of the bicycle sharing system • Modeling the system to simulate the impacts of local changes (in the city or in the system) • Pitfall: • Vertex attributes are static (INSEE data) or derived from the edges (e.g., number of users of a considered category) + Searching for relationships between attributes and the graph structure does not make sense • Envisioned solutions: • Considering the user trajectories and looking for paths that characterize a user group • Considering an attributed static graph with temporal usage vectors associated to edges and seeking for sub-graphs homogeneous on vertex and edges attributes Resource-aware Machine Learning

Conclusions and perspectives

75 / 78

Ongoing research projects - 1 .

GRAISearch: enhancing a location-based social media search engine • FP7-PEOPLE-2013-IAPP: Industry-Academia Partnerships and

Pathways • Benefits for DM2L team • 40 researcher months • Access to data with high added value • An industrial cooperation with critical potential benefits for the

company

Resource-aware Machine Learning

Conclusions and perspectives

76 / 78

Ongoing research projects - 2 .

GRAISearch: Research tasks • Geo-localized demographic flow prediction using geo-tagged

social media uploaded data

• Geo-localized event detection algorithm • Integration into a geo-located social media recommender system

Resource-aware Machine Learning

Conclusions and perspectives

77 / 78

Acknowledgments .

Resource-aware Machine Learning

Conclusions and perspectives

78 / 78