Active Learning with Partially Observed Data
April 7, 2014, BGM Workshop at WWW'14, Seoul, Korea
Seungwhan Moon, Carnegie Mellon University
Active Learning
* Diagram from [B. Settles (2008)]
Active Learning Query Strategies
• Uncertainty Sampling [Lewis and Gale (1994)]
• Query by Committee [Seung et al. (1992)]
• Entropy-Based Sampling [Chaloner and Verdinelli (1995)]
• Density-Weighted Methods [Settles and Craven (2008)]
• Proactive Sampling Methods [P. Donmez and J. Carbonell (2010)]
• Active Learning with Graphs [Kong et al. (2011), Bilgic et al. (2010), etc.]
Query Strategy - Uncertainty Sampling
(Figure: labeled points from two classes, unlabeled points, and the current decision boundary; the query target is the most uncertain sample, i.e. the one closest to the boundary.)
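The uncertainty-sampling idea above can be sketched in a few lines. This is a minimal margin-based variant; the function name and the choice of margin (vs. least-confident or entropy) are illustrative, not taken from the slides:

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray) -> int:
    """Pick the unlabeled sample the current model is least sure about.

    probs: (n_samples, n_classes) class posteriors from the current model.
    Uses margin sampling: the smaller the gap between the two most
    probable classes, the closer the sample sits to the decision boundary.
    """
    sorted_p = np.sort(probs, axis=1)
    margins = sorted_p[:, -1] - sorted_p[:, -2]
    return int(np.argmin(margins))
```

The selected index would then be sent to the oracle for labeling.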
Query Strategy - Density-Weighted Sampling
(Figure: same setup; the query target is the sample that is both dense in the input distribution and informative to the model.)
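A sketch of the information-density criterion from [Settles and Craven (2008)]: uncertainty weighted by the sample's average similarity to the unlabeled pool. The cosine-similarity density estimate and the function name are my assumptions:

```python
import numpy as np

def density_weighted_sample(probs: np.ndarray, X: np.ndarray,
                            beta: float = 1.0) -> int:
    """score(x) = uncertainty(x) * (mean similarity of x to the pool)^beta."""
    sorted_p = np.sort(probs, axis=1)
    uncertainty = 1.0 - sorted_p[:, -1]          # least-confident measure
    # Cosine similarity within the unlabeled pool as a density proxy.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    density = (Xn @ Xn.T).mean(axis=1)
    return int(np.argmax(uncertainty * density ** beta))
```

Unlike plain uncertainty sampling, this down-weights outliers that sit near the boundary but are unrepresentative of the data distribution.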
Active Learning with Graphs
[Kong et al. (2011)]
Active Learning with Partially Observed Data
• Active Learning as a joint optimization over two selections:
1. Query labels
2. Query missing attributes
(Figure: labeled points from two classes, partially featured unlabeled points projected via inference, and the current decision boundary.)
Acquiring Labels
• Traditional Active Learning
• Active Learning with Imputed Values, defined over two pools: the fully featured subset and the partially featured subset, incorporating the inference confidence into the selection criterion.
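One way to incorporate inference confidence when ranking label queries over both pools is to discount a candidate's uncertainty by how confidently its missing features were imputed. This is a hedged sketch; the exact weighting used in the paper is not recoverable from the slides, and all names here are illustrative:

```python
import numpy as np

def confidence_weighted_uncertainty(probs: np.ndarray,
                                    impute_conf: np.ndarray) -> int:
    """Rank label-query candidates by model uncertainty, discounted by
    imputation confidence (1.0 for fully featured samples).

    probs:       (n, n_classes) class posteriors on (possibly imputed) samples.
    impute_conf: (n,) confidence in each sample's imputed features, in [0, 1].
    """
    sorted_p = np.sort(probs, axis=1)
    margins = sorted_p[:, -1] - sorted_p[:, -2]
    score = (1.0 - margins) * impute_conf   # uncertain AND reliably imputed
    return int(np.argmax(score))
```

A sample whose features were imputed with low confidence is de-prioritized even if the classifier is uncertain about it, since its uncertainty may be an artifact of bad imputation.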
Acquiring Missing Attributes
• Pick the samples with missing attributes that will most improve the system's current imputation model: those with the highest density and the lowest imputation confidence.
• Imputation confidence is approximated by applying the current imputation model to each sample's observed features.
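The attribute-query criterion on this slide (highest density, lowest imputation confidence) can be sketched as a simple product score. The multiplicative combination and function name are my assumptions:

```python
import numpy as np

def attribute_query_sample(density: np.ndarray,
                           confidence: np.ndarray) -> int:
    """Among partially featured samples, pick the one that is most
    representative (high density) yet least confidently imputed.

    density:    (n,) density estimate for each partially featured sample.
    confidence: (n,) imputation confidence for each sample, in [0, 1].
    """
    score = density * (1.0 - confidence)
    return int(np.argmax(score))
```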
Joint Optimization Selections
• At each iteration, we decide whether to query a sample for its label or for its missing features.
• Greedy cost-optimization strategy: choose the best action at each iteration, trading off the cost of acquiring labels against the cost of acquiring missing attributes.
• Note that cost can be actual annotation expense, computational overhead, etc.
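The greedy decision above can be sketched as a utility-per-cost comparison. The exact utility estimates are model-specific and not spelled out on the slide, so this only shows the shape of the decision rule:

```python
def greedy_action(label_utility: float, label_cost: float,
                  attr_utility: float, attr_cost: float) -> str:
    """Greedy cost-optimization: at each iteration, take whichever action
    (query a label vs. query missing attributes) offers the best expected
    utility per unit cost."""
    if label_utility / label_cost >= attr_utility / attr_cost:
        return "query_label"
    return "query_attributes"
```

Costs may be monetary annotation expense, annotator time, or computational overhead, as the slide notes.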
Imputation Model
• Expectation Maximization (EM) inference algorithm
• Model the joint density over all features with a GMM
1. Expectation
• Compute the likelihood of each sample, partitioning its features into missing features and observed features.
• Compute the posterior probability of each sample belonging to component j, given the prior probability of component j.
Imputation Model
2. Maximization
• Recompute the parameters separately for each Gaussian.
• Assume each sample comes from component j, conditioned on its observed features.
• Impute any missing values to the conditional mean.
Imputation Model
2. Maximization (cont.)
• Incorporate the imputed values when updating the parameters.
• Finally, we recompute the remaining parameters by summing the posterior weights.
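The E-step and conditional-mean imputation from the three imputation-model slides can be sketched for a GMM whose missing entries are marked NaN. The posterior-weighted combination across components and all function names are my assumptions (the slides' exact equations were lost in extraction); this follows the standard GMM-with-missing-data formulation:

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of x under N(mu, sigma)."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def responsibilities(x, mus, sigmas, priors):
    """E-step: posterior P(component j | observed part of x).
    Only the observed dimensions of x enter the likelihood."""
    o = ~np.isnan(x)
    logp = np.array([
        np.log(pi_j) + gaussian_logpdf(x[o], mu[o], sigma[np.ix_(o, o)])
        for pi_j, mu, sigma in zip(priors, mus, sigmas)])
    logp -= logp.max()                    # for numerical stability
    p = np.exp(logp)
    return p / p.sum()

def conditional_mean_impute(x, mus, sigmas, priors):
    """Fill each missing entry with the posterior-weighted conditional mean
    mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o), averaged over components."""
    m = np.isnan(x)
    o = ~m
    r = responsibilities(x, mus, sigmas, priors)
    x_imp = x.copy()
    x_imp[m] = sum(
        r_j * (mu[m] + sigma[np.ix_(m, o)] @ np.linalg.solve(
            sigma[np.ix_(o, o)], x[o] - mu[o]))
        for r_j, mu, sigma in zip(r, mus, sigmas))
    return x_imp
```

With a single Gaussian and correlated features, an observed feature above its mean pulls the imputed missing feature above its mean in proportion to the covariance, which is exactly the conditional-mean behavior the slides describe.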
Algorithm
Implications in Big Graph Data
• Graph Categorization
• Graph enumeration / isomorphism testing (NP-complete)
• Via topological and label attributes [G. Li et al. (2012)], e.g. spectral attributes computed with eigendecomposition (expensive for a very large graph)
• Progressive feature completion via Active Learning (the cost scales with the number of attributes and the number of clusters)
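As a concrete example of the spectral attributes mentioned above, a graph can be summarized by the leading eigenvalues of its Laplacian, giving a fixed-length topological feature vector for categorization. The function name and the choice of the Laplacian (vs. the adjacency matrix) are illustrative:

```python
import numpy as np

def spectral_attributes(adj: np.ndarray, k: int = 3) -> np.ndarray:
    """Summarize a graph by the k largest eigenvalues of its Laplacian.

    adj: (n, n) symmetric adjacency matrix.
    Returns the k largest Laplacian eigenvalues in descending order.
    """
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                       # (unnormalized) graph Laplacian
    eig = np.linalg.eigvalsh(lap)         # eigenvalues in ascending order
    return eig[-k:][::-1]
```

The eigendecomposition costs O(n^3) for a dense n-node graph, which is why such attributes become expensive to compute on very large graphs, motivating the progressive feature completion the slide describes.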
Results on UCI Dataset
Results on Protein Interaction Networks
• 2361 nodes, 7182 edges
• 5 PIN classes
• Implemented 8 topological attributes for graph categorization (e.g. clustering coefficient, neighborhood connectivity, eccentricity, average degree, etc.)
• 3 out of 8 attributes were assumed to be missing
Conclusion
• A new active learning framework was proposed that efficiently handles partially observed data.
• The learner can choose to query either the label of a sample or its missing features, based on the maximum utility of each action.
• A similar approach can be applied to graph mining tasks; a special case on graph categorization was considered in this paper.
Thank you!