Apache Giraph for applications in Machine Learning & Recommendation Systems
Maria Stylianou (@marsty5), Novartis

Züri Machine Learning Meetup #5

June 16, 2014

Outline
• Machine Learning Cases
• Why Apache Giraph?
• Walk-through example for Recommendation Systems
• What’s more?

Apache Giraph for applications in Machine Learning | Maria Stylianou

Machine Learning cases

• Friend recommendation
• Fake account detection

• Product recommendation
• Online advertisements

• Route planning
• Delivery scheduling

Graphs are everywhere. Graphs need processing!

(Graphs ABC)

Graph: a representation of a set of objects
• V = vertices (nodes)
• E = edges (links)
Graphs capture the relationships between objects.
Graphs can be directed or undirected.
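
To make V and E concrete, here is a minimal sketch of an adjacency-list graph in Java (Java is assumed because Giraph itself is written in Java). The class `SimpleGraph` and its methods are hypothetical names for illustration, not part of Giraph. The only difference between the directed and the undirected case is whether each edge is also stored in the reverse direction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal adjacency-list graph: the key set is V, the lists hold E (hypothetical sketch). */
public class SimpleGraph {
    private final boolean directed;
    // Each vertex maps to the list of vertices it links to.
    private final Map<String, List<String>> adjacency = new HashMap<>();

    public SimpleGraph(boolean directed) {
        this.directed = directed;
    }

    public void addVertex(String v) {
        adjacency.computeIfAbsent(v, k -> new ArrayList<>());
    }

    /** Adds edge u -> v; an undirected graph also stores v -> u. */
    public void addEdge(String u, String v) {
        addVertex(u);
        addVertex(v);
        adjacency.get(u).add(v);
        if (!directed) {
            adjacency.get(v).add(u);
        }
    }

    public List<String> neighbors(String v) {
        return adjacency.getOrDefault(v, List.of());
    }

    public int vertexCount() {
        return adjacency.size();
    }

    public static void main(String[] args) {
        SimpleGraph follows = new SimpleGraph(true);   // e.g. "alice follows bob"
        follows.addEdge("alice", "bob");
        SimpleGraph friends = new SimpleGraph(false);  // e.g. "alice and bob are friends"
        friends.addEdge("alice", "bob");
        System.out.println(follows.neighbors("bob"));  // []
        System.out.println(friends.neighbors("bob"));  // [alice]
    }
}
```

Adjacency lists suit sparse graphs such as social networks, where each vertex links to only a tiny fraction of all other vertices.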

Graphs need processing!

So what?

Challenge #1

Scale of graphs
• indexes ~50B pages
• has ~1.1B users
• has ~570M users
• has ~530M users

Challenge #2

Complexity of graphs

Example: compute the shortest distance from “google.com”
→ Multiple passes are needed to compute the result
→ Inherent dependencies make it hard to parallelize
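
To make the "multiple passes" point concrete, the sketch below computes shortest hop distances by repeated passes over an edge list, Bellman-Ford style. It assumes unit-length links and a tiny hypothetical link graph in which vertex 0 stands in for “google.com”; `MultiPassShortestDistance` is an illustrative name. Each pass reads only the distances from the previous pass, which exposes the dependency: a vertex cannot settle before its predecessors do.

```java
import java.util.Arrays;

/** Shortest hop distances by repeated passes over an edge list (Bellman-Ford style). */
public class MultiPassShortestDistance {
    static final int INF = Integer.MAX_VALUE;

    /** Distances plus the number of full passes over the edges that were needed. */
    public static final class Result {
        public final int[] dist;
        public final int passes;

        Result(int[] dist, int passes) {
            this.dist = dist;
            this.passes = passes;
        }
    }

    /** edges[k] = {from, to}; every link is assumed to have length 1. */
    public static Result shortestDistances(int vertexCount, int[][] edges, int source) {
        int[] dist = new int[vertexCount];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        int passes = 0;
        boolean changed = true;
        while (changed) {
            passes++;
            changed = false;
            // Each pass reads only the previous pass's distances, so information moves
            // one hop per pass: a vertex cannot settle before its predecessors do.
            int[] next = dist.clone();
            for (int[] e : edges) {
                int from = e[0];
                int to = e[1];
                if (dist[from] != INF && dist[from] + 1 < next[to]) {
                    next[to] = dist[from] + 1;
                    changed = true;
                }
            }
            dist = next;
        }
        return new Result(dist, passes);
    }

    public static void main(String[] args) {
        // Hypothetical link graph: vertex 0 plays "google.com"; links 0->1, 1->2, 2->3, 0->2.
        int[][] links = {{0, 1}, {1, 2}, {2, 3}, {0, 2}};
        Result r = shortestDistances(4, links, 0);
        System.out.println(Arrays.toString(r.dist) + " after " + r.passes + " passes");
        // [0, 1, 1, 2] after 3 passes
    }
}
```

The last pass changes nothing and only confirms convergence; on a web-scale graph the number of passes grows with the longest shortest path, and every pass touches every edge.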

MapReduce
• Well established
• Efficient for big data analytics
• Not efficient with iterative algorithms (stateless: each iteration runs as a separate job that reads its input from, and writes its output back to, HDFS)

Graph algorithms are iterative.

Why Apache Giraph?

Explicitly designed for graph processing, on top of the Hadoop ecosystem

The story
• Google Pregel (2010)
• Donated to ASF by Yahoo! (2011)
• Apache Top Level Project (2012)
• 1.0 release (2013)
• 1.1 release (2014)

Supported by: Facebook, Yahoo!, LinkedIn

Giraph follows the Pregel model, which is based on Bulk Synchronous Parallel (BSP) computation
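
A minimal single-process sketch of the BSP loop follows, using the "propagate the maximum value" example from the Pregel paper. It is a plain-Java simulation, not Giraph's API, and `BspMaxValue`, `run` and the example graph are hypothetical. Messages sent in superstep S are delivered only in superstep S + 1, every vertex votes to halt after computing, an incoming message wakes it up again, and the run ends when no messages are in flight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Single-process simulation of Pregel/BSP supersteps (hypothetical sketch, not Giraph's API). */
public class BspMaxValue {

    /** Creates one empty message list per vertex. */
    static List<List<Integer>> newMailboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    /** Propagates the maximum value; outEdges[v] lists the vertices v sends messages to. */
    public static int[] run(int[] initialValues, int[][] outEdges) {
        int n = initialValues.length;
        int[] value = initialValues.clone();
        boolean[] halted = new boolean[n];
        List<List<Integer>> inbox = newMailboxes(n);
        int superstep = 0;
        while (true) {
            List<List<Integer>> outbox = newMailboxes(n);
            boolean sent = false;
            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                // A halted vertex is skipped unless a message wakes it up.
                if (halted[v] && messages.isEmpty()) continue;
                int max = value[v];
                for (int m : messages) max = Math.max(max, m);
                // Announce the value in superstep 0, and afterwards only when it grew.
                if (superstep == 0 || max > value[v]) {
                    value[v] = max;
                    for (int u : outEdges[v]) {
                        outbox.get(u).add(max);
                        sent = true;
                    }
                }
                halted[v] = true; // vote to halt
            }
            // Barrier: messages from superstep S are delivered in superstep S + 1.
            inbox = outbox;
            superstep++;
            // Every vertex has voted to halt, so the run ends once no messages are in flight.
            if (!sent) break;
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical chain 0 - 1 - 2 - 3 with edges in both directions.
        int[][] edges = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(run(new int[]{3, 6, 2, 1}, edges))); // [6, 6, 6, 6]
    }
}
```

Keeping separate inbox and outbox lists is what makes the computation synchronous: within a superstep no vertex can observe a message produced in that same superstep.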

“Thinking like a vertex”
I am a vertex! How would I coordinate with other vertices to solve the problem?

Shortest Paths
I only know my value and who my neighbors are.

Receive messages → Update value → Send messages

Within a superstep, vertices compute in parallel and independently of one another
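
The vertex-centric view of shortest paths can be sketched as below: each `Vertex` sees only its own value, its outgoing edges and the messages it received, and follows the receive → update → send cycle. This is a hypothetical single-process simulation, not Giraph's own API; for brevity it runs every vertex in every superstep instead of implementing vote-to-halt, and it assumes vertex ids 0..n-1 match their positions in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Vertex-centric single-source shortest paths, simulated in one process (not Giraph's API). */
public class VertexShortestPaths {

    /** A vertex knows only its id, its current value and its outgoing edges (target -> length). */
    public static final class Vertex {
        public final int id;
        public double value = Double.POSITIVE_INFINITY;
        public final Map<Integer, Double> edges = new HashMap<>();

        public Vertex(int id) {
            this.id = id;
        }

        /** Receive messages -> update value -> send messages (returned as target -> distance). */
        Map<Integer, Double> compute(int superstep, int sourceId, List<Double> messages) {
            // Receive: the best distance offered by any neighbor (the source starts at 0).
            double minDist = (superstep == 0 && id == sourceId) ? 0.0 : Double.POSITIVE_INFINITY;
            for (double m : messages) minDist = Math.min(minDist, m);
            Map<Integer, Double> outgoing = new HashMap<>();
            if (minDist < value) {
                value = minDist; // Update
                for (Map.Entry<Integer, Double> e : edges.entrySet()) {
                    outgoing.put(e.getKey(), value + e.getValue()); // Send
                }
            }
            return outgoing;
        }
    }

    /** Runs supersteps until no messages are in flight. */
    public static double[] run(List<Vertex> vertices, int sourceId) {
        Map<Integer, List<Double>> inbox = new HashMap<>();
        int superstep = 0;
        do {
            Map<Integer, List<Double>> next = new HashMap<>();
            for (Vertex v : vertices) {
                List<Double> messages = inbox.getOrDefault(v.id, List.of());
                for (Map.Entry<Integer, Double> m : v.compute(superstep, sourceId, messages).entrySet()) {
                    next.computeIfAbsent(m.getKey(), k -> new ArrayList<>()).add(m.getValue());
                }
            }
            inbox = next; // messages become visible only in the next superstep
            superstep++;
        } while (!inbox.isEmpty());
        double[] dist = new double[vertices.size()];
        for (Vertex v : vertices) dist[v.id] = v.value;
        return dist;
    }

    public static void main(String[] args) {
        List<Vertex> g = new ArrayList<>();
        for (int i = 0; i < 4; i++) g.add(new Vertex(i));
        g.get(0).edges.put(1, 1.0);
        g.get(0).edges.put(2, 4.0);
        g.get(1).edges.put(2, 2.0);
        g.get(2).edges.put(3, 1.0);
        System.out.println(Arrays.toString(run(g, 0))); // [0.0, 1.0, 3.0, 4.0]
    }
}
```

Vertex 2 first hears distance 4.0 directly from the source and only later improves to 3.0 via vertex 1, which is why a vertex must keep listening for several supersteps.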

Global Synchronization

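Synchronization barrier: no vertex starts the next superstep until all vertices have finished the current one; messages sent in one superstep are delivered in the next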
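
The barrier can be sketched with `java.util.concurrent.CyclicBarrier`: each worker thread (standing in for a worker that owns a partition of vertices, an assumption made for illustration) logs the superstep it is working on and then waits at the barrier. `SuperstepBarrier` is a hypothetical name. Because no thread passes the barrier until every thread has reached it, the log can never show superstep S + 1 before all workers have finished S.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** Worker threads run supersteps in parallel and meet at a global barrier after each one. */
public class SuperstepBarrier {

    /** Returns the superstep number logged by each unit of work, in the order the work happened. */
    public static List<Integer> run(int workers, int supersteps) {
        List<Integer> log = Collections.synchronizedList(new ArrayList<>());
        // Trips only once all `workers` threads have called await() for the current superstep.
        CyclicBarrier barrier = new CyclicBarrier(workers);
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        log.add(s);      // stand-in for computing this worker's vertices
                        barrier.await(); // nobody starts superstep s + 1 before everyone finishes s
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 3)); // [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    }
}
```

The order of threads within a superstep is arbitrary, but the grouping by superstep is guaranteed by the barrier; the cost is that every superstep runs as slowly as its slowest worker.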

And again

Giraph super powers
• Message-passing communication
• In-memory computation → stateful
• Global synchronization
• Iterations → Iterations → Iterations

Recommendation Systems

Collaborative Filtering

A recommendation systems technique: predict what a user will like from the preferences of many users
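
Ratings map naturally onto a graph: users and items are vertices, and each rating is an edge between a user and an item. The sketch below builds that bipartite graph and scores a user's unrated items by counting paths user → rated item → other user → candidate item, a simple co-occurrence form of collaborative filtering. It is an illustration only, not the SGD approach this talk walks through, and `BipartiteRecommender` and the sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Ratings stored as a bipartite user-item graph, with a simple co-occurrence recommender. */
public class BipartiteRecommender {
    // Both directions of each user-item edge, so either side can list its neighbors.
    private final Map<String, Set<String>> itemsByUser = new HashMap<>();
    private final Map<String, Set<String>> usersByItem = new HashMap<>();

    /** Adds an edge for a positive rating (rating values are ignored in this sketch). */
    public void addRating(String user, String item) {
        itemsByUser.computeIfAbsent(user, k -> new HashSet<>()).add(item);
        usersByItem.computeIfAbsent(item, k -> new HashSet<>()).add(user);
    }

    /** Counts paths user -> rated item -> other user -> candidate item, for items the user has not rated. */
    public Map<String, Integer> scores(String user) {
        Set<String> seen = itemsByUser.getOrDefault(user, Set.of());
        Map<String, Integer> score = new TreeMap<>();
        for (String item : seen) {
            for (String other : usersByItem.get(item)) {
                if (other.equals(user)) continue;
                for (String candidate : itemsByUser.get(other)) {
                    if (!seen.contains(candidate)) score.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return score;
    }

    public static void main(String[] args) {
        BipartiteRecommender r = new BipartiteRecommender();
        r.addRating("alice", "matrix");
        r.addRating("alice", "inception");
        r.addRating("bob", "matrix");
        r.addRating("bob", "inception");
        r.addRating("bob", "interstellar");
        r.addRating("carol", "inception");
        r.addRating("carol", "titanic");
        System.out.println(r.scores("alice")); // {interstellar=2, titanic=1}
    }
}
```

"interstellar" scores higher because bob shares two rated items with alice, while carol shares only one.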

Giraph for Recommendation Systems

Stochastic Gradient Descent algorithm (SGD)
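
For reference, a common formulation of SGD for collaborative filtering by matrix factorization is given below; that the walk-through uses exactly this formulation is an assumption. Each user u has a latent vector p_u, each item i a latent vector q_i, R is the set of observed (user, item) ratings r_ui, γ is the learning rate and λ the regularization weight. For every observed rating, SGD takes one step on that rating's term of the objective:

```latex
\begin{aligned}
&\min_{P,\,Q} \sum_{(u,i) \in R} \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right) \\[4pt]
&e_{ui} = r_{ui} - p_u^{\top} q_i \\
&p_u \leftarrow p_u + \gamma \left( e_{ui}\, q_i - \lambda\, p_u \right) \\
&q_i \leftarrow q_i + \gamma \left( e_{ui}\, p_u - \lambda\, q_i \right)
\end{aligned}
```

The constant factor 2 from differentiating the squared terms is folded into γ, and both updates use the values of p_u and q_i from before the step. In graph terms each rating is an edge, so one update touches exactly the two vertices at its ends.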

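
The SGD update for matrix factorization can be sketched on a single machine in Java. For each rating it computes the error e = r − p_u·q_i and moves p_u along e·q_i − λ·p_u and q_i along e·p_u − λ·q_i. The toy ratings, rank 2, learning rate γ = 0.05 and regularization λ = 0.01 are arbitrary choices for illustration, and `SgdMatrixFactorization` is a hypothetical class, not Okapi's or Giraph's implementation.

```java
import java.util.Random;

/** Toy matrix factorization trained with SGD (hypothetical sketch, not Okapi's code). */
public class SgdMatrixFactorization {
    private final double[][] userFactors; // p_u, one latent vector per user
    private final double[][] itemFactors; // q_i, one latent vector per item

    public SgdMatrixFactorization(int users, int items, int rank, long seed) {
        Random rnd = new Random(seed);
        userFactors = new double[users][rank];
        itemFactors = new double[items][rank];
        // Small random initial values break the symmetry between latent dimensions.
        for (double[] row : userFactors) for (int k = 0; k < rank; k++) row[k] = 0.1 * rnd.nextDouble();
        for (double[] row : itemFactors) for (int k = 0; k < rank; k++) row[k] = 0.1 * rnd.nextDouble();
    }

    /** Predicted rating: dot product of p_u and q_i. */
    public double predict(int u, int i) {
        double dot = 0.0;
        for (int k = 0; k < userFactors[u].length; k++) dot += userFactors[u][k] * itemFactors[i][k];
        return dot;
    }

    /** One SGD pass; each rating row is {user, item, value}. */
    public void epoch(double[][] ratings, double gamma, double lambda) {
        for (double[] r : ratings) {
            int u = (int) r[0];
            int i = (int) r[1];
            double err = r[2] - predict(u, i);
            for (int k = 0; k < userFactors[u].length; k++) {
                // Both updates use the values from before this step.
                double pu = userFactors[u][k];
                double qi = itemFactors[i][k];
                userFactors[u][k] += gamma * (err * qi - lambda * pu);
                itemFactors[i][k] += gamma * (err * pu - lambda * qi);
            }
        }
    }

    /** Root-mean-square error over the given ratings. */
    public double rmse(double[][] ratings) {
        double sum = 0.0;
        for (double[] r : ratings) {
            double e = r[2] - predict((int) r[0], (int) r[1]);
            sum += e * e;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        // Hypothetical toy data: {user, item, rating} on a 1-5 scale.
        double[][] ratings = {{0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 2}, {2, 2, 5}};
        SgdMatrixFactorization mf = new SgdMatrixFactorization(3, 3, 2, 42L);
        double before = mf.rmse(ratings);
        for (int epoch = 0; epoch < 500; epoch++) mf.epoch(ratings, 0.05, 0.01);
        System.out.printf("training RMSE: %.3f before, %.3f after%n", before, mf.rmse(ratings));
        System.out.printf("predicted rating for user 0, unseen item 2: %.2f%n", mf.predict(0, 2));
    }
}
```

In a vertex-centric version (a design assumed here, not taken from the talk), user and item vertices would exchange their latent vectors as messages along the rating edges and apply these same updates inside their compute step.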

What’s more: Okapi ML
• The 1st advanced ML toolkit for Giraph
• Available as open source

Code available at: https://github.com/grafos-ml-okapi
Documentation: http://grafos.ml/okapi.html

The Okapi library

Collaborative filtering
• Alternating Least Squares
• Stochastic Gradient Descent
• Singular Value Decomposition
• Collaborative Less-is-More (CLiMF)
• Context-aware recommendation (TFMAP)
• Bayesian Personalized Ranking
• Popularity Ranking

Graph analytics
• Clustering coefficient
• Graph partitioning
• K-Core
• PageRank
• Semi-clustering
• Shortest distances
• SybilRank
• Triangle counting

Clustering
• Affinity propagation
• K-means

and adding …

What’s more: Giraph in Action
• The 1st book for Giraph
  – First steps with Giraph
  – Build applications
  – Integrate with other tools
  – More!
More details: http://manning.com/martella/
