Apache Giraph

Apache Giraph for applications in Machine Learning & Recommendation Systems
Maria Stylianou (@marsty5), Novartis

Züri Machine Learning Meetup #5

June 16, 2014

Outline
•  Machine Learning Cases
•  Why Apache Giraph?
•  Walk-through example for Recommendation Systems
•  What’s more?

Apache Giraph for applications in Machine Learning | Maria Stylianou


Machine Learning cases

•  Friend recommendation
•  Fake account detection


Machine Learning cases

•  Product recommendation
•  Online advertisements


Machine Learning cases

•  Route planning
•  Delivery scheduling


Machine Learning cases

Graphs are everywhere
Graphs need processing!


(Graphs ABC)

Graph: a representation of a set of objects
•  V = Vertices (nodes)
•  E = Edges (links)
Graphs capture the relationships between objects.
Graphs can be directed or undirected.
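
The definition above can be made concrete with a minimal sketch, assuming a graph is stored as an adjacency structure. The function name `build_adjacency` and the example vertices are hypothetical and only serve as illustration; they are not part of Giraph.

```python
def build_adjacency(vertices, edges, directed=True):
    """Return a dict mapping each vertex to the set of its neighbours."""
    adjacency = {v: set() for v in vertices}
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge can be followed in both directions.
            adjacency[target].add(source)
    return adjacency


# Hypothetical example data: V is the vertex set, E the edge set.
V = ["ann", "bob", "cat"]
E = [("ann", "bob"), ("bob", "cat")]

directed_graph = build_adjacency(V, E, directed=True)
undirected_graph = build_adjacency(V, E, directed=False)
```

In the directed version "bob" only points to "cat"; in the undirected version "bob" is connected to both "ann" and "cat".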

Graphs need processing!

So what?


Challenge #1

Scale of graphs
•  [logo] indexes ~50B pages
•  [logo] has ~1.1B users
•  [logo] has ~570M users
•  [logo] has ~530M users


Challenge #2

Complexity of graphs

Compute shortest distance from “google.com”
→  Multiple passes to compute the result
→  Inherent dependencies make it hard to parallelize
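
To illustrate why several passes are needed, here is a minimal sketch that counts link hops from a source page. It assumes a tiny hypothetical web graph in which every page appears as a key of `links`; the function name `shortest_hops` and the page names other than "google.com" are made up for the example.

```python
import math


def shortest_hops(links, source):
    """Repeat full passes over all links until no distance improves.

    Returns (distances, number_of_passes).
    """
    dist = {page: math.inf for page in links}
    dist[source] = 0
    passes = 0
    while True:
        passes += 1
        # Each pass reads only the results of the previous pass.
        updated = dict(dist)
        for page, targets in links.items():
            for target in targets:
                # A target's distance depends on its predecessor's distance.
                updated[target] = min(updated[target], dist[page] + 1)
        if updated == dist:
            return dist, passes
        dist = updated


# Hypothetical chain of pages: google.com -> a.com -> b.com -> c.com
links = {
    "google.com": ["a.com"],
    "a.com": ["b.com"],
    "b.com": ["c.com"],
    "c.com": [],
}
hops, passes = shortest_hops(links, "google.com")
```

Each pass can only push the known distances one hop further, so the chain of four pages takes four passes (three to reach "c.com" and one more to confirm that nothing changes). That dependency between passes is what makes the computation hard to parallelize.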


MapReduce
•  Well established
•  Efficient for big data analytics
•  Not efficient with iterative algorithms (stateless)
•  Graph algorithms are iterative


Why Apache Giraph?

Explicitly designed for graph processing, on top of the Hadoop ecosystem


The story
•  Google Pregel (2010)
•  Donated to ASF by Yahoo! (2011)
•  Apache Top Level Project (2012)
•  1.0 release (2013)
•  1.1 release (2014)

Supported by: Facebook, Yahoo!, LinkedIn


Giraph follows the Pregel model, or Bulk Synchronous Parallel (BSP)


“Thinking like a vertex”
I am a vertex!
How would I coordinate with other vertices to solve the problem?


Shortest Paths
I only know my value and who my neighbors are

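
The vertex's point of view can be sketched in plain Python. This is a toy simulation of the vertex-centric, superstep-based idea and does not use the Giraph API: the names `compute` and `run_supersteps`, the edge weights and the example graph are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def compute(value, messages, out_edges, superstep, is_source):
    """Vertex program: sees only its own value, its inbox and its out-edges.

    Returns (new_value, outgoing_messages).
    """
    if is_source and superstep == 0:
        candidate = 0
    else:
        candidate = min(messages, default=math.inf)
    if candidate < value:
        # Found a shorter path: adopt it and tell the neighbours.
        return candidate, [(t, candidate + w) for t, w in out_edges]
    return value, []  # nothing improved, so stay silent


def run_supersteps(graph, source):
    """Toy driver: run the vertices, deliver messages at the barrier, repeat."""
    values = {v: math.inf for v in graph}
    inbox = defaultdict(list)
    superstep = 0
    while superstep == 0 or inbox:
        outbox = defaultdict(list)
        for vertex, out_edges in graph.items():
            # After superstep 0, only vertices that received messages run.
            if superstep > 0 and vertex not in inbox:
                continue
            values[vertex], sent = compute(
                values[vertex], inbox.get(vertex, []), out_edges,
                superstep, vertex == source,
            )
            for target, distance in sent:
                outbox[target].append(distance)
        inbox = outbox  # barrier: messages become visible in the next superstep
        superstep += 1
    return values


# Hypothetical weighted graph: vertex -> list of (neighbor, edge weight).
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2)],
    "C": [("D", 1)],
    "D": [],
}
distances = run_supersteps(graph, "A")
```

With this example graph the run ends with distances A=0, B=1, C=3 and D=4. No vertex ever inspects the whole graph; each one only compares incoming messages with its own value and forwards improvements to its neighbors.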