Apache Giraph for applications in Machine Learning & Recommendation Systems Maria Stylianou @marsty5 Novartis
Züri Machine Learning Meetup #5
June 16, 2014
Outline
• Machine Learning Cases
• Why Apache Giraph?
• Walk-through example for Recommendation Systems
• What’s more?
Apache Giraph for applications in Machine Learning | Maria Stylianou
Machine Learning cases
• Friend recommendation
• Fake account detection
• Product recommendation
• Online advertisements
• Route planning
• Delivery scheduling
Graphs are everywhere
Graphs need processing!
(Graphs ABC)
Graph: a representation of a set of objects
• V = Vertices (nodes)
• E = Edges (links)
Graphs capture the relationship between objects
Graphs can be directed or undirected
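As a minimal sketch of these definitions, a graph can be stored as an adjacency list that maps each vertex to its neighbors. The `Graph` class and its method names are illustrative assumptions and are not part of Giraph.

```python
from collections import defaultdict


class Graph:
    """Adjacency-list graph: the keys are the vertices V, the stored links are the edges E."""

    def __init__(self, directed=True):
        self.directed = directed
        self.adj = defaultdict(set)  # vertex -> set of neighbor vertices

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v]  # touching the key registers v as a vertex
        if not self.directed:
            self.adj[v].add(u)  # an undirected edge is stored in both directions

    def vertices(self):
        return set(self.adj)

    def neighbors(self, u):
        return set(self.adj[u])


# Directed: "a follows b" does not imply "b follows a".
follows = Graph(directed=True)
follows.add_edge("a", "b")

# Undirected: friendship is mutual.
friends = Graph(directed=False)
friends.add_edge("a", "b")
```

In the directed graph only `a` has a neighbor, while in the undirected graph `a` and `b` are neighbors of each other.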
Graphs need processing!
So what?
Challenge #1
Scale of graphs
• [company logo] indexes ~50B pages
• [company logo] has ~1.1B users
• [company logo] has ~570M users
• [company logo] has ~530M users
Challenge #2
Complexity of graphs
Compute shortest distance from “google.com”
→ Multiple passes to compute the result
→ Inherent dependencies make it hard to parallelize
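To see why several passes are needed, here is a sketch of the classic edge-relaxation approach (Bellman-Ford style) on a tiny link graph. Only "google.com" comes from the slide; the other page names, the unit edge weights and the function name are invented for illustration.

```python
import math


def shortest_distances(edges, source):
    """Relax every edge repeatedly until no distance improves.

    edges: list of (from_vertex, to_vertex, weight) tuples.
    Returns (distances, number_of_passes).
    """
    vertices = {x for u, v, _ in edges for x in (u, v)}
    dist = {v: math.inf for v in vertices}
    dist[source] = 0
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for u, v, weight in edges:
            # A vertex can only improve once its predecessor has a value:
            # this dependency is what forces multiple passes.
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                changed = True
    return dist, passes


# A 3-hop chain, listed in an order that delays propagation.
edges = [
    ("c.com", "d.com", 1),
    ("b.com", "c.com", 1),
    ("google.com", "b.com", 1),
]
dist, passes = shortest_distances(edges, "google.com")
```

With this edge order each pass pushes the known distances one hop further, so the loop needs four passes (three that change something and one that confirms convergence) to settle a three-hop chain.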
MapReduce
• Well established
• Efficient for big data analytics
• Not efficient with iterative algorithms (stateless)
Graph algorithms are iterative
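A rough sketch of why statelessness hurts: in a MapReduce-style formulation of shortest paths, each iteration is a separate job, so the map step has to re-emit the graph structure alongside the candidate distances. The functions below simulate this in plain Python. They are hypothetical helpers and do not use the Hadoop API, and the toy graph is invented.

```python
import math
from collections import defaultdict


def map_phase(records):
    """records: {vertex: (distance, {neighbor: weight})}."""
    for vertex, (dist, out_edges) in records.items():
        # No state survives between jobs, so the adjacency list itself
        # must be shipped through the shuffle in every iteration.
        yield vertex, ("graph", out_edges)
        yield vertex, ("dist", dist)
        for neighbor, weight in out_edges.items():
            yield neighbor, ("dist", dist + weight)


def reduce_phase(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    result = {}
    for vertex, values in grouped.items():
        out_edges = next((v for tag, v in values if tag == "graph"), {})
        best = min(v for tag, v in values if tag == "dist")
        result[vertex] = (best, out_edges)
    return result


def run_jobs(records):
    """Driver: launch one full job per iteration until nothing changes."""
    jobs = 0
    while True:
        updated = reduce_phase(map_phase(records))
        jobs += 1
        if updated == records:
            return updated, jobs
        records = updated


records = {
    "google.com": (0, {"b.com": 1}),
    "b.com": (math.inf, {"c.com": 1}),
    "c.com": (math.inf, {}),
}
final, jobs = run_jobs(records)
```

Every job reads the whole graph, passes it through map and reduce, and writes it out again, even when only one distance changed in that round.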
Why Apache Giraph?
Explicitly designed for graph processing
on top of the Hadoop ecosystem
The story
• Google Pregel (2010)
• Donated to ASF by Yahoo! (2011)
• Apache Top Level Project (2012)
• 1.0 release (2013)
• 1.1 release (2014)
Supported by: Facebook, Yahoo!, LinkedIn
Giraph follows the Pregel model, or Bulk Synchronous Parallel (BSP)
“Thinking like a vertex”
I am a vertex! How would I coordinate with other vertices to solve the problem?
Shortest Paths
I only know my value and who my neighbors are
Receive messages → Update value → Send messages
Vertices compute asynchronously
Global Synchronization
Synchronization barrier
And again
And again
Giraph super powers
• Message-passing communication
• In-memory computation → stateful
• Global synchronization
• Iterations → Iterations → Iterations
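The superstep loop described above can be sketched as a single-process simulation of the Pregel model applied to shortest paths. This is an illustrative Python sketch, not Giraph code: Giraph applications are written in Java against Giraph's own API, and the function names, the toy graph and its weights are assumptions.

```python
import math
from collections import defaultdict


def compute(value, messages, out_edges, superstep, is_source):
    """Vertex program: receive messages -> update value -> send messages.

    The vertex only sees its own value, its incoming messages and its
    outgoing edges. Returns (new_value, outgoing_messages).
    """
    candidate = min(messages, default=math.inf)
    if superstep == 0 and is_source:
        candidate = 0
    if candidate < value:
        # Value improved: tell the neighbors about the new distance.
        return candidate, [(n, candidate + w) for n, w in out_edges.items()]
    return value, []  # nothing new: stay silent (vote to halt)


def run_bsp(graph, source):
    """graph: {vertex: {neighbor: weight}}. Returns (values, supersteps)."""
    values = {v: math.inf for v in graph}  # in-memory state kept across supersteps
    inbox = defaultdict(list)
    superstep = 0
    while superstep == 0 or inbox:
        outbox = defaultdict(list)
        for v in graph:  # conceptually, all vertices run in parallel here
            values[v], sent = compute(
                values[v], inbox.get(v, []), graph[v], superstep, v == source
            )
            for target, message in sent:
                outbox[target].append(message)
        # Synchronization barrier: messages sent in this superstep only
        # become visible in the next one.
        inbox = outbox
        superstep += 1
    return values, superstep


graph = {"a": {"b": 1, "c": 4}, "b": {"c": 2}, "c": {"d": 1}, "d": {}}
values, supersteps = run_bsp(graph, "a")
```

Unlike the stateless MapReduce approach, the vertex values and edges stay in memory between supersteps, and only the messages move. The computation stops once a superstep produces no messages.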
Recommendation Systems
Collaborative Filtering
Recommendation systems technique
Giraph for Recommendation Systems
Stochastic Gradient Descent algorithm (SGD)
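The slide text does not spell out the update rule, so the following is a sketch under a common assumption: a rating is predicted as the dot product of a user vector and an item vector (matrix factorization), and SGD nudges both vectors after each observed rating. In graph terms, users and items are vertices and each rating is an edge. The hyperparameters, toy ratings and function names are illustrative and are not taken from the talk or from Okapi.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(ratings, users, items):
    errors = [(r - dot(users[u], items[i])) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(errors) / len(errors))


def init_vectors(ratings, rank=2, seed=0):
    """Give every user vertex and item vertex a small random latent vector."""
    rng = random.Random(seed)
    users = {u: [rng.uniform(0.1, 0.5) for _ in range(rank)]
             for u in sorted({u for u, _, _ in ratings})}
    items = {i: [rng.uniform(0.1, 0.5) for _ in range(rank)]
             for i in sorted({i for _, i, _ in ratings})}
    return users, items


def sgd_train(ratings, users, items, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Update user and item vectors in place, one rating (edge) at a time."""
    rng = random.Random(seed)
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            p, q = users[u], items[i]
            err = r - dot(p, q)  # prediction error on this edge
            for k in range(len(p)):
                p_k, q_k = p[k], q[k]
                # Gradient step with L2 regularization on both endpoints.
                p[k] += lr * (err * q_k - reg * p_k)
                q[k] += lr * (err * p_k - reg * q_k)
    return users, items


# (user, item, rating) edges of a toy bipartite graph.
ratings = [
    ("alice", "matrix", 5), ("alice", "titanic", 1),
    ("bob", "matrix", 4), ("bob", "titanic", 1),
    ("carol", "matrix", 1), ("carol", "titanic", 5),
]
users, items = init_vectors(ratings)
before = rmse(ratings, users, items)
sgd_train(ratings, users, items)
after = rmse(ratings, users, items)
```

In a vertex-centric setting, one plausible mapping is that user and item vertices take turns sending their vectors along the rating edges, with each receiving vertex applying the update above. How the talk's implementation arranged this is not recoverable from the slide text.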
What’s more
Okapi ML
• The 1st advanced ML toolkit for Giraph
• Available as open source
Code available at: https://github.com/grafos-ml-okapi
Documentation: http://grafos.ml/okapi.html
The Okapi library
Collaborative filtering
• Alternating Least Squares
• Stochastic Gradient Descent
• Singular Value Decomposition
• Collaborative Less-is-More (CLiMF)
• Context-aware recommendation (TFMAP)
• Bayesian Personalized Ranking
• Popularity Ranking
Graph analytics
• Clustering coefficient
• Graph partitioning
• K-Core
• PageRank
• Semi-clustering
• Shortest distances
• SybilRank
• Triangle counting
Clustering
• Affinity propagation
• K-means
and adding …
What’s more
Giraph in Action
• The 1st book for Giraph
  – First steps with Giraph
  – Build applications
  – Integrate with other tools
  – More!
More details: http://manning.com/martella/