Apache Giraph for applications in Machine Learning & Recommendation Systems Maria Stylianou @marsty5 Novartis
Züri Machine Learning Meetup #5
June 16, 2014
Outline
• Machine Learning Cases
• Why Apache Giraph?
• Walk-through example for Recommendation Systems
• What’s more?
Machine Learning cases
• Friend recommendation
• Fake account detection
Machine Learning cases
• Product recommendation
• Online advertisements
Machine Learning cases
• Route planning
• Delivery scheduling
Machine Learning cases
Graphs are everywhere. Graphs need processing!
(Graphs ABC)
Graph: a representation of a set of objects
• V = vertices (nodes)
• E = edges (links)
• Graphs capture the relationships between objects
• Graphs can be directed or undirected
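As a minimal sketch of these definitions, a graph can be held as a set of vertices plus an adjacency map for the edges. The `Graph` class and its method names below are hypothetical, chosen only for illustration; they are not part of Giraph.

```python
from collections import defaultdict


class Graph:
    """Tiny illustrative graph: V is a set of vertices, E is stored per vertex."""

    def __init__(self, directed=True):
        self.directed = directed
        self.vertices = set()               # V
        self.adjacency = defaultdict(set)   # E, grouped by source vertex

    def add_edge(self, src, dst):
        self.vertices.update((src, dst))
        self.adjacency[src].add(dst)
        if not self.directed:
            # An undirected edge can be followed from either endpoint.
            self.adjacency[dst].add(src)

    def neighbors(self, vertex):
        return sorted(self.adjacency.get(vertex, ()))


follows = Graph(directed=True)    # "alice follows bob" is one-way
follows.add_edge("alice", "bob")

friends = Graph(directed=False)   # friendship goes both ways
friends.add_edge("alice", "bob")
```

In the directed graph, `bob` has no outgoing neighbors; in the undirected one, `bob` and `alice` are neighbors of each other.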
Graphs need processing!
So what?
Challenge #1
Scale of graphs
• [company logo] indexes ~50B pages
• [company logo] has ~1.1B users
• [company logo] has ~570M users
• [company logo] has ~530M users
Challenge #2
Complexity of graphs
Compute shortest distance from “google.com”
→ Multiple passes to compute the result
→ Inherent dependencies make it hard to parallelize
MapReduce
• Well established
• Efficient for big data analytics
• Not efficient with iterative algorithms (stateless)
• Graph algorithms are iterative
Why Apache Giraph?
Explicitly designed for graph processing, on top of the Hadoop ecosystem
The story
• Google Pregel (2010)
• Donated to ASF by Yahoo! (2011)
• Apache Top Level Project (2012)
• 1.0 release (2013)
• 1.1 release (2014)
Supported by: Facebook, Yahoo!, LinkedIn
Giraph follows the Pregel model, which is based on Bulk Synchronous Parallel (BSP)
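To make the superstep idea concrete, here is a small single-process sketch of a BSP-style loop that propagates the largest value through a graph. It only imitates the model: the function name `run_supersteps` and the dictionaries used for inboxes are assumptions made for this illustration, not Giraph's API.

```python
def run_supersteps(values, edges, max_supersteps=50):
    """Propagate the maximum value through the graph, one superstep at a time.

    values: dict vertex -> initial value
    edges:  dict vertex -> list of out-neighbors
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbors.
    inbox = {v: [] for v in values}
    for vertex, value in values.items():
        for neighbor in edges.get(vertex, []):
            inbox[neighbor].append(value)

    superstep = 1
    while any(inbox.values()) and superstep < max_supersteps:
        outbox = {v: [] for v in values}
        # Compute phase: a vertex sees only its own value and its messages.
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                values[vertex] = max(messages)
                for neighbor in edges.get(vertex, []):
                    outbox[neighbor].append(values[vertex])
            # Otherwise the vertex stays silent (it "votes to halt").
        # Barrier: messages sent now are only delivered in the next superstep.
        inbox = outbox
        superstep += 1
    return values


chain_values = {"a": 3, "b": 6, "c": 2, "d": 1}
chain_edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = run_supersteps(chain_values, chain_edges)
```

The loop stops once no vertex sends a message, at which point every vertex in this connected chain holds the maximum value.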
“Thinking like a vertex”
I am a vertex! How would I coordinate with other vertices to solve the problem?
Shortest Paths
I only know my value and who my neighbors are
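A rough sketch of shortest paths from the vertex's point of view: each vertex keeps its best known distance, and when a message offers a shorter one it updates itself and tells its neighbors. The function `shortest_paths`, the sample graph, and the edge weights are hypothetical; the sketch assumes non-negative weights and that every vertex appears as a key in `edges`. It is plain Python, not Giraph code.

```python
import math


def shortest_paths(edges, source):
    """Single-source shortest paths in a vertex-centric style.

    edges: dict vertex -> {neighbor: edge weight}; every vertex must be a key.
    """
    distance = {v: math.inf for v in edges}
    inbox = {v: [] for v in edges}
    inbox[source].append(0)   # the source starts with distance 0

    while any(inbox.values()):
        outbox = {v: [] for v in edges}
        for vertex, messages in inbox.items():
            if not messages:
                continue      # nothing new for this vertex
            best = min(messages)
            if best < distance[vertex]:
                # A shorter path was found: keep it and notify the neighbors.
                distance[vertex] = best
                for neighbor, weight in edges[vertex].items():
                    outbox[neighbor].append(best + weight)
        inbox = outbox        # messages are delivered in the next superstep
    return distance


web = {
    "google.com": {"a": 1, "b": 4},
    "a": {"b": 2, "c": 6},
    "b": {"c": 1},
    "c": {},
}
distances = shortest_paths(web, "google.com")
```

No vertex ever looks at the whole graph: the distances emerge from repeated local updates, which is why the result takes several passes to settle.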