Building Intelligent Systems with Large Scale Deep Learning

Jeff Dean, Google Brain team, g.co/brain
Presenting the work of many people at Google

Google Brain Team Mission: Make Machines Intelligent. Improve People’s Lives.

How do we do this?
● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)
  ○ Unsupervised learning of cats, Inception, word2vec, seq2seq, DeepDream, image captioning, neural translation, Magenta, ML for robotics control, healthcare, …

● Build and open-source systems like TensorFlow (see tensorflow.org and https://github.com/tensorflow/tensorflow)

● Collaborate with others at Google and Alphabet to get our work into the hands of billions of people (e.g., RankBrain for Google Search, GMail Smart Reply, Google Photos, Google speech recognition, Google Translate, Waymo, …)

● Train new researchers through internships and the Google Brain Residency program

Main Research Areas
● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation

research.googleblog.com/2017/01/the-google-brain-team-looking-back-on.html

[Figure: accuracy vs. scale (data size, model size) for neural networks and other approaches. Panels: 1980s and 1990s; 1980s and 1990s (more compute); Now (more compute).]

Growth of Deep Learning at Google

[Figure: directories containing model description files (and many more . . .)]

Experiment Turnaround Time and Research Productivity
● Minutes, Hours:
  ○ Interactive research! Instant gratification!
● 1-4 days
  ○ Tolerable
  ○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks
  ○ High value experiments only
  ○ Progress stalls
● >1 month
  ○ Don’t even try

Build the right tools

Google Confidential + Proprietary (permission granted to share within NIST)

Open, standard software for general machine learning
Great for Deep Learning in particular
http://tensorflow.org/ and https://github.com/tensorflow/tensorflow
First released Nov 2015
Apache 2.0 license

TensorFlow Goals
● Establish common platform for expressing machine learning ideas and systems
● Make this platform the best in the world for both research and production use
● Open source it so that it becomes a platform for everyone, not just Google

TensorFlow Scaling

Near-linear performance gains with each additional 8x NVIDIA® Tesla® K80 server added to the cluster

TensorFlow supports many platforms
CPU, GPU, iOS, Android, Raspberry Pi, 1st-gen TPU, Cloud TPU

TensorFlow supports many languages
Java


ML is done in many places

TensorFlow GitHub stars by GitHub user profiles w/ public locations Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow

TensorFlow: A Vibrant Open-Source Community
● Rapid development, many outside contributors
  ○ 475+ non-Google contributors to TensorFlow 1.0
  ○ 15,000+ commits in 15 months
  ○ Many community created tutorials, models, translations, and projects
    ■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title
● Direct engagement between community and TensorFlow team
  ○ 5000+ Stack Overflow questions answered
  ○ 80+ community-submitted GitHub issues responded to weekly
● Growing use in ML classes: Toronto, Berkeley, Stanford, ...

Google Photos

[glacier]

Reuse same model for completely different problems
Same basic model structure trained on different data, useful in completely different contexts
Example: given image → predict interesting pixels

www.google.com/sunroof

We have tons of vision problems
Image search, StreetView, Satellite Imagery, Translation, Robotics, Self-driving Cars, …

Computers can now see
Large implications for healthcare


MEDICAL IMAGING Using similar model for detecting diabetic retinopathy in retinal images

Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91). http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html

Computers can now see
Large implications for robotics


Combining Vision with Robotics
“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March 2016
“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199

Self-Supervised and End-to-end Pose Estimation


TCN + Self-Supervision (No Labels!)


Scientific Applications of ML


Predicting Properties of Molecules
[Figure: a molecule (Aspirin) is fed to a Message Passing Neural Net, which predicts: Toxic? Bind with a given protein? Quantum properties.]
● Chemical space is too big, so chemists often rely on virtual screening.
● Machine Learning can help search this large space.
● Molecules are graphs, nodes=atoms and edges=bonds (and other stuff)
● Message Passing Neural Nets unify and extend many neural net models that are invariant to graph symmetries
● State of the art results predicting output of expensive quantum chemistry calculations, but ~300,000 times faster
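To make the "molecules are graphs" and "invariant to graph symmetries" points concrete, here is a minimal sketch of message passing in plain Python. It is an illustration under stated assumptions, not the model from the cited papers: the update function is a fixed tanh instead of learned weights, and the 3-atom graph, its feature vectors, and its bond weights are hypothetical.

```python
import math


def message_passing_step(states, edges):
    """One round: each atom sums bond-weighted neighbor states, then updates."""
    messages = [[0.0] * len(state) for state in states]
    for i, j, bond_weight in edges:  # undirected bond between atoms i and j
        for k in range(len(states[i])):
            messages[i][k] += bond_weight * states[j][k]
            messages[j][k] += bond_weight * states[i][k]
    # Toy update: a fixed nonlinearity stands in for learned parameters.
    return [[math.tanh(h + m) for h, m in zip(state, message)]
            for state, message in zip(states, messages)]


def readout(states):
    """Sum over atoms, so the result does not depend on atom ordering."""
    return [sum(column) for column in zip(*states)]


def predict(states, edges, rounds=3):
    for _ in range(rounds):
        states = message_passing_step(states, edges)
    return readout(states)


# Hypothetical 3-atom chain (not a real featurization of any molecule).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
edges = [(0, 1, 1.0), (1, 2, 2.0)]

# Relabel the atoms: 0 -> 2, 1 -> 0, 2 -> 1.
perm = {0: 2, 1: 0, 2: 1}
permuted_states = [None] * len(states)
for old, new in perm.items():
    permuted_states[new] = states[old]
permuted_edges = [(perm[i], perm[j], w) for i, j, w in edges]

original = predict(states, edges)
relabeled = predict(permuted_states, permuted_edges)
```

Up to floating-point rounding, `original` and `relabeled` agree, because both the neighbor sum and the readout sum ignore the order in which atoms are numbered.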

https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html and https://arxiv.org/abs/1702.05532 and https://arxiv.org/abs/1704.01212 (latter to appear in ICML 2017)

Measuring live cells with image to image regression “Seeing More”

Enabling technology: Image to image regression
[Figure: depth prediction on portrait data. Panels: Input, True Depth, Predicted Depth.]

Applications for camera effects
[Figure panels: Input, Saturation, Defocus]

Predict cellular markers from transmission microscopy?

Human cancer cells / DIC / nuclei (blue) and cell mask (green)

Human iPSC neurons / phase contrast / nuclei (blue), dendrites (green), and axons (red)

Scaling language understanding models


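Sequence-to-Sequence Model [Sutskever & Vinyals & Le NIPS 2014]
[Figure: a Deep LSTM reads the input sequence A B C D into a vector v, then emits the target sequence X Y Z Q, with each emitted symbol fed back in as the next input (__ X Y Z)]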

Sequence-to-Sequence Model: Machine Translation [Sutskever & Vinyals & Le NIPS 2014]
[Figure, built up over several slides: the input sentence “Quelle est votre taille?” is encoded into a vector v, and the target sentence “How tall are you?” is produced one word at a time, with each produced word fed back in as the next input]
At inference time: Beam search to choose most probable over possible output sequences
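The beam search step at inference time can be sketched as follows. This is a minimal illustration, not the decoder used in the cited work: the next-word table `TOY_MODEL` and its probabilities are hypothetical stand-ins for a trained decoder, the model looks only at the previous word, and no length normalization is applied.

```python
import math

# Hypothetical next-word probabilities, keyed by the previous word.
TOY_MODEL = {
    "<s>": {"What": 0.6, "How": 0.4},
    "What": {"is": 0.5, "size": 0.5},
    "How": {"tall": 0.9, "big": 0.1},
    "is": {"</s>": 1.0},
    "size": {"</s>": 1.0},
    "tall": {"are": 1.0},
    "big": {"are": 1.0},
    "are": {"you?": 1.0},
    "you?": {"</s>": 1.0},
}


def next_log_probs(tokens):
    """Log-probabilities of each possible next word given the prefix."""
    previous = tokens[-1] if tokens else "<s>"
    return {word: math.log(p) for word, p in TOY_MODEL[previous].items()}


def beam_search(beam_width, max_len=10, eos="</s>"):
    beams = [(0.0, [])]  # (cumulative log-probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for word, log_p in next_log_probs(tokens).items():
                candidates.append((score + log_p, tokens + [word]))
        # Keep only the beam_width most probable partial outputs.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    return max(finished, key=lambda c: c[0])


greedy_score, greedy_tokens = beam_search(beam_width=1)
beam_score, beam_tokens = beam_search(beam_width=2)
```

With these toy numbers, a beam of width 1 (greedy decoding) commits to the locally most likely first word and ends with a sequence of probability 0.3, while a beam of width 2 keeps the alternative alive and finds "How tall are you?" with probability 0.36.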

Smart Reply (Google Research Blog - Nov 2015)
[Figure: Incoming Email → Small Feed-Forward Neural Network → Activate Smart Reply? yes/no → Deep Recurrent Neural Network → Generated Replies]

Smart Reply
April 1, 2009: April Fool’s Day joke
Nov 5, 2015: Launched Real Product
Feb 1, 2016: >10% of mobile Inbox replies

Sequence to Sequence model applied to Google Translate


https://arxiv.org/abs/1609.08144

Google Neural Machine Translation Model
[Figure: one model replica is one machine w/ 8 GPUs. 8 layers of Encoder LSTMs and Decoder LSTMs are placed on Gpu1 through Gpu8, with + (residual) connections between layers, Attention between encoder and decoder, and a SoftMax output. Inputs X2, X3; outputs Y1, Y2, Y3.]

Model + Data Parallelism
● Parameters distributed across many parameter server machines
● Many replicas
[Figure: parameter shards (Params, Params, ..., Params) and many model replicas]
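A minimal sketch of the data-parallel half of this picture, assuming a toy linear model: parameters are split across parameter shards, and each replica pulls the current parameters, computes a gradient on its own data, and pushes the result back. Everything here (`ParameterShard`, `replica_step`, the data, the learning rate) is hypothetical, and the replicas run one after another in a single process instead of on separate machines.

```python
class ParameterShard:
    """One parameter server holding a slice of the model parameters."""

    def __init__(self, values, learning_rate):
        self.values = list(values)
        self.learning_rate = learning_rate

    def pull(self):
        return list(self.values)

    def push(self, gradients):
        # Apply a gradient-descent update to this shard's slice only.
        self.values = [v - self.learning_rate * g
                       for v, g in zip(self.values, gradients)]


def replica_step(shards, batch):
    """One model replica: pull parameters, compute a gradient on its own data, push."""
    weights = [w for shard in shards for w in shard.pull()]
    gradient = [0.0] * len(weights)
    for features, target in batch:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for k, x in enumerate(features):
            gradient[k] += error * x / len(batch)
    offset = 0
    for shard in shards:
        size = len(shard.values)
        shard.push(gradient[offset:offset + size])
        offset += size


# Toy data generated from target = 2 * x0 - 3 * x1, split across two replicas.
replica_batches = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0)],
    [((1.0, 1.0), -1.0), ((1.0, -1.0), 5.0)],
]
shards = [ParameterShard([0.0], 0.1), ParameterShard([0.0], 0.1)]

for _ in range(200):
    for batch in replica_batches:
        replica_step(shards, batch)

learned = [w for shard in shards for w in shard.pull()]
```

After training, `learned` is close to the generating weights 2 and -3, even though no single replica saw all of the data and no single shard holds all of the parameters.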

Neural Machine Translation
[Figure: bar chart of translation quality (scale 0 to 6, where 6 is perfect translation) for human, neural (GNMT), and phrase-based (PBMT) translation, by translation model: English > Spanish, English > French, English > Chinese, Spanish > English, French > English, Chinese > English]
Closes gap between old system and human-quality translation by 58% to 87%

Enables better communication across the world

research.googleblog.com/2016/09/a-neural-network-for-machine.html

BACKTRANSLATION FROM JAPANESE (en->ja->en)

Phrase-Based Machine Translation (old system):
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.

Google Neural Machine Translation (new system):
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” God ‘s house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.

Automated machine learning (“learning to learn”)


Current: Solution = ML expertise + data + computation

Can we turn this into: Solution = data + 100X computation ???

Early encouraging signs
Trying multiple different approaches:
(1) RL-based architecture search
(2) Model architecture evolution
(3) Learn how to optimize

Idea: model-generating model trained via RL (appeared in ICLR 2017, arxiv.org/abs/1611.01578)
(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as reinforcement learning signal
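The three-step loop can be sketched with a much simpler controller than the one in the paper. This is an illustration under stated assumptions: the controller is a set of independent softmax policies (one per decision) updated with REINFORCE, the search space `SEARCH_SPACE` is hypothetical, and `toy_loss` is a made-up stand-in for actually training each generated model for a few hours.

```python
import math
import random

# Hypothetical search space: each architecture is one pick per decision.
SEARCH_SPACE = {"num_layers": [2, 4, 8], "filter_size": [1, 3, 5]}


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_loss(arch):
    """Stand-in for training a generated model; lowest at 8 layers, filter size 3."""
    return abs(arch["num_layers"] - 8) / 8 + abs(arch["filter_size"] - 3) / 3


def architecture_search(rounds=200, models_per_round=10, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    logits = {name: [0.0] * len(options) for name, options in SEARCH_SPACE.items()}
    for _ in range(rounds):
        probs = {name: softmax(values) for name, values in logits.items()}
        samples = []
        for _ in range(models_per_round):
            # (1) Generate a model by sampling one option per decision.
            picks = {name: rng.choices(range(len(p)), weights=p)[0]
                     for name, p in probs.items()}
            arch = {name: SEARCH_SPACE[name][i] for name, i in picks.items()}
            # (2) "Train" it, and (3) use the negative loss as the reward.
            samples.append((picks, -toy_loss(arch)))
        baseline = sum(reward for _, reward in samples) / len(samples)
        # REINFORCE: raise the log-probability of picks that beat the baseline.
        for picks, reward in samples:
            advantage = reward - baseline
            for name, chosen in picks.items():
                for i, p in enumerate(probs[name]):
                    indicator = 1.0 if i == chosen else 0.0
                    logits[name][i] += (learning_rate * advantage
                                        * (indicator - p) / len(samples))
    # Report the option the controller now prefers for each decision.
    return {name: SEARCH_SPACE[name][values.index(max(values))]
            for name, values in logits.items()}


best = architecture_search()
```

The controller only ever sees the reward of the models it generated, which is why the approach trades ML expertise for computation: every reward in a real run costs a full training job.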

CIFAR-10 Image Recognition Task

Penn Tree Bank Language Modeling Task
[Figure: “Normal” LSTM cell vs. cell discovered by architecture search]

Learn2Learn: Learn the Optimization Update Rule

Neural Optimizer Search using Reinforcement Learning, Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017

More computational power needed
Deep learning is transforming how we design computers


Special computation properties
● reduced precision ok: about 1.2 × about 0.6 = about 0.7, NOT 1.21042 × 0.61127 = 0.73989343
● handful of specific operations
[Figure: ... × ... = ...]
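The reduced-precision point can be checked numerically. The sketch below multiplies the slide's two numbers at full (64-bit) precision and again with every value rounded to IEEE 754 half precision (16 bits). Half precision is an assumption chosen for illustration; the slide does not say which reduced-precision format is meant.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


a, b = 1.21042, 0.61127

exact = a * b                                    # full double precision
approx = to_half(to_half(a) * to_half(b))        # inputs and result in 16 bits
relative_error = abs(approx - exact) / exact

# The slide's "about" version: one decimal place everywhere.
about = round(round(a, 1) * round(b, 1), 1)
```

The half-precision product differs from the exact one by well under one percent, which is the kind of error the slide describes as acceptable.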

Tensor Processing Unit v2

Revealed in May at Google I/O

Google-designed device for neural net training and inference
● 180 teraflops of computation, 64 GB of memory

TPU Pod
● 64 2nd-gen TPUs
● 11.5 petaflops
● 4 terabytes of memory
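The pod figures follow from the per-device figures by simple multiplication. A quick check, assuming 1 petaflop = 1000 teraflops and 1 terabyte = 1024 GB (the second convention is an assumption that makes the memory figure come out to exactly 4):

```python
TPUS_PER_POD = 64
TERAFLOPS_PER_TPU = 180
MEMORY_GB_PER_TPU = 64

pod_petaflops = TPUS_PER_POD * TERAFLOPS_PER_TPU / 1000   # 11.52, quoted as 11.5
pod_memory_tb = TPUS_PER_POD * MEMORY_GB_PER_TPU / 1024   # 4.0
```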

Programmed via TensorFlow
Same program will run with only minor modifications on CPUs, GPUs, & TPUs

Will be available through Google Cloud
Cloud TPU - virtual machine w/180 TFLOPS TPUv2 device attached

Making 1000 Cloud TPUs available for free to top researchers who are committed to open machine learning research
We’re excited to see what researchers will do with much more computation!
g.co/tpusignup

Machine Learning in Google Cloud
Custom ML models: TensorFlow, Machine Learning Engine
Pre-trained ML models: Vision API, Speech API, Jobs API, Natural Language API, Translation API, Video Intelligence API

Machine Learning for Higher Performance Machine Learning Models


Device Placement with Reinforcement Learning
Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node
Measured time per step gives RL reward signal
+19.3% faster vs. expert human for NMT model
+19.7% faster vs. expert human for InceptionV3

Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou, Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972

[Figure: accuracy vs. scale, with more compute. Panels: Now; Future.]

Example queries of the future
● Which of these eye images shows symptoms of diabetic retinopathy?
● Please fetch me a cup of tea from the kitchen
● Describe this video in Spanish
● Find me documents related to reinforcement learning for robotics and summarize them in German

Conclusions
Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …
If you’re not considering how to use deep neural nets to solve your problems, you almost certainly should be

More info about our work: g.co/brain

Thanks!