Getting Started with TensorFlow
O’Reilly AI Conf, April 30, 2018, NYC

Amy Unruh

Your guide

These slides: bit.ly/tf-aiconf

Amy [email protected] @amygdala bit.ly/tf-aiconf

bit.ly/tensorflow-workshop

Welcome and Logistics
● About the workshop, intros
● Break: 10:30-11
● TAs: Sara and Mallika

These slides: bit.ly/tf-aiconf


Overview of the Workshop
● Intro, setup, logistics
● What is TensorFlow (& Keras)?
● A story: what might an experimentation workflow look like?
● Intro to some of TensorFlow’s high-level APIs: the Estimator, managing input, ...

Feature columns

# Categorical base columns.
gender = categorical_column_with_vocabulary_list(
    key="gender", vocabulary_list=["female", "male"])
education = categorical_column_with_hash_bucket(
    key="education", hash_bucket_size=1000)
...
# Continuous base columns.
age = numeric_column("age")
...


Feature columns

# Transformations.
age_buckets = bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_occupation = crossed_column(
    ["education", "occupation"], hash_bucket_size=int(1e4))
...
# Embeddings for deep learning.
embedding_column(workclass, dimension=8)


Feature columns and Estimators!

model = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=base_columns + crossed_columns)

Bucketing: partition by range
Crossing: create new combinations
Hashing: limit size
Embedding: learn a new representation

See also: tf.feature_column.input_layer
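As a rough sketch of how these pieces fit together for a deep model (the feature names and values here are hypothetical, loosely following the census-style columns above):

import tensorflow as tf

# A toy features dict, shaped like what an input_fn would return.
features = {
    "age": tf.constant([[23.0], [31.0]]),
    "education": tf.constant([["Bachelors"], ["Masters"]]),
}

age = tf.feature_column.numeric_column("age")
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education = tf.feature_column.categorical_column_with_hash_bucket(
    key="education", hash_bucket_size=1000)
education_emb = tf.feature_column.embedding_column(education, dimension=8)

# input_layer combines the features dict and the columns into one dense
# tensor, ready to feed into the first layer of a DNN.
net = tf.feature_column.input_layer(features, [age_buckets, education_emb])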

Typical flow
● Define an input function to load ...

...
prediction_output = tf.estimator.export.PredictOutput({
    "classes": tf.argmax(input=logits, axis=1),
    "probabilities": tf.nn.softmax(logits, name="softmax_tensor")})
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        export_outputs={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                prediction_output})
...

...
# Calculate Loss (for both TRAIN and EVAL modes)
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits)

# Generate some summary info
tf.summary.scalar('loss', loss)

# Configure the Training Op (for TRAIN mode)
if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
    train_op = optimizer.minimize(
        loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

# Add evaluation metrics (for EVAL mode)
eval_metric_ops = {
    "accuracy": tf.metrics.accuracy(
        labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
    mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
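With a model_fn like the one above, wiring it into a custom Estimator and driving training and evaluation might look like this sketch (the cnn_model_fn name and the input functions are hypothetical; the input functions could be built with tf.data, as shown later):

import tensorflow as tf

# Instantiate an Estimator around the custom model_fn; checkpoints and
# TensorBoard summaries land in model_dir.
classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

classifier.train(input_fn=train_input_fn, steps=20000)
eval_results = classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)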

Using tf.layers to define our CNN

conv1 = tf.layers.conv2d(
    inputs=input_layer, filters=32, kernel_size=[5, 5],
    padding="same", activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
conv2 = tf.layers.conv2d(
    inputs=pool1, filters=64, kernel_size=[5, 5],
    padding="same", activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
dense = tf.layers.dense(inputs=pool2_flat, units=1024,
                        activation=tf.nn.relu, name="dense1")
dropout = tf.layers.dropout(
    inputs=dense, rate=0.4,
    training=(mode == tf.estimator.ModeKeys.TRAIN))
logits = tf.layers.dense(inputs=dropout, units=10)
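The input_layer used above isn't defined on this slide; in the standard MNIST Estimator tutorial it is a reshape of the incoming features, along these lines (an assumption based on that tutorial):

# Reshape flat 784-pixel feature vectors into 28x28x1 images (NHWC);
# -1 lets the batch dimension be inferred.
input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])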


# K here is the Keras backend, e.g.: from tensorflow.keras import backend as K
if mode == tf.estimator.ModeKeys.TRAIN:
    K.set_learning_phase(True)
else:
    K.set_learning_phase(False)

Using keras.layers to define our CNN

# Assumes: from tensorflow.keras.layers import (Convolution2D, MaxPooling2D,
#                                               Flatten, Dense, Dropout)
conv1 = Convolution2D(32, (5, 5), activation='relu',
                      input_shape=(28, 28, 1))(input_layer)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Convolution2D(64, (5, 5), activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
pool2_flat = Flatten()(pool2)
dense = Dense(1024, activation='relu')(pool2_flat)
dropout = Dropout(0.4)(dense)
logits = Dense(10, activation='linear')(dropout)
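The same architecture can also be built as a standalone tf.keras Model and, if you want to stay in the Estimator world, converted with model_to_estimator. A sketch (the softmax output head and the compile settings are choices of this example, not from the slides):

import tensorflow as tf

layers = tf.keras.layers

inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, (5, 5), activation='relu')(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(64, (5, 5), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Convert the compiled Keras model into an Estimator.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)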


Comparing ‘regular’ with Fashion MNIST
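Fashion MNIST is a drop-in replacement for MNIST: the same 28x28 grayscale format and ten classes, but harder to classify. One way to load it, assuming a tf.keras version that bundles the dataset:

import tensorflow as tf

# 60k train / 10k test images shaped (28, 28), labels 0-9.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values to [0, 1] before feeding them to a model.
train_images = train_images / 255.0
test_images = test_images / 255.0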


Maybe there are better models?...


Extras: tf.data and Datasets: Performance Considerations

A typical TensorFlow training input pipeline can be framed as an ETL process:
● Extract: read data from persistent storage, either local or remote
● Transform: use CPU cores to parse and perform preprocessing operations on the data
  ○ image decompression, data augmentation
  ○ shuffling and batching...
● Load: load the transformed data onto the accelerator device(s) that execute the machine learning model
  ○ GPUs, TPUs...

● Viewing input pipelines as an ETL process provides structure that facilitates the application of performance optimizations.
  ○ Utilize the CPU effectively, while reserving the accelerator for the heavy lifting of training your model.
● The tf.data API is an easier and more performant way to create input pipelines for TensorFlow models: https://www.tensorflow.org/api_docs/python/tf/data

def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
    return image, parsed["label"]

def input_fn():
    files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
    dataset = files.interleave(tf.data.TFRecordDataset)
    dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
    dataset = dataset.map(map_func=parse_fn)
    dataset = dataset.batch(batch_size=FLAGS.batch_size)
    return dataset

(images from: https://goo.gl/XGeimj)

def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
    return image, parsed["label"]

def input_fn():
    files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
    dataset = files.interleave(tf.data.TFRecordDataset)
    dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
    dataset = dataset.map(map_func=parse_fn)
    dataset = dataset.batch(batch_size=FLAGS.batch_size)
    dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
    return dataset

Pipelining

Parallelize Data Transformation

def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
    return image, parsed["label"]

def input_fn():
    files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
    dataset = files.interleave(tf.data.TFRecordDataset)
    dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
    dataset = dataset.map(map_func=parse_fn,
                          num_parallel_calls=FLAGS.num_parallel_calls)
    dataset = dataset.batch(batch_size=FLAGS.batch_size)
    dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
    return dataset

def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
    return image, parsed["label"]

def input_fn():
    files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
    dataset = files.interleave(tf.data.TFRecordDataset)
    dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        map_func=parse_fn, batch_size=FLAGS.batch_size))
    dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
    return dataset

Parallelize Data Extraction

def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
    return image, parsed["label"]

def input_fn():
    files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
    dataset = files.apply(tf.contrib.data.parallel_interleave(
        tf.data.TFRecordDataset, cycle_length=FLAGS.num_parallel_readers))
    dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        map_func=parse_fn, batch_size=FLAGS.batch_size))
    dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
    return dataset
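Once an input_fn like this is defined, handing it to an Estimator is a one-liner. A sketch (the estimator object and FLAGS.train_steps are assumed to be configured elsewhere):

# The Estimator calls input_fn to rebuild the tf.data pipeline for each
# train/evaluate call, so the same function works locally and distributed.
estimator.train(input_fn=input_fn, steps=FLAGS.train_steps)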

Wrapup

Thank you!

Amy [email protected] @amygdala bit.ly/tf-aiconf

bit.ly/tensorflow-workshop

What next?

(an incomplete list)

Tutorials and code: tensorflow.org
TensorFlow Dev Summit: bit.ly/tf-dev-summit2018
Intro to Deep Learning with TensorFlow (Udacity class): goo.gl/iHssII
For CNNs: Stanford’s CS231n: cs231n.github.io
Coursera: Serverless Machine Learning with Tensorflow on Google Cloud Platform

These slides: bit.ly/tf-aiconf
AI Adventures (on YouTube): bit.ly/ai-adventures
Deep Learning (Goodfellow, Bengio, Courville): http://www.deeplearningbook.org/
Chris Olah’s blog + Distill: colah.github.io, distill.pub
Michael Nielsen’s book: neuralnetworksanddeeplearning.com
TensorFlow RNN tutorial

end

Brief intro to some NN concepts


What is Machine Learning?

(diagram: data + algorithm → insight)

What is Machine Learning?

“Field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel)

What is Machine Learning?

“You can think of ML as programming with data instead of instructions. The system learns from the data so it can react correctly to new data.”

What is Machine Learning?

But: http://research.google.com/pubs/pub43146.html ("Machine Learning: The High Interest Credit Card of Technical Debt")

(diagram: ["this", "movie", "was", "great"] → Input → Hidden → Output (label) → ["POS"])

(diagram: ["this", "movie", "was", "great"] → Input → Hidden → Output (score) → [.7])

(diagram: pixels(cat image) → Input → Hidden → Output (label) → ["cat"])

So how do NNs learn? A feedback process called backpropagation.

Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the cost you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the cost.
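A toy illustration of that point (the one-weight model and constants here are made up): TensorFlow records the graph that computes the loss, derives the gradients automatically, and the optimizer applies them to the variables.

import tensorflow as tf

# Toy model: one weight and bias, squared error on a single example.
w = tf.Variable(0.5)
b = tf.Variable(0.0)
x, y_true = 2.0, 3.0
loss = tf.square(w * x + b - y_true)

# Because the whole computation is a graph, TensorFlow can derive
# d(loss)/dw and d(loss)/db automatically...
grads = tf.gradients(loss, [w, b])

# ...and an optimizer of your choice applies them to reduce the loss.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)
    print(sess.run([w, b, loss]))  # loss shrinks toward 0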

Linear Regression

f(x) = mx + b

Linear Classification: Apply a Logistic Function

z = f(x) = mx + b
y = sigmoid_function(z)

(plot: activation functions, including ReLU; image from https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice)


Linear Classification and Linear Separability

(but: feature crosses can encode non-linear information, and TensorFlow can help with this.)

Implementation as Matrix Multiplication
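A sketch of the idea (the shapes here are hypothetical): stacking a batch of inputs as the rows of a matrix turns the per-example z = mx + b into a single matrix multiplication.

import tensorflow as tf

batch_size, num_features, num_classes = 32, 10, 2

x = tf.placeholder(tf.float32, shape=[batch_size, num_features])
W = tf.Variable(tf.zeros([num_features, num_classes]))
b = tf.Variable(tf.zeros([num_classes]))

# One matmul computes z = xW + b for the whole batch at once;
# the logistic/softmax function then maps scores to probabilities.
z = tf.matmul(x, W) + b
y = tf.nn.softmax(z)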


On deep learning


How About Non-Linear Data Distributions?

(panels: The Problem, Linear Model, Neural Network)

ConvNets

Each neuron implements a relatively simple mathematical function.

But the composition of 10^6 to 10^9 such functions is surprisingly powerful.

“A core idea in deep learning is that we assume that the data was generated by the composition of factors or features, potentially at multiple levels in a hierarchy.” From http://www.deeplearningbook.org/, Chp 5

On “the curse of dimensionality”: Is it possible to represent a complicated function efficiently? Is it possible for the estimated function to generalize well to new inputs?

Yes: a key insight is that a large number of regions, say O(2^k), can be defined with O(k) examples, so long as we introduce some dependencies between the regions via assumptions about the underlying data-generating distribution. Many deep learning algorithms provide ... assumptions that are reasonable for a broad range of AI tasks in order to capture these advantages. (from: http://www.deeplearningbook.org/)

Neural Networks: Bending the Solution Space

Separating a Spiral

TensorBoard: Graph Visualization


TensorBoard

https://www.tensorflow.org/get_started/summaries_and_tensorboard
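A minimal sketch of getting data into TensorBoard in graph-mode TF 1.x (the log directory and tag are hypothetical); with Estimators, summaries like the tf.summary.scalar('loss', ...) call shown earlier are written to model_dir automatically.

import tensorflow as tf

loss = tf.constant(0.25)           # stand-in for a real loss tensor
tf.summary.scalar('loss', loss)    # tag a scalar for TensorBoard
merged = tf.summary.merge_all()

with tf.Session() as sess:
    # The FileWriter also records the graph, for the Graph Visualization tab.
    writer = tf.summary.FileWriter('/tmp/tb-demo', sess.graph)
    writer.add_summary(sess.run(merged), global_step=0)
    writer.close()

# Then: tensorboard --logdir=/tmp/tb-demo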

playground.tensorflow.org