Getting Started with TensorFlow
O’Reilly AI Conf., April 30, 2018, NYC
Amy Unruh
Your guide
These slides: bit.ly/tf-aiconf
@amygdala
bit.ly/tensorflow-workshop
Welcome and Logistics
- About the workshop, intros
- Break: 10:30-11
- TAs: Sara and Mallika
Google Cloud Platform
Overview of the Workshop
● Intro, setup, logistics
● What is TensorFlow (& Keras)?
● A story: what might an experimentation workflow look like?
● Intro to some of TensorFlow’s high-level APIs: the Estimator, managing input ...

    ..., vocabulary_list=["female", "male"])
    education = categorical_column_with_hash_bucket(
        key="education", hash_bucket_size=1000)
    ...
    # Continuous base columns.
    age = numeric_column("age")
    ...
Feature columns

    # Transformations.
    age_buckets = bucketized_column(
        age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
    education_occupation = crossed_column(
        ["education", "occupation"], hash_bucket_size=int(1e4))
    …
    # embeddings for deep learning
    embedding_column(workclass, dimension=8)
Feature columns and Estimators!

    model = tf.estimator.LinearClassifier(
        model_dir=model_dir,
        feature_columns=base_columns + crossed_columns)
Bucketing: Partition by range
Crossing: Create new combinations
Hashing: Limit size
Embedding: Learn a new representation
See also: tf.feature_column.input_layer
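To make bucketing, hashing, and crossing concrete, here is a minimal plain-Python sketch of what each transformation computes. It is an illustration only, not TensorFlow's implementation: the use of md5, the `_X_` separator, and the sample values ("Bachelors", "Tech-support") are assumptions chosen for the example. Embeddings are left out because they are learned during training.

```python
import bisect
import hashlib

# Boundaries copied from the age_buckets example.
AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucketize(value, boundaries):
    """Bucketing: partition by range. Returns the index of the bucket holding value."""
    # bisect_right places a value equal to a boundary in the bucket to its right,
    # so 18 falls in [18, 25) rather than (-inf, 18).
    return bisect.bisect_right(boundaries, value)


def hash_bucket(value, num_buckets):
    """Hashing: limit size. Maps any string to an id in [0, num_buckets)."""
    # md5 is an assumption made for reproducibility; it is not TensorFlow's hash.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def cross(values, num_buckets):
    """Crossing: create new combinations by hashing the joined feature values."""
    return hash_bucket("_X_".join(values), num_buckets)


age_bucket = bucketize(33, AGE_BOUNDARIES)       # 33 lies in [30, 35), bucket index 3
education_id = hash_bucket("Bachelors", 1000)    # hypothetical education value
crossed_id = cross(["Bachelors", "Tech-support"], int(1e4))
```

Different strings can collide in the same hash bucket; that is the price of limiting the vocabulary size up front.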
Typical flow
● Define an input function to load ...

    ... ) }
    prediction_output = tf.estimator.export.PredictOutput({
        "classes": tf.argmax(input=logits, axis=1),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")})
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            predictions=predictions,
            export_outputs={
                tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                    prediction_output})
    ...
    ...
    # Calculate Loss (for both TRAIN and EVAL modes)
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
    loss = tf.losses.softmax_cross_entropy(
        onehot_labels=onehot_labels, logits=logits)

    # Generate some summary info
    tf.summary.scalar('loss', loss)

    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
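The loss step above combines a one-hot label with a softmax over the logits. The following plain-Python sketch shows that arithmetic for a single example, assuming 10 classes as in the slide; the function names and the logit values are made up for illustration, and TensorFlow's own op additionally reduces over the batch.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, depth):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def softmax_cross_entropy(onehot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits), one example."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(onehot_label, probs) if t > 0)


# Made-up logits that favour class 3.
logits = [0.1, 0.0, 0.2, 4.0, 0.0, 0.3, 0.1, 0.0, 0.2, 0.1]
loss_when_right = softmax_cross_entropy(one_hot(3, 10), logits)
loss_when_wrong = softmax_cross_entropy(one_hot(7, 10), logits)
```

The loss is small when the largest logit matches the label and grows as probability mass moves to other classes, which is the signal the optimizer minimizes.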
Using tf.layers to define our CNN

    conv1 = tf.layers.conv2d(
        inputs=input_layer, filters=32, kernel_size=[5, 5],
        padding="same", activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    conv2 = tf.layers.conv2d(
        inputs=pool1, filters=64, kernel_size=[5, 5],
        padding="same", activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    dense = tf.layers.dense(inputs=pool2_flat, units=1024,
                            activation=tf.nn.relu, name="dense1")
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
    logits = tf.layers.dense(inputs=dropout, units=10)
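The `7 * 7 * 64` in the reshape follows from the layer settings. A small sketch of the shape arithmetic, assuming a 28x28 input and the usual "same"/"valid" output-size rules; the helper name is hypothetical:

```python
import math


def output_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


size = 28                                   # assumed 28x28 input image
size = output_size(size, 5, 1, "same")      # conv1 keeps 28
size = output_size(size, 2, 2, "valid")     # pool1 halves it to 14
size = output_size(size, 5, 1, "same")      # conv2 keeps 14
size = output_size(size, 2, 2, "valid")     # pool2 halves it to 7
flat_units = size * size * 64               # 64 filters from conv2
```

If the convolutions used "valid" padding instead, each 5x5 convolution would shrink the image by 4 pixels and the flattened size would change accordingly.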
Using keras.layers to define our CNN

    if mode == tf.estimator.ModeKeys.TRAIN:
        K.set_learning_phase(True)
    else:
        K.set_learning_phase(False)

    conv1 = Convolution2D(32, (5, 5), activation='relu',
                          input_shape=(28, 28, 1))(input_layer)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Convolution2D(64, (5, 5), activation='relu')(pool1)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    pool2_flat = Flatten()(pool2)
    dense = Dense(1024, activation='relu')(pool2_flat)
    dropout = Dropout(0.4)(dense)
    logits = Dense(10, activation='linear')(dropout)
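The learning phase matters because dropout behaves differently in training and inference. A minimal sketch of inverted dropout in plain Python, with made-up activations and a fixed seed so the run is repeatable; this illustrates the idea rather than reproducing Keras internals:

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout: zero units with probability `rate`, in training only."""
    if not training:
        return list(values)            # inference: activations pass through unchanged
    keep_prob = 1.0 - rate
    # Surviving units are scaled up so the expected activation stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]     # made-up layer outputs
train_out = dropout(activations, rate=0.4, training=True, rng=random.Random(0))
eval_out = dropout(activations, rate=0.4, training=False)
```

Forgetting to switch the phase at evaluation time would keep dropping units and make predictions noisy.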
Comparing ‘regular’ MNIST with Fashion MNIST
Maybe there are better models?...
Extras: tf.data and Datasets: Performance Considerations
A typical TensorFlow training input pipeline can be framed as an ETL process:
● Extract: Read data from persistent storage, either local or remote
● Transform: Use CPU cores to parse and perform preprocessing operations on the data
  ○ image decompression, data augmentation
  ○ shuffling and batching...
● Load: Load the transformed data onto the accelerator device(s) that execute the machine learning model
  ○ GPUs, TPUs...
● Viewing input pipelines as an ETL process provides structure that facilitates the application of performance optimizations.
  ○ Utilize the CPU effectively, while reserving the accelerator for the heavy lifting of training your model.
● The tf.data API is an easier and more performant way to create input pipelines for TensorFlow models: https://www.tensorflow.org/api_docs/python/tf/data
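The ETL framing can be sketched with plain Python generators, where each stage consumes the previous one lazily. This is only an analogy for how tf.data stages chain together; the record format ("feature,label" strings) and all function names here are made up for the example.

```python
import random


def extract(records):
    """Extract: stand-in for reading records from local or remote storage."""
    for record in records:
        yield record


def shuffle(records, buffer_size, rng):
    """Buffered shuffle: hold `buffer_size` items and emit a random one each step."""
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:                      # drain what is left at the end of the stream
        yield buffer.pop(rng.randrange(len(buffer)))


def transform(records, parse_fn):
    """Transform: parse and preprocess each record on the CPU."""
    for record in records:
        yield parse_fn(record)


def batch(records, batch_size):
    """Group consecutive records into lists of at most `batch_size` items."""
    current = []
    for record in records:
        current.append(record)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def parse(line):
    feature, label = line.split(",")
    return int(feature), int(label)


raw = ["%d,%d" % (i, i % 2) for i in range(10)]
pipeline = batch(transform(shuffle(extract(raw), 4, random.Random(0)), parse), 4)
batches = list(pipeline)   # "Load" would hand each batch to the accelerator
```

Because every stage is lazy, no stage materializes the whole dataset, which is the same property that lets a real input pipeline stream from storage.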
    def parse_fn(example):
        "Parse TFExample records and perform simple data augmentation."
        example_fmt = {
            "image": tf.FixedLenFeature((), tf.string, ""),
            "label": tf.FixedLenFeature((), tf.int64, -1)
        }
        parsed = tf.parse_single_example(example, example_fmt)
        image = tf.image.decode_image(parsed["image"])
        image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
        return image, parsed["label"]

    def input_fn():
        files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
        dataset = files.interleave(tf.data.TFRecordDataset)
        dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
        dataset = dataset.map(map_func=parse_fn)
        dataset = dataset.batch(batch_size=FLAGS.batch_size)
        return dataset
(images from: https://goo.gl/XGeimj)
Pipelining

    def parse_fn(example):
        "Parse TFExample records and perform simple data augmentation."
        example_fmt = {
            "image": tf.FixedLenFeature((), tf.string, ""),
            "label": tf.FixedLenFeature((), tf.int64, -1)
        }
        parsed = tf.parse_single_example(example, example_fmt)
        image = tf.image.decode_image(parsed["image"])
        image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
        return image, parsed["label"]

    def input_fn():
        files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
        dataset = files.interleave(tf.data.TFRecordDataset)
        dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
        dataset = dataset.map(map_func=parse_fn)
        dataset = dataset.batch(batch_size=FLAGS.batch_size)
        # Prepare upcoming batches while the current one is being trained on.
        dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
        return dataset
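Pipelining means the producer prepares the next batches while the consumer is still busy with the current one. A rough plain-Python analogy, using a background thread and a bounded queue; the names `prefetch` and `slow_batches` are made up here, and this sketch does not forward producer exceptions to the consumer.

```python
import queue
import threading

_DONE = object()   # sentinel marking the end of the stream


def prefetch(iterable, buffer_size):
    """Yield items from `iterable` while a background thread produces ahead."""
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in iterable:
            buffer.put(item)       # blocks once `buffer_size` items are waiting
        buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


def slow_batches(n):
    for step in range(n):
        yield [step] * 2           # stand-in for an expensive preprocessing step


consumed = list(prefetch(slow_batches(5), buffer_size=2))
```

The buffer size bounds memory use: the producer can run at most `buffer_size` items ahead of the consumer.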
Parallelize Data Transformation
    def parse_fn(example):
        "Parse TFExample records and perform simple data augmentation."
        example_fmt = {
            "image": tf.FixedLenFeature((), tf.string, ""),
            "label": tf.FixedLenFeature((), tf.int64, -1)
        }
        parsed = tf.parse_single_example(example, example_fmt)
        image = tf.image.decode_image(parsed["image"])
        image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
        return image, parsed["label"]

    def input_fn():
        files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
        dataset = files.interleave(tf.data.TFRecordDataset)
        dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
        # Run parse_fn on several records at once.
        dataset = dataset.map(map_func=parse_fn,
                              num_parallel_calls=FLAGS.num_parallel_calls)
        dataset = dataset.batch(batch_size=FLAGS.batch_size)
        dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
        return dataset
    def parse_fn(example):
        "Parse TFExample records and perform simple data augmentation."
        example_fmt = {
            "image": tf.FixedLenFeature((), tf.string, ""),
            "label": tf.FixedLenFeature((), tf.int64, -1)
        }
        parsed = tf.parse_single_example(example, example_fmt)
        image = tf.image.decode_image(parsed["image"])
        image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
        return image, parsed["label"]

    def input_fn():
        files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
        dataset = files.interleave(tf.data.TFRecordDataset)
        dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
        # Fuse the map and batch steps into a single transformation.
        dataset = dataset.apply(tf.contrib.data.map_and_batch(
            map_func=parse_fn, batch_size=FLAGS.batch_size))
        dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
        return dataset
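The idea behind `num_parallel_calls` is to apply the parse function to several records concurrently while keeping the output order. A plain-Python analogy using a thread pool; `parallel_map` and the "feature,label" record format are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(map_func, records, num_parallel_calls):
    """Apply map_func to records on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_parallel_calls) as pool:
        return list(pool.map(map_func, records))


def parse(record):
    """Hypothetical stand-in for parse_fn: split a 'feature,label' string."""
    feature, label = record.split(",")
    return float(feature), int(label)


parsed = parallel_map(parse, ["0.5,1", "1.5,0", "2.5,1"], num_parallel_calls=2)
```

In pure Python, threads mostly help when the map function waits on I/O; the point here is only the shape of the pattern, not its speed.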
Parallelize Data Extraction
    def parse_fn(example):
        "Parse TFExample records and perform simple data augmentation."
        example_fmt = {
            "image": tf.FixedLenFeature((), tf.string, ""),
            "label": tf.FixedLenFeature((), tf.int64, -1)
        }
        parsed = tf.parse_single_example(example, example_fmt)
        image = tf.image.decode_image(parsed["image"])
        image = _augment_helper(image)  # augments image using slice, reshape, resize_bilinear
        return image, parsed["label"]

    def input_fn():
        files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
        # Read from several files at once instead of one after another.
        dataset = files.apply(tf.contrib.data.parallel_interleave(
            tf.data.TFRecordDataset, cycle_length=FLAGS.num_parallel_readers))
        dataset = dataset.shuffle(buffer_size=FLAGS.shuffle_buffer_size)
        dataset = dataset.apply(tf.contrib.data.map_and_batch(
            map_func=parse_fn, batch_size=FLAGS.batch_size))
        dataset = dataset.prefetch(buffer_size=FLAGS.prefetch_buffer_size)
        return dataset
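To see what `cycle_length` controls, here is a sequential plain-Python sketch of the interleave access pattern: up to `cycle_length` readers are open at once and are visited round-robin. It only models the ordering of a deterministic interleave, not the concurrent reads that `parallel_interleave` adds; the shard names and contents are made up.

```python
import itertools


def interleave(sources, make_reader, cycle_length):
    """Round-robin over up to `cycle_length` open readers, one element at a time."""
    pending = iter(sources)
    active = [iter(make_reader(s)) for s in itertools.islice(pending, cycle_length)]
    while active:
        still_open = []
        for reader in active:
            try:
                yield next(reader)
                still_open.append(reader)
            except StopIteration:
                # Swap an exhausted reader for the next unopened source, if any.
                replacement = next(pending, None)
                if replacement is not None:
                    still_open.append(iter(make_reader(replacement)))
        active = still_open


shards = {
    "train-0": ["a0", "a1", "a2"],
    "train-1": ["b0"],
    "train-2": ["c0", "c1"],
}
names = ["train-0", "train-1", "train-2"]
order = list(interleave(names, shards.__getitem__, cycle_length=2))
# order == ["a0", "b0", "a1", "a2", "c0", "c1"]
```

A larger `cycle_length` mixes records from more files, which also helps shuffling when each file holds correlated examples.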
Wrapup
Thank you!
Amy
@amygdala bit.ly/tf-aiconf
bit.ly/tensorflow-workshop
What next?
(an incomplete list)
● Tutorials and code: tensorflow.org
● TensorFlow Dev Summit: bit.ly/tf-dev-summit2018
● Intro to Deep Learning with TensorFlow (Udacity class): goo.gl/iHssII
● For CNNs: Stanford’s CS231n: cs231n.github.io
● Coursera: Serverless Machine Learning with Tensorflow on Google Cloud Platform
● These slides: bit.ly/tf-aiconf
● AI Adventures (on YouTube): bit.ly/ai-adventures
● Deep Learning (Goodfellow, Bengio, Courville): http://www.deeplearningbook.org/
● Chris Olah’s blog + Distill: colah.github.io, distill.pub/
● Michael Nielsen’s book: neuralnetworksanddeeplearning.com
● TensorFlow RNN tutorial
end
Brief intro to some NN concepts
What is Machine Learning?
data → algorithm → insight
bit.ly/tf-ainextcon
“Field of study that gives computers the ability to learn without being explicitly programmed.”
“You can think of ML as programming with data instead of instructions. The system learns from the data so it can react correctly to new data.”
But: “Machine Learning: The High Interest Credit Card of Technical Debt” (http://research.google.com/pubs/pub43146.html)
Input: ["this", "movie", "was", "great"] → Hidden → Output (label): ["POS"]
Input: ["this", "movie", "was", "great"] → Hidden → Output (score): [.7]
Input: pixels(image) → Hidden → Output (label): ["cat"]
So how do NNs learn? Through a feedback process called backpropagation.
Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the cost you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the cost.
Linear Regression
f(x)= mx + b
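As a concrete view of "determine how your variables affect the cost, then modify them", here is a minimal sketch of gradient descent fitting f(x) = mx + b. The gradients are derived by hand for a mean-squared-error cost, whereas TensorFlow would obtain them automatically; the data, learning rate, and step count are made-up values for illustration.

```python
def mse_gradients(m, b, xs, ys):
    """Gradients of the mean squared error of f(x) = m*x + b w.r.t. m and b."""
    n = len(xs)
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    grad_m = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    return grad_m, grad_b


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Plain gradient descent: repeatedly step against the gradient of the cost."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        grad_m, grad_b = mse_gradients(m, b, xs, ys)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]     # made-up data lying on y = 2x + 1
m, b = fit_line(xs, ys)
```

With this data the fitted `m` and `b` approach 2 and 1; a learning rate that is too large would make the updates diverge instead.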
Linear Classification: Apply a Logistic Function
z = f(x) = mx + b
y = sigmoid_function(z)
Image from: https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
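The two formulas for linear classification translate directly into code. A small sketch, with made-up parameters chosen so the decision boundary sits at x = 2; thresholding at 0.5 is an assumption about how the probability is turned into a label.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, m, b):
    z = m * x + b          # the linear part, z = f(x) = mx + b
    return sigmoid(z)      # y = sigmoid_function(z)


m, b = 2.0, -4.0           # made-up parameters; z is 0 at x = 2
probabilities = [predict_probability(x, m, b) for x in (0.0, 2.0, 4.0)]
labels = [int(p >= 0.5) for p in probabilities]
```

Points on either side of the boundary get probabilities below or above 0.5, which is why this remains a linear classifier despite the non-linear squashing.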
Linear Classification and Linear Separability
(but: feature crosses can encode non-linear information, and TensorFlow can help with this.)
Implementation as Matrix Multiplication
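For a batch of examples, the linear model can be evaluated in one step as a matrix product plus a bias. A dependency-free sketch with made-up numbers; `matmul` here is a hand-written helper, not a library call.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    assert all(len(row) == len(b) for row in a), "inner dimensions must match"
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


X = [[1.0, 2.0],      # two made-up examples with two features each
     [3.0, 4.0]]
W = [[0.5],           # one weight per feature, one output column
     [-1.0]]
bias = 0.25

# scores[i][0] = X[i] . W[:, 0] + bias
scores = [[value + bias for value in row] for row in matmul(X, W)]
```

Adding more columns to `W` yields one score per class, which is how the same product serves multi-class models.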
On deep learning
How About Non-Linear Data Distributions?
[Figure panels: The Problem, Linear Model, Neural Network]
ConvNets
Each neuron implements a relatively simple mathematical function.
But the composition of 10^6 to 10^9 such functions is surprisingly powerful.
“A core idea in deep learning is that we assume that the data was generated by the composition of factors or features, potentially at multiple levels in a hierarchy.” From http://www.deeplearningbook.org/, Chp 5
On “the curse of dimensionality”:
● Is it possible to represent a complicated function efficiently?
● Is it possible for the estimated function to generalize well to new inputs?
Yes: a key insight is that a large number of regions, say O(2^k), can be defined with O(k) examples, so long as we introduce some dependencies between the regions via assumptions about the underlying data-generating distribution. Many deep learning algorithms provide ... assumptions that are reasonable for a broad range of AI tasks in order to capture these advantages. (from: http://www.deeplearningbook.org/)
Neural Networks: Bending the Solution Space
Separating a Spiral
TensorBoard: Graph Visualization
TensorBoard
https://www.tensorflow.org/get_started/summaries_and_tensorboard
playground.tensorflow.org