Deep Neural Networks for Pattern Recognition

Dan Cireşan
IDSIA, USI and SUPSI, Lugano, Switzerland
[email protected]

www.idsia.ch/~ciresan

DNN for Visual Pattern Recognition
• Classification
• Detection
• Segmentation

Neural networks
• NN were introduced in 1943 (McCulloch & Pitts); the perceptron followed (Rosenblatt, 1958)
• Research advanced slowly over the next decades
• Problems: high processing-power requirements, the XOR problem
• Key advancements:
  – backpropagation (Werbos, 1975)
  – convolutional nets (Fukushima, 1979; LeCun, 1989)
• By the 1990s NN were overtaken by other Machine Learning techniques, like Support Vector Machines (SVM)

Deep Neural Networks
• Mathematical functions $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$
• Highly nonlinear
• Huge complexity:
  – high input and output dimensionality: $n \approx 10^2$ to $10^6$, $m \approx 2$ to $10^5$
  – $10^4$–$10^7$ parameters to learn
  – $10^6$–$10^9$ connections (operations) per image
  – learning the parameters requires quadrillions of floating-point operations









Feed Forward Deep Neural Networks
• General architecture: Multi Layer Perceptron
• Dedicated architecture for images: Convolutional Neural Networks (CNN / DNN), first used by Fukushima in 1979
• Once made huge, they significantly advanced the field of Visual Pattern Recognition: Ciresan et al. NC 2010, IJCAI 2011 & CVPR 2012; Krizhevsky et al. NIPS 2012; …

Multi Layer Perceptrons

[Diagram: MLP with an input layer, two hidden layers, and an output layer]

• simple, uniform architecture
• fully connected → many weights
• general non-linear function approximator
• disregards 2D information
• inefficient for classifying images
• a big and deep MLP (~12 million weights) broke the record on MNIST (Ciresan et al., NC 2010)

$$y_j = f(a_j) = f\Big(\sum_i x_i w_{ij} + b_j\Big)$$

First implementation of a huge MLP in CUDA: Deep, Big, Simple Neural Nets for Handwritten Digit Recognition – Ciresan et al., Neural Computation 2010
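As a concrete reading of the formula above, here is a minimal NumPy sketch of an MLP forward pass. The layer sizes, the tanh activation, and all function names are illustrative assumptions, not the paper's CUDA implementation:

```python
# Minimal MLP forward pass: y_j = f(sum_i x_i * w_ij + b_j) per layer.
import numpy as np

def layer_forward(x, W, b, f=np.tanh):
    """One fully connected layer: y = f(x @ W + b)."""
    return f(x @ W + b)

def mlp_forward(x, params):
    """Chain several layers; `params` is a list of (W, b) pairs."""
    for W, b in params[:-1]:
        x = layer_forward(x, W, b)                    # hidden layers: tanh
    W, b = params[-1]
    return layer_forward(x, W, b, f=lambda a: a)      # linear output (pre-softmax)

# Example: 784-dim MNIST input, two hidden layers, 10 output classes.
rng = np.random.default_rng(0)
params = [(rng.standard_normal((784, 500)) * 0.01, np.zeros(500)),
          (rng.standard_normal((500, 150)) * 0.01, np.zeros(150)),
          (rng.standard_normal((150, 10)) * 0.01, np.zeros(10))]
y = mlp_forward(rng.standard_normal(784), params)
```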

Convolutional Neural Networks (CNNs)
• Hierarchical architecture designed for image processing, loosely inspired by biology
• Introduced by Fukushima (1979) and refined by LeCun et al. (1998), Riesenhuber et al. (1999), Simard et al. (2003), Behnke (2003)
• Uses neighboring information (preserves 2D structure)
• First implementation of a CNN/DNN in CUDA: Flexible, High Performance Convolutional Neural Networks for Image Classification, Ciresan et al., IJCAI 2011

Convolutional Neural Network

$$y_n = \mathrm{CNN}(x_n, w)$$

Pipeline: Input Image → Convolution (Feature Extraction) → Pooling (Feature Selection) → Fully Connected (Classification); the convolution/pooling stages can be repeated.

Convolutional layer
• Detection layer, i.e. it convolves an image with a feature detector
• A 3x3 kernel (filter) has only 10 weights (3x3 + 1 bias), shared by all 900 neurons of the output map
• Each neuron has 9 source neurons
• 900 x (9 + 1) = 9000 connections (checked in the sketch below)
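The weight sharing and connection counts above can be verified with a small sketch; assuming a 32x32 input image, a valid 3x3 convolution yields a 30x30 = 900-neuron output map (all names are illustrative):

```python
# A single 3x3 convolutional map ("valid" mode), showing weight sharing:
# 9 kernel weights + 1 bias serve every output neuron.
import numpy as np

def conv2d_valid(img, kernel, bias):
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # every output neuron reuses the SAME 9 weights and 1 bias
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel) + bias
    return out

img = np.random.rand(32, 32)
k = np.random.rand(3, 3)
y = conv2d_valid(img, k, bias=0.1)          # 30x30 output map, 900 neurons
print("weights:", k.size + 1,               # 10
      "connections:", y.size * (k.size + 1))  # 900 * 10 = 9000
```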

Max-pooling layer
• Introduces small translation invariance / robustness to small distortions
• Improves generalization
• Decreases the dimensionality of the representation (feature selection!)
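A minimal sketch of non-overlapping 2x2 max-pooling (illustrative, not the GPU kernel from the paper):

```python
# Non-overlapping k x k max-pooling: each spatial dimension shrinks by k.
import numpy as np

def max_pool(x, k=2):
    H, W = x.shape
    H, W = H - H % k, W - W % k                  # crop to a multiple of k
    return x[:H, :W].reshape(H // k, k, W // k, k).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool(a))                               # [[ 5  7], [13 15]]
```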

Fully connected layer
• One output neuron per class, normalized with the soft-max activation function
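The soft-max normalization can be sketched in a few lines (a standard formulation; the max is subtracted only for numerical stability):

```python
# Soft-max: turns the output layer's activations into class probabilities.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())     # subtract max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())       # probabilities over classes, summing to 1
```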

Training is computationally intensive

9HL-48x48-100C3-MP2-200C2-MP2-300C2-MP2-400C2-MP2-500N-3755: net for Chinese OCR

Layer group                 #weights      #connections
Feature extraction layers   321,800       26.37 million
Classification layer        1,881,255     1.88 million

1,121,749 training samples
CPU (i7-920, one thread): 27 h to forward propagate one epoch; 14 months to train for 30 epochs

Graphics processing units (GPUs)
• 8 x GTX 480/580 with 1.5 GB RAM: >12 TFLOPS (theoretical speed) on 2 kW
• 40-80x speed-up compared with a single-threaded CPU version of the CNN program (one day on GPU instead of two months on CPU)
• ~3000 Euro for one workstation with 4 GPUs (ca. 2011)
• The 14-month CPU training of the Chinese OCR net above shrinks to about one week on GPU!

Experiments: from digits to natural images
• Handwritten digits and Latin characters: NIST SD 19
• Handwritten Chinese characters
• Traffic signs
• Stereo images of 3D models: NORB
• Natural images: CIFAR10
• Mitosis detection in breast cancer histological images
• Neuronal membrane segmentation in Electron Microscopy images (for connectomics)
• Retina vessel segmentation
• Trail detection
• Vision for humanoid robots

MNIST
• Very competitive dataset
• Probably the most famous benchmark
• Once we applied DNN, it became a toy problem
• Handwritten digits, 28x28 grayscale images
• 60000 for training and 10000 for testing
• Many papers: http://yann.lecun.com/exdb/mnist/

MNIST
• Simard et al. (2003): 0.40%; Ciresan et al. (2010): 0.35% (big deep simple MLP)
• Big deep CNN: 0.35% (2011), with far fewer weights than an MLP
• 30 out of 35 errors have a correct second prediction

#M, #N in hidden layers            Test error [%]
20M-60M                            1.02
20M-60M-150N                       0.55
20M-60M-100M-150N                  0.38
20M-40M-60M-80M-100M-120M-150N     0.35

[Figure: all 35 MNIST test errors, each shown with its label, first prediction, and second prediction]

Flexible, High Performance Convolutional Neural Networks for Image Classification; IJCAI 2011, D. Ciresan et al.

Committees
• Extremely simple idea
• Easy to compute
• Average the corresponding outputs of many nets
• 0.35% → 0.27% with 7 nets, 0.23% with 35 nets
• Works better with preprocessing in case of handwritten characters

Ciresan et al. - Convolutional Neural Network Committees For Handwritten Character Classification, ICDAR 2011
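A committee needs no extra machinery; a sketch of the averaging step (array shapes and names are illustrative):

```python
# Committee prediction: average the per-class soft-max outputs of N
# independently trained nets, then take the argmax.
import numpy as np

def committee_predict(outputs):
    """outputs: list of (n_samples, n_classes) soft-max outputs, one per net."""
    avg = np.mean(outputs, axis=0)
    return avg.argmax(axis=1)

# e.g. 7 nets, 10000 test samples, 10 classes (random stand-ins here)
nets = [np.random.rand(10000, 10) for _ in range(7)]
pred = committee_predict(nets)
```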

Handwritten Characters NIST SD 19

Ciresan et al. - Convolutional Neural Network Committees For Handwritten Character Classification, ICDAR 2011

NORB: stereo images of 3D models
• Cluttered & jittered NORB
• 486000 96x96 stereo images
• 5 classes with 10 instances each
• 5 instances for training and 5 for testing
• DNN improve the state of the art from 5% to 2.7%

Flexible, High Performance Convolutional Neural Networks for Image Classification; Ciresan et al. IJCAI 2011

CIFAR10
• small, 32x32-pixel color images
• complex backgrounds
• 10 classes
• 50000 training images
• 10000 test images

trans. [%]; maps   IP maps   TfbV [%]
0; 100M            no        28.87 ± 0.37
0; 100M            edge      29.11 ± 0.36
5; 100M            no        20.26 ± 0.21
5; 100M            edge      21.87 ± 0.57
5; 100M            hat       21.44 ± 0.44
5; 200M            no        19.90 ± 0.16
5; 300M            no        19.51 ± 0.18
5; 400M            no        19.54 ± 0.16

(trans. = maximal random translation of the training images; M = maps per convolutional layer; IP maps = optional preprocessed input maps; TfbV = test error [%] of the run with the best validation error)

Best result: 11.21% (Ciresan et al., CVPR 2012)

[Figure: first-layer filters]

Flexible, High Performance Convolutional Neural Networks for Image Classification; Ciresan et al. IJCAI 2011

Deep Neural Networks trained on GPUs excel in image classification…

Dataset              Previous result [%]   Our result [%]   Decrease of error rate
MNIST                0.39                  0.23             41%
NIST SD 19           –                     –                20-80%
Chinese characters   10.01                 5.78             42%
Small NORB           2.87                  2.01             30%
Full NORB            5.00                  2.70             46%
CIFAR10              18.81                 11.21            39%
Traffic signs        1.69                  0.54             72%

Ciresan et al., Flexible, High Performance Convolutional Neural Networks for Image Classification, IJCAI 2011
Ciresan et al., Multi-column Deep Neural Networks for Image Classification, CVPR 2012

Our DNN won six international competitions
– First place at Assessment of Mitosis Detection Algorithms, MICCAI 2013 Grand Challenge, Nagoya, Japan
– Best result at Offline Chinese Character Recognition, ICDAR 2013, Washington D.C., USA
– First place at Segmentation of neuronal structures in EM stacks, ISBI 2012, Barcelona, Spain
– First place at Mitosis Detection in Breast Cancer Histological Images, ICPR 2012, Tsukuba, Japan
– First place at Offline Chinese Character Recognition, ICDAR 2011, Beijing, China
– First place at The German Traffic Sign Recognition Benchmark, IJCNN 2011, San Jose, USA

All competitions had hidden test sets; testing was performed by the organizers.

German Traffic Sign Recognition Competition – IJCNN 2011, San Jose
• 40000 color images, 15x15 to 250x250 pixels
• 43 different signs: 26640 training images, 12569 test images
• Great variation in details, contrast and illumination
• Input: extract the Region of Interest (ROI), enhance contrast with four different methods, resize up/down to 48x48 pixels

Normalized images
• Original
• Image Adjustment
• Histogram Equalization
• Adaptive Histogram Equalization
• Contrast Normalization

MCDNN trained on preprocessed color images
• Original + 4 normalized datasets
• Train 5 nets for each normalization
• Build a 25-net committee

[Figure: first-layer filters]

[Table: test accuracy of each of the 25 individual nets (5 nets x 5 preprocessing variants: Original, Image Adjustment, Histogram Equalization, Adaptive Histogram Equalization, Contrast Normalization); all values lie between 98.16% and 98.80%]

Rank   Team    Representative    Method              Error rate
1      IDSIA   Dan Ciresan       Committee of CNNs   0.54%
2      INI     –                 Human Performance   1.16%
3      NYU     Pierre Sermanet   Multi-Scale CNNs    1.69%
4      CAOR    Fatin Zaklouta    Random Forests      3.86%

68 errors out of 12569 images
• Over 80% of the 68 errors have a correct second prediction
• Rejecting 1% of the images (confidence < 0.51) results in a 0.24% error rate
• A single error remains for a rejection rate of 6.6% (confidence < 0.94)
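The rejection rule above is a simple threshold on the top soft-max probability; a sketch (random probabilities stand in for real network outputs, and the thresholds 0.51 and 0.94 are the ones quoted above):

```python
# Confidence-based rejection: a sample is rejected when the top soft-max
# probability falls below a threshold.
import numpy as np

def classify_with_reject(probs, threshold):
    """probs: (n, n_classes) soft-max outputs. Returns labels; -1 = rejected."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    labels[conf < threshold] = -1
    return labels

probs = np.random.dirichlet(np.ones(43), size=12569)  # 43 traffic-sign classes
labels = classify_with_reject(probs, threshold=0.51)
print("rejection rate:", np.mean(labels == -1))
```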

Chinese Handwriting Recognition Competition – ICDAR 2011, Beijing
• Offline Chinese Character Recognition (Task 1)
• 3 GB of data, 3755 classes, >1M characters
• 48x48-pixel grayscale images, 270 samples per class
• 9 teams; we had no knowledge of Chinese
• First place at the ICDAR 2011 competition
  – 94.22% of first predictions are correct
  – 99.29% CR10 (recognition rate within the top 10 predictions)

• Training + testing on GPU: 3.54 hours per epoch; forward propagation alone takes 27 h per epoch on CPU
• >90% recognition rate after a single epoch
• Initial training with a separate validation set
• Then the net with the lowest validation error is trained further on the merged training and validation data

Offline Chinese Character Recognition, ICDAR 2011

System         CR 1      CR 5      CPU (2-core, 3 GHz)   GPU (GTX 580)   Size of the model
CASIA-CREC-1   83.02%    97.15%    0.93 ms               –               5.71M
CASIA-CERC-2   82.02%    96.75%    0.90 ms               –               10.33M
CASIA-CERC-3   82.45%    96.97%    0.92 ms               –               12.17M
CASIA-CSIS     90.77%    98.66%    12.25 ms              –               457.23M
HKU            91.87%    98.99%    183.32 ms             –               475.41M
IDSIAnn-1      92.05%    99.27%    86.21 ms              2.2 ms          27.35M
IDSIAnn-2      94.22%    99.29%    86.21 ms              2.2 ms          27.35M
SCUT-HCII      86.01%    93.44%    2.96 ms               –               6.51M
THU            91.54%    98.91%    6.94 ms               –               188.64M

Offline Chinese Character Recognition – ICDAR 2013, Washington D.C.
• Same competition, same data: 3755 classes, 1M+ characters
• Better recognition system
• MCDNN achieve the state-of-the-art result again, 19% better than the next competitor
• Recognition rate goes up from 94.22% (2011) to 95.78%
• Very close to human performance: 96.13%
• Considering the first 5 predictions, the recognition rate reaches 99.70%
• DNN solved the problem of isolated (unconnected) Chinese character recognition

MEDICAL APPLICATIONS
• Neuronal membrane segmentation
• Mitosis detection
• Retinal blood vessel segmentation

joint work with Alessandro Giusti

Neural Networks for Segmenting Neuronal Structures in Electron Microscopy Stacks – ISBI 2012, Barcelona, Spain (CONNECTOMICS)
• Training data: 30 labeled 512x512 slices
• Test data: 30 unlabeled 512x512 slices

Introduction
• We use a powerful pixel classifier (a Deep Neural Network) and minimal postprocessing
• Input: raw pixel values in a window (no features)
• Output: membrane probability of the central pixel

Ciresan et al. - Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, NIPS 2012
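A naive sketch of this sliding-window scheme; the window size 65 and the `net` callable are illustrative stand-ins for the trained DNN. Note the per-pixel loop, which is exactly the cost that the dynamic-programming speed-up described later addresses:

```python
# Sliding-window pixel classification: for every pixel, feed the surrounding
# w x w window to the classifier and store the returned membrane probability.
import numpy as np

def segment(image, net, w=65):
    """Return a per-pixel membrane probability map (mirror-padded borders)."""
    r = w // 2
    padded = np.pad(image, r, mode="reflect")
    probs = np.empty_like(image, dtype=float)
    H, W = image.shape
    for i in range(H):
        for j in range(W):
            window = padded[i:i + w, j:j + w]
            probs[i, j] = net(window)   # probability that (i, j) is membrane
    return probs

# toy stand-in for a trained DNN: mean intensity of the window
probs = segment(np.random.rand(128, 128), net=lambda win: win.mean())
```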

A Typical Network Architecture
• Feature extraction layers: 132K weights, 121M connections
• Classification layers: 87K weights, 87K connections

Training samples & time
• 1.5M positive training samples (all membrane pixels in the stack)
• 1.5M negative training samples
• ≈1 year of training time for 30 epochs on a CPU, or ≈1 week on a GPU

Manipulations on input windows: plain, FOV, NU, FOV+NU
• Foveation (FOV) forces the network to disregard fine details at the periphery of the input window → improved generalization ability
• Nonuniform sampling (NU) captures a larger image area with the same number of neurons, by sampling at reduced resolution toward the periphery of the window
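A hedged sketch of plausible forms of the two manipulations; the exact transforms in the paper may differ, and `stretch` and `gamma` are assumed parameters:

```python
# Illustrative forms: foveation blends toward a blurred copy away from the
# window center; nonuniform sampling packs a larger source area into the same
# grid by widening the sample spacing toward the periphery.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def foveate(win, sigma=3.0):
    """Keep the center sharp, fade to a blurred copy at the periphery."""
    blurred = gaussian_filter(win, sigma)
    h, w = win.shape
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    alpha = d / d.max()                  # 0 at the center, 1 at the corners
    return (1 - alpha) * win + alpha * blurred

def nonuniform_sample(img, center, w=65, stretch=1.5, gamma=1.5):
    """Sample a w x w window covering a `stretch`x larger area of `img`."""
    t = np.linspace(-1, 1, w)
    offs = np.sign(t) * np.abs(t) ** gamma * (stretch * w / 2)
    cy, cx = center
    yy, xx = np.meshgrid(cy + offs, cx + offs, indexing="ij")
    return map_coordinates(img, [yy, xx], order=1, mode="reflect")

img = np.random.rand(512, 512)
win = nonuniform_sample(img, center=(256, 256))
both = foveate(win)                      # FOV + NU
```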

Averaging four DNN

Final Results (all slices)

DEMO: real time 3D visualization • Final version at Supercomputing 2014, 17-20 November 2014

Ciresan et al. - Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, NIPS 2012

Results of the ISBI 2012 challenge: even without problem-specific postprocessing, our approach outperforms the competition by a large margin in all metrics. For pixel error, it is the only entry that outperforms a second human observer.

Dynamic programming speeds up testing by three orders of magnitude: up to 5 Mpixel/s, i.e. 55 h for 1 Tpixel or 231 days for 1 Ppixel (one GTX 580). Giusti et al., ICIP 2013
Ciresan et al. - Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, NIPS 2012
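The core of the speed-up can be illustrated for a single convolutional layer: evaluating it once over the whole padded image yields every window's response in one pass, instead of recomputing overlapping windows per pixel. (This sketch covers only the convolutional case; the cited paper's contribution is extending the idea through max-pooling layers.)

```python
# Whole-image evaluation vs. per-window evaluation of one conv layer.
import numpy as np
from scipy.signal import correlate2d

img = np.random.rand(64, 64)
k = np.random.rand(3, 3)

# whole-image pass: one correlation over the reflect-padded image
full = correlate2d(np.pad(img, 1, mode="reflect"), k, mode="valid")

# per-window pass for a single pixel (i, j): same response, recomputed
i, j = 10, 20
window = np.pad(img, 1, mode="reflect")[i:i + 3, j:j + 3]
assert np.allclose(full[i, j], np.sum(window * k))
```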

Multi-class Results

[Figures over several slides: input images with detected membranes, mitochondria, and synapses; further slides show membranes and membrane junctions]

ICPR 2012 & MICCAI 2013 competitions: Mitosis Detection in Breast Cancer Histology Images
• No histology or medicine background
• Mitosis detection treated as a challenging visual pattern recognition problem
• 2012 ICPR Competition: 50 images, 300 mitoses
• 2013 MICCAI Competition: ~600 images, 1157 mitoses

Data: 2048x2048 px images (0.5 x 0.5 mm)

ICPR 2012 & MICCAI 2013 competitions: Mitosis Detection in Breast Cancer Histology Images
• We use a powerful pixel classifier (a Deep Convolutional Neural Network) to detect pixels close to mitosis centroids
• Input: raw pixel values in a window (no features, no preprocessing)
• Output: probability of the central pixel being close to a mitosis centroid (the probability map is then reduced to discrete detections; see the sketch below)

collaboration with Alessandro Giusti
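Turning the probability map into discrete detections can be sketched as smoothing, thresholding, and taking one centroid per blob; this is an assumed postprocessing, and the threshold and sigma values are illustrative:

```python
# Probability map -> detections: smooth, threshold, one centroid per blob.
import numpy as np
from scipy import ndimage

def detect_mitoses(prob_map, threshold=0.5, sigma=2.0):
    smoothed = ndimage.gaussian_filter(prob_map, sigma)
    mask = smoothed > threshold
    labels, n = ndimage.label(mask)
    # one detection per blob: its center of mass, scored by peak probability
    centroids = ndimage.center_of_mass(smoothed, labels, range(1, n + 1))
    scores = ndimage.maximum(smoothed, labels, range(1, n + 1))
    return list(zip(centroids, scores))

detections = detect_mitoses(np.random.rand(512, 512))
```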

Network Architecture
• Feature extraction layers: 7.5K weights, 4.7M connections
• Classification layers: 6.7K (13.4K) weights, 6.7K (13.4K) connections

Training samples & time (ICPR 2012)
• 66K positive training samples (all pixels closer than 10 px to a mitosis)
• 2M negative training samples
• 5 months of training time for up to 7 epochs on a CPU, or up to 3 days on a GPU

Detection results

D. Ciresan, A. Giusti, L.M. Gambardella, J. Schmidhuber - Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks, MICCAI 2013

Assessment of Mitosis Detection Algorithms 2013 - MICCAI Grand Challenge

Results of Mitosis Detection Competitions

[Bar charts: F1 scores of all entries in ICPR 2012 and MICCAI 2013; the IDSIA entry clearly outperforms all other entries in both competitions]

Retina vessel segmentation
• challenging problem with clinical relevance (e.g. for diagnosing glaucoma)
• state-of-the-art results for the DRIVE and STARE datasets
• better than a second human observer

Retina vessel segmentation: method

Retina vessel segmentation: results

Vision for humanoid robots: train iCub to play chess
• Localization and classification with DNN

project developed by Alan Lockett

Trail Following Problem

MAV (micro aerial vehicle)

collaboration with Jérôme Guzzi, Alessandro Giusti, Fang-lin He, Juan P. R. Gómez

Simulated experiment videos: 1. following a trail; 2. trail detection

Trail following problem – demo video

Pedestrian detection – DAIMLER dataset (preliminary results)

Adapted from Monocular Pedestrian Detection: Survey and Experiments, M. Enzweiler and D. M. Gavrila, PAMI 2009

Pedestrian detection – DAIMLER dataset
• Live demo

Thanks
• My colleagues: Ueli Meier, Jonathan Masci, Jérôme Guzzi, Jan Koutnik, Alessandro Giusti, Alan Lockett, Fang-lin He, Marijn Stollenga

• Funding: Swiss National Science Foundation (SNF), Commission for Technology and Innovation (CTI), industry

Conclusions
• Big deep nets combining CNN and other ideas are now state of the art for many image classification, detection and segmentation tasks
• Our DNN won six international competitions
• DNN can be used for various applications: automotive, biomedicine, detection of defects, document processing, image processing, etc.
• DNN are already better and much faster than humans on many difficult problems
• GPUs are essential for training DNN; testing can be done on a CPU

• More info: www.idsia.ch/~ciresan

[email protected]