DEEP NEURAL NETWORKS FOR MODELING NONLINEAR DYNAMICS


A comparison of Shannon's cross entropy and mean squared error

Najeeb Khan and Ian Stavness
Department of Computer Science, University of Saskatchewan

INTRODUCTION

RESULTS

• Evaluation metric: root mean squared error (RMSE)
• 5 repetitions of 10-fold cross validation
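The evaluation protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `rmse` and `repeated_kfold`, the seed, and the sample count of 100 are all assumptions made for the example.

```python
import math
import random

def rmse(targets, predictions):
    """Root mean squared error over two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def repeated_kfold(n_samples, k=10, repetitions=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition per repetition
        folds = [indices[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            # Every fold except fold i is used for training.
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test

# Hypothetical data set size, for illustration only.
splits = list(repeated_kfold(n_samples=100))
```

With 5 repetitions of 10 folds, each model is trained and tested on 50 train/test splits, and `rmse` would be evaluated on the held-out fold of each split.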

• Arm reaching movements can be modeled as a mapping [1]

[Figure labels recovered from the poster graphics. Cross-validation diagram: "Dataset partitioned into k folds", with one "test" fold per round. Joint Space plot: axes Shoulder Angle θ1 and Elbow Angle θ2. Cartesian plot: X-axis and Y-axis. Network diagrams: torque trajectory τ1 to τ8 and its reconstruction τ̂1 to τ̂8, hidden features α1 to α6 and their reconstruction α̂1 to α̂6, hidden features β, and initial and final states χ1, χ2.]
(a) Learning hidden features α from torque trajectory

(b) Learning hidden features β from hidden features α

(c) Pre-training a deep autoencoder

(d) Deep network predicting torque trajectory from initial and final state

Figure 8: Mean and 95 percent confidence intervals of test reconstruction error for the deep autoencoder.

[Plot axes: RMSE against number of epochs; legend: CE, MSE.]
Figure 7: Mean and 95 percent confidence intervals of test reconstruction error for the second autoencoder with hidden-layer size 4.

Figure 6: Mean and 95 percent confidence intervals of test reconstruction error for the first autoencoder with hidden-layer size 50.
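The figure captions report means with 95 percent confidence intervals, but the poster does not say how the intervals were computed. The sketch below assumes a normal approximation (1.96 standard errors around the mean); the helper name `mean_and_ci95` and the RMSE values are hypothetical and serve only to show the calculation.

```python
import math
import statistics

def mean_and_ci95(values):
    """Return (mean, half_width) of a normal-approximation 95% interval."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # standard error
    return mean, 1.96 * sem

# Hypothetical per-fold RMSE values, for illustration only.
fold_rmse = [0.012, 0.010, 0.011, 0.013, 0.009]
mean, half_width = mean_and_ci95(fold_rmse)
print(f"{mean:.4f} +/- {half_width:.4f}")
```

With only a few folds, a Student t quantile in place of 1.96 would give a wider and more cautious interval.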

A deep network was used to map the initial and final states to the torque trajectory.

Unsupervised layerwise pre-training [2] using:
• MSE: assumes independent Gaussian residuals
• CE: average measure of information
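Greedy layer-wise pre-training with either criterion can be sketched in NumPy as below. Several choices here are assumptions rather than details from the poster: sigmoid units throughout, full-batch gradient descent in place of minibatch conjugate gradient, the learning rate and epoch count, an input of 8 torque samples, and the random stand-in data. The hidden-layer sizes 50 and 4 follow the figure captions. The two criteria differ only in the gradient at the decoder pre-activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, loss="ce", epochs=200, lr=0.5, seed=0):
    """Train one sigmoid autoencoder by full-batch gradient descent.

    Returns (params, hidden_features). Plain gradient descent is an
    assumption made for brevity; it is not the optimizer from the poster.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # encoder
        X_hat = sigmoid(H @ W2 + b2)    # decoder
        # Gradient of the loss with respect to the decoder pre-activation.
        if loss == "ce":
            delta2 = X_hat - X
        else:  # "mse"
            delta2 = (X_hat - X) * X_hat * (1.0 - X_hat)
        delta1 = (delta2 @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ delta2) / n
        b2 -= lr * delta2.mean(axis=0)
        W1 -= lr * (X.T @ delta1) / n
        b1 -= lr * delta1.mean(axis=0)
    return (W1, b1, W2, b2), sigmoid(X @ W1 + b1)

def pretrain_stack(X, hidden_sizes=(50, 4), loss="ce"):
    """Greedy layer-wise pre-training: each autoencoder learns to
    reconstruct the hidden features of the one below it."""
    layers, features = [], X
    for size in hidden_sizes:
        params, features = train_autoencoder(features, size, loss=loss)
        layers.append(params)
    return layers, features

# Hypothetical data: 200 trajectories of 8 torque samples scaled to [0, 1].
X = np.random.default_rng(1).uniform(0.1, 0.9, size=(200, 8))
layers, beta = pretrain_stack(X)
print(beta.shape)  # (200, 4)
```

With a sigmoid output, the CE gradient is simply the residual, whereas the MSE gradient is scaled by the sigmoid derivative and shrinks when outputs saturate. That difference is one commonly cited reason the two criteria can train at different speeds.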

METHODS

Figure 5: 10-fold cross validation.

Figure 2: A rectangular region in the joint space transforms into a non-rectangular region in the hand space.

[Figure 3 flowchart: Generate random initial and final points χ → Compute minimum-jerk trajectories → Inverse dynamics of the arm → Torque control trajectory]

Figure 3: Data set generation.
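The minimum-jerk step of the data set generation can be illustrated with the standard fifth-order polynomial for point-to-point movements. This is a sketch under assumptions: the poster does not give the formula, the coordinate space, the duration, or the sampling rate, so the function name `minimum_jerk`, the end points, `T`, and `n_steps` below are all hypothetical.

```python
def minimum_jerk(x_i, x_f, T, n_steps=50):
    """Sample a minimum-jerk path from x_i to x_f lasting T seconds.

    Uses the standard fifth-order polynomial, which has zero velocity and
    acceleration at both ends. Returns a list of (t, position) pairs.
    """
    samples = []
    for step in range(n_steps + 1):
        s = step / n_steps                      # normalized time in [0, 1]
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        position = tuple(a + (b - a) * blend for a, b in zip(x_i, x_f))
        samples.append((s * T, position))
    return samples

# Hypothetical initial and final points, for illustration only.
path = minimum_jerk(x_i=(0.0, 0.3), x_f=(0.2, 0.5), T=1.0)
```

The blend term rises smoothly from 0 to 1 and equals 0.5 at the temporal midpoint, so the path passes through the midpoint of the two end points halfway through the movement.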

[Figure 2 panel title: Hand Space]
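The joint-space to hand-space transformation described in the Figure 2 caption follows from the forward kinematics of a two-link planar arm. The sketch below uses the textbook kinematic equations; the link lengths and the angle ranges are placeholder assumptions, not values from the poster.

```python
import math

def forward_kinematics(theta1, theta2, l1=0.3, l2=0.3):
    """Hand position (x, y) of a two-link planar arm.

    theta1 is the shoulder angle and theta2 the elbow angle, in radians,
    with theta2 measured relative to the upper arm. Link lengths l1 and
    l2 (meters) are placeholder values, not taken from the poster.
    """
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Corners of a rectangle in joint space (hypothetical angle ranges).
corners = [(t1, t2) for t1 in (0.0, 1.0) for t2 in (0.5, 2.0)]
hand_points = [forward_kinematics(t1, t2) for t1, t2 in corners]
```

Because the hand position depends on sines and cosines of the joint angles, straight edges of the joint-space rectangle map to curved edges in hand space.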

$$\{x_i, x_f, T\} \rightarrow \{x(t), u(t)\} \qquad (1)$$

• Highly non-linear dimensionality reduction using deep autoencoders
• Autoencoder training using Mean Squared Error (MSE) vs Cross Entropy (CE)

$$J_{MSE} = \frac{1}{2} \sum_{k=1}^{K} (x_k - \hat{x}_k)^2 \qquad (2)$$

$$J_{CE} = \sum_{k=1}^{K} \left[ -x_k \log \hat{x}_k - (1 - x_k) \log(1 - \hat{x}_k) \right] \qquad (3)$$

Figure 1: Two-link planar arm model (adapted from Berniker et al., Nat. Neurosci. 2008).
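The two reconstruction criteria, J_MSE and J_CE, can be computed for a single example as in the sketch below. It assumes a factor of 1/2 in J_MSE and the usual leading minus sign in J_CE, with inputs and reconstructions scaled to [0, 1]. The `eps` clipping is a numerical safeguard added here, and the sample vectors are hypothetical.

```python
import math

def mse_loss(x, x_hat):
    """J_MSE = 1/2 * sum_k (x_k - x_hat_k)^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))

def ce_loss(x, x_hat, eps=1e-12):
    """J_CE = -sum_k [x_k log x_hat_k + (1 - x_k) log(1 - x_hat_k)].

    eps clips x_hat away from 0 and 1 so the logarithms stay finite.
    """
    total = 0.0
    for a, b in zip(x, x_hat):
        b = min(max(b, eps), 1.0 - eps)
        total -= a * math.log(b) + (1.0 - a) * math.log(1.0 - b)
    return total

# Hypothetical input and reconstruction, for illustration only.
x = [0.2, 0.7, 0.9]
x_hat = [0.25, 0.6, 0.8]
print(mse_loss(x, x_hat), ce_loss(x, x_hat))
```

For targets strictly between 0 and 1, J_CE is still minimized at a perfect reconstruction, but its minimum value is the binary entropy of the target rather than zero. This is one reason to compare the two criteria on a common metric such as RMSE.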


We used deep neural networks for the control of a two-link planar arm model.

[Figure 9 legend: Batch 500 and Batch 50 with 3 line searches; Batch 500 and Batch 50 with 10 line searches.]





Figure 9: Impact of hyper-parameters on performance of CE vs MSE for training: autoencoder with hidden layer 50 (left), autoencoder with hidden layer 4 (center), deep autoencoder (right).

Figure 4: Unsupervised pre-training for torque trajectory (a-c) dimensionality reduction and (d) prediction.

CONCLUSIONS

• First detailed evaluation of learning criteria for deep autoencoders.
• Empirically showed that CE achieves a lower reconstruction RMSE than MSE.
• These results held across the hyper-parameter settings tested (minibatch size and number of conjugate gradient line searches).
• As future work, the impact of data set size and of regularization on the two cost functions may be evaluated.

REFERENCES

[1] Max Berniker and Konrad P. Kording. Deep networks for motor control functions. Frontiers in Computational Neuroscience, 9, 2015.
[2] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. NIPS, 19:153, 2007.