DEEP NEURAL NETWORKS FOR MODELING NONLINEAR DYNAMICS
A comparison of Shannon's cross entropy and mean squared error
Najeeb Khan and Ian Stavness
Department of Computer Science, University of Saskatchewan

INTRODUCTION
• Arm reaching movements can be modeled as a mapping [1]

RESULTS
• Evaluation metric: root mean squared error (RMSE)
• 5 repetitions of 10-fold cross validation
[Figure 5 diagram: dataset partitioned into k folds, with a different fold used as the test set in each round]
[Figure 2 plots: joint space (Shoulder Angle θ1 vs. Elbow Angle θ2) and hand space (X-axis vs. Y-axis)]
[Figure 4 network diagrams, with torque trajectory τ, hidden features α and β, and initial and final states χ:]
(a) Learning hidden features α from torque trajectory
(b) Learning hidden features β from hidden features α
(c) Pre-training a deep autoencoder
(d) Deep network predicting torque trajectory from initial and final state
[Error plots: RMSE vs. number of epochs for Cross Entropy (CE) and Mean Squared Error (MSE)]
Figure 8: Mean and 95 percent confidence intervals of test reconstruction error for the deep autoencoder.
Figure 7: Mean and 95 percent confidence intervals of test reconstruction error for the second autoencoder with hidden-layer size 4.
Figure 6: Mean and 95 percent confidence intervals of test reconstruction error for the first autoencoder with hidden-layer size 50.
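The captions above report a mean and a 95 percent confidence interval over repeated runs. The poster does not say how the intervals were computed, so the following is only a minimal sketch under a normal approximation; the function name `mean_and_ci` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_and_ci(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation.

    Assumption: the runs are treated as independent samples; the poster
    does not state which interval estimator was actually used.
    """
    m = mean(values)
    # Two-sided critical value, e.g. about 1.96 for 95 percent.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

# Hypothetical per-fold test RMSE values (not taken from the poster).
fold_rmse = [0.0041, 0.0038, 0.0045, 0.0040, 0.0043]
m, lower, upper = mean_and_ci(fold_rmse)
print(f"mean={m:.5f}, 95% CI=[{lower:.5f}, {upper:.5f}]")
```

With 5 repetitions of 10-fold cross validation there would be 50 such values per curve point, which makes the normal approximation more reasonable than it is for this five-value toy list.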
METHODS
Unsupervised layer-wise pre-training [2] using:
• MSE: assumes independent Gaussian residuals
• CE: average measure of information
A deep network was used to map the initial and final state to the torque trajectory.
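Greedy layer-wise pre-training, as referenced in [2], trains one autoencoder at a time and feeds each trained encoder's output to the next. The sketch below is a toy illustration only: it uses plain per-sample gradient descent rather than the conjugate gradient with line searches mentioned in the poster, the layer sizes (8, 6, 2) and the synthetic sinusoidal data are hypothetical, and `train_autoencoder` is not the authors' code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_autoencoder(data, n_hidden, loss="ce", epochs=300, lr=0.5, seed=0):
    """Train a one-hidden-layer sigmoid autoencoder; return (encode, decode, history)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in

    def encode(x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(W1, b1)]

    def decode(h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(W2, b2)]

    history = []  # reconstruction RMSE measured during each epoch
    for _ in range(epochs):
        sq_err = 0.0
        for x in data:
            h = encode(x)
            y = decode(h)
            sq_err += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
            # Gradient w.r.t. output pre-activations: CE with sigmoid outputs
            # gives (y - x); MSE carries the extra sigmoid slope y * (1 - y).
            if loss == "ce":
                d_out = [yi - xi for yi, xi in zip(y, x)]
            else:
                d_out = [(yi - xi) * yi * (1.0 - yi) for yi, xi in zip(y, x)]
            # Back-propagate to the hidden layer before touching W2.
            d_hid = [hj * (1.0 - hj) * sum(W2[k][j] * d_out[k] for k in range(n_in))
                     for j, hj in enumerate(h)]
            for k in range(n_in):
                for j in range(n_hidden):
                    W2[k][j] -= lr * d_out[k] * h[j]
                b2[k] -= lr * d_out[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]
        history.append(math.sqrt(sq_err / (len(data) * n_in)))
    return encode, decode, history

# Synthetic stand-in for torque trajectories: sinusoids scaled into (0, 1).
rng = random.Random(1)
phases = [rng.random() for _ in range(40)]
data = [[0.5 + 0.4 * math.sin(2 * math.pi * (k / 8 + p)) for k in range(8)]
        for p in phases]

# Stage 1: learn hidden features alpha from the (toy) trajectories.
enc1, dec1, hist1 = train_autoencoder(data, n_hidden=6, loss="ce")
alpha = [enc1(x) for x in data]
# Stage 2: learn hidden features beta from alpha.
enc2, dec2, hist2 = train_autoencoder(alpha, n_hidden=2, loss="ce")
# Stacked deep autoencoder: encode with both layers, then decode in reverse.
recon = [dec1(dec2(enc2(enc1(x)))) for x in data]
print(hist1[0], hist1[-1])
```

Passing `loss="mse"` switches the output gradient to the squared-error form, which is the only place the two criteria differ in this sketch.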
Figure 5: 10-fold cross validation.
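The evaluation protocol (5 repetitions of 10-fold cross validation, scored by RMSE) can be sketched as follows. This is an illustrative outline, not the authors' pipeline: `repeated_kfold_indices`, the seed, and the toy "predict the training mean" model are assumptions made for the example.

```python
import math
import random

def repeated_kfold_indices(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for each repetition
        folds = [order[i::k] for i in range(k)]
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx

def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

# Toy usage: the "model" simply predicts the mean of the training targets.
data = [float(i % 7) for i in range(100)]
scores = []
for train_idx, test_idx in repeated_kfold_indices(len(data)):
    train_mean = sum(data[j] for j in train_idx) / len(train_idx)
    held_out = [data[j] for j in test_idx]
    scores.append(rmse(held_out, [train_mean] * len(held_out)))
print(len(scores))  # 5 repetitions x 10 folds = 50 scores
```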
Figure 2: A rectangular region in the joint-space transforms into a non-rectangular region in the hand-space.
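The joint-space to hand-space distortion in Figure 2 follows from the forward kinematics of a two-link planar arm. The sketch below uses the standard two-link kinematic equations; the link lengths `L1` and `L2` are hypothetical values, since the poster does not list the arm parameters.

```python
import math

# Hypothetical link lengths in meters (not given in the poster).
L1, L2 = 0.30, 0.33

def hand_position(theta1, theta2, l1=L1, l2=L2):
    """Map shoulder angle theta1 and elbow angle theta2 (radians) to hand (x, y)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# One edge of a joint-space rectangle: elbow angle fixed, shoulder angle varying.
edge = [hand_position(t1 / 10.0, 1.0) for t1 in range(-5, 11)]
radii = [math.hypot(x, y) for x, y in edge]
# The hand's distance from the shoulder depends only on theta2, so this
# straight joint-space edge becomes a circular arc in hand space.
print(max(radii) - min(radii))
```

Because straight edges in joint space map to arcs and curves, a rectangle of joint angles cannot stay rectangular in hand coordinates.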
[Figure 3 flowchart: generate random initial and final points χ → compute minimum-jerk trajectories → inverse dynamics of the arm → torque control trajectory]
Figure 3: Data set generation.
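The minimum-jerk step of the data set generation can be illustrated with the standard fifth-order polynomial profile for point-to-point movements that start and end at rest. This is a sketch under that assumption; the poster does not state which coordinates or boundary conditions were used, and `minimum_jerk` is a hypothetical helper name.

```python
def minimum_jerk(x_i, x_f, duration, n_steps):
    """Sample a minimum-jerk path from x_i to x_f.

    Returns a list of (time, point) pairs. Assumes zero velocity and
    acceleration at both ends, which gives the classic 10-15-6 polynomial.
    """
    samples = []
    for step in range(n_steps):
        s = step / (n_steps - 1)  # normalized time in [0, 1]
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        point = tuple(a + (b - a) * blend for a, b in zip(x_i, x_f))
        samples.append((s * duration, point))
    return samples

# Hypothetical start point, end point, and movement time.
trajectory = minimum_jerk((0.0, 0.3), (0.2, 0.5), duration=0.5, n_steps=11)
for t, (x, y) in trajectory[:3]:
    print(f"t={t:.2f}s x={x:.4f} y={y:.4f}")
```

The blend term rises smoothly from 0 to 1, so the path leaves the start point and arrives at the end point with no abrupt changes in velocity or acceleration.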
Figure 1: Two-link planar arm model (adapted from Berniker et al., Nat. Neurosci. 2008).
• Highly non-linear dimensionality reduction using deep autoencoders
• Autoencoder training using Mean Squared Error (MSE) vs Cross Entropy (CE):

$$J_{MSE} = \frac{1}{2} \sum_{k=1}^{K} (x_k - \hat{x}_k)^2 \tag{2}$$

$$J_{CE} = \sum_{k=1}^{K} \left[ -x_k \log \hat{x}_k - (1 - x_k) \log(1 - \hat{x}_k) \right] \tag{3}$$
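The two training criteria, MSE (Equation 2) and CE (Equation 3), can be evaluated for a single input and its reconstruction as in the sketch below. It assumes the squared-error criterion carries a factor of 1/2, that the cross entropy is the element-wise binary form, and that inputs and reconstructions lie in [0, 1]; the clipping constant `eps` is an added numerical safeguard, not part of the poster.

```python
import math

def mse_loss(x, x_hat):
    """J_MSE = 1/2 * sum_k (x_k - x_hat_k)^2 for one sample."""
    return 0.5 * sum((xk - xh) ** 2 for xk, xh in zip(x, x_hat))

def ce_loss(x, x_hat, eps=1e-12):
    """J_CE = sum_k [-x_k log(x_hat_k) - (1 - x_k) log(1 - x_hat_k)]."""
    total = 0.0
    for xk, xh in zip(x, x_hat):
        xh = min(max(xh, eps), 1.0 - eps)  # avoid log(0); eps is an assumption
        total += -xk * math.log(xh) - (1.0 - xk) * math.log(1.0 - xh)
    return total

# Hypothetical input and reconstruction, both scaled to [0, 1].
x = [0.2, 0.7, 0.5]
x_hat = [0.25, 0.6, 0.5]
print(mse_loss(x, x_hat), ce_loss(x, x_hat))
```

Unlike MSE, the cross entropy is not zero at a perfect reconstruction of non-binary targets, which is one reason the poster compares both criteria with a common metric (RMSE).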
Mapping for arm reaching movements [1]:

$$\{x_i, x_f, T\} \rightarrow \{x(t), u(t)\} \tag{1}$$
We used deep neural networks for the control of a two-link planar arm model.
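The poster does not list the arm's equations of motion or its parameters. As a hedged illustration of how torques can be obtained from a joint trajectory, the sketch below uses the standard rigid-body inverse dynamics of a two-link planar arm moving in a horizontal plane (no gravity term). All numeric parameters are hypothetical.

```python
import math

# Hypothetical arm parameters (not from the poster).
M1, M2 = 1.4, 1.0        # link masses (kg)
L1 = 0.30                # upper-arm length (m)
R1, R2 = 0.11, 0.16      # distances from joint to link center of mass (m)
I1, I2 = 0.025, 0.045    # link moments of inertia about the center of mass (kg m^2)

def inverse_dynamics(theta, dtheta, ddtheta):
    """Return (tau1, tau2) for joint angles, velocities, and accelerations."""
    _, th2 = theta           # the shoulder angle does not enter without gravity
    d1, d2 = dtheta
    dd1, dd2 = ddtheta
    c2, s2 = math.cos(th2), math.sin(th2)
    # Configuration-dependent inertia matrix entries.
    m11 = I1 + I2 + M1 * R1**2 + M2 * (L1**2 + R2**2 + 2 * L1 * R2 * c2)
    m12 = I2 + M2 * (R2**2 + L1 * R2 * c2)
    m22 = I2 + M2 * R2**2
    # Coriolis and centripetal coupling between the two joints.
    h = M2 * L1 * R2 * s2
    tau1 = m11 * dd1 + m12 * dd2 - h * (2 * d1 * d2 + d2**2)
    tau2 = m12 * dd1 + m22 * dd2 + h * d1**2
    return tau1, tau2

# Hypothetical instantaneous state: angles (rad), velocities, accelerations.
print(inverse_dynamics((0.4, 1.2), (0.5, -0.3), (2.0, 1.0)))
```

Applying this function at every time step of a joint trajectory yields a torque trajectory of the kind the networks are trained to reconstruct and predict.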
[Figure 9 plots: RMSE of CE vs MSE for batch sizes 500 and 50, each with 3 and 10 line searches]
Figure 9: Impact of hyper-parameters on performance of CE vs MSE for training: autoencoder with hidden layer 50 (left), autoencoder with hidden layer 4 (center), deep autoencoder (right).
Figure 4: Unsupervised pre-training for torque trajectory (a-c) dimensionality reduction and (d) prediction.

CONCLUSIONS
• First detailed evaluation of learning criteria for deep autoencoders.
• Showed empirically that CE achieves a lower reconstruction RMSE than MSE.
• These results are independent of hyper-parameters such as minibatch size and number of conjugate gradient line searches.
• As future work, the impact of data set size and use of regularization on the cost functions may be evaluated.
REFERENCES
[1] Max Berniker and Konrad P. Kording. Deep networks for motor control functions. Frontiers in Computational Neuroscience, 9, 2015.
[2] Yoshua Bengio. Greedy layer-wise training of deep networks. NIPS, 19:153, 2007.