Precoding and Signal Shaping for Digital Transmission. Robert F. H. Fischer. Copyright © 2002 John Wiley & Sons, Inc. ISBN: 0-471-22410-3

Appendix A Wirtinger Calculus

The optimization of system parameters is a very common problem in communications and engineering. For example, the optimal tap weights of an equalizer should be adapted for minimum deviation (e.g., measured by the mean-squared error or the peak distortion) of the output signal from the desired (reference) signal. For an analytical solution, a cost function is set up and the partial derivatives with respect to the adjustable parameters are set to zero. Solving this set of equations results in the desired optimal solution.

Often, however, the problem is formulated using complex-valued parameters. In digital communications, signals and systems are preferably treated in the equivalent complex baseband [Fra69, Tre71, Pro01]. For solving such optimization problems, differentiation with respect to a complex variable is required. Starting from well-known principles, this Appendix derives a smart and easily remembered calculus, sometimes known as the Wirtinger Calculus [FL88, Rem89].

A.1 REAL AND COMPLEX DERIVATIVES

First, we consider a real-valued function of a real variable:

$$ f : \mathbb{R} \ni x \mapsto y = f(x) \in \mathbb{R} . \tag{A.1.1} $$

The point $x_{\mathrm{opt}}$ for which $f(x)$ is maximum¹ is obtained by taking the derivative of $f$ with respect to $x$ and setting it to zero. For $x_{\mathrm{opt}}$ the following equation has to be valid:

$$ \left. \frac{\mathrm{d} f(x)}{\mathrm{d} x} \right|_{x = x_{\mathrm{opt}}} = 0 . \tag{A.1.2} $$

Here we assume $f(x)$ to be continuous in some region $\mathcal{R}$, and the derivative to exist. Whether the solution of the above equation actually gives a minimum or maximum point has to be checked via additional considerations or by inspecting higher-order derivatives.

Analogous to real functions, a derivative can be defined for complex functions of a complex variable

$$ f : \mathbb{C} \ni z \mapsto w = f(z) \in \mathbb{C} \tag{A.1.3} $$

as well:

$$ f'(z_0) = \left. \frac{\mathrm{d} f(z)}{\mathrm{d} z} \right|_{z = z_0} \triangleq \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} . \tag{A.1.4} $$

The above limit has to exist for the infinitely many sequences $\{z_n\}$ which approach $z_0$, i.e., $\lim_{n \to \infty} z_n = z_0$. If $f'(z)$ exists in a region $\mathcal{R} \subset \mathbb{C}$, the function $f(z)$ is called analytic, holomorphic, or regular in $\mathcal{R}$.

In the following, the relations between real and complex derivatives are discussed. A complex function can be decomposed into two real functions, each depending on two real variables $x$ and $y$, the real and imaginary parts of $z$, i.e.,

$$ f(z) = f(x + j y) \triangleq u(x, y) + j\, v(x, y) , \qquad z = x + j y . \tag{A.1.5} $$

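It can be shown that in order for $f(z)$ to be holomorphic, the component functions $u(x, y)$ and $v(x, y)$ have to meet the Cauchy-Riemann differential equations, which read (e.g., [FL88, Rem89]):

$$ \frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y} , \tag{A.1.6a} $$

$$ \frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y} . \tag{A.1.6b} $$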

¹ The same considerations are valid for minimization.

The complex derivative of a holomorphic function $f(z)$ can then be expressed by the partial derivatives of the real functions $u(x, y)$ and $v(x, y)$:

$$ f'(z) = \frac{\partial u(x, y)}{\partial x} + j\, \frac{\partial v(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y} - j\, \frac{\partial u(x, y)}{\partial y} . \tag{A.1.7} $$

The complex derivative of a complex function plays an important role in complex analysis; in communications it has almost no significance. In fact, a more common problem is the optimization of real functions that depend on complex parameters. Complex cost functions are of no interest, because in the field of complex numbers no ordering (relations < and >) is defined, and thus minimization or maximization makes no sense.
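As a numerical illustration of the limit (A.1.4), the following Python sketch evaluates the difference quotient when $z_0$ is approached along the real and along the imaginary direction. The helper name `difference_quotient`, the test point, and the step size are assumptions made for this illustration only. For the holomorphic function $z^2$ both directions give nearly the same value, whereas for the real-valued function $|z|^2$ they disagree, so the limit (A.1.4) does not exist there.

```python
def difference_quotient(f, z0, direction, eps=1e-6):
    """Approximate (f(z) - f(z0)) / (z - z0) for z = z0 + eps * direction."""
    step = eps * direction
    return (f(z0 + step) - f(z0)) / step


def square(z):
    """Holomorphic example: f(z) = z^2."""
    return z * z


def squared_magnitude(z):
    """Real-valued, nonholomorphic example: f(z) = |z|^2 = z z*."""
    return (z * z.conjugate()).real


z0 = 1.0 + 2.0j  # arbitrary test point (assumption)

# Approach z0 along the real axis and along the imaginary axis.
square_real = difference_quotient(square, z0, 1.0)
square_imag = difference_quotient(square, z0, 1.0j)
magnitude_real = difference_quotient(squared_magnitude, z0, 1.0)
magnitude_imag = difference_quotient(squared_magnitude, z0, 1.0j)

print("z^2  :", square_real, square_imag)        # both close to 2*z0
print("|z|^2:", magnitude_real, magnitude_imag)  # close to 2*x0 and -2j*y0
```

The direction dependence in the second case is exactly why a different notion of derivative is needed for real cost functions of complex parameters.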

A.2 WIRTINGER CALCULUS

As already stated, we have to treat real functions of one or more complex variables. Thus, let us now consider functions

$$ f : \mathbb{C} \ni z = x + j y \mapsto w = f(z) = u(x, y) \in \mathbb{R} . \tag{A.2.1} $$

Since $v(x, y) \equiv 0$ holds (cf. (A.1.5)), $f(z)$ generally is not holomorphic. A real function would only be regular if, according to (A.1.6), $\frac{\partial u}{\partial x} \equiv 0$ and $\frac{\partial u}{\partial y} \equiv 0$ are valid. But this only holds for a real constant, and hence can be disregarded.

The straightforward solution to the optimization of the above function is as follows: Instead of regarding $f(z)$ as a real function of one complex variable, we view $f(z) = u(x, y)$ as a function of two real variables. Thus optimization can be done as for multidimensional real functions. We want to find

$$ f(z) = u(x, y) \to \mathrm{opt.} , $$

which requires

$$ \frac{\partial u(x, y)}{\partial x} = 0 \qquad \text{and} \qquad \frac{\partial u(x, y)}{\partial y} = 0 . \tag{A.2.2} $$

In order to obtain a more compact representation, both of the above real-valued equations for the optimal components $x_{\mathrm{opt}}$ and $y_{\mathrm{opt}}$ can be linearly combined into one complex-valued equation:

$$ a_1 \frac{\partial u(x, y)}{\partial x} + j\, a_2 \frac{\partial u(x, y)}{\partial y} = 0 , \tag{A.2.3} $$

where, for the moment, $a_1$ and $a_2$ are arbitrary real and nonzero constants. Equations (A.2.2) and (A.2.3) are equivalent (and hence, of course, result in the same solution) because real and imaginary part are orthogonal. As already stated, this procedure is mainly intended to get a compact representation.

Writing real part and imaginary part of $z = x + j y$ as the tuple $(x, y)$, we can define the following differential operator:

$$ \frac{\partial}{\partial (x, y)} \triangleq a_1 \frac{\partial}{\partial x} + j\, a_2 \frac{\partial}{\partial y} . \tag{A.2.4} $$

This operator can, of course, also be applied to complex functions (A.1.5). This is reasonable, because real cost functions are often composed of complex components, e.g., $f(z) = |z|^2 = z \cdot z^* \triangleq f_1(z) \cdot f_2(z)$, with an obvious definition of $f_{1,2}(z) \in \mathbb{C}$. Note that $z^* \triangleq x - j y$ denotes the complex conjugate of $z = x + j y$.

The remaining task is to choose suitable constants $a_1$ and $a_2$. The main aim is to obtain a calculus that is easily remembered and easy to apply. As will be shown later, the choice $a_1 = \frac{1}{2}$ and $a_2 = -\frac{1}{2}$ meets all requirements. To honor the work of the Austrian mathematician Wilhelm Wirtinger (1865-1945), who established this differential calculus, we call it Wirtinger Calculus.

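Definition A.1: Wirtinger Calculus

The partial derivatives of a (complex) function $f(z)$ of a complex variable $z = x + j y \in \mathbb{C}$, $x, y \in \mathbb{R}$, with respect to $z$ and $z^*$, respectively, are defined as:

$$ \frac{\partial f}{\partial z} \triangleq \frac{1}{2} \left( \frac{\partial f}{\partial x} - j\, \frac{\partial f}{\partial y} \right) \tag{A.2.5} $$

and

$$ \frac{\partial f}{\partial z^*} \triangleq \frac{1}{2} \left( \frac{\partial f}{\partial x} + j\, \frac{\partial f}{\partial y} \right) . \tag{A.2.6} $$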
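The definition can be tried out numerically. The following Python sketch approximates the two real partial derivatives by central differences and combines them as in (A.2.5) and (A.2.6). The function name `wirtinger_derivatives`, the step size `h`, and the example function $u(x, y) = x^2 + 3 y^2$ are assumptions introduced here for illustration; they do not appear in the text above.

```python
def wirtinger_derivatives(f, z, h=1e-6):
    """Return (df/dz, df/dz*) at z according to (A.2.5) and (A.2.6).

    The partial derivatives with respect to x and y are approximated by
    central differences with step h (a numerical assumption of this sketch).
    """
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dz_conj = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dz_conj


def u(z):
    """Real-valued example u(x, y) = x^2 + 3 y^2 with z = x + jy."""
    return z.real ** 2 + 3.0 * z.imag ** 2


z0 = 1.0 + 2.0j  # arbitrary test point (assumption)
d_dz, d_dz_conj = wirtinger_derivatives(u, z0)

# By hand: du/dx = 2x = 2 and du/dy = 6y = 12, so
# df/dz = (2 - 12j)/2 = 1 - 6j and df/dz* = (2 + 12j)/2 = 1 + 6j.
print(d_dz, d_dz_conj)
```

For a real-valued function the two results are complex conjugates of each other, so both vanish at the same points.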

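A.2.1 Examples

We now study some important examples. First, let $f(z) = c z$, where $c \in \mathbb{C}$ is a constant. Differentiation of $f(z)$ yields

$$ \frac{\partial}{\partial z}\, c z = \frac{1}{2} \left( \frac{\partial\, c (x + j y)}{\partial x} - j\, \frac{\partial\, c (x + j y)}{\partial y} \right) = \frac{1}{2} \big( c - j (j c) \big) = c \tag{A.2.7} $$

and

$$ \frac{\partial}{\partial z^*}\, c z = \frac{1}{2} \left( \frac{\partial\, c (x + j y)}{\partial x} + j\, \frac{\partial\, c (x + j y)}{\partial y} \right) = \frac{1}{2} \big( c + j (j c) \big) = 0 . \tag{A.2.8} $$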

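Similarly, for $f(z) = c z^*$, we arrive at:

$$ \frac{\partial}{\partial z}\, c z^* = \frac{1}{2} \left( \frac{\partial\, c (x - j y)}{\partial x} - j\, \frac{\partial\, c (x - j y)}{\partial y} \right) = \frac{1}{2} \big( c - j (-j c) \big) = 0 \tag{A.2.9} $$

and

$$ \frac{\partial}{\partial z^*}\, c z^* = \frac{1}{2} \left( \frac{\partial\, c (x - j y)}{\partial x} + j\, \frac{\partial\, c (x - j y)}{\partial y} \right) = \frac{1}{2} \big( c + j (-j c) \big) = c . \tag{A.2.10} $$

Next, we consider the function $f = z z^* = |z|^2 = x^2 + y^2$. Here the derivatives read:

$$ \frac{\partial}{\partial z}\, z z^* = \frac{1}{2} \left( \frac{\partial (x^2 + y^2)}{\partial x} - j\, \frac{\partial (x^2 + y^2)}{\partial y} \right) = \frac{1}{2} (2 x - j\, 2 y) = z^* \tag{A.2.11} $$

and

$$ \frac{\partial}{\partial z^*}\, z z^* = \frac{1}{2} \left( \frac{\partial (x^2 + y^2)}{\partial x} + j\, \frac{\partial (x^2 + y^2)}{\partial y} \right) = \frac{1}{2} (2 x + j\, 2 y) = z . \tag{A.2.12} $$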

To summarize, the correspondences in Table A.1 are valid.

Table A.1  Wirtinger derivatives of some important functions.

$f(z)$      $\partial f / \partial z$      $\partial f / \partial z^*$
$c z$       $c$                            $0$
$c z^*$     $0$                            $c$
$z z^*$     $z^*$                          $z$
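The three rows of Table A.1 can be checked numerically. The Python sketch below compares central-difference approximations of (A.2.5) and (A.2.6) with the tabulated results; the helper `wirtinger_derivatives`, the constant `c`, and the test point `z0` are arbitrary choices made for this illustration.

```python
def wirtinger_derivatives(f, z, h=1e-6):
    """Return central-difference approximations of (df/dz, df/dz*) at z."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx - 1j * df_dy), 0.5 * (df_dx + 1j * df_dy)


c = 2.0 - 1.0j   # arbitrary complex constant (assumption)
z0 = 0.5 + 1.5j  # arbitrary test point (assumption)

# Each entry: function, expected df/dz, expected df/dz* (from Table A.1).
table = {
    "c z": (lambda z: c * z, c, 0.0),
    "c z*": (lambda z: c * z.conjugate(), 0.0, c),
    "z z*": (lambda z: z * z.conjugate(), z0.conjugate(), z0),
}

max_error = 0.0
for name, (f, expected_dz, expected_dz_conj) in table.items():
    d_dz, d_dz_conj = wirtinger_derivatives(f, z0)
    error = max(abs(d_dz - expected_dz), abs(d_dz_conj - expected_dz_conj))
    max_error = max(max_error, error)
    print(f"{name:5s} d/dz = {d_dz:.6f}   d/dz* = {d_dz_conj:.6f}")

print("largest deviation from Table A.1:", max_error)
```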

Note that, using the Wirtinger Calculus, differentiation is formally done as with real functions. Moreover, and somewhat unexpectedly, $z^*$ is formally treated as a constant when differentiating with respect to $z$, and vice versa. It is also easy to show that the sum, product, and quotient rules still hold. For example, given $f(z) = f_1(z) \cdot f_2(z)$, we obtain

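$$ \begin{aligned} \frac{\partial}{\partial z}\, f_1(z) \cdot f_2(z) &= \frac{1}{2} \left( \frac{\partial f_1(z) f_2(z)}{\partial x} - j\, \frac{\partial f_1(z) f_2(z)}{\partial y} \right) \\ &= \frac{1}{2} \left( \frac{\partial f_1(z)}{\partial x} f_2(z) + f_1(z) \frac{\partial f_2(z)}{\partial x} - j\, \frac{\partial f_1(z)}{\partial y} f_2(z) - j\, f_1(z) \frac{\partial f_2(z)}{\partial y} \right) \\ &= \frac{\partial f_1(z)}{\partial z} \cdot f_2(z) + f_1(z) \cdot \frac{\partial f_2(z)}{\partial z} . \end{aligned} \tag{A.2.13} $$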
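As a short worked check, the product rule can be applied to $f(z) = z z^*$ with the (assumed) factorization $f_1(z) = z$ and $f_2(z) = z^*$. Using the derivatives of $c z$ and $c z^*$ for $c = 1$, the result agrees with the direct calculation for $z z^*$:

```latex
\frac{\partial}{\partial z}\,(z \cdot z^*)
  = \frac{\partial z}{\partial z} \cdot z^* + z \cdot \frac{\partial z^*}{\partial z}
  = 1 \cdot z^* + z \cdot 0
  = z^* .
```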

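Finally, for $f(z) = h(g(z)) \triangleq h(w)$, $g : \mathbb{C} \mapsto \mathbb{C}$, the following chain rules hold [FL88, Rem89]:

$$ \frac{\partial f}{\partial z} = \frac{\partial h}{\partial w} \cdot \frac{\partial g}{\partial z} + \frac{\partial h}{\partial w^*} \cdot \frac{\partial g^*}{\partial z} , \tag{A.2.14a} $$

$$ \frac{\partial f}{\partial z^*} = \frac{\partial h}{\partial w} \cdot \frac{\partial g}{\partial z^*} + \frac{\partial h}{\partial w^*} \cdot \frac{\partial g^*}{\partial z^*} . \tag{A.2.14b} $$

A.2.2 Discussion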

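The Wirtinger derivative can be considered to lie in between the real derivative of a real function and the complex derivative of a complex function. Rewriting (A.2.5) and (A.2.6) for $f(z) = u(x, y) + j\, v(x, y)$ gives

$$ \frac{\partial f}{\partial z} = \frac{1}{2} \left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \right) + \frac{j}{2} \left( \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \right) , \qquad \frac{\partial f}{\partial z^*} = \frac{1}{2} \left( \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \right) + \frac{j}{2} \left( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right) . $$

For holomorphic functions, the Cauchy-Riemann equations (A.1.6) hold, and we arrive at:

$$ \frac{\partial f}{\partial z} = \frac{\partial u}{\partial x} + j\, \frac{\partial v}{\partial x} , \tag{A.2.15a} $$

$$ \frac{\partial f}{\partial z^*} = 0 . \tag{A.2.15b} $$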

On the one hand, equation (A.2.15a) states that for holomorphic functions the Wirtinger derivative with respect to $z$ agrees with the ordinary derivative of a complex function (cf. (A.1.7)). On the other hand, (A.2.15b) can be interpreted in the way that holomorphic functions do not formally depend on $z^*$.

Contrary to the usual complex derivative, the Wirtinger derivative exists for all functions, in particular nonholomorphic ones, such as real functions. Since both operators $\frac{\partial}{\partial z}$ and $\frac{\partial}{\partial z^*}$ are merely a compact notation incorporating two real differential quotients, they can be applied to arbitrary functions of complex variables. For nonholomorphic functions, $\frac{\partial f}{\partial z^*} \neq 0$ usually holds, and thus either the derivative with respect to $z$ or $z^*$ can be used for optimization. The actual cost function determines which one is more advantageous; if quadratic forms are considered, the operator $\frac{\partial}{\partial z^*}$ is preferable.

To summarize, it should again be emphasized that, because of its compact notation, Wirtinger Calculus is very well suited for optimization in engineering. It circumvents a separate inspection of real part and imaginary part of the cost function. Because of the simple arithmetic rules (mostly, calculations proceed as known from real functions), the Wirtinger Calculus is very clear.
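To illustrate how the derivative with respect to $z^*$ can serve in an optimization, consider the assumed cost function $f(z) = |z - c|^2$, for which treating $z$ as a constant gives $\partial f / \partial z^* = z - c$, so the optimum is $z = c$. The Python sketch below reaches the same point with a simple steepest-descent iteration $z \leftarrow z - \mu\, \partial f / \partial z^*$. The iteration itself, the step size `mu`, the target `c`, and all function names are illustrative assumptions and are not taken from the text.

```python
def derivative_conj(f, z, h=1e-6):
    """Central-difference approximation of df/dz* = (df/dx + j df/dy) / 2."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx + 1j * df_dy)


c = 0.5 - 1.5j  # hypothetical optimum of the example cost function


def cost(z):
    """Real-valued cost f(z) = |z - c|^2 of one complex parameter."""
    return abs(z - c) ** 2


z_estimate = 0.0 + 0.0j  # starting point (assumption)
mu = 0.5                 # step size (assumption)

for _ in range(50):
    # df/dx + j df/dy points in the direction of steepest ascent of a real
    # cost function, and df/dz* is one half of it, so we step against it.
    z_estimate = z_estimate - mu * derivative_conj(cost, z_estimate)

print("estimate:", z_estimate)
print("residual derivative:", derivative_conj(cost, z_estimate))
```

With this cost function each step halves the distance to `c`, so the estimate settles at the point where $\partial f / \partial z^*$ vanishes.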

A.3 GRADIENTS

For the majority of applications the cost function depends not on one but on many variables, e.g., we have $f : \mathbb{C}^n \ni z = [z_1, z_2, \ldots, z_n]^{\mathrm{T}} \mapsto w = f(z) \in \mathbb{R}$. For optimization, all $n$ partial derivatives with respect to the complex variables $z_1 = x_1 + j y_1$ through $z_n = x_n + j y_n$ have to be calculated. Usually, these derivatives are again combined into a vector, the so-called gradient:²

$$ \frac{\partial f(z)}{\partial z} \triangleq \left[ \frac{\partial f(z)}{\partial z_1}, \frac{\partial f(z)}{\partial z_2}, \ldots, \frac{\partial f(z)}{\partial z_n} \right]^{\mathrm{T}} , \tag{A.3.1} $$

which, in the optimum, has to equal the zero vector $0 = [0, 0, \ldots, 0]^{\mathrm{T}}$. The gradient $\partial f(z) / \partial z^*$ with respect to $z^*$ is defined analogously.

Wirtinger Calculus is especially well suited for such multidimensional functions, because here the real part and the imaginary part can be separated and inspected independently only with great effort. Using the above definitions of the partial derivatives ((A.2.5) and (A.2.6)), we arrive at simple arithmetic rules, now expressed using vectors and matrices.

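A.3.1 Examples

We now again study some important examples. First, let $f(z) = c^{\mathrm{T}} z = \sum_{i=1}^{n} c_i z_i$, or $f(z) = c^{\mathrm{T}} z^* = \sum_{i=1}^{n} c_i z_i^*$, respectively, with $c = [c_1, c_2, \ldots, c_n]^{\mathrm{T}}$ and $c_i$ constant. It is easy to prove the following properties for gradients:

$$ \frac{\partial}{\partial z}\, c^{\mathrm{T}} z = c , \qquad \frac{\partial}{\partial z^*}\, c^{\mathrm{T}} z = 0 , \qquad \frac{\partial}{\partial z}\, c^{\mathrm{T}} z^* = 0 , \qquad \frac{\partial}{\partial z^*}\, c^{\mathrm{T}} z^* = c . \tag{A.3.2} $$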

² Here we use column vectors, but the same considerations also apply to row vectors.

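Finally, considering the quadratic form $f(z) = z^{\mathrm{H}} M z$, where $M$ is a constant $n \times n$ matrix, differentiation results in:

$$ \frac{\partial}{\partial z}\, z^{\mathrm{H}} M z = \left( z^{\mathrm{H}} M \right)^{\mathrm{T}} , \qquad \frac{\partial}{\partial z^*}\, z^{\mathrm{H}} M z = M z , \tag{A.3.3} $$

and for $f(z) = z^{\mathrm{H}} z$ we arrive at:

$$ \frac{\partial}{\partial z}\, z^{\mathrm{H}} z = z^* , \qquad \frac{\partial}{\partial z^*}\, z^{\mathrm{H}} z = z . \tag{A.3.4} $$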

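To summarize, the correspondences in Table A.2 are valid.

Table A.2  Wirtinger derivatives (gradients) of some important functions.

$f(z)$                     $\partial f / \partial z$                    $\partial f / \partial z^*$
$c^{\mathrm{T}} z$         $c$                                          $0$
$c^{\mathrm{T}} z^*$       $0$                                          $c$
$z^{\mathrm{H}} z$         $z^*$                                        $z$
$z^{\mathrm{H}} M z$       $\left( z^{\mathrm{H}} M \right)^{\mathrm{T}}$   $M z$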
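The gradient of the quadratic form can be checked numerically by applying the scalar definitions (A.2.5) and (A.2.6) to one component at a time. The following Python sketch does this with plain lists; the function names, the matrix `M`, and the vector `z` are arbitrary choices made for this illustration. Note that $(z^{\mathrm{H}} M)^{\mathrm{T}} = M^{\mathrm{T}} z^*$.

```python
def gradient(f, z, conjugate=False, h=1e-6):
    """Numerical Wirtinger gradient of f at the vector z (list of complex).

    Element i is 0.5 * (df/dx_i - j df/dy_i) for the gradient with respect
    to z, or 0.5 * (df/dx_i + j df/dy_i) for the gradient with respect to z*.
    """
    sign = 1j if conjugate else -1j
    result = []
    for i in range(len(z)):
        def shifted(delta, i=i):
            moved = list(z)
            moved[i] = moved[i] + delta
            return f(moved)
        df_dx = (shifted(h) - shifted(-h)) / (2 * h)
        df_dy = (shifted(1j * h) - shifted(-1j * h)) / (2 * h)
        result.append(0.5 * (df_dx + sign * df_dy))
    return result


def quadratic_form(z, M):
    """Return z^H M z = sum_i sum_k conj(z_i) * M[i][k] * z_k."""
    n = len(z)
    return sum(z[i].conjugate() * M[i][k] * z[k]
               for i in range(n) for k in range(n))


# Arbitrary example data (assumptions).
M = [[2.0 + 0.0j, 1.0 - 1.0j],
     [0.0 + 0.5j, 3.0 + 0.0j]]
z = [1.0 + 2.0j, -0.5 + 1.0j]
n = len(z)

grad_z = gradient(lambda v: quadratic_form(v, M), z)
grad_z_conj = gradient(lambda v: quadratic_form(v, M), z, conjugate=True)

# Closed-form results from (A.3.3): (z^H M)^T = M^T z* and M z.
expected_z = [sum(M[k][i] * z[k].conjugate() for k in range(n)) for i in range(n)]
expected_z_conj = [sum(M[i][k] * z[k] for k in range(n)) for i in range(n)]

print(grad_z, expected_z)
print(grad_z_conj, expected_z_conj)
```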

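A.3.2 Discussion

The gradient with respect to the Wirtinger derivatives is related to the gradient

$$ \nabla f(z) \triangleq \left[ \frac{\partial f(z)}{\partial x_1} + j\, \frac{\partial f(z)}{\partial y_1}, \ldots, \frac{\partial f(z)}{\partial x_n} + j\, \frac{\partial f(z)}{\partial y_n} \right]^{\mathrm{T}} , \tag{A.3.5} $$

which is frequently used, e.g., in [Hay96], by

$$ \nabla f(z) = 2\, \frac{\partial f(z)}{\partial z^*} , \tag{A.3.6} $$

or, since $f(z)$ is real-valued,

$$ \left( \nabla f(z) \right)^* = 2\, \frac{\partial f(z)}{\partial z} . \tag{A.3.7} $$

The first disadvantage of the gradient $\nabla f(z)$ according to (A.3.6) compared to Wirtinger Calculus is that an undesired factor of 2 occurs. In particular, if the chain rule is applied $l$ times, the result is artificially multiplied by $2^l$. Second, the calculus is not very elegant, because the gradient of, e.g., $f(z) = c^{\mathrm{T}} z$ is $\nabla (c^{\mathrm{T}} z) = 0$, but that of $f(z) = c^{\mathrm{T}} z^*$ calculates to $\nabla (c^{\mathrm{T}} z^*) = 2 c$. Hence, because of its much clearer arithmetic rules, we exclusively apply the Wirtinger Calculus.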
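The factor of 2 can be seen in a short worked example. Assuming the elementwise gradient components $\partial f / \partial x_i + j\, \partial f / \partial y_i$ as used in [Hay96], and taking $f(z) = z^{\mathrm{H}} z = \sum_{i=1}^{n} (x_i^2 + y_i^2)$ as the example function, one obtains:

```latex
\nabla \left( z^{\mathrm{H}} z \right)
  = \left[ 2 x_1 + j\, 2 y_1, \; \ldots, \; 2 x_n + j\, 2 y_n \right]^{\mathrm{T}}
  = 2 z
  = 2\, \frac{\partial}{\partial z^*}\, z^{\mathrm{H}} z .
```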

REFERENCES

[FL88] W. Fischer and I. Lieb. Funktionentheorie. Vieweg-Verlag, Braunschweig, Germany, 1988. (In German.)

[Fra69] L. E. Franks. Signal Theory. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1969.

[Hay96] S. Haykin. Adaptive Filter Theory. Prentice-Hall, Inc., Englewood Cliffs, NJ, 3rd edition, 1996.

[Pro01] J. G. Proakis. Digital Communications. McGraw-Hill, New York, 4th edition, 2001.

[Rem89] R. Remmert. Funktionentheorie 1. Springer Verlag, Berlin, Heidelberg, 1989. (In German.)

[Tre71] H. L. van Trees. Detection, Estimation, and Modulation Theory, Part III: Radar-Sonar Signal Processing and Gaussian Signals in Noise. John Wiley & Sons, Inc., New York, 1971.