Generalize Linear Models - Residuals - Pages.cs.wisc.edu…

5 downloads 20 Views 373KB Size Report
Mar 5, 2004 - (Anscombe residuals) the µ-asymptotic skewness of h(Y ) is zero. Generalized Linear Models and Transforme
Generalized Linear Models and Transformed Residuals

Generalized Linear Models and Transformed Residuals

Outline

Generalized Linear Models and Transformed Residuals Henrik Bengtsson1 1 Mathematical Statistics Centre for Mathematical Sciences Lund University

March 5, 2004

Introduction Cumulants Transforms Transforms for the exponential family

Generalized Linear Models and Transformed Residuals Introduction

Generalized Linear Models and Transformed Residuals Introduction

Transformed residuals

Cumulants and the exponential family

The idea behind transformed residuals h(Y ) − Eθ [h(Y )] rT (Y , θ) = Dθ [h(Y )] is to find a transform h(·) such that the distribution of h(Y ) is as “Normal as possible”. Two cases; choose h(·) such that 

(Variance-stabilizing residuals) the µ-asymptotic variance of h(Y ) is constant in θ,



(Anscombe residuals) the µ-asymptotic skewness of h(Y ) is zero.

Consider the exponential family, which has density function fY (y ) ∝ exp{y θ − b(θ) + c(y )}. The moment generating function of Y is MY (t) = E [exp(tY )] = exp{b(θ + t) − b(θ)}. The cumulant generating function of Y is KY (t) = ln{MY (t)} = b(θ + t) − b(θ).

Generalized Linear Models and Transformed Residuals Cumulants

Generalized Linear Models and Transformed Residuals Cumulants

Power series expansion of the cumulant function

Normal distribution cumulants

The cumulant generating function of Y is KY (t) = ln{MY (t)} If Y ∈ N(µ, σ 2 ), then κ1 [Y ] = µ, κ2 [Y ] = σ 2 and κr [Y ] = 0 for r > 2. Thus,

It has a power series expansion KY (t) =

∞ 

κr [Y ]

r =1

tr r!

KY (t) = µt + σ 2

where {κr [Y ]}∞ r =1 are the cumulants of Y . The four first are κ1 [Y ] = E [Y ] = µ, κ2 [Y ] = V [Y ] = σ 2 ,

t3 t4 t2 +0· +0· + ... 2 3! 4!

In other words, the cumulants κr [Y ]; r > 2 can be seen as a measures of non-Normality.

κ3 [Y ] = E [(Y − E [Y ])3 ], κ4 [Y ] = E [(Y − E [Y ])4 ] − 3σ 4 One definition of skewness is Sθ [Y ] = κ3 [Y ]/σ 3 and is a scale-free measure of asymmetry.

Generalized Linear Models and Transformed Residuals Cumulants

Generalized Linear Models and Transformed Residuals Cumulants

µ-asymptotic normality

N-asymptotic normality (of the mean)

Example: For Y ∈ Po(µ), the cumulant g.f. is KY (t) = µ(exp(t) − 1), which gives that κr [Y ] = µ; ∀r , that is KY (t) = µt + µ

t3 t2 + µ + ... 2 3! 3

κr [Y¯ ] = Standardize Z =

−2

Hence, the skewness is Sθ [Y ] = κ3 [Y ]/σ = µ/µ = µ . When µ → ∞ we have that Sθ [Y ] → 0, that is, “more Normal”, which we recognize from CLT. 3

Let Y1 , Y2 , . . . , YN be iid (∼ Y ). One can show that the cumulants of N the mean Y¯ = n=1 Yn are

¯ −µ Y √ . σ/ N

1 κr [Y ]. N r −1

Its cumulants are

κ1 [Z ] = 0, κ2 [Z ] = 1, κr [Z ] =

1 κr [Y ]; (r > 2). σ r N r /2−1

Thus, for r > 2 we have that κr [Z ] → 0 as N → ∞, that is, Z ∼ N(0, 1) as N → ∞ (CLT).

Generalized Linear Models and Transformed Residuals Transforms

Generalized Linear Models and Transformed Residuals Transforms

Cumulants and transforms

Variance stabilizing transforms

Let µ = Eθ [Y ] and σ 2 = Vθ [Y ] so that Eθ [Y¯ ] = µ and Vθ [Y¯ ] = σ 2 /N. Moreover, let h(·) be a (nice) transform. Second order Gaussian approximation (aka the Delta method) gives that h (µ) σ 2 · + O(N −2 ) 2 N σ2 + O(N −2 ) Vθ [h(Y¯ )] = [h (µ)]2 · N κ3 [Y ] σ4 + 3[h (µ)]2 · h (µ) · 2 + O(N −3 ) κ3 [h(Y¯ )] = [h (µ)]3 · 2 N N Eθ [h(Y¯ )] = h(µ) +

Now, if ([h (µ)]2 · σ 2 ) = a (a constant), then Vθ [h(Y¯ )] = [h (µ)]2 ·

σ2 a + O(N −2 ) = + O(N −2 ). N N

In other words, the variance of the transformed variable is independent of the mean µ. Example: Let Y ∼ Po(µ). Then the transform h(y ) = y 1/2 is stabalizing the variance because  1 1 Vθ [ Y¯ ] = Vθ [h(Y¯ )] = [ µ−1/2 ]2 · µ + O(N −2 ) = + O(N −2 ). 2 4

Generalized Linear Models and Transformed Residuals Transforms

Generalized Linear Models and Transformed Residuals Transforms

Symmetrizing transforms

Symmetrizing transforms for the Poisson distribution Example: Let Y ∼ Po(µ) so that κr [Y ] = µ; ∀r . Then

The skewness of h(Y¯ ) is approximately zero, if [h (µ)]2 · κ3 [Y ] + 3 · h (µ) · σ 4 = 0 such that the third cumulant becomes κ3 [Y ] σ4  2  κ3 [h(Y¯ )] = [h (µ)]3 · + 3[h (µ)] · h (µ) · + O(N −3 ) N2 N2 = O(N −3 ).

[h (µ)]2 · κ3 [Y ] + 3 · h (µ) · σ 4 = 0 ⇔ [h (µ)]2 + 3 · h (µ) · µ = 0 One solution is that h(µ) = aµ2/3 + b where a and b are constants. For a = 1 and b = 0 the variance of h(Y¯ ) = Y¯ 2/3 becomes µ µ + O(N −2 ) = [(2/3)µ−1/3 ]2 · + O(N −2 ) N N 1 = (2/3)2 µ1/3 · + O(N −2 ), N

Vθ [h(Y¯ )] = [h (µ)]2 ·

or equivalent, the standard deviation becomes Dθ [h(Y¯ )] = (2/3)µ1/6 /N + O(N −2 ).

Generalized Linear Models and Transformed Residuals Transforms

Generalized Linear Models and Transformed Residuals Transforms

Simulation: Symmetrization of Poisson samples.

Symmetrizing and variance stabilization Poisson transform So, for Y ∼ Po(µ), the transform h(Y ) = Y 2/3 “makes” the distribution symmetric (κ3 [h(Y )] ≈ 0) and the the standard deviation “becomes” Dθ [h(Y )] ≈ (2/3)µ1/6 . The mean is E [h(Y )] ≈ µ2/3 .

Simulation of N = 5000 samples from Y ∈ Po(µ) with µ = 3, 5, 10, 20 and the symmetrization transform h(Y ) = Y 2/3 : Normal Q−Q Plot for Y ~ Po(5)

However, the transform

Normal Q−Q Plot for Y ~ Po(20)

Z = Z (Y ) =

20

30

Sample Quantiles

20 15

0

10

5

10

10

Sample Quantiles

40

15

Normal Q−Q Plot for Y ~ Po(10)

5

Sample Quantiles

8 6 4 2 0

Sample Quantiles

10

Normal Q−Q Plot for Y ~ Po(3)

−4

−2

0

2

4

−4

Theoretical Quantiles

−2

0

2

4

−4

Theoretical Quantiles

−2

0

2

4

−4

−2

Theoretical Quantiles

0

2

4

Theoretical Quantiles

Y 2/3 − µ2/3 h(Y ) − E [h(Y )] ≈ 2 1/6 D[h(Y )] 3µ

is approximately Z ∼ N(0, 1), which is exactly the Anscombe residuals for the Poisson distribution; rA (Y , µ) =

Generalized Linear Models and Transformed Residuals Transforms

Generalized Linear Models and Transformed Residuals Transforms for the exponential family

Simulation: Poission distribution and Anscombe residuals

Exponential family revisited For an exponential family given by

Po(µ)

fY (y ) ∝ exp{y θ − b(θ) + c(y )}

1

the cumulant generating function of Y is

0

rA(y, µ)

KY (t) = ln{MY (t)} = b(θ + t) − b(θ)

−2

30

so that κr [Y ] = b (k) (θ). For this reason, the variance can be written as a function of the mean:

−4

0 10

y

50

2

3

70

Po(µ)

0

10

20

30 µ

40

50

3 Y 2/3 − µ2/3 · . 2 µ1/6

0

10

20

30

40

50

µ

Eµ [Y ] = µ Vµ [Y ] = µ

Eµ [rA (Y )] = −0.0312 ≈ 0 Vµ [rA (Y )] = 1.019 ≈ 1

Sµ [Y ] = µ−2

Sµ [rA (Y )] = −0.00738 ≈ 0

Vθ [Y ] = σ 2 = σ 2 (µ) = b  (θ)((b  )−1 (µ)) so that, since b  (θ) = κ1 [Y ] = µ, κ3 (µ) b (3) (θ(µ)) d 2 κ3 (µ) σ (µ) = (2) = 2 . = 2 dµ κ (µ) σ (µ) b (θ(µ))

Generalized Linear Models and Transformed Residuals Transforms for the exponential family

Generalized Linear Models and Transformed Residuals Transforms for the exponential family

Exponential family revisited...

Poisson distribution revisited

If we let Vθ [Y ] = V (µ) = σ 2 (µ), then κ3 (µ) = V (µ)/V  (µ) and the symmetrizing transformation solves [h (µ)]2 · κ3 [Y ] + 3 · h (µ) · σ 4 = 0 ⇔ h (µ)V  (µ)V (µ) + 3h (µ)V 2 (µ) = 0 or equivalent  d   (h (µ))3 V (µ) = 0. dµ

For Y ∈ Po(µ) we have that Eµ [Y ] = µ and Vµ [Y ] = µ. Thus according to Wedderburn’s result the transform that makes h(Y ) most Normal is  u  u 3 2/3 1 2 θ dθ = = u 2/3 . h(u) = 1/3 2 3 0 θ 0 We have showed that this transform is not stabilizing the variance; recall Dθ [h(Y )] ≈ (2/3)µ1/6 , but dividing by the standard deviation the Anscombe residuals obtain this

A solution to this equation is 

u

h(u) = −∞

1 dθ. [V (θ)]1/3

This is what Wedderburn (19??) showed hold for all distributions in GLM, i.e. exponential family distributions.

Generalized Linear Models and Transformed Residuals Further Reading

For Further Reading

Donald A. Pierce and Daniel W. Schafer. Residuals in Generalized Linear Models. Journal of American Statistical Association, Vol. 81, No. 396, Theory and Methods, December 1986

rA (Y , µ) =

3 Y 2/3 − µ2/3 · . 2 µ1/6