Shrinkage estimators

October 31, 2013

Contents

1 Shrinkage estimators
2 Admissible linear shrinkage estimators
3 Admissibility of unbiased normal mean estimators
4 Motivating the James-Stein estimator
  4.1 What is wrong with X?
  4.2 An oracle estimator
  4.3 Adaptive shrinkage estimation
5 Risk of δJS
  5.1 Risk bound for δJS
  5.2 Stein’s identity
6 Some oracle inequalities
  6.1 A simple oracle inequality
7 Unknown variance or covariance

Much of this content comes from Lehmann and Casella [1998], Sections 5.2, 5.4, 5.5, 4.6 and 4.7.

1 Shrinkage estimators

Consider a model {p(x|θ) : θ ∈ Θ} for a random variable X such that

E[X|θ] = µ(θ),   0 < Var[X|θ] = σ²(θ) < ∞   for all θ ∈ Θ.

Peter Hoff

A linear estimator δ(x) for µ(θ) is an estimator of the form δab(X) = aX + b. Is δab admissible?

Theorem 1 (LC thm 5.2.6). δab(X) = aX + b is inadmissible for E[X|θ] under squared error loss whenever

1. a > 1,
2. a = 1 and b ≠ 0, or
3. a < 0.

Proof. The risk of δab is

R(θ, δab) = E[(aX + b − µ)²|θ]
          = E[(a(X − µ) + (b − µ(1 − a)))²|θ]
          = E[a²(X − µ)² + (b − µ(1 − a))² + 2a(X − µ)(b − µ(1 − a)) | θ]
          = a²σ² + (b − µ(1 − a))².

1. If a > 1, then R(θ, δab) ≥ a²σ² > σ² = R(θ, X), so δab is dominated by X.

2. If a < 0, then, since a²σ² > 0 and (1 − a)² > 1,

R(θ, δab) > (b − µ(1 − a))² = (1 − a)²(b/(1 − a) − µ)² ≥ (b/(1 − a) − µ)² = R(θ, b/(1 − a)),

and so δab is dominated by the constant estimator b/(1 − a).

3. If a = 1 and b ≠ 0, then R(θ, δab) = σ² + b² > σ² = R(θ, X), so δab is dominated by X.


Letting w = 1 − a and µ0 = b/(1 − a), the result suggests that if we want to use an admissible linear estimator, it should be of the form

δ(X) = wµ0 + (1 − w)X,   w ∈ [0, 1].

We call such estimators linear shrinkage estimators, as they “shrink” the estimate from X towards µ0. Intuitively, you can think of µ0 as your “guess” as to the value of µ, and of w as the confidence you have in your guess. Of course, the closer your guess is to the truth, the better your estimator.

If µ0 represents your guess as to µ(θ), it seems natural to require that µ0 ∈ µ(Θ) = {µ : µ = µ(θ), θ ∈ Θ}, i.e. that µ0 is a possible value of µ.

Lemma 1. If µ(Θ) is convex and µ0 ∉ µ̄(Θ), the closure of µ(Θ), then δ(X) = wµ0 + (1 − w)X is not admissible.

Proof. For the one-dimensional case, suppose µ0 > µ(θ) for all θ ∈ Θ. Let µ̃0 = sup_Θ µ(θ) and δ̃(X) = wµ̃0 + (1 − w)X. Then δ̃(X) dominates δ(X): the two estimators have the same variance, and δ(X) has higher bias at every θ, since µ0 − µ(θ) > µ̃0 − µ(θ) ≥ 0 for all θ ∈ Θ. The proof is similar for the case µ0 < µ(θ) for all θ ∈ Θ.
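The bias–variance tradeoff behind shrinkage can be made exact: substituting a = 1 − w and b = wµ0 into the risk formula from Theorem 1 gives R(θ, δ) = (1 − w)²σ² + w²(µ0 − µ)². The sketch below (my own illustration, assuming σ² = 1 and an illustrative guess µ0 = 0.5 with true mean µ = 0) evaluates this risk over a grid of weights:

```python
import numpy as np

def shrinkage_risk(w, mu0, mu, sigma2=1.0):
    """Exact risk of delta(X) = w*mu0 + (1-w)*X under squared error:
    variance (1-w)^2 sigma^2 plus squared bias w^2 (mu0 - mu)^2."""
    return (1 - w) ** 2 * sigma2 + w ** 2 * (mu0 - mu) ** 2

ws = np.linspace(0, 1, 101)
risks = shrinkage_risk(ws, mu0=0.5, mu=0.0)
# With a good guess, shrinkage beats X (risk sigma^2 = 1) for every w in (0, 1]:
print(risks.min())                 # 0.2, well below 1
# The risk-minimizing weight is sigma^2 / (sigma^2 + (mu0 - mu)^2):
print(ws[risks.argmin()])          # 0.8
```

Of course this best weight depends on the unknown distance µ0 − µ, which is exactly what motivates the adaptive (James–Stein) approach later in these notes.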

Exercise: Generalize this result to higher dimensions.

2 Admissible linear shrinkage estimators

We have shown that δ(X) = wµ0 + (1 − w)X is inadmissible for µ(θ) = E[X|θ] if

• w ∉ [0, 1], or


• µ0 ∉ µ(Θ).

Restricting attention to w ∈ [0, 1] and µ0 ∈ µ(Θ), it may seem that such estimators should always be admissible, but “always” is almost always too inclusive.

Exercise: Give an example where wµ0 + (1 − w)X is not admissible, even with w ∈ (0, 1) and µ0 ∈ µ(Θ).