Coarsening Bias: How Instrumenting for Coarsened Treatments Upwardly Biases Instrumental Variable Estimates

John Marshall*

November 2014

Political scientists increasingly use instrumental variable (IV) methods, and must often choose between operationalizing their endogenous treatment variable as discrete or continuous. Beyond theoretical considerations, many datasets only provide coarse treatment codings. I demonstrate that coarsening a treatment with multiple intensities (e.g. treating a continuous treatment as binary) can substantially upwardly bias IV estimates, because a coarsened first stage only partially registers the reduced form effect. However, using a treatment where multiple values are affected by the instrument, even without measuring all intervals, recovers a consistent causal estimate. These insights are relevant for identifying high school’s long-run effects on political preferences. Since years of schooling is rarely measured in political surveys, I utilize two-sample IV techniques to avoid coarsening bias. I find that an additional year of late high school in Great Britain substantially increases Conservative voting. However, the estimate for completing high school is upwardly biased by three-to-four times.



* PhD candidate, Department of Government, Harvard University. I thank John Bullock, Anthony Fowler, Andy Hall, Torben Iversen, Rakeen Mabud, Horacio Larreguy, Arthur Spirling, Brandon Stewart, Dustin Tingley and Tess Wise for illuminating discussions or useful comments.

1 Introduction

Instrumental variable (IV) techniques are now a standard part of the political scientist’s methodological toolkit. IV analyses have illuminated complex relationships such as the effects of democracy on economic development (Acemoglu, Johnson and Robinson 2001), economic growth on civil conflict (Miguel, Satyanath and Sergenti 2004), and campaign spending on election outcomes (Gerber 1998). Given that an appropriate instrument can identify important causal relationships that cannot otherwise be easily disentangled, it is not surprising that the number of articles using IV techniques published in the American Journal of Political Science (AJPS) and the American Political Science Review (APSR) has grown considerably over recent decades (see Figure 1).

Best practice for using IV methods is now receiving greater scrutiny (e.g. Angrist and Pischke 2008; Dunning 2008; Sovey and Green 2011). However, this article highlights a previously unrecognized but potentially severe source of bias: coarsening a continuous or multi-valued (endogenous) treatment variable [1] can substantially upwardly bias IV estimates. This concern remains highly relevant: of papers using IV methods, 36% of AJPS and APSR publications since 2005 instrument for binary treatments or ordinal treatments (with five or fewer categories). Coarsening, which is unproblematic for OLS estimation, may appear appealing when the researcher believes that a coarsened treatment is more interpretable, when theory suggests that the treatment effect may be non-linear, or when granular measures of the treatment are unavailable. In the IV context, however, it can cause large biases.

[1] Like Angrist, Imbens and Rubin (1996), I refer to the endogenous variable as the “treatment”. Although this treatment is not random, the predicted values from the first stage are effectively random.

[Figure 1 omitted: number of articles published per year, 1986 to 2014, plotted separately for AJPS and APSR.]

Figure 1: Annual trends in the usage of instrumental variable techniques in political science. Notes: Counts are based on data provided by Allison Carnegie (from Sovey and Green 2011) and the author’s own reading of AJPS and APSR articles. The APSR count for 2014 includes only the first three (of four) editions of each journal. Any reference to implementing an IV technique is included.

I first analytically characterize coarsening bias, before exploiting two natural experiments to illustrate its extent in the context of identifying high school’s effect on political preferences in Great Britain. I demonstrate that instrumenting for completing high school over-estimates the nevertheless large causal effect of late high school education on voting Conservative later in life by

a factor of three or four. In general, coarsening bias can explain why IV estimates are often orders of magnitude larger than the analogous OLS or reduced form estimates. It is thus imperative that political scientists become aware of how coarsening bias can substantially inflate IV estimates.

This article’s main theoretical contribution characterizes the potential biases associated with coarsening an (endogenous) treatment variable in an IV analysis. For ease of exposition, I focus on instrumenting for a treatment intensity (a treatment that takes multiple values) that is coded as a dummy variable. [2] Intuitively, coarsening bias arises when an instrument affects the intensity of the underlying treatment in a way that cannot be measured when the treatment is operationalized as a binary variable. For example, a dummy for completing high school would fail to register the effects of school leaving laws that encouraged some students to stay in school for an additional year without completing high school. By grouping together multiple years of schooling (or treatment intensities) that each affect the outcome, coarsening falsely creates the impression that the sum of the effects at each intensity can be attributed to completing high school. Coarsening bias therefore violates the exclusion restriction underpinning IV estimation (e.g. Angrist, Imbens and Rubin 1996), because the instrument affects the outcome through an avenue not captured by the dichotomous treatment variable in the first stage. [3]

[2] The argument equally applies when only a single value of an ordinal treatment is affected by the instrument because the treatment effectively serves as a dummy variable.

[3] For a more technical intuition, we can write the IV estimator as the ratio of the reduced form and first stage, which respectively identify the effect of the instrument on the outcome and the effect of the instrument on the treatment (see Angrist and Pischke 2008). Regardless of how the treatment is coarsened, the reduced form captures the average effect of the instrument on the outcome. However, the first stage under-estimates the effect of the instrument on the coarsened treatment by failing to register increases in treatment intensity which do not pass the threshold required to alter the coarsened treatment indicator. Coarsening bias therefore violates the exclusion restriction underpinning IV estimation, because the instrument affects the outcome through an avenue not captured in the first stage. By under-estimating the first stage required to re-scale the reduced form, coarsening the treatment upwardly biases the IV estimate under general conditions.

Coarsening bias is especially large when any change in treatment intensity affects the outcome, e.g. if the treatment’s true effect is linear, and when the instrument has large effects on intensities

other than the coarsened binary threshold. Only when the true effect of a treatment is discontinuous and the researcher is able to both recognize and measure the value of the treatment where this occurs, or where the instrument only affects the treatment at the relevant threshold, will the IV estimate associated with a coarsened treatment variable be unbiased.

Both observational and experimental data are vulnerable to coarsening bias. For example, in the case of Pierskalla and Hollenbach (2013), who use local communication regulations to instrument for an indicator of local cell phone coverage, favorable regulations could increase cell phone usage without affecting the services used to define their treatment. Lacking the correct first stage could explain why the IV estimates in Pierskalla and Hollenbach (2013) are 20 times larger than their OLS estimates. In experimental studies, the extent of coarsening bias depends upon the type of endogenous variable that a randomized instrument affects. For Gerber, Huber and Washington (2010), who in one specification instrument for an individual’s partisan identification and find effects 15 times larger than their corresponding OLS estimates, upward bias may occur if their randomized mailing causes voters to move toward a political party without passing the threshold required to register a new partisan identification. [4] Conversely, experiments inducing respondents to take up truly binary treatments are not affected by this bias. For example, get-out-the-vote canvassing is unlikely to impact respondents that did not answer the door (e.g. Gerber and Green 2000).

[4] Gerber, Huber and Washington (2010) also consider another specification where partisanship is coded using a 7-point scale. This is unlikely to be biased.

However, determining whether coarsening bias accounts for large IV estimates is difficult. Large IV estimates could instead reflect unusually large effects among individuals that only received the treatment because they were induced to do so by the instrument (Angrist, Imbens and Rubin 1996), or weak instrument bias (Staiger and Stock 1997). To isolate the empirical relevance of coarsening bias, as distinct from these alternative explanations for large IV estimates, I utilize an unusual empirical application where the availability of multiple strong instruments permits

direct comparison of the coarsened and uncoarsened estimates. While coarsening bias reflects an exclusion restriction violation, I recast the problem as an underestimation of the first stage. Building on this insight, researchers can still recover valid causal estimates. In theory, the solution is remarkably easy: if data can be assembled to code the treatment as a multi-valued variable, where the instrument affects multiple treatment values, 2SLS consistently estimates the average causal effect of a unit increase in the treatment among compliers (Angrist and Imbens 1995). This yields what I term the local average per-unit treatment effect. In my empirical application, I show how two-sample IV methods—which estimate the reduced form in a sample containing data on only the outcome and the instrument, and the first stage in a sample containing data on only the treatment and the instrument—can estimate this causal quantity even in the frequent case where a researcher’s original dataset does not include or cannot be augmented with a richer measure of the treatment. Where improved measurement of the treatment is impossible, researchers are advised to focus on their otherwise unbiased reduced form estimates. This article’s main empirical contributions are to illustrate and remedy coarsening bias in the context of high school education’s effects on political preferences. While a large literature has emphasized the importance of schooling for political behavior (e.g. Nie, Junn and Stehlik-Barry 1996; Verba, Schlozman and Brady 1995), and these relationships define the incentives for politicians of different stripes to expand education, existing research has struggled to disentangle education’s causal effect on vote choice. Analyses of electoral turnout suggest that IV approaches to similar questions may be vulnerable to coarsening bias. 
Recent field and natural experiments using IV methods have estimated extremely large effects of completing high school on turnout (Milligan, Moretti and Oreopoulos 2004; Sondheimer and Green 2010). [5] However, these estimates are upwardly biased if each additional year of schooling increases turnout while the instrument increases schooling for many students without necessarily inducing them to complete high school.

[5] Sondheimer and Green (2010:185) pool three U.S. field experiments that encourage students to complete high school, and find that “a high school dropout with a 15.6% chance of voting would have a 65.2% chance of turnout if randomly induced to graduate from high school.” Instrumenting for completing high school with state-level school leaving ages, Milligan, Moretti and Oreopoulos (2004) find completing high school increases the probability of turning out by around 30-40 percentage points and the probability of following election campaigns by 85 percentage points.

To address the selection concern that certain types of individual receive more education (e.g. Kam and Palmer 2008), I exploit two major educational reforms in Britain that increased the minimum high school leaving age from 14 to 15 in 1947 and from 15 to 16 in 1972. The 1947 reform induced most students to stay in school until 15, but induced others to complete high school at age 16. Using this reform to instrument for completing high school is thus liable to over-estimate high school’s political effects if an additional year of education at age 15 (without completing high school) has downstream effects on political preferences. However, especially since years of schooling is not measured in most political surveys, this appears to be an appealing quantity to estimate. The 1972 reform only induced students to remain in school until age 16. Using both reforms to instrument for years of schooling allows me to estimate the effect of the penultimate and final year of high school, and thus calculate the extent of coarsening bias by comparing the IV estimates instrumenting for completing high school to the effect of the final year of high school. Since the reforms only affected the likelihood that a student remains in school until age 15 or 16, these estimates are not subject to coarsening bias. This feature identifies the extent of coarsening bias, and motivates the general recommendation that researchers operationalize treatment intensities as variables where the instrument affects multiple intensities.

Estimation is challenging because years of schooling is not measured in my survey data. To address this common problem, I use two-sample IV methods (Angrist and Krueger 1992, 1995; Franklin 1989) to combine election surveys containing political outcomes and the instruments with administrative surveys containing years of schooling and the instruments. I also extend the two-sample 2SLS (TS2SLS) estimator advocated by Franklin (1989), Angrist and Krueger (1995) and Inoue and Solon (2010) to allow for clustering. The simple code for implementing this method should alert applied researchers to this under-utilized empirical strategy.
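The two-sample logic can be illustrated with a minimal sketch. This is not the article's TS2SLS code: it covers only the simplest case of one binary instrument with no covariates or clustering, where the estimate is the reduced form difference in means from one sample divided by the first stage difference in means from the other. The function names and the toy data are hypothetical.

```python
from statistics import mean


def difference_in_means(rows):
    """Mean of the values with z == 1 minus the mean of the values with z == 0."""
    with_instrument = [value for z, value in rows if z == 1]
    without_instrument = [value for z, value in rows if z == 0]
    return mean(with_instrument) - mean(without_instrument)


def two_sample_wald(outcome_rows, treatment_rows):
    """Two-sample IV estimate for a single binary instrument.

    outcome_rows:   (z, y) pairs from the sample that records the outcome.
    treatment_rows: (z, t) pairs from the sample that records the treatment.
    """
    reduced_form = difference_in_means(outcome_rows)
    first_stage = difference_in_means(treatment_rows)
    if first_stage == 0:
        raise ValueError("no first stage: the instrument does not move the treatment")
    return reduced_form / first_stage


# Hypothetical toy data, not taken from the article.
# Election survey: instrument (1 = subject to the higher leaving age) and a
# Conservative-vote indicator; schooling itself is not observed here.
survey = [(1, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 0)]
# Administrative sample: instrument and the age at which schooling ended.
admin = [(1, 16), (1, 15), (1, 16), (1, 15), (0, 14), (0, 15), (0, 15), (0, 14)]

estimate = two_sample_wald(survey, admin)
print(estimate)
```

Because the first stage uses years of schooling rather than a completion dummy, the ratio is a per-year effect, which is the quantity the two-sample approach is meant to recover when the survey lacks a fine-grained treatment measure.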

The results show that while late high school education increases the likelihood that an individual votes for the Conservative party later in life, instrumenting for an indicator for completing high school substantially upwardly biases estimates of schooling’s political effects. First, I present reduced form regression discontinuity estimates indicating that cohorts subject to a greater school leaving age are significantly more likely to vote Conservative. Second, I show that a dichotomous treatment for completing high school upwardly biases estimates by 300-400 per cent in this case, where the effect of late high school is actually approximately linear. Third, my TS2SLS estimates show that both the penultimate and final year of high school increase the probability that an instrument complier votes Conservative by around 15 percentage points. Furthermore, analysis of the mechanisms supports the theoretical argument that schooling increases income and support for the more conservative fiscal policies associated with the Conservative Party. This substantial effect of schooling raises a dilemma for left-of-center parties—like Labour and more recently the Liberal Democrats—which have seemingly championed inclusive educational policies at the expense of electoral success. This paper is organized as follows. Section 2 formally characterizes coarsening bias and discusses the implications for applied empirical work. Section 3 identifies the effect of an additional year of schooling on voting preferences in Great Britain, and calculates the bias arising from using a dichotomous measure of high school. Section 4 concludes.

2 IV’s upward bias with coarsened treatments

This section demonstrates the upward bias of coarsening an endogenous treatment intensity, focusing on the simplest case where there is a binary instrument and the treatment is coarsened into a binary indicator. [6] After briefly reviewing the IV assumptions in the heterogeneous potential outcomes framework (see Angrist, Imbens and Rubin 1996), I first show how coarsening bias arises from a subtle exclusion restriction violation, before analyzing how the extent of coarsening bias depends upon the relationship between the treatment and the outcome.

[6] The results extend to multi-valued instruments and to the inclusion of control variables.

2.1 IV notation and assumptions

Instrumental variable techniques address the concern that a treatment may be correlated with an omitted variable that affects the outcome of interest. An instrumental variable, which is correlated with the treatment but does not directly affect the outcome, is used to instrument for the endogenous treatment in order to isolate the exogenous effect of the treatment. Since only variation in the treatment induced by the instrument is exploited, the causal effects pertain to compliers: observations that only received a particular treatment intensity because they received the instrument.

Formally, denote the instrument, for each observation $i \in N \equiv \{1, ..., n\}$, as $Z_i \in \{0, 1\}$. The observed endogenous treatment intensity of observation $i$, $T_i \in \{1, ..., J\}$, assumes one of $J$ ordered values. $Y_i$ is $i$’s observed outcome of interest. We first assume that the stable unit treatment value assumption (SUTVA) holds, which requires that potential outcomes are independent of the instruments and treatments received by other individuals. Given SUTVA, $T_i(Z_i = z)$ denotes $i$’s potential outcomes of $T_i$ conditional on receiving instrument $Z_i = z$, while $Y_i(Z_i = z, T_i(z) = t)$ correspondingly denotes $i$’s potential outcomes of $Y_i$ conditional on receiving treatment intensity $T_i = t$ and instrument $Z_i = z$. [7]

[7] Observed outcomes relate to potential outcomes through $T_i = Z_i T_i(1) + (1 - Z_i) T_i(0)$ and $Y_i = Z_i Y_i(1, T_i(1)) + (1 - Z_i) Y_i(0, T_i(0))$.

In addition to SUTVA (A1), IV estimation typically requires four additional assumptions (Angrist, Imbens and Rubin 1996). First, assume that $Z_i$ is randomly assigned (A2), and thus the instrument is independent of potential outcomes and potential treatment intensities. Second, assume that there exists a first stage (A3), such that the instrument affects the intensity of the treatment $i$ receives. Third, monotonicity (A4) requires that, for all individuals, the instrument either never decreases the treatment or never increases the treatment. Fourth, the weak exclusion restriction (A5) requires that $Z_i$ only affects $Y_i$ through the treatment $T_i$; consequently, $Y_i(z, t) = Y_i(t)$ for any $z$. The weakness of this assumption when $T_i$ is coarsened is explained below. These standard assumptions, which are explained in greater detail elsewhere (e.g. Angrist, Imbens and Rubin 1996; Dunning 2008; Sovey and Green 2011), are formalized below.

A1. Stable unit treatment value assumption: for all $i \in N$, (a) $T_i(Z_i = z, Z_{-i} = w) = T_i(Z_i = z, Z_{-i} = w')$ for all $z$, $w$ and $w'$, (b) $Y_i(Z_i = z, T_i(z) = t, Z_{-i} = w, T_{-i}(z) = v) = Y_i(Z_i = z, T_i(z) = t, Z_{-i} = w', T_{-i}(z) = v')$ for all $z$, $w$, $w'$, $v$ and $v'$, where $Z_{-i}$ and $T_{-i}$ are vectors of instrument and treatment assignments to all observations except $i$.

A2. Instrument independence: for all $i \in N$, $T_i(0)$ and $T_i(1)$ are jointly independent of $Z_i$.

A3. First stage: $E[T_i \mid Z_i = 1] - E[T_i \mid Z_i = 0] \neq 0$.

A4. Monotonicity: either $T_i(1) - T_i(0) \geq 0$ for all $i \in N$, or $T_i(1) - T_i(0) \leq 0$ for all $i \in N$.

A5. Exclusion restriction (weak): for all $i \in N$, $Y_i(z, t) = Y_i(z', t)$ for any $t$ and all $z$ and $z'$.
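To make the notation concrete, the following sketch represents each observation by its instrument assignment and its two potential treatment intensities, applies the switching equation that links observed to potential treatments, and checks monotonicity (A4). The `Unit` class, the function names and the four example units are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    z: int   # instrument actually received, Z_i (0 or 1)
    t0: int  # potential treatment intensity T_i(0)
    t1: int  # potential treatment intensity T_i(1)


def observed_treatment(unit):
    """Switching equation: T_i = Z_i * T_i(1) + (1 - Z_i) * T_i(0)."""
    return unit.z * unit.t1 + (1 - unit.z) * unit.t0


def satisfies_monotonicity(units):
    """A4: the instrument never lowers the treatment, or never raises it."""
    never_lowers = all(u.t1 >= u.t0 for u in units)
    never_raises = all(u.t1 <= u.t0 for u in units)
    return never_lowers or never_raises


def is_complier(unit):
    """A unit whose treatment intensity depends on the instrument."""
    return unit.t1 != unit.t0


# Hypothetical units with intensities in {1, 2, 3}.
units = [Unit(1, 1, 2), Unit(0, 1, 3), Unit(1, 2, 2), Unit(0, 3, 3)]
observed = [observed_treatment(u) for u in units]
print(observed, satisfies_monotonicity(units))
```

Only one potential treatment per unit is ever observed in real data, so monotonicity cannot be verified directly; the check above is possible only because the example stipulates both potential values.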

2.2 Characterization of coarsening bias

Coarsening bias occurs when the researcher, whether by choice or constrained by data availability, coarsens the treatment intensity. In particular, in the hope of identifying the effect of treatment intensity $k > 1$, $T_i$ is partitioned by defining the indicator $D_{ik} \equiv 1(T_i \geq k)$. This binary variable indicates whether an individual receives at least treatment intensity $k$. The researcher seeking to identify the effect of obtaining $T_i = k$ is thus interested in estimating the following causal quantity:

$$\beta_k \equiv E[Y_i(k) - Y_i(k - 1) \mid T_i(1) \geq k > T_i(0)].$$

βk is the local average treatment effect (LATE) of obtaining intensity k beyond only obtaining the preceding level k − 1 for instrument compliers. In the case of schooling, this could be the effect of completing high school beyond completing the penultimate grade of high school. Often, however, this counterfactual is not clearly specified: many studies implicitly compare the difference between treatment intensities above and below an often arbitrary threshold defining a coarsened treatment condition. For example, an indicator for a strong Republican partisan compares strong Republican partisans to a mixture of weak Republican partisans, independents and Democrat partisans. As is well known, the IV framework seeks to estimate the effect of the endogenous binary treatment Dik using the following system of equations:

$$Y_i = \beta_k D_{ik} + u_i, \qquad (1)$$

$$D_{ik} = \gamma Z_i + \varepsilon_i. \qquad (2)$$

Equation (1) is the structural model relating $D_{ik}$ to the outcome $Y_i$, while equation (2) is the first stage regression estimating the effect of the instrument $Z_i$ on $D_{ik}$. The IV estimator of the effect of $D_{ik}$ on $Y_i$, which I denote $\beta_k^{IV}$, divides the reduced form estimate of the effect of $Z_i$ on $Y_i$ by the first stage effect of $Z_i$ on $D_{ik}$. [8] Causal estimates of the reduced form and first stage can be identified under assumptions A1 and A2. Using assumptions A4 and A5, the IV estimator for the system of equations above can be expressed as the weighted sum of the causal effect for compliers moving from intensity $t - 1$ to $t$ for each such interval (see the proof in the Online Appendix and Angrist and Imbens 1995):

$$\beta_k^{IV} \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} = \frac{\sum_{t=2}^{J} p_t \beta_t}{p_k}. \qquad (3)$$

Without loss of generality, I consider the case where monotonicity implies $T_i(1) - T_i(0) \geq 0$. Hence, $p_t \equiv \Pr(T_i(1) \geq t > T_i(0))$ denotes the probability that an individual only reaches category $T_i = t$ because they received the instrument $Z_i = 1$, and thus represents the proportion of compliers at treatment intensity $t$ in the population. Analogously, $p_k = \Pr(T_i(1) \geq k > T_i(0)) = E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]$ is the relevant first stage for reaching the treatment intensity $k$. The estimator is well-defined under A3 provided $p_k \neq 0$. Finally, $\beta_t \equiv E[Y_i(t) - Y_i(t - 1) \mid T_i(1) \geq t > T_i(0)]$ is the LATE for compliers moving from treatment intensity $t - 1$ to treatment intensity $t$.

[8] The addition of covariates provides similar results and the case of multiple instruments involves weighting multiple Wald estimators (Angrist and Pischke 2008).

Under the standard assumptions enumerated above, the IV estimator, which should return $\beta_k$ (the effect of treatment intensity $k$ for compliers), is often inconsistent. Inspection of equation (3) shows that $\beta_k^{IV}$ is only equal to $\beta_k$ when $\sum_{t \neq k} p_t \beta_t = 0$. In fact, $\beta_k^{IV}$ is consistent only in four special cases. First, when the instrument only affects reaching intensity $k$; or $p_t = 0$, $\forall t \neq k$. Second, when the effect at all intensities other than $k$ is zero; or $\beta_t = 0$, $\forall t \neq k$. Third, when one of the preceding conditions holds for each intensity $t$, ensuring that $p_t \beta_t = 0$ for all $t \neq k$. Fourth, if the direction of the effects (weighted by $p_t$) differ across intensities but ultimately exactly cancel out.

This inconsistency reflects a subtle exclusion restriction violation that arises from coarsening $T_i$. Although assumption A5 requires that $Z_i$ does not affect $Y_i$ through avenues other than $T_i$, the coarsening of $T_i$ into $D_{ik}$ allows for $T_i$ to affect $Y_i$ without going through $D_{ik}$. Consequently, while any effect of $Z_i$ on $Y_i$ is registered in the reduced form estimate (the numerator of equation (3)), the first stage (the denominator) only registers cases where $Z_i$ induces $i$ to pass the threshold used to define the treatment indicator (e.g. moving from intensity $k - 1$ to $k$). If values of $T_i$ other than $k$ also affect $Y_i$, then changes in the reduced form are not captured in the first stage.

Consistent estimation of the causal effect of $D_{ik}$ requires stronger assumptions. In particular, the strong exclusion restriction (A5*), such that $Z_i$ only affects $Y_i$ through $D_{ik}$, is sufficient.

A5*. Exclusion restriction (strong): for all $i \in N$ and all $z$ and $z'$, (a) $Y_i(z, t) = Y_i(z', t')$ for all $t, t' \geq k$, (b) $Y_i(z, t) = Y_i(z', t')$ for all $t, t' < k$.
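The complier shares $p_t$ are easiest to see in a stylized population where both potential treatment intensities are stipulated. The sketch below, with hypothetical names and hypothetical data, computes $p_t$ directly from $(T_i(0), T_i(1))$ pairs and confirms that the first stage for the dummy $D_{ik}$ registers only $p_k$, whereas the first stage for $T_i$ itself registers every $p_t$.

```python
def complier_share(pairs, t):
    """p_t = Pr(T_i(1) >= t > T_i(0)) over a list of (T_i(0), T_i(1)) pairs."""
    return sum(1 for t0, t1 in pairs if t1 >= t > t0) / len(pairs)


def dummy_first_stage(pairs, k):
    """E[D_ik | Z_i = 1] - E[D_ik | Z_i = 0], with D_ik = 1(T_i >= k)."""
    with_instrument = sum(1 for _, t1 in pairs if t1 >= k) / len(pairs)
    without_instrument = sum(1 for t0, _ in pairs if t0 >= k) / len(pairs)
    return with_instrument - without_instrument


def treatment_first_stage(pairs):
    """E[T_i | Z_i = 1] - E[T_i | Z_i = 0] for the uncoarsened treatment."""
    return sum(t1 - t0 for t0, t1 in pairs) / len(pairs)


# Hypothetical population with intensities in {1, 2, 3}; monotonicity holds
# because T_i(1) >= T_i(0) for every pair.
pairs = [(1, 2)] * 4 + [(1, 3)] + [(2, 3)] + [(2, 2)] * 2 + [(3, 3)] * 2

p2 = complier_share(pairs, 2)
p3 = complier_share(pairs, 3)
print(p2, p3, dummy_first_stage(pairs, 3), treatment_first_stage(pairs))
```

Under random assignment (A2), the population averages over potential treatments used here equal the conditional expectations given $Z_i$, which is why the functions can work from the pairs alone.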

This assumption is more demanding than A5 because, in addition to requiring that the instrument only affect the outcome by altering the intensity of the treatment, it also requires that the researcher correctly coarsen the treatment such that $\beta_t = 0$ for all $t \neq k$. In other words, the stronger assumption of A5* requires knowledge of the functional form relating the treatment to the outcome to ensure that other levels of the treatment are not also affecting the outcome. The following proposition formally summarizes these insights, demonstrating the importance of the seemingly minor distinction between assumptions A5 and A5* when a coarsened measure of $T_i$ is used.

Proposition 1. (Coarsening bias) Assume that, in addition to assumptions A1, A2, A3, A4 and A5, $p_k > 0$ holds. The dummy variable IV estimator $\beta_k^{IV}$ can be expressed as:

$$\beta_k^{IV} = \beta_k + \frac{\sum_{t \neq k} p_t \beta_t}{p_k}. \qquad (4)$$

Provided $\mathrm{sign}(\beta_k) = \mathrm{sign}(\beta_t)$ for all $t \neq k$ where $p_t > 0$, the dummy variable IV estimator accentuates the true causal effect: $|\beta_k| \leq |\beta_k^{IV}|$. However, if A5* also holds, then $\beta_k^{IV}$ consistently estimates $\beta_k$.

All proofs are provided in the Online Appendix. This result shows that, after coarsening $T_i$, the standard IV estimator typically only consistently estimates the LATE of obtaining treatment intensity $k$ on $Y_i$ under the strong exclusion restriction assumption A5*. When only A5 holds, the IV estimate is inconsistent (and thus biased even as the sample size becomes large) except when the instrument only affects the probability of attaining treatment intensity $k$. [9] Otherwise, the characterization of the bias in equation (4) demonstrates that coarsening can substantially bias the IV estimate. Provided that the direction of the effect on $Y_i$ for each intensity of the treatment is the same, the magnitude of the IV estimate is upwardly biased. Furthermore, the bias of the estimator is increasing in both $p_t / p_k$ and $\beta_t$, for any $t \neq k$. In other words, the bias is greater when the first stage is strong for intensities other than $k$ and the LATE for intensities other than $k$ is large.

[9] IV estimators are consistent, but biased in finite samples.

The economic or political effects of high school education represent an important case where coarsening bias could occur. Consider a law requiring that students remain in school until age 15 in a country like Britain where high school is completed at age 16. For students who would have dropped out before age 15 without the law, the law may induce them to stay in school until age 15 without completing high school (i.e. $p_t > 0$ for some $t < k$). However, some may go on to complete high school (i.e. $p_k > 0$). There is thus a first stage for levels of schooling below completing high school, in addition to completing high school. Coarsening bias arises if additional schooling before the completion of high school affects the outcome of interest (i.e. $\beta_t > 0$ and $p_t > 0$ for some $t < k$). For outcomes like income, additional human capital or signaling value could easily increase labor market returns (e.g. Becker 1993; Mincer 1974; Spence 1973). Furthermore, if increased income affects political preferences, or remaining in high school imparts politically-relevant norms, then estimates for political outcomes may also be biased. However, existing empirical work using exogenous variation to instrument for completing high school has implicitly assumed that A5* holds (Milligan, Moretti and Oreopoulos 2004; Sondheimer and Green 2010), and thus implies a strong theoretical commitment to the claim that human capital or signaling benefits of education apply only as a student completes high school.
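The schooling example can be sketched numerically. The code below builds a small noise-free dataset in which a leaving-age law mostly moves students from 14 to 15, with only a few induced to finish at 16, and then computes the dummy variable IV estimate from differences in means. All numbers (the per-year effect `TAU`, the mix of student types) are hypothetical and are not the article's estimates.

```python
from statistics import mean

TAU = 0.15  # hypothetical effect of each additional year on the outcome
K = 16      # age at which high school is completed

# Hypothetical potential leaving ages (T_i(0), T_i(1)) without and with the law.
population = [(14, 15)] * 4 + [(14, 16)] + [(15, 15)] * 3 + [(16, 16)] * 2

# Balanced assignment: each student type appears once with Z = 0 and once with
# Z = 1, so differences in means equal the population quantities exactly.
rows = []
for t0, t1 in population:
    for z in (0, 1):
        t = t1 if z == 1 else t0
        y = TAU * (t - 14)  # linear outcome without noise
        rows.append((z, t, y))


def difference_in_means(values):
    """Mean of value for Z = 1 minus mean of value for Z = 0."""
    with_law = [v for z, v in values if z == 1]
    without_law = [v for z, v in values if z == 0]
    return mean(with_law) - mean(without_law)


reduced_form = difference_in_means([(z, y) for z, _, y in rows])
dummy_first_stage = difference_in_means([(z, int(t >= K)) for z, t, _ in rows])
dummy_iv = reduced_form / dummy_first_stage

print(reduced_form, dummy_first_stage, dummy_iv)
```

In this construction $p_{15} = 0.5$ and $p_{16} = 0.1$, so equation (4) gives $0.15 + 0.5 \times 0.15 / 0.1 = 0.9$: the dummy estimate is six times the assumed effect of the final year, purely because the first stage ignores students who moved from 14 to 15.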

2.3 When is coarsening bias severe?

Proposition 1 demonstrated that the extent of coarsening bias depends upon the first stage and the LATE at different treatment intensities. This analytical insight facilitates interpretation of the bias in terms of a (weighted) causal response function (CRF). The CRF represents the effect of the treatment (the LATE) at each intensity. Since the CRF is almost never known in advance, it is essential to understand the types of causal effects for which assumption A5* is tenable, and what causal quantities can be recovered when only assumption A5 is tenable.

2.3.1 Sharp jumps in the CRF

When the CRF exhibits sharp discontinuities, as exemplified in Figure 2, the dummy approach can be appropriate. This is because assumption A5* holds when only intensity $k$ affects the outcome. Provided the researcher is able to correctly identify intensity $k$ (the only point at which there is a positive causal effect in the figure) as the key jump, then $\beta_k$ can be consistently estimated when a suitable instrument exists to ensure $p_k > 0$. This works because $\beta_t = 0$ for all $t \neq k$, and thus the dummy variable IV estimator is consistent regardless of whether $p_t > 0$ for some other $t \neq k$.

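[Figure 2 omitted: outcome ($Y_i$) on the vertical axis against treatment intensity ($T_i$) on the horizontal axis, with intensities $k$ and $k + 1$ marked.]

Figure 2: Discontinuous causal response function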
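A worked example may help fix ideas for a CRF of this shape. The values below are hypothetical and chosen only for arithmetic convenience; the expressions follow from equation (3) applied once with the correct threshold $k$ and once with the misplaced threshold $k + 1$ discussed in this subsection.

```latex
% Hypothetical values: \beta_k = 0.2 and \beta_t = 0 for all t \neq k;
% complier shares p_{k-1} = 0.3, p_k = 0.1, p_{k+1} = 0.05.
\begin{align*}
\beta_k^{IV}
  &= \frac{p_{k-1}\beta_{k-1} + p_k\beta_k + p_{k+1}\beta_{k+1}}{p_k}
   = \frac{0.3 \cdot 0 + 0.1 \cdot 0.2 + 0.05 \cdot 0}{0.1}
   = 0.2 = \beta_k, \\
\beta_{k+1}^{IV}
  &= \frac{p_{k-1}\beta_{k-1} + p_k\beta_k + p_{k+1}\beta_{k+1}}{p_{k+1}}
   = \frac{0.02}{0.05}
   = 0.4 \neq \beta_{k+1} = 0.
\end{align*}
```

With the threshold at $k$, the compliers at $k - 1$ and $k + 1$ contribute nothing to the numerator, so their presence is harmless. With the threshold at $k + 1$, the whole reduced form is rescaled by the wrong first stage, which in this example doubles the true effect at $k$ and attributes it to the wrong intensity.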

In practice, however, it is hard to know a priori whether $k$ correctly captures the true discontinuity. In general, tipping points are not straightforward to theorize exactly. In experiments where subjects cannot be partially treated it is easier to determine clear cutoffs. As noted above, in the case of Gerber and Green’s (2000) get-out-the-vote canvassing, knocking on a door is only likely to affect respondents that opened the door to receive the treatment. But even experiments can be hard to evaluate if there is partial compliance, such that individuals can experience some of the treatment without being designated as treated.

Conversely, if the researcher incorrectly surmises that $k + 1$ is the correct threshold, at best they fail to detect the existence of the effect of intensity $k$ but correctly identify no effect at $k + 1$. In the simple example of Figure 2, where $\beta_t = 0$ for all $t \neq k$, the researcher correctly concludes that $\beta_{k+1} = 0$ only if their instrument does not induce subjects to reach intensity $k$. In other words, $p_k = 0$ ensures a correct estimate of a quantity that was probably not of primary interest. When $p_k > 0$, the IV estimator will produce an inconsistent estimate of the LATE at intensity $k + 1$ with bias given by:

$$\beta_{k+1}^{IV} - \beta_{k+1} = \frac{p_k}{p_{k+1}} \beta_k > 0. \qquad (5)$$

Although this estimate is approximately correct in the sense that there is a causal effect nearby, it both wrongly attributes the effect to intensity $k + 1$ and does not even consistently estimate $\beta_k$ unless $p_{k+1} = p_k$.

2.3.2 Linear (and other) CRFs

The coarsening bias associated with using a dummy variable can be particularly large when the true CRF is linear. Letting the causal effect associated with each interval be $\beta_t = \tau \neq 0$, the dummy variable IV estimator yields:

$$\beta_k^{IV} - \beta_k = \frac{\sum_{t \neq k} p_t}{p_k} \tau. \qquad (6)$$

More than one half of all compliers must therefore achieve intensity $k$ for the bias to be smaller than the true coefficient, i.e. for the IV estimate to be less than double the size of the true coefficient. [10] This concern increases with how close the treatment intensity categories are to one another (i.e. increases in the number of categories $J$), because it becomes increasingly implausible that any instrument could ensure $p_t = 0$ for all $t \neq k$.

When the causal response is linear and $T_i$ is observable, an IV estimator replacing $D_{ik}$ with $T_i$ in the first stage is more appropriate. The first stage thus regresses $T_i$, rather than $D_{ik}$, on $Z_i$. By using $T_i$ in the first stage to re-scale the reduced form estimate, this approach instead identifies the local average per-unit treatment effect (LAPTE). This estimator is defined as:

\[ \beta^{IV}_{LAPTE} \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[T_i \mid Z_i = 1] - E[T_i \mid Z_i = 0]} = \frac{\sum_{t=2}^{J} p_t \beta_t}{\sum_{t=2}^{J} p_t}. \tag{7} \]

The LAPTE, which is the standard IV estimator used when the treatment is continuous, is thus a linear approximation, where the causal effect at each treatment intensity is weighted by the number of compliers at that intensity. It is easy to see that when the true effect is τ at each interval, it is exactly recovered by \(\beta^{IV}_{LAPTE}\).
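The expressions in equations (6) and (7) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names, the complier shares and the value of τ are hypothetical, and the dummy variable estimator is written as the reduced form divided by the first stage at interval k alone, which is the form implied by equations (5) and (6).

```python
from typing import Sequence


def reduced_form(p: Sequence[float], beta: Sequence[float]) -> float:
    """Reduced form: complier shares p_t times per-interval effects beta_t, summed."""
    return sum(p_t * b_t for p_t, b_t in zip(p, beta))


def dummy_iv(p: Sequence[float], beta: Sequence[float], k: int) -> float:
    """Dummy variable IV: reduced form rescaled by the first stage at interval k only."""
    return reduced_form(p, beta) / p[k]


def lapte(p: Sequence[float], beta: Sequence[float]) -> float:
    """LAPTE as in equation (7): reduced form rescaled by all complier shares."""
    return reduced_form(p, beta) / sum(p)


tau = 0.1                    # hypothetical constant per-interval effect
p = [0.10, 0.30, 0.20]       # hypothetical complier shares at three adjacent intervals
beta = [tau] * len(p)        # linear CRF: beta_t = tau at every interval
k = 1                        # position of the interval targeted by the dummy variable

bias = dummy_iv(p, beta, k) - beta[k]
expected_bias = (sum(p) - p[k]) / p[k] * tau   # right-hand side of equation (6)
lapte_estimate = lapte(p, beta)

print(f"dummy IV bias: {bias:.3f}, equation (6): {expected_bias:.3f}, LAPTE: {lapte_estimate:.3f}")
```

In this made-up example exactly half of all compliers sit at interval k, so the bias equals τ itself, while the LAPTE recovers τ.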

More generally, the LAPTE estimator can be interpreted as correcting the first stage by appropriately measuring the instrument’s effect on the endogenous treatment. This approach is robust in two attractive respects. First, only the weaker exclusion restriction (A5) is required for consistent estimation of the LAPTE, even when the true causal effect is not linear. This is because the first stage is assigned a linear form that captures all changes in T_i (Angrist and Imbens 1995). Second, the linear approach can be robust even without observing all categories. If the J observed categories represent a coarsening of the true intervals (e.g. because T is continuous), the linear causal effect can still be recovered provided the intervals are equally spaced.

10 Note that \(\frac{\sum_{t \neq k} p_t}{p_k} = \frac{p - p_k}{p_k} < 1\) only when \(p_k > p/2\), where \(p \equiv \sum_{t=2}^{J} p_t\).

Proposition 2. (LAPTE with multiple categories) Let only J equally-spaced categories of T_i be observed when there are in fact αJ equally-spaced categories, where α > 1 is finite and αJ is an integer. Denote \(\beta^{IV,J}_{LAPTE}\) and \(\beta^{IV,\alpha J}_{LAPTE}\) respectively as the IV estimators in the observed sample (denoted by superscript J) and unobserved sample (denoted by superscript αJ). Let assumptions A1-A5 hold, and assume p_t > 0 for at least two intensities. If the effect of T_i is linear such that \(\beta^{J}_{j} = \tau\) for all intervals j, then \(\beta^{IV,J}_{LAPTE} = \alpha \beta^{IV,\alpha J}_{LAPTE}\).

Consequently, obtaining the coefficient on the quantity of interest only requires an adjustment by factor α to identify the average linear causal effect for a desired unit interval. A first stage is required for at least two treatment intensities to ensure that the IV estimate averages across coarsened categories; without estimates for two intensities to “draw a line through”, the estimates would be equally susceptible to coarsening bias.

As the CRF departs from linearity, a linear approximation may become less useful. An important question is thus: how does the LAPTE differ from the dummy variable IV estimate? It is easy to show that the dummy variable approach yields a coefficient at least as large as the LAPTE when A4 is satisfied.11 Consequently, if the CRF is that in Figure 2, then the linear approach underestimates the true causal effect at intensity k when β_t = 0 for all t ≠ k but p_t > 0 for some t ≠ k. Where the instrument only affects the first stage of interest, i.e. p_t = 0 for all t ≠ k, the LAPTE estimator yields an identical estimate to the dummy variable IV estimator. In all other cases, the dummy variable IV estimate is biased. However, the LAPTE remains a consistent causal estimate of the complier-weighted average across the intervals.
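The contrast between the two estimators under a discontinuous CRF can also be seen in simulated data. The following is a minimal sketch under assumed values, not a reproduction of any analysis in this paper: the three intensity levels, the complier probabilities (0.6 and 0.3), the effect size (0.2) and the helper `wald` are all hypothetical, and the only causal effect arises on reaching level 2 (the intensity k of the text).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

z = rng.integers(0, 2, n)        # randomly assigned binary instrument
u = rng.normal(size=n)           # unobserved confounder of treatment and outcome
t0 = np.where(u < -0.5, 1, np.where(u < 0.5, 2, 3))   # intensity absent the instrument

# Compliers move up exactly one level when z = 1, so monotonicity holds by construction.
moves = ((t0 == 1) & (rng.random(n) < 0.6)) | ((t0 == 2) & (rng.random(n) < 0.3))
t = t0 + z * moves

beta_k = 0.2                     # effect of reaching level 2; no effect of reaching level 3
y = beta_k * (t >= 2) + 0.5 * u + 0.1 * rng.normal(size=n)


def wald(outcome: np.ndarray, treatment: np.ndarray, instrument: np.ndarray) -> float:
    """Reduced form divided by first stage, both as differences in means by instrument."""
    rf = outcome[instrument == 1].mean() - outcome[instrument == 0].mean()
    fs = treatment[instrument == 1].mean() - treatment[instrument == 0].mean()
    return rf / fs


iv_dummy_k = wald(y, (t >= 2).astype(float), z)     # dummy at the correct threshold k
iv_dummy_k1 = wald(y, (t >= 3).astype(float), z)    # dummy at the misplaced threshold k + 1
iv_lapte = wald(y, t.astype(float), z)              # multi-valued treatment (LAPTE)

print(f"dummy at k: {iv_dummy_k:.3f}, dummy at k+1: {iv_dummy_k1:.3f}, LAPTE: {iv_lapte:.3f}")
```

Under these assumptions the dummy at k should be close to 0.2, the dummy at k + 1 should be well above it despite a true effect of zero at that interval, and the LAPTE should fall below 0.2 because it averages the effect over compliers at both intervals.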

2.4 Practical implications

Researchers may wish to coarsen treatment variables for a variety of seemingly valid reasons. First, coarsening may be appealing because estimates are more interpretable. Second, coarsening may be theoretically appealing when the treatment effect is believed to be non-linear. Third, coarsening may be convenient in the absence of more granular measures of the treatment. Furthermore, such coarsening is unproblematic when using simple linear models like OLS. However, the above analysis demonstrates that coarsening can substantially bias IV estimates.

The analytic results show how the shape of the CRF is critical for ascertaining the bias of the IV estimator with a binary treatment. Unless the instrument is very specific in inducing subjects to only reach treatment intensity k, or the causal response is non-zero only at that particular point and thus the more demanding exclusion restriction (A5*) is satisfied, the dummy variable IV estimator can be severely biased. If the CRF is instead approximately linear, it is more appropriate to estimate the LAPTE. Given the risk of coarsening bias, and the difficulty of implementing semi-parametric IV strategies,12 consistently estimating the LAPTE is often also more appropriate when the CRF is non-linear.

Although researchers may in some cases have strong prior beliefs over the shape of the CRF, and thus the most appropriate empirical strategy, it is hard to be certain. For example, there is considerable debate over whether the wage returns to schooling are relatively constant across years of schooling (Becker 1993) or whether the main benefits follow from the signalling value of obtaining a diploma (Spence 1973). In general, researchers must rely on their intuition and evidence (including the reduced form relationship, separate first stage regressions and the (endogenous) OLS relationship) to determine the appropriate specification when only a single instrument is available.

In rare cases where multiple instruments are available, a sharper empirical assessment is possible. With q > 1 instruments, q intervals of the CRF can be estimated by instrumenting for q binary indicators demarcating q treatment intensities. Provided different instruments do not affect different types of compliers differently, this permits the researcher to estimate β_t for compliers at multiple relevant intervals. Finding a large effect at t ≠ k, when β_k is also estimated, provides evidence that the CRF is not discontinuous and thus \(\beta^{IV}_{k}\) is unlikely to consistently estimate β_k. Since finding multiple valid instruments is extremely challenging, this approach is unlikely to be suitable in general. However, in the special case of my empirical application, I am able to use this approach to clearly demonstrate that IV estimates for completing high school can substantially over-estimate education’s political effects by showing that prior years of schooling causally affect political preferences. Isolating the magnitude of coarsening bias in this particular case is instructive for researchers possessing a single instrument, because it confirms that coarsening bias is a major empirical concern.

An important caveat for implementing the linear approach is that a measure of T_i, as well as D_ik, must be available. As noted above, researchers often resort to using dummy variables when more nuanced measures are not easily available. The application in this paper also shows how two-sample IV methods can address this missing data problem. The two-sample IV method similarly solves the problem that neither T_i nor D_ik is observed in the researcher’s main sample (Angrist and Pischke 2008; Franklin 1989).

11 Comparing denominators, \(\sum_{t=2}^{J} p_t \geq p_k\) if sign(p_t) = sign(p_k) for all t.

12 Semi-parametric approaches rely on substantially stronger assumptions than typical IV estimation, and are challenging to estimate (e.g. Blundell, Chen and Kristensen 2007; Newey and Powell 2003).

3 High school education and political preferences

In this section, I consider coarsening bias in the context of examining how high school education affects vote choice in Great Britain. Despite widespread interest in the causal effects of education on political participation (see Sondheimer and Green 2010), education’s partisan bias has received limited attention from scholars seeking to move beyond survey correlations. Given education is central in the lives of adolescents as they become politically aware, but also has many downstream consequences for adulthood, it may be one of the most important determinants of political preferences.

There are various ways in which education could affect political preferences. One of the most robust correlations from political surveys in developed democracies is the link between income and support for right-wing political parties (e.g. Gelman et al. 2010; Thomassen 2005). If education increases income, as human capital theory suggests (e.g. Acemoglu and Angrist 2000; Becker 1993), then additional high school should increase support for right-wing parties proposing lower taxes (Meltzer and Richard 1981; Romer 1975). In Britain, this is likely to increase support for the Conservative party.

Alternatively, education could cultivate socially liberal attitudes. This link has also been widely documented in survey research (Dee 2004; Schoon et al. 2010), although it is stronger at the university than high school level. Rather than supporting right-wing parties, this impetus generally seems to push voters toward left-wing parties supporting more post-materialist and socially liberal policies (e.g. Heath et al. 1985; Inglehart 1981). The Labour and especially Liberal Democrat parties are regarded as more socially progressive, and are thus expected to benefit if education causes voters to become more socially liberal.

To estimate high school’s effect on political preferences, I use Britain’s compulsory schooling reforms as instruments for schooling in the context of a regression discontinuity design. This addresses the major selection concern that the types of individuals who receive more education are unlikely to be random, even after various observables are controlled for or matched upon (e.g. Kam and Palmer 2008). However, since British political surveys typically do not provide more granular measures of education than high school completion, IV methods risk replacing selection bias with coarsening bias. To circumvent this concern and estimate the effect of an additional year of late high school education, I use two-sample IV methods to combine the reduced form estimates with first stage estimates from a different sample measuring years of schooling. Due to the particular characteristics of this empirical application, I am able to demonstrate the extent of coarsening bias arising from instrumenting for high school completion.

3.1 Compulsory schooling laws in Britain

Great Britain’s education laws define the maximum age by which students must start school and the minimum age at which students can leave school. To identify the effect of high school education, I exploit two landmark reforms of the minimum leaving age that came into force in 1947 and 1972.

First, Winston Churchill’s wartime coalition government passed the Education Act 1944, which increased the leaving age from 14 to 15 in England and Wales. The Education (Scotland) Act 1945 enacted the same reform in Scotland. The new leaving age, which had repeatedly failed to pass in the 1920s and 1930s due to financial constraints (Gillard 2011), came into force on 1st April 1947 after several years of intensive preparation. Second, Parliament passed the Education Act 1962 raising the school leaving age to 16, although it was Conservative Edward Heath who finalized the extension in 1972 under Statutory Instrument 444 (1972). As with the 1947 reform, Labour had consistently pushed for the increase,13 while education was widely seen as an economically and socially beneficial investment at the time (Woodin, McCulloch and Cowan 2013). This second reform came into force in England, Scotland and Wales on 1st September 1972. Northern Ireland, which experienced different education reforms (Oreopoulos 2006), is excluded from the analysis. The reforms are described in greater detail in the Online Appendix.

The reforms substantially altered the education profile of Britain’s students. As Figure 3 shows, relative to the immediately prior academic cohorts, both reforms induced a large fraction of students to remain in school for an additional year. Unlike compulsory schooling reforms in Canada and the U.S., which affected a small and somewhat idiosyncratic set of students (Clark and Royer 2013; Goldin and Katz 2008; Oreopoulos 2006), Britain’s reforms affected a large proportion of the population.14 Around one third of all students remained in school at least one year longer following the 1947 reform, while a fifth remained in school because of the 1972 reform. While the 1947 reform also increased the proportion staying in school until 16, the 1972 reform did not affect schooling beyond the high school level. Importantly, the figure also clearly shows that neither reform affected lower levels of schooling or post-secondary education.

Figure 3: Compulsory schooling reforms and staying in school by cohort
Notes: Data based on the Labour Force Survey data used in the empirical analysis below. Black lines represent third-order local polynomial fits. Grey dots are birth-year cohort averages.

Although the number of students in school rose considerably, the education system itself did not change greatly. Fees for secondary schooling had already been removed in 1944, although this did not affect enrollment (Oreopoulos 2006). Prior to the 1947 reform, the government preemptively engaged in a major expansion effort to maintain quality by increasing the number of teachers, buildings and classroom materials (Woodin, McCulloch and Cowan 2013). In both cases, the additional year of schooling was primarily intended to ensure students grasped the material they had previously been taught (Clark and Royer 2013; Grenet 2013).

13 Under Labour Prime Minister Gordon Brown, Parliament passed the Education and Skills Act 2008, raising the education leaving age to 18 by 2015.

14 The LATE converges toward the average treatment effect (ATE) as the number of compliers increases (Oreopoulos 2006). Aronow and Carnegie (2013) show that covariates can also aid translation between the LATE and ATE.

3.2 Data

I test the political implications of these reforms using the British Election Survey (BES) and the British Social Attitudes Survey (BSAS). The BES has been conducted following every general election since 1964. I use the eight elections from 1979 to 2010 that contain relevant variables. The BSAS has been conducted in the summer of every year since 1983, with the exception of 1988 and 1992, although I only examine the ten surveys asking about voting behavior.15 Both surveys randomly sample adult citizens (aged 18 or above) with postal addresses in Great Britain for in-person interviews.16 Pooling these 18 surveys produced a dataset containing 29,139 observations covering 14 different years.

The main outcome variable is an indicator for voting Conservative at the last election. In the sample, 34% of respondents reported voting Conservative,17 while 41% and 18% respectively voted Labour and Liberal. As robustness checks, I examine an indicator for the 31% of voters that identify as Conservative partisans, and indicators for voting Labour and Liberal.

The minimum schooling leaving age affecting an individual in (birth year) cohort c is defined by indicators for whether a student was born early enough to be impacted by each reform. Specifically, 1(Leaving age_c = 15) = 1(birth year + 14 ∈ [1947, 1972]) denotes a voter affected by the 1947 reform, and 1(Leaving age_c = 16) = 1(birth year + 15 ≥ 1972) denotes a voter affected by the 1972 reform.18 The residual category is the pre-1947 period when the leaving age was 14 for all respondents. Whether an individual was affected by the reform is thus assigned by cohort, defined by the year aged 14 and 15.

However, the BES and BSAS measures of education are problematic. Both surveys principally measure educational attainment using around six categories ranging from no qualification to university degree.19 In the BSAS, completing high school is captured by the second lowest category, which specifies that a respondent has a certificate of secondary education (CSE) or equivalent. At the end of high school (at age 15 or 16), or a student’s 11th year of formal schooling, students take CSE exams in a variety of subjects. Given only 2-3% of students fail any particular CSE exam, obtaining a CSE is a good proxy for completing high school. An indicator measuring this is used to examine the results when schooling is dichotomized at a theoretically appealing point. The BES instead asks whether respondents achieved a minimum level of examination performance; this measure is not considered here.20

Using only the BES and BSAS surveys to identify the effect of years of schooling would require either coarsening the treatment or substantially reducing the sample size. However, collecting a second sample from the same population but containing basic demographic variables and the age at which an individual left school can solve this problem. Accordingly, I use Labour Force Survey (LFS) data, an annual and more recently quarterly household survey, from each year in which a survey was conducted to collect a pooled sample of 1,179,939 voting age respondents.21 Years of schooling is defined by the age a respondent left continuous full time education minus five, and an upper bound of 13 years of state-supported education is applied.22 Before 2003, the LFS collected both month and year of birth, and therefore permitted perfect instrument assignment; since 2003, the instruments were assigned as in the BES and BSAS.23

15 These surveys were conducted in: 1987, 1994-1996, 1999, 2001, 2003, 2005, 2008 and 2010.

16 See footnote 29 for more details. Additional pre-election and non-interview surveys were excluded.

17 The survey-weighted Conservative national vote share across the period under study is 36%.

18 Month of birth is unavailable in the BES or BSAS, so the instruments are assigned by birth year. My first stage is very similar to Clark and Royer (2013), who can assign the instruments using month of birth data. The clear graphical discontinuities shown below further support my coding.

19 Although both the BES and BSAS also ask respondents what age they left school, nearly half of the surveys did not allow respondents to answer that they left school below age 15, and thus cannot differentiate the effect of the 1947 reform from the number of years of schooling. This bottom coding is clearly still relevant in the twenty-first century because many of those aged 14 or above in 1947 are still alive. Respondents with foreign qualifications were excluded.

20 This measure produces similar results for completing high school in the BSAS.

21 Only the July-September sample was used since the LFS became quarterly. This avoids respondent duplication and approximates the months when the BES and BSAS surveys were conducted. Observations from Northern Ireland and respondents below the age of 18 were excluded to match the political surveys.
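The instrument coding and the years-of-schooling measure described in this section are simple deterministic rules. A minimal sketch follows; the function names are hypothetical, and the cutoffs are implemented exactly as written in the text (including the closed interval [1947, 1972]) rather than taken from the replication code.

```python
from typing import Tuple


def leaving_age_indicators(birth_year: int) -> Tuple[int, int]:
    """Return (1(Leaving age = 15), 1(Leaving age = 16)) for a birth-year cohort."""
    affected_1947 = int(1947 <= birth_year + 14 <= 1972)   # aged 14 in 1947-1972
    affected_1972 = int(birth_year + 15 >= 1972)           # aged 15 in 1972 or later
    return affected_1947, affected_1972


def years_of_schooling(age_left_education: int) -> int:
    """Age on leaving continuous full-time education minus five, capped at 13 years."""
    return min(age_left_education - 5, 13)


for cohort in (1930, 1940, 1960):
    print(cohort, leaving_age_indicators(cohort))
print(years_of_schooling(15), years_of_schooling(16), years_of_schooling(21))
```

Note that, taken literally, the two conditions overlap for cohorts at the 1972 boundary; how those cohorts are assigned in the actual analysis is not specified in this section, so the sketch makes no attempt to resolve it.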

3.3 Identification strategy and estimation

To identify the effect of late high school education on political preferences, I use Britain’s compulsory schooling reforms as instruments for the level of schooling an individual receives. These reforms have been widely used as instruments, most convincingly in regression discontinuity (RD) designs (see Clark and Royer 2013; Oreopoulos 2006), because they induced dramatic changes in educational attainment across neighboring cohorts.24 This study employs a similar RD design where the running variable determining the treatment is birth year cohort.

The key identifying assumption is that political preferences are continuous in all covariates other than the school leaving age at the reform discontinuity. This implies instrument independence (A2 above). The “sorting” concern is that another key variable simultaneously changes at the discontinuity. Selection into cohorts in Britain is implausible since parents could not have precisely predicted CSL reforms over a decade in advance.25 Furthermore, given that cultural shifts are unlikely to have affected 15 year olds without also affecting 14 year olds, the most plausible concerns relate to demographic, socio-economic and labor market characteristics. Figure 4 shows that trends in various proxies for these variables are essentially continuous through both discontinuities.

Figure 4: Trends in demographic, socio-economic and labor market demographic variables
Notes: The data in Panels A-D is from the BES and BSAS. The data in Panels E-G is from the BES. The data in Panels H and I is from the Bank of England “UK Economic Data 1700-2009” dataset.

22 Classifying students in university and vocational programs is difficult after age 18. Since the reforms did not affect higher education, this choice is inconsequential.

23 Comparing the methods for the 1972 reform shows almost identical results. Too few respondents were 14 before 1947 in post-2003 surveys for comparison at the 1947 reform.

24 The laws have identified the effects of schooling on income (Devereux and Hart 2010; Grenet 2013; Harmon and Walker 1995; Oreopoulos 2006) and mortality rates (Clark and Royer 2013).

25 The lack of heaping is confirmed by annual birth rate data (Office for National Statistics 2013). In my BES/BSAS sample, McCrary (2008) tests confirm that the density of the data is indistinguishable across the reform discontinuities.

I first estimate the reduced form effects, δ_1 and δ_2, of the schooling reforms themselves on voting Conservative. I thus estimate the following equation in the pooled BES and BSAS sample using OLS:

\[ Y_{ict} = \delta_1 \mathbf{1}(\text{Leaving age}_c = 15) + \delta_2 \mathbf{1}(\text{Leaving age}_c = 16) + f(\text{birth year}_c) + W_{it}\gamma + \eta_t + \varepsilon_{it}, \tag{8} \]

where 1(Leaving age_c < 15) is the residual category. f is a flexible global polynomial function of the running variable designed to capture trends away from the reform discontinuities.26 I use a third-order polynomial in the main tables, but present various specifications for f, ranging from linear birth year trends to fifth-order polynomial trends, to demonstrate robustness. Finally, W_it includes a gender dummy, standardized age polynomials,27 and dummies for white, black and (south and east) Asian ethnicities, and η_t is a survey fixed effect. Standard errors are clustered by cohort.

26 I use highly flexible global polynomial trends to include both reforms in the same specification.

The principal quantity of interest is the effect of schooling on political preferences. To estimate this, I use Britain’s reform cutoffs as instruments. Since the reforms do not exactly determine an individual’s level of schooling, the assignment of schooling T_i is probabilistic. I thus employ a “fuzzy” RD design, which generalizes the typical “sharp” RD to cases where the cutoff discontinuously increases the probability of receiving treatment by using the cutoff as an instrument for the treatment (Hahn, Todd and Van der Klaauw 2001). Like standard IV approaches, this requires that monotonicity (A4) and the exclusion restriction (A5 or A5*) hold. The fact that very few students failed to complete the minimum level of schooling after the reforms (see Figure 3) supports monotonicity. Although the close proximity of the reforms to schooling choices limits the scope for the reforms to affect an individual’s political preferences through other channels, I explore exclusion restriction violations in the robustness checks below. However, whether assumption A5* holds in addition to assumption A5 is tested empirically. The fuzzy RD entails estimating the following structural equation:

\[ Y_{ict} = \beta T_i + f(\text{birth year}_c) + W_i \varphi + \eta_t + \varepsilon_{ict}, \tag{9} \]

where the first stage regression generating exogenous variation in Ti is given by:

\[ T_i = \alpha_1 \mathbf{1}(\text{CSL}_c = 15) + \alpha_2 \mathbf{1}(\text{CSL}_c = 16) + f(\text{birth year}_c) + W_i \psi + \eta_t + \varepsilon_{ict}. \tag{10} \]

A strong first stage (A3) implies that the instruments explain substantial variation in schooling. Specifically, an F statistic exceeding 10, when testing the exclusion of the instruments, is conventionally regarded as sufficient to ensure that “weak instrument bias” is negligible (Staiger and Stock 1997).

I use three measures of schooling. First, T_i is a dummy for completing high school. Since high school completion is only available in the BSAS data, equation (9) is estimated with 2SLS using the BSAS data. Such estimation is similar to the IV estimator in equation (3), except that the numerator and denominator are now weighted over the two instruments and other covariates are partialled out. Second, T_i is measured using two dummy variables for staying in school for 10 years and for 11 or more years. Given Figure 3 shows that the instruments only induce students to receive a tenth or eleventh year of schooling (i.e. p_t > 0 only for t = 10 or t = 11), the coefficients on these variables identify the effect of an additional year of schooling at age 15 and 16. It is this feature that permits exact calculation of the coarsening bias in this application. Third, measuring T_i as years of schooling estimates the LAPTE. This is akin to estimating equation (7) using two instruments and controlling for covariates.

Given that years of schooling is only measured in the LFS and political outcomes are only measured in the BES and BSAS, estimation for the latter two measures of T_i necessitates two-sample IV methods. When schooling is measured in years, I use two-sample 2SLS (TS2SLS) to estimate equation (9). TS2SLS uses the first stage estimates from the LFS sample to impute the unobserved predicted value of schooling in the political surveys. The effect of schooling on Y_ict can then be estimated, where the first and second stage estimation are efficiently combined as a consistent two-step estimator (Inoue and Solon 2010). I analytically derive the cluster-robust covariance matrix, which accounts for the uncertainty in the estimation of the first stage, in the Online Appendix using Murphy and Topel’s (1985) generated regressors method. The proof and other technical details are provided in the Online Appendix, while the general R program implementing this procedure is available in the replication code.

27 Age polynomials are assigned the same order as f.

Table 1: Summary statistics: BSAS and LFS samples

| Variable | BES/BSAS Obs. | BES/BSAS Mean | BES/BSAS Std. dev. | BES/BSAS Min. | BES/BSAS Max. | LFS Obs. | LFS Mean | LFS Std. dev. | LFS Min. | LFS Max. |
|---|---|---|---|---|---|---|---|---|---|---|
| Dependent variable | | | | | | | | | | |
| Conservative vote | 29,139 | 0.34 | 0.47 | 0 | 1 | | | | | |
| Endogenous treatment variables | | | | | | | | | | |
| Schooling | | | | | | 84,172 | 11.09 | 1.45 | 0 | 13 |
| 10 years of schooling | | | | | | 84,172 | 0.26 | 0.44 | 0 | 1 |
| 11 or more years of schooling | | | | | | 84,172 | 0.67 | 0.47 | 0 | 1 |
| High school | 15,102 | 0.80 | 0.40 | 0 | 1 | | | | | |
| Excluded instruments | | | | | | | | | | |
| CSL=15 | 29,139 | 0.51 | 0.50 | 0 | 1 | 84,172 | 0.51 | 0.50 | 0 | 1 |
| CSL=16 | 29,139 | 0.37 | 0.48 | 0 | 1 | 84,172 | 0.37 | 0.48 | 0 | 1 |
| Pre-treatment covariates | | | | | | | | | | |
| Birth year | 29,139 | 1,951.33 | 14.70 | 1920 | 1987 | 84,172 | 1,951.30 | 14.56 | 1920 | 1987 |
| Age | 29,139 | 44.14 | 13.56 | 18 | 69 | 84,172 | 44.17 | 14.37 | 18 | 69 |
| Male | 29,139 | 0.46 | 0.50 | 0 | 1 | 84,172 | 0.46 | 0.50 | 0 | 1 |
| White | 29,139 | 0.96 | 0.20 | 0 | 1 | 84,172 | 0.96 | 0.20 | 0 | 1 |
| Black | 29,139 | 0.01 | 0.11 | 0 | 1 | 84,172 | 0.01 | 0.11 | 0 | 1 |
| Asian | 29,139 | 0.02 | 0.14 | 0 | 1 | 84,172 | 0.02 | 0.14 | 0 | 1 |
| Survey year | 29,139 | 1,995.47 | 8.20 | 1979 | 2010 | 84,172 | 1,995.84 | 8.58 | 1979 | 2010 |

Notes: High school is only measured in the BSAS (see main text). See main text and Online Appendix for the procedure used to create the LFS sample.

Beyond the standard IV assumptions discussed above, TS2SLS also requires that both datasets independently sample from the same population (Franklin 1989; Inoue and Solon 2010).28 Although the BES, BSAS and LFS are all stratified random samples from the population,29 imbalances could remain due to chance, different survey sizes or differential response rates. To address the concern that the TS2SLS assumptions are not satisfied, I chose a random subsample of the LFS sample to match the BES and BSAS sample distribution in terms of year of birth, gender, ethnicity, leaving age, and survey year by randomly choosing observations from within these blocks.30 This reduces the LFS sample to 84,172 observations.31 The summary statistics in Table 1 show that the first and second moments on the common variables match well.
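The mechanics of TS2SLS can be illustrated on simulated data. The sketch below is not the replication code: the data-generating process, coefficient values, sample sizes and helper names are all assumptions made for illustration, a linear cohort trend stands in for the polynomial f, and no standard errors are computed (so the Murphy and Topel correction is not implemented).

```python
import numpy as np

TRUE_BETA = 0.15   # hypothetical per-year effect of schooling on the outcome


def simulate(n: int, rng: np.random.Generator):
    """Draw one sample from a hypothetical population with two reform cutoffs."""
    c = rng.uniform(-1, 1, n)                        # running variable (cohort)
    z15 = ((c >= -0.3) & (c < 0.4)).astype(float)    # cohorts facing the first reform
    z16 = (c >= 0.4).astype(float)                   # cohorts facing the second reform
    u = rng.normal(size=n)                           # unobserved confounder
    t = 9 + 0.45 * z15 + 0.55 * z16 + 0.5 * c + 0.8 * u + 0.5 * rng.normal(size=n)
    y = TRUE_BETA * t + 0.3 * c - 0.5 * u + 0.3 * rng.normal(size=n)
    return c, z15, z16, t, y


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(0)
c1, z15_1, z16_1, t1, _ = simulate(300_000, rng)   # first stage sample: outcome unused
c2, z15_2, z16_2, _, y2 = simulate(300_000, rng)   # main sample: schooling unused

# Step 1: estimate the first stage where schooling is observed.
alpha_hat = ols(np.column_stack([np.ones_like(c1), c1, z15_1, z16_1]), t1)

# Step 2: impute predicted schooling in the main sample and regress the outcome on it.
t_hat = np.column_stack([np.ones_like(c2), c2, z15_2, z16_2]) @ alpha_hat
beta_ts2sls = ols(np.column_stack([np.ones_like(c2), c2, t_hat]), y2)[2]

print(f"TS2SLS estimate: {beta_ts2sls:.3f} (value used to simulate: {TRUE_BETA})")
```

Because both simulated samples are drawn independently from the same population, the imputed first stage is valid in the main sample; this is the assumption that the matching of the LFS subsample to the BES and BSAS distribution is intended to support.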

3.4 Results

3.4.1 Compulsory schooling reforms increase schooling and Conservative voting

Figure 5 plots the first stage graphically. The left-hand graph shows a large increase in the average number of years of schooling per cohort following the 1947 reform. This reflects the 40% of students who stayed in school for another year shown in Figure 3. The right-hand graph shows that the 1972 reform also substantially increased average years of schooling, although the magnitude of the change was smaller because by 1972 students generally stayed in school longer.

Figure 5: Average years of schooling by birth year cohort (LFS data)
Notes: Black lines represent second-order local polynomial fits. Grey dots are cohort averages.

The first stage estimates in Table 2 confirm that both reforms substantially increased schooling. Looking at the dummy for completing high school in the BSAS sample, column (1) shows that both the 1947 and 1972 reforms significantly increased the probability of completing high school. Importantly, given the potential for coarsening bias, the 1947 reform increases the probability of completing high school by around seven percentage points. This increase for the 1947 reform is highly statistically significant (F statistic of 14.4 when those affected by the 1972 reform are excluded), confirming that any bias does not simply result from a weak instrument problem. Column (2) instead examines years of schooling in the LFS, and reinforces the graphical analysis showing that both reforms were effective at keeping students in school. The 1947 reform increases years of schooling by 0.45 years, while the 1972 reform added a further 0.09 years. In both cases, the large F statistic, which tests the relevance of the excluded instruments, indicates a strong first stage.32

Although the cohort averages are noisier, the reduced form plots in Figure 6 indicate that around the reforms voters differ systematically in their political preferences. Particularly following the 1947 reform, there is a notable upward shift in support for the Conservative party by cohort. The graphs indicate that cohorts affected by the reform are approximately five percentage points more Conservative. Given that the 1972 reform affected fewer students, the two percentage point difference at the discontinuity is less clear. The fact that both reforms reverse the trend against the Conservatives, which is a function of both declining support over time and younger voters being more left-wing, further suggests that the posited relationship does not simply reflect cohort trends.

Column (3) of Table 2 presents the reduced form estimates of the reforms’ effect on voting Conservative later in life. The results indicate that the reforms induced a large and statistically significant increase in support for the Conservative party. Cohorts affected by the 1947 reform are 6.9 percentage points more likely to vote Conservative, while the 1972 reform, which affected fewer students, increased Conservative voting by a further 2.4 percentage points.33 Such large shifts for affected cohorts imply that the reforms substantially altered national politics, and could easily

28 This assumption ensures that both sets of sample moments converge upon a common population, and can thus be interchanged.

29 The BES uses a multi-stage design, randomly selecting postal addresses from several wards from randomly sampled constituencies (stratifying by region). Similarly, the BSAS divides Britain into sectors defined by postcode, from which households are randomly chosen from postal addresses. Respondents aged 18 or above within a household are then randomly chosen. The LFS has been an unclustered (“simple”) random sample from postal addresses since 1992, having earlier been clustered.

30 To match the LFS sample, the final samples used for both datasets exclude respondents aged above 69 and those born before 1920 or after 1987.

31 Where sample size concerns are more salient, the first stage could be weighted to match the reduced form sample distribution.

32 The reforms’ effects differ at least at the 5% level.

33 The difference between the two coefficients is significant just outside the 10% level.

Table 2: Instrumental variable estimates of schooling’s effect on voting Conservative

| | (1) High school, OLS | (2) Schooling, OLS | (3) Vote Con, OLS | (4) Vote Con, 2SLS | (5) Vote Con, 2SLS | (6) Vote Con, TS2SLS | (7) Vote Con, TS2SLS | (8) Vote Labour, TS2SLS | (9) Vote Liberal, TS2SLS |
|---|---|---|---|---|---|---|---|---|---|
| 1947 reform | 0.073*** (0.022) | 0.446*** (0.047) | 0.069*** (0.015) | | | | | | |
| 1972 reform | 0.176*** (0.025) | 0.534*** (0.059) | 0.093*** (0.021) | | | | | | |
| Completed high school | | | | 0.411** (0.189) | 0.504 (0.418) | | | | |
| 10 years of schooling | | | | | | 0.164*** (0.038) | | | |
| 11 or more years of schooling | | | | | | 0.288*** (0.097) | | | |
| Years of schooling | | | | | | | 0.166*** (0.038) | -0.044 (0.029) | -0.111*** (0.034) |
| Reduced form stage sample | | | BES/BSAS | BSAS | BSAS | BES/BSAS | BES/BSAS | BES/BSAS | BES/BSAS |
| Reduced form observations | | | 29,139 | 15,102 | 8,951 | 29,139 | 29,139 | 29,139 | 29,139 |
| First stage sample | BSAS | LFS | | BSAS | BSAS | LFS | LFS | LFS | LFS |
| First stage observations | 15,102 | 84,172 | | 15,102 | 8,951 | 84,172 | 84,172 | 84,172 | 84,172 |
| First stage F statistic | 36.7 | 50.1 | | 36.7 | 14.4 | 50.1 | 50.1 | 50.1 | 50.1 |

Notes: Specifications (1) and (2) present the first stage estimates where completing high school and years of schooling are respectively the endogenous treatment variable in the BSAS and LFS samples. Specification (3) is the reduced form estimate. In specifications (4)-(9), the variables listed on the left side of the table are instrumented for by the indicators for the 1947 and 1972 reforms. All specifications include cubic birth year polynomials, age squared, age cubed, male, white, black and south Asian dummies, and survey year fixed effects. Specification (5) excludes respondents affected by the 1972 reform. While specifications (3)-(7) take Conservative vote as dependent variable, the dependent variable in specifications (8) and (9) respectively is Labour and Liberal vote. Specifications (4) and (5) are estimated using 2SLS in the BSAS sample, while specifications (6)-(9) are estimated using TS2SLS where the reduced form is estimated in the BES/BSAS sample and the first stage is estimated in the LFS sample. Standard errors clustered by cohort. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

Figure 6: Proportion conservative by birth year cohort (BSAS data) Notes: Black lines represent second-order local polynomial fits. Grey dots are cohort averages.

35

have altered the outcomes of the close elections in 1974 (February), 1992 and 2010. 3.4.2

Instrumental variable estimates of high school’s effect on voting Conservative

By averaging across all individuals, the reduced form underestimates the impact on individuals who only remained in school because of the reforms. To calculate the effects for such compliers, I turn to the fuzzy RD estimates. Columns (4)-(9) in Table 2 present the fuzzy RD results, instrumenting for different measures of schooling with the school leaving age reforms.

I first examine the 2SLS estimates where schooling is dichotomized. Given column (3) established a significant reduced form effect for the 1947 reform, but the reform did not compel all students to complete high school, there is clear scope for upward coarsening bias. Column (4) suggests that voters induced to complete high school by the reform are 41 percentage points more likely to vote Conservative in later life. This statistically significant estimate is very large, particularly when considering that a large segment of the population are compliers. This concern is even more evident in column (5), which uses only the 1947 reform as an instrument (removing those affected by the 1972 reform). In this specification, where the bias is expected to be largest given that the 1947 reform also caused a significant proportion of students to complete high school, the 2SLS estimates imply a 50 percentage point increase in the probability of voting Conservative. Although such large effects on predominantly working class, and thus generally pro-Labour, compliers are surprising, they are perhaps not infeasible.

The availability of two instruments that affect only two levels of schooling allows me to distinguish potential coarsening bias. Column (6) uses the 1947 and 1972 reforms to instrument for indicators for completing ten years of schooling or 11 or more years of schooling. Given that neither reform affected attaining nine or fewer years of schooling, or more than 11 years of schooling, the coefficients in column (6) can separately estimate the effect of an additional year of late high school.
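The comparison across the columns of Table 2 can be made concrete with some simple arithmetic. The sketch below only re-uses coefficients printed in Table 2; the variable names are illustrative, and the per-reform Wald ratios are a back-of-the-envelope check (reduced form divided by first stage, ignoring how the covariates and the two instruments are combined in TS2SLS), not a replication of the estimation.

```python
# Coefficients copied from Table 2 (Conservative vote as the outcome).
two_sls_both_reforms = 0.411   # column (4): completed high school, both reforms
two_sls_1947_only = 0.504      # column (5): completed high school, 1947 reform only
tenth_year = 0.164             # column (6): 10 years of schooling
eleven_plus = 0.288            # column (6): 11 or more years of schooling

# Incremental effect of the 11th year implied by column (6).
eleventh_year = eleven_plus - tenth_year

# How many times larger the dichotomized 2SLS estimates are than that increment.
ratio_both = two_sls_both_reforms / eleventh_year
ratio_1947 = two_sls_1947_only / eleventh_year

# Rough two-sample Wald ratios: reduced form (column 3) over first stage (column 2).
first_stage = {"1947": 0.446, "1972": 0.534}
reduced_form = {"1947": 0.069, "1972": 0.093}
wald = {reform: reduced_form[reform] / first_stage[reform] for reform in first_stage}

print(f"11th-year increment: {eleventh_year:.3f}")
print(f"2SLS / increment, both reforms: {ratio_both:.2f}")
print(f"2SLS / increment, 1947 only:    {ratio_1947:.2f}")
for reform, value in wald.items():
    print(f"Wald ratio, {reform} reform: {value:.3f}")
```

Under this reading, the dichotomized estimates are a little over three and four times the 11th-year increment, and the two rough Wald ratios fall on either side of the 0.166 reported in column (7).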
These results indicate that a 10th year increases the probability of voting Conservative by 16 percentage points, while an 11th year adds a further 12 percentage points. Therefore, at least at the end of high school, the political effect of schooling is approximately linear. Unsurprisingly, the LAPTE estimate in column (7) shows a similar effect for an additional year of schooling.

The estimates in columns (6) and (7) demonstrate that the dummy for completing high school substantially overstates the political effect of the final year of high school. Coarsening bias more than trebles the estimate for the final (11th) year of school, and quadruples it when focusing only on the 1947 reform. While the naive estimates are biased in terms of magnitude, the analysis conducted here nevertheless finds that late high school causes voters to become substantially more conservative in later life. In fact, despite the reduction in the magnitude of the estimates, the precision of the estimate (in terms of the coefficient's size relative to its standard error) increases. The mechanisms underpinning this significant finding are explored below.

Given Britain has had three main political parties throughout the survey period analyzed here, it is interesting to assess which party primarily loses votes to the Conservatives. Specifications (8) and (9) respectively use Labour and Liberal vote indicators as dependent variables, and show that schooling decreases the probability of voting for both parties. The reduction is especially large, and statistically significant, for the Liberal Democrats. However, unreported results show that the 1947 reform also significantly harmed Labour.

3.4.3 Robustness checks

I now show that the reduced form and TS2SLS estimates are robust to various potential concerns. First, I provide support for the continuity assumption underpinning RD estimation. Figure 7 demonstrates that the results are not being driven by the choice of cubic cohort trends.34 In particular, similar results are obtained when using higher-order polynomials that could better account for complex trends in Conservative support. Furthermore, I also control for national labor market characteristics (measured using the unemployment rate and average earnings) at age 14 in column (1) of Table 3 and find similar results.35 To demonstrate that age is not driving the results, column (2) shows similar estimates when considering within-age variation, by including age fixed effects.

Second, I also validate the estimates by ensuring that they apply consistently across measures of political preference. Column (3) in Table 3 similarly shows that an additional year of late high school increases the likelihood of identifying as a Conservative partisan by 12 percentage points. Moreover, columns (4) and (5) show similar estimates when examining the BES and BSAS separately.

34 The Online Appendix presents a similar figure for the reduced form estimates.

35 Despite reducing the sample by c.75%, the results are robust to including father job status.

Figure 7: TS2SLS estimates using higher-order polynomial controls
[Coefficient plot: marginal effect of years of schooling (horizontal axis, .05 to .3) for the linear, quadratic, cubic, quartic and quintic specifications.]
Notes: Higher-order polynomial specifications include standardized global birth year trends of order p and standardized age trends of order p (excluding linear age because it is perfectly collinear with linear birth year). The cubic estimates, in red, are those reported in the regression tables.

Table 3: Robustness checks

                               (1)          (2)          (3)          (4)          (5)
                               Controls     Age dummies  Partisan     BES          BSAS

Panel A: Reduced form estimates
1947 reform                    0.055***     0.071***     0.048***     0.070***     0.068**
                               (0.015)      (0.015)      (0.018)      (0.019)      (0.032)
1972 reform                    0.086***     0.097***     0.079***     0.085***     0.097***
                               (0.020)      (0.021)      (0.023)      (0.025)      (0.036)

Panel B: TS2SLS estimates
Years of schooling             0.128***     0.170**      0.127***     0.151***     0.187**
                               (0.031)      (0.037)      (0.044)      (0.044)      (0.075)

Reduced form observations      29,139       28,799       14,037       13,765       29,139
First stage observations       84,172       84,172       59,224       58,253       84,172
First stage F statistic        63.7         51.0         46.1         34.2         71.9

Notes: All specifications include cubic birth year polynomials, age squared, age cubed, male, white, black and south Asian dummies, and survey year fixed effects. Specification (1) includes the national unemployment rate and average earnings index at age 14 as controls. Specification (2) includes a full set of age dummies. Specification (3) takes Conservative partisanship as an indicator dependent variable. Specifications (4) and (5) use the BES data with Conservative voting and partisanship as dependent variables; a different LFS sample is used to match the BES distribution. Standard errors clustered by cohort. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

Finally, the exclusion restriction is violated if the reforms affected political preferences through channels other than schooling. Although political or cultural changes are unlikely to differentially affect cohorts one year apart, it is possible that an additional year in school could affect life choices (such as marriage or having children) by simply keeping students in school, but without operating through schooling itself. The Online Appendix, however, shows that neither reform affected the age of a respondent's oldest child, the number of children a respondent has, or whether the respondent has ever been married at the time of the survey. Furthermore, any reduction in schooling quality or spillover causing older cohorts to behave more like treated cohorts would reduce between-cohort differences around the reforms, and thus downwardly bias the estimates. Nevertheless, I establish the sensitivity of the results to the exclusion restriction by calculating the extent of the violation required to overturn the results. Conley, Hansen and Rossi's (2012) union of confidence intervals sensitivity test indicates that around two-thirds of the reduced form effect must operate through channels other than schooling for the TS2SLS estimates to become insignificantly positive.36
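The "two-thirds" figure can be checked against the numbers reported in footnote 36 and in column (3) of Table 2. The sketch below is plain arithmetic on those published values; the variable names are illustrative, and it does not reproduce the Conley, Hansen and Rossi procedure itself.

```python
# Reduced form coefficients from column (3) of Table 2.
rf_1947, rf_1972 = 0.069, 0.093

# Direct effects at which the schooling coefficient becomes insignificant (footnote 36).
delta_1, delta_2 = 0.047, 0.063

# Share of each reduced form effect that would have to bypass schooling.
share_1947 = delta_1 / rf_1947
share_1972 = delta_2 / rf_1972

# The footnote restricts delta_2 / delta_1 to the ratio of the reduced form coefficients.
ratio_rf = rf_1972 / rf_1947
ratio_delta = delta_2 / delta_1

print(f"share of 1947 reduced form: {share_1947:.3f}")
print(f"share of 1972 reduced form: {share_1972:.3f}")
print(f"ratio of reduced forms: {ratio_rf:.3f}, ratio of deltas: {ratio_delta:.3f}")
```

Both shares come out at roughly 0.68, in line with the statement that around two-thirds of the reduced form effect would need to operate outside schooling; the two ratios agree up to the rounding of the reported deltas.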

3.5 Mechanisms

To understand how education causes voters to vote Conservative, I now examine the mechanisms underpinning this relationship. I use political questions from the BES surveys to separate alternative explanations.37 Although demonstrating a causal mechanism is difficult, examining a range of potential mediators in conjunction with placebo tests can support some mechanisms and eliminate others (Gerber and Green 2012).

36 I replace $Y_{ict}$ with $Y_{ict} - \delta_1 1(CSL_c = 15) - \delta_2 1(CSL_c = 16)$ and estimate equation (9) using TS2SLS. I vary $\delta_1$ and $\delta_2$, restricting their ratio $\delta_2 / \delta_1 = 0.093/0.069$ (as in Table 2). The schooling coefficient becomes statistically insignificant when $\delta_1 = 0.047$ and $\delta_2 = 0.063$.

37 The BSAS only regularly asks several political questions.

Human capital theory and the Romer-Meltzer-Richard model predict that education induces more conservative fiscal policy preferences by increasing an individual's income. There is clear evidence that the 1947 and 1972 reforms substantially increased the income of affected cohorts: using similar RD designs, previous studies show that each additional year of schooling increased wage income by 5-15 percent (Devereux and Hart 2010; Grenet 2013; Harmon and Walker 1995; Oreopoulos 2006).

Columns (1) and (2) in Table 4 test the Romer-Meltzer-Richard prediction that increased education (which increases income) also translates into more conservative fiscal policy preferences, using specifications analogous to those in Table 2. Consistent with the income-based explanation, the TS2SLS estimates provide clear evidence that voters become less supportive of tax and spend and less likely to support expanding welfare benefits. For both variables, an additional year of schooling increases support for conservative economic policies by around one quarter of a standard deviation. Although the causal link from fiscal policy preferences to vote choice cannot be tested, there is unsurprisingly a strong negative correlation between voting Conservative and supporting high taxation and welfare spending in the BES sample.38

However, if voters adopt the policy positions of the political party or candidate they identify with (e.g. Lenz 2012), changes in economic policy preferences could simply reflect changes in partisanship arising from an alternative source. To test this possibility, I examine whether respondents adopt Conservative positions on non-economic issues. In particular, I examine the following positions associated with the Conservative party: emphasis on reducing crime over protecting citizen rights, support for Britain leaving the European community (EEC, EC or EU, depending on the survey year), and not abolishing private education.39 The results in columns (3)-(5) show that education does not significantly shift voters toward any of these Conservative positions. This evidence clearly suggests that education's political effects operate through fiscal policy preferences.

38 The significant correlations between voting Conservative and supporting tax and spend and supporting welfare benefits are respectively -0.26 and -0.31.

39 Unsurprisingly, emphasizing crime reduction (ρ = 0.12), not abolishing private education (ρ = 0.26) and leaving Europe (ρ = 0.03) are significantly positively correlated with voting Conservative.

Table 4: Mechanisms through which schooling affects political preferences

Columns: (1) Support tax and spend; (2) Support welfare benefits; (3) Support crime reduction (over rights); (4) Support leaving Europe; (5) Oppose abolishing private education; (6) Political information index; (7) Political interest index.

Columns (1)-(4):

                             (1)            (2)            (3)            (4)
Panel A: Reduced form estimates
1947 reform                  -0.334***      -0.087*        0.132          0.003
                             (0.082)        (0.046)        (0.225)        (0.030)
1972 reform                  -0.295**       -0.152**       0.331          0.006
                             (0.119)        (0.070)        (0.288)        (0.031)

Panel B: TS2SLS estimates
Years of schooling           -0.681***      -0.187**       0.499          0.010
                             (0.222)        (0.084)        (0.524)        (0.064)

Outcome range                0 to 10        0 or 1         0 to 10        0 or 1
Outcome mean                 6.56           2.12           6.48           0.33
Outcome standard deviation   2.41           0.99           2.73           0.47
Reduced form observations    11,394         8,516          6,872          12,921
First stage observations     55,275         30,469         34,140         55,275
First stage F statistic      44.2           61.7           25.2           44.2

Columns (5)-(7):

                             (5)            (6)            (7)
Panel A: Reduced form estimates
1947 reform                  -0.029*        0.054          -0.007
                             (0.017)        (0.061)        (0.054)
1972 reform                  -0.017         0.022          -0.076
                             (0.021)        (0.077)        (0.075)

Panel B: TS2SLS estimates
Years of schooling           -0.037         0.105          -0.064
                             (0.037)        (0.129)        (0.120)

Outcome range                0 or 1         -4.71 to 1.76  -1.98 to 1.49
Outcome mean                 0.81           0.00           0.00
Outcome standard deviation   0.39           1.00           1.00
Reduced form observations    10,722         8,839          14,032
First stage observations     34,762         40,163         59,224
First stage F statistic      53.0           34.6           46.1

Notes: All variables come from BES surveys, given the BSAS does not ask many political questions. All specifications include cubic birth year polynomials, age squared, age cubed, male, white, black and south Asian dummies, and survey year fixed effects. Standard errors clustered by cohort. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

The most plausible alternative to the income channel is that education causes voters to become more conservative by increasing their political engagement. Particularly since compliers are from disproportionately poor backgrounds (Grenet 2013), greater engagement, whether individually or socially, could challenge socially-ingrained left-wing predispositions. To test this possibility, I examine the effect of schooling on measures of political engagement. The results, in columns (6) and (7), provide no support for such an engagement mechanism. Rather, schooling does not affect standardized indices testing political knowledge and surveying interest in politics (see Online Appendix for construction).40

40 The effect on reported turnout is precisely zero. There is similarly no effect on post-materialist beliefs.

4 Conclusion

This article highlights and illustrates the importance of coarsening bias. This bias may be common in applied research using IV methods and, as demonstrated in the context of high school's long-run effects on political preferences, can be substantial. Using two-sample IV methods to overcome data limitations frequently encountered by applied researchers, I show that while using a binary indicator for completing high school did not incorrectly flag a statistically significant causal relationship, it over-estimated the causal relationship by three to four times. Both theoretically and empirically, this article demonstrates that coarsening bias is an important concern that researchers utilizing IV methods should be aware of.

When deciding how to code an endogenous treatment variable, researchers must carefully consider the nature of the causal effect they expect to find. In general, only when the treatment effects are truly discontinuous, in that a certain level of the treatment discontinuously increases the treatment's causal effect, and the researcher is able to exactly pinpoint the threshold where the causal effect occurs, will coarsening produce unbiased estimates of the desired causal quantity. Otherwise, it is safer to avoid the strict exclusion restriction required in this case and instead consistently estimate the local average per-unit treatment effect by measuring the treatment intensity using a linear treatment variable. When sufficiently fine-grained measures of the treatment are not available, this paper shows that two-sample IV methods can be used to estimate the first stage in another dataset where both the instrument and a fine-grained measure of the treatment are available. When no good measure of the treatment can be obtained, focusing on the reduced form estimates is the safest approach.

The estimates of high school's political effects are important in their own right. Specifically, I show for the first time that an additional year of late high school causes downstream support for Britain's Conservative party. These large effects are “local” in that they only apply to students that would not have remained in school without the reforms, although the nature of Britain's education reforms ensures that these effects apply to a large proportion of the population. Furthermore, I provide evidence suggesting that these large effects can primarily be attributed to education increasing income, which in turn induces voters to support parties offering more conservative fiscal policies. This finding raises a “catch 22” for the Labour and Liberal parties: increasing education opportunities has been a key plank in the policies of these parties, but has come at the cost of losing voters. It is thus important for future research to assess both whether this same relationship extends to other countries and the exact mechanisms that underpin the political implications of education policy.

References

Acemoglu, Daron and Joshua D. Angrist. 2000. “How Large Are Human Capital Externalities? Evidence from Compulsory Schooling Laws.” NBER Macroeconomics Annual 2000 pp. 9–59.
Acemoglu, Daron, Simon Johnson and James A. Robinson. 2001. “The Colonial Origins of Comparative Development: An Empirical Investigation.” American Economic Review 91(5):1369–1401.
Angrist, Joshua D. and Alan B. Krueger. 1992. “The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples.” Journal of the American Statistical Association 87(418):328–336.
Angrist, Joshua D. and Alan B. Krueger. 1995. “Split-sample instrumental variables estimates of the return to schooling.” Journal of Business and Economic Statistics 13(2):225–235.
Angrist, Joshua D. and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average Causal Effects in Models With Variable Treatment Intensity.” Journal of the American Statistical Association 90(430):431–442.
Angrist, Joshua D., Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91(June):444–455.
Angrist, Joshua D. and Jörn-Steffen Pischke. 2008. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press.
Aronow, Peter M. and Allison Carnegie. 2013. “Beyond LATE: Estimation of the average treatment effect with an instrumental variable.” Political Analysis 21(4):492–506.
Becker, Gary S. 1993. Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education. University of Chicago Press.
Blundell, Richard, Xiaohong Chen and Dennis Kristensen. 2007. “Semi-Nonparametric IV Estimation of Shape-Invariant Engel Curves.” Econometrica 75(6):1613–1669.
Clark, Damon and Heather Royer. 2013. “The Effect of Education on Adult Mortality and Health: Evidence from Britain.” American Economic Review 103(6):2087–2120.
Conley, Timothy G., Christian B. Hansen and Peter E. Rossi. 2012. “Plausibly Exogenous.” Review of Economics and Statistics 94(1):260–272.
Dee, Thomas S. 2004. “Are there civic returns to education?” Journal of Public Economics 88:1697–1720.
Devereux, Paul J. and Robert A. Hart. 2010. “Forced to be Rich? Returns to Compulsory Schooling in Britain.” Economic Journal 120:1345–1364.
Dunning, Thad. 2008. “Model specification in instrumental-variables regression.” Political Analysis 16(3):290–302.
Franklin, Charles H. 1989. “Estimation across data sets: two-stage auxiliary instrumental variables estimation (2SAIV).” Political Analysis 1(1):1–23.
Gelman, Andrew, David Park, Boris Shor, Joseph Bafumi and Jeronimo Cortina. 2010. Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do. Princeton, NJ: Princeton University Press.
Gerber, Alan. 1998. “Estimating the Effect of Campaign Spending on Senate Election Outcomes Using Instrumental Variables.” American Political Science Review 92(2):401–411.
Gerber, Alan S. and Donald P. Green. 2000. “The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment.” American Political Science Review 94(3):653–663.
Gerber, Alan S. and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. W.W. Norton.
Gerber, Alan S., Gregory A. Huber and Ebonya Washington. 2010. “Party affiliation, partisanship, and political beliefs: A field experiment.” American Political Science Review 104(4):720–744.
Gillard, Derek. 2011. “Education in England: A Brief History.” Web link.
Goldin, Claudia D. and Lawrence F. Katz. 2008. The Race Between Education and Technology. Cambridge, MA: Harvard University Press.
Grenet, Julien. 2013. “Is Extending Compulsory Schooling Alone Enough to Raise Earnings? Evidence from French and British Compulsory Schooling Laws.” Scandinavian Journal of Economics 115(1):176–210.
Hahn, Jinyong, Petra Todd and Wilbert Van der Klaauw. 2001. “Identification and estimation of treatment effects with a regression-discontinuity design.” Econometrica 69(1):201–209.
Harmon, Colm and Ian Walker. 1995. “Estimates of the Economic Return to Schooling for the United Kingdom.” American Economic Review 85(5):1278–1286.
Heath, Anthony, Roger Jowell, John Curtice, Julia Field and Clarissa Levine. 1985. How Britain Votes. Pergamon Press Oxford.
Inglehart, Ronald. 1981. “Post-Materialism in an Environment of Insecurity.” American Political Science Review 75(4):880–900.
Inoue, Atsushi and Gary Solon. 2005. “Two-Sample Instrumental Variables Estimators.”
Inoue, Atsushi and Gary Solon. 2010. “Two-Sample Instrumental Variables Estimators.” Review of Economics and Statistics 92(3):557–561.
Kam, Cindy D. and Carl L. Palmer. 2008. “Reconsidering the Effects of Education on Political Participation.” Journal of Politics 70(3):612–631.
Lenz, Gabriel S. 2012. Follow the Leader? How Voters Respond to Politicians’ Policies and Performance. University of Chicago Press.
McCrary, Justin. 2008. “Manipulation of the running variable in the regression discontinuity design: A density test.” Journal of Econometrics 142(2):698–714.
Meltzer, Allan H. and Scott F. Richard. 1981. “A rational theory of the size of government.” Journal of Political Economy 89:914–927.
Miguel, Edward, Shanker Satyanath and Ernest Sergenti. 2004. “Economic shocks and civil conflict: An instrumental variables approach.” Journal of Political Economy 112(4):725–753.
Milligan, Kevin, Enrico Moretti and Philip Oreopoulos. 2004. “Does education improve citizenship? Evidence from the United States and the United Kingdom.” Journal of Public Economics 88:1667–1695.
Mincer, Jacob. 1974. Schooling, Experience, and Earnings. New York: Columbia University Press.
Murphy, Kevin M. and Robert H. Topel. 1985. “Estimation and Inference in Two-Step Econometric Models.” Journal of Business and Economic Statistics 20(1):88–97.
Newey, Whitney K. and James L. Powell. 2003. “Instrumental variable estimation of nonparametric models.” Econometrica 71(5):1565–1578.
Nie, Norman H., Jane Junn and Kenneth Stehlik-Barry. 1996. Education and Democratic Citizenship in America. University of Chicago Press.
Office for National Statistics. 2013. “Vital Statistics: Population and Health Reference Tables - Annual Time Series Data.” Web link.
Oreopoulos, Philip. 2006. “Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter.” American Economic Review 96(1):152–175.
Pierskalla, Jan H. and Florian M. Hollenbach. 2013. “Technology and collective action: The effect of cell phone coverage on political violence in Africa.” American Political Science Review 107(2):207–224.
Romer, Thomas. 1975. “Individual welfare, majority voting, and the properties of a linear income tax.” Journal of Public Economics 4(2):163–185.
Schoon, Ingrid, Helen Cheng, Catharine R. Gale, G. David Batty and Ian J. Deary. 2010. “Social status, cognitive ability, and educational attainment as predictors of liberal social attitudes and political trust.” Intelligence 38(1):144–150.
Sondheimer, Rachel M. and Donald P. Green. 2010. “Using Experiments to Estimate the Effects of Education on Voter Turnout.” American Journal of Political Science 41(1):178–189.
Sovey, Allison J. and Donald P. Green. 2011. “Instrumental variables estimation in political science: A readers’ guide.” American Journal of Political Science 55(1):188–200.
Spence, Michael. 1973. “Job market signaling.” Quarterly Journal of Economics 87(3):355–374.
Staiger, Douglas and James H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica 65(3):557–586.
Thomassen, Jacques J.A. 2005. The European Voter: A Comparative Study of Modern Democracies. Oxford: Oxford University Press.
Verba, Sidney, Kay Lehman Schlozman and Henry E. Brady. 1995. Voice and Equality: Civic Voluntarism in American Politics. Cambridge, MA: Harvard University Press.
Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. “Raising the participation age in historical perspective: policy learning from the past?” British Educational Research Journal 39(4):635–653.
Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. Secondary Education and the Raising of the School Leaving Age: Coming of Age? New York: Palgrave MacMillan.

Online Appendix

Proofs

Proof of Proposition 1. Assumption A1 allows us to write the potential outcomes of $Y_i$ as $Y_i(Z_i, T_i)$ and potential outcomes of $T_i$ as $T_i(Z_i)$. Without loss of generality, take the case where A4 holds, and $T_i(1) - T_i(0) \geq 0$. The reduced form can be written as:

\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]
&= E[Y_i(1, T_i(1)) - Y_i(0, T_i(0))] \\
&= E\left[ \sum_{t=2}^{J} I(T_i(1) \geq t)\,[Y_i(1,t) - Y_i(1,t-1)] - \sum_{t=2}^{J} I(T_i(0) \geq t)\,[Y_i(0,t) - Y_i(0,t-1)] \right] \\
&= E\left[ \sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)]\,[Y_i(t) - Y_i(t-1)] \right] \\
&= \sum_{t=2}^{J} \Pr[T_i(1) \geq t > T_i(0)]\; E[Y_i(t) - Y_i(t-1) \mid T_i(1) \geq t > T_i(0)] \\
&= \sum_{t=2}^{J} p_t \beta_t,
\end{align*}

where the first line uses random assignment (A2), the third uses A5 to re-write $Y_i(z,t) = Y_i(z',t) \equiv Y_i(t)$, and the fourth line uses A4 (which implies that $I(T_i(1) \geq t) - I(T_i(0) \geq t) = I(T_i(1) \geq t > T_i(0))$ is either 0 or 1).

In terms of potential outcomes, the coarsened treatment indicator is given by $D_{ik}(T_i(Z_i = z) = t)$; because $D_{ik}$ is assigned from $t$ to 0 or 1, $z$ cannot affect $D_{ik}$ other than through $t$. Using assumption A2, the first stage for the coarsened indicator $D_{ik}$ is given by:

\begin{align*}
E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]
&= E\left[ \sum_{t=2}^{J} [D_{ik}(t) - D_{ik}(t-1)]\, I(T_i(1) \geq t) - \sum_{t=2}^{J} I(T_i(0) \geq t)\,[D_{ik}(t) - D_{ik}(t-1)] \right] \\
&= E\left[ \sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)]\,[D_{ik}(t) - D_{ik}(t-1)] \right] \\
&= E\left[ I(T_i(1) \geq k) - I(T_i(0) \geq k) \right] \\
&= \Pr(T_i(1) \geq k > T_i(0)) = p_k,
\end{align*}

where the third line follows from the fact that, by definition of $D_{ik}(t)$, only $D_{ik}(k) - D_{ik}(k-1) = 1$ (all other jumps are 0), while the fourth line follows from A4 (as above). Combining the first stage with the reduced form, the Wald IV estimator of the LATE for $D_{ik}$ is:

\begin{align*}
\beta_k^{IV}
&= \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} \\
&= \frac{\sum_{t=2}^{J} p_t \beta_t}{p_k} \\
&= \beta_k + \frac{\sum_{t=2, t \neq k}^{J} p_t \beta_t}{p_k},
\end{align*}

where the final line follows from simple algebra, and the final term is the additional bias of the estimator (beyond finite sample bias). Assumption A3 ensures that $\beta_k^{IV}$ is well-defined, by preventing the denominator from equalling zero.

It is immediately clear that the additional bias term is positive whenever $\sum_{t=2, t \neq k}^{J} p_t \beta_t > 0$, given $p_t \geq 0$. $\beta_t > 0$ for all $t$ is thus a sufficient condition for positive bias; the bias is thus upward in magnitude if $\beta_k > 0$. Similarly for $\sum_{t=2, t \neq k}^{J} p_t \beta_t < 0$ and $\beta_k < 0$. Consequently, $\mathrm{sign}(\beta_k) = \mathrm{sign}(\beta_t)$, $\forall t$, implies $|\beta_k| \leq |\beta_k^{IV}|$.
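The bias expression can be checked numerically. The sketch below is purely illustrative: the complier types, their population shares, the use of 9 to 11 years as treatment levels, and the constant per-level effect of 0.1 are hypothetical values chosen for the example, not quantities estimated in the paper. Because the instrument is randomly assigned, population means of potential outcomes stand in for the conditional expectations.

```python
import numpy as np

# Hypothetical complier types: (T_i(0), T_i(1), population share).
# Monotonicity (A4) holds because T_i(1) >= T_i(0) for every type.
types = np.array([
    [9.0, 9.0, 0.15],
    [9.0, 10.0, 0.30],
    [9.0, 11.0, 0.05],
    [10.0, 10.0, 0.30],
    [11.0, 11.0, 0.20],
])
t0, t1, share = types.T

beta = 0.1         # assumed effect of each additional level, identical for everyone
k = 11             # coarsening threshold: D_ik = 1(T_i >= k)
levels = (10, 11)  # levels t at which Pr[T_i(1) >= t > T_i(0)] can be positive


def expect(values):
    """Population mean of `values` across the hypothetical types."""
    return float(np.sum(share * values))


# Reduced form under random assignment, with Y_i(t) = beta * t.
reduced_form = expect(beta * t1) - expect(beta * t0)

# First stages for the coarsened indicator and for the linear treatment.
first_stage_coarse = expect(t1 >= k) - expect(t0 >= k)
first_stage_linear = expect(t1) - expect(t0)

wald_coarse = reduced_form / first_stage_coarse
wald_linear = reduced_form / first_stage_linear

# p_t = Pr[T_i(1) >= t > T_i(0)] and the bias term sum_{t != k} p_t * beta_t / p_k.
p = {t: expect((t1 >= t) & (t > t0)) for t in levels}
bias = sum(p[t] * beta for t in levels if t != k) / p[k]

print(f"Wald, coarsened treatment: {wald_coarse:.3f}")
print(f"beta_k + bias term:        {beta + bias:.3f}")
print(f"Wald, linear treatment:    {wald_linear:.3f}")
```

In this example most compliers move from 9 to 10 years and never cross the threshold, so the coarsened first stage registers only a small part of the reduced form and the coarsened Wald ratio is several times the per-level effect, while the linear treatment recovers it.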

Consistency requires that $\sum_{t=2, t \neq k}^{J} p_t \beta_t = 0$. I now show that A5* is a sufficient condition. Both A5 and A5* entail that $[Y_i(z,t) - Y_i(z,t-1)] = [Y_i(t) - Y_i(t-1)]$. Furthermore, the definition of A5* entails that $[Y_i(t) - Y_i(t-1)] = 0$ for all $t \neq k$. Consequently,

\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]
&= E\left[ \sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)]\,[Y_i(t) - Y_i(t-1)] \right] \\
&= E\left[ \sum_{t=2, t \neq k}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)]\,[Y_i(t) - Y_i(t-1)] \right]
 + E\left[ [I(T_i(1) \geq k) - I(T_i(0) \geq k)]\,[Y_i(k) - Y_i(k-1)] \right] \\
&= 0 + p_k \beta_k,
\end{align*}

where the first line follows from A5 and the third line requires A5*. Under A5*, it is thus clear that the Wald estimator then yields:

\begin{align*}
\beta_k^{IV}
&= \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} \\
&= \frac{p_k \beta_k}{p_k} \\
&= \beta_k.
\end{align*}

Therefore, $\beta_k^{IV}$ is a consistent estimator under A1, A2, A3, A4 and A5* (which implies A5).

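Proof of Proposition 2. Note, using the first part of the proof of Proposition 1, that

\[
\beta_{LAPTE}^{W,J} = \frac{\sum_{t=2}^{J} p_t \beta_t^{J}}{\sum_{t=2}^{J} p_t} = \tau
\quad \text{and} \quad
\beta_{LAPTE}^{W,\alpha J} = \frac{\sum_{t=2}^{\alpha J} p_t \beta_t^{\alpha J}}{\sum_{t=2}^{\alpha J} p_t} = \tau / \alpha,
\]

where the linearity of the causal effect at each intensity interval implies $\alpha \beta_t^{\alpha J} = \beta_t^{J}$. The result follows.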

Two-sample 2SLS estimation The goal is estimate the following system of IV equations:

Yi = Ti βT + Wi β−T + ui = Xi β + ui

(11)

Ti = Zi Π + εi ,

(12)

where $X_i$ includes exogenous covariates $W_i$ and the treatment variable(s) $T_i$, while $Z_i$ includes $W_i$ and $q$ excluded instruments. Identification requires that only $p \le q$ treatment variables can be instrumented for.

Two methods have been proposed for IV estimation with two samples. Angrist and Krueger (1992) propose a Wald-style estimator where the reduced form estimates are divided by their first stage counterparts, which can be generalized to the overidentified case where the instruments outnumber the endogenous variables. Inoue and Solon (2010) show that this estimator is less efficient than the 2SLS counterpart, first proposed by Franklin (1989), that will be used in the empirical application here. The advantage of the 2SLS version is that it corrects for finite-sample differences between the two samples.41 Furthermore, its extension to multiple instruments and multiple endogenous variables is straightforward, and both are important in many empirical applications, including the analysis in this paper.

In matrix form (stacking over $i$ in each sample), the two-sample 2SLS (TS2SLS) estimator is:

$$\hat{\beta}^{TS2SLS} = (\hat{X}_1' \hat{X}_1)^{-1} \hat{X}_1' Y_1, \tag{13}$$

where $\hat{X}_1 = (\hat{T}_1, W_1)$ is the matrix of predicted values in sample 1. The OLS regression coefficients generating $\hat{T}_1$ are based on $p$ first stage regressions estimated in sample 2:

$$\hat{X}_1 = Z_1 \hat{\Pi} = Z_1 (Z_2' Z_2)^{-1} Z_2' X_2. \tag{14}$$

41 Inoue and Solon (2005) show that the TS2SLS estimator remains consistent even when differences in the sampling rates vary with some of the instrumental variables.
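Equations (13) and (14) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's replication code: the function name `ts2sls`, the helper `simulate`, and the simulated data-generating process (a true effect of 0.5 and an intercept of 1) are assumptions introduced here.

```python
import numpy as np

def ts2sls(y1, Z1, Z2, X2):
    """Two-sample 2SLS point estimate (hypothetical helper).

    y1 : (n1,)   outcome, observed in sample 1 only
    Z1 : (n1, k) excluded instruments plus exogenous covariates, sample 1
    Z2 : (n2, k) the same columns, sample 2
    X2 : (n2, m) treatment(s) plus exogenous covariates, sample 2
    """
    # First stage in sample 2: Pi_hat = (Z2'Z2)^{-1} Z2'X2
    Pi_hat, *_ = np.linalg.lstsq(Z2, X2, rcond=None)
    # Cross-sample fitted values, equation (14): X1_hat = Z1 Pi_hat
    X1_hat = Z1 @ Pi_hat
    # Second stage in sample 1, equation (13)
    beta_hat, *_ = np.linalg.lstsq(X1_hat, y1, rcond=None)
    return beta_hat, X1_hat, Pi_hat

def simulate(n, rng):
    # Assumed DGP: instrument z, unobserved confounder c, true effect 0.5
    z = rng.normal(size=n)
    c = rng.normal(size=n)
    t = 0.8 * z + c + rng.normal(size=n)
    y = 1.0 + 0.5 * t - c + rng.normal(size=n)
    return y, t, z

rng = np.random.default_rng(0)
y1, _, z1 = simulate(20_000, rng)   # sample 1: outcome and instrument
_, t2, z2 = simulate(20_000, rng)   # sample 2: treatment and instrument

Z1 = np.column_stack([z1, np.ones_like(z1)])
Z2 = np.column_stack([z2, np.ones_like(z2)])
X2 = np.column_stack([t2, np.ones_like(t2)])   # (T, W) with W a constant

beta_hat, X1_hat, Pi_hat = ts2sls(y1, Z1, Z2, X2)
print(beta_hat)   # first entry: treatment coefficient; second: intercept
```

Because the constant appears in both `Z2` and `X2`, its first stage projection reproduces it exactly, so the columns of `X1_hat` correspond to $(\hat{T}_1, W_1)$. The standard errors from the second-stage regression are not valid as they stand, for the reasons discussed in the text.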

The following assumptions are required to ensure the consistency of the TS2SLS estimator (see Franklin 1989; Inoue and Solon 2010):

1. Random sampling from the same population: $\{Y_{1i}, Z_{1i}\}_{i=1}^{n_1}$ and $\{T_{2i}, Z_{2i}\}_{i=1}^{n_2}$ are independently and identically distributed draws of size $n_1$ and $n_2$ from the same population with finite second moments.
2. Instrument exogeneity: $E[Z_{1i}' \varepsilon_{1i}] = E[Z_{2i}' \varepsilon_{2i}] = 0$.
3. Exclusion restriction: $E[Z_{1i}' u_{1i}] = 0$.
4. Rank conditions: (a) $Z_{1i}' Z_{1i}$ and $Z_{2i}' Z_{2i}$ have full rank, (b) $X_{1i}' Z_{1i}$ and $X_{2i}' Z_{2i}$ have full rank.
5. Interchangeable sample moments: (a) $E[Z_{1i}' X_{1i}] = E[Z_{2i}' X_{2i}]$, (b) $E[Z_{1i}' Z_{1i}] = E[Z_{2i}' Z_{2i}]$.

Assumption 1 says that the samples must draw from the same population. Assumption 2 requires that the instrument be exogenous in the first stage. Assumption 3 is implied by the exclusion restriction (assumption A5) in the main text, but is written in terms of expectations. Assumption 4 is a standard rank condition required for matrix invertibility. Assumption 5 requires that crucial sample moments can be interchanged, thereby permitting substitution between samples. As $n_1$ and $n_2$ converge to the population size, Assumption 5 necessarily holds. Franklin (1989) proves the $n_1$-consistency of the TS2SLS estimator.42

However, calculating the TS2SLS standard errors is not obvious. Calculating the standard errors from a regression of $Y_1$ on $\hat{X}_1$ neglects the uncertainty in the first stage, in addition to distributional differences between the first stage and reduced form samples.

42 Angrist and Krueger's (1995) proof rests on showing that the TS2SLS estimator converges to the consistent Angrist and Krueger (1992) estimator, because of Assumption 5.

The Murphy and Topel (1985) two-stage framework for understanding "generated regressors" (accounting for the uncertainty introduced where a variable is estimated as a proxy to enter a separate regression) incorporates such estimation uncertainty.43 Proposition 3 derives the homoskedastic and cluster-robust variance matrices, of which the robust variance is the particular case of $G_1 = n_1$ and $G_2 = n_2$ clusters. ($i$ is dropped to facilitate exposition.)

Proposition 3. The asymptotic variance of the TS2SLS estimator, $V[\hat{\beta}^{TS2SLS}]$, is

$$\left( \sigma_u^2 + \frac{n_1}{n_2} \hat{\beta}_S^{TS2SLS\prime} \, \Omega \, \hat{\beta}_S^{TS2SLS} \right) E[\hat{X}_1' \hat{X}_1]^{-1}, \tag{15}$$

when the reduced form squared error $\sigma_u^2 = E[u^2 \mid \hat{X}_1]$ and the error covariances of the $p$ first stage regressions,

$$\Omega = E[\varepsilon' \varepsilon \mid \hat{X}_1] = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1,p} \\ \vdots & \ddots & \vdots \\ \sigma_{p,1} & \cdots & \sigma_p^2 \end{pmatrix},$$

are homoskedastic; when the reduced form and first stage errors are grouped into $G_1$ and $G_2$ clusters respectively, the cluster-robust variance is

$$E[\hat{X}_1' \hat{X}_1]^{-1} \left( V[\hat{\beta}^{TS2SLS}] + \frac{n_1}{n_2} E[\hat{X}_1' (\hat{\beta}_S^{TS2SLS\prime} \otimes Z_1)] \, V(\hat{\Pi}) \, E[(\hat{\beta}_S^{TS2SLS\prime} \otimes Z_1)' \hat{X}_1] \right) E[\hat{X}_1' \hat{X}_1]^{-1}, \tag{16}$$

where $\hat{\beta}_S^{TS2SLS}$ is the vector of coefficients on the $p$ endogenous variables, the uncorrected TS2SLS variance is given by

$$V[\hat{\beta}^{TS2SLS}] = \frac{G_1}{G_1 - 1} \sum_{g=1}^{G_1} E[\hat{X}_{1g}' \hat{u}_{1g} \hat{u}_{1g}' \hat{X}_{1g}],$$

and the variances from the $p$ first-stage regressions are

$$V(\hat{\Pi}) = \frac{G_2}{G_2 - 1} \, \Phi \otimes E[Z_2' Z_2]^{-1},$$

where

$$\Phi = \begin{pmatrix} E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2g1} \hat{\varepsilon}_{2g1}' Z_{2g}] & \cdots & E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2g1} \hat{\varepsilon}_{2gp}' Z_{2g}] \\ \vdots & \ddots & \vdots \\ E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2gp} \hat{\varepsilon}_{2g1}' Z_{2g}] & \cdots & E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2gp} \hat{\varepsilon}_{2gp}' Z_{2g}] \end{pmatrix}. \tag{17}$$

43 Inoue and Solon (2010) acknowledge this approach but derive homoskedastic and heteroskedastic variance matrices in an alternative way, and do not provide a cluster-robust variance estimate.

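Proof: Start by separating $\hat{X}$ into its endogenous and exogenous components,

$$Y_{i1} = X_{i1} \beta_{-S} + T_{i1} \beta_S + u_i = X_{i1} \beta_{-S} + \hat{T}_{i1} \beta_S + [T_{i1} - \hat{T}_{i1}] \beta_S + u_i, \tag{18}$$

where $\hat{T}_{i1} = Z_{i1} \hat{\Pi} = Z_{i1} (Z_2' Z_2)^{-1} Z_2' T_2$ is the predicted value of the treatment using the first stage estimates, and $T_{i1}$ is the true and unobserved treatment in sample 1. An OLS regression would yield:

$$\sqrt{n_1} \begin{pmatrix} \hat{\beta}_{-T} - \beta_{-S} \\ \hat{\beta}_S - \beta_S \end{pmatrix} = \left( \frac{1}{n_1} \hat{X}_1' \hat{X}_1 \right)^{-1} \frac{1}{\sqrt{n_1}} \hat{X}_1' u_1 + \left( \frac{1}{n_1} \hat{X}_1' \hat{X}_1 \right)^{-1} \frac{1}{\sqrt{n_1}} \hat{X}_1' [T_{i1} - \hat{T}_{i1}] \beta_S, \tag{19}$$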

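where subscripts $i$ and superscripts $TS2SLS$ are omitted to save space. Using the expansion result in Murphy and Topel (1985: 374) yields:

$$\sqrt{n_1} (\hat{\beta} - \beta) \equiv \sqrt{n_1} \begin{pmatrix} \hat{\beta}_{-T} - \beta_{-S} \\ \hat{\beta}_S - \beta_S \end{pmatrix} \overset{a}{=} \left( \frac{1}{n_1} \hat{X}_1' \hat{X}_1 \right)^{-1} \frac{1}{\sqrt{n_1}} \hat{X}_1' u_1 + \left( \frac{1}{n_1} \hat{X}_1' \hat{X}_1 \right)^{-1} \left( \frac{n_1}{n_2} \right)^{1/2} \frac{1}{n_1} \hat{X}_1' (\hat{\beta}_T' \otimes Z_1) \sqrt{n_2} (\hat{\Pi} - \Pi), \tag{20}$$

where $(\hat{\beta}_T' \otimes Z_1)$ is the matrix defined in equation (12) of Murphy and Topel (1985).

Let $\hat{\Pi}$ be a consistent estimator of the first stage for the endogenous variables, such that $\sqrt{n_2} (\hat{\Pi} - \Pi) \overset{a}{\sim} N(0, V(\Pi))$. Using our consistent first stage estimate, the asymptotic variance is therefore given by:

$$V(\hat{\beta} - \beta) = E[\hat{X}_1' \hat{X}_1]^{-1} \left( V[\beta] + \frac{n_1}{n_2} E[\hat{X}_1' (\hat{\beta}_T' \otimes Z_1)] \, V[\Pi] \, E[(\hat{\beta}_T' \otimes Z_1)' \hat{X}_1] \right) E[\hat{X}_1' \hat{X}_1]^{-1}, \tag{21}$$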

where $V[\beta]$ is the variance of the naive TS2SLS estimator. (Note that $E[\hat{X}_1' u_1] = 0$, in conjunction with a consistent first stage, implies the consistency of the estimator.)

This establishes the general asymptotic variance formula in Proposition 3. We now apply the homoskedastic and cluster-robust error structures:

1) Homoskedastic errors. Under homoskedasticity, the naive variance from the TS2SLS regression is simply $\sigma_u^2 (\hat{X}_1' \hat{X}_1)^{-1}$. To correct for the first stage estimation, we have:

$$\hat{X}_1' (\hat{\beta}_T' \otimes Z_1) \hat{V}(\hat{\Pi}) (\hat{\beta}_T' \otimes Z_1)' \hat{X}_1 = \hat{X}_1' (\hat{\beta}_T' \otimes Z_1) (\Omega \otimes (Z_1' Z_1)^{-1}) (\hat{\beta}_T' \otimes Z_1)' \hat{X}_1 \tag{22}$$

$$= \hat{X}_1' (\hat{\beta}_T' \Omega \hat{\beta}_T \otimes Z_1 (Z_1' Z_1)^{-1} Z_1') \hat{X}_1 \tag{23}$$

$$= \hat{\beta}_T' \Omega \hat{\beta}_T (\hat{X}_1' \hat{X}_1), \tag{24}$$

where the first line uses the definitions of homoskedasticity given in the proposition, the second line applies the mixed product property of Kronecker products, and the third line exploits $Z_1 (Z_1' Z_1)^{-1} Z_1' \hat{X}_1 = \hat{X}_1$ (because all exogenous variables are contained in both $\hat{X}_1$ and $Z_1$) and the fact that $\hat{\beta}_T' \Omega \hat{\beta}_T$ is a scalar. Substituting into the general variance matrix yields the homoskedastic variance formula in Proposition 3.

2) Clustered errors. In the clustered case, we simply let

$$V(\hat{\Pi}) = \frac{G_2}{G_2 - 1} \, \Phi \otimes E[Z_2' Z_2]^{-1}.$$
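The Kronecker-product simplification in the homoskedastic case, running from equation (22) to equation (24), can be checked numerically. The sketch below uses arbitrary simulated matrices; the dimensions and variable names are assumptions chosen only for illustration, and the fitted values are constructed to lie in the column space of $Z_1$, as the derivation requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 50, 4, 2                 # hypothetical: observations, columns of Z1, endogenous variables

Z1 = rng.normal(size=(n, k))
X1_hat = Z1 @ rng.normal(size=(k, 3))   # fitted values lie in the column space of Z1
beta_T = rng.normal(size=(p, 1))        # coefficients on the endogenous variables
A = rng.normal(size=(p, p))
Omega = A @ A.T                         # symmetric positive semi-definite error covariance

ZtZ_inv = np.linalg.inv(Z1.T @ Z1)
B = np.kron(beta_T.T, Z1)               # (beta_T' kron Z1), shape (n, p*k)

# Right-hand side of equation (22)
lhs = X1_hat.T @ B @ np.kron(Omega, ZtZ_inv) @ B.T @ X1_hat

# Equation (24): scalar beta_T' Omega beta_T times X1_hat' X1_hat
scalar = (beta_T.T @ Omega @ beta_T).item()
rhs = scalar * (X1_hat.T @ X1_hat)

print(np.allclose(lhs, rhs))
```

The agreement relies on the projection $Z_1 (Z_1' Z_1)^{-1} Z_1'$ leaving `X1_hat` unchanged; if `X1_hat` contained a column outside the span of `Z1`, the two sides would generally differ.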



Standard errors are given by the square roots of the diagonal elements of $V[\hat{\beta}^{TS2SLS}]/n_1$. Using the analogy principle, expectations can be replaced by sample moments.

In the case of a single endogenous regressor, $V(\hat{\Pi})$ is simply the standard cluster-robust variance matrix for the first stage:

$$E[Z_2' Z_2]^{-1} \left[ \frac{G_2}{G_2 - 1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2g} \hat{\varepsilon}_{2g}' Z_{2g}] \right] E[Z_2' Z_2]^{-1}. \tag{25}$$

When there are multiple endogenous variables, the first stage estimates may be correlated across models. This requires the more complex formulation in Proposition 3.
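As a rough illustration of how the homoskedastic formula (15) translates into standard errors, the sketch below plugs sample moments into the expression. It assumes that $E[\hat{X}_1' \hat{X}_1]$ is estimated by $\hat{X}_1' \hat{X}_1 / n_1$, so that dividing by $n_1$ leaves the scalar correction times $(\hat{X}_1' \hat{X}_1)^{-1}$. The function name and all numerical inputs are hypothetical, and estimates of $\sigma_u^2$ and $\Omega$ are taken as given rather than computed.

```python
import numpy as np

def ts2sls_homoskedastic_se(X1_hat, beta_S, sigma_u2, Omega, n2):
    """Standard errors implied by equation (15) (hypothetical helper).

    X1_hat   : (n1, m) second-stage regressors (fitted treatments and covariates)
    beta_S   : (p,)    estimated coefficients on the endogenous variables
    sigma_u2 : float   estimate of the reduced form error variance (taken as given)
    Omega    : (p, p)  estimate of the first stage error covariance (taken as given)
    n2       : int     size of the first stage sample
    """
    n1 = X1_hat.shape[0]
    beta_S = np.atleast_1d(np.asarray(beta_S, dtype=float))
    Omega = np.atleast_2d(np.asarray(Omega, dtype=float))
    # Scalar in parentheses in equation (15)
    scale = sigma_u2 + (n1 / n2) * float(beta_S @ Omega @ beta_S)
    # V / n1 with E[X'X] replaced by X'X / n1 reduces to scale * (X'X)^{-1}
    cov = scale * np.linalg.inv(X1_hat.T @ X1_hat)
    return np.sqrt(np.diag(cov))

# Illustrative inputs only (not estimates from the paper)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
se_naive = ts2sls_homoskedastic_se(X_demo, [0.5], 1.0, [[0.0]], n2=100)
se_corrected = ts2sls_homoskedastic_se(X_demo, [0.5], 1.0, [[2.0]], n2=100)
print(se_naive, se_corrected)
```

Setting `Omega` to zero switches off the first stage correction, which makes the comparison with the naive second-stage standard errors explicit. The cluster-robust case in equation (16) would need the full sandwich form and is not covered by this sketch.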


Brief history of CSL reforms in Britain

This brief history borrows from Gillard (2011), Woodin, McCulloch and Cowan (2013) and the relevant legislative documents. There have been three landmark pieces of legislation in the area of CSLs in the twentieth century.

First, David Lloyd George's Liberal government moved on the recommendations of the Lewis Report 1916 to raise the school leaving age from 13 to 14 as part of post-WW1 reforms under the Education Act 1918 (or Fisher Act). The Act was ambitious in that it also aimed to institutionalize schooling until 18 and expand higher education, as well as abolish fees at state-run schools (although this did not fully occur until 1947 in secondary education) and establish a national schooling infrastructure. However, the change in the school leaving age was not implemented by Lloyd George until the Education Act 1921, coming into effect in 1922. Although the 1918 Act had intended for further increases in the leaving age, these did not transpire for financial reasons despite repeated attempts in the 1920s and 1930s (Oreopoulos 2006). In practice, this Act had relatively little effect on school enrollment.

Second, as part of the Beveridge reforms, Churchill's wartime coalition government passed the Education Act 1944 (or Butler Act), which increased the school leaving age from 14 to 15 in England and Wales;44 the Education (Scotland) Act 1945 cemented the same reform in Scotland. No such reform occurred until 1957 in Northern Ireland, which is not included in the BES samples. The new leaving age did not come into force until 1st April 1947, giving the education system time to expand its operations to accommodate the changes in the system (as well as many other new provisions under the 116-page monolith).45 By raising the leaving age, the first year of secondary school became compulsory, and thus the state fully subsidized secondary education for all pupils for the first time. As shown in the main paper, this substantially increased enrollment among the affected cohorts.

44 The Education Act 1936 had determined that the age should be raised in 1939, but this did not occur because of the onset of WW2.

45 The lack of teachers was a serious concern, requiring an emergency training program in 1945 to address the lack of capacity.

The Education Act 1944 also provided for raising the leaving age to 16 once practical. Consequently, the leaving age could be raised to 16 by an Order of Council.46 Conservative Prime Minister Harold Macmillan presided over plans to raise the school leaving age to 16 in the Education Act 1962, which ultimately fixed spring and summer leaving dates, although it was Conservative Edward Heath who finalized the update to the current system under Statutory Instrument 444 (1972). The new rule, which had been overseen by Margaret Thatcher and heavily pushed by the Crowther Report 1959, was implemented for the academic year starting 1st September 1972 in England and Wales. Statutory Instrument 59 (1972) raised the leaving age more flexibly in Scotland, allowing local authorities, who were very concerned about teacher shortages (especially in Strathclyde/Glasgow), to permit part-time schooling and early leaving in the summer terms. Consequently, the 1972 reform was relatively weak for many Scottish students. The reform in Scotland was not fully implemented until the Education Act 1976. The Education (School-leaving Dates) Act 1976 introduced slightly more subtle leaving age rules, which are not utilized in this paper as they require monthly birth data (as in Clark and Royer 2013). In England, Wales and Scotland these reforms again raised education participation rates, although less dramatically than the 1944 Act (Milligan, Moretti and Oreopoulos 2004). Given that the 1972 change in Scotland was not so clearly binding, it is excluded from the RD analysis. Since it still serves as encouragement, it is retained in the 2SLS analyses. Although the 1947 and especially 1972 reforms were implemented by Conservatives, Labour had consistently pushed for the increases in the leaving age (Gillard 2011). This suggests that the reforms were not politically motivated.

In practice, the reforms were slow to occur, despite persistent campaigning, because they posed severe infrastructural challenges and entailed considerable cost (Woodin, McCulloch and Cowan 2013). In 2008, Labour Prime Minister Gordon Brown passed the Education and Skills Act 2008. This requires that by 2013 young people must remain in at least part-time education or training until age 17; by 2015, this rises to 18. Although regional implementation will vary, the Act applies across the UK.

46 An Order of Council does not require approval like an Act. It may be laid before the House of Commons and is accepted unless a resolution is passed against it.

Variable definitions

• Conservative/Labour/Liberal vote. Indicator coded one for respondents identifying as having voted for the Conservative/Labour/Liberal party at the last general election. Only respondents who refused to respond, did not answer or did not vote were excluded. Datasets: BES and BSAS.

• Conservative/Labour/Liberal partisan. Indicator for identifying as a Conservative/Labour/Liberal. The indicator is based on the following questions: "Generally speaking, do you think of yourself as Conservative, Labour, Liberal, ..." (BES); "Generally speaking, do you think of yourself as a supporter of any one political party?" (BSAS). Although a follow-up occurs if the respondent answers "none" or "don't know", this is treated as a zero in this analysis. Note that Liberal party is used as a catch-all to include the Liberal Party, the Social Democrats in 1987 and the subsequent merged Liberal Democrats. Only respondents who refused to respond were excluded. Datasets: BES and BSAS.

• High school. Indicator coded one for respondents who answered that their highest qualification was completing high school or higher. Where possible, respondents who left high school at age 16 were also coded as having completed high school. Dataset: BSAS.

• Schooling. Years of completed schooling is calculated as the age at which the respondent left full-time education minus five (the age at which students start formal schooling). Years of schooling is top-coded at 13 years to ensure comparability and focus on state-provided education. Indicators for 10 and 11 years of schooling are defined according to this measure. Dataset: LFS.

• Birth year. Birth year is estimated by subtracting age at the date of the survey from the year in which the survey was conducted. I then add 14 for year aged 14. I also add 14 for year aged 15 because the legislation affected only those after September 1972 (compared with April 1947, which is pertinent for year aged 14). Datasets: BES, BSAS and LFS.

• CSLs. Indicators for CSL=15 and CSL=16; the residual is