WPS5601

Policy Research Working Paper 5601

A Practical Comparison of the Bivariate Probit and Linear IV Estimators

Richard C. Chiburis
Jishnu Das
Michael Lokshin

The World Bank
Development Research Group
Poverty and Inequality Team and Human Development and Public Services Team
March 2011

Abstract

This paper presents asymptotic theory and Monte-Carlo simulations comparing maximum-likelihood bivariate probit and linear instrumental variables estimators of treatment effects in models with a binary endogenous treatment and binary outcome. The three main contributions of the paper are (a) clarifying the relationship between the Average Treatment Effect obtained in the bivariate probit model and the Local Average Treatment Effect estimated through linear IV; (b) comparing the mean-square error and the actual size and power of tests based on these estimators across a wide range of parameter values relative to the existing literature; and (c) assessing the performance of misspecification tests for bivariate probit models. The authors recommend two changes to common practices: bootstrapped confidence intervals for both estimators, and a score test to check goodness of fit for the bivariate probit model.

This paper is a product of the Poverty and Inequality Team, and Human Development and Public Services Team; Development Research Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at jdas1@worldbank.org and mlokshin@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished.
The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

A Practical Comparison of the Bivariate Probit and Linear IV Estimators

Richard C. Chiburis, University of Texas at Austin
Jishnu Das, Development Research Group, World Bank
Michael Lokshin, Development Research Group, World Bank

1 Introduction

This paper examines estimation issues in empirical models with binary regressors and outcome variables. A motivating example is the effect of private schooling on graduation rates. Here the "treatment" (attending a private school) and the "outcome" (whether or not the individual graduated) can each take one of two potential values. Comparing mean graduation rates of children in public and private schools likely yields a biased estimate of the causal effect of private schooling on graduation rates if omitted variables, such as ability, are correlated both with private school attendance and graduation rates.

There are two common approaches to estimating causal effects in such models. One approach disregards the binary structure of the outcome and treatment variables and presents linear instrumental variables (IV) estimates of the treatment effect; the second computes maximum-likelihood estimates of a bivariate probit (BP) model, which assumes that the outcome and treatment are each determined by latent linear index models with jointly normal error terms. Sometimes the two approaches can produce markedly different results.
A persistent problem in the private schooling literature, for instance, is the large difference between linear IV and BP estimates of the treatment effect. In some cases, these differ by an order of magnitude, with the linear IV estimates exhibiting larger coefficients and standard errors (Altonji, Elder, and Taber 2005).

Keeping aside the discussion on the relevance of reduced-form impacts versus structural parameters (Angrist 2001, Moffitt 2001), the existing literature sometimes offers conflicting advice on the best course of action in empirical problems of this sort. For instance, Angrist (1991, 2001) argues that the hard part of the empirical problem is finding a good instrument and that the "difficulties with endogenous variables in nonlinear limited dependent variables models are usually more apparent than real." This argument is supported by a stress on causal effects as opposed to structural parameters in these models and by Monte-Carlo simulations that argue for the robustness of the simpler linear estimator to the distribution of the error terms. On the other hand, Bhattacharya, Goldman, and McCaffrey (2006) present simulations that suggest that BP is slightly more robust than IV to non-normality of the error terms.

We show that both of these seemingly different viewpoints can be justified depending on the structure of the problem. The reconciliation is based on (a) distinguishing carefully between the Local Average Treatment Effect (LATE) estimated under linear IV and the Average Treatment Effect (ATE) estimated under the BP model and (b) extending the parameter values in Monte-Carlo simulations to cover a far wider range of model specifications relative to the existing literature.
We present asymptotic and finite-sample Monte-Carlo simulation results for an extensive range of parameter values to help decide on a practical course of action when faced with a single dataset, a reliable instrument, and (possibly) widely differing estimates of the treatment effect depending on the technique used. We focus on both the mean-square error of the estimators and on the size and power of hypothesis tests based on these estimators. We present simulations both with the BP model correctly specified and with misspecification due to non-normal error terms. Finally, we propose two straightforward additions to current practice that vastly improve the performance of the estimators and our confidence in the normality assumptions of the BP model.

Our first set of results assumes that the BP model is correctly specified. Under this assumption, when there are no covariates, BP tends to perform better than IV for smaller sample sizes (below 5,000), with mixed results for larger samples.
With a continuous covariate, the performance of BP dominates IV in all of our simulations, and BP performs especially well when the treatment probability is close to 0 or 1. For instance, when the treatment probability is 0.1, for all ranges of the outcome probabilities and even with sample sizes greater than 10,000 observations, the confidence intervals of the IV estimate remain too large for any meaningful hypothesis testing; in contrast, BP confidence intervals are much smaller.[1] Therefore, researchers should expect IV and BP coefficients to differ quite substantially when treatment probabilities are low or when sample sizes are below 5,000; linear IV estimates are particularly uninformative for hypothesis testing when treatment probabilities are low.

[1] This particular finding explains the large differences between the IV and BP confidence intervals in the motivating example above, since the percentage of children in private schools in the United States is 10 percent or lower in most samples.

Further, as pointed out by Imbens and Angrist (1994) and others, the IV estimate is consistent for the local average treatment effect (LATE) and not the overall average treatment effect (ATE), which can be recovered from the maximum-likelihood BP estimate. That the estimators are estimating different effects accounts for a finding by Angrist (1991) that in some cases, the variance of the IV estimator is lower than that of the maximum-likelihood BP ATE estimator despite the well-known efficiency of maximum likelihood.

As expected, across most parameters of our simulations, the BP estimator is not robust to misspecification of the BP model. Simulation results where the error terms exhibit excess skewness or excess kurtosis often lead to highly biased BP estimates, and tests based on BP estimates greatly overreject a true null hypothesis when the model is misspecified. Tests based on IV estimates are more robust in terms of size, but they are also less powerful.
The results presented in Bhattacharya, Goldman, and McCaffrey (2006) on the robustness of the BP estimator to non-normal error terms arise only for specific combinations of the relevant parameters, as clarified by the extensive Monte-Carlo simulations considered here.

We propose two additional steps to recover better confidence intervals and check for model misspecification in BP estimators. For both BP and IV estimators, sample sizes have to exceed 10,000 before the coverage rates of the standard Wald-type confidence intervals approach the nominal coverage rate. In general, IV confidence intervals tend to be too conservative and BP confidence intervals are not conservative enough. We show that using bootstrapped confidence intervals (relative to analytical versions) improves the coverage rate of both IV and BP estimators for all parameter values. Second, we recommend running Murphy's (2007) score test to check the goodness of fit of the BP model; our simulations suggest that the score test is fairly good at picking up misspecifications arising from excess kurtosis or skewness in the error distributions.

The remainder of the paper is structured as follows. Section 2 reviews standard asymptotic results. Section 3 discusses the data-generating process for the Monte-Carlo simulations and presents results. Section 4 concludes.

2 Asymptotic properties of IV and BP estimators

We first derive asymptotic results for the case of a single binary instrument and no covariates. The section also details the relationship between two commonly used treatment effects, the Average Treatment Effect (ATE) and the Local Average Treatment Effect (LATE).

Let $T \in \{0, 1\}$ be the endogenous treatment, and let $Y \in \{0, 1\}$ be the outcome of interest. Let $Y_1$ be an individual's potential outcome had she received the treatment ($T = 1$), and let $Y_0$ be the individual's potential outcome had she not received the treatment ($T = 0$). Let $Z \in \{0, 1\}$ be an instrument for the treatment.
Let $T_1$ be an individual's chosen treatment had she been given $Z = 1$, and let $T_0$ be an individual's chosen treatment had she been given $Z = 0$. We follow Imbens and Angrist (1994) in defining an instrument $Z$ as satisfying the following conditions:

$$ Z \text{ is independent of } (Y_0, Y_1, T_0, T_1) \tag{1} $$

and

$$ E[T \mid Z = 1] \neq E[T \mid Z = 0]. \tag{2} $$

We think of individuals as being sampled from the joint distribution of the random variables $(Z, T_1, T_0, Y_1, Y_0)$. For each individual $i$, we actually observe $(Z, T, Y)$, where $T = T_Z$ and $Y = Y_T$. Suppose that we have an i.i.d. sample of $n$ individuals.

We focus on three commonly estimated treatment effects, defined as follows:

1. The average treatment effect (ATE) over the entire population is given by

$$ \Delta_{ATE} = E[Y_1] - E[Y_0]. \tag{3} $$

2. The average treatment effect on the treated (ATT) is the average treatment effect only over those individuals who actually received the treatment:

$$ \Delta_{ATT} = E[Y_1 \mid T = 1] - E[Y_0 \mid T = 1]. \tag{4} $$

3. The probability limit of the IV estimator is what Imbens and Angrist (1994) called the local average treatment effect (LATE):

$$ \Delta_{LATE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[T \mid Z = 1] - E[T \mid Z = 0]}. \tag{5} $$

Under the condition that $T_1 \geq T_0$ for all individuals, Imbens and Angrist show that $\Delta_{LATE}$ can be interpreted as the average treatment effect for the subpopulation that complies with the instrument, i.e., the subpopulation for which $T$ would be equal to $Z$ regardless of whether $Z = 0$ or $Z = 1$.

It is informative to compare these effects to the result we would obtain if we ignore that $T$ is endogenous and run an OLS regression of $Y$ on $T$ and a constant.[2] The probability limit of such a regression is

$$ \Delta_{OLS} = E[Y_1 \mid T = 1] - E[Y_0 \mid T = 0]. \tag{6} $$

If $E[Y_1 \mid T]$ and $E[Y_0 \mid T]$ are invariant to $T$, so that there is no selection bias, then $\Delta_{OLS} = \Delta_{ATE} = \Delta_{ATT}$. Note that this condition does not ensure that $\Delta_{LATE} = \Delta_{ATE}$.

[2] Since $Y$ is binary, one might also consider running a probit of $Y$ on $T$ and a constant.
With no covariates, running a probit and computing the treatment effect produces exactly the same result as OLS, since both models have two parameters to fit two moments, $E[Y \mid T = 1]$ and $E[Y \mid T = 0]$.

2.1 Bivariate probit model

Typically it is necessary to impose additional structure on the model in order to identify $\Delta_{ATE}$ and $\Delta_{ATT}$.[3] One way to do this while still allowing the treatment to be endogenous is to assume a bivariate probit model, which is a linear index model with bivariate normal shocks (Heckman 1978):

$$
\begin{aligned}
T^* &= \gamma Z + \alpha_T + \varepsilon_1 \\
T &= 1\{T^* > 0\} \\
Y^* &= \delta T + \alpha_Y + \varepsilon_2 \\
Y &= 1\{Y^* > 0\}
\end{aligned}
\tag{7}
$$

with $(\varepsilon_1, \varepsilon_2)$ jointly distributed as standard bivariate normal with correlation $\rho$ and independent of $Z$. Note that assumption (1) above follows from this independence condition, and that $\gamma \neq 0$ implies (2). When $\gamma > 0$, $\Delta_{LATE}$ (5) has the interpretation given by Imbens and Angrist (1994).

Define $p_T = \Pr[T = 1]$ and $p_Y = \Pr[Y = 1]$. Let $\Phi$ and $\phi$ be the standard normal distribution and density functions, respectively. Let $B(\cdot, \cdot; \rho)$ be the distribution function for the standard bivariate normal distribution with correlation $\rho$. The ATE (3) is

$$ \Delta_{ATE} = \Phi(\alpha_Y + \delta) - \Phi(\alpha_Y). \tag{8} $$

[3] Heckman and Vytlacil (1999) observe that if there exists a value $z$ of the instrument such that $E[T \mid Z = z] = 0$, then $\Delta_{ATT}$ is nonparametrically identified. If additionally there exists $z'$ such that $E[T \mid Z = z'] = 1$, then $\Delta_{ATE}$ is also identified. This is a type of "identification at infinity" result since it typically requires extreme values of $Z$ to be observed. However, with a binary $Z$ as we have in our simulations, these conditions are rarely satisfied.
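To make the definitions above concrete, the following minimal Python sketch simulates the no-covariate latent-index model (7) and compares the closed-form ATE (8) with sample analogues of the IV ratio (5) and the naive OLS contrast (6). It is an illustration only, not the authors' code. The names `simulate_bp`, `true_ate`, `wald_iv`, and `naive_ols` are hypothetical, as are the parameter labels (`gamma` for the coefficient on the instrument, `delta` for the coefficient on the treatment, `alpha_t` and `alpha_y` for the constants, `rho` for the error correlation). The sketch also assumes Pr[Z = 1] = 1/2.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def simulate_bp(n, gamma, delta, alpha_t, alpha_y, rho, seed=0):
    """Draw n observations (z, t, y) from the no-covariate latent-index model."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0  # assumption: Pr[Z = 1] = 1/2
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + root * rng.gauss(0.0, 1.0)  # corr(e1, e2) = rho
        t = 1 if gamma * z + alpha_t + e1 > 0 else 0
        y = 1 if delta * t + alpha_y + e2 > 0 else 0
        data.append((z, t, y))
    return data


def true_ate(delta, alpha_y):
    """ATE implied by the model: Phi(alpha_y + delta) - Phi(alpha_y)."""
    return STD_NORMAL.cdf(alpha_y + delta) - STD_NORMAL.cdf(alpha_y)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_iv(data):
    """Sample analogue of the IV ratio: difference in mean Y over difference in mean T."""
    y1 = mean(y for z, t, y in data if z == 1)
    y0 = mean(y for z, t, y in data if z == 0)
    t1 = mean(t for z, t, y in data if z == 1)
    t0 = mean(t for z, t, y in data if z == 0)
    return (y1 - y0) / (t1 - t0)


def naive_ols(data):
    """Difference in mean outcomes between treated and untreated units."""
    treated = mean(y for z, t, y in data if t == 1)
    untreated = mean(y for z, t, y in data if t == 0)
    return treated - untreated


if __name__ == "__main__":
    sample = simulate_bp(50_000, gamma=0.3, delta=0.4,
                         alpha_t=-0.15, alpha_y=-0.2, rho=0.5)
    print("ATE from the closed form:", round(true_ate(0.4, -0.2), 3))
    print("IV (Wald) estimate:      ", round(wald_iv(sample), 3))
    print("naive OLS contrast:      ", round(naive_ols(sample), 3))
```

With a positive error correlation, the naive contrast should sit well above the ATE, while the IV ratio targets the LATE rather than the ATE.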
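A first-order Taylor approximation about $\delta = 0$, where $\delta$ is the coefficient on the treatment in model (7), is

$$ \Delta_{ATE} \approx \delta\,\phi(\Phi^{-1}(p_Y)). \tag{9} $$

The ATT (4) is given by

$$
\Delta_{ATT} = \Pr[Z = 0]\,\frac{B(\alpha_T, \alpha_Y + \delta; \rho) - B(\alpha_T, \alpha_Y; \rho)}{\Phi(\alpha_T)}
+ \Pr[Z = 1]\,\frac{B(\alpha_T + \gamma, \alpha_Y + \delta; \rho) - B(\alpha_T + \gamma, \alpha_Y; \rho)}{\Phi(\alpha_T + \gamma)},
\tag{10}
$$

where $\gamma$ is the coefficient on the instrument, $\alpha_T$ and $\alpha_Y$ are the constants, and $\rho$ is the error correlation in (7). The LATE (5) can also be written as a function of the parameters in the bivariate probit model:

$$
\Delta_{LATE} = \frac{[B(\alpha_T + \gamma, \alpha_Y + \delta; \rho) + B(-(\alpha_T + \gamma), \alpha_Y; -\rho)] - [B(\alpha_T, \alpha_Y + \delta; \rho) + B(-\alpha_T, \alpha_Y; -\rho)]}{\Phi(\alpha_T + \gamma) - \Phi(\alpha_T)}.
\tag{11}
$$

While all of the types of treatment effects are equal when $\rho = 0$, they can differ significantly for other values of $\rho$; in particular, the ordering of $\Delta_{ATE}$, $\Delta_{ATT}$, and $\Delta_{LATE}$ varies across parameter values. In Appendix A.2, we derive a Taylor approximation for the ratio of $\Delta_{LATE}$ to $\Delta_{ATE}$ as

$$ \frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho\,\Phi^{-1}(p_Y)\,\Phi^{-1}(p_T). \tag{12} $$

Since the probability limit of the IV estimator $\hat\Delta_{IV}$ is $\Delta_{LATE}$, (12) can be used to obtain a quick and intuitive approximation of the bias of $\hat\Delta_{IV}$ for $\Delta_{ATE}$. In general, $\hat\Delta_{IV}$ is most biased relative to $\Delta_{ATE}$ when $|\rho|$ is large and $p_T$ and $p_Y$ are far from $\tfrac{1}{2}$. The sign of the bias depends on the signs of $\rho$, $\Phi^{-1}(p_Y)$, and $\Phi^{-1}(p_T)$.

In Figure 1, we graph $\Delta_{ATE}$, $\Delta_{ATT}$, $\Delta_{LATE}$, and $\Delta_{OLS}$ for model (7) with $\gamma = 0.3$, $\delta = 0.4$, and different values of $p_T$, $p_Y$, and $\rho$. We vary $p_T$ and $p_Y$ by changing the constants $\alpha_T$ and $\alpha_Y$. As suggested by the approximation (9) and given that $\delta$ is fixed, $\Delta_{ATE}$ changes with $p_Y$ but is very little affected by $p_T$ or $\rho$. When $\rho = 0$, all of the effects are equal. When $\rho > 0$, the OLS effect $\Delta_{OLS}$ (6) that ignores the endogeneity of $T$ is biased upward relative to $\Delta_{ATE}$, $\Delta_{ATT}$, and $\Delta_{LATE}$, which correctly take into account the endogeneity of the treatment. As predicted by the approximation (12), $\Delta_{ATE}$ and $\Delta_{LATE}$ differ substantially when $p_T$ and $p_Y$ are far from $\tfrac{1}{2}$, and this difference increases with $\rho$. The effect on the treated $\Delta_{ATT}$ is naturally close to the overall average effect $\Delta_{ATE}$ when the probability of treatment $p_T$ is high. When $p_T$ is low, $\Delta_{ATT}$ is closer to $\Delta_{LATE}$ because in that case there is high overlap between the treated and complier populations.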
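The closed-form expressions (8), (10), and (11) and the approximations (9) and (12) can be evaluated numerically. The sketch below is a rough illustration under stated assumptions, not the authors' implementation. The bivariate normal distribution function is computed by one-dimensional Simpson integration instead of a library routine. The names `bvn_cdf`, `effects`, `ate_approx`, and `late_ate_ratio_approx` are hypothetical, as are the parameter labels (`gamma` for the instrument coefficient, `delta` for the treatment coefficient, `alpha_t` and `alpha_y` for the constants, `rho` for the error correlation). The ATT weights follow (10) as written.

```python
import math
from statistics import NormalDist

N = NormalDist()  # N.cdf is the standard normal CDF, N.pdf the density


def bvn_cdf(a, b, rho, steps=2000):
    """Pr[U <= a, V <= b] for a standard bivariate normal pair with correlation rho.

    Integrates pdf(u) * cdf((b - rho*u) / sqrt(1 - rho^2)) over u <= a with
    Simpson's rule (steps must be even; requires |rho| < 1).
    """
    lo, hi = -8.0, min(a, 8.0)
    if hi <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(u):
        return N.pdf(u) * N.cdf((b - rho * u) / scale)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0


def effects(gamma, delta, alpha_t, alpha_y, rho, pr_z1=0.5):
    """Return (ATE, ATT, LATE) implied by the bivariate probit parameters."""
    ate = N.cdf(alpha_y + delta) - N.cdf(alpha_y)

    def att_term(index_t):
        gain = bvn_cdf(index_t, alpha_y + delta, rho) - bvn_cdf(index_t, alpha_y, rho)
        return gain / N.cdf(index_t)

    att = (1 - pr_z1) * att_term(alpha_t) + pr_z1 * att_term(alpha_t + gamma)

    def mean_y(index_t):
        # E[Y | Z]: treated units with Y = 1 plus untreated units with Y = 1
        return bvn_cdf(index_t, alpha_y + delta, rho) + bvn_cdf(-index_t, alpha_y, -rho)

    first_stage = N.cdf(alpha_t + gamma) - N.cdf(alpha_t)
    late = (mean_y(alpha_t + gamma) - mean_y(alpha_t)) / first_stage
    return ate, att, late


def ate_approx(delta, p_y):
    """First-order approximation of the ATE around delta = 0."""
    return delta * N.pdf(N.inv_cdf(p_y))


def late_ate_ratio_approx(rho, p_y, p_t):
    """Taylor approximation of the ratio LATE / ATE."""
    return 1 + rho * N.inv_cdf(p_y) * N.inv_cdf(p_t)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.7):
        ate, att, late = effects(0.3, 0.4, -1.3, -1.3, rho)
        print(rho, round(ate, 4), round(att, 4), round(late, 4))
```

At `rho = 0` the three effects coincide, which gives a quick check on the formulas; with low treatment and outcome probabilities and a positive correlation, the LATE exceeds the ATE, in line with the sign predicted by (12).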
Figure 1 also highlights the limits of our approximation (12): according to (12), $\Delta_{LATE}$ and $\Delta_{ATE}$ should be the same when either $p_Y$ or $p_T$ is $\tfrac{1}{2}$, but at higher values of $\rho$, we see that the two effects diverge.

2.2 Asymptotic variance of linear IV estimator

The asymptotic variance $\mathrm{Avar}[\hat\Delta]$ of an estimator $\hat\Delta$ is defined such that

$$ n\,\mathrm{Var}[\hat\Delta] \to \mathrm{Avar}[\hat\Delta]. $$

The asymptotic variance of the IV estimator $\hat\Delta_{IV}$ is

$$
\mathrm{Avar}[\hat\Delta_{IV}] = \frac{\dfrac{\Pr[Y = 1 \mid Z = 1]\,(1 - \Pr[Y = 1 \mid Z = 1])}{\Pr[Z = 1]} + \dfrac{\Pr[Y = 1 \mid Z = 0]\,(1 - \Pr[Y = 1 \mid Z = 0])}{\Pr[Z = 0]}}{(\Pr[T = 1 \mid Z = 1] - \Pr[T = 1 \mid Z = 0])^2}.
\tag{13}
$$

To provide intuition for how $\mathrm{Avar}[\hat\Delta_{IV}]$ changes with $p_T$, $p_Y$, and the error correlation $\rho$, we derive in Appendix A.3 the following Taylor approximation of $\mathrm{Avar}[\hat\Delta_{IV}]$ within the context of the bivariate probit model:

$$ \mathrm{Avar}[\hat\Delta_{IV}] \approx \frac{p_Y (1 - p_Y)}{\gamma^2\,[\phi(\Phi^{-1}(p_T))]^2\,\mathrm{Var}[Z]}. \tag{14} $$

The asymptotic variance of $\hat\Delta_{IV}$ increases as $p_Y$ approaches $\tfrac{1}{2}$ and as $p_T$ moves away from $\tfrac{1}{2}$. Furthermore, the approximation (14) of $\mathrm{Avar}[\hat\Delta_{IV}]$ does not depend on $\rho$ at all, and the exact $\mathrm{Avar}[\hat\Delta_{IV}]$ (13) exhibits very little dependence on $\rho$, as illustrated in Figure 2, which plots $\mathrm{Avar}[\hat\Delta_{IV}]$ for various parameter values.

2.3 Asymptotic variance of ML bivariate probit estimators

Let $\theta$ denote the vector of the parameters $\gamma$, $\delta$, $\alpha_T$, $\alpha_Y$, and $\rho$ in the bivariate probit model (7). Maximum-likelihood estimates of $\theta$ are obtained by selecting $\hat\theta$ to maximize the log-likelihood function:

$$ \hat\theta = \arg\max_{\theta} \sum_{i=1}^{n} \log L_i(\theta) $$

where

$$
L_i(\theta) =
\begin{cases}
B(\gamma Z_i + \alpha_T,\ \delta T_i + \alpha_Y;\ \rho) & \text{if } T_i = 1 \text{ and } Y_i = 1, \\
B(\gamma Z_i + \alpha_T,\ -(\delta T_i + \alpha_Y);\ -\rho) & \text{if } T_i = 1 \text{ and } Y_i = 0, \\
B(-(\gamma Z_i + \alpha_T),\ \delta T_i + \alpha_Y;\ -\rho) & \text{if } T_i = 0 \text{ and } Y_i = 1, \\
B(-(\gamma Z_i + \alpha_T),\ -(\delta T_i + \alpha_Y);\ \rho) & \text{if } T_i = 0 \text{ and } Y_i = 0.
\end{cases}
\tag{15}
$$

Once we have estimates of the parameters, we can estimate most types of treatment effects, because they are functions of $\theta$, by substituting the estimated parameters into the expressions (8) for the ATE, (10) for the ATT, and (11) for the LATE. We denote the respective ML estimators as $\hat\Delta^{BP}_{ATE}$, $\hat\Delta^{BP}_{ATT}$, and $\hat\Delta^{BP}_{LATE}$.
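The likelihood (15) can be sketched in a few lines of Python. Because $Z$, $T$, and $Y$ are all binary in the no-covariate model, only eight distinct likelihood contributions exist, so the log-likelihood can be evaluated from cell counts. This is a simplified illustration under stated assumptions, not the authors' implementation. The names `bvn_cdf`, `cell_probability`, and `log_likelihood` are hypothetical, as is the ordering of `theta`. The bivariate normal distribution function is approximated by Simpson integration. No optimizer is included; any general-purpose numerical maximizer could be applied to `log_likelihood`.

```python
import math
from statistics import NormalDist

N = NormalDist()


def bvn_cdf(a, b, rho, steps=400):
    """Pr[U <= a, V <= b] for a standard bivariate normal pair (Simpson's rule)."""
    lo, hi = -8.0, min(a, 8.0)
    if hi <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(u):
        return N.pdf(u) * N.cdf((b - rho * u) / scale)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0


def cell_probability(t, y, z, theta):
    """Likelihood contribution of one observation with treatment t, outcome y, instrument z.

    theta = (gamma, delta, alpha_t, alpha_y, rho); this ordering is an assumption.
    """
    gamma, delta, alpha_t, alpha_y, rho = theta
    index_t = gamma * z + alpha_t
    index_y = delta * t + alpha_y
    sign_t = 1 if t == 1 else -1
    sign_y = 1 if y == 1 else -1
    # Flipping the sign of exactly one index also flips the sign of the correlation.
    return bvn_cdf(sign_t * index_t, sign_y * index_y, sign_t * sign_y * rho)


def log_likelihood(theta, counts):
    """Log-likelihood; counts maps (z, t, y) to the number of observations in that cell."""
    return sum(n * math.log(cell_probability(t, y, z, theta))
               for (z, t, y), n in counts.items() if n > 0)


if __name__ == "__main__":
    # Hypothetical cell counts, for illustration only.
    counts = {(0, 0, 0): 180, (0, 0, 1): 130, (0, 1, 0): 60, (0, 1, 1): 130,
              (1, 0, 0): 150, (1, 0, 1): 100, (1, 1, 0): 80, (1, 1, 1): 170}
    print(log_likelihood((0.3, 0.4, -0.15, -0.2, 0.3), counts))
```

The four cases of (15) collapse into the two sign flips in `cell_probability`, which is why the correlation enters with a negative sign exactly when one of the two indices is negated.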
Hence, if the bivariate probit model (7) is correctly specified, maximum likelihood can be used to consistently estimate the ATE, ATT, or LATE, whereas the linear IV estimator $\hat\Delta_{IV}$ only consistently estimates the LATE (5).

The asymptotic variance of the ML estimator $\hat\theta$ of $\theta$ is given by $\mathrm{Avar}[\hat\theta] = I(\theta)^{-1}$, the inverse information matrix evaluated at the true $\theta$. There are two common ways to calculate $I(\theta)$:

$$ I_1(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log L_i(\theta)\right)\left(\frac{\partial}{\partial\theta}\log L_i(\theta)\right)'\right] \tag{16} $$

and

$$ I_2(\theta) = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log L_i(\theta)\right]. \tag{17} $$

Using the delta method, we compute the asymptotic variance of any continuously differentiable function $f$ of $\theta$ as

$$ f'(\theta)'\,\mathrm{Avar}[\hat\theta]\,f'(\theta). \tag{18} $$

Since the ATE, ATT, and LATE are all functions of $\theta$, we can compute the asymptotic variance of $\hat\Delta^{BP}_{ATE}$, $\hat\Delta^{BP}_{ATT}$, and $\hat\Delta^{BP}_{LATE}$ in this fashion. The results for $\hat\Delta^{BP}_{ATE}$ for model (7) with $\gamma = 0.3$, $\delta = 0.4$, and many different values of $p_T$, $p_Y$, and $\rho$ are shown in Figure 2. Note that the asymptotic variance of $\hat\Delta^{BP}_{ATE}$ is highly sensitive to $\rho$ when $p_T$ is far from $\tfrac{1}{2}$.

2.4 Comparing the asymptotic variances

Because linear IV only consistently estimates the LATE, the asymptotic variances of linear IV and maximum-likelihood BP are compared most fairly for estimation of the LATE. When the BP model (7) is correctly specified, maximum-likelihood BP is asymptotically efficient for the LATE since it is asymptotically efficient for any smooth function of the parameters $\theta$. Using the formulas (13) and (18), we compared the asymptotic variances of $\hat\Delta_{IV}$ and $\hat\Delta^{BP}_{LATE}$ for the LATE in model (7) with $\gamma = 0.3$, $\delta = 0.4$, and across many different values of $p_T$, $p_Y$, and $\rho$. The asymptotic variance of $\hat\Delta^{BP}_{LATE}$ is always lower than that of $\hat\Delta_{IV}$, and on average, the variance of $\hat\Delta_{IV}$ is 28 percent higher than that of $\hat\Delta^{BP}_{LATE}$, or equivalently its standard deviation is 13 percent higher than that of $\hat\Delta^{BP}_{LATE}$. The efficiency gain from using $\hat\Delta^{BP}_{LATE}$ instead of $\hat\Delta_{IV}$ is far greater when a covariate is included in the model.
In our models with a continuous covariate, on average the standard deviation of $\hat\Delta_{IV}$ is 150 percent higher than that of $\hat\Delta^{BP}_{LATE}$.

Angrist (1991) found that despite the efficiency of BP, the variance of $\hat\Delta^{BP}_{ATE}$ sometimes exceeds the variance of $\hat\Delta_{IV}$. Angrist's results follow from the asymptotics, as shown in Figure 2, where for certain values of $p_T$, $p_Y$, and the error correlation $\rho$, the asymptotic variance of $\hat\Delta^{BP}_{ATE}$ is higher than that of $\hat\Delta_{IV}$. The key observation is that $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ are estimating two different things, ATE and LATE respectively, and depending on parameters one can be estimated more precisely than the other. If we are interested in minimizing mean-square error for estimating the ATE, then $\hat\Delta^{BP}_{ATE}$ will always be the better choice (when the BP model is correct) except in rare cases in which $\hat\Delta_{IV}$ has lower asymptotic variance than $\hat\Delta^{BP}_{ATE}$ and the LATE happens to be close to the ATE.

3 Monte-Carlo simulations

To examine the properties of the BP and IV estimators in finite samples and with misspecifications, we conducted Monte-Carlo simulations across a range of parameter values. These parameter values represent a wider selection compared to those used in previous work by Angrist (1991) and Bhattacharya, Goldman, and McCaffrey (2006), and prove useful in understanding the performance of these estimators in practical applications. The wider range of parameters considered here qualitatively affects the nature of the recommendation. For instance, we find that for some combinations of $p_T$ and $p_Y$, deviations from normality in the BP model result in significant bias, in contrast to the results of Bhattacharya, Goldman, and McCaffrey (2006) over more limited simulations. Also, Angrist's (1991) finding of near-efficiency of IV disappears when we add an exogenous covariate to the model. Table 1 compares the parameter ranges used in the different papers.

Our simulations Angrist (1991) Bhattacharya et al.
(2006) pT 0.1 0.9 0.2 0.5 0.5 pY 0.1 0.9 0.5 0.9 0.0 0.7 0.0 0.7 0.5 0.1 0.5 ATE 0.05 0.16 0.10 0.00 0.42 n 400 30,000 400 800 5000 Number of covariates 0 or 1 0 1

Table 1: Ranges of parameter values used in various studies.

3.1 Data-generating processes

Our data-generating processes (DGPs) are all based on the following latent-index model:

$$
\begin{aligned}
T_i^* &= \gamma Z_i + \beta_T X_i + \alpha_T + \varepsilon_{Ti} \\
T_i &= 1\{T_i^* > 0\} \\
Y_i^* &= \delta T_i + \beta_Y X_i + \alpha_Y + \varepsilon_{Yi} \\
Y_i &= 1\{Y_i^* > 0\}
\end{aligned}
\tag{19}
$$

where $T_i^*$ and $Y_i^*$ are latent continuous variables; $\gamma$, $\delta$, $\beta_T$, $\beta_Y$, $\alpha_T$, and $\alpha_Y$ are parameters; $X_i$ is an exogenous covariate; and $Z_i$ is an instrumental dummy variable that is zero with probability $\tfrac{1}{2}$ and one with probability $\tfrac{1}{2}$.

Our DGPs are designed to mimic typical situations encountered in applied econometric applications. The values of the coefficients in the system (19) are chosen such that the true ATE is positive and falls in the range from 0.05 to 0.16 depending on the model specification. In all of the DGPs, the coefficients in (19) take the following values:

$$ \gamma = 0.3; \quad \delta = 0.4; \quad \beta_T = 0.9; \quad \beta_Y = 0.4 \tag{20} $$

We consider two DGPs for $X_i$:

1. In the first DGP, $X_i = 0$ always, and hence we do not estimate $\beta_T$ or $\beta_Y$.
2. In the second DGP, $X_i \sim N(0, 1)$, and $X_i$ is independent of $Z_i$.

The error terms $\varepsilon_{Ti}$ and $\varepsilon_{Yi}$ are always jointly independent of $(X_i, Z_i)$ and can be generated according to any of six possible processes:

1. $\varepsilon_T$ and $\varepsilon_Y$ are jointly bivariate standard normal with correlation $\rho$ taking on one of four possible values: 0, 0.3, 0.5, 0.7.
2. Generate $(u_T, u_Y)$ as bivariate normal with correlation 0.32. Then transform $\varepsilon_T = F^{-1}(\Phi(u_T))$ and $\varepsilon_Y = F^{-1}(\Phi(u_Y))$, where $F$ is the CDF of a chi-square distribution with 5 degrees of freedom. This results in skewed distributions for $\varepsilon_T$ and $\varepsilon_Y$, and the bivariate probit model is misspecified.[4]
3. Generate $(u_T, u_Y)$ as bivariate normal with correlation 0.32. Then transform $\varepsilon_T = F^{-1}(\Phi(u_T))$ and $\varepsilon_Y = F^{-1}(\Phi(u_Y))$, where $F$ is the CDF of a $t$ distribution with 4 degrees of freedom.
This results in distributions for $\varepsilon_T$ and $\varepsilon_Y$ with high kurtosis, and the bivariate probit model is misspecified.

Furthermore, we also consider many values of the constants $\alpha_T$ and $\alpha_Y$. They are chosen so that $p_T = \Pr[T = 1]$ and $p_Y = \Pr[Y = 1]$ each range separately over $\{0.1, 0.3, 0.5, 0.7, 0.9\}$.

For each of the 300 combinations of possible DGPs for $X_i$, DGPs for $(\varepsilon_T, \varepsilon_Y)$, and values of $p_T$ and $p_Y$ specified above, we conduct Monte-Carlo simulations on samples of 400, 800, 1,000, 2,000, 3,000, 5,000, 8,000, 10,000, 15,000, 20,000, and 30,000 observations. We run 1,000 simulations for each sample size. In each simulation we compute the IV estimate of the LATE and the maximum-likelihood BP estimates of the ATE. Greene (1998) observed that the endogeneity of $T_i$ does not affect the form of the BP likelihood function (15), and hence BP estimates can be obtained directly from the bivariate probit routine available in many statistical software packages.

In the simulations with nonzero covariates $X_i$, the ATE for the bivariate probit model is estimated as

$$ \hat\Delta^{BP}_{ATE} = \frac{1}{n}\sum_{i=1}^{n}\left[\Phi(\hat\delta + \hat\beta_Y X_i + \hat\alpha_Y) - \Phi(\hat\beta_Y X_i + \hat\alpha_Y)\right]. $$

The true ATE and LATE always lie in the interval $[-1, 1]$. While $\hat\Delta^{BP}_{ATE}$ will always fall in that interval, $\hat\Delta_{IV}$ is sometimes outside this interval, especially when the sample size is small.

[4] The correlation of 0.32 for $(u_T, u_Y)$ was chosen so that the correlation of the transformed $(\varepsilon_T, \varepsilon_Y)$ is approximately 0.30, allowing for comparison to the bivariate normal simulations with $\rho = 0.30$.

3.2 Results

Our simulation results are presented in Figures 3 through 8. The first three are representationally similar: in every sub-figure, we plot the true $\Delta_{ATE}$ (the dotted line), the mean of $\hat\Delta^{BP}_{ATE}$ (the thick solid curve), and the mean of $\hat\Delta_{IV}$ (the thick dashed curve) against sample sizes between 400 and 30,000. We also show the range between the 5th and the 95th percentiles of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$.
There are 9 sub-figures showing the behavior of the BP and IV estimators for different parameter values of $p_T$ and $p_Y$, and in every figure we fix the error correlation at $\rho = 0.3$. Figure 3 presents simulations in estimations with no covariates, Figure 4 with covariates, and Figure 5 examines departures from the BP model assumptions. In Appendix A.5, we provide tables of the root-mean-square error (RMSE) of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for estimating $\Delta_{ATE}$ over a wider range of parameter values, which researchers can use with reference to the structure of their own particular problem.

There are several noteworthy features. First, when there are no covariates (Figure 3), the simulations match the asymptotics (Figures 1 and 2) fairly well in sample sizes larger than about 5,000. In sample sizes smaller than 5,000, $\hat\Delta^{BP}_{ATE}$ has lower variance than predicted by the asymptotics, because of mechanical bounds on the estimator ($\hat\Delta^{BP}_{ATE} \in [-1, 1]$). Second, $\hat\Delta^{BP}_{ATE}$ can be biased in small samples, as often happens for maximum-likelihood estimators. Even when sample sizes are large, $\hat\Delta^{BP}_{ATE}$ can be biased under particular extreme combinations of $p_T$ and $p_Y$; in our simulations, two particularly dramatic examples are $(p_T = 0.9, p_Y = 0.1)$ and $(p_T = 0.1, p_Y = 0.9)$.[5] Third, due to its relatively lower small-sample variance, $\hat\Delta^{BP}_{ATE}$ generally performs better than $\hat\Delta_{IV}$ in terms of RMSE for sample sizes smaller than about 5,000. For larger sample sizes, the efficiency of $\hat\Delta^{BP}_{ATE}$ relative to $\hat\Delta_{IV}$ is somewhat reduced, and for extreme combinations of parameter values, $\hat\Delta_{IV}$ can be the better estimator in terms of RMSE.

[5] Firth (1993) describes several techniques for removing the first-order bias from maximum-likelihood estimates. We simulated both asymptotic first-order bias removal and bootstrap bias removal for the BP estimator but found that both techniques perform rather poorly, especially when $|\rho|$ is close to 1, since lower-order expansions poorly approximate the finite-sample bias.
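The way such figures summarize an estimator (its mean and its 5th and 95th percentiles across replications, by sample size) can be sketched for the IV estimator alone. The sketch below covers only the no-covariate design with bivariate normal errors and omits the BP estimator, which would require numerical maximum likelihood. It is not the authors' simulation code. The function names `draw_iv_estimate` and `summarize` are hypothetical, as are the parameter labels (`gamma`, `delta`, `alpha_t`, `alpha_y`, `rho`), and the replication counts are kept far below the 1,000 used in the paper to keep the run short.

```python
import math
import random


def draw_iv_estimate(n, gamma, delta, alpha_t, alpha_y, rho, rng):
    """Simulate one sample of size n; return the IV (Wald) estimate or None if undefined."""
    sums = {0: [0, 0, 0], 1: [0, 0, 0]}  # z -> [count, sum of t, sum of y]
    root = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        e_t = rng.gauss(0.0, 1.0)
        e_y = rho * e_t + root * rng.gauss(0.0, 1.0)
        t = 1 if gamma * z + alpha_t + e_t > 0 else 0
        y = 1 if delta * t + alpha_y + e_y > 0 else 0
        cell = sums[z]
        cell[0] += 1
        cell[1] += t
        cell[2] += y
    (n0, t0, y0), (n1, t1, y1) = sums[0], sums[1]
    if n0 == 0 or n1 == 0:
        return None
    first_stage = t1 / n1 - t0 / n0
    if first_stage == 0:
        return None  # the ratio is undefined when the instrument has no sample effect
    return (y1 / n1 - y0 / n0) / first_stage


def summarize(n, reps, seed=0, **params):
    """Return (mean, 5th percentile, 95th percentile) of the IV estimate over reps samples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        estimate = draw_iv_estimate(n, rng=rng, **params)
        if estimate is not None:
            draws.append(estimate)
    draws.sort()
    low = draws[int(0.05 * (len(draws) - 1))]
    high = draws[int(0.95 * (len(draws) - 1))]
    return sum(draws) / len(draws), low, high


if __name__ == "__main__":
    design = dict(gamma=0.3, delta=0.4, alpha_t=-0.15, alpha_y=-0.2, rho=0.3)
    for n in (400, 3000):
        print(n, summarize(n, reps=100, seed=1, **design))
```

In small samples some draws of the ratio can fall outside $[-1, 1]$, which is consistent with the remark in Section 3.1 about the IV estimate.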
Figure 4 shows that once we include covariates $X$ in the BP model (19), $\hat\Delta^{BP}_{ATE}$ has much lower variance and outperforms $\hat\Delta_{IV}$ across all of our simulations in terms of RMSE for $\Delta_{ATE}$. Indeed, in most cases, the IV standard errors are too large for meaningful hypothesis testing, a problem that is particularly severe when $p_T$ is close to 0 or 1. These simulations highlight that the use of linear IV estimators with covariates can lead to extremely high standard errors and dramatic differences in $\hat\Delta^{BP}_{ATE}$ relative to $\hat\Delta_{IV}$.

An overarching theme thus far is that the BP estimators are generally more efficient than linear IV, especially when the model specification includes additional covariates. However, the gain in efficiency may be outweighed by the severe bias when the BP model is misspecified. Figure 5 examines departures from the BP model assumptions in the case with covariates.[6] In this case, $\hat\Delta^{BP}_{ATE}$ continues to have low variance but can be severely biased in some cases, with no clear guidance on the parameter values under which the expected bias will be worse.

The evidence of bias presented here contrasts with the results of Bhattacharya, Goldman, and McCaffrey (2006), who suggest that BP is slightly more robust to non-normality than IV. As Figure 5 clarifies, Bhattacharya, Goldman, and McCaffrey's result is a direct consequence of their choice of parameters. In their simulations with non-normal errors, $p_T$ is fixed at 0.5, $p_Y$ ranges between 0.5 and 0.7, and the error correlation $\rho$ is 0.5. Our results in Figure 5 suggest that these happen to be values of $p_T$ and $p_Y$ for which $\hat\Delta^{BP}_{ATE}$ performs fairly well even when the BP model assumptions are violated.

[6] For simulations with skewness or excess kurtosis of the error terms, the results are similar because the BP estimates are still consistent, despite the misspecification. This is because with no covariates our misspecified DGP is observationally equivalent to a correctly specified bivariate probit model.
Recall that $(u_T, u_Y)$ are generated as bivariate normal with correlation $\rho$, and then $\varepsilon_T = f(u_T)$ and $\varepsilon_Y = f(u_Y)$ for some monotone increasing function $f$. Let $\tilde\alpha_T = -f^{-1}(-\alpha_T)$, $\tilde\gamma = -f^{-1}(-(\alpha_T + \gamma)) - \tilde\alpha_T$, $\tilde\alpha_Y = -f^{-1}(-\alpha_Y)$, and $\tilde\delta = -f^{-1}(-(\alpha_Y + \delta)) - \tilde\alpha_Y$. Then a correctly specified bivariate probit model with coefficients $\tilde\alpha_T$, $\tilde\gamma$, $\tilde\alpha_Y$, $\tilde\delta$, and $\rho$ produces the same distribution of observables as our DGP, and the values of all treatment effects are the same in both models. It would have been possible for the BP estimators to be inconsistent if we had modified the joint distribution of $(\varepsilon_T, \varepsilon_Y)$ rather than modifying the marginal distributions individually. With a nonzero covariate $X_i$, the assumption of normality will actually be restrictive because the transformation $f^{-1}$ will be applied at more than two points and hence will no longer preserve linearity.

3.3 Coverage of confidence intervals

Our final simulation results examine the validity of confidence intervals generated by the various methods and the performance of goodness-of-fit tests for the BP model, which can help detect potential misspecification in the BP model.

Figure 6 compares the nominal 95% confidence intervals based on $\hat\Delta_{IV}$ and $\hat\Delta^{BP}_{ATE}$ in terms of coverage of $\Delta_{ATE}$. The standard error used to construct the confidence intervals for $\hat\Delta_{IV}$ is obtained using the sample analogue of the asymptotic variance (13). Results are shown in Figure 6 for a correctly specified model with $X_i \sim N(0, 1)$ and $\rho = 0.3$. As shown in the figure, the IV coverage tends to be too high (greater than 95%) for small samples but slowly deteriorates towards zero as the sample size increases and $\hat\Delta_{IV}$ converges to $\Delta_{LATE}$ rather than $\Delta_{ATE}$ (the dashed curve in the figure).

The most common way to compute standard errors for the BP parameters is by estimating the information matrix using the sample analogue of $I_2(\hat\theta)$ (17). We would then apply the delta method as in (18) to obtain standard errors for $\hat\Delta^{BP}_{ATE}$.
BP confidence intervals for $\Delta_{ATE}$ computed in this way display significantly lower coverage than the nominal 95% for sample sizes below 5,000, even when the model is correctly specified, but coverage improves toward 95% in samples larger than 10,000 observations (the solid curve in the figure). Further investigation reveals that this undercoverage occurs because standard errors for the BP parameters are too small, and additional undercoverage is introduced in the delta-method step.[7] Alternatively, we tried estimating the information matrix using the sample analogue of $I_1(\hat\theta)$ (16), or standard errors can be estimated using the Huber-White sandwich (robust) estimator $\hat I_2(\hat\theta)^{-1}\,\hat I_1(\hat\theta)\,\hat I_2(\hat\theta)^{-1}$. These methods result in similar undercoverage of $\Delta_{ATE}$.

[7] Monfardini and Radice (2008) report a similar result that t-tests based on maximum-likelihood estimation of the BP model systematically overreject the hypothesis $\rho = 0$. Also, as noted by Freedman and Sekhon (2010), part of the difficulty may be caused by numerical issues with Stata's implementation of likelihood maximization, as often the likelihood function is very flat and the algorithm fails to find the global maximum.

Fortunately, bootstrapped confidence intervals appear to provide a simple fix for over- and undercoverage in both the IV and BP models. In the bootstrap, we draw with replacement $n$ observations from the data and estimate $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ using the new sample. This is repeated many times, and the level-$(1 - \alpha)$ confidence interval for $\Delta_{ATE}$ or $\Delta_{LATE}$ is reported as the interval between the $\alpha/2$ and $1 - \alpha/2$ quantiles of the simulated draws of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$.[8] By bootstrapping the entire procedure of calculating $\hat\Delta^{BP}_{ATE}$, we avoid using the delta method.
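The percentile bootstrap described above can be sketched as follows. This is a minimal illustration rather than the authors' code. It is applied to the IV (Wald) ratio because that estimator needs no numerical optimization; any estimator that takes a list of observations could be passed instead. The names `wald_iv` and `bootstrap_percentile_ci` and the toy data set are hypothetical. The quantile rule (the $k$-th smallest and $k$-th largest draws with $k = (R + 1)\alpha/2$ for $R$ replications) is one common convention and is an assumption here; it yields whole-number ranks for $R = 39$ or $R = 199$ at the 95% level.

```python
import random


def wald_iv(data):
    """IV (Wald) estimate from (z, t, y) triples; None if it is undefined."""
    groups = {0: [0, 0, 0], 1: [0, 0, 0]}  # z -> [count, sum of t, sum of y]
    for z, t, y in data:
        g = groups[z]
        g[0] += 1
        g[1] += t
        g[2] += y
    (n0, t0, y0), (n1, t1, y1) = groups[0], groups[1]
    if n0 == 0 or n1 == 0 or t1 / n1 == t0 / n0:
        return None
    return (y1 / n1 - y0 / n0) / (t1 / n1 - t0 / n0)


def bootstrap_percentile_ci(data, estimator, reps=199, level=0.95, seed=0):
    """Percentile bootstrap interval: resample rows with replacement, re-estimate, take quantiles.

    Resamples on which the estimator is undefined (returns None) are redrawn,
    so the estimator must be defined on at least some resamples.
    """
    rng = random.Random(seed)
    n = len(data)
    draws = []
    while len(draws) < reps:
        resample = [data[rng.randrange(n)] for _ in range(n)]
        value = estimator(resample)
        if value is not None:
            draws.append(value)
    draws.sort()
    k = max(1, round((reps + 1) * (1 - level) / 2))  # rank of the lower quantile
    return draws[k - 1], draws[reps - k]


if __name__ == "__main__":
    # Hypothetical toy data set of (z, t, y) triples, for illustration only.
    toy = ([(1, 1, 1)] * 120 + [(1, 1, 0)] * 80 + [(1, 0, 1)] * 40 + [(1, 0, 0)] * 60 +
           [(0, 1, 1)] * 60 + [(0, 1, 0)] * 40 + [(0, 0, 1)] * 80 + [(0, 0, 0)] * 120)
    print("point estimate:", wald_iv(toy))
    print("95% percentile interval:", bootstrap_percentile_ci(toy, wald_iv))
```

Because the whole estimation procedure is repeated on each resample, no analytical variance formula or delta-method step is needed.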
Because we ran thousands of simulations, we used 39 bootstrap replications in each of our simulations to save time (each bootstrap replication took about 15 seconds at $n = 30{,}000$), but we recommend at least 199 bootstrap replications in practice to reduce sampling noise. In addition, we simulated bootstrap results for only two sample sizes ($n = 400$ and $n = 3{,}000$) given the processing time involved. The coverage rates of the bootstrapped BP confidence intervals for $\Delta_{ATE}$ are close to the nominal $1 - \alpha$, as shown in Figures 6 and 7. The only exceptions are in small samples in the extreme cases ($p_T = 0.1$, $p_Y = 0.9$) and ($p_T = 0.9$, $p_Y = 0.1$), which have been shown in Figure 3 to be particularly problematic for BP. We therefore strongly recommend using bootstrapped confidence intervals for BP, whether one is estimating treatment effects or just the BP coefficients $\theta$.[9] In addition, Figures 6 and 7 show that bootstrapping also reduces the overcoverage of IV confidence intervals that we saw in small samples, although it does not prevent undercoverage of IV in large samples because $\hat{\Delta}_{IV}$ is generally inconsistent for $\Delta_{ATE}$.

[8] Our simulations indicate that these quantile-based confidence intervals perform better than bootstrapping standard errors and then using a normal approximation to obtain confidence intervals.

[9] When the BP model is misspecified, the coverage rates of BP confidence intervals are severely affected by the misspecification when there are covariates $X_i$. The misspecification has a lesser impact on IV coverage rates, since IV standard errors tend to be larger and IV is generally not consistent even in the BP model. However, for the same reason, tests based on $\hat{\Delta}_{IV}$ are generally less powerful than tests based on $\hat{\Delta}^{BP}_{ATE}$.

3.4 Goodness-of-fit tests for bivariate probit

Figure 8 presents results of goodness-of-fit tests for the bivariate probit model. We compare the ability of two different goodness-of-fit tests to detect our non-normal data-generating processes. Our first test is an adaptation of the Hosmer and Lemeshow (1980) test to the bivariate probit model. This test divides the observations into subgroups and checks whether the frequencies of observed $(y_i, t_i)$ match predicted frequencies given $\hat{\theta}$ and the distribution of $X_i$ and $Z_i$ in each subgroup. The details of our adaptation of the Hosmer-Lemeshow test are given in Appendix A.4.

The second goodness-of-fit test we use is a Rao score test developed by Murphy (2007).[10] This test embeds the bivariate normal distribution within a larger family of distributions by adding more parameters to the model, and checks whether the additional parameters are all zero using the score for the additional parameters at the BP estimate.[11] We set both tests to reject at a 5% significance level using asymptotic chi-square critical values.[12] In our simulations with a bivariate normal data-generating process, both tests reject about 5% of the time, as expected. The score test performs much better than the Hosmer-Lemeshow test in detecting our non-normal data-generating processes, as shown in Figure 8. The results of this comparison of the two tests agree with those of Chiburis (2010) from simulations without an endogenous regressor.

[10] See Chiburis (2010) for corrections to several errors in Murphy (2007) and an alternative derivation of the test.

[11] Since $T_i$ is endogenous, predicted probabilities of $(y_i, t_i)$ used to calculate the test statistic are computed conditional on $X_i$ and $Z_i$ but not $T_i$.

[12] Murphy (2007) recommends bootstrapping the critical value of his test, but we find that the asymptotic critical values work well enough even at small sample sizes that the time-consuming bootstrap is not necessary.

4 Conclusion

We have derived asymptotic results and presented simulations comparing bivariate probit and linear IV estimators of the average treatment effect of a binary treatment on a binary outcome. Our simulation results provide practical guidance on the choice of specification in applied problems with different parameter values and with or without covariates, and can help explain why results differ widely depending on the specification chosen. Our findings can be summarized as four main messages for practical applications in empirical models with binary regressors and binary outcome variables:

Researchers should expect IV and BP coefficients to differ substantially when treatment probabilities are low or when sample sizes are below 5,000. Linear IV estimates are particularly uninformative for hypothesis testing when treatment probabilities are low, a problem that is accentuated when there are covariates in the model. The tables in Appendix A.5 provide the ATE, the ratio of LATE to ATE, and the root-mean-square error for different values of $p_T$ and $p_Y$ and for different sample sizes. These tables can be used as a guide for practical applications.

One recommendation is to present both linear IV and BP estimates when there are covariates in the model, and for the ranges of $p_T$ and $p_Y$ where IV confidence intervals are large. The difference between IV and BP estimates could also reflect differences between the LATE and ATE estimates recovered by the linear IV and BP procedures respectively. Again, the tables in Appendix A.5 as well as our asymptotic ratio approximation provide a guide for the variance in these estimates.

Confidence intervals recovered through bootstrapping are a must in these models when sample sizes are below 10,000 and should be preferred to analytical standard errors for all applications.
As is well known, misspecification of the BP model can lead to severe bias in BP estimates for a broad range of parameter values. This problem, however, does not arise in models with no covariates. In models with covariates, Murphy's goodness-of-fit score test (Murphy 2007, Chiburis 2010) can help detect misspecifications of the BP model.

A Appendix

A.1 Stata commands

In this appendix we describe how to run our recommended BP and IV procedures for calculating treatment effects in Stata. The Stata commands biprobittreat, scoregof, and bphltest are available for download at https://webspace.utexas.edu/rcc485/www/code.html

Suppose we have a dataset with binary outcome Y, binary treatment T, instrument Z, and covariates X1, X2.

1. To compute $\hat{\Delta}_{IV}$ along with bootstrapped confidence intervals, type:

    ivregress 2sls Y X1 X2 (T=Z), vce(bootstrap, reps(199))
    estat bootstrap, percentile

2. To compute $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}^{BP}_{ATT}$ along with bootstrapped confidence intervals, type:

    bootstrap _b ate=r(ate) att=r(att), reps(199): biprobittreat (Y = T X1 X2) (T = Z X1 X2)
    estat bootstrap, percentile

3. To run the Murphy score and Hosmer-Lemeshow goodness-of-fit tests, type:

    biprobit (Y = T X1 X2) (T = Z X1 X2)
    scoregof
    bphltest

A.2 Derivation of $\Delta_{LATE}/\Delta_{ATE}$ approximation (12)

Using (8) and (11), we compute a first-order Taylor approximation of $\Delta_{LATE}/\Delta_{ATE}$ about $\rho, \gamma, \delta = 0$. Although the ratio is undefined for $\gamma = 0$ or $\delta = 0$, the limit $\lim_{\rho, \gamma, \delta \to 0} \Delta_{LATE}/\Delta_{ATE}$ exists and is equal to 1, so we can still compute the Taylor expansion around this point. The terms involving the derivatives with respect to $\gamma$ and $\delta$ are zero because $\Delta_{LATE}/\Delta_{ATE} = 1$ when $\rho = 0$, regardless of $\gamma$ and $\delta$.
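This leaves us with

$$\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho \lim_{\gamma, \delta \to 0} \frac{e^{-\frac{(\alpha_T+\gamma)^2 + (\alpha_Y+\delta)^2}{2}} - e^{-\frac{(\alpha_T+\gamma)^2 + \alpha_Y^2}{2}} - e^{-\frac{\alpha_T^2 + (\alpha_Y+\delta)^2}{2}} + e^{-\frac{\alpha_T^2 + \alpha_Y^2}{2}}}{2\pi \, (\Phi(\alpha_T+\gamma) - \Phi(\alpha_T)) \, (\Phi(\alpha_Y+\delta) - \Phi(\alpha_Y))}.$$

The limit is indeterminate, so we apply L'Hôpital's rule twice to obtain

$$\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho \, \alpha_T \alpha_Y.$$

In order to write this in terms of $p_T$ and $p_Y$, at $\gamma = \delta = 0$ we approximate $\alpha_T \approx \Phi^{-1}(p_T)$ and $\alpha_Y \approx \Phi^{-1}(p_Y)$, yielding

$$\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho \, \Phi^{-1}(p_T) \, \Phi^{-1}(p_Y).$$

A.3 Derivation of $\mathrm{Avar}[\hat{\Delta}_{IV}]$ approximation (14)

We can write (13) as

$$\mathrm{Avar}[\hat{\Delta}_{IV}] = \frac{1}{\tilde{\gamma}^2} \left( \frac{\Pr[Y=1 \mid Z=1](1 - \Pr[Y=1 \mid Z=1])}{\Pr[Z=1]} + \frac{\Pr[Y=1 \mid Z=0](1 - \Pr[Y=1 \mid Z=0])}{\Pr[Z=0]} \right),$$

where $\tilde{\gamma} = \Phi(\alpha_T + \gamma) - \Phi(\alpha_T)$ is the probability limit of the coefficient on $Z$ in the first stage of the IV regression. A Taylor approximation of $\tilde{\gamma}$ in terms of $p_T$ is

$$\tilde{\gamma} \approx \gamma \, \phi(\Phi^{-1}(p_T)),$$

since $\Phi^{-1}(p_T)$ is between $\alpha_T$ and $\alpha_T + \gamma$. Furthermore, for reasonable values of the treatment effect we can approximate $E[Y \mid Z=1](1 - E[Y \mid Z=1]) \approx E[Y \mid Z=0](1 - E[Y \mid Z=0]) \approx p_Y(1 - p_Y)$, so that

$$\mathrm{Avar}[\hat{\Delta}_{IV}] \approx \frac{p_Y(1 - p_Y)}{\tilde{\gamma}^2 \Pr[Z=1](1 - \Pr[Z=1])} = \frac{\mathrm{Var}[Y]}{\tilde{\gamma}^2 \, \mathrm{Var}[Z]} \approx \frac{p_Y(1 - p_Y)}{\gamma^2 \, [\phi(\Phi^{-1}(p_T))]^2 \, \mathrm{Var}[Z]}.$$

A.4 Adapted Hosmer-Lemeshow goodness-of-fit test for bivariate probit

The Hosmer and Lemeshow (1980) test statistic was developed to correct a problem with the simple Pearson test statistic. To compute the Pearson test statistic in the bivariate probit model with an endogenous regressor, we create cells for each unique value of $(x, z)$ in the data and sort the observations into those cells. For each cell $c = 1, \ldots, C$, let $O_{cyt}$ be the number of observations in cell $c$ with $Y = y$ and $T = t$, and let $E_{cyt}$ be the expected number of observations in cell $c$ with $Y = y$ and $T = t$ according to the BP model. It is computed as $E_{cyt} = \sum_{i \in \text{cell } c} \hat{\pi}_{iyt}$, where $\hat{\pi}_{iyt}$ is the predicted probability of $(Y, T) = (y, t)$ given $(X, Z) = (x_i, z_i)$, evaluated using the BP model at the estimated parameters $\hat{\theta}$.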
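The predicted probabilities of (Y, T) = (y, t) that enter the expected counts can be computed from the bivariate normal distribution. The sketch below shows one way to do this. It assumes the latent-index form T = 1[a_t + g z + e_T > 0] and Y = 1[a_y + d T + e_Y > 0] with standard bivariate normal errors of correlation rho. This parameterization, the function names, and the use of Simpson's rule in place of a library routine are assumptions of the example, not the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def bvn_cdf(a, b, rho, steps=2000):
    """Pr[U <= a, V <= b] for standard bivariate normal (U, V), |rho| < 1.

    Uses Pr = integral over u <= a of phi(u) * Phi((b - rho*u) / sqrt(1 - rho^2)),
    evaluated by Simpson's rule on [-8, a] (steps must be even).
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    a = min(a, 8.0)
    scale = sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def integrand(u):
        return _N.pdf(u) * _N.cdf((b - rho * u) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def cell_probs(a_t, g, a_y, d, rho, z):
    """Return {(y, t): Pr[Y = y, T = t | Z = z]} under the assumed model."""
    index_t = a_t + g * z
    p_y1_t1 = bvn_cdf(index_t, a_y + d, rho)
    p_y1_t0 = _N.cdf(a_y) - bvn_cdf(index_t, a_y, rho)
    p_y0_t1 = _N.cdf(index_t) - p_y1_t1
    p_y0_t0 = 1.0 - p_y1_t1 - p_y1_t0 - p_y0_t1
    return {(1, 1): p_y1_t1, (1, 0): p_y1_t0, (0, 1): p_y0_t1, (0, 0): p_y0_t0}
```

Because the treatment is endogenous, these probabilities condition on the instrument (and any covariates folded into the two indices) but not on the realized treatment.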
The Pearson test statistic is

$$X^2 = \sum_{c=1}^{C} \sum_{y=0}^{1} \sum_{t=0}^{1} \frac{(O_{cyt} - E_{cyt})^2}{E_{cyt}}.$$

When $X$ and $Z$ are discrete and the number of unique cells $(x, z)$ is small relative to $n$, $X^2$ is approximately distributed as chi-square with $3C - \dim(\theta)$ degrees of freedom under the null hypothesis that the true model is BP (Osius and Rojek 1992). We recommend the use of the Pearson test statistic in such cases. However, when there are many unique values of $(x, z)$ in the data, as is the case when $X$ or $Z$ is continuously distributed, Osius and Rojek (1992) show that this asymptotic approximation for $X^2$ breaks down. They compute a better asymptotic distribution of $X^2$ for the continuous case.

The method of Hosmer and Lemeshow (1980) and Fagerland, Hosmer, and Bofin (2008), which was originally developed for logistic models, is another way to modify the Pearson test for use with continuous $X$ or $Z$. This test combines the observations into a smaller number of groups to ensure that the test statistic is well approximated by its asymptotic distribution.[13]

To adapt the Hosmer and Lemeshow (1980) test to the bivariate probit model, we choose two constants $G_1$ and $G_2$. We first sort the observations into $G_1$ groups of roughly equal size based on $\Pr[T = 1 \mid \hat{\theta}, X = x_i, Z = z_i]$. Within each of these groups, we then sort the observations into $G_2$ subgroups based on $\Pr[Y = 1 \mid \hat{\theta}, X = x_i, Z = z_i]$. This results in a total of $G = G_1 G_2$ groups. For each of these groups $g$, let $O_{gyt}$ be the number of observations in group $g$ with $Y = y$ and $T = t$, and let $E_{gyt} = \sum_{i \in \text{group } g} \hat{\pi}_{iyt}$. The adapted Hosmer-Lemeshow test statistic is

$$C = \sum_{g=1}^{G} \sum_{y=0}^{1} \sum_{t=0}^{1} \frac{(O_{gyt} - E_{gyt})^2}{E_{gyt}}.$$

Under the null hypothesis that BP is the correct model, we expect $C$ to be distributed approximately chi-square with $3(G - 2)$ degrees of freedom. This distribution was derived by Fagerland, Hosmer, and Bofin (2008) based on simulations.
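The two-stage grouping and the statistic C can be sketched as follows. The code takes predicted probabilities as given (for instance, from a fitted BP model) and returns the statistic together with the 3(G - 2) degrees of freedom. The function names, the input layout, and the way ties and uneven group sizes are handled are assumptions of this example. It does not compute the chi-square p-value.

```python
def split_equal(indices, key, n_groups):
    """Sort indices by key and cut them into n_groups of roughly equal size."""
    ordered = sorted(indices, key=key)
    n = len(ordered)
    return [ordered[n * g // n_groups: n * (g + 1) // n_groups]
            for g in range(n_groups)]


def adapted_hl_statistic(y, t, probs, g1=3, g2=3):
    """Adapted Hosmer-Lemeshow statistic C and its degrees of freedom.

    y, t  : lists of 0/1 outcomes and treatments.
    probs : list of dicts with probs[i][(y, t)] = predicted Pr[Y=y, T=t | x_i, z_i].
    """
    n = len(y)
    # Marginal predicted probabilities used for the two sorting stages.
    p_t = [probs[i][(1, 1)] + probs[i][(0, 1)] for i in range(n)]  # Pr[T = 1]
    p_y = [probs[i][(1, 1)] + probs[i][(1, 0)] for i in range(n)]  # Pr[Y = 1]
    stat = 0.0
    for outer in split_equal(range(n), lambda i: p_t[i], g1):
        for group in split_equal(outer, lambda i: p_y[i], g2):
            for yy in (0, 1):
                for tt in (0, 1):
                    observed = sum(1 for i in group if y[i] == yy and t[i] == tt)
                    expected = sum(probs[i][(yy, tt)] for i in group)
                    if expected > 0:  # skip empty groups or zero-probability cells
                        stat += (observed - expected) ** 2 / expected
    return stat, 3 * (g1 * g2 - 2)
```

Setting `g1 = g2 = 3` gives nine groups and 21 degrees of freedom. The statistic would then be compared with the corresponding chi-square critical value.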
In our simulations, the Hosmer-Lemeshow test statistic $C$ is computed with $G_1 = G_2 = 3$.[14]

[13] Pigeon and Heyse (1999) add a small modification to the Hosmer-Lemeshow statistic. Their statistic has a slightly different asymptotic distribution.

[14] This results in 9 total groups, which is in the range of 8 to 12 groups used by Fagerland, Hosmer, and Bofin (2008) in their simulations of the analogous test for multinomial logistic regressions.

A.5 Simulation root-mean-square error tables

In Tables 2 to 9, ATE is the true $\Delta_{ATE}$, Ratio is $\Delta_{LATE}/\Delta_{ATE}$, and the BP and IV columns report the root-mean-square error of each estimator at the given sample size $n$.

                      n = 400     n = 1,000   n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP    IV    BP    IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.08  1.00  0.38  2.57  0.35  1.53  0.27  0.22  0.16  0.12  0.09  0.06
0.1  0.3  0.15  1.00  0.36  3.46  0.33  1.99  0.26  0.35  0.16  0.19  0.10  0.10
0.1  0.5  0.16  1.00  0.36  4.24  0.32  3.14  0.25  0.39  0.16  0.21  0.10  0.12
0.1  0.7  0.13  1.00  0.38  4.08  0.34  3.14  0.24  0.36  0.14  0.19  0.08  0.11
0.1  0.9  0.06  1.00  0.43  2.58  0.38  2.94  0.27  0.23  0.15  0.12  0.06  0.07
0.3  0.1  0.08  1.00  0.28  1.93  0.22  0.21  0.13  0.11  0.07  0.06  0.04  0.03
0.3  0.3  0.14  1.00  0.31  1.91  0.24  0.31  0.16  0.16  0.09  0.09  0.05  0.05
0.3  0.5  0.16  1.00  0.31  2.40  0.24  0.34  0.15  0.18  0.10  0.10  0.06  0.06
0.3  0.7  0.13  1.00  0.31  1.23  0.22  0.31  0.14  0.17  0.08  0.09  0.05  0.05
0.3  0.9  0.06  1.00  0.29  2.29  0.22  0.21  0.12  0.11  0.06  0.06  0.03  0.03
0.5  0.1  0.07  1.00  0.24  0.54  0.18  0.17  0.10  0.09  0.05  0.05  0.03  0.03
0.5  0.3  0.14  1.00  0.29  0.96  0.22  0.27  0.13  0.14  0.08  0.08  0.04  0.04
0.5  0.5  0.16  1.00  0.30  0.81  0.22  0.30  0.14  0.15  0.09  0.09  0.05  0.05
0.5  0.7  0.14  1.00  0.29  0.64  0.21  0.27  0.14  0.15  0.08  0.08  0.05  0.05
0.5  0.9  0.07  1.00  0.25  0.49  0.18  0.17  0.10  0.09  0.05  0.05  0.03  0.03
0.7  0.1  0.06  1.00  0.27  1.13  0.21  0.20  0.12  0.11  0.05  0.06  0.03  0.03
0.7  0.3  0.13  1.00  0.31  1.70  0.24  0.32  0.14  0.17  0.08  0.09  0.05  0.05
0.7  0.5  0.16  1.00  0.32  1.15  0.24  0.34  0.15  0.18  0.09  0.10  0.06  0.06
0.7  0.7  0.14  1.00  0.30  0.88  0.24  0.31  0.16  0.17  0.09  0.09  0.06  0.05
0.7  0.9  0.08  1.00  0.28  0.47  0.22  0.21  0.13  0.11  0.07  0.06  0.04  0.03
0.9  0.1  0.06  1.00  0.44  2.45  0.37  0.83  0.27  0.23  0.15  0.12  0.06  0.06
0.9  0.3  0.13  1.00  0.39  5.02  0.35  0.91  0.26  0.35  0.15  0.18  0.08  0.10
0.9  0.5  0.16  1.00  0.38  5.09  0.34  1.26  0.25  0.37  0.16  0.21  0.10  0.12
0.9  0.7  0.15  1.00  0.36  4.03  0.33  0.85  0.26  0.35  0.17  0.19  0.10  0.11
0.9  0.9  0.08  1.00  0.37  3.65  0.35  0.68  0.26  0.24  0.16  0.12  0.09  0.06

Table 2: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0$. For most values of $p_T$ and $p_Y$, the RMSE of BP is much smaller than the RMSE of IV in sample sizes below 3,000, but the difference shrinks with larger sample sizes.

                      n = 400     n = 1,000   n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP    IV    BP    IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.08  1.47  0.38  3.03  0.32  1.88  0.23  0.22  0.11  0.12  0.06  0.07
0.1  0.3  0.15  1.12  0.36  4.93  0.33  4.81  0.25  0.34  0.15  0.18  0.09  0.10
0.1  0.5  0.16  0.90  0.37  6.65  0.34  4.22  0.26  0.38  0.17  0.20  0.10  0.11
0.1  0.7  0.12  0.70  0.41  5.30  0.37  2.99  0.28  0.34  0.19  0.19  0.11  0.11
0.1  0.9  0.05  0.47  0.32  3.40  0.41  2.67  0.40  0.23  0.25  0.12  0.16  0.07
0.3  0.1  0.07  1.19  0.27  1.38  0.20  0.21  0.11  0.11  0.06  0.06  0.03  0.04
0.3  0.3  0.14  1.11  0.31  1.58  0.24  0.32  0.15  0.16  0.09  0.09  0.05  0.05
0.3  0.5  0.16  1.02  0.32  2.16  0.24  0.34  0.16  0.18  0.09  0.10  0.05  0.05
0.3  0.7  0.13  0.91  0.31  1.78  0.24  0.31  0.15  0.16  0.09  0.09  0.05  0.05
0.3  0.9  0.06  0.73  0.32  1.29  0.26  0.21  0.16  0.11  0.08  0.06  0.04  0.04
0.5  0.1  0.06  0.96  0.24  0.50  0.18  0.18  0.10  0.09  0.05  0.05  0.03  0.03
0.5  0.3  0.14  1.03  0.29  0.89  0.22  0.28  0.13  0.14  0.08  0.08  0.04  0.04
0.5  0.5  0.16  1.05  0.30  1.08  0.23  0.29  0.15  0.15  0.08  0.08  0.05  0.05
0.5  0.7  0.14  1.03  0.29  0.83  0.21  0.26  0.13  0.14  0.08  0.08  0.04  0.04
0.5  0.9  0.06  0.96  0.25  0.54  0.18  0.17  0.10  0.09  0.05  0.05  0.03  0.03
0.7  0.1  0.06  0.73  0.32  1.19  0.25  0.23  0.15  0.11  0.08  0.06  0.04  0.04
0.7  0.3  0.13  0.91  0.31  1.82  0.25  0.33  0.15  0.16  0.09  0.09  0.05  0.05
0.7  0.5  0.16  1.02  0.32  1.31  0.25  0.37  0.16  0.18  0.09  0.10  0.05  0.05
0.7  0.7  0.14  1.11  0.31  1.25  0.24  0.31  0.15  0.16  0.09  0.09  0.05  0.05
0.7  0.9  0.07  1.19  0.27  0.72  0.20  0.21  0.11  0.11  0.06  0.06  0.03  0.04
0.9  0.1  0.05  0.47  0.34  2.73  0.41  1.28  0.40  0.24  0.27  0.12  0.15  0.07
0.9  0.3  0.12  0.70  0.42  4.12  0.37  1.90  0.29  0.35  0.19  0.18  0.11  0.10
0.9  0.5  0.16  0.90  0.38  4.81  0.34  2.07  0.26  0.38  0.17  0.20  0.10  0.11
0.9  0.7  0.15  1.12  0.38  4.24  0.34  2.85  0.25  0.35  0.16  0.18  0.09  0.10
0.9  0.9  0.08  1.47  0.39  3.12  0.34  2.45  0.23  0.22  0.12  0.12  0.07  0.08

Table 3: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0.3$.

                      n = 400     n = 1,000   n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP    IV    BP    IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.08  1.90  0.39  2.55  0.30  2.33  0.19  0.22  0.09  0.13  0.05  0.10
0.1  0.3  0.15  1.17  0.39  3.58  0.33  5.51  0.24  0.33  0.15  0.18  0.09  0.10
0.1  0.5  0.16  0.75  0.38  4.92  0.34  4.74  0.28  0.37  0.19  0.20  0.12  0.12
0.1  0.7  0.12  0.44  0.38  3.78  0.42  3.10  0.34  0.35  0.23  0.19  0.15  0.12
0.1  0.9  0.05  0.18  0.18  2.82  0.24  3.12  0.35  0.23  0.41  0.13  0.31  0.08
0.3  0.1  0.07  1.31  0.26  1.82  0.19  0.21  0.10  0.11  0.05  0.06  0.03  0.04
0.3  0.3  0.14  1.25  0.31  2.01  0.22  0.29  0.14  0.16  0.08  0.09  0.05  0.06
0.3  0.5  0.16  1.06  0.31  2.12  0.24  0.33  0.16  0.17  0.09  0.09  0.05  0.06
0.3  0.7  0.13  0.82  0.31  2.30  0.24  0.32  0.16  0.16  0.09  0.09  0.05  0.06
0.3  0.9  0.06  0.47  0.30  1.78  0.32  0.21  0.23  0.11  0.11  0.07  0.06  0.04
0.5  0.1  0.06  0.84  0.26  0.50  0.18  0.17  0.10  0.09  0.05  0.05  0.03  0.03
0.5  0.3  0.13  1.09  0.29  0.98  0.21  0.26  0.13  0.14  0.07  0.07  0.04  0.04
0.5  0.5  0.16  1.15  0.29  1.11  0.22  0.27  0.14  0.15  0.08  0.08  0.05  0.05
0.5  0.7  0.13  1.09  0.28  0.81  0.21  0.25  0.12  0.13  0.07  0.08  0.04  0.04
0.5  0.9  0.06  0.84  0.26  0.49  0.18  0.17  0.11  0.09  0.05  0.05  0.03  0.03
0.7  0.1  0.06  0.47  0.31  1.03  0.31  0.21  0.23  0.11  0.12  0.07  0.06  0.04
0.7  0.3  0.13  0.82  0.31  2.61  0.25  0.31  0.16  0.16  0.09  0.09  0.05  0.05
0.7  0.5  0.16  1.06  0.32  1.32  0.25  0.33  0.16  0.17  0.09  0.09  0.05  0.05
0.7  0.7  0.14  1.25  0.31  0.87  0.23  0.29  0.14  0.16  0.08  0.09  0.05  0.06
0.7  0.9  0.07  1.31  0.26  0.50  0.17  0.20  0.09  0.11  0.05  0.06  0.03  0.04
0.9  0.1  0.05  0.18  0.19  2.37  0.25  0.96  0.36  0.23  0.44  0.13  0.31  0.08
0.9  0.3  0.12  0.44  0.39  4.19  0.41  1.20  0.34  0.35  0.24  0.19  0.15  0.12
0.9  0.5  0.16  0.75  0.40  4.66  0.35  1.25  0.28  0.37  0.19  0.20  0.12  0.12
0.9  0.7  0.15  1.17  0.39  3.94  0.34  1.35  0.24  0.34  0.15  0.18  0.09  0.11
0.9  0.9  0.08  1.90  0.37  2.76  0.30  0.58  0.19  0.22  0.09  0.13  0.05  0.09

Table 4: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0.5$.

                      n = 400     n = 1,000   n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP    IV    BP    IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.08  2.60  0.40  3.06  0.29  2.23  0.16  0.24  0.07  0.16  0.04  0.14
0.1  0.3  0.15  1.12  0.41  4.07  0.34  3.47  0.25  0.33  0.15  0.17  0.09  0.10
0.1  0.5  0.16  0.46  0.40  4.25  0.39  3.95  0.33  0.38  0.24  0.21  0.16  0.14
0.1  0.7  0.12  0.15  0.25  3.57  0.29  3.02  0.35  0.37  0.37  0.21  0.29  0.15
0.1  0.9  0.05  0.02  0.09  2.93  0.09  2.18  0.10  0.23  0.15  0.13  0.21  0.09
0.3  0.1  0.06  1.34  0.26  1.10  0.17  0.23  0.08  0.11  0.04  0.06  0.02  0.04
0.3  0.3  0.14  1.54  0.30  1.61  0.21  0.31  0.12  0.17  0.07  0.11  0.04  0.09
0.3  0.5  0.16  1.11  0.31  2.65  0.24  0.33  0.15  0.17  0.09  0.09  0.05  0.06
0.3  0.7  0.12  0.60  0.32  2.93  0.28  0.34  0.18  0.17  0.11  0.10  0.06  0.07
0.3  0.9  0.05  0.16  0.17  1.17  0.19  0.22  0.25  0.12  0.28  0.07  0.21  0.06
0.5  0.1  0.06  0.53  0.23  0.50  0.22  0.19  0.17  0.10  0.08  0.06  0.04  0.04
0.5  0.3  0.13  1.16  0.27  0.76  0.20  0.26  0.12  0.14  0.06  0.08  0.04  0.05
0.5  0.5  0.16  1.39  0.28  0.87  0.20  0.26  0.12  0.15  0.07  0.10  0.04  0.08
0.5  0.7  0.13  1.16  0.26  0.70  0.19  0.24  0.11  0.14  0.06  0.07  0.04  0.05
0.5  0.9  0.06  0.53  0.23  0.48  0.22  0.18  0.17  0.10  0.08  0.06  0.04  0.04
0.7  0.1  0.05  0.16  0.18  0.81  0.20  0.23  0.24  0.12  0.28  0.07  0.20  0.06
0.7  0.3  0.12  0.60  0.32  1.40  0.28  0.32  0.18  0.17  0.11  0.10  0.06  0.07
0.7  0.5  0.16  1.11  0.31  1.53  0.23  0.33  0.15  0.17  0.08  0.09  0.05  0.06
0.7  0.7  0.14  1.54  0.29  1.32  0.20  0.29  0.12  0.17  0.07  0.11  0.04  0.09
0.7  0.9  0.06  1.34  0.26  0.67  0.16  0.20  0.08  0.10  0.04  0.06  0.02  0.04
0.9  0.1  0.05  0.02  0.09  2.43  0.08  0.72  0.09  0.24  0.14  0.13  0.22  0.08
0.9  0.3  0.12  0.15  0.24  4.51  0.29  0.94  0.35  0.36  0.39  0.20  0.30  0.14
0.9  0.5  0.16  0.46  0.40  4.53  0.39  1.37  0.33  0.38  0.23  0.21  0.16  0.14
0.9  0.7  0.15  1.12  0.42  4.73  0.34  1.66  0.25  0.35  0.16  0.17  0.09  0.10
0.9  0.9  0.08  2.60  0.39  3.45  0.27  0.92  0.15  0.23  0.06  0.16  0.03  0.14

Table 5: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0.7$.

                      n = 400      n = 1,000    n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP     IV    BP     IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.08  1.35  0.20   5.87  0.14   5.89  0.08  0.30  0.04  0.15  0.03  0.09
0.1  0.3  0.14  1.09  0.25   8.04  0.17  21.61  0.10  0.47  0.06  0.23  0.03  0.13
0.1  0.5  0.15  0.92  0.26  10.12  0.18  18.59  0.11  0.49  0.06  0.24  0.03  0.14
0.1  0.7  0.12  0.77  0.27  11.15  0.18  34.14  0.10  0.48  0.05  0.23  0.03  0.14
0.1  0.9  0.05  0.56  0.21   3.06  0.18  19.59  0.09  0.31  0.04  0.16  0.02  0.10
0.3  0.1  0.07  1.13  0.18   3.19  0.13   0.39  0.08  0.14  0.05  0.07  0.03  0.04
0.3  0.3  0.13  1.07  0.24   8.31  0.18   1.04  0.11  0.22  0.06  0.11  0.03  0.06
0.3  0.5  0.15  1.01  0.25  25.13  0.17   0.98  0.10  0.23  0.06  0.12  0.03  0.07
0.3  0.7  0.12  0.93  0.23  42.05  0.15   0.47  0.09  0.23  0.05  0.11  0.03  0.07
0.3  0.9  0.06  0.79  0.18  39.50  0.10   0.30  0.05  0.15  0.03  0.08  0.02  0.05
0.5  0.1  0.06  0.97  0.18   4.46  0.12   0.93  0.07  0.12  0.04  0.07  0.02  0.04
0.5  0.3  0.13  1.01  0.26  18.72  0.18   1.64  0.11  0.19  0.06  0.10  0.04  0.05
0.5  0.5  0.15  1.02  0.28  45.66  0.20   1.82  0.12  0.20  0.07  0.10  0.04  0.06
0.5  0.7  0.13  1.01  0.25  12.23  0.19   1.24  0.11  0.19  0.06  0.10  0.04  0.06
0.5  0.9  0.06  0.96  0.17  10.47  0.11   0.85  0.07  0.13  0.04  0.07  0.02  0.04
0.7  0.1  0.06  0.80  0.19  24.85  0.11   0.82  0.06  0.14  0.03  0.08  0.02  0.04
0.7  0.3  0.12  0.93  0.23  36.60  0.15   1.09  0.08  0.22  0.05  0.11  0.03  0.06
0.7  0.5  0.15  1.02  0.26  49.65  0.18   1.09  0.10  0.23  0.05  0.12  0.03  0.07
0.7  0.7  0.13  1.07  0.25  56.82  0.18   0.83  0.11  0.22  0.06  0.11  0.03  0.07
0.7  0.9  0.07  1.13  0.18  41.57  0.13   0.90  0.08  0.15  0.05  0.08  0.03  0.05
0.9  0.1  0.05  0.58  0.23   4.47  0.20   1.85  0.11  0.31  0.04  0.15  0.02  0.09
0.9  0.3  0.12  0.77  0.27   7.58  0.17   2.96  0.10  0.47  0.05  0.24  0.03  0.13
0.9  0.5  0.15  0.94  0.26  10.60  0.18   2.96  0.10  0.51  0.06  0.24  0.03  0.14
0.9  0.7  0.14  1.11  0.25   9.90  0.18   2.62  0.11  0.47  0.06  0.23  0.03  0.14
0.9  0.9  0.08  1.36  0.21   7.01  0.14   2.21  0.08  0.31  0.04  0.15  0.02  0.09

Table 6: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with covariate $X$ and $\rho = 0$.

                      n = 400      n = 1,000    n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP     IV    BP     IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.07  1.75  0.22   3.79  0.15  10.98  0.09  0.29  0.05  0.15  0.03  0.10
0.1  0.3  0.14  1.17  0.27   6.61  0.19  11.02  0.12  0.47  0.06  0.22  0.04  0.13
0.1  0.5  0.15  0.81  0.29   9.62  0.20   5.82  0.12  0.51  0.07  0.24  0.04  0.15
0.1  0.7  0.12  0.52  0.28   8.79  0.22  39.55  0.13  0.48  0.07  0.23  0.04  0.14
0.1  0.9  0.05  0.27  0.13   4.58  0.14  11.49  0.13  0.32  0.09  0.16  0.04  0.10
0.3  0.1  0.06  1.29  0.20  22.22  0.14   0.32  0.09  0.14  0.04  0.07  0.03  0.05
0.3  0.3  0.13  1.21  0.27   7.64  0.19   0.90  0.12  0.21  0.06  0.11  0.04  0.07
0.3  0.5  0.15  1.06  0.27  14.39  0.18   0.74  0.11  0.24  0.06  0.12  0.04  0.07
0.3  0.7  0.12  0.84  0.23  20.86  0.15   0.50  0.09  0.22  0.05  0.11  0.03  0.07
0.3  0.9  0.05  0.53  0.17  37.83  0.12   0.29  0.06  0.15  0.03  0.08  0.02  0.05
0.5  0.1  0.06  0.89  0.19  18.25  0.12   1.31  0.07  0.12  0.03  0.06  0.02  0.04
0.5  0.3  0.12  1.08  0.26  23.20  0.19   0.67  0.12  0.18  0.06  0.10  0.04  0.06
0.5  0.5  0.15  1.11  0.29   8.86  0.21   1.72  0.13  0.20  0.07  0.10  0.04  0.06
0.5  0.7  0.13  1.07  0.26  16.75  0.19   0.87  0.12  0.18  0.06  0.09  0.04  0.06
0.5  0.9  0.06  0.86  0.17   1.47  0.11   0.28  0.06  0.13  0.03  0.07  0.02  0.04
0.7  0.1  0.05  0.56  0.19  14.48  0.13   0.67  0.06  0.14  0.03  0.08  0.02  0.05
0.7  0.3  0.12  0.86  0.24  40.01  0.15   1.47  0.09  0.22  0.05  0.11  0.03  0.07
0.7  0.5  0.15  1.05  0.26  46.67  0.18   0.92  0.11  0.22  0.06  0.12  0.04  0.07
0.7  0.7  0.13  1.20  0.27  49.33  0.19   1.78  0.12  0.22  0.06  0.11  0.04  0.07
0.7  0.9  0.06  1.26  0.19   9.03  0.14   1.55  0.08  0.15  0.05  0.08  0.03  0.05
0.9  0.1  0.05  0.28  0.16   4.39  0.17   1.58  0.15  0.30  0.09  0.15  0.04  0.09
0.9  0.3  0.12  0.52  0.29  10.31  0.22   3.26  0.13  0.47  0.07  0.24  0.04  0.14
0.9  0.5  0.15  0.81  0.28  10.45  0.20   4.10  0.12  0.51  0.07  0.24  0.04  0.15
0.9  0.7  0.14  1.17  0.27  10.56  0.20   2.45  0.12  0.49  0.07  0.22  0.04  0.14
0.9  0.9  0.07  1.76  0.21   8.89  0.15   2.13  0.09  0.34  0.05  0.16  0.03  0.10

Table 7: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with covariate $X$ and $\rho = 0.3$.

                      n = 400      n = 1,000    n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP     IV    BP     IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.07  2.13  0.24  10.15  0.16  13.46  0.09  0.29  0.05  0.16  0.03  0.12
0.1  0.3  0.14  1.19  0.29   9.15  0.20  10.95  0.13  0.47  0.07  0.23  0.04  0.13
0.1  0.5  0.15  0.65  0.31  15.02  0.23  10.69  0.14  0.51  0.08  0.24  0.05  0.15
0.1  0.7  0.12  0.32  0.27   7.99  0.24  22.77  0.18  0.50  0.10  0.24  0.06  0.15
0.1  0.9  0.05  0.15  0.10   4.87  0.09  14.08  0.10  0.32  0.12  0.16  0.11  0.10
0.3  0.1  0.06  1.38  0.22  10.52  0.15   0.31  0.09  0.14  0.04  0.07  0.02  0.05
0.3  0.3  0.13  1.36  0.28  17.03  0.19   0.66  0.12  0.21  0.06  0.12  0.04  0.08
0.3  0.5  0.15  1.10  0.26  35.91  0.18   0.54  0.11  0.23  0.06  0.11  0.04  0.07
0.3  0.7  0.12  0.75  0.23   9.86  0.15   0.49  0.09  0.23  0.05  0.12  0.03  0.07
0.3  0.9  0.05  0.34  0.15  24.94  0.12   0.29  0.09  0.15  0.04  0.09  0.02  0.05
0.5  0.1  0.05  0.75  0.18  17.90  0.12   0.88  0.06  0.12  0.03  0.07  0.02  0.04
0.5  0.3  0.12  1.13  0.26   9.02  0.19   1.10  0.11  0.18  0.06  0.09  0.04  0.06
0.5  0.5  0.15  1.23  0.29   8.71  0.21   1.26  0.13  0.19  0.07  0.10  0.04  0.07
0.5  0.7  0.12  1.12  0.25   7.44  0.18   0.84  0.11  0.18  0.06  0.10  0.03  0.06
0.5  0.9  0.05  0.72  0.18   1.66  0.11   0.51  0.06  0.13  0.03  0.07  0.02  0.04
0.7  0.1  0.05  0.36  0.15   4.83  0.13   0.42  0.08  0.14  0.04  0.08  0.02  0.05
0.7  0.3  0.12  0.76  0.23  19.62  0.15   0.99  0.09  0.22  0.05  0.11  0.03  0.07
0.7  0.5  0.15  1.10  0.26   9.00  0.18   1.87  0.11  0.22  0.06  0.11  0.04  0.07
0.7  0.7  0.13  1.35  0.27  28.15  0.19   2.07  0.12  0.21  0.07  0.11  0.04  0.08
0.7  0.9  0.06  1.34  0.21   6.61  0.15   1.70  0.08  0.15  0.04  0.08  0.02  0.05
0.9  0.1  0.05  0.17  0.10   4.84  0.10   2.63  0.11  0.30  0.11  0.16  0.10  0.10
0.9  0.3  0.11  0.33  0.26  10.61  0.24   3.27  0.18  0.50  0.10  0.24  0.06  0.15
0.9  0.5  0.15  0.66  0.30  12.18  0.23   3.01  0.14  0.54  0.08  0.25  0.05  0.15
0.9  0.7  0.14  1.19  0.28  10.92  0.20   4.55  0.12  0.54  0.07  0.22  0.04  0.13
0.9  0.9  0.07  2.15  0.23   7.39  0.17   3.29  0.09  0.34  0.05  0.17  0.03  0.12

Table 8: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with covariate $X$ and $\rho = 0.5$.

                      n = 400      n = 1,000    n = 3,000   n = 10,000  n = 30,000
p_T  p_Y  ATE   Ratio BP     IV    BP     IV    BP    IV    BP    IV    BP    IV
0.1  0.1  0.07  2.75  0.25   5.99  0.15   8.13  0.08  0.29  0.04  0.18  0.03  0.15
0.1  0.3  0.14  1.12  0.31  17.42  0.22  18.43  0.13  0.44  0.07  0.23  0.04  0.12
0.1  0.5  0.15  0.42  0.35  12.13  0.29  15.32  0.18  0.53  0.10  0.27  0.06  0.17
0.1  0.7  0.12  0.15  0.22  20.78  0.21   9.53  0.21  0.50  0.18  0.25  0.11  0.17
0.1  0.9  0.05  0.13  0.06   7.74  0.05  13.87  0.05  0.32  0.06  0.17  0.08  0.10
0.3  0.1  0.06  1.34  0.24   2.31  0.16   0.86  0.07  0.14  0.03  0.07  0.02  0.05
0.3  0.3  0.13  1.61  0.27   6.40  0.19   0.63  0.11  0.20  0.06  0.13  0.03  0.10
0.3  0.5  0.15  1.13  0.24  15.60  0.16   0.54  0.10  0.23  0.05  0.12  0.03  0.07
0.3  0.7  0.12  0.56  0.23  38.01  0.16   0.64  0.09  0.23  0.05  0.12  0.03  0.09
0.3  0.9  0.05  0.16  0.11   5.50  0.09   0.39  0.08  0.15  0.07  0.09  0.04  0.06
0.5  0.1  0.05  0.48  0.16  17.87  0.11   0.63  0.06  0.12  0.03  0.07  0.02  0.05
0.5  0.3  0.12  1.17  0.23   5.10  0.16   1.32  0.09  0.17  0.05  0.09  0.03  0.06
0.5  0.5  0.15  1.44  0.28  14.33  0.20   1.18  0.12  0.19  0.07  0.12  0.04  0.09
0.5  0.7  0.12  1.18  0.24  11.22  0.16   0.99  0.09  0.18  0.05  0.09  0.03  0.06
0.5  0.9  0.05  0.48  0.16   5.67  0.10   0.46  0.06  0.13  0.03  0.07  0.02  0.05
0.7  0.1  0.05  0.16  0.10   4.21  0.09   0.46  0.09  0.15  0.07  0.09  0.05  0.06
0.7  0.3  0.12  0.56  0.22   7.76  0.16   0.86  0.09  0.22  0.05  0.12  0.03  0.08
0.7  0.5  0.15  1.12  0.24   4.12  0.17   0.88  0.10  0.22  0.06  0.12  0.03  0.07
0.7  0.7  0.13  1.59  0.27   9.49  0.19   0.78  0.11  0.21  0.06  0.13  0.04  0.10
0.7  0.9  0.06  1.35  0.24   3.27  0.15   0.64  0.07  0.14  0.03  0.08  0.02  0.05
0.9  0.1  0.05  0.14  0.06  15.44  0.06  12.17  0.06  0.29  0.07  0.17  0.07  0.10
0.9  0.3  0.11  0.15  0.21  15.21  0.21  12.75  0.21  0.52  0.17  0.25  0.10  0.16
0.9  0.5  0.14  0.41  0.32  18.70  0.28   8.28  0.19  0.56  0.11  0.27  0.06  0.17
0.9  0.7  0.14  1.09  0.31  26.71  0.21  14.96  0.13  0.49  0.07  0.22  0.04  0.13
0.9  0.9  0.07  2.78  0.25  11.85  0.17   5.20  0.09  0.33  0.05  0.19  0.03  0.15

Table 9: Root-mean-square error of $\hat{\Delta}^{BP}_{ATE}$ and $\hat{\Delta}_{IV}$ for true ATE as a function of $p_T$ and $p_Y$, in bivariate probit model simulations with covariate $X$ and $\rho = 0.7$.

References

Altonji, J., T. Elder, and C. Taber (2005). "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, 113(1): 151-184.

Angrist, J. (1991). "Instrumental Variables Estimation of Average Treatment Effects in Econometrics and Epidemiology," NBER Technical Working Paper No. 0115.

Angrist, J. (2001). "Estimation of Limited Dependent Variable Models with Dummy Endogenous Regressors: Simple Strategies for Empirical Practice," Journal of Business and Economic Statistics, 19(1): 2-16.

Angrist, J., and J. Pischke (2009). Mostly Harmless Econometrics. Princeton University Press, Princeton.

Bhattacharya, J., D. Goldman, and D. McCaffrey (2006). "Estimating Probit Models with Self-selected Treatments," Statistics in Medicine, 25(3): 389-413.

Chiburis, R. C. (2010). "Score Tests of Normality in Bivariate Probit Models: Comment," Working paper, University of Texas at Austin.

Fagerland, M. W., D. W. Hosmer, and A. M. Bofin (2008). "Multinomial Goodness-of-fit Tests for Logistic Regression Models," Statistics in Medicine, 27(21): 4238-4253.

Firth, D. (1993). "Bias Reduction of Maximum Likelihood Estimates," Biometrika, 80(1): 27-38.

Freedman, D. A., and J. S. Sekhon (2010). "Endogeneity in Probit Response Models," Political Analysis, 18(2): 138-150.

Greene, W. (1998). "Gender Economics Courses in Liberal Arts Colleges: Further Results," Journal of Economic Education, 29(4): 291-300.

Heckman, J. J. (1978). "Dummy Endogenous Variables in a Simultaneous Equation System," Econometrica, 46(6): 931-959.

Heckman, J. J., and E. J. Vytlacil (1999). "Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects," Proceedings of the National Academy of Sciences, 96(8): 4730-4734.

Hosmer, D. W., and S. Lemeshow (1980). "Goodness of Fit Tests for the Multiple Logistic Regression Model," Communications in Statistics, 9(10): 1043-1069.

Imbens, G., and J. Angrist (1994). "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62(2): 467-475.

Moffitt, R. A. (2001). "Estimation of Limited Dependent Variable Models with Dummy Endogenous Regressors: Simple Strategies for Empirical Practice: Comment," Journal of Business and Economic Statistics, 19(1): 20-23.

Monfardini, C., and R. Radice (2008). "Testing Exogeneity in the Bivariate Probit Model: A Monte Carlo Study," Oxford Bulletin of Economics and Statistics, 70(2): 271-282.

Murphy, A. (2007). "Score Tests of Normality in Bivariate Probit Models," Economics Letters, 95(3): 374-379.

Osius, G., and D. Rojek (1992). "Normal Goodness-of-fit Tests for Multinomial Models with Large Degrees of Freedom," Journal of the American Statistical Association, 87(420): 1145-1152.

Pigeon, J. G., and J. F. Heyse (1999). "An Improved Goodness of Fit Statistic for Probability Prediction Models," Biometrical Journal, 41(1): 71-82.
[Figure 1 image: 3 x 3 grid of panels; rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis ticks at 0, 0.3, 0.5, 0.7; vertical axis from 0 to 0.6.]

Figure 1: $\Delta_{ATE}$ (solid lines), $\Delta_{LATE}$ (long dashed lines), and $\Delta_{ATT}$ (dotted lines), for the bivariate probit model (7) with $\gamma = 0.3$, $\delta = 0.4$, and several values of $p_T$, $p_Y$, and $\rho$. The circles denote $\Delta_{OLS}$, the probability limit of an OLS regression of $Y$ on $T$.

[Figure 2 image: 3 x 3 grid of panels; rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis ticks at 0, 0.3, 0.5, 0.7; vertical axis from 0 to 400.]

Figure 2: Asymptotic variance of $\hat{\Delta}^{BP}_{ATE}$ (solid lines with circles) and $\hat{\Delta}_{IV}$ (dashed lines with triangles), for various values of $\rho$, $p_T$, and $p_Y$. For example, an asymptotic variance of 200 means that at a given sample size $n$, the variance of the estimator is approximately $200/n$.

[Figure 3 image: 3 x 3 grid of panels; rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis: sample size (log scale, ticks at 10^3 and 10^4); vertical axis from -0.5 to 0.5.]

Figure 3: Spread of BP and IV estimates in simulations with no covariates and $\rho = 0.3$. The area between the thin solid curves represents the range between the 5th and 95th percentiles of the BP estimator, and the area between the thin dashed curves represents the same range for the IV estimator. The thick solid curve is the mean BP estimate, the thick dashed curve is the mean IV estimate, and the dotted line is the true ATE.
Figure 4: Spread of BP and IV estimates in simulations with covariate X and […] = 0.3. The area between the thin solid curves represents the range between the 5th and 95th percentiles of the BP estimator, and the area between the thin dashed curves represents the same range for the IV estimator. The thick solid curve is the mean BP estimate, the thick dashed curve is the mean IV estimate, and the dotted line is the true ATE. [Panels: rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4.]

Figure 5: Spread of BP and IV estimates in simulations with covariate X, […] = 0.3, and skewed error terms. The area between the thin solid curves represents the range between the 5th and 95th percentiles of the BP estimator, and the area between the thin dashed curves represents the same range for the IV estimator. The thick solid curve is the mean BP estimate, the thick dashed curve is the mean IV estimate, and the dotted line is the true ATE. [Panels: rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4.]

Figure 6: Coverage of the true ATE for nominal 95% confidence intervals in simulations with normally distributed covariate X_i and […] = 0.3. The solid and dashed curves correspond to the size of tests based on the BP estimator of the ATE and the IV estimator, respectively. The poor coverage can be improved by bootstrapping the critical values, and the size from bootstrapping for the BP and IV estimators is shown by the starred solid and starred dashed curves, respectively. [Panels: rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4.]

Figure 7: Coverage of the true ATE for nominal 95% confidence intervals in simulations with no covariates and […] = 0.3. The solid and dashed curves correspond to the size of tests based on the BP estimator of the ATE and the IV estimator, respectively. The poor coverage can be improved by bootstrapping the critical values, and the size from bootstrapping for the BP and IV estimators is shown by the starred solid and starred dashed curves, respectively. [Panels: rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4.]

Figure 8: Power (rejection probability) of 5%-level Murphy score (solid curves) and adapted Hosmer-Lemeshow (dashed curves) goodness-of-fit tests for normality in simulations with covariate X, […] = 0.3, and skewed error terms. [Panels: rows p_T = 0.1, 0.5, 0.9; columns p_Y = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4.]
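The bootstrap idea mentioned in the captions of Figures 6 and 7 can be sketched for the linear IV estimator with a single binary instrument. This is a minimal illustration, not the procedure used in the simulations: it uses a plain percentile bootstrap rather than bootstrapped critical values, and the data-generating design, its parameter values (`gamma`, `delta`, `rho`, the intercepts), and all function names are assumptions made for the example.

```python
import random


def simulate(n, rng, gamma=1.0, delta=0.5, rho=0.3):
    """Draw (z, t, y) triples from an assumed bivariate probit design."""
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # binary instrument
        u = rng.gauss(0.0, 1.0)                     # treatment-equation error
        # Outcome error with unit variance and correlation rho with u.
        e = rho * u + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        t = 1 if gamma * z - 0.5 + u > 0 else 0     # binary treatment
        y = 1 if delta * t - 0.25 + e > 0 else 0    # binary outcome
        rows.append((z, t, y))
    return rows


def wald_iv(rows):
    """Linear IV (Wald) estimate with a binary instrument; None if undefined."""
    n1 = sum(z for z, _, _ in rows)
    n0 = len(rows) - n1
    if n0 == 0 or n1 == 0:
        return None
    t1 = sum(t for z, t, _ in rows if z == 1) / n1
    t0 = sum(t for z, t, _ in rows if z == 0) / n0
    y1 = sum(y for z, _, y in rows if z == 1) / n1
    y0 = sum(y for z, _, y in rows if z == 0) / n0
    if t1 == t0:                                    # no first stage
        return None
    return (y1 - y0) / (t1 - t0)


def percentile_ci(rows, rng, reps=300, level=0.95):
    """Percentile bootstrap interval from resampling rows with replacement."""
    draws = []
    for _ in range(reps):
        est = wald_iv(rng.choices(rows, k=len(rows)))
        if est is not None:
            draws.append(est)
    if not draws:
        raise ValueError("no valid bootstrap draws")
    draws.sort()
    alpha = (1.0 - level) / 2.0
    lo = draws[int(alpha * (len(draws) - 1))]
    hi = draws[int((1.0 - alpha) * (len(draws) - 1))]
    return lo, hi


rng = random.Random(0)                              # fixed seed for repeatability
data = simulate(2000, rng)
est = wald_iv(data)
lo, hi = percentile_ci(data, rng)
print(f"IV estimate: {est:.3f}, 95% percentile bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```

Repeating this over many simulated samples and recording how often the interval contains the target parameter would give a coverage curve of the kind plotted in Figures 6 and 7.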