WPS8566

Policy Research Working Paper 8566

Long-Term Impacts of Alternative Approaches to Increase Schooling

Evidence from a Scholarship Program in Cambodia

Felipe Barrera-Osorio
Andreas de Barros
Deon Filmer

Development Economics
Development Research Group
August 2018

Abstract

This paper reports on a randomized experiment to investigate the long-term effects of a primary school scholarship program in rural Cambodia. In 2008, fourth-grade students in 207 randomly assigned schools (103 treatment, 104 control) received scholarships based on the students’ academic performance in math and language or their level of poverty. Three years after the program’s inception, an evaluation showed that both types of scholarship recipients had more schooling than nonrecipients; however, only merit-based scholarships led to improvements in cognitive skills. This new study reports impacts, nine years after program inception, on the educational attainment, cognitive skills, socioemotional outcomes, socioeconomic status and well-being, and labor market outcomes of individuals who are, on average, 21 years old. The results show that both types of scholarships led to higher long-term educational attainment (about 0.21-0.29 grade level), but only merit-based scholarships led to improvements in cognitive skills (0.11 standard deviation), greater self-reported well-being (0.18 standard deviation), and employment probability (3.4 percentage points). Neither type of scholarship increased socioemotional skills. The results also suggest that there are labeling effects: the impacts of the scholarship types differ even for individuals with similar characteristics.

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world.
Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/research. The authors may be contacted at dfilmer@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Long-Term Impacts of Alternative Approaches to Increase Schooling: Experimental Evidence from a Scholarship Program in Cambodia∗

Felipe Barrera-Osorio† Andreas de Barros‡ Deon Filmer§

JEL codes: C93 Field Experiments; I21 Analysis of Education; I22 Educational Finance; I25 Education and Economic Development; I28 Government Policy; O12 Microeconomic Analyses of Economic Development.

Keywords: Cambodia; education; long-term effects; merit-based targeting; poverty-based targeting; randomization; scholarships.

∗ We are grateful for financial support from the World Bank’s Strategic Research Program Trust Fund. We thank Simeth Beng, Tsuyoshi Fukao, and staff of the Ministry of Education of the Royal Government of Cambodia for input and assistance at various stages of this project. Alice Danon provided outstanding research assistance.
For helpful comments, we thank Maria Bertling, Theresa Betancourt, Monnica Chan, Olivia Chi, Mark Chin, Jishnu Das, David Deming, Alejandro Ganimian, Sibylla Leon Guerrero, Rema Hanna, Andrew Ho, Whitney Kozakowski, Guilherme Lichand, Sophie Litschwartz, Dana McCoy, Jonathan Mijs, Karthik Muralidharan, Charles Nelson, Gautam Rao, Margaret Sheridan, Abhijeet Singh, and Martin West. The usual disclaimers apply. The findings, interpretations, and conclusions expressed in this paper are those of the authors and do not necessarily represent the views of the World Bank, its Executive Directors, or the governments they represent. Harvard University IRB Protocol Title “From Schooling to Young Adulthood”, Number IRB16-1518.

† Associate Professor of Education and Economics, Harvard University, Graduate School of Education. E-mail: felipe_barrera-osorio@gse.harvard.edu.

‡ PhD Candidate, Harvard University, Graduate School of Education. E-mail: adebarros@g.harvard.edu.

§ Lead Economist, World Bank. E-mail: dfilmer@worldbank.org.

1 Introduction

How does additional schooling impact long-term life outcomes? According to the canonical human capital model, labor markets remunerate the skills acquired during the education process (Becker 2009). According to a signaling model (Arrow 1973; Spence 1973), education provides the market with a signal of individuals’ higher abilities; as a result, the market pays for these skills. Both models predict positive effects from investment in education. At the same time, emerging research is showing that, in many settings, increased schooling has not meant increased learning, which is potentially limiting the market returns to education (Pritchett 2013; The World Bank 2017).
There are, however, few studies in low-income settings that can isolate the impacts of schooling on skills accumulation.1 Our paper aims to contribute to this evidence by presenting the causal long-term effects of a scholarship program that induced more schooling on cognitive, socioemotional, socio-economic status and well-being,2 and labor outcomes in a group of 21-year-old individuals in Cambodia who received the scholarship nine years earlier.

1 Few well-identified studies on the causal impacts of education exist for developing countries; important exceptions are Duflo et al. (2017); Parker and Vogl (2018); Ozier (2016); Jakiela et al. (2015) and Friedman et al. (2011).

2 From this point on, we will refer to both socio-economic status and self-reported well-being as “well-being” for the sake of brevity.

Our study setup is the following. In 2008, 207 schools in Cambodia were randomly allocated between two treatment arms (103 schools) and a control group (104 schools). In half of the treatment schools, students in grade four received a scholarship based on merit (high-performing students were selected using a baseline test of math and language skills), and fourth-graders in the remaining treatment schools received a scholarship based on poverty (students were selected using a poverty index based on household and family socio-economic characteristics). Scholarships were given to recipients for three years (i.e., until the completion of primary school), conditional on continued school participation and basic performance standards. A first follow-up study, three years after the inception of the program, showed two main effects: higher school progression for individuals receiving either type of scholarship (compared with non-recipients), and impacts on cognitive outcomes (as measured by a math test and a test of working memory) only for those receiving a merit-based scholarship (Barrera-Osorio and Filmer 2016).
In this paper we report results from data collected in 2016, nine years after the beginning of the program, from a subsample of the original study participants. We present evidence of the effects of the scholarship on a range of long-term outcomes spanning cognitive skills, socioemotional outcomes, socio-economic and well-being outcomes, and labor market outcomes.

The analysis presents causal evidence to address three questions. First, what are the long-term effects of the program on cognitive skills and socioemotional outcomes? Specifically, we investigate the impacts of (exogenously induced) additional exposure to schooling on these outcomes. Heckman and Kautz (2014) show that socioemotional skills (which are also sometimes referred to as “non-cognitive” skills) are important determinants of labor outcomes in the long term. Second, what are the long-term effects of the scholarships on socioemotional outcomes? In particular, we investigate whether socioemotional outcomes are co-produced with (or are complements to) cognitive outcomes. We can pursue the answer to this question because only the merit-based scholarship induced changes in cognitive skills after the first three years of the intervention; therefore, we can test whether we observe effects on socioemotional outcomes for this group only, for both treatment groups, or for neither group. Third, what are the long-term effects of the scholarships on well-being and labor market outcomes? Given that scholarships induced more schooling for all treated individuals, but greater cognitive skills only for some, we can investigate the channels through which this additional education might affect these outcomes.

Based on an intent-to-treat model, the results show that, despite some catch-up by the control group between 2011 and 2016, scholarship recipients have on average 0.21-0.29 more years of schooling.
This is in line with programs that attempt to reduce direct costs (for example scholarships; see Kremer et al. (2009), and Duflo et al. (2017)) and indirect costs of education (for example conditional cash transfers; see Fiszbein and Schady (2009)).

We find positive effects on measures of cognitive skills, but only for the merit-based approach to targeting. Impacts of the merit scholarships on a “family index” (that is, an index that standardizes the cognitive skill measures and calculates a weighted average3) have an effect size of 0.11 standard deviations (significant at the 10% level); the effects of the poverty-based treatment are close to zero (and not statistically significant). This is consistent with the effects found after the initial three years, suggesting limited fade-out for this outcome.

We do not find any systematic impacts on two measures of socioemotional outcomes: emotional and behavioral difficulties (as measured by the Strengths and Difficulties Questionnaire, “SDQ”) and the “Big 5” personality traits (openness, conscientiousness, extroversion, agreeableness and neuroticism).4 For a “family index” of these outcomes, we find imprecisely estimated impacts of 0.01 (merit-based scholarships) and 0.10 (poverty-based scholarships) standard deviations. The findings therefore support neither the hypothesis that more schooling (necessarily) produces more socioemotional skills, nor the hypothesis that cognitive and socioemotional skills are (necessarily) co-produced.

We find that the probability of working increased by 3.4 percentage points for young adults who had received a merit-based scholarship (significant at the 10% level), but the impact for those who had received a poverty-based scholarship was close to zero (and statistically insignificant). The point estimates for earnings are both negative (but not statistically significant), perhaps because the scholarships induced individuals to delay entry into the labor market.
Finally, we find positive overall impacts on various measures of self-reported well-being, but again only for those who had received merit-based scholarships. For a “family index” of socio-economic status, the point estimate is 0.17 standard deviations (significant at the 1% level) for the merit-based treatment arm; for the poverty treatment, the point estimate is 0.04 standard deviations (and not statistically significant).

3 We follow Anderson (2008) and calculate inverse covariance matrix-weighted averages.

4 We collected information on other socioemotional outcomes such as grit and growth-mindset. However, the psychometric and statistical properties of these measures in our context were weak (Danon et al. 2018).

Overall, both types of scholarships led to more schooling attainment, but only the merit-based scholarships had positive impacts on cognitive, well-being, and labor market outcomes. Neither of the two types of scholarships induced greater socioemotional skills. Two factors are important for interpreting these results. First, they are the marginal effect of increasing schooling by only about four additional months, although these may be critical months, inasmuch as the program induced individuals to finish primary education. But it is possible that some of the key impacts of schooling on socioemotional skills happen early on (when both the control and treatment groups were still in school) or later on in adolescence (when, for this population, both groups would have left school). Second, while attrition is neither especially high nor systematically different across the three groups of students, our relatively limited sample size may nonetheless have reduced the precision of the estimates. Our overall results present a complex picture, suggesting that demand-side interventions, such as scholarships, and their particular targeting approaches can have important long-term effects.
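The “family index” used for the outcome families above is described only as an inverse covariance matrix-weighted average following Anderson (2008). A minimal sketch of one common way to build such an index is given below. The function name, the choice to standardize against the control group, and the final re-scaling step are illustrative assumptions and are not taken from the paper’s own code.

```python
import numpy as np

def anderson_index(outcomes, is_control):
    """Inverse-covariance-weighted summary index in the spirit of Anderson (2008).

    outcomes:   (n, k) array with k >= 2, each column oriented so higher = better.
    is_control: (n,) boolean array marking control-group observations.
    Hypothetical helper for illustration only.
    """
    y = np.asarray(outcomes, dtype=float)
    ctrl = np.asarray(is_control, dtype=bool)
    # Standardize each outcome against the control-group mean and SD.
    z = (y - y[ctrl].mean(axis=0)) / y[ctrl].std(axis=0, ddof=1)
    # Weights are the row sums of the inverse covariance matrix, so outcomes
    # that are highly correlated with the others receive less weight.
    sigma_inv = np.linalg.inv(np.cov(z, rowvar=False))
    weights = sigma_inv.sum(axis=1)
    index = z @ weights / weights.sum()
    # Re-scale so the control group has mean 0 and SD 1 on the index.
    return (index - index[ctrl].mean()) / index[ctrl].std(ddof=1)
```

With an index built this way, a treatment coefficient can be read directly in control-group standard deviations, which is how the effect sizes in this section are expressed.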
The paper is organized as follows: in Section 2 we (selectively) review the related literature, in Section 3 we describe in more detail the study setup and context, in Sections 4 and 5 we describe our estimation strategy and data, in Section 6 we present the results, and in Section 7 we provide some concluding comments.

2 Related Literature

Our study builds on three strands of literature. First, we add to previous research on the effects of demand-side incentive programs in low- and middle-income countries, both in terms of their overall effect and with respect to varying targeting approaches. Second, we contribute to research on whether increased schooling produces outcomes that go beyond cognitive skills, in particular socioemotional outcomes. We furthermore investigate how these might be co-produced. Third, we contribute to the relatively limited literature on the long-term effects of increased school enrollment on outcomes such as employment status or well-being (scant because these impacts only manifest themselves later in life and require a long-term approach to evaluation).

2.1 Demand-side incentives

There is a growing empirical literature on the impact of conditional cash transfers, of which scholarships are one form, in low- and middle-income countries (Baird et al. 2014; Barham et al. 2013; García and Saavedra 2017; Snilstveit et al. 2015).5 Three studies are of particular relevance given their similar designs and scope. First, Friedman et al. (2011) and Jakiela et al. (2015) present experimental evidence on the effects of a Kenyan merit-scholarship program for sixth-grade girls, nine years post-intervention. The studies find that short-term impacts on educational attainment and cognitive skill (initially reported in Kremer et al. (2009)) result in greater female empowerment and improved political knowledge and attitudes in young adulthood (finding weaker evidence for effects on political behaviors). Second, Barham et al.
(2013) evaluate the long-run effects of a conditional cash transfer program targeted to poor families in Nicaragua. They find that boys who were 9-12 at the time of the program attained about half a year more schooling when they were 19-22 than boys in a comparison group, and subsequently had better labor market outcomes (the difference for girls was not statistically different from zero between the treatment and comparison groups). Third, Duflo et al. (2017) evaluate the long-term effects of a secondary school scholarship program in Ghana. This randomized evaluation finds that the program delayed fertility and marriage, improved educational attainment, cognitive skill, and reproductive and health behaviors, and had heterogeneous impacts on earnings. Our study builds on these: the Kenyan study does not report impacts on earnings, and the Ghana and Nicaragua studies investigate few impacts on socioemotional outcomes. Our paper includes indicators for both types of outcomes, and allows for a contrast of targeting approaches (building on Barrera-Osorio and Filmer (2016)). Together, these evaluations inform the degree to which individual findings from specific contexts might have broader external validity (Vivalt 2017).

5 Scholarships may also be designed as incentive mechanisms, where payments are made based on future performance. See Fryer Jr (2011) for related evidence from the US. See Berry (2015), Blimpo (2014), and Li et al. (2014) for examples of related research from India, Benin, and China, respectively. We study scholarships whose payout is not (or arguably, only weakly) incentivized. See Section 3, below, for a description of the scholarship program.
2.2 Socioemotional outcomes

Most of the literature on conditional cash transfers (CCTs) and scholarships focuses on schooling and cognitive skill outcomes, with some exceptions that consider the impact of transfers on political and social factors, household consumption smoothing (Sparrow 2007), labor market outcomes (Araujo et al. 2016; Filmer and Schady 2014; Parker and Vogl 2018; Silva and Sumarto 2015), or health (Cruz et al. 2017). Few studies analyze the impacts on various outcome dimensions simultaneously.

In high-income countries, socioemotional skills have been found to be important predictors of success in school and life in general (see West et al. (2016) for an overview), and the importance of social skills grew in the U.S. labor market between 1980 and 2012 (Deming 2017). Research from the United States suggests that teachers can have large effects on socioemotional outcomes, although a teacher’s productivity in terms of student cognitive achievement is only a weak predictor for her impact on measures of students’ socioemotional outcomes (Blazar 2017; Blazar and Kraft 2017; Jackson et al. 2014; Kraft 2017; Santorella 2017). At the same time, little is known, especially in low- and middle-income countries, about whether increased educational attainment leads to more socioemotional skills, and how this might interact with the formation of cognitive skills. Some analyses have tried to shed light on these relationships. For example, Kyllonen and Bertling (2013) report how participants’ self-reported confidence in mathematics in the 2003 Programme for International Student Assessment (PISA) study was positively correlated with performance. Claro et al. (2016) use a national data set of all tenth-graders in Chile to show that a student’s “growth mindset” can predict academic performance, offsetting socio-economic achievement gaps.
However, these studies cannot identify exogenous variation in schooling and cognitive skill, making causal inferences difficult.6

6 In a well-identified study, Fabregas (2017) investigates the effect of school quality and peer composition on students’ academic performance, perseverance, aspirations, and time-management, in Mexico. But this and related research on peer effects (ibid., for a review) does not shed light on the effects of educational attainment.

2.3 Long-term effects

Research from high-income countries suggests that a common characteristic of the effects of educational interventions is a lack of persistence (or “fade out”), i.e., initial positive effects that diminish in magnitude or disappear altogether over time (Bailey et al. 2017; Protzko 2015). But at the same time, other studies have shown positive effects on long-term outcomes, such as educational attainment, earnings, health outcomes, and (reduced) criminal behavior (Anderson et al. 2009; Carneiro and Ginja 2014; Chetty et al. 2011, 2014; Currie and Thomas 2000; Deming 2009; Dynarski et al. 2013; Frisvold and Lumeng 2011; Garces et al. 2002; Heckman et al. 2010; Ludwig and Miller 2007).

Comparable long-term evidence from low- and middle-income countries is scarce, with the examples from Kenya, Nicaragua and Ghana described above being some of the few. Drawing lessons requires that educational interventions be defined more broadly. Acevedo et al. (2016) exploit a randomized controlled trial to assess the effect of a youth training and internship program in the Dominican Republic, approximately four years after its inception. The authors investigate socioemotional outcomes (including grit and self-esteem), expectations, and labor market outcomes, finding that treatment effects differed substantially by gender. Further, Doyle et al.
(2011) use a randomized experiment to evaluate the impact of a health education program in grades five to seven of Tanzanian primary schools (in combination with health services and community engagement). Six years after the program’s implementation, the study documents improvements in sexual and reproductive health attitudes, knowledge, and behaviors. In addition, Walker et al. (2007) and Gertler et al. (2013) assess the long-term effects of a randomized early childhood stimulation program (in combination with food supplementation) for a small sample of adolescents in Jamaica. The authors find positive effects on anxiety, depressive symptoms, self-esteem, anti-social behavior, attention deficit, hyperactivity, and oppositional behavior, along with impacts on labor market outcomes.7 Finally, both Ozier (2016) and Brudevold-Newman (2016) find positive effects of additional exposure to secondary education on labor market outcomes in Kenya; Brudevold-Newman (ibid.) also demonstrates related delays in childbearing and marriage. A review of the long-term impacts of CCT programs in Latin America (Molina-Millan et al. 2016) concludes that the literature produces very mixed results, with CCTs during the school years resulting in greater cognitive and socioemotional skills and better labor market outcomes in some settings, but not in others.

7 See Krishnan and Krutikova (2013) for another, less well-identified study on the long-term effects of non-cognitive training in a small sample (n = 154) of students in Mumbai, India. The authors (ibid.) find large impacts on self-esteem and self-efficacy, smaller impacts on life evaluation and aspirations, as well as positive impacts on educational attainment and initial labor market outcomes (approximately 11 years after program participation started).

3 Intervention and Experimental Design

In 2008, the Government of Cambodia began implementing a new pilot scholarship program for grade 4 students in 207 public schools. The program’s stated goal was to reduce student drop-out rates and increase primary school completion, though the government also implicitly sought to improve students’ educational performance. At the time, the program’s 207 schools represented all public schools in three of the country’s 25 provinces8 (Mondulkiri, Ratanakiri, and Preah Vihear); the three provinces had been selected for having the highest drop-out rates in the upper primary grades (grades four to six), according to Cambodia’s Education Management Information System (EMIS).9 The program was phased in as a pilot over two years, with a random set of 103 schools starting in 2008/09 and the remaining schools entering in the following year (random assignment was stratified by province).

8 Here, we count the capital as Cambodia’s 25th “province”. More precisely, Phnom Penh is a special administrative district whose administrative characteristics partly resemble those of provinces.

9 To limit the program’s geographic scope, in Ratanakiri, only five of seven districts were selected, choosing those districts with the highest dropout rate. In the remaining two provinces, all districts were selected.

The scholarship program targeted students entering grade 4, using one of two selection approaches. In a randomly selected half of the scholarship schools (52 schools), students were selected based on their combined performance on a test of Khmer and mathematics. This “merit-based” eligibility was determined through a centrally-scored test; the maximum possible score was 25. In the remaining 51 schools, students were selected based on a “poverty-based” approach.
A student’s “poverty score” was determined based on their self-reported (but validated) household and socio-economic characteristics; the poverty index ranges from 0 (richest household) to 292 (poorest household).10 Under both approaches, half of a given school’s fourth-graders qualified (i.e., the top half of performers, or the poorest half of students).11 Crucially (for our study), students in all 207 schools completed both types of assessments, independent of their school’s assignment status.

10 The aptitude test was based on the 2005/06 Grade 3 National Learning Assessment. The poverty assessment asked respondents about household demographics and possession of a list of assets (as provided in Table 2). See Barrera-Osorio and Filmer (2016) for more details on the student assessment and the poverty score.

11 Median students also qualified for the scholarship. The number of scholarships was determined using the previous year’s official enrolment numbers.

Scholarships were offered to beneficiaries for three years (i.e., through the end of primary school), conditional on their continued enrollment, passing grades, and regular attendance. These requirements were moderately enforced.12 Scholarships were disbursed as a lump-sum payment of approximately USD20 in the first year, and two payments of approximately USD10 in each of the following two years. As reported by Barrera-Osorio and Filmer (2016), these amounts represent about 3.3 percent of the yearly per capita expenditure in the study sample. These transfers are small compared to similar programs in other countries (Fiszbein and Schady 2009); even relatively small impacts may therefore be cost-effective.

12 If a student lost her scholarship, its amount could not be re-allocated within the same school and the same year. Instead, the amount would be used for the subsequent cohort of fourth-graders.

Our experimental design exploits the randomized roll-out of the program over its two phases. In 2008/09, during phase one, fourth-graders in schools that were selected to disburse the program in the second phase did not receive any scholarship and did not become eligible in the years thereafter.13 Note that a sub-set of these fourth-grade students would have been eligible under one of the two targeting schemes (merit-based or poverty-based), had their school been selected. In expectation, these two sub-samples are equal to their respective eligible peers from phase-one schools (below, we present supportive evidence that the two groups of students are in fact balanced, across phase-one and phase-two schools). Thus, we can identify the causal intent-to-treat effect of the scholarship program, under either of the two targeting approaches. As phase-one schools were moreover randomly assigned to either the poverty-based or merit-based targeting scheme, we can also compare the scholarship’s effect across the two targeting schemes.

13 Recall that the program required students to maintain passing grades. Thus, a phase-one fourth-grader who attended a control group school could not become eligible in phase two by repeating the grade.

4 Estimation Framework, Internal Validity

We estimate a generic production function model:

$$Y^{j}_{t,i} = \beta_0 + \beta^{j}_1 T^{j}_{0,i} + \mathbf{B}\,\mathbf{X}_{0,i} + \mu_{t,i} \quad \text{for } j = \text{merit or poverty} \tag{1}$$

where $Y$ are outcomes such as educational attainment, cognitive skills, socioemotional skills, labor outcomes, or measures of well-being (which include socio-economic status, SES). Vector $\mathbf{X}_{0,i}$ includes a rich set of baseline characteristics at the school, village, and individual level (the next section describes these measures in greater detail). All estimations include district-level fixed effects and allow for the clustering of standard errors at the assignment level (i.e., within schools; cf. Abadie et al. (2017)).
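As a rough illustration of the specification just described (a treatment-offer indicator, baseline controls, district fixed effects, and standard errors clustered by school), the sketch below estimates Equation 1 by OLS on simulated data. Everything in it is an assumption made for illustration: the function name, the simulated design (40 schools, 4 districts, a true effect of 0.25), and the particular small-sample correction applied to the cluster-robust variance. It is not the authors’ estimation code.

```python
import numpy as np

def itt_ols_clustered(y, treat, controls, district, school):
    """OLS of y on a treatment-offer dummy, controls, and district dummies,
    with standard errors clustered by school (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    district = np.asarray(district)
    school = np.asarray(school)
    n = y.shape[0]
    # District fixed effects as dummies; the first district is the omitted
    # category and is absorbed by the intercept.
    levels = np.unique(district)
    fe = (district[:, None] == levels[None, 1:]).astype(float)
    X = np.column_stack([np.ones(n), treat, controls, fe])
    k = X.shape[1]
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over schools of (X_g' e_g)(X_g' e_g)'.
    clusters = np.unique(school)
    meat = np.zeros((k, k))
    for g in clusters:
        mask = school == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    G = len(clusters)
    # One common finite-sample adjustment; an assumption, not from the paper.
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    vcov = correction * xtx_inv @ meat @ xtx_inv
    return beta[1], np.sqrt(vcov[1, 1])

# Simulated example: the offer is assigned at the school level.
rng = np.random.default_rng(0)
school = np.repeat(np.arange(40), 25)        # 40 schools, 25 students each
district = school % 4                        # 4 districts
treat = ((school // 4) % 2).astype(float)    # varies within every district
x = rng.normal(size=(school.size, 1))        # one baseline control
y = (1.0 + 0.25 * treat + 0.5 * x[:, 0] + 0.1 * district
     + rng.normal(scale=0.1, size=school.size))
b, se = itt_ols_clustered(y, treat, x, district, school)
print(f"ITT estimate: {b:.3f} (clustered SE {se:.3f})")
```

In this sketch the model would be run once per sub-sample (merit-eligible and poverty-eligible students), mirroring the index $j$ in Equation 1.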
Equation 1 estimates an intent-to-treat model, with $\beta_1^j$ capturing the effect of offering the scholarship on outcomes $Y$. Our default approach is to estimate Equation 1 as two separate OLS models, for the merit- and poverty-based sub-samples.14 For annual earnings we use a Tobit model with an inverse hyperbolic sine transformation of the outcome variable because its distribution shows a spike at zero (cf. Duflo et al. 2017).15

14 Using simulations, we compared our strategy to alternatives that would use (a) a regression-discontinuity approach (exploiting the continuous poverty- and merit-indices and their strict cut-off), (b) a difference-in-differences approach, or (c) a difference-in-discontinuity approach (results not shown). All three alternative strategies make additional assumptions and do not lead to increased statistical power.

15 For respondents’ daily reservation wage, we also calculated a two-part regression or “Tobit II” model, where the second part of the model uses a log transformation of the outcome variable (see Belotti et al. 2015). Results do not lead to substantial changes and are available upon request.

For each “family” of outcomes (cognitive skills, socioemotional outcomes, and well-being), we present the results from a test that the treatment coefficients are jointly zero (using seemingly unrelated regressions, SUR). Within these sets of outcomes, we also use SUR to test whether the treatment coefficient for the poverty subsample is equal to that for the merit subsample.

Our sampling frame consists of 5,964 fourth-grade students (in the program’s 207 schools), who participated in the baseline eligibility assessment in December 2008 and January 2009. Of those, 2,996 respondents were randomly selected for the first three-year follow-up survey, in 2011. For this first follow-up, an additional 658 “replacement” students were randomly selected, in case students from the target group could not be found. In the 2016 follow-up, we tracked all students who had participated in the 2011 study, a random subset of 140 respondents who had previously been found to be attritors, and all replacement students who were interviewed in 2011. Our 2016 sample includes 2,252 respondents, of whom 2,024 had been interviewed in 2011, 86 had not been reached previously, and 142 had served as replacements in 2011.

Table 1 provides the control group means for key demographic characteristics, for the “merit” and “poverty” sub-samples (for the control group, these refer to respondents who would have qualified if their school had been assigned to one or the other scholarship approach). Our analysis sample consists of 890 and 825 respondents for the merit-based and poverty-based sub-samples, respectively. Among those, about half (48% and 51%, respectively) are female. On average, respondents live with an additional six household members. Almost all the respondents were already working at the time of the three-year follow-up survey.

The data support the validity of our experimental design. First, we find that both sub-samples are balanced on observables. This holds true for the full set of respondents at baseline, as discussed by Barrera-Osorio and Filmer (2016), and for this paper’s estimation samples (see Tables A1 and A2, in the Appendix). Second, overall attrition is 31 percent for either sub-sample, and we managed to track 88 percent of respondents who were included in the three-year follow-up study (i.e., six years after our last contact with study participants). As shown in Table 2, there are no systematic differences in attrition by treatment group.
Column (5) of Table 2’s “merit scholarship” and “poverty scholarship” panels presents the difference-in-differences between attritors and non-attritors, across respondents in the treatment and control groups (computed by OLS regression and including stratification fixed effects). Only two out of 16 indicators in the merit subsample and only three indicators in the poverty subsample show a statistically significant or marginally significant difference-in-differences; this result is not surprising given multiple comparisons. We also test for the individual coefficients being jointly equal to zero, using seemingly unrelated estimation (SUR); the resulting Chi-square statistics (and corresponding p-values) suggest that we should not reject that the two sub-samples are balanced.

5 Data and Measurement

Our analysis combines data from five main sources. First, we collect outcome data through in-person interviews at the respondents’ residence, using handheld tablets. Second, to construct a variable reflecting intention-to-treat, we use the official government declaration (“Prakas”) of scholarship recipients. Third, we match each respondent to baseline data (application forms and baseline tests) as collected in December 2008 and January 2009. We can thus control for baseline test scores, and for students’ initial household characteristics. Fourth, we construct a vector of control variables through administrative data on baseline school characteristics, as provided by the country’s Educational Management Information System (EMIS).16 Fifth, we take advantage of the fact that Cambodia’s 2008 census was conducted just before the scholarship program started.
Using geographic coordinates, we match each school to its closest village and include this village’s demographic characteristics as additional controls.17

16 We include a binary indicator of whether a school had access to drinking water, a binary indicator of whether the school had a toilet facility, the number of primary school classrooms, the number of newly enrolled fourth-graders, the number of teaching staff, and the school’s income.
17 Village-level data as published by the Cambodian National Institute of Statistics at the Ministry of Planning (2010). We control for the share of villagers who are literate in Khmer, the share of villagers with no schooling, the percentage of villagers engaged in crop or animal farming, the village’s population size, and a continuous measure of villagers’ household assets.

Data collection for the baseline and three-year follow-up occurred from December 2008 to January 2009, and from May to September 2011, respectively. Data collection for our latest round of follow-up ran from December 2016 to May 2017. We sought to ensure data quality by following standard monitoring procedures, as described by Glennerster (2017). First, during the first week of field work, we conducted re-surveys (“back-checks”, usually within three days) at a rate of 30% and then reduced this rate, for an overall back-check rate of 15.7%. Second, we spot-checked approximately 20% of interviews, provided immediate feedback, and offered repeat trainings to enumerators. These spot-checks were conducted not only by field supervisors but also through additional, independent field monitoring. Third, we ran daily analytics on newly collected data to spot irregularities and to identify training needs. Finally, we employed 15% of staff as dedicated quality-control officers, such that steps to improve data quality could be taken immediately, as part of the regular data flow. The following discusses our newly collected outcome measures in greater detail.

As education outcomes, we measure educational attainment (highest grade completed), formal and informal training that lasted for at least one week (a binary variable), and whether the respondent received any formal education since the earlier three-year follow-up (a binary variable).

We also collected data on four measures of cognitive skills. First, we administered a computer-adaptive math test, in which respondents answered ten questions from a larger pool of 23 items.18 We used a three-parameter logistic (3PL) item response theory (IRT) model with a single guessing parameter (Birnbaum 1968; Samejima 1969) to analyze responses to math tests from an evaluation of a similar scholarship program in Cambodia that was targeted to secondary school students (Filmer and Schady 2008). Participants in this assessment had been tested in two rounds, with overlapping items, and we follow the common Stocking and Lord (1983) methodology for IRT-based scale equating.19 Our test begins with the item of median difficulty. As the test is administered and respondents answer correctly or incorrectly, our assessment picks the next item to be displayed based on maximum information, recalculates a respondent’s ability estimate using expected a posteriori (EAP) estimation, and continues until ten items have been administered to each respondent (cf. Bock and Mislevy 1982; van der Linden and Pashley 2010).

18 To the best of our knowledge, this assessment constitutes the first computer-adaptive ability test conducted during a household survey in a developing country.
19 We removed one item with low discrimination.

The second assessment is a test of shapes and puzzles loosely based on the Raven’s Progressive Matrices.
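The adaptive procedure described above for the math test (start at the item of median difficulty, select the next item by maximum information, update the ability estimate by expected a posteriori, stop after ten items) can be illustrated with a minimal sketch. This is not the authors’ implementation: the item parameters in `POOL`, the standard normal prior, the quadrature grid, and all function names are assumptions made for illustration only.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer (a: discrimination, b: difficulty, c: guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

# Quadrature grid for ability, from -4 to 4 (an assumption).
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap(responses):
    """Expected a posteriori ability under an assumed N(0, 1) prior.

    `responses` is a list of ((a, b, c), correct) pairs.
    """
    num = den = 0.0
    for theta in GRID:
        weight = math.exp(-0.5 * theta * theta)  # unnormalised prior density
        for (a, b, c), correct in responses:
            p = p_correct(theta, a, b, c)
            weight *= p if correct else 1.0 - p
        num += theta * weight
        den += weight
    return num / den

def run_cat(pool, answer, n_items=10):
    """Administer n_items adaptively; `answer(item)` returns True or False."""
    remaining = sorted(pool, key=lambda item: item[1])
    first = remaining.pop(len(remaining) // 2)  # item of median difficulty
    responses = [(first, answer(first))]
    theta = eap(responses)
    while len(responses) < n_items and remaining:
        # Next item: the one with maximum information at the current estimate.
        nxt = max(remaining, key=lambda item: information(theta, *item))
        remaining.remove(nxt)
        responses.append((nxt, answer(nxt)))
        theta = eap(responses)
    return theta, [item for item, _ in responses]

# Hypothetical pool of 23 items sharing a single guessing parameter.
POOL = [(1.0, round(-2.2 + 0.2 * k, 1), 0.2) for k in range(23)]
```

Calling `run_cat(POOL, answer)` with a function that records a respondent’s answers returns the final ability estimate and the ten items administered. Because each new item is chosen where it is most informative, two respondents will generally see different items.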
This pattern-recognition test is a measure of fluid intelligence; respondents are asked to complete 15 pattern-recognition sets.

Our third measure is a “Digit Span” test, which asks respondents to repeat sequences of single-digit numbers of increasing length. This test is a common measure of respondents’ working memory (Hamoudi and Sheridan 2015). Sequences are presented in sets of two and begin with two integers (asking respondents to repeat 2-1 and 1-3). No additional sequences are presented if a respondent fails to repeat both prompts in a set; the last set, with the longest sequences, presents two strings of eight integers (asking respondents to repeat 6-9-1-7-3-2-5-8 and 3-1-7-9-5-4-8-2).

The fourth outcome is a vocabulary test based on picture recognition, similar to a Peabody Picture Vocabulary Test (PPVT). This test asks respondents to identify the picture corresponding to a word that the enumerator reads out loud. For each word, the respondent is asked to select from a choice of four pictures. The test is structured such that items become increasingly difficult (examples of easy items include “citrus” and “garment”; items of highest difficulty include “vitreous” and “lugubrious”). A maximum of 96 items is presented in sets of 12, and no additional item is displayed if a respondent fails to answer at least five items correctly in a given set.

The final skill estimates for the math, pattern recognition, and vocabulary recognition tests are calculated with a two-parameter logistic (2PL) IRT model. The Digit Span test score reflects the number of integer sequences a respondent repeated correctly. All four measures are standardized (mean zero and standard deviation of one).

We report on two sets of socioemotional outcomes: we screen for emotional and behavioral difficulties with the Strengths and Difficulties Questionnaire (“SDQ”), and measure the “Big 5” personality traits.
The SDQ represents a common screening instrument; we use (the official Khmer translation of) its most frequently used version with 25 items on psychological attributes (Goodman 1997). Following its scoring guidelines and official recommendations (ibid.), we report on three subscales, separated into ‘internalizing problems’ (emotional and peer symptoms, 10 items), ‘externalizing problems’ (conduct and hyperactivity symptoms, 10 items), and a scale of prosocial behavior (5 items). To capture respondents’ personality traits, we use the Big Five scale, which measures five core dimensions of personality. The five broad personality traits measured are extraversion, agreeableness, openness, conscientiousness, and neuroticism. Evidence of the Big Five as being relevant (and associated with life outcomes) has been growing, beginning with the research of Fiske (1949) and later expanded upon by other researchers including Norman (1967), Smith (1967), Goldberg (1981), and McCrae and Costa (1987). We use the short 15-item Big Five Inventory (BFI-S) (Lang et al. 2011), with three items per personality trait. Like the indicators of cognitive skill, all measures of socioemotional outcomes are standardized.20

We also collected information on five labor market outcomes. We ask whether a respondent is currently working (yes or no) and the age at which she or he first started working. We moreover construct a binary indicator of whether a respondent’s main work activity is cognitively demanding. We categorize an occupation as such if it requires at least occasional use of reading, writing, mathematics, or a computer (according to the respondent).
Our survey also asked for respondents’ income; our analysis reports on (the inverse hyperbolic sine of) yearly earnings and (the inverse hyperbolic sine of) a respondent’s daily reservation wage, i.e., the minimum wage or payment for which a respondent is willing to accept work (both are reported in US dollars, a currency commonly used in Cambodia).

Our last set of outcomes includes six indicators of socio-economic status and well-being. We assess subjective social status using a “MacArthur community ladder”.21 Respondents were shown a picture of a ladder with ten rungs and were told that higher rungs correspond to higher socio-economic status. They were then asked to place themselves on this ladder in relation to everyone in their community. As a second measure of socio-economic status, we construct an index of respondents’ household assets, asking whether they possess items from a list similar to the one presented in Table 2.

20 For further discussion on these measures, and their psychometric and statistical properties, see Danon et al. (2018). In addition to the measures we report on here, we collected data on respondents’ level of grit (Duckworth and Quinn 2009) and their growth mindset (Dweck 2000). We do not report on results for these measures because of their poor psychometric properties in our data. The inclusion of either of these measures does not substantively change our results.
21 For a description and bibliography of papers that use MacArthur ladders, see the MacArthur Foundation’s Network on SES and Health website: http://www.macses.ucsf.edu/Research/Psychosocial/subjective.php.
To calculate an individual’s latent SES score, we borrow from the psychometric literature and estimate a two-parameter logistic (2PL) IRT model, placing responses from 2009, 2011 and 2016 on the same scale.22 We also asked respondents to rate their satisfaction with life at present, all things considered, on a scale from one (“completely dissatisfied”) to ten (“completely satisfied”) and to rate their quality of life and health, respectively, on a scale from one (“poor”) to five (“excellent”). The last measure screens for (minor) mental health disorders, using the General Health Questionnaire (“GHQ”). We use the short form of the questionnaire (GHQ-12) with Likert scoring (Goldberg and Williams 2006; Quek et al. 2001). All six measures are standardized (mean zero and standard deviation of one).23

Finally, for each set of educational outcomes, cognitive outcomes, socioemotional outcomes, and SES and subjective well-being, we also calculate an overall “family index,” following Anderson (2008).24 These indices have the benefit of reducing the number of statistical tests (and the temptation to selectively focus on positive results). In constructing the indices, we ensured that the qualitative “direction” of the construct was preserved: higher values point to more desirable outcomes. However, our index construction is atheoretical and may therefore group together measurements with different underlying constructs. We therefore present and discuss results from both individual measurements and the family indices.

22 Filmer and Scott (2012) show that such an IRT approach produces similar household rankings when compared to other aggregation methods.
23 We standardize by focusing on the endline measures for control group students (who would have qualified for at least one of the two types of scholarships, had they been in a treatment school instead).
24 We also considered using an alternative index instead, following Kling et al. (2007). The alternative approach does not lead to qualitatively different conclusions.

6 Results

Tables 3 to 7 present results on five main categories of outcomes: education; cognitive skills; socioemotional outcomes; socio-economic and well-being outcomes; and labor outcomes. The tables share a common structure. Each table has two panels; Panel A reports results for the merit sample, whereas Panel B reports results for the poverty sample. Each panel presents separate regressions for a given dependent variable, as stated in the column headers. For the treatment variable (1 if assigned to treatment, 0 otherwise), the table presents regression coefficients and standard errors. All regressions control for covariates at baseline and district fixed effects; standard errors are clustered at the level of randomization (the school) (Abadie et al. 2017). Each of the two panels also presents the unconditional mean, as observed for the control group. Each panel moreover includes the results from a chi-square test of the null hypothesis that all treatment coefficients are jointly zero, using seemingly unrelated regression (SUR). Finally, across the two panels and for each of the outcome variables, we present results (in the bottom two rows of the tables) for a test of the null hypothesis that the two treatment coefficients (merit and poverty) are equal.

6.1 Education

The main stated objective of the program was to increase school progression of low-income individuals. Early dropout from primary school is still a major obstacle in education in Cambodia, especially in rural areas. At inception of the program, only about 40% of individuals in the poorest income quintile completed 6th grade (Barrera-Osorio and Filmer 2016). As such, the first outcome that the program aimed to change was school progression, with an immediate goal of helping students successfully complete primary school (grade 6).
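The core of the common specification described above for Tables 3 to 7 (an outcome regressed on a treatment indicator, with standard errors clustered at the school level) can be sketched as follows. This is a simplified illustration, not the authors’ estimation code: it omits the baseline covariates and district fixed effects, uses the basic cluster-robust sandwich formula without any small-sample correction, and assumes that every school is entirely in one arm, as under school-level randomization. The function name `itt_with_clustered_se` is hypothetical.

```python
from collections import defaultdict
from math import sqrt

def itt_with_clustered_se(y, treat, school):
    """OLS of y on an intercept and a 0/1 treatment dummy, clustered by school.

    Assumes each school is entirely treated or entirely control.
    Returns (treatment coefficient, cluster-robust standard error).
    """
    n_t = sum(treat)
    n_c = len(y) - n_t
    mean_t = sum(yi for yi, ti in zip(y, treat) if ti == 1) / n_t
    mean_c = sum(yi for yi, ti in zip(y, treat) if ti == 0) / n_c
    # With a single dummy regressor, the OLS slope is the difference in means.
    beta = mean_t - mean_c

    # Sum the OLS residuals within each school (cluster).
    resid_sum = defaultdict(float)
    arm = {}
    for yi, ti, si in zip(y, treat, school):
        resid_sum[si] += yi - (mean_t if ti == 1 else mean_c)
        arm[si] = ti

    # For this two-parameter model the sandwich variance of the slope reduces
    # to the sum over schools of (residual sum / arm sample size) squared.
    var = sum((r / (n_t if arm[s] == 1 else n_c)) ** 2
              for s, r in resid_sum.items())
    return beta, sqrt(var)
```

Clustering matters here because outcomes of students in the same school are likely correlated; treating them as independent would tend to understate the uncertainty of the estimate. Statistical packages often apply a small-sample correction, so their reported standard errors may differ somewhat from this sketch.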
Table 3 presents results for school progression (highest grade attained), primary school graduation (a zero-or-one variable), an indicator of whether the respondent received any formal education since the three-year follow-up study (in 2011), and a “family index” of the previous three measures (measured in standard deviations, SDs). On average, students in the control group completed 5.45-5.57 grade levels. Both types of scholarships increased educational attainment, with similar point estimates (0.213 and 0.291 for merit and poverty scholarships, respectively, equivalent to about four additional months of schooling). The effects on overall attainment three years after the start of the program, as reported by Barrera-Osorio and Filmer (2016, Table 4, column 3), were slightly higher for the poverty sample (0.34) and similar for the merit sample (0.23), indicating some catch-up by the control group. The scholarships increased primary school completion by 5.0 and 11.3 percentage points (pp) for the merit- and poverty-based approaches, respectively (statistically significant for poverty-based scholarships). The point estimates for impacts on participation in any formal education (between 2011 and 2016) are positive and statistically significant for both groups: merit-based scholarships increased participation by 4.4pp over a control-group average of 77%, and poverty-based scholarships increased it by 10pp over a control-group average of 71%, suggesting some catch-up with the merit-based scholarship group. The joint test of all coefficients being equal to zero is rejected for both treatments (p-values of 0.06 and 0.04 for the merit and poverty treatments, respectively).
Both point estimates of the regression with the “family index” as dependent variable are positive and statistically significant (at the 5% and 1% levels, respectively), with a point estimate of 0.131 standard deviations for the merit-based and of 0.264 standard deviations for the poverty-based scholarships. However, we cannot conclude that the two coefficients are in fact different (the p-value corresponding to this test is above 0.10).

6.2 Cognitive skills

An implicit objective of the program was to induce an increase in students’ learning by encouraging greater attendance and retention, that is, by inducing additional schooling. We measure cognitive skills through proxies that relate to an individual’s knowledge, ability to tackle problems, and fluid intelligence. Unsurprisingly, the control group for the merit sample has higher average test scores on these measures than the control group for the poverty sample (Table 4).

Table 4 presents the impacts of scholarships on these measures of cognitive skills. Across the different measures, we find suggestive evidence of positive effects for the merit-based treatment. All coefficients are positive, and two of them are statistically significant (Raven’s and the overall “family index”). The estimation suggests an overall effect of 0.113 standard deviations on these measures (significant at the 10% level). In contrast, the results for the poverty-based scholarship are either close to zero or even negative, in the case of the Forward Digit Span (a point estimate of -0.129 SDs, significant at the 10% level; the difference from the effect for the merit-based transfer is significant at the 5% level). The “family index” is close to zero in the case of the poverty scholarship (<0.01 standard deviations).

The findings here, nine years after program inception, are consistent with those documented in the previous three-year follow-up study.
In that study, merit-based scholarship recipients scored higher in mathematics and on the Digit Span test (Barrera-Osorio and Filmer 2016), whereas poverty-based scholarship recipients did not.

6.3 Socioemotional skills

An important contribution of our study is the analysis of effects on socioemotional skills. We are not only interested in measuring the effects of scholarships on these measures; we are also interested in the relationship between cognitive and socioemotional skills. The intuition behind this analysis has two parts. First, if scholarships induced more schooling for both types of scholarship recipients, then, under the assumption that schools also “produce” socioemotional skills, we should observe effects on these skills from both poverty- and merit-based scholarships. In contrast, if there is a complementary relationship in the accumulation of cognitive and socioemotional skills, we should observe effects on socioemotional skills only for students with the merit-based scholarship, and not for students with the poverty-based scholarship. We formally present these relationships in the next paragraphs.

Our approach is based on two different conceptual models of the relationships between years of education ($E$), cognitive skills ($C$), and socioemotional skills ($S$). As a starting point, based on the evaluation three years after the program’s inception (Barrera-Osorio and Filmer 2016), we know that treatment $T_0$ (at baseline, $t = 0$) increased years of schooling for both merit- and poverty-based scholarships ($E_t = f(T_0; X_0, Z_0)$; $\partial E_t / \partial T_0 > 0$, for both types of scholarships). Furthermore, the evaluation showed a causal, positive effect of the intervention on cognitive skills for the merit-based scholarship only ($C_t^M = f(T_0^M; X_0, Z_0)$, $\partial C_t^M / \partial T_0^M > 0$), and zero effects for the poverty-based scholarship ($C_t^P = f(T_0^P; X_0, Z_0)$, $\partial C_t^P / \partial T_0^P = 0$), where $M$ denotes merit-based treatment and $P$ denotes poverty-based treatment.
The first conceptual relationship we explore is that between each type of skill (cognitive and socioemotional) and years of education:

\[ C_t = g(E_t; X_0, Z_0), \qquad S_t = g(E_t; X_0, Z_0), \]

where $X_0$ are student characteristics and $Z_0$ are school inputs (at baseline). These equations state that the effect on either set of skills is a function of the years of education; i.e., exposure to more schooling will induce higher cognitive and socioemotional skills. Therefore, the first set of relationships we investigate are:

\[ \frac{\partial C}{\partial T} = \frac{\partial C}{\partial E} \cdot \frac{\partial E}{\partial T} > 0 \tag{2} \]

and

\[ \frac{\partial S}{\partial T} = \frac{\partial S}{\partial E} \cdot \frac{\partial E}{\partial T} > 0 \tag{3} \]

If schooling produces cognitive and socioemotional skills, both Equations 2 and 3 are positive, independently of the type of treatment (merit or poverty).

In contrast, the second conceptual relationship is based on a modification of this setup: for the merit-based scholarship we have an additional equation, relating cognitive skills and treatment:

\[ C_t^M = f(T_0^M) \tag{4} \]

i.e., treatment induced higher cognitive skills only for the merit ($M$) treatment. The basic relationship of interest is between socioemotional skills and cognitive skills:

\[ S_t^M = g(C_t^M, E_t; X_0, Z_0) \]

The second relationship we investigate is therefore:

\[ \frac{\partial S_t^M}{\partial T_0^M} = \frac{\partial S_t^M}{\partial C_t^M} \cdot \frac{\partial C_t^M}{\partial T_0^M} + \frac{\partial S_t^M}{\partial E_t} \cdot \frac{\partial E_t}{\partial T_0^M} > 0 \tag{5} \]

i.e., that the effect of treatment on socioemotional skills is positive, and it depends on the effect of cognitive skills on socioemotional skills ($\partial S_t^M / \partial C_t^M$) and on the indirect effect of higher exposure to more schooling ($\partial S_t^M / \partial E_t$). If there is complementarity (or co-production) between cognitive and socioemotional skills (i.e., $\partial S_t^M / \partial C_t^M > 0$), then $\partial S_t^M / \partial T_0^M > 0$.

For the case of the poverty-based scholarship ($P$), the corresponding expression is:

\[ \frac{\partial S_t^P}{\partial T_0^P} = \frac{\partial S_t^P}{\partial E_t} \cdot \frac{\partial E_t}{\partial T_0^P} \tag{6} \]

since $\partial C_t^P / \partial T_0^P = 0$.

There are three main relevant cases for Equations 5 and 6. If exposure to school in and of itself produces socioemotional skills, both Equations 5 and 6 are positive.
If exposure to schooling does not produce socioemotional skills, Equation 6 is equal to zero. Finally, under complementarities between cognitive and socioemotional skills (e.g., if cognitive skills help in the acquisition of socioemotional skills, or if they are co-produced), Equation 5 is positive, independent of the relationship between socioemotional skills and exposure to school.

Table 5 presents results on the Strengths and Difficulties Questionnaire (SDQ), separating out the three attributes (prosocial, internalizing, and externalizing), and on the Big 5, separating by its five traits: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism (OCEAN).

Overall, we find little evidence of impacts on the subcomponents of these two groups of socioemotional outcomes. All point estimates are close to zero, with the exception of Neuroticism for the poverty treatment, with a coefficient of 0.186 standard deviations (statistically significant at the 1% level). Nevertheless, the coefficients for the impact on “family indices” for both treatments are close to zero (-0.005 and -0.099 standard deviations for the merit- and poverty-based treatment, respectively), and neither coefficient is statistically significant.

The broad pattern of the table suggests that the program did not produce effects on socioemotional skills, despite the observed impact on school progression and, for the merit sample, on cognitive outcomes. We cannot rule out competing hypotheses such as low marginal exposure to schooling (i.e., the treatment groups increased their educational attainment by only about four months, on average). We note, however, that this amount of additional schooling was sufficient to produce improved cognitive performance among the recipients of merit-based scholarships.
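To make the three cases for Equations 5 and 6 concrete, the following minimal sketch evaluates both expressions under purely hypothetical values of the partial derivatives. None of the numbers are estimates from this study; they are chosen only to show the sign pattern that each case implies, and the function and variable names are illustrative.

```python
def effect_on_socioemotional(dS_dC, dC_dT, dS_dE, dE_dT):
    """Total effect of treatment on S: dS/dC * dC/dT + dS/dE * dE/dT.

    With dC_dT > 0 this is Equation 5 (merit); with dC_dT = 0 it
    collapses to Equation 6 (poverty).
    """
    return dS_dC * dC_dT + dS_dE * dE_dT

# Hypothetical first-stage effects: both scholarships raise schooling,
# but only the merit-based scholarship raises cognitive skills.
DE_DT = 0.25
DC_DT_MERIT = 0.125
DC_DT_POVERTY = 0.0

# Hypothetical values of (dS/dC, dS/dE) for the three cases in the text.
CASES = {
    "schooling itself produces S": (0.0, 0.5),
    "schooling does not produce S": (0.0, 0.0),
    "complementarity between C and S": (0.5, 0.0),
}

results = {}
for label, (dS_dC, dS_dE) in CASES.items():
    merit = effect_on_socioemotional(dS_dC, DC_DT_MERIT, dS_dE, DE_DT)
    poverty = effect_on_socioemotional(dS_dC, DC_DT_POVERTY, dS_dE, DE_DT)
    results[label] = (merit, poverty)
    print(f"{label}: Equation 5 = {merit:.4f}, Equation 6 = {poverty:.4f}")
```

In the third case only the merit-based expression is positive, which is the pattern that would point to complementarity rather than to schooling alone.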
6.4 Labor outcomes

Table 6 presents the effects of the program on current labor status (coded as one if the respondent is “currently working”, and zero otherwise); the respondent’s age when they started working (which captures child labor); whether the recipient participated in work-related training that lasted for at least one week (formal or informal, a zero-or-one variable); the cognitive demands of the respondent’s main work activity; and two measures of income: yearly earnings and the daily reservation wage (both transformed using an inverse hyperbolic sine).

There is a positive impact on the probability of working for recipients of the merit-based scholarships (3.4 percentage points, statistically significant at the 10% level); the point estimate for the poverty arm is lower (1.2 percentage points) and not statistically significant. These effects are relative to a high baseline level of people who report currently working (the means of both control groups are around 92%).25

25 Of those, approximately 84% report agriculture, fishery, or forestry as their main field of work. Approximately 90% report engaging in agriculture, fishery, or forestry among their overall labor activities. We follow the International Standard Classification of Occupations (ISCO); these individuals’ occupation falls into ISCO Major Group 6 (“skilled agricultural and fishery workers”).

Respondents in our two samples started to work very early in life, when they were between 12 and 13 years old. Respondents who were offered a scholarship delayed entering the labor market by 0.074 and by 0.339 years for the merit- and poverty-based program, respectively, in line with the results for school progression. However, these estimates have large standard errors and are not significantly different from zero. In addition, while about 58% of control group respondents report having received formal or informal training since 2011 (which could have improved their work prospects), we do not see any effects on this outcome. Only a small share of respondents (less than 18%) engage in economic activities that are cognitively demanding. There is no evidence of impact on the cognitive demands of the main work activity for either of the two treatments.

The point estimates on yearly earnings are negative for both treatment arms (but not statistically significant); one potential explanation is that the scholarship program delayed entry into the market for recipients and, as a result, they have less experience than non-recipients. We observe a positive impact on the daily reservation wage for both groups; however, the estimates are very imprecise.

6.5 Well-being outcomes

Table 7 presents effects on various measures of well-being. These include both self-assessed measures and more readily observed indicators such as measures of household asset ownership. All these outcomes are standardized to have a mean of zero and a standard deviation of one. As for some of the previous tables, we also present results for a standardized “family index” of these measures.

Both treatments caused a positive impact on perceived status as measured by the SES ladder, with point estimates of 0.173 and 0.208 standard deviations for merit-based and poverty-based scholarships, respectively. In addition, merit-based scholarships resulted in statistically significant positive impacts on respondents’ SES Index (i.e., ownership of household assets; 0.186 standard deviations), quality of health (0.129 standard deviations) and on the “family index” (0.174 standard deviations). Other than the SES ladder, none of the other impacts for the poverty-based scholarships are statistically significantly different from zero, although most of the point estimates are positive. We reject the null hypothesis, for both targeting approaches, of all estimators being equal to zero (at the 1% level of significance).
The point estimate for the impact on the overall “family index” for merit-based scholarships is substantially higher than for the poverty-based ones (and statistically significantly so; p-value = 0.168).

6.6 Heterogeneity

We investigate two types of heterogeneous effects. Both sets of analyses use the “family indices” for education, cognition, socioemotional outcomes, and well-being outcomes; as an indicator of labor outcomes, we use a respondent’s daily reservation wage.

The first analysis of heterogeneous effects compares the impact of scholarships for respondents who would have qualified for a scholarship under either of the two targeting schemes. For those individuals, the scholarship only differs in terms of its name or “label”. To investigate this, we estimate a regression that includes the treatment dummy, an indicator for whether a respondent would not have qualified under the other scheme, and the interaction between the treatment and this indicator. Of interest is a comparison of the two direct treatment coefficients (the first rows of Panels A and B of Table 8), as these reflect the impact of the scholarships on students who were both high merit and high poverty. A difference in point estimates indicates a labeling effect.26

26 This labeling effect was shown to be present for impacts on cognitive skills after three years of program implementation (Barrera-Osorio and Filmer 2016).

The results are consistent with heterogeneous impacts by treatment label, favoring the merit-based presentation of scholarships over their poverty-based presentation. The key pattern in Table 8 is that for the indices other than education participation and socioemotional outcomes, the coefficients on “Treatment” for the merit-based scholarships are substantively larger than those for “Treatment” for the poverty-based scholarships. For example, for cognitive skills the impact of merit-based scholarships on high-merit high-poverty students is 0.211 (statistically significantly different from zero) whereas the impact of poverty-based scholarships on high-merit high-poverty students is 0.056 (not statistically significantly different from zero). The difference between these coefficients is 0.155 for cognitive skills, 0.144 for well-being, and 0.158 for reservation wage. By contrast, the impacts on education participation and socioemotional outcomes are more similar (the differences are -0.069 and 0.095, respectively).

We interpret these results as suggesting that the labeling effect that was apparent in the earlier three-year follow-up study remains and is apparent in dimensions not documented before (well-being and reservation wage). However, we recognize that imprecision in the estimates makes it hard to be confident about this finding: the only difference in coefficient estimates that is statistically significantly different from zero is that for reservation wage (p-value of 0.018).27

The second dimension of heterogeneity we investigate is that by gender. In Table 9, we present results from regressions of the dependent variables on a treatment indicator, a gender indicator (female = 1, and zero otherwise), and their interaction. The table also assesses whether the size of the impact is different across targeting types for boys (as indicated by a Chi-square test and its corresponding p-value in the last two rows of the table). The results are mixed and should be interpreted with caution since, as in the discussion of Table 8, some of the point estimates suffer from large standard errors. We do not find gender-differential impacts on a beneficiary’s educational attainment, socioemotional outcomes, or well-being (whether within or across the two programs and samples). In contrast, there are large differences in the effect of poverty-based transfers on cognitive skills.
While the impact on boys is positive (with an effect size of 0.168), the impact on girls is negative (with an effect size of -0.141 = 0.168 - 0.309), and the difference is statistically significantly different from zero. Unlike in the average effect, the results reveal a positive impact of poverty-based scholarships for male recipients. Finally, Column (5) suggests that the estimated impact on a recipient’s daily reservation wage comes from the impact for male recipients; for females, the point estimate is close to zero for either of the two programs.

27 It is notable that the impacts of the merit-based scholarships on cognitive skills and reservation wage appear to be largely driven by poor individuals (as indicated by the size of the coefficient for the interaction term).

7 Conclusions

This study has investigated the long-term impacts of increased schooling, with a particular focus on potential complementarities across schooling, the development of cognitive skills, and socioemotional and labor market outcomes later in life. To this end, we evaluated the long-term effects of a primary school scholarship program in rural Cambodia, nine years after the program’s inception, tracking study participants when they were, on average, 21 years old.

Overall, we find that the targeting approach matters for the impact on cognitive skills, socio-economic status, and well-being. The merit-based and poverty-based targeting schemes both led to increased schooling, but only the merit-based scholarship led to improvements in cognitive skill and to greater well-being. There is limited evidence of systematic differences across outcomes in these long-term impacts by gender.

Our study points to potentially important avenues for research and policy.
Prior work argues that more schooling does not necessarily imply more learning (World Bank, 2018); in turn, our work highlights that more schooling, even if it enhances learning, may not necessarily translate to noticeable changes in labor market outcomes and may not lead to measurable improvements in socioemotional skills. To better understand this puzzle, additional research is needed in at least three areas.

First, our analysis of heterogeneous effects provides suggestive evidence that labor market effects may be concentrated among poorer beneficiaries who are male. This result echoes the findings of Duflo et al. (2017), who find labor market effects for a subset of male students only. It will be important to understand how programs such as these can be designed in a way such that they also fully benefit female recipients.

Second, our findings are consistent with research by Jackson (2018), which suggests that the school-based production of cognitive skills may not necessarily go hand-in-hand with improvements in socioemotional outcomes. However, research on how to purposefully foster socioemotional skills in school settings is only in its infancy, especially in developing countries (see West et al. 2016).

Third, our reported lack of impacts on socioemotional skills may be at least partially driven by a lack of precision; we would encourage other researchers to improve upon our study through continued work on the measurement of socioemotional outcomes in low-income countries (such as Laajaj and Macours 2017) and through similar, long-term evaluations with larger samples.

Tables

Table 1: Balance at Baseline and Three-year Follow-up

| Variable | Merit: n (1) | Merit: All (2) | Merit: Treatment (3) | Merit: Control (4) | Merit: Difference (5) | Poverty: n (1) | Poverty: All (2) | Poverty: Treatment (3) | Poverty: Control (4) | Poverty: Difference (5) |
|---|---|---|---|---|---|---|---|---|---|---|
| Panel A. Baseline Characteristics | | | | | | | | | | |
| Age | 822 | 11.882 (2.094) | 12.034 (2.066) | 11.732 (2.114) | 0.358 (0.24) | 776 | 11.927 (2.043) | 12.048 (2.028) | 11.799 (2.054) | 0.353 (0.237) |
| Female | 828 | 0.504 (0.5) | 0.507 (0.501) | 0.5 (0.501) | 0.005 (0.035) | 786 | 0.587 (0.493) | 0.64 (0.481) | 0.53 (0.5) | 0.098*** (0.035) |
| Number of minors in HH | 809 | 1.686 (1.103) | 1.631 (1.092) | 1.742 (1.112) | -0.123 (0.123) | 770 | 1.821 (1.086) | 1.856 (1.052) | 1.783 (1.122) | 0.081 (0.12) |
| Poverty Index (0-292) | 858 | 210.88 (60.471) | 204.417 (66.586) | 217.557 (52.676) | -10.614 (8.141) | 790 | 244.295 (32.826) | 242.447 (34.06) | 246.239 (31.404) | -1.2 (4.482) |
| Test score (0-25) | 858 | 19.751 (3.098) | 19.798 (2.869) | 19.701 (3.321) | 0.194 (0.481) | 790 | 18.229 (4.744) | 18.686 (4.729) | 17.748 (4.717) | 1.097 (0.672) |
| Panel B. Follow-up Characteristics | | | | | | | | | | |
| Age | 797 | 15.238 (2.202) | 15.101 (2.269) | 15.382 (2.123) | -0.208 (0.244) | 718 | 15.355 (3.063) | 15.082 (2.092) | 15.643 (3.809) | -0.501* (0.287) |
| Female | 797 | 0.497 (0.5) | 0.506 (0.501) | 0.487 (0.5) | 0.019 (0.034) | 718 | 0.568 (0.496) | 0.633 (0.483) | 0.5 (0.501) | 0.124*** (0.034) |
| Number of minors in HH | 797 | 2.592 (1.645) | 2.595 (1.547) | 2.59 (1.743) | 0.058 (0.148) | 718 | 2.687 (1.659) | 2.557 (1.539) | 2.823 (1.77) | -0.213 (0.148) |
| HH size | 797 | 7.12 (2.545) | 7.209 (2.501) | 7.028 (2.591) | 0.284 (0.227) | 718 | 7.088 (2.401) | 6.897 (2.22) | 7.289 (2.566) | -0.294 (0.218) |
| Married | 503 | 0.074 (0.261) | 0.076 (0.266) | 0.071 (0.257) | 0.017 (0.031) | 454 | 0.07 (0.256) | 0.068 (0.252) | 0.073 (0.261) | 0.001 (0.031) |
| Currently Working | 826 | 0.903 (0.296) | 0.895 (0.307) | 0.911 (0.285) | -0.012 (0.027) | 760 | 0.872 (0.334) | 0.857 (0.35) | 0.889 (0.315) | -0.032 (0.032) |

Notes: Minors refers to respondents age 14 and under; this may include the respondent. HH size refers to the number of people living in the respondent’s household, including the respondent. Married is a dummy equal to 1 if the respondent is currently married and 0 if never married, divorced or separated. This variable is missing for minors. Currently working is a dummy equal to 1 if the respondent worked during the last week or has a job at the moment and 0 otherwise; respondents may work and also be a student. Column (1) presents the number of observations in the analysis sample. Columns (2) to (4) display the means for the full sample, the treatment group, and the control group, respectively. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1.

Table 2: Analysis of Differential Attrition

| Variable | Merit: Attritor C (1) | Merit: Attritor T (2) | Merit: Non-attritor C (3) | Merit: Non-attritor T (4) | Merit: Diff-in-Diffs (5) | Poverty: Attritor C (1) | Poverty: Attritor T (2) | Poverty: Non-attritor C (3) | Poverty: Non-attritor T (4) | Poverty: Diff-in-Diffs (5) |
|---|---|---|---|---|---|---|---|---|---|---|
| Female | 0.4 (0.49) | 0.38 (0.49) | 0.5 (0.5) | 0.51 (0.5) | -0.03 (0.06) | 0.46 (0.5) | 0.4 (0.49) | 0.53 (0.5) | 0.65 (0.48) | -0.16** (0.07) |
| Number of minors | 1.7 (1.15) | 1.52 (1.08) | 1.74 (1.12) | 1.65 (1.1) | -0.09 (0.16) | 1.88 (1.12) | 1.89 (1.16) | 1.78 (1.13) | 1.85 (1.06) | -0.07 (0.18) |
| Own motorcycle | 0.35 (0.48) | 0.37 (0.48) | 0.43 (0.5) | 0.42 (0.49) | 0.02 (0.07) | 0.27 (0.45) | 0.29 (0.46) | 0.27 (0.45) | 0.23 (0.42) | 0.04 (0.06) |
| Own car/truck | 0.13 (0.33) | 0.1 (0.3) | 0.13 (0.33) | 0.18 (0.38) | -0.06 (0.05) | 0.05 (0.23) | 0.04 (0.19) | 0.03 (0.17) | 0.05 (0.21) | -0.02 (0.03) |
| Own oxen/buffalo | 0.45 (0.5) | 0.44 (0.5) | 0.54 (0.5) | 0.58 (0.49) | -0.03 (0.08) | 0.38 (0.49) | 0.4 (0.49) | 0.37 (0.48) | 0.51 (0.5) | -0.07 (0.08) |
| Own pig | 0.49 (0.5) | 0.42 (0.5) | 0.56 (0.5) | 0.6 (0.49) | -0.12* (0.07) | 0.45 (0.5) | 0.48 (0.5) | 0.42 (0.49) | 0.55 (0.5) | -0.06 (0.08) |
| Own ox or buffalo cart | 0.22 (0.41) | 0.22 (0.41) | 0.29 (0.46) | 0.31 (0.46) | 0.01 (0.06) | 0.18 (0.39) | 0.23 (0.42) | 0.17 (0.38) | 0.26 (0.44) | -0.01 (0.08) |
| Hard roof | 0.34 (0.48) | 0.43 (0.5) | 0.5 (0.5) | 0.58 (0.49) | 0.01 (0.07) | 0.24 (0.43) | 0.4 (0.49) | 0.36 (0.48) | 0.36 (0.48) | 0.17** (0.08) |
| Hard wall | 0.47 (0.5) | 0.59 (0.49) | 0.57 (0.5) | 0.56 (0.5) | 0.12* (0.07) | 0.35 (0.48) | 0.4 (0.49) | 0.4 (0.49) | 0.42 (0.49) | 0.05 (0.08) |
| Hard floor | 0.85 (0.36) | 0.82 (0.39) | 0.83 (0.37) | 0.92 (0.27) | -0.11** (0.05) | 0.85 (0.36) | 0.86 (0.35) | 0.78 (0.41) | 0.82 (0.38) | -0.01 (0.05) |
| Have automatic toilet | 0.05 (0.22) | 0.06 (0.24) | 0.06 (0.23) | 0.05 (0.22) | 0.02 (0.03) | 0.02 (0.14) | 0.03 (0.17) | 0.01 (0.12) | 0 (0.05) | 0.02 (0.02) |
| Have pit toilet | 0.1 (0.3) | 0.11 (0.32) | 0.14 (0.35) | 0.14 (0.34) | 0.02 (0.05) | 0.08 (0.28) | 0.19 (0.4) | 0.12 (0.32) | 0.12 (0.32) | 0.1* (0.06) |
| Electricity | 0.23 (0.42) | 0.23 (0.42) | 0.25 (0.43) | 0.22 (0.42) | 0.02 (0.07) | 0.18 (0.39) | 0.18 (0.38) | 0.17 (0.38) | 0.15 (0.35) | 0.02 (0.05) |
| Piped water | 0.05 (0.22) | 0.06 (0.24) | 0.05 (0.22) | 0.05 (0.21) | 0.01 (0.04) | 0.04 (0.19) | 0.03 (0.17) | 0.02 (0.15) | 0.02 (0.12) | 0 (0.02) |
| Poverty Index (0-292) | 221.83 (53.02) | 220.91 (52.46) | 217.84 (51.9) | 205.24 (66.18) | 9.93 (9.95) | 245.18 (32.17) | 243.56 (34.38) | 246.17 (31.89) | 242.13 (34.08) | -0.51 (5.99) |
| Test score (0-25) | 19.72 (3.24) | 19.86 (3.3) | 19.79 (3.23) | 19.83 (2.85) | 0.01 (0.49) | 17.46 (4.87) | 18.44 (5.08) | 17.87 (4.69) | 18.63 (4.78) | 0 (0.83) |
| Observations | 201 | 153 | 378 | 417 | 1149 | 190 | 135 | 341 | 390 | 1056 |

Attrition rate: 0.31 (merit sample); 0.31 (poverty sample).
Joint significance (H0: all coef. = 0): Chi-square 19.93, p-value 0.22 (merit sample); Chi-square 18.59, p-value 0.29 (poverty sample).

Notes: All variables measured at baseline. Columns (1) to (4) display the means for the control group attritors, the treatment group attritors, the control group surveyed and the treatment group surveyed. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean among attritors minus the difference between the treatment group mean and the control group mean among respondents. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value) is the result of testing whether the individual coefficients are jointly equal to 0, using seemingly unrelated estimation.
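As a rough illustration of how Column (5) of Table 2 is constructed, the sketch below computes the raw difference-in-differences for the Female row directly from the reported group means. It is only an approximation under stated assumptions: it ignores the province fixed effects and the school-level clustering described in the notes, and it works from rounded means, so its output need not coincide with the reported estimates. The function name `attrition_did` is ours and does not come from the paper.

```python
def attrition_did(attritor_c, attritor_t, stayer_c, stayer_t):
    """Raw diff-in-diff: (T - C among attritors) minus (T - C among non-attritors)."""
    return (attritor_t - attritor_c) - (stayer_t - stayer_c)


# Group means for the "Female" row, copied from Table 2, columns (1) to (4).
merit_female = attrition_did(0.40, 0.38, 0.50, 0.51)
poverty_female = attrition_did(0.46, 0.40, 0.53, 0.65)

# Raw differences only: no fixed effects, no standard errors.
print(round(merit_female, 2))
print(round(poverty_female, 2))
```

For the merit sample the raw value rounds to -0.03, in line with the table. For the poverty sample the raw value rounds to -0.18 against a reported -0.16; the gap plausibly reflects the fixed effects and rounding of the inputs, though the paper does not say so explicitly.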
Table 3: Education Outcomes

| | (1) Highest grade completed | (2) Completed primary | (3) Received any formal education in 2011-2017 | (4) Family index |
|---|---|---|---|---|
| Panel A. Merit | | | | |
| Treatment | 0.213* (0.117) | 0.0500 (0.036) | 0.044* (0.026) | 0.131** (0.065) |
| Observations | 814 | 814 | 814 | 814 |
| R-squared | 0.160 | 0.148 | 0.129 | 0.167 |
| F-statistic | 3.240 | 3.610 | 3.420 | 5.040 |
| Covariates | Yes | Yes | Yes | Yes |
| Control mean | 5.590 | 0.610 | 0.780 | 0.130 |
| Joint significance (H0: all coef. = 0): Chi-square 3.548, p-value 0.0600 | | | | |
| Panel B. Poverty | | | | |
| Treatment | 0.291* (0.149) | 0.113*** (0.040) | 0.100** (0.039) | 0.264*** (0.088) |
| Observations | 753 | 753 | 753 | 753 |
| R-squared | 0.169 | 0.173 | 0.124 | 0.174 |
| F-statistic | 3.250 | 4.650 | 2.890 | 3.720 |
| Covariates | Yes | Yes | Yes | Yes |
| Control mean | 5.480 | 0.570 | 0.720 | -0.0100 |
| Joint significance (H0: all coef. = 0): Chi-square 4.107, p-value 0.0430 | | | | |
| Joint test: Poverty vs. Merit | | | | |
| Chi-square | 0.253 | 2.049 | 2.184 | 2.287 |
| p-value | 0.615 | 0.152 | 0.139 | 0.131 |

Notes: Estimated treatment effects. The dependent variable in column (1) is the highest grade the individual completed; it is equal to -1 if the individual received no education, 0 if the individual only attended kindergarten, and ranges from 1 to 11 for Grade 1 to Grade 11. In column (2), the dependent variable is a dummy equal to 1 if the individual completed primary education. In column (3), the dependent variable is equal to 1 if the individual was enrolled in the formal education system during any of the years 2011 to 2016. In column (4), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from the three previous columns following Anderson (2008). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline.
Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value) is the result of testing whether the coefficients of the individual regressions are jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.

Table 4: Cognitive Outcomes

| | (1) Math | (2) Raven’s | (3) Forward Digit Span | (4) Picture Recognition Vocabulary Test | (5) Family index |
|---|---|---|---|---|---|
| Panel A. Merit | | | | | |
| Treatment | 0.0670 (0.075) | 0.155** (0.066) | 0.0700 (0.065) | 0.0200 (0.073) | 0.113* (0.068) |
| Observations | 814 | 814 | 813 | 814 | 813 |
| R-squared | 0.189 | 0.180 | 0.0930 | 0.293 | 0.225 |
| F-statistic | 6.130 | 6.670 | 2.350 | 11.02 | 8.710 |
| Covariates | Yes | Yes | Yes | Yes | Yes |
| Control mean | 0.0800 | 0.0700 | -0.0100 | 0.100 | 0.0700 |
| Joint significance (H0: all coef. = 0): Chi-square 0.861, p-value 0.353 | | | | | |
| Panel B. Poverty | | | | | |
| Treatment | 0.0860 (0.064) | 0.0520 (0.080) | -0.129* (0.070) | 0.0600 (0.081) | 0.00500 (0.072) |
| Observations | 753 | 753 | 752 | 753 | 752 |
| R-squared | 0.150 | 0.156 | 0.114 | 0.279 | 0.196 |
| F-statistic | 6.610 | 7.370 | 3.450 | 4.780 | 4.190 |
| Covariates | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.0600 | -0.0300 | 0.0400 | -0.0400 | -0.0300 |
| Joint significance (H0: all coef. = 0): Chi-square 1.938, p-value 0.164 | | | | | |
| Joint test: Poverty vs. Merit | | | | | |
| Chi-square | 0.0530 | 1.304 | 6.455 | 0.195 | 1.812 |
| p-value | 0.818 | 0.254 | 0.0110 | 0.659 | 0.178 |

Notes: Estimated treatment effects. The dependent variable in column (1) is the score on the mathematics computer adaptive test, computed using Item Response Theory (IRT) with a two parameter logistic (2PL) model, standardized.
In column (2), the dependent variable is the score on the Raven’s matrices test computed using IRT with a 2PL model, standardized. In column (3), the dependent variable is the score on the digit span test using forward items only, standardized. In column (4), the dependent variable is the score on a Picture Recognition Vocabulary Test computed using IRT with a 2PL model, standardized. In column (5), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from the four previous columns following Anderson (2008). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value) is the result of testing whether the coefficients of the individual regressions are jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.

Table 5: Socioemotional Outcomes

| | (1) SDQ: Prosocial | (2) SDQ: Internalizing | (3) SDQ: Externalizing | (4) Big 5: Openness | (5) Big 5: Conscientiousness | (6) Big 5: Extraversion | (7) Big 5: Agreeableness | (8) Big 5: Neuroticism | (9) Family index |
|---|---|---|---|---|---|---|---|---|---|
| Panel A. Merit | | | | | | | | | |
| Treatment | -0.0390 (0.066) | -0.0770 (0.059) | 0.00200 (0.070) | 0.0300 (0.067) | -0.0910 (0.062) | 0.00500 (0.077) | -0.0720 (0.066) | 0.0580 (0.075) | -0.00500 (0.065) |
| Observations | 813 | 812 | 812 | 813 | 812 | 813 | 811 | 814 | 807 |
| R-squared | 0.0790 | 0.130 | 0.0950 | 0.0900 | 0.0600 | 0.0960 | 0.0950 | 0.0700 | 0.102 |
| F-statistic | 2.180 | 6.140 | 2.430 | 3.540 | 2 | 2.820 | 2.530 | 2.110 | 3.550 |
| Covariates | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.0100 | 0 | 0 | 0.0400 | 0 | 0.0100 | 0.0300 | -0.0400 | 0.0300 |
| Joint significance (H0: all coef. = 0): Chi-square 0.370, p-value 0.543 | | | | | | | | | |
| Panel B. Poverty | | | | | | | | | |
| Treatment | -0.00400 (0.078) | -0.0420 (0.087) | 0.0530 (0.072) | 0.0110 (0.074) | -0.00200 (0.077) | -0.0600 (0.084) | -0.0780 (0.078) | 0.186*** (0.071) | -0.0990 (0.074) |
| Observations | 751 | 753 | 753 | 753 | 752 | 752 | 752 | 753 | 749 |
| R-squared | 0.0960 | 0.129 | 0.0960 | 0.0720 | 0.0980 | 0.0910 | 0.0640 | 0.0840 | 0.107 |
| F-statistic | 2.520 | 5.250 | 2.960 | 2.350 | 6.250 | 2.200 | 1.510 | 2.950 | 2.890 |
| Covariates | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.0500 | 0 | -0.0400 | 0 | 0.0200 | 0.0100 | 0.0100 | -0.0300 | 0.0100 |
| Joint significance (H0: all coef. = 0): Chi-square 0.003, p-value 0.957 | | | | | | | | | |
| Joint test: Poverty vs. Merit | | | | | | | | | |
| Chi-square | 0.175 | 0.161 | 0.339 | 0.0490 | 1.103 | 0.437 | 0.00400 | 2.269 | 1.190 |
| p-value | 0.676 | 0.688 | 0.560 | 0.824 | 0.294 | 0.509 | 0.948 | 0.132 | 0.275 |

Notes: Estimated treatment effects. The dependent variable in column (1) is the score from 0 to 10 on the pro-social facet (the higher the score, the more pro-social) of the Strengths and Difficulties Questionnaire (SDQ), standardized. In column (2), the dependent variable is the score from 0 to 20 on the internalizing behavior facet (the higher the score, the more internalizing behavior problems) of the SDQ, standardized. In column (3), the dependent variable is the score from 0 to 20 on the externalizing behavior facet (the higher the score, the more externalizing behavior problems) of the SDQ, standardized.
In columns (4) to (8), the dependent variables are the scores from 3 to 15 on the Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism facets of the Big Five scale, standardized. In column (9), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from the first eight columns following Anderson (2008) (scores from columns (2), (3), and (8) have been flipped beforehand). In column (10), the family index represents the first factor from an exploratory factor analysis (EFA) with quartimin rotation, on the same set of variables as in (9). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value) is the result of testing whether the coefficients of the individual regressions are jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.

Table 6: Labor Outcomes

| | (1) Currently working | (2) Age started working | (3) Any training since 2011 | (4) Cog. demands of main work (1/0) | (5) Yearly earnings (inv. hyperbolic sine, USD) | (6) Daily res. wage (inv. hyperbolic sine, USD) |
|---|---|---|---|---|---|---|
| Panel A. Merit | | | | | | |
| Treatment | 0.034* (0.020) | 0.0740 (0.225) | -0.0140 (0.037) | -0.0400 (0.027) | -0.294 (0.188) | 0.0640 (0.048) |
| Observations | 772 | 794 | 814 | 775 | 791 | 805 |
| R-squared | 0.106 | 0.0720 | 0.156 | 0.0990 | | 0.102 |
| F-statistic | 1.940 | 1.910 | 4.830 | 2.570 | 3.710 | 3.830 |
| Covariates | Yes | Yes | Yes | Yes | Yes | Yes |
| Control mean | 0.923 | 12.72 | 0.595 | 0.174 | 7.620 | 2.250 |
| Panel B. Poverty | | | | | | |
| Treatment | 0.0120 (0.019) | 0.339 (0.235) | -0.0310 (0.036) | 0.0270 (0.030) | -0.382 (0.239) | 0.0370 (0.054) |
| Observations | 712 | 726 | 753 | 713 | 732 | 746 |
| R-squared | 0.0870 | 0.0910 | 0.165 | 0.0900 | | 0.118 |
| F-statistic | 2.850 | 1.980 | 7.100 | 2.080 | 959.6 | 2.170 |
| Covariates | Yes | Yes | Yes | Yes | Yes | Yes |
| Control mean | 0.930 | 12.55 | 0.590 | 0.146 | 7.608 | 2.219 |

Notes: Estimated treatment effects. The dependent variable in column (1) is a dummy equal to 1 if the individual is currently working, i.e. she worked for at least 1 hour during the last week or has a job at the moment but did not work during the last week. In column (2), the dependent variable is the age at which the individual started to work. In column (3), the dependent variable is a dummy equal to 1 if the individual participated in any formal or informal training that lasted at least one week, since 2011. In column (4), the dependent variable is a dummy equal to 1 if the main work activity demands cognitive ability (read, write, calculate, or use a computer) and 0 otherwise. In column (5), the dependent variable is yearly earnings expressed in US dollars and transformed using an inverse hyperbolic sine. In column (6), the dependent variable is the daily reservation wage in US dollars, transformed using an inverse hyperbolic sine. In column (4), values for respondents who did not work have been imputed with 0, except if they were students. In columns (1) and (4), the sample is restricted to respondents who are not currently students. Column (2) includes everyone who ever worked. Column (4) includes only people who worked over the past 12 months. Columns (3) and (6) include the entire sample.
Columns (1), (2), (3), (4), and (6) are estimated using OLS regression; Column (5) is estimated using Tobit regression. All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1.

Table 7: Well-being Outcomes

| | (1) SES Ladder (village) | (2) SES Index (IRT) | (3) Life Satisfaction | (4) Quality of Health | (5) Quality of Life | (6) Health Issue Index (GHQ) | (7) Family index |
|---|---|---|---|---|---|---|---|
| Panel A. Merit | | | | | | | |
| Treatment | 0.173** (0.068) | 0.186** (0.073) | 0.0570 (0.067) | 0.129** (0.058) | 0.0410 (0.058) | -0.0260 (0.070) | 0.174*** (0.065) |
| Observations | 814 | 814 | 814 | 814 | 814 | 804 | 804 |
| R-squared | 0.122 | 0.275 | 0.111 | 0.104 | 0.0970 | 0.0950 | 0.182 |
| F-statistic | 3.880 | 13.06 | 2.740 | 3.760 | 2.450 | 3.200 | 6.740 |
| Covariates | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Control mean | 0.0100 | 0.0200 | -0.0300 | -0.0200 | 0 | -0.0200 | 0 |
| Joint significance (H0: all coef. = 0): Chi-square 7.041, p-value 0.00800 | | | | | | | |
| Panel B. Poverty | | | | | | | |
| Treatment | 0.208*** (0.073) | -0.0640 (0.084) | 0.0380 (0.071) | 0.0620 (0.078) | 0.0250 (0.073) | 0.0540 (0.081) | 0.0410 (0.085) |
| Observations | 753 | 753 | 752 | 753 | 753 | 744 | 743 |
| R-squared | 0.120 | 0.177 | 0.104 | 0.0730 | 0.0920 | 0.114 | 0.120 |
| F-statistic | 7.400 | 4.950 | 2.450 | 4.180 | 4.110 | 4.280 | 4.620 |
| Covariates | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.0900 | -0.0800 | -0.0300 | 0.0300 | 0.0200 | -0.0500 | -0.0300 |
| Joint significance (H0: all coef. = 0): Chi-square 8.864, p-value 0.00300 | | | | | | | |
| Joint test: Poverty vs. Merit | | | | | | | |
| Chi-square | 0.179 | 6.899 | 0.0480 | 0.621 | 0.0400 | 0.719 | 1.903 |
| p-value | 0.673 | 0.00900 | 0.827 | 0.431 | 0.841 | 0.397 | 0.168 |

Notes: Estimated treatment effects.
The dependent variable in column (1) is the score from 1 to 10 on an economic ladder as compared to people of the same age in the village, standardized. In column (2), the dependent variable is a socio-economic index constructed based on asset ownership, computed using Item Response Theory with a two parameter logistic model, standardized. In column (3), the dependent variable is the score from 1 to 10 on a life satisfaction question, standardized. In column (4), the dependent variable is the score from 1 to 5 on a health quality question, standardized. In column (5), the dependent variable is the score from 1 to 5 on a life quality question, standardized. In column (6), the dependent variable is the standardized score on the General Health Questionnaire. In column (7), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from all the previous columns following Anderson (2008). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value) is the result of testing whether the coefficients of the individual regressions are jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.
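The family indices reported in Tables 3 to 5 and 7 are described as inverse covariance matrix-weighted means following Anderson (2008). The paper does not spell out its implementation, so the following is only a minimal sketch of that general technique under our own assumptions: outcomes are already signed so that higher is better, there are no missing values, standardization uses the control group, and the data are simulated. The function name `icw_index` and all variable names are hypothetical.

```python
import numpy as np


def icw_index(outcomes, control):
    """Inverse-covariance-weighted summary index, in the spirit of Anderson (2008).

    outcomes: (n, k) array with k >= 2 columns, each signed so higher is better.
    control:  boolean array of length n marking control-group rows.
    Assumes no missing values.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    control = np.asarray(control, dtype=bool)

    # Standardize each outcome using the control group's mean and SD.
    mu = outcomes[control].mean(axis=0)
    sd = outcomes[control].std(axis=0, ddof=1)
    z = (outcomes - mu) / sd

    # Weight each outcome by the row sum of the inverse covariance matrix,
    # so that highly correlated outcomes receive less weight.
    sigma_inv = np.linalg.inv(np.cov(z, rowvar=False))
    weights = sigma_inv.sum(axis=1)
    index = z @ weights / weights.sum()

    # Rescale so the control group has mean 0 and SD 1.
    return (index - index[control].mean()) / index[control].std(ddof=1)


# Simulated (hypothetical) data: three outcomes sharing a common factor.
rng = np.random.default_rng(0)
n = 400
common = rng.normal(size=(n, 1))
outcomes = common + rng.normal(size=(n, 3))
control = np.arange(n) % 2 == 0

index = icw_index(outcomes, control)
print(index[control].mean(), index[control].std(ddof=1))
```

By construction the index has mean 0 and standard deviation 1 in the control group, so a regression coefficient on it can be read in control-group standard deviation units, which is how the family index columns above are expressed.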
Table 8: Heterogeneous Treatment Effects by Treatment Label

| | (1) Family index: Education | (2) Family index: Cognition | (3) Family index: Socioemotional | (4) Family index: SES/Well-being | (5) Daily res. wage (inv. hyperbolic sine, USD) |
|---|---|---|---|---|---|
| Panel A. Merit | | | | | |
| Treatment | 0.130 (0.094) | 0.211** (0.094) | 0.0770 (0.099) | 0.139 (0.084) | 0.173*** (0.065) |
| Below the median poverty score | 0.0850 (0.111) | 0.0340 (0.118) | 0.105 (0.126) | 0.0420 (0.114) | 0.0350 (0.076) |
| Below the median poverty score and treatment | 0.00900 (0.128) | -0.214 (0.139) | -0.175 (0.157) | 0.0810 (0.120) | -0.240** (0.092) |
| Observations | 814 | 813 | 807 | 804 | 805 |
| R-squared | 0.168 | 0.228 | 0.104 | 0.183 | 0.110 |
| F-statistic | 5.200 | 8.120 | 3.530 | 6.880 | 3.960 |
| Covariates | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.0200 | 0.0700 | 0 | -0.280 | 2.260 |
| Panel B. Poverty | | | | | |
| Treatment | 0.199* (0.103) | 0.0560 (0.090) | -0.0180 (0.105) | -0.00500 (0.097) | 0.0150 (0.062) |
| Below the median test score | -0.285*** (0.093) | -0.0720 (0.103) | 0.0480 (0.146) | 0.0430 (0.136) | 0.0740 (0.084) |
| Below the median test score and treatment | 0.159 (0.114) | -0.140 (0.131) | -0.211 (0.179) | 0.128 (0.165) | 0.0630 (0.110) |
| Observations | 753 | 752 | 749 | 743 | 746 |
| R-squared | 0.183 | 0.200 | 0.110 | 0.122 | 0.122 |
| F-statistic | 4.040 | 4.400 | 2.830 | 4.370 | 2.330 |
| Covariates | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.160 | -0.0300 | -0.0200 | -0.310 | 2.220 |
| Joint test: Poverty vs. Merit | | | | | |
| Chi-square | 1.070 | 0.213 | 0.0310 | 0.0630 | 5.563 |
| p-value | 0.301 | 0.644 | 0.859 | 0.802 | 0.0180 |

Notes: Estimated treatment effects. The dependent variables in columns (1) to (4) are the family indices from Tables 3 to 5 and 7. Column (5) is the same variable as in Table 6. Treatment captures effects for students who would have qualified for a scholarship under either scheme. Below the median poverty score are individuals who qualify under the merit-based scheme but would not have received a poverty-based scholarship. Below the median test score are individuals who qualify under the poverty-based scheme but would not have received a merit-based scholarship.
All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value) is the result of testing the equality between the interaction term from the merit sample and the interaction term from the poverty sample.

Table 9: Heterogeneous Treatment Effects by Gender

| | (1) Family index: Education | (2) Family index: Cognition | (3) Family index: Socioemotional | (4) Family index: SES/Well-being | (5) Daily res. wage (inv. hyperbolic sine, USD) |
|---|---|---|---|---|---|
| Panel A. Merit | | | | | |
| Treatment | 0.142* (0.085) | 0.133 (0.096) | -0.0180 (0.096) | 0.188** (0.084) | 0.104 (0.077) |
| Female | -0.00700 (0.114) | -0.337*** (0.110) | -0.195 (0.127) | -0.199* (0.118) | -0.0200 (0.091) |
| Female and treatment | -0.0220 (0.115) | -0.0410 (0.131) | 0.0260 (0.146) | -0.0270 (0.127) | -0.0820 (0.106) |
| Observations | 814 | 813 | 807 | 804 | 805 |
| R-squared | 0.167 | 0.225 | 0.102 | 0.182 | 0.103 |
| F-statistic | 4.910 | 8.520 | 3.480 | 6.710 | 3.690 |
| Covariates | Yes | Yes | Yes | Yes | Yes |
| Control mean | 0.130 | 0.0700 | 0.0300 | 0 | 2.250 |
| Panel B. Poverty | | | | | |
| Treatment | 0.257*** (0.096) | 0.168* (0.098) | -0.0860 (0.097) | 0.0420 (0.119) | 0.118 (0.081) |
| Female | -0.146 (0.132) | -0.254** (0.117) | -0.289** (0.119) | -0.247 (0.163) | -0.118 (0.089) |
| Female and treatment | 0.0120 (0.133) | -0.309** (0.141) | -0.0250 (0.136) | -0.00100 (0.168) | -0.154 (0.101) |
| Observations | 753 | 752 | 749 | 743 | 746 |
| R-squared | 0.174 | 0.202 | 0.107 | 0.120 | 0.121 |
| F-statistic | 3.690 | 4.380 | 2.820 | 4.510 | 2.060 |
| Covariates | Yes | Yes | Yes | Yes | Yes |
| Control mean | -0.0100 | -0.0300 | 0.0100 | -0.0300 | 2.220 |
| Joint test: Poverty vs. Merit | | | | | |
| Chi-square | 0.0670 | 2.891 | 0.103 | 0.0220 | 0.345 |
| p-value | 0.796 | 0.0890 | 0.748 | 0.882 | 0.557 |

Notes: Estimated treatment effects. The dependent variables in columns (1) to (4) are the family indices from Tables 3 to 5 and 7. Column (5) is the same variable as in Table 6. All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value) is the result of testing the equality between the interaction term from the merit sample and the interaction term from the poverty sample.

References

Abadie, A., S. Athey, G. Imbens, and J. Wooldridge (2017, November). When Should You Adjust Standard Errors for Clustering? Technical Report w24003, National Bureau of Economic Research, Cambridge, MA.

Acevedo, P., G. Cruces, P. Gertler, and S. Martínez (2016, May). Soft Skills and Hard Skills in Youth Training Programs. Long Term Experimental Evidence from the Dominican Republic.

Anderson, K. H., J. E. Foster, and D. E. Frisvold (2009, February). Investing in Health: The Long-Term Impact of Head Start on Smoking. Economic Inquiry 48 (3), 587–602.

Anderson, M. L. (2008, December). Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association 103 (484), 1481–1495.

Araujo, M. C., M. Bosch, and N. Schady (2016, September). Can Cash Transfers Help Households Escape an Inter-Generational Poverty Trap?
Working Paper 22670, National Bureau of Economic Research.

Arrow, K. J. (1973, July). Higher Education as a Filter. Journal of Public Economics 2 (3), 193–216.

Bailey, D., G. J. Duncan, C. L. Odgers, and W. Yu (2017, January). Persistence and Fadeout in the Impacts of Child and Adolescent Interventions. Journal of Research on Educational Effectiveness 10 (1), 7–39.

Baird, S., F. H. Ferreira, B. Özler, and M. Woolcock (2014, January). Conditional, Unconditional and Everything in Between: A Systematic Review of the Effects of Cash Transfer Programmes on Schooling Outcomes. Journal of Development Effectiveness 6 (1), 1–43.

Barham, T., K. Macours, and J. A. Maluccio (2013, May). Boys’ Cognitive Skill Formation and Physical Growth: Long-Term Experimental Evidence on Critical Ages for Early Childhood Interventions. The American Economic Review 103 (3), 467–471.

Barrera-Osorio, F. and D. Filmer (2016, April). Incentivizing Schooling for Learning: Evidence on the Impact of Alternative Targeting Approaches. Journal of Human Resources 51 (2), 461–499.

Becker, G. S. (2009). Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education (3rd ed.). Chicago: University of Chicago Press.

Belotti, F., P. Deb, W. G. Manning, and E. C. Norton (2015). twopm: Two-Part Models. Stata Journal 15 (1), 3–20.

Berry, J. (2015, October). Child Control in Education Decisions: An Evaluation of Targeted Incentives to Learn in India. Journal of Human Resources 50 (4), 1051–1080.

Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee’s Ability. Statistical Theories of Mental Test Scores.

Blazar, D. (2017, October). Validating Teacher Effects on Students’ Attitudes and Behaviors: Evidence From Random Assignment of Teachers to Students. Education Finance and Policy, 1–52.

Blazar, D. and M. A. Kraft (2017, March). Teacher and Teaching Effects on Students’ Attitudes and Behaviors. Educational Evaluation and Policy Analysis 39 (1), 146–170.

Blimpo, M. P. (2014). Team Incentives for Education in Developing Countries: A Randomized Field Experiment in Benin. American Economic Journal: Applied Economics 6 (4), 90–109.

Bock, R. D. and R. J. Mislevy (1982). Adaptive EAP Estimation of Ability in a Microcomputer Environment. Applied Psychological Measurement 6 (4), 431–444.

Brudevold-Newman, A. (2016, November). The Impacts of Free Secondary Education: Evidence from Kenya.

Carneiro, P. and R. Ginja (2014). Long-Term Impacts of Compensatory Preschool on Health and Behavior: Evidence from Head Start. American Economic Journal: Economic Policy 6 (4), 135–173.

Chetty, R., J. N. Friedman, N. Hilger, E. Saez, D. W. Schanzenbach, and D. Yagan (2011, November). How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project Star. The Quarterly Journal of Economics 126 (4), 1593–1660.

Chetty, R., J. N. Friedman, and J. E. Rockoff (2014, September). Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood. American Economic Review 104 (9), 2633–2679.

Claro, S., D. Paunesku, and C. S. Dweck (2016, August). Growth mindset tempers the effects of poverty on academic achievement. Proceedings of the National Academy of Sciences 113 (31), 8664–8668.

Cruz, R. C. d. S., L. B. A. d. Moura, and J. J. Soares Neto (2017, August). Conditional cash transfers and the creation of equal opportunities of health for children in low and middle-income countries: a literature review. International Journal for Equity in Health 16, 161.

Currie, J. and D. Thomas (2000). School Quality and the Longer-Term Effects of Head Start. Journal of Human Resources 35 (4), 755–74.

Danon, A., A. de Barros, F. Barrera-Osorio, and D. Filmer (2018, February). Unable to foster skills or unable to measure them? Evidence on the measurement of non-cognitive skills in developing countries.

Deming, D. (2009, June). Early Childhood Intervention and Life-Cycle Skill Development: Evidence from Head Start. American Economic Journal: Applied Economics 1 (3), 111–134.

Deming, D. (2017, November). The Growing Importance of Social Skills in the Labor Market. The Quarterly Journal of Economics 132 (4), 1593–1640.

Doyle, A. M., H. A. Weiss, K. Maganja, S. Kapiga, S. McCormack, D. Watson-Jones, J. Changalucha, R. J. Hayes, and D. A. Ross (2011, September). The Long-Term Impact of the MEMA kwa Vijana Adolescent Sexual and Reproductive Health Intervention: Effect of Dose and Time since Intervention Exposure. PLOS ONE 6 (9), e24866.

Duckworth, A. L. and P. D. Quinn (2009, February). Development and Validation of the Short Grit Scale (Grit–S). Journal of Personality Assessment 91 (2), 166–174.

Duflo, E., P. Dupas, and M. Kremer (2017, April). The Impact of Free Secondary Education: Experimental Evidence from Ghana.

Dweck, C. S. (2000). Self-theories: their role in motivation, personality, and development. Essays in social psychology. Philadelphia, Pa.: Psychology Press.

Dynarski, S., J. Hyman, and D. W. Schanzenbach (2013, September). Experimental Evidence on the Effect of Childhood Investments on Postsecondary Attainment and Degree Completion. Journal of Policy Analysis and Management 32 (4), 692–717.

Fabregas, R. (2017). A Better School but a Worse Position? The Effects of Marginal School Admissions in Mexico City.

Filmer, D. and N. Schady (2008, April). Getting Girls into School: Evidence from a Scholarship Program in Cambodia. Economic Development and Cultural Change 56 (3), 581–617.

Filmer, D. and N. Schady (2014). The Medium-Term Effects of Scholarships in a Low-Income Country. Journal of Human Resources 49 (3), 663–694.

Filmer, D. and K. Scott (2012, February). Assessing Asset Indices. Demography 49 (1), 359–392.

Fiske, D. W. (1949). Consistency of the factorial structures of personality ratings from different sources. The Journal of Abnormal and Social Psychology 44 (3), 329–344.

Fiszbein, A. and N. R. Schady (2009, February). Conditional Cash Transfers: Reducing Present and Future Poverty. World Bank Publications.

Friedman, W., M. Kremer, E. Miguel, and R. Thornton (2011, April). Education as Liberation? Technical Report w16939, National Bureau of Economic Research, Cambridge, MA.

Frisvold, D. E. and J. C. Lumeng (2011, March). Expanding Exposure: Can Increasing the Daily Duration of Head Start Reduce Childhood Obesity? Journal of Human Resources 46 (2), 373–402.

Fryer Jr, R. G. (2011). Financial incentives and student achievement: Evidence from randomized trials. The Quarterly Journal of Economics 126 (4), 1755–1798.

Garces, E., D. Thomas, and J. Currie (2002). Longer-Term Effects of Head Start. The American Economic Review 92 (4), 999–1012.

García, S. and J. E. Saavedra (2017, October). Educational Impacts and Cost-Effectiveness of Conditional Cash Transfer Programs in Developing Countries: A Meta-Analysis. Review of Educational Research 87 (5), 921–965.

Gertler, P., J. Heckman, R. Pinto, A. Zanolini, C. Vermeersch, S. Walker, S. Chang, and S. Grantham-McGregor (2013, June). Labor Market Returns to Early Childhood Stimulation: A 20-year Followup to an Experimental Intervention in Jamaica. Technical Report w19185, National Bureau of Economic Research, Cambridge, MA.

Glennerster, R. (2017). The Practicalities of Running Randomized Evaluations: Partnerships, Measurement, Ethics, and Transparency. In A. V. Banerjee and E. Duflo (Eds.), Handbook of Economic Field Experiments, Volume 1, pp. 175–243. Elsevier.

Goldberg, D. and P. Williams (2006). A user’s guide to the General Health Questionnaire. GL assessment.

Goldberg, L. R. (1981). Language and individual differences: The search for universals in personality lexicons. Review of personality and social psychology 2 (1), 141–165.

Goodman, R. (1997, July). The Strengths and Difficulties Questionnaire: A Research Note. Journal of Child Psychology and Psychiatry 38 (5), 581–586.

Hamoudi, A. and M. Sheridan (2015, November). Unpacking the Black Box of Cognitive Ability. A novel tool for assessment in a population based survey.

Heckman, J. J. and T. Kautz (2014, January). Fostering and Measuring Skills: Interventions that Improve Character and Cognition. In J. J. Heckman, J. E. Humphries, and T. Kautz (Eds.), The Myth of Achievement Tests: The GED and the Role of Character in American Life, pp. 341–430. Chicago: University of Chicago Press.

Heckman, J. J., S. H. Moon, R. Pinto, P. Savelyev, and A. Yavitz (2010). A New Cost-Benefit and Rate of Return Analysis for the Perry Preschool Program: A Summary. In A. J. Reynolds, A. J. Rolnick, M. M. Englund, and J. A. Temple (Eds.), Childhood Programs and Practices in the First Decade of Life, pp. 366–380. Cambridge: Cambridge University Press.

Jackson, C. K. (2018, June). What Do Test Scores Miss? The Importance of Teacher Effects on Non-Test Score Outcomes. Journal of Political Economy.

Jackson, C. K., J. E. Rockoff, and D. O. Staiger (2014). Teacher Effects and Teacher-Related Policies. Annual Review of Economics 6 (1), 801–825.

Jakiela, P., E. Miguel, and V. L. te Velde (2015, September). You’ve earned it: estimating the impact of human capital on social preferences. Experimental Economics 18 (3), 385–407.

Kling, J. R., J. B. Liebman, and L. F. Katz (2007, January). Experimental Analysis of Neighborhood Effects. Econometrica 75 (1), 83–119.

Kraft, M. A. (2017). Teacher Effects on Complex Cognitive Skills and Social-Emotional Competencies. Journal of Human Resources, 0916–8265R3.

Kremer, M., E. Miguel, and R. Thornton (2009, July). Incentives to Learn. The Review of Economics and Statistics 91 (3), 437–456.

Krishnan, P. and S. Krutikova (2013, October). Non-cognitive skill formation in poor neighbourhoods of urban India.
Labour Economics 24, 68–85. Kyllonen, P. C. and J. P. Bertling (2013). Innovative questionnaire assessment meth- ods to increase cross-country comparability. In L. Rutkowski, M. von Davier, and D. Rutkowski (Eds.), Handbook of international large-scale assessment: Back- A ground, technical issues, and methods of data analysis, pp. 277–285. London: Chapman & Hall. Laajaj, R. and K. Macours (2017, March). Measuring skills in developing countries. DR Working Paper WPS8000, The World Bank, Washington, D.C. Lang, F. R., D. John, O. Lüdtke, J. Schupp, and G. G. Wagner (2011, June). Short assessment of the Big Five: robust across survey methods except telephone inter- viewing. Behavior Research Methods 43 (2), 548–567. Li, T., L. Han, L. Zhang, and S. Rozelle (2014, March). Encouraging classroom peer interactions: Evidence from Chinese migrant schools. Journal of Public Eco- nomics 111 (Supplement C), 29–45. Ludwig, J. and D. L. Miller (2007, February). Does Head Start Improve Children’s Life Chances? Evidence from a Regression Discontinuity Design. The Quarterly Journal of Economics 122 (1), 159–208. 45 McCrae, R. R. and P. T. Costa (1987). Validation of the five-factor model of per- sonality across instruments and observers. Journal of Personality and Social Psy- chology 52 (1), 81–90. Molina-Millan, T., T. Barham, K. Macours, J. A. Maluccio, and M. Stampini (2016, October). Long-term Impacts of Conditional Cash Transfers in Latin America: Re- view of the Evidence. Working Paper IDB-WP-732, Inter-American Development Bank, Washington, D.C. National Institute of Statistics, Ministry of Planning (2010). 2008 Census Cambodi- FT aRedatam+SP. Norman, W. T. (1967, April). 2800 personality trait descriptors - normative operat- ing characteristics for a university population. Technical Report UM-00310-1-T, Michigan University, Ann Arbor. Ozier, O. (2016, December). The Impact of Secondary Schooling in Kenya: A Re- gression Discontinuity Analysis. Journal of Human Resources . 
A Parker, S. W. and T. Vogl (2018, February). Do Conditional Cash Transfers Improve Economic Outcomes in the Next Generation? Evidence from Mexico. Working Paper 24303, National Bureau of Economic Research. DR Pritchett, L. (2013). The Rebirth of Education: Schooling Ain’t Learning. Washing- ton, D.C.: Brookings Institution Press for Center for Global Development. Protzko, J. (2015, November). The Environment in Raising Early Intelligence: A Meta-Analysis of the Fadeout Effect. Intelligence 53, 202–210. Quek, K. F., W. Y. Low, A. H. Razack, and C. S. Loh (2001, October). Reliability and validity of the General Health Questionnaire (GHQ-12) among urological patients: A Malaysian study. Psychiatry and Clinical Neurosciences 55 (5), 509–513. Samejima, F. (1969). Estimation of Latent Ability Using a Response Pattern of Graded Scores. Psychometrika 34 (4), Part 2. 46 Santorella, E. (2017, November). Multi-Dimensional Teacher Effects. Silva, I. D. and S. Sumarto (2015, October). How do Educational Transfers Affect Child Labour Supply and Expenditures? Evidence from Indonesia of Impact and Flypaper Effects. Oxford Development Studies 43 (4), 483–507. Smith, G. M. (1967, March). Personality correlates of cigarette smoking in students of college age. Annals of the New York Academy of Sciences 142 (1), 308–321. Snilstveit, B., J. Stevenson, D. Phillips, M. Vojtkova, E. Gallagher, T. Schmidt, H. Jobse, M. Geelen, M. G. Pastorello, and J. Eyers (2015). Interventions for FT improving learning outcomes and access to education in low-and middle-income countries: a systematic review. Technical report, International Initiative for Im- pact Evaluation, London. Sparrow, R. (2007, February). Protecting Education for the Poor in Times of Cri- sis: An Evaluation of a Scholarship Programme in Indonesia. Oxford Bulletin of Economics and Statistics 69 (1), 99–122. A Spence, M. (1973, August). Job Market Signaling. The Quarterly Journal of Eco- nomics 87 (3), 355–374. Stocking, M. L. and F. 
M. Lord (1983, April). Developing a Common Metric in Item DR Response Theory. Applied Psychological Measurement 7 (2), 201–210. The World Bank (2017). Learning to Realize Education’s Promise. World Develop- ment Report 2018. Washington, D.C.: The World Bank. OCLC: 992735784. van der Linden, W. J. and P. J. Pashley (2010). Item Selection and Ability Estimation in Adaptive Testing. In W. J. van der Linden and C. A. W. Glas (Eds.), Elements of adaptive testing, Statistics for social and behavioral sciences, pp. 3–30. New York: Springer. OCLC: ocn465370134. Vivalt, E. (2017, September). How Much Can We Generalize From Impact Evalua- tions? Job market paper, Australian National University, Canberra, Australia. 47 Walker, S. P., S. M. Chang, C. A. Powell, E. Simonoff, and S. M. Grantham- McGregor (2007, November). Early Childhood Stunting Is Associated with Poor Psychological Functioning in Late Adolescence and Effects Are Reduced by Psy- chosocial Stimulation. The Journal of Nutrition 137 (11), 2464–2469. West, M. R., M. A. Kraft, A. S. Finn, R. E. Martin, A. L. Duckworth, C. F. O. Gabrieli, and J. D. E. Gabrieli (2016, March). Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling. Educational Evalua- tion and Policy Analysis 38 (1), 148–170. 
Appendix: Additional checks of validity and robustness of findings

Table A1: Balance at Baseline

| Variable | Merit n (1) | Merit All (2) | Merit Treatment (3) | Merit Control (4) | Merit Difference (5) | Poverty n (1) | Poverty All (2) | Poverty Treatment (3) | Poverty Control (4) | Poverty Difference (5) |
|---|---|---|---|---|---|---|---|---|---|---|
| Female | 767 | 0.502 | 0.505 | 0.499 | 0.006 | 728 | 0.593 | 0.646 | 0.533 | 0.098*** |
| | | (0.5) | (0.501) | (0.501) | (0.036) | | (0.492) | (0.479) | (0.5) | (0.035) |
| Number of minors | 749 | 1.689 | 1.645 | 1.736 | -0.105 | 712 | 1.815 | 1.845 | 1.779 | 0.075 |
| | | (1.111) | (1.102) | (1.12) | (0.128) | | (1.092) | (1.059) | (1.129) | (0.127) |
| Own motorcycle | 756 | 0.425 | 0.423 | 0.426 | 0.003 | 723 | 0.249 | 0.229 | 0.271 | -0.022 |
| | | (0.495) | (0.495) | (0.495) | (0.055) | | (0.433) | (0.421) | (0.445) | (0.042) |
| Own car/truck | 743 | 0.152 | 0.177 | 0.125 | 0.035 | 726 | 0.039 | 0.047 | 0.029 | 0.008 |
| | | (0.359) | (0.382) | (0.332) | (0.036) | | (0.193) | (0.211) | (0.169) | (0.022) |
| Own oxen/buffalo | 762 | 0.56 | 0.58 | 0.539 | 0.017 | 723 | 0.448 | 0.514 | 0.372 | 0.087* |
| | | (0.497) | (0.494) | (0.499) | (0.058) | | (0.498) | (0.5) | (0.484) | (0.052) |
| Own pig | 764 | 0.581 | 0.604 | 0.557 | 0.034 | 724 | 0.493 | 0.554 | 0.423 | 0.097* |
| | | (0.494) | (0.49) | (0.497) | (0.05) | | (0.5) | (0.498) | (0.495) | (0.056) |
| Own ox or buffalo cart | 753 | 0.3 | 0.307 | 0.293 | -0.011 | 715 | 0.218 | 0.26 | 0.171 | 0.054 |
| | | (0.459) | (0.462) | (0.456) | (0.047) | | (0.413) | (0.439) | (0.377) | (0.047) |
| Hard roof | 752 | 0.54 | 0.581 | 0.496 | 0.078 | 714 | 0.36 | 0.36 | 0.36 | -0.013 |
| | | (0.499) | (0.494) | (0.501) | (0.052) | | (0.48) | (0.481) | (0.481) | (0.052) |
| Hard wall | 764 | 0.565 | 0.563 | 0.568 | -0.015 | 723 | 0.408 | 0.416 | 0.399 | 0.002 |
| | | (0.496) | (0.497) | (0.496) | (0.053) | | (0.492) | (0.494) | (0.49) | (0.062) |
| Hard floor | 753 | 0.878 | 0.921 | 0.831 | 0.075** | 720 | 0.804 | 0.825 | 0.78 | 0.012 |
| | | (0.328) | (0.271) | (0.375) | (0.035) | | (0.397) | (0.381) | (0.415) | (0.046) |
| Have automatic toilet | 748 | 0.053 | 0.049 | 0.058 | -0.01 | 720 | 0.008 | 0.003 | 0.015 | -0.012 |
| | | (0.225) | (0.217) | (0.234) | (0.023) | | (0.091) | (0.051) | (0.122) | (0.01) |
| Have pit toilet | 748 | 0.136 | 0.135 | 0.138 | 0.005 | 720 | 0.118 | 0.119 | 0.117 | 0.013 |
| | | (0.343) | (0.342) | (0.345) | (0.042) | | (0.323) | (0.324) | (0.322) | (0.036) |
| Electricity | 766 | 0.238 | 0.225 | 0.251 | -0.026 | 725 | 0.157 | 0.145 | 0.171 | -0.024 |
| | | (0.426) | (0.418) | (0.434) | (0.047) | | (0.364) | (0.353) | (0.377) | (0.041) |
| Piped water | 760 | 0.047 | 0.046 | 0.049 | -0.001 | 719 | 0.019 | 0.016 | 0.024 | -0.01 |
| | | (0.213) | (0.21) | (0.216) | (0.022) | | (0.138) | (0.125) | (0.152) | (0.013) |
| Poverty Index (0-292) | 795 | 211.234 | 205.245 | 217.841 | -9.864 | 731 | 244.015 | 242.133 | 246.167 | -1.11 |
| | | (60.108) | (66.176) | (51.899) | (8.175) | | (33.115) | (34.078) | (31.892) | (4.479) |
| Test score (0-25) | 795 | 19.814 | 19.832 | 19.794 | 0.134 | 731 | 18.275 | 18.633 | 17.865 | 0.951 |
| | | (3.036) | (2.851) | (3.232) | (0.467) | | (4.753) | (4.784) | (4.691) | (0.684) |

Joint significance (H0: all coef. = 0): Chi-square 18.62 (merit), 15.99 (poverty); p-value 0.29 (merit), 0.45 (poverty).

Notes: Column (1) presents the number of observations in the analysis sample (excluding observations with imputed baseline information). Columns (2) to (4) display the means for the full sample, the treatment group, and the control group, respectively. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below it) is the result of a test that the individual coefficients are jointly equal to 0, using seemingly unrelated estimation.
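The notes to Table A1 describe each "Difference" entry as an OLS coefficient on treatment with province fixed effects and school-clustered standard errors. A minimal sketch of that computation follows, under stated assumptions: the rows in `TOY_ROWS` are hypothetical and not study data, the function name `balance_difference` is illustrative, and the variance is the plain cluster-robust sandwich without the small-sample correction that statistical packages typically apply. Province fixed effects are handled by demeaning within province, which gives the same treatment coefficient as including province dummies.

```python
import math
from collections import defaultdict

# Hypothetical toy rows: (province, school, treated, covariate value).
# Treatment varies at the school level, as in a school-randomized design.
TOY_ROWS = [
    ("A", "s1", 1, 1), ("A", "s1", 1, 1), ("A", "s1", 1, 0),
    ("A", "s2", 0, 0), ("A", "s2", 0, 1), ("A", "s2", 0, 0),
    ("B", "s3", 1, 1), ("B", "s3", 1, 0),
    ("B", "s4", 0, 0), ("B", "s4", 0, 0),
]


def balance_difference(rows):
    """Return (difference, clustered SE) for one baseline covariate.

    The difference is the OLS coefficient on the treatment dummy with
    province fixed effects; the SE is clustered by school (no
    small-sample correction, which is an assumption of this sketch).
    """
    rows = list(rows)

    # Province-level sums, used to demean treatment and outcome.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for prov, _school, treated, value in rows:
        acc = sums[prov]
        acc[0] += treated
        acc[1] += value
        acc[2] += 1

    x_dm, y_dm, clusters = [], [], []
    for prov, school, treated, value in rows:
        sum_t, sum_y, n = sums[prov]
        x_dm.append(treated - sum_t / n)
        y_dm.append(value - sum_y / n)
        clusters.append(school)

    # Slope of demeaned outcome on demeaned treatment.
    sxx = sum(x * x for x in x_dm)
    beta = sum(x * y for x, y in zip(x_dm, y_dm)) / sxx

    # Sum x * residual within each school, then square and add up.
    scores = defaultdict(float)
    for x, y, school in zip(x_dm, y_dm, clusters):
        scores[school] += x * (y - beta * x)
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)

    return beta, math.sqrt(variance)


diff, se = balance_difference(TOY_ROWS)
print(f"difference = {diff:.3f}, clustered SE = {se:.3f}")
```

Because the toy sample has only four schools, the standard error is illustrative only; with few clusters, cluster-robust standard errors are known to be unreliable.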
Table A2: Balance at Three-year Follow-up

| Variable | Merit n (1) | Merit All (2) | Merit Treatment (3) | Merit Control (4) | Merit Difference (5) | Poverty n (1) | Poverty All (2) | Poverty Treatment (3) | Poverty Control (4) | Poverty Difference (5) |
|---|---|---|---|---|---|---|---|---|---|---|
| Female | 735 | 0.495 | 0.505 | 0.484 | 0.022 | 660 | 0.577 | 0.64 | 0.505 | 0.123*** |
| | | (0.5) | (0.501) | (0.5) | (0.034) | | (0.494) | (0.481) | (0.501) | (0.034) |
| HH size | 735 | 7.117 | 7.204 | 7.02 | 0.296 | 660 | 7.071 | 6.858 | 7.316 | -0.34 |
| | | (2.572) | (2.502) | (2.647) | (0.238) | | (2.397) | (2.149) | (2.636) | (0.235) |
| Own motorcycle | 735 | 0.645 | 0.624 | 0.669 | -0.007 | 660 | 0.536 | 0.521 | 0.554 | 0.02 |
| | | (0.479) | (0.485) | (0.471) | (0.041) | | (0.499) | (0.5) | (0.498) | (0.037) |
| Own car/truck | 735 | 0.022 | 0.021 | 0.023 | 0.001 | 660 | 0.024 | 0.014 | 0.036 | -0.015 |
| | | (0.146) | (0.142) | (0.15) | (0.012) | | (0.154) | (0.118) | (0.186) | (0.013) |
| Own oxen/buffalo | 735 | 0.574 | 0.613 | 0.53 | 0.048 | 660 | 0.526 | 0.578 | 0.466 | 0.076 |
| | | (0.495) | (0.488) | (0.5) | (0.045) | | (0.5) | (0.495) | (0.5) | (0.051) |
| Own pig | 735 | 0.608 | 0.649 | 0.562 | 0.079 | 660 | 0.586 | 0.618 | 0.55 | 0.054 |
| | | (0.488) | (0.478) | (0.497) | (0.049) | | (0.493) | (0.487) | (0.498) | (0.057) |
| Own ox or buffalo cart | 735 | 0.253 | 0.263 | 0.242 | -0.002 | 660 | 0.239 | 0.283 | 0.189 | 0.074 |
| | | (0.435) | (0.441) | (0.429) | (0.049) | | (0.427) | (0.451) | (0.392) | (0.049) |
| Hard roof | 735 | 0.849 | 0.84 | 0.859 | -0.009 | 660 | 0.77 | 0.737 | 0.808 | -0.061 |
| | | (0.358) | (0.367) | (0.349) | (0.031) | | (0.421) | (0.441) | (0.395) | (0.041) |
| Hard floor | 734 | 0.977 | 0.987 | 0.965 | 0.019 | 659 | 0.97 | 0.974 | 0.964 | 0 |
| | | (0.151) | (0.113) | (0.183) | (0.012) | | (0.172) | (0.158) | (0.186) | (0.015) |
| Have pit toilet | 735 | 0.049 | 0.039 | 0.061 | -0.012 | 660 | 0.055 | 0.051 | 0.059 | -0.007 |
| | | (0.216) | (0.193) | (0.239) | (0.025) | | (0.227) | (0.22) | (0.235) | (0.026) |
| Electricity | 735 | 0.427 | 0.438 | 0.415 | 0.022 | 660 | 0.352 | 0.351 | 0.352 | 0.005 |
| | | (0.495) | (0.497) | (0.493) | (0.051) | | (0.478) | (0.478) | (0.478) | (0.053) |
| Piped water | 735 | 0.287 | 0.289 | 0.285 | -0.008 | 660 | 0.306 | 0.28 | 0.336 | -0.069 |
| | | (0.453) | (0.454) | (0.452) | (0.054) | | (0.461) | (0.45) | (0.473) | (0.057) |
| SES Index (2PL) | 735 | 0.187 | 0.198 | 0.175 | 0.065 | 660 | -0.098 | -0.128 | -0.063 | 0.004 |
| | | (0.85) | (0.891) | (0.804) | (0.09) | | (0.82) | (0.836) | (0.802) | (0.075) |

Joint significance (H0: all coef. = 0): Chi-square 13.39 (merit), 30.74 (poverty); p-value 0.42 (merit), <.01 (poverty).

Notes: Column (1) presents the number of observations in the analysis sample (excluding observations with imputed baseline information). Columns (2) to (4) display the means for the full sample, the treatment group, and the control group, respectively. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below it) is the result of a test that the individual coefficients are jointly equal to 0, using seemingly unrelated estimation.
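The joint-significance rows of Tables A1 and A2 pair each Chi-square statistic with a p-value. The sketch below shows how such a p-value follows from the statistic, assuming the degrees of freedom equal the number of covariates listed in each table (16 in Table A1, 13 in Table A2). The source does not state the degrees of freedom, so this is an assumption. The helper `chi2_sf` is illustrative and uses the standard closed-form expressions for the chi-square upper tail with integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    # where chi = sqrt(x) and phi is the standard normal density.
    chi = math.sqrt(x)
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * total


# (statistic, assumed df) pairs; df = number of listed covariates (assumption).
JOINT_TESTS = {
    "Table A1, merit": (18.62, 16),
    "Table A1, poverty": (15.99, 16),
    "Table A2, merit": (13.39, 13),
    "Table A2, poverty": (30.74, 13),
}

for label, (stat, df) in JOINT_TESTS.items():
    print(f"{label}: chi2 = {stat}, df = {df}, p = {chi2_sf(stat, df):.3f}")
```

Under these assumed degrees of freedom, the computed tail probabilities round to the reported 0.29, 0.45, and 0.42, and the value for the poverty sample at follow-up falls below 0.01. This is consistent with the tables, although it does not confirm the authors' exact test specification.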