Policy Research Working Paper 6496

Impact Evaluation of Conflict Prevention and Peacebuilding Interventions

Marie Gaarder
Jeannie Annan

The World Bank
Independent Evaluation Group
Public Sector Evaluation Department
June 2013

Abstract

The international community is paying increased attention to the 25 percent of the world's population that lives in fragile and conflict-affected settings, acknowledging that these settings represent daunting development challenges. To deliver better results on the ground, it is necessary to improve the understanding of the impacts and effectiveness of development interventions operating in contexts of conflict and fragility. This paper argues that it is both possible and important to carry out impact evaluations even in settings of violent conflict, and it presents some examples from a collection of impact evaluations of conflict prevention and peacebuilding interventions. The paper examines the practices of impact evaluators in the peacebuilding sector to see how they address evaluation design, data collection, and conflict analysis. Finally, it argues that such evaluations are crucial for testing assumptions about how development interventions affect change—the so-called "theory of change"—which is important for understanding the results on the ground.

This paper is a product of the Public Sector Evaluation Department, Independent Evaluation Group. The authors may be contacted at mgaarder@worldbank.org. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Impact evaluation of conflict prevention and peacebuilding interventions

Marie Gaarder and Jeannie Annan 1

Keywords: Impact evaluation, conflict prevention, peacebuilding, conflict analysis, fragile states, ex-combatants, experimental design, quasi-experimental design, encouragement design
JEL Classification: C93, D03, D74, H43, O22
Sector Board: Social Protection

1 Gaarder: Independent Evaluation Group (IEG), World Bank, mgaarder@worldbank.org; Annan: International Rescue Committee, jeannie.annan@rescue.org. We would like to thank a number of people for their support. Ole Winckler Andersen, Denmark's development cooperation (Danida), Beate Bull, Norwegian Agency for Development Cooperation (Norad), and Megan Kennedy-Chouane, DAC Network on Development Evaluation, editors of the forthcoming book Evaluation Methodologies for Aid in Conflict, have been a driving force behind this paper (an updated version of which will appear as a book chapter).
Christoph Zürcher, Professor at University of Ottawa, Cyrus Samii, Assistant Professor at New York University, Macartan Humphreys, Professor at Columbia University, and Julia Leininger, German Development Institute, reviewed and provided invaluable comments on draft versions of the paper. Anette Brown, Deputy Director, and Daniela Barba, Research Assistant at the International Initiative for Impact Evaluation (3ie), kindly gave us access to the database and literature survey which formed the basis for the Samii, Brown, and Kulma (2012) survey. Finally, a number of the lead investigators of the impact evaluations discussed in the paper took the time to respond to our survey, and we are very grateful for being able to draw on their experiences.

1. Introduction

The international community is paying increased attention to the 25 percent of the world's population that lives in fragile and conflict-affected settings, acknowledging that these settings represent daunting development challenges. While rising levels of resources go into these contexts, results have proven difficult to achieve and sustain: no fragile state has yet achieved any of the MDGs (OECD, 2012; WDR, 2011). To deliver better results on the ground, it is necessary to improve the understanding of the impacts and effectiveness of development interventions operating in contexts of conflict and fragility. While impact evaluations are increasingly used as a tool to establish what works, why, and under what circumstances in a variety of development sectors, doubts have been voiced as to the feasibility and desirability of carrying out impact evaluation—evaluation that accounts for the counterfactual in order to attribute impact—in these situations. Some evaluators and practitioners in this field raise four main concerns: (i) it is unethical to identify a comparison group in situations of conflict and fragility; (ii) it is too operationally difficult to do so; (iii) impact evaluations do not address the most important evaluation questions; and (iv) they are too costly. This paper argues that it is both possible and important to carry out impact evaluations even in settings of violent conflict, and it presents some examples from a collection of impact evaluations of conflict prevention and peacebuilding interventions. The paper examines the practices of impact evaluators in the peacebuilding sector to see how they address evaluation design, data collection, and conflict analysis. Finally, it argues that such evaluations are crucial for testing assumptions about how development interventions affect change—the so-called "theory of change"—which is important for understanding the results on the ground.

2. Defining impact evaluations

Impact evaluation, as defined in this paper, refers to evaluations that draw from a set of methods designed to establish a counterfactual, or valid comparison, to the intervention in question. The objective is to measure the net impact of the intervention, which in theory is the difference in outcomes for those receiving the intervention compared to what the outcomes would have been for the same participants without the intervention. Since it is not possible to measure this difference in practice, impact evaluation methods are all designed to create a comparison group that resembles the participant group as closely as possible. This methodology can be used to explore attribution at any level throughout the results matrix, be it outputs or short- and long-term outcomes and impacts.
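In standard potential-outcomes notation (our gloss, not a formula from the paper itself), this net impact can be written as follows:

```latex
% Net impact in potential-outcomes notation (standard usage, not the
% paper's own formalism): Y_i(1) is unit i's outcome with the
% intervention and Y_i(0) the outcome without it. Only one of the two
% is ever observed for any given unit, which is why a comparison group
% must stand in for the counterfactual term.
\[
\text{net impact} \;=\; \mathbb{E}\big[\,Y_i(1) - Y_i(0)\,\big]
\;\approx\; \bar{Y}_{\text{participants}} \;-\; \bar{Y}_{\text{comparison group}}
\]
```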
Impact evaluation methods include experiments—randomized controlled trials (RCTs), in which subjects are randomized to receive a certain version of an intervention (the 'treatment', as it is known from medical trials) or not, and cluster randomized controlled trials, in which groups of subjects (such as schools, villages, or households), as opposed to individual subjects, are randomized—and "quasi-experiments". The key difference between experiments and quasi-experiments is the use of random assignment of the target population to treatment or control in experiments. Instead of random assignment, quasi-experimental designs typically allow the researcher to establish a comparison group that is as similar as possible to the intervention group using either program eligibility criteria or statistical methods to control for confounding variables. One important practical implication of this difference is that randomization requires planning and starting the evaluation prior to the initiation of the intervention for which we want to measure impact, as units of assignment can only be randomized ex ante, which may not always be politically feasible or operationally realistic. Quasi-experimental approaches do not have this specific limitation but suffer from other shortcomings, such as potential selection bias due to differences in time-varying unobservable characteristics (Baker, 2000; White, 2011; World Bank, 2013).

The impact evaluations we discuss in the first part of this paper are large n impact evaluations, meaning that the design is based on data collected over a large sample, usually of individuals. In the second part, we will briefly address the subject of small n attribution analysis, and what it may imply for evaluations in conflict prevention and peacebuilding. On its own, a large n impact evaluation explores the effect (or lack thereof) of a certain intervention or activity. The counterfactual quantitative analysis of impact should be supplemented by factual analysis, such as of program beneficiary targeting effectiveness, implementation and process documentation, and qualitative data (which can be derived from a large variety of methods), in order to help develop the initial theory of change, dissect the differences in findings between different settings, and further understand why the results were what they were. The importance of including both counterfactual and factual analysis is exemplified by the International Initiative for Impact Evaluation's (3ie) requirement of the use of mixed methods for the evaluations it funds (White, 2009).

3. Are impact evaluations of interventions in conflict prevention and peacebuilding feasible?

Conflict-affected settings make conducting impact evaluations challenging. To address the objections that impact evaluation of peacebuilding interventions cannot be done, and to document the types of methodologies that have been most prominently used, Samii, Brown, and Kulma (2012) conducted a thorough literature search to identify all the impact evaluations that have been conducted (including some ongoing studies) of what they call stabilization interventions. Their search and review covers impact evaluations of peacebuilding/stabilization interventions by any donor or government. They found that there are roughly two dozen impact evaluations, some ongoing, across seven categories of stabilization interventions.
While the search did not fulfill all the criteria to qualify as a systematic review according to the Campbell Collaboration guidelines, 2 it was extensive, covering multiple databases as well as direct contact with researchers to identify ongoing studies. The largest number of impact evaluations has been of ex-combatant reintegration programs and of peace dividends (community-driven reconstruction) programs. The impact evaluations they found were conducted in Afghanistan, Burundi, Democratic Republic of Congo, the Aceh region of Indonesia, Israel and Palestine, Liberia, Rwanda, Sierra Leone, Sri Lanka, and Northern Uganda. 3

The Samii et al. search results demonstrate that impact evaluation of peacebuilding interventions in conflict-affected settings is indeed possible in a number of circumstances. This insight then brings us to a second type of concern, relating to the worth of impact evaluations: what sort of insights can these types of evaluations bring that other types cannot?

This paper addresses major concerns and questions about the feasibility, value added, and ethics of impact evaluations in conflict-affected settings. It builds, among other sources, on the review of the two dozen screened studies from the Samii et al. paper, as well as insights from a survey we administered to the authors of the included studies. The paper explores (i) evaluation design issues in conflict-affected situations; (ii) evaluations as interventions, and the implications for the risks and reliability of results; (iii) the importance and value added of impact evaluations; and (iv) ethical concerns about impact evaluations in conflict prevention and peacebuilding.

2 http://www.campbellcollaboration.org/systematic_reviews/index.php
3 Annex A lists the included studies.

4. Evaluation design issues in conflict-affected situations

4.1. Establishing the counterfactual

In designing an impact evaluation in fragile or unstable contexts, it is important to carefully consider how to establish a counterfactual, analyzing what is ethical and feasible in the particular context. While one might have expected to see mainly quasi-experimental designs used for impact evaluations of conflict prevention and peacebuilding interventions, given that these methodologies avoid many of the challenges of randomization, the majority of impact evaluations of these interventions still use experimental designs (Samii et al., 2012). This section provides a few illustrative examples of how different researchers have established a counterfactual using experimental and quasi-experimental designs.

The first example is of individual randomization. In Blattman and Annan's (2011) study of a reintegration program for ex-combatants in Liberia, demand exceeded the supply of spaces in the program, so registrants were admitted to the program by individual lottery. The program team publicized the intervention in their target communities to identified 'risky' populations and screened people interested in registering, identifying 1,330 eligible participants. The random assignment was stratified by gender, military rank, and location, using a computer program. From an ethical point of view, given that space in the program was limited, the equal chance of participating that the lottery awarded within each stratum was arguably the fairest and most transparent approach. An exception to the random assignment was made for those who previously held the rank of general in an armed group. Because they were considered high-risk by the program implementers, all who met this criterion were assigned to the program and were hence excluded from the study.
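As a minimal sketch of what such a stratified admission lottery can look like in code, consider the following; the registry, the stratum values, and the 50 percent admission share are hypothetical illustrations, not details of the Liberia program.

```python
# Minimal sketch of a stratified random lottery in the spirit of the
# design described above. The registry, stratum values, and 50 percent
# admission share are hypothetical, not details of the Liberia program.
import itertools

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2011)  # fixed seed keeps the draw reproducible and auditable

# Hypothetical registry of screened, eligible registrants: two per stratum
rows = [
    {"gender": g, "rank": r, "location": loc}
    for g, r, loc in itertools.product(["F", "M"], ["low", "mid"], ["A", "B"])
    for _ in range(2)
]
registry = pd.DataFrame(rows).assign(registrant_id=range(len(rows)))

def lottery(stratum: pd.DataFrame, share_admitted: float = 0.5) -> pd.DataFrame:
    """Randomly admit a fixed share of each stratum to the program."""
    n_admitted = int(round(len(stratum) * share_admitted))
    shuffled = stratum.sample(frac=1, random_state=rng)  # random order within the stratum
    shuffled["admitted"] = [i < n_admitted for i in range(len(shuffled))]
    return shuffled

# Stratifying by gender, rank, and location balances treatment and
# control within each cell, mirroring the lottery described above.
assigned = (
    registry.groupby(["gender", "rank", "location"], group_keys=False)
    .apply(lottery)
    .sort_values("registrant_id")
)
print(assigned)
```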
The second type of example is of group-based randomization. Many peacebuilding interventions are implemented in groups or communities, which requires group- instead of individual-based randomization. For example, for a community-driven program aiming to improve social cohesion, economic welfare, and democratic governance in Liberia, studied by Fearon, Humphreys, and Weinstein (2008), the NGO randomly assigned 42 of 83 eligible communities to receive the program. The lottery was conducted in a public place, with chiefs representing each community in attendance. In a similar community-driven project in Sierra Leone, a pool of communities was selected from two districts that had regional, political, and ethnic diversity, high levels of poverty, and little NGO presence. From those districts, an eligible pool of communities of the appropriate size for the project was chosen and then randomly assigned into treatment (118) and control (118) communities using a computerized random number generator (Casey, Glennerster & Miguel, 2011).

An additional example of group-based randomization is Paluck's evaluation of a reconciliation radio program in Rwanda, which used matched-pair randomization at the level of listening groups. Communities were first sampled to represent political, regional, and ethnic breakdowns. Then each community was matched to the most similar community "according to a number of observable characteristics, such as gender ratio, quality of dwellings, and education level. Then, one community in each pair was randomly assigned to the reconciliation program and the other to the health program. This stratification of sites helped to balance and minimize observable differences between the communities ex ante" (Paluck, 2009a, pgs. 577-78).
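A sketch of matched-pair randomization in this spirit is given below; the community covariates, the distance metric, and the greedy pairing rule are illustrative assumptions rather than the study's actual procedure.

```python
# Sketch of matched-pair cluster randomization in the spirit of the
# Rwanda design above; covariates, distance metric, and greedy pairing
# are illustrative assumptions, not the study's actual matching rules.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical community-level characteristics used for matching
communities = pd.DataFrame({
    "community": [f"c{i}" for i in range(8)],
    "gender_ratio": rng.normal(1.0, 0.05, 8),
    "dwelling_quality": rng.uniform(0, 1, 8),
    "education": rng.uniform(2, 8, 8),
})

# Pairwise Euclidean distances on standardized covariates
X = communities[["gender_ratio", "dwelling_quality", "education"]]
Z = ((X - X.mean()) / X.std()).values
dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)

# Greedily pair each community with its closest remaining match, then
# flip one coin per pair to choose which member gets the program.
unpaired = set(range(len(communities)))
assignment = {}
while unpaired:
    i = min(unpaired)
    j = min(unpaired - {i}, key=lambda k: dist[i, k])  # closest remaining match
    unpaired -= {i, j}
    treated = int(rng.choice([i, j]))  # one coin flip per pair
    control = j if treated == i else i
    assignment[treated] = "reconciliation program"
    assignment[control] = "health program"

communities["arm"] = communities.index.map(assignment)
print(communities)
```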
The third type of example is of quasi-experimental designs. Where randomization is not feasible or ethical, quasi-experimental designs may be used to create a suitable counterfactual. For example, to examine the impact of a reintegration program on ex-combatants in Burundi, Gilligan, Mvukiyehe, and Samii (2010) used a disruption in the rollout of a program to construct a counterfactual. Three NGOs were given contracts in three different regions to provide benefits to ex-combatants. However, due to external factors, one of the NGOs delayed providing services for a year. Because the disruption was unrelated to choice of entry by participants or implementers, this comparison group theoretically avoids the traps of self-selection or targeting bias. However, the participants in the delayed area may be systematically different from individuals in the other two areas. To account for the potential imbalance on important covariates, the authors matched the 'treatment' and 'control' groups on individual characteristics (e.g., age, economic variables, and combatant variables) and community characteristics (e.g., war violence and population density), as well as on the propensity score. 4 To estimate the effects of the Demobilization, Disarmament, Rehabilitation and Reintegration (DDRR) program in Liberia on incomes and chances of employment, Levely (2012) used propensity-score matching based on age, gender, rank, and county. As pointed out by the authors, propensity-score matching does not entirely solve the identification problem, as it does not account for potential self-selection on unobservable characteristics. Nevertheless, it does provide a more accurate estimate by accounting for observable variables.

4 A propensity score is the probability of a unit (a person, a school, a community, etc.) being assigned to receive an intervention (the 'treatment') given a set of observed covariates, were the treatment to be made available.
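To make the mechanics concrete, here is an illustrative propensity-score matching estimate on simulated data; the covariates echo those named above (age, gender, rank, county), but the data, take-up process, and effect size are fabricated purely for demonstration.

```python
# Illustrative propensity-score matching on a simulated sample; the
# covariates mirror those named in the text, but the data-generating
# process and the true effect (15) are invented for demonstration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 45, n),
    "female": rng.integers(0, 2, n),
    "rank": rng.integers(0, 3, n),
    "county": rng.integers(0, 5, n),
})
# Simulated non-random program take-up (older, higher-rank more likely)
p_take = 1 / (1 + np.exp(-(0.03 * (df["age"] - 30) + 0.4 * df["rank"] - 0.2)))
df["treated"] = rng.random(n) < p_take
df["income"] = 50 + 2 * df["age"] + 10 * df["rank"] + 15 * df["treated"] + rng.normal(0, 10, n)

# Step 1: estimate each unit's propensity score from observables
X = df[["age", "female", "rank", "county"]]
df["pscore"] = LogisticRegression().fit(X, df["treated"]).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest
# propensity score and average the outcome differences (the ATT).
controls = df[~df["treated"]]
treated = df[df["treated"]]
gaps = np.abs(controls["pscore"].values[None, :] - treated["pscore"].values[:, None])
matches = controls.iloc[gaps.argmin(axis=1)]
att = (treated["income"].values - matches["income"].values).mean()
print(f"Estimated effect on income (ATT): {att:.1f}")  # close to the simulated 15
```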
Sometimes the experimental conditions are determined by nature or by other factors beyond the control of the experimenters, but imitate a randomized process to the extent that they are called natural experiments. In an evaluation of peace workshops for youth in Sri Lanka (Malhotra, 2005), those who came from the same schools as workshop participants and had been nominated to attend the workshops, but had not been able to participate due to budget cuts that year, were treated as a natural control group.

An underused quasi-experimental design is the regression discontinuity design (RDD), which uses program eligibility criteria (e.g., an eligibility cutoff score such as a poverty line) to establish the counterfactual. The Samii et al. search uncovered no existing RDD impact evaluations in the fields of conflict prevention and peacebuilding. One could, however, imagine a scenario where a program in this field had rated districts in a country by a fragility index, or by some index related to the risk of (re)outbreak of violence. If the program decided that only districts with a score above a certain level qualified, then the districts that were close to, but below, the cutoff point, and hence did not take part in the program, would be very similar to those that were just above it and hence received it, and could act as a control group. The advantage of this approach is that, as long as individuals cannot manipulate the assignment variable, it is as good as an experiment, but only around the cutoff point. A challenge is to have a large enough sample of observations close to the cutoff. It is also important to note that causal conclusions are limited to units close to the cutoff, and that extrapolation beyond this point (whether to the rest of the sample or to a larger population) requires additional assumptions (Lee and Lemieux, 2010).
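A sketch of how the hypothetical fragility-index scenario could be analyzed follows; the data, cutoff, bandwidth, and program effect are all invented for illustration.

```python
# Hypothetical regression discontinuity sketch for the imagined
# fragility-index scenario above; data, cutoff (60), bandwidth (15),
# and the simulated effect (-5) are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({"fragility": rng.uniform(0, 100, n)})
cutoff = 60.0
df["program"] = (df["fragility"] >= cutoff).astype(int)
# Simulated violence outcome: worsens with fragility, program lowers it by 5
df["violence"] = 20 + 0.3 * df["fragility"] - 5 * df["program"] + rng.normal(0, 3, n)

# Local linear regression within a bandwidth around the cutoff, with
# separate slopes on each side; the coefficient on `program` is the
# estimated jump (treatment effect) at the cutoff.
bw = 15.0
df["centered"] = df["fragility"] - cutoff
local = df[df["centered"].abs() <= bw]
fit = smf.ols("violence ~ program + centered + program:centered", data=local).fit()
print(fit.params["program"])  # should be close to the simulated -5
```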
We have described ways of establishing a counterfactual when eligible individuals are excluded from the treatment or the treatment is delayed or rationed. However, the peacebuilding interventions whose effectiveness we would like to measure are often of a nature that does not easily permit the identification of a control or comparison group, because in theory they should be available to everyone at the same time. This could be the case, for example, when using the media to deliver peace messages, as above, or when a service such as social reintegration services for ex-combatants is in theory available to all. As long as the uptake of the service or intervention is less than 100 percent, there still exists the possibility to create a comparison group. This method is called an "encouragement design" because it requires that a randomly selected group of beneficiaries receive additional encouragement, typically in the form of additional information or incentives, to take up the offered service (or use more of it). As long as information on relative uptake is available along with the measured outcomes, the encouragement design allows estimation of the effect of the intervention as well as the effect of the encouragement itself. The creation of listening groups has already been mentioned (Paluck, 2009a) as one type of encouragement. Other types could include an informational brochure about an available service, or subsidizing the service fee or sign-up costs of a service for a limited period.

4.2. Adaptation and flexibility throughout the evaluation process

The unpredictability of the situation in which many peacebuilding and conflict prevention impact evaluations take place sometimes calls for flexibility in the design and implementation of the evaluation. Despite serious challenges to data collection in conflict-affected environments, all but one of the impact evaluations summarized by Samii et al. (2012) involved collecting primary data. It appears that the dearth of useful administrative data in these settings leaves little option but to collect primary data. The Samii et al. review did report that these impact evaluations appeared to be based on smaller average sample sizes (2.5 to 4 times smaller) than a comparison group of impact evaluations carried out in other sectors, which may limit the analysis, for example of differential effects on sub-groups.

We asked the researchers of the conflict prevention and peacebuilding studies whether and how the research teams adapted the data collection methods for the conflict-affected settings. Of the survey responses that reported some adaptations, the types of modifications can be roughly divided into four categories: 1) adaptations to the sample; 2) timing; 3) question formulation and focus group composition; and 4) the enumerators' experience and training.

First, adaptation of sample size, either by design or due to unforeseen events, was a recurring response. In the evaluation of the Community Development Fund in Sudan, the researchers reported that they lost 60% of the sample communities due to the (re)outbreak of war (Paluck, 2009b), whereas in the impact evaluation of Afghanistan's National Solidarity Programme (NSP), the districts in which the security of the enumerators and participants was at risk were excluded from the intervention and evaluation (Beath et al., 2010). The fact that the research was not being done in the hostile Pashtun communities clearly affects and limits the generalizability of the findings, and so it is important to be careful about how one reads the evidence. As described above, in the evaluation of the ex-combatant agricultural and psychological training program in Liberia, the team decided to exclude high-ranking commanders from the evaluation in order to avoid potential conflict caused by randomizing them into both treatment and control. The program was concerned that commanders who were randomized into the control group might cause problems for the overall program. Therefore, all commanders were provided access to the program and were excluded from the evaluation. The validity of the evaluation findings is therefore limited to the ex-combatants of lower ranks.

Second, the timing of surveys is one of the most commonly cited adjustments made. The researchers involved in the evaluation of the peace education program for Israeli and Palestinian youth reported having had to adjust the timing of data collection due to the conflict (Biton and Solomon, 2006). In the evaluation of the Rwandan radio program, the team had planned follow-up interviews in prisons which were among the experimental sites. The timing of these had to be changed due to a sudden move to release prisoners (Paluck, 2009a). Similarly, in an ongoing evaluation of the community monitoring for better health program in Burkina Faso (World Bank, 2012), the research team had to halt data gathering in the Sahel because of problems with Tuaregs who were engaged in violent conflict in neighboring Mali. They were, however, able to gather the data at a later stage. It is worth reflecting on the fact that the measured size of effects is likely to change over time, and may take a non-linear shape; hence a great deal of caution is necessary when interpreting the findings for policy-making purposes. This will be particularly important in situations where the window of opportunity for data collection is limited.

Third, researchers described issues over what questions could be asked due to conflict-related sensitivities. In the evaluation of the Community Driven Development interventions in Sierra Leone, they explored whether they could ask about ongoing tensions, or directly about people's role in the conflict. The team spent time discussing with those working in the communities and piloting questions. They found little reluctance to talk about the conflict and found that it did not seem to raise tensions. However, they decided not to ask about some areas of current tensions, such as marital infidelity, as they were warned that this could spark tensions (Casey et al., 2011). In the civic education program in Southern Sudan, the focus groups were designed to prevent more conflict. Where the social divisions were based on sect, single-sect discussion groups were organized. Where conflict was based on affiliation to ethnic/tribal groups, the groups included members of only one ethnic group (Paluck, 2009b).

Finally, researchers frequently mentioned the experience and background of the enumerators as a factor that had been taken into account when designing data collection strategies. For both the study of the Burundi ex-combatant reintegration program (Gilligan et al., 2010) and the studies of peacebuilding and democracy promotion efforts in Liberia (Mvukiyehe and Samii, 2010, 2011), the authors reported having recruited specially trained enumerators who had either done social work or human rights advocacy. It was deemed important that the research staff were sensitive to issues of trauma and trained to handle themselves in sensitive situations. For the evaluation of an IRC Community Driven Reconstruction program in northern Liberia, the authors reported that the use of staff from a local organization, consisting almost entirely of ex-combatants, as enumerators had been helpful. In the case of the Afghanistan evaluation, female enumerators, who were able to decide the most appropriate means of selecting participants, carried out the focus groups and interviews among the female population. In the case of the evaluation of peacekeeping in Cote d'Ivoire (Mvukiyehe and Samii, 2009), the enumerators were intensively trained in human subjects protection and survey techniques for a week. For the evaluation of the reconciliation radio program in Rwanda (Paluck, 2009a), the research assistants represented both Hutu and Tutsi backgrounds, which in itself gave a message of tolerance and may have helped in downplaying ethnicity issues when approaching the communities.
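Several of the adaptations described in this section, smaller samples, excluded districts, and attrition, bear directly on statistical power. A back-of-envelope check of required sample size, using purely illustrative numbers, can be run as follows.

```python
# Back-of-envelope power check, illustrating why smaller samples and
# attrition constrain what an evaluation can detect. The effect size,
# power target, and attrition rate are illustrative assumptions only.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
# Sample size per arm needed to detect a 'small' standardized effect
# (Cohen's d = 0.2) with 80% power at the 5% significance level
n_per_arm = power.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
# Inflate enrollment for an assumed 20% attrition rate
n_enrolled = n_per_arm / (1 - 0.20)
print(round(n_per_arm), round(n_enrolled))  # roughly 393 and 492 per arm
```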
4.3. Evaluations as interventions: Implications for reliability and risk

All evaluations in which primary data are collected through human interaction could in themselves be seen, or perceived, as a type of intervention. This fact has potential implications both for the reliability of the evaluation results and for the safety of the evaluation personnel and those being evaluated. In addition, the perceived or real threat to safety is likely to be negatively correlated with the reliability of the results, as has been acknowledged in the new OECD DAC guidance for Evaluating Peacebuilding Activities in Settings of Conflict and Fragility (OECD/DAC, 2012). The guidance states that "evaluations of interventions in the field of conflict prevention and peacebuilding expose – in contrast to almost all forms of evaluation – both evaluators and evaluated to real risk". The guidance goes on to discuss the implications:

"First, the threat of violence may constrain the evaluators' ability to raise issues, collect material and data, recruit and retain local staff, meet interlocutors, publish findings, and disclose sources. Defending the integrity of evaluation findings in highly politicized and even dangerous settings can pose problems for evaluation teams, particularly where evaluation findings may potentially be misused by different parties to a conflict or harm those involved. Second, the risk of harm may mean that the information obtained is biased, incomplete and/or (voluntarily or involuntarily) censored. Consequently, evaluations must address the operational and methodological consequences of the risk of violence. More specifically, in order to deal with this challenge, it is advisable that the evaluation itself include a conflict analysis in order to assess the intervention and to ensure that the evaluation process and product is conflict sensitive." (OECD/DAC, 2012, p. 28)

Impact evaluations, to a greater extent than other evaluation methodologies, rely on the collection of primary data from a large number of units in both a treated and a comparison population. This means that evaluation teams may have increased exposure to the above-mentioned risks. Carrying out evaluations in conflict-affected settings can potentially cause harm to members of the evaluation team (which during field visits may include implementation staff) and the local population interviewed. For example, in a community-driven reconstruction evaluation in the DRC, "the harsh conditions produced great costs to enumerators with high incidence of sickness including malaria and cholera. Although safety regulations were in place in all areas, one of the teams was involved in a tragic accident in which a child died…. Despite the precautions undertaken we did encounter some security issues: 31 villages were not visited due to security risks; one team was ambushed and had to hand over their equipment; and one IRC staff member was abducted (and subsequently released unharmed)" (Humphreys et al., 2012, pgs. 34-35).

Carrying out evaluations in conflict-affected settings could furthermore potentially adversely affect intergroup relations and the course of intergroup conflict. While examples were not found of this having happened, an ongoing study in Cote d'Ivoire (not included in the Samii et al. review) made evaluation design adjustments to minimize the risk of exacerbating existing conflict. The evaluation, which looks at the impact of couples' discussion groups in addition to a savings intervention to combat intimate partner violence (Gupta and Annan, ongoing), included women who would not otherwise have been included in the sample.
The researchers were interviewing women in savings groups, but were only interested in women who had partners, because the outcomes of interest concerned partner relations and decision making. Given that the villages were in areas where there had been high ethnic tensions and conflict, the program team felt that if they separated the women and interviewed some and not others, there was the potential to create conflict and suspicion. They therefore decided also to administer a shorter survey to non-partnered women.

Broadly speaking, there are three reasons for evaluation teams to conduct conflict analyses: 1) to assess the relevance and impact of the program; 2) to assess the risks of negative effects of conflict on the evaluation design and process; and 3) to assess the risks of the evaluation exacerbating conflict (conflict sensitivity) (DFID, 2002). When reviewing the conflict assessments reported on in the summarized studies or commented upon in the survey responses, we found a varied approach. Some teams reported having relied on the assessment of the program implementing agency and partners in the country. In other cases, an assessment of risk to subjects was conducted as part of the institutional review board (IRB) approval process. A number of the studies derived insights from baseline studies that included questions about conflict experience, regular program reviews of the methods and measures with a 'do no harm' approach in mind, survey piloting and discussions with people working in similar communities, as well as behavioral monitoring. All of these approaches indicate an adapted use of conflict assessment. A finding of some concern is that neither the studies nor the survey respondents indicated that any thought had been given to the potential for adverse effects from the publication and dissemination of results and findings.

Determining what is good enough in terms of conflict assessment is a difficult question and a balancing act between time, resources, and a priori knowledge of risk. The instability of conflict-affected settings poses significant challenges to the rigor of evaluation design and the quality of data collected. Evaluations in these settings also introduce risks and potential harm to evaluators and the evaluated. Drawing from the experiences of evaluators in these contexts, below is a set of questions about ethical and feasibility issues that research teams should consider:

(i) Does the evaluation factor in time for delays, which are more likely to occur in unstable conditions?
(ii) Does the sample size factor in the potential for higher attrition due to potential security issues, migration, or ethical concerns?
(iii) Have the potential ways that the evaluation may introduce risk and harm to participants, interviewers, and implementing partners been adequately considered, and have strategies been devised to mitigate these risks and harm?
(iv) Have interviewers been trained in ethical data collection and conflict-sensitive approaches to study participants? Have the characteristics of the interview team been thought through in light of the conflict (e.g., ethnicity, age, gender, status)?
(v) Is there a security protocol or guidelines for evaluation staff? Do evaluation staff fall under any organizational protection for security?
(vi) Who carries the legal responsibility for the risks taken? Have the researchers partnered with an organization able to bear the risks?
(vii) Have methods of monitoring the potential ethical and conflict-related issues throughout the data collection process been considered and planned?
(viii) Does the evaluation team have strong key informants who can provide thoughtful analysis about the security situation and the research implications at the design phase and throughout the evaluation?
(ix) Is a flexible approach to the evaluation in place, such that adjustments can be made throughout the process in light of potential harm, security, or other programmatic issues?
(x) Is the responsibility for the dissemination and communication of the findings clarified, is there a communications plan in place, and is it conflict sensitive?

We have seen that impact evaluations are feasible to carry out in diverse and challenging circumstances, but require precautionary measures, flexibility, and conflict sensitivity on the part of the evaluators. Having acknowledged the challenges and resources that impact evaluations in these contexts entail, the next section attempts to address the core question of the value of impact evaluations.

5. Why are impact evaluations important?

5.1. Testing theories of change of conflict prevention and peacebuilding interventions

While large n impact evaluations of peacebuilding are possible, they are also difficult, so the question remains: why do we need them? The answer is that only impact evaluations allow us to measure net impact and thus attribute the effects of the intervention. As a result, only impact evaluations allow us to test whether the intervention and its various inputs and outputs lead to the hypothesized changes, outcomes, and impacts in our theory of change (White, 2009).

The simplest case for this claim is the before-after fallacy. Consider measuring an outcome both before the program and after the program. Typically, if there is an improvement, the evaluator (and program manager) considers the intervention a success. But over the period of any program, many other factors come into play, not least all the other programs being implemented in the same country. Without a valid counterfactual, there is no way of knowing whether the improvement can be attributed to the program's activities or may have happened in spite of them. In conflict-affected settings, the before-after fallacy may be even more misleading, as the general situation may actually deteriorate over the period of the program. The before-after measurement would show the outcomes worsening, but a comparison to a counterfactual could very well reveal that the program prevented the outcomes from worsening to an even greater extent—a crucial result for a peacebuilding program. Similarly, a before-after measurement could show an improvement that is entirely due to other factors and may indeed mask unintended negative consequences of the program in question—again a crucial result for peacebuilding programs.
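The logic can be shown with a toy calculation; all numbers are invented to illustrate a deteriorating setting in which the program prevents part of the decline.

```python
# Toy illustration of the before-after fallacy in a deteriorating
# setting: outcomes worsen everywhere, but less so where the program
# operated. All numbers are invented for illustration.

# Mean outcome (e.g., a trust index) before and after, by group
treated_before, treated_after = 50.0, 45.0   # program areas
control_before, control_after = 50.0, 38.0   # comparison areas

before_after = treated_after - treated_before   # -5: looks like failure
diff_in_diff = (treated_after - treated_before) - (control_after - control_before)

print(before_after)   # -5.0
print(diff_in_diff)   # +7.0: the program prevented a larger deterioration
```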
So, when returning to the question of the importance of impact evaluation, we suggest focusing on the two key tenets that tend to distinguish this type of evaluation from more traditional program evaluation, namely the need to account for other possible confounding factors and the focus on results rather than on the intentions implicit in the process. While we stand to be corrected, our impression is that most disagreements and discussions about the importance of impact evaluations, the way we have defined these in the paper, revolve around the need for a control or a comparison group to account for confounding factors. We will not deal with this larger debate here but refer to recent literature (Stern et al., 2012).

An often-cited limitation of quantitative impact evaluation is that large n impact evaluations can only be applied in large n situations, significantly limiting the questions that can be addressed. Large n designs can sometimes be implemented even in what may seem like small n situations, such as when a nationwide policy is being implemented from which no one is or can be excluded (we have argued earlier in this paper that even in this case an encouragement design can give a dose-response perspective on the policy in question), but quite often they cannot. In these cases, rather than moving on and looking for the next question that is evaluable by large n methods, we call for small n attribution analysis, to which we return in the next sub-section. First, however, we present the type of learning and insights that can be gained from large n impact evaluations.

The results of several large n impact evaluations of peacebuilding interventions provide compelling evidence that many key assumptions and theories of change about conflict prevention and peacebuilding need to be tested. This section presents examples of impact evaluations whose findings challenge the theories that personal beliefs and prejudices need to change in order to change behavior; that discussion and debate necessarily lead to improved tolerance; and that Community-Driven Development (CDD) or Community-Driven Reconstruction (CDR) projects, at least in the way these have tended to be implemented, improve social cohesion.

Two studies by Elizabeth Levy Paluck (2012) test psychological theories of attitude and behavior change from media interventions designed to help rebuild communities following conflict. In Rwanda, she evaluated a reconciliation-themed radio soap opera (Paluck, 2009a), and in eastern Democratic Republic of Congo (DRC), she evaluated a talk show that was aired in conjunction with a radio soap opera (Paluck, 2010).

The first evaluation tested conflicting psychological theories about the relationships between personal beliefs, societal norms, and behaviors, and how those can be influenced by media. In Rwanda, the NGO La Benevolencija produced a radio soap opera called Musekewaya ("New Dawn") that was designed to promote reconciliation by playing out a story that includes sources of tensions and violent outcomes similar to those of the 1994 Rwandan genocide, 5 but that speaks out against violence and includes characters banding together across ethnic groups (which were proxied by "communities" as the government forbade the use of the word "ethnic").

5 The Rwandan Genocide was the 1994 mass murder of an estimated 800,000 people in the East African state of Rwanda. It was the culmination of longstanding ethnic competition and tensions between the minority Tutsi, who had controlled power for centuries, and the majority Hutu peoples, who had come to power in the rebellion of 1959–62 (Wikipedia; accessed 31/10/2012).

Although the radio program was aired nationwide, Paluck created a pair-wise matched cluster randomized controlled trial using an "encouragement" design. She established listening groups to encourage the beneficiary, or treatment, group to listen to the "New Dawn" program and to concurrently encourage the control group to listen to an alternate radio program on health.
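A stylized sketch of how such an encouragement design identifies the effect of actually listening follows; the take-up rates, outcome scale, and effect are simulated numbers, not figures from the Rwanda study.

```python
# Stylized encouragement-design estimate. Everyone can hear the radio
# program, but randomly chosen listening groups raise take-up; the Wald
# (instrumental variable) ratio recovers the effect of listening itself.
# All numbers are simulated, not figures from the Rwanda study.
import numpy as np

rng = np.random.default_rng(42)
n = 2000
encouraged = rng.integers(0, 2, n)                       # random encouragement
# Take-up is imperfect in both groups: 80% vs 30% listen
listens = rng.random(n) < np.where(encouraged == 1, 0.8, 0.3)
# Simulated outcome: listening shifts a norms index by 2 points
outcome = 10 + 2 * listens + rng.normal(0, 1, n)

itt = outcome[encouraged == 1].mean() - outcome[encouraged == 0].mean()
takeup_diff = listens[encouraged == 1].mean() - listens[encouraged == 0].mean()
wald = itt / takeup_diff   # effect of listening among those induced to listen
print(f"ITT: {itt:.2f}, take-up difference: {takeup_diff:.2f}, Wald/IV: {wald:.2f}")
# Expect ITT near 1.0 (= 2 x 0.5) and a Wald estimate near the true 2.0
```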
Since the ultimate goal of the program was to reduce intergroup conflict, the questions the experiment tried to answer were, first, can such a radio program influence both personal beliefs and prejudices as well as perceived societal norms, and second, is a change in personal beliefs a necessary precondition for influencing behavior? While psychological theories conflict, "theories of media persuasion claim that beliefs are influenced by media cultures and programs" (Paluck 2009a, p. 575). The findings were startling: the perceptions of social norms as well as behaviors changed significantly in the treatment group with respect to intermarriage, open dissent, trust, empathy, cooperation, and trauma healing, while the program did not significantly change listeners' personal beliefs.

The second evaluation tested the effectiveness of discussion in reducing conflict. In the DRC, a radio soap opera, Kumbuka Kesho ('Think of Tomorrow'), emphasized conflict reduction through community cooperation. While the radio program was aired in all the experiment's regions, Paluck again used an encouragement design, this time by pair-wise matching regions and randomly choosing one broadcast region in each pair to air a talk show directly following the soap opera, while the other aired the soap opera only. The talk show was designed to encourage listeners' reactions and discussions. While there is a resurgence in the use of discussion as a policy tool to reduce conflict (evidenced by the proliferation of terms such as "deliberation", "dialogue", "participatory" and "community driven" in the literature on interventions designed to promote peace), psychological research has also flagged potential hazards of discussions, including opinion polarization, social pressure, and cognitive errors (Paluck, 2010). Paluck carried out this research to learn more about the success of discussion-based conflict-reduction programs. The findings were sobering: those listeners who were encouraged to discuss through the additional talk show did indeed discuss more, but were also found to become more intolerant and less likely to aid disliked community members.

A third group of evaluations examined the effectiveness of Community-Driven Development projects in strengthening social cohesion. A commonly proposed theory is that of the importance of social cohesion, or the (re)building of interpersonal or intergroup networks, trust, and reciprocity, as a crucial factor for peacebuilding and conflict prevention. In a recent talk at the launching conference for the High Commissioner on National Minorities (HCNM) Guidelines on Integration in Diverse Societies, Stefan Wolff answered his rhetorical question about what it is about social cohesion that is so important for successful conflict prevention in the following way: "One of the fundamental ideas underlying the notion of conflict prevention in diverse societies is that different population segments can resolve any differences by recourse to institutional processes rather than violence. For such institutional processes to be effective, a viable and resilient state is required whose fundamental constitutional principles are broadly accepted and respected across all segments of society.
If this is the case, societies may well be diverse across any number of indicators, including ethnicity, language, and religion, but they will also be characterized by a sufficient level of social cohesion." (Wolff, 2012)

Efforts to strengthen social cohesion have increased among development organizations, most often operationalized through Community-Driven Development (CDD) or Community-Driven Reconstruction (CDR) projects. In a systematic review of interventions to promote social cohesion in Sub-Saharan Africa, including in several conflict-affected countries (King et al., 2010), the authors outlined the theory underlying CDD interventions: "projects promote social cohesion by supporting and building community capacity for decision-making and collective action through a process of participation. The hypothesis is that, by handing over control of decisions and resources to the community, the sub-projects will better meet communities' needs and enhance ownership, and that the experience of being involved in this participatory process will empower communities, improve capacity for local development and improve social cohesion." (King et al. 2010, p. 347)

Drawing upon the available evidence from impact evaluations that fulfilled a set of quality criteria, the review finds that the evidence of pro-social effects from Community-Driven Development (CDD) type interventions is weak. More surprisingly, a negative effect on individuals' perceptions of inter-group relations is found across the three studies that measured this factor. 6

6 The review indicates that this finding may be partly explained by the fact that broad and substantive participation, including in actual decision-making, was often lacking, and suggests that the implementation of the CDD interventions may have been flawed.

The preceding examples of how impact evaluations have been used as tools to test and critically examine commonly held assumptions about how development interventions affect change were all based on large n impact evaluations. But what happens when we have a question about the results and impact of an intervention, be it a policy reform or a service delivered, on the ground (on the so-called 'beneficiaries'), and do not have a large number of units of assignment? The next section discusses the use and commonalities of small n impact evaluations.

5.2. Small n impact evaluations

What distinguishes impact evaluation from other types of evaluation is that it relies on a counterfactual analysis to attribute an effect to a particular intervention or set of interventions, or, said differently, to make causal inferences. We further distinguish between large n impact evaluations, which involve tests of statistical significance between outcomes for treatment and comparison groups (with n referring to the unit of assignment), and small n impact evaluations, carried out when a treatment and comparison group of sufficient size cannot be identified, be it of individuals, communities, or countries, and thus where tests of statistical significance are not possible. While there exists considerable consensus among impact evaluators conducting large n impact evaluations as to what constitutes a high-quality impact evaluation, no such consensus exists for small n impact evaluations.
In a recent paper, White and Phillips (2012) examine various small n evaluation approaches that have been used and find that a methodological core exists which could provide a basis for consensus: 'This common core involves the specification of a theory of change together with a number of further alternative causal hypotheses. Causation is established beyond reasonable doubt by collecting evidence to validate, invalidate, or revise the hypothesized explanations, with the goal of rigorously evidencing the links in the actual causal chain'. 7 They refer to these as process- or mechanism-based approaches. They go on to summarize the main difference between large and small n evaluations in the following manner: 'Whereas experimental approaches infer causality by identifying the outcomes resulting from manipulated causes, a mechanism-based approach searches for the causes of observed outcomes'. 8 Small n evaluations will typically gather information on both the 'what' and the 'why', but are at risk of suffering from substantial biases likely to arise from the collection, analysis, and reporting of qualitative data.

Quite often, however, when large n impact evaluation is not possible, evaluators revert to process evaluations 9 or impact assessments based on association 10 rather than to small n attribution analysis, not out of methodological disagreement but rather due to a whole range of supply and demand limitations (related to time and resources, evaluation skills, etc.) (see Grävingholt and Leininger, forthcoming 2013).

7 White and Phillips, 2012, p. 3.
8 White and Phillips, 2012, p. 18.
9 What distinguishes impact evaluations from process evaluations – evaluations of how the implementation was carried out – is that the benchmark against which we compare in process evaluations is not a counterfactual scenario but rather non-tested (or, in the best-case scenario, previously tested) assumptions of what underlies a 'good process'.
10 Association claims are very widespread in the small n evaluation world, as is raised elsewhere in this book (Grävingholt and Leininger, forthcoming). These are claims of having contributed to an outcome (or sometimes even claiming attribution) by having contributed an input or claiming to have done so (e.g. by having been present at the same time). This approach does not explore alternative causal hypotheses, the minimum criterion for small n attribution analysis, and is clearly not good enough.

An illustration of an evaluation that used elements of the methodologies referred to as small n attribution analysis to critically assess important theories of change is the evaluation of Norwegian peace efforts in Sri Lanka (Goodhand et al., forthcoming 2013). Among the main objectives of the Sri Lanka evaluation was to assess results achieved through the Norwegian facilitation of the peace process. This is a case where the total population (N) is 1 (and small n can obviously not be larger than large N). In other words, there was only one peace negotiation process going on with Norwegian involvement in Sri Lanka, and that was what the researchers set out to evaluate. Clearly, no large n impact evaluation was feasible. What about small n attribution analysis? One of the main challenges to attributing results to the Norwegian facilitation efforts is that the 'treatment', Norwegian facilitation, cannot be assumed to be an independent variable – rather, the Sri Lankan and international actors chose to contact Norway, or did not object to this, requesting it to play the role of facilitator (and Norway chose to accept). It is likely that assumptions about what role Norway could or would play influenced the decision to approach Norwegian policymakers. Indeed, according to the report, "Norway was chosen as a facilitator, not only for its expertise, but also because it was a small power without geo-strategic interests and colonial baggage. Being a less powerful player, Norway felt it had to consult the US and India, the former as the world's superpower and the latter as the regional hegemon" (Goodhand et al., 2011, p. 73).
Treating the Norwegian facilitation as exogenous would have led to overplaying the role of agency, as opposed to context and path-dependence, which were crucial factors. Indeed, the provocative title of the evaluation report, "Pawns of Peace", alludes directly to the endogeneity issue. The methods chosen by the team include features designed to explicitly assess the plausibility of causal claims, the common feature of small n attribution analysis. In particular, the 'inside out' and 'outside in' approaches that they seek to combine allow them to critically assess whether it is realistic to believe that, had Norway acted differently, different outcomes would have ensued at various points in time, given the structural constraints in which key actors operated. The study is also very explicit about the many data collection constraints and biases the team faced, including missing key informants, secrecy and safety issues, conflicting and unreliable accounts, and not being able to interview a number of key informants in person due to visa problems. The main strategy used to deal with these challenges was triangulation.

6. Ethical concerns about impact evaluations in conflict prevention and peacebuilding

The preceding descriptions of large n impact evaluations in conflict-affected settings raise several ethical concerns. There is the concern that impact evaluation designs require that only some individuals 11 receive the intervention. This is considered an ethical problem, and some claim that for certain peacebuilding interventions it simply is not feasible to involve some individuals and not others. These objections are not unique to evaluations in conflict-affected settings, although the risks in these settings may be heightened. We will see that just as possibilities for ethical evaluations abound in other types of development interventions, they also exist in conflict-affected settings. Randomization or quasi-experimental designs do not necessarily drive the fact that only some individuals receive the intervention; they are particularly well suited when, for financial or logistical reasons, the implementation and roll-out is slow or staggered, or when comparable groups are left out for other reasons. This is the reality of most development interventions, as well as of those in fragile settings.

Part of what underlies the ethical concern about impact evaluations is the premise that assignment to a comparison or control group implies 'not receiving a benefit'. This is not necessarily the case, for two reasons. First, the comparison group can be receiving a treatment with which another competing intervention is being compared.
For example, in the case of the impact evaluation of the agricultural training program for ex-combatants in Liberia (Blattman and Annan, 2011), questions arose about whether to invest in capital or skills in agricultural programming, as some of the results suggested that the private returns to capital could be higher than those to skills. A future impact evaluation could compare the impact of providing capital versus skills without the need for a control group that does not receive any program intervention.

11 For the purpose of discussion, we use the term "individuals" for the unit of analysis, although households or communities or other entities may also be the unit of analysis.

Second, it is important to examine the assumption that receiving a development intervention, or more of one, is always a benefit. The reality is that the effectiveness and impact of a large number of development interventions have yet to be proven (CGD, 2006). When a genuine state of uncertainty exists about the benefits of an intervention, so that in theory it could be harmful or ineffective, there is an urgent need for it to be critically examined. This state of uncertainty, known as equipoise in the medical literature, is considered a necessary ethical condition for the use of a control group, which is reflected (with a couple of important caveats) in the Declaration of Helsinki on the use of placebo controls (World Medical Association, 2001; Lau et al., 2003). When there is no known effective medical treatment, a new drug might produce better, worse, or the same results as no treatment, and so there is no ethical conflict in trials where this equipoise is present. 12 The evaluation discussed in section 5.1 provides a case in point: discussion groups as a tool to reduce ethnic conflict were tested in the DRC context and found to increase, rather than decrease, intolerance (Paluck, 2010).

Another concern raised is about randomizing a program's activities across possible beneficiaries instead of selecting according to other criteria (e.g., those who apply first or those easiest to access). In a conflict-affected setting, prioritizing certain beneficiaries could be important for defusing volatile situations or prioritizing quick wins. On the other hand, in cases where a program cannot be implemented across all individuals immediately, randomization of eligible individuals can in fact be more ethical and politically feasible than determining who benefits first and who later, especially in a sensitive situation where particular choices can be construed as being politically motivated.

While the ethical concerns may sometimes be misplaced or exaggerated for the reasons just described, it is nevertheless critically important to always carefully consider the potential ethical issues that may arise when designing and conducting impact evaluations. Guidelines exist to help determine when not to do an impact evaluation for ethical reasons, and there are a number of strategies to alleviate ethical concerns. Many agencies and universities have formal ethical clearance procedures, and the standards typically include (i) ensuring informed consent; (ii) guaranteeing the confidentiality of participant data; (iii) limiting the burden associated with study participation; and (iv) making sure that no one is denied essential services for the purpose of the evaluation (Friedman, 2011; USDA, 2005).
7. Conclusion: High risk, high return?

Carrying out impact evaluations in conflict-affected settings can be risky and methodologically challenging, though we have discussed ways in which evaluation designs and data collection practices can be adapted, and risks reduced, to make implementation feasible. Impact evaluations are also costly, owing to their reliance on data from large samples to achieve statistical power: costs range from as little as US$50,000 for quasi-experimental impact evaluations using preexisting survey data to over US$1 million for large multi-year RCTs with several rounds of survey data collection. For both of these reasons, the returns to such studies, in terms of learning and programmatic improvements, must also be high for the effort to be worthwhile.

We have argued that if we are interested in the actual development effects of interventions and programs on the people they are supposed to benefit, rather than in whether a program was implemented as planned, and if we want to know whether an observed effect occurred because of, or despite, the intervention in question, then a well-designed and well-executed impact evaluation is the most reliable approach. The potential usefulness and importance of impact evaluation is well exemplified by the way impact evaluations have tested and challenged many of the key assumptions and theories of change that underpin conflict prevention and peacebuilding activities. Important insights have been gained, and it is important that this knowledge feed back into the way we design and implement conflict prevention and peacebuilding programs, as well as the way we carry out program-theory evaluations.

To date, evidence of learning from impact evaluations being put into practice is limited. Of the 13 programs in which survey respondents had been involved, two were rated as not having led to any learning, three as having contributed to program improvements or to general learning about a program type, and in the remaining eight cases the respondents said any learning impact was unclear or 'too early to tell'. It may be that learning happens without the knowledge of the researchers, and learning clearly takes time. Especially when trying to draw lessons that have validity beyond a single program, country, and point in time, it is necessary to build up a body of evidence and review it systematically. Nevertheless, the survey responses are a good reminder that dissemination and learning from evaluation work, the raison d'être of these risky and challenging endeavors, cannot be taken for granted. Whether the high risk leads to high returns remains an open question. The returns will to a large extent depend on the international development community's capacity to incorporate evidence-based learning more strategically into interventions operating in contexts of conflict and fragility.
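As a rough illustration of the sample-size arithmetic behind the cost figures cited at the start of this section, the sketch below computes the respondents needed per arm to detect effects of different sizes in a simple two-arm comparison of means. The effect sizes, power, and significance level are conventional textbook values, not parameters taken from any study discussed in this paper.

```python
# Back-of-the-envelope sample-size calculation for a two-arm comparison of
# means, illustrating why detecting small effects requires large samples.
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_sd: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Respondents per arm needed to detect an effect of `effect_sd`
    standard deviations with a two-sided test at significance level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_sd) ** 2)

for d in (0.5, 0.2, 0.1):
    print(f"Effect of {d} SD -> {n_per_arm(d)} respondents per arm")

# Roughly 63, 393, and 1,570 respondents per arm, before accounting for
# clustering or attrition, both of which push required samples higher.
```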
References

Annan, Jeannie and Christopher Blattman, 2011, "Reintegrating and Employing High Risk Youth in Liberia: Lessons from a Randomized Evaluation of a Landmine Action Agricultural Training Program for Ex-combatants," Evidence from Randomized Evaluations of Peacebuilding in Liberia: Policy Report 2011.1, Yale University and Innovations for Poverty Action (IPA).

Baker, Judy L., 2000, Evaluating the Impact of Development Projects on Poverty: A Handbook for Practitioners, Washington, DC: The International Bank for Reconstruction and Development/The World Bank.

Barron, Patrick, Macartan Humphreys, Laura Paler, and Jeremy Weinstein, 2009, Community-Based Reintegration in Aceh: Assessing the Impacts of BRA-KDP, World Bank, www.columbia.edu/~lbp2106/docs/arls/FINAL_BRA-KDP_WB.pdf.

Beath, Andrew, Fotini Christia, Ruben Enikolopov, and Shahim Ahmad Kabuli, 2010, Randomized Impact Evaluation of Phase II of Afghanistan's National Solidarity Programme (NSP): Estimates of Interim Program Impact from First Follow-up Survey, http://www.nsp-ie.org/reports/BCEK-Interim_Estimates_of_Program_Impact_2010_07_13.pdf.

Biton, Yifat and Gavriel Solomon, 2006, "Peace in the Eyes of Israeli and Palestinian Youths: Effects of Collective Narratives and Peace Education Program," Journal of Peace Research 43(2): 167-180, http://jpr.sagepub.com/cgi/doi/10.1177/0022343306061888.

Blattman, Christopher, 2011, Uganda: Enterprises for Ultra-poor Women after War (in progress), http://chrisblattman.com/projects/wings.

---, 2011, Uganda: Post-war Youth Vocational Training (in progress), www.chrisblattman.com/projects/nusaf_yo/.

---, 2011, Peace Education in Rural Liberia (in progress), Innovations for Poverty Action, www.poverty-action.org/project/0139.

Blattman, Christopher and Jeannie Annan, 2011, Reintegrating and Employing High Risk Youth in Liberia: Lessons from a Randomized Evaluation of a Landmine Action Agricultural Training Program for Ex-combatants, Evidence from Randomized Evaluations of Peacebuilding in Liberia: Policy Report 2011.1, Innovations for Poverty Action (IPA) and Yale University.

Casey, Katherine, Rachel Glennerster, and Edward Miguel, 2011, Reshaping Institutions: Evidence on External Aid and Local Collective Action, National Bureau of Economic Research Working Paper No. 17012, http://www.nber.org/papers/w17012.

Center for Global Development, 2006, When Will We Ever Learn? Improving Lives through Impact Evaluation, Report of the Evaluation Gap Working Group, May 2006.

Department for International Development (DFID), 2002, Conducting Conflict Assessments: Guidance Notes, http://www.conflictsensitivity.org/sites/default/files/Conducting_Conflict_Assessment_Guidance.pdf (accessed 06/01/2013).

Diamond, A. and J. Hainmueller, 2007, The Encouragement Design for Program Evaluation, IFC, http://www.ifc.org/ifcext/rmas.nsf/AttachmentsByTitle/Encouragement/$FILE/The+Encouragement+Design+for+Program+Evaluation.pdf.

DiNardo, J., 2008, "Natural Experiments and Quasi-Natural Experiments," in Steven N. Durlauf and Lawrence E. Blume (eds.), The New Palgrave Dictionary of Economics, 2nd ed., Palgrave Macmillan.

Fearon, James, Macartan Humphreys, and Jeremy M. Weinstein, 2008, Community-Driven Reconstruction in Lofa County, www.columbia.edu/~mh2245/FHW/FHW_final.pdf.

Friedman, Jed, 2011, "The Ethics of a Control Group in Randomized Impact Evaluations: The Start of an Ongoing Discussion," Development Impact blog, http://blogs.worldbank.org/impactevaluations/node/598 (accessed 31/03/2013).

Gilligan, Michael, Eric Mvukiyehe, and Cyrus Samii, 2010, Reintegrating Rebels into Civilian Life: Quasi-experimental Evidence from Burundi, United States Institute of Peace, http://www.columbia.edu/~cds81/docs/bdi09_reintegration100701.pdf.
Glennerster, Rachel, and Edward Miguel, 2010, The Role of Information and Radios on Political Knowledge and Participation in Sierra Leone, Poverty Action Lab (in progress), http://www.povertyactionlab.org/evaluation/role-information-and-radios-political-knowledge-and-participation-sierra-leone.

Goodhand, Jonathan, Bart Klem and Gunnar Sørbø, 2013, "Evaluating Norwegian Peace Efforts in Sri Lanka," in O. Winckler Andersen, B. Bull and M. Kennedy-Chouane (eds.), Evaluation Methodologies for Aid in Conflict, London: Routledge.

Grävingholt, Jörn and Julia Leininger, forthcoming 2013, "Evaluating Statebuilding Support in Fragile States: Learning from Experience or Judging from Assumptions?" in O. Winckler Andersen, B. Bull and M. Kennedy-Chouane (eds.), Evaluation Methodologies for Aid in Conflict, London: Routledge.

Gupta, Jhumka and Jeannie Annan, ongoing, Evaluating an Economic and Empowerment Intervention on the Prevention of Partner Violence, International Rescue Committee project.

Hossain, M., C. Zimmerman, L. Kiss, and C. Watts, 2010, Violence against Women and Men in Côte d'Ivoire: A Cluster Randomized Controlled Trial to Assess the Impact of the 'Men & Women in Partnership' Intervention on the Reduction of Violence against Women and Girls in Rural Côte d'Ivoire: Results from a Community Survey, London: London School of Hygiene & Tropical Medicine.

Humphreys, Macartan, 2008, Community-Driven Reconstruction in the Democratic Republic of Congo: Baseline Report, Columbia University and the International Rescue Committee.

Humphreys, Macartan, and Jeremy M. Weinstein, 2007, "Demobilization and Reintegration," Journal of Conflict Resolution 51(4): 531-567, http://jcr.sagepub.com/cgi/doi/10.1177/0022002707302790.

Humphreys, Macartan, Raul Sanchez de la Sierra and Peter van der Windt, 2012, Social and Economic Impacts of Tuungane: Final Report on the Effects of a Community Driven Reconstruction Program in Eastern Democratic Republic of Congo, Columbia University, June 2012, pp. 34-35, http://www.oecd.org/countries/democraticrepublicofthecongo/drc.pdf.

King, Elisabeth, Cyrus Samii and Birte Snilstveit, 2010, "Interventions to Promote Social Cohesion in Sub-Saharan Africa," Journal of Development Effectiveness 2(3): 336-370.

Kondylis, Florence, 2007, Agricultural Outputs and Conflict Displacement: Evidence from a Policy Intervention in Rwanda, Households in Conflict Network Working Paper 28, http://www.csae.ox.ac.uk/conferences/2007-edia-lawbidc/papers/046-kondylis.pdf.

Lau, J.T.F., J. Mao, and J. Woo, 2003, "Ethical Issues Related to the Use of Placebo in Clinical Trials," Hong Kong Medical Journal 9(3): 192-198.

Lee, David and Thomas Lemieux, 2010, "Regression Discontinuity Designs in Economics," Journal of Economic Literature 48 (June): 281-355.

Levely, Ian, 2012, Measuring Intermediate Outcomes of Liberia's DDRR Program, Institute of Economic Studies, Faculty of Social Sciences, Charles University in Prague, IES Working Paper 2/2012.

Malhotra, D., 2005, "Long-Term Effects of Peace Workshops in Protracted Conflicts," Journal of Conflict Resolution 49(6): 908-924, http://jcr.sagepub.com/cgi/doi/10.1177/0022002705281153.

Mvukiyehe, Eric and Cyrus Samii, 2011, Peace from the Bottom Up: A Randomized Trial with UN Peacekeepers, paper presented at the FBA Peacekeeping Working Group, Stockholm, February 11-12, 2011.
Mvukiyehe, Eric and Cyrus Samii, 2010, Quantitative Impact Evaluation of the United Nations Mission in Liberia: Final Report, typescript, Columbia University, www.columbia.edu/~cds81/docs/lib/unmil_final100209.pdf.

Mvukiyehe, Eric and Cyrus Samii, 2009, Laying a Foundation for Peace? Micro-Effects of Peacekeeping in Côte d'Ivoire, paper prepared for the 2009 American Political Science Association Conference, Toronto, http://www.columbia.edu/~cds81/docs/unoci/mvukiyehe_samii_unoci090801.pdf.

OECD, 2012, Evaluating Peacebuilding Activities in Settings of Conflict and Fragility: Improving Learning for Results, DAC Guidelines and Reference Series, OECD Publishing, doi: 10.1787/9789264106802-en.

Paluck, Elizabeth Levy, 2009a, "Reducing Intergroup Prejudice and Conflict Using the Media: A Field Experiment in Rwanda," Journal of Personality and Social Psychology 96(3): 574-587, http://www.ncbi.nlm.nih.gov/pubmed/19254104.

Paluck, Elizabeth Levy, 2009b, "Entertainment, Information, and Discussion: Experimenting with Media Techniques for Civic Education and Engagement in Southern Sudan," memo presented at the Experiments on Government and Politics (EGAP) Conference, Institution for Social and Policy Studies, Yale University, April 24-25, 2009, http://isps.research.yale.edu/conferences/EGAP/egap/download/Paluck_4.25.09_MEMO.pdf.

Paluck, Elizabeth Levy, 2010, "Is It Better Not to Talk? Group Polarization, Extended Contact, and Perspective Taking in Eastern Democratic Republic of Congo," Personality and Social Psychology Bulletin 36(9): 1170-1185.

Paluck, Elizabeth Levy, and Donald P. Green, 2009, "Deference, Dissent, and Dispute Resolution: An Experimental Intervention Using Mass Media to Change Norms and Behavior in Rwanda," American Political Science Review 103(4) (October): 622, http://www.journals.cambridge.org/abstract_S0003055409990128.

Pugel, James, 2007, What the Fighters Say: A Survey of Ex-combatants in Liberia, United Nations Development Programme - Liberia, www.lr.undp.org/UNDPwhatFightersSayLiberia-2006.pdf.

Samii, Cyrus, Annette N. Brown, and Monika Kulma, 2012, Evaluating Stabilization Interventions, International Initiative for Impact Evaluation (3ie) White Paper, working draft 2.0, August 16, 2012, https://files.nyu.edu/cds2083/public/docs/evaluating_stabilization_interventions_120816shortenedb.pdf (accessed 06/20/2013).

Stern, Elliot, Nicoletta Stame, John Mayne, Kim Forss, Rick Davies, and Barbara Befani, 2012, Broadening the Range of Designs and Methods for Impact Evaluations, report of a study commissioned by the Department for International Development, Working Paper 38.

USDA, Food and Nutrition Service, 2005, Nutrition Education: Principles of Sound Impact Evaluation, http://www.fns.usda.gov/Ora/menu/Published/NutritionEducation/Files/EvaluationPrinciples.pdf (accessed 31/03/2013).

White, Howard, 2009, Theory-Based Impact Evaluation: Principles and Practice, International Initiative for Impact Evaluation (3ie) Working Paper 3.

White, Howard, 2011, An Introduction to the Use of Randomized Control Trials to Evaluate Development Interventions, International Initiative for Impact Evaluation (3ie) Working Paper 9.

White, Howard and Daniel Phillips, 2012, Addressing Attribution of Cause and Effect in Small n Impact Evaluations: Towards an Integrated Framework, International Initiative for Impact Evaluation (3ie) Working Paper 15.
Winckler Andersen, Ole, Beate Bull and Megan Kennedy-Chouane (eds.), forthcoming 2013, Evaluation Methodologies for Aid in Conflict, London: Routledge.

Wolff, Stefan, 2012, Integration and Conflict Prevention in Diverse Societies: The Ljubljana Recommendations of the OSCE High Commissioner on National Minorities in the Post-Soviet Context, Launching Conference for the HCNM Guidelines on Integration in Diverse Societies, November 7, 2012, http://www.stefanwolff.com/talks/integration-and-conflict-prevention-in-diverse-societies (accessed 19/11/2012).

World Bank, 2012, Impact Evaluation of the Burkina Faso Community Monitoring for Better Health and Education Service Delivery Project, ongoing evaluation presented at a World Bank seminar, July 10, 2012, http://web.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTDEVIMPEVAINI/0,,contentMDK:23238146~menuPK:7637304~pagePK:64168445~piPK:64168309~theSitePK:3998212,00.html (accessed 06/01/2013).

World Bank, 2013, Evaluation Designs (webpage), http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,contentMDK:20188242~menuPK:415130~pagePK:148956~piPK:216618~theSitePK:384329,00.html (accessed 30/03/2013).

World Bank, 2011, World Development Report 2011: Conflict, Security and Development, Washington, DC: World Bank Group.

World Medical Association, 2000, "Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects," JAMA 284: 3043-3045.

World Medical Association, 2001, Note of Clarification on Paragraph 29 of the WMA Declaration of Helsinki, Geneva: World Medical Association, http://www.wma.net/e/home.html.

Annex A: Studies reviewed in the Samii, Brown, and Kulma (2012) paper

Article | Category | Country | Status | IE Type | Counterfactual
1. Annan, J. and C. Blattman (2011) | Ex-Combatant Reintegration | Liberia | Ongoing | RCT | Randomized control group
2. Beath, A. et al. (2010) | Peace Dividends | Afghanistan | Completed | RCT | Randomized control group
3. Blattman, C. (2011a) | Peace Structures | Liberia | Ongoing | RCT | Randomized control group
4. Blattman, C. (2011b) | Victims of War | Uganda | Ongoing | RCT | Delayed treatment control group
5. Blattman, C. (2011c) | Victims of War | Uganda | Ongoing | RCT | Randomized control group
6. Casey, K. (2011) | Peace Dividends | Sierra Leone | Completed | RCT | Randomized control group
7. Fearon, J. et al. (2009) | Peace Dividends | Liberia | Completed | RCT | Randomized group assignment of villages
8. Fearon, J. et al. (2008) | Peace Dividends | Liberia | Completed | RCT | Randomized control group
9. Glennerster, R. and E. Miguel (2010) | Peace Messaging | Sierra Leone | Ongoing | RCT | Randomized control group
10. Paluck, E. and D. Green (2009) | Peace Messaging | Rwanda | Completed | RCT | Clustered random assignment
11. Paluck, E. (2009a) | Peace Messaging | Sudan | Ongoing | RCT | Clustered random assignment with factorial model
12. Paluck, E. (2009b) | Peace Messaging | Rwanda | Completed | RCT | Randomized assignment of clusters with matching
13. Pugel, J. (2007) | Ex-Combatant Reintegration | Liberia | Completed | RCT | Randomized selection of 20-person clusters
14. Paluck, E. (2010) | Peace Messaging | DRC | Completed | RCT | Randomized assignment of clusters with matching
15. Barron, P. et al. (2009) | Peace Dividends | Indonesia | Completed | Quasi-experimental | Matched control group
16. Biton, Y. and G. Solomon (2006) | Consensus & Dialogue | Israel | Completed | Quasi-experimental | Matched-pair randomization of classes in selected schools / natural
17. Gilligan, M. et al. (2010) | Ex-Combatant Reintegration | Burundi | Completed | Quasi-experimental | Natural control group with matching
18. Humphreys, M. and J. Weinstein (2007) | Ex-Combatant Reintegration | Sierra Leone | Completed | Quasi-experimental | Matched control group
19. Kondylis, F. (2007) | Victims of War | Rwanda | Preliminary | Quasi-experimental | Natural control group
20. Levely, I. (2010) | Ex-Combatant Reintegration | Liberia | Completed | Quasi-experimental | Matched control group
21. Malhotra, D. and S. Liyanage (2005) | Consensus & Dialogue | Sri Lanka | Completed | Quasi-experimental | Natural control group
22. Mvukiyehe, E. and C. Samii (2009) | Peace Dividends | Cote d'Ivoire | Preliminary | Quasi-experimental | Natural control group
23. Mvukiyehe, E. and C. Samii (2011) | Community Security Initiatives | Liberia | Preliminary | Quasi-experimental | Matched clusters (communities)
24. Mvukiyehe, E. and C. Samii (2010) | Ex-Combatant Reintegration, Peace Dividends | Liberia | Completed | Quasi-experimental | Cluster matched sampling