For a Randomised Controlled Trial (RCT), several elements are necessary. Evaluators need to be involved long before the trial ends, ideally from its conception. Randomisation must take place. The operationalisation and measurement of outcomes must be defined. The data collection and the data analysis must be performed rigorously. Randomisation and the data collection process are what set RCTs apart from other experimental designs.
To run an RCT, partners are needed. Firms and non-governmental organisations (NGOs) are frequent partners, since they benefit from having their work evaluated. Governments are still rare partners, but the number of government-sponsored RCTs is increasing. The programme under evaluation can be either an actual programme (allowing only simple impact evaluation) or a pilot programme (where the impact evaluation can become a field experiment).
The randomisation needs to be chosen carefully. Usually, access, the timing of access, or encouragement to participate is randomised. The optimal test would randomise access directly, but ethical concerns may make that impossible. Access can be relaxed by introducing the treatment in waves or by encouraging part of the population to take up the treatment (and measuring those who did not take it up as “non-accessed”).
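As an illustration (not from the source), wave-based (phase-in) randomisation can be sketched in a few lines of Python. The function name `assign_waves`, the integer unit IDs, and the fixed seed are assumptions made for the example:

```python
import random

def assign_waves(unit_ids, n_waves, seed=42):
    """Randomly order the units and assign them to phase-in waves.

    Every unit is eventually treated; only the (random) timing of
    access differs, which is what makes later waves a valid
    comparison group for earlier ones.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = unit_ids[:]
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin into waves 1..n_waves.
    return {unit: i % n_waves + 1 for i, unit in enumerate(shuffled)}

waves = assign_waves(list(range(100)), n_waves=4)
```

With 100 units and 4 waves, the round-robin split yields 25 units per wave; the shuffle ensures which units land in which wave is random.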
A randomised trial can be run in many circumstances, for instance:
- New program design,
- New program,
- New services,
- New people,
- New location,
- Over- or under-subscription of existing programs,
- Rotation of program benefits or burdens,
- Admission cutoffs, and
- Admission in phases.
The choice of the randomisation level is another important parameter. Often the type of treatment or randomisation opportunities determine the randomisation level. However, the best choice usually would be the individual level. If the level can be picked, there are still several considerations that need to be made when determining the level:
- Unit of measurement (experiment constraint),
- Spillovers (interaction between observed units),
- Attrition (loss of units over the course of the observation),
- Compliance (will units adhere to their assigned treatment),
- Statistical power (number of units available), and
- Feasibility (can the unit be observed (cost-)effectively).
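The statistical-power consideration can be made concrete with the standard normal-approximation formula for the sample size per arm when comparing two means, n = 2((z₁₋α/₂ + z₁₋β)·σ/δ)². This sketch is not from the source; the function name `n_per_arm` is my own, and only the Python standard library is used:

```python
from statistics import NormalDist

def n_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-sided, two-sample
    comparison of means under a normal approximation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, ~1.96 for alpha=0.05
    z_b = NormalDist().inv_cdf(power)          # power quantile, ~0.84 for 80% power
    return 2 * ((z_a + z_b) * sd / effect) ** 2

# Detecting a 0.2-standard-deviation effect at 80% power needs
# roughly 390-400 units per arm under these assumptions.
n = n_per_arm(effect=0.2, sd=1.0)
```

The formula makes the trade-off visible: halving the detectable effect size quadruples the required number of units, which is why the randomisation level (individual vs. cluster) matters so much for feasibility.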
Often a cross-cutting (factorial) design is used, in which several treatments are distributed across the units such that all combinations are observed. This allows both the individual treatments and the interactions between treatments to be assessed.
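A cross-cutting assignment can be sketched as follows (an illustration, not from the source; the treatment names and the function `factorial_assign` are assumptions). Each unit receives one of the 2^k combinations of k binary treatments:

```python
import itertools
import random

def factorial_assign(unit_ids, treatments, seed=1):
    """Assign units to all combinations of binary treatments.

    With treatments = ["training", "cash"] there are four cells:
    neither, training only, cash only, and both.
    """
    cells = list(itertools.product([0, 1], repeat=len(treatments)))
    rng = random.Random(seed)
    shuffled = unit_ids[:]
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin across the cells so every
    # combination is observed with (roughly) equal frequency.
    return {u: dict(zip(treatments, cells[i % len(cells)]))
            for i, u in enumerate(shuffled)}

assignment = factorial_assign(list(range(80)), ["training", "cash"])
```

Because all four cells are filled, one can estimate each treatment's effect on its own as well as whether the two treatments reinforce or undercut each other.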
The data collection process can be described in three steps.
- Baseline measurement (verifies that the randomisation worked and allows the bias from non-compliance and attrition to be assessed),
- Midstream measurement (only in long-term projects), and
- Endline measurement (in combination with the baseline measurement, allows unit fixed effects to be estimated via difference-in-differences estimation).
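The difference-in-differences idea behind combining baseline and endline measurements can be shown with a toy computation (an illustration with made-up numbers, not data from the source):

```python
def diff_in_diff(treat_base, treat_end, ctrl_base, ctrl_end):
    """Difference-in-differences estimate: the change in the treatment
    group minus the change in the control group. Differencing within
    each group nets out unit fixed effects; differencing across groups
    nets out the common time trend."""
    mean = lambda xs: sum(xs) / len(xs)
    return ((mean(treat_end) - mean(treat_base))
            - (mean(ctrl_end) - mean(ctrl_base)))

# Toy data: the treatment group improves by 5 on average, the control
# group by 2, so the estimated treatment effect is 3.
effect = diff_in_diff(treat_base=[10, 12], treat_end=[15, 17],
                      ctrl_base=[10, 12], ctrl_end=[12, 14])
```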
Threats to RCTs
The four main threats to RCTs are partial compliance, attrition, spillovers and evaluation-driven effects.
Partial compliance can have several causes: implementation staff may depart from the allocation or treatment procedures; units in the treatment group may go untreated, or units in the control group may be treated; units in the treatment group may receive an incomplete treatment; and units may exhibit the opposite of compliance (so-called defiers).
Attrition may occur for systematic reasons; however, the reasons for dropping out often cannot be measured (or respondents refuse to give them).
Spillovers may occur on different levels: physical, behavioural, informational, general-equilibrium (market-wide) effects (i.e. long-term system-wide effects).
Evaluation-driven effects have been observed. The most important ones include:
- Hawthorne effect (the treatment group changes behaviour because it is being observed; to counter the effect, an outcome that participants cannot consciously change should be measured [alternatively, not telling participants that they are being observed would help, but is often unethical]);
- John Henry effect (the control group changes behaviour due to believing being disadvantaged and trying to compensate);
- resentment and demoralisation effect (selection into treatment and control changes behaviour);
- demand effect (participants want to produce the result required by the observer or impress the observer);
- anticipation effect (the psychological state of participants influences their performance [e.g. if they expect to be good at something, they score better]); and
- survey effect (the framing and order of tasks/questions will influence the response).
RCTs are seen as the gold standard when it comes to impact evaluation, but they are no panacea. Designing a rigorous impact evaluation requires innovative thinking and substantive knowledge of the program and policy area.
Funders in the U.S. and Great Britain have increasingly begun to ask for RCT evaluations of programmes, especially in certain domestic policy areas (e.g. education) and in development. Continental Europe is still somewhat lagging in this respect.