IOB’s 17 evaluation quality criteria serve as a comprehensive framework for assessing and monitoring the quality of project and programme-level evaluations commissioned by the Ministry of Foreign Affairs or by its partner organizations. In addition, IOB uses the evaluation quality criteria in its own research. The criteria have been structured chronologically across three different phases in the evaluation process: the Terms of Reference, the elaborated methodology, and the evaluation report.
The document IOB evaluation criteria 2024 (Zip) presents the 17 criteria, accompanied by an explanation of why IOB deems an evaluation practice ‘good’, ‘adequate’, or ‘inadequate’, along with illustrative examples.
This page presents the 17 criteria, grouped by phase.
We recognise that the art and craft of policy evaluation is constantly evolving. We welcome feedback and suggestions via iob@minbuza.nl.
Evaluation criteria in phase 1: Terms of Reference

When formulating the Terms of Reference (ToR), the first subset of criteria (1 – 8) helps shape aspects such as the project description, evaluation objectives, scope, evaluation questions, and quality control.
The ToR should broadly present and assess the intervention theory and assumptions of the project under evaluation as this will provide useful input for formulating the evaluation questions.
The reference group plays a crucial role in guaranteeing the quality and independence of the evaluation process. It serves as an advisory body to the evaluation commissioner, offering guidance on the Terms of Reference (ToR), the selection of evaluators, the detailed methodology section (in the inception report), and the draft evaluation report.
The reference group typically comprises the evaluation commissioner, a representative from an implementing partner organization, and individuals with experience in relevant themes and evaluation methodologies. It should include at least one but preferably two or more independent members. To ensure downward accountability and sufficiently take the context into account, IOB recommends including at least one reference group member from a target country.
The evaluators and affiliated organisations should not have been involved in the design or implementation of the project under evaluation – either with the organisation responsible for implementation or at the MFA. Additionally, they must maintain complete impartiality with no vested interest in the ultimate outcome of the evaluation.
The ToR should present an adequate description of the national, sectoral, and political context in which the project or programme has been implemented, and also mention the rationale and problem analysis of the project. If available, the ToR should also present baseline data on project indicators and other relevant result indicators. Note that benchmarks for assessing progress, such as baseline data, do not have to be exclusively quantitative and can also encompass qualitative data.
The ToR should present a specification of a project’s intervention logic, ‘policy theory’, or Theory of Change (ToC). When there is no intervention logic or ‘policy theory’, the evaluator may need to reconstruct one by using project documentation.
The ToR should describe and illustrate how and why a desired change is expected to happen in a specific context: the description of the ToC should articulate the implicit and explicit assumptions underpinning the relationships in the results chain (e.g. between output and outcome) and may assess the existing evidence base for these assumptions and links. Herein, it is essential to describe ‘the missing middle’ between a project and the desired outcome and impact.
In addition, the (reconstructed) ToC should consider contextual external factors that may, as ‘rival explanations’, influence results at various levels in the results chain (input, activity, output, (intermediate) outcome, impact).
The ToR should clearly delineate the evaluation period, geographical focus, and the specific policy domain (programmes, projects) and expenditure that will be part of the evaluation. It should also specify which particular components of the intervention theory or ToC are to be included in the evaluation.
The ToR should distinctly articulate the evaluation’s objectives and the intended use of its results. It should explain who the intended users of the evaluation results are and for what purpose the results will be used. Typically an evaluation informs policymakers, funders, and (parliament’s) policy and financial oversight organizations, but it could also inform other stakeholders, including implementing agencies and final beneficiaries.
Ensuring clarity in the ToR about the specific objectives can shape the research questions and the type of recommendations required. Evaluations often serve multiple objectives, which can be categorized into:
- Knowledge objectives: These involve understanding whether the stated objectives have been achieved, contributing to accountability. They also seek to uncover lessons about what works, why, and how, thereby contributing to learning.
- Action objectives: These aim at providing recommendations. An action objective can determine when evaluation results are needed, for example for subsequent project phases or formulation of new policy.
The ToR should carefully consider the selection of relevant OECD/DAC Evaluation Criteria and other cross-cutting topics.
Given the specific evaluation objectives (criterion 6) and scope (criterion 5), not all OECD/DAC evaluation criteria (relevance, coherence, effectiveness, efficiency, sustainability, impact) may be applicable. The ToR should include an assessment of which OECD/DAC evaluation criteria will be integrated into the evaluation questions.
The ToR may also incorporate other relevant cross-cutting issues (e.g. gender, localisation, poverty reduction, or climate), accompanied by a rationale.
In order to optimize the applicability of the OECD/DAC criteria ‘relevance’ and ‘coherence’ in evaluation research, the criteria are put in a logical order:
- relevance
- coherence
- effectiveness
- efficiency
- sustainability
- impact
For the ‘relevance’ criterion, two dimensions are central:
- the extent to which policy objectives, design and implementation of policy interventions are in line with the needs of the target group and other stakeholders
- the extent to which the policy intervention is based on prior available, context-specific knowledge about possible effectiveness and the necessary preconditions (evidence-based)
If the ToR identifies ‘impact’ as an evaluation criterion, the evaluation should also assess unintended effects. This includes effects on the non-target population and the broader effects of the intervention on society, rather than only assessing the stated project objectives.
The ToR should include a main research question and a number of evaluation questions in a precise and testable manner, aligning with the evaluation objective (criterion 6), the scope of the evaluation (criterion 5), and the relevant OECD/DAC evaluation criteria and cross-cutting topics (criterion 7).
Formulating overly broad or vague evaluation questions or mechanically translating the OECD/DAC criteria into questions should be avoided. At the same time, one should also steer clear of crafting too many detailed evaluation questions, which could dilute the focus of the evaluation. The evaluation questions should strike a balance between being ‘realistically ambitious’ and respecting the practical limitations of the evaluation, including constraints related to time, travel possibilities, budget, and the availability of information.
Evaluation criteria in phase 2: Elaborated methodology

The second subset of criteria (9 – 14) helps assess the elaborated methodology, as outlined by the evaluator, e.g. in an ‘inception report’ or ‘technical proposal’. Based on the assessment of these criteria, a commissioner may ask evaluators to adjust the methodology.
This phase also includes a re-assessment of the first subset of criteria, to verify that the elaborated methodology is in line with the ToR.
The evaluation’s research design structures the overall approach and methods. The approach and methods must be appropriate to answer the evaluation questions and achieve the evaluation’s objective in a valid and reliable manner. As the purpose of and possibilities for evaluation projects will differ, the research design may encompass multiple evaluation methods: how you evaluate depends on what you evaluate.
Evaluation robustness may be increased using multiple methods or data sources – a strategy known as ‘triangulation’.
Triangulation is a strategy to enhance the validity and reliability of the findings by cross-verifying information from different perspectives, in order to provide a more comprehensive and accurate understanding of what is being evaluated. Triangulation can refer to the use of multiple research methods, multiple data collection methods (e.g. interviews, surveys, observations), or multiple data sources (e.g. persons, documents, project sites).
The ‘inception report’ - or ‘technical proposal’ - should elaborate on the research design, highlighting why the method or combination of methods has been selected and how it is expected to validly and reliably contribute to answering the research questions.
The selected evaluation method(s) should be appropriate to assess the contribution or attribution of the projects or interventions to observed results at the outcome or impact levels. Evaluators may use qualitative evaluation method(s) to evaluate the degree to which causal claims of projects or interventions about results, effects, and outcomes are plausible.
In their paper, White and Phillips (2012) distinguish four qualitative evaluation methods that are suitable for substantiating claims about effectiveness (realist evaluation, contribution analysis, process tracing, and general elimination methodology) and four methods that are less suitable for substantiating claims about effectiveness (most significant change, success case method, outcome mapping, and method for impact). In recent years, outcome harvesting, which shares characteristics with outcome mapping, has also gained popularity amongst evaluators, especially for evaluating lobby and advocacy related activities.
Qualitative evaluation methods that are able to substantiate causal claims about effectiveness should generally follow these five steps:
- Formulate the cause-effect contribution question.
- Construct or reconstruct an intervention theory, including the assumptions.
- Formulate alternative theories and explanations for the observed changes.
- Collect data along results areas in the intervention theory, and for the alternative theories, including data from stakeholders that have not been directly involved in the project.
- Verify in a step-by-step manner the causal chains of the intervention theory for the full range of possible outcomes (including achieved results, intended results that have not been achieved, and unintended effects), and the alternative theories.
Research designs may include participatory evaluation methods, which can deepen the understanding of specific mechanisms, improve the evaluator’s contextual understanding, and facilitate downward accountability.
In order to answer evaluation questions about effectiveness it is important that, combined, the methods align with the five steps outlined above to validly and reliably answer the evaluation questions. Therefore, (single) participatory evaluation methods should be complemented with other evaluation methods. Furthermore, the methods should assess the full range of possible outcomes, including achieved results, intended results that have not been achieved, and unintended effects.
Evaluators can use quantitative evaluation method(s) to robustly substantiate causal claims about effects of projects or interventions. The Maryland scientific methods scale delineates five increasing levels of rigour:
- Level 1: a single observation moment, after the project: a comparison with and without the project
- Level 2: two observation moments: a comparison before and after the project, without a control group
- Level 3: two observation moments, double difference: comparing before and after, and with and without the project
- Level 4: two observation moments, double difference, semi-experimental design: comparing before and after, and with and without the project, while accounting for other external influences
- Level 5: two observation moments, double difference, randomized control group, experimental design: comparing before and after, and with and without the project, with participants randomly assigned to the project
Although level 5 is best suited for attributing results to a specific project or intervention, that level of rigour is not always feasible for evaluations commissioned by the Netherlands Ministry of Foreign Affairs or its partner organisations. Alternatively, quantitative evaluations at level 4 rigour are generally accepted as sufficiently robust to support causal claims.
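To make the ‘double difference’ logic concrete, the sketch below works through a minimal, hypothetical example in Python; all figures and variable names are invented for illustration and do not come from an actual evaluation.

```python
# Hypothetical example: average monthly income (in euros) of young people in a
# youth employment project, measured before and after the project period.
treatment_before, treatment_after = 210.0, 290.0  # project participants
control_before, control_after = 205.0, 240.0      # comparable non-participants

# Single difference (level 2): change within the project group only;
# it ignores what would have happened without the project.
single_difference = treatment_after - treatment_before  # 80.0

# Double difference (levels 3-5): change in the project group minus the change
# in the control group, filtering out the common trend.
double_difference = (treatment_after - treatment_before) - (
    control_after - control_before
)  # 45.0

print(f"Single difference: {single_difference:.1f} euros")
print(f"Double difference: {double_difference:.1f} euros")
```

Levels 4 and 5 strengthen this same comparison by, respectively, accounting for other external influences and assigning participants to the project at random.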
Evaluations below level 4 can be assessed as ‘adequate’ if the following three assumptions are valid (see the assessment grid in the publication; a minimal check of the first assumption is sketched after this list):
- Participants in the intervention group had similar values on the dependent variable X as those in the control group at baseline.
- Participants in the control group exhibit similar relevant characteristics as those in the intervention group.
- In the absence of the project, the dependent variable X would not change: there are no other factors that influence X.
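For example, the first assumption can be examined by comparing baseline values of the dependent variable between the two groups. The sketch below is a minimal illustration using a standard two-sample t-test; the data are hypothetical.

```python
from scipy.stats import ttest_ind

# Hypothetical baseline values of the dependent variable X (e.g. monthly income)
intervention_baseline = [212, 198, 225, 190, 205, 231, 188, 219]
control_baseline = [208, 201, 215, 196, 210, 222, 192, 207]

# Two-sample t-test on the baseline values. A large p-value gives no evidence of
# a baseline difference between the groups (it does not prove equivalence).
t_stat, p_value = ttest_ind(intervention_baseline, control_baseline)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```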
Formulating indicators or describing result areas at small intervals in the results chain can make an effect claim from activities to results more plausible. The ‘technical proposal’ or ‘inception report’ should propose valid quantitative indicators, or qualitatively describe the result areas as concretely as possible, so that the evaluation accurately measures what it claims to capture.
For quantitative evaluation methods, the indicators should be appropriate for assessing the intended results, and should adhere to the SMART criteria:
- Specific: Indicators should be clear and specific, leaving no room for ambiguity. They should be linked to different levels in the ToC or results chain.
- Measurable: Indicators should be quantifiable, allowing for measuring progress.
- Attainable: Objectives and corresponding indicators should be attainable, making it possible to capture changes during the evaluation.
- Relevant: The indicators should align with the objectives of the evaluated project.
- Time-bound: There should be a defined timeframe, making it clear when the measurement takes place.
Evaluations can include multiple sampling strategies and/or case selection strategies to serve different evaluation objectives or answer different evaluation questions.
Sampling is the process of selecting a subset of individuals or units from a larger population. The distinction between probability sampling and non-probability sampling is important.
Probability sampling, e.g. random sampling, stratified sampling, or systematic sampling, ensures that each member of the population has a known chance of being selected. The objective is to create a representative subsample which allows the evaluators to generalise findings from the sample to the broader population and prevent selection bias.
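As an illustration of probability sampling, the sketch below draws a proportional stratified random sample using only the Python standard library; the strata, population sizes, and sample size are hypothetical.

```python
import random

# Hypothetical sampling frame: project participants grouped by region (the strata).
population = {
    "region_north": [f"north_{i}" for i in range(400)],
    "region_south": [f"south_{i}" for i in range(100)],
}
total = sum(len(members) for members in population.values())
sample_size = 50

random.seed(42)  # fixed seed so the sample can be reproduced and scrutinised
sample = []
for stratum, members in population.items():
    # Each stratum contributes in proportion to its share of the population,
    # so every member has a known, non-zero chance of being selected.
    n_stratum = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, n_stratum))

print(f"Sampled {len(sample)} of {total} participants across {len(population)} strata")
```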
When the chosen sample in an evaluation does not accurately represent the entire target population, this is referred to as selection bias. It occurs, for example, when participants in an evaluation of a youth employment project self-select. More successful or more entrepreneurial young people may be more likely to voluntarily participate in an evaluation and, as a result, the findings might not be representative of all young people.
Non-probability sampling, e.g. convenience sampling, purposive sampling, or snowball sampling, does not ensure that every member of the population has a known chance of being selected. Especially in qualitative research, evaluators apply non-probability sampling methods to study specific phenomena, without the objective of generalising results to the broader population.
In quantitative research, the sample size should be calculated prior to data collection, ideally on the basis of a power calculation.
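As an illustration, the sketch below performs such a power calculation for a two-sample comparison of means, using the standard normal-approximation formula; the effect size, significance level, and desired power are hypothetical choices, not IOB requirements.

```python
from math import ceil
from scipy.stats import norm

effect_size = 0.3  # hypothetical standardised difference (Cohen's d) to be detected
alpha = 0.05       # two-sided significance level
power = 0.80       # desired probability of detecting the effect if it exists

# Normal-approximation sample size per group for a two-sample comparison of means:
# n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size) ** 2
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)
n_per_group = ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(f"Required sample size: about {n_per_group} per group")  # roughly 175 per group
```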
Case selection is choosing specific cases for in-depth study based on their relevance to the objective of the evaluation, and the evaluation questions. The main goal of case selection is to gain a deep, contextual understanding of a particular case or particular cases, which may provide valuable insights. Depending on the objectives, evaluation questions and analysis strategy, evaluators may select typical cases, extreme cases, critical cases, or comparative cases.
When sampling and selecting cases, evaluators should:
- Formulate sample or case selection criteria - the set of characteristics that must be present - independently of the actively involved stakeholders.
- Be transparent about the sample selection criteria applied, e.g. by presenting a list of all potential cases, interventions or countries and their scores on the selection criteria.
The ‘technical proposal’ or ‘inception report’ should present which information sources will be included in a way that minimizes source bias, by using sufficient and independent information sources.
Source bias occurs when the chosen information sources in an evaluation are not well balanced, with certain sources emphasized over others. This can lead to an unrepresentative selection of sources and skewed findings, affecting an evaluation’s validity.
Collecting information from stakeholders that were not directly involved with the project can provide important information about external factors, assumptions, or what happens in the absence of an intervention. In addition to information sources obtained from direct stakeholders, such as project staff or the targeted population, the evaluator should also include other information sources, such as:
- the non-targeted population
- informed but not directly engaged stakeholders
- the evaluator’s own direct observations
- validated secondary data sources
Because it may be difficult to predict the amount of data needed to reach the point of saturation, it is important that the evaluators have sufficient flexibility - time and resources - to add more information sources if needed.
The 'inception report' or 'technical proposal' should clearly describe the limitations in the reliability and validity (both internal and external) of the proposed methodology, as well as any potential biases, including selection bias. Note that this criterion does not assess the limitations or bias itself, but rather the acknowledgement thereof. Criterion 17 assesses how the implications of bias are taken into account in formulating conclusions.
Reliability is the degree to which an evaluation would consistently produce the same results under similar conditions. It involves the extent to which the evaluation, including data collection and analysis, can be replicated or repeated to yield similar results.
Internal validity refers to the trustworthiness and credibility of the findings within the specific context. It involves the extent to which the evaluation accurately describes relationships between the results and the intervention, minimizing the effects of other relevant factors. Various forms of respondent bias, for example, can affect the internal validity of an evaluation.
External validity, also known as generalizability, is the extent to which the results of a study can be generalized or applied to settings, populations, and conditions beyond the specific evaluation. Selection bias, for example, may affect the external validity of an evaluation – see criterion 12.
Evaluation criteria in phase 3: Draft and final report

The third subset of criteria (15 – 17) is designed to assess the quality of the draft and final report. These criteria focus on transparent reporting and on the conclusions and recommendations formulated.
During this stage of assessing evaluation quality, it is no longer possible to adjust the data collection as applied, but it remains possible to improve descriptions, enhance the analysis, and to reformulate conclusions.
This phase includes a re-assessment of the second subset and of five criteria from the first subset (1 – 5).
Describing the analysis in a transparent manner allows for scrutiny, replicability, and a good understanding of the entire research process. The evaluation report should therefore present all research methods, data collection methods, data sources, and data analysis techniques employed in a systematic, transparent and complete yet accessible manner, either in the main report or in appendices to the report.
In case different research methods, data collection methods, or data sources provide contradictory findings, the evaluator should be transparent about the differences and the assessment thereof. Transparently presenting the range of evaluation findings prevents cherry-picking of certain results or the selective presentation of findings to fit a specific narrative. The report should explain how the evaluator has weighted and combined findings from sources and methods to come to an overall judgement and conclusion.
The draft and final report should provide an answer to all evaluation questions. Although the main and sub-conclusions may not be structured according to the evaluation questions, in principle they must all be answered in the text in a recognizable way. If evaluators faced unforeseen limitations during the evaluation process that prevented them from answering all evaluation questions, the report must at a minimum highlight which evaluation questions it did not answer and explain the reasons for this.
There should be a clear and rational connection in the evaluation report from findings to conclusions, and from the conclusions to the recommendations.
Each conclusion should be supported by specific findings. It is important that the final report takes possible limitations and bias (criterion 14) sufficiently into account. For example, if the authors state that the evaluation will have limited external validity, the report’s conclusions should not be generalized beyond the selected cases.
Recommendations, in turn, must logically follow from the conclusions presented in the evaluation report. If the ToR identified an action objective (criterion 6), the recommendations should be in line with the stated evaluation objective.