Original Article

Organisational analytical capacity: Policy evaluation in Belgium*


Abstract

Checklists for evaluation capacity builders include a wide range of building blocks for supporting evaluation activity. Yet, the relative importance of each building block is not clear. The purpose of this article is to identify the capacity related factors that are necessary, but not necessarily sufficient, for organisations that wish to institutionalise high quality policy evaluations. To retrieve these factors, we rely on the necessity function in qualitative comparative analysis. We present a study of twenty-seven organisations of the Flemish public sector (Belgium), where the introduction of policy evaluations is relatively recent. Our case analysis thus sheds an interesting light upon how policy evaluation, and the underlying capacity to evaluate, are given shape. Our findings point to evaluation demand as the most necessary prerequisite for fostering evaluation activity, more so than supply related factors.

1 Introduction

In recent decades, evaluation has been given a prominent position in government modernisation reforms worldwide (Brans & Vancoppenolle, 2005) as an essential analytical tool for professional policy-making. Yet, despite a general consensus on the intrinsic value of policy evaluations for good government, organisations vary widely in the extent to which they are active in evaluation. Checklists for evaluation capacity builders typically include a wide range of building blocks that are assumed to contribute to evaluation activity. At present, however, it is not clear which of these capacity building conditions are strictly necessary. How, then, should governments know where to prioritise their efforts if they seek to promote evaluation activity? This is the question we attempt to answer in this article. We rely on the necessity function in qualitative comparative analysis (QCA) (Ragin, 1987, 2000, 2008) to investigate which capacity related conditions enable the presence or absence of evaluation activity. We agree with Ragin (2003: 179) that "Necessary conditions are central to social theory, research design and more generally, the process of coaxing generalizations from empirical evidence".

We distinguish between three dimensions of evaluation activity: the conduct of evaluations, the regularity of evaluations, and the quality of evaluations. For each of these dimensions, we conduct a necessity analysis. Twenty-seven organisations of the Flemish public sector (Belgium) constitute our cases. In international comparison, Flanders belongs to the second wave of countries (regions), where policy evaluation was only gradually introduced by the start of the new millennium (Varone, Jacob, & De Winter, 2005; Pattyn, 2014a). In Flanders, policy evaluation typically acquired a place on the agenda along with the introduction of a broader modernisation reform agenda. In 2006, the Flemish administration implemented large-scale NPM inspired changes, coined Beter Bestuurlijk Beleid (Vlaams Parlement, 2003). For evaluation, this meant that departments were officially put in charge of evaluating implemented policies (instruments used, outputs and outcomes), while the implementing agencies were to provide the input for evaluations. With policy evaluations only recently introduced in the Flemish public sector, our cases shed an interesting light upon how policy evaluation, and the underlying capacity to evaluate, are given shape. We trust our findings offer a source of inspiration for governments that seek to strengthen their evaluation capacity.

In the first section of the article, we discuss the relevance of the study and present the capacity building blocks included in the research. Next, we discuss the research strategy and explain the potential and characteristics of the analysis of necessity. In the third section, we address the actual puzzle of the article and present the necessary conditions for each of the three dimensions of evaluation activity: the incidence, regularity and quality of evaluations. We conclude the article with a recapitulation and a discussion of the policy implications of our findings.

2 Evaluation capacity for evaluation activity

2.1 Relevance

For our analysis, we rely on one of the most cited definitions of policy evaluation: "Policy evaluation is a scientific analysis of a certain policy (or part of a policy), aimed at determining its merit or worth on the basis of certain criteria" (Scriven, 1991: 139). We distinguish policy evaluation from monitoring. Admittedly, monitoring systems can be useful in answering descriptive evaluation questions (i.e. What-questions). But they fall short when it comes to Why- and How-questions, which are precisely the type of questions that constitute the largest share of policy evaluations. We also draw upon Painter and Pierre (2005: 5), who denoted evaluation as a supporting system for policy capacity, particularly when evaluation helps further values of coherence, public regardingness, credibility, decisiveness, and resoluteness. Although policy evaluation is pivotal in supporting policy capacity, it is also relevant for two other governing capacities: administrative capacity and state capacity. Indeed, to the extent that policy evaluations take into account criteria such as economy, efficiency, responsibility, probity or equity, they contribute to administrative capacity. Policy evaluations can support state capacity too, when they are called upon to assess the legitimacy, accountability, compliance or consent of a policy.

Our comprehensive notion of evaluation activity makes a distinction between (a) the mere incidence of evaluation activity, (b) the regularity of evaluation activity and (c) the quality of evaluation activity. These three dimensions return as key indicators for the success of evaluation capacity building (ECB). As highlighted by one of the most well-known ECB definitions: "ECB is a context-dependent, intentional action system of guided processes and practices for bringing about and sustaining a state of affairs in which quality program evaluation and its appropriate uses are ordinary and ongoing practices (…)" (Stockdill, Baizerman, & Compton, 2002: 8). Ultimately, ECB strives for regular and high quality evaluation activity.

2.2 Evaluation capacity: building blocks

In this contribution, we aim to identify the necessary conditions for evaluation activity, with its three dimensions. Rather than generating one list of potentially necessary conditions per dimension, we composed a single list for all dimensions together. We combined two sources. The first source was the ECB literature. We conducted a wide literature screening of top journals in the evaluation field (examples: Evaluation, New Directions for Evaluation), books and book chapters by renowned evaluation scholars, reports from international institutions with well-known evaluation units (European Commission; World Bank; United States General Accounting Office), and evaluation checklists (examples: Stufflebeam, 2002; Volkov and King, 2007). Substantial input for our list of conditions was provided by an earlier exercise in which we attempted to identify the building blocks behind the constructs of evaluation capacity and evaluation culture (De Peuter & Pattyn, 2009). Given the conceptual complexities associated with an analysis of evaluation capacity and evaluation activity, other related constructs, such as evaluation maturity or evaluation culture, were also taken into account. Most of the sources relate to a diversity of empirical contexts. We did not exclude any geographical context, though we omitted factors that are only relevant at a national or country level of analysis. These national level factors have no explanatory value, since they are kept constant in our research. Following the principle of saturation, we halted the literature review when no new potential conditions were found. The literature review was then complemented with insights we collected during an explorative quick scan with nineteen policy workers from the thirteen departments of the Flemish public sector. This scan gave us a preliminary picture of the state of evaluation activity in Flanders. In the quick scan, respondents were asked what could potentially impact upon evaluation activity. We added their suggestions to the list of conditions that we retrieved from the evaluation literature. In this paper, we analyse the necessity status of twenty-two conditions, categorised in four groups. The list of conditions is included in Appendix A (column 1).

Evaluation supply. If policy capacity is about creating, storing and marshalling the necessary resources to make intelligent collective choices (Painter & Pierre, 2005), evaluation capacity is about mobilising such resources for evaluation activity. Evaluators come from a variety of disciplines and conduct evaluations in a mixture of policy settings with different needs and concerns. Evaluators also differ substantially in their approach to evaluation. Nonetheless, there is overall consensus with regard to the importance of the following resource "stocks" (Painter & Pierre, 2005: 4) for evaluation activity: the availability of (i) an evaluation budget (Schwartz, 1998), (ii) staff with expertise in evaluation (Mackay, 2002), (iii) capable external evaluators (European Commission, 2013), (iv) the presence of an evaluation unit (Balthasar, 2010), (v) the availability of monitoring information (Mackay, 2002), and of evaluation skills. As to what should count as evaluation skills, the evaluation field is inconclusive. Depending on the source and context (McDonald, Rogers, & Kefford, 2003), skills may primarily mean the ability to systematically collect and analyse data on program results (US GAO, 2003), or the deployment of a range of qualitative and quantitative methods (Stufflebeam, 2002). In this article, we focused on the possession of (vi) organisational skills to conduct in-house evaluations, and the possession of (vii) skills to outsource evaluations.

Evaluation demand. Various political and administrative actors can request evaluations. In our research, we study the necessity of evaluation demand coming from the (viii) sector minister, (ix) organisational management, (x) parliament, and (xi) other public sector organisations from the same policy domain. To avoid a misleading focus on government alone as the source of policy capacity (Painter & Pierre, 2005: 7), our research also considered (xii) evaluation demand coming from civil society organisations, which we moreover consider particularly relevant in the consociationalist context of Belgium. We also took into account the role of (xiii) evaluation requirements stipulated in legislative documents, coming from the Flemish government or from the European Union. Such evaluation requirements, usually present in the framework of funds or subsidies, have played a major role in the diffusion of policy evaluation across a wide range of countries (Furubo & Sandahl, 2002). The European Structural Funds program, for which specific evaluation manuals were developed, is a well-known example (Stame, 2003). A last demand related condition that we analyse is (xiv) in-house support for evaluations (Schwartz, 1998). When demand for policy evaluation is not shared by the staff members of the administration, the actual implementation of evaluations can be difficult.

Nature of the organisation. What an organisation looks like, and the tasks it deals with, form another category that likely determines whether the development of evaluation activity is considered preferable and feasible. In Schwartz's (1998) comparative study of programs, (xv) size was the only statistically significant factor with a positive correlation with the incidence of evaluation in programs. Big organisations can be assumed to have more resources at their disposal to develop evaluation capacity. Especially in the context of the NPM oriented reforms that the Flemish public sector implemented, the (xvi) status of an organisation is also a useful factor to explore. As part of the reforms, the evaluation function was formally assigned to the departments. The question now arises to what extent this discourse is implemented in reality. Have departments and agencies taken up their prescribed roles? And what impact does the status of the organisation (department/agency) have on its current evaluation activity?

Next is a series of conditions that concern the tasks of the organisation. Policies may differ in the extent of (xvii) issue salience they have in the media or parliament. With policy evaluation being increasingly treated as an accountability instrument (De Peuter, Pattyn, & Brans, 2008), salience can be assumed to influence the perceived need for evaluation activity (Haarich & del Castillo Hermosa, 2004). A final set of conditions, associated with the type of issues the organisation is faced with, relates to the measurability of its (xviii) outputs and (xix) outcomes. With Wilson (1989: 159), we conceive of these conditions as crucial characteristics that reveal a lot about the type of organisation. The link with policy evaluation is clear. Measurement is one of the core characteristics in our conceptualisation of evaluation. The ease with which an organisation can measure its outputs and outcomes can be assumed to have a major influence on its evaluation activity. Whereas Wilson speaks in relatively radical terms of complete observability or non-observability, we would rather emphasise the costs of observability (costs understood not only in financial terms, but also as investments in staff, for instance) (Schantz, 2000). When outputs/outcomes are not very costly to observe, an important barrier is removed, making it easier to proceed to the conduct of evaluations.

Path of the organisation. A last category, and one that features heavily in historical neo-institutionalist approaches, is the path of the organisation. Conducting evaluations is a complex undertaking and evaluation capacity will only develop gradually. We investigate whether (xx) pre-NPM reform evaluation experience is necessary for evaluation activity. Policy evaluation is also about change. With Schwartz (1998), we speculate that a certain level of (xxi) organisational stability and (xxii) ministerial stability fosters a readiness for evaluations, and for the implementation of the policy recommendations that follow from them.

3 Research strategy

3.1 Triangulating sources

Research about evaluation activity is complicated by several factors. A first difficulty springs from the conceptual inconsistency that characterises the evaluation landscape (King, 2003). When interviewing policy workers, it is far from easy to cope with the different notions they hold of evaluation. A second difficulty derives from the choice to examine a large number of conditions and to operationalise them all in an objective way. Objectively measuring each of the supply conditions, however, would require lengthy studies in their own right. For most conditions, we therefore take perceptions as proxies of the objective reality. With scholars such as Scharpf (1997), we tend to believe that people do not act on the basis of objective reality, but on the way they subjectively perceive this reality. To overcome these difficulties and to get a reliable picture of each organisation, we extensively triangulated sources. The research proceeded in five steps. First, as already mentioned, we conducted a quick scan that consisted of semi-structured interviews with policy workers of the various departments. In the actual analysis, we proceeded with all departments active in vertical policy domains. Secondly, we conducted a pilot study in the domain of education and training, during which we interviewed the management of all agencies in this field (n = 4), as well as the head of the Education Advisory Council. Thirdly, we organised a series of interviews with a large range of agencies (n = 17); examples are the Flemish Energy Agency and the Flemish Transport Company. We strived to include agencies with and without evaluation practice; the quick scan provided the relevant indications for this. In order to obtain as much variation as possible on the conditions, we deliberately mixed organisations with various legal statutes, task contents, sizes, degrees of autonomy, etc. Like the departments, all are active in vertical policy domains. Organisations that were too drastically restructured following the NPM inspired reforms were not kept in the analysis. We also omitted agencies such as hospitals and psychiatric clinics, as well as external autonomous entities of a private character, and autonomous entities that fell outside the NPM reform framework. To identify respondents, we contacted the management of the agencies, asking them who was best placed to provide us with insights about the state of policy evaluation activity in the organisation. Fourthly, after the completion of the interviews, we sent out a written survey to the same respondents. The main purpose of this survey was to corroborate the research findings collected during the interviews, as well as to collect evidence on a more systematic basis. Importantly, only organisations that participated in both the interviews and the survey were kept in the analysis. Fifthly, for each sector minister involved, we interviewed at least one of his/her personal advisors. These interviews gave us a complementary check on our data. Finally, throughout the research, we conducted a document analysis, one purpose of which was to verify whether the reported evaluation studies were consistent with our definition.

Twenty-seven organisations (twenty-nine analytical cases) were ultimately included in the research: nine (vertical) departments and eighteen agencies (Fobé, Brans, Vancoppenolle, & Van Damme, 2013). Our study covers 41.5% of all entities of the Flemish public sector that were operational at the time of our data collection and that function within the NPM reform framework. Two departments combine policy fields that are very diverse in nature; these departments were each split into two analytical cases.

For the various dimensions of evaluation activity we used indicators that are relatively straightforward (see Table 1). In our operationalisation of evaluation quality, we focused on the organisational intention to deliver high quality reports, rather than on the actual quality. We focused on a set of quality criteria that, we assume, are always important, regardless of the context and purpose of the evaluation, and irrespective of the evaluation model (on this issue, see Pattyn, 2014b). Generally speaking, policy evaluation in Flanders, as Table 1 clearly shows, is still at an early stage of development.

Table 1 Evaluation activity in the Flemish cases, for the three dimensions.

3.2 Boolean approach

Policy evaluation activity is determined by a multitude of conditions. The question now is: which of these conditions are necessary for the incidence, regularity and quality of policy evaluations? The concept of necessity, in addition to the concept of sufficiency, has been a useful tool to cope with causal complexity (Ledermann, 2012). To be clear, the present article exclusively focuses on identifying the individual conditions that are necessary for one of the dimensions of evaluation activity, without considering whether these conditions are also sufficient. In qualitative comparative analysis (QCA), the approach that we use, the analysis of necessity usually paves the way for an analysis of sufficiency, in which combinations of conditions are the main focus. Yet, with Ledermann (2012: 166), we argue that: "with regard to complex social phenomena, such as evaluation use [and evaluation activity, our addition], where so many factors might be relevant, context-bound necessity claims seem more adequate than sufficiency claims. This is why necessity actually deserves an analysis in its own right".

For the purposes of our research, and compatible with the original crisp set version of QCA, we translated all conditions into Boolean scores (0/1). Although this translation involves a certain loss of information, we strongly believe that the reliability of the research has benefited from this strategy. Various respondents of a single organisation sometimes differed in opinion about the degree of availability of a particular condition, but agreed on a certain pattern. Sometimes we also observed inconsistency in nuances between the interview and survey responses of a single respondent, whereas the overall pattern was consistent. A certain simplification of the data helped us to rectify these inconsistencies. A Boolean translation of the data is interesting for practitioner purposes as well. Policy makers typically think of actions to take or not to take, and can thus be expected to respond to such binary messages (Blackman, 2013). To summarise: our approach was to proceed with broader trends that we could measure in a reliable and robust way, rather than with fine-grained nuances about which concerns can be raised in terms of intra-organisational and inter-source reliability.

Earlier, we described how we operationalised the dimensions of evaluation activity. As for the conditions, the information collected via the survey constituted our main reference point to calibrate the data into 1 and 0 scores. In addition, this information was compared with the data collected via the interviews and retrieved via documents. This approach gave us the opportunity to construct a summative picture per condition. It also helped us to uncover certain inconsistencies between the sources and/or the different respondents for a particular case, and thus indicated where clarification was necessary. Appendix A lists how we measured each condition in a Boolean way. For instance, when there is no or hardly any evaluation demand coming from the sector minister, we conceived this condition as absent (code: 0). Inversely, when there is occasional or frequent ministerial demand for evaluations, we coded this condition as present (code: 1). Appendix B provides the overview of the translated cases.
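To make the calibration rule concrete, the following minimal sketch in Python translates ordinal survey answers into crisp-set scores. The case labels and answer wording are illustrative assumptions, not the actual survey instrument used in the study.

```python
# Sketch of the crisp-set calibration rule described above:
# "no or hardly any" demand -> 0 (absent); "sometimes or frequent" -> 1 (present).
# Case names and answer labels are hypothetical.

ANSWERS_PRESENT = {"sometimes", "frequently"}  # answers coded as 1

def calibrate(answer: str) -> int:
    """Translate an ordinal survey answer into a Boolean (0/1) score."""
    return 1 if answer in ANSWERS_PRESENT else 0

# Hypothetical survey answers on ministerial evaluation demand, per case
responses = {"DEPT1": "never", "DEPT2": "hardly", "AG1": "sometimes", "AG2": "frequently"}

for org, answer in responses.items():
    print(org, calibrate(answer))  # DEPT1 0, DEPT2 0, AG1 1, AG2 1
```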

3.3 Technical tools to identify necessity

QCA, developed by Charles Ragin (1987, 2000), is based on the Boolean principle outlined above, and offers the techniques to identify necessary conditions in a formal and systematic way. In this contribution, we restrict ourselves to a basic explanation of the tools used to identify necessary conditions. For more extensive technical details, and more background on the ontological underpinnings of QCA, we refer to specialised works (for instance: Ragin, 1987, 2000; Schneider & Wagemann, 2012). QCA follows the classic mathematical and philosophical view of necessity, and considers a condition as necessary if it is always present/absent whenever the outcome (i.e. evaluation activity) is present/absent (Goertz & Starr, 2003: 3). Two measures of fit are key to assessing the relevance (read: non-trivialness) of necessary conditions. Consistency is the first measure of fit. It expresses "the degree to which instances of the outcome agree in displaying the causal condition thought to be necessary" (Ragin, 2008: 44). (…) To put it in terms that are more familiar to quantitative scholars: "Consistency, like significance, signals whether an empirical connection merits the close attention of the investigator" (Ragin, 2008: 45). Technically, consistency can be computed as "the number of cases with a (1) value on the condition AND a (1) value on the outcome, divided by the total number of cases with a (1) value on the outcome" (Schneider & Wagemann, 2012: 140). In our research, we have several outcomes: we calculate necessity for each dimension of evaluation activity. Consistency will be 100% if the necessary condition is shared by all cases with a particular outcome value. Imagine we found that all cases that experienced ministerial turnover are inactive in evaluations. This would suggest that ministerial turnover is a necessary condition for evaluation inactivity. In this situation, consistency would be perfect. If there were organisations that are inactive in evaluations, but which escaped ministerial turnover, consistency would be lower. In line with common QCA practice, we apply a consistency threshold of 90% (Schneider & Wagemann, 2012: 143), and hence search for conditions that are almost always necessary. This approach opens up the possibility to take, for instance, measurement errors, chance, randomness and other "troubling aspects of social data" into account (Ragin, 2000: 109). In a way, this involves moving to a more probabilistic assessment of the data.

The consistency value only gives a partial assessment of the relevance of a particular condition. To be accurate, we should also look at the coverage measure. This refers to "the degree to which instances of the condition are paired with instances of the outcome" (Ragin, 2008: 45). We can compare coverage with what we consider as "strength" in correlational connections. Coverage indicates the empirical relevance of a condition-outcome linkage (Ragin, 2008: 45). Coverage of a necessary condition can be calculated as the "number of cases with a (1) value on the condition AND on the outcome, divided by the number of cases with a (1) value on the condition" (Schneider & Wagemann, 2012: 144). When a necessary condition is unique to all cases with a particular value, the coverage is 100%. Referring back to our hypothetical example from above: if only the organisations that are inactive in evaluations experienced ministerial turnover, coverage will be 100%. If there were cases with ministerial turnover that were nonetheless conducting evaluations, the coverage of this necessary condition would be lower than 100%. If a particular necessary condition features widely in both the presence and the absence of the outcome, the necessity qualification can be considered trivial. In our research, we consider a necessary condition as trivial if it indeed appears in both the presence and the absence of the outcome, or when its coverage value is lower than 50%. For our analysis, we used the necessity analysis tool of the fsQCA 2.0 software.

Importantly, we perform the necessity analyses for both the presence (1) and the absence (0) of the various dimensions of evaluation activity. This is in line with the asymmetric assumption of causality common to configurational comparative methods (Schneider & Wagemann, 2012: 6). Explaining evaluation inactivity is presumably driven by other dynamics than explaining evaluation activity. The separate analysis of the presence and absence of the outcome also allows us to take account of the cases that are in the planning stage of developing evaluation activity. These cases (i.e. DEPT9 and DEPT10) have a peculiar status: they cannot be considered as conducting evaluations, but neither are they fully inactive in evaluating.
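As a minimal sketch of the two measures of fit quoted above, the following code computes necessity consistency and coverage for crisp-set data, and reruns the analysis for the absent outcome in line with the asymmetry assumption. The case names and scores are invented for illustration; the article's actual analyses were run with the fsQCA 2.0 software on the data in Appendix B.

```python
# Necessity consistency and coverage for a crisp-set condition X and outcome Y.
# Hypothetical data: X = ministerial turnover, Y = evaluation inactivity.
cases = {
    "AG1":   {"X": 1, "Y": 1},
    "AG2":   {"X": 1, "Y": 1},
    "AG3":   {"X": 1, "Y": 0},
    "DEPT1": {"X": 0, "Y": 0},
}

def consistency(data, cond, out):
    """Cases with condition AND outcome, divided by cases with the outcome."""
    with_outcome = [c for c in data.values() if c[out] == 1]
    return sum(c[cond] for c in with_outcome) / len(with_outcome)

def coverage(data, cond, out):
    """Cases with condition AND outcome, divided by cases with the condition."""
    with_condition = [c for c in data.values() if c[cond] == 1]
    return sum(c[out] for c in with_condition) / len(with_condition)

def negate(data, out):
    """Recode the outcome (1 -> 0, 0 -> 1) to analyse its absence separately."""
    return {k: {**v, out: 1 - v[out]} for k, v in data.items()}

cons, cov = consistency(cases, "X", "Y"), coverage(cases, "X", "Y")
print(f"presence: consistency={cons:.2f}, coverage={cov:.2f}")
# thresholds used in the article: consistency >= 0.90, coverage >= 0.50
print(f"necessary? {cons >= 0.90}; non-trivial? {cov >= 0.50}")

# Asymmetric causality: rerun the same analysis for the absent outcome
cases_neg = negate(cases, "Y")
print(f"absence: consistency={consistency(cases_neg, 'X', 'Y'):.2f}")
```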

3.4 Capacity for evaluation activity: necessary conditions

Which evaluation capacity building blocks are necessary for evaluation activity to take place? Is evaluation capacity for evaluation activity more about supply, or about demand? And is evaluation activity only feasible for a specific type of case?

3.4.1 Capacity for evaluation conduct

Table 2 offers an overview of the necessary conditions with a minimum consistency value of 90% for evaluation conduct. The coverage score of each of these conditions is also mentioned. Conditions found to be necessary but trivial from a coverage point of view are mentioned in parentheses. The ✓ symbol represents the necessary presence of a particular condition for a certain outcome; the ✗ symbol stands for the necessary absence of a condition.

Table 2 Necessary conditions for the incidence of evaluation.

Having the skills to outsource evaluations appears to be an absolute minimum for the organisations that are active in evaluations or that have concrete plans to develop evaluation activity. This does not come as a surprise. Of the evaluation capacity building blocks, the skills for evaluation (in-house or external) are among the most cited (De Peuter & Pattyn, 2009). Some authors even use the term evaluation capabilities as a synonym for evaluation capacity (McDonald et al., 2003; Williams, 2001). More remarkable in our findings is the fact that none of the other supply related conditions, typically identified by evaluation capacity builders, is strictly necessary. This suggests that organisations that are committed to evaluation will always find creative ways to overcome deficits in supply. The excuse is often proffered that evaluation is not a prime occupation for organisations whose monitoring system is not fully developed. Yet, as our findings demonstrate, not all organisations active in evaluation have a well-developed monitoring system. The evaluation field offers a wide array of approaches and techniques that do not necessarily require a reliance on existing monitoring data.

Of the demand related conditions, none is strictly necessary for the conduct of evaluations. This finding, too, runs counter to the received ECB literature. Demand from organisational management has power only over those organisations that plan to do evaluations. Management can put evaluation on the organisational agenda, but other conditions will need to be met for evaluations to take place effectively. In an NPM setting driven by the primacy of politics principle, the sector minister will be an important actor. In the absence of ministerial demand for evaluations, translating available evaluation capacity into real evaluation activity will be hard. Note that despite increased attention for evaluation in parliament (Pattyn and Brans, 2014), evaluation demand from members of parliament is not essential for activity.

The adoption of the NPM inspired framework has had an ambiguous impact. It did not really affect the cases that evaluated prior to the reforms: all cases, agencies included, that were active in evaluation before the reforms continued on this track. Evaluation activity, once developed, is not easily neutralised. Of the various policy supportive functions, policy evaluation is perhaps the most bureau-politically sensitive, and evaluation capacity is about more than the conduct of evaluations only. The evidence that follows from evaluation studies is potentially powerful material that agencies can use to back their legitimacy or their request for more resources. Giving up the evaluation function would imply a loss of this power. For departments without an evaluation history, the framework had more effect. All but one of the departments are now at least planning to conduct evaluations.

For evaluation capacity builders, it is encouraging to observe that evaluation activity does not require any particular organisational characteristics. None of the conditions related to the nature of the organisation is found to be necessary.

The table, finally, lists some trivial necessary conditions: the absence of legislative evaluation requirements and the perceived measurability of organisational outputs. At least in Flanders, these factors will not play a major role in distinguishing between evaluation activity and inactivity.

3.4.2 Capacity for evaluation regularity

Evaluation capacity building can only be considered successful when organisations conduct evaluations regularly (Table 3). The ECB definition referred to above (Stockdill et al., 2002: 8) speaks about evaluations as "ordinary and ongoing practice". Of the long list of conditions investigated, only the pre-reform evaluation experience of the organisation comes close to qualifying as a necessary condition. The implementation of the NPM framework is still a relatively recent phenomenon. It would not be realistic to expect that organisations without an evaluation tradition evaluate at the same pace as organisations with experience. Regularly conducting evaluations requires practice and exercise. However, many organisations that do not evaluate regularly also evaluated prior to the reforms. The necessity status of this condition is hence trivial.¹

Table 3 Necessary conditions for evaluation regularity.

The analysis of necessity for the cases without evaluation regularity is more revealing. The analysis confirms the relative position of managerial demand in the web of actors that may demand evaluation. Organisational management can trigger an organisation to evaluate, but evaluation regularity will not depend on it.

Of the seven observed cases with regular evaluation activity, five can be called instances of policy capacity enforcement through evaluation requirements in legislation. Evaluation clauses and sunset legislation are an emerging phenomenon in Belgium. The inclusion of evaluation clauses in legislation often serves as leverage to achieve a compromise in difficult negotiations over policies. That said, the European Union has also played an influential role in the development of evaluation regularity, via the imposition of evaluations as conditions for receiving European structural funds. As Knill (2005: 52) accurately described, domestic policy capacities are increasingly influenced by intergovernmental dependencies. In a way, national policy capacity in evaluation can be explained by a transfer of European policy capacity. Supranational compliance procedures have strengthened national capacity in evaluation (Knill, 2005: 58). The manuals and trainings developed by the European Union, and disseminated in the framework of the structural funds evaluations, proved to be very powerful tools in this respect.

Although the overwhelming majority of cases perceive their outputs as relatively easy to measure, this perception especially applies to the cases that are not regularly active in evaluations. Wilson's (1989) classification of organisations is enlightening in this respect. Cases that have observable outputs are typically production or craft organisations. Although evaluation is in principle easier for such types of outputs, since data will be more readily available, production and craft organisations usually do not deal with policy evaluations of the type we defined. For evaluation regularity as an activity dimension, the nature of an organisation will indeed matter.

3.4.3 Capacity for evaluation quality

The conduct of high quality evaluations is another core ambition of evaluation capacity builders (Table 4). High quality evaluations, as the analysis of necessity shows, are in the first place a matter of culture and experience rather than of evaluation supply. The anchorage of the evaluation function is an exception. The necessity status of an evaluation unit accords with the importance of institutionalising evaluation capacity. While not essential for the incidence of evaluation or for evaluation regularity, having an evaluation unit may help to sensitise staff and management to the importance of evaluation quality measures and may guarantee their application. The policy workers active in such evaluation units frequently have a background in research, and usually understand the importance of quality control for evaluations. Although our evidence is too limited to draw robust conclusions, we speculate that organisations accountable to a sector minister with research affinity are also more open to evaluation quality control. Further research should verify this. Demand from organisational management returns as necessary, but again trivial, for cases without an evaluation quality reflex. It is an important condition, but in the political context of evaluation, demand from the sector minister will be more decisive for evaluation quality. The application of evaluation quality measures takes time (Pattyn, 2014b), and can lengthen the evaluation process. Ministers should be convinced of the value of evaluation to be ready to consider quality measures.

Table 4 Necessary conditions for the application of evaluation quality measures.

The table lists three other trivial necessary conditions that are nonetheless worth mentioning. As hinted at above, the nature of civil society can be assumed to influence the policy capacity of a state (Painter & Pierre, 2005: 11–13). Flanders is characterised by a dense advisory system (Speer, Pattyn, & De Peuter, 2015). Civil society organisations, active in advisory bodies, are increasingly demanding policy evaluations. Of the eighteen cases that conduct evaluations, fourteen indicated that they are subject to such demands. It remains to be investigated whether there is any causal relationship with the extent of evaluation quality measures applied; all organisations in the latter situation do indeed receive civil society evaluation requests. The same reservations should be made with regard to the condition referring to the pre-reform evaluation experience of the organisation. Organisations with evaluation experience will be more ready to develop evaluation quality measures. Based on their experience, they may have learnt that the investment in such measures is fruitful in the long run.

4 Conclusion

There are plenty of evaluation criteria: economy, efficiency, coherence, legitimacy, equity, decisiveness… Depending on the criteria used, evaluation can be a supportive function for administrative capacity, state capacity or policy capacity. Having thorough insight into the necessary conditions for organisational evaluation activity can be useful for governments willing to invest in evaluation capacity building. Necessary conditions should be understood as the conditions that enable the presence or absence of an outcome value in our particular pool of analysis. They can be considered essential, though not sufficient, preconditions for the outcome to occur (Ledermann, 2012). The research demonstrates the value of using the QCA analysis of necessity to gather systematic evidence on policy capacity supporting tools. The findings apply to the Flemish public sector in the first place, but can be inspirational for other public sectors where evaluation activity is still in development.

From our various analyses, we conclude that all but one of the activity dimensions require at least one necessary condition. Regular evaluation practice is the exception in this regard: there is no compulsory element to reach this outcome. At the outset of this study, we intuitively expected that evaluation supply would be the most necessary category in explaining evaluation activity. This expectation was mainly based on the observation that the largest share of testimonials from evaluation capacity builders concentrates on the necessity of having the appropriate capabilities to evaluate. Yet, the analysis did not confirm this. Instead, what is needed most across the board is evaluation demand.

Our study provides relevant guidelines for evaluation practitioners and governments about what they should prioritise when building evaluation capacity. Admittedly, not all conditions can be easily translated into hands-on lessons. Possible actions to foster evaluation demand are the provision of evaluation training and networking events. Yet, we realise that mobilising ministers and their personal advisors for these kinds of events is more difficult. To trigger evaluation supply, promising capacity building initiatives include establishing evaluation units, developing an evaluation best practice database, or testing evaluation skills when recruiting staff for policy units. The category referring to the nature of the organisation is only necessary to understand low evaluation regularity. From an ECB perspective, this is promising: evaluation capacity and evaluation activity are attainable for organisations, even those that are small, deal with less salient policy issues, or find it difficult to measure outputs and outcomes.

Notes

* The article presents (part of the) findings of PhD research conducted at KU Leuven, Public Governance Institute, Belgium.

1 This also applies to the condition referring to the skills to outsource an evaluation. All organisations that have ever conducted an evaluation meet this requirement (see above). Its discriminatory potential between high and low evaluation regularity, and between high and low evaluation quality, is nil. We therefore did not list this condition in the tables.

References

  • A. Balthasar . Are there bases for evidence-based health policy in Switzerland? Factors influencing the extent of evaluation activity in health policy in the Swiss cantons. Evidence and Policy. 6(3): 2010; 333–349.
  • T. Blackman . Rethinking policy related research: Charting a path using qualitative comparative analysis and complexity theory. Contemporary Social Science: Journal of the Academy of Social Sciences. 8(3): 2013; 333–345.
  • M. Brans , D. Vancoppenolle . Policy-making reforms and civil service: An exploration of agendas and consequences. M. Painter , J. Pierre . Challenges to state policy capacity: Global trends and comparative perspectives. 2005; Palgrave Macmillan: Basingstoke 164–184.
  • B. De Peuter , V. Pattyn , M. Brans . Policy evaluation before and after the governmental reform in Flanders (Belgium). Progress in evaluating progress?. 2008; Paper presented at the European Evaluation Society Conference: Lisbon, Portugal
  • B. De Peuter , V. Pattyn . Evaluation capacity: Enabler or exponent of evaluation culture. A. Fouquet , L. Méasson . L’évaluation des politiques publiques en Europe. Cultures et futures. 2009; l’Harmattan: Paris 133–142.
  • European Commission . EVALSED. The resource for the evaluation of socio-economic development. 2013. Retrieved from: http://ec.europa.eu/regional_policy/sources/docgener/evaluation/guide/guide_evalsed.pdf .
  • E. Fobé , M. Brans , D. Vancoppenolle , J. Van Damme . Institutionalized advisory systems: An analysis of member satisfaction of advice production and use across nine strategic advisory councils in Flanders (Belgium). Policy & Society. 32(3): 2013; 225–240.
  • J.E. Furubo , R. Sandahl . Introduction. A diffusion perspective on global developments in evaluation. J.E. Furubo , R.C. Rist , R. Sandahl . International Atlas of Evaluation. 2002; Transaction Publishers: New Jersey 1–23.
  • G. Goertz , H. Starr . Necessary conditions: Theory, methodology and applications. 2003; Rowman and Littlefield: Oxford
  • S. Haarich , J. del Castillo Hermosa . Development of evaluation systems — Evaluation capacity building in the framework of the new challenges of EU Structural Policy. 2004; Paper presented at the ESRA Conference, Porto, Portugal.
  • J.A. King . The challenge of studying evaluation theory. New Directions for Evaluation. 97 2003; 57–67.
  • C. Knill . The Europeanization of national policy capacities. M. Pierre , J. Pierre . Challenges to state policy capacity. Global trends and comparative perspectives. 2005; Palgrave Macmillan: New York 52–72.
  • S. Ledermann . Exploring the necessary conditions for evaluation use in program change. American Journal of Evaluation. 33(2): 2012; 159–178.
  • K. Mackay . The World bank's ECB experience. New Directions for Evaluation. 93 2002; 81–99.
  • B. McDonald , P. Rogers , B. Kefford . Teaching people to fish? Building the evaluation capability of public sector organisations. Evaluation. 9(1): 2003; 9–29.
  • M. Painter , J. Pierre . Challenges to state policy capacity. Global trends and comparative perspectives. 2005; Palgrave Macmillan: New York
  • V. Pattyn . Policy evaluation (in)activity unravelled. A configurational analysis of the incidence, number, locus and quality of policy evaluations in the Flemish public sector. (PhD Dissertation) 2014a; KU Leuven Faculty of Social Sciences: Leuven
  • V. Pattyn . Why organisations (do not) evaluate? Explaining evaluation activity through the lens of configurational comparative methods. Evaluation: The International Journal of Theory, Research and Practice. 20(3): 2014b; 348–367.
  • V. Pattyn , M. Brans . Explaining organisational variety in evaluation quality assurance. Which conditions matter?. International Journal of Public Administration. 37(6): 2014; 363–375.
  • C.C. Ragin . The comparative method. Moving beyond qualitative and quantitative strategies. 1987; University of California Press: London
  • C.C. Ragin . Fuzzy set social science. 2000; University Chicago Press: Chicago
  • C.C. Ragin . Fuzzy set analysis of necessary conditions. G. Goertz , H. Starr . Necessary conditions. Theory, methodology and applications. 2003; Rowman and Littlefield: Oxford 179–224.
  • C.C. Ragin . Redesigning social inquiry: Fuzzy sets and beyond. 2008; University Chicago Press: Chicago
  • K. Schantz . Performance management is a recipe for centralization?. 2000; Paper prepared for the Contested Issues in Public Management Seminar June 2000. London School of Economics: London
  • F.W. Scharpf . Games real actors play: Actor centered institutionalism in policy research. 1997; Westview Press: Oxford
  • C.Q. Schneider , C. Wagemann . Set-theoretic methods for the social sciences. A guide to qualitative comparative analysis. 2012; Cambridge University Press: Cambridge
  • R. Schwartz . The politics of evaluation reconsidered: A comparative study of Israeli programs. Evaluation. 4(3): 1998; 294–309.
  • M. Scriven . Evaluation thesaurus. 4th ed., 1991; Sage Publications: Newbury Park, CA
  • S. Speer , V. Pattyn , B. De Peuter . The growing role of evaluation in parliaments: Holding governments accountable?. International Review of Administrative Sciences. 81(1): 2015; 37–57.
  • N. Stame . Evaluation and the policy context: The European experience. Evaluation Journal of Australasia. 3(2): 2003; 36–43.
  • S.H. Stockdill , M. Baizerman , D.W. Compton . Towards a definition of the ECB process: A conversation with the ECB literature. New Directions for Evaluation. 93 2002; 7–25.
  • D.L. Stufflebeam . Institutionalizing evaluation checklist. 2002; Western Michigan University, The Evaluation Center. Available from: www.wmich.edu/evalctr/checklists .
  • US GAO (United States General Accounting Office). Program evaluation. An evaluation culture and collaborative partnerships help build agency capacity. 2003; Report to Congressional Committees, May 2003.
  • B.B. Volkov , J.A. King . A checklist for building organizational evaluation capacity. 2007; Evaluation Checklists Project, Western Michigan University, The Evaluation Center.
  • F. Varone , S. Jacob , L. De Winter . Polity, politics and policy evaluation in Belgium. Evaluation. 11(3): 2005; 253–273.
  • Vlaams Parlement . Kaderdecreet Bestuurlijk Beleid. 2003; Vlaams Parlement: Brussel
  • B. Williams . Building Evaluation Capability. 2001. Unpublished paper. Available from: http://users.actrix.co.nz/bobwill/ .
  • J.Q. Wilson . Bureaucracy. 1989; Basic Books: New York
Appendix A: Measuring and calibrating the conditions

Appendix B: Cases translated in Boolean variables
