Impact is created in interaction – why doesn’t a randomized trial see it?

Published
13.5.2026
The views expressed in the blog posts are the writers' own and do not represent the official position of the institution.

The randomized controlled trial (RCT) is repeatedly brought up in social and health care whenever certainty and credibility are sought to support decision-making. When public money is invested in new solutions – especially artificial intelligence – pressure increases to demonstrate impact through “hard evidence.” The RCT offers a clear and widely accepted answer to this demand. It is a methodological safe harbor whose language funders, decision-makers, and researchers understand.

For this reason, it is also relied upon in situations where the phenomenon being evaluated does not align with its assumptions. The RCT is not only a method, but part of a way of thinking in which impact is seen as an isolatable, measurable, and generalizable property.

Artificial intelligence in social and health care offers a good example for challenging this way of thinking. The effects of AI only emerge when technology, people, and organizational practices meet in everyday work. Still, we easily return to the RCT design because it offers a clear structure amid complexity.

Problem 1: We imagine a single intervention where there is actually an entire operating environment

AI solutions are not standardized interventions that can be detached from their context. They take shape through the interaction of data, professional interpretations, leadership, resources, and clients’ reactions. The effect is not static, but continuously evolving. This challenges RCT thinking, which requires that an intervention can be isolated and measured in a controlled manner.

An RCT may produce a precise number, but it does not explain how and why the effect comes about. The key question – how AI works in everyday practice – remains unanswered.

The introduction of AI changes task allocation, prioritization, and responsibilities. At the same time, organizational practices shape the system itself.

Thus, the object of evaluation is not a single “AI intervention,” but a set of relationships that are constructed in practice. The RCT seeks to isolate an effect, even though the effect arises precisely from this entanglement.

Problem 2: We measure outcomes, not how they are produced

The impact of AI does not emerge solely from the accuracy of predictions or analyses. On their own, these are merely information. Impact only arises once the information is put to use: it is interpreted, trusted (or not), and used as a basis for changing practices.

The same prediction can lead to completely different decisions in different organizations – or remain unused altogether.

When we measure only outcomes, such as numbers of visits or costs, this intervening process disappears from view. The RCT assumes that impact proceeds linearly from intervention to outcome, even though in reality it is constructed through multistage interaction.

Problem 3: The evaluation approach shapes reality

An evaluation framework is not neutral. It defines what is considered meaningful. When we primarily measure individual-level outcomes, we may overlook how technology alters organizational relationships, decision-making hierarchies, or the distribution of responsibility.

At the same time, we reinforce the notion that problems can be solved through technical optimization, even when they stem from structures and the organization of work. The evaluation approach begins to steer development: what gets measured is what gets developed.

We need evaluation that follows practice – not only indicators

In evaluating AI, implementation should be examined as part of a broader operational whole. Instead of asking only whether the system reduced a single indicator, we should examine how it changed practices, what kinds of dependencies it created, and what kinds of interpretive practices developed around it.

This requires complementing quantitative data with qualitative analysis: observation, document analysis, and discussions with professionals. Only then can we understand why technology produces certain kinds of outcomes in a particular organization. Evaluation must accompany everyday practice, not merely assess it retrospectively.

Impact is not a universal property

Technology is not a finished package that can be transferred unchanged from one context to another. Each implementation is a negotiation in which the system adapts to local practices and practices adapt to the system. Impact is a situational phenomenon, not a permanent property that can be demonstrated once and assumed to be universal.

If we cling to a single evaluation model in situations where the object is a mutable operational whole, the risk is that we obtain precise answers to the wrong questions.

Expanding evidence, not narrowing it

The issue is not that the RCT is the wrong method. It works well for questions in which the intervention is clearly defined and standardizable, such as medications and simple clinical procedures. Problems arise when it is applied to situations where the object of evaluation takes shape through more complex interaction.

AI solutions in social and health care belong to this category. Their effects emerge through the interplay of people, processes, and environments, and are shaped by local practices.

Therefore, we need evaluation that recognizes this complexity. Evidence should not be narrowed to a single model, but expanded so that it also makes visible how impact is actually constructed in practice.