Thousands of consultations, diagnoses and treatment plans: each one a valuable data point for training AI. Such data can power algorithms that speed up diagnoses, improve treatments and ultimately save lives. But behind every data point is a patient who came to their healthcare provider in confidence, and those records are protected by medical confidentiality.
That makes training AI models with patient data legally complex. In this blog, we explain what is currently permitted, where the obstacles lie, and what will change with the arrival of the European Health Data Space (EHDS).
The short answer: yes, you may use patient data to train AI, provided the data is truly anonymous. With medical data, however, genuine anonymisation is technically very difficult. The combination of diagnoses, treatment data and demographic characteristics often makes re-identification possible, particularly for rare conditions. In practice, you will quickly find yourself working with identifiable health data.
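To illustrate why the combination of data points matters, here is a minimal Python sketch of a k-anonymity check. It assumes a hypothetical dataset with three quasi-identifiers (birth year, postcode area and diagnosis), all invented for illustration. The sketch counts how many records share each combination of values: a group of one means that record is unique and therefore potentially traceable to a single patient. Passing such a check does not, on its own, establish anonymity under the GDPR.

```python
from collections import Counter

# Hypothetical records: (birth_year, postcode_area, diagnosis).
# The choice of quasi-identifiers is an illustrative assumption.
records = [
    (1980, "1011", "type 2 diabetes"),
    (1980, "1011", "type 2 diabetes"),
    (1980, "1011", "type 2 diabetes"),
    (1975, "3511", "hypertension"),
    (1975, "3511", "hypertension"),
    (1992, "9711", "Huntington's disease"),
]


def smallest_group_size(rows):
    """Return k: the size of the smallest group of records that share
    the same combination of quasi-identifier values."""
    counts = Counter(rows)
    return min(counts.values())


def unique_records(rows):
    """Return the records whose combination of values occurs only once
    and which are therefore most exposed to re-identification."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


print(smallest_group_size(records))  # prints 1
print(unique_records(records))
```

In this toy dataset, the common diagnoses sit in groups of two or three, but the patient with the rare condition is alone in their group (k = 1): exactly the re-identification risk that makes genuine anonymisation of medical data so hard.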
And that is where things get complicated. You must comply with both the General Data Protection Regulation (GDPR) and the duty of medical confidentiality under the Dutch Medical Treatment Contracts Act (WGBO). In the end, both regimes lead to the same requirement: the patient's explicit consent. On top of that, a Data Protection Impact Assessment (DPIA) is almost always required when health data is processed on a large scale.
That sounds workable, but in practice it stalls. Patients who withhold consent drop out of the dataset, which may cause the algorithm to perform worse for certain groups. For historical data, obtaining consent is often impossible. And approaching thousands of patients individually is logistically complex and costly.
In short: the consent model hampers innovation. And that is precisely what the EHDS aims to address.
The EHDS introduces a new route for "secondary use" of health data: use for purposes other than direct care for the patient concerned. Think of research, policy-making and training algorithms.
The core of the shift: where explicit consent is currently the rule, opt-out becomes the default. Health data will in principle be available for secondary use, unless the patient actively objects.
This reversal has far-reaching legal consequences. The legal basis shifts from patient consent to a statutory obligation for data holders (healthcare providers) to make data available. Medical confidentiality is likewise set aside on the basis of this statutory obligation, rather than on the basis of individual consent.
For patients, this represents a material change to their position: they must take action themselves to keep their data out of secondary use. Healthcare institutions will have an active duty to inform patients about this.
The EHDS sets out an exhaustive list of purposes for which data may be used via this route. Recognised purposes include scientific research, public health policy-making, education, and the training, testing and evaluation of AI systems and medical devices.
Under the EHDS, access to data does not run directly between the healthcare institution (data holder) and the data user. The data user may be a researcher or a commercial ICT service provider. The Health Data Access Body (HDAB) sits between them and acts as gatekeeper.
Anyone wishing to use data submits an application to the HDAB. The HDAB assesses:
Does the intended use fall within the exhaustively listed recognised purposes?
Is the data requested proportionate to the purpose?
Are adequate technical and organisational safeguards in place?
Upon approval, a data permit is issued with specific conditions. Purpose limitation continues to apply strictly: data received for a diagnostic algorithm may not be reused for another purpose without a new permit. A breach can lead to revocation of the permit, exclusion from future applications and GDPR enforcement. Moreover, the data does not, in principle, leave the HDAB's secure processing environment.
Clinical AI systems are additionally subject to the AI Act. These systems typically qualify as high-risk, with strict data governance requirements. Training data must be relevant, representative and as free from bias as possible.
The AI Act contains its own legal basis for processing special categories of personal data for bias detection and correction. This is attractive, but narrow: the provision is available only to parties that qualify as a provider within the meaning of the AI Act, and it imposes strict, cumulative conditions.
Article 10(5) permits processing only "to the extent that it is strictly necessary for the purposes of ensuring bias detection and correction." The provision itself specifies GDPR-oriented measures: pseudonymisation, access controls and short retention periods. For the Dutch healthcare context, this is probably too narrow. Medical confidentiality under the WGBO also safeguards fundamental rights, but in a different way: it protects the relationship of trust between doctor and patient, not the technical integrity of data. Pseudonymisation does not resolve that. In our view, anyone seeking to rely on paragraph 5 will therefore need to supplement the "appropriate safeguards" with a ground for overriding medical confidentiality (in practice, patient consent). This significantly reduces the appeal of this route.
Practically speaking, synthetic data is often the most legally workable route: artificially generated data that preserves the statistical patterns of the original dataset without being traceable to individual patients. Because it cannot be traced back to individuals, it falls outside both the GDPR regime and the scope of medical confidentiality. The caveat is quality: the synthetic data must be good enough to actually detect bias.
The legal scope for training AI with patient data will shift significantly in the coming years. In the short term, consent remains the main rule, with all the practical limitations that entails. In due course, the EHDS will open a new route via the Health Data Access Body, with opt-out as the default.
For healthcare institutions, it is worth considering three questions now: which data do you want to make available, how will you set up the opt-out mechanism, and what contractual arrangements will you make with AI suppliers and ICT service providers regarding roles, permits and ownership of trained models?
Do you have questions about these developments, or would you like more information about the AI Act and the EHDS? Feel free to reach out.