Imagine this: you post a holiday photo on Instagram, add a funny caption and, without realising it, that same content is later used to train an AI model. It may sound strange, yet this is exactly what Meta, the parent company of Facebook and Instagram, intends to do.¹ From 27 May, Meta will use the personal data of adult users in Europe to train its AI models. This concerns everything you have shared publicly, from photos to captions, comments and possibly even audio clips. These data are collected, processed and analysed, after which the AI ‘learns’ from your behaviour, language and preferences. The reason is that Meta wants to build AI that is smarter, more personalised and, above all, more relevant to European users.
But this decision by Meta is not without controversy. The privacy organisation NOYB, led by Max Schrems, has already sent Meta a formal cease-and-desist letter, and other organisations are calling on users to lodge objections to the use of their data. This raises an obvious question: is Meta allowed to do this? And how does it relate to the rules on personal data and AI?
To answer that question we need to turn to the General Data Protection Regulation (GDPR). This European privacy law states that personal data may only be processed if there is a valid legal basis. Consent is the most familiar basis, meaning that you must explicitly agree to the use of your data. In the context of AI training this is difficult. How can you give consent if you do not know which data are used or for what precise purpose? For that reason many companies, including Meta, rely on another basis: legitimate interest.
Under the Regulation this basis is allowed, but only if three conditions are met. There must be a legitimate interest, the processing must be necessary to achieve that interest, and the rights of users must not outweigh the interest of the organisation. This may sound abstract, but the European Data Protection Board (EDPB) recently published guidance that provides more clarity.² It identifies legitimate interests such as building chatbots, detecting fraud or improving security systems. Meta states that it wants to train AI with European data so that its systems handle local languages, cultures, humour and habits more effectively, which in its view improves the user experience. Think of an AI that understands Flemish expressions or recognises sarcasm in French.
There is more. Even if Meta’s interest is legitimate, it must still be assessed whether the use of personal data is genuinely necessary. Could the same purpose be achieved with anonymised or synthetic data? And more importantly, are users’ rights sufficiently protected? The EDPB emphasises that companies such as Meta must take measures to limit risks for users. These may range from technical safeguards to clear communication and the possibility to object. What is not permitted is a vague or generic justification. Each processing activity must be assessed and substantiated individually.
Many large companies assume that whatever users put on the internet, whether a tweet, a blog post or a public profile, is free for anyone to use. In the context of AI training the legal position is far more complex. A recent study examined whether companies such as OpenAI may use publicly available personal data to train large language models like ChatGPT.³ Its conclusion: the mere fact that data are publicly accessible does not mean they may be used without consent. The key question is whether people could reasonably expect that their old forum posts or profile photos would be used to make a chatbot smarter. The GDPR contains an exception for data that a person has “manifestly made public”, but this applies only where that person published the data knowingly and intentionally. That may be the case for a LinkedIn post, but what about a photo you posted to Instagram ten years ago?
Alongside the question of whether the processing is lawful, a crucial point remains: the users’ rights. Even if Meta has a valid legal basis, users cannot be sidelined. You have the right to information and transparency: you must know which data are used and for what purpose. In practice this means that Meta must not rely solely on a reference in its terms and conditions but must actively inform users through its privacy notice.
You also have the right of access: you may ask whether your data are being used and, if so, which data. In the AI context this is complex, since AI models are often black boxes and the data are not always traceable to individuals. Nonetheless, the GDPR’s principle stands: if your data are included, you have a right of access.
The right to object also applies. If you disagree with the use of your data based on legitimate interest, you may object. Meta must then demonstrate that its interest outweighs your rights; if it cannot, the processing must stop. In practice this right is often invoked when people discover that their public social media posts have been scraped and used for AI training without their knowledge. In the context of AI training, honouring an objection is difficult: once a model has been trained, it is not always technically possible to remove individual data from it. Even so, the law obliges organisations to offer a solution or to show why this is not possible. Some companies go beyond the legal minimum and offer users the option to have their data removed from an AI system for ethical reasons.
In short, Meta cannot simply use your data for AI training, even if you once made it public yourself. Strict rules apply, requiring not only a valid legal basis but also a fair balancing of interests and compliance with your rights as a user. Whether Meta satisfies all these requirements will undoubtedly be the subject of debate, and possibly litigation, in the near future; if it does not, the company could face significant difficulties.
It is clear that Meta has a legitimate interest in developing its AI. The question is whether processing all users’ personal data is necessary to achieve that aim: if the same objective can be reached with anonymised or synthetic data, that would suggest the processing fails the necessity test. There is also room for doubt over whether Meta’s interests outweigh the privacy of users. Many people do not expect that old, seemingly harmless posts, such as a holiday photo, will be used to train AI. Under the GDPR, processing must be reasonably foreseeable to the people concerned, which is not evident here. Meta also has considerable work to do to make data subjects’ rights effective and accessible.
Make sure you are on the right legal track. We would be pleased to help! Feel free to contact us to explore the options.