Memorisation and copyright infringement: when does AI remember too much?

Generative AI (hereinafter: “GenAI”) has quickly become an integral part of creative processes. GenAI models can effortlessly produce text, images, and music, sometimes even in the form of (almost) literal reproductions of existing copyrighted works. This is referred to as “memorisation”: a GenAI model reproduces (fragments of) its training data verbatim. This does not involve imitating a particular style or providing a general summary of a work, but rather delivering output in which the original work itself is reproduced one-to-one.

This creates legal tension between copyright and GenAI. For organizations that use GenAI, the topic is also directly relevant, because it is not always predictable whether output will (partly) draw on copyright-protected works that appear in the training data. Increasingly, disputes focus not only on the training of GenAI models, but also on whether the output itself can constitute a copyright infringement. At the time of writing, there is no definitive ruling on this matter in the Netherlands, but courts in several foreign jurisdictions have ruled on the issue. The key question is therefore: can memorisation lead to the generated output, or the GenAI model as such, being regarded as an infringing work, and who is liable for this? This blog discusses these judgments and their implications.

Copyright infringement

First of all, it is important to determine when copyright has actually been infringed. Under the Dutch Copyright Act, the copyright holder has two exclusive rights: ‘the right to make the work available to the public’ and ‘the right to reproduce it’ (also known as ‘copying’). A work is made available to the public when an existing protected work is shown to an audience without the copyright holder's permission. Reproduction covers both copies of an existing work and imitations in modified form, provided that the protected features of the original work have been taken over. Only the copyright holder may perform these acts with regard to their work; others may only do so with the copyright holder's permission.

There is no consensus among legal experts as to whether memorisation constitutes an infringement. At first sight it seems logical that a virtually literal output could qualify as (unauthorised) reproduction, but the complicating factor is that a GenAI model generates output based on statistical patterns rather than by storing a ‘copy’ in the traditional sense. If the output of a GenAI model qualifies as a reproduction, then making the GenAI model available to the public could in turn be seen as making the work available to the public. It could even be argued that distributing the GenAI model is in itself an act of making the work available to the public, because it gives the public access to the copyrighted work in question.

Germany (GEMA v. OpenAI)

On November 11, 2025, the Munich Regional Court ruled on memorisation, among other things, in a case between GEMA and OpenAI. GEMA, an association for musical performance and reproduction rights that acts on behalf of artists (comparable to the Dutch BUMA/STEMRA), had filed a lawsuit against OpenAI because OpenAI had used the lyrics of well-known German songs in the training of ChatGPT. GEMA argued, among other things, that the lyrics could be reproduced almost word for word and must therefore be stored in ChatGPT's parameters. OpenAI defended itself by arguing that no training data is stored in the GenAI model, and that the model merely reflects statistical correlations distilled from the entire dataset. According to OpenAI, the alleged infringement was therefore the result of the prompts entered by the end user, which are beyond OpenAI's control.

The court conducted a comparative analysis between the original song lyrics and the output generated by ChatGPT using relatively simple prompts (such as “What is the chorus of song A?” or “What are the lyrics to song B?”). This output proved to be virtually identical to the song lyrics, with only sporadic hallucinations of minor significance. The court therefore ruled that the lyrics were stored in ChatGPT's parameters, constituting copyright infringement, and held OpenAI, rather than the end user, responsible. OpenAI's defence was thus rejected: in Germany, memorisation qualifies as a copy of the work and therefore constitutes infringement.

What is striking is the broad scope given to the concept of reproduction: even numerical storage (such as model parameters) can constitute an infringement if the text is ultimately reproducible. A reproduction therefore does not have to be a directly visible copy in ‘human-readable’ form, as long as the recording allows the work to be reconstructed. This approach indirectly supports the view that a GenAI model can contain a relevant ‘recording’ in the copyright sense if it can reproduce protected works (almost) verbatim.

United Kingdom (Getty v. Stability AI)

However, the British court ruled differently in the case between Getty Images and Stability AI. Stability AI is the developer and operator of the GenAI model Stable Diffusion, which can be used to generate images by means of prompts. According to stock photo website Getty Images, the operation of Stable Diffusion constituted copyright infringement, as the GenAI model itself had become an ‘infringing work’ as a result of being trained with copyright-protected material.

The judge acknowledged that memorisation could hypothetically occur, but Getty Images was unable to prove that it had occurred in this case. Without that evidence, the British judge concluded that there was no copyright infringement. The ruling seems diametrically opposed to the German decision in GEMA v. OpenAI, but that is not necessarily the case: the claim failed above all for lack of evidence. If, as in GEMA v. OpenAI, an investigation had been conducted in which virtually literal reproductions were generated, the case might have turned out differently. The form of expression is also relevant here: a (near-)literal reproduction is easier to verify for textual works than for images, especially when sporadic hallucinations occur.

United States (Bartz v. Anthropic)

In a previous blog, I discussed the Anthropic case, in which training Anthropic's GenAI model with lawfully obtained copyrighted books was held to be ‘fair use’. The US court considered that the copyrighted books undergo a transformation process during training, making it unlikely that the output could produce a complete or coherent reproduction of the books.

This case did not address the question of whether memorisation as such constitutes an infringement, but used ‘memorisation’ as a factor in assessing whether the use of a protected work was permissible. If Anthropic's model had been able to produce literal copies of protected works, that assessment might well have turned out to Anthropic's disadvantage.

China

An interesting ruling was also issued in China, by an internet court in Guangzhou (Canton). The case, published on February 10, 2025, centered on GenAI-generated images that imitated protected works from the Japanese superhero franchise Ultraman. The GenAI platform is not named, but it allowed users to generate, publish, and download images closely resembling protected Ultraman works as “their own content.” The platform explicitly promoted this possibility by prominently positioning the infringing images on its homepage under “Recommended” and “IP Works.”

The court did not hold the platform liable for direct copyright infringement, but ruled that it was liable on the grounds of complicity, because (1) the platform derived commercial benefit from the GenAI service, (2) well-known protected Ultraman works were involved, (3) the output was stable and repeatable, and (4) the platform applied no proactive moderation. The platform could therefore not escape liability via a safe-harbour exemption. In the European Union, the Digital Services Act provides for a similar form of platform liability where the platform has knowledge of infringing content. Unlike in GEMA v. OpenAI, the provider of the GenAI model did not itself commit copyright infringement, but was nevertheless held indirectly liable.

What now?

Over the past year, several rulings have addressed the issue of memorisation. The ruling in GEMA v. OpenAI is particularly interesting because it was issued by a court within the European Union. It should be noted, however, that the ruling comes from a lower German court and that an appeal is still pending. Since copyright law has only been partially harmonized within the European Union, there is a real chance that preliminary questions will ultimately be referred to the Court of Justice. It is therefore not yet certain that the ruling will be followed throughout the European Union.

Outside the European Union, courts have also ruled on the concept of “memorisation,” with varying results. In the United Kingdom, the court accepted that memorisation could in principle constitute an infringement, but the claim was rejected for lack of evidence. In the United States, memorisation was taken into account as a factor in the ‘fair use’ assessment. In China, a GenAI platform was held indirectly liable for infringing images.

All in all, case law seems to be gradually moving towards the view that memorisation can be a form of copyright infringement. In certain cases, this may also raise questions about the GenAI model as such, particularly when it structurally enables (virtually) literal reproductions. The likelihood of memorisation depends not so much on the size of the training dataset as on how frequently the work in question appears in the training data: a work that appears only incidentally in the dataset is less likely to be reproduced than a work that appears very often. GenAI models trained with large amounts of data therefore run a real risk of memorisation, unless the model ‘learns’ that it is not allowed to imitate works, through so-called alignment or guardrails. This poses technical challenges, however. In addition, the potential classification of memorisation as an infringement also has consequences for end users and professional customers who use and publish output. It is therefore important to stay well informed of the latest developments.
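For readers who want an intuition for why frequency in the training data matters, the effect can be illustrated with a deliberately simplistic toy model. This is not how a real GenAI system works (real models learn billions of statistical parameters, not raw counts), and all strings below are invented, but the mechanism is analogous: a phrase that recurs often in the training material dominates the learned statistics and is reproduced verbatim, while a phrase seen only once is not.

```python
from collections import Counter, defaultdict

def train_bigram(corpus_lines):
    # Count word-to-next-word transitions across all training lines.
    counts = defaultdict(Counter)
    for line in corpus_lines:
        words = line.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, prompt, max_words=10):
    # Greedy decoding: always continue with the most frequent next word.
    out = prompt.split()
    for _ in range(max_words):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# A "famous lyric" appears 50 times in the training data;
# a rare line appears only once among other material.
corpus = ["the sun will rise again tonight"] * 50
corpus += ["the rare verse nobody repeats"]
corpus += ["the quick answer", "the best option"] * 5

model = train_bigram(corpus)
print(generate(model, "the"))  # the frequent line comes back verbatim
```

In this sketch, prompting with “the” reproduces the oft-repeated line word for word, whereas the line seen only once is never regenerated, mirroring the observation that frequently occurring works face the highest memorisation risk.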

Discover our other blogs on intellectual property and dive deeper into this topic.
