2026-04-24 · 12 min read

EU AI Act Art.52: Obligations for Providers of General-Purpose AI Models — Documentation, Copyright Policy, and Transparency (2026)

Art.51 defines which GPAI models carry systemic risk. Article 52 sets the baseline obligations that apply to every GPAI model provider — regardless of whether the model crosses the 10²⁵ FLOPs threshold. If you develop and place a general-purpose AI model on the EU market, Art.52 applies from day one.

The obligations in Art.52 are foundational: technical documentation for regulatory oversight, a structured information package for downstream integrators, a demonstrable copyright compliance policy, a training data transparency summary, and — for models not qualifying for the open-source exemption — registration in the EU database. They are a floor, not a ceiling: providers of systemic-risk models face all of Art.52 plus the additional requirements in Art.53-56.

Who Art.52 Applies To

Art.52 addresses "providers of general-purpose AI models" as defined in Art.3(63): organisations that develop a GPAI model and place it on the EU market or put it into service.

Importantly, Art.52 obligations attach to the provider of the GPAI model, not to downstream deployers who build applications on top of it. A company using the Anthropic API or an open-source LLaMA checkpoint to build a customer service chatbot is a deployer, not a GPAI provider. The GPAI provider obligations flow up the supply chain to the organisations doing the foundational training.

The upstream-downstream distinction matters throughout Art.52. The documentation requirements in Art.52(1)-(2) are specifically structured to enable the information flow from GPAI providers to the downstream providers building AI systems on top of their models — a critical design feature for the EU AI Act's value chain compliance architecture.

Art.52(1): Technical Documentation — Annex XI

Art.52(1) requires GPAI providers to draw up technical documentation before placing the model on the EU market. This documentation must be kept up to date and made available to the AI Office and national competent authorities on request.

The content requirements are specified in Annex XI of the EU AI Act. Annex XI is divided into two parts depending on whether the documentation is for regulators or for downstream providers:

Annex XI, Part 1 — Technical Documentation for Authorities

The technical documentation for regulatory purposes must cover five areas:

- General model description
- Training specifications
- Architecture and parameters
- Evaluation and testing
- Post-market monitoring

Annex XI, Part 2 — Documentation for Downstream Providers

The second part of the Annex XI technical documentation covers information that providers must make available to downstream providers who integrate the GPAI model into AI systems:

The documentation update obligation: Art.52(1) requires documentation to be kept up to date. This means providers cannot treat Annex XI documentation as a one-time exercise — significant updates to model architecture, training data, fine-tuning approaches, or capability evaluations trigger an obligation to update the documentation before the updated model is re-released.
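This trigger can be made concrete as a change-classification gate in the release pipeline. The sketch below is illustrative: the change-category names mirror the examples in the paragraph above and are not terms defined by the Act.

```python
# Sketch: decide whether a model change requires an Annex XI documentation
# update before re-release. The category names are illustrative assumptions,
# not terminology from the Act.
DOC_UPDATE_TRIGGERS = {
    "architecture_change",
    "training_data_change",
    "fine_tuning_change",
    "capability_evaluation_change",
}

def requires_doc_update(changes: set[str]) -> bool:
    """True if any change falls in a category that obliges the provider
    to refresh the Annex XI technical documentation first."""
    return bool(changes & DOC_UPDATE_TRIGGERS)

requires_doc_update({"readme_typo_fix"})                  # False
requires_doc_update({"fine_tuning_change", "ui_change"})  # True
```

In practice the categories would be assigned during change review; the point is that documentation status becomes a release gate, not an afterthought.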

Art.52(2): Information for Downstream Providers — Annex XII

Art.52(2) creates a distinct obligation to provide information to downstream providers, beyond what is documented for regulatory purposes. The content requirements are specified in Annex XII, which covers the integration-facing information set:

Annex XII Requirements

- Usage documentation
- Training data summary
- Model capability profile
- Integration guidance

The Annex XII information package must be made available to downstream providers — this is an active disclosure obligation, not a documentation-for-regulators exercise. In practice, this means GPAI providers need a structured onboarding documentation set for developers integrating their models.

Art.52(3): Copyright Policy

Art.52(3) introduces a specific copyright obligation that goes beyond the general training data documentation:

Providers of general-purpose AI models shall put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive 2019/790.

What is Art.4(3) of Directive 2019/790? The EU Copyright in the Digital Single Market Directive (CDSM Directive) allows rightsholders to reserve their rights against text-and-data mining (TDM) — including AI training — by expressing a machine-readable opt-out. The technical implementation of this opt-out is standardised through robots.txt, HTTP headers, and the emerging TDMRep (TDM Reservation Protocol) standard.

What Art.52(3) requires:

  1. A documented copyright policy — not merely a statement of intent, but a documented operational procedure for how the provider identifies and responds to copyright reservations in training data.
  2. Technical implementation of opt-out detection — the policy must use "state-of-the-art technologies" to identify rights reservations expressed under Art.4(3) CDSM. This means automated detection of robots.txt directives, TDMRep declarations, and watermarks that express training opt-outs.
  3. Compliance with detected reservations — identifying a reservation is not sufficient; the policy must describe how the provider actually excludes or handles data where opt-out has been expressed.

The "state-of-the-art technologies" standard creates a dynamic obligation. As technical standards for expressing training data opt-outs evolve (TDMRep is still maturing as of 2026), providers are expected to adopt the current best technical approach to detecting those reservations. A copyright policy that was state-of-the-art in 2024 may not satisfy the standard in 2027.
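As a sketch of what automated detection can look like, the helper below checks already-fetched crawl metadata for a TDMRep reservation. The header and field names (`tdm-reservation`, the `/.well-known/tdmrep.json` file) follow the W3C TDMRep draft; treat them as assumptions to verify against the current version of the spec.

```python
import json
from typing import Optional

def tdm_reservation_detected(headers: dict[str, str],
                             tdmrep_json: Optional[str] = None) -> bool:
    """True if a TDM rights reservation is detected via the
    tdm-reservation HTTP header or a fetched tdmrep.json file."""
    # HTTP header field names are case-insensitive, so normalise first.
    normalised = {k.lower(): v.strip() for k, v in headers.items()}
    if normalised.get("tdm-reservation") == "1":
        return True
    if tdmrep_json:
        try:
            policies = json.loads(tdmrep_json)
        except json.JSONDecodeError:
            # Unparseable declaration: treat as reserved and flag for review.
            return True
        if isinstance(policies, list):
            # tdmrep.json lists location-scoped policies; a real pipeline
            # would match the fetched URL path against each "location".
            return any(str(p.get("tdm-reservation")) == "1"
                       for p in policies if isinstance(p, dict))
    return False
```

A production pipeline would run this per-URL during crawling and route positive hits into the exclusion procedure the policy documents.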

Copyright policy scope: The policy covers both the initial training data collection and any subsequent fine-tuning or continual learning that uses external data. Providers who run ongoing learning pipelines — updating models with data scraped post-release — need policies that cover both phases.

The copyright policy itself should be documented in writing, covering the data sources in scope, the opt-out detection protocols used, the exclusion procedure, and the handling of retrospective rightsholder claims.

Art.52(4): Training Data Summary

Art.52(4) requires providers to publish a sufficiently detailed summary of the training data used for training the GPAI model. This is a public-facing transparency obligation, distinct from the Annex XI technical documentation (which is for regulators) and the Annex XII information package (which is for downstream providers).

The summary's key role is its interaction with the copyright policy: it gives downstream providers and the public enough information to assess their exposure as potential rightsholders, while the copyright policy demonstrates how the provider handled opt-out obligations during data collection. Together, the two form the transparency layer for GPAI training data.

Art.52(5): Open-Source Model Exemptions

Art.52(5) provides a significant carve-out for GPAI models made available under a free and open-source licence.

What the open-source exemption covers: providers of GPAI models released under open-source licences are exempt from the Annex XI technical documentation (Art.52(1)), the Annex XII downstream information package (Art.52(2)) and, subject to conditions, the EU database registration (Art.52(6)).

What the open-source exemption does NOT cover: the copyright policy (Art.52(3)) and the training data summary (Art.52(4)) apply to every GPAI provider, open-source or not.

Conditions for the open-source exemption: The exemption applies when the GPAI model is made available with the parameters — including the weights — under a free and open-source licence that allows access, use, modification, and distribution of the model. A model released with weights but with usage restrictions that limit commercial use or impose deployment conditions may not qualify as truly open-source for this purpose.

Open-source models with systemic risk: Critically, Art.52(5) states that the open-source exemption does not apply to GPAI models classified as having systemic risk under Art.51. An open-source model that crosses the 10²⁵ FLOPs threshold — or is classified via Commission decision — must comply with the full Art.52 documentation requirements in addition to Art.53-56. The open-source exemption was deliberately designed not to create a loophole for the highest-risk models.

Open-Source Exemption in Practice

| Obligation | Commercial GPAI | Open-Source GPAI | Open-Source + Systemic Risk |
|---|---|---|---|
| Art.52(1) Annex XI technical docs | Required | Exempt | Required |
| Art.52(2) Annex XII downstream info | Required | Exempt | Required |
| Art.52(3) Copyright policy | Required | Required | Required |
| Art.52(4) Training data summary | Required | Required | Required |
| Art.52(6) Database registration | Required | Exempt (conditions) | Required |
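The matrix above reduces to a small lookup. The function below mirrors the table row by row; the obligation keys are shorthand labels, not terms from the Act.

```python
def art52_obligations(open_source: bool, systemic_risk: bool) -> dict[str, bool]:
    """Which Art.52 obligations apply, per the exemption matrix above.
    True = required, False = exempt."""
    # The open-source exemption falls away as soon as systemic risk applies.
    oss_exempt = open_source and not systemic_risk
    return {
        "annex_xi_technical_docs": not oss_exempt,    # Art.52(1)
        "annex_xii_downstream_info": not oss_exempt,  # Art.52(2)
        "copyright_policy": True,                     # Art.52(3) — no exemption
        "training_data_summary": True,                # Art.52(4) — no exemption
        "eu_database_registration": not oss_exempt,   # Art.52(6) — conditions apply
    }

art52_obligations(open_source=True, systemic_risk=False)["annex_xi_technical_docs"]  # False
art52_obligations(open_source=True, systemic_risk=True)["annex_xi_technical_docs"]   # True
```

The same logic appears inside the `GPAIComplianceManager` implementation later in this article; this standalone version is just the table made executable.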

Art.52(6): EU Database Registration

Art.52(6) requires GPAI providers to register their models in the EU database established under Art.71 before placing the model on the EU market. The registration record identifies the model and its provider; the specific information fields are defined by the Art.71 database framework.

Registration exemptions: Open-source GPAI models that are made available under genuinely open-source licences with full weights access are generally exempt from the Art.52(6) database registration requirement, subject to Commission guidance. The Art.71 database framework specifies which information fields are publicly accessible versus restricted to regulatory authorities.

Art.52 and the GPAI Value Chain

Art.52 is designed around the GPAI value chain: foundation model providers → downstream AI system providers → deployers → end users. The documentation obligations in Art.52(1) and (2) are built to support this chain.

Liability allocation: Art.52(2) creates a structured information handoff between GPAI providers and downstream providers. When a downstream provider builds a high-risk AI system using a GPAI model, their compliance documentation can rely on the Annex XII information provided by the GPAI provider. Conversely, if a GPAI provider fails to disclose a known limitation that leads to a downstream compliance failure, the GPAI provider bears responsibility for that documentation gap.
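On the receiving end, a downstream provider can gate integration on the completeness of that handoff. The required field names below are illustrative, derived from the Annex XII categories discussed in this article, not a normative list.

```python
# Sketch: a downstream provider's intake check before relying on a GPAI
# provider's Annex XII package. The field names are illustrative.
REQUIRED_ANNEX_XII_FIELDS = (
    "usage_instructions",
    "documented_limitations",
    "benchmark_results",
    "integration_guidance",
)

def annex_xii_gaps(package: dict) -> list[str]:
    """Return the names of required fields that are missing or empty."""
    return [f for f in REQUIRED_ANNEX_XII_FIELDS if not package.get(f)]

annex_xii_gaps({"usage_instructions": "...", "benchmark_results": {"mmlu": 0.8}})
# -> ["documented_limitations", "integration_guidance"]
```

A non-empty gap list is also useful evidence: it records exactly which disclosures the downstream provider requested and did not receive.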

Python Implementation: GPAIComplianceManager

from dataclasses import dataclass
from enum import Enum
from typing import Optional
import datetime


class GPAIModelType(Enum):
    COMMERCIAL = "commercial"
    OPEN_SOURCE = "open_source"
    OPEN_SOURCE_SYSTEMIC_RISK = "open_source_systemic_risk"


class CopyrightReservationProtocol(Enum):
    ROBOTS_TXT = "robots.txt"
    TDMREP = "TDMRep"
    HTTP_HEADER = "X-Robots-Tag"
    WATERMARK = "watermark"


@dataclass
class TrainingDataSource:
    name: str
    source_type: str  # web_scrape, licensed_dataset, synthetic, public_domain
    approximate_proportion_pct: float
    temporal_coverage_start: datetime.date
    temporal_coverage_end: datetime.date
    opt_out_checked: bool = False
    opt_out_protocol: Optional[CopyrightReservationProtocol] = None


@dataclass
class AnnexXIDocumentation:
    """Annex XI Part 1 — Technical Documentation for Authorities"""
    model_name: str
    model_version: str
    release_date: datetime.date
    intended_use_cases: list[str]
    excluded_use_cases: list[str]
    distribution_channels: list[str]
    training_compute_flops: float
    parameter_count: int
    architecture_description: str
    modalities: list[str]
    context_window_tokens: Optional[int]
    training_data_sources: list[TrainingDataSource]
    evaluation_results: dict[str, float]
    known_limitations: list[str]
    red_teaming_conducted: bool
    bias_assessment_conducted: bool
    post_market_monitoring_description: str
    regulatory_contact_email: str


@dataclass
class AnnexXIIInformation:
    """Annex XII — Information for Downstream Providers"""
    usage_instructions: str
    documented_limitations: list[str]
    api_specifications_url: str
    benchmark_results: dict[str, float]
    evaluated_capability_domains: list[str]
    unevaluated_domains: list[str]
    recommended_prompting_guidance: str
    known_prompting_edge_cases: list[str]
    fine_tuning_applied: bool
    fine_tuning_description: Optional[str]


@dataclass
class CopyrightCompliancePolicy:
    """Art.52(3) copyright policy implementation"""
    policy_version: str
    policy_date: datetime.date
    training_data_sources_covered: list[str]
    opt_out_detection_protocols: list[CopyrightReservationProtocol]
    exclusion_procedure: str
    policy_update_cadence_months: int
    retrospective_claim_procedure: str

    def check_compliance(self) -> dict[str, bool]:
        return {
            "documented_policy": bool(self.policy_version and self.policy_date),
            "covers_all_opt_out_protocols": len(self.opt_out_detection_protocols) >= 2,
            "exclusion_procedure_defined": bool(self.exclusion_procedure),
            "update_cadence_reasonable": self.policy_update_cadence_months <= 12,
            "retrospective_claims_addressed": bool(self.retrospective_claim_procedure),
        }


@dataclass
class TrainingDataSummary:
    """Art.52(4) public training data summary"""
    publication_url: str
    last_updated: datetime.date
    data_categories: list[dict]  # [{category, proportion_pct, source_type}]
    temporal_coverage: str
    languages_covered: list[str]
    total_tokens_approximate: Optional[str]
    data_quality_measures: list[str]
    copyright_compliance_reference: str

    def is_sufficiently_detailed(self) -> bool:
        return (
            len(self.data_categories) >= 3
            and bool(self.temporal_coverage)
            and len(self.languages_covered) >= 1
            and bool(self.copyright_compliance_reference)
        )


@dataclass
class GPAIComplianceManager:
    model_name: str
    model_type: GPAIModelType
    training_flops: float
    systemic_risk_classified: bool = False

    annex_xi_docs: Optional[AnnexXIDocumentation] = None
    annex_xii_info: Optional[AnnexXIIInformation] = None
    copyright_policy: Optional[CopyrightCompliancePolicy] = None
    training_data_summary: Optional[TrainingDataSummary] = None
    eu_database_registered: bool = False
    registration_date: Optional[datetime.date] = None

    SYSTEMIC_RISK_FLOPS_THRESHOLD = 1e25

    def __post_init__(self):
        if self.training_flops >= self.SYSTEMIC_RISK_FLOPS_THRESHOLD:
            self.systemic_risk_classified = True
        if (
            self.model_type == GPAIModelType.OPEN_SOURCE
            and self.systemic_risk_classified
        ):
            self.model_type = GPAIModelType.OPEN_SOURCE_SYSTEMIC_RISK

    def requires_annex_xi(self) -> bool:
        """Art.52(1): Commercial providers and OSS with systemic risk"""
        return self.model_type in (
            GPAIModelType.COMMERCIAL,
            GPAIModelType.OPEN_SOURCE_SYSTEMIC_RISK,
        )

    def requires_annex_xii(self) -> bool:
        """Art.52(2): Same as Annex XI scope"""
        return self.requires_annex_xi()

    def requires_copyright_policy(self) -> bool:
        """Art.52(3): All GPAI providers — no exemption"""
        return True

    def requires_training_data_summary(self) -> bool:
        """Art.52(4): All GPAI providers — no exemption"""
        return True

    def requires_eu_registration(self) -> bool:
        """Art.52(6): Commercial + OSS systemic risk; pure OSS exempt"""
        return self.model_type in (
            GPAIModelType.COMMERCIAL,
            GPAIModelType.OPEN_SOURCE_SYSTEMIC_RISK,
        )

    def compliance_status(self) -> dict[str, bool]:
        status = {}

        if self.requires_annex_xi():
            status["annex_xi_technical_docs"] = self.annex_xi_docs is not None
        else:
            status["annex_xi_technical_docs"] = True  # exempt

        if self.requires_annex_xii():
            status["annex_xii_downstream_info"] = self.annex_xii_info is not None
        else:
            status["annex_xii_downstream_info"] = True  # exempt

        status["copyright_policy"] = (
            self.copyright_policy is not None
            and all(self.copyright_policy.check_compliance().values())
        )

        status["training_data_summary"] = (
            self.training_data_summary is not None
            and self.training_data_summary.is_sufficiently_detailed()
        )

        if self.requires_eu_registration():
            status["eu_database_registration"] = (
                self.eu_database_registered and self.registration_date is not None
            )
        else:
            status["eu_database_registration"] = True  # exempt

        return status

    def compliance_report(self) -> str:
        status = self.compliance_status()
        lines = [
            f"Art.52 Compliance Report — {self.model_name}",
            f"Model type: {self.model_type.value}",
            f"Training FLOPs: {self.training_flops:.2e}",
            f"Systemic risk classified: {self.systemic_risk_classified}",
            "",
        ]
        for obligation, compliant in status.items():
            mark = "✓" if compliant else "✗"
            lines.append(f"  {mark} {obligation}")
        overall = all(status.values())
        lines.append(f"\nOverall Art.52 compliance: {'PASS' if overall else 'GAPS IDENTIFIED'}")
        return "\n".join(lines)


# Example: commercial GPAI provider assessment
model = GPAIComplianceManager(
    model_name="ExampleFoundationModel-v2",
    model_type=GPAIModelType.COMMERCIAL,
    training_flops=5e24,  # below 10^25, no systemic risk presumption
)

model.copyright_policy = CopyrightCompliancePolicy(
    policy_version="2.1",
    policy_date=datetime.date(2026, 1, 15),
    training_data_sources_covered=["web_scrape_2020_2025", "licensed_datasets_pool"],
    opt_out_detection_protocols=[
        CopyrightReservationProtocol.ROBOTS_TXT,
        CopyrightReservationProtocol.TDMREP,
        CopyrightReservationProtocol.HTTP_HEADER,
    ],
    exclusion_procedure="Automated flagging during crawl; flagged URLs excluded from training set within 48h of detection.",
    policy_update_cadence_months=6,
    retrospective_claim_procedure="Legal team review within 30 days; model retrained if claim substantiated.",
)

model.training_data_summary = TrainingDataSummary(
    publication_url="https://example-ai.com/model/training-data",
    last_updated=datetime.date(2026, 4, 1),
    data_categories=[
        {"category": "web_text", "proportion_pct": 65.0, "source_type": "web_scrape"},
        {"category": "books_licensed", "proportion_pct": 15.0, "source_type": "licensed_dataset"},
        {"category": "code", "proportion_pct": 12.0, "source_type": "web_scrape"},
        {"category": "scientific_papers", "proportion_pct": 5.0, "source_type": "public_domain"},
        {"category": "synthetic_data", "proportion_pct": 3.0, "source_type": "synthetic"},
    ],
    temporal_coverage="Web data: 2016-2025; Books: 2000-2024",
    languages_covered=["en", "de", "fr", "es", "it", "nl", "pl", "pt"],
    total_tokens_approximate="2.1 trillion tokens",
    data_quality_measures=["deduplication", "toxicity_filtering", "quality_classification"],
    copyright_compliance_reference="https://example-ai.com/legal/copyright-policy",
)

print(model.compliance_report())

Art.52 Compliance Checklist

The checklist spans six areas:

- Annex XI technical documentation (Art.52(1)) — commercial and systemic-risk OSS providers
- Annex XII downstream provider information (Art.52(2)) — commercial and systemic-risk OSS providers
- Copyright policy (Art.52(3)) — all GPAI providers
- Training data summary (Art.52(4)) — all GPAI providers
- EU database registration (Art.52(6)) — commercial and systemic-risk OSS providers
- Open-source exemption conditions — open-source providers

Interaction with Art.53-56

Art.52 creates the baseline for all GPAI providers. Providers whose models are classified as having systemic risk under Art.51 must additionally comply with the systemic-risk obligations set out in Art.53-56.

Art.52 obligations run in parallel — providers with systemic risk do not graduate from Art.52; they comply with Art.52 plus Art.53-56.
