2026-04-24 · 12 min read

EU AI Act Art.52: Obligations for Providers of General-Purpose AI Models — Documentation, Copyright Policy, and Transparency (2026)

Art.51 defines which GPAI models carry systemic risk. Article 52 sets the baseline obligations that apply to every GPAI model provider — regardless of whether the model crosses the 10²⁵ FLOPs threshold. If you develop and place a general-purpose AI model on the EU market, Art.52 applies from day one.

The obligations in Art.52 are foundational: technical documentation for regulatory oversight, a structured information package for downstream integrators, a demonstrable copyright compliance policy, a training data transparency summary, and — for models not qualifying for the open-source exemption — registration in the EU database. They are a floor, not a ceiling: providers of systemic-risk models face all of Art.52 plus the additional requirements in Art.53-56.

Who Art.52 Applies To

Art.52 addresses "providers of general-purpose AI models" as defined in Art.3(63): organisations that develop a GPAI model and place it on the EU market or put it into service.

Importantly, Art.52 obligations attach to the provider of the GPAI model, not to downstream deployers who build applications on top of it. A company using the Anthropic API or an open-source LLaMA checkpoint to build a customer service chatbot is a deployer, not a GPAI provider. The GPAI provider obligations flow up the supply chain to the organisations doing the foundational training.

The upstream-downstream distinction matters throughout Art.52. The documentation requirements in Art.52(1)-(2) are specifically structured to enable the information flow from GPAI providers to the downstream providers building AI systems on top of their models — a critical design feature for the EU AI Act's value chain compliance architecture.

Art.52(1): Technical Documentation — Annex XI

Art.52(1) requires GPAI providers to draw up technical documentation before placing the model on the EU market. This documentation must be kept up to date and made available to the AI Office and national competent authorities on request.

The content requirements are specified in Annex XI of the EU AI Act. Annex XI is divided into two parts depending on whether the documentation is for regulators or for downstream providers:

Annex XI, Part 1 — Technical Documentation for Authorities

The technical documentation for regulatory purposes must cover five areas:

- General model description
- Training specifications
- Architecture and parameters
- Evaluation and testing
- Post-market monitoring

Annex XI, Part 2 — Documentation for Downstream Providers

The second part of the Annex XI technical documentation covers information that providers must make available to downstream providers who integrate the GPAI model into AI systems:

The documentation update obligation: Art.52(1) requires documentation to be kept up to date. This means providers cannot treat Annex XI documentation as a one-time exercise — significant updates to model architecture, training data, fine-tuning approaches, or capability evaluations trigger an obligation to update the documentation before the updated model is re-released.
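This trigger can be made concrete as a change-classification gate in the release pipeline. The sketch below is illustrative: the change-category names mirror the examples in the paragraph above and are not terms defined by the Act.

```python
# Sketch: decide whether a model change requires an Annex XI documentation
# update before re-release. The category names are illustrative assumptions,
# not terminology from the Act.
DOC_UPDATE_TRIGGERS = {
    "architecture_change",
    "training_data_change",
    "fine_tuning_change",
    "capability_evaluation_change",
}

def requires_doc_update(changes: set[str]) -> bool:
    """True if any change falls in a category that obliges the provider
    to refresh the Annex XI technical documentation first."""
    return bool(changes & DOC_UPDATE_TRIGGERS)

requires_doc_update({"readme_typo_fix"})                  # False
requires_doc_update({"fine_tuning_change", "ui_change"})  # True
```

In practice the categories would be assigned during change review; the point is that documentation status becomes a release gate, not an afterthought.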

Art.52(2): Information for Downstream Providers — Annex XII

Art.52(2) creates a distinct obligation to provide information to downstream providers, beyond what is documented for regulatory purposes. The content requirements are specified in Annex XII, which covers the integration-facing information set:

Annex XII Requirements

- Usage documentation
- Training data summary
- Model capability profile
- Integration guidance

The Annex XII information package must be made available to downstream providers — this is an active disclosure obligation, not a documentation-for-regulators exercise. In practice, this means GPAI providers need a structured onboarding documentation set for developers integrating their models.

Art.52(3): Copyright Policy

Art.52(3) introduces a specific copyright obligation that goes beyond the general training data documentation:

Providers of general-purpose AI models shall put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive 2019/790.

What is Art.4(3) of Directive 2019/790? The EU Copyright in the Digital Single Market Directive (CDSM Directive) allows rightsholders to reserve their rights against text-and-data mining (TDM) — including AI training — by expressing a machine-readable opt-out. The technical implementation of this opt-out is standardised through robots.txt, HTTP headers, and the emerging TDMRep (TDM Reservation Protocol) standard.

What Art.52(3) requires:

  1. A documented copyright policy — not merely a statement of intent, but a documented operational procedure for how the provider identifies and responds to copyright reservations in training data.
  2. Technical implementation of opt-out detection — the policy must use "state-of-the-art technologies" to identify rights reservations expressed under Art.4(3) CDSM. This means automated detection of robots.txt directives, TDMRep declarations, and watermarks that express training opt-outs.
  3. Compliance with detected reservations — identifying a reservation is not sufficient; the policy must describe how the provider actually excludes or handles data where opt-out has been expressed.

The "state-of-the-art technologies" standard creates a dynamic obligation. As technical standards for expressing training data opt-outs evolve (TDMRep is still maturing as of 2026), providers are expected to adopt the current best technical approach to detecting those reservations. A copyright policy that was state-of-the-art in 2024 may not satisfy the standard in 2027.
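As a sketch of what automated detection can look like, the helper below checks already-fetched crawl metadata for a TDMRep reservation. The header and field names (`tdm-reservation`, the `/.well-known/tdmrep.json` file) follow the W3C TDMRep draft; treat them as assumptions to verify against the current version of the spec.

```python
import json
from typing import Optional

def tdm_reservation_detected(headers: dict[str, str],
                             tdmrep_json: Optional[str] = None) -> bool:
    """True if a TDM rights reservation is detected via the
    tdm-reservation HTTP header or a fetched tdmrep.json file."""
    # HTTP header field names are case-insensitive, so normalise first.
    normalised = {k.lower(): v.strip() for k, v in headers.items()}
    if normalised.get("tdm-reservation") == "1":
        return True
    if tdmrep_json:
        try:
            policies = json.loads(tdmrep_json)
        except json.JSONDecodeError:
            # Unparseable declaration: treat as reserved and flag for review.
            return True
        if isinstance(policies, list):
            # tdmrep.json lists location-scoped policies; a real pipeline
            # would match the fetched URL path against each "location".
            return any(str(p.get("tdm-reservation")) == "1"
                       for p in policies if isinstance(p, dict))
    return False
```

A production pipeline would run this per-URL during crawling and route positive hits into the exclusion procedure the policy documents.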

Copyright policy scope: The policy covers both the initial training data collection and any subsequent fine-tuning or continual learning that uses external data. Providers who run ongoing learning pipelines — updating models with data scraped post-release — need policies that cover both phases.

The copyright policy itself should be documented in writing, covering the data sources in scope, the opt-out detection protocols used, the exclusion procedure, and the handling of retrospective rightsholder claims.

Art.52(4): Training Data Summary

Art.52(4) requires providers to publish a sufficiently detailed summary of the training data used for training the GPAI model. This is a public-facing transparency obligation, distinct from the Annex XI technical documentation (which is for regulators) and the Annex XII information package (which is for downstream providers).

The summary's key role is its interaction with the copyright policy: it gives downstream providers and the public enough information to assess their exposure as potential rightsholders, while the copyright policy demonstrates how the provider handled opt-out obligations during data collection. Together, the two form the transparency layer for GPAI training data.

Art.52(5): Open-Source Model Exemptions

Art.52(5) provides a significant carve-out for GPAI models made available under a free and open-source licence.

What the open-source exemption covers: providers of GPAI models released under open-source licences are exempt from the Annex XI technical documentation (Art.52(1)), the Annex XII downstream information package (Art.52(2)) and, subject to conditions, the EU database registration (Art.52(6)).

What the open-source exemption does NOT cover: the copyright policy (Art.52(3)) and the training data summary (Art.52(4)) apply to every GPAI provider, open-source or not.

Conditions for the open-source exemption: The exemption applies when the GPAI model is made available with the parameters — including the weights — under a free and open-source licence that allows access, use, modification, and distribution of the model. A model released with weights but with usage restrictions that limit commercial use or impose deployment conditions may not qualify as truly open-source for this purpose.

Open-source models with systemic risk: Critically, Art.52(5) states that the open-source exemption does not apply to GPAI models classified as having systemic risk under Art.51. An open-source model that crosses the 10²⁵ FLOPs threshold — or is classified via Commission decision — must comply with the full Art.52 documentation requirements in addition to Art.53-56. The open-source exemption was deliberately designed not to create a loophole for the highest-risk models.

Open-Source Exemption in Practice

| Obligation | Commercial GPAI | Open-Source GPAI | Open-Source + Systemic Risk |
|---|---|---|---|
| Art.52(1) Annex XI technical docs | Required | Exempt | Required |
| Art.52(2) Annex XII downstream info | Required | Exempt | Required |
| Art.52(3) Copyright policy | Required | Required | Required |
| Art.52(4) Training data summary | Required | Required | Required |
| Art.52(6) Database registration | Required | Exempt (conditions) | Required |
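The matrix above reduces to a small lookup. The function below mirrors the table row by row; the obligation keys are shorthand labels, not terms from the Act.

```python
def art52_obligations(open_source: bool, systemic_risk: bool) -> dict[str, bool]:
    """Which Art.52 obligations apply, per the exemption matrix above.
    True = required, False = exempt."""
    # The open-source exemption falls away as soon as systemic risk applies.
    oss_exempt = open_source and not systemic_risk
    return {
        "annex_xi_technical_docs": not oss_exempt,    # Art.52(1)
        "annex_xii_downstream_info": not oss_exempt,  # Art.52(2)
        "copyright_policy": True,                     # Art.52(3) — no exemption
        "training_data_summary": True,                # Art.52(4) — no exemption
        "eu_database_registration": not oss_exempt,   # Art.52(6) — conditions apply
    }

art52_obligations(open_source=True, systemic_risk=False)["annex_xi_technical_docs"]  # False
art52_obligations(open_source=True, systemic_risk=True)["annex_xi_technical_docs"]   # True
```

The same logic appears inside the `GPAIComplianceManager` implementation later in this article; this standalone version is just the table made executable.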

Art.52(6): EU Database Registration

Art.52(6) requires GPAI providers to register their models in the EU database established under Art.71 before placing the model on the EU market. The registration record identifies the model and its provider; the specific information fields are defined by the Art.71 database framework.

Registration exemptions: Open-source GPAI models that are made available under genuinely open-source licences with full weights access are generally exempt from the Art.52(6) database registration requirement, subject to Commission guidance. The Art.71 database framework specifies which information fields are publicly accessible versus restricted to regulatory authorities.

Art.52 and the GPAI Value Chain

Art.52 is designed around the GPAI value chain: foundation model providers → downstream AI system providers → deployers → end users. The documentation obligations in Art.52(1) and (2) are built to support this chain.

Liability allocation: Art.52(2) creates a structured information handoff between GPAI providers and downstream providers. When a downstream provider builds a high-risk AI system using a GPAI model, their compliance documentation can rely on the Annex XII information provided by the GPAI provider. Conversely, if a GPAI provider fails to disclose a known limitation that leads to a downstream compliance failure, the GPAI provider bears responsibility for that documentation gap.
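On the receiving end, a downstream provider can gate integration on the completeness of that handoff. The required field names below are illustrative, derived from the Annex XII categories discussed in this article, not a normative list.

```python
# Sketch: a downstream provider's intake check before relying on a GPAI
# provider's Annex XII package. The field names are illustrative.
REQUIRED_ANNEX_XII_FIELDS = (
    "usage_instructions",
    "documented_limitations",
    "benchmark_results",
    "integration_guidance",
)

def annex_xii_gaps(package: dict) -> list[str]:
    """Return the names of required fields that are missing or empty."""
    return [f for f in REQUIRED_ANNEX_XII_FIELDS if not package.get(f)]

annex_xii_gaps({"usage_instructions": "...", "benchmark_results": {"mmlu": 0.8}})
# -> ["documented_limitations", "integration_guidance"]
```

A non-empty gap list is also useful evidence: it records exactly which disclosures the downstream provider requested and did not receive.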

Python Implementation: GPAIComplianceManager

from dataclasses import dataclass
from enum import Enum
from typing import Optional
import datetime


class GPAIModelType(Enum):
    COMMERCIAL = "commercial"
    OPEN_SOURCE = "open_source"
    OPEN_SOURCE_SYSTEMIC_RISK = "open_source_systemic_risk"


class CopyrightReservationProtocol(Enum):
    ROBOTS_TXT = "robots.txt"
    TDMREP = "TDMRep"
    HTTP_HEADER = "X-Robots-Tag"
    WATERMARK = "watermark"


@dataclass
class TrainingDataSource:
    name: str
    source_type: str  # web_scrape, licensed_dataset, synthetic, public_domain
    approximate_proportion_pct: float
    temporal_coverage_start: datetime.date
    temporal_coverage_end: datetime.date
    opt_out_checked: bool = False
    opt_out_protocol: Optional[CopyrightReservationProtocol] = None


@dataclass
class AnnexXIDocumentation:
    """Annex XI Part 1 — Technical Documentation for Authorities"""
    model_name: str
    model_version: str
    release_date: datetime.date
    intended_use_cases: list[str]
    excluded_use_cases: list[str]
    distribution_channels: list[str]
    training_compute_flops: float
    parameter_count: int
    architecture_description: str
    modalities: list[str]
    context_window_tokens: Optional[int]
    training_data_sources: list[TrainingDataSource]
    evaluation_results: dict[str, float]
    known_limitations: list[str]
    red_teaming_conducted: bool
    bias_assessment_conducted: bool
    post_market_monitoring_description: str
    regulatory_contact_email: str


@dataclass
class AnnexXIIInformation:
    """Annex XII — Information for Downstream Providers"""
    usage_instructions: str
    documented_limitations: list[str]
    api_specifications_url: str
    benchmark_results: dict[str, float]
    evaluated_capability_domains: list[str]
    unevaluated_domains: list[str]
    recommended_prompting_guidance: str
    known_prompting_edge_cases: list[str]
    fine_tuning_applied: bool
    fine_tuning_description: Optional[str]


@dataclass
class CopyrightCompliancePolicy:
    """Art.52(3) copyright policy implementation"""
    policy_version: str
    policy_date: datetime.date
    training_data_sources_covered: list[str]
    opt_out_detection_protocols: list[CopyrightReservationProtocol]
    exclusion_procedure: str
    policy_update_cadence_months: int
    retrospective_claim_procedure: str

    def check_compliance(self) -> dict[str, bool]:
        return {
            "documented_policy": bool(self.policy_version and self.policy_date),
            "covers_all_opt_out_protocols": len(self.opt_out_detection_protocols) >= 2,
            "exclusion_procedure_defined": bool(self.exclusion_procedure),
            "update_cadence_reasonable": self.policy_update_cadence_months <= 12,
            "retrospective_claims_addressed": bool(self.retrospective_claim_procedure),
        }


@dataclass
class TrainingDataSummary:
    """Art.52(4) public training data summary"""
    publication_url: str
    last_updated: datetime.date
    data_categories: list[dict]  # [{category, proportion_pct, source_type}]
    temporal_coverage: str
    languages_covered: list[str]
    total_tokens_approximate: Optional[str]
    data_quality_measures: list[str]
    copyright_compliance_reference: str

    def is_sufficiently_detailed(self) -> bool:
        return (
            len(self.data_categories) >= 3
            and bool(self.temporal_coverage)
            and len(self.languages_covered) >= 1
            and bool(self.copyright_compliance_reference)
        )


@dataclass
class GPAIComplianceManager:
    model_name: str
    model_type: GPAIModelType
    training_flops: float
    systemic_risk_classified: bool = False

    annex_xi_docs: Optional[AnnexXIDocumentation] = None
    annex_xii_info: Optional[AnnexXIIInformation] = None
    copyright_policy: Optional[CopyrightCompliancePolicy] = None
    training_data_summary: Optional[TrainingDataSummary] = None
    eu_database_registered: bool = False
    registration_date: Optional[datetime.date] = None

    SYSTEMIC_RISK_FLOPS_THRESHOLD = 1e25

    def __post_init__(self):
        if self.training_flops >= self.SYSTEMIC_RISK_FLOPS_THRESHOLD:
            self.systemic_risk_classified = True
        if (
            self.model_type == GPAIModelType.OPEN_SOURCE
            and self.systemic_risk_classified
        ):
            self.model_type = GPAIModelType.OPEN_SOURCE_SYSTEMIC_RISK

    def requires_annex_xi(self) -> bool:
        """Art.52(1): Commercial providers and OSS with systemic risk"""
        return self.model_type in (
            GPAIModelType.COMMERCIAL,
            GPAIModelType.OPEN_SOURCE_SYSTEMIC_RISK,
        )

    def requires_annex_xii(self) -> bool:
        """Art.52(2): Same as Annex XI scope"""
        return self.requires_annex_xi()

    def requires_copyright_policy(self) -> bool:
        """Art.52(3): All GPAI providers — no exemption"""
        return True

    def requires_training_data_summary(self) -> bool:
        """Art.52(4): All GPAI providers — no exemption"""
        return True

    def requires_eu_registration(self) -> bool:
        """Art.52(6): Commercial + OSS systemic risk; pure OSS exempt"""
        return self.model_type in (
            GPAIModelType.COMMERCIAL,
            GPAIModelType.OPEN_SOURCE_SYSTEMIC_RISK,
        )

    def compliance_status(self) -> dict[str, bool]:
        status = {}

        if self.requires_annex_xi():
            status["annex_xi_technical_docs"] = self.annex_xi_docs is not None
        else:
            status["annex_xi_technical_docs"] = True  # exempt

        if self.requires_annex_xii():
            status["annex_xii_downstream_info"] = self.annex_xii_info is not None
        else:
            status["annex_xii_downstream_info"] = True  # exempt

        status["copyright_policy"] = (
            self.copyright_policy is not None
            and all(self.copyright_policy.check_compliance().values())
        )

        status["training_data_summary"] = (
            self.training_data_summary is not None
            and self.training_data_summary.is_sufficiently_detailed()
        )

        if self.requires_eu_registration():
            status["eu_database_registration"] = (
                self.eu_database_registered and self.registration_date is not None
            )
        else:
            status["eu_database_registration"] = True  # exempt

        return status

    def compliance_report(self) -> str:
        status = self.compliance_status()
        lines = [
            f"Art.52 Compliance Report — {self.model_name}",
            f"Model type: {self.model_type.value}",
            f"Training FLOPs: {self.training_flops:.2e}",
            f"Systemic risk classified: {self.systemic_risk_classified}",
            "",
        ]
        for obligation, compliant in status.items():
            mark = "✓" if compliant else "✗"
            lines.append(f"  {mark} {obligation}")
        overall = all(status.values())
        lines.append(f"\nOverall Art.52 compliance: {'PASS' if overall else 'GAPS IDENTIFIED'}")
        return "\n".join(lines)


# Example: commercial GPAI provider assessment
model = GPAIComplianceManager(
    model_name="ExampleFoundationModel-v2",
    model_type=GPAIModelType.COMMERCIAL,
    training_flops=5e24,  # below 10^25, no systemic risk presumption
)

model.copyright_policy = CopyrightCompliancePolicy(
    policy_version="2.1",
    policy_date=datetime.date(2026, 1, 15),
    training_data_sources_covered=["web_scrape_2020_2025", "licensed_datasets_pool"],
    opt_out_detection_protocols=[
        CopyrightReservationProtocol.ROBOTS_TXT,
        CopyrightReservationProtocol.TDMREP,
        CopyrightReservationProtocol.HTTP_HEADER,
    ],
    exclusion_procedure="Automated flagging during crawl; flagged URLs excluded from training set within 48h of detection.",
    policy_update_cadence_months=6,
    retrospective_claim_procedure="Legal team review within 30 days; model retrained if claim substantiated.",
)

model.training_data_summary = TrainingDataSummary(
    publication_url="https://example-ai.com/model/training-data",
    last_updated=datetime.date(2026, 4, 1),
    data_categories=[
        {"category": "web_text", "proportion_pct": 65.0, "source_type": "web_scrape"},
        {"category": "books_licensed", "proportion_pct": 15.0, "source_type": "licensed_dataset"},
        {"category": "code", "proportion_pct": 12.0, "source_type": "web_scrape"},
        {"category": "scientific_papers", "proportion_pct": 5.0, "source_type": "public_domain"},
        {"category": "synthetic_data", "proportion_pct": 3.0, "source_type": "synthetic"},
    ],
    temporal_coverage="Web data: 2016-2025; Books: 2000-2024",
    languages_covered=["en", "de", "fr", "es", "it", "nl", "pl", "pt"],
    total_tokens_approximate="2.1 trillion tokens",
    data_quality_measures=["deduplication", "toxicity_filtering", "quality_classification"],
    copyright_compliance_reference="https://example-ai.com/legal/copyright-policy",
)

print(model.compliance_report())

Art.52 Compliance Checklist

The checklist spans six areas:

- Annex XI technical documentation (Art.52(1)) — commercial and systemic-risk OSS providers
- Annex XII downstream provider information (Art.52(2)) — commercial and systemic-risk OSS providers
- Copyright policy (Art.52(3)) — all GPAI providers
- Training data summary (Art.52(4)) — all GPAI providers
- EU database registration (Art.52(6)) — commercial and systemic-risk OSS providers
- Open-source exemption conditions — open-source providers

Interaction with Art.53-56

Art.52 creates the baseline for all GPAI providers. Providers whose models are classified as having systemic risk under Art.51 must additionally comply with the systemic-risk obligations set out in Art.53-56.

Art.52 obligations run in parallel — providers with systemic risk do not graduate from Art.52; they comply with Art.52 plus Art.53-56.
