2026-04-16 · 12 min read

EU AI Act Art.52 GPAI Model General Obligations: Technical Documentation, Training Data & Copyright — Developer Guide (2026)

EU AI Act Chapter V imposes a tiered obligation structure on General-Purpose AI (GPAI) model providers. Article 51 classifies GPAI models into two tiers based on training compute — general GPAI models and GPAI models with systemic risk. Article 52 establishes the baseline obligations that apply to every GPAI model provider regardless of tier. These are not optional enhancements: they are the minimum compliance floor for any organisation that trains, fine-tunes, or distributes a GPAI model in or into the EU market.

Chapter V of the EU AI Act became applicable on 2 August 2025, a year ahead of the August 2026 date on which most of the Act's remaining provisions apply. Art.52 obligations are therefore already in force for GPAI model providers. If you train a foundation model, a large language model, a multimodal model, or any other model meeting the GPAI definition, Art.52 compliance is not a future concern: it is a present legal obligation.

For downstream SaaS developers integrating GPAI APIs into products, Art.52 is the regulatory basis for the technical documentation and model cards that upstream GPAI providers must supply. Art.55 flows Art.52 documentation obligations downstream: your GPAI API provider must give you the model card and training data summary that Art.52 requires them to maintain. Understanding Art.52 helps you understand what you are contractually entitled to demand from your GPAI provider and what obligations flow to you when they supply it.


Art.52 in the EU AI Act Structure

Art.52 is the second article of Chapter V, General-Purpose AI Models (Art.51–56). It provides the compliance baseline: the obligations that exist independently of systemic risk classification.

| Article | Title | Who It Applies To |
| --- | --- | --- |
| Art.51 | GPAI model classification | Defines who is a GPAI provider and which tier |
| Art.52 | General GPAI model obligations | All GPAI model providers (both tiers) |
| Art.53 | Systemic risk GPAI obligations | Systemic risk tier only (adversarial testing, incident reporting, cybersecurity) |
| Art.54 | Authorised representative | Non-EU systemic risk providers only |
| Art.55 | Downstream provider obligations | All GPAI providers: obligations to downstream integrators |
| Art.56 | Code of practice | Systemic risk tier: compliance pathway |

Art.52 sets out six baseline obligations:

  1. Art.52(1)(a) — Draw up, keep up-to-date, and make available to the Commission technical documentation covering model architecture, training methodology, purposes, and capabilities and limitations
  2. Art.52(1)(a)(i) — Include in technical documentation training data transparency covering the types and sources of training data and, where applicable, geographic coverage and quality assessment
  3. Art.52(1)(a)(ii) — Include in technical documentation a copyright compliance policy and a summary of the content used for training
  4. Art.52(1)(b) — Draw up and make available to downstream providers an information document (model card) in machine-readable format
  5. Art.52(2) — Upon request, provide technical documentation to the Commission and national authorities
  6. Art.52(3) — Register and publish a summary of training data for public transparency purposes (in the EU database under Art.71)

Art.52(1)(a): Technical Documentation for GPAI Models

Art.52(1)(a) requires every GPAI provider to draw up and maintain technical documentation before placing the model on the EU market. This documentation is distinct from and additional to the Annex IV technical documentation required for high-risk AI systems — it is GPAI-specific and governed by Annex XI.

Annex XI mandatory content elements for GPAI technical documentation:

| Element | Mandatory Content | Practical Implication |
| --- | --- | --- |
| Model architecture | Type of architecture, number of parameters, context window size, modality (text/image/audio/code/multimodal) | Must be documented in the technical file before market placement |
| Training methodology | Pre-training approach (e.g. self-supervised, masked language modelling), fine-tuning steps, alignment procedures (e.g. RLHF, Constitutional AI) | The full training pipeline must be described, not just the final training run |
| Intended purposes | Tasks the model is designed to perform, target use cases, deployment scenarios described in the provider's documentation | The documentation must reflect what the model is marketed and deployed for |
| Capabilities | Performance on standard benchmarks relevant to the model's domain, demonstrated capabilities across task categories | Must be current: if capabilities are updated, the documentation must be updated |
| Limitations | Known failure modes, hallucination rates, bias and fairness assessments, safety benchmarks, capability limitations | Limitations documentation is mandatory, not optional transparency |
| Performance evaluation | Results of internal testing and third-party evaluations, including adversarial probes for general GPAI models | For systemic risk models, this is enhanced by Art.53(1)(b) adversarial testing requirements |

Documentation update obligation:

Art.52(1) requires that documentation is "kept up-to-date." This creates a continuous obligation — not just a one-time pre-market filing. Significant updates trigger documentation revision requirements:

| Update Type | Documentation Obligation |
| --- | --- |
| New pre-training run (continued training) | Update training methodology + cumulative FLOPs + new capability assessments |
| Fine-tuning update released publicly | Update intended purposes, capabilities, and limitations sections |
| Safety patch or alignment update | Update training methodology and limitations sections |
| New modality added (e.g., adding image understanding to text model) | Full documentation revision as new capabilities are added |
| Benchmark performance change | Update capabilities and performance evaluation sections |
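
The update rules above can be encoded as a lookup so that a release pipeline flags which documentation sections need revision. The following is a minimal sketch under stated assumptions: the `UpdateType` enum, the section labels, and `sections_to_revise` are hypothetical names introduced here for illustration, not terms defined by the Act.

```python
from enum import Enum


class UpdateType(Enum):
    CONTINUED_PRETRAINING = "continued_pretraining"
    PUBLIC_FINE_TUNE = "public_fine_tune"
    SAFETY_OR_ALIGNMENT_PATCH = "safety_or_alignment_patch"
    NEW_MODALITY = "new_modality"
    BENCHMARK_CHANGE = "benchmark_change"


# Illustrative section labels (assumption): one per documentation element.
ALL_SECTIONS = frozenset({
    "model_architecture", "training_methodology", "intended_purposes",
    "capabilities", "limitations", "performance_evaluation", "cumulative_flops",
})

# Mirrors the update table: each update type maps to the sections it touches.
SECTIONS_TO_REVISE = {
    UpdateType.CONTINUED_PRETRAINING: frozenset(
        {"training_methodology", "cumulative_flops", "capabilities"}),
    UpdateType.PUBLIC_FINE_TUNE: frozenset(
        {"intended_purposes", "capabilities", "limitations"}),
    UpdateType.SAFETY_OR_ALIGNMENT_PATCH: frozenset(
        {"training_methodology", "limitations"}),
    UpdateType.NEW_MODALITY: ALL_SECTIONS,  # full documentation revision
    UpdateType.BENCHMARK_CHANGE: frozenset(
        {"capabilities", "performance_evaluation"}),
}


def sections_to_revise(updates):
    """Return the sorted union of sections touched by the given updates."""
    result = set()
    for update in updates:
        result |= SECTIONS_TO_REVISE[update]
    return sorted(result)


# Example: a release that bundles a safety patch with new benchmark results.
print(sections_to_revise(
    [UpdateType.SAFETY_OR_ALIGNMENT_PATCH, UpdateType.BENCHMARK_CHANGE]))
```

Keeping the mapping as data rather than branching logic makes it easy to review against the table when the guidance changes.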

Art.52(1)(a)(i): Training Data Transparency

Art.52(1)(a)(i) requires that the technical documentation includes transparency about training data. This is one of the most commercially sensitive provisions of Art.52 because it requires disclosing information that model providers have historically kept confidential for competitive reasons.

Mandatory training data disclosure elements:

| Element | What Must Be Disclosed | Notes |
| --- | --- | --- |
| Types of data | Categories of data used (web text, books, code, scientific papers, conversation data, images, audio, synthetic data, etc.) | Type-level disclosure: not specific dataset names in all cases |
| Sources of data | Origin of training datasets (Common Crawl, licensed datasets, proprietary data collection, partnerships) | Source-level disclosure: includes whether data was scraped, licensed, or created |
| Geographic coverage | Languages represented, countries of origin of content creators, geographic distribution of training corpus | Relevant for assessing cultural bias and multilingual capability representation |
| Quality assessment | Filtering procedures applied, deduplication methods, quality scoring mechanisms, NSFW/toxicity filtering | Demonstrates diligence in training data curation; relevant to EU copyright and fundamental rights compliance |

Sensitive disclosures — proportionality principle:

Art.52(1)(a)(i) requires disclosure of training data information to the Commission (Art.52(2)) and in summary form publicly (Art.52(3)), but trade secret protection applies to the confidential portions. The regulation recognises the resulting tension between transparency and confidentiality:

| Information Category | Disclosure to Commission | Public Summary (Art.52(3)) |
| --- | --- | --- |
| Training data types | Full disclosure | Required summary |
| Training data sources (general) | Full disclosure | Required summary |
| Specific proprietary dataset names and compositions | Subject to trade secret protection | Aggregate categories only |
| Dataset licensing terms and costs | Trade secret protection applies | Not required in public summary |
| Synthetic data generation methodology | Full disclosure | Summary |
| Geographic distribution statistics | Full disclosure | Required summary |

Bias and fairness dimension of training data transparency:

The geographic coverage and quality assessment requirements implicitly address bias, because they record which languages and regions the training corpus represents and how it was filtered. This creates an audit trail that regulators, and downstream providers claiming Art.55 rights, can use to evaluate whether the model's capabilities and limitations documentation accurately reflects its training data composition.


Art.52(1)(a)(ii): Copyright Compliance Policy and Training Content Summary

Art.52(1)(a)(ii) is one of the most legally consequential provisions of Art.52. It requires every GPAI provider to:

  1. Have and document a copyright compliance policy
  2. Prepare a summary of the content used for training that enables rights holders to exercise their rights

Why Art.52(1)(a)(ii) exists:

The provision intersects with the EU Digital Single Market Directive (DSMD) 2019/790, which created the Text and Data Mining (TDM) exception at Articles 3 and 4. The DSMD framework:

| DSMD Provision | What It Does | Art.52(1)(a)(ii) Connection |
| --- | --- | --- |
| Art.3 DSMD | Mandatory TDM exception for research organisations; cannot be overridden by contract | Research TDM use is presumptively lawful; Art.52 documentation confirms compliance |
| Art.4 DSMD | General TDM exception, but rights holders can opt out using machine-readable reservation | Art.52(1)(a)(ii) requires providers to document how they handle opt-outs |
| Art.4(3) DSMD | Rights holders may reserve TDM rights using machine-readable means | The copyright compliance policy must address how reserved rights are detected and respected |

Copyright compliance policy — mandatory content:

| Policy Element | Required Content |
| --- | --- |
| TDM opt-out detection | How the provider checks for robots.txt TDM opt-outs, tdmrep.json entries, and other machine-readable reservations |
| Licensed content | Categories of content acquired under licence agreements and the licences held |
| Public domain and open licence content | How CC0, CC-BY, CC-BY-SA, and similar open-licensed content is handled |
| Lawful web scraping | Legal basis for scraping non-opted-out content under Art.4 DSMD TDM exception |
| Dispute resolution | Process for responding to copyright infringement claims from rights holders |
| Ongoing compliance monitoring | How the provider monitors for new TDM opt-outs and updates training pipelines |
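
The robots.txt part of opt-out detection can be sketched with Python's standard-library `urllib.robotparser`. This is an illustrative sketch, not a complete opt-out detector: the crawler name `ExampleTrainingBot` and the helper `tdm_reserved_via_robots` are hypothetical, the robots.txt content is supplied as a string rather than fetched, and tdmrep.json or other machine-readable reservations would need their own checks.

```python
from urllib.robotparser import RobotFileParser


def tdm_reserved_via_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt disallows `user_agent` from fetching `url`.

    A disallow rule is treated here as a machine-readable reservation
    (assumption: the provider's policy interprets robots.txt this way).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical site that blocks one training crawler but allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.org/articles/1"
    print(tdm_reserved_via_robots(ROBOTS_TXT, "ExampleTrainingBot", page))
    print(tdm_reserved_via_robots(ROBOTS_TXT, "OtherBot", page))
```

Recording the result per source (for example in the `tdm_opt_out_checked` field used later in this guide) gives the policy an evidence trail rather than a bare assertion.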

Training data summary for rights holders:

The "summary of content used for training" serves a specific purpose: it enables rights holders to identify whether their works were included in training data and to exercise their rights (including claims under DSMD and national copyright law).

EU-wide harmonisation: The EU AI Office is developing guidance on what constitutes a compliant copyright compliance policy and a sufficient training data summary. Providers should monitor AI Office publications at ai.ec.europa.eu for updated requirements.


Art.52(1)(b): Machine-Readable Model Card for Downstream Providers

Art.52(1)(b) requires every GPAI provider to draw up an information document — commonly called a model card — and make it available to downstream providers who integrate the GPAI model into their own AI systems. This is the primary mechanism by which Art.52 obligations flow downstream via Art.55.

Machine-readable format requirement:

The model card must be machine-readable, not just human-readable, so that downstream providers can process it with compliance automation tools rather than by hand.

Minimum model card content under Art.52(1)(b):

| Section | Content Required | Format Guidance |
| --- | --- | --- |
| Model identity | Provider name, model name, version, release date, model type (GPAI / GPAI with systemic risk) | Stable identifiers; version-controlled |
| Art.51 classification | Tier classification (general GPAI or systemic risk), designated or threshold-based | Binary field + supporting evidence reference |
| Intended purposes | Approved use cases, supported languages, input/output modalities | List format; version-controlled |
| Capabilities | Performance summary across relevant benchmarks, demonstrated task competence | Benchmark name + score + version of benchmark |
| Limitations | Known failure modes, hallucination characteristics, bias dimensions, capability gaps | Structured list; severity-tagged |
| Training data summary | Summary of training data types, sources, geographic coverage, from Art.52(1)(a)(i) | Reference to full Art.52(3) public summary |
| Copyright compliance | Pointer to copyright compliance policy from Art.52(1)(a)(ii) | URL to policy document; version-tracked |
| Art.55 downstream obligations | What obligations downstream integrators have when integrating this model | Checklist format for downstream compliance |
| Update notification | How downstream providers will be notified of material model changes | Webhook URL or notification API endpoint |
| Contact point | GPAI provider contact for compliance inquiries from downstream providers | Email or API endpoint |

Machine-readable format options:

The AI Act does not mandate a specific schema. Providers should choose a format that can be consumed by compliance automation tools and is compatible with the EU AI Office's API-based registration system under Art.71.
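
Because no schema is mandated, one plausible option is plain JSON with one key per model card section. The sketch below assumes that choice: the key names, `missing_sections`, `to_machine_readable`, and every value in `example_card` are hypothetical placeholders, not an official format.

```python
import json

# One key per model card section; the key names are illustrative assumptions.
REQUIRED_SECTIONS = (
    "model_identity", "art51_classification", "intended_purposes",
    "capabilities", "limitations", "training_data_summary",
    "copyright_compliance", "downstream_obligations",
    "update_notification", "contact_point",
)


def missing_sections(card: dict) -> list[str]:
    """Return required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not card.get(name)]


def to_machine_readable(card: dict) -> str:
    """Serialise a complete model card to JSON; refuse incomplete cards."""
    gaps = missing_sections(card)
    if gaps:
        raise ValueError("model card incomplete: " + ", ".join(gaps))
    return json.dumps(card, indent=2, sort_keys=True)


# Placeholder values only; none of these describe a real model or provider.
example_card = {
    "model_identity": {"provider": "Example Provider",
                       "name": "example-model", "version": "1.0.0"},
    "art51_classification": "general_gpai",
    "intended_purposes": ["text summarisation"],
    "capabilities": [{"benchmark": "example-benchmark",
                      "benchmark_version": "1", "score": 0.5}],
    "limitations": [{"description": "may produce factual errors",
                     "severity": "medium"}],
    "training_data_summary": "https://example.org/training-summary",
    "copyright_compliance": "https://example.org/copyright-policy",
    "downstream_obligations": ["review limitations before deployment"],
    "update_notification": "https://example.org/hooks/model-updates",
    "contact_point": "compliance@example.org",
}

if __name__ == "__main__":
    print(to_machine_readable(example_card))
```

Refusing to serialise an incomplete card keeps gaps from reaching downstream providers silently; sorted keys make successive versions easy to diff.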


Art.52(2): Commission Access Obligation

Art.52(2) grants the European Commission the right to request access to GPAI technical documentation. This is a reactive right (the Commission requests access) rather than a proactive submission obligation, but it has significant compliance implications.

When Art.52(2) is triggered:

| Scenario | Commission Action | Provider Response Obligation |
| --- | --- | --- |
| Routine compliance monitoring | Commission requests documentation review | Provider must provide within the specified timeframe |
| Systemic risk assessment | Commission is evaluating potential Art.51(2) designation of a model | Provider must provide detailed technical documentation to support or rebut assessment |
| Serious incident investigation | Commission or AI Office is investigating an incident involving the GPAI model | Priority access; provider must cooperate under Art.52(2) + Art.21 |
| GPAI code of practice review | Commission is assessing adequacy of code of practice measures | Technical documentation is central evidence |

Practical implications for compliance infrastructure:

Art.52(2) creates an ongoing obligation to maintain documentation in a form that can be produced on request.

Art.52(2) vs. Art.21 (MSA cooperation for high-risk AI):

Art.52(2) applies to GPAI models specifically; Art.21 applies to high-risk AI systems. A GPAI model integrated into a high-risk AI system may be subject to both obligations simultaneously — the high-risk AI system integrator cooperates under Art.21, and the upstream GPAI provider cooperates under Art.52(2).


Art.52(3): Public Training Data Summary

Art.52(3) requires GPAI providers to register a summary of the content used for training in the EU AI database under Art.71 and make it publicly accessible. This is the public-facing obligation — distinct from the Commission-only documentation under Art.52(2).

What the public summary must contain:

| Element | Required in Public Summary | Confidentiality Protection |
| --- | --- | --- |
| Training data types | Yes: all major categories | No protection (public interest disclosure) |
| Training data sources (general) | Yes: at category level | Trade secrets on specific proprietary sources |
| Geographic coverage | Yes: by language and region | No protection |
| Quality filtering procedures | Summary description | Specific algorithms may have trade secret protection |
| Copyright compliance measures | Summary: "complies with EU copyright law via the following measures" | Policy document referenced; details protected |
| Data collection timeframe | Yes: training data vintage/cutoff | No protection |
| Opt-out respect measures | Yes: confirmation of TDM opt-out compliance | No protection |

Registration in the EU AI database:

Art.52(3) links to Art.71 — the EU AI Office maintains a central database of AI systems and GPAI models. GPAI providers must register their model and publish the training data summary through this database. The AI Office provides the registration interface and has published technical guidelines on submission format.

Update obligation:

If a new version of the GPAI model is trained on additional data, the public summary must be updated to reflect the extended training data. The EU AI database is version-aware — providers must submit updated summaries tied to each major model version.


Art.52 × Art.55: Downstream Information Flow Chain

Art.52 creates documentation obligations on GPAI providers; Art.55 transmits those obligations downstream to providers who integrate GPAI models into their AI systems. This creates a two-tier information chain:

| Layer | Actor | Obligation |
| --- | --- | --- |
| Tier 1 (GPAI Provider) | Foundation model provider | Draws up Art.52 technical documentation, model card, copyright policy, training data summary |
| Tier 2 (Downstream Provider) | SaaS developer integrating GPAI API | Receives Art.52 model card under Art.55; uses it to populate their own Art.11/Annex IV technical documentation for any high-risk AI system built on the GPAI model |
| End User | Deployer or user of downstream AI product | Benefits from transparency chain; can exercise rights under Art.86 (right to explanation) based on information documented through Art.52 → Art.55 chain |

What downstream providers are entitled to receive under Art.55:

When you integrate a GPAI API into a product — particularly a high-risk AI system under Annex III — you are entitled to receive from your GPAI API provider:

  1. A copy of the machine-readable model card (Art.52(1)(b))
  2. A reference to the publicly available training data summary (Art.52(3))
  3. A reference to the copyright compliance policy (Art.52(1)(a)(ii))
  4. The Art.51 classification status of the underlying model (general GPAI or systemic risk)
  5. For systemic risk models: notification of classification change and related obligations
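
A downstream team can track these five entitlements as a simple record and list whatever is still outstanding. The following is a hedged sketch: `ProviderDisclosure` and its field names are hypothetical, and the classification strings are assumed to match the `GPAITier` values used in the Python implementation later in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderDisclosure:
    """What a downstream developer has actually received from a GPAI provider."""
    model_card: Optional[str] = None                # machine-readable card content
    training_summary_ref: Optional[str] = None      # link to the public summary
    copyright_policy_ref: Optional[str] = None      # link to the copyright policy
    classification: Optional[str] = None            # "general_gpai" / "systemic_risk_gpai"
    classification_change_notice: Optional[str] = None

    def gaps(self) -> list[str]:
        """List the entitlements that have not been received yet."""
        gaps = []
        if not self.model_card:
            gaps.append("machine-readable model card (Art.52(1)(b))")
        if not self.training_summary_ref:
            gaps.append("reference to public training data summary (Art.52(3))")
        if not self.copyright_policy_ref:
            gaps.append("reference to copyright compliance policy (Art.52(1)(a)(ii))")
        if not self.classification:
            gaps.append("Art.51 classification status")
        # The fifth item only applies to systemic risk models.
        if (self.classification == "systemic_risk_gpai"
                and not self.classification_change_notice):
            gaps.append("notification of classification change")
        return gaps


if __name__ == "__main__":
    received = ProviderDisclosure(model_card="{}", classification="general_gpai")
    for gap in received.gaps():
        print("missing:", gap)
```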

Contractual enforcement of Art.55 rights:

Art.55 does not create an automatic legal transfer: it must be implemented through contracts between GPAI providers and downstream developers. Downstream developers should therefore write the entitlements listed above into their provider agreements.


CLOUD Act × Art.52 Technical Documentation

Art.52 creates documentation that is highly sensitive from a legal and competitive perspective — and that documentation is directly at risk from CLOUD Act compellability when stored on US cloud infrastructure.

At-risk documentation under Art.52:

| Documentation | Art.52 Requirement | CLOUD Act Risk |
| --- | --- | --- |
| Cumulative training compute logs | Required for Commission notification and technical documentation | Compellable by US law enforcement; potential disclosure of proprietary training methodology |
| Training dataset composition | Art.52(1)(a)(i): types and sources of data | Commercially sensitive; training data selection is a core competitive differentiator |
| Copyright compliance records | Art.52(1)(a)(ii): policy and data usage summary | Litigation-relevant; copyright disputes use these records as evidence |
| Capability and limitations documentation | Art.52(1)(a): benchmarks and known failures | Competitive intelligence; discloses safety vulnerabilities |
| Model card versions | Art.52(1)(b): machine-readable model card history | Version history shows capability evolution; competitive intelligence |
| Training data opt-out compliance logs | Part of copyright compliance policy | Copyright litigation evidence; dual-access risk |

Dual-compellability scenario for GPAI providers:

A GPAI provider operating primarily on US cloud infrastructure (AWS, Azure, GCP) faces two simultaneous obligations that can conflict: it is obligated to protect commercially sensitive documentation for EU regulatory purposes, and the same documentation is subject to compelled disclosure to US law enforcement.

EU-native infrastructure as structural mitigation:

Storing GPAI compliance documentation (technical files, training data transparency reports, copyright compliance records, model card versions) on EU-native infrastructure (incorporated in the EU, operating under GDPR and EU administrative law, and with no US parent or affiliate subject to US jurisdiction) is the structural way to remove the CLOUD Act compellability risk. The records are then subject to EU judicial process only, and access by non-EU authorities requires Mutual Legal Assistance Treaty (MLAT) procedures that provide legal visibility.

For developers building compliance infrastructure for GPAI providers — model registries, technical documentation management systems, copyright compliance tracking systems — EU-native PaaS deployment is an architectural requirement to achieve single-jurisdiction compliance.


Python Implementation

GPAITechnicalDocumentationRecord

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from datetime import date

class GPAITier(Enum):
    GENERAL = "general_gpai"
    SYSTEMIC_RISK = "systemic_risk_gpai"

class DocumentationStatus(Enum):
    DRAFT = "draft"
    COMPLETE = "complete"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"

@dataclass
class GPAITechnicalDocumentationRecord:
    """Art.52(1)(a) — GPAI model technical documentation record."""
    
    model_name: str
    model_version: str
    provider_name: str
    tier: GPAITier
    architecture_type: str           # e.g., "transformer", "diffusion", "mixture-of-experts"
    parameter_count: Optional[int]   # May be confidential — None if not disclosed
    context_window: Optional[int]    # Token context window
    modalities: list[str]            # e.g., ["text", "image", "code"]
    training_methodology: str        # Pre-training approach description
    fine_tuning_steps: list[str]     # List of fine-tuning and alignment procedures
    intended_purposes: list[str]     # Documented intended use cases
    supported_languages: list[str]   # BCP 47 language codes
    capabilities_summary: str        # Narrative capabilities description
    limitations_summary: str         # Narrative limitations and known failure modes
    benchmark_results: dict[str, float] = field(default_factory=dict)  # name -> score
    safety_evaluations: list[str] = field(default_factory=list)
    cumulative_training_flops: Optional[float] = None  # 10^25 threshold reference
    documentation_version: str = "1.0.0"
    documentation_date: str = ""
    status: DocumentationStatus = DocumentationStatus.DRAFT
    
    def __post_init__(self):
        if not self.documentation_date:
            self.documentation_date = date.today().isoformat()
    
    def is_systemic_risk_threshold_met(self) -> bool:
        """Check if cumulative training compute exceeds Art.51 threshold."""
        if self.cumulative_training_flops is None:
            return False
        # Art.51 presumes systemic risk when compute is greater than 10^25 FLOPs
        return self.cumulative_training_flops > 1e25
    
    def validate_completeness(self) -> list[str]:
        """Return list of missing mandatory Art.52(1)(a) elements."""
        gaps = []
        if not self.architecture_type:
            gaps.append("Art.52(1)(a): Model architecture type not documented")
        if not self.training_methodology:
            gaps.append("Art.52(1)(a): Training methodology not documented")
        if not self.intended_purposes:
            gaps.append("Art.52(1)(a): Intended purposes not documented")
        if not self.capabilities_summary:
            gaps.append("Art.52(1)(a): Capabilities summary not documented")
        if not self.limitations_summary:
            gaps.append("Art.52(1)(a): Limitations summary not documented")
        if not self.benchmark_results:
            gaps.append("Art.52(1)(a): No benchmark results documented")
        if not self.safety_evaluations:
            gaps.append("Art.52(1)(a): No safety evaluation results documented")
        return gaps
    
    def to_commission_submission(self) -> dict:
        """Prepare record for Art.52(2) Commission access submission."""
        return {
            "provider": self.provider_name,
            "model": f"{self.model_name} v{self.model_version}",
            "tier": self.tier.value,
            "architecture": self.architecture_type,
            "modalities": self.modalities,
            "methodology": self.training_methodology,
            "purposes": self.intended_purposes,
            "capabilities": self.capabilities_summary,
            "limitations": self.limitations_summary,
            "benchmarks": self.benchmark_results,
            "safety": self.safety_evaluations,
            "compute": self.cumulative_training_flops,
            "version": self.documentation_version,
            "date": self.documentation_date,
        }

TrainingDataTransparencyReport

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrainingDataSource:
    """A single training data source entry."""
    source_name: str            # e.g., "Common Crawl", "Books3", "GitHub"
    data_type: str              # e.g., "web_text", "books", "code", "scientific_papers"
    collection_method: str      # "scraped", "licensed", "proprietary", "synthetic"
    approximate_size_tokens: Optional[int] = None
    languages: list[str] = field(default_factory=list)
    geographic_coverage: list[str] = field(default_factory=list)  # ISO 3166-1 alpha-2 codes
    collection_period_start: Optional[str] = None  # ISO date
    collection_period_end: Optional[str] = None    # ISO date  
    tdm_opt_out_checked: bool = False
    licence_held: Optional[str] = None             # SPDX identifier or "proprietary"
    is_confidential: bool = False                  # Trade secret protection flag

@dataclass
class TrainingDataTransparencyReport:
    """Art.52(1)(a)(i) — Training data transparency report."""
    
    model_name: str
    model_version: str
    report_date: str
    sources: list[TrainingDataSource]
    total_training_tokens: Optional[int] = None
    quality_filtering_methods: list[str] = field(default_factory=list)
    deduplication_methods: list[str] = field(default_factory=list)
    toxicity_filtering: bool = False
    nsfw_filtering: bool = False
    pii_filtering: bool = False
    
    def get_geographic_distribution(self) -> dict[str, int]:
        """Aggregate geographic coverage across all sources."""
        distribution: dict[str, int] = {}
        for source in self.sources:
            for country in source.geographic_coverage:
                distribution[country] = distribution.get(country, 0) + 1
        return distribution
    
    def get_language_distribution(self) -> dict[str, int]:
        """Aggregate language coverage across all sources."""
        distribution: dict[str, int] = {}
        for source in self.sources:
            for lang in source.languages:
                distribution[lang] = distribution.get(lang, 0) + 1
        return distribution
    
    def get_data_type_distribution(self) -> dict[str, list[str]]:
        """Group sources by data type."""
        by_type: dict[str, list[str]] = {}
        for source in self.sources:
            if source.data_type not in by_type:
                by_type[source.data_type] = []
            by_type[source.data_type].append(source.source_name)
        return by_type
    
    def generate_public_summary(self) -> dict:
        """Generate Art.52(3) public summary — excludes confidential source details."""
        public_sources = []
        for source in self.sources:
            if not source.is_confidential:
                public_sources.append({
                    "type": source.data_type,
                    "collection_method": source.collection_method,
                    "languages": source.languages,
                    "geographic_coverage": source.geographic_coverage,
                    # Fall back to "unknown" so missing dates do not render as "None"
                    "period": f"{source.collection_period_start or 'unknown'} to {source.collection_period_end or 'unknown'}",
                    "tdm_opt_out_respected": source.tdm_opt_out_checked,
                })
        return {
            "model": f"{self.model_name} v{self.model_version}",
            "report_date": self.report_date,
            "total_tokens": self.total_training_tokens,
            "data_sources_count": len(self.sources),
            "public_sources": public_sources,
            "geographic_distribution": self.get_geographic_distribution(),
            "language_distribution": self.get_language_distribution(),
            "quality_measures": {
                "filtering": self.quality_filtering_methods,
                "deduplication": self.deduplication_methods,
                "toxicity_filtering": self.toxicity_filtering,
                "nsfw_filtering": self.nsfw_filtering,
                "pii_filtering": self.pii_filtering,
            }
        }
    
    def check_tdm_opt_out_compliance(self) -> list[str]:
        """Identify sources where TDM opt-out was not checked."""
        non_compliant = []
        for source in self.sources:
            if source.collection_method == "scraped" and not source.tdm_opt_out_checked:
                non_compliant.append(
                    f"Art.52(1)(a)(i)/(ii): {source.source_name} — scraped source, TDM opt-out not verified"
                )
        return non_compliant

CopyrightCompliancePolicy

from dataclasses import dataclass
from enum import Enum

class TDMLegalBasis(Enum):
    ART3_DSMD_RESEARCH = "art3_dsmd_research_exception"    # Art.3 DSMD — mandatory research exception
    ART4_DSMD_GENERAL = "art4_dsmd_general_exception"      # Art.4 DSMD — general TDM exception (opt-out possible)
    LICENSED = "licensed"                                   # Content obtained under licence
    PUBLIC_DOMAIN = "public_domain"                         # Copyright expired or CC0
    OPEN_LICENCE = "open_licence"                           # CC-BY, CC-BY-SA, Apache, etc.
    PROPRIETARY = "proprietary"                             # Provider-created content

@dataclass
class CopyrightCompliancePolicy:
    """Art.52(1)(a)(ii) — Copyright compliance policy for GPAI training data."""
    
    provider_name: str
    model_name: str
    policy_version: str
    effective_date: str
    
    # Legal bases used in training data collection
    legal_bases_used: list[TDMLegalBasis]
    
    # TDM opt-out detection
    tdm_opt_out_detection_method: str  # e.g., "robots.txt + tdmrep.json automated check"
    tdm_opt_out_update_frequency: str  # e.g., "weekly recrawl, retroactive removal on detection"
    
    # Licence compliance
    licence_categories: list[str]     # Types of licences held (not specific licences — trade secret)
    licence_audit_frequency: str      # How often licences are reviewed for compliance
    
    # Dispute resolution
    infringement_report_contact: str  # Email or API endpoint for copyright claims
    response_sla: str                 # Response time for infringement reports
    removal_procedure: str            # How content is removed from training data on valid claim
    
    # Ongoing monitoring
    new_optout_monitoring: bool = True
    new_licence_monitoring: bool = True
    retroactive_compliance_review: bool = False
    
    def validate(self) -> list[str]:
        """Return compliance gaps in the copyright policy."""
        gaps = []
        if TDMLegalBasis.ART4_DSMD_GENERAL in self.legal_bases_used:
            if not self.tdm_opt_out_detection_method:
                gaps.append("Art.52(1)(a)(ii): Art.4 DSMD TDM exception used but no opt-out detection method documented")
        if not self.infringement_report_contact:
            gaps.append("Art.52(1)(a)(ii): No infringement report contact documented")
        if not self.removal_procedure:
            gaps.append("Art.52(1)(a)(ii): No content removal procedure documented")
        return gaps
    
    def to_public_summary(self) -> dict:
        """Generate public-facing copyright compliance summary."""
        return {
            "provider": self.provider_name,
            "model": self.model_name,
            "policy_version": self.policy_version,
            "effective_date": self.effective_date,
            "legal_bases": [lb.value for lb in self.legal_bases_used],
            # Claim opt-out respect only when a detection method is documented,
            # not merely because the Art.4 DSMD exception is relied on
            "tdm_opt_out_respected": bool(self.tdm_opt_out_detection_method),
            "opt_out_detection": self.tdm_opt_out_detection_method,
            "opt_out_update_frequency": self.tdm_opt_out_update_frequency,
            "infringement_contact": self.infringement_report_contact,
            "response_sla": self.response_sla,
        }

Art.52 Compliance Checklist (40 Items)

Technical Documentation — Art.52(1)(a)

Training Data Transparency — Art.52(1)(a)(i)

Machine-Readable Model Card — Art.52(1)(b)

Commission Access and Public Summary — Art.52(2)/(3)

