EU AI Act Art.29 Obligations for Providers of General-Purpose AI Models: Developer Guide (2026)
EU AI Act Article 29 — together with Articles 51 through 55 — establishes the compliance framework for providers of general-purpose AI (GPAI) models. GPAI models include foundation models, large language models (LLMs), and multimodal models that are made available to downstream providers who integrate them into AI systems placed on the EU market. Unlike the high-risk AI system obligations in Chapter III (Art.9–Art.28), GPAI obligations in Chapter V apply to the upstream model layer, not to finished AI applications. Any developer, company, or open-source project that trains and distributes a GPAI model used by others in the EU must understand Art.29's requirements.
The enforcement body for GPAI obligations is the EU AI Office, established under Art.64, not national market surveillance authorities. This creates a single EU-level enforcement track for GPAI compliance — significantly different from the decentralized national enforcement that governs high-risk AI systems. Fines for GPAI obligation violations under Art.99 reach €15 million or 3 % of global annual turnover (whichever is higher) for most violations, and €7.5 million or 1.5 % for providing incorrect information. GPAI providers cannot treat these as abstract regulatory concerns: the EU AI Office has investigative powers including audits, document requests, and capability assessments under Art.68.
This guide covers:
- Art.29(1) technical documentation (Annex XI)
- Art.29(2) downstream provider access obligations
- Art.29(3) systemic risk assessment and the 10²⁵ FLOP threshold
- the Art.29 × Art.51/52/53/55 intersection matrix
- the CLOUD Act jurisdiction risk for GPAI training data and model weights stored on US infrastructure
- Python implementation for GPAIProviderRecord, DownstreamProviderAccessRecord, and GPAITransparencyChecker
- the 40-item Art.29 compliance checklist
Art.29 in the GPAI Regulatory Architecture
Art.29 is the structural hub of Chapter V. It does not stand alone — it cross-references and is cross-referenced by Articles 51 through 55, each of which adds an obligation layer depending on the model's characteristics:
| Article | Title | What It Does | Applies To |
|---|---|---|---|
| Art.51 | Classification of GPAI models with systemic risk | Sets 10²⁵ FLOP threshold and capability-based classification | All GPAI providers |
| Art.52 | Transparency obligations for certain AI systems | AI-generated content disclosure, deepfake labelling, chatbot disclosure | All GPAI providers + deployers |
| Art.53 | Obligations for providers of GPAI models | Technical documentation, copyright summary, downstream policy, open-weight rules | All GPAI providers |
| Art.54 | Authorised representatives and cooperation with AI Office | Registration, representative designation for non-EU providers | Non-EU GPAI providers |
| Art.55 | Obligations for providers of GPAI models with systemic risk | Adversarial testing, incident reporting, cybersecurity, energy efficiency | Systemic-risk GPAI providers only |
The recitals and EU AI Office guidance documents use "Art.29 obligations" as shorthand for the full Art.53 obligation set. When reading official guidance, "Art.29 obligations" typically means the complete Art.53 list applied to GPAI providers, with Art.55 adding the systemic-risk tier on top.
The critical developer insight: every GPAI provider faces Art.52 and Art.53 obligations. Only models classified under Art.51 face the additional Art.55 obligations — and that classification can arrive two ways: automatically, by crossing the FLOP threshold, or at any time, by EU AI Office capability designation. There is no pre-notification grace period for Art.55 obligations once the threshold is crossed.
Art.29(1): Technical Documentation for GPAI Providers
Art.29(1) requires GPAI providers to draw up and maintain technical documentation before the GPAI model is placed on the market and throughout its lifecycle. The content requirements are specified in Annex XI of the EU AI Act.
Annex XI: Required Documentation Elements
Annex XI specifies nine categories of technical documentation that GPAI providers must maintain:
| # | Annex XI Element | What Developers Must Document |
|---|---|---|
| 1 | General description of the GPAI model | Architecture overview, modality (text, image, multimodal), intended use categories, languages supported |
| 2 | Description of model elements, training, and development process | Pre-training architecture, fine-tuning approach, RLHF/RLAIF methods, hardware used, training duration |
| 3 | Information on training data | Data sources, data categories, data collection methodologies, geographic coverage, time periods |
| 4 | Computational resources used | FLOP count (critical for Art.51 threshold assessment), GPU/TPU hours, energy consumption per training run |
| 5 | Benchmarks and evaluation results | Standardised evaluation scores (MMLU, HellaSwag, BIG-Bench, safety benchmarks), evaluation methodology |
| 6 | Known or foreseeable risks | Risk taxonomy from training data, misuse potential, known failure modes documented from red-teaming |
| 7 | Cybersecurity measures | Model hardening, adversarial robustness testing results, access controls for model weights/API |
| 8 | Capabilities documentation | Downstream use case categories the model is suitable for, limitations, contraindicated uses |
| 9 | Copyright summary | Training data copyright categories, DSM Directive Art.4 opt-out compliance, rights reservation records |
Annex XI documentation is not a one-time exercise. It must be updated when the model is modified through fine-tuning, updated with new training data, or re-evaluated on capability benchmarks. The EU AI Office can request Annex XI documentation at any time under its investigative powers.
What "Updated Throughout Lifecycle" Means in Practice
For developers maintaining living GPAI models (continuous learning, periodic retraining, LoRA fine-tuning), Art.29(1) implies a documentation versioning requirement. Each material change to the model should generate a new documentation version with:
- Updated training data description if new data was incorporated
- Updated FLOP count if retraining occurred (for Art.51 threshold tracking)
- Updated benchmark results if capability evaluations were re-run
- Updated risk assessment if new failure modes were identified
Teams using model registries (MLflow, W&B, Hugging Face Hub) should ensure that the registry entries capture all Annex XI fields, not just internal engineering metadata.
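The versioning requirement above can be sketched as a minimal data structure. Everything here is illustrative — `AnnexXIDocVersion`, its field names, and the `major.minor` versioning scheme are assumptions, not anything mandated by Annex XI:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AnnexXIDocVersion:
    """One version of a model's Annex XI documentation (illustrative fields)."""
    version: str                      # e.g. "1.0"
    as_of: date
    training_data_description: str    # Annex XI element 3
    cumulative_flops: float           # Annex XI element 4, Art.51 tracking
    benchmark_results: dict[str, float] = field(default_factory=dict)
    known_risks: list[str] = field(default_factory=list)

def bump_version(prev: AnnexXIDocVersion, **changes) -> AnnexXIDocVersion:
    """New documentation version after a material change; unchanged fields
    are carried forward, the minor version is incremented."""
    major, minor = prev.version.split(".")
    data = {**prev.__dict__, **changes, "version": f"{major}.{int(minor) + 1}"}
    return AnnexXIDocVersion(**data)
```

A registry hook can call `bump_version` whenever retraining, fine-tuning, or re-evaluation completes, so each Annex XI snapshot stays tied to a concrete model state.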
Art.29(2): Downstream Provider Access Obligations
Art.29(2) requires GPAI providers to make available to downstream providers — companies and developers who build AI systems using the GPAI model — the information and documentation they need to comply with their own obligations under the EU AI Act.
What Downstream Providers Need
Downstream providers building high-risk AI systems under Chapter III need specific inputs from the GPAI provider to complete their own compliance documentation:
| What Downstream Provider Needs | Source in Art.29(2) | Why They Need It |
|---|---|---|
| Technical documentation summary | Art.29(2)(a) | To document their AI system's technical foundation in their own Annex IV documentation |
| Copyright training data summary | Art.29(2)(b) | To assess whether their use of the GPAI model output triggers copyright liability |
| Known risks and limitations | Art.29(2)(c) | To complete their Art.9 risk management system for the high-risk AI system built on top |
| Capability categories | Art.29(2)(d) | To determine whether the GPAI model's use case fits within the intended use scope |
| Compliance summary document | Art.29(2)(e) | A standardised document that summarises the GPAI provider's EU AI Act compliance status |
The "compliance summary document" in Art.29(2)(e) is the key operational output. This document must be sufficient for a downstream provider to demonstrate that their upstream GPAI model layer is compliant, without the downstream provider having to independently assess the entire GPAI model. Practically, this functions like a software bill of materials (SBOM) but for AI compliance.
API Access and Model Weight Access
For GPAI models distributed via API access, Art.29(2) access obligations are fulfilled through API documentation, model cards, and the compliance summary document published alongside the API. Downstream providers accessing via API cannot inspect model internals directly — the API documentation and compliance summary are their primary compliance inputs.
For open-weight GPAI models (Llama-class, Mistral-class, Falcon-class), Art.29(2) access obligations are fulfilled through model card documentation that covers Annex XI elements, the copyright training data summary, and the downstream compliance summary. The EU AI Act specifically recognises the open-weight distribution model and provides adapted compliance pathways — but does not exempt open-weight providers from documentation obligations entirely.
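A minimal sketch of the `DownstreamProviderAccessRecord` structure referenced in the introduction, tracking which of the five Art.29(2) items have been made available to a given downstream provider. Field and method names are illustrative:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DownstreamProviderAccessRecord:
    """Tracks Art.29(2)(a)-(e) items provided to one downstream provider."""
    model_id: str
    downstream_provider: str
    access_mode: str                       # "api" or "open_weight"
    provided_on: date
    tech_doc_summary: bool = False         # Art.29(2)(a)
    copyright_summary: bool = False        # Art.29(2)(b)
    known_risks: bool = False              # Art.29(2)(c)
    capability_categories: bool = False    # Art.29(2)(d)
    compliance_summary: bool = False       # Art.29(2)(e)

    def missing_items(self) -> list[str]:
        """Art.29(2) items not yet made available to this provider."""
        items = {
            "art29_2a_tech_doc_summary": self.tech_doc_summary,
            "art29_2b_copyright_summary": self.copyright_summary,
            "art29_2c_known_risks": self.known_risks,
            "art29_2d_capability_categories": self.capability_categories,
            "art29_2e_compliance_summary": self.compliance_summary,
        }
        return [name for name, done in items.items() if not done]
```

For API distribution one record per commercial downstream relationship is natural; for open-weight distribution a single record against the published model card may suffice.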
Integration with Art.25: Value Chain Responsibilities
Art.25 establishes the rule that any entity that substantially modifies a GPAI model, or places it on the market under its own name for a new intended purpose, becomes a new provider subject to the full provider obligation set. This creates a direct interaction with Art.29(2):
- A company that fine-tunes a GPAI model on proprietary data and offers it via API is a new GPAI provider under Art.25 and must maintain its own Annex XI documentation, not just reference the upstream provider's documentation.
- A company that wraps a GPAI API in a high-risk AI system (e.g., a medical decision support tool) is a downstream provider under Art.29(2) who can rely on the upstream GPAI provider's compliance summary for the model layer, but must independently fulfil Art.9–Art.23 for their own high-risk AI system.
The key question that determines which regime applies: did the downstream entity materially change the model's capabilities or intended use? If yes, Art.25 transformation applies and the downstream entity becomes a provider. If no, Art.29(2) downstream access documentation is sufficient.
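That decision rule can be captured in a few lines. The two boolean inputs are simplifications of a legal test that in practice needs counsel review; the function name and return strings are illustrative:

```python
def applicable_regime(materially_modified: bool, new_intended_purpose: bool) -> str:
    """Rough sketch of the Art.25 vs Art.29(2) question.

    materially_modified  -> did the entity materially change the model's
                            capabilities (e.g. fine-tuning on proprietary data)?
    new_intended_purpose -> is it placed on the market under the entity's own
                            name for a new intended purpose?
    """
    if materially_modified or new_intended_purpose:
        # Art.25 transformation: full provider obligations, own Annex XI docs
        return "art25_new_provider"
    # Downstream provider: may rely on upstream compliance summary
    return "art29_2_downstream"
```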
Art.29(3): Systemic Risk Assessment
Art.29(3) requires GPAI providers to assess whether their model meets the criteria for classification as a GPAI model with systemic risk under Art.51.
Art.51(1)(a): The 10²⁵ FLOP Threshold
Art.51(1)(a) establishes an automatic classification trigger: a GPAI model trained using a cumulative computational effort exceeding 10²⁵ floating-point operations (FLOPs) is presumed to have systemic risk. This threshold is:
- Technology-neutral: applies to any training paradigm (transformer pre-training, MoE, diffusion model training)
- Cumulative: counts the total FLOPs across all training stages (pre-training + fine-tuning in the same training run)
- Self-reported initially: GPAI providers must assess their own FLOP counts and report to the EU AI Office if they cross the threshold
- Subject to downward revision: the Commission can lower the threshold by delegated act as computational efficiency improves
As of 2026, models in the GPT-4 / Gemini Ultra / Claude 3 Opus class are generally understood to have crossed or approached the 10²⁵ FLOP threshold. Models in the 7B–70B parameter range trained on ≤ 10 trillion tokens are generally below the threshold, though exact FLOP counts depend on training duration and hardware efficiency.
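For a rough self-assessment against the threshold, the common rule-of-thumb estimate for dense transformers is FLOPs ≈ 6 × parameters × training tokens. This is a community heuristic, not an official Art.51 counting methodology, and it undercounts multi-stage or MoE training:

```python
ART51_THRESHOLD = 1e25  # Art.51(1)(a) cumulative FLOP threshold

def estimate_training_flops(params: float, tokens: float) -> float:
    """Rough dense-transformer training FLOP estimate: FLOPs ~= 6 * N * D.

    Heuristic only — cumulative Art.51 accounting must also include
    fine-tuning stages in the same training run.
    """
    return 6.0 * params * tokens

# 70B parameters on 10T tokens lands below the threshold, consistent
# with the text above (6 * 70e9 * 10e12 = 4.2e24 < 1e25).
flops_70b = estimate_training_flops(70e9, 10e12)
```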
Art.51(2): Capability-Based Classification
Art.51(2) allows the EU AI Office to classify a GPAI model as having systemic risk even below the 10²⁵ FLOP threshold based on a capability assessment. The criteria for capability-based classification include:
| Criterion | Description | Systemic Risk Indicators |
|---|---|---|
| Economic sector breadth | Number of economic sectors the model is deployed across | Models deployed in 5+ major sectors simultaneously |
| Critical infrastructure impact | Whether model is used in energy, finance, healthcare, transport, water | High-risk sector penetration without sector-specific safeguards |
| Fundamental rights impact | Whether model outputs affect fundamental rights at scale | Models used in hiring, credit, law enforcement, public benefit decisions |
| Societal reach | Number of EU users, volume of API calls, downstream AI systems built on model | > 10M EU users or > 100 downstream AI systems |
| Dual-use potential | Whether model capabilities extend to CBRN (chemical, biological, radiological, nuclear) risk generation | Models with uplift potential for weapons of mass destruction |
The capability-based classification process under Art.51(2) gives the EU AI Office discretion to designate models as systemic risk regardless of their FLOP count. This creates regulatory uncertainty for mid-size GPAI models (10²³–10²⁵ FLOPs) that have broad downstream deployment. GPAI providers in this range should monitor EU AI Office guidance and code of practice developments closely.
Notification Obligations
Once a GPAI provider determines that their model meets the Art.51 threshold — either by FLOP count or EU AI Office designation — they must:
- Notify the EU AI Office under Art.54(1)(a)
- Register in the EU database under Art.71 (GPAI model section)
- Implement Art.55 obligations immediately (adversarial testing, incident reporting, cybersecurity, energy efficiency)
- Update Annex XI documentation to reflect systemic risk classification
There is no grace period between crossing the threshold and implementing Art.55 obligations. GPAI providers training new models that approach the 10²⁵ FLOP threshold should implement Art.55 infrastructure before training concludes, not after.
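One way to act on "implement before training concludes" is to track cumulative FLOPs inside the training loop and raise a flag well before the threshold. A minimal sketch, with an assumed 80 % warning fraction:

```python
class FlopBudgetTracker:
    """Cumulative FLOP accounting during training (illustrative sketch).

    Warns before the Art.51 threshold so Art.55 infrastructure can be
    stood up ahead of classification; the 0.8 warning fraction is an
    assumption, not a regulatory number.
    """

    THRESHOLD = 1e25  # Art.51(1)(a)

    def __init__(self, warn_fraction: float = 0.8):
        self.cumulative = 0.0
        self.warn_fraction = warn_fraction

    def log_step(self, step_flops: float) -> str:
        self.cumulative += step_flops
        if self.cumulative >= self.THRESHOLD:
            return "systemic_risk"   # Art.55 obligations now apply
        if self.cumulative >= self.warn_fraction * self.THRESHOLD:
            return "approaching"     # stand up Art.55 infrastructure now
        return "ok"
```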
Art.29 × Art.52: Transparency Obligations
Art.52 establishes transparency requirements that apply at the output level of GPAI model deployments. While Art.52 also applies to deployers who use AI systems that generate synthetic content, GPAI providers face specific obligations when they distribute models that generate text, images, audio, or video.
Art.52(1): AI-Generated Content Disclosure
Art.52(1) requires that persons interacting with a chatbot or other AI system that operates through natural conversation know they are interacting with an AI, unless this is obvious. For GPAI providers distributing conversation APIs, this obligation typically falls on the downstream deployer — but GPAI providers must ensure their API terms and documentation clearly communicate this downstream obligation so it can be fulfilled.
Art.52(2): Deep Fake Disclosure
Art.52(2) requires that AI-generated or AI-manipulated image, audio, or video content — particularly where it depicts real persons — be disclosed as artificially generated or manipulated in a machine-readable format. For GPAI models capable of generating deepfake content (image/video/audio generation models), this means:
- The model's output should include machine-readable metadata indicating AI provenance
- The API documentation must clearly state that Art.52(2) disclosure obligations apply to downstream uses of the model's image/video/audio generation capabilities
- GPAI providers should implement C2PA (Content Credentials) or equivalent technical watermarking to enable downstream compliance
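A sketch of the machine-readable provenance idea. This is not the C2PA format — a real implementation would embed signed Content Credentials via a C2PA library — but it shows the minimum fields a downstream deployer would need for Art.52(2) disclosure:

```python
import hashlib

def attach_provenance(content_bytes: bytes, model_id: str) -> dict:
    """Illustrative machine-readable provenance record for generated media.

    All field names are assumptions; a production system would emit signed
    C2PA manifests embedded in the media file itself, not a side-channel dict.
    """
    return {
        "ai_generated": True,
        "generator_model_id": model_id,
        "content_sha256": hashlib.sha256(content_bytes).hexdigest(),
        "disclosure_standard": "c2pa-or-equivalent",  # placeholder label
    }
```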
Art.52(3): Synthetic Text Disclosure
Art.52(3) requires disclosure when AI-generated text is published on topics of public interest — including news, political commentary, and public affairs information. This creates a documentation obligation for GPAI text models: providers must clearly communicate in their downstream compliance documents that this disclosure obligation exists and that deployers using the model for public-interest content publishing must implement it.
Who Bears the Obligation: Provider vs Deployer
| Content Type | Primary Obligation Holder | GPAI Provider's Role |
|---|---|---|
| Chatbot disclosure (Art.52(1)) | Deployer who operates the chatbot | Document downstream obligation in compliance summary |
| Deepfake content (Art.52(2)) | Deployer who publishes the content; GPAI provider for technical enablement | Implement C2PA/watermarking in model output; document in downstream compliance |
| Synthetic text on public interest (Art.52(3)) | Deployer who publishes the text | Document downstream obligation; consider technical labelling |
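The table above can be operationalised as the `GPAITransparencyChecker` mentioned in the introduction: given a model's output modalities, list which Art.52 items the provider must document downstream. Modality strings and item labels are illustrative:

```python
class GPAITransparencyChecker:
    """Maps output modalities to Art.52 transparency items the GPAI
    provider must cover in its downstream compliance documentation."""

    def applicable_items(self, modalities: set[str]) -> list[str]:
        items = []
        if "text" in modalities:
            # Conversational + public-interest text obligations
            items.append("art52_1_chatbot_disclosure")
            items.append("art52_3_synthetic_text_disclosure")
        if modalities & {"image", "audio", "video"}:
            # Deepfake-capable media generation
            items.append("art52_2_deepfake_disclosure")
        return items
```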
Art.29 × Art.53: Core GPAI Provider Obligations
Art.53 is the primary obligation article for all GPAI providers — regardless of systemic risk classification. Art.53(1) lists four sub-obligations:
Art.53(1)(a): Technical Documentation + Copyright Summary
GPAI providers must maintain the Annex XI technical documentation (as described in the Art.29(1) section above) and produce a copyright training data summary. The copyright summary must include:
- Which data categories were used (web crawl, licensed data, public domain, proprietary)
- Whether the training data use falls within the DSM Directive Art.4 research exemption
- Which data sources implemented the TDM opt-out under DSM Art.4(3) and whether those opt-outs were respected
- A summary of any licensed data agreements covering training data
The copyright summary is particularly sensitive. The EU AI Act does not resolve the underlying question of whether training a model on copyrighted data without a license is lawful — that remains a matter for EU and member-state copyright law and ongoing litigation. What Art.53(1)(a) requires is transparency: GPAI providers must document what they did, not certify that it was lawful.
Art.53(1)(b): Policy for Downstream Providers
GPAI providers must publish and maintain a policy for downstream providers that describes how downstream providers can use the model in compliance with their own EU AI Act obligations. This policy must address:
- Which intended use categories are permitted
- Which use categories are restricted (e.g., prohibited applications under Art.5)
- What compliance documentation the downstream provider can rely on from the GPAI provider
- How to contact the GPAI provider if downstream compliance questions arise
- Update mechanisms if the GPAI provider's compliance status changes
Practically, this policy often takes the form of an Acceptable Use Policy (AUP) combined with a compliance summary document. For open-weight models, it may be embedded in the model card or licence terms.
Art.53(1)(c): DSM Directive Compliance and Opt-Out Mechanism
Art.53(1)(c) requires GPAI providers to comply with EU copyright law, including the DSM Directive's text and data mining (TDM) provisions. This specifically includes:
- Identifying and honouring TDM opt-outs implemented by rights holders via machine-readable means (robots.txt, IPTC Photo Metadata, licensing signals)
- Maintaining records of opt-out compliance in the copyright training data summary
- Not using data from sources that have implemented Art.4(3) TDM opt-outs for commercial training purposes
The practical challenge: at web-crawl scale, comprehensive opt-out tracking is technically difficult. The EU AI Office's code of practice for GPAI models (being developed in 2025–2026) is expected to provide technical standards for opt-out compliance at scale. GPAI providers should monitor the code of practice process and implement opt-out tracking infrastructure before the code becomes binding.
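As a record-keeping sketch only: a simple robots.txt-style check for a blanket disallow aimed at the provider's crawler. Real opt-out compliance also spans IPTC metadata, licence terms, and whatever technical standards the code of practice settles on — this function is nowhere near legally sufficient on its own, and the crawler name is a placeholder:

```python
def tdm_optout_for(robots_txt: str, crawler_agent: str = "MyTDMBot") -> bool:
    """Treat 'Disallow: /' for our crawler (or for '*') as a TDM opt-out
    signal to record in the copyright compliance log. Illustrative only."""
    applies = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            applies = value in ("*", crawler_agent)
        elif key == "disallow" and applies and value == "/":
            return True
    return False
```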
Art.53(1)(d): Training Data Transparency Summary
Art.53(1)(d) requires GPAI providers to make available to the EU AI Office, upon request, a detailed summary of the training data used. This summary goes beyond the Annex XI documentation and must include:
- The number of data sources and their geographic distribution
- The data collection and filtering pipeline description
- Quality assurance steps applied to training data
- Known data quality issues or biases identified during data preparation
Unlike Annex XI documentation (which is provided to downstream providers and published in accessible form), the Art.53(1)(d) training data summary may include commercially sensitive information that GPAI providers wish to protect. Art.78 of the EU AI Act establishes confidentiality protections for information provided to the AI Office — but these protections are narrower than trade secret protections in commercial contexts.
Art.29 × Art.55: Systemic Risk Additional Requirements
For GPAI models with systemic risk (Art.51 classification), Art.55 adds four categories of additional obligations on top of the Art.53 baseline:
Art.55(1)(a): Adversarial Testing / Red-Teaming
Art.55(1)(a) requires providers of systemic-risk GPAI models to perform model evaluations including adversarial testing to identify and document risks to health, safety, and fundamental rights. This must be done:
- Before the model is placed on the market
- Periodically after placement (the EU AI Office code of practice will define frequency)
- After material model updates that could affect systemic risk characteristics
Red-teaming for GPAI systemic risk purposes is broader than standard safety evaluations. It must cover:
| Red-Teaming Domain | Scope | Why Required |
|---|---|---|
| CBRN uplift | Can the model provide meaningful assistance in creating chemical, biological, radiological, or nuclear weapons? | Art.5 prohibition + systemic risk classification |
| Cyberattack facilitation | Can the model generate functional malware, explain critical infrastructure vulnerabilities, or assist in advanced persistent threats? | Cybersecurity risk |
| Large-scale manipulation | Can the model generate disinformation at scale, impersonate political figures convincingly, or enable coordinated inauthentic behaviour? | Democratic discourse and election integrity |
| Critical infrastructure attacks | Can the model provide operational planning for attacks on energy, water, finance, or transport infrastructure? | Critical infrastructure protection |
| Psychological harm | Does the model exhibit harmful tendencies in extended conversation contexts (encouraging self-harm, radicalization)? | Fundamental rights and health |
Adversarial testing results must be documented and retained. The EU AI Office can request these records at any time under its investigative powers. Results that reveal unresolved high-severity risks must be disclosed to the EU AI Office under Art.55(1)(b).
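A minimal record structure for adversarial testing findings, with the disclosure rule from the paragraph above. The domain names and severity scale are assumptions drawn from the table:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RedTeamFinding:
    """One Art.55(1)(a) adversarial testing finding (illustrative fields)."""
    domain: str       # e.g. "cbrn_uplift", "cyberattack_facilitation"
    severity: str     # "low" | "medium" | "high" (assumed scale)
    resolved: bool
    tested_on: date

def must_disclose_to_ai_office(findings: list[RedTeamFinding]) -> bool:
    """Per the text above: unresolved high-severity risks trigger
    disclosure to the EU AI Office under Art.55(1)(b)."""
    return any(f.severity == "high" and not f.resolved for f in findings)
```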
Art.55(1)(b): Incident Reporting to EU AI Office
Providers of systemic-risk GPAI models must report serious incidents to the EU AI Office. A serious incident for GPAI purposes includes:
- Deaths, serious injuries, or significant damage to property attributable to GPAI model outputs
- Significant disruptions to critical infrastructure facilitated by GPAI model outputs
- GPAI model use in large-scale criminal activity with significant societal impact
- Discovery that the GPAI model produces CBRN or critical infrastructure attack content at significant scale
The reporting timeline and format will be specified in EU AI Office implementing acts. GPAI providers should implement incident monitoring pipelines that can detect and escalate relevant incidents, particularly when models are used via API by downstream providers at scale — incidents may be reported to the GPAI provider by downstream providers and must be escalated to the EU AI Office.
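A triage sketch for the incident pipeline. The category strings mirror the bullet list above; treating category membership as the escalation test is an assumption, since the reporting format awaits EU AI Office implementing acts:

```python
from dataclasses import dataclass

# Categories from the serious-incident list above (illustrative labels)
SERIOUS_CATEGORIES = {
    "death_or_serious_injury",
    "critical_infrastructure_disruption",
    "large_scale_criminal_use",
    "cbrn_content_at_scale",
}

@dataclass
class IncidentReport:
    incident_id: str
    category: str
    reported_by: str   # may be a downstream provider reporting upstream

def escalate_to_ai_office(report: IncidentReport) -> bool:
    """Serious incidents are escalated regardless of who reported them."""
    return report.category in SERIOUS_CATEGORIES
```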
Art.55(1)(c): Cybersecurity Measures
Art.55(1)(c) requires providers of systemic-risk GPAI models to implement adequate cybersecurity protection for the model itself — specifically model weights, training data, and capability evaluation records. Required cybersecurity measures include:
| Cybersecurity Measure | Rationale | Implementation |
|---|---|---|
| Model weight protection | Leaked weights create uncontrolled deployment outside Art.55 obligations | Encrypted storage, access controls, air-gapped training infrastructure |
| Training data security | Training data may contain sensitive information and is a compliance record | Data governance pipeline, access logging, tamper-evident storage |
| API security | Preventing jailbreak extraction of model capabilities and misuse at scale | Rate limiting, abuse detection, output monitoring |
| Evaluation record integrity | Capability evaluation records are regulatory documents that must be tamper-proof | Cryptographic signing of evaluation results, audit trail |
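The "cryptographic signing of evaluation results" row can be sketched with an HMAC over the canonical JSON form of a record. A production system would use asymmetric signatures, proper key management, and an append-only audit log; this only illustrates the tamper-evidence idea:

```python
import hashlib
import hmac
import json

def sign_evaluation_record(record: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical (sorted, compact) JSON form,
    so any later edit to the record invalidates the signature."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()

def verify_evaluation_record(record: dict, key: bytes, signature: str) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign_evaluation_record(record, key), signature)
```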
Art.55(1)(d): Energy Efficiency Reporting
Art.55(1)(d) requires providers of systemic-risk GPAI models to document and report energy consumption for both training and inference. This covers:
- Total training energy consumption (kWh) for each training run
- Per-token inference energy consumption at representative scale
- Data centre location and grid carbon intensity during training
- Projected inference energy consumption at deployment scale
Energy efficiency reporting is both a transparency obligation and, in future implementing acts, potentially a performance standard. GPAI providers should implement energy tracking infrastructure in their training pipelines from the start — retrofitting energy accounting onto completed training runs is significantly more difficult.
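A back-of-the-envelope estimate for the training side of the report: GPU count × hours × average board power, scaled by data-centre PUE. The default PUE of 1.2 is an assumption to be replaced with measured values:

```python
def training_energy_kwh(gpu_count: int, hours: float, avg_power_w: float,
                        pue: float = 1.2) -> float:
    """Rough training-energy estimate (kWh) for an Art.55(1)(d) report.

    avg_power_w should be measured average board power, not TDP; pue is
    the data centre's power usage effectiveness. Both defaults here are
    illustrative assumptions.
    """
    return gpu_count * hours * avg_power_w * pue / 1000.0
```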
Systemic Risk Obligations Matrix
| Obligation | Art.55 Reference | Trigger | Timing |
|---|---|---|---|
| Adversarial testing | Art.55(1)(a) | Art.51 classification | Before market + periodic |
| Incident reporting | Art.55(1)(b) | Art.51 classification | Within 2 weeks of discovery (proposed) |
| Cybersecurity measures | Art.55(1)(c) | Art.51 classification | Before market + ongoing |
| Energy efficiency reporting | Art.55(1)(d) | Art.51 classification | Before market + per update |
| EU AI Office cooperation | Art.55(2) | Art.51 classification | Upon request |
| Code of practice participation | Art.56 | Art.51 classification | By mandate if no voluntary code |
CLOUD Act × Art.29: Jurisdiction Risk
The CLOUD Act (Clarifying Lawful Overseas Use of Data Act, 2018) allows US law enforcement to compel US-based cloud providers and their affiliates to produce data stored anywhere in the world, regardless of where the data physically resides. For GPAI providers using US infrastructure, this creates specific jurisdiction risks at the Art.29 compliance layer.
Training Data on US Infrastructure
GPAI training data stored in US-based cloud environments (AWS S3, Google Cloud Storage, Azure Blob) is subject to CLOUD Act disclosure requests. This creates two risks:
- Copyright summary exposure: If the EU AI Office requests the Art.53(1)(d) training data summary under its investigative powers, and that summary contains information about training data sources that the GPAI provider considers commercially sensitive, the data may already be accessible to US law enforcement under a CLOUD Act order — without the EU AI Act's Art.78 confidentiality protections applying.
- Litigation exposure: In copyright infringement litigation (EU or US), CLOUD Act-accessible training data records could be subpoenaed by opposing parties via US courts, bypassing EU procedural protections.
Model Weights as US Jurisdiction Assets
GPAI model weights stored on US infrastructure or trained on US hardware are subject to US export control regulations (EAR) and CLOUD Act jurisdiction. For systemic-risk GPAI models, this creates:
- Export control compliance risk: Advanced AI model weights may be subject to BIS export controls affecting distribution to certain jurisdictions
- Regulatory document integrity risk: If Art.55 evaluation records (adversarial testing results, incident reports) are stored on US infrastructure, they are simultaneously EU regulatory records (subject to EU AI Act) and US-jurisdiction assets (subject to CLOUD Act)
Capability Evaluation Records
Art.55(1)(a) adversarial testing results are particularly sensitive. These records document both the capabilities and the known failure modes of systemic-risk GPAI models. If stored on US infrastructure:
- They are accessible to US intelligence agencies under CLOUD Act / FISA orders
- They could be used to map the specific vulnerabilities and bypass methods of safety-critical AI systems
- Their integrity as regulatory compliance documents depends on tamper-evident storage that US jurisdiction access could compromise
EU-Native Infrastructure Advantage
GPAI providers building EU-native training and storage infrastructure — using EU-domiciled cloud providers (Scaleway, OVHcloud, Hetzner) for training data storage, model weight storage, and compliance record storage — avoid direct CLOUD Act jurisdiction exposure, since the Act reaches US providers and their affiliates rather than EU-only providers. The practical compliance advantage:
| Compliance Record | US Infrastructure Risk | EU-Native Mitigation |
|---|---|---|
| Annex XI documentation | CLOUD Act accessible | Store in EU-native encrypted vault |
| Copyright training data summary | Litigation subpoena risk | EU-native document management with Art.78 protections |
| Art.55 adversarial testing results | Intelligence access risk | Air-gapped EU storage, zero-knowledge encryption |
| Incident reports to EU AI Office | Disclosure timing complications | EU-native secure channel to AI Office |
| Model weights | Export control + CLOUD Act | EU-native HPC/model storage infrastructure |
Python Implementation
The following Python implementation provides data structures and logic for tracking Art.29 / Art.53 / Art.55 obligations for GPAI providers.
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date
from typing import Literal

# ---------------------------------------------------------------------------
# GPAIProviderRecord — tracks Art.29/53 obligations
# ---------------------------------------------------------------------------

@dataclass
class GPAIProviderRecord:
    """Tracks Art.29 + Art.53 obligations for a GPAI model provider."""

    model_id: str
    model_name: str
    parameter_count: int   # e.g. 70_000_000_000 for 70B
    training_flops: float  # cumulative FLOPs; 1e25 = Art.51 threshold
    open_weight: bool      # True = open-weight distribution (Llama-class)
    training_data_categories: list[str] = field(default_factory=list)

    # Art.53(1)(a) — Annex XI documentation completeness
    annex_xi_general_description: bool = False
    annex_xi_training_process: bool = False
    annex_xi_training_data: bool = False
    annex_xi_computational_resources: bool = False
    annex_xi_benchmark_results: bool = False
    annex_xi_known_risks: bool = False
    annex_xi_cybersecurity_measures: bool = False
    annex_xi_capabilities: bool = False
    annex_xi_copyright_summary: bool = False

    # Art.53(1)(b) — downstream provider policy
    downstream_policy_published: bool = False
    downstream_policy_url: str | None = None

    # Art.53(1)(c) — DSM opt-out compliance
    tdm_optout_tracking_implemented: bool = False
    copyright_compliance_record_available: bool = False

    # Art.53(1)(d) — training data transparency summary
    training_data_summary_available: bool = False

    # Art.51 — systemic risk classification
    eu_ai_office_notified: bool = False
    eu_database_registered: bool = False

    # Art.55 — systemic risk additional obligations (populated if systemic risk)
    adversarial_testing_completed: bool = False
    adversarial_testing_date: date | None = None
    incident_reporting_pipeline_active: bool = False
    cybersecurity_measures_implemented: bool = False
    energy_consumption_tracked: bool = False

    def is_systemic_risk(self) -> bool:
        """Art.51(1)(a): automatic classification above 10^25 FLOPs."""
        return self.training_flops >= 1e25

    def annex_xi_completeness_report(self) -> dict:
        """Returns a completeness report for all 9 Annex XI elements."""
        elements = {
            "general_description": self.annex_xi_general_description,
            "training_process": self.annex_xi_training_process,
            "training_data": self.annex_xi_training_data,
            "computational_resources": self.annex_xi_computational_resources,
            "benchmark_results": self.annex_xi_benchmark_results,
            "known_risks": self.annex_xi_known_risks,
            "cybersecurity_measures": self.annex_xi_cybersecurity_measures,
            "capabilities": self.annex_xi_capabilities,
            "copyright_summary": self.annex_xi_copyright_summary,
        }
        completed = sum(elements.values())
        return {
            "model_id": self.model_id,
            "elements": elements,
            "completed": completed,
            "total": len(elements),
            "completion_pct": round(completed / len(elements) * 100, 1),
            "annex_xi_complete": completed == len(elements),
        }

    def art53_compliance_report(self) -> dict:
        """Returns Art.53(1)(a)-(d) compliance status."""
        return {
            "model_id": self.model_id,
            "art53_1a_technical_doc": self.annex_xi_completeness_report()["annex_xi_complete"],
            "art53_1b_downstream_policy": self.downstream_policy_published,
            "art53_1c_copyright_optout": self.tdm_optout_tracking_implemented,
            "art53_1d_training_data_summary": self.training_data_summary_available,
            "all_art53_complete": all([
                self.annex_xi_completeness_report()["annex_xi_complete"],
                self.downstream_policy_published,
                self.tdm_optout_tracking_implemented,
                self.training_data_summary_available,
            ]),
        }

    def art55_compliance_report(self) -> dict | None:
        """Returns Art.55 compliance status. None if not systemic risk."""
        if not self.is_systemic_risk():
            return None
        return {
            "model_id": self.model_id,
            "is_systemic_risk": True,
            "eu_ai_office_notified": self.eu_ai_office_notified,
            "eu_database_registered": self.eu_database_registered,
            "art55_1a_adversarial_testing": self.adversarial_testing_completed,
            "adversarial_testing_date": (
                self.adversarial_testing_date.isoformat()
                if self.adversarial_testing_date else None
            ),
            "art55_1b_incident_reporting": self.incident_reporting_pipeline_active,
            "art55_1c_cybersecurity": self.cybersecurity_measures_implemented,
            "art55_1d_energy_tracking": self.energy_consumption_tracked,
            "all_art55_complete": all([
                self.eu_ai_office_notified,
                self.eu_database_registered,
                self.adversarial_testing_completed,
                self.incident_reporting_pipeline_active,
                self.cybersecurity_measures_implemented,
                self.energy_consumption_tracked,
            ]),
        }

    def full_compliance_report(self) -> dict:
        return {
            "model_id": self.model_id,
            "model_name": self.model_name,
            "parameter_count": self.parameter_count,
            "training_flops": self.training_flops,
            "is_systemic_risk": self.is_systemic_risk(),
            "annex_xi": self.annex_xi_completeness_report(),
"art53": self.art53_compliance_report(),
"art55": self.art55_compliance_report(),
}
# ---------------------------------------------------------------------------
# DownstreamProviderAccessRecord — Art.29(2) downstream access
# ---------------------------------------------------------------------------
@dataclass
class DownstreamProviderAccessRecord:
    """Tracks Art.29(2) downstream provider access obligations."""

    upstream_gpai_model_id: str
    downstream_provider_id: str
    access_type: Literal["api", "model_weights", "both"]
    # What the downstream provider has received
    compliance_summary_received: bool = False
    annex_xi_summary_received: bool = False
    copyright_summary_received: bool = False
    known_risks_documentation_received: bool = False
    downstream_policy_accepted: bool = False
    art53_documentation_available: bool = False
    # Access metadata
    access_granted_date: date | None = None
    compliance_summary_version: str | None = None

    def access_rights_summary(self) -> str:
        access_map = {
            "api": "API access only — model weights not available",
            "model_weights": "Model weights access — open-weight distribution",
            "both": "Full access — API + model weights",
        }
        return (
            f"Downstream provider {self.downstream_provider_id} has "
            f"{access_map[self.access_type]} to GPAI model "
            f"{self.upstream_gpai_model_id}."
        )

    def downstream_compliance_readiness(self) -> dict:
        """Checks whether downstream provider has all Art.29(2) inputs."""
        checks = {
            "compliance_summary": self.compliance_summary_received,
            "annex_xi_summary": self.annex_xi_summary_received,
            "copyright_summary": self.copyright_summary_received,
            "known_risks_documentation": self.known_risks_documentation_received,
            "downstream_policy_accepted": self.downstream_policy_accepted,
        }
        complete = sum(checks.values())
        return {
            "upstream_model": self.upstream_gpai_model_id,
            "downstream_provider": self.downstream_provider_id,
            "access_type": self.access_type,
            "checks": checks,
            "ready": complete == len(checks),
            "missing": [k for k, v in checks.items() if not v],
        }
# ---------------------------------------------------------------------------
# GPAITransparencyChecker — Art.52 transparency obligations
# ---------------------------------------------------------------------------
@dataclass
class GPAITransparencyChecker:
    """Checks Art.52 transparency obligations for a GPAI model deployment."""

    model_id: str
    output_types: list[str]  # e.g. ["text", "image", "audio", "video"]
    # Art.52(1) — chatbot disclosure
    chatbot_disclosure_implemented: bool = False
    # Art.52(2) — deepfake / synthetic media disclosure
    c2pa_watermarking_implemented: bool = False
    machine_readable_ai_provenance: bool = False
    # Art.52(3) — synthetic text on public interest topics
    synthetic_text_labelling_implemented: bool = False

    def generates_synthetic_media(self) -> bool:
        return any(t in self.output_types for t in ["image", "audio", "video"])

    def generates_text(self) -> bool:
        return "text" in self.output_types

    def check_ai_generated_disclosure(self, output_type: str) -> bool:
        """Returns True if disclosure obligation is met for given output type."""
        if output_type == "chatbot":
            return self.chatbot_disclosure_implemented
        if output_type in ("image", "audio", "video"):
            return self.c2pa_watermarking_implemented and self.machine_readable_ai_provenance
        if output_type == "text_public_interest":
            return self.synthetic_text_labelling_implemented
        return False

    def disclosure_completeness_report(self) -> dict:
        """Returns Art.52 compliance status for all applicable output types."""
        checks: dict[str, bool] = {}
        if self.generates_text():
            checks["art52_1_chatbot_disclosure"] = self.chatbot_disclosure_implemented
            checks["art52_3_synthetic_text_labelling"] = self.synthetic_text_labelling_implemented
        if self.generates_synthetic_media():
            checks["art52_2_c2pa_watermarking"] = self.c2pa_watermarking_implemented
            checks["art52_2_machine_readable_provenance"] = self.machine_readable_ai_provenance
        completed = sum(checks.values())
        return {
            "model_id": self.model_id,
            "output_types": self.output_types,
            "checks": checks,
            "completed": completed,
            "total": len(checks),
            "art52_complete": completed == len(checks),
            "missing_obligations": [k for k, v in checks.items() if not v],
        }
Art.29 Compliance Checklist
A practical 40-item checklist for GPAI providers implementing Art.29, Art.53, and Art.55 compliance:
Annex XI Technical Documentation (Art.53(1)(a))
- 1. Drafted general description of the GPAI model covering architecture, modality, and intended use categories
- 2. Documented training and development process including pre-training, fine-tuning, and RLHF/RLAIF methodology
- 3. Documented all training data sources including categories, geographic distribution, and time periods
- 4. Calculated and recorded total cumulative training FLOPs for Art.51 threshold assessment
- 5. Recorded hardware used (GPU/TPU type, count, training duration) in Annex XI documentation
- 6. Documented all benchmark evaluation results with methodology and evaluation dates
- 7. Documented known and foreseeable risks from training data and model capabilities
- 8. Documented cybersecurity measures protecting model weights, training data, and API access
- 9. Produced copyright training data summary covering DSM Directive Art.4 TDM analysis
DSM Directive Compliance (Art.53(1)(c))
- 10. Implemented TDM opt-out tracking in web crawl pipeline (robots.txt, IPTC, licensing signals)
- 11. Excluded from the training corpus any sources that asserted an Art.4(3) TDM opt-out for commercial use
- 12. Maintained records of opt-out compliance decisions for EU AI Office audit readiness
- 13. Reviewed licensed data agreements for scope of training data use permissions
- 14. Documented data collection and filtering pipeline for copyright compliance audit trail
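Item 10's opt-out tracking can be sketched with Python's standard-library robots.txt parser. This is a minimal sketch under stated assumptions: the crawler name `ExampleAIBot` and the URLs are illustrative, and robots.txt is only one Art.4(3) signal — a real pipeline must also honour IPTC metadata and licence terms.

```python
from urllib.robotparser import RobotFileParser


def tdm_optout_blocks(robots_txt: str, crawler: str, url: str) -> bool:
    """Return True if robots.txt signals a TDM opt-out for this crawler/URL.

    robots.txt is only one opt-out signal; IPTC metadata and licensing
    terms must be checked in separate pipeline stages.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(crawler, url)


# Illustrative: a publisher opting its whole site out for one AI crawler
# while leaving other crawlers unrestricted.
robots = "User-agent: ExampleAIBot\nDisallow: /\n\nUser-agent: *\nDisallow:"
print(tdm_optout_blocks(robots, "ExampleAIBot", "https://example.org/a"))  # True
print(tdm_optout_blocks(robots, "OtherBot", "https://example.org/a"))      # False
```

Recording each per-URL decision (item 12) is then a matter of logging the function's inputs and output alongside the crawl timestamp.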
Downstream Provider Policy (Art.53(1)(b))
- 15. Published downstream provider policy describing permitted and restricted use categories
- 16. Identified Art.5 prohibited use categories and explicitly restricted them in downstream policy
- 17. Produced compliance summary document for downstream providers building on the GPAI model
- 18. Established mechanism for downstream providers to receive Art.29(2) documentation updates
- 19. Implemented process for downstream providers to report compliance questions to the GPAI provider
Training Data Transparency (Art.53(1)(d))
- 20. Prepared detailed training data transparency summary for EU AI Office requests
- 21. Documented data quality assurance steps and known training data quality issues
- 22. Prepared to provide training data summary under Art.78 confidentiality protections
Systemic Risk Assessment (Art.51 / Art.29(3))
- 23. Assessed whether cumulative training FLOPs exceed the 10²⁵ threshold for Art.51(1)(a) automatic classification
- 24. Assessed whether model meets Art.51(2) capability-based criteria for systemic risk classification
- 25. Implemented monitoring to detect if FLOP threshold is crossed during ongoing training runs
- 26. If systemic risk: notified EU AI Office under Art.54(1)(a) before market placement
- 27. If systemic risk: registered in EU database under Art.71 (GPAI model section)
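For items 23 and 25, cumulative training compute for dense transformers is commonly approximated as 6 × parameters × training tokens (the forward-plus-backward "6ND" rule). The sketch below uses that approximation with illustrative figures; it is an estimation aid, not a substitute for the exact FLOP accounting Annex XI item 4 requires.

```python
SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25  # Art.51(1)(a) threshold as used in this guide


def estimate_training_flops(parameters: float, training_tokens: float) -> float:
    """Approximate cumulative training compute with the common 6*N*D rule
    for dense transformers (forward + backward pass)."""
    return 6.0 * parameters * training_tokens


def crosses_systemic_risk_threshold(parameters: float, training_tokens: float) -> bool:
    """Check the estimate against the 10^25 FLOP classification threshold."""
    return estimate_training_flops(parameters, training_tokens) >= SYSTEMIC_RISK_FLOP_THRESHOLD


# Illustrative: a 70B-parameter model trained on 15T tokens.
flops = estimate_training_flops(70e9, 15e12)
print(f"{flops:.2e}")  # 6.30e+24 — below the 1e25 threshold
print(crosses_systemic_risk_threshold(70e9, 15e12))  # False
```

Running this check inside the training loop, against tokens processed so far, gives the ongoing-run monitoring that item 25 calls for.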
Adversarial Testing (Art.55(1)(a)) — Systemic Risk Models Only
- 28. Completed pre-market adversarial testing covering CBRN uplift, cyberattack facilitation, and large-scale manipulation
- 29. Documented red-teaming methodology and results in retainable compliance records
- 30. Established periodic adversarial testing schedule post-deployment
- 31. Implemented process to escalate unresolved high-severity red-team findings to EU AI Office
Incident Reporting (Art.55(1)(b)) — Systemic Risk Models Only
- 32. Implemented incident monitoring pipeline for API-level misuse detection
- 33. Established downstream provider incident reporting channel (so downstream providers can report serious incidents to GPAI provider)
- 34. Prepared incident report template aligned with EU AI Office expected format
Cybersecurity (Art.55(1)(c)) — Systemic Risk Models Only
- 35. Implemented encrypted storage and access controls for model weights
- 36. Implemented API rate limiting and abuse detection for model access
- 37. Implemented cryptographic signing of capability evaluation records for tamper-evidence
Energy Efficiency (Art.55(1)(d)) — Systemic Risk Models Only
- 38. Implemented energy tracking in training pipeline (kWh per training run)
- 39. Measured per-token inference energy consumption at representative scale
- 40. Documented data centre grid carbon intensity for training runs
See Also
- EU AI Act Art.51 GPAI Model Classification: Systemic Risk Threshold and Provider Obligations — Art.51 establishes the GPAI tier classification that determines which upstream provider obligations cascade downstream to Art.29 integrators
- EU AI Act Art.52 GPAI Model General Obligations: Technical Documentation, Training Data & Copyright — Art.52 defines the technical documentation and model card that Art.29 downstream providers are entitled to demand from upstream GPAI providers
- EU AI Act Art.9 Formal Verification for High-Risk AI: Developer Guide — Art.9 risk management system obligations that downstream GPAI users must fulfil for high-risk AI systems
- EU AI Act Art.28 Obligations for Distributors: Developer Guide — Art.28 distributor obligations and the Art.25 value-chain transformation rules
- EU AI Act Art.26 Obligations for Deployers: Developer Guide — Art.26 deployer obligations applicable when downstream providers deploy GPAI-based AI systems