2026-04-23 · 14 min read

EU AI Act Art.43: GPAI Models with Systemic Risk — Classification, Evaluation Obligations, and Compliance Framework (2026)

Article 43 of the EU AI Act introduces the most demanding compliance tier in the general-purpose AI model framework: the systemic risk designation. Where Art.41 established documentation and information-sharing obligations that apply to all GPAI model providers, Art.43 creates a second layer that applies exclusively to models whose training computation or demonstrated capabilities place them at the upper end of the capability frontier. The article is short in text but significant in consequence — it triggers obligations for model evaluation, incident reporting, cybersecurity, and energy transparency that do not apply to the broader GPAI model population.

For AI engineers, ML platform developers, and organisations building or integrating frontier AI models, Art.43 defines the compliance architecture that applies to GPT-class and successor systems. Understanding it requires grasping both the classification logic that determines whether a model falls within Art.43 scope and the full set of additional obligations that systemic risk status activates.

The Systemic Risk Classification Logic

Art.43 is a tiered article: it applies only to GPAI models that qualify as systemic risk models. The classification mechanism has two tracks.

Track 1 — Computational Threshold

A GPAI model is presumed to present systemic risk if the cumulative amount of compute used for its training exceeds 10^25 floating-point operations (FLOPs). The threshold was set by reference to the compute scale of the largest frontier models at the time the regulation was finalised, and it serves as a proxy for capability: models trained at this scale are assumed to exhibit capabilities that could generate systemic harms — harms not limited to individual users or deployment contexts but capable of propagating across critical infrastructure, safety-relevant systems, or entire societal domains.
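
To make the threshold concrete, the sketch below estimates training compute with the common back-of-envelope approximation of roughly 6 FLOPs per parameter per training token and compares the result against 10^25. The approximation and the example figures are illustrative assumptions, not a measurement method prescribed by the regulation.

def estimate_training_flops(n_parameters: float, n_training_tokens: float) -> float:
    # Common rough estimate for dense transformer training:
    # about 6 FLOPs per parameter per token (forward and backward pass combined).
    return 6 * n_parameters * n_training_tokens

SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25

# Hypothetical example: a 1-trillion-parameter model trained on 15 trillion tokens
flops = estimate_training_flops(1e12, 15e12)                  # 9e25 FLOPs
presumed_systemic = flops > SYSTEMIC_RISK_THRESHOLD_FLOPS     # True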

The 10^25 FLOPs threshold is a rebuttable presumption. A provider can argue to the AI Office that its model trained at or above the threshold does not in fact present systemic risk based on its actual capabilities, intended uses, and deployment context. The AI Office can accept or reject that argument, but the burden of rebuttal rests with the provider.

The threshold is also subject to adjustment. The Commission is empowered to update the computational threshold through delegated acts as the compute-capability frontier evolves. Given the trajectory of AI training compute, the practical expectation is that the threshold may be tightened over time to maintain its role as a capability proxy.

Track 2 — Commission Decision

Independent of the computation threshold, the Commission may classify a GPAI model as presenting systemic risk based on criteria other than training compute — specifically, where the model has high-impact capabilities or where there is reason to believe its capabilities or deployment could generate systemic risks, including where that risk materialises through accumulation or interaction with other systems. The Commission acts on request of the AI Office or on its own initiative, and it may also remove a systemic risk classification where circumstances change.

This second track extends systemic risk regulation to GPAI models that might be trained below the threshold but exhibit high-risk capability profiles through architectural efficiency, multimodal capability, or domain-specific capability concentration.

The Two-Tier Obligations Architecture

Systemic risk designation does not replace the Art.41 obligations for GPAI model providers — it adds to them. A GPAI model provider with a systemic risk designation must comply with all of Art.41 (technical documentation, downstream information provision, copyright policy, training data summary) and additionally with the Art.43 systemic risk obligations. The compliance architecture is additive, not alternative.

Art.43's additional obligations fall into four categories: model evaluation, serious incident reporting, cybersecurity measures, and energy efficiency reporting. Each targets a different dimension of the risks that frontier model scale introduces.

Obligation 1: Model Evaluation and Adversarial Testing

Providers of systemic risk GPAI models must perform model evaluations, including adversarial testing, before making the model available to downstream providers or deployers, and on an ongoing basis. The adversarial testing obligation operationalises the principle that frontier models must be actively stress-tested for dangerous capabilities rather than having providers simply assert they are safe based on intended use.

What model evaluation must cover

The evaluation must assess the model's capability profile across safety-relevant dimensions. For systemic risk models, the relevant dimensions include: the model's performance on tasks that could contribute to the development or use of weapons of mass destruction (chemical, biological, radiological, nuclear); its susceptibility to generating harmful content at scale (CSAM, incitement, coordinated disinformation); its capacity to enable offensive cyber operations; and its aggregate risk profile when deployed across many simultaneous users or applications.

The model evaluation must be documented, and the documentation must be made available to the AI Office on request. Evaluations should be repeated when the model is updated, fine-tuned in ways that could affect its risk profile, or deployed in new domains.

Adversarial testing requirements

Adversarial testing — colloquially called red-teaming — means deliberately probing the model for failure modes, dangerous capabilities, and safety guardrail bypasses using techniques that simulate how a determined adversary or misuse case would approach the model. The regulation does not specify a particular red-teaming methodology, but the state of the art in this domain involves a combination of automated evaluation (using evaluation harnesses like HELM, MMLU-adversarial variants, or custom benchmarks), human expert red-teaming (domain specialists in biosecurity, cybersecurity, and social manipulation probing the model directly), and structured threat modelling (identifying plausible misuse actors and scenarios and designing evaluations targeting those threat profiles).

Providers are expected to engage qualified evaluators — internal safety teams, external red-team providers, or national AI safety institutes where applicable — and to conduct evaluations under conditions that approximate deployment conditions rather than controlled laboratory settings.
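
The regulation leaves the methodology open. As a minimal sketch of the automated layer only, the function below runs a set of adversarial prompts against a model and records the unsafe-completion rate per domain; the text-in, text-out model interface and the safety classifier are hypothetical stand-ins for whatever evaluation harness a provider actually uses.

from typing import Callable

def run_adversarial_suite(
    model: Callable[[str], str],             # hypothetical text-in, text-out model interface
    is_unsafe: Callable[[str, str], bool],   # hypothetical classifier: (prompt, output) -> unsafe?
    prompts_by_domain: dict[str, list[str]],
) -> dict[str, float]:
    # Returns the unsafe-completion rate per tested domain.
    failure_rates = {}
    for domain, prompts in prompts_by_domain.items():
        failures = sum(1 for p in prompts if is_unsafe(p, model(p)))
        failure_rates[domain] = failures / len(prompts) if prompts else 0.0
    return failure_rates

In practice this automated pass is only one layer; it is complemented by human expert red-teaming and threat-model-driven scenario design as described above.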

AI Office Coordinated Evaluation

The AI Office may also initiate coordinated evaluation processes — cross-provider assessments that allow comparison of how different frontier models respond to the same adversarial test set. Providers subject to Art.43 must participate in AI Office-coordinated evaluation activities when requested. This creates a quasi-regulatory testing regime that extends beyond what any individual provider's internal safety team conducts.

Obligation 2: Serious Incident Reporting

Providers of systemic risk GPAI models must report serious incidents to the AI Office without undue delay. A serious incident in the GPAI context means an incident where the GPAI model contributed to generating harm at the systemic risk scale — that is, where the harm is not confined to a single user or deployment but has or threatens to have broader societal, safety, or security consequences.

Scope of reportable incidents

The regulation's serious incident scope for GPAI models differs from the serious incident reporting framework for high-risk AI systems under Art.73. For high-risk AI, serious incidents are defined around specific harm categories in specific deployment contexts (healthcare, critical infrastructure, law enforcement). For GPAI models, the scope reflects the model's cross-cutting nature: a serious incident could be a case where the model's capabilities were exploited to conduct a cyberattack at scale, to generate disinformation that measurably interfered with democratic processes, to contribute to the synthesis of a dangerous substance, or to generate content at scale that constitutes a public safety threat.

Providers must maintain incident monitoring systems adequate to detect and assess potential serious incidents at the GPAI model level — meaning incidents arising from the model's outputs regardless of which downstream application generated them. This is non-trivial for foundation model providers whose models are integrated into thousands of downstream products: it requires maintaining channels through which downstream providers can report model-level incidents, not just product-level incidents.

Reporting timeline and content

Reports must be made to the AI Office without undue delay — the regulation does not specify a fixed timeline (the 72-hour rule that applies to GDPR personal data breaches does not apply here), but the expectation is that the report is made as soon as the provider has determined that an incident qualifies as serious and has gathered the minimum information needed to describe it. The AI Office may request additional information after the initial report.

Reports should document: the nature of the incident and the GPAI model's contribution to it; the scale and nature of the harm or threatened harm; the affected users, deployers, or third parties; the model version and any relevant deployment configuration; the measures the provider has taken or plans to take in response; and any open questions about the incident's causes or consequences.
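
As a minimal sketch of how the reporting content listed above can be checked before filing (the field names are illustrative, not a mandated template), a completeness check might look like the following; a fuller dataclass-based version appears in the implementation section further down.

REQUIRED_INCIDENT_FIELDS = [
    "incident_description",   # nature of the incident and the model's contribution to it
    "scale_of_harm",
    "affected_parties",
    "model_version",
    "measures_taken",
    "open_questions",
]

def missing_report_fields(report: dict) -> list[str]:
    # Returns the required elements that are absent or empty before filing with the AI Office.
    return [f for f in REQUIRED_INCIDENT_FIELDS if not report.get(f)]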

Obligation 3: State-of-the-Art Cybersecurity Measures

Providers of systemic risk GPAI models must implement state-of-the-art cybersecurity measures adequate to protect the model, its weights, and its infrastructure against attacks, and they must report cybersecurity incidents to the AI Office.

Why cybersecurity is a distinct GPAI obligation

Cybersecurity obligations under the EU AI Act apply to high-risk AI system providers in the form of robustness and accuracy requirements (Art.15). For systemic risk GPAI models, a separate cybersecurity obligation is warranted by the model's value as a target: a GPAI model with frontier capabilities is itself a strategic asset whose theft, manipulation, or weaponisation by a state or non-state adversary represents a systemic risk distinct from any individual deployment context.

The cybersecurity measures must cover the model weights (protecting against model theft through weight extraction or exfiltration), the training infrastructure (protecting the integrity of future training runs), the inference infrastructure (protecting against adversarial input attacks at scale and API abuse), and the supply chain for model components and data (protecting against poisoning attacks during training).

State-of-the-art standard

The regulation uses the "state of the art" standard — the same formulation used in NIS2 and in the high-risk AI requirements — which means the measures must reflect current best practice in the security field rather than minimum acceptable security. For frontier AI providers, this means active penetration testing of model serving infrastructure, model access controls that prevent large-scale weight extraction through inference attacks, security vetting of data supply chains, and network security architectures that limit lateral movement in the event of a breach.

Cybersecurity incidents affecting the model, its weights, or its infrastructure must be reported to the AI Office. The connection to NIS2 is relevant here: providers of systemic risk GPAI models that qualify as essential or important entities under NIS2 must report cybersecurity incidents under both frameworks, and the AI Office and NIS2 national authorities are expected to coordinate.
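
One concrete control in the weight-protection category is capping per-client query volume so that large-scale extraction through the inference API becomes impractical. The sketch below shows a deliberately simplified per-client daily query budget; the limit is an illustrative assumption, and real deployments would layer this with anomaly detection, output monitoring, and contractual controls.

from collections import defaultdict
from datetime import date

class QueryBudget:
    # Simplified per-client daily query cap as one layer of extraction-attack mitigation.
    def __init__(self, daily_limit: int = 100_000):
        self.daily_limit = daily_limit
        self._counts = defaultdict(int)   # (client_id, day) -> query count

    def allow(self, client_id: str) -> bool:
        key = (client_id, date.today())
        if self._counts[key] >= self.daily_limit:
            return False
        self._counts[key] += 1
        return True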

Obligation 4: Energy Efficiency Reporting

Providers of systemic risk GPAI models must document and report information about their energy consumption. This obligation reflects the regulatory recognition that frontier model training and inference at scale represents a significant and growing share of data centre energy demand, and that systemic risk model providers — whose models are likely to be among the most computationally intensive to train and serve — carry a specific transparency obligation in this area.

What must be reported

The energy reporting obligation covers: training energy consumption (measured in MWh, ideally broken down by training phase and hardware type); inference energy consumption (measured as energy per token or per query, at representative load profiles); the energy mix used during training and inference (proportion of renewable energy); and the geographic distribution of training and inference infrastructure (relevant to the energy mix assessment).
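
A minimal calculation for the inference figure, assuming the provider measures average serving power draw and sustained token throughput (both hypothetical inputs; the regulation does not prescribe a measurement method), might look like this:

def inference_kwh_per_million_tokens(avg_power_kw: float, tokens_per_second: float) -> float:
    # Energy per 1M tokens = power (kW) x time needed to produce 1M tokens (hours).
    seconds_per_million = 1_000_000 / tokens_per_second
    return avg_power_kw * (seconds_per_million / 3600)

# Illustrative figures: a 12 kW serving node sustaining 2,000 tokens per second
energy = inference_kwh_per_million_tokens(12.0, 2000.0)   # ~1.67 kWh per 1M tokens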

These figures must be reported to the AI Office and made publicly available in a standardised format. The intent is to enable cross-model comparisons that allow regulators, downstream providers, and deployers to assess the sustainability profile of systemic risk models as part of procurement and deployment decisions.

The energy efficiency reporting obligation for systemic risk models is separate from and additional to the training data summary requirements under Art.41, though providers preparing comprehensive technical documentation will naturally consolidate both in a single documentation package.

Art.43 × Art.41 Interface: Cumulative Documentation Architecture

For providers subject to both Art.41 and Art.43, the documentation architecture must integrate both articles' requirements into a coherent compliance package. The key integration points are:

Technical Documentation Expansion

Art.41's technical documentation requirements (for the AI Office and downstream providers) must be expanded with systemic risk model elements: model evaluation methodology and results, adversarial testing protocols and findings, cybersecurity architecture and security posture, and energy consumption figures. The AI Office's technical documentation templates for systemic risk models incorporate these elements.

Downstream Information Package Expansion

Art.41's downstream provider information package must include information specifically relevant to systemic risk model deployment: the scope of adversarial testing conducted and any residual risks identified, the incident reporting channels and requirements for downstream providers detecting model-level incidents, cybersecurity integration requirements for downstream serving infrastructure, and any capability limitations or constraints imposed based on evaluation results.

Ongoing Obligation Synchronisation

Art.41's ongoing obligations (updating documentation when the model changes, maintaining copyright policy, updating training data summary) must be synchronised with Art.43's ongoing obligations (repeating adversarial testing after significant model updates, monitoring for and reporting serious incidents). For organisations managing compliance for frontier model releases, this requires a compliance operation that runs continuously rather than just at model release points.
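
One simple way to operationalise that synchronisation is a trigger registry mapping model-lifecycle events to the compliance actions they require under both articles. The mapping below is an illustrative assumption about how a provider might organise this, not a structure taken from the regulation.

# Illustrative event-to-action mapping for a continuously running compliance operation.
COMPLIANCE_TRIGGERS = {
    "major_model_update": [
        "repeat adversarial testing (Art.43)",
        "update technical documentation (Art.41)",
        "refresh downstream information package (Art.41)",
    ],
    "significant_fine_tune": [
        "assess impact on risk profile and repeat targeted evaluations (Art.43)",
        "update training data summary if new data was used (Art.41)",
    ],
    "serious_incident_detected": [
        "report to the AI Office without undue delay (Art.43)",
        "document remediation and update evaluations where needed (Art.43)",
    ],
}

def actions_for(event: str) -> list[str]:
    return COMPLIANCE_TRIGGERS.get(event, [])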

Python Implementation: SystemicRiskGPAIManager

The following implementation provides a framework for managing Art.43 obligations across the systemic risk GPAI model compliance lifecycle.

from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime, date
from enum import Enum
from typing import Optional


class SystemicRiskClassification(Enum):
    NOT_CLASSIFIED = "not_classified"
    PRESUMED_SYSTEMIC = "presumed_systemic_risk"  # >10^25 FLOPs
    COMMISSION_DESIGNATED = "commission_designated"
    REBUTTAL_PENDING = "rebuttal_pending"
    REBUTTAL_ACCEPTED = "rebuttal_accepted_not_systemic"


class EvaluationStatus(Enum):
    SCHEDULED = "scheduled"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    REMEDIATION_REQUIRED = "remediation_required"


class IncidentSeverity(Enum):
    MINOR = "minor"              # product-level, not systemic
    SIGNIFICANT = "significant"  # monitor and document
    SERIOUS = "serious"          # report to AI Office


@dataclass
class TrainingComputeProfile:
    total_flops: float           # total FLOPs used in training
    hardware_type: str           # GPU/TPU type
    training_duration_days: int
    energy_mwh: float            # measured training energy
    energy_mix_renewable_pct: float
    datacenter_locations: list[str]

    @property
    def exceeds_threshold(self) -> bool:
        return self.total_flops > 1e25

    @property
    def systemic_risk_presumed(self) -> bool:
        return self.exceeds_threshold


@dataclass
class AdversarialTestingResult:
    test_date: date
    methodology: str             # e.g. "HELM + human red-team"
    evaluator_type: str          # "internal" | "external" | "ai_office_coordinated"
    domains_tested: list[str]    # e.g. ["biosecurity", "cybersecurity", "disinformation"]
    critical_findings: list[str]
    residual_risks: list[str]
    remediation_measures: list[str]
    evaluation_version: str      # model version evaluated
    next_evaluation_trigger: str  # e.g. "on next major update" or specific date

    @property
    def has_unmitigated_critical_findings(self) -> bool:
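        # Count-based heuristic: flags cases where documented remediation measures
        # do not at least match the number of critical findings.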
        return len(self.critical_findings) > len(self.remediation_measures)


@dataclass
class CybersecurityMeasures:
    weight_protection_implemented: bool
    training_infra_security_audit_date: Optional[date]
    inference_infra_pen_test_date: Optional[date]
    supply_chain_security_review_date: Optional[date]
    incident_detection_system: bool
    last_security_review_date: Optional[date]
    nis2_entity_status: str      # "essential" | "important" | "not_applicable"

    def compliance_gaps(self) -> list[str]:
        gaps = []
        if not self.weight_protection_implemented:
            gaps.append("Model weight protection measures not implemented")
        if not self.training_infra_security_audit_date:
            gaps.append("Training infrastructure security audit not conducted")
        if not self.inference_infra_pen_test_date:
            gaps.append("Inference infrastructure penetration test not conducted")
        if not self.supply_chain_security_review_date:
            gaps.append("Supply chain security review not conducted")
        if not self.incident_detection_system:
            gaps.append("Cybersecurity incident detection system not in place")
        return gaps


@dataclass
class SeriousIncidentReport:
    incident_id: str
    detection_date: datetime
    report_date: Optional[datetime]
    model_version: str
    incident_description: str
    scale_of_harm: str
    affected_parties: list[str]
    model_contribution: str
    measures_taken: list[str]
    open_questions: list[str]
    ai_office_notified: bool = False
    severity: IncidentSeverity = IncidentSeverity.SIGNIFICANT

    @property
    def requires_ai_office_notification(self) -> bool:
        return self.severity == IncidentSeverity.SERIOUS

    @property
    def notification_overdue(self) -> bool:
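        # The Act requires reporting "without undue delay" and sets no fixed deadline;
        # the 3-day window below is an internal escalation policy, not a regulatory one.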
        if not self.requires_ai_office_notification:
            return False
        if self.ai_office_notified:
            return False
        days_since_detection = (datetime.now() - self.detection_date).days
        return days_since_detection > 3


@dataclass
class EnergyEfficiencyReport:
    reporting_period_start: date
    reporting_period_end: date
    training_energy_mwh: float
    inference_energy_per_1m_tokens_kwh: float
    total_inference_energy_mwh: float
    renewable_energy_pct: float
    datacenter_pue: float        # Power Usage Effectiveness
    carbon_footprint_tonnes_co2e: float
    energy_certified: bool       # third-party certification

    def report_summary(self) -> str:
        return (
            f"Energy Report {self.reporting_period_start} to {self.reporting_period_end}: "
            f"Training {self.training_energy_mwh:.1f} MWh, "
            f"Inference {self.total_inference_energy_mwh:.1f} MWh, "
            f"Renewable {self.renewable_energy_pct:.0f}%, "
            f"Carbon {self.carbon_footprint_tonnes_co2e:.1f} tCO2e"
        )


@dataclass
class SystemicRiskGPAIManager:
    model_name: str
    model_version: str
    provider_name: str
    classification: SystemicRiskClassification
    training_compute: TrainingComputeProfile
    adversarial_tests: list[AdversarialTestingResult] = field(default_factory=list)
    cybersecurity: Optional[CybersecurityMeasures] = None
    incidents: list[SeriousIncidentReport] = field(default_factory=list)
    energy_reports: list[EnergyEfficiencyReport] = field(default_factory=list)

    def is_systemic_risk_model(self) -> bool:
        # A pending rebuttal does not suspend the presumption: the Art.43 obligations
        # continue to apply until the AI Office accepts the provider's rebuttal.
        return self.classification in (
            SystemicRiskClassification.PRESUMED_SYSTEMIC,
            SystemicRiskClassification.COMMISSION_DESIGNATED,
            SystemicRiskClassification.REBUTTAL_PENDING,
        )

    def latest_adversarial_test(self) -> Optional[AdversarialTestingResult]:
        if not self.adversarial_tests:
            return None
        return max(self.adversarial_tests, key=lambda t: t.test_date)

    def pending_serious_incident_reports(self) -> list[SeriousIncidentReport]:
        return [
            inc for inc in self.incidents
            if inc.requires_ai_office_notification and not inc.ai_office_notified
        ]

    def overdue_incident_reports(self) -> list[SeriousIncidentReport]:
        return [inc for inc in self.incidents if inc.notification_overdue]

    def art43_compliance_assessment(self) -> dict:
        if not self.is_systemic_risk_model():
            return {"status": "not_applicable", "reason": "Model not classified as systemic risk"}

        issues = []

        # Adversarial testing
        latest_test = self.latest_adversarial_test()
        if not latest_test:
            issues.append("CRITICAL: No adversarial testing conducted")
        else:
            # Staleness and unmitigated findings are independent problems; check both.
            if (date.today() - latest_test.test_date).days > 365:
                issues.append("WARNING: Adversarial testing older than 12 months — repeat evaluation required")
            if latest_test.has_unmitigated_critical_findings:
                issues.append(f"CRITICAL: {len(latest_test.critical_findings)} unmitigated critical findings from adversarial testing")

        # Cybersecurity
        if not self.cybersecurity:
            issues.append("CRITICAL: Cybersecurity measures not documented")
        else:
            for gap in self.cybersecurity.compliance_gaps():
                issues.append(f"GAP: {gap}")

        # Incident reporting
        for incident in self.overdue_incident_reports():
            issues.append(f"CRITICAL: Serious incident {incident.incident_id} notification to AI Office overdue")

        # Energy reporting
        if not self.energy_reports:
            issues.append("WARNING: No energy efficiency report filed")

        return {
            "model": self.model_name,
            "version": self.model_version,
            "classification": self.classification.value,
            "total_issues": len(issues),
            "issues": issues,
            "status": "COMPLIANT" if not issues else "NON_COMPLIANT",
        }

    def generate_ai_office_summary(self) -> str:
        assessment = self.art43_compliance_assessment()
        latest_test = self.latest_adversarial_test()
        lines = [
            f"Art.43 Compliance Summary — {self.model_name} v{self.model_version}",
            f"Provider: {self.provider_name}",
            f"Classification: {self.classification.value}",
            f"Training compute: {self.training_compute.total_flops:.2e} FLOPs "
            f"({'exceeds' if self.training_compute.exceeds_threshold else 'below'} 10^25 threshold)",
            "",
            "Adversarial Testing:",
        ]
        if latest_test:
            lines.append(f"  Last test: {latest_test.test_date} ({latest_test.evaluator_type})")
            lines.append(f"  Domains: {', '.join(latest_test.domains_tested)}")
            lines.append(f"  Critical findings: {len(latest_test.critical_findings)}")
            lines.append(f"  Remediation measures: {len(latest_test.remediation_measures)}")
        else:
            lines.append("  No adversarial testing recorded")

        lines.append("")
        lines.append(f"Open serious incidents requiring AI Office notification: {len(self.pending_serious_incident_reports())}")
        lines.append(f"Overdue notifications: {len(self.overdue_incident_reports())}")
        lines.append("")
        lines.append(f"Compliance status: {assessment['status']}")
        if assessment['issues']:
            lines.append("Issues:")
            for issue in assessment['issues']:
                lines.append(f"  • {issue}")
        return "\n".join(lines)
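
A brief usage sketch, with illustrative values only, showing how the manager ties the classification, evaluation, and reporting threads together:

compute = TrainingComputeProfile(
    total_flops=3.2e25, hardware_type="H100", training_duration_days=90,
    energy_mwh=21_500.0, energy_mix_renewable_pct=62.0,
    datacenter_locations=["FI", "IE"],
)

manager = SystemicRiskGPAIManager(
    model_name="example-frontier-model",   # hypothetical model and provider
    model_version="1.0",
    provider_name="Example AI",
    classification=SystemicRiskClassification.PRESUMED_SYSTEMIC,
    training_compute=compute,
)

assessment = manager.art43_compliance_assessment()
print(assessment["status"])   # NON_COMPLIANT: no testing, cybersecurity, or energy documentation yet
print(manager.generate_ai_office_summary())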

Art.43 Compliance Checklist

Providers of GPAI models that meet or may meet the systemic risk threshold should work through the following checklist.

Classification and Threshold Assessment

  1. Have you measured and documented your total training compute in FLOPs for all training phases contributing to the model's current state?
  2. If your training compute exceeds 10^25 FLOPs, have you notified the AI Office and registered the model in the AI Office database?
  3. If your training compute is near (within an order of magnitude of) the 10^25 threshold, have you assessed whether architectural efficiency or multimodal capability could trigger a Commission designation independent of the compute figure?
  4. If you believe your model is presumed systemic risk but does not actually present systemic risk, have you prepared and submitted a rebuttal to the AI Office with documented evidence?
  5. Do you have a process for monitoring changes to the Commission's delegated acts updating the computational threshold?

Model Evaluation and Adversarial Testing

  1. Have you conducted pre-release adversarial testing covering the main systemic risk domains: mass-casualty weapons, offensive cyber, coordinated disinformation, and CSAM generation?
  2. Were the adversarial testing evaluators qualified and independent — either external red-team providers, internal safety teams with separation from model development, or national AI safety institute participation?
  3. Have you documented your adversarial testing methodology, test sets, and findings in a format accessible to the AI Office?
  4. Have you implemented remediation measures for all critical findings identified in adversarial testing, or documented the residual risk justification for findings you have not remediated?
  5. Do you have a trigger-based schedule for repeating adversarial testing after significant model updates or fine-tuning operations?
  6. If the AI Office requests participation in a coordinated evaluation process, do you have the operational capacity and contractual authorisation to participate?

Serious Incident Reporting

  1. Do you have a monitoring system capable of detecting serious incidents at the model level — including incidents originating in downstream products that are attributable to your model's capabilities?
  2. Do you have escalation procedures for assessing whether an observed incident reaches the serious incident threshold requiring AI Office notification?
  3. Do you have reporting templates and operational procedures for filing serious incident reports to the AI Office without undue delay?
  4. Have you established channels through which downstream providers can report to you model-level incidents observed in their products?

Cybersecurity Measures

  1. Have you implemented technical measures to protect your model's weights against extraction, exfiltration, or theft, including controls on inference API access that limit weight reconstruction attacks?
  2. Have you conducted a security audit of your training infrastructure covering access controls, data pipeline integrity, and supply chain security?
  3. Have you conducted penetration testing of your inference infrastructure covering API security, rate limiting, and lateral movement risk?
  4. If your organisation qualifies as an essential or important entity under NIS2, have you integrated your Art.43 cybersecurity obligations with your NIS2 compliance programme?
  5. Do you have an internal process for assessing, documenting, and (where required) reporting cybersecurity incidents affecting your model or infrastructure to the AI Office?

Energy Efficiency Reporting

  1. Do you measure and record training energy consumption in MWh, with appropriate granularity to distinguish pre-training, fine-tuning, and reinforcement learning phases?
  2. Do you measure and record inference energy consumption in a standardised unit (energy per 1M tokens or per query at representative load)?
  3. Have you filed your training and inference energy figures, renewable energy share, and infrastructure locations with the AI Office in the required format?

Enforcement and Practical Implications

The AI Office is the primary enforcement authority for Art.43 obligations. It can request documents and information from systemic risk GPAI model providers, conduct evaluations (including coordinated adversarial testing), issue compliance orders, and impose fines. The fines for Art.43 violations follow the general GPAI model violations scale: up to 3% of worldwide annual turnover or €15 million, whichever is higher — below the 7% maximum applicable to prohibited practices but above the 1% applicable to information provision failures.

For frontier AI developers, the practical implication is that Art.43 is not a box-checking exercise — it is a regulatory framework that will evolve in specificity as the AI Office develops implementation guidance, model evaluation protocols, and incident reporting procedures. Developers operating in this space should track the AI Office's model evaluation methodology development (which is parallel-tracked with the GPAI Code of Practice), engage with national AI safety institute programmes that provide evaluation capacity, and structure their compliance operations to handle the ongoing nature of the obligations rather than treating them as pre-release checkpoints.

Article 43 is, in regulatory terms, a statement of priorities: at the frontier of AI capability, the EU is asserting that evaluation, reporting, security, and transparency are non-negotiable — and that the organisations operating at that frontier bear the cost of demonstrating those properties to public authorities, not just asserting them to customers.