2026-04-14 · 13 min read · sota.io team

GPAI Code of Practice Chapter 3: Adversarial Testing, Red-Teaming, and Incident Reporting for Systemic Risk AI (2026)

The GPAI Code of Practice, adopted by the EU AI Office in July 2025, has three chapters. Chapters 1 (Transparency) and 2 (Copyright) apply to all GPAI model providers. Chapter 3 (Safety & Security) applies only to Systemic Risk providers — those whose models exceed the 10^25 FLOPs training compute threshold defined in Art.51(1)(a) of the EU AI Act, or whose models are designated as Systemic Risk by the AI Office under Art.51(1)(b).

This post is a technical deep-dive into Chapter 3. Our earlier guide (GPAI Code of Practice Final: Implementation Guide) covered all three chapters at an overview level. This guide focuses exclusively on what Chapter 3 actually requires in practice: the ten Safety & Security measures (S-01 through S-10), how to run a compliant red-teaming program, what triggers incident reporting, and what the cybersecurity controls mean at the implementation level.

If your GPAI model is below the 10^25 FLOPs threshold and has not been individually designated by the AI Office, Chapter 3 does not apply to you. If it does apply, Chapter 3 is the most operationally demanding part of GPAI compliance.


The Systemic Risk Threshold: Art.51 and CoP Chapter 3 Scope

Before Chapter 3 obligations apply, the provider must determine whether their model qualifies as a Systemic Risk GPAI model.

The Threshold: 10^25 FLOPs

Art.51(1)(a) sets the primary trigger: a GPAI model is presumed to present systemic risk if it was trained using a total computing power of more than 10^25 floating point operations (FLOPs). This threshold was first established in the EU AI Act's recitals as an indicative figure for frontier-scale models and was codified in the final text.

The 10^25 FLOPs figure covers training compute across all training runs contributing to the final model — pre-training, fine-tuning on instruction-following data, and safety alignment fine-tuning all count toward the total. Compute figures are typically measured in GPU-hours and converted to FLOPs based on hardware specifications. GPAI CoP Chapter 1 (Transparency) requires providers to document FLOPs in the technical documentation for all GPAI models; for Systemic Risk threshold assessment, this documentation is also the primary evidence for self-assessment.
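The GPU-hours-to-FLOPs conversion can be sketched as follows. The peak-throughput and utilization figures are illustrative assumptions, not CoP-prescribed values:

```python
# Illustrative GPU-hours to FLOPs conversion for threshold self-assessment.
# The peak-throughput and utilization figures are assumptions, not CoP values.

def estimate_training_flops(gpu_hours: float,
                            peak_flops_per_gpu: float,
                            utilization: float) -> float:
    """Approximate total training FLOPs from aggregate GPU-hours."""
    return gpu_hours * 3600 * peak_flops_per_gpu * utilization

# Example: 10,000,000 GPU-hours on hardware with an assumed 9.89e14 FLOP/s
# dense BF16 peak, at an assumed 40% sustained utilization.
total = estimate_training_flops(10_000_000, 9.89e14, 0.40)
exceeds_threshold = total > 1e25  # Art.51(1)(a) presumption trigger
```

Because sustained utilization varies widely across training runs, the conservative approach for self-assessment is to document the utilization figure used and its source.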

| Training Compute | Art.51 Status | Chapter 3 Applies? |
| --- | --- | --- |
| < 10^25 FLOPs | Below threshold — no automatic Systemic Risk | No (unless Art.51(1)(b) designation) |
| ≥ 10^25 FLOPs | Presumed Systemic Risk under Art.51(1)(a) | Yes |
| Any compute | AI Office designation under Art.51(1)(b) | Yes, if designated |

AI Office Designation: Art.51(1)(b)

The AI Office can designate any GPAI model as Systemic Risk based on factors other than raw training compute. This pathway exists because training compute is not the only proxy for systemic risk — a model with 5×10^24 FLOPs that achieves dangerous capability levels through exceptionally efficient training or architectural innovation can be designated. Art.51(1)(b) designation is based on capability evaluations, not self-reported compute.

The AI Office publishes the list of designated Systemic Risk GPAI models. Providers that cross the 10^25 FLOPs threshold must notify the AI Office and have 15 working days to submit a self-assessment. The CoP provides a standard format for this notification.
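The 15-working-day window can be computed with a minimal sketch that counts weekdays only (EU public holidays, which would also be excluded in practice, are ignored here for simplicity):

```python
from datetime import date, timedelta

def add_working_days(start: date, working_days: int) -> date:
    """Advance a date by N working days, skipping Saturdays and Sundays.
    EU public holidays would also be excluded in practice; they are
    ignored here for simplicity."""
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current

# Hypothetical example: threshold crossed on Monday 2026-03-02.
deadline = add_working_days(date(2026, 3, 2), 15)  # → date(2026, 3, 23)
```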

Annual Re-Evaluation

Systemic Risk designation is not permanent. As capability benchmarks evolve and new models enter the market, the AI Office conducts annual reviews. A model that was Systemic Risk may be de-listed; a model that was below threshold may be added. The Chapter 3 obligations track designation status, not just initial classification.


Chapter 3 Structure: Ten Safety & Security Measures

The GPAI CoP Chapter 3 is organized into ten Safety & Security measures grouped into three functional categories:

Pre-Deployment Adversarial Testing (S-01 to S-04): Testing that must occur before initial deployment and before material changes.

Incident Reporting (S-05 to S-07): The process and timeline for reporting serious incidents to the AI Office.

Cybersecurity Controls (S-08 to S-10): Continuous operational controls for model security.

| Measure | Category | Core Requirement |
| --- | --- | --- |
| S-01 | Pre-Deployment | Define adversarial testing scope across five capability categories |
| S-02 | Pre-Deployment | Conduct red-teaming with independent third-party evaluators |
| S-03 | Pre-Deployment | Pre-deployment gate decision incorporating test results |
| S-04 | Pre-Deployment | Retesting triggers for material changes and new deployment contexts |
| S-05 | Incident Reporting | Serious incident definition and internal classification process |
| S-06 | Incident Reporting | 72-hour initial notification to the AI Office GPAI Incident Portal |
| S-07 | Incident Reporting | 15-day root cause analysis report |
| S-08 | Cybersecurity | Prompt injection protection — input validation and output monitoring |
| S-09 | Cybersecurity | Model weight access control — physical and logical security |
| S-10 | Cybersecurity | Anomaly monitoring and behavioral drift detection |

Pre-Deployment Adversarial Testing (S-01 to S-04)

S-01: Adversarial Testing Scope

Measure S-01 requires providers to define and document the scope of adversarial testing before each deployment. The CoP specifies five capability categories that must always be included in the testing scope for Systemic Risk models:

Category 1 — CBRN Uplift (Chemical, Biological, Radiological, Nuclear)
The model must be tested for its ability to provide uplift to actors seeking to synthesize, acquire, or deploy CBRN agents. "Uplift" means meaningfully increasing the capability of a non-expert actor beyond what is freely available in public literature. Testing uses structured threat scenarios — e.g., a red-teamer plays a chemistry graduate student attempting to synthesize a schedule-1 substance using only the model's guidance. The test measures how much the model's responses narrow the capability gap between an expert and a non-expert in operational terms.

Category 2 — Cyberoffensive Capabilities
The model must be tested for its ability to assist in creating novel malware, exploiting previously-unpatched vulnerabilities, or automating offensive cybersecurity operations at scale. This includes both direct code generation and step-by-step operational guidance that constitutes meaningful assistance beyond publicly documented techniques.

Category 3 — Critical Infrastructure Dependencies
The model must be tested for its ability to generate accurate, operationally specific guidance for attacking or disrupting critical infrastructure — electrical grids, water treatment systems, financial clearing infrastructure, communications networks. Testing includes both direct attack guidance and scenario reconstruction where indirect guidance could be assembled into an attack plan.

Category 4 — Autonomous Goal-Seeking Behavior
The model must be tested for its ability to pursue multi-step goals autonomously in agentic contexts — taking actions in the real world, acquiring resources, resisting shutdown, and deceiving operators or users about its activities. This category is specifically relevant for GPAI models deployed in agentic frameworks. Testing uses sandboxed environments with tool-calling capabilities and structured goal-seeking tasks.

Category 5 — Large-Scale Persuasion and Manipulation
The model must be tested for its ability to generate highly effective personalized persuasive content at scale — targeted influence operations, coordinated disinformation campaigns, psychological manipulation at individual level. Testing includes both direct generation tasks and scaled deployment scenarios where the model's outputs could be used for mass influence operations.

For each category, the testing scope document must specify: the threat model, the test methodology, the pass/fail criteria, and the remediation pathway if test results indicate unacceptable risk levels.
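The scope document's required elements can be modeled as a simple record; the field names below are illustrative, not prescribed by the CoP:

```python
from dataclasses import dataclass, field

@dataclass
class TestingScopeEntry:
    """One S-01 capability category in the testing scope document.
    Field names here are illustrative, not prescribed by the CoP."""
    category: str             # e.g. "cbrn_uplift"
    threat_model: str         # who attacks, with what resources and expertise
    methodology: str          # how the category is tested
    pass_fail_criteria: str   # threshold separating acceptable from unacceptable
    remediation_pathway: str  # what happens if results indicate unacceptable risk
    scenarios: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # All four required elements must be filled in before the scope
        # document can support a deployment gate review.
        return all([self.threat_model, self.methodology,
                    self.pass_fail_criteria, self.remediation_pathway])
```

One such record per capability category gives a five-entry scope document that can be checked for completeness before testing begins.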

S-02: Red-Teaming Methodology and Third-Party Evaluators

Measure S-02 governs who conducts the testing and how. The CoP requires a hybrid model: internal safety team testing plus at least one independent external red-teaming exercise per model generation before initial deployment.

Internal testing: The provider's own safety, alignment, or red-teaming team conducts structured capability evaluations across all five S-01 categories. Internal testing is iterative — it informs model modifications throughout development. Internal testers must have documented independence from the model development team: results cannot be filtered by product owners before safety team review.

External / independent evaluators: At least one external red-teaming exercise must be conducted by a third party that meets independence requirements. The CoP defines independence as: no financial dependency on the model development team, no prior engagement that could create conflict of interest, no access to the model's training data or architecture details beyond what is provided for the specific evaluation.

Third-party evaluator requirements:

The AI Office maintains a non-binding registry of evaluators that have completed CoP methodology training. Using a registry-listed evaluator is not required but creates a documentation trail that simplifies the presumption of conformity argument.

Automated evaluations (model vs. model attacks, benchmark scoring, automated prompt injection testing) may supplement but do not substitute for human red-teaming in the external evaluator requirement.

S-03: Pre-Deployment Gate Decision

Measure S-03 requires that the pre-deployment testing results be formally incorporated into the deployment decision. This is not a bureaucratic requirement — it defines the conditions under which a model may be deployed and what happens when test results indicate unacceptable risk levels.

The CoP requires a documented deployment gate review that includes:

If any capability category is assessed as unacceptable risk level and cannot be mitigated before the target deployment date, the deployment must be delayed. The CoP does not permit deploying with documented unacceptable risk and a plan to fix it post-deployment.

Risk categorization framework for S-03:

| Risk Level | Definition | Gate Decision |
| --- | --- | --- |
| Acceptable | Testing confirms the capability category presents negligible uplift or risk | Deploy permitted |
| Acceptable with mitigations | Testing identifies risk; mitigations implemented and retested | Deploy permitted after successful retest |
| Conditionally acceptable | Risk present but below deployment threshold; monitoring plan required | Deploy with enhanced monitoring commitment |
| Unacceptable | Testing confirms the capability category presents meaningful uplift or risk beyond threshold | No deployment until risk is reduced |

The deployment gate document must be retained for the duration of the model's deployment plus five years, and must be produced on request to the AI Office.

S-04: Retesting Triggers

Measure S-04 requires that the pre-deployment testing cycle be repeated when specified conditions occur after initial deployment. The rationale: a model that passed initial red-teaming may change through subsequent fine-tuning, may be deployed in new contexts, or may exhibit emergent behaviors that are only discovered in production.

Mandatory retesting triggers under S-04:

Retesting does not always require a full external red-team exercise. If the trigger is a narrow change — for example, a fine-tuning run that modified only the model's tone and writing style — a targeted internal retest of the affected capability categories may satisfy S-04. The provider must document the rationale for why a limited retest was appropriate.


Incident Reporting (S-05 to S-07)

S-05: Serious Incident Definition

Measure S-05 requires providers to maintain an internal incident classification process that identifies when an event constitutes a "serious incident" triggering reporting obligations under S-06 and S-07.

The CoP defines a serious incident as any of the following:

Death or serious physical harm to persons: Any incident where the model's output is determined to be a causal or contributing factor in physical injury or death. This includes cases where the model provided medical misinformation that led to harmful treatment decisions, operational guidance that contributed to physical accidents, or content that contributed to self-harm.

CBRN-relevant outputs discovered in production: Discovery that the model has provided meaningful uplift in any of the CBRN categories to a user or group of users, regardless of whether harm has occurred. The discovery itself — through monitoring, external report, or law enforcement notification — is a reportable event.

Large-scale psychological or societal harm: Documented use of the model's outputs in large-scale influence operations, coordinated manipulation campaigns, or disinformation events that caused measurable societal harm.

Critical infrastructure incident: Discovery that the model's outputs were used in or contributed to an attack on or disruption of critical infrastructure.

Model security breach: Unauthorized access to model weights, training data, or inference infrastructure by a party with potentially malicious intent.

The CoP specifies that the incident classification must be made by the provider's safety or incident response team, not by the product or business teams. This independence requirement prevents incidents from being reclassified to avoid reporting obligations.

S-06: 72-Hour Initial Notification

Within 72 hours of classifying an event as a serious incident, the provider must submit an initial notification to the AI Office through the GPAI Incident Reporting Portal. The 72-hour clock starts when the provider has sufficient information to classify the event as serious — not when the underlying event occurred.

The initial notification must contain:

| Field | Required Content |
| --- | --- |
| Incident identifier | Provider-assigned unique ID for tracking |
| Incident category | Which serious incident definition triggered reporting |
| Discovery date/time | When the provider became aware of the event |
| Classification date/time | When the event was classified as serious |
| Affected model(s) | Model identifiers, versions, and deployment context |
| Preliminary description | What is known about the event at notification time |
| Immediate containment actions | Steps taken since discovery to limit harm |
| Preliminary causal assessment | Best current hypothesis about cause (subject to change) |
| Known third-party notifications | Law enforcement, downstream providers, affected parties notified |

The 72-hour window is explicitly not conditioned on having complete information. Providers must submit the initial notification with available information and update it as the investigation progresses. Failure to submit because the investigation is ongoing is not a valid reason for missing the 72-hour deadline.

Downstream provider notification: If the incident involves a model deployed through downstream providers, the upstream GPAI provider must also notify affected downstream providers within 72 hours. The CoP does not specify a separate timeline for downstream notification — it is assumed to occur in parallel with the AI Office notification.

S-07: 15-Day Root Cause Report

Within 15 calendar days of the initial S-06 notification, the provider must submit a root cause analysis report to the AI Office. The 15-day clock starts from the date of the S-06 notification, not the incident discovery date.

The root cause report must address:

What happened: A complete factual reconstruction of the incident from the earliest detectable precursor through the immediate cause of harm or risk. This includes model inputs and outputs, user interactions, the deployment context, and the causal chain between the model's behavior and the identified harm or risk.

Why it happened: The technical root cause(s). For capability-related incidents, this must include an assessment of whether the capability was present in deployment gate testing, why it was not detected, and what capability evaluation gap exists. For security incidents, this must include the attack vector, the exploitation mechanism, and how existing controls failed.

What was done immediately: The containment actions taken in the 72-hour window after discovery — model access restrictions, downstream provider notifications, affected user notifications, law enforcement engagement.

What will be done to prevent recurrence: Corrective actions with implementation timelines, updated deployment gate criteria, changes to the S-01 capability category scope, security control improvements. Each corrective action must identify the responsible individual and the completion date.

Evidence package: Relevant logs, monitoring data, model evaluation records, user interaction data (pseudonymized per GDPR requirements), and third-party reports. The AI Office may request additional evidence after receiving the root cause report.

The AI Office reviews the root cause report and may:


Cybersecurity Controls (S-08 to S-10)

S-08: Prompt Injection Protection

Measure S-08 requires GPAI providers to implement and maintain technical controls to protect against prompt injection attacks — attempts to override or subvert the model's instructions through adversarial inputs.

Input-level controls:

Output-level controls:

For agentic deployments specifically: When the GPAI model operates with tool-calling capabilities or has access to external data sources, S-08 requires additional controls: validation of tool-call parameters before execution (preventing injection-driven tool misuse), sandboxing of tool execution environments, and explicit re-validation of the model's instruction state before high-consequence tool calls.
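The pre-execution validation step described above can be sketched as follows; the tool registry and parameter schemas are illustrative assumptions, not part of the CoP text:

```python
# Sketch of pre-execution tool-call validation for agentic deployments
# (S-08). The tool registry and parameter schemas below are illustrative
# assumptions, not part of the CoP text.

ALLOWED_TOOLS = {
    "web_search": {"query"},
    "send_email": {"to", "subject", "body"},
}
HIGH_CONSEQUENCE = {"send_email"}

def validate_tool_call(tool: str, params: dict,
                       instruction_state_verified: bool) -> bool:
    """Reject unknown tools, unexpected parameters, and high-consequence
    calls made before the model's instruction state has been re-validated."""
    if tool not in ALLOWED_TOOLS:
        return False
    if not set(params) <= ALLOWED_TOOLS[tool]:
        return False
    if tool in HIGH_CONSEQUENCE and not instruction_state_verified:
        return False
    return True
```

Rejecting unexpected parameters closes the common injection route where adversarial content in retrieved data smuggles extra arguments into an otherwise legitimate tool call.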

The CoP does not specify which technical implementation satisfies S-08 — it specifies the outcome requirement (reduction in successful injection rate to below a defined threshold). Providers document their implementation approach and the effectiveness metrics in the Chapter 3 compliance documentation.

S-09: Model Weight Access Control

Measure S-09 requires that model weights — the trained parameters of the GPAI model — be protected against unauthorized access, exfiltration, or modification.

Physical security: Training compute and inference infrastructure must be hosted in facilities with documented physical access controls. CLOUD Act jurisdiction is specifically identified in the CoP as a risk factor: infrastructure hosted in US-jurisdiction data centers is subject to CLOUD Act orders that could provide US government access without EU legal process. CoP guidance recommends EU-sovereign infrastructure for model weight storage, though this is not a hard requirement.

Logical security:
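As one illustration of a logical security control, a minimal sketch of multi-person authorization for weight export (the two-approver requirement is an assumption, not a CoP figure):

```python
# Illustrative multi-person authorization check for model weight export
# (S-09). The two-approver requirement is an assumption, not a CoP figure.

def export_authorized(requester: str, approvers: set[str],
                      required_approvals: int = 2) -> bool:
    """Permit export only with enough distinct approvers, excluding the
    requester (no self-approval)."""
    independent = approvers - {requester}
    return len(independent) >= required_approvals
```

In practice this check would sit behind the access-logging layer, so every export attempt is recorded whether or not it is authorized.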

Supply chain security:

S-10: Anomaly Monitoring and Behavioral Drift Detection

Measure S-10 requires continuous monitoring of the model's production behavior to detect changes that may indicate a security incident, capability drift, or successful attack.

Behavioral baseline: The provider must establish a behavioral baseline derived from the deployment gate testing. The baseline documents expected output distributions across the five S-01 capability categories and normal response patterns for representative input types.

Monitoring requirements:

Response to detected drift: When anomaly monitoring detects a significant deviation from the behavioral baseline, the response process is:

  1. Internal escalation to the safety team for review
  2. If the deviation is consistent with a capability-related serious incident category under S-05, initiate the incident classification process
  3. If the deviation is not immediately explicable, trigger a targeted S-04 retest of the affected capability categories
  4. Document the anomaly, the investigation, and the determination — regardless of whether it resulted in an S-05 classification
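A minimal sketch of the deviation check that would trigger step 1 above, using refusal rate on a fixed set of sentinel prompts as the monitored metric (the 3-sigma threshold is an assumption):

```python
# Minimal sketch of S-10 drift detection: compare a production behavioral
# metric (here, refusal rate on a fixed set of sentinel prompts) against
# the deployment-gate baseline. The 3-sigma threshold is an assumption.
from statistics import mean, stdev

def drift_alert(baseline_samples: list[float],
                production_value: float,
                n_sigma: float = 3.0) -> bool:
    """Flag a production metric that deviates from the baseline mean by
    more than n_sigma baseline standard deviations."""
    mu = mean(baseline_samples)
    sigma = stdev(baseline_samples)
    return abs(production_value - mu) > n_sigma * sigma

# Refusal rates observed during deployment-gate testing (illustrative):
baseline = [0.12, 0.11, 0.13, 0.12, 0.10, 0.12, 0.11, 0.13]
```

A sudden collapse of the refusal rate toward zero, for example, would trip the alert and feed the escalation process.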

Chapter 3 vs Art.55: Statutory Mapping

The GPAI CoP Chapter 3 measures are designed to satisfy the Art.55 statutory obligations for Systemic Risk GPAI models (Art.53 sets the obligations for all GPAI providers; Art.55 adds the systemic-risk obligations that Chapter 3 implements). Understanding the mapping is important for providers who need to demonstrate compliance through the equivalence pathway (non-CoP signatories) or who need to respond to AI Office investigations.

| Art.55 Obligation | CoP Measures | Notes |
| --- | --- | --- |
| Art.55(1)(a) — Model evaluation, including adversarial testing | S-01, S-02 | Third-party red-teaming is the CoP's implementation of "adequate testing" |
| Art.55(1)(b) — Assessment and mitigation of systemic risks | S-01, S-03, S-04 | The deployment gate and retesting triggers operationalize risk mitigation |
| Art.55(1)(c) — Serious incident tracking and reporting | S-05, S-06, S-07 | The 72-hour/15-day timeline is CoP-specific, not mandated by the Art.55 text |
| Art.55(1)(d) — Cybersecurity protection | S-08, S-09, S-10 | CoP gives specific technical form to the statutory cybersecurity obligation |
| Energy efficiency reporting | Not in Chapter 3 | Energy documentation belongs to Chapter 1 (Transparency), not Chapter 3 |

Presumption of conformity: A CoP signatory that has implemented all ten S-01 to S-10 measures, and documented that implementation, benefits from the Art.56(8) presumption of conformity for the Art.55 obligations. The burden shifts: the AI Office must prove a violation rather than the provider proving compliance.

Non-signatory equivalence: A provider that has not signed the CoP must demonstrate that its alternative measures achieve equivalent protection for each Art.55(1) obligation. The AI Office's published guidance indicates that the Chapter 3 measures set the reference level for equivalence assessment — a non-signatory must show that its approach achieves protection at least equivalent to S-01 through S-10.


Python Tooling: Systemic Risk Chapter 3 Compliance Tracker

from dataclasses import dataclass, field
from datetime import date, datetime, timedelta
from enum import Enum
from typing import Optional


class CapabilityCategory(Enum):
    CBRN_UPLIFT = "cbrn_uplift"
    CYBEROFFENSIVE = "cyberoffensive"
    CRITICAL_INFRASTRUCTURE = "critical_infrastructure"
    AUTONOMOUS_GOAL_SEEKING = "autonomous_goal_seeking"
    LARGE_SCALE_PERSUASION = "large_scale_persuasion"


class RiskLevel(Enum):
    ACCEPTABLE = "acceptable"
    ACCEPTABLE_WITH_MITIGATIONS = "acceptable_with_mitigations"
    CONDITIONALLY_ACCEPTABLE = "conditionally_acceptable"
    UNACCEPTABLE = "unacceptable"


class EvaluatorType(Enum):
    INTERNAL = "internal"
    EXTERNAL_INDEPENDENT = "external_independent"
    AUTOMATED = "automated"


class IncidentCategory(Enum):
    DEATH_OR_SERIOUS_HARM = "death_or_serious_harm"
    CBRN_RELEVANT_OUTPUT = "cbrn_relevant_output"
    LARGE_SCALE_HARM = "large_scale_harm"
    CRITICAL_INFRASTRUCTURE = "critical_infrastructure"
    MODEL_SECURITY_BREACH = "model_security_breach"


@dataclass
class CapabilityTestResult:
    category: CapabilityCategory
    evaluator_type: EvaluatorType
    test_date: date
    risk_level: RiskLevel
    methodology_documented: bool
    third_party_report_received: bool  # Required if evaluator_type == EXTERNAL_INDEPENDENT
    mitigations_implemented: list[str] = field(default_factory=list)
    retest_required: bool = False
    retest_passed: Optional[bool] = None


@dataclass
class PreDeploymentGate:
    gate_date: date
    model_version: str
    responsible_officer: str
    test_results: list[CapabilityTestResult] = field(default_factory=list)
    deployment_approved: bool = False
    approval_conditions: list[str] = field(default_factory=list)

    def can_deploy(self) -> bool:
        """Returns True only if all five categories tested and none unacceptable."""
        if len(self.test_results) < 5:
            return False
        categories_tested = {r.category for r in self.test_results}
        all_categories = set(CapabilityCategory)
        if not all_categories.issubset(categories_tested):
            return False
        for result in self.test_results:
            if result.risk_level == RiskLevel.UNACCEPTABLE:
                return False
            if result.retest_required and result.retest_passed is not True:
                return False
        return True

    def missing_categories(self) -> list[CapabilityCategory]:
        tested = {r.category for r in self.test_results}
        return [c for c in CapabilityCategory if c not in tested]

    def has_external_evaluator(self) -> bool:
        return any(
            r.evaluator_type == EvaluatorType.EXTERNAL_INDEPENDENT
            for r in self.test_results
        )


@dataclass
class IncidentReport:
    incident_id: str
    category: IncidentCategory
    discovery_datetime: datetime
    classification_datetime: datetime
    affected_model_version: str
    preliminary_description: str
    containment_actions: list[str] = field(default_factory=list)
    ai_office_notified_datetime: Optional[datetime] = None
    root_cause_report_submitted_datetime: Optional[datetime] = None
    root_cause_summary: Optional[str] = None

    def notification_deadline(self) -> datetime:
        """72-hour notification deadline from classification."""
        return self.classification_datetime + timedelta(hours=72)

    def root_cause_deadline(self) -> datetime:
        """15-day root cause report deadline from notification."""
        if self.ai_office_notified_datetime:
            return self.ai_office_notified_datetime + timedelta(days=15)
        return self.notification_deadline() + timedelta(days=15)

    def notification_overdue(self) -> bool:
        if self.ai_office_notified_datetime:
            return False
        return datetime.utcnow() > self.notification_deadline()

    def root_cause_overdue(self) -> bool:
        if self.root_cause_report_submitted_datetime:
            return False
        return datetime.utcnow() > self.root_cause_deadline()

    def compliance_status(self) -> dict:
        return {
            "notification_met": self.ai_office_notified_datetime is not None
            and self.ai_office_notified_datetime <= self.notification_deadline(),
            "root_cause_met": self.root_cause_report_submitted_datetime is not None
            and self.root_cause_report_submitted_datetime <= self.root_cause_deadline(),
            "notification_overdue": self.notification_overdue(),
            "root_cause_overdue": self.root_cause_overdue(),
        }


@dataclass
class CybersecurityControls:
    prompt_injection_controls_documented: bool = False
    prompt_injection_effectiveness_metric: Optional[float] = None  # % injection attempts blocked
    model_weight_access_log_active: bool = False
    model_weight_encryption_at_rest: bool = False
    model_weight_multi_person_auth: bool = False
    eu_sovereign_infrastructure: bool = False  # CoP recommendation
    anomaly_monitoring_active: bool = False
    behavioral_baseline_established: bool = False
    anomaly_alert_threshold_defined: bool = False
    last_monitoring_review: Optional[date] = None

    def s08_compliant(self) -> bool:
        return self.prompt_injection_controls_documented

    def s09_compliant(self) -> bool:
        return (
            self.model_weight_access_log_active
            and self.model_weight_encryption_at_rest
            and self.model_weight_multi_person_auth
        )

    def s10_compliant(self) -> bool:
        return (
            self.anomaly_monitoring_active
            and self.behavioral_baseline_established
            and self.anomaly_alert_threshold_defined
        )

    def gaps(self) -> list[str]:
        result = []
        if not self.s08_compliant():
            result.append("S-08: Prompt injection controls not documented")
        if not self.s09_compliant():
            if not self.model_weight_access_log_active:
                result.append("S-09: Model weight access logging not active")
            if not self.model_weight_encryption_at_rest:
                result.append("S-09: Model weights not encrypted at rest")
            if not self.model_weight_multi_person_auth:
                result.append("S-09: Multi-person auth for weight export not implemented")
        if not self.s10_compliant():
            if not self.behavioral_baseline_established:
                result.append("S-10: Behavioral baseline not established")
            if not self.anomaly_monitoring_active:
                result.append("S-10: Anomaly monitoring not active")
            if not self.anomaly_alert_threshold_defined:
                result.append("S-10: Alert thresholds not defined")
        return result


@dataclass
class SystemicRiskCh3Tracker:
    model_name: str
    model_version: str
    systemic_risk_basis: str  # "art51_1a_threshold" or "art51_1b_designation"
    deployment_date: Optional[date]
    latest_gate: Optional[PreDeploymentGate] = None
    incidents: list[IncidentReport] = field(default_factory=list)
    cybersecurity: CybersecurityControls = field(default_factory=CybersecurityControls)
    last_annual_retest: Optional[date] = None

    ENFORCEMENT_DATE = date(2026, 8, 2)

    def days_to_enforcement(self) -> int:
        return (self.ENFORCEMENT_DATE - date.today()).days

    def annual_retest_overdue(self) -> bool:
        if not self.last_annual_retest:
            return True
        return (date.today() - self.last_annual_retest).days > 365

    def chapter3_gaps(self) -> list[str]:
        gaps = []
        # Pre-deployment gate
        if not self.latest_gate:
            gaps.append("S-01/S-02/S-03: No deployment gate on record")
        else:
            if not self.latest_gate.can_deploy():
                for cat in self.latest_gate.missing_categories():
                    gaps.append(f"S-01: Missing test for {cat.value}")
            if not self.latest_gate.has_external_evaluator():
                gaps.append("S-02: No external independent evaluator engaged")
        # Annual retest
        if self.annual_retest_overdue():
            gaps.append("S-04: Annual retest overdue")
        # Open incidents
        for incident in self.incidents:
            if incident.notification_overdue():
                gaps.append(f"S-06: Incident {incident.incident_id} — 72h notification overdue")
            if incident.root_cause_overdue():
                gaps.append(f"S-07: Incident {incident.incident_id} — 15-day root cause report overdue")
        # Cybersecurity
        gaps.extend(self.cybersecurity.gaps())
        return gaps

    def overall_ch3_readiness(self) -> float:
        """Returns 0.0 to 1.0 — fraction of Chapter 3 requirements met."""
        total = 10  # S-01 through S-10
        met = 0
        if self.latest_gate:
            if not self.latest_gate.missing_categories():
                met += 1  # S-01
            if self.latest_gate.has_external_evaluator():
                met += 1  # S-02
            if self.latest_gate.can_deploy():
                met += 1  # S-03
        if not self.annual_retest_overdue():
            met += 1  # S-04
        # S-05: incident classification process (assumed if tracking class used)
        met += 1  # S-05 (tracking infrastructure in place)
        if not any(i.notification_overdue() for i in self.incidents):
            met += 1  # S-06 — no incident has missed the 72-hour deadline
        if not any(i.root_cause_overdue() for i in self.incidents):
            met += 1  # S-07 — no incident has missed the 15-day deadline
        if self.cybersecurity.s08_compliant():
            met += 1  # S-08
        if self.cybersecurity.s09_compliant():
            met += 1  # S-09
        if self.cybersecurity.s10_compliant():
            met += 1  # S-10
        return met / total

    def generate_ch3_report(self) -> str:
        gaps = self.chapter3_gaps()
        readiness = self.overall_ch3_readiness()
        report = [
            f"Chapter 3 Compliance Report — {self.model_name} {self.model_version}",
            f"Systemic Risk Basis: {self.systemic_risk_basis}",
            f"Enforcement: {self.days_to_enforcement()} days to August 2, 2026",
            f"Overall Readiness: {readiness:.0%}",
            "",
            "GAPS:" if gaps else "No gaps identified.",
        ]
        for gap in gaps:
            report.append(f"  ✗ {gap}")
        return "\n".join(report)

Chapter 3 Readiness Checklist (20 Items)

Pre-Deployment Adversarial Testing (S-01 to S-04):

Incident Reporting (S-05 to S-07):

Cybersecurity (S-08 to S-10):


Timeline and Enforcement

Chapter 3 obligations became applicable for Systemic Risk providers on August 2, 2025 — the date when GPAI Chapter V obligations entered into force. AI Office enforcement of GPAI obligations — including Chapter 3 — begins August 2, 2026. This means:

Interaction with the GPAI CoP Final — What Changed in July 2025:

The AI Office adopted the final CoP in July 2025. For providers already in the CoP development process (2023-2025), the final CoP text clarified several Chapter 3 specifics that were not defined in earlier drafts:

Providers who built their Chapter 3 programs against earlier draft versions should review these four areas against the final July 2025 text.

See Also