2026-06-05 · 5 min read · sota.io Team

GPAI Systemic Risk Assessment & Red Teaming: EU AI Act Art.53 Developer Protocol

Post #4 in the sota.io EU AI Act GPAI Code of Practice Series


If your organisation trains or deploys a general-purpose AI (GPAI) model and it crosses the 10²⁵ floating-point operations (FLOPs) training-compute threshold defined in Article 51 of the EU AI Act, you are in systemic risk territory. That classification triggers a separate and significantly more demanding compliance regime under Article 53 — one that goes well beyond transparency obligations and enters the realm of mandatory adversarial testing, capability evaluation, and formal incident notification to the AI Office.

August 2, 2026 is the enforcement date for GPAI provisions. With less than two months remaining, providers who have not yet initiated their systemic risk assessment are dangerously close to default. This guide breaks down what Article 53 requires, how the GPAI Code of Practice translates those requirements into concrete protocols, and what a defensible red teaming programme looks like in practice.


What Makes a GPAI Model "Systemic Risk"?

The EU AI Act distinguishes between two tiers of GPAI models:

General-purpose AI models (all): Subject to transparency and copyright obligations (Art.52, Art.53(1)(b) copyright track). Covered in posts #2 and #3 of this series.

General-purpose AI models with systemic risk: Subject to all of the above plus the more stringent Art.53 safety obligations.

The Threshold: Article 51

Article 51 sets the classification criterion: a GPAI model is deemed to present systemic risk if it has been trained using more than 10²⁵ FLOPs. This is a compute-based proxy for capability — it captures frontier-scale models while excluding smaller models that pose fewer cross-sector risks.

The Commission may update this threshold by delegated act as the technology evolves.

The AI Office may also classify a model as systemic risk based on qualitative criteria — even if it falls below the FLOPs threshold — when the model demonstrates capabilities that could have significant impacts on public health, safety, or fundamental rights. This opens the door to classification based on observed capabilities, not just compute.
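A first-pass check against the compute threshold can be sketched as follows. This is a minimal illustration, assuming the common "6 × parameters × training tokens" rule of thumb for dense transformer training compute; that heuristic, the function names, and the example model sizes are assumptions for illustration and are not prescribed by the Act or the Code of Practice.

```python
# Default taken from the Article 51 threshold described above. The Commission
# may change it, so it is a parameter rather than a hard-coded constant.
SYSTEMIC_RISK_FLOPS = 1e25


def estimate_training_flops(n_parameters: float, n_tokens: float) -> float:
    """Rough dense-transformer estimate: about 6 FLOPs per parameter per token.

    This heuristic is an assumption for illustration, not a method
    prescribed by the AI Act or the Code of Practice.
    """
    return 6.0 * n_parameters * n_tokens


def exceeds_compute_threshold(
    flops: float, threshold: float = SYSTEMIC_RISK_FLOPS
) -> bool:
    """True when training compute is strictly above the threshold."""
    return flops > threshold


if __name__ == "__main__":
    # Hypothetical model: 70 billion parameters, 15 trillion training tokens.
    flops = estimate_training_flops(70e9, 15e12)
    print(f"estimated training compute: {flops:.2e} FLOPs")
    print("above threshold:", exceeds_compute_threshold(flops))
```

A result below the threshold does not settle the question, since the AI Office can still classify a model on qualitative grounds as described above.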

Practical implications for providers:


Article 53 Obligations: The Systemic Risk Regime

Article 53 imposes four principal obligations on providers of GPAI models with systemic risk. Each has implementation implications that extend from technical architecture to organisational process.

1. Model Evaluation and Adversarial Testing

Art.53(1)(a) requires providers to perform state-of-the-art model evaluations in accordance with standardised evaluation protocols and to conduct adversarial testing by qualified independent experts.

This is the obligation most commonly described as "red teaming" in industry practice. It is not a one-time exercise — the regulation envisions an ongoing programme that reflects the model's state at the point of market placement and at significant model updates.

What the GPAI Code of Practice adds:

The GPAI Code of Practice (Track 1: Safety and Security) specifies that adversarial testing must be:

Red teaming scope under the Code of Practice:

| Testing Category | What It Covers |
| --- | --- |
| Safety failures | Harmful content generation, refusal bypasses, prompt injection |
| Misuse potential | CBRN uplift, cyberoffence capability, persuasion at scale |
| Deception | Hidden capabilities, goal misgeneralisation, sycophancy risks |
| Cross-context failures | Behaviour under fine-tuning by downstream deployers |
| Cumulative risk | Emergent risks from model use in combination with other systems |
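One way to keep test cases tied to these five categories is a small registry that reports which categories have no documented cases yet. The sketch below is illustrative only: the category identifiers, the `RedTeamCase` record, and the `coverage_gaps` helper are hypothetical names, not part of the Code of Practice.

```python
from collections import Counter
from dataclasses import dataclass

# Identifiers mirror the five testing categories in the table above.
CATEGORIES = (
    "safety_failures",
    "misuse_potential",
    "deception",
    "cross_context_failures",
    "cumulative_risk",
)


@dataclass(frozen=True)
class RedTeamCase:
    """A single documented adversarial test case (hypothetical schema)."""
    case_id: str
    category: str
    description: str


def coverage_gaps(cases: list[RedTeamCase], min_per_category: int = 1) -> list[str]:
    """Return the categories that have fewer than min_per_category cases."""
    unknown = [c.case_id for c in cases if c.category not in CATEGORIES]
    if unknown:
        raise ValueError(f"cases with unknown category: {unknown}")
    counts = Counter(c.category for c in cases)
    return [cat for cat in CATEGORIES if counts[cat] < min_per_category]


if __name__ == "__main__":
    cases = [
        RedTeamCase("rt-001", "safety_failures", "Refusal bypass via role-play"),
        RedTeamCase("rt-002", "deception", "Sycophancy under user pressure"),
    ]
    print("categories still uncovered:", coverage_gaps(cases))
```

Rejecting unknown categories up front keeps a typo in a case record from silently counting as coverage.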

2. Serious Incident Notification

Art.53(1)(b) requires providers to notify the AI Office (and, where appropriate, relevant national competent authorities) of any serious incidents and any corrective measures taken.

The notification obligation is not retrospective: the duty is triggered when the provider becomes aware of an incident. The GPAI Code of Practice defines a "serious incident" in the context of GPAI systemic risk models as an incident that:

Important timing detail: The GPAI Code of Practice specifies initial notification within 72 hours of becoming aware of a potential serious incident, followed by a fuller report within 30 days.

This differs from the general AI Act incident timelines under Art.73 (which covers incidents involving deployed high-risk AI systems). GPAI systemic risk notifications go to the AI Office directly, not to national market surveillance authorities.
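The two reporting windows can be computed from the moment of awareness, as in the sketch below. It assumes both windows run from the time the provider became aware; the text above does not say whether the 30 days count from awareness or from the initial notification, so treat that choice, and the function name, as assumptions.

```python
from datetime import datetime, timedelta, timezone

INITIAL_NOTIFICATION_WINDOW = timedelta(hours=72)
# Assumption: the 30 days are counted from awareness, not from the
# initial notification.
FULL_REPORT_WINDOW = timedelta(days=30)


def notification_deadlines(aware_at: datetime) -> dict[str, datetime]:
    """Return the latest times for the initial notification and full report."""
    if aware_at.tzinfo is None:
        # Naive timestamps make deadlines ambiguous across time zones.
        raise ValueError("aware_at must be timezone-aware")
    return {
        "initial_notification_due": aware_at + INITIAL_NOTIFICATION_WINDOW,
        "full_report_due": aware_at + FULL_REPORT_WINDOW,
    }


if __name__ == "__main__":
    # Hypothetical incident noticed on 10 August 2026 at 09:00 UTC.
    aware = datetime(2026, 8, 10, 9, 0, tzinfo=timezone.utc)
    for name, due in notification_deadlines(aware).items():
        print(f"{name}: {due.isoformat()}")
```

Recording the awareness timestamp in UTC at the moment of triage makes the 72-hour clock auditable later.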

3. Cybersecurity Measures

Art.53(1)(c) requires providers to ensure an appropriate level of cybersecurity protection for the GPAI model and its model weights.

For most frontier model providers, this maps onto existing security engineering practices — access control to model weights, secure inference infrastructure, and protection of fine-tuning interfaces. However, the systemic risk classification elevates the stakes: weights for a GPAI model with systemic risk are a regulated asset.

Minimum expectations from the Code of Practice:

4. Post-Market Monitoring

Art.53(1)(d) requires providers to document and report to the AI Office information relevant to understanding the model's capabilities and risks, including results of internal evaluations and any significant updates.

This is not just a documentation burden — it is the mechanism by which the AI Office builds its ongoing picture of systemic risk across the frontier model ecosystem. Providers should treat it as a reporting relationship, not a one-time filing.


Building a Defensible Red Teaming Programme

Given the centrality of adversarial testing to the systemic risk regime, providers need a structured programme that can withstand scrutiny. Here is a framework aligned with the GPAI Code of Practice:

Programme Architecture

1. Scope definition. Before testing begins, define the model scope: which capabilities are in scope for red teaming, which use-case contexts are covered, and what constitutes a finding that must be escalated. The scope should map directly to the attack categories enumerated in the Code of Practice.

2. Expert independence. The Code of Practice is explicit: red teamers must be independent from the development team. For most organisations, this means either:

3. Structured attack categories. Move beyond ad-hoc "break the model" exercises. Structure testing around the five categories above, with specific test cases documented for each. The documentation must be detailed enough to demonstrate systematic coverage to the AI Office.

4. Finding classification. Establish a severity taxonomy before testing begins. The Code of Practice distinguishes between:

5. Iteration and re-testing. Red teaming for a GPAI model with systemic risk is not a gate; it is a process. Significant capability updates trigger re-testing within the scope of the updated capabilities. Document what "significant update" means in your governance framework before you ship.
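A written definition of "significant update" is easier to apply consistently when it is also encoded as a check. The sketch below shows one possible shape; every criterion and threshold in it (added compute fraction, benchmark delta, new capability flag) is a hypothetical placeholder that a governance framework would need to choose and justify, not a definition taken from the Act or the Code of Practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePolicy:
    """Hypothetical thresholds; choose and justify your own values."""
    max_added_compute_fraction: float = 0.10
    max_benchmark_delta: float = 0.05


@dataclass(frozen=True)
class ModelUpdate:
    added_compute_fraction: float   # extra training compute / original compute
    largest_benchmark_delta: float  # largest score change on a 0..1 scale
    adds_new_capability: bool       # e.g. a new modality or tool-use ability


def retest_reasons(
    update: ModelUpdate, policy: UpdatePolicy = UpdatePolicy()
) -> list[str]:
    """Return the reasons an update counts as significant (empty if none)."""
    reasons = []
    if update.adds_new_capability:
        reasons.append("new capability added")
    if update.added_compute_fraction > policy.max_added_compute_fraction:
        reasons.append("added compute above policy limit")
    if abs(update.largest_benchmark_delta) > policy.max_benchmark_delta:
        reasons.append("benchmark shift above policy limit")
    return reasons


if __name__ == "__main__":
    update = ModelUpdate(0.25, 0.01, adds_new_capability=True)
    print("re-test required because:", retest_reasons(update))
```

Returning the reasons rather than a bare boolean leaves a record of why re-testing was or was not triggered for a given release.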

The Model Card Connection

Your systemic risk red teaming results feed directly into the model card (covered in post #2 of this series). The model card must disclose:

The GPAI Code of Practice treats the model card as the primary disclosure mechanism — downstream deployers rely on it to understand what safeguards are already in place and what they must add.


Capability Evaluation Framework

Beyond adversarial testing, Art.53(1)(a) also requires state-of-the-art model evaluations. This is the capability assessment side of the obligation — understanding what the model can and cannot do, and how those capabilities map to risk.

Benchmark Selection

The AI Office and the Code of Practice workgroup are developing standardised evaluation benchmarks. Until those are formally adopted, providers are expected to use:

Uplift Assessments

For GPAI models at systemic scale, the AI Office pays particular attention to whether the model provides meaningful uplift in capability areas that could enable mass-harm events. This includes:

If your model provides meaningful uplift in any of these areas, you have a material risk finding that must be disclosed to the AI Office — even if the model has safeguards in place. The safeguards go into the finding record; the capability is the finding.
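The distinction between the capability and its safeguards can be made explicit in the finding record itself. In this minimal sketch the `UpliftFinding` schema and its field names are hypothetical; the point it illustrates is that the disclosure decision depends only on the capability, while safeguards are stored alongside it.

```python
from dataclasses import dataclass, field


@dataclass
class UpliftFinding:
    """Hypothetical record for one uplift assessment result."""
    capability_area: str
    meaningful_uplift: bool
    safeguards: list[str] = field(default_factory=list)

    @property
    def must_disclose(self) -> bool:
        # Safeguards are deliberately ignored here: the capability is the
        # finding, and mitigations are context recorded with it.
        return self.meaningful_uplift


if __name__ == "__main__":
    finding = UpliftFinding(
        capability_area="example-capability-area",  # placeholder label
        meaningful_uplift=True,
        safeguards=["refusal training", "output filtering"],
    )
    print("disclose to AI Office:", finding.must_disclose)
    print("safeguards on record:", finding.safeguards)
```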


Timeline and What to Do This Month

With enforcement beginning August 2, 2026, providers of GPAI models with systemic risk should be at the following stages:

| Task | Completion Target |
| --- | --- |
| Confirm whether model exceeds Art.51 FLOPs threshold | Done |
| Identify or engage independent red team | June 2026 |
| Define scope and attack categories for adversarial testing | June 2026 |
| Complete structured adversarial testing (Phase 1) | July 1, 2026 |
| Remediate critical findings | July 15, 2026 |
| Finalise capability evaluation and uplift assessment | July 15, 2026 |
| Complete model card with red team results | July 20, 2026 |
| Establish serious incident notification procedure | July 20, 2026 |
| Internal dry run: simulate an AI Office information request | July 25, 2026 |
| August 2 deadline | ✅ Compliant |
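The schedule above can be tracked with a few lines of code that flag overdue items. This is only a sketch: the `overdue` helper is a hypothetical name, and mapping the month-only "June 2026" targets to 30 June is an assumption made so that every task has a concrete date.

```python
from datetime import date

# Targets from the table above. "June 2026" is mapped to 30 June 2026,
# which is an assumption, since the table gives only the month.
MILESTONES = [
    ("Identify or engage independent red team", date(2026, 6, 30)),
    ("Define scope and attack categories for adversarial testing", date(2026, 6, 30)),
    ("Complete structured adversarial testing (Phase 1)", date(2026, 7, 1)),
    ("Remediate critical findings", date(2026, 7, 15)),
    ("Finalise capability evaluation and uplift assessment", date(2026, 7, 15)),
    ("Complete model card with red team results", date(2026, 7, 20)),
    ("Establish serious incident notification procedure", date(2026, 7, 20)),
    ("Internal dry run: simulate an AI Office information request", date(2026, 7, 25)),
]


def overdue(today: date, completed: set[str]) -> list[str]:
    """Return tasks whose target date has passed and that are not completed."""
    return [
        task for task, due in MILESTONES
        if due < today and task not in completed
    ]


if __name__ == "__main__":
    done = {"Identify or engage independent red team"}
    for task in overdue(date(2026, 7, 16), done):
        print("OVERDUE:", task)
```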

For Downstream Deployers

If you are deploying a GPAI model with systemic risk (via API or integrated service) but are not the model provider, your obligations differ — but you are not exempt.

Under the Code of Practice, downstream deployers must:

If the provider has not published a model card or cannot demonstrate they have completed adversarial testing, that is a red flag for your procurement due diligence.


Looking Ahead: Post #5 — The GPAI Compliance Stack

Post #5 (the series finale) will consolidate the full compliance stack for GPAI providers and deployers: the model registry, transparency documentation, copyright compliance, systemic risk assessment, and incident reporting — assembled into an August 2026 go-live checklist.

The sota.io GPAI Compliance Summary (Series So Far)

| Post | Topic | Status |
| --- | --- | --- |
| #1: Developer Introduction | Code of Practice overview and GPAI classification | ✅ Published |
| #2: Model Card & Transparency | Technical documentation obligations, model card structure | ✅ Published |
| #3: Copyright & Training Data | Art.53 copyright policy, TDM opt-out, audit approach | ✅ Published |
| #4: Systemic Risk & Red Teaming | Art.53 adversarial testing, capability evaluation, notification | 📖 This post |
| #5: Compliance Stack Finale | Full GPAI compliance checklist, August 2026 go-live | 🔲 Coming next |

sota.io runs on EU infrastructure. Every API we expose is processed on servers in the European Union, governed by GDPR, and subject to EU law — not US Cloud Act jurisdiction. Start your free trial and ship EU-compliant AI applications without cross-border data transfer risk.
