EU AI Act High-Risk AI Monitoring & Post-Market Surveillance 2026: What SaaS Providers Must Track After Deployment
Post #4 in the sota.io EU AI Act High-Risk Classification Series
Most teams spend months preparing for the EU AI Act conformity assessment — the technical documentation, risk management system, testing protocols — and then treat deployment as the finish line. It isn't.
Article 72 of the EU AI Act mandates that providers of high-risk AI systems implement a post-market monitoring system that actively collects and analyzes performance data throughout the lifecycle of the deployed system. Deployment is not the end of compliance. It is the beginning of a continuous obligation.
For SaaS providers, this creates an operational challenge that sits at the intersection of engineering, legal, and MLOps: what exactly must you monitor, how must you report, and what happens when something goes wrong?
This guide covers the full post-market surveillance framework: Article 72 monitoring obligations, Article 73 serious incident reporting, the interaction with market surveillance authorities, and the practical architecture decisions that make compliance manageable — including why the location of your infrastructure matters significantly.
What Article 72 Actually Requires
Article 72(1) requires providers to "establish and document a post-market monitoring system in a manner that is proportionate to the nature of the AI technologies and the risks of the high-risk AI system." Article 72(2) adds that the system must "actively and systematically collect, document and analyse relevant data" on how the system performs throughout its lifetime, including data provided by deployers.
This is not passive logging. It is an active, structured system with defined scope, methodology, and outputs.
The Six Core Monitoring Obligations
1. Performance monitoring against intended purpose
You must track how the system performs against the metrics established in your Annex IV technical documentation. If your conformity assessment established that the system achieves 94% accuracy on a specific task, your monitoring system must track whether production performance holds at or near that level over time.
This includes distribution shift detection — recognizing when the statistical properties of incoming data have drifted from your training distribution in ways that may degrade performance.
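Below is a minimal sketch of the "at or near that level" check. It assumes you receive delayed ground-truth labels for a sample of production predictions and compare each window against the accuracy documented in Annex IV. A Wilson score interval keeps small windows from raising false alarms. The function names, the 0.94 baseline and the 0.02 tolerance are illustrative assumptions, not values taken from the Act.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% confidence by default)."""
    if total == 0:
        return 0.0, 1.0
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def performance_below_baseline(correct: int, total: int,
                               documented_accuracy: float = 0.94,
                               tolerance: float = 0.02) -> bool:
    """Flag a window only when even the optimistic bound misses the floor.

    The floor (documented_accuracy - tolerance) is a hypothetical policy value
    you would define in your monitoring plan.
    """
    _, upper = wilson_interval(correct, total)
    return upper < documented_accuracy - tolerance

# 880 of 1,000 labelled predictions correct (88%) is flagged; 935 (93.5%) is not.
print(performance_below_baseline(880, 1000))  # True
print(performance_below_baseline(935, 1000))  # False
```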
2. Risks identified post-deployment
Your Article 9 Risk Management System is a living document. Article 9(2)(c) requires it to evaluate risks that emerge from analysis of the data gathered by the Article 72 post-market monitoring system. If you discover in production that a risk you classified as low-severity during development is actually manifesting more frequently or more severely, your risk register must be updated and your risk management system revised accordingly.
3. Monitoring of "serious incidents"
Article 72 does not define incident detection itself, but the Article 73 reporting duty depends on it. Your monitoring infrastructure must therefore be able to identify events that may meet the "serious incident" threshold defined in Article 3(49). That threshold covers incidents that lead to death or serious harm to health, serious and irreversible disruption of critical infrastructure, infringement of fundamental-rights obligations under Union law, or serious harm to property or the environment.
4. Corrective action tracking
When the monitoring system identifies a performance issue, you must document the corrective actions taken and track whether those actions resolved the issue. The loop must close with evidence.
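One way to make "the loop must close with evidence" enforceable in code is to refuse to close a corrective action until a post-fix verification result is attached. The sketch below assumes a simple in-house record. The class, field names and the "higher is better" metric convention are hypothetical, not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class CorrectiveAction:
    action_id: str
    finding: str       # the monitoring finding that triggered the action
    description: str   # what was changed in response
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    verification_metric: Optional[str] = None
    verification_value: Optional[float] = None
    verification_target: Optional[float] = None
    closed_at: Optional[datetime] = None

    def record_verification(self, metric: str, value: float, target: float) -> None:
        """Attach the post-fix measurement that serves as closing evidence."""
        self.verification_metric = metric
        self.verification_value = value
        self.verification_target = target

    def close(self) -> None:
        """Close only if evidence exists and meets the target (assumes higher is better)."""
        if self.verification_value is None or self.verification_target is None:
            raise ValueError(f"{self.action_id}: no verification evidence attached")
        if self.verification_value < self.verification_target:
            raise ValueError(f"{self.action_id}: verification below target, keep open")
        self.closed_at = datetime.now(timezone.utc)
```

For example, `CorrectiveAction("CA-001", "accuracy drift", "retrained on Q3 data")` can only be closed after `record_verification("accuracy", 0.941, 0.94)` shows that the documented level was restored.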
5. Data retention
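Article 18 requires providers to keep the technical documentation, quality management system documentation and related records for ten years after the system is placed on the market or put into service. The post-market monitoring plan is part of the technical documentation, so plan to keep the analyses performed, the incidents identified and the corrective actions taken for the same period. Article 19 separately requires automatically generated logs under the provider's control to be kept for at least six months, unless other Union or national law provides otherwise.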
6. Cooperation with notified bodies
If your system's conformity was assessed with a notified body (the Annex VII procedure, based on assessment of the quality management system and the technical documentation), your monitoring system must generate outputs compatible with that body's ongoing surveillance. This typically means periodic reports and notification of significant changes.
Article 73: Serious Incident Reporting
Article 73 creates a mandatory reporting obligation that is separate from and more urgent than the general monitoring requirements.
What Constitutes a "Serious Incident"
Article 3(49) defines a serious incident as an incident or malfunctioning of an AI system that directly or indirectly leads to any of the following:
- The death of a person, or serious harm to a person's health
- A serious and irreversible disruption of the management or operation of critical infrastructure
- The infringement of obligations under Union law intended to protect fundamental rights
- Serious harm to property or the environment
For most SaaS providers, the relevant categories will be serious harm to health and infringement of fundamental-rights obligations. An AI-powered hiring system that systematically disadvantages a protected group would plausibly trigger the fundamental-rights category. A medical AI system whose dangerous diagnostic recommendation leads to serious harm to a patient's health clearly meets the threshold.
The 15-Day Reporting Clock
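When a serious incident occurs, Article 73(1) and (2) require the provider to report it to the market surveillance authorities of the Member States where it occurred. The report is due immediately after the provider has established a causal link between the AI system and the incident (or the reasonable likelihood of one), and in any event no later than 15 days after becoming aware of it.
Two shorter outer limits apply. For a widespread infringement or a serious and irreversible disruption of critical infrastructure, the limit is 2 days (Article 73(3)). Where a person has died, it is 10 days (Article 73(4)).
Awareness of the incident, not completion of your investigation, is what starts the clock. You report first, then investigate. This means your monitoring infrastructure must have: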
- Automated detection capable of flagging events that may meet the serious incident threshold without requiring manual review to initiate
- Clear escalation paths from detection to legal team notification within hours
- Pre-built reporting templates that can be rapidly populated with the information authorities require
- Jurisdictional mapping: knowing which Member State authority receives the report, which Article 73(1) ties to the Member State(s) where the incident occurred
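The reporting limits in Article 73(2)-(4) can be encoded so the escalation tooling always shows the latest permissible date. This is a sketch under stated assumptions:
- Days are counted as 24-hour periods from the awareness timestamp; legal day-counting rules may differ.
- Whether an infringement is "widespread" is a legal determination passed in as a flag.
- The enum and function names are hypothetical.

The computed date is an outer limit. It is not a target, because the obligation is to report immediately.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

class IncidentCategory(Enum):
    """Article 3(49) categories (names are illustrative)."""
    DEATH = "death of a person"
    HEALTH_HARM = "serious harm to a person's health"
    CRITICAL_INFRASTRUCTURE = "serious and irreversible disruption of critical infrastructure"
    FUNDAMENTAL_RIGHTS = "infringement of Union-law fundamental-rights obligations"
    PROPERTY_OR_ENVIRONMENT = "serious harm to property or the environment"

DEFAULT_LIMIT = timedelta(days=15)                     # Article 73(2)
SPECIAL_LIMITS = {
    IncidentCategory.DEATH: timedelta(days=10),        # Article 73(4)
    IncidentCategory.CRITICAL_INFRASTRUCTURE: timedelta(days=2),  # Article 73(3)
}
WIDESPREAD_INFRINGEMENT_LIMIT = timedelta(days=2)      # Article 73(3)

def reporting_deadline(aware_at: datetime, categories, widespread_infringement=False) -> datetime:
    """Latest permissible report time: the tightest limit among applicable rules."""
    limits = [DEFAULT_LIMIT]
    limits += [SPECIAL_LIMITS[c] for c in categories if c in SPECIAL_LIMITS]
    if widespread_infringement:
        limits.append(WIDESPREAD_INFRINGEMENT_LIMIT)
    return aware_at + min(limits)
```

Taking the minimum over all applicable limits means an incident that falls into several categories is always held to the strictest one.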
What the Report Must Contain
Pending the Commission's guidance on serious incident reporting under Article 73(7), plan for the report to market surveillance authorities to include:
- Description of the incident, including its nature, duration, and effects
- Information about the high-risk AI system involved (model version, deployment configuration, intended purpose)
- The users affected and the context of use
- Corrective measures taken or planned
- Where available, the root cause assessment
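To support the report-first approach, the report can be modelled as a record that is allowed to be incomplete but always knows what is still missing. This is a sketch only. The class and field names mirror the list above and are hypothetical; they are not an official EU schema.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SeriousIncidentReport:
    """Hypothetical report payload; replace the fields once the official template exists."""
    incident_description: Optional[str] = None        # nature, duration, effects
    system_identification: Optional[str] = None       # model version, configuration, intended purpose
    affected_users_and_context: Optional[str] = None
    corrective_measures: Optional[str] = None         # taken or planned
    root_cause_assessment: Optional[str] = None       # where available

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def status(self) -> str:
        # An incomplete initial report may be followed by a complete one
        return "complete" if not self.missing_fields() else "initial (incomplete)"
```

`missing_fields()` doubles as the checklist for the follow-up report.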
You do not need a complete root cause analysis to submit the initial report. Article 73(5) explicitly allows an initial report that is incomplete, followed by a complete report, and Article 73(6) requires you to carry out the necessary investigations without delay. Follow-up reports with complete analysis are expected.
Building the Monitoring Architecture
Translating Articles 72 and 73 into an actual technical system requires decisions about instrumentation, data collection, analysis, and alerting.
Layer 1: Input/Output Logging
Every inference request and response must be logged in a way that supports later analysis. This means capturing:
- The input features or content (subject to data minimization — capture what you need for analysis, not everything)
- The model output and confidence scores
- The timestamp and request metadata
- The user context where relevant (role, jurisdiction, use case category)
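The logging requirements above can be sketched as one JSON line per inference, with data minimisation applied at write time. This sketch makes two assumptions:
- Only an allow-list of input features is kept. The default `("age_band", "region")` is hypothetical.
- The user identifier is replaced by a keyed HMAC-SHA256 hash. This is pseudonymisation, not anonymisation, so GDPR still applies to the records.

The function name and field layout are illustrative.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

def inference_log_record(request_id, model_version, features, output, confidence,
                         user_id, jurisdiction, use_case, pseudonym_key: bytes,
                         keep_features=("age_band", "region")):
    """Build one JSON log line for later post-market analysis."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "model_version": model_version,
        # Data minimisation: keep only allow-listed features
        "features": {k: features[k] for k in keep_features if k in features},
        "output": output,
        "confidence": confidence,
        # Keyed hash so records can be linked per user without storing the raw id
        "user_ref": hmac.new(pseudonym_key, user_id.encode(), hashlib.sha256).hexdigest(),
        "jurisdiction": jurisdiction,
        "use_case": use_case,
    }
    return json.dumps(record, sort_keys=True)
```

Keep `pseudonym_key` in a secrets manager, separate from the logs, so that re-identification requires access to both.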
For SaaS systems, this logging infrastructure typically lives at the API gateway or model serving layer. The key architectural choice is retention location: these logs contain potentially sensitive information about individuals and must be stored in compliance with GDPR. For EU-based systems serving EU users, storing inference logs in EU infrastructure eliminates a significant layer of GDPR complexity.
Layer 2: Statistical Performance Monitoring
Raw logs are not a monitoring system. You need automated analysis that computes performance metrics and detects anomalies.
Performance drift monitoring tracks key metrics over time and alerts when they fall outside acceptable ranges. The specific metrics depend on your system type:
| System Type | Primary Metrics | Drift Indicators |
|---|---|---|
| Classification | Accuracy, F1, AUC | Label distribution shift, confidence calibration drift |
| Regression | RMSE, MAE, R² | Prediction interval coverage, residual distribution |
| Ranking/Recommendation | NDCG, MRR, Precision@K | Exposure bias, popularity shift |
| NLP/Generative | Task-specific + human eval | Semantic drift, refusal rate changes |
| Computer Vision | mAP, precision/recall by class | Background distribution shift, class balance |
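The "confidence calibration drift" indicator for classifiers can be tracked with expected calibration error (ECE), one common calibration measure. This is a sketch assuming you have, for a labelled sample, each prediction's confidence and whether it was correct. `calibration_drifted` and its 0.05 threshold are illustrative assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted mean |confidence - accuracy| over equal-width bins on [0, 1]."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, right-closed, so indices run 0..n_bins-1 and 1.0 lands in the last bin
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the share of samples in the bin
    return float(ece)

def calibration_drifted(baseline_ece, production_ece, max_increase=0.05):
    """Hypothetical alert rule: production calibration worsened beyond tolerance."""
    return production_ece - baseline_ece > max_increase
```

Compute the baseline ECE on the evaluation set documented in Annex IV, then recompute it on each production window.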
Data distribution monitoring tracks whether input distributions are shifting away from the training distribution. Tools like Evidently AI (EU-based, Estonia) and whylogs provide open-source frameworks for this.
A practical implementation for a high-risk classifier might use Population Stability Index (PSI) on each input feature:
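```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """
    Population Stability Index of `actual` (production) against `expected` (baseline).
    Rule-of-thumb interpretation:
    PSI < 0.1: No significant change
    PSI 0.1-0.25: Some change, investigate
    PSI > 0.25: Significant change, take action
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Derive bin edges from the baseline only, so both samples share the same bins
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip production values into the baseline range so out-of-range values
    # land in the outermost bins instead of being silently dropped
    actual = np.clip(actual, edges[0], edges[-1])
    expected_freq = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_freq = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) and division by zero for empty bins
    expected_freq = np.where(expected_freq == 0, 0.0001, expected_freq)
    actual_freq = np.where(actual_freq == 0, 0.0001, actual_freq)
    psi = np.sum(
        (actual_freq - expected_freq) * np.log(actual_freq / expected_freq)
    )
    return float(psi)

# In your monitoring pipeline (alert_team is a placeholder for your alerting hook)
def check_drift(baseline_data, production_data, monitoring_features,
                threshold=0.25, alert_team=print):
    for feature in monitoring_features:
        psi = population_stability_index(
            baseline_data[feature],
            production_data[feature],
        )
        if psi > threshold:
            alert_team(f"Significant drift in feature: {feature}, PSI: {psi:.3f}")
```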
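Fairness monitoring should be treated as mandatory for Annex III systems where bias is an identified risk. Article 9 requires risks to fundamental rights to be identified and managed, and production data is where emerging bias becomes visible. This means computing fairness metrics disaggregated by protected attributes: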
```python
import numpy as np
from sklearn.metrics import accuracy_score

def compute_fairness_metrics(y_true, y_pred, protected_attr,
                             max_accuracy_gap=0.05, alert=print):
    """Monitor per-group accuracy gaps and demographic parity gaps.

    `alert` is a placeholder for your alerting hook (e.g. alert_compliance_team).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    metrics = {}
    overall_accuracy = accuracy_score(y_true, y_pred)
    overall_positive_rate = y_pred.mean()
    for group in np.unique(protected_attr):
        mask = protected_attr == group
        group_accuracy = accuracy_score(y_true[mask], y_pred[mask])
        group_positive_rate = y_pred[mask].mean()
        metrics[group] = {
            'accuracy': group_accuracy,
            'accuracy_gap': abs(group_accuracy - overall_accuracy),
            'positive_rate': group_positive_rate,
            'demographic_parity_gap': abs(group_positive_rate - overall_positive_rate),
        }
    # Alert if any group exceeds the acceptable accuracy gap (5 percentage points by default)
    for group, m in metrics.items():
        if m['accuracy_gap'] > max_accuracy_gap:
            alert(f"Fairness alert: {group} accuracy gap {m['accuracy_gap']:.1%}")
    return metrics
```
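Demographic parity compares positive rates. Equalized odds instead compares error rates: it asks whether true-positive and false-positive rates are similar across groups. Below is a minimal sketch for binary labels that reports per-group rates and the largest between-group gap. The function name is hypothetical, and any alert threshold on the returned gap is a policy choice for your monitoring plan.

```python
import numpy as np

def equalized_odds_report(y_true, y_pred, protected_attr):
    """Per-group TPR/FPR and the max between-group gap (binary 0/1 labels)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    attr = np.asarray(protected_attr)
    per_group = {}
    for group in np.unique(attr):
        t, p = y_true[attr == group], y_pred[attr == group]
        per_group[group] = {
            # TPR: share of actual positives predicted positive
            "tpr": p[t == 1].mean() if (t == 1).any() else float("nan"),
            # FPR: share of actual negatives predicted positive
            "fpr": p[t == 0].mean() if (t == 0).any() else float("nan"),
        }
    tprs = [r["tpr"] for r in per_group.values()]
    fprs = [r["fpr"] for r in per_group.values()]
    # nanmax/nanmin skip groups that have no positives (or no negatives)
    tpr_gap = float(np.nanmax(tprs) - np.nanmin(tprs))
    fpr_gap = float(np.nanmax(fprs) - np.nanmin(fprs))
    return per_group, max(tpr_gap, fpr_gap)
```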
Layer 3: Serious Incident Detection
The most operationally critical part of the monitoring system is the mechanism that flags potential serious incidents for human review. This cannot be fully automated — a human must make the judgment about whether an event meets the Article 3(49) threshold — but automation can dramatically reduce the time to human review.
A practical approach uses a two-stage pipeline:
Stage 1 — Automated flagging: Rules and ML-based classifiers that identify events with characteristics associated with serious harm. For a hiring AI, this might include unusually high rejection rates for specific demographic groups in a short window, or feedback signals indicating discriminatory outcomes. For a medical AI, it includes explicit harm reports and anomalous clinical pathway deviations.
Stage 2 — Human review queue: Flagged events enter a prioritized review queue with a defined SLA. For potential life-threatening events: 2-hour review SLA. For other serious incident candidates: 24-hour review SLA. The reviewer makes the threshold determination and initiates reporting if warranted.
```python
from datetime import datetime, timezone

class SeriousIncidentDetector:
    """Stage 1: flag events that may meet the Article 3(49) threshold for human review."""

    def __init__(self, thresholds):
        # Illustrative example: {'max_rejection_rate': 0.8, 'cascade_threshold': 3}
        self.thresholds = thresholds
        self.escalations = []

    def evaluate_event(self, event):
        severity_signals = []
        # Check for explicit harm reports
        if event.get('user_reported_harm'):
            severity_signals.append({
                'type': 'user_harm_report',
                'details': event['user_reported_harm'],
                'sla_hours': 2,
            })
        # Check for anomalous rejection patterns (hiring AI example)
        for group, rate in (event.get('rejection_rate_by_group') or {}).items():
            if rate > self.thresholds['max_rejection_rate']:
                severity_signals.append({
                    'type': 'fairness_anomaly',
                    'affected_group': group,
                    'rate': rate,
                    'sla_hours': 24,
                })
        # Check for cascade failures (infrastructure AI)
        if event.get('downstream_system_failures', 0) > self.thresholds['cascade_threshold']:
            severity_signals.append({
                'type': 'cascade_failure',
                'count': event['downstream_system_failures'],
                'sla_hours': 2,
            })
        if severity_signals:
            # The tightest SLA among all signals governs the review deadline
            min_sla = min(s['sla_hours'] for s in severity_signals)
            self.escalate_for_review(event, severity_signals, sla_hours=min_sla)
        return severity_signals

    def escalate_for_review(self, event, signals, sla_hours):
        # Placeholder: create a review ticket with the SLA and notify legal and
        # compliance teams. The UTC timestamp records when the provider may have
        # become aware of a potential incident (15-day reporting clock context).
        self.escalations.append({
            'flagged_at': datetime.now(timezone.utc),
            'event': event,
            'signals': signals,
            'sla_hours': sla_hours,
        })
```
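Stage 2 needs somewhere for flagged events to wait. The sketch below is a minimal in-process version, assuming review deadlines come from the 2-hour and 24-hour SLAs described above. `ReviewQueue` and its methods are hypothetical names; a production setup would more likely live in your ticketing system.

```python
import heapq
from datetime import datetime, timedelta, timezone

class ReviewQueue:
    """Hypothetical stage-2 queue ordered by review deadline (earliest first)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal deadlines never compare event ids

    def push(self, event_id, sla_hours, flagged_at=None):
        flagged_at = flagged_at or datetime.now(timezone.utc)
        due = flagged_at + timedelta(hours=sla_hours)
        heapq.heappush(self._heap, (due, self._counter, event_id))
        self._counter += 1
        return due

    def next_due(self):
        """(due, event_id) of the most urgent item, without removing it."""
        due, _, event_id = self._heap[0]
        return due, event_id

    def pop(self):
        due, _, event_id = heapq.heappop(self._heap)
        return due, event_id

    def overdue(self, now=None):
        """Event ids whose review deadline has passed: candidates for escalation."""
        now = now or datetime.now(timezone.utc)
        return [event_id for due, _, event_id in self._heap if due < now]
```

A heap keeps the most urgent review at the front no matter when items arrive. `overdue()` is what a periodic job would check before paging someone.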
Layer 4: Reporting Infrastructure
When an event is determined to be a serious incident, you need pre-built infrastructure to generate the Article 73 report. This means:
Jurisdictional routing: Determine which national authority receives the report. Article 73(1) directs it to the market surveillance authorities of the Member States where the incident occurred, which in practice usually means where the affected users are located. Sector-specific regimes, such as those for medical devices (Article 73(10)), can change the route, so map those cases explicitly.
Designations were still being finalized at the time of writing. Treat the following list of national market surveillance authorities for AI as indicative, and confirm each entry against the Member State's official designation:
- Germany: Bundesnetzagentur (lead for cross-sector AI)
- France: ANSSI + sectoral authorities (ANJ for gambling, ACPR for finance)
- Netherlands: Rijksdienst voor Ondernemend Nederland (RVO) + ACM
- Italy: AGCOM + sector-specific authorities
- Spain: AESIA (Agencia Española de Supervisión de la Inteligencia Artificial), the first dedicated national AI regulator
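Routing can be kept deliberately dumb: one decision per Member State where the incident occurred, with unknown mappings escalated rather than guessed. This is a sketch only. The registry entries are taken from the indicative list above and must be verified, and `route_incident` and `RoutingDecision` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical registry from the indicative list above; confirm every entry
# with your legal team before relying on it.
AUTHORITY_REGISTRY = {
    "DE": "Bundesnetzagentur",
    "ES": "AESIA",
}

@dataclass
class RoutingDecision:
    member_state: str
    authority: str
    resolved: bool

def route_incident(incident_member_states, registry=AUTHORITY_REGISTRY):
    """One routing decision per Member State where the incident occurred."""
    decisions = []
    for member_state in sorted(set(incident_member_states)):
        authority = registry.get(member_state)
        decisions.append(RoutingDecision(
            member_state=member_state,
            authority=authority or "UNRESOLVED: escalate to legal",
            resolved=authority is not None,
        ))
    return decisions
```

Returning an explicit unresolved decision, instead of raising or silently skipping, keeps a missing mapping visible in the incident record.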
Report generation: A template-driven system that pulls incident data, system metadata, and corrective action records into the required format. The EU AI Office is expected to publish standard reporting forms; pre-map your data schema to these forms now.
Evidence packaging: Logs, model version records, deployment configuration, and monitoring data from the incident window, packaged in a format that can be transmitted to authorities within the 15-day window.
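Evidence packaging can be sketched as a ZIP archive plus a SHA-256 manifest, so that recipients can verify that nothing changed after packaging. The function name and manifest layout are assumptions, not a format any authority has specified. Files are stored by base name, so callers must avoid duplicate names.

```python
import hashlib
import json
import zipfile
from pathlib import Path

def package_evidence(files, out_zip, incident_id):
    """Bundle incident-window evidence with a SHA-256 manifest for integrity checks."""
    manifest = {"incident_id": incident_id, "files": {}}
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, files):
            data = path.read_bytes()
            manifest["files"][path.name] = hashlib.sha256(data).hexdigest()
            zf.writestr(path.name, data)
        # The manifest travels inside the archive alongside the evidence files
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```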
The EUDB Continuous Reporting Obligation
Beyond incident reporting, Article 49 requires providers to register high-risk AI systems listed in Annex III in the EU database established under Article 71 (the EUDB). Once registered, providers have ongoing obligations to update that registration when:
- The system is substantially modified (triggering a new conformity assessment)
- The intended purpose changes
- A serious incident occurs and is reported
- The system is withdrawn from the market
The EUDB is being built by the EU AI Office and is expected to launch in late 2026. Plan your registration data architecture now so that your monitoring system can generate the required updates automatically rather than requiring manual data entry.
Substantial Modification: When You Need a New Conformity Assessment
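One of the most practically significant monitoring obligations is the duty to recognize when a change to your system constitutes a "substantial modification", which under Article 43(4) triggers a new conformity assessment.
Article 3(23) defines a substantial modification as a change after placing on the market or putting into service that was not foreseen or planned in the initial conformity assessment and that either affects compliance with the requirements of Chapter III, Section 2 or modifies the intended purpose. Changes to a continuously learning system that were pre-determined and documented at the initial assessment are not substantial modifications (Article 43(4)). In practice, candidates include: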
- Changes that affect the system's performance against safety requirements
- Changes to the intended purpose
- Changes that affect the system's compliance with fundamental rights requirements
- Changes to the AI system's design that materially alter its functioning
The challenge is that in a continuous delivery SaaS environment, you are making changes constantly. Your monitoring system needs a change management gate that classifies each change by its potential to constitute a substantial modification:
```python
class SubstantialModificationChecker:
    """
    Flag changes that may trigger a new conformity assessment.
    Heuristic indicators based on EU AI Act Article 3(23) (definition of
    "substantial modification") and Article 43(4); the final call is a legal review.
    """

    SUBSTANTIAL_INDICATORS = [
        'model_architecture_change',
        'training_data_domain_shift',
        'intended_purpose_expansion',
        'input_feature_schema_change',
        'output_interpretation_change',
        'removal_of_safety_feature',
        'accuracy_degradation_exceeds_threshold',
    ]

    # Documented examples of routine changes. classify_change does not consult
    # this list: any change without a substantial indicator is non-substantial.
    NON_SUBSTANTIAL_INDICATORS = [
        'hyperparameter_tuning_within_bounds',
        'infrastructure_upgrade_same_model',
        'ui_change_no_model_impact',
        'logging_enhancement',
        'performance_optimization_same_outputs',
        'bug_fix_restoring_documented_behavior',
    ]

    def classify_change(self, change_description, change_metadata):
        # change_description is kept for the audit trail; the classification
        # itself uses the boolean flags in change_metadata
        substantial_signals = [
            indicator for indicator in self.SUBSTANTIAL_INDICATORS
            if change_metadata.get(indicator)
        ]
        if substantial_signals:
            return {
                'classification': 'POTENTIALLY_SUBSTANTIAL',
                'signals': substantial_signals,
                'action': 'REQUIRE_LEGAL_REVIEW_BEFORE_DEPLOY',
                'estimated_reassessment_needed': True,
            }
        return {
            'classification': 'NON_SUBSTANTIAL',
            'signals': [],
            'action': 'PROCEED_WITH_STANDARD_CHANGE_MANAGEMENT',
            'estimated_reassessment_needed': False,
        }
```
This gate should sit in your CI/CD pipeline for high-risk AI systems. Every deployment must pass through the classification check, and POTENTIALLY_SUBSTANTIAL changes must receive legal review before deployment approval.
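In CI, the gate reduces to a step that turns a classification into a process exit code. The sketch below assumes the classification dict has the shape returned by `classify_change` and that legal approval is recorded as an identifier from your change-management system. `deployment_gate` and `legal_approval_id` are hypothetical names.

```python
import sys

def deployment_gate(classification, legal_approval_id=None):
    """Exit code for a CI step: 0 = deploy may proceed, 1 = blocked pending legal review."""
    if classification["classification"] == "POTENTIALLY_SUBSTANTIAL" and not legal_approval_id:
        print("Blocked: potentially substantial modification; signals: "
              + ", ".join(classification["signals"]), file=sys.stderr)
        return 1
    return 0
```

A pipeline script would call `sys.exit(deployment_gate(result, approval_id))` so that a blocked change fails the build. Recording the approval identifier alongside the deployment also leaves an audit trail of who cleared it.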
The Post-Market Monitoring Plan
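Article 72(3) requires the post-market monitoring system to be based on a Post-Market Monitoring Plan (PMMP) that forms part of the Annex IV technical documentation. The Commission is to adopt an implementing act establishing a template for the plan, so align your structure with that template once it is available.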
Your PMMP must specify:
Scope: Which system behaviors are monitored and why. The scope should be derived from your Article 9 risk assessment — higher-risk behaviors receive more intensive monitoring.
Data collection methodology: What data is collected, how, from where, and at what frequency. This includes the baseline datasets against which production performance is compared.
Analysis methodology: How collected data is processed to identify issues. This includes the statistical methods, the thresholds that trigger alerts, and the review process for ambiguous cases.
Reporting structure: How monitoring findings are reported internally, how they feed into the risk management system, and under what circumstances they trigger external reporting (Article 73).
Review cadence: Article 72 does not specify a mandatory review frequency, but the monitoring plan must define one. Common approaches are monthly statistical reviews, quarterly formal assessments, and annual comprehensive audits.
Corrective action framework: The predefined process for responding to different categories of monitoring findings, including who is authorized to take which types of corrective action and within what timeframe.
A minimal but compliant PMMP table of contents looks like:
1. System Overview and Scope
1.1 System Identification (as registered in EUDB)
1.2 Intended Purpose and Use Cases in Scope
1.3 Risk Categories Addressed by Monitoring
2. Monitoring Framework
2.1 Performance Metrics and Baselines
2.2 Fairness and Non-Discrimination Metrics
2.3 Data Distribution Monitoring
2.4 Threshold Definitions and Alert Criteria
3. Data Collection
3.1 Data Sources
3.2 Collection Methodology
3.3 Data Retention (10 years for documentation per Article 18; at least 6 months for logs per Article 19)
3.4 GDPR Compliance for Monitoring Data
4. Incident Detection and Reporting
4.1 Serious Incident Definition and Examples (per Article 3(49))
4.2 Detection Pipeline
4.3 Escalation Procedure and SLAs
4.4 Article 73 Reporting Process and Jurisdictional Routing
5. Corrective Action Framework
5.1 Action Types by Severity
5.2 Authorization Matrix
5.3 Substantial Modification Assessment
5.4 Documentation and Traceability
6. Governance
6.1 Roles and Responsibilities
6.2 Review Cadence
6.3 Plan Update Procedure
Why EU Hosting Matters for Post-Market Monitoring
The infrastructure where your monitoring system runs is not a compliance irrelevance. It has direct implications for three Article 72 requirements.
GDPR Alignment for Monitoring Data
Your monitoring infrastructure processes personal data, because inference logs contain information about individuals' interactions with your high-risk AI system. Suppose that infrastructure is operated by a provider subject to third-country access laws (the US CLOUD Act, China's National Intelligence Law, etc.). A foreign disclosure order can then conflict with GDPR: under Article 48, such orders are only recognized when based on an international agreement such as a mutual legal assistance treaty.
Running your monitoring infrastructure on EU-sovereign cloud providers removes this conflict. A provider with no third-country parent or nexus is outside the reach of orders such as those under the CLOUD Act for monitoring data held solely in EU infrastructure under its control.
Data Residency for 10-Year Retention
Article 18 requires providers to keep technical documentation, which includes the post-market monitoring plan, for 10 years after the system is placed on the market or put into service. The monitoring records that verify ongoing compliance belong alongside it. If your monitoring data is in a US-based cloud provider, you face a decade of data residency uncertainty: provider policy changes, regulatory changes, market exits. EU providers operating under EU regulatory frameworks provide a more stable long-term residency guarantee.
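Retention deadlines are easy to compute wrongly around month ends and leap years, so it can help to derive them in code. This sketch encodes two periods: ten years from placing on the market for documentation under Article 18, and the six-month minimum for logs under Article 19. It assumes calendar-date arithmetic with end-of-month clamping. Other Union or national law, or your own policy, may require keeping logs longer. The function names are hypothetical.

```python
import calendar
from datetime import date

def add_years(d: date, years: int) -> date:
    """Same calendar date `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def add_months(d: date, months: int) -> date:
    """Same day `months` later, clamped to the last day of a shorter month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def retention_until(placed_on_market: date, log_created: date) -> dict:
    return {
        "technical_documentation": add_years(placed_on_market, 10),  # Article 18
        "log_minimum": add_months(log_created, 6),  # Article 19 floor, not a maximum
    }
```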
Audit Trail Integrity
Market surveillance authorities conducting audits of your post-market monitoring system need confidence in the integrity of your records. Monitoring data stored in EU-sovereign infrastructure under EU data protection law provides a cleaner chain of custody for regulatory examination than data held in multi-jurisdictional US-based infrastructure with complex data transfer arrangements.
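Where data is stored is only half of record integrity. Making monitoring records tamper-evident helps as well. This sketch uses a simple hash chain, assuming each record is a JSON-serializable dict. `append_entry` and `verify_chain` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64

def _entry_hash(prev_hash, payload):
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append_entry(chain, payload):
    """Append a record whose hash covers its payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": _entry_hash(prev_hash, payload)}
    chain.append(entry)
    return entry

def verify_chain(chain):
    """False if any record was edited, removed or reordered after the fact."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain alone cannot stop someone from rewriting every entry. Periodically anchoring the latest hash somewhere you do not control, such as a timestamping service or a regulator submission, closes that gap.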
Practical Implementation Timeline
If you are building post-market monitoring infrastructure now, here is a realistic 16-week implementation timeline for a SaaS provider with an existing high-risk AI system:
Weeks 1-2: Baseline establishment
- Audit existing logging infrastructure for gaps
- Define the monitoring scope based on your Article 9 risk assessment
- Establish production baseline metrics from historical data
Weeks 3-5: Core monitoring pipeline
- Deploy statistical performance monitoring (PSI, distribution monitoring)
- Implement fairness monitoring for protected attributes in scope
- Set threshold values and configure alerting
Weeks 6-8: Incident detection
- Define serious incident criteria specific to your system
- Build the two-stage detection pipeline (automated flagging + human review queue)
- Test with historical incidents and simulated scenarios
Weeks 9-11: Reporting infrastructure
- Build jurisdictional routing logic for Article 73 reports
- Create report templates aligned with expected EU AI Office format
- Integrate reporting with legal team notification workflow
Weeks 12-14: Change management integration
- Deploy substantial modification classifier in CI/CD pipeline
- Document classification criteria and approval gates
- Train engineering team on classification criteria
Weeks 15-16: Documentation and plan finalization
- Write the Post-Market Monitoring Plan document
- Integrate PMMP into Annex IV technical documentation
- Run a tabletop exercise simulating a serious incident from detection to reporting
What Changes After August 2026
The August 2, 2026 application date for high-risk AI systems listed in Annex III means that your post-market monitoring system needs to be operational, not just planned, for systems placed on the market from that date. High-risk systems under Article 6(1) (AI in products covered by the Union harmonisation legislation listed in Annex I) follow from August 2, 2027.
For systems already placed on the market before August 2, 2026, Article 111(2) applies the Regulation only if they are subject to significant changes in their design from that date. Systems intended for use by public authorities have a separate deadline of August 2, 2030. Because continuous delivery makes such design changes likely, providers who intend to keep developing these systems should treat August 2026 as their operational deadline regardless.
Market surveillance authorities may begin audits as early as Q4 2026, and the first enforcement actions could follow within 18 months of the application date. Post-market monitoring documentation, particularly the PMMP and evidence of actual monitoring activity, is likely to be among the first things requested in any audit.
The Monitoring Obligation Continues
The EU AI Act's post-market surveillance framework is designed to create an ongoing feedback loop between real-world performance and the risk management system — a loop that must keep running for the entire operational lifetime of your high-risk AI system.
For SaaS providers, this means building monitoring into your product infrastructure the same way you build performance monitoring and error tracking. It is not a periodic compliance exercise. It is a continuous engineering discipline with legal consequences.
The providers who will navigate this most successfully are those who treat Article 72 monitoring not as a reporting burden but as a product quality mechanism — one that happens to generate the compliance evidence authorities need, as a byproduct of genuine operational excellence.
In the final post in this series, we will examine the complete high-risk AI compliance stack: combining the classification assessment from Post #1, the Annex III deep dive from Post #2, the conformity assessment framework from Post #3, and the monitoring obligations from this post into a unified developer toolkit for August 2026 readiness.
sota.io runs on EU-sovereign infrastructure, keeping your post-market monitoring data, inference logs, and compliance records under EU law without cross-border data transfer complexity. See how sota.io simplifies EU AI Act compliance infrastructure →