2026-05-28·5 min read·sota.io Team

EU AI Act High-Risk AI Monitoring & Post-Market Surveillance 2026: What SaaS Providers Must Track After Deployment

Post #4 in the sota.io EU AI Act High-Risk Classification Series

[Image: EU AI Act post-market monitoring dashboard showing continuous surveillance metrics, incident reporting timeline, and Article 72 compliance tracking for high-risk AI systems]

Most teams spend months preparing for the EU AI Act conformity assessment — the technical documentation, risk management system, testing protocols — and then treat deployment as the finish line. It isn't.

Article 72 of the EU AI Act mandates that providers of high-risk AI systems implement a post-market monitoring system that actively collects and analyzes performance data throughout the lifecycle of the deployed system. Deployment is not the end of compliance. It is the beginning of a continuous obligation.

For SaaS providers, this creates an operational challenge that sits at the intersection of engineering, legal, and MLOps: what exactly must you monitor, how must you report, and what happens when something goes wrong?

This guide covers the full post-market surveillance framework: Article 72 monitoring obligations, Article 73 serious incident reporting, the interaction with market surveillance authorities, and the practical architecture decisions that make compliance manageable — including why the location of your infrastructure matters significantly.

What Article 72 Actually Requires

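Article 72(1) requires providers to "establish and document a post-market monitoring system in a manner that is proportionate to the nature of the AI technologies and the risks of the high-risk AI system." Under Article 72(2), that system must "actively and systematically collect, document and analyse relevant data" on how the system performs throughout its lifetime, whether the data comes from deployers or from other sources, so that the provider can evaluate continued compliance with the high-risk requirements.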

This is not passive logging. It is an active, structured system with defined scope, methodology, and outputs.

The Six Core Monitoring Obligations

1. Performance monitoring against intended purpose

You must track how the system performs against the metrics established in your Annex IV technical documentation. If your conformity assessment established that the system achieves 94% accuracy on a specific task, your monitoring system must track whether production performance holds at or near that level over time.

This includes distribution shift detection — recognizing when the statistical properties of incoming data have drifted from your training distribution in ways that may degrade performance.
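A minimal sketch of the baseline comparison described above, assuming ground-truth labels eventually arrive for a sample of production predictions. The 0.94 baseline echoes the example figure; the 0.02 tolerance and the names `BaselineCheck` and `check_window` are illustrative assumptions, not values or terms taken from the Act.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BaselineCheck:
    """Result of comparing one production window against the documented baseline."""
    window_accuracy: float
    baseline_accuracy: float
    within_tolerance: bool


def check_window(y_true: Sequence[int], y_pred: Sequence[int],
                 baseline_accuracy: float = 0.94,
                 tolerance: float = 0.02) -> BaselineCheck:
    """Compare accuracy on a labelled production window with the documented baseline."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    return BaselineCheck(
        window_accuracy=accuracy,
        baseline_accuracy=baseline_accuracy,
        # Assumed policy: flag the window once accuracy drops more than
        # `tolerance` below the figure in the technical documentation.
        within_tolerance=accuracy >= baseline_accuracy - tolerance,
    )
```

Whatever tolerance you choose, it is worth recording it, together with the reasoning behind it, in the monitoring plan so that an out-of-tolerance window has a documented meaning.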

2. Risks identified post-deployment

Your Article 9 risk management system is a continuous, iterative process, not a one-time document. Article 9(2)(c) requires you to evaluate risks that emerge from the data gathered by the post-market monitoring system under Article 72. If you discover in production that a risk you classified as low-severity during development is actually manifesting more frequently or more severely, your risk register must be updated and your risk management system revised accordingly.

3. Monitoring of "serious incidents"

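Articles 72 and 73 work together: you can only report a serious incident that your monitoring has detected. Your monitoring infrastructure must be capable of identifying events that meet the "serious incident" threshold defined in Article 3(49), meaning incidents or malfunctions that directly or indirectly lead to death or serious harm to a person's health, serious and irreversible disruption of critical infrastructure, infringement of Union-law obligations intended to protect fundamental rights, or serious harm to property or the environment.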

4. Corrective action tracking

When the monitoring system identifies a performance issue, you must document the corrective actions taken and track whether those actions resolved the issue. The loop must close with evidence.

5. Data retention

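Retention periods come from two provisions, not from Article 12 (which requires the logging capability itself). Article 18 requires the technical documentation, which includes the post-market monitoring plan, to be kept at the disposal of national authorities for ten years after the system is placed on the market or put into service. Article 19 requires automatically generated logs to be kept for a period appropriate to the intended purpose, and for at least six months unless other Union or national law provides otherwise. In practice, retain the analyses performed, the incidents identified, and the corrective actions taken alongside the documentation, and set the retention period for raw monitoring data deliberately, balancing its evidential value against GDPR storage limitation.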

6. Cooperation with notified bodies

If your system was assessed by a notified body (the Annex VII procedure), your monitoring system must generate outputs compatible with the ongoing oversight requirements of that body. This typically means periodic reports and notification of significant changes.

Article 73: Serious Incident Reporting

Article 73 creates a mandatory reporting obligation that is separate from and more urgent than the general monitoring requirements.

What Constitutes a "Serious Incident"

Article 3(49) defines a serious incident as an incident or malfunction of an AI system that directly or indirectly leads to any of the following:

The death of a person, or serious harm to a person's health
A serious and irreversible disruption of the management or operation of critical infrastructure
The infringement of obligations under Union law intended to protect fundamental rights
Serious harm to property or the environment

For most SaaS providers, the relevant categories will be serious harm to persons and infringement of fundamental rights obligations. An AI-powered hiring system that systematically disadvantages a protected group would trigger this threshold. A medical AI system that produces dangerous diagnostic recommendations meets it clearly.

The 15-Day Reporting Clock

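When a serious incident occurs, Article 73(2) requires the provider to report it immediately after establishing a causal link between the AI system and the incident (or the reasonable likelihood of such a link), and in any event no later than 15 days after the provider or, where applicable, the deployer becomes aware of it. The report goes to the market surveillance authorities of the Member States where the incident occurred.

Shorter outer limits apply to the most severe cases: no later than 2 days for a widespread infringement or a serious and irreversible disruption of critical infrastructure (Article 73(3)), and no later than 10 days where a person has died (Article 73(4)).

The clock does not wait for a completed investigation. Article 73(5) allows an initial, incomplete report followed by a complete one, so you report first and finish investigating afterwards. This means your monitoring infrastructure must have: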

Automated detection capable of flagging events that may meet the serious incident threshold without requiring manual review to initiate
Clear escalation paths from detection to legal team notification within hours
Pre-built reporting templates that can be rapidly populated with the information authorities require
Jurisdictional mapping: which Member State authority receives the report depends on where the incident occurred and where your system is placed on the market
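To make the reporting clock concrete, the sketch below computes the latest permissible time for an initial report from the moment of awareness. It encodes the outer limits as we read them in Article 73(2) to (4): 15 days in general, 2 days for critical infrastructure disruption or widespread infringement, and 10 days where a person has died. The names `IncidentCategory`, `MAX_DAYS`, and `latest_report_time` are hypothetical, and the figures should be checked against the current legal text before you rely on them.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class IncidentCategory(Enum):
    DEATH = "death"
    CRITICAL_INFRASTRUCTURE = "critical_infrastructure"
    WIDESPREAD_INFRINGEMENT = "widespread_infringement"
    OTHER_SERIOUS = "other_serious"


# Outer limits in days, as read from Article 73(2)-(4). These are backstops:
# the primary duty is to report immediately once a causal link is established.
MAX_DAYS = {
    IncidentCategory.DEATH: 10,
    IncidentCategory.CRITICAL_INFRASTRUCTURE: 2,
    IncidentCategory.WIDESPREAD_INFRINGEMENT: 2,
    IncidentCategory.OTHER_SERIOUS: 15,
}


def latest_report_time(aware_at: datetime, category: IncidentCategory) -> datetime:
    """Return the latest permissible time for the initial report."""
    if aware_at.tzinfo is None:
        # Naive timestamps make deadlines ambiguous across jurisdictions.
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + timedelta(days=MAX_DAYS[category])
```

Storing the awareness timestamp in UTC at the moment of escalation gives the legal team an unambiguous reference point for the deadline.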

What the Report Must Contain

The report to market surveillance authorities must include:

Description of the incident, including its nature, duration, and effects
Information about the high-risk AI system involved (model version, deployment configuration, intended purpose)
The users affected and the context of use
Corrective measures taken or planned
Where available, the root cause assessment

You do not need a complete root cause analysis to submit the initial report. The 15-day window is specifically designed to allow initial reporting before full investigation. Follow-up reports with complete analysis are expected.
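A pre-built template can enforce the distinction between what the initial report needs and what can follow later. The sketch below is one possible shape; the class `IncidentReportDraft` and its field names are hypothetical and simply mirror the list above, not an official reporting form.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IncidentReportDraft:
    """Hypothetical field set mirroring the report contents listed above."""
    incident_description: str = ""        # nature, duration, effects
    system_identification: str = ""       # model version, deployment config, intended purpose
    affected_users_and_context: str = ""
    corrective_measures: str = ""         # taken or planned
    root_cause: Optional[str] = None      # may be supplied in a follow-up report

    def missing_required(self) -> list[str]:
        """Names of required fields that are still empty; root_cause is optional here."""
        return [f.name for f in fields(self)
                if f.name != "root_cause" and not getattr(self, f.name).strip()]

    def ready_for_initial_submission(self) -> bool:
        """True once every required field has content, even without a root cause."""
        return not self.missing_required()
```

A check like this can run whenever the draft is saved, so the reviewer sees which fields still block the initial submission.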

Building the Monitoring Architecture

Translating Articles 72 and 73 into an actual technical system requires decisions about instrumentation, data collection, analysis, and alerting.

Layer 1: Input/Output Logging

Every inference request and response must be logged in a way that supports later analysis. This means capturing:

The input features or content (subject to data minimization — capture what you need for analysis, not everything)
The model output and confidence scores
The timestamp and request metadata
The user context where relevant (role, jurisdiction, use case category)

For SaaS systems, this logging infrastructure typically lives at the API gateway or model serving layer. The key architectural choice is retention location: these logs contain potentially sensitive information about individuals and must be stored in compliance with GDPR. For EU-based systems serving EU users, storing inference logs in EU infrastructure eliminates a significant layer of GDPR complexity.
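As an illustration of data-minimized logging, the sketch below writes one JSON line per inference, keeps only allow-listed features, and replaces the raw user ID with a keyed hash. The function names, the record fields, and the allow-list approach are assumptions made for this example; they are not prescribed by the Act or by GDPR.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Keyed hash so entries can be linked per user without storing the raw ID."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_log_record(*, model_version: str, features: dict, output: str,
                     confidence: float, user_id: str, jurisdiction: str,
                     secret_key: bytes, logged_features: frozenset) -> str:
    """Serialize one inference as a JSON line, keeping only allow-listed features."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # Data minimization: drop every feature not explicitly allow-listed.
        "features": {k: v for k, v in features.items() if k in logged_features},
        "output": output,
        "confidence": confidence,
        "user_ref": pseudonymize(user_id, secret_key),
        "jurisdiction": jurisdiction,
    }
    return json.dumps(record, sort_keys=True)
```

Pseudonymized records are still personal data under GDPR, so this reduces exposure without removing the need for a lawful basis, access controls, and a retention policy.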

Layer 2: Statistical Performance Monitoring

Raw logs are not a monitoring system. You need automated analysis that computes performance metrics and detects anomalies.

Performance drift monitoring tracks key metrics over time and alerts when they fall outside acceptable ranges. The specific metrics depend on your system type:

System Type	Primary Metrics	Drift Indicators
Classification	Accuracy, F1, AUC	Label distribution shift, confidence calibration drift
Regression	RMSE, MAE, R²	Prediction interval coverage, residual distribution
Ranking/Recommendation	NDCG, MRR, Precision@K	Exposure bias, popularity shift
NLP/Generative	Task-specific + human eval	Semantic drift, refusal rate changes
Computer Vision	mAP, precision/recall by class	Background distribution shift, class balance

Data distribution monitoring tracks whether input distributions are shifting away from the training distribution. Tools like Evidently AI (EU-based, Estonia) and Whylogs provide open-source frameworks for this.

A practical implementation for a high-risk classifier might use Population Stability Index (PSI) on each input feature:

import numpy as np

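def population_stability_index(expected, actual, bins=10):
    """
    PSI < 0.1: No significant change
    PSI 0.1-0.25: Some change, investigate
    PSI > 0.25: Significant change, take action
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Derive bin edges from the baseline and reuse them for production data,
    # so both frequency vectors describe the same bins
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip production values into the baseline range so out-of-range values
    # land in the outermost bins instead of being dropped
    actual = np.clip(actual, edges[0], edges[-1])

    expected_freq = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_freq = np.histogram(actual, bins=edges)[0] / len(actual)

    # Avoid log(0) and division by zero for empty bins
    expected_freq = np.where(expected_freq == 0, 0.0001, expected_freq)
    actual_freq = np.where(actual_freq == 0, 0.0001, actual_freq)

    psi = np.sum(
        (actual_freq - expected_freq) * np.log(actual_freq / expected_freq)
    )
    return psi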

# In your monitoring pipeline
for feature in monitoring_features:
    psi = population_stability_index(
        baseline_data[feature],
        production_data[feature]
    )
    if psi > 0.25:
        alert_team(f"Significant drift in feature: {feature}, PSI: {psi:.3f}")

Fairness monitoring is mandatory for Annex III systems where bias is a recognized risk. This means computing fairness metrics disaggregated by protected attributes:

import numpy as np
from sklearn.metrics import accuracy_score

def compute_fairness_metrics(y_true, y_pred, protected_attr, max_accuracy_gap=0.05):
    """Monitor per-group accuracy and demographic parity gaps.

    Expects binary 0/1 predictions. Equalized odds would additionally need
    per-group true-positive and false-positive rates, which are omitted here.
    """
    # Convert to arrays so boolean masks work for lists, NumPy arrays and pandas Series
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    metrics = {}

    overall_accuracy = accuracy_score(y_true, y_pred)
    overall_positive_rate = y_pred.mean()

    for group in np.unique(protected_attr):
        mask = protected_attr == group
        group_accuracy = accuracy_score(y_true[mask], y_pred[mask])
        group_positive_rate = y_pred[mask].mean()

        metrics[group] = {
            'accuracy': group_accuracy,
            'accuracy_gap': abs(group_accuracy - overall_accuracy),
            'positive_rate': group_positive_rate,
            'demographic_parity_gap': abs(group_positive_rate - overall_positive_rate)
        }

    # Alert if any group exceeds the acceptable accuracy gap (default 5%).
    # alert_compliance_team is a placeholder for your own alerting hook.
    for group, m in metrics.items():
        if m['accuracy_gap'] > max_accuracy_gap:
            alert_compliance_team(f"Fairness alert: {group} accuracy gap {m['accuracy_gap']:.1%}")

    return metrics

Layer 3: Serious Incident Detection

The most operationally critical part of the monitoring system is the mechanism that flags potential serious incidents for human review. This cannot be fully automated — a human must make the judgment about whether an event meets the Article 3(49) threshold — but automation can dramatically reduce the time to human review.

A practical approach uses a two-stage pipeline:

Stage 1 — Automated flagging: Rules and ML-based classifiers that identify events with characteristics associated with serious harm. For a hiring AI, this might include unusually high rejection rates for specific demographic groups in a short window, or feedback signals indicating discriminatory outcomes. For a medical AI, it includes explicit harm reports and anomalous clinical pathway deviations.

Stage 2 — Human review queue: Flagged events enter a prioritized review queue with a defined SLA. For potential life-threatening events: 2-hour review SLA. For other serious incident candidates: 24-hour review SLA. The reviewer makes the threshold determination and initiates reporting if warranted.

class SeriousIncidentDetector:
    def __init__(self, thresholds):
        self.thresholds = thresholds
    
    def evaluate_event(self, event):
        severity_signals = []
        
        # Check for explicit harm reports
        if event.get('user_reported_harm'):
            severity_signals.append({
                'type': 'user_harm_report',
                'details': event['user_reported_harm'],
                'sla_hours': 2
            })
        
        # Check for anomalous rejection patterns (hiring AI example)
        if event.get('rejection_rate_by_group'):
            for group, rate in event['rejection_rate_by_group'].items():
                if rate > self.thresholds['max_rejection_rate']:
                    severity_signals.append({
                        'type': 'fairness_anomaly',
                        'affected_group': group,
                        'rate': rate,
                        'sla_hours': 24
                    })
        
        # Check for cascade failures (infrastructure AI)
        if event.get('downstream_system_failures', 0) > self.thresholds['cascade_threshold']:
            severity_signals.append({
                'type': 'cascade_failure',
                'count': event['downstream_system_failures'],
                'sla_hours': 2
            })
        
        if severity_signals:
            min_sla = min(s['sla_hours'] for s in severity_signals)
            self.escalate_for_review(event, severity_signals, sla_hours=min_sla)
        
        return severity_signals
    
    def escalate_for_review(self, event, signals, sla_hours):
        # Create review ticket with SLA
        # Notify legal and compliance teams
        # Log escalation with timestamp (15-day reporting clock context)
        pass
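The Stage 2 review queue described above can be sketched as a priority queue ordered by review deadline. `ReviewQueue` and its methods are hypothetical names, and the 2-hour and 24-hour SLAs used in the usage example are the ones suggested in this article, not statutory values.

```python
import heapq
import itertools
from datetime import datetime, timedelta, timezone


class ReviewQueue:
    """Orders flagged events by review deadline, earliest first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def add(self, event_id: str, flagged_at: datetime, sla_hours: int) -> datetime:
        """Queue an event and return the deadline by which it must be reviewed."""
        deadline = flagged_at + timedelta(hours=sla_hours)
        heapq.heappush(self._heap, (deadline, next(self._counter), event_id))
        return deadline

    def next_for_review(self):
        """Pop the event with the earliest deadline, or None if the queue is empty."""
        if not self._heap:
            return None
        deadline, _, event_id = heapq.heappop(self._heap)
        return event_id, deadline

    def overdue(self, now: datetime) -> list:
        """Event IDs whose review deadline has already passed, earliest first."""
        return [eid for deadline, _, eid in sorted(self._heap) if deadline < now]
```

For example, a harm report flagged at 09:00 with a 2-hour SLA is served before a fairness anomaly flagged at 08:00 with a 24-hour SLA, because ordering follows the deadline and not the arrival time.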

Layer 4: Reporting Infrastructure

When an event is determined to be a serious incident, you need pre-built infrastructure to generate the Article 73 report. This means:

Jurisdictional routing: Determine which national authority receives the report. The primary authority is where the incident occurred (where the user is located). Secondary notification may be required where the system is placed on the market and where you are established.

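Major national market surveillance authorities for AI include the following. Designations vary by sector and can change, so confirm the competent authority against the current official list before filing: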

Germany: Bundesnetzagentur (lead for cross-sector AI)
France: ANSSI + sectoral authorities (ANJ, the successor to ARJEL, for gambling; ACPR for finance)
Netherlands: Rijksdienst voor Ondernemend Nederland (RVO) + ACM
Italy: AGCOM + sector-specific authorities
Spain: AESIA (Agencia Española de Supervisión de la Inteligencia Artificial) — first dedicated national AI regulator

Report generation: A template-driven system that pulls incident data, system metadata, and corrective action records into the required format. The EU AI Office is expected to publish standard reporting forms; pre-map your data schema to these forms now.

Evidence packaging: Logs, model version records, deployment configuration, and monitoring data from the incident window, packaged in a format that can be transmitted to authorities within the 15-day window.
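One way to make an evidence package verifiable is to ship it with a manifest of SHA-256 digests, so that any later change to a file can be detected. The sketch below assumes the evidence has already been exported to a local directory; `build_manifest` and `verify_manifest` are illustrative names, and as far as we know the Act does not prescribe a packaging format.

```python
import hashlib
from pathlib import Path


def build_manifest(evidence_dir: Path) -> dict:
    """Map each file under evidence_dir (by relative path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(p for p in evidence_dir.rglob("*") if p.is_file()):
        # read_bytes loads the whole file; stream in chunks for very large logs.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(evidence_dir).as_posix()] = digest
    return manifest


def verify_manifest(evidence_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose content no longer matches."""
    current = build_manifest(evidence_dir)
    changed = [name for name, digest in manifest.items() if current.get(name) != digest]
    return sorted(changed)
```

Generating the manifest at the moment the incident window is frozen, and storing it separately from the evidence itself, gives a simple before-and-after check if the package is questioned later.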

The EUDB Continuous Reporting Obligation

Beyond incident reporting, Article 49 requires providers to register high-risk AI systems listed in Annex III in the EU database established under Article 71 (referred to here as the EUDB). Once registered, providers have ongoing obligations to update that registration when:

The system is substantially modified (triggering a new conformity assessment)
The intended purpose changes
A serious incident occurs and is reported
The system is withdrawn from the market

The EUDB is being built by the EU AI Office and is expected to launch in late 2026. Plan your registration data architecture now so that your monitoring system can generate the required updates automatically rather than requiring manual data entry.

Substantial Modification: When You Need a New Conformity Assessment

One of the most practically significant monitoring obligations is the duty to recognize when a change to your system constitutes a "substantial modification" as defined in Article 3(23), which under Article 43(4) triggers a new conformity assessment.

Substantial modifications include:

Changes that affect the system's performance against safety requirements
Changes to the intended purpose
Changes that affect the system's compliance with fundamental rights requirements
Changes to the AI system's design that materially alter its functioning

The challenge is that in a continuous delivery SaaS environment, you are making changes constantly. Your monitoring system needs a change management gate that classifies each change by its potential to constitute a substantial modification:

class SubstantialModificationChecker:
    """
    Classify changes to flag those that may trigger a new conformity assessment.
    Indicators are illustrative; the legal test is the "substantial modification"
    definition in Article 3(23), applied through Article 43(4).
    """
    
    SUBSTANTIAL_INDICATORS = [
        'model_architecture_change',
        'training_data_domain_shift',
        'intended_purpose_expansion',
        'input_feature_schema_change',
        'output_interpretation_change',
        'removal_of_safety_feature',
        'accuracy_degradation_exceeds_threshold'
    ]
    
    NON_SUBSTANTIAL_INDICATORS = [
        'hyperparameter_tuning_within_bounds',
        'infrastructure_upgrade_same_model',
        'ui_change_no_model_impact',
        'logging_enhancement',
        'performance_optimization_same_outputs',
        'bug_fix_restoring_documented_behavior'
    ]
    
    def classify_change(self, change_description, change_metadata):
        substantial_signals = []
        
        for indicator in self.SUBSTANTIAL_INDICATORS:
            if change_metadata.get(indicator):
                substantial_signals.append(indicator)
        
        if substantial_signals:
            return {
                'classification': 'POTENTIALLY_SUBSTANTIAL',
                'signals': substantial_signals,
                'action': 'REQUIRE_LEGAL_REVIEW_BEFORE_DEPLOY',
                'estimated_reassessment_needed': True
            }
        
        return {
            'classification': 'NON_SUBSTANTIAL',
            'signals': [],
            'action': 'PROCEED_WITH_STANDARD_CHANGE_MANAGEMENT',
            'estimated_reassessment_needed': False
        }

This gate should sit in your CI/CD pipeline for high-risk AI systems. Every deployment must pass through the classification check, and POTENTIALLY_SUBSTANTIAL changes must receive legal review before deployment approval.
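A pipeline step enforcing this gate could look like the following sketch. It fails closed, meaning a change with no completed classification is blocked instead of defaulting to non-substantial. The flag names, including `change_classification_completed` and `legal_review_approved`, are hypothetical and would need to match your own change-request schema.

```python
import sys

# Hypothetical flag names; align them with your own change-request schema.
REVIEW_TRIGGERS = frozenset({
    "model_architecture_change",
    "training_data_domain_shift",
    "intended_purpose_expansion",
    "removal_of_safety_feature",
})


def gate(change_metadata: dict) -> int:
    """Return a process exit code: 0 lets the deploy continue, 1 blocks it."""
    if not change_metadata.get("change_classification_completed"):
        # Fail closed: an unclassified change is blocked, not waved through.
        print("BLOCKED: change has not been classified", file=sys.stderr)
        return 1
    triggered = sorted(k for k in REVIEW_TRIGGERS if change_metadata.get(k))
    if triggered and not change_metadata.get("legal_review_approved"):
        print(f"BLOCKED: legal review required for {triggered}", file=sys.stderr)
        return 1
    return 0
```

In a pipeline, the step would load the change metadata (for example from a JSON file attached to the release) and pass the return value to `sys.exit`, so that a non-zero code stops the deployment job.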

The Post-Market Monitoring Plan

Article 72(3) requires that the post-market monitoring system be based on a Post-Market Monitoring Plan (PMMP) that forms part of the Annex IV technical documentation. The Commission is tasked with adopting an implementing act that sets out a template for the plan and the elements it must contain.

Your PMMP must specify:

Scope: Which system behaviors are monitored and why. The scope should be derived from your Article 9 risk assessment — higher-risk behaviors receive more intensive monitoring.

Data collection methodology: What data is collected, how, from where, and at what frequency. This includes the baseline datasets against which production performance is compared.

Analysis methodology: How collected data is processed to identify issues. This includes the statistical methods, the thresholds that trigger alerts, and the review process for ambiguous cases.

Reporting structure: How monitoring findings are reported internally, how they feed into the risk management system, and under what circumstances they trigger external reporting (Article 73).

Review cadence: Article 72 does not specify a mandatory review frequency, but the monitoring plan must define one. Common approaches are monthly statistical reviews, quarterly formal assessments, and annual comprehensive audits.

Corrective action framework: The predefined process for responding to different categories of monitoring findings, including who is authorized to take which types of corrective action and within what timeframe.

A minimal but compliant PMMP table of contents looks like:

1. System Overview and Scope
   1.1 System Identification (as registered in EUDB)
   1.2 Intended Purpose and Use Cases in Scope
   1.3 Risk Categories Addressed by Monitoring

2. Monitoring Framework
   2.1 Performance Metrics and Baselines
   2.2 Fairness and Non-Discrimination Metrics
   2.3 Data Distribution Monitoring
   2.4 Threshold Definitions and Alert Criteria

3. Data Collection
   3.1 Data Sources
   3.2 Collection Methodology
   3.3 Data Retention (10 years for documentation per Article 18; logs per Article 19)
   3.4 GDPR Compliance for Monitoring Data

4. Incident Detection and Reporting
   4.1 Serious Incident Definition and Examples (per Article 3(49))
   4.2 Detection Pipeline
   4.3 Escalation Procedure and SLAs
   4.4 Article 73 Reporting Process and Jurisdictional Routing

5. Corrective Action Framework
   5.1 Action Types by Severity
   5.2 Authorization Matrix
   5.3 Substantial Modification Assessment
   5.4 Documentation and Traceability

6. Governance
   6.1 Roles and Responsibilities
   6.2 Review Cadence
   6.3 Plan Update Procedure

Why EU Hosting Matters for Post-Market Monitoring

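The infrastructure where your monitoring system runs is not a compliance irrelevance. It has direct implications for three aspects of your post-market monitoring obligations: exposure to foreign access laws, long-term data residency, and audit trail integrity.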

Your monitoring infrastructure processes personal data: inference logs contain information about individuals' interactions with your high-risk AI system. If that infrastructure is operated by a provider subject to foreign surveillance or disclosure legislation (the US CLOUD Act, China's National Intelligence Law, and similar), you face a potential conflict between the GDPR's restrictions on disclosing and transferring personal data to third-country authorities and those foreign access powers.

Running your monitoring infrastructure on EU-sovereign cloud providers largely removes this conflict. Monitoring data held solely in EU infrastructure, by a provider with no parent company or establishment subject to those foreign laws, sits outside their direct reach.

Data Residency for 10-Year Retention

Article 18 requires the technical documentation to be kept for 10 years after the system is placed on the market, and the monitoring records that evidence ongoing compliance are best retained alongside it. If your monitoring data sits with a US-based cloud provider, you carry data residency uncertainty over that decade: provider policy changes, regulatory changes, market exits. EU providers operating under EU regulatory frameworks offer a more stable long-term residency position.

Audit Trail Integrity

Market surveillance authorities conducting audits of your post-market monitoring system need confidence in the integrity of your records. Monitoring data stored in EU-sovereign infrastructure under EU data protection law provides a cleaner chain of custody for regulatory examination than data held in multi-jurisdictional US-based infrastructure with complex data transfer arrangements.

Practical Implementation Timeline

If you are building post-market monitoring infrastructure now, here is a realistic 16-week implementation timeline for a SaaS provider with an existing high-risk AI system:

Weeks 1-2: Baseline establishment

Audit existing logging infrastructure for gaps
Define the monitoring scope based on your Article 9 risk assessment
Establish production baseline metrics from historical data

Weeks 3-5: Core monitoring pipeline

Deploy statistical performance monitoring (PSI, distribution monitoring)
Implement fairness monitoring for protected attributes in scope
Set threshold values and configure alerting

Weeks 6-8: Incident detection

Define serious incident criteria specific to your system
Build the two-stage detection pipeline (automated flagging + human review queue)
Test with historical incidents and simulated scenarios

Weeks 9-11: Reporting infrastructure

Build jurisdictional routing logic for Article 73 reports
Create report templates aligned with expected EU AI Office format
Integrate reporting with legal team notification workflow

Weeks 12-14: Change management integration

Deploy substantial modification classifier in CI/CD pipeline
Document classification criteria and approval gates
Train engineering team on classification criteria

Weeks 15-16: Documentation and plan finalization

Write the Post-Market Monitoring Plan document
Integrate PMMP into Annex IV technical documentation
Run a tabletop exercise simulating a serious incident from detection to reporting

What Changes After August 2026

The August 2, 2026 enforcement date for high-risk AI systems under the EU AI Act means that your post-market monitoring system needs to be operational — not just planned — for systems placed on the market from that date.

For systems already on the market before 2 August 2026, Article 111(2) applies the Regulation only once the system undergoes a significant change in its design, with a separate 2030 deadline for systems intended for use by public authorities. Because continuously delivered SaaS systems change often, providers who intend to keep such systems on the market should treat the August 2026 date as their operational deadline regardless.

Market surveillance authorities are expected to begin auditing in Q4 2026, with the first enforcement actions expected within 18 months of the enforcement date. Post-market monitoring documentation, particularly the PMMP and evidence of actual monitoring activity, is likely to be among the first things requested in any audit.

The Monitoring Obligation Continues

The EU AI Act's post-market surveillance framework is designed to create an ongoing feedback loop between real-world performance and the risk management system — a loop that must keep running for the entire operational lifetime of your high-risk AI system.

For SaaS providers, this means building monitoring into your product infrastructure the same way you build performance monitoring and error tracking. It is not a periodic compliance exercise. It is a continuous engineering discipline with legal consequences.

The providers who will navigate this most successfully are those who treat Article 72 monitoring not as a reporting burden but as a product quality mechanism — one that happens to generate the compliance evidence authorities need, as a byproduct of genuine operational excellence.

In the final post in this series, we will examine the complete high-risk AI compliance stack: combining the classification assessment from Post #1, the Annex III deep dive from Post #2, the conformity assessment framework from Post #3, and the monitoring obligations from this post into a unified developer toolkit for August 2026 readiness.

sota.io runs on EU-sovereign infrastructure, keeping your post-market monitoring data, inference logs, and compliance records under EU law without cross-border data transfer complexity. See how sota.io simplifies EU AI Act compliance infrastructure →
