2026-04-12·12 min read·sota.io team

EU AI Act Art.97: Commission Evaluation — What Gets Reviewed and When — Developer Guide (2026)

The EU AI Act is not a static regulation. Its architecture includes multiple built-in feedback mechanisms that allow the framework to evolve as AI technology changes. Most developers know about the Art.85 comprehensive review scheduled for 2027. Fewer understand that Article 97 creates a parallel, more targeted evaluation mechanism with different scope, different timing, and different developer implications.

Article 97 is the Commission's ongoing evaluation engine. Where Art.85 is the legislative reset triggered by a 3-year review, Art.97 is the regulatory calibration instrument — it evaluates specific technical and operational parameters of the Act on a rolling basis, feeds those findings into Commission reports, and provides the legal basis for delegated act amendments without requiring full legislative procedure.

For developers, the Art.97 cycle matters because it is the mechanism most likely to change your specific compliance obligations: what Annex III categories cover, where the GPAI systemic risk threshold sits, and how enforcement levels are calibrated.

What Article 97 Actually Says

Article 97 mandates the Commission to evaluate and report on the following specific matters — separately and on a more technical basis than the comprehensive Art.85 review.

The core evaluation obligation:

The Commission shall, within 4 years of the date of application of this Regulation, carry out an assessment of and report to the European Parliament and to the Council on the following:

(a) Adequacy of the supervision framework — whether the governance structure under Chapter VI and VII is functioning effectively. This includes the AI Office's oversight of GPAI models, the European AI Board's coordination role, and the adequacy of national Market Surveillance Authority resources.

(b) Scope of prohibited practices under Art.5 — whether the list of absolutely forbidden AI applications remains appropriate, whether new capabilities have created gaps in the prohibition list, and whether any current prohibition has become technically obsolete or overly broad.

(c) Annex III high-risk classification — this is the most developer-relevant evaluation item. The Commission assesses whether the eight categories in Annex III remain appropriate, whether new AI application areas should be added, and whether any category should be removed or narrowed.

(d) Effectiveness of conformity assessment — whether the Art.43-49 conformity pathway is generating sufficient assurance, or whether new conformity mechanisms are needed for emerging AI architectures (multimodal models, foundation-model-based deployments).

(e) Penalty level calibration — whether Art.99 fine tiers are achieving deterrence. If enforcement data from MSA Art.84 annual reports shows systematic underfiling of violations or inadequate provider behaviour change, the Commission can propose penalty adjustments.

(f) GPAI threshold review — the 10²⁵ FLOP systemic risk threshold in Art.51 is explicitly subject to the Art.97 evaluation. As compute efficiency improves and more models approach or cross this threshold, the Commission evaluates whether the trigger level remains technically meaningful.

The Art.85 vs Art.97 Distinction

Developers building compliance roadmaps past 2027 need to understand the difference between these two review mechanisms, because they operate on different timelines and have different legislative consequences.

| Dimension | Art.85 Review | Art.97 Evaluation |
| --- | --- | --- |
| Trigger | 3 years after application (≈ August 2027) | 4 years after application (≈ August 2028) |
| Scope | Comprehensive: entire Act | Targeted: specific parameters |
| Output | Report to Parliament + Council → can trigger legislative amendment | Commission evaluation report → can trigger delegated acts |
| Annex III | Covered as part of full Act review | Primary focus: specific assessment |
| GPAI thresholds | Covered generally | Explicitly evaluated |
| Legislative mechanism | Full legislative procedure required | Delegated acts under Art.97(3) possible without full Parliament/Council vote |
| Developer impact | Broad potential changes | Surgical Annex III and threshold changes most likely |

The critical distinction is the legislative mechanism. Art.85 amendments require the full EU legislative process — Commission proposal → Parliament debate → Council vote → potentially 18-24 months. Art.97 evaluation can result in delegated acts — Commission-only amendments to Annexes I-III, with a shorter timeline and less political friction.

This means Art.97-triggered Annex III expansions can happen faster than Art.85-triggered changes to the core text.

The Art.84 → Art.97 Data Pipeline

The Art.97 evaluation does not happen in a vacuum. The Commission's assessment of Annex III adequacy, enforcement effectiveness, and penalty calibration depends on data — and that data flows from MSA annual reports under Art.84.

The pipeline works as follows:

MSA investigations (Art.74-76)
         ↓
Serious incident reports (Art.73)
         ↓
MSA annual reports to Commission (Art.84)
         ↓ aggregated across all 27 member states
Commission evaluation database
         ↓
Art.97 evaluation report (Year 4)
         ↓ if Annex III scope issues identified
Delegated act amending Annex III (Art.97(3))

For developers, this pipeline has a specific implication: your Art.73 serious incident reports and your EU database (EUAIDB) registration data feed into the Commission's Art.97 evaluation. Sectors with high incident reporting rates are the most likely to receive increased scrutiny in the Art.97 assessment, and potentially Annex III expansion or stricter conformity requirements.

Three data categories are particularly influential in Art.97 evaluations:

1. Incident concentration by sector: If Art.84 annual reports show that a particular AI application type (say, AI-assisted insurance underwriting) generates disproportionate serious incident reports, Art.97 evaluation may trigger reclassification of that application into Annex III, even if it was previously considered non-high-risk.

2. Enforcement gap patterns: If MSAs consistently fail to identify Annex III violations in certain sectors — because current definitions are ambiguous, or because MSAs lack technical capacity for specific AI types — the Art.97 evaluation can recommend conformity procedure improvements or new harmonized standards.

3. Fine effectiveness signals: If Art.84 reports show that maximum Art.99 fines are being imposed without changing provider behaviour, the Art.97 evaluation provides the formal basis for proposing penalty escalation, either through delegated acts or through a recommendation for legislative amendment.
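The first signal above, incident concentration by sector, can be approximated on your own incident log. The following is a minimal sketch, not a Commission methodology: the sector labels, the sample data, and the 40% share threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical incident log: one sector label per serious incident report.
# These values are illustrative only, not real Art.84 data.
incidents = [
    "insurance_underwriting", "insurance_underwriting",
    "insurance_underwriting", "insurance_underwriting",
    "hr_scoring", "hr_scoring",
    "medical_triage", "content_recommendation",
]


def concentrated_sectors(
    reports: list[str], share_threshold: float = 0.4
) -> dict[str, float]:
    """Return sectors whose share of all reports is at or above share_threshold.

    The 0.4 default is an arbitrary illustrative cut-off (assumption).
    """
    counts = Counter(reports)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        sector: n / total
        for sector, n in counts.items()
        if n / total >= share_threshold
    }


flagged = concentrated_sectors(incidents)
for sector, share in flagged.items():
    print(f"{sector}: {share:.0%} of reports")
```

A sector that dominates your own or your industry's incident reports is a reasonable proxy for where evaluation attention could land, although the actual weighting the Commission would apply is not specified anywhere in the text discussed here.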

Annex III Expansion Risk: The Developer's Core Concern

The Art.97 evaluation of Annex III scope is the parameter most likely to affect your compliance obligations. Understanding which current AI applications are at risk of reclassification helps prioritize compliance investment.

Current Annex III categories (for reference):

| Category | Description | Reclassification risk |
| --- | --- | --- |
| 1. Biometric | Remote identification, categorization, emotion recognition | Stable: already comprehensive |
| 2. Critical Infrastructure | Utilities, transport, finance | Medium: IoT/edge AI gaps |
| 3. Education | Access/admission decisions | Low: well-defined |
| 4. Employment | Hiring, performance assessment, termination | Medium: gig economy AI gaps |
| 5. Essential Services | Credit, health, insurance, social benefits | High: AI scoring expansion likely |
| 6. Law Enforcement | Risk assessment, crime prediction, evidence evaluation | High: newer AI investigative tools |
| 7. Migration/Asylum | Risk assessment, document verification | Medium: biometric AI expansion |
| 8. Justice/Democracy | Electoral influence, court decisions | High: generative AI political risks |

Applications currently outside Annex III that face reclassification risk by 2028:

  1. AI-assisted insurance underwriting (currently non-high-risk if below direct benefit/exclusion trigger) — large incident volume in financial services Art.84 data likely
  2. Autonomous HR performance scoring (continuous monitoring, not just hiring decisions) — coverage gap identified by Art.84 labor enforcement data
  3. AI-driven content recommendation with political influence (currently Art.5(1)(a)(iii) covers electoral manipulation, but influence below direct manipulation threshold is in a gray zone)
  4. Medical triage AI (below Art.6(2) threshold if not autonomous diagnosis) — Art.84 healthcare incident data may push toward Annex III

GPAI Threshold Compression

The 10²⁵ FLOP systemic risk threshold for GPAI models under Art.51 is explicitly within the Art.97 evaluation scope. The trajectory suggests compression — not expansion.

Why the threshold is likely to decrease:

In 2024, when the threshold was set, reaching 10²⁵ FLOP was a frontier-model-only achievement. By 2026, multiple model families are approaching or crossing this threshold, and algorithmic improvements mean equivalent capability can be achieved at lower compute. By 2028 (the Art.97 evaluation year), the 10²⁵ FLOP threshold may apply to what are then considered mid-tier commercial models.

If the Commission's Art.97 evaluation concludes that models with systemic-risk-level capability can increasingly be trained below the threshold, so that the designation becomes progressively less representative of actual AI risk, it is likely to adjust the threshold downward. That would bring more models under Art.55 systemic risk obligations (adversarial testing, model evaluation, incident reporting, cybersecurity measures).
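To see how close a given model sits to the threshold, a rough compute estimate is enough. The sketch below uses the common rule of thumb of about 6 FLOP per parameter per training token for dense transformers, which is an approximation rather than the method prescribed by the Act. The model size, token count, and the lowered thresholds are hypothetical scenarios, not Commission proposals.

```python
CURRENT_THRESHOLD_FLOP = 1e25  # Art.51 threshold as cited in this article


def estimate_training_flop(parameters: float, tokens: float) -> float:
    """Rough dense-transformer estimate: ~6 FLOP per parameter per token."""
    return 6.0 * parameters * tokens


def threshold_headroom(
    compute_flop: float, threshold_flop: float = CURRENT_THRESHOLD_FLOP
) -> float:
    """Factor by which the threshold must drop before this model is in scope.

    A value at or below 1.0 means the model already meets the threshold.
    """
    return threshold_flop / compute_flop


# Hypothetical model: 70B parameters trained on 15T tokens (assumption).
compute = estimate_training_flop(70e9, 15e12)
print(f"Estimated training compute: {compute:.2e} FLOP")
print(f"Headroom to current threshold: {threshold_headroom(compute):.2f}x")

# Hypothetical thresholds for scenario planning only.
for scenario in (1e25, 5e24, 1e24):
    status = "in scope" if compute >= scenario else "out of scope"
    print(f"threshold {scenario:.0e}: {status}")
```

In this hypothetical case the model sits below the current threshold but would be in scope if the threshold were halved, which is the kind of exposure worth tracking ahead of the evaluation.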

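For GPAI model providers below the current threshold, the preparatory steps are the ones encoded in the Python tooling later in this guide: implement Art.55 documentation structures proactively, monitor the AI Office GPAI register for threshold adjustment announcements, and consider adopting an Art.95 voluntary code before Art.55 obligations become mandatory.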

CLOUD Act Implications in Commission Evaluation Reports

The Art.97 evaluation reports have an additional significance that is not obvious from the text of the article itself: they are one of the primary mechanisms by which the Commission can explicitly address CLOUD Act conflicts with EU AI Act compliance requirements.

The current AI Act text does not mention the CLOUD Act by name. But Art.97 evaluation reports are policy documents — the Commission can include findings and recommendations that go beyond the statutory text. If Art.84 data shows that CLOUD Act compellability of compliance documentation is creating systematic enforcement problems (MSA requests for documentation held on US infrastructure being contested or delayed by US government national security assertions), Art.97 is the vehicle for a formal Commission position.

What a CLOUD Act finding in an Art.97 evaluation report could mean:

| Commission finding | Delegated act consequence |
| --- | --- |
| US cloud infrastructure creates "systematic risk of CLOUD Act compellability" for Annex III compliance docs | Annex I amendment requiring EU-jurisdiction documentation storage for high-risk AI |
| GPAI systemic risk assessments accessible under US DOJ compulsion | Art.55 amendment specifying storage jurisdiction requirements |
| Art.73 incident reports from providers on US cloud compromised by CLOUD Act | Art.73 implementation act requiring EU-native reporting systems |

This is not speculative — the Commission's 2021 CLOUD Act assessment, the European Data Protection Board's 2023 guidelines, and multiple national DPA enforcement actions have all pointed toward this structural conflict. The Art.97 evaluation is the first formal legislative vehicle for addressing it.

For developers choosing infrastructure for AI compliance documentation today, this trajectory points in one direction: EU-native infrastructure now is insurance against regulatory mandate later.
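A first practical step is an inventory of where compliance documentation lives and who hosts it. The sketch below is a simplified illustration: the `ComplianceDocument` type, the document names, and the single rule "provider headquartered in the US" are assumptions made for this example. Real CLOUD Act exposure depends on corporate structure and legal analysis that a flag like this cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplianceDocument:
    """Hypothetical inventory record for one compliance artifact."""
    name: str
    storage_country: str      # where the data physically sits
    provider_hq_country: str  # where the hosting provider is headquartered


def cloud_act_exposed(doc: ComplianceDocument) -> bool:
    """Simplified flag: hosted by a US-headquartered provider.

    Storage location is deliberately ignored, because the concern described
    above is about who controls the data rather than where it is stored.
    """
    return doc.provider_hq_country == "US"


# Illustrative inventory (assumption, not real data).
inventory = [
    ComplianceDocument("art11_technical_documentation", "DE", "US"),
    ComplianceDocument("art12_logs", "FR", "FR"),
    ComplianceDocument("art73_incident_reports", "US", "US"),
]

to_migrate = [doc.name for doc in inventory if cloud_act_exposed(doc)]
print("Candidates for EU-native storage:", to_migrate)
```

Note that the first record is flagged even though it is stored in Germany, which is the point of keying on the provider rather than the data centre location.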

Python Tooling: Regulatory Future-Proofing

from dataclasses import dataclass
from enum import Enum
from datetime import date
from typing import Optional

class AnnexIIICategory(Enum):
    BIOMETRIC = "biometric_id_categorization"
    CRITICAL_INFRA = "critical_infrastructure"
    EDUCATION = "education_training"
    EMPLOYMENT = "employment_hr"
    ESSENTIAL_SERVICES = "essential_services_credit_health"
    LAW_ENFORCEMENT = "law_enforcement"
    MIGRATION_ASYLUM = "migration_asylum"
    JUSTICE_DEMOCRACY = "justice_democracy_elections"
    NOT_HIGH_RISK = "not_currently_high_risk"

class ReclassificationRisk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    IMMINENT = "imminent_2028"

@dataclass
class Art97ComplianceAssessment:
    current_category: AnnexIIICategory
    reclassification_risk: ReclassificationRisk
    gpai_threshold_exposure: bool
    cloud_act_documentation_risk: bool
    recommended_actions: list[str]
    art97_evaluation_year: int = 2028
    art85_review_year: int = 2027

def assess_art97_exposure(
    ai_application_type: str,
    training_compute_flop: Optional[float],
    documentation_on_us_cloud: bool,
    current_annex_iii: bool,
    sector: str
) -> Art97ComplianceAssessment:
    """
    Assess a developer's exposure to Art.97 Commission evaluation changes.
    
    Returns risk profile and recommended preparatory actions.
    """
    
    # GPAI threshold exposure
    gpai_exposed = (
        training_compute_flop is not None and 
        training_compute_flop >= 1e23  # Within 100x of current 10^25 threshold
    )
    
    # Reclassification risk by sector
    HIGH_RISK_SECTORS = {
        "financial_services", "insurance", "healthcare_triage", 
        "political_content", "gig_economy_hr", "autonomous_hr_scoring"
    }
    MEDIUM_RISK_SECTORS = {
        "employment_general", "essential_services", "law_enforcement_adjacent",
        "critical_infrastructure_iot"
    }
    
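    # `category` holds the current Annex III category for systems already in
    # scope, and the most plausible landing category for systems that are not.
    if sector in HIGH_RISK_SECTORS and not current_annex_iii:
        reclass_risk = ReclassificationRisk.HIGH
        category = AnnexIIICategory.ESSENTIAL_SERVICES  # Most likely catch-all expansion
    elif sector in MEDIUM_RISK_SECTORS and not current_annex_iii:
        reclass_risk = ReclassificationRisk.MEDIUM
        category = AnnexIIICategory.EMPLOYMENT
    elif current_annex_iii:
        reclass_risk = ReclassificationRisk.LOW  # Already covered
        # Placeholder: the function takes no per-category input, so the actual
        # Annex III category of an in-scope system is not known here.
        category = AnnexIIICategory.ESSENTIAL_SERVICES
    else:
        reclass_risk = ReclassificationRisk.LOW
        category = AnnexIIICategory.NOT_HIGH_RISK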
    
    # Build action plan
    actions = []
    
    if reclass_risk in {ReclassificationRisk.HIGH, ReclassificationRisk.IMMINENT}:
        actions.extend([
            "Begin Art.11 technical documentation now (before reclassification mandate)",
            "Implement Art.9 risk management process in advance",
            "Establish Art.12 logging with EU-jurisdiction storage",
            "Register on Art.95 voluntary code of conduct as good-faith signal to Commission"
        ])
    
    if gpai_exposed:
        actions.extend([
            f"Training compute {training_compute_flop:.1e} FLOP: within Art.97 threshold review range",
            "Implement Art.55 documentation structures proactively",
            "Monitor AI Office GPAI register for threshold adjustment announcements",
            "Consider Art.95 voluntary code adoption before Art.55 becomes mandatory"
        ])
    
    if documentation_on_us_cloud:
        actions.extend([
            "CLOUD Act risk in Art.97 evaluation trajectory: migrate compliance docs to EU-native storage",
            "Review Art.73 incident report system jurisdiction before 2027 Art.85 review",
            "Document CLOUD Act exposure in DPO risk register for Art.97 evaluation window"
        ])
    
    actions.append(
        "Schedule Art.85 review prep for Q2 2027 and Art.97 evaluation prep for Q2 2028"
    )
    
    return Art97ComplianceAssessment(
        current_category=category,
        reclassification_risk=reclass_risk,
        gpai_threshold_exposure=gpai_exposed,
        cloud_act_documentation_risk=documentation_on_us_cloud,
        recommended_actions=actions
    )


@dataclass
class AnnexIIIExpansionTracker:
    """
    Tracks indicators that a non-high-risk AI system may be 
    reclassified under Art.97 Commission evaluation.
    """
    application_type: str
    sector: str
    incident_reports_submitted: int = 0
    msa_inquiry_received: bool = False
    competitor_reclassified: bool = False
    art84_data_exposure: bool = False
    
    def expansion_risk_score(self) -> float:
        """Returns a heuristic score from 0.0 (no signals) up to 0.9 (all signals present)."""
        score = 0.0
        if self.msa_inquiry_received:
            score += 0.3  # MSA attention = high signal
        if self.incident_reports_submitted > 0:
            score += min(0.2, self.incident_reports_submitted * 0.05)
        if self.competitor_reclassified:
            score += 0.25  # Sector-level reclassification wave pattern
        if self.art84_data_exposure:
            score += 0.15
        return min(score, 1.0)
    
    def pre_compliance_recommendation(self) -> str:
        score = self.expansion_risk_score()
        if score >= 0.5:
            return (
                "HIGH: Begin Art.11 technical documentation immediately. "
                "Art.97 evaluation is likely to reclassify this application type. "
                "Pre-compliance now prevents emergency remediation post-reclassification."
            )
        elif score >= 0.25:
            return (
                "MEDIUM: Prepare lightweight technical documentation skeleton. "
                "Monitor AI Office Art.97 evaluation publications from 2027 onward."
            )
        return (
            "LOW: Maintain Art.95 voluntary compliance posture. "
            "Annual review of Art.97 Commission evaluation reports recommended."
        )

The Developer's Art.97 Timeline

2025-08-02: EU AI Act applies (prohibited practices, GPAI, Art.95-96)
2026-08-02: Full Act application — Annex III high-risk obligations begin
2027-08-02: Art.85 comprehensive review published
              → Legislative amendments begin (if recommended)
2028-08-02: Art.97 evaluation published
              → Delegated acts amending Annexes possible (no full Parliament vote)
              → GPAI threshold adjustment most likely here
              → New Annex III categories possible
2029-early: Delegated act amendments from Art.97 evaluation take effect
2030-08-02: Second Art.97 evaluation cycle begins

For developers building AI systems today, the compliance architecture decision window is approximately 2026–2027 — before the Art.85 review that will generate recommendations, and well before the Art.97 delegated acts that could rapidly expand Annex III.
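The timeline above can be turned into a simple countdown for planning purposes. This is a minimal sketch that copies the dated entries from the timeline as given in this guide; the `upcoming_milestones` helper is a hypothetical name, and the undated "2029-early" entry is left out because it has no specific day.

```python
from datetime import date

# Dated milestones copied from the timeline in this guide.
MILESTONES = [
    (date(2025, 8, 2), "EU AI Act applies (prohibited practices, GPAI, Art.95-96)"),
    (date(2026, 8, 2), "Full Act application, Annex III high-risk obligations begin"),
    (date(2027, 8, 2), "Art.85 comprehensive review published"),
    (date(2028, 8, 2), "Art.97 evaluation published"),
    (date(2030, 8, 2), "Second Art.97 evaluation cycle begins"),
]


def upcoming_milestones(today: date) -> list[tuple[date, str, int]]:
    """Return (date, label, days_remaining) for milestones on or after today."""
    return [
        (when, label, (when - today).days)
        for when, label in MILESTONES
        if when >= today
    ]


# Example run using this article's publication date as "today".
for when, label, days_left in upcoming_milestones(date(2026, 4, 12)):
    print(f"{when.isoformat()}  {days_left:>5} days  {label}")
```

Replacing the fixed date with `date.today()` gives a live countdown that can sit in a compliance dashboard or a scheduled report.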

30-Item Art.97 Future-Proofing Checklist

Category 1: Understanding Your Review Exposure (5 items)

Category 2: Annex III Pre-Compliance (5 items)

Category 3: GPAI Threshold Monitoring (5 items)

Category 4: Art.84 Data Management (5 items)

Category 5: Infrastructure Future-Proofing (5 items)

Category 6: Commission Monitoring and Response (5 items)

See Also