2026-06-10 · sota.io Team

EU AI Act Art.16 Provider Obligations When Your High-Risk AI System Uses a GPAI Model

Post #1639 in the sota.io EU AI Act Compliance Series — ART16-PROVIDER-OBLIGATIONS-2026 #3/5

You called the GPT-4 API, added some prompt engineering, and shipped a recruitment screening tool. Your product classifies CVs, ranks candidates, and flags "cultural fit" concerns. It runs in the cloud, scales across your SaaS customers, and processes tens of thousands of applications per month.

Under the EU AI Act, that product is almost certainly high-risk AI under Annex III (employment and worker management). And you — the company that built and markets it — are an Art.16 provider with the full set of provider obligations.

The uncomfortable part: those obligations include technical documentation that describes how your system works, conformity assessment that demonstrates it meets Art.9–15 requirements, and registration in the EU AI Act database. But the core decision-making logic runs inside a model you don't own, can't inspect, and can't stop from changing without notice.

This is the GPAI dependency gap in Art.16 compliance. This guide maps exactly how to navigate it.

The Classification That Triggers Art.16

Before diving into documentation requirements, it is worth being precise about which classification scenario creates Art.16 obligations for GPAI-dependent systems. There are two distinct paths:

Path 1 — GPAI model provider (Art.53): You train, fine-tune, or redistribute a general-purpose AI model under your own brand. You place the model itself on the market. This triggers Art.53 obligations: technical documentation about training data, compute, capabilities, and known limitations. This is Anthropic, OpenAI, and Mistral AI's situation — not yours if you are calling their APIs.

Path 2 — High-risk AI system provider (Art.16): You build a product that incorporates GPAI (via API or locally hosted model) and the resulting system qualifies as high-risk under Art.6 and Annex III. You place that system on the market. This triggers Art.16 obligations. This is the scenario for most SaaS developers building AI-powered B2B tools.

The key distinction: you are not obligated as a GPAI model provider. You are obligated as the provider of the AI-powered product that happens to use a GPAI model internally.

Common Annex III triggers for GPAI-dependent products:

Annex III, §3 (Education and vocational training): AI that determines access to educational institutions or evaluates students
Annex III, §4 (Employment): AI for recruitment, CV screening, performance evaluation, workforce management
Annex III, §5 (Essential private and public services): AI used for creditworthiness assessment, or for risk assessment and pricing in life and health insurance
Annex III, §6 (Law enforcement): AI used for risk assessment, evaluating the reliability of evidence, criminal profiling
Annex III, §8 (Administration of justice): AI used to assist judicial authorities in researching and interpreting facts and law, or used similarly in alternative dispute resolution

If your GPAI-powered product touches any of these areas, Art.16 applies to you regardless of which foundation model sits underneath.

Art.16(d) + Art.11 + Annex IV: Documenting a GPAI-Dependent Architecture

Art.16(d) requires providers to keep the documentation referred to in Art.18, which includes the technical documentation drawn up under Art.11 and Annex IV. Art.11(1) says that documentation must demonstrate compliance with the Section 2 requirements (Art.8–15) and give national competent authorities and notified bodies the information they need to assess that compliance.

For GPAI-dependent systems, this creates a documentation challenge: you must describe a system whose most consequential component — the GPAI model — is outside your direct control and potentially subject to change.

What Annex IV Requires You to Document

Annex IV lists the mandatory contents of high-risk AI technical documentation. The sections most affected by GPAI dependency are:

Annex IV §1 — General description of the AI system:

Intended purpose, deployment context, and use cases
The interaction between your system and the GPAI model: prompt construction, response parsing, post-processing logic
The version of the GPAI model used at time of conformity assessment

Annex IV §2 — Description of elements of the AI system and its development process:

Training data (if any — your GPAI-dependent system may not have a separate training step, in which case document the absence and why)
The pre-trained model used (identify provider, model family, and version)
Fine-tuning or retrieval-augmented generation (RAG) layers if present
Prompt engineering and instruction-tuning approach

Annex IV §3 — Detailed description of the monitoring, functioning and control of the AI system:

Runtime behaviour monitoring — what you observe and log about system inputs and outputs
Accuracy, precision, and recall metrics measured at the system level (not at the GPAI model level)
Procedures for detecting GPAI model drift and version changes

Record-keeping capabilities (Art.12), documented alongside the Annex IV file:

What logs your system generates automatically
Which model version processed which request
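
One way to keep the items above from slipping is to treat the technical file as structured data and check it for gaps before each assessment. The sketch below is illustrative only: the section keys and field names are hypothetical labels for the items listed above, not identifiers defined by Annex IV, and a real technical file will need more than a presence check.

```python
from typing import Dict, List

# Hypothetical field names mirroring the items listed above;
# Annex IV itself does not prescribe these identifiers.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "general_description": [
        "intended_purpose",
        "gpai_interaction",         # prompt construction, parsing, post-processing
        "gpai_model_version",
    ],
    "development_process": [
        "training_data_statement",  # may state that no separate training step exists
        "pretrained_model",
        "prompt_engineering_approach",
    ],
    "monitoring_and_control": [
        "runtime_monitoring",
        "system_level_metrics",
        "drift_detection_procedure",
    ],
    "record_keeping": [
        "automatic_logs",
        "model_version_per_request",
    ],
}


def missing_fields(technical_file: Dict[str, Dict[str, str]]) -> List[str]:
    """Return 'section.field' for every required entry that is absent or blank."""
    gaps = []
    for section, fields in REQUIRED_FIELDS.items():
        present = technical_file.get(section, {})
        for field in fields:
            if not str(present.get(field, "")).strip():
                gaps.append(f"{section}.{field}")
    return gaps


# A deliberately incomplete draft file.
draft = {
    "general_description": {
        "intended_purpose": "CV screening for B2B recruitment customers",
        "gpai_model_version": "gpt-4-0125-preview",
    },
    "record_keeping": {"automatic_logs": "JSON log per request"},
}
gaps = missing_fields(draft)
print(f"{len(gaps)} missing entries, first: {gaps[0]}")
```

Running a check like this in CI would turn a documentation gap into a failed build rather than a finding during an authority request.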

The Version Pinning Problem

The most operationally important documentation requirement is model version tracking. Foundation model APIs frequently change: model weights update, safety filters are adjusted, and API deprecations force version migrations. Each of these can change your system's behaviour without any action on your part.

Your technical documentation must address this:

# gpai_dependency_manifest.yaml
# Include this in your Annex IV technical documentation package

system_identifier: "candidate-screening-v2.3"
gpai_component:
  provider: "OpenAI"
  model_family: "GPT-4"
  pinned_version: "gpt-4-0125-preview"  # hard-pin to specific deployment
  api_endpoint: "https://api.openai.com/v1/chat/completions"
  accessed_via: "API (third-party hosted)"
  access_date_for_conformity_assessment: "2026-06-10"
  
conformity_assessment_scope: >
  Conformity assessment covers the system as configured with the pinned model version.
  Migration to a different model version constitutes a system change that must be
  reviewed as a potential substantial modification under Art.43(4).

version_change_protocol:
  detection: "Automated API response signature monitoring"
  notification_threshold: "Statistically significant accuracy delta on evaluation set"
  escalation: "Halt new deployments, notify NCA if material change detected"

The practical implication: your system should hard-pin to a specific model API version wherever the GPAI provider allows it. Document the pinned version in your technical file. When you migrate to a new version, document whether that constitutes a "substantial modification", which under Art.43(4) requires a new conformity assessment.
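
Pinning in configuration is only half of the control; the other half is checking that each response was actually served by the pinned version. The sketch below assumes the provider reports the serving model in a top-level `model` field of the JSON response (OpenAI's chat completions responses include one). The function and exception names are hypothetical, and no network call is made.

```python
from typing import Any, Dict

PINNED_MODEL = "gpt-4-0125-preview"  # must match the dependency manifest


class ModelVersionMismatch(RuntimeError):
    """Raised when the response was served by a model other than the pinned one."""


def verify_pinned_model(response: Dict[str, Any], pinned: str = PINNED_MODEL) -> str:
    """Return the served model identifier, or raise if it differs from the pin.

    Assumes the provider reports the serving model in a top-level 'model'
    field; a missing field is treated as a mismatch.
    """
    served = response.get("model")
    if served != pinned:
        raise ModelVersionMismatch(f"expected {pinned!r}, provider reported {served!r}")
    return served


# Usage with hand-written response dicts (no API call is made here).
ok = verify_pinned_model({"model": "gpt-4-0125-preview", "choices": []})

try:
    verify_pinned_model({"model": "gpt-4-turbo-2024-04-09", "choices": []})
    mismatch_caught = False
except ModelVersionMismatch:
    mismatch_caught = True
```

A matching identifier does not prove that provider-side behaviour is unchanged, so a check like this complements behavioural monitoring rather than replacing it.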

Art.16(f) + Art.43: Conformity Assessment Without Model Weight Access

Art.16(f) requires providers to ensure that the high-risk AI system undergoes the relevant conformity assessment procedure referred to in Article 43 before it is placed on the market or put into service.

For most high-risk AI systems not requiring a notified body, this means the internal control procedure under Annex VI. You self-certify that the system meets Art.9–15 requirements by conducting your own technical assessment and producing documentation.

For GPAI-dependent systems listed in Annex III, internal control is almost always the applicable route: Art.43(2) assigns the Annex VI procedure, which involves no notified body, to systems under Annex III points 2 to 8. Only biometric systems under Annex III point 1 can require the Annex VII procedure with notified body involvement, and building on a third-party API does not change which route applies.

What Art.43 Internal Control Looks Like for GPAI-Dependent Systems

The internal control procedure (Annex VI) requires you to:

Verify that your technical documentation under Annex IV is complete — including the GPAI dependency manifest above
Conduct technical testing to verify compliance with Art.9–15
Issue an EU declaration of conformity under Art.47
Affix CE marking under Art.48

The critical question: how do you verify compliance with Art.15 (accuracy, robustness, cybersecurity) when you cannot inspect the model?

The answer: black-box behavioral testing.

You do not need to understand the model's weights to verify that the system performs within your stated accuracy bounds, exhibits consistent behavior across protected demographic groups, and degrades gracefully under adversarial inputs.

# conformity_test_suite.py
# Black-box conformity testing for high-risk AI using GPAI API

from typing import Dict, List

class ConformityTestSuite:
    """
    EU AI Act Art.43 conformity assessment via black-box behavioral testing.
    Tests the system as a whole — input/output boundary — without requiring
    model weight access.
    """
    
    def __init__(self, system_client, evaluation_dataset: List[Dict]):
        self.client = system_client
        self.eval_data = evaluation_dataset
        self.results = []
    
    def test_accuracy_bounds(self, min_accuracy: float = 0.85) -> bool:
        """Art.15(1): Verify accuracy thresholds across evaluation dataset."""
        total = len(self.eval_data)
        if total == 0:
            raise ValueError("evaluation dataset is empty; accuracy is undefined")
        correct = 0
        
        for item in self.eval_data:
            prediction = self.client.predict(item["input"])
            if prediction["decision"] == item["ground_truth_label"]:
                correct += 1
        
        accuracy = correct / total
        self.results.append({
            "test": "accuracy_bounds",
            "accuracy": accuracy,
            "passed": accuracy >= min_accuracy,
            "requirement": f"Art.15(1): min {min_accuracy:.0%}"
        })
        return accuracy >= min_accuracy
    
    def test_demographic_consistency(self) -> bool:
        """Art.15(1) + Art.10: Verify consistent performance across demographic groups."""
        group_accuracies = {}
        
        for item in self.eval_data:
            group = item.get("demographic_group")
            if group not in group_accuracies:
                group_accuracies[group] = {"correct": 0, "total": 0}
            
            prediction = self.client.predict(item["input"])
            group_accuracies[group]["total"] += 1
            if prediction["decision"] == item["ground_truth_label"]:
                group_accuracies[group]["correct"] += 1
        
        accuracy_by_group = {
            g: v["correct"] / v["total"]
            for g, v in group_accuracies.items()
            if v["total"] > 0
        }
        
        max_disparity = max(accuracy_by_group.values()) - min(accuracy_by_group.values())
        passed = max_disparity < 0.05  # 5pp max disparity threshold
        
        self.results.append({
            "test": "demographic_consistency",
            "accuracy_by_group": accuracy_by_group,
            "max_disparity": max_disparity,
            "passed": passed,
            "requirement": "Art.15(1) + Art.10: max 5pp demographic accuracy gap"
        })
        return passed
    
    def test_adversarial_robustness(self) -> bool:
        """Art.15(4)-(5): Verify system robustness under adversarial inputs."""
        adversarial_cases = [
            item for item in self.eval_data
            if item.get("type") == "adversarial"
        ]

        if not adversarial_cases:
            # Record the gap instead of passing silently: an untested
            # requirement should not count as a verified one.
            self.results.append({
                "test": "adversarial_robustness",
                "robustness_rate": None,
                "passed": False,
                "requirement": "Art.15(4)-(5): no adversarial test cases defined"
            })
            return False

        safe_decisions = 0
        for item in adversarial_cases:
            prediction = self.client.predict(item["input"])
            # System should either refuse or return the expected safe output
            if prediction.get("refused") or prediction.get("decision") == item.get("safe_output"):
                safe_decisions += 1

        robustness_rate = safe_decisions / len(adversarial_cases)
        passed = robustness_rate >= 0.95

        self.results.append({
            "test": "adversarial_robustness",
            "robustness_rate": robustness_rate,
            "passed": passed,
            "requirement": "Art.15(4)-(5): ≥95% safe response on adversarial inputs"
        })
        return passed
    
    def generate_conformity_assessment_report(self) -> Dict:
        """Generate Art.43 conformity assessment report for technical documentation."""
        # all() is True for an empty list, so require at least one recorded test
        all_passed = bool(self.results) and all(r["passed"] for r in self.results)
        
        return {
            "assessment_date": "2026-06-10",
            "system_identifier": "candidate-screening-v2.3",
            "gpai_model_version": "gpt-4-0125-preview",
            "assessment_method": "Internal control (Annex VI) — black-box behavioral testing",
            "tests_conducted": self.results,
            "overall_result": "PASS" if all_passed else "FAIL",
            "eu_doc_ready": all_passed,
            "notes": [
                "Assessment conducted at system boundary; model internals not inspectable.",
                "Version-pinned to gpt-4-0125-preview. Version change triggers reassessment.",
                "Evaluation dataset: 2,000 ground-truth labelled cases, demographic-stratified."
            ]
        }
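
A point estimate such as "87% accurate" says nothing about sampling uncertainty, which matters when the evaluation set is finite. One option, sketched below, is to compare the lower end of a Wilson score interval against the stated accuracy bound instead of the raw accuracy. This is an illustrative addition rather than something the EU AI Act prescribes; the function name and the 95% confidence level (`z = 1.96`) are assumptions.

```python
import math


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    denom = 1 + z**2 / total
    centre = p + z**2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return (centre - margin) / denom


# 1,710 correct out of 2,000 clears an 85% bound as a point estimate,
# but the interval's lower end does not.
point_estimate = 1710 / 2000
lower = wilson_lower_bound(1710, 2000)
print(f"point estimate {point_estimate:.3f}, lower bound {lower:.3f}")
```

Reporting both numbers in the conformity assessment report makes it clear how much headroom the measured accuracy actually has over the declared threshold.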

Annex VI — The Internal Control Checklist for GPAI-Dependent Systems

The internal control procedure requires you to verify each Art.9–15 requirement. Here is the assessment approach for GPAI-dependent architectures:

| Requirement | Verifiable without model access? | Assessment method |
|-------------|----------------------------------|-------------------|
| Art.9: Risk management | Yes | Document risks in your RMS; include "GPAI model change" as a risk with mitigation |
| Art.10: Data governance | Partial | You control inputs/outputs; document what training data you used for any fine-tuning layer; for base model, cite provider's published data card |
| Art.11: Technical documentation | Yes | Produce Annex IV technical file including GPAI dependency manifest |
| Art.12: Record-keeping | Yes | Implement logging of all inputs and outputs at API boundary |
| Art.13: Transparency to deployers | Yes | Produce instructions including model dependency disclosure |
| Art.14: Human oversight | Yes | Design and document human-in-the-loop mechanisms |
| Art.15: Accuracy/robustness | Yes (behavioral) | Black-box testing suite as shown above |

Art.16(a) + Art.15: Maintaining Compliance When the Foundation Model Changes

The most operationally complex Art.16 obligation for GPAI-dependent systems is Art.16(a): providers must ensure that their high-risk AI systems comply with the requirements set out in Section 2 of Chapter III (Art.8–15). The challenge: compliance is relative to a system behaviour that can change without your action.

When OpenAI or Anthropic updates a model, your system may produce different outputs on the same inputs. If you have passed conformity assessment with the old model behaviour, you may no longer be in conformity after the update.

Three strategies for maintaining Art.16(a) compliance:

Strategy 1 — Hard version pinning. Most GPAI providers offer version-pinned API endpoints (e.g., gpt-4-0125-preview rather than gpt-4). Pin your production system to a specific version. Treat version migration as a system change and assess whether it is a substantial modification requiring a new conformity assessment under Art.43(4).

Strategy 2 — Automated behavioral drift detection. Continuously compare system outputs against a held-out reference set. Statistical deviation from reference behaviour triggers an alert and potential compliance review.

# gpai_drift_monitor.py
# Art.72 post-market monitoring for GPAI-dependent high-risk AI systems

import json
from datetime import datetime
from typing import Dict

class GPAIDriftMonitor:
    """
    Monitors GPAI-dependent high-risk AI for Art.15 compliance drift.
    Runs continuously as part of Art.72 post-market monitoring plan.
    """
    
    def __init__(self, reference_set_path: str, drift_threshold: float = 0.03):
        with open(reference_set_path) as f:
            self.reference_set = json.load(f)
        self.drift_threshold = drift_threshold  # 3pp accuracy delta triggers review
        self.reference_accuracy = self._compute_baseline_accuracy()
    
    def _compute_baseline_accuracy(self) -> float:
        """Baseline accuracy from conformity assessment run."""
        return self.reference_set["conformity_assessment"]["accuracy"]
    
    def check_current_accuracy(self, current_system_client) -> Dict:
        """
        Run reference set through current system and compare against baseline.
        If delta exceeds threshold, flag for corrective action review (Art.16(j), Art.20).
        """
        correct = 0
        total = len(self.reference_set["test_cases"])
        
        for case in self.reference_set["test_cases"]:
            result = current_system_client.predict(case["input"])
            if result["decision"] == case["expected_output"]:
                correct += 1
        
        current_accuracy = correct / total
        delta = abs(current_accuracy - self.reference_accuracy)
        
        status = {
            "timestamp": datetime.utcnow().isoformat(),
            "reference_accuracy": self.reference_accuracy,
            "current_accuracy": current_accuracy,
            "delta": delta,
            "drift_detected": delta > self.drift_threshold,
            "action_required": delta > self.drift_threshold
        }
        
        if status["drift_detected"]:
            # Art.16(j) + Art.20: Provider must take corrective action when non-conformity detected
            status["recommended_action"] = (
                "Accuracy drift exceeds threshold. "
                "Review GPAI model version for unexpected update. "
                "Consider: (1) re-assess conformity, (2) pin to previous version, "
                "(3) notify deployers per Art.20 if material non-conformity confirmed."
            )
        
        return status
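
A fixed percentage-point threshold does not account for the size of the reference set: the same 3pp gap can be strong evidence on 2,000 cases and noise on 200. The sketch below shows one way to attach a significance test to the comparison, using a pooled two-proportion z-test built on the standard library. It is an illustrative assumption rather than part of the monitor above, and it treats the two runs as independent samples; when the same reference cases are replayed, a paired test such as McNemar's may be more appropriate.

```python
import math


def two_proportion_p_value(correct_a: int, n_a: int, correct_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both runs share the same underlying accuracy."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both runs all-correct or all-wrong: no evidence of a difference
    z = (p_a - p_b) / se
    # Standard normal two-sided tail via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# The same 3pp drop (87% -> 84%) on two reference-set sizes.
p_large = two_proportion_p_value(1740, 2000, 1680, 2000)
p_small = two_proportion_p_value(174, 200, 168, 200)
print(f"n=2000: p={p_large:.4f}   n=200: p={p_small:.4f}")
```

Recording the p-value next to the raw delta gives the compliance review a basis for deciding whether a drift alert reflects a real behaviour change.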

Strategy 3 — Contractual GPAI provider obligations. Your technical documentation should note that you rely on GPAI provider change management. In practice, include a requirement in your vendor agreements (to the extent GPAI providers allow) to receive advance notice of model changes that may affect output quality. Document this as a risk control in your Art.9 risk management system.

Art.16(e) + Art.19: Automatic Record-Keeping for GPAI-Dependent Systems

Art.16(e) requires providers to keep the logs automatically generated by their high-risk AI systems, as referred to in Art.19, when those logs are under their control.

For GPAI-dependent systems, what is "under your control" is the boundary between your application and the GPAI API. You can and must log:

The input your system constructed (including the prompt sent to the GPAI API)
The GPAI model response
Your system's post-processing logic that converted the GPAI response into a decision
The final decision output
The model version identifier
Timestamps and request identifiers

What is not under your control: internal model activations, attention patterns, or intermediate computations. These are inside the GPAI provider's infrastructure. Your Art.19 logging obligation covers what passes through your system's boundaries.

# art19_gpai_logger.py
# Art.12/19 compliant logging for GPAI-dependent high-risk AI systems

import json
import hashlib
from datetime import datetime
from typing import Any, Dict

class Art19GPAILogger:
    """
    Automatically generates Art.12/19 compliant logs for GPAI-dependent systems.
    Logs are stored under provider control as required by Art.16(e).
    """
    
    def __init__(self, storage_backend, gpai_model_version: str, system_version: str):
        self.storage = storage_backend
        self.gpai_version = gpai_model_version
        self.system_version = system_version
    
    def log_decision(
        self,
        request_id: str,
        raw_input: Dict[str, Any],
        prompt_sent: str,
        gpai_response: str,
        postprocessed_decision: str,
        deployer_id: str,
        subject_id_hash: str  # pseudonymised, not raw subject identifier
    ) -> str:
        """
        Create Art.19 compliant log entry.
        Returns log entry ID for traceability.
        """
        log_entry = {
            "log_format_version": "1.0",
            "regulation": "EU AI Act Art.12/19",
            "timestamp_utc": datetime.utcnow().isoformat(),
            "request_id": request_id,
            "system_identifier": self.system_version,
            "gpai_model_version": self.gpai_version,
            "deployer_id": deployer_id,
            "subject_id_hash": subject_id_hash,
            
            # Logging the input constructed by the system
            "input_summary": {
                "fields_processed": list(raw_input.keys()),
                "input_hash": hashlib.sha256(
                    json.dumps(raw_input, sort_keys=True).encode()
                ).hexdigest()
            },
            
            # The prompt sent to the GPAI model (under provider control)
            "prompt_sent_to_gpai": prompt_sent,
            
            # The GPAI model response (under provider control once received)
            "gpai_response_received": gpai_response,
            
            # Post-processing that converted GPAI response to decision
            "decision_output": postprocessed_decision,
            
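            # NCA inspection fields (Art.21). Art.19(1) sets a minimum log
            # retention of six months; 10 years is a conservative choice that
            # mirrors the Art.18 documentation retention period.
            "nca_accessible": True,
            "retention_period_years": 10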
        }
        
        entry_id = f"log-{request_id}-{datetime.utcnow().strftime('%Y%m%d')}"
        self.storage.write(entry_id, log_entry)
        return entry_id
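
The logger above leaves `storage_backend` unspecified beyond its `write(entry_id, log_entry)` call. As a minimal sketch of what could sit behind that call, the class below appends each entry as one JSON line to a local file. The class name, the `read_all` helper, and the file layout are hypothetical; a production backend would also need access control, rotation, and retention handling.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


class JsonlLogStorage:
    """Append-only JSON Lines store exposing a write(entry_id, entry) method."""

    def __init__(self, path: str):
        self.path = path

    def write(self, entry_id: str, entry: Dict[str, Any]) -> None:
        record = {"entry_id": entry_id, **entry}
        # One JSON object per line; append mode never rewrites earlier entries.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the entry to disk before returning

    def read_all(self) -> List[Dict[str, Any]]:
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


# Usage against a temporary file.
log_path = os.path.join(tempfile.mkdtemp(), "decisions.jsonl")
store = JsonlLogStorage(log_path)
store.write("log-req-1-20260610", {"gpai_model_version": "gpt-4-0125-preview"})
store.write("log-req-2-20260610", {"gpai_model_version": "gpt-4-0125-preview"})
entries = store.read_all()
```

An append-only line format keeps each decision record independent, which makes it straightforward to export a date range when an authority asks for logs.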

Storage location and jurisdiction. Your Art.19 logs must be accessible to national competent authorities (Art.21) and therefore should reside in EU jurisdiction. When your GPAI API calls traverse US-hosted infrastructure, the logs your system generates on the EU side still fall under your Art.16(e) control. Ensure your log storage backend is EU-hosted and not itself covered by the US CLOUD Act (i.e., avoid S3 buckets in your AWS account if the CLOUD Act could reach them).

The Art.25 Connection: Documentation of GPAI Responsibilities

Art.25 governs "responsibilities along the AI value chain." For your situation as a high-risk AI system provider building on a GPAI model, Art.25 establishes that you bear full Art.16 obligations — the GPAI model provider's compliance with Art.53 does not substitute for your Art.16 compliance.

Your technical documentation should include a section that explicitly documents this responsibility allocation:

## GPAI Component — Responsibility Documentation (Art.25)

| Obligation | Responsible Party | Reference |
|------------|------------------|-----------|
| GPAI model technical documentation (model cards, training data) | Foundation model provider (OpenAI / Anthropic) | Art.53 |
| High-risk AI system technical documentation (Annex IV) | [Your Company Name] | Art.11, Art.16(d), Art.18 |
| Conformity assessment of high-risk AI system | [Your Company Name] | Art.16(f), Art.43 |
| EU declaration of conformity | [Your Company Name] | Art.47 |
| CE marking | [Your Company Name] | Art.48 |
| Registration in EU AI Act database | [Your Company Name] | Art.49 |
| Post-market monitoring | [Your Company Name] | Art.72 |
| Incident reporting | [Your Company Name] | Art.73 |

Foundation model provider compliance with their Art.53 obligations is verified
annually via published transparency reports. No delegation of Art.16 obligations
to the foundation model provider is claimed or possible.

This documentation section demonstrates to NCAs that you have correctly understood the responsibility allocation and are not attempting to pass compliance obligations to the GPAI model provider.

Infrastructure Note: EU Jurisdiction for GPAI API Calls

One compliance consideration that sits at the intersection of the EU AI Act and data protection law: when your high-risk AI system processes personal data (as most employment or credit-scoring applications do), the data sent to the GPAI API endpoint may constitute a personal data transfer.

If you send EU data subjects' CVs to api.openai.com (US-hosted), you are making a third-country transfer under GDPR Chapter V, which needs a valid transfer mechanism such as an adequacy decision (Art.45) or appropriate safeguards (Art.46). The additional risk: under US CLOUD Act provisions, that data may be accessible to US law enforcement even if processed in EU infrastructure. Your Art.9 risk management system should document this transfer and its mitigation.

Hosting your system on EU infrastructure (sota.io, Hetzner Germany) mitigates the infrastructure risk but not the API call transfer. For full CLOUD Act mitigation, consider:

GPAI providers with EU-hosted API endpoints (available for some providers in regulated sectors)
Self-hosted open-weight models (Llama, Mistral) on EU infrastructure where accuracy permits
Data pre-processing to pseudonymise personal data before sending to GPAI APIs
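
The pseudonymisation option can be sketched with the standard library alone. The example below uses a keyed HMAC-SHA256 hash for subject identifiers and two simple regular expressions to mask e-mail addresses and phone-like digit runs before text leaves your system. The function names, patterns, and key handling are illustrative assumptions; the patterns will miss names and other free-text identifiers, and pseudonymised data remains personal data under GDPR.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns for illustration; they are not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pseudonymise_id(subject_id: str, secret_key: bytes) -> str:
    """Keyed hash: the same subject maps to the same token without exposing the ID."""
    return hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_contact_details(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


key = b"replace-with-a-secret-from-your-key-store"  # placeholder, not a real key
cv_text = "Jane Doe, jane.doe@example.com, +49 30 1234567. Five years in logistics."
safe_text = redact_contact_details(cv_text)
token = pseudonymise_id("candidate-8841", key)
print(safe_text)
```

The candidate's name still appears in the redacted text, which shows why regex masking alone is rarely sufficient for CVs; a keyed hash, unlike a plain SHA-256 digest, also prevents anyone without the key from recomputing tokens from guessed identifiers.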

7-Item Checklist: Art.16 Compliance for GPAI-Dependent High-Risk AI

Before placing your GPAI-dependent high-risk AI system on the market, or before the August 2, 2026 deadline, whichever comes first:

## Art.16 GPAI Dependency Compliance Checklist

### Technical Documentation (Art.16(d), Art.11, Annex IV)
- [ ] Annex IV technical file completed including GPAI dependency manifest
- [ ] GPAI model version pinned and documented
- [ ] Responsibility allocation table (Art.25) included in technical file

### Risk Management (Art.16(a), Art.9)
- [ ] Art.9 RMS includes "GPAI model change" as identified risk with mitigation
- [ ] Version migration procedure documented

### Conformity Assessment (Art.16(f), Art.43)
- [ ] Black-box behavioral test suite executed against pinned model version
- [ ] All Art.15 tests passed: accuracy bounds, demographic consistency, adversarial robustness
- [ ] Internal control (Annex VI) conformity assessment report generated
- [ ] EU declaration of conformity (Art.47) signed
- [ ] CE marking (Art.48) affixed

### Record-Keeping (Art.16(e), Art.19)
- [ ] Art.19 logging implemented at GPAI API boundary
- [ ] Logs include model version identifier per request
- [ ] Log storage in EU jurisdiction verified
- [ ] Log retention configured (Art.19 minimum: six months; 10 years if aligned with Art.18 documentation retention)

### Post-Market Monitoring (Art.72)
- [ ] GPAI drift monitor deployed and alert thresholds configured
- [ ] Model version change detection active
- [ ] Quarterly accuracy review scheduled

### Registration (Art.16(i), Art.49)
- [ ] System registered in EU AI Act database before market placement
- [ ] Registration includes GPAI component disclosure

### Deployer Information (Art.16 / Art.13)
- [ ] Art.13 instructions-for-use document prepared for B2B deployers
- [ ] GPAI dependency, model version, and accuracy metrics disclosed to deployers

Key Takeaways

The GPAI dependency gap in Art.16 compliance is real, but manageable. The fundamental principle is that your obligations run to the system you place on the market, not to the foundation model inside it. Foundation model providers carry their own Art.53 obligations; your Art.16 obligations are independent and cannot be delegated to them.

The practical consequence: conformity assessment for GPAI-dependent systems is behavioral and boundary-based. You test and document what the system does, not what the model computes. Version pinning, drift monitoring, and boundary logging together address the core Art.15, Art.12, and Art.72 requirements even without model weight access.

With August 2, 2026 now under 60 days away, providers of GPAI-dependent high-risk AI systems should prioritize three actions: (1) completing the Annex IV technical file with a GPAI dependency manifest, (2) running and documenting a black-box behavioral conformity assessment, and (3) deploying boundary logging that includes model version tracking. Together these cover the most critical Art.16 obligations before the deadline.
