2026-06-10·5 min read·sota.io Team

EU AI Act Art.17 QMS Testing, Validation and Verification: Building the T&V Pipeline (2026)

Post #3 in the sota.io EU AI Act Art.17 Quality Management System Series

The previous posts in this series established the QMS framework (Art.17 foundations) and the documentation structure (Annex IV technical file). This post covers the operational heart of any QMS: what you actually test, when you test it, and how the evidence lands in your technical file.

Art.17(1)(d) is the clause that connects paper to practice. It requires "examination, test and validation procedures to be carried out before, during and after the development of the high-risk AI system, and the frequency with which they have to be carried out." This is not a documentation requirement; it is an engineering requirement. If your test procedures are not defined, versioned, and traceable to risk, a notified body will flag your QMS as incomplete.


The V&V Distinction That Matters for Art.17

Testing in software engineering is often treated as a single activity. Under Art.17, it splits into two legally distinct concepts that carry different documentation obligations.

Verification answers: Did we build the system correctly? It checks that the implementation matches the specification — unit tests, integration tests, contract tests, static analysis. Verification artifacts live in your CI pipeline and your Annex IV technical file section on design control.

Validation answers: Did we build the correct system? It checks that the system meets its intended purpose and performs as expected in deployment conditions — performance benchmarks against Art.15 requirements, acceptance tests run against real-world distributions, edge-case evaluations with domain experts. Validation artifacts connect your QMS to your Art.9 risk management system and to the Annex IV technical file section on performance metrics.

The distinction matters because notified bodies audit them separately. A QMS that collapses both into "we ran the test suite" will fail an Art.43 conformity assessment review. Your QMS documentation must explicitly label which procedures are verification procedures and which are validation procedures, and both must appear in your technical file.


The Three Testing Gates Art.17 Requires

Art.17(1)(d) specifies "before, during and after development." Each gate has a distinct scope.

Gate 1: Pre-Development — Requirements Testability Review

Before a single line of model code is written, Art.17 requires you to establish what testable criteria your system must satisfy. This is not about writing tests early — it is about ensuring your requirements are falsifiable.

What this gate produces: a requirements testability matrix and an initial test strategy document.

The Art.9 intersection: Your Art.9 risk management system identifies risks. Gate 1 must map each identified risk to at least one test procedure that would detect or bound that risk. This mapping is the primary evidence that your testing is risk-based rather than ad hoc — and it is the first thing a notified body will look for.

What goes in your Annex IV file: The requirements testability matrix and the initial test strategy document. Date-stamped and Git-committed before development begins.
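A testability matrix can be checked mechanically before development starts. The sketch below is one possible shape, not a format the Act prescribes: the `Requirement` record, its field names, and the example IDs and values are all assumptions made for illustration. It flags requirements that have no measurable pass/fail criterion and risks that no requirement's tests address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    """One row of a hypothetical requirements testability matrix."""
    req_id: str
    metric: Optional[str]            # what is measured, e.g. "recall"
    threshold: Optional[float]       # pass/fail boundary for that metric
    risk_ids: tuple[str, ...] = ()   # Art.9 risks the requirement's tests address


def not_falsifiable(requirements):
    """Return IDs of requirements that lack a metric or a threshold."""
    return [r.req_id for r in requirements if r.metric is None or r.threshold is None]


def unmapped_risks(risk_ids, requirements):
    """Return risk IDs that no requirement in the matrix is linked to."""
    covered = {rid for r in requirements for rid in r.risk_ids}
    return sorted(set(risk_ids) - covered)


# Hypothetical example data.
requirements = [
    Requirement("REQ-001", "recall", 0.95, ("RISK-001",)),
    Requirement("REQ-002", None, None, ("RISK-012",)),  # e.g. "the system should be reliable"
]
risk_register = ["RISK-001", "RISK-012", "RISK-034"]

vague = not_falsifiable(requirements)
uncovered = unmapped_risks(risk_register, requirements)
```

Here `vague` lists the requirements to rewrite before Gate 1 closes, and `uncovered` lists the risks that still need a test procedure or a documented decision.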

Gate 2: During Development — Continuous Verification

Art.17(1)(d) specifies that procedures must be carried out "during" development. This maps to your CI/CD pipeline — the automated gate that runs on every merge.

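What this gate produces: verification results from each CI run (unit, integration, and contract tests, plus static analysis), archived as QMS evidence.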

The frequency requirement: Art.17(1)(d) requires specification of "the frequency with which they have to be carried out." For CI gates, the answer is "on every merge to main." Document this explicitly in your QMS. Auditors expect to see a defined frequency, not just test results.

Storing CI artifacts as QMS evidence: Your pipeline must archive test results in a location that is not just the CI cache — cached artifacts expire. The QMS-compliant pattern is to push test reports to a versioned artifact store (S3-compatible, object storage, or your documentation system) with a reference committed to Git. The test result becomes a first-class QMS artifact with a permanent URL, not a transient CI log.
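The archiving step can be sketched in a few lines. This is a minimal illustration under stated assumptions: a local directory stands in for the versioned artifact store, and the function name `archive_report` and the JSON reference layout are hypothetical. The idea is to store the report under its SHA-256 digest and write a small reference file that you then commit to Git.

```python
import hashlib
import json
import shutil
from pathlib import Path


def archive_report(report: Path, store: Path, git_ref_file: Path) -> dict:
    """Copy a test report into a content-addressed store and write a reference file."""
    digest = hashlib.sha256(report.read_bytes()).hexdigest()

    # The stored file name is derived from the content, so identical
    # reports map to the same object and any change yields a new name.
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{digest}{report.suffix}"
    shutil.copyfile(report, target)

    # The reference file is what gets committed to Git (assumed layout).
    reference = {"report": report.name, "sha256": digest, "stored_as": target.name}
    git_ref_file.parent.mkdir(parents=True, exist_ok=True)
    git_ref_file.write_text(json.dumps(reference, indent=2) + "\n")
    return reference
```

In a real pipeline the copy would be an upload to your object storage. The committed reference file is what ties a specific commit to a specific report.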

Infrastructure jurisdiction note: If your CI/CD runs on a US-controlled cloud provider (AWS, Azure, GCP without explicit EU-only guarantees), your test artifacts and potentially your model weights are subject to the CLOUD Act. For high-risk AI systems that process personal data or operate in regulated sectors, running your CI pipeline on EU-jurisdiction infrastructure (Hetzner, OVHcloud, Exoscale) ensures your quality evidence is not subject to US law enforcement access requests without your knowledge. This is not a hypothetical — it affects the reliability of your Art.16 obligations as a provider.

Gate 3: Post-Development — Validation and Acceptance

Before deployment, Art.17 requires validation: demonstrating that the system meets its intended purpose under realistic conditions. This is the gate that directly feeds your Art.43 conformity assessment package.

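What this gate produces: an acceptance test report and quantitative results for each Art.15 dimension, generated from a release candidate.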

The Art.15 bridge: Art.15 requires that high-risk AI systems achieve "an appropriate level of accuracy, robustness and cybersecurity." These are not narrative claims; they are test results. Your Gate 3 validation must produce quantitative evidence for each Art.15 dimension, with thresholds defined in your Gate 1 testability matrix. If your post-deployment monitoring later shows the system falling below those thresholds, Art.20 requires you to take corrective action and, where the system presents a risk within the meaning of Art.79(1), to inform the market surveillance authorities.
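Comparing measured results against the Gate 1 thresholds is simple to automate. The sketch below assumes each threshold is a lower bound on a single metric. The metric names and numbers are invented for illustration, since Art.15 does not prescribe specific values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A minimum acceptable value for one metric, as fixed at Gate 1."""
    metric: str
    minimum: float


def evaluate(thresholds: list[Threshold], measured: dict[str, float]) -> dict[str, bool]:
    """Map each metric to True if it was measured and meets its minimum."""
    return {
        # A metric that was never measured counts as a failure.
        t.metric: measured.get(t.metric, float("-inf")) >= t.minimum
        for t in thresholds
    }


# Hypothetical thresholds and validation results.
thresholds = [Threshold("accuracy", 0.95), Threshold("robustness_under_noise", 0.90)]
measured = {"accuracy": 0.962, "robustness_under_noise": 0.874}

result = evaluate(thresholds, measured)
release_blocked = not all(result.values())
```

Treating a missing measurement as a failure is a deliberate choice here: an unmeasured Art.15 dimension should block the release, not pass by default.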


Building the Git-Native T&V Pipeline

A QMS-compliant T&V pipeline has three properties that distinguish it from a standard software test suite.

Traceability: Every test must trace to a requirement or a risk. Tests that exist for developer convenience but trace to nothing cannot be presented as QMS evidence. Your test framework should enforce requirement tags — either through test annotations (@requirement:REQ-007) or through a separate traceability matrix maintained in Git alongside your tests.

Immutability of evidence: Test results used for conformity assessment must be immutable. Use content-addressed storage (Git tags, artifact digests) to ensure that the test report you present to a notified body is demonstrably the one generated from the specific model version under assessment.

Separation of concerns: Development test runs and compliance test runs are different. Your developers run tests on every commit for fast feedback. Your compliance validation runs on release candidates, uses the locked test dataset defined in your Gate 1 strategy, and produces results signed by the designated responsible person. These must be separate pipelines with separate artifact stores, even if they share test code.
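For the traceability property, one lightweight option is to scan test sources for requirement tags and report files that carry none. The sketch below assumes the `@requirement:REQ-007` notation appears in a comment or docstring and checks at file granularity only. The file names and contents are hypothetical.

```python
import re

# Matches tags such as "@requirement:REQ-007" anywhere in a source file.
TAG = re.compile(r"@requirement:(REQ-\d+)")


def requirement_tags(source: str) -> set[str]:
    """Return the requirement IDs referenced in one test file's source."""
    return set(TAG.findall(source))


def untraced(test_sources: dict[str, str]) -> list[str]:
    """Return the names of test files that reference no requirement at all."""
    return sorted(name for name, src in test_sources.items() if not requirement_tags(src))


# Hypothetical test files, given as name -> source text.
sources = {
    "unit/test_input_sanitizer.py": "# @requirement:REQ-007\ndef test_rejects_nul_bytes(): ...\n",
    "unit/test_scratch.py": "def test_debug_helper(): ...\n",
}

missing = untraced(sources)
```

Files listed in `missing` can still run for developer feedback. They should just be excluded from the evidence set until they trace to a requirement or a risk.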

A minimal Git-native implementation:

repo/
├── tests/
│   ├── unit/               # Gate 2 verification
│   ├── integration/        # Gate 2 verification
│   ├── compliance/         # Gate 3 validation (separate pipeline)
│   │   ├── acceptance/
│   │   └── performance/
│   └── traceability.json   # REQ-ID → test file mappings
├── qms/
│   ├── test-strategy.md    # Gate 1 document
│   ├── requirements/
│   │   └── testability-matrix.md
│   └── results/
│       └── v1.0.0/         # Immutable release test evidence
│           ├── unit.xml
│           ├── integration.xml
│           └── compliance.xml
└── .github/
    └── workflows/
        ├── ci.yml          # Gate 2 — on every merge
        └── release.yml     # Gate 3 — on release candidate tag

The qms/results/ directory is never overwritten — only new version directories are added. This gives you an append-only evidence store in Git that a notified body can audit.
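The append-only rule can be enforced at the point where results are written. The following is a sketch under assumptions: the function name `publish_results` is hypothetical, and it only guards the working tree. Protecting Git history itself would need repository settings that are outside this example.

```python
from pathlib import Path


def publish_results(results_root: Path, version: str, reports: dict[str, bytes]) -> Path:
    """Write reports into a new version directory, refusing to overwrite an existing one."""
    target = results_root / version
    # exist_ok=False makes mkdir raise FileExistsError if this version
    # has already been published, so earlier evidence is never replaced.
    target.mkdir(parents=True, exist_ok=False)
    for name, content in reports.items():
        (target / name).write_bytes(content)
    return target
```

A second call with the same version string fails before any file is written, so a re-run has to be published under a new version directory.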


Risk-Based Test Coverage: The Art.9 Integration

Art.17 does not require 100% test coverage. It requires that your test procedures are proportionate to the risk identified in your Art.9 risk management system. This is the single most important concept for developers who feel overwhelmed by compliance testing overhead.

Risk-based testing means:

  1. High-severity, high-likelihood risks get mandatory test cases. If your Art.9 risk register identifies "model output used without human verification in a clinical decision" as a top risk, you need an explicit test procedure that validates human oversight control points — connecting to your Art.14 implementation.

  2. Low-severity risks get sampling or periodic testing. A risk like "UI renders incorrectly on certain browsers" does not require dedicated compliance test infrastructure — your standard development testing is sufficient if documented in the testability matrix as covering this risk.

  3. Residual risks after mitigation are documented, not tested to zero. If your risk management determines that a risk is acceptable after mitigation, document that decision in the risk register and note which test procedures verify that the mitigation is in place. You do not need to prove the risk never materialises — you need to prove the mitigation is effective.

The traceability matrix maps these relationships:

| Risk ID | Severity | Mitigation | Test Procedure | Test Gate |
|---|---|---|---|---|
| RISK-001 | Critical | Human override required | compliance/acceptance/test_override.py | Gate 3 |
| RISK-012 | High | Input validation | unit/test_input_sanitizer.py | Gate 2 |
| RISK-034 | Medium | Logging | integration/test_audit_trail.py | Gate 2 |
| RISK-089 | Low | Documentation | None (documented, no dedicated test) | N/A |

This table, kept in your QMS repository alongside your Art.9 risk register, is the primary evidence that your testing is compliant with Art.17's risk-proportionate approach.
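The risk-proportionate rule can also be expressed as a check over the matrix rows. In this sketch, the choice that Critical and High risks require a test procedure is an assumption for illustration. Your own QMS defines where that line sits.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed policy: these severities must have a dedicated test procedure.
MANDATORY_SEVERITIES = {"Critical", "High"}


@dataclass(frozen=True)
class MatrixRow:
    risk_id: str
    severity: str
    mitigation: str
    test_procedure: Optional[str]  # None means "documented, no dedicated test"
    gate: Optional[str]


def coverage_findings(rows):
    """Return IDs of risks whose severity demands a test but which have none."""
    return [
        r.risk_id
        for r in rows
        if r.severity in MANDATORY_SEVERITIES and not r.test_procedure
    ]


rows = [
    MatrixRow("RISK-001", "Critical", "Human override required",
              "compliance/acceptance/test_override.py", "Gate 3"),
    MatrixRow("RISK-012", "High", "Input validation",
              "unit/test_input_sanitizer.py", "Gate 2"),
    MatrixRow("RISK-034", "Medium", "Logging",
              "integration/test_audit_trail.py", "Gate 2"),
    MatrixRow("RISK-089", "Low", "Documentation", None, None),
]

findings = coverage_findings(rows)
```

With the rows from the table above, `findings` is empty: the low-severity risk without a dedicated test is allowed because it is documented.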


What Notified Bodies Actually Check

When a notified body reviews your QMS under Art.43, the T&V section review focuses on five things.

Completeness of the testability matrix. Can they trace every requirement in your technical file to at least one test procedure? Gaps here are findings — not because the untested requirement is necessarily a problem, but because it signals you have not determined whether it is a problem.

Definition of the compliance test dataset. What dataset did you use for Gate 3 validation? Is it described with sufficient specificity that the test could be reproduced? Is it separate from training data? If you cannot answer these questions, your validation is not documented sufficiently for Art.17 compliance.

Evidence that post-deployment testing is planned. Art.72 requires a post-market monitoring plan, and Art.17 requires that your QMS addresses it. Notified bodies will look for a documented procedure for ongoing validation — not just the one-time pre-deployment run.

Signature and responsibility chain. Who signed off on the Gate 3 validation? Your QMS must designate a responsible person. If test results have no named owner, they carry no accountability, which is an Art.17 gap.

CI pipeline configuration in version control. Auditors increasingly ask to see the CI configuration file that implements Gate 2. If your .github/workflows/ci.yml (or equivalent) is not version-controlled or is not committed alongside your test code, you have not documented the frequency and conditions under which verification runs — an explicit Art.17(1)(d) requirement.


The August 2026 Deadline: What to Prioritise

If you are building a high-risk AI system that must comply by August 2, 2026, and your T&V pipeline does not yet meet Art.17 standards, prioritise in this order.

Week 1: Create the requirements testability matrix. This is a spreadsheet or markdown table — it does not require new test infrastructure. It gives you a baseline of what you need to test and what you already test. Most teams find they have 70–80% of required test procedures already; they just are not documented as QMS artifacts.

Weeks 2–3: Implement the compliance test pipeline as a separate CI workflow that runs on release candidates. Even if it runs the same tests as your development pipeline initially, the separation and the artifact archiving to qms/results/ make the results presentable as conformity assessment evidence.

Week 4: Conduct a Gate 3 validation run and produce the acceptance test report. This is the primary artifact notified bodies request. Date-stamp it, sign it, commit the reference to Git.

Ongoing: Wire the CI frequency requirement into your QMS documentation. One line: "Verification procedures defined in CI workflow ci.yml run on every merge to the main branch." That sentence, in a document with a name and a date, satisfies Art.17(1)(d)'s frequency specification.


Summary

Art.17(1)(d) requires more than a test suite: it requires a documented T&V program with defined procedures, defined frequency, traceability to requirements and risks, and immutable evidence. The three testing gates divide the work: the pre-development gate fixes the testable criteria, the during-development gate carries verification, and the post-development gate carries validation. Together they feed your Annex IV technical file with the artifacts notified bodies expect.

The practical path is to formalise what you already do: name your test procedures, tag them to requirements, archive the results alongside your code, and separate your compliance runs from your development runs. Most of the compliance work is documentation discipline applied to engineering that is already happening.

The next post in this series covers Art.17 change management: what your QMS must do when a model update, a training data change, or a new deployment environment constitutes a substantial modification that triggers a new conformity assessment under Art.43(4).


sota.io is an EU-native managed PaaS: Hetzner Germany, no US parent, no CLOUD Act exposure. Deploy your high-risk AI compliance infrastructure with full EU jurisdiction certainty.
