2026-06-03·5 min read·sota.io Team

EU AI Act NCA Inspection: What Compliance Testing Evidence Inspectors Actually Examine (Art.15 + Annex IV)

Post #1480 in the sota.io EU AI Act Audit Readiness Series

When a National Competent Authority (NCA) arrives to inspect your high-risk AI system, they do not start with your risk management documentation or your EU Declaration of Conformity. They start with your testing records.

This surprises many providers. But it makes sense from an inspector's perspective: testing records are objective, timestamped, and hard to fake retroactively. If your accuracy benchmarks, robustness tests, and bias audits were done properly, they leave an audit trail. If they were not done — or were done without rigor — that gap becomes visible within the first hour of inspection.

This post covers what EU AI Act Article 15 and Annex IV require in terms of testing and validation evidence, what NCA inspectors specifically look for, and how to structure your test documentation to survive inspection.


Why Testing Evidence Is the NCA's First Priority

Under Article 74 of the EU AI Act, NCAs have broad powers to inspect high-risk AI systems placed on the EU market. They can request access to technical documentation, source code, training datasets, and — critically — the testing and validation records that form the evidentiary basis for your conformity claim.

Article 15 establishes the substantive requirements: high-risk AI systems must achieve appropriate levels of accuracy, robustness, and cybersecurity. But the proof that you met those levels lives in your testing documentation.

NCAs look at testing evidence first because it answers the threshold question: was this system actually validated before deployment? Everything else — risk management, human oversight, post-market monitoring — is downstream of that answer.


What Article 15 Requires (and What Inspectors Check)

Article 15 imposes three categories of technical requirements on high-risk AI systems:

1. Accuracy

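Your system must achieve "appropriate" accuracy for its intended purpose. The Regulation does not prescribe specific accuracy thresholds, which would be impossible given the diversity of use cases, but Article 15(3) does require you to declare the system's accuracy levels and the relevant accuracy metrics in the accompanying instructions for use.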

What inspectors flag: accuracy metrics defined only on training data; no held-out test set; performance not broken down by population subgroups; thresholds undefined or defined post-hoc.
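
As a rough illustration of the points inspectors flag, the sketch below computes accuracy on a held-out test set, broken down by subgroup, against a threshold fixed before evaluation. The threshold value, subgroup names, and records are made-up placeholders, not values taken from the Regulation.

```python
from collections import defaultdict

# Hypothetical threshold, fixed BEFORE evaluation and recorded in the test plan.
ACCURACY_THRESHOLD = 0.90


def accuracy_by_subgroup(records):
    """Return {subgroup: accuracy} for held-out records.

    Each record is a (subgroup, y_true, y_pred) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subgroup, y_true, y_pred in records:
        total[subgroup] += 1
        correct[subgroup] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def subgroups_below_threshold(records, threshold=ACCURACY_THRESHOLD):
    """List subgroups whose held-out accuracy falls below the threshold."""
    scores = accuracy_by_subgroup(records)
    return sorted(group for group, acc in scores.items() if acc < threshold)


# Illustrative held-out records (invented data, not real results).
held_out = [
    ("age_18_40", 1, 1), ("age_18_40", 0, 0),
    ("age_18_40", 1, 1), ("age_18_40", 0, 0),
    ("age_over_40", 1, 0), ("age_over_40", 0, 0),
    ("age_over_40", 1, 1), ("age_over_40", 0, 0),
]

per_group = accuracy_by_subgroup(held_out)
failing = subgroups_below_threshold(held_out)
```

Keeping the threshold as a named constant that predates the evaluation run is one simple way to show it was not chosen after the results were known.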

2. Robustness

Article 15 requires high-risk AI systems to be resilient against errors, faults, and inconsistencies — whether arising from within the system or from intentional adversarial manipulation.

Your robustness testing documentation should cover:

What inspectors flag: robustness testing limited to happy-path scenarios; no adversarial input testing; failure modes identified but not documented; consistency testing absent.
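
One way to make consistency testing concrete is to measure how often a prediction stays the same when the input is perturbed slightly. The sketch below assumes a toy stand-in model and an arbitrary noise level; a real test would use your own model, perturbations that make sense for your data, and a documented pass criterion.

```python
import random


def toy_model(features):
    """Stand-in classifier (hypothetical): positive when the feature sum exceeds 1.0."""
    return int(sum(features) > 1.0)


def consistency_rate(model, inputs, noise=0.01, trials=20, seed=0):
    """Fraction of inputs whose prediction is unchanged under small random noise."""
    rng = random.Random(seed)  # fixed seed so the test run is reproducible
    stable = 0
    for x in inputs:
        baseline = model(x)
        perturbed = [
            model([value + rng.uniform(-noise, noise) for value in x])
            for _ in range(trials)
        ]
        stable += all(prediction == baseline for prediction in perturbed)
    return stable / len(inputs)


# Inputs far from the decision boundary; noise of +/-0.01 cannot flip them.
clear_cases = [[0.2, 0.3], [0.9, 0.9]]
clear_rate = consistency_rate(toy_model, clear_cases)

# An input sitting exactly on the boundary is the kind of edge case worth
# recording separately in the failure modes register.
boundary_case = [[0.5, 0.5]]
boundary_rate = consistency_rate(toy_model, boundary_case)
```

Recording the seed, noise level, and trial count alongside the result keeps the run reproducible if an inspector asks how the figure was obtained.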

3. Cybersecurity

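Article 15 explicitly extends to the cybersecurity of the AI system itself, not just the infrastructure hosting it. Under Article 15(5), this means addressing AI-specific vulnerabilities such as data poisoning, model poisoning, adversarial examples (model evasion), and confidentiality attacks, in addition to conventional infrastructure threats.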

What inspectors flag: cybersecurity section of testing documentation absent entirely; threat model limited to infrastructure (network, server) without AI-specific attack vectors; pre-trained components used without provenance documentation.


Annex IV: The Technical Documentation Checklist for Testing

Annex IV specifies the minimum content of the technical documentation required under Article 11. The testing-relevant points include:

Point 2(g): The validation and testing procedures used, including information about the validation and testing data and their main characteristics; the metrics used to measure accuracy, robustness, and compliance with the other requirements in Chapter III, Section 2, as well as potentially discriminatory impacts; and test logs and test reports dated and signed by the responsible persons

Point 2(h): The cybersecurity measures put in place

Point 3: Information on the monitoring, functioning, and control of the system, including its capabilities and limitations in performance and the degrees of accuracy for the specific persons or groups on which it is intended to be used

Point 4: A description of the appropriateness of the performance metrics for the specific AI system

Point 9: A description of the system in place to evaluate performance in the post-market phase under Article 72, including the post-market monitoring plan

NCAs use Annex IV as their inspection checklist. Each point above corresponds to a document (or set of documents) they will request. If a point is missing or thin, the inspection lengthens, because the inspector now needs to probe manually what the documentation should have explained.


The Five Testing Evidence Categories NCAs Examine

Based on the Article 15 requirements and Annex IV structure, NCA inspectors typically organize their review around five evidence categories:

Category 1: Pre-Deployment Validation Package

This is your core testing evidence — the results of all validation runs performed before the system was placed on the market.

Required contents:

Common gap: validation package exists but is tied to a version that is no longer deployed. NCAs expect documentation aligned to the current production version.

Category 2: Robustness and Edge Case Testing Records

This category documents testing beyond normal operating conditions.

Required contents:

Common gap: robustness testing was performed but not documented separately from general validation — inspectors cannot distinguish edge case results from standard performance metrics.

Category 3: Bias and Fairness Audit Records

For any high-risk AI system that makes decisions affecting individuals, the EU AI Act's data governance requirements (Article 10) and accuracy requirements (Article 15) together create an implicit obligation to test for bias.

Required contents:

Common gap: bias testing performed only on training data distribution, not on independent test sets; results not documented in a form that NCAs can evaluate.
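
To show what a documented fairness result on an independent test set might look like, here is a minimal sketch that computes per-group selection rates and the gap between them (often called the demographic parity difference). The Regulation does not prescribe this or any other fairness metric; the metric choice, group names, and records are assumptions for illustration.

```python
def selection_rates(records):
    """Return {group: share of positive predictions}.

    Each record is a (group, y_pred) tuple from an independent test set.
    """
    counts = {}
    for group, y_pred in records:
        positives, total = counts.get(group, (0, 0))
        counts[group] = (positives + int(y_pred == 1), total + 1)
    return {group: pos / total for group, (pos, total) in counts.items()}


def demographic_parity_gap(records):
    """Difference between the highest and lowest group selection rates."""
    rates = selection_rates(records)
    return max(rates.values()) - min(rates.values())


# Illustrative predictions on an independent test set (invented data).
independent_test_set = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

rates = selection_rates(independent_test_set)
gap = demographic_parity_gap(independent_test_set)
```

Whichever metric you pick, the audit record should state why it suits the intended purpose and what gap you decided in advance to treat as acceptable.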

Category 4: Post-Deployment Monitoring Evidence

For systems deployed and operating, Article 15 requirements do not end at launch. NCAs will ask for evidence that you are monitoring performance in production.

Required contents:

Common gap: monitoring plan exists in documentation but no evidence it is actually operating; no records of monitoring reviews; thresholds defined but alerts never triggered (suspicious if system has been live more than six months).
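
Evidence that monitoring is actually operating can be as simple as structured review records plus a check for breached thresholds and skipped review periods. The sketch below assumes monthly reviews, a hypothetical alert threshold, and invented record fields.

```python
from dataclasses import dataclass
from datetime import date

ALERT_THRESHOLD = 0.88  # hypothetical value taken from the monitoring plan


@dataclass(frozen=True)
class MonitoringReview:
    review_date: date
    model_version: str
    accuracy: float
    reviewer: str


def reviews_breaching_threshold(reviews, threshold=ALERT_THRESHOLD):
    """Return reviews whose production accuracy fell below the alert threshold."""
    return [review for review in reviews if review.accuracy < threshold]


def months_without_review(reviews, start, end):
    """Return (year, month) pairs between start and end with no recorded review."""
    covered = {(r.review_date.year, r.review_date.month) for r in reviews}
    missing = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if (year, month) not in covered:
            missing.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


# Illustrative review records (invented values).
reviews = [
    MonitoringReview(date(2026, 5, 1), "1.2.3", 0.91, "ml-quality-team"),
    MonitoringReview(date(2026, 6, 1), "1.2.3", 0.86, "ml-quality-team"),
]

breaches = reviews_breaching_threshold(reviews)
gaps = months_without_review(reviews, date(2026, 4, 1), date(2026, 6, 30))
```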

Category 5: Third-Party and Pre-Trained Component Validation

If your system uses pre-trained model weights, API-based AI services, or embedded third-party components, NCAs will ask how you validated those components for your intended use case.

Required contents:

Common gap: assumption that the third-party provider's own testing suffices — it does not. You bear conformity responsibility for the system as deployed, which means you must validate third-party components in your specific use case and document that validation.


How to Organize Testing Evidence for NCA Inspection

Structure matters as much as content. An inspector who cannot find the document they need within two minutes may record an adverse finding — not because the document does not exist, but because the documentation structure does not support audit review.

Recommended structure:

/compliance-evidence/
  /testing/
    /pre-deployment-validation/
      v1.2.3_validation_report_2026-05-15.pdf
      v1.2.3_test_set_metadata.json
      v1.2.3_subgroup_performance_breakdown.xlsx
    /robustness-testing/
      v1.2.3_adversarial_test_results_2026-05-10.pdf
      v1.2.3_failure_modes_register.xlsx
    /bias-fairness-audit/
      v1.2.3_fairness_audit_2026-05-12.pdf
      v1.2.3_subgroup_metrics.xlsx
    /post-deployment-monitoring/
      monitoring_plan_v1.2.pdf
      monitoring_review_2026-05-01.pdf
      monitoring_review_2026-06-01.pdf
    /third-party-validation/
      component_inventory.xlsx
      openai_gpt4_validation_our_usecase_2026-04-20.pdf

Version alignment: every document in /testing/ must reference the same model version currently in production. If you have updated your model since the documents were created, you need updated testing documentation for the current version.
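
A version alignment check can be automated if filenames carry a version prefix, as in the tree above. This sketch assumes the `vMAJOR.MINOR.PATCH_` naming pattern; files without that prefix, such as the monitoring plan, are skipped rather than flagged, and the stale file in the example is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches the "v1.2.3_" style prefix used in the example tree.
VERSION_PREFIX = re.compile(r"^v(\d+\.\d+\.\d+)_")


def version_mismatches(paths, production_version):
    """Return paths whose filename version prefix differs from production."""
    mismatched = []
    for path in paths:
        match = VERSION_PREFIX.match(PurePosixPath(path).name)
        if match and match.group(1) != production_version:
            mismatched.append(path)
    return mismatched


evidence_paths = [
    "pre-deployment-validation/v1.2.3_validation_report_2026-05-15.pdf",
    "robustness-testing/v1.2.2_failure_modes_register.xlsx",  # hypothetical stale file
    "post-deployment-monitoring/monitoring_plan_v1.2.pdf",  # no version prefix, skipped
]

stale = version_mismatches(evidence_paths, production_version="1.2.3")
```

Running a check like this in the release pipeline would surface stale evidence at the moment a new model version ships, rather than during an inspection.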

Timestamps and signatories: every test report should carry a timestamp and the identity of the team or individual who ran the test. Annex IV expects test logs and test reports to be dated and signed by the responsible persons, so anonymous or undated reports are unlikely to satisfy an inspector.

Immutability evidence: where possible, store testing records in a system that creates tamper-evident audit trails (content-addressed storage, signed commits, or dedicated compliance document management tools). NCAs increasingly ask for evidence that records were not modified after the fact.
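
As a minimal sketch of tamper evidence, the code below builds a SHA-256 manifest of every file under an evidence directory and later reports which files no longer match it. The file names and contents are placeholders. Note the assumption that the manifest itself is stored somewhere it cannot be silently edited (a signed commit, for example); a manifest kept next to the files proves little on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """Return files added, removed, or modified since the manifest was built."""
    current = build_manifest(root)
    names = set(current) | set(manifest)
    return sorted(n for n in names if current.get(n) != manifest.get(n))


# Demonstration in a temporary directory with a placeholder report.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "testing" / "report.txt"
    report.parent.mkdir(parents=True)
    report.write_text("accuracy=0.93\n")
    manifest = build_manifest(tmp)
    unchanged = changed_files(tmp, manifest)
    report.write_text("accuracy=0.97\n")  # simulate an after-the-fact edit
    modified = changed_files(tmp, manifest)
```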


What Triggers an Extended Inspection

An NCA inspection that goes beyond its standard duration is almost always caused by one of three documentation failures:

1. Version mismatch — testing records apply to a previous model version, not the one currently deployed. The inspector must now determine whether the untested changes affected the system's compliance posture. This requires additional evidence gathering and potentially testing the current version during the inspection.

2. Coverage gaps — one or more of the five categories above is absent. The inspector must probe deeper into other documents to reconstruct what should have been explicitly documented. This is time-consuming for both parties.

3. Internal inconsistency — performance metrics in the validation report differ from metrics cited in the risk management documentation or the technical documentation summary. Inconsistency is a red flag that documentation was assembled after the fact rather than maintained continuously.
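
Internal inconsistency is the easiest of the three to catch mechanically. The sketch below assumes you can extract the headline metrics cited in each document into a dictionary; the document names and values are invented for illustration.

```python
def inconsistent_metrics(documents, tolerance=1e-9):
    """Return {metric: {document: value}} for metrics cited with differing values.

    `documents` maps a document name to the metrics it cites.
    """
    by_metric = {}
    for doc_name, metrics in documents.items():
        for metric, value in metrics.items():
            by_metric.setdefault(metric, {})[doc_name] = value
    return {
        metric: values
        for metric, values in by_metric.items()
        if max(values.values()) - min(values.values()) > tolerance
    }


# Illustrative metrics as cited in three documents (invented values).
cited = {
    "validation_report": {"accuracy": 0.93, "false_positive_rate": 0.04},
    "risk_management_file": {"accuracy": 0.95},
    "technical_documentation_summary": {"accuracy": 0.93, "false_positive_rate": 0.04},
}

conflicts = inconsistent_metrics(cited)
```

Here the risk management file cites a different accuracy figure from the other two documents, which is exactly the kind of discrepancy an inspector would ask about.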

None of these triggers requires bad faith on the provider's part. They arise from ordinary development practices that do not account for the audit requirements of Article 15 and Annex IV. The solution is to treat testing documentation as a compliance artifact from the beginning of the project, not a deliverable assembled before an audit.


August 2, 2026: The Practical Deadline for Testing Evidence

August 2, 2026 is when the main provisions of the EU AI Act apply to high-risk AI systems under Annex III. For providers who have not yet structured their testing evidence for NCA inspection, this is the deadline for completing that work.

The practical order of operations:

  1. Inventory your current testing evidence against the five categories above. Identify gaps.
  2. Align documentation to current production version. If you have released model updates since your last validation, run and document updated tests.
  3. Create the directory structure described above and populate it with existing and new documents.
  4. Implement monitoring records if post-deployment monitoring documentation is absent.
  5. Add third-party validation records for any AI components not validated in your deployment context.
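
Step 1 of the list above can start as a simple script that reports which of the five category folders are missing or empty. The folder names follow the recommended structure earlier in this post; the temporary directory and placeholder file are for demonstration only.

```python
import tempfile
from pathlib import Path

REQUIRED_CATEGORIES = [
    "pre-deployment-validation",
    "robustness-testing",
    "bias-fairness-audit",
    "post-deployment-monitoring",
    "third-party-validation",
]


def missing_categories(testing_root):
    """Return categories whose folder is absent or contains no files."""
    root = Path(testing_root)
    missing = []
    for category in REQUIRED_CATEGORIES:
        folder = root / category
        has_files = folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))
        if not has_files:
            missing.append(category)
    return missing


# Demonstration: one populated folder, one empty folder, three absent.
with tempfile.TemporaryDirectory() as tmp:
    testing = Path(tmp) / "compliance-evidence" / "testing"
    (testing / "pre-deployment-validation").mkdir(parents=True)
    (testing / "pre-deployment-validation" / "report.pdf").write_bytes(b"placeholder")
    (testing / "robustness-testing").mkdir()
    gaps = missing_categories(testing)
```

A folder containing files is only a first signal. Whether those files are current, complete, and signed still needs a human review.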

Completing steps 1–5 before August 2, 2026 means that if an NCA initiates an inspection in the months following that date, your testing evidence is ready to produce within the inspection's initial documentation request window.


Summary: Testing Evidence Checklist

Before the August 2026 compliance deadline, verify that your testing documentation includes:

Pre-deployment validation

Robustness testing

Bias and fairness

Post-deployment monitoring

Third-party components



This is post 4 of 5 in the AUDIT-READINESS-SPRINT-2026 series. The finale (post 5) covers the complete NCA inspection readiness checklist that brings all five series posts together into a single operational self-assessment.

Deploying EU-compliant software on infrastructure that is itself outside the reach of foreign surveillance orders reduces your compliance surface area. sota.io is EU-native managed PaaS — no US parent company, no CLOUD Act exposure, hosted on Hetzner Germany.
