2026-05-30·5 min read·sota.io Team

EU AI Act Art.10 Data Governance 2026: High-Risk AI Provider Guide

Post #1395 in the sota.io EU AI Act Compliance Series — EU-AI-ACT-PROVIDER-SPRINT-2026 #2/5


Every high-risk AI model your team trains runs on data. Under EU AI Act Art.10, that data is no longer just an engineering concern — it becomes a legal compliance artifact. The quality of your training sets, the governance practices around data collection, the documented examination of biases: all of this must be established, documented, and maintained before August 2, 2026.

Art.10 is one of the most operational requirements in the EU AI Act. Unlike Art.9 (which governs your risk management process) or Art.17 (which governs your quality management system), Art.10 touches every data engineer, ML engineer, and data scientist on your team. It reaches into data pipelines, annotation workflows, bias testing procedures, and dataset retention policies.

This is Post #2 in the EU AI Act Provider Sprint series. Post #1 covered Art.9 Risk Management System requirements. Post #3 covers Art.11 Technical Documentation, Post #4 covers Art.17 Quality Management System, and Post #5 delivers the Complete Provider Compliance Stack.


Who Art.10 Applies To

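Art.10 applies to providers of high-risk AI systems: companies that develop a high-risk AI system (or have one developed) and place it on the market or put it into service under their own name or trademark. The most common route into scope is Annex III of the EU AI Act, whose areas are: biometrics; critical infrastructure; education and vocational training; employment and worker management; access to essential private and public services; law enforcement; migration, asylum and border control; and administration of justice and democratic processes.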

If you are a deployer (a company that integrates a third-party high-risk AI system into your product or operations), Art.10 obligations fall primarily on the provider. You still have data responsibilities under Art.26: to the extent you control the input data, you must ensure it is relevant and sufficiently representative for the system's intended purpose. Any data you supply for customisation should also meet the provider's governance requirements.

The rest of this guide addresses providers.


The Core Art.10 Requirement: Data Governance Practices

Art.10 requires that providers subject their training, validation, and testing data sets to appropriate data governance and management practices throughout the entire lifecycle of the AI system.

This is a lifecycle obligation. Your data governance practices cannot be applied only at the moment of initial model training and then set aside. They must remain active and documented through model updates, fine-tuning runs, post-market monitoring, and decommissioning.

What "Data Governance and Management Practices" Must Cover

Art.10 specifies that these practices must address eight domains:

1. Design Choices and Data Strategy

Document the relevant design choices that shaped which data was collected and how. Why was this particular dataset chosen? What alternatives were considered? What population or domain does it represent? These choices must be traceable and available for conformity assessment.

2. Data Collection: Processes and Origin

For each dataset, document:

The "original purpose" requirement matters. If you are using personal data that was collected for one purpose (e.g., customer transactions) and repurposing it to train a high-risk model (e.g., a credit risk classifier), that repurposing must be documented and legally justified.
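One way to make the collection record concrete is a small structured entry per dataset. The sketch below is illustrative only: the class name `DatasetProvenance`, its field names, and the example values are assumptions made for this example, not fields prescribed by Art.10.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class DatasetProvenance:
    """Hypothetical per-dataset collection record (field names are assumptions)."""
    dataset_id: str
    source: str
    collection_method: str
    original_purpose: str
    training_purpose: str
    contains_personal_data: bool
    repurposing_justification: str = ""

    def is_repurposed(self) -> bool:
        # Repurposed means the training purpose differs from the collection purpose.
        return self.original_purpose.strip().lower() != self.training_purpose.strip().lower()

    def open_issues(self) -> list:
        # Flag repurposed personal data that has no written justification yet.
        issues = []
        if (self.contains_personal_data and self.is_repurposed()
                and not self.repurposing_justification.strip()):
            issues.append("repurposed personal data lacks a documented justification")
        return issues


record = DatasetProvenance(
    dataset_id="customer-transactions-v1",  # made-up example values
    source="internal payments database",
    collection_method="operational logging",
    original_purpose="processing customer transactions",
    training_purpose="training a credit risk classifier",
    contains_personal_data=True,
)

print(json.dumps(asdict(record), indent=2))
print(record.open_issues())
```

A record like this can be stored next to the dataset and versioned with it, so the repurposing question is answered once per dataset rather than reconstructed during an audit.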

3. Data Preparation Operations

Document all data preparation steps:

Each of these operations must be reproducible and documented. If you cannot explain what cleaning steps were applied to your training data, you are not Art.10 compliant.
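Reproducibility is easier to demonstrate when every preparation step writes its own audit entry. The following is a minimal sketch, assuming rows are plain JSON-serialisable dictionaries; the helper names `fingerprint` and `apply_step`, the step names, and the income cap are hypothetical.

```python
import hashlib
import json


def fingerprint(rows):
    """SHA-256 over a canonical JSON dump, so inputs and outputs can be verified later."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def apply_step(rows, name, params, fn, log):
    """Run one preparation step and append an audit entry to `log`."""
    out = fn(rows)
    log.append({
        "step": name,
        "params": params,
        "rows_in": len(rows),
        "rows_out": len(out),
        "input_sha256": fingerprint(rows),
        "output_sha256": fingerprint(out),
    })
    return out


# Made-up example data.
raw = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": None},
    {"id": 3, "income": 1_000_000_000},
]

prep_log = []
rows = apply_step(
    raw, "drop_missing_income", {},
    lambda rs: [r for r in rs if r["income"] is not None],
    prep_log,
)
rows = apply_step(
    rows, "cap_income", {"max": 500000},
    lambda rs: [{**r, "income": min(r["income"], 500000)} for r in rs],
    prep_log,
)

print(json.dumps(prep_log, indent=2))
```

Because each entry records the hash of its input and output, an auditor can check that the output of one step is exactly the input of the next and that nothing was changed outside the logged pipeline.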

4. Assumptions: What the Data Is Supposed to Measure

Formulate and document the relevant assumptions underlying your data:

This is particularly important for automated decision systems. A model trained to predict "loan default" using historical data that reflects discriminatory lending practices encodes those practices into its predictions. Art.10 requires you to examine and document this explicitly — not to guarantee bias-free models, but to demonstrate that you have examined the assumptions and taken appropriate measures.

5. Availability, Quantity, and Suitability Assessment

Conduct and document an assessment of whether your datasets are:

For high-risk AI systems, "suitable" includes demographic and geographic representativeness. A medical diagnostic AI trained entirely on data from Western European male patients aged 40–65 is not suitable for deployment to a general population — this inadequacy must be identified and documented.
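A representativeness check of this kind can be sketched as a comparison between the dataset's group shares and a reference population. Everything specific here is an assumption: the function name `representation_gaps`, the 5 percentage point tolerance, and the reference shares are placeholders, and Art.10 does not set a numeric threshold.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Return groups whose dataset share differs from the reference share
    by more than `tolerance` (absolute difference)."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to assess")
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps


# Made-up dataset: 80 male and 20 female patient records.
records = [{"sex": "male"}] * 80 + [{"sex": "female"}] * 20
# Placeholder reference shares for the intended deployment population.
reference = {"male": 0.49, "female": 0.51}

gaps = representation_gaps(records, "sex", reference)
print(gaps)
```

Any group returned by the function is a candidate entry for the documented list of dataset limitations.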

6. Bias Examination

This is the most operationally demanding Art.10 requirement for most providers.

Art.10 requires that you examine your datasets for possible biases that are likely to affect health and safety, have an adverse impact on fundamental rights, or lead to discrimination prohibited under EU law.

The examination must be active and documented, not a passing acknowledgement that bias may exist. It must include:

For models that affect fundamental rights — employment decisions, credit, law enforcement, education — this bias examination is not optional and must be demonstrably thorough.
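As one possible starting point for a documented examination, the sketch below compares the rate of positive labels across groups in a training set. The function names, the `group` and `approved` keys, and the data are hypothetical, and the ratio is only one of many possible indicators; Art.10 does not prescribe a particular metric or threshold.

```python
from collections import defaultdict


def positive_rates(records, group_key, label_key):
    """Share of positive labels per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        positives[r[group_key]] += 1 if r[label_key] else 0
    return {g: positives[g] / totals[g] for g in totals}


def rate_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Made-up historical decisions: 6 of 10 approved in group A, 3 of 10 in group B.
records = (
    [{"group": "A", "approved": 1}] * 6 + [{"group": "A", "approved": 0}] * 4
    + [{"group": "B", "approved": 1}] * 3 + [{"group": "B", "approved": 0}] * 7
)

rates = positive_rates(records, "group", "approved")
ratio = rate_ratio(rates)
print(rates, ratio)
```

A low ratio does not prove unlawful discrimination, and a high one does not rule it out; the point is that the figure, the groups examined, and the conclusion drawn are written down.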

7. Bias Detection and Mitigation Measures

For each identified bias, document the measures applied:

Note that Art.10 does not require zero bias; it requires that you identify, assess, and appropriately address bias. A well-documented process that identifies bias, applies reasonable mitigation measures, and explicitly accepts and documents the residual risk is Art.10 compliant. An undocumented process that happens to produce a "fair" model is not.
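To illustrate what a documented mitigation measure can look like, here is a sketch of reweighing, a well-known pre-processing technique that assigns each sample the weight P(group) × P(label) / P(group, label) so that group and label become statistically independent in the weighted data. It is an example chosen for this guide, not a technique Art.10 names, and the function names and data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-sample weights: P(group) * P(label) / P(group, label)."""
    n = len(groups)
    g_counts = Counter(groups)
    y_counts = Counter(labels)
    gy_counts = Counter(zip(groups, labels))
    return [
        (g_counts[g] * y_counts[y]) / (n * gy_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den


# Made-up data: group A has 3 of 4 positives, group B has 1 of 4.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
print(weights, rate_a, rate_b)
```

After reweighing, both groups have the same weighted positive rate. Whether that is the right measure for a given system is a judgment call that belongs in the documentation together with the residual risk.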

8. Data Gaps and Shortcomings

Identify any data gaps — categories of real-world situations, population segments, or edge cases that are absent from or underrepresented in your training data — and document how they are being addressed. Options include:


Dataset Quality Requirements

Beyond governance practices, Art.10 sets direct quality requirements for training, validation, and testing data sets. The datasets must be:

Relevant: The data must actually capture the features and outcomes the model is designed to predict. Irrelevant features in training data create spurious correlations and make models brittle.

Representative: The data must reflect the range of situations, populations, and contexts the model will encounter in deployment. A model trained on a non-representative dataset will perform well on its training distribution but fail on real-world inputs.

Free from errors to the extent possible: Label noise, annotation errors, data entry mistakes, and encoding inconsistencies must be identified and corrected as far as practicable. Residual errors must be documented.

Complete to the extent possible: Systematic missing data (e.g., a protected characteristic that is consistently absent from a segment of the dataset) must be identified and addressed.

Appropriate statistical properties: The datasets must have documented statistical properties including class distributions, feature distributions, and where applicable, distributions across the persons or groups the system is intended to serve.
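The completeness and statistical-property requirements above can be backed by a small, repeatable dataset profile. This is a minimal sketch under stated assumptions: records are dictionaries, a missing value is represented as `None`, and the function name `profile` and the example fields are hypothetical.

```python
from collections import Counter


def profile(records, label_key, feature_keys):
    """Summarise class distribution and per-feature missing rate."""
    n = len(records)
    if n == 0:
        raise ValueError("empty dataset")
    class_distribution = {
        label: count / n
        for label, count in Counter(r[label_key] for r in records).items()
    }
    missing_rate = {
        f: sum(1 for r in records if r.get(f) is None) / n
        for f in feature_keys
    }
    return {"n": n, "class_distribution": class_distribution, "missing_rate": missing_rate}


# Made-up example data.
records = [
    {"label": 1, "age": 34, "income": 52000},
    {"label": 0, "age": None, "income": 48000},
    {"label": 0, "age": 51, "income": None},
    {"label": 0, "age": 29, "income": None},
]

summary = profile(records, "label", ["age", "income"])
print(summary)
```

Running the same profile per population segment shows whether a feature is missing systematically for one group rather than at random.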

"To the extent possible" is an important qualifier throughout Art.10. The law does not require perfect data — it requires best-effort compliance with documented evidence of what you did and why. The key is documentation: if you know your dataset has limitations, you must document them.


Geographical, Contextual, Behavioural, and Functional Setting

Art.10 requires that datasets account for the specific setting in which the high-risk AI system is intended to be used. This has practical implications for deployment context:

If your model is intended for deployment across multiple geographic markets, cultural contexts, or operational settings, your data governance documentation must address how the dataset represents — or fails to represent — each of those settings.


Special Categories Data Under Art.10

One of the most practically significant provisions in Art.10 is the authorised processing of special categories of personal data for bias monitoring.

Under GDPR Art.9, processing special categories of personal data (racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used to identify a person, health data, and data on sex life or sexual orientation) is generally prohibited unless one of the exceptions in Art.9(2) applies.

Art.10 of the EU AI Act creates a specific derogation: providers of high-risk AI systems may process special categories of personal data strictly for the purposes of ensuring bias monitoring, detection, and correction, subject to:

This is significant for providers whose high-risk AI systems affect protected groups. To comply with Art.10's bias examination requirements, you may need demographic data (race, gender, age, health status) about individuals in your training dataset in order to audit for disparate impact. The Art.10 derogation provides a legal path to do this — but with strict constraints.

In practice, this means:

Work with your DPO before implementing this. The Art.10 derogation is available, but the processing must be strictly scoped to bias monitoring and documented carefully.


Art.10 Integration with Art.9, Art.11, and Art.12

Art.10 does not stand alone. Its requirements feed directly into three other provider obligations:

Art.9: Risk Management System

The bias examination and data gap identification from Art.10 are inputs to the Art.9 risk management process. When you identify that your training data underrepresents a particular demographic group, that is a risk — it belongs in your risk management system as a known risk with documented mitigation measures and residual risk acceptance.

Your Art.9 risk management system should have a formal "data risk" category that captures:

Art.11: Technical Documentation

Art.11 requires providers to maintain technical documentation demonstrating compliance with the EU AI Act. Art.10 compliance evidence is a mandatory component of that documentation. Your technical documentation must include:

Without Art.10 documentation, your Art.11 technical documentation is incomplete and conformity assessment cannot proceed.

Art.12: Record-Keeping

Art.12 requires high-risk AI systems to automatically log events relevant to risk assessment. Data-level events that must feed into Art.12 record-keeping include:


Implementation Roadmap: Eight Weeks to Art.10 Compliance

Weeks 1–2: Data Inventory and Documentation Baseline

Weeks 3–4: Bias Examination

Weeks 5–6: Bias Mitigation and Gap Assessment

Weeks 7–8: Documentation and Art.11 Integration


Five Common Art.10 Mistakes

Mistake 1: Treating data governance as a one-time pre-launch exercise

Art.10 applies throughout the AI system lifecycle. When you update your training data, fine-tune the model, or add new data sources, the Art.10 governance practices must be re-applied. This means your data team needs ongoing processes, not just a pre-launch checklist.

Mistake 2: Bias examination limited to GDPR special categories

Art.10 requires examination of biases that affect health and safety, fundamental rights, and discrimination prohibited under EU law. This is broader than the special categories listed in GDPR Art.9. Consider also: sex, age, disability status, geographic origin, socioeconomic background, language, and any attribute that creates systematic disadvantage for a group.

Mistake 3: Documentation at the wrong level of abstraction

"We applied standard data cleaning techniques" is not Art.10 compliant. Your documentation must be specific enough that an external auditor could assess whether the practices were appropriate. Which cleaning steps? What thresholds? Which records were excluded?

Mistake 4: Omitting annotation quality documentation

If your training data required human annotation (labelling, tagging, scoring), the quality of that annotation is a compliance issue. Document: how many annotators, what guidelines they followed, how disagreements were resolved, what inter-annotator agreement score was achieved. Low annotation quality is a known data risk that must appear in your Art.9 risk management system.
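For two annotators labelling the same items, one common inter-annotator agreement score is Cohen's kappa: observed agreement corrected for the agreement expected by chance. The sketch below is a plain implementation with made-up labels; which agreement statistic and which acceptance threshold you use is your own documented choice.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement from each annotator's own label frequencies.
    expected = sum(
        freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels from two annotators on ten items.
annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 8 of 10 items (observed agreement 0.8) and chance agreement is 0.5, giving a kappa of about 0.6.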

Mistake 5: Assuming "good overall metrics" means no bias

A model that achieves 95% overall accuracy may achieve 70% accuracy on a minority subgroup that represents 5% of your training data. Overall performance metrics do not reveal this. Art.10 bias examination requires subgroup-level analysis. Run it before you launch.
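The arithmetic behind this is simple. If a subgroup is 5% of the evaluation set and scores 70%, the remaining 95% of examples must score (0.95 − 0.05 × 0.70) / 0.95 ≈ 96.3% for the overall figure to be 95%, so the headline number barely moves. The sketch below reproduces this with synthetic predictions; the function name and the data are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Synthetic evaluation set: 950 majority examples and 50 minority examples.
groups = ["majority"] * 950 + ["minority"] * 50
y_true = [1] * 1000
# 915 of 950 majority predictions correct, 35 of 50 minority predictions correct.
y_pred = [1] * 915 + [0] * 35 + [1] * 35 + [0] * 15

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_group = accuracy_by_group(y_true, y_pred, groups)

print(f"overall:  {overall:.3f}")
for group, acc in by_group.items():
    print(f"{group}: {acc:.3f}")
```

The overall accuracy is 0.95 while the minority subgroup sits at 0.70, which is exactly the gap that a single aggregate metric hides.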


Art.10 Compliance Checklist (20 Items)

Data Governance Documentation

Dataset Quality

Bias Examination and Mitigation

Integration


What Comes Next

Art.10 data governance documentation feeds directly into Art.11 Technical Documentation — the formal record that demonstrates to a conformity assessment body that your AI system meets all applicable requirements. Post #3 in this series covers exactly what Art.11 requires, what the technical documentation package must contain, and how to structure it for efficient conformity assessment: Art.11 Technical Documentation: Provider Obligations.

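The August 2, 2026 application date covers Annex III high-risk AI systems placed on the market or put into service from that date. Systems already on the market before then come into scope once they undergo significant design changes (Art.111(2)). If your system is in Annex III and falls into either group, your Art.10 data governance practices must be in place and documented.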


This guide is for informational purposes. For regulatory advice specific to your AI systems and use cases, consult a qualified legal advisor.

