EU AI Act Art.10 Data Governance 2026: High-Risk AI Provider Guide
Post #1395 in the sota.io EU AI Act Compliance Series — EU-AI-ACT-PROVIDER-SPRINT-2026 #2/5
Every high-risk AI model your team trains runs on data. Under EU AI Act Art.10, that data is no longer just an engineering concern — it becomes a legal compliance artifact. The quality of your training sets, the governance practices around data collection, the documented examination of biases: all of this must be established, documented, and maintained before August 2, 2026.
Art.10 is one of the most operational requirements in the EU AI Act. Unlike Art.9 (which governs your risk management process) or Art.17 (which governs your quality management system), Art.10 touches every data engineer, ML engineer, and data scientist on your team. It reaches into data pipelines, annotation workflows, bias testing procedures, and dataset retention policies.
This is Post #2 in the EU AI Act Provider Sprint series. Post #1 covered Art.9 Risk Management System requirements. Post #3 covers Art.11 Technical Documentation, Post #4 covers Art.17 Quality Management System, and Post #5 delivers the Complete Provider Compliance Stack.
Who Art.10 Applies To
Art.10 applies to providers of high-risk AI systems: companies that develop an AI system (or have one developed) and place it on the market or put it into service under their own name, where the system falls under the high-risk categories in Annex III of the EU AI Act. These include:
- AI systems used in employment and HR decisions (CV screening, performance assessment, task allocation)
- AI systems used in credit scoring and creditworthiness assessment
- AI systems used in education (admission, assessment, monitoring)
- AI systems for biometric identification and categorisation (systems used solely for biometric verification are excluded from Annex III)
- AI systems affecting access to essential services
If you are a deployer (a company that integrates a third-party AI model into your product), Art.10 obligations fall primarily on the model provider. You still carry a data obligation of your own: under Art.26(4), to the extent you control the input data, you must ensure it is relevant and sufficiently representative for the system's intended purpose. If you supply data for customisation, expect the provider to require that it meets their governance requirements.
The rest of this guide addresses providers.
The Core Art.10 Requirement: Data Governance Practices
Art.10 requires that providers subject their training, validation, and testing data sets to appropriate data governance and management practices throughout the entire lifecycle of the AI system.
This is a lifecycle obligation. Your data governance practices cannot be applied only at the moment of initial model training and then set aside. They must remain active and documented through model updates, fine-tuning runs, post-market monitoring, and decommissioning.
What "Data Governance and Management Practices" Must Cover
Art.10 specifies that these practices must address eight domains:
1. Design Choices and Data Strategy
Document the relevant design choices that shaped which data was collected and how. Why was this particular dataset chosen? What alternatives were considered? What population or domain does it represent? These choices must be traceable and available for conformity assessment.
2. Data Collection: Processes and Origin
For each dataset, document:
- Where and how the data was collected (surveys, scraping, sensor feeds, human labellers, licensed sources)
- The original purpose of the data collection, particularly for personal data
- The legal basis under which personal data was collected
- Any transformations applied between original collection and use in training
The "original purpose" requirement matters. If you are using personal data that was collected for one purpose (e.g., customer transactions) and repurposing it to train a high-risk model (e.g., a credit risk classifier), that repurposing must be documented and legally justified.
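The documentation points above can be captured as one structured record per dataset. The following is a minimal sketch, not a prescribed format: the class name `DatasetProvenance`, its field names, and the example values are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DatasetProvenance:
    """Origin record for one dataset. All field names are illustrative assumptions."""
    name: str
    collection_method: str   # e.g. "licensed source", "scraping", "sensor feed"
    original_purpose: str    # why the data was first collected
    contains_personal_data: bool
    legal_basis: Optional[str] = None  # expected whenever personal data is present
    transformations: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """List the documentation gaps still open for this dataset."""
        gaps = [
            name for name in ("collection_method", "original_purpose")
            if not getattr(self, name).strip()
        ]
        if self.contains_personal_data and not self.legal_basis:
            gaps.append("legal_basis")
        return gaps


# Hypothetical example: transaction data repurposed for model training.
record = DatasetProvenance(
    name="customer_transactions",
    collection_method="export from internal payment system",
    original_purpose="payment processing",
    contains_personal_data=True,
    transformations=["dropped free-text fields", "aggregated to monthly totals"],
)
print(record.missing_fields())               # the legal basis has not been recorded yet
print(json.dumps(asdict(record), indent=2))  # serialisable for the documentation package
```

A record like this makes a repurposing gap visible early: the example dataset contains personal data but has no legal basis recorded, so it is flagged before training starts.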
3. Data Preparation Operations
Document all data preparation steps:
- Annotation and labelling: who performed the labelling, what guidelines were applied, what inter-annotator agreement was achieved, how disagreements were resolved
- Cleaning: what data quality checks were applied, what records were excluded and why, what missing-value strategies were used
- Enrichment: what external signals or features were added to the dataset
- Aggregation: how individual records were combined into summary features
- Retention: what data was retained vs. discarded after training, for how long, and on what basis
Each of these operations must be reproducible and documented. If you cannot explain what cleaning steps were applied to your training data, you are not Art.10 compliant.
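Inter-annotator agreement is one of the few annotation-quality figures that can be computed directly. As a sketch, Cohen's kappa for two annotators can be calculated as below; the choice of kappa (rather than another agreement statistic), the function name, and the sample labels are assumptions, not something Art.10 prescribes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on six items.
annotator_1 = ["approve", "reject", "approve", "approve", "reject", "approve"]
annotator_2 = ["approve", "reject", "reject", "approve", "reject", "approve"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Recording the statistic together with the labelling guideline version gives an auditor something concrete to assess.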
4. Assumptions: What the Data Is Supposed to Measure
Formulate and document the relevant assumptions underlying your data:
- What real-world phenomenon is the training data supposed to capture?
- What is the intended target variable, and is it a valid proxy for what you are trying to predict?
- What population is the model intended to serve, and does the training data represent that population?
- Where does the proxy variable diverge from the actual desired outcome (label noise, selection bias, historical bias)?
This is particularly important for automated decision systems. A model trained to predict "loan default" using historical data that reflects discriminatory lending practices encodes those practices into its predictions. Art.10 requires you to examine and document this explicitly — not to guarantee bias-free models, but to demonstrate that you have examined the assumptions and taken appropriate measures.
5. Availability, Quantity, and Suitability Assessment
Conduct and document an assessment of whether your datasets are:
- Available: can you access the data you need, and is access sustainable?
- Sufficient in quantity: is the dataset large enough to train a model that performs reliably across the intended population?
- Suitable for the purpose: does the dataset actually represent the situations the model will encounter in deployment?
For high-risk AI systems, "suitable" includes demographic and geographic representativeness. A medical diagnostic AI trained entirely on data from Western European male patients aged 40–65 is not suitable for deployment to a general population — this inadequacy must be identified and documented.
6. Bias Examination
This is the most operationally demanding Art.10 requirement for most providers.
Art.10 requires that you examine your datasets for possible biases that are likely to affect health and safety, have an adverse impact on fundamental rights, or lead to discrimination prohibited under EU law.
The examination must be active and documented, not a passing acknowledgement that bias may exist. It must include:
- Identification of which population subgroups appear in the dataset
- Analysis of representation gaps (groups that are underrepresented relative to their prevalence in the intended deployment population)
- Analysis of performance disparities across subgroups (groups where model accuracy, false positive rate, or false negative rate differs meaningfully from the overall metric)
- Documentation of which biases were found, which were mitigated, and how
- Explanation of residual biases that could not be mitigated, and the rationale for accepting them
For models that affect fundamental rights — employment decisions, credit, law enforcement, education — this bias examination is not optional and must be demonstrably thorough.
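The representation-gap and performance-disparity analyses described above can be sketched in a few lines. This is an illustrative example for binary labels (0/1); the function names, the reference shares, and the toy data are assumptions, and a real examination would also need sample-size and significance considerations.

```python
from collections import Counter, defaultdict


def representation_gaps(groups, reference_shares):
    """Dataset share minus reference (deployment population) share per subgroup."""
    counts = Counter(groups)
    total = len(groups)
    return {g: counts.get(g, 0) / total - share for g, share in reference_shares.items()}


def subgroup_rates(groups, y_true, y_pred):
    """Accuracy, false positive rate and false negative rate per subgroup."""
    cells = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted positive or negative.
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        cells[g][key] += 1
    rates = {}
    for g, c in cells.items():
        negatives, positives = c["fp"] + c["tn"], c["tp"] + c["fn"]
        rates[g] = {
            "accuracy": (c["tp"] + c["tn"]) / sum(c.values()),
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Hypothetical data: subgroup "b" is expected to be 40% of the deployment population.
groups = ["a"] * 6 + ["b"] * 2
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
gaps = representation_gaps(groups, {"a": 0.6, "b": 0.4})
rates = subgroup_rates(groups, y_true, y_pred)
print(gaps)
print(rates)
```

A negative gap means the subgroup is underrepresented relative to the reference share; both outputs are the kind of evidence the examination record would keep.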
7. Bias Detection and Mitigation Measures
For each identified bias, document the measures applied:
- Pre-processing: resampling, reweighting, data augmentation, synthetic data generation
- In-processing: fairness constraints during model training, adversarial debiasing
- Post-processing: threshold adjustment, output calibration per subgroup
- Why each measure was selected
- Evidence of its effectiveness (validation set metrics before and after)
- Known limitations and residual risk
Note that Art.10 does not require zero bias — it requires that you identify, assess, and appropriately address bias. A well-documented process that identifies bias and applies reasonable mitigation measures with residual risk explicitly accepted and documented is Art.10 compliant. An undocumented process that happens to produce a "fair" model is not.
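As one example of a pre-processing measure with before-and-after evidence, the sketch below applies reweighing in the style of Kamiran and Calders: each record gets the weight P(group) × P(label) / P(group, label), which makes group and label statistically independent in the weighted data. It is only one of several possible techniques, and the function names and toy data are assumptions.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-record weights w = P(group) * P(label) / P(group, label)."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one subgroup."""
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    total = sum(w for g, w in zip(groups, weights) if g == group)
    return positive / total


# Hypothetical data: group "a" has far more positive labels than group "b".
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
uniform = [1.0] * len(groups)
weights = reweighing_weights(groups, labels)

for g in ("a", "b"):
    before = weighted_positive_rate(groups, labels, uniform, g)
    after = weighted_positive_rate(groups, labels, weights, g)
    print(g, before, after)
```

The printed before and after rates are the sort of effectiveness evidence the documentation asks for; the weights would then be passed to a training routine that supports per-sample weights.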
8. Data Gaps and Shortcomings
Identify any data gaps — categories of real-world situations, population segments, or edge cases that are absent from or underrepresented in your training data — and document how they are being addressed. Options include:
- Collecting additional data for the gap
- Restricting the model's intended use to exclude scenarios not covered by the training data
- Adding explicit human oversight requirements for inputs that fall in the gap
- Accepting the gap with documented rationale and risk assessment
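The human-oversight option can be made operational with a simple coverage check at inference time. The sketch below is an assumption-laden illustration: it only handles categorical fields, the threshold `min_count` is arbitrary, and the function names are hypothetical.

```python
from collections import Counter


def build_coverage(rows, fields, min_count=30):
    """For each field, the set of values seen at least min_count times in training data."""
    coverage = {}
    for name in fields:
        counts = Counter(row[name] for row in rows)
        coverage[name] = {value for value, count in counts.items() if count >= min_count}
    return coverage


def fields_outside_coverage(record, coverage):
    """Fields whose value falls in a documented gap; non-empty means route to human review."""
    return [name for name, seen in coverage.items() if record.get(name) not in seen]


# Hypothetical training data with a tiny threshold so the example stays short.
training_rows = [
    {"country": "DE", "channel": "web"},
    {"country": "DE", "channel": "web"},
    {"country": "DE", "channel": "branch"},
    {"country": "FR", "channel": "web"},
]
coverage = build_coverage(training_rows, ["country", "channel"], min_count=2)
print(fields_outside_coverage({"country": "FR", "channel": "web"}, coverage))
print(fields_outside_coverage({"country": "DE", "channel": "web"}, coverage))
```

The first record is flagged because its country value is too rare in the training data; the second passes.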
Dataset Quality Requirements
Beyond governance practices, Art.10 sets direct quality requirements for training, validation, and testing data sets. The datasets must be:
Relevant: The data must actually capture the features and outcomes the model is designed to predict. Irrelevant features in training data create spurious correlations and make models brittle.
Representative: The data must reflect the range of situations, populations, and contexts the model will encounter in deployment. A model trained on a non-representative dataset will perform well on its training distribution but fail on real-world inputs.
Free from errors to the extent possible: Label noise, annotation errors, data entry mistakes, and encoding inconsistencies must be identified and corrected as far as practicable. Residual errors must be documented.
Complete to the extent possible: Systematic missing data (e.g., a protected characteristic that is consistently absent from a segment of the dataset) must be identified and addressed.
Appropriate statistical properties: The datasets must have documented statistical properties including class distributions, feature distributions, and where applicable, distributions across the persons or groups the system is intended to serve.
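Two of these properties, class distribution and systematic missingness per subgroup, are straightforward to compute and record. The following sketch assumes rows are plain dictionaries; the function names, the field names, and the data are illustrative only.

```python
from collections import Counter, defaultdict


def class_distribution(labels):
    """Relative frequency of each label value."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def missing_rate_by_group(rows, group_key, field_name):
    """Share of rows per subgroup where field_name is None or absent."""
    missing, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row.get(field_name) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical rows: income is consistently missing for one region.
rows = [
    {"region": "north", "income": 100, "label": 1},
    {"region": "north", "income": 80, "label": 0},
    {"region": "south", "income": None, "label": 0},
    {"region": "south", "label": 0},
]
print(class_distribution([row["label"] for row in rows]))
print(missing_rate_by_group(rows, "region", "income"))
```

A missing rate that differs sharply between subgroups is the kind of systematic incompleteness the completeness requirement asks you to identify and address.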
"To the extent possible" is an important qualifier throughout Art.10. The law does not require perfect data — it requires best-effort compliance with documented evidence of what you did and why. The key is documentation: if you know your dataset has limitations, you must document them.
Geographical, Contextual, Behavioural, and Functional Setting
Art.10 requires that datasets account for the specific setting in which the high-risk AI system is intended to be used. This has practical implications for deployment context:
- A fraud detection model trained on transaction data from Germany may not transfer reliably to Southern European markets with different payment behaviour patterns
- A medical diagnosis model trained on data from hospital settings may perform differently in remote/telehealth contexts
- An HR screening model trained on applications from one decade may encode outdated norms if used a decade later
If your model is intended for deployment across multiple geographic markets, cultural contexts, or operational settings, your data governance documentation must address how the dataset represents — or fails to represent — each of those settings.
Special Categories Data Under Art.10
One of the most practically significant provisions in Art.10 is the authorised processing of special categories of personal data for bias monitoring.
Under GDPR Art.9, processing special categories of personal data (including racial or ethnic origin, political opinions, religious beliefs, health data, sexual orientation, biometric data, and trade union membership) is generally prohibited unless one of the exceptions in GDPR Art.9(2) applies.
Art.10(5) of the EU AI Act creates a specific derogation: providers of high-risk AI systems may exceptionally process special categories of personal data, to the extent strictly necessary for bias detection and correction, subject to:
- Implementation of appropriate safeguards for fundamental rights and freedoms
- The data being processed only to the extent strictly necessary
- The existence of technical and organisational measures to protect the data
This is significant for providers whose high-risk AI systems affect protected groups. To comply with Art.10's bias examination requirements, you may need demographic data (race, gender, age, health status) about individuals in your training dataset in order to audit for disparate impact. The Art.10 derogation provides a legal path to do this — but with strict constraints.
In practice, this means:
- You can annotate or collect demographic attributes on training data subjects for bias auditing purposes
- This annotation must be for bias monitoring only, not for use as model features
- You must implement strong access controls and security measures for this sensitive data
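- You must delete this data once the bias has been corrected or its retention period ends, whichever comes first
- You must record in your data processing records why the processing was strictly necessary for bias detection and correction, and why other data (including synthetic or anonymised data) could not achieve the same result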
Work with your DPO before implementing this. The Art.10 derogation is available, but the processing must be strictly scoped to bias monitoring and documented carefully.
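One way to keep audit attributes out of the model's feature set is to split each record into a training row and a separately stored audit row, linked only by a keyed pseudonym. The sketch below uses HMAC-SHA256 from the Python standard library; the field names, the `SENSITIVE_FIELDS` set, and the demo key are assumptions, and pseudonymised data is still personal data under GDPR, so this does not replace the safeguards above.

```python
import hashlib
import hmac

# Assumed list of attributes used only for bias auditing, never as features.
SENSITIVE_FIELDS = {"ethnicity", "health_status"}


def pseudonymise(subject_id: str, secret_key: bytes) -> str:
    """Keyed hash of the subject identifier; not reversible without the key."""
    return hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()


def split_record(record, secret_key):
    """Return (training_row, audit_row); sensitive fields go only into audit_row."""
    pid = pseudonymise(record["subject_id"], secret_key)
    training_row = {
        key: value for key, value in record.items()
        if key not in SENSITIVE_FIELDS and key != "subject_id"
    }
    training_row["pid"] = pid
    audit_row = {"pid": pid}
    audit_row.update({key: record[key] for key in SENSITIVE_FIELDS if key in record})
    return training_row, audit_row


# Demo key only; a real key would live in a secrets manager with restricted access.
DEMO_KEY = b"demo-key-do-not-use"
source = {"subject_id": "applicant-17", "income": 42000,
          "ethnicity": "group-x", "health_status": "none-declared"}
training_row, audit_row = split_record(source, DEMO_KEY)
print(sorted(training_row))
print(sorted(audit_row))
```

The audit rows can then sit in a store with tighter access controls and their own deletion schedule, while the training pipeline never sees the sensitive fields.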
Art.10 Integration with Art.9, Art.11, and Art.12
Art.10 does not stand alone. Its requirements feed directly into three other provider obligations:
Art.9: Risk Management System
The bias examination and data gap identification from Art.10 are inputs to the Art.9 risk management process. When you identify that your training data underrepresents a particular demographic group, that is a risk — it belongs in your risk management system as a known risk with documented mitigation measures and residual risk acceptance.
Your Art.9 risk management system should have a formal "data risk" category that captures:
- Representation biases identified during Art.10 examination
- Label noise and annotation quality risks
- Data staleness risk (model trained on historical data that no longer reflects current patterns)
- Data supply risk (dependency on a third-party data source)
Art.11: Technical Documentation
Art.11 requires providers to maintain technical documentation demonstrating compliance with the EU AI Act. Art.10 compliance evidence is a mandatory component of that documentation. Your technical documentation must include:
- Description of training, validation, and testing data sets including sources and processing
- Description of data governance and management practices applied
- Results of bias examination including identified biases and mitigation measures
- Evidence of dataset quality (statistical properties, test metrics by subgroup)
- Documentation of data gaps and shortcomings
Without Art.10 documentation, your Art.11 technical documentation is incomplete and conformity assessment cannot proceed.
Art.12: Record-Keeping
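Art.12 requires high-risk AI systems to technically allow the automatic recording of events (logs) over the system's lifetime, so that risk-relevant situations can be identified and post-market monitoring is supported. Data-level events that belong in that record include: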
- When new training data was introduced (fine-tuning runs, dataset updates)
- When bias monitoring was performed and what it found
- When model behaviour changed significantly (drift detection)
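The three event types above can be written as structured log lines so they are machine-searchable later. This is a minimal sketch: the event names, the JSON layout, and the system identifier are assumptions, not a format defined by the Act.

```python
import json
from datetime import datetime, timezone

# Assumed event vocabulary mirroring the three data-level events listed above.
ALLOWED_EVENTS = {"training_data_updated", "bias_monitoring_run", "drift_detected"}


def make_data_event(event_type, system_id, details):
    """Build one JSON log line with a UTC timestamp."""
    if event_type not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event type: {event_type}")
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "system_id": system_id,
        "details": details,
    }
    return json.dumps(event, sort_keys=True)


# Hypothetical bias monitoring run on a hypothetical system.
line = make_data_event(
    "bias_monitoring_run",
    "cv-screening-v2",
    {"subgroups_checked": 4, "disparities_found": 1},
)
print(line)
```

Appending such lines to an append-only store keeps the data-level history alongside the system's other logs.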
Implementation Roadmap: Eight Weeks to Art.10 Compliance
Weeks 1–2: Data Inventory and Documentation Baseline
- Audit all datasets used for training, validation, and testing of each high-risk AI system
- Document data origins: source, collection method, original purpose, legal basis for personal data
- Identify which datasets include personal data or special categories data
- Map data preparation steps: annotation, cleaning, enrichment, aggregation procedures currently applied
Weeks 3–4: Bias Examination
- For each training dataset, identify which population subgroups are present
- Run representation analysis: are subgroup frequencies in the dataset consistent with their prevalence in the intended deployment population?
- Run performance disparity analysis: train a model (or use your existing model) and evaluate accuracy, false positive rate, and false negative rate by subgroup
- Document all identified biases with their potential impact on health, safety, and fundamental rights
- If special-categories data is needed for bias examination: document the Art.10 derogation, implement technical safeguards, consult with DPO
Weeks 5–6: Bias Mitigation and Gap Assessment
- For each identified bias: evaluate pre-processing, in-processing, and post-processing mitigation options
- Apply selected mitigation measures and validate effectiveness on held-out data
- Document residual biases with rationale for accepting them
- Conduct data gap assessment: which real-world scenarios, populations, and edge cases are underrepresented?
- For each gap: decide between data collection, use-case restriction, human oversight requirement, or documented acceptance
Weeks 7–8: Documentation and Art.11 Integration
- Compile all Art.10 documentation into a structured data governance record
- Review against Art.11 technical documentation requirements
- Ensure Art.9 risk management system includes data-related risks identified in Art.10 examination
- Set up ongoing bias monitoring process (post-launch)
- Plan for dataset update governance: what process triggers re-examination when training data is updated?
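For the last item, one possible trigger is a content fingerprint of the dataset: if the fingerprint differs from the one recorded at the last examination, re-examination is due. The sketch below assumes rows are JSON-serialisable dictionaries; the function names are hypothetical, and large datasets would more likely hash files or manifests instead of rows in memory.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Order-independent SHA-256 fingerprint of a list of JSON-serialisable rows."""
    digest = hashlib.sha256()
    # Sorting the serialised rows makes the result independent of row order.
    for line in sorted(json.dumps(row, sort_keys=True) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def needs_reexamination(current_rows, last_examined_fingerprint):
    """True when the data differs from what was last examined."""
    return dataset_fingerprint(current_rows) != last_examined_fingerprint


# Hypothetical dataset versions.
rows_v1 = [{"id": 1, "label": 0}, {"id": 2, "label": 1}]
examined = dataset_fingerprint(rows_v1)
rows_v2 = rows_v1 + [{"id": 3, "label": 1}]
print(needs_reexamination(rows_v1, examined))
print(needs_reexamination(rows_v2, examined))
```

Storing the fingerprint next to the bias examination report ties each report to the exact data it covered.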
Five Common Art.10 Mistakes
Mistake 1: Treating data governance as a one-time pre-launch exercise
Art.10 applies throughout the AI system lifecycle. When you update your training data, fine-tune the model, or add new data sources, the Art.10 governance practices must be re-applied. This means your data team needs ongoing processes, not just a pre-launch checklist.
Mistake 2: Bias examination limited to protected attributes under GDPR
Art.10 requires examination of biases that are likely to affect health and safety, have a negative impact on fundamental rights, or lead to discrimination prohibited under EU law. That scope is broader than the special categories in GDPR Art.9, and broader than race and gender. Consider age, disability status, geographic origin, socioeconomic background, language, and any attribute that creates systematic disadvantage for a group.
Mistake 3: Documentation at the wrong level of abstraction
"We applied standard data cleaning techniques" is not Art.10 compliant. Your documentation must be specific enough that an external auditor could assess whether the practices were appropriate. Which cleaning steps? What thresholds? Which records were excluded?
Mistake 4: Omitting annotation quality documentation
If your training data required human annotation (labelling, tagging, scoring), the quality of that annotation is a compliance issue. Document: how many annotators, what guidelines they followed, how disagreements were resolved, what inter-annotator agreement score was achieved. Low annotation quality is a known data risk that must appear in your Art.9 risk management system.
Mistake 5: Assuming "good overall metrics" means no bias
A model that achieves 95% overall accuracy may achieve 70% accuracy on a minority subgroup that represents 5% of your training data. Overall performance metrics do not reveal this. Art.10 bias examination requires subgroup-level analysis. Run it before you launch.
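The arithmetic behind this example is worth making explicit. Assuming the subgroup also makes up 5% of the evaluation set (an assumption added here), overall accuracy is the share-weighted average of the subgroup accuracies, so the majority accuracy implied by the figures above can be solved for directly. The function name is illustrative.

```python
def implied_majority_accuracy(overall, minority_share, minority_accuracy):
    """Solve overall = (1 - s) * majority + s * minority for the majority accuracy."""
    return (overall - minority_share * minority_accuracy) / (1 - minority_share)


majority = implied_majority_accuracy(overall=0.95, minority_share=0.05, minority_accuracy=0.70)
print(round(majority, 4))  # roughly 0.9632

# A 25-point drop for the subgroup moves the headline figure by little more than one point.
print(round(majority - 0.95, 4))
```

The subgroup's failure is almost invisible in the headline number, which is why the analysis has to be run per subgroup.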
Art.10 Compliance Checklist (21 Items)
Data Governance Documentation
- Training, validation, and testing datasets inventoried with source, origin, and collection method
- Original purpose of personal data collection documented for each dataset
- Data preparation operations (annotation, cleaning, enrichment, aggregation) documented with sufficient specificity
- Annotation quality documented: annotators, guidelines, inter-annotator agreement, disagreement resolution
- Data retention policies documented and applied
- Data governance practices confirmed as lifecycle obligations, not one-time activities
Dataset Quality
- Dataset relevance assessed and documented: does the data capture what the model needs to predict?
- Dataset representativeness assessed: does the data reflect the intended deployment population?
- Known errors and data quality issues identified and addressed
- Statistical properties documented: class distributions, feature distributions, subgroup frequencies
- Geographical, contextual, and behavioural setting considerations documented
- Data gaps and shortcomings identified with documented response (collection, restriction, oversight, acceptance)
Bias Examination and Mitigation
- Population subgroups in training data identified
- Representation gap analysis conducted and documented
- Performance disparity analysis conducted by subgroup
- All identified biases documented with potential impact on health, safety, and fundamental rights
- Bias mitigation measures applied with evidence of effectiveness
- Residual biases documented with rationale for acceptance
- Special-categories data processed under Art.10 derogation (if applicable): scope limited to bias monitoring, technical safeguards applied, DPO consulted
Integration
- Art.10 findings incorporated into Art.9 risk management system as data-related risks
- Art.10 documentation included in Art.11 technical documentation package
What Comes Next
Art.10 data governance documentation feeds directly into Art.11 Technical Documentation — the formal record that demonstrates to a conformity assessment body that your AI system meets all applicable requirements. Post #3 in this series covers exactly what Art.11 requires, what the technical documentation package must contain, and how to structure it for efficient conformity assessment: Art.11 Technical Documentation: Provider Obligations.
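The August 2, 2026 date applies to high-risk AI systems placed on the market or put into service from that date. Under Art.111(2), systems already on the market before it fall under the Act only if their design changes significantly afterwards. If your system is in Annex III and within that scope, your Art.10 data governance practices must be in place and documented by that date.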
This guide is for informational purposes. For regulatory advice specific to your AI systems and use cases, consult a qualified legal advisor.