EU AI Act Art.10: Data Governance for High-Risk AI — Dataset Splits, Lineage Tracking, and the Bias Detection Carve-Out (2026)
Article 10 of the EU AI Act is the data foundation of the high-risk compliance framework. Where Art.9 defines the risk management system and Art.13 covers transparency, Art.10 specifies the concrete data governance practices that every provider of a high-risk AI system must implement, document, and maintain for their training, validation, and testing datasets.
August 2, 2026 is the hard deadline: Art.10 applies to all high-risk AI systems placed on the EU market or put into service on or after that date. If your system falls under Annex III — healthcare diagnostics, biometric identification, critical infrastructure, employment screening, creditworthiness, law enforcement, or education — you need an Art.10-compliant data governance pipeline before market placement.
This guide covers what Art.10 actually requires, where most developers get it wrong, how the Art.10(5)–(6) bias detection carve-out works in practice, what your data lineage documentation must contain for Annex IV, and how to build Art.10-compliant data governance in Python.
What Art.10 Covers and Who It Applies To
Art.10 applies to providers of high-risk AI systems. A provider, as defined in Art.3(3), is the entity that develops an AI system (or has one developed) and places it on the EU market or puts it into service under its own name or trademark. This includes:
- Companies training foundation models adapted for high-risk applications
- Organisations fine-tuning general-purpose models for Annex III use cases
- Developers building custom classifiers for employment, credit, or medical decisions
- Any entity substantially modifying a high-risk system (Art.25(1) triggers provider obligations)
Art.10 applies to three dataset categories:
| Dataset | Art.10 Obligation | When Triggered |
|---|---|---|
| Training datasets | Full Art.10(2) data governance | Before any training run |
| Validation datasets | Full Art.10(2) data governance | During development and evaluation |
| Testing datasets | Full Art.10(2) data governance | Pre-deployment testing, Art.9(6) compliance testing |
The word "appropriate" in Art.10(1) gives some proportionality room, but the six specific areas in Art.10(2) are not optional — they describe what appropriate data governance must cover.
The Six Art.10(2) Obligations
Art.10(2) defines six areas that data governance and management practices must cover. These are the auditable baseline for any high-risk AI system:
1. Relevant Design Choices (Art.10(2)(a))
Providers must document the relevant design choices made regarding data collection. This includes:
- Which data sources were selected and why
- What collection methodologies were applied
- What sampling strategies were used
- How temporal coverage was determined
The emphasis on relevance means you need to justify your choices against the intended purpose and the Annex III category. A healthcare diagnostic AI sourcing training data from a single hospital network must document why that source adequately represents the intended deployment population.
2. Data Collection Procedures and Purpose (Art.10(2)(b))
Data collection procedures must be documented and must establish purpose limitations — the data must be collected for a specified purpose that is consistent with the high-risk AI use case. Art.10(2)(b) creates a direct link to GDPR Art.5(1)(b) (purpose limitation) for personal data in training sets.
Operationally, this means:
- Documented collection protocols (web scraping scripts, API access logs, data purchase agreements)
- Purpose definitions written before collection begins
- Records of which data subjects or data sources consented to AI training use
3. Data Processing Operations (Art.10(2)(c))
All preprocessing, cleaning, enriching, aggregating, and labelling operations must be documented. This creates the processing chain record that links raw inputs to final training datasets. Art.10(2)(c) is the closest Art.10 comes to requiring data lineage tracking — though the obligation is documentation-level, not technical tooling.
The practical implication: every transformation applied to a dataset must be traceable. If you drop rows, normalise features, apply oversampling, or run deduplication, the method and parameters must be documented and repeatable.
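A minimal sketch of what such a traceable processing record could look like in Python. The `ProcessingChain` class and its field names are illustrative choices of ours, not anything mandated by the Act; the point is that each step captures its method, exact parameters, and row counts so it is repeatable:

```python
import json
from datetime import datetime, timezone


class ProcessingChain:
    """Records each dataset transformation with its method and parameters
    so the chain is repeatable (Art.10(2)(c) documentation)."""

    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id
        self.steps: list[dict] = []

    def record_step(self, operation: str, parameters: dict,
                    rows_before: int, rows_after: int) -> None:
        self.steps.append({
            "step": len(self.steps) + 1,
            "operation": operation,
            "parameters": parameters,  # exact parameters make the step repeatable
            "rows_before": rows_before,
            "rows_after": rows_after,
            "rows_removed": rows_before - rows_after,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def export(self) -> str:
        """Serialise the chain for inclusion in the Annex IV documentation."""
        return json.dumps(
            {"dataset_id": self.dataset_id, "steps": self.steps}, indent=2
        )


chain = ProcessingChain("train-v1")
chain.record_step("deduplication", {"method": "exact-hash"},
                  rows_before=10_000, rows_after=9_420)
chain.record_step("quality_filter", {"min_token_length": 20},
                  rows_before=9_420, rows_after=9_100)
```

Exporting the chain at dataset-freeze time, and versioning the export alongside the model artefacts, gives a supervisory authority a record they can replay.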
4. Formulation of Assumptions (Art.10(2)(d))
Providers must document the assumptions made regarding data — particularly assumptions about what the data represents. This includes:
- The population or world-state the data is intended to model
- Known limitations in the data's representativeness
- Assumptions about temporal stability (will the data distribution hold in deployment?)
- Assumptions about the relationship between training context and deployment context
This is one of the most underestimated obligations. The EU AI Act does not require providers to guarantee their assumptions are correct — but it requires that they be explicit, so that supervisory authorities can evaluate whether the assumptions were reasonable.
5. Availability, Quantity, and Suitability Assessment (Art.10(2)(e))
Providers must examine whether the data is:
- Available: can the required data actually be obtained for the intended use case?
- Quantitatively sufficient: is the dataset large enough given the system's output complexity and the performance standard required under Art.9(6)?
- Suitable: fit for purpose given the Annex III category, the deployment context, and the risk profile established under Art.9
This is where dataset size justification becomes a compliance artefact. The technical documentation must contain a reasoned assessment of why the dataset size is adequate, not merely a count of samples.
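As an illustration, one way to make that reasoned assessment concrete is a per-class check against a documented minimum. The function name and the `min_per_class` value below are our own; the threshold itself would still need a written justification in the technical documentation:

```python
import pandas as pd


def assess_quantity(df: pd.DataFrame, label_col: str, min_per_class: int) -> dict:
    """Per-class quantity check against a documented minimum. The minimum
    must be justified in the technical documentation; the value passed
    here is purely illustrative."""
    counts = df[label_col].value_counts().to_dict()
    shortfalls = {cls: n for cls, n in counts.items() if n < min_per_class}
    return {
        "total_records": len(df),
        "per_class_counts": counts,
        "minimum_required_per_class": min_per_class,
        "classes_below_minimum": shortfalls,
        "sufficient": not shortfalls,
    }


df = pd.DataFrame({"label": ["approve"] * 800 + ["deny"] * 150})
result = assess_quantity(df, "label", min_per_class=200)
# The minority class falls short of the documented minimum, so the
# assessment records the gap rather than silently passing
```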
6. Examination for Biases (Art.10(2)(f))
The final obligation is to examine whether training, validation, and testing data may contain biases that could lead to violations of EU law or harm to health, safety, or fundamental rights, and if applicable, to mitigate those biases.
Art.10(2)(f) works in conjunction with Art.10(5)–(6) (the special category carve-out). The obligation to examine for bias is unconditional. The ability to process special-category data to detect and correct that bias is conditional on the Art.10(5)–(6) requirements being met.
Dataset Quality Requirements: Art.10(3) and (4)
Beyond the six management obligations, Art.10 sets substantive quality standards:
Art.10(3): Training, validation, and testing datasets shall be:
- Relevant to the intended purpose
- Sufficiently representative
- Free of errors and complete to the best extent possible
- Appropriate to the specific geographical, contextual, behavioural or functional setting in which the system is intended to be used
The phrase "to the best extent possible" is a proportionality qualifier — perfect data is not required, but providers must make demonstrable, documented efforts to maximise data quality within reasonable constraints.
Art.10(4): Datasets shall have the appropriate statistical properties, including regarding persons or groups of persons on whom the system produces outputs. This is particularly relevant for:
- Demographic breakdown of training populations vs. deployment populations
- Class imbalance documentation and mitigation
- Sub-group performance metrics (disaggregated evaluation)
- Temporal distribution alignment
Statistical property documentation is one of the hardest things to retrofit. Providers who collect data without capturing demographic metadata will struggle to demonstrate Art.10(4) compliance at audit time.
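Capturing these properties can be as simple as a small helper run on each split at dataset-freeze time. A sketch, assuming demographic metadata was captured at collection (the column names are illustrative):

```python
import pandas as pd


def statistical_properties(df: pd.DataFrame, label_col: str,
                           demographic_cols: list[str]) -> dict:
    """Record the statistical properties of one split: class distribution
    plus the demographic breakdowns relevant to the system's outputs."""
    props = {
        "n": len(df),
        "class_distribution":
            df[label_col].value_counts(normalize=True).round(3).to_dict(),
    }
    for col in demographic_cols:
        props[f"{col}_distribution"] = (
            df[col].value_counts(normalize=True).round(3).to_dict()
        )
    return props


split = pd.DataFrame({
    "label": [1, 0, 0, 1, 0, 0, 0, 1],
    "age_band": ["18-34", "35-54", "35-54", "55+",
                 "18-34", "55+", "35-54", "18-34"],
})
props = statistical_properties(split, "label", ["age_band"])
```

Running the same helper on training, validation, and test splits, and on a sample of the deployment population where available, gives the side-by-side comparison the documentation needs.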
Dataset Split Strategy for Art.10 Compliance
Art.10 applies separately to training, validation, and testing datasets. This has operational consequences for how you structure your ML pipeline:
The Three-Dataset Problem
Many practitioners treat the train/test split as a technical detail. Under Art.10, it becomes a compliance artefact:
- Training set: Subject to Art.10(2) in full. All preprocessing, bias examination, and quality assessment must be documented before training begins.
- Validation set: Must be held out from training and documented separately. Using validation data to tune hyperparameters is permitted, but the validation set must meet the same Art.10(3)–(4) quality standards as the training set.
- Test set: Must be genuinely unseen — not used during development. Art.9(6) requires testing datasets to be appropriate for the intended purpose and consistent with the deployment context.
Avoiding Contamination
Data contamination — where information from the test set leaks into training through shared preprocessing, shared tokenisation vocabularies, or shared normalisation statistics — creates a compliance gap. Art.10(2)(c) requires documentation of all processing operations. If preprocessing statistics are derived from the full dataset before splitting, this must be documented and justified.
Best practice under Art.10:
- Define splits before any exploratory analysis
- Compute normalisation statistics on training split only
- Apply the same transformations to validation and test splits using training-derived parameters
- Document the split methodology, including any stratification used to maintain class balance
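The steps above can be sketched with plain NumPy; the essential point is that the normalisation parameters come from the training split only and are then reused unchanged on the other splits:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))

# 1. Fix the split indices before any exploratory analysis
idx = rng.permutation(len(features))
train, val, test = np.split(features[idx], [700, 850])

# 2. Derive normalisation statistics from the training split ONLY
mu, sigma = train.mean(axis=0), train.std(axis=0)

# 3. Reuse the training-derived parameters on every split
train_n = (train - mu) / sigma
val_n = (val - mu) / sigma
test_n = (test - mu) / sigma

# Document mu and sigma in the Art.10(2)(c) processing record so the
# transformation is repeatable. The test split's normalised mean will be
# close to, but not exactly, zero; an exactly-zero test mean would suggest
# the statistics were fitted on the full dataset (contamination).
```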
Stratified Splits and Art.10(4)
Where Art.10(4) requires appropriate statistical properties "including regarding persons or groups of persons," stratified splitting becomes not just good practice but a compliance obligation for Annex III systems. If your system affects individuals in protected-characteristic categories (age, sex, ethnicity, disability), the test set must contain sufficient representation to enable meaningful disaggregated evaluation.
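A sketch of stratified hold-out using pandas. scikit-learn's `train_test_split(stratify=...)` is the more common tool in practice; this version just shows the mechanics, and the column names are illustrative:

```python
import pandas as pd


def stratified_split(df: pd.DataFrame, strata_cols: list[str],
                     test_frac: float, seed: int = 42):
    """Hold out test_frac within each stratum so the test set preserves
    the joint distribution of the stratification columns."""
    test = df.groupby(strata_cols).sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


# Illustrative data with an 80/20 demographic imbalance
df = pd.DataFrame({"group": ["A"] * 80 + ["B"] * 20, "x": range(100)})
train, test = stratified_split(df, ["group"], test_frac=0.25)
# The minority group keeps its 20% share in the held-out split, so
# disaggregated evaluation on it remains possible
```

Note that stratification preserves proportions; it does not guarantee the minority stratum is large enough for meaningful evaluation. That still requires an explicit per-group count check.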
Data Lineage Documentation for Annex IV
Annex IV Section 2 requires the technical documentation to describe the training, validation, and testing methodologies and datasets used, including information on the provenance, scope, and main characteristics of the training datasets.
Art.10(2)(a)–(c) translate into a lineage record that must capture at minimum:
```
Dataset Lineage Record
├── Source Registration
│   ├── Source ID, Name, Type (licensed corpus, web scrape, internal data)
│   ├── Access date and URL/endpoint
│   ├── License or data processing agreement reference
│   └── Known limitations of the source
├── Collection Run Record
│   ├── Collection method (API, download, annotator task)
│   ├── Parameters (date range, geographic scope, sampling strategy)
│   ├── Collection timestamp
│   └── Raw record count
├── Processing Chain
│   ├── Step 1: Deduplication (method, parameters, records removed)
│   ├── Step 2: Quality filtering (criteria, threshold, records removed)
│   ├── Step N: ... (all transformation steps)
│   └── Final record count per split
└── Quality Assessment
    ├── Bias examination method and findings (Art.10(2)(f))
    ├── Statistical properties documentation (Art.10(4))
    └── Suitability assessment (Art.10(2)(e))
```
This record must be reproducible — a supervisory authority requesting a repeat of the data preparation pipeline must be able to reconstruct the final dataset from the lineage record.
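One way to keep the record both reproducible and machine-readable is to mirror the tree above as serialisable dataclasses. A sketch; the class and field names are our own, not prescribed by Annex IV:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceRegistration:
    source_id: str
    name: str
    source_type: str   # e.g. licensed corpus, web scrape, internal data
    access_date: str
    license_ref: str
    known_limitations: list[str] = field(default_factory=list)


@dataclass
class ProcessingStep:
    operation: str
    parameters: dict
    records_removed: int


@dataclass
class LineageRecord:
    sources: list[SourceRegistration]
    processing_chain: list[ProcessingStep]
    final_counts: dict  # records per split

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


record = LineageRecord(
    sources=[SourceRegistration("src-001", "Licensed clinical corpus",
                                "licensed corpus", "2026-01-15", "DPA-2026-07")],
    processing_chain=[ProcessingStep("deduplication",
                                     {"method": "exact-hash"}, 580)],
    final_counts={"train": 9100, "validation": 1950, "test": 1950},
)
```

Serialising to JSON and committing the output next to the model weights keeps the lineage record version-controlled with the artefact it describes.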
The Art.10(5)–(6) Bias Detection Carve-Out
Art.10(5) and (6) contain one of the most important — and most misunderstood — provisions in the EU AI Act. Art.10(5) permits processing of special categories of personal data (GDPR Art.9(1) data — race, ethnic origin, health, biometric data, etc.) specifically for the purpose of bias monitoring, detection, and correction.
The carve-out has strict conditions under Art.10(6):
| Condition | Requirement |
|---|---|
| Strict necessity | Processing must be strictly necessary for bias detection/correction — no broader purpose |
| Subject to appropriate safeguards | GDPR Art.9(4) safeguards apply; national law may impose additional requirements |
| No transmission outside | Special category data used for bias detection must not be transmitted or further processed beyond this purpose |
| Protective measures | State-of-the-art security measures, anonymisation as soon as possible |
What This Carve-Out Enables in Practice
Without Art.10(6), providers would be blocked from using demographic data (ethnicity, health status, disability) to detect that their system performs worse for certain groups, because processing that data for AI training purposes would typically fall outside the exemptions listed in GDPR Art.9(2).
Art.10(6) provides that lawful basis for bias monitoring — but only for bias monitoring. You cannot use the carve-out to:
- Train on special-category data for purposes other than bias correction
- Retain special-category bias data beyond what is necessary for the detection task
- Transfer the bias analysis dataset to third-party evaluators without separate legal basis
Implementing Art.10(6) Correctly
```python
# Example: Bias evaluation with Art.10(6) carve-out compliance
from dataclasses import dataclass
from datetime import datetime, timezone

import pandas as pd


@dataclass
class BiasReport:
    """Aggregate-only report container (fields inferred from usage below);
    never carries individual-level records."""
    metrics: dict
    evaluation_timestamp: datetime
    art10_6_basis: str


class BiasEvaluationPipeline:
    """
    Art.10(6)-compliant bias detection pipeline.
    Special-category data processed under strict necessity;
    anonymised as soon as evaluation completes.
    """

    def __init__(self, model, protected_attributes: list[str]):
        self.model = model
        self.protected_attributes = protected_attributes
        self._evaluation_log: list[dict] = []

    def evaluate_bias(
        self,
        evaluation_dataset: pd.DataFrame,
        label_col: str,
        prediction_col: str,
    ) -> BiasReport:
        # 1. Process under Art.10(6) — strictly necessary scope only
        results = {}
        for attr in self.protected_attributes:
            if attr not in evaluation_dataset.columns:
                continue
            # 2. Compute disaggregated metrics — never retain raw attribute data
            group_metrics = self._compute_group_metrics(
                evaluation_dataset, attr, label_col, prediction_col
            )
            results[attr] = group_metrics
        # 3. Anonymise immediately — no retention beyond this function
        report = BiasReport(
            metrics=results,
            evaluation_timestamp=datetime.now(timezone.utc),
            art10_6_basis="bias monitoring and detection",
        )
        # 4. Log processing for Art.10(2)(c) documentation
        self._evaluation_log.append({
            "timestamp": report.evaluation_timestamp.isoformat(),
            "attributes_examined": self.protected_attributes,
            "records_processed": len(evaluation_dataset),
            "special_category_retained": False,  # critical: must be False
        })
        return report

    def _compute_group_metrics(
        self, df: pd.DataFrame, group_col: str, label_col: str, pred_col: str
    ) -> dict:
        metrics = {}
        for group_val in df[group_col].unique():
            mask = df[group_col] == group_val
            group_df = df[mask]
            # Compute only aggregate statistics — never return individual rows
            metrics[str(group_val)] = {
                "n": int(mask.sum()),
                "accuracy": float((group_df[label_col] == group_df[pred_col]).mean()),
                "positive_rate": float(group_df[pred_col].mean()),
            }
        return metrics

    def get_processing_log(self) -> list[dict]:
        """Returns audit log for Art.10(2)(c) documentation."""
        return self._evaluation_log
```
The critical constraint: `special_category_retained: False`. At no point should the pipeline persist individual-level special-category records. Only aggregate metrics flow out.
Integration with Art.9: The Data-Risk Loop
Art.10 and Art.9 form a closed loop in the high-risk compliance framework:
```
Art.9(2):     Identify risks from data (including biases in training data)
     ↓
Art.10(2)(f): Examine training data for biases that could cause those risks
     ↓
Art.10(6):    Process special-category data to detect and correct bias
     ↓
Art.9(3):     Evaluate residual risk after bias correction — is it acceptable?
     ↓
Art.9(6):     Test with Art.10(3)-compliant testing datasets
     ↓
Art.9(4)(d):  Update risk management when new data bias evidence emerges
     ↑ (feedback loop back to Art.9(2))
```
This loop has a practical implication: Art.10 data governance is not a one-time pre-training activity. Post-market monitoring under Art.72 can generate new evidence of bias in deployment (e.g., demographic disparities in outcomes). When this evidence emerges, Art.9 requires the RMS to be updated, which may trigger a new round of Art.10 data review and retraining.
The living document obligation in Art.9 extends to your data governance records. The Annex IV technical documentation must reflect the current state of your dataset and bias mitigation, not the state at initial market placement.
Art.10 × Art.15: Accuracy, Robustness, and Dataset Quality
Art.15 requires high-risk AI systems to achieve "appropriate levels of accuracy, robustness and cybersecurity." The connection to Art.10 is direct: the quality of the training dataset is the primary determinant of accuracy and robustness.
Art.15 does not set specific accuracy thresholds — those are determined by the harmonised standards or common specifications adopted under Art.40–41. But providers cannot invoke Art.15 compliance without demonstrating that the training data met Art.10 standards, because Art.15 performance cannot be evaluated independently of the data used to produce it.
For Annex IV technical documentation, this means accuracy metrics must be accompanied by:
- The dataset on which accuracy was measured (test split, characteristics)
- The bias examination conducted under Art.10(2)(f)
- Evidence that the test set meets Art.10(3)–(4) quality requirements
Art.10 Checklist — 31 Items
Use this checklist to assess Art.10 readiness before market placement:
Data Governance Documentation (Art.10(2))
- Relevant design choices for data collection documented (sources, sampling strategy, temporal scope)
- Collection procedures documented with purpose definitions
- All preprocessing operations documented with method and parameters
- Assumptions about data representativeness explicitly stated
- Dataset quantity and suitability assessment completed and documented
- Bias examination performed and findings documented for each split
Dataset Quality (Art.10(3)–(4))
- Relevance to intended purpose assessed against Annex III category
- Statistical representativeness evaluated (including demographic representation)
- Error rate and completeness examined; known errors documented
- Deployment context alignment assessed (geographic, behavioural, functional)
- Contextual considerations documented for the specific use case
Statistical Properties (Art.10(4))
- Class distribution documented for each split
- Demographic breakdown documented where relevant to outputs
- Disaggregated performance metrics computed and documented
- Sub-group performance gaps identified and addressed
Dataset Split Protocol
- Training, validation, and test splits defined before exploratory analysis
- Normalisation statistics derived from training split only
- Split stratification documented (by class, by demographic group)
- Test set confirmed as genuinely unseen throughout development
- Split contamination check completed
Bias Detection (Art.10(2)(f) and Art.10(6))
- Bias examination methodology selected and documented
- Art.10(6) special-category processing basis established if applicable
- Bias evaluation logs retained (Art.10(2)(c) processing records)
- Aggregate-only metrics retained — no individual special-category records
- Bias mitigation measures applied and documented where bias detected
- Residual bias post-mitigation documented
Lineage and Annex IV
- Data lineage record covers source, collection, processing, quality assessment
- Lineage is reproducible from documentation alone
- Annex IV Section 2 (data description) completed using lineage records
- Art.10 documentation version-controlled alongside model artefacts
- Update trigger defined: when new bias evidence triggers Art.10 review
Common Art.10 Failure Modes
Treating Art.10 as a one-time gate. Art.10 data governance must be updated when new post-market evidence suggests dataset issues. Providers who treat the initial data review as permanent are non-compliant when deployment evidence reveals bias.
Not separating training statistics from test split. Normalisation using full-dataset statistics contaminates the test split. This is not just a methodological problem — it means your Annex IV accuracy claim is based on a compromised evaluation.
Conflating the Art.10(6) carve-out with a general AI training basis. Art.10(6) permits special-category processing only for bias monitoring, detection, and correction. Using the carve-out as a general authorisation to collect demographic data for training purposes is a GDPR Art.9 violation.
Failing to document the absence of bias. "No bias found" is a finding that requires documentation: what method was used, what threshold was applied, what groups were examined. An absence claim without a documented examination is not Art.10(2)(f) compliance.
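A sketch of how a null finding could still be recorded with its method, threshold, and groups examined. The function name and the 0.05 threshold are illustrative choices, not values from the Act:

```python
def document_bias_finding(group_accuracies: dict[str, float],
                          threshold: float, method: str) -> dict:
    """A 'no bias found' claim still needs a documented examination:
    the method used, the threshold applied, the groups examined, and
    the gap actually observed."""
    gap = max(group_accuracies.values()) - min(group_accuracies.values())
    return {
        "method": method,
        "threshold": threshold,
        "groups_examined": sorted(group_accuracies),
        "max_accuracy_gap": round(gap, 4),
        "bias_detected": gap > threshold,
    }


finding = document_bias_finding(
    {"18-34": 0.91, "35-54": 0.90, "55+": 0.89},
    threshold=0.05,
    method="disaggregated accuracy gap",
)
# Even when bias_detected is False, the record shows what was examined,
# how, and against which threshold — which is what Art.10(2)(f) needs
```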
Missing Art.10(4) for sub-groups. Art.10(4) requires statistical properties to be appropriate "regarding persons or groups of persons." This is not satisfied by overall accuracy metrics. Disaggregated evaluation is required.
Key Takeaways
Art.10 is the data foundation of the EU AI Act's high-risk framework. Its requirements are specific, auditable, and interconnected with Art.9 (risk management), Art.15 (accuracy), and Annex IV (technical documentation).
The three highest-impact Art.10 requirements for most providers:
1. Data lineage documentation: Every source, collection procedure, and preprocessing operation must be traceable. Retrofitting this after model development is significantly harder than building it in from the start.
2. Disaggregated bias examination: Art.10(2)(f) requires examining for biases across relevant demographic groups, not just overall performance. Missing sub-group evaluation is the most common Art.10 gap found in conformity assessments.
3. Art.10(6) compliance for bias detection: If your bias detection pipeline touches ethnicity, health status, or other special-category data, you need the Art.10(6) carve-out conditions met — including strict necessity, immediate aggregation, and no retention of individual-level special-category records.
The August 2, 2026 deadline creates urgency for providers who have not yet built these practices into their ML pipelines. The technical documentation required under Annex IV cannot be assembled after the fact without access to the original data sources, collection records, and preprocessing logs.
See also: EU AI Act Art.9: Risk Management System for High-Risk AI (2026) · EU AI Act Art.8: Compliance Requirements for High-Risk AI Systems · EU AI Act Art.13: Transparency and Instructions for Use (2026)