2026-04-22 · 14 min read

EU AI Act Art.10: Data Governance for High-Risk AI — Dataset Splits, Lineage Tracking, and the Bias Detection Carve-Out (2026)

Article 10 of the EU AI Act is the data foundation of the high-risk compliance framework. Where Art.9 defines the risk management system and Art.13 covers transparency, Art.10 specifies the concrete data governance practices that every provider of a high-risk AI system must implement, document, and maintain for their training, validation, and testing datasets.

August 2, 2026 is the hard deadline: Art.10 applies to all high-risk AI systems placed on the EU market or put into service on or after that date. If your system falls under Annex III — healthcare diagnostics, biometric identification, critical infrastructure, employment screening, creditworthiness, law enforcement, or education — you need an Art.10-compliant data governance pipeline before market placement.

This guide covers what Art.10 actually requires, where most developers get it wrong, how the Art.10(5) bias detection carve-out works in practice, what your data lineage documentation must contain for Annex IV, and how to build Art.10-compliant data governance in Python.


What Art.10 Covers and Who It Applies To

Art.10 applies to providers of high-risk AI systems as defined by Art.3(3) — the entity that places the system on the EU market or puts it into service under its own name or trademark.

Art.10 applies to three dataset categories:

| Dataset | Art.10 Obligation | When Triggered |
| --- | --- | --- |
| Training datasets | Full Art.10(2) data governance | Before any training run |
| Validation datasets | Full Art.10(2) data governance | During development and evaluation |
| Testing datasets | Full Art.10(2) data governance | Pre-deployment testing, Art.9(6) compliance testing |

The word "appropriate" in Art.10(1) gives some proportionality room, but the six specific areas in Art.10(2) are not optional — they describe what appropriate data governance must cover.


The Six Art.10(2) Obligations

Art.10(2) defines six areas that data governance and management practices must cover. These are the auditable baseline for any high-risk AI system:

1. Relevant Design Choices (Art.10(2)(a))

Providers must document the relevant design choices made regarding data collection: which sources were selected, how data was sampled, and how annotation or labelling was organised.

The emphasis on relevance means you need to justify your choices against the intended purpose and the Annex III category. A healthcare diagnostic AI sourcing training data from a single hospital network must document why that source adequately represents the intended deployment population.

2. Data Collection Procedures and Purpose (Art.10(2)(b))

Data collection procedures must be documented and must establish purpose limitations — the data must be collected for a specified purpose that is consistent with the high-risk AI use case. Art.10(2)(b) creates a direct link to GDPR Art.5(1)(b) (purpose limitation) for personal data in training sets.

Operationally, this means each dataset needs a documented collection purpose that is consistent with the system's intended purpose, and any re-use of data originally collected for a different purpose must be assessed and justified.

3. Data Processing Operations (Art.10(2)(c))

All preprocessing, cleaning, enriching, aggregating, and labelling operations must be documented. This creates the processing chain record that links raw inputs to final training datasets. Art.10(2)(c) is the closest Art.10 comes to requiring data lineage tracking — though the obligation is documentation-level, not technical tooling.

The practical implication: every transformation applied to a dataset must be traceable. If you drop rows, normalise features, apply oversampling, or run deduplication, the method and parameters must be documented and repeatable.
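One lightweight way to make transformations traceable and repeatable is to log every step with its method, parameters, and record counts as the pipeline runs. A minimal sketch — the `ProcessingChain` class, its fields, and the dataset name are illustrative, not an official tool:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessingStep:
    """One documented transformation in the Art.10(2)(c) processing chain."""
    name: str            # e.g. "deduplication"
    method: str          # e.g. "exact hash match"
    parameters: dict     # everything needed to repeat the step
    records_in: int
    records_out: int
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProcessingChain:
    """Accumulates a repeatable record of every dataset transformation."""

    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id
        self.steps: list[ProcessingStep] = []

    def record(self, name, method, parameters, records_in, records_out):
        self.steps.append(
            ProcessingStep(name, method, parameters, records_in, records_out)
        )

    def summary(self) -> list[dict]:
        # Documentation-ready view: method + parameters make each step repeatable
        return [
            {"step": i + 1, "name": s.name, "method": s.method,
             "parameters": s.parameters, "removed": s.records_in - s.records_out}
            for i, s in enumerate(self.steps)
        ]


chain = ProcessingChain("clinical-notes-v3")
chain.record("deduplication", "exact hash match", {"hash": "sha256"}, 120_000, 114_500)
chain.record("quality filter", "min token length", {"min_tokens": 20}, 114_500, 109_200)
```

Because each entry captures both method and parameters, the chain summary can go straight into the Annex IV processing-chain documentation.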

4. Formulation of Assumptions (Art.10(2)(d))

Providers must document the assumptions made regarding data — particularly assumptions about what the data represents, such as whether a recorded label is a reliable proxy for the real-world outcome the system is meant to predict.

This is one of the most underestimated obligations. The EU AI Act does not require providers to guarantee their assumptions are correct — but it requires that they be explicit, so that supervisory authorities can evaluate whether the assumptions were reasonable.

5. Availability, Quantity, and Suitability Assessment (Art.10(2)(e))

Providers must examine whether the data is:

This is where dataset size justification becomes a compliance artefact. The technical documentation must contain a reasoned assessment of why the dataset size is adequate, not merely a count of samples.
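One way to turn "enough data" into a reasoned, documentable claim is a simple margin-of-error calculation for each group the system will be evaluated on. This is an illustrative statistical approach, not a method Art.10 prescribes:

```python
import math


def min_group_size(margin: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
    """Worst-case sample size needed to estimate a per-group rate
    within +/- margin at ~95% confidence (normal approximation)."""
    return math.ceil((z ** 2 * p * (1 - p)) / margin ** 2)


# Estimating each group's error rate within +/- 5 percentage points
# needs at least this many samples in that group:
n_required = min_group_size(margin=0.05)  # 385
```

A sentence like "each evaluation group contains at least 385 samples, sufficient to estimate its error rate within ±5 points at 95% confidence" is a reasoned assessment; a bare sample count is not.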

6. Examination for Biases (Art.10(2)(f))

The final obligation is to examine whether training, validation, and testing data may contain biases that could lead to violations of EU law or harm to health, safety, or fundamental rights, and if applicable, to mitigate those biases.

Art.10(2)(f) works in conjunction with Art.10(5) (the special-category carve-out). The obligation to examine for bias is unconditional. The ability to process special-category data to detect and correct that bias is conditional on the Art.10(5) requirements being met.


Dataset Quality Requirements: Art.10(3) and (4)

Beyond the six management obligations, Art.10 sets substantive quality standards:

Art.10(3): Training, validation, and testing datasets shall be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose.

The phrase "to the best extent possible" is a proportionality qualifier — perfect data is not required, but providers must make demonstrable, documented efforts to maximise data quality within reasonable constraints.

Art.10(4): Datasets shall have the appropriate statistical properties, including regarding persons or groups of persons on whom the system produces outputs. This is particularly relevant for systems whose outputs affect identifiable groups of people, such as the employment screening, creditworthiness, and biometric identification categories of Annex III.

Statistical property documentation is one of the hardest things to retrofit. Providers who collect data without capturing demographic metadata will struggle to demonstrate Art.10(4) compliance at audit time.
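Capturing that metadata at collection time can be as simple as recording per-group counts and shares alongside the dataset. A sketch using pandas — the function name and column names are illustrative:

```python
import pandas as pd


def statistical_properties(df: pd.DataFrame, group_cols: list[str]) -> dict:
    """Per-group counts and shares for the Art.10(4) documentation record."""
    out = {}
    for col in group_cols:
        counts = df[col].value_counts(dropna=False)
        out[col] = {
            str(k): {"n": int(v), "share": round(float(v) / len(df), 4)}
            for k, v in counts.items()
        }
    return out


data = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "M", "F"],
    "age_band": ["18-39", "40-64", "18-39", "65+", "40-64", "18-39"],
})
props = statistical_properties(data, ["sex", "age_band"])
```

Running this at collection time, and again after every split, gives the before/after statistical profile that an Art.10(4) audit will ask for.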


Dataset Split Strategy for Art.10 Compliance

Art.10 applies separately to training, validation, and testing datasets. This has operational consequences for how you structure your ML pipeline:

The Three-Dataset Problem

Many practitioners treat the train/test split as a technical detail. Under Art.10, it becomes a compliance artefact: each of the three datasets must independently satisfy the Art.10(2) governance obligations and the Art.10(3) quality standards, and the split methodology itself must be documented.

Avoiding Contamination

Data contamination — where information from the test set leaks into training through shared preprocessing, shared tokenisation vocabularies, or shared normalisation statistics — creates a compliance gap. Art.10(2)(c) requires documentation of all processing operations. If preprocessing statistics are derived from the full dataset before splitting, this must be documented and justified.

Best practice under Art.10:

  1. Define splits before any exploratory analysis
  2. Compute normalisation statistics on training split only
  3. Apply the same transformations to validation and test splits using training-derived parameters
  4. Document the split methodology, including any stratification used to maintain class balance
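The four steps above can be sketched with pandas alone; the column names, the 80/20 ratio, and the `split_doc` structure are illustrative choices, not prescribed by the Act:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.normal(size=200),
    "group": ["A"] * 150 + ["B"] * 50,   # stratification variable
})

# 1. Define the split first, stratified so both splits keep the 75/25 group mix
test = df.groupby("group", group_keys=False).sample(frac=0.2, random_state=42)
train = df.drop(test.index)

# 2-3. Normalisation statistics come from the training split only and are
# applied unchanged to the held-out split, so nothing leaks from test data
mu, sigma = train["feature"].mean(), train["feature"].std()
train = train.assign(feature_norm=(train["feature"] - mu) / sigma)
test = test.assign(feature_norm=(test["feature"] - mu) / sigma)

# 4. Record the methodology for the technical documentation
split_doc = {
    "method": "stratified holdout",
    "stratify_on": "group",
    "test_fraction": 0.2,
    "seed": 42,
    "normalisation": {"mean": float(mu), "std": float(sigma)},
}
```

Recording the seed and the frozen normalisation parameters in `split_doc` is what makes the split reproducible for an Art.10(2)(c) audit.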

Stratified Splits and Art.10(4)

Where Art.10(4) requires appropriate statistical properties "including regarding persons or groups of persons," stratified splitting becomes not just good practice but a compliance obligation for Annex III systems. If your system affects individuals in protected-characteristic categories (age, sex, ethnicity, disability), the test set must contain sufficient representation to enable meaningful disaggregated evaluation.


Data Lineage Documentation for Annex IV

Annex IV Section 2 requires technical documentation to include "a description of the training, testing and validation methodologies used including information on the type, origin, and provenance of the training data sets."

Art.10(2)(a)–(c) translate into a lineage record that must capture at minimum:

Dataset Lineage Record
├── Source Registration
│   ├── Source ID, Name, Type (licensed corpus, web scrape, internal data)
│   ├── Access date and URL/endpoint
│   ├── License or data processing agreement reference
│   └── Known limitations of the source
├── Collection Run Record
│   ├── Collection method (API, download, annotator task)
│   ├── Parameters (date range, geographic scope, sampling strategy)
│   ├── Collection timestamp
│   └── Raw record count
├── Processing Chain
│   ├── Step 1: Deduplication (method, parameters, records removed)
│   ├── Step 2: Quality filtering (criteria, threshold, records removed)
│   ├── Step N: ... (all transformation steps)
│   └── Final record count per split
└── Quality Assessment
    ├── Bias examination method and findings (Art.10(2)(f))
    ├── Statistical properties documentation (Art.10(4))
    └── Suitability assessment (Art.10(2)(e))

This record must be reproducible — a supervisory authority requesting a repeat of the data preparation pipeline must be able to reconstruct the final dataset from the lineage record.
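One way to make the record itself verifiable is to serialise it deterministically and hash it, so an auditor can confirm that a re-run of the pipeline was driven by the same documented inputs. A sketch in which all field names and values are hypothetical:

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class LineageRecord:
    source_id: str
    license_ref: str
    collection_method: str
    collection_params: dict
    processing_steps: list
    final_counts: dict  # record count per split

    def fingerprint(self) -> str:
        """Stable SHA-256 over the canonical JSON form of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


record = LineageRecord(
    source_id="src-001",
    license_ref="DPA-2025-17",
    collection_method="vendor API export",
    collection_params={"date_range": "2024-01/2025-06", "scope": "EU"},
    processing_steps=[{"step": "deduplication", "records_removed": 5500}],
    final_counts={"train": 87300, "validation": 10900, "test": 11000},
)
```

Storing the fingerprint with the trained model ties the Annex IV documentation to the exact data preparation run that produced it.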


The Art.10(5) Bias Detection Carve-Out

Art.10(5) contains one of the most important — and most misunderstood — provisions in the EU AI Act. It permits processing of special categories of personal data (GDPR Art.9(1) data — race, ethnic origin, health, biometric data, etc.) specifically for the purpose of bias monitoring, detection, and correction.

The carve-out has strict conditions under Art.10(6):

ConditionRequirement
Strict necessityProcessing must be strictly necessary for bias detection/correction — no broader purpose
Subject to appropriate safeguardsGDPR Art.9(4) safeguards apply; national law may impose additional requirements
No transmission outsideSpecial category data used for bias detection must not be transmitted or further processed beyond this purpose
Protective measuresState-of-the-art security measures, anonymisation as soon as possible

What This Carve-Out Enables in Practice

Without this carve-out, providers would be blocked from using demographic data (ethnicity, health status, disability) to detect that their system performs worse for certain groups, because processing that data for AI development purposes would typically lack a lawful basis under GDPR Art.9(2).

Art.10(5) provides that lawful basis for bias monitoring — but only for bias monitoring. The carve-out cannot be used to train models on special-category features, to build demographic profiles, or to re-use the data for any purpose other than bias monitoring, detection, and correction.

Implementing Art.10(5) Correctly

# Example: Bias evaluation with Art.10(5) carve-out compliance
from dataclasses import dataclass
from datetime import datetime, timezone

import pandas as pd


@dataclass
class BiasReport:
    """Aggregate-only findings — individual-level records never leave the pipeline."""
    metrics: dict
    evaluation_timestamp: datetime
    art10_5_basis: str


class BiasEvaluationPipeline:
    """
    Art.10(5)-compliant bias detection pipeline.
    Special-category data processed under strict necessity;
    anonymised as soon as evaluation completes.
    """

    def __init__(self, model, protected_attributes: list[str]):
        self.model = model
        self.protected_attributes = protected_attributes
        self._evaluation_log = []

    def evaluate_bias(
        self,
        evaluation_dataset: pd.DataFrame,
        label_col: str,
        prediction_col: str,
    ) -> BiasReport:
        # 1. Process under Art.10(5) — strictly necessary scope only
        results = {}
        for attr in self.protected_attributes:
            if attr not in evaluation_dataset.columns:
                continue
            # 2. Compute disaggregated metrics — never retain raw attribute data
            group_metrics = self._compute_group_metrics(
                evaluation_dataset, attr, label_col, prediction_col
            )
            results[attr] = group_metrics

        # 3. Anonymise immediately — no retention beyond this function
        report = BiasReport(
            metrics=results,
            evaluation_timestamp=datetime.now(timezone.utc),
            art10_5_basis="bias monitoring and detection",
        )

        # 4. Log processing for Art.10(2)(c) documentation
        self._evaluation_log.append({
            "timestamp": report.evaluation_timestamp.isoformat(),
            "attributes_examined": self.protected_attributes,
            "records_processed": len(evaluation_dataset),
            "special_category_retained": False,  # critical: must be False
        })

        return report
    
    def _compute_group_metrics(
        self, df: pd.DataFrame, group_col: str, label_col: str, pred_col: str
    ) -> dict:
        metrics = {}
        for group_val in df[group_col].unique():
            mask = df[group_col] == group_val
            group_df = df[mask]
            # Compute only aggregate statistics — never return individual rows
            metrics[str(group_val)] = {
                "n": int(mask.sum()),
                "accuracy": float((group_df[label_col] == group_df[pred_col]).mean()),
                "positive_rate": float(group_df[pred_col].mean()),
            }
        return metrics
    
    def get_processing_log(self) -> list[dict]:
        """Returns audit log for Art.10(2)(c) documentation."""
        return self._evaluation_log

The critical constraint: special_category_retained: False. At no point should the pipeline persist individual-level special-category records. Only aggregate metrics flow out.
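The aggregate-only pattern can also be shown in isolation: only group-level statistics leave the function, and the frame holding the special-category rows is dropped immediately afterwards. The function and column names here are illustrative:

```python
import pandas as pd


def disaggregated_accuracy(df: pd.DataFrame, group_col: str,
                           label_col: str, pred_col: str) -> dict:
    """Aggregate-only output: per-group n and accuracy, never raw rows."""
    out = {}
    for group_val, group_df in df.groupby(group_col):
        out[str(group_val)] = {
            "n": len(group_df),
            "accuracy": float((group_df[label_col] == group_df[pred_col]).mean()),
        }
    return out


eval_df = pd.DataFrame({
    "ethnicity": ["a", "a", "b", "b"],   # special-category, carve-out scope only
    "label":     [1, 0, 1, 0],
    "pred":      [1, 0, 0, 0],
})
report = disaggregated_accuracy(eval_df, "ethnicity", "label", "pred")
del eval_df  # no retention of individual-level special-category data
```

A per-group accuracy gap in the output (here 1.0 versus 0.5) is exactly the disaggregated evidence the Art.10(2)(f) examination must document.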


Integration with Art.9: The Data-Risk Loop

Art.10 and Art.9 form a closed loop in the high-risk compliance framework:

Art.9(2): Identify risks from data (including biases in training data)
     ↓
Art.10(2)(f): Examine training data for biases that could cause those risks
     ↓
Art.10(5): Process special-category data to detect and correct bias
     ↓
Art.9(3): Evaluate residual risk after bias correction — is it acceptable?
     ↓
Art.9(6): Test with Art.10(3)-compliant testing datasets
     ↓
Art.9(4)(d): Update risk management when new data bias evidence emerges
     ↑ (feedback loop)

This loop has a practical implication: Art.10 data governance is not a one-time pre-training activity. Post-market monitoring under Art.72 can generate new evidence of bias in deployment (e.g., demographic disparities in outcomes). When this evidence emerges, Art.9 requires the RMS to be updated, which may trigger a new round of Art.10 data review and retraining.

The living document obligation in Art.9 extends to your data governance records. The Annex IV technical documentation must reflect the current state of your dataset and bias mitigation, not the state at initial market placement.


Art.10 × Art.15: Accuracy, Robustness, and Dataset Quality

Art.15 requires high-risk AI systems to achieve "appropriate levels of accuracy, robustness and cybersecurity." The connection to Art.10 is direct: the quality of the training dataset is the primary determinant of accuracy and robustness.

Art.15 does not set specific accuracy thresholds — those are determined by the harmonised standards or common specifications adopted under Art.40–41. But providers cannot invoke Art.15 compliance without demonstrating that the training data met Art.10 standards, because Art.15 performance cannot be evaluated independently of the data used to produce it.

For Annex IV technical documentation, this means accuracy metrics must be accompanied by the lineage records of the datasets on which they were measured, the split methodology used, and the disaggregated per-group results that Art.10(4) requires.


Art.10 Checklist — 30 Items

Use this checklist to assess Art.10 readiness before market placement:

Data Governance Documentation (Art.10(2))

Dataset Quality (Art.10(3)–(5))

Statistical Properties (Art.10(4))

Dataset Split Protocol

Bias Detection (Art.10(2)(f) and Art.10(5))

Lineage and Annex IV


Common Art.10 Failure Modes

Treating Art.10 as a one-time gate. Art.10 data governance must be updated when new post-market evidence suggests dataset issues. Providers who treat the initial data review as permanent are non-compliant when deployment evidence reveals bias.

Not separating training statistics from test split. Normalisation using full-dataset statistics contaminates the test split. This is not just a methodological problem — it means your Annex IV accuracy claim is based on a compromised evaluation.

Conflating the Art.10(5) carve-out with a general AI training basis. Art.10(5) permits special-category processing only for bias monitoring, detection, and correction. Using the carve-out as a general authorisation to collect demographic data for training purposes is a GDPR Art.9 violation.

Failing to document the absence of bias. "No bias found" is a finding that requires documentation: what method was used, what threshold was applied, what groups were examined. An absence claim without a documented examination is not Art.10(2)(f) compliance.

Missing Art.10(4) for sub-groups. Art.10(4) requires statistical properties to be appropriate "regarding persons or groups of persons." This is not satisfied by overall accuracy metrics. Disaggregated evaluation is required.


Key Takeaways

Art.10 is the data foundation of the EU AI Act's high-risk framework. Its requirements are specific, auditable, and interconnected with Art.9 (risk management), Art.15 (accuracy), and Annex IV (technical documentation).

The three highest-impact Art.10 requirements for most providers:

  1. Data lineage documentation: Every source, collection procedure, and preprocessing operation must be traceable. Retrofitting this after model development is significantly harder than building it in from the start.

  2. Disaggregated bias examination: Art.10(2)(f) requires examining for biases across relevant demographic groups, not just overall performance. Missing sub-group evaluation is the most common Art.10 gap found in conformity assessments.

  3. Art.10(5) compliance for bias detection: If your bias detection pipeline touches ethnicity, health status, or other special-category data, you need the Art.10(5) carve-out conditions met — including strict necessity, immediate aggregation, and no retention of individual-level special-category records.

The August 2, 2026 deadline creates urgency for providers who have not yet built these practices into their ML pipelines. The technical documentation required under Annex IV cannot be assembled after the fact without access to the original data sources, collection records, and preprocessing logs.


See also: EU AI Act Art.9: Risk Management System for High-Risk AI (2026) · EU AI Act Art.8: Compliance Requirements for High-Risk AI Systems · EU AI Act Art.13: Transparency and Instructions for Use (2026)