2026-04-22 · 14 min read

EU AI Act Art.10: Data Governance for High-Risk AI — Dataset Splits, Lineage Tracking, and the Bias Detection Carve-Out (2026)

Article 10 of the EU AI Act is the data foundation of the high-risk compliance framework. Where Art.9 defines the risk management system and Art.13 covers transparency, Art.10 specifies the concrete data governance practices that every provider of a high-risk AI system must implement, document, and maintain for their training, validation, and testing datasets.

August 2, 2026 is the hard deadline: Art.10 applies to all high-risk AI systems placed on the EU market or put into service on or after that date. If your system falls under Annex III — healthcare diagnostics, biometric identification, critical infrastructure, employment screening, creditworthiness, law enforcement, or education — you need an Art.10-compliant data governance pipeline before market placement.

This guide covers what Art.10 actually requires, where most developers get it wrong, how the Art.10(5) bias detection carve-out works in practice, what your data lineage documentation must contain for Annex IV, and how to build Art.10-compliant data governance in Python.


What Art.10 Covers and Who It Applies To

Art.10 applies to providers of high-risk AI systems as defined by Art.3(3) — the entity that places the system on the EU market or puts it into service under its own name or trademark.

Art.10 applies to three dataset categories:

| Dataset | Art.10 Obligation | When Triggered |
| --- | --- | --- |
| Training datasets | Full Art.10(2) data governance | Before any training run |
| Validation datasets | Full Art.10(2) data governance | During development and evaluation |
| Testing datasets | Full Art.10(2) data governance | Pre-deployment testing, Art.9(6) compliance testing |

The word "appropriate" in Art.10(1) gives some proportionality room, but the six specific areas in Art.10(2) are not optional — they describe what appropriate data governance must cover.


The Six Art.10(2) Obligations

Art.10(2) defines six areas that data governance and management practices must cover. These are the auditable baseline for any high-risk AI system:

1. Relevant Design Choices (Art.10(2)(a))

Providers must document the relevant design choices made regarding data collection: which sources were selected, how data was sampled, and how annotation or labelling was organised.

The emphasis on relevance means you need to justify your choices against the intended purpose and the Annex III category. A healthcare diagnostic AI sourcing training data from a single hospital network must document why that source adequately represents the intended deployment population.

2. Data Collection Procedures and Purpose (Art.10(2)(b))

Data collection procedures must be documented and must establish purpose limitations — the data must be collected for a specified purpose that is consistent with the high-risk AI use case. Art.10(2)(b) creates a direct link to GDPR Art.5(1)(b) (purpose limitation) for personal data in training sets.

Operationally, this means each dataset needs a documented collection purpose that is consistent with the system's intended purpose, and any re-use of data originally collected for a different purpose must be assessed and justified.

3. Data Processing Operations (Art.10(2)(c))

All preprocessing, cleaning, enriching, aggregating, and labelling operations must be documented. This creates the processing chain record that links raw inputs to final training datasets. Art.10(2)(c) is the closest Art.10 comes to requiring data lineage tracking — though the obligation is documentation-level, not technical tooling.

The practical implication: every transformation applied to a dataset must be traceable. If you drop rows, normalise features, apply oversampling, or run deduplication, the method and parameters must be documented and repeatable.
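One lightweight way to make transformations traceable and repeatable is to log every step with its method, parameters, and record counts as the pipeline runs. A minimal sketch — the `ProcessingChain` class, its fields, and the dataset name are illustrative, not an official tool:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessingStep:
    """One documented transformation in the Art.10(2)(c) processing chain."""
    name: str            # e.g. "deduplication"
    method: str          # e.g. "exact hash match"
    parameters: dict     # everything needed to repeat the step
    records_in: int
    records_out: int
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProcessingChain:
    """Accumulates a repeatable record of every dataset transformation."""

    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id
        self.steps: list[ProcessingStep] = []

    def record(self, name, method, parameters, records_in, records_out):
        self.steps.append(
            ProcessingStep(name, method, parameters, records_in, records_out)
        )

    def summary(self) -> list[dict]:
        # Documentation-ready view: method + parameters make each step repeatable
        return [
            {"step": i + 1, "name": s.name, "method": s.method,
             "parameters": s.parameters, "removed": s.records_in - s.records_out}
            for i, s in enumerate(self.steps)
        ]


chain = ProcessingChain("clinical-notes-v3")
chain.record("deduplication", "exact hash match", {"hash": "sha256"}, 120_000, 114_500)
chain.record("quality filter", "min token length", {"min_tokens": 20}, 114_500, 109_200)
```

Because each entry captures both method and parameters, the chain summary can go straight into the Annex IV processing-chain documentation.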

4. Formulation of Assumptions (Art.10(2)(d))

Providers must document the assumptions made regarding data — particularly assumptions about what the data represents, such as whether a recorded label is a reliable proxy for the real-world outcome the system is meant to predict.

This is one of the most underestimated obligations. The EU AI Act does not require providers to guarantee their assumptions are correct — but it requires that they be explicit, so that supervisory authorities can evaluate whether the assumptions were reasonable.

5. Availability, Quantity, and Suitability Assessment (Art.10(2)(e))

Providers must examine whether the data is:

This is where dataset size justification becomes a compliance artefact. The technical documentation must contain a reasoned assessment of why the dataset size is adequate, not merely a count of samples.
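One way to turn "enough data" into a reasoned, documentable claim is a simple margin-of-error calculation for each group the system will be evaluated on. This is an illustrative statistical approach, not a method Art.10 prescribes:

```python
import math


def min_group_size(margin: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
    """Worst-case sample size needed to estimate a per-group rate
    within +/- margin at ~95% confidence (normal approximation)."""
    return math.ceil((z ** 2 * p * (1 - p)) / margin ** 2)


# Estimating each group's error rate within +/- 5 percentage points
# needs at least this many samples in that group:
n_required = min_group_size(margin=0.05)  # 385
```

A sentence like "each evaluation group contains at least 385 samples, sufficient to estimate its error rate within ±5 points at 95% confidence" is a reasoned assessment; a bare sample count is not.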

6. Examination for Biases (Art.10(2)(f))

The final obligation is to examine whether training, validation, and testing data may contain biases that could lead to violations of EU law or harm to health, safety, or fundamental rights, and if applicable, to mitigate those biases.

Art.10(2)(f) works in conjunction with Art.10(5) (the special-category carve-out). The obligation to examine for bias is unconditional. The ability to process special-category data to detect and correct that bias is conditional on the Art.10(5) requirements being met.


Dataset Quality Requirements: Art.10(3) and (4)

Beyond the six management obligations, Art.10 sets substantive quality standards:

Art.10(3): Training, validation, and testing datasets shall be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose.

The phrase "to the best extent possible" is a proportionality qualifier — perfect data is not required, but providers must make demonstrable, documented efforts to maximise data quality within reasonable constraints.

Art.10(4): Datasets shall have the appropriate statistical properties, including regarding persons or groups of persons on whom the system produces outputs. This is particularly relevant for systems whose outputs affect identifiable groups of people, such as the employment screening, creditworthiness, and biometric identification categories of Annex III.

Statistical property documentation is one of the hardest things to retrofit. Providers who collect data without capturing demographic metadata will struggle to demonstrate Art.10(4) compliance at audit time.
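Capturing that metadata at collection time can be as simple as recording per-group counts and shares alongside the dataset. A sketch using pandas — the function name and column names are illustrative:

```python
import pandas as pd


def statistical_properties(df: pd.DataFrame, group_cols: list[str]) -> dict:
    """Per-group counts and shares for the Art.10(4) documentation record."""
    out = {}
    for col in group_cols:
        counts = df[col].value_counts(dropna=False)
        out[col] = {
            str(k): {"n": int(v), "share": round(float(v) / len(df), 4)}
            for k, v in counts.items()
        }
    return out


data = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "M", "F"],
    "age_band": ["18-39", "40-64", "18-39", "65+", "40-64", "18-39"],
})
props = statistical_properties(data, ["sex", "age_band"])
```

Running this at collection time, and again after every split, gives the before/after statistical profile that an Art.10(4) audit will ask for.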


Dataset Split Strategy for Art.10 Compliance

Art.10 applies separately to training, validation, and testing datasets. This has operational consequences for how you structure your ML pipeline:

The Three-Dataset Problem

Many practitioners treat the train/test split as a technical detail. Under Art.10, it becomes a compliance artefact: each of the three datasets must independently satisfy the Art.10(2) governance obligations and the Art.10(3) quality standards, and the split methodology itself must be documented.

Avoiding Contamination

Data contamination — where information from the test set leaks into training through shared preprocessing, shared tokenisation vocabularies, or shared normalisation statistics — creates a compliance gap. Art.10(2)(c) requires documentation of all processing operations. If preprocessing statistics are derived from the full dataset before splitting, this must be documented and justified.

Best practice under Art.10:

  1. Define splits before any exploratory analysis
  2. Compute normalisation statistics on training split only
  3. Apply the same transformations to validation and test splits using training-derived parameters
  4. Document the split methodology, including any stratification used to maintain class balance
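The four steps above can be sketched with pandas alone; the column names, the 80/20 ratio, and the `split_doc` structure are illustrative choices, not prescribed by the Act:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.normal(size=200),
    "group": ["A"] * 150 + ["B"] * 50,   # stratification variable
})

# 1. Define the split first, stratified so both splits keep the 75/25 group mix
test = df.groupby("group", group_keys=False).sample(frac=0.2, random_state=42)
train = df.drop(test.index)

# 2-3. Normalisation statistics come from the training split only and are
# applied unchanged to the held-out split, so nothing leaks from test data
mu, sigma = train["feature"].mean(), train["feature"].std()
train = train.assign(feature_norm=(train["feature"] - mu) / sigma)
test = test.assign(feature_norm=(test["feature"] - mu) / sigma)

# 4. Record the methodology for the technical documentation
split_doc = {
    "method": "stratified holdout",
    "stratify_on": "group",
    "test_fraction": 0.2,
    "seed": 42,
    "normalisation": {"mean": float(mu), "std": float(sigma)},
}
```

Recording the seed and the frozen normalisation parameters in `split_doc` is what makes the split reproducible for an Art.10(2)(c) audit.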

Stratified Splits and Art.10(4)

Where Art.10(4) requires appropriate statistical properties "including regarding persons or groups of persons," stratified splitting becomes not just good practice but a compliance obligation for Annex III systems. If your system affects individuals in protected-characteristic categories (age, sex, ethnicity, disability), the test set must contain sufficient representation to enable meaningful disaggregated evaluation.


Data Lineage Documentation for Annex IV

Annex IV Section 2 requires technical documentation to include "a description of the training, testing and validation methodologies used including information on the type, origin, and provenance of the training data sets."

Art.10(2)(a)–(c) translate into a lineage record that must capture at minimum:

Dataset Lineage Record
├── Source Registration
│   ├── Source ID, Name, Type (licensed corpus, web scrape, internal data)
│   ├── Access date and URL/endpoint
│   ├── License or data processing agreement reference
│   └── Known limitations of the source
├── Collection Run Record
│   ├── Collection method (API, download, annotator task)
│   ├── Parameters (date range, geographic scope, sampling strategy)
│   ├── Collection timestamp
│   └── Raw record count
├── Processing Chain
│   ├── Step 1: Deduplication (method, parameters, records removed)
│   ├── Step 2: Quality filtering (criteria, threshold, records removed)
│   ├── Step N: ... (all transformation steps)
│   └── Final record count per split
└── Quality Assessment
    ├── Bias examination method and findings (Art.10(2)(f))
    ├── Statistical properties documentation (Art.10(4))
    └── Suitability assessment (Art.10(2)(e))

This record must be reproducible — a supervisory authority requesting a repeat of the data preparation pipeline must be able to reconstruct the final dataset from the lineage record.
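One way to make the record itself verifiable is to serialise it deterministically and hash it, so an auditor can confirm that a re-run of the pipeline was driven by the same documented inputs. A sketch in which all field names and values are hypothetical:

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class LineageRecord:
    source_id: str
    license_ref: str
    collection_method: str
    collection_params: dict
    processing_steps: list
    final_counts: dict  # record count per split

    def fingerprint(self) -> str:
        """Stable SHA-256 over the canonical JSON form of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


record = LineageRecord(
    source_id="src-001",
    license_ref="DPA-2025-17",
    collection_method="vendor API export",
    collection_params={"date_range": "2024-01/2025-06", "scope": "EU"},
    processing_steps=[{"step": "deduplication", "records_removed": 5500}],
    final_counts={"train": 87300, "validation": 10900, "test": 11000},
)
```

Storing the fingerprint with the trained model ties the Annex IV documentation to the exact data preparation run that produced it.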


The Art.10(5) Bias Detection Carve-Out

Art.10(5) contains one of the most important — and most misunderstood — provisions in the EU AI Act. It permits processing of special categories of personal data (GDPR Art.9(1) data — race, ethnic origin, health, biometric data, etc.) specifically for the purpose of bias monitoring, detection, and correction.

The carve-out has strict conditions under Art.10(6):

ConditionRequirement
Strict necessityProcessing must be strictly necessary for bias detection/correction — no broader purpose
Subject to appropriate safeguardsGDPR Art.9(4) safeguards apply; national law may impose additional requirements
No transmission outsideSpecial category data used for bias detection must not be transmitted or further processed beyond this purpose
Protective measuresState-of-the-art security measures, anonymisation as soon as possible

What This Carve-Out Enables in Practice

Without this carve-out, providers would be blocked from using demographic data (ethnicity, health status, disability) to detect that their system performs worse for certain groups, because processing that data for AI development purposes would typically lack a lawful basis under GDPR Art.9(2).

Art.10(5) provides that lawful basis for bias monitoring — but only for bias monitoring. The carve-out cannot be used to train models on special-category features, to build demographic profiles, or to re-use the data for any purpose other than bias monitoring, detection, and correction.

Implementing Art.10(5) Correctly

# Example: Bias evaluation with Art.10(5) carve-out compliance
from dataclasses import dataclass
from datetime import datetime, timezone

import pandas as pd


@dataclass
class BiasReport:
    """Aggregate-only findings — individual-level records never leave the pipeline."""
    metrics: dict
    evaluation_timestamp: datetime
    art10_5_basis: str


class BiasEvaluationPipeline:
    """
    Art.10(5)-compliant bias detection pipeline.
    Special-category data processed under strict necessity;
    anonymised as soon as evaluation completes.
    """

    def __init__(self, model, protected_attributes: list[str]):
        self.model = model
        self.protected_attributes = protected_attributes
        self._evaluation_log = []

    def evaluate_bias(
        self,
        evaluation_dataset: pd.DataFrame,
        label_col: str,
        prediction_col: str,
    ) -> BiasReport:
        # 1. Process under Art.10(5) — strictly necessary scope only
        results = {}
        for attr in self.protected_attributes:
            if attr not in evaluation_dataset.columns:
                continue
            # 2. Compute disaggregated metrics — never retain raw attribute data
            group_metrics = self._compute_group_metrics(
                evaluation_dataset, attr, label_col, prediction_col
            )
            results[attr] = group_metrics

        # 3. Anonymise immediately — no retention beyond this function
        report = BiasReport(
            metrics=results,
            evaluation_timestamp=datetime.now(timezone.utc),
            art10_5_basis="bias monitoring and detection",
        )

        # 4. Log processing for Art.10(2)(c) documentation
        self._evaluation_log.append({
            "timestamp": report.evaluation_timestamp.isoformat(),
            "attributes_examined": self.protected_attributes,
            "records_processed": len(evaluation_dataset),
            "special_category_retained": False,  # critical: must be False
        })

        return report
    
    def _compute_group_metrics(
        self, df: pd.DataFrame, group_col: str, label_col: str, pred_col: str
    ) -> dict:
        metrics = {}
        for group_val in df[group_col].unique():
            mask = df[group_col] == group_val
            group_df = df[mask]
            # Compute only aggregate statistics — never return individual rows
            metrics[str(group_val)] = {
                "n": int(mask.sum()),
                "accuracy": float((group_df[label_col] == group_df[pred_col]).mean()),
                "positive_rate": float(group_df[pred_col].mean()),
            }
        return metrics
    
    def get_processing_log(self) -> list[dict]:
        """Returns audit log for Art.10(2)(c) documentation."""
        return self._evaluation_log

The critical constraint: special_category_retained: False. At no point should the pipeline persist individual-level special-category records. Only aggregate metrics flow out.
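The aggregate-only pattern can also be shown in isolation: only group-level statistics leave the function, and the frame holding the special-category rows is dropped immediately afterwards. The function and column names here are illustrative:

```python
import pandas as pd


def disaggregated_accuracy(df: pd.DataFrame, group_col: str,
                           label_col: str, pred_col: str) -> dict:
    """Aggregate-only output: per-group n and accuracy, never raw rows."""
    out = {}
    for group_val, group_df in df.groupby(group_col):
        out[str(group_val)] = {
            "n": len(group_df),
            "accuracy": float((group_df[label_col] == group_df[pred_col]).mean()),
        }
    return out


eval_df = pd.DataFrame({
    "ethnicity": ["a", "a", "b", "b"],   # special-category, carve-out scope only
    "label":     [1, 0, 1, 0],
    "pred":      [1, 0, 0, 0],
})
report = disaggregated_accuracy(eval_df, "ethnicity", "label", "pred")
del eval_df  # no retention of individual-level special-category data
```

A per-group accuracy gap in the output (here 1.0 versus 0.5) is exactly the disaggregated evidence the Art.10(2)(f) examination must document.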


Integration with Art.9: The Data-Risk Loop

Art.10 and Art.9 form a closed loop in the high-risk compliance framework:

Art.9(2): Identify risks from data (including biases in training data)
     ↓
Art.10(2)(f): Examine training data for biases that could cause those risks
     ↓
Art.10(5): Process special-category data to detect and correct bias
     ↓
Art.9(3): Evaluate residual risk after bias correction — is it acceptable?
     ↓
Art.9(6): Test with Art.10(3)-compliant testing datasets
     ↓
Art.9(4)(d): Update risk management when new data bias evidence emerges
     ↑ (feedback loop)

This loop has a practical implication: Art.10 data governance is not a one-time pre-training activity. Post-market monitoring under Art.72 can generate new evidence of bias in deployment (e.g., demographic disparities in outcomes). When this evidence emerges, Art.9 requires the RMS to be updated, which may trigger a new round of Art.10 data review and retraining.

The living document obligation in Art.9 extends to your data governance records. The Annex IV technical documentation must reflect the current state of your dataset and bias mitigation, not the state at initial market placement.


Art.10 × Art.15: Accuracy, Robustness, and Dataset Quality

Art.15 requires high-risk AI systems to achieve "appropriate levels of accuracy, robustness and cybersecurity." The connection to Art.10 is direct: the quality of the training dataset is the primary determinant of accuracy and robustness.

Art.15 does not set specific accuracy thresholds — those are determined by the harmonised standards or common specifications adopted under Art.40–41. But providers cannot invoke Art.15 compliance without demonstrating that the training data met Art.10 standards, because Art.15 performance cannot be evaluated independently of the data used to produce it.

For Annex IV technical documentation, this means accuracy metrics must be accompanied by the lineage records of the datasets on which they were measured, the split methodology used, and the disaggregated per-group results that Art.10(4) requires.


Art.10 Checklist — 30 Items

Use this checklist to assess Art.10 readiness before market placement:

Data Governance Documentation (Art.10(2))

Dataset Quality (Art.10(3)–(5))

Statistical Properties (Art.10(4))

Dataset Split Protocol

Bias Detection (Art.10(2)(f) and Art.10(5))

Lineage and Annex IV


Common Art.10 Failure Modes

Treating Art.10 as a one-time gate. Art.10 data governance must be updated when new post-market evidence suggests dataset issues. Providers who treat the initial data review as permanent are non-compliant when deployment evidence reveals bias.

Not separating training statistics from test split. Normalisation using full-dataset statistics contaminates the test split. This is not just a methodological problem — it means your Annex IV accuracy claim is based on a compromised evaluation.

Conflating the Art.10(5) carve-out with a general AI training basis. Art.10(5) permits special-category processing only for bias monitoring, detection, and correction. Using the carve-out as a general authorisation to collect demographic data for training purposes is a GDPR Art.9 violation.

Failing to document the absence of bias. "No bias found" is a finding that requires documentation: what method was used, what threshold was applied, what groups were examined. An absence claim without a documented examination is not Art.10(2)(f) compliance.

Missing Art.10(4) for sub-groups. Art.10(4) requires statistical properties to be appropriate "regarding persons or groups of persons." This is not satisfied by overall accuracy metrics. Disaggregated evaluation is required.


Key Takeaways

Art.10 is the data foundation of the EU AI Act's high-risk framework. Its requirements are specific, auditable, and interconnected with Art.9 (risk management), Art.15 (accuracy), and Annex IV (technical documentation).

The three highest-impact Art.10 requirements for most providers:

  1. Data lineage documentation: Every source, collection procedure, and preprocessing operation must be traceable. Retrofitting this after model development is significantly harder than building it in from the start.

  2. Disaggregated bias examination: Art.10(2)(f) requires examining for biases across relevant demographic groups, not just overall performance. Missing sub-group evaluation is the most common Art.10 gap found in conformity assessments.

  3. Art.10(5) compliance for bias detection: If your bias detection pipeline touches ethnicity, health status, or other special-category data, you need the Art.10(5) carve-out conditions met — including strict necessity, immediate aggregation, and no retention of individual-level special-category records.

The August 2, 2026 deadline creates urgency for providers who have not yet built these practices into their ML pipelines. The technical documentation required under Annex IV cannot be assembled after the fact without access to the original data sources, collection records, and preprocessing logs.


See also: EU AI Act Art.9: Risk Management System for High-Risk AI (2026) · EU AI Act Art.8: Compliance Requirements for High-Risk AI Systems · EU AI Act Art.13: Transparency and Instructions for Use (2026)