EU AI Act Article 10: Data Governance Requirements for High-Risk AI Training Datasets (2026)
On August 2, 2026, Article 10 of the EU AI Act begins applying to providers of new high-risk AI systems placed on the EU market. If you train, fine-tune, validate, or test a model that falls under Annex III — covering biometric identification, critical infrastructure, education, employment screening, credit scoring, law enforcement, and emergency healthcare triage, among other areas — Article 10 is your mandatory data governance framework.
Developers who treat Article 10 as a vague "use good data" requirement are misreading the regulation. Article 10 contains six specific, auditable data governance obligations in Art.10(2) alone, plus distinct quality criteria, statistical requirements, contextual obligations, and a special-category processing carve-out for bias detection that developers routinely overlook.
This guide covers what Art.10 actually demands, how its requirements intersect with Art.9 (risk management), Art.15 (accuracy), and Art.17 (QMS), where CLOUD Act jurisdiction creates documentation exposure, and how to implement Art.10-compliant data governance in Python.
What Article 10 Actually Requires
Article 10 applies to training, validation, and testing datasets for high-risk AI systems. The article has six subsections:
- Art.10(1): Training, validation, and testing datasets shall be subject to appropriate data governance and management practices
- Art.10(2): Data governance and management practices shall cover specific areas (the six obligations)
- Art.10(3): Datasets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete
- Art.10(4): Datasets shall have the appropriate statistical properties, including regarding persons or groups of persons
- Art.10(5): Datasets shall take into account characteristics of the specific geographical, contextual, behavioural or functional setting
- Art.10(6): Providers may process special categories of personal data for bias monitoring, detection, and correction under strict conditions
The operational core is Art.10(2), which lists the six areas that data governance practices must cover.
Art.10(2): The Six Data Governance Obligations
Art.10(2) specifies that data governance and management practices must cover:
(a) Design Choices
The intended purpose of the AI system determines which data is relevant. Art.10(2)(a) requires documentation of the design choices made — why specific datasets were selected, what alternatives were considered, and what the dataset is designed to make the model learn. This connects directly to Art.13(1) transparency: users of the system must be able to understand what the system was trained to do.
In practice: your model card must explain dataset selection rationale, not just dataset composition.
(b) Data Collection Processes
Art.10(2)(b) requires documentation of how data was collected — the process, not just the result. This includes:
- The collection method (web scraping, licensed datasets, in-house annotation, third-party data brokers)
- The time period and geographic scope of collection
- The terms under which data was obtained (consent, legitimate interest, public domain)
- Any filtering applied during collection
Why this matters: the collection process determines what systematic biases are baked into the dataset before any preprocessing occurs. A resume screening model trained on historically male-dominated engineering applications will inherit gender bias at collection, not at training.
(c) Data Preparation Operations
Art.10(2)(c) covers relevant data preparation operations including annotation, labelling, cleaning, enrichment, aggregation, and retention. Each of these can introduce or amplify bias:
- Annotation: Inter-annotator agreement must be documented. Whose labels are ground truth?
- Labelling: Label schema decisions shape what the model can predict
- Cleaning: What data was removed and why? Removing "outliers" can remove under-represented groups
- Enrichment: What features were added? Third-party enrichment can introduce proxies for protected characteristics
- Aggregation: How were multiple sources combined? Differential source weights create differential representation
- Retention: What data was excluded from final datasets? Exclusion criteria must be auditable
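The inter-annotator agreement point above can be computed directly rather than asserted. A minimal sketch of Cohen's kappa for two annotators over the same items (pure Python; the function name and example labels are illustrative, not from the regulation):

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)

a = ["spam", "spam", "ham", "ham", "spam", "ham"]
b = ["spam", "ham", "ham", "ham", "spam", "ham"]
print(round(cohens_kappa(a, b), 3))  # 0.667 — below the 0.70 "substantial" target
```

Recording the kappa value (and the annotation schema version it was measured against) is the documentable evidence Art.10(2)(c) asks for.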
(d) Formulation of Relevant Assumptions
Art.10(2)(d) requires explicit documentation of assumptions made about the data — what the provider assumes the data represents, and what the data does not represent. This is an epistemological obligation: you must state what you believe the dataset means and what limitations that belief carries.
Examples:
- "This dataset represents adults aged 18–65 in the DACH region; performance on populations outside this demographic has not been validated"
- "Annotation reflects consensus of English-speaking annotators; cross-cultural validity has not been assessed"
Undocumented assumptions are the most common source of Art.10 non-compliance. If your technical documentation does not state what your training data assumes, you have not met Art.10(2)(d).
(e) Availability, Quantity, and Suitability Assessment
Art.10(2)(e) requires assessment of whether the available data is sufficient in quantity and suitable in quality for the intended purpose. A one-line qualitative statement does not satisfy this — it requires a documented methodology for evaluating dataset sufficiency.
The assessment must cover:
- Is the dataset large enough for the model to generalise? (quantity)
- Does the dataset cover the use cases the system will encounter in deployment? (suitability)
- Are validation and test splits genuinely independent from training data? (methodology)
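The split-independence question can be checked mechanically by fingerprinting records and intersecting the splits. A sketch, assuming records are JSON-serialisable dicts (helper names are hypothetical):

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Stable SHA-256 fingerprint of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def split_leakage(train: list[dict], test: list[dict]) -> set[str]:
    """Fingerprints present in both splits — should be empty for a valid split."""
    train_fps = {record_fingerprint(r) for r in train}
    return {record_fingerprint(r) for r in test} & train_fps

train = [{"text": "invoice overdue", "label": 1}, {"text": "meeting at 3", "label": 0}]
test = [{"text": "invoice overdue", "label": 1}, {"text": "lunch friday", "label": 0}]
leaks = split_leakage(train, test)
print(len(leaks))  # 1 duplicate record leaked into the test split
```

Exact-duplicate hashing is a floor, not a ceiling — near-duplicates and entity-level leakage (same person in both splits) need additional checks.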
(f) Examination for Possible Biases
Art.10(2)(f) requires examination for biases that could result in prohibited discrimination or risks to health and safety. This is the bias detection obligation. Art.10(6) provides a special-category processing carve-out specifically to enable this examination (covered below).
The examination must be documented — not just conducted. "We ran fairness metrics" is insufficient. The documentation must show:
- What bias types were examined (representation bias, measurement bias, aggregation bias)
- What groups were assessed (per protected characteristics)
- What metrics were used and what thresholds were applied
- What findings were made and what mitigations were taken
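A minimal sketch of two commonly used metrics — per-group selection rate (for demographic parity) and per-group true-positive rate (one component of equalised odds) — from parallel prediction, label, and group lists. Function names and thresholds are illustrative, not prescribed by the Act:

```python
def group_rates(preds: list[int], labels: list[int], groups: list[str]) -> dict:
    """Per-group selection rate and TPR from parallel lists."""
    stats = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        # Selection rate: fraction of the group receiving a positive decision.
        sel = sum(preds[i] for i in idx) / len(idx)
        # TPR: fraction of true positives in the group that were detected.
        pos = [i for i in idx if labels[i] == 1]
        tpr = sum(preds[i] for i in pos) / len(pos) if pos else float("nan")
        stats[g] = {"selection_rate": sel, "tpr": tpr}
    return stats

def parity_gap(stats: dict, key: str) -> float:
    """Max-min spread of a metric across groups; 0.0 means perfect parity."""
    vals = [s[key] for s in stats.values()]
    return max(vals) - min(vals)

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [1, 0, 1, 0, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
stats = group_rates(preds, labels, groups)
print(round(parity_gap(stats, "selection_rate"), 2))  # 0.5 — a large parity gap
```

The Art.10(2)(f) documentation would record the metric, the computed gap, the threshold applied, and the mitigation decision.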
Art.10(3): Quality Criteria — Relevant, Representative, Error-Free
Art.10(3) sets quality criteria for datasets:
Training, validation, and testing datasets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose.
The three quality criteria operate jointly:
Relevant: The dataset must relate to the decision the system makes. A fraud detection model trained on transactions from a different regulatory jurisdiction may be technically high-quality but irrelevant to the deployment context.
Sufficiently representative: The dataset must reflect the diversity of cases the system will encounter in deployment. "Sufficiently representative" is a relative standard — representative relative to the use case, the affected population, and the intended deployment context. A dataset that is representative for urban German users may not be representative for rural Polish users of the same system.
Free of errors and complete: "To the best extent possible" softens this requirement, but it creates a due diligence obligation. The provider must document the error rate found and the mitigation applied — not claim that data is error-free without evidence.
Art.10(4): Statistical Properties and Fairness
Art.10(4) requires that datasets have appropriate statistical properties, including regarding persons or groups of persons. This is the quantitative fairness requirement.
"Appropriate statistical properties" means:
- The distribution of examples across the outcome space is appropriate for the task
- Where the system affects specific demographic groups, the dataset must have adequate representation of those groups
- Statistical properties must be documented and justified (not just asserted)
For classification tasks, this requires class balance analysis. For regression tasks, coverage of the output range must be demonstrated. For systems affecting protected groups under EU law (gender, race, ethnicity, age, disability, sexual orientation, religion), the dataset distribution across those groups must be documented and assessed for differential performance risk.
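That documentation step can be sketched as a comparison of the dataset's empirical group distribution against the expected deployment distribution, flagging deviations. The tolerance value and names below are illustrative assumptions, not figures from the Act:

```python
def distribution(labels: list[str]) -> dict[str, float]:
    """Empirical class/group distribution of a dataset."""
    n = len(labels)
    return {c: labels.count(c) / n for c in sorted(set(labels))}

def representation_gaps(dataset_dist: dict[str, float],
                        deployment_dist: dict[str, float],
                        tolerance: float = 0.10) -> dict:
    """Groups whose dataset share deviates from the deployment share by > tolerance."""
    gaps = {}
    for group, expected in deployment_dist.items():
        observed = dataset_dist.get(group, 0.0)
        if abs(observed - expected) > tolerance:
            gaps[group] = {"dataset": observed, "deployment": expected}
    return gaps

train_groups = ["18-40"] * 70 + ["41-65"] * 25 + ["65+"] * 5
deployment = {"18-40": 0.45, "41-65": 0.35, "65+": 0.20}
gaps = representation_gaps(distribution(train_groups), deployment)
print(sorted(gaps))  # ['18-40', '65+'] — over- and under-represented groups
```

The output, plus the justification for the chosen tolerance, is what "documented and justified (not just asserted)" means in practice.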
Art.10(5): Contextual Appropriateness
Art.10(5) adds a contextual dimension: datasets shall take into account characteristics of the specific geographical, contextual, behavioural, or functional setting in which the high-risk AI system is intended to be used.
This obligation targets geographic and contextual transfer problems. A system validated on UK hospital records that is deployed in German hospitals faces an Art.10(5) gap — different healthcare system structure, different patient demographics, different documentation practices. The provider must document how the training data maps to the intended deployment context.
Contextual characteristics include:
- Geographical: Country, region, urban/rural, language
- Contextual: Professional domain, organisational setting, regulatory environment
- Behavioural: How end users interact with the system vs. how training data was generated
- Functional: The specific decisions the system makes vs. the decisions in training
Art.10(6): Special Categories for Bias Detection
Art.10(6) is the most technically important carve-out in Art.10. It permits providers to process special categories of personal data (GDPR Art.9 data — race, ethnic origin, health, sexual orientation, political opinion, religious belief) for the purpose of detecting and correcting biases in high-risk AI systems, subject to strict conditions:
- Appropriate safeguards: encryption, pseudonymisation, access controls
- Strictly necessary: the special-category processing must be limited to what is required for bias detection
- Only data for bias detection: the special-category data may not be used for training the AI system itself
- State-of-the-art security: as defined in applicable data protection law
The Art.10(6) carve-out resolves a practical conflict: you cannot examine whether a model discriminates by ethnicity without knowing the ethnicity of your validation set subjects. GDPR Art.9 normally prohibits processing ethnicity data. Art.10(6) creates a derogation from that prohibition, but only for the bias detection purpose.
In practice: special-category data used for Art.10(6) bias assessment must be:
- Separated from training data (cannot be fed into the model)
- Pseudonymised at minimum
- Deleted after the bias assessment is complete (or retained only as aggregate statistics)
- Covered by a specific GDPR legal basis documentation entry
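For the pseudonymisation item, a keyed hash is safer than a plain hash of the identifier: a plain SHA-256 of a low-entropy ID (patient numbers, employee IDs) can be reversed by enumeration. A sketch using HMAC-SHA256; the key handling shown is illustrative only, and in practice the key must live outside the data store:

```python
import hashlib
import hmac

def pseudonymise(subject_id: str, secret_key: bytes) -> str:
    """Keyed pseudonym: stable within one assessment, unlinkable without the key."""
    return hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

key = b"rotate-and-store-separately"  # hypothetical key; hold it apart from the dataset
row = {
    "subject": pseudonymise("patient-4711", key),  # no raw identifier retained
    "ethnicity": "group_a",
    "model_output": 1,
}
print(len(row["subject"]))  # 16
```

Destroying the key after the bias assessment is one way to implement the post-assessment deletion requirement while retaining aggregate statistics.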
Art.10 × Art.9: The Risk Management Loop
Art.10 and Art.9 form a closed feedback loop:
Art.9 identifies foreseeable risks
↓
Art.10(2)(f) examines training data for bias risks
↓
Bias findings feed back into Art.9 risk assessment
↓
Art.9 requires risk management measures
↓
Measures implemented in data pipeline or model training
↓
Art.10(3) quality criteria re-assessed
↓
Loop continues throughout system lifecycle
The key connection: Art.9(2)(a) requires identification and analysis of known and reasonably foreseeable risks. A foreseeable risk for any model trained on demographic data is differential performance across protected groups. Art.10(2)(f) is the mechanism by which that foreseeable risk is examined and documented. The bias examination results must flow back into the Art.9 risk management record.
Providers who maintain Art.9 and Art.10 documentation in separate silos — a risk register and a dataset card without cross-references — are meeting neither requirement in the spirit of the regulation.
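The cross-referencing can be enforced mechanically. A sketch that checks bidirectional links between an Art.9 risk register and Art.10(2)(f) bias findings; the ID scheme and function name are hypothetical:

```python
def cross_reference_gaps(risks: dict[str, list[str]],
                         findings: dict[str, list[str]]) -> dict:
    """Find risks citing unknown findings, findings citing unknown risks,
    and risks with no bias examination linked at all."""
    dangling_findings = {f for refs in risks.values() for f in refs if f not in findings}
    dangling_risks = {r for refs in findings.values() for r in refs if r not in risks}
    unexamined = [r for r, refs in risks.items() if not refs]
    return {"dangling": sorted(dangling_findings | dangling_risks),
            "unexamined_risks": unexamined}

risks = {"R-01": ["BF-01"], "R-02": []}            # R-02 has no Art.10(2)(f) link
findings = {"BF-01": ["R-01"], "BF-02": ["R-09"]}  # BF-02 cites an unknown risk
gaps = cross_reference_gaps(risks, findings)
print(gaps["unexamined_risks"], gaps["dangling"])  # ['R-02'] ['R-09']
```

Running a check like this in CI over the documentation repository makes silo drift between the risk register and the dataset card visible before an auditor does.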
Art.10 × Art.15: Accuracy, Robustness, and Dataset Quality
Art.15 requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity. Dataset quality under Art.10 is the upstream determinant of achievable accuracy.
The intersection creates a compliance chain:
- Art.10(3): Dataset must be relevant and representative → enables Art.15 accuracy
- Art.10(4): Statistical properties must be appropriate → enables Art.15 robustness
- Art.10(5): Contextual match between training and deployment → enables Art.15 accuracy in the specific deployment setting
A system that meets Art.15 accuracy thresholds on a non-representative test set while failing Art.10(3) representativeness criteria is non-compliant with both articles. The test set itself must meet Art.10 quality criteria — it cannot be selected to produce favourable accuracy numbers.
CLOUD Act × Art.10: Where Training Data Must Be Stored
Training datasets for high-risk AI systems create documentation obligations that intersect with CLOUD Act jurisdiction risk.
Art.10(2) requires documentation of data governance and management practices. Art.18 requires that technical documentation — which includes dataset documentation under Annex IV — be retained and made available to competent authorities for 10 years after the system is placed on the market.
If your training data documentation, dataset bias assessments, and data governance records are stored on US-provider infrastructure, those records are accessible to US authorities under the CLOUD Act without EU judicial review or notification to you. For high-risk AI systems operating in regulated sectors (health, employment, law enforcement), this creates a dual exposure:
- Audit risk: If an EU market surveillance authority requests Art.10(2) documentation during an incident investigation, CLOUD Act disclosure means that documentation may already have been accessed by US law enforcement without your knowledge
- GDPR cross-reference: Special-category data processed under Art.10(6) for bias detection is subject to GDPR Chapter V transfer restrictions — storing it on US infrastructure without appropriate safeguards creates a separate GDPR violation
The practical mitigation: store Art.10 documentation on EU-sovereign infrastructure. Training datasets themselves (if they contain personal data) must comply with GDPR transfer rules regardless of Art.10.
Python Implementation: Art.10-Compliant Data Governance
```python
from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum
from typing import Optional


class BiasType(Enum):
    REPRESENTATION = "representation"
    MEASUREMENT = "measurement"
    AGGREGATION = "aggregation"
    HISTORICAL = "historical"
    DEPLOYMENT = "deployment"


class DatasetSplit(Enum):
    TRAINING = "training"
    VALIDATION = "validation"
    TEST = "test"


@dataclass
class DataCollectionRecord:
    """Art.10(2)(b) — Data collection process documentation."""
    method: str  # e.g., "licensed_dataset", "web_scrape", "in_house_annotation"
    time_period_start: date
    time_period_end: date
    geographic_scope: list[str]  # ISO 3166-1 country codes
    legal_basis: str  # e.g., "consent", "legitimate_interest", "public_domain"
    third_party_sources: list[str]
    filters_applied: list[str]
    collection_date: date = field(default_factory=date.today)

    def audit_record(self) -> dict:
        return {
            "method": self.method,
            "period": f"{self.time_period_start}–{self.time_period_end}",
            "geography": self.geographic_scope,
            "legal_basis": self.legal_basis,
            "third_party_sources": len(self.third_party_sources),
            "filters": self.filters_applied,
        }


@dataclass
class DataPreparationRecord:
    """Art.10(2)(c) — Data preparation operations documentation."""
    annotation_schema_version: str
    inter_annotator_agreement: Optional[float]  # Cohen's kappa or similar
    labelling_guidelines_hash: str  # SHA-256 of labelling spec
    cleaning_operations: list[str]
    records_removed: int
    removal_reasons: list[str]
    enrichment_sources: list[str]
    aggregation_weights: dict[str, float]  # source → weight
    retention_policy_days: int

    def validate_iaa(self) -> bool:
        """Flag inter-annotator agreement below the acceptable threshold."""
        if self.inter_annotator_agreement is None:
            return False  # IAA not measured — non-compliant
        return self.inter_annotator_agreement >= 0.70  # κ ≥ 0.70: substantial agreement


@dataclass
class BiasAssessmentRecord:
    """Art.10(2)(f) + Art.10(6) — Bias examination documentation."""
    assessed_by: str
    assessment_date: date
    bias_types_examined: list[BiasType]
    protected_characteristics_assessed: list[str]  # e.g., ["gender", "age", "ethnicity"]
    special_category_data_used: bool  # Art.10(6) carve-out activated
    special_category_safeguards: list[str]  # e.g., ["pseudonymised", "separate_split", "deleted_after_assessment"]
    metrics_used: list[str]  # e.g., ["demographic_parity", "equalised_odds", "calibration"]
    findings: list[str]
    mitigations_applied: list[str]
    residual_risk_documented: bool

    def art10_6_compliant(self) -> bool:
        """Check Art.10(6) special-category processing safeguards."""
        if not self.special_category_data_used:
            return True
        required_safeguards = {"pseudonymised", "separate_split", "deleted_after_assessment"}
        actual = set(self.special_category_safeguards)
        return required_safeguards.issubset(actual)


@dataclass
class DatasetQualityAssessment:
    """Art.10(3)/(4)/(5) — Dataset quality, statistical properties, contextual fit."""
    dataset_id: str
    split: DatasetSplit
    total_records: int
    error_rate_percent: float  # Art.10(3) — free of errors
    class_distribution: dict[str, float]  # Art.10(4) — statistical properties
    intended_deployment_geography: list[str]
    training_data_geography: list[str]
    contextual_gap_documented: bool  # Art.10(5) — contextual characteristics
    contextual_gap_description: Optional[str]
    sufficiency_assessment_method: str  # Art.10(2)(e)
    sufficiency_conclusion: str

    def geographic_coverage_gap(self) -> list[str]:
        """Identify deployment geographies not represented in training data."""
        deploy_set = set(self.intended_deployment_geography)
        train_set = set(self.training_data_geography)
        return list(deploy_set - train_set)

    def quality_flags(self) -> list[str]:
        flags = []
        if self.error_rate_percent > 2.0:
            flags.append(f"HIGH_ERROR_RATE: {self.error_rate_percent}% (>2% threshold)")
        gaps = self.geographic_coverage_gap()
        if gaps:
            flags.append(f"GEOGRAPHIC_GAP: {gaps}")
        if not self.contextual_gap_documented and gaps:
            flags.append("CONTEXTUAL_GAP_NOT_DOCUMENTED: Art.10(5) non-compliant")
        return flags


@dataclass
class Art10DataGovernanceRecord:
    """Complete Art.10 data governance record for a high-risk AI system dataset."""
    system_id: str
    system_name: str
    intended_purpose: str  # Art.10(2)(a) design choice basis
    design_choice_rationale: str  # Art.10(2)(a)
    assumptions: list[str]  # Art.10(2)(d)
    collection: DataCollectionRecord
    preparation: DataPreparationRecord
    quality: DatasetQualityAssessment
    bias_assessment: BiasAssessmentRecord
    record_created: datetime = field(default_factory=datetime.now)
    retained_until: Optional[date] = None  # Art.18: 10 years after placing on the market

    def compliance_summary(self) -> dict:
        """Generate Art.10 compliance summary for technical documentation."""
        flags = self.quality.quality_flags()
        return {
            "system_id": self.system_id,
            "art10_2a_design_documented": bool(self.design_choice_rationale),
            "art10_2b_collection_documented": self.collection is not None,
            "art10_2c_preparation_documented": self.preparation is not None,
            "art10_2c_iaa_valid": self.preparation.validate_iaa(),
            "art10_2d_assumptions_documented": len(self.assumptions) > 0,
            "art10_2e_sufficiency_assessed": bool(self.quality.sufficiency_assessment_method),
            # Examination counts even when it produced zero findings; what matters
            # is that bias types were actually examined.
            "art10_2f_bias_examined": len(self.bias_assessment.bias_types_examined) > 0,
            "art10_3_quality_flags": flags,
            "art10_4_class_distribution_documented": len(self.quality.class_distribution) > 0,
            "art10_5_contextual_gaps": self.quality.geographic_coverage_gap(),
            "art10_6_compliant": self.bias_assessment.art10_6_compliant(),
            "overall_compliant": len(flags) == 0 and self.bias_assessment.art10_6_compliant(),
        }
```
Five Common Art.10 Mistakes
Mistake 1: Using the Test Set to Validate Quality, Not to Measure It
Art.10(3) requires that training, validation, and testing datasets are relevant, representative, and error-free. A common mistake is applying data quality checks only to the training set, then using the test set as-is from a benchmark. Benchmark test sets often have known systematic biases (ImageNet label noise, NLP benchmark contamination) — using them without documentation violates Art.10(3) for the test split.
Mistake 2: Treating Art.10(2)(d) Assumptions as Optional
The obligation to document formulated assumptions is frequently omitted from dataset cards. Developers assume this is covered by "model limitations" sections. It is not. Art.10(2)(d) requires explicit identification of what the provider assumes the dataset represents — not just what the model cannot do. An undocumented assumption is an undocumented foreseeable risk under Art.9(2)(a).
Mistake 3: Confusing Statistical Balance with Representativeness
Art.10(4) requires appropriate statistical properties regarding persons or groups of persons. A dataset that is numerically balanced (equal numbers of examples per class) may still be unrepresentative if the class structure does not reflect real-world incidence rates. A fraud detection model trained on 50/50 fraud/legitimate splits will be miscalibrated on production data where fraud is 0.1% — and Art.10(4) requires this calibration question to be addressed.
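The calibration point can be made concrete with the standard prior-shift correction: re-weight a predicted probability from the training class prior to the deployment prior. A sketch (function name illustrative; real systems should also validate calibration empirically on deployment-distributed data):

```python
def adjust_for_prior_shift(p: float, train_prior: float, deploy_prior: float) -> float:
    """Re-weight a predicted positive probability from the training prior
    to the deployment prior via a Bayes odds correction."""
    odds = (p / (1 - p)) * (deploy_prior / train_prior) * ((1 - train_prior) / (1 - deploy_prior))
    return odds / (1 + odds)

# A score of 0.5 under a 50/50 training split is far weaker evidence
# when deployment fraud incidence is 0.1%.
adjusted = adjust_for_prior_shift(0.5, train_prior=0.5, deploy_prior=0.001)
print(round(adjusted, 4))  # 0.001
```

Documenting the training distribution, the expected deployment incidence, and the correction (or the decision not to correct) is what addressing "this calibration question" looks like in the Art.10(4) record.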
Mistake 4: Not Activating Art.10(6) When Needed, Then Not Meeting Its Safeguards When Activated
Providers either skip bias examination entirely (because they do not want to process sensitive demographics) — violating Art.10(2)(f) — or they process special-category data for bias assessment without implementing the Art.10(6) safeguards (pseudonymisation, strict separation from training data, post-assessment deletion) — violating both Art.10(6) and GDPR Art.9.
Mistake 5: Treating Data Governance Documentation as a One-Time Artefact
Art.10 applies throughout the system lifecycle. When training data is updated, augmented, or replaced — as happens during retraining cycles — Art.10 documentation must be updated. A dataset card written at initial launch that is not updated when retraining occurs creates a documentation gap. This gap becomes an Art.10 violation during any market surveillance investigation that requests current dataset documentation.
30-Item Art.10 Data Governance Checklist
Art.10(2)(a) — Design Choices
- 1. Dataset selection rationale documented: why this dataset for this intended purpose
- 2. Alternative datasets considered and rejected — reasons documented
- 3. Intended purpose mapped to specific data requirements
Art.10(2)(b) — Data Collection
- 4. Collection method documented (web scrape / licensed / in-house / broker)
- 5. Collection time period and geographic scope documented
- 6. Legal basis for collection documented (consent / LI / public domain)
- 7. Third-party data sources listed with provenance records
- 8. Collection-stage filters documented
Art.10(2)(c) — Data Preparation
- 9. Annotation schema version-controlled and hashed
- 10. Inter-annotator agreement measured and documented (κ ≥ 0.70 target)
- 11. Labelling guidelines available in technical documentation
- 12. Cleaning operations documented: records removed + removal reasons
- 13. Enrichment sources listed with bias-introduction assessment
- 14. Aggregation weights documented per source
- 15. Retention policy documented (training vs. validation vs. test)
Art.10(2)(d) — Assumptions
- 16. Explicit assumptions about what the dataset represents — documented
- 17. Explicit assumptions about populations NOT represented — documented
- 18. Assumptions validated against deployment context (Art.10(5) cross-check)
Art.10(2)(e) — Sufficiency Assessment
- 19. Dataset quantity assessment: sufficient for intended model generalisation
- 20. Dataset suitability assessment: covers deployment scenarios
- 21. Train/validation/test split independence verified
- 22. Sufficiency methodology and conclusion documented
Art.10(2)(f) + Art.10(3)/(4) — Bias Examination and Quality
- 23. Bias types examined: representation, measurement, aggregation, historical
- 24. Protected characteristics assessed: gender, age, ethnicity, disability (as applicable)
- 25. Fairness metrics applied: demographic parity, equalised odds, calibration
- 26. Error rate measured per split — documented with mitigation
- 27. Class distribution documented and calibration assessed (Art.10(4))
- 28. Bias findings documented with mitigations applied and residual risk noted
Art.10(5) — Contextual Appropriateness
- 29. Geographic coverage gap between training data and deployment context — documented
- 30. Contextual, behavioural, and functional gaps documented with performance validation plan
Art.10(6) — Special-Category Data for Bias Detection (if applicable)
- Triggered only if special-category demographic data was processed for bias assessment
- Special-category data pseudonymised before use
- Special-category data kept strictly separate from training split
- Special-category data deleted after bias assessment (aggregate statistics retained)
- GDPR Art.9(2)(g)/(j) legal basis documented in ROPA
Art.10 × Art.99 Penalty Exposure
Data governance failures under Art.10 fall under Art.99(4), which covers non-compliance with the provider obligations of Art.16 — obligations that include meeting the Section 2 requirements, of which Art.10 is one:
- Training on non-representative data → discriminatory outputs → Art.10(3) + Art.15 non-compliance: up to €15 million or 3% of global annual turnover (Art.99(4))
- Supplying incorrect, incomplete, or misleading Art.10(2) documentation during an MSA investigation: up to €7.5 million or 1% of global annual turnover (Art.99(5))
- Art.10(6) safeguards not met → special-category data misuse: up to €15 million or 3% (Art.99(4)) plus parallel GDPR Art.9 exposure under GDPR Art.83(5) (up to €20 million or 4% of global annual turnover)
The GDPR parallel exposure is the practical reason to take Art.10(6) safeguards seriously: a single bias assessment that processes ethnicity data without pseudonymisation creates simultaneous EU AI Act and GDPR fines.
See Also
- EU AI Act Art.9 — Risk Management System for High-Risk AI
- EU AI Act Art.11 — Technical Documentation Requirements (Annex IV)
- EU AI Act Art.13 — Transparency Obligations for High-Risk AI
- EU AI Act Art.15 — Accuracy, Robustness, and Cybersecurity
- EU AI Act Art.16 — Provider Obligations for High-Risk AI
- EU AI Act Art.17 — Quality Management System Requirements
- EU AI Act Art.26 — Obligations of Deployers
- EU AI Act Art.27 — Fundamental Rights Impact Assessment (FRIA)
- EU AI Act Art.99 — Penalties and Fines