2026-04-12·13 min read·sota.io team

EU AI Act Article 10: Data Governance Requirements for High-Risk AI Training Datasets (2026)

On August 2, 2026, Article 10 of the EU AI Act begins applying to providers of new high-risk AI systems placed on the EU market. If you train, fine-tune, validate, or test a model that falls under Annex III — covering healthcare diagnostics, biometric identification, critical infrastructure, employment screening, credit scoring, law enforcement, or education — Article 10 is your mandatory data governance framework.

Developers who treat Article 10 as a vague "use good data" requirement are misreading the regulation. Article 10 contains six specific, auditable data governance obligations in Art.10(2) alone, plus distinct quality criteria, statistical requirements, contextual obligations, and a special-category processing carve-out for bias detection that developers routinely overlook.

This guide covers what Art.10 actually demands, how its requirements intersect with Art.9 (risk management), Art.15 (accuracy), and Art.17 (QMS), where CLOUD Act jurisdiction creates documentation exposure, and how to implement Art.10-compliant data governance in Python.


What Article 10 Actually Requires

Article 10 applies to training, validation, and testing datasets for high-risk AI systems. The article has six subsections, running from the general data governance obligation through the special-category carve-out in Art.10(6).

The operational core is Art.10(2), which lists the six areas that data governance practices must cover.


Art.10(2): The Six Data Governance Obligations

Art.10(2) specifies that data governance and management practices must cover:

(a) Design Choices

The intended purpose of the AI system determines which data is relevant. Art.10(2)(a) requires documentation of the design choices made — why specific datasets were selected, what alternatives were considered, and what the dataset is designed to make the model learn. This connects directly to Art.13(1) transparency: users of the system must be able to understand what the system was trained to do.

In practice: your model card must explain dataset selection rationale, not just dataset composition.

(b) Data Collection Processes

Art.10(2)(b) requires documentation of how data was collected — the process, not just the result. This includes:

  1. The collection method (licensed dataset, web scrape, in-house annotation)
  2. The time period and geographic scope of collection
  3. The legal basis for processing each source
  4. Third-party sources and any filters applied during collection

Why this matters: the collection process determines what systematic biases are baked into the dataset before any preprocessing occurs. A resume screening model trained on historically male-dominated engineering applications will inherit gender bias at collection, not at training.

(c) Data Preparation Operations

Art.10(2)(c) covers relevant data preparation operations including annotation, labelling, cleaning, enrichment, aggregation, and retention. Each of these can introduce or amplify bias:

  1. Annotation and labelling: low inter-annotator agreement encodes annotators' subjective judgments into the labels
  2. Cleaning: removing "outliers" can disproportionately remove records from under-represented groups
  3. Enrichment and aggregation: source weighting changes which populations the combined dataset represents
  4. Retention: retention windows determine whether the dataset drifts away from the current population

(d) Formulation of Relevant Assumptions

Art.10(2)(d) requires explicit documentation of assumptions made about the data — what the provider assumes the data represents, and what the data does not represent. This is an epistemological obligation: you must state what you believe the dataset means and what limitations that belief carries.

Examples:

  1. "Resumes submitted between 2015 and 2020 still represent the current applicant pool" (a temporal-stability assumption)
  2. "Hospital discharge codes accurately reflect the underlying diagnoses" (a label-validity assumption)
  3. "Web-scraped text reflects how the intended user population actually writes" (a population-coverage assumption)

Undocumented assumptions are the most common source of Art.10 non-compliance. If your technical documentation does not state what your training data assumes, you have not met Art.10(2)(d).

(e) Availability, Quantity, and Suitability Assessment

Art.10(2)(e) requires assessment of whether the available data is sufficient in quantity and suitable in quality for the intended purpose. This cannot be satisfied by a qualitative assertion: it requires a documented methodology for evaluating dataset sufficiency.

The assessment must cover:

  1. Quantity: whether the number of records supports the model capacity and the claimed performance
  2. Coverage: whether the range of cases the system will encounter is represented
  3. Quality: whether error rates and label reliability are acceptable for the intended purpose
  4. Conclusion: a documented sufficiency judgment, including identified gaps and their mitigation

(f) Examination for Possible Biases

Art.10(2)(f) requires examination for biases that could result in prohibited discrimination or risks to health and safety. This is the bias detection obligation. Art.10(6) provides a special-category processing carve-out specifically to enable this examination (covered below).

The examination must be documented — not just conducted. "We ran fairness metrics" is insufficient. The documentation must show:

  1. Which bias types were examined (representation, measurement, aggregation, historical, deployment)
  2. Which protected characteristics and which metrics were assessed
  3. What was found, what mitigations were applied, and what residual risk remains


Art.10(3): Quality Criteria — Relevant, Representative, Error-Free

Art.10(3) sets quality criteria for datasets:

Training, validation, and testing datasets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose.

The three quality criteria operate jointly:

Relevant: The dataset must relate to the decision the system makes. A fraud detection model trained on transactions from a different regulatory jurisdiction may be technically high-quality but irrelevant to the deployment context.

Sufficiently representative: The dataset must reflect the diversity of cases the system will encounter in deployment. "Sufficiently representative" is a relative standard — representative relative to the use case, the affected population, and the intended deployment context. A dataset that is representative for urban German users may not be representative for rural Polish users of the same system.

Free of errors and complete: "To the best extent possible" softens this requirement, but it creates a due diligence obligation. The provider must document the error rate found and the mitigation applied — not claim that data is error-free without evidence.
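One way to operationalise that due diligence obligation is a documented sampling audit. A minimal sketch, assuming a manual `audit_fn` that flags erroneous records (the function, the sample size, and the confidence calculation are illustrative choices, not requirements from the regulation):

```python
import math
import random


def estimate_error_rate(records: list, audit_fn, sample_size: int = 385,
                        seed: int = 42) -> dict:
    """Estimate a dataset error rate from a random audit sample.

    audit_fn(record) -> True if the record contains an error. A sample of
    ~385 gives roughly a ±5% margin at 95% confidence on large datasets.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    errors = sum(1 for record in sample if audit_fn(record))
    p = errors / len(sample)
    margin = 1.96 * math.sqrt(p * (1 - p) / len(sample))  # normal-approximation 95% CI
    return {
        "sampled": len(sample),
        "errors_found": errors,
        "error_rate_percent": round(p * 100, 2),
        "ci95_percent": (round(max(0.0, p - margin) * 100, 2),
                         round(min(1.0, p + margin) * 100, 2)),
    }
```

The resulting error rate and confidence interval can be recorded directly as Art.10(3) evidence, rather than an unsupported "error-free" claim.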


Art.10(4): Statistical Properties and Fairness

Art.10(4) requires that datasets have appropriate statistical properties, including regarding persons or groups of persons. This is the quantitative fairness requirement.

"Appropriate statistical properties" means:

  1. Class and outcome distributions that are documented and justified against the deployment context
  2. Subgroup sample sizes large enough to detect differential performance
  3. No protected group so thinly represented that its error rates cannot be measured

For classification tasks, this requires class balance analysis. For regression tasks, coverage of the output range must be demonstrated. For systems affecting protected groups under EU law (gender, race, ethnicity, age, disability, sexual orientation, religion), the dataset distribution across those groups must be documented and assessed for differential performance risk.
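A minimal sketch of such a distribution check, assuming labelled records with a group attribute (the 5% floor is an illustrative choice, not a figure from the regulation):

```python
from collections import Counter


def subgroup_distribution(labels: list, groups: list, min_share: float = 0.05) -> dict:
    """Report class and subgroup shares and flag thinly represented groups.

    `min_share` is an illustrative floor below which a group is too thin
    for its error rates to be measured reliably.
    """
    n = len(labels)
    class_share = {cls: count / n for cls, count in Counter(labels).items()}
    group_share = {grp: count / n for grp, count in Counter(groups).items()}
    thin_groups = [grp for grp, share in group_share.items() if share < min_share]
    return {"class_share": class_share,
            "group_share": group_share,
            "thin_groups": thin_groups}
```

The output maps directly onto the `class_distribution` field of the quality assessment record shown later in this guide.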


Art.10(5): Contextual Appropriateness

Art.10(5) adds a contextual dimension: datasets shall take into account characteristics of the specific geographical, contextual, behavioural, or functional setting in which the high-risk AI system is intended to be used.

This obligation targets geographic and contextual transfer problems. A system validated on UK hospital records that is deployed in German hospitals faces an Art.10(5) gap — different healthcare system structure, different patient demographics, different documentation practices. The provider must document how the training data maps to the intended deployment context.

Contextual characteristics include:

  1. Geographical: jurisdiction, language, local regulation, and population demographics
  2. Behavioural: how users in the deployment setting actually interact with the system
  3. Functional: the workflows, devices, and data formats of the deployment environment


Art.10(6): Special Categories for Bias Detection

Art.10(6) is the most technically important carve-out in Art.10. It permits providers to process special categories of personal data (GDPR Art.9 data — race, ethnic origin, health, sexual orientation, political opinion, religious belief) for the purpose of detecting and correcting biases in high-risk AI systems, subject to strict conditions:

  1. Appropriate safeguards: encryption, pseudonymisation, access controls
  2. Strictly necessary: the special-category processing must be limited to what is required for bias detection
  3. Only data for bias detection: the special-category data may not be used for training the AI system itself
  4. State-of-the-art security: as defined in applicable data protection law

The Art.10(6) carve-out resolves a practical conflict: you cannot examine whether a model discriminates by ethnicity without knowing the ethnicity of your validation set subjects. GDPR Art.9 normally prohibits processing ethnicity data. Art.10(6) creates a derogation from that prohibition, but only for the bias detection purpose.

In practice: special-category data used for Art.10(6) bias assessment must be:

  1. Pseudonymised, with the key stored separately from the assessment data
  2. Held in a split that is strictly separated from the training data
  3. Protected by access controls and state-of-the-art security
  4. Deleted once the bias assessment is complete

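One of those safeguards can be sketched directly: keyed pseudonymisation of the subject identifier, so assessment records can no longer be linked back to individuals while group labels remain usable for fairness metrics. A sketch, assuming HMAC-SHA-256 with a key held separately (the record shape and function name are illustrative):

```python
import hashlib
import hmac


def pseudonymise(records: list, field_name: str, key: bytes) -> list:
    """Replace a direct identifier with a keyed pseudonym (HMAC-SHA-256).

    Deterministic: the same subject maps to the same pseudonym, so
    per-group metrics still work, but records cannot be linked back to
    individuals without the key. Originals are not modified.
    """
    out = []
    for record in records:
        copy = dict(record)
        digest = hmac.new(key, str(record[field_name]).encode(), hashlib.sha256)
        copy[field_name] = digest.hexdigest()[:16]  # truncated pseudonym
        out.append(copy)
    return out
```

Destroying the key after the assessment makes the pseudonyms irreversible, which supports the post-assessment deletion requirement.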

Art.10 × Art.9: The Risk Management Loop

Art.10 and Art.9 form a closed feedback loop:

Art.9 identifies foreseeable risks
        ↓
Art.10(2)(f) examines training data for bias risks
        ↓
Bias findings feed back into Art.9 risk assessment
        ↓
Art.9 requires risk management measures
        ↓
Measures implemented in data pipeline or model training
        ↓
Art.10(3) quality criteria re-assessed
        ↓
Loop continues throughout system lifecycle

The key connection: Art.9(4) requires identification of foreseeable risks. A foreseeable risk for any model trained on demographic data is differential performance across protected groups. Art.10(2)(f) is the mechanism by which that foreseeable risk is examined and documented. The bias examination results must flow back into the Art.9 risk management record.

Providers who maintain Art.9 and Art.10 documentation in separate silos — a risk register and a dataset card without cross-references — are meeting neither requirement in the spirit of the regulation.
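The cross-reference can be enforced mechanically. A sketch, assuming bias findings and risk-register entries share an ID scheme (the field names are illustrative):

```python
def unlinked_findings(bias_findings: list, risk_register: list) -> list:
    """Return IDs of Art.10(2)(f) bias findings absent from the Art.9 risk register."""
    registered = {entry["source_finding_id"] for entry in risk_register
                  if "source_finding_id" in entry}
    return [finding["finding_id"] for finding in bias_findings
            if finding["finding_id"] not in registered]
```

Run as a CI gate on the documentation repository, a non-empty result signals exactly the silo problem described above.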


Art.10 × Art.15: Accuracy, Robustness, and Dataset Quality

Art.15 requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity. Dataset quality under Art.10 is the upstream determinant of achievable accuracy.

The intersection creates a compliance chain: Art.10 dataset quality determines the accuracy the system can achieve, Art.15 sets the accuracy threshold, and Art.9 records the residual risk when representative data cannot meet that threshold.

A system that meets Art.15 accuracy thresholds on a non-representative test set while failing Art.10(3) representativeness criteria is non-compliant with both articles. The test set itself must meet Art.10 quality criteria — it cannot be selected to produce favourable accuracy numbers.


CLOUD Act × Art.10: Where Training Data Must Be Stored

Training datasets for high-risk AI systems create documentation obligations that intersect with CLOUD Act jurisdiction risk.

Art.10(2) requires documentation of data governance and management practices. Art.18 requires that technical documentation — which includes dataset documentation under Annex IV — be retained and made available to competent authorities for 10 years.

If your training data documentation, dataset bias assessments, and data governance records are stored on US-provider infrastructure, those records are accessible to US authorities under the CLOUD Act without EU judicial review or notification to you. For high-risk AI systems operating in regulated sectors (health, employment, law enforcement), this creates a dual exposure:

  1. Audit risk: If an EU market surveillance authority requests Art.10(2) documentation during an incident investigation, CLOUD Act disclosure means that documentation may already have been accessed by US law enforcement without your knowledge
  2. GDPR cross-reference: Special-category data processed under Art.10(6) for bias detection is subject to GDPR Chapter V transfer restrictions — storing it on US infrastructure without appropriate safeguards creates a separate GDPR violation

The practical mitigation: store Art.10 documentation on EU-sovereign infrastructure. Training datasets themselves (if they contain personal data) must comply with GDPR transfer rules regardless of Art.10.


Python Implementation: Art.10-Compliant Data Governance

from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum
from typing import Optional


class BiasType(Enum):
    REPRESENTATION = "representation"
    MEASUREMENT = "measurement"
    AGGREGATION = "aggregation"
    HISTORICAL = "historical"
    DEPLOYMENT = "deployment"


class DatasetSplit(Enum):
    TRAINING = "training"
    VALIDATION = "validation"
    TEST = "test"


@dataclass
class DataCollectionRecord:
    """Art.10(2)(b) — Data collection process documentation."""
    method: str                      # e.g., "licensed_dataset", "web_scrape", "in_house_annotation"
    time_period_start: date
    time_period_end: date
    geographic_scope: list[str]      # ISO 3166-1 country codes
    legal_basis: str                 # e.g., "consent", "legitimate_interest", "public_domain"
    third_party_sources: list[str]
    filters_applied: list[str]
    collection_date: date = field(default_factory=date.today)

    def audit_record(self) -> dict:
        return {
            "method": self.method,
            "period": f"{self.time_period_start}–{self.time_period_end}",
            "geography": self.geographic_scope,
            "legal_basis": self.legal_basis,
            "third_party_sources": len(self.third_party_sources),
            "filters": self.filters_applied,
        }


@dataclass
class DataPreparationRecord:
    """Art.10(2)(c) — Data preparation operations documentation."""
    annotation_schema_version: str
    inter_annotator_agreement: Optional[float]  # Cohen's kappa or similar
    labelling_guidelines_hash: str              # SHA-256 of labelling spec
    cleaning_operations: list[str]
    records_removed: int
    removal_reasons: list[str]
    enrichment_sources: list[str]
    aggregation_weights: dict[str, float]       # source → weight
    retention_policy_days: int

    def validate_iaa(self) -> bool:
        """Return True if inter-annotator agreement meets the acceptable threshold."""
        if self.inter_annotator_agreement is None:
            return False  # IAA not measured — non-compliant
        return self.inter_annotator_agreement >= 0.70  # κ ≥ 0.70: substantial agreement


@dataclass
class BiasAssessmentRecord:
    """Art.10(2)(f) + Art.10(6) — Bias examination documentation."""
    assessed_by: str
    assessment_date: date
    bias_types_examined: list[BiasType]
    protected_characteristics_assessed: list[str]  # e.g., ["gender", "age", "ethnicity"]
    special_category_data_used: bool               # Art.10(6) carve-out activated
    special_category_safeguards: list[str]         # e.g., ["pseudonymised", "separate_split", "deleted_after_assessment"]
    metrics_used: list[str]                        # e.g., ["demographic_parity", "equalised_odds", "calibration"]
    findings: list[str]
    mitigations_applied: list[str]
    residual_risk_documented: bool

    def art10_6_compliant(self) -> bool:
        """Check Art.10(6) special-category processing safeguards."""
        if not self.special_category_data_used:
            return True
        required_safeguards = {"pseudonymised", "separate_split", "deleted_after_assessment"}
        actual = set(self.special_category_safeguards)
        return required_safeguards.issubset(actual)


@dataclass
class DatasetQualityAssessment:
    """Art.10(3)/(4)/(5) — Dataset quality, statistical properties, contextual fit."""
    dataset_id: str
    split: DatasetSplit
    total_records: int
    error_rate_percent: float               # Art.10(3) — free of errors
    class_distribution: dict[str, float]   # Art.10(4) — statistical properties
    intended_deployment_geography: list[str]
    training_data_geography: list[str]
    contextual_gap_documented: bool         # Art.10(5) — contextual characteristics
    contextual_gap_description: Optional[str]
    sufficiency_assessment_method: str     # Art.10(2)(e)
    sufficiency_conclusion: str

    def geographic_coverage_gap(self) -> list[str]:
        """Identify deployment geographies not represented in training data."""
        deploy_set = set(self.intended_deployment_geography)
        train_set = set(self.training_data_geography)
        return list(deploy_set - train_set)

    def quality_flags(self) -> list[str]:
        flags = []
        if self.error_rate_percent > 2.0:
            flags.append(f"HIGH_ERROR_RATE: {self.error_rate_percent}% (>2% threshold)")
        gaps = self.geographic_coverage_gap()
        if gaps:
            flags.append(f"GEOGRAPHIC_GAP: {gaps}")
        if not self.contextual_gap_documented and gaps:
            flags.append("CONTEXTUAL_GAP_NOT_DOCUMENTED: Art.10(5) non-compliant")
        return flags


@dataclass
class Art10DataGovernanceRecord:
    """Complete Art.10 data governance record for a high-risk AI system dataset."""
    system_id: str
    system_name: str
    intended_purpose: str                    # Art.10(2)(a) design choice basis
    design_choice_rationale: str             # Art.10(2)(a)
    assumptions: list[str]                  # Art.10(2)(d)
    collection: DataCollectionRecord
    preparation: DataPreparationRecord
    quality: DatasetQualityAssessment
    bias_assessment: BiasAssessmentRecord
    record_created: datetime = field(default_factory=datetime.now)
    retained_until: Optional[date] = None   # Art.18: 10 years from last market placement

    def compliance_summary(self) -> dict:
        """Generate Art.10 compliance summary for technical documentation."""
        flags = self.quality.quality_flags()
        return {
            "system_id": self.system_id,
            "art10_2a_design_documented": bool(self.design_choice_rationale),
            "art10_2b_collection_documented": bool(self.collection.method),
            "art10_2c_preparation_documented": bool(self.preparation.annotation_schema_version),
            "art10_2d_assumptions_documented": len(self.assumptions) > 0,
            "art10_2e_sufficiency_assessed": bool(self.quality.sufficiency_assessment_method),
            "art10_2f_bias_examined": len(self.bias_assessment.bias_types_examined) > 0,
            "art10_3_quality_flags": flags,
            "art10_2c_iaa_valid": self.preparation.validate_iaa(),
            "art10_5_contextual_gaps": self.quality.geographic_coverage_gap(),
            "art10_6_compliant": self.bias_assessment.art10_6_compliant(),
            "overall_compliant": len(flags) == 0 and self.bias_assessment.art10_6_compliant(),
        }

Five Common Art.10 Mistakes

Mistake 1: Using the Test Set to Validate Quality, Not to Measure It

Art.10(3) requires that training, validation, and testing datasets are relevant, representative, and error-free. A common mistake is applying data quality checks only to the training set, then using the test set as-is from a benchmark. Benchmark test sets often have known systematic biases (ImageNet label noise, NLP benchmark contamination) — using them without documentation violates Art.10(3) for the test split.

Mistake 2: Treating Art.10(2)(d) Assumptions as Optional

The obligation to document formulated assumptions is frequently omitted from dataset cards. Developers assume this is covered by "model limitations" sections. It is not. Art.10(2)(d) requires explicit identification of what the provider assumes the dataset represents — not just what the model cannot do. An undocumented assumption is an undocumented foreseeable risk under Art.9(4).

Mistake 3: Confusing Statistical Balance with Representativeness

Art.10(4) requires appropriate statistical properties regarding persons or groups of persons. A dataset that is numerically balanced (equal numbers of examples per class) may still be unrepresentative if the class structure does not reflect real-world incidence rates. A fraud detection model trained on 50/50 fraud/legitimate splits will be miscalibrated on production data where fraud is 0.1% — and Art.10(4) requires this calibration question to be addressed.
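The calibration gap can be quantified with Bayes' rule. A sketch, using illustrative numbers:

```python
def precision_at_base_rate(sensitivity: float, specificity: float,
                           base_rate: float) -> float:
    """Precision (PPV) of a classifier deployed at a given base rate.

    Bayes' rule: PPV = TPR * p / (TPR * p + FPR * (1 - p)).
    """
    false_positive_rate = 1.0 - specificity
    true_positives = sensitivity * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# 95% sensitivity and 95% specificity look excellent on a balanced
# research split but collapse at a realistic 0.1% fraud rate:
balanced = precision_at_base_rate(0.95, 0.95, 0.5)      # ≈ 0.95
production = precision_at_base_rate(0.95, 0.95, 0.001)  # ≈ 0.019
```

At the production base rate, roughly 98 of every 100 fraud alerts are false, which is exactly the calibration question Art.10(4) requires the provider to address.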

Mistake 4: Not Activating Art.10(6) When Needed, Then Not Meeting Its Safeguards When Activated

Providers either skip bias examination entirely (because they do not want to process sensitive demographics) — violating Art.10(2)(f) — or they process special-category data for bias assessment without implementing the Art.10(6) safeguards (pseudonymisation, strict separation from training data, post-assessment deletion) — violating both Art.10(6) and GDPR Art.9.

Mistake 5: Treating Data Governance Documentation as a One-Time Artefact

Art.10 applies throughout the system lifecycle. When training data is updated, augmented, or replaced — as happens during retraining cycles — Art.10 documentation must be updated. A dataset card written at initial launch that is not updated when retraining occurs creates a documentation gap. This gap becomes an Art.10 violation during any market surveillance investigation that requests current dataset documentation.
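One lightweight control is to fingerprint the dataset manifest and compare it against the hash recorded in the Art.10 documentation at every retraining cycle. A sketch, assuming a JSON-serialisable manifest (the function names are illustrative):

```python
import hashlib
import json


def manifest_fingerprint(manifest: dict) -> str:
    """SHA-256 over a canonical JSON serialisation of a dataset manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def documentation_is_current(manifest: dict, documented_hash: str) -> bool:
    """True only if the recorded documentation still matches this dataset version."""
    return manifest_fingerprint(manifest) == documented_hash
```

A mismatch at retraining time is the signal that the dataset card is stale and must be updated before the new model version ships.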


30-Item Art.10 Data Governance Checklist

Art.10(2)(a) — Design Choices

  1. Intended purpose of the system documented
  2. Dataset selection rationale documented, including alternatives considered
  3. What the dataset is designed to make the model learn stated explicitly
  4. Design-choice documentation cross-referenced to Art.13 transparency materials

Art.10(2)(b) — Data Collection

  5. Collection method documented for each source
  6. Time period and geographic scope of collection recorded
  7. Legal basis for processing recorded per source
  8. Third-party sources and licences listed
  9. Filters applied during collection documented

Art.10(2)(c) — Data Preparation

  10. Annotation schema versioned and labelling guidelines hashed
  11. Inter-annotator agreement measured and above the defined threshold
  12. Cleaning operations and removal reasons logged, with record counts
  13. Enrichment sources and aggregation weights documented
  14. Retention policy defined

Art.10(2)(d) — Assumptions

  15. Assumptions about what the data represents stated explicitly
  16. What the data does not represent stated explicitly
  17. Assumptions reviewed at every retraining cycle

Art.10(2)(e) — Sufficiency Assessment

  18. Documented methodology for assessing quantity and suitability
  19. Sufficiency conclusion recorded, including identified gaps and mitigations

Art.10(2)(f) + Art.10(3)/(4) — Bias Examination and Quality

  20. Bias types examined (representation, measurement, aggregation, historical, deployment)
  21. Protected characteristics and fairness metrics documented
  22. Findings, mitigations, and residual risk recorded and fed into the Art.9 risk register
  23. Error rate estimated with a documented method, not asserted
  24. Class and subgroup distributions assessed against deployment incidence rates
  25. Test and validation splits held to the same quality criteria as training data

Art.10(5) — Contextual Appropriateness

  26. Intended deployment geographies and settings mapped against training-data coverage
  27. Contextual gaps documented with mitigation or deployment restrictions

Art.10(6) — Special-Category Data for Bias Detection (if applicable)

  28. Special-category data pseudonymised and held separately from training data
  29. Processing limited to what is strictly necessary for bias detection
  30. Special-category data deleted after the assessment, with deletion logged


Art.10 × Art.99 Penalty Exposure

Data governance failures under Art.10 fall under Art.99(4) when they cause non-conformity of the high-risk AI system: administrative fines of up to EUR 15 000 000 or, for an undertaking, up to 3% of total worldwide annual turnover for the preceding financial year, whichever is higher.

The GDPR parallel exposure is the practical reason to take Art.10(6) safeguards seriously: a single bias assessment that processes ethnicity data without pseudonymisation creates simultaneous EU AI Act and GDPR fines.


See Also