EU AI Act Article 10: Data Governance Requirements for High-Risk AI Training Datasets (2026)
On August 2, 2026, Article 10 of the EU AI Act begins applying to providers of new high-risk AI systems placed on the EU market. If you train, fine-tune, validate, or test a model that falls under Annex III — covering biometric identification, critical infrastructure, education, employment screening, credit scoring, law enforcement, and emergency healthcare triage, among other areas — Article 10 is your mandatory data governance framework.
Developers who treat Article 10 as a vague "use good data" requirement are misreading the regulation. Article 10 contains six specific, auditable data governance obligations in Art.10(2) alone, plus distinct quality criteria, statistical requirements, contextual obligations, and a special-category processing carve-out for bias detection that developers routinely overlook.
This guide covers what Art.10 actually demands, how its requirements intersect with Art.9 (risk management), Art.15 (accuracy), and Art.17 (QMS), where CLOUD Act jurisdiction creates documentation exposure, and how to implement Art.10-compliant data governance in Python.
What Article 10 Actually Requires
Article 10 applies to training, validation, and testing datasets for high-risk AI systems. The article has six subsections:
- Art.10(1): Training, validation, and testing datasets shall be subject to appropriate data governance and management practices
- Art.10(2): Data governance and management practices shall cover specific areas (the six obligations)
- Art.10(3): Datasets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete
- Art.10(4): Datasets shall have the appropriate statistical properties, including regarding persons or groups of persons
- Art.10(5): Datasets shall take into account characteristics of the specific geographical, contextual, behavioural or functional setting
- Art.10(6): Providers may process special categories of personal data for bias monitoring, detection, and correction under strict conditions
The operational core is Art.10(2), which lists the six areas that data governance practices must cover.
Art.10(2): The Six Data Governance Obligations
Art.10(2) specifies that data governance and management practices must cover:
(a) Design Choices
The intended purpose of the AI system determines which data is relevant. Art.10(2)(a) requires documentation of the design choices made — why specific datasets were selected, what alternatives were considered, and what the dataset is designed to make the model learn. This connects directly to Art.13(1) transparency: users of the system must be able to understand what the system was trained to do.
In practice: your model card must explain dataset selection rationale, not just dataset composition.
(b) Data Collection Processes
Art.10(2)(b) requires documentation of how data was collected — the process, not just the result. This includes:
- The collection method (web scraping, licensed datasets, in-house annotation, third-party data brokers)
- The time period and geographic scope of collection
- The terms under which data was obtained (consent, legitimate interest, public domain)
- Any filtering applied during collection
Why this matters: the collection process determines what systematic biases are baked into the dataset before any preprocessing occurs. A resume screening model trained on historically male-dominated engineering applications will inherit gender bias at collection, not at training.
(c) Data Preparation Operations
Art.10(2)(c) covers relevant data preparation operations including annotation, labelling, cleaning, enrichment, aggregation, and retention. Each of these can introduce or amplify bias:
- Annotation: Inter-annotator agreement must be documented. Whose labels are ground truth?
- Labelling: Label schema decisions shape what the model can predict
- Cleaning: What data was removed and why? Removing "outliers" can remove under-represented groups
- Enrichment: What features were added? Third-party enrichment can introduce proxies for protected characteristics
- Aggregation: How were multiple sources combined? Differential source weights create differential representation
- Retention: What data was excluded from final datasets? Exclusion criteria must be auditable
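The inter-annotator agreement point above can be computed directly rather than asserted. A minimal sketch of Cohen's kappa for two annotators over the same items (pure Python; the function name and example labels are illustrative, not from the regulation):

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)

a = ["spam", "spam", "ham", "ham", "spam", "ham"]
b = ["spam", "ham", "ham", "ham", "spam", "ham"]
print(round(cohens_kappa(a, b), 3))  # 0.667 — below the 0.70 "substantial" target
```

Recording the kappa value (and the annotation schema version it was measured against) is the documentable evidence Art.10(2)(c) asks for.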
(d) Formulation of Relevant Assumptions
Art.10(2)(d) requires explicit documentation of assumptions made about the data — what the provider assumes the data represents, and what the data does not represent. This is an epistemological obligation: you must state what you believe the dataset means and what limitations that belief carries.
Examples:
- "This dataset represents adults aged 18–65 in the DACH region; performance on populations outside this demographic has not been validated"
- "Annotation reflects consensus of English-speaking annotators; cross-cultural validity has not been assessed"
Undocumented assumptions are the most common source of Art.10 non-compliance. If your technical documentation does not state what your training data assumes, you have not met Art.10(2)(d).
(e) Availability, Quantity, and Suitability Assessment
Art.10(2)(e) requires assessment of whether the available data is sufficient in quantity and suitable in quality for the intended purpose. A one-line qualitative statement does not satisfy this — it requires a documented methodology for evaluating dataset sufficiency.
The assessment must cover:
- Is the dataset large enough for the model to generalise? (quantity)
- Does the dataset cover the use cases the system will encounter in deployment? (suitability)
- Are validation and test splits genuinely independent from training data? (methodology)
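The split-independence question can be checked mechanically by fingerprinting records and intersecting the splits. A sketch, assuming records are JSON-serialisable dicts (helper names are hypothetical):

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Stable SHA-256 fingerprint of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def split_leakage(train: list[dict], test: list[dict]) -> set[str]:
    """Fingerprints present in both splits — should be empty for a valid split."""
    train_fps = {record_fingerprint(r) for r in train}
    return {record_fingerprint(r) for r in test} & train_fps

train = [{"text": "invoice overdue", "label": 1}, {"text": "meeting at 3", "label": 0}]
test = [{"text": "invoice overdue", "label": 1}, {"text": "lunch friday", "label": 0}]
leaks = split_leakage(train, test)
print(len(leaks))  # 1 duplicate record leaked into the test split
```

Exact-duplicate hashing is a floor, not a ceiling — near-duplicates and entity-level leakage (same person in both splits) need additional checks.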
(f) Examination for Possible Biases
Art.10(2)(f) requires examination for biases that could result in prohibited discrimination or risks to health and safety. This is the bias detection obligation. Art.10(6) provides a special-category processing carve-out specifically to enable this examination (covered below).
The examination must be documented — not just conducted. "We ran fairness metrics" is insufficient. The documentation must show:
- What bias types were examined (representation bias, measurement bias, aggregation bias)
- What groups were assessed (per protected characteristics)
- What metrics were used and what thresholds were applied
- What findings were made and what mitigations were taken
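A minimal sketch of two commonly used metrics — per-group selection rate (for demographic parity) and per-group true-positive rate (one component of equalised odds) — from parallel prediction, label, and group lists. Function names and thresholds are illustrative, not prescribed by the Act:

```python
def group_rates(preds: list[int], labels: list[int], groups: list[str]) -> dict:
    """Per-group selection rate and TPR from parallel lists."""
    stats = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        # Selection rate: fraction of the group receiving a positive decision.
        sel = sum(preds[i] for i in idx) / len(idx)
        # TPR: fraction of true positives in the group that were detected.
        pos = [i for i in idx if labels[i] == 1]
        tpr = sum(preds[i] for i in pos) / len(pos) if pos else float("nan")
        stats[g] = {"selection_rate": sel, "tpr": tpr}
    return stats

def parity_gap(stats: dict, key: str) -> float:
    """Max-min spread of a metric across groups; 0.0 means perfect parity."""
    vals = [s[key] for s in stats.values()]
    return max(vals) - min(vals)

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [1, 0, 1, 0, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
stats = group_rates(preds, labels, groups)
print(round(parity_gap(stats, "selection_rate"), 2))  # 0.5 — a large parity gap
```

The Art.10(2)(f) documentation would record the metric, the computed gap, the threshold applied, and the mitigation decision.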
Art.10(3): Quality Criteria — Relevant, Representative, Error-Free
Art.10(3) sets quality criteria for datasets:
Training, validation, and testing datasets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose.
The three quality criteria operate jointly:
Relevant: The dataset must relate to the decision the system makes. A fraud detection model trained on transactions from a different regulatory jurisdiction may be technically high-quality but irrelevant to the deployment context.
Sufficiently representative: The dataset must reflect the diversity of cases the system will encounter in deployment. "Sufficiently representative" is a relative standard — representative relative to the use case, the affected population, and the intended deployment context. A dataset that is representative for urban German users may not be representative for rural Polish users of the same system.
Free of errors and complete: "To the best extent possible" softens this requirement, but it creates a due diligence obligation. The provider must document the error rate found and the mitigation applied — not claim that data is error-free without evidence.
Art.10(4): Statistical Properties and Fairness
Art.10(4) requires that datasets have appropriate statistical properties, including regarding persons or groups of persons. This is the quantitative fairness requirement.
"Appropriate statistical properties" means:
- The distribution of examples across the outcome space is appropriate for the task
- Where the system affects specific demographic groups, the dataset must have adequate representation of those groups
- Statistical properties must be documented and justified (not just asserted)
For classification tasks, this requires class balance analysis. For regression tasks, coverage of the output range must be demonstrated. For systems affecting protected groups under EU law (gender, race, ethnicity, age, disability, sexual orientation, religion), the dataset distribution across those groups must be documented and assessed for differential performance risk.
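That documentation step can be sketched as a comparison of the dataset's empirical group distribution against the expected deployment distribution, flagging deviations. The tolerance value and names below are illustrative assumptions, not figures from the Act:

```python
def distribution(labels: list[str]) -> dict[str, float]:
    """Empirical class/group distribution of a dataset."""
    n = len(labels)
    return {c: labels.count(c) / n for c in sorted(set(labels))}

def representation_gaps(dataset_dist: dict[str, float],
                        deployment_dist: dict[str, float],
                        tolerance: float = 0.10) -> dict:
    """Groups whose dataset share deviates from the deployment share by > tolerance."""
    gaps = {}
    for group, expected in deployment_dist.items():
        observed = dataset_dist.get(group, 0.0)
        if abs(observed - expected) > tolerance:
            gaps[group] = {"dataset": observed, "deployment": expected}
    return gaps

train_groups = ["18-40"] * 70 + ["41-65"] * 25 + ["65+"] * 5
deployment = {"18-40": 0.45, "41-65": 0.35, "65+": 0.20}
gaps = representation_gaps(distribution(train_groups), deployment)
print(sorted(gaps))  # ['18-40', '65+'] — over- and under-represented groups
```

The output, plus the justification for the chosen tolerance, is what "documented and justified (not just asserted)" means in practice.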
Art.10(5): Contextual Appropriateness
Art.10(5) adds a contextual dimension: datasets shall take into account characteristics of the specific geographical, contextual, behavioural, or functional setting in which the high-risk AI system is intended to be used.
This obligation targets geographic and contextual transfer problems. A system validated on UK hospital records that is deployed in German hospitals faces an Art.10(5) gap — different healthcare system structure, different patient demographics, different documentation practices. The provider must document how the training data maps to the intended deployment context.
Contextual characteristics include:
- Geographical: Country, region, urban/rural, language
- Contextual: Professional domain, organisational setting, regulatory environment
- Behavioural: How end users interact with the system vs. how training data was generated
- Functional: The specific decisions the system makes vs. the decisions in training
Art.10(6): Special Categories for Bias Detection
Art.10(6) is the most technically important carve-out in Art.10. It permits providers to process special categories of personal data (GDPR Art.9 data — race, ethnic origin, health, sexual orientation, political opinion, religious belief) for the purpose of detecting and correcting biases in high-risk AI systems, subject to strict conditions:
- Appropriate safeguards: encryption, pseudonymisation, access controls
- Strictly necessary: the special-category processing must be limited to what is required for bias detection
- Only data for bias detection: the special-category data may not be used for training the AI system itself
- State-of-the-art security: as defined in applicable data protection law
The Art.10(6) carve-out resolves a practical conflict: you cannot examine whether a model discriminates by ethnicity without knowing the ethnicity of your validation set subjects. GDPR Art.9 normally prohibits processing ethnicity data. Art.10(6) creates a derogation from that prohibition, but only for the bias detection purpose.
In practice: special-category data used for Art.10(6) bias assessment must be:
- Separated from training data (cannot be fed into the model)
- Pseudonymised at minimum
- Deleted after the bias assessment is complete (or retained only as aggregate statistics)
- Covered by a specific GDPR legal basis documentation entry
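For the pseudonymisation item, a keyed hash is safer than a plain hash of the identifier: a plain SHA-256 of a low-entropy ID (patient numbers, employee IDs) can be reversed by enumeration. A sketch using HMAC-SHA256; the key handling shown is illustrative only, and in practice the key must live outside the data store:

```python
import hashlib
import hmac

def pseudonymise(subject_id: str, secret_key: bytes) -> str:
    """Keyed pseudonym: stable within one assessment, unlinkable without the key."""
    return hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

key = b"rotate-and-store-separately"  # hypothetical key; hold it apart from the dataset
row = {
    "subject": pseudonymise("patient-4711", key),  # no raw identifier retained
    "ethnicity": "group_a",
    "model_output": 1,
}
print(len(row["subject"]))  # 16
```

Destroying the key after the bias assessment is one way to implement the post-assessment deletion requirement while retaining aggregate statistics.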
Art.10 × Art.9: The Risk Management Loop
Art.10 and Art.9 form a closed feedback loop:
Art.9 identifies foreseeable risks
↓
Art.10(2)(f) examines training data for bias risks
↓
Bias findings feed back into Art.9 risk assessment
↓
Art.9 requires risk management measures
↓
Measures implemented in data pipeline or model training
↓
Art.10(3) quality criteria re-assessed
↓
Loop continues throughout system lifecycle
The key connection: Art.9(2)(a) requires identification and analysis of known and reasonably foreseeable risks. A foreseeable risk for any model trained on demographic data is differential performance across protected groups. Art.10(2)(f) is the mechanism by which that foreseeable risk is examined and documented. The bias examination results must flow back into the Art.9 risk management record.
Providers who maintain Art.9 and Art.10 documentation in separate silos — a risk register and a dataset card without cross-references — are meeting neither requirement in the spirit of the regulation.
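The cross-referencing can be enforced mechanically. A sketch that checks bidirectional links between an Art.9 risk register and Art.10(2)(f) bias findings; the ID scheme and function name are hypothetical:

```python
def cross_reference_gaps(risks: dict[str, list[str]],
                         findings: dict[str, list[str]]) -> dict:
    """Find risks citing unknown findings, findings citing unknown risks,
    and risks with no bias examination linked at all."""
    dangling_findings = {f for refs in risks.values() for f in refs if f not in findings}
    dangling_risks = {r for refs in findings.values() for r in refs if r not in risks}
    unexamined = [r for r, refs in risks.items() if not refs]
    return {"dangling": sorted(dangling_findings | dangling_risks),
            "unexamined_risks": unexamined}

risks = {"R-01": ["BF-01"], "R-02": []}            # R-02 has no Art.10(2)(f) link
findings = {"BF-01": ["R-01"], "BF-02": ["R-09"]}  # BF-02 cites an unknown risk
gaps = cross_reference_gaps(risks, findings)
print(gaps["unexamined_risks"], gaps["dangling"])  # ['R-02'] ['R-09']
```

Running a check like this in CI over the documentation repository makes silo drift between the risk register and the dataset card visible before an auditor does.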
Art.10 × Art.15: Accuracy, Robustness, and Dataset Quality
Art.15 requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity. Dataset quality under Art.10 is the upstream determinant of achievable accuracy.
The intersection creates a compliance chain:
- Art.10(3): Dataset must be relevant and representative → enables Art.15 accuracy
- Art.10(4): Statistical properties must be appropriate → enables Art.15 robustness
- Art.10(5): Contextual match between training and deployment → enables Art.15 accuracy in the specific deployment setting
A system that meets Art.15 accuracy thresholds on a non-representative test set while failing Art.10(3) representativeness criteria is non-compliant with both articles. The test set itself must meet Art.10 quality criteria — it cannot be selected to produce favourable accuracy numbers.
CLOUD Act × Art.10: Where Training Data Must Be Stored
Training datasets for high-risk AI systems create documentation obligations that intersect with CLOUD Act jurisdiction risk.
Art.10(2) requires documentation of data governance and management practices. Art.18 requires that technical documentation — which includes dataset documentation under Annex IV — be retained and made available to competent authorities for 10 years after the system is placed on the market.
If your training data documentation, dataset bias assessments, and data governance records are stored on US-provider infrastructure, those records are accessible to US authorities under the CLOUD Act without EU judicial review or notification to you. For high-risk AI systems operating in regulated sectors (health, employment, law enforcement), this creates a dual exposure:
- Audit risk: If an EU market surveillance authority requests Art.10(2) documentation during an incident investigation, CLOUD Act disclosure means that documentation may already have been accessed by US law enforcement without your knowledge
- GDPR cross-reference: Special-category data processed under Art.10(6) for bias detection is subject to GDPR Chapter V transfer restrictions — storing it on US infrastructure without appropriate safeguards creates a separate GDPR violation
The practical mitigation: store Art.10 documentation on EU-sovereign infrastructure. Training datasets themselves (if they contain personal data) must comply with GDPR transfer rules regardless of Art.10.
Python Implementation: Art.10-Compliant Data Governance
```python
from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum
from typing import Optional


class BiasType(Enum):
    REPRESENTATION = "representation"
    MEASUREMENT = "measurement"
    AGGREGATION = "aggregation"
    HISTORICAL = "historical"
    DEPLOYMENT = "deployment"


class DatasetSplit(Enum):
    TRAINING = "training"
    VALIDATION = "validation"
    TEST = "test"


@dataclass
class DataCollectionRecord:
    """Art.10(2)(b) — Data collection process documentation."""
    method: str  # e.g., "licensed_dataset", "web_scrape", "in_house_annotation"
    time_period_start: date
    time_period_end: date
    geographic_scope: list[str]  # ISO 3166-1 country codes
    legal_basis: str  # e.g., "consent", "legitimate_interest", "public_domain"
    third_party_sources: list[str]
    filters_applied: list[str]
    collection_date: date = field(default_factory=date.today)

    def audit_record(self) -> dict:
        return {
            "method": self.method,
            "period": f"{self.time_period_start}–{self.time_period_end}",
            "geography": self.geographic_scope,
            "legal_basis": self.legal_basis,
            "third_party_sources": len(self.third_party_sources),
            "filters": self.filters_applied,
        }


@dataclass
class DataPreparationRecord:
    """Art.10(2)(c) — Data preparation operations documentation."""
    annotation_schema_version: str
    inter_annotator_agreement: Optional[float]  # Cohen's kappa or similar
    labelling_guidelines_hash: str  # SHA-256 of labelling spec
    cleaning_operations: list[str]
    records_removed: int
    removal_reasons: list[str]
    enrichment_sources: list[str]
    aggregation_weights: dict[str, float]  # source → weight
    retention_policy_days: int

    def validate_iaa(self) -> bool:
        """Flag inter-annotator agreement below the acceptable threshold."""
        if self.inter_annotator_agreement is None:
            return False  # IAA not measured — non-compliant
        return self.inter_annotator_agreement >= 0.70  # κ ≥ 0.70: substantial agreement


@dataclass
class BiasAssessmentRecord:
    """Art.10(2)(f) + Art.10(6) — Bias examination documentation."""
    assessed_by: str
    assessment_date: date
    bias_types_examined: list[BiasType]
    protected_characteristics_assessed: list[str]  # e.g., ["gender", "age", "ethnicity"]
    special_category_data_used: bool  # Art.10(6) carve-out activated
    special_category_safeguards: list[str]  # e.g., ["pseudonymised", "separate_split", "deleted_after_assessment"]
    metrics_used: list[str]  # e.g., ["demographic_parity", "equalised_odds", "calibration"]
    findings: list[str]
    mitigations_applied: list[str]
    residual_risk_documented: bool

    def art10_6_compliant(self) -> bool:
        """Check Art.10(6) special-category processing safeguards."""
        if not self.special_category_data_used:
            return True
        required_safeguards = {"pseudonymised", "separate_split", "deleted_after_assessment"}
        actual = set(self.special_category_safeguards)
        return required_safeguards.issubset(actual)


@dataclass
class DatasetQualityAssessment:
    """Art.10(3)/(4)/(5) — Dataset quality, statistical properties, contextual fit."""
    dataset_id: str
    split: DatasetSplit
    total_records: int
    error_rate_percent: float  # Art.10(3) — free of errors
    class_distribution: dict[str, float]  # Art.10(4) — statistical properties
    intended_deployment_geography: list[str]
    training_data_geography: list[str]
    contextual_gap_documented: bool  # Art.10(5) — contextual characteristics
    contextual_gap_description: Optional[str]
    sufficiency_assessment_method: str  # Art.10(2)(e)
    sufficiency_conclusion: str

    def geographic_coverage_gap(self) -> list[str]:
        """Identify deployment geographies not represented in training data."""
        deploy_set = set(self.intended_deployment_geography)
        train_set = set(self.training_data_geography)
        return list(deploy_set - train_set)

    def quality_flags(self) -> list[str]:
        flags = []
        if self.error_rate_percent > 2.0:
            flags.append(f"HIGH_ERROR_RATE: {self.error_rate_percent}% (>2% threshold)")
        gaps = self.geographic_coverage_gap()
        if gaps:
            flags.append(f"GEOGRAPHIC_GAP: {gaps}")
        if not self.contextual_gap_documented and gaps:
            flags.append("CONTEXTUAL_GAP_NOT_DOCUMENTED: Art.10(5) non-compliant")
        return flags


@dataclass
class Art10DataGovernanceRecord:
    """Complete Art.10 data governance record for a high-risk AI system dataset."""
    system_id: str
    system_name: str
    intended_purpose: str  # Art.10(2)(a) design choice basis
    design_choice_rationale: str  # Art.10(2)(a)
    assumptions: list[str]  # Art.10(2)(d)
    collection: DataCollectionRecord
    preparation: DataPreparationRecord
    quality: DatasetQualityAssessment
    bias_assessment: BiasAssessmentRecord
    record_created: datetime = field(default_factory=datetime.now)
    retained_until: Optional[date] = None  # Art.18: 10 years after placing on the market

    def compliance_summary(self) -> dict:
        """Generate Art.10 compliance summary for technical documentation."""
        flags = self.quality.quality_flags()
        return {
            "system_id": self.system_id,
            "art10_2a_design_documented": bool(self.design_choice_rationale),
            "art10_2b_collection_documented": self.collection is not None,
            "art10_2c_preparation_documented": self.preparation is not None,
            "art10_2c_iaa_valid": self.preparation.validate_iaa(),
            "art10_2d_assumptions_documented": len(self.assumptions) > 0,
            "art10_2e_sufficiency_assessed": bool(self.quality.sufficiency_assessment_method),
            # Examination counts even when it produced zero findings; what matters
            # is that bias types were actually examined.
            "art10_2f_bias_examined": len(self.bias_assessment.bias_types_examined) > 0,
            "art10_3_quality_flags": flags,
            "art10_4_class_distribution_documented": len(self.quality.class_distribution) > 0,
            "art10_5_contextual_gaps": self.quality.geographic_coverage_gap(),
            "art10_6_compliant": self.bias_assessment.art10_6_compliant(),
            "overall_compliant": len(flags) == 0 and self.bias_assessment.art10_6_compliant(),
        }
```
Five Common Art.10 Mistakes
Mistake 1: Using the Test Set to Validate Quality, Not to Measure It
Art.10(3) requires that training, validation, and testing datasets are relevant, representative, and error-free. A common mistake is applying data quality checks only to the training set, then using the test set as-is from a benchmark. Benchmark test sets often have known systematic biases (ImageNet label noise, NLP benchmark contamination) — using them without documentation violates Art.10(3) for the test split.
Mistake 2: Treating Art.10(2)(d) Assumptions as Optional
The obligation to document formulated assumptions is frequently omitted from dataset cards. Developers assume this is covered by "model limitations" sections. It is not. Art.10(2)(d) requires explicit identification of what the provider assumes the dataset represents — not just what the model cannot do. An undocumented assumption is an undocumented foreseeable risk under Art.9(2)(a).
Mistake 3: Confusing Statistical Balance with Representativeness
Art.10(4) requires appropriate statistical properties regarding persons or groups of persons. A dataset that is numerically balanced (equal numbers of examples per class) may still be unrepresentative if the class structure does not reflect real-world incidence rates. A fraud detection model trained on 50/50 fraud/legitimate splits will be miscalibrated on production data where fraud is 0.1% — and Art.10(4) requires this calibration question to be addressed.
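The calibration point can be made concrete with the standard prior-shift correction: re-weight a predicted probability from the training class prior to the deployment prior. A sketch (function name illustrative; real systems should also validate calibration empirically on deployment-distributed data):

```python
def adjust_for_prior_shift(p: float, train_prior: float, deploy_prior: float) -> float:
    """Re-weight a predicted positive probability from the training prior
    to the deployment prior via a Bayes odds correction."""
    odds = (p / (1 - p)) * (deploy_prior / train_prior) * ((1 - train_prior) / (1 - deploy_prior))
    return odds / (1 + odds)

# A score of 0.5 under a 50/50 training split is far weaker evidence
# when deployment fraud incidence is 0.1%.
adjusted = adjust_for_prior_shift(0.5, train_prior=0.5, deploy_prior=0.001)
print(round(adjusted, 4))  # 0.001
```

Documenting the training distribution, the expected deployment incidence, and the correction (or the decision not to correct) is what addressing "this calibration question" looks like in the Art.10(4) record.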
Mistake 4: Not Activating Art.10(6) When Needed, Then Not Meeting Its Safeguards When Activated
Providers either skip bias examination entirely (because they do not want to process sensitive demographics) — violating Art.10(2)(f) — or they process special-category data for bias assessment without implementing the Art.10(6) safeguards (pseudonymisation, strict separation from training data, post-assessment deletion) — violating both Art.10(6) and GDPR Art.9.
Mistake 5: Treating Data Governance Documentation as a One-Time Artefact
Art.10 applies throughout the system lifecycle. When training data is updated, augmented, or replaced — as happens during retraining cycles — Art.10 documentation must be updated. A dataset card written at initial launch that is not updated when retraining occurs creates a documentation gap. This gap becomes an Art.10 violation during any market surveillance investigation that requests current dataset documentation.
30-Item Art.10 Data Governance Checklist
Art.10(2)(a) — Design Choices
- 1. Dataset selection rationale documented: why this dataset for this intended purpose
- 2. Alternative datasets considered and rejected — reasons documented
- 3. Intended purpose mapped to specific data requirements
Art.10(2)(b) — Data Collection
- 4. Collection method documented (web scrape / licensed / in-house / broker)
- 5. Collection time period and geographic scope documented
- 6. Legal basis for collection documented (consent / LI / public domain)
- 7. Third-party data sources listed with provenance records
- 8. Collection-stage filters documented
Art.10(2)(c) — Data Preparation
- 9. Annotation schema version-controlled and hashed
- 10. Inter-annotator agreement measured and documented (κ ≥ 0.70 target)
- 11. Labelling guidelines available in technical documentation
- 12. Cleaning operations documented: records removed + removal reasons
- 13. Enrichment sources listed with bias-introduction assessment
- 14. Aggregation weights documented per source
- 15. Retention policy documented (training vs. validation vs. test)
Art.10(2)(d) — Assumptions
- 16. Explicit assumptions about what the dataset represents — documented
- 17. Explicit assumptions about populations NOT represented — documented
- 18. Assumptions validated against deployment context (Art.10(5) cross-check)
Art.10(2)(e) — Sufficiency Assessment
- 19. Dataset quantity assessment: sufficient for intended model generalisation
- 20. Dataset suitability assessment: covers deployment scenarios
- 21. Train/validation/test split independence verified
- 22. Sufficiency methodology and conclusion documented
Art.10(2)(f) + Art.10(3)/(4) — Bias Examination and Quality
- 23. Bias types examined: representation, measurement, aggregation, historical
- 24. Protected characteristics assessed: gender, age, ethnicity, disability (as applicable)
- 25. Fairness metrics applied: demographic parity, equalised odds, calibration
- 26. Error rate measured per split — documented with mitigation
- 27. Class distribution documented and calibration assessed (Art.10(4))
- 28. Bias findings documented with mitigations applied and residual risk noted
Art.10(5) — Contextual Appropriateness
- 29. Geographic coverage gap between training data and deployment context — documented
- 30. Contextual, behavioural, and functional gaps documented with performance validation plan
Art.10(6) — Special-Category Data for Bias Detection (if applicable)
- Triggered only if special-category demographic data was processed for bias assessment
- Special-category data pseudonymised before use
- Special-category data kept strictly separate from training split
- Special-category data deleted after bias assessment (aggregate statistics retained)
- GDPR Art.9(2)(g)/(j) legal basis documented in ROPA
Art.10 × Art.99 Penalty Exposure
Data governance failures under Art.10 fall under Art.99(4), which covers non-compliance with the provider obligations of Art.16 — obligations that include meeting the Section 2 requirements, of which Art.10 is one:
- Training on non-representative data → discriminatory outputs → Art.10(3) + Art.15 non-compliance: up to €15 million or 3% of global annual turnover (Art.99(4))
- Supplying incorrect, incomplete, or misleading Art.10(2) documentation during an MSA investigation: up to €7.5 million or 1% of global annual turnover (Art.99(5))
- Art.10(6) safeguards not met → special-category data misuse: up to €15 million or 3% (Art.99(4)) plus parallel GDPR Art.9 exposure under GDPR Art.83(5) (up to €20 million or 4% of global annual turnover)
The GDPR parallel exposure is the practical reason to take Art.10(6) safeguards seriously: a single bias assessment that processes ethnicity data without pseudonymisation creates simultaneous EU AI Act and GDPR fines.
See Also
- EU AI Act Art.9 — Risk Management System for High-Risk AI
- EU AI Act Art.11 — Technical Documentation Requirements (Annex IV)
- EU AI Act Art.13 — Transparency Obligations for High-Risk AI
- EU AI Act Art.15 — Accuracy, Robustness, and Cybersecurity
- EU AI Act Art.16 — Provider Obligations for High-Risk AI
- EU AI Act Art.17 — Quality Management System Requirements
- EU AI Act Art.26 — Obligations of Deployers
- EU AI Act Art.27 — Fundamental Rights Impact Assessment (FRIA)
- EU AI Act Art.99 — Penalties and Fines