2026-06-05·5 min read·sota.io Team

GPAI Model Card & Transparency Documentation: The Developer's Technical Guide

Post #2 in the sota.io EU AI Act GPAI Compliance Series


With 58 days until August 2, 2026, GPAI model providers face a documentation challenge unlike anything in previous AI regulation. The EU AI Act's GPAI chapter doesn't just ask what your model does — it demands structured evidence of how it was built, what data it was trained on, what risks it carries, and what downstream deployers need to use it safely.

This post is a technical breakdown of the model card and transparency documentation requirements under the GPAI Code of Practice (CoP). If you are building, fine-tuning, or distributing a general-purpose AI model that reaches EU users, this documentation framework applies to you.


Why Model Documentation Is the Foundation of GPAI Compliance

Before enforcement, compliance is a documentation problem. The EU AI Office cannot assess your model without your records. Downstream deployers cannot make informed integration decisions without your disclosures. Incident investigators cannot reconstruct what happened without your version history.

The GPAI Code of Practice, published in July 2025 after four drafting rounds with industry, civil society, and academics, operationalizes the GPAI chapter's documentation requirements into concrete deliverables. Providers that sign and follow the CoP can rely on it to demonstrate compliance (Article 53(4)). That is a strong position, but not an automatic safe harbor: the AI Office can still assess whether you actually implemented what the Code requires.

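Providers that lack documentation are exposed on multiple fronts at once: they cannot answer AI Office requests, their downstream deployers cannot complete their own compliance work, and incidents cannot be reconstructed after the fact.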


The GPAI Documentation Stack: Four Layers

The GPAI compliance documentation is not a single document — it is a structured stack of four interconnected layers, each serving a different audience.

Layer 1 — Technical Documentation (Internal)

This is your comprehensive engineering record. It does not get published in full, but must be made available to the EU AI Office on request and to authorized downstream providers under appropriate confidentiality agreements.

Required content under the GPAI chapter:

| Section | What to Document |
|---|---|
| Model architecture | Transformer type, parameter count, context length, modalities (text/image/audio/code) |
| Training compute | Total FLOPs, hardware used, training duration |
| Training data provenance | Dataset names, sizes, collection dates, sources, deduplication methods |
| Fine-tuning details | Any supervised fine-tuning, RLHF, DPO, LoRA adapters applied |
| Evaluation benchmarks | Capability assessments across domains (MMLU, HumanEval, MT-Bench, safety evals) |
| Known limitations | Failure modes, documented hallucination patterns, domain gaps |
| Safety testing | Pre-deployment red team results, harmful content probing, refusal rates |
| Model versions | Version numbering, changelog, deployment dates |

The key principle: if the EU AI Office asked your head of AI engineering to explain a safety incident tomorrow, could they answer from this documentation alone? If not, the documentation is incomplete.
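One way to apply that test mechanically is to keep the Layer 1 record as structured data and block a release while any section is still empty. The sketch below is a minimal illustration: the `TechnicalDocumentation` class and its field names are hypothetical and simply mirror the rows of the table; they are not an official AI Office schema.

```python
from dataclasses import dataclass, fields


@dataclass
class TechnicalDocumentation:
    """Hypothetical Layer 1 record; one field per row of the table above.

    Field names are illustrative, not an official EU AI Office schema.
    """
    model_architecture: str = ""
    training_compute: str = ""
    training_data_provenance: str = ""
    fine_tuning_details: str = ""
    evaluation_benchmarks: str = ""
    known_limitations: str = ""
    safety_testing: str = ""
    model_versions: str = ""

    def missing_sections(self) -> list[str]:
        """Names of sections that are empty or contain only whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    draft = TechnicalDocumentation(
        model_architecture="Example: decoder-only transformer, text modality",
        training_compute="Example: total FLOPs, hardware, duration from run logs",
    )
    # A release gate could refuse to ship while this list is non-empty.
    print(draft.missing_sections())
    # ['training_data_provenance', 'fine_tuning_details', 'evaluation_benchmarks',
    #  'known_limitations', 'safety_testing', 'model_versions']
```

Keeping each section as a separate field, rather than one free-form document, makes gaps visible per section instead of relying on a reviewer to notice them.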

Layer 2 — Model Card (Public-Facing Summary)

The model card is the public-facing summary of your technical documentation. It is the primary disclosure mechanism for downstream integrators and developers who want to understand what they are integrating.

A CoP-compliant model card is not a marketing document. It is a structured technical disclosure.

Core model card sections:

## Model Overview
- Model name and version
- Release date
- License type (commercial, open-weight, API-only)
- Provider name and contact for compliance questions

## Intended Use
- Primary intended uses and users
- Out-of-scope uses (explicitly documented)
- Known harmful use patterns to avoid

## Training Data
- Data sources (categories, not necessarily specific URLs)
- Approximate data volume
- Data collection period
- Whether copyrighted material was included and under what legal basis
- Opt-out/reservation policy status

## Capabilities
- Supported languages
- Performance characteristics by domain (benchmarks)
- Context window
- Modalities

## Limitations and Known Risks
- Performance limitations (e.g., knowledge cutoff date)
- Documented failure modes
- Bias assessments and known skews
- Recommended human oversight level for specific use cases

## Safety and Evaluation
- Pre-deployment safety evaluations performed
- Red team summary
- Harmful content mitigation approach

## Version History
- Major version changes
- Date of last update
- Change summary

## Contact and Compliance
- Responsible AI contact email
- EU representative contact (if non-EU provider)
- Incident reporting channel

This structure is worth standardizing. The CoP's Transparency chapter includes a Model Documentation Form, and keeping your model card in a consistent, machine-readable structure makes it easier to complete that form and to support downstream compliance automation.
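Storing the model card as structured data and generating the public Markdown from it is one way to make it machine-readable without maintaining two copies. The sketch below assumes a plain dict keyed by the template's section headings (a hypothetical schema, not a CoP-defined format); it refuses to render or export a card with an empty section.

```python
import json

# Section headings copied from the model card template above. Representing the
# card as a dict of section -> bullet entries is a hypothetical schema.
REQUIRED_SECTIONS = [
    "Model Overview",
    "Intended Use",
    "Training Data",
    "Capabilities",
    "Limitations and Known Risks",
    "Safety and Evaluation",
    "Version History",
    "Contact and Compliance",
]


def missing_card_sections(card: dict[str, list[str]]) -> list[str]:
    """Required sections that are absent or have no entries."""
    return [name for name in REQUIRED_SECTIONS if not card.get(name)]


def render_model_card(card: dict[str, list[str]]) -> str:
    """Render a complete card as Markdown, in template order."""
    missing = missing_card_sections(card)
    if missing:
        raise ValueError(f"model card is missing sections: {missing}")
    lines: list[str] = []
    for name in REQUIRED_SECTIONS:
        lines.append(f"## {name}")
        lines.extend(f"- {entry}" for entry in card[name])
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def export_model_card_json(card: dict[str, list[str]]) -> str:
    """Machine-readable export of the same validated content."""
    missing = missing_card_sections(card)
    if missing:
        raise ValueError(f"model card is missing sections: {missing}")
    return json.dumps({name: card[name] for name in REQUIRED_SECTIONS},
                      indent=2, ensure_ascii=False)
```

Because the Markdown page and the JSON export come from one source, the human-readable card and the machine-readable one cannot drift apart between releases.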

Layer 3 — Downstream Deployer Package

When downstream developers and companies integrate your model, they are building products that may themselves be regulated as high-risk AI systems under the EU AI Act. Your GPAI model is the foundation layer — and downstream deployers need specific information from you to meet their compliance obligations.

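The GPAI chapter (Article 53(1)(b)) requires GPAI providers to give the downstream providers that integrate their model enough information to understand its capabilities and limitations and to comply with their own obligations. In practice, this means a structured deployer package.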

What downstream deployers need:

| Information | Why Deployers Need It |
|---|---|
| Model capabilities and limitations | They must not deploy your model for use cases it cannot handle safely |
| Prohibited use cases | Their own acceptable use policy must reflect your constraints |
| System prompt and input handling | So they can implement correct filtering and guardrails |
| Safety mitigation documentation | Downstream deployers inherit your safety work and need to understand it |
| Incident reporting contact | If their product causes a serious incident, they must contact you (and, for models with systemic risk, you must report it to the EU AI Office) |
| Model version notifications | If you release a materially changed version, deployers must be notified to re-assess their compliance |
| Usage data access | Whether you log inputs, aggregate usage, and under what conditions (GDPR implications) |
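Because deployers need to know when anything in the package changes, it can help to publish it with a content fingerprint they can compare. A minimal sketch, assuming a JSON package whose hypothetical keys mirror the table rows; fingerprinting with SHA-256 over canonical JSON is an illustrative design choice, not something the AI Act or the CoP prescribes.

```python
import hashlib
import json

# Hypothetical keys, one per row of the deployer table above.
DEPLOYER_ITEMS = (
    "capabilities_and_limitations",
    "prohibited_use_cases",
    "system_prompt_and_input_handling",
    "safety_mitigations",
    "incident_reporting_contact",
    "version_notifications",
    "usage_data_access",
)


def build_deployer_package(model_version: str, items: dict[str, str]) -> dict:
    """Assemble the package and attach a SHA-256 fingerprint of its content."""
    missing = [key for key in DEPLOYER_ITEMS if not items.get(key)]
    if missing:
        raise ValueError(f"deployer package incomplete: {missing}")
    package = {
        "model_version": model_version,
        "items": {key: items[key] for key in DEPLOYER_ITEMS},
    }
    # Canonical JSON (sorted keys, fixed separators) so identical content
    # always hashes to the same digest.
    canonical = json.dumps(package, sort_keys=True, separators=(",", ":"))
    package["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return package
```

A deployer can store the digest it last reviewed; a different digest on the next fetch signals that something changed, even if the version string did not.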

Practical implementation: many GPAI providers deliver this package through a developer portal that brings these items together in one place.

Layer 4 — Training Data Summary (Published)

One of the most practically challenging requirements is the published summary of training data. Unlike internal technical documentation, this must be publicly available.

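The published summary does not require revealing your exact dataset composition; competitive sensitivity is acknowledged. But it must be detailed enough for parties with legitimate interests, including copyright holders, to exercise and enforce their rights, and it should follow the template provided by the AI Office.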

Minimum content for a compliant training data summary:

  1. Data categories: Web text, books, code, scientific papers, dialogue, images (and approximate proportions)
  2. Collection period: When data was collected
  3. Geographic representation: Primary languages, cultural contexts overrepresented or underrepresented
  4. Opt-out status: Whether the provider operated an opt-out mechanism, and whether known opt-outs were honored
  5. Legal basis for use: What legal basis was relied upon for copyrighted content (e.g., EU text and data mining exception under DSM Directive Article 4)
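A summary covering these five items can be checked before publication. The sketch below assumes a hypothetical `TrainingDataSummary` structure (not the AI Office template's format) and flags category proportions that do not add up to roughly 100%, missing fields, and an opt-out mechanism whose known opt-outs were not honored.

```python
from dataclasses import dataclass


@dataclass
class TrainingDataSummary:
    """Hypothetical structure for the five minimum items listed above."""
    category_shares: dict[str, float]  # approximate proportions, e.g. {"web text": 0.6}
    collection_period: str             # e.g. "2019-01 to 2025-12"
    primary_languages: list[str]
    opt_out_mechanism: bool            # did the provider operate one?
    opt_outs_honored: bool             # were known opt-outs honored?
    legal_basis: str                   # e.g. "DSM Directive Art. 4 TDM exception"

    def problems(self, tolerance: float = 0.02) -> list[str]:
        """Human-readable issues; an empty list means the checks passed."""
        issues = []
        total = sum(self.category_shares.values())
        if abs(total - 1.0) > tolerance:
            issues.append(f"category shares sum to {total:.2f}, expected ~1.00")
        if any(share < 0 for share in self.category_shares.values()):
            issues.append("negative category share")
        if not self.collection_period.strip():
            issues.append("collection period missing")
        if not self.primary_languages:
            issues.append("primary languages missing")
        if not self.legal_basis.strip():
            issues.append("legal basis for copyrighted content missing")
        if self.opt_out_mechanism and not self.opt_outs_honored:
            issues.append("opt-out mechanism operated but known opt-outs not honored")
        return issues
```

The tolerance exists because the published proportions are approximate; a summary that rounds each category to the nearest percent should still pass.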

Copyright documentation is an area where GPAI providers are often underprepared. The EU AI Act's GPAI chapter (Article 53(1)(c)) requires providers to put in place a policy for complying with EU copyright law, including identifying and honoring rights reservations, and to document that policy.

The specifics hinge on the Text and Data Mining (TDM) exception in the EU Directive on Copyright in the Digital Single Market (DSM Directive, 2019/790). Under Article 4 of the DSM Directive, rights holders may reserve their rights against TDM — and if they have done so in a machine-readable format, GPAI providers are expected to have honored those reservations.

What your copyright compliance documentation must address:

| Item | Documentation Requirement |
|---|---|
| TDM reservation checking | Did you check for tdm-reservation headers, robots.txt, or Terms of Service restrictions? |
| Reservation honoring | How did you handle content where rights holders reserved TDM rights? |
| Licensed content | List of licensed datasets with the license type |
| Infringement claims | Process for handling copyright infringement claims post-training |
| Ongoing policy | What is your process for new training data? |

This documentation must be created before your model enters enforcement scope — you cannot retroactively document that you honored TDM reservations for data collected three years ago without contemporaneous records.
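Contemporaneous records are easiest to keep if every reservation check writes a timestamped entry at collection time. The sketch below assumes a hypothetical crawler user agent and that you already hold the page's robots.txt and response headers. It checks two machine-readable signals: robots.txt via Python's standard `urllib.robotparser`, and the `tdm-reservation` response header from the W3C TDM Reservation Protocol (TDMRep) Community Group report, where `1` means rights are reserved. It does not evaluate Terms of Service or other forms of reservation, and its include/exclude rule is a simplification.

```python
import json
import urllib.robotparser
from datetime import datetime, timezone


def check_tdm_signals(url: str, user_agent: str, robots_txt: str,
                      headers: dict[str, str]) -> dict:
    """Check robots.txt and the TDMRep `tdm-reservation` header for one URL.

    Returns a timestamped record for an append-only audit log. Terms of
    Service and other forms of reservation are NOT evaluated here.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    robots_allows = parser.can_fetch(user_agent, url)

    # HTTP header names are case-insensitive; normalise them before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    tdm_reserved = normalised.get("tdm-reservation") == "1"

    return {
        "url": url,
        "user_agent": user_agent,
        "robots_txt_allows": robots_allows,
        "tdm_reservation": tdm_reserved,
        "tdm_policy": normalised.get("tdm-policy"),  # optional policy URL, if sent
        # Simplified rule: exclude if either signal blocks or reserves the URL.
        "decision": "include" if robots_allows and not tdm_reserved else "exclude",
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def to_log_line(record: dict) -> str:
    """Serialise one record as a JSON line for an append-only log file."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    robots = "User-agent: ExampleTDMBot\nDisallow: /private/\n"
    record = check_tdm_signals("https://example.com/articles/1", "ExampleTDMBot",
                               robots, {"TDM-Reservation": "1"})
    print(to_log_line(record))  # decision is "exclude": the header reserves TDM rights
```

Writing the record at collection time, rather than reconstructing it later, is what turns the check into evidence.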


Version Control and Change Documentation

Model documentation is not a one-time effort. As models are updated, safety-evaluated, and refined, the documentation stack must be updated in parallel.

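What requires documentation updates: any change that alters the model or its behavior, including new training runs, architectural changes, fine-tuning and safety updates, and inference-parameter changes. The versioning scheme below maps each type of change to a version level.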

Recommended versioning approach:

Version format: {major}.{minor}.{patch}-{date}
Example: 2.1.3-2026-06-01

Major: New training run or architectural change
Minor: Fine-tuning update, safety refinement with changed capabilities
Patch: Bug fix in API, inference parameter change
Date: Release date for traceability

Downstream deployers should receive automated notifications when a new major or minor version is released. Patching your model without telling deployers is a compliance risk if the patch changes safety-relevant behavior.
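The notification rule can be encoded so that it is not left to judgment at release time. A minimal sketch, assuming version strings follow the `{major}.{minor}.{patch}-{date}` format exactly and that a human flags whether a patch is safety-relevant; the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

# {major}.{minor}.{patch}-{date}, e.g. 2.1.3-2026-06-01
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d{4}-\d{2}-\d{2})")


@dataclass(frozen=True)
class ModelVersion:
    major: int
    minor: int
    patch: int
    released: date

    @classmethod
    def parse(cls, text: str) -> "ModelVersion":
        """Parse a version string, rejecting anything that is not an exact match."""
        match = VERSION_RE.fullmatch(text)
        if match is None:
            raise ValueError(f"not a {{major}}.{{minor}}.{{patch}}-{{date}} version: {text!r}")
        major, minor, patch, released = match.groups()
        return cls(int(major), int(minor), int(patch), date.fromisoformat(released))


def deployers_must_be_notified(old: str, new: str, safety_relevant: bool = False) -> bool:
    """Hypothetical policy: always notify on a major or minor change; notify on a
    patch only when it has been flagged as changing safety-relevant behavior."""
    before, after = ModelVersion.parse(old), ModelVersion.parse(new)
    if (after.major, after.minor) != (before.major, before.minor):
        return True
    return after.patch != before.patch and safety_relevant
```

One design note: under Semantic Versioning 2.0.0, a hyphen introduces a pre-release identifier, so SemVer-aware tooling would sort 2.1.3-2026-06-01 before 2.1.3. If your release tooling parses SemVer, carrying the date as build metadata (2.1.3+2026-06-01) or in a separate field avoids that ambiguity.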


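Notifying the Commission: GPAI Models with Systemic Risk

Alongside the documentation stack, some GPAI providers have a notification duty. GPAI models are not entered into the EU database used for Annex III high-risk AI systems (Article 71), which covers downstream products. Instead, a provider whose model meets the systemic-risk criteria, presumed when cumulative training compute exceeds 10^25 FLOPs (Article 51(2)), must notify the Commission without delay and in any event within two weeks of meeting that threshold or knowing that it will be met (Article 52(1)).

The notification must include the information needed to show that the threshold has been met.

The Commission publishes and keeps up to date a list of GPAI models with systemic risk (Article 52(6)). Downstream deployers, regulators, and researchers can consult that list, but it is not a register of every GPAI model or a statement of any provider's compliance status.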
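Whether the Article 51(2) presumption applies depends on cumulative training compute above 10^25 FLOPs, a figure that also belongs in the Layer 1 technical documentation. A widely used back-of-the-envelope estimate for dense transformer pre-training is about 6 FLOPs per parameter per training token (roughly 2 for the forward pass and 4 for the backward pass). The sketch below applies that heuristic; it approximates a single dense pre-training run, is not the Commission's counting method, and the example model sizes are hypothetical.

```python
# Article 51(2) AI Act: presumption of high-impact capabilities above 10^25 FLOPs.
SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25


def estimate_training_flops(parameters: float, training_tokens: float) -> float:
    """Heuristic for dense transformer pre-training: ~6 FLOPs per parameter
    per token (about 2 forward + 4 backward). Ignores attention overhead,
    sparsity, and compute spent outside the main training run."""
    return 6.0 * parameters * training_tokens


def presumed_systemic_risk(parameters: float, training_tokens: float,
                           threshold: float = SYSTEMIC_RISK_FLOP_THRESHOLD) -> bool:
    """True if the estimated compute is strictly greater than the threshold."""
    return estimate_training_flops(parameters, training_tokens) > threshold


if __name__ == "__main__":
    for params, tokens in [(70e9, 15e12), (400e9, 15e12)]:
        flops = estimate_training_flops(params, tokens)
        print(f"{params:.0e} params x {tokens:.0e} tokens -> {flops:.1e} FLOPs, "
              f"presumed systemic risk: {presumed_systemic_risk(params, tokens)}")
```

Under this heuristic, a hypothetical 70B-parameter model trained on 15T tokens lands around 6.3 × 10^24 FLOPs (below the threshold), while a 400B-parameter model on the same data lands around 3.6 × 10^25 FLOPs (above it). A result close to the line is a reason to do a proper accounting, not to rely on the estimate.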


Infrastructure Considerations: Why EU Hosting Matters for Documentation

The documentation requirements create a practical hosting question: where does your compliance documentation infrastructure live?

For GPAI providers with EU users, there are two considerations:

1. GDPR compliance of your developer portal: If your documentation site collects developer account data (email, API key usage logs, incident reports), that processing is subject to GDPR. Non-EU-hosted portals can face transfer restrictions.

2. Data sovereignty for your technical documentation: Technical documentation submitted to the EU AI Office may need to remain in EU jurisdiction for audit purposes. Storing it in AWS us-east-1 or similar creates potential conflicts with data residency expectations.

What EU-native providers offer here: infrastructure from European providers like sota.io, Scaleway, Hetzner, or OVHcloud operates under German, French, and EU legal frameworks, with no CLOUD Act exposure and no third-party government access without EU legal process. For GPAI compliance documentation that may include commercially sensitive training data details, that distinction matters.


Implementation Checklist: Model Card Compliance by August 2

Phase 1: Now (June 2026)

Phase 2: July 2026

Phase 3: August 2, 2026 (enforcement date)


What Comes Next in This Series

The model card and documentation requirements are the foundation. But they feed into the more complex GPAI obligations:

Post #3 — Copyright compliance in depth: Training data audits, TDM reservation checking at scale, and managing unresolved copyright claims.

Post #4 — Systemic risk assessment: Red teaming protocols, capability evaluations, and the serious incident definition for GPAI providers.

Post #5 — The GPAI compliance stack finale: Putting it together — registry, model card, incident reporting, and EU infrastructure architecture for August 2.


Summary

GPAI compliance documentation is a four-layer stack: internal technical documentation, a public model card, a downstream deployer package, and a published training data summary. Together these layers cover the documentation obligations in the EU AI Act's GPAI chapter, and they should be in place before the Commission's fining powers for GPAI providers apply on August 2, 2026.

The model card is not a marketing document — it is a structured technical disclosure that enables downstream compliance, regulatory audit, and incident response. Getting the structure right now saves significant remediation effort as enforcement begins.

For development teams building on EU infrastructure, the documentation work and the infrastructure decisions interact: where you store compliance records, where your developer portal lives, and how you handle cross-border data flows all have compliance implications under both the EU AI Act and GDPR.

sota.io provides the EU-native deployment infrastructure that GPAI providers need for both their production AI services and their compliance documentation systems — no CLOUD Act exposure, no transatlantic data transfer complexity, deployed in German data centres under EU legal frameworks.
