GPAI Model Card & Transparency Documentation: The Developer's Technical Guide
Post #2 in the sota.io EU AI Act GPAI Compliance Series
With 58 days until August 2, 2026, GPAI model providers face a documentation challenge unlike anything in previous AI regulation. The EU AI Act's GPAI chapter doesn't just ask what your model does — it demands structured evidence of how it was built, what data it was trained on, what risks it carries, and what downstream deployers need to use it safely.
This post is a technical breakdown of the model card and transparency documentation requirements under the GPAI Code of Practice (CoP). If you are building, fine-tuning, or distributing a general-purpose AI model that reaches EU users, this documentation framework applies to you.
Why Model Documentation Is the Foundation of GPAI Compliance
Before enforcement, compliance is a documentation problem. The EU AI Office cannot assess your model without your records. Downstream deployers cannot make informed integration decisions without your disclosures. Incident investigators cannot reconstruct what happened without your version history.
The GPAI Code of Practice, finalized in July 2025 after four drafting rounds with industry, civil society, and academics, operationalizes the GPAI chapter's documentation requirements into concrete deliverables. Providers that implement the CoP correctly gain a compliance presumption: a rebuttable safe harbor that shifts the burden of proof from you to the regulator.
Providers that lack documentation are exposed on several fronts at once: regulatory assessment, downstream deployer due diligence, and incident investigation.
The GPAI Documentation Stack: Four Layers
The GPAI compliance documentation is not a single document — it is a structured stack of four interconnected layers, each serving a different audience.
Layer 1 — Technical Documentation (Internal)
This is your comprehensive engineering record. It does not get published in full, but must be made available to the EU AI Office on request and to authorized downstream providers under appropriate confidentiality agreements.
Required content under the GPAI chapter:
| Section | What to Document |
|---|---|
| Model architecture | Transformer type, parameter count, context length, modalities (text/image/audio/code) |
| Training compute | Total FLOPs, hardware used, training duration |
| Training data provenance | Dataset names, sizes, collection dates, sources, deduplication methods |
| Fine-tuning details | Any supervised fine-tuning, RLHF, DPO, LoRA adapters applied |
| Evaluation benchmarks | Capability assessments across domains (MMLU, HumanEval, MT-Bench, safety evals) |
| Known limitations | Failure modes, documented hallucination patterns, domain gaps |
| Safety testing | Pre-deployment red team results, harmful content probing, refusal rates |
| Model versions | Version numbering, changelog, deployment dates |
The key principle: if the EU AI Office asked your head of AI engineering to explain a safety incident tomorrow, could they answer from this documentation alone? If not, the documentation is incomplete.
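One way to keep that question answerable is to treat the Layer 1 table as a checklist that can be tested automatically. The following is a minimal sketch, not a prescribed format: the `TechnicalDocumentation` class, its field names, and the `missing_sections` helper are hypothetical and simply mirror the table rows above.

```python
from dataclasses import dataclass, fields


@dataclass
class TechnicalDocumentation:
    """Hypothetical internal record; one free-text field per Layer 1 table row."""
    model_architecture: str = ""
    training_compute: str = ""
    training_data_provenance: str = ""
    fine_tuning_details: str = ""
    evaluation_benchmarks: str = ""
    known_limitations: str = ""
    safety_testing: str = ""
    model_versions: str = ""


def missing_sections(doc: TechnicalDocumentation) -> list[str]:
    """Return the names of sections that are still empty or whitespace-only."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# Placeholder values for illustration only.
doc = TechnicalDocumentation(
    model_architecture="Decoder-only transformer, text and code",
    training_compute="Total FLOPs, hardware, and duration recorded here",
)
print(missing_sections(doc))
```

Running a check like this in CI on every release would flag a documentation gap before a regulator or deployer does.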
Layer 2 — Model Card (Public-Facing Summary)
The model card is the public-facing summary of your technical documentation. It is the primary disclosure mechanism for downstream integrators and developers who want to understand what they are integrating.
A CoP-compliant model card is not a marketing document. It is a structured technical disclosure.
Mandatory model card sections:
## Model Overview
- Model name and version
- Release date
- License type (commercial, open-weight, API-only)
- Provider name and contact for compliance questions
## Intended Use
- Primary intended uses and users
- Out-of-scope uses (explicitly documented)
- Known harmful use patterns to avoid
## Training Data
- Data sources (categories, not necessarily specific URLs)
- Approximate data volume
- Data collection period
- Whether copyrighted material was included and under what legal basis
- Opt-out/reservation policy status
## Capabilities
- Supported languages
- Performance characteristics by domain (benchmarks)
- Context window
- Modalities
## Limitations and Known Risks
- Performance limitations (e.g., knowledge cutoff date)
- Documented failure modes
- Bias assessments and known skews
- Recommended human oversight level for specific use cases
## Safety and Evaluation
- Pre-deployment safety evaluations performed
- Red team summary
- Harmful content mitigation approach
## Version History
- Major version changes
- Date of last update
- Change summary
## Contact and Compliance
- Responsible AI contact email
- EU representative contact (if non-EU provider)
- Incident reporting channel
This is not optional structure — the CoP's workstream on transparency specifically calls for standardized, machine-readable model cards to enable downstream compliance automation.
Layer 3 — Downstream Deployer Package
When downstream developers and companies integrate your model, they are building products that may themselves be regulated as high-risk AI systems under the EU AI Act. Your GPAI model is the foundation layer — and downstream deployers need specific information from you to meet their compliance obligations.
The GPAI chapter requires GPAI providers to give downstream deployers "all necessary information" to comply. In practice, this means a structured deployer package.
What downstream deployers need:
| Information | Why Deployers Need It |
|---|---|
| Model capabilities and limitations | They must not deploy your model for use cases it cannot handle safely |
| Prohibited use cases | Their own acceptable use policy must reflect your constraints |
| System prompt and input handling | So they can implement correct filtering and guardrails |
| Safety mitigation documentation | Downstream deployers inherit your safety work — they need to understand it |
| Incident reporting contact | If their product causes a serious incident, they must contact you (and you must contact the EU AI Office) |
| Model version notifications | If you release a materially changed version, deployers must be notified to re-assess their compliance |
| Usage data access | They need to know whether you log inputs or aggregate usage data, and under what conditions (GDPR implications) |
Practical implementation: Most GPAI providers deliver this via a developer portal combining:
- A public documentation site (model card, API docs, usage policies)
- A terms-of-service / acceptable use policy that downstream deployers sign
- A change notification system (email/webhook) for material model updates
- A security contact channel for vulnerability disclosure
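The change notification piece of that portal can be sketched as a signed webhook payload. This is one possible design under stated assumptions: the event name, the payload fields, and the shared-secret HMAC scheme are illustrative choices, not something the CoP prescribes.

```python
import hashlib
import hmac
import json


def build_notification(model: str, old_version: str, new_version: str,
                       change_summary: str, safety_relevant: bool) -> bytes:
    """Serialize a model update notice as canonical (sorted-key) JSON bytes."""
    payload = {
        "event": "model.version_released",  # hypothetical event name
        "model": model,
        "previous_version": old_version,
        "new_version": new_version,
        "change_summary": change_summary,
        "safety_relevant": safety_relevant,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def sign(body: bytes, secret: bytes) -> str:
    """Compute an HMAC-SHA256 signature the deployer can verify."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, secret: bytes, signature: str) -> bool:
    """Constant-time check that the body was signed with the shared secret."""
    return hmac.compare_digest(sign(body, secret), signature)


secret = b"shared-secret-for-this-deployer"  # placeholder secret
body = build_notification(
    "example-model", "2.1.3-2026-06-01", "2.2.0-2026-07-15",
    "Refusal policy updated after safety fine-tuning", True,
)
signature = sign(body, secret)
print(verify(body, secret, signature))
```

Signing the payload gives deployers evidence that a notice really came from the provider, which is useful when the notice triggers a compliance re-assessment on their side.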
Layer 4 — Training Data Summary (Published)
One of the most practically challenging requirements is the published summary of training data. Unlike internal technical documentation, this must be publicly available.
The published summary does not require revealing your exact dataset composition — competitive sensitivity is acknowledged. But it must provide enough information for:
- Copyright holders to determine whether their work was included
- Researchers to assess potential biases in the training corpus
- Downstream deployers to make jurisdiction-appropriate deployment decisions
Minimum content for a compliant training data summary:
- Data categories: Web text, books, code, scientific papers, dialogue, images (and approximate proportions)
- Collection period: When data was collected
- Geographic representation: Primary languages, cultural contexts overrepresented or underrepresented
- Opt-out status: Whether the provider operated an opt-out mechanism, and whether known opt-outs were honored
- Legal basis for use: What legal basis was relied upon for copyrighted content (e.g., EU text and data mining exception under DSM Directive Article 4)
Copyright Compliance Documentation
Copyright documentation is an area where many GPAI providers have been caught underprepared. The EU AI Act's GPAI chapter requires providers to implement a policy for complying with EU copyright law and to document that policy.
The specifics hinge on the Text and Data Mining (TDM) exception in the EU Directive on Copyright in the Digital Single Market (DSM Directive, 2019/790). Under Article 4 of the DSM Directive, rights holders may reserve their rights against TDM — and if they have done so in a machine-readable format, GPAI providers are expected to have honored those reservations.
What your copyright compliance documentation must address:
| Item | Documentation Requirement |
|---|---|
| TDM reservation checking | Did you check for tdm-reservation headers, robots.txt, or Terms of Service restrictions? |
| Reservation honoring | How did you handle content where rights holders reserved TDM rights? |
| Licensed content | List of licensed datasets with the license type |
| Infringement claims | Process for handling copyright infringement claims post-training |
| Ongoing policy | What is your process for new training data? |
This documentation must be created before your model enters enforcement scope — you cannot retroactively document that you honored TDM reservations for data collected three years ago without contemporaneous records.
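A contemporaneous record can be produced at crawl time. The sketch below is a simplified illustration: it checks a robots.txt body with Python's standard `urllib.robotparser` and looks for a `tdm-reservation` response header, treating the value `1` as "rights reserved" (an assumption based on the TDM Reservation Protocol; verify it against the specification you rely on). The crawler name `ExampleTrainingBot` and the record fields are hypothetical, and Terms of Service restrictions are not covered here.

```python
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate an already-downloaded robots.txt body for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers: dict[str, str]) -> bool:
    """True when the response carries `tdm-reservation: 1` (assumed semantics)."""
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    return normalized.get("tdm-reservation") == "1"


def reservation_record(url: str, user_agent: str,
                       robots_txt: str, headers: dict[str, str]) -> dict:
    """Build a timestamped record of the checks made for one URL."""
    allowed = robots_allows(robots_txt, user_agent, url)
    reserved = tdm_reserved(headers)
    return {
        "url": url,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "robots_allows_fetch": allowed,
        "tdm_reservation_header": reserved,
        "include_in_training": allowed and not reserved,
    }


ROBOTS = "User-agent: ExampleTrainingBot\nDisallow: /private/\n"
record = reservation_record(
    "https://example.com/blog/post", "ExampleTrainingBot",
    ROBOTS, {"TDM-Reservation": "1"},
)
print(record["include_in_training"])
```

Appending each record to an immutable log at collection time is what makes the documentation contemporaneous rather than reconstructed.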
Version Control and Change Documentation
Model documentation is not a one-time effort. As models are updated, safety-evaluated, and refined, the documentation stack must be updated in parallel.
What requires documentation updates:
- Model weight updates: Any change to the model's weights that materially affects behavior requires a new model version with updated documentation
- Safety measure changes: If you modify your content filtering, guardrails, or refusal policies, downstream deployers must be notified
- Incident-driven changes: Post-incident model modifications must be documented and communicated to affected downstream deployers
- Benchmark changes: If re-evaluation reveals capability changes (positive or negative), update the model card
Recommended versioning approach:
Version format: {major}.{minor}.{patch}-{date}
Example: 2.1.3-2026-06-01
Major: New training run or architectural change
Minor: Fine-tuning update, safety refinement with changed capabilities
Patch: Bug fix in API, inference parameter change
Date: Release date for traceability
Downstream deployers should receive automated notifications when a new major or minor version is released. Patching your model without telling deployers is a compliance risk if the patch changes safety-relevant behavior.
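The versioning scheme and the notification rule above can be expressed in a few lines. This is a minimal sketch: `ModelVersion`, `parse_version`, and `must_notify` are hypothetical names, and the `safety_relevant_patch` flag is an assumption standing in for whatever internal review decides that a patch changes safety-relevant behavior.

```python
import re
from dataclasses import dataclass
from datetime import date

# Matches {major}.{minor}.{patch}-{YYYY-MM-DD}, e.g. 2.1.3-2026-06-01
VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d{4}-\d{2}-\d{2})$")


@dataclass(frozen=True)
class ModelVersion:
    major: int
    minor: int
    patch: int
    released: date


def parse_version(text: str) -> ModelVersion:
    """Parse a version string, raising ValueError on malformed input."""
    match = VERSION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid model version: {text!r}")
    major, minor, patch, released = match.groups()
    return ModelVersion(int(major), int(minor), int(patch), date.fromisoformat(released))


def must_notify(old: ModelVersion, new: ModelVersion,
                safety_relevant_patch: bool = False) -> bool:
    """Notify on any major or minor change, and on patches flagged as safety-relevant."""
    if (new.major, new.minor) != (old.major, old.minor):
        return True
    return new.patch != old.patch and safety_relevant_patch


old = parse_version("2.1.3-2026-06-01")
new = parse_version("2.2.0-2026-07-15")
print(must_notify(old, new))
```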
The EU AI Office Database Registration
Alongside the documentation stack, GPAI model providers must register their models in the EU AI Office's model database. This is separate from the Annex III high-risk AI system database (which covers downstream products), and applies specifically to GPAI models.
Registration requires:
- Model name and version
- Provider identity and EU representative (if non-EU)
- Model card URL
- Date of first EU market placement
- Contact for compliance inquiries
- Declaration of compliance status (CoP signatory or alternative compliance approach)
The registration creates a public-facing record that downstream deployers, regulators, and researchers can query to verify a provider's compliance status.
Infrastructure Considerations: Why EU Hosting Matters for Documentation
The documentation requirements create a practical hosting question: where does your compliance documentation infrastructure live?
For GPAI providers with EU users, there are two considerations:
1. GDPR compliance of your developer portal: If your documentation site collects developer account data (email, API key usage logs, incident reports), that processing is subject to GDPR. Non-EU-hosted portals can face transfer restrictions.
2. Data sovereignty for your technical documentation: Technical documentation submitted to the EU AI Office may need to remain in EU jurisdiction for audit purposes. Storing it in AWS us-east-1 or similar creates potential conflicts with data residency expectations.
What EU-native providers offer: Infrastructure from European providers such as sota.io, Scaleway, Hetzner, or OVHcloud operates under German, French, and EU legal frameworks, which avoids CLOUD Act exposure and third-party government access outside EU legal process. For GPAI compliance documentation that may include commercially sensitive training data summaries, that distinction matters.
Implementation Checklist: Model Card Compliance by August 2
Phase 1 — Now (June 2026)
- Audit existing documentation: do you have contemporaneous records for your training data provenance?
- Create or update your model card using the structure above
- Draft your training data summary (the version to be published)
- Document your copyright compliance policy and checking process
- Identify your EU representative if you are a non-EU provider
Phase 2 — July 2026 (final month before enforcement)
- Review final CoP text against your documentation stack
- Sign the Code of Practice or document your alternative compliance approach
- Register with EU AI Office model database
- Set up downstream deployer notification system (email/webhook for model version updates)
- Test your incident reporting pipeline (can you contact the EU AI Office within required timeframes?)
Phase 3 — August 2, 2026 (enforcement date)
- All documentation complete and version-controlled
- Model card publicly accessible
- Downstream deployer package delivered to all API customers
- Incident reporting channel live and tested
- Compliance monitoring running
What Comes Next in This Series
The model card and documentation requirements are the foundation. But they feed into the more complex GPAI obligations:
Post #3 — Copyright compliance in depth: Training data audits, TDM reservation checking at scale, and managing unresolved copyright claims.
Post #4 — Systemic risk assessment: Red teaming protocols, capability evaluations, and the serious incident definition for GPAI providers.
Post #5 — The GPAI compliance stack finale: Putting it together — registry, model card, incident reporting, and EU infrastructure architecture for August 2.
Summary
GPAI compliance documentation is a four-layer stack: internal technical documentation, a public model card, a downstream deployer package, and a published training data summary. The EU AI Act's GPAI chapter requires all four layers to be in place before August 2, 2026.
The model card is not a marketing document — it is a structured technical disclosure that enables downstream compliance, regulatory audit, and incident response. Getting the structure right now saves significant remediation effort as enforcement begins.
For development teams building on EU infrastructure, the documentation work and the infrastructure decisions interact: where you store compliance records, where your developer portal lives, and how you handle cross-border data flows all have compliance implications under both the EU AI Act and GDPR.
sota.io provides the EU-native deployment infrastructure that GPAI providers need for both their production AI services and their compliance documentation systems — no CLOUD Act exposure, no transatlantic data transfer complexity, deployed in German data centres under EU legal frameworks.