2026-06-05·5 min read·sota.io Team

GPAI Model Card & Transparency Documentation: The Developer's Technical Guide

Post #2 in the sota.io EU AI Act GPAI Compliance Series


With 58 days until August 2, 2026, GPAI model providers face a documentation challenge unlike anything in previous AI regulation. The EU AI Act's GPAI chapter doesn't just ask what your model does — it demands structured evidence of how it was built, what data it was trained on, what risks it carries, and what downstream deployers need to use it safely.

This post is a technical breakdown of the model card and transparency documentation requirements under the GPAI Code of Practice (CoP). If you are building, fine-tuning, or distributing a general-purpose AI model that reaches EU users, this documentation framework applies to you.

Why Model Documentation Is the Foundation of GPAI Compliance

Before enforcement, compliance is a documentation problem. The EU AI Office cannot assess your model without your records. Downstream deployers cannot make informed integration decisions without your disclosures. Incident investigators cannot reconstruct what happened without your version history.

The GPAI Code of Practice, published in July 2025 and built from four drafting rounds with industry, civil society, and academics, operationalizes the GPAI chapter's documentation requirements into concrete deliverables. Providers that implement the CoP correctly gain a compliance presumption: a rebuttable safe harbor that shifts the burden of proof from you to the regulator.

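Providers that lack documentation are exposed on all three fronts at once: they cannot answer an EU AI Office request, cannot support their downstream deployers, and cannot reconstruct an incident after the fact.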

The GPAI Documentation Stack: Four Layers

The GPAI compliance documentation is not a single document — it is a structured stack of four interconnected layers, each serving a different audience.

Layer 1 — Technical Documentation (Internal)

This is your comprehensive engineering record. It does not get published in full, but must be made available to the EU AI Office on request and to authorized downstream providers under appropriate confidentiality agreements.

Required content under the GPAI chapter:

| Section | What to Document |
|---|---|
| Model architecture | Transformer type, parameter count, context length, modalities (text/image/audio/code) |
| Training compute | Total FLOPs, hardware used, training duration |
| Training data provenance | Dataset names, sizes, collection dates, sources, deduplication methods |
| Fine-tuning details | Any supervised fine-tuning, RLHF, DPO, LoRA adapters applied |
| Evaluation benchmarks | Capability assessments across domains (MMLU, HumanEval, MT-Bench, safety evals) |
| Known limitations | Failure modes, documented hallucination patterns, domain gaps |
| Safety testing | Pre-deployment red team results, harmful content probing, refusal rates |
| Model versions | Version numbering, changelog, deployment dates |

The key principle: if the EU AI Office asked your head of AI engineering to explain a safety incident tomorrow, could they answer from this documentation alone? If not, the documentation is incomplete.
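One way to make that test routine is a small completeness check that runs whenever the record changes. The sketch below is only an illustration: the section keys mirror the table above, but the `TechnicalDocumentation` class and its field names are assumptions, not a format defined by the EU AI Act or the CoP.

```python
from dataclasses import dataclass, field

# Section keys mirror the Layer 1 table; the identifiers are illustrative.
REQUIRED_SECTIONS = (
    "model_architecture",
    "training_compute",
    "training_data_provenance",
    "fine_tuning_details",
    "evaluation_benchmarks",
    "known_limitations",
    "safety_testing",
    "model_versions",
)


@dataclass
class TechnicalDocumentation:
    """Internal engineering record with one free-form entry per section."""
    model_name: str
    sections: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return required sections that are absent or contain only whitespace."""
        return [
            name for name in REQUIRED_SECTIONS
            if not self.sections.get(name, "").strip()
        ]


doc = TechnicalDocumentation(
    model_name="example-model",
    sections={
        "model_architecture": "Decoder-only transformer, text and code",
        "training_compute": "Total FLOPs, hardware, and duration recorded",
        "known_limitations": "   ",  # blank entries count as missing
    },
)
print(doc.missing_sections())
```

A check like this only proves that each section exists. Whether the content would actually let someone explain an incident still needs human review.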

Layer 2 — Model Card (Public-Facing Summary)

The model card is the public-facing summary of your technical documentation. It is the primary disclosure mechanism for downstream integrators and developers who want to understand what they are integrating.

A CoP-compliant model card is not a marketing document. It is a structured technical disclosure.

Mandatory model card sections:

## Model Overview
- Model name and version
- Release date
- License type (commercial, open-weight, API-only)
- Provider name and contact for compliance questions

## Intended Use
- Primary intended uses and users
- Out-of-scope uses (explicitly documented)
- Known harmful use patterns to avoid

## Training Data
- Data sources (categories, not necessarily specific URLs)
- Approximate data volume
- Data collection period
- Whether copyrighted material was included and under what legal basis
- Opt-out/reservation policy status

## Capabilities
- Supported languages
- Performance characteristics by domain (benchmarks)
- Context window
- Modalities

## Limitations and Known Risks
- Performance limitations (e.g., knowledge cutoff date)
- Documented failure modes
- Bias assessments and known skews
- Recommended human oversight level for specific use cases

## Safety and Evaluation
- Pre-deployment safety evaluations performed
- Red team summary
- Harmful content mitigation approach

## Version History
- Major version changes
- Date of last update
- Change summary

## Contact and Compliance
- Responsible AI contact email
- EU representative contact (if non-EU provider)
- Incident reporting channel

This is not optional structure — the CoP's workstream on transparency specifically calls for standardized, machine-readable model cards to enable downstream compliance automation.
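To show what "machine-readable" could mean in practice, here is a minimal sketch that treats the model card as structured data and reports missing mandatory fields. The field names in `MANDATORY_FIELDS` are hypothetical, derived from the section list above. The CoP does not prescribe this exact schema, so treat it as a starting point rather than a reference format.

```python
import json

# Hypothetical field names based on the section list above (not an official schema).
MANDATORY_FIELDS = {
    "model_overview": ["name", "version", "release_date", "license", "provider_contact"],
    "intended_use": ["primary_uses", "out_of_scope_uses"],
    "training_data": ["source_categories", "collection_period", "opt_out_policy"],
    "capabilities": ["languages", "context_window", "modalities"],
    "limitations": ["failure_modes", "bias_assessment"],
    "safety": ["evaluations", "red_team_summary"],
    "version_history": ["last_updated", "change_summary"],
    "contact": ["responsible_ai_email", "incident_channel"],
}


def validate_model_card(card: dict) -> list[str]:
    """Return dotted paths of mandatory fields that are missing or empty."""
    problems = []
    for section, fields in MANDATORY_FIELDS.items():
        body = card.get(section) or {}
        for name in fields:
            if body.get(name) in (None, "", [], {}):
                problems.append(f"{section}.{name}")
    return problems


card = {
    "model_overview": {
        "name": "example-model",
        "version": "2.1.3-2026-06-01",
        "release_date": "2026-06-01",
        "license": "API-only",
        "provider_contact": "compliance@example.com",
    },
    "intended_use": {"primary_uses": ["summarization"], "out_of_scope_uses": []},
}

missing = validate_model_card(card)
print(json.dumps({"valid": not missing, "missing_count": len(missing)}, indent=2))
```

Running a validator like this in CI keeps the published card from drifting out of sync with the template as the model evolves.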

Layer 3 — Downstream Deployer Package

When downstream developers and companies integrate your model, they are building products that may themselves be regulated as high-risk AI systems under the EU AI Act. Your GPAI model is the foundation layer — and downstream deployers need specific information from you to meet their compliance obligations.

The GPAI chapter requires GPAI providers to give downstream deployers "all necessary information" to comply. In practice, this means a structured deployer package containing:

What downstream deployers need:

| Information | Why Deployers Need It |
|---|---|
| Model capabilities and limitations | They must not deploy your model for use cases it cannot handle safely |
| Prohibited use cases | Their own acceptable use policy must reflect your constraints |
| System prompt and input handling | So they can implement correct filtering and guardrails |
| Safety mitigation documentation | Downstream deployers inherit your safety work, so they need to understand it |
| Incident reporting contact | If their product causes a serious incident, they must contact you (and you must contact the EU AI Office) |
| Model version notifications | If you release a materially changed version, deployers must be notified to re-assess their compliance |
| Usage data access | Whether you log inputs, aggregate usage, and under what conditions (GDPR implications) |

Practical implementation: Most GPAI providers deliver this via a developer portal combining:

- A public documentation site (model card, API docs, usage policies)
- A terms-of-service / acceptable use policy that downstream deployers sign
- A change notification system (email/webhook) for material model updates
- A security contact channel for vulnerability disclosure
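For the webhook part of a change notification system, deployers need a way to confirm that a notice really came from you. The sketch below signs the payload with HMAC-SHA256 using only the Python standard library. The event name, payload fields, and per-deployer shared secret are all assumptions made for illustration, since neither the Act nor the CoP specifies a wire format.

```python
import hashlib
import hmac
import json


def build_notification(model: str, old: str, new: str, summary: str) -> bytes:
    """Serialize a model-change notice deterministically so it can be signed."""
    payload = {
        "event": "model.version_changed",  # hypothetical event name
        "model": model,
        "previous_version": old,
        "new_version": new,
        "change_summary": summary,
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(body: bytes, secret: bytes) -> str:
    """HMAC-SHA256 signature that the deployer can recompute on receipt."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, secret: bytes, signature: str) -> bool:
    """Compare in constant time to avoid leaking signature bytes via timing."""
    return hmac.compare_digest(sign(body, secret), signature)


secret = b"shared-secret-per-deployer"  # placeholder; load from a secret store
body = build_notification(
    "example-model", "2.1.2-2026-05-10", "2.1.3-2026-06-01", "Refusal policy updated"
)
signature = sign(body, secret)

print(verify(body, secret, signature))         # True
print(verify(body + b" ", secret, signature))  # False: body was altered
```

Keeping a log of every signed notice alongside its delivery status also gives you evidence that deployers were informed of a material change.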

Layer 4 — Training Data Summary (Published)

One of the most practically challenging requirements is the published summary of training data. Unlike internal technical documentation, this must be publicly available.

The published summary does not require revealing your exact dataset composition — competitive sensitivity is acknowledged. But it must provide enough information for:

- Copyright holders to determine whether their work was included
- Researchers to assess potential biases in the training corpus
- Downstream deployers to make jurisdiction-appropriate deployment decisions

Minimum content for a compliant training data summary:

- Data categories: Web text, books, code, scientific papers, dialogue, images (and approximate proportions)
- Collection period: When data was collected
- Geographic representation: Primary languages, cultural contexts overrepresented or underrepresented
- Opt-out status: Whether the provider operated an opt-out mechanism, and whether known opt-outs were honored
- Legal basis for use: What legal basis was relied upon for copyrighted content (e.g., EU text and data mining exception under DSM Directive Article 4)
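The minimum content above maps naturally onto a small record with a few sanity checks before publication. The following is a sketch under stated assumptions: the `TrainingDataSummary` class, its field names, and the 1% tolerance on proportions are illustrative choices, not requirements taken from the regulation.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingDataSummary:
    category_shares: dict[str, float]  # approximate proportion per data category
    collection_start: str              # ISO 8601 date, e.g. "2023-01-01"
    collection_end: str
    primary_languages: list[str]
    opt_out_mechanism: bool
    opt_outs_honored: bool
    legal_basis: str

    def problems(self) -> list[str]:
        """Return human-readable inconsistencies found in the summary."""
        issues = []
        total = sum(self.category_shares.values())
        if not math.isclose(total, 1.0, abs_tol=0.01):
            issues.append(f"category shares sum to {total:.2f}, expected 1.00")
        # ISO dates sort correctly as plain strings.
        if self.collection_start > self.collection_end:
            issues.append("collection period ends before it starts")
        if self.opt_outs_honored and not self.opt_out_mechanism:
            issues.append("opt-outs marked honored but no mechanism declared")
        if not self.legal_basis.strip():
            issues.append("legal basis is empty")
        return issues


summary = TrainingDataSummary(
    category_shares={"web_text": 0.60, "code": 0.15, "books": 0.15, "papers": 0.10},
    collection_start="2023-01-01",
    collection_end="2025-06-30",
    primary_languages=["en", "de", "fr"],
    opt_out_mechanism=True,
    opt_outs_honored=True,
    legal_basis="TDM exception, DSM Directive Article 4",
)
print(summary.problems())
```

The checks catch internal contradictions only. They cannot confirm that the stated proportions or dates match what was actually collected.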

Copyright Compliance Documentation

Copyright documentation is an area where many GPAI providers have been caught underprepared. The EU AI Act's GPAI chapter requires providers to implement a policy for complying with EU copyright law and to document that policy.

The specifics hinge on the Text and Data Mining (TDM) exception in the EU Directive on Copyright in the Digital Single Market (DSM Directive, 2019/790). Under Article 4 of the DSM Directive, rights holders may reserve their rights against TDM — and if they have done so in a machine-readable format, GPAI providers are expected to have honored those reservations.

What your copyright compliance documentation must address:

| Item | Documentation Requirement |
|---|---|
| TDM reservation checking | Did you check for `tdm-reservation` headers, robots.txt, or Terms of Service restrictions? |
| Reservation honoring | How did you handle content where rights holders reserved TDM rights? |
| Licensed content | List of licensed datasets with the license type |
| Infringement claims | Process for handling copyright infringement claims post-training |
| Ongoing policy | What is your process for new training data? |

This documentation must be created before your model enters enforcement scope — you cannot retroactively document that you honored TDM reservations for data collected three years ago without contemporaneous records.
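A contemporaneous record can be as simple as a timestamped entry written at crawl time for every URL considered. The sketch below checks two of the signals from the table: the `tdm-reservation` response header and robots.txt. It assumes that a header value of `1` signals a reservation, as in the TDM Reservation Protocol, and uses a hypothetical crawler name. Terms of Service restrictions are not machine-checkable this way and are left out.

```python
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser


def check_tdm_reservation(url: str, headers: dict, robots_txt: str,
                          user_agent: str = "example-crawler") -> dict:
    """Evaluate reservation signals for one URL and return an audit record."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    header_value = normalized.get("tdm-reservation")
    header_reserved = header_value == "1"  # assumption: "1" means rights reserved

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    robots_allowed = robots.can_fetch(user_agent, url)

    return {
        "url": url,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "tdm_reservation_header": header_value,
        "robots_allowed": robots_allowed,
        "eligible_for_training": robots_allowed and not header_reserved,
    }


ROBOTS = "User-agent: *\nDisallow: /private/\n"

reserved = check_tdm_reservation(
    "https://example.com/articles/1", {"TDM-Reservation": "1"}, ROBOTS
)
blocked = check_tdm_reservation("https://example.com/private/x", {}, ROBOTS)
open_page = check_tdm_reservation("https://example.com/articles/2", {}, ROBOTS)

for record in (reserved, blocked, open_page):
    print(record["url"], record["eligible_for_training"])
```

Appending each record to write-once storage at collection time is what makes the evidence contemporaneous. Regenerating it later from a fresh crawl would only show the site's current state.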

Version Control and Change Documentation

Model documentation is not a one-time effort. As models are updated, safety-evaluated, and refined, the documentation stack must be updated in parallel.

What requires documentation updates:

- Model weight updates: Any change to the model's weights that materially affects behavior requires a new model version with updated documentation
- Safety measure changes: If you modify your content filtering, guardrails, or refusal policies, downstream deployers must be notified
- Incident-driven changes: Post-incident model modifications must be documented and communicated to affected downstream deployers
- Benchmark changes: If re-evaluation reveals capability changes (positive or negative), update the model card

Recommended versioning approach:

Version format: {major}.{minor}.{patch}-{date}
Example: 2.1.3-2026-06-01

Major: New training run or architectural change
Minor: Fine-tuning update, safety refinement with changed capabilities
Patch: Bug fix in API, inference parameter change
Date: Release date for traceability

Downstream deployers should receive automated notifications when a new major or minor version is released. Patching your model without telling deployers is a compliance risk if the patch changes safety-relevant behavior.
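The notification rule above can be encoded directly against the version format. This is a minimal sketch: the `safety_relevant` flag is an assumption standing in for whatever internal review decides that a patch changes safety-relevant behavior.

```python
import re
from datetime import date
from typing import NamedTuple

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d{4}-\d{2}-\d{2})$")


class ModelVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    released: date


def parse_version(text: str) -> ModelVersion:
    """Parse a '{major}.{minor}.{patch}-{date}' string such as '2.1.3-2026-06-01'."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a major.minor.patch-date version: {text!r}")
    major, minor, patch, released = match.groups()
    return ModelVersion(int(major), int(minor), int(patch), date.fromisoformat(released))


def must_notify(old: str, new: str, safety_relevant: bool = False) -> bool:
    """Notify on any major or minor change, and on patches flagged as safety-relevant."""
    previous, current = parse_version(old), parse_version(new)
    if (current.major, current.minor) != (previous.major, previous.minor):
        return True
    return safety_relevant


print(must_notify("2.1.2-2026-05-10", "2.1.3-2026-06-01"))                        # False
print(must_notify("2.1.2-2026-05-10", "2.1.3-2026-06-01", safety_relevant=True))  # True
print(must_notify("2.1.3-2026-06-01", "2.2.0-2026-07-01"))                        # True
```

Defaulting `safety_relevant` to `False` keeps routine API fixes quiet, so the release process has to set the flag deliberately. Teams that prefer to err on the side of over-notifying could invert the default.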

The EU AI Office Database Registration

Alongside the documentation stack, GPAI model providers must register their models in the EU AI Office's model database. This is separate from the Annex III high-risk AI system database (which covers downstream products), and applies specifically to GPAI models.

Registration requires:

- Model name and version
- Provider identity and EU representative (if non-EU)
- Model card URL
- Date of first EU market placement
- Contact for compliance inquiries
- Declaration of compliance status (CoP signatory or alternative compliance approach)

The registration creates a public-facing record that downstream deployers, regulators, and researchers can query to verify a provider's compliance status.

Infrastructure Considerations: Why EU Hosting Matters for Documentation

The documentation requirements create a practical hosting question: where does your compliance documentation infrastructure live?

For GPAI providers with EU users, there are two considerations:

1. GDPR compliance of your developer portal: If your documentation site collects developer account data (email, API key usage logs, incident reports), that processing is subject to GDPR. Non-EU-hosted portals can face transfer restrictions.

2. Data sovereignty for your technical documentation: Technical documentation submitted to the EU AI Office may need to remain in EU jurisdiction for audit purposes. Storing it in AWS us-east-1 or similar creates potential conflicts with data residency expectations.

What EU-native providers offer here: Infrastructure from European providers like sota.io, Scaleway, Hetzner, or OVHcloud is deployed under German/French/EU legal frameworks, with no CLOUD Act exposure and no third-party government access without EU legal process. For GPAI compliance documentation that may include commercially sensitive training data summaries, that distinction matters.

Implementation Checklist: Model Card Compliance by August 2

Phase 1 — Now (June 2026)

- Audit existing documentation: do you have contemporaneous records for your training data provenance?
- Create or update your model card using the structure above
- Draft your training data summary (the version to be published)
- Document your copyright compliance policy and checking process
- Identify your EU representative if you are a non-EU provider

Phase 2 — July 2026

- Review the final CoP text against your documentation stack
- Sign the Code of Practice or document your alternative compliance approach
- Register with the EU AI Office model database
- Set up a downstream deployer notification system (email/webhook for model version updates)
- Test your incident reporting pipeline (can you contact the EU AI Office within required timeframes?)

Phase 3 — August 2, 2026 (enforcement date)

- All documentation complete and version-controlled
- Model card publicly accessible
- Downstream deployer package delivered to all API customers
- Incident reporting channel live and tested
- Compliance monitoring running

What Comes Next in This Series

The model card and documentation requirements are the foundation. But they feed into the more complex GPAI obligations:

Post #3 — Copyright compliance in depth: Training data audits, TDM reservation checking at scale, and managing unresolved copyright claims.

Post #4 — Systemic risk assessment: Red teaming protocols, capability evaluations, and the serious incident definition for GPAI providers.

Post #5 — The GPAI compliance stack finale: Putting it together — registry, model card, incident reporting, and EU infrastructure architecture for August 2.

Summary

GPAI compliance documentation is a four-layer stack: internal technical documentation, a public model card, a downstream deployer package, and a published training data summary. The EU AI Act's GPAI chapter requires all four layers to be in place before August 2, 2026.

The model card is not a marketing document — it is a structured technical disclosure that enables downstream compliance, regulatory audit, and incident response. Getting the structure right now saves significant remediation effort as enforcement begins.

For development teams building on EU infrastructure, the documentation work and the infrastructure decisions interact: where you store compliance records, where your developer portal lives, and how you handle cross-border data flows all have compliance implications under both the EU AI Act and GDPR.

sota.io provides the EU-native deployment infrastructure that GPAI providers need for both their production AI services and their compliance documentation systems — no CLOUD Act exposure, no transatlantic data transfer complexity, deployed in German data centres under EU legal frameworks.
