EU AI Act Art.50 GPAI Content Labelling: Machine-Readable Metadata Standards & August 2026 Compliance Checklist
Post #1444 in the sota.io EU AI Act Art.50 Transparency Ops 2026 Series
61 days to August 2, 2026. EU AI Act Article 50 requires that AI-generated content be identifiable — not just to users through disclosure banners, but to automated systems through machine-readable metadata. This post covers the concrete technical standards: what metadata formats regulators expect, how C2PA Content Credentials work, where IPTC Digital Source Type fits, and what SaaS developers building on GPAI APIs must implement before the deadline.
Previous posts in this series covered what developers must disclose and how to implement watermarking. This post focuses on metadata standards and labelling — the layer above watermarking that enables automated content verification pipelines.
The Two-Layer Approach: Watermarks vs. Metadata Labels
Before diving into standards, it helps to understand why Art.50 compliance calls for two complementary approaches:
Layer 1: Signal-level watermarks (covered in the previous post). Imperceptible markers embedded directly in the content signal, designed to survive resaving, transcoding, and compression. Examples: SynthID for text and images, AudioSeal for audio. They survive format conversion, but humans and standard metadata readers cannot see them; detection needs a dedicated detector.
Layer 2: Machine-readable metadata labels (this post). Structured, machine-parseable records attached to or embedded in the content file, describing who generated the content, when, with which system, and what AI was involved. Verifiers, content management systems, and social media platforms can read them without specialized watermark detectors, but they can be stripped during file conversion.
Art.50(2) does not name either technique. It requires outputs to be "marked in a machine-readable format and detectable as artificially generated or manipulated", meaning a software system can extract the marking without human intervention, using technical solutions that are effective, interoperable, robust and reliable as far as technically feasible. Since metadata can be stripped and watermarks need a detector, combining both layers is the practical way to meet that bar.
Standard 1: C2PA Content Credentials (The Core Standard)
The Coalition for Content Provenance and Authenticity (C2PA) has published the de facto open standard for content provenance metadata. C2PA Content Credentials are already adopted by Adobe (via the Content Authenticity Initiative), Microsoft, Google, OpenAI, Stability AI, and Truepic.
What a C2PA Manifest Contains
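A C2PA manifest is stored as a JUMBF container holding CBOR-encoded claim data and a COSE signature, embedded in the content file or attached as a sidecar. SDKs and tools usually display it as JSON. The simplified illustration below shows the kind of information it carries: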
{
  "claim_generator": "YourApp/1.0 c2pa-rs/0.36",
  "format": "image/jpeg",
  "assertions": [
    {
      "label": "c2pa.actions",
      "data": {
        "actions": [
          {
            "action": "c2pa.created",
            "when": "2026-06-02T10:23:00Z",
            "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
            "softwareAgent": {
              "name": "YourAIService",
              "version": "2.1"
            }
          }
        ]
      }
    }
  ],
  "claim": {
    "signature": "<COSE signature over the claim>",
    "signingTime": "2026-06-02T10:23:01Z"
  }
}
The manifest is signed with the private key behind your organization's X.509 certificate, making it verifiable and tamper-evident. If the content is modified after signing, the content hash recorded in the manifest no longer matches and validation fails, alerting verifiers to potential manipulation.
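The tamper-evidence property can be illustrated without any C2PA tooling. The sketch below is not the C2PA format: `ToyClaim`, `signContent`, and `verifyContent` are hypothetical names, and a throwaway Ed25519 key pair stands in for the key behind an X.509 certificate. It only shows why binding a content hash into a signed claim makes later modification detectable.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical stand-in for a C2PA claim: binds a content hash to a signing time.
interface ToyClaim {
  contentSha256: string;
  signingTime: string;
}

const sha256Hex = (data: Buffer): string =>
  createHash("sha256").update(data).digest("hex");

// Throwaway key pair for the demo; production signing uses the key behind your certificate.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function signContent(content: Buffer): { claim: ToyClaim; signature: Buffer } {
  const claim: ToyClaim = {
    contentSha256: sha256Hex(content),
    signingTime: new Date().toISOString(),
  };
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, Buffer.from(JSON.stringify(claim)), privateKey);
  return { claim, signature };
}

function verifyContent(content: Buffer, claim: ToyClaim, signature: Buffer): boolean {
  const signatureOk = verify(null, Buffer.from(JSON.stringify(claim)), publicKey, signature);
  const hashOk = claim.contentSha256 === sha256Hex(content);
  return signatureOk && hashOk;
}

const original = Buffer.from("pretend these are JPEG bytes");
const { claim, signature } = signContent(original);
const tampered = Buffer.concat([original, Buffer.from("!")]);

console.log(verifyContent(original, claim, signature)); // true
console.log(verifyContent(tampered, claim, signature)); // false
```

A real verifier additionally checks the certificate chain and the manifest structure; this sketch covers only the hash-plus-signature idea.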
C2PA Embedding by File Type
| Format | Embedding Method | C2PA Spec Reference |
|---|---|---|
| JPEG | APP11 marker (JUMBF box) | C2PA Spec §11.1 |
| PNG | caBX ancillary chunk | C2PA Spec §11.2 |
| MP4/MOV | uuid box in moov/trak | C2PA Spec §11.3 |
| WAV/AIFF | JUMBF sidecar or LIST chunk | C2PA Spec §11.4 |
| PDF | AF entry in XMP | C2PA Spec §11.5 |
| Plain text/HTML | Sidecar .c2pa file or HTTP header | C2PA Spec §11.6 |
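For text content (blog posts, chatbot outputs, summaries), credentials are delivered out-of-band: as a .c2pa sidecar file or through HTTP response headers. The header names below are illustrative rather than standardized; the mechanism the C2PA specification itself defines for pointing at an external manifest is an HTTP Link header with rel="c2pa-manifest".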
Content-Credentials: <base64-encoded-manifest>
C2PA-Certificate: <url-to-public-key>
Implementation via c2pa-rs (Rust) or c2pa-node (Node.js)
// Sketch against the c2pa-node package as documented for its 0.5.x releases.
// Treat the exact names and shapes as assumptions and check them against your installed version.
import { createC2pa, createTestSigner, ManifestBuilder } from 'c2pa-node';
async function labelAIGeneratedImage(imageBuffer: Buffer, mimeType: string): Promise<Buffer> {
  const signer = await createTestSigner(); // Replace with real cert in production
  const c2pa = createC2pa({ signer }); // the signer is configured on the client
  const manifest = new ManifestBuilder({
    claim_generator: 'YourApp/1.0',
    format: mimeType,
    assertions: [
      {
        label: 'c2pa.actions',
        data: {
          actions: [{
            action: 'c2pa.created',
            when: new Date().toISOString(),
            softwareAgent: { name: 'YourAI', version: '1.0' },
          }],
        },
      },
    ],
  });
  // In-memory assets are passed as { buffer, mimeType }
  const { signedAsset } = await c2pa.sign({ asset: { buffer: imageBuffer, mimeType }, manifest });
  return signedAsset.buffer;
}
Production deployments should use a CA-signed certificate (not a self-signed test cert). C2PA certification programs are run through CAI (Content Authenticity Initiative) at contentauthenticity.org.
Standard 2: IPTC Digital Source Type Vocabulary
The International Press Telecommunications Council (IPTC) maintains the Iptc4xmpExt:DigitalSourceType vocabulary, a standardized set of URIs describing how digital content was created. The value is embedded via XMP (Extensible Metadata Platform), Adobe's widely supported metadata framework, which all major image formats can carry.
IPTC Digital Source Type Values for AI-Generated Content
| URI | Meaning | When to Use |
|---|---|---|
| http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia | Created by a trained AI model | AI text/image/audio generated from a model trained on data |
| http://cv.iptc.org/newscodes/digitalsourcetype/algorithmicMedia | Created by algorithm (not trained) | Procedural generation, parametric content |
| http://cv.iptc.org/newscodes/digitalsourcetype/compositeSynthetic | Human + AI composite | User-provided elements enhanced/modified by AI |
| http://cv.iptc.org/newscodes/digitalsourcetype/digitalArt | Human-created digital art | Not AI-generated: do not use for AI outputs |
For any content your AI system fully generates, use trainedAlgorithmicMedia. For mixed outputs where a user uploads an image and your AI modifies it, use compositeSynthetic.
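The selection rule above is small enough to encode once and reuse across a pipeline. A minimal sketch follows; `GenerationContext` and `digitalSourceType` are hypothetical names, and the mapping only covers the cases described in the table.

```typescript
const IPTC_BASE = "http://cv.iptc.org/newscodes/digitalsourcetype/";

// Hypothetical description of how one output was produced.
interface GenerationContext {
  usedTrainedModel: boolean;         // false for purely procedural generation
  includesUserSuppliedMedia: boolean; // true when the user uploaded input that the AI modified
}

// Maps a generation context to the IPTC Digital Source Type URI from the table above.
function digitalSourceType(ctx: GenerationContext): string {
  if (ctx.includesUserSuppliedMedia) {
    return IPTC_BASE + "compositeSynthetic";
  }
  return IPTC_BASE + (ctx.usedTrainedModel ? "trainedAlgorithmicMedia" : "algorithmicMedia");
}

console.log(digitalSourceType({ usedTrainedModel: true, includesUserSuppliedMedia: false }));
console.log(digitalSourceType({ usedTrainedModel: true, includesUserSuppliedMedia: true }));
```

Keeping the decision in one function means the same value can feed the XMP field, the C2PA action, and any storage metadata without drifting apart.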
Embedding IPTC Metadata via ExifTool (CLI)
# Add AI-generated label to an image
exiftool \
-XMP-iptcExt:DigitalSourceType="http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia" \
-XMP-dc:Creator="YourAI Service v2.1" \
-XMP-photoshop:Credit="AI-generated by YourAI" \
output_image.jpg
Embedding IPTC Metadata via sharp (Node.js)
import sharp from 'sharp';
async function addIptcMetadata(imageBuffer: Buffer): Promise<Buffer> {
return sharp(imageBuffer)
.withMetadata({
exif: {
IFD0: {
Software: 'YourAI Service v2.1',
ImageDescription: 'AI-generated content. trainedAlgorithmicMedia',
},
},
})
.toBuffer();
}
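Note: the snippet above writes EXIF tags only. It does not set the XMP Iptc4xmpExt:DigitalSourceType property, so on its own it does not produce an IPTC-conformant label. For complete XMP/IPTC fields, use ExifTool via a subprocess. The piexifjs library is not a substitute here, because it reads and writes EXIF only.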
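Calling ExifTool from Node.js could look like the sketch below. It assumes the `exiftool` binary is on the PATH; `exiftoolArgs` and `labelWithExifTool` are hypothetical helper names. `execFile` passes arguments without a shell, so the values need no quoting.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

const TRAINED_ALGORITHMIC_MEDIA =
  "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia";

// Builds the same tag assignments as the CLI example, one array element per argument.
function exiftoolArgs(filePath: string, sourceTypeUri: string, creator: string): string[] {
  return [
    `-XMP-iptcExt:DigitalSourceType=${sourceTypeUri}`,
    `-XMP-dc:Creator=${creator}`,
    "-overwrite_original", // do not leave a *_original backup file next to the output
    filePath,
  ];
}

// Assumption: the exiftool executable is installed and reachable on PATH.
async function labelWithExifTool(filePath: string): Promise<void> {
  await execFileAsync(
    "exiftool",
    exiftoolArgs(filePath, TRAINED_ALGORITHMIC_MEDIA, "YourAI Service v2.1"),
  );
}

console.log(exiftoolArgs("output_image.jpg", TRAINED_ALGORITHMIC_MEDIA, "YourAI Service v2.1"));
```

Separating argument construction from process execution keeps the labelling logic unit-testable on machines where ExifTool is not installed.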
Standard 3: SynthID Integration (GPAI API Consumers)
If you are building on Google's Gemini API, you may be able to query whether content was SynthID-watermarked. For your own content pipeline, Google provides the SynthID toolkit — but it is currently not fully open-source for all modalities.
Available SynthID tools:
- SynthID Text Detector: detects SynthID watermarks in text (Gemini outputs)
- SynthID Image: invisible watermarking for images (Imagen model output)
- SynthID Audio: inaudible watermarking for generated audio from Google DeepMind (AudioSeal is a separate watermark developed by Meta)
For GPAI API consumers (not Google Cloud customers), the primary obligation is to propagate, and not strip, the watermarks and metadata that upstream providers have embedded. If you receive an AI-generated image from an API, do not pass it through operations that strip EXIF/XMP metadata without reattaching it.
// ❌ Strips metadata, destroys Art.50 compliance
const strippedImage = await sharp(apiResponse.image)
.jpeg({ quality: 85 })
.toBuffer();
// ✅ Preserves metadata
const preservedImage = await sharp(apiResponse.image)
.jpeg({ quality: 85 })
.withMetadata() // This preserves EXIF/XMP
.toBuffer();
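The .withMetadata() call in sharp keeps the input's existing EXIF, XMP, and IPTC metadata in the output, including provenance fields. This is the minimal requirement for deployers who do not generate content themselves. Note that re-encoding changes the image bytes, so an embedded C2PA manifest's hash binding will no longer validate afterwards; re-sign the output if you need valid Content Credentials.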
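A cheap regression test can catch a pipeline step that silently drops the label. The sketch below relies on the assumption that the XMP packet is stored as uncompressed UTF-8 text inside the file, which is the usual case for JPEG; it is a crude byte search, not an XMP parser, and `hasAiSourceType` is a hypothetical helper.

```typescript
const AI_SOURCE_TYPE =
  "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia";

// Returns true if the IPTC source-type URI appears anywhere in the file bytes.
function hasAiSourceType(file: Buffer): boolean {
  return file.includes(AI_SOURCE_TYPE, 0, "utf8");
}

// Fake "files" for the demo: one carrying an XMP fragment, one with it stripped.
const labelled = Buffer.concat([
  Buffer.from([0xff, 0xd8]), // JPEG start-of-image marker
  Buffer.from(
    `<Iptc4xmpExt:DigitalSourceType>${AI_SOURCE_TYPE}</Iptc4xmpExt:DigitalSourceType>`,
    "utf8",
  ),
  Buffer.from([0xff, 0xd9]), // JPEG end-of-image marker
]);
const stripped = Buffer.from([0xff, 0xd8, 0xff, 0xd9]);

console.log(hasAiSourceType(labelled)); // true
console.log(hasAiSourceType(stripped)); // false
```

Running such a check on the input and output of every image-processing step in CI makes an accidental metadata strip fail a build rather than surface in an audit.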
Standard 4: HTTP-Level Content Provenance Headers
For text-based AI outputs (chatbots, summary APIs, content generation endpoints), file-embedding is not applicable. The emerging approach is HTTP response headers combined with a verifiable credential:
HTTP/1.1 200 OK
Content-Type: application/json
X-Content-Type-AI: generated
X-AI-Model: YourModel/2.1
X-AI-Provenance: https://your-api.example.com/.well-known/ai-provenance
X-C2PA-Manifest-URL: https://credentials.your-api.example.com/manifests/abc123
A .well-known/ai-provenance JSON document should describe:
- Model name and version
- Training data declaration
- Contact for provenance disputes
- Link to full C2PA certificate chain
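While HTTP headers are not yet formally mandated under Art.50, they align with the Commission's stated direction and follow an established pattern: OpenAI and Anthropic already return custom response headers for operational metadata (for example openai-organization and openai-processing-ms), although those particular headers do not declare provenance.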
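One way to keep the headers and the .well-known document consistent is to derive both from a single record. The sketch below uses the header names from the example above; the JSON field names in `wellKnownDocument` are assumptions, since no schema for this document is standardized.

```typescript
// Hypothetical single source of truth for a text endpoint's provenance data.
interface ProvenanceInfo {
  modelName: string;
  modelVersion: string;
  trainingDataDeclaration: string;
  disputeContact: string;
  certificateChainUrl: string;
}

// Response headers as shown in the example above (non-standard, illustrative names).
function provenanceHeaders(
  info: ProvenanceInfo,
  apiOrigin: string,
  manifestUrl: string,
): Record<string, string> {
  return {
    "X-Content-Type-AI": "generated",
    "X-AI-Model": `${info.modelName}/${info.modelVersion}`,
    "X-AI-Provenance": new URL("/.well-known/ai-provenance", apiOrigin).toString(),
    "X-C2PA-Manifest-URL": manifestUrl,
  };
}

// Body of /.well-known/ai-provenance; field names are assumptions, not a published schema.
function wellKnownDocument(info: ProvenanceInfo): Record<string, unknown> {
  return {
    model: { name: info.modelName, version: info.modelVersion },
    trainingDataDeclaration: info.trainingDataDeclaration,
    contact: info.disputeContact,
    c2paCertificateChain: info.certificateChainUrl,
  };
}

const info: ProvenanceInfo = {
  modelName: "YourModel",
  modelVersion: "2.1",
  trainingDataDeclaration: "https://your-api.example.com/training-data",
  disputeContact: "provenance@your-api.example.com",
  certificateChainUrl: "https://credentials.your-api.example.com/chain.pem",
};

console.log(provenanceHeaders(
  info,
  "https://your-api.example.com",
  "https://credentials.your-api.example.com/manifests/abc123",
));
console.log(JSON.stringify(wellKnownDocument(info), null, 2));
```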
Compliance Architecture for SaaS Developers
Here's how to structure your content labelling pipeline for Art.50 compliance:
┌─────────────────────────────────────────────────────┐
│ CONTENT GENERATION LAYER │
│ GPAI API call → raw output (image/text/audio/video) │
└──────────────────────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ METADATA LABELLING LAYER │
│ 1. Attach IPTC DigitalSourceType = trainedAlgo... │
│ 2. Sign C2PA Manifest with your X.509 cert │
│ 3. Preserve upstream SynthID watermark (if present) │
│ 4. Add X-AI-Provenance header (text endpoints) │
└──────────────────────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ DELIVERY LAYER │
│ - Serve with metadata intact (.withMetadata()) │
│ - CDN: configure to pass through custom headers │
│ - Storage: S3/GCS object metadata for provenance │
└──────────────────────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ VERIFICATION LAYER (AUDIT LOG) │
│ - Log manifest hash + content hash at generation │
│ - Store in append-only audit trail │
│ - Retain for at least 6 months (Art.50 + GDPR) │
└─────────────────────────────────────────────────────┘
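The verification layer's "append-only audit trail" can be approximated with a hash chain, where each entry commits to the previous one. This is a minimal in-memory sketch with hypothetical names (`AuditEntry`, `appendEntry`, `verifyChain`); a production system would persist entries in write-once storage.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  generatedAt: string;
  contentSha256: string;
  manifestSha256: string;
  prevEntrySha256: string; // links this entry to the one before it
  entrySha256: string;     // hash over all the fields above
}

const GENESIS = "0".repeat(64);
const sha256Hex = (data: string | Buffer): string =>
  createHash("sha256").update(data).digest("hex");

// Returns a new log with one more entry; existing entries are never mutated.
function appendEntry(log: AuditEntry[], content: Buffer, manifest: Buffer, generatedAt: string): AuditEntry[] {
  const last = log[log.length - 1];
  const body = {
    generatedAt,
    contentSha256: sha256Hex(content),
    manifestSha256: sha256Hex(manifest),
    prevEntrySha256: last ? last.entrySha256 : GENESIS,
  };
  return [...log, { ...body, entrySha256: sha256Hex(JSON.stringify(body)) }];
}

// Recomputes every hash and link; any edited or removed entry breaks the chain.
function verifyChain(log: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of log) {
    const { entrySha256, ...body } = entry;
    if (body.prevEntrySha256 !== prev) return false;
    if (sha256Hex(JSON.stringify(body)) !== entrySha256) return false;
    prev = entrySha256;
  }
  return true;
}

let log: AuditEntry[] = [];
log = appendEntry(log, Buffer.from("image-1"), Buffer.from("manifest-1"), "2026-06-02T10:23:00Z");
log = appendEntry(log, Buffer.from("image-2"), Buffer.from("manifest-2"), "2026-06-02T10:24:00Z");
console.log(verifyChain(log)); // true
```

Storing only hashes keeps the log small and avoids retaining the generated content itself.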
CDN Considerations
A common compliance gap is CDNs stripping custom headers. Ensure your CDN (Cloudflare, AWS CloudFront, Fastly) is configured to pass through Content-Credentials and X-AI-Provenance headers:
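# nginx: pass AI provenance headers downstream
# (hide the upstream copy first so each header is not sent twice)
proxy_hide_header X-Content-Type-AI;
proxy_hide_header X-AI-Provenance;
proxy_hide_header Content-Credentials;
add_header X-Content-Type-AI $upstream_http_x_content_type_ai always;
add_header X-AI-Provenance $upstream_http_x_ai_provenance always;
add_header Content-Credentials $upstream_http_content_credentials always;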
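Whether the edge actually forwards these headers is easy to check from a monitoring job. The sketch below is one possible shape: `missingProvenanceHeaders` is a hypothetical helper, and the required list simply mirrors the header names used in this post.

```typescript
// Header names from this post; HTTP header names are case-insensitive.
const REQUIRED_PROVENANCE_HEADERS = [
  "Content-Credentials",
  "X-AI-Provenance",
  "X-Content-Type-AI",
];

// Returns the required headers that are absent from a response's header map.
function missingProvenanceHeaders(responseHeaders: Record<string, string>): string[] {
  const present = new Set(Object.keys(responseHeaders).map((name) => name.toLowerCase()));
  return REQUIRED_PROVENANCE_HEADERS.filter((name) => !present.has(name.toLowerCase()));
}

// Example: headers as seen at the origin versus behind a CDN that drops one of them.
const originHeaders = {
  "content-type": "application/json",
  "content-credentials": "<base64-encoded-manifest>",
  "x-ai-provenance": "https://your-api.example.com/.well-known/ai-provenance",
  "x-content-type-ai": "generated",
};
const edgeHeaders = {
  "Content-Type": "application/json",
  "X-AI-Provenance": "https://your-api.example.com/.well-known/ai-provenance",
  "X-Content-Type-AI": "generated",
};

console.log(missingProvenanceHeaders(originHeaders)); // []
console.log(missingProvenanceHeaders(edgeHeaders));   // [ 'Content-Credentials' ]
```

With `fetch`, the header map can be obtained as `Object.fromEntries(response.headers)` for a request that goes through the CDN hostname.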
S3/GCS Object Metadata
When storing AI-generated content in object storage, include provenance in object metadata:
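// AWS S3 (AWS SDK for JavaScript v3)
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
const s3Client = new S3Client({}); // region and credentials are read from the environment
// id, labelledImageBuffer, and manifestId come from earlier steps in your pipeline
await s3Client.send(new PutObjectCommand({
  Bucket: 'your-bucket',
  Key: `images/${id}.jpg`,
  Body: labelledImageBuffer,
  ContentType: 'image/jpeg',
  // User-defined metadata: S3 stores these as x-amz-meta-* headers on the object
  Metadata: {
    'ai-generated': 'true',
    'ai-model': 'your-model-v2.1',
    'digital-source-type': 'trainedAlgorithmicMedia',
    'generation-timestamp': new Date().toISOString(),
    'c2pa-manifest-id': manifestId,
  },
}));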
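S3 caps user-defined metadata at 2 KB per object, counted as the UTF-8 bytes of all keys and values, so it is worth validating the map before upload. A minimal sketch, with `provenanceMetadata` and `assertFitsS3UserMetadata` as hypothetical helper names:

```typescript
const S3_USER_METADATA_LIMIT_BYTES = 2048;

// Builds the same metadata map as the upload example above.
function provenanceMetadata(model: string, manifestId: string, generatedAt: Date): Record<string, string> {
  return {
    "ai-generated": "true",
    "ai-model": model,
    "digital-source-type": "trainedAlgorithmicMedia",
    "generation-timestamp": generatedAt.toISOString(),
    "c2pa-manifest-id": manifestId,
  };
}

// Sum of the UTF-8 byte lengths of every key and value.
function metadataSizeBytes(metadata: Record<string, string>): number {
  return Object.entries(metadata).reduce(
    (total, [key, value]) => total + Buffer.byteLength(key, "utf8") + Buffer.byteLength(value, "utf8"),
    0,
  );
}

function assertFitsS3UserMetadata(metadata: Record<string, string>): void {
  const size = metadataSizeBytes(metadata);
  if (size > S3_USER_METADATA_LIMIT_BYTES) {
    throw new Error(`S3 user metadata is ${size} bytes, above the ${S3_USER_METADATA_LIMIT_BYTES}-byte limit`);
  }
}

const metadata = provenanceMetadata("your-model-v2.1", "abc123", new Date("2026-06-02T10:23:00Z"));
assertFitsS3UserMetadata(metadata); // passes for small maps like this one
console.log(metadataSizeBytes(metadata));
```

This is why the example stores a manifest ID rather than the manifest itself: a full manifest would usually exceed the limit and belongs in the file or in a separate object.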
What About the EU Commission's Implementing Acts?
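Art.50(7) tasks the AI Office with facilitating codes of practice on the detection and labelling of AI-generated content, and lets the European Commission adopt implementing acts that approve those codes or, if it finds a code inadequate, set common rules. These instruments, expected by mid-2026, are where standards such as C2PA, IPTC, or new ones would be formally designated as satisfying the legal obligation.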
What this means for developers:
- C2PA and IPTC Digital Source Type are the safe choice now — they align with what the Commission has signalled and are already required by major AI platforms.
- Waiting for implementing acts to finalize your approach risks missing the August 2 deadline.
- The implementing acts are likely to recognize C2PA rather than mandate a different standard, so early implementation is low-regret.
You can track the Commission's progress on implementing acts via EUR-Lex, starting from CELEX number 32024R1689 (the AI Act base regulation). The GPAI Code of Practice (currently in consultation phase) also provides guidance on expected standards for GPAI model providers.
August 2026 Compliance Checklist — GPAI Content Labelling
For AI System Providers (you generate content directly)
- C2PA manifest attached to all images, audio, video your system produces
- IPTC DigitalSourceType = trainedAlgorithmicMedia embedded in image metadata
- C2PA certificate chain from a recognized CA (not self-signed in production)
- Manifest includes c2pa.created action, software agent name, generation timestamp
- Signature verification passes in Verify by CAI (verify.contentauthenticity.org)
- Text outputs have X-Content-Type-AI response header and .well-known/ai-provenance endpoint
- Audit log records manifest hash + content hash per generation event
- SynthID or equivalent watermark embedded (where technically feasible per modality)
For GPAI API Consumers/Deployers (you build on top of GPAI APIs)
- Upstream metadata preserved: .withMetadata() in sharp or equivalent in your image pipeline
- Metadata not stripped by CDN, resizing, format conversion, or storage operations
- Deepfake disclosure (Art.50(4)) in place for any manipulated real-world content
- User disclosure (Art.50(1)) present if your application interacts with users as an AI agent
- Provenance passthrough: if upstream C2PA manifest exists, it reaches the end user's browser/client
- Documented upstream API: written record of which GPAI model produces your content (for regulator inquiry)
For All Developers
- August 2, 2026 hard deadline tracked in your compliance calendar
- Legal team briefed on Art.50 obligations distinguishing providers vs. deployers
- Commission implementing acts on watch list for final technical standard designation
- Internal labelling policy documents which content types require Art.50 marking
What Comes After August 2026?
Art.50 compliance at August 2, 2026 is not the endpoint. The regulation anticipates evolution:
- National market surveillance authorities will begin enforcement from August 2026. Early penalties are likely to target obvious non-compliance (no disclosure, no metadata at all) rather than technical minutiae.
- The GPAI Code of Practice (expected finalized Q2/Q3 2026) will contain more detailed expectations for GPAI providers specifically, including content labelling depth.
- C2PA v3.0 is in development and expected to add stronger AI-specific assertions and cross-platform verification infrastructure.
- Browser-native verification: Chrome, Firefox, and Safari are evaluating native Content Credentials verification in their media rendering pipelines, which would make C2PA compliance visible to end users without plugins.
See Also
- EU AI Act Art.50: What Developers Must Disclose About AI-Generated Content — Legal obligations and scope
- EU AI Act Art.50 Watermarking: Technical Implementation Guide — Signal-level watermarks
- EU AI Act August 2026 Developer Action Checklist — Cross-article August deadline overview