2026-06-02·5 min read·sota.io Team

EU AI Act Art.50 GPAI Content Labelling: Machine-Readable Metadata Standards & August 2026 Compliance Checklist

Post #1444 in the sota.io EU AI Act Art.50 Transparency Ops 2026 Series

[Figure: machine-readable metadata standards for EU AI Act Art.50 GPAI content labelling (C2PA, IPTC, SynthID)]

61 days to August 2, 2026. EU AI Act Article 50 requires that AI-generated content be identifiable — not just to users through disclosure banners, but to automated systems through machine-readable metadata. This post covers the concrete technical standards: what metadata formats regulators expect, how C2PA Content Credentials work, where IPTC Digital Source Type fits, and what SaaS developers building on GPAI APIs must implement before the deadline.

Previous posts in this series covered what developers must disclose and how to implement watermarking. This post focuses on metadata standards and labelling — the layer above watermarking that enables automated content verification pipelines.

The Two-Layer Approach: Watermarks vs. Metadata Labels

Before diving into standards, it helps to understand why Art.50 compliance is usually built on two complementary approaches:

Layer 1, signal-level watermarks (covered in the previous post): Invisible or imperceptible markers embedded directly in the content signal. They are designed to be robust to resaving, transcoding, and compression. Examples: SynthID for text/images, AudioSeal for audio. They survive format conversion, but humans and standard metadata readers cannot see them; reading them requires a dedicated detector.

Layer 2, machine-readable metadata labels (this post): Structured, human-readable and machine-parseable records attached to or embedded in the content file. They describe who generated the content, when, with which system, and what AI was involved. They are easily readable by verifiers, content management systems, and social media platforms. They can be stripped during file conversion but are straightforward to inspect without specialized watermark detectors.

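Art.50(2) requires that outputs be "marked in a machine-readable format and detectable as artificially generated or manipulated." Machine-readable means a software system can extract the marking without human intervention. The regulation does not name a specific technique, but it requires solutions to be effective, interoperable, robust, and reliable as far as technically feasible, which in practice points to combining both layers.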

Standard 1: C2PA Content Credentials (The Core Standard)

The Coalition for Content Provenance and Authenticity (C2PA) has published the de-facto open standard for content provenance metadata. C2PA Content Credentials are already adopted by Adobe (via Content Authenticity Initiative), Microsoft, Google, OpenAI, Stability AI, and Truepic.

What a C2PA Manifest Contains

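A C2PA Manifest is stored in binary form: CBOR-encoded claims and assertions with a COSE signature, packaged in a JUMBF container that is embedded in the content file or attached as a sidecar. SDKs and tools usually accept and display a JSON representation, and the simplified, illustrative view below uses that form: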

{
  "claim_generator": "YourApp/1.0 c2pa-rs/0.36",
  "format": "image/jpeg",
  "assertions": [
    {
      "label": "c2pa.actions",
      "data": {
        "actions": [
          {
            "action": "c2pa.created",
            "when": "2026-06-02T10:23:00Z",
            "softwareAgent": {
              "name": "YourAIService",
              "version": "2.1"
            },
            "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
          }
        ]
      }
    }
  ],
  "claim": {
    "signature": "<COSE signature over the claim>",
    "signingTime": "2026-06-02T10:23:01Z"
  }
}

The claim is signed with the private key belonging to your organization's X.509 certificate, which makes the manifest verifiable and tamper-evident. The manifest is also bound to the content through a hash, so if the content is modified after signing, validation fails and verifiers are alerted to potential manipulation.
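The tamper-evidence idea can be illustrated without any C2PA tooling. The following is a minimal sketch using Node's built-in `crypto` module, assuming an in-memory Ed25519 key pair instead of a certificate; the `SimpleClaim` shape and the function names are hypothetical and are not the C2PA data model.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from 'node:crypto';

// Hypothetical, simplified stand-in for a signed claim. Real C2PA manifests use
// a binary container and certificate-based signatures, not this JSON shape.
interface SimpleClaim {
  contentSha256: string; // binds the claim to the exact content bytes
  generator: string;
  createdAt: string;
}

// Assumption: an ephemeral key pair. Production signing uses the key behind an X.509 certificate.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

const encodeClaim = (claim: SimpleClaim): Uint8Array =>
  new TextEncoder().encode(JSON.stringify(claim));

function signContent(content: Uint8Array, generator: string): { claim: SimpleClaim; signature: Uint8Array } {
  const claim: SimpleClaim = {
    contentSha256: createHash('sha256').update(content).digest('hex'),
    generator,
    createdAt: new Date().toISOString(),
  };
  // Ed25519 takes no separate digest algorithm, hence `null`.
  const signature = sign(null, encodeClaim(claim), privateKey);
  return { claim, signature };
}

function verifyContent(content: Uint8Array, claim: SimpleClaim, signature: Uint8Array): boolean {
  const signatureOk = verify(null, encodeClaim(claim), publicKey, signature);
  const hashOk = createHash('sha256').update(content).digest('hex') === claim.contentSha256;
  return signatureOk && hashOk; // both the claim and the content must be untouched
}
```

Changing a single byte of the content breaks the hash check, and changing any field of the claim breaks the signature check, which is the property a verifier relies on.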

C2PA Embedding by File Type

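Format	Embedding Method	C2PA Spec Reference
JPEG	`APP11` marker segments carrying JUMBF boxes	C2PA Spec §11.1
PNG	`caBX` ancillary chunk	C2PA Spec §11.2
MP4/MOV	Top-level `uuid` box (ISO BMFF)	C2PA Spec §11.3
WAV/AIFF	`C2PA` chunk for RIFF-based WAV, sidecar otherwise	C2PA Spec §11.4
PDF	Embedded file stream referenced through an `AF` (Associated Files) entry	C2PA Spec §11.5
Plain text/HTML	Sidecar `.c2pa` file or HTTP `Link` header reference	C2PA Spec §11.6

Section numbers differ between specification versions, so check them against the version you target.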
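In a pipeline, the first decision per asset is usually whether the manifest can be embedded or has to travel as a sidecar. A minimal sketch of that decision follows; the MIME type list and the function names are assumptions for illustration, and the set should be checked against what your signing SDK actually supports.

```typescript
type EmbeddingStrategy = 'embedded' | 'sidecar';

// Assumption: these container formats have a slot for an embedded manifest.
// Anything else (plain text, HTML, JSON) falls back to a sidecar file.
const EMBEDDABLE_MIME_TYPES = new Set<string>([
  'image/jpeg',
  'image/png',
  'video/mp4',
  'video/quicktime',
  'audio/wav',
  'application/pdf',
]);

function chooseEmbeddingStrategy(mimeType: string): EmbeddingStrategy {
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const normalized = (mimeType.split(';')[0] ?? '').trim().toLowerCase();
  return EMBEDDABLE_MIME_TYPES.has(normalized) ? 'embedded' : 'sidecar';
}

function sidecarPathFor(assetPath: string): string {
  // Replace the final extension, e.g. "out/post.html" -> "out/post.c2pa".
  const dot = assetPath.lastIndexOf('.');
  const slash = assetPath.lastIndexOf('/');
  const base = dot > slash ? assetPath.slice(0, dot) : assetPath;
  return `${base}.c2pa`;
}
```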

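For text content (blog posts, chatbot outputs, summaries), there is no container to embed into, so credentials are delivered as a .c2pa sidecar file or referenced over HTTP. The C2PA specification describes pointing to an external manifest with an HTTP Link header (rel="c2pa-manifest"); the custom header names below are an illustrative convention rather than standardized names: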

Content-Credentials: <base64-encoded-manifest>
C2PA-Certificate: <url-to-public-key>
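For referencing a remotely hosted manifest, a `Link` header is the more conventional HTTP mechanism. The sketch below builds such a header value; the `rel="c2pa-manifest"` relation follows my reading of the C2PA external-manifest guidance and should be confirmed against the spec version you target, and the function name is hypothetical.

```typescript
// Builds the value of an HTTP Link header that points at an external manifest.
function buildManifestLinkHeader(manifestUrl: string): string {
  const url = new URL(manifestUrl); // throws on malformed input
  if (url.protocol !== 'https:') {
    throw new Error('manifest URL should be served over HTTPS');
  }
  return `<${url.href}>; rel="c2pa-manifest"`;
}
```

A handler would then set it with something like `res.setHeader('Link', buildManifestLinkHeader(url))`.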

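Implementation via c2pa-rs (Rust) or c2pa-node (Node.js)

// Sketch against the c2pa-node 0.5.x API (createC2pa, createTestSigner, ManifestBuilder).
// Newer releases of the Node.js bindings expose a different API; check the version you install.
import { createC2pa, createTestSigner, ManifestBuilder } from 'c2pa-node';

async function labelAIGeneratedImage(imageBuffer: Buffer, mimeType: string): Promise<Buffer> {
  const signer = await createTestSigner(); // Test credentials only; replace with a real signer in production
  const c2pa = createC2pa({ signer });

  const manifest = new ManifestBuilder({
    claim_generator: 'YourApp/1.0',
    format: mimeType,
    title: 'ai-generated-image', // placeholder title
    assertions: [
      {
        label: 'c2pa.actions',
        data: {
          actions: [{
            action: 'c2pa.created',
            when: new Date().toISOString(),
            softwareAgent: { name: 'YourAI', version: '1.0' },
            // Marks the asset as produced by a trained model
            digitalSourceType: 'http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia',
          }],
        },
      },
    ],
  });

  // The asset is passed as a buffer plus its MIME type; the signed copy comes back in the same shape.
  const { signedAsset } = await c2pa.sign({ asset: { buffer: imageBuffer, mimeType }, manifest });
  return signedAsset.buffer;
}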

Production deployments should use a certificate issued by a CA that validators trust (not a self-signed test cert). The C2PA maintains the conformance requirements and trust list for signers; the Content Authenticity Initiative (contentauthenticity.org) provides tooling and onboarding guidance.

Standard 2: IPTC Digital Source Type Vocabulary

The International Press Telecommunications Council (IPTC) maintains the Digital Source Type vocabulary, a standardized set of URIs describing how digital content was created. The value is carried in the Iptc4xmpExt:DigitalSourceType property, which is embedded via XMP (Extensible Metadata Platform), Adobe's widely supported metadata framework that all major image formats can carry.

IPTC Digital Source Type Values for AI-Generated Content

URI	Meaning	When to Use
`http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia`	Created by a trained AI model	AI text/image/audio generated from a model trained on data
`http://cv.iptc.org/newscodes/digitalsourcetype/algorithmicMedia`	Created by algorithm (not trained)	Procedural generation, parametric content
`http://cv.iptc.org/newscodes/digitalsourcetype/compositeSynthetic`	Human + AI composite	User-provided elements enhanced/modified by AI
`http://cv.iptc.org/newscodes/digitalsourcetype/digitalArt`	Human-created digital art	Not AI-generated — do not use for AI outputs

For any content your AI system fully generates, use trainedAlgorithmicMedia. For mixed outputs where a user uploads an image and your AI modifies it, use compositeSynthetic.
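The selection rule above is easy to encode once, so every code path labels outputs consistently. A minimal sketch follows; the `ContentOrigin` categories and the function name are hypothetical, while the URIs are the ones from the table.

```typescript
const IPTC_DIGITAL_SOURCE_TYPE_BASE = 'http://cv.iptc.org/newscodes/digitalsourcetype/';

// Hypothetical categories describing how an output came to be.
type ContentOrigin =
  | 'model-generated' // fully produced by a trained model
  | 'procedural' // algorithmic generation without a trained model
  | 'user-media-modified-by-ai'; // user-supplied media enhanced or edited by AI

function digitalSourceTypeFor(origin: ContentOrigin): string {
  switch (origin) {
    case 'model-generated':
      return IPTC_DIGITAL_SOURCE_TYPE_BASE + 'trainedAlgorithmicMedia';
    case 'procedural':
      return IPTC_DIGITAL_SOURCE_TYPE_BASE + 'algorithmicMedia';
    case 'user-media-modified-by-ai':
      return IPTC_DIGITAL_SOURCE_TYPE_BASE + 'compositeSynthetic';
  }
}
```

Because the switch is exhaustive over the union type, adding a new origin category later forces a compile-time decision about its label.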

Embedding IPTC Metadata via ExifTool (CLI)

# Add AI-generated label to an image
exiftool \
  -XMP-iptcExt:DigitalSourceType="http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia" \
  -XMP-dc:Creator="YourAI Service v2.1" \
  -XMP-photoshop:Credit="AI-generated by YourAI" \
  output_image.jpg

Embedding IPTC Metadata via sharp (Node.js)

import sharp from 'sharp';

async function addIptcMetadata(imageBuffer: Buffer): Promise<Buffer> {
  return sharp(imageBuffer)
    .withMetadata({
      exif: {
        IFD0: {
          Software: 'YourAI Service v2.1',
          ImageDescription: 'AI-generated content. trainedAlgorithmicMedia',
        },
      },
    })
    .toBuffer();
}

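Note: the snippet above writes EXIF fields only. It does not set the XMP Iptc4xmpExt:DigitalSourceType property, and sharp does not support full XMP embedding natively. For complete XMP/IPTC fields, use ExifTool via subprocess. The piexifjs library edits EXIF only, so it does not help with XMP.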
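When calling ExifTool from Node.js, passing arguments as an array avoids shell quoting problems with URIs and spaces. The sketch below only builds that argument list; the `AiLabel` type and function name are hypothetical, and the tag names are the ones from the CLI example above plus ExifTool's `-overwrite_original` option.

```typescript
interface AiLabel {
  digitalSourceType: string; // full IPTC URI
  creator: string;
  credit: string;
}

// Returns the argv (without the "exiftool" binary name) for labelling one file.
function buildExiftoolArgs(label: AiLabel, filePath: string): string[] {
  if (filePath.startsWith('-')) {
    // A leading dash would be parsed as an option rather than a file name.
    throw new Error('file path must not start with "-"');
  }
  return [
    `-XMP-iptcExt:DigitalSourceType=${label.digitalSourceType}`,
    `-XMP-dc:Creator=${label.creator}`,
    `-XMP-photoshop:Credit=${label.credit}`,
    '-overwrite_original', // do not leave a "_original" backup next to the file
    filePath,
  ];
}
```

The result can be handed to `execFile('exiftool', args)` from `node:child_process`, assuming ExifTool is installed on the host.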

Standard 3: SynthID Integration (GPAI API Consumers)

If you are building on Google's Gemini API, you may be able to query whether content was SynthID-watermarked. For your own content pipeline, Google provides the SynthID toolkit — but it is currently not fully open-source for all modalities.

Available SynthID tools:

SynthID Text: watermarking and detection for text (Gemini outputs)
SynthID Image: invisible watermarking for images (Imagen model output)
SynthID Audio: inaudible watermarking for generated audio (Google DeepMind). AudioSeal is a separate project from Meta, not part of SynthID.

For GPAI API consumers (not Google Cloud customers), the primary obligation is to propagate and not strip watermarks that upstream providers have embedded. If you receive an AI-generated image from an API, do not pass it through operations that strip EXIF/XMP metadata without reattaching it.

// ❌ Strips metadata, destroys Art.50 compliance
const strippedImage = await sharp(apiResponse.image)
  .jpeg({ quality: 85 })
  .toBuffer();

// ✅ Preserves metadata
const preservedImage = await sharp(apiResponse.image)
  .jpeg({ quality: 85 })
  .withMetadata()  // This preserves EXIF/XMP
  .toBuffer();

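The .withMetadata() call in sharp carries existing EXIF, ICC, and XMP metadata (including IPTC/XMP provenance fields) over to the output. This is the minimal requirement for deployers who do not generate content themselves. Note that re-encoding still changes the file bytes, so an embedded C2PA manifest will generally not survive or validate afterwards. If the upstream asset carries Content Credentials, re-sign the output with a manifest that references the original as an ingredient.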
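A cheap regression guard is to check, after each processing step, that an XMP packet present in the input is still present in the output. The sketch below is a heuristic under the assumption that the XMP packet is stored uncompressed (as it is in JPEG); it does not parse the file format, and the function names are hypothetical.

```typescript
// Heuristic: look for the start of an XMP packet in the raw file bytes.
function containsXmpPacket(bytes: Uint8Array): boolean {
  const marker = new TextEncoder().encode('<x:xmpmeta');
  outer: for (let i = 0; i + marker.length <= bytes.length; i++) {
    for (let j = 0; j < marker.length; j++) {
      if (bytes[i + j] !== marker[j]) continue outer;
    }
    return true;
  }
  return false;
}

// Throws when a processing step dropped XMP that the input carried.
function assertXmpSurvived(before: Uint8Array, after: Uint8Array): void {
  if (containsXmpPacket(before) && !containsXmpPacket(after)) {
    throw new Error('XMP packet present in input but missing from output');
  }
}
```

Calling `assertXmpSurvived(inputBuffer, outputBuffer)` in an integration test catches a resize or transcode step that silently dropped provenance fields.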

Standard 4: HTTP-Level Content Provenance Headers

For text-based AI outputs (chatbots, summary APIs, content generation endpoints), file-embedding is not applicable. The emerging approach is HTTP response headers combined with a verifiable credential:

HTTP/1.1 200 OK
Content-Type: application/json
X-Content-Type-AI: generated
X-AI-Model: YourModel/2.1
X-AI-Provenance: https://your-api.example.com/.well-known/ai-provenance
X-C2PA-Manifest-URL: https://credentials.your-api.example.com/manifests/abc123

A .well-known/ai-provenance JSON document should describe:

Model name and version
Training data declaration
Contact for provenance disputes
Link to full C2PA certificate chain
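One possible shape for that document is sketched below. There is no registered standard for `.well-known/ai-provenance`, so every field name here is an assumption chosen to mirror the four items listed above, and the function name is hypothetical.

```typescript
// Hypothetical schema for the provenance document; not a standardized format.
interface AiProvenanceDocument {
  model: { name: string; version: string };
  trainingDataDeclaration: string; // URL or short statement
  provenanceContact: string; // e.g. a mailto: address for disputes
  c2paCertificateChainUrl: string;
}

function serializeAiProvenance(doc: AiProvenanceDocument): string {
  const required: Array<[string, string]> = [
    ['model.name', doc.model.name],
    ['model.version', doc.model.version],
    ['trainingDataDeclaration', doc.trainingDataDeclaration],
    ['provenanceContact', doc.provenanceContact],
    ['c2paCertificateChainUrl', doc.c2paCertificateChainUrl],
  ];
  // Refuse to publish a document with blank fields.
  const missing = required.filter(([, value]) => value.trim() === '').map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`missing fields: ${missing.join(', ')}`);
  }
  return JSON.stringify(doc, null, 2);
}
```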

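HTTP headers are not formally mandated by Art.50 or by any implementing act so far, and the X- names above are illustrative rather than standardized, but they are a low-cost way to make text outputs machine-identifiable. Note that the response headers API providers send today, such as OpenAI's openai-organization and openai-processing-ms, carry operational metadata rather than provenance labels, so your own endpoint has to add the provenance headers.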

Compliance Architecture for SaaS Developers

Here's how to structure your content labelling pipeline for Art.50 compliance:

┌─────────────────────────────────────────────────────┐
│                CONTENT GENERATION LAYER              │
│  GPAI API call → raw output (image/text/audio/video) │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│              METADATA LABELLING LAYER                │
│  1. Attach IPTC DigitalSourceType = trainedAlgo...   │
│  2. Sign C2PA Manifest with your X.509 cert          │
│  3. Preserve upstream SynthID watermark (if present) │
│  4. Add X-AI-Provenance header (text endpoints)      │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│               DELIVERY LAYER                         │
│  - Serve with metadata intact (.withMetadata())      │
│  - CDN: configure to pass through custom headers     │
│  - Storage: S3/GCS object metadata for provenance    │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│              VERIFICATION LAYER (AUDIT LOG)          │
│  - Log manifest hash + content hash at generation    │
│  - Store in append-only audit trail                  │
│  - Retain per documented policy (mind GDPR limits)   │
└─────────────────────────────────────────────────────┘
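The verification layer can be sketched as a hash-chained log, where each entry commits to the previous one so that later edits to history are detectable. This is an in-memory illustration using Node's `crypto` module; the `AuditLog` class and its field names are hypothetical, and a real deployment would still need storage that enforces append-only writes.

```typescript
import { createHash } from 'node:crypto';

interface AuditEntry {
  contentSha256: string;
  manifestSha256: string;
  generatedAt: string;
  prevEntryHash: string; // links each entry to the one before it
  entryHash: string;
}

const GENESIS_HASH = '0'.repeat(64);

function sha256Hex(data: string | Uint8Array): string {
  return createHash('sha256').update(data).digest('hex');
}

function entryHashOf(e: Omit<AuditEntry, 'entryHash'>): string {
  return sha256Hex([e.prevEntryHash, e.contentSha256, e.manifestSha256, e.generatedAt].join('|'));
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];
  private lastHash = GENESIS_HASH;

  append(content: Uint8Array, manifest: Uint8Array, generatedAt: string = new Date().toISOString()): AuditEntry {
    const partial = {
      contentSha256: sha256Hex(content),
      manifestSha256: sha256Hex(manifest),
      generatedAt,
      prevEntryHash: this.lastHash,
    };
    const entry: AuditEntry = { ...partial, entryHash: entryHashOf(partial) };
    this.entries.push(entry);
    this.lastHash = entry.entryHash;
    return entry;
  }

  // Copies of the entries, e.g. for export to durable storage.
  snapshot(): AuditEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }

  // Recomputes every hash; returns false if any entry was altered or reordered.
  verifyChain(entries: readonly AuditEntry[] = this.entries): boolean {
    let prev = GENESIS_HASH;
    for (const e of entries) {
      if (e.prevEntryHash !== prev || e.entryHash !== entryHashOf(e)) return false;
      prev = e.entryHash;
    }
    return true;
  }
}
```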

CDN Considerations

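A common compliance gap is CDNs stripping custom headers. Ensure your CDN (Cloudflare, AWS CloudFront, Fastly) and any reverse proxy in front of your origin pass through Content-Credentials and X-AI-Provenance headers. nginx forwards custom upstream headers by default, so the directives below are only needed when upstream headers are hidden (for example via proxy_hide_header); otherwise they would emit duplicates:

# nginx: re-emit AI provenance headers that were hidden from the upstream response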
add_header X-Content-Type-AI $upstream_http_x_content_type_ai;
add_header X-AI-Provenance $upstream_http_x_ai_provenance;
add_header Content-Credentials $upstream_http_content_credentials;
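Whether the headers actually reach clients is easiest to confirm from the outside. The following sketch checks a set of response headers for the names used in this post; the required-header list and the function name are assumptions for illustration.

```typescript
// Header names used in this post, lowercased for case-insensitive matching.
const REQUIRED_PROVENANCE_HEADERS: readonly string[] = [
  'x-content-type-ai',
  'x-ai-provenance',
  'content-credentials',
];

// Returns the required headers that are absent from a response.
function missingProvenanceHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((name) => name.toLowerCase()));
  return REQUIRED_PROVENANCE_HEADERS.filter((name) => !present.has(name));
}
```

In a smoke test, the input can come from `Object.fromEntries(response.headers.entries())` after a `fetch` through the CDN hostname rather than the origin.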

S3/GCS Object Metadata

When storing AI-generated content in object storage, include provenance in object metadata:

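// AWS S3 (AWS SDK for JavaScript v3)
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3Client = new S3Client({}); // region and credentials are resolved from the environment

// id, labelledImageBuffer, and manifestId come from your generation pipeline.
// S3 stores these entries as x-amz-meta-* headers; all values must be strings.
await s3Client.send(new PutObjectCommand({
  Bucket: 'your-bucket',
  Key: `images/${id}.jpg`,
  Body: labelledImageBuffer,
  ContentType: 'image/jpeg',
  Metadata: {
    'ai-generated': 'true',
    'ai-model': 'your-model-v2.1',
    'digital-source-type': 'trainedAlgorithmicMedia',
    'generation-timestamp': new Date().toISOString(),
    'c2pa-manifest-id': manifestId,
  },
}));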
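Object metadata has practical limits, so it can help to build the map in one place and validate it before upload. The sketch below assumes the S3 limit of 2 KB for user-defined metadata (sum of key and value bytes) and restricts values to printable ASCII; the `ProvenanceFields` type and function name are hypothetical.

```typescript
interface ProvenanceFields {
  model: string;
  digitalSourceType: string;
  generatedAt: Date;
  manifestId: string;
}

// Assumption: S3 caps user-defined metadata at 2 KB in total.
const S3_USER_METADATA_LIMIT_BYTES = 2048;

function buildProvenanceMetadata(fields: ProvenanceFields): Record<string, string> {
  const metadata: Record<string, string> = {
    'ai-generated': 'true',
    'ai-model': fields.model,
    'digital-source-type': fields.digitalSourceType,
    'generation-timestamp': fields.generatedAt.toISOString(),
    'c2pa-manifest-id': fields.manifestId,
  };
  let size = 0;
  for (const [key, value] of Object.entries(metadata)) {
    if (!/^[\x20-\x7e]*$/.test(value)) {
      throw new Error(`non-ASCII value for ${key}`);
    }
    size += key.length + value.length; // ASCII only, so one byte per character
  }
  if (size > S3_USER_METADATA_LIMIT_BYTES) {
    throw new Error(`metadata is ${size} bytes, above the ${S3_USER_METADATA_LIMIT_BYTES}-byte limit`);
  }
  return metadata;
}
```

The returned map can be passed directly as the `Metadata` field of a `PutObjectCommand`.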

What About the EU Commission's Implementing Acts?

Art.50(7) tasks the AI Office with encouraging codes of practice on detecting and labelling AI-generated content, and empowers the European Commission to adopt implementing acts that approve those codes or, if they prove inadequate, set common rules. These instruments, expected around mid-2026, will indicate which standards (C2PA, IPTC, or new ones) satisfy the legal obligation.

What this means for developers:

C2PA and IPTC Digital Source Type are the safe choice now: they align with what the Commission has signalled and are already supported by major AI platforms.
Waiting for implementing acts to finalize your approach risks missing the August 2 deadline.
The implementing acts are likely to recognize C2PA rather than mandate a different standard, so early implementation is low-regret.

You can track the Commission's progress on implementing acts via EUR-Lex under CELEX number 32024R1689 (the AI Act base regulation). The Commission's Code of Practice on marking and labelling AI-generated content (still being finalized at the time of writing, and distinct from the GPAI Code of Practice published in July 2025) also provides guidance on expected standards.

August 2026 Compliance Checklist — GPAI Content Labelling

For AI System Providers (you generate content directly)

C2PA manifest attached to all images, audio, video your system produces
IPTC DigitalSourceType = trainedAlgorithmicMedia embedded in image metadata
C2PA certificate chain from a recognized CA (not self-signed in production)
Manifest includes c2pa.created action, software agent name, generation timestamp
Signature verification passes in Verify by CAI (verify.contentauthenticity.org)
Text outputs have X-Content-Type-AI response header and .well-known/ai-provenance endpoint
Audit log records manifest hash + content hash per generation event
SynthID or equivalent watermark embedded (where technically feasible per modality)

For GPAI API Consumers/Deployers (you build on top of GPAI APIs)

Upstream metadata preserved — .withMetadata() in sharp or equivalent in your image pipeline
Metadata not stripped by CDN, resizing, format conversion, or storage operations
Deepfake disclosure (Art.50(4)) in place for any manipulated real-world content
User disclosure (Art.50(1)) present if your application interacts with users as an AI agent
Provenance passthrough — if upstream C2PA manifest exists, it reaches the end user's browser/client
Documented upstream API — written record of which GPAI model produces your content (for regulator inquiry)

For All Developers

August 2, 2026 hard deadline tracked in your compliance calendar
Legal team briefed on Art.50 obligations distinguishing providers vs. deployers
Commission implementing acts on watch list for final technical standard designation
Internal labelling policy documents which content types require Art.50 marking

What Comes After August 2026?

Art.50 compliance by August 2, 2026 is not the endpoint. The regulation anticipates evolution:

National market surveillance authorities will begin enforcement from August 2026. Early penalties are likely to target obvious non-compliance (no disclosure, no metadata at all) rather than technical minutiae.
The Code of Practice on marking and labelling AI-generated content (expected to be finalized in Q2/Q3 2026) will contain more detailed expectations for providers, including content labelling depth.
C2PA v3.0 is in development and expected to add stronger AI-specific assertions and cross-platform verification infrastructure.
Browser-native verification — Chrome, Firefox, and Safari are evaluating native Content Credentials verification in their media rendering pipelines, which would make C2PA compliance visible to end users without plugins.
