2026-06-02·5 min read·sota.io Team

EU AI Act Art.50 GPAI Content Labelling: Machine-Readable Metadata Standards & August 2026 Compliance Checklist

Post #1444 in the sota.io EU AI Act Art.50 Transparency Ops 2026 Series

[Figure: EU AI Act Art.50 GPAI content labelling, machine-readable metadata standards diagram with C2PA, IPTC, SynthID]

61 days to August 2, 2026. EU AI Act Article 50 requires that AI-generated content be identifiable — not just to users through disclosure banners, but to automated systems through machine-readable metadata. This post covers the concrete technical standards: what metadata formats regulators expect, how C2PA Content Credentials work, where IPTC Digital Source Type fits, and what SaaS developers building on GPAI APIs must implement before the deadline.

Previous posts in this series covered what developers must disclose and how to implement watermarking. This post focuses on metadata standards and labelling — the layer above watermarking that enables automated content verification pipelines.


The Two-Layer Approach: Watermarks vs. Metadata Labels

Before diving into standards, it helps to understand why Art.50 requires two complementary approaches:

Layer 1: signal-level watermarks (covered in the previous post). Imperceptible markers embedded directly in the content signal, designed to survive resaving, transcoding, and compression. Examples: SynthID for text/images, AudioSeal for audio. They survive format conversion, but reading them requires a dedicated detector; humans and standard metadata readers cannot see them.

Layer 2: machine-readable metadata labels (this post). Structured, human-readable and machine-parseable records attached to or embedded in the content file. They describe who generated the content, when, with which system, and what AI was involved. Verifiers, content management systems, and social media platforms can read them easily. They can be stripped during file conversion, but they are straightforward to inspect without specialized watermark detectors.

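Meeting Art.50 in practice takes both layers. Art.50(2) requires outputs to be "marked in a machine-readable format and detectable as artificially generated or manipulated." Machine-readable means a software system can extract the marking without human intervention, and a watermark keeps the content detectable when the metadata has been stripped.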


Standard 1: C2PA Content Credentials (The Core Standard)

The Coalition for Content Provenance and Authenticity (C2PA) has published the de-facto open standard for content provenance metadata. C2PA Content Credentials are already adopted by Adobe (via Content Authenticity Initiative), Microsoft, Google, OpenAI, Stability AI, and Truepic.

What a C2PA Manifest Contains

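A C2PA Manifest is a signed, structured record embedded in the content file (or attached as a sidecar). Inside the file it is stored in a binary JUMBF container rather than as JSON; the JSON below is a simplified, human-readable view of what it contains: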

{
  "claim_generator": "YourApp/1.0 c2pa-rs/0.36",
  "format": "image/jpeg",
  "assertions": [
    {
      "label": "c2pa.actions",
      "data": {
        "actions": [
          {
            "action": "c2pa.created",
            "when": "2026-06-02T10:23:00Z",
            "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
            "softwareAgent": {
              "name": "YourAIService",
              "version": "2.1"
            }
          }
        ]
      }
    }
  ],
  "claim": {
    "signature": "<COSE signature over the claim>",
    "signingTime": "2026-06-02T10:23:01Z"
  }
}

The manifest is signed with the private key behind your organization's X.509 certificate, making it verifiable and tamper-evident. The signed claim includes a hash of the content, so if the content is modified after signing, validation fails and verifiers are alerted to potential manipulation.
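
The tamper-evidence property can be illustrated with a minimal sketch using Node's built-in crypto module. This is not the C2PA format, which uses COSE signatures and X.509 certificate chains. The `SignedBinding` type and the `signAsset` and `verifyAsset` helpers are hypothetical names, and an Ed25519 key pair generated on the spot stands in for a real signing certificate.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Simplified stand-in for a C2PA hard binding: hash the asset bytes,
// then sign the hash. Names and structure are assumptions for illustration.
interface SignedBinding {
  contentSha256: string; // hex digest of the asset bytes at signing time
  signature: Buffer;     // Ed25519 signature over that digest
}

function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

function signAsset(asset: Buffer, privateKey: KeyObject): SignedBinding {
  const contentSha256 = sha256Hex(asset);
  // For Ed25519 keys the algorithm argument is null.
  const signature = sign(null, Buffer.from(contentSha256, "utf8"), privateKey);
  return { contentSha256, signature };
}

function verifyAsset(asset: Buffer, binding: SignedBinding, publicKey: KeyObject): boolean {
  // Step 1: the signature must cover the recorded digest.
  const signatureOk = verify(
    null,
    Buffer.from(binding.contentSha256, "utf8"),
    publicKey,
    binding.signature,
  );
  // Step 2: the recorded digest must match the bytes actually received.
  const hashOk = sha256Hex(asset) === binding.contentSha256;
  return signatureOk && hashOk;
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const original = Buffer.from("pretend these are JPEG bytes");
const binding = signAsset(original, privateKey);
const tampered = Buffer.concat([original, Buffer.from("!")]);

const originalVerifies = verifyAsset(original, binding, publicKey);
const tamperedVerifies = verifyAsset(tampered, binding, publicKey);
console.log({ originalVerifies, tamperedVerifies });
```

Note that the signature itself remains valid after tampering; it is the mismatch between the signed digest and the received bytes that exposes the change.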

C2PA Embedding by File Type

| Format | Embedding Method | C2PA Spec Reference |
|---|---|---|
| JPEG | APP11 marker (JUMBF box) | C2PA Spec §11.1 |
| PNG | caBX ancillary chunk | C2PA Spec §11.2 |
| MP4/MOV | uuid box in moov/trak | C2PA Spec §11.3 |
| WAV/AIFF | JUMBF sidecar or LIST chunk | C2PA Spec §11.4 |
| PDF | AF entry in XMP | C2PA Spec §11.5 |
| Plain text/HTML | Sidecar .c2pa file or HTTP header | C2PA Spec §11.6 |

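For text content (blog posts, chatbot outputs, summaries), there is no file container to embed into, so credentials are delivered out of band: as a .c2pa sidecar file or via HTTP response headers. The header names below are illustrative, not standardized: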

Content-Credentials: <base64-encoded-manifest>
C2PA-Certificate: <url-to-public-key>

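Implementation via c2pa-rs (Rust) or c2pa-node (Node.js)

// API shape follows the c2pa-node README; check it against the version you install.
import { createC2pa, createTestSigner, ManifestBuilder } from 'c2pa-node';

async function labelAIGeneratedImage(imageBuffer: Buffer, mimeType: string): Promise<Buffer> {
  const signer = await createTestSigner(); // Test cert only: replace with a real signer in production
  const c2pa = createC2pa({ signer });

  const manifest = new ManifestBuilder({
    claim_generator: 'YourApp/1.0',
    format: mimeType,
    assertions: [
      {
        label: 'c2pa.actions',
        data: {
          actions: [{
            action: 'c2pa.created',
            when: new Date().toISOString(),
            // Declares the asset as the output of a trained model
            digitalSourceType: 'http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia',
            softwareAgent: { name: 'YourAI', version: '1.0' },
          }],
        },
      },
    ],
  });

  // Assets are passed as { buffer, mimeType }; the signed bytes come back on signedAsset.buffer
  const { signedAsset } = await c2pa.sign({ asset: { buffer: imageBuffer, mimeType }, manifest });
  return signedAsset.buffer;
}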

Production deployments should use a CA-signed certificate (not a self-signed test cert). C2PA certification programs are run through CAI (Content Authenticity Initiative) at contentauthenticity.org.


Standard 2: IPTC Digital Source Type Vocabulary

The International Press Telecommunications Council (IPTC) maintains the Digital Source Type vocabulary, a standardized set of URIs describing how digital content was created, carried in the Iptc4xmpExt:DigitalSourceType property. The property is embedded via XMP (Extensible Metadata Platform), Adobe's widely supported metadata framework, which all major image formats can carry.

IPTC Digital Source Type Values for AI-Generated Content

| URI | Meaning | When to Use |
|---|---|---|
| http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia | Created by a trained AI model | AI text/image/audio generated from a model trained on data |
| http://cv.iptc.org/newscodes/digitalsourcetype/algorithmicMedia | Created by algorithm (not trained) | Procedural generation, parametric content |
| http://cv.iptc.org/newscodes/digitalsourcetype/compositeSynthetic | Human + AI composite | User-provided elements enhanced/modified by AI |
| http://cv.iptc.org/newscodes/digitalsourcetype/digitalArt | Human-created digital art | Not AI-generated; do not use for AI outputs |

For any content your AI system fully generates, use trainedAlgorithmicMedia. For mixed outputs where a user uploads an image and your AI modifies it, use compositeSynthetic.
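
The selection rule above can be written down as a small lookup so that every code path in a pipeline picks the same URI. This is a sketch only: the `ContentOrigin` categories and the function names are hypothetical, and the URIs are the ones listed in the table.

```typescript
// Base URI of the IPTC Digital Source Type vocabulary.
const IPTC_DST_BASE = "http://cv.iptc.org/newscodes/digitalsourcetype/";

// Hypothetical description of how an output was produced.
type ContentOrigin =
  | "model-generated"        // fully generated by a trained model
  | "procedural"             // algorithmic, no trained model involved
  | "user-input-ai-modified" // user-supplied elements changed by AI
  | "human-digital-art";     // made by a person with digital tools

function digitalSourceTypeFor(origin: ContentOrigin): string {
  switch (origin) {
    case "model-generated":
      return IPTC_DST_BASE + "trainedAlgorithmicMedia";
    case "procedural":
      return IPTC_DST_BASE + "algorithmicMedia";
    case "user-input-ai-modified":
      return IPTC_DST_BASE + "compositeSynthetic";
    case "human-digital-art":
      return IPTC_DST_BASE + "digitalArt";
  }
}

// True for the two values this post uses for AI outputs.
function isAiSourceType(uri: string): boolean {
  return (
    uri === IPTC_DST_BASE + "trainedAlgorithmicMedia" ||
    uri === IPTC_DST_BASE + "compositeSynthetic"
  );
}

const generatedUri = digitalSourceTypeFor("model-generated");
const editedUri = digitalSourceTypeFor("user-input-ai-modified");
console.log(generatedUri, editedUri);
```

Keeping the mapping in one function avoids the common failure where one endpoint labels edited uploads as fully generated and another leaves them unlabelled.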

Embedding IPTC Metadata via ExifTool (CLI)

# Add AI-generated label to an image
exiftool \
  -XMP-iptcExt:DigitalSourceType="http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia" \
  -XMP-dc:Creator="YourAI Service v2.1" \
  -XMP-photoshop:Credit="AI-generated by YourAI" \
  output_image.jpg

Embedding IPTC Metadata via sharp (Node.js)

import sharp from 'sharp';

async function addIptcMetadata(imageBuffer: Buffer): Promise<Buffer> {
  return sharp(imageBuffer)
    .withMetadata({
      exif: {
        IFD0: {
          Software: 'YourAI Service v2.1',
          ImageDescription: 'AI-generated content. trainedAlgorithmicMedia',
        },
      },
    })
    .toBuffer();
}

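Note: the sharp example above writes EXIF fields only, so it does not set the Iptc4xmpExt:DigitalSourceType XMP property itself. For complete XMP/IPTC fields, run ExifTool as a subprocess. The piexifjs library is not a substitute here, because it handles EXIF rather than XMP.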
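
For reference, the XMP that carries the Digital Source Type is a small RDF/XML packet. The sketch below builds one as a string, for example to write as an .xmp sidecar or to hand to a tool that embeds XMP. `buildXmpPacket` and `escapeXml` are hypothetical helpers; the namespace URI is the IPTC Extension schema namespace, and the packet is deliberately minimal (one property, no other schemas).

```typescript
// Namespace of the IPTC Extension schema used by Iptc4xmpExt:* properties.
const IPTC_EXT_NS = "http://iptc.org/std/Iptc4xmpExt/2008-02-29/";

// Escape characters that are not allowed raw inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a minimal XMP packet containing only the DigitalSourceType property.
function buildXmpPacket(digitalSourceTypeUri: string): string {
  return [
    '<?xpacket begin="\uFEFF" id="W5M0MpCehiHzreSzNTczkc9d"?>',
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
    ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
    `  <rdf:Description rdf:about="" xmlns:Iptc4xmpExt="${IPTC_EXT_NS}"`,
    `    Iptc4xmpExt:DigitalSourceType="${escapeXml(digitalSourceTypeUri)}"/>`,
    " </rdf:RDF>",
    "</x:xmpmeta>",
    '<?xpacket end="w"?>',
  ].join("\n");
}

const packet = buildXmpPacket(
  "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
);
console.log(packet);
```

Inspecting a labelled file with ExifTool should show the same property, which makes this a useful reference when debugging why a verifier does not see the label.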


Standard 3: SynthID Integration (GPAI API Consumers)

If you are building on Google's Gemini API, you may be able to query whether content was SynthID-watermarked. For your own content pipeline, Google provides the SynthID toolkit — but it is currently not fully open-source for all modalities.

Available SynthID tools:

For GPAI API consumers (including those who are not Google Cloud customers), the primary obligation is to propagate, and not strip, the watermarks and labels that upstream providers have embedded. If you receive an AI-generated image from an API, do not pass it through operations that strip EXIF/XMP metadata without reattaching it.

// ❌ Strips metadata, destroys Art.50 compliance
const strippedImage = await sharp(apiResponse.image)
  .jpeg({ quality: 85 })
  .toBuffer();

// ✅ Preserves metadata
const preservedImage = await sharp(apiResponse.image)
  .jpeg({ quality: 85 })
  .withMetadata()  // This preserves EXIF/XMP
  .toBuffer();

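The .withMetadata() call in sharp preserves existing metadata, including IPTC/XMP provenance fields. This is the minimal requirement for deployers who do not generate content themselves. It does not keep a C2PA signature valid, though: re-encoding changes the bytes the manifest was bound to, so sign again after processing.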
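
A cheap regression guard for this is to check that the label present in the input is still present in the output of each processing step. The sketch below does a plain byte search for the Digital Source Type URI. It assumes the XMP packet is stored uncompressed (as in JPEG APP1 segments); compressed XMP, for example in some PNG text chunks, would need a real metadata parser. `findAiSourceType` and `labelSurvived` are hypothetical names.

```typescript
// URIs this post uses to mark AI-generated or AI-modified content.
const AI_SOURCE_TYPE_URIS = [
  "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
  "http://cv.iptc.org/newscodes/digitalsourcetype/compositeSynthetic",
];

// Return the first AI source-type URI found in the raw file bytes, or null.
function findAiSourceType(file: Buffer): string | null {
  for (const uri of AI_SOURCE_TYPE_URIS) {
    if (file.includes(uri)) return uri;
  }
  return null;
}

// True when the output still carries the label the input had.
function labelSurvived(before: Buffer, after: Buffer): boolean {
  const expected = findAiSourceType(before);
  if (expected === null) return true; // nothing to preserve
  return findAiSourceType(after) === expected;
}

// Synthetic buffers standing in for image files.
const labelled = Buffer.from(
  `...binary... Iptc4xmpExt:DigitalSourceType="${AI_SOURCE_TYPE_URIS[0]}" ...binary...`,
);
const stripped = Buffer.from("...binary... no XMP packet here ...binary...");

console.log(labelSurvived(labelled, labelled)); // label kept
console.log(labelSurvived(labelled, stripped)); // label lost
```

In a real pipeline, `before` would be the upstream API response and `after` the buffer returned by the sharp chain, with a failed check blocking the upload.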


Standard 4: HTTP-Level Content Provenance Headers

For text-based AI outputs (chatbots, summary APIs, content generation endpoints), there is no file container to embed metadata in. The emerging approach is HTTP response headers combined with a verifiable credential (the header names below are illustrative, not standardized):

HTTP/1.1 200 OK
Content-Type: application/json
X-Content-Type-AI: generated
X-AI-Model: YourModel/2.1
X-AI-Provenance: https://your-api.example.com/.well-known/ai-provenance
X-C2PA-Manifest-URL: https://credentials.your-api.example.com/manifests/abc123

A .well-known/ai-provenance JSON document should describe:

While HTTP headers are not yet formally mandated under Art.50, they align with the Commission's stated direction. Vendor-specific response headers are already routine: OpenAI (openai-organization, openai-processing-ms) and Anthropic both return them, so far for operational data rather than provenance.
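
A minimal sketch of attaching these headers in one place, so every text endpoint emits the same set. The header names are the illustrative ones from this section, not a published standard, and `ProvenanceInfo` and `buildProvenanceHeaders` are hypothetical names.

```typescript
// Inputs for the illustrative provenance headers used in this post.
interface ProvenanceInfo {
  model: string;        // e.g. "YourModel/2.1"
  provenanceUrl: string; // URL of the .well-known/ai-provenance document
  manifestUrl?: string;  // optional URL of a stored C2PA manifest
}

// Reject values that could break out of a header line.
function assertHeaderSafe(name: string, value: string): void {
  if (/[\r\n]/.test(value)) {
    throw new Error(`Header ${name} contains a line break`);
  }
}

function buildProvenanceHeaders(info: ProvenanceInfo): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Content-Type-AI": "generated",
    "X-AI-Model": info.model,
    "X-AI-Provenance": info.provenanceUrl,
  };
  if (info.manifestUrl !== undefined) {
    headers["X-C2PA-Manifest-URL"] = info.manifestUrl;
  }
  for (const [name, value] of Object.entries(headers)) {
    assertHeaderSafe(name, value);
  }
  return headers;
}

const exampleHeaders = buildProvenanceHeaders({
  model: "YourModel/2.1",
  provenanceUrl: "https://your-api.example.com/.well-known/ai-provenance",
  manifestUrl: "https://credentials.your-api.example.com/manifests/abc123",
});
console.log(exampleHeaders);
```

With Node's http module the result can be applied with `res.setHeader(name, value)` for each entry before the body is written.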


Compliance Architecture for SaaS Developers

Here's how to structure your content labelling pipeline for Art.50 compliance:

┌─────────────────────────────────────────────────────┐
│                CONTENT GENERATION LAYER              │
│  GPAI API call → raw output (image/text/audio/video) │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│              METADATA LABELLING LAYER                │
│  1. Attach IPTC DigitalSourceType = trainedAlgo...   │
│  2. Sign C2PA Manifest with your X.509 cert          │
│  3. Preserve upstream SynthID watermark (if present) │
│  4. Add X-AI-Provenance header (text endpoints)      │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│               DELIVERY LAYER                         │
│  - Serve with metadata intact (.withMetadata())      │
│  - CDN: configure to pass through custom headers     │
│  - Storage: S3/GCS object metadata for provenance    │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│              VERIFICATION LAYER (AUDIT LOG)          │
│  - Log manifest hash + content hash at generation    │
│  - Store in append-only audit trail                  │
│  - Retain for at least 6 months (Art.50 + GDPR)     │
└─────────────────────────────────────────────────────┘
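
The verification layer can be sketched as a hash-chained log: each entry records the content hash and manifest hash, plus the hash of the previous entry, so later edits to the log are detectable. This is one possible design under stated assumptions, not something Art.50 prescribes; the `AuditEntry` shape and function names are hypothetical, and a production system would persist entries to write-once storage rather than an in-memory array.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  contentSha256: string;  // hash of the delivered content bytes
  manifestSha256: string; // hash of the signed manifest bytes
  generatedAt: string;    // ISO 8601 timestamp
  prevEntryHash: string;  // entryHash of the previous entry (or GENESIS)
  entryHash: string;      // hash over the four fields above
}

const GENESIS = "0".repeat(64);

function sha256Hex(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

function entryDigest(e: Omit<AuditEntry, "entryHash">): string {
  return sha256Hex(
    [e.contentSha256, e.manifestSha256, e.generatedAt, e.prevEntryHash].join("|"),
  );
}

function appendEntry(
  log: AuditEntry[],
  content: Buffer,
  manifest: Buffer,
  generatedAt: string,
): AuditEntry {
  const last = log[log.length - 1];
  const partial = {
    contentSha256: sha256Hex(content),
    manifestSha256: sha256Hex(manifest),
    generatedAt,
    prevEntryHash: last ? last.entryHash : GENESIS,
  };
  const entry: AuditEntry = { ...partial, entryHash: entryDigest(partial) };
  log.push(entry);
  return entry;
}

// Recompute every hash and check that each entry links to the one before it.
function verifyChain(log: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const e of log) {
    if (e.prevEntryHash !== prev || e.entryHash !== entryDigest(e)) return false;
    prev = e.entryHash;
  }
  return true;
}

const auditLog: AuditEntry[] = [];
appendEntry(auditLog, Buffer.from("image-1"), Buffer.from("manifest-1"), "2026-06-02T10:23:01Z");
appendEntry(auditLog, Buffer.from("image-2"), Buffer.from("manifest-2"), "2026-06-02T10:24:00Z");
console.log(verifyChain(auditLog));
```

Only hashes are stored, so the log itself holds no generated content or personal data, which keeps retention questions separate from the content store.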

CDN Considerations

A common compliance gap is CDNs stripping custom headers. Ensure your CDN (Cloudflare, AWS CloudFront, Fastly) is configured to pass through Content-Credentials and X-AI-Provenance headers:

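# nginx: pass AI provenance headers downstream.
# Caution: proxy_pass already forwards custom upstream response headers by default,
# so add these lines only where the headers would otherwise be dropped;
# elsewhere they cause each header to be sent twice.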
add_header X-Content-Type-AI $upstream_http_x_content_type_ai;
add_header X-AI-Provenance $upstream_http_x_ai_provenance;
add_header Content-Credentials $upstream_http_content_credentials;
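
Whether the headers actually survive the CDN is easy to test from outside. The sketch below compares the headers of a response against a required list; the list reflects the illustrative header names used in this post, and `missingProvenanceHeaders` is a hypothetical helper. It takes plain name/value pairs so it can be fed from any HTTP client.

```typescript
// Header names (lowercase) that should reach the client, per this post's examples.
const REQUIRED_PROVENANCE_HEADERS = [
  "x-content-type-ai",
  "x-ai-provenance",
  "content-credentials",
];

// Return the required headers that are absent or empty in the response.
function missingProvenanceHeaders(
  headers: ReadonlyArray<[string, string]>,
): string[] {
  const present = new Set<string>();
  for (const [name, value] of headers) {
    // Header names are case-insensitive, so compare in lowercase.
    if (value.trim() !== "") present.add(name.toLowerCase());
  }
  return REQUIRED_PROVENANCE_HEADERS.filter((h) => !present.has(h));
}

// Example: headers as seen at the origin versus behind a CDN that drops one.
const atOrigin: Array<[string, string]> = [
  ["Content-Type", "application/json"],
  ["X-Content-Type-AI", "generated"],
  ["X-AI-Provenance", "https://your-api.example.com/.well-known/ai-provenance"],
  ["Content-Credentials", "PGJhc2U2NC1tYW5pZmVzdD4="],
];
const behindCdn = atOrigin.filter(([name]) => name !== "Content-Credentials");

console.log(missingProvenanceHeaders(atOrigin));
console.log(missingProvenanceHeaders(behindCdn));
```

With the built-in fetch, the pairs can be obtained via `Array.from(response.headers.entries())`, which makes this suitable for a scheduled check against the public URL.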

S3/GCS Object Metadata

When storing AI-generated content in object storage, include provenance in object metadata:

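// AWS S3 (SDK v3). `id`, `labelledImageBuffer`, and `manifestId` come from the labelling step.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3Client = new S3Client({}); // region and credentials are resolved from the environment

await s3Client.send(new PutObjectCommand({
  Bucket: 'your-bucket',
  Key: `images/${id}.jpg`,
  Body: labelledImageBuffer,
  ContentType: 'image/jpeg',
  // Stored as x-amz-meta-* headers; values must be strings
  Metadata: {
    'ai-generated': 'true',
    'ai-model': 'your-model-v2.1',
    'digital-source-type': 'trainedAlgorithmicMedia',
    'generation-timestamp': new Date().toISOString(),
    'c2pa-manifest-id': manifestId,
  },
}));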
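
S3 limits user-defined metadata to 2 KB per object, measured as the UTF-8 bytes of all keys and values, so it is worth validating the provenance map before upload. The sketch below is one way to do that; `buildProvenanceMetadata` and `userMetadataBytes` are hypothetical helpers, and the size count ignores the x-amz-meta- prefix, so treat it as an approximation with some headroom.

```typescript
// S3 user-defined metadata limit (sum of UTF-8 bytes of keys and values).
const S3_USER_METADATA_LIMIT_BYTES = 2048;

function userMetadataBytes(metadata: Record<string, string>): number {
  let total = 0;
  for (const [key, value] of Object.entries(metadata)) {
    total += Buffer.byteLength(key, "utf8") + Buffer.byteLength(value, "utf8");
  }
  return total;
}

function buildProvenanceMetadata(
  model: string,
  manifestId: string,
  generatedAt: Date,
): Record<string, string> {
  const metadata: Record<string, string> = {
    "ai-generated": "true",
    "ai-model": model,
    "digital-source-type": "trainedAlgorithmicMedia",
    "generation-timestamp": generatedAt.toISOString(),
    "c2pa-manifest-id": manifestId,
  };
  for (const [key, value] of Object.entries(metadata)) {
    // Metadata travels in HTTP headers, so keep values to printable ASCII.
    if (!/^[\x20-\x7E]*$/.test(value)) {
      throw new Error(`Metadata value for ${key} is not printable ASCII`);
    }
  }
  const size = userMetadataBytes(metadata);
  if (size > S3_USER_METADATA_LIMIT_BYTES) {
    throw new Error(`Metadata is ${size} bytes, over the 2 KB limit`);
  }
  return metadata;
}

const provenanceMetadata = buildProvenanceMetadata(
  "your-model-v2.1",
  "abc123",
  new Date("2026-06-02T10:23:00Z"),
);
console.log(provenanceMetadata, userMetadataBytes(provenanceMetadata));
```

Anything larger than an identifier, such as the manifest itself, belongs in a separate object referenced by ID rather than in the metadata map.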

What About the EU Commission's Implementing Acts?

Art.50(7) leaves the technical detail to codes of practice drawn up with the AI Office, which the European Commission may approve through implementing acts. These are expected by mid-2026 and should indicate which standards (C2PA, IPTC, or new ones) satisfy the legal obligation.

What this means for developers:

You can track the Commission's progress on implementing acts via EUR-Lex, starting from CELEX number 32024R1689 (the AI Act base regulation). The GPAI Code of Practice (currently in consultation phase) also provides guidance on expected standards for GPAI model providers.


August 2026 Compliance Checklist — GPAI Content Labelling

For AI System Providers (you generate content directly)

For GPAI API Consumers/Deployers (you build on top of GPAI APIs)

For All Developers


What Comes After August 2026?

Art.50 compliance on August 2, 2026 is not the endpoint. The regulation anticipates evolution:

  1. National market surveillance authorities will begin enforcement from August 2026. Early penalties are likely to target obvious non-compliance (no disclosure, no metadata at all) rather than technical minutiae.

  2. The GPAI Code of Practice (expected to be finalized in Q2/Q3 2026) will contain more detailed expectations for GPAI providers specifically, including content labelling depth.

  3. C2PA v3.0 is in development and expected to add stronger AI-specific assertions and cross-platform verification infrastructure.

  4. Browser-native verification: Chrome, Firefox, and Safari are evaluating native Content Credentials verification in their media rendering pipelines, which would make C2PA compliance visible to end users without plugins.

