2026-05-27·5 min read·sota.io Team

EU AI Act AI-Generated Content Labelling Tools 2026: C2PA, Provenance & Detection Stack

Post #3 in the sota.io EU AI Act Transparency 2026 Series

[Figure: C2PA provenance and detection stack for EU AI Act AI-generated content labelling]

EU AI Act Article 50(4) starts to apply on 2 August 2026. If your SaaS platform generates, modifies, or distributes AI-created images, video, audio, or text at scale, you have 67 days to implement machine-readable content labelling. This isn't a soft recommendation: GPAI model providers and deployers of "deep fake" style systems face obligations that go well beyond a visible badge. This guide covers the tools, standards, and implementation patterns you need.

The Regulatory Baseline: What Art.50(4) Actually Requires

Article 50(4) of the EU AI Act targets AI systems that generate or manipulate content (images, audio, video, text) in ways that could mislead:

"Deployers of an AI system that generates or manipulates image, audio or video content constituting a deep fake, shall disclose that the content has been artificially generated or manipulated."

The obligation splits into two layers:

Layer 1 — Visible disclosure (immediate obligation for GPAI deployers): Users must be clearly informed that content was AI-generated. A UI badge, overlay, or notification suffices — but must not be "easily removable."

Layer 2 — Machine-readable marking (technical obligation): Content must carry a technically detectable marker that persists through typical processing chains. Recital 133 clarifies this means provenance metadata embedded in the content itself.

The European AI Office's technical guidance (Q1 2026) further specifies that machine-readable marking should follow open interoperable standards — explicitly naming C2PA as the reference implementation.
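
The two layers can be tracked per generated asset. The sketch below is only an illustration: `LabelledOutput` and `missing_layers` are hypothetical names (not part of the Act, C2PA, or any library), and the check simply reports which layer a given asset still lacks.

```python
from dataclasses import dataclass


@dataclass
class LabelledOutput:
    """Hypothetical record describing one AI-generated asset."""
    asset_id: str
    visible_disclosure: bool        # Layer 1: badge, overlay, or notification
    machine_readable_marker: bool   # Layer 2: embedded provenance or watermark


def missing_layers(output: LabelledOutput) -> list[str]:
    """Return the names of the labelling layers this asset still lacks."""
    missing = []
    if not output.visible_disclosure:
        missing.append("visible_disclosure")
    if not output.machine_readable_marker:
        missing.append("machine_readable_marker")
    return missing
```

A check like this could run as a gate before an asset is published, so that content with only a UI badge and no embedded marker is caught early.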

Who Is Covered

| Actor | Obligation | Trigger |
|---|---|---|
| GPAI model provider (Art.53) | Technical infrastructure to mark outputs | Model deployed via API or integrated |
| SaaS deployer (Art.50(4)) | Disclosure to end users + marking | Platform generates images/video/audio/text |
| Intermediary platforms | Pass-through marking preservation | Must not strip provenance metadata |
| Professional context exemption | Reduced disclosure, not eliminated | "Legitimate" creative/artistic use |

One common misconception is that text generation is excluded. It is not: Art.50(4) explicitly lists text alongside images, audio, and video. If your platform generates cover letters, product descriptions, or news summaries at scale without disclosure, you're exposed.


C2PA: The Technical Standard Behind EU Compliance

The Coalition for Content Provenance and Authenticity (C2PA) maintains the joint specification backed by Adobe, Microsoft, Intel, Arm, the BBC, and now Google, which has emerged as the EU's de facto standard for provenance marking.

How C2PA Works

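C2PA creates a cryptographically signed manifest attached to media files. The manifest contains a claim (version, title, format), a set of assertions (such as AI provenance, thumbnail, and ingredients), and a claim signature.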

The entire manifest is signed with an X.509 certificate chain rooted in the C2PA Trust List — a maintained registry of trusted content signers. Adobe, Google, and Microsoft are Trust List members. EU-sovereign options include D-Trust GmbH (subsidiary of German Bundesdruckerei) and Bundesdruckerei directly.

File (JPEG/PNG/MP4/MP3) 
  └── Embedded C2PA Manifest (JUMBF box)
        ├── Claim (version, title, format)
        ├── Assertions
        │   ├── c2pa.ai.generated  ← AI provenance
        │   ├── c2pa.thumbnail
        │   └── c2pa.ingredient
        └── Claim Signature (X.509, ECDSA P-384)
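
The layout above can be mirrored as a plain data structure before it is handed to a signing library. This is a minimal sketch under stated assumptions: `build_manifest_definition` is a hypothetical helper, `YourSaaS/1.0` is a placeholder, and the assertion label is copied from the diagram rather than checked against a specific C2PA spec version. The signature itself is added by the signing library, so it does not appear here.

```python
import json


def build_manifest_definition(title: str, media_format: str, model_name: str) -> dict:
    """Assemble an unsigned manifest definition mirroring the diagram's layout."""
    return {
        "claim_generator": "YourSaaS/1.0",  # placeholder product identifier
        "title": title,
        "format": media_format,
        "assertions": [
            # Label taken from the diagram above
            {"label": "c2pa.ai.generated", "data": {"model": {"name": model_name}}},
        ],
    }


definition = build_manifest_definition("output.png", "image/png", "YourModel/1.0")
manifest_json = json.dumps(definition, indent=2)  # serialized form for a signer
```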

C2PA Python Implementation

import c2pa
from pathlib import Path

def label_ai_generated_image(
    input_path: str,
    output_path: str,
    model_name: str,
    cert_pem: str,
    private_key_pem: str
) -> str:
    """Embed C2PA AI-generated manifest into image file."""
    
    manifest_data = {
        "claim_generator": f"YourSaaS/1.0 c2pa-python/{c2pa.__version__}",
        "title": Path(input_path).name,
        "assertions": [
            {
                "label": "c2pa.ai.generated",
                "data": {
                    "model": {
                        "name": model_name,
                        "version": "1.0"
                    }
                }
            },
            {
                "label": "stds.schema-org.CreativeWork",
                "data": {
                    "@context": "https://schema.org",
                    "@type": "ImageObject",
                    "generator": {
                        "@type": "SoftwareApplication",
                        "name": model_name
                    }
                }
            }
        ]
    }
    
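    # NOTE: the signer and builder call shapes below follow this article's
    # example; they have changed between c2pa-python releases, so confirm
    # them against the version you install.
    signer = c2pa.create_signer(
        cert_pem.encode(),
        private_key_pem.encode(),
        "ps256",
        "http://timestamp.digicert.com"  # or D-Trust for EU-sovereign
    )
    
    # Sign the manifest and write the labelled copy to output_path
    builder = c2pa.Builder(manifest_data)
    builder.sign(input_path, output_path, signer)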
    
    return output_path

FastAPI Endpoint: Auto-Label Pipeline

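from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse
from starlette.background import BackgroundTask
import tempfile
import os

app = FastAPI()

@app.post("/api/generate/image")
async def generate_and_label_image(
    prompt: str,
    user_id: str
):
    # Fail early if signing credentials are missing instead of crashing mid-request
    model_name = os.getenv("AI_MODEL_NAME", "YourModel/1.0")
    cert_pem = os.getenv("C2PA_CERT_PEM")
    private_key_pem = os.getenv("C2PA_PRIVATE_KEY_PEM")
    if not cert_pem or not private_key_pem:
        raise HTTPException(status_code=500, detail="C2PA signing credentials are not configured")
    
    # Step 1: Generate image with your AI model
    # (your_image_model is a placeholder for your own generation client)
    raw_image_bytes = await your_image_model.generate(prompt)
    
    # Step 2: Write to temp file
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp_in:
        tmp_in.write(raw_image_bytes)
        tmp_in_path = tmp_in.name
    
    tmp_out_path = tmp_in_path[:-len(".png")] + "_c2pa.png"
    
    try:
        # Step 3: Embed C2PA manifest
        label_ai_generated_image(
            input_path=tmp_in_path,
            output_path=tmp_out_path,
            model_name=model_name,
            cert_pem=cert_pem,
            private_key_pem=private_key_pem
        )
    finally:
        # The unlabelled input is no longer needed, whether or not signing succeeded
        os.unlink(tmp_in_path)
    
    # Step 4: Return labelled image and delete the temp output once it has been sent
    return FileResponse(
        tmp_out_path,
        media_type="image/png",
        headers={
            "X-AI-Generated": "true",
            "X-C2PA-Signed": "true",
            "X-Generator-Model": model_name
        },
        background=BackgroundTask(os.unlink, tmp_out_path)
    )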

SynthID: Google DeepMind's Open-Sourced Watermarking

SynthID is Google DeepMind's imperceptible watermarking system; its text watermarking component was open-sourced in October 2024. Unlike C2PA (which relies on embedded metadata), SynthID modifies the content itself, making it detectable even if metadata is stripped.

SynthID Coverage

| Modality | Status | EU Art.50(4) Coverage |
|---|---|---|
| Images | Production (Imagen 2/3) | |
| Audio | Production (Lyria) | |
| Text | Production (Gemini) | |
| Video | Beta (Veo) | ✓ (Aug 2026) |

SynthID Python (Open Source)

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

# This version uses the SynthID Text integration in Hugging Face Transformers;
# confirm that your installed release includes it.
# "your-model", the keys, and the prompt are placeholders.
tokenizer = AutoTokenizer.from_pretrained("your-model")
model = AutoModelForCausalLM.from_pretrained("your-model")

# Watermarking configuration: the keys act as the secret, so in production
# load them from a secrets manager instead of hard-coding them.
watermark_config = SynthIDTextWatermarkingConfig(
    ngram_len=5,
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57],  # Keep these secret!
    context_history_size=1024,
    sampling_table_size=65536,
    sampling_table_seed=0,
)

# Generate watermarked text: the watermark is applied while tokens are
# sampled, so generation runs with do_sample=True
prompt = "Write a short product description for a steel water bottle."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermark_config,
    do_sample=True,
    max_new_tokens=1024
)

watermarked_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

SynthID Detection

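from transformers import (
    AutoTokenizer,
    BayesianDetectorModel,
    SynthIDTextWatermarkDetector,
    SynthIDTextWatermarkLogitsProcessor,
)

# Detection relies on a Bayesian detector trained for your watermarking keys;
# "your-org/your-synthid-detector" is a placeholder for that checkpoint.
detector_model = BayesianDetectorModel.from_pretrained("your-org/your-synthid-detector")
logits_processor = SynthIDTextWatermarkLogitsProcessor(
    **detector_model.config.watermarking_config,
    device="cpu"
)
tokenizer = AutoTokenizer.from_pretrained(detector_model.config.model_name)
detector = SynthIDTextWatermarkDetector(detector_model, logits_processor, tokenizer)

# Score the candidate text for the presence of the watermark
candidate = tokenizer([candidate_text], return_tensors="pt")
detection_scores = detector(candidate.input_ids)

# Treat the text as AI-generated when its score exceeds your threshold
# (0.95 in this guide; configurable)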

Key advantage: SynthID watermarks are designed to survive typical transformations such as cropping (images), compression (audio/video), and light paraphrasing (text). The EU AI Office's 2026 guidance explicitly mentions "robust watermarks that survive post-processing" as the target standard for machine-readable marking.


EU-Sovereign Content Labelling Stack

If you process personal data (including biometric data in images/video), deploying through US-headquartered infrastructure creates GDPR-CLOUD Act tension. Here's an EU-sovereign stack:

Tier 1: Signing Infrastructure

D-Trust (Deutsche Bundesdruckerei)

Bundesdruckerei directly — for enterprise scale (>1M signings/month)

Tier 2: Processing Infrastructure

Deploy on Hetzner (Nuremberg/Falkenstein/Helsinki) or Scaleway (Paris/Amsterdam) to keep all image/video/audio processing within the EU:

# docker-compose.yml for EU-sovereign C2PA pipeline
services:
  c2pa-labeller:
    image: your-registry.eu/c2pa-labeller:latest
    environment:
      - C2PA_CERT_PEM=${D_TRUST_CERT_PEM}
      - C2PA_PRIVATE_KEY_PEM=${D_TRUST_KEY_PEM}
      - TIMESTAMP_SERVER=http://timestamp.d-trust.net/tsa/tsa.crt  # D-Trust TSA
    deploy:
      placement:
        constraints:
          - node.labels.region == eu-central  # Ensure EU datacenter

Tier 3: EU Detection APIs

| Tool | Provider | Hosting | EU GDPR | Capabilities |
|---|---|---|---|---|
| Hive AI Detection | Hive | US (CLOUD Act ⚠️) | No | Images, Video, Text, Audio |
| Imatag Detect | Imatag | France (EU ✓) | Yes | Images, Video |
| Truepic | Truepic | US (CLOUD Act ⚠️) | No | Images, C2PA verification |
| c2pa-rs (self-hosted) | Open Source | Your infra ✓ | Yes | C2PA verification |
| SynthID (self-hosted) | Open Source | Your infra ✓ | Yes | Text, Images (Imagen only) |

EU-sovereign detection recommendation: Self-host c2pa-rs (Rust, Apache 2.0) for C2PA verification + SynthID for imperceptible watermark detection. For third-party detection services, Imatag (French company, EU-hosted) is the only major player meeting GDPR requirements without SCCs.
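
Running both checks side by side raises the question of how to combine their results. The sketch below is one possible policy, not a prescribed one: `provenance_verdict` and its verdict strings are hypothetical, and the 0.95 default mirrors the threshold used in the SynthID detection example. A signed manifest is treated as the stronger signal, with the watermark score as a fallback when metadata is missing.

```python
from typing import Optional


def provenance_verdict(
    c2pa_ai_assertion: Optional[bool],
    watermark_score: Optional[float],
    threshold: float = 0.95,
) -> str:
    """Combine a C2PA check and a watermark score into one verdict.

    c2pa_ai_assertion: True/False if a manifest was read, None if none was found.
    watermark_score: detector probability in [0, 1], or None if no detector ran.
    """
    if c2pa_ai_assertion:
        return "ai_generated (signed manifest)"
    if watermark_score is not None and watermark_score > threshold:
        return "ai_generated (watermark)"
    if c2pa_ai_assertion is None and watermark_score is None:
        return "unknown (no signals)"
    return "no_ai_signal_found"
```

Note that "no_ai_signal_found" only means neither check fired; it is not evidence that the content is human-made.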


Tool Comparison: C2PA Implementations

| Library | Language | Maturity | C2PA Spec | Key Features |
|---|---|---|---|---|
| c2pa-python | Python | v0.5.x | C2PA 2.1 | Signing, verification, manifest reading |
| c2pa-rs | Rust | v0.36.x | C2PA 2.1 | Core library, fastest, most complete |
| c2pa-node | Node.js | v0.5.x | C2PA 2.1 | Wraps c2pa-rs via NAPI |
| c2pa-js | Browser | v2.x | C2PA 2.1 | Browser-side verification |
| Adobe Content Authenticity | SaaS API | GA | C2PA 2.1 | US-hosted (CLOUD Act ⚠️) |
| Microsoft Azure AI Content Safety | SaaS API | GA | Partial | US-hosted (CLOUD Act ⚠️) |

Installing the Python Stack

# Core C2PA library
pip install c2pa-python

# SynthID text watermarking
pip install synthid-text

# Optional: Pillow for image pre-processing
pip install Pillow

# Optional: ffmpeg-python for video/audio
pip install ffmpeg-python

Video and Audio Labelling

Article 50(4) covers video and audio, not just images. Here's how to handle each:

Video (MP4/WebM)

C2PA 2.1 supports MP4 video, including fragmented MP4 containers. For these BMFF-based formats the manifest is embedded in a dedicated uuid box:

import os
import c2pa

def label_ai_video(input_mp4: str, output_mp4: str, model_name: str) -> str:
    """Embed C2PA manifest into MP4 video."""
    
    manifest = {
        "claim_generator": "YourSaaS/1.0",
        "assertions": [
            {
                "label": "c2pa.ai.generated",
                "data": {"model": {"name": model_name}}
            }
        ],
        "format": "video/mp4"  # Required for video
    }
    
    # Credentials come from the same environment variables as the image
    # pipeline; the signer call mirrors the image example in this article.
    signer = c2pa.create_signer(
        os.environ["C2PA_CERT_PEM"].encode(),
        os.environ["C2PA_PRIVATE_KEY_PEM"].encode(),
        "ps256",
        "http://timestamp.d-trust.net/tsa/tsa.crt"
    )
    
    builder = c2pa.Builder(manifest)
    builder.sign(input_mp4, output_mp4, signer)
    return output_mp4

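For streaming video (HLS/DASH), each segment must be individually signed, which is a significant infrastructure consideration. AWS Elemental and cloud-native solutions don't support this. An alternative with EU points of presence: Cloudflare Stream (Dublin) supports C2PA manifest injection as of Q1 2026, although Cloudflare is US-headquartered, so the CLOUD Act tension described for other US providers applies to it as well.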
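
Per-segment signing comes down to a loop over the packaged segments. The sketch below assumes a hypothetical `sign_one` callable that wraps whatever C2PA signing call you use, and a `*.m4s` naming pattern for segments; the demo swaps in a stand-in signer that only copies bytes, so it produces no real signatures.

```python
from pathlib import Path
from typing import Callable
import tempfile


def sign_segments(
    segment_dir: Path,
    out_dir: Path,
    sign_one: Callable[[Path, Path], None],
    pattern: str = "*.m4s",
) -> list[Path]:
    """Sign every media segment in segment_dir, writing results to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    signed = []
    for segment in sorted(segment_dir.glob(pattern)):
        target = out_dir / segment.name
        sign_one(segment, target)  # e.g. a wrapper around your C2PA signer
        signed.append(target)
    return signed


# Demo with a stand-in signer that only copies bytes (no real signature).
def fake_sign(src: Path, dst: Path) -> None:
    dst.write_bytes(src.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    src_dir, dst_dir = Path(tmp) / "in", Path(tmp) / "out"
    src_dir.mkdir()
    for i in range(3):
        (src_dir / f"seg_{i}.m4s").write_bytes(b"segment")
    signed_names = [p.name for p in sign_segments(src_dir, dst_dir, fake_sign)]
```

Because every segment costs one signing operation, the segment count per rendition is worth factoring into capacity planning for the signing service.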

Audio (MP3/AAC/FLAC)

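# SynthID audio watermarking (for audio generated by Lyria or compatible models)
# NOTE: `synthid_audio` and its `watermark` signature are illustrative
# placeholders here; confirm what your model provider actually exposes.
import synthid_audio

watermarked_audio = synthid_audio.watermark(
    audio_array=generated_audio,
    sample_rate=24000,
    keys=[audio_watermark_key],
    strength=2.0  # Higher = more detectable, less imperceptible
)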

For audio not generated by Google models, C2PA is the only standards-compliant option. Embed into ID3 tags (MP3) or Vorbis comments (FLAC/OGG).


Text Content Labelling

The most underestimated obligation: text generation must also be disclosed. Article 50(4) wording explicitly includes text. The EU AI Office's March 2026 FAQ confirmed this applies to:

For text, machine-readable marking options are:

  1. SynthID Text (best for imperceptible watermarking)
  2. C2PA JSON manifest in API response headers (C2PA-Manifest: <base64>)
  3. Metadata fields in structured outputs (JSON/XML with ai_generated: true + model provenance)

# Option 3: Metadata in API response
from datetime import datetime, timezone
from fastapi.responses import JSONResponse

@app.post("/api/generate/text")
async def generate_text(prompt: str):
    # SynthID is applied while tokens are sampled, not to finished text, so
    # `llm` (a placeholder client) is assumed to generate with the
    # watermarking configuration already attached.
    watermarked_text = await llm.generate(prompt)
    
    return JSONResponse(
        content={
            "text": watermarked_text,
            "metadata": {
                "ai_generated": True,
                "model": "YourLLM/1.0",
                "generated_at": datetime.now(timezone.utc).isoformat(),
                "synthid_watermarked": True,
                "c2pa_assertion": "c2pa.ai.generated"
            }
        },
        headers={
            "X-AI-Generated": "true",
            "X-Generator-Model": "YourLLM/1.0",
            "X-C2PA-Assertion": "c2pa.ai.generated"
        }
    )
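
Option 2 from the list above, carrying a manifest in a response header, needs an encoding step because header values must be plain ASCII. The following is a minimal sketch: the helper names are hypothetical, the `C2PA-Manifest` header name is the one used in the list above, and the payload here is an unsigned JSON document standing in for a real signed manifest.

```python
import base64
import json


def encode_manifest_header(manifest: dict) -> str:
    """Serialize a manifest dict to a base64 string usable as a header value."""
    raw = json.dumps(manifest, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_manifest_header(value: str) -> dict:
    """Reverse encode_manifest_header."""
    return json.loads(base64.b64decode(value))


header_value = encode_manifest_header({"ai_generated": True, "model": "YourLLM/1.0"})
headers = {"C2PA-Manifest": header_value}
```

Keep an eye on size: many proxies cap total header length at a few kilobytes, so large manifests may be better served from a separate URL.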

Detection Endpoint: Verify AI Content

Build a verification endpoint so downstream systems can confirm provenance:

from fastapi import FastAPI, UploadFile, File
import c2pa
import io
import json

@app.post("/api/verify/content-provenance")
async def verify_content_provenance(file: UploadFile = File(...)):
    """Check if content has valid C2PA AI provenance manifest."""
    
    content = await file.read()
    
    try:
        # Read the C2PA manifest store as JSON (Reader interface assumed from
        # the c2pa-python docs; confirm it against the release you install)
        reader = c2pa.Reader(file.content_type, io.BytesIO(content))
        store = json.loads(reader.json())
    except Exception as exc:
        # The library raises when no manifest is present; the exact error
        # class differs between releases, hence the broad catch
        return {
            "has_c2pa_manifest": False,
            "ai_generated": None,
            "compliant_art50": False,
            "warning": "No C2PA manifest found; content may not meet Art.50(4) requirements",
            "detail": str(exc)
        }
    
    # Collect AI generation assertions across all manifests in the store
    ai_assertions = [
        assertion
        for manifest in store.get("manifests", {}).values()
        for assertion in manifest.get("assertions", [])
        if assertion.get("label") == "c2pa.ai.generated"
    ]
    
    # "active_manifest" holds the label of the active manifest, not the manifest itself
    active = store.get("manifests", {}).get(store.get("active_manifest"), {})
    
    return {
        "has_c2pa_manifest": True,
        "ai_generated": len(ai_assertions) > 0,
        "assertions": [a["label"] for a in ai_assertions],
        "issuer": active.get("claim_generator"),
        "compliant_art50": len(ai_assertions) > 0
    }

Compliance Checklist: Art.50(4) for SaaS Platforms

Mandatory (by 2 August 2026)

Not Required But Best Practice


Common Implementation Mistakes

1. Marking at display time instead of generation time

Wrong: Adding a watermark to the image when it's shown in the UI. Right: Embedding the C2PA manifest immediately after the model generates the output, before any caching or storage.
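
The ordering can be enforced in code by making labelling the only path from the model to storage. The sketch below uses stand-in callables throughout: `generate`, `label`, and `store` are hypothetical hooks for your model client, your C2PA signer, and your cache or object store.

```python
from typing import Callable


def generate_labelled(
    prompt: str,
    generate: Callable[[str], bytes],
    label: Callable[[bytes], bytes],
    store: Callable[[bytes], None],
) -> bytes:
    """Generate, label, then store, so unlabelled bytes never reach storage."""
    raw = generate(prompt)
    labelled = label(raw)   # embed provenance immediately after generation
    store(labelled)         # only the labelled version is cached or persisted
    return labelled


# Demo with stand-in callables (no real model or signer involved).
stored: list[bytes] = []
result = generate_labelled(
    "a red bicycle",
    generate=lambda p: p.encode(),
    label=lambda b: b + b"|labelled",
    store=stored.append,
)
```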

2. Stripping metadata in image optimization pipelines

Most CDN pipelines (Cloudflare Image Resize, imgix, Thumbor) strip EXIF/JUMBF metadata by default. C2PA manifests are stored in JUMBF boxes — if your CDN removes them, you're non-compliant.

Fix for Cloudflare Workers:

// Preserve C2PA manifest by disabling metadata stripping
const response = await fetch(request, {
  cf: {
    image: {
      format: "png",
      metadata: "keep"  // ← Critical: keep all metadata including C2PA
    }
  }
});
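
Whichever CDN setting you use, it helps to verify the result by checking a processed file for the manifest container. In PNG files, C2PA stores its manifest in a `caBX` chunk, so a chunk walk is enough for a smoke test. The sketch below is a presence check only (it does not validate the manifest or its signature), and the demo inputs are synthetic chunk sequences, not decodable images.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[bytes]:
    """List chunk types in a PNG byte string (CRCs are not verified)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types, offset = [], len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(chunk_type)
        offset += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """True if the PNG still carries a caBX chunk (the C2PA manifest store)."""
    return b"caBX" in png_chunk_types(data)


def _chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk (used only to construct the demo inputs)."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Synthetic inputs: one with a caBX chunk, one as a CDN might leave it.
with_manifest = (
    PNG_SIGNATURE + _chunk(b"IHDR", b"\x00" * 13) + _chunk(b"caBX", b"demo") + _chunk(b"IEND", b"")
)
stripped = PNG_SIGNATURE + _chunk(b"IHDR", b"\x00" * 13) + _chunk(b"IEND", b"")
```

Running `has_c2pa_chunk` on a file fetched back through the CDN, as part of a deployment test, would catch a pipeline that silently drops the manifest.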

3. Missing video segment signing

Adaptive bitrate streaming (HLS/DASH) requires each segment to be individually signed. Most implementations only sign the initial manifest or the first segment.

4. Assuming C2PA = compliance for text

C2PA is a file-format standard — it doesn't naturally attach to text responses. Text generation requires either SynthID, response headers, or a separate provenance record.
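
A separate provenance record for text can be as simple as a hash of the output stored next to the model details. This is a sketch under assumptions: the record fields and helper names are hypothetical, and the record is unsigned. Unlike a watermark, the hash only matches the exact original text, so any later edit breaks the link.

```python
import hashlib
import json
from datetime import datetime, timezone


def text_provenance_record(text: str, model: str) -> dict:
    """Build a provenance record binding generated text to its model via a hash."""
    return {
        "ai_generated": True,
        "model": model,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_record(text: str, record: dict) -> bool:
    """Check whether a text is byte-for-byte the one the record describes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() == record["sha256"]


record = text_provenance_record("Generated product description.", "YourLLM/1.0")
record_json = json.dumps(record)  # ready to store or return alongside the text
```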

5. Using US-headquartered trust anchors for EU personal data

If your images contain personal data (faces, etc.), routing them through Adobe's Content Authenticity API or Microsoft's API for C2PA signing creates CLOUD Act exposure. Use D-Trust or self-sign with an eIDAS-qualified certificate.


Integration with sota.io

If you're deploying on sota.io and use AI generation features, the platform provides:

The default sota.io deployment configuration ensures AI content labelling happens at the infrastructure layer — you don't need to modify your application code to achieve Art.50(4) compliance.


Roadmap: What's Coming After August 2026

The EU AI Office's 2026 work programme signals additional requirements:

Q4 2026: C2PA interoperability mandate — platforms must accept C2PA manifests from third-party models, not just their own (prevents lock-in via proprietary marking schemes).

2027: GPAI model providers (Art.53) must register their signing certificates with a European Trust Registry — likely managed by ENISA.

2028 (CRA alignment): For AI-enabled IoT products, content labelling requirements align with Cyber Resilience Act vulnerability disclosure obligations.

Start with compliant architecture now — retrofitting a marking pipeline after August 2026 under regulator scrutiny is far more expensive than building it right the first time.



Running EU-compliant AI infrastructure? sota.io provides EU-sovereign SaaS deployment with C2PA content labelling infrastructure built in.
