2026-06-02·5 min read·sota.io Team

EU AI Act Automated Compliance Testing: Building a CI/CD Audit Pipeline for High-Risk AI Systems

Post #1463 in the sota.io EU AI Act Compliance Series


The August 2, 2026 EU AI Act deadline creates an uncomfortable reality for engineering teams: your high-risk AI system must carry a paper trail of compliance evidence before it goes to market, but modern ML workflows deploy models weekly or daily. If your compliance process is purely manual — a quarterly review session, a PDF checklist, an annual third-party audit — you will either ship non-compliant models or freeze deployments while lawyers catch up.

The solution is to shift compliance left: embed EU AI Act verification gates directly into your CI/CD pipeline so every model build either passes an automated audit or fails fast with actionable evidence. This post is the first in a five-part series on building that pipeline.

Why Manual Compliance Fails for AI Systems

Traditional software compliance works because code is deterministic. A security scan of yesterday's build is still valid today. A penetration test document from six months ago describes a known attack surface.

AI systems break this assumption. When you retrain a model on updated data, the bias profile changes. When you tune hyperparameters for accuracy, robustness against adversarial inputs may degrade. When you add a new feature to the input vector, the explainability obligation (Art.13 instructions for use) may require updated documentation.

EU AI Act Art.9(1) requires a risk management system for every high-risk AI system, and Art.9(2) defines it as a continuous iterative process that runs throughout the entire lifecycle of the system, not just at initial placement on the market. Art.9(8) requires testing at any point during development, as appropriate, and in any event before the system is placed on the market or put into service. This lifecycle framing is what makes CI/CD integration not just convenient but legally necessary.

The Five Mandatory Compliance Gates

A robust EU AI Act CI/CD pipeline needs five verification stages, each mapped to specific legal obligations:

Gate 1: Data Governance (Art.10)

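Art.10 requires that training, validation, and test datasets be managed according to defined data governance practices. On every build, your pipeline must verify dataset provenance, bias across demographic groups, and representativeness for the intended deployment context.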

Tools: Evidently AI for data drift and bias reporting, Fairlearn for fairness metrics, Great Expectations for dataset validation.

# .github/workflows/compliance-gate-data.yml
- name: Data Governance Gate (Art.10)
  run: |
    python scripts/check_dataset_provenance.py --registry data-registry.json
    python scripts/run_bias_report.py --dataset $DATASET_PATH --output reports/art10-bias-$(date +%Y%m%d).html
    python scripts/check_representativeness.py --regions EU --threshold 0.85

The gate fails — and the build stops — if the bias report detects a demographic parity gap above your defined threshold, or if dataset provenance cannot be verified.
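
To make the demographic parity check concrete, here is a minimal sketch of how a script such as `run_bias_report.py` might compute the gap. The function names are hypothetical, the 0.05 default mirrors the threshold in the sample report later in this post, and a real pipeline would more likely take this metric from Fairlearn or Evidently AI than from hand-rolled code.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    if not predictions or len(predictions) != len(groups):
        raise ValueError("predictions and groups must be non-empty and equal in length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == 1:
            positives[group] += 1
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def check_bias_gate(predictions, groups, max_gap=0.05):
    """Return (passed, gap); max_gap is an illustrative threshold, not a legal one."""
    gap = demographic_parity_gap(predictions, groups)
    return gap <= max_gap, gap
```

A CI script would call `check_bias_gate` on the model's predictions for the evaluation set and exit non-zero when `passed` is false, which is what stops the build.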

Gate 2: Accuracy and Robustness (Art.15)

Art.15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity. Critically, Art.15(1) specifies that these properties must be maintained throughout the lifecycle at a consistent performance level.

Your CI/CD pipeline must define and enforce performance thresholds:

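# scripts/run_accuracy_gate.py
import os
import sys
from datetime import datetime, timezone

# evaluate_model, write_evidence_record, model, both validation sets, and
# MODEL_VERSION are project-specific and assumed to be defined elsewhere.

REQUIRED_ACCURACY = 0.92  # Must match Annex IV technical documentation
REQUIRED_ROBUSTNESS = 0.88  # Performance on held-out test set with distribution shift

accuracy = evaluate_model(model, validation_set)
robustness = evaluate_model(model, shifted_validation_set)
threshold_met = bool(accuracy >= REQUIRED_ACCURACY and robustness >= REQUIRED_ROBUSTNESS)

# Write compliance evidence for failing runs as well as passing ones
write_evidence_record({
    "gate": "art15-accuracy-robustness",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "model_version": MODEL_VERSION,
    "accuracy": accuracy,
    "robustness": robustness,
    "threshold_met": threshold_met,
    "commit": os.getenv("GITHUB_SHA"),
})

# Fail with a non-zero exit code; assert statements are skipped under `python -O`
if accuracy < REQUIRED_ACCURACY:
    sys.exit(f"Art.15 FAIL: accuracy {accuracy:.3f} < {REQUIRED_ACCURACY}")
if robustness < REQUIRED_ROBUSTNESS:
    sys.exit(f"Art.15 FAIL: robustness {robustness:.3f} < {REQUIRED_ROBUSTNESS}")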

The critical output here is the evidence record — a JSON artifact that becomes part of your Art.11 technical documentation. Every gate must produce machine-readable evidence, not just a pass/fail signal.
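
The gate script calls `write_evidence_record` without showing it. One possible shape for that helper is sketched below, assuming evidence is stored as one JSON file per gate under `logs/` (matching the `logs/compliance-evidence-*.json` pattern uploaded by the workflow). The SHA-256 digest and the `verify_evidence_record` function are additions for illustration, not part of the original pipeline.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def _digest(payload):
    """SHA-256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_evidence_record(record, out_dir="logs"):
    """Write one gate result as JSON and return the path of the file."""
    payload = dict(record)
    payload.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    payload["sha256"] = _digest(payload)  # computed before the digest key exists
    out_path = Path(out_dir) / f"compliance-evidence-{payload['gate']}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return out_path


def verify_evidence_record(path):
    """Recompute the digest and report whether the stored record is unmodified."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    stored = payload.pop("sha256")
    return _digest(payload) == stored
```

A digest inside the file only detects accidental or careless edits. Anyone who can rewrite the file can also rewrite the digest, so stronger guarantees would need signing or write-once storage.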

Gate 3: Logging and Record-Keeping (Art.12)

Art.12 requires that high-risk AI systems automatically log events with sufficient granularity to enable post-hoc reconstruction of system behaviour. This means your deployed model must be instrumented at build time, not just in production configuration.

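In your CI/CD pipeline, this gate verifies that logging hooks are present in the model artifact, that log records conform to the expected schema, and that log retention is configured for the target environment: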

- name: Logging Configuration Gate (Art.12)
  run: |
    python scripts/verify_logging_hooks.py --model $MODEL_PATH
    python scripts/validate_log_schema.py --schema schemas/art12-log-schema.json
    python scripts/check_log_retention.py --target-days 365 --environment $DEPLOY_ENV

Failing this gate is a P0 blocker. A model without correct logging instrumentation cannot legally be deployed in the EU as a high-risk AI system.
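
As an illustration of the schema check, the sketch below validates a single log record against a set of required fields. The field names are hypothetical placeholders, not the contents of `schemas/art12-log-schema.json`, and a production gate would probably validate against that JSON Schema file with a dedicated library instead.

```python
# Hypothetical minimum fields for reconstructing one inference event.
REQUIRED_LOG_FIELDS = {
    "timestamp": str,
    "model_version": str,
    "input_reference": str,
    "output": dict,
    "operator_id": str,
}


def validate_log_record(record, required=REQUIRED_LOG_FIELDS):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for field, expected_type in required.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems
```

Returning every problem at once, instead of raising on the first, gives the developer a complete list to fix in a single pipeline run.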

Gate 4: Human Oversight Mechanism (Art.14)

Art.14 requires that high-risk AI systems be designed so that natural persons can effectively oversee them. This includes the ability to interrupt, override, or stop the system. Your pipeline must verify these mechanisms exist and are functional.

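Human oversight verification in CI/CD is often overlooked because it feels like a UX concern rather than a testing concern. But Art.14(4) lists specific capabilities the system must give the people overseeing it, including detecting anomalies, disregarding or overriding an output, and interrupting the system.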

Your pipeline should run an integration test that simulates a human override:

# tests/test_human_oversight.py
# ModelService, model, sample_input, and adversarial_input are assumed to come
# from the project's own code and test fixtures.

def test_override_mechanism():
    """Verify Art.14 human override is always callable."""
    model_service = ModelService()
    # Start processing; the result is not needed for this test
    model_service.predict_async(sample_input)
    # Override before completion; this must succeed
    result = model_service.override(reason="operator_test")
    assert result.success, "Art.14 FAIL: override mechanism did not respond"
    assert result.audit_logged, "Art.14 FAIL: override not written to audit log"

def test_anomaly_alerting():
    """Verify system flags low-confidence outputs for human review."""
    low_confidence_output = model.predict(adversarial_input)
    assert low_confidence_output.requires_human_review, "Art.14 FAIL: no human review flag on low-confidence output"
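
The tests above assume a service that exposes an override and a human-review flag. The following is a minimal sketch of what such an interface could look like. The attribute names `success`, `audit_logged`, and `requires_human_review` come from the tests, while `OverrideResult`, `Prediction`, `wrap_prediction`, and the 0.7 review threshold are hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass
class OverrideResult:
    success: bool
    audit_logged: bool


@dataclass
class Prediction:
    label: str
    confidence: float
    requires_human_review: bool


class ModelService:
    def __init__(self, review_threshold=0.7):
        self.review_threshold = review_threshold  # illustrative value
        self.audit_log = []
        self._stopped = threading.Event()

    def override(self, reason):
        """Halt the service and record who or what triggered the stop."""
        self._stopped.set()
        self.audit_log.append({"event": "override", "reason": reason})
        return OverrideResult(success=self._stopped.is_set(), audit_logged=True)

    def wrap_prediction(self, label, confidence):
        """Attach the review flag to a raw model output, unless halted."""
        if self._stopped.is_set():
            raise RuntimeError("service halted by operator override")
        return Prediction(label, confidence, confidence < self.review_threshold)
```

The design choice worth noting is that the stop flag is checked on every prediction, so an override takes effect for the next request without waiting on the model.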

Gate 5: Technical Documentation (Art.11 + Annex IV)

The final gate is often the most operationally intensive: generating and validating your Annex IV technical documentation. Annex IV lists nine mandatory elements, from general system description to design specifications, monitoring plans, and post-market data collection.

Rather than maintaining this documentation manually, treat it as a generated artifact:

# scripts/generate_annex_iv_doc.py
import os
from datetime import datetime, timezone

# extract_architecture_info is a project-specific helper assumed to exist elsewhere.

def generate_technical_documentation(model_version, config, test_results):
    return {
        "annex_iv_section_1": {
            "general_description": config["system_description"],
            "intended_purpose": config["intended_purpose"],
            "provider_details": config["provider"],
        },
        "annex_iv_section_2": {
            "design_specifications": extract_architecture_info(model_version),
            "training_methodology": config["training_methodology"],
        },
        "annex_iv_section_3": {
            "monitoring_measures": config["monitoring_plan"],
        },
        "annex_iv_section_5": {  # Annex IV point 5 covers the Art.9 risk management system
            "risk_management_measures": config["risk_management"],
        },
        "test_evidence": {
            "accuracy": test_results["art15"],
            "bias_report": test_results["art10"],
            "logging_verification": test_results["art12"],
            "oversight_test": test_results["art14"],
        },
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "model_commit": os.getenv("GITHUB_SHA"),
    }

The generated documentation artifact is stored in your EU-compliant artifact registry (ideally on EU-hosted infrastructure — this is where hosting choice becomes a compliance choice). For sota.io deployments, artifact storage on EU-sovereign infrastructure means your technical documentation trail never crosses into CLOUD Act jurisdiction.
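
Generating the document is only half of the gate: the pipeline also has to validate it. A minimal sketch of one such check follows, assuming the nested-dictionary layout shown above. The helper name `find_empty_sections` is hypothetical, and it only detects missing values, not whether the content is adequate.

```python
def find_empty_sections(doc, path=""):
    """Return dotted paths of every None, empty-string, or empty-container value."""
    empty = []
    for key, value in doc.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict) and value:
            # Recurse into populated sections
            empty.extend(find_empty_sections(value, here))
        elif value is None or value == "" or value == {} or value == []:
            empty.append(here)
    return empty
```

The gate would fail when the returned list is non-empty, for example when `GITHUB_SHA` is unset and `model_commit` comes back as `None`.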

Integrating the Five Gates: A Sample GitHub Actions Workflow

# .github/workflows/eu-ai-act-compliance.yml
name: EU AI Act Compliance Pipeline

on:
  push:
    branches: [main]
  pull_request:
    paths:
      - 'models/**'
      - 'training/**'

jobs:
  compliance-gates:
    runs-on: ubuntu-latest
    environment: eu-compliance
    steps:
      - uses: actions/checkout@v4

      - name: Gate 1 — Data Governance (Art.10)
        id: gate_data
        run: python scripts/check_dataset_provenance.py && python scripts/run_bias_report.py

      - name: Gate 2 — Accuracy & Robustness (Art.15)
        id: gate_accuracy
        run: python scripts/run_accuracy_gate.py
        
      - name: Gate 3 — Logging Configuration (Art.12)
        id: gate_logging
        run: python scripts/verify_logging_hooks.py

      - name: Gate 4 — Human Oversight (Art.14)
        id: gate_oversight
        run: pytest tests/test_human_oversight.py -v

      - name: Gate 5 — Technical Documentation (Art.11)
        id: gate_docs
        run: python scripts/generate_annex_iv_doc.py --output artifacts/annex-iv-${{ github.sha }}.json

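      - name: Upload Compliance Artifacts
        if: always()  # Keep evidence for failed runs as well as passing ones
        uses: actions/upload-artifact@v4
        with:
          name: eu-ai-act-compliance-${{ github.sha }}
          path: |
            reports/art10-bias-*.html
            artifacts/annex-iv-*.json
            logs/compliance-evidence-*.json
          # Match post-market monitoring retention; GitHub caps this at the
          # repository's configured artifact retention limit
          retention-days: 365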

      - name: Compliance Summary
        run: |
          echo "## EU AI Act Compliance Report" >> $GITHUB_STEP_SUMMARY
          echo "| Gate | Article | Status |" >> $GITHUB_STEP_SUMMARY
          echo "|------|---------|--------|" >> $GITHUB_STEP_SUMMARY
          echo "| Data Governance | Art.10 | ✅ PASS |" >> $GITHUB_STEP_SUMMARY
          echo "| Accuracy/Robustness | Art.15 | ✅ PASS |" >> $GITHUB_STEP_SUMMARY
          echo "| Logging Config | Art.12 | ✅ PASS |" >> $GITHUB_STEP_SUMMARY
          echo "| Human Oversight | Art.14 | ✅ PASS |" >> $GITHUB_STEP_SUMMARY
          echo "| Technical Docs | Art.11 | ✅ PASS |" >> $GITHUB_STEP_SUMMARY
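
The summary step above prints PASS for every gate, which is accurate only because the step is skipped when an earlier gate fails. If the summary should also appear on failed runs, the table can be rendered from the actual step outcomes. The sketch below assumes those outcomes (the `steps.<id>.outcome` values GitHub Actions exposes, such as `success` or `failure`) are passed in as a mapping keyed by the step ids used in the workflow. `render_summary` is a hypothetical helper.

```python
# (display name, article, workflow step id) for each gate in the workflow
GATES = [
    ("Data Governance", "Art.10", "gate_data"),
    ("Accuracy/Robustness", "Art.15", "gate_accuracy"),
    ("Logging Config", "Art.12", "gate_logging"),
    ("Human Oversight", "Art.14", "gate_oversight"),
    ("Technical Docs", "Art.11", "gate_docs"),
]


def render_summary(outcomes):
    """Build the Markdown summary table from a {step_id: outcome} mapping."""
    lines = [
        "## EU AI Act Compliance Report",
        "",
        "| Gate | Article | Status |",
        "|------|---------|--------|",
    ]
    for name, article, step_id in GATES:
        # A gate with no recorded outcome is reported as skipped, never as passed
        outcome = outcomes.get(step_id, "skipped")
        status = "✅ PASS" if outcome == "success" else f"❌ {outcome.upper()}"
        lines.append(f"| {name} | {article} | {status} |")
    return "\n".join(lines) + "\n"
```

The returned string can be appended to the file named by `GITHUB_STEP_SUMMARY`, in a step that itself runs with `if: always()`.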

Artifact Storage and EU Sovereignty

Compliance evidence generated in CI/CD is not just convenient documentation — it is potentially the primary evidence you present to a National Competent Authority (NCA) during a market surveillance inspection. Under Art.74, NCAs have the power to request access to technical documentation. If your artifacts are stored in a US-headquartered cloud (AWS S3, Google Cloud Storage, Azure Blob Storage), that storage is potentially subject to the CLOUD Act — meaning US authorities, not just EU regulators, could demand access to your compliance records under US law.

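For EU-deployed AI systems, the recommendation is to store compliance artifacts on EU-hosted infrastructure run by a provider outside US jurisdiction.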

The Pre-Deployment Compliance Report

Before every production deployment, your pipeline should generate a single compliance report that consolidates all gate outputs:

EU AI Act Pre-Deployment Compliance Report
==========================================
Model Version: 2.4.1
Commit: abc123def456
Pipeline Run: 2026-06-01T14:22:00Z
Environment: production-eu-west

Gate Results:
  Art.10 Data Governance:    PASS (bias gap: 0.023 < threshold 0.05)
  Art.15 Accuracy:           PASS (accuracy: 0.943 >= threshold 0.92)
  Art.15 Robustness:         PASS (robustness: 0.891 >= threshold 0.88)
  Art.12 Logging Config:     PASS (schema validated, retention: 365d)
  Art.14 Human Oversight:    PASS (override latency: 12ms, audit log: ✓)
  Art.11 Documentation:      PASS (annex-iv-abc123.json generated)

Compliance Verdict: APPROVED FOR DEPLOYMENT
Evidence Package: s3://eu-compliance-artifacts/2026-06-01/abc123def456/

Deploying to: sota.io (EU-sovereign PaaS, Hetzner Germany)
CLOUD Act exposure: None (EU-only infrastructure)

This report becomes the cover page of your Art.11 technical documentation package for this deployment.
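
A rough sketch of how the report could be assembled from gate results is shown below. `build_report` is a hypothetical helper, each gate result is assumed to be a `(label, passed, detail)` tuple, and the `BLOCKED` verdict for failed runs is an assumption, since the sample above only shows the approved case.

```python
def build_report(model_version, commit, run_time, environment, gate_results):
    """Return (report_text, approved) for a list of (label, passed, detail) tuples."""
    approved = all(passed for _, passed, _ in gate_results)
    lines = [
        "EU AI Act Pre-Deployment Compliance Report",
        "==========================================",
        f"Model Version: {model_version}",
        f"Commit: {commit}",
        f"Pipeline Run: {run_time}",
        f"Environment: {environment}",
        "",
        "Gate Results:",
    ]
    for label, passed, detail in gate_results:
        status = "PASS" if passed else "FAIL"
        padded = (label + ":").ljust(27)  # align the status column as in the sample
        lines.append(f"  {padded}{status} ({detail})")
    lines.append("")
    verdict = "APPROVED FOR DEPLOYMENT" if approved else "BLOCKED"
    lines.append(f"Compliance Verdict: {verdict}")
    return "\n".join(lines), approved
```

Returning the boolean alongside the text lets the deployment step branch on the verdict without parsing the report.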

20-Item CI/CD Compliance Checklist

Data Governance (Art.10)

Accuracy and Robustness (Art.15)

Logging and Record-Keeping (Art.12)

Human Oversight (Art.14)

Technical Documentation (Art.11 + Annex IV)

What Comes Next in This Series

This post covers the pipeline architecture. The remaining four posts in the series go deeper.

The August 2, 2026 deadline is 61 days away. If your high-risk AI system does not have automated compliance gates in CI/CD, you have two months to build them. That is enough time — if you start now.


sota.io is an EU-native managed PaaS built for teams that need to keep their AI compliance artifacts in EU-sovereign infrastructure. No US parent, no CLOUD Act exposure, no data leaving Hetzner Germany. Your compliance evidence stays where your regulator expects it.
