2026-06-03·5 min read·sota.io Team

EU AI Act CI/CD Compliance Finale: Complete August 2026 Readiness Checklist & Scorecard for High-Risk AI Providers

Post #5/5 in the sota.io EU AI Act Automated CI/CD Compliance Testing Series


August 2, 2026 is 60 days away. For high-risk AI providers and deployers, that deadline marks when EU AI Act compliance obligations become fully enforceable — including Articles 9, 12, 13, 14, and 15, all of which carry specific automated-testing requirements.

Over the past four posts in this series, we built a complete CI/CD compliance testing architecture covering the five core technical obligations. In this finale, we consolidate everything into a master checklist, a readiness scorecard, and an eight-week action plan you can use to close any remaining gaps before the deadline.


Series Recap: What We Built

This five-part series covered the full technical compliance testing pipeline:

| Post | Articles | CI/CD Gate Topic |
|---|---|---|
| #1/5 | All | Automated compliance testing architecture: pipeline overview, tooling selection, gate placement strategy |
| #2/5 | Art.15 | Accuracy & robustness testing gates: threshold enforcement, adversarial input detection, performance drift alerting |
| #3/5 | Art.13 + Art.14 | Transparency + human oversight gates: IFU completeness, output schema validation, stop-function testing, override mechanism coverage |
| #4/5 | Art.12 | Record-keeping gates: log completeness, structured audit trail verification, immutability checks |
| #5/5 (this) | All | Master checklist, readiness scorecard, 8-week action plan |

The Master CI/CD Compliance Checklist

Copy this checklist directly into your sprint backlog. Each item maps to a specific EU AI Act article obligation.

Art.9 — Risk Management System

Art.9 requires a continuous risk management system — not a one-time assessment. Your CI/CD pipeline must verify that risk documentation remains current and that identified risks have corresponding mitigations in place.

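CI/CD Gates Required:

[ ] Risk register TTL check
[ ] Mitigation coverage test per risk
[ ] Pre-deploy capability delta scan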

Tooling options: Great Expectations (data profiles), custom YAML linting, OpenTelemetry metrics assertions

Acceptance threshold: All CRITICAL and HIGH risks mitigated; no undocumented capability changes shipped
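
As an illustration of the "no undocumented capability changes" threshold, here is a minimal sketch of a capability delta scan. It assumes each release ships a capability manifest as a plain dict; the manifest layout, the `risk_ids` key, and the function name `capability_delta` are hypothetical and not prescribed by the Act or by any tool named above.

```python
def capability_delta(previous: dict, current: dict) -> list[str]:
    """Return capabilities that are new or changed but link to no risk entry."""
    undocumented = []
    for name, spec in current.items():
        # A capability counts as changed if it is new or its version differs.
        changed = previous.get(name, {}).get("version") != spec.get("version")
        if changed and not spec.get("risk_ids"):
            undocumented.append(name)
    return sorted(undocumented)


# Hypothetical manifests for the previous and the candidate release.
previous = {"credit_scoring": {"version": "1.2", "risk_ids": ["R-001"]}}
current = {
    "credit_scoring": {"version": "1.3", "risk_ids": ["R-001"]},
    "fraud_flagging": {"version": "1.0", "risk_ids": []},
}

print(capability_delta(previous, current))  # ['fraud_flagging']
```

In a pipeline, a non-empty result would fail the build until the new capability is linked to a risk register entry.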


Art.12 — Record-Keeping

Art.12 requires high-risk AI systems to automatically record events (logs) with enough context to audit each output decision. Logs must be retained and accessible for at least 6 months: providers under Art.19, and deployers under Art.26(6) for logs under their control.

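CI/CD Gates Required:

[ ] Log schema validation
[ ] 6-month retention policy assertion
[ ] Immutability check
[ ] Freshness probe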

Tooling options: Great Expectations, dbt tests, AWS Config rules, custom OpenTelemetry spans

Acceptance threshold: 100% required field completeness on sampled logs; storage TTL ≥ 183 days enforced
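
The acceptance threshold can be sketched as two small checks: field completeness on a sample of log records, and a retention floor. The field names follow the log schema listed in the action plan later in this post; treating a `None` value as missing and the helper names are assumptions made for illustration.

```python
REQUIRED_FIELDS = {
    "decision_id", "timestamp", "model_version",
    "input_hash", "output_value", "confidence_score",
}
MIN_RETENTION_DAYS = 183  # six months, per the threshold above


def incomplete_records(records: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for each sampled record with gaps."""
    problems = []
    for index, record in enumerate(records):
        missing = sorted(f for f in REQUIRED_FIELDS if record.get(f) is None)
        if missing:
            problems.append((index, missing))
    return problems


def retention_ok(ttl_days: int) -> bool:
    """True when the configured storage TTL meets the retention floor."""
    return ttl_days >= MIN_RETENTION_DAYS


# Hypothetical sample: the second record lacks a confidence score.
complete = {field: "x" for field in REQUIRED_FIELDS}
partial = {k: v for k, v in complete.items() if k != "confidence_score"}
print(incomplete_records([complete, partial]))  # [(1, ['confidence_score'])]
```

A gate built on this would fail on any non-empty result, which matches the 100% completeness target.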


Art.13 — Transparency and Information for Deployers

Art.13 requires providers to supply deployers with sufficient information to understand the system's capabilities, limitations, accuracy metrics, and intended use cases. The Instructions for Use (IFU) must remain current as the model evolves.

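CI/CD Gates Required:

[ ] IFU freshness gate
[ ] Required sections completeness check
[ ] Model card sync assertion
[ ] Output schema contract test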

Tooling options: Pydantic schema validation, custom markdown linting, Hugging Face model card validators

Acceptance threshold: IFU updated within 24 hours of any model release; all documented output fields present in actual outputs
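
The second half of that threshold, documented output fields actually appearing in outputs, can be expressed as a contract test. The sketch below uses only the standard library rather than Pydantic; the field list in `DOCUMENTED_OUTPUT_FIELDS` is a hypothetical example of what an IFU might document.

```python
# Hypothetical output contract, as it might be documented in the IFU.
DOCUMENTED_OUTPUT_FIELDS = {
    "prediction": str,
    "confidence_score": float,
    "model_version": str,
}


def contract_violations(output: dict) -> list[str]:
    """List every documented field that is absent or has the wrong type."""
    violations = []
    for field, expected_type in DOCUMENTED_OUTPUT_FIELDS.items():
        if field not in output:
            violations.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            actual = type(output[field]).__name__
            violations.append(f"wrong type for {field}: {actual}")
    return violations


good = {"prediction": "approve", "confidence_score": 0.93, "model_version": "1.4"}
bad = {"prediction": "approve", "confidence_score": "high"}
print(contract_violations(bad))
```

Running this against a handful of real inference responses on every build keeps the documented schema and the shipped schema from drifting apart.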


Art.14 — Human Oversight

Art.14 requires that high-risk AI systems enable appropriate human oversight — including the ability to stop, override, or correct the system at any point. These mechanisms must be functional and tested.

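CI/CD Gates Required:

[ ] Stop-function latency test
[ ] Override mechanism test
[ ] Anomaly alert latency assertion
[ ] Override access control and audit logging test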

Tooling options: Postman/Newman API tests, pytest with mock inference endpoints, k6 for latency assertions

Acceptance threshold: Stop function ≤500ms response; anomaly alerts delivered ≤60s; override access control enforced
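
A stop-function test along these lines might look like the sketch below. `InferenceWorker` is a hypothetical stand-in for a real prediction loop (a real test would call your stop endpoint instead); the point is that the test checks both the 500 ms budget and that no further inference happens after the stop.

```python
import threading
import time


class InferenceWorker:
    """Hypothetical stand-in for a prediction loop that honours a stop signal."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.processed = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            time.sleep(0.01)  # placeholder for one inference step
            self.processed += 1

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> float:
        """Request a stop and return the seconds until the loop has exited."""
        started = time.monotonic()
        self._stop.set()
        self._thread.join(timeout=5)
        return time.monotonic() - started

    def is_running(self) -> bool:
        return self._thread.is_alive()


def test_stop_halts_inference_within_500ms() -> None:
    worker = InferenceWorker()
    worker.start()
    time.sleep(0.05)
    elapsed = worker.stop()
    assert not worker.is_running()
    assert elapsed <= 0.5
    count = worker.processed
    time.sleep(0.05)
    assert worker.processed == count  # nothing was inferred after the stop
```

pytest collects the `test_` function automatically; the final assertion is what distinguishes a real stop from one that only hides the interface.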


Art.15 — Accuracy, Robustness, and Cybersecurity

Art.15 requires high-risk AI systems to achieve appropriate levels of accuracy and be resilient to errors, inconsistencies, and adversarial inputs. These thresholds must be documented and verified continuously.

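CI/CD Gates Required:

[ ] Accuracy threshold gate
[ ] Adversarial input battery
[ ] Consistency test
[ ] Performance drift check
[ ] Cybersecurity scan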

Tooling options: Giskard (AI testing), pytest-benchmark, custom robustness harness, Adversarial Robustness Toolbox (IBM)

Acceptance threshold: All accuracy thresholds met; adversarial error rate within documented bounds; zero unhandled exceptions on malformed inputs
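
Two of these thresholds lend themselves to very small tests. The sketch below shows an accuracy computation to compare against a documented threshold, plus a wrapper that turns malformed inputs into structured rejections rather than unhandled exceptions. `ACCURACY_THRESHOLD`, `toy_model`, and `safe_predict` are illustrative assumptions, not part of any library named above.

```python
ACCURACY_THRESHOLD = 0.90  # hypothetical; use the value in your risk register


def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that equal their label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def toy_model(features: dict) -> int:
    """Hypothetical model: approves when income exceeds a fixed cut-off."""
    return int(features["income"] > 30000)


def safe_predict(model, raw_input) -> dict:
    """Return a structured rejection instead of letting bad input crash."""
    try:
        return {"status": "ok", "prediction": model(raw_input)}
    except (TypeError, ValueError, KeyError) as exc:
        return {"status": "rejected", "reason": type(exc).__name__}


malformed_inputs = [None, {}, {"income": "abc"}]
results = [safe_predict(toy_model, item) for item in malformed_inputs]
print([r["status"] for r in results])  # ['rejected', 'rejected', 'rejected']
```

The accuracy gate itself is then a single assertion such as `assert accuracy(preds, labels) >= ACCURACY_THRESHOLD` over a held-out evaluation set.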


August 2026 Readiness Scorecard

Use this scorecard to assess your current compliance posture and prioritize remediation.

| Obligation / Gate | Pass Criterion | Points | Status |
|---|---|---|---|
| Art.9: Risk register TTL check | Implemented + passing | 10 | [ ] |
| Art.9: Mitigation coverage per risk | Implemented + passing | 10 | [ ] |
| Art.9: Pre-deploy capability delta scan | Implemented + passing | 15 | [ ] |
| Art.12: Log schema validation | Implemented + passing | 15 | [ ] |
| Art.12: 6-month retention enforcement | Policy configured | 10 | [ ] |
| Art.12: Immutability assertion | Implemented + passing | 5 | [ ] |
| Art.13: IFU freshness gate | Implemented + passing | 10 | [ ] |
| Art.13: Required sections check | Implemented + passing | 10 | [ ] |
| Art.13: Model card sync assertion | Implemented + passing | 5 | [ ] |
| Art.14: Stop-function test | Implemented + passing | 15 | [ ] |
| Art.14: Override mechanism test | Implemented + passing | 10 | [ ] |
| Art.14: Anomaly alert test | Implemented + passing | 10 | [ ] |
| Art.14: Override audit logging | Implemented + passing | 5 | [ ] |
| Art.15: Accuracy threshold gate | Implemented + passing | 15 | [ ] |
| Art.15: Adversarial robustness suite | Implemented + passing | 15 | [ ] |
| Art.15: Performance drift monitoring | Implemented + passing | 10 | [ ] |
| Art.15: Cybersecurity scan | Implemented + passing | 15 | [ ] |
| Total | | 185 | |
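
If you would rather compute the score than tick boxes by hand, a small tally script is enough. The point values below are copied from the scorecard rows above; the `score` helper and the idea of feeding it the set of currently passing gates are assumptions for illustration.

```python
# Point values copied from the scorecard rows.
POINTS = {
    "Art.9: Risk register TTL check": 10,
    "Art.9: Mitigation coverage per risk": 10,
    "Art.9: Pre-deploy capability delta scan": 15,
    "Art.12: Log schema validation": 15,
    "Art.12: 6-month retention enforcement": 10,
    "Art.12: Immutability assertion": 5,
    "Art.13: IFU freshness gate": 10,
    "Art.13: Required sections check": 10,
    "Art.13: Model card sync assertion": 5,
    "Art.14: Stop-function test": 15,
    "Art.14: Override mechanism test": 10,
    "Art.14: Anomaly alert test": 10,
    "Art.14: Override audit logging": 5,
    "Art.15: Accuracy threshold gate": 15,
    "Art.15: Adversarial robustness suite": 15,
    "Art.15: Performance drift monitoring": 10,
    "Art.15: Cybersecurity scan": 15,
}


def score(passing: set[str]) -> tuple[int, int]:
    """Return (points earned, points available) for the passing gates."""
    unknown = set(passing) - set(POINTS)
    if unknown:
        raise KeyError(f"unknown gates: {sorted(unknown)}")
    earned = sum(POINTS[gate] for gate in passing)
    return earned, sum(POINTS.values())


earned, available = score({
    "Art.12: Log schema validation",
    "Art.14: Stop-function test",
})
print(f"{earned} of {available} points")
```

Feeding this from CI job results rather than manual input keeps the scorecard honest between reviews.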

Scoring interpretation:


The 8-Week Action Plan (June 3 → August 2, 2026)

Weeks 1–2 (June 3–15): Foundation Gates

Priority 1 — Art.12 logging (highest enforcement visibility): An audit by a national competent authority (NCA) is likely to start with a request for logs. This is the easiest thing to get right and the most damaging to fail on.

  1. Define and document your log schema (decision_id, timestamp, model_version, input_hash, output_value, confidence_score)
  2. Instrument your inference layer to emit structured events per schema
  3. Configure storage with TTL ≥ 183 days + append-only permissions
  4. Write CI/CD log schema validation test — make it a required gate
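
Steps 1 and 2 can be sketched as a single helper that builds one record per inference call with the six fields named in step 1. Hashing a canonical JSON form of the input is one possible way to produce `input_hash` and is an assumption here, as is the function name `build_decision_record`.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def build_decision_record(model_version: str, raw_input: dict,
                          output_value, confidence_score: float) -> dict:
    """Build one structured log record for a single inference call."""
    # Canonical JSON keeps the hash stable regardless of key order.
    canonical = json.dumps(raw_input, sort_keys=True, separators=(",", ":"))
    return {
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "output_value": output_value,
        "confidence_score": confidence_score,
    }


record = build_decision_record("1.4", {"income": 42000, "age": 37}, "approve", 0.93)
print(json.dumps(record, indent=2))
```

Storing a hash rather than the raw input avoids copying personal data into the audit log, although whether a hash alone gives enough context for your audits is a design decision to confirm with your compliance team.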

Priority 2 — Art.14 stop-function (highest legal liability): A non-functional stop button is grounds for immediate enforcement action.

  1. Implement and document emergency stop endpoint
  2. Write pytest test asserting ≤500ms stop response time
  3. Add to required CI/CD gate list

Checkpoint: By June 15, Art.12 logging and Art.14 stop-function gates must be passing in CI.


Weeks 3–4 (June 15–29): Transparency and Documentation

Art.13 IFU refresh:

  1. Audit your current Instructions for Use — do all required sections exist?
  2. Add IFU freshness gate to CI pipeline (check last-modified date vs. last model training run)
  3. Generate or update model card with current performance metrics
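
The freshness comparison in step 2 reduces to a few lines. This sketch assumes you can obtain both timestamps (for example from file metadata and your training tracker) and allows a 24-hour grace window after a training run; the function name and the `now` parameter are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def ifu_is_fresh(ifu_modified: datetime, last_training_run: datetime,
                 now: datetime, grace: timedelta = timedelta(hours=24)) -> bool:
    """True if the IFU postdates the last training run or is still in grace."""
    if ifu_modified >= last_training_run:
        return True
    # IFU is older than the model: tolerate it only inside the grace window.
    return now - last_training_run <= grace


trained = datetime(2026, 6, 10, 9, 0, tzinfo=timezone.utc)
stale_ifu = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)

print(ifu_is_fresh(stale_ifu, trained, now=trained + timedelta(hours=3)))
print(ifu_is_fresh(stale_ifu, trained, now=trained + timedelta(days=2)))
```

Passing `now` explicitly keeps the check deterministic in tests; the CI script would supply the current UTC time.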

Art.9 risk register:

  1. Review risk register currency — is it current with your deployed model?
  2. Add mitigation coverage test: for each HIGH/CRITICAL risk, assert a passing test exists
  3. Implement pre-deploy capability delta scan
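
The mitigation coverage test in step 2 could be sketched as follows. It assumes each risk register entry lists the test IDs that exercise its mitigation and that CI can supply the set of tests that passed; the entry layout and the name `uncovered_risks` are hypothetical.

```python
def uncovered_risks(risks: list[dict], passing_tests: set[str]) -> list[str]:
    """Return IDs of HIGH/CRITICAL risks with no passing mitigation test."""
    gaps = []
    for risk in risks:
        if risk["severity"] not in {"HIGH", "CRITICAL"}:
            continue
        linked = risk.get("mitigation_tests", [])
        if not any(test in passing_tests for test in linked):
            gaps.append(risk["id"])
    return gaps


# Hypothetical register entries and CI results.
register = [
    {"id": "R-001", "severity": "CRITICAL", "mitigation_tests": ["test_stop_function"]},
    {"id": "R-002", "severity": "HIGH", "mitigation_tests": []},
    {"id": "R-003", "severity": "LOW", "mitigation_tests": []},
]
passed = {"test_stop_function", "test_accuracy_threshold"}

print(uncovered_risks(register, passed))  # ['R-002']
```

Low-severity risks are skipped on purpose, mirroring the acceptance threshold that only CRITICAL and HIGH risks must be mitigated.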

Checkpoint: By June 29, IFU is current, model card is published, risk register is < 90 days old, and all Art.9 and Art.13 gates are passing in CI.


Weeks 5–6 (June 29 – July 13): Robustness and Accuracy

Art.15 adversarial testing:

  1. Define accuracy thresholds and document them in risk register
  2. Build adversarial input battery (edge cases, boundary conditions, OOD inputs)
  3. Configure CI/CD accuracy threshold gate — fail build below threshold
  4. Set up performance drift monitoring with ±2% alert threshold
  5. Run OWASP model-threat checklist and remediate findings
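
For step 4, a drift check can be as simple as comparing a recent window against the documented baseline. The sketch assumes the ±2% threshold means two absolute percentage points of accuracy (a relative reading is also possible) and that you log periodic accuracy measurements; `drift_alert` is an illustrative name.

```python
from statistics import mean


def drift_alert(baseline_accuracy: float, recent_accuracies: list[float],
                tolerance: float = 0.02) -> tuple[bool, float]:
    """Return (alert, delta) where delta is recent mean minus baseline."""
    if not recent_accuracies:
        raise ValueError("need at least one recent measurement")
    delta = mean(recent_accuracies) - baseline_accuracy
    # Alert on movement in either direction: a sudden gain can also
    # indicate a data or labelling problem.
    return abs(delta) > tolerance, delta


alert, delta = drift_alert(0.91, [0.88, 0.87, 0.86])
print(alert, round(delta, 3))  # True -0.04
```

Values that sit exactly on the tolerance are sensitive to floating-point rounding, so choose thresholds with a little headroom.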

Checkpoint: By July 13, all Art.15 gates implemented and passing; adversarial error rates within documented bounds.


Weeks 7–8 (July 13 – August 2): Evidence Packaging and Final Checks

Pre-deadline sprint:

  1. Run complete readiness scorecard — target ≥160 points
  2. Package compliance evidence: CI pipeline reports, log retention policy docs, IFU version history, risk register + mitigation test results
  3. Prepare Art.47 EU Declaration of Conformity draft (or verify existing one is current)
  4. Conduct internal tabletop exercise: simulate an NCA audit request and verify you can produce all required evidence within 48 hours
  5. Final full pipeline run — confirm all gates green

Common Mistakes to Avoid

Mistake 1: Treating compliance gates as one-time checks. Compliance gates must run on every build. A gate you wrote in January that silently stops running can leave months of releases unverified before anyone notices. Use CI required-status checks, never optional ones.

Mistake 2: Logging at the wrong granularity. Art.12 requires logging at the individual decision level, not batch summaries or aggregate metrics. Each inference call must produce one log record. Courts and NCAs will check this.

Mistake 3: Confusing Art.12 (provider) with Art.26(6) (deployer). If you are a provider, Art.12 requires you to build the logging into the system. If you are a deployer, Art.26(6) requires you to keep the logs under your control for at least 6 months and make them available upon request. Many organizations are both, so make sure both obligations are met.

Mistake 4: Stop-function that only stops the UI. Art.14 stop capability must halt inference, not just the front-end. Test that your stop endpoint actually terminates the prediction pipeline rather than merely hiding the interface from the user.

Mistake 5: Documentation that doesn't match the deployed model. An IFU written for model v1.2 is non-compliant once v1.4 is the version running in production. Your CI pipeline must version-lock documentation to model artifacts.


The Minimal Viable Compliance Pipeline

If you're behind schedule, this is the absolute minimum viable pipeline to have running by August 2:

# .github/workflows/ai-act-compliance.yml
name: EU AI Act Compliance Gates

on: [push, pull_request]

jobs:
  compliance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Install Python and the test dependencies the gates below rely on.
      # requirements.txt is assumed to list pytest and pytest-timeout
      # (the --timeout flag used below comes from the pytest-timeout plugin).
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -r requirements.txt

      # Art.12: Log schema validation
      - name: Validate log schema
        run: python scripts/validate_log_schema.py --schema schemas/decision_log_v1.json

      # Art.13: IFU freshness check
      - name: Check IFU currency
        run: python scripts/check_ifu_freshness.py --max-age-days 90

      # Art.14: Stop-function response time
      - name: Test stop function
        run: pytest tests/test_stop_function.py --timeout=5

      # Art.15: Accuracy threshold
      - name: Accuracy gate
        run: pytest tests/test_accuracy_threshold.py

      # Art.15: Robustness (adversarial inputs)
      - name: Robustness test suite
        run: pytest tests/test_robustness.py

      # Art.9: Risk register TTL
      - name: Risk register currency
        run: python scripts/check_risk_register.py --max-age-days 90

Each of these scripts can be implemented in under a day. The test assertions are simple — the value is in running them automatically on every build so you have continuous evidence of compliance.


What NCAs Will Look For

Based on current guidance from the EU AI Office and national competent authorities (NCAs), including market surveillance authorities, enforcement actions for high-risk AI systems are expected to focus on:

  1. Evidence of systematic testing — not just assertions that the system is compliant, but automated test results showing continuous verification
  2. Log completeness — auditors will request a random sample of decision logs and check for required fields
  3. Incident response readiness — specifically, whether Art.73 incident reporting procedures are documented and rehearsed
  4. Documentation currency — IFU and technical documentation must be current with the deployed model, not last year's version

The CI/CD gates in this series directly address all four enforcement focus areas.


Series Complete: Your CI/CD Compliance Stack

After five posts, your EU AI Act CI/CD compliance stack should include:

Art.9 layer: Risk register TTL gate → mitigation coverage test → pre-deploy delta scan

Art.12 layer: Log schema validation → retention policy assertion → immutability check → freshness probe

Art.13 layer: IFU freshness gate → section completeness check → model card sync assertion → output schema contract test

Art.14 layer: Stop-function latency test → override mechanism test → anomaly alert latency assertion → access control test

Art.15 layer: Accuracy threshold gate → adversarial input battery → consistency test → performance drift check → cybersecurity scan

When all five layers are running on every build, you have a defensible, auditable compliance posture — one that generates continuous evidence rather than a point-in-time snapshot.

August 2, 2026 is the deadline. Eight weeks is enough time to build this, if you start today.


This post is part of the sota.io EU AI Act CI/CD Compliance Testing series. Deploy your high-risk AI on EU infrastructure with built-in compliance support at sota.io.
