2026-06-03·5 min read·sota.io Team

EU AI Act CI/CD Compliance Finale: Complete August 2026 Readiness Checklist & Scorecard for High-Risk AI Providers

Post #5/5 in the sota.io EU AI Act Automated CI/CD Compliance Testing Series


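August 2, 2026 is 60 days away. For providers and deployers of the high-risk AI systems listed in Annex III, that is the date from which the EU AI Act's high-risk obligations apply, including Articles 9, 12, 13, 14, and 15; high-risk systems covered by the Annex I product legislation follow on August 2, 2027. Each of these five articles sets requirements that automated CI/CD tests can verify.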

Over the past four posts in this series, we built a complete CI/CD compliance testing architecture covering the five core technical obligations. In this finale, we consolidate everything into a master checklist, a readiness scorecard, and an eight-week action plan you can use to close any remaining gaps before the deadline.

Series Recap: What We Built

This five-part series covered the full technical compliance testing pipeline:

Post	Articles	CI/CD Gate Topic
#1/5	All	Automated compliance testing architecture — pipeline overview, tooling selection, gate placement strategy
#2/5	Art.15	Accuracy & robustness testing gates — threshold enforcement, adversarial input detection, performance drift alerting
#3/5	Art.13 + Art.14	Transparency + human oversight gates — IFU completeness, output schema validation, stop-function testing, override mechanism coverage
#4/5	Art.12	Record-keeping gates — log completeness, structured audit trail verification, immutability checks
#5/5 (this)	All	Master checklist, readiness scorecard, 8-week action plan

The Master CI/CD Compliance Checklist

Copy this checklist directly into your sprint backlog. Each item maps to a specific EU AI Act article obligation.

Art.9 — Risk Management System

Art.9 requires a risk management system that runs as a continuous, iterative process throughout the system's lifecycle, not a one-time assessment. Your CI/CD pipeline should verify that risk documentation remains current and that identified risks have corresponding mitigations in place.

CI/CD Gates Required:

Risk register version check: confirm risk_register.yaml last modified date is within rolling 90-day window
Mitigation coverage test: for every risk entry with severity ≥ MEDIUM, verify a corresponding mitigation test exists and passes
Pre-deployment risk delta scan: compare new model artifacts against current risk register — fail build if new capabilities lack documented risks
Post-deployment monitoring assertion: confirm risk monitoring queries are active and returning data (not stale)

Tooling options: Great Expectations (data profiles), custom YAML linting, OpenTelemetry metrics assertions

Acceptance threshold: All CRITICAL and HIGH risks mitigated; no undocumented capability changes shipped
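A minimal Python sketch of the three Art.9 build-time checks follows. It assumes the risk register has already been parsed into `Risk` objects (a hypothetical structure: an ID, a severity, and the names of tests that demonstrate the mitigation), that your CI run can report which tests passed, and that the new model artifact declares its capabilities as a set of strings. All names are illustrative, not part of any tool listed above.

```python
from dataclasses import dataclass, field
from datetime import date

# Severity levels in ascending order; the checklist gates MEDIUM and above.
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


@dataclass
class Risk:
    """One risk-register entry (hypothetical structure)."""
    risk_id: str
    severity: str
    # Names of the tests that demonstrate this risk's mitigation.
    mitigation_tests: list[str] = field(default_factory=list)


def register_is_current(last_modified: date, today: date,
                        max_age_days: int = 90) -> bool:
    """Risk register TTL check: the last edit must fall inside the rolling window."""
    return (today - last_modified).days <= max_age_days


def unmitigated_risks(risks: list[Risk], passing_tests: set[str],
                      min_severity: str = "MEDIUM") -> list[str]:
    """IDs of risks at or above min_severity with no mitigation test, or a failing one."""
    threshold = SEVERITY_ORDER[min_severity]
    return [
        r.risk_id
        for r in risks
        if SEVERITY_ORDER[r.severity] >= threshold
        and (not r.mitigation_tests
             or not all(t in passing_tests for t in r.mitigation_tests))
    ]


def undocumented_capabilities(model_capabilities: set[str],
                              documented_capabilities: set[str]) -> set[str]:
    """Capability delta scan: capabilities the new artifact declares that the register does not cover."""
    return model_capabilities - documented_capabilities
```

A `check_risk_register.py` script built on these functions would exit non-zero when `register_is_current` returns false or when either of the other two functions returns a non-empty result.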

Art.12 — Record-Keeping

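Art.12 requires high-risk AI systems to technically allow the automatic recording of events (logs) over their lifetime, with enough context to trace and audit the system's output decisions. Providers must keep the logs under their control for at least 6 months (Art.19(1)), and deployers carry the same minimum for logs under their control (Art.26(6)).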

CI/CD Gates Required:

Log schema validation: every prediction/decision event emits decision_id, timestamp, model_version, input_hash, output_value, confidence_score
Mandatory field completeness: fail pipeline if any required field is null or missing across sampled log entries
Log retention assertion: verify storage policy enforces minimum 6-month TTL (e.g., S3 lifecycle rules, BigQuery partition expiry)
Immutability check: confirm log storage is append-only (no DELETE or UPDATE permissions on production log store)
Log freshness probe: in staging, verify last log entry is within 60 seconds of system activity

Tooling options: Great Expectations, dbt tests, AWS Config rules, custom OpenTelemetry spans

Acceptance threshold: 100% required field completeness on sampled logs; storage TTL ≥ 183 days enforced
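The completeness and retention gates reduce to two small checks. This sketch assumes log entries have been sampled into Python dicts and that the retention policy is an S3-style lifecycle configuration (the `{"Rules": [...]}` shape returned by boto3's `get_bucket_lifecycle_configuration`). It only inspects `Expiration.Days`, ignores rule filters (so it conservatively flags any enabled short-lived rule in the bucket), and does not cover BigQuery partition expiry or immutability.

```python
# Required fields from the decision-log schema in the checklist above.
REQUIRED_FIELDS = ("decision_id", "timestamp", "model_version",
                   "input_hash", "output_value", "confidence_score")
MIN_RETENTION_DAYS = 183  # roughly 6 months


def incomplete_records(records: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for each sampled record with a null or absent required field."""
    failures = []
    for i, record in enumerate(records):
        # "is None" so legitimate falsy values such as 0 or False still count as present.
        missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
        if missing:
            failures.append((i, missing))
    return failures


def retention_violations(lifecycle: dict,
                         min_days: int = MIN_RETENTION_DAYS) -> list[str]:
    """Flag enabled lifecycle rules that expire objects sooner than min_days.

    Rules without Expiration.Days do not expire objects by age, so they pass.
    """
    violations = []
    for rule in lifecycle.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue
        days = rule.get("Expiration", {}).get("Days")
        if days is not None and days < min_days:
            violations.append(f"{rule.get('ID', '<unnamed>')}: expires after {days} days")
    return violations
```

In CI, both functions returning empty lists would count as a pass; anything else fails the build with a readable reason.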

Art.13 — Transparency and Information for Deployers

Art.13 requires providers to supply deployers with sufficient information to understand the system's capabilities, limitations, accuracy metrics, and intended use cases. The Instructions for Use (IFU) must remain current as the model evolves.

CI/CD Gates Required:

IFU freshness gate: fail build if docs/instructions-for-use.md last-modified date is older than the most recent model training run
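Required section check: verify the IFU contains every section Art.13(3) calls for, including intended purpose, accuracy metrics per use case, known limitations and foreseeable risk circumstances, human oversight measures, and a description of how deployers can collect and interpret the logs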
Model card sync assertion: confirm model_card.json version matches current deployed model artifact version
Output schema documentation test: validate that all output fields documented in IFU match actual model output schema (contract testing)

Tooling options: Pydantic schema validation, custom markdown linting, Hugging Face model card validators

Acceptance threshold: IFU updated within 24 hours of any model release; all documented output fields present in actual outputs
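Both Art.13 documentation gates can be sketched as pure functions. The heading names in `REQUIRED_IFU_SECTIONS` are placeholders for your own IFU template, and where the training-run timestamp comes from (an experiment tracker, a metadata file) is left as an assumption. In CI the file-system modification time reflects checkout time, so a real script would more likely read the IFU's last commit date with `git log -1 --format=%cI -- docs/instructions-for-use.md`.

```python
import re
from datetime import datetime

# Placeholder heading names; adapt them to your own IFU template.
REQUIRED_IFU_SECTIONS = (
    "Intended purpose",
    "Performance metrics",
    "Known limitations",
    "Human oversight",
)

# Markdown ATX headings; a trailing run of "#" is treated as a closing sequence.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t#]*$", re.MULTILINE)


def ifu_is_fresh(ifu_modified: datetime, last_training_run: datetime) -> bool:
    """IFU freshness gate: the IFU must not predate the latest training run."""
    return ifu_modified >= last_training_run


def missing_sections(ifu_markdown: str,
                     required: tuple[str, ...] = REQUIRED_IFU_SECTIONS) -> list[str]:
    """Return required section titles that do not appear as headings (case-insensitive)."""
    headings = {m.group(1).strip().lower() for m in HEADING_RE.finditer(ifu_markdown)}
    return [s for s in required if s.lower() not in headings]
```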

Art.14 — Human Oversight

Art.14 requires that high-risk AI systems enable appropriate human oversight — including the ability to stop, override, or correct the system at any point. These mechanisms must be functional and tested.

CI/CD Gates Required:

Stop-function test: assert that calling the documented emergency stop endpoint returns HTTP 200 and halts inference within 500ms
Override mechanism test: verify that human review queue receives flagged decisions and that approved overrides propagate correctly to downstream systems
Anomaly alert test: confirm that inputs outside the normal distribution trigger alerts to the human oversight channel within 60 seconds
Audit log for overrides: verify that all human override events are logged with operator ID, timestamp, original AI decision, and corrected decision
Role-based access test: assert that only authorized roles can access override controls (call them with a valid token for a role that lacks override rights; the request must return 403)

Tooling options: Postman/Newman API tests, pytest with mock inference endpoints, k6 for latency assertions

Acceptance threshold: Stop function ≤500ms response; anomaly alerts delivered ≤60s; override access control enforced
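The stop-function test has to show two things: the stop call completes within the 500ms budget, and inference actually stops. The sketch below uses a hypothetical in-process `InferenceWorker` in place of a real service so it runs standalone; against a deployed system you would call your documented stop endpoint over HTTP and then confirm that no further predictions are produced.

```python
import threading
import time

STOP_BUDGET_SECONDS = 0.5  # the 500ms threshold from the checklist


class InferenceWorker:
    """Hypothetical stand-in for an inference loop; a real test targets your service."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.predictions = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            time.sleep(0.01)  # simulate one inference step
            self.predictions += 1

    def stop(self, timeout: float) -> bool:
        """Signal the stop, wait for the loop to exit, and report whether it halted in time."""
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


def test_stop_halts_inference_within_budget() -> None:
    worker = InferenceWorker()
    worker.start()
    time.sleep(0.05)  # let a few predictions run first
    started = time.monotonic()
    halted = worker.stop(timeout=STOP_BUDGET_SECONDS)
    elapsed = time.monotonic() - started
    assert halted, "inference loop still running after stop"
    assert elapsed <= STOP_BUDGET_SECONDS
    # The pipeline itself must be halted, not just the UI: the count stays frozen.
    frozen = worker.predictions
    time.sleep(0.05)
    assert worker.predictions == frozen
```

Checking that the prediction count stays frozen after the stop is what separates a real halt from a stop that only hides the interface.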

Art.15 — Accuracy, Robustness, and Cybersecurity

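Art.15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity, and to be resilient to errors, inconsistencies, and adversarial inputs. The declared accuracy levels and metrics must appear in the instructions for use (Art.15(3)), and the pipeline should verify them continuously.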

CI/CD Gates Required:

Accuracy threshold gate: fail build if primary accuracy metric falls below documented minimum threshold (from risk register)
Robustness test suite: run adversarial input battery — edge cases, out-of-distribution inputs, boundary conditions — assert error rate ≤ documented maximum
Consistency test: same input + same context must produce consistent outputs (±defined tolerance) across 100 repeat calls
Performance drift check: compare current accuracy metrics against baseline established at last certification; alert if drift ≥ 2%
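Cybersecurity scan: run threat checks based on the OWASP Machine Learning Security Top 10 (and, for LLM-based systems, the OWASP Top 10 for LLM Applications), covering prompt injection, model extraction probes, and membership inference tests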
Graceful degradation test: verify system returns structured error (not unhandled exception) when input is malformed or adversarial

Tooling options: Giskard (AI testing), pytest-benchmark, custom robustness harness, Adversarial Robustness Toolbox (IBM)

Acceptance threshold: All accuracy thresholds met; adversarial error rate within documented bounds; zero unhandled exceptions on malformed inputs
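Once metrics are computed, the accuracy, drift, and consistency gates are simple comparisons. In this sketch the documented minimum is passed in from wherever you record it, drift is interpreted as an absolute change of 2 percentage points in either direction (an assumption; the checklist's "2%" could also be read as a relative change), and `predict` is any callable that wraps your model.

```python
from typing import Callable, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_gate(acc: float, documented_minimum: float) -> bool:
    """Fail the build when accuracy falls below the documented minimum."""
    return acc >= documented_minimum


def drift_alert(baseline_acc: float, current_acc: float,
                max_drift: float = 0.02) -> bool:
    """Alert when accuracy has moved max_drift or more (absolute) from the certified baseline."""
    return abs(current_acc - baseline_acc) >= max_drift


def is_consistent(predict: Callable[[dict], float], example: dict,
                  repeats: int = 100, tolerance: float = 0.0) -> bool:
    """Call predict repeatedly on the same input; all outputs must lie within tolerance."""
    outputs = [predict(example) for _ in range(repeats)]
    return max(outputs) - min(outputs) <= tolerance
```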

August 2026 Readiness Scorecard

Use this scorecard to assess your current compliance posture and prioritize remediation.

Obligation	Required for points	Points	Status
Art.9: Risk register TTL check	Implemented + passing	10	[ ]
Art.9: Mitigation coverage per risk	Implemented + passing	10	[ ]
Art.9: Pre-deploy capability delta scan	Implemented + passing	15	[ ]
Art.12: Log schema validation	Implemented + passing	15	[ ]
Art.12: 6-month retention enforcement	Policy configured	10	[ ]
Art.12: Immutability assertion	Implemented + passing	5	[ ]
Art.13: IFU freshness gate	Implemented + passing	10	[ ]
Art.13: Required sections check	Implemented + passing	10	[ ]
Art.13: Model card sync assertion	Implemented + passing	5	[ ]
Art.14: Stop-function test	Implemented + passing	15	[ ]
Art.14: Override mechanism test	Implemented + passing	10	[ ]
Art.14: Anomaly alert test	Implemented + passing	10	[ ]
Art.14: Override audit logging	Implemented + passing	5	[ ]
Art.15: Accuracy threshold gate	Implemented + passing	15	[ ]
Art.15: Adversarial robustness suite	Implemented + passing	15	[ ]
Art.15: Performance drift monitoring	Implemented + passing	10	[ ]
Art.15: Cybersecurity scan	Implemented + passing	15	[ ]
Total		185

Scoring interpretation:

160–185 points: High readiness. Focus on documentation and final evidence packaging
120–159 points: Medium readiness — 2–4 weeks of focused sprint work needed
80–119 points: Low readiness. Prioritize Art.14 (stop-function) and Art.12 (logging) immediately; the action plan below treats these two gates as the highest priority
Below 80 points: Critical gap — escalate to CTO immediately; consider engaging specialized EU AI compliance counsel
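Scoring is easy to automate from CI results, and computing the maximum from the Points column avoids maintaining a total by hand. This sketch copies the point values from the table and applies the interpretation bands above; the set of passing gates is assumed to come from your CI run.

```python
# Points per automated gate, copied from the readiness scorecard table.
SCORECARD = {
    "Art.9: Risk register TTL check": 10,
    "Art.9: Mitigation coverage per risk": 10,
    "Art.9: Pre-deploy capability delta scan": 15,
    "Art.12: Log schema validation": 15,
    "Art.12: 6-month retention enforcement": 10,
    "Art.12: Immutability assertion": 5,
    "Art.13: IFU freshness gate": 10,
    "Art.13: Required sections check": 10,
    "Art.13: Model card sync assertion": 5,
    "Art.14: Stop-function test": 15,
    "Art.14: Override mechanism test": 10,
    "Art.14: Anomaly alert test": 10,
    "Art.14: Override audit logging": 5,
    "Art.15: Accuracy threshold gate": 15,
    "Art.15: Adversarial robustness suite": 15,
    "Art.15: Performance drift monitoring": 10,
    "Art.15: Cybersecurity scan": 15,
}

# Lower bound of each interpretation band, highest first.
BANDS = [(160, "High readiness"), (120, "Medium readiness"),
         (80, "Low readiness"), (0, "Critical gap")]


def readiness(passing: set[str]) -> tuple[int, int, str]:
    """Return (score, maximum, band) for the gates that are implemented and passing."""
    unknown = passing - set(SCORECARD)
    if unknown:
        raise ValueError(f"not on the scorecard: {sorted(unknown)}")
    score = sum(SCORECARD[gate] for gate in passing)
    band = next(label for floor, label in BANDS if score >= floor)
    return score, sum(SCORECARD.values()), band
```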

The 8-Week Action Plan (June 3 → August 2, 2026)

Weeks 1–2 (June 3–15): Foundation Gates

Priority 1: Art.12 logging (highest enforcement visibility). National competent authorities (NCAs) are likely to request decision logs early in any audit. Logging is the easiest gate to get right and the most damaging to fail on.

Define and document your log schema (decision_id, timestamp, model_version, input_hash, output_value, confidence_score)
Instrument your inference layer to emit structured events per schema
Configure storage with TTL ≥ 183 days + append-only permissions
Write CI/CD log schema validation test — make it a required gate
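Instrumenting the inference layer mostly means building one record per call with the six schema fields. This sketch assumes JSON-serializable inputs and hashes a canonical JSON encoding with SHA-256 so that identical inputs always produce the same `input_hash`; `emit` simply prints a JSON line where a production system would write to the append-only log store.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def decision_log_record(model_version: str, model_input: dict,
                        output_value, confidence_score: float) -> dict:
    """Build one log record for one inference call, using the six-field schema."""
    # Canonical encoding (sorted keys, no extra whitespace) makes the hash
    # independent of key order in the incoming payload.
    canonical = json.dumps(model_input, sort_keys=True, separators=(",", ":"))
    return {
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "output_value": output_value,
        "confidence_score": confidence_score,
    }


def emit(record: dict) -> str:
    """Serialize the record as one JSON line and print it (stand-in for the log sink)."""
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line
```

Storing a hash rather than the raw input keeps the input itself out of the log, but it also means the input cannot be reconstructed from the log alone; whether that trade-off is acceptable depends on your traceability needs.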

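Priority 2: Art.14 stop-function (highest legal liability). Art.14(4)(e) requires a "stop" button or similar procedure that brings the system to a halt in a safe state, so a stop mechanism that does not work is a direct and easily demonstrated failure of the obligation.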

Implement and document emergency stop endpoint
Write pytest test asserting ≤500ms stop response time
Add to required CI/CD gate list

Checkpoint: By June 15, Art.12 logging and Art.14 stop-function gates must be passing in CI.

Weeks 3–4 (June 15–29): Transparency and Documentation

Art.13 IFU refresh:

Audit your current Instructions for Use — do all required sections exist?
Add IFU freshness gate to CI pipeline (check last-modified date vs. last model training run)
Generate or update model card with current performance metrics

Art.9 risk register:

Review risk register currency — is it current with your deployed model?
Add mitigation coverage test: for each HIGH/CRITICAL risk, assert a passing test exists
Implement pre-deploy capability delta scan

Checkpoint: By June 29, the IFU is current, the model card is published, the risk register is less than 90 days old, and all Art.13 and Art.9 gates are passing in CI.

Weeks 5–6 (June 29 – July 13): Robustness and Accuracy

Art.15 adversarial testing:

Define accuracy thresholds and document them in risk register
Build adversarial input battery (edge cases, boundary conditions, OOD inputs)
Configure CI/CD accuracy threshold gate — fail build below threshold
Set up performance drift monitoring with ±2% alert threshold
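Run threat checks based on the OWASP Machine Learning Security Top 10 (and, for LLM-based systems, the OWASP Top 10 for LLM Applications) and remediate findings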

Checkpoint: By July 13, all Art.15 gates implemented and passing; adversarial error rates within documented bounds.
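The adversarial battery and the graceful-degradation check can share one harness. In this sketch, `structured_predict` is a hypothetical API-boundary wrapper that turns malformed payloads and model failures into structured errors, and `run_battery` scores a list of `(payload, expected)` cases, where `expected=None` means a structured error is the correct outcome. The payload shape `{"features": [...]}` is an assumption.

```python
from typing import Any, Callable

Predict = Callable[[list], Any]


def structured_predict(predict: Predict, payload: Any) -> dict:
    """API-boundary wrapper: bad input yields a structured error, never a raw exception."""
    if not isinstance(payload, dict) or not isinstance(payload.get("features"), list):
        return {"status": "error", "code": "INVALID_INPUT",
                "detail": "expected an object with a 'features' list"}
    try:
        return {"status": "ok", "output": predict(payload["features"])}
    except Exception as exc:  # deliberate catch-all at the service boundary
        return {"status": "error", "code": "PREDICTION_FAILED",
                "detail": f"{type(exc).__name__}: {exc}"}


def run_battery(predict: Predict, cases: list[tuple[Any, Any]],
                max_error_rate: float) -> tuple[float, bool]:
    """Return (error_rate, passed); passed means error_rate <= max_error_rate."""
    if not cases:
        raise ValueError("the adversarial battery is empty")
    errors = 0
    for payload, expected in cases:
        result = structured_predict(predict, payload)
        if expected is None:
            ok = result["status"] == "error"
        else:
            ok = result["status"] == "ok" and result["output"] == expected
        if not ok:
            errors += 1
    error_rate = errors / len(cases)
    return error_rate, error_rate <= max_error_rate
```

Catching `Exception` at the boundary is a deliberate design choice: the gate's goal is that no malformed input surfaces as an unhandled exception, while the error rate still records every case the model gets wrong.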

Weeks 7–8 (July 13 – August 2): Evidence Packaging and Final Checks

Pre-deadline sprint:

Run complete readiness scorecard — target ≥160 points
Package compliance evidence: CI pipeline reports, log retention policy docs, IFU version history, risk register + mitigation test results
Prepare Art.47 EU Declaration of Conformity draft (or verify existing one is current)
Conduct internal tabletop exercise: simulate an NCA audit request and verify you can produce all required evidence within 48 hours
Final full pipeline run — confirm all gates green
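Evidence packaging is mostly file handling, but recording a hash for each artifact lets an auditor confirm that nothing changed after packaging. The manifest format below (path, SHA-256, size, generation time) is an assumption for illustration, not a format any authority prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_evidence_manifest(paths: list[Path], out_file: Path) -> dict:
    """Hash each evidence file and write a JSON manifest next to the package."""
    entries = []
    for path in sorted(paths):
        data = path.read_bytes()
        entries.append({"path": str(path),
                        "sha256": hashlib.sha256(data).hexdigest(),
                        "bytes": len(data)})
    manifest = {"generated_at": datetime.now(timezone.utc).isoformat(),
                "files": entries}
    out_file.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def changed_files(manifest: dict) -> list[str]:
    """Re-hash each listed file; return paths that are missing or no longer match."""
    changed = []
    for entry in manifest["files"]:
        path = Path(entry["path"])
        if (not path.is_file()
                or hashlib.sha256(path.read_bytes()).hexdigest() != entry["sha256"]):
            changed.append(entry["path"])
    return changed
```

Running `changed_files` during the tabletop exercise is a quick way to show that the packaged evidence is exactly what CI produced.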

Common Mistakes to Avoid

Mistake 1: Treating compliance gates as one-time checks

Compliance gates must run on every build. A gate that silently stops running in January and is only noticed in July leaves six months with no evidence of compliance. Make every gate a required status check in CI, never an optional one.

Mistake 2: Logging at the wrong granularity

Art.12 logging exists to make the system's operation traceable, and batch summaries or aggregate metrics cannot trace an individual decision. Log at the individual decision level: each inference call should produce one log record. Auditors sampling your logs are likely to check exactly this.

Mistake 3: Confusing Art.12 (provider) with Art.26(6) (deployer)

If you are a provider, Art.12 requires you to build the logging into the system, and Art.19 requires you to keep the logs under your control for at least 6 months. If you are a deployer, Art.26(6) requires you to keep the logs under your control for at least 6 months and be able to produce them on request. Many organizations are both, so make sure both sets of obligations are met.

Mistake 4: Stop-function that only stops the UI

Art.14 stop capability must halt inference, not just the front end. Test that your stop endpoint actually terminates the prediction pipeline rather than merely hiding the interface from the user.

Mistake 5: Documentation that doesn't match the deployed model

An IFU written for model v1.2 while v1.4 is running in production is non-compliant. Your CI pipeline must version-lock documentation to model artifacts.
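One way to version-lock documentation is a gate that compares each document's declared version with the artifact being deployed. This sketch assumes `model_card.json` carries a `model_version` key and that the IFU states its version on a line of the form `Model version: 1.4.0`; both are hypothetical conventions you would replace with your own.

```python
import json
import re

# Hypothetical convention: the IFU declares "Model version: <version>" on its own line.
IFU_VERSION_RE = re.compile(r"^Model version:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def documentation_version_errors(model_card_json: str, ifu_text: str,
                                 artifact_version: str) -> list[str]:
    """Return one message per document whose declared version differs from the artifact."""
    errors = []
    card_version = json.loads(model_card_json).get("model_version")
    if card_version != artifact_version:
        errors.append(f"model_card.json describes {card_version!r}, "
                      f"artifact is {artifact_version!r}")
    match = IFU_VERSION_RE.search(ifu_text)
    ifu_version = match.group(1) if match else None
    if ifu_version != artifact_version:
        errors.append(f"IFU describes {ifu_version!r}, artifact is {artifact_version!r}")
    return errors
```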

The Minimal Viable Compliance Pipeline

If you're behind schedule, this is the absolute minimum viable pipeline to have running by August 2:

# .github/workflows/ai-act-compliance.yml
name: EU AI Act Compliance Gates

on: [push, pull_request]

jobs:
  compliance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      # pytest-timeout provides the --timeout flag used below;
      # requirements.txt is assumed to list the scripts' own dependencies
      - name: Install dependencies
        run: pip install -r requirements.txt pytest pytest-timeout

      # Art.12: Log schema validation
      - name: Validate log schema
        run: python scripts/validate_log_schema.py --schema schemas/decision_log_v1.json

      # Art.13: IFU freshness check
      - name: Check IFU currency
        run: python scripts/check_ifu_freshness.py --max-age-days 90

      # Art.14: Stop-function response time
      - name: Test stop function
        run: pytest tests/test_stop_function.py --timeout=5

      # Art.15: Accuracy threshold
      - name: Accuracy gate
        run: pytest tests/test_accuracy_threshold.py

      # Art.15: Robustness (adversarial inputs)
      - name: Robustness test suite
        run: pytest tests/test_robustness.py

      # Art.9: Risk register TTL
      - name: Risk register currency
        run: python scripts/check_risk_register.py --max-age-days 90

Each of these scripts can be implemented in under a day. The test assertions are simple — the value is in running them automatically on every build so you have continuous evidence of compliance.

What NCAs Will Look For

Based on the Act's text and the guidance published so far by the EU AI Office and national market surveillance authorities, enforcement actions for high-risk AI systems are expected to focus on:

Evidence of systematic testing — not just assertions that the system is compliant, but automated test results showing continuous verification
Log completeness — auditors will request a random sample of decision logs and check for required fields
Incident response readiness — specifically, whether Art.73 incident reporting procedures are documented and rehearsed
Documentation currency — IFU and technical documentation must be current with the deployed model, not last year's version

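The CI/CD gates in this series directly address systematic testing, log completeness, and documentation currency. Art.73 incident-response readiness also needs documented procedures that are rehearsed, such as the tabletop exercise in weeks 7–8.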

Series Complete: Your CI/CD Compliance Stack

After five posts, your EU AI Act CI/CD compliance stack should include:

Art.9 layer: Risk register TTL gate → mitigation coverage test → pre-deploy delta scan

Art.12 layer: Log schema validation → retention policy assertion → immutability check → freshness probe

Art.13 layer: IFU freshness gate → section completeness check → model card sync assertion → output schema contract test

Art.14 layer: Stop-function latency test → override mechanism test → anomaly alert latency assertion → access control test

Art.15 layer: Accuracy threshold gate → adversarial input battery → consistency test → performance drift check → cybersecurity scan

When all five layers are running on every build, you have a defensible, auditable compliance posture — one that generates continuous evidence rather than a point-in-time snapshot.

August 2, 2026 is the deadline. Eight weeks is enough time to build this, if you start today.

This post is part of the sota.io EU AI Act CI/CD Compliance Testing series. Deploy your high-risk AI on EU infrastructure with built-in compliance support at sota.io.

