EU AI Act CI/CD Compliance Finale: Complete August 2026 Readiness Checklist & Scorecard for High-Risk AI Providers
Post #5/5 in the sota.io EU AI Act Automated CI/CD Compliance Testing Series
August 2, 2026 is 60 days away. For high-risk AI providers and deployers, that deadline marks when EU AI Act compliance obligations become fully enforceable, including those in Articles 9, 12, 13, 14, and 15. The Act does not prescribe automated tests, but each of these obligations can be verified continuously with CI/CD gates.
Over the past four posts in this series, we built a complete CI/CD compliance testing architecture covering the five core technical obligations. In this finale, we consolidate everything into a master checklist, a readiness scorecard, and an eight-week action plan you can use to close any remaining gaps before the deadline.
Series Recap: What We Built
This five-part series covered the full technical compliance testing pipeline:
| Post | Articles | CI/CD Gate Topic |
|---|---|---|
| #1/5 | All | Automated compliance testing architecture — pipeline overview, tooling selection, gate placement strategy |
| #2/5 | Art.15 | Accuracy & robustness testing gates — threshold enforcement, adversarial input detection, performance drift alerting |
| #3/5 | Art.13 + Art.14 | Transparency + human oversight gates — IFU completeness, output schema validation, stop-function testing, override mechanism coverage |
| #4/5 | Art.12 | Record-keeping gates — log completeness, structured audit trail verification, immutability checks |
| #5/5 (this) | All | Master checklist, readiness scorecard, 8-week action plan |
The Master CI/CD Compliance Checklist
Copy this checklist directly into your sprint backlog. Each item maps to a specific EU AI Act article obligation.
Art.9 — Risk Management System
Art.9 requires a continuous risk management system — not a one-time assessment. Your CI/CD pipeline must verify that risk documentation remains current and that identified risks have corresponding mitigations in place.
CI/CD Gates Required:
- Risk register version check: confirm risk_register.yaml last-modified date is within a rolling 90-day window
- Mitigation coverage test: for every risk entry with severity ≥ MEDIUM, verify a corresponding mitigation test exists and passes
- Pre-deployment risk delta scan: compare new model artifacts against current risk register — fail build if new capabilities lack documented risks
- Post-deployment monitoring assertion: confirm risk monitoring queries are active and returning data (not stale)
Tooling options: Great Expectations (data profiles), custom YAML linting, OpenTelemetry metrics assertions
Acceptance threshold: All CRITICAL and HIGH risks mitigated; no undocumented capability changes shipped
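The register freshness and mitigation coverage gates can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the register layout (last_reviewed, risks, severity, mitigation_test) and the helper name check_risk_register are assumptions made for the example, and the register is passed in as an already-parsed dictionary instead of being read from risk_register.yaml.

```python
from datetime import date

# Hypothetical severity ordering; adapt it to your own register's vocabulary.
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def check_risk_register(register, passing_tests, today, max_age_days=90):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []

    # Gate 1: the register must have been reviewed inside the rolling window.
    age_days = (today - register["last_reviewed"]).days
    if age_days > max_age_days:
        failures.append(f"risk register is {age_days} days old (max {max_age_days})")

    # Gate 2: every risk at MEDIUM or above needs a mitigation test that passes.
    for risk in register["risks"]:
        if SEVERITY_RANK[risk["severity"]] < SEVERITY_RANK["MEDIUM"]:
            continue
        test_name = risk.get("mitigation_test")
        if not test_name:
            failures.append(f"{risk['id']}: no mitigation test declared")
        elif test_name not in passing_tests:
            failures.append(f"{risk['id']}: mitigation test {test_name} is not passing")

    return failures


# Example register with one covered risk and one uncovered risk.
example_register = {
    "last_reviewed": date(2026, 5, 1),
    "risks": [
        {"id": "R1", "severity": "HIGH", "mitigation_test": "test_bias_bounds"},
        {"id": "R2", "severity": "MEDIUM"},
        {"id": "R3", "severity": "LOW"},
    ],
}
print(check_risk_register(example_register, {"test_bias_bounds"}, today=date(2026, 6, 3)))
```

In CI, a wrapper script would load the YAML file, collect the names of passing tests from the test report, and exit non-zero when the returned list is not empty.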
Art.12 — Record-Keeping
Art.12 requires high-risk AI systems to automatically record events (logs) with sufficient context to audit individual output decisions. Logs must be retained and accessible for at least 6 months (deployer obligation under Art.26(6)).
CI/CD Gates Required:
- Log schema validation: every prediction/decision event emits decision_id, timestamp, model_version, input_hash, output_value, confidence_score
- Mandatory field completeness: fail pipeline if any required field is null or missing across sampled log entries
- Log retention assertion: verify storage policy enforces minimum 6-month TTL (e.g., S3 lifecycle rules, BigQuery partition expiry)
- Immutability check: confirm log storage is append-only (no DELETE or UPDATE permissions on production log store)
- Log freshness probe: in staging, verify last log entry is within 60 seconds of system activity
Tooling options: Great Expectations, dbt tests, AWS Config rules, custom OpenTelemetry spans
Acceptance threshold: 100% required field completeness on sampled logs; storage TTL ≥ 183 days enforced
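As a rough sketch of the field completeness gate, the function below scans sampled log entries and reports every record with a missing or null required field. It assumes the logs are stored as JSON Lines (one JSON object per line), which the series does not prescribe; the function name find_incomplete_records is likewise made up for this example.

```python
import json

# Required fields, taken from the log schema listed in this section.
REQUIRED_FIELDS = (
    "decision_id",
    "timestamp",
    "model_version",
    "input_hash",
    "output_value",
    "confidence_score",
)


def find_incomplete_records(log_lines):
    """Return (line_number, problems) for every sampled record that fails the gate."""
    problems = []
    for line_number, raw in enumerate(log_lines, start=1):
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            problems.append((line_number, ["<unparseable JSON>"]))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, ["<record is not a JSON object>"]))
            continue
        # A field counts as missing when it is absent or explicitly null.
        missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
        if missing:
            problems.append((line_number, missing))
    return problems
```

A gate script would sample lines from the staging log store, call this function, and fail the build if the result is non-empty, which matches the 100% completeness threshold above.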
Art.13 — Transparency and Information for Deployers
Art.13 requires providers to supply deployers with sufficient information to understand the system's capabilities, limitations, accuracy metrics, and intended use cases. The Instructions for Use (IFU) must remain current as the model evolves.
CI/CD Gates Required:
- IFU freshness gate: fail build if docs/instructions-for-use.md last-modified date is older than the most recent model training run
- Required section check: verify IFU contains all mandatory sections (intended purpose, performance metrics per use-case, known limitations, contraindications, human oversight requirements)
- Model card sync assertion: confirm model_card.json version matches current deployed model artifact version
- Output schema documentation test: validate that all output fields documented in IFU match actual model output schema (contract testing)
Tooling options: Pydantic schema validation, custom markdown linting, Hugging Face model card validators
Acceptance threshold: IFU updated within 24 hours of any model release; all documented output fields present in actual outputs
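The required section check and the output schema contract test might look roughly like this. The sketch assumes the IFU is a markdown file whose sections start with # headings, and that section titles contain the phrases listed in REQUIRED_SECTIONS; both are assumptions for illustration, as are the function names.

```python
import re

# Phrases a heading must contain, derived from the mandatory sections above.
REQUIRED_SECTIONS = (
    "intended purpose",
    "performance metrics",
    "known limitations",
    "contraindications",
    "human oversight",
)

HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_ifu_sections(ifu_markdown):
    """Return the required section phrases that no heading in the IFU contains."""
    headings = [match.group(1).lower() for match in HEADING.finditer(ifu_markdown)]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(section in heading for heading in headings)
    ]


def compare_output_contract(documented_fields, actual_output):
    """Compare the fields documented in the IFU with a real model output."""
    documented = set(documented_fields)
    actual = set(actual_output)
    return {
        "documented_but_absent": sorted(documented - actual),
        "emitted_but_undocumented": sorted(actual - documented),
    }
```

Both helpers return empty collections when the documentation is in sync, so a gate can fail the build on any non-empty result.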
Art.14 — Human Oversight
Art.14 requires that high-risk AI systems enable appropriate human oversight — including the ability to stop, override, or correct the system at any point. These mechanisms must be functional and tested.
CI/CD Gates Required:
- Stop-function test: assert that calling the documented emergency stop endpoint returns HTTP 200 and halts inference within 500ms
- Override mechanism test: verify that human review queue receives flagged decisions and that approved overrides propagate correctly to downstream systems
- Anomaly alert test: confirm that inputs outside the normal distribution trigger alerts to the human oversight channel within 60 seconds
- Audit log for overrides: verify that all human override events are logged with operator ID, timestamp, original AI decision, and corrected decision
- Role-based access test: assert that only authorized roles can access override controls (test with unauthorized token — must return 403)
Tooling options: Postman/Newman API tests, pytest with mock inference endpoints, k6 for latency assertions
Acceptance threshold: Stop function ≤500ms response; anomaly alerts delivered ≤60s; override access control enforced
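To make the stop-function and access control gates concrete, here is a self-contained sketch in pytest style. InferenceService is a hypothetical in-memory stand-in so the example runs without a deployed system; a real test would call your documented stop endpoint over HTTP and assert on the status code and latency in the same way. The role name oversight_operator is also an assumption.

```python
import time


class InferenceService:
    """Hypothetical stand-in for a real inference service."""

    AUTHORIZED_ROLES = {"oversight_operator"}

    def __init__(self):
        self.stopped = False

    def predict(self, features):
        # The stop flag is checked in the prediction path itself, not in a UI.
        if self.stopped:
            raise RuntimeError("inference halted by human oversight")
        return sum(features)

    def emergency_stop(self, role):
        # Mirrors the HTTP contract from the checklist: 403 or 200.
        if role not in self.AUTHORIZED_ROLES:
            return 403
        self.stopped = True
        return 200


def test_stop_halts_inference_within_budget(budget_seconds=0.5):
    service = InferenceService()
    service.predict([1, 2, 3])  # sanity check: inference works before the stop

    start = time.perf_counter()
    status = service.emergency_stop(role="oversight_operator")
    elapsed = time.perf_counter() - start

    assert status == 200
    assert elapsed <= budget_seconds
    # The key assertion: the prediction path itself must refuse to run.
    try:
        service.predict([1, 2, 3])
    except RuntimeError:
        pass
    else:
        raise AssertionError("inference still running after emergency stop")


def test_unauthorized_role_cannot_stop():
    service = InferenceService()
    assert service.emergency_stop(role="viewer") == 403
    assert service.stopped is False
```

The design point is that the test asserts on inference behavior after the stop, not only on the response code, so a stop that merely returns 200 without halting predictions fails the gate.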
Art.15 — Accuracy, Robustness, and Cybersecurity
Art.15 requires high-risk AI systems to achieve appropriate levels of accuracy and be resilient to errors, inconsistencies, and adversarial inputs. These thresholds must be documented and verified continuously.
CI/CD Gates Required:
- Accuracy threshold gate: fail build if primary accuracy metric falls below documented minimum threshold (from risk register)
- Robustness test suite: run adversarial input battery — edge cases, out-of-distribution inputs, boundary conditions — assert error rate ≤ documented maximum
- Consistency test: same input + same context must produce consistent outputs (±defined tolerance) across 100 repeat calls
- Performance drift check: compare current accuracy metrics against baseline established at last certification; alert if drift ≥ 2%
- Cybersecurity scan: run OWASP model-threat checklist — prompt injection, model extraction probes, membership inference tests
- Graceful degradation test: verify system returns structured error (not unhandled exception) when input is malformed or adversarial
Tooling options: Giskard (AI testing), pytest-benchmark, custom robustness harness, Adversarial Robustness Toolbox (IBM)
Acceptance threshold: All accuracy thresholds met; adversarial error rate within documented bounds; zero unhandled exceptions on malformed inputs
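The accuracy, drift, and consistency gates reduce to small comparisons once the metrics are available. The sketch below is illustrative only: it assumes a classification task scored by plain accuracy, treats the 2% drift limit as absolute percentage points (the checklist does not say whether it is absolute or relative), and assumes the model returns a single numeric score for the consistency check.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_gate(current, documented_minimum):
    """True when the build may proceed."""
    return current >= documented_minimum


def drift_exceeded(current, baseline, max_drift=0.02):
    """True when the metric moved by at least max_drift from the certified baseline."""
    return abs(current - baseline) >= max_drift


def is_consistent(model, features, repeats=100, tolerance=1e-6):
    """Call the model repeatedly with the same input and compare the spread of outputs."""
    outputs = [model(features) for _ in range(repeats)]
    return max(outputs) - min(outputs) <= tolerance
```

In a pipeline, documented_minimum and baseline would be read from the risk register and the last certification report rather than hard-coded in the test.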
August 2026 Readiness Scorecard
Use this scorecard to assess your current compliance posture and prioritize remediation.
| Gate | Criterion | Points | Status |
|---|---|---|---|
| Art.9: Risk register TTL check | Implemented + passing | 10 | [ ] |
| Art.9: Mitigation coverage per risk | Implemented + passing | 10 | [ ] |
| Art.9: Pre-deploy capability delta scan | Implemented + passing | 15 | [ ] |
| Art.12: Log schema validation | Implemented + passing | 15 | [ ] |
| Art.12: 6-month retention enforcement | Policy configured | 10 | [ ] |
| Art.12: Immutability assertion | Implemented + passing | 5 | [ ] |
| Art.13: IFU freshness gate | Implemented + passing | 10 | [ ] |
| Art.13: Required sections check | Implemented + passing | 10 | [ ] |
| Art.13: Model card sync assertion | Implemented + passing | 5 | [ ] |
| Art.14: Stop-function test | Implemented + passing | 15 | [ ] |
| Art.14: Override mechanism test | Implemented + passing | 10 | [ ] |
| Art.14: Anomaly alert test | Implemented + passing | 10 | [ ] |
| Art.14: Override audit logging | Implemented + passing | 5 | [ ] |
| Art.15: Accuracy threshold gate | Implemented + passing | 15 | [ ] |
| Art.15: Adversarial robustness suite | Implemented + passing | 15 | [ ] |
| Art.15: Performance drift monitoring | Implemented + passing | 10 | [ ] |
| Art.15: Cybersecurity scan | Implemented + passing | 15 | [ ] |
| Total | | 185 | |
Scoring interpretation:
- 160–185 points: High readiness; focus on documentation and final evidence packaging
- 120–159 points: Medium readiness — 2–4 weeks of focused sprint work needed
- 80–119 points: Low readiness — prioritize Art.14 (stop-function) and Art.12 (logging) immediately; these are highest enforcement priority
- Below 80 points: Critical gap — escalate to CTO immediately; consider engaging specialized EU AI compliance counsel
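If you would rather compute the score than tally it by hand, the arithmetic can be sketched as follows. The point values are copied from the per-gate rows of the scorecard table and the bands from the scoring interpretation; the function names are made up for this example.

```python
# Points per gate, copied row by row from the scorecard table.
SCORECARD = {
    "Art.9: Risk register TTL check": 10,
    "Art.9: Mitigation coverage per risk": 10,
    "Art.9: Pre-deploy capability delta scan": 15,
    "Art.12: Log schema validation": 15,
    "Art.12: 6-month retention enforcement": 10,
    "Art.12: Immutability assertion": 5,
    "Art.13: IFU freshness gate": 10,
    "Art.13: Required sections check": 10,
    "Art.13: Model card sync assertion": 5,
    "Art.14: Stop-function test": 15,
    "Art.14: Override mechanism test": 10,
    "Art.14: Anomaly alert test": 10,
    "Art.14: Override audit logging": 5,
    "Art.15: Accuracy threshold gate": 15,
    "Art.15: Adversarial robustness suite": 15,
    "Art.15: Performance drift monitoring": 10,
    "Art.15: Cybersecurity scan": 15,
}


def total_points(passing_gates):
    """Sum the points for the gates that are implemented and passing."""
    passing = set(passing_gates)
    unknown = passing - set(SCORECARD)
    if unknown:
        raise KeyError(f"unknown gates: {sorted(unknown)}")
    return sum(SCORECARD[gate] for gate in passing)


def readiness_band(points):
    """Map a score to the readiness bands from the scoring interpretation."""
    if points >= 160:
        return "High readiness"
    if points >= 120:
        return "Medium readiness"
    if points >= 80:
        return "Low readiness"
    return "Critical gap"
```

For example, a team passing every gate except the cybersecurity scan and the immutability assertion scores 20 points below the maximum and still lands in the high readiness band.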
The 8-Week Action Plan (June 3 → August 2, 2026)
Weeks 1–2 (June 3–15): Foundation Gates
Priority 1: Art.12 logging (highest enforcement visibility). Expect an audit by a national competent authority (NCA) to start with a request for logs. This is the easiest thing to get right and the most damaging to fail on.
- Define and document your log schema (decision_id, timestamp, model_version, input_hash, output_value, confidence_score)
- Instrument your inference layer to emit structured events per schema
- Configure storage with TTL ≥ 183 days + append-only permissions
- Write CI/CD log schema validation test — make it a required gate
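One possible shape for the instrumentation step is a thin wrapper that builds one structured record per inference call. This is a sketch under assumptions: the model is any callable returning an (output_value, confidence_score) pair, features are JSON-serializable, input_hash is a SHA-256 over a canonical JSON encoding of the input, and emit is whatever function ships a line to your log store. None of these choices come from the Act or from earlier posts in the series.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def build_decision_record(model_version, features, output_value, confidence_score):
    """Build one log record for one inference call, following the schema above."""
    # Canonical encoding so the same input always hashes to the same value.
    canonical_input = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return {
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_hash": hashlib.sha256(canonical_input.encode("utf-8")).hexdigest(),
        "output_value": output_value,
        "confidence_score": confidence_score,
    }


def logged_predict(model, model_version, features, emit):
    """Run the model and emit exactly one JSON log line for the decision."""
    output_value, confidence_score = model(features)
    record = build_decision_record(model_version, features, output_value, confidence_score)
    emit(json.dumps(record))
    return output_value
```

Hashing the input instead of storing it keeps personal data out of the log while still letting an auditor confirm which input produced a given decision, provided the original input is retained elsewhere under its own retention policy.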
Priority 2: Art.14 stop-function (highest legal liability). A stop control that does not actually halt the system is an obvious target for enforcement action.
- Implement and document emergency stop endpoint
- Write pytest test asserting ≤500ms stop response time
- Add to required CI/CD gate list
Checkpoint: By June 15, Art.12 logging and Art.14 stop-function gates must be passing in CI.
Weeks 3–4 (June 15–29): Transparency and Documentation
Art.13 IFU refresh:
- Audit your current Instructions for Use — do all required sections exist?
- Add IFU freshness gate to CI pipeline (check last-modified date vs. last model training run)
- Generate or update model card with current performance metrics
Art.9 risk register:
- Review risk register currency — is it current with your deployed model?
- Add mitigation coverage test: for each HIGH/CRITICAL risk, assert a passing test exists
- Implement pre-deploy capability delta scan
Checkpoint: By June 29, IFU is current, model card is published, risk register is < 90 days old, and all Art.13 gates are passing in CI.
Weeks 5–6 (June 29 – July 13): Robustness and Accuracy
Art.15 adversarial testing:
- Define accuracy thresholds and document them in risk register
- Build adversarial input battery (edge cases, boundary conditions, OOD inputs)
- Configure CI/CD accuracy threshold gate — fail build below threshold
- Set up performance drift monitoring with ±2% alert threshold
- Run OWASP model-threat checklist and remediate findings
Checkpoint: By July 13, all Art.15 gates implemented and passing; adversarial error rates within documented bounds.
Weeks 7–8 (July 13 – August 2): Evidence Packaging and Final Checks
Pre-deadline sprint:
- Run complete readiness scorecard — target ≥160 points
- Package compliance evidence: CI pipeline reports, log retention policy docs, IFU version history, risk register + mitigation test results
- Prepare Art.47 EU Declaration of Conformity draft (or verify existing one is current)
- Conduct internal tabletop exercise: simulate an NCA audit request and verify you can produce all required evidence within 48 hours
- Final full pipeline run — confirm all gates green
Common Mistakes to Avoid
Mistake 1: Treating compliance gates as one-time checks
Compliance gates must run on every build. A check that passed once in January says nothing about a regression introduced in February; by July you could have been non-compliant for months without knowing it. Make the gates required status checks in CI, never optional ones.
Mistake 2: Logging at the wrong granularity
Art.12 requires logging at the individual decision level, not batch summaries or aggregate metrics. Each inference call must produce one log record. Courts and NCAs will check this.
Mistake 3: Confusing Art.12 (provider) with Art.26(6) (deployer)
If you are a provider, Art.12 requires you to build the logging into the system. If you are a deployer, Art.26(6) requires you to keep those logs for at least 6 months and make them available upon request. Many organizations are both, so make sure both obligations are met.
Mistake 4: Stop-function that only stops the UI
Art.14 stop capability must halt inference, not just the front-end. Test that your stop endpoint actually terminates the prediction pipeline, not just hides the interface from the user.
Mistake 5: Documentation that doesn't match deployed model
An IFU for model v1.2 that describes a model now running v1.4 is non-compliant. Your CI pipeline must version-lock documentation to model artifacts.
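A version lock between documentation and model artifacts can be as simple as comparing three strings. The sketch below assumes two conventions that you would need to adopt yourself: the IFU carries a line of the form "Model version: X", and the parsed model_card.json has a top-level "version" key. The function name version_mismatches is also hypothetical.

```python
import re

# Hypothetical convention: the IFU contains a line such as "Model version: 1.4".
VERSION_LINE = re.compile(
    r"^Model version:[ \t]*(\S+)[ \t]*$", re.MULTILINE | re.IGNORECASE
)


def version_mismatches(deployed_version, ifu_text, model_card):
    """Return a message for every document that describes a different model version."""
    problems = []

    match = VERSION_LINE.search(ifu_text)
    ifu_version = match.group(1) if match else None
    if ifu_version != deployed_version:
        problems.append(
            f"IFU describes {ifu_version}, deployed model is {deployed_version}"
        )

    card_version = model_card.get("version")
    if card_version != deployed_version:
        problems.append(
            f"model card describes {card_version}, deployed model is {deployed_version}"
        )

    return problems
```

Run against the v1.2 and v1.4 situation described in Mistake 5, the function reports the IFU as stale, and the build should fail until the documentation is regenerated.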
The Minimal Viable Compliance Pipeline
If you're behind schedule, this is the absolute minimum viable pipeline to have running by August 2:
# .github/workflows/ai-act-compliance.yml
name: EU AI Act Compliance Gates
on: [push, pull_request]
jobs:
  compliance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumption: Python 3.12; pin whichever version your project uses
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # The --timeout flag below is provided by the pytest-timeout plugin
      - name: Install test dependencies
        run: pip install pytest pytest-timeout
      # Art.12: Log schema validation
      - name: Validate log schema
        run: python scripts/validate_log_schema.py --schema schemas/decision_log_v1.json
      # Art.13: IFU freshness check
      - name: Check IFU currency
        run: python scripts/check_ifu_freshness.py --max-age-days 90
      # Art.14: Stop-function response time
      - name: Test stop function
        run: pytest tests/test_stop_function.py --timeout=5
      # Art.15: Accuracy threshold
      - name: Accuracy gate
        run: pytest tests/test_accuracy_threshold.py
      # Art.15: Robustness (adversarial inputs)
      - name: Robustness test suite
        run: pytest tests/test_robustness.py
      # Art.9: Risk register TTL
      - name: Risk register currency
        run: python scripts/check_risk_register.py --max-age-days 90
Each of these scripts can be implemented in under a day. The test assertions are simple — the value is in running them automatically on every build so you have continuous evidence of compliance.
What NCAs Will Look For
Based on current NCA guidance from the EU AI Office and national market surveillance authorities, enforcement actions for high-risk AI systems are expected to focus on:
- Evidence of systematic testing — not just assertions that the system is compliant, but automated test results showing continuous verification
- Log completeness — auditors will request a random sample of decision logs and check for required fields
- Incident response readiness — specifically, whether Art.73 incident reporting procedures are documented and rehearsed
- Documentation currency — IFU and technical documentation must be current with the deployed model, not last year's version
The CI/CD gates in this series directly address all four enforcement focus areas.
Series Complete: Your CI/CD Compliance Stack
After five posts, your EU AI Act CI/CD compliance stack should include:
Art.9 layer: Risk register TTL gate → mitigation coverage test → pre-deploy delta scan
Art.12 layer: Log schema validation → retention policy assertion → immutability check → freshness probe
Art.13 layer: IFU freshness gate → section completeness check → model card sync assertion → output schema contract test
Art.14 layer: Stop-function latency test → override mechanism test → anomaly alert latency assertion → access control test
Art.15 layer: Accuracy threshold gate → adversarial input battery → consistency test → performance drift check → cybersecurity scan
When all five layers are running on every build, you have a defensible, auditable compliance posture — one that generates continuous evidence rather than a point-in-time snapshot.
August 2, 2026 is the deadline. Eight weeks is enough time to build this, if you start today.
This post is part of the sota.io EU AI Act CI/CD Compliance Testing series. Deploy your high-risk AI on EU infrastructure with built-in compliance support at sota.io.