2026-06-03·5 min read·sota.io Team

EU AI Act Art.12 & Art.19 CI/CD Gates: Automating Record-Keeping and Audit Trail Verification for High-Risk AI 2026

Post #1470 — EU AI Act CICD Compliance Testing Series #4/5


In Part 1 of this series we scaffolded the pipeline. In Part 2 we automated Art.15 accuracy gates. In Part 3 we tested transparency and human oversight. Now in Part 4 we tackle the two articles that regulators will reach for first when investigating a high-risk AI incident: Art.12 (Record-keeping) and Art.19 (Automatically generated logs).

Logs are your alibi. If your high-risk AI system processes a decision that is later disputed (a credit denial, a medical triage flag, a recruitment filter), the first question any regulator or affected person will ask is: what did the system actually do, and what inputs led to that output? Art.12 requires your system to be technically capable of answering that question automatically. Art.19 requires providers to keep those logs for at least six months, and Art.26(6) places the same six-month floor on deployers.

This part of the series shows you how to gate every deployment on those two obligations.


What Art.12 Actually Requires

EU AI Act Art.12 — Record-keeping imposes a technical capability requirement on providers of high-risk AI systems. The system itself must be engineered to generate logs automatically, throughout its lifetime. Three points from the article text that have direct engineering implications:

Automatic recording is mandatory. The logging must happen without human initiation. A system that produces logs only when an operator presses "export" does not satisfy Art.12. Event capture must be a built-in function of the system, not an operational process around it.
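
One way to make event capture a built-in function is to wrap the inference entry point so that calling it is what produces the log entry. The sketch below is illustrative only: `logged_inference`, `LOG_SINK`, and the stand-in `process_input` are hypothetical names, and the in-memory list stands in for whatever durable log backend your system uses.

```python
import functools
import time
import uuid
from datetime import datetime, timezone

# Hypothetical in-memory sink; a real system would write to durable storage.
LOG_SINK: list[dict] = []


def logged_inference(model_version: str):
    """Wrap an inference function so every call emits a log entry by itself."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(input_id: str, payload: dict):
            started = time.monotonic()
            status = "error"
            try:
                output = fn(input_id, payload)
                status = "ok"
                return output
            finally:
                # Runs on success and on exceptions, so failures are recorded too.
                LOG_SINK.append({
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "input_id": input_id,
                    "model_version": model_version,
                    "output_id": str(uuid.uuid4()),
                    "status": status,
                    "processing_duration_ms": int((time.monotonic() - started) * 1000),
                })
        return wrapper
    return decorator


@logged_inference(model_version="demo-0.1")
def process_input(input_id: str, payload: dict) -> float:
    """Stand-in model that returns a fixed score."""
    return 0.5
```

Because the logging lives in the wrapper rather than in an operator procedure, there is no code path that runs the model without leaving a record.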

Traceability level must match intended purpose. Art.12 does not define a fixed list of required fields for every system. Instead it anchors the requirement to the intended purpose: the logging must enable traceability at a level appropriate to what the system is designed to do. For a recruitment screening tool, that means tracing which candidate data drove which score. For a medical imaging classifier, it means tracing which image regions influenced which diagnostic flag.

Biometric identification systems face stricter logging. For high-risk AI systems that fall under Annex III point 1(a) — remote biometric identification — Art.12 specifies a minimum set of logged fields: the start and end date and time of each use, the reference database against which input data was checked, the input data for which the search produced a match, and the identification of the natural persons involved in verification as referenced in Art.14(5). This is the most prescriptive logging requirement in the article and applies regardless of how the system is deployed.
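
As a rough illustration of that minimum field set, the record below carries the four items listed above. The class name, field names, and validation rules are assumptions made for this sketch; they are not taken from the regulation or from any particular library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BiometricUseRecord:
    """Hypothetical log record for one use of a remote biometric identification system."""
    use_started_at: datetime           # start date and time of the use
    use_ended_at: datetime             # end date and time of the use
    reference_database: str            # database the input data was checked against
    matched_input_ref: Optional[str]   # reference to the input that produced a match, if any
    verifier_ids: tuple                # natural persons who verified the result (Art.14(5))

    def __post_init__(self):
        if self.use_started_at.tzinfo is None or self.use_ended_at.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        if self.use_ended_at < self.use_started_at:
            raise ValueError("use_ended_at precedes use_started_at")
        if not self.reference_database:
            raise ValueError("reference_database is required")
        if self.matched_input_ref is not None and not self.verifier_ids:
            raise ValueError("a match must record who verified it")


record = BiometricUseRecord(
    use_started_at=datetime(2026, 6, 3, 9, 0, tzinfo=timezone.utc),
    use_ended_at=datetime(2026, 6, 3, 9, 5, tzinfo=timezone.utc),
    reference_database="watchlist-demo",
    matched_input_ref="sha256:example",
    verifier_ids=("officer-a", "officer-b"),
)
```

Rejecting incomplete records at construction time means a schema gate never has to guess whether a missing verifier was an omission or a genuine no-match.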


What Art.19 Actually Requires

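EU AI Act Art.19 (Automatically generated logs) creates a retention obligation. Its text is addressed to providers: they shall keep the logs their high-risk AI systems generate automatically under Art.12(1), to the extent those logs are under their control, for a period appropriate to the intended purpose and at least six months, unless applicable Union or national law provides otherwise. Deployers carry the mirror-image duty under Art.26(6), with the same control caveat and the same six-month floor, so the retention gates in this post matter on both sides of the provider/deployer line.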

The "to the extent under their control" caveat matters. When a deployer runs a provider's AI system as a cloud service where the provider controls the infrastructure, the deployer may not have direct access to all log outputs. In such cases, the deployer needs a contractual right to retrieve logs from the provider and a process to do so within the retention window. Your CI/CD pipeline should gate on whether that contractual right and retrieval mechanism exist, not just on whether logs are being generated.

Two practical implications fall out of Art.19:

  1. Six months is the floor, not the target. If your use case is regulated by sector-specific law (financial services, healthcare, employment), those sectoral rules may impose longer retention, in some cases measured in years. Confirm the applicable period for your sector before fixing a retention value.

  2. Logs that exist but are inaccessible do not comply. Retention means accessible retention. Logs archived to cold storage with no documented retrieval process — or worse, logs that exist in the provider's system but the deployer has no legal right to access — will fail an audit.
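
The first implication reduces to taking the longest period that applies. A minimal sketch, assuming you maintain a mapping of sectoral minimums yourself; the 730-day figure in the usage example is a made-up placeholder, not a statement of what any law requires.

```python
ART19_FLOOR_DAYS = 183  # six-month floor used throughout this post


def effective_retention_days(sector_minimums: dict[str, int]) -> int:
    """Return the longest retention period (in days) that applies to the system."""
    return max([ART19_FLOOR_DAYS, *sector_minimums.values()])


def retention_shortfall(configured_days: int, sector_minimums: dict[str, int]) -> int:
    """Days missing from the configured retention; 0 means the configuration is sufficient."""
    return max(0, effective_retention_days(sector_minimums) - configured_days)


# Hypothetical sector rule used only to exercise the functions.
example_minimums = {"hypothetical_sector_rule": 730}
```

With `example_minimums`, a backend configured for 365 days would report a shortfall of 365 days, even though it clears the Art.19 floor on its own.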


Building the CI/CD Gate Suite for Art.12 and Art.19

Five gate types cover the full obligation. We build each as a standalone test that can be wired into any pipeline.

Gate 1: Log Emission Gate

This gate verifies that your AI system actually emits log events when it processes inputs. It is the most fundamental Art.12 check and the first thing an auditor will ask for evidence of.

What it tests: Run the AI system against a set of synthetic inputs in a test environment. After execution, assert that a log entry was created for each input. Assert that the log entry contains at minimum: a timestamp, an input identifier or hash, the model version, and an output summary or prediction identifier.

# tests/compliance/test_art12_log_emission.py
import pytest

# your_ai_system is a placeholder for the package under test.
from your_ai_system import process_input, get_log_entries

REQUIRED_LOG_FIELDS = {
    "timestamp",
    "input_id",
    "model_version",
    "output_id",
    "processing_duration_ms",
}

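@pytest.fixture
def tmp_log_dir(tmp_path, monkeypatch):
    """Isolated log directory, for projects whose conftest.py does not already provide one.

    Assumes the system under test reads its log location from AI_LOG_DIR.
    """
    log_dir = tmp_path / "ai-logs"
    log_dir.mkdir()
    monkeypatch.setenv("AI_LOG_DIR", str(log_dir))
    return log_dir

@pytest.fixture
def synthetic_inputs():
    return [
        {"id": "test-001", "payload": {"text": "sample input A"}},
        {"id": "test-002", "payload": {"text": "sample input B"}},
        {"id": "test-003", "payload": {"text": "sample input C"}},
    ]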

def test_log_emitted_for_each_input(synthetic_inputs, tmp_log_dir):
    """Art.12: system must automatically record events for each processed input."""
    for inp in synthetic_inputs:
        process_input(inp["id"], inp["payload"])

    entries = get_log_entries(tmp_log_dir, since="1h")
    logged_ids = {e["input_id"] for e in entries}

    for inp in synthetic_inputs:
        assert inp["id"] in logged_ids, (
            f"Art.12 violation: no log entry found for input {inp['id']}. "
            "High-risk AI systems must automatically record events for every processed input."
        )

def test_log_fields_complete(synthetic_inputs, tmp_log_dir):
    """Art.12: log entries must contain fields sufficient for traceability."""
    target_id = synthetic_inputs[0]["id"]
    process_input(target_id, synthetic_inputs[0]["payload"])
    entries = get_log_entries(tmp_log_dir, since="1h")

    # Use a default so a missing entry fails with a clear message, not StopIteration.
    entry = next((e for e in entries if e.get("input_id") == target_id), None)
    assert entry is not None, f"Art.12 violation: no log entry found for input {target_id}."

    missing = REQUIRED_LOG_FIELDS - entry.keys()

    assert not missing, (
        f"Art.12 violation: log entry missing required fields: {missing}. "
        f"Present fields: {list(entry.keys())}"
    )

GitHub Actions step:

- name: Art.12 Log Emission Gate
  run: |
    pytest tests/compliance/test_art12_log_emission.py -v \
      --tb=short \
      --junit-xml=reports/art12-log-emission.xml
  env:
    AI_LOG_DIR: ${{ runner.temp }}/ai-logs
    AI_MODEL_VERSION: ${{ github.sha }}

Gate 2: Log Schema Completeness Gate

This gate goes beyond checking that some log entry exists. It validates the schema of every log entry against a versioned contract that encodes your traceability requirements for the system's intended purpose.

This is where the "appropriate to intended purpose" language of Art.12 becomes concrete. Different high-risk AI use cases require different log schemas. A job recruitment classifier needs to log the candidate identifier, the scoring model version, the feature vector summary, and the decision threshold crossed. A credit risk model needs the applicant pseudonym, the feature set, the model prediction, and the decision tier.

Schema-as-contract approach: maintain a JSON Schema document per system that defines the minimum required fields. The CI gate validates every emitted log entry against that schema.

# tests/compliance/test_art12_log_schema.py
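import json

import jsonschema
# production_log_sample is assumed to be a fixture defined in conftest.py.
from your_ai_system import LOG_SCHEMA_PATH

def load_schema():
    with open(LOG_SCHEMA_PATH) as f:
        return json.load(f)

def test_all_log_entries_match_schema(production_log_sample):
    """Art.12: every log entry must conform to the purpose-appropriate schema."""
    schema = load_schema()
    # Without a format checker, jsonschema ignores "format": "date-time".
    # Some formats are only checked when jsonschema's optional format extras are installed.
    format_checker = jsonschema.FormatChecker()
    violations = []

    for entry in production_log_sample:
        try:
            jsonschema.validate(instance=entry, schema=schema, format_checker=format_checker)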
        except jsonschema.ValidationError as e:
            violations.append({
                "entry_id": entry.get("input_id", "unknown"),
                "error": e.message,
                "path": list(e.path),
            })

    assert not violations, (
        f"Art.12 schema violations in {len(violations)} log entries:\n"
        + "\n".join(f"  [{v['entry_id']}] {v['path']}: {v['error']}" for v in violations[:5])
    )

Schema document example for a recruitment screening classifier:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "HighRiskAILogEntry",
  "description": "Art.12 compliant log entry for recruitment AI classifier",
  "type": "object",
  "required": [
    "timestamp",
    "system_id",
    "model_version",
    "input_id",
    "input_hash",
    "output_prediction",
    "output_confidence",
    "decision_threshold",
    "processing_duration_ms",
    "operator_id"
  ],
  "properties": {
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 UTC timestamp of event recording"
    },
    "system_id": {
      "type": "string",
      "description": "Unique identifier of the high-risk AI system"
    },
    "model_version": {
      "type": "string",
      "description": "Semantic version or commit SHA of the deployed model"
    },
    "input_id": {
      "type": "string",
      "description": "Pseudonymised identifier of the processed subject"
    },
    "input_hash": {
      "type": "string",
      "description": "SHA-256 hash of the input feature vector for integrity verification"
    },
    "output_prediction": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "output_confidence": {
      "type": "number"
    },
    "decision_threshold": {
      "type": "number",
      "description": "Threshold value applied at inference time — must be logged per Art.12"
    },
    "processing_duration_ms": {
      "type": "integer"
    },
    "operator_id": {
      "type": "string",
      "description": "Identifier of the deployer operator — required for Art.26 traceability"
    }
  },
  "additionalProperties": true
}
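
To make the schema concrete, here is a hedged sketch of a helper that produces an entry with the ten required fields. `hash_features` and `build_log_entry` are hypothetical names, and hashing a canonical JSON serialisation is one possible way to obtain a stable `input_hash`, not the only one.

```python
import hashlib
import json
from datetime import datetime, timezone


def hash_features(features: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_log_entry(*, system_id: str, model_version: str, input_id: str,
                    features: dict, prediction: float, confidence: float,
                    threshold: float, duration_ms: int, operator_id: str) -> dict:
    """Assemble a log entry carrying every field the example schema marks as required."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "system_id": system_id,
        "model_version": model_version,
        "input_id": input_id,
        "input_hash": hash_features(features),
        "output_prediction": prediction,
        "output_confidence": confidence,
        "decision_threshold": threshold,
        "processing_duration_ms": duration_ms,
        "operator_id": operator_id,
    }


entry = build_log_entry(
    system_id="recruit-screen-demo", model_version="1.4.2", input_id="cand-7f3a",
    features={"years_experience": 6, "skills_match": 0.71},
    prediction=0.82, confidence=0.9, threshold=0.75,
    duration_ms=41, operator_id="operator-demo",
)
```

Logging a hash rather than the raw feature vector keeps personal data out of the log while still letting an auditor confirm that a stored input is the one that was scored.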

Gate 3: Log Retention Policy Gate

This gate verifies that the retention policy configured for your log storage backend meets the Art.19 six-month floor. It is a configuration check, not a behavioral test — but configuration checks fail silently in production and should be automated at deploy time.

The gate checks three things:

  1. The retention policy value in the log storage configuration
  2. Whether the configured period is at least 183 days (half of 365, rounded up; six consecutive calendar months can span up to 184 days, so 184 is the stricter reading)
  3. Whether a retrieval process is documented and executable

# tests/compliance/test_art19_retention_policy.py
import pytest
from your_ai_system.config import get_log_retention_config

MIN_RETENTION_DAYS = 183  # 6 months floor per Art.19

def test_log_retention_period_meets_minimum():
    """Art.19 (providers) and Art.26(6) (deployers): retain automatically generated logs for at least 6 months."""
    config = get_log_retention_config()
    actual_days = config.get("retention_days", 0)

    assert actual_days >= MIN_RETENTION_DAYS, (
        f"Art.19 violation: log retention configured for {actual_days} days "
        f"but EU AI Act Art.19 requires minimum {MIN_RETENTION_DAYS} days (6 months). "
        f"Increase retention in log storage configuration."
    )

def test_log_retention_applies_to_all_log_types():
    """Art.19: retention must cover all automatically generated log categories."""
    config = get_log_retention_config()
    required_log_types = {
        "inference_logs",
        "decision_logs",
        "error_logs",
        "model_version_logs",
    }

    configured_types = set(config.get("covered_log_types", []))
    missing = required_log_types - configured_types

    assert not missing, (
        f"Art.19 violation: retention policy does not cover: {missing}. "
        "All automatically generated log categories must be retained."
    )

def test_log_retrieval_process_documented():
    """Art.19 / Art.26(6): logs under your control must be accessible for the retention period."""
    config = get_log_retention_config()
    retrieval_runbook = config.get("retrieval_runbook_url")

    assert retrieval_runbook, (
        "Art.19 compliance gap: no log retrieval runbook documented. "
        "Deployers must be able to retrieve logs on request during the retention period. "
        "Add 'retrieval_runbook_url' to log retention configuration."
    )

Terraform configuration example that can be validated by this gate:

# infrastructure/logging.tf
resource "aws_cloudwatch_log_group" "ai_system_logs" {
  name              = "/high-risk-ai/${var.system_id}/inference"
  retention_in_days = 365  # CloudWatch accepts only fixed values (..., 180, 365, ...); 365 is the smallest that clears the 183-day floor

  tags = {
    eu_ai_act_art19_compliant = "true"
    minimum_required_days     = "183"
    system_id                 = var.system_id
    compliance_review_date    = "2026-08-02"
  }
}

# Gate reads this output to validate retention value
output "log_retention_days" {
  value = aws_cloudwatch_log_group.ai_system_logs.retention_in_days
}

output "log_group_name" {
  value = aws_cloudwatch_log_group.ai_system_logs.name
}

GitLab CI stage that runs the Terraform output validation:

art19-retention-gate:
  stage: compliance-check
  script:
    - terraform init
    - terraform plan -out=tfplan
    # "// 0" turns a missing output into 0 so the gate fails instead of passing on a shell error
    - RETENTION=$(terraform show -json tfplan | jq -r '.planned_values.outputs.log_retention_days.value // 0')
    - |
      if [ "$RETENTION" -lt 183 ]; then
        echo "Art.19 FAIL: retention=$RETENTION days, minimum=183"
        exit 1
      fi
      echo "Art.19 PASS: retention=$RETENTION days"
    - pytest tests/compliance/test_art19_retention_policy.py -v --junit-xml=reports/art19-retention.xml
  artifacts:
    reports:
      junit: reports/art19-retention.xml

Gate 4: Log Integrity Gate

This gate verifies that your logs are tamper-evident. Art.12 does not spell out tamper-evidence, but it requires logging that supports risk identification, post-market monitoring, and monitoring of the system in operation. All three depend on a regulator (or an affected person) being able to trust the logs as a faithful record of what actually happened, which means the log pipeline cannot allow undetected modification.

The practical implementation uses append-only storage plus hash chain verification or a write-once object storage policy. The gate validates that neither mechanism has been compromised.

# tests/compliance/test_art12_log_integrity.py
import pytest
import hashlib
import json

def compute_entry_hash(entry: dict, prev_hash: str) -> str:
    """Compute deterministic hash for a log entry chained to the previous entry."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    chain_input = f"{prev_hash}:{canonical}"
    return hashlib.sha256(chain_input.encode()).hexdigest()

def test_log_chain_integrity(log_chain_export):
    """Art.12: log records must support trusted audit reconstruction."""
    violations = []
    prev_hash = "genesis"

    for i, entry in enumerate(log_chain_export):
        # Hash everything except the stored hash, without mutating the fixture data.
        stored_hash = entry.get("chain_hash")
        body = {k: v for k, v in entry.items() if k != "chain_hash"}
        expected_hash = compute_entry_hash(body, prev_hash)

        if stored_hash != expected_hash:
            violations.append({
                "index": i,
                "entry_id": entry.get("input_id", "unknown"),
                "expected": expected_hash[:16] + "...",
                "stored": (stored_hash or "MISSING")[:16] + "...",
            })

        prev_hash = stored_hash or expected_hash

    assert not violations, (
        f"Art.12 integrity violations: {len(violations)} broken chain links.\n"
        + "\n".join(
            f"  [{v['index']}] {v['entry_id']}: expected {v['expected']}, got {v['stored']}"
            for v in violations[:5]
        )
    )

def test_log_storage_is_append_only():
    """Art.12: log storage must not allow modification of existing entries."""
    from your_ai_system.logging import LogStorageClient
    client = LogStorageClient()
    storage_config = client.get_storage_policy()

    assert storage_config.get("append_only") is True, (
        "Art.12 compliance gap: log storage does not enforce append-only policy. "
        "Existing log entries must not be modifiable."
    )
    assert storage_config.get("object_lock_enabled") or storage_config.get("worm_enabled"), (
        "Art.12 compliance gap: no WORM (Write Once Read Many) or object-lock policy detected. "
        "Configure S3 Object Lock or equivalent to prevent log tampering."
    )
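
The tests above only verify a chain; something has to write it. The following is a minimal sketch of the writer side, assuming the same `"genesis"` seed and the same canonical-JSON hashing as `compute_entry_hash`. `append_entry` and `first_broken_link` are hypothetical helper names, and the in-memory list stands in for append-only storage.

```python
import hashlib
import json
from typing import Optional

GENESIS = "genesis"


def compute_entry_hash(entry: dict, prev_hash: str) -> str:
    """Same scheme as the verification test: SHA-256 over prev_hash + canonical JSON."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{prev_hash}:{canonical}".encode()).hexdigest()


def append_entry(chain: list[dict], entry: dict) -> dict:
    """Seal an entry with a hash linked to the previous entry and append it."""
    prev_hash = chain[-1]["chain_hash"] if chain else GENESIS
    body = {k: v for k, v in entry.items() if k != "chain_hash"}
    sealed = {**body, "chain_hash": compute_entry_hash(body, prev_hash)}
    chain.append(sealed)
    return sealed


def first_broken_link(chain: list[dict]) -> Optional[int]:
    """Return the index of the first entry whose hash does not verify, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        body = {k: v for k, v in entry.items() if k != "chain_hash"}
        if entry.get("chain_hash") != compute_entry_hash(body, prev_hash):
            return i
        prev_hash = entry["chain_hash"]
    return None
```

Editing any sealed entry changes its recomputed hash, so the verifier flags that entry even if the attacker leaves every stored hash untouched.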

Gate 5: Audit Reconstruction Gate

This is the end-to-end gate. It simulates what happens when a regulator or an affected person submits a subject access request or audit query: can you reconstruct a complete, coherent audit trail for a specific input, from receipt to output, using only the logs your system generated?

This gate tests that the logs are not only present, complete, and intact — but useful.

# tests/compliance/test_art12_audit_reconstruction.py
import pytest
from datetime import datetime, timezone, timedelta
from your_ai_system.audit import AuditTrailReconstructor

def test_full_audit_trail_reconstructable(known_test_case_id):
    """
    Art.12: logging must ensure traceability of the AI system's functioning.
    This test verifies that a complete audit trail can be reconstructed for
    a given input — simulating a regulator or subject access request.
    """
    reconstructor = AuditTrailReconstructor()
    trail = reconstructor.reconstruct(input_id=known_test_case_id)

    # Trail must include the full lifecycle
    assert trail.input_received_at is not None, "Audit trail missing: input receipt timestamp"
    assert trail.model_version is not None, "Audit trail missing: model version at inference time"
    assert trail.input_hash is not None, "Audit trail missing: input integrity hash"
    assert trail.output_prediction is not None, "Audit trail missing: prediction output"
    assert trail.decision_outcome is not None, "Audit trail missing: final decision"
    assert trail.human_review_required is not None, "Audit trail missing: human oversight flag (Art.14)"

    # Art.19 sets a minimum retention period, so an old trail that is still
    # retrievable is a pass. Only reject receipt timestamps that cannot be real.
    age = datetime.now(timezone.utc) - trail.input_received_at
    assert age >= timedelta(0), (
        f"Audit trail for input {known_test_case_id} has a receipt timestamp "
        f"in the future ({trail.input_received_at.isoformat()}); check clock handling."
    )

def test_audit_trail_query_latency(known_test_case_id):
    """
    Practical check: audit trail retrieval must be feasible on request.
    The 30-second budget is this pipeline's own threshold, not a figure from
    the Act; tune it to your investigation-response commitments.
    """
    import time
    reconstructor = AuditTrailReconstructor()

    start = time.monotonic()
    trail = reconstructor.reconstruct(input_id=known_test_case_id)
    elapsed = time.monotonic() - start

    assert trail is not None
    assert elapsed < 30.0, (
        f"Audit trail reconstruction took {elapsed:.1f}s. "
        "Acceptable maximum is 30s for regulatory investigation feasibility."
    )
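
`AuditTrailReconstructor` is imported from `your_ai_system.audit` and its internals are not shown in this post. As a rough idea of what such a component might do, the sketch below merges all log entries for one input into a single trail. The `AuditTrail` dataclass and the `reconstruct` function are assumptions for illustration; the sketch also assumes ISO 8601 timestamps with an explicit UTC offset.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TRAIL_FIELDS = (
    "model_version", "input_hash", "output_prediction",
    "decision_outcome", "human_review_required",
)


@dataclass
class AuditTrail:
    input_id: str
    input_received_at: Optional[datetime] = None
    model_version: Optional[str] = None
    input_hash: Optional[str] = None
    output_prediction: Optional[float] = None
    decision_outcome: Optional[str] = None
    human_review_required: Optional[bool] = None

    def missing_fields(self) -> list[str]:
        """Names of trail fields no log entry supplied."""
        return [name for name, value in vars(self).items() if value is None]


def reconstruct(entries: list[dict], input_id: str) -> AuditTrail:
    """Merge every log entry for one input, oldest first, into a single trail."""
    trail = AuditTrail(input_id=input_id)
    relevant = sorted(
        (e for e in entries if e.get("input_id") == input_id),
        key=lambda e: datetime.fromisoformat(e["timestamp"]),
    )
    for entry in relevant:
        if trail.input_received_at is None:
            # The earliest entry marks receipt of the input.
            trail.input_received_at = datetime.fromisoformat(entry["timestamp"])
        for name in TRAIL_FIELDS:
            if name in entry:
                setattr(trail, name, entry[name])
    return trail
```

A `missing_fields()` result that is not empty tells you exactly which lifecycle stage failed to log, which is more actionable than a bare "trail incomplete".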

Complete GitHub Actions Workflow

This workflow runs all five Art.12 and Art.19 gates as a single compliance job that must pass before any deployment to staging or production.

# .github/workflows/eu-ai-act-compliance-art12-art19.yml
name: EU AI Act Art.12 & Art.19 Compliance Gates

on:
  push:
    branches: [main, staging]
  pull_request:
    branches: [main]

jobs:
  art12-art19-compliance:
    name: Art.12 Record-Keeping & Art.19 Log Retention Gates
    runs-on: ubuntu-latest
    timeout-minutes: 20

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Configure test log environment
        run: |
          mkdir -p ${{ runner.temp }}/ai-logs
          cp tests/fixtures/retention-config.json config/log-retention.json
        env:
          LOG_RETENTION_DAYS: "210"

      - name: Gate 1 — Art.12 Log Emission
        run: |
          pytest tests/compliance/test_art12_log_emission.py -v \
            --junit-xml=reports/gate1-log-emission.xml
        env:
          AI_LOG_DIR: ${{ runner.temp }}/ai-logs

      - name: Gate 2 — Art.12 Log Schema Completeness
        run: |
          pytest tests/compliance/test_art12_log_schema.py -v \
            --junit-xml=reports/gate2-log-schema.xml

      - name: Gate 3 — Art.19 Log Retention Policy
        run: |
          pytest tests/compliance/test_art19_retention_policy.py -v \
            --junit-xml=reports/gate3-retention.xml

      - name: Gate 4 — Art.12 Log Integrity
        run: |
          pytest tests/compliance/test_art12_log_integrity.py -v \
            --junit-xml=reports/gate4-integrity.xml

      - name: Gate 5 — Art.12 Audit Reconstruction
        run: |
          pytest tests/compliance/test_art12_audit_reconstruction.py -v \
            --junit-xml=reports/gate5-audit-reconstruction.xml

      - name: Publish compliance test results
        uses: dorny/test-reporter@v1
        if: always()
        with:
          name: EU AI Act Art.12 & Art.19 Compliance Report
          path: reports/gate*.xml
          reporter: java-junit

      - name: Block deployment on failure
        if: failure()
        run: |
          echo "::error::EU AI Act Art.12/Art.19 compliance gate failed."
          echo "::error::High-risk AI systems must not be deployed without record-keeping compliance."
          echo "::error::Review gate reports and remediate before re-running."
          exit 1

Integration with Existing Pipeline Stages

If you are building on the foundation from earlier parts of this series, Art.12 and Art.19 gates fit naturally between the inference tests and the deployment approval stage:

┌─────────────────────────────────────────────────────────┐
│  CI/CD Pipeline: EU AI Act High-Risk AI Provider        │
│                                                         │
│  1. Build & Unit Tests                                  │
│  2. Art.15 - Accuracy & Robustness Gates     (Part 2)   │
│  3. Art.13 - Transparency Gates              (Part 3)   │
│  4. Art.14 - Human Oversight Gates           (Part 3)   │
│  5. Art.12 - Record-Keeping Gates    ← THIS POST        │
│  6. Art.19 - Log Retention Gates     ← THIS POST        │
│  7. Integration Tests                                   │
│  8. Deployment to Staging                               │
│  9. Smoke Tests + Art.19 Retention Spot-Check           │
│  10. Deployment to Production                           │
└─────────────────────────────────────────────────────────┘

The Art.12 and Art.19 gates should run in the same job since they share log infrastructure setup. Keep them in a separate job from the Art.15 accuracy gates — accuracy failures are about model quality, logging failures are about regulatory posture. Mixing them in one job obscures which category of problem caused a block.


Common Failure Patterns

Log emission works in tests but not in production. This happens when tests use a dedicated log handler that writes synchronously but production uses an async log queue. The fix: test against a realistic async pipeline, not a synchronous mock. Assert log arrival within a timeout, not instantly.
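
A small polling helper is one way to assert arrival within a timeout. This is a sketch under assumptions: `wait_for_log` is a hypothetical name, and `fetch_entries` is any zero-argument callable you supply that returns the log entries currently visible in the backend.

```python
import time
from typing import Callable, Iterable


def wait_for_log(fetch_entries: Callable[[], Iterable[dict]], input_id: str,
                 timeout_s: float = 5.0, poll_interval_s: float = 0.1) -> dict:
    """Poll the log backend until an entry for input_id appears or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        for entry in fetch_entries():
            if entry.get("input_id") == input_id:
                return entry
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no log entry for input {input_id} within {timeout_s}s"
            )
        time.sleep(poll_interval_s)
```

In a test, call `process_input(...)` and then `wait_for_log(lambda: get_log_entries(tmp_log_dir, since="1h"), "test-001")`, so the assertion tolerates queue latency but still fails if the entry never arrives.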

Retention policy is set but does not apply to all log categories. A common gap: inference logs have six-month retention, but error logs and model version change logs are cleaned up after 30 days. Error logs often contain the most valuable traceability information for incident investigation. Review your log storage policy by category.

Hash chain breaks after log rotation. If your logging system rotates log files and the hash chain does not carry across rotation boundaries, your chain integrity test will fail at every rotation point. Design your chain to anchor across rotation boundaries using a file-close hash stored in the new file's first entry.
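
One possible shape for that anchoring, sketched with in-memory lists standing in for log files: each new segment starts with an anchor entry that records the previous segment's final hash, and the chain keeps running through it. The function names and the `rotation_anchor` event are assumptions for this illustration.

```python
import hashlib
import json

GENESIS = "genesis"


def _hash(body: dict, prev_hash: str) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{prev_hash}:{canonical}".encode()).hexdigest()


def append(segment: list[dict], body: dict, anchor: str = GENESIS) -> None:
    """Append a sealed entry; the first entry in a segment chains to the anchor."""
    prev_hash = segment[-1]["chain_hash"] if segment else anchor
    segment.append({**body, "chain_hash": _hash(body, prev_hash)})


def open_segment(prev_final_hash: str = GENESIS) -> list[dict]:
    """Start a new log file whose first entry records the previous file's closing hash."""
    segment: list[dict] = []
    append(
        segment,
        {"event": "rotation_anchor", "prev_segment_final_hash": prev_final_hash},
        anchor=prev_final_hash,
    )
    return segment


def verify(segments: list[list[dict]]) -> bool:
    """Check every link, including the links that cross rotation boundaries."""
    prev_hash = GENESIS
    for segment in segments:
        if segment[0].get("prev_segment_final_hash") != prev_hash:
            return False
        for entry in segment:
            body = {k: v for k, v in entry.items() if k != "chain_hash"}
            if entry["chain_hash"] != _hash(body, prev_hash):
                return False
            prev_hash = entry["chain_hash"]
    return True
```

Because the anchor entry is itself hashed into the chain, deleting the tail of an old file or dropping a whole file breaks verification at the next boundary.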

Audit reconstruction times out for old entries. Log archives indexed by timestamp often have poor performance for lookup by input identifier. Build a secondary index keyed by input ID at write time. The audit reconstruction gate's 30-second latency check will catch this in CI before it becomes a regulator-facing problem.
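
A secondary index can be as simple as a small table that maps each input ID to where its entries live. The sketch below uses the standard-library `sqlite3` module; the table layout and the `open_index`, `record_write`, and `locate` helpers are hypothetical, and a production system might use its log store's native indexing instead.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) the lookup table that maps input IDs to log locations."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_index ("
        "input_id TEXT NOT NULL, segment TEXT NOT NULL, byte_offset INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_input_id ON log_index (input_id)")
    return conn


def record_write(conn: sqlite3.Connection, input_id: str, segment: str, byte_offset: int) -> None:
    """Call this at log-write time, alongside the write itself."""
    conn.execute(
        "INSERT INTO log_index (input_id, segment, byte_offset) VALUES (?, ?, ?)",
        (input_id, segment, byte_offset),
    )
    conn.commit()


def locate(conn: sqlite3.Connection, input_id: str) -> list[tuple]:
    """Return (segment, byte_offset) pairs for an input, in write order."""
    return conn.execute(
        "SELECT segment, byte_offset FROM log_index WHERE input_id = ? ORDER BY rowid",
        (input_id,),
    ).fetchall()
```

Reconstruction then reads only the segments `locate` returns instead of scanning every archive in the time range.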

Deployer assumes provider handles retention. Providers are responsible for building logging capability (Art.12) and for retaining the logs under their own control (Art.19). Deployers must retain the logs under their control too (Art.26(6)). If your AI system runs as a managed service from a provider, you still carry a retention obligation as the deployer for whatever logs you control, and you need access to the rest. Your CI/CD pipeline needs a gate that checks the contractual access right exists, not just that the provider's system is generating logs.
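
A contractual-access gate can be a plain data check against a record your legal or procurement team maintains. Everything in this sketch is an assumption: the `ProviderLogAccess` fields, the `access_gaps` function, and the idea that the record is kept somewhere the pipeline can read it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProviderLogAccess:
    """Hypothetical record of the deployer's right to retrieve provider-held logs."""
    contract_reference: Optional[str]     # clause or document granting access
    contract_valid_until: Optional[date]  # last day the access right applies
    retrieval_endpoint: Optional[str]     # documented mechanism for pulling the logs


def access_gaps(access: ProviderLogAccess, today: date) -> list[str]:
    """List every reason the deployer could not retrieve logs today."""
    gaps = []
    if not access.contract_reference:
        gaps.append("no contract clause granting log access")
    if access.contract_valid_until is None or access.contract_valid_until < today:
        gaps.append("log access right has no end date on record or has expired")
    if not access.retrieval_endpoint:
        gaps.append("no documented retrieval mechanism")
    return gaps
```

The gate fails the pipeline whenever `access_gaps(...)` returns a non-empty list, and the list itself becomes the remediation ticket.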


August 2026 Art.12 & Art.19 Readiness Checklist

Use this checklist in your pre-deadline review:

Provider obligations (Art.12):

Deployer obligations (Art.26(6), mirroring Art.19):


What's Next in This Series

This is Part 4 of 5. In Part 5 — the finale — we will assemble the complete CI/CD compliance checklist across all five articles (Art.12, Art.13, Art.14, Art.15, Art.19), show how to wire the full pipeline as a reusable GitHub Actions composite action, and provide the August 2026 go/no-go readiness scorecard that ties everything together.

The August 2, 2026 deadline is now less than two months away. The record-keeping and log retention gates in this post are among the simplest to implement mechanically — and among the most valuable to have in place when an investigation starts.


This post is part of the sota.io EU AI Act CICD Compliance Testing Series. For the complete compliance guide, start with Part 1.
