2026-05-03 · 13 min read

AWS CodeGuru EU Alternative 2026: The Art.25 Code-Review Privacy Paradox

Post #788 in the sota.io EU Compliance Series

AWS CodeGuru Reviewer analyses your source code to find bugs, security vulnerabilities, and performance problems. It's the tool engineers use to write better, more secure code. The core problem: the code it analyses is your GDPR-sensitive application — the functions that process personal data, the authentication logic, the data access layer. To get CodeGuru's recommendations, you upload that code to Amazon's infrastructure.

The paradox is precise: you use CodeGuru to help you comply with GDPR Art.25 (Privacy by Design). But CodeGuru itself is a GDPR compliance problem. Your privacy-sensitive source code — the code that implements your privacy controls — now lives in a system under US jurisdiction that Amazon can be compelled to disclose under the CLOUD Act, without notifying you.

This guide covers what CodeGuru actually sends to AWS, the four GDPR obligations it triggers, and the EU-sovereign alternatives that keep your code in Europe.


What AWS CodeGuru Does and Why It's a GDPR Hotspot

AWS CodeGuru has two main products. CodeGuru Reviewer performs static analysis of your source code, finding security vulnerabilities, code quality issues, and AWS-specific anti-patterns. It integrates with GitHub, Bitbucket, and CodeCommit and reviews code as part of your CI/CD pipeline. CodeGuru Profiler analyses your running application's performance characteristics in production.

Both products require sending data to AWS. Reviewer needs your source code. Profiler needs performance profiles from your production application.

The data exposure is structural. CodeGuru Reviewer cannot do its job without reading your code. Every function you have written — including the ones that handle passwords, process payment data, manage health records, or control access to personal data — is transmitted to and processed on Amazon's infrastructure.

Why this is different from standard AWS service data exposure.

Most AWS services process the data your users create. Your application code is different. Your source code is the intellectual property that describes exactly how your system handles personal data. It contains:

- Your authentication and access control logic
- Database schemas, field names, and identifiers tied to personal data
- The data access layer and the processing logic applied to that data
- Deletion and retention logic, and often the configuration that sits alongside it

Sending this to a US cloud provider under CLOUD Act jurisdiction creates a disclosure surface that goes far beyond ordinary data processing.


The Four GDPR Problems

Problem 1: Art.25 Privacy by Design — The Self-Defeating Compliance Tool

GDPR Art.25 requires you to implement data protection by design and by default. In practice, this means writing code that handles personal data correctly: minimal collection, proper access controls, encryption at rest and in transit, correct deletion logic. Code reviews are one of the primary mechanisms for ensuring your implementation actually achieves these goals.
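One way to make the minimisation half of Art.25 concrete in code is a purpose-based field whitelist, so a processing path can only ever see the fields it was declared to need. The sketch below is illustrative; the purposes and field names are hypothetical, not from any particular codebase:

```python
# Hypothetical Art.5(1)(c) / Art.25 minimisation helper: every processing
# purpose declares up front the only personal-data fields it may read.
ALLOWED_FIELDS = {
    "newsletter": {"email"},
    "shipping": {"name", "address", "postcode"},
}

def minimise(record: dict, purpose: str) -> dict:
    """Return only the fields the given purpose is permitted to process."""
    allowed = ALLOWED_FIELDS.get(purpose)
    if allowed is None:
        raise ValueError(f"unknown processing purpose: {purpose!r}")
    return {k: v for k, v in record.items() if k in allowed}

user = {
    "name": "A. User",
    "email": "a@example.eu",
    "address": "Example Str. 1",
    "postcode": "10115",
    "dob": "1990-01-01",
}
print(minimise(user, "newsletter"))  # dob and address never reach the newsletter path
```

A reviewer, human or tool, can then audit a single table instead of tracing every call site, which is exactly the kind of verifiable structure Art.25 asks for.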

When you use CodeGuru for this purpose, you create a structural contradiction. You are reviewing code that processes personal data by uploading it to a system that processes it under US jurisdiction. The tool you use to verify your privacy controls is itself outside your control and subject to compelled disclosure.

Art.25 requires technical measures appropriate to the risk. Using a US-based code analysis service for privacy-sensitive application code is not an appropriate technical measure — it expands your exposure rather than reducing it.

Problem 2: Art.5(1)(b) Purpose Limitation — Code Analysis as Secondary Processing

GDPR Art.5(1)(b) requires that personal data is collected for specified, explicit, and legitimate purposes and not processed in a manner incompatible with those purposes.

Your source code is not personal data in the ordinary sense. But it contains patterns, structures, and identifiers that are directly linked to how you process your users' personal data. When CodeGuru analyses that code, the analysis necessarily involves Amazon processing information about your data subjects' data — their field names, identifiers, processing logic — as part of a code review service that your users never consented to and that has nothing to do with the purpose for which they provided their data.

This creates a secondary processing chain that attracts Art.5(1)(b) scrutiny, particularly in fintech, health, and legal contexts where the code being reviewed handles special categories of data under Art.9.

Problem 3: Art.28 — Is Amazon a GDPR-Compliant Processor for Source Code?

GDPR Art.28 requires that when you engage a processor to handle personal data on your behalf, you have a data processing agreement that governs that processing. AWS's DPA covers the customer data you store and process through AWS services. It is much less clear that it covers the secondary data exposure that results from CodeGuru analysing your application logic.

The question your DPO needs to answer: when Amazon processes your source code through CodeGuru, and that source code contains the complete implementation of your personal data processing architecture, is Amazon acting as a processor under Art.28? If yes, your existing AWS DPA may not be sufficient. If no, you have a disclosure to a third party outside your Art.28 framework.

There is no clean answer to this question under current GDPR guidance. The ambiguity itself is a compliance risk.

Problem 4: CLOUD Act — Your Application Architecture Under a Gag Order

The CLOUD Act requires AWS to disclose data to US law enforcement when compelled, regardless of where that data is physically stored. AWS operates EU regions, but US law applies to AWS as a US company.

For ordinary application data, this creates a privacy risk for your users. For source code, the risk is different in kind. A CLOUD Act order compelling disclosure of your application code would give US authorities:

- The complete implementation of your data processing architecture
- Your authentication, authorisation, and access control logic
- The encryption, retention, and deletion mechanisms that protect your users' personal data

This disclosure would come under a gag order. You would not know it happened. You could not notify affected parties or your supervisory authority. GDPR Art.33 requires you to notify your supervisory authority of personal data breaches. A secret CLOUD Act disclosure of the code that processes all your users' personal data creates an Art.33 notification problem — a breach you cannot report because you do not know it occurred.


What CodeGuru Actually Transmits

Understanding the transmission scope is important for risk assessment.

CodeGuru Reviewer requires association with your source repository. When you enable it on a GitHub or Bitbucket repository, CodeGuru clones the relevant code and transmits it to AWS for analysis. The transmission includes:

- The full repository contents for full-repository scans
- Pull request diffs plus surrounding code context for incremental reviews
- Repository metadata such as branch names and commit references

CodeGuru Reviewer stores the code in S3 during analysis and retains analysis results for the lifetime of the association. AWS says it does not use your code to train models, but the data exists in AWS infrastructure and is subject to CLOUD Act obligations during that retention period.

CodeGuru Profiler runs an agent in your production application that samples stack frames and sends profiling data to AWS. For data-processing applications, stack frames will include function names, call paths through your data processing logic, and timing data that reveals which processing paths execute most frequently.


EU-Sovereign Alternatives

The alternatives fall into three categories: self-hosted open source, local static analysis, and local LLM-based review.

SonarQube — Self-Hosted Static Analysis

SonarQube is the most capable open-source static analysis platform. The Community Edition is free; Developer and Enterprise editions add language support and branch analysis.

Deployment on EU infrastructure:

# docker-compose.yml — SonarQube on EU VPS (e.g. Hetzner)
# Host prerequisite: sysctl -w vm.max_map_count=262144 (needed by the embedded Elasticsearch)
version: "3.8"
services:
  sonarqube:
    image: sonarqube:lts-community
    environment:
      SONAR_JDBC_URL: jdbc:postgresql://db:5432/sonar
      SONAR_JDBC_USERNAME: sonar
      SONAR_JDBC_PASSWORD: ${SONAR_DB_PASSWORD}
    volumes:
      - sonarqube_data:/opt/sonarqube/data
      - sonarqube_logs:/opt/sonarqube/logs
    ports:
      - "9000:9000"
    depends_on:
      - db
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: sonar
      POSTGRES_PASSWORD: ${SONAR_DB_PASSWORD}
      POSTGRES_DB: sonar
    volumes:
      - sonarqube_db:/var/lib/postgresql/data
volumes:
  sonarqube_data:
  sonarqube_logs:
  sonarqube_db:

CI/CD integration (GitHub Actions, EU runner):

- name: SonarQube Scan
  uses: sonarsource/sonarqube-scan-action@master
  env:
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
    SONAR_HOST_URL: https://sonar.your-eu-host.eu
  with:
    args: >
      -Dsonar.projectKey=your-project
      -Dsonar.sources=src
      -Dsonar.qualitygate.wait=true

SonarQube's security hotspot detection covers OWASP Top 10, SQL injection, path traversal, and insecure cryptography — the categories most relevant to GDPR Art.32 technical security requirements. All analysis runs on your EU infrastructure. No code leaves your control.

GDPR advantage: SonarQube can be configured to flag GDPR-relevant patterns directly. Custom rules can detect hardcoded credentials, insecure logging of personal data, missing encryption of sensitive fields, and broken access control patterns.

Semgrep — Open Source Pattern-Based Analysis

Semgrep is a lightweight static analysis tool that uses pattern matching rather than full program analysis. It is significantly easier to set up than SonarQube and runs well in CI/CD pipelines on minimal infrastructure.

Installation and local run:

# Install
pip install semgrep

# Run with OWASP security ruleset (runs entirely locally)
semgrep --config "p/owasp-top-ten" \
        --config "p/secrets" \
        src/

# JSON output for CI/CD integration
semgrep --config "p/security-audit" \
        --json \
        --output semgrep-results.json \
        src/
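That JSON report can drive a build gate. A small sketch, assuming the standard results[].extra.severity layout that semgrep --json emits (check the shape against your Semgrep version):

```python
# CI gate over Semgrep's JSON report: fail the build when blocking
# findings are present. Assumes the results[].extra.severity layout
# produced by `semgrep --json --output semgrep-results.json`.
import json
import os
import sys

def count_blocking_findings(report: dict, blocking=frozenset({"ERROR"})) -> int:
    """Count findings whose severity should block the pipeline."""
    return sum(
        1
        for result in report.get("results", [])
        if result.get("extra", {}).get("severity") in blocking
    )

def main(path: str = "semgrep-results.json") -> int:
    with open(path) as f:
        report = json.load(f)
    blocking = count_blocking_findings(report)
    if blocking:
        print(f"{blocking} blocking finding(s); failing the build")
        return 1
    print("No blocking findings")
    return 0

# Guarded so the sketch can be imported or tested without the report file.
if __name__ == "__main__" and os.path.exists("semgrep-results.json"):
    sys.exit(main())
```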

Custom GDPR Art.25 rule — detecting unencrypted personal data logging:

# rules/gdpr-logging.yaml
rules:
  - id: gdpr-pii-in-logs
    patterns:
      - pattern: |
          logger.$METHOD(..., $DATA, ...)
      - metavariable-regex:
          metavariable: $DATA
          regex: ".*(email|password|ssn|dob|address|phone|credit_card).*"
    message: "Possible PII in log statement — review for GDPR Art.5(1)(c) data minimisation"
    severity: WARNING
    languages: [python, javascript, typescript, java]

Custom rule — detecting missing Art.17 deletion coverage:

rules:
  - id: gdpr-missing-delete-path
    pattern: |
      def $FUNC(...):
        ...
        $DB.insert(...)
        ...
    message: "Function inserts data but has no visible delete path — verify Art.17 erasure coverage"
    severity: INFO
    languages: [python]

The custom rules above run locally with semgrep --config rules/gdpr-logging.yaml src/. Semgrep's registry also provides curated security rulesets (such as p/owasp-top-ten and p/secrets) that cover many compliance-relevant patterns. Running them locally means your code never leaves the EU.
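For CI/CD, a minimal GitHub Actions wiring that keeps the scan on your own infrastructure might look like this (the eu-runner label is an illustrative name for a self-hosted runner on EU hardware):

```yaml
# .github/workflows/semgrep.yml: the scan runs on your own runner;
# no source code is sent to a third-party analysis service.
name: semgrep
on: [pull_request]
jobs:
  semgrep:
    runs-on: [self-hosted, eu-runner]   # illustrative label for an EU-hosted runner
    steps:
      - uses: actions/checkout@v4
      - run: pip install semgrep
      - run: |
          semgrep --config "p/owasp-top-ten" \
                  --config "p/secrets" \
                  --config rules/ \
                  --error \
                  src/
```

The --error flag makes Semgrep exit non-zero when findings are present, so the pull request check fails without any extra gating script.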

Local LLM Code Review — Air-Gapped Analysis

For organisations handling special categories of data under Art.9 (health, biometric, financial), even self-hosted SaaS poses residual risk. Local LLM code review provides analysis with complete data sovereignty.

Setup with Ollama and a code-capable model:

# Install Ollama (runs entirely locally)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a code-capable model (runs on your GPU/CPU)
ollama pull codellama:13b
# or for larger context window:
ollama pull deepseek-coder:33b

# Review a file (unquoted EOF so the $(cat ...) substitution expands)
ollama run codellama:13b << EOF
Review the following Python function for GDPR compliance issues.
Check for: PII in logs, missing encryption, broken access control,
Art.17 erasure gaps, Art.5(1)(c) minimisation violations.

$(cat src/user_service.py)
EOF

Batch review script for CI/CD:

import subprocess
import json
from pathlib import Path

def review_file_for_gdpr(filepath: str) -> dict:
    """Run local LLM review on a source file. No data leaves the machine."""
    code = Path(filepath).read_text()
    prompt = f"""Analyse this source file for GDPR Art.25 (Privacy by Design) violations.
Check specifically:
1. Personal data logged without sanitisation (Art.5(1)(c))
2. Missing encryption for stored personal data (Art.32)
3. No Art.17 deletion path for personal data created
4. Hardcoded credentials or API keys
5. Overly broad database queries fetching unnecessary fields (Art.5(1)(c))

Respond as JSON: {{"issues": [{{"line": N, "severity": "HIGH|MEDIUM|LOW", "article": "Art.XX", "description": "..."}}]}}

Code:
{code}"""

    result = subprocess.run(
        ["ollama", "run", "codellama:13b", prompt],
        capture_output=True,
        text=True,
        timeout=120
    )
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return {"raw": result.stdout}

# Run on all Python files in src/
for py_file in Path("src").rglob("*.py"):
    findings = review_file_for_gdpr(str(py_file))
    if findings.get("issues"):
        print(f"\n{py_file}:")
        for issue in findings["issues"]:
            print(f"  Line {issue['line']} [{issue['severity']}] {issue['article']}: {issue['description']}")

Local LLM review is slower than SonarQube and less precise for traditional bug classes, but it is the only approach that offers complete code sovereignty with natural language reasoning about GDPR-specific patterns that rule-based tools miss.

PMD and SpotBugs — Java-Specific Self-Hosted Analysis

For Java applications, PMD and SpotBugs provide mature, self-hosted analysis that integrates with Maven and Gradle without any cloud component.

<!-- pom.xml — SpotBugs + PMD in Maven build, no cloud dependency -->
<plugin>
  <groupId>com.github.spotbugs</groupId>
  <artifactId>spotbugs-maven-plugin</artifactId>
  <version>4.8.3.1</version>
  <configuration>
    <effort>Max</effort>
    <threshold>Low</threshold>
    <includeFilterFile>spotbugs-security-include.xml</includeFilterFile>
  </configuration>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-pmd-plugin</artifactId>
  <version>3.21.2</version>
  <configuration>
    <rulesets>
      <ruleset>/rulesets/java/maven-pmd-plugin-default.xml</ruleset>
      <ruleset>gdpr-custom-rules.xml</ruleset>
    </rulesets>
  </configuration>
</plugin>
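The Gradle equivalent is similarly cloud-free. A sketch (the plugin version is illustrative; pin to the version you have tested):

```groovy
// build.gradle: the same self-hosted analysis via Gradle
plugins {
    id 'java'
    id 'pmd'                                   // bundled with Gradle
    id 'com.github.spotbugs' version '6.0.18'  // illustrative version
}

pmd {
    // Reuse the same custom GDPR ruleset as the Maven build
    ruleSetFiles = files('gdpr-custom-rules.xml')
}
```

Both analyses then typically run as part of ./gradlew check, with no network egress.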

Migration from CodeGuru to EU-Sovereign Analysis

The practical migration path depends on where CodeGuru is in your pipeline.

Step 1: Inventory your CodeGuru associations. List all repositories currently associated with CodeGuru Reviewer in your AWS console. These are the codebases that have been or are being transmitted to AWS.

aws codeguru-reviewer list-repository-associations \
  --region eu-west-1 \
  --query 'RepositoryAssociationSummaries[*].{Name:Name,State:State,Owner:Owner}'

Step 2: Disassociate repositories. Before disassociating, document any active recommendations you want to carry forward.

aws codeguru-reviewer disassociate-repository \
  --association-arn arn:aws:codeguru-reviewer:eu-west-1:123456789:association/XXXX

Step 3: Set up SonarQube or Semgrep. For most teams, Semgrep is the fastest path to equivalent security coverage. SonarQube adds technical debt tracking and quality gates if you need CI/CD enforcement.

Step 4: Review CodeGuru Profiler separately. If you are using CodeGuru Profiler in production, your production application has been sending stack frame data to AWS. Migrating to a self-hosted profiler (Pyroscope, async-profiler with Grafana) is a separate exercise from Reviewer migration.


Art.25 Compliance Checklist for Code Review Tools

Before deploying any code review tool for GDPR-sensitive applications, verify:

- Where the analysis executes: your infrastructure or a third party's
- Whether source code leaves the EU at any point in the pipeline
- Whether the vendor is subject to US jurisdiction (CLOUD Act exposure)
- Whether your Art.28 DPA explicitly covers source code as an object of processing
- How long code and analysis results are retained, and who controls deletion
- For Art.9 special categories data: whether a fully air-gapped option is required


Conclusion

AWS CodeGuru is a capable code review tool. The GDPR problem is not a configuration issue — it is structural. Every time CodeGuru analyses your codebase, your application's complete data processing architecture is transmitted to and processed by a US company that can be compelled to disclose it under the CLOUD Act.

The self-defeating quality of the problem is what makes it distinctive: the tool you would naturally use to implement GDPR Art.25 (Privacy by Design) is itself incompatible with Art.25 if your code handles personal data. Using CodeGuru to review your privacy controls means your privacy controls live on US infrastructure, subject to disclosure without your knowledge.

SonarQube on an EU VPS, Semgrep running locally, and local LLM-based review with Ollama all provide equivalent or superior analysis quality with complete code sovereignty. The migration effort is lower than most teams expect, and the security model is fundamentally sound: the tool that reviews your privacy-sensitive code never leaves your infrastructure.

For fintech, health, and legal applications handling Art.9 special categories data, this is not an optional optimisation. It is a requirement of Art.25 itself.


sota.io is a European PaaS that runs entirely on EU infrastructure — no US parent company, no CLOUD Act exposure for your deployed applications. If you're migrating away from AWS services for GDPR compliance, sota.io is the deployment layer that completes the picture.

EU-Native Hosting

Ready to move to EU-sovereign infrastructure?

sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.