2026-05-01 · 13 min read

AWS Detective EU Alternative 2026: Behavioral Graph Analysis, Security Investigation, and GDPR Under the CLOUD Act

Post #744 in the sota.io EU Compliance Series

AWS Detective is Amazon's managed security investigation service that uses machine learning to automatically correlate and analyze activity data from AWS CloudTrail API logs, Amazon VPC Flow Logs, Amazon GuardDuty findings, AWS Organizations data, and Amazon EKS audit logs. Security teams use it to investigate incidents faster, trace the blast radius of a compromised IAM credential, and visualize lateral movement across multi-account environments. Detective builds persistent behavioral graphs representing normal activity patterns for every entity — IAM users, IAM roles, EC2 instances, IP addresses, S3 buckets, EKS clusters — in your AWS environment.

The fundamental tension: Detective's value proposition is that it remembers everything that happened in your environment for up to one year, builds a behavioral baseline for every entity including human users, and lets investigators query this history to reconstruct the sequence of events before, during, and after an incident. That one year of behavioral data — every API call, every network connection, every authentication event attributed to individual users — is processed and stored on infrastructure that remains under US jurisdiction. Understanding the GDPR implications requires looking at what constitutes personal data in Detective's graph, not just where your EC2 instances run.

What AWS Detective Actually Does

When you enable Detective on an AWS account, the service begins ingesting event data from three primary sources: CloudTrail management events (API calls, console sign-ins, IAM changes), VPC Flow Logs (network connection records with source and destination IP, port, protocol, and byte counts), and GuardDuty findings (threat intelligence alerts with entity attribution). In multi-account environments, a Detective administrator account can aggregate data from member accounts, creating a unified graph spanning the entire AWS organization.

Detective processes this raw event data through a series of ML models that extract entities and relationships. The output is a behavior graph: a continuously updated graph database where nodes represent entities (IP addresses, user accounts, EC2 instances, S3 buckets, EKS pods) and edges represent observed interactions (API calls made, connections established, resources accessed). Each node accumulates a behavioral profile — the statistical baseline of what normal activity looks like for that entity — that Detective uses to score anomalies.
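The structure can be sketched in a few lines of Python (an illustrative toy model, not Detective's internal representation): nodes keyed by entity identifier, edges recording observed interactions, and per-node counters that form the baseline used to score anomalies.

```python
from collections import defaultdict
from statistics import mean, stdev

class BehaviorGraph:
    """Toy behavior graph: entities as nodes, observed interactions as edges."""

    def __init__(self):
        self.edges = defaultdict(set)          # entity -> entities it interacted with
        self.daily_counts = defaultdict(list)  # entity -> historical daily event counts

    def observe(self, source: str, target: str, events_today: int):
        self.edges[source].add(target)
        self.daily_counts[source].append(events_today)

    def pivot(self, entity: str) -> set:
        """Investigation pivot: everything this entity ever touched."""
        return self.edges[entity]

    def is_anomalous(self, entity: str, todays_count: int, sigma: float = 3.0) -> bool:
        """Flag counts more than `sigma` standard deviations above the baseline."""
        history = self.daily_counts[entity]
        if len(history) < 2:
            return False  # no baseline yet
        return todays_count > mean(history) + sigma * stdev(history)

graph = BehaviorGraph()
for day_count in [330, 345, 338, 342, 351]:  # a user's normal daily API volume
    graph.observe("user/anna.schmidt", "s3/reports-bucket", day_count)

graph.is_anomalous("user/anna.schmidt", 340)    # normal day -> False
graph.is_anomalous("user/anna.schmidt", 13600)  # ~40x baseline -> True
```

The same pivot operation is what the investigation console exposes interactively: start from one node, enumerate its edges, repeat.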

The investigation interface lets security analysts query this graph interactively. You can start from a GuardDuty finding, expand to all entities the flagged IP address interacted with, then pivot to the full API call history of any IAM role that IP authenticated as, then trace all S3 buckets that role accessed, all within a time window you specify. Detective surfaces volume-based anomalies (this IP made 40x its baseline number of S3 API calls), geolocation anomalies (this user authenticated from an unusual country), and timing anomalies (this credential was used at 3 AM for the first time in its history).

The investigation data is stored in a managed findings database maintained by AWS. Organizations pay for the volume of data ingested per month, with Detective retaining this data for a configurable period up to one year.

GDPR Exposure Point 1: Behavioral Graphs as Personal Data Under Art.4(1)

GDPR Art.4(1) defines personal data as any information relating to an identified or identifiable natural person. An identifiable person is one who can be identified, directly or indirectly, by reference to an identifier such as a name, identification number, location data, online identifier, or factors specific to their physical, professional, or mental identity.

Every node in the AWS Detective behavior graph representing an IAM user or IAM role directly maps to a natural person: the employee, contractor, or developer whose credentials correspond to that identity. CloudTrail records the IAM principal name in every API call log, and IAM user names are typically correlated with real people (firstname.lastname@company.com, employee ID numbers, or similar identifiers).
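A trimmed CloudTrail management event makes the linkage concrete (field names follow the CloudTrail record format; the values here are fabricated):

```python
import json

# Trimmed CloudTrail management event (fabricated values, real field names)
sample_event = {
    "eventTime": "2026-03-12T09:15:04Z",
    "eventName": "GetObject",
    "eventSource": "s3.amazonaws.com",
    "sourceIPAddress": "203.0.113.42",
    "userIdentity": {
        "type": "IAMUser",
        "userName": "anna.schmidt",  # direct identifier of a natural person
        "arn": "arn:aws:iam::123456789012:user/anna.schmidt",
    },
}

def direct_identifiers(event: dict) -> dict:
    """Fields that tie a single log record to an identifiable natural person."""
    identity = event.get("userIdentity", {})
    return {
        "user_name": identity.get("userName"),
        "arn": identity.get("arn"),                  # embeds the user name
        "source_ip": event.get("sourceIPAddress"),   # online identifier under Art.4(1)
    }

print(json.dumps(direct_identifiers(sample_event), indent=2))
```

Every record Detective ingests carries at least these identifiers, so every edge in the behavior graph is attributable to a person.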

The behavioral graph is personal data. When Detective builds a profile showing that IAM user anna.schmidt authenticated from Frankfurt at 09:15 every weekday for six months, that this user typically accesses five specific S3 buckets, that her API call volume averages 340 requests per day, and that she has never authenticated outside Germany — this constitutes a detailed personal data profile under Art.4(1).

This profiling is not incidental. It is Detective's core function. The behavioral baseline exists precisely because it is associated with an individual, and its anomaly detection value derives entirely from its association with a specific person's normal behavior patterns.

Under GDPR Art.6, any processing of personal data requires a legal basis. For employee behavioral monitoring data of this depth and retention, legitimate interest (Art.6(1)(f)) requires a balancing test demonstrating that the legitimate interest outweighs the data subject's fundamental rights. A one-year behavioral profile of every API call an employee ever made is unlikely to survive this balancing test without a compelling documented justification — particularly given that the processing is continuous rather than triggered by specific incident response needs.

GDPR Exposure Point 2: Art.9 Risk — Inferring Special Category Data from Behavioral Patterns

GDPR Art.9 prohibits processing of special category personal data — health data, biometric data, racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, sexual orientation — without explicit legal basis.

Behavioral graph data can reveal special category attributes indirectly. IP-level analysis shows which external services employees access from the corporate network. An IAM role that makes API calls to S3 buckets named "mental-health-platform-data" or "trade-union-communications" reveals sensitive affiliations through access patterns. An employee whose login times correlate with religious holidays reveals religious practice. Network flow data showing connections to medical information services, legal aid organizations, or political campaign infrastructure can expose sensitive personal attributes that the employee never disclosed to their employer.

Detective aggregates exactly this type of behavioral signal — not because it targets special category data, but because comprehensive behavioral graphs necessarily capture the full pattern of organizational activity including patterns that reveal special category attributes.

The Art.9 exposure point is not that Detective explicitly processes health records or political opinions. It is that the behavioral profiles Detective builds are sufficiently detailed that they constitute indirect processing of special category data under Art.9's prohibition on processing information that "relates to" sensitive attributes. European Data Protection Authorities have taken an expansive view of what "relates to" means in the context of behavioral profiling: any data that, combined with other data points, can reveal a special category attribute falls within Art.9's scope.

GDPR Exposure Point 3: Art.5(1)(e) Storage Limitation — One Year of Behavioral History

GDPR Art.5(1)(e) requires that personal data be kept in a form that permits identification of data subjects for no longer than necessary for the purposes for which the personal data is processed (storage limitation principle).

AWS Detective's default retention period is one year. Organizations can reduce this, but the service is architected around long retention horizons: the behavioral baselines that power anomaly detection become more accurate with longer history, and forensic investigations often require tracing back through months of activity to understand the full scope of a compromise.

The tension with Art.5(1)(e): Security monitoring has a legitimate purpose for processing behavioral data, but the purpose is narrowly defined — detecting and investigating security incidents. Retaining every employee's complete behavioral profile for twelve months regardless of whether any incident occurred cannot be justified as "no longer than necessary" for the security monitoring purpose. The necessity standard requires demonstrating that historical data from non-incident periods was needed for the security purpose, not merely that it might become useful in future incident scenarios.
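One way to operationalize that necessity standard — a sketch of a policy you would implement yourself, not a Detective feature — is a short rolling window plus explicit, documented retention holds for incident periods:

```python
from datetime import datetime, timedelta, timezone

ROLLING_WINDOW = timedelta(days=30)  # default retention absent any incident

def should_retain(record_time: datetime, now: datetime, incident_windows: list) -> bool:
    """
    Retain a behavioral record only while it is inside the rolling window
    or covered by a documented incident hold, given as (start, end) pairs.
    """
    if now - record_time <= ROLLING_WINDOW:
        return True
    return any(start <= record_time <= end for start, end in incident_windows)

now = datetime(2026, 5, 1, tzinfo=timezone.utc)
incidents = [(datetime(2026, 1, 10, tzinfo=timezone.utc),
              datetime(2026, 1, 20, tzinfo=timezone.utc))]

should_retain(datetime(2026, 4, 20, tzinfo=timezone.utc), now, incidents)  # True: in window
should_retain(datetime(2026, 1, 15, tzinfo=timezone.utc), now, incidents)  # True: incident hold
should_retain(datetime(2026, 2, 15, tzinfo=timezone.utc), now, incidents)  # False: neither
```

Under this model, retention beyond the rolling window requires a documented incident — exactly the justification Art.5(1)(e) asks for.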

This is the same tension that applies to extended CCTV retention: security cameras have a legitimate purpose, but retaining all footage for twelve months when incidents requiring review are rare cannot satisfy the storage limitation principle.

For organizations subject to German co-determination law (Betriebsverfassungsgesetz), employee behavioral monitoring beyond what is strictly necessary for documented security purposes requires works council consultation. A one-year behavioral baseline for all employees will typically trigger this requirement.

Art.5(1)(e) also interacts with Art.17 right to erasure: if a former employee submits an erasure request, Detective's behavioral graph contains a year of that person's activity. Erasure from a managed graph database where data points are interleaved with other entities' data is technically complex — AWS does not provide a per-subject erasure mechanism for Detective data.

GDPR Exposure Point 4: CLOUD Act — Security Investigation Data as Law Enforcement Target

The US CLOUD Act (Clarifying Lawful Overseas Use of Data Act, 2018) allows US law enforcement to compel US providers to produce data stored on their infrastructure regardless of the physical location of servers or the nationality of data subjects.

AWS is a US provider. AWS Detective data is stored on AWS infrastructure. A valid CLOUD Act request — which does not require notification to the data subject or the data controller — can compel disclosure of the full Detective behavior graph, including one year of API call history, network connections, authentication events, and behavioral profiles for all IAM principals.

What this means for security investigation data specifically: A CLOUD Act request targeting your AWS account gains access not just to your application data, but to the forensic reconstruction of everything that happened in your environment. Detective's behavior graph is the most complete picture of internal organizational activity available — it shows who accessed what, from where, at what time, for the entire retention period. For a US law enforcement investigation targeting your organization or an individual employee, Detective data represents a comprehensive surveillance record that your organization never intended to create for law enforcement purposes.

The risk is not hypothetical. CLOUD Act requests targeting corporate infrastructure in ongoing investigations (antitrust, sanctions enforcement, export control, trade secret cases) regularly encompass cloud storage and log data. An organization running AWS Detective in an investigation-heavy regulatory environment is building a one-year forensic dossier on itself and its employees that is accessible to US law enforcement without EU judicial authorization.

GDPR Art.48 prohibits transfers of personal data to foreign courts or authorities unless based on an international agreement such as a mutual legal assistance treaty. Responding to a CLOUD Act order by producing AWS Detective data would likely violate Art.48 — but the CLOUD Act order goes to AWS, not to the data controller. AWS's response to the order does not require the controller's consent or knowledge.

GDPR Exposure Point 5: Art.5(1)(c) Data Minimization — No Selective Collection

GDPR Art.5(1)(c) requires that personal data be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed (data minimization).

AWS Detective does not offer selective data collection by entity type, user category, or sensitivity level. When you enable Detective, it ingests all CloudTrail events, all VPC Flow Logs, and all GuardDuty findings for the enrolled accounts. There is no mechanism to configure Detective to collect network data for service accounts but exclude human user accounts, or to capture authentication events while excluding API call content.

Data minimization requires purpose limitation: behavioral monitoring for security purposes means collecting what is necessary to detect and investigate security incidents. It does not mean collecting every behavioral data point for every principal in the environment for the maximum retention period, on the theory that any of it might become relevant in a future investigation.

The analogy is wiretapping: law enforcement is permitted to conduct targeted communications surveillance with judicial authorization, not to continuously record all communications from everyone in a building on the premise that some of them might be relevant to a future investigation. GDPR Art.5(1)(c) applies a similar proportionality requirement to data collection for security monitoring.

For organizations with mixed-sensitivity workforces — contractors, third-party integrators, employees with access to special category data — the inability to scope Detective's collection to appropriate populations makes it difficult to implement a proportionate monitoring program.
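By contrast, a self-managed pipeline can apply exactly this scoping at ingest time. An illustrative filter (the principal-category mapping and the event allowlist are assumptions you would define and maintain yourself):

```python
# Illustrative ingest-time scoping that Detective does not offer:
# collect fully for machine identities, but only security-relevant
# events for human principals.
MONITORING_SCOPE = {
    "service-account": "full",        # machine identities: full collection
    "employee": "security-relevant",  # humans: auth and privilege events only
    "contractor": "security-relevant",
}

SECURITY_RELEVANT_EVENTS = {"ConsoleLogin", "AssumeRole", "CreateAccessKey",
                            "PutUserPolicy", "AttachRolePolicy"}

def should_ingest(event: dict, principal_category: str) -> bool:
    """Apply Art.5(1)(c) scoping: collect only what the category's purpose requires."""
    scope = MONITORING_SCOPE.get(principal_category, "security-relevant")
    if scope == "full":
        return True
    return event.get("eventName") in SECURITY_RELEVANT_EVENTS

should_ingest({"eventName": "GetObject"}, "employee")         # False: routine data access
should_ingest({"eventName": "ConsoleLogin"}, "employee")      # True: authentication event
should_ingest({"eventName": "GetObject"}, "service-account")  # True: full scope
```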

GDPR Exposure Point 6: Cross-Account Aggregation and Art.26 Joint Controller Obligations

AWS Detective supports multi-account investigation through a delegated administrator model: a central administrator account can enroll member accounts, and Detective aggregates their behavioral data into a unified organization-level behavior graph. The administrator can investigate entities and events across all member accounts from a single console.

The Art.26 joint controller implication: In a multi-entity corporate structure — a holding company with operating subsidiaries, a SaaS platform with multiple subsidiary business units, a corporate group with EU subsidiaries and a US parent — different legal entities have different data protection obligations. When the US parent's AWS account acts as the Detective administrator for EU subsidiary member accounts, the US parent gains access to behavioral profiles of EU subsidiary employees.

GDPR Art.26 requires joint controllers to determine their respective responsibilities for compliance with GDPR obligations through a transparent arrangement. The existing AWS multi-account structures are typically governed by AWS Organizations policies, not by Art.26 joint controller agreements covering behavioral data access.

The practical consequence: a US parent company with Detective administrator access to EU subsidiary accounts can query the full one-year behavioral history of EU employees from the administrator console, with no requirement for the EU subsidiary's DPO to be involved or notified, and with no Art.26 agreement allocating GDPR responsibility for that access.

In an M&A context, if a US-headquartered acquirer gains access to the Detective administrator account of an acquired EU company as part of the transaction, they immediately gain access to a year of behavioral data on EU employees — creating Art.46 data transfer implications (behavioral data of EU employees flowing to US jurisdiction) that may not have been addressed in the acquisition GDPR due diligence.

EU Alternatives for Security Investigation and Behavioral Analytics

Organizations that need Detective-equivalent security investigation capabilities without CLOUD Act exposure have mature, well-maintained EU-deployable options:

Elastic SIEM with UEBA is the most functionally comparable alternative. Elastic Security, deployed on Elasticsearch running in EU infrastructure, provides entity behavior analytics (UEBA) using ML models that build behavioral baselines for users, hosts, and processes. It ingests CloudTrail, VPC Flow Logs, and custom log sources. The investigation interface supports graph-based exploration and pivot queries similar to Detective. Elastic's core is free to self-host, with ML-based anomaly detection available in the paid subscription tiers.

OpenSearch Security Analytics (Apache 2.0) provides SIEM capabilities including correlation rules, threat detection, and investigation workflows built on OpenSearch. It lacks Detective's ML-powered behavioral baselines but supports rule-based detection and investigation across log sources. AWS built the Security Analytics plugin for OpenSearch, its fork of Elasticsearch, so the data model is familiar for teams migrating from AWS-managed OpenSearch.

Wazuh is an open-source SIEM and XDR platform that provides agent-based and agentless monitoring, behavioral rule matching, MITRE ATT&CK framework mapping, and incident investigation capabilities. Wazuh is deployed as self-hosted infrastructure and supports integration with cloud provider logs including CloudTrail and VPC Flow Logs via its cloud security module. The Wazuh indexer (based on OpenSearch) provides the storage layer for behavioral analytics.

TheHive (open-source, AGPL) combined with Cortex provides a security incident response platform with case management, observable analysis, and responder automation. TheHive does not provide Detective-equivalent behavioral baselines but excels at the investigation workflow and analyst collaboration aspects that Detective's investigation console supports.

OpenCTI provides threat intelligence management and can correlate investigation findings with external threat intelligence, supporting the same enrichment workflows that Detective performs using GuardDuty findings.

Phase 1: Replicate Ingestion and Behavioral Baselines

A reference architecture for EU-hosted behavioral security analytics using Elastic SIEM:

# Ingest CloudTrail to Elastic SIEM via Filebeat
# filebeat.yml configuration for CloudTrail ingestion

# filebeat.inputs:
# - type: aws-cloudtrail
#   queue_url: https://sqs.eu-central-1.amazonaws.com/[account]/cloudtrail-events
#   credential_profile_name: security-audit

# Equivalent Python ingestor for environments without Filebeat
import gzip
import json

import boto3
from elasticsearch import Elasticsearch

def fetch_cloudtrail_from_s3(bucket: str, key: str) -> list:
    """Download and decompress one CloudTrail log file, returning its Records."""
    s3 = boto3.client("s3", region_name="eu-central-1")
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(gzip.decompress(obj["Body"].read())).get("Records", [])

def ingest_cloudtrail_events(sqs_queue_url: str, es_client: Elasticsearch):
    sqs = boto3.client("sqs", region_name="eu-central-1")

    while True:
        response = sqs.receive_message(
            QueueUrl=sqs_queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
        )

        for message in response.get("Messages", []):
            body = json.loads(message["Body"])
            # CloudTrail SNS notifications carry the bucket name and object keys
            notification = json.loads(body["Message"])
            bucket = notification["s3Bucket"]

            for object_key in notification.get("s3ObjectKey", []):
                for event in fetch_cloudtrail_from_s3(bucket, object_key):
                    # normalize_to_ecs (defined below) maps the event to
                    # ECS (Elastic Common Schema) format
                    es_client.index(
                        index="logs-aws.cloudtrail-default",
                        document=normalize_to_ecs(event),
                    )

            sqs.delete_message(
                QueueUrl=sqs_queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )

Behavioral baseline with Elastic ML:

# Define ML job for user behavior baseline in Elastic
ml_job_config = {
    "job_id": "cloudtrail-user-anomaly",
    "description": "Detect anomalous API call volumes per IAM user",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {
                "detector_description": "High API call count by user",
                "function": "high_count",
                "partition_field_name": "user.name",
            },
            {
                "detector_description": "Rare new IP by user",
                "function": "rare",
                "by_field_name": "source.ip",
                "partition_field_name": "user.name",
            },
        ],
        "influencers": ["user.name", "source.ip", "aws.cloudtrail.user_identity.arn"],
    },
    "data_description": {
        "time_field": "@timestamp",
    },
    "model_snapshot_retention_days": 90,  # GDPR-compliant retention
}

# Apply GDPR-compliant retention to the underlying index
index_lifecycle_policy = {
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                "actions": {"rollover": {"max_size": "50gb", "max_age": "30d"}},
            },
            "warm": {
                "min_age": "30d",
                "actions": {"shrink": {"number_of_shards": 1}},
            },
            "delete": {
                "min_age": "90d",  # 3 months vs Detective's 1 year — proportionate to security purpose
                "actions": {"delete": {}},
            },
        }
    }
}
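Registering both against the cluster uses two REST endpoints: PUT _ml/anomaly_detectors/<job_id> and PUT _ilm/policy/<name>. A minimal stdlib sketch that builds the requests (sending them is left to your HTTP client; the cluster URL and policy name are placeholders):

```python
import json

def build_put_requests(base_url: str, ml_job: dict, ilm_policy: dict, policy_name: str):
    """Build (method, url, body) tuples for registering the ML job and ILM policy."""
    return [
        # job_id goes in the URL path, not the request body
        ("PUT", f"{base_url}/_ml/anomaly_detectors/{ml_job['job_id']}",
         json.dumps({k: v for k, v in ml_job.items() if k != "job_id"})),
        ("PUT", f"{base_url}/_ilm/policy/{policy_name}",
         json.dumps(ilm_policy)),
    ]

requests = build_put_requests(
    "https://siem.internal.example:9200",  # placeholder EU-hosted cluster URL
    {"job_id": "cloudtrail-user-anomaly", "analysis_config": {}},
    {"policy": {"phases": {}}},
    "cloudtrail-90d-retention",
)
```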

Implementing proportionate retention for behavioral data:

# Pseudonymize IAM user identifiers in security logs before indexing
# Preserves anomaly detection capability while reducing direct identifiability
import hashlib
import hmac
import os

PSEUDONYMIZATION_KEY = os.environ["LOG_PSEUDONYMIZATION_KEY"]  # From EU-jurisdiction secrets manager

def pseudonymize_user_identity(user_name: str, account_id: str) -> str:
    """
    One-way pseudonymization of IAM user names for behavioral analytics.
    The mapping table is stored separately with restricted access.
    This supports Art.5(1)(c) proportionality for behavioral baselines.
    """
    identifier = f"{account_id}:{user_name}"
    pseudonym = hmac.new(
        PSEUDONYMIZATION_KEY.encode(),
        identifier.encode(),
        hashlib.sha256,
    ).hexdigest()[:16]
    return f"usr_{pseudonym}"

def normalize_to_ecs(cloudtrail_event: dict) -> dict:
    raw_user = cloudtrail_event.get("userIdentity", {}).get("userName", "unknown")
    raw_arn = cloudtrail_event.get("userIdentity", {}).get("arn", "")
    account_id = cloudtrail_event.get("recipientAccountId", "")

    return {
        "@timestamp": cloudtrail_event["eventTime"],
        "event.action": cloudtrail_event["eventName"],
        "event.provider": cloudtrail_event["eventSource"],
        "event.outcome": "failure" if cloudtrail_event.get("errorCode") else "success",
        "source.ip": cloudtrail_event.get("sourceIPAddress"),
        "cloud.region": cloudtrail_event.get("awsRegion"),  # geo enrichment belongs in an ingest pipeline
        "user.name": pseudonymize_user_identity(raw_user, account_id),  # Pseudonymized
        # The ARN embeds the user name, so it must be pseudonymized as well
        "aws.cloudtrail.user_identity.arn": pseudonymize_user_identity(raw_arn, account_id),
        "aws.cloudtrail.event_id": cloudtrail_event["eventID"],
    }

Phase 2: Validate Equivalent Detection Coverage

Run Elastic SIEM in parallel with Detective for four to six weeks, confirming that the rule-based detectors and ML anomaly jobs fire on the same events that Detective surfaces as high-severity findings. Detective's detection coverage is documented in the AWS documentation as a set of finding types — these map directly to Elastic detection rules available in the Elastic Security detection rules repository (over 800 pre-built rules covering MITRE ATT&CK techniques).
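Parity during the parallel run can be measured mechanically — a sketch, assuming you export Detective findings and Elastic alerts as sets of (entity, hour-bucket) pairs:

```python
def detection_coverage(detective_findings: set, elastic_alerts: set) -> dict:
    """
    Compare parallel-run output. Each element is an (entity, hour_bucket) pair;
    coverage is the share of Detective findings also surfaced by Elastic.
    """
    matched = detective_findings & elastic_alerts
    return {
        "coverage": len(matched) / len(detective_findings) if detective_findings else 1.0,
        "missed": sorted(detective_findings - elastic_alerts),  # candidates for new rules
        "extra": sorted(elastic_alerts - detective_findings),   # detections Detective lacked
    }

detective = {("role/deploy", "2026-04-03T14"), ("user/usr_9f2c", "2026-04-03T15")}
elastic = {("role/deploy", "2026-04-03T14")}
detection_coverage(detective, elastic)["coverage"]  # 0.5
```

The "missed" set drives the rule-writing backlog; cut over once coverage holds steady at your target for the full validation window.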

Gaps in detection coverage typically appear in Detective's cross-entity pivot capabilities: Detective makes it easy to ask "show me every IP address this compromised role ever connected to." An Elastic ES|QL query can replicate this (KQL alone can filter but not aggregate):

FROM logs-aws.cloudtrail-*
| WHERE event.provider == "iam.amazonaws.com"
    AND user.name == "usr_a1b2c3d4e5f6g7h8"  // pseudonymized role
    AND event.outcome == "success"
    AND @timestamp >= TO_DATETIME("2026-01-01T00:00:00Z")
| STATS connection_count = COUNT(*) BY source.ip
| SORT connection_count DESC

Phase 3: Implement Data Subject Rights Mechanisms

Unlike Detective's managed graph database, a self-hosted Elastic deployment gives you mechanisms to fulfill Art.17 erasure requests: delete by pseudonym (which removes all behavioral records associated with a specific person without affecting records for other entities), index lifecycle management with per-subject retention categories, and the ability to re-pseudonymize if the key rotation schedule requires it.

from datetime import datetime, timezone

def process_erasure_request(user_name: str, account_id: str, es_client: Elasticsearch):
    """
    GDPR Art.17 erasure: delete all behavioral records for a specific person.
    Uses the pseudonym to delete without requiring a full log scan.
    """
    pseudonym = pseudonymize_user_identity(user_name, account_id)

    response = es_client.delete_by_query(
        index="logs-aws.cloudtrail-*",
        query={"term": {"user.name": pseudonym}},
    )

    return {
        "deleted": response["deleted"],
        "subject": user_name,  # Only in the response — never logged
        "pseudonym_used": pseudonym,
        "erasure_timestamp": datetime.now(timezone.utc).isoformat(),
    }

Cost Comparison

AWS Detective pricing is based on the volume of data ingested per month. For a mid-size AWS environment with 10 accounts, 200 EC2 instances, and moderate API activity, Detective typically costs $300-800 per month depending on VPC Flow Log volume.

A self-hosted Elastic SIEM deployment on EU cloud infrastructure sized for equivalent log volume (typically 3-5 nodes with 16 cores and 64GB RAM each) costs €600-1,200 per month at EU cloud provider pricing — comparable to Detective, but with the addition of SOAR capabilities (TheHive/Cortex), threat intelligence (OpenCTI), and full control over retention and pseudonymization policies that Detective does not offer.

For organizations already running Elasticsearch or OpenSearch for application search or analytics, the incremental cost of adding SIEM workloads to existing clusters is substantially lower. Security analytics workloads can often share cluster resources with application workloads during off-peak hours.

The operational overhead is higher than Detective (you manage the infrastructure), but for organizations with a GDPR-conscious security team, the tradeoff is favorable: you gain Art.17 erasure capability, proportionate retention configuration, pseudonymization controls, and data that never leaves EU jurisdiction.

What sota.io Provides

sota.io is an EU-native PaaS platform built specifically for developers and organizations that cannot afford CLOUD Act exposure. Your applications, data, and infrastructure run exclusively in EU jurisdiction without US-parent company access.

For security analytics workloads, this means you can deploy Elastic SIEM, Wazuh, OpenSearch Security Analytics, TheHive, or OpenCTI on sota.io infrastructure without sending behavioral investigation data, entity profiles, or incident forensics outside EU jurisdiction. You get Detective-equivalent security investigation capabilities — behavioral baselines, anomaly detection, cross-entity pivoting, incident timeline reconstruction — with full data sovereignty.

The key difference from AWS Detective: when the behavioral baseline identifies anomalous API activity for an IAM user, that investigation data stays in your infrastructure. There is no US-jurisdiction service with access to the behavioral profile. No CLOUD Act order can reach your security investigation data because it never touched US infrastructure. Art.17 erasure requests can be fulfilled by deleting pseudonymized records from your own Elasticsearch indices — a deterministic, auditable operation rather than a support ticket to a managed service provider.

sota.io handles the infrastructure so your security team can focus on building detection coverage, tuning behavioral models, and running incident investigations — not on managing Elasticsearch cluster upgrades or configuring multi-region replication.

Conclusion

AWS Detective solves a real problem: security investigations require correlating months of activity data across dozens of data sources, and doing this without a managed graph database is operationally expensive. The GDPR tension is not that Detective performs this analysis badly — it is that the behavioral data Detective accumulates is among the most sensitive personal data an organization processes, and storing it on US-jurisdiction infrastructure for up to one year creates compounding exposure.

The six exposure points — behavioral graphs as personal data under Art.4(1), indirect Art.9 special category data inference from behavioral patterns, Art.5(1)(e) storage limitation for year-long behavioral baselines, CLOUD Act access to forensic security investigation data, Art.5(1)(c) data minimization obstacles from non-selective collection, and cross-account Art.26 joint controller complications — are structural consequences of running behavioral security analytics through US-jurisdiction managed infrastructure.

EU-native alternatives (Elastic SIEM with UEBA, Wazuh, OpenSearch Security Analytics, TheHive, OpenCTI) address these exposure points by keeping behavioral analytics infrastructure, entity profiles, and investigation data in EU jurisdiction where CLOUD Act authority does not apply. The migration is technically comparable to Detective's operational model — both require configuration, both produce behavioral baselines, both support incident investigation workflows — but the self-hosted path gives you the data subject rights mechanisms, proportionate retention controls, and pseudonymization capabilities that managed Detective cannot provide.

For organizations building GDPR-compliant security programs in 2026, the question is not whether to perform behavioral security analytics — Art.32's requirement for appropriate technical and organizational measures makes security monitoring necessary. The question is whether your security investigation infrastructure creates a one-year behavioral dossier on your employees that is accessible to US law enforcement without your knowledge or consent.


sota.io is an EU-native PaaS platform. Deploy Elastic SIEM, Wazuh, OpenSearch Security Analytics, or any open-source security investigation stack on infrastructure that stays under EU jurisdiction — no US-parent company, no CLOUD Act exposure.

EU-Native Hosting

Ready to move to EU-sovereign infrastructure?

sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.