AWS Security Lake EU Alternative 2026: OCSF Security Data, Behavioral Profiling, and GDPR Under the CLOUD Act
Post #746 in the sota.io EU Compliance Series
AWS Security Lake is Amazon's managed service for centralizing security telemetry from across an organization's AWS environment, SaaS providers, on-premises infrastructure, and cloud sources into a purpose-built data lake. It ingests events from AWS services including CloudTrail, VPC Flow Logs, Route 53 Resolver query logs, Security Hub findings, Lambda execution logs, S3 data events, WAF logs, and EKS audit logs. It normalizes all of this data to the Open Cybersecurity Schema Framework (OCSF) standard and stores it as Apache Parquet files in Amazon S3, where third-party analytics tools and SIEMs can query it via AWS Lake Formation.
Security Lake is designed to solve a genuine problem: security telemetry is generated in different formats by dozens of sources, making cross-source analysis difficult. OCSF normalization enables security engineers to query across CloudTrail API calls, VPC network flows, Kubernetes audit events, and WAF requests using a unified schema. Subscribers including Splunk, IBM QRadar, Sumo Logic, and Databricks connect to Security Lake via S3 to pull normalized event data into their own analytics environments.
The critical tension: Security Lake's fundamental value proposition — standardizing security events into a machine-readable, analytics-optimized format — creates a structural GDPR problem. Security events are not neutral technical logs. They contain personal data about the humans who generate those events: IP addresses, IAM user identifiers, Kubernetes service account names, API operation patterns, and behavioral sequences that reveal what employees do on organizational systems. When this personal data is normalized to a structured, queryable schema and stored on US-jurisdiction infrastructure, CLOUD Act requests can compel disclosure of a machine-readable, analytically optimized profile of your employees' and users' digital behavior — with the OCSF normalization AWS built for your security team working equally well for the requesting authority's analysts.
What AWS Security Lake Actually Does
When you enable Security Lake for an AWS region, it creates an S3 bucket in your account to serve as the data lake. AWS services configured as sources begin writing security events to this bucket in OCSF-compliant Parquet format. Security Lake manages the partitioning, lifecycle, and metadata catalog (via AWS Glue) that makes this data queryable.
Source integration covers the breadth of AWS security telemetry. CloudTrail management events capture every API call made in your account — who called what operation, from which IAM principal, at what time, with what result. VPC Flow Logs capture every network connection attempt to and from resources in your VPC — source IP, destination IP, port, protocol, bytes transferred, accept/reject decision. Route 53 Resolver query logs capture every DNS lookup made by resources in your environment — query domain, query type, response, the IP address that made the query. EKS audit logs capture every Kubernetes API operation — pod creation, service account token usage, RBAC authorization decisions.
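The density of personal data in these sources is easiest to see at the field level. Below is a minimal sketch of parsing one VPC Flow Log record (the default version-2, space-delimited format) into named fields; the sample values are invented for illustration:

```python
# Default VPC Flow Log (version 2) fields, in on-the-wire order.
FLOW_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_record(line: str) -> dict[str, str]:
    """Split one space-delimited flow log record into named fields."""
    return dict(zip(FLOW_FIELDS, line.split()))

record = parse_flow_record(
    "2 123456789012 eni-0a1b2c3d 10.0.4.23 52.94.133.131 "
    "49152 443 6 10 840 1717200000 1717200060 ACCEPT OK"
)
# Every accepted or rejected connection carries the source IP, which in a
# corporate VPC is an identifier for a specific employee's device.
print(record["srcaddr"], record["dstaddr"], record["action"])
```

Every one of the fourteen fields is technical data, but `srcaddr` plus a DHCP lease table is a record of a person.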
Third-party source integration uses the Security Lake OCSF ingestion API. SaaS providers can push security events in OCSF format to your Security Lake. On-premises security tools can write OCSF events via the API. This creates a single normalized store for security telemetry that spans AWS, SaaS, and on-premises environments.
Subscribers are the consumers of this data. An analytics subscriber configures access to the S3 data via Lake Formation, enabling their SIEM or analytics platform to query Security Lake data directly. Security Lake manages the IAM permissions and Lake Formation grants that control subscriber access.
GDPR Exposure Point 1: The OCSF Normalization Paradox — Structured Surveillance at Scale
OCSF exists specifically to make security event data more accessible to analytics tools. The schema defines standardized field names, data types, and event categories so that a query for "authentication events involving failed credential use" returns normalized results across CloudTrail, VPC Flow Logs, Okta events, and on-premises Active Directory logs — all using the same field structure.
The paradox: The same standardization that makes Security Lake valuable for your security team makes the personal data within it maximally accessible for bulk analysis by third parties. Without OCSF normalization, extracting personal data from diverse security logs requires understanding multiple proprietary formats, writing format-specific parsers, and reconciling inconsistent field names across sources. With Security Lake, the personal data in security events — user identifiers, IP addresses, behavioral sequences — is already normalized, typed, partitioned for efficient querying, and stored in Parquet format that can be loaded directly into analytical systems.
A CLOUD Act request targeting your Security Lake does not require the requesting authority to decode proprietary log formats or write custom parsers. The data is already in a schema-documented, analytically optimized format. OCSF field mappings are publicly documented. The requesting authority's analysts can query actor.user.name, src_endpoint.ip, activity_name, and time across your entire normalized security event history without any reverse engineering.
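Because the field paths are uniform across sources, that query really is trivial. A hypothetical sketch over two in-memory OCSF events (names and timestamps invented for illustration):

```python
def query_actor_activity(events: list[dict], user_name: str, since_ms: int) -> list[dict]:
    """Return all normalized events for one actor after a timestamp.
    The same filter works whether the source was CloudTrail, VPC Flow
    Logs, or a third-party tool, because the field paths are identical."""
    return [
        e for e in events
        if e.get("actor", {}).get("user", {}).get("name") == user_name
        and e.get("time", 0) >= since_ms
    ]

events = [
    {"actor": {"user": {"name": "developer-alice"}}, "activity_name": "GetObject",
     "src_endpoint": {"ip": "10.0.4.23"}, "time": 1717200000000},
    {"actor": {"user": {"name": "developer-bob"}}, "activity_name": "PutObject",
     "src_endpoint": {"ip": "10.0.4.24"}, "time": 1717200001000},
]
hits = query_actor_activity(events, "developer-alice", 0)
print(len(hits))  # one normalized event for this actor
```

No parser, no format reverse engineering: the schema documentation is the query manual.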
This creates a structurally different GDPR risk profile than security data stored in proprietary or unstructured formats. The CLOUD Act risk of unstructured log data requires effort to exploit. Security Lake's OCSF normalization eliminates that friction — by design, for legitimate analytics purposes, but with the side effect of making CLOUD Act disclosure of personal data in security events trivially usable.
GDPR Exposure Point 2: IP Addresses and User Identifiers as Personal Data
The Court of Justice of the EU confirmed in Breyer (Case C-582/14) that dynamic IP addresses are personal data when the party holding them has legal means to identify the natural person behind the IP address — for example, through ISP disclosure. In enterprise contexts, where VPC Flow Logs capture internal IP addresses assigned to specific employees' devices or workstations, the personal data status is unambiguous.
Security Lake aggregates personal data in this sense from multiple sources simultaneously:
VPC Flow Logs capture src_endpoint.ip and dst_endpoint.ip for every network flow. In a corporate environment, these internal IP addresses map to specific employee devices via DHCP assignment records. A VPC Flow Log with a source IP of 10.0.4.23 is, when DHCP records are available, a record of a specific employee's network activity.
CloudTrail events capture the IAM principal that performed each API operation in actor.user.name and the actor.user.uid. IAM usernames are frequently human-readable identifiers that directly identify individuals — first.last@company.com as an IAM username, or human-readable role names like developer-alice. CloudTrail events containing these identifiers are records of named individuals' actions on organizational systems.
EKS audit logs capture Kubernetes service account tokens, pod names, and RBAC subjects. In development environments where developers run workloads under their own service accounts or where pod names encode developer identifiers, Kubernetes audit events contain personal data.
Route 53 Resolver logs capture the src_endpoint.ip that issued each DNS query. Combined with DHCP records, these become a chronological record of what external domains a specific employee's device attempted to resolve — browsing behavior, SaaS tool usage, potential shadow IT, and other behavioral data with GDPR implications under Art.5(1)(c) (data minimization) and Art.25 (data protection by design).
Art.5(1)(c) GDPR requires that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." Comprehensive network flow logs, full API call history, and complete DNS query logs for all employees may exceed what is "necessary" for the security monitoring purposes for which they are collected. When this data is stored on US-jurisdiction infrastructure, the personal data within it is accessible under CLOUD Act without the Art.5 minimization analysis having been applied to CLOUD Act disclosure as a purpose.
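One Art.25 mitigation, applicable on any of the platforms discussed here, is reducing identifier precision before events reach long-term storage. A minimal sketch, assuming a per-deployment secret key; the key value, token length, and /24 truncation width are illustrative choices, not a prescription:

```python
import hashlib
import hmac
import ipaddress

PSEUDONYM_KEY = b"rotate-me-per-retention-period"  # hypothetical secret, keep out of source control

def pseudonymize_user(name: str) -> str:
    """Replace a user identifier with a keyed, non-reversible token.
    HMAC rather than a plain hash, so the mapping cannot be rebuilt
    by hashing a dictionary of known usernames."""
    return hmac.new(PSEUDONYM_KEY, name.encode(), hashlib.sha256).hexdigest()[:16]

def truncate_ip(ip: str) -> str:
    """Coarsen an IPv4 address to its /24 network, keeping enough
    precision for network-level analytics."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(net.network_address)

print(pseudonymize_user("developer-alice"))
print(truncate_ip("10.0.4.23"))  # 10.0.4.0
```

Applied at ingest, this keeps the analytical shape of the data while shrinking what any later disclosure can reveal.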
GDPR Exposure Point 3: Behavioral Profiling Through Cross-Source Correlation
Security Lake's core capability is cross-source correlation. A security event sequence that spans a CloudTrail API call, a VPC Flow Log connection, a Route 53 query, and a Security Hub finding from a third-party tool can be correlated into a unified behavioral narrative using the OCSF schema's consistent actor, endpoint, and event identifiers.
This cross-source correlation creates employee behavioral profiles that go far beyond what any individual source system contains. CloudTrail alone shows API operations. VPC Flow Logs alone show network flows. Combined in Security Lake, they show: this IAM user (identifiable as a specific employee) made these API calls at these times, generating these network flows to these external endpoints, while issuing these DNS queries, with these authentication events preceding and following the access sequence.
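Concretely, the correlation is a filter-and-sort over the shared OCSF identifiers. A hypothetical sketch with three invented events from different source classes:

```python
def build_actor_timeline(events: list[dict], ip: str) -> list[tuple[int, str, str]]:
    """Merge events from any OCSF source sharing one source IP into a
    single chronological behavioral narrative."""
    timeline = [
        (e["time"], e.get("class_name", ""), e.get("activity_name", ""))
        for e in events
        if e.get("src_endpoint", {}).get("ip") == ip
    ]
    return sorted(timeline)

events = [
    {"time": 3, "class_name": "Network Activity", "activity_name": "Traffic",
     "src_endpoint": {"ip": "10.0.4.23"}},
    {"time": 1, "class_name": "API Activity", "activity_name": "AssumeRole",
     "src_endpoint": {"ip": "10.0.4.23"}},
    {"time": 2, "class_name": "DNS Activity", "activity_name": "Query",
     "src_endpoint": {"ip": "10.0.4.23"}},
]
for ts, cls, act in build_actor_timeline(events, "10.0.4.23"):
    # API call, then DNS lookup, then network flow: one person's sequence
    print(ts, cls, act)
```

Three isolated log lines become one ordered story about one device, and therefore one employee.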
GDPR Art.22 restricts decisions based solely on automated processing, including profiling, but the GDPR's treatment of profiling is not limited to that scenario: Art.4(4) defines profiling as any automated processing of personal data to evaluate personal aspects of a natural person, which triggers transparency, lawfulness, and fairness obligations even where no formal automated decision is made. The Security Lake data model, by design, creates queryable behavioral profiles of the humans whose actions generate security events.
For employee data, Art.88 GDPR and national implementing laws (the German Bundesdatenschutzgesetz §26, for example) impose specific requirements for employee data processing and monitoring. Comprehensive behavioral profiling of employee IT activity via Security Lake may require works council consultation in Germany, employee notification under Art.13/14, and a legitimate interest assessment under Art.6(1)(f) that explicitly addresses the behavioral profiling dimension.
When Security Lake stores these behavioral profiles on US-jurisdiction infrastructure, a CLOUD Act request could compel disclosure of a structured, cross-correlated behavioral profile of specific employees — including employees who are witnesses, whistleblowers, journalists, legal counsel, or otherwise sensitive from a fundamental rights perspective.
GDPR Exposure Point 4: Subscriber Access Creates Additional Transfer Vectors
Security Lake's subscriber model creates data flows that compound the CLOUD Act exposure.
Analytics subscribers access Security Lake data via AWS Lake Formation grants to S3. A Splunk Cloud deployment acting as a Security Lake subscriber copies normalized OCSF event data into Splunk's own infrastructure for indexing and analysis. Splunk is a US company; Splunk Cloud stores data on infrastructure subject to US jurisdiction. The CLOUD Act exposure applies to Security Lake's S3 storage and to Splunk Cloud's copy of the same normalized data.
IBM QRadar, Sumo Logic, Microsoft Sentinel, and other US-headquartered SIEM platforms that operate as Security Lake subscribers create the same duplication: a CLOUD Act request targeting any of these platforms can reach the normalized, personal-data-containing OCSF event stream that originated in your Security Lake.
Art.28 GDPR requires a Data Processing Agreement with each processor. Security Lake subscribers who receive personal data are processors — they process your security event data (which contains personal data about your employees and users) on your behalf. The processor chain from your Security Lake to each analytics subscriber requires Art.28 DPA coverage, and international transfer mechanisms under Chapter V GDPR for subscribers operating in third countries.
The CLOUD Act disclosure risk does not end at the Security Lake S3 boundary. It extends to every system that has received a copy of the normalized event data — including systems you may not have fully mapped as personal data processors.
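A practical first step is an explicit inventory of every system holding a copy of the normalized stream. A minimal sketch of such an inventory, with hypothetical entries:

```python
from dataclasses import dataclass

@dataclass
class DataCopy:
    system: str
    jurisdiction: str   # legal jurisdiction of the operator
    dpa_in_place: bool  # Art.28 DPA signed?

# Hypothetical inventory for one deployment.
copies = [
    DataCopy("Security Lake S3 bucket", "US (CLOUD Act)", True),
    DataCopy("Splunk Cloud subscriber", "US (CLOUD Act)", True),
    DataCopy("On-prem Graylog archive", "EU", True),
]

def cloud_act_exposed(inventory: list[DataCopy]) -> list[str]:
    """List every copy of the event stream reachable by a US request."""
    return [c.system for c in inventory if "US" in c.jurisdiction]

print(cloud_act_exposed(copies))
```

The point is not the three lines of code but the exercise: most organizations cannot produce this list for their security telemetry today.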
GDPR Exposure Point 5: Retention Policies vs. Art.5(1)(e) Storage Limitation
Security Lake lifecycle configuration controls how long raw security events are retained in the data lake. Default configurations often retain data for 12 months or longer to support forensic investigation of historical incidents. Security teams have legitimate reasons to want long retention windows — many advanced persistent threats have dwell times measured in months, and investigating a breach requires log data from before the attacker's initial access.
GDPR Art.5(1)(e) requires that personal data be "kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed." The "no longer than necessary" standard applies to the personal data embedded in security events — IP addresses, user identifiers, behavioral sequences — as distinct from the security telemetry considered as purely technical data.
EU regulators' guidance on employee monitoring and security logging (the Article 29 Working Party's Opinion 2/2017 on data processing at work, plus national DPA guidance) consistently emphasizes that log retention periods must be proportionate to the security purpose and that longer retention requires explicit justification. A 12-month Security Lake retention policy for VPC Flow Logs containing employee IP address data requires a proportionality analysis that most organizations have not formally documented.
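A retention policy is only as good as its enforcement mechanism. A minimal sketch of a class-specific retention check that any of the platforms discussed here could run over stored events; the windows shown are illustrative, not recommendations:

```python
from datetime import datetime, timedelta, timezone

# Illustrative per-class retention windows; real values come from the
# organization's documented proportionality analysis.
RETENTION = {
    "Network Activity": timedelta(days=90),   # flow logs: short window
    "API Activity": timedelta(days=365),      # CloudTrail: forensic window
}
DEFAULT_RETENTION = timedelta(days=180)

def is_expired(event: dict, now: datetime) -> bool:
    """True when an event has outlived its class-specific retention window."""
    window = RETENTION.get(event.get("class_name", ""), DEFAULT_RETENTION)
    event_time = datetime.fromtimestamp(event["time"] / 1000, tz=timezone.utc)
    return now - event_time > window

now = datetime(2026, 6, 1, tzinfo=timezone.utc)
old_flow = {"class_name": "Network Activity",
            "time": int(datetime(2026, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)}
print(is_expired(old_flow, now))  # True: 151 days exceeds the 90-day flow window
```

Differentiated windows per event class are exactly the structure an Art.5(1)(e) analysis produces: not one number for the whole lake, but a justified period per data category.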
For organizations subject to sector-specific retention obligations — financial services (MiFID II record-keeping, DORA), healthcare (national implementing laws), telecommunications — retention periods may be mandated by regulation. The intersection of mandated security log retention, GDPR storage limitation, and CLOUD Act disclosure risk creates a compliance trilemma: you may be legally required to retain security logs that contain personal data for periods during which CLOUD Act requests can reach them.
GDPR Exposure Point 6: AWS Glue Metadata Catalog as Additional Disclosure Surface
Security Lake uses AWS Glue for data catalog management and, in some configurations, for ETL transformations of custom source data. The Glue Data Catalog records table schemas, partition structure, and data location for Security Lake's S3 data. Glue ETL jobs transform incoming data to OCSF format.
The Glue metadata catalog is a disclosure surface that is easy to overlook. The catalog contains structural information about your security data — what sources are ingested, what event categories are present, the partition structure that reveals the time ranges and volume of your security telemetry. This structural metadata tells a requesting authority what security data exists and how to access it, before they even examine the event data itself.
Glue ETL job execution history records what transformations were applied to your data, including when custom source integrations processed external security event streams. The ETL job history can reveal third-party security tool integrations, SaaS source coverage, and the operational scope of your security monitoring program.
Under CLOUD Act, a request targeting an organization's AWS account could encompass both the Security Lake data and the Glue catalog that documents its structure — providing a complete picture of the organization's security monitoring scope alongside the underlying event data.
EU Alternatives for Security Data Lakes
The EU market offers multiple approaches to security analytics that preserve EU data sovereignty.
Wazuh is an open-source security platform combining SIEM, XDR, and CSPM capabilities. Wazuh agents deploy on endpoints and servers to collect security events; a Wazuh indexer (based on OpenSearch) stores and indexes the data; the Wazuh dashboard provides analysis and alerting. Deployed on EU-hosted infrastructure — bare metal, VMs, or a European cloud provider — Wazuh keeps all security telemetry under EU jurisdiction. Wazuh supports OCSF-compatible output and integrates with numerous third-party sources. EU organizations using Wazuh include central banks, healthcare systems, and defense contractors where data sovereignty is a hard requirement.
Logpoint is a Danish SIEM provider with EU headquarters and EU-operated infrastructure. Logpoint's SIEM and SOAR platform is designed from the ground up for GDPR compliance, with built-in data minimization controls, configurable retention policies, and audit trails for access to personal data within logs. The company's EU base and EU-infrastructure focus address both CLOUD Act and Schrems II concerns directly.
Sekoia.io is a French cybersecurity company offering a cloud-native SIEM/XDR platform hosted exclusively on European infrastructure (OVHcloud). Sekoia's platform ingests security events from AWS, Azure, SaaS providers, and on-premises sources — the same source breadth as Security Lake — but stores and processes all data within the EU. Their threat intelligence feeds are sourced from European partners and their own research team, avoiding the US-intelligence-community integration that raises concerns with US SIEM providers.
Elastic Security deployed on EU-managed Elasticsearch clusters provides a self-hosted SIEM and security analytics capability with OCSF support. Organizations that already use Elasticsearch can add Elastic Security agents, configure their existing AWS source integrations to write to a self-hosted Elastic cluster running on EU infrastructure, and build the same cross-source behavioral analysis capabilities as Security Lake without US jurisdiction exposure. Elastic's OCSF integration ensures compatibility with the same analytics tooling built for Security Lake.
Graylog is a German-origin open-source log management platform widely used in European enterprises for security log aggregation and analysis. Graylog supports structured log ingestion from AWS CloudWatch, CloudTrail, and other sources, and can be deployed on any EU infrastructure. Commercial Graylog Enterprise provides threat intelligence integrations and compliance reporting features relevant to GDPR Art.30 and Art.32 documentation requirements.
Apache Iceberg + OpenSearch on EU Infrastructure — For organizations that want the closest functional equivalent to Security Lake's data lake architecture with full EU data sovereignty, the open-source stack of Apache Iceberg (table format), Apache Spark or Trino (query engine), and OpenSearch (indexing and visualization) deployed on EU cloud or bare metal provides the same Parquet-based, OCSF-compatible storage and query capability as Security Lake without any US-jurisdiction cloud service involvement.
Implementing an EU-Sovereign Security Data Lake
The following example demonstrates building an OCSF-compatible security data lake using MinIO (S3-compatible object storage) and OpenSearch, deployed on EU infrastructure:
#!/usr/bin/env python3
"""EU-sovereign security data lake: OCSF event ingestion for AWS CloudTrail.

Replaces AWS Security Lake with an EU-hosted MinIO + OpenSearch stack.
"""
import gzip
import hashlib
import io
import json
from datetime import datetime, timezone
from typing import Any

import boto3
from opensearchpy import OpenSearch, helpers

# EU-hosted infrastructure. OpenSearch holds the hot, queryable index;
# the MinIO bucket is the Parquet archive tier (written by a separate
# job, not shown here).
MINIO_ENDPOINT = "https://minio.eu-dc.example.com"
MINIO_BUCKET = "security-lake-eu"
OPENSEARCH_HOST = "opensearch.eu-dc.example.com"
OPENSEARCH_PORT = 9200

os_client = OpenSearch(
    hosts=[{"host": OPENSEARCH_HOST, "port": OPENSEARCH_PORT}],
    http_auth=("admin", "changeme"),  # load real credentials from a secret store
    use_ssl=True,
    verify_certs=True,
)


def cloudtrail_to_ocsf(ct_event: dict[str, Any]) -> dict[str, Any]:
    """Transform a CloudTrail event to the OCSF 1.0 API Activity schema."""
    event_time = ct_event.get("eventTime", "")
    ts = int(datetime.fromisoformat(event_time.replace("Z", "+00:00")).timestamp() * 1000)
    user_identity = ct_event.get("userIdentity", {})
    actor_user = {
        "uid": user_identity.get("arn", ""),
        "name": user_identity.get("userName") or user_identity.get("principalId", ""),
        "type": user_identity.get("type", ""),
        "account": {"uid": user_identity.get("accountId", "")},
    }
    # CloudTrail records "AWS Internal" or a service hostname instead of an
    # IP for AWS-initiated calls; only keep genuine client IP addresses.
    src_ip = ct_event.get("sourceIPAddress", "")
    if src_ip and not src_ip.startswith("AWS") and not src_ip.endswith(".amazonaws.com"):
        src_endpoint = {"ip": src_ip}
    else:
        src_endpoint = {}
    error_code = ct_event.get("errorCode")
    status_id = 2 if error_code else 1  # 1=Success, 2=Failure per OCSF
    return {
        "class_name": "API Activity",
        "class_uid": 6003,
        "category_name": "Application Activity",
        "category_uid": 6,
        "activity_id": 1,
        "activity_name": ct_event.get("eventName", ""),
        "time": ts,
        "metadata": {
            "version": "1.0.0",
            "product": {
                "name": "AWS CloudTrail",
                "vendor_name": "Amazon Web Services",
            },
            "uid": ct_event.get("eventID", ""),
        },
        "actor": {
            "user": actor_user,
            "session": {"uid": ct_event.get("requestID", "")},
        },
        "src_endpoint": src_endpoint,
        "api": {
            "service": {"name": ct_event.get("eventSource", "").replace(".amazonaws.com", "")},
            "operation": ct_event.get("eventName", ""),
            "request": {"uid": ct_event.get("requestID", "")},
        },
        "status_id": status_id,
        "status": "Success" if status_id == 1 else "Failure",
        "region": ct_event.get("awsRegion", ""),
        "cloud": {"account": {"uid": ct_event.get("recipientAccountId", "")}, "provider": "AWS"},
        "raw_data": json.dumps(ct_event),
    }


def ingest_cloudtrail_to_eu_lake(
    cloudtrail_bucket: str,
    prefix: str,
    aws_region: str = "eu-central-1",
) -> int:
    """Pull CloudTrail log files from S3 and index OCSF-normalized events."""
    s3 = boto3.client("s3", region_name=aws_region)
    paginator = s3.get_paginator("list_objects_v2")
    ocsf_events = []
    for page in paginator.paginate(Bucket=cloudtrail_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".json.gz"):
                continue
            response = s3.get_object(Bucket=cloudtrail_bucket, Key=key)
            with gzip.GzipFile(fileobj=io.BytesIO(response["Body"].read())) as gz:
                ct_log = json.load(gz)
            for record in ct_log.get("Records", []):
                try:
                    ocsf_events.append(cloudtrail_to_ocsf(record))
                except (KeyError, ValueError):
                    continue  # skip malformed records rather than abort the batch
    if not ocsf_events:
        return 0
    # Monthly indices keep retention enforcement simple: dropping an index is
    # the deletion mechanism. The SHA-256 _id makes re-ingestion idempotent.
    actions = [
        {
            "_index": f"security-lake-{datetime.now(timezone.utc).strftime('%Y-%m')}",
            "_id": hashlib.sha256(e["raw_data"].encode()).hexdigest(),
            "_source": e,
        }
        for e in ocsf_events
    ]
    success, _ = helpers.bulk(os_client, actions, raise_on_error=False)
    return success


if __name__ == "__main__":
    count = ingest_cloudtrail_to_eu_lake(
        cloudtrail_bucket="my-cloudtrail-bucket",
        prefix="AWSLogs/123456789012/CloudTrail/eu-central-1/2026/05/01/",
    )
    print(f"Ingested {count} OCSF events into EU security data lake")
This stack — MinIO for Parquet storage, OpenSearch for indexing, and a custom OCSF normalization layer — provides the same cross-source analytics capability as Security Lake while keeping all security telemetry on EU-hosted infrastructure. The OCSF normalization is identical; the difference is jurisdiction.
The Architecture Decision
AWS Security Lake solves a real problem — security telemetry fragmentation — with a technically sound approach. OCSF normalization and S3-based storage enable cross-source analytics at scale. For EU organizations, the question is whether the operational convenience of a managed AWS service outweighs the CLOUD Act exposure of storing personal-data-containing security events on US-jurisdiction infrastructure.
The OCSF standard is an open specification. The Parquet storage format is open-source. The Lake Formation access control model can be replicated with Apache Iceberg and open-source policy engines. The EU-sovereign alternative is not architecturally inferior — it requires operational management that Security Lake provides automatically, but it eliminates the CLOUD Act exposure that Security Lake's managed model creates.
For EU organizations processing security telemetry that contains employee personal data, the combination of GDPR Art.5, Art.22, Art.25, and Art.88 creates compliance obligations that are difficult to satisfy when that data resides on US-jurisdiction infrastructure. The behavioral profiles that Security Lake's cross-source correlation creates are exactly the kind of comprehensive employee monitoring data that European works councils, data protection officers, and supervisory authorities scrutinize most carefully — and that CLOUD Act requests can reach most productively.
EU-native alternatives — Wazuh, Logpoint, Sekoia.io, Elastic on EU infrastructure — provide the same security analytics capability without the CLOUD Act exposure. The choice between them and Security Lake is not a security effectiveness trade-off. It is a jurisdictional one.
sota.io is an EU-native Platform-as-a-Service designed for teams who need European infrastructure sovereignty. Deploy containerized applications on European servers with full GDPR compliance — no US parent company, no CLOUD Act exposure. Try sota.io →