2026-04-30 · 12 min read

AWS MSK EU Alternative 2026: Kafka Consumer Groups, Cross-Region Replication, and GDPR

Post #724 in the sota.io EU Compliance Series

Amazon Managed Streaming for Apache Kafka (MSK) is the default Kafka infrastructure for applications built on AWS. Real-time event pipelines, microservice communication, log aggregation, clickstream processing, and financial transaction streams flow through MSK at scale across thousands of European deployments. MSK removes the operational burden of running Kafka clusters — provisioning, patching, broker replacement, and scaling are managed by AWS.

Amazon operates MSK in European regions: eu-west-1 (Ireland), eu-central-1 (Frankfurt), eu-west-3 (Paris), eu-south-1 (Milan), eu-north-1 (Stockholm). Kafka broker data, topic data, and consumer group state are stored in Europe. Many development teams treat this as a compliant configuration.

It is not. Amazon Web Services, Inc. is a Delaware corporation headquartered in Seattle, Washington. The CLOUD Act (18 U.S.C. § 2713) compels US companies to produce data stored anywhere in the world when ordered by US authorities. A valid government order served on Amazon in Seattle can reach your MSK cluster metadata in Frankfurt: consumer group offsets, connector configurations, cross-region replication state, and the topic data itself.

This is the same structural US jurisdiction problem documented across the AWS stack: AWS Kinesis, AWS Glue, AWS Athena, AWS Redshift. Kafka adds a dimension that many teams underestimate: consumer group offsets function as an access audit trail — a persistent record of which consumer services read which messages, in what order, and when.

What AWS MSK Stores That Touches Personal Data

MSK is not a dumb message relay. It maintains substantial operational and control-plane state around every cluster it manages, and significant portions of that state either contain personal data directly or create a detailed inference layer about personal data access patterns.

Consumer Group Offsets: A Persistent PII Access Log

Apache Kafka's consumer group mechanism is the foundation of its scalability model. When a consumer group reads messages from a topic, Kafka tracks its position — the committed offset — for each topic partition. MSK stores these offsets in the __consumer_offsets internal topic, which is part of the cluster's Kafka broker data.

Consumer group offsets record, for every consumer group and topic partition:

  - the consumer group ID (an application-chosen string)
  - the topic and partition being read
  - the committed offset (the last position the group consumed)
  - the timestamp of the commit

When your Kafka topics carry personal data — user events, transaction records, order streams, health data updates — the consumer group offset table becomes an indirect personal data record. It establishes, for every consumer service, a timestamped log of exactly which personal data records were accessed and when.

Under GDPR Art.30 (Records of Processing Activities), organizations must document all processing operations involving personal data. The consumer group offset log is processing metadata for those operations. It is stored on AWS-operated broker infrastructure (a US entity) and is accessible under CLOUD Act compulsion.

Consider a GDPR erasure request (Art.17). You delete the user's data from the topic (using compaction or direct deletion via an admin tool). The consumer group offset for a consumer that processed that message before deletion still exists — it records that a specific service consumed the position where that user's personal data resided. This residual access record persists in AWS infrastructure under US jurisdiction.
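The erasure gap can be shown with a toy model (pure Python, no Kafka involved; the topic name, group name, and offsets are invented for illustration):

```python
# Toy model of the Art.17 erasure gap. In real Kafka, committed offsets live
# in the brokers' compacted __consumer_offsets topic; this dict stands in.
partition_log = {0: "alice-purchase", 1: "bob-purchase"}  # offset -> message

# The group committed offset 2: it has read everything before position 2.
committed_offsets = {("payment-processor", "user-events", 0): 2}

# Art.17 erasure removes the message itself from the topic ...
del partition_log[0]

# ... but the committed offset still records that payment-processor consumed
# past the position where the erased record lived.
residual = committed_offsets[("payment-processor", "user-events", 0)]
```

Deleting the message leaves `residual == 2` intact: the access record outlives the data it describes.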

The Art.25 Problem: Consumer Group Membership Is Not Minimized by Default

GDPR Art.25 requires data protection by design and by default. Kafka consumer groups, as implemented in MSK, violate the minimization principle in a subtle way that is rarely caught during architecture review.

Consumer group names are strings set by the application. Many teams set them to descriptive names: user-notification-service, payment-processor-group, gdpr-erasure-consumer. These names appear in:

  - the __consumer_offsets internal topic on the brokers
  - MSK control-plane state and the AWS console
  - CloudWatch metric dimensions for per-group lag metrics
  - Kafka admin API responses (ListGroups, DescribeGroups)

When a consumer group name contains a reference to the category of data being processed (gdpr-erasure-consumer, health-records-processor, financial-events-sink), AWS infrastructure — and by extension, US authorities under the CLOUD Act — gains visibility into the categories of personal data flowing through your Kafka cluster. Art.25 by design means the system should not leak data categories to third parties. MSK's architecture makes this leakage structural.
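One mitigation sketch, assuming you control group naming at the client: derive opaque group IDs and keep the human-readable mapping inside your own configuration management. The hashing scheme and salt below are illustrative, not a standard Kafka feature:

```python
import hashlib

# Sketch: derive opaque consumer group IDs so broker-side metadata does not
# reveal data categories. The salt and "cg-" scheme are invented; the
# internal-name -> opaque-ID mapping stays in your own config store.
def opaque_group_id(internal_name: str, salt: str = "org-secret-salt") -> str:
    digest = hashlib.sha256(f"{salt}:{internal_name}".encode()).hexdigest()
    return f"cg-{digest[:16]}"

# The broker (and anyone reading its metadata) sees only "cg-...", never
# "gdpr-erasure-consumer".
group_id = opaque_group_id("gdpr-erasure-consumer")
```

The ID is deterministic, so every replica of the consumer joins the same group, while the broker-visible string carries no data-category information.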

Cross-Region Replication: Article 46 and Unintended Drittlandsübermittlung

Two MSK features can silently create cross-border data flows:

MSK Replicator allows automatic replication of Kafka topics from one MSK cluster to another across AWS regions. This feature is used for disaster recovery and multi-region availability. If a team replicates from eu-central-1 (Frankfurt) to us-east-1 (Virginia) for DR purposes, the personal data in those Kafka topics — user events, transaction streams — is being transferred to a third country (the United States) within US-jurisdiction infrastructure. This is a cross-border transfer under GDPR Art.46 requiring appropriate safeguards (Standard Contractual Clauses at minimum). MSK Replicator's region configuration makes this transfer happen automatically and continuously without any explicit transfer notification to data subjects.

MSK Serverless uses AWS-managed infrastructure whose physical placement is less transparent than provisioned MSK clusters. Serverless deployments in European regions may route control-plane operations through US infrastructure for cluster management. The exact data residency guarantees for MSK Serverless are less explicit than for provisioned clusters using specific AWS region endpoints.

Even without explicit cross-region replication, the MSK control plane — cluster creation APIs, broker replacement logic, scaling decisions — runs in AWS infrastructure whose US-entity nature means operational metadata (cluster configurations, topic lists, consumer group names) is under CLOUD Act scope regardless of the data region.
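A minimal audit check for this risk might look like the following sketch (pure Python; the region set and function name are our own, not an AWS API):

```python
# Sketch of a replication-audit check: flag any replicator source/target pair
# that moves EU-resident topic data outside the EU. The region set is
# illustrative (EU-member-state AWS regions only) and should be maintained
# against your own compliance policy.
EU_AWS_REGIONS = {
    "eu-west-1", "eu-west-3", "eu-central-1", "eu-central-2",
    "eu-south-1", "eu-south-2", "eu-north-1",
}

def is_third_country_transfer(source_region: str, target_region: str) -> bool:
    """True when replication would carry EU-resident data to a third country."""
    return source_region in EU_AWS_REGIONS and target_region not in EU_AWS_REGIONS

# The Frankfurt -> Virginia DR setup described above is exactly such a transfer.
flagged = is_third_country_transfer("eu-central-1", "us-east-1")
```

Run a check like this against every replication config in CI, so an Art.46 transfer can never be created by a one-line region change.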

MSK Connect: Uncontrolled Data Sinks Outside Your Jurisdiction

MSK Connect is the fully managed version of Kafka Connect — a framework for streaming data between Kafka topics and external systems using connector plugins. MSK Connect is one of the most significant GDPR risk vectors in the MSK ecosystem.

When you deploy an MSK Connect connector — for example, a JDBC Sink Connector writing Kafka topic data to a PostgreSQL database, or an S3 Sink Connector archiving events to S3 — you are creating an automated data transfer pipeline. The connector:

  - reads every message from the configured topics, including any personal data they carry
  - optionally transforms records in flight (Single Message Transforms)
  - writes the result continuously to the external system
  - buffers and retries on failure, holding copies of records in its own runtime

MSK Connect workers run on AWS-managed infrastructure (a US entity). The connector's execution environment — including any personal data passing through the transformation pipeline — is processed by AWS infrastructure under CLOUD Act jurisdiction. If the sink is also a US-based service (S3 in us-east-1, Snowflake AWS backend, Datadog), you have created a multi-hop personal data transfer chain, each segment of which is under US authority reach.

Under GDPR Art.30, each automated data transfer in this chain must be documented as a processing activity. MSK Connect creates transfers that are easy to set up but difficult to audit: a single connector configuration file can direct personal data from a topic to any reachable endpoint, without forcing the developer to confront the GDPR implications of that transfer.
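One way to force that confrontation is to generate the Art.30 transfer entry mechanically from each connector config. A sketch, assuming standard Kafka Connect config keys and a hypothetical internal allow-list:

```python
# Sketch: derive an Art.30-style transfer entry from a sink connector config.
# Key names ("name", "topics", "connection.url") follow common Kafka Connect
# conventions; the allow-list and entry shape are our own invention.
EU_APPROVED_ENDPOINTS = {"postgres.internal.example.eu"}  # hypothetical

def transfer_entry(connector_config: dict) -> dict:
    endpoint = connector_config.get("connection.url", "")
    # crude host extraction: strip scheme, port, and path
    host = endpoint.split("//")[-1].split(":")[0].split("/")[0]
    return {
        "activity": f"sink:{connector_config['name']}",
        "source_topics": connector_config.get("topics", "").split(","),
        "destination": host,
        "eu_sovereign": host in EU_APPROVED_ENDPOINTS,
    }

entry = transfer_entry({
    "name": "orders-to-postgres",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "user-orders",
    "connection.url": "jdbc:postgresql://postgres.internal.example.eu:5432/orders",
})
```

Any entry with `eu_sovereign: False` then fails the pipeline review before the connector is ever deployed.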

CloudWatch Metrics: Usage Pattern Inference from Operational Telemetry

MSK publishes extensive metrics to CloudWatch — a US-jurisdiction service. Default metrics include:

  - throughput per broker and (at higher monitoring levels) per topic: BytesInPerSec, BytesOutPerSec, MessagesInPerSec
  - consumer lag per consumer group: MaxOffsetLag, SumOffsetLag, EstimatedMaxTimeLag
  - broker health: partition counts, disk usage, and connection counts

Topic-level throughput metrics (BytesInPerSec for a topic named user-purchase-events) reveal the volume and timing of personal data processing. Consumer group lag metrics (MaxOffsetLag for consumer group payment-processor) reveal whether real-time processing of financial personal data is keeping up with ingestion — exposing system architecture and data volumes to AWS's monitoring infrastructure.

These metrics are stored in CloudWatch, which runs under the AWS (US) entity. They are accessible to AWS under the CLOUD Act and are not subject to EU data residency guarantees. Even if your Kafka topic data itself is EU-resident, the operational telemetry describing that data's processing is in US hands.
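A toy example of why this telemetry is sensitive: the metric samples alone, with all message contents stripped, still reveal processing windows. The values below are invented:

```python
# Toy illustration of telemetry inference. Per-minute throughput samples for
# a topic named user-purchase-events, with no access to the messages
# themselves, still show exactly when personal data was flowing.
bytes_in_per_min = {"09:00": 0, "09:01": 52_000, "09:02": 48_500, "09:03": 0}

# An observer of the metrics alone learns the active processing windows.
active_windows = [minute for minute, b in bytes_in_per_min.items() if b > 0]
```

Scaled up to months of per-topic samples, this reconstructs business activity patterns from operational telemetry alone.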

Schema Registry: Structural PII Metadata in AWS Glue

Many MSK deployments use AWS Glue Schema Registry for schema management — defining the structure of Kafka messages (Avro, Protobuf, JSON Schema). Schema Registry stores the schema definitions for your topics.

When a schema definition includes field names like user_id, email, ip_address, birth_date, or account_number, the schema itself becomes structural metadata about personal data processing. This metadata is stored in AWS Glue (a US entity) and is accessible under the CLOUD Act. A CLOUD Act order requesting your MSK-related Glue Schema Registry data would reveal the complete personal data model of your streaming architecture — every field type, every message format, every topic's data structure — without touching the message data itself.

For applications handling special categories of personal data (GDPR Art.9: health data, biometric data, political opinions), schema definitions in Glue Schema Registry expose the data category structure to US jurisdiction.
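A pre-registration scan of schemas for PII-indicative field names is one mitigation. A sketch, with an illustrative indicator list that you would tune to your own data model:

```python
# Sketch: scan an Avro record schema (as exported from any schema registry)
# for field names that suggest personal data. The indicator set is
# illustrative, not exhaustive.
PII_INDICATORS = {"user_id", "email", "ip_address", "birth_date", "account_number"}

def pii_fields(avro_schema: dict) -> list:
    return [f["name"] for f in avro_schema.get("fields", [])
            if f["name"] in PII_INDICATORS]

schema = {
    "type": "record",
    "name": "PurchaseEvent",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "amount_cents", "type": "long"},
    ],
}
```

Running this in CI before schema registration flags which topics' structural metadata would expose personal data categories if the registry sits under foreign jurisdiction.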

The GDPR Art.30 Documentation Challenge

A Kafka-based architecture built on MSK creates a documentation problem under GDPR Art.30 that grows with scale:

Every topic is a processing activity. Every consumer group is a processor of that activity. Every connector is a data transfer pipeline. Every schema registry entry is a documentation of data structure.

Documenting all of this under Art.30 requires knowing:

  - which topics carry personal data, and which categories of it
  - which consumer groups (services) read each of those topics
  - each topic's retention policy, as the de facto storage-limitation setting
  - every connector, and where it sends or sources data
  - which registered schemas describe personal data structures

MSK provides no native Art.30 documentation assistance. The operational state — consumer offsets, connector configs, schema definitions — is distributed across MSK, MSK Connect, Glue Schema Registry, and CloudWatch. Reconstructing a complete Art.30 record from these sources for a GDPR audit is a significant manual effort.
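The reconstruction amounts to a join across those sources. A minimal sketch of the shape of that join, with invented inventory data:

```python
# Sketch: join the per-source metadata (topic inventory, consumer groups,
# connectors) into one Art.30 view per topic. All inputs are invented
# examples standing in for exports from your own cluster.
topics = {"user-orders": {"retention_hours": 168, "personal_data": True},
          "system-health": {"retention_hours": 24, "personal_data": False}}
consumer_groups = {"user-orders": ["payment-processor"]}
connectors = {"user-orders": ["orders-to-postgres"]}

def art30_records() -> list:
    return [
        {
            "processing_activity": topic,
            "retention_hours": meta["retention_hours"],
            "processors_reading": consumer_groups.get(topic, []),
            "outbound_transfers": connectors.get(topic, []),
        }
        for topic, meta in topics.items() if meta["personal_data"]
    ]
```

On MSK, each of these inputs must be exported from a different AWS service; on self-hosted Kafka, all of them come from infrastructure you already control.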

Self-hosted Kafka provides the same operational complexity, but with one critical difference: the operator controls where all of this metadata resides. There is no AWS entity in the middle with US-jurisdiction access to the control plane.

EU Alternatives to AWS MSK

Redpanda Cloud (EU-Region Deployable)

Redpanda is a Kafka-compatible streaming platform built from scratch in C++ — no JVM dependency, no ZooKeeper. Redpanda is API-compatible with Apache Kafka: existing Kafka clients, consumer groups, and producer code work against Redpanda without modification.

Redpanda Cloud offers EU-region deployments (AWS eu-west-1, GCP europe-west4). For full EU sovereignty, Redpanda can be deployed self-hosted on EU infrastructure — Hetzner Cloud, OVHcloud, or any EU-region VM provider. Self-hosted Redpanda gives you complete control over where consumer group offsets, broker logs, and cluster metadata reside. No US entity has structural access.

Redpanda's single-binary architecture simplifies operational complexity compared to Apache Kafka: no separate ZooKeeper cluster, no JVM tuning. Consumer group semantics, topic configuration, and partition assignment work identically to Kafka.

For GDPR Art.25 (data protection by design): Redpanda's built-in RBAC and TLS mutual authentication provide access controls at the broker level. Consumer group names remain internal to your infrastructure — not exported to a third-party monitoring service by default.

Strimzi: Kafka on Kubernetes (EU Infrastructure)

Strimzi is a CNCF-hosted Kubernetes operator for running Apache Kafka on Kubernetes clusters. Strimzi automates the operational tasks that MSK abstracts: cluster provisioning, rolling upgrades, TLS certificate management, topic and user administration.

Deployed on a Kubernetes cluster in an EU-hosted environment — Kubernetes on Hetzner Cloud (Falkenstein, Nuremberg, Helsinki), OVHcloud Managed Kubernetes (Gravelines, Strasbourg), IONOS Kubernetes — Strimzi provides a fully EU-sovereign Kafka deployment. All consumer group offsets, schema data (via a co-deployed registry such as Apicurio), and connector configurations reside in your Kubernetes cluster on EU infrastructure.

Strimzi supports:

  - Kafka Connect clusters via the KafkaConnect custom resource
  - MirrorMaker 2 replication via the KafkaMirrorMaker2 custom resource
  - declarative topic and user management (KafkaTopic, KafkaUser)
  - Cruise Control integration for partition rebalancing
  - automated rolling upgrades and TLS certificate rotation

The operational model changes: instead of MSK console, you use kubectl and Kubernetes Custom Resources. For teams already running Kubernetes workloads, this is a natural fit. The observability layer (Prometheus metrics via JMX exporter, Grafana dashboards) stays within your infrastructure — not in CloudWatch.
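Topics themselves become declarative resources. A sketch of a KafkaTopic custom resource (names are placeholders); note that retention.ms doubles as the storage-limitation entry for your Art.30 record:

```yaml
# kafkatopic-user-orders.yaml (names are placeholders)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: user-orders
  namespace: kafka
  labels:
    strimzi.io/cluster: production-cluster  # must match the Kafka CR name
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000   # 7 days; cite this value in the Art.30 record
    cleanup.policy: delete
```

Because topics live in Git alongside the rest of your manifests, the topic inventory needed for Art.30 documentation is version-controlled by construction.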

Apache Kafka Self-Hosted on EU VMs (Direct Control)

For teams with existing Kafka expertise, running Apache Kafka directly on EU-hosted VMs provides maximum control with no third-party managed service in the data path.

Typical deployment on Hetzner Cloud:

  - three broker nodes (dedicated-vCPU cloud instances) spread across locations for availability
  - KRaft combined mode, so brokers double as controllers and no ZooKeeper ensemble is needed
  - Hetzner Cloud Volumes for the Kafka log directories, encrypted at rest (LUKS/dm-crypt)
  - a private network for inter-broker traffic, with TLS listeners for clients

MSK equivalent pricing for a 3-broker kafka.m5.large cluster in eu-central-1: ~$540/month (broker hours + storage). Self-hosted Kafka on Hetzner at comparable capacity costs approximately 85-90% less. The operational overhead is real but manageable: Kafka's KRaft mode (ZooKeeper-less, stable since Kafka 3.3) significantly reduces the operational surface.
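A minimal KRaft combined-mode broker configuration might look like the following sketch (node IDs, hostnames, and paths are placeholders; consult the Kafka configuration reference before production use):

```properties
# server.properties sketch for a 3-node KRaft combined-mode cluster
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@kafka-1.internal:9093,2@kafka-2.internal:9093,3@kafka-3.internal:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/mnt/hcloud-volume/kafka-logs
offsets.topic.replication.factor=3
default.replication.factor=3
min.insync.replicas=2
```

The same file, replicated with a different node.id per host, replaces the entire MSK control plane; everything it configures stays on EU disks.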

For GDPR compliance: all consumer group offsets, topic data, broker logs, and connector state reside on infrastructure you control in EU data centers with no US entity in the ownership chain (Hetzner Cloud GmbH, Falkenstein/Nuremberg, Germany).

NATS JetStream: Kafka Alternative for Simpler Streaming

NATS JetStream provides persistent messaging semantics (at-least-once delivery, consumer acknowledgments, replay) with a significantly simpler operational model than Apache Kafka. NATS JetStream covers many common Kafka use cases without the complexity of partition management, consumer group rebalancing, or broker coordination.

For teams using MSK primarily for:

  - event notification between microservices
  - work queues with at-least-once delivery
  - moderate-throughput event streams with replay
  - request/reply messaging patterns

NATS JetStream may be a more appropriate fit than a full Kafka replacement. NATS is a single binary, deployable on any EU VM, with built-in clustering for high availability. Memory footprint and latency characteristics are significantly better than Kafka for moderate throughput workloads.
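Enabling persistence is a few lines of server configuration. A sketch of an nats-server.conf with JetStream and clustering (server names, paths, limits, and hostnames are placeholders):

```conf
# nats-server.conf sketch: JetStream persistence plus a 3-node cluster
server_name: nats-eu-1
jetstream {
  store_dir: "/var/lib/nats/jetstream"   # consumer state and streams on EU disk
  max_memory_store: 1GB
  max_file_store: 100GB
}
cluster {
  name: eu-cluster
  listen: 0.0.0.0:6222
  routes: [
    nats://nats-eu-2.internal:6222
    nats://nats-eu-3.internal:6222
  ]
}
```

Compare this to the multi-service footprint of a Kafka deployment: one binary, one config file, and all durable state under store_dir.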

For GDPR compliance: NATS JetStream's consumer state (equivalent to Kafka consumer group offsets) resides entirely within your NATS cluster on EU infrastructure. No managed US service is involved in the data path.

Comparison Table

| Capability | AWS MSK | Redpanda Self-Hosted | Strimzi on k8s | Kafka Self-Hosted | NATS JetStream |
|---|---|---|---|---|---|
| Kafka API Compatible | ✅ | ✅ | ✅ | ✅ | ❌ (own API) |
| EU-Sovereign Data | ❌ (US entity) | ✅ (self-hosted) | ✅ | ✅ | ✅ |
| Consumer Group Offsets | US jurisdiction | Your infra | Your infra | Your infra | Your infra |
| CLOUD Act Risk | High | None (self-hosted) | None | None | None |
| Managed Service | ✅ | ✅ (Redpanda Cloud EU) | Partial (k8s-managed) | ❌ | ❌ |
| Connector Framework | MSK Connect | Kafka Connect compat. | Kafka Connect | Kafka Connect | NATS Connectors |
| Cost (3-broker equiv.) | ~€500/mo | ~€60-120/mo | ~€60-80/mo (infra) | ~€60-80/mo | ~€15-40/mo |
| Art.30 Audit Support | None native | Self-managed | Self-managed | Self-managed | Self-managed |
| Schema Registry | Glue (US entity) | Built-in Schema Registry | Apicurio (EU-hosted) | Confluent/Apicurio | None (schemas in app) |

Migration Path from MSK to EU-Sovereign Kafka

Step 1: Inventory Your MSK Configuration

Export your MSK configuration before migration:

# List all topics and configurations
aws kafka list-clusters --region eu-central-1
aws kafka describe-cluster --cluster-arn <arn>

# Export topic configurations (the AWS CLI does not expose topics;
# use the Kafka admin tools against the cluster's bootstrap endpoint)
kafka-topics.sh --bootstrap-server <msk-endpoint>:9092 --list
kafka-topics.sh --bootstrap-server <msk-endpoint>:9092 --describe

# Export consumer groups
kafka-consumer-groups.sh --bootstrap-server <msk-endpoint>:9092 --list

Catalog every topic, its retention policy, its partition count, and its replication factor. For each topic: does it contain personal data? If yes, document it in your Art.30 record.

Step 2: Identify MSK Connect Connectors

List all connectors and their configurations:

# Via AWS CLI
aws kafkaconnect list-connectors --region eu-central-1

# For each connector, note:
# - Source or sink?
# - What data flows through it?
# - Where does it send data to (if sink)?

Each connector is a data transfer in your Art.30 record. For sink connectors writing personal data to external systems: verify the destination is EU-sovereign before migration.

Step 3: Deploy EU-Sovereign Cluster (Strimzi Example)

# strimzi-kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: production-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.7.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      log.retention.hours: 168
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          class: hcloud-volumes  # Hetzner Cloud volumes
          deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: hcloud-volumes
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

Deploy to a Kubernetes cluster on EU infrastructure. All consumer group offsets, broker logs, and topic data reside on Hetzner volumes (Germany). No US entity in the data path.

Step 4: Mirror Topics During Cutover (MirrorMaker 2)

Kafka MirrorMaker 2 supports bidirectional topic replication. Use it to mirror topics from MSK to your EU-sovereign cluster during the cutover period:

# mirrormaker2.yaml  
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: msk-to-eu-mirror
  namespace: kafka
spec:
  version: 3.7.0
  replicas: 1
  connectCluster: eu-kafka
  clusters:
    - alias: msk
      bootstrapServers: <msk-bootstrap-endpoint>:9092
    - alias: eu-kafka
      bootstrapServers: production-cluster-kafka-bootstrap.kafka:9092
  mirrors:
    - sourceCluster: msk
      targetCluster: eu-kafka
      sourceConnector:
        config:
          replication.factor: 3
          offset-syncs.topic.replication.factor: 3
          sync.topic.acls.enabled: false
      topicsPattern: ".*"

Run MirrorMaker 2 until all consumer groups have caught up on the EU cluster. Then redirect producers to the EU cluster endpoint and decommission MSK.
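The cutover gate ("all consumer groups have caught up") can be expressed as a simple lag comparison. A sketch with invented offsets; in practice the numbers come from kafka-consumer-groups.sh --describe against both clusters:

```python
# Sketch of the cutover gate: compare source end offsets with mirrored
# positions per (topic, partition) and only cut over once every partition
# has caught up. All offset values are invented.
source_end_offsets = {("user-orders", 0): 10_500, ("user-orders", 1): 9_800}
mirrored_positions = {("user-orders", 0): 10_500, ("user-orders", 1): 9_750}

def remaining_lag() -> dict:
    return {tp: source_end_offsets[tp] - mirrored_positions.get(tp, 0)
            for tp in source_end_offsets}

def safe_to_cut_over(max_lag: int = 0) -> bool:
    return all(lag <= max_lag for lag in remaining_lag().values())
```

Gate the producer redirect on this check (ideally with max_lag of 0 during a short write freeze), so no in-flight messages are stranded on MSK.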

Step 5: Migrate MSK Connect Connectors

For each MSK Connect connector:

  1. Export the connector configuration (JSON)
  2. Verify the sink/source is EU-sovereign (or acceptable under Art.46 safeguards)
  3. Deploy equivalent Kafka Connect worker on EU infrastructure (Strimzi KafkaConnect CR)
  4. Test connector behavior on EU cluster
  5. Disable MSK Connect connector and start EU Kafka Connect connector

GDPR Art.30 Record Template for EU-Sovereign Kafka

Once migrated to self-hosted Kafka:

Processing Activity: Real-Time Event Streaming
Controller: [Your Legal Entity]
Processor: [EU infrastructure provider, under an Art.28 DPA] — no US-entity managed streaming service
Infrastructure: [EU-Sovereign Provider] — [Country]
Data Categories: [List from your topic inventory]
Retention: [Per-topic retention policy — log.retention.hours]
Transfer to Third Countries: None (all infrastructure EU-resident)
Technical Measures: TLS in transit, encryption at rest (LUKS/dm-crypt on broker volumes)
Access Controls: SASL/SCRAM per consumer group, topic-level ACLs
Consumer Group Audit: Kafka consumer group logs retained for [N days] for access audit

This is the Art.30 record you cannot easily produce for MSK, because the infrastructure operator (AWS, US entity) is a third party with independent CLOUD Act exposure.

What This Means for Your Data Protection Officer

If your organization uses AWS MSK to process personal data of EU data subjects, your DPO should be aware of:

  1. Consumer group offsets are personal data processing records under US-entity control. They document which services accessed personal data and when.

  2. MSK Connect connectors create automated personal data transfer pipelines whose execution environment is US-jurisdiction infrastructure. Each connector configuration should be reviewed as a data transfer under Art.46.

  3. Cross-region replication from EU to US MSK clusters is a third-country transfer under Art.46. Standard Contractual Clauses alone may be insufficient given that the data processor (AWS) is also the compelled party under the CLOUD Act — the SCC counterparty cannot practically prevent their own government from issuing CLOUD Act orders.

  4. CloudWatch metrics about topic throughput and consumer group lag expose the operational profile of personal data processing to a US-entity service. This is a secondary processing relationship not always documented in Art.30 records.

  5. Glue Schema Registry stores the structural definition of personal data in your Kafka topics under US-entity control. For special categories of data (Art.9), this structural metadata exposure is a separate compliance risk.

The fundamental issue is not where AWS stores your Kafka data in Europe — it is that the legal entity controlling that infrastructure has US jurisdiction obligations that GDPR cannot override. Self-hosted Kafka on EU infrastructure eliminates this structural dependency entirely.


Related compliance analysis: AWS Kinesis EU Alternative | AWS Glue EU Alternative | AWS Athena EU Alternative | AWS Redshift EU Alternative | AWS S3 EU Alternative | AWS Lambda EU Alternative

EU-Native Hosting

Ready to move to EU-sovereign infrastructure?

sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.