AWS Keyspaces EU Alternative 2026: GDPR, CLOUD Act, and the Tombstone Erasure Problem
Post #783 in the sota.io EU Compliance Series
AWS Keyspaces (for Apache Cassandra) is Amazon's fully managed, serverless wide-column database service, offering Cassandra Query Language (CQL) compatibility without the operational overhead of managing Cassandra clusters. It is widely used for high-throughput workloads: user session storage, IoT device state, time-ordered event ledgers, product catalogs with sparse attributes, and real-time analytics at scale. Keyspaces handles replication, node management, and compaction automatically, and integrates natively with the broader AWS platform (IAM authentication, CloudWatch metrics, CloudTrail audit logging, and Lambda-based applications).
The GDPR exposure with AWS Keyspaces is not simply "it's on AWS infrastructure." The specific concern is structural: Cassandra's deletion model does not delete data immediately. Deletions in Cassandra are implemented as tombstones — markers that record the fact of deletion, not the absence of data. The underlying data remains in SSTables on disk until compaction physically removes it. In managed AWS Keyspaces, you cannot trigger compaction on demand, and you cannot control the compaction schedule. AWS controls when your "deleted" personal data is actually removed from disk.
This creates a concrete Art.17 erasure compliance gap that is architectural, not configurable.
AWS operates Keyspaces in Frankfurt (eu-central-1), Stockholm (eu-north-1), Ireland (eu-west-1), and other European regions. It ships with the standard AWS GDPR Data Processing Addendum and Standard Contractual Clauses. And like every AWS managed service, it is operated by Amazon Web Services, Inc., a Delaware-incorporated entity fully subject to the CLOUD Act (18 U.S.C. § 2713), regardless of where the physical data resides.
What AWS Keyspaces Stores and Processes
Keyspaces is a wide-column store — data is organized into tables with a partition key, optional clustering columns, and an arbitrary number of additional columns per row. This model is intentionally flexible: rows can have different columns, columns can be added at any time without schema migrations, and the clustering column enables efficient range scans within a partition.
This flexibility makes Cassandra exceptionally well-suited for behavioral data storage — which is precisely what creates GDPR tension.
User session and event streams as personal data. A typical Keyspaces table for user activity might use user_id as the partition key and event_timestamp as the clustering column, with additional columns for action_type, resource_id, ip_address, user_agent, and session_id. Each row is a personal data record under Art.4(1) GDPR: it identifies a natural person and describes their behaviour. The partition for a given user_id is, in aggregate, a behavioral profile — a sequential record of everything that user has done in your application.
Wide rows accumulate PII silently. Cassandra's column model encourages accumulation. Because you can add columns to individual rows without schema changes, applications frequently write new attributes to existing rows as features evolve. Over time, a user's partition may contain dozens of columns covering activity data collected across years — data that was never explicitly decided upon, never inventoried in an Art.30 record, and never evaluated for proportionality under Art.5(1)(c) data minimization. The ease of adding columns to Cassandra is the technical root of this problem.
Device and IoT state as Art.9 adjacent. IoT applications frequently store health-related device state in Keyspaces: heart rate monitor readings, blood glucose sensor data, sleep tracker events, medication dispensing records. When the device is associated with a natural person, this data constitutes health data under Art.9 GDPR — a special category requiring a valid Art.9(2) basis (typically explicit consent), DPO involvement, and usually a DPIA under Art.35. AWS Keyspaces does not distinguish between regular and special-category data at the service level.
Point-in-Time Recovery (PITR) backups. Keyspaces PITR stores continuous backup data in Amazon S3, enabling restoration to any point in the last 35 days. This backup layer is fully outside your table's compaction schedule: even if a tombstone has been compacted and the data removed from your active table, the PITR backup may still contain the pre-deletion snapshot. An Art.17 erasure request does not automatically purge PITR history. Disabling PITR is possible, but it eliminates your disaster recovery capability — a trade-off most organizations are unwilling to make.
Multi-region replication and Art.44 transfers. Keyspaces supports multi-region replication, allowing tables to be replicated across multiple AWS regions for low-latency global access and disaster recovery. If your primary table is in eu-central-1 and you add us-east-1 as a replica, you have established an Art.44 cross-border transfer of all data in that table — including personal data. The transfer is governed by AWS Standard Contractual Clauses, but it exists: your EU user data now resides on infrastructure in the United States.
Cassandra system tables and metadata. Keyspaces exposes system tables (e.g., system_schema.tables, system_schema.columns) that describe your data model. These reveal the categories of personal data you process — field names like email, health_score, location_lat, location_lon, political_affiliation are visible in schema metadata. AWS infrastructure has access to this schema metadata as part of operating the managed service.
The Tombstone Erasure Problem: Art.17 and the Compaction Schedule
Art.17 GDPR grants data subjects the right to erasure — the "right to be forgotten." When a controller receives a valid erasure request, personal data must be deleted without undue delay. In practice, most compliance teams interpret this as "deleted from the primary store," with backup retention policies covering backup data separately.
In Cassandra, deletion is more complex than in other databases.
How Cassandra tombstones work. When you execute DELETE FROM users WHERE user_id = '12345', Cassandra does not remove the data from disk. Instead, it writes a tombstone — a special marker in the SSTable that records the deletion. Any subsequent read of that partition checks for tombstones and filters them out, making the data appear deleted to your application. But the underlying data and the tombstone both remain in the SSTable on disk until compaction runs and merges the SSTables, discarding the tombstoned data.
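The read-and-compact behaviour can be illustrated with a small simulation. This is a sketch of the merge logic only, not the Cassandra internals: a tombstone with a newer write timestamp shadows older data on read, but both cells persist until a compaction pass, run after the grace period, discards them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    key: str
    value: Optional[str]  # None marks a tombstone
    write_ts: int         # write timestamp; newest wins on read

def read(sstables, key):
    """Read path: merge cells for a key across SSTables, newest timestamp
    wins. A tombstone makes the row invisible, but every cell stays on disk."""
    cells = [c for t in sstables for c in t if c.key == key]
    if not cells:
        return None
    return max(cells, key=lambda c: c.write_ts).value

def compact(sstables, gc_grace_expired):
    """Compaction: merge to one SSTable. Tombstones (and the data they
    shadow) are physically dropped only once gc_grace_seconds has passed."""
    merged = {}
    for t in sstables:
        for c in t:
            if c.key not in merged or c.write_ts > merged[c.key].write_ts:
                merged[c.key] = c
    out = [c for c in merged.values()
           if not (c.value is None and gc_grace_expired)]
    return [out]

# A write, then a DELETE: reads see nothing, but two cells remain on disk.
tables = [[Cell("user:12345", "ip=203.0.113.7", 100)],
          [Cell("user:12345", None, 200)]]  # tombstone from the DELETE
assert read(tables, "user:12345") is None   # logically deleted
assert sum(len(t) for t in tables) == 2     # still physically present
tables = compact(tables, gc_grace_expired=True)
assert sum(len(t) for t in tables) == 0     # gone only after compaction
```

The same merge explains "zombie data": without the tombstone, a stale replica's old cell would win the merge again.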
This two-phase deletion is fundamental to how Cassandra achieves its distributed, eventually-consistent architecture. Tombstones exist because in a distributed system, a delete command might arrive at one replica before it reaches another. The tombstone travels through the cluster and ensures that even if a node was temporarily offline when the delete was issued, it will eventually learn about the deletion.
The gc_grace_seconds window. Cassandra defines gc_grace_seconds (default: 10 days) as the minimum period a tombstone must persist before compaction can remove it. This window exists to prevent "zombie data" — where a temporarily-offline replica comes back online, sees data without the tombstone, and treats it as a valid live record. During the gc_grace_seconds window, the tombstoned personal data remains on disk.
AWS Keyspaces compaction is managed, not controllable. In self-hosted Cassandra, you can trigger manual compaction with nodetool compact immediately after gc_grace_seconds expires. In AWS Keyspaces, compaction is fully managed by AWS. You cannot schedule compaction, trigger it on demand, or observe its progress. AWS does not publish a contractual guarantee on compaction latency.
This means that after issuing a Cassandra DELETE for a user's personal data:
- The data remains in SSTables on disk for at least the gc_grace_seconds period (10 days by default).
- After that period expires, the data may be removed on the next compaction cycle, which AWS schedules internally.
- You have no mechanism to force or verify physical removal.
- PITR backups retain the pre-deletion state for up to 35 days regardless of compaction.
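The combined retention windows above can be made concrete with simple date arithmetic. This is a sketch using the figures already stated in the text (10-day default grace period, 35-day PITR maximum); the actual compaction lag on top of the grace period is unknowable from outside AWS.

```python
from datetime import datetime, timedelta

GC_GRACE = timedelta(days=10)        # Cassandra default gc_grace_seconds
PITR_RETENTION = timedelta(days=35)  # Keyspaces PITR maximum window

def earliest_physical_erasure(delete_issued: datetime) -> datetime:
    """Earliest moment the tombstoned data COULD leave the active SSTables.
    Actual removal waits for an AWS-scheduled compaction after this point."""
    return delete_issued + GC_GRACE

def backup_fully_clear(delete_issued: datetime) -> datetime:
    """When the last PITR snapshot containing the pre-deletion state ages out."""
    return delete_issued + PITR_RETENTION

req = datetime(2026, 1, 1)  # erasure request date
assert earliest_physical_erasure(req) == datetime(2026, 1, 11)
assert backup_fully_clear(req) == datetime(2026, 2, 5)
```

In other words, the defensible "fully erased" date for a single DELETE is the later of the two windows plus an unbounded compaction delay.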
For Art.17 compliance purposes, "deleted" in Keyspaces means "logically removed from query results" — not "physically removed from disk." Whether this satisfies Art.17's requirement to erase "without undue delay" is a legal question with no clear industry consensus, but many DPOs take the view that data remaining on disk for weeks after a deletion request is not compliant.
CLOUD Act Jurisdiction
AWS Keyspaces is operated by Amazon Web Services, Inc., a US corporation. The CLOUD Act (Clarifying Lawful Overseas Use of Data Act, 18 U.S.C. § 2713) extends US law enforcement data access to any data stored, processed, or transmitted by a US-incorporated entity — regardless of the physical location of the servers.
What this means in practice. US authorities can issue a legal order to AWS requiring disclosure of data from your Keyspaces tables in eu-central-1 without notifying you or involving a European court. AWS is legally prohibited from disclosing to you that such an order exists in many circumstances. The GDPR Standard Contractual Clauses that AWS uses do not override a valid US government legal order — AWS has acknowledged this explicitly in its documentation.
Specific CLOUD Act exposure points for Keyspaces:
- Active table data: All rows currently in your Keyspaces tables in any region, including EU regions.
- Tombstoned data: Data that has been logically deleted but not yet physically compacted — potentially more data than you believe is "live."
- PITR backup data: Up to 35 days of continuous backup history in S3, covering deleted data that has not been purged.
- Multi-region replicas: Any data replicated to a non-EU region as part of a multi-region replication setup.
- System metadata: Schema definitions, table configurations, CloudWatch metrics, and CloudTrail access logs for all Keyspaces operations.
The Schrems II ruling (C-311/18) established that Standard Contractual Clauses alone are insufficient when the recipient's local law prevents them from fulfilling the SCCs' obligations. US CLOUD Act obligations and SCCs are in direct tension: if AWS receives a CLOUD Act order, it must comply, potentially breaching its SCCs.
Art.22 Behavioral Profiling in Wide-Column Data
Art.22 GDPR governs automated decision-making that produces legal or similarly significant effects. Wide-column Cassandra tables are purpose-built for the kind of high-volume behavioral data that feeds automated profiling systems.
The partition-as-profile pattern. A standard Keyspaces table design for user analytics uses user_id as partition key and event_timestamp as clustering column. The result is that every user's partition is a chronologically-ordered behavioral profile: a complete, efficiently-retrievable record of that person's interactions with your system. This data feeds recommendation engines, churn prediction models, fraud detection systems, and credit scoring algorithms — all of which may constitute Art.22 automated decision-making.
Art.22 obligations. If your Keyspaces data feeds automated decisions with significant effects, you must: (1) inform users that profiling occurs (Art.13/14), (2) provide the right to object (Art.21), (3) implement human review mechanisms (Art.22(3)), and (4) perform a DPIA for large-scale profiling (Art.35). None of these obligations are met by using AWS Keyspaces itself — they are controller responsibilities. But they are triggered by the pattern of using Keyspaces for behavioral data.
The inference amplification problem. Cassandra's efficient range scans within a partition make it straightforward to retrieve a user's complete behavioral history. A query like SELECT * FROM user_events WHERE user_id = '12345' retrieves thousands of events covering months of activity in milliseconds. Systems that can efficiently retrieve complete behavioral histories can efficiently feed automated profiling — and GDPR Art.22 scrutiny follows.
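The retrieval pattern can be sketched in plain Python, as a stand-in for the CQL query above. The event names and the user_events layout are the hypothetical schema from earlier in the article; the point is that clustering order turns profile retrieval into a binary search plus a contiguous slice.

```python
from bisect import bisect_left, bisect_right

# Hypothetical user_events partition: rows sorted by the clustering column
# (event_timestamp), which is how Cassandra lays them out on disk.
partition = sorted([
    (1706000000, "login"), (1706003600, "view_product"),
    (1706007200, "add_to_cart"), (1706010800, "checkout"),
])

def range_scan(rows, ts_from, ts_to):
    """Range scan within a partition: binary search on the clustering key,
    then a contiguous slice. No full scan, no index lookup per row."""
    lo = bisect_left(rows, (ts_from,))
    hi = bisect_right(rows, (ts_to, chr(0x10FFFF)))
    return rows[lo:hi]

# One cheap slice reconstructs a window of the user's behavioural history.
window = range_scan(partition, 1706003600, 1706007200)
assert [action for _, action in window] == ["view_product", "add_to_cart"]
```

Any feature pipeline that can call this cheaply can build a profile cheaply, which is exactly the Art.22 trigger.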
Art.5(1)(c) Data Minimization and Column Creep
Art.5(1)(c) GDPR requires that personal data be "adequate, relevant and limited to what is necessary" for the processing purpose. Cassandra's schema-free column model makes this principle difficult to enforce in practice.
Column accumulation without governance. In a relational database, adding a new column to a table requires a schema migration — a deliberate, reviewable change. In Cassandra, a developer can add a new column to individual rows by simply writing it via CQL. There is no migration gate, no DPO review trigger, no Art.30 update requirement in the tooling itself. New columns accumulate over time, tracking new attributes of user behavior that were never evaluated for GDPR necessity.
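One lightweight mitigation is a scheduled job that diffs the live schema against the approved column inventory in your Art.30 record. The sketch below hardcodes both sets for illustration; in production the live set would come from querying system_schema.columns.

```python
# Approved columns per table, as recorded in the Art.30 ROPA.
APPROVED = {
    "user_events": {"user_id", "event_timestamp", "action_type", "resource_id"},
}

# In production: SELECT column_name FROM system_schema.columns WHERE ...
LIVE = {
    "user_events": {"user_id", "event_timestamp", "action_type",
                    "resource_id", "ip_address", "session_fingerprint"},
}

def ungoverned_columns(approved, live):
    """Columns present in the live schema but never reviewed for
    Art.5(1)(c) necessity. Non-empty output should page the DPO."""
    return {t: sorted(live[t] - approved.get(t, set()))
            for t in live if live[t] - approved.get(t, set())}

drift = ungoverned_columns(APPROVED, LIVE)
assert drift == {"user_events": ["ip_address", "session_fingerprint"]}
```

Run on a schedule, this converts silent column creep into a reviewable event.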
The TTL partial solution. Cassandra supports Time-To-Live (TTL) on individual cells and rows — data automatically expires after a specified period. This provides a technical mechanism for data minimization. AWS Keyspaces supports TTL. But TTL creates the same tombstone dynamic as explicit deletions: expired TTL data is marked with a tombstone and subject to the same compaction schedule uncertainty. TTL does not bypass the gc_grace_seconds window.
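The TTL timeline follows the same arithmetic as an explicit DELETE, a point worth making concrete (a sketch, using the default grace period named above):

```python
from datetime import datetime, timedelta

GC_GRACE_SECONDS = 864_000  # Cassandra default: 10 days

def ttl_timeline(written_at: datetime, ttl_seconds: int):
    """Logical expiry vs. earliest possible physical removal for a TTL'd
    cell. Expiry produces a tombstone, and gc_grace_seconds still applies
    before compaction may discard it."""
    logical_expiry = written_at + timedelta(seconds=ttl_seconds)
    earliest_physical = logical_expiry + timedelta(seconds=GC_GRACE_SECONDS)
    return logical_expiry, earliest_physical

w = datetime(2026, 1, 1)
expiry, physical = ttl_timeline(w, 30 * 86_400)  # cell written with 30-day TTL
assert expiry == datetime(2026, 1, 31)
assert physical == datetime(2026, 2, 10)
```

TTL is therefore a retention-policy tool, not an erasure-latency tool.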
Art.30 Records of Processing and Schema Metadata
Art.30 GDPR requires controllers to maintain records of processing activities, including categories of personal data processed. In a well-governed Keyspaces deployment, the schema definition should map to Art.30 categories — but this mapping is rarely automatic.
Schema as PII inventory. Column names like user_email, ip_address, health_score, location_lat, session_fingerprint, device_id reveal what personal data categories exist in your tables. The system schema is visible to AWS infrastructure as part of operating the managed service. An Art.30 record should reflect every Keyspaces column that contains personal data — a requirement that many organizations implement incompletely.
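A first-pass Art.30 inventory can be drafted mechanically from column names. The keyword map and column rows below are illustrative only, and name-based matching finds only what is honestly named; it is a starting point for a manual review, not a substitute for one.

```python
# Illustrative hint -> GDPR category map; extend for your own naming habits.
PII_HINTS = {"email": "contact data", "ip": "online identifier",
             "location": "location data", "health": "Art.9 health data",
             "device": "online identifier"}

# Illustrative rows, shaped like (table, column) pairs from system_schema.columns.
columns = [("users", "user_email"), ("users", "signup_ts"),
           ("events", "ip_address"), ("telemetry", "health_score"),
           ("telemetry", "device_id")]

def classify(cols):
    """Tag each column whose name contains a PII hint with a GDPR category."""
    hits = {}
    for table, col in cols:
        for hint, category in PII_HINTS.items():
            if hint in col:
                hits[f"{table}.{col}"] = category
                break  # first matching hint wins
    return hits

inventory = classify(columns)
assert inventory["users.user_email"] == "contact data"
assert inventory["telemetry.health_score"] == "Art.9 health data"
assert "users.signup_ts" not in inventory
```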
Keyspaces Streams (change data capture). Keyspaces supports a streams feature (in preview) similar to DynamoDB Streams, enabling change data capture. If streams are enabled, every INSERT, UPDATE, and DELETE generates a stream event containing the before and after state of the row. This stream data is personal data in its own right, with its own retention period and jurisdiction exposure. Stream data flowing into Kinesis Data Streams or Lambda adds further dimensions to the Art.30 record.
EU-Native Cassandra-Compatible Alternatives
Self-Hosted Apache Cassandra on EU Infrastructure
Apache Cassandra (open source, Apache License 2.0) can be deployed on any EU-based infrastructure:
- Hetzner Cloud (Germany): Dedicated Cassandra clusters on German-incorporated infrastructure. Full control over compaction scheduling, gc_grace_seconds configuration, and immediate data deletion workflows. No US-entity control over data.
- OVHcloud (France): Cassandra on OVH dedicated servers or Public Cloud instances. EU-headquartered provider under French law. GDPR DPA available.
- Scaleway (France): Cassandra deployment on Scaleway Dedibox dedicated servers. French company, EU jurisdiction, no CLOUD Act exposure.
Self-hosted Cassandra allows you to: set gc_grace_seconds to 0 for testing erasure completeness, trigger nodetool compact on demand after erasure requests, implement custom Art.17 erasure workflows with verification, and maintain full control over backup retention.
The operational overhead is real — Cassandra cluster management requires experienced operators. But the compliance guarantees that self-hosted provides are qualitatively different from managed AWS Keyspaces.
ScyllaDB (Cassandra-Compatible, EU-Hosted)
ScyllaDB is a high-performance, Cassandra-compatible database written in C++, offering significantly better throughput and lower latency than Apache Cassandra on equivalent hardware. ScyllaDB is CQL-compatible — your Keyspaces application code works without modification.
ScyllaDB Cloud offers fully managed ScyllaDB clusters with EU region options. Scylla Inc. is headquartered in Israel, a jurisdiction that holds an EU adequacy decision under Art.45 but is not an EU member state. The GDPR exposure therefore depends on the specific deployment topology and DPA terms.
Self-hosted ScyllaDB on Hetzner, OVHcloud, or Scaleway delivers the performance advantages of ScyllaDB with full EU sovereignty. ScyllaDB's automatic compaction strategy is more aggressive than standard Cassandra — tombstones are typically removed faster — which improves Art.17 compliance characteristics compared to self-hosted Apache Cassandra.
Aiven for Apache Cassandra: Aiven (Finnish company) offers managed Apache Cassandra with EU region options. Aiven is EU-incorporated, which changes the CLOUD Act exposure. A detailed DPA review is required for your specific use case.
YugabyteDB (CQL-Compatible, EU-Deployable)
YugabyteDB is a distributed SQL database that offers both PostgreSQL-compatible (YSQL) and Cassandra-compatible (YCQL) interfaces. It provides stronger consistency guarantees than Cassandra (Raft-based replication), which simplifies some GDPR compliance scenarios: because replicas agree via consensus, there is no gc_grace_seconds-style anti-entropy window, so the multi-week erasure gap described above does not arise. (Its storage layer still uses log-structured delete markers internally, but in a self-hosted deployment their compaction is under your control.)
YugabyteDB can be self-hosted on EU infrastructure or deployed via Yugabyte Cloud with EU region options. For workloads that use CQL but can tolerate the operational model of YugabyteDB, this is worth evaluating specifically because it avoids the tombstone problem entirely.
PostgreSQL with JSONB (For CQL-Like Flexibility)
For workloads that use Cassandra primarily for its schema flexibility rather than for its write throughput, PostgreSQL with JSONB columns can provide comparable developer ergonomics with significantly better GDPR compliance characteristics:
- DELETE in PostgreSQL is immediate from the application perspective (VACUUM handles physical removal, but it is configurable and fast).
- Aiven for PostgreSQL (Finland): EU-incorporated, managed PostgreSQL with EU regions.
- Supabase (EU region): Managed PostgreSQL with built-in Row Level Security and good GDPR tooling.
- Self-hosted PostgreSQL on Hetzner/OVH: Maximum control.
The trade-off is write throughput at Cassandra scale — if you're writing millions of events per second, PostgreSQL is not the right fit. For most GDPR use cases where the behavioral profile concern is dominant, throughput requirements are lower.
sota.io Platform
sota.io is a European PaaS platform that enables deployment of any of the above alternatives — Apache Cassandra, ScyllaDB, YugabyteDB, PostgreSQL — on EU-controlled infrastructure with a single deploy command. It is designed for exactly the scenario this article describes: teams migrating away from AWS managed services to regain GDPR control without rebuilding their entire operational stack.
Deploy your ScyllaDB or Cassandra cluster on sota.io and your data remains on EU-incorporated infrastructure under EU jurisdiction. No CLOUD Act exposure. No managed-service compaction schedule outside your control.
Migration Considerations: Moving from AWS Keyspaces
Migrating a running Keyspaces workload to self-hosted Cassandra or ScyllaDB involves several steps:
Schema export. CQL DESCRIBE TABLE statements export your Keyspaces schema. ScyllaDB and Apache Cassandra accept these schemas directly with minor adjustments (Keyspaces uses a slightly modified CQL dialect — e.g., WITH CUSTOM_PROPERTIES blocks must be removed).
Data export. Keyspaces does not support direct SSTable export. Options include: (1) AWS Glue ETL jobs writing to S3 in Parquet format, then importing via CQLSH COPY; (2) Cassandra's cqlsh COPY TO command against your Keyspaces endpoint; (3) DataStax Bulk Loader (dsbulk) which works with standard CQL endpoints.
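As a sketch of option (3), the unload/load pair can be scripted. The dsbulk flags shown (-h, -k, -t, -url) are the commonly documented basics; the hostnames are placeholders, and real Keyspaces connections additionally need TLS and SigV4 or service-specific credentials, so verify flags against the dsbulk version you actually run.

```python
def dsbulk_commands(keyspace, table, dump_dir, src_host, dst_host):
    """Build a dsbulk unload (from Keyspaces) plus load (into the new
    cluster) command pair. Placeholders only: auth/TLS flags omitted."""
    unload = ["dsbulk", "unload", "-h", src_host,
              "-k", keyspace, "-t", table, "-url", dump_dir]
    load = ["dsbulk", "load", "-h", dst_host,
            "-k", keyspace, "-t", table, "-url", dump_dir]
    return unload, load

unload_cmd, load_cmd = dsbulk_commands(
    "app", "user_events", "/tmp/user_events_dump",
    "cassandra.eu-central-1.amazonaws.com",  # placeholder source endpoint
    "scylla.internal.example")               # placeholder target endpoint
assert unload_cmd[:2] == ["dsbulk", "unload"]
assert load_cmd[:2] == ["dsbulk", "load"]
```

Keeping the dump directory identical for both commands means the unload output feeds the load step directly.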
Application compatibility. Your application uses the Cassandra Java/Python/Go driver. Self-hosted Cassandra and ScyllaDB accept the same driver and CQL queries. No application code changes are required for a like-for-like migration.
Erasure workflow redesign. Post-migration, implement an explicit Art.17 erasure workflow: (1) Issue CQL DELETE, (2) Set gc_grace_seconds to 0 on the table (for testing), (3) Trigger nodetool compact immediately after the grace period, (4) Verify with nodetool cfstats that tombstone count drops to zero, (5) Restore gc_grace_seconds to production value. Document this workflow in your ROPA (Art.30 record).
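Steps (2) through (5) can be scripted against a self-hosted cluster. The sketch below only assembles the command sequence (the nodetool subcommands named in the steps above; the cqlsh invocation is a placeholder, and waiting out the grace period between steps is left to the operator or scheduler):

```python
def erasure_command_plan(keyspace, table, prod_gc_grace=864_000):
    """Assemble the post-DELETE verification sequence from the workflow
    above as shell argv lists. Execution and sequencing delays are the
    operator's responsibility; nothing here runs anything."""
    def alter(secs):
        return ["cqlsh", "-e",
                f"ALTER TABLE {keyspace}.{table} WITH gc_grace_seconds = {secs};"]
    return [
        alter(0),                                        # (2) shrink grace window
        ["nodetool", "compact", keyspace, table],        # (3) force compaction
        ["nodetool", "cfstats", f"{keyspace}.{table}"],  # (4) inspect tombstones
        alter(prod_gc_grace),                            # (5) restore default
    ]

plan = erasure_command_plan("app", "user_events")
assert plan[1] == ["nodetool", "compact", "app", "user_events"]
assert "gc_grace_seconds = 0" in plan[0][-1]
```

Logging each executed step with timestamps gives you the erasure evidence trail the ROPA documentation step asks for.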
Compliance Checklist: AWS Keyspaces and GDPR
| Requirement | AWS Keyspaces | Self-Hosted Cassandra (EU) | ScyllaDB EU | YugabyteDB EU |
|---|---|---|---|---|
| Art.17 Erasure (logical) | ✓ (tombstone) | ✓ (tombstone) | ✓ (tombstone) | ✓ (immediate) |
| Art.17 Erasure (physical) | ✗ (AWS-controlled compaction) | ✓ (nodetool compact on demand) | ✓ (aggressive auto-compact) | ✓ (no tombstone model) |
| Art.17 PITR Backup Purge | ✗ (manual S3 deletion) | ✓ (you control backups) | ✓ (you control backups) | ✓ (you control backups) |
| Free of CLOUD Act exposure | ✗ (US entity) | ✓ (EU infra + EU provider) | ✓ (if EU-hosted) | ✓ (if EU-hosted) |
| Multi-region control | Partial (you choose regions) | ✓ | ✓ | ✓ |
| Compaction schedule control | ✗ | ✓ | ✓ | N/A |
| Art.22 profiling tooling | ✗ | ✓ | ✓ | ✓ |
| Art.30 schema visibility | Partial | ✓ | ✓ | ✓ |
| CQL compatibility | ✓ | ✓ | ✓ | ✓ (YCQL) |
Conclusion
AWS Keyspaces offers genuine value: serverless Cassandra with no operational overhead, native AWS integrations, and EU-region data residency. For teams building CQL-based applications on AWS, it removes significant infrastructure burden.
The GDPR compliance gaps are real and structural:
- Art.17 erasure: Tombstone-based deletion leaves personal data on disk for an indeterminate period after a deletion request. Compaction is AWS-controlled. PITR backups retain pre-deletion state for up to 35 days.
- CLOUD Act: Every row in every Keyspaces table in every region is reachable by US government legal order without European court involvement or notification to you.
- Art.22 behavioral profiling: Cassandra's partition-as-profile design makes wide-column stores the natural substrate for behavioral profiling — with the full Art.22 obligation set following.
- Art.5(1)(c) column creep: Schema-free column addition without governance gates makes data minimization difficult to enforce.
For organizations subject to GDPR — particularly those processing health data, financial behavioral records, or operating under NIS2 or the EU AI Act's high-risk provisions — these are not theoretical risks. They are concrete compliance gaps that a DPO audit will surface.
The path forward is either: (1) accept AWS Keyspaces with documented mitigations (shorter PITR window, explicit erasure SOP, Art.30 schema inventory, CLOUD Act disclosure in your privacy notice), or (2) migrate to EU-controlled infrastructure running Apache Cassandra, ScyllaDB, or YugabyteDB where you control the compaction schedule, the backup retention, and the CLOUD Act exposure does not exist.
For most organizations processing significant volumes of personal data in Cassandra, option two is the more defensible position — and sota.io exists to make that migration operationally straightforward.
EU-Native Hosting
Ready to move to EU-sovereign infrastructure?
sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.