Starburst Galaxy EU Alternative 2026: GDPR, CLOUD Act & Trino SaaS Sovereignty
Post #4 in the sota.io EU Data Lakehouse Tools Series
Starburst Galaxy is the managed SaaS offering built on top of Trino (formerly PrestoSQL) — the open-source distributed SQL query engine originally created at Facebook. For EU data teams running federated analytics across data lakes, object stores, and operational databases, Galaxy offers powerful query-federation capabilities. But underneath the convenience lies a critical sovereignty problem: the control plane that orchestrates every federated query across your EU data sources runs in US-jurisdiction infrastructure.
This is the fourth post in our five-part EU Data Lakehouse Tools series. We've already covered Databricks, Snowflake, and dbt Cloud. Starburst presents a distinct sovereignty challenge: unlike warehouse platforms that store data, Starburst federates queries — which means your EU personal data transits a US-controlled orchestration layer even when the underlying data never leaves European servers.
What Is Starburst Galaxy?
Starburst Data Inc. is a Delaware C-Corp headquartered in Boston, Massachusetts. Founded in 2017 by the original Presto team from Facebook, Starburst had raised $164M in total by 2021, when Andreessen Horowitz led its Series C. The company provides:
- Starburst Galaxy — fully managed Trino-as-a-Service with AWS, Azure, and GCP deployment options
- Starburst Enterprise (SEP, the Starburst Enterprise Platform): self-managed Trino distribution for on-premises and private cloud, with enhanced security, governance, and performance
Galaxy's core value proposition is query federation: connect to S3, Delta Lake, Iceberg, Hive, PostgreSQL, MySQL, Elasticsearch, Kafka, and dozens of other sources — and run a single SQL query across all of them simultaneously. For EU organizations with data spread across multiple cloud stores and on-premises databases, this capability is genuinely powerful.
CLOUD Act Score: 16/25
| Dimension | Score | Assessment |
|---|---|---|
| D1 — HQ Jurisdiction | 5/5 | Delaware C-Corp, Boston MA — full US jurisdiction |
| D2 — Data Routing | 3/5 | EU AWS/Azure regions available, but control plane runs US-side |
| D3 — Subprocessors | 3/5 | AWS us-east-1, Azure eastus as primary control plane infrastructure |
| D4 — Personnel Access | 3/5 | US-based SRE team can access Galaxy cluster configurations, query plans |
| D5 — Legal Framework | 2/5 | SCCs only, no BCR, no CLOUD Act Shield commitment, no FISA challenge history |
| Total | 16/25 | Significant CLOUD Act exposure |
A score of 16/25 places Starburst in the "meaningful risk" zone for GDPR Art.46 transfer mechanism compliance. The specific risk is not raw data storage — Galaxy doesn't inherently store your EU data — but rather the query orchestration layer that processes, routes, and caches query plans and result metadata.
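As a quick arithmetic check, the total can be reproduced from the five dimension scores in the table. This is only a sketch of the summation; the dictionary keys are shortened labels chosen for illustration, and no risk-band thresholds are encoded because this post does not define them.

```python
# Dimension scores copied from the table above (each out of 5).
SCORES = {
    "D1 HQ Jurisdiction": 5,
    "D2 Data Routing": 3,
    "D3 Subprocessors": 3,
    "D4 Personnel Access": 3,
    "D5 Legal Framework": 2,
}
MAX_PER_DIMENSION = 5


def total_score(scores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum) after validating each dimension's range."""
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name}: score {value} outside 0..{MAX_PER_DIMENSION}")
    return sum(scores.values()), MAX_PER_DIMENSION * len(scores)


total, maximum = total_score(SCORES)
print(f"CLOUD Act score: {total}/{maximum}")  # CLOUD Act score: 16/25
```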
Three Named Risk Patterns
1. Trino Query Federation Control Plane Pattern
Every SQL query executed in Galaxy follows this path: your client submits the query → Galaxy's coordinator (running in US jurisdiction) parses and optimizes the query plan → the coordinator distributes split tasks to workers → workers pull data from your EU sources → results aggregate in the coordinator → output returns to your client.
The critical CLOUD Act exposure point is the Galaxy coordinator cluster. Even when all your underlying data lives in EU AWS regions (S3 eu-west-1, Aurora eu-central-1), the coordinator does all of the following in US-jurisdiction infrastructure:
- Parses SQL, including column names and filter conditions that may identify EU data subjects
- Stores intermediate query plans that reveal your data model's structure
- Caches result metadata, including row counts, distinct value estimates, and column statistics
- Logs query history for performance analysis and debugging
A CLOUD Act compelled disclosure targeting Starburst could therefore expose every SQL query your EU data team has run for the past 90 days, including queries that enumerate EU personal data attributes.
GDPR Art.32 implication: Processing operations that reveal your schema design, query patterns, and data access frequencies constitute "processing of personal data" when the queries operate on personal data. EDPB guidelines recognize that metadata about personal data processing is itself subject to Art.32 technical safeguards.
DORA Art.28 implication: For financial institutions using Galaxy for regulatory reporting or risk analytics, the coordinator's US jurisdiction creates a critical ICT third-party dependency that requires contractual documentation of "location of data processing" — which Galaxy cannot honestly represent as EU-only.
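To make the exposure concrete, here is a minimal sketch of how much a party holding only the SQL text can learn. The flagged column list, the catalog names (`lake`, `pg`), and the query itself are hypothetical, and the regex tokenizer is a rough stand-in for a real SQL parser; it is not how Trino parses queries.

```python
import re

# Hypothetical column names that a DPO has flagged as personal data.
PERSONAL_COLUMNS = {"email", "date_of_birth", "iban", "customer_country_of_residence"}

STRING_LITERAL = r"'((?:[^']|'')*)'"


def exposed_in_query_text(sql: str) -> dict[str, list[str]]:
    """List flagged column names and string literals visible in the SQL text."""
    literals = re.findall(STRING_LITERAL, sql)
    # Blank out literals so their contents are not mistaken for identifiers.
    without_literals = re.sub(STRING_LITERAL, "''", sql)
    identifiers = {
        token.lower()
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_literals)
    }
    return {
        "personal_columns": sorted(identifiers & PERSONAL_COLUMNS),
        "literals": literals,
    }


query = """
SELECT c.email, o.total
FROM lake.crm.customers c
JOIN pg.shop.orders o ON o.customer_id = c.id
WHERE c.date_of_birth < DATE '1990-01-01' AND c.iban LIKE 'DE%'
"""
report = exposed_in_query_text(query)
print(report)
```

Even without touching a single row, the query text shows that the table holds emails, birth dates, and IBANs, and that the analyst is filtering on German accounts.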
2. Ranger Policy Propagation Gap
Galaxy integrates with Apache Ranger for fine-grained access control across connected data sources. When you configure row-level security (RLS) policies in Galaxy — for example, restricting EU data analysts to only query rows where data_residency = 'EU' — those Ranger policies are stored and managed in Galaxy's US-hosted management plane.
The operational CLOUD Act gap:
- Policy state custody: Your data access policies (which users can see which columns of which tables) are CLOUD Act-accessible in Galaxy's US infrastructure
- Policy propagation latency: RLS policy changes must travel from the US management plane to EU-region worker nodes before taking effect — creating a window where updated GDPR-required access restrictions may not yet be enforced
- Audit log jurisdiction: Ranger audit logs recording who accessed which EU personal data fields are generated and stored in Galaxy's US infrastructure before optional export to your SIEM
This matters for GDPR Art.5(1)(f) "integrity and confidentiality" compliance: when the policy state of your access control system is itself accessible under US jurisdiction, the sovereign access control model required for EU personal data is undermined.
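The propagation-latency point can be modelled as a simple time window. The sketch below assumes a fixed delay between the management plane accepting a policy change and EU workers enforcing it; the 45-second figure and the query timestamps are illustrative only, not measured Galaxy behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PolicyUpdate:
    """One row-level security change and its assumed propagation delay."""
    committed_at: datetime        # management plane accepted the change
    propagation_delay: timedelta  # assumed lag until EU workers enforce it

    @property
    def enforced_at(self) -> datetime:
        return self.committed_at + self.propagation_delay


def ran_under_stale_policy(query_started: datetime, update: PolicyUpdate) -> bool:
    """True if the query started inside the commit-to-enforcement window."""
    return update.committed_at <= query_started < update.enforced_at


# Illustrative numbers only; measure the real lag in your own deployment.
update = PolicyUpdate(
    committed_at=datetime(2026, 3, 2, 9, 0, 0, tzinfo=timezone.utc),
    propagation_delay=timedelta(seconds=45),
)
queries = {
    "q1": datetime(2026, 3, 2, 8, 59, 50, tzinfo=timezone.utc),
    "q2": datetime(2026, 3, 2, 9, 0, 20, tzinfo=timezone.utc),
    "q3": datetime(2026, 3, 2, 9, 1, 0, tzinfo=timezone.utc),
}
stale = [name for name, started in queries.items()
         if ran_under_stale_policy(started, update)]
print(stale)  # ['q2']
```

Running a check like this against your own audit log after each policy change gives the DPO a list of queries that may need review.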
3. Iceberg REST Catalog Exposure
Galaxy's deepest integration is with Apache Iceberg — the open table format for data lakehouses. Galaxy includes a managed Iceberg REST Catalog that stores:
- Table schemas — column names, data types, nested struct definitions for your Iceberg tables
- Partition specifications — how EU personal data is physically partitioned (often by date of birth ranges, geographic regions, or user ID hashes)
- Snapshot history — complete time-travel metadata including every schema evolution, compaction, and optimization operation
- Statistics and manifests — file-level statistics including record counts, column value ranges, and null value distributions
This REST Catalog, which functions as the metadata brain of your lakehouse, runs as a Galaxy-managed service in US-jurisdiction infrastructure. Under CLOUD Act compelled disclosure, a US government request could obtain the complete structural blueprint of your EU personal data lakehouse: which columns exist, how the data is physically organized, how the schema has evolved over time, and statistical distributions that can reveal the demographics of your EU data subjects.
The Partition Specification Risk: Iceberg partition specs are particularly sensitive. A financial institution partitioning by customer_country_of_residence and risk_tier reveals both geographic distribution and creditworthiness modeling of EU data subjects. A healthcare provider partitioning by diagnosis_code_range exposes the structure of special category health data (GDPR Art.9). Galaxy's managed REST Catalog makes these partition specs US-jurisdiction-accessible by design.
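The following sketch shows how little work it takes to read partition columns out of catalog metadata. The dictionary follows the field names of the Iceberg table metadata format (`schemas`, `partition-specs`, `source-id`, `transform`), but the table itself is a made-up example based on the financial-institution scenario above, trimmed to the keys the function needs.

```python
# Hypothetical, trimmed Iceberg table metadata for a customer risk table.
TABLE_METADATA = {
    "current-schema-id": 0,
    "schemas": [{
        "schema-id": 0,
        "type": "struct",
        "fields": [
            {"id": 1, "name": "customer_id", "required": True, "type": "long"},
            {"id": 2, "name": "customer_country_of_residence", "required": True, "type": "string"},
            {"id": 3, "name": "risk_tier", "required": True, "type": "string"},
            {"id": 4, "name": "balance", "required": False, "type": "decimal(18, 2)"},
        ],
    }],
    "default-spec-id": 0,
    "partition-specs": [{
        "spec-id": 0,
        "fields": [
            {"source-id": 2, "field-id": 1000,
             "name": "customer_country_of_residence", "transform": "identity"},
            {"source-id": 3, "field-id": 1001,
             "name": "risk_tier", "transform": "identity"},
        ],
    }],
}


def partition_columns(metadata: dict) -> list[tuple[str, str]]:
    """Resolve the default partition spec to (source column, transform) pairs."""
    schema = next(s for s in metadata["schemas"]
                  if s["schema-id"] == metadata["current-schema-id"])
    names_by_id = {field["id"]: field["name"] for field in schema["fields"]}
    spec = next(s for s in metadata["partition-specs"]
                if s["spec-id"] == metadata["default-spec-id"])
    return [(names_by_id[f["source-id"]], f["transform"]) for f in spec["fields"]]


revealed = partition_columns(TABLE_METADATA)
print(revealed)
```

Anyone with read access to the catalog can run the equivalent of this function, which is why the jurisdiction of the catalog matters as much as the jurisdiction of the data files.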
EU-Native Alternatives
For EU data teams needing Trino's federated query power without US control plane exposure:
Trino Self-Hosted (CLOUD Act Score: 0/25)
The obvious starting point. Trino is Apache 2.0 licensed — free to deploy on EU infrastructure. Key considerations:
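- Deployment: Kubernetes via Helm charts on an EU-owned cloud or on-premises. AWS eu-west-1 or eu-central-1 also work technically, but a US-headquartered provider brings its own CLOUD Act exposure, so the 0/25 score assumes EU-owned infrastructure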
- Catalog management: Configure your own Hive Metastore or Iceberg REST Catalog on EU infrastructure
- Security: Self-managed Ranger or OPA (Open Policy Agent) for access control — no US policy plane dependency
- Operational cost: Significant — Trino cluster management requires dedicated platform engineering expertise
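As a starting point for the catalog step, the sketch below renders a Trino catalog properties file that points the Iceberg connector at a self-hosted REST catalog and S3-compatible storage. The property names are taken from Trino's Iceberg connector and native S3 file system settings, but verify them against the documentation for your Trino version. The hostnames, the region name, and the `.example.eu` allowlist are placeholders, and a hostname suffix check is only a guard against misconfiguration, not proof of jurisdiction.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hostname suffixes for EU-operated endpoints.
ALLOWED_HOST_SUFFIXES = (".example.eu",)


def render_iceberg_catalog(rest_uri: str, s3_endpoint: str, s3_region: str) -> str:
    """Render the body of etc/catalog/<name>.properties for a REST-backed Iceberg catalog."""
    for url in (rest_uri, s3_endpoint):
        host = urlparse(url).hostname or ""
        if not host.endswith(ALLOWED_HOST_SUFFIXES):
            raise ValueError(f"endpoint {url!r} is not on the EU allowlist")
    lines = [
        "connector.name=iceberg",
        "iceberg.catalog.type=rest",
        f"iceberg.rest-catalog.uri={rest_uri}",
        "fs.native-s3.enabled=true",
        f"s3.endpoint={s3_endpoint}",
        f"s3.region={s3_region}",
        "s3.path-style-access=true",
    ]
    return "\n".join(lines) + "\n"


catalog_properties = render_iceberg_catalog(
    rest_uri="https://catalog.example.eu",
    s3_endpoint="https://s3.example.eu",
    s3_region="eu-placeholder-1",
)
print(catalog_properties)
```

Generating the file from code rather than editing it by hand makes it easy to fail a deployment pipeline when someone points a catalog at a non-EU endpoint.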
Apache Spark + Iceberg (CLOUD Act Score: 0/25)
For batch-first use cases where Trino's low-latency federation isn't essential:
- Delta Lake and Iceberg both run natively on self-hosted Spark
- EU cloud provider: OVHcloud, Hetzner, Deutsche Telekom OTC, or Scaleway, all of which offer S3-compatible object storage that Spark can read and write
- Governance: Apache Atlas for metadata catalog, Apache Ranger for access control — both self-hosted in EU
DuckDB (CLOUD Act Score: 0/25)
For smaller-scale federated analytics within a single node:
- Origin: CWI Amsterdam, Netherlands — EU-native research institution
- License: MIT — no vendor lock-in, no SaaS control plane
- Use case: BI analytics, ad-hoc data science, replacing heavy Spark for medium-scale workloads
- EU deployment: Runs anywhere — EC2, container, laptop — no external connectivity required
Ahana Cloud for Presto (EU Deployable)
Ahana (acquired by IBM) offers managed Presto (the original Facebook project, PrestoDB, from which Trino was forked) with deployment options that can be configured for EU-only data residency. However, Ahana's control plane still falls under IBM's US jurisdiction, so evaluate it with the same CLOUD Act framework before deployment.
Contractual Protections: What Galaxy's DPA Provides (and Doesn't)
Starburst offers a Data Processing Addendum that includes:
- EU SCCs (2021) — Standard Contractual Clauses for international data transfers
- Sub-processor list — disclosure of AWS and Azure as primary infrastructure providers
- Data residency options — choice of EU AWS or EU Azure regions for worker clusters
What Galaxy's DPA does not provide:
- CLOUD Act Shield: No commitment to challenge US government data requests before compliance
- Warrant canary: No regular transparency reporting on government data requests
- FISA Section 702 carve-out: No limitation on intelligence community access
- Control plane jurisdiction commitment: No guarantee that the coordinator and management plane will run in, and be accessible only from, EU jurisdiction
For GDPR Art.46 compliance documentation, you can implement SCCs — but the Transfer Impact Assessment (TIA) required by EDPB guidelines must acknowledge that Galaxy's US-jurisdiction control plane creates a realistic risk of CLOUD Act exposure for EU query metadata, policy state, and catalog information.
Decision Framework: When Galaxy is Acceptable vs. When to Self-Host
| Scenario | Galaxy Acceptable? | Recommendation |
|---|---|---|
| EU personal data analytics (GDPR Art.9 excluded) | Conditionally | Implement SCC + TIA, document residual risk |
| Special category data (health, biometric) or highly sensitive financial data | No | Self-host Trino or Apache Spark |
| DORA-regulated financial institution | Assess | Legal review required; Art.28 documentation complex |
| NIS2-critical infrastructure | No | On-premises Trino with EU-only policy plane |
| Non-EU non-personal data analytics | Yes | Full Galaxy deployment appropriate |
| GDPR Art.9 data — minimized (only aggregates, no PII) | Yes with caution | Ensure no raw PII transits coordinator |
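The table can be encoded as a small function for use in architecture reviews. The verdicts and recommendations are copied from the rows above; the `Workload` fields and, in particular, the order in which overlapping scenarios are checked (strictest first) are assumptions of this sketch, since the table does not say which row wins when several apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    personal_data: bool = False
    special_category: bool = False  # GDPR Art.9 data is involved
    aggregates_only: bool = False   # no raw PII reaches the coordinator
    dora_regulated: bool = False
    nis2_critical: bool = False


def galaxy_acceptable(w: Workload) -> tuple[str, str]:
    """Return (verdict, recommendation); precedence is an assumption of this sketch."""
    if w.nis2_critical:
        return "No", "On-premises Trino with EU-only policy plane"
    if w.special_category and not w.aggregates_only:
        return "No", "Self-host Trino or Apache Spark"
    if w.dora_regulated:
        return "Assess", "Legal review required; Art.28 documentation complex"
    if w.special_category:
        return "Yes with caution", "Ensure no raw PII transits coordinator"
    if w.personal_data:
        return "Conditionally", "Implement SCC + TIA, document residual risk"
    return "Yes", "Full Galaxy deployment appropriate"


verdict, recommendation = galaxy_acceptable(
    Workload(personal_data=True, special_category=True)
)
print(verdict, "-", recommendation)  # No - Self-host Trino or Apache Spark
```

Treat the output as a prompt for discussion with legal and the DPO, not as a substitute for their sign-off.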
The Trino Governance Stack: What EU Self-Hosters Actually Need
If you decide to self-host Trino rather than use Galaxy, here's the complete governance stack you need to replicate Galaxy's enterprise features:
Query Engine:
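- Trino v440 or later from the Trino Software Foundation (trino.io). The open-source project ships frequent numbered releases rather than an LTS line, so plan for regular upgrades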
- Deployment via Trino Helm Chart (Kubernetes) or Starburst Enterprise for support contract
Catalog & Metadata:
- Apache Iceberg REST Catalog (self-hosted) — schema and snapshot management
- Apache Hive Metastore — for Hive-format compatibility
- Project Nessie — Git-for-data catalog with time travel, Iceberg-native
Access Control:
- Apache Ranger (self-hosted) — fine-grained column-level access control
- Open Policy Agent (OPA) — policy-as-code for programmatic access control
- Trino's built-in file-based access control — for simpler environments
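For the simplest of these options, the sketch below generates a rules file for Trino's file-based access control. The rule keys (`catalogs`, `tables`, `privileges`, `columns`, `filter`) follow Trino's documented rules format as far as I know it, so check them against the documentation for your version. The group, catalog, schema, table, and column names are hypothetical, and the row filter reuses the `data_residency = 'EU'` example from the Ranger section.

```python
import json


def build_rules(group: str, catalog: str, schema: str, table: str,
                hidden_columns: list[str], row_filter: str) -> dict:
    """Build a rules document granting filtered, column-restricted SELECT access."""
    return {
        "catalogs": [
            {"group": group, "catalog": catalog, "allow": "read-only"},
        ],
        "tables": [
            {
                "group": group,
                "catalog": catalog,
                "schema": schema,
                "table": table,
                "privileges": ["SELECT"],
                # Columns listed here are denied to the group.
                "columns": [{"name": name, "allow": False} for name in hidden_columns],
                # SQL predicate applied to every query against the table.
                "filter": row_filter,
            },
        ],
    }


rules = build_rules(
    group="eu_analysts",
    catalog="lake",
    schema="crm",
    table="customers",
    hidden_columns=["iban", "date_of_birth"],
    row_filter="data_residency = 'EU'",
)
# Write this to e.g. etc/rules.json and reference it from
# etc/access-control.properties (access-control.name=file,
# security.config-file=etc/rules.json).
rules_json = json.dumps(rules, indent=2)
print(rules_json)
```

Because the rules live in a file on your own coordinator, the policy state stays inside the infrastructure you control and can be versioned in Git alongside the deployment.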
Monitoring & Operations:
- Trino's built-in JMX metrics → Prometheus → Grafana
- Query audit logging → EU-jurisdiction log aggregation (Loki, OpenSearch self-hosted)
EU Cloud Providers with Trino-Ready Infrastructure:
- OVHcloud (FR) — Managed Kubernetes + S3-compatible Object Storage
- Hetzner (DE) — Cost-effective bare metal + Kubernetes for Trino workers
- IONOS (DE) — Cloud Cubes + S3 Object Storage (GDPR DPA compliant)
- Scaleway (FR) — Managed Kubernetes + Scaleway Object Storage
GDPR Technical Safeguard Checklist for Trino Deployments
When an EU data protection authority reviews a Trino-based data lakehouse deployment, these are the technical safeguards it is likely to expect under GDPR Art.32:
- Query logging encryption: All Trino coordinator query logs encrypted at rest with EU-held keys
- Access control completeness: Ranger or OPA policies covering all catalogs, schemas, tables, and columns containing personal data
- Data minimization at query layer: Column-level masking for columns not required by the specific query purpose (Art.5(1)(c))
- Audit trail jurisdiction: Query audit logs stored in EU-jurisdiction systems and retrievable in response to data subject access requests (Art.15) and supervisory authority investigations (Art.58)
- Sub-processor contracts: All infrastructure providers (cloud, storage, Kubernetes) have GDPR-compliant DPAs with EU SCCs
- Transfer documentation: TIA completed for every transfer of personal data from the EEA to a third country without an adequacy decision; flows that stay within the EEA need no TIA
- Incident response procedure: Defined process for Art.33 breach notification within 72 hours of discovery, triggered by Trino audit log anomalies
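The incident response item is easy to automate in part. The sketch below computes the Art.33 notification deadline from the moment the controller became aware of the breach; the function names and the example timestamps are illustrative, and how "awareness" is determined from audit log anomalies remains a process decision for your incident team.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)  # GDPR Art.33(1)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the latest notification time in UTC for a timezone-aware timestamp."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at.astimezone(timezone.utc) + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left until the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


aware_at = datetime(2026, 3, 2, 9, 30, tzinfo=timezone.utc)
deadline = notification_deadline(aware_at)
remaining = hours_remaining(aware_at, datetime(2026, 3, 3, 9, 30, tzinfo=timezone.utc))
print(deadline.isoformat(), remaining)  # 2026-03-05T09:30:00+00:00 48.0
```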
KRITIS-Dachgesetz Context (July 2026)
German organizations designated as critical infrastructure under KRITIS-Dachgesetz (effective 2026-07-17) face additional requirements for ICT systems that process critical infrastructure operational data. Starburst Galaxy deployments for operational analytics in energy, water, transport, or healthcare sectors require:
- Risk assessment per §30 KRITIS-Dachgesetz for third-party ICT services
- Incident reporting capability that doesn't depend on US-jurisdiction control plane availability
- Business continuity documentation showing KRITIS operations continue if Galaxy's US control plane is disrupted
Self-hosted Trino on EU-only infrastructure removes the US control plane dependency by design, which makes the KRITIS-Dachgesetz ICT requirements above far easier to meet and document. Galaxy requires additional contractual and technical measures that may be difficult to obtain from a US-incorporated vendor.
Conclusion
Starburst Galaxy offers genuine value for EU data teams: managed Trino removes significant operational complexity, and the connector ecosystem makes cross-source federation fast to implement. The sovereignty gap, however, is not trivial to close with contractual protections alone.
The CLOUD Act risk pattern for Galaxy is distinctive: it's not about data storage but about query orchestration. Every SQL statement your EU data team executes — including queries that enumerate, count, or analyze EU personal data — transits a US-jurisdiction coordinator that is CLOUD Act-compellable. Combined with the Iceberg REST Catalog exposure and Ranger policy plane gap, Galaxy creates a three-layer sovereignty deficit that SCCs alone cannot remediate under EDPB's TIA guidelines.
For EU organizations with GDPR Art.9 special category data or DORA-regulated analytics workloads: Self-host Trino on EU infrastructure with EU-jurisdiction governance components. The operational investment is significant but represents the only path to genuine data lakehouse sovereignty.
For EU organizations with less sensitive workloads: Galaxy with SCCs, a completed TIA, and documented residual risk acknowledgment may be acceptable — with legal and DPO sign-off.
The next and final post in this series will be our comprehensive EU Data Lakehouse Comparison Finale: Databricks vs. Snowflake vs. dbt Cloud vs. Starburst vs. EU-native OSS — with a decision framework for architects and DPOs choosing their 2026 data lakehouse strategy.