Blog — sota.io

2026-05-02·15 min read

# AWS HealthLake EU Alternative 2026: GDPR Art.9 Health Data, No EU Region, and the CLOUD Act Problem

Amazon HealthLake is AWS's managed FHIR (Fast Healthcare Interoperability Resources) datastore — a purpose-built service for storing, transforming, and analysing patient health information at scale. It ingests FHIR R4 resources, normalises them into a queryable format, and powers analytics workflows across patient populations.

For European healthcare organisations, HealthLake presents a compliance picture that is difficult to defend from first principles. The service has no EU region. Every EU patient record stored in HealthLake must be transferred to US-controlled AWS infrastructure under a legal framework that provides no protection equivalent to GDPR's Art.9 obligations. And unlike general-purpose compute or storage services — where the argument about data residency and jurisdiction is complex — HealthLake stores a single category of data: health information about identified individuals. That is Article 9 special category personal data, subject to the highest tier of GDPR protection, processed exclusively on infrastructure whose contents a US federal court can compel AWS to disclose.

This article analyses the five critical GDPR failure vectors in HealthLake's architecture, then maps the EU-sovereign FHIR server alternatives that eliminate the cross-border transfer problem entirely.

---

## What AWS HealthLake Does (and Why That Creates GDPR Exposure)

Amazon HealthLake is a HIPAA-eligible, FHIR R4-compliant managed datastore. It accepts FHIR bundles and individual resources — Patient, Observation, Condition, MedicationRequest, DiagnosticReport, Encounter, Immunization, AllergyIntolerance, and the full FHIR R4 resource library — and stores them in a queryable format.
It provides:

- **FHIR-native storage:** Every patient record is stored as a FHIR resource with a server-assigned logical ID, version history, and metadata
- **Integrated transformation:** HealthLake can import data from legacy HL7v2 or CCDA formats, transforming it into FHIR R4
- **Search and query:** FHIR search parameters (patient ID, date range, clinical code) plus SQL-like analytics via Amazon Athena integration
- **NLP enrichment:** HealthLake can run Amazon Comprehend Medical over unstructured clinical notes to extract named entities — diagnoses, medications, anatomy — and store them as structured FHIR extensions
- **Population health analytics:** HealthLake's analytics features enable querying across patient populations — identifying cohorts, computing population-level statistics, and exporting to S3 for downstream ML workflows

For US healthcare systems building on HIPAA-compliant infrastructure, HealthLake solves real problems: standardised FHIR ingestion, managed search indexing, and population health analytics without operating a dedicated FHIR server. The compliance question is not whether HealthLake is a technically sound service — it is — but whether European organisations can deploy it under GDPR's obligations for Article 9 data.

---

## GDPR Failure Vector 1: No EU Region — the Article 44 Transfer Problem

**Article 44** prohibits transfers of personal data to third countries unless specific conditions are met: an adequacy decision, appropriate safeguards (Standard Contractual Clauses, Binding Corporate Rules), or derogations. For health data — Article 9 special category data — the bar is even higher.

As of 2026, Amazon HealthLake is available in a small number of AWS regions, none of which are located in the European Union or EEA. The service has been available in:

- US East (N. Virginia)
- US West (Oregon)

This means that any EU healthcare organisation deploying HealthLake must transfer patient records to US-hosted AWS infrastructure.
Every Patient resource. Every Observation. Every Condition, Medication, and DiagnosticReport. The entire patient record crosses the Atlantic every time a FHIR resource is written.

**Standard Contractual Clauses do not solve the CLOUD Act problem for health data.** SCCs are a transfer mechanism — they authorise the transfer under Art.44 by creating contractual obligations between the EU data exporter and the US data importer (AWS). But SCCs cannot override US law. The Schrems II judgment (C-311/18, July 2020) established that SCCs are insufficient where the law of the destination country enables access to transferred data in a manner that conflicts with EU fundamental rights.

For health data specifically, this creates a layered problem:

1. The transfer itself requires Art.44 authorisation (SCCs or adequacy decision)
2. Even with SCCs in place, the US CLOUD Act can compel AWS to disclose the data
3. CLOUD Act compelled disclosure of patient health records is not authorised under any Art.9(2) legal basis in GDPR
4. The EU-US Data Privacy Framework (adequacy decision, 2023) applies to commercial data transfers to US companies certified under the DPF — it does not eliminate CLOUD Act compelled access, and it does not create a special carve-out for health data

The practical result: there is no currently available legal mechanism that authorises both (a) the transfer of EU patient health data to US-hosted HealthLake and (b) protection against CLOUD Act compelled disclosure of that data. These two requirements are legally incompatible under the current framework.

**Data Protection Authorities (DPAs)** in Germany (Bundesbeauftragter für den Datenschutz und die Informationsfreiheit), France (CNIL), and the Netherlands (AP) have all issued guidance indicating that transfers of health data to US infrastructure require particularly careful analysis, and that the CLOUD Act represents a systemic risk that SCCs cannot address.
The German DSK (Datenschutzkonferenz) position on cloud services with US parent companies is explicit: cloud services that are subject to US surveillance laws cannot be used for particularly sensitive personal data without additional technical measures (end-to-end encryption under controller-held keys, with key management outside US jurisdiction).

---

## GDPR Failure Vector 2: Article 9 Health Data at Scale

**Article 9** prohibits processing special category personal data — including data concerning health — without explicit consent or one of the Article 9(2) legal bases. In clinical contexts, the applicable bases are typically Art.9(2)(c) (vital interests), Art.9(2)(h) (medical treatment), or Art.9(2)(j) (scientific research with appropriate safeguards).

The FHIR data model maps directly onto Article 9 categories. A representative HealthLake deployment stores:

**Patient resources** contain: name, date of birth, gender, address, phone number, national identifier (NHS number, Krankenversichertennummer, BSN), and deceased indicator. Every Patient resource is an identified natural person.

**Condition resources** contain: clinical diagnoses coded to ICD-10 or SNOMED CT, clinical status (active/resolved/recurrence), verification status (confirmed/provisional/differential), and onset date. A single Condition resource discloses a specific diagnosis for an identified patient.

**Observation resources** contain: laboratory test results (blood counts, metabolic panels, tumour markers, genetic test results, HIV viral load, hepatitis serology), vital signs (blood pressure, weight, BMI), smoking status, pregnancy status, and social history observations. Each Observation is linked to a Patient resource and a specific clinical encounter.

**MedicationRequest resources** contain: prescribed medication name, dosage, route, frequency, and prescribing clinician.
The medication itself often reveals diagnosis (HIV antiretrovirals indicate HIV status; antipsychotics indicate psychiatric diagnosis; chemotherapy agents indicate malignancy). Medication records are a particularly sensitive class of health data because they are highly inferential.

**DiagnosticReport resources** contain: imaging study results, pathology reports, genetic panel results, and the clinical interpretation. A pathology report disclosing cancer staging or a genetic panel revealing BRCA mutation status represents some of the most sensitive personal data in existence.

**Immunization resources** contain: vaccine type, administration date, lot number, and site. In the context of COVID-19 and future pandemic preparedness infrastructure, immunization records at population scale are particularly sensitive.

The challenge for GDPR compliance is not that HealthLake stores this data — the clinical purpose (treatment, care coordination, population health) provides Art.9(2) legal bases. The challenge is that Art.9 obligations attach to **every downstream use of the data**:

- Analytics queries that identify patient cohorts are processing Art.9 data under potentially different legal bases than the original clinical purpose
- NLP extraction via Comprehend Medical creates new structured health data from unstructured clinical notes — this new data requires its own Art.9 analysis
- FHIR export to S3 for ML training requires explicit legal basis under Art.9(2)(j) with appropriate safeguards (pseudonymisation, minimisation)
- Cross-population queries that link HealthLake data with other datasets (claims data, genomics, wearables) require Art.9 analysis for each linking activity

Under CLOUD Act compulsion, a US federal warrant could compel AWS to produce not just individual patient records, but bulk exports from a HealthLake datastore — including the structured output of NLP analysis across thousands of patients' clinical notes.
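
To make the resource descriptions in this section concrete, here is a minimal Python sketch of how a Condition resource ties a coded diagnosis to an identified Patient through its `subject` reference. The sample resources, the IDs, and the helper names (`index_by_reference`, `diagnoses_for_identified_patients`) are hypothetical and exist only for illustration; real FHIR resources carry many more fields.

```python
# Hypothetical sample resources; IDs, names and codes are illustrative only.
patient = {
    "resourceType": "Patient",
    "id": "example-patient",
    "name": [{"family": "Muster", "given": ["Erika"]}],
    "birthDate": "1980-01-01",
}
condition = {
    "resourceType": "Condition",
    "id": "example-condition",
    "subject": {"reference": "Patient/example-patient"},
    "code": {
        "coding": [
            {
                "system": "http://hl7.org/fhir/sid/icd-10",
                "code": "E11",
                "display": "Type 2 diabetes mellitus",
            }
        ]
    },
}


def index_by_reference(resources):
    """Map 'ResourceType/id' strings to the resources they identify."""
    return {f"{r['resourceType']}/{r['id']}": r for r in resources}


def diagnoses_for_identified_patients(resources):
    """Return (patient name, diagnosis) pairs by resolving Condition.subject."""
    store = index_by_reference(resources)
    findings = []
    for resource in resources:
        if resource["resourceType"] != "Condition":
            continue
        subject = store.get(resource.get("subject", {}).get("reference"))
        if subject is None:
            continue  # subject not present in this set of resources
        name = subject["name"][0]
        full_name = " ".join(name.get("given", []) + [name.get("family", "")]).strip()
        coding = resource["code"]["coding"][0]
        findings.append((full_name, coding.get("display", coding["code"])))
    return findings


findings = diagnoses_for_identified_patients([patient, condition])
```

Two small resources are enough to pair a named individual with a specific diagnosis, which is why every resource type listed above falls under Art.9 once it is linked to a Patient.
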

---

## GDPR Failure Vector 3: CLOUD Act Compelled Disclosure of Patient Records

**The Clarifying Lawful Overseas Use of Data Act (CLOUD Act, 2018)** authorises US federal law enforcement to compel US-based cloud providers to produce stored data, including data stored on infrastructure outside the United States. AWS is a US-incorporated entity subject to the CLOUD Act.

For patient health data, CLOUD Act compelled disclosure creates a problem that no contractual mechanism resolves:

**The CLOUD Act warrant targets the cloud provider, not the data controller.** AWS receives the warrant, not the healthcare organisation. The healthcare organisation may not be notified. The healthcare organisation's data processing agreements with AWS do not contain provisions that override CLOUD Act warrants — the processor is obligated to comply with US law regardless of what the controller's DPA says.

**Healthcare-specific US law compounds the risk.** Health data in the US is regulated by HIPAA, which contains law enforcement exception provisions. AWS's HIPAA Business Associate Agreement (BAA) — which healthcare customers typically sign to use HealthLake — does not override federal warrant authority. The healthcare BAA enables HIPAA-compliant processing between the organisation and AWS; it does not create a legal barrier to CLOUD Act compelled access.

**The investigative target need not be the patient.** A CLOUD Act warrant targeting a healthcare organisation, a pharmaceutical company, a medical device manufacturer, or even an individual clinician could compel production of HealthLake data containing records of thousands of unrelated patients. The scope of a CLOUD Act warrant is determined by US law, not by GDPR proportionality principles.

**GDPR does not recognise CLOUD Act warrants as a legal basis for health data disclosure.** Under Art.9(2), there is no legal basis that reads "US federal law enforcement requested the data."
The healthcare organisation's DPIA, the Art.9(2) legal basis documentation, and the Art.28 processor agreement all presuppose that health data will be processed for the documented clinical or research purposes — not disclosed to US law enforcement under foreign compulsion.

The data subject's rights under Art.15-22 (access, rectification, erasure, restriction, portability, objection) are also affected: the data subject has no mechanism to exercise rights against CLOUD Act disclosure, because the compulsion runs from US courts to AWS, without the healthcare organisation as an intermediary.

---

## GDPR Failure Vector 4: Article 17 Erasure in a FHIR-Based System

**Article 17** grants data subjects the right to erasure — the "right to be forgotten" — under specified conditions, including withdrawal of consent and the absence of an overriding legitimate ground for continued processing.

FHIR's data model creates structural erasure challenges that compound in a managed service like HealthLake:

**FHIR resource versioning:** Every modification to a FHIR resource creates a new version. The FHIR specification defines version history interactions that servers commonly implement to support audit trails. HealthLake maintains version history for resources in the datastore. Erasure of a Patient resource does not automatically delete the version history, references in other resources, or derived data created from the resource.

**Reference integrity:** FHIR resources are heavily cross-referenced. A Patient resource is referenced by Condition, Observation, MedicationRequest, DiagnosticReport, Encounter, and dozens of other resource types. Erasing the Patient resource while leaving referencing resources creates broken references — but erasing all referencing resources first may be technically complex to orchestrate across a large patient record.
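
The ordering problem described under reference integrity can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a complete erasure procedure: it works on an in-memory list of resources, follows only direct references to the Patient (not chains such as a DiagnosticReport pointing at an Observation), and ignores version history, exports, and derived data. The helper names `find_references` and `erasure_plan` are hypothetical.

```python
def find_references(node):
    """Yield every 'reference' string nested anywhere inside a resource."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)


def erasure_plan(resources, patient_id):
    """List resources that point at the Patient first, then the Patient itself."""
    target = f"Patient/{patient_id}"
    dependents = [
        f"{r['resourceType']}/{r['id']}"
        for r in resources
        if r["resourceType"] != "Patient" and target in set(find_references(r))
    ]
    return dependents + [target]


# Hypothetical sample data: two resources reference p1, one references p2.
sample = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Observation", "id": "o1", "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Condition", "id": "c1", "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Observation", "id": "o2", "subject": {"reference": "Patient/p2"}},
]
plan = erasure_plan(sample, "p1")
```

Deleting in the order the plan returns avoids leaving resources that point at a Patient that no longer exists. A production procedure would also need to cover transitive references and the history, export, and derived-data issues discussed in this section.
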

**NLP-derived structured data:** Where HealthLake has run Comprehend Medical over clinical notes to extract structured entities, the resulting FHIR extensions contain derived data — new facts created from the original clinical text. Erasure of the source clinical note does not automatically erase the derived structured entities.

**Analytics and export:** HealthLake integrates with S3 and Athena for analytics workflows. Data exported to S3 for analytics or ML training may exist independently of the HealthLake datastore. A HealthLake erasure does not cascade to S3 exports, Athena query results, SageMaker training datasets created from HealthLake exports, or any downstream systems that received the data.

**Legal retention requirements:** Healthcare records in most EU jurisdictions are subject to minimum retention periods (10 years in Germany under §630f BGB, 20 years in France for hospital records, 30 years for mental health records in some jurisdictions). These retention requirements create tension with Art.17 erasure requests where the legal basis for retention (legal obligation under Art.17(3)(b)) must be balanced against the erasure request. Documenting and enforcing these retention rules across a HealthLake deployment requires additional governance infrastructure not built into the service.

---

## GDPR Failure Vector 5: Secondary Use Risk — HealthLake Analytics and ML Features

**Article 5(1)(b)** establishes the purpose limitation principle: personal data collected for specified, explicit, and legitimate purposes must not be further processed in a manner incompatible with those purposes. For health data collected under Art.9(2)(h) (medical treatment), secondary use for analytics, research, or ML model training requires a separate Art.9(2) legal basis.

HealthLake's integrated analytics capabilities create secondary use risks that are often overlooked in DPIA documentation:

**Population health analytics** enable querying across patient cohorts — identifying all patients with Type 2 diabetes, all patients over 65 with two or more chronic conditions, or all patients with a specific prescription pattern. These queries process Art.9 data for population-level purposes distinct from the individual treatment context in which the data was collected.

**ML training via SageMaker integration:** HealthLake's native integration with Amazon SageMaker enables healthcare organisations to train ML models on patient data — for diagnosis prediction, treatment optimisation, or readmission risk scoring. Each ML training run is a new processing activity requiring Art.9(2)(j) legal basis with appropriate safeguards. The Art.29 Working Party (now EDPB) guidance on health research data is explicit: pseudonymisation alone is typically insufficient for health data ML training; re-identification risks must be assessed and mitigated.

**Comprehend Medical NLP enrichment:** Running Amazon Comprehend Medical over clinical notes creates new structured data — named entities (conditions, medications, anatomy), relationships between entities, and inferred attributes — that did not exist in the original clinical record. This enrichment is a new processing activity. The NLP-derived data may reveal clinical facts not explicitly documented in the structured record (e.g., inferring a diagnosis from medication patterns in a clinical note). This derived data inherits Art.9 status.

**HealthLake Insights (analytics layer):** The analytics layer aggregates data across the entire datastore for query and reporting.
A query across 50,000 patient records processing Art.9 data at scale for operational efficiency (e.g., "identify patients due for screening") requires the same Art.9 legal basis analysis as a formal research study — but this operational analytics use is often not separately documented in DPIA inventories.

---

## EU Alternatives to AWS HealthLake

The core requirement is a FHIR R4-compliant datastore running on EU-sovereign infrastructure, under a legal entity not subject to CLOUD Act compelled disclosure. Several mature options exist:

### HAPI FHIR Server (Self-Hosted)

HAPI FHIR is the reference open-source implementation of the FHIR specification, maintained by the Health Informatics Skunk Works (HISW) at University Health Network in Toronto and contributed to by a global community. It implements FHIR R4 fully, including all standard search parameters, compartments, and operations.

**Deployment:** HAPI FHIR runs as a Java Spring Boot application and can be deployed on any Kubernetes cluster or Docker host. A sota.io deployment provides EU-sovereign hosting with persistent storage backed by PostgreSQL or Apache Cassandra.

**GDPR posture:** Controller-operated HAPI FHIR running on EU infrastructure under an EU legal entity has no CLOUD Act exposure. The software is open-source (Apache License 2.0); there is no US software vendor relationship that creates a processor DPA subject to US law.

**Capabilities:** Full FHIR R4 support including FHIR Bulk Data Access (Flat FHIR / ndjson export), SMART on FHIR authentication, FHIR Subscription for real-time notifications, and GraphQL search. Mature, production-deployed at major health systems globally.

**Considerations:** Operational responsibility rests with the deploying organisation. Scaling for large patient populations requires tuning the underlying database and search indexing. EU-based Kubernetes hosting (sota.io, OVHcloud Kubernetes, Hetzner Cloud) provides the infrastructure layer.
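
The ndjson format mentioned under Bulk Data Access is simply one JSON-encoded FHIR resource per line. As a rough sketch of what consuming such a file looks like, the following Python reads an ndjson string and counts resources by type. The sample data and the function names (`parse_ndjson`, `count_by_type`) are made up for illustration and are not part of HAPI FHIR.

```python
import json
from collections import Counter


def parse_ndjson(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_by_type(resources):
    """Count parsed FHIR resources by their resourceType field."""
    return Counter(resource["resourceType"] for resource in resources)


# Hypothetical three-line export sample.
sample_ndjson = "\n".join(
    [
        '{"resourceType": "Patient", "id": "p1"}',
        '{"resourceType": "Observation", "id": "o1"}',
        '{"resourceType": "Observation", "id": "o2"}',
    ]
)
counts = count_by_type(parse_ndjson(sample_ndjson))
```

Because each line is an independent JSON document, large exports can be processed line by line without loading a whole file into memory.
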

### Medplum

Medplum is a developer-first FHIR platform — an open-source FHIR server with a built-in authentication layer (SMART on FHIR), a React component library for building healthcare applications, and a managed hosting option.

**Deployment:** Medplum is fully open-source (Apache License 2.0) and self-hostable. The codebase is designed for cloud-native deployment on Kubernetes. EU healthcare organisations can deploy Medplum on EU-sovereign infrastructure, eliminating the US-hosted managed service dependency.

**GDPR posture:** Self-hosted Medplum on EU infrastructure operates entirely within the controller's jurisdiction. No US parent company, no CLOUD Act exposure on the infrastructure layer.

**Capabilities:** FHIR R4 server, built-in SMART on FHIR (OAuth2/OIDC), background job processing, Subscriptions, GraphQL, and a React component library (`@medplum/react`) for building FHIR-native UIs. Medplum is actively developed with a focus on developer experience — a closer HealthLake analogue in terms of developer ergonomics than HAPI FHIR alone.

**Considerations:** Medplum is younger than HAPI FHIR; certain edge cases in the FHIR spec have less test coverage. Rapid development velocity. The managed cloud offering (cloud.medplum.com) is US-hosted — EU organisations should self-host for GDPR compliance.

### Smile CDR (Clinical Data Repository)

Smile CDR is a commercial FHIR server built on the HAPI FHIR foundation, developed by Smile Digital Health (Canadian company). It provides enterprise-grade FHIR R4 and R5 support with additional governance, consent management, and audit trail features.

**Deployment:** Smile CDR is self-hostable; EU healthcare organisations can deploy it on EU infrastructure. The commercial licence covers self-hosted deployment. No US cloud hosting dependency.

**GDPR posture:** Smile Digital Health is a Canadian entity; Canadian data processing law is distinct from US law, and the Canadian Privacy Act does not include a CLOUD Act equivalent.
Self-hosted deployment eliminates processor DPA concerns about the CDR infrastructure itself.

**Capabilities:** FHIR R4 and R5, Consent Management (Art.7 GDPR consent tracking natively integrated), SMART on FHIR, CDS Hooks, FHIR Bulk Data, advanced audit logging, and federated FHIR query across multiple repositories. Enterprise support with SLAs.

**Considerations:** Commercial licence (enterprise pricing). Higher operational complexity than HAPI FHIR for smaller deployments. Best suited for larger healthcare systems or regional HIE (Health Information Exchange) deployments.

### LinuxForHealth FHIR Server (formerly IBM FHIR Server)

The LinuxForHealth FHIR Server is an open-source FHIR R4 server originally developed by IBM and contributed to the Linux Foundation. It is Java-based, optimised for enterprise workloads, and supports pluggable persistence backends including PostgreSQL.

**Deployment:** Runs on any Java-capable infrastructure. EU-sovereign deployment on sota.io, IONOS, OVHcloud, or self-managed Kubernetes. Apache License 2.0.

**GDPR posture:** Linux Foundation-governed open-source project. No US cloud service dependency for self-hosted deployment. No CLOUD Act exposure on EU-hosted infrastructure.

**Capabilities:** FHIR R4 with extensive search support, batch transaction processing, FHIR Operations, $everything and $export operations, bulk data export (ndjson), and configurable auditing via CADF (Cloud Auditing Data Federation) events. Strong enterprise deployment experience from IBM healthcare customers.
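
The `$export` operation named above follows the FHIR Bulk Data Access pattern: the client sends an asynchronous kick-off request and later polls a status URL returned by the server. The Python sketch below only builds the kick-off request and does not send it. The base URL is a placeholder, `build_export_request` is a hypothetical helper, and whether a given server requires extra parameters or authentication headers is deployment-specific.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, resource_types=None):
    """Build (but do not send) a system-level Bulk Data $export kick-off request."""
    url = base_url.rstrip("/") + "/$export"
    if resource_types:
        # _type narrows the export to a comma-separated list of resource types.
        url += "?" + urlencode({"_type": ",".join(resource_types)})
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # ask the server to process the export asynchronously
    }
    return Request(url, headers=headers, method="GET")


# Placeholder base URL; replace with the FHIR endpoint of your own server.
request = build_export_request(
    "https://fhir.example.org/fhir", ["Patient", "Observation"]
)
```

Sending the request (for example with `urllib.request.urlopen`) would additionally need the authentication your server is configured for, which is omitted here.
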

---

## Comparison: AWS HealthLake vs EU-Sovereign FHIR Alternatives

| Dimension | AWS HealthLake | HAPI FHIR (EU-hosted) | Medplum (EU self-hosted) | Smile CDR (EU self-hosted) |
|---|---|---|---|---|
| EU Region Available | No (US only) | Yes — any EU infrastructure | Yes — any EU infrastructure | Yes — any EU infrastructure |
| CLOUD Act Exposure | Yes (AWS is US entity) | No (open-source, no US vendor) | No (self-hosted, no US dependency) | No (Canadian vendor, self-hosted) |
| Art.9 Legal Basis Conflict | Yes (CLOUD Act disclosure) | No | No | No |
| Art.44 Transfer Required | Yes (US region mandatory) | No | No | No |
| Art.28 Processor DPA | AWS (US law governed) | N/A (self-hosted) | N/A (self-hosted) | Smile Digital Health (CA) |
| FHIR R4 Compliance | Full | Full | Full | Full + R5 |
| NLP/ML Integration | Native (Comprehend Medical) | External (self-managed) | External (self-managed) | External (self-managed) |
| Managed Operations | Yes (fully managed) | No (self-operated) | No (self-operated) | No (self-operated) |
| Consent Management | Limited | Third-party / custom | Limited native | Native (strong) |
| Operational Overhead | Low (managed) | High | Medium | High |
| Licence | Proprietary (AWS pricing) | Apache 2.0 (free) | Apache 2.0 (free) | Commercial |

---

## What the Migration Involves

A migration from AWS HealthLake to EU-sovereign FHIR infrastructure involves several components:

**FHIR Bulk Data Export:** HealthLake supports the FHIR Bulk Data Access specification (`$export` operation). This produces ndjson files (newline-delimited JSON) containing all FHIR resources by type. The export is the primary data migration mechanism — all FHIR R4 data can be exported without loss of clinical meaning, since FHIR is a standard format.

**Re-import into EU FHIR server:** HAPI FHIR, Medplum, and Smile CDR all support FHIR bulk data import (ndjson).
The imported resources will receive new server-assigned logical IDs; if external systems reference HealthLake resource IDs, identifier mapping is required during import.

**Authentication migration:** If applications authenticate against HealthLake using AWS IAM or Cognito, the target EU FHIR server requires SMART on FHIR (OAuth2/OIDC) configuration. This is typically a configuration change for the FHIR client applications rather than an application code change, assuming they already support SMART on FHIR.

**NLP-derived data:** Structured FHIR extensions created by Comprehend Medical NLP processing cannot be directly migrated as such — the enrichment was created by an AWS service, and the extensions may reference AWS-specific code systems or identifiers. The clinical note source data can be migrated, and equivalent NLP enrichment can be performed using EU-hosted alternatives (GermanBERT/medBERT for German clinical notes, French biomedical models for French clinical text).

**S3 and Athena integrations:** HealthLake analytics pipelines built on S3 and Athena require migration to EU-sovereign equivalents. MinIO (S3-compatible, self-hosted) and Trino (Athena-compatible, self-hosted) can serve as drop-in EU-sovereign replacements for the analytics infrastructure layer.

---

## Deploying a GDPR-Compliant FHIR Server on sota.io

[sota.io](https://sota.io) provides EU-native managed hosting with no US parent company and no CLOUD Act exposure.
Deploying HAPI FHIR or Medplum on sota.io gives you:

- **EU data residency:** Data at rest and in transit remains within EU jurisdiction under German data protection law
- **No CLOUD Act:** sota.io operates under German law; no US entity has access to your infrastructure or data
- **Persistent storage:** FHIR server deployments require persistent database storage — sota.io provides this natively without the operational overhead of managing your own PostgreSQL cluster
- **GDPR-compliant DPA:** The processor data protection agreement is governed by EU law and explicitly addresses Art.9 health data processing
- **Docker-based deployment:** HAPI FHIR and Medplum both ship as Docker images — sota.io deploys them directly from your registry without configuration translation

For healthcare organisations moving from HealthLake's fully managed experience to a self-hosted FHIR server, the operational overhead reduction from using a managed EU hosting platform can be significant: no server provisioning, no SSL certificate management, no infrastructure monitoring — sota.io handles these at the platform layer while you retain full control of the FHIR configuration, data model, and access controls.

---

## The Healthcare Data Governance Question That HealthLake Leaves Unanswered

The fundamental problem with HealthLake for EU healthcare organisations is not technical — it is jurisdictional. HealthLake is an architecturally sound FHIR implementation. The problem is that it operates on US infrastructure under a US legal entity's control, and the most sensitive personal data category in GDPR (Art.9 health data) cannot be placed there without legal contradictions that SCCs and DPAs cannot resolve.

A DPIA for a HealthLake deployment that is honest about the CLOUD Act risk will document a residual risk that the healthcare organisation cannot mitigate through contractual means alone.
The technical measure that would mitigate it — end-to-end encryption with controller-held keys, where HealthLake never sees the plaintext FHIR data — defeats the purpose of using HealthLake, since the service's core features (search, query, NLP analysis) require access to the plaintext data.

EU-sovereign FHIR servers eliminate this contradiction. The data is stored where the healthcare organisation has full legal control. The processor DPA is governed by EU law. The CLOUD Act cannot reach the data. And the FHIR standard ensures that the data model, interoperability, and downstream analytics capabilities are identical to what HealthLake provided — without the jurisdictional encumbrance.

