AWS DataExchange EU Alternative 2026: Art.6 Third-Party Data Legal Basis, CLOUD Act on Licensed Datasets, and the Data Marketplace GDPR Problem
Post #798 in the sota.io EU Compliance Series
AWS DataExchange is a managed marketplace that lets organizations find, subscribe to, and use third-party data products directly within AWS. The model is operationally compelling: instead of negotiating data licensing agreements, managing file transfers, and building custom data ingestion pipelines, you browse a catalog of over 3,500 data products from more than 300 providers, subscribe with a few clicks, and the data lands directly in your S3 bucket — ready for Athena queries, SageMaker training, QuickSight dashboards, or any other AWS analytical workflow.
The data products available on DataExchange span financial market data (stock prices, options chains, alternative investment signals), location and mobility data (foot traffic, device movement patterns), demographic and consumer data (household income estimates, purchasing behavior profiles), healthcare data (clinical trial results, prescription analytics), and industry-specific datasets across logistics, weather, and media consumption. Many of these categories — location data, demographic profiles, healthcare analytics, consumer behavior — contain or are derived from personal data about identified or identifiable EU residents.
AWS DataExchange is not in the AWS European Sovereign Cloud service catalog.
For European organizations using DataExchange as data subscribers, the data provider relationship adds a GDPR compliance layer that the marketplace model structurally underserves. The subscribing organization becomes a data controller for any personal data in the licensed product, inheriting a set of GDPR obligations that begin the moment the data product is delivered to their S3 bucket — obligations that the DataExchange marketplace design does not help them identify, document, or fulfill.
What AWS DataExchange Does
AWS DataExchange provides a managed data licensing marketplace with direct AWS integration.
Core components:
- Data Catalog: A searchable catalog of data products from third-party providers. Products are categorized by industry (financial services, healthcare, media, retail), data type (time-series, geospatial, images, machine learning models), and delivery method. Providers include established data companies (Bloomberg, Refinitiv, now part of LSEG, TransUnion, Dun & Bradstreet) and specialist alternative data providers.
- Data Products: A data product is a licensable unit consisting of one or more datasets. A dataset is a collection of revisions, which are versioned snapshots of the data. Each revision contains assets (S3 objects, Redshift Datashare references, Amazon API Gateway endpoints, or AWS Lake Formation references). Providers configure the revision cadence (for example daily or weekly) and the delivery mechanism (file export, data share, or API access).
- Subscriptions: When a subscriber purchases a product, DataExchange either makes the current revision's assets available for export to the subscriber's S3 bucket or grants access to the provider's Redshift data share or API endpoint. Subsequent revisions can be exported automatically as the provider publishes them. The subscription becomes a direct data pipeline from provider to subscriber, bypassing file transfer logistics.
- AWS Direct Query: A newer DataExchange feature allowing subscribers to query provider data directly via SQL without copying it to the subscriber's S3 bucket. The data stays in the provider's environment; the subscriber queries it through DataExchange's managed query layer. This changes the data transfer model but not the GDPR compliance obligations for data processed in query results.
- Data Grants: A free-sharing mechanism for public or partner data distribution. Organizations can publish data products for open access without pricing configuration.
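The S3 delivery step described above can be sketched with boto3's DataExchange client. This is a minimal sketch, assuming the subscriber already holds an entitled data set; the data set ID, revision ID, bucket name, and KMS key ARN are placeholders, and the client is passed in so the request can be inspected without AWS credentials. It makes the jurisdictional point concrete: even with a KMS key, the exported objects land in an S3 bucket operated by AWS.

```python
"""Sketch: export one DataExchange revision to an S3 bucket with boto3."""
from typing import Any, Optional


def export_revision_to_s3(
    client: Any,
    data_set_id: str,
    revision_id: str,
    bucket: str,
    kms_key_arn: Optional[str] = None,
) -> str:
    """Create and start an EXPORT_REVISIONS_TO_S3 job and return its job ID.

    `client` is expected to be boto3.client("dataexchange"). The job runs
    asynchronously; poll client.get_job(JobId=...) to see when it completes.
    """
    details: dict = {
        "DataSetId": data_set_id,
        "RevisionDestinations": [
            {
                "Bucket": bucket,
                "RevisionId": revision_id,
                # One key prefix per revision timestamp, then the asset name.
                "KeyPattern": "${Revision.CreatedAt}/${Asset.Name}",
            }
        ],
    }
    if kms_key_arn:
        # SSE-KMS encryption at rest; the key is still managed inside AWS.
        details["Encryption"] = {"Type": "aws:kms", "KmsKeyArn": kms_key_arn}

    job = client.create_job(
        Type="EXPORT_REVISIONS_TO_S3",
        Details={"ExportRevisionsToS3": details},
    )
    client.start_job(JobId=job["Id"])
    return job["Id"]
```

In practice the client would be created with something like `boto3.client("dataexchange", region_name="eu-central-1")`; choosing an EU region affects where the bucket lives, not which company operates it.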
Scale context: A retail analytics firm subscribes to a DataExchange product containing anonymized consumer purchase behavior profiles derived from credit card transaction data, covering 80 million US residents and 12 million EU residents whose transaction patterns were captured by the data provider's merchant network. The data product arrives in the firm's S3 bucket as a structured CSV file with household-level demographic segments, purchase category propensities, and geographic mobility scores. The data is "anonymized" in the US provider's framing, but because it is held at household level alongside location data, it can still relate to identified or identifiable natural persons under GDPR's broader definition of personal data (Art. 4(1)).
ESC status: AWS DataExchange is not in the AWS European Sovereign Cloud service catalog. The data marketplace — through which EU personal data flows from providers to subscriber organizations — operates without ESC-level data residency guarantees or operator access restrictions.
GDPR Issue 1 — Art. 6: Legal Basis Gap for Third-Party Personal Data
GDPR Art. 6 requires that every processing of personal data has a valid legal basis. When an organization subscribes to a DataExchange product containing personal data about EU residents, the subscribing organization becomes a data controller for that data — and must establish its own Art. 6 legal basis for every processing activity it performs with the licensed data.
The legal basis inheritance problem: The data provider's original legal basis does not transfer to the subscriber. If the provider collected the data under Art. 6(1)(f) legitimate interests (for its own analytics business), that legal basis authorizes the provider's processing — not the subscriber's subsequent use of the same data for the subscriber's purposes. The subscriber must independently satisfy Art. 6 for each processing purpose.
The typical DataExchange subscriber processing chain:
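- S3 storage: The data lands in the subscriber's S3 bucket, and storing personal data requires a legal basis. Art. 6(1)(c) (legal obligation) or Art. 6(1)(f) (legitimate interests) may apply depending on the subscriber's use case. Art. 6(1)(b) (contract performance) rarely fits, because it covers processing necessary for a contract with the data subject, and the subscriber usually has no contract with the people in a licensed dataset. Whatever the basis, the subscriber must document and justify it.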
- Analytics queries: Running Athena queries across the dataset to generate business intelligence. This is a distinct processing activity from storage, requiring its own legal basis analysis under Art. 6 and compatibility assessment under Art. 5(1)(b).
- ML training: Using the dataset to train models is a further processing activity. Art. 6(1)(f) might apply, but the EDPB's guidance on purpose limitation requires assessing whether ML training is "compatible" with the original collection purpose — a test that many DataExchange use cases would fail.
- Data enrichment: Joining the DataExchange dataset with the subscriber's own customer data to enrich internal profiles. This creates new personal data relationships and requires its own Art. 6 analysis.
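One way to keep the per-activity analysis honest is to record each processing activity in the Art. 30 register with its own basis and flag gaps automatically. This is a minimal sketch; `ProcessingActivity`, `ropa_gaps`, and the LIA reference field are hypothetical names, and the check only catches missing documentation. It does not judge whether a documented basis is legally adequate.

```python
from dataclasses import dataclass
from typing import Optional

# Art. 6(1) legal bases, keyed by the letter of the sub-paragraph.
ART6_BASES = {
    "a": "consent",
    "b": "contract",
    "c": "legal obligation",
    "d": "vital interests",
    "e": "public task",
    "f": "legitimate interests",
}


@dataclass
class ProcessingActivity:
    name: str                            # e.g. "S3 storage", "ML training"
    purpose: str
    legal_basis: Optional[str]           # letter from ART6_BASES, None if undocumented
    lia_reference: Optional[str] = None  # pointer to a Legitimate Interests Assessment


def ropa_gaps(activities: list[ProcessingActivity]) -> list[str]:
    """Return documentation gaps; an empty list means none were detected."""
    gaps = []
    for act in activities:
        if act.legal_basis not in ART6_BASES:
            gaps.append(f"{act.name}: no valid Art. 6(1) basis documented")
        elif act.legal_basis == "f" and not act.lia_reference:
            gaps.append(f"{act.name}: Art. 6(1)(f) relied on without an LIA")
    return gaps


chain = [
    ProcessingActivity("S3 storage", "retain licensed segments", "f", "LIA-2026-01"),
    ProcessingActivity("Athena analytics", "market sizing", "f"),
    ProcessingActivity("ML training", "propensity model", None),
]
print(ropa_gaps(chain))
```

Here storage passes, while the analytics and training steps are flagged: each downstream use needs its own entry, even when the data set is the same.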
The "anonymized" product problem: Many DataExchange products are marketed as "anonymized" or "aggregated" data, implying that GDPR does not apply. Under GDPR Recital 26, data is anonymous only if the data subject is no longer identifiable, taking into account all the means reasonably likely to be used, by the controller or by any other person, to identify them. Location data aggregated to neighborhood level, demographic profiles derived from transaction histories, or device movement data at postal code granularity may not satisfy this threshold, particularly when combined with the subscriber's own customer data or other available datasets. The subscriber inherits the anonymization risk assessment burden that the DataExchange marketplace does not perform for them.
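A common first screening step for this re-identification risk is a k-anonymity check: count how many records share each combination of quasi-identifiers (postcode, income band, and similar) and flag combinations shared by fewer than k records. This is a swapped-in technique for illustration, not a test that Recital 26 prescribes; passing it does not make data anonymous, especially once the product is joined with other datasets. The column names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def small_groups(
    rows: Iterable[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
    k: int = 5,
) -> dict[tuple, int]:
    """Return quasi-identifier combinations shared by fewer than k rows.

    Each such combination singles out a small group of households, a warning
    sign that an "anonymized" product may still be personal data.
    """
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


rows = [
    {"postcode": "10115", "income_band": "C", "segment": "outdoor"},
    {"postcode": "10115", "income_band": "C", "segment": "travel"},
    {"postcode": "80331", "income_band": "A", "segment": "luxury"},
]
print(small_groups(rows, ["postcode", "income_band"], k=2))
# {('80331', 'A'): 1}
```

A real assessment would also consider which external datasets an attacker could plausibly link, which a count over one file cannot capture.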
What the DataExchange marketplace provides: The data product's listing page may include a description of the data and its intended use cases. It does not include: the original legal basis for data collection, the scope of data subjects covered, the geographic distribution of personal data subjects, or a GDPR-compliant representation about data subject rights obligations. The subscriber receives a data file and a licensing agreement covering commercial use terms — not the GDPR compliance documentation required to establish a valid Art. 6 basis for processing.
GDPR Issue 2 — CLOUD Act: US Jurisdiction Over Licensed Datasets
CLOUD Act (18 U.S.C. § 2713) requires US cloud providers to produce stored data in response to lawful US legal process regardless of where the data is physically stored. AWS DataExchange delivers data to subscriber S3 buckets. Once the data product containing EU personal data lands in an S3 bucket — even in an AWS EU region — it is stored by AWS, a US entity, under CLOUD Act jurisdiction.
The jurisdictional transfer at subscription time: When a European organization subscribes to a DataExchange product and the data is delivered to their EU-region S3 bucket, they have not moved the data outside CLOUD Act jurisdiction. The data is in AWS S3 in Frankfurt or Dublin — but AWS is a US corporation. A CLOUD Act demand can compel AWS to produce data from any S3 bucket regardless of bucket region. The EU region choice reduces latency and may satisfy certain national data residency requirements that focus on physical location — but it does not eliminate CLOUD Act jurisdiction.
The compound exposure from data products: A DataExchange dataset containing EU personal data has a CLOUD Act exposure that differs from a subscriber's own proprietary data. The data provider collected this data from EU data subjects — consumers, households, individuals — who did not contemplate that their data would flow through a US data marketplace and be stored in AWS infrastructure accessible to US law enforcement. The data subject's original consent (if consent was the legal basis) or the provider's legitimate interests assessment did not include CLOUD Act exposure as a risk.
Industry-specific severity:
- Financial data products containing EU investor transaction histories, creditworthiness profiles, or alternative investment behavior expose EU financial institution clients to regulatory risk — EU banking supervisors and the ECB have indicated that US law enforcement access to customer financial data held in US-cloud infrastructure requires careful legal analysis.
- Healthcare data products containing prescription analytics, clinical trial recruitment data, or medical claims information involve Art. 9 special-category personal data (data concerning health). CLOUD Act access to special-category health data stored via DataExchange creates an Art. 9 legal basis conflict: no Art. 9(2) condition authorizes disclosure under US law enforcement compulsion.
- Location and mobility data products containing EU resident movement patterns expose the subscribing organization to CLOUD Act demands that would produce detailed surveillance data about EU individuals — data whose collection may itself require careful legal basis analysis under Art. 6(1)(f) and EDPB guidance on location data.
DataExchange Direct Query: The newer "query without copying" model does not eliminate CLOUD Act exposure. Query results that return personal data are processed by AWS infrastructure. The subscriber's query results — potentially revealing personal data about EU individuals — are accessible to CLOUD Act demands even if the underlying dataset never left the provider's S3 bucket.
GDPR Issue 3 — Art. 14: Indirect Data Collection Notification Obligations
GDPR Art. 14 requires that when a controller obtains personal data not directly from the data subject (indirect collection), the controller must provide specific information to the data subject — including the source of the data, the purposes of processing, the legal basis, the categories of data, and the data subject's rights — within one month of obtaining the data (with exceptions for research and certain other purposes).
The DataExchange notification problem: When a European organization receives a DataExchange data product containing personal data about EU residents, Art. 14 requires that organization to notify each data subject within one month. For a dataset covering 12 million EU residents — not an unusual scale for a DataExchange product in the consumer data category — this notification obligation is practically impossible to fulfill.
The "impossible or disproportionate effort" exception: Art. 14(5)(b) exempts notification where it "proves impossible or would involve a disproportionate effort." Regulatory guidance reads this exception narrowly: the Article 29 Working Party's transparency guidelines, endorsed by the EDPB, make clear that cost or inconvenience alone is not enough. The controller must either show that notification is genuinely impossible or document a balancing exercise weighing the effort of notification against the impact on data subjects. Where a dataset contains email addresses or postal addresses for the data subjects, neither argument is easy to sustain.
The disclosure requirement in lieu of notification: Where Art. 14(5)(b) applies (impossibility or disproportionate effort), the controller must "take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available." This means publishing the Art. 14 information on the organization's website, in its privacy policy, and potentially in additional public channels, not simply noting that notification was impractical.
Audit trail for Art. 14 compliance: The subscribing organization must document: the date the data was obtained, the scope of data subjects covered (by category and count), the Art. 14 notifications sent or the Art. 14(5)(b) exception justification, and the measures taken to protect data subjects in lieu of individual notification. AWS DataExchange provides no tooling for this compliance workflow. The marketplace transaction record shows when data was subscribed to — not the data subject count or the Art. 14 notification status.
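A per-revision audit entry covering those documentation points can be sketched as a small record with a computed deadline. This is a minimal sketch; `Art14Record` and its fields are hypothetical, and "one month" is computed as the same day in the following month, clamped to month end, which is a simplification of how legal time limits are counted.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_one_month(d: date) -> date:
    """Same day next month, clamped to that month's last day (31 Jan -> 28/29 Feb)."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


@dataclass
class Art14Record:
    """Hypothetical audit entry for one indirectly obtained data product revision."""
    product_id: str
    revision_id: str
    obtained_on: date
    subject_categories: list[str]
    subject_count: int
    notified: bool = False                       # individual notices sent
    exception_14_5_b: Optional[str] = None       # written justification, if relied on
    public_disclosure_url: Optional[str] = None  # where the Art. 14 information is published

    @property
    def deadline(self) -> date:
        # Art. 14(3)(a): at the latest within one month of obtaining the data.
        return add_one_month(self.obtained_on)

    @property
    def is_open(self) -> bool:
        """Closed by individual notice, or by a documented exception plus disclosure."""
        if self.notified:
            return False
        return not (self.exception_14_5_b and self.public_disclosure_url)

    def is_overdue(self, today: date) -> bool:
        return self.is_open and today > self.deadline


record = Art14Record("prod-123", "rev-7", date(2026, 1, 31), ["households"], 12_000_000)
print(record.deadline)  # 2026-02-28
```

Requiring both the justification and the disclosure location to close a record mirrors the point that the exception alone is not enough without the public-information measure.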
The update cadence problem: DataExchange products with regular revision schedules (daily or weekly updates) create recurring Art. 14 obligations. Each new revision may introduce personal data for new data subjects not covered by the initial notification or exception documentation. Monitoring which data subjects are newly introduced with each data product revision — and triggering Art. 14 compliance workflows accordingly — requires custom governance infrastructure that DataExchange does not provide.
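Where a product carries a stable per-subject or per-household identifier, the revision-to-revision check can be a set difference over keyed hashes of those identifiers. This is a minimal sketch under that assumption; many aggregated products carry no such key, in which case this approach does not apply. The hashes are pseudonymous personal data, not anonymous data, and the function names and secret are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def pseudonymize(subject_ids: Iterable[str], secret: bytes) -> set[str]:
    """Keyed hash (HMAC-SHA256) so the governance log does not hold raw IDs."""
    return {hmac.new(secret, sid.encode(), hashlib.sha256).hexdigest() for sid in subject_ids}


def new_subjects(previous: set[str], current: set[str]) -> set[str]:
    """Pseudonymous IDs present in the current revision but in no earlier one."""
    return current - previous


key = b"governance-secret"  # hypothetical; keep the real key in a secrets store
seen: set[str] = set()
for revision_ids in (["h1", "h2"], ["h2", "h3"]):
    fresh = new_subjects(seen, pseudonymize(revision_ids, key))
    if fresh:
        print(f"{len(fresh)} new subject(s): open an Art. 14 workflow")
    seen |= fresh
```

Using a keyed hash rather than a plain one makes it harder for anyone without the key to confirm whether a given identifier appears in the log.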
GDPR Issue 4 — Art. 5(1)(b): Purpose Limitation in Data Licensing
GDPR Art. 5(1)(b) requires that personal data be "collected for specified, explicit, and legitimate purposes and not further processed in a manner incompatible with those purposes." The data marketplace model — where data collected for one purpose is licensed for use in different contexts — is structurally in tension with purpose limitation.
The provider's collection purpose: A location data provider collects mobility data from mobile apps whose users have accepted location permissions for "improve app performance" or "show local content." The provider aggregates this data into mobility insights products. The data subjects' original purpose expectation — app performance improvement, local content — does not include having their movement patterns sold to insurance companies for actuarial risk modeling, to retailers for foot traffic analysis, or to political campaigns for voter targeting. The provider's purpose may be documented in their privacy policy; the subscriber's use case is not.
The purpose compatibility assessment: GDPR Art. 6(4) lists the factors for deciding whether further processing for a different purpose is compatible with the original one: (a) any link between the original and new purposes, (b) the context of collection, in particular the relationship between data subjects and the controller (and, per Recital 50, the data subjects' reasonable expectations), (c) the nature of the data, with special scrutiny for Art. 9 and Art. 10 data, (d) the possible consequences for data subjects, and (e) the existence of appropriate safeguards such as encryption or pseudonymisation. DataExchange does not support purpose compatibility documentation between providers and subscribers. The marketplace transaction records the commercial licensing terms, not the purpose compatibility analysis.
High-risk purpose mismatches in the DataExchange catalog:
- Credit-derived consumer segments → HR screening: Consumer creditworthiness data licensed for credit risk analytics may be used by an employer as an input to hiring decisions — a purpose entirely outside the data subjects' reasonable expectations when they applied for credit.
- Location mobility data → insurance underwriting: Individual movement pattern data licensed for retail analytics may be re-purposed for health insurance risk scoring — data subjects who accepted location tracking for retail convenience did not consent to insurance actuarial use.
- Healthcare claims analytics → pharmaceutical targeting: Clinical claims data licensed for public health research may be used by pharmaceutical companies to identify high-value patient populations for drug marketing — a purpose incompatible with the research context in which the data was originally shared.
The subscriber's documentation obligation: The subscribing organization must conduct and document an Art. 5(1)(b) compatibility assessment for each DataExchange data product before processing. This assessment must address the five factors (a) to (e) in Art. 6(4) and document the conclusion. If the assessment concludes the use is incompatible with the original collection purpose, the subscriber must either obtain the data subjects' explicit consent (Art. 6(1)(a), with Art. 9(2)(a) for special category data) or discontinue the use.
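The five-factor assessment can be captured as a structured record per product and intended purpose, with processing blocked until every factor is documented and the conclusion is positive. This is a minimal sketch; the field and function names are hypothetical, the consent route under Art. 6(4) is not modelled, and treating an unconfirmed original purpose as incompatible follows the conservative rule in the compliance checklist below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompatibilityAssessment:
    """One record per licensed product and intended purpose (GDPR Art. 6(4))."""
    product_id: str
    original_purpose: Optional[str]     # as confirmed in writing by the provider
    intended_purpose: str
    link_between_purposes: str = ""     # Art. 6(4)(a)
    context_and_expectations: str = ""  # Art. 6(4)(b)
    nature_of_data: str = ""            # Art. 6(4)(c), incl. Art. 9 / Art. 10 data
    consequences: str = ""              # Art. 6(4)(d)
    safeguards: str = ""                # Art. 6(4)(e), e.g. encryption, pseudonymisation
    conclusion: Optional[bool] = None   # True = compatible, False = incompatible


FACTOR_FIELDS = (
    "link_between_purposes",
    "context_and_expectations",
    "nature_of_data",
    "consequences",
    "safeguards",
)


def may_process(a: CompatibilityAssessment) -> bool:
    """Allow processing only with a complete, documented, positive assessment."""
    if not a.original_purpose:
        return False  # unconfirmed collection purpose: treat as incompatible
    if any(not getattr(a, f).strip() for f in FACTOR_FIELDS):
        return False  # at least one Art. 6(4) factor left undocumented
    return a.conclusion is True
```

Keeping each factor as its own field makes an incomplete assessment visible at review time, rather than hiding it inside a single free-text conclusion.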
GDPR Issue 5 — Art. 20: Portability Structural Impossibility in Multi-Subscriber Markets
GDPR Art. 20 grants data subjects the right to receive personal data they have provided to a controller "in a structured, commonly used and machine-readable format" and to transmit that data to another controller. For personal data that has been licensed through DataExchange to multiple subscriber organizations, Art. 20 creates a structural implementation problem.
The multi-subscriber portability gap: A data subject whose personal data appears in a DataExchange product has potentially had their data licensed to hundreds of subscriber organizations. Each subscriber organization is an independent data controller holding a copy of that data subject's personal data. The data subject's Art. 20 portability right theoretically applies against each subscriber — but the data subject typically does not know which organizations have licensed their data through DataExchange, cannot identify all controllers, and cannot practically exercise their portability right against the full subscriber universe.
The controller identification problem: Art. 20 requires the data subject to direct their request to "the controller." For DataExchange data, identifying which organizations have subscribed to a product containing your data requires access to the DataExchange subscription records — which are confidential commercial information held by AWS. The data provider may know who has licensed their product, but the data subject typically cannot discover this without the provider's cooperation or regulatory investigation.
The "provided by the data subject" scope limitation: Art. 20 applies only where processing is based on consent or on a contract and is carried out by automated means (Art. 20(1)), so a subscriber relying on Art. 6(1)(f) legitimate interests may fall outside it altogether. Within that scope, Art. 20 covers only data that the data subject has "provided" to the controller, which is narrower than all data the controller holds about them. The Article 29 Working Party's portability guidelines, endorsed by the EDPB, read "provided" to include observed data generated by the data subject's activity (such as raw location or device signals) but to exclude data inferred or derived by the controller. Demographic profiles derived from transaction pattern inference and behavioral segments calculated from aggregated data therefore tend to fall outside scope, while raw movement traces may not. DataExchange subscribers should not assume that a product is automatically excluded from Art. 20 scope without checking which of its fields are observed rather than inferred.
Art. 17 erasure complexity: While Art. 20 portability creates structural implementation challenges, Art. 17 erasure for DataExchange-licensed data creates operational ones. A data subject who successfully identifies a subscriber organization holding their data and submits an Art. 17 erasure request requires the subscriber to delete all copies of that data — including in S3 buckets, Athena query result caches, SageMaker training datasets derived from the licensed product, and any downstream enrichment of the subscriber's own customer data. The DataExchange delivery model creates data spread across the subscriber's AWS account that is operationally complex to fully purge.
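Operationally, that erasure obligation amounts to keeping a registry of every location where licensed data ends up and running a deletion in each one. This is a minimal sketch with in-memory stand-ins; `ErasableStore`, `InMemoryStore`, and the `subject_id` field are hypothetical, and real adapters for S3 objects, Athena result locations, or training datasets would each need their own deletion logic.

```python
from typing import Protocol


class ErasableStore(Protocol):
    name: str

    def delete_subject(self, subject_id: str) -> int:
        """Delete all records for subject_id and return how many were removed."""
        ...


class InMemoryStore:
    """Stand-in for S3 objects, Athena result files, training sets, or CRM tables."""

    def __init__(self, name: str, records: list[dict]):
        self.name = name
        self.records = records

    def delete_subject(self, subject_id: str) -> int:
        before = len(self.records)
        self.records = [r for r in self.records if r.get("subject_id") != subject_id]
        return before - len(self.records)


def erase_everywhere(stores: list[ErasableStore], subject_id: str) -> dict[str, int]:
    """Run the deletion in every registered location; return a per-store report."""
    return {store.name: store.delete_subject(subject_id) for store in stores}


raw = InMemoryStore("s3-raw", [{"subject_id": "x"}, {"subject_id": "y"}])
crm = InMemoryStore("crm-enriched", [{"subject_id": "x", "segment": "travel"}])
print(erase_everywhere([raw, crm], "x"))  # {'s3-raw': 1, 'crm-enriched': 1}
```

Deleting rows from a training set does not remove what a model has already learned from them, so models trained on the licensed product may also need to be retrained or withdrawn.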
EU Alternatives
European data marketplaces and exchange platforms provide data commerce infrastructure under EU legal governance, eliminating CLOUD Act jurisdiction over licensed datasets while supporting GDPR compliance tooling for the data transaction lifecycle.
Dawex
Dawex is a French enterprise data exchange platform founded in 2015 and headquartered in Lyon. It provides a B2B data marketplace infrastructure for organizations to publish, license, and monetize data products — positioning itself as the European-sovereignty alternative to AWS DataExchange and Snowflake Data Marketplace.
GDPR-native design: Dawex was designed from the outset for GDPR compliance in data transactions. The platform includes:
- Standardized data licensing contract templates that include GDPR Art. 28 processor clauses for data products involving personal data
- Data product metadata requirements that document data subjects' legal basis, geographic coverage, and applicable rights
- Automatic generation of Art. 14 notification templates for data subscribers receiving products containing personal data
- Audit trails for data licensing transactions that support Art. 30 Records of Processing Activities
EU legal governance: Dawex operates as a French company under French and EU law. No CLOUD Act exposure. Data stored on Dawex infrastructure uses EU-based cloud providers. The Dawex platform itself has no US cloud provider dependency.
Market focus: Dawex serves enterprise data commerce: financial data, industry data, IoT sensor data, and logistics data. It is used by industrial consortia, public sector data spaces, and enterprises building internal data marketplaces. Its coverage of consumer demographic and alternative data products is narrower than AWS DataExchange's.
European Data Spaces (GAIA-X and Sector Data Spaces)
The EU's GAIA-X initiative and the associated sectoral data spaces (Agri-Food Data Space, Mobility Data Space, Energy Data Space, Health Data Space via the EHDS) provide EU-sovereign data exchange infrastructure for sector-specific use cases.
Data Spaces architecture: EU data spaces use a federated architecture in which data remains with the provider and subscribers access it through standardized APIs and data connectors based on the International Data Spaces Association (IDSA) connector architecture. Unlike a centralized marketplace where data moves to the subscriber, the data space model queries data in place, reducing CLOUD Act exposure because the data does not leave the provider's EU-sovereign infrastructure for the subscriber's environment.
GDPR posture: EU data spaces are designed to be GDPR-compliant by architecture. The federated access model means personal data does not need to be copied to the subscriber's infrastructure. Access control policies enforce purpose limitation at the API layer. GAIA-X's trust framework requires members to document data sovereignty compliance.
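Purpose enforcement at the API layer can be sketched as a connector-side gate: each request declares a consumer and a purpose, which are compared with the usage policy attached to the dataset. This is a simplified illustration with hypothetical `UsagePolicy` and `authorize` names; it is not the IDSA or GAIA-X policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePolicy:
    dataset_id: str
    permitted_purposes: frozenset[str]
    permitted_consumers: frozenset[str]


class PurposeDenied(Exception):
    """Raised when a request falls outside the dataset's usage policy."""


def authorize(policy: UsagePolicy, consumer: str, declared_purpose: str) -> None:
    """Raise PurposeDenied unless both the consumer and the purpose are permitted."""
    if consumer not in policy.permitted_consumers:
        raise PurposeDenied(f"{consumer} is not a permitted consumer of {policy.dataset_id}")
    if declared_purpose not in policy.permitted_purposes:
        raise PurposeDenied(
            f"purpose '{declared_purpose}' is not permitted for {policy.dataset_id}"
        )


policy = UsagePolicy(
    "mobility-2026",
    frozenset({"traffic-planning"}),
    frozenset({"transport-authority"}),
)
authorize(policy, "transport-authority", "traffic-planning")  # allowed, returns None
try:
    authorize(policy, "transport-authority", "insurance-underwriting")
except PurposeDenied as err:
    print(f"denied: {err}")
```

Checking the purpose before any data is returned is what distinguishes this model from file delivery, where the purpose check can only happen after the data has already been copied.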
Practical accessibility: EU data spaces are still maturing as market infrastructure. The Mobility Data Space (MDS) is operational in Germany. The Agri-Food Data Space (Agri-GAIA) is in deployment. The EHDS is progressing through EU legislative implementation. For organizations whose data needs align with a mature sector data space, this is the highest-sovereignty option. For broader alternative data needs, Dawex or a Snowflake EU deployment is currently more practical.
Snowflake Data Marketplace (on EU-Hosted Snowflake)
Snowflake provides a data marketplace within the Snowflake platform — similar in concept to DataExchange, allowing providers to share data products with subscribers without data movement (the Direct Share model). Snowflake can be deployed with data residency in EU regions on AWS, Azure, or Google Cloud.
EU deployment for sovereignty: Snowflake itself is a US company (NYSE: SNOW), so CLOUD Act jurisdiction applies to the Snowflake SaaS service. However, Snowflake's Direct Share model means that marketplace data shared within the same region does not leave the provider's Snowflake account: the subscriber queries the provider's data through a shared view, with the actual data residing in the provider's storage. Listings offered to consumers in other regions are replicated there first. For EU organizations where both provider and subscriber use EU-region Snowflake accounts, this avoids creating bulk copies of the data in the subscriber's environment, although the shared data itself still sits with a US-owned service.
Compliance features: Snowflake's governance capabilities include column-level security policies, dynamic data masking for personal data, and row-access policies that can enforce purpose limitation at the query layer. These features can be used to implement GDPR controls on marketplace data that DataExchange's S3-delivery model cannot support.
Limitation: Snowflake is a US company. For use cases requiring strict EU sovereignty (no US entity in the data chain), a self-managed data exchange built on EU-sovereign components (PostgreSQL + MinIO + custom API layer, or Apache Atlas + Kafka + HAPI FHIR for healthcare data) may be the only fully sovereign option.
Self-Hosted Data Exchange with EU-Sovereign Components
For organizations building internal data marketplaces or private data exchanges, EU-sovereign open-source components eliminate the US vendor dependency:
- Apache Atlas (governance and metadata management) + Apache Ranger (access control) on EU-hosted infrastructure provides a data catalog with lineage, classification, and purpose-limitation policy enforcement
- MinIO (S3-compatible object storage, EU-deployable, AGPL) as the data delivery layer
- DataHub (open-source data catalog originally developed at LinkedIn, Apache License 2.0) for dataset discovery and GDPR-relevant data lineage documentation
- Delta Sharing (open protocol from the Delta Lake project, created by Databricks and hosted by the Linux Foundation) for cross-organization data sharing without centralized marketplace infrastructure
This approach requires integration engineering investment but provides complete control over the GDPR compliance layer and eliminates any US-entity processor DPA concerns.
AWS European Sovereign Cloud Status
AWS DataExchange: NOT in the AWS European Sovereign Cloud service catalog.
The ESC catalog absence means DataExchange-delivered data products and the marketplace transaction infrastructure operate under the standard, US-governed AWS access policies. For organizations in EU regulated industries (financial services under DORA, healthcare under EHDS, critical infrastructure under NIS2) that require data exchange infrastructure to meet sovereign cloud requirements, DataExchange is unavailable as a compliant option.
The ESC gap has particular significance for the emerging EU data economy use cases that DataExchange targets: EU financial data sharing under the proposed Financial Data Access regulation (FIDA), EU energy sector data exchange under the Smart Systems Directive, and EU mobility data sharing under the European Mobility Data Space. These frameworks place data sovereignty expectations on exchange infrastructure that a marketplace outside the ESC catalog will struggle to demonstrate.
GDPR Compliance Checklist for Data Marketplace Subscriptions
Before subscribing to any data product containing EU personal data:
- Personal data classification: Request from the data provider a written statement of whether the data product contains personal data as defined by GDPR Art. 4(1), the categories of data subjects covered, and the geographic distribution of data subjects (EU vs. non-EU). If the provider cannot provide this, treat the data as potentially containing EU personal data and proceed accordingly.
- Art. 6 legal basis documentation: Document the legal basis for each processing activity you intend to perform with the licensed data. For Art. 6(1)(f) legitimate interests, conduct and document a Legitimate Interests Assessment (LIA). Ensure each downstream use (storage, analytics, ML training, data enrichment) has its own legal basis entry in your Art. 30 Records of Processing Activities.
- Art. 5(1)(b) purpose compatibility assessment: Document whether your intended use is compatible with the provider's original data collection purpose. If the provider cannot confirm the original collection purpose in writing, assume the use is incompatible and do not process unless you obtain explicit consent.
- Art. 14 notification plan: Identify whether the Art. 14(5)(b) exception applies (genuine impossibility of individual notification). If it does, document the exception and implement the alternative measure (public disclosure). If individual notification is feasible, implement the notification workflow within one month of receiving the data. Include recurring Art. 14 obligations in your data product subscription governance process for products with regular revision schedules.
- CLOUD Act TIA: For any DataExchange product delivered to AWS S3 (including EU-region S3), document a Transfer Impact Assessment addressing CLOUD Act jurisdiction. Assess the realistic probability of CLOUD Act demands for your use case and industry, the severity of impact on data subjects, and whether additional technical measures (subscriber-managed encryption with keys outside AWS) are feasible given the DataExchange delivery model.
- Art. 17 erasure procedure: Document how you would respond to an Art. 17 erasure request for data subjects whose personal data appears in a licensed DataExchange product. Map all locations where the data exists in your AWS environment (S3, Athena query cache, SageMaker datasets, enriched internal records) and define the deletion procedure for each.
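The geographic question in the first checklist item can be cross-checked once a sample file or revision is delivered. This is a minimal sketch, assuming the product has a country-code column (the `country` column name is hypothetical); it counts rows, not unique data subjects, and covers the 27 EU member states only, so add IS, LI, and NO if you also treat EEA residents as in scope.

```python
import csv
import io
from typing import TextIO

# ISO 3166-1 alpha-2 codes of the 27 EU member states (Greece is "GR").
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


def eu_subject_summary(f: TextIO, country_column: str = "country") -> dict[str, int]:
    """Count rows with an EU, non-EU, or missing country code in a delivered CSV."""
    summary = {"eu": 0, "non_eu": 0, "unknown": 0}
    for row in csv.DictReader(f):
        code = (row.get(country_column) or "").strip().upper()
        if not code:
            summary["unknown"] += 1
        elif code in EU_COUNTRIES:
            summary["eu"] += 1
        else:
            summary["non_eu"] += 1
    return summary


sample = io.StringIO("household_id,country\nh1,DE\nh2,US\nh3,fr\nh4,\n")
print(eu_subject_summary(sample))  # {'eu': 2, 'non_eu': 1, 'unknown': 1}
```

Rows with no country code are counted separately rather than assumed non-EU, in line with the rule of treating unclassified data as potentially containing EU personal data.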
Conclusion
AWS DataExchange solves a real operational problem: finding, licensing, and integrating third-party data at scale without custom data ingestion pipelines. But the marketplace model, in which data products flow from often US-domiciled providers through AWS infrastructure to subscriber S3 buckets, creates a GDPR compliance architecture that neither providers nor subscribers are well-equipped to navigate. The Art. 6 legal basis gap for re-used third-party personal data, CLOUD Act jurisdiction over licensed datasets, Art. 14 indirect notification obligations, Art. 5(1)(b) purpose limitation risks, and the structural impossibility of Art. 20 portability for multi-subscriber data products are all problems that DataExchange's marketplace design does not help subscribers identify, let alone resolve.
EU-governed alternatives, namely Dawex for enterprise data commerce and EU data spaces for sector-specific sharing, provide data marketplace capabilities with EU legal governance over the transaction infrastructure. Snowflake on EU-region deployments adds SQL-native data sharing with EU data residency, though not freedom from US jurisdiction. For the growing category of EU-regulated data sharing obligations (FIDA, EHDS, NIS2 supply-chain data), EU-sovereign marketplace infrastructure is not just preferable: it is increasingly the architecture that EU regulators will expect.
Data sovereignty in the marketplace era is not just about where your data is stored. It is about the legal jurisdiction that governs the entire transaction chain — from data collection to licensing to subscriber use. AWS DataExchange puts that chain under US jurisdiction. EU alternatives keep it under yours.
EU-Native Hosting
Ready to move to EU-sovereign infrastructure?
sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.