AWS Personalize EU Alternative 2026: Recommendation Engines, Behavioral Profiling, and GDPR Art.22
Post #740 in the sota.io EU Compliance Series
AWS Personalize is Amazon's managed machine learning service for building real-time recommendation systems. E-commerce platforms use it to power "customers also bought" and "recommended for you" widgets. Streaming services use it for content discovery queues. News platforms use it to personalize article feeds. HR platforms use it to surface relevant job postings.
Personalize abstracts away most of the ML complexity: you upload interaction datasets, choose a recipe (algorithm), train a model, and get a real-time recommendation API endpoint. The barrier to deploying a personalization engine dropped significantly when AWS launched Personalize in 2019.
Amazon runs Personalize in European regions: eu-west-1 (Ireland), eu-central-1 (Frankfurt), eu-west-3 (Paris), and eu-north-1 (Stockholm). Your interaction data — the historical record of what your users did — is stored in S3 buckets in those European regions. Many teams treat this as a compliant configuration.
It is not. Amazon Web Services, Inc. is a Delaware corporation headquartered in Seattle, Washington. The CLOUD Act (18 U.S.C. § 2713) compels US companies to produce data stored anywhere in the world when ordered by US authorities. A valid government order served on Amazon in Seattle can reach your Personalize training datasets in Frankfurt: every user ID, every product interaction, every click timestamp.
This matters more for Personalize than for most AWS services because Personalize is specifically designed to process behavioral data at scale — the same data that GDPR's profiling rules treat with heightened scrutiny.
What AWS Personalize Stores About Your Users
Personalize is built around three categories of data. All three involve personal data under GDPR.
Interaction Data: The Core Training Set
Interaction data is the heart of a Personalize deployment. Every recommendation model you train is built on interaction history: records of which users interacted with which items, when, and how.
AWS specifies four required fields for an interaction event:
- USER_ID — the identifier for the user who performed the action
- ITEM_ID — the identifier for the item that was interacted with
- TIMESTAMP — when the interaction occurred (Unix epoch)
- EVENT_TYPE — the type of interaction (click, view, purchase, rating, stream, etc.)
Optional fields extend the dataset: EVENT_VALUE (numeric value, e.g., a rating or watch duration), IMPRESSION (list of items that were shown alongside the chosen item — capturing passive exposure), and arbitrary metadata columns you define.
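For concreteness, a minimal interactions file with the four required fields plus EVENT_VALUE might look like this (all identifiers and values hypothetical):

```csv
USER_ID,ITEM_ID,TIMESTAMP,EVENT_TYPE,EVENT_VALUE
u-10482,prod-3391,1718031642,view,
u-10482,prod-3391,1718031701,add_to_cart,
u-10482,prod-3391,1718032210,purchase,49.99
u-20917,prod-8804,1718035522,rating,4.0
```

Each row is one behavioral event; read together, the rows for a single USER_ID form a timestamped trail of that user's activity.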
This interaction dataset is stored in Amazon S3 and imported into a Personalize Dataset Group. During training and inference, Personalize reads from these S3 datasets. The S3 bucket and its contents live under the AWS (US) legal entity and are subject to CLOUD Act compulsion regardless of which AWS region hosts them.
A Personalize interaction dataset is a comprehensive behavioral profile of your users. For an e-commerce platform with 500,000 users and 24 months of history, it is a record of every product page viewed, every item added to cart, every purchase completed, every review submitted — linked to user IDs that may be trivially re-identifiable from purchase records or account data. Under GDPR Art.4(1), this is personal data. Under GDPR Art.4(4), it is profiling data — processed to evaluate personal aspects relating to economic situation, personal preferences, and behavior.
Item and User Metadata: Enriching the Profile
Personalize supports two additional dataset types that extend the behavioral record.
Item datasets describe the items being recommended: product categories, price ranges, content genres, publication dates, keyword tags, geographic availability. If your items are user-generated content, item metadata may itself contain personal data.
User datasets extend the user profiles used during recommendation: demographic segments, account tenure, subscription tier, geographic location, language preferences. User datasets add structured personal data on top of the behavioral interaction history.
AWS stores all three dataset types in S3, imported into the Personalize Dataset Group. The Dataset Group is a logical container — not a data boundary. All data it references remains in S3 under AWS's (US) legal jurisdiction.
Trained Models: Behavioral Profiles Encoded as Parameters
When Personalize trains a model, it produces a Solution Version — the trained model artifact. This artifact encodes learned representations of your users' behavioral patterns. The model parameters are not raw personal data, but they represent distilled behavioral profiles derived from your users' interaction history.
AWS stores Solution Versions in Personalize infrastructure under the AWS (US) legal entity. You cannot export Solution Version artifacts — they live in AWS's internal model storage and are invoked via the Personalize runtime API.
This creates an unusual situation: even if you deleted your training data from S3, the behavioral patterns of your users remain encoded in Personalize's model storage, under US jurisdiction, until you explicitly delete the Solution Version.
Real-Time Event Ingestion: Ongoing Behavioral Tracking
Personalize supports a real-time event API (PutEvents) that records user interactions as they happen. Rather than batch-importing historical data, you stream behavioral events from your application to Personalize in real time.
Real-time events flow through AWS's Personalize API endpoints — US-controlled infrastructure — before being applied to model inference and potentially persisted to the event tracker dataset. Every click, view, add-to-cart, or search query that your application records in real time transits AWS's (US) infrastructure, regardless of which AWS region you target.
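As a sketch of what that client-side call looks like: the event is shaped per the PutEvents API, and the tracking ID, user ID, and session ID below are placeholders you would supply from your own application.

```python
import datetime

def build_event(item_id, event_type, event_value=None):
    # Shape one Personalize event; field names follow the PutEvents API
    event = {
        "eventType": event_type,
        "itemId": item_id,
        "sentAt": datetime.datetime.now(datetime.timezone.utc),
    }
    if event_value is not None:
        event["eventValue"] = event_value
    return event

def send_click(tracking_id, user_id, session_id, item_id):
    # Every call below transits AWS's US-controlled API endpoints,
    # regardless of the region you target
    import boto3
    client = boto3.client("personalize-events", region_name="eu-central-1")
    client.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[build_event(item_id, "click")],
    )
```

Note that the user and session identifiers travel with every event, so the real-time stream is personal data from the first request.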
The GDPR Art.22 Problem: Automated Decision-Making Based on Profiling
Most GDPR analyses of personalization engines focus on consent for behavioral tracking. The more structurally significant issue is Art.22.
Art.22(1): The Automated Decision-Making Restriction
GDPR Art.22(1) provides:
The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.
Recommendation engines are designed to produce decisions: which items a user sees, in which order, and which items are surfaced as "recommended for you." Whether a given recommendation rises to "legal or similarly significant effect" depends on context.
For a Netflix-style content platform recommending films, the significance threshold is probably not met — seeing film A instead of film B in position 1 of a feed is unlikely to have significant effects. For a platform recommending financial products, health services, job opportunities, or educational programs, the threshold is more likely met.
Even where Art.22(1) is not directly triggered, the Article 29 Working Party (now the EDPB) has consistently held that profiling for commercial purposes requires a documented lawful basis and is subject to the Art.5(1)(a) fairness principle, including non-discriminatory outcomes.
The Transparency Problem
Art.13 and Art.14 GDPR require that data subjects be informed, at collection time, of the purposes of processing and the logic of any automated decision-making that significantly affects them. A recommendation engine that uses 24 months of behavioral history to influence which items a user sees is subject to this transparency requirement.
Implementing Art.13/14 disclosure for a Personalize-based system requires documenting:
- Which interaction types are recorded and for how long
- That interaction data is processed by AWS (a US entity under CLOUD Act jurisdiction)
- The algorithm categories used (collaborative filtering, HRNN-based temporal models, etc.)
- How recommendations influence what the user sees and how they can request human review if Art.22 applies
Most implementations skip this disclosure entirely or reduce it to "we personalize your experience" — an insufficient description under current EDPB guidance.
Art.5(1)(b) Purpose Limitation: The Cross-Service Aggregation Risk
AWS Personalize does not exist in isolation. A typical Personalize deployment aggregates behavioral data from:
- AWS CloudFront or API Gateway — page view and navigation events
- AWS Kinesis — real-time event streaming into Personalize
- AWS DynamoDB — user account data linked to USER_ID
- AWS RDS — purchase and order records
- AWS S3 — historical interaction exports
Each of these services independently processes personal data. When data flows from RDS purchase records into Personalize training datasets, you are combining data collected under one purpose (order fulfillment) with processing under a different purpose (personalization of future recommendations). GDPR Art.5(1)(b) requires that this secondary use be compatible with the original collection purpose, or supported by explicit consent.
The AWS ecosystem's integration density makes purpose-scope creep invisible — data flows between services automatically, and the combined Personalize training dataset may aggregate behavioral signals from contexts the data subjects never anticipated would be linked.
Data Subject Rights Under a Personalize Deployment
GDPR grants data subjects several rights that create operational complexity for Personalize deployments.
Right to Erasure (Art.17): The Model Retraining Problem
When a user exercises their Art.17 right to erasure, you must delete their personal data from your systems. In a Personalize deployment, this means:
- Deleting the user's records from your S3 interaction dataset
- Deleting the user's records from your S3 user dataset
- Retraining the model — or accepting that the trained Solution Version still encodes the deleted user's behavioral patterns
The retraining step is the structural problem. Personalize does not support selective unlearning from a trained model. AWS's documentation notes that you should retrain models periodically, but periodic retraining is not the same as guaranteed erasure within the Art.17 timeframe (per Art.12(3): without undue delay, and in any event within one month).
AWS's Personalize documentation does not address Art.17 compliance. There is no API to "forget" a specific user's contribution to a trained model.
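Until the model can be retrained, the minimum viable response to an erasure request is removing the user's rows from the raw training exports. A minimal sketch, assuming a CSV export with a USER_ID column:

```python
import csv

def purge_user(src_path, dst_path, user_id):
    """Copy an interactions CSV, dropping every row for the erased user.

    Returns the number of rows removed. Note that the trained Solution
    Version still encodes this user's behavior until the model is
    retrained or deleted.
    """
    removed = 0
    with open(src_path, newline="") as f_in, \
         open(dst_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["USER_ID"] == user_id:
                removed += 1
                continue
            writer.writerow(row)
    return removed
```

The count of removed rows is worth logging in your erasure audit trail as evidence that the request was actioned against the training data.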
Right of Access (Art.15): Reconstructing What Was Recorded
A data subject exercising Art.15 access rights is entitled to receive a copy of all personal data processed about them. In a Personalize deployment, this includes their complete interaction dataset records. Providing this requires:
- Querying your S3 interaction dataset for all records where USER_ID matches the requesting user
- Querying your S3 user dataset for the user's profile records
- Disclosing that this data has been processed by AWS (US entity) for model training
There is no Personalize API to extract a single user's interaction records directly — you query your underlying S3 datasets, which may not have efficient per-user indexing if designed purely for ML training imports.
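A minimal extraction sketch, assuming CSV exports with a USER_ID column as in the interaction schema above (the processor disclosure string is illustrative, not legal boilerplate):

```python
import csv

def export_user_records(csv_paths, user_id):
    """Collect every interaction row for one USER_ID across the exported
    dataset files and return a JSON-serializable Art.15 extract."""
    records = []
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row.get("USER_ID") == user_id:
                    records.append(row)
    return {
        "user_id": user_id,
        "record_count": len(records),
        "interactions": records,
        # Art.15 also requires disclosing recipients/processors
        "processors_disclosed": ["Amazon Web Services, Inc. (US entity)"],
    }
```

For large datasets this linear scan is slow; a per-user index (or a queryable copy of the exports) is worth building if you expect access requests at any volume.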
Right to Object to Profiling (Art.21)
GDPR Art.21(2), read with Art.21(3), provides an absolute right to object to processing for direct marketing purposes, which includes profiling to the extent it is related to such marketing. If your Personalize deployment powers product recommendations in a commercial context, users have an absolute right to object. You must be able to suppress processing for an objecting user: not just exclude them from recommendation API calls, but ensure their data is not used in future model training.
The CLOUD Act + GDPR Tension for Recommendation Data
The CLOUD Act issue is more acute for behavioral profiling data than for most categories of cloud-stored data.
Standard CLOUD Act risk analysis focuses on government access to stored data. For Personalize, the risk has three dimensions:
Dimension 1: Training data access. Your S3 interaction datasets — a comprehensive behavioral record of your user base — are accessible to US authorities via a CLOUD Act order served on Amazon. This is the standard cloud jurisdiction risk.
Dimension 2: Model artifact access. Trained Solution Versions encode your users' behavioral patterns. These artifacts live in AWS's internal model storage, under US jurisdiction, for as long as you maintain the Solution. A CLOUD Act order could compel Amazon to provide access to trained model artifacts representing the behavioral profiles of your users.
Dimension 3: Real-time inference access. Every GetRecommendations API call passes through AWS infrastructure. The response — a ranked list of recommended items for a specific user — is a real-time behavioral signal about that user. Real-time inference data is transient, but it transits US-controlled infrastructure.
None of these three exposure vectors appears in a standard transfer impact assessment (TIA) template written for generic cloud storage. A Personalize-specific TIA needs to address all three.
EU-Native Recommendation Engine Alternatives
Several EU-native alternatives to AWS Personalize provide recommendation functionality without US jurisdiction exposure.
Recombee (Czech Republic, EU-Native)
Recombee is a recommendation-as-a-service platform incorporated and operated in the Czech Republic, within the EU. It provides:
- Real-time recommendation API (collaborative filtering, content-based, hybrid models)
- User and item property support for context-aware recommendations
- A/B testing framework built into the recommendation API
- GDPR-compliant data deletion APIs with documented erasure workflows
- EU data residency with no US parent company and no CLOUD Act exposure
Recombee supports all standard recommendation scenarios: e-commerce product recommendations, content personalization, search result reranking, and email campaign personalization. The API design is similar in concept to Personalize's runtime API, making migration feasible.
The key GDPR advantage: Recombee documents its Art.17 erasure workflows explicitly, and user deletion propagates through its recommendation models without requiring a full retrain, a consequence of model architecture choices made with GDPR compliance in mind from the start.
Strands Recommender (Spain, EU-Native)
Strands is a recommendation engine vendor incorporated in Spain, operating within the EU. It targets retail and e-commerce use cases with:
- Product recommendation widgets (frequently bought together, similar products, personalized feeds)
- Email personalization for triggered campaigns
- Real-time behavioral signal ingestion
- EU data residency and GDPR data processing agreements available
Strands has operated in the EU market since 2004 and has GDPR-compliant DPAs available. No US parent company, no CLOUD Act exposure.
Self-Hosted: LightFM, Implicit, or Surprise (EU-Controlled Infrastructure)
For teams with ML engineering capacity, running a self-hosted recommendation model on EU-controlled infrastructure eliminates the third-party cloud jurisdiction problem entirely.
Libraries commonly used for self-hosted recommendation systems:
- LightFM (Python) — hybrid matrix factorization, supports both collaborative and content-based signals. Runs on any CPU/GPU infrastructure.
- Implicit (Python) — fast collaborative filtering via ALS (Alternating Least Squares). Designed for large implicit feedback datasets (clicks, views, purchases without explicit ratings).
- Surprise (Python) — scikit-learn-compatible library for collaborative filtering algorithms including SVD, NMF, and KNN-based methods.
Hosted on EU-native cloud infrastructure (Hetzner, Scaleway, OVHcloud, Exoscale) or on your own servers, a self-hosted recommendation model keeps behavioral training data under your full control, under EU jurisdiction, with no third-party CLOUD Act exposure.
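To make the core mechanism concrete, here is a library-agnostic sketch of the alternating-least-squares factorization these libraries implement, in plain NumPy on a dense toy matrix. It deliberately skips the sparse storage and confidence weighting that Implicit adds for real implicit-feedback workloads.

```python
import numpy as np

def als_step(R, fixed, reg=0.1):
    # Solve for one factor matrix while the other is held fixed:
    # minimizes ||R - U V^T||^2 + reg * ||U||^2 via the normal equations
    k = fixed.shape[1]
    A = fixed.T @ fixed + reg * np.eye(k)
    return np.linalg.solve(A, fixed.T @ R.T).T

def train_als(R, factors=8, iterations=10, reg=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, factors))
    V = rng.normal(scale=0.1, size=(n_items, factors))
    for _ in range(iterations):
        U = als_step(R, V, reg)    # update user factors
        V = als_step(R.T, U, reg)  # update item factors
    return U, V

def recommend(U, V, user_idx, n=3):
    # Score all items for one user and return the top-n item indices
    scores = U[user_idx] @ V.T
    return np.argsort(-scores)[:n]
```

Because the factor matrices and all intermediate data stay in your own process, erasure and objection handling reduce to editing the training matrix and rerunning the fit, with no vendor API in the loop.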
sota.io: EU-Native Deployment Layer for Recommendation Services
Whether you use Recombee, Strands, or a self-hosted recommendation service, the deployment layer matters. Running recommendation microservices, model-serving APIs, or behavioral data pipelines on US-based cloud infrastructure reintroduces the CLOUD Act problem at the infrastructure level — even if the ML library itself is EU-native.
sota.io is a PaaS platform for deploying containerized applications on EU-native infrastructure without US parent company exposure. Recommendation services, data pipelines, and model-serving APIs deployed on sota.io benefit from:
- EU jurisdiction throughout the stack (no US parent, no CLOUD Act)
- Git-push deployment for model serving containers
- Environment variable management for recommendation API keys and service credentials
- No behavioral data transiting US-controlled infrastructure during inference
If you are migrating from AWS Personalize to a self-hosted recommendation model or an EU-native SaaS, the deployment environment for your model serving API needs the same jurisdiction analysis as the recommendation engine itself.
Migration Path: AWS Personalize to EU-Native Alternative
Step 1: Export Your Interaction Data
Before deleting your Personalize Dataset Group, export your interaction data from S3. This is your historical behavioral dataset — the training data that powers your recommendation model. You will need it to train a replacement model.
aws s3 cp s3://your-personalize-bucket/interactions/ ./interactions/ --recursive
Verify that you have complete interaction history. Personalize Dataset Groups can accumulate data from multiple import jobs — ensure you export from all S3 prefixes.
Step 2: Pseudonymize USER_IDs Before Transfer
If you are moving data to an EU-native service, consider pseudonymizing USER_IDs before the transfer. Replace raw user identifiers with deterministic keyed hashes (HMAC-SHA-256 with a secret key), with the mapping held in a lookup table you control. This reduces the sensitivity of the interaction dataset during migration and limits re-identification risk.
import hashlib, hmac, csv

SECRET_KEY = b"your-secret-salt-from-secure-vault"

def pseudonymize_user_id(user_id: str) -> str:
    # Deterministic keyed hash: the same input always maps to the same
    # pseudonym, reversible only via the lookup table you keep
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:32]

# Apply to all rows in your interaction CSV
with open("interactions.csv", newline="") as f_in, \
     open("interactions_pseudonymized.csv", "w", newline="") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["USER_ID"] = pseudonymize_user_id(row["USER_ID"])
        writer.writerow(row)
Step 3: Train Replacement Model on EU Infrastructure
Using your exported interaction data, train a replacement model using an EU-native library (LightFM, Implicit) on EU-controlled infrastructure.
from implicit import als
import scipy.sparse as sparse
import pandas as pd
import pickle

df = pd.read_csv("interactions_pseudonymized.csv")

# Build the user-item matrix; implicit >= 0.5 expects users as rows
user_ids = df["USER_ID"].astype("category")
item_ids = df["ITEM_ID"].astype("category")
ratings = sparse.csr_matrix(
    (df["EVENT_VALUE"].fillna(1.0), (user_ids.cat.codes, item_ids.cat.codes))
)

model = als.AlternatingLeastSquares(factors=64, regularization=0.1, iterations=20)
model.fit(ratings)

# Persist model artifacts on EU-controlled storage; pickling the .cat
# accessors preserves the user/item id-to-index mappings
with open("als_model.pkl", "wb") as f:
    pickle.dump({"model": model, "user_cat": user_ids.cat, "item_cat": item_ids.cat}, f)
Step 4: Deploy Model Serving API on EU-Native PaaS
Wrap your trained model in a FastAPI service and deploy on EU-native infrastructure:
from fastapi import FastAPI
import pickle

app = FastAPI()

# Load the trained artifact once at startup
with open("als_model.pkl", "rb") as f:
    state = pickle.load(f)
model = state["model"]
user_cat = state["user_cat"]
item_cat = state["item_cat"]

@app.get("/recommend/{user_id}")
def recommend(user_id: str, n: int = 10):
    if user_id not in user_cat.categories:
        return {"recommendations": [], "reason": "new_user"}
    user_idx = user_cat.categories.get_loc(user_id)
    # Filtering already-seen items would require shipping the user-item
    # matrix alongside the model; this minimal artifact omits it, so the
    # filter is disabled here
    ids, scores = model.recommend(
        user_idx, None, N=n, filter_already_liked_items=False
    )
    items = [
        {"item_id": item_cat.categories[i], "score": float(s)}
        for i, s in zip(ids, scores)
    ]
    return {"recommendations": items, "user_id": user_id}
Deploy this container on sota.io with a Dockerfile and git-push deployment — keeping the entire inference path under EU jurisdiction.
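A minimal container definition for that service might look like the following (the filenames main.py and requirements.txt are assumptions about your project layout):

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Install dependencies first so the layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Ship the FastAPI app and the trained model artifact together
COPY main.py als_model.pkl ./

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Baking the model artifact into the image is the simplest option; mounting it from EU-controlled object storage at startup works equally well and decouples retraining from redeployment.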
Step 5: Update Privacy Documentation
After completing migration:
- Update your ROPA (Art.30 Record of Processing Activities) to replace AWS as processor with your new EU-native vendor
- Update your Privacy Policy to reflect the new processing description, eliminating the CLOUD Act transfer risk section
- Update any existing SCCs/BCRs used to justify AWS transfer
- Conduct or update your DPIA if behavioral profiling for recommendation triggers Art.35
Compliance Checklist for Recommendation Engine Deployments
Before deploying any recommendation engine that processes behavioral data:
- Lawful basis documented — consent or legitimate interests for behavioral profiling, with a legitimate interests assessment (LIA) if relying on Art.6(1)(f)
- Art.13/14 disclosure complete — users informed of automated profiling and the logic of recommendation systems that significantly affect them
- Art.22 assessment complete — determined whether your recommendation use case produces legally or similarly significant effects
- Art.17 erasure path tested — verified you can delete a specific user's interaction data and retrain or update the model within Art.17 timeframe
- Art.21 objection mechanism implemented — users in commercial contexts can opt out of profiling-based recommendations
- ROPA updated — recommendation engine vendor documented as processor with correct jurisdiction
- DPIA conducted — required if large-scale systematic behavioral profiling of data subjects (Art.35(3)(a))
- Processor DPA signed — with your recommendation engine vendor, confirming EU-compliant processing
- Transfer impact assessment — conducted if any component involves non-EU processing (covers training data, model artifacts, and real-time inference path)
Conclusion
AWS Personalize is an effective managed recommendation engine, but it processes behavioral profiling data — interaction histories, user profiles, real-time event streams — under US jurisdiction via the CLOUD Act. For European applications subject to GDPR, this creates structural compliance problems that regional endpoint selection cannot resolve.
The GDPR issues are not limited to the standard data residency question. Art.22 automated decision-making obligations, Art.17 erasure-from-trained-models problems, Art.5(1)(b) purpose limitation for cross-service behavioral aggregation, and the real-time inference path through US infrastructure all require specific analysis and remediation.
EU-native alternatives — Recombee (Czech Republic), Strands (Spain), or self-hosted models deployed on EU-native PaaS — provide equivalent recommendation functionality without the US jurisdiction exposure. Migration is operationally achievable: export your interaction data, train a replacement model on EU-controlled infrastructure, and redeploy on an EU-native platform.
For other services in the AWS stack with the same structural CLOUD Act problem, see our analysis of AWS RDS, AWS S3, AWS Lambda, AWS Comprehend Medical, and AWS SES.
EU-Native Hosting
Ready to move to EU-sovereign infrastructure?
sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.