2026-05-01 · 13 min read

AWS Personalize EU Alternative 2026: Recommendation Engines, Behavioral Profiling, and GDPR Art.22

Post #740 in the sota.io EU Compliance Series

AWS Personalize is Amazon's managed machine learning service for building real-time recommendation systems. E-commerce platforms use it to power "customers also bought" and "recommended for you" widgets. Streaming services use it for content discovery queues. News platforms use it to personalize article feeds. HR platforms use it to surface relevant job postings.

Personalize abstracts away most of the ML complexity: you upload interaction datasets, choose a recipe (algorithm), train a model, and get a real-time recommendation API endpoint. The barrier to deploying a personalization engine dropped significantly when AWS launched Personalize in 2019.

Amazon runs Personalize in European regions: eu-west-1 (Ireland), eu-central-1 (Frankfurt), eu-west-3 (Paris), and eu-north-1 (Stockholm). Your interaction data — the historical record of what your users did — is stored in S3 buckets in those European regions. Many teams treat this as a compliant configuration.

It is not. Amazon Web Services, Inc. is a Delaware corporation headquartered in Seattle, Washington. The CLOUD Act (18 U.S.C. § 2713) compels US companies to produce data stored anywhere in the world when ordered by US authorities. A valid government order served on Amazon in Seattle can reach your Personalize training datasets in Frankfurt: every user ID, every product interaction, every click timestamp.

This matters more for Personalize than for most AWS services because Personalize is specifically designed to process behavioral data at scale — the same data that GDPR's profiling rules treat with heightened scrutiny.

What AWS Personalize Stores About Your Users

Personalize is built around three categories of data. All three involve personal data under GDPR.

Interaction Data: The Core Training Set

Interaction data is the heart of a Personalize deployment. Every recommendation model you train is built on interaction history: records of which users interacted with which items, when, and how.

AWS specifies four required fields for an interaction event: USER_ID, ITEM_ID, TIMESTAMP (Unix epoch time), and EVENT_TYPE (e.g., click, view, purchase).

Optional fields extend the dataset: EVENT_VALUE (numeric value, e.g., a rating or watch duration), IMPRESSION (list of items that were shown alongside the chosen item — capturing passive exposure), and arbitrary metadata columns you define.
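
To make the schema concrete, here is a minimal sketch of what a few rows of an interactions file look like, built with Python's csv module (the identifiers and values below are hypothetical; the columns follow the commonly used USER_ID, ITEM_ID, TIMESTAMP, EVENT_TYPE layout plus the optional EVENT_VALUE):

```python
import csv
import io

# Hypothetical interaction events in the shape of a Personalize interactions
# dataset: identifier/timestamp columns plus the optional EVENT_VALUE.
rows = [
    {"USER_ID": "u-1001", "ITEM_ID": "sku-552", "TIMESTAMP": 1767225600,
     "EVENT_TYPE": "view", "EVENT_VALUE": ""},
    {"USER_ID": "u-1001", "ITEM_ID": "sku-552", "TIMESTAMP": 1767225660,
     "EVENT_TYPE": "purchase", "EVENT_VALUE": "49.90"},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE", "EVENT_VALUE"]
)
writer.writeheader()
writer.writerows(rows)
interactions_csv = buf.getvalue()
print(interactions_csv)
```

Even these two rows tie a persistent identifier to a timestamped behavioral trail, which is why the file as a whole constitutes a behavioral profile rather than anonymous telemetry.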

This interaction dataset is stored in Amazon S3 and imported into a Personalize Dataset Group. During training and inference, Personalize reads from these S3 datasets. The S3 bucket and its contents live under the AWS (US) legal entity and are subject to CLOUD Act compulsion regardless of which AWS region hosts them.

A Personalize interaction dataset is a comprehensive behavioral profile of your users. For an e-commerce platform with 500,000 users and 24 months of history, it is a record of every product page viewed, every item added to cart, every purchase completed, every review submitted — linked to user IDs that may be trivially re-identifiable from purchase records or account data. Under GDPR Art.4(1), this is personal data. Under GDPR Art.4(4), it is profiling data — processed to evaluate personal aspects relating to economic situation, personal preferences, and behavior.

Item and User Metadata: Enriching the Profile

Personalize supports two additional dataset types that extend the behavioral record.

Item datasets describe the items being recommended: product categories, price ranges, content genres, publication dates, keyword tags, geographic availability. If your items are user-generated content, item metadata may itself contain personal data.

User datasets extend the user profiles used during recommendation: demographic segments, account tenure, subscription tier, geographic location, language preferences. User datasets add structured personal data on top of the behavioral interaction history.

AWS stores all three dataset types in S3, imported into the Personalize Dataset Group. The Dataset Group is a logical container — not a data boundary. All data it references remains in S3 under AWS's (US) legal jurisdiction.

Trained Models: Behavioral Profiles Encoded as Parameters

When Personalize trains a model, it produces a Solution Version — the trained model artifact. This artifact encodes learned representations of your users' behavioral patterns. The model parameters are not raw personal data, but they represent distilled behavioral profiles derived from your users' interaction history.

AWS stores Solution Versions in Personalize infrastructure under the AWS (US) legal entity. You cannot export Solution Version artifacts — they live in AWS's internal model storage and are invoked via the Personalize runtime API.

This creates an unusual situation: even if you deleted your training data from S3, the behavioral patterns of your users remain encoded in Personalize's model storage, under US jurisdiction, until you explicitly delete the Solution Version.

Real-Time Event Ingestion: Ongoing Behavioral Tracking

Personalize supports a real-time event API (PutEvents) that records user interactions as they happen. Rather than batch-importing historical data, you stream behavioral events from your application to Personalize in real time.

Real-time events flow through AWS's Personalize API endpoints — US-controlled infrastructure — before being applied to model inference and potentially persisted to the event tracker dataset. Every click, view, add-to-cart, or search query that your application records in real time transits AWS's (US) infrastructure, regardless of which AWS region you target.
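
As a sketch of what transits that infrastructure: the payload below mirrors the shape that boto3's personalize-events PutEvents call expects (the tracking ID and identifiers are hypothetical, and the payload is only constructed here, never sent):

```python
import time

def build_put_events_payload(tracking_id: str, user_id: str, session_id: str,
                             item_id: str, event_type: str) -> dict:
    """Build a PutEvents-shaped payload: one behavioral event for one user."""
    return {
        "trackingId": tracking_id,
        "userId": user_id,          # pseudonymize before sending where possible
        "sessionId": session_id,
        "eventList": [
            {
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": int(time.time()),  # event timestamp
            }
        ],
    }

payload = build_put_events_payload("trk-abc123", "u-1001", "sess-42", "sku-552", "click")
# boto3.client("personalize-events").put_events(**payload)  # the US-transiting call
```

Every such payload is a discrete behavioral observation about an identified user, produced at the moment of the interaction.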

The GDPR Art.22 Problem: Automated Decision-Making Based on Profiling

Most GDPR analyses of personalization engines focus on consent for behavioral tracking. The more structurally significant issue is Art.22.

Art.22(1): The Automated Decision-Making Restriction

GDPR Art.22(1) provides:

The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal or similarly significant effects concerning him or her.

Recommendation engines are designed to produce decisions: which items a user sees, in which order, and which items are surfaced as "recommended for you." Whether a given recommendation rises to "legal or similarly significant effect" depends on context.

For a Netflix-style content platform recommending films, the significance threshold is probably not met — seeing film A instead of film B in position 1 of a feed is unlikely to have significant effects. For a platform recommending financial products, health services, job opportunities, or educational programs, the threshold is more likely met.

Even where Art.22(1) is not directly triggered, the Article 29 Working Party (succeeded by the EDPB) consistently held that profiling for commercial purposes requires a documented lawful basis and is subject to Art.5(1)(a) fairness requirements, including non-discriminatory outcomes.

The Transparency Problem

Art.13 and Art.14 GDPR require that data subjects be informed, at collection time, of the purposes of processing and the logic of any automated decision-making that significantly affects them. A recommendation engine that uses 24 months of behavioral history to influence which items a user sees is subject to this transparency requirement.

Implementing Art.13/14 disclosure for a Personalize-based system requires documenting the categories of behavioral data collected, the fact that this data feeds a recommendation model, meaningful information about the model's logic and its envisaged effects on the user, the retention period for interaction history, and the rights to object (Art.21) and to erasure (Art.17).

Most implementations skip this disclosure entirely or reduce it to "we personalize your experience" — an insufficient description under current EDPB guidance.

Art.5(1)(b) Purpose Limitation: The Cross-Service Aggregation Risk

AWS Personalize does not exist in isolation. A typical Personalize deployment aggregates behavioral data from several upstream sources: for example, purchase and order records in RDS, application clickstream events (often relayed via Kinesis), and web analytics logs stored in S3.

Each of these services independently processes personal data. When data flows from RDS purchase records into Personalize training datasets, you are combining data collected under one purpose (order fulfillment) with processing under a different purpose (personalization of future recommendations). GDPR Art.5(1)(b) requires that this secondary use be compatible with the original collection purpose, or supported by explicit consent.

The AWS ecosystem's integration density makes purpose-scope creep invisible — data flows between services automatically, and the combined Personalize training dataset may aggregate behavioral signals from contexts the data subjects never anticipated would be linked.

Data Subject Rights Under a Personalize Deployment

GDPR grants data subjects several rights that create operational complexity for Personalize deployments.

Right to Erasure (Art.17): The Model Retraining Problem

When a user exercises their Art.17 right to erasure, you must delete their personal data from your systems. In a Personalize deployment, this means:

  1. Deleting the user's records from your S3 interaction dataset
  2. Deleting the user's records from your S3 user dataset
  3. Retraining the model — or accepting that the trained Solution Version still encodes the deleted user's behavioral patterns

Step 3 is the structural problem. Personalize does not support selective unlearning from a trained model. AWS's documentation notes that you should retrain models periodically, but periodic retraining is not the same as guaranteed erasure within the Art.17 timeframe ("without undue delay", with Art.12(3) setting a one-month window for acting on the request).

AWS's Personalize documentation does not address Art.17 compliance. There is no API to "forget" a specific user's contribution to a trained model.
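
The dataset-side half of the workflow (steps 1 and 2 above) is straightforward to implement yourself. A minimal sketch, assuming CSV dataset files with a USER_ID column; step 3, retraining, still has to follow:

```python
import csv

def erase_user_rows(src_path: str, dst_path: str, user_id: str,
                    id_column: str = "USER_ID") -> int:
    """Rewrite a dataset file without the given user's rows; return rows removed."""
    removed = 0
    with open(src_path, newline="") as f_in, open(dst_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[id_column] == user_id:
                removed += 1          # drop this data subject's record
            else:
                writer.writerow(row)  # keep everyone else
    return removed
```

Logging the returned count gives you an audit trail showing how many records were erased per request.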

Right of Access (Art.15): Reconstructing What Was Recorded

A data subject exercising Art.15 access rights is entitled to receive a copy of all personal data processed about them. In a Personalize deployment, this includes their complete interaction dataset records. Providing this requires locating every record matching the user's USER_ID across all imported dataset files, including events ingested through the real-time event tracker, and presenting them in an intelligible form.

There is no Personalize API to extract a single user's interaction records directly — you query your underlying S3 datasets, which may not have efficient per-user indexing if designed purely for ML training imports.
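
A minimal sketch of that per-user extraction, assuming exported CSV dataset files with a USER_ID column:

```python
import csv
import glob

def collect_user_records(pattern: str, user_id: str,
                         id_column: str = "USER_ID") -> list[dict]:
    """Gather every record for one data subject across all exported dataset files."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row[id_column] == user_id:
                    # keep the source file so the Art.15 response is traceable
                    records.append({"source_file": path, **row})
    return records
```

A full-file scan per request is slow at scale; if access requests are frequent, a per-user index over the dataset files pays for itself quickly.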

Right to Object to Profiling (Art.21)

GDPR Art.21(2) provides an absolute right to object to processing for direct marketing purposes that involves profiling. If your Personalize deployment powers product recommendations in a commercial context, users have an absolute right to object. You must be able to suppress processing for an objecting user — not just exclude them from recommendation API calls, but ensure their data is not used in future model training.
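
One way to operationalize this is a suppression registry consulted both before training imports and at recommendation time. A minimal sketch; the names and structure are illustrative, not a library API:

```python
class ObjectionRegistry:
    """Tracks Art.21 objections so objectors are excluded from both
    training imports and recommendation calls."""

    def __init__(self) -> None:
        self._objectors: set[str] = set()

    def record_objection(self, user_id: str) -> None:
        self._objectors.add(user_id)

    def allowed_for_processing(self, user_id: str) -> bool:
        # Check before serving a personalized recommendation
        return user_id not in self._objectors

    def filter_training_rows(self, rows: list[dict]) -> list[dict]:
        # Apply before every training import, not just once
        return [r for r in rows if r["USER_ID"] not in self._objectors]
```

In production the set would be backed by durable storage, but the invariant is the same: the registry gates every path that touches the objector's behavioral data.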

The CLOUD Act + GDPR Tension for Recommendation Data

The CLOUD Act issue is more acute for behavioral profiling data than for most categories of cloud-stored data.

Standard CLOUD Act risk analysis focuses on government access to stored data. For Personalize, the risk has three dimensions:

Dimension 1: Training data access. Your S3 interaction datasets — a comprehensive behavioral record of your user base — are accessible to US authorities via a CLOUD Act order served on Amazon. This is the standard cloud jurisdiction risk.

Dimension 2: Model artifact access. Trained Solution Versions encode your users' behavioral patterns. These artifacts live in AWS's internal model storage, under US jurisdiction, for as long as you maintain the Solution. A CLOUD Act order could compel Amazon to provide access to trained model artifacts representing the behavioral profiles of your users.

Dimension 3: Real-time inference access. Every GetRecommendations API call passes through AWS infrastructure. The response — a ranked list of recommended items for a specific user — is a real-time behavioral signal about that user. Real-time inference data is transient, but it transits US-controlled infrastructure.

None of these three exposure vectors appears in a standard transfer impact assessment (TIA) template written for generic cloud storage. A Personalize-specific TIA needs to address all three.

EU-Native Recommendation Engine Alternatives

Several EU-native alternatives to AWS Personalize provide recommendation functionality without US jurisdiction exposure.

Recombee (Czech Republic, EU-Native)

Recombee is a recommendation-as-a-service platform incorporated and operated in the Czech Republic, within the EU.

Recombee supports all standard recommendation scenarios: e-commerce product recommendations, content personalization, search result reranking, and email campaign personalization. The API design is similar in concept to Personalize's runtime API, making migration feasible.

The key GDPR advantage: Recombee documents Art.17 erasure workflows explicitly, and user deletion propagates through its recommendation models without a full retrain, reflecting architecture choices made for GDPR compliance from the start.

Strands Recommender (Spain, EU-Native)

Strands is a recommendation engine vendor incorporated in Spain, operating within the EU, targeting retail and e-commerce use cases.

Strands has operated in the EU market since 2004 and has GDPR-compliant DPAs available. No US parent company, no CLOUD Act exposure.

Self-Hosted: LightFM, Implicit, or Surprise (EU-Controlled Infrastructure)

For teams with ML engineering capacity, running a self-hosted recommendation model on EU-controlled infrastructure eliminates the third-party cloud jurisdiction problem entirely.

Libraries commonly used for self-hosted recommendation systems include LightFM (hybrid matrix factorization combining collaborative and content-based signals), Implicit (ALS and BPR models for implicit-feedback data), and Surprise (classic explicit-rating algorithms such as SVD and k-NN).

Hosted on EU-native cloud infrastructure (Hetzner, Scaleway, OVHcloud, Exoscale) or on your own servers, a self-hosted recommendation model keeps behavioral training data under your full control, under EU jurisdiction, with no third-party CLOUD Act exposure.

sota.io: EU-Native Deployment Layer for Recommendation Services

Whether you use Recombee, Strands, or a self-hosted recommendation service, the deployment layer matters. Running recommendation microservices, model-serving APIs, or behavioral data pipelines on US-based cloud infrastructure reintroduces the CLOUD Act problem at the infrastructure level — even if the ML library itself is EU-native.

sota.io is a PaaS platform for deploying containerized applications on EU-native infrastructure without US parent company exposure. Recommendation services, data pipelines, and model-serving APIs deployed on sota.io run entirely under EU jurisdiction, with no US parent entity in the chain of control.

If you are migrating from AWS Personalize to a self-hosted recommendation model or an EU-native SaaS, the deployment environment for your model serving API needs the same jurisdiction analysis as the recommendation engine itself.

Migration Path: AWS Personalize to EU-Native Alternative

Step 1: Export Your Interaction Data

Before deleting your Personalize Dataset Group, export your interaction data from S3. This is your historical behavioral dataset — the training data that powers your recommendation model. You will need it to train a replacement model.

aws s3 cp s3://your-personalize-bucket/interactions/ ./interactions/ --recursive

Verify that you have complete interaction history. Personalize Dataset Groups can accumulate data from multiple import jobs — ensure you export from all S3 prefixes.
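
A quick sanity check over the exported files can confirm coverage before anything is deleted on the AWS side. A sketch assuming CSV files with a TIMESTAMP column:

```python
import csv
import glob

def summarize_export(pattern: str) -> dict:
    """Count rows and the timestamp range across all exported interaction
    files, to confirm the export covers the full history."""
    total, t_min, t_max = 0, None, None
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                total += 1
                ts = int(row["TIMESTAMP"])
                t_min = ts if t_min is None else min(t_min, ts)
                t_max = ts if t_max is None else max(t_max, ts)
    return {"rows": total, "first_event": t_min, "last_event": t_max}
```

Compare the row count and date range against your own application records; a gap at either end usually means an import job's S3 prefix was missed.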

Step 2: Pseudonymize USER_IDs Before Transfer

If you are moving data to an EU-native service, consider applying pseudonymization to USER_IDs before the transfer. Replace raw user identifiers with deterministic keyed hashes (HMAC-SHA256 with a secret key you control): the same user always maps to the same pseudonym, but re-identification requires the key. This reduces the sensitivity of the interaction dataset during migration and limits re-identification risk.

import hashlib, hmac, csv

SECRET_KEY = b"your-secret-salt-from-secure-vault"

def pseudonymize_user_id(user_id: str) -> str:
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:32]

# Apply to all rows in your interaction CSV
with open("interactions.csv") as f_in, open("interactions_pseudonymized.csv", "w") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["USER_ID"] = pseudonymize_user_id(row["USER_ID"])
        writer.writerow(row)

Step 3: Train Replacement Model on EU Infrastructure

Using your exported interaction data, train a replacement model using an EU-native library (LightFM, Implicit) on EU-controlled infrastructure.

from implicit import als
import scipy.sparse as sparse
import pandas as pd

df = pd.read_csv("interactions_pseudonymized.csv")

# Build user-item matrix
user_ids = df["USER_ID"].astype("category")
item_ids = df["ITEM_ID"].astype("category")
# EVENT_VALUE is optional in the export; default every interaction to 1.0
values = df["EVENT_VALUE"].fillna(1.0) if "EVENT_VALUE" in df.columns else [1.0] * len(df)
ratings = sparse.csr_matrix(
    (values, (user_ids.cat.codes, item_ids.cat.codes))
)

model = als.AlternatingLeastSquares(factors=64, regularization=0.1, iterations=20)
model.fit(ratings)

# Persist model artifacts on EU-controlled storage; keep the interaction
# matrix so the serving layer can filter already-seen items
import pickle
with open("als_model.pkl", "wb") as f:
    pickle.dump(
        {"model": model, "user_items": ratings,
         "user_cat": user_ids.cat, "item_cat": item_ids.cat},
        f,
    )

Step 4: Deploy Model Serving API on EU-Native PaaS

Wrap your trained model in a FastAPI service and deploy on EU-native infrastructure:

from fastapi import FastAPI
import pickle
import scipy.sparse as sparse

app = FastAPI()

with open("als_model.pkl", "rb") as f:
    state = pickle.load(f)

model = state["model"]
user_cat = state["user_cat"]
item_cat = state["item_cat"]
# Interaction matrix, if it was persisted alongside the model; implicit's
# recommend() needs the user's own row to filter already-seen items
user_items = state.get("user_items")

@app.get("/recommend/{user_id}")
def recommend(user_id: str, n: int = 10):
    if user_id not in user_cat.categories:
        return {"recommendations": [], "reason": "new_user"}

    user_idx = user_cat.categories.get_loc(user_id)
    # implicit >= 0.5 expects the user's interaction row as the second argument;
    # fall back to an empty row (no filtering) if the matrix was not persisted
    row = (user_items[user_idx] if user_items is not None
           else sparse.csr_matrix((1, len(item_cat.categories))))
    ids, scores = model.recommend(
        user_idx, row, N=n, filter_already_liked_items=user_items is not None
    )

    items = [{"item_id": str(item_cat.categories[i]), "score": float(s)}
             for i, s in zip(ids, scores)]
    return {"recommendations": items, "user_id": user_id}

Deploy this container on sota.io with a Dockerfile and git-push deployment — keeping the entire inference path under EU jurisdiction.

Step 5: Update Privacy Documentation

After completing migration:

  1. Update your Art.13/14 privacy notices to name the new recommendation processor
  2. Update your Art.30 records of processing activities and retire the Personalize entries from your transfer impact assessment
  3. Delete the Personalize Dataset Group, event trackers, and Solution Versions so no training data or model artifacts remain under US jurisdiction

Compliance Checklist for Recommendation Engine Deployments

Before deploying any recommendation engine that processes behavioral data:

  1. Document a lawful basis for behavioral profiling (Art.6), and assess whether Art.22 applies to the decisions the engine produces
  2. Provide Art.13/14 disclosure covering the data used, the logic of the recommendations, and retention periods
  3. Implement an Art.17 erasure workflow that covers both dataset records and trained model artifacts
  4. Implement Art.21 objection handling that suppresses the user from both inference and future training
  5. Verify Art.5(1)(b) purpose compatibility for every upstream data source feeding the training set
  6. Complete a transfer impact assessment covering training data, model artifacts, and the real-time inference path

Conclusion

AWS Personalize is an effective managed recommendation engine, but it processes behavioral profiling data — interaction histories, user profiles, real-time event streams — under US jurisdiction via the CLOUD Act. For European applications subject to GDPR, this creates structural compliance problems that regional endpoint selection cannot resolve.

The GDPR issues are not limited to the standard data residency question. Art.22 automated decision-making obligations, Art.17 erasure-from-trained-models problems, Art.5(1)(b) purpose limitation for cross-service behavioral aggregation, and the real-time inference path through US infrastructure all require specific analysis and remediation.

EU-native alternatives — Recombee (Czech Republic), Strands (Spain), or self-hosted models deployed on EU-native PaaS — provide equivalent recommendation functionality without the US jurisdiction exposure. Migration is operationally achievable: export your interaction data, train a replacement model on EU-controlled infrastructure, and redeploy on an EU-native platform.

For other services in the AWS stack with the same structural CLOUD Act problem, see our analysis of AWS RDS, AWS S3, AWS Lambda, AWS Comprehend Medical, and AWS SES.

EU-Native Hosting

Ready to move to EU-sovereign infrastructure?

sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.