AWS Lex EU Alternative 2026: Conversational AI, Psychographic Profiling, and the GDPR Problem
Post #737 in the sota.io EU Compliance Series
AWS Lex is Amazon's managed conversational AI service. It powers chatbots and voice assistants by combining automatic speech recognition (ASR) with natural language understanding (NLU) to interpret user intent and drive dialogue flows. Teams building customer support bots, HR assistants, healthcare intake forms, legal FAQs, or internal knowledge bases are drawn to Lex for its tight integration with Lambda, Connect, and other AWS services — and its ability to handle both text and voice interactions with minimal speech engineering expertise.
That convenience conceals a GDPR exposure that is structurally different from most cloud AI services, and considerably more serious. A cloud object storage service receives files. A cloud TTS service receives text. But a conversational AI service receives the content of human conversations — the questions users ask, the answers they give, the concerns they express, the medical symptoms they describe, the legal problems they disclose. Conversation logs are behavioural records. In aggregate, they enable inference about the speaker that far exceeds the content of any individual message.
AWS Lex is operated by Amazon Web Services, Inc., a Delaware corporation headquartered in Seattle, Washington. The CLOUD Act (18 U.S.C. § 2713) allows US federal agencies to compel production of data controlled by US cloud providers regardless of the AWS region selected. Choosing eu-central-1 or eu-west-1 for Lex does not change the jurisdiction analysis — it changes the physical location of the servers, not the legal reach of US law.
This guide covers six GDPR exposure points that European teams must understand before routing user conversations through the Lex API.
What AWS Lex Actually Does
At its simplest, Lex exposes a RecognizeText or RecognizeUtterance API. Callers submit a user message — either as text or as audio — and Lex returns the detected intent, extracted slot values, and any prompt the bot should deliver next. The session state tracks conversation context across turns: which intent the user is pursuing, what slots have been filled, and what follow-up questions remain.
Beyond basic intent classification, Lex supports several features that significantly expand the data surface. Custom intents are trained on example utterances submitted by developers — utterances that frequently mirror the language real users employ when discussing sensitive topics. Slot types can be configured to extract specific data types from utterances: dates, numbers, names, addresses, medical codes, or free-form text. Voice input via the RecognizeUtterance endpoint submits audio to AWS for speech-to-text conversion before intent classification — adding biometric processing to the pipeline. Session attributes carry arbitrary key-value pairs across conversation turns, which developers routinely use to store user identifiers, authentication tokens, and contextual state.
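To make the data surface concrete, here is a sketch of the request shape a caller assembles for a RecognizeText-style turn. Field names follow the Lex V2 runtime API; the session values are hypothetical examples of the kind of state developers commonly attach — the point is that the full attribute map is retransmitted on every turn.

```python
# Sketch of a RecognizeText-style request body (Lex V2 runtime field
# names; the session attribute values are hypothetical examples).

def build_recognize_text_request(user_message: str, session_attrs: dict) -> dict:
    """Assemble a RecognizeText-style request body.

    Every turn re-transmits the full session attribute map, so any
    identifier placed there rides along on every API call — and into
    the conversation logs, if logging is enabled.
    """
    return {
        "text": user_message,
        "sessionState": {
            "sessionAttributes": session_attrs,
        },
    }

request = build_recognize_text_request(
    "What are my pregnancy leave entitlements?",
    {
        "userId": "emp-4411",              # direct identifier
        "authToken": "opaque-credential",  # credential, logged if logging is on
        "caseRef": "HR-2026-0097",         # links the chat to an HR case file
    },
)

# The utterance plus the attributes together form the loggable record.
assert "userId" in request["sessionState"]["sessionAttributes"]
```

Note that the sensitive content here is not only the utterance itself: the attribute map binds that utterance to a named individual and a backend record.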
Lex integrates natively with Amazon Connect for call centre deployments and with Amazon Kendra for retrieval-augmented dialogue. Both integrations extend the data surfaces and jurisdictions involved.
Exposure Point 1: Conversation Logs as Psychographic Records Under Article 9
The most legally significant risk in Lex is one that emerges over time rather than in any single API call: the accumulation of conversation logs that reveal special-category personal data through inference.
GDPR Article 9(1) protects health data, data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, and data concerning a natural person's sex life or sexual orientation. None of these categories requires the data subject to have explicitly disclosed them: Recital 51 makes clear that such data are, by their nature, particularly sensitive in relation to fundamental rights and freedoms, and merit specific protection because the context of their processing can create significant risks.
A series of apparently innocuous chatbot interactions can constitute Art.9 data in aggregate:
- A user who repeatedly asks a health chatbot about "managing stress," "sleep problems," and "feeling overwhelmed" is disclosing mental health indicators.
- A user who asks a legal chatbot about "asylum procedures," "refugee status," and "country of origin documentation" may be disclosing ethnic origin and political status.
- A user who asks an HR chatbot about "pregnancy leave entitlements," "maternity policies," and "part-time arrangements" is disclosing data about reproductive status.
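The aggregation effect can be illustrated with a deliberately crude sketch — hypothetical keyword lists, not a real classifier. Individually innocuous utterances, accumulated per user, start to look like an Article 9 health profile:

```python
# Illustrative only: hypothetical keyword matching, not real profiling
# technology. The point is the aggregation, not the detection method.
from collections import defaultdict

HEALTH_TERMS = {"stress", "sleep", "overwhelmed", "anxiety", "medication"}

def health_signals(utterance: str) -> set:
    """Return the health-related terms present in one utterance."""
    words = {w.strip(".,?!").lower() for w in utterance.split()}
    return words & HEALTH_TERMS

def aggregate_profile(log: list[tuple[str, str]]) -> dict:
    """Accumulate signals across a log of (user_id, utterance) pairs."""
    profile = defaultdict(set)
    for user_id, utterance in log:
        profile[user_id] |= health_signals(utterance)
    return dict(profile)

log = [
    ("u-1", "any tips for managing stress at work?"),
    ("u-1", "I have sleep problems lately"),
    ("u-1", "feeling overwhelmed by deadlines"),
]
# One message reveals little; three together are a mental-health indicator.
assert aggregate_profile(log)["u-1"] == {"stress", "sleep", "overwhelmed"}
```

Anyone holding the raw logs can run far more capable inference than this; the sketch only shows how little machinery the aggregation step requires.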
When conversation logging is enabled, AWS Lex writes session data — intents, utterances, slot values, and session attributes — to CloudWatch Logs, with audio logs optionally written to S3. These logs, over time, constitute psychographic profiles: records of what users are concerned about, what decisions they are considering, and what personal circumstances they are navigating. The European Data Protection Board has consistently held that behavioural profiling at scale constitutes high-risk processing requiring a DPIA under Article 35.
The structural problem: When these conversation logs sit in AWS CloudWatch under CLOUD Act jurisdiction, the profiling database that emerges is accessible to US law enforcement without prior judicial authorisation and without notification to the data subjects whose conversations created it.
Exposure Point 2: Voice Biometrics in the RecognizeUtterance Pipeline
When AWS Lex is used with voice input — whether via the RecognizeUtterance API directly or via Amazon Connect integration — the audio data submitted to AWS is biometric data under GDPR Article 4(14).
Voice recordings capture the speaker's unique physiological and behavioural characteristics: fundamental frequency, formant patterns, prosodic timing, and articulatory dynamics. These characteristics enable identification of the speaker with high confidence using voice recognition technology that is widely available. GDPR Article 9(1) designates biometric data processed for the purpose of uniquely identifying natural persons as a special category requiring an explicit Article 9(2) legal basis.
The voice data flow in a Lex voice deployment is: user speaks → audio captured → audio submitted to AWS ASR → transcript returned to Lex NLU → intent classified. At minimum, the audio is processed by AWS infrastructure. Whether AWS retains audio beyond the transcription step is not fully documented. What is documented is that Amazon Connect, which routes voice calls to Lex, offers call recording as a standard feature — meaning that in many deployments, audio is explicitly retained in S3.
The Art.9(2) basis question: Processing biometric data for the purpose of uniquely identifying natural persons requires an explicit legal basis in Article 9(2). Most organisations deploying voice chatbots have not evaluated whether they have a valid Art.9(2) basis for the biometric processing that occurs when a user speaks to a Lex-powered assistant. If the legal basis is explicit consent under Art.9(2)(a), the consent must be specific, informed, and freely given — and must cover the transfer of voice data to AWS for processing.
Exposure Point 3: Custom Intent Training Data and Article 17 Erasure Gaps
AWS Lex allows developers to improve intent classification by providing example utterances — sample phrases that real users have submitted to the bot. Best practice for training NLU models is to use real-world utterances rather than synthetic examples, because real utterances better reflect the vocabulary, spelling variations, and phrasing that actual users employ. In practice, this means that production conversation logs become training data.
The feedback loop creates an Article 17 problem. When a user exercises their right to erasure under Article 17 GDPR, the data controller must erase all personal data relating to that individual. If the user's historical utterances have been incorporated into training data for the Lex custom intent model, erasure of the raw conversation log does not erase the influence of those utterances on the model weights. The trained model is not simply a copy of the training data — it is a transformation — but European data protection authorities have increasingly taken the position that models trained on personal data carry the obligation to demonstrate that erasure is effective, not merely that the source records have been deleted.
Lex's model versioning compounds this: each time a bot version is published and deployed, a snapshot of the trained model is persisted. Versions are not automatically deleted when training data is deleted. An organisation that erases a user's conversation records may still have Lex bot versions that were trained on those records.
The accountability gap: Article 5(2) requires that data controllers be able to demonstrate compliance with GDPR principles ("accountability"). For organisations using Lex custom intents trained on user data, demonstrating that Art.17 erasure is complete — covering both raw logs and trained model influence — is currently technically intractable without retraining or retiring the affected bot versions.
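One practical mitigation is to track which published bot versions postdate a given user's utterances entering the training set. The helper below is hypothetical (the field names are illustrative, not the Lex API), but the logic is straightforward:

```python
# Hypothetical helper: given the date a user's utterances entered the
# training data and the build date of each bot version, list versions
# whose trained model may still encode the erased data.
from datetime import date

def versions_needing_retrain(user_data_added: date,
                             versions: dict[str, date]) -> list[str]:
    """Versions trained on or after the date the user's data joined the
    training set must be retrained or retired for Art. 17 erasure to be
    demonstrably complete."""
    return sorted(v for v, built in versions.items() if built >= user_data_added)

versions = {
    "v1": date(2025, 3, 1),   # predates the user's data: unaffected
    "v2": date(2025, 9, 15),  # trained after: carries the influence
    "v3": date(2026, 1, 10),
}
assert versions_needing_retrain(date(2025, 6, 1), versions) == ["v2", "v3"]
```

This presupposes that the mapping from users to training snapshots has been recorded at all — which is itself an accountability measure most Lex deployments lack.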
Exposure Point 4: Session Attributes as Persistent User State Under CLOUD Act
AWS Lex session attributes are key-value pairs that persist across conversation turns within a session. They are commonly used to carry authentication tokens, user identifiers, preference settings, and contextual state that the Lambda integration functions need to personalise responses. In more sophisticated deployments, session attributes carry aggregated user profile data fetched from backend systems — purchase history, service tier, case status, health record identifiers.
Session attributes are transmitted in every API call to Lex during a session. They appear in conversation logs when logging is enabled. In the architecture pattern where Lex is the front end and Lambda functions are the business logic layer, session attributes are the channel through which personal data flows through the entire conversational pipeline.
The CLOUD Act scope: The CLOUD Act's compelled production authority covers "data" stored or processed by a US provider. Session attributes in active Lex sessions and in conversation logs represent personal data under this authority. In a healthcare deployment where session attributes carry patient case numbers or authentication tokens linked to health records, a CLOUD Act request targeting a specific individual could expose session log data that provides a roadmap to their health record access patterns — without requiring production of the records themselves.
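The simplest mitigation is allowlist-based minimisation of session state before any request leaves your network. A minimal sketch, with hypothetical key names:

```python
# Minimisation sketch (allowlist approach, hypothetical key names):
# strip everything from the session attribute map except keys the
# dialogue flow actually needs.

ALLOWED_KEYS = {"locale", "serviceTier"}  # non-identifying state only

def minimise_session_attributes(attrs: dict) -> dict:
    """Drop identifiers, tokens, and record references from session state."""
    return {k: v for k, v in attrs.items() if k in ALLOWED_KEYS}

raw = {
    "locale": "de_DE",
    "serviceTier": "premium",
    "patientCaseRef": "PC-88231",   # health record pointer: must not transit
    "authToken": "opaque-credential",
}
assert minimise_session_attributes(raw) == {
    "locale": "de_DE", "serviceTier": "premium",
}
```

An allowlist is preferable to a blocklist here: new attribute keys added by developers default to being stripped rather than transmitted.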
Exposure Point 5: Cross-Service Purpose Limitation Failures with Kendra and Connect
AWS Lex's native integrations with Amazon Kendra (retrieval-augmented responses) and Amazon Connect (voice routing) create cross-service data flows that multiply the jurisdiction exposure and introduce purpose limitation risks under GDPR Article 5(1)(b).
Lex + Kendra: When Lex uses Kendra to retrieve answers from a document corpus, the user's query is submitted to Kendra for semantic search. If the document corpus contains personal data — as HR knowledge bases and legal FAQ systems commonly do — the combination of query (from conversation log) and retrieved document (from Kendra) reveals what personal information the user was seeking. This is a purpose expansion: the conversation was collected for intent classification, and it is now being used to build a retrieval-augmented profile of user information-seeking behaviour.
Lex + Connect: Amazon Connect routes phone calls through Lex for IVR (Interactive Voice Response) flows. In a Connect deployment, caller telephone numbers are captured by Connect, and conversation audio is captured as a voice stream processed by Lex ASR. Telephone numbers are personal data. Audio is biometric data. The Connect-Lex integration routinely processes both simultaneously without explicit data minimisation controls.
Article 5(1)(b) compatibility test: Subsequent processing of personal data for a purpose compatible with the original collection purpose is permitted under GDPR. Processing conversation logs (collected for intent classification) to train retrieval models (a different purpose) fails the Art.5(1)(b) compatibility test unless the data controller can demonstrate that the purposes are compatible based on the factors in Art.6(4). Most organisations have not conducted this analysis.
Exposure Point 6: CLOUD Act on Conversation Logs in Customer-Facing Deployments
The CLOUD Act's practical impact is most severe for organisations that use Lex in customer-facing roles — customer support bots, onboarding assistants, compliance FAQs, patient intake forms. In these deployments, every user interaction generates a conversation record that includes the user's identity (directly or via session state), their stated problem or question, and the resolution or guidance provided.
For a CLOUD Act production request targeting a specific individual, the conversation log is a high-value record: it reveals what problems the person was experiencing, what information they sought, and in what context they were seeking it. In a legal context, this could reveal pending litigation strategy. In a healthcare context, it reveals conditions the patient was researching. In an HR context, it reveals what employment concerns an employee was investigating before taking formal action.
The Art.13/14 transparency gap: GDPR Articles 13 and 14 require that data subjects are informed of data transfers to third countries and the safeguards in place. When a customer service chatbot is powered by AWS Lex, the user is rarely informed that their conversation is being processed by a US cloud provider subject to CLOUD Act jurisdiction. Privacy notices that disclose "third-party service providers" without specifying AWS's US-controlled infrastructure do not satisfy the specificity requirement of Art.13(1)(f) and Art.14(1)(f).
EU-Native Conversational AI Alternatives
Rasa
Rasa is an open-source conversational AI framework with deep roots in the European tech ecosystem. Rasa NLU and Rasa Core — now unified as Rasa Open Source — were developed by Rasa Technologies GmbH, a German company, before the company pivoted to focus on the enterprise Rasa Pro product. The open-source framework remains Apache 2.0-licensed and actively maintained.
Rasa uses a configurable pipeline architecture for NLU — intents, entities, and responses are defined in YAML files, and models are trained locally on developer hardware or self-hosted infrastructure. Pipelines can combine spaCy featurisation and scikit-learn classifiers with Rasa's own transformer-based DIET classifier, or transformer models via HuggingFace. All training, inference, and conversation logging runs on infrastructure you control.
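For orientation, a minimal pipeline configuration sketch — component names follow Rasa Open Source 3.x conventions, and the values should be tuned for your language and training data:

```yaml
# config.yml — minimal Rasa NLU pipeline sketch (Rasa Open Source 3.x)
language: de
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier        # joint intent classification + entity extraction
    epochs: 100
policies:
  - name: RulePolicy
```

Nothing in this pipeline calls an external API: `rasa train` and `rasa run` operate entirely on the host where they execute.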
Deploying Rasa on sota.io means the entire conversational pipeline — intent classification, entity extraction, dialogue management, and conversation logging — runs within EU jurisdiction. There is no US data transfer and no CLOUD Act exposure.
Botpress
Botpress is an open-source chatbot platform developed by Botpress, Inc., a Canadian company. The self-hosted Community Edition is open-source and operates entirely on infrastructure you control. Botpress provides a visual flow builder, built-in NLU based on fastText and a custom intent classifier, and a multi-channel connector framework supporting web, Messenger, Slack, and Telegram.
Botpress's architecture stores conversation logs, user sessions, and trained NLU models in a PostgreSQL database that you manage. When deployed on EU infrastructure, all data remains within your jurisdiction. The NLU training pipeline does not submit data to any external API — it runs locally within the Botpress runtime.
For European enterprises migrating from Lex, Botpress offers a familiar flow concept (nodes, transitions, actions) that reduces migration friction from Lex's intent-and-slot model. Self-hosted deployment on EU infrastructure eliminates US jurisdiction exposure by construction.
Snips NLU (EU-Origin, Sonos Acquisition)
Snips was a French AI startup founded in Paris that built on-device voice AI with privacy as a core design principle. Snips NLU — their natural language understanding library — was acquired along with the company by Sonos in 2019, but the open-source library snips-nlu (Apache 2.0) remains available, though active development has largely ceased since the acquisition.
Snips NLU is a pure NLU library — not a full conversational framework — but it provides strong intent classification and entity extraction that runs entirely locally. For teams that need the NLU component without a full dialogue manager, Snips NLU provides EU-origin, self-hosted NLU that requires no external API calls. It supports English, German, French, Spanish, Italian, Portuguese, and Japanese.
The on-device design philosophy that defined Snips — process locally, never transmit — maps directly to GDPR data minimisation requirements. Snips NLU is a strong choice for organisations that need high-quality NLU on healthcare or legal data where any cloud transmission is unacceptable.
OpenDialog
OpenDialog is a UK-based open-source conversational AI platform. The UK is no longer EU jurisdiction but holds an EU adequacy decision, and the platform is self-hosted, so conversation data stays wherever you deploy it. It provides a graph-based dialogue management engine, an NLU integration layer, and a conversation logging framework. OpenDialog is designed for enterprise deployments where conversation flows are complex, compliance requirements are strict, and auditability of dialogue decisions is required.
OpenDialog's design explicitly targets regulated industries — healthcare, financial services, legal — where the data handled in conversations is inherently sensitive. The platform supports integration with various NLU backends, including Rasa and custom models, giving operators full control over the NLU processing tier.
DeepPavlov (MIPT, Self-Hosted)
DeepPavlov is an open-source conversational AI framework developed at the Moscow Institute of Physics and Technology. Its origin is Russian, not EU, but the framework is open-source (Apache 2.0) and fully self-hosted, so no conversation data leaves your infrastructure and no US provider is involved. For organisations whose primary objective is eliminating CLOUD Act exposure, self-hosted DeepPavlov running on EU infrastructure achieves the jurisdictional goal.
DeepPavlov supports Russian, English, and Chinese NLP pipelines with strong entity extraction and dialogue management capabilities. For German-language chatbots, the framework requires more configuration than Rasa or Botpress but offers strong underlying NLP capabilities.
Hosting Conversational AI on EU Infrastructure
Running any of these self-hosted conversational AI frameworks on sota.io keeps the entire pipeline — model inference, conversation logging, session state, training data — within EU jurisdiction. The GDPR obligations remain, but the CLOUD Act exposure is eliminated by design.
For production deployments handling Article 9 special-category data in conversational AI, the minimum viable compliance architecture includes:
- Self-hosted NLU: Deploy Rasa, Botpress, or Snips NLU on EU infrastructure — no data leaves your deployment for intent classification or entity extraction.
- EU-resident conversation logging: Store conversation logs in a PostgreSQL or MongoDB instance on EU infrastructure, not in CloudWatch or any AWS service.
- Retention policy enforcement: Configure automatic log deletion at the maximum retention period required for your use case. For chatbots handling health queries, this may be as short as the session duration.
- DPIA for psychographic profiling: If your chatbot accumulates conversation history across sessions per user, conduct a Data Protection Impact Assessment under GDPR Article 35. Psychographic profiling from conversation logs is explicitly high-risk processing.
- Art.9(2) basis documentation: If your chatbot handles health, legal, or other special-category topics, document the explicit legal basis for each special-category processing activity before deployment.
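The retention-enforcement step above can be sketched as a simple purge pass over your EU-resident log store. The record shape and category names here are hypothetical; adapt them to your schema:

```python
# Retention sketch: identify conversation records older than the
# configured maximum per category (record fields are hypothetical).
from datetime import datetime, timedelta, timezone

RETENTION = {
    "health": timedelta(0),          # expire immediately after the session
    "support": timedelta(days=30),   # example support-log retention window
}

def expired(records: list[dict], now: datetime) -> list[str]:
    """Return the ids of records past their category's retention window."""
    return [
        r["id"] for r in records
        if now - r["created"] > RETENTION[r["category"]]
    ]

now = datetime(2026, 2, 1, tzinfo=timezone.utc)
records = [
    {"id": "a", "category": "health", "created": now - timedelta(hours=1)},
    {"id": "b", "category": "support", "created": now - timedelta(days=45)},
    {"id": "c", "category": "support", "created": now - timedelta(days=5)},
]
assert expired(records, now) == ["a", "b"]
```

In production this would run as a scheduled job against the database, with the deletion itself logged for Article 5(2) accountability.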
Conclusion
AWS Lex offers genuine developer productivity — managed infrastructure, tight AWS service integration, and voice processing without speech engineering expertise. For European organisations processing sensitive user conversations, that productivity comes with a structural GDPR problem that is not solvable through configuration or contractual addenda: conversation logs accumulated under CLOUD Act jurisdiction become a psychographic database accessible to US law enforcement, and voice input adds Article 9 biometric processing to every spoken interaction.
Rasa, Botpress, and Snips NLU are production-ready alternatives that run entirely on infrastructure you control. Deployed on sota.io — EU-native, GDPR-aligned managed infrastructure — they provide the same conversational AI capabilities as AWS Lex with the jurisdictional guarantee that your users' conversations remain under EU data protection law.
sota.io is EU-native managed infrastructure for teams that cannot afford GDPR surprises in their AI stack. Get started free.
EU-Native Hosting
Ready to move to EU-sovereign infrastructure?
sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.