2026-05-02 · 12 min read

AWS Neptune EU Alternative 2026: Graph Database, Social Graph Inference, and the CLOUD Act Problem

Post #778 in the sota.io EU Compliance Series

AWS Neptune is Amazon's fully managed graph database service, supporting both the Property Graph model (via Apache TinkerPop Gremlin and openCypher) and the RDF model (via SPARQL). Neptune powers knowledge graphs, fraud detection networks, social graphs, recommendation engines, and identity resolution systems. For teams modelling complex, highly-connected data on AWS, Neptune is often the default choice.

AWS operates Neptune in several EU regions, including Frankfurt (eu-central-1) and Stockholm (eu-north-1). It provides the standard AWS GDPR Data Processing Addendum and Standard Contractual Clauses. Many architects treat this configuration as compliant.

The problem is structural, and graph databases are more exposed to it than most other data stores. Amazon Web Services, Inc. is incorporated in Delaware and is a wholly-owned subsidiary of Amazon.com, Inc., a US corporation headquartered in Seattle, Washington. The CLOUD Act (18 U.S.C. § 2713) requires providers subject to US jurisdiction to disclose data in their possession, custody, or control in response to valid US legal process, regardless of where the servers sit. A valid order can compel disclosure of your entire Neptune graph (nodes, edges, properties, inference outputs, and analytics results) without involving a European court, without advance notice, and potentially under a gag order.

But Neptune's GDPR risk is not merely jurisdictional. Graph databases are uniquely capable of generating new personal data through inference — revealing religion, political affiliation, sexual orientation, health conditions, and financial distress from relationship patterns alone. This creates a category of GDPR exposure that most DPIAs fail to assess: the inference surface of your graph schema.

This post analyses Neptune's GDPR and CLOUD Act exposure across its full service stack and maps the best EU-native managed graph database alternatives for 2026.


What Neptune Actually Stores — and What It Infers

Neptune's GDPR exposure surface extends far beyond the data you deliberately load into the graph. It includes data your graph schema generates through traversal.

Vertices (nodes): The entities in your graph — users, products, organisations, locations, concepts. Each vertex can carry arbitrary property values. In a social or identity graph, vertex properties typically include: email addresses, display names, account creation timestamps, authentication tokens, device identifiers, geolocation history, and behavioural signals (last seen, session counts, preference flags). All of these are personal data under GDPR Art.4(1).

Edges (relationships): The connections between entities. In graph modelling, edges carry semantic meaning: FOLLOWS, PURCHASED, LIVES_AT, WORKS_FOR, CONNECTED_TO, FLAGGED_AS. Edge properties can include timestamps, confidence scores, interaction weights, and metadata. The edge data structure is where graph databases diverge most significantly from relational databases for GDPR purposes.

Inferred personal data, the critical GDPR gap: A graph database does not merely store the data you insert. It creates conditions for inferring new personal data through traversal. Under GDPR Art.4(1), "personal data" means "any information relating to an identified or identifiable natural person." The Article 29 Working Party guidelines on automated individual decision-making and profiling, endorsed by the European Data Protection Board, treat inferred data (data derived from processing other personal data) as falling within this definition.

Concretely: a Neptune graph that stores user-to-user FOLLOWS relationships can infer, through graph traversal, the likely religious community, political affiliation, or sexual orientation of any user — from their community membership alone, without any explicit self-reported attribute. This is Art.9 special category data, generated by your graph, without the data subject having provided it.
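To make the mechanism concrete, here is a minimal sketch of neighbour-based attribute inference in plain Python. It is not Neptune code: the account names, the neutral categories `group_a` and `group_b`, and the `min_share` threshold are all hypothetical, chosen only to show that a category can be assigned to a user who never declared one.

```python
from collections import Counter

# Toy FOLLOWS data (all names hypothetical): user -> accounts they follow.
follows = {
    "alice": ["community_a_news", "community_a_events", "cooking_tips"],
    "bob": ["community_b_forum", "cooking_tips"],
    "carol": ["community_a_news", "community_b_forum", "community_a_events"],
}

# Categories attached to some followed accounts, e.g. by an analyst or classifier.
account_category = {
    "community_a_news": "group_a",
    "community_a_events": "group_a",
    "community_b_forum": "group_b",
}


def infer_category(user, min_share=0.5):
    """Return the dominant category among a user's labelled follows, if any."""
    labels = [account_category[a] for a in follows.get(user, []) if a in account_category]
    if not labels:
        return None
    category, count = Counter(labels).most_common(1)[0]
    # Only report an inference when one category clearly dominates.
    return category if count / len(labels) > min_share else None


# None of these users stated a category; each one is derived from edges alone.
inferred = {user: infer_category(user) for user in follows}
print(inferred)
```

If `group_a` stood for a religious or political community, the output would be special category data that exists only because the edges were stored.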

Neptune Analytics: AWS's newer graph analytics layer adds vector search and machine learning capabilities to Neptune. Neptune Analytics can run PageRank, community detection (Label Propagation Algorithm, Louvain method), and similarity searches across your graph. These algorithms amplify the inference surface: community detection specifically identifies clusters that correlate with demographic and behavioural categories. Running community detection on a social graph is, for GDPR purposes, profiling as defined in Art.4(4), and it falls under Art.22 once its output drives decisions with legal or similarly significant effects.
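The following sketch shows the idea behind label propagation on a toy graph. It is a simplified, deterministic variant written for illustration (fixed visiting order, ties broken by the largest label), not the implementation Neptune Analytics uses; the user IDs are made up.

```python
from collections import Counter

# Toy undirected social graph: two tight triangles joined by one bridge edge.
edges = [
    ("u1", "u2"), ("u1", "u3"), ("u2", "u3"),
    ("u3", "u4"),
    ("u4", "u5"), ("u4", "u6"), ("u5", "u6"),
]


def label_propagation(edges, max_rounds=20):
    """Assign each node the most common label among its neighbours until stable."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = {node: node for node in neighbours}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps the sketch deterministic
            counts = Counter(labels[n] for n in neighbours[node])
            top = max(counts.values())
            best = max(label for label, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


communities = label_propagation(edges)
print(communities)
```

No attribute of any user is consulted: the cluster assignment comes from topology alone, which is why the output counts as derived personal data.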

Neptune Global Database: Like DynamoDB Global Tables, Neptune Global Database allows multi-region replication. Adding a US replica to a Neptune cluster containing EU personal data is an Art.44 international transfer. Neptune does not prevent you from doing this.
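Because the service will not stop you, a pre-deployment check is worth automating. The sketch below flags global-database members whose ARN names a region outside a hand-maintained allow-list. The ARNs are made up, and the region set is an assumption to verify against the current AWS region list (it deliberately omits London and Zurich, which carry an `eu-` prefix but are not in EU member states).

```python
# Regions located in EU member states (assumption: verify against the current AWS list).
EU_MEMBER_STATE_REGIONS = {
    "eu-central-1",  # Frankfurt
    "eu-west-1",     # Ireland
    "eu-west-3",     # Paris
    "eu-north-1",    # Stockholm
    "eu-south-1",    # Milan
    "eu-south-2",    # Spain
}


def region_of(arn):
    """Extract the region from an ARN of the form arn:partition:service:region:account:resource."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def members_outside_eu(member_arns):
    """Return the cluster ARNs whose region is not on the EU allow-list."""
    return [arn for arn in member_arns if region_of(arn) not in EU_MEMBER_STATE_REGIONS]


# Hypothetical member clusters of one global database.
members = [
    "arn:aws:rds:eu-central-1:111122223333:cluster:graph-primary",
    "arn:aws:rds:us-east-1:111122223333:cluster:graph-replica",
]
flagged = members_outside_eu(members)
print(flagged)
```

A check like this could run in CI against the infrastructure definition, so that a non-EU replica fails review before it is created.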


GDPR Art.9: The Graph Inference Risk

Graph databases present a specific and underanalysed GDPR Art.9 exposure that affects organisations whose graphs contain relationship data about natural persons.

GDPR Art.9 prohibits processing "special categories of personal data" — which include data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, health data, and data concerning sexual orientation — without explicit consent or another Art.9(2) basis.

The inference problem is this: you may be processing Art.9 data in your Neptune cluster without having inserted any Art.9 data. The relationships in your graph generate Art.9 inferences as a mathematical consequence of graph topology.

The canonical examples from academic and regulatory literature:

The Article 29 Working Party's guidelines on automated individual decision-making and profiling (WP251rev.01, endorsed by the EDPB) note that profiling can create special category data by inference from data that is not special category data in its own right. This applies directly to graph inference. Your Neptune DPIA must assess not just what data you insert, but what data your graph queries and ML models can derive.

The DPIA gap in practice: Most DPIAs assess the data inventory — what personal data fields exist — and the transfer mechanism — do we use SCCs. Fewer DPIAs assess the inference surface — what new personal data does our graph topology generate. This gap is where Neptune creates hidden Art.9 exposure.
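One way to start on the inference surface is to inventory edge labels and flag those that can support special-category inference. The sketch below does this over a plain list of labels. The risk mapping is hypothetical and illustrative only; a real one would come out of your own schema review.

```python
from collections import Counter

# Hypothetical mapping from edge labels to what they might reveal.
INFERENCE_RISK = {
    "FOLLOWS": "community-based inference (religion, politics, orientation)",
    "MEMBER_OF": "religious or political affiliation",
    "BOOKED_APPOINTMENT": "health",
}


def inference_surface(edge_labels):
    """Count edges per label and keep only the labels flagged as inference-prone."""
    counts = Counter(edge_labels)
    return {
        label: {"edges": count, "may_reveal": INFERENCE_RISK[label]}
        for label, count in counts.items()
        if label in INFERENCE_RISK
    }


# Edge labels as they might appear in an export of the graph.
labels_in_graph = ["FOLLOWS", "MEMBER_OF", "MEMBER_OF", "PURCHASED", "BOOKED_APPOINTMENT"]
report = inference_surface(labels_in_graph)
for label, details in sorted(report.items()):
    print(label, details)
```

The resulting table gives the DPIA a concrete list of edge types to assess, alongside the usual field inventory.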


The CLOUD Act Exposure Pathway

The legal mechanism is 18 U.S.C. § 2713. AWS Inc. (Delaware) is the legal entity that operates Neptune. A CLOUD Act order directed at AWS requires compliance regardless of where the Neptune cluster physically runs.

For graph databases, the CLOUD Act exposure surface is larger than for relational databases because of what a graph reveals about relationships. A relational database exposes the records for specific identified users. A graph database exposes not just those records but the entire network of relationships — the social graph of everyone connected to the subject, their communities, their influence metrics, and the inferences the graph supports.

A CLOUD Act order targeting "all data related to user ID X" in a Neptune cluster can legally reach:

This neighbourhood-expansion property of graph queries is why law enforcement agencies find graph databases particularly valuable. It is also why CLOUD Act orders targeting graph databases can expose data about far more individuals than relational database orders for the same legal target.
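The expansion effect is easy to quantify with a breadth-first search. The sketch below counts how many other vertices fall within k hops of a target in a small made-up graph; the vertex names and adjacency data are hypothetical.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {vertex: distance} for every vertex within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


# Hypothetical social graph around one account of interest.
adjacency = {
    "target": ["a", "b"],
    "a": ["target", "c", "d"],
    "b": ["target", "e"],
    "c": ["a"],
    "d": ["a", "f"],
    "e": ["b"],
    "f": ["d"],
}

for hops in (1, 2, 3):
    others = len(within_hops(adjacency, "target", hops)) - 1
    print(f"{hops} hop(s): {others} other people reached")
```

Even in this seven-vertex toy, the number of people touched grows from two to six as the hop limit goes from one to three; in a real social graph the growth is far steeper.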

AWS's own language: AWS's GDPR FAQ states that AWS will "scrutinise" requests and notify customers "where legally permitted." Gag-order provisions under 18 U.S.C. § 2705(b) apply to Neptune as they do to every AWS service. AWS cannot guarantee notification.

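The SCC limitation: Standard Contractual Clauses under GDPR Art.46 bind only the contracting parties, not US authorities, so they cannot by themselves prevent disclosure under the CLOUD Act. The EDPB's Recommendations 01/2020 on supplementary measures, issued after Schrems II, require exporters to assess whether the recipient country's laws undermine the clauses and to add supplementary measures where they do. AWS's Supplementary Addendum acknowledges that CLOUD Act requests are binding under US law.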


GDPR Art.17: Erasure in Graph Databases

GDPR Art.17 grants data subjects the right to erasure, the "right to be forgotten." Implementing Art.17 in a relational database is comparatively straightforward: delete the rows that reference the person. Implementing Art.17 in a graph database is structurally different and, in some architectures, incomplete.

Node deletion vs. edge orphaning: When you delete a vertex (user node) from a Neptune graph, you must also delete all edges connected to that vertex. Neptune supports vertex deletion with automatic edge cleanup via the g.V(id).drop() Gremlin traversal. However, if your application has replicated edge data to other systems (S3 exports, Neptune Analytics snapshots, DynamoDB projections), those copies may retain relationship data that references the deleted user — not by vertex ID, but by properties (username, email hash, phone hash) that survive the Neptune deletion.
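The gap between graph deletion and downstream copies can be sketched with an in-memory stand-in for the graph. Everything here is hypothetical: the vertex IDs, the shape of the export rows, and the assumption that a downstream export keys people by a SHA-256 hash of their normalised email address.

```python
import hashlib


def email_hash(email):
    """Normalise and hash an email the way a downstream export might (assumed scheme)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def erase_vertex(vertices, edges, vertex_id):
    """Remove a vertex and every edge touching it; return the removed properties."""
    removed = vertices.pop(vertex_id, None)
    edges[:] = [e for e in edges if vertex_id not in (e["from"], e["to"])]
    return removed


def residual_rows(downstream_rows, removed_properties):
    """Find downstream rows that still identify the erased person by property value."""
    needle = email_hash(removed_properties["email"])
    return [row for row in downstream_rows if row.get("email_hash") == needle]


vertices = {
    "v1": {"email": "Ann@example.org"},
    "v2": {"email": "bo@example.org"},
    "v3": {"email": "cy@example.org"},
}
edges = [
    {"from": "v1", "to": "v2", "label": "FOLLOWS"},
    {"from": "v3", "to": "v1", "label": "FOLLOWS"},
    {"from": "v2", "to": "v3", "label": "FOLLOWS"},
]
# A made-up export that was written before the erasure request arrived.
export_rows = [
    {"email_hash": email_hash("ann@example.org"), "follower_count": 1},
    {"email_hash": email_hash("bo@example.org"), "follower_count": 1},
]

removed = erase_vertex(vertices, edges, "v1")
leftovers = residual_rows(export_rows, removed)
print(len(edges), "edge(s) left in the graph;", len(leftovers), "export row(s) still reference the subject")
```

The graph is clean after the cascade, yet the export still holds a row for the erased person. Finding such rows requires capturing the identifying properties before the vertex is dropped.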

Neptune Streams: Neptune Streams is Neptune's change-data-capture mechanism, a log of every mutation to the graph. Stream records are retained for 7 days by default. When you delete a vertex, the deletion appears in the Stream, but the Stream also still holds any earlier ADD records for that vertex written within the retention window, including the property values they set. A CLOUD Act order issued within 7 days of deletion can therefore reach recently written data for the deleted vertex through the Stream. Neptune Snapshots retain the deleted data for longer.

Neptune Snapshots: Neptune creates automated snapshots based on your backup retention period (up to 35 days). A deleted vertex is absent from the primary graph but present in all snapshots taken before the deletion. Art.17 compliance technically requires deleting these snapshots or cryptographically shredding the deleted records within the snapshots — which Neptune does not support natively. This is the same backup-erasure paradox documented for AWS DynamoDB and AWS RDS.
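The retention arithmetic can be written down directly. This sketch computes upper bounds on how long erased data may remain recoverable, assuming the 7-day stream retention and 35-day backup retention mentioned in this section and assuming no manual snapshots exist (those never expire on their own). The function and parameter names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def residual_exposure(deleted_at, stream_retention_days=7, backup_retention_days=35):
    """Upper bounds on when the last stream record and automated snapshot age out."""
    return {
        # A change record written just before deletion expires this long afterwards.
        "stream_until": deleted_at + timedelta(days=stream_retention_days),
        # A snapshot taken just before deletion expires this long afterwards.
        "snapshots_until": deleted_at + timedelta(days=backup_retention_days),
    }


deleted_at = datetime(2026, 5, 2, tzinfo=timezone.utc)
windows = residual_exposure(deleted_at)
for name, until in windows.items():
    print(name, until.date().isoformat())
```

Recording these dates next to each erasure request gives the DPO a concrete answer to the question of when the data is actually gone.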

Neptune Analytics snapshots: If you have run Neptune Analytics (graph algorithms, vector search) against your graph and stored the results — PageRank scores, community labels, embedding vectors — these outputs may encode personal data in derived form. Deleting the source vertex does not automatically delete the analytics outputs that were derived from it.

Practical compliance: Full Art.17 compliance in Neptune requires: (1) vertex and edge deletion with cascade, (2) Stream retention management, (3) snapshot rotation or cryptographic deletion, (4) analytics output invalidation, and (5) downstream system cleanup. Most Neptune deployments implement only step 1. The remaining steps require custom operational procedures that are rarely documented in DPIAs.
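The five steps above lend themselves to a small tracking structure, so that an erasure request is only closed when every step is done. The step identifiers and class name below are this sketch's own convention, not part of any AWS API.

```python
from dataclasses import dataclass, field

# The five steps listed above, in order (identifiers are illustrative).
ERASURE_STEPS = (
    "vertex_and_edge_deletion",
    "stream_retention_management",
    "snapshot_rotation_or_crypto_deletion",
    "analytics_output_invalidation",
    "downstream_cleanup",
)


@dataclass
class ErasureRequest:
    subject_id: str
    completed: set = field(default_factory=set)

    def mark_done(self, step):
        """Record a finished step, rejecting names that are not in the runbook."""
        if step not in ERASURE_STEPS:
            raise ValueError(f"unknown step: {step!r}")
        self.completed.add(step)

    def outstanding(self):
        """Return the steps still open, in runbook order."""
        return [step for step in ERASURE_STEPS if step not in self.completed]


request = ErasureRequest(subject_id="user-123")
request.mark_done("vertex_and_edge_deletion")
print(request.outstanding())
```

A deployment that stops after the first step would show four open items here, which is the situation described above.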


GDPR Art.22: Automated Decision-Making and Graph Traversal

GDPR Art.22 gives data subjects the right not to be subject to "a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her."

Graph databases are profiling engines. Their primary function is to traverse relationship networks and derive insights — recommendations, rankings, fraud scores, risk assessments, community memberships. When Neptune is used to:

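...these are profiling operations that fall under Art.22 when they drive decisions with legal or similarly significant effects. Art.22(3) then requires safeguards including the right to obtain human intervention, to express a point of view, and to contest the decision; Recital 71 adds the right to an explanation of the decision reached.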

The challenge for Neptune architectures is that graph traversal algorithms are not inherently explainable. A PageRank score or a Label Propagation community assignment does not have a single traceable cause — it is an emergent property of the graph topology. This creates structural tension with Art.22's explainability requirement.
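A small PageRank computation illustrates why a score has no single cause. This is a textbook power-iteration sketch in plain Python, not Neptune Analytics code, and the four-vertex graph is made up: adding one edge changes the scores of vertices that the edge does not touch.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            targets = out_links[src] or nodes  # dangling nodes spread rank evenly
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
before = pagerank(edges)
after = pagerank(edges + [("a", "d")])  # one new edge from a to d

changed = [n for n in before if abs(before[n] - after[n]) > 1e-6]
print("scores that moved after adding a->d:", changed)
```

An explanation of one person's score therefore has to account for edges between other people, which is what makes a per-subject explanation difficult to give.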

Neptune Analytics specifically: Neptune Analytics supports ML-based graph processing — embedding generation, similarity search, and predictive analytics. ML-derived graph scores are even less explainable than purely structural graph metrics. A Neptune Analytics fraud score computed from graph embeddings cannot be explained in the natural-language terms Art.22 implicitly requires.

What architects should do: If Neptune is used for any automated decision-making, your DPIA must document (a) which graph algorithms produce decisions, (b) the Art.22(2) basis for each (explicit consent, contract necessity, or authorisation by EU or Member State law), (c) how subjects can request human review, and (d) how the system supports explanation and challenge. Most Neptune deployments do not document this at the algorithm level.
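Documentation at the algorithm level can be kept as a simple register that is checked automatically. The field names, basis tags, and example entries below are hypothetical; the sketch only shows the shape of such a check.

```python
# Art.22(2) bases as short tags (the tag names are this sketch's own convention).
ART22_BASES = {"explicit_consent", "contract_necessity", "union_or_member_state_law"}


def register_gaps(entry):
    """Return the documentation gaps for one algorithm entry in the register."""
    gaps = []
    if entry.get("art22_basis") not in ART22_BASES:
        gaps.append("missing or unknown Art.22 basis")
    if not entry.get("human_review_contact"):
        gaps.append("no route to human review")
    if not entry.get("explanation_method"):
        gaps.append("no explanation method")
    return gaps


# Made-up register with one documented and one undocumented algorithm.
register = [
    {
        "algorithm": "fraud_score_pagerank",
        "art22_basis": "contract_necessity",
        "human_review_contact": "risk-team@example.org",
        "explanation_method": "top contributing edges",
    },
    {
        "algorithm": "community_labels",
        "art22_basis": None,
        "human_review_contact": "",
        "explanation_method": "",
    },
]

gaps_by_algorithm = {entry["algorithm"]: register_gaps(entry) for entry in register}
for algorithm, gaps in gaps_by_algorithm.items():
    print(algorithm, gaps or "documented")
```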


Neptune Pricing and the Data Gravity Trap

Neptune's pricing model creates a data gravity effect that complicates migration to EU-native alternatives.

Neptune bills for instance hours on provisioned clusters (or Neptune Capacity Units under Neptune Serverless), plus storage and, in the standard configuration, I/O requests. Because moving data out of AWS adds transfer cost and latency, organisations are incentivised to co-locate query compute with their Neptune cluster, keeping Lambda functions, application servers, and analytics pipelines in AWS. This co-location dependency compounds over time as downstream services are built to assume Neptune as their graph backend.

Neptune uses a proprietary storage layer that is not compatible with standard open-source graph database storage formats. Migrating away from Neptune requires exporting your graph data (as property-graph CSV or JSON, or as an RDF serialisation) and re-importing it to the target system. For large graphs (billions of edges), this export-reimport process is operationally complex and may require extended maintenance windows.

The GDPR-specific concern: during migration, your personal data exists in both the Neptune cluster and the migration intermediary (typically S3 bulk export). This creates a temporary dual-jurisdiction exposure. Your DPIA should document the migration window as a distinct processing phase with its own risk assessment.


EU-Native Graph Database Alternatives

The EU-native graph database ecosystem has matured significantly. These options offer comparable graph processing capabilities without US-jurisdiction exposure.

Neo4j AuraDB on EU Infrastructure

Neo4j AuraDB is the managed cloud service for the Neo4j graph database. Neo4j, Inc. is headquartered in San Mateo, California — which means the standard AuraDB deployment has the same US-incorporation issue as Neptune.

However, Neo4j can be self-hosted on EU-jurisdiction infrastructure (Hetzner, OVHcloud, Scaleway, IONOS), and for organisations with sufficient operational capability, this is the closest functional equivalent to Neptune. Neo4j supports Cypher (the language from which openCypher, as used by Neptune, is derived), property graphs, ACID transactions, and a rich ecosystem of graph algorithms via the Graph Data Science library.

Self-hosting Neo4j on Hetzner (EU/DE jurisdiction) eliminates the CLOUD Act exposure. Your graph data sits on hardware operated by a German company with no US parent. A CLOUD Act order cannot compel Hetzner to disclose data.

Memgraph

Memgraph is a graph database designed for real-time streaming graph analytics. It is developed by Memgraph Ltd., a UK-registered company, and can be deployed on EU infrastructure. Memgraph uses Cypher as its query language (largely compatible with Neptune's openCypher mode) and supports the Bolt protocol (Neo4j-compatible drivers). Memgraph is specifically optimised for high-throughput graph workloads and supports Python and C++ procedural extensions.

For teams replacing Neptune's real-time analytics use cases, Memgraph on Hetzner or OVH is the most direct functional equivalent. Memgraph's in-memory architecture can provide lower query latency than Neptune for analytical workloads that fit in memory.

Apache AGE (A Graph Extension)

Apache AGE adds a graph processing layer to PostgreSQL, extending it with openCypher support. If your organisation already uses PostgreSQL on EU infrastructure, Apache AGE allows you to add graph processing to your existing database without introducing a new service dependency. Apache AGE supports property graphs via a SQL/openCypher hybrid query interface.

The trade-off: Apache AGE is less mature than Neo4j or Memgraph, and its performance on very large graphs (hundreds of millions of edges) is inferior to native graph databases. For knowledge graphs and moderate-scale social graphs, it is a viable option.

SurrealDB

SurrealDB is a multi-model database that supports document, relational, and graph data models in a single engine. Its core is source-available under the Business Source License 1.1 (which converts to Apache 2.0 after a fixed period), and it can be self-hosted on EU infrastructure. Its SurrealQL query language includes graph traversal capabilities via the -> and <- relation traversal syntax.

SurrealDB is particularly suitable for teams that want to combine graph, document, and relational data without running separate services. The trade-off is that SurrealDB's graph capabilities are less feature-complete than a dedicated graph database for complex traversal algorithms.

FalkorDB

FalkorDB is a graph database that runs as a Redis module (it continues the RedisGraph codebase), offering low-latency property graph queries via the Cypher query language. Its source is published under the Server Side Public License (SSPL), and it is self-hostable on EU infrastructure. It is particularly suitable for teams whose graph workload is latency-sensitive and moderate in scale.

RDF/SPARQL: Apache Jena Fuseki, Oxigraph

For teams using Neptune's RDF/SPARQL mode (knowledge graphs, semantic web applications, ontology-based reasoning), Apache Jena Fuseki and Oxigraph are the primary EU-deployable alternatives. Both are open-source, support SPARQL 1.1, and can be hosted on EU infrastructure. Apache Jena Fuseki is the more mature option with broader SPARQL feature support. Oxigraph is a Rust-based implementation optimised for performance and small deployment footprint.


When Neptune is the Wrong Choice for GDPR Architects

Neptune is the wrong choice when:

  1. Your graph contains social relationship data (user-to-user connections, community memberships, interaction histories) — the inference surface creates hidden Art.9 exposure that requires explicit DPIA assessment for every graph algorithm you run.

  2. Your use case involves Art.22 automated decisions — recommendations, fraud scores, risk assessments derived from graph traversal — and you cannot provide explainability or human review at the algorithm level.

  3. Your organisation cannot accept CLOUD Act exposure — including the neighbourhood-expansion effect where a single order can expose the graph relationships of uninvolved parties.

  4. You require complete Art.17 compliance — including backup erasure, Stream retention management, and analytics output invalidation — and cannot implement custom operational procedures for each.

  5. Your data subjects include EU residents who have not consented to processing under US jurisdiction, and you cannot document a positive finding that the Art.46 SCCs provide essentially equivalent protection despite the CLOUD Act.

Neptune is appropriate in contexts where: graph processing is applied to non-personal data (product catalogues, infrastructure dependency maps, logistics networks with no person-linked data) or where your DPO has documented a valid Art.9(2) basis for the inferred special category data and an Art.22 basis for all automated decisions.


Migration Path: Neptune to EU-Native Graph

A structured Neptune-to-EU migration typically follows this sequence:

Phase 1 — Graph export: Export your Neptune data with the neptune-export tool, which writes property-graph data as CSV or JSON and RDF data in serialisations such as N-Quads, typically to S3. For large graphs, bulk export via neptune-export is more reliable than stream-based export.

Phase 2 — Data audit: Before migration, audit your exported data for inference-generating edges. Identify edge types that create Art.9 inference risk (community, affiliation, health-adjacent). Assess whether these edges require an Art.9(2) basis that you have not yet documented.

Phase 3 — Target provisioning: Deploy your EU-native graph database (Neo4j, Memgraph, SurrealDB, or FalkorDB) on EU infrastructure. For Hetzner, the Falkenstein (FSN1) or Nuremberg (NBG1) locations provide the lowest latency to EU-region AWS. For OVH, the Roubaix (RBX) or Gravelines (GRA) data centres are typically selected.

Phase 4 — Schema translation: Translate your Neptune schema (vertex labels, edge labels, property keys) to the target graph model. For openCypher-to-openCypher migrations (Neo4j, Memgraph, FalkorDB), schema translation is minimal. For SPARQL-to-SPARQL migrations (Fuseki, Oxigraph), the ontology structure typically transfers with minor adaptation. For Gremlin-to-openCypher migrations, the traversal language differences require query rewriting.

Phase 5 — Application cutover: Update your application layer to point to the EU-native graph endpoint. For Neo4j and Memgraph, standard Bolt drivers can usually be reused if your Neptune application used the openCypher mode over Bolt, although connection and authentication settings will change. For SPARQL applications, Apache Jena client libraries work with both Neptune and Fuseki.

Phase 6 — Neptune decommission: After validating the EU-native deployment, delete the cluster. If you take a final snapshot on deletion, treat it as one more copy to erase: manual snapshots, including the final one, are not removed when you delete the cluster and must be deleted explicitly. Neptune Streams records live in the cluster itself, so also delete any copies that downstream consumers made within the 7-day window. Delete Neptune Analytics snapshots and stored algorithm results.
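The snapshot clean-up can be scripted. The sketch below takes a client object as a parameter and calls `describe_db_cluster_snapshots` and `delete_db_cluster_snapshot`, which are the method names exposed by the boto3 Neptune client. To keep the example runnable without AWS access it uses a hypothetical `FakeNeptuneClient` stand-in; the cluster and snapshot names are made up, and pagination of the listing is left out for brevity.

```python
def delete_manual_snapshots(neptune_client, cluster_id):
    """Delete every manual snapshot of one cluster; return the identifiers deleted."""
    response = neptune_client.describe_db_cluster_snapshots(
        DBClusterIdentifier=cluster_id,
        SnapshotType="manual",
    )
    deleted = []
    for snapshot in response["DBClusterSnapshots"]:
        snapshot_id = snapshot["DBClusterSnapshotIdentifier"]
        neptune_client.delete_db_cluster_snapshot(DBClusterSnapshotIdentifier=snapshot_id)
        deleted.append(snapshot_id)
    return deleted


class FakeNeptuneClient:
    """Stand-in for boto3.client("neptune") so the sketch runs without AWS access."""

    def __init__(self, snapshot_ids):
        self.snapshot_ids = list(snapshot_ids)

    def describe_db_cluster_snapshots(self, DBClusterIdentifier, SnapshotType):
        return {
            "DBClusterSnapshots": [
                {"DBClusterSnapshotIdentifier": s} for s in self.snapshot_ids
            ]
        }

    def delete_db_cluster_snapshot(self, DBClusterSnapshotIdentifier):
        self.snapshot_ids.remove(DBClusterSnapshotIdentifier)
        return {}


client = FakeNeptuneClient(["graph-final", "graph-pre-migration"])
deleted = delete_manual_snapshots(client, "graph-primary")
print("deleted:", deleted)
```

Logging the returned identifiers gives you evidence of erasure to attach to the migration record.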


Conclusion

AWS Neptune is a capable managed graph database with strong operational characteristics. But for EU organisations processing personal data in graph form, Neptune presents GDPR risks that go beyond the standard jurisdictional analysis applicable to other AWS services.

The inference risk — that your graph topology generates new Art.9 special category personal data through traversal — is specific to graph databases and is systematically underassessed in most DPIAs. The Art.17 erasure gap — snapshot retention and analytics output survivorship — creates structural non-compliance risk that requires custom operational procedures. The Art.22 explainability problem — graph algorithm outputs are not naturally explainable in the terms GDPR requires — creates compliance tension for any Neptune use case that produces automated decisions.

The CLOUD Act exposure amplifies all of these: because graph traversal can reveal the relationship networks of uninvolved parties, a single CLOUD Act order against your Neptune cluster can produce data about far more individuals than a comparable relational database order.

EU-native alternatives (Neo4j self-hosted on Hetzner, Memgraph, Apache AGE, SurrealDB, FalkorDB) now provide comparable graph processing capabilities without US-jurisdiction exposure. The migration path is manageable for most Neptune use cases, although very large graphs need careful planning of the export and re-import window.

For EU architects building or maintaining graph database systems that process personal data: Neptune's Frankfurt region is not a compliance solution. The jurisdiction question must be resolved at the corporate entity level, not the data centre level.


This post is part of the sota.io EU Compliance Series — practical GDPR and CLOUD Act analysis for European developers and architects.
