2026-05-01 · 12 min read

AWS DataSync EU Alternative 2026: On-Premises Data Migration and the CLOUD Act Transfer Architecture Problem

Post #751 in the sota.io EU Compliance Series

AWS DataSync is Amazon's fully managed data transfer service for migrating, replicating, and synchronizing data between on-premises storage systems and AWS cloud storage services. It supports NFS shares, SMB shares, HDFS (Hadoop Distributed File System), self-managed object storage, and other on-premises file systems as sources. It can transfer data to Amazon S3, Amazon EFS (Elastic File System), Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for NetApp ONTAP, and Amazon FSx for OpenZFS as destinations.

Organizations use DataSync for three primary scenarios: large-scale one-time migrations of on-premises file servers to cloud storage, ongoing replication of on-premises data to AWS for analytics or backup, and hybrid workflows where data is regularly synchronized between on-premises systems and cloud storage tiers. DataSync handles file transfer scheduling, bandwidth throttling, checksum verification, encryption in transit, and progress monitoring — eliminating the need to build and maintain custom transfer scripts.

The personal data implications are substantial. Enterprise file servers and NAS systems contain employee files, customer documents, financial records, and other data subject to GDPR. Healthcare organizations use DataSync to migrate patient record archives, clinical imaging, and diagnostic data from on-premises PACS systems to cloud storage. HR departments use it to move employee document stores to cloud archives. Legal teams migrate document review sets. In each case, DataSync becomes the transfer mechanism for large volumes of personal data moving between on-premises EU infrastructure and AWS cloud storage.

The structural GDPR problem: AWS DataSync requires deploying a DataSync Agent — a virtual machine appliance — in the source environment. This agent runs in the on-premises data center and bridges the connection between local storage and AWS. However, the agent is not self-managed: it is controlled by, reports to, and receives commands from the AWS DataSync service operated by Amazon.com, Inc., a US company subject to CLOUD Act compelled disclosure. Every task definition, execution log, transfer report, scheduling configuration, and filter rule — the complete record of what personal data moved from on-premises EU infrastructure to cloud storage, when, and how — is held in AWS-managed service state under US jurisdiction.

What AWS DataSync Actually Does

DataSync operates on a task model. A task connects a source location (an NFS share, an SMB share, an HDFS cluster, an on-premises S3-compatible endpoint) to a destination location (an S3 bucket, an EFS file system, an FSx instance). The task configuration specifies which files to include or exclude via filter rules, how to handle metadata preservation, whether to verify file integrity via checksums, bandwidth limits, and scheduling triggers.
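The task model can be sketched with the AWS CLI. The ARNs, hostnames, and bucket names below are illustrative placeholders, not values from any real deployment:

```shell
# Sketch of the DataSync task model via the AWS CLI — all ARNs, hostnames,
# and bucket names are placeholders for illustration.

# Source location: an on-premises NFS share, reached through the DataSync Agent
aws datasync create-location-nfs \
  --server-hostname nfs01.corp.internal \
  --subdirectory /exports/hr-documents \
  --on-prem-config AgentArns=arn:aws:datasync:eu-central-1:111122223333:agent/agent-0example

# Destination location: an S3 bucket
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::hr-archive-bucket \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role

# Task: binds source to destination, with exclude filters and a schedule
aws datasync create-task \
  --source-location-arn arn:aws:datasync:eu-central-1:111122223333:location/loc-0src \
  --destination-location-arn arn:aws:datasync:eu-central-1:111122223333:location/loc-0dst \
  --excludes FilterType=SIMPLE_PATTERN,Value="*.tmp" \
  --schedule ScheduleExpression="cron(0 2 * * ? *)"
# Every one of these configuration objects is stored in AWS service state
```

Each `create-*` call returns an ARN, and each of those objects persists in the DataSync service, which is the point developed in the exposure sections below.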

The DataSync Agent is the on-premises component. It is a virtual machine image provided by AWS (available for VMware ESXi, Linux KVM, and Microsoft Hyper-V, or as an Amazon EC2 AMI), deployed inside the organization's network. The agent connects to AWS DataSync service endpoints over HTTPS. Once activated, the agent is registered to the AWS account and communicates bidirectionally with AWS — receiving transfer instructions and reporting transfer status, file counts, error details, and metrics back to the AWS DataSync service.

DataSync provides several operational features: task filtering allows including or excluding files and directories based on patterns, preserving only the relevant subset of on-premises data for cloud migration. Task reports document every file transferred, skipped, or failed, providing an audit trail of the migration. CloudWatch integration sends metrics (BytesTransferred, FilesTransferred, FilesFound, TaskExecutionThrottled) and logs to Amazon CloudWatch for monitoring and alerting. Scheduling allows recurring tasks to run on cron-like schedules, creating continuous replication pipelines.

DataSync is often used in combination with other AWS services: migrating data into S3 where it feeds into AWS Glue ETL pipelines, into EFS where it is processed by Lambda functions or EC2 instances, or into FSx for Windows where it integrates with Active Directory environments. The DataSync task is typically the first step in a multi-service personal data processing pipeline.

GDPR Exposure Point 1: DataSync Task Definitions as Art. 46 Transfer Architecture Maps

AWS DataSync task definitions document the organization's complete data migration and replication architecture. Each task definition specifies the exact source location (which NFS server, which share path, which on-premises system), the destination (which S3 bucket, EFS file system, or FSx instance, in which AWS region), and which data is being transferred (through filter include/exclude rules). The complete set of DataSync task definitions constitutes a machine-readable inventory of all cross-boundary data flows from on-premises EU infrastructure to cloud storage.

For a healthcare organization, DataSync task definitions would reveal: which on-premises storage systems contain patient data (NFS server addresses, share paths), which AWS storage destinations receive the data (S3 bucket ARNs, EFS file system IDs), and which filter patterns control what medical records are included or excluded from transfers. This is not merely metadata — it is the operational blueprint of how EU health data under Art. 9 GDPR moves from on-premises clinical systems to cloud infrastructure.

Under GDPR Art. 46, organizations must ensure that transfers of personal data to third countries are subject to appropriate safeguards. When personal data moves from on-premises EU infrastructure to an AWS S3 bucket (even in eu-central-1), DataSync manages the transfer — and the transfer configuration describing what data moves and where is held by a US company. A CLOUD Act demand targeting task definitions would expose the architecture of personal data flows even without access to the data itself.

For organizations subject to sector-specific regulations, the exposure is compounded. Healthcare organizations under the NIS2 Directive or national health data laws may have additional obligations around documenting and securing data transfer architectures. Financial institutions under DORA must ensure operational resilience and have clear records of data flows. DataSync task definitions stored under US jurisdiction represent exactly the kind of operational architecture documentation that creates CLOUD Act exposure risk.

GDPR Exposure Point 2: The DataSync Agent as an AWS-Controlled PII Transfer Proxy

The DataSync Agent is the most structurally significant GDPR exposure point. The agent is presented as an on-premises component — it runs in the organization's data center, processes files locally, and transfers data to AWS. But the agent is not autonomous: it is controlled by and reports to the AWS DataSync service operated by Amazon.com, Inc.

The agent activation model illustrates this control relationship. When a DataSync Agent is first deployed, it must be activated against an AWS account endpoint. The activation process registers the agent with AWS DataSync, creates an AWS resource identifier for the agent, and establishes the bidirectional control channel. After activation, the agent receives task instructions from AWS, reports task execution status to AWS, sends transfer metrics and logs to AWS, and can be managed (updated, deactivated, restarted) through the AWS DataSync console or API.
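The activation flow can be sketched in two CLI calls. The activation key is obtained from the agent VM's local console or HTTP endpoint; the key and agent name here are placeholders:

```shell
# Agent activation sketch — key and name are illustrative placeholders.
aws datasync create-agent \
  --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE \
  --agent-name "dc-frankfurt-agent-01" \
  --region eu-central-1

# From this point the on-premises VM is an AWS resource: listed, updated,
# and deactivated through the AWS control plane
aws datasync list-agents --region eu-central-1
```

The `create-agent` call returns an AgentArn, which is the handle AWS uses to manage a VM running inside the organization's own data center.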

From a GDPR Art. 28 (processor) analysis, the DataSync Agent creates an unusual arrangement: the agent software runs in the data controller's infrastructure but is functionally controlled by the data processor (AWS). AWS controls which tasks the agent executes, can push agent software updates, receives granular execution telemetry from the agent, and maintains the authoritative state of agent configuration. The organizational boundary between "on-premises" and "cloud-managed" is blurred by the control-plane architecture.

Under CLOUD Act, this control relationship is significant. A demand served on Amazon.com, Inc. for DataSync service records could yield agent activation records (which on-premises networks contain DataSync agents, with network address information), task execution logs (what the agents transferred, when, from which source paths), and agent status history (availability, connectivity, transfer throughput over time). The agent is in the on-premises network, but the control plane and logging infrastructure are under US jurisdiction.

# rclone: EU-hosted alternative — the transfer engine runs entirely in YOUR infrastructure
# No AWS-controlled agent, no US control plane, no CLOUD Act exposure on transfer metadata

# Install rclone on your EU-hosted server
curl https://rclone.org/install.sh | sudo bash

# Configure EU-hosted MinIO as destination (replace AWS S3)
rclone config create minio-eu s3 \
  provider=Minio \
  access_key_id=YOUR_ACCESS_KEY \
  secret_access_key=YOUR_SECRET_KEY \
  endpoint=https://minio.eu.internal:9000 \
  region=eu-central-1

# Migrate NFS share to EU-hosted MinIO (DataSync equivalent)
rclone sync /mnt/nfs/employee-documents minio-eu:hr-archive-bucket \
  --transfers=16 \
  --checkers=8 \
  --checksum \
  --progress \
  --log-file=/var/log/rclone-migration.log \
  --log-level=INFO \
  --exclude="*.tmp" \
  --exclude=".DS_Store"
# All transfer logs stay on YOUR infrastructure — no AWS service involved
# Transfer metadata (what moved, when, from where) never leaves EU jurisdiction

GDPR Exposure Point 3: Task Execution Reports as GDPR Art. 30 Processing Records

AWS DataSync task execution reports document every file transferred, skipped, or failed during each task run. For tasks transferring personal data, the execution report is a GDPR-relevant processing record: it documents which personal data files were processed (file paths, file sizes, file modification timestamps), when the processing occurred (execution start time, end time), the outcome (transferred, skipped, error), and the scope of the processing event (byte counts, file counts by category).

DataSync stores execution reports in S3 and records execution history in the DataSync service. CloudWatch receives detailed metrics for each task execution: BytesTransferred, FilesTransferred, FilesFound (total files scanned in source), FilesDeleted (files removed at destination during sync), and execution duration. Error details are logged to CloudWatch Logs, including file paths and error descriptions for files that failed to transfer.
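All of this execution evidence is queried from AWS-held state. The ARNs and log group name below are placeholders:

```shell
# Where the execution evidence lives — both queries run against AWS-managed
# infrastructure; ARN and log group name are placeholders.

# Per-execution summary: status, bytes transferred, file counts, timestamps
aws datasync describe-task-execution \
  --task-execution-arn arn:aws:datasync:eu-central-1:111122223333:task/task-0ex/execution/exec-0ex

# Per-file error detail (including file paths) lands in CloudWatch Logs
aws logs filter-log-events \
  --log-group-name /aws/datasync \
  --filter-pattern "ERROR"
```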

Under GDPR Art. 30 (records of processing activities), data controllers must maintain records documenting the purposes of processing, categories of personal data processed, categories of recipients, transfers to third countries, and envisaged deletion timeframes. DataSync execution reports provide granular evidence of personal data processing operations — but this evidence is stored in AWS-managed infrastructure under US jurisdiction. The Art. 30 documentation asset is itself subject to CLOUD Act.

For organizations subject to supervisory authority audits under GDPR Art. 58, DataSync execution reports represent discoverable evidence of data transfer operations. A supervisory authority requesting documentation of personal data transfers might find that the most detailed records of what data moved and when are stored in AWS CloudWatch and DataSync service state, accessible to AWS under CLOUD Act before they are accessible to the organization's own data protection officer.

GDPR Exposure Point 4: DataSync Locations as Art. 46 Transfer Pathway Registry

DataSync locations are configuration objects representing storage endpoints: source NFS/SMB servers, HDFS clusters, on-premises S3-compatible storage, and destination AWS storage services. Each location stores connection details — NFS server addresses, SMB domain and credentials, S3 bucket ARNs, EFS mount targets, FSx file system identifiers — that document the complete topology of the data transfer infrastructure.

The source locations reveal on-premises infrastructure that processes personal data: NFS server hostnames or IP addresses (identifying which systems contain employee files, customer records, or patient data), share paths (indicating which directories are subject to transfer), and protocol-specific authentication details. For SMB locations, DataSync stores the Windows domain, username, and password or Kerberos configuration used to access file shares containing personal data.

Under CLOUD Act, a demand for DataSync location configurations would expose: the network addresses of on-premises systems containing personal data (NFS servers, HDFS cluster addresses), the authentication credentials used to access file shares, and the complete mapping from on-premises storage systems to cloud storage destinations. This creates a transitive exposure: the CLOUD Act demand yields not just transfer metadata but the credentials and network addresses needed to access the on-premises personal data sources directly.

For organizations with on-premises systems behind corporate firewalls, the assumption that firewall perimeter controls protect EU personal data infrastructure is undermined by the fact that the connection details (hostnames, credentials) for accessing those systems are stored in AWS-managed DataSync location configurations under US jurisdiction.

GDPR Exposure Point 5: Filter Rules as GDPR Data Minimization Implementation Records

DataSync task filter rules specify which files and directories are included or excluded from transfers. Organizations use filter rules to implement GDPR Art. 5(1)(c) (data minimization): transferring only the personal data that is necessary for the specified processing purpose, excluding sensitive files not required in the cloud destination, and preventing inadvertent migration of data that should remain on-premises.

Filter rule configurations serve a dual function: they implement data minimization, and they document the data minimization implementation. A filter rule excluding */health-records/* from a general file server migration documents that health records exist in that path. A filter including only */active-customers/* documents the data controller's determination that only active customer data, not historical records, is transferred. The filter rules are implementation artifacts of GDPR processing decisions.
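As a concrete sketch of how such rules look in DataSync: patterns are pipe-separated within a single Value string. The task ARN and paths below are illustrative:

```shell
# DataSync filter rule sketch — task ARN and paths are placeholders.
# Multiple patterns are pipe-separated inside one Value string.
aws datasync update-task \
  --task-arn arn:aws:datasync:eu-central-1:111122223333:task/task-0example \
  --excludes FilterType=SIMPLE_PATTERN,Value="/health-records/*|/legal-holds/*" \
  --includes FilterType=SIMPLE_PATTERN,Value="/active-customers/*"
# The exclude list itself documents where the sensitive paths are —
# and it is stored in AWS service state, retrievable via describe-task
```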

These processing decisions — which categories of personal data are transferred, which are retained on-premises, which path structures contain sensitive data — are stored in DataSync task configuration under US jurisdiction. Under CLOUD Act, the filter rules disclose the organization's internal data categorization and the implicit classification of which directories contain different categories of personal data. This is more sensitive than a simple list of what was transferred — it reveals the organization's mental model of its own personal data architecture.

For organizations with Art. 9 special category data (health data, biometric data, union membership records, political opinions), filter rules that explicitly exclude or include paths containing this data create documented evidence that the organization is aware of and processing Art. 9 data in those storage locations — under US jurisdiction.

GDPR Exposure Point 6: CloudWatch Metrics as Personal Data Volume Intelligence

DataSync emits detailed CloudWatch metrics for every task execution: BytesTransferred (total data volume transferred), FilesTransferred (file count), FilesFound (total files scanned in source), FilesDeleted (files removed from destination), FilesVerified (files checksum-verified), and TaskQueuedDuration and TaskExecutionDuration for performance monitoring.

These metrics, stored in Amazon CloudWatch under US jurisdiction, constitute quantitative intelligence about personal data volumes and processing patterns. BytesTransferred time series over months reveals seasonal patterns in personal data processing — quarterly HR data migrations, annual medical records archival cycles, weekly customer data transfers. FilesFound reveals the total size of personal data repositories being scanned. FilesDeleted reveals deletion operations on personal data.
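Retrieving a month of that time series is a single API call for anyone with access to the account. The task ID and date range below are placeholders:

```shell
# Pulling a month of daily BytesTransferred totals from CloudWatch —
# task ID and dates are illustrative placeholders.
aws cloudwatch get-metric-statistics \
  --namespace "AWS/DataSync" \
  --metric-name BytesTransferred \
  --dimensions Name=TaskId,Value=task-0example \
  --start-time 2026-04-01T00:00:00Z \
  --end-time 2026-05-01T00:00:00Z \
  --period 86400 \
  --statistics Sum
```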

Under CLOUD Act, CloudWatch DataSync metrics provide aggregate intelligence about personal data processing operations without requiring access to the data itself: how much data is processed (volume), when it is processed (temporal patterns), and how the volumes change over time (growth rates). For organizations subject to GDPR Art. 30 documentation obligations, this aggregate processing metadata is held by AWS under US jurisdiction.

The combination of DataSync task execution metrics with VPC Flow Logs and CloudWatch transfer duration metrics can reconstruct the timing and approximate volume of every personal data transfer operation conducted via DataSync — creating a detailed audit trail of GDPR processing operations under CLOUD Act jurisdiction.

EU Alternatives to AWS DataSync

rclone is an open-source command-line program for managing files on cloud storage and between file systems. It supports 70+ storage backends including S3-compatible object storage (MinIO, Wasabi, Scaleway, OVH Object Storage, Exoscale), SFTP, FTP, SMB/CIFS, NFS via mount, WebDAV, and local filesystems. rclone runs entirely on infrastructure you control — no agent registration, no cloud control plane, no transfer metadata sent to any external service.

rclone provides DataSync-equivalent functionality: bidirectional sync, bandwidth throttling (--bwlimit), checksum verification (--checksum), include/exclude filters, progress reporting, log files, retry logic, and parallel transfers. It can run as a scheduled cron job or systemd timer with structured logging. For large migrations, rclone supports parallel multi-part uploads.

The critical GDPR difference: all rclone transfer logs, filter configurations, and execution records remain on the infrastructure where rclone runs — your EU-hosted servers. No external service receives transfer metadata. The equivalent of DataSync's CloudWatch metrics (file counts, byte counts, timing) are written to log files under your control.

# rclone sync equivalent to AWS DataSync task with filter rules and reporting
rclone sync /mnt/nfs/customer-data eu-s3:customer-archive \
  --transfers=32 \
  --checkers=16 \
  --checksum \
  --progress \
  --stats=30s \
  --log-file=/var/log/rclone-customer-data.log \
  --log-level=INFO \
  --filter="- */tmp/**" \
  --filter="- */health-records/**" \
  --filter="+ *.pdf" \
  --filter="+ *.docx" \
  --filter="- *" \
  --backup-dir=eu-s3:customer-archive-versions \
  --suffix=".$(date +%Y%m%d)"
# rclone applies --filter rules in order, first match wins; --include and
# --exclude should not be mixed in one command, so ordered filter rules
# express the same selection (only .pdf/.docx, minus tmp and health-records)
# Logs stay on YOUR server in /var/log/ — no AWS CloudWatch, no CLOUD Act exposure

MinIO (EU-Hosted S3-Compatible Storage Target)

MinIO is an open-source, S3-compatible object storage server designed for deployment on commodity hardware. It provides the same API surface as Amazon S3 — any tool that works with S3 works with MinIO — but runs entirely within your EU infrastructure. Deploying MinIO in EU data centers replaces S3 as the storage destination, eliminating the storage-layer AWS dependency.

MinIO supports erasure coding for data durability, TLS encryption, object versioning, lifecycle policies, object locking, and replication between MinIO clusters. The MinIO console provides a web UI equivalent to the S3 console. MinIO's distributed mode allows clustering across multiple EU-hosted nodes for high availability.

For DataSync migration scenarios: rclone transfers data from on-premises NFS/SMB to EU-hosted MinIO. All storage remains under EU jurisdiction. All transfer metadata remains on infrastructure you control. The complete migration architecture — source NFS servers, transfer engine, destination object storage — can operate without any US service involvement.

# MinIO EU deployment with docker-compose
services:
  minio:
    image: quay.io/minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: "${MINIO_ACCESS_KEY}"
      MINIO_ROOT_PASSWORD: "${MINIO_SECRET_KEY}"
      MINIO_SITE_REGION: "eu-central-1"
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 30s
      timeout: 20s
      retries: 3

volumes:
  minio-data:
    driver: local
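Once the compose stack is up, the destination bucket is created with the MinIO client (mc). The alias, endpoint, and bucket name follow the examples above and are placeholders for your own deployment:

```shell
# Bucket setup sketch with the MinIO client — endpoint, credentials, and
# bucket name are placeholders matching the earlier examples.
mc alias set minio-eu https://minio.eu.internal:9000 "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
mc mb minio-eu/hr-archive-bucket
mc version enable minio-eu/hr-archive-bucket   # object versioning, as with S3
```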

rsync with SSH (Classic On-Premises to EU Server Migration)

rsync is the de facto standard UNIX tool for file synchronization and transfer. For migrations between on-premises NFS/SMB mounts and EU-hosted Linux servers, rsync over SSH provides a complete transfer solution: incremental transfers (only changed blocks), checksum verification, permission preservation, and structured logging — with zero cloud service dependencies.

rsync is appropriate for scenarios where the destination is a Linux server or NFS mount rather than object storage. EU-hosted bare metal servers or VMs with NFS exports serve as migration targets. The SSH transfer channel is encrypted. Log output (transferred files, bytes, timing) is written locally. No external service is involved.

# rsync equivalent to DataSync with NFS source to EU server destination
rsync -avz \
  --progress \
  --stats \
  --checksum \
  --exclude="*.tmp" \
  --exclude="*/health-records/*" \
  --log-file=/var/log/rsync-migration.log \
  --log-file-format="%t %f %b" \
  /mnt/nfs-source/employee-data/ \
  user@eu-server.internal:/data/employee-archive/
# Complete transfer log written to /var/log/rsync-migration.log on local server
# No CloudWatch, no AWS DataSync service, no CLOUD Act exposure

Syncthing (Continuous Peer-to-Peer Replication)

Syncthing is an open-source, decentralized file synchronization application. It creates direct peer-to-peer synchronization between nodes without requiring any central server or relay (though relay servers exist for NAT traversal). For ongoing replication scenarios (DataSync recurring task equivalents), Syncthing provides continuous folder synchronization between on-premises systems and EU-hosted servers.

Syncthing uses TLS for all connections, supports selective sync with ignore patterns, provides versioning for deleted/overwritten files, and offers a web UI and REST API for programmatic management. Discovery can be configured to use only local LAN discovery or self-hosted discovery servers, eliminating any external service dependency. For hybrid architectures with persistent on-premises and EU-cloud components, Syncthing creates a continuously synchronized shared folder that requires no AWS DataSync equivalent.
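The DataSync-filter equivalent in Syncthing is the .stignore file at the root of each synced folder. A minimal sketch (written here to the current directory for illustration; the path prefix `/health-records` is a hypothetical example):

```shell
# Sketch of a Syncthing ignore file — belongs at the root of the synced
# folder; the health-records path is a hypothetical example.
cat > .stignore <<'EOF'
// Syncthing ignore patterns — comments start with //
*.tmp
(?d).DS_Store
/health-records
EOF
cat .stignore
```

The `(?d)` prefix marks a pattern whose matches may be deleted if they block directory removal; plain patterns are simply never synchronized.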

Apache NiFi (Enterprise Data Flow with EU Deployment)

Apache NiFi is an open-source, enterprise-grade data flow platform that supports directed graph data pipelines with visual orchestration, real-time monitoring, and comprehensive provenance tracking. For complex DataSync scenarios requiring transformation, routing, or processing during migration, NiFi provides more sophistication than rclone — while running entirely on EU-hosted infrastructure.

NiFi processors support NFS/SMB via FetchFile and ListFile, S3-compatible storage, SFTP, and many other sources and destinations. NiFi's provenance store records every data flow event, creating an audit trail of data processing operations that remains within the EU-hosted NiFi cluster. Flow configurations (equivalent to DataSync task definitions) are stored in NiFi's internal configuration repository, on EU-hosted infrastructure.

Migration Path from AWS DataSync to EU-Hosted Transfer Stack

Phase 1 — Inventory existing DataSync tasks: Document all active tasks (source locations, destination locations, filter rules, schedules). This becomes the rclone/rsync configuration inventory. Use aws datasync list-tasks and aws datasync describe-task to export task definitions before migration.

Phase 2 — Deploy EU storage target: Deploy MinIO on EU-hosted infrastructure (Hetzner, OVH, Scaleway, or sota.io). Create buckets matching existing S3 destination structure. Configure rclone with MinIO endpoint as the eu-s3 remote.

Phase 3 — Convert DataSync tasks to rclone commands: Map each DataSync task to an rclone sync command. DataSync filter rules (include/exclude patterns) map directly to rclone --include and --exclude flags. DataSync scheduling (cron-like) maps to systemd timer units or cron entries. DataSync task execution reporting maps to rclone --log-file and --stats-file output.
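The scheduling conversion can be sketched as a systemd service/timer pair. The units are written to the current directory here; in practice they go in /etc/systemd/system/ followed by `systemctl enable --now rclone-hr-sync.timer`. Paths, remote name, and schedule are illustrative:

```shell
# Sketch: a nightly DataSync-schedule equivalent as a systemd service + timer.
# Written to the current directory for illustration; install to
# /etc/systemd/system/ in practice. Paths and remote name are examples.
cat > rclone-hr-sync.service <<'EOF'
[Unit]
Description=Nightly HR document sync to EU-hosted MinIO

[Service]
Type=oneshot
ExecStart=/usr/bin/rclone sync /mnt/nfs/employee-documents eu-s3:hr-archive-bucket \
  --checksum --log-file=/var/log/rclone-hr-sync.log --log-level=INFO
EOF

cat > rclone-hr-sync.timer <<'EOF'
[Unit]
Description=Run HR document sync at 02:00 every night

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
```

`Persistent=true` runs a missed sync at the next boot, which approximates DataSync's queued-execution behavior for recurring tasks.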

Phase 4 — Validate transfer equivalence: Run rclone in dry-run mode (--dry-run) against existing NFS sources to verify the filter rules produce the expected file selection. Compare file counts and byte counts against DataSync execution report baselines.
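A validation sketch, using the source path and remote name from the earlier examples as placeholders:

```shell
# Validation sketch — dry-run the converted task, then compare totals
# against the DataSync execution report baseline. Paths are placeholders.
rclone sync /mnt/nfs/customer-data eu-s3:customer-archive \
  --dry-run --log-file=/tmp/rclone-dryrun.log --log-level=INFO

# Object counts and byte totals at source and destination, for comparison
rclone size /mnt/nfs/customer-data
rclone size eu-s3:customer-archive
```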

Phase 5 — Deactivate DataSync agents: After validating EU-hosted rclone transfers, deactivate DataSync agents and delete DataSync task definitions, execution history, and location configurations. This removes the CLOUD Act-accessible transfer architecture record from AWS service state.
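The teardown maps to three delete calls per task. The ARNs below are placeholders:

```shell
# Teardown sketch — removes the CLOUD Act-reachable transfer records from
# AWS service state once EU-hosted transfers are validated. ARNs are placeholders.
aws datasync delete-task \
  --task-arn arn:aws:datasync:eu-central-1:111122223333:task/task-0example
aws datasync delete-location \
  --location-arn arn:aws:datasync:eu-central-1:111122223333:location/loc-0src
aws datasync delete-agent \
  --agent-arn arn:aws:datasync:eu-central-1:111122223333:agent/agent-0example
# CloudWatch log groups and S3-stored task reports persist after the DataSync
# resources are gone and must be deleted separately
```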

What This Means for sota.io Customers

sota.io is an EU-native deployment platform that runs entirely on European infrastructure with no US parent company — making it the appropriate hosting layer for EU organizations deploying rclone, MinIO, and NiFi-based data transfer stacks as DataSync replacements.

Unlike AWS DataSync, where transfer architecture configuration is stored in service state under US jurisdiction, rclone running on sota.io uses sota.io's EU-hosted server infrastructure. Transfer logs, configuration files, and scheduling scripts are files on your deployment — stored in EU data centers, subject to EU jurisdiction, accessible under EU legal frameworks.

For EU organizations migrating from DataSync, the transfer stack — rclone for transfer, MinIO for storage, systemd timers for scheduling, Prometheus for metrics — can be containerized and deployed to sota.io using standard Docker Compose or Kubernetes configurations. The complete migration infrastructure runs under EU jurisdiction with no AWS service layer.


AWS DataSync is a managed data transfer service operated by Amazon.com, Inc., a US company. DataSync Agents, task definitions, execution logs, and location configurations are stored in AWS service infrastructure subject to US jurisdiction and CLOUD Act compelled disclosure. Organizations transferring EU personal data using DataSync should evaluate whether the transfer architecture documentation stored in DataSync service state creates acceptable risk under GDPR Art. 46 and their organizational risk tolerance.

EU-Native Hosting

Ready to move to EU-sovereign infrastructure?

sota.io is a German-hosted PaaS — no CLOUD Act exposure, no US jurisdiction, full GDPR compliance by design. Deploy your first app in minutes.