EU Data Act Cloud Switching 2026: Data Format Standards, Open APIs & Interoperability Requirements
Post #1417 in the sota.io EU Cloud Compliance Series
The EU Data Act (Regulation (EU) 2023/2854) does not specify a list of approved file formats for cloud switching exports. What it requires is something more demanding: data must be exported in "commonly used, machine-readable, open, interoperable formats" — and the receiving provider must be able to import that data without manual intervention. This post covers what that means in practice, which formats qualify, how to document your export schema, and what a compliant multi-format export pipeline looks like in TypeScript.
This is the second post in our EU-DATA-ACT-CLOUD-SWITCHING-2026 series. The first post covered the switching request API and 30-day state machine.
The Four Format Criteria
The Data Act's switching provisions (Chapter VI, Articles 23 to 31) can be read as establishing four format requirements. Each is a genuine constraint, not just aspirational language:
1. Commonly Used
The format must have meaningful adoption in the industry. This disqualifies:
- Proprietary binary formats (even well-documented ones)
- Vendor-specific export flavors without external adoption
- Undocumented internal formats
In practice, this criterion rules out formats like "Salesforce export", "HubSpot export", or a custom binary database dump. If a data engineer at the receiving provider needs to write an ad-hoc parser, the format fails this test.
Formats that qualify: JSON, CSV, XML, Parquet, NDJSON/JSON Lines, YAML, TSV
Formats that don't qualify: proprietary binary blobs, undocumented custom encodings, base64-wrapped binary data without schema documentation
2. Machine-Readable
The format must be parseable by standard tools without ambiguity. This means:
- Structured data, not natural language exports
- Consistent encoding (UTF-8 mandatory for text-based formats)
- No presentation-layer formats masquerading as data formats
PDFs, Word documents, and HTML pages generally fail this criterion for structured data (though they may be included as supplementary attachments for human-readable summaries).
Edge case: XLSX spreadsheets are borderline. They are machine-readable by tooling but are inherently cell-based rather than schema-based. XLSX may satisfy the criterion for tabular data, but JSON or CSV with a documented schema is strongly preferable.
3. Open
The format specification must be publicly available and royalty-free. This eliminates any format whose parsing tools are subject to licensing restrictions.
The common data formats (JSON, CSV, XML, Parquet) and the OpenAPI description format all satisfy this criterion. The openness requirement is rarely the binding constraint in practice.
4. Interoperable
This is the most demanding criterion. The format must allow a receiving provider to import the data without custom integration work. Practically, this means:
- A documented JSON Schema or OpenAPI schema that describes every exported field
- Semantic clarity: field names and values must be unambiguous (ISO dates, IETF language codes, standard identifiers)
- No references to internal IDs that only have meaning within your system
- If you export user profiles, the receiving provider must be able to create equivalent user profiles from your export without calling your API for additional metadata
The interoperability requirement is what separates a compliant export from a simple data dump.
Recommended Format Stack by Data Type
Different data types have different optimal formats. The following matrix reflects common SaaS data categories:
| Data Type | Recommended Format | Schema Standard | Notes |
|---|---|---|---|
| Structured records (users, projects, items) | JSON Lines (NDJSON) | JSON Schema Draft-07+ | One record per line, streaming-friendly for large datasets |
| Relational data with joins | JSON (nested) or Parquet | JSON Schema + column metadata | Parquet preferred for >10MB |
| Binary assets (files, images, attachments) | ZIP with manifest | JSON Schema for manifest.json | Binary content in /files/, manifest indexes metadata |
| Time-series / audit logs | JSON Lines | JSON Schema | ISO 8601 timestamps, UTC timezone |
| Configurations and settings | YAML or JSON | JSON Schema | Single-file preferred |
| Financial records / billing history | JSON Lines | JSON Schema + FHIR-inspired profiles | Currency in ISO 4217, amounts as integers (cents) |
| Calendar / scheduling data | iCalendar (RFC 5545) + JSON summary | Standard iCal + JSON Schema | iCal for calendar apps, JSON for non-calendar targets |
| Contacts / identity data | vCard (RFC 6350) + JSON | Standard vCard + JSON Schema | vCard for addressbook apps, JSON for general targets |
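The "amounts as integers (cents)" note for billing data can be sketched as follows. This is a minimal illustration, not a prescribed format: `toMinorUnits` and `MoneyAmount` are hypothetical names, and the caller is assumed to know the currency's minor-unit exponent (2 for EUR, 0 for JPY).

```typescript
// Hypothetical helper: converts a decimal amount string into integer minor units
// without passing through floating-point arithmetic.
interface MoneyAmount {
  currency: string;    // ISO 4217 code, e.g. "EUR"
  amountMinor: number; // integer minor units, e.g. cents
}

function toMinorUnits(amount: string, currency: string, exponent = 2): MoneyAmount {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(amount.trim());
  if (!match) throw new Error(`Not a plain decimal amount: ${amount}`);
  const sign = match[1] ?? "";
  const whole = match[2] ?? "";
  const fraction = match[3] ?? "";
  if (fraction.length > exponent) {
    throw new Error(`Too many decimal places for ${currency}: ${amount}`);
  }
  // Pad the fraction so "19.9" becomes 1990 rather than 199.
  const value = Number(whole + fraction.padEnd(exponent, "0"));
  if (!Number.isSafeInteger(value)) throw new Error(`Amount out of range: ${amount}`);
  return { currency, amountMinor: sign === "-" ? -value : value };
}
```

Working on the string representation avoids rounding surprises such as 0.1 + 0.2, which is the usual reason for exporting integer amounts in the first place.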
Schema Documentation: The OpenAPI Approach
The technical requirements in Chapter VI implicitly require that your export schema be machine-readable and documented. The receiving provider's import system must be able to validate imported data against your schema. We recommend using OpenAPI 3.1 to document your export format, so that the export schema uses the same tooling as your API.
Export Manifest Schema
Every Data Act compliant export package should include a manifest.json at the root. The Regulation does not prescribe this file; the structure below is our recommended convention:
interface ExportManifest {
  schemaVersion: string;        // Semantic version of the export schema, e.g. "1.1.0"
  exportId: string;             // UUID
  requestId: string;            // Matches the switch request ID
  exportedAt: string;           // ISO 8601, UTC
  provider: {
    name: string;
    legalEntity: string;        // Registered company name
    euRepresentative?: string;  // If provider is non-EU
    dataActContact: string;     // Email for portability questions
  };
  customer: {
    accountId: string;          // Your internal ID
    externalId?: string;        // Optional stable external identifier
    dataResidency: string[];    // ISO 3166-1 alpha-2 country codes
  };
  exportContents: ExportContent[];
  schemaDocumentation: string;  // URL to your OpenAPI/JSON Schema doc
  retentionExpiry: string;      // ISO 8601: 30 days post-export
  exportFormat: ExportFormatSpec;
}

interface ExportContent {
  scope: string;                // "user_data" | "application_data" | etc.
  files: ExportFile[];
  recordCount: number;
  dataSizeBytes: number;
  dateRangeFrom?: string;
  dateRangeTo?: string;
}

interface ExportFile {
  path: string;                 // Relative to archive root
  format: "ndjson" | "json" | "csv" | "parquet" | "zip" | "ical" | "vcard";
  encoding: "utf-8";
  schemaRef: string;            // URL to JSON Schema for this file
  recordCount?: number;
  checksum: string;             // SHA-256 hex
}

interface ExportFormatSpec {
  primaryFormat: string;
  compressionAlgorithm: "zip" | "gzip" | "none";
  characterEncoding: "utf-8";
  lineEnding: "LF";
}
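The `checksum` and `recordCount` fields of a file entry could be filled in along the following lines. This is a sketch under assumptions: `sha256Hex` and `describeNdjsonFile` are hypothetical helpers, and the content is held in memory for brevity, whereas a real export would more likely hash the file as a stream.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest, matching the `checksum` field of a file entry.
function sha256Hex(content: Uint8Array | string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical helper: builds one manifest file entry from in-memory NDJSON text.
function describeNdjsonFile(path: string, content: string, schemaRef: string) {
  // Each non-empty line is one record; the trailing newline adds no record.
  const recordCount = content.split("\n").filter((line) => line.length > 0).length;
  return {
    path,
    format: "ndjson" as const,
    encoding: "utf-8" as const,
    schemaRef,
    recordCount,
    checksum: sha256Hex(content),
  };
}
```

Computing both values from the exact bytes that go into the archive keeps the manifest and the files from drifting apart.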
OpenAPI Export Schema Declaration
Publish your export schema as an OpenAPI document. This allows receiving providers to auto-generate import clients:
# openapi.export.yaml: published at https://api.yoursaas.com/data-act/export-schema
openapi: "3.1.0"
info:
  title: "YourSaaS EU Data Act Export Schema"
  version: "1.0.0"
  description: "Machine-readable schema for EU Data Act cloud switching exports. Compliant with Regulation (EU) 2023/2854 Chapter VI technical requirements."
  contact:
    name: "EU Data Portability Team"
    email: "data-portability@yoursaas.com"
  license:
    name: "CC0 1.0" # Schema is public domain, so receiving providers can use it freely
paths: {} # No API paths: this document is schema-only
components:
  schemas:
    UserRecord:
      type: object
      required: [id, email, createdAt, status]
      properties:
        id:
          type: string
          format: uuid
          description: "Stable user identifier. Unique within this export."
        email:
          type: string
          format: email
        displayName:
          type: string
        locale:
          type: string
          pattern: "^[a-z]{2}-[A-Z]{2}$"
          description: "BCP 47 language tag, restricted to the language-region form (e.g. en-US)"
        timezone:
          type: string
          description: "IANA timezone identifier"
        createdAt:
          type: string
          format: date-time
        updatedAt:
          type: string
          format: date-time
        status:
          type: string
          enum: [active, suspended, deleted]
        metadata:
          type: object
          description: "Additional customer-specific fields. Keys use snake_case."
          additionalProperties:
            type: string
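On the receiving side, a provider would normally validate records with a JSON Schema validator. As a dependency-free illustration of what the UserRecord schema enforces, the sketch below checks only the required fields, the status enum, and the locale pattern. `checkUserRecord` is a hypothetical helper, not part of any published tooling.

```typescript
const REQUIRED_FIELDS = ["id", "email", "createdAt", "status"];
const STATUS_VALUES = ["active", "suspended", "deleted"];
const LOCALE_PATTERN = /^[a-z]{2}-[A-Z]{2}$/; // same pattern as in the schema

// Returns a list of human-readable problems; an empty list means the checks passed.
function checkUserRecord(record: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof record[field] !== "string") {
      problems.push(`missing or non-string required field: ${field}`);
    }
  }
  const status = record["status"];
  if (typeof status === "string" && !STATUS_VALUES.includes(status)) {
    problems.push(`unknown status: ${status}`);
  }
  const locale = record["locale"];
  if (typeof locale === "string" && !LOCALE_PATTERN.test(locale)) {
    problems.push(`locale does not match schema pattern: ${locale}`);
  }
  return problems;
}
```

Note that the pattern rejects valid BCP 47 tags such as `de` or `zh-Hant-TW`, so an exporter using this schema has to normalize locales to the language-region form first.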
Avoiding Interoperability Failures
Most Data Act compliance failures will occur at the interoperability layer, not the format layer. Common patterns that produce technically compliant but practically non-interoperable exports:
Anti-Pattern 1: Internal ID References Without Context
// BAD — receiving provider cannot resolve team_id without your system
{
"userId": "u_abc123",
"teamId": "t_xyz789",
"roleId": "r_owner"
}
// GOOD — self-contained, receiving provider can reconstruct relationships
{
"userId": "550e8400-e29b-41d4-a716-446655440000",
"email": "alice@example.com",
"team": {
"id": "t_xyz789",
"name": "Engineering",
"slug": "engineering"
},
"role": {
"id": "r_owner",
"name": "Owner",
"permissions": ["admin", "billing", "member_management"]
}
}
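One way to get from the first shape to the second is to resolve references at export time. The following sketch assumes teams and roles have already been loaded into lookup maps; `MembershipRow` and `embedReferences` are illustrative names, not part of any existing API.

```typescript
interface Team { id: string; name: string; slug: string }
interface Role { id: string; name: string; permissions: string[] }
interface MembershipRow { userId: string; email: string; teamId: string; roleId: string }

// Replaces bare foreign keys with self-contained objects.
function embedReferences(
  row: MembershipRow,
  teams: Map<string, Team>,
  roles: Map<string, Role>
) {
  const team = teams.get(row.teamId);
  const role = roles.get(row.roleId);
  if (!team || !role) {
    // Fail loudly rather than export a dangling reference.
    throw new Error(`Unresolvable reference in membership for ${row.userId}`);
  }
  return { userId: row.userId, email: row.email, team, role };
}
```

Throwing on an unresolved ID is a deliberate choice here: a missing team or role is easier to fix before the export ships than after the receiving provider has imported it.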
Anti-Pattern 2: Non-Standard Timestamps
// BAD — ambiguous, timezone unclear
{
"created": "05/31/26 8:03 PM",
"updated": 1748700180
}
// GOOD — ISO 8601 with explicit UTC offset
{
"createdAt": "2026-05-31T20:03:00Z",
"updatedAt": "2026-05-31T20:03:00Z"
}
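A small normalizer can enforce the GOOD form at the serialization boundary. The sketch below is one possible approach: `toIsoUtc` is a hypothetical helper, and the threshold used to tell epoch seconds from epoch milliseconds is an assumption that should be checked against your own data.

```typescript
// Hypothetical helper: accepts a Date, epoch seconds, or epoch milliseconds.
function toIsoUtc(value: Date | number): string {
  // Assumption: numbers below 1e12 are epoch seconds, larger ones are milliseconds.
  const date =
    value instanceof Date ? value : new Date(value < 1e12 ? value * 1000 : value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid timestamp: ${String(value)}`);
  }
  return date.toISOString(); // always UTC, always with a trailing "Z"
}
```

Free-text values such as "05/31/26 8:03 PM" are intentionally not accepted: without knowing the source timezone and date order they cannot be converted reliably, so they are better fixed where the data is stored.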
Anti-Pattern 3: Opaque Enum Values
// BAD — receiving provider doesn't know what status=2 means
{
"userId": "u_abc123",
"status": 2,
"plan": "p3",
"tier": 4
}
// GOOD — human-readable, stable string values
{
"userId": "u_abc123",
"status": "active",
"plan": "professional",
"tier": "standard"
}
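Translating internal codes into stable strings is usually a lookup table plus a hard failure for unmapped values. In the sketch below only the mapping of 2 to "active" comes from the example above; the other codes and the name `exportStatus` are made up for illustration.

```typescript
// Assumed internal codes; only 2 -> "active" is taken from the example above.
const STATUS_LABELS = new Map<number, string>([
  [1, "pending"],
  [2, "active"],
  [3, "suspended"],
]);

function exportStatus(code: number): string {
  const label = STATUS_LABELS.get(code);
  if (label === undefined) {
    // An unmapped code would otherwise leak an opaque value into the export.
    throw new Error(`Unmapped status code: ${code}`);
  }
  return label;
}
```

Failing on unknown codes means a newly introduced internal status cannot reach an export until someone has given it a documented name.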
Anti-Pattern 4: Missing Pagination Context in Large Exports
// BAD — customer has 50,000 records but export is a single 500MB file with no structure
[{ "id": "..." }, ...]
// GOOD — chunked export with manifest
// manifest.json lists:
// - users_001.ndjson (10,000 records)
// - users_002.ndjson (10,000 records)
// - users_003.ndjson (10,000 records)
// etc.
// Each file is independently parseable
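The chunked layout can be produced with a small async generator that groups a record stream into fixed-size batches, one batch per file. This is a sketch: `chunkRecords` and `chunkFileName` are hypothetical helpers, and the file naming simply mirrors the listing above.

```typescript
// Groups an async stream of records into arrays of at most `chunkSize` items.
async function* chunkRecords<T>(
  records: AsyncIterable<T>,
  chunkSize: number
): AsyncGenerator<T[]> {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error("chunkSize must be a positive integer");
  }
  let chunk: T[] = [];
  for await (const record of records) {
    chunk.push(record);
    if (chunk.length === chunkSize) {
      yield chunk;
      chunk = [];
    }
  }
  if (chunk.length > 0) yield chunk; // final, possibly shorter chunk
}

// Zero-based chunk index -> "users_001.ndjson", "users_002.ndjson", ...
function chunkFileName(prefix: string, index: number): string {
  return `${prefix}_${String(index + 1).padStart(3, "0")}.ndjson`;
}
```

Each yielded batch can be written to its own file and listed in the manifest with its own record count and checksum, so a failed transfer only needs the affected chunk to be retried.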
TypeScript: Multi-Format Export Pipeline
A production-ready export pipeline must support multiple output formats from the same data model. This allows your platform to satisfy both the Data Act's format requirements and individual customer preferences:
import { createWriteStream } from "fs";
import { stat } from "fs/promises";
import { once } from "events";
import { Readable } from "stream";
import { finished, pipeline } from "stream/promises";
import { stringify } from "csv-stringify";

type ExportFormat = "ndjson" | "csv" | "parquet";

interface ExportPipeline<T> {
  format: ExportFormat;
  outputPath: string;
  schema: Record<string, unknown>; // JSON Schema
  write(records: AsyncIterable<T>): Promise<ExportStats>;
}

// UserRecord, SerializableUserRecord, FlatUserRecord, ExportPackage, USER_JSON_SCHEMA,
// CSV_COLUMNS, createBaseManifest, streamUsersForAccount, and computeSHA256 are
// assumed to be defined elsewhere in your codebase.
export class UserExportPipeline implements ExportPipeline<UserRecord> {
  readonly format: ExportFormat;
  readonly outputPath: string;
  readonly schema = USER_JSON_SCHEMA;

  constructor(options: { format: ExportFormat; outputPath: string }) {
    this.format = options.format;
    this.outputPath = options.outputPath;
  }

  async write(records: AsyncIterable<UserRecord>): Promise<ExportStats> {
    const stats: ExportStats = { recordCount: 0, sizeBytes: 0, errors: [] };
    switch (this.format) {
      case "ndjson":
        return this.writeNDJSON(records, stats);
      case "csv":
        return this.writeCSV(records, stats);
      case "parquet":
        // Parquet output needs a dedicated writer library and is not shown here.
        throw new Error("Parquet export is not implemented in this example");
    }
  }

  private async writeNDJSON(
    records: AsyncIterable<UserRecord>,
    stats: ExportStats
  ): Promise<ExportStats> {
    const writeStream = createWriteStream(this.outputPath, {
      encoding: "utf8",
      flags: "w",
    });
    try {
      for await (const record of records) {
        const line = JSON.stringify(this.normalizeRecord(record)) + "\n";
        // Respect backpressure: wait for "drain" when the internal buffer is full.
        if (!writeStream.write(line)) {
          await once(writeStream, "drain");
        }
        stats.recordCount++;
        stats.sizeBytes += Buffer.byteLength(line, "utf8");
      }
    } finally {
      writeStream.end();
      // Resolves once all data is flushed; rejects if the stream errored.
      await finished(writeStream);
    }
    return stats;
  }

  private async writeCSV(
    records: AsyncIterable<UserRecord>,
    stats: ExportStats
  ): Promise<ExportStats> {
    const flatten = (record: UserRecord) => this.flattenRecord(record);
    async function* rows() {
      for await (const record of records) {
        stats.recordCount++;
        yield flatten(record);
      }
    }
    try {
      // Connect the whole chain up front so rows stream to disk instead of
      // accumulating in the stringifier's buffer.
      await pipeline(
        Readable.from(rows()),
        stringify({
          header: true,
          columns: CSV_COLUMNS, // flat column list for tabular export
        }),
        createWriteStream(this.outputPath, { encoding: "utf8" })
      );
      stats.sizeBytes = (await stat(this.outputPath)).size;
    } catch (error) {
      stats.errors.push(error instanceof Error ? error.message : String(error));
    }
    return stats;
  }

  private normalizeRecord(record: UserRecord): SerializableUserRecord {
    return {
      id: record.id,
      email: record.email,
      displayName: record.displayName ?? null,
      locale: record.locale ?? "en-US",
      timezone: record.timezone ?? "UTC",
      createdAt: record.createdAt.toISOString(),
      updatedAt: record.updatedAt.toISOString(),
      status: record.status,
    };
  }

  private flattenRecord(record: UserRecord): FlatUserRecord {
    return {
      id: record.id,
      email: record.email,
      display_name: record.displayName ?? "",
      locale: record.locale ?? "en-US",
      timezone: record.timezone ?? "UTC",
      created_at: record.createdAt.toISOString(),
      updated_at: record.updatedAt.toISOString(),
      status: record.status,
    };
  }
}

// Multi-format coordinator
export async function generateSwitchingExport(
  accountId: string,
  options: { formats: ExportFormat[]; outputDir: string }
): Promise<ExportPackage> {
  const results: ExportPackage = {
    files: [],
    manifest: createBaseManifest(accountId),
  };
  for (const format of options.formats) {
    const fileName = `users.${format}`;
    const outputPath = `${options.outputDir}/${fileName}`;
    const exporter = new UserExportPipeline({ format, outputPath });
    const stats = await exporter.write(streamUsersForAccount(accountId));
    if (stats.errors.length > 0) {
      // Do not list a file in the manifest if writing it failed.
      throw new Error(`Export failed for ${fileName}: ${stats.errors.join("; ")}`);
    }
    results.files.push({
      path: fileName,
      format,
      encoding: "utf-8",
      schemaRef: "https://api.yoursaas.com/data-act/export-schema#/components/schemas/UserRecord",
      recordCount: stats.recordCount,
      checksum: await computeSHA256(outputPath),
    });
  }
  return results;
}

interface ExportStats {
  recordCount: number;
  sizeBytes: number;
  errors: string[];
}
Interoperability Profiles: The GAIA-X Approach
The EU's GAIA-X initiative and the European Data Standards Body (EDSB) are developing sector-specific interoperability profiles for cloud switching. These profiles specify:
- Required fields for each data category
- Preferred formats and encoding conventions
- Semantic vocabularies (schema.org, Dublin Core, DCAT)
- Validation rules for receiving providers
While mandatory profiles are not yet finalized for general SaaS, EDSB working groups have published guidance for specific sectors (financial services, health, mobility). Even where profiles are not yet mandatory, implementing them positions your platform ahead of the compliance curve.
The DCAT vocabulary (W3C Data Catalog Vocabulary) is the leading semantic layer for Data Act exports, particularly for B2B data spaces. For customer-facing exports, JSON Schema is sufficient.
Minimal DCAT-Compatible Export Manifest
{
"@context": {
"dcat": "http://www.w3.org/ns/dcat#",
"dct": "http://purl.org/dc/terms/",
"foaf": "http://xmlns.com/foaf/0.1/",
"spdx": "http://spdx.org/rdf/terms#",
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@type": "dcat:Dataset",
"@id": "urn:export:550e8400-e29b-41d4-a716-446655440000",
"dct:title": "YourSaaS Account Export — Account abc123",
"dct:description": "EU Data Act cloud switching export for customer account abc123",
"dct:issued": {
"@type": "xsd:dateTime",
"@value": "2026-05-31T20:00:00Z"
},
"dct:publisher": {
"@id": "https://yoursaas.com",
"@type": "foaf:Organization"
},
"dct:license": {
"@id": "https://creativecommons.org/licenses/by/4.0/"
},
"dcat:distribution": [
{
"@type": "dcat:Distribution",
"dcat:downloadURL": "urn:local:users.ndjson",
"dcat:mediaType": "application/x-ndjson",
"dct:format": "NDJSON",
"dcat:byteSize": 1048576,
"spdx:checksum": {
"spdx:algorithm": "SHA-256",
"spdx:checksumValue": "a3f5d..."
}
}
]
}
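If you already produce per-file manifest entries, the `dcat:distribution` array can be derived from them rather than maintained by hand. The mapping below is a sketch: `toDcatDistribution` is a hypothetical helper, the `sizeBytes` input field is an assumption, and only formats with a media type listed in the table are handled.

```typescript
// Media types for the text formats used in this post; extend as needed.
const MEDIA_TYPES: Record<string, string> = {
  ndjson: "application/x-ndjson",
  json: "application/json",
  csv: "text/csv",
};

interface FileEntry {
  path: string;
  format: string;
  checksum: string;  // SHA-256 hex
  sizeBytes: number; // assumed to be tracked alongside the checksum
}

function toDcatDistribution(file: FileEntry) {
  const mediaType = MEDIA_TYPES[file.format];
  if (!mediaType) {
    throw new Error(`No media type mapping for format: ${file.format}`);
  }
  return {
    "@type": "dcat:Distribution",
    "dcat:downloadURL": `urn:local:${file.path}`,
    "dcat:mediaType": mediaType,
    "dct:format": file.format.toUpperCase(),
    "dcat:byteSize": file.sizeBytes,
    "spdx:checksum": {
      "spdx:algorithm": "SHA-256",
      "spdx:checksumValue": file.checksum,
    },
  };
}
```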
Validation: Testing Your Export for Interoperability
Before publishing your export feature, run it through this validation checklist:
Automated Validation Checklist
import Ajv from "ajv";
import addFormats from "ajv-formats";
export async function validateExportCompliance(
exportDir: string
): Promise<ComplianceReport> {
const report: ComplianceReport = { passed: [], failed: [], warnings: [] };
const ajv = new Ajv({ strict: true, allErrors: true });
addFormats(ajv);
// 1. Manifest validation
const manifestPath = `${exportDir}/manifest.json`;
const manifest = await loadJSON(manifestPath);
const manifestValid = ajv.validate(MANIFEST_SCHEMA, manifest);
manifestValid
? report.passed.push("Manifest schema valid")
: report.failed.push(`Manifest invalid: ${ajv.errorsText()}`);
// 2. All referenced files exist
for (const content of manifest.exportContents ?? []) {
for (const file of content.files ?? []) {
const exists = await fileExists(`${exportDir}/${file.path}`);
exists
? report.passed.push(`File exists: ${file.path}`)
: report.failed.push(`Missing file: ${file.path}`);
}
}
// 3. Checksums match
for (const file of getAllFiles(manifest)) {
const computed = await computeSHA256(`${exportDir}/${file.path}`);
computed === file.checksum
? report.passed.push(`Checksum OK: ${file.path}`)
: report.failed.push(
`Checksum mismatch: ${file.path} (expected ${file.checksum}, got ${computed})`
);
}
// 4. Record count consistency
for (const content of manifest.exportContents ?? []) {
for (const file of content.files ?? []) {
if (file.format === "ndjson" && file.recordCount !== undefined) {
const count = await countNDJSONLines(`${exportDir}/${file.path}`);
count === file.recordCount
? report.passed.push(`Record count OK: ${file.path}`)
: report.failed.push(
`Record count mismatch: ${file.path} (manifest says ${file.recordCount}, actual ${count})`
);
}
}
}
// 5. Timestamp format validation (sample first 100 records)
for (const file of getAllNDJSONFiles(manifest)) {
const sample = await readFirstNRecords(`${exportDir}/${file.path}`, 100);
for (const record of sample) {
for (const [key, value] of Object.entries(record)) {
if (key.endsWith("At") || key.endsWith("Date") || key === "timestamp") {
const isISO = ISO_DATE_REGEX.test(String(value));
if (!isISO) {
report.failed.push(
`Non-ISO timestamp in ${file.path}: ${key}=${value}`
);
}
}
}
}
}
// 6. UTF-8 encoding check
for (const textFile of getTextFiles(manifest)) {
const isUTF8 = await checkUTF8(`${exportDir}/${textFile.path}`);
isUTF8
? report.passed.push(`UTF-8 encoding OK: ${textFile.path}`)
: report.failed.push(`Non-UTF-8 file: ${textFile.path}`);
}
return report;
}
const ISO_DATE_REGEX = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;
interface ComplianceReport {
passed: string[];
failed: string[];
warnings: string[];
}
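The checklist relies on helpers such as `checkUTF8` and `countNDJSONLines` that are not shown. In-memory versions of the two could look like the following sketch; the names `isValidUtf8` and `countNdjsonRecords` are illustrative, and a production checker would more likely read the files as streams.

```typescript
// Returns true when the bytes decode as UTF-8 without any invalid sequences.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    // fatal: true makes the decoder throw instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Counts non-empty lines and verifies that each one parses as JSON.
function countNdjsonRecords(text: string): number {
  let count = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline
    JSON.parse(line); // throws SyntaxError on a malformed record
    count++;
  }
  return count;
}
```

Parsing every line while counting is slower than counting newlines, but it catches truncated records that a plain line count would report as valid.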
Schema Versioning and Migration
Your export schema will evolve. New fields, deprecated fields, restructured objects. The Data Act's interoperability requirement includes an implicit versioning obligation: if you change your export format, receiving providers must be able to update their import pipeline.
Best practices for schema versioning under the Data Act:
- Semantic versioning in the manifest: include schemaVersion: "1.2.0" in every manifest.json
- Breaking change notice: minimum 90 days advance notice before breaking schema changes (analogous to the API change notice requirement)
- Migration guides: publish diff documentation at your schema URL (e.g. https://api.yoursaas.com/data-act/schema-changelog)
- Additive-only for minor versions: only add new optional fields in minor versions, never remove or rename required fields
- Schema URL stability: the schema URL referenced in your manifest must remain stable; use /data-act/export-schema/v1 not /data-act/export-schema
export const SCHEMA_VERSIONS = {
"1.0.0": {
released: "2025-09-12",
deprecated: null,
schemaUrl: "https://api.yoursaas.com/data-act/export-schema/v1.0",
breakingChanges: [],
},
"1.1.0": {
released: "2026-01-15",
deprecated: null,
schemaUrl: "https://api.yoursaas.com/data-act/export-schema/v1.1",
breakingChanges: [],
additions: ["user.metadata", "team.parentTeamId"],
},
} as const;
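export function getCurrentSchemaVersion(): string {
  // Numeric collation so that "1.10.0" sorts after "1.9.0" (a plain sort() would not).
  const versions = Object.keys(SCHEMA_VERSIONS).sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true })
  );
  return versions[versions.length - 1];
}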
export function isDeprecatedSchema(version: string): boolean {
const schema = SCHEMA_VERSIONS[version as keyof typeof SCHEMA_VERSIONS];
return !!schema?.deprecated;
}
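The additive-only rule for minor versions can be checked mechanically before a schema is published. The sketch below compares only the top-level `properties` and `required` lists of two object schemas; `findBreakingChanges` is a hypothetical helper and does not inspect nested objects, types, or enums.

```typescript
interface ObjectSchema {
  required?: string[];
  properties?: Record<string, unknown>;
}

// Returns the changes that would make `next` a breaking release relative to `previous`.
function findBreakingChanges(previous: ObjectSchema, next: ObjectSchema): string[] {
  const problems: string[] = [];
  const nextFields = new Set(Object.keys(next.properties ?? {}));
  for (const name of Object.keys(previous.properties ?? {})) {
    // A rename looks like a removal plus an addition, so both are caught here.
    if (!nextFields.has(name)) problems.push(`removed or renamed field: ${name}`);
  }
  const previouslyRequired = new Set(previous.required ?? []);
  for (const name of next.required ?? []) {
    if (!previouslyRequired.has(name)) problems.push(`newly required field: ${name}`);
  }
  return problems;
}
```

Run in CI against the last published schema, an empty result suggests a minor version bump is enough, while any reported problem calls for a new major version and the advance notice described above.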
What's Coming from the EDSB
The European Data Standards Body (EDSB), established under the Data Act, is tasked with developing common specifications for cloud switching. As of mid-2026, the EDSB has published early-stage guidance on:
- Reference data formats for general business data
- Minimum data set specifications for common SaaS categories (project management, CRM, document management)
- API gateway specifications for switching request submission
These specifications are not yet mandatory, but regulators are expected to adopt them via delegated acts once finalized. Implementing EDSB guidance now reduces future migration risk.
Practical recommendation: sign up for EDSB consultation alerts and subscribe to the ENISA cloud switching guidance publication RSS. When sector-specific profiles are published for your industry, your lead time to implement is typically 12-18 months from publication to enforcement.
Implementation Checklist
Before your Data Act export feature goes live, verify:
- Export produces JSON, NDJSON, or Parquet (not proprietary binary)
- All timestamps are ISO 8601 UTC (2026-05-31T20:00:00Z)
- All enum/status values are human-readable strings, not opaque codes
- No internal ID-only references: all foreign keys include display names
- manifest.json at archive root with complete file inventory and checksums
- OpenAPI/JSON Schema document published at stable public URL
- Schema URL referenced in every manifest
- UTF-8 encoding throughout
- Export validated by your automated compliance checker (see above)
- Schema version declared and SCHEMA_VERSIONS changelog maintained
- Breaking changes policy documented in public developer docs
Next in This Series
The next post covers Art. 23 switching obstacle compliance: identifying and removing the hidden lock-in patterns that regulators will flag, such as service degradation during switching, removal of integrations, and contractual terms that restrict export scope. We'll provide an audit framework and the TypeScript patterns for safe switching window enforcement.
The EU Data Act (Regulation (EU) 2023/2854) text is available at EUR-Lex. EDSB guidance documents are published at the European Commission data economy portal.