GDPR Article 25: Privacy by Design and by Default — The Developer's Implementation Guide
GDPR Article 25 is one of the most technically demanding provisions of the General Data Protection Regulation, yet it is routinely reduced to a compliance checkbox in data processing agreements and privacy policies. For developers actually building systems that process the personal data of EU residents, Article 25 is an engineering requirement — it mandates specific technical and organisational measures at the design stage, not as an afterthought.
This guide covers what Article 25 actually requires, how it translates into concrete implementation decisions across your stack, and why infrastructure jurisdiction is a structural component of Privacy by Design compliance.
What Article 25 Actually Says
Article 25 contains two distinct obligations:
Article 25(1) — Privacy by Design (PbD): The controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures which are designed to implement data protection principles effectively and to integrate the necessary safeguards into the processing.
Article 25(2) — Privacy by Default (PbDf): The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility.
The key distinction: PbD is about building privacy into the system architecture from the start. PbDf is about the default configuration — what happens when a user does nothing. If your API collects more data than minimally necessary by default, or if your system stores data indefinitely by default, or if your admin interface exposes all user data to all staff by default, you fail Article 25(2) regardless of your PbD measures.
Article 25 in Context: The GDPR Data Protection Principles
Article 25 exists to operationalise the core data protection principles from Article 5. Understanding the link makes implementation clearer:
| Article 5 Principle | Article 25 Engineering Requirement |
|---|---|
| Lawfulness, fairness, transparency (Art. 5(1)(a)) | Audit trails; data processing maps; consent management at API layer |
| Purpose limitation (Art. 5(1)(b)) | Separate data stores per processing purpose; no cross-contamination of datasets |
| Data minimisation (Art. 5(1)(c)) | Collect only required fields; nullable rather than mandatory for non-essential data |
| Accuracy (Art. 5(1)(d)) | Update mechanisms; validation at input; no stale data propagation |
| Storage limitation (Art. 5(1)(e)) | Automated retention policies; scheduled deletion jobs; TTL on sensitive fields |
| Integrity and confidentiality (Art. 5(1)(f)) | Encryption at rest and in transit; pseudonymisation; access control |
| Accountability (Art. 5(2)) | Documentation of technical measures; DPIA for high-risk processing |
Article 25 is not a separate compliance track — it is the engineering implementation of these seven principles into your system's technical architecture.
Layer 1: API Design and Data Collection
The API layer is where data minimisation starts. Every field you collect creates a processing obligation under GDPR. Article 25 requires you to design APIs that collect the minimum data necessary for the stated purpose.
Data Minimisation at the Request Level
```typescript
// ❌ Non-compliant: collecting more than necessary for newsletter signup
interface NewsletterSignupRequest {
  email: string
  firstName: string
  lastName: string
  dateOfBirth: string // Not needed for newsletter
  phoneNumber: string // Not needed for newsletter
  address: string     // Not needed for newsletter
  gender: string      // Not needed — and may reveal special category data
}

// ✅ Compliant: minimal data for the stated purpose
interface NewsletterSignupRequest {
  email: string       // Required — the service delivery mechanism
  firstName?: string  // Optional — personalisation only if user provides it
}
```
The principle: if you cannot explain why you need a specific data point for the specific processing purpose, do not collect it. This is not just a GDPR requirement — it reduces your attack surface, simplifies your data model, and reduces breach impact.
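A schema that silently drops unknown fields still invites over-collection upstream; rejecting them makes minimisation observable. A minimal sketch without any library — field names are illustrative:

```typescript
// Accept only an allow-listed set of fields and reject requests carrying
// anything else, so over-collection fails loudly at the API boundary
// instead of silently entering the system.
type ParseResult =
  | { ok: true; data: { email: string; firstName?: string } }
  | { ok: false; error: string }

const ALLOWED_FIELDS = new Set(['email', 'firstName'])

function parseNewsletterSignup(body: Record<string, unknown>): ParseResult {
  // Reject unknown fields outright — data minimisation enforced at parse time
  for (const key of Object.keys(body)) {
    if (!ALLOWED_FIELDS.has(key)) {
      return { ok: false, error: `unexpected field: ${key}` }
    }
  }
  if (typeof body.email !== 'string' || !body.email.includes('@')) {
    return { ok: false, error: 'valid email required' }
  }
  return {
    ok: true,
    data: {
      email: body.email,
      ...(typeof body.firstName === 'string' ? { firstName: body.firstName } : {}),
    },
  }
}
```

If you use a schema library such as Zod, `.strict()` on `z.object(...)` gives the same unknown-key rejection.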
Nullable vs Mandatory: Default Privacy
Article 25(2) requires that by default, only data necessary for each specific purpose is processed. In API terms, this means non-essential fields must be optional, not required — and your system must function correctly without them.
```typescript
// ❌ Fails Art. 25(2): all fields mandatory by default
const schema = z.object({
  email: z.string().email(),
  phone: z.string(), // Mandatory when not needed for core service
  marketingConsent: z.boolean().default(true), // Opted in by default
})

// ✅ Compliant: minimal mandatory fields, optional for non-essential
const schema = z.object({
  email: z.string().email(),
  phone: z.string().optional(),
  marketingConsent: z.boolean().default(false), // Opted out by default
})
```
The `marketingConsent: false` default follows directly from Article 25(2): consent for secondary processing purposes must not be pre-ticked or defaulted to true — a position confirmed by the CJEU in Planet49 (C-673/17) and consistently enforced across EU DPAs (French CNIL, German DSK, Irish DPC).
Purpose-Bound API Endpoints
Separate API endpoints for separate processing purposes. This enforces purpose limitation at the architectural level and makes data processing maps accurate:
```typescript
// POST /auth/register        — identity verification only
// POST /newsletter/subscribe — marketing communications only
// POST /analytics/events     — usage telemetry (if consent given)
// POST /support/tickets      — customer support processing only
```
Mixing processing purposes in a single endpoint makes it structurally impossible to demonstrate purpose limitation. Data minimisation and purpose separation also simplify your Article 30 Record of Processing Activities (ROPA) — each endpoint maps to one or a few processing activities.
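One way to make the endpoint-to-purpose mapping explicit in code is a small purpose register that can drive both input validation and the ROPA. A sketch — the endpoints, purposes, and field lists are illustrative:

```typescript
// Each endpoint declares its single processing purpose, legal basis, and
// the only fields it may accept. The same structure can feed the ROPA.
interface ProcessingPurpose {
  purpose: string
  legalBasis: 'contract' | 'consent' | 'legitimate_interest' | 'legal_obligation'
  allowedFields: string[]
}

const endpointPurposes: Record<string, ProcessingPurpose> = {
  'POST /auth/register': {
    purpose: 'identity verification',
    legalBasis: 'contract',
    allowedFields: ['email', 'password'],
  },
  'POST /newsletter/subscribe': {
    purpose: 'marketing communications',
    legalBasis: 'consent',
    allowedFields: ['email', 'firstName'],
  },
}

// Reject any request field outside the endpoint's declared purpose
function fieldsAllowed(endpoint: string, body: Record<string, unknown>): boolean {
  const entry = endpointPurposes[endpoint]
  if (!entry) return false
  return Object.keys(body).every((k) => entry.allowedFields.includes(k))
}
```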
Layer 2: Database Schema Design
Database schema decisions made at the design stage are the most expensive to change retroactively. Article 25 explicitly applies "at the time of the determination of the means for processing" — meaning schema design is in scope.
Pseudonymisation at the Schema Level
GDPR Article 4(5) defines pseudonymisation as processing personal data such that it can no longer be attributed to a specific individual without additional information. Article 25 specifically references pseudonymisation as a relevant technical measure.
```sql
-- ❌ Non-compliant: direct identifier in analytics events
CREATE TABLE analytics_events (
  id UUID PRIMARY KEY,
  user_email VARCHAR, -- Direct identifier — links event to individual
  event_type VARCHAR,
  created_at TIMESTAMPTZ
);

-- ✅ Pseudonymised: internal ID separates identity from behaviour
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- provides digest()

CREATE TABLE users (
  id UUID PRIMARY KEY,
  email VARCHAR UNIQUE NOT NULL, -- Identity data — restricted access
  email_hash VARCHAR GENERATED ALWAYS AS (encode(digest(email, 'sha256'), 'hex')) STORED
);

CREATE TABLE analytics_events (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id), -- Pseudonymous reference
  event_type VARCHAR,
  created_at TIMESTAMPTZ
);

-- Separate the key: analytics team cannot link user_id to email without JOIN privilege
```
The architecture separates the identity store (users table, restricted access) from the behavioural data (analytics_events, broader read access). This is pseudonymisation by design: the events table alone does not identify individuals.
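In PostgreSQL, that separation can be enforced with role grants. A sketch with illustrative role names:

```sql
-- The analytics role may read behavioural events but not the identity
-- table, so user_id values remain pseudonymous for anyone using that role
CREATE ROLE analytics_reader NOLOGIN;
GRANT SELECT ON analytics_events TO analytics_reader;
REVOKE ALL ON users FROM analytics_reader;

-- Only a tightly held role can perform the re-identifying JOIN
CREATE ROLE identity_admin NOLOGIN;
GRANT SELECT ON users, analytics_events TO identity_admin;
```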
Storage Limitation: Automated Retention
Article 5(1)(e) and Article 25 together require that data is not stored longer than necessary. This must be enforced technically, not just in a privacy policy:
```sql
-- Automated deletion via pg_cron or a scheduled job
-- Example: enforce per-category retention periods
CREATE OR REPLACE FUNCTION delete_expired_data()
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  DELETE FROM user_sessions    WHERE created_at < NOW() - INTERVAL '30 days';
  DELETE FROM audit_logs       WHERE created_at < NOW() - INTERVAL '1 year';
  DELETE FROM analytics_events WHERE created_at < NOW() - INTERVAL '2 years';
END;
$$;

-- Schedule: daily at 02:00 UTC
SELECT cron.schedule('delete-expired-data', '0 2 * * *', 'SELECT delete_expired_data()');
```
Retention periods must match your privacy policy and ROPA. They should be differentiated by data category — session tokens (hours/days), transaction records (7 years for tax purposes), marketing interactions (consent lifetime + reasonable period).
Separate Schemas for Separate Purposes
Database-level separation of processing purposes:
```sql
-- Schema separation enforces purpose limitation
CREATE SCHEMA identity;  -- Users, authentication, account data
CREATE SCHEMA commerce;  -- Orders, payments, invoices
CREATE SCHEMA analytics; -- Anonymised/pseudonymised usage data
CREATE SCHEMA support;   -- Support tickets, communications

-- Role-based access: analytics team cannot read the identity schema
-- (USAGE alone does not permit reads — table privileges are needed too)
GRANT USAGE ON SCHEMA analytics TO analytics_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analytics_role;
REVOKE USAGE ON SCHEMA identity FROM analytics_role;
```
This makes it structurally impossible for analytics processing to contaminate identity data — enforcing purpose limitation at the database access control layer.
Layer 3: Logging and Monitoring
Logging is one of the most frequent Article 25 failure points. Application logs routinely capture personal data — IP addresses, email addresses, user-agent strings, query parameters — without any data minimisation or retention policy.
What Counts as Personal Data in Logs
Under GDPR, personal data is any information relating to an identified or identifiable natural person. In logs:
- Direct identifiers: Email addresses, usernames, national ID numbers
- Quasi-identifiers: IP addresses (identifiable with ISP cooperation — CJEU C-582/14 Breyer), device fingerprints, session tokens
- Indirect identifiers: User IDs (if linked to identifiable persons in another system)
- Behavioural data: Specific sequences of actions attributable to an individual
```typescript
// ❌ Non-compliant: full personal data in application logs
logger.info({
  event: 'user_login',
  email: user.email, // Direct identifier in logs
  ip: req.ip,        // Quasi-identifier
  userAgent: req.headers['user-agent'],
  userId: user.id,
})

// ✅ Privacy-by-design logging: pseudonymous identifiers only
import { createHash } from 'crypto'

const pseudonymise = (value: string, salt: string): string =>
  createHash('sha256').update(value + salt).digest('hex').slice(0, 16)

logger.info({
  event: 'user_login',
  userRef: pseudonymise(user.id, process.env.LOG_PSEUDONYM_SALT!), // Pseudonymous
  ipRegion: getRegionFromIp(req.ip), // Aggregated — not individually identifiable
  // No email, no full IP, no user-agent
})
```
Structured Log Retention Policies
```typescript
// Structured log retention: different TTLs for different log types
const logConfig = {
  security_events: { retention: '1 year', justification: 'security incident investigation' },
  access_logs: { retention: '90 days', justification: 'operational debugging' },
  application_errors: { retention: '30 days', justification: 'bug investigation' },
  analytics_events: { retention: '2 years', justification: 'product improvement (pseudonymised)' },
}
```
Document this in your ROPA. EU DPAs increasingly audit log retention as part of Article 25 enforcement.
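A retention table is only a compliant measure if something enforces it. A sketch of turning each entry into a deletion cutoff — the actual deletion call is left abstract, and the helper names are illustrative:

```typescript
// Convert a human-readable retention period into a concrete cutoff date;
// records older than the cutoff are due for deletion.
const UNIT_DAYS: Record<string, number> = { day: 1, days: 1, year: 365, years: 365 }

function retentionCutoff(retention: string, now: Date = new Date()): Date {
  const [amount, unit] = retention.split(' ')
  const days = Number(amount) * (UNIT_DAYS[unit] ?? NaN)
  if (Number.isNaN(days)) throw new Error(`unparseable retention period: ${retention}`)
  return new Date(now.getTime() - days * 24 * 60 * 60 * 1000)
}

// A scheduled job would iterate the configured categories and delete
// anything older than each category's cutoff (deletion call omitted).
const retentionConfig = {
  security_events: { retention: '1 year' },
  access_logs: { retention: '90 days' },
  application_errors: { retention: '30 days' },
}

const cutoffs: Record<string, Date> = {}
for (const [category, { retention }] of Object.entries(retentionConfig)) {
  cutoffs[category] = retentionCutoff(retention)
}
```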
Infrastructure-Level Log Isolation
Your logging pipeline itself can be a GDPR compliance risk. If application logs flow through a US-incorporated logging SaaS (Datadog, Splunk, Logtail on US infrastructure), personal data in logs is subject to US CLOUD Act jurisdiction — meaning US law enforcement can compel the logging provider to disclose log contents without notifying the EU data subject.
This is not theoretical: the Austrian DSB ruled in January 2022 that using Google Analytics (data flows to US servers) violated the GDPR's Chapter V transfer rules. The same logic applies to any personal data processed on US infrastructure, including logs.
For applications processing significant EU personal data, the Article 25 response is to route logs to EU-incorporated infrastructure with no US parent company — where CLOUD Act jurisdiction does not reach.
Layer 4: Authentication and Session Management
Authentication systems are high-risk for Article 25 non-compliance because they inherently process identity data, and default configurations often favour convenience over privacy.
Federated Authentication: Privacy by Default
Federated authentication (OAuth 2.0 / OIDC with EU-incorporated identity providers) reduces the personal data your system must store:
```typescript
// ❌ Custom auth: storing sensitive identity data in your database
// You store: email, hashed password, email verification tokens,
// password reset tokens, MFA secrets, session tokens — all personal data

// ✅ Federated auth: minimal personal data stored locally
// You receive: sub (opaque identifier), email (if requested and needed)
// You store: sub, profile completion data — identity verification delegated

// With an EU-incorporated OIDC provider (e.g., Keycloak self-hosted on an EU PaaS):
// - Authentication handled by the IdP
// - Your system stores only the opaque 'sub' claim
// - No passwords, no email verification tokens to manage
// - A breach of your database reveals no authenticable credentials
```
Federated authentication with EU-incorporated identity providers also reduces third-country transfer risk: if you use a US-incorporated IdP (Auth0, Okta, Cognito), the authentication flow sends user credentials to US infrastructure.
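The data-minimising effect can be made explicit at the persistence boundary. A sketch — types and names are illustrative, and token verification is assumed to be handled by an OIDC library:

```typescript
// Hypothetical local account shape after federated login: only the opaque
// IdP subject identifier is persisted, plus app-level profile data.
interface LocalAccount {
  sub: string          // opaque subject from the IdP — pseudonymous on its own
  displayName?: string // optional, user-provided app data
}

// Claims come from an ID token already verified upstream. Email and name
// are deliberately dropped unless a documented purpose requires them.
function accountFromClaims(claims: { sub: string; email?: string; name?: string }): LocalAccount {
  return { sub: claims.sub }
}
```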
Session Token Design
```typescript
// ❌ Long-lived sessions by default
const sessionConfig = {
  maxAge: 30 * 24 * 60 * 60 * 1000, // 30 days by default
  rolling: true, // Extends on activity
}

// ✅ Privacy by default: shorter sessions, explicit extension
const sessionConfig = {
  maxAge: 24 * 60 * 60 * 1000, // 24 hours by default
  rolling: false, // Explicit re-authentication required
  // User can opt in to "remember me" (30 days) — not the default
}
```
Article 25(2) requires that the "period of storage" be limited to what is necessary. Session lifetime is a storage period. A 30-day rolling session is hard to justify for a SaaS with no continuous usage requirement.
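The opt-in extension can be modelled as an explicit parameter, so the short lifetime is what a passive user gets — a minimal sketch:

```typescript
// Session lifetime as an explicit user choice: the short lifetime is the
// default; the 30-day lifetime requires an affirmative "remember me" opt-in.
const DAY_MS = 24 * 60 * 60 * 1000

function sessionMaxAge(rememberMe: boolean): number {
  return rememberMe ? 30 * DAY_MS : DAY_MS
}
```

For the same reason, any "remember me" checkbox feeding this parameter must start unchecked.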
Layer 5: Access Control — Privacy by Default for Internal Systems
Article 25(2) requires that by default, personal data is not accessible to "an indefinite number of natural persons." In internal systems, this means role-based access control (RBAC) is not optional — it is an Article 25 requirement.
Least-Privilege by Default
```typescript
// Define roles with least-privilege defaults
enum Role {
  USER = 'user',
  SUPPORT = 'support',
  ANALYST = 'analyst',
  ADMIN = 'admin',
}

const permissions = {
  [Role.USER]: ['read:own_data', 'write:own_data'],
  [Role.SUPPORT]: ['read:user_profile', 'read:support_tickets'], // No payment data
  [Role.ANALYST]: ['read:aggregated_analytics'], // No individual-level data
  [Role.ADMIN]: ['read:all', 'write:all', 'delete:all'],
}

// ❌ Fails Art. 25(2): all authenticated staff get admin-level access by default
// ✅ Compliant: new staff assigned the USER role by default, elevated by explicit grant
```
The "by default" requirement means the default role for new accounts must be the most restricted. Privilege escalation requires affirmative action, not the reverse.
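A sketch of what the default-then-escalate flow looks like in code (names are illustrative, and the audit call is assumed):

```typescript
// New staff accounts start at the most restricted role; elevation is a
// separate, approved action — never part of account creation.
enum Role { USER = 'user', SUPPORT = 'support', ADMIN = 'admin' }

interface StaffAccount { id: string; role: Role }

function createStaffAccount(id: string): StaffAccount {
  return { id, role: Role.USER } // least privilege by default
}

function elevateRole(account: StaffAccount, to: Role, approvedBy: string): StaffAccount {
  if (!approvedBy) throw new Error('role elevation requires an approver')
  // auditLog.record(...) — hypothetical audit call, required in practice
  return { ...account, role: to }
}
```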
Data Export Controls
Article 25 applies to "accessibility" — who can export or download personal data. Admin interfaces that allow bulk export of user data without logging or approval create Article 25 risk:
```typescript
// Compliant data export: role-restricted, rate-limited, audit-logged
async function exportUserData(requestedBy: string, targetUserId: string) {
  // Check authorisation
  if (!hasPermission(requestedBy, 'export:user_data')) {
    throw new UnauthorizedError('Insufficient permissions for data export')
  }

  // Rate limit before logging: max 100 exports per day per admin —
  // a blocked request should not produce an audit entry claiming an export
  await rateLimit.check(`export:${requestedBy}`, { limit: 100, window: '1d' })

  // Audit log — who exported what, when
  await auditLog.record({
    action: 'data_export',
    requestedBy,
    targetUserId: pseudonymise(targetUserId, AUDIT_SALT),
    timestamp: new Date(),
  })

  return getUserData(targetUserId)
}
```
Layer 6: Infrastructure Jurisdiction as a Privacy by Design Decision
This is the dimension that most PbD frameworks underemphasise: where your infrastructure is incorporated determines which government can compel access to data processed on it.
GDPR Article 25 requires appropriate technical measures for data protection. The CLOUD Act of 2018 (18 U.S.C. § 2713) requires US-incorporated cloud providers to disclose stored data to US law enforcement upon a warrant — regardless of where the data physically resides. This applies to AWS, Azure, Google Cloud, Cloudflare, Supabase (US-incorporated), Vercel, Render, Railway, and most mainstream PaaS providers.
The structural issue: running GDPR-compliant applications on US-incorporated infrastructure creates a dual-jurisdiction architecture where EU data protection authorities and US law enforcement have simultaneous theoretical access to the same data — under incompatible legal frameworks.
The Art. 25 Infrastructure Checklist
For applications processing significant EU personal data volumes, the infrastructure jurisdiction question is an Article 25 technical measure decision:
| Infrastructure Layer | Art. 25 Consideration | EU-Native Alternative |
|---|---|---|
| Compute (PaaS) | CLOUD Act jurisdiction if US-incorporated | EU-incorporated PaaS (e.g., sota.io) |
| Database | Data location + provider jurisdiction | Managed PostgreSQL on EU PaaS |
| CDN | Edge cache locations + provider jurisdiction | EU-only edge nodes |
| Analytics | User behaviour data jurisdiction | Self-hosted Plausible, Matomo (EU servers) |
| Email delivery | Transactional email processing jurisdiction | EU-incorporated email provider |
| Logging | Log data jurisdiction | EU-hosted log aggregation |
| Authentication | Identity data jurisdiction | Self-hosted Keycloak or EU OIDC provider |
Deploying the compute layer on EU-incorporated PaaS — where the provider has no US parent company and is not subject to CLOUD Act jurisdiction — eliminates the dual-jurisdiction problem at the infrastructure level. This is a Privacy by Design decision: structural elimination of a risk, rather than legal mitigation of it.
Schrems II and Infrastructure Selection
The CJEU's Schrems II ruling (C-311/18, July 2020) invalidated Privacy Shield and established that supplementary measures (encryption, pseudonymisation) alone may not be sufficient to protect personal data transferred to the US if US intelligence access obligations can override them. The EDPB's Recommendations 01/2020 on supplementary measures explicitly identify transfers where the US provider can access the data in the clear as a scenario for which no effective supplementary measures exist.
For Art. 25 purposes, the clean solution is not supplementary measures on top of US infrastructure — it is infrastructure that is not subject to US jurisdiction at all. EU-incorporated PaaS providers are not subject to FISA Section 702 (50 U.S.C. § 1881a) or CLOUD Act obligations.
Article 25 and the DPIA Trigger
Article 35 requires a Data Protection Impact Assessment (DPIA) for high-risk processing. Article 25 technical measures determine whether processing crosses into high-risk territory and what the DPIA must document.
Specifically, DPIA mandatory triggers include:
- Systematic and extensive evaluation of personal aspects using automated processing (Art. 35(3)(a)) — e.g., AI-driven credit scoring, recruitment screening
- Large-scale processing of special categories (Art. 35(3)(b)) — health data, biometric data, political opinions
- Systematic monitoring of publicly accessible areas (Art. 35(3)(c)) — surveillance
Art. 25 technical measures (pseudonymisation, minimisation, separation of purposes) can reduce processing from "high risk" to standard risk — which changes the DPIA requirement. Well-designed PbD can be the difference between mandatory DPIA and optional DPIA for borderline processing activities.
Practical Article 25 Implementation Checklist
At design stage (before first line of code):
- Data flow map: what personal data enters the system, where it goes, who can access it
- Processing purpose register: one entry per purpose, with legal basis and data category
- Mandatory vs optional fields: everything optional that is not strictly necessary for the core service
- Default configurations: opted-out for all secondary processing, minimal session duration
- Infrastructure selection: EU-incorporated PaaS for compute, database, logging, CDN
- Retention periods defined: different TTL per data category, enforced technically
At implementation stage:
- API input validation: only required fields accepted per endpoint
- Database schema: pseudonymised identifiers for analytics/logging tables
- Automated deletion: pg_cron or scheduled job for retention enforcement
- RBAC: least-privilege default role, explicit escalation
- Logging: no direct identifiers, pseudonymous user references, regional aggregation for IP
- Session management: minimal default lifetime, explicit opt-in for extended sessions
At deployment stage:
- Infrastructure jurisdiction confirmed: EU-incorporated for all layers handling personal data
- Access control audit: no admin-level default access for new accounts
- Penetration test for data exposure: verify no personal data in error responses or logs
- ROPA entry created: maps to each processing purpose identified at design stage
- DPIA assessment: confirm no mandatory DPIA triggers, or complete DPIA if triggered
Enforcement Landscape: Article 25 in DPA Decisions
Article 25 enforcement has accelerated since 2023. Notable decisions:
- Meta (Irish DPC, May 2023, €1.2B): Data transfer violations, with Art. 25 technical measure deficiencies cited
- Amazon (Luxembourg CNPD, Jul 2021, €746M): Reportedly concerning targeted advertising without valid consent — the decision is not public, but default consent settings are squarely an Art. 25(2) issue
- Google Analytics (Austrian DSB, Jan 2022; French CNIL, Feb 2022): Infrastructure jurisdiction as a Chapter V transfer violation — directly relevant to Art. 25 infrastructure decisions
- TikTok (Irish DPC, Sep 2023, €345M): Default privacy settings for minors — Art. 25(2) failure on default accessibility
The pattern: DPAs are enforcing not just documented privacy policies but the actual technical default state of systems. "By default" means the first state a user encounters, not the state after they navigate a settings menu.
Summary
Article 25 reads as a legal provision but lands as an engineering requirement. The obligation falls on the engineers and architects who determine how data is collected, stored, processed, and accessed — not just on the legal team that drafts the privacy policy.
The six implementation layers:
- API design: Collect only what the purpose requires. Make everything else optional. Default consent to off.
- Database schema: Pseudonymise at the schema level. Separate schemas per processing purpose. Automate retention enforcement.
- Logging: No direct identifiers. Pseudonymous references. Structured retention by log category.
- Authentication: Federated auth reduces identity data storage. Short sessions by default.
- Access control: Least-privilege RBAC with the most restricted role as the default. Logged, rate-limited data exports.
- Infrastructure: EU-incorporated PaaS eliminates the CLOUD Act jurisdiction conflict — the structural PbD solution at the infrastructure layer.
Privacy by Design is easier to implement correctly at the design stage than to retrofit onto a running system. Every field added to a data model, every logging call, every default configuration setting is an Article 25 decision. The question is whether those decisions are made deliberately or by accident.