GDPR Article 25: Privacy by Design and by Default — The Developer's Implementation Guide
GDPR Article 25 is one of the most technically demanding provisions of the General Data Protection Regulation, yet it is routinely reduced to a compliance checkbox in data processing agreements and privacy policies. For developers actually building systems that process the personal data of EU residents, Article 25 is an engineering requirement — it mandates specific technical and organisational measures at the design stage, not as an afterthought.
This guide covers what Article 25 actually requires, how it translates into concrete implementation decisions across your stack, and why infrastructure jurisdiction is a structural component of Privacy by Design compliance.
What Article 25 Actually Says
Article 25 contains two distinct obligations:
Article 25(1) — Privacy by Design (PbD): The controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures which are designed to implement data protection principles effectively and to integrate the necessary safeguards into the processing.
Article 25(2) — Privacy by Default (PbDf): The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility.
The key distinction: PbD is about building privacy into the system architecture from the start. PbDf is about the default configuration — what happens when a user does nothing. If your API collects more data than minimally necessary by default, or if your system stores data indefinitely by default, or if your admin interface exposes all user data to all staff by default, you fail Article 25(2) regardless of your PbD measures.
Article 25 in Context: The GDPR Data Protection Principles
Article 25 exists to operationalise the core data protection principles from Article 5. Understanding the link makes implementation clearer:
| Article 5 Principle | Article 25 Engineering Requirement |
|---|---|
| Lawfulness, fairness, transparency (Art. 5(1)(a)) | Audit trails; data processing maps; consent management at API layer |
| Purpose limitation (Art. 5(1)(b)) | Separate data stores per processing purpose; no cross-contamination of datasets |
| Data minimisation (Art. 5(1)(c)) | Collect only required fields; nullable rather than mandatory for non-essential data |
| Accuracy (Art. 5(1)(d)) | Update mechanisms; validation at input; no stale data propagation |
| Storage limitation (Art. 5(1)(e)) | Automated retention policies; scheduled deletion jobs; TTL on sensitive fields |
| Integrity and confidentiality (Art. 5(1)(f)) | Encryption at rest and in transit; pseudonymisation; access control |
| Accountability (Art. 5(2)) | Documentation of technical measures; DPIA for high-risk processing |
Article 25 is not a separate compliance track — it is the engineering implementation of these seven principles into your system's technical architecture.
Layer 1: API Design and Data Collection
The API layer is where data minimisation starts. Every field you collect creates a processing obligation under GDPR. Article 25 requires you to design APIs that collect the minimum data necessary for the stated purpose.
Data Minimisation at the Request Level
```typescript
// ❌ Non-compliant: collecting more than necessary for newsletter signup
interface NewsletterSignupRequest {
  email: string
  firstName: string
  lastName: string
  dateOfBirth: string // Not needed for newsletter
  phoneNumber: string // Not needed for newsletter
  address: string     // Not needed for newsletter
  gender: string      // Not needed — and may reveal special category data
}

// ✅ Compliant: minimal data for the stated purpose
interface NewsletterSignupRequest {
  email: string       // Required — the service delivery mechanism
  firstName?: string  // Optional — personalisation only if user provides it
}
```
The principle: if you cannot explain why you need a specific data point for the specific processing purpose, do not collect it. This is not just a GDPR requirement — it reduces your attack surface, simplifies your data model, and reduces breach impact.
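A schema that silently drops unknown fields still invites over-collection upstream; rejecting them makes minimisation observable. A minimal sketch without any library — field names are illustrative:

```typescript
// Accept only an allow-listed set of fields and reject requests carrying
// anything else, so over-collection fails loudly at the API boundary
// instead of silently entering the system.
type ParseResult =
  | { ok: true; data: { email: string; firstName?: string } }
  | { ok: false; error: string }

const ALLOWED_FIELDS = new Set(['email', 'firstName'])

function parseNewsletterSignup(body: Record<string, unknown>): ParseResult {
  // Reject unknown fields outright — data minimisation enforced at parse time
  for (const key of Object.keys(body)) {
    if (!ALLOWED_FIELDS.has(key)) {
      return { ok: false, error: `unexpected field: ${key}` }
    }
  }
  if (typeof body.email !== 'string' || !body.email.includes('@')) {
    return { ok: false, error: 'valid email required' }
  }
  return {
    ok: true,
    data: {
      email: body.email,
      ...(typeof body.firstName === 'string' ? { firstName: body.firstName } : {}),
    },
  }
}
```

If you use a schema library such as Zod, `.strict()` on `z.object(...)` gives the same unknown-key rejection.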
Nullable vs Mandatory: Default Privacy
Article 25(2) requires that by default, only data necessary for each specific purpose is processed. In API terms, this means non-essential fields must be optional, not required — and your system must function correctly without them.
```typescript
// ❌ Fails Art. 25(2): all fields mandatory by default
const schema = z.object({
  email: z.string().email(),
  phone: z.string(), // Mandatory when not needed for core service
  marketingConsent: z.boolean().default(true), // Opted in by default
})

// ✅ Compliant: minimal mandatory fields, optional for non-essential
const schema = z.object({
  email: z.string().email(),
  phone: z.string().optional(),
  marketingConsent: z.boolean().default(false), // Opted out by default
})
```
The `marketingConsent: false` default follows directly from Article 25(2): consent for secondary processing purposes must not be pre-ticked or defaulted to true — a position confirmed by the CJEU in Planet49 (C-673/17) and consistently enforced across EU DPAs (French CNIL, German DSK, Irish DPC).
Purpose-Bound API Endpoints
Separate API endpoints for separate processing purposes. This enforces purpose limitation at the architectural level and makes data processing maps accurate:
```typescript
// POST /auth/register        — identity verification only
// POST /newsletter/subscribe — marketing communications only
// POST /analytics/events     — usage telemetry (if consent given)
// POST /support/tickets      — customer support processing only
```
Mixing processing purposes in a single endpoint makes it structurally impossible to demonstrate purpose limitation. Data minimisation and purpose separation also simplify your Article 30 Record of Processing Activities (ROPA) — each endpoint maps to one or a few processing activities.
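One way to make the endpoint-to-purpose mapping explicit in code is a small purpose register that can drive both input validation and the ROPA. A sketch — the endpoints, purposes, and field lists are illustrative:

```typescript
// Each endpoint declares its single processing purpose, legal basis, and
// the only fields it may accept. The same structure can feed the ROPA.
interface ProcessingPurpose {
  purpose: string
  legalBasis: 'contract' | 'consent' | 'legitimate_interest' | 'legal_obligation'
  allowedFields: string[]
}

const endpointPurposes: Record<string, ProcessingPurpose> = {
  'POST /auth/register': {
    purpose: 'identity verification',
    legalBasis: 'contract',
    allowedFields: ['email', 'password'],
  },
  'POST /newsletter/subscribe': {
    purpose: 'marketing communications',
    legalBasis: 'consent',
    allowedFields: ['email', 'firstName'],
  },
}

// Reject any request field outside the endpoint's declared purpose
function fieldsAllowed(endpoint: string, body: Record<string, unknown>): boolean {
  const entry = endpointPurposes[endpoint]
  if (!entry) return false
  return Object.keys(body).every((k) => entry.allowedFields.includes(k))
}
```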
Layer 2: Database Schema Design
Database schema decisions made at the design stage are the most expensive to change retroactively. Article 25 explicitly applies "at the time of the determination of the means for processing" — meaning schema design is in scope.
Pseudonymisation at the Schema Level
GDPR Article 4(5) defines pseudonymisation as processing personal data such that it can no longer be attributed to a specific individual without additional information. Article 25 specifically references pseudonymisation as a relevant technical measure.
```sql
-- ❌ Non-compliant: direct identifier in analytics events
CREATE TABLE analytics_events (
  id UUID PRIMARY KEY,
  user_email VARCHAR, -- Direct identifier — links event to individual
  event_type VARCHAR,
  created_at TIMESTAMPTZ
);

-- ✅ Pseudonymised: internal ID separates identity from behaviour
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- provides digest()

CREATE TABLE users (
  id UUID PRIMARY KEY,
  email VARCHAR UNIQUE NOT NULL, -- Identity data — restricted access
  email_hash VARCHAR GENERATED ALWAYS AS (encode(digest(email, 'sha256'), 'hex')) STORED
);

CREATE TABLE analytics_events (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id), -- Pseudonymous reference
  event_type VARCHAR,
  created_at TIMESTAMPTZ
);

-- Separate the key: analytics team cannot link user_id to email without JOIN privilege
```
The architecture separates the identity store (users table, restricted access) from the behavioural data (analytics_events, broader read access). This is pseudonymisation by design: the events table alone does not identify individuals.
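In PostgreSQL, that separation can be enforced with role grants. A sketch with illustrative role names:

```sql
-- The analytics role may read behavioural events but not the identity
-- table, so user_id values remain pseudonymous for anyone using that role
CREATE ROLE analytics_reader NOLOGIN;
GRANT SELECT ON analytics_events TO analytics_reader;
REVOKE ALL ON users FROM analytics_reader;

-- Only a tightly held role can perform the re-identifying JOIN
CREATE ROLE identity_admin NOLOGIN;
GRANT SELECT ON users, analytics_events TO identity_admin;
```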
Storage Limitation: Automated Retention
Article 5(1)(e) and Article 25 together require that data is not stored longer than necessary. This must be enforced technically, not just in a privacy policy:
```sql
-- Automated deletion via pg_cron or a scheduled job
-- Example: enforce per-category retention periods
CREATE OR REPLACE FUNCTION delete_expired_data()
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  DELETE FROM user_sessions    WHERE created_at < NOW() - INTERVAL '30 days';
  DELETE FROM audit_logs       WHERE created_at < NOW() - INTERVAL '1 year';
  DELETE FROM analytics_events WHERE created_at < NOW() - INTERVAL '2 years';
END;
$$;

-- Schedule: daily at 02:00 UTC
SELECT cron.schedule('delete-expired-data', '0 2 * * *', 'SELECT delete_expired_data()');
```
Retention periods must match your privacy policy and ROPA. They should be differentiated by data category — session tokens (hours/days), transaction records (7 years for tax purposes), marketing interactions (consent lifetime + reasonable period).
Separate Schemas for Separate Purposes
Database-level separation of processing purposes:
```sql
-- Schema separation enforces purpose limitation
CREATE SCHEMA identity;  -- Users, authentication, account data
CREATE SCHEMA commerce;  -- Orders, payments, invoices
CREATE SCHEMA analytics; -- Anonymised/pseudonymised usage data
CREATE SCHEMA support;   -- Support tickets, communications

-- Role-based access: analytics team cannot read the identity schema
-- (USAGE alone does not permit reads — table privileges are needed too)
GRANT USAGE ON SCHEMA analytics TO analytics_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analytics_role;
REVOKE USAGE ON SCHEMA identity FROM analytics_role;
```
This makes it structurally impossible for analytics processing to contaminate identity data — enforcing purpose limitation at the database access control layer.
Layer 3: Logging and Monitoring
Logging is one of the most frequent Article 25 failure points. Application logs routinely capture personal data — IP addresses, email addresses, user-agent strings, query parameters — without any data minimisation or retention policy.
What Counts as Personal Data in Logs
Under GDPR, personal data is any information relating to an identified or identifiable natural person. In logs:
- Direct identifiers: Email addresses, usernames, national ID numbers
- Quasi-identifiers: IP addresses (identifiable with ISP cooperation — CJEU C-582/14 Breyer), device fingerprints, session tokens
- Indirect identifiers: User IDs (if linked to identifiable persons in another system)
- Behavioural data: Specific sequences of actions attributable to an individual
```typescript
// ❌ Non-compliant: full personal data in application logs
logger.info({
  event: 'user_login',
  email: user.email, // Direct identifier in logs
  ip: req.ip,        // Quasi-identifier
  userAgent: req.headers['user-agent'],
  userId: user.id,
})

// ✅ Privacy-by-design logging: pseudonymous identifiers only
import { createHash } from 'crypto'

const pseudonymise = (value: string, salt: string): string =>
  createHash('sha256').update(value + salt).digest('hex').slice(0, 16)

logger.info({
  event: 'user_login',
  userRef: pseudonymise(user.id, process.env.LOG_PSEUDONYM_SALT!), // Pseudonymous
  ipRegion: getRegionFromIp(req.ip), // Aggregated — not individually identifiable
  // No email, no full IP, no user-agent
})
```
Structured Log Retention Policies
```typescript
// Structured log retention: different TTLs for different log types
const logConfig = {
  security_events: { retention: '1 year', justification: 'security incident investigation' },
  access_logs: { retention: '90 days', justification: 'operational debugging' },
  application_errors: { retention: '30 days', justification: 'bug investigation' },
  analytics_events: { retention: '2 years', justification: 'product improvement (pseudonymised)' },
}
```
Document this in your ROPA. EU DPAs increasingly audit log retention as part of Article 25 enforcement.
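A retention table is only a compliant measure if something enforces it. A sketch of turning each entry into a deletion cutoff — the actual deletion call is left abstract, and the helper names are illustrative:

```typescript
// Convert a human-readable retention period into a concrete cutoff date;
// records older than the cutoff are due for deletion.
const UNIT_DAYS: Record<string, number> = { day: 1, days: 1, year: 365, years: 365 }

function retentionCutoff(retention: string, now: Date = new Date()): Date {
  const [amount, unit] = retention.split(' ')
  const days = Number(amount) * (UNIT_DAYS[unit] ?? NaN)
  if (Number.isNaN(days)) throw new Error(`unparseable retention period: ${retention}`)
  return new Date(now.getTime() - days * 24 * 60 * 60 * 1000)
}

// A scheduled job would iterate the configured categories and delete
// anything older than each category's cutoff (deletion call omitted).
const retentionConfig = {
  security_events: { retention: '1 year' },
  access_logs: { retention: '90 days' },
  application_errors: { retention: '30 days' },
}

const cutoffs: Record<string, Date> = {}
for (const [category, { retention }] of Object.entries(retentionConfig)) {
  cutoffs[category] = retentionCutoff(retention)
}
```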
Infrastructure-Level Log Isolation
Your logging pipeline itself can be a GDPR compliance risk. If application logs flow through a US-incorporated logging SaaS (Datadog, Splunk, Logtail on US infrastructure), personal data in logs is subject to US CLOUD Act jurisdiction — meaning US law enforcement can compel the logging provider to disclose log contents without notifying the EU data subject.
This is not theoretical: the Austrian DSB ruled in January 2022 that using Google Analytics (data flows to US servers) violated the GDPR's Chapter V transfer rules. The same logic applies to any personal data processed on US infrastructure, including logs.
For applications processing significant EU personal data, the Article 25 response is to route logs to EU-incorporated infrastructure with no US parent company — where CLOUD Act jurisdiction does not reach.
Layer 4: Authentication and Session Management
Authentication systems are high-risk for Article 25 non-compliance because they inherently process identity data, and default configurations often favour convenience over privacy.
Federated Authentication: Privacy by Default
Federated authentication (OAuth 2.0 / OIDC with EU-incorporated identity providers) reduces the personal data your system must store:
```typescript
// ❌ Custom auth: storing sensitive identity data in your database
// You store: email, hashed password, email verification tokens,
// password reset tokens, MFA secrets, session tokens — all personal data

// ✅ Federated auth: minimal personal data stored locally
// You receive: sub (opaque identifier), email (if requested and needed)
// You store: sub, profile completion data — identity verification delegated

// With an EU-incorporated OIDC provider (e.g., Keycloak self-hosted on an EU PaaS):
// - Authentication handled by the IdP
// - Your system stores only the opaque 'sub' claim
// - No passwords, no email verification tokens to manage
// - A breach of your database reveals no authenticable credentials
```
Federated authentication with EU-incorporated identity providers also reduces third-country transfer risk: if you use a US-incorporated IdP (Auth0, Okta, Cognito), the authentication flow sends user credentials to US infrastructure.
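The data-minimising effect can be made explicit at the persistence boundary. A sketch — types and names are illustrative, and token verification is assumed to be handled by an OIDC library:

```typescript
// Hypothetical local account shape after federated login: only the opaque
// IdP subject identifier is persisted, plus app-level profile data.
interface LocalAccount {
  sub: string          // opaque subject from the IdP — pseudonymous on its own
  displayName?: string // optional, user-provided app data
}

// Claims come from an ID token already verified upstream. Email and name
// are deliberately dropped unless a documented purpose requires them.
function accountFromClaims(claims: { sub: string; email?: string; name?: string }): LocalAccount {
  return { sub: claims.sub }
}
```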
Session Token Design
```typescript
// ❌ Long-lived sessions by default
const sessionConfig = {
  maxAge: 30 * 24 * 60 * 60 * 1000, // 30 days by default
  rolling: true, // Extends on activity
}

// ✅ Privacy by default: shorter sessions, explicit extension
const sessionConfig = {
  maxAge: 24 * 60 * 60 * 1000, // 24 hours by default
  rolling: false, // Explicit re-authentication required
  // User can opt in to "remember me" (30 days) — not the default
}
```
Article 25(2) requires that the "period of storage" be limited to what is necessary. Session lifetime is a storage period. A 30-day rolling session is hard to justify for a SaaS with no continuous usage requirement.
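The opt-in extension can be modelled as an explicit parameter, so the short lifetime is what a passive user gets — a minimal sketch:

```typescript
// Session lifetime as an explicit user choice: the short lifetime is the
// default; the 30-day lifetime requires an affirmative "remember me" opt-in.
const DAY_MS = 24 * 60 * 60 * 1000

function sessionMaxAge(rememberMe: boolean): number {
  return rememberMe ? 30 * DAY_MS : DAY_MS
}
```

For the same reason, any "remember me" checkbox feeding this parameter must start unchecked.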
Layer 5: Access Control — Privacy by Default for Internal Systems
Article 25(2) requires that by default, personal data is not accessible to "an indefinite number of natural persons." In internal systems, this means role-based access control (RBAC) is not optional — it is an Article 25 requirement.
Least-Privilege by Default
```typescript
// Define roles with least-privilege defaults
enum Role {
  USER = 'user',
  SUPPORT = 'support',
  ANALYST = 'analyst',
  ADMIN = 'admin',
}

const permissions = {
  [Role.USER]: ['read:own_data', 'write:own_data'],
  [Role.SUPPORT]: ['read:user_profile', 'read:support_tickets'], // No payment data
  [Role.ANALYST]: ['read:aggregated_analytics'], // No individual-level data
  [Role.ADMIN]: ['read:all', 'write:all', 'delete:all'],
}

// ❌ Fails Art. 25(2): all authenticated staff get admin-level access by default
// ✅ Compliant: new staff assigned the USER role by default, elevated by explicit grant
```
The "by default" requirement means the default role for new accounts must be the most restricted. Privilege escalation requires affirmative action, not the reverse.
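A sketch of what the default-then-escalate flow looks like in code (names are illustrative, and the audit call is assumed):

```typescript
// New staff accounts start at the most restricted role; elevation is a
// separate, approved action — never part of account creation.
enum Role { USER = 'user', SUPPORT = 'support', ADMIN = 'admin' }

interface StaffAccount { id: string; role: Role }

function createStaffAccount(id: string): StaffAccount {
  return { id, role: Role.USER } // least privilege by default
}

function elevateRole(account: StaffAccount, to: Role, approvedBy: string): StaffAccount {
  if (!approvedBy) throw new Error('role elevation requires an approver')
  // auditLog.record(...) — hypothetical audit call, required in practice
  return { ...account, role: to }
}
```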
Data Export Controls
Article 25 applies to "accessibility" — who can export or download personal data. Admin interfaces that allow bulk export of user data without logging or approval create Article 25 risk:
```typescript
// Compliant data export: role-restricted, rate-limited, audit-logged
async function exportUserData(requestedBy: string, targetUserId: string) {
  // Check authorisation
  if (!hasPermission(requestedBy, 'export:user_data')) {
    throw new UnauthorizedError('Insufficient permissions for data export')
  }

  // Rate limit before logging: max 100 exports per day per admin —
  // a blocked request should not produce an audit entry claiming an export
  await rateLimit.check(`export:${requestedBy}`, { limit: 100, window: '1d' })

  // Audit log — who exported what, when
  await auditLog.record({
    action: 'data_export',
    requestedBy,
    targetUserId: pseudonymise(targetUserId, AUDIT_SALT),
    timestamp: new Date(),
  })

  return getUserData(targetUserId)
}
```
Layer 6: Infrastructure Jurisdiction as a Privacy by Design Decision
This is the dimension that most PbD frameworks underemphasise: where your infrastructure is incorporated determines which government can compel access to data processed on it.
GDPR Article 25 requires appropriate technical measures for data protection. The CLOUD Act of 2018 (18 U.S.C. § 2713) requires US-incorporated cloud providers to disclose stored data to US law enforcement upon a warrant — regardless of where the data physically resides. This applies to AWS, Azure, Google Cloud, Cloudflare, Supabase (US-incorporated), Vercel, Render, Railway, and most mainstream PaaS providers.
The structural issue: running GDPR-compliant applications on US-incorporated infrastructure creates a dual-jurisdiction architecture where EU data protection authorities and US law enforcement have simultaneous theoretical access to the same data — under incompatible legal frameworks.
The Art. 25 Infrastructure Checklist
For applications processing significant EU personal data volumes, the infrastructure jurisdiction question is an Article 25 technical measure decision:
| Infrastructure Layer | Art. 25 Consideration | EU-Native Alternative |
|---|---|---|
| Compute (PaaS) | CLOUD Act jurisdiction if US-incorporated | EU-incorporated PaaS (e.g., sota.io) |
| Database | Data location + provider jurisdiction | Managed PostgreSQL on EU PaaS |
| CDN | Edge cache locations + provider jurisdiction | EU-only edge nodes |
| Analytics | User behaviour data jurisdiction | Self-hosted Plausible, Matomo (EU servers) |
| Email delivery | Transactional email processing jurisdiction | EU-incorporated email provider |
| Logging | Log data jurisdiction | EU-hosted log aggregation |
| Authentication | Identity data jurisdiction | Self-hosted Keycloak or EU OIDC provider |
Deploying the compute layer on EU-incorporated PaaS — where the provider has no US parent company and is not subject to CLOUD Act jurisdiction — eliminates the dual-jurisdiction problem at the infrastructure level. This is a Privacy by Design decision: structural elimination of a risk, rather than legal mitigation of it.
Schrems II and Infrastructure Selection
The CJEU's Schrems II ruling (C-311/18, July 2020) invalidated Privacy Shield and established that supplementary measures (encryption, pseudonymisation) alone may not be sufficient to protect personal data transferred to the US if US intelligence access obligations can override them. The EDPB's Recommendations 01/2020 on supplementary measures explicitly identify transfers where the US provider can access the data in the clear as a scenario for which no effective supplementary measures exist.
For Art. 25 purposes, the clean solution is not supplementary measures on top of US infrastructure — it is infrastructure that is not subject to US jurisdiction at all. EU-incorporated PaaS providers are not subject to FISA Section 702 (50 U.S.C. § 1881a) or CLOUD Act obligations.
Article 25 and the DPIA Trigger
Article 35 requires a Data Protection Impact Assessment (DPIA) for high-risk processing. Article 25 technical measures determine whether processing crosses into high-risk territory and what the DPIA must document.
Specifically, DPIA mandatory triggers include:
- Systematic and extensive evaluation of personal aspects using automated processing (Art. 35(3)(a)) — e.g., AI-driven credit scoring, recruitment screening
- Large-scale processing of special categories (Art. 35(3)(b)) — health data, biometric data, political opinions
- Systematic monitoring of publicly accessible areas (Art. 35(3)(c)) — surveillance
Art. 25 technical measures (pseudonymisation, minimisation, separation of purposes) can reduce processing from "high risk" to standard risk — which changes the DPIA requirement. Well-designed PbD can be the difference between mandatory DPIA and optional DPIA for borderline processing activities.
Practical Article 25 Implementation Checklist
At design stage (before first line of code):
- Data flow map: what personal data enters the system, where it goes, who can access it
- Processing purpose register: one entry per purpose, with legal basis and data category
- Mandatory vs optional fields: everything optional that is not strictly necessary for the core service
- Default configurations: opted-out for all secondary processing, minimal session duration
- Infrastructure selection: EU-incorporated PaaS for compute, database, logging, CDN
- Retention periods defined: different TTL per data category, enforced technically
At implementation stage:
- API input validation: only required fields accepted per endpoint
- Database schema: pseudonymised identifiers for analytics/logging tables
- Automated deletion: pg_cron or scheduled job for retention enforcement
- RBAC: least-privilege default role, explicit escalation
- Logging: no direct identifiers, pseudonymous user references, regional aggregation for IP
- Session management: minimal default lifetime, explicit opt-in for extended sessions
At deployment stage:
- Infrastructure jurisdiction confirmed: EU-incorporated for all layers handling personal data
- Access control audit: no admin-level default access for new accounts
- Penetration test for data exposure: verify no personal data in error responses or logs
- ROPA entry created: maps to each processing purpose identified at design stage
- DPIA assessment: confirm no mandatory DPIA triggers, or complete DPIA if triggered
Enforcement Landscape: Article 25 in DPA Decisions
Article 25 enforcement has accelerated since 2023. Notable decisions:
- Meta (Irish DPC, May 2023, €1.2B): Data transfer violations, with Art. 25 technical measure deficiencies cited
- Amazon (Luxembourg CNPD, Jul 2021, €746M): Reportedly concerning targeted advertising without valid consent — the decision is not public, but default consent settings are squarely an Art. 25(2) issue
- Google Analytics (Austrian DSB, Jan 2022; French CNIL, Feb 2022): Infrastructure jurisdiction as a Chapter V transfer violation — directly relevant to Art. 25 infrastructure decisions
- TikTok (Irish DPC, Sep 2023, €345M): Default privacy settings for minors — Art. 25(2) failure on default accessibility
The pattern: DPAs are enforcing not just documented privacy policies but the actual technical default state of systems. "By default" means the first state a user encounters, not the state after they navigate a settings menu.
Summary
Article 25 reads as a legal provision but lands as an engineering requirement. The obligation falls on the engineers and architects who determine how data is collected, stored, processed, and accessed — not just on the legal team that drafts the privacy policy.
The six implementation layers:
- API design: Collect only what the purpose requires. Make everything else optional. Default consent to off.
- Database schema: Pseudonymise at the schema level. Separate schemas per processing purpose. Automate retention enforcement.
- Logging: No direct identifiers. Pseudonymous references. Structured retention by log category.
- Authentication: Federated auth reduces identity data storage. Short sessions by default.
- Access control: Least-privilege RBAC with the most restricted role as the default. Logged, rate-limited data exports.
- Infrastructure: EU-incorporated PaaS eliminates the CLOUD Act jurisdiction conflict — the structural PbD solution at the infrastructure layer.
Privacy by Design is easier to implement correctly at the design stage than to retrofit onto a running system. Every field added to a data model, every logging call, every default configuration setting is an Article 25 decision. The question is whether those decisions are made deliberately or by accident.