Deploy Coccinelle to Europe: Julia Lawall 🇫🇷 (INRIA Paris) and the Semantic Patch Engine Behind 6000+ Linux Kernel Security Fixes, on EU Infrastructure in 2026
The Linux kernel is thirty million lines of C. When a security researcher discovers that kmalloc leaves memory uninitialized, allowing an attacker to read stale heap contents via a kernel information leak, the fix is not one line. The same pattern appears across thousands of call sites scattered through hundreds of drivers and subsystems. A human maintainer reviewing and patching each instance individually would take months and inevitably miss cases. A naive sed replacement would corrupt the code. The problem requires something that understands C structure, not just text.
Coccinelle is that tool. It is a semantic patch engine for C code: a domain-specific language and engine that matches structural patterns in C programs (parsed syntax trees, not strings) and applies coordinated transformations across entire codebases. A single Coccinelle semantic patch can safely replace every instance of a vulnerable pattern across the full Linux kernel source in seconds.
Julia Lawall 🇫🇷 has designed and led Coccinelle development since its inception in the mid-2000s, at INRIA Paris in the Whisper team, with Gilles Muller 🇫🇷 (LIP6 / Sorbonne Université 🇫🇷) as co-author of the foundational work. The approach was formally introduced in "Documenting and Automating Collateral Evolutions in Linux Device Drivers" by Padioleau, Lawall, Hansen, and Muller (EuroSys 2008). Over 6000 commits in the Linux kernel git history trace back to Coccinelle-generated patches. The kernel's own tree includes a scripts/coccinelle/ directory with over 130 semantic patches maintained by the kernel community and invoked by make coccicheck.
The Semantic Patch Language (SmPL)
The core of Coccinelle is SmPL, the Semantic Patch Language. SmPL rules look like diff hunks but operate on abstract syntax trees. A rule consists of a metavariable declaration block followed by a transformation block using - for deletions and + for additions.
Example 1: kmalloc → kzalloc (Information Leak Elimination)
A classical Linux kernel security fix: kmalloc allocates uninitialized memory. If the kernel then copies this buffer to userspace without fully initializing every field, stale heap content leaks. kzalloc zero-initializes the allocation, eliminating the information disclosure:
@@
expression E1, E2;
@@
- kmalloc(E1, E2)
+ kzalloc(E1, E2)
This rule matches every kmalloc(...) call with any two arguments and replaces it with kzalloc(...). The metavariable declaration expression E1, E2 binds to arbitrary C expressions: not just identifiers or constants but any valid C expression, including function calls, pointer arithmetic, and casts. The engine parses the C source into an AST, finds all matching call sites, and emits a unified diff. Running spatch --sp-file kmalloc_to_kzalloc.cocci --dir drivers/net/ --jobs 4 processes all .c files in the directory tree using four parallel worker processes.
The semantic patch engine understands that kmalloc(sizeof(*ptr), GFP_KERNEL) and kmalloc((sizeof(struct foo)), flags) are both matches, because it works on expressions, not strings. A text sed replacement would corrupt calls like devm_kmalloc or kmalloc_array.
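The gap between textual and structural matching can be shown with a few lines of Python. This is only a sketch of what a blind substring replacement does to the identifiers mentioned above, not a model of how Coccinelle works; the helper name naive_replace and the sample lines are hypothetical.

```python
def naive_replace(source: str) -> str:
    """Blind substring replacement, similar in spirit to sed 's/kmalloc/kzalloc/g'."""
    return source.replace("kmalloc", "kzalloc")


# Hypothetical call sites for illustration.
lines = [
    "buf = kmalloc(len, GFP_KERNEL);",
    "arr = kmalloc_array(n, sizeof(*arr), GFP_KERNEL);",
    "p = devm_kmalloc(dev, len, GFP_KERNEL);",
]

for line in lines:
    print(naive_replace(line))
```

Only the first line is the intended change. The other two rewrite calls to different functions (kmalloc_array and devm_kmalloc), which the SmPL rule leaves untouched because it matches a call to the function named exactly kmalloc.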
Example 2: Null Pointer Dereference (CWE-476)
A common Linux driver pattern: platform_get_resource() returns NULL on failure, but older driver code often forgets to check:
@@
expression dev, res, r, flags;
@@
res = platform_get_resource(dev, flags, r);
+ if (res == NULL) return -EINVAL;
...
devm_ioremap(&dev->dev, res->start, resource_size(res));
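This rule inserts a null check after a platform_get_resource call whose result is later dereferenced as res->start in a devm_ioremap call. The ... (ellipsis) in SmPL matches any sequence of statements along the control-flow paths between the two anchors, so it handles arbitrary code between the call and the use. As written, the rule does not test whether res is already checked or reassigned in between; a production rule would add when constraints on the ellipsis to exclude those call sites.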
Example 3: Lock/Unlock Symmetry (CWE-667)
Resource release must be symmetric. A lock acquired in one branch must be released in all exit paths:
@@
expression lock;
@@
spin_lock(lock);
...
if (...) {
+ spin_unlock(lock);
return ...;
}
...
spin_unlock(lock);
SmPL's ... with branching awareness finds execution paths where spin_unlock is missing before an early return. This class of fix appears hundreds of times in kernel history: drivers that acquired a spinlock, hit an error path, and returned without releasing.
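The idea of checking every execution path, rather than every line, can be sketched on a toy control-flow graph. The graph, the node names, and the function paths_missing_unlock below are all hypothetical; Coccinelle's own matching is considerably more general, and this only illustrates what "a path that returns without unlocking" means.

```python
from typing import Dict, List

# Hypothetical control-flow graph of one function: node -> successor nodes.
CFG: Dict[str, List[str]] = {
    "spin_lock": ["if_error"],
    "if_error": ["return_early", "work"],  # branch: error path or normal path
    "return_early": [],                     # early return with no unlock before it
    "work": ["spin_unlock"],
    "spin_unlock": ["return_ok"],
    "return_ok": [],
}


def paths_missing_unlock(cfg: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Return each acyclic path from start to an exit node that never visits spin_unlock."""
    missing = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        successors = cfg[path[-1]]
        if not successors:  # exit node reached
            if "spin_unlock" not in path:
                missing.append(path)
            continue
        for nxt in successors:
            if nxt not in path:  # skip back edges so loops terminate
                stack.append(path + [nxt])
    return missing


print(paths_missing_unlock(CFG, "spin_lock"))
```

For this graph the only reported path is the one through return_early, which is the place where the semantic patch above would insert spin_unlock(lock).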
Example 4: Memory Leak on Error Path (CWE-401)
@@
expression E, flags;
identifier ptr;
@@
ptr = kmalloc(E, flags);
if (ptr == NULL) return -ENOMEM;
...
if (...) {
+ kfree(ptr);
return ...;
}
Coccinelle finds paths where an error return after successful allocation fails to kfree the allocated buffer. This pattern, applied across the kernel's thousands of probe() functions, has contributed hundreds of memory-leak fixes.
How the Engine Works
Coccinelle's analysis pipeline:
C source files
│
▼
C parser (written in OCaml)
│ Parses C including GCC extensions (statement expressions, __attribute__, typeof)
▼
Control Flow Graph (CFG)
│ Each function → CFG with explicit join points
▼
SmPL matching engine
│ Pattern → temporal-logic (CTL) formula checked against the CFG; metavariables bound during matching
│ Isomorphisms: commutative operators, equivalent pointer/member notation
▼
Binding environment
│ Metavariables → concrete AST subtrees
▼
Transformation + pretty-printer
│ Emits patch preserving original formatting/comments
▼
Unified diff output
The engine processes C including GCC extensions that are pervasive in the Linux kernel: __attribute__((packed)), __typeof__, statement expressions ({...}), and variadic macros. This is non-trivial: naive C parsers fail on kernel source. Coccinelle's OCaml-based parser handles this dialect and works on the source as written, without first running the preprocessor, so the generated patch applies to the code developers actually edit.
Isomorphisms are a key feature: the engine knows that a->b and (*a).b are equivalent, that !a, a == NULL, and NULL == a are the same test for pointers, and that a pattern written A && B can also match B && A where operand order does not matter. This prevents false negatives where a vulnerable pattern is written in a syntactically different but semantically equivalent form.
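One way to picture an isomorphism is as normalization: several spellings of the same test are mapped to one canonical form before matching. The Python sketch below does this for three spellings of a null test. It uses regular expressions on strings purely for brevity, whereas the real engine works on syntax trees; the function name canonical_null_test is an assumption made for illustration.

```python
import re

# Three spellings of the same pointer null test.
_NULL_TESTS = [
    re.compile(r"^\s*(\w+)\s*==\s*NULL\s*$"),  # x == NULL
    re.compile(r"^\s*NULL\s*==\s*(\w+)\s*$"),  # NULL == x
    re.compile(r"^\s*!\s*(\w+)\s*$"),          # !x
]


def canonical_null_test(cond: str) -> str:
    """Map equivalent null-test spellings to 'x == NULL'; return other conditions stripped but unchanged."""
    for pattern in _NULL_TESTS:
        m = pattern.match(cond)
        if m:
            return f"{m.group(1)} == NULL"
    return cond.strip()


for cond in ["ptr == NULL", "NULL == ptr", "!ptr", "a && b"]:
    print(cond, "->", canonical_null_test(cond))
```

With such a normalization in place, a rule author writes the pattern once and still catches code written in any of the equivalent forms.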
Python scripting via coccilib enables rules that cannot be expressed as pure structural patterns. A Python script embedded in a Coccinelle rule can inspect metavariable bindings, perform arithmetic on offsets, look up symbol tables, or emit custom diagnostics. This powers complex refactorings like "rename all occurrences of an error code to a new constant, but only in the affected subsystem."
Scale: The Linux Kernel
The Linux kernel's scripts/coccinelle/ directory (introduced by Lawall's collaboration with the kernel community) contains over 130 semantic patches organized by category:
- api/: API evolution patches (deprecated function replacements)
- free/: Memory leak detection (double-free, missing kfree on error paths)
- locks/: Lock/unlock symmetry checks
- null/: Null pointer dereference patterns
- misc/: Miscellaneous patterns (return value checks, iterator correctness)
make coccicheck runs all 130+ patches against the full kernel source. Kernel maintainers run it in CI and require contributors to verify their drivers pass before merging. The result: a systematic, machine-enforced baseline of security properties across the entire kernel tree.
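For a project outside the kernel tree, the same "run every semantic patch" idea can be approximated with a small driver script. The sketch below only builds the command lines, using the spatch options shown earlier in this article; it is not how make coccicheck is implemented, and the function name and the file paths are hypothetical.

```python
from typing import Iterable, List


def spatch_commands(patches: Iterable[str], source_dir: str, jobs: int = 4) -> List[List[str]]:
    """Build one spatch command line (as an argument list) per semantic patch file."""
    commands = []
    for patch in patches:
        commands.append([
            "spatch",
            "--sp-file", patch,
            "--dir", source_dir,
            "--jobs", str(jobs),
            "--very-quiet",
        ])
    return commands


# Hypothetical project layout: two local semantic patches, sources under src/.
for cmd in spatch_commands(["checks/null_deref.cocci", "checks/check_kfree.cocci"], "src/"):
    print(" ".join(cmd))
```

Each argument list can be handed to subprocess.run unchanged, which avoids shell quoting problems when paths contain spaces.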
Julia Lawall's research group measured that Coccinelle has contributed patches that fix patterns across 30+ million lines of code, with a false-positive rate below 5% for well-written semantic patches. That is comparable to the precision of specialized static analyzers, but with the generality of a transformation language rather than a fixed rule set.
Beyond the Linux kernel, Coccinelle is used by:
- OpenSSL (EU mirror: OpenSSL Foundation 🇩🇪 relocated to Berlin): crypto library pattern checks
- QEMU: device emulation C codebase (Red Hat + Intel contributors)
- FreeBSD kernel: imported subset of Linux Coccinelle patches
- Eclipse CDT: C/C++ toolchain
- Major European embedded software vendors for IEC 61508 / ISO 26262 compliance workflows
EU Compliance Angles
CRA 2027: Cyber Resilience Act
The EU Cyber Resilience Act (in force since December 2024, with its main obligations applying from December 2027) requires CE-marked products with digital elements to be "designed, developed and produced" with security in mind and to document systematic vulnerability handling. Coccinelle supports this directly:
- CWE-908 (Use of Uninitialized Resource): kmalloc → kzalloc semantic patches eliminate this CWE at the source, across entire codebases, with a verifiable audit trail in git history.
- CWE-476 (NULL Pointer Dereference): Null-check insertion rules are CWE-476 remediation at scale.
- CWE-401 (Missing Release of Memory after Effective Lifetime): Error-path kfree patches document and remediate resource leaks systematically.
- CWE-667 (Improper Locking): Lock symmetry rules document verified locking contracts.
For CRA compliance, make coccicheck output in a CI pipeline can serve as evidence of systematic vulnerability class elimination, supporting the secure-development and vulnerability-handling documentation the Regulation requires.
NIS2: Network and Information Security Directive
Article 21 of NIS2 requires essential and important entities to implement, among other measures, "policies and procedures to assess the effectiveness of cybersecurity risk-management measures". Coccinelle semantic patches applied to critical infrastructure software (power grid controllers, medical devices, automotive ECUs written in C) can act as machine-executable risk management policies: the patch file is the policy, the spatch invocation is the enforcement, and the git diff is the audit log.
ISO 26262 (Automotive) and IEC 61508 (Industrial Safety)
AUTOSAR C++ coding guidelines and MISRA C, both widely adopted in safety-critical automotive/industrial software, require specific programming patterns. Coccinelle semantic patches can check structural MISRA C rules at scale, automating part of the compliance verification that commercial static analysis tools charge thousands of Euros for. The Lawall group published several papers on using Coccinelle for embedded/safety-critical C code transformation.
Deploying Coccinelle on EU Infrastructure
Docker deployment
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
coccinelle \
python3 \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
ENTRYPOINT ["spatch"]
Running semantic patches in CI
# .github/workflows/coccicheck.yml (or sota.io CI equivalent)
name: Coccinelle Security Check
on: [push, pull_request]
jobs:
  coccicheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Coccinelle
        # Refresh the package index first so the install does not fail on a stale runner image
        run: sudo apt-get update && sudo apt-get install -y coccinelle
      - name: Run null pointer checks
        run: |
          spatch --sp-file checks/null_deref.cocci \
            --dir src/ \
            --jobs $(nproc) \
            --very-quiet \
            2>&1 | tee coccinelle-report.txt
          # A unified-diff header in the report means spatch proposed a change
          if grep -q "^---" coccinelle-report.txt; then
            echo "Coccinelle found potential issues"
            cat coccinelle-report.txt
            exit 1
          fi
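The grep test in the workflow only answers yes or no. Where a CI job should also list which files were flagged, the report can be post-processed with a short script. The following is a minimal sketch that assumes the report contains ordinary unified-diff headers; the function name changed_files and the sample report text are hypothetical.

```python
from typing import List


def changed_files(report: str) -> List[str]:
    """Collect file paths from unified-diff '--- ' header lines in a report."""
    files = []
    for line in report.splitlines():
        if line.startswith("--- "):
            # Header looks like '--- path/to/file.c', optionally followed by a tab and a timestamp.
            files.append(line[4:].split("\t")[0].strip())
    return files


# Hypothetical report content, for illustration only.
sample = """\
--- src/driver.c
+++ /tmp/cocci-output-driver.c
@@ -10,6 +10,8 @@
 res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+if (res == NULL)
+    return -EINVAL;
"""

print(changed_files(sample))
```

A CI step could fail the build when the returned list is non-empty and print the file names as annotations.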
Example: enforcing error path resource release across a project
// check_kfree.cocci: Python-scripted Coccinelle rule
@ rule1 @
expression ptr, size, flags;
position p1, p2;
@@
ptr = kmalloc@p1(size, flags);
... when != kfree(ptr)
return@p2 ...;
@ script:python @
ptr << rule1.ptr;
p1 << rule1.p1;
p2 << rule1.p2;
@@
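print("Missing kfree of %s (allocated at %s:%s, returned at line %s)" % (ptr, p1[0].file, p1[0].line, p2[0].line))
This Python-scripted rule finds control-flow paths from a kmalloc to a return on which kfree is not called on the allocated pointer, and prints a diagnostic with file/line information. Position metavariables arrive in Python as lists, so the script reads the file and line fields of the first element.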
The INRIA Paris OCaml Ecosystem
Coccinelle is written in OCaml, INRIA's native functional language, the same language used for Coq/Rocq (the proof assistant behind CompCert) and the OCaml toolchain itself. This is not coincidence: the same research culture at INRIA Paris that produced formal verification tooling for safety-critical software (Leroy's CompCert, Blanchet's ProVerif, the Jasmin/EasyCrypt/HACL* stack) also produces practical tooling for systematic code transformation.
Julia Lawall's position in this ecosystem: she is a senior INRIA researcher in Paris (Whisper team), at the same institute whose Cambium team (formerly Gallium) maintains the OCaml compiler itself. Her daily working environment is OCaml, and Coccinelle's implementation reflects deep familiarity with the language semantics and the practical requirements of large-scale C codebase maintenance.
The result is a tool that Linux kernel maintainers have adopted as infrastructure: not a research prototype but production tooling that lives in scripts/coccinelle/ and runs in kernel CI systems.
Deploy Coccinelle to EU Servers with sota.io
sota.io is the EU-native PaaS built for exactly this kind of tooling deployment. Run Coccinelle in a CI/CD pipeline, a code review bot, or a continuous compliance scanner, all on German infrastructure, GDPR-compliant by default.
# sota.io deployment
sota deploy --image ubuntu:22.04 \
--run "apt-get install -y coccinelle && spatch --version" \
--region eu-west-1
Free tier includes enough compute to run spatch across a medium-sized embedded C codebase. No Cloud Act. No PRISM. No EU data leaving the EU.
INRIA Paris 🇫🇷: 100% EU research provenance. GPL v2. github.com/coccinelle/coccinelle.