2026-07-12 · 10 min read · sota.io team

Deploy CompCert to Europe: Xavier Leroy 🇫🇷 + INRIA (2006), the Only Formally Verified C Compiler with a Machine-Checked Proof of Correctness, on EU Infrastructure in 2026

Every mainstream C compiler contains bugs. GCC, Clang, MSVC: all of them, at some optimisation level, have been found to miscompile correct C programs, producing binaries whose behaviour differs from what the source code specifies. For most software, a miscompilation is a latent bug, caught eventually by testing. For safety-critical software (the control firmware of an Airbus A380 flight control unit, the braking system of a Siemens ERTMS railway controller, the software inside a Philips MRI scanner), a miscompilation that reaches production could kill people.

CompCert is the answer to that problem. Created in 2006 by Xavier Leroy 🇫🇷 at INRIA Paris, CompCert is the only widely used C compiler whose correctness is formally proved. A machine-checked proof in Coq/Rocq establishes that, for any C program with defined semantics, the code CompCert generates faithfully implements the source code's behaviour. CompCert's output is not just tested: the proof guarantees that the generated assembly code cannot exhibit behaviour that the source program could not. The external preprocessor, assembler and linker that CompCert relies on sit outside this proof.

In 2026, the EU AI Act requires risk management and testing for high-risk AI systems, and the Cyber Resilience Act imposes cybersecurity obligations on products with digital elements. CompCert's formally proved correctness is directly relevant to EU safety engineers and compliance teams working under both. Running CompCert on EU infrastructure keeps all compilation artefacts and verification results within EU jurisdiction. That can simplify data-residency and export-control compliance for aerospace and defence contractors, including those handling ITAR-controlled technical data. It also matches what many operators of Annex III AI systems under the AI Act expect.

What CompCert Proves: Semantics Preservation

The standard statement of CompCert's correctness theorem is called semantics preservation. It says: suppose the source C program P has well-defined behaviour, meaning that no undefined behaviour in the C standard sense can occur. Then the compiled program P' produced by CompCert exhibits only observable behaviours that P itself permits.

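More formally: suppose CompCert successfully compiles P to P', and P cannot go wrong (reach undefined behaviour). Then for every observable behaviour B, P' ⇓ B implies P ⇓ B, where X ⇓ B reads "program X can exhibit behaviour B". The source semantics here is that of CompCert C, and the target semantics is that of the generated assembly. The theorem is proved for each supported target architecture, not only x86.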
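C leaves some choices open, such as which operand of `+` is evaluated first. A single source program can therefore have several permitted behaviours, and semantics preservation means the compiled code exhibits one of them, not necessarily a particular one. The C sketch below is our own illustration; `f`, `g`, `sum_with_trace` and `trace_is_permitted` are hypothetical names. It records which call ran first and checks that the observed order is one the source program allows.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the hypothetical helpers f and g are called. */
static char trace[8];
static size_t trace_len;

static int f(void) {
    assert(trace_len < sizeof trace - 1);   /* leave room for the terminator */
    trace[trace_len++] = 'f';
    return 1;
}

static int g(void) {
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = 'g';
    return 2;
}

/* ISO C does not specify whether f() or g() is evaluated first here. The two
   calls are indeterminately sequenced, so either order is a valid behaviour. */
int sum_with_trace(void) {
    trace_len = 0;
    int s = f() + g();
    trace[trace_len] = '\0';
    return s;
}

/* Returns 1 if the recorded order is one the source program permits. */
int trace_is_permitted(void) {
    return strcmp(trace, "fg") == 0 || strcmp(trace, "gf") == 0;
}
```

Two different compilers may legitimately pick different orders. What semantics preservation rules out is an order, or a sum, that the source does not permit.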

What does "observable behaviour" mean? In CompCert's formal model, observable events are reads and writes to volatile memory locations, calls to external functions, and the final program exit value. The proof guarantees that all of these match between the source and the compiled program. What CompCert's proof does not cover is undefined behaviour in the C source (out-of-bounds array access, use-after-free, division by zero). Avoiding it remains the programmer's responsibility, and it is often checked with tools such as Frama-C or Clang's sanitisers before the code is passed to CompCert.
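A small C sketch makes the distinction concrete. It is our own illustration, not code from the CompCert distribution: `status_register`, `safe_average` and `report` are hypothetical names, and the "register" is an ordinary volatile global so that the code runs on any host. In CompCert's model, the volatile store and the call to the external function `printf` are observable events, while the division is internal computation. Dividing by zero (or computing `INT_MIN / -1`) would be undefined behaviour, which the theorem does not cover, so the `assert` states that precondition explicitly.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical memory-mapped output register. Real firmware would use a fixed
   hardware address; an ordinary volatile global keeps the sketch portable. */
static volatile unsigned int status_register;

/* Division by zero and INT_MIN / -1 are undefined behaviour; rule both out. */
int safe_average(int total, int count) {
    assert(count != 0 && !(total == INT_MIN && count == -1));
    return total / count;                  /* internal computation: not an observable event */
}

int report(int total, int count) {
    int avg = safe_average(total, count);
    status_register = (unsigned int)avg;   /* volatile store: observable event */
    printf("average=%d\n", avg);           /* external function call: observable event */
    return avg;
}
```

CompCert's reference interpreter (`ccomp -interp`) can run such a program on concrete inputs and reports when execution reaches undefined behaviour. That is a quick way to check a test case before relying on the compiler's guarantee.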

The proof is machine-checked in Coq (developed at INRIA 🇫🇷 and now renamed Rocq). The total proof base is approximately 100,000 lines of Coq.

This is not an informal argument or a test suite. It is a formal mathematical proof that a machine verified.

Xavier Leroy and the CompCert Research Programme

Xavier Leroy 🇫🇷 is one of the most influential programming language researchers in Europe. Born in France and educated at École Normale Supérieure and Paris VI, Leroy spent most of his career at INRIA, the French national research institute for digital science and technology. Since 2018 he has held the chair of Software Sciences at the Collège de France in Paris, one of France's most prestigious academic institutions.

Leroy's pre-CompCert contributions established the theoretical foundations that made formal compiler verification tractable:

The Caml/OCaml type system: Leroy is one of the principal architects of OCaml and the designer of its module system. The verified parts of CompCert are written in Coq and extracted to OCaml, which is also the language in which Coq/Rocq itself is implemented. The French programming language ecosystem (OCaml, Coq, Why3, Alt-Ergo, CompCert) forms a coherent, interconnected stack that is among the most complete toolchains for formally verified software available anywhere.

Typed compilation: Leroy's ZINC abstract machine (1990), the basis of Caml Light, and his later work on type-directed data representation in ML compilers explored how static types can guide compilation. That work was a precursor to the certified compilation approach of CompCert.

CompCert (POPL 2006, CACM 2009): the centrepiece of Leroy's career. The 2009 Communications of the ACM paper "Formal verification of a realistic compiler" is among the most widely cited papers in programming language research, and it established CompCert as the flagship result in verified systems software.

The Collège de France lectures (freely available online in French and English) are among the finest public expositions of formal methods for software, covering CompCert, Coq/Rocq, and the landscape of formally verified systems software. They are a valuable resource for EU engineers working on AI Act compliance.

The Compiler Architecture: A Chain of Verified Passes

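CompCert accepts almost all of ISO C99; its source language, CompCert C, omits a few features such as variable-length arrays. CompCert first turns the program into Clight, a simplified subset of C whose expressions have no side effects. It then compiles the program through a chain of intermediate languages, with each transformation formally proved to preserve semantics: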

CompCert C (source, after parsing and elaboration)
    ↓ SimplExpr (side effects pulled out of expressions)
Clight
    ↓ SimplLocals (non-addressed local variables turned into temporaries)
    ↓ Cshmgen (cast elaboration, switch lowering)
C#minor
    ↓ Cminorgen (stack frame allocation, block structure removal)
Cminor
    ↓ Selection (instruction selection)
CminorSel
    ↓ RTLgen (control-flow graph over pseudo-registers)
RTL (Register Transfer Language)
    ↓ Tailcall (tail call optimisation)
    ↓ Inlining (function inlining with call-graph analysis)
    ↓ Renumber (renumbering of CFG nodes)
    ↓ ConstProp (constant propagation via dataflow)
    ↓ CSE (common subexpression elimination via value numbering)
    ↓ Deadcode (dead code elimination)
    ↓ Unusedglob (removal of unreferenced global definitions)
    ↓ Allocation (register allocation via graph colouring + spilling)
LTL (Location Transfer Language, with spill/reload)
    ↓ Tunneling (branch chain elimination)
    ↓ Linearize (CFG linearisation to instruction list)
Linear (linearised LTL)
    ↓ CleanupLabels (dead label removal)
    ↓ Debugvar (debug info for local variables)
    ↓ Stacking (stack frame layout, calling convention)
Mach (abstract machine code with calling convention)
    ↓ Asmgen (generation of target assembly code)
x86 (32/64-bit) / ARM / AArch64 / RISC-V / PowerPC assembly

Each arrow in this chain represents a Coq proof. The composition of all proofs yields the end-to-end semantics-preservation guarantee. The key insight of CompCert's architecture is that the intermediate languages are designed to be small enough that individual transformation proofs are tractable, while the composition covers the full gap from C to machine code.
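Not every pass is proved correct directly. For register allocation, CompCert uses translation validation. An unverified allocator written in OCaml proposes a result, and a validator whose correctness is proved in Coq checks that result on every compilation, aborting if the check fails. The C sketch below only illustrates the shape of that idea on a toy problem. It is not CompCert's validator, which compares RTL and LTL code using a dataflow analysis. Here each variable has a hypothetical live range, and the hypothetical `validate_allocation` accepts a register assignment only if no two simultaneously live variables share a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical): variable v is live on the half-open
   instruction interval [start, end). */
typedef struct {
    int start;
    int end;
} LiveRange;

/* Validator: the only component that would need a correctness proof.
   reg[v] is the register an untrusted allocator chose for variable v.
   The assignment is accepted only if no two variables that are live at
   the same time share a register. */
bool validate_allocation(const LiveRange *live, const int *reg, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(live[i].start <= live[i].end);
        for (size_t j = i + 1; j < n; ++j) {
            bool overlap = live[i].start < live[j].end &&
                           live[j].start < live[i].end;
            if (overlap && reg[i] == reg[j])
                return false;   /* reject: compilation would stop with an error */
        }
    }
    return true;
}
```

The design benefit is that the allocator can be changed or tuned freely. Only the small checker needs a proof, and a buggy allocator leads to a rejected compilation rather than to wrong code.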

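The optimisation passes (ConstProp, CSE, Deadcode, Inlining) are formally verified too. They are sound in the mathematical sense: each is proved to preserve the program's observable behaviour. A pass can therefore change how a result is computed (folding constants, reusing values, deleting unused code, inlining calls) but never what the program observably does.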
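To illustrate what these rewrites do, here is a hand-written before/after pair in C. CompCert's passes actually operate on RTL, not on C source, and this is not CompCert output. The function names are hypothetical, and the check below only samples a range of inputs, whereas the proofs cover every input on which the source is defined.

```c
#include <assert.h>
#include <stdbool.h>

/* "Before": the function as a programmer might write it (hypothetical). */
int scale_before(int x) {
    int k = 4;             /* constant: constant propagation can substitute 4 for k */
    int a = x * k;         /* x * 4 */
    int b = x * k;         /* same expression again: CSE can reuse a */
    int unused = a - b;    /* never used: dead code elimination can delete it */
    (void)unused;
    return a + b;
}

/* "After": hand-written C showing the effect of those three rewrites. */
int scale_after(int x) {
    int a = x * 4;
    return a + a;
}

/* Compares the two versions on [lo, hi]. The bounds keep 8 * x well inside
   the int range, so neither version can hit signed overflow. */
bool same_on_range(int lo, int hi) {
    assert(lo <= hi && lo >= -100000 && hi <= 100000);
    for (int x = lo; x <= hi; ++x)
        if (scale_before(x) != scale_after(x))
            return false;
    return true;
}
```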

Industrial Use: Airbus, AbsInt, and DO-178C Level A

One of the best-known industrial users of CompCert is Airbus 🇫🇷 (Toulouse), where CompCert has been used to compile flight control software for PowerPC-based avionics computers. The relevant standard is DO-178C (Software Considerations in Airborne Systems and Equipment Certification). It is the avionics software certification standard accepted by EASA in Europe and the FAA in the USA.

DO-178C defines software levels A (catastrophic failure condition) through E (no safety effect). Level A covers software whose failure could cause a catastrophic accident. For it, DO-178C requires structural coverage testing at the MC/DC (modified condition/decision coverage) level, and its DO-333 formal methods supplement allows formal analysis to replace some of that testing. CompCert is attractive for the tool qualification side of this process because its formal proof demonstrates the absence of miscompilation. For an ordinary compiler, that evidence can only be approximated through extensive testing of the compiler and verification of the object code it produces.
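MC/DC requires showing that each condition in a decision independently affects the decision's outcome. The following C sketch is our own illustration; the decision `a && (b || c)` and the names `decision` and `mcdc_covered` are hypothetical. It checks the unique-cause form of the criterion for a set of test vectors: for every condition, there must be two vectors that differ only in that condition and produce different outcomes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CONDITIONS 3

/* Hypothetical decision under test: a && (b || c).
   A test vector packs the conditions into bits: bit 0 = a, bit 1 = b, bit 2 = c. */
static bool decision(unsigned v) {
    assert(v < (1u << NUM_CONDITIONS));
    bool a = v & 1u;
    bool b = (v >> 1) & 1u;
    bool c = (v >> 2) & 1u;
    return a && (b || c);
}

/* Unique-cause MC/DC: for every condition i, some pair of test vectors must
   differ only in condition i and yield different decision outcomes. */
bool mcdc_covered(const unsigned *tests, size_t n) {
    for (unsigned i = 0; i < NUM_CONDITIONS; ++i) {
        bool shown = false;
        for (size_t j = 0; j < n && !shown; ++j) {
            for (size_t k = j + 1; k < n && !shown; ++k) {
                bool differ_only_in_i = (tests[j] ^ tests[k]) == (1u << i);
                if (differ_only_in_i && decision(tests[j]) != decision(tests[k]))
                    shown = true;
            }
        }
        if (!shown)
            return false;   /* condition i's independent effect is not demonstrated */
    }
    return true;
}
```

For this decision, four vectors suffice: (a, b, c) = (1,1,0), (0,1,0), (1,0,0) and (1,0,1), encoded as 3, 2, 1 and 5. That matches the usual minimum of n + 1 test vectors for n conditions.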

AbsInt Angewandte Informatik GmbH 🇩🇪 (Saarbrücken, Germany, on the campus of Saarland University) is the company that provides commercial licences, support and qualification for CompCert. AbsInt also develops aiT, a worst-case execution time (WCET) analyser. aiT is based on sound abstract interpretation but is not itself proved correct in a proof assistant. Combining CompCert's semantically correct compilation with aiT's timing analysis covers two key parts of the tool chain needed for DO-178C Level A certification of real-time avionics software.

Saarland University (where AbsInt is located) is itself one of Europe's premier CS research institutions, home to the Max Planck Institute for Software Systems (MPI-SWS). The Saarland–Paris–Lyon corridor of formal methods research (AbsInt + INRIA + ENS Lyon) is the industrial and academic backbone of EU verified systems software.

Other EU industrial users include:

EU AI Act Art. 9 and CRA 2027

EU AI Act Article 9 requires providers of high-risk AI systems to establish a risk management system, which includes testing the system. Some AI systems are implemented in C because performance constraints preclude higher-level languages: embedded inference engines, microcontroller-based vision systems, automotive sensor fusion. For these systems, the correctness of the C compiler is part of the verification chain.

A compiler bug can silently invalidate all source-level verification. Suppose you have formally specified your AI inference algorithm in Frama-C, verified it with the WP plugin, and then compiled it with a compiler that contains a miscompilation bug: your formal proof is undermined. CompCert closes this gap. With CompCert, the compilation step is provably correct, so source-level verification results remain valid down to the generated assembly code.
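As a sketch of that workflow, the C function below carries an ACSL contract of the kind Frama-C's WP plugin checks. The function `relu` and its contract are our own illustration, and we have not run WP on this exact file. The idea is to prove the contract at source level, for example with `frama-c -wp -wp-rte relu.c`, and then compile the same file with CompCert (`ccomp -c relu.c`), relying on CompCert's guarantee to carry the proved property over to the generated code. ACSL annotations live in `/*@ ... */` comments, so CompCert and other C compilers simply ignore them.

```c
#include <assert.h>
#include <stddef.h>

/*@ requires \valid_read(in + (0 .. n-1));
    requires \valid(out + (0 .. n-1));
    requires \separated(in + (0 .. n-1), out + (0 .. n-1));
    assigns out[0 .. n-1];
    ensures \forall integer i; 0 <= i < n ==> out[i] == (in[i] > 0 ? in[i] : 0);
*/
void relu(const int *in, int *out, size_t n) {
    /* Cheap run-time check mirroring the pointer preconditions (removed under NDEBUG). */
    assert(n == 0 || (in != NULL && out != NULL));
    /*@ loop invariant 0 <= i <= n;
        loop invariant \forall integer j; 0 <= j < i ==> out[j] == (in[j] > 0 ? in[j] : 0);
        loop assigns i, out[0 .. n-1];
        loop variant n - i;
    */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] > 0 ? in[i] : 0;   /* ReLU: clamp negative activations to 0 */
}
```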

For EU AI Act Annex III high-risk applications:

The Cyber Resilience Act (CRA; most obligations apply from December 2027) imposes cybersecurity obligations on manufacturers of products with digital elements. Some products are exposed to compiler bugs that could introduce exploitable behaviour, such as buffer overflows introduced by aggressive optimisation or incorrect sign-extension creating security vulnerabilities. For these products, CompCert's proved correctness is a relevant technical control.

Deploying CompCert on sota.io

sota.io is a European PaaS offering infrastructure in EU data centres, GDPR compliance, managed PostgreSQL, zero DevOps, and a free tier.

For CompCert workloads:

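# Install CompCert (Ubuntu/Debian) via opam, the OCaml package manager.
# opam builds Coq/Rocq and CompCert from source, so no distribution "coq" package is needed.
# ccomp itself calls gcc for preprocessing, assembling and linking.
apt-get update && apt-get install -y opam build-essential
opam init -y --disable-sandboxing
eval $(opam env)
opam repo add coq-released https://coq.inria.fr/opam/released
opam install -y coq-compcert

# Or build from the source release (the free release is for non-commercial use;
# AbsInt's commercial version includes DO-178C qualification support)
# wget https://compcert.org/release/compcert-<version>.tgz

# Compile a C file with CompCert
ccomp -o my_binary my_program.c

# Compile and also save the RTL intermediate code at several optimisation
# points (written to my_program.rtl.<n>, for inspection)
ccomp -drtl -o my_binary my_program.c

# Cross-compile for ARM (embedded target). CompCert fixes its target at
# configure time, not via a command-line flag, so build a separate CompCert
# from source with ./configure arm-linux and run that ccomp:
ccomp -o firmware.elf firmware.c

# Dockerfile: CompCert compilation in CI on sota.io
FROM ubuntu:22.04
# opam needs build tools and a downloader; ccomp calls gcc to preprocess, assemble and link
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
        opam build-essential m4 unzip curl git \
    && rm -rf /var/lib/apt/lists/*
# Building Coq/Rocq and CompCert takes a while: it re-checks CompCert's Coq proofs
RUN opam init -y --disable-sandboxing \
    && eval $(opam env) \
    && opam repo add coq-released https://coq.inria.fr/opam/released \
    && opam install -y coq-compcert
WORKDIR /src
COPY . .
RUN eval $(opam env) && ccomp -o output my_program.c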

On sota.io, a Standard tier instance (€9/month, 2 GB RAM, 2 vCPU) handles CompCert compilation of typical embedded firmware projects (up to ~50k lines of C). The memory-intensive step is building CompCert itself through opam, which compiles Coq/Rocq and re-checks CompCert's Coq proofs. Running ccomp on your own C code does not involve Coq at all. That build step, and the compilation of very large codebases, benefit from the Pro tier.

The OCaml–Coq–CompCert Stack: A 100% EU-Native Verified Toolchain

CompCert is the capstone of what is arguably the world's most complete EU-native toolchain for formally verified software:

| Layer | Tool | Institution | Country |
|---|---|---|---|
| Theorem prover | Coq/Rocq | INRIA | 🇫🇷 |
| SMT solver | Alt-Ergo | CNRS/Université Paris-Saclay | 🇫🇷 |
| VCgen / deductive verifier | Why3 | INRIA Saclay | 🇫🇷 |
| Rust verifier | Creusot | INRIA Saclay | 🇫🇷 |
| C verifier | Frama-C | CEA LIST + INRIA | 🇫🇷 |
| C compiler | CompCert | INRIA + Collège de France | 🇫🇷 |
| Implementation language | OCaml | INRIA | 🇫🇷 |

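The chain runs from mathematical foundations (Coq/Rocq) through specification (Why3) and verification (Frama-C) to compilation (CompCert). All of it is developed by EU institutions and companies, and its source code is publicly available. CompCert's free release is licensed for non-commercial use only; commercial licences come from AbsInt. The chain does not depend on US-headquartered vendors or on cloud services subject to the US CLOUD Act. CompCert is the component whose own correctness is backed by a machine-checked proof; the other tools are mature and widely used but not themselves formally verified. For EU organisations building safety-critical AI systems under the EU AI Act, this stack provides a defensible, auditable, sovereign verification and compilation pipeline.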

See Also