Deploy Alive2 to Europe: The LLVM Optimization Verifier by Nuno Lopes 🇵🇹 (Universidade de Lisboa) That Found 47 Compiler Bugs, on EU Infrastructure in 2026
There is a subtle failure mode in the formal verification of software that practitioners rarely discuss. You write a program, prove properties about it using Dafny or Prusti or Frama-C, and deploy it with confidence. What you have proved is that the source code satisfies the specification. What you have not proved is that the binary the compiler produces does the same thing.
Compilers are complex. LLVM's optimization pipeline (InstCombine, GVN, SROA, LICM, and dozens more passes) transforms your IR through hundreds of thousands of lines of C++ before a byte of machine code is emitted. If any one of those passes contains a bug that introduces undefined behavior, miscompiles a pointer arithmetic expression, or drops a volatile store that zeroes a cryptographic key, your formal verification guarantee does not survive the compiler. The source is correct; the binary is not.
This is not theoretical. Dead store elimination of security-critical memset calls is a documented weakness class (CWE-14, "Compiler Removal of Code to Clear Buffers"): the compiler, correctly applying dead store elimination, determines that a buffer being zeroed is never read again and removes the zeroing entirely. The programmer's intent is irrelevant; in the absence of volatile, the optimizer is technically correct, and cryptographic key material stays in memory, where a memory-disclosure bug such as OpenSSL's Heartbleed buffer over-read can leak it into outgoing TLS packets. This is why OpenSSL zeroes secrets with OPENSSL_cleanse rather than a plain memset.
Alive2 addresses this problem directly. It is a translation validation framework for LLVM: given a sequence of IR transformations, Alive2 uses an SMT solver to check whether each transformation is semantically correct. It does not attempt a global proof of the compiler; instead it works pass by pass, comparing each function before and after a transformation.
Nuno P. Lopes 🇵🇹 created Alive2 while at Microsoft Research Cambridge 🇬🇧 (he is now a faculty member at Universidade de Lisboa 🇵🇹, Instituto Superior Técnico), with co-authors Juneyoung Lee and Chung-Kil Hur (Seoul National University) and Zhengyang Liu and John Regehr (University of Utah). The paper "Alive2: Bounded Translation Validation for LLVM" appeared at PLDI 2021. It follows the original Alive (PLDI 2015, Lopes, Menendez, Nagarakatte, and Regehr), which verified LLVM's InstCombine peephole optimizations. Alive2 generalises the approach to arbitrary LLVM passes.
By running Alive2 over LLVM's unit test suite, its authors discovered and reported 47 previously unknown bugs in LLVM's optimization pipeline (as reported in the PLDI 2021 paper): bugs that had survived years of testing and fuzzing and were exposed only when the SMT-based check produced a concrete counterexample.
What Translation Validation Means
The correctness property Alive2 checks is refinement: given a source IR program S and a target IR program T (the result of an optimization), T refines S if every observable behavior of T is also an observable behavior of S. Informally: the optimizer may reduce nondeterminism (eliminate undefined behavior, specialize under assumptions) but must not introduce behaviors the source does not have.
The direction matters: this is not equivalence. An optimizer may take a program with undefined behavior (which can do anything) and produce a program with defined behavior (which does something specific). That is a valid refinement: the target is more defined, not less. But an optimizer may not take a program with defined behavior and produce one that does something different.
Formally, for a source IR function F_S and target F_T:
∀ inputs I, ∀ memory states M:
defined(F_S(I, M)) ⇒ (defined(F_T(I, M)) ∧ F_T(I, M) = F_S(I, M))
Here, defined means the execution reaches a return without triggering undefined behavior. If the source execution is undefined, the target may do anything: that is the LLVM UB contract. But if the source is defined, the target must also be defined and produce the same result.
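Negating this property shows what a checker actually searches for. Writing d_S for defined(F_S(I, M)) and d_T for defined(F_T(I, M)) (shorthand introduced here only for brevity), a transformation is incorrect exactly when some input and memory state satisfy the right-hand side below:

```latex
\neg\,\forall I, M.\;\big(d_S \Rightarrow (d_T \land F_T(I,M) = F_S(I,M))\big)
\;\equiv\;
\exists I, M.\; d_S \land \big(\neg d_T \lor F_T(I,M) \neq F_S(I,M)\big)
```

Because undefined behavior is the complement of "defined", this is the source-defined, target-undefined-or-different condition; any satisfying I and M is a concrete counterexample.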
The LLVM Undefined Behavior Model
LLVM IR has a rich undefined behavior model that complicates correctness reasoning substantially. Alive2 must encode all of it:
Poison Values
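LLVM's poison represents a value that is the result of an undefined operation: integer overflow with nsw/nuw flags, an out-of-bounds getelementptr with inbounds, or a shift by an amount greater than or equal to the bit width. (Division by zero, by contrast, is immediate UB rather than poison.) Poison propagates through computations: most operations with a poison operand produce poison. A branch on poison is UB, as is a load or store through a poison pointer; storing a poison value itself is allowed and makes the stored bytes poison.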
define i32 @poison_demo(i32 %x) {
entry:
  ; %a is poison if %x + 1 overflows (nsw = no signed wrap)
  %a = add nsw i32 %x, 1
  ; %b = %a * 2: also poison if %a is poison
  %b = mul i32 %a, 2
  ; icmp on poison yields poison, and br on poison is undefined behavior
  %cmp = icmp sgt i32 %b, 0
  br i1 %cmp, label %then, label %else
then:
  ret i32 1
else:
  ret i32 0
}
An optimization that introduces poison where the source had a defined value is incorrect. Alive2 tracks poison as a separate boolean per value in its SMT encoding.
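To make the propagation rules concrete, here is a minimal Python sketch that interprets the add/mul/icmp/br snippet above over 32-bit integers. It is a simplified model, not Alive2's encoding: POISON is a hypothetical sentinel, and a branch on poison raises an exception that stands in for undefined behavior.

```python
# Minimal model of LLVM poison propagation for the i32 snippet above.
# Illustrative only: POISON, UndefinedBehavior and run_snippet are
# hypothetical names, not part of Alive2 or LLVM.
BITS = 32
INT_MIN, INT_MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


class UndefinedBehavior(Exception):
    """Raised when the modelled program executes immediate UB."""


POISON = object()  # unique sentinel standing for a poison value


def wrap(v: int) -> int:
    """Two's-complement wraparound to BITS bits (arithmetic without flags)."""
    return (v - INT_MIN) % (1 << BITS) + INT_MIN


def add_nsw(a, b):
    """add nsw: poison if either operand is poison or the sum overflows."""
    if a is POISON or b is POISON:
        return POISON
    s = a + b
    return s if INT_MIN <= s <= INT_MAX else POISON


def mul(a, b):
    """mul without flags: wraps on overflow but still propagates poison."""
    if a is POISON or b is POISON:
        return POISON
    return wrap(a * b)


def icmp_sgt(a, b):
    """icmp with a poison operand yields poison."""
    if a is POISON or b is POISON:
        return POISON
    return a > b


def br(cond, if_true: str, if_false: str) -> str:
    """A conditional branch on poison is immediate undefined behavior."""
    if cond is POISON:
        raise UndefinedBehavior("br on poison")
    return if_true if cond else if_false


def run_snippet(x: int) -> str:
    a = add_nsw(x, 1)
    b = mul(a, 2)
    cmp = icmp_sgt(b, 0)
    return br(cmp, "then", "else")
```

For example, run_snippet(5) takes the "then" branch; run_snippet(2**30) takes "else" because the flagless mul wraps to a negative but defined value; run_snippet(INT_MAX) raises UndefinedBehavior because the overflowing add nsw poisons everything downstream, including the branch condition.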
Undef (Deprecated but Present)
LLVM IR has long used undef: a value that can be any bit pattern, chosen independently at each use. Undef is weaker than poison: an operation on undef yields some value the operation could produce for some choice of bit pattern, rather than automatically becoming poison. Because different uses of the same undef can observe different values, reasoning about such programs is extremely difficult. LLVM is gradually replacing undef with poison + freeze.
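A standard illustration of why independent choices hurt: rewriting `mul %x, 2` as `add %x, %x` looks harmless, but it is not a refinement when %x may be undef. The sketch below enumerates every choice on a hypothetical 4-bit type (i4) to show this; the helper names are illustrative and the set-of-results model is a simplification of LLVM's semantics.

```python
# Sketch: undef as "any bit pattern, chosen independently at every use",
# on a hypothetical 4-bit type (i4) so all choices can be enumerated.
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1
UNDEF = "undef"


def choices(operand):
    """Possible values of one use of an operand: all bit patterns for undef."""
    if operand == UNDEF:
        return range(1 << BITS)
    return [operand & MASK]


def mul(a, b):
    """All results `mul i4 a, b` can produce; each operand use chosen separately."""
    return {(x * y) & MASK for x, y in product(choices(a), choices(b))}


def add(a, b):
    """All results `add i4 a, b` can produce."""
    return {(x + y) & MASK for x, y in product(choices(a), choices(b))}


def refines(tgt: set, src: set) -> bool:
    """The target refines the source iff it adds no new possible results."""
    return tgt <= src


# Rewriting `mul %x, 2` as `add %x, %x` when %x may be undef:
src = mul(UNDEF, 2)        # one use of undef, doubled: only even results
tgt = add(UNDEF, UNDEF)    # two uses, chosen independently: any result
```

Here refines(tgt, src) is False because the target can return odd values such as 1, which the source never can; the reverse rewrite (add into mul) is a refinement. For a concrete operand both forms agree, e.g. mul(3, 2) and add(3, 3) are both {6}.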
The freeze Instruction
freeze %x returns a concrete non-poison value: if %x is poison, freeze returns an arbitrary but fixed value; if %x is defined, freeze is the identity. The freeze instruction was introduced to make optimizations such as loop unswitching sound: hoisting a loop-invariant branch out of a loop can introduce a branch on poison that the original program never executed (for example, when the loop body never runs), and freezing the condition first makes the hoisted branch well defined.
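Alive2 models freeze directly. The value returned by freezing poison is a nondeterministic choice, and refinement treats the two sides differently: a choice made in the source is quantified existentially (the target only has to match some source choice), while a choice made in the target is quantified universally (every target choice must be allowed by the source).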
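The following sketch illustrates both points on a hypothetical 4-bit type: one freeze makes one choice that all uses share, and refinement is a "for every target result, some source result" check, done here by plain enumeration instead of quantified SMT formulas. The POISON sentinel and function names are illustrative assumptions.

```python
# Sketch: freeze on a hypothetical 4-bit type. Freezing poison makes ONE
# nondeterministic choice that every use of the frozen value then shares.
BITS = 4
MASK = (1 << BITS) - 1
POISON = None  # illustrative sentinel for a poison operand


def freeze_choices(x):
    """Values `freeze i4 %x` may return: any pattern if %x is poison, else %x."""
    return range(1 << BITS) if x is POISON else [x & MASK]


def add_frozen_twice(x):
    """Possible results of: %f = freeze i4 %x ; %r = add i4 %f, %f."""
    # One choice c per execution; both uses of %f observe the same c.
    return {(c + c) & MASK for c in freeze_choices(x)}


def refines(tgt_results, src_results) -> bool:
    """Every target result (any target choice) must be a possible source
    result (some source choice): a forall-exists check, done by enumeration."""
    return set(tgt_results) <= set(src_results)
```

Unlike undef, whose uses may disagree, both uses of %f agree, so add_frozen_twice(POISON) contains only even values. Replacing freeze(poison) with the constant 0 is a valid refinement, while replacing the constant 0 with freeze(poison) is not.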
Memory Model
LLVM's memory model handles:
- noalias annotations: two pointers declared noalias do not alias, so loads and stores through them may be reordered
- align requirements: a load or store whose pointer is less aligned than its declared align is UB
- dereferenceable attributes: a pointer declared dereferenceable(n) has n accessible bytes, so loads within that range may be executed speculatively (for example, hoisted above a null check)
Alive2 encodes aliasing constraints as uninterpreted functions with uniqueness axioms. Memory is modelled as a map from byte addresses to symbolic values, with constraints tracking which regions are allocated and with what alignment.
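A toy version of this model makes the role of aliasing visible. The sketch below treats memory as a Python dict from byte addresses to values over a hypothetical 8-byte address space, and checks whether forwarding a stored value to a later load is correct. It assumes single-byte accesses, so two accesses overlap exactly when p == q; Alive2 itself reasons about pointers symbolically rather than enumerating addresses.

```python
# Sketch: memory as a map from byte addresses to values, on a hypothetical
# 8-byte address space, used to check store-to-load forwarding.
from itertools import product

ADDRESSES = range(8)


def source(p: int, q: int, v1: int, v2: int) -> int:
    mem = {}          # byte address -> stored value
    mem[p] = v1       # store i8 v1, ptr p
    mem[q] = v2       # store i8 v2, ptr q
    return mem[p]     # load i8, ptr p


def target(p: int, q: int, v1: int, v2: int) -> int:
    return v1         # optimized: the load is forwarded from the first store


def counterexamples(assume_noalias: bool):
    """All (p, q) pairs where forwarding changes the loaded value."""
    bad = []
    for p, q in product(ADDRESSES, repeat=2):
        if assume_noalias and p == q:
            continue  # noalias: aliasing executions are UB, so excluded
        if source(p, q, 1, 2) != target(p, q, 1, 2):
            bad.append((p, q))
    return bad
```

With assume_noalias=True there are no counterexamples; without it, every pair with p == q is one, which is exactly the stale-value situation that incorrect load forwarding produces.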
The SMT Encoding
For each optimization rewrite, Alive2 constructs an SMT formula that is satisfiable if and only if the transformation is incorrect, then queries Z3. If Z3 reports UNSAT, the transformation is verified (within the bounds discussed below). If Z3 reports SAT, its model is a concrete counterexample: an input and memory state that witnesses the incorrect transformation.
The encoding proceeds by symbolic execution of source and target IR:
Source IR F_S → symbolic executor → (result_S, ub_condition_S, poison_S)
Target IR F_T → symbolic executor → (result_T, ub_condition_T, poison_T)
Refinement violation:
¬ub_condition_S ∧ (ub_condition_T ∨ (¬poison_S ∧ (poison_T ∨ result_S ≠ result_T)))
If this formula is satisfiable, there exists an input where the source is defined (no UB) but the target either triggers UB or, although the source result is not poison, returns poison or a different value: a transformation bug.
Integer operations map directly to SMT bitvector operations. Floating-point requires IEEE 754 semantics, which Z3 supports via its FP theory. Memory operations involve the symbolic memory model. Loops require unrolling (hence "bounded" validation: Alive2 unrolls loops up to a configurable bound, so bugs that only appear after more iterations can be missed).
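To see what "bounded" means in practice, the sketch below compares a summation loop with a hypothetical closed-form rewrite on i8, checking only trip counts up to an unroll bound. It is a brute-force stand-in for the idea, not Alive2's algorithm; the functions and the injected bug (the multiply wraps before the division) are illustrative.

```python
# Sketch of *bounded* validation: compare a loop with a hypothetical
# closed-form rewrite on i8, but only for trip counts up to an unroll bound.
MASK = 0xFF  # i8 arithmetic wraps modulo 256


def source_sum(n: int) -> int:
    """for (i = 0; i < n; i++) acc += i;  computed with i8 wraparound."""
    acc = 0
    for i in range(n):
        acc = (acc + i) & MASK
    return acc


def buggy_closed_form(n: int) -> int:
    """Hypothetical rewrite: (n * (n - 1)) / 2, but the multiply wraps first."""
    return ((n * (n - 1)) & MASK) >> 1


def bounded_check(unroll_bound: int):
    """Return the first trip count <= unroll_bound where the two differ, or None."""
    for n in range(unroll_bound + 1):
        if source_sum(n) != buggy_closed_form(n):
            return n
    return None
```

bounded_check(16) reports nothing, while bounded_check(17) finds the mismatch at 17 iterations, because n * (n - 1) first exceeds 255 there (17 * 16 = 272). A bound that is too small silently passes a wrong transformation.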
Real Bugs Found
The 47 bugs Alive2 found in LLVM cover a wide range of passes. Selected examples:
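InstCombine: incorrect nsw flag propagation (simplified):
; Source: only the first add carries nsw
%1 = add nsw i32 %x, 1
%2 = add i32 %1, -2 ; no nsw: wraps on overflow, so the result is still defined
; Incorrect fold: "(x + 1) - 2 = x - 1", with nsw copied onto the new add
%2 = add nsw i32 %x, -1 ; WRONG for %x = INT_MIN: poison, but the source returns INT_MAX
For every other input the fold is fine: when %x + 1 overflows, the source is already poison and may be refined to any value. Only the copied nsw flag makes the target more poisonous than the source.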
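Alive2 answers the refinement question for a case like this symbolically. The following Python sketch answers it by brute force for one instance of the bug class, folding `(x +nsw 1) + (-2)` into `x +nsw (-1)`, scaled down from i32 to i8 so all 256 inputs can be enumerated. The POISON sentinel and helper names are illustrative.

```python
# Sketch: exhaustively check the fold (x +nsw 1) + (-2)  ->  x +nsw (-1),
# scaled down to i8 so every input can be enumerated (Alive2 uses SMT instead).
BITS = 8
INT_MIN, INT_MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
POISON = None  # illustrative sentinel


def wrap(v: int) -> int:
    """Two's-complement wraparound to BITS bits (add without flags)."""
    return (v - INT_MIN) % (1 << BITS) + INT_MIN


def add(a, b, nsw=False):
    """LLVM add: poison propagates; with nsw, signed overflow yields poison."""
    if a is POISON or b is POISON:
        return POISON
    s = a + b
    if nsw and not INT_MIN <= s <= INT_MAX:
        return POISON
    return wrap(s)


def source(x):
    return add(add(x, 1, nsw=True), -2)   # only the first add has nsw


def target(x):
    return add(x, -1, nsw=True)           # nsw propagated onto the fold


def counterexamples():
    """Inputs where the source is not poison but the target is poison or differs."""
    return [x for x in range(INT_MIN, INT_MAX + 1)
            if source(x) is not POISON and target(x) != source(x)]
```

The only counterexample is x = INT_MIN (-128): the source wraps to INT_MAX (127) while the target is poison. At x = INT_MAX the source is already poison, so the target is free to return anything there.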
GVN: incorrect load forwarding across aliasing stores:
The optimization forwarded a loaded value across a store through another pointer without correctly accounting for aliasing. Alive2 showed that the two pointers could alias, despite the optimization assuming they could not, and in that case the forwarded value was stale.
Dead Store Elimination: dropping volatile stores:
A volatile store to zero out a buffer was incorrectly removed. The optimizer determined the buffer was not subsequently read via non-volatile loads, applying dead store elimination without checking the volatile attribute. Alive2's model of volatile operations (which must not be reordered or eliminated) caught this.
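The reason a volatile-aware model catches this can be sketched by making volatile accesses part of a program's observable behaviour. In the hypothetical model below, a behaviour is the pair (return value, trace of volatile stores); comparing only return values, as a model that ignores volatile would, lets the buggy dead store elimination through.

```python
# Sketch: make volatile accesses observable. A behaviour is the pair
# (return value, trace of volatile stores); the names are hypothetical.
def source_program(key_buf: list):
    trace = []
    for i in range(len(key_buf)):
        key_buf[i] = 0                      # volatile store i8 0
        trace.append(("volatile store", i, 0))
    return 42, tuple(trace)


def buggy_dse_target(key_buf: list):
    # The buffer is never read again, so this hypothetical DSE deletes the
    # zeroing loop without checking that the stores were volatile.
    return 42, ()


def refines(tgt, src) -> bool:
    """Deterministic programs: the target must show exactly the source behaviour."""
    return tgt == src


def refines_ignoring_volatile(tgt, src) -> bool:
    """A model that forgets volatile only compares return values."""
    return tgt[0] == src[0]
```

Both programs return 42, so refines_ignoring_volatile accepts the transformation; refines rejects it because the target's volatile trace is empty.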
This last class of bug is directly security-relevant: volatile memset or explicit volatile zeroing of cryptographic key material can be eliminated by incorrect dead store elimination, leaving key material in memory. Alive2 catches it at the optimization pass level.
Alive2 vs CompCert: Complementary Approaches
The question naturally arises: if CompCert (Xavier Leroy 🇫🇷, INRIA Paris, CACM 2009) provides a fully formally verified C compiler with a Coq proof of semantic preservation, why is Alive2 needed?
The answer is scope and applicability:
CompCert:
- Full formal proof of correctness: all executions, not bounded
- Covers the CompCert compiler pipeline: CompCert C → Clight → ... → x86-64 assembly
- Proof assistant (Coq): proof is machine-checked once, forever
- Limitation: CompCert is a separate compiler, not GCC or LLVM. Adopting CompCert means replacing your entire toolchain.
Alive2:
- Bounded translation validation: checks each pass per-invocation, not a global proof
- Covers LLVM: the optimizer and code generator behind Clang (C/C++), Rust, Swift, Kotlin/Native, and more
- SMT-based: discharges proof obligations automatically per-transformation
- Can be run on the actual LLVM optimizing your actual code, with your actual flags
- Limitation: bounded unrolling means very long loops may not be fully verified
In the EU formal verification ecosystem, CompCert is the gold standard for safety-critical C in aerospace, railway, and nuclear applications, commercialised by AbsInt 🇩🇪. Alive2 is the practical tool for everyone using LLVM, which covers a large share of the industry. They are not competing; they are complementary layers of a verification stack.
A future ideal: Rust compiled via LLVM, with Prusti 🇨🇭 (ETH Zurich) verifying source semantics and Alive2 verifying the LLVM IR optimization pipeline. Together with validation of rustc's lowering to LLVM IR and of the LLVM backend, this would produce an end-to-end certificate from source specification to machine code.
EU Compliance Angles
CRA 2027: Supply Chain Security
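The EU Cyber Resilience Act (Regulation (EU) 2024/2847) requires manufacturers to "address and remediate vulnerabilities without delay" and to identify and document the components contained in their products, including by drawing up a software bill of materials (Annex I, Part II). Art. 13(5) requires due diligence when integrating third-party components, including free and open-source ones, and Art. 14 sets out reporting obligations for actively exploited vulnerabilities and severe incidents.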
Compiler correctness is a supply chain risk: your compiler is a component of your product's supply chain. If the compiler has bugs that silently miscompile security-critical code (key zeroing, bounds checks, cryptographic operations), the CRA "state of the art" security obligation is not met by source-level verification alone. Alive2 provides the tooling to validate the LLVM component of your supply chain.
Running Alive2 against the specific LLVM version in your toolchain, with the specific optimization flags your build system uses, produces a per-build validation artifact documenting that your optimization pipeline preserves program semantics for your specific code, within Alive2's loop-unrolling bound and solver timeouts.
DO-178C / ED-12C: Aerospace Qualification
DO-178C (avionics software standard) requires compiler qualification: the compiler used must be qualified as a tool, or the produced object code must be verified. The CompCert + AbsInt 🇩🇪 toolchain is one route to DO-178C Level A compliance. For projects using LLVM-based toolchains, Alive2-style translation validation supports the object code verification alternative for the optimization stage: it generates evidence that the IR-level optimizer did not introduce errors, while instruction selection and code emission still need separate evidence.
ISO 26262 / SOTIF: Automotive
ISO 26262 requires tool confidence level (TCL) assessment for compilers. A compiler with known bugs (e.g., bugs found by Alive2 but not yet patched) affects the TCL. Running Alive2 in a qualification context, and confirming that identified bugs do not affect the specific IR patterns generated by your codebase, is a defensible component of ISO 26262 tool qualification evidence.
NIS2 Art. 21: Critical Infrastructure
For essential and important entities running safety-critical software in critical infrastructure (power grids, water treatment, transport management), the supply chain security obligations of NIS2 Art. 21(2)(d) can reasonably be read to extend to compiler correctness. Translation validation for the specific compiler version and flags used in production is auditable NIS2 evidence.
Deploying Alive2 on EU Infrastructure
Alive2 integrates into LLVM development and testing workflows. It can run as:
- A standalone verifier: given two LLVM IR files, check refinement
- Integrated into LLVM's CI: run on each optimization pass automatically
- A mutation testing tool: generate IR rewrites and check correctness
Docker deployment
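FROM ubuntu:24.04
RUN apt-get update && apt-get install -y \
    cmake ninja-build g++ clang git python3 re2c libz3-dev zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*
# Alive2 tracks LLVM main and needs an LLVM built with RTTI and exceptions,
# so the distribution's llvm-dev package is not sufficient
RUN git clone --depth=1 https://github.com/llvm/llvm-project /llvm-project
RUN cmake -G Ninja -S /llvm-project/llvm -B /llvm-build \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_RTTI=ON -DLLVM_ENABLE_EH=ON \
    -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=ON
RUN ninja -C /llvm-build
# Build Alive2 against that LLVM
RUN git clone --depth=1 https://github.com/AliveToolkit/alive2 /alive2
RUN cmake -G Ninja -S /alive2 -B /alive2/build \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_DIR=/llvm-build/lib/cmake/llvm \
    -DBUILD_TV=ON
RUN ninja -C /alive2/build alive-tv
ENTRYPOINT ["/alive2/build/alive-tv"]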
Running translation validation on an optimization
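# Compile source to LLVM IR (without -disable-O0-optnone, -O0 marks every
# function optnone and opt would skip it)
clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o before.ll crypto_zero.c
# Run optimization pass
opt -passes=instcombine -S -o after.ll before.ll
# Check that optimization is correct via Alive2
alive-tv before.ll after.ll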
If the optimization is correct, Alive2 prints "Transformation seems to be correct!". If it is incorrect, it prints a counterexample input that demonstrates the semantic mismatch.
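To turn such a run into the kind of per-build artifact discussed in the compliance section, a small wrapper can hash the inputs and record the verdict. This is a hypothetical sketch: the record layout and script usage are invented for illustration, the success string is the one quoted above, and the failure markers ("Transformation doesn't verify!" and any line containing ERROR) are assumptions to check against your Alive2 build.

```python
# Sketch: wrap one alive-tv run into a per-build evidence record. The JSON
# layout is hypothetical, and the verdict relies on assumed output strings;
# check them against your Alive2 version.
import hashlib
import json
import subprocess
import sys
from pathlib import Path


def sha256(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verdict(output: str) -> str:
    if "Transformation doesn't verify!" in output:
        return "incorrect"
    if "ERROR" in output:
        return "inconclusive"   # assumed marker for timeouts and other failures
    if "Transformation seems to be correct!" in output:
        return "correct"
    return "inconclusive"


def make_record(before: str, after: str, passes: str, output: str) -> dict:
    return {
        "before": {"path": before, "sha256": sha256(before)},
        "after": {"path": after, "sha256": sha256(after)},
        "passes": passes,
        "verdict": verdict(output),
        "alive_tv_output": output,
    }


if __name__ == "__main__" and len(sys.argv) == 4:
    before_ll, after_ll, pass_list = sys.argv[1:4]
    run = subprocess.run(["alive-tv", before_ll, after_ll],
                         capture_output=True, text=True)
    record = make_record(before_ll, after_ll, pass_list, run.stdout + run.stderr)
    print(json.dumps(record, indent=2))
    sys.exit(0 if record["verdict"] == "correct" else 1)
```

A possible invocation (hypothetical script name) is `python3 alive_record.py before.ll after.ll instcombine > alive2-record.json`. The non-zero exit code on anything but "correct" lets a CI job fail closed instead of silently passing a timeout.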
Integrating into CI
# .github/workflows/alive2-check.yml
name: Alive2 Translation Validation
on: [push, pull_request]
jobs:
  alive2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Alive2 Docker image
        run: docker build -t alive2 .
      # Ideally clang/opt should match the LLVM version Alive2 was built against
      - name: Install clang and opt
        run: sudo apt-get update && sudo apt-get install -y clang llvm
      - name: Compile to IR (unoptimized, without optnone)
        run: |
          clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o src/before.ll src/crypto.c
      - name: Apply optimizations
        run: |
          opt -passes=instcombine,gvn,dse -S -o src/after.ll src/before.ll
      - name: Validate via Alive2
        run: |
          docker run --rm -v "$(pwd)/src:/src" alive2 \
            /src/before.ll /src/after.ll
Deploying on sota.io
# sota.io deployment: build and run the Alive2 checker service
sota deploy --dockerfile ./Dockerfile.alive2 \
--name alive2-validator \
--region eu-central-1 \
--memory 2gb
Running Alive2 on EU infrastructure (German data centres, no Cloud Act jurisdiction) means your compiler validation artifacts (the IR files, the verification logs, the counterexamples) remain in EU territory. For aerospace and automotive qualification workflows with contractual or regulatory data-handling requirements, including NIS2 supply chain documentation, data residency can be a non-negotiable constraint.
The LLVM Ecosystem EU Presence
While LLVM itself originates from the University of Illinois (Chris Lattner, 2000), the academic community contributing to LLVM verification and semantics is substantially European:
- Nuno Lopes 🇵🇹 (Universidade de Lisboa / IST): Alive, Alive2, LLVM-TV
- John Regehr (Utah) + Arnd Bergmann 🇩🇪 (Linux kernel, Linaro): compiler correctness for kernel code
- Ralf Jung 🇩🇪 (MPI-SWS Saarbrücken 🇩🇪 → MIT → ETH Zurich 🇨🇭): LLVM memory model formalization, Stacked Borrows (Rust MIR semantics)
- Derek Dreyer 🇺🇸 (MPI-SWS Saarbrücken 🇩🇪): relaxed memory models for compiler correctness
- Viktor Vafeiadis 🇬🇷 (MPI-SWS Kaiserslautern 🇩🇪): LLVM memory model, compiler optimizations under weak memory
The MPI-SWS (Max Planck Institute for Software Systems, with sites in Kaiserslautern and Saarbrücken 🇩🇪) is a major centre for compiler and memory model semantics research, directly feeding into the formal understanding of LLVM that Alive2 depends on.
Deploy Alive2 to EU Servers with sota.io
sota.io provides the EU-native infrastructure to run Alive2 as a CI service, a qualification artifact generator, or a continuous compiler validation daemon β all on German infrastructure, GDPR-compliant by default.
# sota.io one-command deployment (Alive2 is not available as an Ubuntu apt
# package, so build it from the Dockerfile above)
sota deploy --dockerfile ./Dockerfile.alive2 \
--region eu-central-1
Free tier includes enough compute to validate optimization pipelines for medium-sized embedded C codebases. No Cloud Act. No PRISM. No EU data leaving the EU.
Universidade de Lisboa 🇵🇹 / MSR Cambridge 🇬🇧: EU-anchored research. MIT License. github.com/AliveToolkit/alive2.