2026-04-08·7 min read·sota.io team

HOL Light MCP Server: Formal Theorem Proving via Claude and AI Assistants (2026)

In 1996, John Harrison completed his PhD at the University of Cambridge under Mike Gordon. HOL Light dates from the same period: a minimalist implementation of Higher Order Logic whose trusted kernel is approximately 400 lines of OCaml, the functional language developed at INRIA in France. Over the following decade, Harrison used HOL Light at Intel to formally prove the correctness of transcendental functions implemented in IEEE 754 floating-point arithmetic: sin, cos, exp, atan, ln. The proofs covered every representable input. No test suite does that.

In 2014, the Flyspeck project completed a formal proof of the Kepler conjecture — that no arrangement of equal spheres in three-dimensional Euclidean space has a greater average density than face-centred cubic packing, the pattern grocers use to stack oranges. The proof, in HOL Light and Isabelle, closed a 400-year-old mathematical question.

In 2026, HOL Light has an MCP server.

This matters because MCP — the Model Context Protocol, an open standard released by Anthropic in November 2024 — lets AI assistants like Claude call external tools during a conversation. Not by generating code that you then paste into a terminal. By invoking the tool directly, receiving the result, and incorporating it into the response. The HOL Light MCP server makes a theorem prover a first-class tool in an AI coding assistant. You can ask Claude to prove a property of your code and get a machine-checked answer.

HOL Light in Brief

HOL Light implements the HOL (Higher Order Logic) proof system originally designed by Mike Gordon at the Cambridge Computer Laboratory in the 1980s. The defining design decision is minimality: the trusted kernel, meaning the code that must be correct for the proof system to be sound, is small enough to audit. Every theorem you prove in HOL Light is ultimately justified by a chain of inference rules applied to axioms, with the kernel as the only trusted component. Tactics and decision procedures sit outside the kernel: they can fail to find a proof, but they cannot manufacture a theorem the kernel did not derive.
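The trusted-kernel idea can be illustrated with a toy sketch. This is not HOL Light's kernel and not OCaml: it is a few lines of Python, written on the assumption that formulas are plain tuples and that a private token stands in for OCaml's abstract theorem type. The rule names REFL, ASSUME and TRANS are borrowed from HOL Light's primitive rules; everything else here is hypothetical.

```python
_KERNEL_TOKEN = object()  # capability known only to the rules defined below


class Theorem:
    """A sequent `hyps |- concl`; the constructor rejects callers without the token."""

    def __init__(self, hyps, concl, _token=None):
        if _token is not _KERNEL_TOKEN:
            raise PermissionError("theorems can only be produced by inference rules")
        self.hyps = frozenset(hyps)
        self.concl = concl


def eq(lhs, rhs):
    """Build the formula `lhs = rhs`. Formulas are untrusted data, not theorems."""
    return ("=", lhs, rhs)


def _dest_eq(formula):
    """Split an equation into its two sides, or fail if it is not an equation."""
    if isinstance(formula, tuple) and len(formula) == 3 and formula[0] == "=":
        return formula[1], formula[2]
    raise ValueError(f"not an equation: {formula!r}")


def REFL(term):
    """Rule: |- t = t"""
    return Theorem([], eq(term, term), _KERNEL_TOKEN)


def ASSUME(formula):
    """Rule: p |- p"""
    return Theorem([formula], formula, _KERNEL_TOKEN)


def TRANS(th1, th2):
    """Rule: from A |- a = b and B |- b = c derive A u B |- a = c."""
    a, b1 = _dest_eq(th1.concl)
    b2, c = _dest_eq(th2.concl)
    if b1 != b2:
        raise ValueError("TRANS: middle terms differ")
    return Theorem(th1.hyps | th2.hyps, eq(a, c), _KERNEL_TOKEN)


if __name__ == "__main__":
    chained = TRANS(ASSUME(eq("x", "y")), ASSUME(eq("y", "z")))
    print(sorted(chained.hyps), "|-", chained.concl)
```

Python cannot truly hide the token, so the guarantee here is only a convention. In an ML-family language the theorem type can be made abstract, which is what lets a small kernel remain the only code that needs auditing.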

Harrison built HOL Light to be small and fast. The OCaml implementation compiles quickly, loads interactively, and integrates naturally with the OCaml ecosystem. Intel chose it for floating-point verification precisely because the trusted base was auditable and the OCaml integration made it straightforward to connect to hardware specifications.

Key properties:

Trusted kernel: ~400 lines of OCaml. The only code that must be trusted for soundness.
Higher Order Logic: Functions as first-class values, polymorphism, quantification over functions. More expressive than first-order logic.
Tactics: Proof automation via ML functions that decompose goals. The tactic language is extensible in OCaml.
Decision procedures: Built-in procedures for linear arithmetic over the naturals, integers, and reals (for example ARITH_TAC and REAL_ARITH_TAC), alongside a library of real analysis.
Flyspeck library: The full Kepler conjecture proof, available as a reusable library of geometric lemmas.

For an in-depth look at deploying HOL Light as a verification workload, see our earlier post on HOL Light EU hosting. This post focuses on the MCP server interface specifically.

What is MCP?

The Model Context Protocol is a JSON-RPC 2.0 based open standard for connecting AI assistants to external tools and data sources. The server exposes a list of tools — each with a name, description, and JSON Schema for inputs and outputs. The AI client (Claude, or any MCP-compatible assistant) can call those tools during inference and incorporate the results into its response.

This is different from having the assistant run code it has just written. When Claude runs Python code in a sandboxed interpreter, it is generating code and observing output. When Claude calls an HOL Light MCP tool, it is submitting a formal goal and receiving a machine-checked theorem or, if the tactic does not close the goal, the subgoals that remain unsolved. HOL Light does not search for counterexamples, so a failed attempt means "not proved with this tactic", not "false".

The protocol is transport-agnostic: servers run over stdio (for local tools) or HTTP (for network-accessible services). For a proof infrastructure deployment, HTTP is natural: you deploy HOL Light on a server, expose an MCP endpoint, and point your Claude configuration at it.
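To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 messages a client sends to list and call tools. The method names `tools/list` and `tools/call` and the `name`/`arguments` parameters come from the MCP specification. The helper functions are hypothetical, the tool name in the example is one of those listed in the next section, and the `initialize` handshake that precedes these calls in a real session is omitted.

```python
import json
from itertools import count

_next_id = count(1)  # JSON-RPC ids only need to be unique within a session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as a plain dict."""
    message = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools():
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool(name, arguments):
    """Invoke one tool with arguments matching its input schema."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(json.dumps(list_tools()))
    print(json.dumps(call_tool("hol_check", {"term": "!n. n + 0 = n"}), indent=2))
```

The same dictionaries can be written to a subprocess's stdin or posted over HTTP; only the transport changes, not the message shape.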

The HOL Light MCP Server

The HOL Light MCP server exposes theorem proving as a set of callable tools:

{
  "tools": [
    {
      "name": "hol_prove",
      "description": "Attempt to prove a HOL Light goal. Returns the proved theorem if successful, or a list of unsolved subgoals if the tactic fails.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "goal": { "type": "string", "description": "HOL Light goal term" },
          "tactic": { "type": "string", "description": "HOL Light tactic expression (e.g. 'ARITH_TAC', 'MESON_TAC[]')" }
        },
        "required": ["goal", "tactic"]
      }
    },
    {
      "name": "hol_check",
      "description": "Type-check and parse a HOL Light term without attempting proof.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "term": { "type": "string" }
        },
        "required": ["term"]
      }
    },
    {
      "name": "hol_search",
      "description": "Search the HOL Light theorem database for theorems matching a pattern.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "pattern": { "type": "string", "description": "HOL Light term pattern to search for" }
        },
        "required": ["pattern"]
      }
    }
  ]
}

The server loads HOL Light on startup, keeping the proof state warm between calls. Individual proof requests are processed synchronously; the server serializes them to avoid concurrent access to the OCaml runtime.
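One way to serialize requests against a single warm prover is a lock around every call. The sketch below assumes a hypothetical `backend(goal, tactic)` callable standing in for the real HOL Light session; the fake backend exists only to show that calls never overlap, and its return value is not real prover output.

```python
import threading
import time


class SerializedProver:
    """Funnels all proof requests through one lock so the backend sees one call at a time."""

    def __init__(self, backend):
        self._backend = backend
        self._lock = threading.Lock()

    def prove(self, goal, tactic):
        with self._lock:  # concurrent callers wait here until the current call returns
            return self._backend(goal, tactic)


class FakeBackend:
    """Hypothetical stand-in for the prover; records how many calls overlap."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    def __call__(self, goal, tactic):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        time.sleep(0.01)  # pretend the prover is busy
        self.in_flight -= 1
        return f"|- {goal}"


def run_concurrently(prover, goals, tactic="ARITH_TAC"):
    """Submit every goal from its own thread and collect results in input order."""
    results = [None] * len(goals)

    def worker(index, goal):
        results[index] = prover.prove(goal, tactic)

    threads = [threading.Thread(target=worker, args=(i, g)) for i, g in enumerate(goals)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    backend = FakeBackend()
    print(run_concurrently(SerializedProver(backend), ["a = a", "b = b", "c = c"]))
    print("max overlapping calls:", backend.max_in_flight)
```

A lock keeps the design simple at the cost of throughput: a slow proof blocks every request queued behind it, so a real deployment would likely also want a per-request timeout.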

A Conversation with the Proof Assistant

Here is what a session looks like when Claude has access to the HOL Light MCP server:

User: Can you prove that for all natural numbers n, n + 0 = n in HOL Light?

Claude: Let me invoke HOL Light directly.

[Tool call: hol_prove, goal: !n. n + 0 = n, tactic: ARITH_TAC]

[Tool result: |- !n. n + 0 = n]

The proof succeeded. HOL Light's ARITH_TAC decision procedure for linear arithmetic over the natural numbers directly closes this goal. The returned value |- !n. n + 0 = n is the theorem. The |- symbol is HOL's turnstile; with nothing to its left, the theorem holds without assumptions.

A more interesting case for software verification:

User: I have a sorting function that claims to return a list where every element is less than or equal to the next. Can you help me formally state and prove a key invariant?

Claude: Let me start by checking whether a relevant lemma already exists in the database.

[Tool call: hol_search, pattern: SORTED]

[Tool result: SORTED_APPEND, SORTED_MERGE, SORTED_FILTER, SORTED_MONO_INJECTIVE]

Several relevant theorems already exist in the HOL Light library. SORTED_APPEND states conditions under which the concatenation of two sorted lists is sorted. Let me prove a specific instance for your comparison function...

This interaction pattern — AI assistant as proof engineer, theorem prover as oracle — is new in 2026. It changes what developers can verify without being HOL Light experts. The AI handles the proof search strategy; HOL Light provides the machine-checked guarantee.
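The division of labour can be sketched as a loop that proposes tactics and lets the prover decide. Everything here is an assumption made for illustration: `prove` stands in for a `hol_prove` tool call, the result shape (`theorem` and `subgoals` keys) is hypothetical, and the scripted prover returns canned outcomes, not real HOL Light output.

```python
def first_successful_tactic(prove, goal, tactics):
    """Try each tactic in order; return the first that yields a theorem, plus the attempt log."""
    attempts = []
    for tactic in tactics:
        result = prove(goal, tactic)
        attempts.append((tactic, result))
        if result.get("theorem") is not None:
            return tactic, result["theorem"], attempts
    return None, None, attempts


def scripted_prover(goal, tactic):
    """Hypothetical stand-in for the prover: only ARITH_TAC is scripted to succeed."""
    if tactic == "ARITH_TAC":
        return {"theorem": f"|- {goal}", "subgoals": []}
    return {"theorem": None, "subgoals": [goal]}


if __name__ == "__main__":
    tactic, theorem, attempts = first_successful_tactic(
        scripted_prover, "!n. n + 0 = n", ["MESON_TAC[]", "ARITH_TAC"]
    )
    print("closed by:", tactic)
    print("theorem:", theorem)
    print("attempts made:", len(attempts))
```

The loop never decides that a goal is proved; it only reports what the prover returned. That keeps the assistant's guesses outside the trusted path.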

Why EU Infrastructure Matters for Proof Artifacts

When you run formal verification as part of a CI/CD pipeline or regulatory compliance workflow, the proof artifacts — the machine-checked terms, the proof logs, the counterexample witnesses — are documentation with legal weight.

Under EU AI Act Article 9, the provider of a high-risk AI system must maintain a documented risk management system throughout the system's lifecycle. Proof artifacts from formal verification are primary evidence that foreseeable failure modes have been analysed exhaustively, not just sampled.

Under the US CLOUD Act, a US-incorporated company can be compelled by US authorities to produce data it stores, including those proof artifacts. This extends to servers physically located in the EU when they are operated by US-incorporated cloud providers. Railway and Render are headquartered in San Francisco. Fly.io is incorporated in Delaware. If your proof artifacts are on their infrastructure, they are potentially subject to US government access, regardless of which physical datacenter they are stored in.

HOL Light is BSD-licensed open source. The MCP server has no proprietary dependencies. You can run the full stack — OCaml runtime, HOL Light kernel, MCP server, CI integration — on infrastructure that is incorporated and physically located in the EU, subject only to EU law.

sota.io deploys to a datacenter in Germany, operated by infrastructure that is EU-incorporated. Your proof pipeline runs there. Your proof artifacts stay there.

Deploying HOL Light MCP on sota.io

A minimal sota.yml for an HOL Light MCP server:

runtime: docker
build:
  dockerfile: |
    FROM ocaml/opam:ubuntu-24.04-ocaml-4.14
    RUN opam install -y hol_light
    COPY server.ml /app/server.ml
    RUN cd /app && ocamlfind ocamlopt -package hol_light,jsonrpc -linkpkg server.ml -o hol-mcp-server
    EXPOSE 8080
    CMD ["/app/hol-mcp-server"]
service:
  port: 8080
  health_check: /health
resources:
  memory: 512MB
  cpu: 1

The HOL Light kernel loads on container startup (typically 3-5 seconds). Proof requests then execute against the warm state. For heavy theorem libraries like Flyspeck, allocate more memory; the full library occupies approximately 800 MB resident after loading.

Point your Claude configuration at the deployed endpoint:

{
  "mcpServers": {
    "hol-light": {
      "url": "https://your-app.sota.io/mcp",
      "transport": "http"
    }
  }
}

Claude then has access to the hol_prove, hol_check, and hol_search tools in every conversation.

EU AI Act Article 9: Mathematical Proof as Compliance Evidence

Article 9 of the EU AI Act requires providers of high-risk AI systems to establish a risk management system that addresses the "known and the reasonably foreseeable risks." Foreseeable risks, meaning those that can be identified without having observed them as failures, are precisely what formal verification addresses.

The HOL Light MCP server changes the economics of formal verification for AI systems. Previously, writing formal specifications required expertise in the theorem prover's syntax and proof calculus. With MCP, a developer who understands the property they want to prove can work interactively with Claude to construct the specification, have HOL Light machine-check it, and generate the proof artifact as documentation.

Specifically relevant to Article 9:

Art. 9(2)(a) foreseeable risks: A HOL Light proof covers every input admitted by the formal specification, not just the inputs a test suite happens to exercise.
Art. 9(5) residual risk: A machine-checked proof that an invariant holds for all inputs is stronger evidence of risk mitigation than test coverage metrics.
Art. 9(8) testing requirements: The Article requires testing "at any time throughout the development process." HOL Light integrated into CI runs on every commit.
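A CI step that turns proof results into reviewable artifacts might look like the sketch below. The record fields, the function names, and the exit-code convention are all assumptions made for illustration. The SHA-256 digest only detects later changes to a record; it is not a signature and does not by itself establish who produced the proof.

```python
import hashlib
import json


def _canonical(body):
    """Serialize deterministically so the same record always hashes the same way."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def artifact_record(commit, goal, tactic, theorem):
    """Bundle one proof result with a digest over its contents."""
    body = {"commit": commit, "goal": goal, "tactic": tactic, "theorem": theorem}
    return {**body, "sha256": hashlib.sha256(_canonical(body)).hexdigest()}


def verify_record(record):
    """Recompute the digest and compare it with the stored one."""
    body = {key: value for key, value in record.items() if key != "sha256"}
    return hashlib.sha256(_canonical(body)).hexdigest() == record["sha256"]


def ci_exit_code(records):
    """Return 0 only if every goal produced a theorem; 1 otherwise."""
    return 0 if all(record["theorem"] is not None for record in records) else 1


if __name__ == "__main__":
    # "abc1234" is a placeholder commit id, not a real revision.
    record = artifact_record("abc1234", "!n. n + 0 = n", "ARITH_TAC", "|- !n. n + 0 = n")
    print(json.dumps(record, indent=2))
    print("digest ok:", verify_record(record))
    print("exit code:", ci_exit_code([record]))
```

Storing the goal and tactic next to the theorem lets a reviewer replay the proof against the same HOL Light version instead of trusting the log.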

For teams building high-risk AI systems under EU AI Act scope — healthcare diagnostics, credit scoring, biometric identification, employment screening — running proof infrastructure on EU-incorporated servers eliminates one potential regulatory complexity: the question of whether CLOUD Act exposure over proof artifacts creates a gap in the documented risk management system.

Summary

Property	HOL Light MCP Server
Kernel author	John Harrison 🇬🇧 (Cambridge PhD 1996)
Trusted kernel	~400 lines of OCaml (language developed at INRIA 🇫🇷)
Notable proofs	IEEE 754 transcendentals, Kepler conjecture (Flyspeck)
MCP transport	HTTP (JSON-RPC 2.0)
CLOUD Act exposure on sota.io	None — EU incorporated, Germany datacenter
EU AI Act Art. 9 relevance	Direct: machine-checked proofs are evidence for the "foreseeable risks" analysis
Licence	BSD (HOL Light), open protocol (MCP)

HOL Light has been producing proofs checked by a small, auditable kernel since 1996. In 2026, it can do that inside your AI assistant. Deploy the MCP server on EU infrastructure, and your proof pipeline is formally sound and its artifacts stay under EU jurisdiction.