Deploy Futhark to Europe: GPU Array Computing on EU Infrastructure in 2026
Writing fast parallel code is usually an exercise in managing complexity that should not exist. You describe the computation you want – multiply every element of an array by a constant, sum the results, find the maximum – and then you rewrite it in a different language to make it fast. CUDA kernels. OpenCL dispatch code. Thread pools, work queues, memory transfer boilerplate. The algorithm is three lines; the parallel version is three hundred.
Futhark is built on a different premise: if your program is purely functional and operates on arrays, parallelism can be derived automatically. You write sequential array operations using map, reduce, scan, and filter. The Futhark compiler analyses the data dependencies and generates parallel GPU kernels or vectorised CPU code from your sequential description. The correctness proof – that the parallel execution produces the same result as the sequential semantics – is the compiler's responsibility, not yours.
Futhark was created at DIKU – the Department of Computer Science at the University of Copenhagen 🇩🇰 – by a team of Danish and European researchers. It is the only EU-origin GPU programming language designed for automatic parallelisation. Futhark backends run on NVIDIA CUDA, AMD/Intel OpenCL, and multicore CPU – the same code, compiled once, runs across hardware targets. For EU compute backends serving scientific workloads, ML inference, financial risk modelling, or image processing, Futhark provides GPU-class performance from purely functional sequential code.
Futhark applications deploy to sota.io on EU infrastructure with full GDPR compliance. This guide shows how.
The European Futhark Team
Futhark is a research language created entirely within the European academic system. Every primary contributor is affiliated with EU institutions.
Troels Henriksen 🇩🇰 – Danish computer scientist at DIKU, the Department of Computer Science at the University of Copenhagen (Datalogisk Institut, Københavns Universitet) – is the creator and primary developer of Futhark. Henriksen began the Futhark project during his PhD at DIKU, defended in 2017 under the title "Design and Implementation of the Futhark Programming Language." His thesis laid out the core technical contributions: the array programming model, the fusion optimisation pipeline (eliminating intermediate arrays), the flattening transformation that converts nested parallelism into flat GPU-executable parallelism, and the CUDA and OpenCL code generation backends. Henriksen has continued developing Futhark as a research project and production tool at DIKU. He maintains the language, the compiler, the standard library, and the package ecosystem (futhark pkg). His work represents a strand of EU programming language research focused on the practical problem that GPU programming is too difficult for domain scientists – climate researchers, bioinformaticians, financial engineers – who need high-performance array computations but should not need to learn CUDA to get them.
Martin Elsman 🇩🇰 – Danish professor at DIKU – is a co-designer of Futhark's type system and co-author of the foundational Futhark papers. Elsman's research spans programming language theory, type systems, and high-performance functional programming. His contributions to Futhark include the size-polymorphic type system (arrays are typed with their sizes; the type checker enforces size consistency statically) and the work on in-place updates with uniqueness types (allowing safe mutation inside otherwise pure functions). Elsman is also the principal investigator on EU-funded research grants that have supported Futhark's development.
Cosmin Oancea 🇷🇴 – Romanian associate professor at DIKU – is a co-author of Futhark's GPU backend and the primary author of its loop transformation and kernel optimisation passes. Oancea's research focuses on the compiler analysis that enables automatic GPU parallelisation: how to detect and exploit irregular parallelism, how to map nested parallel loops onto flat GPU execution models, and how to optimise memory access patterns for GPU caches. His work on flattening nested parallelism – transforming deeply nested map/reduce compositions into GPU-executable flat parallel operations – is the core transformation that makes Futhark's automatic parallelisation practical on real hardware.
Niels G.W. Serup 🇩🇰 – Danish researcher at DIKU – contributed to Futhark's package manager (futhark pkg), the interactive tooling, and the language server. His work on tooling lowered the barrier to entry for domain scientists at EU research institutions who want to use Futhark for real computations.
Robert Schenck 🇩🇪 – German researcher who collaborated with the DIKU group – contributed to Futhark's automatic differentiation (AD) support. Futhark now supports reverse-mode AD for differentiating Futhark programs, making it suitable for ML-adjacent workloads (gradient computation for custom neural network layers, scientific parameter estimation). Schenck's work connects Futhark to the EU machine learning research community that needs differentiable array programming at GPU speed.
Why Futhark for EU Compute Backends
Automatic parallelisation from purely functional code. Futhark eliminates the gap between algorithm and implementation. You write map (\x -> x * 2.0f32) data and the compiler generates a parallel GPU kernel. You write reduce (+) 0 (map f data) and the compiler fuses the map and reduce into a single GPU kernel with no intermediate array allocation. The sequential semantics guarantee: your parallel program produces the same result as running the operations one at a time. For EU research backends (climate modelling, genomics, financial simulation), this means domain experts can write correct algorithms in familiar functional style without learning GPU programming.
Size-polymorphic type system eliminates out-of-bounds errors. Futhark arrays are typed with their sizes. [n]f32 is an array of n 32-bit floats; [m][n]f32 is an m×n matrix. Size variables propagate through the type system: map f (a: [n]f32) : [n]f32 – the output is the same length as the input, statically checked. zip (a: [n]f32) (b: [n]f32) : [n](f32, f32) – both arrays must have the same size, enforced at compile time. Out-of-bounds access is structurally impossible for operations that respect the size types. For EU data processing backends (medical imaging, genomics pipelines), this means array shape errors are caught at compile time, not as runtime crashes in production.
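A minimal sketch of what the size types buy you: both definitions below (illustrative names, not from the examples later in this guide) carry their length constraints in the signature, so mismatched arrays are rejected by the type checker before anything runs.

```futhark
-- Both arguments must share the same length n; a caller passing
-- arrays of different lengths fails to type-check.
def dotprod [n] (a: [n]f32) (b: [n]f32) : f32 =
  f32.sum (map2 (*) a b)

-- The result length is pinned to the input length in the type.
def running_fraction [n] (a: [n]f32) : [n]f32 =
  let total = f32.sum a
  in map (/ total) (scan (+) 0.0f32 a)
```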
Multiple hardware backends from one source. The same Futhark source compiles to CUDA (NVIDIA GPUs), OpenCL (AMD/Intel GPUs and FPGAs), multicore C (CPU parallel via pthreads), sequential C (single-threaded CPU), and ISPC (Intel SIMD). You do not write different code for different hardware. For EU scientific computing services that need to run on different infrastructure – GPU-accelerated containers for training workloads, CPU containers for inference, edge devices for sensor processing – Futhark's multi-backend model means one codebase serves all deployment targets.
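The multi-backend model is just a compiler flag. A sketch of compiling one source file for each target (backend names as in recent Futhark releases):

```shell
futhark c         --library analytics.fut   # sequential C
futhark multicore --library analytics.fut   # parallel CPU (pthreads)
futhark opencl    --library analytics.fut   # OpenCL GPUs
futhark cuda      --library analytics.fut   # NVIDIA CUDA
futhark ispc      --library analytics.fut   # Intel SIMD via ISPC
```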
Python bindings for production service integration. Futhark compiles to Python-callable modules via the pyopencl backend (futhark pyopencl) or, for CPU targets, by wrapping a compiled C library with the futhark-ffi CFFI package. The compiled module exposes methods corresponding to your Futhark entry points. You call compute.matrix_multiply(a, b) from a FastAPI handler; the arrays transfer to the device, compute, and return as NumPy arrays. For EU backend services that need GPU-accelerated endpoints alongside standard HTTP APIs, this means Futhark integrates into standard Python/FastAPI/Django stacks without a foreign runtime.
Purely functional guarantees for multi-user compute isolation. Futhark programs have no mutable global state and no I/O in compute kernels. Every entry point takes inputs and produces outputs with no side effects outside the explicit GPU memory allocations for that computation. For EU services processing personal data (genomics analysis, medical imaging, financial risk), this means there is no mechanism by which computation for one user can affect another user's results – pure functions are the strongest possible isolation guarantee.
Fusion eliminates intermediate allocations. Futhark's compiler performs producer-consumer fusion: when the output of one operation is immediately consumed by another, the compiler merges them into a single GPU pass with no intermediate array written to GPU memory. map f (map g data) becomes a single kernel; filter p (map f data) fuses into one pass. For EU analytics backends processing large datasets (Eurostat statistics, ECB financial data, genomic reference panels), fusion can reduce memory bandwidth requirements by 2–5× compared to naive element-wise GPU kernels.
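A small illustration of fusion (rms is an assumed helper, not one of the entry points in this guide): the inner map and the enclosing reduction compile to a single pass over data, with no intermediate squared array ever materialised.

```futhark
def square (x: f32) : f32 = x * x

-- map-then-reduce fuses into one kernel: each element is squared
-- and folded into the running sum in the same pass.
def rms [n] (data: [n]f32) : f32 =
  f32.sqrt (f32.sum (map square data) / f32.i64 n)
```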
Futhark Language Essentials
Futhark's syntax is clean and functional. Entry points are marked entry and are exposed as methods in the compiled Python/C bindings:
-- Basic array operations
entry map_scale [n] (data: [n]f32) (factor: f32) : [n]f32 =
  map (* factor) data

-- Reduction with fusion
entry sum_of_squares [n] (data: [n]f32) : f32 =
  reduce (+) 0.0f32 (map (** 2.0f32) data)

-- Matrix multiplication (automatically parallelised)
entry matmul [n][m][p] (a: [n][m]f32) (b: [m][p]f32) : [n][p]f32 =
  map (\row ->
         map (\col -> f32.sum (map2 (*) row col))
             (transpose b))
      a

-- Scan (prefix sum) – GPU-parallel via a parallel prefix algorithm
entry prefix_sum [n] (data: [n]f32) : [n]f32 =
  scan (+) 0.0f32 data
The [n], [m], [p] in the function signatures are size parameters – the type checker verifies that a has m columns and b has m rows, statically. No runtime shape checks; the type system handles it.
GDPR-Compliant Analytics: Statistical Aggregations
A common EU backend pattern is computing aggregate statistics on personal data without retaining individual records. Futhark's pure functional model is ideal – the computation transforms data to aggregates with no mechanism for storing intermediate state:
-- Histogram computation for anonymised statistics
-- (sizes and array indices are i64 in modern Futhark)
entry age_histogram [n] (ages: [n]i32) (bins: i64) : [bins]i32 =
  let bin_size = 100 / bins
  let bin_idx = map (\age -> i64.min (bins - 1) (i64.i32 age / bin_size)) ages
  in reduce_by_index (replicate bins 0) (+) 0 bin_idx (replicate n 1)

-- K-anonymity check: count per group, flag groups below threshold
entry k_anonymity_mask [n] (group_ids: [n]i32) (groups: i64) (k: i32) : [n]bool =
  let idxs = map i64.i32 group_ids
  let counts = reduce_by_index (replicate groups 0) (+) 0 idxs (replicate n 1)
  in map (\gid -> counts[gid] >= k) idxs

-- Differential privacy: add Laplace noise to counts
entry add_laplace_noise [n] (counts: [n]f32) (sensitivity: f32) (epsilon: f32) (noise: [n]f32) : [n]f32 =
  let scale = sensitivity / epsilon
  in map2 (\c z -> c + scale * z) counts noise
reduce_by_index is Futhark's histogram primitive – it computes a scatter-reduction in a single GPU pass. For GDPR Article 89 research exemption workloads (statistical analysis with appropriate safeguards), Futhark's pattern of transforming personal data to anonymised aggregates with no intermediate persistence matches the data minimisation principle of GDPR Article 5(1)(c).
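The add_laplace_noise entry expects the noise to be drawn on the host, since Futhark kernels are pure and carry no RNG state. A host-side sketch with NumPy (unit_laplace is an illustrative helper, not part of any Futhark API); the kernel itself applies the sensitivity/epsilon scale:

```python
import numpy as np

def unit_laplace(n: int, seed: int = 0) -> np.ndarray:
    """Draw n Laplace(0, 1) samples; the Futhark kernel scales them
    by sensitivity / epsilon before adding them to the counts."""
    rng = np.random.default_rng(seed)
    return rng.laplace(loc=0.0, scale=1.0, size=n).astype(np.float32)

noise = unit_laplace(1000)
```

Keeping randomness on the host also keeps the Futhark computation deterministic and reproducible for a given noise vector.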
Financial Risk: Monte Carlo on GPU
EU financial services regulation (MiFID II, Solvency II, FRTB) requires large-scale risk computations. Monte Carlo simulation – the standard technique – parallelises trivially on GPU and is a natural fit for Futhark:
-- Sorting requires the sorts package: futhark pkg add github.com/diku-dk/sorts
import "lib/github.com/diku-dk/sorts/radix_sort"

-- Black-Scholes Monte Carlo option pricing
-- paths: [n]f32 of standard normal samples
entry bs_monte_carlo [n]
      (paths: [n]f32)
      (spot: f32) (strike: f32) (rate: f32) (vol: f32) (T: f32)
      : f32 =
  let dt = T
  let drift = (rate - 0.5f32 * vol * vol) * dt
  let diffusion = vol * f32.sqrt dt
  let terminal_prices = map (\z -> spot * f32.exp (drift + diffusion * z)) paths
  let payoffs = map (\s -> f32.max 0.0f32 (s - strike)) terminal_prices
  let mean_payoff = f32.sum payoffs / f32.i64 n
  in mean_payoff * f32.exp (-rate * T)

-- Value-at-Risk at confidence level alpha
entry compute_var [n] (pnl: [n]f32) (alpha: f32) : f32 =
  let sorted = radix_sort_float f32.num_bits f32.get_bit pnl
  let idx = i64.f32 (f32.i64 n * (1.0f32 - alpha))
  in sorted[idx]
One million Monte Carlo paths on a GPU takes milliseconds in Futhark. The same code runs on CPU (multicore C backend) for smaller workloads. For EU fintech services computing real-time risk metrics for MiFID II best-execution reporting or Solvency II SCR calculations, Futhark provides GPU-class performance from functional sequential code.
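The estimator is easy to cross-check on the host: a NumPy mirror of the bs_monte_carlo entry point (a sketch; the parameter values are illustrative) should land near the closed-form Black-Scholes price, which is about 10.45 for the at-the-money parameters below.

```python
import math
import numpy as np

def bs_mc_numpy(z: np.ndarray, spot: float, strike: float,
                rate: float, vol: float, T: float) -> float:
    """NumPy mirror of the bs_monte_carlo Futhark entry point."""
    drift = (rate - 0.5 * vol * vol) * T
    diffusion = vol * math.sqrt(T)
    terminal = spot * np.exp(drift + diffusion * z)      # terminal prices
    payoffs = np.maximum(terminal - strike, 0.0)         # call payoff
    return float(payoffs.mean() * math.exp(-rate * T))   # discounted mean

z = np.random.default_rng(1).standard_normal(1_000_000)
price = bs_mc_numpy(z, spot=100.0, strike=100.0, rate=0.05, vol=0.2, T=1.0)
```

Feeding the same z array to the compiled Futhark entry point should reproduce this value to within floating-point reduction order.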
Scientific Computing: Signal Processing
EU research infrastructure (CERN, ECMWF, ESA) produces enormous sensor datasets. Futhark's array model maps directly to signal processing algorithms:
-- Convolution (1D), zero-padded at the boundaries
entry convolve1d [n][m] (signal: [n]f32) (kernel: [m]f32) : [n]f32 =
  let half = m / 2
  in map (\i ->
            let padded_i j = if i + j - half < 0 || i + j - half >= n
                             then 0.0f32
                             else signal[i + j - half]
            in f32.sum (map2 (*) kernel (map padded_i (iota m))))
         (iota n)

-- Discrete Fourier Transform (naive O(n^2), for illustration)
entry dft [n] (signal: [n]f32) : [n](f32, f32) =
  let twopi = 2.0f32 * f32.pi
  in map (\k ->
            let angle j = -twopi * f32.i64 (k * j) / f32.i64 n
            let real = f32.sum (map (\j -> signal[j] * f32.cos (angle j)) (iota n))
            let imag = f32.sum (map (\j -> signal[j] * f32.sin (angle j)) (iota n))
            in (real, imag))
         (iota n)

-- Normalise a batch of signals to the [0,1] range
entry batch_normalise [b][n] (signals: [b][n]f32) : [b][n]f32 =
  map (\s ->
         let mn = f32.minimum s
         let mx = f32.maximum s
         let range = mx - mn
         in if range == 0.0f32 then s else map (\x -> (x - mn) / range) s)
      signals
batch_normalise processes a batch of b signals, each of length n, in parallel across the batch dimension. On GPU, all b signals normalise concurrently. For EU medical device backends processing sensor batches (ECG signals, EEG readings, MRI reconstruction), Futhark's batch processing model maps directly onto GPU execution.
Serving Futhark via Python FastAPI
Futhark compiles to Python bindings. A production deployment wraps Futhark kernels in a FastAPI service:
# requirements.txt
# futhark-ffi, fastapi, uvicorn, numpy

import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

# Compile: futhark multicore --library analytics.fut
# Then:    build_futhark_ffi analytics   (tool from the futhark-ffi package)
import _analytics
from futhark_ffi import Futhark

ctx = Futhark(_analytics)
app = FastAPI()

class ScaleRequest(BaseModel):
    data: list[float]
    factor: float

class RiskRequest(BaseModel):
    paths: list[float]  # standard normal samples
    spot: float
    strike: float
    rate: float
    vol: float
    T: float

@app.post("/api/scale")
async def scale_data(req: ScaleRequest):
    data_arr = np.array(req.data, dtype=np.float32)
    result = ctx.map_scale(data_arr, np.float32(req.factor))
    # from_futhark converts the opaque result handle back to NumPy
    return {"result": ctx.from_futhark(result).tolist()}

@app.post("/api/risk/option-price")
async def price_option(req: RiskRequest):
    paths_arr = np.array(req.paths, dtype=np.float32)
    price = ctx.bs_monte_carlo(
        paths_arr,
        np.float32(req.spot),
        np.float32(req.strike),
        np.float32(req.rate),
        np.float32(req.vol),
        np.float32(req.T),
    )
    return {"price": float(price)}

@app.get("/health")
async def health():
    return {"status": "ok"}
The Futhark context object manages runtime state, device memory, and kernel loading. Arrays pass in as NumPy arrays; results come back as handles that from_futhark converts to NumPy, with the copying handled by CFFI. For EU compute services, this architecture separates concerns cleanly: HTTP routing in Python, compute kernels in Futhark, both running in the same container.
Deploying to sota.io
sota.io detects Dockerfiles automatically. A Futhark/Python service deploys as a multi-stage container:
# Stage 1: compile Futhark kernels
FROM haskell:9.4 AS futhark-builder
RUN cabal update && cabal install futhark
WORKDIR /build
COPY analytics.fut .
# Compile to a C library (multicore backend – parallel CPU, no GPU required on host)
RUN futhark multicore --library analytics.fut

# Stage 2: production service
FROM python:3.12-slim
WORKDIR /app
# A C compiler is needed to build the CFFI bindings
RUN apt-get update && apt-get install -y --no-install-recommends gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the compiled Futhark C library sources
COPY --from=futhark-builder /build/analytics.c .
COPY --from=futhark-builder /build/analytics.h .
# Build the Python CFFI bindings (build_futhark_ffi ships with futhark-ffi)
RUN build_futhark_ffi analytics
COPY main.py .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
The multicore backend (futhark multicore) runs parallel computations across all available CPU cores without requiring GPU hardware. Hetzner's dedicated server line (used by sota.io) provides high core counts with AVX-512 SIMD; the generated C code can auto-vectorise under the host C compiler.
With the sota.io CLI:
sota deploy
The build runs in Germany. Your Futhark application runs in Germany. Personal data processed by your compute service never leaves EU jurisdiction.
GPU Backend for High-Performance Workloads
For GPU-accelerated deployments, sota.io supports GPU-enabled containers. Use the OpenCL backend:
FROM nvidia/opencl:devel AS futhark-builder
RUN apt-get update && apt-get install -y haskell-platform
RUN cabal update && cabal install futhark
WORKDIR /build
COPY analytics.fut .
# Compile to OpenCL – runs on NVIDIA/AMD/Intel GPUs
RUN futhark opencl --library analytics.fut
The pyopencl backend (futhark pyopencl --library analytics.fut) generates a Python module that transfers arrays to the GPU, executes OpenCL kernels, and returns NumPy arrays. The GPU executes thousands of Futhark array operations concurrently.
Environment Variables and Database
Futhark handles computation; Python handles I/O and database access. Read environment variables via standard Python:
import os
DATABASE_URL = os.environ["DATABASE_URL"] # injected by sota.io
PORT = int(os.environ.get("PORT", "8080"))
sota.io injects DATABASE_URL when you provision a managed PostgreSQL database. Use asyncpg for non-blocking database access alongside Futhark GPU computation:
import asyncpg
import numpy as np

async def compute_and_store(pool: asyncpg.Pool, dataset_id: int, raw_data: list[float]):
    # 1. Load from EU database
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT value FROM measurements WHERE dataset_id = $1 ORDER BY timestamp",
            dataset_id,
        )

    # 2. Compute in Futhark (CPU or GPU), converting the result handle to NumPy
    data_arr = np.array([r["value"] for r in rows], dtype=np.float32)
    result = ctx.from_futhark(ctx.prefix_sum(data_arr))

    # 3. Store results back (parameterised – no SQL injection)
    async with pool.acquire() as conn:
        await conn.execute(
            # "values" is a reserved word in SQL, so the column is result_values
            "INSERT INTO results (dataset_id, result_values) VALUES ($1, $2)",
            dataset_id,
            result.tolist(),
        )
All database operations are parameterised. Raw data loads from Hetzner Germany. Futhark computes on Hetzner Germany CPUs (or GPUs). Results store to Hetzner Germany PostgreSQL. No data leaves EU jurisdiction at any point in the pipeline.
Automatic Differentiation for ML Inference
Futhark supports reverse-mode automatic differentiation (AD) via the vjp (vector-Jacobian product) combinator. This enables gradient computation for custom ML layers:
-- Simple neural network layer: linear + ReLU activation
def linear_relu [n][m] (W: [m][n]f32) (b: [m]f32) (x: [n]f32) : [m]f32 =
  let z = map2 (\row bi -> f32.sum (map2 (*) row x) + bi) W b
  in map (f32.max 0.0f32) z

-- Compute gradient of the loss with respect to input x
entry compute_gradient [n][m]
      (W: [m][n]f32) (b: [m]f32) (x: [n]f32) (loss_grad: [m]f32)
      : [n]f32 =
  vjp (linear_relu W b) x loss_grad
vjp takes a function, an input, and a seed gradient on the output, and returns the gradient with respect to the input (the vector-Jacobian product), derived by reverse-mode AD from the Futhark source. For EU ML inference services that need custom gradient computation – physics-informed neural networks, scientific ML models, Bayesian optimisation – Futhark's AD support provides GPU-accelerated gradient computation from functional array code.
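What compute_gradient returns can be checked numerically on the host: for relu(Wx + b) the input adjoint is Wᵀ applied to the seed gradient masked by the active units. A NumPy sketch (function names are illustrative), verifiable against finite differences:

```python
import numpy as np

def linear_relu(W, b, x):
    return np.maximum(W @ x + b, 0.0)

def grad_wrt_x(W, b, x, loss_grad):
    """Input adjoint that reverse-mode AD computes for linear + ReLU."""
    active = (W @ x + b) > 0.0          # ReLU subgradient mask
    return W.T @ (loss_grad * active)

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 4)), rng.normal(size=3), rng.normal(size=4)
g = rng.normal(size=3)
analytic = grad_wrt_x(W, b, x, g)
```

A central-difference check of g · linear_relu(W, b, x) against each coordinate of x agrees with the analytic adjoint away from the ReLU kinks.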
EU Infrastructure and GDPR Compliance
Futhark compute kernels (CPU multicore / GPU)
        ↓
FastAPI HTTP service
        ↓
sota.io platform
        ↓
Hetzner Germany (Frankfurt)
        ↓
EU jurisdiction – no Cloud Act exposure
Deploying on sota.io means:
- No US Cloud Act jurisdiction – sota.io is a German company. Hetzner is a German company. No US corporate parent can receive a CLOUD Act production order for your EU users' data.
- GDPR Article 28 DPA – Data Processing Agreement available on request.
- Data residency guarantee – your PostgreSQL database and application containers run in Germany.
- TLS by default – HTTPS certificate provisioned automatically.
- Pure function isolation – Futhark's purely functional compute model means no global state, no inter-request data leakage, no side effects between user computations.
For EU compute backends processing personal data – genomics analysis, medical signal processing, financial risk modelling, anonymised statistical analytics – sota.io provides the compliance layer that US-jurisdiction cloud providers cannot, combined with Futhark's purely functional guarantees at the compute layer.
Getting Started
Install Futhark (Homebrew and Docker images are available; building from source requires GHC with Stack or Cabal):
# macOS
brew install futhark
# Ubuntu (via Stack)
stack install futhark
# Docker (recommended for reproducible builds)
docker pull futharklang/futhark:latest
Create your first Futhark program:
-- hello.fut
entry main [n] (data: [n]f32) : f32 =
  reduce (+) 0.0f32 (map (** 2.0f32) data)
Compile and test:
# Compile to a C executable for testing
futhark c hello.fut
# Futhark executables read values in Futhark value syntax on stdin
echo '[3.0f32, 4.0f32, 5.0f32]' | ./hello

# Compile to a C library, then build Python bindings (CPU backend,
# using the build_futhark_ffi tool from the futhark-ffi package)
futhark c --library hello.fut
build_futhark_ffi hello

# Run Futhark's built-in benchmarking tool
# (expects a '-- ==' test block with input datasets in hello.fut)
futhark bench hello.fut --backend=c
Deploy to sota.io:
# Install sota CLI
curl -fsSL https://sota.io/install.sh | sh
# Authenticate
sota auth set-key YOUR_API_KEY
# Deploy from your project directory (with Dockerfile)
sota deploy
Your Futhark compute service is live at your-project.sota.io on German infrastructure within minutes.
sota.io is the EU-native PaaS for Futhark and GPU-accelerated compute backends – GDPR-compliant infrastructure, managed PostgreSQL, and zero-configuration TLS. Deploy your first Futhark application →