Cellule-PRO · Sovereign microsystem

Self-hosted LLM inference :
for organizations that won't hand their data over to the cloud.

A complete self-contained microsystem in a single Docker image. Your data never leaves your infrastructure. GDPR & global privacy frameworks, AI compliance-ready (EU AI Act · NIST AI RMF · ISO/IEC 42001), native multi-site federated cluster.

🎁 PIONEERS 3 first organizations: 2 years free + lifetime pioneer pricing. See offer →
✓ GDPR & global privacy frameworks ✓ AI compliance-ready (EU AI Act · NIST · ISO 42001) ✓ Air-gap runtime ✓ Hardened binary delivery OpenAI-compatible (opencode, OpenWebUI, Continue.dev) Ed25519 · pgvector · PostgreSQL

Who is it for?

If you handle confidential data that you cannot legally or ethically hand over to the cloud, and you still want a powerful internal AI assistant, Cellule-PRO is for you.

⚖️ Law firms

Professional secrecy, sensitive client files, contract review, case-law search. Cloud = leak of privilege. Host it in-house.

🏥 Hospitals & healthcare

Health data, patient records, clinical protocols. Health-data hosting regulations require self-hosting or sovereign certified hosting.

🏭 Multi-site industrial groups

Intellectual property, technical drawings, supplier contracts. Federated cluster across your factories and offices, with no central server.

🔒 Defense & government

Classified data, mandatory state sovereignty. Air-gapped runtime guaranteed. Docker image delivered without any Internet dependency.

The problem you're living today

Cloud AI = data leak

ChatGPT Enterprise, Claude for Work, Azure OpenAI: your data transits and is stored at a third party. Incompatible with privilege, health-data regulations, industrial IP.

Ollama / LM Studio = a toy

One machine, one user, no RAG, no enterprise governance, no audit trail. Nice for tinkering, unusable in production.

DIY stack = nightmare

Wiring vLLM + pgvector + OpenWebUI + Keycloak + monitoring = 6 months of integration. You're not an AI startup.

EU AI Act is coming

August 2026: audit, traceability and DPIA obligations. Today's cloud AI does not provide these guarantees. You need to take back control.

Cellule-PRO: the complete sovereign microsystem

📦

A single Docker image

Private LLM chat + conversational RAG + document RAG + collaborative project spaces + OpenAI-compatible API + admin dashboard + GDPR employee portal. All in one shippable image.

🔒

No data leaves your network

Air-gapped runtime guaranteed. The image installs on your hardware. Full firewall isolation possible. No phone-home, no cloud licensing.

🌐

Native multi-site cluster

Ed25519 federation with no central pool. Headquarters + branches + GPU datacenter, all federated as a single cluster. Symmetric admin HA — any pool gives you the full view.

📋

Native GDPR articles 15-22

Self-service employee portal: access, rectification, erasure, portability. Append-only audit trail. Cascade anonymization. Ready for regulatory audit without consulting fees.

🛡️

Admin pilots, zero magic

Every switchover, migration and alert is visible, traced, approvable. Configurable failsafe. No "the AI decided on its own". Your IT team stays in control.

🔑

Initialization ceremony

Ed25519 private key generated air-gapped on your premises during a controlled ritual. Your cryptographic identity depends on no third party. Same model used by certificate authorities and bank cold wallets.

Four-layer architecture

A simple mental model that surfaces everywhere: in the docs, the admin UI, the commercial diagrams. Everything flows from this.

Atom

Worker node (GPU/CPU)
or lightweight proxy. Interchangeable, stateless.

Pool

Docker orchestrator: routing, RAG, API, admin. Self-contained and stateful.

Federation

Mesh of N Ed25519-paired pools. No central pool. Symmetric admin HA.

Governance

GDPR audit, offline JWT licensing, incident workflow, failsafe.

Explore the 6 architecture diagrams →   ▶ Open the control room (live)

What you won't find anywhere else

Beyond private LLM chat and RAG, Cellule-PRO ships an orchestration layer designed for IT departments that want a microsystem still autonomous three years from now — not a POC that becomes unmanageable after six months.

🌐

Multi-site replicated model catalog

Drop a model (drag-drop) on any pool: it replicates automatically across sites through signed Ed25519 federation. If a site goes down, another takes over with one click — multi-site RAID for your models, no central server, no third-party cloud.

🎯

Automatic smart routing

A single API, several specialized models behind it. The pool routes each request to the right model: a Coder for code, a conversational model for chat, a reasoning model for long tasks. The user writes, the pool picks. Zero friction.

🧠

Infinite cross-session memory — your LLM remembers you

Your devs use opencode, Cursor, Continue.dev on Monday, close their laptop, reopen Wednesday on another machine — the LLM still remembers: the bug to fix, the architecture decisions, the project conventions. End any prompt with [MEMORIZE: fact] to plant a fact, then it surfaces automatically next time you ask about it. Per-user encrypted RAG, on-LAN, no --continue needed. The only solution that does this without sending your data to the cloud.

⚙️

Auto-tier workers on heterogeneous fleets

Your office PCs, workstations and GPU servers don't have the same horsepower — that's normal. On startup, each worker self-benchmarks and the pool assigns it the right model (2B, 4B, 9B, 30B). Your heterogeneous IT fleet onboards without manual intervention.

🔄

Self-improvement: your LLM helps your IT team

The OpenAI-compatible API lets your IT team use Cellule-PRO as a sub-agent in their own dev/admin workflow and internal procedure writing. Virtuous loop: your private LLM helps you maintain the system that runs it. Complete sovereignty, near-zero marginal cost.

💬

Built-in employee web UI — no OpenWebUI needed

Your employees never touch a terminal or a config file. They open a URL, log in, and chat directly: saved conversations, RAG document drag-and-drop, collaborative project workspaces, [MEMORIZE: fact] mode. Project mode = an isolated workspace for a client matter, a patient case, or an audit, with its own dedicated memory. All on your LAN, zero heavy client to install.

🔌

…and compatible with the tech tools they already use

Your devs prefer opencode / Continue.dev / Cursor? Your data scientists want their Python openai SDK? Some teams already adopted OpenWebUI on Ollama and want to keep it? The OpenAI-compatible API of Cellule-PRO accepts them all via sk-cellule-* tokens. The built-in UI for those who want simple, the API for those who want integration.

Released this month — May 2026

Cellule-PRO ships continuously. Here is what landed in production over the last 30 days, all directly available in the current image.

🚀

Zero-touch worker upgrades

Bump the pool, the entire workforce updates itself on the next handshake — Windows ScheduledTask, Linux systemd, all under SYSTEM/root, no admin walks across the office. Verified end-to-end on heterogeneous fleets, including rollout of new flags and protocols. You ship, they catch up by themselves.

🎚️

Automatic VRAM-aware context cap

The pool detects each worker's real VRAM and caps ctx_size at a safe value — no GPU saturation, no manual tuning. Observed end-to-end: a saturated RTX 2060 went from 0.1 tok/s to 46 tok/s once the cap kicked in (a 460× speedup), and an entry-level RTX A400 jumped from 4 to 18.9 tok/s. The DSI sets a single global toggle, the pool handles every model.

🛰️

Satellite pool in one command, zero wizard

A second site joins your federation with one shell line — the image streams from the master over LAN (no DockerHub, no internet), the join token is auto-consumed, the federation handshake is signed Ed25519, and the catalog of GGUF models is replicated automatically. Total LAN sovereignty. No admin needed on the satellite.

💻

Workers on Windows, Linux, macOS Apple Silicon

Same one-liner installer for every employee desktop — the script auto-detects OS and architecture, pulls the bundled Python and the matching engine wheel (CUDA/Metal/CPU), then registers the worker as a native service (ScheduledTask on Windows, systemd on Linux, launchd plist on macOS). M1/M2/M3/M4 Apple Silicon supported with Metal acceleration.

🎛️

26 admin flags across 11 DSI categories

Every operational tunable — VRAM cap, recall depth, queue size, KNN top-k, audit retention, satellite gossip, forwarding policy — is a labeled toggle with a business description and per-profile recommendation, in French and English. No hidden environment variables, no patched config files. The DSI pilots, the pool obeys.

🛡️

Hardened binary delivery — IP-critical modules compiled

The license validator, the federation cryptography, the smart-routing engine, the MoE sharding logic, the RAG retrieval engine and the anti-entropy loop are compiled to native shared objects inside the delivered Docker image. A pentest engagement at the client site can't trivially recover the Python source of the load-bearing IP from docker save. Your competitive edge stays yours. The wheels served via /pypi to employee desktops keep their cross-platform Python source — only the pool runtime is hardened.

🔐

Long-term cryptographic continuity

The pool's license verifier supports multiple signing keys in parallel — built for decade-long deployments. When a signing key needs to rotate (planned maintenance, regulatory updates, ceremony renewal), a transitional image accepts both the old and the new key during a grace period. Your deployment never goes dark for a key reason. Same pattern used by certificate authorities for root CA rotation. Constant-time token comparisons throughout the stack (hmac.compare_digest), no admin-token timing leak under network probing.

Cellule-PRO vs alternatives

Criterion ChatGPT Enterprise Azure OpenAI Ollama / LM Studio Cellule-PRO
Data stays on your premises No (OpenAI cloud) Partial (EU Azure cloud) Yes Yes (air-gap runtime)
Built-in document RAG Limited knowledge files Build it yourself No Yes (pgvector + citations)
Collaborative project mode Basic workspaces No No Yes (Ed25519 membership)
Multi-site cluster N/A Build it yourself No Native (Ed25519 federation)
GDPR articles 15-22 OpenAI DPA Microsoft DPA Build it yourself Native (self-service)
Incident audit trail Limited Azure Monitor No Append-only DB
EU AI Act compliance In progress In progress N/A Ready by design
Multi-model smart routing Single model Single model Script it yourself Auto (Coder / Chat / Reasoning)
Multi-site replicated model catalog N/A Build it yourself No Yes (Ed25519 federation)
OpenAI-compatible API (opencode/Continue/OpenWebUI) Proprietary API Azure subset Limited Yes (sk-cellule-* tokens)
Hardened binary delivery (no readable source IP) N/A (cloud) N/A (cloud) No (plain .py) Yes (compiled .so)

How to get started?

60-day BETA — zero commitment
You provide a machine (server or VM). We deploy the Docker image, train your IT team, onboard 1-5 pilot employees. At day 60 you decide: continue in production, or stop — we destroy everything, your data was always yours.

1. Discovery call (30 min)

We frame it: your data, your employees, your regulatory constraints, your existing infrastructure. We confirm Cellule-PRO is the right fit (sometimes it isn't — we'll say so).

2. BETA scoping

We define together the perimeter: number of pilot employees, data used, success criteria at day 60. Formal commitment only after cross-validation of the BETA.

3. Installation (2-4h)

Docker image + PostgreSQL + first-boot wizard. We guide remotely or on-site as you prefer. Your IT team follows step by step.

4. Pilot training (2h)

2-hour live tutorial: chat, document RAG, project mode, GDPR self-service. Your employees start using the system right away.

5. 60-day support

Email + video. You ask any question, we tune the config. Goal: your team is autonomous by day 40.

6. Day-60 decision

Review with your IT team: actual usage, employee satisfaction, perceived ROI. Continue in production or clean stop. No pressure.

Frequently asked questions

What hardware do we need?

For 5-20 employees: 1 server with a recent GPU (RTX 4080/4090 or A4000+) or a powerful AMD Ryzen AI CPU, 32-96 GB RAM, 500 GB SSD. For 50-200 employees: 2-3 servers. Validated precisely during the discovery call.

Which LLM is used?

Qwen 3.5 by default (open source, multilingual, strong on office workloads). You can load any GGUF model: Mistral, Llama, Gemma, DeepSeek. No proprietary-model lock-in.

What if you go away?

Fair question. The Docker image is on your hardware, the source code is accessible to you through a notarized escrow. You can keep operating it without us. The offline JWT Ed25519 license model depends on no remote server.

What if we want to stop?

You export your data (GDPR article 20 native: standard JSON/ZIP export in one click). You stop the containers. Done.

Is Cellule-PRO open source?

Cellule-PRO is proprietary, distributed under commercial license with code access via notarized escrow in case of vendor failure. A public sister project (cellule.ai) remains under AGPLv3 for the community — both share a common technical base but are distributed separately.

What support do you offer?

Email (48h SLA on business days), crisis video calls, signed image updates. Support tiers are scoped to your needs during the discovery call.

Ready to test?

30-min discovery call to find out if Cellule-PRO meets your needs.
No pressure. If it isn't the right fit, we'll tell you — and point you elsewhere.