Annual license per pool. Unlimited queries. No per-token billing, no surprises. Performance depends on your hardware: you don't pay for cloud inference, you buy the software that orchestrates it on your own machines.
3-lawyer firm, solo doctor, accountant: 1–5 users
Law firm, group medical practice, SMB: 5–30 users
Mid-sized company, multi-site group, regional hospital: 30–200 users
Corporate group or strictly regulated sector: 200+ users
Cellule-PRO is orchestration software, not a cloud service. Inference runs on infrastructure you already have (idle employee desktops with a CPU/GPU, or dedicated servers you already own). You pay no cloud inference costs and no per-token fees, and there is no lock-in.
Performance therefore depends on your hardware: expect 15-50 tokens/second per worker (2B to 30B-parameter models quantized to Q4_K_M, running locally). It's not cloud GPT-4 throughput, but it's fast, and everything stays 100% on your premises, including when the EU AI Act requires you to account for it from August 2026.
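To put the 15-50 tokens/second figure in perspective, here is a back-of-the-envelope sketch of how long one answer takes to stream on a single worker. The 600-token answer length is an assumption chosen for illustration, and the estimate ignores prompt processing (prefill), so real latency will be somewhat higher.

```python
# Rough wall-clock estimate for generating one answer on a single worker.
# Assumption: the decode rate is constant for the whole answer; prompt
# processing (prefill) time is ignored, so real latency will be higher.


def generation_seconds(answer_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream `answer_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return answer_tokens / tokens_per_second


if __name__ == "__main__":
    answer = 600  # hypothetical answer length, a few paragraphs of text
    for rate in (15, 25, 50):  # the quoted bounds plus a midpoint
        print(f"{rate:>2} tok/s -> {generation_seconds(answer, rate):.0f} s")
```

With these assumptions, a 600-token answer takes about 40 s at 15 tok/s and about 12 s at 50 tok/s. Since tokens are streamed as they are generated, users start reading well before the answer is complete.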
Specific needs not in your plan? Here's what we can add on demand.
Key ceremony: initialization of the production Ed25519 keys on your premises, in the presence of your IT director (DSI) and DPO. 1 day.
Training: a 3-hour session on your premises for up to 20 employees, covering chat, RAG, project mode and GDPR self-service.
Single sign-on: SAML 2.0, OIDC or internal LDAP, for unified authentication with Active Directory or Azure AD (now Microsoft Entra ID).
Custom connectors: ERP, legal document management system (DMS), hospital electronic health record (EHR). Quoted according to complexity.
Migration: transfer of your existing conversations, RAG knowledge bases and prompts. 2-3 day engagement.
Security audit: review with your CISO covering penetration testing, architecture and hardening. 2-3 engineer-days.
There is no usage-based billing because inference runs on your infrastructure, not ours. We don't know how much you consume, and we don't want to. You pay an annual license, period. No end-of-month surprises.
User tiers are indicative and not technically enforced. If you grow from 28 to 35 users mid-year, nothing is blocked: we review your tier once a year, at renewal. No sudden cutoff, no stress.
Typical figures: a desktop with an RTX 2060 (6 GB VRAM) running a 4B Q4_K_M model reaches 40-50 tok/s; an RTX 3090 workstation (24 GB VRAM) running a 30B model reaches 15-25 tok/s; a MacBook M2 Pro reaches 20-40 tok/s. That is fast for chat and RAG, but slower than cloud GPT-4 on long reasoning tasks. The pool automatically caps the context length according to the available VRAM to avoid saturating it.
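The page does not say how the pool computes its context cap. One common way to size such a cap, shown here as a sketch and not as Cellule-PRO's actual logic, is to estimate the KV-cache memory needed per token of context and divide the VRAM left after loading the weights by that amount. The model shape (36 layers, 8 KV heads, head dimension 128, fp16 cache), the 2.5 GiB weight size, the 0.5 GiB runtime overhead and the 32768-token trained window are all illustrative assumptions.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Attention geometry that drives KV-cache size (illustrative values)."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    kv_bytes_per_elem: int = 2  # fp16 cache entries


def kv_bytes_per_token(shape: ModelShape) -> int:
    # One key vector and one value vector per layer and per KV head.
    return (2 * shape.n_layers * shape.n_kv_heads
            * shape.head_dim * shape.kv_bytes_per_elem)


def context_cap(vram_bytes: int, weights_bytes: int, overhead_bytes: int,
                shape: ModelShape, step: int = 256,
                trained_window: int = 32768) -> int:
    """Largest context whose KV cache fits in the VRAM left over,
    rounded down to a multiple of `step` and capped at the trained window."""
    free = vram_bytes - weights_bytes - overhead_bytes
    if free <= 0:
        return 0  # the weights alone fill the card: no room for a cache
    tokens = free // kv_bytes_per_token(shape)
    tokens -= tokens % step  # round down to a tidy multiple
    return min(tokens, trained_window)


if __name__ == "__main__":
    shape = ModelShape(n_layers=36, n_kv_heads=8, head_dim=128)
    cap = context_cap(vram_bytes=6 * GIB, weights_bytes=int(2.5 * GIB),
                      overhead_bytes=GIB // 2, shape=shape)
    print(f"KV cache per token: {kv_bytes_per_token(shape)} bytes")
    print(f"Context cap: {cap} tokens")
```

With these assumed numbers, each token of context costs 144 KiB of cache, and the 3 GiB left on a 6 GB card caps the context at 21,760 tokens. Capping up front avoids the failure mode where a long conversation silently pushes the card past its memory limit.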
On the Enterprise tier, the source code is held in escrow by a trusted third party (escrow.com or equivalent): if we cease operations, you recover the code and can continue on your own. On other tiers, the runtime is air-gapped: as long as your JWT license is valid, your pool runs with no external dependency.
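An air-gapped runtime can check a license without calling home because a signed JWT carries its own expiry date. The sketch below assumes the license is a standard compact JWT (RFC 7519) with an `exp` claim in Unix seconds; the `EdDSA` header, the `sub` claim and all function names are hypothetical. It deliberately omits signature verification: a real runtime must first verify the token against the vendor's public key (for example with a JOSE library) and reject it on failure before trusting any claim.

```python
import base64
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # base64url without "=" padding, as used in compact JWTs.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def license_claims(token: str) -> dict:
    """Decode the payload of a compact JWT (header.payload.signature).

    WARNING: this does NOT verify the signature. A real runtime must check
    it against the vendor's public key first and reject the token on failure.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT")
    return json.loads(_b64url_decode(parts[1]))


def license_is_current(token: str, now: Optional[float] = None,
                       leeway: int = 60) -> bool:
    """True while the `exp` claim (Unix seconds) has not passed.

    `leeway` tolerates small clock drift on machines without NTP access.
    No network call is made: everything needed is inside the token.
    """
    exp = license_claims(token).get("exp")
    now = time.time() if now is None else now
    return isinstance(exp, (int, float)) and now < exp + leeway


def unsigned_demo_token(claims: dict) -> str:
    """Build an UNSIGNED token for demonstration only (placeholder signature)."""
    header = {"alg": "EdDSA", "typ": "JWT"}
    return ".".join([
        _b64url_encode(json.dumps(header).encode()),
        _b64url_encode(json.dumps(claims).encode()),
        "signature-placeholder",
    ])


if __name__ == "__main__":
    token = unsigned_demo_token({"sub": "pool-demo", "exp": 2_000_000_000})
    print(license_is_current(token, now=1_900_000_000))  # True
    print(license_is_current(token, now=2_100_000_000))  # False
```

The design choice is that validity depends only on the token and the local clock, which is what lets the pool keep running with no outbound connection.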
Your only commitment is an annual license. If you don't renew the following year, your pool keeps running with your data, but without new versions or support. No pressure, and no data leaves your premises.