Getting started

Overview

PlanVault is an enterprise AI orchestration platform. Clients send natural-language prompts; an LLM generates structured execution plans with dynamic routing to the right tools (REST APIs, MCP servers, webhooks). The platform handles multi-tenant isolation, adaptive tool selection, webhook and API confirmations, secret management, multi-provider LLM routing, and event-sourced FSM execution with crash recovery. It provides both a web console for administration and a Runtime API for programmatic integration.

This guide covers the system architecture, core concepts, request lifecycle, and security model. For REST API endpoints, request/response schemas, authentication details, and interactive Swagger, see the API documentation page.

Key capabilities:
• Multi-tenant architecture with organisation/project isolation and role-based access
• Dual planner modes: structured JSON Schema and Python-like DSL, with the same internal plan representation and the same runtime
• Adaptive tool retrieval with vector search, FTS, hierarchical centroid routing, and scenario-based fusion
• Event-sourced FSM execution with crash recovery (Apache Pekko persistence)
• Envelope encryption (AES-256-GCM) for all secrets, provider keys, and session data at rest
• Real-time SSE streaming for execution progress
• Human-in-the-loop: plan approval and slot/form input
• Lifecycle webhooks for backend integration (outbound POST for subscribed types: completed, failed, requires_action, interrupted, recovery_required)
• Inbound webhook triggers to start sessions from external systems (Slack, GitHub, CI)
• Data export, per-external-user erasure workflows, and configurable session retention for portability and subject-rights handling
• Self-hosted and air-gapped VPC deployment support

Technical Advantages

PlanVault is built around capabilities that matter in production: scaling to thousands of tools without context overflow, handling multi-megabyte API responses, encrypting every byte at rest, and recovering from crashes mid-execution. This section covers the technical differentiators in detail.

Intelligent tool selection

Most LLM-based agent platforms are limited to 128–200 tools per prompt context. PlanVault’s 4-tier adaptive retrieval scales to thousands of registered tools while the planner still receives only a focused, relevant shortlist. On cold starts (before the feedback loop kicks in), semantically similar APIs may compete for the same slots, but auto scenarios quickly adapt the ranking.

• Automatic OpenAPI → tools ingestion with full lifecycle: versioning, embedding generation, search document construction
• 4-tier adaptive strategy with configurable thresholds (defaults: Direct ≤20 tools, FlatRag ≤100, FullRag ≤200, HierarchicalRag 200+)
• Scenario-based boosting: manual templates (priority 2–100) and auto scenarios with Semantic Routing Cache (semantic embedding match; raw user prompts are not stored) plus success_rate tracking
• Per-group caps prevent any single service from dominating the shortlist
• Test selection endpoint: dry-run the full pipeline without real execution (POST …/tools/test-selection)
• Configurable fusion weights (RRF-K, usage boost cap, centroid top-K) per org/project
• Hard cap at 200 tools in the final shortlist; retrievalMaxTools default 30
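
The default thresholds listed above can be read as a simple tier lookup on the number of registered tools. The following is a minimal sketch of that reading, not PlanVault's implementation: `TierThresholds` and `select_tier` are hypothetical names, and only the tier names and default bounds come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierThresholds:
    """Upper bounds per tier; defaults mirror the documented values."""
    direct_max: int = 20
    flat_rag_max: int = 100
    full_rag_max: int = 200


def select_tier(tool_count: int, t: TierThresholds = TierThresholds()) -> str:
    """Pick a retrieval tier from the number of registered tools.

    Hypothetical helper: the real selection logic may consider more
    inputs than the raw tool count.
    """
    if tool_count < 0:
        raise ValueError("tool_count must be non-negative")
    if tool_count <= t.direct_max:
        return "Direct"
    if tool_count <= t.flat_rag_max:
        return "FlatRag"
    if tool_count <= t.full_rag_max:
        return "FullRag"
    return "HierarchicalRag"
```

With the defaults, a project with 150 tools would land in FullRag, and raising `flat_rag_max` to 150 in an org-level override would move it to FlatRag.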

Large response handling

API responses of 1–20 MB don’t break the agent. PlanVault extracts what matters before data becomes runtime context; the planner does not receive raw payloads by default, and bounded evidence replan for read-only tools is explicitly opt-in.

• Input & output schema flattening: nested OpenAPI structures are converted into flat parameter lists, reducing hallucinations during plan generation
• resultJsonPath on webhook tool execution details: extract a specific fragment from a large JSON response before it enters execution scope
• get_field / set_field / merge stdlib tools: the planner can work with large objects incrementally without pulling everything into the prompt
• Output field visibility cap in prompt generators: the LLM sees only the first five output schema fields, not an entire 500-field JSON definition
• On the **evidence replan** path with `postSuccessReplan.redaction` enabled, large JSON is trimmed via configurable maxDepth (default 4), string caps, and optional key drops before the planner sees a fragment; ordinary tool results are not universally depth-truncated for the model
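
To make the evidence-replan trimming concrete, here is a rough sketch of depth, string, and key trimming over parsed JSON. It is an illustration under assumptions: `trim_json`, the `<truncated>` placeholder, the 256-character string cap, and the exact depth-counting rule are all hypothetical; only the maxDepth default of 4 comes from this page.

```python
from typing import Any, FrozenSet

TRUNCATED = "<truncated>"  # hypothetical placeholder for cut-off subtrees


def trim_json(value: Any, max_depth: int = 4, max_string: int = 256,
              drop_keys: FrozenSet[str] = frozenset(), _depth: int = 0) -> Any:
    """Return a trimmed copy of a parsed JSON value.

    Assumed rules: the root container is depth 0, containers at
    depth >= max_depth are replaced by a placeholder, long strings are
    cut to max_string characters, and keys in drop_keys are removed.
    """
    if isinstance(value, str):
        return value if len(value) <= max_string else value[:max_string] + "..."
    if isinstance(value, (dict, list)):
        if _depth >= max_depth:
            return TRUNCATED
        if isinstance(value, dict):
            return {k: trim_json(v, max_depth, max_string, drop_keys, _depth + 1)
                    for k, v in value.items() if k not in drop_keys}
        return [trim_json(v, max_depth, max_string, drop_keys, _depth + 1)
                for v in value]
    return value  # numbers, booleans, and None pass through unchanged
```

The function returns a new structure and leaves the original tool result untouched, which matches the statement above that ordinary results are not depth-truncated outside this path.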

Envelope encryption

Bank-grade encryption. Secrets never reach the LLM. Every byte at rest is encrypted with the organisation’s unique key.

• AES-256-GCM with per-organisation Data Encryption Key (DEK)
• DEK wrapping follows deployment configuration with deployment-supplied KEK or customer KMS integration
• Async DEK rotation with batch re-encryption: no downtime for reads; new encrypted writes during healthy rotation use the pending DEK version
• Secrets never placed into LLM prompts, only variable names as handles; FSM decrypts and injects values at tool execution time
• Session events are always encrypted at rest in the configured session event store (PostgreSQL or filesystem per `session-store.mode`)
• External user IDs hashed with HMAC-SHA256 before storage (never stored in plaintext)
• Documented threat model: database compromise, stolen API keys, JWT forgery, prompt injection
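
As an illustration of the external user ID hashing mentioned above, the sketch below computes a keyed HMAC-SHA256 digest with the Python standard library. The function name, the hex encoding of the digest, and the way the key is supplied are assumptions; this page does not specify how PlanVault derives or stores the HMAC key.

```python
import hashlib
import hmac


def hash_external_user_id(external_user_id: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 digest to store instead of the raw ID.

    Hypothetical helper: key management (per-org key, rotation) is out
    of scope here and is not described on this page.
    """
    if not key:
        raise ValueError("key must not be empty")
    return hmac.new(key, external_user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Because the digest is keyed, the same external ID maps to the same stored value within one key (so lookups and erasure requests still work), while a leaked database alone is not enough to test guesses against the stored digests.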

Multi-protocol integration

Import your OpenAPI spec. Connect MCP servers. Wire up webhooks. All tools land in one catalog, one selection pipeline, one execution runtime.

• OpenAPI / Swagger: automatic import from JSON or YAML with full schema parsing, auto-embedding generation, and search document construction
• MCP (Model Context Protocol): stdio and remote HTTP transport; tools synced into the org catalog automatically
• Outbound webhooks: PlanVault calls external services (n8n, Zapier, Make, custom endpoints) with resultJsonPath for response filtering
• Inbound webhooks: external systems trigger new PlanVault sessions (Kafka, event buses, CI) via HMAC-SHA256 signed requests
• Unified catalog: all tool sources share the same adaptive selection pipeline and execution runtime
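
For the HMAC-SHA256 signed inbound webhooks, a sender and a receiver might sign and verify a request body roughly as follows. This is a generic sketch, not the PlanVault wire format: the hex encoding, the choice to sign the raw body only, and the function names are assumptions, and the header that carries the signature is defined in the API documentation rather than here.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_body(secret, body)
    # Compare bytes so that non-ASCII input is rejected rather than raising.
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_signature.encode("utf-8"))
```

Verification should run over the exact bytes received, before any JSON parsing or re-serialisation, since a re-encoded body can differ byte-for-byte from what the sender signed.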

Self-hosted deployment

Deploy in your VPC, data centre, or air-gapped network. Customer-managed infrastructure keeps tenant data and catalog metadata under your control; LLM traffic goes only to backends you configure.

• Full stack deploys on customer infrastructure (Docker, Kubernetes, or bare-metal)
• Organisation DEK wrapping is driven by your self-hosted KEK configuration, not universally “AWS KMS in your customer account”; development stacks may use a compatible KMS-like endpoint
• An LLM proxy layer enables local models (Ollama, vLLM, custom base URLs) without mandatory external providers
• Outbound calls go only to backends you configure (LLM vendors, integrated APIs); there is no separate PlanVault product telemetry in self-hosted deployments
• Optional `session-store.mode=postgres` or `file` for durable session events; file mode is typically single-node (see the public self-hosted setup guide)
• GDPR export and erasure out of the box (organisation, project, external user)
• Configurable session retention per org with automatic pruning
• Ephemeral execution state is cleaned up after runs, including across crash/restart

Crash recovery & event sourcing

Agent crashes mid-execution? PlanVault reconciles the run automatically, keeps encrypted history, and separates recoverable states from manual recovery.

• A durable execution journal lets the runtime reconcile state after a crash and resume only safe transitions
• Long-term encrypted event history is stored separately from the ephemeral execution journal
• Finished runs clear the temporary journal after durable events are confirmed
• Run lifecycle uses explicit statuses for interruptions and cases that need manual recovery
• The session message queue serialises concurrent prompts
• The Idempotency-Key header with Redis supports safe client retries within a typical TTL
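
The Idempotency-Key behaviour can be pictured as a keyed cache with expiry: a retried request with the same key replays the stored result instead of running the handler again. The sketch below uses an in-memory dictionary as a stand-in for Redis; the class name, the `run_once` method, and the TTL value in the usage note are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class IdempotencyCache:
    """In-memory stand-in for a Redis-backed idempotency store."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def run_once(self, key: str, handler: Callable[[], Any]) -> Any:
        """Run handler for a new key; replay the stored result on retry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still within the TTL: replay, do not re-run
        result = handler()
        self._entries[key] = (now + self._ttl, result)
        return result
```

A client that times out on session creation could retry with the same key, for example `cache.run_once("req-123", create_session)`, and receive the original result as long as the retry arrives before the entry expires. This single-process sketch does not cover concurrent requests that share a key; a shared store would need an atomic set-if-absent for that.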

Human-in-the-loop

Keep humans in control. Plans can require approval before execution, concrete tool calls can wait for approval after parameter evaluation, and hard safety policies can block an action without approval.

• Plan approval before execution: three-layer policy with tool level (approvalPolicy: always/default/auto_ok), project level (planApprovalMode: require/auto), and session level (autoApprovePlan: true)
• Runtime tool approval: settings.runtimePolicy evaluates already-resolved parameters for a concrete tool call; runtimeApproval.rules pause with a correlationId, safe displayParams, and approve/reject actions
• Runtime hard deny: runtimeSafety.constraints block a call before tool_start and end the run as a controlled policy failure without routing to approval
• Auto-approve bypasses plan HITL and writes a PLAN_AUTO_APPROVED audit row with approvalSource; a tool with approvalPolicy=always blocks plan auto-approval and writes PLAN_AUTO_APPROVE_BLOCKED
• Slot filling: the agent can pause and request additional data from the user via structured input forms
• SSE streaming for real-time execution UX (GET …/sessions/{id}/chat); events include confirm_plan_result, tool_approval_required, tool_approval_result, and tool_policy_denied
• Lifecycle webhooks: session.requires_action for plan approval, slots, or runtime tool approval; session.failed carries safe reason metadata for hard policy denial; session.completed / session.failed cover terminal outcomes
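
One way to read the three-layer plan approval policy is as a small decision function. The sketch below is an interpretation, not the documented algorithm: the rule that approvalPolicy=always blocks auto-approval and the two audit event names come from this page, while the precedence between the project and session levels, the handling of auto_ok, and the function name are assumptions.

```python
from typing import Iterable, Optional, Tuple


def decide_plan_approval(tool_policies: Iterable[str],
                         plan_approval_mode: str,
                         auto_approve_plan: bool) -> Tuple[bool, Optional[str]]:
    """Return (needs_human_approval, audit_event_or_None).

    Assumed precedence: any tool marked "always" forces human approval;
    otherwise either the session flag or project mode "auto" is enough
    to auto-approve. "default" and "auto_ok" tools are treated alike.
    """
    wants_auto = auto_approve_plan or plan_approval_mode == "auto"
    if "always" in set(tool_policies):
        # Auto-approval was requested but a tool forbids it.
        event = "PLAN_AUTO_APPROVE_BLOCKED" if wants_auto else None
        return True, event
    if wants_auto:
        return False, "PLAN_AUTO_APPROVED"
    return True, None
```

Under these assumptions, a plan that touches one `always` tool still pauses for a human even in a session created with autoApprovePlan set to true, and the blocked attempt is recorded for audit.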

LLM budget control

Control costs at every level. Set token and spend caps per organisation and per project with automatic enforcement.

• Per-org and per-project budgets: token count and/or USD caps per billing period (calendar month or rolling 30 days)
• Multi-provider LLM routing (OpenAI, Anthropic, Google, local models via custom api_base)
• Model override per project: different projects can use different models and cost profiles
• Provider API key encryption: all vendor credentials encrypted with the org DEK
• Automatic enforcement: HTTP 403 with specific error codes (ORG_LLM_BUDGET_TOKENS_EXCEEDED, PROJECT_LLM_BUDGET_SPEND_EXCEEDED) when limits are hit
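
The enforcement rule can be sketched as a check that runs before each LLM call and yields the error code to return with HTTP 403. Note the assumptions: this page names only ORG_LLM_BUDGET_TOKENS_EXCEEDED and PROJECT_LLM_BUDGET_SPEND_EXCEEDED, so the other two scope and cap combinations produced below are inferred from the naming pattern, and the "usage at or above the cap" rule, the data classes, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    max_tokens: Optional[int] = None   # None means no token cap
    max_usd: Optional[float] = None    # None means no spend cap


@dataclass(frozen=True)
class Usage:
    tokens: int = 0
    usd: float = 0.0


def budget_violation(scope: str, budget: Budget, usage: Usage) -> Optional[str]:
    """Return an error code if the period usage has hit a cap, else None.

    scope is "ORG" or "PROJECT"; codes follow the assumed pattern
    <SCOPE>_LLM_BUDGET_<TOKENS|SPEND>_EXCEEDED.
    """
    if scope not in ("ORG", "PROJECT"):
        raise ValueError("scope must be 'ORG' or 'PROJECT'")
    if budget.max_tokens is not None and usage.tokens >= budget.max_tokens:
        return f"{scope}_LLM_BUDGET_TOKENS_EXCEEDED"
    if budget.max_usd is not None and usage.usd >= budget.max_usd:
        return f"{scope}_LLM_BUDGET_SPEND_EXCEEDED"
    return None
```

A caller would typically evaluate the organisation budget first and then the project budget, returning 403 with the first non-None code; that ordering is also an assumption.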

Scoped API keys (HRN)

Fine-grained access control for API integrations. Each key carries explicit scopes matching specific resource patterns.

• Each API key carries HRN-based scopes (e.g. hrn:project:session:create, hrn:project:tools:read, hrn:project:*)
• Per-project key quota comes from deployment config: one primary (full access) plus additional scoped keys
• Key rotation without downtime: new key issued instantly, old hash invalidated
• Key preview (last 4 characters) for identification without exposing the full secret
• Keys hashed (SHA-256) before storage; plaintext shown only at creation or rotation
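
Scope checks and key storage as described above might look like the following sketch. The wildcard semantics (a trailing `*` matching any remaining segments, implemented here with shell-style matching), the function names, and the sample key in the usage note are assumptions; the scope strings, SHA-256 hashing, and last-4 preview come from this page.

```python
import hashlib
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def scope_allows(granted_scopes: Iterable[str], required: str) -> bool:
    """True if any granted HRN pattern matches the required scope.

    Assumption: "*" behaves like a shell wildcard, so "hrn:project:*"
    covers every scope under "hrn:project:".
    """
    return any(fnmatchcase(required, pattern) for pattern in granted_scopes)


def key_record(api_key: str) -> Dict[str, str]:
    """Build what would be stored for a key: its hash and a short preview."""
    return {
        "hash": hashlib.sha256(api_key.encode("utf-8")).hexdigest(),
        "preview": api_key[-4:],  # last 4 characters, for identification only
    }
```

For example, a key granted only `hrn:project:tools:read` would be refused for `hrn:project:session:create`, and `key_record("pv_example_abcd1234")` keeps `1234` as the preview while the plaintext itself is never stored.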

Integration examples

Production-ready integration examples for common patterns, available in the open-source planvault-examples repository.

• React SSE chat: real-time streaming chat UI with session management
• Kafka → webhook: event-driven session creation (Scala, Java, Python variants)
• MCP stdio: Python + SQLite tool server connected via Model Context Protocol
• n8n workflows: outbound + inbound webhook integration patterns
• Bash E2E smoke test: quick deployment verification script

PlanVault/planvault-examples on GitHub

Why PlanVault

PlanVault was designed for production enterprise workloads from day one. The table below shows where PlanVault usually sits alongside common agent APIs, frameworks, and orchestration tools for regulated, high-scale deployments.

| Capability | PlanVault | OpenAI agent APIs | LangGraph | CrewAI |
| --- | --- | --- | --- | --- |
| Tool limit per session | 1 000+ tools in catalog; adaptive shortlist per turn (hard cap 200) | 128 | Depends on agent design and context | Depends on agent design |
| Large response handling | Schema flattening, JSONPath extraction, stdlib tools, depth truncation | Depends on token/context limits and API pattern | Usually app-level handling | Usually app-level handling |
| Encryption at rest | AES-256-GCM envelope, per-org DEK; deployment KEK or customer KMS integration | Provider-managed | Deployment-dependent | Deployment-dependent |
| On-premise / air-gapped | Full stack, local LLMs via built-in LLM proxy | Typically external API | Self-managed infra, cloud LLM | Self-managed infra, cloud LLM |
| Human-in-the-loop | Plan approval, slot filling, webhooks | Depends on API and app-level flow | Custom implementation | Custom implementation |
| Crash recovery | Event-sourced FSM, auto-recovery, idempotency keys | Depends on provider-managed API surface | Checkpoints (manual) | Implementation-dependent |
| Multi-protocol integration | OpenAPI, MCP, webhooks (unified catalog) | Mostly custom functions/tools | Mostly custom tools | Mostly custom tools |
| Routing latency | DB-level centroid routing for large catalogs (typically milliseconds vs LLM classifiers) | Managed by the relevant OpenAI API | Depends on app/graph routing | Depends on agent design |
| Popularity bias protection | Logarithmic RRF smoothing | Depends on API and integration | Usually app-level | Implementation-dependent |
| Adaptive routing (feedback) | Auto-scenarios with success weight updates | Depends on API and integration | Manual prompt tuning | Manual prompt tuning |
| Built-in audit trail | Immutable audit log (append-only), all approvals/rejections with timestamps and details, configurable retention | Depends on API and integration | Usually app-level | Implementation-dependent |

Comparison methodology

This is a high-level comparison based on publicly documented capabilities as of Q2 2026; third-party platforms may have changed their feature sets or offer deployment-specific options. Third-party product names are used only to describe categories and compatibility; PlanVault is not affiliated with, sponsored by, or endorsed by their owners.

Support page

API and documentation questions: support@planvault.ai