Getting started
Overview
PlanVault is an enterprise AI orchestration platform. Clients send natural-language prompts; an LLM generates structured execution plans with dynamic routing to the right tools (REST APIs, MCP servers, webhooks). The platform handles multi-tenant isolation, adaptive tool selection, webhook and API confirmations, secret management, multi-provider LLM routing, and event-sourced FSM execution with crash recovery. It provides both a web console for administration and a Runtime API for programmatic integration.
This guide covers the system architecture, core concepts, request lifecycle, and security model. For REST API endpoints, request/response schemas, authentication details, and interactive Swagger, see the API documentation page.
Key capabilities:
• Multi-tenant architecture with organisation/project isolation and role-based access
• Dual planner modes: structured JSON Schema and Python-like DSL, sharing the same internal plan representation and the same runtime
• Adaptive tool retrieval with vector search, FTS, hierarchical centroid routing, and scenario-based fusion
• Event-sourced FSM execution with crash recovery (Apache Pekko persistence)
• Envelope encryption (AES-256-GCM) for all secrets, provider keys, and session data at rest
• Real-time SSE streaming for execution progress
• Human-in-the-loop: plan approval and slot/form input
• Lifecycle webhooks for backend integration (outbound POST for subscribed types: completed, failed, requires_action, interrupted, recovery_required)
• Inbound webhook triggers to start sessions from external systems (Slack, GitHub, CI)
• Data export, per-external-user erasure workflows, and configurable session retention for portability and subject-rights handling
• Self-hosted and air-gapped VPC deployment support
Technical Advantages
PlanVault is built around capabilities that matter in production: scaling to thousands of tools without context overflow, handling multi-megabyte API responses, encrypting every byte at rest, and recovering from crashes mid-execution. This section covers the technical differentiators in detail.
Intelligent tool selection
Most LLM-based agent platforms are limited to 128–200 tools per prompt context. PlanVault’s 4-tier adaptive retrieval scales to thousands of registered tools while the planner receives only a focused, relevant shortlist. On cold starts (before the feedback loop has data), semantically similar APIs may compete for the same ranks; auto-scenarios then adjust the ranking as success feedback accumulates.
• Automatic OpenAPI → tools ingestion with full lifecycle: versioning, embedding generation, search document construction
• 4-tier adaptive strategy with configurable thresholds (defaults: Direct ≤20 tools, FlatRag ≤100, FullRag ≤200, HierarchicalRag above 200)
• Scenario-based boosting: manual templates (priority 2–100) and auto scenarios with Semantic Routing Cache (semantic embedding match; raw user prompts are not stored) plus success_rate tracking
• Per-group caps prevent any single service from dominating the shortlist
• Test selection endpoint: dry-run the full pipeline without real execution (POST …/tools/test-selection)
• Configurable fusion weights (RRF-K, usage boost cap, centroid top-K) per org/project
• Hard cap at 200 tools in the final shortlist; retrievalMaxTools default 30
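The tier thresholds and rank fusion listed above can be illustrated with a minimal sketch. It assumes the tier is chosen purely from the catalog size and that fusion is plain Reciprocal Rank Fusion; the names `select_tier` and `rrf_fuse` and the constant `k=60` are hypothetical and are not PlanVault's actual API or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierThresholds:
    # Defaults taken from the documented thresholds.
    direct_max: int = 20
    flat_rag_max: int = 100
    full_rag_max: int = 200


def select_tier(tool_count: int, t: TierThresholds = TierThresholds()) -> str:
    """Pick a retrieval tier from catalog size (assumed selection rule)."""
    if tool_count <= t.direct_max:
        return "Direct"
    if tool_count <= t.flat_rag_max:
        return "FlatRag"
    if tool_count <= t.full_rag_max:
        return "FullRag"
    return "HierarchicalRag"


def rrf_fuse(rankings: list[list[str]], k: int = 60, limit: int = 30) -> list[str]:
    """Reciprocal Rank Fusion: score(tool) = sum over rankings of 1 / (k + rank).

    `limit` mirrors the documented retrievalMaxTools default of 30 and is
    clamped to the documented hard cap of 200; k=60 is an assumed constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, tool in enumerate(ranking, start=1):
            scores[tool] = scores.get(tool, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, then by name so ties are deterministic.
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return ordered[: min(limit, 200)]


if __name__ == "__main__":
    print(select_tier(150))
    print(rrf_fuse([["crm.search", "crm.create"], ["crm.create", "billing.get"]]))
```

A tool that appears in several rankings (for example both vector search and FTS) accumulates score from each, which is why fused results favour tools that several retrievers agree on.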
Large response handling
API responses of 1–20 MB don’t break the agent. PlanVault extracts what matters before data becomes runtime context; the planner does not receive raw payloads by default, and bounded evidence replan for read-only tools is explicitly opt-in.
• Input & output schema flattening: nested OpenAPI structures are converted into flat parameter lists, reducing hallucinations during plan generation
• resultJsonPath on webhook tool execution details: extract a specific fragment from a large JSON response before it enters execution scope
• get_field / set_field / merge stdlib tools: the planner can work with large objects incrementally without pulling everything into the prompt
• Output field visibility cap in prompt generators: the LLM sees only the first five output schema fields, not an entire 500-field JSON definition
• On the evidence replan path with `postSuccessReplan.redaction` enabled, large JSON is trimmed via configurable maxDepth (default 4), string caps, and optional key drops before the planner sees a fragment; ordinary tool results are not universally depth-truncated for the model
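The redaction step on the evidence replan path can be pictured with a small sketch. It assumes that maxDepth counts nested containers, that over-long strings are cut with a marker, and that dropped keys are removed at every level; the function name `redact`, the `max_string` default, and the placeholder text are hypothetical.

```python
from typing import Any


def redact(
    value: Any,
    max_depth: int = 4,          # documented default for maxDepth
    max_string: int = 200,       # assumed string cap, not a documented default
    drop_keys: frozenset[str] = frozenset(),
    _depth: int = 0,
) -> Any:
    """Trim a JSON-like value before showing a fragment to the planner."""
    if isinstance(value, str):
        return value if len(value) <= max_string else value[:max_string] + "..."
    if isinstance(value, (dict, list)):
        if _depth >= max_depth:
            # Containers nested deeper than max_depth are replaced by a placeholder.
            return "<truncated>"
        if isinstance(value, dict):
            return {
                key: redact(item, max_depth, max_string, drop_keys, _depth + 1)
                for key, item in value.items()
                if key not in drop_keys
            }
        return [redact(item, max_depth, max_string, drop_keys, _depth + 1) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


if __name__ == "__main__":
    payload = {"order": {"items": [{"sku": "A1", "meta": {"deep": {"x": 1}}}]}, "token": "abc"}
    print(redact(payload, max_depth=3, drop_keys=frozenset({"token"})))
```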
Envelope encryption
Secrets never reach the LLM. Secrets, provider keys, and session data are encrypted at rest with AES-256-GCM under the organisation’s unique key.
• AES-256-GCM with per-organisation Data Encryption Key (DEK)
• DEK wrapping follows deployment configuration with deployment-supplied KEK or customer KMS integration
• Async DEK rotation with batch re-encryption: no downtime for reads; new encrypted writes during healthy rotation use the pending DEK version
• Secrets never placed into LLM prompts, only variable names as handles; FSM decrypts and injects values at tool execution time
• Session events are always encrypted at rest in the configured session event store (PostgreSQL or filesystem per `session-store.mode`)
• External user IDs hashed with HMAC-SHA256 before storage (never stored in plaintext)
• Documented threat model: database compromise, stolen API keys, JWT forgery, prompt injection
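The "handles, not values" rule can be shown with a short sketch: the plan only ever contains a variable name, and the value is substituted immediately before the tool call. The `{{NAME}}` handle syntax and the `inject_secrets` helper are hypothetical; PlanVault's real handle format and its decryption step are not shown here.

```python
import re

# Hypothetical handle syntax: {{SECRET_NAME}}. The real format may differ.
HANDLE = re.compile(r"\{\{(\w+)\}\}")


def inject_secrets(params: dict[str, str], secrets: dict[str, str]) -> dict[str, str]:
    """Replace secret handles with values just before a tool call.

    `secrets` stands in for values the runtime has already decrypted;
    the plan (and therefore the LLM prompt) only ever contains handles.
    """

    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"unknown secret handle: {name}")
        return secrets[name]

    return {key: HANDLE.sub(resolve, value) for key, value in params.items()}


if __name__ == "__main__":
    plan_params = {"Authorization": "Bearer {{CRM_TOKEN}}"}  # what the planner sees
    print(inject_secrets(plan_params, {"CRM_TOKEN": "example-value"}))
```

The input dictionary is left untouched, so anything logged or replayed from the plan still shows only the handle.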
Multi-protocol integration
Import your OpenAPI spec. Connect MCP servers. Wire up webhooks. All tools land in one catalog, one selection pipeline, one execution runtime.
• OpenAPI / Swagger: automatic import from JSON or YAML with full schema parsing, auto-embedding generation, and search document construction
• MCP (Model Context Protocol): stdio and remote HTTP transport; tools synced into the org catalog automatically
• Outbound webhooks: PlanVault calls external services (n8n, Zapier, Make, custom endpoints) with resultJsonPath for response filtering
• Inbound webhooks: external systems trigger new PlanVault sessions (Kafka, event buses, CI) via HMAC-SHA256 signed requests
• Unified catalog: all tool sources share the same adaptive selection pipeline and execution runtime
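Signing an inbound webhook request with HMAC-SHA256 could look like the sketch below. It assumes the signature is the hex digest of the raw request body under a shared secret; the exact signed payload, encoding, and header name used by PlanVault are not specified here, so treat `sign` and `verify` as illustrative helpers only.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed signature format)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, secret: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(body, secret).encode("utf-8")
    return hmac.compare_digest(expected, signature.encode("utf-8"))


if __name__ == "__main__":
    secret = b"shared-webhook-secret"            # hypothetical shared secret
    body = b'{"prompt": "Summarise build 42"}'   # hypothetical trigger payload
    signature = sign(body, secret)
    print(verify(body, secret, signature))
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a forged signature were correct.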
Self-hosted deployment
Deploy in your VPC, data centre, or air-gapped network. Customer-managed infrastructure keeps tenant data and catalog metadata under your control; LLM traffic goes only to backends you configure.
• Full stack deploys on customer infrastructure (Docker, Kubernetes, or bare-metal)
• Organisation DEK wrapping is driven by your self-hosted KEK configuration, not universally “AWS KMS in your customer account”; development stacks may use a compatible KMS-like endpoint
• An LLM proxy layer enables local models (Ollama, vLLM, custom base URLs) without mandatory external providers
• Outbound calls go only to backends you configure (LLM vendors, integrated APIs); there is no separate PlanVault product telemetry in self-hosted deployments
• Optional `session-store.mode=postgres` or `file` for durable session events; file mode is typically single-node (see the public self-hosted setup guide)
• GDPR export and erasure out of the box (organisation, project, external user)
• Configurable session retention per org with automatic pruning
• Ephemeral execution state is cleaned up after runs, including across crash/restart
Crash recovery & event sourcing
Agent crashes mid-execution? PlanVault reconciles the run automatically, keeps encrypted history, and separates recoverable states from manual recovery.
• A durable execution journal lets the runtime reconcile state after a crash and resume only safe transitions
• Long-term encrypted event history is stored separately from the ephemeral execution journal
• Finished runs clear the temporary journal after durable events are confirmed
• Run lifecycle uses explicit statuses for interruptions and cases that need manual recovery
• The session message queue serialises concurrent prompts
• The Idempotency-Key header with Redis supports safe client retries within a typical TTL
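How an idempotency key makes client retries safe can be sketched with an in-memory stand-in for the Redis-backed store. The class name `IdempotencyCache`, the TTL value, and the injected clock are assumptions made for illustration; the real server-side behaviour may differ.

```python
import time
from typing import Callable


class IdempotencyCache:
    """In-memory stand-in for a keyed store with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, result)

    def run_once(self, key: str, action: Callable[[], str]) -> str:
        """Run `action` at most once per key while the entry is still fresh."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # retry within the TTL: replay the stored result
        result = action()
        self._entries[key] = (now + self._ttl, result)
        return result


if __name__ == "__main__":
    cache = IdempotencyCache(ttl_seconds=60)
    counter = {"n": 0}

    def create_session() -> str:
        counter["n"] += 1
        return f"session-{counter['n']}"

    print(cache.run_once("retry-key-1", create_session))
    print(cache.run_once("retry-key-1", create_session))  # same result, no second run
```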
Human-in-the-loop
Keep humans in control. Plans can require approval before execution, concrete tool calls can wait for approval after parameter evaluation, and hard safety policies can block an action without approval.
• Plan approval before execution with a three-layer policy: tool level (approvalPolicy: always/default/auto_ok), project level (planApprovalMode: require/auto), and session level (autoApprovePlan: true)
• Runtime tool approval: settings.runtimePolicy evaluates already-resolved parameters for a concrete tool call; runtimeApproval.rules pause with a correlationId, safe displayParams, and approve/reject actions
• Runtime hard deny: runtimeSafety.constraints block a call before tool_start and end the run as a controlled policy failure without routing to approval
• Auto-approve bypasses plan HITL and writes a PLAN_AUTO_APPROVED audit row with approvalSource; a tool with approvalPolicy=always blocks plan auto-approval and writes PLAN_AUTO_APPROVE_BLOCKED
• Slot filling: the agent can pause and request additional data from the user via structured input forms
• SSE streaming for real-time execution UX (GET …/sessions/{id}/chat); events include confirm_plan_result, tool_approval_required, tool_approval_result, and tool_policy_denied
• Lifecycle webhooks: session.requires_action for plan approval, slots, or runtime tool approval; session.failed carries safe reason metadata for hard policy denial; session.completed / session.failed cover terminal outcomes
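One way to read the three-layer plan approval policy is sketched below. The precedence is an assumption: a tool with approvalPolicy=always wins over everything, and otherwise either the project mode `auto` or the session flag autoApprovePlan is enough to auto-approve. The sketch does not distinguish `default` from `auto_ok`, and the function name `resolve_plan_approval` is hypothetical.

```python
from typing import Optional


def resolve_plan_approval(
    tool_policies: list[str],   # approvalPolicy of each tool in the plan
    plan_approval_mode: str,    # project level: "require" or "auto"
    auto_approve_plan: bool,    # session level: autoApprovePlan
) -> tuple[bool, Optional[str]]:
    """Return (needs_human_approval, audit_event) under the assumed precedence."""
    auto_requested = auto_approve_plan or plan_approval_mode == "auto"
    if "always" in tool_policies:
        # A tool marked "always" blocks auto-approval; audit only if it was requested.
        return True, ("PLAN_AUTO_APPROVE_BLOCKED" if auto_requested else None)
    if auto_requested:
        return False, "PLAN_AUTO_APPROVED"
    return True, None


if __name__ == "__main__":
    print(resolve_plan_approval(["default", "always"], "auto", False))
    print(resolve_plan_approval(["default"], "require", True))
```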
LLM budget control
Control costs at every level. Set token and spend caps per organisation and per project with automatic enforcement.
• Per-org and per-project budgets: token count and/or USD caps per billing period (calendar month or rolling 30 days)
• Multi-provider LLM routing (OpenAI, Anthropic, Google, local models via custom api_base)
• Model override per project: different projects can use different models and cost profiles
• Provider API key encryption: all vendor credentials encrypted with the org DEK
• Automatic enforcement: HTTP 403 with specific error codes (ORG_LLM_BUDGET_TOKENS_EXCEEDED, PROJECT_LLM_BUDGET_SPEND_EXCEEDED) when limits are hit
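Budget enforcement can be pictured as a pre-flight check before each LLM call. The sketch below covers only the two error codes named above, checks the organisation before the project, and treats a cap as exceeded once usage reaches it; the `Budget` type, the check order, and the `>=` comparison are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    token_cap: Optional[int] = None   # None means no token cap configured
    usd_cap: Optional[float] = None   # None means no spend cap configured
    tokens_used: int = 0
    usd_spent: float = 0.0


def check_budgets(org: Budget, project: Budget) -> Optional[tuple[int, str]]:
    """Return (http_status, error_code) if a documented limit is hit, else None."""
    if org.token_cap is not None and org.tokens_used >= org.token_cap:
        return 403, "ORG_LLM_BUDGET_TOKENS_EXCEEDED"
    if project.usd_cap is not None and project.usd_spent >= project.usd_cap:
        return 403, "PROJECT_LLM_BUDGET_SPEND_EXCEEDED"
    return None


if __name__ == "__main__":
    org = Budget(token_cap=1_000_000, tokens_used=1_000_000)
    project = Budget(usd_cap=50.0, usd_spent=12.5)
    print(check_budgets(org, project))
```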
Scoped API keys (HRN)
Fine-grained access control for API integrations. Each key carries explicit scopes matching specific resource patterns.
• Each API key carries HRN-based scopes (e.g. hrn:project:session:create, hrn:project:tools:read, hrn:project:*)
• Per-project key quota comes from deployment config: one primary (full access) plus additional scoped keys
• Key rotation without downtime: new key issued instantly, old hash invalidated
• Key preview (last 4 characters) for identification without exposing the full secret
• Keys hashed (SHA-256) before storage; plaintext shown only at creation or rotation
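Scope matching and key storage can be sketched as follows. The sketch assumes a trailing `:*` acts as a prefix wildcard and everything else must match exactly, and it generates keys with Python's `secrets` module; the helper names and the key format are hypothetical, and PlanVault's real matching rules may be richer.

```python
import hashlib
import secrets


def scope_allows(granted: str, required: str) -> bool:
    """True if one granted scope covers the required scope (assumed rules)."""
    if granted.endswith(":*"):
        # "hrn:project:*" covers anything starting with "hrn:project:".
        return required.startswith(granted[:-1])
    return granted == required


def is_authorized(granted_scopes: list[str], required: str) -> bool:
    return any(scope_allows(scope, required) for scope in granted_scopes)


def issue_key() -> tuple[str, str, str]:
    """Return (plaintext, sha256_hex, preview); only hash and preview are stored."""
    plaintext = secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest, plaintext[-4:]


if __name__ == "__main__":
    print(is_authorized(["hrn:project:tools:read"], "hrn:project:session:create"))
    print(is_authorized(["hrn:project:*"], "hrn:project:session:create"))
    plaintext, digest, preview = issue_key()
    print(preview, digest[:12])
```

Because only the SHA-256 digest is kept, an incoming key is checked by hashing it and comparing digests, and the 4-character preview is enough to tell keys apart in a console.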
Integration examples
Production-ready integration examples for common patterns, available in the open-source planvault-examples repository.
• React SSE chat: real-time streaming chat UI with session management
• Kafka → webhook: event-driven session creation (Scala, Java, Python variants)
• MCP stdio: Python + SQLite tool server connected via Model Context Protocol
• n8n workflows: outbound + inbound webhook integration patterns
• Bash E2E smoke test: quick deployment verification script
PlanVault/planvault-examples on GitHub
Why PlanVault
PlanVault was designed for production enterprise workloads from day one. The table below shows where PlanVault usually sits alongside common agent APIs, frameworks, and orchestration tools for regulated, high-scale deployments.
| Capability | PlanVault | OpenAI agent APIs | LangGraph | CrewAI |
|---|---|---|---|---|
| Tool limit per session | 1 000+ tools in catalog; adaptive shortlist per turn (hard cap 200) | 128 | Depends on agent design and context | Depends on agent design |
| Large response handling | Schema flattening, JSONPath extraction, stdlib tools, opt-in depth truncation on the evidence replan path | Depends on token/context limits and API pattern | Usually app-level handling | Usually app-level handling |
| Encryption at rest | AES-256-GCM envelope, per-org DEK; deployment KEK or customer KMS integration | Provider-managed | Deployment-dependent | Deployment-dependent |
| On-premise / air-gapped | Full stack, local LLMs via built-in LLM proxy | Typically external API | Self-managed infra, cloud LLM | Self-managed infra, cloud LLM |
| Human-in-the-loop | Plan approval, runtime tool approval, slot filling, webhooks | Depends on API and app-level flow | Custom implementation | Custom implementation |
| Crash recovery | Event-sourced FSM, auto-recovery, idempotency keys | Depends on provider-managed API surface | Checkpoints (manual) | Implementation-dependent |
| Multi-protocol integration | OpenAPI, MCP, webhooks — unified catalog | Mostly custom functions/tools | Mostly custom tools | Mostly custom tools |
| Routing latency | DB-level centroid routing for large catalogs (typically milliseconds vs LLM classifiers) | Managed by the relevant OpenAI API | Depends on app/graph routing | Depends on agent design |
| Popularity bias protection | Logarithmic RRF smoothing | Depends on API and integration | Usually app-level | Implementation-dependent |
| Adaptive routing (feedback) | Auto-scenarios with success weight updates | Depends on API and integration | Manual prompt tuning | Manual prompt tuning |
| Built-in audit trail | Immutable audit log (append-only), all approvals/rejections with timestamps and details, configurable retention | Depends on API and integration | Usually app-level | Implementation-dependent |