Architecture

PlanVault follows a layered architecture with clear separation between the HTTP API layer, the planning engine, the FSM execution runtime, and the persistence layer. The backend is Scala; the frontend is a React SPA.

Pipeline (static diagram)

End-to-end orchestration from integrated APIs and the tool catalog through PlanVault to deterministic execution (diagram below). The vertical client message path is described in the Data flow subsection further down.

Your APIs
  • OpenAPI / Swagger
  • MCP servers
  • Webhooks

Tool Selection
  • 500+ tools in catalog
  • Vector + FTS search
  • Adaptive routing

AI Planning
  • LLM produces an execution plan
  • Structured JSON or DSL
  • HITL approval gate

Execution
  • Event-sourced FSM
  • AES-256-GCM encrypted
  • Real-time SSE stream

System components
  • HTTP API Layer

    Exposes the Runtime API (/api/v1/…): project-scoped routes (sessions, messages, tools, scheduled jobs, and related operations) plus public organisation inbound triggers at POST /api/v1/orgs/{orgId}/webhooks/{triggerKey}. The Admin API (/admin/v1/…) covers organisation, project, and console configuration (Keycloak JWT). SSE endpoints stream live execution. OpenAPI is generated automatically; the interactive explorer at /api-docs publishes the public contract (platform-operator `/superadmin/v1` routes are typically served by a separate process and are not part of that bundle). Errors follow RFC 7807; use X-Request-Id for correlation.

  • Planning Engine

    Receives user prompts with a shortlisted set of tools, constructs a system prompt including scenario instructions and context variables, and calls the configured LLM via the proxy. Depending on plannerMode, the response is parsed as structured JSON or a text DSL within <script> tags. Both paths compile into a single execution plan for the FSM.

  • Execution Runtime

    Session execution is modelled as a crash-resilient finite-state machine on Apache Pekko: state transitions are journaled for recovery after restarts; after completion the ephemeral journal is removed while long-term encrypted history stays in the configured event store.

  • Tool Executor

    Dispatches tool calls to external services via HTTP (REST), MCP protocol, or outbound webhook POST. Secrets are decrypted from scope and injected into request headers/bodies at execution time — values never reach the LLM prompt. Tool approvalPolicy, project planApprovalMode, and session autoApprovePlan control plan HITL; settings.runtimePolicy evaluates live tool-call parameters for runtime approval or hard deny before the external call.

  • Adaptive Retrieval

    Narrows the full tool catalog to a relevant shortlist per query. Strategies include vector similarity, PostgreSQL full-text search (FTS), hierarchical centroid routing, and scenario-based boosts, all fused via weighted scoring with usage statistics. Retrieval is configurable per project and org, with an auto mode that switches strategy at tool-count thresholds.

  • Persistence Layer

    PostgreSQL stores organisations, projects, sessions, tool catalog metadata, scenarios, audit logs, and GDPR exports. Long-term encrypted session history is separated from the ephemeral execution journal; deployment operators choose storage mode (PostgreSQL or filesystem where supported). Optional Redis handles idempotency keys on message and action endpoints.

Data flow

The public Runtime API orchestrates the full pipeline: tool selection → planning → execution. The FSM runs asynchronously — after POST /api/v1/projects/{projectId}/sessions/{id}/messages the client typically receives HTTP 202 Accepted and monitors progress via SSE (GET …/sessions/{id}/chat) or history polling.

1. Client → Runtime API
   POST /api/v1/projects/{projectId}/sessions/{id}/messages: the client submits a natural-language prompt.

2. Adaptive Retrieval → tool shortlist
   The retrieval pipeline selects the optimal strategy (Direct / FlatRag / FullRag / HierarchicalRag) based on catalog size and narrows thousands of tools to a focused shortlist.

3. Planning engine → LLM → execution plan
   Tool signatures and session context are sent to the LLM via the proxy. The model returns a structured execution plan, never raw function calls.

4. Session → execution runtime
   Inside the session a crash-resilient FSM is created; state transitions are journaled for recovery before they take effect.

5. Tool execution → external services
   Each plan step is executed against your APIs (webhooks, MCP servers, OpenAPI endpoints). Results are encrypted and stored as session events.

6. SSE stream → Client
   Execution progress, tool results, and final output stream back to the client in real-time via Server-Sent Events.
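
On the client side, step 6 comes down to reading Server-Sent Events frames. The sketch below parses the standard SSE wire format (event: and data: fields, a blank line closing each frame). The event name run_status and the JSON payloads are placeholders chosen for illustration; the real PlanVault event names and payload shapes are not specified here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE frame; a blank line terminates a frame."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # dispatch only frames that carry data
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical frames: event names and payloads are assumptions.
sample = [
    "event: run_status\n",
    'data: {"status": "planning"}\n',
    "\n",
    ": keep-alive\n",
    "event: run_status\n",
    'data: {"status": "completed"}\n',
    "\n",
]
frames = list(parse_sse(sample))
```

In practice the lines would come from a streaming HTTP response on GET …/sessions/{id}/chat rather than from a list.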
Integration surfaces

The integration-surface model mirrors the public /platform page: external systems enter through the REST Runtime API, MCP, signed inbound org webhooks, outbound tool calls, orchestration platforms, and Knowledge Base retrieval; inside a single org/project boundary adaptive tool selection, planning, the execution FSM, and journaling run; outward paths include governed HTTP/MCP calls, SSE, lifecycle webhooks, diagnostics, replay, and GDPR operations.

Trust boundaries and data flow: client HTTPS and public inbound org webhooks cross the Runtime API; catalog metadata, scenarios, and audit data live in PostgreSQL under the org encryption boundary; the ephemeral FSM journal is separated from long-term encrypted session history; optional Redis holds idempotency keys for POST messages/actions — they never enter prompts. The LLM is invoked only at planning time with an already filtered shortlist.

Catalog entries are created from imported OpenAPI/Swagger revisions (including sync lifecycle), registered MCP tool lists, and outbound webhook tool definitions. Automation platforms can connect through webhooks, but they are not presented as native connectors. Each source passes the embedding/index pipeline before participating in adaptive retrieval.

Adaptive retrieval and the Semantic Routing Cache run before the planner call: vector search, PostgreSQL FTS, and centroid routing fuse into a shortlist of tool definitions; manual and auto scenarios add boosts. The routing cache stores only anonymised embedding vectors for auto scenarios — not raw end-user query text.

Secrets stay in envelope encryption and project/org scoped stores; the Tool Executor materialises values only at the HTTP/MCP/webhook boundary. The LLM-facing system prompt carries variable names, parameter JSON schemas, and policies — not decrypted credentials.
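
A minimal sketch of that boundary, assuming a hypothetical {{secret:NAME}} placeholder syntax in tool header templates (the document does not specify how secret references are written). The planner-facing view exposes only names and schemas, while the executor resolves values immediately before the outbound call.

```python
import re

# Hypothetical placeholder syntax, used here only for illustration.
PLACEHOLDER = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")


def prompt_view(tool: dict) -> dict:
    """What the planner may see: tool name, parameter schema, secret names."""
    names = {n for v in tool["headers"].values() for n in PLACEHOLDER.findall(v)}
    return {
        "name": tool["name"],
        "parameters": tool["parameters"],
        "secretNames": sorted(names),
    }


def materialise_headers(tool: dict, secrets: dict) -> dict:
    """Resolve placeholders right before the external call is made."""
    def resolve(match: re.Match) -> str:
        return secrets[match.group(1)]  # raises KeyError for an unknown secret

    return {k: PLACEHOLDER.sub(resolve, v) for k, v in tool["headers"].items()}


tool = {
    "name": "crm_get_customer",
    "parameters": {"type": "object", "properties": {"id": {"type": "string"}}},
    "headers": {"Authorization": "Bearer {{secret:CRM_TOKEN}}"},
}
decrypted = {"CRM_TOKEN": "s3cr3t-value"}  # stands in for values decrypted from scope

planner_input = prompt_view(tool)
outbound_headers = materialise_headers(tool, decrypted)
```

The decrypted dictionary never passes through prompt_view, which is the property the paragraph above describes.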

Signals leaving the runtime: the session SSE stream (GET …/chat) for live progress; lifecycle webhooks to your backends (completed, failed, requires_action, and related states); append-only audit logs; run diagnostics and replay/diff APIs behind Keycloak JWT and appropriate scopes; GDPR export/delete via org/project/user flows. Usage and spend aggregate into org/project views. Outbound tool responses do not return to the LLM for replan by default unless a bounded evidence path is explicitly enabled.

Technology stack

Architecture Decision Records

Key architectural decisions behind PlanVault and the reasoning that led to each choice.

Why adaptive tool selection?

Enterprise API landscapes routinely exceed 500 endpoints. Passing all tool signatures into a single LLM prompt is infeasible: context windows are finite, latency grows linearly, and plan quality degrades with noise. A fixed-strategy approach forces operators to choose between coverage and quality.

Decision: implement a 4-tier adaptive retrieval pipeline that automatically selects the optimal strategy based on catalog size.

  • Direct (≤20 tools): all tools in prompt, zero retrieval overhead
  • FlatRag (≤100): vector similarity narrows the set
  • FullRag (≤200): hybrid vector + FTS with Reciprocal Rank Fusion
  • HierarchicalRag (200+): centroid-based group routing before vector search

Why centroids instead of an LLM classifier? The classic approach (pass each group description to an LLM and ask it to choose) adds 1–2 seconds of latency before every agent action and scales linearly with the number of groups. Instead, PlanVault stores a mean L2-normalised embedding (centroid) for each tool_group in the tool_group_centroids table. Finding the nearest K groups is a single pgvector query at the DB level (~10–50ms) vs ~1000–2000ms for an LLM call.

This ensures plan quality remains constant regardless of catalog size, while keeping prompt token usage minimal. Scenario-based boosting adds an adaptive feedback loop: the Semantic Routing Cache tracks which tools succeed for which query patterns and promotes them in future shortlists.
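
The tier thresholds and the centroid idea can be sketched in a few lines. This is an illustration in plain Python, not the production path, which the text describes as a pgvector query. The centroid here normalises each member embedding, averages them, and normalises the mean again; that is one plausible reading of "mean L2-normalised embedding" and is an assumption. Function names are hypothetical.

```python
import math


def select_strategy(tool_count: int) -> str:
    """Map catalog size to a retrieval tier using the thresholds listed above."""
    if tool_count <= 20:
        return "Direct"
    if tool_count <= 100:
        return "FlatRag"
    if tool_count <= 200:
        return "FullRag"
    return "HierarchicalRag"


def l2_normalise(vec: list) -> list:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def centroid(embeddings: list) -> list:
    """Average the unit-length member embeddings, then re-normalise the mean."""
    unit = [l2_normalise(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalise(mean)


def nearest_groups(query: list, centroids: dict, k: int = 2) -> list:
    """Return the k group names whose centroids are most similar to the query."""
    q = l2_normalise(query)

    def cosine(c: list) -> float:
        return sum(a * b for a, b in zip(q, c))  # both sides are unit vectors

    return sorted(centroids, key=lambda g: cosine(centroids[g]), reverse=True)[:k]


# Toy 2-dimensional embeddings; real embeddings have hundreds of dimensions.
groups = {
    "billing": centroid([[1.0, 0.0], [0.9, 0.1]]),
    "crm": centroid([[0.0, 1.0], [0.1, 0.9]]),
}
routed = nearest_groups([1.0, 0.05], groups, k=1)
```

Because the comparison is a handful of dot products per group, the routing cost stays small next to an LLM classification call, which is the trade-off the decision record argues for.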

Why event sourcing?

AI agent execution is inherently non-deterministic and long-running. Traditional request-response patterns fail when an agent crashes mid-plan, when a tool call times out, or when a human approval gate pauses execution for hours. Losing execution state in any of these scenarios means lost work, duplicate side effects, and broken audit trails.

Decision: build the execution runtime as a crash-resilient finite-state machine (FSM) on Apache Pekko: every state transition is persisted as an event before it takes effect; after a failure the journal is replayed and execution is reconciled.

  • Encrypted long-term session history is stored separately from the ephemeral execution journal; storage mode is configured per deployment (`session-store.mode`).
  • The Idempotency-Key header with Redis reduces duplicate mutations on client retries.

The result: sessions survive restarts, network partitions, and infrastructure failures through explicit recovery states and idempotency.
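
The persist-then-apply rule can be shown with an in-memory journal. This is a language-neutral sketch in Python, not Pekko code: the event names and the transition table are hypothetical, and the statuses are a subset of the run statuses named elsewhere in this document.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("queued", "PlanningStarted"): "planning",
    ("planning", "PlanProduced"): "awaiting_confirmation",
    ("awaiting_confirmation", "PlanApproved"): "executing",
    ("executing", "StepsFinished"): "completed",
}


@dataclass
class Run:
    journal: list = field(default_factory=list)  # stands in for the durable journal
    state: str = "queued"

    def handle(self, event: str) -> None:
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            raise ValueError(f"{event!r} is not valid in state {self.state!r}")
        self.journal.append(event)  # 1. persist the event
        self.state = next_state     # 2. only then let it take effect

    @classmethod
    def recover(cls, journal: list) -> "Run":
        """Rebuild state after a restart by replaying the journal in order."""
        run = cls()
        for event in journal:
            run.state = TRANSITIONS[(run.state, event)]
        run.journal = list(journal)
        return run


run = Run()
run.handle("PlanningStarted")
run.handle("PlanProduced")
recovered = Run.recover(run.journal)  # simulates a process restart
```

A run paused at a human approval gate for hours is recovered the same way: the journal, not process memory, is the source of truth.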

Why envelope encryption?

Multi-tenant platforms store sensitive data from multiple organisations in shared infrastructure. A single database encryption key means one breach exposes every tenant. Column-level encryption with a shared key only protects against disk theft, not application-layer compromise. Regulatory frameworks (GDPR, HIPAA, SOC 2) increasingly require tenant-isolated key management.

Decision: envelope encryption with a per-organisation AES-256-GCM DEK; wrapping uses Google Tink with a deployment-supplied KEK or customer KMS integration.

  • Each organisation receives a unique DEK at creation
  • The `organizations.dek_wrap` column is a TINK invariant, not a runtime stack selector
  • Async DEK rotation re-encrypts data in background batches; reads stay available and new encrypted writes during rotation use the pending DEK version
  • Crypto-shredding semantics tie to org deletion / GDPR flows rather than a single toggle described here
  • Secrets are stored encrypted and only resolved at execution time; they never appear in LLM prompts
  • A separate HMAC signing key pseudonymises external user identifiers and is distinct from KMS/Tink KEK material

This yields tenant-isolated keys; KEK material stays outside application code and is controlled by the deployment operator.
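
The HMAC pseudonymisation mentioned in the last bullet can be illustrated with the standard library alone. This sketch covers only that bullet; DEK wrapping and AES-256-GCM are not shown. The choice of SHA-256 and the hex output format are assumptions, since the document does not name the hash function or encoding.

```python
import hashlib
import hmac


def pseudonymise(external_user_id: str, signing_key: bytes) -> str:
    """Derive a stable pseudonym for an external user id with a keyed hash."""
    digest = hmac.new(signing_key, external_user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Illustrative keys only; a real signing key would come from the deployment.
alias = pseudonymise("user-42", b"signing-key-one")
other = pseudonymise("user-42", b"signing-key-two")
```

The same identifier always maps to the same pseudonym under one key, so records can still be joined, while the mapping cannot be reproduced without the signing key.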

Why separate planning from execution?

Most agent frameworks give the LLM direct control over side effects: the model decides what to call and the framework immediately executes it. This makes the model a runtime controller with no safety boundary — one hallucinated tool call can mutate production data, and there is no consistent point to insert approval gates, retry policies, or audit logging.

Decision: separate planning (the LLM produces a structured execution plan) from execution (deterministic FSM runtime).

  • By default, the LLM receives tool signatures and metadata, not raw payloads; the only exception is explicitly enabled bounded evidence replan for read-only tools. Secrets are never sent into the prompt.
  • The output is a structured execution plan, not a direct function call
  • The runtime validates the plan, applies plan approval and runtime tool gates after parameter evaluation, then executes steps one at a time
  • Each step has explicit retry, timeout, and error-handling policies
  • The plan can be reviewed, summarised (via utility model), and approved before any side effect occurs; risky concrete tool calls can require separate approval or receive a hard deny

This architecture makes AI orchestration auditable, controllable, and safe for regulated environments where uncontrolled model-driven execution is unacceptable.
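
The two gates described above, one on the plan as a whole and one on each concrete tool call, can be sketched as follows. The plan shape, the function names, and the rule format are assumptions made for illustration; the real approvalPolicy, planApprovalMode, autoApprovePlan, and settings.runtimePolicy semantics are richer than this.

```python
def gate_plan(plan: dict, shortlist: set, approval_required: set,
              auto_approve: bool) -> str:
    """Validate a plan before any side effect and decide whether a human must approve."""
    tools = [step["tool"] for step in plan["steps"]]
    if any(tool not in shortlist for tool in tools):
        return "rejected"  # the plan references a tool it was never offered
    if not auto_approve and any(tool in approval_required for tool in tools):
        return "awaiting_confirmation"
    return "executing"


def gate_call(step: dict, deny_rules: dict) -> str:
    """Evaluate the concrete parameters of one tool call just before execution."""
    rule = deny_rules.get(step["tool"])
    if rule is not None and rule(step["params"]):
        return "deny"
    return "allow"


# Hypothetical plan and policy.
plan = {"steps": [
    {"tool": "get_invoice", "params": {"id": "inv_1"}},
    {"tool": "refund_payment", "params": {"amount": 5000}},
]}
shortlist = {"get_invoice", "refund_payment"}
deny_rules = {"refund_payment": lambda params: params.get("amount", 0) > 1000}

plan_decision = gate_plan(plan, shortlist, {"refund_payment"}, auto_approve=False)
call_decisions = [gate_call(step, deny_rules) for step in plan["steps"]]
```

Keeping both checks outside the model means a hallucinated or oversized call is stopped by deterministic code, whatever the plan says.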

Operational Capabilities

Beyond the orchestration core, PlanVault includes a set of operational capabilities critical for production: adaptive tool selection with Semantic Routing Cache and feedback loops, replay debugging, run-level tracing, automatic OpenAPI drift correction, crash recovery, and scheduled execution.

Feedback Loops for Scenarios and Plans

The adaptive tool selection system does not just execute queries — it refines routing from every outcome. Scenario ranking considers execution outcomes, HITL rejections, and explicit user feedback (like/dislike) to improve future tool selection without retraining the model.

  • Signal-aware scenario scoring: weighted execution failures, HITL rejects, and explicit likes/dislikes
  • Penalisation on HITL reject for contributing auto-scenario vectors
  • POST …/sessions/{sessionId}/feedback records like/dislike for a terminal run (Runtime project key with sessionWrite scope or member JWT); optional X-Operator-Id header when using an API key
  • Encrypted feedback events tied to runs/scenarios feed ranking updates
  • Console chat surfaces feedback controls after completed runs

Scenarios & Adaptive Tool Selection

Scenarios are the key mechanism for ensuring the right tools are selected for every query and that selection accuracy improves with each execution. Scenarios act as cached routing hints: they add proven tool candidates and planner instructions before the final LLM call.

There are two types of scenarios. Manual scenarios are created by administrators for critical business processes and have priority 2–100. Auto scenarios are created after successful executions, have priority 1, and use the Semantic Routing Cache to match semantically similar future requests.

Manual scenarios always carry more weight than auto scenarios. If your team explicitly describes a process — for example, “create an invoice and email it to the customer” — PlanVault treats that as operator-owned policy knowledge. Auto scenarios do not override those rules; they fill gaps, discover repeated patterns where no manual scenario exists yet, and produce candidates for review.

  • Manual scenarios: priority 2–100, explicit query pattern → tool set, optional systemInstruction with {{key}} placeholders
  • Auto scenarios: priority 1, created after successful executions and used to boost future routing for similar requests
  • Weighted fusion: manual boosts + auto-scenario boosts + retrieval RRF scores + capped usage boost; scenarios complement search rather than replace it
  • Feedback loop: failed execution, HITL rejection, and explicit like/dislike reduce the influence of poor auto patterns
  • Group caps prevent one integration from dominating the shortlist
  • Test selection endpoint supports dry-running the full pipeline without executing tools
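
The weighted fusion and group caps listed above can be sketched as below. Reciprocal Rank Fusion uses its usual form, a sum of 1 / (k + rank) over the input rankings, with the conventional k = 60. The boost magnitudes, the usage cap, and the group cap are hypothetical values; the document does not publish the real weights.

```python
def rrf(rankings: list, k: int = 60) -> dict:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranking (rank starts at 1)."""
    scores = {}
    for ranking in rankings:
        for rank, tool in enumerate(ranking, start=1):
            scores[tool] = scores.get(tool, 0.0) + 1.0 / (k + rank)
    return scores


def fuse(rankings: list, groups: dict, manual_boost: dict = None,
         auto_boost: dict = None, usage: dict = None,
         usage_cap: float = 0.01, group_cap: int = 2, top_n: int = 3) -> list:
    manual_boost, auto_boost, usage = manual_boost or {}, auto_boost or {}, usage or {}
    scores = rrf(rankings)
    for tool in set(manual_boost) | set(auto_boost):
        scores.setdefault(tool, 0.0)  # scenarios may add tools that search missed
    for tool in scores:
        scores[tool] += (manual_boost.get(tool, 0.0) + auto_boost.get(tool, 0.0)
                         + min(usage.get(tool, 0.0), usage_cap))  # capped usage boost
    shortlist, per_group = [], {}
    for tool in sorted(scores, key=lambda t: (-scores[t], t)):  # ties broken by name
        group = groups.get(tool, tool)
        if per_group.get(group, 0) >= group_cap:
            continue  # group cap: one integration cannot fill the shortlist
        per_group[group] = per_group.get(group, 0) + 1
        shortlist.append(tool)
        if len(shortlist) == top_n:
            break
    return shortlist


vector_ranking = ["a1", "a2", "a3", "b1"]
fts_ranking = ["a2", "a1", "b1", "a3"]
groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}

plain = fuse([vector_ranking, fts_ranking], groups)
boosted = fuse([vector_ranking, fts_ranking], groups, manual_boost={"b1": 1.0})
```

In the first call the third tool from integration A is skipped by the group cap, so b1 enters the shortlist; in the second call the manual boost moves b1 to the top.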

Semantic Routing Cache

Semantic Routing Cache is tenant-scoped routing memory for auto scenarios. When enabled, PlanVault stores anonymized embedding vectors for successful workflow requests, linked to your organization and to a specific auto scenario, instead of storing raw end-user prompts.

For a new request, PlanVault computes an embedding through your configured BYOK embedding provider, compares it with vectors in PostgreSQL/pgvector, and contributes relevant auto scenarios to the adaptive retrieval pipeline. Each scenario keeps a bounded set of vectors, hit counts, and success signals; similar vectors are merged, while weak or disliked signals can lose weight or be removed.

We recommend enabling Semantic Routing Cache, especially when your catalog has dozens or hundreds of tools. Without this layer the system leans more heavily on generic retrieval and wider shortlists; with it, PlanVault quickly remembers which tool sequences already worked for similar requests, reduces prompt noise, improves plan stability, and lowers the need to manually encode every repeated pattern.

  • Why use it: better tool selection on large catalogs, less token overhead, lower latency, and more stable plans
  • How it works: query embedding → vector similarity against org-scoped auto-scenario vectors → scenario boost in hybrid fusion
  • Security: raw end-user prompts are not stored as routing cache; vectors are org_id-isolated and kept in your tenant database under the organization encryption boundary
  • Control: OWNER can disable the cache; disabling synchronously deletes stored vectors
  • Pipeline role: the cache does not execute actions by itself, it only guides the planner shortlist and scenario instructions

Suggested Patterns and Auto Scenarios

Suggested Patterns are the review layer on top of auto scenarios. When an auto scenario repeats often enough, has successful executions, and has a canonical plan, PlanVault marks it as suggestion-eligible and surfaces it in the console as a recommended pattern.

Your team can open the derived pattern, inspect the sanitized canonical plan, review the tool sequence, and choose whether to promote it to a Composite Tool or dismiss it. Promotion turns a repeated workflow into an explicit reusable tool that can be versioned, governed, and invoked as part of the catalog. Dismiss disables an unwanted pattern and removes its routing vectors.

  • Suggested Patterns come from real successful executions, not model guesses
  • Labels are derived from tool names / refs; raw user prompts are not needed for review
  • Promote creates a Composite Tool from canonical planner JSON and an input schema derived from free variables
  • Dismiss deactivates the auto scenario and deletes its Semantic Routing Cache vectors
  • Manual scenarios remain the primary mechanism for policy-critical workflows; Suggested Patterns help identify what should be formalized next

Debugging, Replay & Failure Analysis

Debugging complex AI-orchestrated workflows is expensive: re-running entire scenarios, spending tokens, re-invoking external systems. PlanVault provides a debug/replay layer for efficient failure analysis.

  • Modes frozen and live reuse the saved plan snapshot; frozen replays recorded tool outputs, live performs real HTTP/MCP/webhook calls (optional JSON parameter/tool overrides)
  • replan_frozen / replan_live re-run planning from the original prompt with frozen or live side effects respectively
  • Checkpoint replay (frozen/live only): resume from tool:<name> plus optional completion index; incompatible with replan_* modes
  • Plan snapshots are persisted per run for replay inputs
  • Runtime API (Keycloak JWT only, sessionHistoryRead scope + org debug content access; rate limits apply):
      POST /api/v1/projects/{projectId}/sessions/{sessionId}/runs/{runId}/replay
      GET /api/v1/projects/{projectId}/sessions/{sessionId}/runs/{runId}/replay-status
      GET /api/v1/projects/{projectId}/sessions/{sessionId}/runs/{runId}/replay-runs
      GET /api/v1/projects/{projectId}/sessions/{sessionId}/runs/{runId}/diff?compareWith={otherRunId}
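
A small request builder makes the mode rules concrete, in particular that checkpoint replay is limited to frozen and live. The endpoint path and the four mode names come from the list above; the JSON body field names (mode, checkpoint, completionIndex) are assumptions, so check the published OpenAPI contract for the real schema.

```python
MODES = {"frozen", "live", "replan_frozen", "replan_live"}


def build_replay_request(project_id: str, session_id: str, run_id: str, mode: str,
                         checkpoint_tool: str = None,
                         completion_index: int = None) -> dict:
    """Assemble a replay request and reject combinations the modes do not allow."""
    if mode not in MODES:
        raise ValueError(f"unknown replay mode: {mode}")
    if checkpoint_tool is not None and mode.startswith("replan_"):
        raise ValueError("checkpoint replay is only available in frozen/live modes")
    body = {"mode": mode}  # hypothetical field names from here on
    if checkpoint_tool is not None:
        body["checkpoint"] = f"tool:{checkpoint_tool}"
        if completion_index is not None:
            body["completionIndex"] = completion_index
    path = (f"/api/v1/projects/{project_id}/sessions/{session_id}"
            f"/runs/{run_id}/replay")
    return {"method": "POST", "path": path, "json": body}


request = build_replay_request("p1", "s1", "r1", "frozen",
                               checkpoint_tool="create_invoice", completion_index=0)
```

The returned dictionary can be handed to any HTTP client along with a Keycloak JWT in the Authorization header.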

Run Tracing & Operational Diagnostics

Every run receives a full diagnostics stream with graph-oriented events: from selection and planning through every tool call to the terminal state. This enables step-level execution analysis.

  • Persisted diagnostics rows: seq_no, visibility, graph-normalised kinds (node_entered, node_completed, branch_taken, node_paused for HITL, terminal for completion/errors), graph node ID, duration, outcome, error taxonomy
  • GET …/runs/{runId}/timeline: product-safe timeline; GET …/runs/{runId}/diagnostics: detailed rows (Keycloak JWT required; project API keys are rejected)
  • Correlation: requestId and traceId on diagnostic rows when present
  • Planner/tool latency, replan, HITL, and failure signals surface in the graph
  • Console Run diagnostics page (JWT users): execution graph and timeline
  • Retention: diagnostics are pruned after a short default window unless org session retention extends it; this is not an indefinite audit archive

Cost Visibility and Operations Console

PlanVault is designed not only as an agent runtime, but as an operations console for teams running AI workflows in production. LLM spend can be controlled at the organization and project level, while usage and spend can be reviewed in the context of projects, sessions, and tags.

  • Per-org and per-project budget caps for tokens and spend with automatic enforcement
  • Usage/spend views show which projects, sessions, and workflow categories drive cost
  • Session tags and metadata support chargeback, customer attribution, and cost-driver analysis
  • Console-wide search helps operators quickly find sessions, external users, tags, and project context
  • Audit logs and diagnostics provide context for support, finance review, and incident response

OpenAPI Auto-Healer

OpenAPI specs drift: fields become required, types change, responses stop matching the imported schema. When an OpenAPI-backed HTTP tool fails on PlanVault’s HTTP replan path, Auto-Healer ingests the status code and error snippet, diagnoses likely drift with deterministic rules (optional LLM-assisted proposals), and applies or queues patches under policy.

  • Ingestion from OpenAPI HTTP tool failures on the planner replan path (HTTP status + sanitised error text)
  • Deterministic diagnosis: missing fields, type mismatch, nullable/required drift, enum/content-type mismatches (some classes stay review-only)
  • Policy modes: auto-apply supported patches or escalate risky changes to operators
  • Apply creates a new tool revision (integrations rebound); audit logs record OPENAPI_TOOL_SPEC_HEALED
  • Admin API (Keycloak JWT): cursor-paged list, fetch one event, actions retry | apply | reject | review

OpenAPI Sources and Safe Sync Lifecycle

OpenAPI in PlanVault is not a one-time Swagger import. Each source has a lifecycle: a stable source record, runtime URL override, change preview, controlled sync, and sync-run history. This lets teams move integrations from staging to production and update APIs without manually rewriting tools.

  • Import creates typed tool definitions, embeddings, and search documents for adaptive retrieval
  • serverUrlOverride changes the runtime target without republishing every tool revision
  • Sync preview shows the diff before applying changes, and sync runs can be polled until completion
  • Auto-Healer complements the lifecycle: when runtime drift appears, it creates a patch/review flow instead of silently breaking the workflow
  • SSRF/outbound URL policy checks OpenAPI document URLs and runtime targets before use

Runtime Recovery & Execution Semantics

PlanVault formalises every run lifecycle through canonical statuses (queued, planning, awaiting_confirmation, awaiting_slots, executing, awaiting_external_signal, completed, failed, interrupted, needs_manual_recovery, and related transitions). After a process restart the runtime reconciles open runs automatically.

  • Explicit run lifecycle with canonical states and transitions via FSM events
  • First-class run persistence in a dedicated table with recovery policy
  • Run-level source of truth independent of coarse session status
  • Restart reconciliation: open runs automatically recover after a process restart
  • Idempotency keys to prevent duplicate mutations during crash recovery
  • SSE and lifecycle webhooks reflect run-level transitions

Production Reliability, Idempotency, and Outbound Safety

For production clients, the hard part is not only generating a plan; it is surviving retries, deploys, network failures, and duplicate callbacks safely. PlanVault combines an event-sourced FSM, explicit run lifecycle, Redis-backed idempotency, and webhook delivery semantics so client integrations behave predictably.

  • Idempotency-Key on POST messages/actions prevents duplicate mutations during client retries
  • Run lifecycle includes explicit interrupted and needs_manual_recovery states for controlled recovery
  • SSE streams live progress to UIs, while lifecycle webhooks push state changes to backend systems
  • Outbound URL policy reduces SSRF risk for HTTP tools, webhooks, and OpenAPI sync
  • RFC 7807 problem details and X-Request-Id simplify integration debugging
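
The effect of an Idempotency-Key on retries can be shown with an in-memory store standing in for Redis. This is a conceptual sketch: the class and method names are hypothetical, and a real deployment would also need expiry and an atomic check-and-set, which a plain dictionary does not provide.

```python
class IdempotencyStore:
    """In-memory stand-in for the Redis-backed idempotency key store."""

    def __init__(self) -> None:
        self._results = {}

    def run_once(self, key: str, action):
        """Run the action for a new key; replay the stored result for a repeat."""
        if key in self._results:
            return self._results[key]
        result = action()
        self._results[key] = result
        return result


created = []


def post_message() -> dict:
    created.append("message")  # the mutation that must not happen twice
    return {"status": 202, "messageId": len(created)}


store = IdempotencyStore()
first = store.run_once("retry-key-123", post_message)
retry = store.run_once("retry-key-123", post_message)  # client retry, same key
```

The retry receives the original response and the mutation runs once, which is the behaviour a client can rely on after a timeout or dropped connection.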

Scheduled Execution

PlanVault supports delayed and recurring orchestration. One-shot delays use durable scheduled jobs (native schedule_execution tool or Scheduled Jobs API). Recurring schedules use hourly/daily/weekly rules with an IANA timezone — not arbitrary cron strings surfaced in the product UI.

  • Native schedule_execution enqueues a durable job to open a new session or resume an existing one after delaySeconds or an absolute runAt, capped by maxScheduleHorizonDays
  • Recurring schedules (Admin/API): hourly, daily, or weekly rules with timezone + recurrence payload; targets new_session_prompt or resume_session
  • Worker dispatch survives process restarts; short sleep-style waits inside a plan remain separate from business scheduling
  • Job state, retries, and encrypted schedule secrets persist in PostgreSQL
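
Resolving a one-shot schedule from either delaySeconds or runAt, bounded by maxScheduleHorizonDays, might look like the sketch below. Whether PlanVault rejects or clamps a request beyond the horizon is not stated; this version rejects it, and that choice is an assumption, as is the function name.

```python
from datetime import datetime, timedelta, timezone


def resolve_run_at(now: datetime, max_horizon_days: int,
                   delay_seconds: int = None, run_at: datetime = None) -> datetime:
    """Turn delaySeconds or an absolute runAt into a validated execution time."""
    if (delay_seconds is None) == (run_at is None):
        raise ValueError("provide exactly one of delay_seconds or run_at")
    target = run_at if run_at is not None else now + timedelta(seconds=delay_seconds)
    if target < now:
        raise ValueError("scheduled time is in the past")
    if target - now > timedelta(days=max_horizon_days):
        raise ValueError("scheduled time exceeds the allowed horizon")
    return target


now = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
in_one_hour = resolve_run_at(now, max_horizon_days=30, delay_seconds=3600)
```

Using timezone-aware datetimes throughout avoids comparing naive and aware values, which Python refuses to do.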

External Signals, Callbacks, and Live Integrations

Not every workflow finishes in a single HTTP call. PlanVault supports long-running processes where the agent must wait for an external event: an approval from your backend, a payment callback, a human signal, job queue completion, or a response from another orchestration system.

  • Native wait_for_signal puts the run into awaiting_external_signal and emits awaiting_signal on the SSE stream
  • An external system delivers a JSON callback through a dedicated endpoint; duplicate delivery of the same signal is handled idempotently
  • Lifecycle webhooks notify your backend about completed, failed, requires_action, interrupted, and recovery_required states
  • Inbound webhooks can start new sessions from event buses, CI, SaaS automation, or internal systems
  • This enables workflows where LLM planning is combined with deterministic backend callbacks and human approval

Support page

API and documentation questions: support@planvault.ai