Introduction
Production agentic systems increasingly read context from data-lakehouse stores at inference time, compacting heterogeneous sources into engine-agnostic columnar formats — Apache Arrow Parquet on S3-class object storage — queried by in-process engines such as DuckDB 1, often co-located with on-prem or edge runtimes hosting local models. The access-control consequence is sharp: the entity issuing queries is no longer a human analyst on a stable role but an LLM agent mediated by tool-call protocols such as the Model Context Protocol 2. Agent access is short-term, contextual, and task-oriented; role-based access control — and its temporally-scoped variant TRBAC — does not fit. Two existing responses each concede something. Per-engine row- and column-level security 3 does not survive ETL into an engine-agnostic Parquet store, forcing per-source policy duplication that scales poorly with source count and churn. Physical tenant segregation across object-storage prefixes queried by disjoint engines restores safety but forfeits the cross-source joins that motivate the lakehouse in the first place. Neither addresses the deeper mismatch: an agent's effective permissions depend on the principal it acts for, the task it is invoked under, and the calling context — none of which the static identity → role → permission chain of RBAC encodes. The same agent code called by two principals, or invoked by one principal under two tasks, may legitimately need two different views. We propose plan-level rewriting against a column-grant policy as a third point in this design space, gated under the agent's task-scoped identity at query time. Postern, the artifact we present, has three components: a Lean~4-mechanised plan rewriter, a Rust capability-tracking layer constraining the agent's downstream computation, and a reference-conformance harness binding the two.
Contributions
-
A Plan IR (//), a Biscuit-Datalog policy language (Horn-fragment with ground
right(p, r, c)facts compiled from the surface column-grant syntax), and a rewriter, mechanised in Lean~4 4. The rewriter is inspired by Cedar's Lean-mechanised authorization core 5; we transpose the technique from per-request decisions to plan-level outputs over an IR, and so prove soundness of a transformation rather than a classification. The rewriter side comprises ninesorry-free theorems: output-column soundness, filter-predicate soundness, schema subset, idempotence under repeated application, monotonicity in the policy, two no-new-column lemmas, and explicit-refusal lemmas for unknown relations and for filter predicates over forbidden columns (§4). The Datalog evaluator (verifier/lean/Datalog.lean) contributes a furthereval_monotonetheorem (proved modulo one isolated combinatorial obligation,herbrandBound_mono) plus foursorry-free specialisation lemmas covering the rule-free regime that all our scenarios use today. Two further evaluator meta-theorems —eval_soundandeval_terminates— are stated withsorryAxobligations named explicitly inCheckAxioms.lean. Axiom dependencies for proved declarations are bounded bypropextandQuot.sound; two rewriter theorems depend on none. -
A Rust capability-tracking layer (
postern-guardrail) implementing three composable mechanisms: sealedCap<'sc, C>tokens whose construction is private to the crate, an invariant brand lifetime'scenforced byPhantomData<fn(&'sc ()) -> &'sc ()>and gated by a universally-quantified scope combinator, and opaque-receipt sinks that consume both theCapand the carrierTaggedwithout exposing the underlying value. Three lexical bypass attempts (forgingCap, projectingTagged::value, escaping the brand) are pinned ascompile_faildoctests. The agent-facing surface isno_std-compatible (§3). -
A Rust implementation of the rewriter (
postern-core) structurally mirroring the Lean reference, and a reference-conformance harness (postern-diff) that asserts byte-equivalence between the Rust output and the Lean reference on a corpus of 18 hand-curated cases (15 accept, 3 refusal). The gateway's Datalog evaluation surface is designed aroundbiscuit-auth's publicbiscuit_auth::datalog::Worldevaluator 6; the in-treepostern-coremigration to that backend is in progress, with the column-grant DSL serving as the byte-equivalent stand-in until the migration lands. We label the procedure reference-conformance testing rather than QuickCheck-style differential testing, reserving the latter term for property-based generation; the latter is among the open problems of §6. -
A case study over the Kaggle
transactions-fraud-datasetsschema, with three principals (CRM, Card Operations, Fraud Risk) exercising PII redaction, cross-departmental refusal, and minimum-necessary disclosure (§5). Each row of the evaluation table corresponds to a corpus case driving the conformance harness.
Plan IR
We state the IR before the threat model so §2 may reference its operators directly. Plans are single-relation expressions built from three constructors:
where , , and . We write for the output schema of plan under a catalog , defined inductively: ; ; . The asymmetry between (which alters the output schema) and (which does not, despite reading ) is the source of the filter side-channel discussed in §2.
Threat model
The trusted computing base (TCB) consists of the gateway process itself, the catalog that it consults, the plan-to-executor lowering step that hands the rewritten plan to DuckDB, the principal extraction step that maps a verified capability token to a string, and the DuckDB + Parquet store. All other parties are untrusted.
| Component | Trust | Justification |
|---|---|---|
| LLM / agent planner | ✗ | Susceptible to direct and indirect prompt injection. |
| Agent-generated code | ✗ | Composes upstream-tainted context with downstream effects. |
| Capability tokens 6 | ~ | Trust contingent on the gateway's signature verification. |
| Gateway process | ✓ | Hosts the Lean-verified rewriter and the policy. |
| Catalog | ✓ | Assumed bound to the physical Parquet schema (§6). |
| Plan-to-executor lowering | ✓ | The rewritten plan is assumed honoured literally by DuckDB. |
| Principal-string extraction | ✓ | A bug in token verification invalidates every theorem of §4. |
| DuckDB + Parquet store | ✓ | Standard storage-engine assumptions apply. |
The attacks within scope of the formal model are: (i) over-
projection of forbidden columns; (ii) filter on a forbidden
column (the side-channel addressed by rewrite_filter_sound);
(iii) scan of a relation absent from the catalog (addressed by
rewrite_refuses_unknown); (iv) cross-departmental reach by a
principal lacking matching grants; and (v) unknown principals,
which we treat fail-closed via the empty-allow convention.
The following are deliberately out of scope and discussed in §6: aggregation and inference attacks; covert channels through latency or row-count observation; multi-relation joins; biscuit attenuation modelled inside the Lean proof; policy synthesis from natural language; and the planner-to-executor lowering step.
Design
Postern compiles a single policy artifact to plan-level enforcement.
flowchart LR
agent[LLM agent] -->|MCP plan + biscuit| gw{{Postern gateway}}
subgraph TCB
gw -->|verify| bc[biscuit verifier]
bc -->|principal| rw[Lean-extracted rewriter]
pol[(policy)] --> rw
cat[(catalog)] --> rw
rw -->|Option Plan| exe[DuckDB / Polars]
end
exe --> agent
Policy
A policy is a list of column-grants : "principal may read columns on relation ". Multiple grants for the same flat-union. Anything outside the union is denied — fail-closed. No deny-lists: the policy language is deliberately monotone grant-only, which makes policy review additive (a new grant can only widen). Deny-lists and attribute-based predicates are §6.
A concrete policy from the financial-institution scenario of §5, in Postern's surface syntax:
grant CRM on users_data { id, name, region, age }
grant CardOps on cards_data { card_id, card_type, limit, activated }
grant FraudRisk on transactions_data { txn_id, card_id, amount, merchant, timestamp }
grant FraudRisk on users_data { id, region }
Anything outside these grants is implicitly denied;
users_data.ssn, users_data.email, and cards_data.card_number
are never released regardless of which principal queries. The
rewriter projects each plan's output schema down to the grant
union under the querying principal and refuses plans whose
filter predicates touch columns outside that union (§4).
Rewriter
rewrite cat P prin q :=
if cat q.touched = [] then none -- unknown relation
else if ¬ q.filterCols ⊆ allow then none -- forbidden filter col
else
some (Project q (q.schema cat ∩ allow))
where allow := P.allowed prin q.touched
Post-hoc projection is the simplest algorithm that admits a clean soundness proof. Predicate-pushdown variants can be verified against this rewriter as a reference; we leave that to future work.
Capability-bounded data flow
The rewriter of §3 (Layer 1) constrains what data reaches the agent. A separate layer (henceforth Layer 2) constrains what the agent's code may do with the values released. Odersky et al. 7 propose Scala 3 capture-checking as a type-level mechanism for this purpose: capabilities are first-class program variables, and the compiler tracks each function's capture set — the capabilities it may use — so that agent-generated code cannot perform an effect for which it does not hold a capability. Rust has no capture-checking. We mechanise a weaker analog by composing three pure-Rust constructions, each closing one face of the gap.
Sealed capability tokens. Cap<'sc, C> carries no public
constructor and contains a Sealed field whose constructor is
private to the crate. Forging a Cap is therefore a privacy
violation rejected at compile time (verified by a
compile_fail doctest in the crate). Carrier values
Tagged<'sc, T, C> likewise have a private value field;
the inner is unreachable by direct field access (also
verified by compile_fail).
Invariant brand lifetime. Both Cap and Tagged carry a
brand parameter , made invariant by
. The sole entry
point to the layer is a scope combinator
The universal quantification of together with
forces the closure's return type to be free of , ruling out
any path by which a or might
escape the scope. The construction is the same one used by
ghost-cell for branded references. A third compile_fail
doctest demonstrates that attempting to return the Cap from
the closure is rejected.
Opaque-receipt sinks. Carrier extraction is governed by a fixed set of sink functions, each of which consumes both and and returns a receipt type that contains no information derived from beyond its serialised length:
pub fn to_llm<'sc, T, C, S>(cap: Cap<'sc, C>,
data: Tagged<'sc, T, C>,
serialize: S) -> LlmAck
where S: FnOnce(T) -> String;
The agent has no public method that returns raw ; the prior
draft's Tagged::release(cap) -> T is removed in favour of
the sink interface above. The only operations on
that the agent's code can perform are map and and_then, both
of which preserve the brand and the kind.
The agent-facing surface is #![no_std]. The crate is
partitioned so that the gateway-side integration with
postern-core lives behind a gateway feature flag; the
agent-facing types and operations (Cap, Tagged, run,
sinks) depend only on core and alloc. A downstream agent
crate declaring #![no_std] and depending on postern-guardrail
with default-features = false therefore has no link-level
access to std::println!, std::process::exit, network sockets,
or filesystem APIs, and the only side-effect channels available
are those reached through the sanctioned sinks.
Residual. Inside a map closure body the agent has
temporary access to a value of type ; Rust does not bound
what that body may do with the value. Three classes of
side-channel survive: panic! with formatted strings, timing
observed by the host, and stash in thread-local storage when
. We mitigate the obvious thread-move attack by
making both Cap and Tagged !Send + !Sync via the
*const () phantom, but a determined adversary inside the agent
runtime is out of scope for the present construction. A
genuinely tight analog of capture-checking in Rust appears to
require either a custom lint over closure bodies or a Wasm-class
sandbox; both directions are discussed in §6.
Formal model
The development is mechanised in verifier/lean/Postern.lean;
the per-theorem axiom set is reported by CheckAxioms.lean. We
write for the rewriter
applied to catalog , policy , principal , and
plan . The output type is ;
denotes explicit refusal.
Theorem 1 (output-column soundness, rewrite_sound). For
every , if
,
then for every column ,
.
Theorem 2 (filter-predicate soundness, rewrite_filter_sound).
Under the same hypothesis, every column read by a
predicate inside is also in
. Theorems 1 and 2
together rule out the side-channel in which a principal lacking
read access to column uses it as a row selector without
projecting it.
Theorems 3 and 4 (rewrite_schema_subset,
rewrite_no_new_columns). The output schema is contained in
the input schema. Equivalently, implies
.
Theorem 5 (idempotence, rewrite_idempotent). If
and
,
then as sets. The rewriter is a
closure operator on schemas.
Theorem 6 (monotonicity in the policy, rewrite_monotone).
If
for every and , then the output schema under is
contained in the output schema under . Strengthening the
policy can only widen the released set.
Theorem 7 (touched-relation preservation, rewrite_touched).
.
Theorem 8 (refusal under unknown relation,
rewrite_refuses_unknown).
implies .
A relation absent from the catalog is rejected explicitly rather
than reduced to an empty output schema — relevant under catalog
drift, where the physical store may diverge from the catalog.
Theorem 9 (refusal under forbidden filter,
rewrite_refuses_forbidden_filter). If
and
, then
. The
contrapositive of Theorem 2.
CheckAxioms.lean audits the axiom dependencies of each theorem.
For the rewriter side (Theorems 1–9) the set is bounded by
, Lean~4's
foundational axioms; the proofs of rewrite_touched and
rewrite_refuses_unknown depend on no axioms, and no proof
uses sorry.
Datalog evaluator (verifier/lean/Datalog.lean). The policy
language is mechanised independently. Eight supporting
list-membership lemmas (step_extensive, allMatches_subset_facts,
the joint step_subset, iterate_succ_extensive, iterate_subset_le,
iterate_subset_program) are proved without sorry. Four
specialisation lemmas (step_no_rules, iterate_no_rules,
eval_no_rules, eval_fact_mem) cover the rule-free regime
the financial-institution scenario uses today; they give an
unconditional soundness direction for ground-fact policies and
depend only on propext. The headline eval_monotone is
proved modulo one isolated combinatorial obligation,
herbrandBound_mono (a length-after-eraseDups arithmetic
argument with no semantic content). Two further meta-theorems,
eval_sound and eval_terminates, are stated with sorryAx
obligations named explicitly in the audit so the residual proof
surface is visible at CI time. The corresponding open problems
are listed in §6.
Implementation and conformance testing
The Rust implementation in prototype/crates/postern-core mirrors
the Lean types and the function structurally.
The conformance harness postern-diff consumes a JSON corpus
emitted by lake exe postern-corpus and asserts three equalities
per case: that the Rust outcome kind ( /
) matches the Lean reference; that, on accept,
the rewritten plan, output schema, predicate read-set, and
touched relation are structurally equal to the Lean reference;
and that the input plan's matches the Lean
auxiliary, independent of the rewriter.
JSON-corpus conformance is preferred to Lean-to-Rust extraction on the grounds that the corpus interface is stable across compiler-version churn in both languages and that divergence manifests as a CI failure rather than a build failure.
Datalog backend. The gateway is designed to evaluate
policies through biscuit-auth's public biscuit_auth::datalog
module — World::new(), add_fact, add_rule, run,
query_match — which is the same evaluator used inside the
production token-verification surface, only without the
token-handling layers we put out of scope (§6). The in-tree
postern-core migration to this backend is in progress; the
column-grant DSL serves as a byte-equivalent stand-in until the
migration lands. A second conformance corpus, exercising Lean
eval against biscuit_auth::datalog::World directly, is the
natural pair for the rewriter corpus and is queued as the next
work item.
The corpus comprises 18 cases: seven behavioural cases drawn from the financial-institution scenario of §5, four refusal regressions for known attack shapes (filter-on-forbidden-column, unknown-relation, two nested forbidden-filter variants), and seven policy-language edge cases (empty policy, duplicate grants, catalog-absent columns, case-sensitive principal, trailing-whitespace principal, nonexistent project column, nested narrowing). All eighteen pass on the current Rust implementation.
Evaluation: a financial institution with three principals
We evaluate the artifact on the Kaggle
transactions-fraud-datasets schema. The policy is reproduced
in scenarios/financial-institution/policy.postern; the
principal cases are summarised below.
| principal | plan | outcome | rewritten schema |
|---|---|---|---|
CRM |
Scan users_data |
accept | id, name, region, age |
CRM |
Project [ssn,email] over above |
accept | ∅ (over-projection collapses) |
CRM |
Filter on ssn |
refuse | — |
CardOps |
Scan users_data (cross-dept) |
accept | ∅ (no matching grant) |
FraudRisk |
Scan users_data |
accept | id, region (minimum-necessary) |
Marketing |
Scan users_data (unknown prin.) |
accept | ∅ (empty allow) |
CRM |
Scan credit_bureau_imports |
refuse | — (unknown relation) |
Each row corresponds to a corpus case in the conformance harness; the rows annotated exercise Theorems 8 and 9 of §4.
Related work
We are not aware of prior work that establishes a mechanised soundness theorem for a plan-level rewriter in an LLM-agent- facing lakehouse setting. The closest landmarks fall into four groups.
Verified authorization decision procedures. Cedar 5 formalises and proves the soundness of an authorization-decision function in Lean. The axis of verification differs from ours: Cedar establishes $\mathrm{authorize}(\mathit{request}) \in {\mathit{allow}, \mathit{deny}}$ correctly classifies a per-call request, whereas we establish that the output of a plan transformation is contained in the policy-allowed set. The two are complementary; we adopt the Cedar style of Lean-mechanised denotational semantics for our policy.
Capability-based enforcement runtimes. SEAL 8
provides capability-based access control for analytic
workloads at the runtime level; the policy core is not
mechanically verified. Our development is the dual: the policy
core is verified, and the runtime is correspondingly lighter.
Biscuit 6 is both the deployed capability-token
distribution mechanism we assume on the front end and the
policy-language layer we mechanise: the Horn fragment of its
Datalog dialect underpins our policy semantics, and the
production-grade biscuit-auth Rust crate is the gateway's
runtime evaluator (biscuit_auth::datalog::World). Block
attenuation, audience, expiry, and key rotation are explicit
out-of-scope items (§6); the column-grant surface syntax
compiles to ground right(principal, relation, column) Datalog
facts in the in-scope fragment.
Information-flow control for database-backed applications. Jeeves 9, Jacqueline 10, and the faceted-execution line 11 enforce IFC inside ORM-backed applications using a faceted-value runtime discipline. They predate the LLM-agent threat model and do not target the lakehouse setting; we view them as the closest PL-side relatives of Layer 2 of our development.
Defences for LLM-agent prompt injection. AgentDojo 12 and CaMeL 13 develop capability-flow defences at the agent boundary. Our development is complementary: the rewriter of §3–§4 enforces a policy at the lake-facing boundary on plans; their constructions enforce analogous properties on the agent's own emitted code. Closest to our Layer 2 specifically, Odersky et al. 7 propose Scala~3 capture-checking as the type-level mechanism for tracking capabilities through agent code; we adapt the same intuition under Rust's weaker type-system commitments (§3).
Deployed alternatives we improve upon. PostgreSQL row security policies 3 and equivalent CLS facilities require per-engine integration and do not compose across the heterogeneous ingest paths typical of lakehouse deployments. Open Policy Agent 14 is general-purpose but does not reason about query outputs at the plan level. Tenant segregation forfeits cross-source analytics and is the silo case we discuss in §1.
Open challenges and future work
Three extensions of the Lean development are the natural next research questions.
Cross-relation joins. The Plan IR is single-relation. The Rust implementation handles joins by per-leg rewriting but the composition is not under proof. The conjecture is a theorem of the form
on a Plan IR extended with a constructor. The join-key leak — joining on a column without projecting it, which leaks 's value distribution — mirrors the filter side-channel and admits the analogous coverage condition.
Aggregation with a differential-privacy boundary. A principal may be permitted to read without permission to read individual rows. The rewriter extension here is straightforward in shape but admits a non-trivial soundness statement once the differential-privacy boundary is parameterised; SEAL 8 and the faceted line 11 are the closest reference points.
Capability attenuation inside the proof. The Lean development takes as a flat string and assumes the gateway has already verified the bearer of a biscuit token. Modelling biscuit's Datalog-based attenuation, expiry, and audience checks inside the Lean proof lifts the principal- extraction row out of the trusted base — a substantial strengthening of the artifact's overall claim. Adjacent open problems include catalog-integrity attestation and plan- integrity in transit.
Reproducibility
verifier/lean/ Lean 4 spec + theorems + corpus emitter
prototype/ Rust workspace: postern-core, postern-diff
scenarios/ Financial-institution case study
paper/ This document
scripts/ reproduce.sh — chains everything
Toolchains: Lean 4.29.1 (pinned in verifier/lean/lean-toolchain),
Rust stable (tested 1.93). Single command:
scripts/reproduce.sh
Expected output ends with
18/18 cases pass (Lean reference == Rust impl) and an axiom
audit showing only propext and Quot.sound. Runs in under
two minutes on an M-series Mac on a warm cache.
References
::: {#refs} :::