Pre-publication research artifact

Postern: a Lean-verified access gateway for agentic data lakehouses

Plan-level rewriting against a column-grant policy. The Lean rewriter is sorry-free; a Rust mirror runs the same logic; a conformance harness keeps them byte-equal on a hand-curated JSON corpus. The Datalog evaluator and Biscuit-Datalog dispatch are in progress.

The shape

Agents reading the lakehouse aren't humans on stable roles — they're short-lived and contextual, and the same agent code may legitimately serve different principals under different tasks. RBAC's static identity → role → permission chain encodes none of that; per-engine RLS doesn't survive ETL; tenant segregation forfeits the cross-source joins. Policy has to live at the query plane, gated under the agent's task-scoped identity at query time.

The rewriter

rewrite : Catalog → Policy → Principal → Plan → Option Plan — projects the output schema to the policy-allowed columns under the querying principal and refuses plans whose filter predicates touch forbidden columns. Mechanised in Lean 4, mirrored in Rust, validated against a JSON corpus.

What the proof covers

  • proved rewrite_sound — output columns ⊆ policy-allowed
  • proved rewrite_filter_sound — predicate columns ⊆ policy-allowed
  • partial eval_monotone — proof reduces to herbrandBound_mono (one labelled sorry)
  • stated eval_sound — every derived atom traces to a fact or rule
  • stated eval_terminates — iterating past the bound is a no-op

The Datalog evaluator (verifier/lean/Datalog.lean) is a work in progress. The plan rewriter (verifier/lean/Postern.lean) is sorry-free with axioms bounded by {propext, Quot.sound}.

How the parts fit

Lean defines the reference and proves the soundness claims; postern-core mirrors the rewriter in Rust; postern-diff diff-tests Rust ↔ Lean output on a JSON corpus (18/18 cases pass); postern-guardrail provides invariant-branded capability tokens around the rewriter's outputs.

Where things live