The most powerful watermark Oversight ships is also the one that changes the document. Layer 3 (L3) encodes the recipient identifier into the wording of the prose: a synonym swap here, an Oxford comma there, a British or American spelling chosen by the deterministic expansion of a 64-bit mark. L3 survives the attacks that defeat L1 and L2, including screenshot-and-OCR, aggressive reformatting, and manual retyping, because it lives in choices the reader does not perceive as choices at all.
That is exactly why L3 is dangerous. For most prose, a substituted synonym is invisible. For a contract, a regulation, a technical specification, a piece of source code, or a log file, the exact wording is the document. If L3 rewrites a clause, a defined term, or a keyword, the recipient no longer has the document they were meant to receive. This post documents how v0.4.5 makes L3 safe by default, why the acknowledgement flow is non-optional, the collusion limit I refuse to paper over, and the Tkinter GUI starter that begins the path from CLI tool to broadly usable product.
The wording-sensitive class problem
An L3 synonym rotation treats text as a sequence of word choices governed by 151 synonym classes plus sublayers for punctuation, spelling, contractions, and number formatting. The embedding function is deterministic: given a mark ID and a position, the chosen variant is fixed. Two recipients with different mark IDs receive different wordings of the same semantic content. That is the source of L3's attribution power, and it is also the source of its risk.
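A minimal Python sketch of mark-keyed synonym selection shows what "deterministic" means here. It is not Oversight's embedder. The toy synonym table, the names choose_variant and embed, and the use of HMAC-SHA256 over the slot position are all assumptions made for illustration. The sketch only demonstrates the property this paragraph relies on: a fixed mark ID and position always yield the same variant.

```python
from __future__ import annotations

import hashlib
import hmac

# Hypothetical toy synonym classes. Oversight's real tables (151 classes plus
# punctuation, spelling, contraction, and number sublayers) are not reproduced here.
CLASSES = [
    ("commence", "begin", "start"),
    ("large", "big", "sizable"),
    ("colour", "color"),
]
# Map every variant to its class so that any member of a class is recognized.
WORD_TO_CLASS = {word: cls for cls in CLASSES for word in cls}


def choose_variant(mark_id: int, position: int, variants: tuple[str, ...]) -> str:
    """Deterministically pick one variant for this (mark_id, position) slot."""
    key = mark_id.to_bytes(8, "big")  # the 64-bit mark keys the choice
    digest = hmac.new(key, position.to_bytes(8, "big"), hashlib.sha256).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def embed(words: list[str], mark_id: int) -> list[str]:
    """Rewrite each word that belongs to a synonym class and leave the rest alone."""
    out = []
    for position, word in enumerate(words):
        cls = WORD_TO_CLASS.get(word.lower())  # case preservation omitted for brevity
        out.append(choose_variant(mark_id, position, cls) if cls else word)
    return out
```

Calling embed twice with the same mark ID returns identical output. A different mark ID will generally select different variants at some slots, and that difference is what lets two recipients' copies be told apart.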
Consider a contract that says "the Licensee shall commence performance within thirty (30) days." L3 might rewrite commence as begin or start, each a valid synonym at the level of surface wording. In ordinary prose that is a rounding error. In a contract, a regulatory filing, or a standards document written under RFC 2119 and BCP 14 discipline, that word is the hinge the obligation turns on. In source code or SQL, the substitution may not even parse. In a log or a structured-data payload, it silently breaks downstream consumers.
The honest answer is that L3 must not be applied to these documents by default. v0.4.5
introduces a document-class-aware policy in oversight_core/l3_policy.py that
defaults L3 off for legal, regulatory, technical-spec, source-code, SQL, log, and
structured-data inputs. L1 and L2 remain on, because they do not change visible bytes,
and the signed manifest still binds the sealed content to the recipient. What is lost is
the paraphrase-resistant attribution that only L3 provides. What is gained is that the
recipient's copy is the canonical source, byte for byte.
New seal flags and the acknowledgement flow
The seal CLI (and the Rich CLI) now takes three new flags:
--l3-mode: one of auto, off, full, or boilerplate.
--l3-ack: the explicit acknowledgement that L3 will produce a recipient copy whose text is non-identical to the canonical source.
--document-class: one of the recognized classes (prose, legal, regulatory, technical, source_code, sql, log, structured).
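A hypothetical argparse reconstruction of these three flags shows how they combine on one command line. The flag names and choices come from the list above. The parser name, the positional input argument, and the defaults (auto for --l3-mode, no value for --document-class so that detection takes over) are assumptions based on the surrounding text, not the actual oversight CLI source.

```python
import argparse

L3_MODES = ("auto", "off", "full", "boilerplate")
DOCUMENT_CLASSES = ("prose", "legal", "regulatory", "technical",
                    "source_code", "sql", "log", "structured")


def build_seal_parser() -> argparse.ArgumentParser:
    """Hypothetical seal parser covering only the three L3-related flags."""
    parser = argparse.ArgumentParser(prog="oversight seal")
    parser.add_argument("input", help="file to seal (assumed positional)")
    parser.add_argument("--l3-mode", choices=L3_MODES, default="auto")
    parser.add_argument(
        "--l3-ack",
        action="store_true",
        help="acknowledge that the recipient copy will not match the canonical source",
    )
    parser.add_argument(
        "--document-class",
        choices=DOCUMENT_CLASSES,
        default=None,
        help="omit to let the policy detect the class from content and extension",
    )
    return parser
```

With no flags, this parser yields l3_mode="auto", l3_ack=False, and document_class=None, which leaves the decision to the policy layer.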
auto is the default. It inspects the document class (either provided by the
user or detected from content and file extension) and picks the safest mode for that
class. For prose and marketing content, auto enables full L3. For contracts
and specs, auto enables boilerplate mode, which marks only the
header, footer, and cover-page regions where synonym rotation is safe. For source code
and logs, auto disables L3 entirely.
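The auto resolution described above amounts to a lookup from document class to mode. The sketch below assumes a detector that uses only the file extension, whereas the real one also inspects content. The helper names, the extension table, and the placement of regulatory, sql, and structured are assumptions drawn from the default-off list earlier in the post. Unknown classes fall back to off, so failures stay on the safe side.

```python
from __future__ import annotations

from pathlib import Path

# Assumed extension hints; the real detector also looks at content.
EXTENSION_HINTS = {
    ".py": "source_code", ".rs": "source_code", ".js": "source_code",
    ".sql": "sql", ".log": "log",
    ".json": "structured", ".csv": "structured", ".yaml": "structured",
    ".md": "prose",
}

AUTO_MODE = {
    "prose": "full",
    "legal": "boilerplate",       # contracts
    "technical": "boilerplate",   # specs
    "regulatory": "boilerplate",  # assumption: handled like contracts
    "source_code": "off",
    "log": "off",
    "sql": "off",                 # assumption: per the default-off list
    "structured": "off",          # assumption: per the default-off list
}


def detect_class(path: str) -> str | None:
    """Guess a document class from the file extension, or None if unknown."""
    return EXTENSION_HINTS.get(Path(path).suffix.lower())


def resolve_l3_mode(requested: str, document_class: str | None,
                    path: str) -> tuple[str, str | None]:
    """Return (effective_mode, document_class) for a seal request."""
    doc_class = document_class or detect_class(path)
    if requested != "auto":
        return requested, doc_class  # explicit modes are honored as given
    # Unknown or undetected classes resolve to "off": never rewrite by accident.
    return AUTO_MODE.get(doc_class, "off"), doc_class
```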
Whenever L3 would rewrite visible text, in either full or
boilerplate mode, the CLI refuses to seal without --l3-ack.
The acknowledgement is not a dialog box that you dismiss without reading. It is a
structural gate: the seal path raises an error, prints the non-identity warning, and
exits. You pass --l3-ack only when you have read the warning and accepted
that the recipient is getting a non-identical copy. That decision is recorded in the
manifest under l3_policy, alongside the selected mode and document class,
so a later auditor can see that the non-identity was disclosed at seal time.
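The gate can be sketched as a function that either refuses or returns the l3_policy record destined for the manifest. The exception class, the key names inside l3_policy, and exit status 2 are assumptions made for illustration. Only the overall behavior comes from the post: refuse without --l3-ack whenever L3 would rewrite text, and record the mode, document class, and acknowledgement.

```python
from __future__ import annotations

import sys

NON_IDENTITY_WARNING = (
    "L3 will rewrite visible text, so the recipient copy will NOT be identical "
    "to the canonical source. Re-run with --l3-ack to accept this."
)


class L3AckRequired(Exception):
    """Hypothetical error raised when L3 would rewrite text without acknowledgement."""


def check_l3_ack(mode: str, document_class: str | None, ack: bool) -> dict:
    """Refuse unacknowledged rewrites; otherwise return the l3_policy record."""
    if mode in ("full", "boilerplate") and not ack:
        raise L3AckRequired(NON_IDENTITY_WARNING)
    # Recorded in the manifest so a later auditor can see what was disclosed.
    return {"mode": mode, "document_class": document_class, "ack": ack}


def seal_cli_gate(mode: str, document_class: str | None, ack: bool) -> dict:
    """CLI wrapper: print the warning and exit instead of sealing."""
    try:
        return check_l3_ack(mode, document_class, ack)
    except L3AckRequired as exc:
        print(f"error: {exc}", file=sys.stderr)
        sys.exit(2)  # assumed exit status
```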
The high-level API follows the same discipline. watermark.apply_all() and
formats.text.apply() now make L3 opt-in. Library users have to pass
l3_mode explicitly; the default path returns L1+L2 watermarked output and
never rewrites words. Upgrading to v0.4.5 will not silently turn L3 on for anyone.
canonical_content_hash as a dispute anchor
Even with every safety gate in place, an L3-sealed copy is textually non-identical to the
canonical source. If a dispute later turns on whether the recipient received the exact
text of the contract, the answer has to be demonstrable. v0.4.5 adds
canonical_content_hash to the manifest: a SHA-256 hash of the original
source bytes, computed before any watermarking. That bounds the non-identity: the
recipient can confirm that their L3-modified copy is tied to a specific canonical source
hash both parties agreed to, and a neutral auditor can compare the sealed source archive
against that hash. The manifest also records the full l3_policy
(mode, document class, ack) so the dispute-resolution posture at seal time is part of
the signed record.
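Computing and checking this anchor needs nothing beyond SHA-256. In the sketch below, the helper names and the plain-dictionary manifest are hypothetical, and the signature that binds the manifest to the issuer is omitted. The point is that the hash is taken over the original bytes before watermarking, so an auditor can check an archived source without trusting either party's copy.

```python
import hashlib


def canonical_content_hash(source: bytes) -> str:
    """SHA-256 of the original bytes, taken before any watermark layer runs."""
    return hashlib.sha256(source).hexdigest()


def build_manifest(source: bytes, l3_policy: dict) -> dict:
    """Sketch of the two fields discussed here; the real manifest is signed and carries more."""
    return {
        "canonical_content_hash": canonical_content_hash(source),
        "l3_policy": l3_policy,
    }


def auditor_check(manifest: dict, archived_source: bytes) -> bool:
    """Does the archived canonical source match the hash recorded at seal time?"""
    return manifest["canonical_content_hash"] == canonical_content_hash(archived_source)
```

The hash covers the canonical source, not the recipient copy, so an L3-modified copy will never match it. That is by design.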
The collusion limit
L3 attribution rests on a uniqueness assumption: the mark ID determines the wording, so two recipient copies differ at the positions where their mark IDs select different variants. If N recipients collude and diff their copies, every position where the copies disagree exposes a controlled-vocabulary slot, and the more colluders there are, the fewer slots stay hidden. They can then canonicalize those slots, replacing each variant with a single canonical choice, before leaking. The leaked text still carries the original semantic content, but the attribution signal is destroyed. L3 cannot defend against this on its own.
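The collusion attack is simple enough to write down, which is part of why it deserves a plain warning. This sketch assumes L3 substitutions are one word for one word, so every copy tokenizes to the same length. The function names and the alphabetical-first canonicalization rule are illustrative choices, not a description of any observed attack tooling.

```python
from __future__ import annotations


def find_divergent_positions(copies: list[list[str]]) -> list[int]:
    """Token positions where at least two colluding copies disagree."""
    # Assumes word-for-word swaps, so all copies have the same token count.
    return [i for i, column in enumerate(zip(*copies)) if len(set(column)) > 1]


def canonicalize(copies: list[list[str]]) -> list[str]:
    """Replace every divergent slot with one fixed choice (here: alphabetical first)."""
    leaked = list(copies[0])
    for i in find_divergent_positions(copies):
        leaked[i] = min(copy[i] for copy in copies)
    return leaked


alice = "we will begin the large project".split()
bob = "we will start the big project".split()
carol = "we will commence the big project".split()
print(find_divergent_positions([alice, bob, carol]))  # [2, 4]
print(" ".join(canonicalize([alice, bob, carol])))    # we will begin the big project
```

The canonicalized wording matches none of the three recipients. A slot where every colluder happened to receive the same variant stays invisible to the diff, which is why coverage grows with the size of the coalition.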
Pretending otherwise would be dishonest, so v0.4.5's docs/security.md states
the limit plainly. Issuers should treat L3 as attribution evidence against ordinary leaks
and low-to-medium effort stripping, not as a collusion-resistant watermark. Mitigations
under evaluation include per-recipient vocabulary randomization, stronger candidate
scoring that models collusion edits, and warnings or thresholds for large recipient sets
before L3 is enabled. None of those ship as part of v0.4.5. What ships is the honest
labeling and a specific audit trail (canonical_content_hash,
l3_policy) that lets a later investigation reconstruct what was claimed at
seal time.
The GUI starter
The roadmap correction for v0.4.5 is blunt: I will not launch Oversight broadly on HN or
Reddit while the only interface is a CLI. A provenance protocol whose non-technical
recipients cannot open a sealed file is a protocol they will not use. v0.4.5 begins the
migration with a Tkinter desktop starter app, invoked with oversight gui,
that provides the three first-contact flows a recipient or issuer needs: generate a
keypair, seal a file to a recipient, and open a sealed file. The app is built on Tkinter,
which ships with the Python standard library, so there is no GUI-specific dependency to
install on top of the existing wheel.
The Tkinter starter is not the launch surface. It is a stepping stone. The public launch sequence is: L3 safety (shipped); the GUI, web viewer, and drag-and-drop share workflow; an Outlook add-in; one regulated-industry design-partner deployment, with SOC 2 Type 1 scoping in parallel; and only then a broad public launch. Drive, Box, SharePoint, and Teams plugins are deferred until a maintainer or design partner is paying for them. FedRAMP is dropped from near-term planning: it is a multi-year program that requires sponsor-agency backing, and Oversight has not earned that yet.
GUI hardening after fuzz testing
A follow-up GUI fuzz pass found that the first Tkinter build was too trusting about
local output paths. The patch now blocks seal/open outputs that point at selected input
files, refuses to overwrite Oversight private-key JSON, rejects Windows device names,
normalizes malformed key and manifest JSON into clean user-facing errors, writes keys
atomically, and rejects containers with a tampered suite byte or trailing bytes after the ciphertext.
The GUI beacon path also now derives its domain from the configured registry URL and
binds beacons to the actual manifest file_id.
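Several of these checks are easy to picture in code. The sketch below is not the patched GUI code. The function names, the device-name rule, and the assumption that an Oversight private-key file is a JSON object with a private_key field are all illustrative. It covers three of the listed fixes (input/output collision, private-key overwrite, Windows device names), plus an atomic write and deriving the beacon domain from a registry URL.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Reserved DOS device names; Windows treats "NUL.txt" like "NUL" as well.
WINDOWS_DEVICE_NAMES = {"CON", "PRN", "AUX", "NUL",
                        *(f"COM{i}" for i in range(1, 10)),
                        *(f"LPT{i}" for i in range(1, 10))}


class UnsafeOutputPath(ValueError):
    """Hypothetical error surfaced to the user as a clean message."""


def looks_like_private_key(path: Path) -> bool:
    # Assumption: key files are JSON objects carrying a "private_key" field.
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable, not UTF-8, or not JSON
        return False
    return isinstance(data, dict) and "private_key" in data


def validate_output_path(output: str, inputs: list[str]) -> Path:
    out = Path(output)
    if out.name.split(".")[0].upper() in WINDOWS_DEVICE_NAMES:
        raise UnsafeOutputPath(f"{out.name!r} is a reserved Windows device name")
    resolved = out.resolve()
    if any(resolved == Path(p).resolve() for p in inputs):
        raise UnsafeOutputPath("output would overwrite a selected input file")
    if resolved.exists() and looks_like_private_key(resolved):
        raise UnsafeOutputPath("refusing to overwrite an Oversight private key")
    return resolved


def write_atomically(path: Path, data: bytes) -> None:
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash never leaves a half-written key or sealed file behind.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def beacon_domain(registry_url: str) -> str:
    # Derive the beacon domain from the configured registry, not a hard-coded host.
    host = urlparse(registry_url).hostname
    if not host:
        raise ValueError(f"registry URL has no host: {registry_url!r}")
    return host
```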
Dependency floors after a Dependabot follow-up
v0.4.5 also raises PyPI and Rust dependency floors after a Dependabot and advisory
follow-up pass. The Python minimums are setuptools>=78.1.1,
cryptography>=46.0.7, PyNaCl>=1.6.2,
pydantic>=2.4.0, python-multipart>=0.0.26,
Pillow>=12.2.0, and pypdf>=6.10.2. The Rust
manifest floors now include patched minima for sqlx, tokio,
rand_core, zip, chrono, regex,
once_cell, and tracing-subscriber. Local
pip-audit -r requirements.txt is clean, and OSV lower-bound checks are
clean for the declared Python and Rust floors.
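A few lines of standard-library code give a quick local check that an environment meets the Python floors. This is a hypothetical helper, not part of Oversight. The floors are copied from the list above, and the version comparison is deliberately naive: it reads only the numeric release segment, so packaging.version is the better tool when pre-release or post-release tags matter. The helper complements pip-audit rather than replacing it.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Python floors declared for v0.4.5.
PYTHON_FLOORS = {
    "setuptools": "78.1.1",
    "cryptography": "46.0.7",
    "PyNaCl": "1.6.2",
    "pydantic": "2.4.0",
    "python-multipart": "0.0.26",
    "Pillow": "12.2.0",
    "pypdf": "6.10.2",
}


def release_tuple(v: str) -> tuple[int, ...]:
    # Naive: keeps only the leading numeric release ("12.2.0rc1" -> (12, 2, 0)).
    match = re.match(r"\d+(?:\.\d+)*", v)
    if not match:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_floor(installed: str, floor: str) -> bool:
    a, b = release_tuple(installed), release_tuple(floor)
    width = max(len(a), len(b))
    # Pad with zeros so "46.1" compares like "46.1.0".
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def below_floor() -> dict[str, str]:
    """Installed packages whose version is below the declared floor."""
    problems = {}
    for name, floor in PYTHON_FLOORS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            continue  # not installed in this environment
        if not meets_floor(installed, floor):
            problems[name] = f"{installed} < {floor}"
    return problems
```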
What v0.4.5 is not
v0.4.5 does not solve collusion. It does not ship the web viewer. It does not replace the CLI; the Tkinter GUI is a starter, not the launch surface. It does not add a cloud vendor or a hosted service. This is a safety release: a set of opinions about which documents L3 should never touch, a disclosure gate for the ones where it might, a hash in the manifest that lets disputes reach ground truth, and a first step toward an interface that a non-technical recipient can actually use. The next time someone asks whether a semantic watermark can be applied to their contract, the answer is "not by default, and only if you acknowledge what that means." That answer is the point of this release.