CLI Reference
Complete reference for the oversight command-line tool
Zion Boggan · April 2026 · Oversight Protocol v0.4.5
The oversight CLI provides six subcommands: keygen for
identity creation, seal for encrypting and watermarking documents,
open for decryption with policy enforcement, inspect for
manifest examination without decryption, attribute for forensic
watermark recovery from leaked text, and gui for the Tkinter desktop
starter app. The CLI is a thin wrapper around the oversight_core
Python library; all cryptographic and watermarking logic lives in the library layer.
keygen
Generate a new Oversight identity containing an X25519 keypair (encryption) and an
Ed25519 keypair (signing). The command writes two files: a private identity file and
a public-only sibling with the .pub.json suffix.
Flags
| Flag | Required | Description |
|---|---|---|
| --out PATH | Yes | Output path for the private identity file (JSON). A public sibling is written alongside it. |
| --id NAME | No | Human-readable identifier stored in the identity file. Defaults to "identity". |
Output Files
The private identity file contains x25519_priv, x25519_pub,
ed25519_priv, and ed25519_pub as hex strings, plus the
id field. The public file (.pub.json) contains only
id, x25519_pub, and ed25519_pub. Distribute
the public file to issuers; keep the private file secure.
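The relationship between the two files can be sketched in a few lines of Python. This is an illustration based only on the field names listed above, not the library's own code; the helper names public_view and public_path are hypothetical, and the key values are placeholders rather than real key material.

```python
import json

# Fields the text above says appear in the .pub.json sibling.
PUBLIC_FIELDS = ("id", "x25519_pub", "ed25519_pub")


def public_view(identity: dict) -> dict:
    """Return only the fields that are safe to hand to issuers."""
    return {name: identity[name] for name in PUBLIC_FIELDS}


def public_path(private_path: str) -> str:
    """alice.json -> alice.pub.json (naming taken from the keygen example)."""
    if private_path.endswith(".json"):
        private_path = private_path[: -len(".json")]
    return private_path + ".pub.json"


# Placeholder hex strings, not real keys.
identity = {
    "id": "alice@example.com",
    "x25519_priv": "00" * 32,
    "x25519_pub": "11" * 32,
    "ed25519_priv": "22" * 32,
    "ed25519_pub": "33" * 32,
}

pub = public_view(identity)
print(public_path("alice.json"))
print(json.dumps(pub, indent=2))
```

A quick check like this is also a cheap way to confirm that a file about to be shared contains no *_priv fields.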
Example
oversight keygen --out alice.json --id alice@example.com
# Output:
# [+] wrote private identity to alice.json
# [+] wrote public identity to alice.pub.json
seal
Produce a .sealed file for a specific recipient. The seal operation
optionally watermarks the plaintext (L1, L2, L3), computes a content fingerprint,
signs the manifest with the issuer's Ed25519 key, encrypts the content with
XChaCha20-Poly1305, and wraps the DEK to the recipient's X25519 public key.
Flags
| Flag | Required | Description |
|---|---|---|
| INPUT | Yes | Positional argument: path to the plaintext file to seal. |
| --recipient-pub PATH | Yes | Path to the recipient's public identity file (.pub.json). |
| --issuer-id ID | Yes | Issuer identifier string stored in the manifest. |
| --issuer-key PATH | Yes | Path to the issuer's private identity file (JSON). |
| --registry-url URL | Yes | Registry URL stored in the manifest policy field. |
| --out PATH | Yes | Output path for the .sealed file. |
| --watermark | No | Enable text watermarking. Requires UTF-8 input. Applies L1 and L2 by default; L3 is governed by --l3-mode and defaults to safe behavior for wording-sensitive document classes. |
| --l3-mode MODE | No | One of auto (default), off, full, or boilerplate. auto picks the safest mode for the document class; boilerplate marks only header, footer, and cover-page regions; full marks body text. |
| --l3-ack | Conditional | Required when L3 would rewrite body text (full or boilerplate modes that hit body regions). Explicit acknowledgement that the recipient copy is textually non-identical to the canonical source. Recorded in the manifest under l3_policy. |
| --document-class CLASS | No | One of prose, legal, regulatory, technical, source_code, sql, log, or structured. Controls the auto L3 policy. If omitted, the class is inferred from content and file extension. |
| --registry-domain DOMAIN | No | Domain for DNS beacon generation. Defaults to "oversight.example". |
| --content-type MIME | No | MIME type stored in the manifest. Defaults to "application/octet-stream". |
| --register URL | No | If provided, POST the manifest, watermarks, beacons, and fingerprint to this registry URL after sealing. |
Watermark Behavior
When --watermark is specified and the input is valid UTF-8, the CLI
generates a single 64-bit mark_id. L1 (zero-width Unicode) and L2
(trailing whitespace) are applied by default because they do not change the visible text.
L3 (synonym rotation, punctuation, spelling, contractions, number formatting) is
governed by --l3-mode.
Under --l3-mode auto (the default), v0.4.5 picks the safest mode for the
document class: full for prose, boilerplate for legal and
regulatory, and off for technical specifications, source code, SQL, logs,
and structured data. boilerplate marks only header, footer, and cover-page
style regions. Whenever L3 would rewrite body text, --l3-ack is required
and the manifest records the acknowledgement, mode, and document class under
l3_policy. The manifest also records canonical_content_hash,
a SHA-256 of the pre-watermark source bytes, so disputes can verify the recipient copy
against the canonical source.
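The auto policy described above amounts to a lookup from document class to L3 mode. The sketch below restates that mapping and the canonical hash in Python. The names AUTO_L3_MODE, resolve_l3_mode, and canonical_content_hash (as a function) are hypothetical, and the hex encoding of the hash is an assumption; the source only says the manifest stores a SHA-256 of the pre-watermark bytes.

```python
import hashlib

# Mapping taken from the prose above; the dict itself is illustrative.
AUTO_L3_MODE = {
    "prose": "full",
    "legal": "boilerplate",
    "regulatory": "boilerplate",
    "technical": "off",
    "source_code": "off",
    "sql": "off",
    "log": "off",
    "structured": "off",
}


def resolve_l3_mode(document_class: str, requested: str = "auto") -> str:
    """An explicit --l3-mode wins; auto defers to the document class."""
    if requested != "auto":
        return requested
    return AUTO_L3_MODE[document_class]


def canonical_content_hash(source: bytes) -> str:
    """SHA-256 of the pre-watermark bytes (hex encoding is assumed)."""
    return hashlib.sha256(source).hexdigest()


print(resolve_l3_mode("legal"))          # boilerplate
print(resolve_l3_mode("prose", "off"))   # off
print(canonical_content_hash(b"example source bytes"))
```

Hashing before any watermark layer runs is what lets a later dispute compare the recipient copy against the canonical source.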
If the input is not valid UTF-8 (for example, binary files), watermarking is skipped with a warning. The seal still proceeds without marks.
L3 Safety Examples
# The required flags --issuer-id, --issuer-key, --registry-url, and --out are omitted below for brevity.
# Legal contract: L3 off, body text is byte-identical to canonical source
oversight seal contract.txt --recipient-pub alice.pub.json --l3-mode off
# Legal contract with boilerplate-only marks, ack required
oversight seal contract.txt --recipient-pub alice.pub.json --l3-mode boilerplate --l3-ack --document-class legal
# Ordinary prose: full L3 with explicit ack
oversight seal memo.txt --recipient-pub alice.pub.json --l3-mode full --l3-ack --document-class prose
Fingerprint Output
For UTF-8 inputs, the CLI also computes a ContentFingerprint (winnowing +
sentence hashing) and writes it to OUTPUT.fingerprint.json alongside the
sealed file. This fingerprint is the server-side record used for attribution when all
embedded watermarks have been stripped (the defense against the VM-strip-export attack). If
--register is specified, the fingerprint is included in the registration
payload.
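For readers unfamiliar with the two techniques named above, here is a generic sketch of winnowing and sentence hashing. It is not the ContentFingerprint implementation: the k-gram length, window size, normalization, and hash truncation are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import re


def _h(text: str) -> int:
    """Stable 64-bit hash of a string (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout changes do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def winnow(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Classic winnowing: hash every k-gram, keep the minimum of each window."""
    normalized = _normalize(text)
    grams = [_h(normalized[i:i + k]) for i in range(len(normalized) - k + 1)]
    if not grams:
        return set()
    if len(grams) <= window:
        return {min(grams)}
    return {min(grams[i:i + window]) for i in range(len(grams) - window + 1)}


def sentence_hashes(text: str) -> set[int]:
    """Hash each normalized sentence; split on ., !, or ? followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return {_h(_normalize(s)) for s in sentences if s.strip()}


doc = "The quarterly report is final. Do not forward it!"
print(len(winnow(doc)), "winnow hashes,", len(sentence_hashes(doc)), "sentence hashes")
```

Because both sets are computed from normalized visible text, they survive the removal of invisible characters and trailing whitespace, which is the property the fingerprint relies on.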
Example
oversight seal report.txt \
--recipient-pub alice.pub.json \
--issuer-id legal-dept \
--issuer-key issuer.json \
--registry-url https://reg.example.com \
--out report.sealed \
--watermark \
--l3-ack \
--register https://reg.example.com
# Output:
# [+] embedded L1 mark a1b2c3d4e5f60718
# [+] embedded L2 mark a1b2c3d4e5f60718
# [+] embedded L3 mark a1b2c3d4e5f60718 (semantic + punctuation)
# [+] content fingerprint: 142 winnow hashes, 38 sentence hashes
# [+] wrote report.sealed (14832 bytes)
# [+] file_id=9f3a...
# [+] recipient=alice@example.com
# [+] beacons=4 watermarks=3
# [+] wrote fingerprint to report.fingerprint.json
# [+] registered with https://reg.example.com: {"status": "ok"}
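When sealing from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience wrapper, not part of the Oversight distribution; it uses only flags documented in the table above and leaves the L3 flags out for brevity.

```python
import shlex


def seal_argv(input_path, recipient_pub, issuer_id, issuer_key,
              registry_url, out_path, watermark=False, register_url=None):
    """Build the argv list for `oversight seal` from the documented flags."""
    argv = [
        "oversight", "seal", input_path,
        "--recipient-pub", recipient_pub,
        "--issuer-id", issuer_id,
        "--issuer-key", issuer_key,
        "--registry-url", registry_url,
        "--out", out_path,
    ]
    if watermark:
        argv.append("--watermark")
    if register_url is not None:
        argv += ["--register", register_url]
    return argv


argv = seal_argv(
    "report.txt", "alice.pub.json", "legal-dept", "issuer.json",
    "https://reg.example.com", "report.sealed", watermark=True,
)
print(shlex.join(argv))
# To execute it: subprocess.run(argv, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.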
open
Decrypt a .sealed file using the recipient's private identity. The open
operation verifies the manifest signature, enforces policy constraints (time windows,
jurisdiction, max_opens), unwraps the DEK, decrypts the content, and performs a
post-decrypt content hash check.
Flags
| Flag | Required | Description |
|---|---|---|
| INPUT | Yes | Positional argument: path to the .sealed file. |
| --identity PATH | Yes | Path to the recipient's private identity file (JSON). |
| --out PATH | Yes | Output path for the decrypted plaintext. |
Policy Enforcement
Before decryption proceeds, the client checks the manifest's policy fields against
the current environment. If not_after is set and the system clock
is past it, decryption is refused. If max_opens is set, an atomic
check-and-bump operation (advisory file lock + atomic rename to prevent TOCTOU races)
enforces the limit. If a further open would exceed the maximum, decryption does not proceed
and the violation is logged. Jurisdiction checks use IP geolocation when configured.
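The check-and-bump pattern described above can be sketched as follows. This is a generic illustration of an advisory lock combined with an atomic rename, not the client's actual code: the counter file layout, the .lock sibling, and the function name check_and_bump are assumptions, and fcntl restricts the sketch to Unix-like systems.

```python
import fcntl
import json
import os
import tempfile


def check_and_bump(counter_path: str, max_opens: int) -> bool:
    """Return True and record one more open, or False if the limit is reached."""
    lock_path = counter_path + ".lock"
    with open(lock_path, "w") as lock:
        # Advisory lock: serializes cooperating processes around read-modify-write.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            try:
                with open(counter_path) as f:
                    count = json.load(f)["opens"]
            except FileNotFoundError:
                count = 0
            if count >= max_opens:
                return False
            # Write the new count to a temp file in the same directory, then
            # rename over the old one so readers never see a partial write.
            directory = os.path.dirname(counter_path) or "."
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w") as tmp:
                json.dump({"opens": count + 1}, tmp)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, counter_path)
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Holding the lock across both the read and the rename is what closes the time-of-check to time-of-use gap: two concurrent opens cannot both observe the same count.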
Example
oversight open report.sealed \
--identity alice.json \
--out report_decrypted.txt
# Output:
# [+] decrypted to report_decrypted.txt
# [+] file_id = 9f3a...
# [+] issuer = legal-dept
# [+] recipient = alice@example.com
# [+] marks = 3
# [+] beacons = 4
inspect
Dump the signed manifest from a .sealed file without decrypting
the content. Useful for auditing metadata (issuer, recipient, policy, watermark
references, beacons) without requiring the recipient's private key.
Flags
| Flag | Required | Description |
|---|---|---|
| INPUT | Yes | Positional argument: path to the .sealed file. |
Output
Prints the manifest as indented JSON to stdout, followed by a line indicating whether the Ed25519 signature is valid. The manifest includes all fields: file_id, issued_at, issuer_id, recipient binding, watermark references, beacon tokens, policy constraints, and algorithm suite.
Example
oversight inspect report.sealed
# Output:
# {
# "file_id": "9f3a...",
# "issued_at": 1745020800,
# "version": "OVERSIGHT-v1",
# "suite": "OSGT-CLASSIC-v1",
# "issuer_id": "legal-dept",
# "recipient": {
# "recipient_id": "alice@example.com",
# "x25519_pub": "a4b5..."
# },
# "watermarks": [
# {"layer": "L1_zero_width", "mark_id": "a1b2c3d4e5f60718"},
# {"layer": "L2_whitespace", "mark_id": "a1b2c3d4e5f60718"},
# {"layer": "L3_semantic", "mark_id": "a1b2c3d4e5f60718"}
# ],
# ...
# }
#
# [valid manifest signature] True
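Audit scripts that capture this output may want the manifest and the signature verdict as separate values. The parser below is a sketch that assumes stdout looks exactly like the example above (a JSON object followed by a "[valid manifest signature]" line); parse_inspect_output is a hypothetical helper, and the sample string is abbreviated.

```python
import json


def parse_inspect_output(stdout: str):
    """Split `oversight inspect` output into (manifest dict, signature_valid)."""
    text = stdout.lstrip()
    # raw_decode parses one JSON value and reports where it ended,
    # leaving the trailing signature line untouched.
    manifest, end = json.JSONDecoder().raw_decode(text)
    trailer = text[end:].strip()
    signature_valid = trailer.startswith("[valid manifest signature]") and trailer.endswith("True")
    return manifest, signature_valid


sample = (
    '{\n'
    '  "file_id": "9f3a...",\n'
    '  "issuer_id": "legal-dept",\n'
    '  "watermarks": [{"layer": "L1_zero_width", "mark_id": "a1b2c3d4e5f60718"}]\n'
    '}\n'
    '\n'
    '[valid manifest signature] True\n'
)

manifest, ok = parse_inspect_output(sample)
print(manifest["issuer_id"], ok)
```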
attribute
Forensic attribution pipeline for leaked text. Reads a suspected leak, attempts to recover watermark marks across all three layers, queries the registry for recipient identification, and optionally compares content fingerprints against stored copies. The pipeline runs in five phases.
Flags
| Flag | Required | Description |
|---|---|---|
| --leak PATH | Yes | Path to the leaked text file (read as UTF-8). |
| --registry URL | Yes | Registry server URL for mark_id lookups and candidate retrieval. |
| --fingerprints PATH | No | Path to a .fingerprint.json file or a directory containing multiple fingerprint files. Enables Phase 5 content fingerprint comparison for VM-strip-export detection. |
Five-Phase Pipeline
Phase 1: Direct Extraction (L1 + L2)
Extracts zero-width Unicode frames (L1) and trailing whitespace marks (L2) directly
from the text. L1 reports the number of frames found and unique mark IDs. L2 uses
partial recovery (extract_ws_partial), reporting bits recovered, total
bits needed, and confidence percentage.
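To make the idea of partial recovery concrete, the toy decoder below reads one bit per line from trailing whitespace and reports how many bits survived. The encoding (trailing space = 0, trailing tab = 1, one bit per line) is invented for this illustration and is not the scheme extract_ws_partial decodes; only the reporting shape (bits recovered, total bits, confidence) follows the description above.

```python
def recover_trailing_bits(text: str, total_bits: int = 64):
    """Toy L2-style decoder: returns ({position: bit}, confidence in [0, 1])."""
    bits = {}
    for index, line in enumerate(text.split("\n")[:total_bits]):
        if line.endswith(" "):
            bits[index] = 0
        elif line.endswith("\t"):
            bits[index] = 1
        # Lines with no trailing whitespace contribute no bit (stripped or unmarked).
    confidence = len(bits) / total_bits
    return bits, confidence


# Four carrier lines; the third has lost its trailing whitespace.
leak = "alpha \nbravo\t\ncharlie\ndelta "
bits, confidence = recover_trailing_bits(leak, total_bits=4)
print(f"{len(bits)}/4 bits recovered ({confidence:.0%})")
```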
Phase 2: Registry Query
For each mark_id recovered in Phase 1, queries the registry's
POST /attribute endpoint to resolve the mark to a recipient and file.
Also fetches all known mark_ids from the registry's GET /marks endpoint
to build a candidate set for L3 verification in the next phase.
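The two registry calls in this phase could be constructed with the standard library as shown below. The endpoint paths come from the text above; the JSON body shape ({"mark_id": ...}) and the helper names are assumptions, since this reference does not document the request schema. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request


def build_attribute_request(registry_url: str, mark_id: str) -> urllib.request.Request:
    """POST /attribute for one recovered mark_id (body schema is assumed)."""
    body = json.dumps({"mark_id": mark_id}).encode("utf-8")
    return urllib.request.Request(
        registry_url.rstrip("/") + "/attribute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_marks_request(registry_url: str) -> urllib.request.Request:
    """GET /marks to fetch the candidate set for L3 verification."""
    return urllib.request.Request(registry_url.rstrip("/") + "/marks", method="GET")


req = build_attribute_request("https://reg.example.com", "a1b2c3d4e5f60718")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```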
Phase 3: L3 Semantic Verification
Tests all candidate mark_ids (from L1/L2 extraction plus registry candidates) against
the semantic marks in the text. For each candidate, verify_semantic()
checks synonym choices, punctuation style, spelling variants, and contractions.
Candidates scoring above the 0.65 weighted threshold are reported with per-sublayer
detail.
Phase 4: Multi-Layer Fusion
Combines evidence from all layers using recover_marks_v2(). Bayesian
fusion scores each candidate by treating the layers as independent evidence (a noisy-OR combination):
an L1 exact match scores 0.95, an L2 exact match scores 0.90 (scaled by confidence for
partial recovery), and an L3 match scores the semantic score times 0.85. When multiple
layers agree on the same mark_id, the combined score is computed as
1 - (1 - s1)(1 - s2)...(1 - sN). The best candidate is reported
with confidence percentage and evidence breakdown.
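The combination rule is easy to check by hand. The sketch below applies the stated layer weights and the 1 - (1 - s1)...(1 - sN) formula; it is not recover_marks_v2() itself, the function names are hypothetical, and linear scaling of the L2 score by recovery confidence is an assumption.

```python
from math import prod


def layer_scores(l1_exact=False, l2_confidence=None, l3_semantic=None) -> dict:
    """Per-layer scores using the weights quoted in the text."""
    scores = {}
    if l1_exact:
        scores["L1"] = 0.95
    if l2_confidence is not None:
        scores["L2"] = 0.90 * l2_confidence  # assumed linear scaling for partial recovery
    if l3_semantic is not None:
        scores["L3"] = 0.85 * l3_semantic
    return scores


def fuse(scores: dict) -> float:
    """Independence-assumption combination: 1 - product of (1 - s_i)."""
    return 1.0 - prod(1.0 - s for s in scores.values())


all_layers = fuse(layer_scores(l1_exact=True, l2_confidence=1.0, l3_semantic=0.87))
l3_only = fuse(layer_scores(l3_semantic=0.82))
print(f"{all_layers:.3f} {l3_only:.3f}")
```

With an L3 semantic score of 0.87 and exact L1 and L2 matches, this gives 1 - (0.05)(0.10)(0.2605), roughly 0.999; with L3 alone at 0.82 it gives 0.697. Both agree with the example outputs further down.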
Phase 5: Content Fingerprint Comparison (Optional)
If --fingerprints is provided, computes a ContentFingerprint
of the leaked text and compares it against each stored fingerprint file. Reports
winnowing similarity (Jaccard), sentence similarity (set overlap), combined score
(0.4 * winnowing + 0.6 * sentence), and a verdict (MATCH, LIKELY, UNLIKELY, NO_MATCH).
Threshold for positive attribution is combined score ≥ 0.3.
Phase 5 is the fallback for the VM-strip-export attack, where the adversary opens the document in an air-gapped VM, strips all invisible characters and whitespace artifacts, and exports a clean file. The content fingerprint identifies which recipient's copy was the source by comparing the text structure (not the watermarks) against stored records.
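The Phase 5 arithmetic can be reproduced in a few lines. The weights and the 0.3 threshold come from the description above; the function names are hypothetical, and the verdict bands (MATCH, LIKELY, and so on) are not modeled because their cut-offs are not documented here.

```python
POSITIVE_THRESHOLD = 0.3  # combined score needed for positive attribution


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two hash sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_score(winnow_sim: float, sentence_sim: float) -> float:
    """Weighted blend quoted in the text: 0.4 * winnowing + 0.6 * sentence."""
    return 0.4 * winnow_sim + 0.6 * sentence_sim


def is_positive(score: float) -> bool:
    return score >= POSITIVE_THRESHOLD


score = combined_score(0.91, 0.94)
print(f"combined={score:.3f} positive={is_positive(score)}")
```

For winnowing similarity 0.91 and sentence similarity 0.94 the blend is 0.364 + 0.564 = 0.928, which is the 92.8% confidence shown in the full attribution example.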
Example: Full Attribution
oversight attribute \
--leak leaked_report.txt \
--registry https://reg.example.com \
--fingerprints ./sealed_copies/
# Output:
# [*] Phase 1: Direct extraction (L1 + L2)
# L1: 24 frames, 1 unique mark(s)
# a1b2c3d4e5f60718
# L2: 64/64 bits recovered (100%): a1b2c3d4e5f60718
#
# [*] Phase 2: Registry query (https://reg.example.com)
# MATCH: a1b2c3d4e5f60718 -> recipient=alice, file=9f3a...
# fetched 3 candidate mark_id(s) from registry
#
# [*] Phase 3: L3 semantic verification (3 candidate(s))
# L3 MATCH: a1b2c3d4e5f60718 score=0.87 (synonyms=0.87, punct=2/3, dict=v2)
#
# [*] Phase 4: Multi-layer fusion
# a1b2c3d4e5f60718 score=0.999 layers=L1+L2+L3
#
# [!!] BEST ATTRIBUTION: a1b2c3d4e5f60718
# confidence = 99.9%
# evidence = L1+L2+L3
# file_id = 9f3a...
# recipient = alice
# issuer = legal-dept
#
# [*] Phase 5: Content fingerprint comparison
# Leak fingerprint: 138 winnow hashes, 36 sentence hashes
# report.fingerprint.json: recipient=alice winnow=0.91 sentence=0.94 combined=0.93 [MATCH]
#
# [!!] FINGERPRINT ATTRIBUTION [MATCH]:
# recipient = alice
# mark_id = a1b2c3d4e5f60718
# confidence = 92.8%
Example: Attribution After Watermark Stripping
oversight attribute \
--leak stripped_report.txt \
--registry https://reg.example.com \
--fingerprints ./sealed_copies/
# Output:
# [*] Phase 1: Direct extraction (L1 + L2)
# L1: no zero-width frames found (stripped?)
# L2: no trailing whitespace marks found (stripped?)
#
# [*] Phase 2: Registry query (https://reg.example.com)
# fetched 3 candidate mark_id(s) from registry
#
# [*] Phase 3: L3 semantic verification (3 candidate(s))
# L3 MATCH: a1b2c3d4e5f60718 score=0.82 (synonyms=0.82, punct=3/3, dict=v2)
#
# [*] Phase 4: Multi-layer fusion
# a1b2c3d4e5f60718 score=0.697 layers=L3
#
# [!!] BEST ATTRIBUTION: a1b2c3d4e5f60718
# confidence = 69.7%
# evidence = L3
#
# [*] Phase 5: Content fingerprint comparison
# report.fingerprint.json: recipient=alice winnow=0.88 sentence=0.91 combined=0.90 [MATCH]
#
# [!!] FINGERPRINT ATTRIBUTION [MATCH]:
# recipient = alice
# confidence = 89.8%
gui
Launch the Tkinter desktop starter app added in v0.4.5. The GUI exposes the three
first-contact workflows a recipient or issuer typically needs: generate an identity,
seal a file to a recipient, and open a sealed file. It is built on the standard library
tkinter toolkit, so no additional GUI dependencies are installed alongside
the existing wheel.
Scope
The GUI starter is a stepping stone, not the public launch surface. The broader launch sequence (web viewer, drag-drop share workflow, Outlook add-in, design-partner deployment, SOC 2 Type 1 scoping) still gates a broad public launch. For production batch workflows, prefer the CLI.
Example
oversight gui
# Opens a Tkinter window with three panes:
# - Generate identity: writes a new .json / .pub.json pair.
# - Seal: pick input file, recipient .pub.json, issuer identity, output path.
# - Open: pick a .sealed file and a recipient identity; writes decrypted output.
This reference documents the Python CLI as of v0.4.5, including the L3 safety flags
(--l3-mode, --l3-ack, --document-class) and the oversight gui subcommand. The Rust CLI
(oversight-cli crate) supports keygen, seal, open, and inspect but does not yet embed
L3 watermarks or compute content fingerprints. Consult the repository for the latest
CLI capabilities.