Zion Boggan · April 2026 · Oversight Protocol v0.4.5

The oversight CLI provides six subcommands: keygen for identity creation, seal for encrypting and watermarking documents, open for decryption with policy enforcement, inspect for manifest examination without decryption, attribute for forensic watermark recovery from leaked text, and gui for the Tkinter desktop starter app. The CLI is a thin wrapper around the oversight_core Python library; all cryptographic and watermarking logic lives in the library layer.

keygen

Generate a new Oversight identity containing an X25519 keypair (encryption) and an Ed25519 keypair (signing). The command writes two files: a private identity file and a public-only sibling with the .pub.json suffix.

Flags

Flag | Required | Description
--out PATH | Yes | Output path for the private identity file (JSON). A public sibling is written alongside it.
--id NAME | No | Human-readable identifier stored in the identity file. Defaults to "identity".

Output Files

The private identity file contains x25519_priv, x25519_pub, ed25519_priv, and ed25519_pub as hex strings, plus the id field. The public file (.pub.json) contains only id, x25519_pub, and ed25519_pub. Distribute the public file to issuers; keep the private file secure.
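The split between the two files can be illustrated with a short sketch. It assumes only the field names listed above; the helper names public_view and public_sibling_path are hypothetical and are not part of oversight_core, and the hex strings are placeholders rather than real key material.

```python
import json
from pathlib import Path

# Field names come from the description above; helper names are hypothetical.
PUBLIC_FIELDS = ("id", "x25519_pub", "ed25519_pub")


def public_view(identity: dict) -> dict:
    """Return only the fields that are safe to hand to an issuer."""
    return {name: identity[name] for name in PUBLIC_FIELDS}


def public_sibling_path(private_path: str) -> Path:
    """Map alice.json to alice.pub.json (the suffix convention described above)."""
    p = Path(private_path)
    return p.with_name(p.stem + ".pub.json")


# Placeholder hex strings, not real keys.
example_identity = {
    "id": "alice@example.com",
    "x25519_priv": "00" * 32,
    "x25519_pub": "11" * 32,
    "ed25519_priv": "22" * 32,
    "ed25519_pub": "33" * 32,
}
example_public = public_view(example_identity)
print(json.dumps(example_public, indent=2))
print(public_sibling_path("alice.json"))
```

A check like this can be useful before sharing a file, to confirm that no *_priv field is present in whatever is about to be distributed.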

Example

oversight keygen --out alice.json --id alice@example.com

# Output:
# [+] wrote private identity to alice.json
# [+] wrote public  identity to alice.pub.json

seal

Produce a .sealed file for a specific recipient. The seal operation optionally watermarks the plaintext (L1, L2, L3), computes a content fingerprint, signs the manifest with the issuer's Ed25519 key, encrypts the content with XChaCha20-Poly1305 under a data encryption key (DEK), and wraps the DEK to the recipient's X25519 public key.

Flags

Flag | Required | Description
INPUT | Yes | Positional argument: path to the plaintext file to seal.
--recipient-pub PATH | Yes | Path to the recipient's public identity file (.pub.json).
--issuer-id ID | Yes | Issuer identifier string stored in the manifest.
--issuer-key PATH | Yes | Path to the issuer's private identity file (JSON).
--registry-url URL | Yes | Registry URL stored in the manifest policy field.
--out PATH | Yes | Output path for the .sealed file.
--watermark | No | Enable text watermarking. Requires UTF-8 input. Applies L1 and L2 by default; L3 is governed by --l3-mode and defaults to safe behavior for wording-sensitive document classes.
--l3-mode MODE | No | One of auto (default), off, full, or boilerplate. auto picks the safest mode for the document class; boilerplate marks only header, footer, and cover-page regions; full marks body text.
--l3-ack | Conditional | Required when L3 would rewrite body text (full or boilerplate modes that hit body regions). Explicit acknowledgement that the recipient copy is textually non-identical to the canonical source. Recorded in the manifest under l3_policy.
--document-class CLASS | No | One of prose, legal, regulatory, technical, source_code, sql, log, or structured. Controls the auto L3 policy. If omitted, the class is inferred from content and file extension.
--registry-domain DOMAIN | No | Domain for DNS beacon generation. Defaults to "oversight.example".
--content-type MIME | No | MIME type stored in the manifest. Defaults to "application/octet-stream".
--register URL | No | If provided, POST the manifest, watermarks, beacons, and fingerprint to this registry URL after sealing.

Watermark Behavior

When --watermark is specified and the input is valid UTF-8, the CLI generates a single 64-bit mark_id. L1 (zero-width Unicode) and L2 (trailing whitespace) are applied by default because they do not change the visible text. L3 (synonym rotation, punctuation, spelling, contractions, number formatting) is governed by --l3-mode.

Under --l3-mode auto (the default), v0.4.5 picks the safest mode for the document class: full for prose, boilerplate for legal and regulatory, and off for technical specifications, source code, SQL, logs, and structured data. boilerplate marks only header, footer, and cover-page style regions. Whenever L3 would rewrite body text, --l3-ack is required and the manifest records the acknowledgement, mode, and document class under l3_policy. The manifest also records canonical_content_hash, a SHA-256 of the pre-watermark source bytes, so disputes can verify the recipient copy against the canonical source.

If the input is not valid UTF-8 (for example, binary files), watermarking is skipped with a warning. The seal still proceeds without marks.
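The watermark decisions described in this section can be sketched in a few lines of Python. The class-to-mode mapping is taken from the auto description above; the function names are hypothetical, and the hex encoding of canonical_content_hash is an assumption, since the text only says it is a SHA-256 of the pre-watermark source bytes.

```python
import hashlib

# Mapping taken from the auto-mode description above.
# Function names below are hypothetical, not the oversight_core API.
AUTO_L3_MODE = {
    "prose": "full",
    "legal": "boilerplate",
    "regulatory": "boilerplate",
    "technical": "off",
    "source_code": "off",
    "sql": "off",
    "log": "off",
    "structured": "off",
}


def resolve_l3_mode(l3_mode: str, document_class: str) -> str:
    """Resolve --l3-mode to a concrete mode: off, boilerplate, or full."""
    if l3_mode != "auto":
        return l3_mode  # an explicit mode overrides the per-class default
    return AUTO_L3_MODE[document_class]


def is_watermarkable(data: bytes) -> bool:
    """Text watermarking needs valid UTF-8; other input is sealed without marks."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def canonical_content_hash(source: bytes) -> str:
    """SHA-256 over the pre-watermark source bytes (hex encoding is assumed)."""
    return hashlib.sha256(source).hexdigest()


print(resolve_l3_mode("auto", "legal"))
print(is_watermarkable(b"\xff\xfe\x00"))
```

Hashing the source before any layer is applied is what lets a later dispute compare a recipient copy against the canonical bytes, independently of which marks were embedded.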

L3 Safety Examples

# The remaining required seal flags (--issuer-id, --issuer-key, --registry-url, --out)
# are omitted from these examples for brevity.

# Legal contract: L3 off, body text is byte-identical to canonical source
oversight seal contract.txt --recipient-pub alice.pub.json --l3-mode off

# Legal contract with boilerplate-only marks, ack required
oversight seal contract.txt --recipient-pub alice.pub.json --l3-mode boilerplate --l3-ack --document-class legal

# Ordinary prose: full L3 with explicit ack
oversight seal memo.txt --recipient-pub alice.pub.json --l3-mode full --l3-ack --document-class prose

Fingerprint Output

For UTF-8 inputs, the CLI also computes a ContentFingerprint (winnowing + sentence hashing) and writes it to OUTPUT.fingerprint.json alongside the sealed file. This fingerprint is the server-side record used for attribution when all embedded watermarks have been stripped (the VM-export attack defense). If --register is specified, the fingerprint is included in the registration payload.
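To give a feel for what "winnowing + sentence hashing" means, here is a simplified sketch. It is not the ContentFingerprint implementation: the k-gram length, window size, normalization, hash function, and sentence splitting rule are all assumptions chosen for illustration.

```python
import hashlib
import re


def _h(text: str) -> int:
    """Stable 64-bit hash of a string (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits.

    Zero-width characters and whitespace are dropped here, so L1 and L2
    marks (or their removal) do not affect the fingerprint.
    """
    return "".join(ch for ch in text.lower() if ch.isalnum())


def winnow(text: str, k: int = 5, window: int = 4) -> set:
    """Simplified winnowing: keep the minimum k-gram hash of each sliding window."""
    s = normalize(text)
    grams = [_h(s[i:i + k]) for i in range(max(len(s) - k + 1, 0))]
    if not grams:
        return set()
    if len(grams) <= window:
        return {min(grams)}
    return {min(grams[i:i + window]) for i in range(len(grams) - window + 1)}


def sentence_hashes(text: str) -> set:
    """Hash each normalized sentence; sentences are split on '.', '!' and '?'."""
    parts = re.split(r"[.!?]+", text)
    return {_h(normalize(p)) for p in parts if normalize(p)}


marked = "The quick\u200b brown fox jumps.  \nIt ran away!"
clean = "The quick brown fox jumps.\nIt ran away!"
print(winnow(marked) == winnow(clean))
print(len(sentence_hashes(clean)))
```

Because normalization discards invisible characters and whitespace, the fingerprint of a stripped copy matches the fingerprint of the marked copy, which is the property the VM-export defense relies on.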

Example

oversight seal report.txt \
  --recipient-pub alice.pub.json \
  --issuer-id legal-dept \
  --issuer-key issuer.json \
  --registry-url https://reg.example.com \
  --out report.sealed \
  --watermark \
  --l3-ack \
  --register https://reg.example.com

# Output:
# [+] embedded L1 mark a1b2c3d4e5f60718
# [+] embedded L2 mark a1b2c3d4e5f60718
# [+] embedded L3 mark a1b2c3d4e5f60718 (semantic + punctuation)
# [+] content fingerprint: 142 winnow hashes, 38 sentence hashes
# [+] wrote report.sealed (14832 bytes)
# [+] file_id=9f3a...
# [+] recipient=alice@example.com
# [+] beacons=4  watermarks=3
# [+] wrote fingerprint to report.fingerprint.json
# [+] registered with https://reg.example.com: {"status": "ok"}

open

Decrypt a .sealed file using the recipient's private identity. The open operation verifies the manifest signature, enforces policy constraints (time windows, jurisdiction, max_opens), unwraps the DEK, decrypts the content, and performs a post-decrypt content hash check.

Flags

Flag | Required | Description
INPUT | Yes | Positional argument: path to the .sealed file.
--identity PATH | Yes | Path to the recipient's private identity file (JSON).
--out PATH | Yes | Output path for the decrypted plaintext.

Policy Enforcement

Before decryption proceeds, the client checks the manifest's policy fields against the current environment. If not_after is set and the system clock exceeds it, decryption is refused. If max_opens is set, an atomic check-and-bump operation (advisory file lock + atomic rename to prevent TOCTOU races) enforces the limit. If the open count exceeds the maximum, decryption does not proceed and the violation is logged. Jurisdiction checks use IP geolocation when configured.
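The not_after check and the check-and-bump pattern can be sketched as follows. This is an illustration under stated assumptions, not the client's code: the counter file layout ({"opens": N}), the .lock and .tmp file names, and the function names are hypothetical, not_after is assumed to be a Unix timestamp in seconds (like issued_at in the manifest), and fcntl is available only on Unix-like systems.

```python
import fcntl
import json
import os
import tempfile
import time
from typing import Optional


def within_time_window(policy: dict, now: Optional[float] = None) -> bool:
    """Return False when not_after is set and the clock has passed it."""
    now = time.time() if now is None else now
    not_after = policy.get("not_after")
    return not_after is None or now <= not_after


def check_and_bump(counter_path: str, max_opens: int) -> bool:
    """Count one open under a lock; return False once the limit is used up."""
    with open(counter_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # advisory lock serializes check + write
        try:
            try:
                with open(counter_path) as f:
                    opens = json.load(f)["opens"]
            except FileNotFoundError:
                opens = 0  # first open of this file
            if opens >= max_opens:
                return False
            tmp_path = counter_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"opens": opens + 1}, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, counter_path)  # atomic rename on POSIX
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


# Usage: a limit of two opens allows two calls and refuses the third.
counter = os.path.join(tempfile.mkdtemp(), "opens.json")
results = [check_and_bump(counter, max_opens=2) for _ in range(3)]
print(results)  # [True, True, False]
```

Holding the lock across both the read and the rename is what closes the TOCTOU window: two concurrent opens cannot both observe the same count and both proceed.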

Example

oversight open report.sealed \
  --identity alice.json \
  --out report_decrypted.txt

# Output:
# [+] decrypted to report_decrypted.txt
# [+] file_id   = 9f3a...
# [+] issuer    = legal-dept
# [+] recipient = alice@example.com
# [+] marks     = 3
# [+] beacons   = 4

inspect

Dump the signed manifest from a .sealed file without decrypting the content. Useful for auditing metadata (issuer, recipient, policy, watermark references, beacons) without requiring the recipient's private key.

Flags

Flag | Required | Description
INPUT | Yes | Positional argument: path to the .sealed file.

Output

Prints the manifest as indented JSON to stdout, followed by a line indicating whether the Ed25519 signature is valid. The manifest includes all fields: file_id, issued_at, issuer_id, recipient binding, watermark references, beacon tokens, policy constraints, and algorithm suite.

Example

oversight inspect report.sealed

# Output:
# {
#   "file_id": "9f3a...",
#   "issued_at": 1745020800,
#   "version": "OVERSIGHT-v1",
#   "suite": "OSGT-CLASSIC-v1",
#   "issuer_id": "legal-dept",
#   "recipient": {
#     "recipient_id": "alice@example.com",
#     "x25519_pub": "a4b5..."
#   },
#   "watermarks": [
#     {"layer": "L1_zero_width", "mark_id": "a1b2c3d4e5f60718"},
#     {"layer": "L2_whitespace", "mark_id": "a1b2c3d4e5f60718"},
#     {"layer": "L3_semantic",   "mark_id": "a1b2c3d4e5f60718"}
#   ],
#   ...
# }
#
# [valid manifest signature] True

attribute

Forensic attribution pipeline for leaked text. Reads a suspected leak, attempts to recover watermarks across all three layers, queries the registry to identify the recipient, and optionally compares content fingerprints against stored copies. The pipeline runs in five phases.

Flags

Flag | Required | Description
--leak PATH | Yes | Path to the leaked text file (read as UTF-8).
--registry URL | Yes | Registry server URL for mark_id lookups and candidate retrieval.
--fingerprints PATH | No | Path to a .fingerprint.json file or a directory containing multiple fingerprint files. Enables Phase 5 content fingerprint comparison for VM-strip-export detection.

Five-Phase Pipeline

Phase 1: Direct Extraction (L1 + L2)

Extracts zero-width Unicode frames (L1) and trailing whitespace marks (L2) directly from the text. L1 reports the number of frames found and unique mark IDs. L2 uses partial recovery (extract_ws_partial), reporting bits recovered, total bits needed, and confidence percentage.
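Before attempting extraction, it can help to check whether a leak still carries any L1 or L2 carrier characters at all. The sketch below does only that; it does not decode frames or bits, and it is not extract_ws_partial. The set of zero-width code points is an assumption (a commonly used set), and the actual L1 alphabet may differ.

```python
# Assumption: a commonly used set of zero-width code points.
# The alphabet actually used by the L1 layer may differ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def count_zero_width(text: str) -> int:
    """Number of zero-width characters present in the text."""
    return sum(1 for ch in text if ch in ZERO_WIDTH)


def lines_with_trailing_whitespace(text: str) -> int:
    """Number of lines that end in spaces or tabs."""
    return sum(1 for line in text.split("\n") if line != line.rstrip(" \t"))


def looks_stripped(text: str) -> bool:
    """True when neither kind of carrier character survives in the text."""
    return count_zero_width(text) == 0 and lines_with_trailing_whitespace(text) == 0


print(count_zero_width("a\u200bb\u200cc"))
print(looks_stripped("clean text\nno marks"))
```

When looks_stripped returns True, Phase 1 has nothing to work with and attribution falls through to the L3 and fingerprint phases.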

Phase 2: Registry Query

For each mark_id recovered in Phase 1, queries the registry's POST /attribute endpoint to resolve the mark to a recipient and file. Also fetches all known mark_ids from the registry's GET /marks endpoint to build a candidate set for L3 verification in the next phase.

Phase 3: L3 Semantic Verification

Tests all candidate mark_ids (from L1/L2 extraction plus registry candidates) against the semantic marks in the text. For each candidate, verify_semantic() checks synonym choices, punctuation style, spelling variants, and contractions. Candidates scoring above the 0.65 weighted threshold are reported with per-sublayer detail.

Phase 4: Multi-Layer Fusion

Combines evidence from all layers using recover_marks_v2(). Bayesian fusion assigns each candidate a per-layer score: an exact L1 match scores 0.95, an exact L2 match scores 0.90 (scaled by the recovery confidence when only part of the mark is recovered), and an L3 match scores the semantic score times 0.85. The layers are treated as independent, so when several agree on the same mark_id the combined score is 1 - (1 - s1)(1 - s2)...(1 - sN). The best candidate is reported with its confidence percentage and an evidence breakdown.
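The combination rule is easy to check by hand. The sketch below applies it to the per-layer scores quoted above; it is not recover_marks_v2(), the function names are hypothetical, and treating the L2 confidence as a multiplier in the range 0 to 1 is an assumption about how the scaling works.

```python
from math import prod

# Per-layer constants quoted in the Phase 4 description.
L1_EXACT = 0.95
L2_EXACT = 0.90
L3_WEIGHT = 0.85


def layer_scores(l1_match=False, l2_confidence=None, l3_semantic=None) -> list:
    """Build the list of per-layer scores for one candidate mark_id."""
    scores = []
    if l1_match:
        scores.append(L1_EXACT)
    if l2_confidence is not None:
        # Assumption: confidence is a fraction in [0, 1]; 1.0 means exact recovery.
        scores.append(L2_EXACT * l2_confidence)
    if l3_semantic is not None:
        scores.append(l3_semantic * L3_WEIGHT)
    return scores


def fuse(scores) -> float:
    """Combine independent scores: 1 - (1 - s1)(1 - s2)...(1 - sN)."""
    return 1.0 - prod(1.0 - s for s in scores)


# All three layers agree, L3 semantic score 0.87.
print(round(fuse(layer_scores(True, 1.0, 0.87)), 3))  # 0.999

# Only L3 survives, semantic score 0.82.
print(round(fuse(layer_scores(l3_semantic=0.82)), 3))  # 0.697
```

These two results match the fused scores shown in the full-attribution and stripped-watermark example outputs: agreement across layers pushes the score toward 1, while a lone L3 match is capped by the 0.85 weight.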

Phase 5: Content Fingerprint Comparison (Optional)

If --fingerprints is provided, the CLI computes a ContentFingerprint of the leaked text and compares it against each stored fingerprint file. It reports winnowing similarity (Jaccard), sentence similarity (set overlap), a combined score (0.4 * winnowing + 0.6 * sentence), and a verdict (MATCH, LIKELY, UNLIKELY, NO_MATCH). The threshold for positive attribution is a combined score ≥ 0.3.

Phase 5 is the fallback for the VM-strip-export attack, where the adversary opens the document in an air-gapped VM, strips all invisible characters and whitespace artifacts, and exports a clean file. The content fingerprint identifies which recipient's copy was the source by comparing the text structure (not the watermarks) against stored records.
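The Phase 5 scoring arithmetic can be sketched as follows. The 0.4/0.6 weights and the 0.3 threshold come from the description above; the function names are hypothetical, and the boundaries between the MATCH, LIKELY, UNLIKELY, and NO_MATCH verdicts are not documented here, so the sketch does not attempt to assign them.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two hash sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_score(winnow_sim: float, sentence_sim: float) -> float:
    """Weighted combination quoted above: 0.4 * winnowing + 0.6 * sentence."""
    return 0.4 * winnow_sim + 0.6 * sentence_sim


def is_positive(combined: float, threshold: float = 0.3) -> bool:
    """Positive attribution when the combined score reaches the threshold."""
    return combined >= threshold


print(jaccard({1, 2, 3}, {2, 3, 4}))           # 0.5
print(round(combined_score(0.91, 0.94), 3))    # 0.928
print(round(combined_score(0.88, 0.91), 3))    # 0.898
```

The two combined scores correspond to the 92.8% and 89.8% fingerprint confidences in the example outputs for this subcommand. Sentence similarity carries the larger weight, so a leak that preserves whole sentences scores well even when local edits disturb the k-gram hashes.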

Example: Full Attribution

oversight attribute \
  --leak leaked_report.txt \
  --registry https://reg.example.com \
  --fingerprints ./sealed_copies/

# Output:
# [*] Phase 1: Direct extraction (L1 + L2)
#     L1: 24 frames, 1 unique mark(s)
#         a1b2c3d4e5f60718
#     L2: 64/64 bits recovered (100%): a1b2c3d4e5f60718
#
# [*] Phase 2: Registry query (https://reg.example.com)
#     MATCH: a1b2c3d4e5f60718 -> recipient=alice, file=9f3a...
#     fetched 3 candidate mark_id(s) from registry
#
# [*] Phase 3: L3 semantic verification (3 candidate(s))
#     L3 MATCH: a1b2c3d4e5f60718 score=0.87 (synonyms=0.87, punct=2/3, dict=v2)
#
# [*] Phase 4: Multi-layer fusion
#     a1b2c3d4e5f60718  score=0.999  layers=L1+L2+L3
#
# [!!] BEST ATTRIBUTION: a1b2c3d4e5f60718
#      confidence = 99.9%
#      evidence   = L1+L2+L3
#      file_id    = 9f3a...
#      recipient  = alice
#      issuer     = legal-dept
#
# [*] Phase 5: Content fingerprint comparison
#     Leak fingerprint: 138 winnow hashes, 36 sentence hashes
#     report.fingerprint.json: recipient=alice winnow=0.91 sentence=0.94 combined=0.93 [MATCH]
#
# [!!] FINGERPRINT ATTRIBUTION [MATCH]:
#      recipient  = alice
#      mark_id    = a1b2c3d4e5f60718
#      confidence = 92.8%

Example: Attribution After Watermark Stripping

oversight attribute \
  --leak stripped_report.txt \
  --registry https://reg.example.com \
  --fingerprints ./sealed_copies/

# Output:
# [*] Phase 1: Direct extraction (L1 + L2)
#     L1: no zero-width frames found (stripped?)
#     L2: no trailing whitespace marks found (stripped?)
#
# [*] Phase 2: Registry query (https://reg.example.com)
#     fetched 3 candidate mark_id(s) from registry
#
# [*] Phase 3: L3 semantic verification (3 candidate(s))
#     L3 MATCH: a1b2c3d4e5f60718 score=0.82 (synonyms=0.82, punct=3/3, dict=v2)
#
# [*] Phase 4: Multi-layer fusion
#     a1b2c3d4e5f60718  score=0.697  layers=L3
#
# [!!] BEST ATTRIBUTION: a1b2c3d4e5f60718
#      confidence = 69.7%
#      evidence   = L3
#
# [*] Phase 5: Content fingerprint comparison
#     report.fingerprint.json: recipient=alice winnow=0.88 sentence=0.91 combined=0.90 [MATCH]
#
# [!!] FINGERPRINT ATTRIBUTION [MATCH]:
#      recipient  = alice
#      confidence = 89.8%

gui

Launch the Tkinter desktop starter app added in v0.4.5. The GUI exposes the three first-contact workflows a recipient or issuer typically needs: generate an identity, seal a file to a recipient, and open a sealed file. It is built on the standard library tkinter toolkit, so no additional GUI dependencies are installed alongside the existing wheel.

Scope

The GUI starter is a stepping stone, not the public launch surface. A broad public launch remains gated on the rest of the launch sequence: web viewer, drag-drop share workflow, Outlook add-in, design-partner deployment, and SOC 2 Type 1 scoping. For production batch workflows, prefer the CLI.

Example

oversight gui

# Opens a Tkinter window with three panes:
#   - Generate identity: writes a new .json / .pub.json pair.
#   - Seal: pick input file, recipient .pub.json, issuer identity, output path.
#   - Open: pick a .sealed file and a recipient identity; writes decrypted output.

This reference documents the Python CLI as of v0.4.5, including the L3 safety flags (--l3-mode, --l3-ack, --document-class) and the oversight gui subcommand. The Rust CLI (oversight-cli crate) supports keygen, seal, open, and inspect but does not yet embed L3 watermarks or compute content fingerprints. Consult the repository for the latest CLI capabilities.