Performance Evaluation
Expected throughput, overhead, and bottleneck analysis for the Oversight pipeline
Zion Boggan · April 2026 · Oversight Protocol v0.4.4 (measurement snapshot), documentation current as of v0.4.5
This page presents measured performance characteristics of the Oversight
protocol v0.4.4, benchmarked on an Intel Core i7 (6th gen) running Windows 10
with CPython 3.14.2. Each measurement is the mean of 10 runs. The benchmark
script (bench_usenix.py) and raw data are available in the repository.
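Estimates from earlier versions of this page have been replaced with actual measurements
where available. Figures still labeled "Expected" are estimates; the measured values
are collected in the Measured Results Summary at the end of this page.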
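The mean-of-10 methodology can be illustrated with a short timing helper. This is a minimal sketch, not the contents of bench_usenix.py: the helper name `mean_runtime_us` and the SHA-256 stand-in workload are hypothetical and chosen only so the example runs with the standard library.

```python
import hashlib
import statistics
import time


def mean_runtime_us(fn, runs=10):
    """Return the mean wall-clock time of fn() in microseconds over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.mean(samples)


# Stand-in workload (assumption): hashing 10 KB, not an Oversight operation.
payload = b"x" * 10_240
mean_us = mean_runtime_us(lambda: hashlib.sha256(payload).digest())
print(f"SHA-256 over 10 KB: {mean_us:.1f} us (mean of 10 runs)")
```

A mean over so few runs is sensitive to outliers such as a cold cache on the first call, so reporting the minimum or median alongside it is a common complement.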
Seal/Open Throughput
The seal and open operations are dominated by three categories of work: cryptographic operations, watermark embedding (seal only), and I/O. For a typical document (1-100 KB of text), the cryptographic operations are the primary cost.
Cryptographic Operations per Seal
| Operation | Algorithm | Expected Time | Notes |
|---|---|---|---|
| Manifest signing | Ed25519 | ~50 us | Constant time regardless of document size. Signing the canonical JSON manifest bytes. |
| DEK generation | CSPRNG (256 bits) | <1 us | Single call to secrets.token_bytes(32). |
| X25519 key agreement | X25519 | ~100 us | Ephemeral keypair generation + Diffie-Hellman exchange. |
| HKDF key derivation | HKDF-SHA256 | <10 us | Single extract-and-expand cycle, 32-byte output. |
| DEK wrap | XChaCha20-Poly1305 | <5 us | Encrypting 32 bytes (the DEK) with the derived wrapping key. |
| Content encryption | XChaCha20-Poly1305 | ~1 us/KB | Linear in document size. A 100 KB document takes roughly 100 us. |
| SHA-256 content hash | SHA-256 | ~0.5 us/KB | Computed once at seal for the manifest, and again at open for post-decrypt verification. |
For a 10 KB document without watermarking, the total cryptographic cost of seal is on the order of 200-300 microseconds. The open operation is comparable: parse the container, verify the Ed25519 signature (~100 us), perform X25519 key agreement (~100 us), HKDF + AEAD decrypt, and SHA-256 content verification. Both operations stay under 1 millisecond for documents up to roughly 100 KB; at 1 MB, measured seal and open times are about 4 ms (see Measured Results Summary).
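Two of the steps in the table above, DEK generation and HKDF-SHA256 derivation, can be reproduced with the Python standard library alone. The sketch below follows RFC 5869; the `info` label, the empty salt, and the random placeholder standing in for the X25519 shared secret are assumptions for illustration, not the values Oversight uses.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: one extract step, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output too long for HKDF-SHA256")
    # Extract: PRK = HMAC(salt, IKM); an empty salt defaults to 32 zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated then truncated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


dek = secrets.token_bytes(32)            # 256-bit data encryption key
shared_secret = secrets.token_bytes(32)  # placeholder for the X25519 output (assumption)
wrapping_key = hkdf_sha256(shared_secret, salt=b"", info=b"example-wrap-key")
```

With a 32-byte output the expand loop runs exactly once, which is why the table lists the derivation as a single extract-and-expand cycle.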
For the hybrid suite (OSGT-HYBRID-v1), ML-KEM-768 encapsulation adds approximately 200 us and ML-DSA-65 signing adds approximately 2 ms, making hybrid seal roughly 10x slower than classical seal. This is still fast enough for interactive use.
Scaling with Document Size
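The only size-dependent operations are AEAD encryption/decryption and SHA-256 hashing, both of which are linear. XChaCha20-Poly1305 processes data at approximately 1 GB/s on modern hardware (single core, no hardware acceleration). SHA-256 throughput is comparable. A 100 MB document therefore adds roughly 200 ms of crypto time (about 100 ms for each of the two passes). These are raw primitive rates; the measured end-to-end seal throughput of the Python implementation is lower, about 250 MB/s at 1 MB (see Measured Results Summary). The practical ceiling is I/O bandwidth, not CPU time.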
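The linear scaling estimate above is simple arithmetic and can be written out as a small calculator. The function name `crypto_time_ms` is hypothetical, and the 1 GB/s defaults are the approximate rates quoted in the text, not measurements.

```python
def crypto_time_ms(size_bytes: int,
                   aead_bytes_per_s: float = 1e9,
                   sha256_bytes_per_s: float = 1e9) -> float:
    """Estimate size-dependent crypto time: one AEAD pass plus one SHA-256 pass."""
    aead_s = size_bytes / aead_bytes_per_s
    hash_s = size_bytes / sha256_bytes_per_s
    return (aead_s + hash_s) * 1000.0


for label, size in [("10 KB", 10_000), ("1 MB", 1_000_000), ("100 MB", 100_000_000)]:
    print(f"{label:>7}: {crypto_time_ms(size):.3f} ms")
```

Substituting a measured throughput for either default shows how far an end-to-end figure departs from the raw primitive rate.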
Watermark Embedding Overhead
Watermark embedding occurs before the cryptographic seal and adds processing time proportional to the text length. Each layer has distinct performance characteristics.
L1: Zero-Width Unicode
L1 performs a single linear pass over the text, inserting a pre-computed frame (approximately 66 zero-width characters for a 64-bit mark_id) at every 40th visible character. The operation is dominated by string concatenation. Expected overhead: on the order of 10 microseconds per KB of text. The measured cost is higher, roughly 190-230 microseconds per KB (see Measured Results Summary), which is comparable to the cryptographic cost of seal at 1 KB and well above it from 10 KB upward.
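The single-pass insertion can be sketched as follows. The frame layout here is an assumption made so the frame comes out at 66 characters: 64 payload characters (U+200B for a 0 bit, U+200C for a 1 bit) between two delimiter characters. The actual code points and framing used by Oversight are not specified on this page.

```python
ZERO, ONE = "\u200b", "\u200c"    # bit 0 / bit 1 (assumed mapping)
START, END = "\u2060", "\u2063"   # frame delimiters (assumed)
_ZERO_WIDTH = {ZERO, ONE, START, END}


def build_frame(mark_id: int, bits: int = 64) -> str:
    """Encode mark_id as zero-width characters, most significant bit first."""
    payload = "".join(ONE if (mark_id >> i) & 1 else ZERO
                      for i in range(bits - 1, -1, -1))
    return START + payload + END


def embed_l1(text: str, mark_id: int, interval: int = 40) -> str:
    """Insert the pre-computed frame after every `interval`-th visible character."""
    frame = build_frame(mark_id)
    out, visible = [], 0
    for ch in text:
        out.append(ch)
        if ch not in _ZERO_WIDTH:
            visible += 1
            if visible % interval == 0:
                out.append(frame)
    return "".join(out)  # join once instead of concatenating in the loop


def strip_l1(text: str) -> str:
    """Remove all zero-width marker characters, recovering the visible text."""
    return "".join(ch for ch in text if ch not in _ZERO_WIDTH)
```

Collecting pieces in a list and joining once keeps the pass linear; repeated `+=` on a long string can degrade in some Python implementations.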
L2: Trailing Whitespace
L2 splits the text on newlines, iterates through lines, and appends a space or tab to lines without existing trailing whitespace. Marking stops after 64 eligible lines (for a 64-bit mark_id), so the number of modified lines is bounded. Expected overhead for the marking step: under 50 microseconds. The newline split still scans the whole text, so total time grows with document size: measured values range from 21 us at 1 KB to 3.72 ms at 1 MB (see Measured Results Summary).
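A minimal sketch of this layer, assuming a space encodes a 0 bit and a tab encodes a 1 bit, with bits consumed most significant first. The bit-to-character mapping and the treatment of empty lines as eligible are assumptions, not confirmed details of the implementation.

```python
def embed_l2(text: str, mark_id: int, bits: int = 64) -> str:
    """Append one trailing space (0) or tab (1) to each of the first `bits` eligible lines."""
    lines = text.split("\n")
    used = 0
    for i, line in enumerate(lines):
        if used >= bits:
            break  # all bits written; remaining lines are left untouched
        if line != line.rstrip():
            continue  # line already ends in whitespace, so it is not eligible
        bit = (mark_id >> (bits - 1 - used)) & 1
        lines[i] = line + ("\t" if bit else " ")
        used += 1
    return "\n".join(lines)


print(repr(embed_l2("a\nb\nc", 0b101, bits=3)))
```

The early `break` is what bounds the number of modified lines, while the initial `split` and final `join` remain proportional to the full text.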
L3: Semantic Marks
L3 is the most expensive embedding layer. The apply_semantic() function
runs five sequential passes over the text:
Synonym rotation (T1) requires regex-based word-boundary scanning and dictionary lookup for each word. The v2 dictionary uses a precompiled lookup table, so each word lookup is O(1). The full pass is linear in text length. For a 10 KB document with approximately 1,500 words, the expected time is on the order of 1-5 milliseconds (dominated by regex matching, not dictionary lookups).
Punctuation (T2), spelling (T2b), contractions (T2c), and number formatting (T2d) each perform regex-based find-and-replace passes. Each pass is linear but typically matches only a small number of positions. Combined overhead for all four: on the order of 1-3 milliseconds for a 10 KB document.
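Total L3 embedding for a 10 KB document is expected to be 2-8 milliseconds, and 20-80 milliseconds for a 100 KB document. Measured values are somewhat higher: 12.49 ms at 10 KB and 122 ms at 100 KB (see Measured Results Summary). L3 is the bottleneck in the watermarking pipeline. It remains imperceptible in interactive workflows for documents up to a few tens of KB, but reaches about 1.2 seconds at 1 MB.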
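The synonym rotation pass (T1) can be sketched with a precompiled regex and a flat lookup table. Everything specific here is hypothetical: the two synonym classes, the rule that class `i` is selected by bit `i` of the mark_id, and the capitalization handling. It is not the `apply_semantic()` implementation, only an illustration of why each lookup is O(1) and the pass is linear.

```python
import re

# Hypothetical synonym classes; the real dictionary is far larger.
SYNONYM_CLASSES = [("big", "large"), ("quick", "fast")]
# Flat table: every variant maps to its class index, so lookup is O(1).
LOOKUP = {word: idx for idx, pair in enumerate(SYNONYM_CLASSES) for word in pair}
WORD_RE = re.compile(r"\b[A-Za-z]+\b")  # compiled once, reused for every call


def rotate_synonyms(text: str, mark_id: int, bits: int = 64) -> str:
    """Replace each known word with the class variant selected by a mark_id bit."""
    def replace(match):
        word = match.group(0)
        idx = LOOKUP.get(word.lower())
        if idx is None:
            return word  # not in any synonym class: leave unchanged
        bit = (mark_id >> (idx % bits)) & 1
        chosen = SYNONYM_CLASSES[idx][bit]
        return chosen.capitalize() if word[0].isupper() else chosen
    return WORD_RE.sub(replace, text)


print(rotate_synonyms("a big quick dog", 0b01))
```

The regex callback runs once per word, which is consistent with the observation that regex matching rather than dictionary access dominates the cost.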
File Size Overhead from Watermarking
Watermark embedding increases the byte size of the plaintext before encryption. The container format adds its own fixed overhead (header, manifest, wrapped DEK).
Per-Layer Size Impact
| Layer | Mechanism | Size Increase | Example (10 KB doc) |
|---|---|---|---|
| L1 (zero-width) | 66-char frame every 40 visible chars | ~5-8% (UTF-8 encoded, 3 bytes per zero-width char) | ~500-800 bytes |
| L2 (whitespace) | 1 trailing byte per modified line | <0.5% (at most 64 extra bytes) | ~64 bytes |
| L3 (synonyms) | Word replacement (same or similar length) | <0.1% (net change near zero) | ~0-10 bytes |
| L3 (punctuation/spelling) | Character-level substitution | <0.1% | ~0-20 bytes |
L1 is the largest contributor to size overhead because zero-width Unicode characters require 3 bytes each in UTF-8 encoding, and frames are inserted frequently. The combined watermark overhead for all layers is typically 5-9% of the original text size. This overhead is present in the plaintext before encryption; the encrypted container adds a fixed overhead of approximately 200-400 bytes (6-byte magic, 2-byte header, manifest JSON, wrapped DEK JSON, 24-byte AEAD nonce, 16-byte Poly1305 tag).
L3 marks produce negligible size change because they replace words with synonyms of similar length, replace punctuation characters with other punctuation characters, or swap between equally-sized spelling variants.
Fingerprint Computation Cost
Content fingerprinting runs once during seal (to store the fingerprint) and once during attribution (to compare against stored fingerprints). Both the winnowing and sentence hashing algorithms are linear in text length.
Winnowing
The winnowing algorithm normalizes the text (lowercase, collapse whitespace, strip non-alphanumeric), computes an MD5 hash of every k-gram (k=10 by default), and selects the minimum hash in each window (W=4 by default). The dominant cost is the hash computation: one MD5 call per k-gram position.
MD5 is fast (approximately 500 MB/s on modern hardware), but each k-gram is only 10 characters, so per-call overhead rather than raw hash throughput is likely to dominate. For a 10 KB document (approximately 8,000 normalized characters), winnowing computes approximately 8,000 MD5 hashes of 10-byte inputs. Expected time: on the order of 1-5 milliseconds. For comparison, the measured cost of the full content fingerprint at 10 KB is 32.0 ms (see Measured Results Summary).
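The steps described above can be sketched as follows. The normalization regex and the truncation of each MD5 digest to 64 bits are assumptions made for a compact example; the defaults k=10 and W=4 come from the text.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything that is not a letter or digit (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def winnow(text: str, k: int = 10, w: int = 4) -> set:
    """Return the set of per-window minimum k-gram hashes."""
    norm = normalize(text)
    # One MD5 call per k-gram position; keep the first 8 bytes as an integer.
    hashes = [
        int.from_bytes(hashlib.md5(norm[i:i + k].encode("ascii")).digest()[:8], "big")
        for i in range(len(norm) - k + 1)
    ]
    if not hashes:
        return set()  # text shorter than one k-gram
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


fingerprint = winnow("The quick brown fox jumps over the lazy dog.")
print(len(fingerprint), "selected hashes")
```

Because normalization removes case and punctuation, superficial edits to a leaked copy leave the selected hashes unchanged.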
Sentence Hashing
Sentence hashing splits the text on sentence boundaries, extracts content words (length greater than 2) from each sentence, sorts them, and computes a SHA-256 hash of the sorted content. For a 10 KB document with approximately 50 sentences, this requires 50 SHA-256 computations of short inputs. Expected time: well under 1 millisecond.
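A minimal sketch of sentence hashing under stated assumptions: sentences are split after `.`, `!`, or `?` followed by whitespace, words are lowercased alphanumeric runs, and the sorted content words are joined with single spaces before hashing. The real boundary detection and serialization may differ.

```python
import hashlib
import re

_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")  # assumed boundary rule
_WORD = re.compile(r"[a-z0-9]+")


def sentence_hashes(text: str) -> set:
    """Hash the sorted content words (length > 2) of each sentence with SHA-256."""
    hashes = set()
    for sentence in _SENTENCE_SPLIT.split(text.strip()):
        words = sorted(w for w in _WORD.findall(sentence.lower()) if len(w) > 2)
        if words:  # skip sentences with no content words
            digest = hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()
            hashes.add(digest)
    return hashes


print(len(sentence_hashes("The quick fox jumps. It is OK.")))
```

Sorting the words makes the hash insensitive to word order within a sentence, so a reordered sentence still matches.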
Similarity Comparison
Fingerprint comparison at attribution time is fast: winnowing similarity is a set intersection/union computation (O(n) with hash sets, O(n log n) if the sets must first be sorted), and sentence similarity is a set membership check (O(n) with a hash set). For typical fingerprints (100-300 winnowing hashes, 30-100 sentence hashes), comparison takes microseconds.
Comparing one leaked document against N stored fingerprints scales linearly in N. For registries with thousands of sealed documents, this is well within interactive response times. For million-document registries, a locality-sensitive hashing index would be advisable (planned for a future version).
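The comparison and the linear scan over N stored fingerprints can be sketched with Python sets. The function names and the in-memory dictionary standing in for the registry are hypothetical; Jaccard similarity is used here as one common intersection/union measure, which may not be the exact score Oversight computes.

```python
def jaccard(a: set, b: set) -> float:
    """Intersection over union of two hash sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sentence_overlap(leaked: set, stored: set) -> float:
    """Fraction of leaked sentence hashes present in the stored set."""
    if not leaked:
        return 0.0
    return sum(1 for h in leaked if h in stored) / len(leaked)


def rank_candidates(leaked_fp: set, registry: dict, top: int = 5) -> list:
    """Score the leaked fingerprint against every stored one: O(N) comparisons."""
    scored = [(jaccard(leaked_fp, fp), doc_id) for doc_id, fp in registry.items()]
    scored.sort(reverse=True)
    return scored[:top]


registry = {"doc-a": {1, 2, 3}, "doc-b": {3, 4, 5}, "doc-c": {9}}
print(rank_candidates({1, 2, 3}, registry))
```

Each comparison is cheap, so the scan cost is driven by N; an index becomes worthwhile only when N is very large.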
Cross-Language Performance: Python vs Rust
The Python reference implementation prioritizes correctness and readability. The Rust port prioritizes performance and memory safety. Both produce bit-identical output for the same inputs (verified by 3 cross-language conformance tests), but their runtime characteristics differ substantially.
Expected Performance Differences
| Operation | Python (reference) | Rust (port) | Expected Speedup |
|---|---|---|---|
| Seal (no watermark, 10 KB) | ~2-5 ms | ~0.3-0.5 ms | 5-10x |
| Open (10 KB) | ~2-5 ms | ~0.3-0.5 ms | 5-10x |
| L3 semantic embed (10 KB) | ~5-10 ms | ~0.5-1 ms | 10-20x |
| Winnowing fingerprint (10 KB) | ~2-5 ms | ~0.1-0.3 ms | 10-30x |
| Container parsing | ~0.5-1 ms | ~10-50 us | 20-50x |
The Python implementation's performance is adequate for all practical use cases. A 10 KB document seals in under 15 ms with full watermarking. The Rust implementation is faster due to zero-copy parsing, compiled regex, and direct use of RustCrypto primitives without Python's FFI overhead.
The primary bottleneck in the Python implementation is not cryptography (which is
backed by OpenSSL via the cryptography library and libsodium via
PyNaCl) but rather text processing: regex matching for synonym lookup,
string concatenation for watermark frame insertion, and JSON serialization for the
manifest. In Python these operations repeatedly cross the Python/C boundary, an
overhead that native Rust string processing does not incur.
Memory Usage
Both implementations hold the full plaintext in memory during seal and open. The Python implementation has higher baseline memory usage due to the interpreter and string interning. The Rust implementation uses approximately 2-3x the plaintext size in peak memory (plaintext + watermarked copy + ciphertext buffer). For documents under 100 MB, memory usage is not a practical concern in either implementation.
Registry Query Latency
The attribution registry (FastAPI + SQLite) introduces network latency for two
operations: seal-time registration (POST /register) and attribution-time
queries (POST /attribute, GET /marks).
Registration is optional and occurs after the sealed file is written. The HTTP POST contains the manifest, watermark references, beacon tokens, and optionally the content fingerprint. For a typical registration payload (2-5 KB of JSON), the server-side processing time is dominated by SQLite inserts (under 1 ms with WAL mode). Total latency is determined by network round-trip time.
Attribution queries are similarly lightweight on the server side. The
POST /attribute endpoint performs an indexed SQLite lookup on
mark_id (O(log n) in the number of registered marks). The
GET /marks endpoint returns all known mark_ids, which may be large
for registries with many sealed files. Pagination is advisable for registries
exceeding 10,000 sealed files.
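The indexed lookup and a paginated listing can be sketched with the standard `sqlite3` module. The table name, columns, and hex-string encoding of mark_id are hypothetical, since the registry schema is not given on this page; an in-memory database is used so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real registry would use a file on disk
# A PRIMARY KEY creates an index, so lookups by mark_id are O(log n).
conn.execute("CREATE TABLE marks (mark_id TEXT PRIMARY KEY, file_id TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?)",
    [(f"{i:016x}", f"file-{i}") for i in range(25)],
)


def lookup_mark(conn, mark_id):
    """Return the file_id registered for mark_id, or None if unknown."""
    row = conn.execute(
        "SELECT file_id FROM marks WHERE mark_id = ?", (mark_id,)
    ).fetchone()
    return row[0] if row else None


def list_marks(conn, limit=10, after=""):
    """Return one page of mark_ids, starting after the last id of the previous page."""
    rows = conn.execute(
        "SELECT mark_id FROM marks WHERE mark_id > ? ORDER BY mark_id LIMIT ?",
        (after, limit),
    ).fetchall()
    return [r[0] for r in rows]


page1 = list_marks(conn)
page2 = list_marks(conn, after=page1[-1])
```

Keyset pagination (filtering on the last seen key) is used instead of OFFSET so that each page remains an index range scan regardless of how deep the client has paged.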
The registry is intentionally simple (no caching layer, no distributed backend) in the current version. For production deployments handling hundreds of concurrent attribution queries, the planned v1.0 Rust/Axum port with SQLx connection pooling would provide substantially higher throughput.
Bottleneck Summary
| Workflow | Bottleneck | Typical Latency |
|---|---|---|
| Seal (no watermark) | X25519 key agreement | <1 ms |
| Seal (with watermark) | L3 semantic embedding (regex passes) | 5-15 ms (10 KB doc) |
| Open | Ed25519 verify + X25519 + AEAD decrypt | <1 ms |
| Inspect | Container parse + JSON deserialize | <1 ms |
| Attribute (L1+L2 only) | Text scanning for zero-width frames | <5 ms |
| Attribute (full pipeline) | L3 verification against N candidates | 5-20 ms per candidate |
| Fingerprint comparison | Winnowing hash computation | 2-10 ms per document pair |
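For documents of around 10 KB, the protocol's computational costs are well below interactive latency thresholds (100 ms). At 100 KB and above, L3 embedding and content fingerprinting exceed that threshold (see Measured Results Summary). Network latency to the registry and timestamp authorities dominates end-to-end seal time for workflows that include registration and RFC 3161 timestamping.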
Measured Results Summary (v0.4.4)
The following table summarizes actual measurements from bench_usenix.py
run on the reference hardware described above. Each value is the mean of 10 runs.
| Operation | 1 KB | 10 KB | 100 KB | 1 MB |
|---|---|---|---|---|
| Seal (no watermark) | 297 us | 325 us | 627 us | 4.07 ms |
| Seal (with watermark) | 305 us | 471 us | 2.76 ms | 23.78 ms |
| Open (decrypt + verify) | 272 us | 301 us | 576 us | 3.93 ms |
| L1 embed (zero-width) | 230 us | 1.88 ms | 19.52 ms | 213 ms |
| L2 embed (whitespace) | 21 us | 66 us | 401 us | 3.72 ms |
| L3 embed (semantic) | 1.39 ms | 12.49 ms | 122 ms | 1.21 s |
| Content fingerprint | 3.37 ms | 32.0 ms | 321 ms | 3.35 s |
| L3 verify (correct ID) | 961 us | 9.10 ms | 90.5 ms | 986 ms |
| ECC encode (R=7, 64-bit) | 23.6 us (constant) | | | |
| ECC decode (R=7, 64-bit) | 50.8 us (constant) | | | |
Peak throughput for seal and open is approximately 253 MB/s at the 1 MB level,
dominated by XChaCha20-Poly1305 AEAD. Watermark embedding adds 484% overhead to seal at
1 MB (23.78 ms versus 4.07 ms). Among the standalone embedding layers, L3 semantic
processing (regex-based synonym matching across 151 classes) accounts for 85% of the
combined L1+L2+L3 time. Content fingerprinting via winnowing is the most
expensive per-byte operation at 3.35 seconds per megabyte. Full benchmark data and
methodology are in bench_usenix.py and
PERFORMANCE_BENCHMARKS.md in the repository.
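The derived figures can be checked directly against the 1 MB column of the table. The sketch below assumes 1 MB means 10^6 bytes and works from the rounded table values, so the throughput comes out near, rather than exactly at, the quoted 253 MB/s.

```python
# 1 MB column of the measured results table, in milliseconds.
seal_plain_ms = 4.07
seal_marked_ms = 23.78
open_ms = 3.93
embed_ms = {"L1": 213.0, "L2": 3.72, "L3": 1210.0}
size_mb = 1.0  # assumption: 1 MB = 10**6 bytes

# Relative cost of watermarking during seal.
overhead_pct = (seal_marked_ms - seal_plain_ms) / seal_plain_ms * 100
# L3's share of the standalone L1 + L2 + L3 embedding time.
l3_share_pct = embed_ms["L3"] / sum(embed_ms.values()) * 100
# Throughput = size / time.
seal_mb_per_s = size_mb / (seal_plain_ms / 1000)
open_mb_per_s = size_mb / (open_ms / 1000)

print(f"watermark overhead: {overhead_pct:.0f}%")
print(f"L3 share of embedding: {l3_share_pct:.0f}%")
print(f"seal: {seal_mb_per_s:.0f} MB/s, open: {open_mb_per_s:.0f} MB/s")
```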