A Year of Hardening

2026-05-30 · Zion Boggan · ~9 min read

It is my birthday, so I let myself do the thing I usually avoid: stop and look back. Not at features. At the security round that sits underneath them. A birthday is a year marker, and the honest way to mark a year on a project like this is to ask whether it lies less than it did before. Oversight is a protocol for proving where data came from and noticing when it leaks. If it cannot be trusted when storage, parsing, logs, and release pipelines go sideways, then the diagram on the homepage is decoration. The work this round was about removing decoration.

This note collects that round in one place: what changed across the protocol core and the mobile verifier, and why each piece mattered. A lot of it is unglamorous. That is the point. A security protocol earns trust by refusing to fail quietly, and most of the engineering that buys that refusal is small, specific, and boring.

The parser stopped trusting its own arithmetic

The sealed-container format is the first thing an attacker touches. Every byte of a bundle is hostile until proven otherwise, and the code that walks those bytes has to assume the length fields are lies. Two fixes this round came from exactly that posture.

Earlier in the round the ciphertext size ceiling, MAX_CIPHERTEXT_BYTES, was written as a literal that overflowed at const-eval time on 32-bit targets, which quietly blocked 32-bit Android and iOS builds. That got split by pointer width so the bound is 4 GiB on 64-bit and capped at usize::MAX on 32-bit, where it lands just under 4 GiB anyway. The sealed-parser follow-up tightened the surrounding length checks in the same spirit.
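Here is a minimal Rust sketch of the pointer-width split. The constant name comes from the text, and the 64-bit value assumes "4 GiB" means 2^32 bytes. `check_ciphertext_len` and `LengthError` are hypothetical names, not the core crate's API.

```rust
use std::convert::TryFrom;

/// Ciphertext ceiling, split by pointer width. On 64-bit targets 4 GiB fits in
/// usize. On 32-bit targets the same literal overflows during const evaluation,
/// so the ceiling is capped at usize::MAX (4 GiB minus one byte).
#[cfg(target_pointer_width = "64")]
pub const MAX_CIPHERTEXT_BYTES: usize = 4 * 1024 * 1024 * 1024;
#[cfg(not(target_pointer_width = "64"))]
pub const MAX_CIPHERTEXT_BYTES: usize = usize::MAX;

#[derive(Debug, PartialEq)]
pub enum LengthError {
    TooLarge(u64),
}

/// Hypothetical helper: accept a declared length only if it fits the ceiling on
/// the current target. On 32-bit, try_from fails when the u64 exceeds usize.
pub fn check_ciphertext_len(declared: u64) -> Result<usize, LengthError> {
    match usize::try_from(declared) {
        Ok(n) if n <= MAX_CIPHERTEXT_BYTES => Ok(n),
        _ => Err(LengthError::TooLarge(declared)),
    }
}
```

A literal that compiles on a 64-bit host is a const-evaluation error on a 32-bit one. That is why the split has to happen at the `cfg` level and cannot be handled inside the function.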

The most recent fix is the one I like best, because it is the kind of bug that hides in a function everyone has already read. The container reader advanced its cursor with *at + n and sliced with &buf[*at..*at + n], where n comes straight from an attacker-controlled u32 length. On a 64-bit host that addition cannot overflow, so the code looked correct under every test that ran on a normal machine. On a 32-bit target it can wrap, and a wrapped end offset slips past the bounds check and then panics on the slice. The mobile verifier ships a 32-bit ARM build, and it parses bundles handed to it by other people. So this was a real crash path reachable from a malicious file on a real device, invisible on the machine where the tests pass. The reader now computes the end offset with checked addition and treats a wrap as truncation, which is what a wrap is. Valid bundles behave exactly as before, the seventeen container tests still pass, and the cross-language conformance run still shows Python and Rust producing byte-identical output.
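The following is a minimal sketch of the checked-offset pattern. It assumes a hypothetical framing, a big-endian u32 length prefix followed by that many bytes, and the real container layout may differ. `read_chunk` and `ReadError` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The declared length runs past the end of the buffer, or the end offset wraps.
    Truncated,
}

/// Read a u32 length prefix and that many bytes starting at `*at`, advancing
/// `*at` only on success. The framing is hypothetical.
pub fn read_chunk<'a>(buf: &'a [u8], at: &mut usize) -> Result<&'a [u8], ReadError> {
    let len_end = at.checked_add(4).ok_or(ReadError::Truncated)?;
    let len_bytes = buf.get(*at..len_end).ok_or(ReadError::Truncated)?;
    // Attacker-controlled length; lossless widening on 32- and 64-bit targets.
    let n = u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    // The fix: a wrapped end offset is truncation, never something to slice with.
    let end = len_end.checked_add(n).ok_or(ReadError::Truncated)?;
    // `get` also returns None for an out-of-bounds or inverted range instead of panicking.
    let chunk = buf.get(len_end..end).ok_or(ReadError::Truncated)?;
    *at = end;
    Ok(chunk)
}
```

`slice::get` would already refuse an inverted range, so the sketch is belt-and-braces. The `checked_add` is the part that encodes the rule that a wrap is truncation.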

That last sentence is the whole discipline in one line. A parser fix that changes the wire format is not a fix, it is a new bug with a confident commit message. Conformance is the gate.

The registry learned to fail closed

The Rust registry was the long thread of the round. It moved from serving the v1 surface to surviving the operational reality of an evidence service: migration from the original Python registry, corruption, and partial writes. The shape that emerged has three interlocking rules.

First, append before store. Register, HTTP beacon, DNS beacon, OCSP-style beacon, and license-style beacon writes now fail if the local transparency log cannot append the matching leaf. Before this change, a service could write a database row even when writing its audit leaf failed. That produces a split-brain evidence state: a row that looks real but cannot be proven. When the append fails, returning an internal error is the correct outcome.
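The ordering rule is small enough to sketch. This is a minimal version under stated assumptions: `TransparencyLog`, `EventStore`, and `record_event` are hypothetical stand-ins, not the registry's types, and every failure is collapsed into one opaque internal error.

```rust
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    /// Surfaced to the client as an opaque internal error.
    Internal,
}

/// Hypothetical stand-in for the local transparency log.
pub trait TransparencyLog {
    fn append(&mut self, leaf: &[u8]) -> Result<u64, ()>;
}

/// Hypothetical stand-in for the registry database.
pub trait EventStore {
    fn insert(&mut self, log_index: u64, row: &str) -> Result<(), ()>;
}

/// Append before store: the row is written only after its leaf exists, and the
/// row records the leaf's index so the two can be cross-checked later.
pub fn record_event(
    log: &mut impl TransparencyLog,
    db: &mut impl EventStore,
    leaf: &[u8],
    row: &str,
) -> Result<u64, RegistryError> {
    let index = log.append(leaf).map_err(|_| RegistryError::Internal)?;
    db.insert(index, row).map_err(|_| RegistryError::Internal)?;
    Ok(index)
}
```

In this sketch, a database failure after a successful append leaves a leaf with no row. One way to read that trade-off is that an orphan leaf is still provable history, whereas the old failure mode left a row that nothing could prove.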

Second, rows must point at matching leaves. The database validator checks the relationships that matter after a migration, then goes further than existence: an event row's transparency-log index must point at a leaf whose payload actually matches that row. An in-range index is not enough. If row 24 claims a beacon fired for token A but leaf 24 records a beacon for token B, the database is not clean, whatever the cause. The validator compares event kind, token, file and recipient bindings, source IP, user agent, timestamp, and the DNS sidecar fields against the indexed leaf.
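A sketch of the row-to-leaf comparison, assuming both sides have already been decoded into the same struct. The field names are hypothetical. The set follows the list in the paragraph above, with the DNS sidecar fields collapsed into a single optional query string.

```rust
/// Fields compared between an event row and its indexed leaf (names hypothetical).
#[derive(Debug, PartialEq)]
pub struct EventFields {
    pub kind: String,
    pub token: String,
    pub file_id: Option<String>,
    pub recipient_id: Option<String>,
    pub source_ip: String,
    pub user_agent: String,
    pub timestamp: i64,
    pub dns_query: Option<String>,
}

pub struct EventRow {
    pub log_index: usize,
    pub fields: EventFields,
}

#[derive(Debug, PartialEq)]
pub enum Finding {
    IndexOutOfRange { log_index: usize },
    LeafMismatch { log_index: usize, field: &'static str },
}

/// An in-range index is not enough: the leaf it names must carry the same payload.
pub fn validate_row(row: &EventRow, leaves: &[EventFields]) -> Result<(), Finding> {
    let leaf = leaves
        .get(row.log_index)
        .ok_or(Finding::IndexOutOfRange { log_index: row.log_index })?;
    let r = &row.fields;
    let checks = [
        ("kind", r.kind == leaf.kind),
        ("token", r.token == leaf.token),
        ("file", r.file_id == leaf.file_id),
        ("recipient", r.recipient_id == leaf.recipient_id),
        ("source_ip", r.source_ip == leaf.source_ip),
        ("user_agent", r.user_agent == leaf.user_agent),
        ("timestamp", r.timestamp == leaf.timestamp),
        ("dns", r.dns_query == leaf.dns_query),
    ];
    for (field, ok) in checks {
        if !ok {
            return Err(Finding::LeafMismatch { log_index: row.log_index, field });
        }
    }
    Ok(())
}
```

The row-24 example from the text becomes a `LeafMismatch` on `token` at index 24, not a pass.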

Third, recovered logs must verify against themselves. The transparency log used to recover permissively at startup, skipping malformed lines or bad hashes. That is the wrong posture for an append-only log. Recovery now rejects malformed records, non-contiguous indexes, bad hash lengths, and leaf-hash mismatches. New leaf records carry leaf_data_hex, the exact bytes used to compute SHA-256(0x00 || leaf_bytes), so a monitor can recompute the RFC 6962 leaf hash from exact bytes instead of trusting a lossy display field. The range reads on top of the log were hardened too, and the error envelopes across the registry were aligned so that a client cannot read internal structure out of the way a request failed.
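This sketch covers strict recovery and the RFC 6962 leaf hash. To stay crate-free it includes a compact, unoptimized SHA-256; a real build would use an audited crate such as `sha2`. `Record`, `recover`, and the error variants are hypothetical. The sketch assumes each line has already been parsed (with malformed ones rejected) and that `leaf_data_hex` has been hex-decoded.

```rust
/// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Compact, unoptimized SHA-256 so the sketch needs no external crates.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 64];
        for t in 0..16 {
            w[t] = u32::from_be_bytes([block[4 * t], block[4 * t + 1], block[4 * t + 2], block[4 * t + 3]]);
        }
        for t in 16..64 {
            let s0 = w[t - 15].rotate_right(7) ^ w[t - 15].rotate_right(18) ^ (w[t - 15] >> 3);
            let s1 = w[t - 2].rotate_right(17) ^ w[t - 2].rotate_right(19) ^ (w[t - 2] >> 10);
            w[t] = w[t - 16].wrapping_add(s0).wrapping_add(w[t - 7]).wrapping_add(s1);
        }
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for t in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[t]).wrapping_add(w[t]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(t2);
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(y);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// RFC 6962 leaf hash: SHA-256 over a 0x00 domain-separation byte and the exact leaf bytes.
pub fn leaf_hash(leaf_bytes: &[u8]) -> [u8; 32] {
    let mut prefixed = Vec::with_capacity(leaf_bytes.len() + 1);
    prefixed.push(0x00);
    prefixed.extend_from_slice(leaf_bytes);
    sha256(&prefixed)
}

/// One recovered log record after parsing and hex decoding (names hypothetical).
pub struct Record { pub index: u64, pub stored_hash: Vec<u8>, pub leaf_data: Vec<u8> }

#[derive(Debug, PartialEq)]
pub enum RecoveryError {
    NonContiguous { expected: u64, found: u64 },
    BadHashLength { index: u64, len: usize },
    LeafHashMismatch { index: u64 },
}

/// Strict recovery: any defect stops startup instead of being skipped.
pub fn recover(records: &[Record]) -> Result<u64, RecoveryError> {
    for (expected, rec) in (0u64..).zip(records) {
        if rec.index != expected {
            return Err(RecoveryError::NonContiguous { expected, found: rec.index });
        }
        if rec.stored_hash.len() != 32 {
            return Err(RecoveryError::BadHashLength { index: rec.index, len: rec.stored_hash.len() });
        }
        if rec.stored_hash.as_slice() != &leaf_hash(&rec.leaf_data)[..] {
            return Err(RecoveryError::LeafHashMismatch { index: rec.index });
        }
    }
    Ok(records.len() as u64)
}
```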

Operator authentication on the write side was brought into parity between the Python and Rust paths during the migration work, so the write surface enforces the same token rule regardless of which backend is live.

The container and the dependency surface

Two smaller protocol-side fixes closed out the round. The registry Docker image ran as root; a process that handles signed evidence and listens on a port should not. It now runs as an unprivileged user that owns its data volume, so a registry compromise lands without root inside the container. And the DNS sidecar imported dnslib without declaring it, which is the sort of gap that works on the machine where it was written and fails on a clean install. It now has a declared optional-dependency group.

Dependencies got attention at the edges as well. The rustls-webpki bump pulled in an upstream security fix on the TLS verification path, and the mobile Rust core is pinned to a specific Oversight release tag rather than a floating branch, so the verifier on a phone is built from exactly the crates that were audited under that tag.

The mobile verifier got privacy guardrails

The mobile repo changed in ways that are deliberately invisible to a user. That is the correct kind of invisible. Recent verification history is session-only now, and the app clears the old persisted history key on boot and after verification. Issuer IDs, filenames, and content-hash summaries should not sit in a phone backup just because someone verified a bundle once.
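A minimal sketch of the session-only pattern. This note does not show the app's storage layer, so `KeyValueStore`, `LEGACY_HISTORY_KEY`, and `SessionHistory` are all hypothetical. What the sketch illustrates is the shape: scrub the old key at boot and after each verification, and keep history in memory only.

```rust
use std::collections::HashMap;

/// Hypothetical key under which older builds persisted the history list.
pub const LEGACY_HISTORY_KEY: &str = "recent_verifications";

/// Stand-in for the platform key-value store; hypothetical.
pub trait KeyValueStore {
    fn delete(&mut self, key: &str);
}

impl KeyValueStore for HashMap<String, String> {
    fn delete(&mut self, key: &str) {
        self.remove(key);
    }
}

/// Recent verifications live only in memory for the current session.
pub struct SessionHistory {
    entries: Vec<String>,
    cap: usize,
}

impl SessionHistory {
    /// At boot, scrub whatever an older build persisted, then start empty.
    pub fn on_boot(store: &mut impl KeyValueStore, cap: usize) -> Self {
        store.delete(LEGACY_HISTORY_KEY);
        SessionHistory { entries: Vec::new(), cap }
    }

    /// After a verification, scrub again, then remember the summary in memory only.
    pub fn record(&mut self, store: &mut impl KeyValueStore, summary: String) {
        store.delete(LEGACY_HISTORY_KEY);
        self.entries.insert(0, summary);
        self.entries.truncate(self.cap);
    }

    pub fn entries(&self) -> &[String] {
        &self.entries
    }
}
```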

The verifier's privacy claim is narrow and I want it to stay narrow: no accounts, no telemetry, no server-side verification requirement, all checking done on the device. A claim like that drifts the moment a well-meaning dependency adds an analytics SDK. So CI now runs scripts/privacy_guard.py on both mobile workflows, and it fails the build if a release manifest requests network, location, microphone, or contacts access, or if a telemetry or crash SDK appears in pubspec.yaml. The claim is now enforced by the pipeline, not by my memory.
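The guard itself is a Python script. To keep one language across these sketches, here is the same rule expressed in Rust. It is not a port of scripts/privacy_guard.py. The deny-lists are illustrative assumptions drawn from real Android permission names, real iOS Info.plist usage-description keys, and real Flutter package names, and they are not the script's actual lists.

```rust
/// Illustrative deny-lists; the real script's lists may differ.
const FORBIDDEN_MANIFEST_ENTRIES: &[&str] = &[
    "android.permission.INTERNET",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_CONTACTS",
    "NSLocationWhenInUseUsageDescription",
    "NSMicrophoneUsageDescription",
    "NSContactsUsageDescription",
];
const FORBIDDEN_PACKAGES: &[&str] = &["firebase_analytics", "firebase_crashlytics", "sentry_flutter"];

/// Collect every violation in a release manifest (AndroidManifest.xml or
/// Info.plist text) and a pubspec.yaml. An empty result means the build may proceed.
pub fn privacy_violations(manifest: &str, pubspec: &str) -> Vec<String> {
    let mut violations = Vec::new();
    for &entry in FORBIDDEN_MANIFEST_ENTRIES {
        // Plain substring match: coarse, but it errs toward failing the build.
        if manifest.contains(entry) {
            violations.push(format!("manifest requests {entry}"));
        }
    }
    for line in pubspec.lines() {
        // A dependency line looks like `  name: ^1.2.3`; take the key before the colon.
        let key = line.trim().split(':').next().unwrap_or("");
        if FORBIDDEN_PACKAGES.contains(&key) {
            violations.push(format!("pubspec declares {key}"));
        }
    }
    violations
}
```

CI would fail the job whenever the returned list is non-empty.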

Build provenance moved in the same round. Android and iOS CI emit an oversight-mobile-build-manifest.json next to each artifact, recording commit, ref, lockfile hashes, toolchain settings, and the artifact paths, sizes, and SHA-256 hashes, and a verification step checks that manifest in CI. This is not yet a byte-for-byte reproducible build. The manifest is the bridge artifact until that lands, and it means a release carries a signed-in-spirit record of what produced it.

Secrets stopped lingering on the build runners

Signing material is the highest-value secret a mobile project handles, because a stolen signing key lets someone publish as you. The iOS job already locked its App Store key down with a tight umask, explicit chmod 600, and a scrub step. The Android job did not. On a runner with a permissive umask, the decoded upload keystore and the plaintext key.properties holding the store and key passwords were group- and world-readable, and both files were left on disk after the build. The Android job now matches the iOS posture: restrictive permissions on the keystore, the keystore directory, and the properties file, and an always-run step that removes both when the build finishes. The window where signing material sits readable on a shared runner is closed.
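The CI job does this in shell, but the posture translates directly to other code. Below is a minimal Unix-only Rust sketch in which `write_secret` and `ScrubOnDrop` are hypothetical names. It sets owner-only directory and file modes explicitly and uses a drop guard as a stand-in for the always-run cleanup step.

```rust
use std::fs::{self, DirBuilder, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt, PermissionsExt};
use std::path::{Path, PathBuf};

/// Removes the secret file when dropped, standing in for an always-run cleanup step.
pub struct ScrubOnDrop(PathBuf);

impl Drop for ScrubOnDrop {
    fn drop(&mut self) {
        // Ignore errors: the file may already be gone.
        let _ = fs::remove_file(&self.0);
    }
}

/// Write secret bytes to `dir/name` with owner-only permissions at both levels.
pub fn write_secret(dir: &Path, name: &str, contents: &[u8]) -> io::Result<ScrubOnDrop> {
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;
    // A recursive create leaves an existing directory's mode alone, so tighten it explicitly.
    fs::set_permissions(dir, fs::Permissions::from_mode(0o700))?;
    let path = dir.join(name);
    // umask can only clear bits, so mode 0o600 never grants group or world access.
    let mut file = OpenOptions::new().write(true).create_new(true).mode(0o600).open(&path)?;
    // Arm the guard first so a failure below still scrubs the file.
    let guard = ScrubOnDrop(path);
    // Explicit chmod 600, mirroring the CI step.
    file.set_permissions(fs::Permissions::from_mode(0o600))?;
    file.write_all(contents)?;
    Ok(guard)
}
```

A drop guard is not a perfect stand-in for an always-run CI step, because it does not run if the process is killed. That is why the CI cleanup belongs in a step the runner executes regardless of the job's outcome.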

Opsec of the source itself

One thread this round was about the repository rather than the running code. An opsec scanner, a pre-commit hook, and a CI workflow now look for private IPs, workspace paths, container IDs, and credential patterns before anything reaches the public tree, and a source comment-style check keeps the committed code from carrying the kind of prose that fingerprints how and where it was written. A protocol whose whole value is provenance should not leak its own operational provenance through careless commits.
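Here is one rule of such a scanner, sketched in Rust under stated assumptions. `find_internal_ipv4` is a hypothetical name, and it covers only IPv4 literals. It relies on std's `Ipv4Addr::is_private`, `is_loopback`, and `is_link_local` rather than a hand-written regex. Workspace paths, container IDs, and credential patterns would each need their own rule.

```rust
use std::net::Ipv4Addr;

/// Report (line number, address) for every private, loopback, or link-local
/// IPv4 literal in `text`.
pub fn find_internal_ipv4(text: &str) -> Vec<(usize, Ipv4Addr)> {
    let mut hits = Vec::new();
    for (i, line) in text.lines().enumerate() {
        // Split on anything that cannot appear in a dotted quad, so `10.0.0.5:8080`
        // and `"192.168.1.1"` both yield a clean candidate token.
        for token in line.split(|c: char| !(c.is_ascii_digit() || c == '.')) {
            // Trim a sentence-ending period before parsing.
            if let Ok(addr) = token.trim_matches('.').parse::<Ipv4Addr>() {
                if addr.is_private() || addr.is_loopback() || addr.is_link_local() {
                    hits.push((i + 1, addr));
                }
            }
        }
    }
    hits
}
```

A pre-commit hook would refuse the commit whenever this returns anything.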

Where this leaves things

The tagged release remains v0.4.11, the hardware-key line that runs across the Python reference, the Rust core, and the browser inspector. Everything in this note lives on main after that tag, pointed at the same gate I have been describing for weeks: registry burn-in, then a wire-format stability statement, then a v1.0 only when the implementation has earned it rather than when the calendar suggests it.

If there is a theme to a year of this, it is that the interesting work was almost never the feature. It was the bounds check that only fails on the architecture nobody tests on, the database row that points at the wrong leaf, the password file with the wrong permissions, the history list that quietly survives a reboot. None of it demos well. All of it is the difference between a protocol that claims to be trustworthy and one that behaves that way when something breaks. That is a decent thing to have spent a year on, and a good thing to spend a birthday writing down.

Relevant public references: the protocol repository at oversight-protocol/oversight, the mobile verifier at oversight-protocol/oversight-mobile, docs/REGISTRY_DEPLOYMENT.md, docs/spec/registry-v1.md, and docs/ROADMAP.md.