Replacing the Custom Merkle Log with Sigstore Rekor v2

Oversight v0.4.1 ships with its own transparency log: an RFC 6962 Merkle tree implementation that records every seal event with signed tree heads and inclusion proofs. It works. The implementation is correct (after fixing the left-heavy split bug in v0.4.0). But it has a fundamental problem that I cannot engineer around: nobody trusts my log server.

When I present a Merkle inclusion proof, an auditor has to trust that my log operator did not tamper with the tree. Consistency proofs help detect tampering after the fact, but only if the auditor was already watching. A newly arriving auditor has no baseline to compare against. They are trusting that I operated honestly from day one, and there is no independent evidence for that claim.

Sigstore's Rekor v2, which went GA in October 2025, solves this. Rekor is a publicly operated, community-monitored transparency log. Multiple independent monitors watch the log continuously. The signed tree heads are distributed via TUF (The Update Framework), a root-of-trust system designed to survive key compromise. When an Oversight seal event is recorded in Rekor, any auditor can verify its inclusion using standard Sigstore tooling without installing Oversight or trusting my infrastructure.

Why DSSE, not hashedrekord

Rekor v2 accepts two entry types: hashedrekord and dsse. A hashedrekord entry proves that a specific key signed a specific digest. That is not what Oversight needs to record. An Oversight registration event is a structured attestation: issuer K asserts that mark_id M maps to file_id F with content_hash H, bound to recipient R, under suite S, with policy constraints P. This is a statement about relationships, not a bare signature over a hash.

DSSE (Dead Simple Signing Envelope) wraps an in-toto v1 statement with a typed predicate. The predicate carries the full registration metadata. The envelope is signed by the issuer's Ed25519 key, the same key that already signs the seal manifest. No new key material is required. No Fulcio or OIDC integration is needed; Oversight uses Rekor's self-managed key mode.

The predicate type is pinned to a git-tagged URI in the Oversight repository so it resolves to documentation and cannot be squatted by a third party. The payload looks like this:

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "mark:<mark_id>",
    "digest": {"sha256": "<content_hash>"}
  }],
  "predicateType": "https://github.com/oversight-protocol/oversight/blob/v0.5.0/docs/predicates/registration-v1.md",
  "predicate": {
    "predicate_version": 1,
    "file_id": "<uuid>",
    "issuer_pubkey_ed25519": "<base64>",
    "recipient_pubkey_sha256": "<hex>",
    "suite": "OSGT-CLASSIC-v1",
    "watermarks": {"L1": true, "L2": true, "L3": true},
    "registered_at": "<iso8601>"
  }
}
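To make the signing step concrete, here is a minimal sketch of how that statement could be wrapped in a DSSE envelope. The pre-authentication encoding (PAE) and the application/vnd.in-toto+json payload type come from the DSSE and in-toto specifications. The function names, the key_id argument, and the sign callback are hypothetical stand-ins for the issuer's Ed25519 signer, not the actual oversight_core/rekor.py interface.

```python
import base64
import json
from typing import Callable

PAYLOAD_TYPE = "application/vnd.in-toto+json"  # payload type for in-toto statements
PREDICATE_TYPE = (
    "https://github.com/oversight-protocol/oversight/blob/v0.5.0/"
    "docs/predicates/registration-v1.md"
)


def build_statement(mark_id: str, content_hash: str, predicate: dict) -> dict:
    """Assemble the in-toto v1 statement with the registration predicate."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": f"mark:{mark_id}", "digest": {"sha256": content_hash}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": predicate,
    }


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def make_envelope(statement: dict, key_id: str, sign: Callable[[bytes], bytes]) -> dict:
    """Serialize the statement, sign its PAE, and return the DSSE envelope.

    `sign` is a hypothetical callback wrapping the issuer's Ed25519 key.
    """
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [
            {"keyid": key_id, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
```

The signature covers the PAE bytes rather than the raw payload, which is why a verifier has to rebuild the PAE from payloadType and the decoded payload before checking the Ed25519 signature.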

One critical privacy decision: the predicate carries a SHA-256 hash of the recipient's X25519 public key, not the raw key. Without this, anyone watching the public Rekor log could enumerate recipients by public key or correlate marks across different issuers. The raw recipient key stays in the local .sealed bundle, which only the issuer and recipient possess.
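A small sketch of that hashing step follows. It assumes the digest is taken over the raw 32-byte X25519 public key and rendered as lowercase hex; the exact input encoding is an assumption on my part, and the function name is hypothetical.

```python
import hashlib


def recipient_key_digest(raw_x25519_pubkey: bytes) -> str:
    """Return the hex SHA-256 of a raw X25519 public key.

    Only this digest goes into the public predicate; the raw key stays
    in the local .sealed bundle.
    """
    if len(raw_x25519_pubkey) != 32:
        raise ValueError("X25519 public keys are exactly 32 bytes")
    return hashlib.sha256(raw_x25519_pubkey).hexdigest()
```

Whatever encoding is chosen, Python and Rust have to hash identical bytes, otherwise the same recipient produces two different digests.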

What changes in the bundle format

The evidence bundle (the metadata sidecar that accompanies each sealed file) gains a new tlog_kind field. Pre-v0.5 bundles omit this field or set it to oversight-self-merkle-v1, and verifiers route to the existing tlog.py Merkle tree code. New bundles set tlog_kind to rekor-v2-dsse and include the Rekor-specific fields: the log URL, the base64-encoded TransparencyLogEntry protobuf, the DSSE envelope, the log's public key at write time, and a signed checkpoint (the tree head promoted out of the protobuf so serializers cannot drop it).

An integer bundle_schema field, set to 2 in new bundles, gives pre-v0.5 verifiers a fast rejection path. If a v0.4 client encounters bundle_schema: 2, it fails immediately with "unknown schema version, upgrade" rather than attempting to parse fields it does not understand.
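The routing logic can be sketched as below. The tlog_kind and bundle_schema field names and their values come from the description above; the function name, the return labels, and the choice to treat a missing bundle_schema as 1 are assumptions for illustration.

```python
LEGACY_KIND = "oversight-self-merkle-v1"
REKOR_KIND = "rekor-v2-dsse"
MAX_SUPPORTED_SCHEMA = 2  # what a v0.5 verifier understands


def select_verifier(bundle: dict, max_schema: int = MAX_SUPPORTED_SCHEMA) -> str:
    """Decide which transparency-log verifier should handle a bundle."""
    # Assumption: pre-v0.5 bundles carry no bundle_schema field at all.
    schema = bundle.get("bundle_schema", 1)
    if schema > max_schema:
        raise ValueError("unknown schema version, upgrade")

    # A missing tlog_kind means the bundle predates v0.5.
    kind = bundle.get("tlog_kind", LEGACY_KIND)
    if kind == LEGACY_KIND:
        return "self-merkle"  # existing tlog.py Merkle tree code
    if kind == REKOR_KIND:
        return "rekor"
    raise ValueError(f"unknown tlog_kind: {kind!r}")
```

Calling select_verifier(bundle, max_schema=1) mimics a v0.4 client, which rejects a schema 2 bundle before touching any Rekor-specific field.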

Backward compatibility

Every existing v0.4.1 .sealed file will continue to parse, open, and verify exactly as it does today. The cross-language conformance tests run against v0.4-era fixtures without modification. The local Merkle tree implementation stays in the codebase as a fallback verifier for old bundles; no writes go through it for new seals.

The Python and Rust implementations must agree on the canonical JSON ordering of every new field. JCS (the JSON Canonicalization Scheme, RFC 8785) already enforces this, but the constraint is worth stating explicitly because a divergence here would break cross-language conformance, which is the hardest invariant to restore after the fact.
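As a rough illustration of the ordering constraint, the sketch below approximates canonical JSON with the standard library. It is not a full JCS implementation: it matches JCS for ASCII keys with string, integer, boolean, and nested-object values, but JCS sorts keys by UTF-16 code units and has its own number serialization rules, so a real implementation should use a dedicated JCS library. The function name is hypothetical.

```python
import json


def canonical_json_approx(obj) -> bytes:
    """Approximate canonical JSON: sorted keys, no whitespace, UTF-8 output.

    Not a complete RFC 8785 implementation; see the caveats in the text.
    """
    return json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # NaN and Infinity are not valid JSON
    ).encode("utf-8")
```

Two dictionaries built in a different field order serialize to the same bytes, which is the property the cross-language conformance tests depend on.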

What this buys an auditor

Before v0.5, an auditor verifying an Oversight bundle needs Oversight-specific code. They parse the Merkle proof format, check against the self-hosted log's signed tree head, and trust that the log operator (me) did not retroactively insert or remove entries.

After v0.5, an auditor needs only the sigstore-python library (or Go, Java, etc.) and the DSSE envelope bytes from the bundle. They verify the envelope's Ed25519 signature, check the Rekor inclusion proof against the publicly distributed tree head, and confirm that the entry was logged at the claimed time. No Oversight code. No trust in my log infrastructure. The auditor's verification path goes through Sigstore's community-operated trust infrastructure instead.

This matters most in adversarial contexts. If Oversight is used to attribute a leak and the accused recipient challenges the attribution in a legal or compliance proceeding, the transparency log evidence should not depend on the accuser's infrastructure. Rekor gives us a neutral third party with public monitoring.

Implementation scope

The migration touches both implementations. On the Python side: oversight_core/rekor.py builds DSSE envelopes and handles the Rekor v2 upload via a pure-stdlib HTTP client, with no sigstore-python dependency in the write path. A separate auditor_helper.py wraps sigstore-python for the verify side so auditors can install it independently. On the Rust side: a new oversight-rekor crate mirrors the Python module, using tokio for async upload and the sigstore crate for verify.

The registry server (registry/server.py) replaces its inline tlog append call with a Rekor upload. The SQLite event index stays, because Rekor v2 has removed its search indexing API. The registry is now the only place that answers "list all marks for issuer X." This is an intentional architectural split: Rekor provides tamper-evidence, the registry provides searchability.
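A minimal sketch of the registry side of that split, using sqlite3 from the standard library. The table layout, column names, and function names are hypothetical and are not the actual registry/server.py schema; the point is only that the "list all marks for issuer X" query is answered locally, while each row points back at the Rekor entry that provides tamper-evidence.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local event index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               mark_id       TEXT PRIMARY KEY,
               issuer_pubkey TEXT NOT NULL,
               log_url       TEXT NOT NULL,    -- exact shard used at write time
               log_index     INTEGER NOT NULL  -- position of the entry in that shard
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_events_issuer ON events(issuer_pubkey)"
    )
    return conn


def record_event(conn, mark_id: str, issuer_pubkey: str, log_url: str, log_index: int) -> None:
    """Index a registration after the Rekor upload has succeeded."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            (mark_id, issuer_pubkey, log_url, log_index),
        )


def marks_for_issuer(conn, issuer_pubkey: str) -> list:
    """Answer the query Rekor v2 no longer serves: all marks for one issuer."""
    rows = conn.execute(
        "SELECT mark_id FROM events WHERE issuer_pubkey = ? ORDER BY mark_id",
        (issuer_pubkey,),
    )
    return [row[0] for row in rows]
```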

Three new tests bring the total from 76 to at least 79: an end-to-end Rekor registration test, a backward compatibility test against v0.4 fixtures, and a cross-language Rekor conformance test that uploads from Python and verifies from Rust.

Gotchas I am watching for

Rekor shards rotate approximately every six months. The current shard is log2025-1. When it freezes, a new shard URL replaces it, but the frozen shard stays read-only for verification. Bundles must record the exact log URL they used at write time. Hardcoding the current shard URL would silently break when Sigstore rotates, so the client discovers the active shard via the TUF trusted root.
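On the verify side, the same concern can be sketched as a lookup keyed on the URL recorded in the bundle rather than on a constant. Everything here is hypothetical: the log_url field name, the function name, and the trusted_logs mapping, which stands in for whatever the TUF trusted root yields (shard URL to log public key, covering both active and frozen shards).

```python
def log_key_for_bundle(bundle: dict, trusted_logs: dict) -> bytes:
    """Return the public key of the shard the bundle was written to.

    `trusted_logs` maps shard base URLs to public keys and is assumed to be
    derived from the TUF trusted root, including frozen shards.
    """
    url = bundle["log_url"].rstrip("/")
    try:
        return trusted_logs[url]
    except KeyError:
        raise ValueError(f"log {url!r} is not in the trusted root") from None
```

Because frozen shards stay in the mapping, a bundle written before a rotation keeps verifying against the shard it was actually logged in.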

There is no online inclusion proof API in Rekor v2. The proof is bundled into the TransparencyLogEntry returned at write time. If a verifier is missing a proof, they have to fetch tiles from the log and compute it locally. This is a departure from Rekor v1, where proofs were fetched on demand. Oversight handles this by persisting the full entry (including the proof) in the evidence bundle.
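For reference, checking a persisted inclusion proof against a tree head looks roughly like this. The sketch follows the RFC 6962 hashing rules and the inclusion-proof verification procedure from RFC 9162, on the assumption that the proof in the TransparencyLogEntry uses the same leaf and node hashing. The merkle_root and inclusion_path helpers exist only to produce test data; they are not part of any Rekor client.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list) -> bytes:
    """Reference tree hash, used here only to build test data."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_path(index: int, leaves: list) -> list:
    """Reference audit path for leaves[index], used here only for test data."""
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(leaf: bytes, index: int, tree_size: int, path: list, root: bytes) -> bool:
    """Recompute the root from a leaf and its audit path, then compare."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False  # path is longer than the tree is deep
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)  # sibling sits on the left
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling sits on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

A verifier working from the evidence bundle only needs verify_inclusion plus the checkpoint's root hash and tree size; no network call is involved.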

Write timeouts need to be at least 20 seconds. Rekor v2 is tile-backed and sometimes slow on first writes, so the HTTP client configuration in both Python and Rust sets a conservative timeout to avoid premature failures during registration.
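In stdlib terms, the timeout handling could look like the sketch below. The function names are hypothetical, the entries URL is supplied by the caller rather than hardcoded, the request body is treated as opaque bytes, and the JSON content type is an assumption about how the upload is encoded.

```python
import urllib.request

WRITE_TIMEOUT_SECONDS = 20.0  # floor for Rekor v2 writes


def build_upload_request(entries_url: str, body: bytes) -> urllib.request.Request:
    """Prepare the POST without sending it (handy for unit tests)."""
    return urllib.request.Request(
        entries_url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def upload_entry(entries_url: str, body: bytes,
                 timeout: float = WRITE_TIMEOUT_SECONDS) -> bytes:
    """Send the entry and return the raw response body.

    Refuses timeouts below the floor so a misconfigured caller fails loudly
    instead of timing out intermittently on slow first writes.
    """
    if timeout < WRITE_TIMEOUT_SECONDS:
        raise ValueError(f"write timeout must be at least {WRITE_TIMEOUT_SECONDS} seconds")
    request = build_upload_request(entries_url, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Splitting request construction from the network call keeps the part that can be tested offline separate from the part that needs a live log.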

Timeline

The migration is planned for three sessions. Session A builds the DSSE envelope construction and unit tests against fixture data, with no network dependency. Session B wires the registry to upload to the public Rekor instance and runs end-to-end tests. Session C builds the Rust crate, runs cross-language conformance, updates the spec, and ships v0.5.0.

v0.4.1 stays frozen as the paper artifact safety net for USENIX Security Cycle 2. If the Rekor integration ships before the submission deadline, it goes in as a stretch contribution. If not, the paper describes the v0.4.1 architecture with the Rekor migration as future work. Either way, the protocol does not lose its auditable transparency log; it gains a stronger one.