The roadmap lists SIEM integration ahead of SOC 2 because enterprise evaluators only take provenance telemetry seriously once it lands in the same pane of glass as the rest of their incident data. A beacon callback that lives in a vendor sidecar is evidence nobody reads. v0.4.6 ships what that integration actually requires: a normalized event model, three schema-stable formatters, and a CLI that reads the registry database without taking a write lock.
One model, three shapes
The registry's events table already records every beacon callback
with the fields a SIEM cares about: kind (dns, http_img,
ocsp, license), source IP, user agent, the
issuer/recipient/file triplet, and the local transparency-log index. The new
oversight_core.siem module lifts a row into an OversightEvent
dataclass and hands it to one of three formatters. to_splunk_hec()
returns a Splunk HEC envelope with time as epoch seconds and a
fields block that indexed extractions can key off. to_ecs()
returns an Elastic Common Schema 8.x document, with event.dataset
set to oversight.beacon so the Elastic Security app renders the
events without custom mappings. to_sentinel() returns a flat
record ready for the Log Analytics Data Collector API; it is kept flat because
nested fields become dynamic columns that KQL has to dereference at query time.
Formatters are pure. They do not perform network I/O, they do not open the
database, and they do not invent fields that are not in the source row. Every
optional field is dropped from the output when the source value is empty
rather than emitted as null, because SIEM dashboards that treat
null as a legitimate value will produce bad alerts on quiet
registries.
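A minimal sketch of the "pure formatter that drops empty fields" idea, for illustration only. SketchEvent, its field names, and sketch_splunk_hec() are hypothetical stand-ins; the real OversightEvent and to_splunk_hec() may differ in shape and naming.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchEvent:
    """Hypothetical stand-in for OversightEvent; field names are assumptions."""
    timestamp: float                  # epoch seconds
    kind: str                         # dns, http_img, ocsp, or license
    source_ip: Optional[str] = None
    user_agent: Optional[str] = None
    issuer: Optional[str] = None
    recipient: Optional[str] = None
    file_id: Optional[str] = None
    tlog_index: Optional[int] = None


def _drop_empty(mapping: dict[str, Any]) -> dict[str, Any]:
    """Omit keys whose value is None or an empty string; a 0 index is kept."""
    return {k: v for k, v in mapping.items() if v is not None and v != ""}


def sketch_splunk_hec(event: SketchEvent) -> dict[str, Any]:
    """Build an HEC-style envelope: no network I/O, no database, no invented fields."""
    payload = _drop_empty({
        "kind": event.kind,
        "source_ip": event.source_ip,
        "user_agent": event.user_agent,
        "tlog_index": event.tlog_index,
    })
    fields = _drop_empty({
        "kind": event.kind,
        "issuer": event.issuer,
        "recipient": event.recipient,
        "file_id": event.file_id,
    })
    return {"time": event.timestamp, "event": payload, "fields": fields}
```

Because the function only maps one value to another, the same row always produces the same envelope, which is what makes schema-pinning tests cheap to write.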
Transport is a thin sink layer
I do not want Oversight carrying SIEM credentials by default. Splunk HEC tokens,
Sentinel shared keys, and Elastic API keys are the kind of secret that silently
rots into a postmortem. The sink layer ships a FileSink that writes
JSON lines, a StdoutSink that writes to the console, and an
HTTPJSONSink that POSTs to a generic endpoint. The recommended
path is the first two: let the Splunk Universal Forwarder, Azure Monitor Agent,
or Filebeat watch a file, and keep the SIEM credential in the tool that was
designed to hold it. The HTTP sink is there for operators who have already
decided that an Oversight-side credential is the right tradeoff.
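The file-based path can be sketched in a few lines. SketchFileSink and its write() method are hypothetical and are not the shipped FileSink interface; the only assumption carried over from the text is "one JSON object per line, appended to a file a forwarder can tail."

```python
import json
from pathlib import Path
from typing import Any, Iterable


class SketchFileSink:
    """Hypothetical JSON-lines sink; holds a path and no SIEM credential."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def write(self, records: Iterable[dict[str, Any]]) -> int:
        """Append each record as one compact JSON line; return the count written."""
        count = 0
        with self.path.open("a", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record, separators=(",", ":"), sort_keys=True))
                handle.write("\n")
                count += 1
        return count
```

Appending rather than truncating keeps the file safe for a tailing forwarder, and rotation stays the job of whatever log-rotation tooling the host already runs.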
Sentinel is the exception to the "operator holds the credential" rule, because
the Data Collector API requires an HMAC-SHA256 Authorization
header signed over a canonical string that includes the content length and an
RFC 1123 date. sentinel_authorization() implements the signing
recipe against the exact inputs Microsoft documents, and the unit tests pin
the output so a library user can treat it as a reference. Live POSTs still
need a caller that mints the date, serializes the body, and wires both into
the request, because any wrapper that computes those for you will eventually
drift out of spec.
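For reference, the signing recipe Microsoft documents for the Data Collector API can be sketched as follows. sketch_sentinel_authorization() and its parameter names are hypothetical; the shipped sentinel_authorization() may take different arguments. The workspace ID and key in the usage lines are placeholders, not real credentials.

```python
import base64
import hashlib
import hmac
import json
from email.utils import formatdate


def sketch_sentinel_authorization(
    workspace_id: str,
    shared_key_b64: str,
    rfc1123_date: str,
    content_length: int,
    method: str = "POST",
    content_type: str = "application/json",
    resource: str = "/api/logs",
) -> str:
    """Return a 'SharedKey <workspace>:<signature>' Authorization header value."""
    # Canonical string: method, body length, content type, x-ms-date header, resource.
    string_to_sign = (
        f"{method}\n{content_length}\n{content_type}\n"
        f"x-ms-date:{rfc1123_date}\n{resource}"
    )
    key = base64.b64decode(shared_key_b64)  # the shared key is issued base64-encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('ascii')}"


# The caller mints the date and serializes the body, then signs the byte length.
body = json.dumps([{"kind": "dns"}]).encode("utf-8")
date = formatdate(usegmt=True)  # RFC 1123 style, always GMT
placeholder_key = base64.b64encode(b"placeholder-key").decode("ascii")
header = sketch_sentinel_authorization("placeholder-workspace", placeholder_key, date, len(body))
```

The same date string has to be sent as the x-ms-date request header, and the signed length has to match the bytes actually posted; a mismatch in either makes the signature invalid.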
Read-only iteration against a live registry
iter_registry_events() opens the registry database with
PRAGMA query_only=ON and a SQLite URI in read-only mode. That
matters because the intended deployment is "run the export job on the same
host as the registry on a cron." If the export took a write lock, a busy
registry would block beacon ingests while the SIEM forwarder caught up,
which is the wrong direction for a security tool. The CLI supports
--since and --limit so the cron job advances a
watermark rather than re-exporting every row every cycle.
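The read-only pattern looks roughly like this with Python's sqlite3 module. sketch_iter_events() is a hypothetical helper, and the integer id column used as the watermark is an assumption; the real --since flag may key off a timestamp or another column.

```python
import sqlite3
from pathlib import Path
from typing import Iterator, Optional


def sketch_iter_events(
    db_path: str, since_id: int = 0, limit: Optional[int] = None
) -> Iterator[sqlite3.Row]:
    """Yield rows from the events table newer than since_id without write access."""
    # mode=ro makes SQLite open the file read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.row_factory = sqlite3.Row
        # Second guard: reject any statement that would modify the database.
        conn.execute("PRAGMA query_only=ON")
        sql = "SELECT * FROM events WHERE id > ? ORDER BY id"
        params: list[int] = [since_id]
        if limit is not None:
            sql += " LIMIT ?"
            params.append(limit)
        yield from conn.execute(sql, params)
    finally:
        conn.close()
```

A cron job would persist the highest id it exported and pass it back as since_id on the next run, so each cycle only touches new rows.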
What this is not
v0.4.6 is not a DLP product and it does not try to be. The formatters do not attempt sentiment, severity, or user-risk scoring; those belong in the SIEM's correlation engine or in the customer's existing identity graph. The documentation is explicit that the absence of a beacon is not evidence that no leak occurred. Corporate egress filtering, air-gapped readers, and sandboxed previews all suppress beacon traffic. Dashboards that alert on the absence of beacons need a baseline and an explicit policy, not a rule that fires the moment a single beacon is missed.
The point of this release is simple. A provenance protocol whose output never reaches the desk of the person investigating the leak is a protocol whose output is decorative. Beacon events now ship in the native format of the three SIEMs a regulated-industry customer is likely to have already paid for, with the minimum honest framing around what the telemetry proves and what it does not.