v0.4.6: SIEM Export for Splunk, Sentinel, and Elastic

The roadmap lists SIEM integration ahead of SOC 2 because enterprise evaluators only take provenance telemetry seriously once it lands in the same pane of glass as the rest of their incident data. A beacon callback that lives in a vendor sidecar is evidence nobody reads. v0.4.6 ships what that integration actually requires: a normalized event model, three schema-stable formatters, and a CLI that reads the registry database without taking a write lock.

One model, three shapes

The registry's events table already records every beacon callback with the fields a SIEM cares about: kind (dns, http_img, ocsp, license), source IP, user agent, the issuer/recipient/file triplet, and the local transparency-log index. The new oversight_core.siem module lifts a row into an OversightEvent dataclass and hands it to one of three formatters. to_splunk_hec() returns a Splunk HEC envelope with time as epoch seconds and a fields block that Splunk indexes at ingest time, so searches can key off those values without search-time extraction. to_ecs() returns an Elastic Common Schema 8.x document with event.dataset set to oversight.beacon, so the Elastic Security app renders the events without custom mappings. to_sentinel() returns a flat record for the Log Analytics Data Collector API. The record is flat on purpose: nested fields would otherwise land as dynamic columns that every KQL query has to dereference at query time.
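A minimal sketch of the one-model, three-shapes flow follows. It is not the oversight_core.siem implementation: the OversightEvent field names (ts, source_ip, file_id, tlog_index, and so on), the Splunk sourcetype, the ECS version string, the oversight.* custom namespace, and the Sentinel column names are all assumptions chosen for illustration. The _present helper drops empty values so no formatter emits null.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class OversightEvent:
    """Illustrative stand-in for the normalized event; field names are assumed."""
    ts: datetime                 # callback time, timezone-aware
    kind: str                    # dns, http_img, ocsp, license
    source_ip: str = ""
    user_agent: str = ""
    issuer: str = ""
    recipient: str = ""
    file_id: str = ""
    tlog_index: Optional[int] = None


def _present(d: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only set values; "", None and {} are dropped, while 0 is kept."""
    return {k: v for k, v in d.items() if v not in ("", None, {})}


def to_splunk_hec(ev: OversightEvent) -> Dict[str, Any]:
    # HEC envelope: epoch-seconds "time", the event body, and index-time "fields".
    return {
        "time": ev.ts.timestamp(),
        "sourcetype": "oversight:beacon",  # assumed sourcetype name
        "event": _present({"kind": ev.kind, "source_ip": ev.source_ip,
                           "user_agent": ev.user_agent, "tlog_index": ev.tlog_index}),
        # HEC indexed fields should be flat string values.
        "fields": _present({"kind": ev.kind, "issuer": ev.issuer,
                            "recipient": ev.recipient, "file_id": ev.file_id}),
    }


def to_ecs(ev: OversightEvent) -> Dict[str, Any]:
    # ECS document; sub-objects that end up empty are removed by the outer _present.
    return _present({
        "@timestamp": ev.ts.astimezone(timezone.utc).isoformat(),
        "ecs": {"version": "8.11.0"},  # assumed ECS 8.x release
        "event": {"kind": "event", "dataset": "oversight.beacon", "action": ev.kind},
        "source": _present({"ip": ev.source_ip}),
        "user_agent": _present({"original": ev.user_agent}),
        # Custom namespace for fields ECS does not define.
        "oversight": _present({"issuer": ev.issuer, "recipient": ev.recipient,
                               "file_id": ev.file_id, "tlog_index": ev.tlog_index}),
    })


def to_sentinel(ev: OversightEvent) -> Dict[str, Any]:
    # One level only: nested objects would become dynamic columns in Log Analytics.
    return _present({
        "EventTime": ev.ts.astimezone(timezone.utc).isoformat(),
        "Kind": ev.kind,
        "SourceIP": ev.source_ip,
        "UserAgent": ev.user_agent,
        "Issuer": ev.issuer,
        "Recipient": ev.recipient,
        "FileId": ev.file_id,
        "TlogIndex": ev.tlog_index,
    })
```

Because all three formatters read the same frozen dataclass, a field added to the model shows up in each shape only where a formatter explicitly maps it, which keeps the three schemas stable independently of one another.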

Formatters are pure: they perform no network I/O, never open the database, and never invent fields that are absent from the source row. Every optional field is dropped from the output when its source value is empty rather than emitted as null, because SIEM dashboards that treat null as a legitimate value will raise spurious alerts on quiet registries.
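The drop-empty rule is the kind of invariant that regresses quietly when someone adds a field. One way to guard it is a recursive checker run over every formatter's output in a unit test. The sketch below is a hypothetical test helper (empty_paths is not part of oversight_core.siem). It reports the dotted path of any None, empty string, or empty container, and treats 0 and False as legitimate values.

```python
from typing import Any, List


def empty_paths(doc: Any, path: str = "") -> List[str]:
    """Return dotted paths whose value is None, "", or an empty dict/list."""
    found: List[str] = []
    if isinstance(doc, dict):
        if not doc and path:
            found.append(path)
        for key, value in doc.items():
            found.extend(empty_paths(value, f"{path}.{key}" if path else key))
    elif isinstance(doc, list):
        if not doc:
            found.append(path)
        for i, value in enumerate(doc):
            found.extend(empty_paths(value, f"{path}[{i}]"))
    elif doc is None or doc == "":
        # Numbers and booleans fall through: 0 and False are real values.
        found.append(path)
    return found
```

In a test suite, the assertion for each formatter is then simply that empty_paths(formatter(event)) == [] for a fixture event with most optional fields unset.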

Transport is a thin sink layer

I do not want Oversight carrying SIEM credentials by default. Splunk HEC tokens, Sentinel shared keys, and Elastic API keys are the kind of secret that silently rots into a postmortem. The sink layer ships a FileSink that writes JSON lines, a StdoutSink that writes to the console, and an HTTPJSONSink that POSTs to a generic endpoint. The recommended path is the first two: let the Splunk Universal Forwarder, Azure Monitor Agent, or Filebeat watch a file, and keep the SIEM credential in the tool that was designed to hold it. The HTTP sink is there for operators who have already decided that an Oversight-side credential is the right tradeoff.
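A minimal sketch of the recommended file and console sinks, assuming a simple emit(record) interface; the class names mirror the prose, but the constructor signatures and the JSONLinesSink base class are hypothetical. Neither sink holds a SIEM credential: they only write JSON lines for a forwarder to pick up.

```python
import io
import json
import sys
from typing import Any, Dict


class JSONLinesSink:
    """Write one compact JSON object per line to a text stream."""

    def __init__(self, stream: io.TextIOBase) -> None:
        self._stream = stream

    def emit(self, record: Dict[str, Any]) -> None:
        # Sorted keys and compact separators keep each line stable and easy to diff.
        self._stream.write(json.dumps(record, separators=(",", ":"), sort_keys=True) + "\n")
        # Flush per record so a tailing forwarder picks up each line promptly.
        self._stream.flush()


class StdoutSink(JSONLinesSink):
    """Console sink, useful for piping into another collector."""

    def __init__(self) -> None:
        super().__init__(sys.stdout)


class FileSink(JSONLinesSink):
    """Append-only JSON-lines file for a Universal Forwarder, AMA, or Filebeat to tail."""

    def __init__(self, path: str) -> None:
        # Append, never truncate: the forwarder tracks its own read offset in the file.
        super().__init__(open(path, "a", encoding="utf-8"))

    def close(self) -> None:
        self._stream.close()

    def __enter__(self) -> "FileSink":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()
```

Usage is `with FileSink("/var/log/oversight/beacons.jsonl") as sink: sink.emit(record)` (the path is illustrative), with the forwarder's input configured to watch that file.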

Sentinel is the exception to the keep-the-credential-in-the-forwarder rule, because the Data Collector API requires an HMAC-SHA256 Authorization header, signed with the workspace shared key, over a canonical string that includes the content length and an RFC 1123 date. sentinel_authorization() implements the signing recipe against the exact inputs Microsoft documents, and the unit tests pin its output so a library user can treat it as a reference. Live POSTs still need a caller that mints the date, serializes the body, and wires both into the request. The signature covers the exact byte length of the body and the exact x-ms-date value, so any wrapper that recomputes either one on your behalf will eventually drift out of spec.
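A sketch of the documented SharedKey recipe, for readers who want to see what the signature covers. The function names shared_key_signature and build_headers are hypothetical (this is not the sentinel_authorization() signature), and the Log-Type value is an assumption. The canonical string is the HTTP verb, the body's byte length, the content type, the x-ms-date header, and the resource path, joined by newlines.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Dict, Optional


def shared_key_signature(workspace_id: str, shared_key_b64: str,
                         rfc1123_date: str, content_length: int) -> str:
    """Build the Authorization value for a Data Collector API POST."""
    string_to_sign = (
        "POST\n"
        f"{content_length}\n"
        "application/json\n"
        f"x-ms-date:{rfc1123_date}\n"
        "/api/logs"
    )
    key = base64.b64decode(shared_key_b64)  # the workspace key is base64-encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('ascii')}"


def build_headers(workspace_id: str, shared_key_b64: str, body: bytes,
                  log_type: str, date: Optional[str] = None) -> Dict[str, str]:
    # The caller passes the already-serialized body, so the signed length is the
    # byte length that will actually be sent, not a character count.
    date = date or formatdate(usegmt=True)  # RFC 1123, locale-independent
    return {
        "Content-Type": "application/json",
        "Authorization": shared_key_signature(workspace_id, shared_key_b64, date, len(body)),
        "Log-Type": log_type,
        "x-ms-date": date,
    }
```

The body would be something like json.dumps(records).encode("utf-8"), POSTed unchanged to https://<workspace-id>.ods.opinsights.azure.com/api/logs?api-version=2016-04-01 with these headers. Re-serializing between signing and sending is exactly the drift the prose warns about.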

Read-only iteration against a live registry

iter_registry_events() opens the registry database with PRAGMA query_only=ON and a SQLite URI in read-only mode. That matters because the intended deployment is "run the export job on the same host as the registry on a cron." If the export took a write lock, a busy registry would block beacon ingests while the SIEM forwarder caught up, which is the wrong direction for a security tool. The CLI supports --since and --limit so the cron job advances a watermark rather than re-exporting every row every cycle.
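A minimal sketch of read-only iteration with a watermark, assuming a simplified events(id INTEGER PRIMARY KEY, ts, kind) schema and a numeric row id as the watermark. The real table and the semantics of --since may differ, and iter_events_readonly, export_batch, and seed_demo_db are hypothetical names. The mode=ro URI makes SQLite refuse writes at open time, and PRAGMA query_only is a second guard on the same connection.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List, Optional, Tuple


def iter_events_readonly(db_path: str, since_id: int = 0,
                         limit: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield events with id > since_id, oldest first, on a read-only connection."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    try:
        conn.execute("PRAGMA query_only = ON")
        sql = "SELECT id, ts, kind FROM events WHERE id > ? ORDER BY id"
        params: List[Any] = [since_id]
        if limit is not None:
            sql += " LIMIT ?"
            params.append(limit)
        for row in conn.execute(sql, params):
            yield dict(row)
    finally:
        conn.close()


def export_batch(db_path: str, watermark: int,
                 limit: int) -> Tuple[List[Dict[str, Any]], int]:
    """Export one cron cycle's rows and return them with the advanced watermark."""
    rows = list(iter_events_readonly(db_path, since_id=watermark, limit=limit))
    return rows, (rows[-1]["id"] if rows else watermark)


def seed_demo_db(path: str) -> None:
    """Create a throwaway registry with three events (demo only)."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts TEXT, kind TEXT)")
        conn.executemany("INSERT INTO events (ts, kind) VALUES (?, ?)",
                         [("t1", "dns"), ("t2", "ocsp"), ("t3", "license")])
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = str(Path(tmp) / "registry.db")
        seed_demo_db(db)
        watermark = 0
        while True:
            rows, watermark = export_batch(db, watermark, limit=2)
            if not rows:
                break
            print(watermark, [r["kind"] for r in rows])
```

A cron job would persist the returned watermark between runs, so each cycle reads only rows it has not exported yet.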

What this is not

v0.4.6 is not a DLP product and it does not try to be. The formatters do not attempt sentiment, severity, or user-risk scoring; those belong in the SIEM's correlation engine or in the customer's existing identity graph. The documentation is explicit that the absence of a beacon is not evidence that no leak occurred. Corporate egress filtering, air-gapped readers, and sandboxed previews all suppress beacon traffic. Dashboards that alert on the absence of beacons need a baseline and an explicit policy, not a rule that fires the moment a single beacon is missed.

The point of this release is simple. A provenance protocol whose output never reaches the desk of the person investigating the leak is a protocol whose output is decorative. Beacon events now ship in the native format of the three SIEMs a regulated-industry customer is likely to have already paid for, with the minimum honest framing around what the telemetry proves and what it does not.