Notes

Ward's working journal.

Long-form technical writeups, methodology specifications, and benchmark results. Permanent record of how Ward measures itself.

  • 2026-05-13 · public benchmark writeup
    Ward leads the tested off-the-shelf scanners on Rust unsafe-class vulnerability detection

    Headline writeup of the 80-pair RUSTSEC head-to-head benchmark. Among the tested off-the-shelf scanner configurations, Ward is the only one that reports in-class true positives at user-facing severity. F1 = 0.655, MCC = +0.564, McNemar p ≈ 1.46 × 10⁻¹¹ vs Semgrep, CodeQL, Rudra, and cargo-geiger. Internal pre-release benchmark; public reproduction planned.

  • 2026-05-13 · locked methodology spec
    Unsafe-Rust head-to-head benchmark — methodology

    Pre-registered scoring rules, statistical contract, ruleset selection, container image digest, and the exact paired finding-identity policy used to score every tool.

  • 2026-05-13 · full headline + audit
    Unsafe-Rust head-to-head benchmark — results

    Paired confusion matrix, per-class breakdown, McNemar pairings, bootstrap CIs, and the audit section, which applies a Rudra rule-id mapping fix and a stdout-parser patch to verify that the headline ranking is robust.

  • 2026-05-13 · fairness check
    Unsafe-Rust head-to-head benchmark — max-breadth auxiliary results

    Re-ran each competitor on its broadest publicly available ruleset (Semgrep across 1,079 rules; CodeQL on rust-security-and-quality.qls; Rudra with the parser fix). The ranking does not change.
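
For readers unfamiliar with the statistics named in these entries, here is a minimal sketch of how F1, MCC, and an exact two-sided McNemar p-value are conventionally computed from paired outcomes. It is not Ward's scoring code: the function names are hypothetical, the counts in the usage note are made up for illustration, and the methodology spec remains the authority on what was actually run (including whether the exact or chi-squared form of McNemar was used).

```python
from math import comb, sqrt


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b = cases tool A scored correctly and tool B did not,
    c = cases tool B scored correctly and tool A did not.
    Under the null hypothesis the discordant cases split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

With illustrative counts only, `f1_score(8, 2, 2)` gives 0.8, `mcc(8, 2, 2, 8)` gives 0.6, and `mcnemar_exact(0, 10)` gives 2/1024 ≈ 0.00195. Only the discordant pairs enter the McNemar test, which is why the benchmark scores every tool on the same paired cases.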