Ward — examples

EXAMPLE 01 · confirmed

aiohttp CVE-2024-23334 — path traversal under follow_symlinks=True.

A real, public CVE in aiohttp's static file route. Default-safe; exploitable only when the application developer opts in to the unsafe parameter. Ward routes this to a config-dependent evidence grade, not a raw "high severity" alert.

model: claude-sonnet-4.6

turns used: 11 of 32

wall time: 3m 15s

api spend: $0.1866

01

Static finding · candidate

Ward's taint analyzer flags a candidate sink.

The scanner traces a request URL's filepath segment into the static-route handler and marks the FileResponse call as a sink. The output is conservative by design — by itself this is a hypothesis worth investigating, nothing more.

rule: TAINT-GENERIC

class: path_traversal

file: aiohttp/web_urldispatcher.py:668

sink: FileResponse(filepath, …)

flow: src[expect_handler] → sink[fileresponse.arg0]

conf: 0.80

02

Investigation opens · turn 0

Discoverer agent picks up the candidate.

A pinned model receives the finding alongside Ward's v0.3.2 system prompt — tool catalogue, attachment contracts per evidence grade, and a turn budget. Provider, model snapshot, and generation id are all recorded so the run can be replayed or audited later.

model: anthropic/claude-4.6-sonnet-20260217

prompt: discoverer-system-v0.3.2

backend: OpenRouterPinned · zdr=true

gen id: gen-1777057578-7UaFDtUkdZcofpI8gHw7

03

Read source · turns 2–8

Agent walks the surrounding code.

Three windows in web_urldispatcher.py: the _handle body around the sink, the containment check that runs before it, and StaticResource.__init__, where the follow_symlinks knob is set.
```
643  filepath = self._directory.joinpath(filename).resolve()
644  if not self._follow_symlinks:
645      filepath.relative_to(self._directory)   # raises if escaped
…
667  elif filepath.is_file():
668      return FileResponse(filepath, chunk_size=self._chunk_size)
```
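The containment check in the excerpt can be exercised outside aiohttp. The following is a minimal sketch, assuming a POSIX system where symlinks can be created without special privileges. `resolve_static` and `demo` are hypothetical helpers that mirror the excerpt's logic; they are not aiohttp's code. With the default setting the resolved path is rejected by `relative_to`, and with the opt-in setting the out-of-root file is read.

```python
import tempfile
from pathlib import Path


def resolve_static(directory: Path, filename: str, follow_symlinks: bool) -> Path:
    """Hypothetical helper mirroring the excerpt: resolve, then check containment."""
    filepath = directory.joinpath(filename).resolve()
    if not follow_symlinks:
        # Path.relative_to raises ValueError when filepath is outside directory.
        filepath.relative_to(directory)
    return filepath


def demo() -> dict:
    """Build a static dir with a symlink to an outside file and try both settings."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        static_dir = root / "static"
        static_dir.mkdir()
        secret = root / "secret.txt"          # lives outside the static root
        secret.write_text("CANARY")
        (static_dir / "link.txt").symlink_to(secret)

        results = {}
        try:
            resolve_static(static_dir, "link.txt", follow_symlinks=False)
            results["default"] = "served"
        except ValueError:
            results["default"] = "blocked"

        served = resolve_static(static_dir, "link.txt", follow_symlinks=True)
        results["opt_in"] = served.read_text()
        return results


if __name__ == "__main__":
    print(demo())
```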
04

Form hypothesis · turn 8

Agent isolates the gating condition.

Default config blocks the traversal. The unsafe parameter removes the containment check. The library is default-safe — the bug only activates under a specific deployment choice.

discoverer · turn 8

"With follow_symlinks=True, the filepath.relative_to(self._directory) check is skipped. A symlink inside the static directory pointing to /etc/passwd would be followed. This is the classic follow_symlinks=True path traversal."
05

Build & run repro · 4 attempts

Sandboxed exploit drafted, then iterated against the harness.

A symlink inside the static dir points at a secret file outside it, and the app serving it is configured with follow_symlinks=True. The first three runs failed for harness-shape reasons: the first never started, and in the next two the file was served but the oracle didn't see the canary. The agent diagnosed each miss and adapted.

turn 9 · harness failed to start; top-level web.run_app blocked the import · unproven

turn 13 · harness ran, file served (200); oracle didn't see canary in body digest · unproven

turn 15 · added /check echo handler; canary still not in oracle's view · unproven

turn 18 · canary inlined in response prose; read of out-of-root file echoed back · proved

06

Oracle verdict · proven

An independent oracle confirms the read.

The oracle is not the model. It is a separate check: a harness canary planted in a file outside the static root has to surface in the response body before the verdict flips to proven=true.

oracle response · turn 19
```
{
  "oracle_type": "path_traversal_canary_echo",
  "proven": true,
  "evidence": "harness canary 'WARD_PATH_TRAVERSAL_CANARY' echoed in response body — traversal read the out-of-root file",
  "response": { "status": 200, "latency_ms": 0,
                "body_digest": "blake3:903caef4…" }
}
```
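A canary oracle of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not Ward's implementation: `canary_oracle` is a hypothetical name, the verdict shape is trimmed from the response above, and SHA-256 stands in for the blake3 digest because blake3 is not in the Python standard library. The key property is that the verdict depends only on the bytes of the response, never on what the model says about it.

```python
import hashlib

# Canary string taken from the oracle response above.
CANARY = "WARD_PATH_TRAVERSAL_CANARY"


def canary_oracle(status: int, body: bytes, canary: str = CANARY) -> dict:
    """Hypothetical oracle: proven only if the canary appears in the response body."""
    proven = canary.encode("utf-8") in body
    return {
        "oracle_type": "path_traversal_canary_echo",
        "proven": proven,
        "response": {
            "status": status,
            # SHA-256 is an assumption here; the source reports a blake3 digest.
            "body_digest": "sha256:" + hashlib.sha256(body).hexdigest(),
        },
    }


if __name__ == "__main__":
    print(canary_oracle(200, b"leaked: WARD_PATH_TRAVERSAL_CANARY\n")["proven"])
    print(canary_oracle(200, b"ordinary static file\n")["proven"])
```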
07

Submit evidence · 3 attachments

Typed attachments filed against the bundle.

The config-dependent grade has a fixed contract: name the knob, pin the exploit source, attach the oracle's verdict. Each artifact has a content digest — a reviewer can verify nothing was edited after the fact.

config-trigger · 3aba08de…

Names the knob follow_symlinks (default False, unsafe value True) and the gating site web_urldispatcher.py:529.

repro-test · ca0c033e…

Byte-pinned aiohttp app: symlink inside static_dir pointing at a secret outside it, plus a /check handler that exposes the read for the oracle.

repro-result · 105c404c…

exit_code 0, proven=true, response 200 with the canary echoed.
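The tamper check a reviewer performs on a content digest can be sketched as follows. The source does not say which hash Ward uses for attachments, so SHA-256 is an assumption here, and `Attachment`, `file_attachment`, and `verify` are hypothetical names used only for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    kind: str       # e.g. "config-trigger", "repro-test", "repro-result"
    payload: bytes  # artifact bytes as filed
    digest: str     # hex digest recorded at filing time


def file_attachment(kind: str, payload: bytes) -> Attachment:
    """Record the digest at filing time (SHA-256 assumed, not confirmed by the source)."""
    return Attachment(kind, payload, hashlib.sha256(payload).hexdigest())


def verify(attachment: Attachment, payload: bytes) -> bool:
    """True only if the bytes presented for review match the recorded digest."""
    return hashlib.sha256(payload).hexdigest() == attachment.digest


if __name__ == "__main__":
    original = b'{"exit_code": 0, "proven": true}'
    filed = file_attachment("repro-result", original)
    print(verify(filed, original))
    print(verify(filed, b'{"exit_code": 0, "proven": false}'))
```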
08

Claim & bundle · turn 31

Discoverer claims; the ledger files the bundle.

Notable: the ledger reports supported_grade: reproduced — the evidence is strong enough for the higher tier — but the discoverer claimed config_dependent because the library is default-safe. Honest grading is part of the contract, not the model's choice.
```
make_claim(outcome="proved", grade_hint="config-dependent", …)

→ bundle: {
    "evidence_grade":   "config_dependent",
    "review_status":    "under_review",
    "supported_grade":  "reproduced",
    "missing_for_next": ["patch + regression_test"],
    "attachments":      3
  }
  claim_confidence_ppm: 900000
```
09

Analyst queue · under review

Routed for human sign-off.

reason = config_dependent_approval. The reviewer sees the proof, the trigger, and the full provenance chain. The "fix" here is a deployment decision — audit application configs for follow_symlinks=True — not a library patch.
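One way to run that audit is a syntactic scan of application code for calls that pass follow_symlinks=True. The sketch below is an assumption about how such an audit might look, not part of Ward: `find_follow_symlinks_true` is a hypothetical helper, it only catches the literal keyword form, and it will miss values passed through variables or `**kwargs`.

```python
import ast


def find_follow_symlinks_true(source: str, filename: str = "<string>") -> list[int]:
    """Return line numbers of calls that pass the literal follow_symlinks=True."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if (
                keyword.arg == "follow_symlinks"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                hits.append(node.lineno)
    return sorted(hits)


# Sample application code; it is parsed, never imported or executed.
SAMPLE = '''\
from aiohttp import web
app = web.Application()
app.router.add_static("/static", "static", follow_symlinks=True)
app.router.add_static("/assets", "assets")
'''

if __name__ == "__main__":
    print(find_follow_symlinks_true(SAMPLE))
```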

EXAMPLE 02 · refuted

Sanic cli/app.py:142 — taint reaches import_module, but isn't a vulnerability.

The same scanner flags a CLI-arg-controlled value flowing into import_module() in Sanic's CLI loader. A naïve "any tainted sink" rule would call this a path-traversal hit. Ward investigates, finds the reason it can't actually be exploited, and ships the explanation alongside the dismissal.

model: qwen3-next-80b-thinking

turns used: 2 of 32

wall time: 91s

api spend: ~$0.0025

01

Static finding · candidate

Same scanner, same rule, similar shape.

A CLI-arg flowing through to a module loader. Conservative taint analysis says: this looks like a sink. Twenty-four distinct taint paths in this codebase converge on the same line — a strong static signal worth verifying.

rule: TAINT-GENERIC

class: path_traversal

file: sanic/cli/app.py:142

sink: import_module(module_name)

flow: src[path] → sink[import_module.arg0]

conf: 0.70

variants: 24 distinct paths to same sink

02

Investigation opens · turn 0

Different model, same pipeline.

An open-weights model — pinned with the same provenance-bearing backend Sonnet uses. The pipeline isn't tied to one provider, and the run id is enough to go pull the actual transcript.

model: qwen/qwen3-next-80b-a3b-thinking-2509

prompt: discoverer-system-v0.3.2

backend: OpenRouterPinned · Alibaba

gen id: gen-1777084375-nbyQMafjbeGouwlniX5a

03

Read source · turn 1

Agent reads the surrounding window.

One pass at cli/app.py lines 132–152. The shape of the API — module-name resolver, not file opener — is enough to settle the question. No further reading required.
```
135  f"   eg. sanic {self.args.module} --simple"
…
142  module = import_module(module_name)
143  app = getattr(module, app_name, None)
```
04

Identify contract violation · turn 2

Sink can't be exercised — wrong grammar.

import_module resolves Python module names: dot-separated identifiers. Path-traversal strings aren't valid module names, so the call fails (normally with ImportError) instead of opening the named path. The taint flow is real; the sink is not.

discoverer · refutation

"Python module names cannot contain path traversal characters (must be valid identifiers). import_module() requires dot-separated module names (e.g. app.main), and user-provided input results in ImportError for invalid syntax, not actual path traversal."
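The grammar argument can be checked directly against the standard library. The sketch below is illustrative only: `is_valid_module_name` and `try_import` are hypothetical helpers, not Sanic or Ward code. One nuance worth knowing: a name with a leading dot is treated as a relative import, and without a `package` argument `importlib.import_module` raises TypeError rather than ImportError, so the helper catches both.

```python
import importlib


def is_valid_module_name(name: str) -> bool:
    """True if every dot-separated part of the name is a Python identifier."""
    return all(part.isidentifier() for part in name.split("."))


def try_import(name: str) -> str:
    """Return 'imported', or the class name of the exception import_module raised."""
    try:
        importlib.import_module(name)
    except (ImportError, TypeError, ValueError) as exc:
        return type(exc).__name__
    return "imported"


if __name__ == "__main__":
    for candidate in ["json.decoder", "static/../../etc/passwd", "../../etc/passwd"]:
        print(candidate, is_valid_module_name(candidate), try_import(candidate))
```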
05

Submit counterexample · 1 attachment

A structured refutation, not a flat dismissal.

A reviewer doesn't see "ignored." They see the lines that were considered, the contract that rules them out, and a content digest that pins the artifact for replay.

counterexample · 4f3f7a7c…

Label: "False positive — Path traversal misclassification". Cites cli/app.py:142 and cli/app.py:72 as evidence sites; reasoning rests on the import_module grammar.
06

Claim & bundle · turn 5

Discoverer claims unproven; ledger files the bundle.

Suppression isn't boolean. Three states track the bundle independently:
- Filed at semantic_only · the lowest grade compatible with an "unproven" claim.
- Evidence supports semantic_plus_counterexample · the structured counterexample is strong enough for a higher tier.
- Awaits promotion on review · a human can lift the grade after confirming the reasoning.
```
make_claim(outcome="unproven", grade_hint="semantic-plus-counterexample", …)

→ bundle: {
    "evidence_grade":   "semantic_only",
    "review_status":    "draft",
    "supported_grade":  "semantic_plus_counterexample",
    "missing_for_next": ["repro_result"],
    "attachments":      1
  }
  claim_confidence_ppm: 900000
```
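The relationship between the filed grade, the supported grade, and the confidence figure can be read off a bundle mechanically. The sketch below is a hedged illustration: `summarize` and the `promotable` field are hypothetical names, and the two dictionaries are copied from the bundles shown in this document. It assumes that claim_confidence_ppm is parts per million, so 900000 corresponds to 0.9.

```python
PPM = 1_000_000  # parts per million, assumed meaning of the _ppm suffix


def summarize(bundle: dict, claim_confidence_ppm: int) -> dict:
    """Hypothetical reader: compare filed vs supported grade and scale confidence."""
    return {
        "filed": bundle["evidence_grade"],
        "supported": bundle["supported_grade"],
        "review_status": bundle["review_status"],
        # The evidence supports a different grade than the one filed.
        "promotable": bundle["supported_grade"] != bundle["evidence_grade"],
        "missing": bundle["missing_for_next"],
        "confidence": claim_confidence_ppm / PPM,
    }


AIOHTTP_BUNDLE = {
    "evidence_grade": "config_dependent",
    "review_status": "under_review",
    "supported_grade": "reproduced",
    "missing_for_next": ["patch + regression_test"],
    "attachments": 3,
}

SANIC_BUNDLE = {
    "evidence_grade": "semantic_only",
    "review_status": "draft",
    "supported_grade": "semantic_plus_counterexample",
    "missing_for_next": ["repro_result"],
    "attachments": 1,
}

if __name__ == "__main__":
    print(summarize(AIOHTTP_BUNDLE, 900000))
    print(summarize(SANIC_BUNDLE, 900000))
```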
07

Analyst queue · draft

Routed with the reasoning attached.

reason = semantic_only_draft. The next time the same code shape shows up, the recorded reasoning is still on file. Ward correctly rejected a misranked candidate, and the audit trail shows why.

Two findings, step by step.

The point of "evidence-backed" isn't that everything reaches "exploited."