Findings
Findings are confirmed bugs, failures, or vulnerabilities that Qodex records with evidence. A failed run should not always become a bug report. Qodex first decides whether the failure is a real product issue, a stale test, or an environment problem.
What happens when a test fails
When a scenario fails or a security probe lands, Qodex does more than show a red X. It analyzes the failure, classifies it, and writes a finding only when the failure looks like a real issue. A finding includes severity, reproduction steps, evidence, and the affected endpoint or page. Qodex deduplicates findings against the existing set so the same bug does not pile up across nightly runs.
Severity model
Five levels, with explicit definitions enforced by the security skill:
| Severity | What it means |
|---|---|
| critical | RCE, SQLi with data access, auth bypass to admin, SSRF to cloud metadata, exposed secrets |
| high | Stored XSS, IDOR with data exposure, CSRF on account actions, privilege escalation, broken access control |
| medium | Reflected XSS, CSRF on low-impact actions, info disclosure, missing rate limiting |
| low | Missing security headers, verbose error messages, cookie flags, clickjacking without sensitive actions |
| info | Technology disclosure, attack surface notes, deprecated TLS, version numbers |
Failure classification
Every failed run goes through src/scanner/failure-analyzer.ts. The classifier reads the failed script, the error and stack, the page screenshot, the DOM snapshot, the original scenario, and the HTTP response. It emits one of three classifications:
| Class | Meaning | Action |
|---|---|---|
| REAL_BUG | The app broke | Open a finding with severity, evidence, repro |
| STALE_TEST | Selectors or expectations no longer match | Mark scenario stale, suggest a fix |
| ENVIRONMENT_ISSUE | Target down, 503, DNS failure | Report as env, not bug |
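
The routing in the table above can be sketched as a small exhaustive switch. This is only an illustration: the three class labels come from the table, while `Action` and `routeFailure` are hypothetical names and do not describe what src/scanner/failure-analyzer.ts actually exports.

```typescript
// Only the three class labels come from the table above.
// Action and routeFailure are hypothetical names used for illustration.
type FailureClass = "REAL_BUG" | "STALE_TEST" | "ENVIRONMENT_ISSUE";

type Action =
  | "open_finding"
  | "mark_scenario_stale"
  | "report_environment_issue";

// Map each classification to the follow-up action listed in the table.
// The switch is exhaustive, so adding a new class fails type checking
// until a matching case is added.
function routeFailure(cls: FailureClass): Action {
  switch (cls) {
    case "REAL_BUG":
      return "open_finding";
    case "STALE_TEST":
      return "mark_scenario_stale";
    case "ENVIRONMENT_ISSUE":
      return "report_environment_issue";
  }
}
```

Only the REAL_BUG branch leads to a finding; the other two branches keep stale tests and environment outages out of the findings list.
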
Deduplication
Dedup happens inline inside finding_report. Before persisting a new finding, the tool computes a fingerprint from the affected endpoint or page, an error signature, severity, and category. If a matching open finding exists, the new occurrence is recorded as a re-observation on the existing row rather than a duplicate. The matching logic lives in findOpenByFingerprint and recordFindingReobservation.
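
A minimal in-memory sketch of the dedup flow described above. The function names findOpenByFingerprint and recordFindingReobservation are taken from the text, but their signatures, the row shape, the `open` flag, and the fingerprint format (a JSON-encoded tuple rather than a hash) are all assumptions made for illustration.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

// The four inputs the text says feed the fingerprint.
interface FindingInput {
  target: string; // affected endpoint or page
  errorSignature: string;
  severity: Severity;
  category: string;
}

// Assumed row shape; the real data model may differ.
interface FindingRow extends FindingInput {
  fingerprint: string;
  open: boolean;
  observations: number;
}

// Assumption: a JSON-encoded tuple stands in for whatever hash is really used.
function fingerprintOf(f: FindingInput): string {
  return JSON.stringify([f.target, f.errorSignature, f.severity, f.category]);
}

class InMemoryFindingStore {
  rows: FindingRow[] = [];

  findOpenByFingerprint(fingerprint: string): FindingRow | undefined {
    return this.rows.filter((r) => r.open && r.fingerprint === fingerprint)[0];
  }

  recordFindingReobservation(row: FindingRow): void {
    row.observations += 1;
  }

  // Either bump the matching open row or persist a new one.
  report(input: FindingInput): FindingRow {
    const fingerprint = fingerprintOf(input);
    const existing = this.findOpenByFingerprint(fingerprint);
    if (existing) {
      this.recordFindingReobservation(existing);
      return existing;
    }
    const row: FindingRow = { ...input, fingerprint, open: true, observations: 1 };
    this.rows.push(row);
    return row;
  }
}
```

Because severity is part of the fingerprint in this sketch, the same failure reported at a different severity produces a separate row. Only open rows are matched, so under these assumptions a bug that reappears after being closed would be filed as a new finding.
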
Evidence guard
The finding_report tool refuses to file a high or critical security finding unless evidence is present. Specifically, a recent browser_snapshot must follow a failed wait_visible or verify_* call. This is the guard that prevents the agent from inventing severity-inflated findings without proof.
The guard runs at report time. It does not write a persisted verified flag onto the finding row.
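
The guard can be pictured as a check over the agent's recent tool calls. The sketch below is an assumption-heavy illustration: the `ToolCall` shape, the function names, and the reading of "recent" as "anywhere after the last failed check" are hypothetical; only the tool names browser_snapshot, wait_visible, and the verify_ prefix come from the text.

```typescript
type FindingSeverity = "critical" | "high" | "medium" | "low" | "info";

// Hypothetical record of one tool invocation in the current run.
interface ToolCall {
  tool: string;
  ok: boolean;
}

// wait_visible and any verify_* tool count as checks.
function isCheckCall(tool: string): boolean {
  return tool === "wait_visible" || tool.indexOf("verify_") === 0;
}

// True when a browser_snapshot appears after the last failed check.
function snapshotFollowsFailedCheck(calls: ToolCall[]): boolean {
  let lastFailedCheck = -1;
  calls.forEach((call, i) => {
    if (isCheckCall(call.tool) && !call.ok) lastFailedCheck = i;
  });
  if (lastFailedCheck === -1) return false;
  return calls
    .slice(lastFailedCheck + 1)
    .some((call) => call.tool === "browser_snapshot");
}

// Only high and critical security findings are gated on evidence.
// The result is computed at report time and is not stored on the finding.
function canFileFinding(
  severity: FindingSeverity,
  isSecurity: boolean,
  calls: ToolCall[]
): boolean {
  const guarded = isSecurity && (severity === "high" || severity === "critical");
  return !guarded || snapshotFollowsFailedCheck(calls);
}
```

In this sketch a snapshot taken before the failed check does not count, which matches the requirement that the snapshot follow the failure.
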
Status lifecycle
Findings carry one of four statuses:
What evidence includes
Every finding ships with:
- The exact HTTP request that triggered the failure, redacted
- The response that proves the vulnerability or bug
- A screenshot (UI) or response snippet (API)
- Reproduction steps a human can follow without the agent
- For security findings, the OWASP category (for example, A01:2021 Broken Access Control)
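
One way to picture the evidence list above is as a typed record plus a completeness check. The field names and the `missingEvidence` helper are hypothetical and are not the Qodex data model; the sketch only mirrors the bullet list.

```typescript
// Hypothetical evidence shape mirroring the list above.
interface Evidence {
  request?: string; // the triggering HTTP request, already redacted
  response?: string; // the response that proves the bug or vulnerability
  screenshotPath?: string; // UI findings
  responseSnippet?: string; // API findings
  reproSteps?: string[]; // steps a human can follow without the agent
  owaspCategory?: string; // security findings only
}

// Return the names of the evidence items that are still missing.
function missingEvidence(e: Evidence, isSecurity: boolean): string[] {
  const missing: string[] = [];
  if (!e.request) missing.push("request");
  if (!e.response) missing.push("response");
  if (!e.screenshotPath && !e.responseSnippet) {
    missing.push("screenshot or response snippet");
  }
  if (!e.reproSteps || e.reproSteps.length === 0) missing.push("reproSteps");
  if (isSecurity && !e.owaspCategory) missing.push("owaspCategory");
  return missing;
}
```

A check like this could run before a finding is shown to a reviewer, so that gaps in the evidence surface early.
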
When to use it
- Promote any agent-classified REAL_BUG to a tracked finding for the team
- File a finding when a security scenario fails, since pass means blocked and fail means vulnerable
- Triage findings in batch through the Findings page or via the API
When not to use it
- STALE_TEST classifications. Those are scenario maintenance, not bugs. Use the scenario triage path instead.
- ENVIRONMENT_ISSUE classifications. Surface those to the team that owns the environment.
On the roadmap
Related
Findings reference
The deeper reference and data model.
Failure classification
REAL_BUG vs STALE_TEST vs ENVIRONMENT_ISSUE in depth.
Triage workflow
The status lifecycle and evidence model.
Security testing
Where the inverted-semantics finding rule lives.