
How Qodex works

Qodex turns testing requests into reusable test coverage. You describe what should be tested. Qodex explores the target, creates scenarios, turns them into runnable scripts, and runs those scripts on demand, on a schedule, through webhooks, or during PR review.

The basic flow

Every interaction starts in chat. A coordinator agent reads your brief, decides which specialty skills apply, and starts sub-agents for specific testing areas such as functionality, security, exploration, vulnerability checks, or collection analysis. Each sub-agent gets a focused task and an isolated context. That keeps API, UI, and security work from blending together. When the sub-agents finish, their results flow back to the coordinator, which turns them into scenarios, findings, or next steps.
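The coordinator pattern can be sketched in TypeScript, the language of the generated scripts. This is a minimal sketch, not Qodex's implementation. The keyword routing, the `TRIGGERS` table, the skill identifiers, and the `selectSkills`, `spawnSubAgent`, `runSubAgent`, and `coordinate` functions are all hypothetical. `runSubAgent` stands in for a real LLM-driven agent loop.

```typescript
// Hypothetical sketch of coordinator -> sub-agent dispatch. Skill names follow
// the docs; the routing rule, types, and function names are assumptions.

type Skill = "functionality" | "security" | "exploration" | "vulnerability" | "collection-analysis";

interface SubAgentContext {
  skill: Skill;
  task: string;
  messages: string[]; // private history: never shared with other sub-agents
}

interface SubAgentResult {
  skill: Skill;
  findings: string[];
}

// Assumed routing rule: a skill applies if any of its trigger words is in the brief.
const TRIGGERS: Record<Skill, string[]> = {
  functionality: ["test", "flow", "checkout", "login"],
  security: ["auth", "security", "token"],
  exploration: ["explore", "crawl"],
  vulnerability: ["injection", "xss", "vulnerab"],
  "collection-analysis": ["postman", "collection", "openapi"],
};

function selectSkills(brief: string): Skill[] {
  const text = brief.toLowerCase();
  return (Object.keys(TRIGGERS) as Skill[]).filter((skill) =>
    TRIGGERS[skill].some((word) => text.indexOf(word) !== -1)
  );
}

// Each sub-agent starts with a fresh, isolated context and one focused task.
function spawnSubAgent(skill: Skill, brief: string): SubAgentContext {
  return { skill, task: `${skill} testing for: ${brief}`, messages: [] };
}

// Stand-in for the real agent loop, which would call an LLM and tools.
function runSubAgent(ctx: SubAgentContext): SubAgentResult {
  ctx.messages.push(ctx.task);
  return { skill: ctx.skill, findings: [`${ctx.skill}: reviewed`] };
}

// The coordinator fans out, then collects results to turn into scenarios or findings.
function coordinate(brief: string): SubAgentResult[] {
  const contexts = selectSkills(brief).map((skill) => spawnSubAgent(skill, brief));
  return contexts.map(runSubAgent);
}

const results = coordinate("Test the login flow and check auth token handling");
console.log(results.map((r) => r.skill)); // [ 'functionality', 'security' ]
```

Because each context owns its own `messages` array, nothing the security sub-agent sees can leak into the functionality sub-agent's prompt.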

Scan types

Qodex supports six scan types, all driven from the same chat:
Scan type          What it does
explore            Probe a target, find bugs, propose scenarios
run                Execute saved scenarios against an environment
import             Parse an OpenAPI spec or Postman collection into endpoints and scenarios
fill-coverage      Find endpoints and pages with zero scenarios and propose tests
analyze-failure    Classify a failed run as REAL_BUG, STALE_TEST, or ENVIRONMENT_ISSUE
regression         Run the full active suite on a schedule or webhook
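The fill-coverage scan reduces to a set difference: every known endpoint or page minus the ones that some scenario already exercises. This minimal sketch assumes hypothetical `Target` and `Scenario` shapes and a hypothetical `findUncovered` helper. Qodex's actual data model is not documented here, and proposing tests for the gaps would be a separate, LLM-driven step.

```typescript
// Hypothetical sketch of fill-coverage gap detection. The Target/Scenario
// shapes and findUncovered() are assumptions, not Qodex's data model.

interface Target {
  kind: "endpoint" | "page";
  id: string; // e.g. "GET /orders" or "/checkout"
}

interface Scenario {
  name: string;
  targetIds: string[]; // endpoints or pages this scenario exercises
}

// Return every target that zero scenarios cover, in the original order.
function findUncovered(targets: Target[], scenarios: Scenario[]): Target[] {
  const covered: Record<string, boolean> = {};
  scenarios.forEach((s) => s.targetIds.forEach((id) => { covered[id] = true; }));
  return targets.filter((t) => !Object.prototype.hasOwnProperty.call(covered, t.id));
}

const targets: Target[] = [
  { kind: "endpoint", id: "GET /orders" },
  { kind: "endpoint", id: "POST /orders" },
  { kind: "page", id: "/checkout" },
];
const scenarios: Scenario[] = [{ name: "list orders", targetIds: ["GET /orders"] }];

console.log(findUncovered(targets, scenarios).map((t) => t.id)); // [ 'POST /orders', '/checkout' ]
```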

Data flow

Chat
  -> Coordinator agent
       -> Sub-agents (functionality, security, vulnerability, ...)
            -> Scenarios (draft, then human-promoted to active)
                 -> Scripts (Playwright .spec.ts, HTTP .test.ts)
                      -> Runs (per-environment, per-trigger)
                           -> Findings (severity, evidence, classification)
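The draft-to-active step in the diagram acts as a gate. Generated scenarios stay drafts until a person promotes them, and only active scenarios make up the suite that regression runs execute. This is a sketch under assumptions: the `Scenario` fields, the `Actor` type, and the `promote` and `activeSuite` helpers are hypothetical.

```typescript
// Hypothetical sketch of the scenario lifecycle gate. Field names, the Actor
// type, and both helpers are assumptions for illustration.

type ScenarioStatus = "draft" | "active";

interface Scenario {
  id: string;
  status: ScenarioStatus;
  promotedBy?: string; // set when a human promotes the draft
}

type Actor = { kind: "human"; name: string } | { kind: "agent"; name: string };

// Agents may create drafts, but only a human may promote one to active.
function promote(scenario: Scenario, actor: Actor): Scenario {
  if (actor.kind !== "human") {
    throw new Error(`agent ${actor.name} cannot promote scenario ${scenario.id}`);
  }
  if (scenario.status !== "draft") {
    throw new Error(`scenario ${scenario.id} is already ${scenario.status}`);
  }
  // Return a new object so the original draft record is left unchanged.
  return { id: scenario.id, status: "active", promotedBy: actor.name };
}

// Regression runs execute the full active suite, so drafts are skipped.
function activeSuite(scenarios: Scenario[]): Scenario[] {
  return scenarios.filter((s) => s.status === "active");
}

const draft: Scenario = { id: "sc-1", status: "draft" };
const active = promote(draft, { kind: "human", name: "alice" });
console.log(activeSuite([draft, active]).length); // 1
```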
Every step is observable. The WebSocket stream emits events such as chat.message, agent.tool_call, agent.tool_result, subagent.spawned, scan.started, scan.finding, and scan.scenario_created. Every LLM call also writes one row to llm_usage_log with provider, model, token counts, latency, outcome, and cost.
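A client that watches the stream could model the events as a discriminated union on `type`. This sketch uses the event names and `llm_usage_log` columns listed above. Every payload field, the column identifiers (`inputTokens`, `costUsd`, and so on), the `outcome` values, the cost figures, and the `summarizeEvent` and `costByModel` helpers are assumptions. No real socket is opened: in a browser, `summarizeEvent` would run on `JSON.parse(message.data)` inside a WebSocket `message` handler.

```typescript
// Hypothetical client-side sketch. Event type names and the llm_usage_log
// column list come from the docs; payload fields and helpers are assumptions.

type QodexEvent =
  | { type: "chat.message"; text: string }
  | { type: "agent.tool_call"; tool: string }
  | { type: "agent.tool_result"; tool: string; ok: boolean }
  | { type: "subagent.spawned"; skill: string }
  | { type: "scan.started"; scanId: string }
  | { type: "scan.finding"; scanId: string; severity: string }
  | { type: "scan.scenario_created"; scanId: string; scenarioId: string };

// One llm_usage_log row: provider, model, token counts, latency, outcome, cost.
interface LlmUsageRow {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  outcome: "success" | "error";
  costUsd: number;
}

// Turn a stream event into a one-line log message.
function summarizeEvent(event: QodexEvent): string {
  switch (event.type) {
    case "scan.finding":
      return `finding (${event.severity}) in ${event.scanId}`;
    case "scan.scenario_created":
      return `scenario ${event.scenarioId} created`;
    case "subagent.spawned":
      return `sub-agent started: ${event.skill}`;
    default:
      return event.type;
  }
}

// Total cost per model, e.g. to see what one scan spent.
function costByModel(rows: LlmUsageRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  rows.forEach((row) => {
    totals[row.model] = (totals[row.model] || 0) + row.costUsd;
  });
  return totals;
}

const events: QodexEvent[] = [
  { type: "scan.started", scanId: "s1" },
  { type: "scan.finding", scanId: "s1", severity: "high" },
];
events.forEach((e) => console.log(summarizeEvent(e)));
// scan.started
// finding (high) in s1

// Illustrative rows; "model-b" and all numbers are made up.
const rows: LlmUsageRow[] = [
  { provider: "openai", model: "gpt-5-mini", inputTokens: 900, outputTokens: 120, latencyMs: 800, outcome: "success", costUsd: 0.25 },
  { provider: "openai", model: "gpt-5-mini", inputTokens: 1500, outputTokens: 300, latencyMs: 1200, outcome: "success", costUsd: 0.5 },
  { provider: "example", model: "model-b", inputTokens: 2000, outputTokens: 400, latencyMs: 2100, outcome: "error", costUsd: 1 },
];
console.log(costByModel(rows)); // { 'gpt-5-mini': 0.75, 'model-b': 1 }
```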

Why the cost model is different

Most AI QA tools call an LLM every time a test runs. Qodex uses the LLM when a scenario is created or repaired, then replays the saved script directly. The generated script is standard Playwright or HTTP code, parameterized by environment variables, so a nightly regression run costs what Playwright or plain HTTP requests cost, not what OpenAI calls cost. The gap widens as your test suite grows.

For UI scenarios, the intent runner uses a step cache. The first successful run caches each step. Later runs replay from that cache with zero LLM calls unless the page changes. A cache miss triggers a self-heal pass via gpt-5-mini, and the healed step is written back to the cache.

For API scenarios, replay is fully deterministic from the start: a saved API scenario makes no LLM calls.
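The step-cache behavior can be sketched as a cache keyed by step intent. The sketch makes three assumptions. First, every name is hypothetical (`createStepCacheRunner`, `selectorExists`, `fakeResolve`). Second, a page is modeled as a map from intent to the selector the element currently has. Third, both Playwright and the gpt-5-mini self-heal call are stubbed out. It only counts LLM calls, to show where the cost goes.

```typescript
// Hypothetical sketch of a step cache with self-heal. The real runner drives
// Playwright and calls gpt-5-mini; both are stubbed here.

// Stand-in for the live page: step intent -> selector the element currently has.
type Page = Record<string, string>;

interface Runner {
  run(steps: string[], page: Page): void;
  llmCalls(): number;
}

function selectorExists(page: Page, selector: string): boolean {
  return Object.keys(page).some((k) => page[k] === selector);
}

function createStepCacheRunner(resolve: (intent: string, page: Page) => string): Runner {
  const cache: Record<string, string> = {}; // intent -> cached selector
  let calls = 0;
  return {
    run(steps, page) {
      steps.forEach((intent) => {
        const cached = Object.prototype.hasOwnProperty.call(cache, intent) ? cache[intent] : undefined;
        if (cached !== undefined && selectorExists(page, cached)) {
          return; // cache hit: replay with zero LLM calls
        }
        calls++; // first run or page changed: one self-heal call
        cache[intent] = resolve(intent, page); // write the healed step back
      });
    },
    llmCalls: () => calls,
  };
}

// Stub for the self-heal model call; a real runner would send the step
// intent and a page snapshot to the LLM.
const fakeResolve = (intent: string, page: Page): string => page[intent];

const runner = createStepCacheRunner(fakeResolve);
const steps = ["open cart", "click checkout"];
const v1: Page = { "open cart": "#cart", "click checkout": "#checkout" };
runner.run(steps, v1); // first run: 2 LLM calls, cache filled
runner.run(steps, v1); // replay: 0 new calls
const v2: Page = { "open cart": "#cart", "click checkout": "[data-test=pay]" };
runner.run(steps, v2); // only the changed step heals: 1 new call
console.log(runner.llmCalls()); // 3
```

In this sketch only the step whose selector changed pays for a new call. Unchanged steps replay for free.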

When to use it

  • You ship daily and your QA team is the bottleneck.
  • You need regression coverage that grows with the product, not behind it.
  • You want continuous security coverage on API endpoints, not annual pentests.
  • You want to read, edit, and check in your tests as standard code.

When not to use it

  • You need mobile-native testing. Qodex is web only.
  • You need a drag-and-drop visual authoring UI. Qodex is chat plus generated code.
  • You need air-gapped LLM execution. Even self-hosted deployments send LLM traffic to your chosen provider.

On the roadmap

Planned: worker fleet separation moves long-running scans off the web process onto a queue-backed worker pool. See TESTING-PLATFORM-ARCHITECTURE.md.
Planned: performance skill (Lighthouse, load tests, memory leak detection) and accessibility skill (axe-core WCAG 2.1 AA). The resolver already references them, but the skill files have not shipped yet.
Planned intelligence track: self-critique on scenario save, reflection pass after scan completion, findings-aware generation via endpoint_brief and page_brief, flaky detection with rolling-window pattern analysis, skill routing feedback loop, sub-agent synthesis upgrade. See product.md roadmap.

Operating modes

On-demand, scheduled, and event-driven runs on the same scenarios.

Scenarios

The atomic unit of testing.

Skills

Specialty agents as drop-in .skill.md files.

Memory

Persistent markdown injected into every LLM call.