How Qodex works

Qodex turns testing requests into reusable test coverage. You describe what should be tested. Qodex explores the target, creates scenarios, turns them into runnable scripts, and runs those scripts on demand, on a schedule, through webhooks, or during PR review.

The basic flow

Every interaction starts in chat. A coordinator agent reads your brief, decides which specialty skills apply, and starts sub-agents for specific testing areas such as functionality, security, exploration, vulnerability checks, or collection analysis. Each sub-agent gets a focused task and an isolated context. That keeps API, UI, and security work from blending together. When the sub-agents finish, their results flow back to the coordinator, which turns them into scenarios, findings, or next steps.
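
The fan-out described above can be sketched in a few lines. This is a minimal illustration, not Qodex's implementation: the names `Skill`, `SubAgentTask`, `planTasks`, `runSubAgent`, and `coordinate` are hypothetical, and simple keyword matching stands in for the coordinator's LLM-driven decision about which skills apply. The point it shows is that each task receives its own context array, so one sub-agent's notes never leak into another's.

```typescript
// Hypothetical sketch: none of these names are Qodex APIs.
// Keyword matching is a stand-in for the coordinator's LLM routing.
type Skill =
  | "functionality"
  | "security"
  | "exploration"
  | "vulnerability"
  | "collection-analysis";

interface SubAgentTask {
  skill: Skill;
  instruction: string;
  context: string[]; // private to this sub-agent
}

interface SubAgentResult {
  skill: Skill;
  scenarios: string[];
  findings: string[];
}

// Assumed trigger words, chosen only for illustration.
const SKILL_KEYWORDS: Record<Skill, string[]> = {
  functionality: ["checkout", "login", "flow"],
  security: ["auth", "token", "permission"],
  exploration: ["explore", "crawl"],
  vulnerability: ["injection", "xss"],
  "collection-analysis": ["postman", "openapi"],
};

// Decide which skills apply and give each one a focused task.
function planTasks(brief: string): SubAgentTask[] {
  const text = brief.toLowerCase();
  const skills = (Object.keys(SKILL_KEYWORDS) as Skill[]).filter((skill) =>
    SKILL_KEYWORDS[skill].some((word) => text.indexOf(word) !== -1),
  );
  return skills.map((skill) => ({
    skill,
    instruction: `Cover the ${skill} aspects of: ${brief}`,
    context: [brief], // a fresh array per task, so nothing is shared
  }));
}

// Placeholder for the real sub-agent work.
function runSubAgent(task: SubAgentTask): SubAgentResult {
  task.context.push(`notes from ${task.skill}`);
  return {
    skill: task.skill,
    scenarios: [`${task.skill}: draft scenario`],
    findings: [],
  };
}

// Collect sub-agent results back into one summary.
function coordinate(brief: string): { scenarios: string[]; findings: string[] } {
  const scenarios: string[] = [];
  const findings: string[] = [];
  for (const task of planTasks(brief)) {
    const result = runSubAgent(task);
    scenarios.push(...result.scenarios);
    findings.push(...result.findings);
  }
  return { scenarios, findings };
}
```

Calling `coordinate("Test the login flow and check token permissions")` in this sketch plans one functionality task and one security task, each with its own context.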

Scan types

Qodex supports six scan types, all driven from the same chat:

Scan type	What it does
explore	Probe a target, find bugs, propose scenarios
run	Execute saved scenarios against an environment
import	Parse an OpenAPI spec or Postman collection into endpoints and scenarios
fill-coverage	Find endpoints and pages with zero scenarios and propose tests
analyze-failure	Classify a failed run as REAL_BUG, STALE_TEST, or ENVIRONMENT_ISSUE
regression	Run the full active suite on a schedule or webhook
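
As one illustration of the table above, the fill-coverage row amounts to a set difference: take every known endpoint and page, and keep the ones that no scenario points at. The sketch below assumes hypothetical shapes (`Target`, `ScenarioRef`, `findUncovered`) and assumes that a draft scenario already counts as coverage, which the text does not specify.

```typescript
// Hypothetical shapes; Qodex's real data model may differ.
interface Target {
  id: string; // e.g. "GET /users" or "/checkout"
  kind: "endpoint" | "page";
}

interface ScenarioRef {
  id: string;
  targetId: string;
  status: "draft" | "active";
}

// Return every target that has zero scenarios attached.
// Assumption: any scenario, draft or active, counts as coverage.
function findUncovered(targets: Target[], scenarios: ScenarioRef[]): Target[] {
  const covered: Record<string, boolean> = {};
  for (const scenario of scenarios) {
    covered[scenario.targetId] = true;
  }
  return targets.filter((target) => !covered[target.id]);
}
```

The uncovered targets are the ones a fill-coverage scan would then propose tests for.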

Data flow

Chat
  -> Coordinator agent
       -> Sub-agents (functionality, security, vulnerability, ...)
            -> Scenarios (draft, then human-promoted to active)
                 -> Scripts (Playwright .spec.ts, HTTP .test.ts)
                      -> Runs (per-environment, per-trigger)
                           -> Findings (severity, evidence, classification)
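
The draft-to-active step in this flow can be sketched as a small state change. The field names (`status`, `promotedBy`) and the functions `promote` and `regressionSuite` are assumptions made for illustration. The sketch shows two things taken from this page: promotion is a human action, and a regression run picks up only active scenarios.

```typescript
// Hypothetical shapes; field names are assumptions, not Qodex's schema.
type ScenarioStatus = "draft" | "active";

interface Scenario {
  id: string;
  title: string;
  status: ScenarioStatus;
  promotedBy?: string;
}

// Promotion is a human decision, so it requires a reviewer name.
// Returns a new object and leaves the draft untouched.
function promote(scenario: Scenario, reviewer: string): Scenario {
  if (reviewer.trim() === "") {
    throw new Error("promotion requires a human reviewer");
  }
  return { ...scenario, status: "active", promotedBy: reviewer };
}

// A regression run covers the active suite only; drafts are skipped.
function regressionSuite(all: Scenario[]): Scenario[] {
  return all.filter((scenario) => scenario.status === "active");
}
```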

Every step is observable. The WebSocket stream emits events such as chat.message, agent.tool_call, agent.tool_result, subagent.spawned, scan.started, scan.finding, and scan.scenario_created. Every LLM call also writes one row to llm_usage_log with provider, model, token counts, latency, outcome, and cost.
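
A consumer of this stream might look roughly like the sketch below. The event names come from the paragraph above, but the envelope (`type` plus `payload`) and the `LlmUsageRow` field names are assumptions: the page lists which columns llm_usage_log records, not how they are named. The helpers `parseEvent`, `countByType`, and `totalCostByModel` are hypothetical.

```typescript
// Assumed envelope: a JSON object with a string `type` and an object `payload`.
interface StreamEvent {
  type: string; // e.g. "scan.finding", "subagent.spawned"
  payload: Record<string, unknown>;
}

// Assumed column names for one llm_usage_log row.
interface LlmUsageRow {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  outcome: string;
  costUsd: number;
}

// Parse one raw WebSocket message; return null for anything malformed.
function parseEvent(raw: string): StreamEvent | null {
  try {
    const parsed = JSON.parse(raw) as { type?: unknown; payload?: unknown };
    if (typeof parsed.type !== "string") return null;
    const payload =
      typeof parsed.payload === "object" && parsed.payload !== null
        ? (parsed.payload as Record<string, unknown>)
        : {};
    return { type: parsed.type, payload };
  } catch {
    return null;
  }
}

// Tally events by type, e.g. to count findings in a scan.
function countByType(events: StreamEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    counts[event.type] = (counts[event.type] || 0) + 1;
  }
  return counts;
}

// Sum recorded cost per model from usage rows.
function totalCostByModel(rows: LlmUsageRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.model] = (totals[row.model] || 0) + row.costUsd;
  }
  return totals;
}
```

Keeping `type` as a plain string rather than a closed union is deliberate here: the list of events above is introduced with "such as", so a consumer should tolerate names it does not recognize.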

Why the cost model is different

Most AI QA tools call an LLM every time a test runs. Qodex calls the LLM only when a scenario is created or repaired, then replays the saved script directly. The generated script is standard Playwright or HTTP code, parameterized by environment variables, so nightly regression runs at Playwright or HTTP cost, not OpenAI cost. The gap widens as your test suite grows.

For UI scenarios, the intent runner uses a step cache. The first successful run caches each step. Later runs replay from that cache with zero LLM calls unless the page changes. A cache miss triggers a self-heal pass via gpt-5-mini, which then updates the cache.

For API scenarios, replay is fully deterministic from the start: running a saved scenario never calls an LLM.
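
The step cache for UI scenarios can be sketched as a lookup with a fallback. This is a simplified model under stated assumptions, not the intent runner itself: a page is reduced to the list of selectors it currently exposes, a "page change" means the cached selector is no longer present, and the `resolveWithLlm` callback stands in for both the first-run resolution and the self-heal pass. `StepRunner`, `CachedStep`, and `llmCalls` are hypothetical names.

```typescript
// Hypothetical model of a cached UI step.
interface CachedStep {
  selector: string;
  action: "click" | "fill";
}

// Stand-in for the LLM-backed resolution or self-heal call.
type Resolver = (intent: string, pageSelectors: string[]) => CachedStep;

class StepRunner {
  private cache: Record<string, CachedStep> = {};
  llmCalls = 0; // counts how often the resolver was needed

  constructor(private resolveWithLlm: Resolver) {}

  runStep(intent: string, pageSelectors: string[]): CachedStep {
    const cached = this.cache[intent];
    // Cache hit: the stored selector still exists, so replay without the LLM.
    if (cached !== undefined && pageSelectors.indexOf(cached.selector) !== -1) {
      return cached;
    }
    // Cache miss (first run or changed page): resolve, then update the cache.
    this.llmCalls += 1;
    const healed = this.resolveWithLlm(intent, pageSelectors);
    this.cache[intent] = healed;
    return healed;
  }
}
```

In this model, running the same step repeatedly against an unchanged page leaves `llmCalls` at 1, and the counter only increases again when the cached selector disappears from the page.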

When to use it

You ship daily and your QA team is the bottleneck.
You need regression coverage that grows with the product, not behind it.
You want continuous security coverage on API endpoints, not annual pentests.
You want to read, edit, and check in your tests as standard code.

When not to use it

You need mobile-native testing. Qodex is web only.
You need a drag-and-drop visual authoring UI. Qodex is chat plus generated code.
You need air-gapped LLM execution. Self-hosted deployments still send LLM traffic to your chosen provider.

On the roadmap

Planned: worker fleet separation, which moves long-running scans off the web process and onto a queue-backed worker pool. See TESTING-PLATFORM-ARCHITECTURE.md.

Planned: performance skill (Lighthouse, load tests, memory leak detection) and accessibility skill (axe-core WCAG 2.1 AA). The resolver references them; skill files are not yet shipped.

Planned intelligence track: self-critique on scenario save, reflection pass after scan completion, findings-aware generation via endpoint_brief and page_brief, flaky detection with rolling-window pattern analysis, skill routing feedback loop, sub-agent synthesis upgrade. See product.md roadmap.

Related

Operating modes: On-demand, scheduled, and event-driven runs on the same scenarios.
Scenarios: The atomic unit of testing.
Skills: Specialty agents as drop-in .skill.md files.
Memory: Persistent markdown injected into every LLM call.
