How Qodex works
Qodex turns testing requests into reusable test coverage. You describe what should be tested. Qodex explores the target, creates scenarios, turns them into runnable scripts, and runs those scripts on demand, on a schedule, through webhooks, or during PR review.
The basic flow
Every interaction starts in chat. A coordinator agent reads your brief, decides which specialty skills apply, and starts sub-agents for specific testing areas such as functionality, security, exploration, vulnerability checks, or collection analysis. Each sub-agent gets a focused task and an isolated context. That keeps API, UI, and security work from blending together. When the sub-agents finish, their results flow back to the coordinator, which turns them into scenarios, findings, or next steps.
Scan types
Qodex supports six scan types, all driven from the same chat:

| Scan type | What it does |
|---|---|
| explore | Probe a target, find bugs, propose scenarios |
| run | Execute saved scenarios against an environment |
| import | Parse an OpenAPI spec or Postman collection into endpoints and scenarios |
| fill-coverage | Find endpoints and pages with zero scenarios and propose tests |
| analyze-failure | Classify a failed run as REAL_BUG, STALE_TEST, or ENVIRONMENT_ISSUE |
| regression | Run the full active suite on a schedule or webhook |
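To make the basic flow and the scan types above more concrete, here is a minimal sketch of a coordinator that picks skills from a brief and gives each sub-agent its own context. Everything in it is an assumption for illustration: the `SKILL_KEYWORDS` routing table, the `SubAgent` class, and the `coordinate` function are hypothetical names, and real skill selection is done by an LLM rather than by keyword matching. Only the six scan type names come from the table.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScanType(Enum):
    """The six scan types listed in the table above."""
    EXPLORE = "explore"
    RUN = "run"
    IMPORT = "import"
    FILL_COVERAGE = "fill-coverage"
    ANALYZE_FAILURE = "analyze-failure"
    REGRESSION = "regression"


# Hypothetical keyword routing, standing in for the coordinator's LLM decision.
SKILL_KEYWORDS = {
    "functionality": ("login", "checkout", "flow"),
    "security": ("token", "injection", "auth"),
    "exploration": ("explore", "crawl"),
}


@dataclass
class SubAgent:
    skill: str
    task: str
    # default_factory gives every sub-agent its own dict: an isolated context.
    context: dict = field(default_factory=dict)

    def run(self) -> list[str]:
        # A real sub-agent would call tools here; this stub only records its task.
        self.context["task"] = self.task
        return [f"[{self.skill}] reviewed: {self.task}"]


def select_skills(brief: str) -> list[str]:
    """Return the skills whose keywords appear in the brief."""
    text = brief.lower()
    return [
        skill
        for skill, words in SKILL_KEYWORDS.items()
        if any(word in text for word in words)
    ]


def coordinate(scan_type: ScanType, brief: str) -> dict:
    """Spawn one sub-agent per selected skill and gather results by skill."""
    agents = [SubAgent(skill, task=brief) for skill in select_skills(brief)]
    findings = {agent.skill: agent.run() for agent in agents}
    return {"scan_type": scan_type.value, "findings": findings}
```

Calling `coordinate(ScanType.EXPLORE, "Explore the login flow and check token handling")` would start three sub-agents in this sketch, one per matched skill, and none of them can see another's `context`.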
Data flow
The event types are chat.message, agent.tool_call, agent.tool_result, subagent.spawned, scan.started, scan.finding, and scan.scenario_created. Every LLM call also writes one row to llm_usage_log with provider, model, token counts, latency, outcome, and cost.
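As an illustration of the one-row-per-call logging described above, the sketch below writes to an `llm_usage_log` table in SQLite. The table name and the kinds of fields come from the text. The exact column names, the choice of SQLite, the `log_llm_call` helper, and the flat per-token price are assumptions, not Qodex's actual schema.

```python
import sqlite3
import time

# Assumed column names; the source only lists the kinds of fields recorded.
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_usage_log (
    id INTEGER PRIMARY KEY,
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    latency_ms REAL NOT NULL,
    outcome TEXT NOT NULL,
    cost_usd REAL NOT NULL
)
"""


def log_llm_call(conn, provider, model, call, price_per_1k_tokens):
    """Run `call`, which returns (prompt_tokens, completion_tokens), and log one row.

    A failed call is still logged, with outcome "error" and zero tokens.
    """
    start = time.perf_counter()
    outcome = "ok"
    prompt_tokens = completion_tokens = 0
    try:
        prompt_tokens, completion_tokens = call()
    except Exception:
        outcome = "error"
    latency_ms = (time.perf_counter() - start) * 1000
    # Hypothetical flat price; real providers price input and output separately.
    cost_usd = (prompt_tokens + completion_tokens) / 1000 * price_per_1k_tokens
    conn.execute(
        "INSERT INTO llm_usage_log "
        "(provider, model, prompt_tokens, completion_tokens, latency_ms, outcome, cost_usd) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (provider, model, prompt_tokens, completion_tokens, latency_ms, outcome, cost_usd),
    )
    conn.commit()
    return outcome
```

With a table like this, cost per model or per day becomes a plain `GROUP BY` query.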
Why the cost model is different
Most AI QA tools call an LLM every time a test runs. Qodex uses the LLM only when a scenario is created or repaired, then replays the saved script directly. The generated script is standard Playwright or HTTP code, parameterized by environment variables. Nightly regression runs at Playwright or HTTP cost, not OpenAI cost. The gap grows as your test suite grows.
For UI scenarios, the intent runner uses a step cache. The first successful run caches each step. Later runs replay from that cache with zero LLM calls unless the page changes. A cache miss triggers a self-heal pass via gpt-5-mini, then updates the cache.
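The step-cache behavior for UI scenarios can be sketched roughly as follows. This is a simplified model, not the intent runner itself: the `StepCache` class, the idea of caching one selector per step intent, and the `heal` callback that stands in for the LLM self-heal pass are all assumptions made for illustration.

```python
from typing import Callable


class StepCache:
    """Maps a step intent (e.g. "submit") to the selector that last worked."""

    def __init__(self, heal: Callable[[str, set], str]):
        self._selectors: dict[str, str] = {}
        self._heal = heal  # stand-in for the LLM self-heal pass (hypothetical)
        self.llm_calls = 0

    def resolve(self, intent: str, page_selectors: set) -> str:
        cached = self._selectors.get(intent)
        if cached is not None and cached in page_selectors:
            # Cache hit: the page still has the cached selector, so no LLM call.
            return cached
        # Cache miss (first run, or the page changed): heal, then update the cache.
        self.llm_calls += 1
        selector = self._heal(intent, page_selectors)
        self._selectors[intent] = selector
        return selector
```

In this model the number of LLM calls tracks how often the page changes, not how often the suite runs, which is the cost property the text describes.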
For API scenarios, replay is fully deterministic from the start. No LLM call on a saved scenario.
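A saved API scenario of this kind might look something like the sketch below: plain HTTP code that reads its target from environment variables and makes no LLM call. The variable names `QODEX_BASE_URL` and `QODEX_API_TOKEN`, the `/users` endpoint, and the request body are hypothetical, and the sketch uses only Python's standard library rather than whatever Qodex actually generates.

```python
import json
import os
import urllib.request
from typing import Mapping


def build_request(env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build the scenario's request from environment variables (names are assumed)."""
    base_url = env["QODEX_BASE_URL"].rstrip("/")
    token = env["QODEX_API_TOKEN"]
    body = json.dumps({"email": "user@example.com"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/users",  # hypothetical endpoint
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def replay(env: Mapping[str, str] = os.environ, expected_status: int = 201) -> bool:
    """Send the request and compare the status code; no LLM is involved.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(build_request(env), timeout=10) as response:
        return response.status == expected_status
```

Pointing the same script at staging or production is then only a matter of changing the environment, which is what keeps the replay deterministic.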
When to use it
- You ship daily and your QA team is the bottleneck.
- You need regression coverage that grows with the product, not behind it.
- You want continuous security coverage on API endpoints, not annual pentests.
- You want to read, edit, and check in your tests as standard code.
When not to use it
- You need mobile-native testing. Qodex is web only.
- You need a drag-and-drop visual authoring UI. Qodex is chat plus generated code.
- You need air-gapped LLM execution. Self-hosted Qodex still sends LLM traffic to your chosen provider.
On the roadmap
Related
Operating modes
On-demand, scheduled, and event-driven runs on the same scenarios.
Scenarios
The atomic unit of testing.
Skills
Specialty agents as drop-in .skill.md files.
Memory
Persistent markdown injected into every LLM call.