VelocAI Blog
Octopus test workflow

Octopus Should Not Approve Flaky Test Reruns Forever

Published on 2026-06-20 | Topic: Mobile Flaky Test Approval

The first rerun feels harmless. The fifth rerun is just a slot machine with a terminal attached. Octopus is useful here only if approving from the phone makes the test state clearer, not if it lets everyone skip diagnosing the flaky test.

Useful answer: an Octopus workflow for approving one Codex rerun from mobile, then stopping when a flaky test needs evidence instead of another hopeful tap.

One Rerun

Approve one mobile rerun when the failure is narrow, the branch is known, and the expected pass condition is clear. Ask Codex to state the test file, failure line, previous command, and what changes if the rerun passes.
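Here is a minimal Python sketch of that pre-approval check. `RerunRequest`, `missing_fields`, `is_narrow`, and `can_approve_rerun` are hypothetical names, not part of Octopus or Codex. The narrowness test, which requires the command to name the failing test file, is an assumed heuristic.

```python
from dataclasses import dataclass


@dataclass
class RerunRequest:
    """Hypothetical record of what Codex states before a mobile rerun is approved."""
    branch: str
    test_file: str
    failure_line: str
    previous_command: str
    expected_if_pass: str  # what changes if the rerun passes


def missing_fields(req: RerunRequest) -> list[str]:
    """Names of empty fields, i.e. what the approver still has to ask Codex for."""
    return [name for name, value in vars(req).items() if not value.strip()]


def is_narrow(command: str, test_file: str) -> bool:
    """Assumed heuristic: a narrow command names the single failing test file."""
    return bool(test_file) and test_file in command.split()


def can_approve_rerun(req: RerunRequest) -> bool:
    """Approve only when every field is stated and the target is narrow."""
    return not missing_fields(req) and is_narrow(req.previous_command, req.test_file)
```

With this check, a bare `pytest` with no file argument would be refused even when every other field is filled in.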

Collect Evidence

If the rerun fails again, switch from rerun mode to evidence mode. Capture the seed, timing, network calls, fixture setup, browser or device version, and whether the failure point moved. Data is data, even when the data says your test is annoying.
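One way to make evidence mode concrete is to record each failing run as a structured snapshot and compare consecutive snapshots. This sketch is hypothetical. `FailureEvidence`, its field names, and the `file:line` location format are assumptions chosen to mirror the evidence list in the paragraph above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FailureEvidence:
    """Hypothetical snapshot of one failing run."""
    seed: Optional[int]          # seed, if the runner reports one
    duration_ms: int             # how long the test ran before failing
    network_call: Optional[str]  # request in flight at failure time, if any
    fixture_setup: str           # which fixtures or seed data were loaded
    runtime_version: str         # browser or device/OS version
    failure_location: str        # "file:line" of the failing assertion


def failure_moved(previous: FailureEvidence, current: FailureEvidence) -> bool:
    """True when the same test failed at a different point than last time."""
    return previous.failure_location != current.failure_location


def changed_fields(previous: FailureEvidence, current: FailureEvidence) -> list[str]:
    """Names of the captured fields that differ between two failing runs."""
    return [f.name for f in fields(FailureEvidence)
            if getattr(previous, f.name) != getattr(current, f.name)]
```

If only `duration_ms` and `failure_location` differ between two runs, that points toward timing rather than setup.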

No Blind Loop

Do not approve a broad test loop from mobile just because the terminal is already there. Repeated failure needs a hypothesis: race condition, shared state, timeout, mock leak, animation wait, or environment drift.
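This sketch expresses the "no hypothesis, no command" rule, assuming that a follow-up approval is modeled as a function call. `Hypothesis` and `approve_after_failure` are illustrative names, and the summary string format is invented for the example.

```python
from enum import Enum
from typing import Optional


class Hypothesis(Enum):
    """The flaky-test causes named in the paragraph above."""
    RACE_CONDITION = "race condition"
    SHARED_STATE = "shared state"
    TIMEOUT = "timeout"
    MOCK_LEAK = "mock leak"
    ANIMATION_WAIT = "animation wait"
    ENVIRONMENT_DRIFT = "environment drift"


def approve_after_failure(command: str, hypothesis: Optional[Hypothesis], note: str = "") -> str:
    """Refuse a follow-up command that names no hypothesis; otherwise return a summary line."""
    if hypothesis is None:
        raise ValueError(f"refusing {command!r}: name a hypothesis first")
    summary = f"approved {command!r} to test {hypothesis.value}"
    return f"{summary}: {note}" if note else summary
```

Because the rule raises an error instead of returning a quiet `False`, a missing hypothesis cannot slip through as an approval.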

Desktop Return

Return to desktop when the fix needs multi-file changes, dependency updates, visual diff review, or a long log comparison. The phone can keep the thread moving. It should not make you pretend you read 900 lines of CI output on a sidewalk.
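The desktop-return criteria can be written as a simple routing check. `NextStep`, `desktop_reasons`, and the 200-line log threshold are all assumptions. The article names the categories but gives no numeric cutoff.

```python
from dataclasses import dataclass

# Assumed threshold: the article gives no number for "a long log comparison".
MAX_PHONE_LOG_LINES = 200


@dataclass
class NextStep:
    """Hypothetical description of the next proposed change."""
    files_changed: int
    updates_dependencies: bool
    needs_visual_diff: bool
    log_lines_to_compare: int


def desktop_reasons(step: NextStep) -> list[str]:
    """Reasons to move the review to desktop; an empty list means the phone is fine."""
    reasons = []
    if step.files_changed > 1:
        reasons.append("multi-file change")
    if step.updates_dependencies:
        reasons.append("dependency update")
    if step.needs_visual_diff:
        reasons.append("visual diff review")
    if step.log_lines_to_compare > MAX_PHONE_LOG_LINES:
        reasons.append("long log comparison")
    return reasons
```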

Rerun Boundary

  • Approve one rerun only when the test target is narrow.
  • Ask for the failure line, command, branch, and expected pass condition.
  • After a second failure, collect evidence instead of rerunning.
  • Name one hypothesis before approving any new command.
  • Use desktop review for broad logs, screenshots, dependencies, and multi-file fixes.
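The boundary list reads naturally as a small state machine. One rerun is allowed while in rerun mode, and evidence mode begins after the second failure. `RerunBoundary` and `Mode` are hypothetical. A passing rerun is tracked as `PASSED` rather than "fixed", because a flaky test can pass by luck.

```python
from enum import Enum


class Mode(Enum):
    RERUN = "rerun"        # one bounded rerun may still be approved
    EVIDENCE = "evidence"  # collect data; no more blind reruns
    PASSED = "passed"      # rerun passed; the flake is not necessarily fixed


class RerunBoundary:
    """Hypothetical tracker for one failing test target."""

    def __init__(self) -> None:
        self.failures = 1  # the original failure that prompted the rerun
        self.reruns_approved = 0
        self.mode = Mode.RERUN

    def approve_rerun(self) -> bool:
        """Allow exactly one rerun, and only while still in rerun mode."""
        if self.mode is Mode.RERUN and self.reruns_approved == 0:
            self.reruns_approved += 1
            return True
        return False

    def record_result(self, passed: bool) -> Mode:
        """A pass leaves rerun mode; a failure here is the second one, so switch to evidence."""
        if passed:
            self.mode = Mode.PASSED
        else:
            self.failures += 1
            self.mode = Mode.EVIDENCE
        return self.mode
```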

Quick Checks

When is a mobile test rerun okay?
When it is one bounded rerun with a known target and a clear expected result.

What stops the loop?
A repeated failure, moving failure point, broad command, or missing hypothesis.
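These stop conditions can be expressed as one predicate, checked before any further rerun. The function name and signature are assumptions for illustration. An empty result means the loop may continue with one narrow step.

```python
from typing import Optional


def stop_reasons(
    failures: int,
    previous_location: Optional[str],
    current_location: Optional[str],
    command_is_broad: bool,
    hypothesis: Optional[str],
) -> list[str]:
    """Why the rerun loop should stop; an empty list means no stop condition applies."""
    reasons = []
    if failures >= 2:
        reasons.append("repeated failure")
    if previous_location and current_location and previous_location != current_location:
        reasons.append("failure point moved")
    if command_is_broad:
        reasons.append("broad command")
    if not hypothesis:
        reasons.append("no hypothesis")
    return reasons
```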

How does Octopus help?
It keeps the Codex thread visible and lets the user approve a narrow next step without losing state.
