If it takes multiple attempts to merge a PR (without any commits being pushed between attempts), the bot should detect the test(s) that failed in the first N attempts and open an issue for them.
Note that this is distinct from handling flaky tests within a PR run that ultimately succeeds (i.e. tests that 'cargo nextest' retries and that eventually pass).
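
A rough sketch of the detection step, to make the request concrete. Everything here is hypothetical: `MergeAttempt`, `flaky_candidates`, and the test names are made up for illustration and are not part of any existing bot API. It assumes the bot can already record, per merge attempt, the PR head commit, whether the attempt merged, and which tests failed.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MergeAttempt:
    """One merge attempt for a PR (hypothetical record, not a real bot API)."""
    head_sha: str        # PR head commit the attempt ran against
    succeeded: bool      # whether this attempt merged
    failed_tests: frozenset[str] = field(default_factory=frozenset)


def flaky_candidates(attempts: list[MergeAttempt]) -> Counter[str]:
    """Count failures per test across the failed attempts that directly
    preceded a successful merge on the same head commit.

    Returns an empty Counter if the last attempt did not succeed, or if
    no earlier attempt ran against the same commit.
    """
    if not attempts or not attempts[-1].succeeded:
        return Counter()
    final = attempts[-1]
    counts: Counter[str] = Counter()
    # Walk backwards from the successful attempt and stop at the first
    # attempt on a different commit: a push in between means the earlier
    # failure may have been a real bug that the new commit fixed.
    for attempt in reversed(attempts[:-1]):
        if attempt.head_sha != final.head_sha:
            break
        counts.update(attempt.failed_tests)
    return counts


# Example: two failed attempts, then a success, all on the same commit.
attempts = [
    MergeAttempt("abc123", False, frozenset({"tests::net::reconnect"})),
    MergeAttempt("abc123", False,
                 frozenset({"tests::net::reconnect", "tests::db::migrate"})),
    MergeAttempt("abc123", True),
]
for test, failures in flaky_candidates(attempts).most_common():
    print(f"{test}: failed in {failures} earlier attempt(s)")
```

Each test returned here would be a candidate for an opened issue. Whether to require a failure in every earlier attempt or in just one is a policy choice this sketch leaves open.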