flake: TestIntegration (enterprise/aibridged) on Windows #1167

@flake-investigator

Description

CI Run Link: https://github.com/coder/coder/actions/runs/19846816488

Commit: 645da33767057ff0585703510d76bd8cb8030f8d by Ethan (2025-12-01T23:53:36Z)
Commit message: "test: fix TestDescCacheTimestampUpdate flake (#20975)" (Windows time resolution context likely relevant)

Failing Job: nightly-gauntlet / test-go-pg (windows-2022)
Completed at: 2025-12-02T04:25:47Z (same minute as Slack alert)
Run attempt: 1

Failure Evidence:

  • From job logs:
    • aibridged_integration_test.go:237: Error: Should be true
    • Test: TestIntegration
    • DONE 12826 tests, 138 skipped, 1 failure in 493.731s

Suspected Failing Assertion (from enterprise/aibridged/aibridged_integration_test.go around line 237 at commit 645da33):

  • require.True(t, intc0.StartedAt.Before(intc0.EndedAt.Time))
  • require.Less(t, intc0.EndedAt.Time.Sub(intc0.StartedAt), 5*time.Second)

Analysis:

  • Root cause classification: Flaky test
  • The failure presents as a strict time ordering assertion in TestIntegration. The logged message "Should be true" is what require.True reports, so the likely culprit is the StartedAt.Before(EndedAt) check rather than the 5-second duration bound. On Windows, time.Now() can have a resolution as coarse as ~15.6ms, so two closely spaced events can receive the same timestamp. In that case StartedAt.Before(EndedAt) is false even though the end event happened after the start event. A recent main-branch commit fixed a similar Windows-only flake by injecting a mock clock (see commit 645da33 message).
  • No signs of data race, panic, OOM, or infrastructure failure in this job.
  • The macOS matrix job was cancelled because of this failure (expected matrix behavior; not causal).

Duplicates Search (coder/internal):

  • Queries: "TestIntegration", "aibridged_integration_test.go", "aibridged", "use of closed network connection", "InmemoryListener is already closed" (last 30 days, open and closed)
  • Result: No existing issue for this test flake.

Assignment Analysis:

  • Primary: Line-level blame for the failing test function could not be obtained with the available tooling.
  • Secondary: Recent contributors to enterprise/aibridged and this test file include Danny Kopping (aibridged ownership and related test/metrics work: #20865, graduation PR #20522).
  • Assigning to component owner for triage and fix.

Proposed Fix Direction:

  • Make the time ordering assertion tolerant of identical timestamps on Windows, e.g.:
    • allow StartedAt.Equal(EndedAt), or
    • assert non-negative duration and an upper bound (<= 5s), or
    • inject a test clock and advance it between start and end so the ordering is deterministic.

Reproduction Hints:

  • Re-run nightly-gauntlet on Windows runner.
  • Focus on enterprise/aibridged TestIntegration; failures are intermittent.

Related Context:

  • Similar Windows flake recently addressed in another test by injecting quartz.Clock (commit 645da33). The same approach likely applies here.
