Skip to content

OBSINTA-1290: Incident Tests Stability#887

Open
DavidRajnoha wants to merge 7 commits into openshift:main from DavidRajnoha:test/OBSINTA-1290-incident-test-stability

@DavidRajnoha
Contributor

@DavidRajnoha DavidRajnoha commented Apr 16, 2026

Test suite stability improvements generated by a proof-of-concept (POC) agentic loop. Looking for CodeRabbit feedback.

  • OOM prevention in search loop: cy.reload() to free browser DOM, _quietSearch to suppress
    Cypress snapshot accumulation, and a 35-min hard timeout safety net against infinite hangs

  • Plugin warm-up: warmUpForPlugin() polls for the Incidents tab up to 3 min, fixing flaky
    reg/02–05 specs caused by async plugin registration after session restoration

  • Test hygiene: remove cy.pause() debug calls, add missing fixture

Summary by CodeRabbit

  • Tests

    • Improved incident test polling with explicit interval timing, OOM-safe polling, and a hard timeout to prevent infinite waits.
    • Added plugin/page warm-up steps across multiple suites and a reset for incident search timeout to improve readiness and isolation.
    • Removed manual pause points to speed full test runs.
    • Reduced noisy logging/snapshots and improved incident chart traversal/search reliability.
    • Added a new incident scenario fixture for broader coverage.
  • Documentation

    • Added a Test Stability Ledger summarizing per-test stability metrics.

Claude Agent and others added 4 commits April 16, 2026 10:39
Three layered defenses against Chrome/Node.js OOM (exit 137) and
infinite hangs in findIncidentWithAlert's retry loop:

1. cy.reload() at start of each search iteration — releases browser
   DOM from previous iteration, preventing browser-side accumulation.

2. _quietSearch/_qLog() pattern — suppresses Cypress command logging
   (and DOM snapshot serialization, ~1-5 MB each) during search.
   Without this, ~40 commands per iteration × 15+ iterations = 600+ snapshots, which runs out of memory.

3. Hard timeout safety net (35 min) — Date.now()-based kill switch
   that fires if cy.waitUntil's timeout breaks due to cy.reload()
   interfering with the Cypress command queue.

Also adds warmUpForPlugin() page object method for plugin loading
race condition (used by subsequent commit).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
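A minimal TypeScript sketch of defenses 2 and 3, modelled on the names this PR uses (_quietSearch, _qLog(), _findIncidentSearchStart, _FIND_INCIDENT_HARD_TIMEOUT_MS, _resetSearchState()). The function bodies and the hardTimeoutExceeded() helper are assumptions, not the code in incidents-page.ts. The idea it relies on is that spreading { log: false } into a Cypress command's options keeps that command out of the command log, so there is no log entry for Cypress to attach a DOM snapshot to.

```typescript
// Module-level search state. Names follow the PR; bodies are a hypothetical sketch.
let _quietSearch = false;
let _findIncidentSearchStart: number | null = null;
const _FIND_INCIDENT_HARD_TIMEOUT_MS = 35 * 60 * 1000; // 35-minute safety net

// Options fragment meant to be spread into Cypress commands,
// e.g. cy.get(selector, _qLog()). In quiet mode it returns { log: false }.
function _qLog(): { log?: boolean } {
  return _quietSearch ? { log: false } : {};
}

// Clears both pieces of state so the next search starts clean.
function _resetSearchState(): void {
  _quietSearch = false;
  _findIncidentSearchStart = null;
}

// Hypothetical helper: starts the clock on the first call, then reports
// whether the hard deadline has passed. Date.now()-based, so it does not
// depend on cy.waitUntil's own timeout bookkeeping.
function hardTimeoutExceeded(now: number = Date.now()): boolean {
  if (_findIncidentSearchStart === null) {
    _findIncidentSearchStart = now;
  }
  return now - _findIncidentSearchStart > _FIND_INCIDENT_HARD_TIMEOUT_MS;
}
```

In a search iteration the check would run first (bail out and reset if the deadline has passed), and every command inside the iteration would take _qLog() as its options.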
Replace the simple waitUntil loop with the OOM-protected
findIncidentWithAlert — it uses cy.reload(), _quietSearch, and
hard timeout to safely poll for up to 30 minutes without OOM.

The previous approach accumulated Cypress command snapshots and
browser DOM across 15+ retry cycles, causing exit 137 in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The monitoring-console-plugin registers the Incidents tab extension
asynchronously after Cypress session restoration. Without warm-up,
the tab may not be registered before beforeEach() tries to navigate,
causing flaky failures in reg/02-05 specs.

Uses warmUpForPlugin() (added in previous commit) which polls for
the Incidents tab via jQuery check for up to 3 minutes — replacing
the fragile goTo() that relied on the 80s default command timeout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
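The warm-up relies on a synchronous, non-retrying probe inside a polling loop, so no single command waits long enough to hit the 80 s default command timeout. The sketch below models that loop in plain TypeScript; pollUntil() is a hypothetical stand-in for cy.waitUntil() from the cypress-wait-until plugin, not its implementation.

```typescript
// Hypothetical stand-in for cy.waitUntil(): call a synchronous probe
// repeatedly until it returns true or the overall deadline passes.
async function pollUntil(
  probe: () => boolean,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    // The probe never retries or waits on its own, so it cannot trip a
    // per-command timeout; only the overall deadline applies.
    if (probe()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}

// Example: a "tab" that only becomes available after a few probes,
// like a plugin extension that registers asynchronously.
let probes = 0;
const tabRegistered = (): boolean => ++probes >= 3;

pollUntil(tabRegistered, { timeoutMs: 1_000, intervalMs: 10 }).then(() =>
  console.log(`tab found after ${probes} probes`),
);
```

In the spec the probe would be along the lines of cy.get('body', { log: false }).then(($body) => $body.find(tabSelector).length > 0) passed to cy.waitUntil() with a 3-minute timeout; tabSelector is a placeholder, since the real selector for the Incidents tab is not shown here.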
- Remove 3 cy.pause() debug calls from the tooltip boundary test
  (02.reg_ui_tooltip_boundary_times) that block automated runs.
  The @xfail tag is already excluded by CI test commands (--@xfail).

- Add missing silenced-and-firing-mixed-severity fixture referenced
  by 02.incidents-mocking-example.cy.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 16, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 16, 2026

@DavidRajnoha: This pull request references OBSINTA-1290 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Testsuite stability improvement generated by POC agentic loop. Looking for coderabbit feedback.

  • OOM prevention in search loop: cy.reload() to free browser DOM, _quietSearch to suppress
    Cypress snapshot accumulation, and a 35-min hard timeout safety net against infinite hangs

  • Plugin warm-up: warmUpForPlugin() polls for the Incidents tab up to 3 min, fixing flaky
    reg/02–05 specs caused by async plugin registration after session restoration

  • Test hygiene: remove cy.pause() debug calls, add missing fixture

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from jgbernalp and zhuje April 16, 2026 10:51
@openshift-ci
Contributor

openshift-ci Bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DavidRajnoha
Once this PR has been reviewed and has the lgtm label, please assign zhuje for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

openshift-ci-robot commented Apr 16, 2026

@DavidRajnoha: This pull request references OBSINTA-1290 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Testsuite stability improvement generated by POC agentic loop. Looking for coderabbit feedback.

  • OOM prevention in search loop: cy.reload() to free browser DOM, _quietSearch to suppress
    Cypress snapshot accumulation, and a 35-min hard timeout safety net against infinite hangs

  • Plugin warm-up: warmUpForPlugin() polls for the Incidents tab up to 3 min, fixing flaky
    reg/02–05 specs caused by async plugin registration after session restoration

  • Test hygiene: remove cy.pause() debug calls, add missing fixture

Summary by CodeRabbit

  • Tests
  • Optimized incident test polling behavior with adjusted interval timing and hard timeout guards to improve stability.
  • Added plugin warm-up initialization steps across multiple test suites for enhanced readiness.
  • Removed manual pause statements to accelerate test execution.
  • Introduced search timeout reset functionality for cleaner test state management.
  • Added new incident scenario fixture for comprehensive test coverage.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Walkthrough

Added plugin warm-up and search-timeout management to Cypress incidents tests and the incidents-page view; replaced a maxMinutes wait with explicit interval-based polling, muted Cypress DOM logging during searches for OOM-safe traversal, removed manual pauses, added a mixed-severity incident fixture, and added a test-stability report.

Changes

Cohort / File(s) Summary
Polling & Timeout Adjustment
web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts
Replaced the maxMinutes-based cy.waitUntil with explicit interval-based timing: intervalMs is 60 s, polls run every 2 * intervalMs, and the total timeout is 30 * intervalMs + 2 * intervalMs. Added comments describing OOM-safe polling, reload behavior, suppressed DOM snapshots, and the hard timeout.
Plugin Warm-up Hooks
web/cypress/e2e/incidents/01.incidents.cy.ts, web/cypress/e2e/incidents/regression/02.reg_ui_charts_comprehensive.cy.ts, web/cypress/e2e/incidents/regression/03.reg_api_calls.cy.ts, web/cypress/e2e/incidents/regression/04.reg_redux_effects.cy.ts, web/cypress/e2e/incidents/regression/05.reg_stress_testing_ui.cy.ts, web/cypress/e2e/incidents/regression/03-04.reg_e2e_firing_alerts.cy.ts
Inserted incidentsPage.warmUpForPlugin() into suite before() hooks to ensure the Incidents plugin/tab is registered before tests run. Also added incidentsPage.resetSearchTimeout() in the firing-alerts spec to reset search timing state.
Test Flow Cleanup
web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts
Removed three cy.pause() calls so tests no longer halt for manual inspection and proceed uninterrupted.
New Fixture
web/cypress/fixtures/incident-scenarios/silenced-and-firing-mixed-severity.yaml
Added a fixture describing two incidents: a resolved critical (silenced, non-firing) and an active warning (firing, unsilenced), with per-incident and per-alert timing and metadata.
Incidents Page View & Search Logic
web/cypress/views/incidents-page.ts
Major refactor: added module-level search state (_findIncidentSearchStart, _FIND_INCIDENT_HARD_TIMEOUT_MS), _quietSearch mode, helper _qLog() and _resetSearchState(). Added warmUpForPlugin() and resetSearchTimeout(). Modified goTo()/prepareIncidentsPageForSearch() to wait for Incidents tab registration, suppressed Cypress logging during searches, changed chart bar traversal to use bar-group selector and click first visible path, and rewrote findIncidentWithAlert() to use quiet mode, count bar groups via body query, return false when none found, and enforce a hard timeout.
Test Stability Report
web/cypress/reports/test-stability.md
Added a Test Stability Ledger with per-test metrics and an embedded machine-readable stability dataset between markers for automated updates.
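Keeping the machine-readable dataset between markers lets a tool regenerate it without touching the hand-written parts of the report. A sketch of that update step; the marker strings and replaceBetweenMarkers() are hypothetical, since the ledger's actual markers are not shown on this page.

```typescript
// Hypothetical marker strings; the real ledger's markers are not shown here.
const START = "<!-- stability-data:start -->";
const END = "<!-- stability-data:end -->";

// Replaces everything strictly between the markers with `body`, leaving the
// rest of the report untouched. Throws if the markers are missing or out of
// order, rather than silently appending a second dataset.
function replaceBetweenMarkers(doc: string, body: string, start = START, end = END): string {
  const s = doc.indexOf(start);
  const e = doc.indexOf(end, s === -1 ? 0 : s + start.length);
  if (s === -1 || e === -1) {
    throw new Error("stability dataset markers not found");
  }
  return doc.slice(0, s + start.length) + "\n" + body.trim() + "\n" + doc.slice(e);
}

const report = [
  "# Test Stability Ledger",
  START,
  '{"old": true}',
  END,
  "Notes stay here.",
].join("\n");

console.log(replaceBetweenMarkers(report, '{"tests": []}'));
```

Failing loudly on missing markers is a deliberate choice in this sketch: an automated updater that silently appended data would slowly duplicate the dataset.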

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'OBSINTA-1290: Incident Tests Stability' clearly and concisely summarizes the main objective of the changeset: improving stability of incident-related tests through various fixes and warm-up mechanisms.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed This pull request modifies only Cypress E2E tests (*.cy.ts files), not Ginkgo tests, so the Ginkgo naming check is not applicable.
Test Structure And Quality ✅ Passed The custom check for Ginkgo test code quality is not applicable to this pull request. The PR contains no Ginkgo test code—only Cypress e2e tests, test utilities, YAML fixtures, and documentation.
Microshift Test Compatibility ✅ Passed The check addresses Ginkgo e2e tests (Go-based) but the PR contains only Cypress e2e tests (TypeScript/JavaScript), making the check not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR modifies Cypress e2e tests (TypeScript UI tests), not Ginkgo e2e tests (Go infrastructure tests); SNO compatibility check does not apply.
Topology-Aware Scheduling Compatibility ✅ Passed Pull request modifies only Cypress E2E test code and fixtures, not deployment manifests or operator code, so topology-aware scheduling check is not applicable.
Ote Binary Stdout Contract ✅ Passed The OTE Binary Stdout Contract check is specific to Go/Ginkgo test code communicating with openshift-tests via JSON stdout. This PR modifies only Cypress E2E tests and fixtures in TypeScript, YAML, and Markdown files within web/cypress/ directory, with no Go code or process-level stdout operations touched.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR modifies only Cypress browser-based E2E tests in TypeScript, which do not involve Kubernetes networking or IPv6 concerns that the check addresses.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/cypress/views/incidents-page.ts (1)

403-431: ⚠️ Potential issue | 🟠 Major

Index incidents by bar group, not by flattened path order.

These helpers still treat incidentsChartBarsVisiblePaths() as “one entry per incident”, but multi-severity incidents render multiple visible paths. That means selectIncidentByBarIndex() can click the wrong incident, and deselectIncidentByBar() always clicks the first visible path rather than the currently selected bar. Downstream, traverseAllIncidentsBars() can skip incidents, revisit the same one, or fail once it searches past index 0.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/cypress/views/incidents-page.ts` around lines 403 - 431,
selectIncidentByBarIndex and deselectIncidentByBar are indexing into the
flattened nodes returned by incidentsChartBarsVisiblePaths() but the chart
renders multi-severity incidents as multiple <path>s per logical bar; change
both helpers to operate on bar groups instead of flattened paths: use a grouped
selector (e.g., incidentsChartBars() or find the parent/bar container around
paths) to get one element per incident, validate index against that group
length, and click the appropriate clickable child inside the chosen group (first
path or specific selector). For deselectIncidentByBar, locate the currently
selected bar group (e.g., filter by a selected class/attribute) or accept an
index parameter and click that group's clickable child instead of always
clicking $paths.eq(0). Also update traverseAllIncidentsBars to iterate over the
bar groups (the same grouped selector) so it visits each logical incident once.
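A small model of why flattened path indexing goes wrong with multi-severity incidents. BarGroup, Segment, and the three helpers are hypothetical stand-ins for the chart DOM (one bar group per incident, one path per severity), not the page object's API. firstVisibleSegment() reflects a later review suggestion to click the first visible segment in a group rather than an unconditional .first().

```typescript
// Hypothetical model of the incidents chart: one bar group per incident,
// each holding one path per severity segment.
interface Segment { severity: string; visible: boolean; }
interface BarGroup { incidentId: string; segments: Segment[]; }

// Flattened indexing (the pattern under review): every visible segment gets
// an index, so a multi-severity incident occupies several slots.
function incidentAtFlattenedIndex(groups: BarGroup[], index: number): string | undefined {
  const flat = groups.flatMap((g) =>
    g.segments.filter((s) => s.visible).map(() => g.incidentId),
  );
  return flat[index];
}

// Grouped indexing: exactly one slot per incident.
function incidentAtGroupIndex(groups: BarGroup[], index: number): string | undefined {
  return groups[index]?.incidentId;
}

// Within a group, target the first visible segment so a hidden
// placeholder path is never the click target.
function firstVisibleSegment(group: BarGroup): Segment | undefined {
  return group.segments.find((s) => s.visible);
}

const chart: BarGroup[] = [
  { incidentId: "A", segments: [
    { severity: "critical", visible: true },
    { severity: "warning", visible: true },
  ] },
  { incidentId: "B", segments: [
    { severity: "info", visible: false },
    { severity: "warning", visible: true },
  ] },
];

console.log(incidentAtFlattenedIndex(chart, 1)); // A (second segment of A, not incident B)
console.log(incidentAtGroupIndex(chart, 1)); // B
console.log(firstVisibleSegment(chart[1])?.severity); // warning
```

Iterating over groups rather than flattened paths is also what lets a traversal visit each logical incident exactly once.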
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts`:
- Around line 53-55: The waitUntil call uses interval: 2 * intervalMs but
timeout: 30 * intervalMs, which shortens the effective search window by one
poll; update the timeout to include one full polling interval (e.g., set timeout
to 30 * intervalMs + (2 * intervalMs) or otherwise add the interval value) so
that cy.waitUntil(..., { interval: 2 * intervalMs, timeout: 30 * intervalMs + (2
* intervalMs) }) ensures incidentsPage.findIncidentWithAlert(currentAlertName)
is polled across the full intended 30-minute window.

In `@web/cypress/e2e/incidents/regression/03-04.reg_e2e_firing_alerts.cy.ts`:
- Around line 42-46: The before() hook currently calls cy.beforeBlockCOO(MCP,
MP, ...) and incidentsPage.resetSearchTimeout() but omits the plugin warm-up;
add a call to incidentsPage.warmUpForPlugin() in this before() hook (run it
before navigating or before resetSearchTimeout) so the Incidents tab is warmed
before session restore/navigation and avoids the async plugin-registration race;
update the before() in 03-04.reg_e2e_firing_alerts.cy.ts to invoke
incidentsPage.warmUpForPlugin() alongside cy.beforeBlockCOO(...) and
incidentsPage.resetSearchTimeout().

In `@web/cypress/views/incidents-page.ts`:
- Around line 639-641: The module-level flags _findIncidentSearchStart and
_quietSearch are only cleared in the hard-timeout and success (.then) branches;
ensure you call the existing reset helper (resetSearchTimeout) from all
failure/cleanup paths as well (including any .catch/.finally or Cypress failure
callbacks inside the search chain) so that when a Cypress command fails the
module state is cleared and subsequent tests don't inherit quiet mode or a stale
start timestamp; update the failure branches referenced around the search flow
(including the block covering lines 842-875) to invoke resetSearchTimeout where
appropriate.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 325d507e-5629-49b5-9b0c-1758c8e7bbc0

📥 Commits

Reviewing files that changed from the base of the PR and between ca47ec8 and 0d43a91.

📒 Files selected for processing (9)
  • web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts
  • web/cypress/e2e/incidents/regression/02.reg_ui_charts_comprehensive.cy.ts
  • web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts
  • web/cypress/e2e/incidents/regression/03-04.reg_e2e_firing_alerts.cy.ts
  • web/cypress/e2e/incidents/regression/03.reg_api_calls.cy.ts
  • web/cypress/e2e/incidents/regression/04.reg_redux_effects.cy.ts
  • web/cypress/e2e/incidents/regression/05.reg_stress_testing_ui.cy.ts
  • web/cypress/fixtures/incident-scenarios/silenced-and-firing-mixed-severity.yaml
  • web/cypress/views/incidents-page.ts
💤 Files with no reviewable changes (1)
  • web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts

Comment thread web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts Outdated
Comment thread web/cypress/views/incidents-page.ts
- waitUntil timeout: add one extra interval to avoid off-by-one
  where the last poll fires 2 min before timeout expires
- 03-04.reg_e2e_firing_alerts: add missing warmUpForPlugin() call
  that all other regression specs already have
- incidents-page.ts: extract _resetSearchState() helper and use it
  on both success and error paths so _quietSearch and
  _findIncidentSearchStart don't leak across specs on failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
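The off-by-one can be checked with a few lines of arithmetic. This sketch assumes a simplified polling model (a poll starts at t = 0, interval, 2 × interval, ... only if it starts strictly before the timeout, and poll duration is ignored); cypress-wait-until's exact retry bookkeeping may differ. pollStartTimesMs() is a hypothetical helper.

```typescript
// Simplified model (an assumption): list the start times of every poll that
// begins strictly before the timeout. Poll duration is ignored.
function pollStartTimesMs(intervalMs: number, timeoutMs: number): number[] {
  const starts: number[] = [];
  for (let t = 0; t < timeoutMs; t += intervalMs) starts.push(t);
  return starts;
}

const intervalMs = 60_000; // 1-minute base unit, as in the spec
const pollEvery = 2 * intervalMs; // cy.waitUntil interval

const pollsBefore = pollStartTimesMs(pollEvery, 30 * intervalMs);
const pollsAfter = pollStartTimesMs(pollEvery, 30 * intervalMs + 2 * intervalMs);

// Number of polls, and the minute at which the last one starts.
console.log(pollsBefore.length, pollsBefore[pollsBefore.length - 1] / intervalMs); // 15 28
console.log(pollsAfter.length, pollsAfter[pollsAfter.length - 1] / intervalMs); // 16 30
```

Under this model the original timeout leaves the last poll at minute 28, and adding one interval moves it to minute 30, covering the full intended window.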
@DavidRajnoha DavidRajnoha force-pushed the test/OBSINTA-1290-incident-test-stability branch from 03ddbe2 to 580dc96 Compare April 16, 2026 15:01
@openshift-ci-robot

openshift-ci-robot commented Apr 16, 2026

@DavidRajnoha: This pull request references OBSINTA-1290 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Testsuite stability improvement generated by POC agentic loop. Looking for coderabbit feedback.

  • OOM prevention in search loop: cy.reload() to free browser DOM, _quietSearch to suppress
    Cypress snapshot accumulation, and a 35-min hard timeout safety net against infinite hangs

  • Plugin warm-up: warmUpForPlugin() polls for the Incidents tab up to 3 min, fixing flaky
    reg/02–05 specs caused by async plugin registration after session restoration

  • Test hygiene: remove cy.pause() debug calls, add missing fixture

Summary by CodeRabbit

  • Tests
  • Improved incident test polling with explicit interval timing, OOM-safe polling, and a hard timeout to prevent infinite waits.
  • Added plugin/page warm-up steps across multiple suites to improve readiness.
  • Removed manual pause points to speed full test runs.
  • Added a reset for incident search timeout to ensure clean test state.
  • Introduced a new incident scenario fixture for broader coverage.
  • Reduced noisy test logging/snapshots during long-running searches.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
web/cypress/views/incidents-page.ts (1)

847-890: ⚠️ Potential issue | 🟠 Major

Reset the search state on every terminal path.

This still only clears _findIncidentSearchStart on hard timeout or a true result. A normal false result leaves the old start time alive, so later independent findIncidentWithAlert() calls inherit a shortened 35-minute budget. If any Cypress command in this chain fails before the final .then(...), _quietSearch leaks too. _resetSearchState() needs to run whenever the search ends, not only on success.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/cypress/views/incidents-page.ts` around lines 847 - 890, The search state
reset is only invoked on hard timeout or when a match is found, causing
_findIncidentSearchStart and _quietSearch to leak on normal false results or on
errors; update findIncidentWithAlert to ensure _resetSearchState() always runs
when the search finishes (success, false, or error) and that _quietSearch is
restored to false for non-success paths. Concretely, add a terminal cleanup
handler on the Cypress chain (e.g., a .finally or a concluding .then/.catch
pair) that calls _resetSearchState() and sets _quietSearch = false when found
=== false or on errors, and also call _resetSearchState() only once for the
successful path, referencing the existing function name findIncidentWithAlert
plus symbols _resetSearchState and _quietSearch to locate the change.
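The invariant requested here (quiet mode cleared on success, on a false result, and on error) is easiest to see in a promise-based model. Note that Cypress chains are not Promises and have no .catch() or .finally(), so a try/finally like the one below cannot be attached to a cy.* chain directly; in the spec the closest equivalents are resetting from a hook (this PR calls resetSearchTimeout() in before()) or at the start of each independent search. The sketch also keeps the start timestamp across false results, because clearing it on every false poll would restart the 35-minute budget each time. quietSearch, searchStart, and searchAttempt() are hypothetical simplified stand-ins, not the incidents-page.ts code.

```typescript
// Hypothetical stand-ins for _quietSearch and _findIncidentSearchStart.
let quietSearch = false;
let searchStart: number | null = null;

// Explicit reset between independent searches (the role resetSearchTimeout() plays).
function resetSearchTimeout(): void {
  searchStart = null;
}

// One polling attempt. Quiet mode is always restored in `finally`, whether the
// attempt resolves true, resolves false, or rejects. The start timestamp is
// kept across false results so the hard timeout spans all retries, and is
// cleared once the incident is found.
async function searchAttempt(
  attempt: () => Promise<boolean>,
  now: number = Date.now(),
): Promise<boolean> {
  quietSearch = true;
  if (searchStart === null) searchStart = now;
  try {
    const found = await attempt();
    if (found) searchStart = null;
    return found;
  } finally {
    quietSearch = false;
  }
}
```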
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/cypress/views/incidents-page.ts`:
- Around line 419-442: In selectIncidentByBarIndex and deselectIncidentByBar,
stop clicking the unconditional .first() path inside a bar group; instead locate
the visible segment first by filtering for visible/non-zero-opacity paths (the
same visible-segment filter used elsewhere) on
incidentsChartBarsGroups().eq(index) before calling .first() and .click({ force:
true, ..._qLog() }) so you target the actual displayed segment rather than a
hidden placeholder.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 22eef24d-65bd-42b6-9e63-d65990bb9270

📥 Commits

Reviewing files that changed from the base of the PR and between 0d43a91 and 580dc96.

📒 Files selected for processing (4)
  • web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts
  • web/cypress/e2e/incidents/01.incidents.cy.ts
  • web/cypress/e2e/incidents/regression/03-04.reg_e2e_firing_alerts.cy.ts
  • web/cypress/views/incidents-page.ts
✅ Files skipped from review due to trivial changes (1)
  • web/cypress/e2e/incidents/regression/03-04.reg_e2e_firing_alerts.cy.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts

Comment thread web/cypress/views/incidents-page.ts
@DavidRajnoha
Contributor Author

/test e2e-incidents

@DavidRajnoha DavidRajnoha force-pushed the test/OBSINTA-1290-incident-test-stability branch from 580dc96 to d9f37d2 Compare April 17, 2026 07:44
@openshift-ci-robot

openshift-ci-robot commented Apr 17, 2026

@DavidRajnoha: This pull request references OBSINTA-1290 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Testsuite stability improvement generated by POC agentic loop. Looking for coderabbit feedback.

  • OOM prevention in search loop: cy.reload() to free browser DOM, _quietSearch to suppress
    Cypress snapshot accumulation, and a 35-min hard timeout safety net against infinite hangs

  • Plugin warm-up: warmUpForPlugin() polls for the Incidents tab up to 3 min, fixing flaky
    reg/02–05 specs caused by async plugin registration after session restoration

  • Test hygiene: remove cy.pause() debug calls, add missing fixture

Summary by CodeRabbit

  • Tests
  • Improved incident test polling with explicit interval timing, OOM-safe polling, and a hard timeout to prevent infinite waits.
  • Added plugin/page warm-up steps across multiple suites to improve readiness.
  • Removed manual pause points to speed full test runs.
  • Added a reset for incident search timeout to ensure clean test state.
  • Introduced a new incident scenario fixture for broader coverage.
  • Reduced noisy test logging/snapshots and improved incident traversal/search reliability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
web/cypress/views/incidents-page.ts (1)

875-898: ⚠️ Potential issue | 🟠 Major

Search state is still not cleaned up on command failures.

This still only resets via the hard-timeout branch or the final success .then(...). If any Cypress command in prepareIncidentsPageForSearch() or traverseAllIncidentsBars() fails first, that tail callback never runs, so later searches inherit _quietSearch and a stale _findIncidentSearchStart.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/cypress/views/incidents-page.ts` around lines 875 - 898, The search state
(_quietSearch and _findIncidentSearchStart) must be cleaned on both success and
failure: wrap the cy.get(...).then(...) promise chain that calls
prepareIncidentsPageForSearch() and traverseAllIncidentsBars(...) with an error
handler so failures also call _resetSearchState() (and set _quietSearch = false
if appropriate) before rethrowing the error; in practice add a .catch /
second-argument .then error callback after the chain that calls
_resetSearchState() and resets _quietSearch/_findIncidentSearchStart and then
rethrows so the original test failure still surfaces.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/cypress/views/incidents-page.ts`:
- Around line 90-100: The DOM existence probe runs before the queued
cy.wait(500, _qLog()) resolves, so move the probe to run after the wait: inside
the .then(($body) => { ... }) return cy.wait(500, _qLog()).then(() => { const
exists =
$body.find('g[role="presentation"][data-test*="incidents-chart-bar-"]').length >
0; if (exists) { return
cy.get('g[role="presentation"][data-test*="incidents-chart-bar-"]', _qLog()) ...
} else { return [] } }); this ensures the check for the selector executes after
the delay and uses the same $body reference.
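Why the probe ran too early: inside a .then() callback, cy.* calls are only enqueued, while ordinary JavaScript such as $body.find(...) runs immediately. The toy CommandQueue below is a hypothetical model of that ordering, not Cypress's implementation; it appends nested commands to the end of the queue, which matches Cypress's behavior only in this single-outer-command case.

```typescript
type Command = () => void;

// Hypothetical miniature command queue: commands are recorded when called
// and executed later, in order, by run(). `trace` records execution order.
class CommandQueue {
  private queue: Command[] = [];
  readonly trace: string[] = [];

  enqueue(name: string, fn: Command = () => {}): this {
    this.queue.push(() => {
      this.trace.push(name);
      fn();
    });
    return this;
  }

  run(): void {
    while (this.queue.length > 0) {
      const next = this.queue.shift()!;
      next();
    }
  }
}

// Buggy shape: the wait is queued, but the probe executes synchronously.
const buggy = new CommandQueue();
buggy.enqueue("then", () => {
  buggy.enqueue("wait(500)");
  buggy.trace.push("probe"); // runs now, before the queued wait
});
buggy.run();
console.log(buggy.trace); // [ 'then', 'probe', 'wait(500)' ]

// Fixed shape: the probe is itself queued behind the wait.
const fixed = new CommandQueue();
fixed.enqueue("then", () => {
  fixed.enqueue("wait(500)");
  fixed.enqueue("probe");
});
fixed.run();
console.log(fixed.trace); // [ 'then', 'wait(500)', 'probe' ]
```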


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 1c4eca30-99ea-489d-9013-012288f9179c

📥 Commits

Reviewing files that changed from the base of the PR and between 580dc96 and d9f37d2.

📒 Files selected for processing (2)
  • web/cypress/e2e/incidents/01.incidents.cy.ts
  • web/cypress/views/incidents-page.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/cypress/e2e/incidents/01.incidents.cy.ts

Comment thread web/cypress/views/incidents-page.ts Outdated
Claude Agent and others added 2 commits April 22, 2026 09:09
- selectIncidentByBarIndex/deselectIncidentByBar: use
  incidentsChartBarsGroups() (one element per incident) instead of
  incidentsChartBarsVisiblePaths() (flattened paths). Fixes incorrect
  incident selection/deselection with multi-severity incidents.
- findIncidentWithAlert: count bar groups instead of flattened paths
- deselectIncidentByBar: accept index parameter to deselect the
  correct incident instead of always clicking the first bar
- 01.incidents.cy.ts: add warmUpForPlugin() to before() hook —
  same plugin loading race fix as reg/02-05

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Records all verification runs from the OBSINTA-1290 consolidation:
- Cluster 1 (ci-ln-trfv3nt): 3 regression + 1 e2e-real pass
- Cluster 2 (ci-ln-zgwt0qt): 1 regression pass
- Cluster 3 (ci-ln-lg6ry1t): 2 regression passes
- Cluster 4 (ci-ln-y7v0t92): 1 regression pass

Test 1.5 (Traverse Incident Table) had a transient failure on cluster 2
due to plugin loading race — fixed by adding warmUpForPlugin() to goTo()
and BVT before() hook in commit 0cb566d.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@DavidRajnoha DavidRajnoha force-pushed the test/OBSINTA-1290-incident-test-stability branch from d9f37d2 to 35f3540 Compare April 22, 2026 09:47

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
web/cypress/views/incidents-page.ts (1)

238-262: Consider extracting shared navigation logic.

warmUpForPlugin() duplicates nearly all of goTo(). Both navigate to Alerting, wait for the Incidents tab, switch to it, and verify visibility. If the navigation flow changes, both methods must be updated.

Consider extracting the shared logic or having warmUpForPlugin() delegate to goTo():

♻️ Optional refactor
   warmUpForPlugin: () => {
     cy.log('incidentsPage.warmUpForPlugin: waiting for monitoring-console-plugin Incidents tab');
-    nav.sidenav.clickNavLink(['Observe', 'Alerting']);
-    // Wait up to 3 minutes for the Incidents tab to appear. Uses synchronous jQuery check
-    // inside cy.waitUntil() to avoid the 80s default command timeout, then uses
-    // nav.tabs.switchTab() which correctly clicks the button element (not the li wrapper).
-    incidentsPage.waitForIncidentsTab();
-    nav.tabs.switchTab('Incidents');
-    incidentsPage.elements.daysSelectToggle().should('be.visible');
+    incidentsPage.goTo();
   },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/cypress/views/incidents-page.ts` around lines 238 - 262,
warmUpForPlugin() duplicates the navigation flow in goTo() (both call
nav.sidenav.clickNavLink(['Observe','Alerting']), waitForIncidentsTab(),
nav.tabs.switchTab('Incidents'), and check daysSelectToggle), so extract the
shared sequence into a single helper or have warmUpForPlugin() call goTo() to
avoid duplicate maintenance; update warmUpForPlugin() to delegate to goTo() (or
call the new helper) and keep waitForIncidentsTab() as the shared poll function
used by both.
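The diff's comment describes how `waitForIncidentsTab()` avoids the 80s default command timeout. Each check is a synchronous jQuery lookup inside `cy.waitUntil()`, so only the overall 3-minute budget bounds the wait. The timing logic alone can be sketched in plain TypeScript with an injected clock so it runs without real waiting. `Clock`, `pollUntil`, and the 5-second interval are illustrative assumptions, not the helper's actual code.

```typescript
// Hypothetical clock abstraction so the loop can be exercised without real waiting.
interface Clock {
  now(): number;
  sleep(ms: number): void;
}

// Polls a synchronous check until it passes or the overall deadline expires.
// Each check returns immediately, so no single step can hit a per-command
// timeout; only timeoutMs (e.g. 3 minutes for the Incidents tab) bounds the wait.
function pollUntil(check: () => boolean, timeoutMs: number, intervalMs: number, clock: Clock): boolean {
  const deadline = clock.now() + timeoutMs;
  while (true) {
    if (check()) return true;
    // Stop if the next check would fall past the deadline.
    if (clock.now() + intervalMs > deadline) return false;
    clock.sleep(intervalMs);
  }
}

// Fake clock for tests: sleep() just advances simulated time.
function makeFakeClock(): Clock {
  let t = 0;
  return { now: () => t, sleep: (ms: number) => { t += ms; } };
}
```

If `warmUpForPlugin()` delegates to `goTo()` as the nitpick suggests, this budget lives in one shared poll function instead of being repeated at two call sites.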
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/cypress/reports/test-stability.md`:
- Around line 14-33: The JSON test keys and the human-readable table labels are
inconsistent (e.g., JSON uses "BVT: Incidents - UI 1. Toolbar..." while the
table uses "BVT: Incidents - 1.1 Toolbar and charts toggle functionality");
update the sources so they share a single canonical identifier—either rename the
JSON keys to match the table labels (change "BVT: Incidents - UI 1. Toolbar..."
→ "BVT: Incidents - 1.1 Toolbar and charts toggle functionality") or add a
stable id field (e.g., "testId": "BVT_Incidents_1.1") in the JSON and use that
id in the table generation and matching scripts; apply the same change for all
affected entries (examples include "BVT: Incidents - 1.5 Traverse Incident
Table" and "Regression: Filtering - 1. Severity filtering") so automated
correlation scripts can reliably match entries.

---

Nitpick comments:
In `@web/cypress/views/incidents-page.ts`:
- Around line 238-262: warmUpForPlugin() duplicates the navigation flow in
goTo() (both call nav.sidenav.clickNavLink(['Observe','Alerting']),
waitForIncidentsTab(), nav.tabs.switchTab('Incidents'), and check
daysSelectToggle), so extract the shared sequence into a single helper or have
warmUpForPlugin() call goTo() to avoid duplicate maintenance; update
warmUpForPlugin() to delegate to goTo() (or call the new helper) and keep
waitForIncidentsTab() as the shared poll function used by both.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b3a685cf-a947-4a7a-a706-d41be662e084

📥 Commits

Reviewing files that changed from the base of the PR and between d9f37d2 and 35f3540.

📒 Files selected for processing (3)
  • web/cypress/e2e/incidents/01.incidents.cy.ts
  • web/cypress/reports/test-stability.md
  • web/cypress/views/incidents-page.ts
✅ Files skipped from review due to trivial changes (1)
  • web/cypress/e2e/incidents/01.incidents.cy.ts

Comment on lines +14 to +33
| Test | Pass Rate | Trend | Runs | Last Failure | Fixed By |
|------|-----------|-------|------|-------------|----------|
| BVT: Incidents - 1.1 Toolbar and charts toggle functionality | 100% | stable | 7 | — | — |
| BVT: Incidents - 1.2 Incidents chart renders with bars | 100% | stable | 7 | — | — |
| BVT: Incidents - 1.3 Incidents table renders with rows | 100% | stable | 7 | — | — |
| BVT: Incidents - 1.4 Charts and alerts empty state | 100% | stable | 7 | — | — |
| BVT: Incidents - 1.5 Traverse Incident Table | 100% | stable | 7 | 2026-04-16: plugin tab timeout (80s) | 0cb566d (warmUpForPlugin in goTo + BVT before) |
| Regression: Filtering - 1. Severity filtering | 100% | stable | 7 | — | — |
| Regression: Filtering - 2. Chart interaction with active filters | 100% | stable | 7 | — | — |
| Regression: Charts UI - 2.1 Chart renders with correct bar count | 100% | stable | 7 | — | — |
| Regression: Charts UI - 2.2 Chart bars have correct severity colors | 100% | stable | 7 | — | — |
| Regression: Charts UI - 2.3 Toggle charts button hides/shows chart | 100% | stable | 7 | — | — |
| Regression: Charts UI - 2.4 Incident selection updates alert chart | 100% | stable | 7 | — | — |
| Regression: Silences - 3.1 Silenced alerts not shown as active | 100% | stable | 7 | — | — |
| Regression: Silences - 3.2 Mixed silenced and firing alerts | 100% | stable | 7 | — | — |
| Regression: Redux - 4.1 Redux state updates on filter change | 100% | stable | 7 | — | — |
| Regression: Redux - 4.2 Redux state persists across navigation | 100% | stable | 7 | — | — |
| Regression: Redux - 4.3 Days selector updates redux state | 100% | stable | 7 | — | — |
| Regression: Stress Testing - 5.1 No excessive padding | 100% | stable | 7 | — | — |


⚠️ Potential issue | 🟡 Minor

Test name keys in JSON don't match human-readable table labels.

The human-readable table uses names like BVT: Incidents - 1.1 Toolbar... while the JSON uses BVT: Incidents - UI 1. Toolbar.... This inconsistency could break automated correlation between the two sections if scripts attempt to match by test name.

Either align the naming conventions or document that they differ intentionally.

Also applies to: 54-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/cypress/reports/test-stability.md` around lines 14 - 33, The JSON test
keys and the human-readable table labels are inconsistent (e.g., JSON uses "BVT:
Incidents - UI 1. Toolbar..." while the table uses "BVT: Incidents - 1.1 Toolbar
and charts toggle functionality"); update the sources so they share a single
canonical identifier—either rename the JSON keys to match the table labels
(change "BVT: Incidents - UI 1. Toolbar..." → "BVT: Incidents - 1.1 Toolbar and
charts toggle functionality") or add a stable id field (e.g., "testId":
"BVT_Incidents_1.1") in the JSON and use that id in the table generation and
matching scripts; apply the same change for all affected entries (examples
include "BVT: Incidents - 1.5 Traverse Incident Table" and "Regression:
Filtering - 1. Severity filtering") so automated correlation scripts can
reliably match entries.
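The review's second option is a stable `testId` (for example `BVT_Incidents_1.1`) shared by the JSON entries and the table rows. That makes correlation independent of display names. Below is a minimal sketch of such a join, assuming both sides are already parsed into the hypothetical `LedgerEntry` and `TableRow` shapes.

```typescript
// Hypothetical shapes: a JSON ledger entry and a rendered table row,
// both carrying the stable testId the review proposes.
interface LedgerEntry {
  testId: string;
  name: string; // display name; may differ between JSON and table
}
interface TableRow {
  testId: string;
  label: string;
}

interface Correlation {
  matched: Array<{ row: TableRow; entry: LedgerEntry }>;
  unmatchedRows: string[];    // ids in the table but not in the JSON
  unmatchedEntries: string[]; // ids in the JSON but not in the table
}

// Joins rows to entries by testId only, so a differing display label
// ("UI 1. Toolbar..." vs "1.1 Toolbar...") cannot break the match.
function correlate(entries: LedgerEntry[], rows: TableRow[]): Correlation {
  const byId = new Map<string, LedgerEntry>(entries.map((e): [string, LedgerEntry] => [e.testId, e]));
  const matched: Correlation["matched"] = [];
  const unmatchedRows: string[] = [];
  for (const row of rows) {
    const entry = byId.get(row.testId);
    if (entry) {
      matched.push({ row, entry });
      byId.delete(row.testId);
    } else {
      unmatchedRows.push(row.testId);
    }
  }
  return { matched, unmatchedRows, unmatchedEntries: Array.from(byId.keys()) };
}
```

Reporting ids found on only one side, rather than silently skipping them, makes drift between the JSON block (lines 54-155) and the table visible.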

@openshift-ci

openshift-ci Bot commented Apr 22, 2026

@DavidRajnoha: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
