---
name: analyze-ci-results
description: Analyze OpenShift CI (Prow) test results from a gcsweb URL - identifies infra vs test/code failures and correlates with git commits
parameters:
  - name: ci-url
    description: >
      The gcsweb URL for a CI run. Can be any level of the artifact tree:
      - Job root: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/
      - Test artifacts: .../{RUN_ID}/artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/
      - Prow UI: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}
    required: true
  - name: focus
    description: "Optional: focus the analysis on a specific test file or area (e.g., 'regression', '01.incidents', 'filtering')"
    required: false
---

# Analyze OpenShift CI Test Results

Fetch, parse, and classify failures from an OpenShift CI (Prow) test run. This skill is designed to be the **first step** in an agentic test iteration workflow — it produces a structured diagnosis that the orchestrator can act on.

## Instructions

### Step 1: Normalize the URL

The user may provide a URL at any level. Normalize it to the **job root**:

```
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/
```

If the user provides a Prow UI URL (`prow.ci.openshift.org/view/gs/...`), convert it:
- Replace `https://prow.ci.openshift.org/view/gs/` with `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/`
- Append a trailing `/` if missing

Derive these base paths:
- **Job root**: `{normalized_url}`
- **Test artifacts root**: `{normalized_url}artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/`
- **Screenshots root**: `{test_artifacts_root}artifacts/screenshots/`
- **Videos root**: `{test_artifacts_root}artifacts/videos/`

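The normalization rules above can be sketched as a small helper; the function name is illustrative, and the test-artifacts subpath is the one this job layout uses:

```python
# Sketch of Step 1: normalize a Prow UI or gcsweb URL to the gcsweb job root
# and derive the artifact roots used by later steps.
PROW_PREFIX = "https://prow.ci.openshift.org/view/gs/"
GCSWEB_PREFIX = "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"
TEST_SUBPATH = "artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/"

def normalize(url: str) -> dict:
    # Convert a Prow UI URL to its gcsweb equivalent.
    if url.startswith(PROW_PREFIX):
        url = GCSWEB_PREFIX + url[len(PROW_PREFIX):]
    if not url.endswith("/"):
        url += "/"
    # If the URL points below the run ID, trim back up to the job root.
    if "/artifacts/" in url:
        url = url.split("/artifacts/")[0] + "/"
    test_root = url + TEST_SUBPATH
    return {
        "job_root": url,
        "test_artifacts_root": test_root,
        "screenshots_root": test_root + "artifacts/screenshots/",
        "videos_root": test_root + "artifacts/videos/",
    }
```

Feeding any of the three URL shapes from the parameter description through this helper should land on the same job root.
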
### Step 2: Fetch Job Metadata (parallel)

Fetch these files from the **job root** using WebFetch:

| File | What to extract |
|------|-----------------|
| `started.json` | `timestamp`, `pull` (PR number), `repos` (commit SHAs) |
| `finished.json` | `passed` (bool), `result` ("SUCCESS"/"FAILURE"), `revision` (PR HEAD SHA) |
| `prowjob.json` | PR title, PR author, PR branch, base branch, base SHA, PR SHA, job name, cluster, duration |

From the `repos` field of `started.json`, extract:
- **Base commit**: the SHA after `main:` (before the comma)
- **PR commit**: the SHA after `{PR_NUMBER}:`

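A minimal sketch of that extraction, assuming the `repos` value is a ref string of the form `main:<base_sha>,<pr_number>:<pr_sha>` as described above (the repo key shown is an assumption about this job's `started.json`):

```python
# Sketch of the Step 2 SHA extraction. The repo key and exact ref-string
# shape are assumptions based on this job's started.json layout.
def extract_shas(started: dict, pr_number: str) -> tuple:
    refs = started["repos"]["openshift/monitoring-plugin"]
    # "main:SHA,875:SHA" -> {"main": SHA, "875": SHA}
    shas = dict(part.split(":", 1) for part in refs.split(","))
    return shas["main"], shas[pr_number]
```
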
Present a summary:
```
CI Run Summary:
  PR: #{PR_NUMBER} - {PR_TITLE}
  Author: {AUTHOR}
  Branch: {PR_BRANCH} -> {BASE_BRANCH}
  PR commit: {PR_SHA} (short: first 7 chars)
  Base commit: {BASE_SHA} (short: first 7 chars)
  Result: PASSED / FAILED
  Duration: {DURATION}
  Job: {JOB_NAME}
```

### Step 3: Fetch and Parse Test Results

Fetch `{test_artifacts_root}build-log.txt` using WebFetch.

#### Cypress Output Format

The build log contains Cypress console output. Parse these sections:

**Per-spec results block** — appears after each spec file runs:
```
  (Results)

  ┌──────────────────────────────────────────────────────────┐
  │ Tests: N │
  │ Passing: N │
  │ Failing: N │
  │ Pending: N │
  │ Skipped: N │
  │ Screenshots: N │
  │ Video: true │
  │ Duration: X minutes, Y seconds │
  │ Spec Ran: {spec-file-name}.cy.ts │
  └──────────────────────────────────────────────────────────┘
```

**Final summary table** — appears at the very end:
```
  (Run Finished)

  ┌──────────────────────────────────────────────────────────┐
  │ Spec Tests Passing Failing Pending │
  ├──────────────────────────────────────────────────────────┤
  │ ✓ spec-file.cy.ts 5 5 0 0 │
  │ ✗ other-spec.cy.ts 3 1 2 0 │
  └──────────────────────────────────────────────────────────┘
```

**Failure details** — appear inline during test execution:
```
  1) Suite Name
       "before all" hook for "test description":
     ErrorType: error message
       > detailed error
      at stack trace...

  N failing
```

Or for test-level (not hook) failures:
```
  1) Suite Name
       test description:
     AssertionError: Timed out retrying after Nms: Expected to find element: .selector
```

Extract per spec:
- Spec file name
- Pass/fail/skip counts
- For failures: test name, error type, error message, whether it was in a hook

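One way to pull those counts out of a `(Results)` block, sketched in Python (the box-drawing characters are stripped first; the field names follow the sample block above):

```python
import re

# Sketch of the Step 3 per-spec parse: strip Cypress's box-drawing
# characters, then read each "Field: value" pair from the (Results) block.
FIELDS = ("Tests", "Passing", "Failing", "Pending", "Skipped")

def parse_results_block(block: str) -> dict:
    clean = re.sub(r"[│┌┐└┘─]", " ", block)
    out = {}
    for field in FIELDS:
        m = re.search(rf"{field}:\s+(\d+)", clean)
        if m:
            out[field.lower()] = int(m.group(1))
    spec = re.search(r"Spec Ran:\s+(\S+)", clean)
    out["spec"] = spec.group(1) if spec else None
    return out
```

The same stripping approach works for the final summary table; the failure-detail blocks are free-form and are better captured by scanning for the numbered `N) Suite Name` entries.
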
### Step 4: Fetch Failure Screenshots

For each failing spec, navigate to `{screenshots_root}{spec-file-name}/` and list the available screenshots.

**Screenshot naming convention:**
```
{Suite Name} -- {Test Title} -- before all hook (failed).png
{Suite Name} -- {Test Title} (failed).png
```

Fetch each screenshot URL and **read it using the Read tool** (multimodal) to understand the visual state at failure time. Describe what you see:
- What page/view is shown?
- Are there error dialogs, loading spinners, or empty states?
- Is the expected UI element visible? If not, what's in its place?
- Are there console errors visible in the browser?

### Step 5: Classify Each Failure

For every failing test, classify it into one of these categories:

#### Infrastructure Failures (not actionable by test code changes)

| Classification | Indicators |
|----------------|------------|
| `INFRA_CLUSTER` | Certificate expired, API server unreachable, node not ready, cluster version mismatch |
| `INFRA_OPERATOR` | COO/CMO installation timeout, operator pod not running, CRD not found |
| `INFRA_PLUGIN` | Plugin deployment unavailable, dynamic plugin chunk loading error, console not accessible |
| `INFRA_AUTH` | Login failed, kubeconfig invalid, RBAC permission denied (for expected operations) |
| `INFRA_CI` | Pod eviction, OOM kill, timeout at the infrastructure level (not a test timeout) |

**Key signals for infra issues:**
- Errors in `before all` hooks related to cluster setup
- Certificate/TLS errors
- `oc` command failures with connection errors
- Element `.co-clusterserviceversion-install__heading` not found (operator install UI)
- Errors mentioning pod names, namespaces, or k8s resources
- `e is not a function` or similar JS errors from the console application itself (not test code)

#### Test/Code Failures (actionable)

| Classification | Indicators |
|----------------|------------|
| `TEST_BUG` | Wrong selector, incorrect assertion logic, race condition / timing issue, test assumes the wrong state |
| `FIXTURE_ISSUE` | Mock data doesn't match the expected structure, missing alerts/incidents in a fixture, edge case not covered |
| `PAGE_OBJECT_GAP` | Page object method missing, selector outdated, doesn't match the current DOM |
| `MOCK_ISSUE` | `cy.intercept` not matching the actual API call, response shape incorrect, query parameter mismatch |
| `CODE_REGRESSION` | Test was passing before, UI behavior genuinely changed — the source code has a bug |

**Key signals for test/code issues:**
- `AssertionError: Timed out retrying` on application-specific selectors (not infra selectors)
- `Expected X to equal Y` where the assertion logic is wrong
- Failures only in specific test scenarios, not across the board
- Screenshot shows the UI rendered correctly but the test expected something different

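As a first pass, the two tables can be collapsed into a keyword classifier; the patterns below are illustrative and not exhaustive, and the real classification should also weigh the screenshot and commit evidence:

```python
# Sketch of Step 5: a first-pass keyword classifier mirroring the tables
# above. Patterns are illustrative; real analysis refines this verdict.
INFRA_PATTERNS = {
    "INFRA_CLUSTER": ("certificate", "x509", "api server", "node not ready"),
    "INFRA_OPERATOR": ("clusterserviceversion", "operator pod", "crd not found"),
    "INFRA_PLUGIN": ("chunk loading", "plugin deployment"),
    "INFRA_AUTH": ("login failed", "kubeconfig", "permission denied"),
    "INFRA_CI": ("evicted", "oom killed", "oomkilled"),
}

def classify(error: str, in_hook: bool) -> str:
    text = error.lower()
    for label, keywords in INFRA_PATTERNS.items():
        if any(k in text for k in keywords):
            return label
    if in_hook:
        # A setup-hook failure with no clearer signal still leans infra.
        return "INFRA_CI"
    # Distinguishing TEST_BUG from CODE_REGRESSION (or the other actionable
    # labels) needs the commit correlation in Step 6; default to the
    # test-side label here.
    return "TEST_BUG"
```
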
### Step 6: Correlate with Git Commits

Using the PR commit SHA and base commit SHA from Step 2:

1. **Check local git history**: Run `git log {base_sha}..{pr_sha} --oneline` to see what changed in the PR
2. **Identify relevant changes**: Run `git diff {base_sha}..{pr_sha} --stat` to see which files were modified
3. **For CODE_REGRESSION failures**: Check whether the failing component's source code was modified in the PR
4. **For TEST_BUG failures**: Check whether the test itself was modified in the PR (a new test might have a bug)

Present the correlation:
```
Commit correlation for {test_name}:
  PR modified: src/components/incidents/IncidentChart.tsx (+45, -12)
  Test file: cypress/e2e/incidents/01.incidents.cy.ts (unchanged)
  Verdict: CODE_REGRESSION - chart rendering changed but test expectations not updated
```

Or:
```
Commit correlation for {test_name}:
  PR modified: cypress/e2e/incidents/regression/01.reg_filtering.cy.ts (+30, -5)
  Source code: src/components/incidents/ (unchanged)
  Verdict: TEST_BUG - new test code has an incorrect assertion
```

### Step 7: Produce a Structured Report

Output a structured report in this format:

```
# CI Analysis Report

## Run: PR #{PR} - {TITLE}
- Commit: {SHORT_SHA} by {AUTHOR}
- Branch: {BRANCH}
- Result: {RESULT}
- Duration: {DURATION}

## Summary
- Total specs: N
- Passed: N
- Failed: N (M infra, K test/code)

## Infrastructure Issues (not actionable via test changes)

### INFRA_CLUSTER: Certificate expired
- Affected: ALL tests (cascade failure)
- Detail: x509 certificate expired at {timestamp}
- Action needed: Cluster certificate renewal (outside test scope)

## Test/Code Issues (actionable)

### TEST_BUG: Selector timeout in filtering test
- Spec: regression/01.reg_filtering.cy.ts
- Test: "should filter incidents by severity"
- Error: Timed out retrying after 80000ms: Expected to find element: [data-test="severity-filter"]
- Screenshot: [description of what the screenshot shows]
- Commit correlation: Test file was modified in this PR (+30 lines)
- Suggested fix: Update the selector to match the current DOM structure

### CODE_REGRESSION: Chart not rendering after component refactor
- Spec: regression/02.reg_ui_charts_comprehensive.cy.ts
- Test: "should display incident bars in chart"
- Error: Expected 5 bars, found 0
- Screenshot: Chart area is empty, no error messages visible
- Commit correlation: src/components/incidents/IncidentChart.tsx was refactored
- Suggested fix: Investigate the chart rendering logic in the refactored component

## Flakiness Indicators
- If a test failed with a timing-related error but similar tests in the same suite passed,
  flag it as potentially flaky
- If the error message contains "Timed out retrying" on an element that should exist,
  it may be a race condition rather than a missing element

## Recommendations
- List prioritized next steps
- For infra issues: what needs to happen before tests can run
- For test/code issues: which fixes to attempt first (quick wins vs. complex)
- Whether local reproduction is recommended
```

### Step 8: If the `focus` parameter is provided

Filter the analysis to only the relevant tests. For example:
- `focus=regression` -> only analyze `regression/*.cy.ts` specs
- `focus=filtering` -> only analyze tests with "filter" in their name
- `focus=01.incidents` -> only analyze `01.incidents.cy.ts`

Still fetch all metadata and provide the full context, but limit the detailed diagnosis to the focused area.

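A sketch of that filtering, treating `focus` as a case-insensitive substring match against spec paths and test names (the failure-record shape is an assumption):

```python
# Sketch of Step 8: narrow the detailed diagnosis to failures matching the
# optional `focus` parameter. The record shape is illustrative.
def apply_focus(failures: list, focus) -> list:
    if not focus:
        return failures
    needle = focus.lower()
    return [
        f for f in failures
        if needle in f["spec"].lower() or needle in f.get("test", "").lower()
    ]
```
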
## Notes for the Orchestrator

When this skill is used as the first step of `/cypress:test-iteration:iterate-incident-tests`:

1. **If all failures are INFRA_***: Report to the user and STOP. No test changes will help.
2. **If INFRA_* and TEST/CODE failures are mixed**: Report the infra issues to the user, then proceed with test/code fixes only.
3. **If all failures are TEST/CODE**: Proceed with the full iteration loop.
4. **The commit correlation** tells the orchestrator whether to focus on fixing tests or investigating source code changes.
5. **Screenshots** give the Diagnosis Agent a head start — it can reference the CI screenshot analysis instead of reproducing the failure locally first.