Commit 23376e4 (parent c8c6b38) by DavidRajnoha and claude

feat: add test iteration skills and supporting scripts

Add four Claude Code skills for agentic test iteration:
- /analyze-ci-results: analyze OpenShift CI (Prow) test results
- /diagnose-test-failure: diagnose Cypress test failures
- /iterate-incident-tests: autonomous local test iteration loop
- /iterate-ci-flaky: CI-based flaky test iteration loop

Supporting scripts:
- poll-ci-status.py: poll Prow CI job status
- notify-slack.py: Slack notifications for CI results
- review-github.py: GitHub PR comment review flow

Also adds a test stability ledger for tracking test reliability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

8 files changed: 2052 additions & 0 deletions

---
name: analyze-ci-results
description: Analyze OpenShift CI (Prow) test results from a gcsweb URL - identifies infra vs test/code failures and correlates with git commits
parameters:
  - name: ci-url
    description: >
      The gcsweb URL for a CI run. Can be any level of the artifact tree:
      - Job root: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/
      - Test artifacts: .../{RUN_ID}/artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/
      - Prow UI: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}
    required: true
  - name: focus
    description: "Optional: focus analysis on specific test file or area (e.g., 'regression', '01.incidents', 'filtering')"
    required: false
---

# Analyze OpenShift CI Test Results

Fetch, parse, and classify failures from an OpenShift CI (Prow) test run. This skill is designed to be the **first step** in an agentic test iteration workflow — it produces a structured diagnosis that the orchestrator can act on.

## Instructions

### Step 1: Normalize the URL

The user may provide a URL at any level. Normalize it to the **job root**:

```
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/
```

If the user provides a Prow UI URL (`prow.ci.openshift.org/view/gs/...`), convert it:
- Replace `https://prow.ci.openshift.org/view/gs/` with `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/`
- Append trailing `/` if missing

Derive these base paths:
- **Job root**: `{normalized_url}`
- **Test artifacts root**: `{normalized_url}artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/`
- **Screenshots root**: `{test_artifacts_root}artifacts/screenshots/`
- **Videos root**: `{test_artifacts_root}artifacts/videos/`
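The normalization and path derivation above can be sketched in Python. The function names `normalize_ci_url` and `derive_paths` are illustrative, not part of the skill:

```python
# Sketch of Step 1: convert a Prow UI URL to its gcsweb job root and
# derive the artifact roots used by later steps.

PROW_PREFIX = "https://prow.ci.openshift.org/view/gs/"
GCSWEB_PREFIX = "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"

def normalize_ci_url(url: str) -> str:
    """Swap the Prow UI prefix for the gcsweb prefix; ensure trailing slash."""
    if url.startswith(PROW_PREFIX):
        url = GCSWEB_PREFIX + url[len(PROW_PREFIX):]
    if not url.endswith("/"):
        url += "/"
    return url

def derive_paths(job_root: str) -> dict:
    """Derive the test-artifact, screenshot, and video roots from the job root."""
    test_root = job_root + "artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/"
    return {
        "job_root": job_root,
        "test_artifacts_root": test_root,
        "screenshots_root": test_root + "artifacts/screenshots/",
        "videos_root": test_root + "artifacts/videos/",
    }
```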

### Step 2: Fetch Job Metadata (parallel)

Fetch these files from the **job root** using WebFetch:

| File | What to extract |
|------|-----------------|
| `started.json` | `timestamp`, `pull` (PR number), `repos` (commit SHAs) |
| `finished.json` | `passed` (bool), `result` ("SUCCESS"/"FAILURE"), `revision` (PR HEAD SHA) |
| `prowjob.json` | PR title, PR author, PR branch, base branch, base SHA, PR SHA, job name, cluster, duration |

From the `started.json` `repos` field, extract:
- **Base commit**: the SHA after `main:` (before the comma)
- **PR commit**: the SHA after `{PR_NUMBER}:`

Present a summary:
```
CI Run Summary:
  PR: #{PR_NUMBER} - {PR_TITLE}
  Author: {AUTHOR}
  Branch: {PR_BRANCH} -> {BASE_BRANCH}
  PR commit: {PR_SHA} (short: first 7 chars)
  Base commit: {BASE_SHA} (short: first 7 chars)
  Result: PASSED / FAILED
  Duration: {DURATION}
  Job: {JOB_NAME}
```
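The SHA extraction can be sketched as follows, assuming the `main:{base_sha},{pr_number}:{pr_sha}` value format described above; `extract_commits` is a hypothetical helper, not part of the skill:

```python
# Sketch of Step 2: pull the base and PR commit SHAs out of the
# started.json `repos` field. Assumes each repo entry has the form
# "main:<base_sha>,<pr_number>:<pr_sha>" as described above.

def extract_commits(repos: dict, pr_number: str) -> tuple:
    """Return (base_sha, pr_sha) from a started.json repos entry."""
    refs = next(iter(repos.values()))
    # "main:aaa,842:bbb" -> {"main": "aaa", "842": "bbb"}
    shas = dict(part.split(":", 1) for part in refs.split(","))
    return shas["main"], shas[pr_number]
```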

### Step 3: Fetch and Parse Test Results

Fetch `{test_artifacts_root}build-log.txt` using WebFetch.

#### Cypress Output Format

The build log contains Cypress console output. Parse these sections:

**Per-spec results block** — appears after each spec file runs:
```
  (Results)

  ┌──────────────────────────────────────────────────────────┐
  │ Tests:        N                                          │
  │ Passing:      N                                          │
  │ Failing:      N                                          │
  │ Pending:      N                                          │
  │ Skipped:      N                                          │
  │ Screenshots:  N                                          │
  │ Video:        true                                       │
  │ Duration:     X minutes, Y seconds                       │
  │ Spec Ran:     {spec-file-name}.cy.ts                     │
  └──────────────────────────────────────────────────────────┘
```

**Final summary table** — appears at the very end:
```
  (Run Finished)

  ┌──────────────────────────────────────────────────────────┐
  │ Spec                    Tests  Passing  Failing  Pending │
  ├──────────────────────────────────────────────────────────┤
  │ ✓ spec-file.cy.ts           5        5        0        0 │
  │ ✗ other-spec.cy.ts          3        1        2        0 │
  └──────────────────────────────────────────────────────────┘
```

**Failure details** — appear inline during test execution:
```
  1) Suite Name
       "before all" hook for "test description":
     ErrorType: error message
      > detailed error
      at stack trace...

  N failing
```

Or for test-level (not hook) failures:
```
  1) Suite Name
       test description:
     AssertionError: Timed out retrying after Nms: Expected to find element: .selector
```

Extract per-spec:
- Spec file name
- Pass/fail/skip counts
- For failures: test name, error type, error message, whether it was in a hook
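Extracting the per-spec counts can be sketched with a couple of regexes over the `(Results)` blocks shown above. Real logs may include extra fields, so the patterns match permissively; `parse_results_blocks` is an illustrative name:

```python
# Sketch of Step 3: parse the per-spec (Results) blocks in a Cypress
# build log into {spec, tests, passing, failing, ...} dicts.
import re

# "│ Failing: 2" style counter lines inside the box-drawing table
RESULT_RE = re.compile(r"│\s*(Tests|Passing|Failing|Pending|Skipped):\s*(\d+)")
SPEC_RE = re.compile(r"│\s*Spec Ran:\s*(\S+\.cy\.ts)")

def parse_results_blocks(log: str) -> list:
    """Return one dict per (Results) block found in the log."""
    blocks = []
    # Each segment after a "(Results)" marker belongs to one spec.
    for chunk in log.split("(Results)")[1:]:
        counts = {key.lower(): int(val) for key, val in RESULT_RE.findall(chunk)}
        spec = SPEC_RE.search(chunk)
        if spec:
            blocks.append({"spec": spec.group(1), **counts})
    return blocks
```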

### Step 4: Fetch Failure Screenshots

For each failing spec, navigate to `{screenshots_root}{spec-file-name}/` and list available screenshots.

**Screenshot naming convention:**
```
{Suite Name} -- {Test Title} -- before all hook (failed).png
{Suite Name} -- {Test Title} (failed).png
```

Fetch each screenshot URL and **read it using the Read tool** (multimodal) to understand the visual state at failure time. Describe what you see:
- What page/view is shown?
- Are there error dialogs, loading spinners, empty states?
- Is the expected UI element visible? If not, what's in its place?
- Are there console errors visible in the browser?
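Building the expected screenshot URL from the naming convention above can be sketched as follows; the helper name `screenshot_url` and the exact URL-encoding behavior of gcsweb are assumptions:

```python
# Sketch of Step 4: construct the screenshot URL for a failed test
# per the "{Suite} -- {Test} (failed).png" convention above. Hook
# failures get the extra "-- before all hook" segment.
from urllib.parse import quote

def screenshot_url(root: str, spec: str, suite: str, test: str, in_hook: bool) -> str:
    """Join the screenshots root, spec folder, and conventional file name."""
    name = f"{suite} -- {test}"
    if in_hook:
        name += " -- before all hook"
    filename = f"{name} (failed).png"
    return root + spec + "/" + quote(filename)
```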

### Step 5: Classify Each Failure

For every failing test, classify it into one of these categories:

#### Infrastructure Failures (not actionable by test code changes)

| Classification | Indicators |
|----------------|------------|
| `INFRA_CLUSTER` | Certificate expired, API server unreachable, node not ready, cluster version mismatch |
| `INFRA_OPERATOR` | COO/CMO installation timeout, operator pod not running, CRD not found |
| `INFRA_PLUGIN` | Plugin deployment unavailable, dynamic plugin chunk loading error, console not accessible |
| `INFRA_AUTH` | Login failed, kubeconfig invalid, RBAC permission denied (for expected operations) |
| `INFRA_CI` | Pod eviction, OOM killed, timeout at infrastructure level (not test timeout) |

**Key signals for infra issues:**
- Errors in `before all` hooks related to cluster setup
- Certificate/TLS errors
- `oc` command failures with connection errors
- Element `.co-clusterserviceversion-install__heading` not found (operator install UI)
- Errors mentioning pod names, namespaces, or k8s resources
- `e is not a function` or similar JS errors from the console application itself (not test code)

#### Test/Code Failures (actionable)

| Classification | Indicators |
|----------------|------------|
| `TEST_BUG` | Wrong selector, incorrect assertion logic, race condition / timing issue, test assumes wrong state |
| `FIXTURE_ISSUE` | Mock data doesn't match expected structure, missing alerts/incidents in fixture, edge case not covered |
| `PAGE_OBJECT_GAP` | Page object method missing, selector outdated, doesn't match current DOM |
| `MOCK_ISSUE` | cy.intercept not matching the actual API call, response shape incorrect, query parameter mismatch |
| `CODE_REGRESSION` | Test was passing before, UI behavior genuinely changed — the source code has a bug |

**Key signals for test/code issues:**
- `AssertionError: Timed out retrying` on application-specific selectors (not infra selectors)
- `Expected X to equal Y` where the assertion logic is wrong
- Failures only in specific test scenarios, not across the board
- Screenshot shows the UI rendered correctly but test expected something different
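A first-pass classifier keyed on the infra signals above can be sketched like this. The keyword lists are illustrative, not exhaustive, and real diagnosis still needs screenshots and commit context:

```python
# Sketch of Step 5: map an error message to an INFRA_* label, or
# flag it as test/code-side for deeper diagnosis.

INFRA_SIGNALS = {
    "INFRA_CLUSTER": ["certificate", "x509", "api server", "node not ready"],
    "INFRA_OPERATOR": ["clusterserviceversion", "operator", "crd not found"],
    "INFRA_PLUGIN": ["chunk loading", "plugin deployment"],
    "INFRA_AUTH": ["login failed", "kubeconfig", "permission denied"],
    "INFRA_CI": ["evicted", "oom killed"],
}

def classify_failure(error_message: str, in_hook: bool) -> str:
    """Return an INFRA_* label, or a generic actionable/review label."""
    msg = error_message.lower()
    for label, needles in INFRA_SIGNALS.items():
        if any(needle in msg for needle in needles):
            return label
    # "before all" hook failures that are not clearly infra still deserve
    # a closer look; plain assertion timeouts are usually test-side
    # (TEST_BUG / MOCK_ISSUE / CODE_REGRESSION per the tables above).
    return "NEEDS_REVIEW" if in_hook else "TEST_OR_CODE"
```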

### Step 6: Correlate with Git Commits

Using the PR commit SHA and base commit SHA from Step 2:

1. **Check local git history**: Run `git log {base_sha}..{pr_sha} --oneline` to see what changed in the PR
2. **Identify relevant changes**: Run `git diff {base_sha}..{pr_sha} --stat` to see which files were modified
3. **For CODE_REGRESSION failures**: Check if the failing component's source code was modified in the PR
4. **For TEST_BUG failures**: Check if the test itself was modified in the PR (new test might have a bug)

Present the correlation:
```
Commit correlation for {test_name}:
  PR modified: src/components/incidents/IncidentChart.tsx (+45, -12)
  Test file: cypress/e2e/incidents/01.incidents.cy.ts (unchanged)
  Verdict: CODE_REGRESSION - chart rendering changed but test expectations not updated
```

Or:
```
Commit correlation for {test_name}:
  PR modified: cypress/e2e/incidents/regression/01.reg_filtering.cy.ts (+30, -5)
  Source code: src/components/incidents/ (unchanged)
  Verdict: TEST_BUG - new test code has incorrect assertion
```
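The correlation step can be sketched as follows, assuming a local clone that contains both SHAs; the helper names and the `src/` prefix heuristic are assumptions for illustration:

```python
# Sketch of Step 6: list the files a PR touched, then give a rough
# verdict on whether a failure looks test-side or source-side.
import subprocess

def changed_files(base_sha: str, pr_sha: str) -> list:
    """Return file paths modified between the base and PR commits."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_sha}..{pr_sha}"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def correlate(test_file: str, files: list) -> str:
    """Rough verdict: did the PR touch the test, the source, or both?"""
    touched_test = test_file in files
    touched_src = any(f.startswith("src/") for f in files)
    if touched_test and not touched_src:
        return "suspect TEST_BUG"
    if touched_src and not touched_test:
        return "suspect CODE_REGRESSION"
    return "inspect both test and source changes"
```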

### Step 7: Produce Structured Report

Output a structured report with this format:

```
# CI Analysis Report

## Run: PR #{PR} - {TITLE}
- Commit: {SHORT_SHA} by {AUTHOR}
- Branch: {BRANCH}
- Result: {RESULT}
- Duration: {DURATION}

## Summary
- Total specs: N
- Passed: N
- Failed: N (M infra, K test/code)

## Infrastructure Issues (not actionable via test changes)

### INFRA_CLUSTER: Certificate expired
- Affected: ALL tests (cascade failure)
- Detail: x509 certificate expired at {timestamp}
- Action needed: Cluster certificate renewal (outside test scope)

## Test/Code Issues (actionable)

### TEST_BUG: Selector timeout in filtering test
- Spec: regression/01.reg_filtering.cy.ts
- Test: "should filter incidents by severity"
- Error: Timed out retrying after 80000ms: Expected to find element: [data-test="severity-filter"]
- Screenshot: [description of what screenshot shows]
- Commit correlation: Test file was modified in this PR (+30 lines)
- Suggested fix: Update selector to match current DOM structure

### CODE_REGRESSION: Chart not rendering after component refactor
- Spec: regression/02.reg_ui_charts_comprehensive.cy.ts
- Test: "should display incident bars in chart"
- Error: Expected 5 bars, found 0
- Screenshot: Chart area is empty, no error messages visible
- Commit correlation: src/components/incidents/IncidentChart.tsx was refactored
- Suggested fix: Investigate chart rendering logic in the refactored component

## Flakiness Indicators
- If a test failed with a timing-related error but similar tests in the same suite passed,
  flag it as potentially flaky
- If the error message contains "Timed out retrying" on an element that should exist,
  it may be a race condition rather than a missing element

## Recommendations
- List prioritized next steps
- For infra issues: what needs to happen before tests can run
- For test/code issues: which fixes to attempt first (quick wins vs complex)
- Whether local reproduction is recommended
```

### Step 8: If `focus` parameter is provided

Filter the analysis to only the relevant tests. For example:
- `focus=regression` -> only analyze `regression/*.cy.ts` specs
- `focus=filtering` -> only analyze tests with "filter" in their name
- `focus=01.incidents` -> only analyze `01.incidents.cy.ts`

Still fetch all metadata and provide the full context, but limit detailed diagnosis to the focused area.
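The filtering rule covering all three examples above can be sketched as a plain substring check; `matches_focus` is an illustrative name:

```python
# Sketch of Step 8: a test is in scope if either its spec path or
# its test name mentions the focus string (case-insensitive).

def matches_focus(spec_path: str, test_name: str, focus: str) -> bool:
    """True if the spec path or test name contains the focus value."""
    needle = focus.lower()
    return needle in spec_path.lower() or needle in test_name.lower()
```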

## Notes for the Orchestrator

When this skill is used as the first step of `/cypress:test-iteration:iterate-incident-tests`:

1. **If all failures are INFRA_***: Report to user and STOP. No test changes will help.
2. **If mixed INFRA_* and TEST/CODE**: Report infra issues to user, proceed with test/code fixes only.
3. **If all failures are TEST/CODE**: Proceed with the full iteration loop.
4. **The commit correlation** tells the orchestrator whether to focus on fixing tests or investigating source code changes.
5. **Screenshots** give the Diagnosis Agent a head start — it can reference the CI screenshot analysis instead of reproducing the failure locally first.
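The stop/partial/proceed decision in points 1-3 above can be sketched as a small function over the Step 5 labels; `next_action` is an illustrative name:

```python
# Sketch of the orchestrator decision: given classified failures,
# stop on pure infra, partially proceed on a mix, or run the full
# iteration loop when everything is test/code-side.

def next_action(classifications: list) -> str:
    infra = [c for c in classifications if c.startswith("INFRA_")]
    actionable = [c for c in classifications if not c.startswith("INFRA_")]
    if infra and not actionable:
        return "STOP: report infra issues to the user"
    if infra and actionable:
        return "PARTIAL: report infra issues, fix test/code failures only"
    return "PROCEED: run the full iteration loop"
```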
