
Commit 35660d6

Reduce aw-failure-investigator issue churn by prioritizing closure and reusing parent tracking (#26795)
1 parent a5c7a94 commit 35660d6

2 files changed, +92 -33 lines changed

.github/workflows/aw-failure-investigator.lock.yml

Lines changed: 75 additions & 20 deletions
Some generated files are not rendered by default.

.github/workflows/aw-failure-investigator.md

Lines changed: 17 additions & 13 deletions
@@ -1,5 +1,5 @@
 ---
-description: Investigates [aw] failures from the last 6 hours, correlates with open agentic-workflows issues, and opens a parent report with fix sub-issues
+description: Investigates [aw] failures from the last 6 hours, correlates with open agentic-workflows issues, closes fixed issues, and opens focused fix sub-issues when needed
 on:
   schedule:
     - cron: "every 6h"
@@ -22,10 +22,13 @@ safe-outputs:
     expires: 7d
     title-prefix: "[aw-failures] "
     labels: [agentic-workflows, automation, cookie]
-    max: 8
+    max: 2
     group: true
+  update-issue:
+    target: "*"
+    max: 10
   link-sub-issue:
-    max: 20
+    max: 10
   noop:
 timeout-minutes: 60
 imports:
@@ -49,7 +52,7 @@ Investigate agentic workflow failures from the last 6 hours and produce actionab
 1. Find recent failures from agentic workflows in the last 6 hours.
 2. Correlate findings with currently open `agentic-workflows` issues.
 3. Perform large-scale failure analysis using logs + audit + audit-diff.
-4. Create one parent report issue and linked sub-issues proposing concrete fixes.
+4. Close fixed/stale issues first, then create only the minimum necessary linked fix sub-issues.
 
 ## Required Investigation Steps
 
@@ -91,16 +94,15 @@ Use `agentic-workflows` MCP `audit-diff` to compare:
 
 Identify regressions and deltas (metrics/tooling/firewall/MCP behavior) that support fix recommendations.
 
-### 5) Create parent report issue + sub-issues
+### 5) Close fixed issues first, then add focused sub-issues
 
-Create a **single parent report issue** with a temporary ID (format `aw_` + 3-8 alphanumeric characters) summarizing:
-- observed failure clusters in last 6h
-- links to analyzed run IDs
-- evidence from logs/audit/audit-diff
-- mapping to existing open issues (duplicate / related / new)
-- prioritized fix plan
+First, identify currently open `agentic-workflows` issues that are now fixed, stale, or no longer actionable based on fresh evidence, and close them using `update-issue`.
 
-Then create **sub-issues** (linked to the parent) for concrete fixes. Each sub-issue must include:
+Then, if new uncovered work remains, add **sub-issues** for concrete fixes to the **most recent open parent report issue** instead of creating a new parent by default.
+
+Only create a new parent report issue (temporary ID format `aw_` + 3-8 alphanumeric characters) when **P0 failures have no existing tracking coverage**.
+
+Each new sub-issue must include:
 - clear problem statement
 - affected workflows and run IDs
 - probable root cause
@@ -128,7 +130,9 @@ Include these sections:
 ## Decision Rules
 
 - If there are **no failures** in the last 6h, or no actionable delta vs existing issues, call `noop` with a concise reason.
-- If failures exist but are already fully tracked, update by creating a minimal parent report that links to existing issues and only create new sub-issues for uncovered gaps.
+- If failures exist but are already fully tracked, prefer closing stale/fixed issues and avoid creating new issues.
+- Only create a new parent report issue when P0 failures have no existing tracking coverage.
+- Prefer closing stale/fixed issues over creating new issues when issue volume is high.
 - Always be explicit about confidence and unknowns.
 
 **Important**: If no action is needed after completing your analysis, you **MUST** call the `noop` safe-output tool with a brief explanation.
