Commit 4a47495

JulieLeeMSFT, ewhapdx, Copilot, Copilot, and kg authored
Add ci-pipeline-monitor Copilot CLI skill (#125809)
Automates monitoring of CI test pipelines on Azure DevOps, triaging failures, and generating weekly reports with GitHub issue tracking. It currently monitors only 20+ JIT pipelines and needs to expand to other pipelines.

---------

Co-authored-by: Julie Lee <jeonlee@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Katelyn Gadd <kg@luminance.org>
1 parent 4585910 commit 4a47495

16 files changed

+3142
-0
lines changed
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# Generated during monitoring runs; recreated each time the skill runs
logs/
helix-logs/
scripts/monitor.db
scripts/__pycache__/

# Temporary ad-hoc scripts (created during triage, deleted after use)
temp/

# Intermediate JSON output files (piped between scripts)
failing_builds.json
failed_tests.json
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# CI Pipeline Monitor

A Copilot CLI skill that automates monitoring CI stress/PGO test pipelines on
Azure DevOps, triaging failures, and coordinating with GitHub issue tracking.

## What It Does

- Monitors 20+ CI test pipelines (`dnceng-public/public`, `main` branch)
- Extracts every test failure via the AzDO Test Results API
- Downloads full Helix console logs for each failure
- Triages failures: classifies, groups by root cause, searches for matching GitHub issues
- Generates a formatted weekly report with action items
- Can bisect regressions and file new GitHub issues

## Prerequisites

1. **GitHub Copilot CLI**: `winget install GitHub.Copilot` (Windows) or
   `brew install copilot-cli` (macOS/Linux)
2. **Python 3.8+** with `requests`:
   ```bash
   pip install requests
   ```
3. **Azure CLI** (for AzDO Test Results API authentication):
   ```bash
   az login
   ```

## One-Time Setup

1. **Clone dotnet/runtime** (or use your existing clone):
   ```bash
   git clone https://github.com/dotnet/runtime.git
   cd runtime
   ```

2. **Launch Copilot CLI** from the runtime repo root:
   ```bash
   copilot
   ```

3. **Verify the skill is available:**
   ```
   /skills
   ```
   You should see `ci-pipeline-monitor` listed.

## Usage

### Invoke the Skill

In Copilot CLI, type:
```
/ci-pipeline-monitor
```

Or ask in natural language; Copilot will detect and invoke the skill automatically:
- "Check the CI test pipelines"
- "Generate the weekly CI test report"

### What Happens

The skill runs its full workflow end to end:
1. Fetches the latest builds from all 20+ monitored pipelines
2. Extracts failed tests and downloads Helix console logs
3. Triages each failure (classifies, groups by root cause, searches GitHub)
4. Generates a formatted report with action items
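
To make "groups by root cause" in step 3 concrete, here is a minimal sketch of one way failures could be bucketed by a normalized error message. It is only an illustration: in the skill itself triage is performed by the agent, and `error_signature` and `group_by_signature` are hypothetical names that do not come from the repository.

```python
import re
from collections import defaultdict


def error_signature(message: str) -> str:
    """Normalize an error message so variable details do not split groups.

    Hypothetical helper: hex addresses and decimal numbers are replaced
    with placeholders and whitespace is collapsed.
    """
    sig = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    sig = re.sub(r"\d+", "<n>", sig)
    return " ".join(sig.split())


def group_by_signature(failures):
    """Group (test_name, error_message) pairs by normalized signature."""
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[error_signature(message)].append(test_name)
    return dict(groups)
```

Two timeouts that differ only in their duration would land in the same group under this scheme, while a crash with a distinct message would form its own group. Real root-cause grouping usually needs more context than a single message line, which is presumably why the skill leaves it to the agent.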

## Authentication

No manual token or credential setup is needed. The skill handles authentication
automatically:

- **AzDO Builds API + Helix API**: public, no auth required
- **AzDO Test Results API**: uses `az account get-access-token` (requires
  `az login` from prerequisites)
- **GitHub API (triage)**: the agent uses GitHub MCP tools built into Copilot
  CLI, authenticated via your Copilot CLI login. No separate configuration needed.
- **GitHub API (validation)**: `validate_results.py` spot-checks NEW failures
  against the unauthenticated GitHub Search API (`api.github.com`). Rate-limited
  to 10 searches/minute, with automatic pauses between requests. No auth needed.
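
Two mechanisms from the list above can be sketched in a few lines: obtaining a bearer token from the Azure CLI, and pacing unauthenticated searches to 10 per minute. This is a sketch under stated assumptions, not the skill's actual code. The function and class names are hypothetical, the resource ID is the well-known Azure DevOps application ID, and even spacing is just one simple way to stay under a per-minute limit.

```python
import json
import shutil
import subprocess
import time

# Well-known Azure DevOps application ID, used as the token audience.
AZDO_RESOURCE = "499b84ac-1321-427f-aa17-267ca6975798"


def parse_access_token(az_json: str) -> str:
    """Pull the token string out of `az account get-access-token` JSON output."""
    return json.loads(az_json)["accessToken"]


def azdo_auth_headers() -> dict:
    """Ask the Azure CLI for a token and wrap it in a Bearer header."""
    az = shutil.which("az")  # locate the CLI on PATH
    if az is None:
        raise RuntimeError("Azure CLI not found; install it and run `az login`")
    result = subprocess.run(
        [az, "account", "get-access-token",
         "--resource", AZDO_RESOURCE, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return {"Authorization": f"Bearer {parse_access_token(result.stdout)}"}


class SearchPacer:
    """Hypothetical helper: keeps calls at least 60/per_minute seconds apart."""

    def __init__(self, per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # no call made yet

    def wait(self):
        """Block until the next request slot, then reserve the following one."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

A caller would pass `azdo_auth_headers()` as the `headers` argument of a `requests.get` call against the Test Results API, and call `pacer.wait()` before each GitHub search. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.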

## How It Works

The skill combines **Python scripts** (deterministic data collection) with
**agent triage** (non-deterministic analysis):

| Step | What | Run By | APIs / Tools |
|------|------|--------|-------------|
| 1. Resolve Pipeline Definitions | Resolve missing definition IDs, update `pipelines.md` | Agent | AzDO Definitions API (no auth) |
| 2. Fetch Latest Builds | Create DB, fetch latest build per pipeline | Script (`setup_and_fetch_builds.py`) | AzDO Builds API (no auth) |
| 3. Extract Failed Tests and Fetch Logs | Extract failed test methods, download Helix console logs | Script (`extract_failed_tests.py`, `fetch_helix_logs.py`) | AzDO Test Results API (Bearer token), Helix API (no auth) |
| 4. Triage Failures | Read logs, extract errors verbatim, classify, group, search GitHub | Agent | GitHub MCP (`search_issues`, `issue_read`) |
| 5. Validate DB | Validate DB completeness and accuracy | Script (`validate_results.py`) | GitHub Search API (unauthenticated spot-checks) |
| 6. Generate Report | Generate markdown report from DB | Script (`generate_report.py`) | None (reads DB only) |
| 7. Bisect Regressions | Identify regressing commit/PR (on request) | Agent | GitHub MCP (`list_commits`, `search_pull_requests`) |
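
As an illustration of step 2 in the table, the sketch below builds an AzDO Builds API query for the most recent completed build of one pipeline on `main`. It does not reproduce `setup_and_fetch_builds.py`. The helper names, the `completed` status filter, the returned field selection, and the API version are assumptions made for the example; only the organization/project and branch come from this README.

```python
from urllib.parse import urlencode

ORG_PROJECT = "dnceng-public/public"  # organization/project named in this README


def latest_build_url(definition_id: int, branch: str = "main") -> str:
    """Build a Builds API URL for the newest completed build of one pipeline."""
    query = urlencode({
        "definitions": definition_id,
        "branchName": f"refs/heads/{branch}",
        "statusFilter": "completed",           # assumption: skip in-progress builds
        "queryOrder": "finishTimeDescending",  # newest first
        "$top": 1,
        "api-version": "7.1",                  # assumption: any supported version
    })
    return f"https://dev.azure.com/{ORG_PROJECT}/_apis/build/builds?{query}"


def summarize_build(response_json: dict):
    """Reduce a Builds API list response to a few fields a monitor might store."""
    builds = response_json.get("value", [])
    if not builds:
        return None  # no matching build for this pipeline
    build = builds[0]
    return {
        "id": build["id"],
        "result": build.get("result"),
        "finish_time": build.get("finishTime"),
    }
```

Because the Builds API is public for this project, a plain `requests.get(latest_build_url(definition_id), timeout=30).json()` passed to `summarize_build` would be enough to try it, where `definition_id` is one of the IDs resolved in step 1.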

Generated output (logs, reports, DB) stays local; nothing is committed to the repo.
