Skip to content

Commit c9b94cb

Browse files
feat: add JSON CLI fallback for PET server failures (microsoft#1450)
## Summary When the PET (Python Environment Tools) JSON-RPC server fails to start or crashes beyond all restart attempts, environment discovery currently stops entirely. This PR adds a transparent fallback that uses PET's CLI `--json` mode to continue environment discovery even when the server mode is fully exhausted. ## Problem The extension communicates with `pet server` — a long-lived process over stdin/stdout using JSON-RPC 2.0. The server already has robust retry logic (3 restarts with exponential backoff: 1s, 2s, 4s). But when all 3 restart attempts fail, the extension throws an error and environment discovery stops. Users see no Python environments and cannot select an interpreter until they manually reload the window. ## Solution PET already has a CLI `--json` mode that does the same discovery work as a one-shot subprocess: - `pet find --json [paths] [flags]` — discovers all environments, outputs a single JSON object to stdout - `pet resolve <exe> --json` — resolves a single executable to full environment metadata These CLI commands output the exact same JSON shape as the JSON-RPC notifications (same Rust structs, same `camelCase` serialization), so the existing TypeScript types (`NativeEnvInfo`, `NativeEnvManagerInfo`) work without modification. The fallback is **entirely internal** to `NativePythonFinderImpl`. All callers (`condaUtils`, `venvUtils`, `poetryUtils`, etc.) call the same `refresh()` and `resolve()` interface methods and receive the same return types — they never know whether the data came from the server or the CLI. ## What Changed ### `src/managers/common/nativePythonFinder.ts` **New constant** - `CLI_FALLBACK_TIMEOUT_MS = 120_000` (2 minutes) — generous budget for a full one-shot process scan **`resolve()` — restructured with outer fallback catch** `ensureProcessRunning()` is now inside the outer `try` block. When all 3 server restarts are exhausted, `ensureProcessRunning()` throws, the outer `catch` checks `isServerExhausted()`, and diverts to `resolveViaJsonCli()` instead of propagating the error to callers. **`doRefresh()` — two new fallback checkpoints** After the retry loop exhausts all attempts, the catch block checks `isServerExhausted()` and calls `refreshViaJsonCli()` instead of rethrowing. A second check before the final `throw lastError` acts as a safety net. **New private method: `buildConfigurationOptions()`** Extracts the config-building logic out of `configure()` so both the server mode and CLI fallback build the same `ConfigurationOptions` from the same source (workspace dirs, conda/poetry/pipenv paths, cache dir, extra env dirs). **New private method: `isServerExhausted()`** Returns `true` only when `restartAttempts >= 3` AND the process is currently down (`startFailed || processExited`) AND no restart is currently in progress (`!isRestarting`). The `isRestarting` guard prevents concurrent callers from prematurely bypassing to CLI fallback while a valid restart is still underway. **New private method: `runPetCliProcess(args, timeoutMs)`** Spawns the PET binary directly (no shell, to avoid injection risks from user-supplied paths), collects stdout, routes stderr to debug output, and enforces a hard timeout via SIGTERM → SIGKILL. A `settled` boolean guards against double-settling the returned Promise when the timeout and `close`/`error` handlers race each other. **New private method: `refreshViaJsonCli(options)`** Fallback for `refresh()`. Calls `buildConfigurationOptions()`, assembles CLI args via `buildFindCliArgs()`, spawns PET, parses the `{ managers, environments }` JSON output, and — mirroring server mode's inline resolve behavior — calls `resolveViaJsonCli()` for any environments with an executable but missing `version` or `prefix`. Resolves these incomplete environments **in parallel** via `Promise.all`, matching the throughput of server mode's `Promise.all(unresolved)` pattern. **New private method: `resolveViaJsonCli(executable)`** Fallback for `resolve()`. Runs `pet resolve <exe> --json`, handles the `null` output case (PET couldn't identify the environment), handles malformed JSON separately from the not-found case, and logs all spawn failures to the output channel. **New exported functions (for testability)** - `buildFindCliArgs(config, options?, venvFolders?)` — maps `ConfigurationOptions` + refresh options to a `string[]` of CLI arguments - `parseRefreshCliOutput(stdout)` — parses `pet find --json` output; rejects non-object JSON (including arrays and primitives) - `parseResolveCliOutput(stdout, executable)` — parses `pet resolve --json` output; rejects non-object JSON (including arrays and primitives) separately from the `null` (not-found) case **`ConfigurationOptions` type** — changed from module-private `type` to `export type` to allow test files to construct configs directly. ### `src/common/telemetry/constants.ts` New telemetry event `PET.JSON_CLI_FALLBACK` with properties: - `operation: 'refresh' | 'resolve'` — which code path used the fallback - `result: 'success' | 'error'` — outcome - `<duration>` (measurement) — milliseconds taken This lets us track how often server mode fails in the wild and whether the CLI fallback reliably recovers. ### `src/test/managers/common/nativePythonFinder.jsonCli.unit.test.ts` _(new file)_ 47 unit tests covering the three exported pure functions: | Suite | What it tests | |---|---| | `buildFindCliArgs — no options` | Workspace dirs as positional args, no spurious flags | | `buildFindCliArgs — kind filter` | `--kind` flag, workspace dirs still included, no `--workspace` | | `buildFindCliArgs — Uri[] options` | `--workspace` only when paths are non-empty; empty `Uri[]` + empty `venvFolders` falls back to workspace dirs; URI fsPaths as positional; venvFolders appended; workspace dirs excluded when paths provided | | `buildFindCliArgs — configuration flags` | Each config field maps to the correct CLI flag; missing fields are omitted; `--environment-directories` repeated per-dir (not comma-joined) | | `buildFindCliArgs — edge cases` | Paths with spaces, comma-containing env dir paths passed safely, venvFolders not added for kind options | | `parseRefreshCliOutput — valid JSON` | Well-formed output, missing keys default to `[]`, multiple envs, field preservation | | `parseRefreshCliOutput — error cases` | Malformed JSON, empty string, JSON `null`, JSON primitive | | `parseResolveCliOutput — valid JSON` | Full field preservation | | `parseResolveCliOutput — null` | Throws with message that identifies the executable | | `parseResolveCliOutput — malformed` | `SyntaxError` on non-JSON, empty string, partial JSON | ## Behavior Notes - **No streaming in CLI mode.** Server mode streams `environment` notifications one at a time as PET discovers them. CLI mode blocks until the full scan finishes, then returns everything at once. Environments won't appear gradually in the fallback path. This is acceptable because the fallback is a last resort. - **`--environment-directories` uses repeated flags** — each directory is passed as a separate `--environment-directories <dir>` argument instead of comma-joining, so paths containing commas are handled correctly on both POSIX and Windows. - **`--workspace` flag** is used when `options` is a non-empty `Uri[]` (or `Uri[]` + non-empty `venvFolders`), mirroring the server mode's `search_scope = Workspace` behavior (skips global locators, searches only the provided paths). When both are empty, `--workspace` is omitted to avoid PET falling back to scanning the current working directory. - **`--kind` with workspace dirs** mirrors server mode's behavior where `search_kind` keeps the configured workspace dirs so workspace-scoped environments of that kind (e.g. a `Venv` inside the project) are still found. ## Testing All existing tests pass. New unit tests added: 47 tests covering `buildFindCliArgs`, `parseRefreshCliOutput`, and `parseResolveCliOutput`.
1 parent ccd8daf commit c9b94cb

3 files changed

Lines changed: 908 additions & 33 deletions

File tree

src/common/telemetry/constants.ts

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -113,6 +113,15 @@ export enum EventNames {
113113
* - errorType: string (classified error category, on failure only)
114114
*/
115115
MANAGER_LAZY_INIT = 'MANAGER.LAZY_INIT',
116+
/**
117+
* Telemetry event fired when the JSON CLI fallback is used for environment discovery.
118+
* Triggered when the PET JSON-RPC server mode is exhausted after all restart attempts.
119+
* Properties:
120+
* - operation: 'refresh' | 'resolve'
121+
* - result: 'success' | 'error'
122+
* - duration: number (milliseconds taken for the CLI operation)
123+
*/
124+
PET_JSON_CLI_FALLBACK = 'PET.JSON_CLI_FALLBACK',
116125
}
117126

118127
// Map all events to their properties
@@ -403,4 +412,16 @@ export interface IEventNamePropertyMapping {
403412
toolSource: string;
404413
errorType?: string;
405414
};
415+
416+
/* __GDPR__
417+
"pet.json_cli_fallback": {
418+
"operation": { "classification": "SystemMetaData", "purpose": "FeatureInsight", "owner": "StellaHuang95" },
419+
"result": { "classification": "SystemMetaData", "purpose": "FeatureInsight", "owner": "StellaHuang95" },
420+
"<duration>": { "classification": "SystemMetaData", "purpose": "FeatureInsight", "isMeasurement": true, "owner": "StellaHuang95" }
421+
}
422+
*/
423+
[EventNames.PET_JSON_CLI_FALLBACK]: {
424+
operation: 'refresh' | 'resolve';
425+
result: 'success' | 'error';
426+
};
406427
}

0 commit comments

Comments
 (0)