
bun test: add --isolate and --parallel#29354

Open
Jarred-Sumner wants to merge 107 commits into main from claude/isolated-parallel-test

Conversation


@Jarred-Sumner Jarred-Sumner commented Apr 15, 2026

Summary

Adds two flags to bun test:

  • --isolate runs each test file in a fresh ZigGlobalObject on the same JSC::VM. Between files the runner drains microtasks, walks usockets contexts to close all sockets, closes FSWatcher/StatWatcher, cancels timers, kills subprocesses, bumps a generation counter (which setTimeout/setInterval/AbortSignal.timeout check before firing), unprotects the old global, and creates a new one via Zig__GlobalObject__createForTestIsolation. --preload re-executes in each fresh global. A VM-level path → SourceProvider cache (IsolatedModuleCache.{h,cpp}) means shared dependencies are transpiled once per process; module_info is generated at runtime so cached providers skip the analyze parse via Bun__analyzeTranspiledModule. delete require.cache[key] evicts.

  • --parallel[=N] runs a coordinator with up to N worker processes (default N = CPU count; --parallel-delay=MS sets the lazy-spawn threshold). Files are sorted lexicographically and partitioned into K contiguous FileRanges for cache locality; idle workers stealBackHalf() from the largest remaining range. Workers run with --isolate between files. The version banner shows N× PARALLEL.

    IPC is a single duplex channel (parallel/Channel.zig): usockets-adopted socketpair on POSIX, libuv uv_pipe_t (ipc=1) on Windows — same dance as process.send(). Length-prefixed binary frames (parallel/Frame.zig), event-loop-driven both directions, send-side truncates oversized payloads instead of triggering channel close. Dead workers re-queue their in-flight file once. Cross-worker --bail stops dispatching at file granularity. Snapshot writes flush per worker. JUnit and LCOV coverage merge across workers; coverage thresholds enforced regardless of reporter; crashed files get a synthetic <testsuite> so the merged XML stays schema-valid. Per-file --randomize shuffle is seeded by hash(basename, global_seed) so the printed seed reproduces order regardless of which worker ran which file.

    All transpiler/resolver flags (--define, --loader, --tsconfig-override, --conditions, --drop, --jsx-*, --no-addons, --no-macros, --no-env-file, --env-file, --feature, --preserve-symlinks, etc.) are forwarded to workers.

    Coordinator SIGINT/SIGTERM kills the worker process group silently. Linux: prctl(PR_SET_PDEATHSIG) between vfork/exec in bun-spawn.cpp (new linux_pdeathsig SpawnOption). Windows: KILL_ON_JOB_CLOSE Job Object (recursive). macOS: process-group kill covers Ctrl-C; SIGKILL leaves grandchildren until stdin EOF.
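
The per-file seeded shuffle described above (seed derived from hash(basename, global_seed)) can be sketched as follows. This is a minimal illustration only: the PR's actual hash and PRNG are not specified here, and fnv1a, mulberry32, and shuffleForFile are hypothetical names, not Bun APIs.

```typescript
// Deterministic per-file shuffle: the seed depends only on the file's
// basename and the printed global seed, so the same order reproduces
// regardless of which worker ran the file.
function fnv1a(str: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// mulberry32-style PRNG: small, fast, deterministic for a given seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle seeded per file.
function shuffleForFile<T>(tests: T[], basename: string, globalSeed: number): T[] {
  const rand = mulberry32(fnv1a(basename, globalSeed));
  const out = tests.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```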

Output is identical to serial: per-test results stream as they complete, with each test's console.log/console.error buffered and flushed atomically before its result line so files never interleave mid-test. --dots works.

Benchmarks (release, M-series macOS)

| Suite | bun | comparison | × |
| --- | --- | --- | --- |
| --parallel (32 files × 4 tests, ~100ms/file) | 299 ms | 1.25 s (vitest) | 4.2× |
| --isolate cache (30 files × 2.5 MB shared module) | 42 ms | 1.79 s (vitest) | 42.5× |
| --isolate cache on vs BUN_FEATURE_FLAG_DISABLE_ISOLATION_SOURCE_CACHE=1 | 42 ms | 247 ms (cache off) | 5.9× |

Benchmarks live in bench/test/; the isolate-cache suite fails with a clear box error if run without --isolate so you can't accidentally benchmark the wrong thing.

Key entry points

  • src/bun.js/bindings/ZigGlobalObject.cpp — Zig__GlobalObject__createForTestIsolation
  • src/bun.js/bindings/IsolatedModuleCache.{h,cpp} — single canUse()/isTagCacheable()/lookup()/insert() (with ASSERT(isNewEntry))/evict()/clear()
  • src/bun.js/bindings/ModuleLoader.cpp, JSCommonJSModule.cpp, BunAnalyzeTranspiledModule.cpp — cache integration; module_info now generated at runtime (was --bytecode-only)
  • src/bun.js/VirtualMachine.zig — swapGlobalForTestIsolation(), pending_internal_promise as jsc.Strong
  • src/cli/test/parallel/{Coordinator,Worker,Channel,Frame,FileRange,runner,aggregate}.zig
  • src/bun.js/bindings/bun-spawn.cpp, src/bun.js/api/bun/process.zig — linux_pdeathsig, new_process_group
  • packages/bun-usockets/src/loop.c, epoll_kqueue.c — tick_depth to defer free_closed_sockets to the outermost tick (pre-existing reentrancy UAF, exposed by expect().toThrow()'s waitForPromise)
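
The path → SourceProvider cache above (IsolatedModuleCache) behaves roughly like a Map keyed by file path, transpiling on first miss and reusing the entry in every subsequent fresh global on the same VM. A minimal sketch — IsolatedModuleCacheSketch is a hypothetical name, not the C++ API:

```typescript
// Map-based sketch of the per-VM transpiled-source cache. Shared
// dependencies are transpiled once per process; delete require.cache[key]
// maps onto evict().
class IsolatedModuleCacheSketch {
  private entries = new Map<string, string>();

  lookup(path: string): string | undefined {
    return this.entries.get(path);
  }

  insert(path: string, transpiled: string): void {
    // Mirrors the ASSERT(isNewEntry) in the description above: inserting
    // twice for one path would mean two transpiles raced for one file.
    if (this.entries.has(path)) throw new Error(`duplicate insert: ${path}`);
    this.entries.set(path, transpiled);
  }

  evict(path: string): void {
    this.entries.delete(path);
  }

  clear(): void {
    this.entries.clear();
  }
}
```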

Also fixed in passing

  • JSC__JSInternalPromise__resolvedPromise had a lying type signature (it called the base JSPromise::resolvedPromise and reinterpret-cast the result); jsDynamicCast exposed it
  • import * as ns; export { ns } was recorded as Local export instead of Namespace re-export in module_info
  • Zig::SourceProvider held a raw m_globalObject pointer that dangled when the provider outlived its creating global

Known limits

  • The fetch keep-alive pool lives on HTTPThread with its own uws loop and isn't drained on swap; bounded at ~128 fds and JS-invisible.
  • % Funcs coverage under --parallel takes the per-worker max instead of a true union, because the existing LCOV writer doesn't emit FN/FNDA records. Line coverage is exact.
  • The SourceProvider cache covers ESM and CommonJS via both require() and import; plugin onLoad results are not cached (output may vary).

Test plan

  • bun bd test test/cli/test/isolation.test.ts test/cli/test/parallel.test.ts — 39 pass
  • bun bd test --parallel=4 test/js/bun/util/ — 49 files, ~45s
  • bun run zig:check-all — all targets
  • CI

Adds Zig__GlobalObject__createForTestIsolation which constructs a new
Zig::GlobalObject on the existing JSC::VM (reusing the warm heap/JIT)
and gcUnprotects the previous one so its module graph becomes
collectable.

VirtualMachine.swapGlobalForTestIsolation() drains microtasks, closes
tracked listening sockets, kills tracked subprocesses, bumps a
generation counter, and points vm.global at the fresh global.
JS-created timers capture the generation at init and self-cancel on
fire if it no longer matches, so a setInterval leaked by file N never
calls into file N+1's global.
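
The generation guard above can be sketched as follows — a minimal illustration with hypothetical VMState/IsolatedTimer names, not Bun's actual types:

```typescript
// Each timer captures the VM's isolation generation at creation;
// swapGlobalForTestIsolation bumps the counter, so a timer leaked by
// file N self-cancels instead of firing into file N+1's global.
class VMState {
  generation = 0;
  swapGlobalForTestIsolation(): void {
    this.generation += 1; // invalidates every timer created before the swap
  }
}

class IsolatedTimer {
  private readonly bornGeneration: number;
  constructor(private vm: VMState, private callback: () => void) {
    this.bornGeneration = vm.generation; // captured at init
  }
  // Returns true if the callback actually ran.
  fire(): boolean {
    if (this.vm.generation !== this.bornGeneration) return false; // self-cancel
    this.callback();
    return true;
  }
}
```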

The test runner calls swapGlobalForTestIsolation between files when
--isolate is passed. The reporter and TestRunner are Zig-side and
survive the swap so summary counts accumulate correctly.

Not yet handled (covered by future worker recycling): non-listening
socket connections, FSWatchers, --preload re-execution.

The 2-arg GlobalObject constructor calls initialIdentifier() which
asserts on second use. Creating with generateIdentifier() works but
gives the new global a context ID != 1, breaking Bun.isMainThread and
WTFTimer cross-thread dispatch which assume the main-thread context is
ID 1.

Instead, move the old context to a fresh identifier (it's about to be
collected anyway) and create the new global with the inherited ID.

Coordinator spawns N workers (bun test --test-worker --isolate), hands
out one file at a time over stdin, and reads per-file summaries back
over fd 3. Worker stdout/stderr are captured and prefixed. Workers swap
GlobalObject between files (Phase A isolation) and exit after
--isolate-recycle-after files; the coordinator respawns them and
re-queues any in-flight file once before marking it as crashed.
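
The crash/retry policy (re-queue an in-flight file once, then mark it crashed) can be sketched like this — FileQueue is an illustrative name:

```typescript
// Coordinator-side retry bookkeeping: when a worker dies, its in-flight
// file goes back on the queue exactly once; a second death on the same
// file marks it as crashed instead of retrying forever.
class FileQueue {
  private retried = new Set<number>();
  readonly crashed: number[] = [];
  constructor(private pending: number[]) {}

  next(): number | undefined {
    return this.pending.shift();
  }

  onWorkerDied(inflight: number): void {
    if (this.retried.has(inflight)) {
      this.crashed.push(inflight); // retry budget spent
    } else {
      this.retried.add(inflight);
      this.pending.unshift(inflight); // re-queue once, at the front
    }
  }
}
```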

Summary totals (pass/fail/skip/todo/expect/files) are aggregated into
the existing CommandLineReporter so --parallel composes with the normal
exit-code and printSummary paths.

Coordinator now mirrors the worker's files_run counter and skips
assignWork() when the worker is at its recycle threshold. Previously
the next file was written to a worker's stdin just before it exited
to recycle, causing onWorkerExit to misreport the recycle as a crash
and burn the file's retry budget.

Preload scripts (custom matchers, beforeAll/beforeEach/afterAll hooks)
were only evaluated once in the first global. After swapGlobalForTestIsolation()
the new global had a fresh Expect prototype and empty hook_scope, so the
second and later files lost preload state.

- loadPreloads no longer zeroes vm.preload when test_isolation_enabled,
  so reloadEntryPointForTestRunner re-imports preloads in each new global
- BunTestRoot.resetHookScopeForTestIsolation() drops Strong<> refs to the
  old global's preload hooks before the next file re-registers them
- Under --isolate every file is treated as first=true,last=true so
  preload-level beforeAll/afterAll fire per-file (each file is a complete
  isolated run from preload's perspective)
- Worker path in ParallelRunner.zig gets the same treatment

Coordinator now buffers each worker's stdout/stderr and flushes it
atomically on file_done, so per-test status lines appear contiguously
per file (matching serial output) instead of interleaving across
workers with a [worker N] prefix.
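
The buffering scheme from this commit can be sketched as follows — WorkerOutputBuffer is an illustrative name (a later commit in this PR switches to per-line streaming):

```typescript
// Per-worker output buffer: chunks accumulate until file_done, then are
// written as one atomic flush, so lines from different workers never
// interleave within a file.
class WorkerOutputBuffer {
  private chunks: string[] = [];
  constructor(private sink: (s: string) => void) {}

  onStderrChunk(chunk: string): void {
    this.chunks.push(chunk); // held until the file completes
  }

  onFileDone(): void {
    this.sink(this.chunks.join("")); // single atomic write
    this.chunks = [];
  }
}
```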

Workers ship their failures/skips/todos repeat buffers to the
coordinator (hex-encoded) before exit so the end-of-run summary
section is populated.

When --reporter=junit is set the coordinator points workers at a temp
dir via BUN_TEST_WORKER_JUNIT; each worker writes its own XML fragment
on exit and reports the path over IPC. The coordinator concatenates
fragment bodies into a single <testsuites> document with correct
totals. Works across worker recycling.
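
The fragment-concatenation step can be sketched naively in a few lines. This is a string-based illustration under the assumption that fragments are well-formed; the real merge also inserts synthetic <testsuite> entries for crashed files:

```typescript
// Strip each fragment's XML declaration and <testsuites> wrapper, sum the
// per-suite tests counts, and emit one merged <testsuites> document.
function mergeJUnitFragments(fragments: string[]): string {
  const bodies: string[] = [];
  let tests = 0;
  for (const frag of fragments) {
    const body = frag
      .replace(/^\s*<\?xml[^>]*\?>\s*/, "")
      .replace(/<\/?testsuites[^>]*>/g, "")
      .trim();
    bodies.push(body);
    for (const m of body.matchAll(/tests="(\d+)"/g)) tests += Number(m[1]);
  }
  return `<?xml version="1.0"?>\n<testsuites tests="${tests}">\n${bodies.join("\n")}\n</testsuites>`;
}
```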

Coverage is not yet aggregated; --coverage with --parallel now warns
instead of silently producing an empty report. The merge plan is
documented inline.

Mirror security_scanner.zig's Windows pattern: coordinator creates
non-overlapped uv_pipe(0,0) pairs for stdin and the fd-3 results
channel and inherits the child ends via .pipe (UV_INHERIT_FD), so the
worker's blocking ReadFile/WriteFile work without libuv. The previous
.buffer extra_fd would set UV_OVERLAPPED_PIPE on the child end and
break the worker's sync writes. stdout/stderr stay on .buffer libuv
pipes; coordinator now reads them via startWithPipe().

Also fix WorkerPipe.loop() / Worker.loop() to use vm.uvLoop() so they
type-check on Windows (Async.Loop = uv.Loop there), and resolve fd 3
at runtime via .fromUV(3) which maps to the inherited CRT handle on
Windows and is a no-op on POSIX.

Ensures every parallel test asserts the coordinator header so they
fail under USE_SYSTEM_BUN (which lacks --parallel and falls back to
serial). Previously two tests passed on system bun because their
assertions were satisfied by serial output.
…orkers

Coordinator now builds the worker argv once via buildWorkerArgv() and
forwards every flag that affects how tests execute inside a worker:
-t/--test-name-pattern, --preload, --rerun-each, --retry, --concurrent,
--randomize, --seed, --max-concurrency, --coverage (and its sub-flags),
plus the previously-forwarded --timeout/--todo/--only/--update-snapshots/
--isolate-recycle-after. Coordinator-only flags (discovery, reporters,
--pass-with-no-tests) are intentionally not forwarded.

--bail is handled at file granularity by the coordinator: once cumulative
failures reach the threshold it stops dispatching, prints the bail
message, and lets in-flight files finish. --bail is not forwarded to
workers because the serial-mode bail path calls Global.exit(1), which the
coordinator would misread as a worker crash.
swapGlobalForTestIsolation now walks the uws loop's SocketContext list and
calls us_socket_context_close on each (skipping the IPC context), so leaked
net.Socket / Bun.connect / WebSocket clients and listening servers are torn
down before the next test file runs. Listening sockets are now closed via
this path instead of the raw-fd watch-mode list, which avoids stale poll
state.

FSWatcher and StatWatcher instances created under --isolate are tracked in
rare_data and detached/closed on swap.

Adds us_socket_context_next() to expose the loop's context linked list to
Zig.

The off-thread HTTP client keepalive pool is left for worker recycling.
… and loosen perf-test threshold

- std.c.getpid() doesn't compile on all targets; use a random u32 for the
  worker's JUnit fragment filename instead.
- The 'faster than serial' assertion flaked under heavy machine load; loosen
  to 0.9× and skip if serial timing is implausible.

Workers now enable the ControlFlowProfiler (was previously skipped because
the runAsWorker branch came first), write an LCOV fragment to a shared
temp dir on recycle/shutdown, and report the path over IPC. The
coordinator merges fragments per source file (sums DA hit counts,
recomputes LF/LH, takes max of FNF/FNH since per-function FN/FNDA records
aren't emitted yet) and produces both the lcov.info file and the text
table. BUN_TEST_WORKER_JUNIT renamed to BUN_TEST_WORKER_TMP since it now
carries both fragment kinds.
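
The merge rules above (sum DA hits, recompute LF/LH, take max of FNF/FNH) can be sketched with a minimal LCOV parser. Illustrative only; it handles just the record types named above:

```typescript
// Merge per-worker LCOV fragments per source file: DA hit counts are
// summed, LF/LH recomputed from the merged line set, FNF/FNH taken as the
// per-worker max because FN/FNDA records aren't emitted.
function mergeLcov(fragments: string[]): string {
  type Rec = { da: Map<number, number>; fnf: number; fnh: number };
  const files = new Map<string, Rec>();
  for (const frag of fragments) {
    let cur: Rec | null = null;
    for (const line of frag.split("\n")) {
      if (line.startsWith("SF:")) {
        const sf = line.slice(3);
        cur = files.get(sf) ?? { da: new Map(), fnf: 0, fnh: 0 };
        files.set(sf, cur);
      } else if (cur && line.startsWith("DA:")) {
        const [ln, hits] = line.slice(3).split(",").map(Number);
        cur.da.set(ln, (cur.da.get(ln) ?? 0) + hits); // sum hit counts
      } else if (cur && line.startsWith("FNF:")) {
        cur.fnf = Math.max(cur.fnf, Number(line.slice(4))); // max, not union
      } else if (cur && line.startsWith("FNH:")) {
        cur.fnh = Math.max(cur.fnh, Number(line.slice(4)));
      }
    }
  }
  const out: string[] = [];
  for (const [sf, rec] of files) {
    out.push(`SF:${sf}`);
    for (const [ln, hits] of [...rec.da].sort((a, b) => a[0] - b[0]))
      out.push(`DA:${ln},${hits}`);
    out.push(`FNF:${rec.fnf}`, `FNH:${rec.fnh}`);
    out.push(`LF:${rec.da.size}`); // recomputed from the merged line set
    out.push(`LH:${[...rec.da.values()].filter((h) => h > 0).length}`);
    out.push("end_of_record");
  }
  return out.join("\n");
}
```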
…atus

Workers now emit test_start/test_done events over IPC the moment each
test begins/finishes instead of letting the coordinator buffer the
worker's stderr until file_done. The coordinator prints each result line
immediately and, on a TTY, draws a one-line-per-worker status block at
the bottom showing the currently-running file/test with an elapsed
counter; result lines scroll above it via clear+write+redraw.

- CommandLineReporter.worker_ipc_file_idx redirects handleTestCompleted
  to workerEmitTestDone (hex-encoded formatted line) instead of stderr
- Execution.onSequenceStarted emits test_start so the live block can
  show the running test name
- CurrentFile.set suppresses the worker's own file header; coordinator
  re-emits headers whenever the source file changes so interleaved lines
  stay self-describing
- FORCE_COLOR=1 passed to workers when coordinator's stderr is a color
  terminal, so streamed lines carry the same ANSI as serial output
- worker stderr (errors, console.log) streams per-line immediately
  under the in-flight file's header instead of being held to file_done
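
The hex-encoding of formatted result lines can be sketched as a tiny line-oriented codec. The event tag, field layout, and function names here are assumptions for illustration, not the PR's actual wire format:

```typescript
import { Buffer } from "node:buffer";

// One event per line; the already-formatted result text is hex-encoded so
// embedded newlines and ANSI escapes can't break line framing.
function encodeTestDone(fileIdx: number, formattedLine: string): string {
  const hex = Buffer.from(formattedLine, "utf8").toString("hex");
  return `test_done ${fileIdx} ${hex}`;
}

function decodeTestDone(line: string): { fileIdx: number; formattedLine: string } {
  const [tag, idx, hex] = line.split(" ");
  if (tag !== "test_done") throw new Error(`unexpected event: ${tag}`);
  return {
    fileIdx: Number(idx),
    formattedLine: Buffer.from(hex, "hex").toString("utf8"),
  };
}
```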
@robobun

robobun commented Apr 15, 2026

Updated 10:19 AM PT - Apr 17th, 2026

@Jarred-Sumner, your commit e47a73c has 2 failures in Build #46067 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29354

That installs a local version of the PR into your bun-29354 executable, so you can run:

bun-29354 --bun

@github-actions

This PR may be a duplicate of:

  1. implement --file-parallelism flag for running test files in parallel … #28715 - Also implements parallel test file execution (--file-parallelism), touching 5 of the same files; bun test: add --isolate and --parallel #29354 is a superset with a multi-process coordinator/worker design and adds --isolate

🤖 Generated with Claude Code

@coderabbitai

coderabbitai bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds per-file test isolation and a coordinator/worker parallel test runner. Introduces VM global swapping, isolation-generation guards for timers, watcher/socket tracking and cleanup, a uWS socket-context iterator, CLI flags for isolation/parallelism, IPC for worker events, and end-to-end tests for isolation and parallel execution.

Changes

Cohort / File(s) Summary
uWS Socket Context Iteration
packages/bun-usockets/src/context.c, packages/bun-usockets/src/libusockets.h, src/deps/uws/SocketContext.zig
Added C export us_socket_context_next() and Zig extern/wrapper SocketContext.next() to iterate socket contexts on a loop.
VM Test Isolation Core
src/bun.js/VirtualMachine.zig
Added test_isolation_enabled, test_isolation_generation, pending_internal_promise_is_protected, and swapGlobalForTestIsolation() to drain microtasks, close/cleanup contexts/watchers (skip IPC context), bump generation, reset per-run state, and install a fresh global; adjusted preload and watch-mode socket-tracking to respect isolation.
Global Creation for Isolation (bindings)
src/bun.js/bindings/JSGlobalObject.zig, src/bun.js/bindings/ZigGlobalObject.cpp
Added Zig wrapper and new C++ API Zig__GlobalObject__createForTestIsolation(...) to create a new JS global on the same VM, reuse/regenerate the script execution context id, transfer GC protection, and handle allocation failures.
Timer Generation Guarding
src/bun.js/api/Timer/TimerObjectInternals.zig
Added generation: u32 to timer internals; timers skip firing/runImmediate when their generation mismatches VM test_isolation_generation.
Process Exit Handling Extension
src/bun.js/api/bun/process.zig
Extended ProcessExitHandler tagged-pointer union to include TestWorkerHandle and dispatch TestWorkerHandle.onProcessExit on that tag.
Watcher Lifecycle Hooks for Isolation
src/bun.js/node/node_fs_watcher.zig, src/bun.js/node/node_fs_stat_watcher.zig
Conditionally register/deregister FS and Stat watchers into isolation-specific lists when test_isolation_enabled to enable per-file cleanup.
Isolation Resource Tracking
src/bun.js/rare_data.zig
Added fs_watchers_for_isolation and stat_watchers_for_isolation lists with add/remove helpers and closeAllWatchersForIsolation() to drain and close tracked watchers.
Test Hooks / Reporting Adjustments
src/bun.js/test/Execution.zig, src/bun.js/test/bun_test.zig, src/bun.js/test/jest.zig
Made onSequenceStarted an instance method to emit worker IPC test-start; added resetHookScopeForTestIsolation() to recreate root DescribeScope between files; suppressed worker-side filename/header printing in worker mode.
CLI Flags & Parsing
src/cli.zig, src/cli/Arguments.zig
Added --isolate, --parallel [NUMBER], --isolate-recycle-after, and internal --test-worker options; parsing/validation for numeric args and forcing isolate when parallel is used.
Test Command Orchestration
src/cli/test_command.zig
Enable VM isolation during tests, add reporter worker_ipc_file_idx routing for per-test IPC, add writeLcovOnly(), integrate ParallelRunner coordinator/worker paths, swap globals and reset hooks between isolated files, and expose a top-level test error handler.
Parallel Runner (coordinator & worker)
src/cli/test/ParallelRunner.zig
New coordinator/worker module: coordinator spawns workers, dispatches files via line protocol, aggregates IPC events and merges JUnit/LCOV; workers run files under isolation, emit per-test/file events, optionally recycle, and support crash/retry semantics.
End-to-end Tests (isolation & parallel)
test/cli/test/isolation.test.ts, test/cli/test/parallel.test.ts
Added integration tests exercising --isolate semantics (global/module resets, preload behavior, leaked-handle and watcher cleanup) and --parallel coordination, crash/recycle behavior, streaming/formatting, and merged JUnit/LCOV outputs.
🚥 Pre-merge checks: ✅ 2 passed

  • Title check ✅ Passed — The title 'bun test: add --isolate and --parallel' clearly and specifically summarizes the two primary features being added to the bun test command.
  • Description check ✅ Passed — PR description provides comprehensive technical detail on --isolate and --parallel flags, benchmarks, implementation entry points, known limits, and test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `packages/bun-usockets/src/libusockets.h`:
- Around line 321-323: The declaration comment for us_socket_context_next lacks
thread-safety guidance; update it to state that callers must hold the
loop->data.mutex (the loop's context list mutex used in loop.c) while iterating
with us_socket_context_next to avoid races with context unlink/free paths, or
otherwise synchronize externally, and note that the iterator returns NULL at the
end and does not take ownership or pin contexts against concurrent frees.

In `src/bun.js/rare_data.zig`:
- Around line 303-315: The append() calls in addFSWatcherForIsolation and
addStatWatcherForIsolation currently swallow all errors (append(...) catch {}),
which loses watcher registration on OOM; replace the empty catch with proper
error handling that calls bun.handleOom(err) to convert OutOfMemory into a crash
and does not silently ignore other errors (i.e., after bun.handleOom(err)
propagate or fail for non-OOM errors rather than returning silently) so watcher
registration never gets silently dropped.

In `src/bun.js/test/Execution.zig`:
- Around line 535-547: The inline import of ParallelRunner.zig inside the
function onSequenceStarted is illegal; move the import to file scope and use the
imported symbol inside the function. Add a top-of-file import like const
ParallelRunner = @import("../../cli/test/ParallelRunner.zig"); then replace the
inline @import call in onSequenceStarted with
ParallelRunner.workerEmitTestStart(file_idx, entry.base.name orelse
"(unnamed)"); leave the rest of onSequenceStarted, Execution, and
workerEmitTestStart usage unchanged.

In `src/bun.js/VirtualMachine.zig`:
- Around line 2419-2422: The code clears pending_internal_promise without
undoing any prior protect() calls, which can leak the old global; modify the
cleanup so that before setting this.pending_internal_promise = null you call the
matching unprotect() for the stored promise (or only clear the slot if it was
never protect()ed), ensuring the code paths that store to
pending_internal_promise (the preload-failure branches that call protect()) are
paired with an unprotect() here (or use a helper that conditionally
unprotects-and-clears) so the promise/global can be collected.

In `src/cli/Arguments.zig`:
- Around line 625-642: The current parsing of --parallel and
--isolate-recycle-after accepts 0 which yields silent fallback behavior; update
the parsing logic for args.option("--parallel") and
args.option("--isolate-recycle-after") so parsed u32 values must be > 0 (for
--parallel still allow empty to fall back to @max(bun.getThreadCount(), 1)), and
if parse succeeds but value == 0 call Output.prettyErrorln with an explanatory
message (e.g. "<red>error<r>: --parallel expects a positive number" or
"<red>error<r>: --isolate-recycle-after expects a positive number") and then
Global.exit(1); ensure you reference and validate ctx.test_options.parallel and
ctx.test_options.isolate_recycle_after after std.fmt.parseInt and before
assignment so zero is explicitly rejected.

In `src/cli/test/ParallelRunner.zig`:
- Around line 883-931: buildWorkerArgv currently only forwards bare "--coverage"
and "--concurrent", so worker processes use defaults for coverage and
concurrency sub-options (e.g., coverage.ignore_patterns,
coverage.ignore_sourcemap, coverage.skip_test_files, concurrent_test_glob);
update buildWorkerArgv to append the full set of worker-affecting options from
ctx.test_options: for opts.coverage fields append flags/values for
ignore_patterns, ignore_sourcemap, skip_test_files, and any other coverage.*
fields that exist, and for concurrency append the concurrent_test_glob (and any
related concurrency settings) using printZ/dupeZ as done elsewhere; reference
buildWorkerArgv, opts.coverage, opts.concurrent, and the printZ/arena.dupeZ
helpers to locate where to add these argv entries and ensure null terminator
remains intact.
- Around line 584-597: The worker temp dir creation failure currently just logs
an error and continues, leaving worker_tmpdir null while the coordinator JUnit
reporter is still disabled; change the error branch of std.fs.cwd().makePath to
abort parallel-artifact mode instead of continuing—either return the error up
the call (propagate/try) or set/clear a flag that disables parallel artifact
behavior, ensure you do not deinit reporter.reporters.junit when dir creation
fails, and avoid inserting BUN_TEST_WORKER_TMP into vm.transpiler.env.map;
update handling around worker_tmpdir, vm.transpiler.env.map.put, and
reporter.reporters.junit so that failing to create the temp dir prevents
entering the parallel JUnit/coverage path and preserves the coordinator reporter
state.
- Around line 1006-1028: When sending the "file_done" message from the worker
(the block that constructs Runner, calls vm.runWithAPILock(Runner, &runner,
Runner.begin), computes after = reporter.summary() and builds msg then
writeIpcLine(msg)), also flush that file's JUnit/LCOV fragments immediately
instead of waiting for workerFlushAggregates() on recycle/exit; specifically
invoke the same logic that emits "junit_file"/"coverage_file" (or call
workerFlushAggregates() scoped to just the current file's aggregates) right
after computing after and before writeIpcLine(msg) so completed files’ artifacts
are sent per-file and not lost if the worker crashes.
- Around line 297-301: The code in Worker.dispatch sets this.inflight before
calling std.fmt.bufPrint which can fail and leave inflight set while no message
was sent; change the flow so you only set this.inflight after successfully
formatting and sending the message: call std.fmt.bufPrint(...) and check its
result, then call this.send(line) and only if both succeed assign this.inflight
= file_idx (or, alternatively, on any formatting/send error reset this.inflight
back to an invalid sentinel); update the dispatch function (references:
dispatch, this.inflight, std.fmt.bufPrint, this.send) so no path leaves inflight
set when no run message was delivered.

In `test/cli/test/isolation.test.ts`:
- Around line 184-185: Hoist the dynamic imports of node:net and node:fs out of
the test body into module scope: replace the in-test await import("node:net")
and await import("node:fs") with top-level imports (e.g., const net = await
import(...) converted to import/net and import/fs at module scope) so the test
file uses module-scope imports for net and fs; update references in this file
(test/cli/test/isolation.test.ts) to use those module-scope symbols instead of
performing dynamic imports inside the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c81fd742-5acf-4692-92b2-058d8a34d48b

📥 Commits

Reviewing files that changed from the base of the PR and between a2f8580 and f888522.

📒 Files selected for processing (20)
  • packages/bun-usockets/src/context.c
  • packages/bun-usockets/src/libusockets.h
  • src/bun.js/VirtualMachine.zig
  • src/bun.js/api/Timer/TimerObjectInternals.zig
  • src/bun.js/api/bun/process.zig
  • src/bun.js/bindings/JSGlobalObject.zig
  • src/bun.js/bindings/ZigGlobalObject.cpp
  • src/bun.js/node/node_fs_stat_watcher.zig
  • src/bun.js/node/node_fs_watcher.zig
  • src/bun.js/rare_data.zig
  • src/bun.js/test/Execution.zig
  • src/bun.js/test/bun_test.zig
  • src/bun.js/test/jest.zig
  • src/cli.zig
  • src/cli/Arguments.zig
  • src/cli/test/ParallelRunner.zig
  • src/cli/test_command.zig
  • src/deps/uws/SocketContext.zig
  • test/cli/test/isolation.test.ts
  • test/cli/test/parallel.test.ts


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (7)
src/bun.js/rare_data.zig (1)

303-315: ⚠️ Potential issue | 🟠 Major

Don't silently drop isolation watcher registration on allocation failure.

If either append() fails here, that watcher is never tracked and closeAllWatchersForIsolation() will miss it, so --isolate can leak live watchers under memory pressure. Use bun.handleOom(...) instead of catch {}.

As per coding guidelines "Use bun.handleOom() to convert error.OutOfMemory into a crash without swallowing other errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `src/bun.js/rare_data.zig` around lines 303-315: the append() calls in
addFSWatcherForIsolation and addStatWatcherForIsolation currently swallow
allocation failures (catch {}), so watchers can be lost and leak; replace the
empty catches with bun.handleOom(...) to convert error.OutOfMemory into a crash
and not silently drop the watcher, e.g., call bun.handleOom(err, "adding
isolation watcher") on the append failure for both addFSWatcherForIsolation and
addStatWatcherForIsolation, ensuring closeAllWatchersForIsolation can reliably
see all registered watchers.
src/cli/test/ParallelRunner.zig (4)

883-931: ⚠️ Potential issue | 🟠 Major

Forward the execution-affecting coverage and concurrency sub-options to workers.

Only bare --coverage and --concurrent are forwarded today. Workers fall back to defaults for options like coverage.ignore_patterns, coverage.ignore_sourcemap, coverage.skip_test_files, and concurrent_test_glob, so parallel runs can behave differently from serial.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `src/cli/test/ParallelRunner.zig` around lines 883-931: buildWorkerArgv
currently only forwards bare "--coverage" and "--concurrent", so worker
processes miss sub-options (coverage.ignore_patterns, coverage.ignore_sourcemap,
coverage.skip_test_files, and concurrent_test_glob) and fall back to defaults;
update buildWorkerArgv to check those fields on ctx.test_options (e.g.,
opts.coverage.ignore_patterns, opts.coverage.ignore_sourcemap,
opts.coverage.skip_test_files, opts.concurrent_test_glob) and append the
corresponding CLI flags to argv (using arena.dupeZ for string values and printZ
for numeric values) just like the existing coverage/concurrent handling so
workers receive the same execution-affecting options.

297-301: ⚠️ Potential issue | 🟠 Major

Set inflight only after the run command is actually queued.

A long path or send failure can return before any message reaches the worker while leaving this.inflight set, and the coordinator will wait forever on a job that never started.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `src/cli/test/ParallelRunner.zig` around lines 297-301: the worker currently
sets this.inflight before queuing the run command in dispatch, which can leave
inflight set if bufPrint or send fail; change dispatch so that you only assign
this.inflight after the command has been successfully queued/sent (i.e., call
std.fmt.bufPrint and then this.send and check for success), or if send can
error, perform the assignment after a successful send return and ensure any send
failure leaves this.inflight unchanged or reset; update the dispatch function to
move the inflight assignment below the successful send path and handle send
errors accordingly.

1016-1034: ⚠️ Potential issue | 🟠 Major

Flush JUnit/LCOV fragments per completed file, not only on recycle or exit.

The coordinator commits each file as soon as it receives file_done, but workers only emit artifact fragments from workerFlushAggregates(). If a worker crashes after finishing earlier files, only the in-flight file is retried and the already-completed files' artifacts disappear from the merged reports.

Also applies to: 1043-1083

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/test/ParallelRunner.zig` around lines 1016 - 1034, The code currently
only calls workerFlushAggregates(reporter, vm, ctx, worker_tmp) when
will_recycle or on exit, so artifact fragments for completed files are not
emitted per file and are lost if the worker crashes; move or add a call to
workerFlushAggregates so it runs for every completed file before the coordinator
is sent the "file_done" message (i.e., invoke workerFlushAggregates just prior
to writeIpcLine(msg) in the block that builds the file_done message and
replicate the same change in the analogous block around lines 1043-1083),
keeping the same arguments and preserving existing ordering of
reporter.summary(), msg construction, and writeIpcLine calls.

584-597: ⚠️ Potential issue | 🟠 Major

Abort parallel artifact mode if the worker temp dir can't be created.

This branch logs and continues, but Line 594 still tears down the coordinator JUnit reporter and the serial coverage path is later skipped. The run can therefore succeed without producing the requested JUnit/coverage outputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/test/ParallelRunner.zig` around lines 584 - 597, When creating the
worker temp directory (the result stored in dir) fails, the current code logs
the error and continues which leads to teardown of the coordinator JUnit
reporter (reporter.reporters.junit / jr.deinit()) and skipping of serial
coverage output; change the failure path so it aborts the parallel-artifact run
instead of continuing: after std.fs.cwd().makePath(dir) returns error, log the
error via Output.err, free or discard dir as needed, and then return or
propagate an error (or set a failure flag) from the ParallelRunner routine so
the caller stops the parallel mode; do not deinit reporter.reporters.junit or
proceed with worker_tmpdir/coverage setup when makePath fails (i.e., only set
worker_tmpdir and vm.transpiler.env.map.put and only deinit jr when makePath
succeeded).
test/cli/test/parallel.test.ts (1)

121-136: ⚠️ Potential issue | 🟡 Minor

The regex doesn't verify >1 as the test name claims.

The pattern `/--parallel: \d+ workers/` matches `1 workers` as well as higher counts. If the default ever regresses to 1, this test would still pass.

Consider parsing the worker count and asserting it matches expectation based on available CPUs, or at minimum ensuring the regex captures a number greater than 1.

🔧 Suggested fix
-  expect(stderr).toMatch(/--parallel: \d+ workers, 2 files/);
+  const m = stderr.match(/--parallel: (\d+) workers, 2 files/);
+  expect(m).not.toBeNull();
+  const workerCount = Number(m![1]);
+  // On single-CPU CI, parallel defaults to 1; otherwise verify >1
+  const cpuCount = (await import("node:os")).cpus().length;
+  if (cpuCount > 1) {
+    expect(workerCount).toBeGreaterThan(1);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli/test/parallel.test.ts` around lines 121 - 136, The test's regex
allows "1 workers" so update the assertion to parse the worker count from stderr
and assert it's greater than 1: take the stderr string (variable stderr), match
it against a regex with a capture for the worker number (e.g. /--parallel: (\d+)
workers, 2 files/), convert the capture to a number and replace the current
expect(stderr).toMatch(...) with an assertion that Number(capture) > 1 (e.g.
expect(workers).toBeGreaterThan(1)) to ensure the default is truly >1.
test/cli/test/isolation.test.ts (1)

150-207: 🧹 Nitpick | 🔵 Trivial

Hoist node:net and node:fs imports to module scope.

The dynamic imports at lines 184-185 are not testing dynamic import behavior—they're used to set up a TCP server for the isolation test. As per coding guidelines, these should be module-scope imports.

🔧 Suggested fix

Add at file top (after line 2):

+import net from "node:net";
+import fs from "node:fs";

Then update lines 184-186:

-    const net = await import("node:net");
-    const fs = await import("node:fs");
-    const server = net.createServer(sock => {
+    const server = net.createServer(sock => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli/test/isolation.test.ts` around lines 150 - 207, The test currently
uses dynamic imports (await import("node:net") and await import("node:fs"))
inside the test body; hoist these to module scope by adding top-level imports:
import net from "node:net" and import fs from "node:fs", then remove the await
import lines and any local "const net" / "const fs" declarations so the existing
references (server, net.createServer, fs.writeFileSync, fs.existsSync) use the
module-scope bindings; ensure no other code relies on the dynamic import timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli/test_command.zig`:
- Around line 1509-1513: Before the early worker-mode return, initialize the
default timezone to UTC so worker processes use the same "Etc/UTC" setup as
serial runs: detect ctx.test_options.test_worker and, prior to calling
ParallelRunner.runAsWorker(reporter, vm, ctx), perform the same timezone
initialization used elsewhere in this module (the "Etc/UTC" setup) so workers
inherit UTC even when TZ is not set in the environment.
- Around line 1688-1692: The conditional that skips the serial coverage path
when ctx.test_options.parallel > 0 causes ParallelRunner.runAsCoordinator(...)
(which may fall back to calling runAllTests(...) for single-file runs) to avoid
running the final coverage logic; change the branch so that after calling
ParallelRunner.runAsCoordinator(reporter, vm, test_files, ctx,
&coverage_options) you detect its fallback-to-serial result (or return
value/flag) and, when it indicates the coordinator delegated to serial
execution, explicitly invoke the same final-coverage path used by
runAllTests(reporter, vm, test_files, ctx.allocator) (or call runAllTests
directly with the same arguments) and ensure coverage_options
finalization/collection still runs so bun test --parallel --coverage <single
file> produces the coverage report.

In `@src/cli/test/ParallelRunner.zig`:
- Around line 1068-1081: The current logic unconditionally emits IPC lines
"junit_file" and "coverage_file" even if writing the files failed; change it so
that you only call writeIpcLine(...) after successful writes: for JUnit, call
junit.writeToFile(path) and if it returns success then build the line (using
line.clearRetainingCapacity(); line.writer(...).print(...)) and call
writeIpcLine(line.items), otherwise handle/propagate the error (or at least
avoid emitting the path and free the allocated path); similarly for coverage,
only call reporter.writeLcovOnly(vm, &ctx.test_options.coverage, path) and on
success emit the "coverage_file" IPC line, otherwise do not emit and handle the
failure. Ensure you reference and preserve the existing allocPrintSentinel path
handling and cleanup logic around reporter.reporters.junit, junit.writeToFile,
reporter.writeLcovOnly, line.writer, and writeIpcLine.
- Around line 528-542: In the w.start() catch block (inside ParallelRunner;
symbols: w.start(), this.next_file, this.files.len, retry_idx, this.files_done,
this.reporter.summary().fail/files, drive()), detect if work remains
(this.next_file < this.files.len or retry_idx != null) and immediately abort the
run instead of returning silently: set this.bailed = true, increment
reporter.summary().fail and reporter.summary().files by the count of remaining
files (this.files.len - this.files_done), set this.files_done = this.files.len,
and then return; this ensures drive() won't spin forever with no live workers
after a respawn failure.

In `@test/cli/test/parallel.test.ts`:
- Around line 250-292: The test awaits proc.exited twice; capture the process
exit once and reuse it to avoid redundant awaits: after the stderr loop await
proc.exited and store the result in a variable (e.g., const exitCode = await
proc.exited) then replace subsequent uses of await proc.exited with that
variable when asserting the exit code; update any references around the
assertions that currently call proc.exited directly to use the captured exitCode
instead.

---

Duplicate comments:
In `@src/bun.js/rare_data.zig`:
- Around line 303-315: The append() calls in addFSWatcherForIsolation and
addStatWatcherForIsolation currently swallow allocation failures (catch {}), so
watchers can be lost and leak; replace the empty catches with bun.handleOom(...)
to convert error.OutOfMemory into a crash and not silently drop the watcher,
e.g., call bun.handleOom(err, "adding isolation watcher") on the append failure
for both addFSWatcherForIsolation and addStatWatcherForIsolation, ensuring
closeAllWatchersForIsolation can reliably see all registered watchers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 47c9bc78-58f5-4607-b97c-a680659141fe

📥 Commits

Reviewing files that changed from the base of the PR and between f888522 and 4b77d24.

📒 Files selected for processing (5)
  • src/bun.js/rare_data.zig
  • src/cli/test/ParallelRunner.zig
  • src/cli/test_command.zig
  • test/cli/test/isolation.test.ts
  • test/cli/test/parallel.test.ts

Jarred-Sumner and others added 4 commits April 16, 2026 19:46
…lel-test

# Conflicts:
#	test/js/bun/perf/static-initializers.test.ts
Pre-existing on main: an on_data callback can re-enter the loop
(expect().toThrow() -> waitForPromise -> us_loop_run_bun_tick), and the
inner tick's loop_post would free the very socket the outer tick's
dispatch is mid-iteration on (it reads s->flags right after on_data
returns). Track tick_depth and skip the free in nested loop_post; the
outermost tick's loop_post drains the closed list.

Reproduced on origin/main with test/bake/deinitialization.test.ts under
ASAN; fixed by this change.
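The guard can be sketched generically (hypothetical names; the real change is in usockets' C loop code):

```typescript
// Sketch of the tick_depth re-entrancy guard: nested event-loop ticks
// defer freeing closed sockets so an outer tick's dispatch loop never
// touches freed memory. Names are invented for illustration.
class Loop {
  private tickDepth = 0;
  closed: string[] = []; // sockets closed during a tick
  freed: string[] = [];  // sockets actually freed

  tick(work?: () => void): void {
    this.tickDepth++;
    work?.();
    this.tickDepth--;
    // loop_post: only the outermost tick drains the closed list.
    if (this.tickDepth === 0) {
      this.freed.push(...this.closed);
      this.closed.length = 0;
    }
  }
}

const loop = new Loop();
loop.tick(() => {
  loop.closed.push("sock1"); // on_data closes a socket...
  loop.tick();               // ...then re-enters the loop (waitForPromise)
  console.log(loop.freed.length); // 0: the nested tick skipped the free
});
console.log(loop.freed.length, loop.freed[0]); // 1 sock1
```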
…double-free

Frame.str() now truncates strings that would push the encoded frame past
max_payload (64MB), with a marker. Without this, a test producing a huge
result line (e.g. a 70M-char test name) would cause the receiver's
Channel.ingest to drop the channel and mark the file as crashed.

adoptPipe() no longer closeAndDestroy()s on readStart failure — it returns
false and the caller frees. backend.pipe is only set after readStart
succeeds, so Worker.start()'s errdefer correctly distinguishes
'adoptPipe never called' from 'adoptPipe failed'.
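A minimal sketch of the truncation behavior, assuming a [u32 len][u8 kind] layout (names, the 64MB constant's placement, and the exact marker text are illustrative):

```typescript
// Hypothetical sketch: truncate oversized payloads with a marker instead
// of emitting a frame the receiver would reject and treat as a crash.
const MAX_PAYLOAD = 64 * 1024 * 1024; // 64MB cap from the commit message
const MARKER = "...[truncated]";

function encodeFrame(kind: number, payload: string): Buffer {
  let body = Buffer.from(payload, "utf8");
  if (body.length > MAX_PAYLOAD) {
    body = Buffer.concat([
      body.subarray(0, MAX_PAYLOAD - MARKER.length),
      Buffer.from(MARKER),
    ]);
  }
  const frame = Buffer.alloc(4 + 1 + body.length);
  frame.writeUInt32LE(1 + body.length, 0); // [u32 len] covers kind + body
  frame.writeUInt8(kind, 4);               // [u8 kind]
  body.copy(frame, 5);
  return frame;
}

const f = encodeFrame(7, "hello");
console.log(f.readUInt32LE(0)); // 6 (kind byte + 5 payload bytes)
console.log(f.readUInt8(4));    // 7
```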
…call

prev captured a *JSInternalPromise, then set(null) removed its only GC
root, then callOverriddenRunMain ran JS (can GC). prev was used after.
Dropped prev — set(null) before the call means get()==null afterward iff
the override didn't store its own promise.
Drops the owner: *Owner field so Channel default-inits as .{} and
consumers no longer need = undefined placeholders. read_chunk stays
zero-initialized to satisfy ban-words; the cost is one 16KB memset on
the Windows path.
Within-file test shuffle previously used a single PRNG whose state carried
across files. Under --parallel, file-to-worker assignment depends on
timing/stealing, so a file's test order depended on what files preceded it
on that worker — replaying the printed --seed=N didn't reproduce.

Seed each file's Order PRNG from bun.hash(file_path) +% global_seed so
within-file order is a function of (seed, path) only.
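The seeding scheme can be sketched like this (FNV-1a as a stand-in hash; the real code uses bun.hash and Zig's wrapping add `+%`):

```typescript
// Sketch of per-file deterministic shuffle seeding: the seed depends only
// on (global seed, file path), never on which worker ran the file or what
// files preceded it. Hash choice here is illustrative.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

function fileSeed(filePath: string, globalSeed: number): number {
  // >>> 0 emulates a wrapping add, like Zig's +%.
  return (fnv1a(filePath) + globalSeed) >>> 0;
}

console.log(fileSeed("a.test.ts", 42) === fileSeed("a.test.ts", 42)); // true
console.log(fileSeed("a.test.ts", 42) === fileSeed("b.test.ts", 42)); // false
```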
The coordinator passed the extra fd as .buffer (uv_pipe_init ipc=0) but
that path has only ever been exercised unidirectionally for stdout/stderr
capture. Switch to .ipc for the parent stdio container and ipc=1 in the
child's uv_pipe_init+open, which is byte-for-byte the dance ipc.zig
windowsConfigureClient/Server use for Bun.spawn({ipc}) / process.send().

Both ends agreeing on libuv's IPC framing means our [u32 len][u8 kind]
frames ride inside it transparently; the previous (ipc=0, ipc=0) pair
left --parallel hanging on Windows CI with no frames ever delivered.
ipc.zig unrefs its pipe but its loop is driven by user JS that keeps
other handles alive. Our worker (and the coordinator before any process
exit handler is registered) has nothing else ref'd, so unref()ing the
channel pipe makes uv_loop_alive() false and autoTick() takes the
tickWithoutIdle (uv_run NOWAIT) path instead of blocking for the peer's
first frame. Both sides exit explicitly (Global.exit / drive() return),
so the extra ref never holds the process open.

Also surface uv_pipe_init/open/readStart errors via Output.debugWarn so
if this isn't the hang the next CI run shows where adopt fails.
…, sendWindows error)

- ModuleLoader.cpp fetchESMSourceCode: don't serve cached Program-type
  providers when hasOverriddenModuleWrapper is set in this global; mirror
  the guard already in fetchCommonJSModule.
- parallel/Worker.zig start() errdefer: reset ipc/out/err to fresh values
  after deinit so reapWorker's !respawned cleanup doesn't deinit on the
  undefined memory ArrayListUnmanaged.deinit leaves behind.
- parallel/Channel.zig deinit(): reset buffers to .empty after free
  (idempotent).
- parallel/Channel.zig sendWindows(): markDone() on non-EAGAIN tryWrite
  error instead of silently dropping. uv_try_write on UV_NAMED_PIPE
  Windows always returns EAGAIN (vendor/libuv/src/win/stream.c:185), so
  the path was already correct in practice — this hardens EBADF/EPIPE.
The jsc.Strong wrapper made every entry-point promise a permanent GC
root, transitively rooting the module graph. bundler_defer.test.ts'
FinalizationRegistry callbacks never fired (Windows CI, 3× fail). The
original raw-pointer + protect()/is_protected design only rooted on the
two preload-early-return paths where the caller can't await immediately.
reload() now unprotects before reassigning, fixing the one leak the
Strong refactor was meant to address.
@Jarred-Sumner
Collaborator Author

CI status — build #46067 on e47a73c63c

59/60 jobs pass. All review threads resolved. 39/39 isolation+parallel tests, zig:check-all 61/61, 24 verification rounds.

The one failure is test/js/node/worker_threads/worker_threads.test.ts segfaulting in JSC::Heap visitStrongHandles during worker termination on darwin-14-x64 only — this is the same crash already tracked at test/expectations.txt:20 as [ASAN][CRASH], surfacing here on a non-ASAN release build (so the expectations gate doesn't catch it). Passes 5× locally on debug aarch64; my diff adds no Strong<> handles. The Strong-based pending_internal_promise refactor that did root the module graph was reverted in 7073099226 (which fixed bundler_defer.test.ts FinalizationRegistry).

For reference, build #46066 (one commit prior) was 59/60 with the only failure being test/cli/inspect/test-reporter.test.ts on alpine-aarch64; ci:errors tags that one [pre-existing], and the inspector code path is untouched in this diff. Neither failure repeats across builds; both look like timing-sensitive pre-existing races.

ci:errors '#29354'   # build #46067
== test/js/node/worker_threads/worker_threads.test.ts == [new]
  Segmentation fault at address 0x10
  Heap.cpp:3067 visitStrongHandles
  (test/expectations.txt:20: [ASAN] worker_threads.test.ts [CRASH])

Contributor

@claude claude bot left a comment


All reported bugs have been addressed. Given the scope of this PR — 64 files, new GC-visible globals, IPC process management, and event-loop changes — I'd recommend a final human read before merge.

Extended reasoning...

Overview

This PR adds two new bun test flags: --isolate (fresh ZigGlobalObject per file on the same JSC VM) and --parallel[=N] (multi-process coordinator/worker pool). The changes span 64 files across Zig/C++/TypeScript, touching the JS VM, GC roots, IPC channels, process spawning, uWS event loop, module caching, timer generation guards, and test reporters.

Review round summary

The multi-agent bug-hunting system filed 142 inline comments over several rounds. Every single one is now marked resolved=true, with Jarred-Sumner committing fixes in roughly 15 distinct commits. Spot-checks of the most critical fixes confirm they are correctly in place:

  • ingest() now guards if (self.done) return at line 254 of Channel.zig
  • mergeJUnitFragments now switches on the writeAll result (lines 79-82 of aggregate.zig)
  • reload() unprotects the old pending_internal_promise before clearing (lines 767-770 of VirtualMachine.zig)
  • swapGlobalForTestIsolation skips test_parallel_ipc_context alongside the spawn/process IPC contexts (line 2427)
  • buildWorkerArgv now forwards --define, --loader, and --tsconfig-override to workers
  • Frame kind bytes are validated with std.meta.intToEnum(...) catch instead of @enumFromInt
  • spawned_count incremented only after a successful start() call
  • Orphaned pending_retry slots handled in onWorkerExit
  • BUN_TEST_PARALLEL_SCALE_MS timespec correctly computed with @divTrunc/@mod

Security risks

No authentication, permissions, or crypto paths are touched. The main risks are:

  • Process escalation surface: the coordinator spawns child workers with new_process_group=true and linux_pdeathsig=SIGTERM; the signal cleanup path looks correct
  • IPC frame parsing: the 64 MB max-payload guard and enum validation are now in place
  • No injection or data-exposure risks identified

Level of scrutiny

This is production-critical runtime infrastructure — not sandbox code. The GC root management (protect/unprotect around pending_internal_promise, IsolatedModuleCache lifetime), the uWS tick_depth re-entrancy fix, and the module cache coherence under hot-reload + isolation all require careful correctness reasoning that goes beyond mechanical bug fixes. The PR has been reviewed exhaustively by the automated system, but a human familiar with the JSC/uWS internals should do a final pass before merge.

Other factors

The new test files (test/cli/test/isolation.test.ts, test/cli/test/parallel.test.ts) provide good coverage of the happy paths and several edge cases. The bench harness is a nice addition. The commit history shows responsive and thorough iteration by the author in response to every reported issue.

