bun test: add --isolate and --parallel#29354
Conversation
Adds Zig__GlobalObject__createForTestIsolation which constructs a new Zig::GlobalObject on the existing JSC::VM (reusing the warm heap/JIT) and gcUnprotects the previous one so its module graph becomes collectable. VirtualMachine.swapGlobalForTestIsolation() drains microtasks, closes tracked listening sockets, kills tracked subprocesses, bumps a generation counter, and points vm.global at the fresh global. JS-created timers capture the generation at init and self-cancel on fire if it no longer matches, so a setInterval leaked by file N never calls into file N+1's global. The test runner calls swapGlobalForTestIsolation between files when --isolate is passed. The reporter and TestRunner are Zig-side and survive the swap so summary counts accumulate correctly. Not yet handled (covered by future worker recycling): non-listening socket connections, FSWatchers, --preload re-execution.
The 2-arg GlobalObject constructor calls initialIdentifier() which asserts on second use. Creating with generateIdentifier() works but gives the new global a context ID != 1, breaking Bun.isMainThread and WTFTimer cross-thread dispatch which assume the main-thread context is ID 1. Instead, move the old context to a fresh identifier (it's about to be collected anyway) and create the new global with the inherited ID.
Coordinator spawns N workers (bun test --test-worker --isolate), hands out one file at a time over stdin, and reads per-file summaries back over fd 3. Worker stdout/stderr are captured and prefixed. Workers swap GlobalObject between files (Phase A isolation) and exit after --isolate-recycle-after files; the coordinator respawns them and re-queues any in-flight file once before marking it as crashed. Summary totals (pass/fail/skip/todo/expect/files) are aggregated into the existing CommandLineReporter so --parallel composes with the normal exit-code and printSummary paths.
Coordinator now mirrors the worker's files_run counter and skips assignWork() when the worker is at its recycle threshold. Previously the next file was written to a worker's stdin just before it exited to recycle, causing onWorkerExit to misreport the recycle as a crash and burn the file's retry budget.
Preload scripts (custom matchers, beforeAll/beforeEach/afterAll hooks) were only evaluated once in the first global. After swapGlobalForTestIsolation() the new global had a fresh Expect prototype and empty hook_scope, so the second and later files lost preload state. - loadPreloads no longer zeroes vm.preload when test_isolation_enabled, so reloadEntryPointForTestRunner re-imports preloads in each new global - BunTestRoot.resetHookScopeForTestIsolation() drops Strong<> refs to the old global's preload hooks before the next file re-registers them - Under --isolate every file is treated as first=true,last=true so preload-level beforeAll/afterAll fire per-file (each file is a complete isolated run from preload's perspective) - Worker path in ParallelRunner.zig gets the same treatment
Coordinator now buffers each worker's stdout/stderr and flushes it atomically on file_done, so per-test status lines appear contiguously per file (matching serial output) instead of interleaving across workers with a [worker N] prefix. Workers ship their failures/skips/todos repeat buffers to the coordinator (hex-encoded) before exit so the end-of-run summary section is populated. When --reporter=junit is set the coordinator points workers at a temp dir via BUN_TEST_WORKER_JUNIT; each worker writes its own XML fragment on exit and reports the path over IPC. The coordinator concatenates fragment bodies into a single <testsuites> document with correct totals. Works across worker recycling. Coverage is not yet aggregated; --coverage with --parallel now warns instead of silently producing an empty report. The merge plan is documented inline.
Mirror security_scanner.zig's Windows pattern: coordinator creates non-overlapped uv_pipe(0,0) pairs for stdin and the fd-3 results channel and inherits the child ends via .pipe (UV_INHERIT_FD), so the worker's blocking ReadFile/WriteFile work without libuv. The previous .buffer extra_fd would set UV_OVERLAPPED_PIPE on the child end and break the worker's sync writes. stdout/stderr stay on .buffer libuv pipes; coordinator now reads them via startWithPipe(). Also fix WorkerPipe.loop() / Worker.loop() to use vm.uvLoop() so they type-check on Windows (Async.Loop = uv.Loop there), and resolve fd 3 at runtime via .fromUV(3) which maps to the inherited CRT handle on Windows and is a no-op on POSIX.
Ensures every parallel test asserts the coordinator header so they fail under USE_SYSTEM_BUN (which lacks --parallel and falls back to serial). Previously two tests passed on system bun because their assertions were satisfied by serial output.
…orkers Coordinator now builds the worker argv once via buildWorkerArgv() and forwards every flag that affects how tests execute inside a worker: -t/--test-name-pattern, --preload, --rerun-each, --retry, --concurrent, --randomize, --seed, --max-concurrency, --coverage (and its sub-flags), plus the previously-forwarded --timeout/--todo/--only/--update-snapshots/ --isolate-recycle-after. Coordinator-only flags (discovery, reporters, --pass-with-no-tests) are intentionally not forwarded. --bail is handled at file granularity by the coordinator: once cumulative failures reach the threshold it stops dispatching, prints the bail message, and lets in-flight files finish. --bail is not forwarded to workers because the serial-mode bail path Global.exit(1)s, which the coordinator would misread as a worker crash.
swapGlobalForTestIsolation now walks the uws loop's SocketContext list and calls us_socket_context_close on each (skipping the IPC context), so leaked net.Socket / Bun.connect / WebSocket clients and listening servers are torn down before the next test file runs. Listening sockets are now closed via this path instead of the raw-fd watch-mode list, which avoids stale poll state. FSWatcher and StatWatcher instances created under --isolate are tracked in rare_data and detached/closed on swap. Adds us_socket_context_next() to expose the loop's context linked list to Zig. The off-thread HTTP client keepalive pool is left for worker recycling.
… and loosen perf-test threshold - std.c.getpid() doesn't compile on all targets; use a random u32 for the worker's JUnit fragment filename instead. - The 'faster than serial' assertion flaked under heavy machine load; loosen to 0.9× and skip if serial timing is implausible.
Workers now enable the ControlFlowProfiler (was previously skipped because the runAsWorker branch came first), write an LCOV fragment to a shared temp dir on recycle/shutdown, and report the path over IPC. The coordinator merges fragments per source file (sums DA hit counts, recomputes LF/LH, takes max of FNF/FNH since per-function FN/FNDA records aren't emitted yet) and produces both the lcov.info file and the text table. BUN_TEST_WORKER_JUNIT renamed to BUN_TEST_WORKER_TMP since it now carries both fragment kinds.
…atus Workers now emit test_start/test_done events over IPC the moment each test begins/finishes instead of letting the coordinator buffer the worker's stderr until file_done. The coordinator prints each result line immediately and, on a TTY, draws a one-line-per-worker status block at the bottom showing the currently-running file/test with an elapsed counter; result lines scroll above it via clear+write+redraw. - CommandLineReporter.worker_ipc_file_idx redirects handleTestCompleted to workerEmitTestDone (hex-encoded formatted line) instead of stderr - Execution.onSequenceStarted emits test_start so the live block can show the running test name - CurrentFile.set suppresses the worker's own file header; coordinator re-emits headers whenever the source file changes so interleaved lines stay self-describing - FORCE_COLOR=1 passed to workers when coordinator's stderr is a color terminal, so streamed lines carry the same ANSI as serial output - worker stderr (errors, console.log) streams per-line immediately under the in-flight file's header instead of being held to file_done
|
Updated 10:19 AM PT - Apr 17th, 2026
❌ @Jarred-Sumner, your commit e47a73c has 2 failures in
🧪 To try this PR locally: bunx bun-pr 29354That installs a local version of the PR into your bun-29354 --bun |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds per-file test isolation and a coordinator/worker parallel test runner. Introduces VM global swapping, isolation-generation guards for timers, watcher/socket tracking and cleanup, a uWS socket-context iterator, CLI flags for isolation/parallelism, IPC for worker events, and end-to-end tests for isolation and parallel execution. Changes
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/bun-usockets/src/libusockets.h`:
- Around line 321-323: The declaration comment for us_socket_context_next lacks
thread-safety guidance; update it to state that callers must hold the
loop->data.mutex (the loop's context list mutex used in loop.c) while iterating
with us_socket_context_next to avoid races with context unlink/free paths, or
otherwise synchronize externally, and note that the iterator returns NULL at the
end and does not take ownership or pin contexts against concurrent frees.
In `@src/bun.js/rare_data.zig`:
- Around line 303-315: The append() calls in addFSWatcherForIsolation and
addStatWatcherForIsolation currently swallow all errors (append(... ) catch {}),
which loses watcher registration on OOM; replace the empty catch with proper
error handling that calls bun.handleOom(err) to convert OutOfMemory into a crash
and does not silently ignore other errors (i.e., after bun.handleOom(err)
propagate or fail for non-OOM errors rather than returning silently) so watcher
registration never gets silently dropped.
In `@src/bun.js/test/Execution.zig`:
- Around line 535-547: The inline import of ParallelRunner.zig inside the
function onSequenceStarted is illegal; move the import to file scope and use the
imported symbol inside the function. Add a top-of-file import like const
ParallelRunner = `@import`("../../cli/test/ParallelRunner.zig"); then replace the
inline `@import` call in onSequenceStarted with
ParallelRunner.workerEmitTestStart(file_idx, entry.base.name orelse
"(unnamed)"); leave the rest of onSequenceStarted, Execution, and
workerEmitTestStart usage unchanged.
In `@src/bun.js/VirtualMachine.zig`:
- Around line 2419-2422: The code clears pending_internal_promise without
undoing any prior protect() calls, which can leak the old global; modify the
cleanup so that before setting this.pending_internal_promise = null you call the
matching unprotect() for the stored promise (or only clear the slot if it was
never protect()ed), ensuring the code paths that store to
pending_internal_promise (the preload-failure branches that call protect()) are
paired with an unprotect() here (or use a helper that conditionally
unprotects-and-clears) so the promise/global can be collected.
In `@src/cli/Arguments.zig`:
- Around line 625-642: The current parsing of --parallel and
--isolate-recycle-after accepts 0 which yields silent fallback behavior; update
the parsing logic for args.option("--parallel") and
args.option("--isolate-recycle-after") so parsed u32 values must be > 0 (for
--parallel still allow empty to fall back to `@max`(bun.getThreadCount(), 1)), and
if parse succeeds but value == 0 call Output.prettyErrorln with an explanatory
message (e.g. "<red>error<r>: --parallel expects a positive number" or
"<red>error<r>: --isolate-recycle-after expects a positive number") and then
Global.exit(1); ensure you reference and validate ctx.test_options.parallel and
ctx.test_options.isolate_recycle_after after std.fmt.parseInt and before
assignment so zero is explicitly rejected.
In `@src/cli/test/ParallelRunner.zig`:
- Around line 883-931: buildWorkerArgv currently only forwards bare "--coverage"
and "--concurrent", so worker processes use defaults for coverage and
concurrency sub-options (e.g., coverage.ignore_patterns,
coverage.ignore_sourcemap, coverage.skip_test_files, concurrent_test_glob);
update buildWorkerArgv to append the full set of worker-affecting options from
ctx.test_options: for opts.coverage fields append flags/values for
ignore_patterns, ignore_sourcemap, skip_test_files, and any other coverage.*
fields that exist, and for concurrency append the concurrent_test_glob (and any
related concurrency settings) using printZ/dupeZ as done elsewhere; reference
buildWorkerArgv, opts.coverage, opts.concurrent, and the printZ/arena.dupeZ
helpers to locate where to add these argv entries and ensure null terminator
remains intact.
- Around line 584-597: The worker temp dir creation failure currently just logs
an error and continues, leaving worker_tmpdir null while the coordinator JUnit
reporter is still disabled; change the error branch of std.fs.cwd().makePath to
abort parallel-artifact mode instead of continuing—either return the error up
the call (propagate/try) or set/clear a flag that disables parallel artifact
behavior, ensure you do not deinit reporter.reporters.junit when dir creation
fails, and avoid inserting BUN_TEST_WORKER_TMP into vm.transpiler.env.map;
update handling around worker_tmpdir, vm.transpiler.env.map.put, and
reporter.reporters.junit so that failing to create the temp dir prevents
entering the parallel JUnit/coverage path and preserves the coordinator reporter
state.
- Around line 1006-1028: When sending the "file_done" message from the worker
(the block that constructs Runner, calls vm.runWithAPILock(Runner, &runner,
Runner.begin), computes after = reporter.summary() and builds msg then
writeIpcLine(msg)), also flush that file's JUnit/LCOV fragments immediately
instead of waiting for workerFlushAggregates() on recycle/exit; specifically
invoke the same logic that emits "junit_file"/"coverage_file" (or call
workerFlushAggregates() scoped to just the current file's aggregates) right
after computing after and before writeIpcLine(msg) so completed files’ artifacts
are sent per-file and not lost if the worker crashes.
- Around line 297-301: The code in Worker.dispatch sets this.inflight before
calling std.fmt.bufPrint which can fail and leave inflight set while no message
was sent; change the flow so you only set this.inflight after successfully
formatting and sending the message: call std.fmt.bufPrint(...) and check its
result, then call this.send(line) and only if both succeed assign this.inflight
= file_idx (or, alternatively, on any formatting/send error reset this.inflight
back to an invalid sentinel); update the dispatch function (references:
dispatch, this.inflight, std.fmt.bufPrint, this.send) so no path leaves inflight
set when no run message was delivered.
In `@test/cli/test/isolation.test.ts`:
- Around line 184-185: Hoist the dynamic imports of node:net and node:fs out of
the test body into module scope: replace the in-test await import("node:net")
and await import("node:fs") with top-level imports (e.g., const net = await
import(...) converted to import/net and import/fs at module scope) so the test
file uses module-scope imports for net and fs; update references in this file
(test/cli/test/isolation.test.ts) to use those module-scope symbols instead of
performing dynamic imports inside the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c81fd742-5acf-4692-92b2-058d8a34d48b
📒 Files selected for processing (20)
packages/bun-usockets/src/context.cpackages/bun-usockets/src/libusockets.hsrc/bun.js/VirtualMachine.zigsrc/bun.js/api/Timer/TimerObjectInternals.zigsrc/bun.js/api/bun/process.zigsrc/bun.js/bindings/JSGlobalObject.zigsrc/bun.js/bindings/ZigGlobalObject.cppsrc/bun.js/node/node_fs_stat_watcher.zigsrc/bun.js/node/node_fs_watcher.zigsrc/bun.js/rare_data.zigsrc/bun.js/test/Execution.zigsrc/bun.js/test/bun_test.zigsrc/bun.js/test/jest.zigsrc/cli.zigsrc/cli/Arguments.zigsrc/cli/test/ParallelRunner.zigsrc/cli/test_command.zigsrc/deps/uws/SocketContext.zigtest/cli/test/isolation.test.tstest/cli/test/parallel.test.ts
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (7)
src/bun.js/rare_data.zig (1)
303-315:⚠️ Potential issue | 🟠 MajorDon't silently drop isolation watcher registration on allocation failure.
If either
append()fails here, that watcher is never tracked andcloseAllWatchersForIsolation()will miss it, so--isolatecan leak live watchers under memory pressure. Usebun.handleOom(...)instead ofcatch {}.As per coding guidelines "Use
bun.handleOom()to converterror.OutOfMemoryinto a crash without swallowing other errors."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/bun.js/rare_data.zig` around lines 303 - 315, The append() calls in addFSWatcherForIsolation and addStatWatcherForIsolation currently swallow allocation failures (catch {}), so watchers can be lost and leak; replace the empty catches with bun.handleOom(...) to convert error.OutOfMemory into a crash and not silently drop the watcher, e.g., call bun.handleOom(err, "adding isolation watcher") on the append failure for both addFSWatcherForIsolation and addStatWatcherForIsolation, ensuring closeAllWatchersForIsolation can reliably see all registered watchers.src/cli/test/ParallelRunner.zig (4)
883-931:⚠️ Potential issue | 🟠 MajorForward the execution-affecting coverage and concurrency sub-options to workers.
Only bare
--coverageand--concurrentare forwarded today. Workers fall back to defaults for options likecoverage.ignore_patterns,coverage.ignore_sourcemap,coverage.skip_test_files, andconcurrent_test_glob, so parallel runs can behave differently from serial.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/test/ParallelRunner.zig` around lines 883 - 931, buildWorkerArgv currently only forwards bare "--coverage" and "--concurrent", so worker processes miss sub-options (coverage.ignore_patterns, coverage.ignore_sourcemap, coverage.skip_test_files, and concurrent_test_glob) and fall back to defaults; update buildWorkerArgv to check those fields on ctx.test_options (e.g., opts.coverage.ignore_patterns, opts.coverage.ignore_sourcemap, opts.coverage.skip_test_files, opts.concurrent_test_glob) and append the corresponding CLI flags to argv (using arena.dupeZ for string values and printZ for numeric values) just like the existing coverage/concurrent handling so workers receive the same execution-affecting options.
297-301:⚠️ Potential issue | 🟠 MajorSet
inflightonly after theruncommand is actually queued.A long path or send failure can return before any message reaches the worker while leaving
this.inflightset, and the coordinator will wait forever on a job that never started.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/test/ParallelRunner.zig` around lines 297 - 301, The worker currently sets this.inflight before queuing the run command in dispatch, which can leave inflight set if bufPrint or send fail; change dispatch so that you only assign this.inflight after the command has been successfully queued/sent (i.e., call std.fmt.bufPrint and then this.send and check for success), or if send can error, perform the assignment after a successful send return and ensure any send failure leaves this.inflight unchanged or reset; update the dispatch function to move the inflight assignment below the successful send path and handle send errors accordingly.
1016-1034:⚠️ Potential issue | 🟠 MajorFlush JUnit/LCOV fragments per completed file, not only on recycle or exit.
The coordinator commits each file as soon as it receives
file_done, but workers only emit artifact fragments fromworkerFlushAggregates(). If a worker crashes after finishing earlier files, only the in-flight file is retried and the already-completed files' artifacts disappear from the merged reports.Also applies to: 1043-1083
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/test/ParallelRunner.zig` around lines 1016 - 1034, The code currently only calls workerFlushAggregates(reporter, vm, ctx, worker_tmp) when will_recycle or on exit, so artifact fragments for completed files are not emitted per file and are lost if the worker crashes; move or add a call to workerFlushAggregates so it runs for every completed file before the coordinator is sent the "file_done" message (i.e., invoke workerFlushAggregates just prior to writeIpcLine(msg) in the block that builds the file_done message and replicate the same change in the analogous block around lines 1043-1083), keeping the same arguments and preserving existing ordering of reporter.summary(), msg construction, and writeIpcLine calls.
584-597:⚠️ Potential issue | 🟠 MajorAbort parallel artifact mode if the worker temp dir can't be created.
This branch logs and continues, but Line 594 still tears down the coordinator JUnit reporter and the serial coverage path is later skipped. The run can therefore succeed without producing the requested JUnit/coverage outputs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/test/ParallelRunner.zig` around lines 584 - 597, When creating the worker temp directory (the result stored in dir) fails, the current code logs the error and continues which leads to teardown of the coordinator JUnit reporter (reporter.reporters.junit / jr.deinit()) and skipping of serial coverage output; change the failure path so it aborts the parallel-artifact run instead of continuing: after std.fs.cwd().makePath(dir) returns error, log the error via Output.err, free or discard dir as needed, and then return or propagate an error (or set a failure flag) from the ParallelRunner routine so the caller stops the parallel mode; do not deinit reporter.reporters.junit or proceed with worker_tmpdir/coverage setup when makePath fails (i.e., only set worker_tmpdir and vm.transpiler.env.map.put and only deinit jr when makePath succeeded).test/cli/test/parallel.test.ts (1)
121-136:⚠️ Potential issue | 🟡 MinorThe regex doesn't verify
>1as the test name claims.The pattern
/--parallel: \d+ workers/matches1 workersas well as higher counts. If the default ever regresses to 1, this test would still pass.Consider parsing the worker count and asserting it matches expectation based on available CPUs, or at minimum ensuring the regex captures a number greater than 1.
🔧 Suggested fix
- expect(stderr).toMatch(/--parallel: \d+ workers, 2 files/); + const m = stderr.match(/--parallel: (\d+) workers, 2 files/); + expect(m).not.toBeNull(); + const workerCount = Number(m![1]); + // On single-CPU CI, parallel defaults to 1; otherwise verify >1 + const cpuCount = (await import("node:os")).cpus().length; + if (cpuCount > 1) { + expect(workerCount).toBeGreaterThan(1); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli/test/parallel.test.ts` around lines 121 - 136, The test's regex allows "1 workers" so update the assertion to parse the worker count from stderr and assert it's greater than 1: take the stderr string (variable stderr), match it against a regex with a capture for the worker number (e.g. /--parallel: (\d+) workers, 2 files/), convert the capture to a number and replace the current expect(stderr).toMatch(...) with an assertion that Number(capture) > 1 (e.g. expect(workers).toBeGreaterThan(1)) to ensure the default is truly >1.test/cli/test/isolation.test.ts (1)
150-207: 🧹 Nitpick | 🔵 TrivialHoist
node:netandnode:fsimports to module scope.The dynamic imports at lines 184-185 are not testing dynamic import behavior—they're used to set up a TCP server for the isolation test. As per coding guidelines, these should be module-scope imports.
🔧 Suggested fix
Add at file top (after line 2):
+import net from "node:net"; +import fs from "node:fs";Then update lines 184-186:
- const net = await import("node:net"); - const fs = await import("node:fs"); - const server = net.createServer(sock => { + const server = net.createServer(sock => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli/test/isolation.test.ts` around lines 150 - 207, The test currently uses dynamic imports (await import("node:net") and await import("node:fs")) inside the test body; hoist these to module scope by adding top-level imports: import net from "node:net" and import fs from "node:fs", then remove the await import lines and any local "const net" / "const fs" declarations so the existing references (server, net.createServer, fs.writeFileSync, fs.existsSync) use the module-scope bindings; ensure no other code relies on the dynamic import timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cli/test_command.zig`:
- Around line 1509-1513: Before the early worker-mode return, initialize the
default timezone to UTC so worker processes use the same "Etc/UTC" setup as
serial runs: detect ctx.test_options.test_worker and, prior to calling
ParallelRunner.runAsWorker(reporter, vm, ctx), perform the same timezone
initialization used elsewhere in this module (the "Etc/UTC" setup) so workers
inherit UTC even when TZ is not set in the environment.
- Around line 1688-1692: The conditional that skips the serial coverage path
when ctx.test_options.parallel > 0 causes ParallelRunner.runAsCoordinator(...)
(which may fall back to calling runAllTests(...) for single-file runs) to avoid
running the final coverage logic; change the branch so that after calling
ParallelRunner.runAsCoordinator(reporter, vm, test_files, ctx,
&coverage_options) you detect its fallback-to-serial result (or return
value/flag) and, when it indicates the coordinator delegated to serial
execution, explicitly invoke the same final-coverage path used by
runAllTests(reporter, vm, test_files, ctx.allocator) (or call runAllTests
directly with the same arguments) and ensure coverage_options
finalization/collection still runs so bun test --parallel --coverage <single
file> produces the coverage report.
In `@src/cli/test/ParallelRunner.zig`:
- Around line 1068-1081: The current logic unconditionally emits IPC lines
"junit_file" and "coverage_file" even if writing the files failed; change it so
that you only call writeIpcLine(...) after successful writes: for JUnit, call
junit.writeToFile(path) and if it returns success then build the line (using
line.clearRetainingCapacity(); line.writer(...).print(...)) and call
writeIpcLine(line.items), otherwise handle/propagate the error (or at least
avoid emitting the path and free the allocated path); similarly for coverage,
only call reporter.writeLcovOnly(vm, &ctx.test_options.coverage, path) and on
success emit the "coverage_file" IPC line, otherwise do not emit and handle the
failure. Ensure you reference and preserve the existing allocPrintSentinel path
handling and cleanup logic around reporter.reporters.junit, junit.writeToFile,
reporter.writeLcovOnly, line.writer, and writeIpcLine.
- Around line 528-542: In the w.start() catch block (inside ParallelRunner;
symbols: w.start(), this.next_file, this.files.len, retry_idx, this.files_done,
this.reporter.summary().fail/files, drive()), detect if work remains
(this.next_file < this.files.len or retry_idx != null) and immediately abort the
run instead of returning silently: set this.bailed = true, increment
reporter.summary().fail and reporter.summary().files by the count of remaining
files (this.files.len - this.files_done), set this.files_done = this.files.len,
and then return; this ensures drive() won't spin forever with no live workers
after a respawn failure.
In `@test/cli/test/parallel.test.ts`:
- Around line 250-292: The test awaits proc.exited twice; capture the process
exit once and reuse it to avoid redundant awaits: after the stderr loop await
proc.exited and store the result in a variable (e.g., const exitCode = await
proc.exited) then replace subsequent uses of await proc.exited with that
variable when asserting the exit code; update any references around the
assertions that currently call proc.exited directly to use the captured exitCode
instead.
---
Duplicate comments:
In `@src/bun.js/rare_data.zig`:
- Around line 303-315: The append() calls in addFSWatcherForIsolation and
addStatWatcherForIsolation currently swallow allocation failures (catch {}), so
watchers can be lost and leak; replace the empty catches with bun.handleOom(...)
to convert error.OutOfMemory into a crash and not silently drop the watcher,
e.g., call bun.handleOom(err, "adding isolation watcher") on the append failure
for both addFSWatcherForIsolation and addStatWatcherForIsolation, ensuring
closeAllWatchersForIsolation can reliably see all registered watchers.
In `@src/cli/test/ParallelRunner.zig`:
- Around line 883-931: buildWorkerArgv currently only forwards bare "--coverage"
and "--concurrent", so worker processes miss sub-options
(coverage.ignore_patterns, coverage.ignore_sourcemap, coverage.skip_test_files,
and concurrent_test_glob) and fall back to defaults; update buildWorkerArgv to
check those fields on ctx.test_options (e.g., opts.coverage.ignore_patterns,
opts.coverage.ignore_sourcemap, opts.coverage.skip_test_files,
opts.concurrent_test_glob) and append the corresponding CLI flags to argv (using
arena.dupeZ for string values and printZ for numeric values) just like the
existing coverage/concurrent handling so workers receive the same
execution-affecting options.
- Around line 297-301: The worker currently sets this.inflight before queuing
the run command in dispatch, which can leave inflight set if bufPrint or send
fail; change dispatch so that you only assign this.inflight after the command
has been successfully queued/sent (i.e., call std.fmt.bufPrint and then
this.send and check for success), or if send can error, perform the assignment
after a successful send return and ensure any send failure leaves this.inflight
unchanged or reset; update the dispatch function to move the inflight assignment
below the successful send path and handle send errors accordingly.
- Around line 1016-1034: The code currently only calls
workerFlushAggregates(reporter, vm, ctx, worker_tmp) when will_recycle or on
exit, so artifact fragments for completed files are not emitted per file and are
lost if the worker crashes; move or add a call to workerFlushAggregates so it
runs for every completed file before the coordinator is sent the "file_done"
message (i.e., invoke workerFlushAggregates just prior to writeIpcLine(msg) in
the block that builds the file_done message and replicate the same change in the
analogous block around lines 1043-1083), keeping the same arguments and
preserving existing ordering of reporter.summary(), msg construction, and
writeIpcLine calls.
- Around line 584-597: When creating the worker temp directory (the result
stored in dir) fails, the current code logs the error and continues which leads
to teardown of the coordinator JUnit reporter (reporter.reporters.junit /
jr.deinit()) and skipping of serial coverage output; change the failure path so
it aborts the parallel-artifact run instead of continuing: after
std.fs.cwd().makePath(dir) returns error, log the error via Output.err, free or
discard dir as needed, and then return or propagate an error (or set a failure
flag) from the ParallelRunner routine so the caller stops the parallel mode; do
not deinit reporter.reporters.junit or proceed with worker_tmpdir/coverage setup
when makePath fails (i.e., only set worker_tmpdir and vm.transpiler.env.map.put
and only deinit jr when makePath succeeded).
In `@test/cli/test/isolation.test.ts`:
- Around line 150-207: The test currently uses dynamic imports (await
import("node:net") and await import("node:fs")) inside the test body; hoist
these to module scope by adding top-level imports: import net from "node:net"
and import fs from "node:fs", then remove the await import lines and any local
"const net" / "const fs" declarations so the existing references (server,
net.createServer, fs.writeFileSync, fs.existsSync) use the module-scope
bindings; ensure no other code relies on the dynamic import timing.
In `@test/cli/test/parallel.test.ts`:
- Around line 121-136: The test's regex allows "1 workers" so update the
assertion to parse the worker count from stderr and assert it's greater than 1:
take the stderr string (variable stderr), match it against a regex with a
capture for the worker number (e.g. /--parallel: (\d+) workers, 2 files/),
convert the capture to a number and replace the current
expect(stderr).toMatch(...) with an assertion that Number(capture) > 1 (e.g.
expect(workers).toBeGreaterThan(1)) to ensure the default is truly >1.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 47c9bc78-58f5-4607-b97c-a680659141fe
📒 Files selected for processing (5)
src/bun.js/rare_data.zigsrc/cli/test/ParallelRunner.zigsrc/cli/test_command.zigtest/cli/test/isolation.test.tstest/cli/test/parallel.test.ts
…lel-test # Conflicts: # test/js/bun/perf/static-initializers.test.ts
Pre-existing on main: an on_data callback can re-enter the loop (expect().toThrow() -> waitForPromise -> us_loop_run_bun_tick), and the inner tick's loop_post would free the very socket the outer tick's dispatch is mid-iteration on (it reads s->flags right after on_data returns). Track tick_depth and skip the free in nested loop_post; the outermost tick's loop_post drains the closed list. Reproduced on origin/main with test/bake/deinitialization.test.ts under ASAN; fixed by this change.
…double-free Frame.str() now truncates strings that would push the encoded frame past max_payload (64MB), with a marker. Without this, a test producing a huge result line (e.g. a 70M-char test name) would cause the receiver's Channel.ingest to drop the channel and mark the file as crashed. adoptPipe() no longer closeAndDestroy()s on readStart failure — it returns false and the caller frees. backend.pipe is only set after readStart succeeds, so Worker.start()'s errdefer correctly distinguishes 'adoptPipe never called' from 'adoptPipe failed'.
…call prev captured a *JSInternalPromise, then set(null) removed its only GC root, then callOverriddenRunMain ran JS (can GC). prev was used after. Dropped prev — set(null) before the call means get()==null afterward iff the override didn't store its own promise.
Drops the owner: *Owner field so Channel default-inits as .{} and
consumers no longer need = undefined placeholders. read_chunk stays
zero-initialized to satisfy ban-words; the cost is one 16KB memset on
the Windows path.
Within-file test shuffle previously used a single PRNG whose state carried across files. Under --parallel, file-to-worker assignment depends on timing/stealing, so a file's test order depended on what files preceded it on that worker — replaying the printed --seed=N didn't reproduce. Seed each file's Order PRNG from bun.hash(file_path) +% global_seed so within-file order is a function of (seed, path) only.
The coordinator passed the extra fd as .buffer (uv_pipe_init ipc=0) but
that path has only ever been exercised unidirectionally for stdout/stderr
capture. Switch to .ipc for the parent stdio container and ipc=1 in the
child's uv_pipe_init+open, which is byte-for-byte the dance ipc.zig
windowsConfigureClient/Server use for Bun.spawn({ipc}) / process.send().
Both ends agreeing on libuv's IPC framing means our [u32 len][u8 kind]
frames ride inside it transparently; the previous (ipc=0, ipc=0) pair
left --parallel hanging on Windows CI with no frames ever delivered.
… timeout for slower x64 macOS
ipc.zig unrefs its pipe but its loop is driven by user JS that keeps other handles alive. Our worker (and the coordinator before any process exit handler is registered) has nothing else ref'd, so unref()ing the channel pipe makes uv_loop_alive() false and autoTick() takes the tickWithoutIdle (uv_run NOWAIT) path instead of blocking for the peer's first frame. Both sides exit explicitly (Global.exit / drive() return), so the extra ref never holds the process open. Also surface uv_pipe_init/open/readStart errors via Output.debugWarn so if this isn't the hang the next CI run shows where adopt fails.
…, sendWindows error) - ModuleLoader.cpp fetchESMSourceCode: don't serve cached Program-type providers when hasOverriddenModuleWrapper is set in this global; mirror the guard already in fetchCommonJSModule. - parallel/Worker.zig start() errdefer: reset ipc/out/err to fresh values after deinit so reapWorker's !respawned cleanup doesn't deinit on the undefined memory ArrayListUnmanaged.deinit leaves behind. - parallel/Channel.zig deinit(): reset buffers to .empty after free (idempotent). - parallel/Channel.zig sendWindows(): markDone() on non-EAGAIN tryWrite error instead of silently dropping. uv_try_write on UV_NAMED_PIPE Windows always returns EAGAIN (vendor/libuv/src/win/stream.c:185), so the path was already correct in practice — this hardens EBADF/EPIPE.
The jsc.Strong wrapper made every entry-point promise a permanent GC root, transitively rooting the module graph. bundler_defer.test.ts' FinalizationRegistry callbacks never fired (Windows CI, 3× fail). The original raw-pointer + protect()/is_protected design only rooted on the two preload-early-return paths where the caller can't await immediately. reload() now unprotects before reassigning, fixing the one leak the Strong refactor was meant to address.
CI status — build #46067 on
|
There was a problem hiding this comment.
All reported bugs have been addressed. Given the scope of this PR — 64 files, new GC-visible globals, IPC process management, and event-loop changes — I'd recommend a final human read before merge.
Extended reasoning...
Overview
This PR adds two new bun test flags: --isolate (fresh ZigGlobalObject per file on the same JSC VM) and --parallel[=N] (multi-process coordinator/worker pool). The changes span 64 files across Zig/C++/TypeScript, touching the JS VM, GC roots, IPC channels, process spawning, uWS event loop, module caching, timer generation guards, and test reporters.
Review round summary
The multi-agent bug-hunting system filed 142 inline comments over several rounds. Every single one is now marked resolved=true, with Jarred-Sumner committing fixes in roughly 15 distinct commits. Spot-checks of the most critical fixes confirm they are correctly in place:
ingest()now guardsif (self.done) returnat line 254 of Channel.zigmergeJUnitFragmentsnow switches on thewriteAllresult (lines 79-82 of aggregate.zig)reload()unprotects the oldpending_internal_promisebefore clearing (lines 767-770 of VirtualMachine.zig)swapGlobalForTestIsolationskipstest_parallel_ipc_contextalongside the spawn/process IPC contexts (line 2427)buildWorkerArgvnow forwards--define,--loader, and--tsconfig-overrideto workers- Frame kind bytes are validated with
std.meta.intToEnum(...) catchinstead of@enumFromInt spawned_countincremented only after a successfulstart()call- Orphaned
pending_retryslots handled inonWorkerExit - BUN_TEST_PARALLEL_SCALE_MS timespec correctly computed with
@divTrunc/@mod
Security risks
No authentication, permissions, or crypto paths are touched. The main risks are:
- Process escalation surface: the coordinator spawns child workers with
new_process_group=trueandlinux_pdeathsig=SIGTERM; the signal cleanup path looks correct - IPC frame parsing: the 64 MB max-payload guard and enum validation are now in place
- No injection or data-exposure risks identified
Level of scrutiny
This is production-critical runtime infrastructure — not sandbox code. The GC root management (protect/unprotect around pending_internal_promise, IsolatedModuleCache lifetime), the uWS tick_depth re-entrancy fix, and the module cache coherence under hot-reload + isolation all require careful correctness reasoning that goes beyond mechanical bug fixes. The PR has been reviewed exhaustively by the automated system, but a human familiar with the JSC/uWS internals should do a final pass before merge.
Other factors
The new test files (test/cli/test/isolation.test.ts, test/cli/test/parallel.test.ts) provide good coverage of the happy paths and several edge cases. The bench harness is a nice addition. The commit history shows responsive and thorough iteration by the author in response to every reported issue.
Summary
Adds two flags to
bun test:--isolateruns each test file in a freshZigGlobalObjecton the sameJSC::VM. Between files the runner drains microtasks, walks usockets contexts to close all sockets, closesFSWatcher/StatWatcher, cancels timers, kills subprocesses, bumps a generation counter (whichsetTimeout/setInterval/AbortSignal.timeoutcheck before firing), unprotects the old global, and creates a new one viaZig__GlobalObject__createForTestIsolation.--preloadre-executes in each fresh global. A VM-levelpath → SourceProvidercache (IsolatedModuleCache.{h,cpp}) means shared dependencies are transpiled once per process;module_infois generated at runtime so cached providers skip the analyze parse viaBun__analyzeTranspiledModule.delete require.cache[key]evicts.--parallel[=N]runs a coordinator with up to N worker processes (default = CPU count, configurable via--parallel-delay=MSfor the lazy-spawn threshold). Files are sorted lexicographically and partitioned into K contiguousFileRanges for cache locality; idle workersstealBackHalf()from the largest remaining range. Workers run with--isolatebetween files. The version banner showsN× PARALLEL.IPC is a single duplex channel (
parallel/Channel.zig): usockets-adopted socketpair on POSIX, libuvuv_pipe_t(ipc=1) on Windows — same dance asprocess.send(). Length-prefixed binary frames (parallel/Frame.zig), event-loop-driven both directions, send-side truncates oversized payloads instead of triggering channel close. Dead workers re-queue their in-flight file once. Cross-worker--bailstops dispatching at file granularity. Snapshot writes flush per worker. JUnit and LCOV coverage merge across workers; coverage thresholds enforced regardless of reporter; crashed files get a synthetic<testsuite>so the merged XML stays schema-valid. Per-file--randomizeshuffle is seeded byhash(basename, global_seed)so the printed seed reproduces order regardless of which worker ran which file.All transpiler/resolver flags (
--define,--loader,--tsconfig-override,--conditions,--drop,--jsx-*,--no-addons,--no-macros,--no-env-file,--env-file,--feature,--preserve-symlinks, etc.) are forwarded to workers.Coordinator SIGINT/SIGTERM kills the worker process group silently. Linux:
prctl(PR_SET_PDEATHSIG)between vfork/exec inbun-spawn.cpp(newlinux_pdeathsigSpawnOption). Windows:KILL_ON_JOB_CLOSEJob Object (recursive). macOS: process-group kill covers Ctrl-C; SIGKILL leaves grandchildren until stdin EOF.Output is identical to serial: per-test results stream as they complete, with each test's
console.log/console.errorbuffered and flushed atomically before its result line so files never interleave mid-test.--dotsworks.Benchmarks (release, M-series macOS)
--parallel(32 files × 4 tests, ~100ms/file)--isolatecache (30 files × 2.5 MB shared module)--isolatecache on vsBUN_FEATURE_FLAG_DISABLE_ISOLATION_SOURCE_CACHE=1Benchmarks live in
bench/test/; the isolate-cache suite fails with a clear box error if run without--isolateso you can't accidentally benchmark the wrong thing.Key entry points
src/bun.js/bindings/ZigGlobalObject.cpp—Zig__GlobalObject__createForTestIsolationsrc/bun.js/bindings/IsolatedModuleCache.{h,cpp}— singlecanUse()/isTagCacheable()/lookup()/insert()(withASSERT(isNewEntry))/evict()/clear()src/bun.js/bindings/ModuleLoader.cpp,JSCommonJSModule.cpp,BunAnalyzeTranspiledModule.cpp— cache integration;module_infonow generated at runtime (was--bytecode-only)src/bun.js/VirtualMachine.zig—swapGlobalForTestIsolation(),pending_internal_promiseasjsc.Strongsrc/cli/test/parallel/{Coordinator,Worker,Channel,Frame,FileRange,runner,aggregate}.zigsrc/bun.js/bindings/bun-spawn.cpp,src/bun.js/api/bun/process.zig—linux_pdeathsig,new_process_grouppackages/bun-usockets/src/loop.c,epoll_kqueue.c—tick_depthto deferfree_closed_socketsto outermost tick (pre-existing reentrancy UAF, exposed byexpect().toThrow()→waitForPromise)Also fixed in passing
JSC__JSInternalPromise__resolvedPromisewas a type lie (called baseJSPromise::resolvedPromise, reinterpret-cast result);jsDynamicCastexposed itimport * as ns; export { ns }was recorded as Local export instead of Namespace re-export inmodule_infoZig::SourceProviderheld a rawm_globalObjectpointer that dangled when the provider outlived its creating globalKnown limits
HTTPThreadwith its own uws loop and isn't drained on swap; bounded at ~128 fds and JS-invisible.% Funcscoverage under--paralleltakes the per-worker max instead of a true union, because the existing LCOV writer doesn't emitFN/FNDArecords. Line coverage is exact.SourceProvidercache covers ESM and CommonJS via bothrequire()andimport; pluginonLoadresults are not cached (output may vary).Test plan
bun bd test test/cli/test/isolation.test.ts test/cli/test/parallel.test.ts— 39 passbun bd test --parallel=4 test/js/bun/util/— 49 files, ~45sbun run zig:check-all— all targets