
perf: optimize gs_design_ahr (~3-4x speedup)#623

Open
yihui wants to merge 13 commits into main from perf/optimize-gs-design-ahr

Conversation

@yihui
Collaborator

@yihui yihui commented May 5, 2026

Summary

  • Optimize gs_design_ahr() and its dependency chain for 3-4x speedup across all input variants
  • Replace expensive object.size() calls in cache pruning with O(1) numhash() check
  • Replace dplyr operations (tibble, mutate, full_join, select, arrange, filter) with base R equivalents in hot-path functions (gs_power_npe, gs_design_npe, gs_design_ahr, expected_time, gs_info_ahr)
  • Rewrite expected_event() internals using pure vector arithmetic instead of data.frame/merge/order operations (8x speedup for this function alone)

Benchmark results (20 calls each, after warm-up)

| Scenario | Before | After | Speedup |
|---|---|---|---|
| Default (single analysis) | 0.985s | 0.327s | 3.0x |
| Multiple analysis_time (3 analyses) | 2.707s | 0.997s | 2.7x |
| Info_frac driven | 2.847s | 1.157s | 2.5x |
| Info_frac + analysis_time | 3.790s | 1.318s | 2.9x |
| 2-sided symmetric (O'Brien-Fleming) | 3.059s | 0.910s | 3.4x |
| gs_b lower (no futility bound) | 3.399s | 0.773s | 4.4x |

Key changes by commit

  1. prune_hash: Replace object.size() (walks entire hash, ~2ms/call) with numhash() (O(1)) for the frequent check; clear when entry count > 100
  2. gs_power_npe output: Replace tibble() + mutate() + arrange() with data.frame() + base R sort (called 9+ times per design via gs_design_npe's root-finding)
  3. gs_design_npe output: Replace full_join + select + rename + arrange with merge() + column subsetting
  4. gs_design_ahr output: Replace dplyr chain (mutate, full_join, select, arrange, filter) with base R equivalents
  5. Hot-path functions: Remove dplyr::select(), dplyr::mutate(), dplyr::transmute() from expected_time, gs_info_ahr, and the info_frac loop in gs_design_ahr
  6. expected_event: Rewrite internals using vectors instead of data.frame/merge/order (8x speedup: 3.6ms to 0.43ms per call)
  7. Backward compatibility: Exported functions (gs_power_npe, gs_design_npe) still return tibbles via as_tibble() at the return boundary
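
The dplyr-to-base-R pattern in items 2-4 can be sketched as follows. This is an illustrative toy example, not the package's actual code: the column names and values are invented, but the structure mirrors the described change (`data.frame()` plus `order()` replacing `tibble()` + `mutate()` + `arrange()`, with upper bounds sorted before lower within each analysis).

```r
# Toy data standing in for per-analysis bound results (not real gsDesign2 output)
k <- c(1, 1, 2, 2)
b <- c("upper", "lower", "upper", "lower")
z <- c(2.1, -0.5, 1.8, 0.3)

# Base R equivalent of tibble() |> mutate(probability = ...) |> arrange(...)
out <- data.frame(analysis = k, bound = b, z = z)
out$probability <- pnorm(out$z, lower.tail = FALSE)
# analysis ascending, "upper" before "lower" within each analysis
out <- out[order(out$analysis, out$bound,
                 decreasing = c(FALSE, TRUE), method = "radix"), ]
```

The radix method is required when `decreasing` is a per-key vector; it also happens to be the fastest sort for small keys like these.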

Test plan

  • All 787 existing tests pass (0 failures, 28 pre-existing skips)
  • Numerical output verified identical to baseline for all design variants
  • Tested with: default args, multiple analysis_time, info_frac driven, info_frac + analysis_time, 2-sided symmetric, gs_b lower bound

🤖 Generated with Claude Code

Xie and others added 8 commits May 4, 2026 20:38
object.size() walks the entire hash table structure on every
cache_fun() call, taking ~2ms per invocation. Since cache_fun is
called 15+ times per gs_design_ahr run (via expected_time/ahr and
gs_power_npe), this adds up to significant overhead.

Replace with numhash() which returns the entry count in O(1), and
use clrhash() for a simple eviction strategy when the limit is
exceeded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
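
The count-based eviction idea can be sketched as below. Names and details are hypothetical (`make_cached` is not the package's `cache_fun`, and `length()` on an environment stands in for the `numhash()` call the commit describes); the point is that checking a binding count is cheap, while `object.size()` walks the whole structure.

```r
# Hypothetical memoization wrapper with count-based eviction
make_cached <- function(fun, max_entries = 100) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = " ")
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache))
    }
    # length() of an environment returns its binding count cheaply,
    # unlike object.size(), which traverses the entire hash table
    if (length(cache) >= max_entries) {
      rm(list = ls(envir = cache, all.names = TRUE), envir = cache)
    }
    val <- fun(...)
    assign(key, val, envir = cache)
    val
  }
}
```

Clearing the whole cache at the limit (as `clrhash()` does in the commit) trades cache warmth for a one-line eviction policy with a hard memory bound.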
gs_power_npe is called 9+ times per gs_design_ahr invocation (via
gs_design_npe's bracket search and uniroot). The tibble() + mutate()
+ arrange() output assembly accounted for ~39% of gs_power_npe time.

Replacing with data.frame() and base R ordering is 8x faster for
the output assembly step, yielding ~25% improvement in overall
gs_design_ahr runtime for multi-analysis designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_npe

The output assembly in gs_design_npe used full_join (to merge H0 and
H1 probabilities), select, rename, and arrange from dplyr. Since
gs_design_npe is called once per gs_design_ahr and these operations
are on small data frames (6 rows), base R merge() and column
subsetting are much faster.

Combined with the gs_power_npe change, this yields ~50% overall
improvement for multi-analysis designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
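
A minimal sketch of the `full_join`-to-`merge()` swap, with made-up column names (the real frames carry the H0/H1 crossing probabilities the commit mentions):

```r
# Toy per-analysis probability tables under H1 and H0 (invented values)
h1 <- data.frame(analysis = 1:3, probability  = c(0.20, 0.60, 0.90))
h0 <- data.frame(analysis = 1:3, probability0 = c(0.010, 0.020, 0.025))

# merge(all = TRUE) is base R's full join; column subsetting replaces
# select()/rename(), and order() replaces arrange()
out <- merge(h1, h0, by = "analysis", all = TRUE)
out <- out[order(out$analysis), c("analysis", "probability", "probability0")]
```

On a 6-row frame the asymptotics are irrelevant; the win is avoiding dplyr's per-call dispatch and tibble construction overhead.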
Replace mutate, full_join, select, arrange, and filter operations
in the output assembly section of gs_design_ahr with equivalent base
R operations (direct column assignment, merge, column subsetting,
order).

This eliminates the dplyr overhead for the final output formatting
which previously involved multiple tibble round-trips on small data
frames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace select(-n) with base R column removal, and replace
mutate/transmute in the info_frac loop of gs_design_ahr with
direct column assignment. These functions are called repeatedly
during uniroot iterations, so even small per-call savings add up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…frames

The original expected_event used data.frame(), merge(), and multiple
order() calls for computation on small interval tables. Profiling
showed expected_event accounted for 65% of pw_info time, and
data.frame overhead was 35% of expected_event time.

Rewrite using pure vector operations: compute the union of enrollment
and failure breakpoints directly, use stepfun2 for rate lookups, and
perform all arithmetic on plain numeric vectors. Only construct a
data.frame for the final output when simple=FALSE.

This yields an 8x speedup for expected_event (3.6ms -> 0.43ms per
call) and ~2.5x speedup for pw_info.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
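
The rate-lookup idea can be illustrated with `stats::stepfun` (the commit's `stepfun2` is an internal variant; breakpoints and rates below are invented). A piecewise-constant rate becomes a vectorized function call instead of a `merge()` over interval tables:

```r
# Hypothetical piecewise-constant failure rates: 0.10 on [0, 3),
# 0.05 on [3, 6), 0.02 on [6, Inf)
fail_breaks <- c(0, 3, 6)
fail_rate   <- c(0.10, 0.05, 0.02)

# Right-continuous step function over the interior breakpoints
rate_at <- stats::stepfun(fail_breaks[-1], fail_rate)

# Vectorized lookup: no data.frame, no merge, no order
rates <- rate_at(c(1, 4, 10))
```

Expected-event integrals then reduce to sums of `rate * exposure` products over plain numeric vectors, which is where the reported 8x comes from.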
Exported functions gs_power_npe and gs_design_npe must return tibbles
for backward compatibility. Add tibble::as_tibble() at the return
point to convert the base R data.frame used for fast internal
computation back to the expected output type.

Also fix row ordering in gs_design_npe to maintain upper-before-lower
within each analysis (matching the original arrange(analysis) with
upper-first convention).

Refine prune_hash to use a 100-entry limit per function, giving
predictable memory bounds (each entry is typically a few KB, so
~100KB per cached function).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
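
The boundary-conversion pattern is simple enough to sketch in a few lines; the function name and columns here are illustrative, not the package's actual signature:

```r
# Sketch: compute on a plain data.frame internally, convert to tibble
# exactly once at the exported return point
gs_power_sketch <- function(z = c(2.0, 1.9)) {
  out <- data.frame(analysis = seq_along(z), z = z)  # fast internal path
  out$probability <- pnorm(out$z, lower.tail = FALSE)
  tibble::as_tibble(out)  # backward-compatible return type
}
```

Callers relying on tibble semantics (e.g. no partial `$` matching, no row names) see no change, while the internal root-finding loop never pays tibble overhead.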
Profiling revealed that object.size() overcounts gs_power_npe cache
entries by ~600x (reports 1.8 MB per entry when true incremental cost
is ~3 KB). This is because object.size() walks into shared namespace
environments of function arguments, counting the same gsDesign2
namespace (833 KB) and gsDesign namespace (75 KB) for every entry.

Changes:
- Remove object.size() from the pruning path (both slow and inaccurate)
- Only check entry count before insertions, not on cache hits
- Set max_entries = 1024, justified by:
  - True cost: ~3 KB (gs_power_npe) to ~5 KB (ahr) per entry
  - 1024 entries ≈ 3-5 MB real memory
  - Supports ~200 cached designs in a session
  - A single design creates only 5-28 entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yihui
Collaborator Author

yihui commented May 5, 2026

R CMD check timing comparison (GHA R-CMD-check workflow)

Comparing the latest run on main (24492c62) vs this PR (838cd521).

ubuntu-latest (release)

| Step | main | PR | Speedup |
|---|---|---|---|
| checking examples | 31s/27s | 25s/21s | 1.3x |
| checking examples --run-donttest | 62s/55s | 51s/45s | 1.2x |
| Running testthat.R | 112s/99s | 88s/77s | 1.3x |
| Total R CMD check | 262s | 221s | 1.2x |

windows-latest (release)

| Step | main | PR | Speedup |
|---|---|---|---|
| checking examples | 34s | 28s | 1.2x |
| checking examples --run-donttest | 82s | 67s | 1.2x |
| Running testthat.R | 154s | 126s | 1.2x |

macos-latest (release)

| Step | main | PR | Speedup |
|---|---|---|---|
| checking examples | 22s/23s | 14s/15s | 1.6x |
| checking examples --run-donttest | 50s/51s | 35s/36s | 1.4x |
| checking tests (total) | 86s | 72s | 1.2x |

Notes

  • The GHA speedup (~1.2-1.4x) is more modest than local benchmarks (~3-4x) because R CMD check includes overhead (compilation, documentation checks, etc.) and the test suite exercises many functions beyond gs_design_ahr.
  • The examples show clear improvement because they directly call gs_design_ahr() with various arguments.
  • All platforms pass with Status: OK.

@yihui yihui requested review from LittleBeannie and jdblischak May 7, 2026 03:03
@yihui
Collaborator Author

yihui commented May 7, 2026

@jdblischak @LittleBeannie This PR is ready. Most commits should be straightforward to understand. The only one that's a little challenging is 85f2dd4 (that's because the original code was also not easy to digest).

