object.size() walks the entire hash table structure on every cache_fun() call, taking ~2ms per invocation. Since cache_fun is called 15+ times per gs_design_ahr run (via expected_time/ahr and gs_power_npe), this adds up to significant overhead. Replace with numhash() which returns the entry count in O(1), and use clrhash() for a simple eviction strategy when the limit is exceeded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
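A minimal sketch of this check, assuming the cache is a `utils::hashtab()` (R >= 4.2, which provides `numhash()`, `clrhash()`, and `sethash()`); the helper name and the 100-entry limit are illustrative:

```r
cache <- utils::hashtab()

cache_set <- function(key, value, max_entries = 100) {
  # numhash() reads the stored entry count in O(1); object.size()
  # would walk every bucket and entry on each call.
  if (utils::numhash(cache) >= max_entries) {
    utils::clrhash(cache)  # simple eviction: drop the whole cache
  }
  utils::sethash(cache, key, value)
}
```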
gs_power_npe is called 9+ times per gs_design_ahr invocation (via gs_design_npe's bracket search and uniroot). The tibble() + mutate() + arrange() output assembly accounted for ~39% of gs_power_npe time. Replacing with data.frame() and base R ordering is 8x faster for the output assembly step, yielding ~25% improvement in overall gs_design_ahr runtime for multi-analysis designs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
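The pattern, in a hedged sketch (column names and values are illustrative, not the package's actual output schema):

```r
# Before (dplyr): tibble(...) |> mutate(...) |> arrange(analysis, desc(z))
# After (base R): one data.frame() call, then order() with radix sort,
# which supports per-column decreasing=.
out <- data.frame(analysis = rep(1:2, each = 2), z = c(2.9, 1.9, 2.5, 1.7))
out <- out[order(out$analysis, out$z,
                 decreasing = c(FALSE, TRUE), method = "radix"), ]
```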
The output assembly in gs_design_npe used full_join (to merge H0 and H1 probabilities), select, rename, and arrange from dplyr. Since gs_design_npe is called once per gs_design_ahr and these operations are on small data frames (6 rows), base R merge() and column subsetting are much faster. Combined with the gs_power_npe change, this yields ~50% overall improvement for multi-analysis designs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
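A sketch of the base R replacement for the join step; the column names (`probability` for H1, `probability0` for H0) are assumptions for illustration:

```r
# full_join(h1, h0, by = ...) becomes merge(all = TRUE);
# select()/rename()/arrange() become subsetting with order().
h1 <- data.frame(analysis = 1:2, bound = "upper", probability  = c(0.2, 0.8))
h0 <- data.frame(analysis = 1:2, bound = "upper", probability0 = c(0.01, 0.025))
out <- merge(h1, h0, by = c("analysis", "bound"), all = TRUE)
out <- out[order(out$analysis),
           c("analysis", "bound", "probability", "probability0")]
```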
Replace mutate, full_join, select, arrange, and filter operations in the output assembly section of gs_design_ahr with equivalent base R operations (direct column assignment, merge, column subsetting, order). This eliminates the dplyr overhead for the final output formatting which previously involved multiple tibble round-trips on small data frames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
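Base R equivalents of the dplyr verbs named above, on an illustrative data frame (the columns are made up, not the package's real schema):

```r
df <- data.frame(analysis = c(2, 1, 2, 1),
                 bound = c("upper", "upper", "lower", "lower"),
                 z = c(2.0, 2.4, -0.5, 0.3))
df$nominal_p <- pnorm(-df$z)        # mutate(): direct column assignment
df <- df[df$bound == "upper", ]     # filter(): logical subsetting
df <- df[order(df$analysis),
         c("analysis", "z", "nominal_p")]  # arrange() + select()
```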
Replace select(-n) with base R column removal, and replace mutate/transmute in the info_frac loop of gs_design_ahr with direct column assignment. These functions are called repeatedly during uniroot iterations, so even small per-call savings add up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
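A small sketch of both replacements (the `info_frac` computation is an assumption, shown only to illustrate direct assignment):

```r
df <- data.frame(n = c(100, 200), info = c(10, 25))  # illustrative
df$n <- NULL                      # replaces dplyr::select(-n)
# Direct assignment replaces mutate()/transmute(); this avoids
# tidy-eval overhead on every uniroot() iteration.
df$info_frac <- df$info / max(df$info)
```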
The original expected_event used data.frame(), merge(), and multiple order() calls for computation on small interval tables. Profiling showed expected_event accounted for 65% of pw_info time, and data.frame overhead was 35% of expected_event time. Rewrite using pure vector operations: compute the union of enrollment and failure breakpoints directly, use stepfun2 for rate lookups, and perform all arithmetic on plain numeric vectors. Only construct a data.frame for the final output when simple=FALSE. This yields an 8x speedup for expected_event (3.6ms -> 0.43ms per call) and ~2.5x speedup for pw_info. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
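`stepfun2` is an internal helper; the sketch below uses `stats::stepfun` as a stand-in to show the shape of the vectorized approach. All breakpoints and rates here are invented for illustration:

```r
total_duration <- 6
enroll_breaks <- c(2, 4); enroll_rates <- c(10, 20, 5)  # pieces: [0,2), [2,4), [4,Inf)
fail_breaks   <- 3;       fail_rates   <- c(0.1, 0.05)

# Union of breakpoints, clipped to the study duration -- no merge() needed
breaks <- sort(unique(c(0, enroll_breaks, fail_breaks, total_duration)))
breaks <- breaks[breaks <= total_duration]
mid <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Step functions give vectorized rate lookups on plain numeric vectors
enroll_at <- stats::stepfun(enroll_breaks, enroll_rates)(mid)
fail_at   <- stats::stepfun(fail_breaks, fail_rates)(mid)
```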
Exported functions gs_power_npe and gs_design_npe must return tibbles for backward compatibility. Add tibble::as_tibble() at the return point to convert the base R data.frame used for fast internal computation back to the expected output type. Also fix row ordering in gs_design_npe to maintain upper-before-lower within each analysis (matching the original arrange(analysis) with upper-first convention). Refine prune_hash to use a 100-entry limit per function, giving predictable memory bounds (each entry is typically a few KB, so ~100KB per cached function). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
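The boundary-conversion pattern, sketched with a made-up function and columns:

```r
# Compute with a plain data.frame internally (fast assembly),
# convert exactly once at the exported boundary.
power_like <- function(k) {
  out <- data.frame(analysis = seq_len(k),
                    probability = cumsum(rep(0.01, k)))
  tibble::as_tibble(out)  # preserve the documented tibble return type
}
```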
Profiling revealed that object.size() overcounts gs_power_npe cache entries by ~600x (it reports 1.8 MB per entry when the true incremental cost is ~3 KB). This happens because object.size() walks into the shared namespace environments of function arguments, counting the same gsDesign2 namespace (833 KB) and gsDesign namespace (75 KB) for every entry. Changes:

- Remove object.size() from the pruning path (both slow and inaccurate)
- Only check the entry count before insertions, not on cache hits
- Set max_entries = 1024, justified by:
  - True cost: ~3 KB (gs_power_npe) to ~5 KB (ahr) per entry
  - 1024 entries ≈ 3-5 MB of real memory
  - Supports ~200 cached designs in a session
  - A single design creates only 5-28 entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
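One way to sanity-check the true incremental cost (as opposed to `object.size()`'s shared-environment overcount) is to difference `gc()` totals around a batch of insertions. A rough, hedged sketch; the cached value is a stand-in:

```r
h <- utils::hashtab()
before <- sum(gc(full = TRUE)[, 2])   # total Mb in use (Ncells + Vcells)
for (i in 1:1000) {
  utils::sethash(h, paste0("k", i), list(info = rnorm(100)))
}
after <- sum(gc(full = TRUE)[, 2])
(after - before) / 1000 * 1024        # approximate KB per entry
```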
R CMD check timing comparison (GHA)

| Step | main | PR | Speedup |
|---|---|---|---|
| checking examples | 31s/27s | 25s/21s | 1.3x |
| checking examples --run-donttest | 62s/55s | 51s/45s | 1.2x |
| Running testthat.R | 112s/99s | 88s/77s | 1.3x |
| Total R CMD check | 262s | 221s | 1.2x |
windows-latest (release)

| Step | main | PR | Speedup |
|---|---|---|---|
| checking examples | 34s | 28s | 1.2x |
| checking examples --run-donttest | 82s | 67s | 1.2x |
| Running testthat.R | 154s | 126s | 1.2x |
macos-latest (release)

| Step | main | PR | Speedup |
|---|---|---|---|
| checking examples | 22s/23s | 14s/15s | 1.6x |
| checking examples --run-donttest | 50s/51s | 35s/36s | 1.4x |
| checking tests (total) | 86s | 72s | 1.2x |
Notes
- The GHA speedup (~1.2-1.4x) is more modest than local benchmarks (~3-4x) because R CMD check includes overhead (compilation, documentation checks, etc.) and the test suite exercises many functions beyond `gs_design_ahr`.
- The examples show clear improvement because they directly call `gs_design_ahr()` with various arguments.
- All platforms pass with Status: OK.
@jdblischak @LittleBeannie This PR is ready. Most commits should be straightforward to understand. The only one that's a little challenging is 85f2dd4 (that's because the original code was also not easy to digest).
Summary
- Optimize `gs_design_ahr()` and its dependency chain for a 3-4x speedup across all input variants
- Replace `object.size()` calls in cache pruning with an O(1) `numhash()` check
- Convert dplyr output assembly to base R along the call chain (`gs_power_npe`, `gs_design_npe`, `gs_design_ahr`, `expected_time`, `gs_info_ahr`)
- Rewrite `expected_event()` internals using pure vector arithmetic instead of data.frame/merge/order operations (8x speedup for this function alone)

Benchmark results (20 calls each, after warm-up)
Key changes by commit
- Replace `object.size()` (walks the entire hash, ~2ms/call) with `numhash()` (O(1)) for the frequent check; clear the cache when the entry count exceeds 100
- Replace `tibble()` + `mutate()` + `arrange()` with `data.frame()` + base R sort (called 9+ times per design via `gs_design_npe`'s root-finding)
- Replace `full_join` + `select` + `rename` + `arrange` with `merge()` + column subsetting
- Remove `dplyr::select()`, `dplyr::mutate()`, and `dplyr::transmute()` from `expected_time`, `gs_info_ahr`, and the info_frac loop in `gs_design_ahr`
- Exported functions (`gs_power_npe`, `gs_design_npe`) still return tibbles via `as_tibble()` at the return boundary

Test plan
🤖 Generated with Claude Code