
Commit 7f1cd40

Optimisations for executing on clusters using dask.distributed - documented remaining warnings.

1 parent 8614fb1

1 file changed: 12 additions & 0 deletions

File tree

src/pyscenic/prune.py

@@ -261,6 +261,18 @@ def wrap(data):
         # of recovery curves - 20K features (max. enriched) * rank_threshold * 8 bytes (float) * num_cores),
         # this might not be a sound idea to do.

+        # NOTE ON REMAINING WARNINGS:
+        # >> distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.
+        # >> Perhaps some other process is leaking memory? Process memory: 1.51 GB -- Worker memory limit: 2.15 GB
+        # My current idea is that this cannot be avoided: processing a single module can sometimes require a
+        # substantial amount of memory because of the pre-allocation of recovery curves (see code notes on how to
+        # mitigate this problem). Setting module_chunksize=1 also limits this problem.
+        #
+        # >> distributed.utils_perf - WARNING - full garbage collections took 10% CPU time recently (threshold: 10%)
+        # The current implementation of module2df releases substantial amounts of memory (i.e. the RCCs), so this
+        # might again be unavoidable. TBI + see the following Stack Overflow question:
+        # https://stackoverflow.com/questions/47776936/why-is-a-computation-much-slower-within-a-dask-distributed-worker
         return aggregate_func(
             (delayed(transform_func)
              (db, gs_chunk, delayed_or_future_annotations)
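The memory formula mentioned in the context lines above (20K features * rank_threshold * 8 bytes * num_cores) can be made concrete with a quick back-of-the-envelope estimate. The values chosen for `rank_threshold` and `num_cores` below are illustrative assumptions, not taken from this commit:

```python
# Sketch: estimate of recovery-curve pre-allocation, per the formula in
# the diff's comment. rank_threshold and num_cores are assumed values.
max_enriched_features = 20_000   # 20K features (max. enriched)
rank_threshold = 5_000           # assumption: a typical rank_threshold
bytes_per_float = 8              # float64
num_cores = 4                    # assumption

bytes_total = max_enriched_features * rank_threshold * bytes_per_float * num_cores
print(f"{bytes_total / 1024**3:.1f} GiB")  # ≈ 3.0 GiB
```

An estimate like this explains why a worker with a ~2.15 GB memory limit (as in the warning quoted above) can trip the high-memory warning on a single module, and why shrinking the per-task footprint, e.g. with module_chunksize=1, helps.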
