Commit 7d95dea

capccode, lookman-olowo, christyanamarie, ddhangdd, and Copilot authored
Dev/grasp full pipeline (#905)
* feat: migrate GRASP model from PyHealth 1.0 to 2.0 API

  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat: add GRASP mortality prediction notebook and fix cluster_num

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Restore code_mapping support in SequenceProcessor for PyHealth 2.0

  Adds optional code_mapping parameter to SequenceProcessor that maps granular medical codes to grouped vocabularies (e.g. ICD9CM→CCSCM) before building the embedding table. Resolves the functional gap from the 1.x→2.0 rewrite where code_mapping was removed. Ref #535

  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>

* Add RNN baseline and code_mapping comparison notebooks for MIMIC-III

  Two identical notebooks for A/B testing code_mapping impact on mortality prediction. Only difference is the schema override in Step 2. Both use seed=42 for reproducible splits.

  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>

* fix(tasks): extract NDC codes instead of drug names for prescription mapping

  event.drug returns drug names (e.g. "Aspirin") which produce zero matches in CrossMap NDC→ATC; event.ndc returns actual NDC codes enabling 3/3 feature mapping for mortality and readmission tasks.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* test(tasks): add tests verifying NDC extraction in drug tasks

  Checks that mortality and readmission task processors build vocabulary from NDC codes (numeric strings) rather than drug names (e.g. "Aspirin"), confirming the event.drug -> event.ndc fix works correctly.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(tasks): fix missed MortalityPredictionMIMIC4 event.drug and update docs

  - Fix event.drug -> event.ndc in MortalityPredictionMIMIC4 (line 282)
  - Update readmission task docstrings to reflect NDC extraction

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(tasks): fix DrugRecommendationMIMIC3 to extract NDC codes

  DrugRecommendationMIMIC3 used prescriptions/drug (drug names) via Polars column select; changed to prescriptions/ndc to match MIMIC-4 variant and enable NDC->ATC code mapping.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(models): guard RNNLayer and ConCare against zero-length sequences

  RNNLayer: clamp sequence lengths to min 1 so pack_padded_sequence does not crash on all-zero masks, matching TCNLayer (tcn.py:186).

  ConCare: guard covariance divisor with max(n-1, 1) to prevent ZeroDivisionError when attention produces single-element features.

  Both edge cases are triggered when code_mapping collapses vocabularies and some patients have all codes map to <unk>, producing all-zero embeddings and all-zero masks.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* docs: add docstrings to SequenceProcessor class and fit method

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* docs: add docstrings, type hints, and fix test dims for GRASP module

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat: add GRASP mortality prediction notebooks for baseline and code_mapping

  Baseline notebook runs GRASP with raw ICD-9/NDC codes. Code_mapping notebook collapses vocab via ICD9CM→CCSCM, ICD9PROC→CCSPROC, NDC→ATC for trainable embeddings on full MIMIC-III.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(models): guard ConCare and GRASP against batch_size=1 crashes

  - ConCare FinalAttentionQKV: bare .squeeze() removed batch dim when batch_size=1, causing IndexError in softmax. Use .squeeze(-1) and .squeeze(1) to target only the intended dimensions.
  - ConCare cov(): division by zero when x.size(1)==1. Guard with max().
  - GRASP grasp_encoder: remove stale torch.squeeze(hidden_t, 0) that collapsed [1, hidden] to [hidden] with batch_size=1. Both RNNLayer and ConCareLayer already return [batch, hidden].
  - GRASP random_init: clamp num_centers to num_points to prevent ValueError when cluster_num > batch_size.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat: add GRASP mortality prediction notebooks for baseline and code_mapping

  Baseline notebook runs GRASP with raw ICD-9/NDC codes. Code_mapping notebook collapses vocab via ICD9CM→CCSCM, ICD9PROC→CCSPROC, NDC→ATC for trainable embeddings on full MIMIC-III.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Add code_mapping as task __init__ argument

  Allow tasks to accept a code_mapping dict that upgrades input_schema entries so SequenceProcessor maps raw codes (e.g. ICD9CM) to grouped vocabularies (e.g. CCSCM) at fit/process time. This avoids manual schema manipulation after task construction.

  - Add code_mapping parameter to BaseTask.__init__()
  - Thread **kwargs + super().__init__() through all task subclasses with existing __init__ methods (4 readmission tasks, 1 multimodal mortality task)
  - Add 17 tests covering SequenceProcessor mapping and task-level code_mapping initialization

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Update code_mapping notebook to use task init argument

  Replace manual task.input_schema override with the new code_mapping parameter on MortalityPredictionMIMIC3().

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat(examples): add ConCare hyperparameter grid sweep script

  Mirrors the GRASP+ConCare mortality notebook pipeline exactly (same tables, split, seed, metrics) but sweeps 72 configurations of embedding_dim, hidden_dim, cluster_num, lr, and weight_decay. Results are logged to sweep_results.csv. Supports --root for pointing at local MIMIC-III, --code-mapping, --dev, and --monitor.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* chore(sweep): increase early stopping patience from 10 to 15 epochs

  Smaller ConCare configs (embedding_dim=8/16) may learn slower and need more epochs before plateauing.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Initial plan

* fix: filter falsy NDCs, guard None tokens in process(), fix NDC regex

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-authored-by: ddhangdd <43976109+ddhangdd@users.noreply.github.com>

* refactor(sweep): rename and generalize sweep script for all backbones

  Rename sweep_concare_grasp.py → sweep_grasp.py. Now supports --block GRU|ConCare|LSTM with per-backbone default grids, --resume for crash recovery, --grid JSON override, auto-dated output dirs (sweep/{BLOCK}_{YYYYMMDD}_{HHMMSS}_{mapping}/), and config.json saved alongside results for reproducibility.

  Co-Authored-By: Colton Loew <colton.loew@gmail.com>
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* test(sweep): add unit and integration tests for sweep_grasp utilities

  Covers grid building, combo hashing, CSV resume parsing, output directory naming, and end-to-end single-config runs for GRU and ConCare on synthetic data (13 tests, all passing).

  Co-Authored-By: Colton Loew <loewcx@illinois.edu>
  Co-Authored-By: lookman-olowo <lookman-olowo@github.com>
  Co-Authored-By: christiana-beard <christiana-beard@github.com>
  Co-Authored-By: ddhangdd <ddhangdd@github.com>

* docs(sweep): add tmux copy-paste instructions for each paper run

  Co-Authored-By: Colton Loew <loewcx@illinois.edu>
  Co-Authored-By: lookman-olowo <lookman-olowo@github.com>
  Co-Authored-By: christiana-beard <christiana-beard@github.com>
  Co-Authored-By: ddhangdd <ddhangdd@github.com>

* chore(examples): adds cleans examples, removes util script

* Delete tests/core/test_grasp.py

  we removed grasp script from examples, dropped test

* Revert "Delete tests/core/test_grasp.py"

  This reverts commit 0d95758.

* fix: remove orphaned sweep test, restore grasp tests

* feat(grasp): add static_key support for demographic features with tests

* fix(test): add valid NDC to test prescriptions so readmit test produces both labels

---------

Co-authored-by: lookman-olowo <lookmanolowo@hotmail.com>
Co-authored-by: christiana-beard <christyanamarie116@gmail.com>
Co-authored-by: ddhangdd <dfung2@wisc.edu>
Co-authored-by: Lookman Olowo <42081779+lookman-olowo@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ddhangdd <43976109+ddhangdd@users.noreply.github.com>
Co-authored-by: ddhangdd <desmondfung123@gmail.com>
Co-authored-by: Colton Loew <loewcx@illinois.edu>
Co-authored-by: lookman-olowo <lookman-olowo@github.com>
Co-authored-by: christiana-beard <christiana-beard@github.com>
Co-authored-by: ddhangdd <ddhangdd@github.com>
Co-authored-by: lookman-olowo <lookman-olowo@users.noreply.github.com>
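The commit message above describes three ideas that are easy to lose in prose: collapsing a granular code vocabulary through a mapping before building the embedding table, clamping the number of initial cluster centers to the number of available points, and guarding a covariance divisor against `n == 1`. The following is a minimal pure-Python sketch of those ideas, not the PyHealth implementation. Every name in it (`TOY_MAPPING`, `map_codes`, `build_vocab`, `pick_initial_centers`, `covariance_divisor`) is hypothetical, the codes are made-up placeholders rather than real ICD-9 or CCS entries, and sending unmapped codes to `<unk>` is an assumption based on the `<unk>` behaviour the message mentions.

```python
import random

# Hypothetical toy mapping from granular codes to grouped codes.
# These are placeholder strings, not real ICD9CM/CCSCM entries.
TOY_MAPPING = {
    "A1": ["GROUP_A"],
    "A2": ["GROUP_A"],
    "B1": ["GROUP_B"],
}


def map_codes(codes, mapping, unk="<unk>"):
    """Replace each granular code with its grouped code(s).

    Assumption: codes missing from the mapping become the unk token.
    """
    mapped = []
    for code in codes:
        mapped.extend(mapping.get(code, [unk]))
    return mapped


def build_vocab(sequences):
    """Assign an integer index to each distinct token, in first-seen order."""
    vocab = {}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab


def pick_initial_centers(points, num_centers, seed=0):
    """Sample initial cluster centers without replacement.

    random.sample raises ValueError when asked for more items than exist,
    so the requested count is clamped to the number of points.
    """
    k = min(num_centers, len(points))
    return random.Random(seed).sample(points, k)


def covariance_divisor(n):
    """Return n - 1, but never less than 1, so a single sample cannot divide by zero."""
    return max(n - 1, 1)


if __name__ == "__main__":
    raw = [["A1", "A2"], ["B1", "Z9"]]
    mapped = [map_codes(seq, TOY_MAPPING) for seq in raw]
    print(len(build_vocab(raw)), "raw tokens ->", len(build_vocab(mapped)), "mapped tokens")
    print(pick_initial_centers([10, 20, 30], num_centers=12))
    print(covariance_divisor(1))
```

In this toy run, a visit whose codes are all missing from the mapping ends up consisting only of `<unk>`, which is the situation the message cites as the trigger for the zero-length and single-element edge cases.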
1 parent 8c0f157 commit 7d95dea

17 files changed: +11161 -234 lines changed

examples/mortality_prediction/mortality_mimic3_grasp.py

Lines changed: 14 additions & 15 deletions
@@ -1,45 +1,44 @@
+import tempfile
+
 from pyhealth.datasets import MIMIC3Dataset
 from pyhealth.datasets import split_by_patient, get_dataloader
 from pyhealth.models import GRASP
-from pyhealth.tasks import mortality_prediction_mimic3_fn
+from pyhealth.tasks import MortalityPredictionMIMIC3
 from pyhealth.trainer import Trainer

 if __name__ == "__main__":
     # STEP 1: load data
     base_dataset = MIMIC3Dataset(
-        root="/srv/local/data/physionet.org/files/mimiciii/1.4",
+        root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III",
         tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
+        cache_dir=tempfile.TemporaryDirectory().name,
+        dev=True,
     )
-    base_dataset.stat()
+    base_dataset.stats()

     # STEP 2: set task
-    sample_dataset = base_dataset.set_task(mortality_prediction_mimic3_fn)
-    sample_dataset.stat()
+    task = MortalityPredictionMIMIC3()
+    sample_dataset = base_dataset.set_task(task)

     train_dataset, val_dataset, test_dataset = split_by_patient(
         sample_dataset, [0.8, 0.1, 0.1]
     )
-    train_dataloader = get_dataloader(train_dataset, batch_size=256, shuffle=True)
-    val_dataloader = get_dataloader(val_dataset, batch_size=256, shuffle=False)
-    test_dataloader = get_dataloader(test_dataset, batch_size=256, shuffle=False)
+    train_dataloader = get_dataloader(train_dataset, batch_size=32, shuffle=True)
+    val_dataloader = get_dataloader(val_dataset, batch_size=32, shuffle=False)
+    test_dataloader = get_dataloader(test_dataset, batch_size=32, shuffle=False)

     # STEP 3: define model
     model = GRASP(
         dataset=sample_dataset,
-        feature_keys=["conditions", "procedures"],
-        label_key="label",
-        mode="binary",
-        use_embedding=[True, True, True],
-        embedding_dim=32,
-        hidden_dim=32,
+        cluster_num=2,
     )

     # STEP 4: define trainer
     trainer = Trainer(model=model)
     trainer.train(
         train_dataloader=train_dataloader,
         val_dataloader=val_dataloader,
-        epochs=5,
+        epochs=1,
         monitor="roc_auc",
     )
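The updated example passes `tempfile.TemporaryDirectory().name` as `cache_dir`. Because the `TemporaryDirectory` object is never bound to a variable, Python may remove the directory as soon as that object is garbage-collected, so the example relies on the consumer recreating the path. One alternative is to hold the directory open with a context manager for as long as it is needed. The sketch below uses only the standard library. `run_with_cache` and `demo` are hypothetical helpers, and `demo` merely stands in for code that would build a dataset with `cache_dir=cache_dir`.

```python
import os
import tempfile


def run_with_cache(use_cache):
    """Create a temporary cache directory, hand it to `use_cache`, then clean up."""
    with tempfile.TemporaryDirectory() as cache_dir:
        # The directory exists for the whole body of the `with` block
        # and is removed when the block exits.
        return use_cache(cache_dir)


def demo(cache_dir):
    """Hypothetical stand-in for work that writes into the cache directory."""
    marker = os.path.join(cache_dir, "marker.txt")
    with open(marker, "w") as fh:
        fh.write("cached")
    return cache_dir, os.path.exists(marker)


if __name__ == "__main__":
    path, existed_during_use = run_with_cache(demo)
    print(existed_during_use, os.path.exists(path))
```

The trade-off is that the dataset, training, and evaluation steps must all run inside the `with` block, since the cache disappears when the block ends.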

examples/mortality_prediction/mortality_mimic3_grasp_gru_code_mapping_cached.ipynb

Lines changed: 3258 additions & 0 deletions
Large diffs are not rendered by default.
