Add DILI Discovery Proteomics vignette #4

Draft
Copilot wants to merge 9 commits into main from copilot/add-dili-discovery-proteomics-vignette

Copilot AI commented May 5, 2026

Adds vignettes/DILI_Hotgenes.Rmd, documenting the full limma-based proteomics workflow from the Federspiel et al. 2023 (J Hepatol) DILI dataset (MassIVE: MSV000089782) as a reproducible, educational vignette.

What's included

  • All 12 analysis steps from data-raw/Build_DILI_Hotgenes.R: data loading, protein filtering (contaminants, UniProt reviewed, min_peps > 0), VSN normalization, no-intercept design matrix, voomaByGroup batch correction, robust lmFit, 7 contrasts + eBayes, and Hotgeneslimma() object creation
  • Public data limitations documented: two private-metadata variables (subject, needed for duplicateCorrelation(), and channel, needed for per-channel variance) are absent from the MassIVE deposit; the vignette explains the proxy approach used (Pool) and what is omitted
  • Optional MassIVE refresh block (eval=FALSE) for downloading the source .xlsx directly from the deposit
  • Downstream analysis examples using the resulting dili_hotgenes object: DE summaries, DEPlot, VPlot, Venn_Report, DEphe heatmap, ExpsPlot, KEGG GSEA via fgsea_, and Shiny_Hotgenes

```r
dili_hotgenes <- Hotgeneslimma(
  limmafit        = fit2,
  coldata         = coldata,
  Expression      = vm_exp,
  Expression_name = "VSN",
  Exps_list       = list(log2 = log2(expr_matrix + 1)),
  Mapper          = mapper_df
)
```
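
For readers unfamiliar with the no-intercept design and contrast steps listed above, here is a minimal base-R sketch of the idea for a single protein. The group names, simulated values, and contrast names are hypothetical and are not taken from the DILI dataset or from Build_DILI_Hotgenes.R. In the actual workflow limma performs the equivalent fit across all proteins and adds empirical Bayes moderation, which this sketch does not attempt.

```r
# Hypothetical groups and simulated values; NOT the DILI data.
set.seed(1)
group <- factor(rep(c("Control", "DrugA", "DrugB"), each = 4))
y <- rnorm(12, mean = c(10, 11, 12)[as.integer(group)], sd = 0.2)

# No-intercept design: one column per group, so each coefficient
# is simply that group's mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Ordinary least squares fit for this one protein.
coefs <- qr.coef(qr(design), y)

# Contrast matrix: rows match the design columns, one column per comparison.
contrast_matrix <- cbind(
  DrugA_vs_Control = c(Control = -1, DrugA = 1, DrugB = 0),
  DrugB_vs_Control = c(Control = -1, DrugA = 0, DrugB = 1)
)

# Each contrast estimate is a difference of fitted group means.
log_fc <- drop(t(contrast_matrix) %*% coefs)
print(round(log_fc, 3))
```

With a no-intercept design every comparison is written explicitly as a difference of group coefficients, which is why a set of named contrasts can be defined without re-leveling the factor.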

Follows the existing vignettes/Hotgenes.Rmd format and style. No production code changes.

Original prompt

Create a new vignette for the Hotgenes package that documents the DILI (Drug-Induced Liver Injury) Discovery Proteomics analysis workflow based on the existing data-raw/Build_DILI_Hotgenes.R script.

Requirements:

  • Convert the Build_DILI_Hotgenes.R script into a comprehensive R Markdown vignette
  • Include all 12 analysis steps (data loading, filtering, normalization, modeling, contrasts, Hotgenes object creation)
  • Add context from the original publication (Federspiel et al. 2023, J Hepatol)
  • Document the MassIVE deposit ID (MSV000089782)
  • Include examples of downstream analyses using the resulting Hotgenes object (plots, GSEA, etc.)
  • Follow the format and style of the existing vignette (vignettes/Hotgenes.Rmd)
  • Include optional code for refreshing data directly from MassIVE
  • Add explanatory text about data limitations and methodology choices
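
The filtering step could be illustrated along these lines, using the three criteria named in the PR description (contaminants, UniProt reviewed, min_peps > 0). This is a sketch under assumptions: the data frame, its column names (`accession`, `contaminant`, `reviewed`, `min_peps`), and the placeholder accessions are all hypothetical, and the real script may encode these fields differently.

```r
# Hypothetical protein table; column names and values are illustrative only.
proteins <- data.frame(
  accession   = c("PROT1", "PROT2", "PROT3", "PROT4", "PROT5"),
  contaminant = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  reviewed    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  min_peps    = c(5, 3, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Keep rows that pass all three criteria at once.
keep <- !proteins$contaminant & proteins$reviewed & proteins$min_peps > 0
filtered <- proteins[keep, , drop = FALSE]

# Count how many rows each criterion on its own would remove,
# which is useful to report alongside the filtered table.
removed <- c(
  contaminant = sum(proteins$contaminant),
  unreviewed  = sum(!proteins$reviewed),
  no_peptides = sum(proteins$min_peps <= 0)
)
print(removed)
print(filtered$accession)
```

Reporting the per-criterion counts next to the result makes it easier for a reader to see which filter is responsible for most of the dropped proteins.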

The vignette should serve as both a reproducible example and educational material for users working with proteomics data and limma-based differential expression analysis in the Hotgenes framework.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Hi I need a new vignette that needs to be part of this repo and package. It should be based on this script: @pfizer-opensource/Open-Hotgenes/files/data-raw/Build_DILI_Hotgenes.R

Assistant: I'll help you create a new vignette based on the Build_DILI_Hotgenes.R script. Let me first examine the repository structure to understand how vignettes are organized and the package structure.

Assistant: Now let me check the existing vignettes structure to understand the format:

Assistant: Now I'll create a comprehensive vignette based on the DILI Hotgenes build script. This vignette will document how to build the DILI (Drug-Induced Liver Injury) Hotgenes object from proteomics data.

Assistant: I've created a new vignette DILI_Hotgenes.Rmd for the Open-Hotgenes package based on the Build_DILI_Hotgenes.R script. Here's what the vignette includes:

Key Features:

  1. Clear Workflow Structure - All 12 steps from the script organized as executable sections with explanations

  2. Data Context - Detailed information about:

    • Original publication (Federspiel et al. 2023, J Hepatol)
    • MassIVE deposit ID (MSV000089782)
    • Important notes about private metadata limitations
  3. Processing Steps Documented:

    • Data loading from pre-parsed RDS
    • Sample column identification
    • Protein filtering (contaminants, UniProt reviewed status, min peptides)
    • VSN normalization
    • voomaByGroup transformation with Pool batch correction
    • Robust lmFit modeling
    • Contrast definitions (7 comparisons)
    • Hotgenes object creation
  4. Downstream Analysis Examples - Demonstrates how to use the resulting DILI Hotgenes object with:

    • Differential expression summaries
    • Volcano plots
    • Venn diagrams
    • Heatmaps
    • Expression plots
    • GSEA analysis
    • Interactive Shiny exploration
  5. Best Practices - Includes optional code for refreshing data directly from MassIVE

The vignette follows the same format and style as the existing Hotgenes.Rmd vignette and is ready to build with the package documentation system.

User: This needs to be a PR.

Assistant: I'm waiting for your confirmation on whether to proceed with creating the pull request. The tool is asking to verify the repository details and PR scope.

This pull request was created from Copilot chat.

Agent-Logs-Url: https://github.com/pfizer-opensource/Open-Hotgenes/sessions/637b665d-7f72-41b9-89f9-23e317a56479

Co-authored-by: Rvirgenslane <53314471+Rvirgenslane@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add vignette for DILI discovery proteomics analysis" to "Add DILI Discovery Proteomics vignette" May 5, 2026
Copilot AI requested a review from Rvirgenslane May 5, 2026 16:23