A nextflow pipeline which executes treemix on a population of interest
This pipeline is an attempt to call treemix
on a PLINK binary file. All the steps required to produce the treemix input files
are managed within this pipeline. It's possible to calculate multiple iterations
(and adding -bootstrap as additional parameter) results will be evaluated
with the OptM
package and a consensus tree will be finally generated.
It's also possible to replace treemix with OrientaGraph
This pipeline require at least two input file. First is a PLINK binary files already
filtered by missing genotypes or samples. The second is a TSV file required to extract
the samples of interest from the PLINK file and to assign label ids to treemix cluster.
This file need to have 3 columns: FID, IID and cluster ID: You can derive the first
two columns form the plink binary file (the .fam) file. The third column could be
equal to the first column if you want to use the FID as treemix cluster IDs, or
you could provide different cluster ids, for example if you want more FID be included
in the same group or vice-versa.
The simplest way to calling this pipeline is to call nextflow and provide your plink binary file prefix (the same way you use to specify it with PLINK) and the samples TSV as a parameters, for example:
nexflow run cnr-ibba/nf-treemix -profile singularity --input <samples TSV> \
--plink_prefix <the plink prefix>You can provide outgroup to treemix simply by passing the --treemix_outgroup
option, for example:
nexflow run cnr-ibba/nf-treemix -profile singularity --input <samples TSV> \
--plink_prefix <the plink prefix> --treemix_outgroup <you outgroup ID>If you want to test different migration hypothesis, you could provide the
migrations steps with the --migrations parameter:
nexflow run cnr-ibba/nf-treemix -profile singularity --input <samples TSV> \
--plink_prefix <the plink prefix> --migrations 5This will call treemix 5 times, each one evaluating a different migration. If you
want to call a single migration (for example, only M=5), you can add --single_migration
option to nextflow command line:
nexflow run cnr-ibba/nf-treemix -profile singularity --input <samples TSV> \
--plink_prefix <the plink prefix> --migrations 5 --single_migrationYou can evaluate a single migration hypothesis multiple times, by providing the
--n_iterations parameter:
nexflow run cnr-ibba/nf-treemix -profile singularity --input <samples TSV> \
--plink_prefix <the plink prefix> --migrations 5 --n_iterations 5Having more than 1 iterations let you to evaluate models with the
OptM r-package.
If you prefer calling OrientaGraph instead of treemix, you can use the
--with_orientagraph option:
nexflow run cnr-ibba/nf-treemix -profile singularity --input <samples TSV> \
--plink_prefix <the plink prefix> --migrations 5 --n_iterations 5 \
--with_orientagraphThe best way to manage this pipeline is to create a custom configuration file in which you can specify the parameters you need:
params {
// pipeline input args
input = "<your TSV file>"
plink_prefix = "<the PLINK prefix>"
migrations = 5
single_migration = false
// treemix options
treemix_outgroup = "<your outgroup id>"
treemix_k = 1000
// PLINK options (required to subset data properly)
plink_species_opts = '--chr-set 26 no-xy no-mt --allow-no-sex'
// iterations options
n_iterations = 1
}This let you to provide to software additional parameters. For example, if you
want to provide the -bootstrap option to treemix, you can provide it using
ext.args key, for example:
process {
withName: TREEMIX {
ext.args = '-bootstrap'
}
}Remember that process configurations have a different scopes than parameters, so they need to declared outside parameter configuration.
You can have a list of all the available options by inspecting the nextflow.config file of this pipeline