3. Adding modules

The nf-core pipeline template is a working pipeline and comes pre-configured with two modules:

  • FastQC: A tool that performs quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses that can be used to give a quick impression of your data.
  • MultiQC: A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.

3.1 Testing your pipeline

The test profile can be used to check if your pipeline is still working during your development cycle. It is also used as a part of GitHub Actions to test your pipeline during pull requests.

The default template test profile leverages small test files that are stored in the nf-core test data GitHub repository as inputs for the pipeline.

Additionally, the template comes with profiles for the management of software dependencies (e.g., docker, singularity, and conda). For pipelines whose processes are shipped with containers, images, or recipes, these profiles can be used to change the way dependencies are handled when you execute your pipeline.

Warning

If no profile for managing software dependencies is specified with -profile, the pipeline will run locally and expect all software to be installed and available on the PATH. This is not recommended.
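
As a rough illustration of why the profile matters, a dependency profile is simply a named block of settings that switches a container or environment engine on. The sketch below is a simplified assumption of how such profiles can look in nextflow.config; the blocks in your template will contain more settings than shown here.

```groovy
// nextflow.config (simplified sketch, not the full template contents)
profiles {
    docker {
        docker.enabled      = true    // run each process in its Docker container
        singularity.enabled = false
        conda.enabled       = false
    }
    singularity {
        singularity.enabled    = true  // run each process in its Singularity image
        singularity.autoMounts = true  // mount host paths automatically
        docker.enabled         = false
        conda.enabled          = false
    }
}
```

In this sketch, no engine is enabled unless one of the profiles is selected, which is why every tool would otherwise have to be available on the PATH.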

Additional test profiles can be created to test different parts of your pipeline and can also be added to GitHub Actions.

Exercise

Run your pipeline with the test and singularity profiles.

cd /workspace/gitpod/nf-develop
nextflow run nf-core-myfirstpipeline -profile test,singularity --outdir results

The pipeline should run successfully!

3.2 Adding a new tool to your pipeline

Here, as an example, you want to trim low-quality bases from both ends of your fastq files using the seqtk trim command.

The seqtk trim module will take fastq files from the sample sheet as inputs and produce two outputs: trimmed fastq files that can be used as inputs for other tools, and version information about the seqtk tool that will be mixed into the inputs for the MultiQC process.

While you could develop a module for this tool independently, you can save a lot of time and effort by leveraging nf-core modules and subworkflows.

nf-core modules and subworkflows are written and maintained by the nf-core community. They are designed to be flexible but may require additional configuration to suit different use cases. Currently, there are more than 1200 nf-core modules and 60 nf-core subworkflows (April 2024) available.

Modules and subworkflows can be listed, installed, updated, removed, and patched using nf-core tooling.

3.2.1 Working with branches

GitHub branches are used to isolate development work without affecting other branches in a repository. Each repository has one default branch, and can have multiple other branches.

You can merge updates from one branch into another branch using a pull request.

The nf-core create command created three branches: main, dev, and TEMPLATE.

In nf-core, the main branch is for stable releases and the dev branch is for merging feature branches together. This enables the main branch to remain fully functional while new features are developed in feature branches, collected in the dev branch, and then merged into main once they are ready.

Feature branches should be checked out from the dev branch.

Exercise

Check out a new feature branch named myFeature from the dev branch.

git checkout -b myFeature dev

You can find out more about working collaboratively with branches on the GitHub documentation.

Executing revisions

Remote GitHub branches can be executed with Nextflow using the revision flag (e.g., -r dev).

3.2.2 The TEMPLATE branch

The TEMPLATE branch is used by the nf-core sync command to integrate template changes into your pipeline. You should never modify the TEMPLATE branch, as any changes will likely disrupt the syncing functionality.

3.2.3 Installing the seqtk_trim module

The nf-core modules list command can be used to show the modules in your local pipeline or the nf-core remote repository.

nf-core modules list remote

The nf-core modules install command can be used to install the seqtk_trim module directly from the nf-core repository:

cd nf-core-myfirstpipeline
nf-core modules install

You can follow the prompts to find and install the module you are interested in:

? Tool name: seqtk_trim

Once selected, the tooling will install the module in the modules/nf-core/ folder and suggest code that you can add to your main workflow file (workflows/mypipeline.nf).

INFO     Installing 'seqtk_trim'
INFO     Use the following statement to include this module:

include { SEQTK_TRIM } from '../modules/nf-core/seqtk/trim/main'

Exercise

Run the nf-core modules install command to add the seqtk_trim module to your pipeline.

To enable reporting and reproducibility, modules and subworkflows from the nf-core repository are tracked using hashes in the modules.json file. When modules are installed or removed using the nf-core tooling, the modules.json file is updated automatically.

Exercise

Open your modules.json file and see if the seqtk_trim module is being tracked.

3.2.4 Adding a module to your pipeline

Although the module has been installed in your local pipeline repository, it is not yet added to your pipeline.

The suggested include statement needs to be added to your workflows/mypipeline.nf file and the process call (with inputs) needs to be added to the workflow block.

workflows/mypipeline.nf
include { FASTQC                 } from '../modules/nf-core/fastqc/main'
include { SEQTK_TRIM             } from '../modules/nf-core/seqtk/trim/main'
include { MULTIQC                } from '../modules/nf-core/multiqc/main'

Exercise

Add the suggested include statement to your mypipeline.nf file.

workflows/mypipeline.nf
include { SEQTK_TRIM             } from '../modules/nf-core/seqtk/trim/main'

To add the SEQTK_TRIM module to your workflow you will need to check what inputs are required.

You can view the input channels for the module by opening the ./modules/nf-core/seqtk/trim/main.nf file.

/modules/nf-core/seqtk/trim/main.nf
input:
tuple val(meta), path(reads)

Each nf-core module also has a meta.yml file which describes the inputs and outputs. This meta file is rendered on the nf-core website, or can be viewed using the nf-core modules info command.

Exercise

Use the nf-core modules info command to view information for the seqtk_trim module

nf-core modules info seqtk_trim

Using this module information you can work out what inputs are required for the SEQTK_TRIM process:

  1. tuple val(meta), path(reads)

    • A tuple with a meta map and a list of fastq files
    • The channel ch_samplesheet used by the FASTQC process can be used as the reads input.

As only one input channel is required and it already exists, the SEQTK_TRIM process can be added to your mypipeline.nf file without any additional channel creation or modification.
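
To make the shape of this input concrete, the standalone sketch below builds a channel with one item of the same form and prints a summary of it. The sample name and file names are hypothetical placeholders; in your pipeline the items come from the sample sheet instead.

```groovy
// shape_demo.nf (hypothetical file name): run with `nextflow run shape_demo.nf`
workflow {
    // One channel item = [ meta map, list of fastq files ]
    ch_example = Channel.of(
        [
            [ id: 'sample1', single_end: false ],   // meta: hypothetical sample
            [ file('sample1_R1.fastq.gz'),          // reads: placeholder paths,
              file('sample1_R2.fastq.gz') ]         // not checked for existence
        ]
    )

    // Destructure each item the same way the module's input block does
    ch_example
        .map { meta, reads -> "${meta.id}: ${reads.size()} fastq file(s)" }
        .view()
}
```

Any channel whose items follow this two-element layout can be passed to SEQTK_TRIM unchanged.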

Exercise

Add the SEQTK_TRIM process to your mypipeline.nf file.

workflows/mypipeline.nf
//
// MODULE: Run SEQTK_TRIM
//
SEQTK_TRIM (
    ch_samplesheet
)

As with the inputs, you can view the outputs for the module by opening the /modules/nf-core/seqtk/trim/main.nf file and viewing the module metadata.

/modules/nf-core/seqtk/trim/main.nf
output:
tuple val(meta), path("*.fastq.gz"), emit: reads
path "versions.yml"                , emit: versions

To help with organization and readability it is beneficial to create named output channels.

For SEQTK_TRIM, the reads output could be put into a channel named ch_trimmed.

workflows/mypipeline.nf
ch_trimmed  = SEQTK_TRIM.out.reads

Similarly, it is beneficial to immediately mix the versions of tools into the ch_versions channel so they can be used as an input for the MULTIQC process.

workflows/mypipeline.nf
ch_versions = ch_versions.mix(SEQTK_TRIM.out.versions.first())

Exercise

Create a channel named ch_trimmed from the SEQTK_TRIM.out.reads output and mix the SEQTK_TRIM.out.versions output with the ch_versions channel.

workflows/mypipeline.nf
ch_trimmed  = SEQTK_TRIM.out.reads
ch_versions = ch_versions.mix(SEQTK_TRIM.out.versions.first())

Note

The first operator is used to emit the first item from SEQTK_TRIM.out.versions to avoid duplication.
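
The effect of first can be seen in isolation with a small standalone sketch. The file names below are hypothetical stand-ins for the versions.yml file emitted by each SEQTK_TRIM task.

```groovy
// first_demo.nf (hypothetical file name)
workflow {
    // Pretend three tasks each emitted a versions file with identical content
    ch_task_versions = Channel.of('versions_s1.yml', 'versions_s2.yml', 'versions_s3.yml')

    ch_task_versions
        .first()   // keep only the first item that arrives
        .view()    // prints versions_s1.yml
}
```

Because every SEQTK_TRIM task in a run uses the same version of seqtk, one item carries all the information needed.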

3.2.5 Additional configuration options

To avoid modifying the nf-core modules themselves, additional configuration options can be applied to a module using scopes within configuration files.

The configuration of modules is commonly added to the modules.config file in the conf folder. Process selectors (e.g., withName) are used to apply configuration to modules selectively. Process selectors must be used within the process scope.

Extra command-line arguments can also be passed to a module's tool through the ext.args directive. You can find many examples of how arguments are added to modules in nf-core pipelines, for example, the nf-core/rnaseq modules.config file.
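
As a minimal sketch of this mechanism, the snippet below passes extra options to the seqtk command through ext.args. It assumes that the installed module forwards task.ext.args to the tool, as nf-core modules conventionally do (you can confirm this in the module's main.nf file). The option values are purely illustrative.

```groovy
// conf/modules.config
process {
    withName: 'SEQTK_TRIM' {
        // Illustrative values: trim 5 bp from the left (-b) and right (-e) of each read
        ext.args = '-b 5 -e 5'
    }
}
```

Keeping options like these in configuration leaves the module files untouched, so they can still be updated with the nf-core tooling.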

Exercise

Add this snippet inside the process scope of your conf/modules.config file to save the trimmed fastq files in folders named using meta.id.

conf/modules.config
withName: 'SEQTK_TRIM' {
    publishDir = [
        path: { "${params.outdir}/fq/${meta.id}" },
        mode: params.publish_dir_mode,
        pattern: "*.{fastq.gz}"
    ]
}

Closures

Closures can be used in configuration files to inject code evaluated at runtime.
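
The difference between a plain value and a closure can be sketched as follows. The ext.args values are illustrative assumptions, and meta.single_end is assumed to be present in the meta map, as it typically is for sample sheets in the nf-core template.

```groovy
// conf/modules.config
process {
    withName: 'SEQTK_TRIM' {
        // Closure (curly braces): evaluated for every task, when meta is known,
        // so each sample can receive different options.
        ext.args = { meta.single_end ? '-b 5' : '-b 5 -e 5' }

        publishDir = [
            // Closure: meta.id only exists at runtime, so the path must be lazy.
            path: { "${params.outdir}/fq/${meta.id}" },
            // Plain value: resolved once, when the configuration is loaded.
            mode: params.publish_dir_mode
        ]
    }
}
```

Anything that depends on the sample being processed needs the curly braces; values that are the same for every task do not.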

3.2.6 Checking your module has been added

It is important to regularly check that you have not broken your pipeline during development. Testing often helps you identify issues more quickly: fewer files will have been modified since the last successful run, so mistakes are easier to locate.

The test profile is perfect for this use case.

Exercise

Check your new SEQTK_TRIM process is working by testing your pipeline.

nextflow run nf-core-myfirstpipeline -profile test,singularity --outdir results

The pipeline should execute successfully, with a new SEQTK_TRIM process shown in the terminal and its output in the results folder.


Congratulations! You have added your first nf-core module to the nf-core template!