Part 5: Hello Config
This section will explore how to configure Nextflow pipelines using configuration files, profiles, process directives, and executors. Configuration management is an essential aspect of Nextflow pipeline development, allowing you to customize the behavior of your pipeline, adapt it to different environments, and optimize resource usage. By understanding and effectively utilizing these configuration options, you can enhance the flexibility, scalability, and performance of your pipelines.
1. Check and modify configuration
1.1. Run nf-hello-gatk with default settings
When you run the pipeline with the default settings, the following happens:
- Nextflow downloads the pipeline from the GitHub repository `seqeralabs/nf-hello-gatk`.
- It then executes the pipeline using the default configuration.
- The pipeline uses Docker containers to run the required tools (Samtools and GATK).
- It processes the input BAM files, creates index files, and performs variant calling.
- The results are written to the `results` directory by default.
- Nextflow also creates a `work` directory containing intermediate files and logs.
- Upon completion, Nextflow displays a run summary, including any errors or warnings.
Now, let's see how this was configured and set up.
1.2. Check configuration
Open the `nextflow.config` file and inspect its contents. They should look like this:
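Assuming the file contains only the Docker setting described in the next sentence, a minimal version would be:

```groovy
// Run every process inside a Docker container
docker.enabled = true
```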
This config block tells the pipeline to use Docker containers to run the required tools.
1.3. Modify configuration
Let's modify the configuration to use Conda instead of Docker, explicitly disabling Docker. In `nextflow.config`, replace the Docker setting with settings that disable Docker and enable Conda.
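A minimal sketch of the modified file, assuming it previously held only `docker.enabled = true` and that the pipeline's processes declare `conda` directives for their tools:

```groovy
// Stop running processes in Docker containers
docker.enabled = false
// Build a Conda environment for each process from its conda directive
conda.enabled = true
```

Setting `docker.enabled = false` explicitly, rather than just deleting the line, makes the intent clear to anyone reading the file later.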
Now let's run the pipeline again with the modified configuration:
This time, the pipeline will use Conda environments to run the required tools.
Takeaway
You know how to switch software packaging systems using configuration files.
What's next?
Learn how to use profiles to customize the behavior of your pipeline.
2. Profiles
Profiles let you select a set of configuration options at run time with the `-profile` command-line option, rather than setting them permanently in the configuration file.
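2.1. Create a profile
Move the Docker and Conda settings out of the top level of `nextflow.config` and into a `profiles` block, with one profile for each software packaging system.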
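A sketch of the resulting block, mirroring the `docker` and `conda` profiles in the complete configuration shown in section 4.3. Each profile sets both options, so exactly one packaging system is active whichever profile you choose:

```groovy
profiles {
    // Selected with -profile docker
    docker {
        docker.enabled = true
        conda.enabled = false
    }
    // Selected with -profile conda
    conda {
        docker.enabled = false
        conda.enabled = true
    }
}
```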
2.2. Run the pipeline with a profile
Select a profile at run time by passing `-profile docker` or `-profile conda` to `nextflow run`.
By creating and using profiles, we've made our pipeline more flexible and easier to use: we can now switch between Docker and Conda by changing a single command-line argument.
This approach improves the portability and maintainability of our Nextflow pipeline, making it easy to accommodate different execution scenarios.
Takeaway
You know how to use profiles to customize the configuration of your pipeline.
What's next?
Learn how to change process resource use with configuration.
3. Process directives and resources
3.1. Process directives
In a previous training module, we used process directives to modify the behavior of a process when we added the `publishDir` directive to export files from the working directory.
Let's look into directives in more detail.
3.1.1. Set process resources
By default, Nextflow will use a single CPU and 2GB of memory for each process.
We can modify this behavior by setting the `cpus` and `memory` directives in the `process` scope of the configuration.
Add the following to the end of your `nextflow.config` file:
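A minimal sketch; the values are illustrative assumptions, and with the local executor they should not exceed the CPUs and memory available on your machine:

```groovy
process {
    // Illustrative values: request 2 CPUs and 2 GB of memory for every process
    cpus = 2
    memory = 2.GB
}
```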
Run the pipeline again with the modified configuration:
You may not see any difference in the results; however, you might notice that the three processes now queue behind each other. This is because Nextflow ensures that the tasks running at the same time do not request more CPUs than are available.
Tip
You can check the number of CPUs given to a process by looking at the `.command.run` file in its task work directory. It contains a function called `nxf_launch()` that includes a command such as `docker run -i --cpu-shares 1024`, where the `--cpu-shares` value is the number of CPUs given to the process multiplied by 1024.
3.1.2. Modify process resources for a specific process
We can also modify the resources for a specific process using the `withName` process selector.
Add the following to the end of your `nextflow.config` file:
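A minimal sketch, assuming the HaplotypeCaller process is named `GATK_HAPLOTYPECALLER`. That name is a guess, so check the process name defined in the pipeline's code. The resource values are also illustrative, and on the local executor they must fit within the CPUs and memory of your machine:

```groovy
process {
    // Settings in this block apply only to processes whose name matches the selector.
    // 'GATK_HAPLOTYPECALLER' is an assumed name; use the name defined in the pipeline.
    withName: 'GATK_HAPLOTYPECALLER' {
        cpus = 4        // illustrative value
        memory = 4.GB   // illustrative value
    }
}
```

Settings inside a `withName` block take precedence over the general `process` settings for the matching process, so the two blocks can coexist in the same file.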
Run the pipeline again with the modified configuration:
Now the settings inside the `withName` block are applied only to the GATK HaplotypeCaller process, while the other processes keep the general settings. This is useful when processes have different resource requirements, because it lets you right-size the resources for each one.
Takeaway
You know how to modify process resources using configuration files.
What's next?
Learn how to change the executor used by Nextflow.
4. Executor
4.1. Local executor
Until now, we have been running our pipeline with the local executor. This runs each step on the same machine that Nextflow is running on. However, for large genomics pipelines, you will want to use a distributed executor. Nextflow supports several different distributed executors, including:
- HPC (SLURM, PBS, SGE)
- AWS Batch
- Google Batch
- Azure Batch
- Kubernetes
We can change the executor used by Nextflow with the `executor` process directive. Because `local` is the default executor, the following configuration is implied:
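Written out explicitly in `nextflow.config`, that implied default would look like this minimal sketch:

```groovy
process {
    // Run each task on the machine where Nextflow itself is running (the default)
    executor = 'local'
}
```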
4.2. Other executors
Note
This is a demonstration and is designed to fail!
To change the executor, we could simply set it to one of the values listed in the Nextflow executor documentation, such as `slurm`:
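A minimal sketch of that change, replacing the local executor with Slurm for every process:

```groovy
process {
    // Submit each task as a job to a Slurm cluster (via the sbatch command)
    executor = 'slurm'
}
```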
However, if we add this to our config and run the pipeline, the run fails with an error.
Nextflow has interpreted this as a request to submit jobs to a Slurm cluster, which requires the `sbatch` command.
Because our Gitpod instance doesn't have Slurm installed (and isn't connected to a cluster), the submission fails.
If we check inside the `.command.run` file created in the task's work directory, we can see that Nextflow has created a script to submit the job to Slurm.
Note
The console output of your Nextflow run shows the hash of the work subdirectory, which will differ from the paths shown in this guide.
If our process had more directives, such as `clusterOptions`, `cpus`, `memory`, `queue`, and `time`, these would also be included in the `.command.run` file and passed to Slurm when the job is submitted. With other executors, they would be translated into the equivalent options for that executor.
This is how a single configuration change lets Nextflow generate the commands required to submit jobs correctly to a Slurm cluster.
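For illustration, a process configuration using those directives might look like the sketch below. The partition name and account are hypothetical placeholders and the other values are arbitrary. With the Slurm executor, Nextflow writes these directives as `#SBATCH` options at the top of `.command.run`, with `clusterOptions` passed through as-is:

```groovy
process {
    executor = 'slurm'
    queue = 'short'                          // hypothetical Slurm partition name
    cpus = 2                                 // arbitrary example value
    memory = 4.GB                            // arbitrary example value
    time = '1h'                              // maximum run time per task
    clusterOptions = '--account=my_project'  // hypothetical extra sbatch flag, passed verbatim
}
```

Because the same directives are translated for each executor, switching `executor` to another scheduler would reuse `cpus`, `memory`, `queue`, and `time` unchanged, while `clusterOptions` would need flags valid for that scheduler.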
4.3. Using Executors in Profiles
Let's combine profiles with executors. Add the following to your configuration file:
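A sketch of the two new profiles, matching the `local` and `slurm` entries in the complete configuration for this section. They are shown inside a `profiles` block here for context; in your file they belong alongside the existing `docker` and `conda` profiles in the same block:

```groovy
profiles {
    // Selected with -profile local: run tasks on the current machine
    local {
        process.executor = 'local'
    }
    // Selected with -profile slurm: submit tasks to a Slurm cluster
    slurm {
        process.executor = 'slurm'
    }
}
```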
Remove the following lines:
After this change, the complete `profiles` block in `nextflow.config` should look like this:
profiles {
    docker {
        docker.enabled = true
        conda.enabled = false
    }
    conda {
        docker.enabled = false
        conda.enabled = true
    }
    local {
        process.executor = 'local'
    }
    slurm {
        process.executor = 'slurm'
    }
}
Now run the pipeline using two profiles, `docker` and `local`, by passing them to `-profile` as a comma-separated list: `-profile docker,local`.
We have returned to the original configuration of using Docker containers with local execution. However, now we can use profiles to switch to a different software packaging system (conda) or a different executor (slurm) with a single command-line option.
Takeaway
You now know how to change the executor in Nextflow.
What's next?
Well done! You've successfully modified the execution of a pipeline without altering a single line of code. This highlights the power of Nextflow's configuration, which lets you control how the pipeline runs without changing what it runs. Use this flexibility to adapt your pipeline to run in any environment.