10. Nextflow configuration
A key Nextflow feature is the ability to decouple the workflow implementation from the configuration settings required by the underlying execution platform. This enables portable deployment without the need to modify the application code.
When you launch Nextflow, it looks for configuration files in several locations. As the sources can contain conflicting settings, they are ranked to decide which settings to apply. The configuration sources are listed below in decreasing order of priority:
- Parameters specified on the command line (`--parameter`)
- Parameters that are provided using the `-params-file` option
- Config files that are provided using the `-c` option
- The config file named `nextflow.config` in the current directory
- The config file named `nextflow.config` in the pipeline project directory
- The config file `$HOME/.nextflow/config`
- Values defined within the pipeline script itself (e.g., `main.nf`)
10.1 Parameters
Parameters are pipeline-specific settings. They can be defined in the workflow script using the `params` keyword followed by the parameter name, for example `params.greeting`.
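As a minimal sketch, assuming the `greeting` parameter of the `hello.nf` script referred to in this section (the default value is only illustrative), a parameter definition in a workflow script could look like this:

```groovy
// hello.nf (excerpt): declare a parameter with a default value.
// The value 'Hello world!' is an illustrative assumption.
params.greeting = 'Hello world!'

workflow {
    // Parameters are read back through the same `params` object.
    println "Greeting is: ${params.greeting}"
}
```

The value assigned in the script acts as a default; sources higher in the priority list replace it.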
At the highest level, parameters can be customised using the command line.
Any parameter can be configured on the command line by prefixing the parameter name with a double dash (--).
For example, the `greeting` parameter in the `hello.nf` script can be set on the command line with a command such as `nextflow run hello.nf --greeting 'Bonjour le monde!'`.
Instead of including each parameter on the command line, parameters can also be supplied using the `-params-file` option and a JSON or YAML file.
Multiple parameters can be included in one params file, which is then added to the execution command, e.g. `nextflow run hello.nf -params-file <params file>`.
Exercise
Run the `hello.nf` script with the `greeting` parameter set to `Hallo Welt!` using a params file.
Tip
Parameter files are useful for consolidating a large number of parameters in a single file.
Summary
In this step you have learned:
- How to define parameters in a workflow script
- How to configure parameters on the command line
- How to configure parameters using `-params-file`
10.2 Configuration files
When a workflow script is launched, Nextflow looks for a file named `nextflow.config` in the current directory and in the script base directory (if it is not the same as the current directory). Finally, it checks for the file `$HOME/.nextflow/config`.
When more than one of the above files exists, they are merged, so that the settings in the first override the same settings that may appear in the second, and so on.
The default config file search mechanism can be extended by providing an extra configuration file with the command line option `-c <config file>`, for example `nextflow run hello.nf -c <config file>`.
10.2.1 Config syntax
A Nextflow configuration file is a simple text file containing a set of properties defined using the syntax `name = value`.
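A hedged sketch of what such a file might contain. The property names below are hypothetical and chosen only to show how different value types are written:

```groovy
// nextflow.config (illustrative property names, not built-in Nextflow settings)
projectLabel = 'my_project'   // string: wrapped in quotes
sampleCount  = 3              // number: no quotes
keepLogs     = true           // boolean: no quotes
countAsText  = '3'            // a string, not the number 3
```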
Info
String values need to be wrapped in quotation characters, while numbers and boolean values (`true`, `false`) do not. Also, note that values are typed: for example, `1` is different from `'1'`, since the first is interpreted as the number one, while the latter is interpreted as a string value.
10.2.2 Config variables
Configuration properties can be used as variables in the configuration file itself, by using the usual `$propertyName` or `${expression}` syntax.
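A small sketch of variable interpolation, assuming hypothetical property names and a placeholder folder path:

```groovy
// nextflow.config (illustrative property names)
propertyOne = 'world'
anotherProp = "Hello $propertyOne"     // double quotes enable interpolation
customPath  = "$PATH:/my/app/folder"   // $PATH is taken from the host environment
```

Note that interpolation relies on double-quoted strings; a single-quoted string keeps the `$` text literally.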
Tip
In the configuration file it’s possible to access any variable defined in the host environment, such as `$PATH`, `$HOME`, `$PWD`, etc.
10.2.3 Config comments
Configuration files use the same comment conventions as Nextflow scripts: `//` starts a single-line comment, and `/* ... */` encloses a comment that can span multiple lines.
10.2.4 Config scopes
Configuration settings can be organized in different scopes, either by prefixing the property names with a scope identifier and a dot, or by grouping the properties of the same scope using the curly brackets notation.
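Both notations could be sketched as follows. The scope names `alpha` and `beta` and their properties are hypothetical, used only to show the two forms:

```groovy
// nextflow.config

// Dot notation: properties x and y in the scope `alpha`
alpha.x = 1
alpha.y = 'string value'

// Curly brackets notation: properties p and q in the scope `beta`
beta {
    p = 2
    q = 'another string'
}
```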
10.2.5 Config params
The `params` scope allows the definition of workflow parameters that override the values defined in the main workflow script.
This is useful to consolidate one or more execution parameters in a separate file.
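A minimal sketch of the two files involved, assuming parameters named `foo` and `bar` as in the exercise that follows. The values are illustrative, and the two files are shown in one block only for comparison:

```groovy
// ---- nextflow.config ----
// Values set here take precedence over the defaults in the script.
params.foo = 'Bonjour'
params.bar = 'le monde!'

// ---- snippet.nf (a separate file) ----
// Defaults, used only when neither the config file nor the
// command line provides a value.
params.foo = 'Hello'
params.bar = 'world!'

// Print both parameters.
println "${params.foo} ${params.bar}"
```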
Exercise
Using the code blocks above, run `snippet.nf` without specifying any parameters. Then, run it again specifying the `foo` parameter on the command line.
Solution
Run the script without any modification, i.e. `nextflow run snippet.nf`.
Execute the snippet again, specifying the `foo` parameter on the command line, for example `nextflow run snippet.nf --foo Hola`.
Note how the `foo` parameter is overridden by the value specified on the command line and the `bar` parameter is taken from the configuration file.
10.2.6 Config env
The `env` scope allows the definition of one or more variables that will be exported into the environment where the workflow tasks will be executed.
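As a hedged sketch, the configuration and a matching script might look like this. The variable names `ALPHA` and `BETA`, the path, and the process name `FOO` are illustrative assumptions:

```groovy
// ---- nextflow.config ----
// Variables in the env scope are exported into each task's environment.
env.ALPHA = 'some value'
env.BETA  = "$HOME/some/path"   // $HOME is resolved from the host environment

// ---- snippet.nf (a separate file) ----
process FOO {
    debug true   // forward the task's standard output to the terminal

    script:
    '''
    env | grep -E 'ALPHA|BETA'
    '''
}

workflow {
    FOO()
}
```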
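Executing the snippets above prints the variables defined in the `env` scope together with their values, showing that they are available inside the task environment.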
10.2.7 Config process
Process directives allow the specification of settings for the task execution, such as `cpus`, `memory`, `container`, and other resources, in the workflow script.
This is useful when prototyping a small workflow script.
However, it’s always a good practice to decouple the workflow execution logic from the process configuration settings, i.e. it’s strongly suggested to define the process settings in the workflow configuration file instead of the workflow script.
The `process` configuration scope allows the setting of any `process` directives in the Nextflow configuration file.
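A minimal sketch of such a scope. The resource values are arbitrary and the container image name is a placeholder:

```groovy
// nextflow.config
process {
    cpus      = 10
    memory    = 8.GB
    container = 'my-org/my-tool:1.0'   // placeholder image name
}
```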
The above config snippet defines the `cpus`, `memory` and `container` directives for all processes in your workflow script.
The process selector can be used to apply the configuration to a specific process or group of processes (discussed later).
Info
Memory and time duration units can be specified either using a string-based notation, in which the digit(s) and the unit can be separated by a blank, or using the numeric notation, in which the digit(s) and the unit are separated by a dot character and are not enclosed in quote characters.
String syntax | Numeric syntax | Value |
---|---|---|
`'10 KB'` | `10.KB` | 10240 bytes |
`'500 MB'` | `500.MB` | 524288000 bytes |
`'1 min'` | `1.min` | 60 seconds |
`'1 hour 25 sec'` | - | 1 hour and 25 seconds |
The syntax for setting `process` directives in the configuration file requires `=` (i.e. the assignment operator), whereas it should not be used when setting the process directives within the workflow script.
Example
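This is especially important when you want to define a config setting with a dynamic expression using a closure. In a workflow script the closure follows the directive name directly, as in `memory { 4.GB * task.cpus }`.
You can also define the same setting in the configuration file using a similar syntax, this time with the assignment operator: `memory = { 4.GB * task.cpus }`.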
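The difference between the two forms can be sketched in context as follows. The process name `FOO` and the memory formula are illustrative assumptions, and the two files are shown in one block only for comparison:

```groovy
// ---- workflow script ----
process FOO {
    // No `=`: the closure is passed directly to the directive.
    memory { 4.GB * task.cpus }

    script:
    """
    echo "cpus: ${task.cpus}"
    """
}

// ---- nextflow.config (a separate file) ----
process {
    withName: FOO {
        // `=` is required in the config file, also for closures.
        memory = { 4.GB * task.cpus }
    }
}
```

The closure is evaluated for each task, so the memory request scales with the number of CPUs assigned to that task.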
Directives that take more than one value, e.g. `pod`, need to be expressed as a map object in the configuration file.
Finally, directives that can be repeated in the process definition need to be defined as a list object in the configuration file.
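The two cases above could be sketched with the `pod` directive as follows. The environment variable names and values are illustrative only. A single multi-value setting is written as a map:

```groovy
// nextflow.config
process {
    pod = [env: 'FOO', value: '123']
}
```

A directive that would be repeated in the process definition is written as a list of such maps:

```groovy
// nextflow.config
process {
    pod = [[env: 'FOO', value: '123'], [env: 'BAR', value: '456']]
}
```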
10.2.8 Config Docker execution
The container image to be used for the process execution can be specified in the `nextflow.config` file using the `process.container` setting.
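A minimal sketch, assuming a placeholder image name. The `docker.enabled` setting turns on Docker execution for the run:

```groovy
// nextflow.config
process.container = 'my-org/my-image:1.0'   // placeholder image name
docker.enabled    = true                     // run tasks in Docker containers
```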
The use of unique "SHA256" Docker image IDs guarantees that the image content does not change over time.
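A hedged sketch of pinning an image by digest. Both the image name and `<digest>` are placeholders; the real value is the hexadecimal SHA256 digest of the image you want to pin:

```groovy
// nextflow.config
// <digest> stands for the image's SHA256 digest (placeholder, not a real value).
process.container = 'my-org/my-image@sha256:<digest>'
docker.enabled    = true
```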
10.2.9 Config Singularity execution
To run a workflow execution with Singularity, a container image file path is required in the Nextflow config file using the container directive:
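For illustration, such a configuration might look like this, assuming a placeholder image path:

```groovy
// nextflow.config
process.container   = '/some/singularity/image.sif'   // placeholder absolute path
singularity.enabled = true                             // run tasks with Singularity
```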
Info
The container image file must be an absolute path: it must start with a `/`.
The following protocols are supported:
- `library://` download the container image from the Singularity Library service.
- `shub://` download the container image from the Singularity Hub.
- `docker://` download the container image from the Docker Hub and convert it to the Singularity format.
- `docker-daemon://` pull the container image from a local Docker installation and convert it to a Singularity image file.
Warning
Singularity Hub (`shub://`) is no longer available as a builder service, though existing images from before 19th April 2021 will still work.
Tip
By specifying a plain Docker container image name, Nextflow implicitly downloads and converts it to a Singularity image when the Singularity execution is enabled.
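A minimal sketch of this case, assuming a placeholder Docker image name:

```groovy
// nextflow.config
process.container   = 'my-org/my-image:1.0'   // plain Docker image name (placeholder)
singularity.enabled = true                     // use Singularity instead of Docker
```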
The above configuration instructs Nextflow to use the Singularity engine to run your script processes. The container is pulled from the Docker registry and cached in the current directory to be used for further runs.
Alternatively, if you have a Singularity image file, its absolute path location can be specified as the container name either using the `-with-singularity` option or the `process.container` setting in the config file.
10.2.10 Config Conda execution
A Conda environment can also be specified in the configuration file by adding the `process.conda` setting to the `nextflow.config` file.
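A hedged sketch, assuming a placeholder environment path. The `conda.enabled` line reflects an assumption that the Nextflow release in use requires Conda support to be switched on explicitly, which is the case for recent releases:

```groovy
// nextflow.config
process.conda = '/path/to/conda/envs/my-env'   // placeholder path
conda.enabled = true   // assumption: needed on recent Nextflow releases
```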
You can specify either the directory of an existing Conda environment or the path of a Conda environment YAML file.
Summary
In this step you have learned:
- How to write a Nextflow configuration file.
- How to use configuration files to define parameters, environment variables, and process directives
- How to use configuration files to define Docker, Singularity, and Conda execution
- How to use configuration files to define process directives