
6. Processes

In Nextflow, a process is the basic computing primitive to execute foreign functions (i.e., custom scripts or tools).

The process definition starts with the keyword process, followed by the process name and finally the process body delimited by curly brackets.

A basic process, only using the script definition block, looks like the following:

snippet.nf
process SAYHELLO {
    script:
    """
    echo 'Hello world!'
    """
}

Info

The process name is commonly written in upper case by convention.

In general, a process body can contain up to five definition blocks:

  1. Directives are initial declarations that define optional settings
  2. Input defines the expected input channel(s)
  3. Output defines the expected output channel(s)
  4. When is an optional clause statement to allow conditional processes
  5. Script is a string statement that defines the command to be executed by the process' task

The full process syntax is defined as follows:

The numbered comments in the code refer to the notes that follow it.

process < name > {
    [ directives ] // (1)

    input: // (2)
    < process inputs >

    output: // (3)
    < process outputs >

    when: // (4)
    < condition >

    [script|shell|exec]: // (5)
    """
    < user script to be executed >
    """
}
  1. Zero, one, or more process directives
  2. Zero, one, or more process inputs
  3. Zero, one, or more process outputs
  4. An optional boolean conditional to trigger the process execution
  5. The command to be executed
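To see how the blocks fit together, here is a minimal sketch of a process that uses all five of them in the expected order. The names WRITENAME and params.skip_empty are hypothetical, chosen only for illustration; with the when clause shown, the empty string should be skipped, so only the alpha and beta tasks are expected to run.

```nextflow
// Hypothetical example: WRITENAME and params.skip_empty are illustrative names
params.skip_empty = true

process WRITENAME {
    // (1) Directives: optional settings, declared first
    debug true
    tag "$name"

    // (2) Input: one value per task
    input:
    val name

    // (3) Output: a file whose name depends on the input value
    output:
    path "${name}.txt"

    // (4) When: skip the task when the input is an empty string
    when:
    !(params.skip_empty && name.isEmpty())

    // (5) Script: the last block in the process body
    script:
    """
    echo '$name' | tee ${name}.txt
    """
}

workflow {
    WRITENAME(Channel.of('alpha', '', 'beta'))
}
```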

6.1 Script

The script block is a string statement that defines the command to be executed by the process.

A process can execute only one script block. It must be the last statement when the process contains input and output declarations.

The script block can be a single or a multi-line string. The latter simplifies the writing of non-trivial scripts composed of multiple commands spanning over multiple lines. For example:

snippet.nf
process EXAMPLE {
    script:
    """
    echo 'Hello world!\nHola mundo!\nCiao mondo!\nHallo Welt!' > file
    cat file | head -n 1 | head -c 5 > chunk_1.txt
    gzip -c chunk_1.txt  > chunk_archive.gz
    """
}

workflow {
    EXAMPLE()
}

Tip

In the snippet below, the debug directive is used to enable debug mode for the process, which prints the output of the process script to the console.

By default, the process command is interpreted as a Bash script. However, any other scripting language can be used by simply starting the script with the corresponding Shebang declaration. For example:

snippet.nf
process PYSTUFF {
    debug true

    script:
    """
    #!/usr/bin/env python

    x = 'Hello'
    y = 'world!'
    print("%s - %s" % (x, y))
    """
}

workflow {
    PYSTUFF()
}
Output
Hello - world!

Tip

Multiple programming languages can be used within the same workflow script. However, for large chunks of code it is better to save them into separate files and invoke them from the process script. Executable scripts stored in the ./bin/ folder of the project are added to the task PATH, so they can be invoked by name.
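As a sketch, assume a hypothetical executable script bin/greet.py that accepts an equally hypothetical --name option. Because Nextflow adds the bin/ directory to the task PATH, the process can call the script by name:

```nextflow
// Assumes a hypothetical script bin/greet.py exists in the project directory
// and is executable (chmod +x bin/greet.py); its --name option is also hypothetical.
params.name = 'world'

process GREET {
    debug true

    script:
    """
    greet.py --name ${params.name}
    """
}

workflow {
    GREET()
}
```

Keeping the logic in its own file also makes it easier to test the script outside of Nextflow.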

6.1.1 Script parameters

Process scripts can reference pipeline parameters (params), whose values can be defined in the script and overridden from the command line. For example:

snippet.nf
params.data = 'World'

process FOO {
    debug true

    script:
    """
    echo Hello $params.data
    """
}

workflow {
    FOO()
}
Output
Hello World

Info

A process script can contain any string format supported by the Groovy programming language. This allows us to use string interpolation as in the script above or multiline strings. Refer to String interpolation for more information.
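Interpolation is not limited to plain variables: any Groovy expression inside ${...} is evaluated by Nextflow before the task runs. A minimal sketch (the process name SHOUT is illustrative) that should print Hello WORLD (5 letters):

```nextflow
params.data = 'World'

process SHOUT {
    debug true

    script:
    // ${...} expressions are evaluated by Nextflow when the script string is built
    """
    echo "Hello ${params.data.toUpperCase()} (${params.data.size()} letters)"
    """
}

workflow {
    SHOUT()
}
```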

Warning

Since Nextflow uses the same $ syntax as Bash for variable substitution in strings, Bash environment variables need to be escaped using the \ character. The escaped \$PWD is left for Bash to resolve when the task runs, returning the task directory (e.g. work/7f/f285b80022d9f61e82cd7f90436aa4/), while an unescaped $PWD would show the directory where you're running Nextflow.

snippet.nf
process FOO {
    debug true

    script:
    """
    echo "The current directory is \$PWD"
    """
}

workflow {
    FOO()
}

Your expected output will look something like this:

Output
The current directory is /workspace/gitpod/nf-training/work/7a/4b050a6cdef4b6c1333ce29f7059a0
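Escaped and unescaped variables can be mixed in the same script. In this sketch (the process name MIXED is illustrative), $params.data is replaced by Nextflow, while \$GREETING and \$PWD are passed through to Bash, so the output should read Hello World from, followed by the task directory:

```nextflow
params.data = 'World'

process MIXED {
    debug true

    script:
    // GREETING and PWD are escaped, so Bash resolves them; params.data is interpolated by Nextflow
    """
    GREETING='Hello'
    echo "\$GREETING $params.data from \$PWD"
    """
}

workflow {
    MIXED()
}
```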

It can be tricky to write a script that uses many Bash variables. One possible alternative is to use a script string delimited by single-quote characters (').

snippet.nf
process BAR {
    debug true

    script:
    '''
    echo "The current directory is $PWD"
    '''
}

workflow {
    BAR()
}

Your expected output will look something like this:

Output
The current directory is /workspace/gitpod/nf-training/work/7a/4b050a6cdef4b6c1333ce29f7059a0

However, using the single quotes (') will block the usage of Nextflow variables in the command script.

Another alternative is to use a shell statement instead of script. A shell block is written as a single-quoted string in which Nextflow variables use the !{..} syntax, while $ is left to Bash. This allows the use of both Nextflow and Bash variables in the same script.

snippet.nf
params.data = 'le monde'

process BAZ {
    shell:
    '''
    X='Bonjour'
    echo $X !{params.data}
    '''
}

workflow {
    BAZ()
}

6.1.2 Conditional script

The process script can also be defined in a completely dynamic manner using an if statement or any other expression for evaluating a string value. For example:

snippet.nf
params.compress = 'gzip'
params.file2compress = "$projectDir/data/ggal/transcriptome.fa"

process FOO {
    debug true

    input:
    path file

    script:
    if (params.compress == 'gzip')
        """
        echo "gzip -c $file > ${file}.gz"
        """
    else if (params.compress == 'bzip2')
        """
        echo "bzip2 -c $file > ${file}.bz2"
        """
    else
        throw new IllegalArgumentException("Unknown compressor $params.compress")
}

workflow {
    FOO(params.file2compress)
}

Exercise

Execute this script using the command line to choose bzip2 compression.

Solution

Execute the following command:

nextflow run snippet.nf --compress bzip2

The output will look like this:

Output
bzip2 -c transcriptome.fa > transcriptome.fa.bz2
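The same choice can also be expressed without an if/else chain. The sketch below (the process name COMPRESS and the extensions map are illustrative) looks up the file extension in a Groovy map and, like the conditional FOO process above, rejects unknown compressors:

```nextflow
params.compress = 'gzip'
params.file2compress = "$projectDir/data/ggal/transcriptome.fa"

process COMPRESS {
    debug true

    input:
    path file

    script:
    // Map each supported compressor to its file extension
    def extensions = [gzip: 'gz', bzip2: 'bz2']
    def ext = extensions[params.compress]
    if (!ext)
        throw new IllegalArgumentException("Unknown compressor $params.compress")
    """
    echo "${params.compress} -c $file > ${file}.${ext}"
    """
}

workflow {
    COMPRESS(params.file2compress)
}
```

Running it with --compress bzip2 should print the same command as the solution above.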

Summary

In this step you have learned:

  1. How to use the script declaration to define the command to be executed by the process
  2. How to use the params variable to define dynamic script parameters
  3. How to use the shell declaration to define the command to be executed by the process
  4. How to use the if statement to define a conditional script

6.2 Inputs

Nextflow process instances (tasks) are isolated from each other but can communicate between themselves by sending values through channels.

Inputs implicitly determine the dependencies and the parallel execution of the process. The process execution is fired each time new data is ready to be consumed from the input channel:

[Figure: a channel emits data x, data y, and data z into a process; each item spawns its own task (task 1, task 2, task 3), producing output x, output y, and output z.]

The input block defines the names and qualifiers of the variables that receive the channel elements directed at the process. A process can define only one input block, and it must contain one or more input declarations.

The input block follows the syntax shown below:

input:
<input qualifier> <input name>

There are several input qualifiers that can be used to define the input declaration. The most common are outlined in detail below.

6.2.1 Input values

The val qualifier allows you to receive data of any type as input. It can be accessed in the process script by using the specified input name. For example:

snippet.nf
num = Channel.of(1, 2, 3)

process BASICEXAMPLE {
    debug true

    input:
    val x

    script:
    """
    echo process job $x
    """
}

workflow {
    BASICEXAMPLE(num)
}

In the above example, the process is executed three times: each time a value is received from the num channel, it is used by the script. This results in output similar to the following:

Output
process job 1
process job 2
process job 3

Warning

The channel guarantees that items are delivered in the same order as they were sent, but since the process is executed in parallel, there is no guarantee that they are processed in the same order as they are received.

6.2.2 Input files

The path qualifier allows the handling of file values in the process execution context. Nextflow stages each file in the process execution directory, where the script can access it using the name specified in the input declaration. For example:

snippet.nf
reads = Channel.fromPath('data/ggal/*.fq')

process FOO {
    debug true

    input:
    path 'sample.fastq'

    script:
    """
    ls sample.fastq
    """
}

workflow {
    result = FOO(reads)
}

In this case, the process is executed six times and prints sample.fastq each time: regardless of the original file name (e.g., lung_1.fq), each input file is staged under the name given in the input declaration.

Output
sample.fastq
sample.fastq
sample.fastq
sample.fastq
sample.fastq
sample.fastq

The input file name can also be defined using a variable reference as shown below:

snippet.nf
reads = Channel.fromPath('data/ggal/*.fq')

process FOO {
    debug true

    input:
    path sample

    script:
    """
    ls  $sample
    """
}

workflow {
    result = FOO(reads)
}

In this case, the process is executed six times and prints the actual name of each input file (e.g., lung_1.fq).

Output
lung_1.fq
gut_2.fq
liver_2.fq
lung_2.fq
liver_1.fq
gut_1.fq

The same syntax is also able to handle more than one input file in the same execution and only requires changing the channel composition using an operator (e.g., collect).

snippet.nf
reads = Channel.fromPath('data/ggal/*.fq')

process FOO {
    debug true

    input:
    path sample

    script:
    """
    ls $sample
    """
}

workflow {
    FOO(reads.collect())
}

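Note that while the output lists the same files, this process is only executed once, with all six files staged together. Since ls sorts its arguments, the names appear in alphabetical order:

Output
gut_1.fq
gut_2.fq
liver_1.fq
liver_2.fq
lung_1.fq
lung_2.fq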

Warning

In the past, the file qualifier was used for files, but the path qualifier should be preferred over file to handle process input files when using Nextflow 19.10.0 or later. When a process declares an input file, the corresponding channel elements must be file objects, created either with the file helper function or by a file-specific channel factory (e.g., Channel.fromPath or Channel.fromFilePairs).
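For example, a single file can be passed to a path input by converting its location with the file helper function. A minimal sketch (SHOWFILE and params.transcriptome are illustrative names):

```nextflow
params.transcriptome = "$projectDir/data/ggal/transcriptome.fa"

process SHOWFILE {
    debug true

    input:
    path fasta

    script:
    """
    ls -l $fasta
    """
}

workflow {
    // file() returns a file object; Channel.fromPath(params.transcriptome) would also work
    SHOWFILE(file(params.transcriptome))
}
```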

6.2.3 Combine input channels

A key feature of processes is the ability to handle inputs from multiple channels. However, it’s important to understand how channel contents and their semantics affect the execution of a process.

Consider the following example:

snippet.nf
ch1 = Channel.of(1, 2, 3)
ch2 = Channel.of('a', 'b', 'c')

process FOO {
    debug true

    input:
    val x
    val y

    script:
    """
    echo $x and $y
    """
}

workflow {
    FOO(ch1, ch2)
}

Both channels emit three values, therefore the process is executed three times, each time with a different pair:

Output
1 and a
3 and c
2 and b

The process waits until there’s a complete input configuration, i.e., it receives an input value from all the channels declared as input.

When this condition is met, it consumes the input values coming from the respective channels, spawns a task execution, then repeats the same logic until one or more channels have no more content.

This means channel values are consumed serially one after another and the first empty channel causes the process execution to stop, even if there are other values in other channels.
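When every value of one channel should instead be paired with every value of another, one option is to combine the channels before they reach the process, for example with the combine operator, and to receive each pair through a single tuple input (tuples are introduced in section 6.3.5). A sketch, with the process name PAIRS chosen for illustration; it should print six lines in no particular order:

```nextflow
process PAIRS {
    debug true

    input:
    tuple val(x), val(y)

    script:
    """
    echo $x and $y
    """
}

workflow {
    ch1 = Channel.of(1, 2, 3)
    ch2 = Channel.of('a', 'b')
    // combine emits the Cartesian product: [1, a], [1, b], [2, a], ... (six tuples)
    PAIRS(ch1.combine(ch2))
}
```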

What happens when channels do not have the same cardinality (i.e., they emit a different number of elements)?

snippet.nf
ch1 = Channel.of(1, 2, 3)
ch2 = Channel.of('a')

process FOO {
    debug true

    input:
    val x
    val y

    script:
    """
    echo $x and $y
    """
}

workflow {
    FOO(ch1, ch2)
}

In the above example, the process is only executed once because the process stops when a channel has no more data to be processed.

Output
1 and a

However, replacing ch2 with a value channel will cause the process to be executed three times, each time with the same value, a:

snippet.nf
ch1 = Channel.of(1, 2, 3)
ch2 = Channel.value('a')

process FOO {
    debug true

    input:
    val x
    val y

    script:
    """
    echo $x and $y
    """
}

workflow {
    FOO(ch1, ch2)
}
Output
1 and a
2 and a
3 and a

As ch2 is now a value channel, it can be consumed multiple times and does not affect process termination.
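A queue channel can also be turned into a value channel. For example, the first operator returns a value channel holding the first item emitted by its source, so the following variant of the previous example is expected to produce the same three lines:

```nextflow
process FOO {
    debug true

    input:
    val x
    val y

    script:
    """
    echo $x and $y
    """
}

workflow {
    ch1 = Channel.of(1, 2, 3)
    // first() returns a value channel, so 'a' is reused by every task
    ch2 = Channel.of('a').first()
    FOO(ch1, ch2)
}
```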

Exercise

Write a process that is executed for each read file matching the pattern data/ggal/*_1.fq and use the same data/ggal/transcriptome.fa in each execution.

Solution

One possible solution is shown below:

snippet.nf
params.reads = "$projectDir/data/ggal/*_1.fq"
params.transcriptome_file = "$projectDir/data/ggal/transcriptome.fa"

Channel
    .fromPath(params.reads)
    .set { read_ch }

process COMMAND {
    debug true

    input:
    path reads
    path transcriptome

    script:
    """
    echo $reads $transcriptome
    """
}

workflow {
    COMMAND(read_ch, params.transcriptome_file)
}

You may also consider using other Channel factories or operators to create your input channels.
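For example, the transcriptome can be wrapped explicitly in a value channel with Channel.value, which makes it clear that the same file is reused by every task. A sketch of this variant of the solution:

```nextflow
params.reads = "$projectDir/data/ggal/*_1.fq"
params.transcriptome_file = "$projectDir/data/ggal/transcriptome.fa"

process COMMAND {
    debug true

    input:
    path reads
    path transcriptome

    script:
    """
    echo $reads $transcriptome
    """
}

workflow {
    read_ch = Channel.fromPath(params.reads)
    // Channel.value wraps the single file in a value channel, reused by every task
    transcriptome_ch = Channel.value(file(params.transcriptome_file))
    COMMAND(read_ch, transcriptome_ch)
}
```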

6.2.4 Input repeaters

The each qualifier allows you to repeat the execution of a process for each item in a collection every time new data is received. For example:

snippet.nf
sequences = Channel.fromPath("$projectDir/data/ggal/*_1.fq")
methods = ['regular', 'espresso']

process ALIGNSEQUENCES {
    debug true

    input:
    path seq
    each mode

    script:
    """
    echo t_coffee -in $seq -mode $mode
    """
}

workflow {
    ALIGNSEQUENCES(sequences, methods)
}
Output
t_coffee -in gut_1.fq -mode regular
t_coffee -in lung_1.fq -mode espresso
t_coffee -in liver_1.fq -mode regular
t_coffee -in gut_1.fq -mode espresso
t_coffee -in lung_1.fq -mode regular
t_coffee -in liver_1.fq -mode espresso

In the above example, every time a file of sequences is received as an input by the process, it executes two tasks, one for each alignment method in the methods list, set as the mode variable. This is useful when you need to repeat the same task for a given set of parameters.

Exercise

Extend the previous example so a task is executed for an additional type of coffee.

Solution

Modify the methods list and add another coffee type:

snippet.nf
sequences = Channel.fromPath("$projectDir/data/ggal/*_1.fq")
methods = ['regular', 'espresso', 'cappuccino']

process ALIGNSEQUENCES {
    debug true

    input:
    path seq
    each mode

    script:
    """
    echo t_coffee -in $seq -mode $mode
    """
}

workflow {
    ALIGNSEQUENCES(sequences, methods)
}

Your output will look something like this:

Output
t_coffee -in gut_1.fq -mode regular
t_coffee -in lung_1.fq -mode regular
t_coffee -in gut_1.fq -mode espresso
t_coffee -in liver_1.fq -mode cappuccino
t_coffee -in liver_1.fq -mode espresso
t_coffee -in lung_1.fq -mode espresso
t_coffee -in liver_1.fq -mode regular
t_coffee -in gut_1.fq -mode cappuccino
t_coffee -in lung_1.fq -mode cappuccino
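Conceptually, each pairs every incoming item with every element of the collection. As a sketch of an alternative (not the approach this training recommends), the same pairs can be built with the combine operator and received through a tuple input; with the two original methods, six lines are expected:

```nextflow
methods = ['regular', 'espresso']

process ALIGNSEQUENCES {
    debug true

    input:
    tuple path(seq), val(mode)

    script:
    """
    echo t_coffee -in $seq -mode $mode
    """
}

workflow {
    sequences = Channel.fromPath("$projectDir/data/ggal/*_1.fq")
    // combine emits one [file, mode] tuple per file/method pair
    ALIGNSEQUENCES(sequences.combine(Channel.fromList(methods)))
}
```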

Summary

In this step you have learned:

  1. How to use the val qualifier to define the input channel(s) of a process
  2. How to use the path qualifier to define the input file(s) of a process
  3. How to use the each qualifier to repeat the execution of a process for each item in a collection

6.3 Outputs

The output declaration block defines the channels used by the process to send out the results produced.

A process can define only one output block, which can contain one or more output declarations. The output block follows the syntax shown below:

output:
<output qualifier> <output name>, emit: <output channel>

6.3.1 Output values

The val qualifier emits a value from the process context, such as an input variable, on the output channel. Values are frequently declared in both the input and output blocks, as shown in the following example:

snippet.nf
greeting = "Hello world!"

process FOO {
    input:
    val x

    output:
    val x

    script:
    """
    echo $x > file
    """
}

workflow {
    FOO(Channel.of(greeting))
        .view()
}

6.3.2 Output files

The path qualifier declares one or more files produced by the process, which are emitted on the output channel.

snippet.nf
process RANDOMNUM {
    output:
    path 'result.txt'

    script:
    """
    echo \$RANDOM > result.txt
    """
}

workflow {
    receiver_ch = RANDOMNUM()
    receiver_ch.view()
}

In the above example the process RANDOMNUM creates a file named result.txt containing a random number.

Since a file with the same name is declared in the output block, the file is emitted on the receiver_ch channel when the task completes. A downstream process that takes this channel as input will receive it.
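As a sketch of that hand-off, a hypothetical downstream process SHOWNUM can take the output of RANDOMNUM directly as its input and print the number stored in the file. Note the escaped \$ in the Bash command substitution:

```nextflow
process RANDOMNUM {
    output:
    path 'result.txt'

    script:
    """
    echo \$RANDOM > result.txt
    """
}

// Hypothetical downstream process that reads the file produced upstream
process SHOWNUM {
    debug true

    input:
    path number_file

    script:
    """
    echo "Random number: \$(cat $number_file)"
    """
}

workflow {
    SHOWNUM(RANDOMNUM())
}
```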

6.3.3 Multiple output files

When an output file name contains a wildcard character (* or ?), it is interpreted as a glob path matcher. This allows us to capture multiple files into a list object and output them as a single emission. For example:

snippet.nf
process SPLITLETTERS {
    output:
    path 'chunk_*'

    script:
    """
    printf 'Hola' | split -b 1 - chunk_
    """
}

workflow {
    letters = SPLITLETTERS()
    letters.view()
}

Prints the following:

Output
[/workspace/gitpod/nf-training/work/ca/baf931d379aa7fa37c570617cb06d1/chunk_aa, /workspace/gitpod/nf-training/work/ca/baf931d379aa7fa37c570617cb06d1/chunk_ab, /workspace/gitpod/nf-training/work/ca/baf931d379aa7fa37c570617cb06d1/chunk_ac, /workspace/gitpod/nf-training/work/ca/baf931d379aa7fa37c570617cb06d1/chunk_ad]

Some caveats on glob pattern behavior:

  • Input files are not included in the list of possible matches
  • Glob pattern matches both files and directory paths
  • When the double asterisk pattern ** is used to recurse across directories, only file paths are matched, i.e., directories are not included in the result list.

Exercise

Add the flatMap operator and see how the output changes. Refer to the flatMap operator documentation for details.

Solution

Add the flatMap operator to the letters channel.

snippet.nf
process SPLITLETTERS {
    output:
    path 'chunk_*'

    script:
    """
    printf 'Hola' | split -b 1 - chunk_
    """
}

workflow {
    letters = SPLITLETTERS()
    letters.flatMap().view()
}

Your output will look something like this:

Output
/workspace/gitpod/nf-training/work/54/9d79f9149f15085e00dde2d8ead150/chunk_aa
/workspace/gitpod/nf-training/work/54/9d79f9149f15085e00dde2d8ead150/chunk_ab
/workspace/gitpod/nf-training/work/54/9d79f9149f15085e00dde2d8ead150/chunk_ac
/workspace/gitpod/nf-training/work/54/9d79f9149f15085e00dde2d8ead150/chunk_ad

6.3.4 Dynamic output file names

When an output file name needs to be expressed dynamically, it is possible to define it using a dynamic string that references values defined in the input declaration block or in the script global context. For example:

snippet.nf
species = ['cat', 'dog', 'sloth']
sequences = ['AGATAG', 'ATGCTCT', 'ATCCCAA']

Channel
    .fromList(species)
    .set { species_ch }

process ALIGN {
    input:
    val x
    val seq

    output:
    path "${x}.aln"

    script:
    """
    echo align -in $seq > ${x}.aln
    """
}

workflow {
    genomes = ALIGN(species_ch, sequences)
    genomes.view()
}

In the above example, each time the process is executed an alignment file is produced whose name depends on the actual value of the x input.
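Note that in the example above, sequences is a plain list rather than a channel, so Nextflow passes it as a single value and each task receives the whole list as seq. If each species should instead be aligned with its own sequence, one option is to emit the pairs as tuples (the tuple qualifier is described in the next subsection); a sketch:

```nextflow
pairs = [['cat', 'AGATAG'], ['dog', 'ATGCTCT'], ['sloth', 'ATCCCAA']]

process ALIGN {
    input:
    tuple val(x), val(seq)

    output:
    path "${x}.aln"

    script:
    """
    echo align -in $seq > ${x}.aln
    """
}

workflow {
    // Each list element is emitted as one [species, sequence] tuple
    genomes = ALIGN(Channel.fromList(pairs))
    genomes.view()
}
```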

6.3.5 Composite inputs and outputs

So far you have seen how to declare multiple input and output channels that can handle one value at a time. However, Nextflow can also handle a tuple of values.

The input and output declarations for tuples must be declared with a tuple qualifier followed by the definition of each element in the tuple.

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path('sample.bam')

    script:
    """
    echo your_command_here --sample $sample_id_paths > sample.bam
    """
}

workflow {
    sample_ch = FOO(reads_ch)
    sample_ch.view()
}

The output will look something like this:

Output
[lung, /workspace/gitpod/nf-training/work/23/fe268295bab990a40b95b7091530b6/sample.bam]
[liver, /workspace/gitpod/nf-training/work/32/656b96a01a460f27fa207e85995ead/sample.bam]
[gut, /workspace/gitpod/nf-training/work/ae/3cfc7cf0748a598c5e2da750b6bac6/sample.bam]

Exercise

Modify the script of the previous exercise so that the output .bam file is named after the given sample_id.

Solution
snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path("${sample_id}.bam")

    script:
    """
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bam
    """
}

workflow {
    sample_ch = FOO(reads_ch)
    sample_ch.view()
}

6.3.6 Output definitions

Nextflow allows the use of alternative output definitions within workflows to simplify your code.

You can also access the outputs of a process using the .out attribute:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path('sample.bam')
    tuple val(sample_id), path('sample.bai')

    script:
    """
    echo your_command_here --sample $sample_id_paths > sample.bam
    echo your_command_here --sample $sample_id_paths > sample.bai
    """
}

workflow {
    FOO(reads_ch)
    FOO.out.view()
}

This command will produce an error message, because .view() operates on single channels, and FOO.out contains multiple channels.

If a process defines two or more output channels, each channel can be accessed by indexing the .out attribute, e.g., .out[0], .out[1], etc. This example views only the first output, .out[0]:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path('sample.bam')
    tuple val(sample_id), path('sample.bai')

    script:
    """
    echo your_command_here --sample $sample_id_paths > sample.bam
    echo your_command_here --sample $sample_id_paths > sample.bai
    """
}

workflow {
    FOO(reads_ch)
    FOO.out[0].view()
}

Alternatively, the process output definition allows the use of the emit statement to define a named identifier that can be used to reference the channel in the external scope.

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path('sample.bam'), emit: bam
    tuple val(sample_id), path('sample.bai'), emit: bai

    script:
    """
    echo your_command_here --sample $sample_id_paths > sample.bam
    echo your_command_here --sample $sample_id_paths > sample.bai
    """
}

workflow {
    FOO(reads_ch)
    FOO.out.bam.view()
}

Exercise

Modify the previous example so that the bai output channel is printed to your terminal.

Solution

Your workflow will look something like this:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path('sample.bam'), emit: bam
    tuple val(sample_id), path('sample.bai'), emit: bai

    script:
    """
    echo your_command_here --sample $sample_id_paths > sample.bam
    echo your_command_here --sample $sample_id_paths > sample.bai
    """
}

workflow {
    FOO(reads_ch)
    FOO.out.bai.view()
}
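Named outputs also make it easy to connect processes, because a named channel such as FOO.out.bam can be passed directly as the input of a downstream process. In the sketch below, BAR is a hypothetical process added only for illustration. It prints which BAM file it received for each sample.

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path('sample.bam'), emit: bam
    tuple val(sample_id), path('sample.bai'), emit: bai

    script:
    """
    echo your_command_here --sample $sample_id_paths > sample.bam
    echo your_command_here --sample $sample_id_paths > sample.bai
    """
}

// BAR is a hypothetical downstream process, used only to show how a named output is consumed
process BAR {
    debug true

    input:
    tuple val(sample_id), path(bam)

    script:
    """
    echo "processing $bam for $sample_id"
    """
}

workflow {
    FOO(reads_ch)
    // Only the channel named 'bam' is passed on; FOO.out.bai is left unused
    BAR(FOO.out.bam)
}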

Summary

In this step you have learned:

  1. How to use the val qualifier to define the output channel(s) of a process
  2. How to use the path qualifier to define the output file(s) of a process
  3. How to use the tuple qualifier to define the output channel(s) of a process
  4. How to manage multiple output files using glob patterns
  5. How to use dynamic output file names
  6. How to use composite inputs and outputs
  7. How to define outputs

6.4 When

The when declaration allows you to define a condition that must be satisfied for the process to execute. This can be any expression that evaluates to a boolean value.

It is useful for enabling or disabling process execution depending on the state of inputs and parameters. In the following example, a task only runs for files whose names start with BB11, and only when the database type is nr:

snippet.nf
params.dbtype = 'nr'
params.prot = 'data/prots/*.tfa'
proteins = Channel.fromPath(params.prot)

process FIND {
    debug true

    input:
    path fasta
    val type

    when:
    fasta.name =~ /^BB11.*/ && type == 'nr'

    script:
    """
    echo blastp -query $fasta -db nr
    """
}

workflow {
    result = FIND(proteins, params.dbtype)
}
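The condition can also depend on a pipeline parameter alone, which gives you a simple way to switch a process on or off from the command line. The sketch below assumes a hypothetical params.skip_find flag that is not part of the training material. When the flag is true, the when clause evaluates to false and the FIND tasks are skipped.

snippet.nf
params.prot = 'data/prots/*.tfa'
params.skip_find = false // hypothetical flag, used only for this illustration

process FIND {
    debug true

    input:
    path fasta

    // Tasks run only when the flag is not set
    when:
    !params.skip_find

    script:
    """
    echo blastp -query $fasta -db nr
    """
}

workflow {
    FIND(Channel.fromPath(params.prot))
}

Running the script with --skip_find on the command line sets the flag to true, so no FIND task is executed.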

Summary

In this step you have learned:

  1. How to use the when declaration to allow conditional processes

6.5 Directives

Directive declarations allow the definition of optional settings that affect the execution of the current process without affecting the semantics of the task itself.

They must be entered at the top of the process body, before any other declaration blocks (i.e., input, output, etc.).

Directives are commonly used to define the amount of computing resources to be used. They can also set other meta information, such as extra configuration or logging information. For example:

snippet.nf
process FOO {
    cpus 2
    memory 1.GB
    container 'image/name'

    script:
    """
    echo your_command --this --that
    """
}

The complete list of directives is available in the Nextflow documentation. Some of the most common ones are described in detail below.

6.5.1 Resource allocation

The following directives define the amount of computing resources to be used by the process:

Name    Description
cpus    The number of (logical) CPUs required by the process's task.
time    How long the task is allowed to run (e.g., 1s: 1 second, 1m: 1 minute, 1h: 1 hour, 1d: 1 day).
memory  How much memory the task is allowed to use (e.g., 2.GB or '2 GB'). Supported units are B, KB, MB, GB, and TB.
disk    How much local disk storage the task is allowed to use, using the same units as memory.

These directives can be used in combination with each other to allocate specific resources to each process. For example:

snippet.nf
process FOO {
    cpus 2
    memory 1.GB
    time '1h'
    disk '10 GB'

    script:
    """
    echo your_command --this --that
    """
}
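As the example above shows, resource values can be written either as quoted strings (e.g., '1h', '10 GB') or as unit literals (e.g., 1.GB). The sketch below places both notations side by side in a hypothetical process named BAR. The resource values are arbitrary and only illustrative.

snippet.nf
process BAR {
    cpus 4
    memory '2 GB'   // string notation, equivalent to: memory 2.GB
    time 1.hour     // literal notation, equivalent to: time '1h'
    disk 500.MB     // literal notation, equivalent to: disk '500 MB'

    script:
    """
    echo your_command --this --that
    """
}

workflow {
    // BAR declares no inputs, so it is called without arguments
    BAR()
}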

6.5.2 PublishDir directive

Because each task is executed in a separate temporary work/ folder (e.g., work/f1/850698…), you may want to save important, non-intermediate, and/or final files in a results folder.

To store your workflow's result files, you need to explicitly mark them using the publishDir directive in the process that creates the files. For example:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    publishDir "results", pattern: "*.bam"

    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path("*.bam")
    tuple val(sample_id), path("*.bai")

    script:
    """
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bam
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bai
    """
}

workflow {
    FOO(reads_ch)
}

The above example will publish all BAM files created by the FOO process to the results directory.

Tip

The publish directory can be local or remote. For example, output files could be stored using an AWS S3 bucket by using the s3:// prefix in the target path.
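The mode option of publishDir controls how files are published. The default, 'symlink', creates symbolic links to the files in the work/ folder. Those links break if that folder is deleted. Setting mode: 'copy' writes real copies instead. The sketch below adapts the previous example to use it:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    // mode: 'copy' writes real copies of the BAM files instead of the default symbolic links
    publishDir "results", mode: 'copy', pattern: "*.bam"

    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path("*.bam")
    tuple val(sample_id), path("*.bai")

    script:
    """
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bam
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bai
    """
}

workflow {
    FOO(reads_ch)
}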

You can use more than one publishDir to keep different outputs in separate directories. For example:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    publishDir "results/bam", pattern: "*.bam"
    publishDir "results/bai", pattern: "*.bai"

    input:
    tuple val(sample_id), path(sample_id_paths)

    output:
    tuple val(sample_id), path("*.bam")
    tuple val(sample_id), path("*.bai")

    script:
    """
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bam
    echo your_command_here --sample $sample_id_paths > ${sample_id}.bai
    """
}

workflow {
    FOO(reads_ch)
}

Exercise

Edit the publishDir directive in the previous example to store the output files for each sample in a different directory.

Solution

Your solution could look something like this:

snippet.nf
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    publishDir "results/$sample_id", pattern: "*.{bam,bai}"

    input:
...

Summary

In this step you have learned:

  1. How to use the cpus, time, memory, and disk directives to define the amount of computing resources to be used by the process
  2. How to use the publishDir directive to store the output files in a results folder