5. Workflow Structure
There are three directories in a Nextflow workflow repository that have a special purpose: bin, templates, and lib.
The bin directory (if it exists) is always added to the $PATH for all tasks. If the tasks are performed on a remote machine, the directory is copied across to the new machine before the task begins. This Nextflow feature is designed to make it easy to include accessory scripts directly in the workflow without having to commit those scripts into the container. It also ensures that the scripts used inside the workflow move on the same revision schedule as the workflow itself.
It is important to know that Nextflow will take care of updating $PATH and ensuring the files are available wherever the task is running, but will not change the permissions of any files in that directory. If a file is called by a task as an executable, the workflow developer must ensure that the file has the correct permissions to be executed.
For example, let's say we have a small R script that produces a csv and a tsv:
We'd like to use this script in a simple workflow:
To do this, we create the bin directory and write our R script into it. Finally, and crucially, we make the script executable:
Always ensure that your scripts are executable. The scripts will not be available to your Nextflow processes without this step.
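The three steps above (create bin, write the script, set the executable bit) can be sketched in the shell. This is a minimal sketch run in a scratch directory; hello.sh and its contents are hypothetical stand-ins for the R script, chosen so the example runs without R installed.

```shell
# Work in a throwaway directory so no real repository is touched.
demo_dir="$(mktemp -d)"

# 1. Create the bin directory at the top level of the workflow repository.
mkdir -p "$demo_dir/bin"

# 2. Write a helper script into it (hello.sh is a hypothetical stand-in for cars.R).
cat > "$demo_dir/bin/hello.sh" <<'EOF'
#!/usr/bin/env bash
echo "hello from bin"
EOF

# A freshly written file is normally not executable.
if [ -x "$demo_dir/bin/hello.sh" ]; then before="executable"; else before="not executable"; fi

# 3. Crucially, set the executable bit.
chmod +x "$demo_dir/bin/hello.sh"
if [ -x "$demo_dir/bin/hello.sh" ]; then after="executable"; else after="not executable"; fi

echo "before chmod: $before"
echo "after chmod:  $after"
```

In a real workflow the same chmod +x would be applied to the script inside the repository's own bin directory, and the permission change committed alongside the script.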
Let's run the script and see what Nextflow is doing for us behind the scenes:
and then inspect the .command.run file that Nextflow has generated:
You'll notice an nxf_container_env bash function that appends our bin directory to $PATH.
When working on the cloud, Nextflow will also ensure that the bin directory is copied onto the virtual machine running your task, in addition to modifying $PATH.
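The effect of that $PATH change can be imitated by hand. The sketch below is not the contents of .command.run; it only illustrates, with a hypothetical script named say_hi.sh, why a script in bin can be called by name once the directory is on $PATH.

```shell
# Scratch directory standing in for a workflow repository.
path_demo="$(mktemp -d)"
mkdir -p "$path_demo/bin"

# Hypothetical helper script, made executable as described earlier.
cat > "$path_demo/bin/say_hi.sh" <<'EOF'
#!/usr/bin/env bash
echo "found on PATH"
EOF
chmod +x "$path_demo/bin/say_hi.sh"

# Append the bin directory to PATH, roughly what happens for each task.
export PATH="$PATH:$path_demo/bin"

# The script can now be called by name, with no directory prefix.
resolved="$(command -v say_hi.sh)"
path_output="$(say_hi.sh)"

echo "$resolved"
echo "$path_output"
```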
Always use a portable shebang line in your bin directory scripts.
In the R script example shown above, I may have the Rscript program installed at (for example) /opt/homebrew/bin/Rscript. If I hard-code this path into my cars.R, everything will work when I'm testing locally outside of the Docker container, but it will fail when running with Docker/Singularity or in the cloud, as the Rscript program may be installed in a different location in those contexts.
It is strongly recommended to use #!/usr/bin/env when setting the shebang for scripts in the bin directory to ensure maximum portability.
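As a small illustration of the recommendation, the sketch below writes a script whose first line asks env to look the interpreter up on $PATH. The file name portable.sh is hypothetical, and bash is used as the interpreter so the example runs without R; for the R script the equivalent first line would be #!/usr/bin/env Rscript.

```shell
shebang_demo="$(mktemp -d)"

# Non-portable: only valid on machines that install the interpreter at this exact path.
#   #!/opt/homebrew/bin/Rscript
# Portable: env searches PATH for the interpreter wherever the task runs.
#   #!/usr/bin/env Rscript

cat > "$shebang_demo/portable.sh" <<'EOF'
#!/usr/bin/env bash
echo "interpreter found via PATH"
EOF
chmod +x "$shebang_demo/portable.sh"

# The first line of the file is the shebang.
shebang_line="$(head -n 1 "$shebang_demo/portable.sh")"

# Running the file directly lets the kernel hand it to env, which locates bash.
shebang_output="$("$shebang_demo/portable.sh")"

echo "$shebang_line"
echo "$shebang_output"
```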
If a process script block is becoming too long, it can be moved to a template file. The template file can then be imported into the process script block using the template method. This is useful for keeping the process block tidy and readable. Nextflow's use of $ to indicate variables also allows for directly testing the template file by running it as a script.
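The last point can be sketched as follows. Because a template refers to its inputs with $, the same file can be run outside Nextflow by supplying those variables through the environment. The template name greet.sh and the variable name are hypothetical; the templates directory name follows the Nextflow convention of keeping template files next to the workflow script.

```shell
template_demo="$(mktemp -d)"
mkdir -p "$template_demo/templates"

# Hypothetical template: inside a process, Nextflow would substitute $name
# from the process inputs before the task runs.
cat > "$template_demo/templates/greet.sh" <<'EOF'
#!/usr/bin/env bash
echo "Hello, $name"
EOF

# Outside Nextflow, the file runs as a plain script when the variable
# is set in the environment instead.
greeting="$(name="Montreal" bash "$template_demo/templates/greet.sh")"

echo "$greeting"
```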
The structure directory already contains an example template: a very simple Python script. We can add a new process that uses this template:
In the previous chapter, we saw the addition of small helper Groovy functions to the main.nf file. It may at times be helpful to bundle functionality into a new Groovy class. Any classes defined in the lib directory are available for use in the workflow, in both main.nf and any imported modules.
Classes defined in the lib directory can be used for a variety of purposes. For example, the nf-core/rnaseq workflow uses five custom classes:
NfcoreSchema.groovy for parsing the schema.json file and validating the workflow parameters.
NfcoreTemplate.groovy for email templating and nf-core utility functions.
Utils.groovy for provision of a single
WorkflowMain.groovy for workflow setup and to call the
WorkflowRnaseq.groovy for the workflow-specific functions.
The classes listed above all provide utilities that are executed at the beginning of a workflow, and are generally used to "set up" the workflow. However, classes defined in lib can also be used to provide functionality to the workflow itself.
5.3.1 Making a Metadata Class
Let's consider an example where we create our own custom class to handle metadata. We can create a new class in ./lib/Metadata.groovy. We'll extend the built-in HashMap class, and add a simple method to return a value:
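A minimal sketch of such a class, written from the shell so the file lands in a lib directory. The method name hi is the one used later in this section; the greeting text it returns is an illustrative assumption, and the scratch directory stands in for the real repository, where the file would be ./lib/Metadata.groovy.

```shell
# Scratch directory standing in for the workflow repository.
lib_demo="$(mktemp -d)"
mkdir -p "$lib_demo/lib"

# Write the Groovy class. The quoted 'EOF' keeps the shell from touching the contents.
cat > "$lib_demo/lib/Metadata.groovy" <<'EOF'
// Illustrative sketch: the greeting text is an assumption.
class Metadata extends HashMap {
    // Simple method that returns a fixed value.
    def hi() {
        return "Hello, workshop participants!"
    }
}
EOF

# Show what was written.
cat "$lib_demo/lib/Metadata.groovy"
```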
We can then use this class in our workflow:
We can use the new hi method in the workflow:
At the moment, the Metadata class is not making use of the "Montreal" being passed into the closure. Let's change that by adding a constructor to the class:
Which we can use like so:
We can also use this method when passing the object to a process:
Why might this be helpful? You can add extra methods to the Metadata class whose values are computed from the existing metadata. For example, we might want to grab the adapter prefix:
Which we might use like so:
You might even want to reach out to external services such as a LIMS or the E-utilities API. Here we add a dummy "getSampleName()" method that reaches out to a public API:
Which we can use like so:
When we start passing custom classes through the workflow, it's important to understand a little about the Nextflow caching mechanism. When a task is run, a unique hash is calculated based on the task name, the input files/values, and the input parameters. Our class extends from HashMap, which means that the hash will be calculated based on the contents of the HashMap. If we add a new method to the class, or amend a class method, this does not change the value of the objects in the hash, which means that the hash will not change.
Can you show that changing a method in our Metadata class does not change the hash?
We are not limited to using or extending the built-in Groovy classes. Let's start by creating a Dog class in
We can create a new dog at the beginning of the workflow:
We can pass objects of our class through channels. Here we take a channel of dog names and create a channel of dogs:
If we try to use this new class in a resumed process, no caches will be used.
Show that the Dog class is not cached when resuming a workflow.
5.3.2 Making a ValueObject
Nextflow has provided a decorator to help serialize your custom classes. By adding @ValueObject to the class definition, Nextflow will automatically serialize the class and cache it. This is useful if you want to pass a custom class through a channel, or if you want to use the class in a resumed workflow.
Let's add the decorator to our
Lastly, we will need to register the class with Kryo, the Java serialization framework. Again, Nextflow provides a helper method to do this. We can add the following to the
Show that the Dog class can now be used in processes and cached correctly.