Part 1: Hello World¶
A "Hello, World!" is a minimalist example that is meant to demonstrate the basic syntax and structure of a programming language or software framework. The example typically consists of printing the phrase "Hello, World!" to the output device, such as the console or terminal, or writing it to a file.
0. Warmup: Run Hello World directly¶
Let's demonstrate this with a simple command that we run directly in the terminal, to show what it does before we wrap it in Nextflow.
1. Make the terminal say hello¶
2. Now make it write the text output to a file¶
3. Verify that the output file is there using the ls command¶
4. Show the file contents¶
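The four warmup steps above can be run as plain shell commands. This is a minimal sketch: the filename output.txt is borrowed from later in this tutorial, and the exact greeting text is assumed to be 'Hello World!'.

```shell
# 1. Make the terminal say hello
echo 'Hello World!'

# 2. Write the same text to a file instead of the terminal
echo 'Hello World!' > output.txt

# 3. Verify that the output file is there
ls output.txt

# 4. Show the file contents
cat output.txt
```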
Tip
In the Gitpod environment, you can also find the output file in the file explorer, and view its contents by clicking on it. Alternatively, you can use the code command to open the file for viewing.
Takeaway¶
You now know how to run a simple command in the terminal that outputs some text, and optionally, how to make it write the output to a file.
What's next?¶
Learn how to turn that into a step in a Nextflow workflow.
1. Very first Nextflow run¶
Now we're going to run a script (named hello-world.nf) that does the same thing as before (print 'Hello World!') but with Nextflow.
Note
We're intentionally not looking at the script yet. Seeing the result before we look inside the machine will help us understand what each part does.
1. Run the workflow¶
Your console should look something like this:
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [mighty_murdock] DSL2 - revision: 80e92a677c
executor > local (1)
[4e/6ba912] process > sayHello [100%] 1 of 1 ✔
Congratulations, you ran your first Nextflow workflow!
The most important output here is the last line (line 4), which reports that the sayHello process was successfully executed once.
When a Nextflow workflow is run, it creates a work directory that stores various files.
Each task uses a unique directory, named after its hash (e.g., 4e/6ba912), within the work directory.
When a task is created, Nextflow stages the task input files, script, and other helper files into the task directory. The task writes any output files to this directory during its execution, and Nextflow uses these output files for downstream tasks and/or publishing.
Warning
Your work directory won't necessarily have the same hash as the one shown above.
Browse the work directory in the file explorer to find the log files and any outputs created by the task. You should find the following files:
- .command.begin: Metadata related to the beginning of the execution of the process task
- .command.err: Error messages (stderr) emitted by the process task
- .command.log: Complete log output emitted by the process task
- .command.out: Regular output (stdout) by the process task
- .command.sh: The command that was run by the process task call
- .exitcode: The exit code resulting from the command
In this case, look for your output in the .command.out file.
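If you prefer the terminal to the file explorer, a small shell helper can locate these hidden files. This is only a sketch: show_task_stdout is a hypothetical helper name, and it assumes the work directory is named work and sits in the current directory.

```shell
# Hypothetical helper: print the captured stdout of every task
# found under a Nextflow work directory (default: ./work).
show_task_stdout() {
    work_dir="${1:-work}"
    # The log files are hidden (dot-prefixed), so match them by name.
    find "$work_dir" -type f -name '.command.out' -exec cat {} +
}

# Usage, once the workflow has run:
#   show_task_stdout        # searches ./work
#   ls -a work/*/*          # also lists the hidden files in each task directory
```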
Tip
Some of the specifics will be different in your log output. For example, here [mighty_murdock] and [4e/6ba912] are randomly generated names, so those will be different every time.
Takeaway¶
You know how to run a simple Nextflow script and navigate the work directory.
What's next?¶
Learn how to interpret the Nextflow code.
2. Interpret the Hello World script¶
A Nextflow script is built up of multiple parts.
A process is the basic processing primitive to execute a user script.
The process definition starts with the keyword process, followed by the process name and finally the process body delimited by curly braces. The process body must contain a script block which represents the command or, more generally, a script that is executed by it.
A process may contain any of the following definition blocks: directives, inputs, outputs, when clauses, and of course, the script.
A workflow is a composition of processes and dataflow logic.
The workflow definition starts with the keyword workflow, followed by an optional name, and finally the workflow body delimited by curly braces.
Processes are connected through asynchronous first-in, first-out (FIFO) queues, called channels. The interaction between processes, and ultimately the workflow execution flow itself, are defined by the process input and output declarations.
Let's open the hello-world.nf script and look at how it's structured.
1. Double click on the file in the file explorer to open it in the editor pane¶
Tip
The file is in the current directory. Optionally, you can type ls in the terminal and Ctrl+Click on the file to open it. If you're on macOS, you can use Cmd+Click.
The first block of code describes a process called sayHello that writes its output to stdout.
The second block of code describes the workflow itself, which consists of one call to the sayHello process.
2. Add a comment block above the process to document what it does in plain English¶
3. Add an in-line comment above the process call¶
Takeaway¶
You know how to interpret the simplest possible Nextflow script and add comments to document it.
What's next?¶
Learn how to make it output a named file.
3. Send the output to a file¶
Instead of printing "Hello World!" to the standard output, we can save it to a file (the same thing we did when running in the terminal earlier).
In a real-world workflow, this is like having a command that specifies an output file as part of its normal syntax. We'll see examples of that later.
Both the script and the output definition blocks need to be updated.
Note
Inputs and outputs in the process blocks typically require a qualifier and a variable name. The qualifier defines the type of data to be received. Nextflow uses this information to apply the semantic rules associated with each qualifier and handle the data properly. Common qualifiers include val and path.
1. Change the process command to output a named file¶
Before:
After:
2. Change the output declaration in the process¶
Before:
After:
3. Run the workflow again¶
The log output should be very similar to the first time you ran the workflow:
N E X T F L O W ~ version 23.10.1
Launching `scripts/hello-world.nf` [disturbed_cajal] DSL2 - revision: 9512241567
executor > local (1)
[ab/c61321] process > sayHello [100%] 1 of 1 ✔
Like you did before, find the work directory in the file explorer. Find the output.txt output file and click on it to open it and verify that it contains the greeting as expected.
Warning
This example is brittle because we hardcoded the output filename in two separate places (the script and the output blocks). If we change one but not the other, the script will break.
Takeaway¶
You know how to send outputs to a specific named file.
What's next?¶
Learn how to pass parameters to the workflow from the command line.
4. Use a command line parameter for naming the output file¶
Here we introduce params (short for 'parameters') as the construct that can hold command line arguments. This is useful because there will be many parameters, such as filenames and processing options, that you may want to decide at the time you run the workflow. Parameters allow you to do this without editing the script itself every time.
Parameters are created by prefixing a parameter name with the params scope (e.g., params.output_file). When including a parameter in a script block, a $ must be used so that it is treated as a variable.
Parameters can be set when you run your workflow by adding a double hyphen (--) to the start of the parameter name and including it in the run command (e.g., nextflow run hello-world.nf --output_file results).
1. Change the output declaration in the process to use a parameter¶
Before:
After:
2. Change the process command to use the parameter too¶
Before:
After:
3. Run the workflow again with the --output_file parameter¶
The log output should start looking very familiar:
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [evil_bose] DSL2 - revision: 6907ac9da2
executor > local (1)
[46/e4ff05] process > sayHello [100%] 1 of 1 ✔
Follow the same procedure as before to find the output.txt output file. If you want to convince yourself that the parameter is working as intended, feel free to repeat this step with a different output filename.
Warning
If you forget to add the output filename parameter, you get a warning and the output file is called null. If you add it but don't give it a value, the output file is called true.
Tip
Command-line arguments take one dash (-) for Nextflow options, two dashes (--) for workflow parameters.
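The dash convention can be seen side by side in a single command. The sketch below wraps it in a hypothetical helper function, run_hello, so that nothing runs until you call it; -ansi-log false is a Nextflow option that appears later in this tutorial.

```shell
# Hypothetical helper: run the workflow with a chosen output filename.
run_hello() {
    # -ansi-log false : Nextflow option (one dash)
    # --output_file   : workflow parameter (two dashes)
    nextflow run hello-world.nf -ansi-log false --output_file "$1"
}

# Usage (requires Nextflow and hello-world.nf in the current directory):
#   run_hello output.txt
```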
Takeaway¶
You know how to use a command line parameter to set the output filename.
What's next?¶
Learn how to set a default value in case we leave out the parameter.
5. Set a default value for a command line parameter¶
In many cases, it makes sense to supply a default value for a given parameter so that you don't have to specify it for every run.
Let's initialize the output_file parameter with a default value.
1. Add the parameter declaration at the top of the script (with a comment block as a free bonus)¶
2. Run the workflow again without specifying the parameter¶
The console output is expected to look the same...
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [tiny_elion] DSL2 - revision: 7ad1cd6bfe
executor > local (1)
[8b/1f9ded] process > sayHello [100%] 1 of 1 ✔
Check the output in the work directory, and... Tadaa! It works! Nextflow used the default value to name the output. But wait, what happens now if we provide the parameter in the command line?
3. Run the workflow again with the --output_file parameter on the command line using a DIFFERENT filename¶
Nextflow's not complaining, that's a good sign:
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [exotic_lichterman] DSL2 - revision: 7ad1cd6bfe
executor > local (1)
[36/47354a] process > sayHello [100%] 1 of 1 ✔
Check the output directory and look for the output with the new filename. Tadaa again!
The value of the parameter we passed on the command line overrode the value we gave the variable in the script. In fact, parameters can be set in several different ways; if the same parameter is set in multiple places, its value is determined by the order of precedence described in the Nextflow documentation.
Tip
You can put the parameter declaration inside the workflow block if you prefer. Whatever you choose, try to group similar things in the same place so you don't end up with declarations all over the place.
Takeaway¶
You know how to handle command line parameters and set default values.
What's next?¶
Learn how to add in variable inputs.
6. Add in variable inputs¶
So far, we've been emitting a greeting hardcoded into the process command. Now we're going to add some flexibility by introducing channels.
Nextflow is based on the dataflow programming model in which processes communicate through channels.
Channels are created using channel factory methods. There are several channel factories, which can be used to create different channel types for different data types.
Importantly, there are two kinds of channels (queue channels and value channels) which behave differently.
Queue channel
- A non-blocking unidirectional first-in first-out queue connecting a producer process (i.e. one outputting a value) to a consumer process or an operator
- Can be consumed only once
Value channel
- Can be bound (i.e. assigned) with one and only one value
- Can be consumed any number of times
We're going to start by creating a channel with the Channel.of() channel factory, which produces a queue channel.
Tip
You can build different kinds of channels depending on the shape of the input data; we'll cover how to deal with simple inputs later, but more complex input channel types are out of scope for this training.
1. Create an input channel (with a bonus in-line comment)¶
Before:
After:
2. Add the channel as input to the process call¶
Before:
After:
3. Add an input definition to the process block¶
Before:
After:
4. Edit the process command to use the input variable¶
Before:
After:
5. Run the workflow command again¶
If you made all four edits correctly, you should get another successful execution:
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [maniac_euler] DSL2 - revision: 73bfbe197f
executor > local (1)
[57/aee130] process > sayHello (1) [100%] 1 of 1 ✔
The result is still the same as previously; so far we're just progressively tweaking the internal plumbing to increase the flexibility of our workflow while achieving the same end result.
Takeaway¶
You know how to use a simple channel to provide an input to a process.
What's next?¶
Learn how to pass inputs from the command line.
7. Use params for inputs too¶
We want to be able to specify the input from the command line because that is the piece that will almost always be different in subsequent runs of the workflow. Good news: we can use the same params construct we used for the output filename.
1. Edit the input channel declaration to use a parameter¶
Before:
After:
2. Run the workflow again with the --greeting parameter¶
In case you're wondering, yes it's normal to have dreams where the Nextflow log output scrolls endlessly in front of you after running through a training session... Or is that just me?
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [hopeful_laplace] DSL2 - revision: a8ed9a6202
executor > local (1)
[83/dfbbbc] process > sayHello (1) [100%] 1 of 1 ✔
Be sure to open up the output file to check that you now have the new version of the greeting. Voilà!
Note
The current form of the script doesn't have a default declared for greeting, so that parameter is REQUIRED on the command line. If we wanted, we could put in a default value by adding, for example, params.greeting = 'Holà el mundo!' at the top of the script (just like we did for the output filename). But it's less common to want a default value for the input data.
Takeaway¶
You know how to set up an input variable for a process and supply a value in the command line.
What's next?¶
Learn how to add in a second process and chain them together.
8. Add a second step to the workflow¶
Most real-world workflows involve more than one step. Here we introduce a second process that converts the text to uppercase (all-caps), using the classic UNIX one-liner:
We're going to run the command by itself in the terminal first to verify that it works as expected without any of the workflow code getting in the way of clarity, just like we did at the start with echo 'Hello World'. Then we'll write a process that does the same thing, and finally we'll connect the two processes so the output of the first serves as input to the second.
1. Run the command in the terminal by itself¶
The output is simply the uppercase version of the text string:
2. Make the command take a file as input and write the output to a file¶
Now the HELLO WORLD output is in the new output file, UPPER-output.txt.
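The two terminal steps above can be sketched as follows. The tr invocation is one common way to write the uppercase conversion and is an assumption here, as is recreating output.txt first so that the example runs on its own.

```shell
# 1. Convert a string to uppercase on the fly
echo 'Hello World' | tr '[a-z]' '[A-Z]'

# 2. Read the text from a file and write the uppercase version to a new file
echo 'Hello World' > output.txt
cat output.txt | tr '[a-z]' '[A-Z]' > UPPER-output.txt
cat UPPER-output.txt
```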
3. Turn that into a process definition (documented with a comment block)¶
4. Add a call to the new process in the workflow block¶
5. Pass the output of the first process to the second process¶
6. Run the same workflow command as before¶
Oh, how exciting! There is now an extra line in the log output, which corresponds to the second process we've added:
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [kickass_pasteur] DSL2 - revision: d15b2c482c
executor > local (2)
[da/8d9221] process > sayHello (1) [100%] 1 of 1 ✔
[01/2b32ee] process > convertToUpper (1) [100%] 1 of 1 ✔
This time the workflow produced two work directories; one per process instance (task). Check out the work directory of the task from the second process, where you should find two different output files listed. If you look carefully, you'll notice one of them (the output of the first process) has a little arrow icon on the right; that signifies it's a symbolic link. It points to the location where that file lives in the work directory of the first process.
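The symbolic link can be reproduced by hand to see what it means. The following is a plain-shell imitation, not what Nextflow literally runs; the directory names task1 and task2 are hypothetical stand-ins for the two hashed task directories.

```shell
# Stand-ins for the two task directories (real ones are named by hash).
mkdir -p demo_work/task1 demo_work/task2

# The first task writes its output into its own directory.
echo 'Hello World!' > demo_work/task1/output.txt

# The input of the second task is linked, not copied, into its directory.
ln -sf "$PWD/demo_work/task1/output.txt" demo_work/task2/output.txt

# The second task reads through the link and writes a new file next to it.
(cd demo_work/task2 && tr '[a-z]' '[A-Z]' < output.txt > UPPER-output.txt)

# In the long listing, the arrow marks the symbolic link and shows its target.
ls -l demo_work/task2
```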
Note
As a little bonus, we composed the second output filename based on the first one. Very important to remember: you have to use double quotes around the filename expression (NOT single quotes) or it will fail.
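Shell strings follow a similar quoting rule, which makes for a quick way to see the difference. This is an analogy in plain shell rather than Nextflow code, and the variable name input_file is hypothetical.

```shell
input_file='output.txt'

# Double quotes: the variable is expanded.
echo "UPPER-${input_file}"    # prints UPPER-output.txt

# Single quotes: the text is taken literally, so no filename is composed.
echo 'UPPER-${input_file}'    # prints UPPER-${input_file}
```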
Takeaway¶
You know how to add a second step that takes the output of the first as input.
What's next?¶
Learn how to make the workflow run on many values for the same input.
9. Modify the workflow to run on many values for the same input¶
Workflows typically run on batches of inputs that we want to process in bulk. Here we upgrade the workflow to accept an input with multiple values. For simplicity, we go back to hardcoding the greetings instead of using a parameter for the input.
1. Modify the channel to contain multiple greetings (hardcoded for now)¶
Before:
After:
2. Modify the first process to generate dynamic filenames so the final filenames will be unique¶
Before:
After:
Note
In practice, naming files based on the data input itself is almost always impractical; the better way to generate dynamic filenames is to use a samplesheet and create a map of metadata (aka metamap) from which we can grab an appropriate identifier to generate the filenames. We'll show how to do that later in this training.
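To see why each filename has to be unique, here is a plain-shell imitation of running the first step on several greetings in one flat directory. It is a sketch only: the three greetings and the greeting-based naming pattern are assumptions made for illustration.

```shell
# Each greeting gets its own output file; in this flat directory,
# a single fixed name would be overwritten on every iteration.
for greeting in Hello Bonjour Hola; do
    echo "$greeting" > "${greeting}-output.txt"
done

# Three separate files, one per greeting.
ls ./*-output.txt
```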
3. Run the command and look at the log output¶
How many log lines do you expect to see in the terminal? And how many do you actually see?
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [cranky_hypatia] DSL2 - revision: 719dae218c
executor > local (6)
[6c/91aa50] process > sayHello (3) [100%] 3 of 3 ✔
[90/80111c] process > convertToUpper (3) [100%] 3 of 3 ✔
Something's wrong! The log lines seem to indicate each process was executed three times (corresponding to the three input elements we provided) but we're only seeing two work directories instead of six.
This is because by default, the ANSI logging system writes the logging from multiple calls to the same process on the same line. Fortunately, we can disable that behavior.
4. Run the command again with the -ansi-log false option¶
This time it works: all six work directories are listed in the terminal.
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [disturbed_panini] DSL2 - revision: 719dae218c
[8c/77b534] Submitted process > sayHello (1)
[b5/f0bf7e] Submitted process > sayHello (2)
[a8/457f9b] Submitted process > sayHello (3)
[3d/1bb4e6] Submitted process > convertToUpper (2)
[fa/58fbb1] Submitted process > convertToUpper (1)
[90/e88919] Submitted process > convertToUpper (3)
That's much better; at least for this number of processes. For a complex workflow, or a large number of inputs, having the full list output to the terminal might get a bit overwhelming.
Tip
Another way to show that all six calls are happening is to delete all the work directories before you run again. Then you'll see the six new ones pop up.
Takeaway¶
You know how to feed an input with multiple elements through a queue channel.
What's next?¶
Learn how to make the workflow take a file that contains multiple values for an input.
10. Modify the workflow to run on a file that contains an input with multiple values¶
In most cases, when we run on multiple inputs, the input values are contained in a file. Here we're going to use a file where each value is on a new line.
Nextflow offers a variety of predefined operators and functions for reading data in from common file formats and applying text transformations. In this example, we use the fromPath() channel factory with the splitText() operator to read each line as a separate value, then use a closure to apply the trim() method, which strips the newline (\n) character from each element.
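The same idea, one value per line with the trailing newline removed, can be imitated in plain shell. This is an analogy rather than Nextflow code; the file name greetings.txt and its contents are hypothetical.

```shell
# Hypothetical input file with one greeting per line.
printf 'Hello\nBonjour\nHola\n' > greetings.txt

# read -r returns each line without its trailing newline,
# much like splitting the text and trimming each element.
count=0
while IFS= read -r greeting; do
    count=$((count + 1))
    echo "value ${count}: [${greeting}]"
done < greetings.txt
```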
1. Modify the channel declaration to take an input file (through a parameter) instead of hardcoded values¶
Before:
After:
2. Run the workflow with the -ansi-log false option and an --input_file parameter¶
Once again we see each process get executed three times:
N E X T F L O W ~ version 23.10.1
Launching `hello-world.nf` [small_albattani] DSL2 - revision: 5cea973c3c
[45/18d159] Submitted process > sayHello (1)
[cf/094ea1] Submitted process > sayHello (3)
[27/e3ea5b] Submitted process > sayHello (2)
[7d/63672f] Submitted process > convertToUpper (1)
[62/3184ed] Submitted process > convertToUpper (2)
[02/f0ff38] Submitted process > convertToUpper (3)
Looking at the outputs, we see each greeting was correctly extracted and processed through the workflow. We've achieved the same result as the previous step, but now we have a lot more flexibility to add more elements to the channel of greetings we want to process.
Tip
But don't worry if this feels like a lot to grapple with all of a sudden! This is just meant to be a little peek at the kind of things you will learn in later training modules.
Takeaway¶
You know how to provide inputs in a file.
What's next?¶
Celebrate your success and take a break!
When you are ready, move on to Part 2 of this training to learn how to apply what you've learned to a more realistic data analysis use case.