Channels are a key data structure of Nextflow that allows the implementation of reactive-functional oriented computational workflows based on the Dataflow programming paradigm.
They are used to logically connect tasks to each other or to implement functional style data transformations.
5.1 Channel types
Nextflow distinguishes two different kinds of channels: queue channels and value channels.
5.1.1 Queue channel
A queue channel is an asynchronous unidirectional FIFO queue that connects two processes or operators.
- asynchronous means that operations are non-blocking.
- unidirectional means that data flows from a producer to a consumer.
- FIFO (First In, First Out) means that data is guaranteed to be delivered in the same order in which it is produced.
A queue channel is implicitly created by process output definitions or using channel factories such as Channel.of or Channel.fromPath.
Try the following snippets:
- Use the built-in print line function `println` to print the channel.
- Applying the `view` method to the `ch` channel prints each item emitted by the channel.
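A minimal sketch of a snippet matching these notes, assuming a queue channel stored in a variable named `ch` and three arbitrary integer values:

```groovy
// Create a queue channel emitting three values (the values are illustrative).
ch = Channel.of(1, 2, 3)

// println shows the channel object itself, not the items it carries.
println(ch)

// view prints each item as it is emitted by the channel.
ch.view()
```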
Try to execute this snippet. You can do that by creating a new `.nf` file or by editing an already existing one.
5.1.2 Value channels
A value channel (a.k.a. singleton channel) is, by definition, bound to a single value, and it can be read an unlimited number of times without consuming its contents. A value channel is created using the `value` factory method or by operators returning a single value, such as first, last, collect, count, min, max, reduce, and sum.
To better understand the difference between value and queue channels, save the snippet below as
When you run the script, it prints only 2, as you can see below:
To understand why, we can inspect the queue channel. Running Nextflow with DSL1 gives a more explicit view of what happens behind the scenes.
$ nextflow run example.nf -dsl1
...
DataflowQueue(queue=[DataflowVariable(value=1), DataflowVariable(value=groovyx.gpars.dataflow.operator.PoisonPill@34be065a)])
We have the value 1 as the single element of our queue channel and a poison pill, which will tell the process that there’s nothing left to be consumed. That’s why we only have one output for the example above, which is 2. Let’s inspect a value channel now.
There is no poison pill, and that’s why we get a different output with the code below, where
ch2 is turned into a value channel through the
Besides, in many situations, Nextflow will implicitly convert variables to value channels when they are used in a process invocation. For example, when you invoke a process with a pipeline parameter (`params.example`) which has a string value, it is automatically cast into a value channel.
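To make the contrast concrete, here is a hedged sketch. The process name `SUM`, the channel names, and the values are assumptions chosen for illustration, and the value channel is built with `Channel.value` rather than with an operator.

```groovy
// Hypothetical process that adds its two inputs in the shell.
process SUM {
    input:
    val x
    val y

    output:
    stdout

    script:
    """
    echo \$(($x+$y))
    """
}

workflow {
    ch1 = Channel.of(1, 2, 3)
    // A value channel is never consumed, so it can pair with every item of ch1.
    ch2 = Channel.value(1)

    // One task runs per item in ch1. If ch2 were Channel.of(1), a queue
    // channel, it would be exhausted after the first task.
    SUM(ch1, ch2).view()
}
```

Under these assumptions the process runs three times, yielding 2, 3, and 4 in no guaranteed order.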
5.2 Channel factories
Channel factories are Nextflow methods for creating channels. Each factory expects a particular kind of input and creates a particular kind of channel.
The `value` factory method is used to create a value channel. An optional non-`null` argument can be specified to bind the channel to a specific value. For example:
- Creates an empty value channel
- Creates a value channel and binds a string to it
- Creates a value channel and binds a list object to it that will be emitted as a sole emission
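The three cases listed above can be sketched as follows. The variable names and the bound values are illustrative assumptions.

```groovy
// Empty value channel
ch1 = Channel.value()

// Value channel bound to a string
ch2 = Channel.value('Hello there')

// Value channel bound to a list, emitted as one single item
ch3 = Channel.value([1, 2, 3, 4, 5])
```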
Channel.of allows the creation of a queue channel with the values specified as arguments.
The first line in this example creates a variable `ch` which holds a channel object. This channel emits the values specified as a parameter in the `of` method. Thus the second line will print the following:
`Channel.of` works in a similar manner to `Channel.from` (which is now deprecated), fixing some inconsistent behaviors of the latter and providing better handling when specifying a range of values. For example, the following works with a range from 1 to 23:
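A short sketch of such a range call. The extra `'X'` and `'Y'` values are assumptions added to show that a range can be mixed with individual values.

```groovy
// The range 1..23 is expanded, so each number is emitted as its own item,
// followed by the two string values.
Channel
    .of(1..23, 'X', 'Y')
    .view()
```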
Channel.fromList creates a channel emitting the elements provided by a list object specified as an argument:
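For instance, a minimal sketch with an assumed two-element list:

```groovy
// Each element of the list becomes a separate item in the channel.
list = ['hello', 'world']

Channel
    .fromList(list)
    .view()
```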
The `fromPath` factory method creates a queue channel emitting one or more files matching the specified glob pattern.
This example creates a channel and emits as many items as there are files with a `csv` extension in the `./data/meta` folder. Each element is a file object implementing the Path interface.
Two asterisks, i.e. `**`, work like `*` but cross directory boundaries. This syntax is generally used for matching complete paths. Curly brackets specify a collection of sub-patterns.
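The three glob forms can be compared side by side in a rough sketch. Apart from `./data/meta`, the paths and file suffixes are assumptions made for illustration.

```groovy
// `*` matches within a single directory.
Channel.fromPath('./data/meta/*.csv').view()

// `**` also descends into subdirectories.
Channel.fromPath('./data/**.fq').view()

// `{1,2}` matches either of the listed sub-patterns.
Channel.fromPath('./data/ggal/*_{1,2}.fq').view()
```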
|type||Type of path returned, either
|maxDepth||Maximum number of directory levels to visit (default:
Learn more about the glob patterns syntax at this link.
Use the `Channel.fromPath` method to create a channel emitting all files with the suffix `.fq` in the `data/ggal/` directory and any subdirectory, in addition to hidden files. Then print the file names.
The `fromFilePairs` method creates a channel emitting the file pairs matching a glob pattern provided by the user. The matching files are emitted as tuples, in which the first element is the grouping key of the matching pair and the second element is the list of files (sorted in lexicographical order).
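A minimal sketch, assuming paired read files named with `_1` and `_2` suffixes under a `./data/ggal/` directory:

```groovy
// The part of the file name matched by `*` becomes the grouping key;
// the files matching `{1,2}` form the list in the second tuple element.
Channel
    .fromFilePairs('./data/ggal/*_{1,2}.fq')
    .view()
```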
It will produce an output similar to the following:
[liver, [/workspace/gitpod/nf-training/data/ggal/liver_1.fq, /workspace/gitpod/nf-training/data/ggal/liver_2.fq]]
[gut, [/workspace/gitpod/nf-training/data/ggal/gut_1.fq, /workspace/gitpod/nf-training/data/ggal/gut_2.fq]]
[lung, [/workspace/gitpod/nf-training/data/ggal/lung_1.fq, /workspace/gitpod/nf-training/data/ggal/lung_2.fq]]
The glob pattern must contain at least one star wildcard character (`*`).
|type||Type of paths returned, either
|maxDepth||Maximum number of directory levels to visit (default:
|size||Defines the number of files each emitted item is expected to hold (default:
Use the `fromFilePairs` method to create a channel emitting all pairs of fastq reads in the `data/ggal/` directory and print them. Then use the `flat: true` option and compare the output with the previous execution.
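One possible way to approach the second half of this exercise is sketched below. The `_1`/`_2` naming pattern is an assumption about how the read files are named.

```groovy
// With flat: true, each item is emitted as a flat tuple (key, file1, file2)
// instead of a key followed by a nested list of files.
Channel
    .fromFilePairs('./data/ggal/*_{1,2}.fq', flat: true)
    .view()
```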
The `Channel.fromSRA` method makes it possible to query the NCBI SRA archive and returns a channel emitting the FASTQ files matching the specified selection criteria.
The query can be project ID(s) or accession number(s) supported by the NCBI ESearch API.
This function now requires an API key you can only get by logging into your NCBI account.
Instructions for NCBI login and key acquisition
- Go to: https://www.ncbi.nlm.nih.gov/
- Click the top right "Log in" button to sign into NCBI. Follow their instructions.
- Once in your account, click the button at the top right, usually your ID.
- Go to Account settings
- Scroll down to the API Key Management section.
- Click on "Create an API Key".
- The page will refresh and the key will be displayed where the button was. Copy your key.
For example, the following snippet will print the contents of an NCBI project ID:
Replace `<Your API key here>` with your API key.
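A hedged sketch of such a query. The parameter name `params.ncbi_api_key` and the project ID `SRP073307` are assumptions chosen for illustration; substitute your own values.

```groovy
// Assumed parameter holding the NCBI API key.
params.ncbi_api_key = '<Your API key here>'

// Query the SRA for one project ID and print each emitted item.
Channel
    .fromSRA(['SRP073307'], apiKey: params.ncbi_api_key)
    .view()
```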
This should print:
[SRR3383346, [/vol1/fastq/SRR338/006/SRR3383346/SRR3383346_1.fastq.gz, /vol1/fastq/SRR338/006/SRR3383346/SRR3383346_2.fastq.gz]]
[SRR3383347, [/vol1/fastq/SRR338/007/SRR3383347/SRR3383347_1.fastq.gz, /vol1/fastq/SRR338/007/SRR3383347/SRR3383347_2.fastq.gz]]
[SRR3383344, [/vol1/fastq/SRR338/004/SRR3383344/SRR3383344_1.fastq.gz, /vol1/fastq/SRR338/004/SRR3383344/SRR3383344_2.fastq.gz]]
[SRR3383345, [/vol1/fastq/SRR338/005/SRR3383345/SRR3383345_1.fastq.gz, /vol1/fastq/SRR338/005/SRR3383345/SRR3383345_2.fastq.gz]]
// (remaining omitted)
Multiple accession IDs can be specified using a list object:
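A sketch of such a call, reusing the accession IDs that appear in the output below and assuming the key is stored in a hypothetical `params.ncbi_api_key` parameter:

```groovy
// Assumed parameter holding the NCBI API key.
params.ncbi_api_key = '<Your API key here>'

// Several accession IDs passed as a single list.
ids = ['ERR908507', 'ERR908506', 'ERR908505']

Channel
    .fromSRA(ids, apiKey: params.ncbi_api_key)
    .view()
```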
[ERR908507, [/vol1/fastq/ERR908/ERR908507/ERR908507_1.fastq.gz, /vol1/fastq/ERR908/ERR908507/ERR908507_2.fastq.gz]]
[ERR908506, [/vol1/fastq/ERR908/ERR908506/ERR908506_1.fastq.gz, /vol1/fastq/ERR908/ERR908506/ERR908506_2.fastq.gz]]
[ERR908505, [/vol1/fastq/ERR908/ERR908505/ERR908505_1.fastq.gz, /vol1/fastq/ERR908/ERR908505/ERR908505_2.fastq.gz]]
Read pairs are implicitly managed and are returned as a list of files.
It’s straightforward to use this channel as an input using the usual Nextflow syntax. The code below creates a channel containing two samples from a public SRA study and runs FASTQC on the resulting files. See:
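A minimal sketch of such a pipeline. The parameter names, the process body, and the choice of the two accession IDs are assumptions for illustration, and `fastqc` is assumed to be available on the `PATH` or through a container.

```groovy
params.ncbi_api_key = '<Your API key here>'
params.accession = ['ERR908507', 'ERR908506']

process FASTQC {
    input:
    // Each item is a tuple: the sample ID and its read file(s).
    tuple val(sample_id), path(reads_file)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}
    """
}

workflow {
    reads = Channel.fromSRA(params.accession, apiKey: params.ncbi_api_key)
    FASTQC(reads)
}
```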
If you want to run the pipeline above and do not have fastqc installed on your machine, don’t forget what you learned in the previous section. Run this pipeline with `-with-docker biocontainers/fastqc:v0.11.5`, for example.
5.2.7 Text files
The `splitText` operator allows you to split multi-line strings or text file items emitted by a source channel into chunks containing n lines, which will be emitted by the resulting channel. See:
- Instructs Nextflow to make a channel from the path
- The `splitText` operator splits each item into chunks of one line by default.
- View contents of the channel.
You can define the number of lines in each chunk by using the parameter `by`, as shown in the following example:
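A sketch of the `by` parameter in use. The file path `data/meta/random.txt` and the separator text are assumptions for illustration.

```groovy
// Split the file into chunks of two lines each and print every chunk,
// followed by a marker line so the chunk boundaries are visible.
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 2)
    .subscribe { chunk ->
        print chunk
        print '--- end of the chunk ---\n'
    }
```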
The `subscribe` operator permits execution of user-defined functions each time a new value is emitted by the source channel.
An optional closure can be specified in order to transform the text chunks produced by the operator. The following example shows how to split text files into chunks of 10 lines and transform them into capital letters:
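A sketch of that transformation, again assuming a hypothetical text file at `data/meta/random.txt`:

```groovy
// The closure receives each 10-line chunk and returns its upper-case form,
// which is what the resulting channel emits.
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 10) { chunk -> chunk.toUpperCase() }
    .view()
```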
You can also number each line as it is emitted:
Finally, you can also use the operator on plain files (outside of the channel context):
5.2.8 Comma separated values (.csv)
The `splitCsv` operator allows you to parse CSV-formatted text items emitted by a channel.
It then splits them into records or groups them as a list of records with a specified length.
In the simplest case, just apply the `splitCsv` operator to a channel emitting CSV-formatted text files or text entries. For example, to view only the first and fourth columns:
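A minimal sketch, assuming a hypothetical CSV file at `data/meta/patients_1.csv` with at least four columns:

```groovy
// Without the header option each row is a list, so columns are
// accessed by zero-based position: 0 is the first, 3 is the fourth.
Channel
    .fromPath('data/meta/patients_1.csv')
    .splitCsv()
    .view { row -> "${row[0]}, ${row[3]}" }
```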
When the CSV begins with a header line defining the column names, you can specify the parameter `header: true`, which allows you to reference each value by its column name, as shown in the following example:
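A sketch of header-based access. The file path and the column names `patient_id` and `num_samples` are hypothetical; use the names found in your own header line.

```groovy
// With header: true each row is a map keyed by column name,
// and the header line itself is not emitted as data.
Channel
    .fromPath('data/meta/patients_1.csv')
    .splitCsv(header: true)
    .view { row -> "${row.patient_id}, ${row.num_samples}" }
```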
Alternatively, you can provide custom header names by specifying a list of strings in the header parameter as shown below:
You can also process multiple CSV files at the same time:
Notice that you can change the output format simply by adding a different delimiter.
Finally, you can also operate on CSV files outside the channel context:
Try inputting fastq reads into the RNA-Seq workflow from earlier using
Add a CSV text file containing the following, as an example input with the name "fastq.csv":
Then replace the input channel for the reads in `script7.nf`, changing the following lines:
To an input channel built with the `splitCsv` operator:
Finally, change the cardinality of the processes that use the input data. For example, for the quantification process, change it from:
Repeat the above for the fastqc step.
Now the workflow should run from a CSV file.
5.2.9 Tab separated values (.tsv)
Parsing TSV files works in a similar way, simply add the
sep: '\t' option in the
Try using the tab separation technique on the file `data/meta/regions.tsv`, but print just the first column, and remove the header.
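One possible approach, sketched under the assumption that the file has exactly one header line. It skips that line with the `skip` option and accesses the first column by position.

```groovy
// sep: '\t' switches the parser to tab-separated input;
// skip: 1 drops the first (header) line before parsing.
Channel
    .fromPath('data/meta/regions.tsv', checkIfExists: true)
    .splitCsv(sep: '\t', skip: 1)
    .view { row -> "${row[0]}" }
```

Using `header: true` and referring to the first column by name would work as well, provided you know the column name.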
5.3 More complex file formats
We can also easily parse the JSON file format using the following Groovy code:
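A minimal sketch using Groovy's `JsonSlurper`. The file path `data/meta/regions.json` is an assumption, and the file content is read as text before parsing.

```groovy
import groovy.json.JsonSlurper

// Load the file and parse its text content into Groovy lists and maps.
def f = file('data/meta/regions.json')
def records = new JsonSlurper().parseText(f.text)

// Print each top-level entry of the parsed structure.
records.each { record -> println record }
```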
When using an older JSON version, you may need to replace
This can also be used as a way to parse YAML files:
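A comparable sketch for YAML, assuming a hypothetical file at `data/meta/regions.yml` and the SnakeYAML library that ships with Nextflow:

```groovy
import org.yaml.snakeyaml.Yaml

// Load the file and parse its text content into Groovy lists and maps.
def f = file('data/meta/regions.yml')
def records = new Yaml().load(f.text)

// Print each top-level entry of the parsed structure.
records.each { record -> println record }
```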
5.3.3 Storage of parsers into modules
The best way to store parser scripts is to keep them in a Nextflow module file.
See the following Nextflow script:
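A sketch of what such a script might look like. The input path `data/meta/regions.json` is an assumption for illustration.

```groovy
// Import the custom function from the module file.
include { parseJsonFile } from './modules/parsers.nf'

workflow {
    // Parse the JSON file and print each top-level entry.
    def records = parseJsonFile('data/meta/regions.json')
    records.each { record -> println record }
}
```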
For this script to work, a module file called `parsers.nf` needs to be created and stored in a modules folder in the current directory.
The `parsers.nf` file should contain the `parseJsonFile` function. For example:
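A minimal sketch of the module file, assuming the function takes a path string and returns the parsed structure:

```groovy
// modules/parsers.nf
import groovy.json.JsonSlurper

// Read the given JSON file and return its parsed content.
def parseJsonFile(json_file) {
    def f = file(json_file)
    return new JsonSlurper().parseText(f.text)
}
```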
Nextflow will use this as a custom function within the workflow scope.
You will learn more about module files later in the Modularization section of this tutorial.